Is this the solution to formatting problems from OCR?
Thread poster: Dylan J Hartmann
Dylan J Hartmann
Dylan J Hartmann  Identity Verified
Australia
Member (2014)
Thai to English
+ ...

MODERATOR
Jun 13, 2016

While I have purchased ABBYY FineReader, I've avoided using it much of the time. After doing (Thai) OCR, fixing spelling errors throughout, getting the source formatting right, and then running the file through my CAT tool, the final MS Word doc ends up with strange formatting problems. I had quizzed ABBYY about why the italics, bold and underlining were locked in some paragraphs but not in others, and posted here asking for help, but it was never solved (my workaround was to type the bold heading, for example, in a new doc and then copy it to the file I was working on!). In addition, many of the agencies I work for now include in their instructions, "DO NOT OCR the source files – they create formatting that is unusable on the back end"!

That said, there are certainly situations where OCR can be very helpful. I'm wondering whether getting the OCR to export as a .txt file and manually inserting the formatting would be the best workaround. I had previously tried exporting as plain text, exporting as a Word document, and fixing the formatting in the source before running the CAT tool, but the final translated doc still had locked bold, italics and underlining, as mentioned earlier.

I've tested this new .txt method on a couple of PDFs and have noticed no problems in the final files. What does everyone else suggest?



 
telefpro
telefpro
Local time: 00:14
Portuguese to English
+ ...
formatting problems Jun 14, 2016

Some formatting problems still persist. OCR can't always solve this issue.

 
esperantisto
esperantisto  Identity Verified
Local time: 21:44
Member (2006)
English to Russian
+ ...
SITE LOCALIZER
OpenOffice Writer Jun 14, 2016

In my experience, removing unnecessary formatting and fixing other post-OCR problems is easier with Apache OpenOffice / LibreOffice Writer (even when working with MS Word formats), especially with the OOoFBTools add-on. However, I do not work with Thai and have no idea about any implications specific to that language / script. As far as I understand, saving the results as plain text is sometimes really the best option for Asian / Far Eastern languages with complex scripts.

[Edited at 2016-06-14 04:48 GMT]


 
Christine Andersen
Christine Andersen  Identity Verified
Denmark
Local time: 20:44
Member (2003)
Danish to English
+ ...
Acrobat or Trados Studio Jun 14, 2016

I can open PDFs in Trados Studio, but threatening to do so made one recent client send me the InDesign file.

The trouble was that there were hard line breaks, often two or three in a sentence, so the source was broken up into tiny segments that could not be merged. Apart from that, I could not create a target file that I could open. The client was happy, but it was sheer guesswork on my part, as I had no WYSIWYG of the document, graphics and formatting, and had to assume the DTP person could make any adjustments.

A file like that is a pain to translate anyway...

Sometimes Adobe Acrobat works well with Danish, my source language, if the settings are correct, but if the scanned quality is not good, then nothing helps much - Danish has three extra letters, which are very often garbled. A 'search and replace' may or may not help - they are not always garbled consistently!
Then come all the other spelling errors...
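For the 'search and replace' step, a minimal Python sketch follows. It assumes the OCR output has been saved as plain text, and the word list in `corrections` is purely hypothetical: a real list would have to be built from the misreadings actually seen in a given scan. It replaces whole words only and counts each replacement, which makes inconsistent garbling easier to spot. It cannot catch misreadings that are not in the list.

```python
import re
from collections import Counter


def fix_garbled_words(text, corrections):
    """Replace whole-word OCR misreadings and count each replacement made."""
    counts = Counter()
    if not corrections:
        return text, counts
    # One pattern for all known misreadings; \b limits matches to whole words.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in corrections) + r")\b"
    )

    def swap(match):
        wrong = match.group(1)
        counts[wrong] += 1
        return corrections[wrong]

    return pattern.sub(swap, text), counts


# Hypothetical misreadings of Danish words containing ø and å.
corrections = {"kobe": "købe", "arbog": "årbog"}
fixed, counts = fix_garbled_words(
    "Jeg vil kobe en arbog. Kobenhavn er stor.", corrections
)
print(fixed)
print(dict(counts))
```

Matching is case-sensitive and word-based, so "Kobenhavn" is left alone here; anything not covered by the list still needs a manual pass.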

Whichever workaround is best for a given situation, I hope translators are making clients aware of the need for a workaround and charging for the time spent re-creating documents and formatting. It should not be included in the same standard word rate as for a simple document in Word!


 
Tom in London
Tom in London
United Kingdom
Local time: 19:44
Member (2008)
Italian to English
Is this the solution? No- Jun 14, 2016

DJHartmann wrote:

....

Is this the solution to formatting problems from OCR?



None of what you describe has anything to do with translating.


 
Dylan J Hartmann
Dylan J Hartmann  Identity Verified
Australia
Member (2014)
Thai to English
+ ...

MODERATOR
TOPIC STARTER
No blanket rates Jun 14, 2016

Christine Andersen wrote:

I hope translators are making clients aware of the need for a workaround and charging for the time spent re-creating documents and formatting. It should not be included in the same standard word rate as for a simple document in Word!


I totally agree with this point.

Tom in London wrote:

Nothing worth noting



Thanks for your two cents, Tom.


 
Katerina O.
Katerina O.  Identity Verified
Russian Federation
English to Russian
+ ...
Clear All Formatting Jun 14, 2016

I use the 'Clear All Formatting' function in Word, and then apply styles as necessary. It's not that time-consuming after all.

 
Dylan J Hartmann
Dylan J Hartmann  Identity Verified
Australia
Member (2014)
Thai to English
+ ...

MODERATOR
TOPIC STARTER
Locked Jun 14, 2016

Katerina O. wrote:

I use 'Clear All Formatting' function in Word.


Yes, likewise.

However, the bold, italics and underline functions were still locked afterwards.

In certain documents, the document language was locked (as either Thai or Arabic) and I couldn't change it to English after translation to run a spellcheck.


 
Tina Vonhof (X)
Tina Vonhof (X)
Canada
Local time: 12:44
Dutch to English
+ ...
Why bother? Jun 14, 2016

Why would you spend valuable time struggling with formatting in a converted document, which, as Tom points out, has nothing to do with translation? Just open a blank document and start typing!

 
Anton Konashenok
Anton Konashenok  Identity Verified
Czech Republic
Local time: 20:44
French to English
+ ...
Styles? Jun 14, 2016

If the formatting remains locked after being cleared, it is most likely due to Word styles. Open the list of styles used in the document and delete the styles responsible for that formatting.
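To see which styles might be responsible before deleting anything, the style definitions can be read straight out of the file. The following is a rough sketch using only the Python standard library. It relies on a .docx being a zip archive that contains `word/styles.xml`; the function name is made up for illustration. It only reports that a style mentions bold, italic or underline (including the complex-script variants `bCs` and `iCs`), not whether the setting is switched on or off, and it ignores document defaults.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
# Bold, italic and underline, plus the complex-script bold/italic variants.
FLAGS = ("b", "bCs", "i", "iCs", "u")


def styles_with_character_formatting(docx):
    """Return {style_id: [flags]} for styles that mention bold/italic/underline."""
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/styles.xml"))
    found = {}
    for style in root.findall("w:style", NS):
        rpr = style.find("w:rPr", NS)
        if rpr is None:
            continue  # this style defines no character formatting
        flags = [f for f in FLAGS if rpr.find(f"w:{f}", NS) is not None]
        if flags:
            found[style.get(f"{{{W}}}styleId")] = flags
    return found
```

Calling `styles_with_character_formatting("ocr_output.docx")` (a hypothetical file name) returns the style IDs worth looking at in Word's style list.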

 
Tom in London
Tom in London
United Kingdom
Local time: 19:44
Member (2008)
Italian to English
Yes, and..... Jun 14, 2016

Tina Vonhof wrote:

Why would you spend valuable time struggling with formatting in a converted document, which, as Tom points out, has nothing to do with translation? Just open a blank document and start typing!


Yes - and translating

If you use dictation software you can just read out your translation from the PDF and hey presto, it will type itself out in the target language. I too struggled with PDF conversion for a long time until I realised I could do it with dictation.

[Edited at 2016-06-14 15:18 GMT]


 
Dylan J Hartmann
Dylan J Hartmann  Identity Verified
Australia
Member (2014)
Thai to English
+ ...

MODERATOR
TOPIC STARTER
TM Jun 14, 2016

Tina Vonhof wrote:

Why would you spend valuable time struggling with formatting in a converted document, which, as Tom points out, has nothing to do with translation? Just open a blank document and start typing!


In plenty of situations this is the best option, however sometimes OCR can be very useful!


 
Dylan J Hartmann
Dylan J Hartmann  Identity Verified
Australia
Member (2014)
Thai to English
+ ...

MODERATOR
TOPIC STARTER
List of styles? Jun 14, 2016

Anton Konashenok wrote:

If the formatting remains locked after being cleared, it is most likely due to Word styles. Open the list of styles used in the document and delete the styles responsible for that formatting.


I have never found instructions for this. Most point only to the clear formatting icon.

Nevertheless, shouldn't a .txt be clear of all styles, formatting and be safe to use? While I have my own issues with the MS Word docs, something must lead the agencies to not allow OCR!


 
Dylan J Hartmann
Dylan J Hartmann  Identity Verified
Australia
Member (2014)
Thai to English
+ ...

MODERATOR
TOPIC STARTER
Still locked Jun 14, 2016

Well, even using a .txt has caused issues.

It seems to be related to MS Word and Thai fonts, because text in Latin fonts can be formatted fine.

My process was as follows:

Exported the OCR as plain text .txt file.

Opened with MS Word.

Bolded the heading of the first line (worked) and then tried to correct the spelling of the first character of the first word. The new text that I typed couldn't be bolded! However, if I typed new text in English, it could!

If anyone can clarify this situation, it'd be very appreciated!
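One thing that may be worth checking, offered as a guess and not a confirmed diagnosis: the .docx format stores bold for complex scripts such as Thai in a separate property (`w:bCs`) from bold for Latin text (`w:b`). The sketch below, using only the Python standard library, lists runs in a saved .docx where the two settings differ. The function names are made up for illustration, and the check looks only at formatting applied directly to a run, not at anything inherited from styles.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def is_on(rpr, tag):
    """True if a toggle such as w:b is present and not switched off."""
    el = rpr.find(f"w:{tag}", NS) if rpr is not None else None
    if el is None:
        return False
    return el.get(f"{{{W}}}val", "true") not in ("0", "false", "off")


def runs_with_mismatched_bold(docx):
    """Return (text, bold, complex_script_bold) for runs where the two differ."""
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    mismatched = []
    for run in root.iter(f"{{{W}}}r"):
        rpr = run.find("w:rPr", NS)
        bold, bold_cs = is_on(rpr, "b"), is_on(rpr, "bCs")
        if bold != bold_cs:
            text = "".join(t.text or "" for t in run.findall("w:t", NS))
            mismatched.append((text, bold, bold_cs))
    return mismatched
```

If the Thai runs that refuse to go bold show up in this list, that would at least narrow the problem down to complex-script formatting; if the list is empty, the cause lies elsewhere.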


 
esperantisto
esperantisto  Identity Verified
Local time: 21:44
Member (2006)
English to Russian
+ ...
SITE LOCALIZER
File sample Jun 15, 2016

It would be better to share a file (a sample page where the problem appears).

 