Word count for HTML files
Thread poster: o-callaghan
o-callaghan
o-callaghan  Identity Verified
Germany
Local time: 14:26
German to English
Jun 1, 2011

Hi,
I am using the Beta version of OmegaT to translate HTML files on a Mac. When I use the word count feature in OmegaT it gives me a completely different result to OpenOffice (more than a 1,000-word difference). I also tried an online tool http://www.wordcounttool.com/ just to test the difference and this gave me another result again. When I tested this using a short test document, the online tool was the most accurate but since I am billing a customer I need to be sure that the method is accurate.

What is the best way of counting the words? I wouldn't expect OmegaT to include words inside < > in the word count, but it seems like it does. Is there a way of removing the tags so that I can use the OmegaT word count function?

Thanks for your help,
Amy


Didier Briel
Didier Briel  Identity Verified
France
Local time: 14:26
English to French
+ ...
Check the options in the HTML filter Jun 1, 2011

gocacp wrote:
I am using the Beta version of OmegaT to translate HTML files on a Mac. When I use the word count feature in OmegaT it gives me a completely different result to OpenOffice (more than a 1,000-word difference).

Such a difference is not usual.

Check the options in the HTML filter (or the XHTML filter, depending on your source files), and uncheck things you are not translating.


I also tried an online tool http://www.wordcounttool.com/ just to test the difference and this gave me another result again. When I tested this using a short test document, the online tool was the most accurate but since I am billing a customer I need to be sure that the method is accurate.

There is no such thing as an accurate word count. There are different methods, all giving different results. The important thing is to understand what is being counted.

I wouldn't expect OmegaT to include words inside < > in the word count, but it seems like it does.

It doesn't.


Is there a way of removing the tags so that I can use the OmegaT word count function?

OmegaT doesn't count the tags.
What may happen is that you are declaring things as translatable (e.g., images) while they are not to be translated for this project.
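
To illustrate how the choice of what to count changes the total, here is a minimal Python sketch. It is not how OmegaT counts words; the `TextCollector` class, the `count_words` function and the sample HTML are hypothetical and only show three possible methods: splitting the raw file on whitespace, counting visible text only, and counting visible text plus the `alt` and `title` attributes.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects visible text and, optionally, selected attribute values."""

    def __init__(self, attributes=()):
        super().__init__()
        self.attributes = set(attributes)  # e.g. {"alt", "title"}
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        for name, value in attrs:
            if name in self.attributes and value:
                self.chunks.append(value)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def count_words(html, attributes=()):
    """Count whitespace-separated tokens that contain a letter or digit."""
    parser = TextCollector(attributes)
    parser.feed(html)
    parser.close()
    tokens = " ".join(parser.chunks).split()
    return sum(1 for token in tokens if any(ch.isalnum() for ch in token))


sample = (
    '<p>Read the <a href="guide.html" title="User guide">manual</a>.</p>'
    '<img src="logo.png" alt="Company logo">'
)

raw_count = len(sample.split())                          # tags counted as text
text_only = count_words(sample)                          # visible text only
with_attributes = count_words(sample, ("alt", "title"))  # text plus attributes

print(raw_count, text_only, with_attributes)  # prints: 9 3 7
```

The same short file gives 9, 3 or 7 words depending on the method, which is why it matters which attributes are declared translatable.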

Didier


 
Manticore (X)
Manticore (X)  Identity Verified

Local time: 15:26
English to German
+ ...
@Didier Jun 2, 2011

It might interest you - I have just started translating a large *.docx text. OmegaT is better than anything else on the market, irrespective of price.

 
Didier Briel
Didier Briel  Identity Verified
France
Local time: 14:26
English to French
+ ...
Thank you for the feedback Jun 3, 2011

Roland Fischer wrote:
It might interest you - I have just started translating a large *.docx text. OmegaT is better than anything else on the market, irrespective of price.

Thank you for the feedback.

OmegaT relies on its user community.

There are plenty of ways of getting involved, from a simple "yes" on Sourceforge, to more active roles.

Didier


 
Malcolm Rowe
Malcolm Rowe
United Kingdom
Local time: 13:26
French to English
+ ...
large variance between OmegaT, SmartCAT and pasting text into Word, when counting words in Excel Mar 5, 2018

I just tried assessing 9 xlsx files in OmegaT and got a total word count of 11,063. I ran the same files through SmartCAT and got a total word count of 18,336.

Copying out the text from the largest file into a Word file, not including segments that were just numbers, I got a count of 12,053 from Microsoft Word's built-in word count. Including the numbers, this came to 19,147.
SmartCAT counted the same document at 13,187 words and counted 2,372 segments that contained just numbers or symbols, which, I think, were not included in the word count. OmegaT counted this file at 8,029 words.

This variance seems enormous. I can understand if it's not counting number/symbol-only segments, which, I think, accounts for much of the discrepancy between SmartCAT and Word but, even allowing for that, OmegaT's count comes out at about two thirds of MS Word's count. There is a lot of repetition but this should, surely, just be shown in the statistics and not affect the total words.

Do I have something majorly wrong in my OmegaT settings or have I somehow misunderstood how OmegaT presents word counts?

Can anyone explain how I might have got such different word counts and what I can do to restore my faith in the statistics generated by these CAT tools? I am using OmegaT 3.6.0 update 8.

Thanks.

Malc


 
Didier Briel
Didier Briel  Identity Verified
France
Local time: 14:26
English to French
+ ...
OmegaT does not count repetitions in XLSX files Mar 5, 2018

ma1cius wrote:
I just tried assessing 9 xlsx files in OmegaT and got a total word count of 11,063. I ran the same files through SmartCAT and got a total word count of 18,336.

OmegaT does not count repetitions in XLSX files, simply because they are not in the file (Microsoft removes them). To get a word count including repetitions, save the XLSX file under another format (e.g., XML Spreadsheet 2003).
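
A rough Python sketch of why repeated text is missing, assuming the usual XLSX layout in which each distinct cell string is stored once in `xl/sharedStrings.xml` and cells refer to it by index. The function `xlsx_word_counts` and the tiny in-memory archive are hypothetical and for illustration only (the archive holds just the two parts the sketch reads, so Excel would not open it); this is not OmegaT's code, and inline strings and phonetic runs are ignored.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = "{" + MAIN + "}"


def xlsx_word_counts(file):
    """Return (words_in_unique_strings, words_counting_every_cell)."""
    with zipfile.ZipFile(file) as archive:
        root = ET.fromstring(archive.read("xl/sharedStrings.xml"))
        # One <si> element per distinct string; join its <t> text runs.
        strings = ["".join(t.text or "" for t in si.iter(NS + "t"))
                   for si in root.iter(NS + "si")]
        unique_words = sum(len(s.split()) for s in strings)

        repeated_words = 0
        for name in archive.namelist():
            if not (name.startswith("xl/worksheets/") and name.endswith(".xml")):
                continue
            sheet = ET.fromstring(archive.read(name))
            for cell in sheet.iter(NS + "c"):
                if cell.get("t") == "s":  # cell points at a shared string
                    index = int(cell.find(NS + "v").text)
                    repeated_words += len(strings[index].split())
    return unique_words, repeated_words


# Hypothetical demo data: "Out of stock" is used in two cells but stored once.
SHARED = ('<sst xmlns="' + MAIN + '">'
          '<si><t>Unit price</t></si><si><t>Out of stock</t></si></sst>')
SHEET = ('<worksheet xmlns="' + MAIN + '"><sheetData>'
         '<row><c t="s"><v>0</v></c><c t="s"><v>1</v></c></row>'
         '<row><c t="s"><v>1</v></c><c><v>42</v></c></row>'
         '</sheetData></worksheet>')

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("xl/sharedStrings.xml", SHARED)
    archive.writestr("xl/worksheets/sheet1.xml", SHEET)
buffer.seek(0)

unique_words, repeated_words = xlsx_word_counts(buffer)
print(unique_words, repeated_words)  # prints: 5 8
```

A tool that reads only the string table sees 5 words, while a tool that walks every cell sees 8.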

Copying out the text from the largest file into a Word file, not including segments that were just numbers, I got a count of 12,053 from Microsoft Word's built-in word count. Including the numbers, this came to 19,147.
SmartCAT counted the same document at 13,187 words and counted 2,372 segments that contained just numbers or symbols, which, I think, were not included in the word count. OmegaT counted this file at 8,029 words.

It's not usual. Generally, OmegaT is rather close to Word.

This variance seems enormous. I can understand if it's not counting number/symbol-only segments,

Indeed, OmegaT does not count numbers.
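
As a small illustration of how much number-only content can shift a total, the Python sketch below counts the same cells with and without tokens that contain no letters. The rule used here (a token counts only if it has at least one letter) is an assumption for the example, not OmegaT's actual tokenizer, and the cell values are made up.

```python
def count_words(text, include_numbers=True):
    """Count whitespace-separated tokens, optionally skipping non-words."""
    tokens = text.split()
    if include_numbers:
        return len(tokens)
    # Keep only tokens that contain at least one letter.
    return sum(1 for token in tokens if any(ch.isalpha() for ch in token))


cells = ["Total revenue", "1,250.00", "Q3 2017", "12 %", "Net profit"]

with_numbers = sum(count_words(cell) for cell in cells)
without_numbers = sum(count_words(cell, include_numbers=False) for cell in cells)

print(with_numbers, without_numbers)  # prints: 9 5
```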

which, I think, accounts for much of the discrepancy between SmartCAT and Word but, even allowing for that, OmegaT's count comes out at about two thirds of MS Word's count. There is a lot of repetition but this should, surely, just be shown in the statistics and not affect the total words.

As I wrote above, this is not usual for Word documents. Have you checked what is loaded or not in OmegaT for the Word filter? Options > File Filters > Microsoft Open XML.

Do I have something majorly wrong in my OmegaT settings or have I somehow misunderstood how OmegaT presents word counts?

Another setting that might affect word count is Options > Tag processing (whether you include custom tags or not in statistics).

Can anyone explain how I might have got such different word counts and what I can do to restore my faith in the statistics generated by these CAT tools? I am using OmegaT 3.6.0 update 8.

For XLSX files, the explanation is obvious. For Word, it's hard to say without details.

Didier


 
Daniel Frisano
Daniel Frisano  Identity Verified
Italy
Local time: 14:26
Member (2008)
English to Italian
+ ...
Word Mar 5, 2018

Right-click, select "Open with...", select MS Word, use that word count.

 

