How to find a list of most repeated words/phrases in a document
Thread poster: Zeki Güler
Zeki Güler
Zeki Güler  Identity Verified
Local time: 00:52
Member (2012)
English to Turkish
+ ...
May 22, 2015

Hi everybody,

While translating in my CAT tool (memoQ), I add frequent terms to my term base so that I don't have to translate them again and again later in the document. Is there a way to list the most frequently used terms in the text?

That would let me add them to my TM at the very beginning of the project, rather than after I have already translated some of them several times, which would save time and effort.

Best,


 
Selcuk Akyuz
Selcuk Akyuz  Identity Verified
Türkiye
Local time: 02:52
English to Turkish
+ ...
try CafeTran May 23, 2015

Hi Zeki,

This feature and many other useful ones are available at CafeTran. See http://cafetran.wikidot.com/extracting-frequent-words


 
Zeki Güler
Zeki Güler  Identity Verified
Local time: 00:52
Member (2012)
English to Turkish
+ ...
TOPIC STARTER
Thanks May 23, 2015

Hi Mr. Selçuk,

That's exactly what I was looking for. Many thanks.

Kind Regards,


 
Meta Arkadia
Meta Arkadia
Local time: 06:52
English to Indonesian
+ ...
The free version of CafeTran... May 23, 2015

... will do. Download the free version for your operating system (OS X or Linux; it's even available for Windows). It's only about 10 MB. Run CafeTran, and when its window appears, drop your document on the Dashboard in that window. That's it. I don't think you need to set languages or change any other settings. Once your document has loaded (and this differs from what the Wiki entry Selçuk mentioned says), go to Menu | Task | Frequent Words and make your choice.

If Selçuk hadn't mentioned the link to the Wiki, I wouldn't even have replied to this topic, but it solves a problem I have at the moment. I was going to use AntConc to extract the words from a rather large file, only because I can make AntConc "deduct" stopwords from the resulting list, which reduces the list of frequent words considerably. But AntConc is incredibly slow. Then I read the Wiki page he referred to. Large files are no problem for CafeTran, it's fast, and I can add a stopwords list to deduct those words from the frequent words. Better still, I can use that stopwords list together with my termbases to eliminate all the stopwords and all the words already in my termbase, leaving only the terms that still need to be translated. Incredible!
However, the free version of CafeTran is limited to a certain number of TM and termbase entries - 1,000? - so if you want to use larger termbases, you'll have to buy CafeTran (€ 80/year), or... repeat the process a number of times.
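For anyone who would rather script this, the same idea (count the words, then drop stopwords and terms you already have) can be sketched in a few lines of Python. This is only a rough illustration, not how CafeTran or AntConc do it internally: the function name `frequent_words`, its parameters, and the simple `\w+` word splitting are all my own assumptions.

```python
import re
from collections import Counter


def frequent_words(text, stopwords=(), known_terms=(), top_n=20):
    """Return the top_n most frequent words as (word, count) pairs.

    Hypothetical helper: stopwords and words already in a termbase
    (known_terms) are excluded before counting.
    """
    # Compare everything in lowercase so "The" and "the" count as one word.
    excluded = {w.lower() for w in stopwords} | {w.lower() for w in known_terms}
    # \w+ matches runs of letters, digits and underscores (Unicode-aware).
    words = re.findall(r"\w+", text.lower())
    counts = Counter(w for w in words if w not in excluded)
    return counts.most_common(top_n)


sample = "The pump feeds the valve. The valve controls the pump pressure."
print(frequent_words(sample, stopwords=["the"], known_terms=["pressure"], top_n=2))
```

This only handles single words. Multi-word phrases and language-specific casing rules would need more work.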

Cheers,

Hans


 
Meta Arkadia
Meta Arkadia
Local time: 06:52
English to Indonesian
+ ...
Thinking aloud May 23, 2015

This is what you'll get:



It's pretty obvious that those stopwords ruin the results. You'll have to deduct them from the frequent words list. You can find a list of stopwords for various languages here. However, I don't think you can use them as they are, because they contain no target-language entries. So you'll have to add something as the second field of a tab-delimited file. I don't think it matters what you add: "I cheat" for all entries will do. The files from the link I mentioned seem to contain fewer than 1,000 words each, so that's OK for CafeTran. There are several files per language, but since CafeTran can handle an unlimited number of resource files, you should be able to deduct all stopwords in one go.
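Adding that dummy second field by hand would be tedious, so here is a small Python sketch of the conversion. It assumes the stopword list has one word per line and that a plain "source, tab, target" line is all that is needed. The function name `to_tab_delimited` is made up for illustration, and I have not checked the exact format CafeTran expects.

```python
def to_tab_delimited(stopwords, placeholder="I cheat"):
    """Turn a list of stopwords into tab-delimited 'word<TAB>placeholder' lines.

    Hypothetical helper: blank entries and repeated words are skipped.
    """
    seen = set()
    lines = []
    for word in stopwords:
        word = word.strip()
        if not word or word in seen:
            continue
        seen.add(word)
        lines.append(f"{word}\t{placeholder}\n")
    return "".join(lines)


# Example with an in-memory list; in practice the words would be read
# from the downloaded stopword file and the result written to a new file.
print(to_tab_delimited(["the", "and", "", "the", "of"]), end="")
```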

As per the subject line, I'm thinking aloud. This is completely new to me. Where am I going wrong?

[Edit] You probably can't deduct them in one go, because the limit for termbases applies to all termbases combined. Unfortunately, I forgot what that limit is, and I can't seem to find it. [/Edit]

Cheers,

Hans

[Edited at 2015-05-23 03:58 GMT]


 
Emin Arı
Emin Arı  Identity Verified
Türkiye
Local time: 02:52
English to Turkish
+ ...
Extract term? May 23, 2015

Though I haven't used it much, there is an "extract terms" function in memoQ. Doesn't that help?

 
Dominique Pivard
Dominique Pivard  Identity Verified
Local time: 02:52
Finnish to French
Term extraction in memoQ May 23, 2015

Zeki G. wrote:
While translating on my CAT tool (MemoQ), I add some frequent terms to my Term Base to avoid translating them several times later throughout the document. Is there a way to list "the most frequently used terms throughout the text"?

Did you have a look at this:

http://kilgray.com/memoq/2015/help-en/index.html?term_extraction.html


 
M Pradeep Kumar
M Pradeep Kumar  Identity Verified
India
Local time: 05:22
English to Telugu
+ ...
Try this link May 23, 2015

Hi,

You can try this link.

http://www.textfixer.com/tools/online-word-counter.php

It also has an option to remove common words.


 
DZiW (X)
DZiW (X)
Ukraine
English to Russian
+ ...
PlusTools May 23, 2015

If it's about txt, doc, rtf and some other formats, the MS Word add-on PlusTools served me well in MS Word XP and 2003 (I'm not sure about newer versions). It analyzes weighted words and common phrases/synonyms, and its settings are quite flexible.

[Edited at 2015-05-23 15:14 GMT]


 

