Fuzzy matches not working for Russian
Thread poster: Tatiana Ozerov (X)
Tatiana Ozerov (X)  Identity Verified
USA
Local time: 17:39
Swedish > English
Jan 23, 2013

Hi colleagues,

I am a Wordfast Classic 6.0 user who normally translates from Swedish to English, but I occasionally translate from Russian. However, the TM does not work correctly when Russian is the source language. Wordfast reports every sentence as a fuzzy match, even when the terms in the source sentence are entirely new. I just translated a sentence of about 10-12 words that was supposedly a 94% match, but Wordfast highlighted only a few of its words. The highlighted words were new to the TM, so it could not have been that high a match, if it was a match at all.

I tried changing my fuzzy settings from 70% to 80%, but that didn't help.

Has anyone had this issue, and does anyone know of a solution?

Many thanks!


 
Dominique Pivard  Identity Verified
Local time: 00:39
Finnish > French
Is the TM Unicode? Jan 23, 2013

I'm not aware of any particular problems with Russian as source. Here is a sample text with four segments, two brand-new (no matches) and two modified from the first two (fuzzy matches):

http://www.screencast.com/t/oyVoryhNxBl

As you can see, no matches were served for the first two (I translated them with MT, since I don't speak Russian) and fuzzy matches were served for the last two segments, with differences highlighted in yellow.

Make sure your TM is Unicode (this should be the default in version 6), otherwise Russian text will be stored as question marks or other nonsense. What do you see if you open your TM in Wordfast's data editor? Here is what it should look like:



 
esperantisto  Identity Verified
Local time: 00:39
ProZ.com member since 2006
English > Russian
SITE LOCALIZER
Not really that necessary… Jan 23, 2013

Dominique Pivard wrote:

Make sure your TM is Unicode (this should be the default in version 6), otherwise Russian text will be stored as question marks or other nonsense.


Well, if your system locale is Russian, non-Unicode TMs are fine for ENG-RUS (with other pairs, such as DEU-RUS, it’s different). However, this piece of advice is generally correct: use Unicode to be on the safe side.
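
To illustrate the locale point, here is a minimal Python sketch. It assumes cp1251 stands in for a Russian system code page and cp1252 for a Western one; the thread does not name specific code pages, and nothing here is Wordfast code.

```python
# Sketch: why a non-Unicode (single-byte) TM depends on the system code page.
# Assumption: cp1251 = "Russian locale", cp1252 = "Western locale".

russian = "Привет, мир"
german = "Größe"

# English + Russian: every character exists in cp1251, so the round trip is lossless.
raw = ("Hello, world\t" + russian).encode("cp1251")
assert raw.decode("cp1251") == "Hello, world\t" + russian

# The same bytes read under a Western code page come out as accented Latin letters.
mojibake = raw.decode("cp1252", errors="replace")

# German + Russian: 'ö' and 'ß' have no slot in cp1251, so they cannot be stored.
try:
    german.encode("cp1251")
    german_fits = True
except UnicodeEncodeError:
    german_fits = False

# UTF-16 holds both scripts in one file.
both = (german + "\t" + russian).encode("utf-16")
roundtrip_ok = both.decode("utf-16") == german + "\t" + russian

print(mojibake)
print("German fits in cp1251:", german_fits)
print("UTF-16 round trip OK:", roundtrip_ok)
```

The English side is plain ASCII, which is why ENG-RUS survives in a Cyrillic code page while a pair like DEU-RUS needs Unicode.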

The initial post misses at least one important thing: what is the origin of the TM? Was it created from scratch or imported? I have never had any problem with new TMs, but importing TMs from other programs may be problematic.


 
Tatiana Ozerov (X)  Identity Verified
USA
Local time: 17:39
Swedish > English
TOPIC STARTER
TM is in Unicode Jan 24, 2013

Thanks for the helpful response, Dominique! I opened the TM in TextEdit and confirmed that the encoding is Unicode (UTF-16), and re-saved it without changing anything. Now it looks like the TM is working; that is, it's not providing me with false fuzzy matches anymore.

However, in both the TM file and the Data Editor, the source text shows up as jumbled symbols in every segment except my most recent ones.

Maybe re-saving the TM did the trick?
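
One way to find out where the jumbled segments start is to scan the TM for lines that contain no Cyrillic letters at all. The following is a rough Python sketch, not part of Wordfast. It assumes the TM is a UTF-16 text file with one translation unit per line and a header line starting with "%"; the function name and the sample rows are made up for illustration.

```python
import re
import tempfile
from pathlib import Path

# Basic Cyrillic block; enough to recognise Russian source text.
CYRILLIC = re.compile(r"[\u0400-\u04FF]")


def lines_without_cyrillic(tm_path, encoding="utf-16"):
    """Return (line_number, text) for non-header lines with no Cyrillic letter."""
    text = Path(tm_path).read_text(encoding=encoding)
    suspects = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Assumption: blank lines and '%'-prefixed header lines are not TUs.
        if not line.strip() or line.startswith("%"):
            continue
        if not CYRILLIC.search(line):
            suspects.append((number, line))
    return suspects


if __name__ == "__main__":
    # Hypothetical sample rows; a real TM has more fields per line.
    sample = (
        "%header line (hypothetical)\n"
        "RU-RU\tЭто новое предложение.\tEN-US\tThis is a new sentence.\n"
        "RU-RU\t??? ????? ???????????.\tEN-US\tThis is another sentence.\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample_tm.txt"
        path.write_text(sample, encoding="utf-16")
        for number, line in lines_without_cyrillic(path):
            print(number, line)
```

For a Russian source TM, any line this flags is a candidate for a segment whose source text was mangled when it was stored.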





 
Tatiana Ozerov (X)  Identity Verified
USA
Local time: 17:39
Swedish > English
TOPIC STARTER
It's a new TM Jan 24, 2013

Thanks for the feedback, Esperantisto. It is a brand new TM that I created specifically for this document. I try not to work with imported TMs if I can help it!




 
Tatiana Ozerov (X)  Identity Verified
USA
Local time: 17:39
Swedish > English
TOPIC STARTER
Spoke too soon Jan 24, 2013

Looks like I spoke too soon. After translating a couple of segments, I'm getting false fuzzy matches that claim to be 90% and above when the source text is entirely new! I tried re-saving the TM again as Unicode, but that didn't help.
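
A possible mechanism, not confirmed anywhere in this thread: if the Russian source text is stored as question marks, unrelated segments start to look alike. The Python sketch below uses difflib's generic character ratio as a stand-in, which is not Wordfast's scoring, and the two sentences and the helper names are invented for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Generic character-level ratio in [0, 1]; NOT Wordfast's fuzzy score."""
    return SequenceMatcher(None, a, b).ratio()


def mangle(text):
    """Simulate storage in an encoding without Cyrillic: each letter becomes '?'."""
    return text.encode("ascii", errors="replace").decode("ascii")


# Two unrelated Russian sentences (hypothetical examples).
old = "Договор вступает в силу с момента подписания."
new = "Поставщик обязан доставить товар вовремя."

clean_score = similarity(old, new)
mangled_score = similarity(mangle(old), mangle(new))

print(f"clean text:   {clean_score:.2f}")
print(f"mangled text: {mangled_score:.2f}")
```

With the letters reduced to "?", only the word lengths and punctuation distinguish one segment from another, so the mangled pair scores noticeably higher than the clean pair.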





 
Dominique Pivard  Identity Verified
Local time: 00:39
Finnish > French
Mac Jan 24, 2013

Tatiana Ozerov wrote:
I opened the TM in TextEdit and confirmed that the encoding is Unicode (UTF-16), and re-saved it without changing anything. Now it looks like the TM is working; that is, it's not providing me with false fuzzy matches anymore.

However, I do see that in both the TM file and when I open the Data Editor, the source text comes out as jumbled symbols until my most recent segments.

OK, I see "TextEdit", so I assume you are using a Mac, which is an important clue. Which version of Word do you use (2011, 2004, older?), and where do you store your TMs? What is the path/name of the folder where your TM is located, and what is the name of the TM? You can press Ctrl+Alt+P in Word (or select the 3rd 'More...' option from the Wordfast menu and then 'SetupReport') and paste the relevant information from there.

If a TM is created by Wordfast (as opposed to converted from a TMX), you should have no encoding problems. Any recent (= less than 3-4 years old) version of Wordfast will create it as Unicode (UTF-16). Encoding problems do occur when TMs "travel" back and forth between Mac and Windows. Hence the above questions, just in case you are using Wordfast in a Windows virtual machine but editing the TM on the Mac side, for instance.

If everything happens on the Mac side, with the latest version of Wordfast (6.03t) and adhering to the naming recommendations for the TM (there are stricter requirements when using Word 2011), there shouldn't be any problems with Russian. I'll perform the same test in Word 2011 on my Mac as I did yesterday in Word 2010 on my PC.
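
When a TM has moved between Mac and Windows, one quick check is to look at the first bytes of the file for a byte order mark (BOM). This is a small Python sketch, not something Wordfast provides. The function names are made up, UTF-32 is deliberately ignored, and a missing BOM does not prove the file is non-Unicode.

```python
import codecs

# Byte order marks to look for, with the matching Python codec names.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def sniff_file(path):
    """Read only the first few bytes of a file and sniff its BOM."""
    with open(path, "rb") as handle:
        return sniff_bom(handle.read(4))


if __name__ == "__main__":
    little = codecs.BOM_UTF16_LE + "Привет".encode("utf-16-le")
    big = codecs.BOM_UTF16_BE + "Привет".encode("utf-16-be")
    legacy = "Привет".encode("cp1251")  # single-byte code page, no BOM
    print(sniff_bom(little), sniff_bom(big), sniff_bom(legacy))
```

If the result changes after the file has been opened and re-saved in another editor, that editor has rewritten the encoding or the byte order.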


 
Tatiana Ozerov (X)  Identity Verified
USA
Local time: 17:39
Swedish > English
TOPIC STARTER
Mac, Setup Report Jan 24, 2013

Many thanks for your continued help, Dominique. I use Word 2011 from my Mac, not a Windows virtual machine. As a side note, I ended up plugging the document into Wordfast Anywhere and creating a new TM and it is working fine there. Do you think I need to upgrade to the next version of Wordfast Classic? I was mistaken - my version is 6.01g.

Here is the relevant information from the Setup Report:

Current setup: Macintosh HD:Users:Tatiana:Dropbox:Translation:Wordfast.ini
Wordfast installation: Macintosh HD:Users:Tatiana:Dropbox:Translation:wordfast.dot

System specs: Intel on Mac 10.6.8. Country code: 1
Ms-Word version: 14.2.2
Language pair: RU-RU > EN-US
Wordfast version: 6.01g

TM: Macintosh HD:Users:Tatiana:Dropbox:Translation:TM:RU-EN.txt (18 Kb, Unicode)

When editing (changing) a 100% match= 0 (Overwrite TU)
When re-using an existing TU, update it if...= 0 (No)

BTM is off
VLTM is off
Word MT is off
Web MT #1 is off
Web MT #2 is off

Fuzzy threshold 80 ESPs: : . ! ? ^t






 

