Problems with WF TMX files
Thread starter: John Fossey
John Fossey
Canada
ProZ.com member since 2008

French > English
Feb 19, 2013

A posting has been made on the Wordfast Yahoogroups forum about an agency that doesn't want translators to use WF because of "technical problems" with the TMX created by WF.

I have noticed major problems with importing TMX files created by WFC into Studio 2011. Studio will typically import a few hundred TUs and fail with an error about an unexpected token or invalid character. Of course, Studio fails the import completely, while other tools will skip the bad segment and continue. So I will often have to import the TMX into another tool that is more forgiving, such as Workbench or Olifant, and export it again, in order to successfully import it into Studio.

So from my experience there does appear to be a problem with the TMX files created by WFC. Has anyone else had this experience?
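To see where a file breaks before handing it to Studio, here is a minimal Python sketch. It assumes the problem is an XML well-formedness error (an "unexpected token" or "invalid character"); the function name `first_xml_error` and the sample bytes are made up for illustration and are not taken from a real WFC export.

```python
import xml.etree.ElementTree as ET


def first_xml_error(data: bytes):
    """Return (line, column, message) for the first well-formedness error, or None."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        line, column = err.position  # line is 1-based, column is 0-based
        return line, column, str(err)
    return None


# Hypothetical sample: a raw control character (0x0B) inside a segment on line 3.
sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<tmx version="1.4"><header/><body>\n'
    b'<tu><tuv xml:lang="fr"><seg>bon\x0bjour</seg></tuv></tu>\n'
    b'</body></tmx>'
)

print(first_xml_error(sample))
```

The reported line number points at the offending TU, which can then be inspected or fixed in a text editor.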


 
esperantisto
ProZ.com member since 2006

English > Russian
SITE LOCALIZER
I'd say... Feb 19, 2013

... It's barking up the wrong tree. Using WF TMX files with OmegaT is not problematic, thus, the problems are probably about Trados.

 
John Fossey
Canada
ProZ.com member since 2008

French > English
TOPIC STARTER
Good to have feedback Feb 19, 2013

esperantisto wrote:

... It's barking up the wrong tree. Using WF TMX files with OmegaT is not problematic, thus, the problems are probably about Trados.


Thanks for the feedback. In that case, the agency's complaint could well be down to a problem with their software too.


 
FarkasAndras
English > Hungarian
How do you know? Feb 19, 2013

esperantisto wrote:

... It's barking up the wrong tree. Using WF TMX files with OmegaT is not problematic, thus, the problems are probably about Trados.

The fact that OmT accepts the files doesn't necessarily mean that they are good files (i.e. that they meet the TMX spec). It may just be that OmT is very permissive.
Studio is unfortunately very picky when it comes to accepting TMX files, rejecting many files that other tools, including earlier Trados versions, accept. Some of those files are good, some of them not so much (i.e. they are bad, they just aren't malformed enough to fail completely on other tools).

It'd be interesting to know what category the WF files fall into and what the exact problem is.

All that said, SDL should get its act together and write a TMX import filter that can skip malformed segments and move on with the import while issuing meaningful error messages - much like they would do well to write a doc/docx export filter that can tolerate certain flaws in the sdlxliff. Even partial, mangled but completed operations would be better than the current practice of leaving the user high and dry - better by a long shot.
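To make the "skip and report" idea concrete, here is a rough Python sketch. It is not how Studio or any other CAT tool works; it assumes the TUs can be cut out of the file with a simple regular expression, which fails if a broken segment contains a stray `</tu>`. The name `salvage_tus` and the sample data are invented for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Matches one <tu ...>...</tu> block; "\b" keeps it from matching <tuv>.
TU_PATTERN = re.compile(r"<tu\b.*?</tu>", re.DOTALL)


def salvage_tus(tmx_text: str):
    """Parse each <tu> on its own; return (good_elements, error_reports)."""
    good, errors = [], []
    for index, match in enumerate(TU_PATTERN.finditer(tmx_text), start=1):
        try:
            good.append(ET.fromstring(match.group(0)))
        except ET.ParseError as err:
            # Report the bad unit and carry on instead of aborting the import.
            errors.append(f"TU {index} skipped: {err}")
    return good, errors


# Hypothetical sample: the second TU contains an unescaped ampersand.
sample = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en"><seg>Hello</seg></tuv><tuv xml:lang="fr"><seg>Bonjour</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Fish & chips</seg></tuv><tuv xml:lang="fr"><seg>Poisson-frites</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Bye</seg></tuv><tuv xml:lang="fr"><seg>Au revoir</seg></tuv></tu>
</body></tmx>"""

good, errors = salvage_tus(sample)
print(len(good), "TUs kept")
for report in errors:
    print(report)
```

The two well-formed TUs survive and the bad one is named in the report, which is the behaviour the more forgiving tools already show.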

[Edited at 2013-02-19 18:35 GMT]


 
Milan Condak
English > Czech
WF2TMX Feb 19, 2013

John Fossey wrote:

So from my experience there does appear to be a problem with the TMX files created by WFC.


I have to edit it. The text in "<" and ">" is not visible.

1. Standard TMX is in UTF-8 and this is declared:

"<?xml version="1.0" encoding="utf-8" ?>"
"<!DOCTYPE tmx SYSTEM "tmx14.dtd">"
"<tmx version="1.4">"

(from LF-Aligner)

2. TMX from WFC 6.03 looks so:

"<?xml version="1.0" ?>"
"<tmx version="1.4">"

Some CATs can recognize that data are in Unicode, some not.

3. New tool WfConvertor converts data from more formats into TXT Wordfast TM in Unicode. Conversion from TM into TMX is in non-standard Unicode.

4. The solution is very simple, to use older tool WF2TMX and on radio-button select an encoding UTF-8. The conversion is very fast and clear, see declaration of TMX:

"<?xml version="1.0" encoding="UTF-8"?>"
"<tmx version="1.4">"
"<header
creationtool="Wf2Tmx.exe"
creationtoolversion="1.0.11.41"
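If WF2TMX is not at hand, the same re-encoding can be sketched in a few lines of Python. This is only an illustration under assumptions: it takes "Unicode" to mean UTF-16 with a byte order mark (the usual Windows meaning), falls back to UTF-8 when no mark is found, and the function name `to_declared_utf8` is made up.

```python
import codecs
import re


def to_declared_utf8(raw: bytes) -> bytes:
    """Decode by byte order mark (fallback: UTF-8) and return UTF-8 with an explicit declaration."""
    if raw.startswith(codecs.BOM_UTF8):
        text = raw[len(codecs.BOM_UTF8):].decode("utf-8")
    elif raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        text = raw.decode("utf-16")  # the utf-16 codec consumes the mark
    else:
        text = raw.decode("utf-8")
    # Drop any existing XML declaration, then prepend one that names the encoding.
    text = re.sub(r"^\s*<\?xml[^>]*\?>\s*", "", text, count=1)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n' + text).encode("utf-8")


# Hypothetical input: a UTF-16 file whose declaration names no encoding.
sample = '<?xml version="1.0" ?>\n<tmx version="1.4"></tmx>'.encode("utf-16")
print(to_declared_utf8(sample).decode("utf-8"))
```

For a real file, read it with `open(path, "rb")`, pass the bytes through the function, and write the result to a new file so the original stays untouched.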


 
FarkasAndras
English > Hungarian
Fixed Feb 19, 2013

You need to use character references for < and > to get them to show up.
Like so:


Milan Condak wrote:

John Fossey wrote:

So from my experience there does appear to be a problem with the TMX files created by WFC.


I have to edit it. The text in "<" and ">" is not visible.

1. Standard TMX is in UTF-8 and this is declared:

"<?xml version="1.0" encoding="utf-8" ?>"
"<!DOCTYPE tmx SYSTEM "tmx14.dtd">"
"<tmx version="1.4">"

(from LF-Aligner)

2. TMX from WFC 6.03 looks so:

"<?xml version="1.0" ?>"
"<tmx version="1.4">"

Some CATs can recognize that data are in Unicode, some not.

3. New tool WfConvertor converts data from more formats into TXT Wordfast TM in Unicode. Conversion from TM into TMX is in non-standard Unicode.

4. The solution is very simple, to use older tool WF2TMX and on radio-button select an encoding UTF-8. The conversion is very fast and clear, see declaration of TMX:

"<?xml version="1.0" encoding="UTF-8"?>"
"<tmx version="1.4">"
"<header"
creationtool="Wf2Tmx.exe"
creationtoolversion="1.0.11.41"
---
Milan Condak
Czech WF Trainer

[Edited at 2013-02-19 19:12 GMT]
[Edited at 2013-02-19 19:13 GMT]
[Edited at 2013-02-19 19:14 GMT]
[Edited at 2013-02-19 19:37 GMT]


 
esperantisto
ProZ.com member since 2006

English > Russian
SITE LOCALIZER
Sure, but… Feb 20, 2013

FarkasAndras wrote:

The fact that OmT accepts the files doesn't necessarily mean that they are good files (i.e. that they meet the TMX spec). It may just be that OmT is very permissive.


Good point, still…

Studio is unfortunately very picky


This only confirms that the problem is about Studio.

Anyway, a question to John Fossey: how did you produce the TMX file(s)? By export using the WF data editor? If yes, try converting the source WF translation memory using Olifant. Or try exporting it using Anaphraseus. Obviously, no guarantee…


 
esperantisto
ProZ.com member since 2006

English > Russian
SITE LOCALIZER
Lemme correct it Feb 20, 2013

Milan Condak wrote:

1. Standard TMX is in UTF-8 and this is declared:

Code:
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE tmx SYSTEM "tmx14.dtd">
<tmx version="1.4">



(from LF-Aligner)

2. TMX from WFC 6.03 looks so:

Code:
<?xml version="1.0" ?>
<tmx version="1.4">



Some CATs can recognize that data are in Unicode, some not.

3. New tool WfConvertor converts data from more formats into TXT Wordfast TM in Unicode. Conversion from TM into TMX is in non-standard Unicode.

4. The solution is very simple, to use older tool WF2TMX and on radio-button select an encoding UTF-8. The conversion is very fast and clear, see declaration of TMX:

Code:
<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
<header
creationtool="Wf2Tmx.exe"
creationtoolversion="1.0.11.41"



Hint: use & l t ; (without spaces) for the less-than sign and & g t ; (without spaces) for the greater-than sign. Note that post preview will ruin it.

P. S. This forum allows using code tags but handles them wrongly. I’m going to submit a support ticket.
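For longer snippets, the character references from the hint above do not have to be typed by hand. A small Python sketch using the standard `html.escape` function (the wrapper name `escape_for_forum` is made up):

```python
import html


def escape_for_forum(snippet: str) -> str:
    """Replace &, < and > with character references; leave quotes alone."""
    return html.escape(snippet, quote=False)


print(escape_for_forum('<tmx version="1.4">'))
# prints: &lt;tmx version="1.4"&gt;
```

Paste the printed text into the post and the tags show up as written.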


 
Milan Condak
English > Czech
Presentation Feb 20, 2013

esperantisto wrote:

Hint: use & l t ; (without spaces) for the less-than sign and & g t ; (without spaces) for the greater-than sign. Note that post preview will ruin it.



I created a presentation: WF2TMX, Unicode vs. UTF-8

http://condak.net/tmx/wfconverter/cs/00.html

I hope that it is clear: how to import TMX into WFC and TMX converted in WF2TMX.

Thank you, esperantisto, for the hint. I found it in an HTML editor, too.

Milan

[Edited at 2013-02-20 09:01 GMT]


 

