Advice for working with large or numerous TMX files?
Thread poster: Mercer
Nov 19, 2016

Hi, I have around 5 GB of TMX files (3000 aligned documents) that I would like to use with OmegaT, but I am finding that I cannot use them all at the same time, or else the system slows to a crawl or becomes unusable. I am guessing the obvious solution would be to use more RAM, but the computer is limited to 8 GB and also needs to run other programs at the same time… Are there any ways to optimize my TMX files so that I can use as many as possible with the resources I have?

The files were originally aligned with LF Aligner. I have run the files through TMX Cleaner and I think that helped, since it reduced the file size, but I am not certain how much of a difference it really makes.

My main questions would be:

1. Is there any approximate guideline as to how much RAM is needed for a given size of translation memory (e.g. 2 GB of RAM for 500 MB of TMX files)?

2. Is it generally better to have 100 x 1 MB TMX files or a single 100 MB TMX file? (And is there a certain file size that we shouldn’t exceed?)

3. Does gzipping TMX files help performance?

4. Does stripping TMX files of “useless” metadata (creation date, etc.) help?

Any help or advice is appreciated, thanks!


 
Didier Briel
France
English to French
Check memory first Nov 21, 2016

Mercer wrote:
Hi, I have around 5 GB of TMX files (3000 aligned documents) that I would like to use with OmegaT, but I am finding that I cannot use them all at the same time, or else the system slows to a crawl or becomes unusable. I am guessing the obvious solution would be to use more RAM, but the computer is limited to 8 GB and also needs to run other programs at the same time…

What memory is allocated to OmegaT?
Are you using a 64-bit Java?

Are there any ways to optimize my TMX files so that I can use as many as possible with the resources I have?

The files were originally aligned with LF Aligner. I have run the files through TMX Cleaner and I think that helped, since it reduced the file size, but I am not certain how much of a difference it really makes.

My main questions would be:

1. Is there any approximate guideline as to how much RAM is needed for a given size of translation memory (e.g. 2 GB of RAM for 500 MB of TMX files)?

I don't think so.


2. Is it generally better to have 100 x 1 MB TMX files or a single 100 MB TMX file? (And is there a certain file size that we shouldn’t exceed?)

I don't think it makes much difference.


3. Does gzipping TMX files help performance?

No, as it doesn't change the data size in memory.


4. Does stripping TMX files of “useless” metadata (creation date, etc.) help?

Yes.

Didier


 
MikeTrans
Germany
Italian to German
The trick I have used in the past... Nov 21, 2016

Hi Mercer,
currently I'm working with other CAT tools able to handle large TMXs, but in the past I used the following trick.

What you need:
Word or expression extraction tool. A free one is Extphr32
XBench v. 2.9

After extracting the whole word / expression list from your project to be translated, you do the following:

a) Arrange all extracted expressions in a single search string, separated by a delimiter (IIRC it's the semicolon ;). The search string will be huge, but I remember that XBench was able to read it and display all occurrences in the database.

b) Have XBench read all your TMX files and apply the search string from a).

c) Export and create a TMX from all the occurrences found by XBench.

The result is: you work only with TMX segments that your project needs.

Note: Trados Studio does this automatically on every project in the background, it is called "Project TM".
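
For anyone without XBench, the same "project TM" idea can be roughed out in a few lines of Python. This is only a sketch under assumptions: the function names (`filter_tmx`, `tu_source_text`) are made up for illustration, matching is a plain case-insensitive substring test rather than whatever XBench does, and the output header is a minimal hand-written one. The file is streamed with `iterparse`, so a large TMX does not have to fit in RAM.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def tu_source_text(tu, src_lang):
    """Return the <seg> text of the first <tuv> whose language starts with src_lang."""
    for tuv in tu.iter("tuv"):
        lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
        if lang.lower().startswith(src_lang.lower()):
            seg = tuv.find("seg")
            if seg is not None:
                return "".join(seg.itertext())
    return ""

def filter_tmx(in_path, out_path, terms, src_lang="en"):
    """Copy only the TUs whose source text contains one of the terms.

    Hypothetical helper: returns the number of TUs kept.
    """
    terms = [t.lower() for t in terms if t.strip()]
    kept = 0
    body = None
    with open(out_path, "w", encoding="utf-8") as out:
        out.write('<?xml version="1.0" encoding="UTF-8"?>\n<tmx version="1.4">\n')
        out.write('<header creationtool="filter_tmx" creationtoolversion="0" '
                  'segtype="sentence" o-tmf="unknown" adminlang="en" '
                  f'srclang="{src_lang}" datatype="plaintext"/>\n<body>\n')
        for event, elem in ET.iterparse(in_path, events=("start", "end")):
            if event == "start" and elem.tag == "body":
                body = elem
            elif event == "end" and elem.tag == "tu":
                text = tu_source_text(elem, src_lang).lower()
                if any(t in text for t in terms):
                    out.write(ET.tostring(elem, encoding="unicode"))
                    kept += 1
                if body is not None:
                    body.clear()  # drop TUs already processed to keep memory flat
        out.write("</body>\n</tmx>\n")
    return kept
```

You would call it once per TMX, e.g. `filter_tmx("big.tmx", "project.tmx", term_list, src_lang="en")`, where `term_list` is the output of your extraction tool. With a very long term list the substring test becomes the slow part, so this is a starting point rather than a finished tool.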

I hope this helps,
Mike


 
Michael Beijer
United Kingdom
Dutch to English
three tips Nov 21, 2016

1. If you happen to have memoQ installed, you can also connect all the TMXs to a project, and then create a TM with only the relevant TUs (via the Statistics > Create TM route).

2. CafeTran can also do something similar, using Total Recall: create a huge Total Recall database containing all your TMXs, create a project with the files to be translated, run Total Recall on the project, export the matches to TMX, and use that TMX in OmegaT.

3. The Olifant TMX editor is very good at quickly removing useless metadata (fields) (Edit > Attributes)


 
CafeTran Training (X)
Netherlands
Compacting TMX files Nov 21, 2016

OP asked:

4. Does stripping TMX files of “useless” metadata (creation date, etc.) help?


Didier Briel wrote:

Yes.


Some time ago I posted a generic AppleScript to compact TMX files:

http://www.proz.com/forum/apple_mac_operating_systems/307780-applescript_to_compact_tmx_files_with_textwrangler.html

You can either use this script (after some modification) or use it for inspiration for your own script (e.g. with AutoHotkey).
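
If AppleScript is not an option, here is a minimal Python sketch of the same compacting idea. The attribute names in `DROP_ATTRS` are standard TMX 1.4 attributes, but which ones count as "useless" is my assumption, as is dropping every `<prop>` and `<note>`; check that your tool does not rely on any of them before running it on real data. `strip_tu` is a hypothetical helper that works on one `<tu>` element at a time, so for multi-gigabyte files you would call it from a streaming loop (for example one built on `xml.etree.ElementTree.iterparse`) rather than loading the whole file.

```python
import xml.etree.ElementTree as ET

# Assumption: these attributes are not needed for matching in your CAT tool.
DROP_ATTRS = {"creationdate", "creationid", "changedate", "changeid",
              "lastusagedate", "usagecount", "creationtool",
              "creationtoolversion"}
# Assumption: <prop> and <note> children carry no information you need.
DROP_ELEMS = {"prop", "note"}

def strip_tu(tu):
    """Remove metadata attributes and <prop>/<note> children from one <tu>, in place."""
    for parent in list(tu.iter()):
        # Delete unwanted attributes on the <tu>, its <tuv>s and anything below.
        for name in DROP_ATTRS & set(parent.attrib):
            del parent.attrib[name]
        # Remove unwanted child elements (iterate over a copy while removing).
        for child in list(parent):
            if child.tag in DROP_ELEMS:
                parent.remove(child)
    return tu

tu = ET.fromstring(
    '<tu creationdate="20161119T000000Z" tuid="1">'
    '<prop type="file">doc1.txt</prop>'
    '<tuv xml:lang="en" changeid="me"><seg>Hello</seg></tuv>'
    '<tuv xml:lang="fr"><seg>Bonjour</seg></tuv>'
    '</tu>'
)
print(ET.tostring(strip_tu(tu), encoding="unicode"))
```

The segments and the `xml:lang` attributes are left untouched; only the bookkeeping around them goes away.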

[Edited at 2016-11-21 13:39 GMT]


 
MikeTrans
Germany
Italian to German
Olifant may also help... Nov 21, 2016

Just to complete what I said above, I think that Olifant (a free TM management tool) can also help, because it can read virtually any TM size as long as you have the necessary RAM. It should be possible to create a 'search string' similar to the one I described above, or better: use the string as a filter in Olifant. SQL and RegEx filters are supported.

Mike


 
FarkasAndras
English to Hungarian
External search or prune Nov 21, 2016

I think you either need to cull your TMs (one could use search terms or other methods to customize them for the project at hand, à la project TM in Trados), or you need to use an external search tool, such as TMLookup. 5 GB of TMX files probably amounts to 10 million TUs or more, which is more than normal CAT tools can handle on normal hardware.
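
To check that estimate on your own files (10 million TUs in 5 GB works out to roughly 500 bytes per TU on disk), a short streaming count in Python is enough. This is a sketch, and `count_tus` is just an illustrative name; it reads the file incrementally with `iterparse`, so it should not need much memory even for large TMXs.

```python
import xml.etree.ElementTree as ET

def count_tus(path):
    """Count <tu> elements in a TMX file without loading it all into memory."""
    count = 0
    body = None
    for event, elem in ET.iterparse(path, events=("start", "end")):
        if event == "start" and elem.tag == "body":
            body = elem
        elif event == "end" and elem.tag == "tu":
            count += 1
            if body is not None:
                body.clear()  # discard TUs already counted
    return count
```

Summing `count_tus(p)` over all the TMX files gives the total number of TUs you are actually asking the CAT tool to load.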

 

