Advice for working with large or numerous TMX files? Thread poster: Mercer
Hi, I have around 5 GB of TMX files (3,000 aligned documents) that I would like to use with OmegaT, but I am finding that I cannot use them all at the same time or else the system slows to a crawl or becomes unusable. I am guessing the obvious solution would be to use more RAM, but the computer is limited to 8 GB and also needs to run other programs at the same time… Are there any ways to optimize my TMX files so that I can use as many as possible with the resources I have? The files were originally aligned with LF Aligner. I have run the files through TMX Cleaner, and I think that helped since it reduced the file size, but I am not certain how much of a difference it really makes.

My main questions would be:

1. Is there any approximate guideline as to how much RAM is needed for certain sizes of translation memories (e.g. 2 GB of RAM for 500 MB of TMX files)?
2. Is it generally better to have 100 x 1 MB TMX files or a single 100 MB TMX file? (And is there a certain file size that we shouldn't exceed?)
3. Does gzipping TMX files help performance?
4. Does stripping TMX files of "useless" metadata (creation date, etc.) help?

Any help or advice is appreciated, thanks!

Didier Briel | France | English to French | Check memory first | Nov 21, 2016
Mercer wrote:
I have around 5 GB of TMX files (3,000 aligned documents) that I would like to use with OmegaT, but I am finding that I cannot use them all at the same time or else the system slows to a crawl or becomes unusable. I am guessing the obvious solution would be to use more RAM, but the computer is limited to 8 GB and also needs to run other programs at the same time…

What memory is allocated to OmegaT? Are you using a 64-bit Java?

Mercer wrote:
1. Is there any approximate guideline as to how much RAM is needed for certain sizes of translation memories (e.g. 2 GB of RAM for 500 MB of TMX files)?

I don't think so.

Mercer wrote:
2. Is it generally better to have 100 x 1 MB TMX files or a single 100 MB TMX file? (And is there a certain file size that we shouldn't exceed?)
I don't think it makes much difference.

Mercer wrote:
3. Does gzipping TMX files help performance?
No, as it doesn't change the data size in memory.

Mercer wrote:
4. Does stripping TMX files of "useless" metadata (creation date, etc.) help?
Yes.

Didier

MikeTrans | Germany | Italian to German | The trick I have used in the past... | Nov 21, 2016
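A note on Didier's memory questions: OmegaT runs on the Java VM, whose default heap is usually much smaller than the machine's RAM, so the allocation matters more than the total 8 GB. With a 64-bit Java the heap can be raised at launch, e.g. `java -Xmx6g -jar OmegaT.jar`. There is no documented ratio of heap to TMX size, but a rough sizing helper can be sketched; the 3x factor below is an assumption for illustration, not an OmegaT-documented figure, and `suggest_heap_mb` is a hypothetical name.

```python
import os

def suggest_heap_mb(tmx_dir, factor=3.0):
    """Sum the on-disk size of all .tmx files under tmx_dir and suggest
    a Java heap size in MB. The factor (in-memory size vs. raw file
    size) is a rough guess, not an OmegaT-documented figure."""
    total_bytes = 0
    for root, _dirs, files in os.walk(tmx_dir):
        for name in files:
            if name.lower().endswith(".tmx"):
                total_bytes += os.path.getsize(os.path.join(root, name))
    total_mb = total_bytes / (1024 * 1024)
    return total_mb, int(total_mb * factor)
```

If the suggested heap exceeds what the machine can spare, that is a sign the TMX set needs to be culled or filtered rather than loaded whole.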
Hi Mercer,

Currently I'm working with other CAT tools able to handle large TMXs, but in the past I used the following trick. What you need:

- a word/expression extraction tool (a free one is Extphr32)
- XBench v. 2.9

After extracting the whole word/expression list from the project to be translated, you do the following:

a) Arrange all extracted expressions in a single search string in which the expressions are separated by ; (IIRC the separator is the semicolon). The search string will be huge, but I remember that XBench was able to read it and display all occurrences in the database.
b) Make XBench read all your TMX files and apply the search string from a).
c) Export the occurrences found by XBench and create a TMX from them.

The result: you work only with the TMX segments that your project needs.

Note: Trados Studio does this automatically on every project in the background; it is called a "Project TM".

I hope this helps,
Mike

Michael Beijer | United Kingdom | Member (2009) | Dutch to English
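Mike's extract-and-filter procedure can be approximated without XBench. The sketch below keeps only the translation units whose text contains at least one project term, assuming a namespace-free TMX 1.4 file such as LF Aligner produces; `filter_tmx` is an illustrative name, not a real tool.

```python
import xml.etree.ElementTree as ET

def filter_tmx(in_path, out_path, terms):
    """Write a copy of a TMX file keeping only the translation units
    whose text contains at least one of `terms` (case-insensitive)."""
    terms = [t.lower() for t in terms]
    tree = ET.parse(in_path)
    body = tree.getroot().find("body")
    kept = 0
    for tu in list(body):                      # copy: we mutate body
        text = " ".join(tu.itertext()).lower() # all seg text in the TU
        if any(t in text for t in terms):
            kept += 1
        else:
            body.remove(tu)
    tree.write(out_path, encoding="utf-8", xml_declaration=True)
    return kept
```

Like the Project TM trick, the output contains only segments the current project can actually match against, so far less RAM is needed.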
MikeTrans wrote:
(... the XBench extract-and-filter trick described above ...)

1. If you happen to have memoQ installed, you can also attach all the TMXs to a project and then create a TM with only the relevant TUs (via the Statistics > Create TM route).
2. CafeTran can do something similar, using Total Recall: create a huge Total Recall database containing all your TMXs, create a project with the files to be translated, run Total Recall on the project, export the matches to TMX, and use that in OmegaT.
3. The Olifant TMX editor is very good at quickly removing useless metadata (fields) (Edit > Attributes).
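The metadata stripping that Didier endorses and Olifant performs can also be scripted. The attribute names below come from the TMX 1.4 specification's housekeeping attributes, and `<prop>`/`<note>` are the TMX elements that carry per-unit notes; this is a minimal sketch assuming a namespace-free TMX, not a replacement for Olifant.

```python
import xml.etree.ElementTree as ET

# TMX 1.4 housekeeping attributes that matching does not need.
DROP_ATTRS = {"creationdate", "changedate", "creationid", "changeid",
              "usagecount", "lastusagedate"}

def strip_metadata(in_path, out_path):
    """Remove housekeeping attributes and <prop>/<note> children from
    every <tu> and <tuv>, shrinking both the file and the parsed tree."""
    tree = ET.parse(in_path)
    for el in list(tree.getroot().iter()):
        if el.tag in ("tu", "tuv"):
            for attr in DROP_ATTRS & set(el.attrib):
                del el.attrib[attr]
            for child in [c for c in el if c.tag in ("prop", "note")]:
                el.remove(child)
    tree.write(out_path, encoding="utf-8", xml_declaration=True)
```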
MikeTrans | Germany | Italian to German | Olifant may also help... | Nov 21, 2016
Just to complete what I said above: I think that Olifant (a free TM management tool) can also help, because it can read virtually any TM size as long as you have the necessary RAM. It should be possible to create a 'search string' similar to the one I described above, or better: use the string as a filter in Olifant. SQL and regex filters are supported.

Mike

External search or prune | Nov 21, 2016
I think you either need to cull your TMs (one could possibly use search terms or other methods to customize them for the project at hand, à la the project TM in Trados), or you need to use an external search tool, such as TMLookup. 5 GB of TMX probably amounts to 10 million TUs or more, which is more than what normal CAT tools can handle on normal hardware.
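Estimates like "10 million TUs" can be checked without ever loading 5 GB into memory: a streaming parse visits one translation unit at a time and discards it once counted. A sketch using Python's `iterparse`; the same pattern could drive a splitter that writes out batches of TUs as separate files.

```python
import xml.etree.ElementTree as ET

def count_tus(tmx_path):
    """Count translation units with a streaming parse, so even a
    multi-gigabyte TMX is never held in memory all at once."""
    count = 0
    for _event, elem in ET.iterparse(tmx_path):  # fires on element end
        if elem.tag == "tu":
            count += 1
            elem.clear()  # discard the subtree we just counted
    return count
```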