File extraction: can I deactivate the function that separates repetitions into a second document?
Thread poster: Martin Purdy
Martin Purdy
Local time: 23:54
Dutch to English
Jul 2

WF Classic 8.93 (and every previous edition I've used since 2001), Word 2019

Something that's bugged me for the almost 23 years I've been using Wordfast: is it possible to configure the extraction function so that the entire document is extracted into a single file, without peeling repetitions off into a separate file? Although the manual says this is intended to save time, it actually costs me more time. The repetitions in the source and target-language documents often don't match, so when I align the two extracted files, I'm left with non-matching segments and have to go hunting in the source files to fill the corresponding cell in the table. Worse, if a segment is mis-divided (say, because a period marking an abbreviation is taken as an end-of-segment marker), fragments of segments can be lost if they're identified as repetitions. Again, more hunting and picking is required to rebuild the segments in the alignment table. Simply extracting the entire source document into one file (and likewise the target-language document), regardless of repetitions, would streamline the process greatly.

I'd be immensely grateful if anyone can tell me how to deactivate this function.

Martin
NZ


 
Martin Purdy
Local time: 23:54
Dutch to English
TOPIC STARTER
Follow-up re extraction issue Jul 8

Surprised no-one else has reported similar frustrations. I've heard back from WF in the meantime, and it seems the option doesn't exist and won't be introduced - as far as I can make out, on the mistaken assumption that, when the Extract function pulls out a repetition from the source document, it will pull out a matching repetition in the same place from the target one. Sadly, that doesn't happen in actual use. So manual fix-ups it will continue to be.

 
Hans Lenting
Netherlands
ProZ.com member since 2006

German to Dutch
Extractamento Jul 9

From the manual:

Extract opens all selected documents and extracts all segments into a text document named "WfExtracted.txt". This document in text mode is presented to you so you can save it under a different name and/or folder if needed (save the document as Unicode if your language requires unicode).

The Extract tool also produces a second file named WfRepetitions.txt, located in the same folder as WfExtracted.txt, which contains all segments that were found repeated more than once. This allows a project manager to have repetitions translated before the project starts, and to add these translated repetitions to the TM being distributed to translators. This method ensures consistency across the project, and further cost-cutting.


Since I was disappointed by the extraction feature of my own CAT tool, I've been examining the use of regular expressions to achieve a better result. I expect that WFC uses them too.

So you could try to write your own Word macro to achieve exactly what you want. Here is an example.
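As a rough sketch of the regex idea, here in Python rather than as a Word macro: split each paragraph at sentence-ending punctuation, refuse to break after a known abbreviation, and keep every segment in document order, repetitions included. This is only an illustration and not how Wordfast itself segments; the names `ABBREVIATIONS`, `split_segments` and `find_repetitions` are made up for the example, and the abbreviation list is an assumption you would need to adapt to your languages.

```python
import re
from collections import Counter

# Hypothetical abbreviation list; extend it for your own language pair.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "Mr.", "Mrs.", "Dr.", "No."}

# Candidate break: whitespace that follows ., ! or ?
_BREAK = re.compile(r"(?<=[.!?])\s+")


def split_segments(text):
    """Split text into segments, keeping repetitions in document order."""
    segments = []
    for paragraph in text.splitlines():
        current = ""
        for piece in _BREAK.split(paragraph.strip()):
            if not piece:
                continue
            current = f"{current} {piece}" if current else piece
            # Do not end a segment on a known abbreviation such as "Dr."
            if current.split()[-1] in ABBREVIATIONS:
                continue
            segments.append(current)
            current = ""
        if current:
            # Paragraph ended without closing punctuation (or on an abbreviation)
            segments.append(current)
    return segments


def find_repetitions(segments):
    """Return segments that occur more than once, in first-seen order."""
    counts = Counter(segments)
    return [s for s in dict.fromkeys(segments) if counts[s] > 1]
```

Running `split_segments` over the source text and again over the target text gives two complete lists that can be written out one segment per line for alignment. `find_repetitions` is optional and leaves the main list untouched. One limitation to be aware of: a sentence that genuinely ends in a listed abbreviation (such as "etc.") will be joined to the sentence that follows it in the same paragraph.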


Perhaps TransTools offers a tool for extraction?


 
Martin Purdy
Local time: 23:54
Dutch to English
TOPIC STARTER
WfRepetitions.txt as a separate step Jul 9

Hans Lenting wrote:

The Extract tool also produces a second file named WfRepetitions.txt, l


This is the crux of the matter for me. This step is clearly an add-on, so those with the code should surely be able to offer the option of stopping the process before this second step is run. For the reasons stated in my original question, peeling off repetitions in this way is actually counterproductive when it comes to aligning two extracted files to create a translation memory. I'm not a coder/programmer so creating something of my own that will do this is not an option.


 

