How to generate a multiple language terminology db from a Web glossary


By Maurizio Valente | Published 05/23/2007 | Software and the Internet
Author: Maurizio Valente (Italy), English translator
Macros can be used to automate repetitive, tedious and time-consuming jobs. "Windows macros", i.e. third-party macros that work in all programs and objects within the Microsoft Windows environment, have a much wider scope than Office macros (see http://www.proz.com/doc/657).
Suppose you have downloaded a large number of .htm files from a five-language glossary on the Web. Each file contains lots of tags and text, but you are only interested in the five terms, one per language. How can you get rid of the rest?

Suppose that our glossary contains several hundred pages such as
http://www.markisenmotor.de/en;service;glossary;d:1327.htm.
In this case we are only interested in

accessories (EN)
Zubehör (DE)
accesorio (ES)
accessoires (FR)
accessori (IT)

Suppose you have already downloaded all these pages and saved them in a folder on your hard disk. You can open one of them in your favorite browser, Select All, Copy, and paste the content into an empty Excel file (use the 'Paste Special' option). You do not want to do this by hand for several hundred pages, though, so you can write a macro that does the job for you.
This is very simple.
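The browser part of that job (open a page, Select All, Copy) can also be approximated without a browser. The minimal Python sketch below is a swapped-in technique, not the author's macro: it uses the standard html.parser module to keep only the text a browser would display, dropping tags, scripts and styles. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect the text a browser would show, one stripped chunk per entry."""

    SKIP = {"script", "style"}  # their content is never displayed

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default: "&ouml;" arrives as "ö"
        self.chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Ignore whitespace between tags and anything inside <script>/<style>.
        if text and not self._skip_depth:
            self.chunks.append(text)


def visible_text(html: str) -> list[str]:
    """Return the displayed text of an HTML page as a list of chunks."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.chunks
```

For a saved page, visible_text(open(path, encoding="utf-8").read()) returns its text chunks in document order, assuming the page was saved as UTF-8.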

My initial flow chart was:
Focus on the Explorer window
Double-click item No. 1 to open File No. 1 in the default browser
Focus on the browser window
Select All
Copy
Focus on the (initially empty) Excel file
Paste Special
Arrow Left as many times as required
Focus on the Explorer window
Double-click item No. 2 to open File No. 2 in the default browser
Focus on the browser window
Select All
Copy
Focus on the Excel file (now the cursor has moved to an empty cell)
Paste Special
etc…
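The effect of this flow chart on the spreadsheet can be modelled in a few lines of Python. This is a sketch under one assumption: each page has already been reduced to a list of text lines. Pasting page 1 into one column, page 2 into the next and so on yields a grid in which row i holds line i of every page; shorter pages are padded with empty cells. The name paste_as_columns is hypothetical.

```python
from itertools import zip_longest


def paste_as_columns(pages: list[list[str]]) -> list[list[str]]:
    """Put each page's lines in its own column; row i holds line i of every page.

    Shorter pages are padded with "" so every row has one cell per page.
    """
    return [list(row) for row in zip_longest(*pages, fillvalue="")]
```

If every page has the same layout, the English terms of all pages end up in one row, which is what makes the later clean-up possible.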

This way we get a file with a lot of useless content, but all the English terms are on the same ROW, and the same applies to the German, Spanish, French and Italian ones. When the macro has finished, we only need to delete the useless rows.
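Deleting the useless rows amounts to keeping only the rows that carry terms. Which rows those are depends on the page layout, so the 0-based indices passed as term_rows are an assumption you would read off one pasted page; keep_term_rows is a hypothetical helper working on a grid of cells (a list of rows).

```python
def keep_term_rows(grid: list[list[str]], term_rows: list[int]) -> list[list[str]]:
    """Keep only the rows whose 0-based index is in term_rows, in sheet order."""
    wanted = set(term_rows)  # set lookup; the order of term_rows does not matter
    return [row for i, row in enumerate(grid) if i in wanted]
```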

This is not quite our goal, though: we want an Excel file in which each COLUMN corresponds to one language.

This can be solved with an additional Excel file: copy the content and use Paste Special with the Transpose option, which rotates it by 90°.
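In Python, the same 90° rotation is the usual zip(*rows) idiom. A minimal sketch, assuming every row has the same length (zip stops at the shortest row, so ragged rows would silently lose cells); the name transpose is hypothetical.

```python
def transpose(grid: list[list[str]]) -> list[list[str]]:
    """Rotate a table by 90°: row i becomes column i, like Paste Special > Transpose."""
    return [list(column) for column in zip(*grid)]
```

Applied to the cleaned-up grid, each resulting row is one glossary entry and each column one language.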

For those of you who use Macro Express, the macro is as follows. The whole sequence runs inside an outer Repeat loop: set n to the number of downloaded pages, and select the first page in the Explorer window before starting, because <ENTER> opens whichever file is selected. "Ausiliario alla macro.xls" is a scratch workbook into which each page is pasted; "Ausiliario alla macro con traspo.xls" collects one transposed row per page.

Repeat Start (Repeat n times)
Activate Window: "H:\Glossari\MECCANIC\Meccanica multilingue da Internet"
Wait For Window Title: "H:\Glossari\MECCANIC\Meccanica multilingue da Internet"
Text Type: <ENTER>
Delay 1 Seconds
Wait For Window Title: "Mozilla"
Text Type: <CONTROL>a
Text Type: <CONTROL><INSERT>
Window Close: "Mozilla Firefox"
Activate Window: "Ausiliario alla macro.xls"
Wait For Window Title: "Ausiliario alla macro.xls"
Text Type: <ALT>m
Text Type: p
Text Type: <ARROW DOWN><ARROW DOWN>
Text Type: <ENTER>
Text Type: <HOME>
Keystroke Speed: 0 Milliseconds
Repeat Start (Repeat 38 times)
Text Type: <SHIFT><ARROW DOWN>
Repeat End
Keystroke Speed: 50 Milliseconds
Text Type: <CONTROL><INSERT>
Delay 1 Seconds
Text Type: <CONTROL><HOME>
Activate Window: "Ausiliario alla macro con traspo.xls"
Wait For Window Title: "Ausiliario alla macro con traspo.xls"
Delay 1 Seconds
Text Type: <ALT>m
Text Type: p
Text Type: t
Text Type: <ENTER>
Delay 1 Seconds
Text Type: <ARROW DOWN>
Delay 1 Seconds
Activate Window: "H:\Glossari\MECCANIC\Meccanica multilingue da Internet"
Wait For Window Title: "H:\Glossari\MECCANIC\Meccanica multilingue da Internet"
Text Type: <ARROW DOWN>
Delay 1 Seconds
Repeat End
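For comparison, here is a minimal Python sketch of what the macro achieves, written under stated assumptions rather than as a replacement for it: the pages are UTF-8 .htm files in one folder; text extraction is a crude regex tag strip (it may split words wrapped in inline tags); the first 39 cells per page mirror the starting cell plus the 38 Shift+Down steps; and term_columns are hypothetical 0-based positions you would read off the output of page_lines for one page. All function names are illustrative.

```python
import csv
import io
import re
from html import unescape
from pathlib import Path

N_CELLS = 39  # the starting cell plus the 38 Shift+Down steps in the macro


def page_lines(html: str) -> list[str]:
    """Crude text extraction: drop script/style blocks and tags, keep non-empty lines."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", "", html)
    text = unescape(re.sub(r"<[^>]+>", "\n", html))  # each tag becomes a line break
    return [line.strip() for line in text.splitlines() if line.strip()]


def page_row(lines: list[str], n_cells: int = N_CELLS) -> list[str]:
    """One spreadsheet row per page: the first n_cells lines, padded with ""."""
    return (lines + [""] * n_cells)[:n_cells]


def build_glossary(pages: list[list[str]], term_columns: list[int]) -> list[list[str]]:
    """Keep only the columns (0-based, below N_CELLS) that hold the terms."""
    rows = []
    for lines in pages:
        cells = page_row(lines)
        rows.append([cells[c] for c in term_columns])
    return rows


def to_csv(rows: list[list[str]], header: list[str]) -> str:
    """Serialize the glossary as CSV text (Excel dialect, CRLF line endings)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def glossary_from_folder(folder: str, term_columns: list[int], header: list[str]) -> str:
    """Read every *.htm page in folder (sorted by name) and return the CSV text."""
    pages = [
        page_lines(path.read_text(encoding="utf-8", errors="replace"))
        for path in sorted(Path(folder).glob("*.htm"))
    ]
    return to_csv(build_glossary(pages, term_columns), header)
```

Call glossary_from_folder with the folder path, the five term positions and a header such as ["EN", "DE", "ES", "FR", "IT"]. When saving the result, writing it with encoding="utf-8-sig" usually helps Excel recognise accented characters such as those in "Zubehör".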


Copyright © ProZ.com, 1999-2024. All rights reserved.