Extracting terminology from a text prior to translation allows translators to create a language-specific and even job-specific glossary or translation memory database. This greatly increases the speed of translation and reduces the cost of your projects.

Types of Term Extraction

The two main types of term extraction are manual and automatic. Manual term extraction is just like it sounds: a translator goes through the text, catalogues words, and prepares translations. In automatic term extraction, a computer scans the document and, based on preset parameters, extracts words or phrases quickly and efficiently.
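As a rough illustration of what "preset parameters" can look like, here is a minimal Python sketch of a frequency-based extractor. The function name, the short stop-word list, and the min_count and max_words parameters are all assumptions made for this example; commercial extraction tools are considerably more sophisticated.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real project would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in",
              "is", "it", "for", "on", "with"}

def extract_terms(text, min_count=2, max_words=2):
    """Return (term, count) pairs for phrases of 1..max_words words
    that occur at least min_count times in the text."""
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter()
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            # Skip phrases that start or end with a stop word.
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            counts[" ".join(gram)] += 1
    return [(term, c) for term, c in counts.most_common() if c >= min_count]
```

Raising min_count or lowering max_words shortens the candidate list. Either way, the output is only a list of candidates that a translator still has to review.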

As you may expect, there are pros and cons to both approaches. Automatic term extraction seems like the obvious choice, but it has some serious limitations. For example, the computer does not distinguish between lexical categories (nouns, verbs, adjectives, etc.) or contextual variations. If you were looking at the word “run,” it would be extracted regardless of how it was used.

Automatic term extraction is often combined with the “find and replace” feature of computer translation. When used with a word like “run,” however, your text will likely be filled with errors. The word has so many undefined variations that an automated extraction would not even be helpful in creating a termbase or glossary.
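The problem is easy to reproduce. The Python sketch below applies a blind, whole-word find and replace using a one-entry mapping. The blind_replace function and the choice of "correr" as the target are assumptions made purely for illustration; they are not taken from any particular translation tool.

```python
import re

# Illustrative one-to-one mapping; "correr" is only one of several possible
# Spanish renderings of "run" and was picked for this example.
REPLACEMENTS = {"run": "correr"}

def blind_replace(text, replacements):
    """Replace every whole-word match, ignoring part of speech and context."""
    for source, target in replacements.items():
        pattern = re.compile(r"\b" + re.escape(source) + r"\b", re.IGNORECASE)
        text = pattern.sub(target, text)
    return text

sample = "Run the program. She went for a run."
print(blind_replace(sample, REPLACEMENTS))
# prints: correr the program. She went for a correr.
```

The verb and the noun receive the same replacement even though each would normally need a different translation, which is exactly the kind of error described above.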

A variation on the automatic term extractor is the concordance. Concordance software extracts all usages of a particular term and shows each one in relation to the text around it. This allows translators to better determine the word form and context. They can then update the translation memory database prior to running the machine translation.
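A concordance view can be sketched in a few lines of Python. This is a minimal keyword-in-context listing under simple assumptions: the concordance function and its width parameter are hypothetical names, and only exact whole-word matches are found, so inflected forms such as "runs" are not included.

```python
import re

def concordance(text, term, width=30):
    """Return one line per occurrence of term, with up to `width`
    characters of context on each side."""
    lines = []
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the matches line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

for line in concordance("We run tests daily. The last run failed.", "run"):
    print(line)
```

Lining up each match with its neighbours lets a translator see at a glance whether the term is being used as a verb or a noun before deciding on a translation.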

Note that both automatic extraction and the concordance require a human to sort through the terms in order to separate the many variations and forms of words and phrases.

The advantage of manual term extraction is obvious: the person extracting the terms sees each word in context and can assign the proper translation on the spot. The drawback is that having a human go through each and every word of the document is much the same as having a human hand-translate it. While there may be some time savings, they may not be significant.

The tradeoff between the two term extraction forms is found in the length of text. For shorter texts, a hands-on approach generally comes out ahead. On longer texts, the computer is often a more efficient choice even with a human post-edit.

Translation Benefits of Term Extraction

If the benefits of term extraction do not seem clear, consider the following:

  • Entire documents can be updated with localized terms quickly and efficiently.
  • Translations of language-specific terminology can be efficiently assigned to a document. Consider the variations between Mexican and Andalusian Spanish. If the translator knew the specific phrases or words they were looking for (and an expert in Spanish translation would), they could quickly extract those terms, update the glossary or translation memory database, and enhance the document translation.
  • Project-specific language can be pulled out and updated to the glossary. If the author did not provide a list of all the words that may not be in a standard glossary, an automatic extraction will pull them out and provide the translator with a list of specialized words or phrases that need translation.
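The last point, pulling out words that are missing from a standard glossary, amounts to a set difference. Here is a minimal Python sketch, assuming the glossary is available as a dictionary keyed by source-language terms. The missing_terms function and the sample glossary entries are hypothetical.

```python
import re

def missing_terms(text, glossary):
    """Return the words in text that have no glossary entry,
    sorted alphabetically."""
    known = {entry.lower() for entry in glossary}
    words = set(re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()))
    return sorted(words - known)

# Hypothetical glossary mapping English terms to Spanish translations.
glossary = {"install": "instalar", "the": "el"}
print(missing_terms("Install the Acme widget", glossary))
# prints: ['acme', 'widget']
```

The resulting list still needs human review, since it will include product names that should stay untranslated alongside terms that genuinely need a translation.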

Today, most commercial translation and localization projects are carried out without a comprehensive, project-specific, up-to-date glossary in place. Important items that may appear in a project but not in standard glossaries include:

  • client’s business name
  • product names
  • trademarks
  • idiomatic expressions
  • neologisms
  • buzz words

Further, consider the number of new genes, chemical compounds, drugs, and so forth that may be found in a medical or scientific journal, paper, or book. New information, and the words coined to describe it, are being created faster than commercial databases can keep up. A concordance will help extract these words and prepare them for translation.

Some terms may be so important that getting them wrong could lead not only to great embarrassment but also to the failure of a project, the loss of a contract, or even a loss of trust and reputation. Seeing them in context helps ensure they are properly translated.

Term extraction prior to the start of translation isolates high-use words and phrases, new words and phrases, and essential terms. Term extraction also allows the entire translation team to take a common approach and have a unified vocabulary. This not only improves the overall flow of the document or project but also cuts down on the time needed during post-translation review and correction.