Translation Language Pairs, Machine Translation, and the Human Translator

A language pair is formed when one language is translated into another. When English is translated into Spanish, this is called an English-Spanish language pair. If you can also translate in the other direction, a second language pair is created: Spanish-English. Language pairs allow information to be shared and transferred from one language (the source language) to another (the target language).

There are roughly 6,500 languages spoken in the world. If every language were paired with every other language in both directions, there would be over 42 million language pairs. If you eliminate the 2,000 languages that have fewer than 1,000 speakers, the number of necessary pairings drops to just over 20 million. While covering all of these would be a great goal, the best place to start is the 1,500 languages currently used on the internet. Focusing on these 1,500 languages reduces the number of potential language pairs to roughly 2.25 million.
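These estimates count each direction separately (English-Spanish and Spanish-English are two pairs), so n languages yield n(n - 1) possible pairs. A quick check of the three scenarios in Python; the function name is ours, for illustration only:

```python
def directed_pairs(n: int) -> int:
    """Number of ordered source -> target pairs among n distinct languages."""
    return n * (n - 1)

scenarios = [
    ("all spoken languages", 6_500),
    ("without 2,000 small languages", 4_500),
    ("languages used on the internet", 1_500),
]

for label, n in scenarios:
    print(f"{label}: {n:,} languages -> {directed_pairs(n):,} pairs")
# all spoken languages: 6,500 languages -> 42,243,500 pairs
# without 2,000 small languages: 4,500 languages -> 20,245,500 pairs
# languages used on the internet: 1,500 languages -> 2,248,500 pairs
```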

The top 10 languages in the world, listed in ascending order by number of speakers, are French, Malay-Indonesian, Portuguese, Bengali, Arabic, Russian, Spanish, Hindustani, English, and Mandarin. If each of these languages were paired in both directions with the 1,490 other languages in popular use, roughly 30,000 language pairs would be needed. Even if these main languages also served as pivots for one another, you would still need fewer than 300,000 language pairs in total to reach 99 percent of internet users in their local language.
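One way to read this paragraph is as a hub-and-spoke model: each of the 1,490 other languages is paired in both directions with each of the ten major languages, and the ten are also paired with one another. Under that assumption (our reading, not a method the text spells out), a short Python sketch compares the count with translating every language directly into every other; the function names are hypothetical:

```python
def hub_and_spoke_pairs(hubs: int, others: int) -> int:
    """Directed pairs when every other language is paired only with the hub languages."""
    spokes = hubs * others * 2        # each hub <-> each other language, both directions
    between_hubs = hubs * (hubs - 1)  # hub languages paired with one another
    return spokes + between_hubs

def full_mesh_pairs(n: int) -> int:
    """Directed pairs when every language is paired with every other language."""
    return n * (n - 1)

print(f"hub-and-spoke: {hub_and_spoke_pairs(10, 1_490):,} pairs")  # 29,890
print(f"full mesh: {full_mesh_pairs(1_500):,} pairs")              # 2,248,500
```

The hub-and-spoke total lands close to the 30,000 figure, while direct pairing of all 1,500 languages needs about 75 times as many pairs. This sketch does not attempt to reproduce the 300,000 figure, whose derivation the text does not give.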

Why is this important? Business studies have consistently shown that consumers are more likely to buy products, or read information, that are marketed or presented in their native language.

There are over 7 billion people in the world, but, according to “English as a Global Language” by David Crystal, only around 400 million speak English as their first language and another 600 million speak it as a second language. That leaves the large majority of the world’s population, roughly 6 billion people, who do not speak English. This is a huge market that you cannot reach if your information is presented only in English.

Machine Translation and Global Language Pairing

Computers have been used to create termbases and translation memory databases for decades. With faster computers, more time spent collecting data, and better translation algorithms, computer translation has advanced rapidly.

A termbase matches individual words or terms in one language with a direct, literal equivalent in another. Translation memory is a database that matches segments of text (unique phrases, idioms, or concepts) with an equivalent translation. Together, they not only keep each text cohesive but also keep all of your translated documents consistent. Even better, because these translations are saved, future translations are faster and less expensive: the more you translate and build up your termbase and translation memory, the faster your projects can be completed and the more money you save.
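To make the two ideas concrete, here is a minimal Python sketch, assuming a tiny in-memory dictionary for each database. The segments, translations, the 0.75 threshold, and the function names are all hypothetical, and `difflib.SequenceMatcher.ratio()` simply stands in for the fuzzy-match scoring that real translation tools implement in their own ways:

```python
import re
from difflib import SequenceMatcher

# Hypothetical translation memory: English source segments -> approved Spanish translations.
TRANSLATION_MEMORY = {
    "Add to cart": "Añadir al carrito",
    "Free shipping on all orders": "Envío gratuito en todos los pedidos",
    "Contact customer support": "Póngase en contacto con el servicio de atención al cliente",
}

# Hypothetical termbase: single terms -> approved literal equivalents.
TERMBASE = {"cart": "carrito", "shipping": "envío", "order": "pedido"}

def lookup(segment, memory, threshold=0.75):
    """Return (match_type, score, translation) for the best memory match, or None."""
    if segment in memory:
        return ("exact", 1.0, memory[segment])
    best_source, best_score = None, 0.0
    for source in memory:
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_source is not None and best_score >= threshold:
        # Fuzzy match: offered as a suggestion for a translator to review and edit.
        return ("fuzzy", round(best_score, 2), memory[best_source])
    return None

def find_terms(segment, termbase):
    """Return the termbase entries for every word in the segment that has one."""
    words = re.findall(r"[a-z]+", segment.lower())
    return {word: termbase[word] for word in words if word in termbase}

print(lookup("Add to cart", TRANSLATION_MEMORY))                 # ('exact', 1.0, 'Añadir al carrito')
print(lookup("Free shipping on all order", TRANSLATION_MEMORY))  # ('fuzzy', 0.98, 'Envío gratuito en todos los pedidos')
print(lookup("Track my parcel", TRANSLATION_MEMORY))             # None
print(find_terms("Add to cart", TERMBASE))                       # {'cart': 'carrito'}
```

An exact hit can be reused as-is, a fuzzy hit saves most of the work, and a miss goes to a translator; once that new translation is approved and added, the memory grows, which is where the speed and cost savings come from.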

There are two main approaches to machine translation: rule-based systems and statistical systems. Rule-based systems apply explicit grammar rules and accurate terminology from bilingual dictionaries. Statistical systems do not rely on language rules; instead, they analyze large collections of previously translated text and choose the most probable translation, which tends to sound more fluent.
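The contrast can be sketched with a toy example in Python. Everything here is hypothetical: the lexicon, the single adjective-noun reordering rule, and the phrase counts. The statistical half only shows relative-frequency estimation of phrase translation probabilities; a real statistical system would also learn word alignments from its data and use a language model to judge fluency.

```python
from collections import Counter, defaultdict

# --- Rule-based: a hypothetical lexicon plus one explicit grammar rule ---
LEXICON = {  # English word -> (Spanish word, part of speech)
    "the": ("el", "DET"),
    "red": ("rojo", "ADJ"),
    "car": ("coche", "NOUN"),
    "is": ("es", "VERB"),
    "fast": ("rápido", "ADJ"),
}

def rule_based(sentence):
    """Translate word by word, then apply the rule ADJ + NOUN -> NOUN + ADJ.

    Unknown words raise KeyError; a real system would need a fallback.
    """
    tagged = [LEXICON[word] for word in sentence.lower().split()]
    out, i = [], 0
    while i < len(tagged):
        if i + 1 < len(tagged) and tagged[i][1] == "ADJ" and tagged[i + 1][1] == "NOUN":
            out += [tagged[i + 1][0], tagged[i][0]]  # Spanish puts most adjectives after the noun
            i += 2
        else:
            out.append(tagged[i][0])
            i += 1
    return " ".join(out)

# --- Statistical: hypothetical phrase pairs observed in previously translated text ---
OBSERVED = [
    ("how are you", "¿cómo estás?"),
    ("how are you", "¿cómo estás?"),
    ("how are you", "¿cómo está usted?"),
    ("thank you", "gracias"),
    ("thank you", "gracias"),
    ("thank you", "te agradezco"),
]

def phrase_table(pairs):
    """Relative-frequency estimate of P(target phrase | source phrase)."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    return {
        source: {target: n / sum(c.values()) for target, n in c.items()}
        for source, c in counts.items()
    }

def statistical(phrase, table):
    """Pick the translation with the highest estimated probability."""
    candidates = table[phrase]
    return max(candidates, key=candidates.get)

print(rule_based("the red car is fast"))  # el coche rojo es rápido
table = phrase_table(OBSERVED)
print(statistical("how are you", table))  # ¿cómo estás?
```

The rule-based half is predictable but only as good as its hand-written rules and dictionary; the statistical half knows no grammar at all and simply prefers whatever the previously translated data used most often.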

Academic papers, medical papers, and similar documents often demand exact translations and a high degree of reliability, so a rule-based system is usually better suited to this kind of text.

The Machine Translation Dilemma

Machine translation can open up markets that were previously closed to translation, while significantly chipping away at the number of pairings needed to reach the global market. The downside to using machine translation is the need for human oversight. At present, no computer model can produce documents of human-translation quality. There are simply too many idioms and colloquialisms, too much lexical ambiguity, and too many cultural nuances, so machine translations still read like translations rather than native writing.

There is no doubt that machine translation is making its mark on the translation industry. The number of language pairs made available by computing technology is revolutionary, and at times all you need to get across is the gist of a message. In that case, machine translation lets you reach a global audience in their local languages. However, when an error, misperception, or miscalculation could lead to a lost contract, a major cultural faux pas, or (as with a medical text) even the loss of life, close is not close enough.

For businesses looking to get their message out to an ever-expanding global market, the fields are ripe. What must not be overlooked, however, is the importance of having a human translator with native-level fluency in both the source and target languages to ensure the message is accurate and free of errors and of potentially embarrassing or offensive material.
