
Machine Translation in Translation

A cheat sheet: Machine Translation terminology from Artificial Intelligence to Large Language Models and beyond

Decades into its long history, Machine Translation (MT) is thriving, with advancements in Large Language Models (LLMs) garnering newfound excitement among the public and within language service circles.

In recent years, all things Artificial Intelligence (AI) have cemented themselves firmly into the zeitgeist, with generative AI (GenAI) emerging as one of the latest buzzwords, captivating attention across multiple industries. No matter how you connect to the concept of Machine Translation, you need to know how to talk about it.

As applications of AI have become increasingly accessible to companies and consumers, a lexicon of closely related terms has emerged. If you’re an outsider looking in, how do you parse the difference among terms sometimes used interchangeably?

How do you translate Machine Translation?

We’re here to help. Here at Lionbridge, some of the most experienced MT experts in the world are part of our pride. We’ve worked with them to develop this cheat sheet to help you determine the subtle and not-so-subtle differences in the terms that keep the industry moving.

1. Artificial Intelligence

To understand recent trends in MT, you first need to familiarize yourself with the backdrop against which they have been happening: heady, hefty Artificial Intelligence. AI is “intelligence” that machines demonstrate when they perform tasks usually considered to require inherently human types of thinking, such as learning and problem-solving. In recent years, AI has benefited from increasing computing power. More powerful computers enable more intensive processing of the task at hand and more advanced machine learning, which is how computers gain the knowledge required for AI applications.

2. Machine Learning

Machine learning is a branch of computer science that uses massive amounts of data to teach computers how to perform tasks. Machine learning examines data related to a particular task, finds patterns in those data, makes associations among those patterns, and then uses those new learnings to shape how the computer performs the task. If, after this analysis, the computer gets better at performing the task, then we say machine learning has occurred.
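The loop described above (examine data, find patterns in it, use those patterns to get better at a task) can be illustrated with a toy text-labeling example. This is a deliberately simplified sketch of the idea, not a production machine learning algorithm:

```python
from collections import Counter

# Toy illustration of machine learning as pattern-finding: the "task" is
# labeling a short text, and "learning" is counting which words appear
# with which label in the training examples.

def train(examples):
    """examples: list of (text, label) pairs. Returns per-label word counts."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts

def predict(counts, text):
    """Pick the label whose training-data words overlap the new text most."""
    words = text.lower().split()
    return max(counts, key=lambda label: sum(counts[label][w] for w in words))

examples = [
    ("rain clouds storm", "weather"),
    ("stocks shares market", "finance"),
]
model = train(examples)
print(predict(model, "storm and rain today"))  # → weather
```

The more labeled examples the model sees, the better its counts reflect real patterns, which is the sense in which the computer "gets better at performing the task."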

Because we have data on almost everything you can imagine, people are using machine learning to improve computer performance in everything from weather forecasting to automatic stock selection to Machine Translation.

3. Machine Translation

Put simply, Machine Translation is automated translation: You present source material to a computer in one language, and it gives it back to you in another language. It’s imperfect, but it’s one of the most powerful tools we have for producing high-quality translations more efficiently.

Over the last several decades, MT has evolved in the quality of its output and the breadth of languages it supports. From simple word replacement systems in the very early days of MT to the explicitly coded grammar and lexicons of rules-based MT, to the number-crunching paradigm of Statistical MT, to the Deep Learning and neural networks of Neural MT, to the uncanny human-like output of generative AI, the development of Machine Translation has mirrored our increasingly sophisticated use of computers.

Visit our Machine Translation thought leadership page for the latest trends on MT.


4. Statistical Machine Translation

Statistical Machine Translation (SMT) leverages machine learning to generate a massive number of translation candidates for a given source sentence and then select the best one based on the likelihood of words and phrases appearing together in the target language. SMT learns about translation through the lens of “n-grams” — small groupings of words that appear together in the source and target language. During the machine learning phase, an SMT system is given training material: that is, many, many examples of sentences in the source language and their translations into the target language. The learning algorithm divides source sentences and target sentences into n-grams. Then, it determines which target language n-grams are likely to appear in a translation when a certain source language n-gram appears in a sentence.

The learning algorithm then builds a language model that calculates the likelihood that given words and phrases appear next to one another in the target language. When the learning is done, and it’s time to translate new material, the SMT system breaks the new source sentence down into n-grams, finds the highly associated target language n-grams, and generates candidate sentences. The final translation is that sentence whose target language n-grams correlate most highly with the source sentence’s n-grams and whose target language words are most likely to appear together in the target language.
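As a rough illustration of the n-gram counting described above, the following toy sketch builds a tiny bigram co-occurrence table from two sentence pairs, then looks up the most likely target-language bigram for a source bigram. Real SMT systems add word-alignment models and a target language model on top of this:

```python
from collections import Counter

# Toy SMT-style n-gram table: count how often each source bigram
# co-occurs with each target bigram across parallel sentence pairs.

def bigrams(sentence):
    words = sentence.split()
    return [tuple(words[i:i + 2]) for i in range(len(words) - 1)]

def train(pairs):
    table = Counter()
    for src, tgt in pairs:
        for s in bigrams(src):
            for t in bigrams(tgt):
                table[(s, t)] += 1
    return table

def best_target(table, src_bigram):
    """Return the target bigram most often seen alongside src_bigram."""
    candidates = {t: c for (s, t), c in table.items() if s == src_bigram}
    return max(candidates, key=candidates.get)

pairs = [
    ("the cat sleeps", "le chat dort"),
    ("the cat eats", "le chat mange"),
]
table = train(pairs)
print(best_target(table, ("the", "cat")))  # → ('le', 'chat')
```

Because ("le", "chat") appears in both target sentences, it wins over the bigrams that appear only once, which is the co-occurrence logic SMT scales up to millions of sentence pairs.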

SMT works surprisingly well, particularly considering that there is nothing linguistic about an SMT system; indeed, the system only considers n-grams, never a comprehensive sentence. This approach differs from an alternative approach to MT: Neural Machine Translation.

5. Neural Machine Translation

Neural Machine Translation (NMT) overcomes the greatest shortcoming of SMT: its reliance on n-gram analysis. NMT empowers the machine — the system receives the training material, just as it would with SMT, but there’s a key difference. Once the system receives the material, it decides on its own how to learn everything it can about that data.

NMT systems build information vectors for each source sentence, associating information about each word with the words surrounding it. Some systems develop hundreds of pieces of information per word, yielding a rich representation of each word in context. Through deep learning, NMT systems capture a massive amount of information about each word and source sentence, then use what’s called an attention model to focus on the features most important to translating each part of the sentence. The result has been translations with marked improvements in fluency; computer-generated translations began to sound more and more natural.
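The attention step can be sketched in miniature: a softmax turns similarity scores between the word being translated and each source word into weights that concentrate on the most relevant source words. The scores below are hand-picked for illustration; a real NMT system computes them from learned vectors:

```python
import math

# Minimal attention-weighting sketch: softmax converts raw similarity
# scores into weights that sum to 1, focusing on the relevant source word.

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

source_words = ["the", "bank", "of", "the", "river"]
# Hypothetical similarity scores while translating the ambiguous word "bank":
scores = [0.1, 2.0, 0.1, 0.1, 1.5]

weights = softmax(scores)
focus = source_words[weights.index(max(weights))]
print(focus)  # → bank
```

Note that "river" also receives substantial weight, which is exactly what lets attention disambiguate "bank" from surrounding context rather than from the word alone.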

NMT has been game-changing in our industry, and we increased our use of MT to accelerate our production processes as toolsets matured and the technology improved. By 2022, however, the major Neural Machine Translation engines were no longer improving quality substantially, signaling that this paradigm was reaching its limits and creating conditions ripe for disruption.


6. Generative AI / Large Language Models

Generative AI is an AI system that can generate novel content, including text and images, based on prompts and comprehensive multimodal training. It’s notable for its ability to produce output with human-like quality. A Large Language Model (LLM) is an AI system focused on language. It can summarize, translate, predict, and generate text based on patterns learned from massive volumes of text. Although it’s not specifically trained to translate text, it can do so with good (though not excellent) quality and is quickly improving. ChatGPT, released in November 2022, was the first LLM to go mainstream, attracting 100 million users within two months of its launch.
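One practical difference from a dedicated MT engine is that an LLM is steered through a natural-language prompt rather than a translation-specific API. The sketch below shows one hypothetical way such a prompt might be assembled; the function name and prompt wording are illustrative assumptions, not any particular vendor’s interface:

```python
# Hypothetical prompt builder for LLM-based translation. With an LLM,
# the translation task, terminology constraints, and source text are all
# expressed in plain language inside the prompt itself.

def build_translation_prompt(text, source_lang, target_lang, glossary=None):
    lines = [f"Translate the following {source_lang} text into {target_lang}."]
    if glossary:
        terms = "; ".join(f"{s} -> {t}" for s, t in glossary.items())
        lines.append(f"Use these required term translations: {terms}.")
    lines.append(f"Text: {text}")
    return "\n".join(lines)

prompt = build_translation_prompt(
    "Welcome to our platform.",
    "English", "German",
    glossary={"platform": "Plattform"},
)
print(prompt)
```

This is also where glossaries and brand-voice instructions enter the picture: with an LLM they travel inside the prompt rather than as engine training data.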

GenAI/LLM technology will increasingly address repetitious, core linguistic activities as it expands in capability. We expect it to create more space for higher-value human activities in the following three areas:

  • Content Ideation — People ignite the content creation process with ideation.
  • Content Validation — People ensure accuracy, security, and authenticity.
  • Content Analysis — People enable stronger monitoring and better performance.

Higher-value services like transcreation will become more economically attainable for companies, ultimately enabling brands to deliver content that is more convincing and trustworthy and resonates better with buyers in different countries. Lionbridge is identifying generative AI use cases and developing applications that leverage LLMs to their fullest capability to automate the localization workflow.

7. Large Language Model Machine Translation

Large Language Model Machine Translation refers to the use of LLMs for MT. LLM MT may replace the Neural MT paradigm one day, but the technology is not yet mature. LLMs produce decent output, and OpenAI’s GPT-4 model even outperformed the Yandex Neural Machine Translation engine in the English-to-Chinese language pair in one Lionbridge evaluation. Nonetheless, at the time of this writing, LLMs cannot match the speed, quality, and affordability of the five major Neural Machine Translation engines, making them an unsuitable substitute for Neural MT engines. Lionbridge monitors LLM Machine Translation’s performance via the Lionbridge Machine Translation Tracker, which now measures several LLM models, including GPT-4 Machine Translation.

8. Human-in-the-Loop AI Translation

Human-in-the-loop AI translation refers to the combined efforts of humans and machines to produce the translation outcomes you need.

While GenAI/LLMs enhance translation efficiency and cost-effectiveness, human input is indispensable for the following reasons:

  • The technology can’t replace human ingenuity.
  • You can’t entirely trust the technology without supervision.
  • The technology is incapable of running independently.

Here’s how humans overcome some key issues that LLMs present and add value:

  • They review the translated output in its entirety, which is especially important for consistency. GenAI/LLM technology works best when the prompt is kept to a few hundred words, a constraint that often results in chunks of inconsistent translation output.

  • They infuse multiple glossaries and instructions per project type into a series of prompts for a consistent brand voice.

  • They generate prompts, an initial step and critical requirement for effective GenAI/LLM performance that the technology cannot execute on its own.

  • They create sophisticated, dedicated platforms that organize thousands of prompts, recycle their usage, and interject instructions and glossaries as needed for desired outcomes. Lionbridge has designed and launched a prompt iteration platform to recycle and iterate effective prompts.

  • They determine how to integrate LLM technology into existing workflows that leverage Translation Memories and Neural Machine Translation side-by-side to save time, reduce effort, and cut costs. Lionbridge’s dedicated AI team excels at harmonizing LLMs into existing workflows.
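The chunking constraint mentioned above (prompts work best at a few hundred words) implies that longer documents must be split before translation and then reviewed as a whole afterward for consistency. A minimal sketch, with the 200-word limit chosen purely for illustration:

```python
# Split a long document into prompt-sized chunks of at most max_words
# words each. The 200-word default is an illustrative assumption, not a
# hard limit of any particular GenAI/LLM system.

def chunk_words(text, max_words=200):
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

document = "word " * 450  # a 450-word stand-in document
chunks = chunk_words(document)
print(len(chunks))             # → 3
print(len(chunks[0].split()))  # → 200
```

Each chunk is translated under its own prompt, which is precisely why a human (or a dedicated platform) must re-inject glossaries per chunk and review the stitched-together output end to end.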

Why Lionbridge

At Lionbridge, we speak the MT language fluently. We’ve been offering MT at scale since 2002 and are at the forefront of the latest, exciting developments. Read the Lionbridge Machine Translation Report for a comprehensive look at MT.

Get in touch

Interested in implementing the latest tools to automate your translations? Reach out to us today to learn more.


AUTHORED BY
Lionbridge