Lionbridge Expert Commentary: Automated Translation Analysis

Lionbridge technology experts examine the Machine Translation and generative AI paradigms and share insights into the latest automated translation trends.

Machine Translation Technology Maintains Its Relevance Despite the Disruptive Nature of Generative AI

Changes abound: Understanding developments in automated translation

We’ve been saying for some time that the Machine Translation (MT) paradigm was ripe for disruption. Read our expert commentaries, and you’ll learn why.

Our automated translation experts offer insight into numerous topics, including:

The translation performance of MT engines and generative AI (GenAI) models at given points in time and what the results mean in a larger context
The limitations of automated translation tools
Ways to bolster the effectiveness of Machine Translation

The more you understand MT and GenAI, the more you can deploy the tools selectively to meet your needs. Capitalize on the strengths offered by each paradigm to ultimately achieve enhanced translation efficiency, increased content output, and cost savings.

Featured Lionbridge Expert Commentary

Noteworthy GPT-4 Peculiarities, October 2023

We’ve enhanced the Lionbridge Machine Translation (MT) Tracker, given the prevalence and promise of GenAI / Large Language Models (LLMs). From now on, the tracker will include GPT-4 translation results alongside GPT-3.5 and Davinci results and Neural MT (NMT) engine performance.

What are some of our latest findings? Some noteworthy GPT-4 peculiarities.

We faced several issues associated with GPT-4, including slow performance, its inability to provide translations for various reasons, and inconsistent behavior, such as missing translations in some runs but not in others.

Finding #1 — GPT-4’s failure to translate some text.

GPT-4 failed to translate a particular sentence in our MT test set.

After some research, we determined that a term that carries a sexual connotation in certain contexts caused the issue. To be clear, the sentence in our test set was entirely standard and acceptable. Nonetheless, the term triggered GPT-4’s sexual content filter, and the model censored the translation of that sentence and returned no output. We were surprised by this result for two reasons:

The typical use of that term in isolation had no issues.

The context of that particular sentence had no problematic interpretation.

This observation led us to conclude that part of the GPT-4 filtering mechanism may be based on a simple forbidden-word list that also includes ambiguous terms. This approach is problematic because it is prone to overtriggering and producing false positives, which is a serious issue for professional translation.
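
To make the false-positive risk concrete, here is a minimal sketch of how a context-blind word list behaves. The word list, the function name `naive_filter`, and the sample sentences are all hypothetical; nothing here reflects how GPT-4's filter is actually built, which is not public.

```python
import re

# Hypothetical forbidden-word list that includes ambiguous terms.
# This is an illustration only, not the contents of any real filter.
BLOCKLIST = {"breast", "shaft"}


def naive_filter(text: str) -> bool:
    """Return True if any listed word appears, regardless of context."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(word in BLOCKLIST for word in words)


samples = [
    "Grill the chicken breast for ten minutes.",   # cooking content
    "Lubricate the drive shaft before assembly.",  # mechanical content
    "Tighten the bolts before assembly.",          # no listed word
]

for sentence in samples:
    # A True result means the sentence would be blocked.
    print(naive_filter(sentence), sentence)
```

Because the check ignores context, the cooking and mechanical sentences are flagged even though neither is problematic, which is the kind of overtriggering described above.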

Because earlier Machine Translation technologies, such as Neural MT engines, do not have this type of content filtering issue, we can conclude it is a limitation of LLM technology.

The limitation has implications for real-world scenarios. For instance, imagine you need to translate medical content associated with gynecology or sexual education. You may be surprised that the LLM will not translate some of your text.

Interestingly, this issue occurred only when we translated that sentence into one particular language, Chinese, and not when we translated it into other languages. This result suggests that the filter was applied to the GPT-4 output rather than to the input. One solution would be to turn off the content filters for translation tasks.

Finding #2 — GPT-4’s output variability.

We found LLM Machine Translation output highly variable after five weeks of tracking, particularly with GPT-4.

While we expected this outcome for generative AI, the variability was more significant than anticipated — even when we used Temperature and Top Probability (Top_p) parameter settings to reduce creativity and make the output more deterministic. The translation output was different in every single GPT run we conducted, even when we ran translations one right after the other.
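
As a rough illustration of what Temperature and Top_p do in general, the sketch below applies temperature scaling and then nucleus (top-p) truncation to a set of made-up token scores. The function name `sampling_distribution` and the scores are assumptions for illustration. It shows the textbook sampling math only, and it does not model whatever other sources of variation a hosted model may have.

```python
import math


def sampling_distribution(logits, temperature=1.0, top_p=1.0):
    """Apply temperature scaling, then top-p truncation, to raw token scores."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")

    # Temperature: dividing by a small value sharpens the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract peak for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-p: keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalize the surviving tokens; all others get zero probability.
    result = [0.0] * len(probs)
    for i in kept:
        result[i] = probs[i] / mass
    return result


logits = [2.0, 1.5, 0.5]  # made-up scores for three candidate tokens
print(sampling_distribution(logits))                                # default settings
print(sampling_distribution(logits, temperature=0.2, top_p=0.5))    # restrictive settings
```

With the restrictive settings, only the top-scoring token survives in this toy case. Whenever more than one token remains after truncation, sampling can still pick different continuations from run to run.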

Both translations may be acceptable even though they differ. Nonetheless, this is another aspect to control and another difference from the previous Neural MT paradigm.

We are beginning to suspect that this potential paradigm change, from NMT to LLM MT, may require not only a technological change but also a change in mindset: We may need to be prepared to live with less deterministic outputs, even when using the very same input and the very same parameters, and to expect more variability than we are used to with current automation.

While we may have to live with more uncertainty to some extent, it may be possible to use some mechanisms and best practices to make that variability somewhat controllable.
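
One possible mechanism, offered here as an assumption rather than as a practice the commentary describes, is to request several translations of the same segment and keep the one that agrees most with the others. The sketch below uses Python's standard `difflib.SequenceMatcher` for similarity; the function name `most_central` and the sample runs are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def most_central(candidates):
    """Return the non-empty candidate with the highest mean similarity to the rest."""
    # Drop empty outputs, such as runs where no translation came back.
    usable = [c for c in candidates if c.strip()]
    if not usable:
        raise ValueError("no non-empty candidates")
    if len(usable) == 1:
        return usable[0]

    def mean_similarity(i):
        others = usable[:i] + usable[i + 1:]
        return sum(similarity(usable[i], o) for o in others) / len(others)

    best = max(range(len(usable)), key=mean_similarity)
    return usable[best]


# Made-up outputs from four runs on the same source segment.
runs = [
    "The device must be restarted after the update.",
    "",  # a run that returned nothing
    "The device must be restarted after the update.",
    "You have to reboot the device once the update is done.",
]
print(most_central(runs))
```

This does not remove the variability. It trades extra runs for a more repeatable choice, and a human reviewer or quality metric would still be needed to confirm that the selected output is acceptable.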

Final note: There was a decrease in the Edit Distance for GPT-4 at the time of publication; this finding does not indicate decreasing quality. It is merely a reflection of the variability of GPT outputs.
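
The commentary does not spell out how the tracker computes Edit Distance, so the sketch below assumes the standard Levenshtein distance, applied at word level. The function name `levenshtein` and the example sentences are illustrative only. It shows how two runs on the same source can land at different distances from the same reference.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b.

    Works on any two sequences: strings (character level) or lists of words.
    """
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


# Made-up reference and two made-up runs for the same source segment.
reference = "Restart the device after the update".split()
run_a = "Restart the device after the update".split()
run_b = "Reboot the device once the update finishes".split()

print(levenshtein(run_a, reference))
print(levenshtein(run_b, reference))
```

Because the score depends on which output a given run happens to produce, a single measurement can move without any underlying change in the model.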

—Rafa Moral, Lionbridge Vice President, Innovation

Index of Expert Commentary Topics

Browse the executive summaries below to explore the topics of our past expert commentaries.

March 2023 — A Large Language Model (LLM) outperforms a Neural Machine Translation (MT) engine: Now what?

February 2023 — Enhancing Machine Translation (MT): MT customization vs. MT training

January 2023 — Translation quality comparison between ChatGPT and the major MT engines

November 2022 — Microsoft MT improvement

October 2022 — MT and language formality

September 2022 — Using terminology for enhanced MT quality

August 2022 — Overcoming catastrophic errors during MT

July 2022 — Language ranking for MT

June 2022 — Accurately analyzing MT quality

May 2022 — Amazon and Yandex performance in May

April 2022 — Yandex performance in April

March 2022 — Custom MT comparative evaluations

February 2022 — The future of Neural Machine Translation (NMT) 

January 2022 — MT engine performance in January

December 2021 — Lionbridge adds Yandex MT to the MT Quality Tracker competitive check

November 2021 — Bing Translator makes improvements

October 2021 — How Amazon’s MT engine is progressing

September 2021 — Amazon makes improvements to MT quality

August 2021 — Top tech companies and their MT engine development

The Lionbridge Machine Translation Tracker

The Lionbridge Machine Translation Tracker is the longest-standing measure of MT in the industry.

The tracker measures the overall performance of the five major Neural MT engines and several GenAI models. It also evaluates translation quality by language pair and domain. With some exceptions, GenAI models do not outperform the major Neural MT engines. However, these models produce decent results, especially considering they haven’t been trained explicitly for translation.

What’s the takeaway? Amidst the strong interest in deploying GenAI/LLMs, Machine Translation continues to prove itself to be a worthy automated translation tool.

Translation results are constantly changing, and the tracker captures these fluctuations.

Lionbridge Expert Commentaries

Gain insight from our automated translation experts.

March 2023

Generative Artificial Intelligence (AI) has achieved a significant milestone: It outperformed a Neural Machine Translation (MT) engine in one of our comparative evaluations. Specifically, Large Language Model (LLM) GPT-4 provided slightly better quality than Yandex for the English-to-Chinese language pair, as shown in Figure 1.

This development is noteworthy because it’s the first time a different type of MT approach has beaten a Neural MT engine since the advent of Neural MT. Moreover, a non-MT approach — a multi-purpose language automation not specifically prepared for Machine Translation — has beaten the Neural MT engine.

Why should you care about this occurrence? If you are an MT provider, you must be at the forefront of technological advancements and consider how they will impact your current MT offering to stay competitive. If you are an MT buyer, you must stay informed about these developments to make sound MT investments, which will likely include some LLM-based technology rather than pure Neural MT offerings.

It’s worth noting that generative AI is still in its early stages. As such, it falls short in some key areas. For instance, it produces variable outputs across multiple runs, suffers from Application Programming Interface (API) instability, and makes more errors than Neural MT engines. These issues must be resolved for the technology to mature, and we are already seeing improvements being made at breathtaking speed.

The incredible speed at which LLMs can improve supports the notion that LLMs will become the next paradigm for Machine Translation. We expect a hybrid period whereby Neural MT providers integrate some aspects of LLMs into the Neural MT architecture as the paradigm evolves.

Read our blog for a translation quality comparison between Neural MT and LLM for two more language pairs and additional thoughts on whether it is the beginning of the end of the Neural Machine Translation paradigm.

—Rafa Moral, Lionbridge Vice President, Innovation

February 2023

Machine Translation Customization vs. Machine Translation Training

	MT Customization	MT Training
What it is and how it works	An adaptation of a pre-existing Machine Translation engine with a glossary and Do Not Translate (DNT) list to improve the accuracy of machine-generated translations	The building and training of an MT engine by using extensive bilingual data from corpora and Translation Memories (TMs) to improve the accuracy of machine-generated translations
What it does	Improves MT’s suggestions for more accurate output and reduces the need for post-editing	Improves MT’s suggestions for more accurate output and reduces the need for post-editing
Specific benefits	Enables companies to adhere to their brand name and terminology and achieve regional variations	Enables companies to attain a specific brand voice, tone, and style and achieve regional variations
The risks of using it	The MT could make poor suggestions and negatively impact overall quality when executed improperly	MT training may fail to impact output if there is not enough quality data to train the engine; the MT could generate poor suggestions and negatively impact overall quality if inexperienced authors overuse terminology
When to use it	Ideal for technological and detail-oriented content and any content that requires accurate translations of terminology, or regional variation when you lack sufficient data for MT training	Ideal for highly specialized content, marketing and creative content, and any content that requires a specific brand voice, tone, or style, or regional variation when you have enough data for MT training
Success factors	An experienced MT expert who can successfully manage input and output normalization rules, glossaries, and DNT	A minimum of 15K unique segments to adequately train the engine
Cost considerations	There is a one-time cost to update the profile that goes into the MT engine and some ongoing costs to maintain a glossary over time; costs are relatively low when factoring in the potential benefits and are typically lower than MT training costs	There are costs associated with the first training and potential costs for additional training, which may be considered over time if the MT performance monitoring indicates room for improvement; MT training can be worth the investment in certain cases when factoring in the potential benefits
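
To illustrate the Do Not Translate (DNT) side of MT customization, the sketch below masks protected terms with placeholder tokens before the text reaches an engine and restores them afterward. This is one common way to approximate a DNT list and is an assumption here, not a description of any specific engine's feature. The names `protect`, `restore`, and `stand_in_engine` are hypothetical, and `stand_in_engine` merely uppercases text so the effect is visible.

```python
def protect(text, dnt_terms):
    """Replace each DNT term with a placeholder token and record the mapping."""
    mapping = {}
    # Mask longer terms first so a short term cannot split a longer one.
    for n, term in enumerate(sorted(dnt_terms, key=len, reverse=True)):
        token = f"__DNT{n}__"
        if term in text:
            text = text.replace(term, token)
            mapping[token] = term
    return text, mapping


def restore(text, mapping):
    """Put the original DNT terms back in place of their placeholder tokens."""
    for token, term in mapping.items():
        text = text.replace(token, term)
    return text


def stand_in_engine(text):
    """Stand-in for a real MT call; it only uppercases the text."""
    return text.upper()


source = "Open Language Cloud to begin."
masked, mapping = protect(source, ["Language Cloud"])
translated = stand_in_engine(masked)
print(restore(translated, mapping))
```

In practice, engines differ in how they treat placeholder tokens, and many offer built-in glossary or DNT support, so an approach like this would need to be validated against the engine in use.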

Meet the Experts

Rafa Moral

Yolanda Martin

Thomas McCarthy
