AI in Language Quality Assessment: Hands-On Lessons from the Field
A client came to Beyont with a complex challenge: how to assess translation quality across more than 120 web pages in seven languages while keeping costs down.
Together with our client, we landed on a possible solution. Could generative AI help us save time and money without compromising on results?
To put this idea into action, we designed and implemented a process that combined AI with human oversight. Instead of replacing human skill, we aimed to direct expert eyes where they were needed most.
In these early days, our field is still searching for the best way to make language quality management more efficient with AI. Let’s see what we learned this time around, despite technical and practical constraints.
The Challenge: Quality Assessment at Scale
In this project, we faced a familiar conundrum. Manual review would be too slow and expensive for the client’s needs. But these pages represented the client’s global brand—so quality checks also needed to be accurate.
To solve this challenge, we needed a quick first-pass screening system to uncover which pages needed expert eyes. If content scored beneath a minimum threshold, it would go to our specialists for a full quality assessment.
The content itself made things trickier. We weren’t working with structured translation files. Instead, we had to use a web scraper to extract content from the client’s website—so we faced the extra challenge of separating essential information from unnecessary elements like metadata.
To complicate things further, the web pages also varied from language to language. Some markets had entire pages that didn’t exist for others. That meant it was challenging to compare translations directly.
A Hybrid Solution: Combining AI and Human Expertise
We needed a smarter approach, not just a faster one. Our solution combined standard automation, generative AI, and human expertise. The goal: streamline initial quality screening while reducing manual work.
- Automating the Basics: Data Collection and Filtering
First, we used a no-code tool to gather data. The tool extracted URLs from spreadsheets and collected webpage content.
We added filters to remove invisible elements like video thumbnails and metadata, focusing only on relevant text. A standard glossary check compared the source and target content against approved terminology.
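The filtering and glossary steps can be sketched in a few lines. This is a hypothetical illustration, not the project's actual tooling: the `SKIP_MARKERS` patterns, the sample glossary, and the function names are all invented for the example.

```python
# Invented example glossary: source term -> approved target-language term
APPROVED_TERMS = {"cloud platform": "plataforma en la nube"}

# Crude markers for non-visible scraped elements (metadata, thumbnails, etc.)
SKIP_MARKERS = ("<meta", "thumbnail", "og:")

def filter_blocks(blocks):
    """Keep only text blocks that look like visible page copy."""
    return [b for b in blocks if not any(m in b.lower() for m in SKIP_MARKERS)]

def glossary_hits(source_text, target_text, glossary):
    """Flag source terms whose approved translation is missing from the target."""
    issues = []
    for src_term, tgt_term in glossary.items():
        if src_term in source_text.lower() and tgt_term not in target_text.lower():
            issues.append(src_term)
    return issues
```

As the project found, a naive string check like this over-flags: legitimate inflections or reworded translations show up as "missing" terms, which is why humans still had to verify the hits.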
- Quality Evaluation with AI
For the actual quality evaluation, we tried out several different engines before settling on the best one for the project.
Then, we prompted the AI to:
- Analyze both source and target content for quality, providing numerical scores (0-4) for adequacy and fluency
- Generate brief summaries of the translation quality for each URL
- Establish baseline quality scores for the overall evaluation
By performing multiple iterations on our initial prompts, we were able to refine the AI’s output over time. This approach sped up the process of screening content for basic quality and flagging it for in-depth assessment.
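The scoring-and-flagging logic above might look something like the sketch below. The prompt wording, the 0-4 thresholds, and the `call_model` stand-in are all assumptions for illustration; the real workflow would plug in whichever engine the project settles on.

```python
import json

# Hypothetical pass thresholds on the 0-4 scale; content below either
# threshold gets routed to a specialist for full assessment.
ADEQUACY_MIN = 3
FLUENCY_MIN = 3

PROMPT_TEMPLATE = (
    "Rate this translation for adequacy and fluency on a 0-4 scale. "
    'Reply as JSON: {{"adequacy": n, "fluency": n, "summary": "..."}}\n'
    "Source: {source}\nTarget: {target}"
)

def evaluate(source, target, call_model):
    """call_model is a stand-in for whichever AI engine is selected."""
    reply = call_model(PROMPT_TEMPLATE.format(source=source, target=target))
    scores = json.loads(reply)
    scores["needs_review"] = (
        scores["adequacy"] < ADEQUACY_MIN or scores["fluency"] < FLUENCY_MIN
    )
    return scores

# Stubbed model response, for illustration only:
fake_model = lambda prompt: '{"adequacy": 2, "fluency": 4, "summary": "Omissions."}'
result = evaluate("Hello", "Hola", fake_model)
```

Here `result["needs_review"]` is true because the adequacy score falls below the threshold, so this page would go on to full human assessment.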
- Human Intervention: Still a Necessity
While AI played a key role, we found it impractical to automate the entire workflow. Team members had to intervene at various points to:
- Verify which terminology issues were actual problems, as the differences between source and target turned up too many false positives.
- Adjust overall scores to prevent the strict scoring system from resulting in automatic failures.
- Set up and troubleshoot the workflow at the initial stage.
Finally, our language quality specialists were still responsible for assessing any content that failed initial review—so human expertise remained central to the process.
Time and Cost Savings—with Room for Improvement
As we hoped, our AI-assisted workflow boosted efficiency and saved resources by reducing manual work.
We sped up our initial quality screening by 22 percent and cut costs by 33 percent. A fully human review would have been more accurate, but we kept errors down by keeping people in the loop.
What’s more, we could have achieved better results had it not been for some project limitations.
Some tasks were simply hard to automate, even with AI. We also faced technical hurdles because we were working with unstructured content scraped from online sources, instead of organized bilingual files.
Looking Forward: Lessons for the Future
So, what can we learn from this experience? Here are a few takeaways if you’re thinking about bringing AI into your language quality processes:
- Be realistic about current AI capabilities. AI can streamline tasks like initial quality screening for large projects. It doesn’t replace human experts but can help flag content for deeper attention.
- Start small and refine your process. It can take time to find the right AI prompts and workflows. Test your approach with a limited content sample before expanding your scope.
- Pay attention to content formatting. The more structured your content, the more efficient your process is likely to be. Bilingual translation files make work easier for humans and AI alike.
With thoughtful implementation, AI tools can make some phases of language quality management faster and more cost-effective. Take a balanced approach to maintain quality standards while meeting deadlines and staying on budget.