Native vs. Translation
Why using translation services for multi-language text analysis sucks.
Companies and services that operate in global markets and are using real-time text analysis to deliver competitive advantage rely on services that analyse multi-language content.
For example, if you want to know how customers view your product in different markets then you need to be analysing online customer reviews across multiple languages. Other areas ripe for text analysis such as surveys, call centre transcripts, Zoom call transcripts, and more are also going to rely on multi-language coverage.
But here providers such as Voice of Customer agents or NLP platforms fall well short on accuracy. The reason? Not only do their text analysis techniques often lack true Natural Language Understanding (NLU), but they also rely on translation services that distort the text being analysed, thereby compromising the accuracy of sentiment analysis, categorisation and more.
The text analysis results for multi-lingual content end up being somewhere between ‘not that great’ and ‘extremely underwhelming’, and that’s simply not good enough as a basis for effective decision-making.
Multi-language Text Analysis with No Translation
One of the most remarkable things about HIPSTO’s FALCON V platform and the underlying native Blind Vision technology that powers it is our highly accurate Natural Language Understanding of over 100 languages. We deliver ground-breaking web scraping, sentiment analysis, text classification and other intelligent services across all the world’s major languages, all without sacrificing levels of accuracy.
We believe the combination of broad language coverage and high-quality results within a single platform is unique, and offers an industry-leading solution for global organisations that need to scrape, analyse and extract information from global content in real time. One of the reasons we achieve this is that we don’t use translation services, which fundamentally change the content being analysed.
The Inaccuracy Caused by Translation
If you’re using an NLP or AI platform that claims to supply multi-lingual text data analysis, then it is likely that it leverages translation services and agents such as Google Translate or Azure Cognitive Services in its core process. It will scrape the web content, translate it via the agent and then analyse the translated text. But that’s a significant problem, as the translated copy is often simply not good enough to analyse.
How do we know this? We put it to the test! This summer we ran an experiment that pitted the accuracy of our native text analysis, which completely avoids translation, against the analysis of text that had been translated via the Google Translate API.
We conducted experiments on two sets of short-form text that had been translated using Google Translate, one from Chinese to English and one from Russian to English. We selected these languages because Google Translate has a good translation model and track record for them. We also used a typical area for short-form text web scraping: customer reviews on e-commerce platforms that relate to products with their own Stock Keeping Unit (SKU).
We carried out classification and sentiment analysis using our own proprietary technology on both the dataset in its original language and the translated dataset. Using a robust methodology to establish accuracy (including sampling with human input), we compared the results. The outcome? A minimum of 10% distortion across both classification and sentiment analysis!
We then repeated the exercise using longer-form text such as SEC filings, legal documentation and news articles. This time we used a lower-quality open-source translation agent to translate into multiple languages. Here the distortion was even higher. We estimate with confidence that minimum levels of distortion are >40%!
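To make the comparison above concrete, here is a minimal Python sketch of how a distortion figure could be calculated from two sets of labels for the same texts. It is an illustration only and not HIPSTO's actual methodology: the function names, the toy labels and the two definitions of distortion used here (label disagreement between the two pipelines, and the drop in accuracy against a human-reviewed sample) are all assumptions made for the example.

```python
from typing import Sequence


def disagreement_rate(native: Sequence[str], translated: Sequence[str]) -> float:
    """Share of items whose label changes once the text has been translated.

    Hypothetical definition of "distortion": both sequences hold one label
    per item, in the same order, for the same underlying texts.
    """
    if len(native) != len(translated):
        raise ValueError("label sequences must have the same length")
    if not native:
        raise ValueError("label sequences must not be empty")
    mismatches = sum(a != b for a, b in zip(native, translated))
    return mismatches / len(native)


def accuracy(predicted: Sequence[str], human: Sequence[str]) -> float:
    """Share of predicted labels that match a human-reviewed sample."""
    if len(predicted) != len(human):
        raise ValueError("label sequences must have the same length")
    if not human:
        raise ValueError("label sequences must not be empty")
    return sum(p == h for p, h in zip(predicted, human)) / len(human)


# Toy sentiment labels for ten reviews (invented purely for illustration).
human_labels = ["pos", "neg", "neg", "pos", "neu", "pos", "neg", "pos", "neu", "neg"]
native_labels = ["pos", "neg", "neg", "pos", "neu", "pos", "neg", "pos", "neu", "neg"]
translated_labels = ["pos", "neg", "neu", "pos", "neu", "pos", "neg", "pos", "neu", "neg"]

# Alternative definition 1: how often do the two pipelines disagree?
label_distortion = disagreement_rate(native_labels, translated_labels)

# Alternative definition 2: how much accuracy is lost against human review?
accuracy_drop = accuracy(native_labels, human_labels) - accuracy(translated_labels, human_labels)
```

In this toy data one review in ten changes label after translation, so both measures come out at 10%. On real data the two measures can differ, which is why it is worth stating which definition a quoted distortion figure uses.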
Native text analysis is far superior to analysis via translation agents, which in our tests caused distortion rates ranging from at least 10% to more than 40%.*
*HIPSTO research available upon request to validate these claims.
A Wake-up Call
These results are a wake-up call. When scaled up across millions of data points, these are mission-critical levels of inaccuracy that severely undermine the success of any web scraping and textual analysis platform that is leveraging translation services for multi-lingual coverage.
If your service provider is using translation services, it is important to take this into account when considering the value of the output. Using services like Google Translate at scale is also going to be very expensive for the service provider, with the likely cost passed on to you in the monthly fees you pay. Are you getting value for money?
Avoiding Distortion and Maximising Accuracy
We take a different approach that involves no translation agents. We use only native analysis, based on our proprietary Blind Vision engine, which delivers true Natural Language Understanding (NLU) and avoids the inferior statistical approaches that hold most other NLP engines back. It means you get real, meaningful and accurate data insights, and usually at a more cost-effective price.
If you want to avoid distortion of 10% or more in your textual analysis, and need unrivalled multi-language coverage, then come and talk to us.