No Inferior Translation Agents
Native Understanding of 100+ Languages
Powerful Multilingual Capabilities
We have developed a single model for a multilingual approach, trained to understand native text in over 100 languages using advanced Transformer models (e.g. mT5) reinforced by our own proprietary neural network architectures.
This gives our multilingual text analysis solutions for sentiment analysis, related-content detection and automated classification a clear edge over pipelines that first pass text through a machine-translation service.
This matters for global brands operating across multiple language territories, where semantically similar text must be analyzed with absolute accuracy and consistency, whatever language it was written in.
Why Translation Sucks
HIPSTO’s expert AI technology team tested the accuracy of native text analysis against the Google Translate API on short-form text (customer reviews translated from Chinese to English and from Russian to English).
We’ve always known that routing text data through ‘Translation Agents’ distorts the accuracy of the analysis, but even we were shocked by our findings!
❌ A minimum of 10% distortion in both classification and sentiment analysis when translating short-form text (e.g. customer reviews under SKUs on e-commerce platforms) into English only, a target language for which Google Translate has a reasonably good model.
❌ With longer-form text (e.g. SEC filings, legal documentation, publisher content) and a lower-quality open-source translation agent translating from other languages (e.g. Turkish into English), we confidently estimate the distortion to be at least 40%.
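The page does not define how “distortion” was measured. One plausible reading, assumed here, is the share of items whose predicted label changes when the same model scores machine-translated text instead of the original. A minimal sketch of that disagreement rate follows; the function name and toy labels are hypothetical and are not HIPSTO’s test data or methodology.

```python
def label_distortion(native_labels, translated_labels):
    """Fraction of items whose label differs between the native-text pipeline
    and the translate-then-analyze pipeline (assumed metric definition)."""
    if len(native_labels) != len(translated_labels):
        raise ValueError("both pipelines must score the same items")
    if not native_labels:
        return 0.0
    changed = sum(a != b for a, b in zip(native_labels, translated_labels))
    return changed / len(native_labels)


# Toy sentiment labels for the same five reviews (illustrative only).
native = ["pos", "neg", "neu", "pos", "neg"]
translated = ["pos", "neu", "neu", "pos", "neg"]
print(f"{label_distortion(native, translated):.0%}")  # 20%
```

Measuring both pipelines against human gold labels would be a stricter alternative; this sketch only captures how often translation flips the outcome.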
At the scale of millions of data points, this is a mission-critical level of inaccuracy, and brands are paying for it.
Want +35% Accuracy at -65% Cost in Your Text Data Analytics?
We have dubbed our proprietary web text data extraction and labeling technology Blind Vision. It is now, arguably, best in class globally and superior to established, trusted Computer Vision-based and other technologies.
Test studies* were conducted against a US-based web data integration platform that uses Computer Vision, and the results are conclusive: Blind Vision delivered a 35% increase in accuracy and 65% lower running costs.
We like to remain a little secretive about the ‘sauce’, but we can say that Blind Vision combines sophisticated Raw Code Processing algorithms and our own deep learning network architecture.
Advanced Sentiment Analysis
Sentiment analysis is a very powerful tool with many commercial applications. However, it is very difficult to do well. Many claim to have a sentiment solution, but on closer analysis, few in the market actually deliver one. Current solutions suffer from poor consistency, limited accuracy and a lack of advanced deep learning techniques.
Humans judge the polarity of text easily; machines do not. We have developed an apex sentiment tool, built on our proprietary neural network architecture, that enables real Natural Language Understanding (NLU) and emulates how humans judge both the content and the context of text.
Our solution provides consistent sentiment analysis of high- or low-frequency content, in long or short form, across 100+ languages. And we do all of this with an impressive F1 score of 0.9443!
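The F1 score quoted above is the harmonic mean of precision P and recall R: F1 = 2PR / (P + R). The page does not say how the score was averaged across sentiment classes or which test set it comes from, so this sketch assumes macro-averaging (the unweighted mean of per-class F1), one common convention for multi-class sentiment. The labels are toy data.

```python
def f1_per_class(y_true, y_pred, label):
    """F1 = 2PR / (P + R), treating `label` as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_per_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# Toy example: one positive review misclassified as negative.
y_true = ["pos", "pos", "neg", "neg"]
y_pred = ["pos", "neg", "neg", "neg"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7333
```

Macro-averaging weights rare classes (often "neutral") as heavily as common ones; a micro- or frequency-weighted average could give a noticeably different number on the same predictions.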
Named Entity Recognition
Valuable business information is buried in an explosion of data that is largely (79%) text-based, most of which sits as unstructured data on the web. The ability to extract, organize, analyze and connect large amounts of unstructured text data has become paramount.
Extracting, classifying and connecting entities with Named Entity Recognition (NER) technology plays an important role in structuring unstructured data and identifying valuable information. NER is a foundational building block of any information discovery pipeline and the basis for most Natural Language Processing (NLP) solutions.
We have built a new industry standard: multilingual NER in 100+ languages, unrivaled in accuracy against current open-source solutions, with an F1 score of 0.95.
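For NER, F1 is usually computed at the entity level rather than per token. Below is a minimal sketch of the exact-match convention used in CoNLL-style evaluation, where a prediction counts only if both its span and its type match a gold entity. The (start, end, type) tuple format is an assumption for illustration, and the page does not state which convention produced the 0.95 figure.

```python
def entity_f1(gold, predicted):
    """Entity-level F1: a predicted entity is a true positive only if its
    span and type exactly match a gold entity."""
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Entities as (start_char, end_char, type) for "Tim Cook visited Berlin."
gold = [(0, 8, "PER"), (17, 23, "LOC")]
pred = [(0, 8, "PER"), (17, 23, "ORG")]  # right span, wrong type
print(entity_f1(gold, pred))  # 0.5
```

Under this strict convention a correct span with the wrong type earns no credit, which is why multilingual NER scores are sensitive to type confusions as well as boundary errors.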
Web scraping may sound easy, but it’s not! We have solved the 5 most prevalent issues faced by standard web scraping methods.
One key issue is constant website layout change. SEO improvements and UX/UI updates are delivered through changes to a page's HTML layout. As a result, the element locators a scraper has been configured to use for extracting data change, which breaks the scraping process: the scraper extracts incorrect data or none at all. Updating these configurations takes a lot of manual effort, and maintaining thousands of sources becomes nearly impossible.
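To make this failure mode concrete, here is a minimal sketch of a health check that flags fields whose locator stopped returning text after a layout change. It assumes a scraper whose locators are plain CSS class names and uses only Python's standard-library html.parser. The class names, field names and config format are hypothetical, and the sketch covers detection only, not the automated reconfiguration described below.

```python
from html.parser import HTMLParser

# Tags that never get an end tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collects the text inside every element whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0  # > 0 while inside a matching element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1  # nested element inside a match
        elif self.target_class in classes:
            self.depth = 1  # entering a matching element

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract(html, target_class):
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def check_source(html, config):
    """Return the fields whose configured locator no longer yields any text."""
    return [field for field, css_class in config.items() if not extract(html, css_class)]


# Hypothetical config: field name -> CSS class the scraper was set up to use.
config = {"review_text": "review-text"}
old_html = '<div class="review"><p class="review-text">Great phone</p></div>'
new_html = '<div class="review"><p class="rv-body">Great phone</p></div>'  # after a redesign

print(check_source(old_html, config))  # []
print(check_source(new_html, config))  # ['review_text']
```

A check like this only catches locators that return nothing; a locator that silently matches the wrong element after a redesign needs content-level validation as well.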
We have fully automated source reconfiguration to deliver a truly scalable, leading-edge web scraping solution: one that operates in 100+ languages and can scrape any text data from any web source in real time.
Automated Text Classification
We have built an industry-leading, multilingual, automated text classification capability that delivers superior accuracy (underpinned by Natural Language Understanding), uses no language translation layer (which significantly distorts the meaning of content) and can process content of any length, all through our single, one-stop-shop platform.
At these accuracy levels, classification now mimics human understanding of any text.
BERT Semantic Similarities
Research into using the multilingual BERT model to determine the semantic similarity of news content.
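Multilingual BERT does not output a similarity score directly. A common approach, assumed here because the page does not say which method the research used, is to pool the model's token embeddings into one vector per text and compare the vectors with cosine similarity. In the sketch below, toy 3-dimensional vectors stand in for real encoder outputs (bert-base-multilingual produces 768-dimensional hidden states).

```python
import math


def mean_pool(token_vectors):
    """Average per-token embedding vectors into a single text vector."""
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]


def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (|u| * |v|); 1.0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # undefined for zero vectors; treat as "no similarity"
    return dot / (norm_u * norm_v)


# Toy 3-dimensional token vectors standing in for real encoder outputs.
doc_a = mean_pool([[1.0, 0.0, 1.0], [1.0, 0.0, 1.0]])  # -> [1.0, 0.0, 1.0]
doc_b = mean_pool([[1.0, 0.0, 1.0], [1.0, 2.0, 1.0]])  # -> [1.0, 1.0, 1.0]
print(round(cosine_similarity(doc_a, doc_b), 4))  # 0.8165
```

Mean pooling is only one option; using the [CLS] vector or a model fine-tuned for sentence similarity are common alternatives, and raw (not fine-tuned) BERT embeddings are known to be a weak similarity signal on their own.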
Semantic Similarity of Arbitrary-Length Text Content
Research into the specifics of determining the semantic similarity of arbitrary-length text content with multilingual Transformer-based models.
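BERT-style models accept at most 512 tokens per input, including the [CLS] and [SEP] special tokens, so longer content has to be split before it can be encoded. Below is a minimal sketch of one common workaround: overlapping sliding windows, with the window embeddings averaged and weighted by token count. The window sizes and the `encode` callable (a hypothetical stand-in for a real Transformer encoder) are assumptions, not necessarily what the research did.

```python
def sliding_windows(tokens, max_len=510, stride=384):
    """Split tokens into windows of at most max_len tokens; each window starts
    `stride` tokens after the previous one, so neighbors overlap by max_len - stride."""
    if not 0 < stride <= max_len:
        raise ValueError("need 0 < stride <= max_len")
    if not tokens:
        return []
    windows, start = [], 0
    while True:
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the text
        start += stride
    return windows


def pooled_document_vector(tokens, encode, max_len=510, stride=384):
    """Encode each window with `encode` (hypothetical: returns one vector per
    window) and average the vectors, weighting each by its token count."""
    windows = sliding_windows(tokens, max_len, stride)
    if not windows:
        raise ValueError("cannot embed empty text")
    vectors = [encode(w) for w in windows]
    total = sum(len(w) for w in windows)
    dim = len(vectors[0])
    return [sum(len(w) * v[i] for w, v in zip(windows, vectors)) / total
            for i in range(dim)]


print(sliding_windows(list(range(10)), max_len=4, stride=3))
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

The overlap keeps sentences that straddle a window boundary visible in full to at least one window. Two documents of different lengths can then be compared by applying cosine similarity to their pooled vectors.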