Revolutionary Text Data Extraction

Unbeatable Accuracy, Scale and Cost

Blind Vision

 

We’re on a perpetual curve of explosive data growth.

 

By 2025, global data creation is projected to grow to more than 180 zettabytes, leaving organisations, from governments to global businesses to start-ups, scrambling to monitor, extract and leverage web content for competitive decision-making and risk management.

Luckily, the next generation of text data extraction is here.

Blind Vision is HIPSTO’s revolutionary, proprietary data extraction and labelling technology for web pages.

It offers a game-changing, AI-driven approach to extracting meaningful data across the web, significantly improving on existing solutions – faster, cheaper and more accurate.

Blind Vision is more than a product – it’s a paradigm shift for an expanding data universe.

Diagram of HIPSTO's Blind Vision Pipeline

Skip face value;
Trust in the numbers.

Blind Vision is superior to Computer Vision.

A Privately Commissioned Test vs. Computer Vision shows:

Blind Vision has +35% Accuracy and -65% Cost compared to Computer Vision

How Does Blind Vision Work?

Blind Vision: UNDERSTANDING

Existing AI extraction solutions struggle to handle the volume of data across a diverse range of web sources and formats.

Web page layouts and code structures change continuously, even for a single source – a huge headache that demands constant human maintenance effort.

Basic raw data processing alone delivers poor results: massive datasets are needed just to reach disappointing levels of accuracy – an approach that simply isn’t sustainable or scalable.

Blind Vision’s layout understanding and processing technology interprets any global web page layout – just like a human – identifies changes to web pages automatically and reconfigures itself to the new code structure, cutting out manual updates.
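
As a rough illustration of the "detect layout changes automatically" idea (not HIPSTO's proprietary method), the sketch below fingerprints a page's tag structure with Python's BeautifulSoup and hashlib; a changed fingerprint flags that the source needs reconfiguration. The function name and sample HTML are assumptions made for the example.

```python
import hashlib
from bs4 import BeautifulSoup

def layout_fingerprint(html: str) -> str:
    """Hash a page's tag structure while ignoring its text content."""
    soup = BeautifulSoup(html, "html.parser")
    skeleton = " ".join(tag.name for tag in soup.find_all(True))
    return hashlib.sha256(skeleton.encode("utf-8")).hexdigest()

# Compare today's fingerprint with the one stored when the source was
# configured: a mismatch signals a layout change that needs reconfiguration.
old_html = "<html><body><div class='main'><p>old article</p></div></body></html>"
new_html = "<html><body><section><p>new article</p></section></body></html>"

if layout_fingerprint(old_html) != layout_fingerprint(new_html):
    print("Layout changed - reconfigure the extractor for this source")
else:
    print("Layout unchanged")
```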

Blind Vision: CLEANING

AI analytics can be incredibly powerful…but they are only as powerful as the quality of their input.

Text found on many web pages is often compromised by ‘content-related noise’, such as advertisements or unrelated excerpts from other sources.

Existing technologies including Raw Code Processing and Computer Vision are being used to cleanse and process text data, but the results are unreliable, costly and inefficient.

Blind Vision’s cutting-edge web page layout understanding and processing technology automatically cleans web pages and transforms ‘unclean’ unstructured text into a structured format for further analytics – at any scale.
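
To make the cleaning step concrete, here is a minimal, illustrative sketch in Python using BeautifulSoup: it strips typical noise elements and emits a structured record. It is not Blind Vision's pipeline; the function name and the list of noise tags are assumptions for the example.

```python
import json
from bs4 import BeautifulSoup

def clean_page(html: str) -> dict:
    """Strip common noise elements and return the page as structured text."""
    soup = BeautifulSoup(html, "html.parser")

    # Remove typical 'content-related noise': navigation, ads, scripts, styling.
    for tag in soup.find_all(["script", "style", "nav", "aside", "footer", "iframe"]):
        tag.decompose()

    title = soup.title.get_text(strip=True) if soup.title else ""
    paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]

    # Emit a structured record that downstream analytics can consume.
    return {"title": title, "paragraphs": [p for p in paragraphs if p]}

sample = (
    "<html><head><title>Quarterly results</title></head>"
    "<body><nav>Home | About</nav><p>Revenue grew 12% year on year.</p>"
    "<aside>Advertisement</aside></body></html>"
)
print(json.dumps(clean_page(sample), indent=2))
```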

Blind Vision: INPUT

Data can drive decision-making and empower customer insights – but only if you analyse it.

Over 75% of the current data explosion is unstructured and text-based, a major headache for global companies that know insights abound in unstructured text data but don’t have hundreds of man-hours to process it.

HIPSTO’s end-to-end, reliable, no-code AI text data pipelines extract and analyse actionable, valuable insights, using dependable input layers that provide efficient, accurate extraction and clean, structured text data for further analysis, all at scale.

Blind Vision: Applications

One of Blind Vision’s most exciting applications is in global e-commerce.

To us humans, all e-commerce platforms look the same, but not to computers.

Each platform has its own specific code structure that changes continuously.

Global brands that sell their products in many territories and on many e-commerce platforms are forced to engage crawling and scraping agencies, or build their own capability, to keep pace with these code changes and capture data – an enormous investment of time and money.

Built into our own advanced Web Scraper, Blind Vision understands every global e-commerce platform, creating a single point and fully automated extraction capability that significantly reduces your global data extraction costs.

Want +35% Accuracy, With -65% Cost in Your Text Data Analytics?

Shine a light on your hidden text data insights, anywhere in the world, with Blind Vision.

  Blind Vision

 

We have dubbed our proprietary web text data extraction and labelling technology Blind Vision. It is now arguably the best in class globally, superior to the established and trusted Computer Vision and other technologies.

Test studies* have been conducted against a US-based web data integration platform that uses Computer Vision, and the results are conclusive: a +35% increase in accuracy and a 65% reduction in running costs using Blind Vision technology.

We like to remain a little secretive about the ‘sauce’, but we can say that Blind Vision combines sophisticated Raw Code Processing algorithms and our own deep learning network architecture.


  Advanced Sentiment Analysis

 

Sentiment analysis is a very powerful tool with many commercial applications. However, it is very difficult to do well. Many claim to have a sentiment solution, but upon analysis, few in the market really do. Current solutions suffer from poor consistency, limited accuracy and lack of advanced deep learning techniques.

Humans can easily judge the polarity of text, unlike machines. We have developed an apex sentiment tool, using our proprietary neural network architecture, which enables real Natural Language Understanding (NLU) and emulates how humans judge the content and context of text.

Our solution provides consistent sentiment analysis of high or low-frequency content, in long or short format, across 100+ languages. And, we can do all of this with an impressive F1 score of 0.9443!
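
For readers who want to experiment with multilingual sentiment analysis, the snippet below runs a publicly available Hugging Face model through the transformers pipeline. It is purely illustrative: it is not HIPSTO's proprietary network and will not reproduce the F1 score quoted above.

```python
from transformers import pipeline

# Public multilingual sentiment model, used purely for illustration;
# it is not HIPSTO's proprietary network.
classifier = pipeline(
    "sentiment-analysis",
    model="nlptown/bert-base-multilingual-uncased-sentiment",
)

texts = [
    "The new release is fantastic and very easy to use.",
    "Der Kundenservice war leider eine große Enttäuschung.",  # German: "a big disappointment"
]

for text, result in zip(texts, classifier(texts)):
    print(f"{result['label']:<8} ({result['score']:.2f})  {text}")
```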



  Named Entity Recognition

 

Valuable (business) information is buried in a largely text-based (79%) data explosion, most of which also resides in unstructured data on the web. The ability to extract, organize, analyze and connect large amounts of unstructured text data has become of paramount importance.

Extracting, classifying, and connecting entities via Named Entity Recognition (NER) technology plays an important role in sorting unstructured data and identifying valuable information. NER is a key foundational block for any information discovery pipeline and the basis for most Natural Language Processing (NLP) solutions.

We have built the new industry standard: Multilingual NER (in 100+ languages) that is unrivaled in accuracy vs. current ‘open source’ solutions and performs with an F1 score of 0.95.
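
As a hands-on illustration of the NER task itself, the sketch below runs a public multilingual model through the transformers pipeline. It is not the HIPSTO model and does not reflect the accuracy figure above; the sample sentence is invented.

```python
from transformers import pipeline

# Public multilingual NER model, shown only to illustrate the task.
ner = pipeline(
    "ner",
    model="Davlan/bert-base-multilingual-cased-ner-hrl",
    aggregation_strategy="simple",
)

text = "HIPSTO announced a partnership with a retailer in São Paulo last March."
for entity in ner(text):
    print(f"{entity['entity_group']:<5} {entity['score']:.2f}  {entity['word']}")
```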



  Web Scraping

 

Web Scraping may sound easy, but it’s not! We have solved the 5 most prevalent issues faced by standard web scraping methods.

One key issue is constant website layout changes. SEO improvements and UX/UI changes are delivered through HTML layout amendments. As a result, the element locators that web scrapers rely on to extract data change, breaking the scraping process so that it extracts incorrect data or no data at all. Updating these configurations takes a lot of manual effort, and maintaining thousands of sources becomes near impossible.

We have fully automated the process of source reconfiguration to present you with a truly scalable, leading-edge web scraping solution. One that operates in 100+ languages and can scrape any text data from any web source in real-time.
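
To illustrate the underlying locator-breakage problem (rather than our automated reconfiguration itself), here is a simplified Python sketch that tries a list of candidate selectors in order, so a layout change that breaks one locator can still be absorbed by a more generic fallback. The selector list, function name and URL are placeholders, not part of our product.

```python
from typing import Optional

import requests
from bs4 import BeautifulSoup

# Hypothetical candidate locators for the same field; a production system
# would learn and update these automatically rather than hard-code them.
TITLE_SELECTORS = ["h1.article-title", "h1[itemprop='headline']", "h1"]

def scrape_title(url: str) -> Optional[str]:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    # Try each locator in turn: when a layout change breaks the first one,
    # a later, more generic locator can still recover the field.
    for selector in TITLE_SELECTORS:
        node = soup.select_one(selector)
        if node and node.get_text(strip=True):
            return node.get_text(strip=True)
    return None

print(scrape_title("https://example.com/"))
```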


  Automated Text Classification

 

We have built an industry-leading, multilingual, automated text classification capability that demonstrates superior accuracy (underpinned by Natural Language Understanding), uses no language translation layer (which significantly distorts the meaning of content) and is able to process content of any length, via our single, one-stop-shop platform.

These accuracy levels now mimic human understanding of any text.
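
As an illustration of multilingual classification without a translation layer, the sketch below uses a public zero-shot classifier from the transformers library. It is a stand-in example, not our platform; the labels and sample text are arbitrary.

```python
from transformers import pipeline

# Public multilingual zero-shot classifier, used as a stand-in example;
# it classifies the French text directly, with no translation step.
classifier = pipeline(
    "zero-shot-classification",
    model="joeddav/xlm-roberta-large-xnli",
)

labels = ["finance", "sport", "technology"]
text = "La banque centrale a relevé ses taux d'intérêt pour contenir l'inflation."  # French

result = classifier(text, candidate_labels=labels)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label:<11} {score:.2f}")
```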


  BERT Semantic Similarities

 

Research into the possibilities of using the multilingual BERT model for determining semantic similarities of news content.
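
A minimal sketch of the underlying idea, using a public multilingual sentence-embedding model from the sentence-transformers library as a stand-in for the multilingual BERT setup studied; the headlines are invented examples.

```python
from sentence_transformers import SentenceTransformer, util

# Public multilingual sentence-embedding model, standing in for the
# multilingual BERT model described in the research.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

headlines = [
    "Central bank raises interest rates to curb inflation",
    "Die Zentralbank erhöht die Zinsen, um die Inflation zu bremsen",  # German
    "Local team wins the championship after a dramatic final",
]

embeddings = model.encode(headlines, convert_to_tensor=True)
print(util.cos_sim(embeddings, embeddings))  # related stories score high, unrelated ones low
```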


  Semantic Similarity of Arbitrary-length Text Content

Research on the specific features of determining the semantic similarity of arbitrary-length text content using multilingual Transformer-based models.
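
One simple strategy for arbitrary-length content, shown purely as an assumption for illustration and not necessarily the approach taken in the research, is to chunk long texts, embed each chunk and mean-pool the chunk vectors before comparing documents:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def embed_long_text(text: str, chunk_size: int = 200):
    # Split into word chunks that fit the model's input window, embed each
    # chunk, then mean-pool the chunk vectors into one document vector.
    words = text.split()
    chunks = [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)] or [text]
    return model.encode(chunks, convert_to_tensor=True).mean(dim=0)

# Placeholder articles; in practice these would be full-length documents.
doc_a = "The central bank raised interest rates again to slow inflation across the euro area."
doc_b = "Inflation pressures pushed the monetary authority to lift its key rate for a third time."

score = util.cos_sim(embed_long_text(doc_a), embed_long_text(doc_b))
print(float(score))
```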


Read our latest blog, Native vs. Translation. Learn More >>