Reinventing Automatic Text Classification

In The Cookie-less Age


Automatic text classification – taking an automated approach to meaningfully labelling a piece of text based on its subject, content type or other contextual information – is a key practice for customer experience professionals, data providers and global brands. It has a wide range of applications, including the critical area of brand safety and suitability, and is set to rocket in importance as third-party tracking cookies are phased out and new contextual approaches are needed to drive effective, low-risk online advertising.

In response to this ever-growing need, we have recently launched a major update to the automatic text classification capabilities built into our unique FalconV platform. FalconV is HIPSTO’s ground-breaking, end-to-end, cloud-based artificial intelligence platform, which delivers unmatched AI capabilities and enables unrivalled resource efficiency while analyzing content in more than 100 languages. Automatic text classification is available as a microservice, delivered through an API.




In this long read, we’re going to explore the world of automatic text classification, why it’s so important to the future of online advertising, and how HIPSTO’s approach, including multi-language coverage, excellent levels of accuracy (underpinned by Natural Language Understanding) and the advantages of a single platform, makes automatic text classification fit for purpose in the cookie-less age.

As with our previous posts, we’ve written this article for a general business audience. If you’re already familiar with the brand safety and suitability market, you might want to skip to the second half of the article, where we explore HIPSTO’s approach in more detail.



What is automatic text classification?


Automatic text classification is the practice of applying an appropriate, pre-defined label or category to a piece of text based on an automated approach. If you have a collection of articles that you want to categorize into different industry sectors based on the subject of each piece, you may have a pre-agreed list of terms for industry sectors to use such as “Retail” or “Automotive”. An article about shopping trends might be labelled “Retail”, another about car manufacturing trends labelled “Automotive” and so on.

An automatic text classification engine will analyze text and then apply the right labels for each article. Accuracy is extremely important here; the FalconV platform leverages cutting-edge AI to intelligently analyze your content, mimic human understanding and apply the right labels to each article.
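To make the idea of a pre-agreed label list concrete, here is a deliberately naive Python sketch that scores each label by counting keyword hits. The label names and keyword sets are hypothetical, and this is not how FalconV works: keyword matching is exactly the kind of shallow approach that struggles with ambiguity and context.

```python
from typing import Dict, Optional, Set

# Hypothetical label set and keywords, for illustration only.
LABEL_KEYWORDS: Dict[str, Set[str]] = {
    "Retail": {"shopping", "store", "stores", "retailer", "checkout"},
    "Automotive": {"car", "cars", "vehicle", "vehicles", "engine"},
}

def classify(text: str) -> Optional[str]:
    """Return the pre-defined label with the most keyword hits, or None if nothing matches."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    scores = {
        label: sum(w in keywords for w in words)
        for label, keywords in LABEL_KEYWORDS.items()
    }
    best_label = max(scores, key=lambda label: scores[label])
    return best_label if scores[best_label] > 0 else None

print(classify("Shopping trends: consumers return to the high-street store"))  # Retail
print(classify("Car manufacturing trends in electric vehicle engines"))  # Automotive
```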



What is automatic text classification used for?


Automatic text classification is used across multiple business processes and different content collections. A news website or service that lets users filter items by subject, or presents content across different themed pages, might use automatic text classification to ensure that the right content appears on the right page and that the correct search filters can be applied. Automatic text classification might also help to populate a database such as a CRM system by adding appropriate identifiers to text.

Automatic Text Classification diagram

But it is in the world of programmatic advertising and brand safety and suitability where automatic text classification is coming into its own.

Brand safety and brand suitability are important and linked concepts in online advertising. Brand safety is about protecting brands from placing their online advertisements adjacent to content that may damage their reputation, including illegal, pornographic, hate speech, violent and obscene content. Brand suitability is the other side of the coin, ensuring advertisements are placed alongside the right content to reach their target audience; an automotive manufacturer might place an advertisement on an article about buying cars or adjacent to content that positively enhances the reputation of the brand.
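At its simplest, a brand safety rule can be expressed as a blocklist applied to the labels a classifier has assigned to a page. The sketch below assumes the page has already been labelled and uses hypothetical category names; real policies are usually more nuanced (for example, graded risk levels rather than a single yes/no).

```python
from typing import FrozenSet, Iterable, Set, Tuple

# Hypothetical blocklist; each brand would define its own policy.
UNSAFE_CATEGORIES: FrozenSet[str] = frozenset(
    {"Illegal Content", "Adult Content", "Hate Speech", "Violence", "Obscenity"}
)

def brand_safety_check(page_labels: Iterable[str]) -> Tuple[bool, Set[str]]:
    """Return (is_safe, flagged_labels) for the labels assigned to a page."""
    flagged = set(UNSAFE_CATEGORIES.intersection(page_labels))
    return (not flagged, flagged)

print(brand_safety_check(["Automotive", "Personal Finance"]))  # (True, set())
print(brand_safety_check(["News", "Violence"]))  # (False, {'Violence'})
```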

As we will see, because of the demise of third-party tracking cookies, automatic text classification is set to play an increasingly essential role in supporting brand safety and suitability, matching brands to the right content to advertise on.



The demise of tracking cookies


Third-party tracking cookies still underpin the global online advertising industry. By using cookies to track users’ behavior across multiple sites, advertisers can display relevant advertisements to each individual user.

But third-party tracking cookies as they stand are set to be effectively phased out. Extensive concerns over digital privacy mean regulators, such as those in the EU, are introducing legislation to control their use, including requiring consent from site visitors. Shifting customer expectations are also reducing their influence: does anybody really want adverts appearing everywhere based on their browsing patterns?

Major browsers such as Safari and Firefox have already reduced their support for tracking cookies and introduced alternative measures, but the biggest step will be the withdrawal of support for tracking cookies in Chrome, the world’s most popular browser. Although Google has now delayed this until 2023, it’s only a stay of execution. Third-party tracking cookies are effectively dead in the water.


The rise of contextual advertising


The demise of third-party cookies raises the question of what will replace them to ensure online adverts are directed to relevant target audiences. The advertising industry is still searching for that answer, and some approaches are being developed, such as Google’s Privacy Sandbox project.

However, one highly likely outcome of the demise of tracking cookies is the rise (or, more accurately, the comeback) of contextual advertising. This is the practice of placing advertisements on websites based on the content of a page and its subject or subjects, rather than relying on the third-party data provided by cookies. Put simply, an article about new cars is more likely to be visited by a user looking for a car. An automotive manufacturer or retailer will therefore want to advertise on that page to reach the right audience and achieve strong brand suitability.
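Contextual placement can be sketched as a matching problem: given the labels on a page, rank candidate ads by how many of their target categories the page matches. The ad IDs and targets below are hypothetical, and counting overlapping categories is just one simple scoring choice among many.

```python
from typing import Dict, List, Set

def rank_ads(page_labels: Set[str], ads: Dict[str, Set[str]]) -> List[str]:
    """Return ad IDs ordered by number of target categories matched; ads with no match are dropped."""
    scored = [(len(targets & page_labels), ad_id) for ad_id, targets in ads.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # most matches first, ties broken by ID
    return [ad_id for matches, ad_id in scored if matches > 0]

# Hypothetical ads and their target categories.
ads = {
    "car-dealer": {"Automotive", "Personal Finance"},
    "car-insurer": {"Automotive"},
    "pet-food": {"Pets"},
}
print(rank_ads({"Automotive", "Personal Finance"}, ads))  # ['car-dealer', 'car-insurer']
```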

Some analysts expect contextual advertising to grow very significantly as tracking cookies decline. Global Industry Analysts predicts that the value of the global contextual advertising market will more than double over six years, rising from approximately US$154.7 billion in 2020 to an estimated US$335.1 billion by 2026.
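The forecast figures imply a growth rate that is easy to check. The short calculation below uses only the two market values quoted for 2020 and 2026 and derives the compound annual growth rate (CAGR) over the six yearly steps between them; the resulting percentage is our own arithmetic, not a figure taken from the report.

```python
start_value = 154.7  # US$ billion, 2020
end_value = 335.1    # US$ billion, 2026
years = 2026 - 2020  # six compounding periods

growth_multiple = end_value / start_value
cagr = growth_multiple ** (1 / years) - 1  # compound annual growth rate

print(f"Growth multiple: {growth_multiple:.2f}x")  # Growth multiple: 2.17x
print(f"Implied CAGR: {cagr:.1%}")                 # Implied CAGR: 13.7%
```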

Global Contextual Market diagram


Why accurate automatic text classification is critical to contextual advertising


Successful contextual advertising that meets the safety and suitability expectations of global, fast-moving brands is heavily reliant on accurate information about the content on a page. Brands need to be certain about the subject of a webpage to be sure their advertising will be effective and safe there. The only way to do this at the required scale and speed is through automatic text classification. Accuracy is a necessity: the more accurate the labels given to a piece of text, the better the levels of brand safety and suitability, and the more confidence advertisers have.

But not all solutions that deliver automatic text classification are equal, and inaccuracy can be a real problem. Ensuring high levels of accuracy for text requires intelligent approaches that mimic human understanding. Take the word “Fiat” – does this describe a car brand or a government-issued currency? To meet the needs of global brands and deliver the best possible accuracy, state-of-the-art approaches to artificial intelligence are required.
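The “Fiat” example can be illustrated with a toy word-sense disambiguation sketch: the ambiguous word on its own says nothing, so the code looks at the surrounding words instead. The cue lists are hypothetical and hand-written; Transformer models such as mT5 learn contextual representations from data rather than relying on lists like these.

```python
from typing import Dict, Set

# Hypothetical context cues for each sense of "fiat"; illustration only.
SENSE_CUES: Dict[str, Set[str]] = {
    "car brand": {"car", "cars", "model", "engine", "dealer", "hatchback", "drive"},
    "fiat currency": {"currency", "money", "central", "bank", "inflation", "bitcoin", "dollar"},
}

def disambiguate_fiat(sentence: str) -> str:
    """Pick the sense of 'fiat' whose cue words overlap most with the sentence, or 'unknown'."""
    words = {w.strip(".,!?;:\"'").lower() for w in sentence.split()}
    scores = {sense: len(cues & words) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=lambda sense: scores[sense])
    return best if scores[best] > 0 else "unknown"

print(disambiguate_fiat("The new Fiat hatchback has a smaller engine."))  # car brand
print(disambiguate_fiat("Bitcoin was pitched as an alternative to fiat money issued by a central bank."))  # fiat currency
```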



HIPSTO’s approach to automatic text classification


Because automatic text classification is becoming a strategic priority for brand safety and suitability, we’re excited to release our new, state-of-the-art approach to automatic text classification on the FalconV platform.



How it works


HIPSTO applies the same proprietary architecture to automatic text classification as to our industry-leading sentiment analysis. First, we apply the state-of-the-art mT5 (multilingual T5) Transformer, an advanced neural network architecture that captures the context of a piece of text. We then run our own unique set of algorithms to assign the appropriate label to each piece of text, effective across more than 100 languages. The classifier is made available as an AI microservice on the FalconV platform, accessed through the API our team has built.
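The two-stage shape described here (an encoder that turns text into a numeric representation, followed by a separate step that assigns a label) can be sketched in a few lines. To keep the sketch self-contained, the encoder is a hashed bag-of-words stand-in for mT5, which captures none of the context a Transformer would, and the labelling step is a nearest-centroid classifier. Both are assumptions for illustration, not HIPSTO’s proprietary algorithms, and the training sentences are made up.

```python
import math
import zlib
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

DIM = 256  # size of the toy embedding (an assumption)

def encode(text: str) -> List[float]:
    """Stand-in for a Transformer encoder: a hashed, L2-normalized bag-of-words vector."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        token = token.strip(".,!?;:")
        if token:
            vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def fit_centroids(examples: Sequence[Tuple[str, str]]) -> Dict[str, List[float]]:
    """Nearest-centroid training: average the encoded vectors of each label's examples."""
    sums: Dict[str, List[float]] = defaultdict(lambda: [0.0] * DIM)
    counts: Dict[str, int] = defaultdict(int)
    for text, label in examples:
        for i, v in enumerate(encode(text)):
            sums[label][i] += v
        counts[label] += 1
    return {label: [v / counts[label] for v in vec] for label, vec in sums.items()}

def classify(text: str, centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid has the largest dot product with the encoded text."""
    query = encode(text)
    return max(centroids, key=lambda label: sum(q * c for q, c in zip(query, centroids[label])))

# Hypothetical labelled examples.
TRAINING = [
    ("New electric car models arrive at dealers", "Automotive"),
    ("The engine and gearbox of this car", "Automotive"),
    ("Shoppers flock to stores for discounts", "Retail"),
    ("Online shopping and store loyalty schemes", "Retail"),
]
centroids = fit_centroids(TRAINING)
print(classify("A review of the latest car engine", centroids))
print(classify("Shoppers hunt for discounts in online stores", centroids))
```

Keeping the encoder and the labelling step separate means either can be swapped independently, for example replacing the toy encoder with a real multilingual model while keeping the same label set.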

Getting to this point has required significant effort, including a mind-bogglingly huge dataset of publicly available internet pages spanning more than 20 years, as well as multiple rounds of training and testing. The result is an exciting extension to our FalconV platform with several strengths, which we cover below.


Natural Language Understanding


HIPSTO’s proprietary approach to AI, combining Transformers, our own neural network architectures and a further set of algorithms, goes well beyond Natural Language Processing (NLP) and produces results based on Natural Language Understanding, mimicking how humans analyze and understand a piece of content. Automatic text classification is not straightforward, but our approach delivers excellent levels of accuracy.


Multi-language coverage


Global brands require content labelling across multiple languages. We provide automatic text classification with consistently high accuracy across more than 100 languages, enabled by our proprietary architecture. Some automatic text classification services from other providers rely on a translation layer to operate across multiple languages, but translation can distort the meaning of an item and lead to inaccurate results. We do not rely on any translation services, which leads to more accurate classification.


Automatic classification to IAB Content Taxonomy 2.2 Tier 1 categories


Standardizing the classifications brands use to describe content will underpin more effective brand safety and suitability solutions and support programmatic advertising as third-party tracking cookies disappear. Our new automatic text classification now covers all thirty Tier 1 categories of the Interactive Advertising Bureau (IAB) Content Taxonomy version 2.2. We are considering mapping to further tiers, as well as to other relevant industry-standard classifications.
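Mechanically, aligning with a shared taxonomy comes down to a lookup from in-house labels to standard names, plus a check that every target is valid. The sketch below uses a handful of IAB Tier 1 names purely as examples (the IAB Content Taxonomy specification is the authoritative list), and the in-house labels are hypothetical.

```python
from typing import Dict, List, Set

# A small, illustrative subset of IAB Content Taxonomy Tier 1 names (not the full list).
IAB_TIER1_SUBSET: Set[str] = {"Automotive", "Travel", "Sports", "Shopping", "Real Estate"}

# Hypothetical in-house labels mapped onto the standard vocabulary.
CUSTOM_TO_IAB: Dict[str, str] = {
    "cars": "Automotive",
    "ev-news": "Automotive",
    "holidays": "Travel",
    "football": "Sports",
    "retail-deals": "Shopping",
}

# Sanity check: every mapping target must be a known Tier 1 name.
assert set(CUSTOM_TO_IAB.values()) <= IAB_TIER1_SUBSET

def to_iab(custom_labels: List[str]) -> List[str]:
    """Translate in-house labels to Tier 1 names, de-duplicated in order; unmapped labels are skipped."""
    result: List[str] = []
    for label in custom_labels:
        iab = CUSTOM_TO_IAB.get(label)
        if iab is not None and iab not in result:
            result.append(iab)
    return result

print(to_iab(["cars", "ev-news", "holidays", "unknown"]))  # ['Automotive', 'Travel']
```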


Long- and short-form content coverage


Text content comes in many lengths and types, including articles, reports, web pages, social media posts, discussion threads, video transcripts, product reviews and more. Organizations may need to apply automatic text classification to one or more of these content types now and add further content types later. Our solution covers both long- and short-form content, supporting what you need today and tomorrow.
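A common way to handle documents longer than a model’s maximum input is to split them into overlapping windows, score each window and aggregate the results. The window size, overlap and mean aggregation below are assumptions rather than a description of FalconV’s internals, and `score_chunk` (with the hypothetical `toy_scorer`) stands in for whatever per-chunk classifier is used.

```python
from typing import Callable, Dict, List

def chunk_words(text: str, max_words: int = 200, overlap: int = 50) -> List[str]:
    """Split text into overlapping windows of at most max_words words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    if len(words) <= max_words:
        return [" ".join(words)]  # short-form content: a single window
    step = max_words - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the text
    return chunks

def classify_any_length(
    text: str,
    score_chunk: Callable[[str], Dict[str, float]],
    max_words: int = 200,
    overlap: int = 50,
) -> Dict[str, float]:
    """Score every window with score_chunk and average each label's score across windows."""
    chunks = chunk_words(text, max_words, overlap)
    totals: Dict[str, float] = {}
    for chunk in chunks:
        for label, score in score_chunk(chunk).items():
            totals[label] = totals.get(label, 0.0) + score
    return {label: total / len(chunks) for label, total in totals.items()}

# Hypothetical per-chunk scorer standing in for a real model.
def toy_scorer(chunk: str) -> Dict[str, float]:
    return {"Automotive": 1.0 if "car" in chunk.split() else 0.0}

long_text = " ".join(["car"] * 10 + ["filler"] * 490)  # 500 words
print(len(chunk_words(long_text)))  # 3
print(classify_any_length(long_text, toy_scorer))
```

Averaging dilutes a topic that appears in only one part of a long document; taking the maximum score per label instead is a common alternative when any mention should count.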


Single platform providing cost-effectiveness and scalability


Because HIPSTO’s FalconV is a single platform that can be deployed across multi-language text content, it can prove more efficient, cost-effective and scalable than other solutions, without compromising accuracy. The language and content type coverage of some other automatic text classification providers is limited, so you may have to deploy more than one solution to achieve the reach you need.

Deploying multiple solutions rapidly racks up costs in server capacity and licensing fees, and the resulting technical setup can become complicated and hard to manage. FalconV requires much less effort: one platform, one end-to-end solution and one API, delivering robust automatic text classification into your system.


Flexibility for custom use cases and taxonomies

Brands, data platforms and technical providers may have custom needs around automatic text classification, based on specific use cases and organization-specific taxonomies. We can work in partnership with you to train the FalconV platform to meet your custom needs. Contact us if you’d like to discuss this option.


Delivering automatic text classification


Here at HIPSTO, we’re excited about the launch of our new automatic text classification approach on the FalconV platform.

Sebastian Owen, HIPSTO’s CEO, comments: “Automatic text classification is about to get serious. The inevitable demise of tracking cookies means there will be a far greater need for highly accurate text classification to support the global online advertising industry. Solutions that provide automatic text classification need to raise their game. With precise accuracy, multi-language coverage, alignment with the IAB Content Taxonomy Tier 1 categories and coverage of content of any length, all available in one single platform, HIPSTO’s advanced capabilities deliver text classification that supports digital customer experience and online advertising in the cookie-less age.”

For more information about automatic text classification and the FalconV platform, get in touch!


