True data science

Deep academic research

Related Content Detection

Discovering and determining how to treat ‘similar content’ is one of the fundamental challenges in Natural Language Processing (NLP). This problem of semantic similarity becomes more complex when automatic search, retrieval and analysis focus on multilingual text content drawn from several web sources.

Our engineering and data science team has released two science papers detailing how to use and extend the current leading Transformer models, then apply them to tasks focused on the semantic similarity of news content.

We are now able to find related content in 106 languages, cross-lingually, and irrespective of the length of the input text. A global breakthrough in the field of NLP science.
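The core idea behind this kind of related-content detection can be illustrated with a minimal sketch: a multilingual Transformer maps each document to an embedding vector, and related articles — even in different languages — end up close together, as measured by cosine similarity. The toy vectors below are hypothetical stand-ins for real model output; this is a generic illustration, not the method described in the papers.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors:
    # dot(a, b) / (||a|| * ||b||), ranging from -1 to 1.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings standing in for multilingual sentence vectors.
# In a real system these come from a multilingual Transformer, so an
# English article and its German counterpart map to nearby vectors.
embedding_en = [0.9, 0.1, 0.3]      # "Central bank raises interest rates"
embedding_de = [0.85, 0.15, 0.35]   # "Zentralbank erhöht Zinsen"
embedding_other = [0.1, 0.9, 0.2]   # unrelated sports article

related = cosine_similarity(embedding_en, embedding_de)
unrelated = cosine_similarity(embedding_en, embedding_other)
print(related > unrelated)  # the cross-lingual pair scores higher
```

Because similarity is computed in the shared embedding space rather than on surface words, the comparison works across languages and is independent of input length once texts are reduced to fixed-size vectors.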

Abstractive Text Summarization


The majority of text summarization tools on the market work via a so-called ‘Extraction Approach’. In the fields of Natural Language Understanding and Natural Language Generation (NLG), the most innovative route is currently via ‘Abstractive Technologies’.
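To make the contrast concrete, here is a minimal sketch of the extraction approach: sentences are scored (here by simple word frequency, a common baseline) and the top-scoring ones are copied verbatim — no new text is generated. This is a generic illustration of extractive summarization, not any particular product's implementation; an abstractive system would instead generate new sentences of its own.

```python
import re
from collections import Counter

def extractive_summary(text, num_sentences=1):
    """Toy extractive summarizer: score each sentence by the average
    document-wide frequency of its words, then copy the top-scoring
    sentences verbatim in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"\w+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    chosen = set(sorted(sentences, key=score, reverse=True)[:num_sentences])
    return " ".join(s for s in sentences if s in chosen)

text = ("The new model handles many languages. "
        "Language support matters because news is written in many languages. "
        "Lunch was served at noon.")
print(extractive_summary(text, 1))
```

Extraction can only recombine sentences that already exist, which is why abstractive methods — which paraphrase and compress — are considered the more ambitious route.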

‘Abstractive Technology’ approaches still suffer from accuracy shortcomings and are only available as single-language tools. Our technology team has created a radical method and a new neural network based on the mT5 Transformer. We have overcome these inaccuracies and created a tool that is multilingual.

We aim to release our third science paper on this topic in H1 2021.

Science Papers 

Paper 1:

BERT Semantic Similarities

June 2019

Research on the specific features of determining the semantic similarity of arbitrary-length text content using multilingual Transformer-based models.

Paper 2:

Semantic Similarity of Arbitrary-length Text Content

June 2019

Research into the possibilities of using the multilingual BERT model for determining the semantic similarity of news content.

Paper 3:


Multilingual Abstractive Text Summarization

Due Q1 2021