The NewsBots models

Our approach to using language models in macro

We have started to release our News Inflation Pressure Indices (NIPI, see here), which synthesise the inflation news in a given country and/or in a specific sector. The first results show an interesting correlation with actual inflation.

We plan to release the full NIPI data on 25/11/2020. You can register to be notified by subscribing to our Newsletter or following us on LinkedIn.

Most of our language models have been re-estimated recently, as we have added more sources, two new languages and several new features. So we take this opportunity to refresh the description of our NewsBots models.

The framework

Supervised learning

We train pre-existing language models on specific tasks, mostly classification.
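
To make this concrete, here is a minimal sketch of what such fine-tuning can look like with the Hugging Face transformers library. The checkpoint, example headlines and hyperparameters are illustrative placeholders, not our production setup.

```python
# Minimal sketch: fine-tune a pre-trained transformer as a binary
# "relevant to near-term inflation / not relevant" classifier.
# Checkpoint, data and hyperparameters are placeholders.
import torch
from transformers import (AutoModelForSequenceClassification,
                          AutoTokenizer, Trainer, TrainingArguments)

checkpoint = "roberta-base"  # generic English checkpoint, for illustration
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Toy labelled examples (1 = inflation-relevant, 0 = not).
texts = ["Supermarkets announce price increases on dairy products",
         "Local football club wins the championship"]
labels = [1, 0]
enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

class NewsDataset(torch.utils.data.Dataset):
    """Wraps tokenised headlines and labels for the Trainer."""
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="nipi-relevance", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=NewsDataset(enc, labels),
)
trainer.train()
```

The same pattern carries over to the other classifiers described further down; only the labels and the training data change.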

Specialised models beat general models

For training, we generally get better results with pre-trained models that were initially developed for the same family of tasks than with huge, generalist models.

Transformers

We use almost exclusively transformer (i.e. context-aware) models, the latest generation of language models, able to perform many language tasks roughly on par with humans.

What the NewsBots models do

1/ Identify the English news relevant to the near-term inflation forecast using the RoBERTa model, referenced below, which we have fine-tuned on the classification task.

2/ Do the same in a couple of other languages (French and Italian for now, more to come).

3/ Take out the news about CPI and other official inflation releases, using another transformer classifier, multi-language this time. (We do this because we don't want a spurious or lagging indicator.)

4/ Apply Named Entity Recognition to identify the location, using a combination of pre-trained transformer models and our own algorithms (a sketch follows this list).

5/ Apply further transformer classification models to detect the theme (utilities, food, airfares, etc.) and the sign (positive, negative or neutral). We have found that a separate model for each classification task gives the best results. We occasionally use multi-language models to get more training examples.

6/ Finally, compile the NIPI, which is the normalised difference between positive and negative news (sketched after this list).
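
For the location step (4/), a pre-trained NER pipeline is a natural entry point. A minimal sketch, assuming the public dslim/bert-base-NER checkpoint as a stand-in for the models we actually run:

```python
# Minimal sketch: extract place names from a headline with a
# pre-trained NER pipeline. The checkpoint is a public example,
# not necessarily the one used in production.
from transformers import pipeline

ner = pipeline("ner", model="dslim/bert-base-NER",
               aggregation_strategy="simple")

headline = "Electricity tariffs set to rise sharply in Italy next quarter"
entities = ner(headline)

# Keep location entities ("LOC" in this model's tag set); mapping them
# to a country is where custom post-processing would come in.
locations = [e["word"] for e in entities if e["entity_group"] == "LOC"]
print(locations)  # e.g. ['Italy']
```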
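
And for the index itself (6/), assuming "normalised" means scaling the net balance of positive over negative news by the total count (the exact normalisation behind the published NIPI is an assumption here), a minimal computation looks like this:

```python
# Minimal sketch of step 6: a normalised difference between positive
# and negative news counts. The scaling by the total count is an
# assumption, not the official NIPI formula.
def nipi(n_positive: int, n_negative: int) -> float:
    """Net inflation-news balance, scaled to [-1, 1]."""
    total = n_positive + n_negative
    if total == 0:
        return 0.0
    return (n_positive - n_negative) / total

print(nipi(120, 80))  # 0.2: mildly positive inflation pressure
```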

The framework is reproducible. It can be transposed to pretty much any news classification problem (or investment focus).

For reference, two of the models we use the most: RoBERTa and XLM-RoBERTa.

We are happy to discuss the model details, our results and any extension of this work, so don't hesitate to reach out.