Machine learning can considerably enhance information gathering within pretty much any investment process
There is a lot of confusion about what Machine Learning can and cannot achieve in finance. Our approach is to use ML to enhance targeted tasks in the investment workflow.
In this article, we illustrate why and how ML can be deployed within pretty much any macro financial strategy, including discretionary strategies, for at least one purpose: to gather information.
We will explore why unstructured data are so important and how ML can help gather them.
Unstructured data
The explosion of available structured data is well documented. Many investors now routinely use satellite data, geolocation and other app-usage data to try and figure out what’s going on in a given firm, within a sector, or in the economy as a whole. As with many digital transformations, Covid-19 has accelerated this evolution: alternative traffic data, for instance, are now routinely mentioned in the press.
There is still enough work in that field for many years to come, because new datasets keep coming. They are also not so easy to use.
From here, however, we view unstructured data as the real deal. Unstructured data are basically any data that do not fit a clearly defined dataset template. They can be text, audio, video or images: anything other than a clean dataset. Think of the amount of text of various kinds and languages now available on websites, blogs and social media: the growth is exponential and it takes some effort to keep up.
At its core, the investment process is about gathering information, analyzing it and making decisions. The explosion of data definitely changes at least the first part (information gathering) and almost certainly the second one (analysis). Unstructured data have always been part of the investment process, but the scale is changing.
How Machine Learning can help
ML can make several critical contributions to the investment process, for instance by estimating non-linear relationships between asset prices and fundamentals.
Another, often overlooked, application is scaling up an existing process.
The point is to get a machine to replicate a specific task, for instance one of the things analysts routinely do. My days in investment firms have always started with a fair amount of reading. I have a bunch of reliable sources and, on a good day, I have a chance to read 10, maybe 20, articles that can be relevant to my work while having my first coffee. I have also learnt to detect a relevant article with just a glance at the headline and the source.
Now, imagine a program able to check sources in the same way I do, but across tens of thousands of them, orders of magnitude more than I can cover. It would read hundreds of thousands, maybe millions, of articles. From these, it would tell me which ones I might find interesting, which are the most likely to be relevant to my work.
The program would have accomplished the task before breakfast time. It may refresh many times a day. Maybe it would also check news in foreign languages and translate the relevant ones for me.
These are precisely the tasks ML can achieve now. It is about scaling up what an individual does. By the way, the model doesn’t replace the analyst; it just makes him or her considerably more productive.
Analysing such data requires Natural Language Processing (NLP) models. One reason unstructured data will get a lot more traction from now on is that these NLP models have made huge progress in just the last couple of years. They have started to perform at a level comparable with human beings.
The task also requires expertise. Such an NLP classification model is only going to be as good (at best) as the analyst who teaches it what to search for. As always, a great model without the right purpose is not so useful.
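To make the idea concrete, here is a minimal sketch of what "teaching" a model could look like: an analyst labels a few headlines as relevant or not, and a classifier then scores new ones. This is only an illustration under loud assumptions. It uses a basic naive Bayes model rather than the modern NLP models mentioned above, and the class name `HeadlineClassifier`, the helper `tokenize` and all the headlines are hypothetical, made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a headline and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class HeadlineClassifier:
    """Hypothetical multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}
        self.vocab = set()

    def fit(self, labelled):
        """Learn word frequencies from (headline, is_relevant) pairs."""
        for text, relevant in labelled:
            tokens = tokenize(text)
            self.word_counts[relevant].update(tokens)
            self.doc_counts[relevant] += 1
            self.vocab.update(tokens)
        return self

    def _log_score(self, tokens, label):
        """Log prior plus log likelihood of the known tokens under a label."""
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        total_words = sum(self.word_counts[label].values())
        for tok in tokens:
            if tok not in self.vocab:
                continue  # words never seen in training carry no signal
            smoothed = self.word_counts[label][tok] + 1
            score += math.log(smoothed / (total_words + len(self.vocab)))
        return score

    def relevance(self, text):
        """Estimated probability (0 to 1) that a headline is relevant."""
        tokens = tokenize(text)
        gap = self._log_score(tokens, False) - self._log_score(tokens, True)
        return 1.0 / (1.0 + math.exp(gap))


# Invented training examples: the analyst's judgement is the label.
training = [
    ("Central bank raises interest rates as inflation surges", True),
    ("Inflation data surprise pushes bond yields higher", True),
    ("Central bank signals pause in rate hikes", True),
    ("Local team wins championship final", False),
    ("Celebrity chef opens new restaurant", False),
    ("New smartphone released with better camera", False),
]
clf = HeadlineClassifier().fit(training)

# Invented incoming headlines, ranked from most to least relevant.
incoming = [
    "Chef wins cooking final",
    "Bond yields jump after central bank decision",
]
ranked = sorted(incoming, key=clf.relevance, reverse=True)
for headline in ranked:
    print(f"{clf.relevance(headline):.2f}  {headline}")
```

The model only knows what the labels tell it: change the analyst's labels and the ranking changes with them. In practice one would expect far more labelled examples and a much stronger language model, but the division of labour stays the same.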
This pretty much sums up our approach to ML in the investment process: to use these new tools in a targeted way, primarily to perform tasks where they can scale up what analysts do. If you are interested in seeing a concrete application, you can check this other article.