Unless you have been quarantined somewhere without internet access for the last month, you have recently come across ChatGPT: an impressive chatbot built upon the "knowledge" of billions of books and Wikipedia pages, among other sources. It interacts with humans and answers sophisticated queries in an intuitive and seemingly clever way. ChatGPT is an excellent text generator, code included; it is not a search engine, however, as it does not look for "true" answers to questions.
Let's give Caesar his due: ChatGPT is actually NOT the technological breakthrough here. Rather, it is a specific application of an earlier breakthrough (see also this).
Indeed, ChatGPT is one real-life application of a large language model: GPT-3, which has been around, in different versions, for over two years.
The real technological breakthrough dates back to 2018, with the release of the "Bidirectional Encoder Representations from Transformers" (BERT) model by a Google team. BERT was among the first Transformer language models. These models learn the meaning of words from their context, that is, from all of their surroundings (for more details, see for instance). GPT-3 belongs to this class of models, standing out mostly by its huge size.
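To make "learning meaning from context" concrete, here is a toy numpy sketch of self-attention, the mechanism at the heart of Transformer models. It is not BERT itself, and the embeddings are random stand-ins; the point is only that each token's output representation is a context-weighted mix of every token around it.

```python
import numpy as np

np.random.seed(0)

def self_attention(X):
    """Single attention head; X has one row per token embedding."""
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)                   # similarity of every token pair
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over the context
    return weights @ X                              # each row: context-aware vector

# Four tokens with random 8-dim embeddings standing in for a short sentence.
X = np.random.randn(4, 8)
Y = self_attention(X)

# The output for token 0 now depends on the whole sentence: perturb token 3's
# embedding and token 0's representation shifts too.
X2 = X.copy()
X2[3] += 1.0
Y2 = self_attention(X2)
print(np.allclose(Y[0], Y2[0]))  # prints False: token 0 "sees" its context
```

In a real Transformer, the queries, keys, and values are learned projections and many such heads are stacked, but the context-mixing principle is the same.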
Our language tools also rely on Transformer models, some of them BERT's successors, which we have fine-tuned to specific tasks. GPT-3 is far from the only option out there, and super-large models like GPT-3 have their flip side too. For instance, they can be difficult to fine-tune to highly specialized tasks and demand more training iterations, requiring more input for the same result, than some more targeted language models. But, no doubt, for a chatbot, GPT-3 is outstanding.
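The fine-tuning idea above can be sketched in a few lines. This is a deliberately simplified, hypothetical illustration: the frozen pretrained encoder is mocked by a fixed random projection, and only a small logistic-regression "head" is trained on top of its features, which is the cheapest common form of task-specific fine-tuning.

```python
import numpy as np

np.random.seed(1)

def pretrained_encoder(x):
    # Stand-in for a frozen pretrained Transformer body: a fixed projection.
    W = np.random.RandomState(42).randn(8, 4)
    return np.tanh(x @ W)

# Tiny labeled dataset for the hypothetical specialized task.
X = np.random.randn(64, 8)
y = (X[:, 0] > 0).astype(float)

H = pretrained_encoder(X)          # features from the frozen model
w, b = np.zeros(4), 0.0            # the only trainable parameters

def loss(w, b):
    p = 1 / (1 + np.exp(-(H @ w + b)))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

initial_loss = loss(w, b)
for _ in range(500):               # plain gradient descent on the head
    p = 1 / (1 + np.exp(-(H @ w + b)))
    grad = p - y
    w -= 0.1 * H.T @ grad / len(y)
    b -= 0.1 * grad.mean()

print(loss(w, b) < initial_loss)   # prints True: the head has learned
```

Fine-tuning a real model usually also updates some or all of the pretrained weights, which is exactly where very large models become expensive: more parameters to adjust means more data and more iterations for the same specialized result.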
We say this to explain that we do not need to be convinced of the power of Transformer language models (we also know their current limitations)...
... and, as builders of applications based on those models, we have learned a thing or two in the process. Here are a few takeaways for research and investment: