Klinton Bicknell was let into one of the technology world’s great secrets last September. The head of AI at the language learning app Duolingo was given rare access to GPT-4, a new artificial intelligence model created by Microsoft-backed OpenAI.
He soon discovered the new AI system was even more advanced than OpenAI’s earlier version used to power the hit ChatGPT chatbot that provides realistic answers in response to text prompts.
Within six months, Bicknell’s team had used GPT-4 to create a sophisticated chatbot of their own that human users could talk with, in order to practice conversational French, Spanish and English as if they were in real-world settings like airports or cafés.
“It was amazing how the model had such detailed and specialised knowledge of how languages work and of the correspondences between different languages,” said Bicknell. “With GPT-3, which we had already been using, this just would not be a viable feature.”
Duolingo is one of a handful of companies, including Morgan Stanley Wealth Management and online education group Khan Academy, given early access to GPT-4 before it was launched more widely this week.
The release reveals how OpenAI has transformed from a research-focused group into a company worth almost $30bn, racing giants such as Google in efforts to commercialise AI technologies.
OpenAI announced that GPT-4 showed “human-level” performance on a range of standardised tests such as the US Bar exam and the SAT school tests, and showed off how its partners were using the AI software to create new products and services.
But for the first time, OpenAI did not reveal any details about the technical aspects of GPT-4, such as what data it was trained on or the hardware and computing capacity used to deploy it, because of both the “competitive landscape and the safety implications”.
This represents a shift from OpenAI’s origins. It was created as a non-profit in 2015, in part the brainchild of some of the tech world’s most radical thinkers, including Elon Musk and Peter Thiel, and was built on the principles of making AI accessible to everybody through scientific publications and developing the technology safely.
A pivot in 2019 turned it into a profitmaking enterprise with a $1bn investment from Microsoft. That was followed this year by a further multibillion-dollar investment from the tech giant, with OpenAI quickly becoming a crucial part of Microsoft’s bet that AI systems will transform its business model and products.
This transformation led Musk, who left OpenAI’s board in 2018, to tweet this week that he was “still confused as to how a non-profit to which I donated ~$100mn somehow became a $30bn market cap for-profit. If this is legal, why doesn’t everyone do it?”
OpenAI’s lack of transparency regarding the technical details of GPT-4 has drawn criticism from others within the AI community.
“It’s so opaque, they’re saying ‘trust us, we’ve done the right thing’,” said Alex Hanna, director of research at the Distributed AI Research Institute (DAIR) and a former member of Google’s Ethical AI team. “They’re cherry-picking these tasks, because there is no scientifically agreed-upon set of benchmarks.”
GPT-4, which can be accessed through the $20-a-month paid version of ChatGPT, has shown marked improvement over earlier AI models on certain tasks. For instance, GPT-4 scored in the 90th percentile on the Uniform Bar Exam taken by would-be lawyers in the US. ChatGPT only reached the 10th percentile.
While OpenAI did not provide details, AI experts believe the model is larger than previous generations and that it has had a lot more human training to fine-tune it.
The most obvious new feature is that GPT-4 can accept input in both text and image form — although it only responds using text. This means users can upload a photo to ask the model to describe the picture in painstaking detail, request ideas for a meal made with ingredients present in the image, or ask it to explain the joke behind a visual meme.
GPT-4 is also able to ingest and generate far bigger volumes of text than other models of its type: users can feed in up to 25,000 words, compared with 3,000 words for ChatGPT. This means it can handle detailed financial documentation, literary works or technical manuals.
Its more advanced reasoning and parsing abilities mean it is far more proficient at analysing complex legal contracts for risks, said Winston Weinberg, co-founder of Harvey, an AI chatbot that was built using GPT-4 and is used by PwC and magic circle law firm Allen & Overy.
Despite these advances, OpenAI has warned of several risks and limitations of GPT-4. These include its ability to provide detailed information on how to conduct illegal activities, including developing biological weapons, and to generate hateful and discriminatory speech.
OpenAI put GPT-4 through a safety testing process known as red-teaming, where more than 50 external experts in disciplines ranging from medicinal chemistry to nuclear physics and misinformation were asked to try to break the model.
Paul Röttger, an AI researcher at the Oxford Internet Institute who focuses on the identification of toxic content online, was contracted by OpenAI for six months to try to elicit harmful responses from GPT-4 and provide feedback, on topics ranging from suicide or self-harm content, to graphic descriptions of violence or examples of extremism and hate speech.
He said that overall the model’s responses improved over the months of testing: it would initially hedge its answers but later became more unequivocal in its responses to bad prompts.
“On one hand, safety research has progressed since GPT-3, and there’s a lot of good ideas that went into making this model safer,” he said. “But at the same time, this model is so much more powerful and can do a lot more things than GPT-3, so the risk surface has gotten a lot bigger too.”