OpenAI has released GPT-4, its latest artificial intelligence model that it claims exhibits “human-level performance” on several academic and professional benchmarks such as the US bar exam, advanced placement tests and the SAT school exams.
The new model, which can be accessed via the $20 paid version of ChatGPT, is multimodal, which means it can accept input in both text and image form. It can then parse and respond to these queries using text.
OpenAI said it has embedded its new software into a variety of apps including language-learning app Duolingo, which is using it to build conversational language bots; education company Khan Academy, which has designed an online tutor; and Morgan Stanley Wealth Management, which is testing an internal chatbot using GPT-4 to retrieve and synthesise information for its employees.
The model’s ability to accept images and text as input means it can now generate detailed descriptions and answer questions based on the contents of a photograph. The company said it has teamed up with Danish start-up Be My Eyes — which connects people with visual impairments to human volunteers — to build a GPT-4-based virtual volunteer that can guide or help those who are blind or partially sighted.
GPT-4’s predecessor, GPT-3.5, captured the imaginations of millions of people who experimented with the question-and-answer chatbot ChatGPT late last year.
According to OpenAI, GPT-4 is its “most advanced system yet”. The company claims the model is more reliable and handles nuanced queries far better than its predecessor. For instance, GPT-4 scored in the 90th percentile on the Uniform Bar Exam taken by would-be lawyers in the US, while GPT-3.5, the model behind ChatGPT at launch, reached only the 10th percentile.
The company noted some problems, however: “Despite its capabilities, GPT-4 has similar limitations to earlier GPT models: it is not fully reliable (eg can suffer from ‘hallucinations’), has a limited context window, and does not learn from experience.”
“Care should be taken when using the outputs of GPT-4, particularly in contexts where reliability is important,” the company added.
Earlier this year, Microsoft confirmed a “multibillion-dollar investment” in OpenAI over several years, placing a bet on the future of generative AI — software that can respond to complex human queries in natural-sounding language. GPT-4 will underpin Microsoft’s Bing chatbot, which had a limited release earlier this year. Microsoft is also expected to announce the integration of GPT-4 into its consumer products in the coming days.
Meanwhile, Google has opened up its own conversational chatbot, Bard, to a limited pool of testers and announced that it will allow customers of Google Cloud to access its large language model PaLM for the first time to build applications.
OpenAI, which had published some details of previous models such as GPT-3, said it would not reveal any details about the technical aspects of GPT-4, including the architecture of the model, what data it was trained on or the hardware and computing capacity used to deploy it, because of competitive and safety concerns.
To probe the potential harms of the technology, the company put GPT-4 through stress tests and set out the risks it foresees around bias, disinformation, privacy and cyber security. It revealed that GPT-4 can “generate potentially harmful content, such as advice on planning attacks or hate speech. It can represent various biases and world views . . . it can also generate code that is compromised or vulnerable.” OpenAI said the model can provide detailed information on how to conduct illegal activities, including developing biological weapons.
The company said it had also worked with an external organisation to test whether GPT-4 was capable of carrying out autonomous actions without human input, and concluded that it was “probably” not yet capable of this.
Additional reporting from Richard Waters