OpenAI has finally launched its much-anticipated GPT update, GPT-4. The large language model (LLM) comes with powerful new features and capabilities that have already impressed users worldwide.

GPT-4 is significantly more capable than GPT-3.5, the existing LLM that powers OpenAI's viral chatbot ChatGPT: it can understand more complex inputs, accepts far longer prompts, has multimodal capabilities, and is reportedly safer to use.

1. GPT-4 Can Understand More Complex Inputs

One of GPT-4's biggest new features is its ability to understand more complex and nuanced prompts. According to OpenAI, GPT-4 "exhibits human-level performance on various professional and academic benchmarks."

This was demonstrated by putting GPT-4 through several human-level exams and standardized tests, such as the SAT, the bar exam, and the GRE, with no specific training. Not only did GPT-4 understand and solve these tests with relatively high scores across the board, but it also beat out its predecessor, GPT-3.5, each time.

A graph comparing GPT-4's academic exam performance to GPT-3.5
Image Credit: OpenAI

The ability to understand more nuanced prompts is also aided by GPT-4's much larger input limit. The new model can handle prompts of up to 25,000 words (for context, GPT-3.5 was limited to about 8,000 words). This directly affects how much detail users can pack into their prompts, giving the model far more information to work with and allowing for lengthier outputs.
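In practice, these limits are enforced in tokens rather than words, so API users may want to check a prompt's length before sending it. Below is a minimal sketch using OpenAI's tiktoken library; the word figures above are only rough approximations of the underlying token limits, and the helper function name here is our own.

```python
# Rough check of whether a long prompt fits GPT-4's larger context window.
# Assumes you are using the OpenAI API (not the ChatGPT UI) and have
# installed the tokenizer library: pip install tiktoken
import tiktoken

def count_tokens(text: str, model: str = "gpt-4") -> int:
    """Return the number of tokens `text` occupies for the given model."""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

prompt = "Summarize the following report..."  # imagine a very long prompt here
print(f"Prompt uses {count_tokens(prompt)} tokens")

# As a rule of thumb, 25,000 words is on the order of 30k+ tokens, so a
# prompt that fits GPT-4 can easily exceed GPT-3.5's much smaller limit.
```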

GPT-4 also performs well across the 26 languages OpenAI tested, including low-resource languages such as Latvian, Welsh, and Swahili. On three-shot accuracy on the MMLU benchmark, GPT-4 beat the English-language performance of GPT-3.5, as well as that of other leading LLMs such as PaLM and Chinchilla, in 24 of those 26 languages.

2. Multimodal Capabilities

The previous version of ChatGPT was limited to just text prompts. In contrast, one of GPT-4's newest features is its multimodal capabilities. The model can accept both text and image prompts.

This means the AI can accept an image as input and interpret it just as it would a text prompt. This capability spans images and text of all sizes and types, including documents that combine the two, hand-drawn sketches, and even screenshots.
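To illustrate, here is a hypothetical sketch of sending an image alongside text. Image input was still in research preview and not publicly available at launch, so the message format below is an assumption based on how OpenAI later exposed vision in its Chat Completions API, not the interface that shipped on day one.

```python
# Hypothetical sketch: an image plus a text question in a single chat request.
# Assumes a vision-capable GPT-4 model and the openai Python library (v0.x).
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.ChatCompletion.create(
    model="gpt-4",  # a vision-enabled GPT-4 variant is assumed here
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is funny about this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/meme.png"},
                },
            ],
        }
    ],
)

print(response["choices"][0]["message"]["content"])
```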

However, GPT-4's image-reading capabilities go beyond simply describing what it sees. OpenAI showcased this in its developer livestream, where it provided GPT-4 with a hand-drawn mockup of a joke website. The model was tasked with writing HTML and JavaScript code to turn the mockup into a website, replacing the placeholder jokes with actual ones.

GPT-4 wrote the code using the layout specified in the mockup. Upon testing, the code produced a working site with, as you might guess, actual jokes. Does this mean AI advancements will spell the end of programming? Not quite, but it's still a feature that will come in handy for assisting programmers.

As promising as this feature seems, it is still in research preview and not publicly available. Additionally, the model takes a long time to process visual inputs, with OpenAI itself noting that it will take time and work to make it faster.

3. Greater Steerability

OpenAI also claims that GPT-4 offers a greater degree of steerability. The company has made it harder for the AI to break character, meaning it's less likely to slip up when implemented in an app to play a certain persona.

Developers can prescribe their AI's style and task by describing the direction in the "system" message. These messages allow API users to heavily customize the user experience within certain bounds. Since these messages are also the easiest way to "jailbreak" the model, OpenAI is working on making them more secure. The GPT-4 demo drove this point home by having a user try to get GPT-4 to stop acting as a Socratic tutor and simply answer their query; the model refused to break character.
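Here is a minimal sketch of steering GPT-4 through a system message, using the openai Python library (v0.x) available around GPT-4's launch. The Socratic-tutor persona mirrors OpenAI's demo, but the exact wording of the messages is our own illustration.

```python
# Steering GPT-4 with a "system" message via the Chat Completions API.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a Socratic tutor. Never give the student the answer "
                "directly; always respond with guiding questions instead."
            ),
        },
        {"role": "user", "content": "Just tell me: how do I solve 3x + 5 = 14?"},
    ],
)

print(response["choices"][0]["message"]["content"])
# With GPT-4's improved steerability, the reply should stay in character and
# respond with questions rather than handing over the solution.
```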

4. Safety

OpenAI spent six months making GPT-4 safer and more aligned. The company claims that, compared to GPT-3.5, GPT-4 is 82% less likely to respond to requests for inappropriate or otherwise disallowed content, 29% more likely to respond to sensitive requests in accordance with OpenAI's policies, and 40% more likely to produce factual responses.

It's not perfect, though: you can still expect it to "hallucinate" from time to time and get things wrong. GPT-4 may have better reasoning and predictive power, but you still shouldn't blindly trust the AI.

5. Performance Improvements

Beyond human exams, OpenAI also evaluated the model on traditional benchmarks designed for machine learning models.

It claims that GPT-4 "considerably outperforms" existing LLMs and "most state-of-the-art models." These benchmarks include the aforementioned MMLU, AI2 Reasoning Challenge (ARC), WinoGrande, HumanEval, and DROP, all of which test individual capabilities.

GPT-4 performance benchmark scores

You'll find similar results when comparing performance on academic vision benchmarks. Tests run include VQAv2, TextVQA, ChartQA, AI2 Diagram (AI2D), DocVQA, Infographic VQA, TVQA, and LSMDC, all of which GPT-4 tops. However, OpenAI has stated that GPT-4's results in these tests "do not fully represent the extent of its capabilities" as researchers keep finding new and more challenging things the model can tackle.

Small Step for GPT-4, Giant Leap for AI

With more accuracy, safer use, and more advanced capabilities, GPT-4 has been released to the public via the ChatGPT Plus monthly subscription plan, which costs $20 per month. Additionally, OpenAI has partnered with various organizations to start building consumer-facing products with GPT-4. Microsoft Bing, Duolingo, Stripe, Be My Eyes, and Khan Academy, among others, have already implemented GPT-4 in their products.

GPT-4 may be an incremental update over GPT-3.5, but it's a big win for AI overall. As the model becomes more accessible, both to the average user and to developers through its API, it looks set to make a strong case for LLM implementations across fields.