Google unveiled the next generation of its Pathways Language Model (PaLM 2) on May 10, 2023, at Google I/O 2023. Its new large language model (LLM) boasts significant improvements over its predecessor (PaLM) and might finally be ready to take on its biggest rival, OpenAI's GPT-4.

But just how much improvement has Google made? Is PaLM 2 the difference maker Google hopes it will be, and more importantly, with so many similar capabilities, how is PaLM 2 different from OpenAI's GPT-4?

PaLM 2 vs. GPT-4: Performance Overview

PaLM 2 is packed with new and improved capabilities over its predecessor. One of the unique advantages PaLM 2 has over GPT-4 is that it's available in a range of smaller sizes tailored to applications with limited onboard processing power.

These smaller models are called Gecko, Otter, Bison, and Unicorn: Gecko is the smallest, followed by Otter and Bison, with Unicorn the largest.
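
For developers, the sizes Google exposed publicly at launch map onto model names in its PaLM API. Here's a minimal sketch using the `google-generativeai` Python SDK; `text-bison-001` and `embedding-gecko-001` were the Bison-sized generation and Gecko-sized embedding endpoints available at the time, and the prompt text is just an example:

```python
# A minimal sketch using Google's PaLM API via the
# `google-generativeai` Python SDK (2023-era interface).
import google.generativeai as palm

palm.configure(api_key="YOUR_API_KEY")  # key from Google's MakerSuite

# Text generation with the Bison-sized model.
completion = palm.generate_text(
    model="models/text-bison-001",
    prompt="Summarize the difference between PaLM 2 and GPT-4 in one sentence.",
    temperature=0.7,
    max_output_tokens=128,
)
print(completion.result)

# Embeddings with the lightweight Gecko-sized model.
embedding = palm.generate_embeddings(
    model="models/embedding-gecko-001",
    text="on-device AI",
)
print(len(embedding["embedding"]))  # dimensionality of the embedding vector
```

At launch, the public API appeared to expose only the Bison-sized generation models and the Gecko-sized embedding model; the other sizes were not directly available to outside developers.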

Google also claims improved reasoning over GPT-4 on the WinoGrande and DROP benchmarks, with GPT-4 holding only a narrow lead on ARC-C. PaLM 2 also shows significant improvement across the board when compared against the original PaLM and previous state-of-the-art (SOTA) results.

PaLM 2 is also better at math, according to Google's 91-page PaLM 2 research paper [PDF]. However, the way Google and OpenAI have structured their test results makes it difficult to compare the two models directly. Google also omitted some comparisons, likely because PaLM 2 didn't perform nearly as well as GPT-4.

In MMLU, GPT-4 scored 86.4, while PaLM 2 scored 81.2. The same goes for HellaSwag, where GPT-4 scored 95.3, but PaLM 2 could only muster 86.8, and ARC-E, where GPT-4 and PaLM 2 got 96.3 and 89.7, respectively.

The largest model in the PaLM 2 family is PaLM 2-L. While we don't know its exact size, we do know that it's significantly smaller than the largest PaLM model but uses more training compute. According to Google, PaLM has 540 billion parameters, so "significantly smaller" should put PaLM 2 anywhere between 10 billion and 300 billion parameters. Keep in mind that these numbers are just estimates based on what Google has said in the PaLM 2 paper.
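
One rough way to square "smaller model, more training compute" is the widely used back-of-the-envelope estimate for dense transformer training cost (our approximation, not a figure from the PaLM 2 paper), where C is total training FLOPs, N is the parameter count, and D is the number of training tokens:

```latex
% Approximate training FLOPs for a dense transformer:
C \approx 6ND \quad\Longrightarrow\quad D \approx \frac{C}{6N}
```

At a fixed or larger compute budget C, shrinking N from 540 billion to, say, 100 billion parameters frees the model to train on roughly five times as many tokens, which is consistent with Google's description of a smaller model trained with more compute.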

If this number is anywhere close to 100 billion or under, PaLM 2 is most likely smaller in terms of parameters than GPT-3.5. That a model with potentially fewer than 100 billion parameters can go toe to toe with GPT-4 and even beat it at some tasks is impressive. GPT-3.5 initially blew everything out of the water, including PaLM, but PaLM 2 has made quite the recovery.

Another obvious performance advantage that PaLM 2 carries over GPT-4 is its availability in different sizes. This means that different versions of the model, Gecko, for instance, can run on mobile devices, even without an internet connection, and provide onboard AI capabilities—something we're yet to see from GPT-4. This kind of on-device processing gives PaLM 2 an edge over GPT-4 when it comes to accessibility and deployment.

Differences in GPT-4 and PaLM 2 Training Data

While Google hasn't revealed the size of PaLM 2's training dataset, the company reports in its research paper that the new LLM's dataset is significantly larger than PaLM's. OpenAI took the same approach when unveiling GPT-4, making no claims about the size of its training dataset.

However, Google wanted to focus on a deeper understanding of mathematics, logic, reasoning, and science, meaning a large part of PaLM 2's training data is focused on the aforementioned topics. Google says in its paper that PaLM 2's pre-training corpus is composed of multiple sources, including web documents, books, code, mathematics, and conversational data, giving it improvements across the board, at least when compared to PaLM.

PaLM 2's conversational skills should also be on another level, considering the model has been trained on text in more than 100 languages to give it better contextual understanding and better translation capabilities. Google also claims that PaLM 2 will generate less toxic output, primarily because its training data was filtered to exclude websites that might contain hate speech or other toxic content. Large textual sources such as Reddit have reportedly not been included in the training set, leading to a "cleaner" output of sorts.

As far as GPT-4's training data is concerned, OpenAI has told us that it has trained the model using publicly available data and the data it licensed. GPT-4's research page states, "The data is a web-scale corpus of data including correct and incorrect solutions to math problems, weak and strong reasoning, self-contradictory and consistent statements, and representing a great variety of ideologies and ideas."

When GPT-4 is asked a question, it can produce a wide variety of responses, not all of which might be relevant to your query. To align it with the user's intent, OpenAI fine-tuned the model's behavior using reinforcement learning with human feedback.
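
As a rough illustration of that idea, here's a toy REINFORCE-style loop: a sketch of the general RLHF concept, not OpenAI's actual pipeline. The canned responses and reward values are invented for the example; a policy samples responses, a stand-in "reward model" scores them, and the update shifts probability toward the responses a human would prefer.

```python
# Toy RLHF sketch: a softmax policy over canned responses, updated
# with REINFORCE so that higher-reward responses become more likely.
import math
import random

responses = ["relevant answer", "rambling tangent", "off-topic reply"]
rewards = {"relevant answer": 1.0, "rambling tangent": 0.2, "off-topic reply": -0.5}
logits = [0.0, 0.0, 0.0]  # the policy starts out indifferent
lr = 0.5

def softmax(xs):
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(0)
for step in range(200):
    probs = softmax(logits)
    i = random.choices(range(len(responses)), weights=probs)[0]
    r = rewards[responses[i]]  # stand-in for a learned reward model
    # REINFORCE gradient for a softmax policy: (indicator - prob) * reward
    for j in range(len(logits)):
        grad = ((1.0 if j == i else 0.0) - probs[j]) * r
        logits[j] += lr * grad

print({resp: round(p, 3) for resp, p in zip(responses, softmax(logits))})
# After training, "relevant answer" holds most of the probability mass.
```

In the real systems, the "reward model" is itself a neural network trained on human preference rankings, and the policy update uses PPO rather than plain REINFORCE, but the feedback loop has the same shape.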

While we may not know the exact data either model was trained on, we do know the training intent was very different. We'll have to wait and see how that difference in training intent plays out between the two models in real-world deployments.

That said, the sheer volume of training data used for GPT-4 means that it has an advantage when it comes to understanding the nuances of language and should theoretically generate higher-quality outputs. However, this also means that OpenAI needs to put stronger restrictions in place to keep the model from going haywire, something that Google can avoid, considering it left potentially toxic sources out of the training data.

PaLM 2 and GPT-4 Chatbots and Services

The first place to access both LLMs is through their respective chatbots: Bard for PaLM 2 and ChatGPT for GPT-4. That said, GPT-4 sits behind the ChatGPT Plus paywall, and free users only get access to GPT-3.5. Bard, on the other hand, is free for all and available in 180 countries.

That's not to say you can't access GPT-4 for free, though. Microsoft's Bing AI Chat uses GPT-4, is completely free and open to all, and sits right next to Bing Search, Google's biggest rival in search.

Google I/O 2023 was filled with announcements about how PaLM 2 and generative AI integration will improve the Google Workspace experience with AI features coming to Google Docs, Sheets, Slides, Gmail, and just about every service the search giant offers. In addition, Google has confirmed that PaLM 2 has already been integrated into over 25 Google products, including Android and YouTube.

In comparison, Microsoft has already brought AI features to the Microsoft Office suite and many of its other services. At the moment, you can experience both LLMs through similar offerings from two rival companies going head to head in the AI battle.

However, since GPT-4 came out earlier and OpenAI avoided many of the blunders Google made with the original Bard, it has been the de facto LLM for third-party developers, startups, and just about anyone else looking to incorporate a capable AI model in their service so far. We have a list of GPT-4 apps if you want to check them out.
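
For developers, that access typically runs through OpenAI's API. Here's a minimal sketch as the API looked in 2023, using the `openai` Python package's ChatCompletion interface; the prompt text is just an example, and the "gpt-4" model required an approved API account:

```python
# A minimal sketch of calling GPT-4 through OpenAI's 2023-era API.
import openai

openai.api_key = "YOUR_API_KEY"

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Compare PaLM 2 and GPT-4 in two sentences."},
    ],
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```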

That's not to say that developers won't be switching to or at least trying out PaLM 2, but Google still has to play catch-up with OpenAI on that front. PaLM 2 isn't open-source, though; like GPT-4, it's offered through an API (Google's PaLM API and Vertex AI), so wider adoption will hinge on pricing and capability rather than openness.

All things considered, the PaLM 2-powered Bard currently appears to be the better choice for research, as it is better at answering questions with relevant, up-to-date information pulled from the internet. According to Bard's September 19, 2023 update, it now runs on Google's "most capable model yet," with support for another 40 languages, in-depth coding assistance, the ability to present different perspectives on a given topic, and general quality and accuracy improvements.

You also get the option of double-checking Bard's responses against Google Search results. On the performance front, however, the model still takes longer to generate responses than the GPT-4-powered ChatGPT or Bing Chat.

Can PaLM 2 Take on GPT-4?

PaLM 2 is still very new, so whether it can take on GPT-4 remains to be seen. However, with everything Google is promising, and how aggressively it's rolling the model out across its products, PaLM 2 looks like it can give GPT-4 a run for its money. And with Google's multimodal AI model Gemini also in the works, it's about time for OpenAI to stay on its toes.

However, GPT-4 is still a very capable model and, as mentioned earlier, beats PaLM 2 in quite a few comparisons. That said, PaLM 2's family of smaller models gives it an undeniable edge. Gecko itself is so lightweight that it can work on mobile devices, even when offline. This means PaLM 2 can support an entirely different class of products and devices that might struggle to use GPT-4.

The AI Race Is Heating Up

With the launch of PaLM 2, the race for AI dominance has heated up, as this might just be the first worthy opponent to go up against GPT-4. With the multimodal "Gemini" model also in training, Google isn't showing any signs of slowing down.