Large language models (LLMs) come in all shapes and sizes, and will assist you in any way you see fit. But which is best? We put the dominant AIs from Alphabet, OpenAI, and Meta to the test.

What You Need to Know About AI Chatbots

Artificial general intelligence has been a goal of computer scientists for decades, and AI has served as a mainstay for science fiction writers and moviemakers for even longer.

AGI would exhibit intelligence comparable to human cognitive capabilities. The Turing Test, a test of a machine's ability to exhibit intelligent behavior indistinguishable from that of a human, went almost unchallenged for the seven decades after it was first laid out.

The recent convergence of extremely large-scale computing, vast quantities of money, and the astounding volume of information freely available on the open internet has allowed tech giants to train models that can predict the next word fragment, or token, in a sequence of tokens.
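
To make next-token prediction concrete, here is a toy sketch in Python. It is nothing like the neural networks behind Bard, ChatGPT, or LLaMa: it simply counts which word follows which in a tiny made-up sentence and predicts the most common follower. The function names and the sample corpus are our own illustrative assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


def train_bigram_counts(text: str) -> Dict[str, Counter]:
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    followers: Dict[str, Counter] = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        followers[current][following] += 1
    return followers


def predict_next(followers: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if it was never seen."""
    candidates = followers.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Hypothetical training text; real models learn from vastly more data.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram_counts(corpus)
print(predict_next(model, "the"))  # prints "cat" (seen twice, versus "mat" once)
```

Real LLMs work on sub-word tokens rather than whole words and weigh the entire preceding context, but the basic job is the same: given what came before, pick a likely continuation.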

At the time of writing, both Google's Bard and OpenAI's ChatGPT are available for you to use and test through their web interfaces.

Meta's language model, LLaMa, is not available on the web, but you can download it and run it on your own hardware, either through a command line or through Dalai, one of several apps with a user-friendly interface.

For the purposes of the test, we'll be running Stanford University's Alpaca 7B model, an adaptation of LLaMa, and pitting it against Bard and ChatGPT.

The following comparisons and tests are not meant to be exhaustive but rather give you an indication of key points and capabilities.

Which Is the Easiest Large Language Model to Use?

Both Bard and ChatGPT require an account to use the service. Both Google and OpenAI accounts are easy and free to create, and you can immediately start asking questions.

However, to run LLaMa locally, you will need to have some specialized knowledge or the ability to follow a tutorial. You'll also need a significant amount of storage space.

Which Is the Most Private Large Language Model?

Both Bard and ChatGPT have extensive privacy policies, and Google repeatedly stresses in its documents that you should "not include information that can be used to identify you or others in your Bard conversations."

By default, Google collects your conversations and your general location based on your IP address, your feedback, and usage information. This information is stored in your Google account for up to 18 months. Although you can pause saving your Bard activity, you should be aware that "to help with quality and improve our products, human reviewers read, annotate, and process your Bard conversations."

Use of Bard is also subject to the standard Google Privacy Policy.

OpenAI's privacy policy is broadly similar: the company collects your IP address and usage data. In contrast with Google's time-limited retention, OpenAI will "retain your Personal Information for only as long as we need in order to provide our Service to you, or for other legitimate business purposes such as resolving disputes, safety and security reasons, or complying with our legal obligations."

In contrast, a local model on your own machine doesn't require an account or share user data with anyone.

Which LLM Has the Best General Knowledge?

In order to test which LLM has the best general knowledge, we asked three questions.

The first question, "Which national flag has five sides?" was only correctly answered by Bard, which identified the national flag of Nepal as having five sides.

ChatGPT confidently claimed that "There is no national flag that has five sides. National flags are typically rectangular or square in shape, characterized by their distinct colors, patterns, and symbols."

Our local model came close, stating that "The Indian National Flag has five sides and was designed in 1916 to represent India's independence movement." While this flag did exist and did have five sides, it was the flag of the Indian Home Rule Movement—not a national flag.

For the second question, none of our models could say that the correct term for a pea-shaped object is "pisiform," with ChatGPT going so far as to suggest that peas have a "three-dimensional geometric shape that is perfectly round and symmetrical."

All three chatbots correctly identified Franco Malerba as an Italian astronaut and member of the European Parliament, with Bard giving an answer worded identically to a section of Malerba's Wikipedia entry.

Which LLM Is Good for Technical Instructions?

When you have technical problems, you might be tempted to turn to a chatbot for help. While technology marches on, some things remain the same. The BS 1363 electrical plug has been in use in Britain, Ireland, and many other countries since 1947. We asked the language models how to correctly wire it up.

Cables attaching to the plug have a live wire (brown), an earth wire (yellow/green), and a neutral wire (blue). These must be attached to the correct terminals within the plug housing.

Our Dalai implementation correctly identified the plug as "English-style," then veered off-course and instead gave instructions for the older round-pin BS 546 plug together with older wiring colors.

ChatGPT was slightly more helpful. It correctly labeled the wiring colors and gave a materials list and a set of eight instructions. ChatGPT also suggested putting the brown wire into the terminal labeled "L," the blue wire into the "N" terminal, and the yellow wire into "E." This would be correct if BS 1363 terminals were labeled, but they aren't.

Bard identified the correct colors for the wires and instructed us to connect them to Live, Neutral, and Earth terminals. It gave no instructions on how to identify these.

In our opinion, none of the chatbots gave instructions sufficient to help someone correctly wire a BS 1363 electrical plug. A concise and correct response would be, "Blue on the left, brown on the right."

Which LLM Is Good for Writing Code?

Python is a useful programming language that runs on most modern platforms. We instructed our models to use Python and "Build a basic calculator program that can perform arithmetic operations like addition, subtraction, multiplication, and division. It should take user input and display the result." This is one of the best programming projects for beginners.

While both Bard and ChatGPT instantly returned usable and thoroughly commented code, which we were able to test and verify, none of the code from our local model would run.
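
For reference, a program that satisfies the prompt can be quite short. The sketch below is our own illustration, not the output of Bard, ChatGPT, or the local model. The names `OPERATIONS`, `calculate`, and `main` are simply ones we chose, and the error handling is one reasonable approach among several.

```python
import operator

# Map each supported symbol to the function that implements it.
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def calculate(left: float, symbol: str, right: float) -> float:
    """Apply one arithmetic operation to two numbers."""
    if symbol not in OPERATIONS:
        raise ValueError(f"Unsupported operator: {symbol!r}")
    if symbol == "/" and right == 0:
        raise ZeroDivisionError("Cannot divide by zero")
    return OPERATIONS[symbol](left, right)


def main() -> None:
    """Read two numbers and an operator from the user, then print the result."""
    try:
        left = float(input("First number: "))
        symbol = input("Operator (+, -, *, /): ").strip()
        right = float(input("Second number: "))
        print(f"Result: {calculate(left, symbol, right)}")
    except (ValueError, ZeroDivisionError) as error:
        # Covers non-numeric input, unknown operators, and division by zero.
        print(f"Error: {error}")
```

Calling `main()` starts the interactive prompt. Keeping the arithmetic in `calculate` and the input handling in `main` makes the logic easy to check on its own, which is roughly how we verified the chatbots' answers.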

Which LLM Tells the Best Jokes?

Humor is one of the fundamentals of being human and surely one of the best ways of telling man and machine apart. To each of our models, we gave the simple prompt: "Create an original and funny joke."

Fortunately for comedians everywhere and the human race at large, none of the models were capable of generating an original joke.

Bard rolled out the classic, "Why did the scarecrow win an award? He was outstanding in his field."

Both our local implementation and ChatGPT offered the groan-worthy, "Why don't scientists trust atoms? Because they make up everything!"

A derivative but original joke would be, "How are Large Language Models like atoms? They both make things up!"

You read it here first, folks.

No Chatbot Is Perfect

We found that while all three large language models have their advantages and disadvantages, none of them can replace the real expertise of a human being with specialized knowledge.

While both Bard and ChatGPT gave better responses to our coding question and are very easy to use, running a large language model locally means you don't need to be concerned about privacy or censorship.

If you'd like to create great AI art without worrying that somebody's looking over your shoulder, it's easy to run an art AI model on your local machine, too.