Artificial Intelligence (AI) is a potent technology that promises to transform our lives. Never has that been as clear as today, when powerful tools are available to anyone with an internet connection.

This includes AI voice generators: advanced software capable of mimicking human speech so convincingly that it can be impossible to tell a synthetic voice from a real one. What does this mean for cybersecurity?

How Do AI Voice Generators Work?

Speech synthesis, the process of producing human speech artificially, has been around for decades. And like all technology, it has undergone profound changes over the years.

Those who have used Windows 2000 and XP might remember Microsoft Sam, the operating system's default text-to-speech male voice. Microsoft Sam got the job done, but the sounds it produced were robotic, stiff, and unnatural. The tools we have at our disposal today are considerably more advanced, largely thanks to deep learning.
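
For comparison, that older, rule-driven style of synthesis is still easy to try today. Here is a minimal sketch using the open-source pyttsx3 library, which simply drives the operating system's built-in voices; the settings shown are illustrative, not a recommendation:

```python
# A minimal sketch of classic, rule-driven text-to-speech using the
# open-source pyttsx3 library (pip install pyttsx3). It drives the OS's
# built-in voices: SAPI5 on Windows, NSSpeechSynthesizer on macOS,
# eSpeak on Linux.
import pyttsx3

engine = pyttsx3.init()
engine.setProperty("rate", 150)    # speaking speed, in words per minute
engine.setProperty("volume", 1.0)  # 0.0 (silent) to 1.0 (full)

engine.say("This is what rule-based speech synthesis sounds like.")
engine.runAndWait()
```

Voices produced this way still sound noticeably mechanical; deep learning is what changed that.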

Deep learning is a method of machine learning based on artificial neural networks. Thanks to these networks, modern AI can process data in a way loosely analogous to how neurons in the human brain interpret information: the more faithfully it models human patterns, the better it becomes at reproducing them.
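
To make "artificial neural network" less abstract, here is a minimal sketch in PyTorch. The layer sizes are arbitrary, chosen only to suggest speech-like data; the point is that a deep network is literally just layers of simple units stacked together, trained by repeatedly nudging their weights to reduce error:

```python
# A minimal sketch of a deep neural network in PyTorch: each Linear
# layer is a bank of artificial "neurons," and stacking several of them
# is what makes the network "deep." Sizes here are illustrative.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(80, 256),   # input layer: e.g. 80 acoustic features per frame
    nn.ReLU(),            # non-linearity lets the network learn complex patterns
    nn.Linear(256, 256),  # hidden layer
    nn.ReLU(),
    nn.Linear(256, 80),   # output layer: e.g. a predicted spectrogram frame
)

# One training step: nudge the weights so outputs better match targets.
x = torch.randn(32, 80)       # a batch of 32 dummy input frames
target = torch.randn(32, 80)  # matching dummy targets
loss = nn.functional.mse_loss(model(x), target)
loss.backward()               # backpropagation: how the network "learns"
```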

That, in a nutshell, is how modern AI voice generators work. The more speech data they are exposed to, the more adept they become at emulating human speech. Thanks to relatively recent advances in this technology, state-of-the-art text-to-speech software can replicate a voice almost perfectly from the recordings it is fed.
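
To illustrate how low the barrier has become, here is a hedged sketch of voice cloning with the open-source Coqui TTS library and its XTTS v2 model; the file paths are placeholders, and other tools differ in detail:

```python
# A sketch of modern neural voice cloning using the open-source Coqui
# TTS library (pip install TTS). The pretrained model mimics the voice
# heard in a short reference recording; file paths are placeholders.
from TTS.api import TTS

# Download and load a pretrained multilingual voice-cloning model.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# Generate speech in the voice found in reference_voice.wav.
tts.tts_to_file(
    text="This sentence was never actually spoken by that person.",
    speaker_wav="reference_voice.wav",  # a short sample of the target voice
    language="en",
    file_path="cloned_output.wav",
)
```

A few lines of code and a short recording are essentially all it takes, which is exactly what makes the technology so attractive to abusers.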

How Threat Actors Use AI Voice Generators

Unsurprisingly, this technology is being abused by threat actors. And not just by cybercriminals in the typical sense of the word, but also by disinformation agents, scammers, black hat marketers, and trolls.

The moment ElevenLabs released a beta version of its text-to-speech software in January 2023, far-right trolls on the message board 4chan began abusing it. Using the advanced AI, they reproduced the voices of individuals like David Attenborough and Emma Watson, making it seem as though the celebrities were going on vile, hateful tirades.

As Vice reported at the time, ElevenLabs conceded that people were misusing its software, in particular voice cloning. This feature allows anyone to "clone" another person's voice; all you need to do is upload a one-minute recording, and let the AI do the rest. Presumably, the longer a recording is, the better the output.

In March 2023, a viral TikTok video caught the attention of The New York Times. In the video, famous podcaster Joe Rogan and Dr. Andrew Huberman, a frequent guest on The Joe Rogan Experience, were heard discussing a "libido-boosting" caffeine drink. The video made it appear as though both Rogan and Huberman were unequivocally endorsing the product. In reality, their voices were cloned using AI.

Around the same time, the Santa Clara, California-based Silicon Valley Bank collapsed due to risk management mistakes and other issues. It was shut down by California regulators, with the Federal Deposit Insurance Corporation (FDIC) taking over as receiver. This was the largest bank failure in the United States since the 2008 financial crisis, so it sent shockwaves across global markets.

What contributed to the panic was a fake audio recording of US President Joe Biden. In the recording, Biden was apparently heard warning of an imminent "collapse," and directing his administration to "use the full force of the media to calm the public." Fact-checkers like PolitiFact were quick to debunk the clip, but it's likely millions had heard it by that point.

If AI voice generators can be used to impersonate celebrities, they can also be used to impersonate regular people, and that's exactly what cybercriminals have been doing. According to ZDNet, thousands of Americans fall for scams known as vishing, or voice phishing, every year. One elderly couple made national headlines in 2023 when they received a phone call from their "grandson," who claimed to be in prison and asked for money.

If you've ever uploaded a YouTube video (or appeared in one), participated in a large group call with people you don't know, or uploaded your voice to the internet in some capacity, you or your loved ones could theoretically be in danger. What would stop a scammer from uploading your voice to an AI generator, cloning it, and contacting your family?

AI Voice Generators Are Disrupting the Cybersecurity Landscape


It doesn't take a cybersecurity expert to recognize how dangerous AI can be in the wrong hands. And while it is true that the same can be said for all technology, AI is a unique threat for several reasons.

For one, it is relatively new, which means we don't really know what to expect from it. Modern AI tools let cybercriminals scale and automate their operations in unprecedented ways, while exploiting the public's relative unfamiliarity with the technology. Generative AI also enables threat actors with little technical knowledge or skill to create malicious code, build scam sites, spread spam, write phishing emails, generate realistic images, and produce endless hours of fake audio and video content.

Crucially, this works both ways: AI is also used to protect systems, and likely will be for decades to come. It wouldn't be unreasonable to expect a kind of AI arms race between cybercriminals and the cybersecurity industry, since the same capabilities that power attacks can be turned to defense.
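
To make the defensive side concrete: a synthetic-speech detector is itself just a machine learning model. The sketch below assumes you already have labeled genuine and AI-generated clips; the file names are hypothetical, and MFCC features with a random forest are only a common baseline, not a production detector:

```python
# A hedged sketch of defensive AI: training a simple classifier to flag
# synthetic speech. The file lists are hypothetical, and the feature and
# model choices are a baseline, not a production system.
# Requires: pip install librosa scikit-learn numpy
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def clip_features(path: str) -> np.ndarray:
    """Summarize a clip as the mean of its MFCC frames (a common baseline)."""
    audio, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)

# Hypothetical labeled data: 0 = genuine recording, 1 = AI-generated.
real_clips = ["real_01.wav", "real_02.wav"]
fake_clips = ["fake_01.wav", "fake_02.wav"]

X = np.array([clip_features(p) for p in real_clips + fake_clips])
y = np.array([0] * len(real_clips) + [1] * len(fake_clips))

detector = RandomForestClassifier(n_estimators=100).fit(X, y)
print(detector.predict([clip_features("suspect_call.wav")]))  # [1] => flagged
```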

For the average person, the advent of widespread generative AI calls for a radical rethinking of security practices. As exciting and useful as AI might be, it can at the very least blur the line between what is real and what isn't, and at worst exacerbate existing security issues and create new space for threat actors to maneuver in.

Voice Generators Show the Destructive Potential of AI

As soon as ChatGPT hit the market, talks of regulating AI ramped up. Any attempt at constraining this technology would probably require international cooperation to a degree we haven't seen in decades, which makes it unlikely.

The genie is out of the bottle, and the best we can do is get used to it. That, and hope the cybersecurity sector adjusts accordingly.