Technology often makes our lives more convenient and secure. At the same time, however, such advances have unlocked more sophisticated ways for cybercriminals to attack us and render our security systems powerless.

Artificial intelligence (AI) can be utilized by cybersecurity professionals and cybercriminals alike; similarly, machine learning (ML) systems can be used for both good and evil. This lack of a moral compass has made adversarial attacks in ML a growing challenge. So what actually are adversarial attacks? What is their purpose? And how can you protect against them?

What Are Adversarial Attacks in Machine Learning?


Adversarial ML or adversarial attacks are cyberattacks that aim to trick an ML model with malicious input and thus lead to lower accuracy and poor performance. So, despite its name, adversarial ML is not a type of machine learning but a variety of techniques that cybercriminals—aka adversaries—use to target ML systems.

The main objective of such attacks is usually to trick the model into handing out sensitive information, failing to detect fraudulent activity, producing incorrect predictions, or corrupting analysis-based reports. While there are several types of adversarial attacks, deep learning-based systems such as spam detection are a frequent target.
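
To make the idea concrete, here is a minimal sketch of how a small, deliberate perturbation can flip a model's prediction even though the input barely changes. The toy linear classifier, weights, and epsilon below are illustrative stand-ins, not any real system.

```python
import numpy as np

# Toy linear classifier: predicts "spam" (1) if w @ x + b > 0, else "ham" (0).
w = np.array([0.9, -0.4, 0.3])
b = 0.0

def predict(x):
    return int(w @ x + b > 0)

x = np.array([0.1, 0.2, 0.1])       # legitimate input
print(predict(x))                   # 1: the score is 0.04, just above the boundary

# The attacker nudges each feature slightly, in the direction that lowers
# the score the most (against the sign of the corresponding weight).
epsilon = 0.08
x_adv = x - epsilon * np.sign(w)
print(predict(x_adv))               # 0: the prediction flips
```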

You’ve probably heard about an adversary-in-the-middle attack, a newer and more sophisticated phishing technique that involves stealing private information and session cookies, and even bypassing multi-factor authentication (MFA) methods. Fortunately, you can combat these with phishing-resistant MFA technology.

Types of Adversarial Attacks


The simplest way to classify adversarial attacks is to separate them into two main categories: targeted attacks and untargeted attacks. As the names suggest, targeted attacks have a specific target (like a particular person), while untargeted ones don’t have anyone specific in mind: they can target almost anybody. Not surprisingly, untargeted attacks are less time-consuming but also less successful than their targeted counterparts.

These two types can be further subdivided into white-box and black-box adversarial attacks, where the color reflects how much the attacker knows about the targeted ML model. Before we dive deeper into white-box and black-box attacks, let’s take a quick look at the most common types of adversarial attacks.

  • Evasion: Mostly used in malware scenarios, evasion attacks attempt to evade detection by concealing the content of malware-infested and spam emails. Using trial and error, the attacker manipulates data at deployment time and fools the ML model into misclassifying it. Biometric spoofing is one of the most common examples of an evasion attack.
  • Data poisoning: Also known as contaminating attacks, these aim to manipulate an ML model during training or deployment in order to decrease its accuracy and performance. By introducing malicious inputs, attackers disrupt the model and make it hard for security professionals to spot the sample data that is corrupting it (see the sketch after this list).
  • Byzantine faults: This type of attack causes the loss of a system service as a result of a Byzantine fault in systems that require consensus among all their nodes. Once one of its trusted nodes turns rogue, it can launch a denial-of-service (DoS) attack and shut down the system, preventing other nodes from communicating.
  • Model extraction: In an extraction attack, the adversary probes a black-box ML system to extract its training data or—in worst-case scenarios—the model itself. Then, with a copy of the ML model in their hands, an adversary could test their malware against the antimalware/antivirus software and figure out how to bypass it.
  • Inference attacks: As with extraction attacks, the aim here is to make an ML model leak information about its training data. The difference is that the adversary then tries to work out which data set was used to train the system so they can exploit vulnerabilities or biases in it.
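
As a concrete illustration of data poisoning, the sketch below trains a stand-in spam classifier on synthetic data and then retrains it after an assumed attacker relabels a chunk of the spam samples as legitimate. The data, model choice, and flip rate are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic "emails": two feature clusters standing in for ham (0) and spam (1).
X = np.vstack([rng.normal(0.0, 1.0, (500, 5)), rng.normal(1.5, 1.0, (500, 5))])
y = np.array([0] * 500 + [1] * 500)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clean_model = LogisticRegression().fit(X_train, y_train)
print("clean accuracy:", clean_model.score(X_test, y_test))

# The assumed attacker relabels half of the spam samples in the training set
# as ham, nudging the trained filter toward letting spam through.
y_poisoned = y_train.copy()
spam_idx = np.flatnonzero(y_train == 1)
flipped = rng.choice(spam_idx, size=len(spam_idx) // 2, replace=False)
y_poisoned[flipped] = 0

poisoned_model = LogisticRegression().fit(X_train, y_poisoned)
print("poisoned accuracy:", poisoned_model.score(X_test, y_test))  # typically lower
```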

White-Box vs. Black-Box vs. Grey-Box Adversarial Attacks

What sets these three types of adversarial attacks apart is the amount of knowledge adversaries have about the inner workings of the ML systems they’re planning to attack. While the white-box method requires exhaustive information about the targeted ML model (including its architecture and parameters), the black-box method requires no such information; the attacker can only observe the model’s outputs.

The grey-box model, meanwhile, sits between these two extremes: adversaries have some information about the data set or other details of the ML model, but not all of it.
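
The distinction is easiest to see in code. The hypothetical sketch below assumes a black-box setting: the attacker can only call a predict() function (standing in for a remote prediction API) and observe the returned label, so it resorts to randomly probing the input within a small budget until the prediction flips.

```python
import numpy as np

rng = np.random.default_rng(1)

# The model's internals are hidden from the attacker; only predict() is exposed.
w, b = np.array([0.9, -0.4, 0.3]), 0.0

def predict(x):                     # stand-in for a remote prediction API
    return int(w @ x + b > 0)

def black_box_attack(x, epsilon=0.1, max_queries=1000):
    """Randomly probe within an epsilon budget until the label flips."""
    original = predict(x)
    for _ in range(max_queries):
        candidate = x + rng.uniform(-epsilon, epsilon, size=x.shape)
        if predict(candidate) != original:
            return candidate        # an adversarial example found by queries alone
    return None

x = np.array([0.1, 0.2, 0.1])
print(black_box_attack(x))
```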

How Can You Defend Machine Learning Against Adversarial Attacks?


While humans are still the critical component in strengthening cybersecurity, AI and ML have learned how to detect and prevent malicious attacks: they can detect threats more accurately, monitor user activity, identify suspicious content, and much more. But can they push back against adversarial attacks and protect ML models?

One way we can combat cyberattacks is to train ML systems to recognize adversarial attacks ahead of time by adding adversarial examples to their training data, an approach known as adversarial training.
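
A rough sketch of that approach is shown below, assuming a simple PyTorch classifier and FGSM-style perturbations; the architecture, epsilon, and random batch are illustrative placeholders rather than a recommended setup.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
epsilon = 0.1                                  # illustrative perturbation budget

def fgsm(x, y):
    """Craft adversarial copies of a batch with the fast gradient sign method."""
    x = x.clone().detach().requires_grad_(True)
    loss_fn(model(x), y).backward()
    return (x + epsilon * x.grad.sign()).detach()

def train_step(x, y):
    x_adv = fgsm(x, y)                         # adversarial versions of the batch
    batch_x = torch.cat([x, x_adv])            # train on clean + adversarial inputs
    batch_y = torch.cat([y, y])
    optimizer.zero_grad()
    loss = loss_fn(model(batch_x), batch_y)
    loss.backward()
    optimizer.step()
    return loss.item()

# One step on random tensors standing in for a real training batch.
print(train_step(torch.randn(32, 20), torch.randint(0, 2, (32,))))
```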

Unlike this brute-force approach, defensive distillation trains a secondary (distilled) model to match the probability outputs of a primary model rather than hard class labels, which smooths the secondary model's decision surface. ML models trained with defensive distillation are less sensitive to adversarial samples, which makes them less susceptible to exploitation.
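
A minimal sketch of the idea, assuming the teacher network has already been trained, might look like this; the temperature value and network sizes are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

temperature = 20.0                              # a high temperature softens the labels
teacher = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))  # assumed pre-trained
student = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))  # the distilled model
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

def distill_step(x):
    # Soft labels: the teacher's class probabilities at a high temperature.
    with torch.no_grad():
        soft_labels = F.softmax(teacher(x) / temperature, dim=1)
    # The student learns to reproduce those soft labels (cross-entropy with soft targets).
    log_probs = F.log_softmax(student(x) / temperature, dim=1)
    loss = -(soft_labels * log_probs).sum(dim=1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

print(distill_step(torch.randn(32, 20)))        # one step on a stand-in batch
```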

We could also constantly modify the algorithms that ML models use for data classification, which would make adversarial attacks less successful because attackers are left probing a moving target.

Another notable technique is feature squeezing, which cuts back the search space available to adversaries by “squeezing out” unnecessary input features (for example, by reducing the color bit depth of an image). Here, the aim is to minimize false positives and make the detection of adversarial examples more effective.
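
A hypothetical feature-squeezing detector for an image classifier (with pixel values assumed to be in [0, 1]) could look like the sketch below; the bit depth, threshold, and model_probs callable are illustrative assumptions rather than a fixed recipe.

```python
import numpy as np

def squeeze_bit_depth(x, bits=4):
    """Quantize pixel values in [0, 1] to the given color bit depth."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def looks_adversarial(model_probs, x, threshold=0.5):
    """Flag inputs whose predictions change a lot once the input is squeezed.

    model_probs: a callable returning the model's probability vector for an input.
    """
    gap = np.abs(model_probs(x) - model_probs(squeeze_bit_depth(x))).sum()
    return gap > threshold

# Tiny usage example with a dummy stand-in for a real classifier.
dummy_probs = lambda img: np.array([img.mean(), 1.0 - img.mean()])
image = np.random.default_rng(2).random((8, 8))
print(looks_adversarial(dummy_probs, image))
```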

Protecting Machine Learning and Artificial Intelligence

Adversarial attacks have shown us that many ML models can be shattered in surprising ways. After all, adversarial machine learning is still a new research field within the realm of cybersecurity, and it comes with many complex problems for AI and ML.

While there isn’t a magical solution for protecting these models against all adversarial attacks, the future will likely bring more advanced techniques and smarter strategies for tackling this terrible adversary.