How Can Computer Vision Be Used to Detect Phishing Attacks?

The rise of artificial intelligence platforms like ChatGPT has seen the technology thrust into the public domain. Whether you love it, loathe it, or fear it, AI is here to stay. But AI represents more than just a clever chatbot. Behind the scenes, it is being used in many innovative ways.

One such way is the use of AI-powered computer vision (CV) as another layer of cybersecurity. Let’s take a look at how CV is helping against phishing attacks.

What Is Computer Vision?

Computer vision is similar in concept to large language models like GPT-4. Tools such as ChatGPT and Bing Chat are built on models trained on huge collections of text, which they use to generate human-like responses to user inputs. CV applies the same concept, only with a massive repository of image data.

But CV is more complex than just having a huge database of visuals. Context is a critical factor that needs to be included in the equation.

The large language models behind AI chatbots work by using deep learning to understand factors like context. Similarly, CV uses deep learning to understand the context of images. It could be described as human vision at computer speeds.

But how does CV help detect phishing attacks?

How Computer Vision Is Being Used to Detect Phishing Attacks

Phishing is one of the most common tactics used by scammers. Traditional methods of detecting it are far from perfect, and the threats are becoming increasingly sophisticated. CV aims to plug one of the known vulnerabilities: time. More specifically, it addresses the reliance of more “traditional” methods on blacklists.

The issue here is that keeping blacklists up to date is problematic. Even a few hours between a phishing website being launched and its inclusion on a blacklist is long enough for a lot of damage to be done.

CV has no reliance on blacklists, nor does it detect embedded malicious code. Instead, it uses several techniques to flag suspicious items.

Images are collected from relevant emails, web pages, or other sources that may contain threats. These are then processed using computer vision.
The image processing stage examines four main elements: logo/trademark detection, object/scene detection, text detection, and visual search.
The findings from these elements are combined using a process called “Risk Elements Aggregation”, and the combined result is used to flag suspicious items.
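To make the aggregation step more concrete, here is a minimal Python sketch of how scores from the four elements might be combined. The article does not describe how any real product implements “Risk Elements Aggregation”, so the weights, the threshold, and the names Finding and aggregate_risk are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical weights: how much each element contributes to the overall risk.
WEIGHTS = {
    "logo": 0.35,
    "object": 0.25,
    "text": 0.25,
    "visual_search": 0.15,
}


@dataclass
class Finding:
    element: str    # one of the keys in WEIGHTS
    score: float    # detector confidence between 0.0 and 1.0
    note: str = ""  # optional human-readable explanation


def aggregate_risk(findings, threshold=0.5):
    """Combine per-element scores into (weighted risk, flagged?)."""
    best = {}
    for f in findings:
        if f.element not in WEIGHTS:
            raise ValueError(f"unknown element: {f.element}")
        clamped = min(max(f.score, 0.0), 1.0)
        # Keep only the strongest score seen for each element type.
        best[f.element] = max(best.get(f.element, 0.0), clamped)
    risk = sum(WEIGHTS[name] * score for name, score in best.items())
    return risk, risk >= threshold


# Example: a bank logo, a password prompt, and a form drawn as an image.
risk, flagged = aggregate_risk([
    Finding("logo", 0.9, "bank logo detected"),
    Finding("text", 0.8, "trigger word found"),
    Finding("object", 0.6, "form rendered as a graphic"),
])
```

A weighted sum is only one possible design. It has the advantage that no single weak signal flags a message on its own, while several moderate signals together can.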

Let’s have a closer look at how CV finds clues in the elements it examines.

Logo/Trademark Detection

Brand spoofing is a common technique used by scammers. CV is set up to detect logos that are commonly used by scammers, and it can also combine this information with the content and priority of the email.

For example, an email marked as urgent with the logo of a bank could be flagged as potentially fraudulent. It can also check the veracity of the logo against expected results from the CV data repository.
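The bank example above can be sketched as a simple scoring rule. This is only an illustration of the idea, not how any particular product works: the category list, the point values, the 0.9 similarity cut-off, and the function name logo_signal are all assumptions, and the logo detector that would supply brand_category and logo_similarity is taken as given.

```python
# Hypothetical categories of brands that are frequently impersonated.
HIGH_RISK_CATEGORIES = {"bank", "payment", "delivery"}


def logo_signal(brand_category, logo_similarity, marked_urgent):
    """Return a 0.0-1.0 risk score for one detected logo.

    brand_category:  category of the detected brand, or None if no logo was found
    logo_similarity: how closely the logo matches the stored reference (0.0-1.0)
    marked_urgent:   True when the email carries a high-priority flag
    """
    if brand_category is None:
        return 0.0
    score = 0.2  # a brand logo on its own is only a weak signal
    if brand_category in HIGH_RISK_CATEGORIES:
        score += 0.3
    if marked_urgent:
        score += 0.3  # urgency combined with a brand logo raises suspicion
    if logo_similarity < 0.9:
        score += 0.2  # the logo is an imperfect copy of the reference
    return min(score, 1.0)


# An urgent email carrying a slightly distorted bank logo scores highest.
urgent_bank_score = logo_signal("bank", 0.8, marked_urgent=True)
```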

Object Detection

Scammers will often convert objects such as buttons or forms into graphics. This is done using a variety of graphical and code techniques designed to “muddy the waters”. Additionally, encrypted scripts can be used to perform actions such as creating forms, but only after the email or website has been rendered.

Object detection looks for visual clues after a website or email has been rendered. It can detect objects such as buttons or forms even when they are presented as graphics. Also, because the check happens after rendering, elements that were delivered in encrypted form are inspected too.

Text Detection

Similarly, text can be disguised using a range of techniques. Among the favored tactics used by scammers are:

Padding words with random letters that are removed when the page or email is rendered.
Disguising words by misspelling them. A common example is “Login”, which can be easily disguised by switching the L for a capital I, as in “Iogin”. Could you tell?
Converting text to graphics.

CV can use text analysis (a bit like Optical Character Recognition but on steroids!) to detect trigger words such as password, account details, and login. Again, because it runs after rendering, all the text can be captured and scanned.
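As a rough illustration of the trigger-word check, the Python sketch below assumes the text has already been extracted from the rendered page or email (the OCR step itself is not shown). It collapses look-alike characters such as a capital I and a lowercase l before searching, so “Iogin” is still caught. The character table and the names skeleton and find_triggers are hypothetical, and the table is far smaller than anything a real system would need.

```python
import re

# Trigger words taken from the examples in the text above.
TRIGGERS = ("password", "account details", "login")

# Collapse look-alike characters onto one canonical character.
# This short table is an illustrative assumption, not a complete list.
_CONFUSABLES = str.maketrans(
    {"I": "l", "i": "l", "1": "l", "|": "l", "0": "o", "O": "o"}
)


def skeleton(text):
    """Map look-alike characters together, lowercase, and tidy whitespace."""
    # Translate before lowercasing so a capital I becomes l, not i.
    collapsed = text.translate(_CONFUSABLES).lower()
    return re.sub(r"\s+", " ", collapsed).strip()


def find_triggers(rendered_text):
    """Return {trigger: disguised} for every trigger word that is present.

    disguised is True when the word only shows up after look-alike
    characters have been collapsed, which suggests deliberate obfuscation.
    """
    plain = re.sub(r"\s+", " ", rendered_text.lower())
    skel = skeleton(rendered_text)
    hits = {}
    for trigger in TRIGGERS:
        if skeleton(trigger) in skel:
            hits[trigger] = trigger not in plain
    return hits


hits = find_triggers("Please Iogin to confirm your passw0rd")
```

Here both “login” and “password” are reported as disguised, because neither appears in the text as written. Simple substring matching like this will produce false positives on real content, so a result of this kind is better treated as one signal among several than as a verdict.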

Visual Search

While this is part of the CV anti-phishing toolkit, it does rely on reference data to work. Therefore, it is only as good as the data it has on record. This leaves it with the same Achilles heel as any other system that relies on a blacklist.

It works by holding a “template” of known good images (KGI) and known bad images (KBI) in the image database. This information can then be used to perform comparisons to detect anomalies.

Is Computer Vision a Standalone Phishing Protection System?

The short answer is "no." Currently, CV acts as an extra layer of security and is only a viable option for commercial enterprises.

However, for these enterprises, CV adds a new layer of security that can scan objects in real time without relying on blacklists or on detecting coded threats. And in the ongoing arms race between scammers and security professionals, this can only be a good thing.

Looking ahead, the sudden and meteoric rise of AI-powered chatbots like ChatGPT shows how difficult predictions are when discussing any form of AI. But let’s have a shot at it anyway!

What Is the Future of Computer Vision as an Anti-Phishing Weapon?

While it is unlikely to have the same dramatic impact as AI-powered chatbots, CV anti-phishing is already making steady progress along what is known as the technology adoption curve.

Not so long ago, the technology was the domain of larger enterprises that had the network infrastructure and bandwidth to run it either as a cloud-based solution or as an on-premises service.

This is no longer the case.

More practical subscription services are now opening up to enterprises of any size. Equally critical in the age of cloud computing is the ability to protect any device from any location. This is now an option with many of the services.

However, if you are looking to add this to your home computer, this is not yet a realistic option. "Yet" is the critical word here. The exponential increase in sophistication and availability of AI models will almost certainly bring this functionality to the home user.

The only real question is when.

Computer Vision: Seeing Is Protecting

AI has been in the news a lot recently, and stealing the limelight are platforms like ChatGPT, Bing Chat, and Google Bard. These are disruptive technologies that, when the dust finally settles, will have radically changed how we access information and what we can do with it.

While these are undoubtedly the headline grabbers, less disruptive technologies like CV are quietly making gentle waves in the background. And anything that helps disrupt the growing blight of phishing attacks has to be a good thing.