Google's ReCAPTCHAs Also Capture Your Private Information

CAPTCHAs are great for security, but terrible for your privacy.

Interesting fact: you rarely encounter an original CAPTCHA these days. They've largely been supplanted by reCAPTCHA, a system owned by search engine giant Google. And in an effort to stop spambots, reCAPTCHAs have evolved so much that they're now a threat to your privacy.

What Are reCAPTCHAs?

A Completely Automated Public Turing Test to Tell Computers and Humans Apart (CAPTCHA) was a useful tool in stopping bots from spamming forms. Traditional CAPTCHAs skewed text in an effort to make it unreadable for malicious software. Humans could decipher it, however, so these acted as filters.

Spambots got smarter. CAPTCHAs had to evolve. They changed into reCAPTCHAs, developed by the same team that originally came up with the tests. Google acquired the project in 2009. This raised some eyebrows because many users are doubtful of Google's intentions.

ReCAPTCHAs were nonetheless used to great effect. They advanced machine learning. Instead of random letters, a reCAPTCHA consisted of words that Google's bots couldn't decipher. The knowledge gained from these upgraded tests helped automate the digitization of much classic literature for the Google Books service.

Then algorithms overtook humans. ReCAPTCHAs became redundant, which is why Google introduced NoCAPTCHA reCAPTCHA.

What Is NoCAPTCHA reCAPTCHA?

Have you ever clicked on the "I'm not a robot" button and been approved without having to insert any additional information? That's because the site uses reCAPTCHA v2 or later.

With its second version, Google introduced verification based on other metrics: namely, whether the user's other activities on the site are indicative of a human or a bot. CAPTCHAs are only presented if that check fails.

Then along came reCAPTCHA v3. This update eliminated the "I'm not a robot" checkbox. It's also designed to streamline the process, so the user experience is much nicer.

This uses the same foundation as version 2, in that it assesses activity across the site. It goes further than that, though, by digging deeper into your online movements.

It further cuts the risk of CAPTCHA farmers (people employed to break traditional CAPTCHAs) making it through the security measure. With v3, their task would essentially be to guess how normal users interact with the site; but due to v3's wider scope, a more comprehensive online profile must be in place too.

Some 4.5 million sites already use reCAPTCHA (including a quarter of the top-ranking 10,000). More than 650,000 of these have reCAPTCHA v3 installed.

You'll find different versions of reCAPTCHA across the internet because website admins may still use outdated plug-ins. ReCAPTCHA v1 (essentially a basic CAPTCHA) is bad for a site's security, but better for your privacy.

Why reCAPTCHA Is Bad for Your Privacy

How does reCAPTCHA v3 work and why is that a negative thing for your privacy?

One of the ways v3 checks validity is by examining whether you already have a Google cookie stored in your browser. Cookies are small pieces of data a site saves in your browser so it can recognize you on later visits. Sign into a Google account, and reCAPTCHA likes you already.

The rationale is sound: anyone with a Google account is more likely to be a real human, not a bot.

Admins are encouraged to embed the reCAPTCHA code on all web pages (protected through changing encryption keys), so the service can more accurately gauge typical activity. That raises questions about the data collected and what Google does with it.

Based on these signals, reCAPTCHA assigns each visitor a score between 0.0 and 1.0, marking them low or high risk. A score of 1.0 means you're almost certainly human. A score of 0.0 means you're almost certainly a spambot. Generally, low-risk users won't need to go through any further validation.
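To make the scoring concrete, here is a minimal sketch of how a site might act on a score once it has one. It is an illustration under stated assumptions, not Google's code: the function name `decide` and the 0.5 cut-off are hypothetical choices, and the dictionary keys (`success`, `score`, `action`) are assumed to mirror the fields a site receives when it asks reCAPTCHA to verify a visitor.

```python
from typing import Any, Mapping

# Hypothetical cut-off: each site owner picks their own value.
DEFAULT_THRESHOLD = 0.5


def decide(result: Mapping[str, Any], expected_action: str,
           threshold: float = DEFAULT_THRESHOLD) -> str:
    """Return "allow" or "challenge" for an already-parsed verification result."""
    # A failed verification (for example, a bad or expired token) is never trusted.
    if not result.get("success", False):
        return "challenge"
    # The action name should match the one the page asked to be scored.
    if result.get("action") != expected_action:
        return "challenge"
    # Missing scores are treated as 0.0, i.e. the riskiest possible visitor.
    score = float(result.get("score", 0.0))
    return "allow" if score >= threshold else "challenge"


# Example: a high score passes silently, a low score triggers extra checks.
print(decide({"success": True, "score": 0.9, "action": "login"}, "login"))
print(decide({"success": True, "score": 0.1, "action": "login"}, "login"))
```

Where the threshold sits is the site's decision. A stricter value blocks more bots but also sends more genuine visitors to an extra check.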

This scoring also means anyone using a Virtual Private Network (VPN) is automatically flagged as high risk. And yet many, including MakeUseOf, recommend you use a VPN to enhance your online privacy. Activity data isn't captured because visitors are anonymized. VPNs beat region locks and censorship. They can save you cash. And they're a barrier against cybercriminals.

In fact, there are loads of reasons you should use a VPN. It's a considerable setback for reCAPTCHA to penalize those who use one. It's not a major shock, however: Google relies on information about its users for revenue.

What Does Google Do With Personal Information?

How does Google use the data collected?

The service gathers software and hardware information about site visitors, like IP address, browser plug-ins, and the device you're using.

Google assures users that anything collected through the reCAPTCHA API isn't used to ascertain your interests. It's not used for ads---which might surprise you. The company says:

"The information collected in connection with your use of the service will be used for improving reCAPTCHA and for general security purposes. It will not be used for personalized advertising by Google."

Of course, Google isn't the only company that tracks you. Look at social media plug-ins, used to share articles on Facebook, Twitter, and the like. Some of these widgets collect visitor information too---meaning it doesn't matter if you've got a profile: Facebook can still track you.

There's nothing about reCAPTCHA v3 in Google's Terms of Service. This is despite reCAPTCHAs linking to those terms. It means we just have to take Google's word for it.

What Does the Future Hold for CAPTCHA?

The core issue, aside from privacy concerns, is that even reCAPTCHA v3 isn't good enough. A research team found that artificial intelligence still had a 90 percent success rate at beating it.

There's added pressure now because we're aware of potential privacy violations.

The problem is that human diversity means finding common solutions is difficult. Image-based CAPTCHAs typically ask you to look for road signs, but a trial tested whether deciphering facial expressions could also work. As you can imagine, it didn't.

Game-based tests seem a good option. These could be as simple as moving puzzle pieces into the right slots, perhaps rotating elements along the way. Without instructions, bots could struggle with making such connections. However, the system would rely on human logic---which isn't exactly reliable.

Amazon patented an interesting, if seemingly flawed, notion in 2017. It posited that human fallibility is the key. The "Turing Test via Failure" would present challenges most people would find too difficult to complete, especially in a short timeframe. Humans do it wrong and get verified. Bots always give the correct answers (or that's the theory).
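The idea can be sketched in a few lines. This is only a toy illustration of the logic described above, not the method in Amazon's patent: the function name `looks_human`, its parameters, and the five-second limit are all hypothetical.

```python
def looks_human(answered_correctly: bool, seconds_taken: float,
                time_limit: float = 5.0) -> bool:
    """Treat a correct answer delivered within the time limit as a sign of a bot."""
    # Only a fast AND correct response counts as suspicious.
    solved_in_time = answered_correctly and seconds_taken <= time_limit
    return not solved_in_time


# Example: a wrong answer verifies the visitor; a quick correct one does not.
print(looks_human(answered_correctly=False, seconds_taken=2.0))
print(looks_human(answered_correctly=True, seconds_taken=1.0))
```

The flaw is easy to see in this form: a bot that deliberately answers wrongly, or slowly, would pass the check.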

Increasing Google's Monopoly?

Luis von Ahn, the CAPTCHA co-creator who worked with a team at Carnegie Mellon University, argues reCAPTCHA's acquisition by Google is fair because many already assumed the internet behemoth owned the service. Version 3 makes it clear that reCAPTCHAs favor Google users. Is this another way the company is getting a stranglehold on the internet?

Or are its intentions true?

Either way, if you feel at odds with Google, you could switch to a more private browser. Nonetheless, in our assessment of mainstream browser security, Chrome came out on top…