The internet is the world's greatest surveillance tool.

Or at least that's how it often feels. We've always known that we're being watched online, but many of us thought it was just to sell us more. Post-Snowden it became clear that governments and companies around the world use every last drop of data they can find in order to surveil and profile us.

The NSA wants to know every digital move we make. Amazon and Google are installing surveillance devices in our homes. Facebook wants to profile and commodify our lives. Now there is another thing to add to the ever-expanding list. Hundreds of websites want to know everything we type, even if we don't submit it to them.

Somebody's Watching Me

Amazon, Facebook, and Google have all trained us to expect that if we search for something, it'll be magically recommended to us in an ad. Web tracking is often used in order to build up a profile of the sites we visit, what our interests are, and most importantly, how they can manipulate us into spending more. We are often distrustful of this type of tracking -- especially since the companies that build profiles of us can't be trusted with that information.

However, tracking is often done for a more mundane reason: analytics. Website developers want to offer you a useful, error-free site. To do that, they need data showing what works and what doesn't.

UX questions like "When do users click that button?" and "How long do readers spend on our site?" can be answered through analytics. Analytics firms angling for business are keen to prove their worth by how much data they can capture. In a quest to improve their data capturing prowess, the industry created Session Replay Scripts.

Session Replay Scripts

Traditional analytics works with aggregates so website owners can see how many clicks there were on a specific area of the site, for instance. However, it doesn't show how that click was made, how long it took, or what the user's behavior was before the click. Session replay scripts allow the analytics firms to dive into individual browsing sessions. Purportedly this is to improve the customer experience, but the data collected often exceeds reasonable expectations.

Session replay scripts are similar to screen recordings. The website can see everything you do, from mouse movements to the words you type. Unfortunately, this also includes what you type but choose not to submit. Consider how often you've typed something into a search box, thought twice about it, and promptly deleted the text. With a session replay script running, the website would already have captured your now-deleted and never-submitted text.
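To make the mechanism concrete, here is a minimal Python simulation of how a replay-style recorder could log every change to a text field as it happens, rather than waiting for a form to be submitted. The `SessionRecorder` class, its `on_input` method, and the field name `search` are all hypothetical names invented for this sketch. Real session replay scripts run as JavaScript in the browser and will differ in detail.

```python
from dataclasses import dataclass, field


@dataclass
class SessionRecorder:
    """Hypothetical recorder: stores each field change the moment it occurs."""
    events: list = field(default_factory=list)

    def on_input(self, field_name: str, value: str) -> None:
        # Called on every change to a field, not only on form submission.
        self.events.append((field_name, value))


def simulate_typing(recorder: SessionRecorder, field_name: str, text: str) -> None:
    """Simulate a visitor typing `text` one character at a time, then deleting it."""
    for i in range(1, len(text) + 1):
        recorder.on_input(field_name, text[:i])
    # The visitor thinks twice and clears the box without submitting.
    recorder.on_input(field_name, "")


recorder = SessionRecorder()
simulate_typing(recorder, "search", "secret")

# The box ends up empty, yet the full text is already in the recorder's log.
captured = [value for _, value in recorder.events]
print(captured)
```

The point of the sketch is that the visitor never presses "submit" and the field ends up empty, but the complete text still appears in the recorded events.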

So, What's the Problem?

You may be wondering why you've never heard of this invasive tracking before. That's because the firms that deploy session replays have chosen not to inform you. Their silence suggests they realize that people may not be comfortable with the level of data captured.

There is no obvious sign that a given website is using session replays -- so how do you know which are? Researchers from Princeton's Center for Information Technology Policy (CITP) analyzed the Alexa Top 1 Million websites for evidence of session recordings.

They found that nearly 100,000 websites (or 10 percent of the Alexa Top 1 Million) contained scripts which enable session recordings. That's not to say that every single one of those sites performs the tracking -- each site has the ability to disable the session recordings. However, the process of disabling the service is fairly complex with most analytics providers, and so it is quite possible that session replays are being recorded.

Among the sites with replay-capable analytics scripts, the researchers were able to produce evidence that close to 10,000 were actively recording sessions. That list included some big names, among them Microsoft, Walgreens, Intel, and the Australian government.

How to Protect Yourself

Analytics isn't inherently bad. Arguably it is thanks to analytics that we have fast, responsive modern websites that work seamlessly across multiple devices. One of the major concerns with session replay scripts is that you have no awareness that you are being tracked. Imagine how unsettled you'd feel if you woke up one day to discover security cameras dotted around your home. Failing to disclose their presence implies that the scripts, and the data they record, may be used for nefarious purposes.

This is particularly troubling for websites where you have to enter confidential information like credit card numbers and passwords, which are captured in plain text by the session replays. This further complicates matters as your confidential information is now handled by multiple companies, who may not secure it as they would other sensitive information. The companies behind the tracking would likely claim that the use of this data is covered in their privacy policy.

However, it is unreasonable and unrealistic to expect a visitor to read the website's privacy policy, find the site's analytics firm, and read their privacy policy too. Of course, being unreasonable doesn't prevent these firms from operating in a morally ambiguous manner.

So, how do you protect yourself? Sadly, in most instances you won't be able to.

Session replay scripts come in two forms: client-side and server-side. The client-side scripts can be blocked by ad-blockers and tracking prevention add-ons. Server-side scripts cannot be blocked, but are unable to perform full recordings. The most common approach is a hybrid of the two, where even blocking client-side scripts won't prevent the recordings.
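As a rough illustration of how client-side blocking works, the Python sketch below checks a script's URL against a list of blocked hosts before the script would be allowed to load. The `BLOCKED_HOSTS` set, the `is_blocked` function, and the `.example` domains are hypothetical placeholders, not real providers or any particular blocker's rules. Real blockers use much larger, community-maintained filter lists and richer matching.

```python
from urllib.parse import urlparse

# Hypothetical blocklist; these are placeholder hosts, not real providers.
BLOCKED_HOSTS = {"replay.example", "analytics.example"}


def is_blocked(script_url: str, blocked_hosts: set = BLOCKED_HOSTS) -> bool:
    """Return True if the script's host is a blocked host or a subdomain of one."""
    host = (urlparse(script_url).hostname or "").lower()
    return any(host == b or host.endswith("." + b) for b in blocked_hosts)


# A third-party replay script is matched by the list and stopped.
print(is_blocked("https://cdn.replay.example/recorder.js"))
# A script served from the site's own domain is not on the list, so it loads.
print(is_blocked("https://shop.example/static/app.js"))
```

This kind of check only catches scripts whose host is already on the list. A script served from the site's own domain passes straight through, which is one reason blocking alone offers incomplete protection.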

Ultimately, the best protection is to be aware that session replay exists, and to be wary of what you type anywhere on the internet.

Peak Surveillance

Session replay scripts expose what we previously believed to be private information held only in our browsers. Unfortunately, it's far from the only information our browsers leak about us. The currency of the digital economy is data, which gives every company an incentive to vacuum up as much information about you as it can. Remain cautious with your data, and be sure to read the privacy policy -- as tedious as that may be. Taking precautions and maintaining good cyber hygiene are your best defenses against abuse of your data.

While the prevalence of session replays is troubling, it should be put into perspective. There is currently no evidence that data has been compromised by this practice. Equally, there are legitimate reasons for using session replays that will allow website owners to continue to make the internet easier to use -- even if their end goal is just to make you spend more money.

How do you feel about the companies that spy on your typing? Do you think the internet is a huge surveillance tool? Or do you think the fear is overblown? Let us know in the comments!