10 Real Examples of When Data Harvesting Exposed Your Personal Info

The tension between user privacy and companies' advertising revenue continues to make headlines as Facebook and other companies are brought to account for their use of personal data.

But this is not a recent trend. Companies have repeatedly compromised personal data over the years in the push for growth and revenue. Here are some noteworthy occasions when data harvesting put your personal information at risk.

1. LocalBlox Leaves 48 Million Records Exposed

LocalBlox is a social media data aggregator that recently made headlines for all the wrong reasons, after cybersecurity firm UpGuard discovered that it had left the data of 48 million accounts exposed.

"The UpGuard Cyber Risk Team can now confirm that a cloud storage repository containing information belonging to LocalBlox, a personal and business data search service, was left publicly accessible, exposing 48 million records of detailed personal information on tens of millions of individuals, gathered and scraped from multiple sources," UpGuard stated in a report.

But the leaked data went well beyond social media details. It included real names, physical addresses, birthdays and more, alongside details from a variety of networks such as LinkedIn, Facebook and Twitter.

As UpGuard points out, criminals use this data for a multitude of scams, including socially engineered phishing attacks, social manipulation, and identity theft.

2. Data on 198 Million US Voters Exposed

In 2017, a data firm hired by the Republican National Committee (RNC) compromised the data of almost all of America's 200 million registered voters.

The company, Deep Root Analytics, provided profiles on American voters based on their personal information. To gather this data, it worked with two other firms: Targetpoint Consulting and DataTrust.

This resulted in an expansive collection of information. The leaked information included names, addresses, and phone numbers. The data also had models which predicted a person's ethnicity and religion.

The leak resulted in a class action lawsuit against Deep Root Analytics. After all, the firm exposed over 1.1 terabytes of data on its unsecured cloud server.

3. Spammer Leaks Data From 1.4 Billion Users

While data aggregation companies are legal, not all data harvesters work within the confines of the law. This was the case for River City Media (RCM), a huge illegal spamming operation which accidentally leaked the data of over a billion users.

The leak compromised 1.4 billion email accounts combined with real names, user IP addresses, and sometimes physical addresses.

How did this happen? According to investigators from MacKeeper Security Research Center, CSO Online, and Spamhaus, improperly configured Rsync backups left the data vulnerable.

The only good to come of this is that RCM, which had been posing as a legitimate marketing company, was exposed as a spamming operation which sent over a billion automated emails every day. MacKeeper's Chris Vickery was able to access RCM's HipChat logs, domain registration records, accounting details, infrastructure planning, production notes, scripts, and business affiliations. He then handed these details over to authorities.

However, new companies like this pop up every day, so you should take precautions to protect your email address from spammers.

4. Grindr Shares HIV Statuses of Its Users

Not all leaked personal data is the result of a security flaw or misconfiguration. As we saw with Facebook and Cambridge Analytica, sometimes services and apps harvest data from users and then give it to a third party.

The handover of data by Aleksandr Kogan to Cambridge Analytica breached Facebook's terms of service. But the bulk of the harvested data was gathered within the confines of Facebook's policies and API at the time.

Similarly, when users discovered that third parties had access to Grindr users' HIV status, they also found out that this was business-as-usual for the LGBT dating network. The data also included a user's GPS location, phone ID, and email address.

Users considered this a gross violation of their privacy. The company shared particularly sensitive and usually confidential medical information with two other companies: Apptimize and Localytics.

Grindr assured users that it did not sell or leak the data. Rather, it shared the data to help with app optimization. Regardless, the company later announced that it would no longer share users' HIV status with third parties.

Security experts pointed out that sharing such sensitive information with third parties increases the likelihood of a leak or breach. Luckily, there are tools available online to help you check if your online accounts were hacked or compromised.

5. Leaked Records Exceed Country's Population

South Africa's largest data leak was so encompassing that the number of personal records leaked exceeds the country's entire population. The leak included the personal information of not only the majority of living people in the country, but also people who had died. The data even included the identification (ID) numbers of over 12 million minors.

In total, the data exposed 60 million unique ID numbers, along with personal information such as contact details, full names and more. The leak was particularly severe because a South African citizen's ID number can be used to glean personal information about them, such as their birthday, gender and age. Criminals often use these numbers to steal identities or commit fraud.
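To see why an ID number alone gives so much away, here is a minimal Python sketch. It assumes the commonly documented 13-digit layout, which the reporting above does not spell out: six digits of birth date (YYMMDD), a four-digit sequence number where values below 5000 indicate female, and a final Luhn check digit. The function names and the sample number are made up for illustration.

```python
from datetime import date


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def decode_sa_id(id_number: str, today: date) -> dict:
    """Decode birth date, age and gender from a 13-digit ID number.

    Assumed layout (not taken from the article): YYMMDD SSSS C A Z.
    """
    if len(id_number) != 13 or not (id_number.isascii() and id_number.isdigit()):
        raise ValueError("expected exactly 13 digits")
    if not luhn_valid(id_number):
        raise ValueError("checksum failed")

    yy, mm, dd = int(id_number[0:2]), int(id_number[2:4]), int(id_number[4:6])
    # The year has only two digits: assume the 2000s unless that date
    # would lie in the future, in which case fall back to the 1900s.
    year = 2000 + yy
    if date(year, mm, dd) > today:
        year -= 100
    birth = date(year, mm, dd)

    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    age = today.year - birth.year - (0 if had_birthday else 1)
    gender = "female" if int(id_number[6:10]) < 5000 else "male"
    return {"birth_date": birth, "age": age, "gender": gender}


# Hypothetical ID number, not a real person's.
print(decode_sa_id("8001015009087", date(2018, 6, 1)))
```

No database lookup is needed for any of this. The number itself carries the information, which is why pairing it with names and contact details makes a leak so much more useful to fraudsters.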

So how did this data end up exposed? A database backup by the name of masterdeeds.sql was found on a public-facing, unsecured server. Cybersecurity expert and founder of HaveIBeenPwned.com Troy Hunt was tipped off about the data, which was exposed for at least seven months.

A company named Dracore aggregated the data and created the database. But one of its clients, Jigsaw Holdings, exposed the data on an unsecured server.

6. Data Company's Files Shared on Twitter

Modern Business Solutions, a US-based data management company, found itself on the wrong side of public opinion in 2016. Its lax security resulted in the exposure of 58 million consumer records.

A hacker was able to access and share the information of millions of people, all thanks to an unsecured MongoDB database. The hacker downloaded the database, uploaded it to a public site and then shared the links on Twitter. Misconfigured MongoDB databases are one of the many ways that hackers steal information from unsuspecting people.

In this instance, the exposed data included names, dates of birth, email and postal addresses, job titles, phone numbers, vehicle data, and IP addresses.

7. Millions of Identities Stolen From Data Brokers

Privacy concerns posed by data harvesting companies have existed for some time. Even in 2013, the dangers of data harvesting came to the fore when it was discovered that hackers had accessed several major data brokers' servers. This access allowed them to steal the information of millions of Americans.

Hackers accessed much of this data through misconfigured servers, security flaws, and unsecured databases and uploaded it to a site named SSNDOB. SSNDOB itself was also a data aggregator that sold stolen information.

The stolen data included social security numbers, credit records, background checks, birthdays, addresses and other personal data. When hacktivist teens breached SSNDOB, they discovered just how extensive the records were. Even the addresses and personal information of celebrities such as Kanye West, Jay Z, and Beyoncé, as well as prominent figures such as then-First Lady Michelle Obama, had been accessible.

SSNDOB's botnet accessed the servers of major data brokers such as LexisNexis Inc, Dun & Bradstreet, and Kroll Background America Inc. The FBI eventually launched an investigation into the matter.

8. Alteryx Leaks Data on 123 Million US Households

In 2017, UpGuard discovered that data analytics company Alteryx had exposed the data of 123 million American households through an unsecured data repository.

The publicly accessible information was particularly sensitive, as one of Alteryx's partners is the consumer credit reporting agency Experian. The repository included home addresses, contact details, mortgage details, financial histories, and purchase history. Anyone with an Amazon Web Services account could access this information.
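For readers who manage cloud storage themselves, the sketch below shows one way such exposure can be spotted. It is an assumption-laden illustration rather than a description of how Alteryx's repository was actually configured. It assumes an Amazon S3 bucket access control list (ACL) represented as a Python dictionary in the shape returned by boto3's `get_bucket_acl`, and flags grants to Amazon's two predefined groups, "all users" and "any authenticated AWS user". The function name and the sample ACL are hypothetical.

```python
# Predefined Amazon S3 grantee groups (URIs as documented by AWS).
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"
AUTHENTICATED_USERS = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"


def risky_grants(acl: dict) -> list:
    """Return (who, permission) pairs for grants that open a bucket widely."""
    findings = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        uri = grantee.get("URI")
        if grantee.get("Type") == "Group" and uri in (ALL_USERS, AUTHENTICATED_USERS):
            if uri == ALL_USERS:
                who = "anyone on the internet"
            else:
                who = "any AWS account holder"
            findings.append((who, grant.get("Permission")))
    return findings


# Hypothetical ACL: the owner has full control, but every AWS user can read.
sample_acl = {
    "Grants": [
        {"Grantee": {"Type": "CanonicalUser", "ID": "owner-id"},
         "Permission": "FULL_CONTROL"},
        {"Grantee": {"Type": "Group", "URI": AUTHENTICATED_USERS},
         "Permission": "READ"},
    ]
}

for who, permission in risky_grants(sample_acl):
    print(f"Bucket grants {permission} to {who}")
```

The "authenticated users" group sounds restrictive, but it covers every AWS account in the world, not just accounts belonging to the bucket's owner.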

UpGuard described the data as "a remarkably invasive glimpse into the lives of American consumers". Luckily, the data is no longer publicly accessible, but as with most of these leaks, it's uncertain how many people stumbled across and downloaded the sensitive information.

The leak also reminded consumers just how much personal data companies collect. Even simple internet browsing results in websites harvesting personal information about you.

9. Another Facebook Quiz Results in Leaked User Data

Facebook users are still reeling from the Cambridge Analytica scandal. But it seems that Cambridge Analytica wasn't alone in using Facebook quizzes for data harvesting.

According to New Scientist, researchers at the University of Cambridge created a quiz called myPersonality. The quiz harvested data on participants, which researchers uploaded to an online database. Hundreds of researchers from other institutions could access this data for research purposes.

However, insufficient security measures exposed this data for four years. While only a registered collaborator login could access the data, an exposed working set of credentials compromised any security.

"For the last four years, a working username and password has been available online that could be found from a single web search. Anyone who wanted access to the data set could have found the key to download it in less than a minute," New Scientist said.

The data included personal information of around 3 million Facebook users and their results from psychological tests.

10. Database Exposes 33 Million Employees

In 2017, the public discovered that a Dun & Bradstreet database on US government and corporate employees had been leaked. This exposed over 33 million records, which included details such as names, job positions and functions, salaries, contact details, and email addresses.

If Dun & Bradstreet sounds familiar, it's because its data was included in SSNDOB's collection (mentioned earlier). The company, which aggregates employee data and sells records to marketers, denied responsibility for the leak. It created the database, but the likely source of the leak was one of its thousands of clients.

Troy Hunt discovered the leak after a source sent him the database. Hunt noted that Department of Defense employee records made up the bulk of the data. This put those employees at particular risk, as job titles such as intelligence analyst, chemical engineer, soldier, and platoon sergeant were identified in the data, making it useful to foreign agencies that may want to infiltrate or attack specific government roles.

"We've lost control of our personal data and as [Tim] Berners-Lee said only a few days ago, we often do not have any way of feeding back to companies what data we’d rather not share," Hunt said in his report on the Dun & Bradstreet leak.

Most of the people affected by the leak would likely have had no idea that companies collected their data and sold it off in carefully aggregated lists.

Companies Know More About You Than You Think

In many of these incidents, you can't blame the victims of the data leaks for their information reaching the public domain. Rather, companies harvested this data from multiple services and records. Consumers often had no idea that companies shared this data with third parties.

That's why it's important to check the privacy policies of the services you use. You should also keep up with any breaches and leaks that could affect you.

After all, companies know a lot more about you than you expect. But you can take a more active role in protecting your data. Make sure to check out our guide on how to protect your privacy online.
