What is Data Scraping?
Data scraping is the mass extraction of personal information and profile data from social media platforms and websites without the specific permission of the data owner. Most of the time, people and companies use automated software programs, also referred to as bots, crawlers or spiders, to gather this information rapidly and at scale. The technique is widely used across industries. Recruiters, trend watchers, marketers, business development managers, insurance companies and many other organizations and individuals rely on it, for example to enrich databases, generate business leads or gain data-driven insights. In principle, data scraping is limited to publicly available information. However, some seem to interpret this rule very broadly. This is not surprising, as the monetary value of personal information has grown significantly over the past decade. Unfortunately, this puts everyone whose data is scraped increasingly at risk, especially since cybercriminals also want to monetize this treasure.
Evolving Legal Landscape
In recent years, governments around the world have drafted legislation that protects people’s privacy, such as California’s CCPA and Europe’s GDPR. These laws generally require a legitimate purpose for gathering and processing data. This means that people and businesses may only collect and process the data they absolutely need to achieve their intended business purposes, and no more. Moreover, not all data is fair game for scraping. Even if a person or company is deemed to have a legitimate purpose for scraping, some data categories are off-limits. Under the GDPR, for example, information about a person’s race, religion, health, political opinions and sexual orientation cannot be gathered or processed without their explicit consent. However, there are many grey areas when it comes to data scraping. What happens, for example, if someone scrapes data that is normally only visible to users with an account? What about copyrighted data, such as photos? Clearview AI, for example, scraped images and videos from the internet to perfect its facial recognition app. More than 600 law enforcement agencies now use that application.
Recent Scraping Events
Early in April, news broke that the private data of more than half a billion Facebook users from 106 countries had ended up on a hackers’ forum. Shortly after, the personal information of LinkedIn and Clubhouse users was leaked online as well. In each case, an unknown person, or persons, managed to scrape a huge volume of valuable and privacy-sensitive information from people’s user profiles. Technically, and in a strict sense, this is not considered to be a data “breach”, because with data scraping no computer system is hacked. According to privacy advocates, however, this argument is incomplete, especially since the result is just the same for the data’s owners. If this trove of valuable personal information falls into the wrong hands, cybercriminals can launch sophisticated phishing scams, attempt identity theft, and more.
People’s Privacy Violated Nonetheless
Most data breach laws require companies to notify users when a data breach occurs. That’s why so many people raised their eyebrows when Facebook refused to notify users about the scraping incident. Worse still, an internal email downplayed data scraping as a “broad industry issue”. Yet, while scraping may not be in direct violation of privacy regulations, it does violate people’s privacy. It later emerged that the Facebook incident was of a different nature than the other incidents: it related to a bug Facebook had fixed in 2019. At the time, people could misuse the “import contacts” feature to access information that users had only shared with their connections. On 14 April, the Irish Data Protection Commission launched an investigation into the breach. The courts have not settled the matter either. In an ongoing lawsuit between LinkedIn and hiQ, a Californian judge ruled that LinkedIn could not rely on the Computer Fraud and Abuse Act (CFAA) to stop hiQ from scraping publicly available information. In similar data scraping cases, however, courts have not interpreted this particular law in the same manner. Moreover, in a settlement between the US Federal Trade Commission (FTC) and Flo Health earlier this year, the FTC made it clear that the compromise of data can be a breach even when no technical hacking is involved.
Privacy by Design
No doubt it’s time for lawmakers to review data breach laws and to clear up the vast grey area that is data scraping. But perhaps the onus should also be on social media platforms. They should build their infrastructure with user privacy in mind and prevent the rapid, automated scraping of user profiles.
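One common way a platform could slow down rapid, automated profile scraping is to rate-limit requests per client. The sketch below is illustrative only and is not how any particular platform works: the class name SlidingWindowLimiter, its parameters and the client identifiers are all hypothetical, and a real deployment would likely combine such a limit with other signals.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical limiter: at most `max_requests` per client per window."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self._hits = defaultdict(deque)  # client id -> recent request times

    def allow(self, client_id):
        """Return True if this client may view another profile right now."""
        now = self.clock()
        hits = self._hits[client_id]
        # Forget requests that have fallen out of the time window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # too many requests in the window: likely a bot
        hits.append(now)
        return True


if __name__ == "__main__":
    # Assumed policy for illustration: 3 profile views per 60 seconds.
    limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60)
    for attempt in range(5):
        print(attempt, limiter.allow("client-123"))
```

A per-client limit like this does not stop a determined scraper who spreads requests across many accounts or addresses, but it raises the cost of harvesting profiles at scale.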