Web Scraping: What Is It and Why Should I Care About It?

Mariam Salmanzadeh

Associate Editor

Loyola University Chicago School of Law, JD 2025

In today's information-driven world, data is a valuable currency, and businesses continually seek innovative means to obtain, analyze, and harness this resource. One such method involves web scraping, an automated process that extracts website data. However, as web scraping gains wider popularity, it attracts regulatory scrutiny and raises questions about its legality.

What is web scraping?

Web scraping is the extraction of data from websites. It enables the rapid collection of web data that can yield valuable insights into trends, prices, and consumer behavior. Because it allows large amounts of data to be gathered quickly, web scraping has become especially popular among new companies seeking to reach a broader target audience. While scraping can be done manually, automation is what makes it so valuable. Manually copying and pasting text from more than a hundred websites is laborious and monotonous, but a machine with the proper instructions can do it in a fraction of the time, and it can repeat the task tirelessly and at almost any scale.

So, how does web scraping work? Think of the scraper as a digital detective. First, it is given one or more website addresses to investigate. It then downloads each page and reads its underlying code from top to bottom, including elements such as images. Next, it collects either all the information on the page or only the specific items the user wants, such as the prices on an Amazon page while ignoring the reviews. Finally, it organizes the collected information into a structured file, such as an Excel spreadsheet.
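To make these four steps concrete, here is a minimal sketch in Python using only the standard library. It is illustrative, not a production tool, and rests on several assumptions: the class names "name", "price", and "review" and the sample markup are hypothetical (real sites such as Amazon use their own, frequently changing markup); the fetch_html helper is included to show the "given a website address" step but is not called in the usage example; and the parser reads raw HTML only, so it does not run JavaScript or download the images a page references. Nothing in the sketch checks a site's terms of use, which remains a separate step.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import urlopen

# HTML elements that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def fetch_html(url, timeout=10.0):
    """Step 1: download the page at a given address (not called in the demo below)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class ClassTextExtractor(HTMLParser):
    """Steps 2-3: read the whole page, keeping text only from elements
    whose class list contains target_class (a hypothetical class name)."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.values = []
        self._depth = 0     # > 0 while the parser is inside a matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # e.g. <img> or <br>: no text and no closing tag to track
        if self._depth:
            self._depth += 1  # a child element nested inside the match
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self._depth = 1   # entering a matching element
            self._buffer = []

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <br/> carry no text

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth -= 1
            if self._depth == 0:  # just left the matching element
                self.values.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract(html, target_class):
    """Return the text of every element carrying target_class, in page order."""
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.values


def to_csv(rows, header):
    """Step 4: organize the results as CSV text, which spreadsheets can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical markup standing in for a downloaded product page.
    page = """
    <div class="product"><span class="name">Desk lamp</span>
      <span class="price">$19.99</span><p class="review">Bright!</p></div>
    <div class="product"><span class="name">Office chair</span>
      <span class="price">$149.00</span></div>
    """
    names = extract(page, "name")
    prices = extract(page, "price")  # reviews are simply never collected
    print(prices)  # ['$19.99', '$149.00']
    print(to_csv(zip(names, prices), ["name", "price"]), end="")
```

Writing CSV rather than a native Excel file keeps the sketch free of third-party dependencies; spreadsheet programs, including Excel, can open CSV files directly. Selecting by class name is only one of several ways a scraper can pick out "the specific things the user wants."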

The legal framework

Web scraping operates in a legal gray area, primarily because it can serve both legitimate and illicit purposes. Scraping is common and often harmless, but it can also cause a range of perceived harms, from misappropriating intellectual property to overloading computing resources and invading individual privacy. Largely because no legislation specifically addresses scraping, regulatory agencies have tried to respond to the concerns it raises.

Three emerging trends are worsening the problem of malicious web scraping: higher demand for scraped data driven by the rise of generative AI; weakening adherence within the tech community to norms against unwelcome scraping; and concern that recent federal court decisions narrowing the Computer Fraud and Abuse Act (CFAA) may encourage unwanted scraping.

The CFAA criminalizes unauthorized access to computer systems, but in 2022 the United States Court of Appeals for the Ninth Circuit significantly narrowed its reach in hiQ Labs v. LinkedIn, holding that scraping data from publicly accessible websites, such as public LinkedIn profiles, likely does not constitute access "without authorization" under the CFAA.

In the dispute between hiQ Labs and LinkedIn, LinkedIn also asserted violations of the Digital Millennium Copyright Act (DMCA), which includes provisions prohibiting the circumvention of technological protection measures. Trying to evade such measures to facilitate scraping can therefore carry legal consequences, and scraping against a site's terms can also give rise to related claims, such as breach of contract.

Many website owners are concerned about a perceived lack of CFAA enforcement by regulatory agencies. In particular, agencies have issued no explicit enforcement warning against those who engage in unauthorized scraping, which highlights the limits of data protection laws as a solution to this behavior. The U.S. Department of Justice (DOJ) has, however, recently acted: a 2022 DOJ charging policy addressed enforcement concerns by requiring federal prosecutors to consult with the Criminal Division's Computer Crime and Intellectual Property Section (CCIPS) before bringing CFAA charges.

Navigating the complexities of compliance

Web scraping can be an invaluable tool for businesses, researchers, and developers, but navigating its compliance requirements takes care. Start by consulting a website's terms of use to determine whether scraping is permitted. Keep in mind, however, that such terms are not automatically binding: the Ninth Circuit has held that a website must clearly display the terms the consumer will be bound by, and the consumer must take a clear action, such as clicking a button or checking a box, to show agreement with those terms.

If the terms prohibit scraping, consider obtaining authorization or exploring alternative data sources. When scraping, comply with applicable data protection laws and obtain consent when handling personal information.

Many critics of web scraping argue that Congress could amend the CFAA, address harmful scraping in comprehensive privacy legislation, or do both. If adequately tailored, an amendment to the CFAA could provide direct legal remedies for the most malicious kinds of scraping.

What lies ahead?

Web scraping is a potent method of data acquisition, but it presents intricate legal and ethical considerations that cannot be overlooked. Individuals and organizations need to understand its regulatory and legal implications to avoid potential legal consequences. Meanwhile, website owners must take matters into their own hands and secure their sites as well as they can against potentially harmful scrapers.

INSIDE COMPLIANCE