For all the good copyright law does to protect the creative works of authors, it causes problems for text and data mining (TDM) by researchers and AI developers. TDM is a tool that allows computers to “read” and analyze large amounts of text or data. As I explained in Part 1 of this post, TDM is likely copyright infringement under current U.S. law. However, that initial infringement isn’t the only issue.
Have you ever wanted to use an article, only to find that it’s held behind a paywall? Although copyright doesn’t grant the right to exclude other people from accessing the work, a copyright holder might effectively restrict access by licensing the work to a journal or publisher. Why would they do this? It helps the copyright owner protect their rights by preventing unauthorized copying or distribution.
A paywall creates access issues for researchers and AI, though. A researcher using TDM to scan a high volume of journals might not have access to the data they need because the materials are only available through a paid subscription. Other times, a researcher might want to recreate a study but lack the access to a copyrighted work that the original team had. Likewise, an AI system, whether still in training or already deployed, might be unable to reach information that sits behind a paywall.
Even if there isn’t a paywall, TDM may be stymied because, absent a clear exception, it is likely copyright infringement. As I discussed in Part 1 of this post, the fair use doctrine requires courts to evaluate the four fair use factors on a case-by-case basis. Because the facts of every case are different, it’s difficult to say whether fair use protects all forms of TDM. Given that uncertainty, why risk a potential copyright infringement lawsuit and the expense of litigating fair use?
Considering the problems of copyright infringement, reduced access to copyrighted works, and fear of potential lawsuits, what should be done? Let’s look at some other countries that have created statutory exceptions for TDM and why they made them.
Countries with Statutory Exceptions for TDM
Several countries have adopted express exceptions to copyright law for the purposes of TDM. For example, Japan created a TDM exception intended to promote machine learning and AI development. Singapore adopted a very permissive copyright exception that even covers use by commercial entities. Like Japan’s, Singapore’s exception was created to help AI research and development. In 2019, the European Union issued a directive that included mandatory copyright exceptions for TDM. These exceptions greatly broaden the permissibility of TDM, allowing research organizations and cultural heritage institutions to mine works without compensating copyright holders. Copyright holders can still opt out of the general TDM exception, which covers commercial uses, but not out of these scientific and cultural ones. In the directive, the European Parliament stated that these exceptions should be created because TDM has the ability to promote innovation and discovery, especially in the fields of research and cultural heritage.
In 2022, the United Kingdom Intellectual Property Office announced that the government would consider broadening the permissibility of TDM by introducing an exception that would allow TDM for any purpose, so long as the copyrighted works were accessed lawfully. This would have been a huge step for artificial intelligence development and would have reached beyond research: it would have extended to TDM used in the creative arts. The proposal drew massive backlash from the creative industries, which argued that it would drastically weaken copyright protection and directly harm artists’ livelihoods. After much debate, the UK Minister for Science, Research and Innovation confirmed on February 2, 2023 that the proposal would not move forward.
The U.S. has yet to create any copyright exceptions for TDM, but should it? Let’s look at some arguments for creating an exception for TDM.
TDM’s Resume
TDM is already being used for some pretty amazing things when copyrighted material isn’t involved! Climate change researchers use it to help plant crops more efficiently and to monitor water availability around the globe. On top of that, researchers are using it to scan years of climate change literature to find broader trends in what people are doing to fight climate change!
Disease researchers use TDM to spot disease outbreaks early, before they spread. One program created by the International Society for Infectious Diseases accepts thousands of user-submitted reports from around the globe every year to monitor for infectious outbreaks. However, those reports can’t be analyzed by hand fast enough to power an early-warning system. To get around this, the program used TDM to scan the reports and upload the results to its database.
Businesses use TDM to detect fraud. Has your credit card company ever blocked your card because it thought the card was being used fraudulently? Whether real fraud is taking place or you’ve just traveled out of the country for a week, the company is likely using TDM to review your past credit card transactions and flag any new purchase that stands out as an outlier.
TDM has already done a lot to help researchers and businesses when it’s used to analyze non-copyrightable material – imagine what it could do if it had access to copyrighted data! Not only is TDM powerful on its own, but it can help other technologies to flourish.
TDM Helps AI Grow
AI is a hot topic lately, and TDM plays a key role in how effectively AI works. Put simply, TDM is for gathering information from a massive data set, while artificial intelligence and machine learning (computers that can learn and improve themselves without much help from humans) are for solving complex tasks. A researcher might use TDM to scan a high volume of data to detect patterns a human researcher might not spot, then feed the results to an AI program to make future predictions. If the data is protected by copyright and can’t lawfully be mined, the AI has much less to study and learn from. This is a quickly evolving topic, as seen in everything from this 2022 IP Bytes blog post about AI to this 2023 post about ChatGPT, which discusses the Copyright Office’s March 2023 policy change.
So Now What?
Should the U.S. implement a copyright exception for TDM? I think it should! All in all, TDM provides countless benefits to researchers and businesses. These benefits are currently being restricted by potential copyright infringement, lack of access to copyrighted materials, and fear of lawsuits. Some scholars argue that the U.S. hasn’t implemented any formal copyright exceptions yet because the fair use doctrine is flexible enough to deal with these issues. As we’ve noted, though, it’s unclear whether fair use will cover every instance of TDM research.
For now, I think a U.S. exception should work similarly to the research and cultural heritage exceptions in the EU’s directive: that is, an exception that only allows uses related to research and cultural preservation. To me, allowing TDM for just these purposes delivers more straightforward benefits without running into issues like copying artists’ work for personal gain. As someone with a background as a musician and writer, I still haven’t fully made up my mind about how AI should interact with the creative arts. Either way, I can see the good that TDM has done for research in so many fields. I’m excited to see how it could be used next!
Connor Druhan
Associate Blogger
Loyola University Chicago School of Law, J.D. 2024