What’s Mine is Yours?

When I was younger, my dad told me that he worked on “artificial intelligence” after graduating from college. I thought this was the coolest job ever. What if my dad was part of the team that made a self-aware AI like in the movies? He told me that AI didn’t really work that way. In fact, he worked on a smaller piece of AI technology that dealt with sorting data using algorithms. I didn’t really understand what he meant at the time, but it still seemed pretty neat.

Decades have passed, and AI has come a long way from the research that people like my dad performed. When you say “AI” now, most people immediately think of programs like DALL-E or ChatGPT, which would seem like miracles to researchers in the ’80s. However, despite the new AI bells and whistles that are available to consumers, artificial intelligence is still a very deep field with a lot of nuance. In fact, something you may not have heard of called “text and data mining” is essential to making AI run properly.

Photo by Adi Goldstein, licensed under Unsplash

Text and data mining (TDM) is a process through which machines can “read” and analyze large amounts of text or data. Computer algorithms can then study the results and find patterns across vast amounts of information. TDM isn’t just essential for AI; it’s a powerful tool for a variety of fields! TDM can help scientific researchers comb through volumes of material to make otherwise difficult-to-find discoveries. It’s already been used by pancreatic cancer researchers to develop new therapies. TDM can also assist businesses with large-scale data analysis to make decisions that will benefit consumers. For example, Groupon uses TDM to analyze extremely high volumes of customer data in real time to get ahead of shopping trends. TDM is even being used by Domino’s Pizza to manage the thousands of orders it receives online. Domino’s data platform stores and organizes the heaps of data the company collects from its customers via pizza orders, social media platforms, and advertisements so that the data can be reviewed.

One big problem arises when thinking about how TDM works, though. What if some of the volumes of data that the computers analyze are protected by copyright? If they are, does that mean that TDM constitutes copyright infringement?

Is TDM Copyright Infringement?

Copyright law is a form of intellectual property protection that grants copyright holders a bundle of rights to protect their works during the term of the copyright. These include the rights to prevent others from reproducing (i.e., copying) the work or from distributing it to the public.

TDM can involve copying a work protected by copyright law and publishing it in a new format. For example, imagine that a literary scholar wanted to analyze the entire Lord of the Rings series to count the number of lines of dialogue per character. The researcher could input the Lord of the Rings series into a computer program and request that the computer sort all dialogue by character. The computer would then scan the books and deliver its results, including snippets of dialogue for each character. Through TDM, the scholar has accessed the Lord of the Rings text, reproduced it, and distributed verbatim snippets of it. This would likely constitute copyright infringement unless an exception applies, such as the flexible fair use exception in the US or a specific exception for TDM like those other countries have created.
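To make the scholar's example a little more concrete, here is a minimal Python sketch of what "sort all dialogue by character" might look like. Everything in it is an assumption made for illustration: the sample text is invented (it is not from any real book), the function names are hypothetical, and it assumes every line of dialogue follows the simple pattern of a quotation followed by "said" and a name. Real novels attribute dialogue in far messier ways, so an actual TDM tool would need something much more sophisticated.

```python
import re
from collections import defaultdict

# Invented sample text for illustration only (not from any real book).
# Assumption: every line of dialogue looks like  "quoted words," said Name.
SAMPLE_TEXT = '''
"We should leave at dawn," said Alice.
"I would rather wait for the rain to stop," said Bob.
"Then we leave at noon," said Alice.
'''

# Assumed pattern: a quoted span, then the word "said", then a capitalized name.
DIALOGUE = re.compile(r'"([^"]+)"\s+said\s+([A-Z][a-z]+)')


def sort_dialogue_by_speaker(text):
    """Group each quoted snippet under the speaker named right after it."""
    by_speaker = defaultdict(list)
    for snippet, speaker in DIALOGUE.findall(text):
        by_speaker[speaker].append(snippet)
    return dict(by_speaker)


def count_lines_per_speaker(text):
    """Count how many quoted lines each speaker has."""
    grouped = sort_dialogue_by_speaker(text)
    return {speaker: len(snippets) for speaker, snippets in grouped.items()}


if __name__ == "__main__":
    # Print each speaker, their line count, and the verbatim snippets.
    for speaker, snippets in sort_dialogue_by_speaker(SAMPLE_TEXT).items():
        print(speaker, len(snippets), snippets)
```

Notice that the program has to read the full text to find the dialogue, and its output contains verbatim snippets of that text. Those two steps are exactly where the reproduction and distribution concerns described above come from.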

Photo by Kevin Ku, licensed under Unsplash

Is TDM Fair Use?

It is currently unclear whether the fair use defense applies to TDM. The fair use defense is meant to allow people to make use of copyrighted works for beneficial purposes without requiring a license. Imagine a world where, to legally use any copyrighted material for any reason, you needed to wait for the copyright to expire or get a license!

This lack of clarity exists because courts weigh each use against the four fair use factors, and no single factor is dispositive. Every fair use case must be decided on its own facts, and courts can and do disagree even about the same case. Optimistic TDM researchers point to a significant case called Authors Guild v. Google, in which the Second Circuit found that Google’s digital copying of millions of books, and making them searchable through Google Books, was fair use. I’m not sure that this case applies to all forms of TDM, though. Notably, copying the entirety of the books was permitted in that case, but the modifications that made the results searchable played a huge role in the outcome. In another case involving digitization of copyrighted books, the court didn’t find fair use. As we’ve mentioned, TDM can also take different forms depending on what it’s used for: a researcher might use TDM to scan hundreds of journal articles, while a business might use TDM to scan customer purchase data. On top of that, fear of litigation might scare researchers away from using TDM at all.

Do you think copyright law needs to change with the times and account for TDM for the benefit of research? Stay tuned for a follow-up post coming soon where we’ll discuss whether the United States should adopt an exception for TDM in copyright law!


Connor Druhan

Associate Blogger

Class of 2024