ChatGPT – is it already payback time?

31. 08. 2023

Do you also get the impression that with the official launch of ChatGPT in November 2022, the artificial intelligence industry has taken over not only the Internet, but almost every aspect of our daily lives? By crossing another milestone in the field of artificial intelligence, OpenAI has opened Pandora’s box and entered uncharted legal territory. Today, now that the first wave of excitement has settled, the inevitable reality check is underway, with high-profile legal disputes over copyright protection on the horizon.

But first, for those who still take a dim view of artificial intelligence and new technologies, let’s briefly explain how so-called large language models such as ChatGPT work, and why they can be considered contentious from a legal standpoint.

Currently available AI-powered products, including those built on OpenAI’s GPT-4, Anthropic’s Claude 2, Meta’s Llama 2, and Google’s PaLM 2, are model-based chatbots that respond to text input with human-like answers of the desired length, format, style, level of detail, and language.

On the surface these generative AI tools look pristine, but a closer look reveals an incomplete picture of how their training datasets were assembled and how the models were trained on them. A top concern is whether the massive collection of data from publicly available online sources was legal. As you can imagine, more and more media organizations and authors are coming to the conclusion that training AI models on their work without authorization constitutes a serious and systematic infringement of their copyrights.

As we all witness a new era of generative AI, we are also entering a hazy area of law where copyright infringement lawsuits may be just the tip of the iceberg. While the outcome of any legal action remains unknown for now, one thing is already clear: a huge wave of legal challenges is coming for companies working on AI-powered products, and the only question is who will be the first to take on the major technology companies and try to set a precedent in this area.

A couple of examples to illustrate where we are now:

• In May, tech giants Samsung and Apple both banned the internal use of AI chatbots like ChatGPT over concerns that sensitive internal data could be leaked through the models.

• Media giants including Disney, Bloomberg, The Washington Post, The Atlantic, Axios, Insider, and ABC News have configured their platforms to block generative AI tools from accessing their content.

• In early August, The New York Times updated its terms of service (TOS) and prohibited the use of Times content—which includes articles, videos, images, and metadata—for training any AI model without express written permission.

• Some media companies have already formed a united front against major technology companies to jointly negotiate the use of their content in AI tools.

• Several authors have sued OpenAI and Meta, alleging that pirated copies of their books were used to train language models without consent or compensation.

With a potential avalanche of lawsuits questioning whether AI technology violates privacy and copyright laws, one may wonder whether artificial intelligence companies have developed, or perhaps are already implementing, a contingency plan for collecting the data they need to further develop and train their tools. To temper the enthusiasm of those hoping for this scenario: despite the growing number of copyright disputes, AI companies seem unlikely to give up without a fight, nor is it obvious that they should, especially given the lack of strong privacy protections in the United States.

To defend the way they train their AI models, firms like OpenAI would probably invoke the “fair use doctrine”. Section 107 of the Copyright Act of the United States provides the statutory framework for determining whether a use is fair. It identifies certain types of uses, such as criticism, comment, news reporting, teaching and research, as examples of activities that may permit limited use of copyrighted material without first acquiring permission from the copyright holder.

Since the developers of large AI language models have already proven to be unpredictable and highly effective, we will soon find out whether the fall of ’23 will belong to them, or whether another AI-xciting twist awaits us.

By Joanna Iwanicka