The UK government has decided not to introduce broad copyright exceptions for text and data mining for AI systems. In addition, the government takes the position that the reproduction of copyrighted content by AI systems violates copyright law.
Such an exception was previously under consideration but is now off the table. Text and data mining refers to the process of extracting information from large volumes of text or data using AI and machine learning techniques.
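To make the term concrete, here is a toy sketch of a text-mining step: extracting word frequencies from a small corpus. The corpus and function name are invented for illustration; real text and data mining pipelines used for AI training operate at web scale and extract far richer statistics than word counts.

```python
from collections import Counter
import re

def mine_term_frequencies(documents, top_n=3):
    """Toy text-mining step: count word occurrences across a corpus
    and return the most frequent terms."""
    counts = Counter()
    for doc in documents:
        # Lowercase and split into alphabetic tokens
        counts.update(re.findall(r"[a-z]+", doc.lower()))
    return counts.most_common(top_n)

corpus = [
    "Copyright law protects creative works.",
    "AI models learn from large amounts of text.",
    "Text and data mining extracts patterns from text.",
]
print(mine_term_frequencies(corpus))
```

The copyright question arises because such pipelines must first copy the source texts into the mining system, which is itself an act of reproduction under copyright law.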
The UK government recognizes the potential benefits of AI for the creative industries, such as increased productivity and efficiency. However, it also acknowledges the concerns raised by the media and creative industries, including the impact of AI-generated content on human creativity.
The government therefore intends to develop a code of practice for copyright and AI. The aim is to enable the AI and creative industries to grow in partnership while ensuring that the UK’s copyright framework continues to encourage and reward investment in creativity.
The government is working closely with AI and creative industries stakeholders to achieve this. The code of practice is expected to be published in early 2024.
Before the New York Times lawsuit, OpenAI told the UK government that it was impossible to train foundational AI models without copyrighted data.
Reproduction of content by AI systems violates copyright law
According to the government, AI’s reproduction of copyrighted works violates copyright law unless permitted by a license or exception.
It is not clear from the UK government's response whether infringing reproductions must be exact copies or whether a close approximation of the original suffices. The latter would be far harder to regulate, as the number of potential infringements would be much higher.
In any case, the exact reproduction of training material, as the New York Times demonstrated for OpenAI's models, would likely constitute copyright infringement from the UK government's standpoint.
Whether this reproduction is a bug or an inherent characteristic of large AI models remains to be seen. From OpenAI's perspective, these exact copies are a "rare bug" that can be fixed.
However, if the courts conclude that reproducing training material is part of the core function of AI models, providers could face serious difficulties: they would be unable to guarantee that such copyright infringements would not recur.