News Media Alliance Accuses AI Firms of Illegal Data Scraping

The News Media Alliance (NMA) has accused artificial intelligence (AI) firms of illegally scraping data to train their large language models (LLMs). In a submission to the United States Copyright Office, the NMA raised concerns about copyright infringement by AI chatbots and its effect on news outlets.

The Concerns Raised by the NMA

The NMA raised several issues in its submission, chief among them the allegedly illegal acquisition of data by AI developers and the competition posed by AI chatbots’ narrative answers to search queries. According to the NMA, these answers draw on copyrighted material used without permission and reduce the need for consumers to visit the original news sources, which cuts into news outlets' revenue.

The NMA emphasized that AI developers generate significant revenue without taking on the risks associated with reporting, which it called unfair to news publishers. It specifically named popular generative AI products such as Bing Chat, Bard, Claude, and ChatGPT as examples of systems it says infringe copyright.

The Impact on Market Valuations

The NMA pointed out that unauthorized use of third-party content has coincided with soaring valuations for leading AI developers such as OpenAI and Anthropic. In the NMA's account, these developers have moved from a non-profit research footing toward profit-oriented operation, with their valuations reaching new highs.

Seeking Dialogue for Resolution

Rather than resorting to litigation, the NMA said it is willing to engage in dialogue to find reasonable licensing solutions. The goal is to ensure reliable, up-to-date access to trustworthy expressive content, benefiting all parties involved and society as a whole.

AI Firms and Troubles with Copyright Violations

AI firms, including Meta, Anthropic, and OpenAI, have faced numerous class-action lawsuits from copyright holders alleging infringement. These firms have invoked the fair use defense in response.

The Potential for Blockchain and AI

Some experts believe that combining blockchain technology with AI could address concerns about how AI firms collect data. They propose using blockchain to identify AI-generated content and to make the training data used in LLMs traceable.
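
To make the traceability idea concrete, here is a minimal Python sketch of a hash-linked provenance log for training documents. It is an illustration only, not a description of any system the experts or AI firms actually use: the record fields, the function names, and the example URLs are all assumptions, and it shows only the hash-linking part of a blockchain, with no network or consensus mechanism.

```python
import hashlib
import json
from dataclasses import dataclass, replace

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


@dataclass(frozen=True)
class ProvenanceRecord:
    """One hypothetical log entry for a document used in training."""
    source_url: str    # where the document came from (illustrative field)
    content_hash: str  # SHA-256 of the document text
    license_note: str  # free-text licensing status (illustrative field)
    prev_hash: str     # hash of the previous record, linking the chain

    def record_hash(self) -> str:
        # Hash all fields together so that changing any of them changes the hash.
        payload = json.dumps(
            [self.source_url, self.content_hash, self.license_note, self.prev_hash]
        ).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def append_record(chain, source_url, text, license_note):
    """Return a new chain with one more record linked to the last one."""
    prev = chain[-1].record_hash() if chain else GENESIS
    content_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return chain + [ProvenanceRecord(source_url, content_hash, license_note, prev)]


def verify_chain(chain):
    """Check that every record points at the hash of the record before it."""
    expected_prev = GENESIS
    for record in chain:
        if record.prev_hash != expected_prev:
            return False
        expected_prev = record.record_hash()
    return True


def contains_text(chain, text):
    """Check whether an exact copy of this text was logged as training data."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return any(record.content_hash == digest for record in chain)


if __name__ == "__main__":
    chain = []
    chain = append_record(chain, "https://example.com/a", "Article A", "licensed")
    chain = append_record(chain, "https://example.com/b", "Article B", "unlicensed")
    print(verify_chain(chain))                # the untouched chain verifies
    print(contains_text(chain, "Article A"))  # the publisher finds its article

    # Rewriting an earlier record breaks the link held by the record after it.
    tampered = [replace(chain[0], license_note="unlicensed")] + chain[1:]
    print(verify_chain(tampered))
```

In this sketch a publisher could hash one of its own articles and check whether it appears in the log, and any later rewrite of an earlier entry would break the link stored in the next record. The newest record has no successor protecting it, and only byte-for-byte copies are matched, so a real deployment would need far more than this.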

