Meta, the parent company of Facebook and Instagram, is facing renewed legal pressure after leaked chat logs revealed that its researchers were aware of the legal risks of using thousands of pirated books to train its AI language model, Llama.
A new filing, which consolidates two lawsuits brought against Meta by authors including Sarah Silverman and Michael Chabon, presents evidence suggesting that Meta knew of potential copyright infringement issues yet chose to proceed with using the data.
Chat logs from 2021 show Meta researcher Tim Dettmers acknowledging concerns from the company's legal department about the legality of using the book files as training data. Despite those concerns, Dettmers and others in the chat expressed the belief that using the data would qualify as fair use.
If the authors’ lawsuit succeeds, it could have significant implications for the future of generative AI: companies could be forced to compensate artists and creators for the use of their works, potentially slowing the development of this rapidly evolving technology.
Additionally, new regulations in Europe could require companies to disclose the data used to train their AI models, exposing them to further legal scrutiny.
Amid growing awareness of the ethical and legal implications of AI development, Meta’s alleged use of copyrighted material without permission raises serious questions about the company’s practices and about the consequences for the future of AI technology.