
AI LLMs are being trained on Pirated Books - Robot Pirates!

When Our Future AI is Trained on Pirated Books, What Are We Really Teaching It?


AI Pirate Art - Robot Pirates
I made this blog so I could make pirate images :)

Imagine you've spent years writing a book, pouring your heart and soul into it. Then, you find out that your work is being used without your permission to train a computer program. How would you feel? This isn't a hypothetical scenario; it's happening right now in the world of artificial intelligence (AI). And it's a topic that should concern us all, from Invercargill to the Far North, techie or not.


The Lawsuit That Opened Pandora's Box

Recently, a lawsuit in California has shaken the tech world. Authors including Sarah Silverman, Richard Kadrey, and Christopher Golden have accused tech giant Meta of using their copyrighted books to train an AI model called LLaMA. This isn't just a legal issue; it's an ethical dilemma that questions the very foundation of how we develop AI.



AI Pirates stealing books

The Extent of the Problem Might Be Bigger Than We Thought

An investigation revealed that LLaMA's training data includes a dataset known as "Books3," which contains upwards of 170,000 books, many of them copyrighted. And it's not just Meta; other companies and open-source projects have also used Books3 to train their AI models. This means that the AI tools we interact with daily could be built on a foundation of stolen intellectual property.



AI Pirate Ship Fight

Why Should Kiwis Care?

"Why should this matter to me?" Well, as New Zealanders, we value fairness, integrity, and respect for intellectual property. Whether it's respecting the haka or protecting our unique flora and fauna, we understand the importance of ownership and heritage. The issue of pirated books in AI training data is a global problem that also taints the technology we use here at home.



Cute Robot reading a book ai art
This pirate robot is so cute I can't even be mad

Complexity of AI: Quantity Over Quality?

AI models like ChatGPT or LLaMA require a massive amount of text to function effectively. But should the quantity of data overshadow the quality and ethics of its source? The answer should be a resounding "no." The tech industry's cavalier attitude towards intellectual property needs a rethink, especially when it affects the livelihoods of authors and creators.



AI Art Pirates destroying a library
AI Art: Pirates Storming a Library

It Could Be a Slippery Slope

If we turn a blind eye to this, what's next? The exploitation of pirated books for profit sets a dangerous precedent. It's not just about the tech companies; it's about the culture we're fostering. A culture that disregards intellectual property rights can lead to a future where creativity and innovation are stifled.



AI Space Pirate
Space Pirate!

Time for a Change?

As New Zealanders, we have a role to play in this global conversation. We can advocate for ethical AI development that respects intellectual property and promotes fairness. After all, AI is only as good as the data it's trained on. And if that data is stolen, what kind of future are we building?



OooooooAARRRRRHHHHH!!! - Bot Bot (your friendly AI blog pirate)
