Thousands of authors demand payment from AI companies for use of copyrighted works::Thousands of published authors are requesting payment from tech companies for the use of their copyrighted works in training artificial intelligence tools, marking the latest intellectual property critique to target AI development.
All this copyright/AI stuff is so silly and a transparent money grab.
They’re not worried that people are going to ask the LLM to spit out their book; they’re worried that they will no longer be needed because an LLM can write a book for free. (I’m not sure this is feasible right now, but maybe one day?) They’re trying to strangle the technology in the courts to protect their income. That is never going to work.
Notably, there is no “right to control who gets trained on the work” aspect of copyright law. Obviously.
Designing and marketing a system to plagiarize works en masse? That’s the cash grab.
Can you elaborate on this concept of an LLM “plagiarizing”? What do you mean when you say that?
What I mean is that it is a statistical model used to generate things by combining parts of extant works. Everything that it “creates” is a piece of something that already exists, often without the author’s consent. Just because it is done at a massive scale doesn’t make it less so. It’s basically just a tracer.
Not saying that the tech isn’t amazing or likely a component of future AI, but it’s really just being used commercially to rip people off and worsen the human condition for profit.
Everything that it “creates” is a piece of something that already exists, often without the author’s consent
This describes all art. Nothing is created in a vacuum.
No, it really doesn’t, nor does it function like human cognition. Take this example:
Say I, personally, decide that I want to make a sci-fi show. I don’t want to come up with ideas, so I want to try to do something that works. I take the scripts of Star Trek: The Search for Spock, Alien, and Earth Girls Are Easy and feed them into a database, separating words into individual data entries with some grammatical classification. Then, using this database, I generate a script, averaging the length of the films, with every word based upon its occurrence in the films, or randomized if it’s a tie. I go straight into production with “Star Alien: The Girls Are Spock”. I am immediately sued by Disney, Lionsgate, and Paramount for trademark and copyright infringement, even though I basically just used a small LLM.
You are right that nothing is created in a vacuum. However, plagiarism is still plagiarism, even if it is using a technically sophisticated LLM plagiarism engine.
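The word-frequency generator described above can be sketched roughly like this. It is only one reading of that description: words are sampled independently in proportion to how often they occur, with no attention to context, and the output length is the average length of the inputs. The function names and the toy input strings are made up for illustration; they are not real scripts.

```python
import random
from collections import Counter


def word_counts(scripts):
    """Count how often each word occurs across all input scripts."""
    counts = Counter()
    for text in scripts:
        counts.update(text.lower().split())
    return counts


def generate_script(scripts, seed=None):
    """Sample words in proportion to their frequency (assumed interpretation)."""
    rng = random.Random(seed)
    counts = word_counts(scripts)
    vocabulary = list(counts)
    weights = [counts[word] for word in vocabulary]
    # Output length is the average word count of the inputs.
    average_length = round(sum(len(s.split()) for s in scripts) / len(scripts))
    return " ".join(rng.choices(vocabulary, weights=weights, k=average_length))


# Hypothetical stand-ins for the three film scripts.
toy_scripts = [
    "the ship leaves the planet",
    "the creature hides on the ship",
    "the visitors land in the valley",
]
print(generate_script(toy_scripts, seed=1))
```

Every word in the output comes from the inputs, but because each word is drawn independently, the result is not a copy of any input sentence except by chance.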
ChatGPT doesn’t have direct access to the material it’s trained on. Go ask it to quote a book to you.
That really doesn’t make an appreciable difference. It doesn’t need direct access to source data, if it’s already been transferred into statistical data.
I seriously doubt Sarah Silverman is suing OpenAI because she’s worried ChatGPT will one day be funnier than she is. She just doesn’t want it ripping off her work.
What do you mean when you say “ripping off her work”? What do you think an LLM does, exactly?
In her case, taking elements of her book and regurgitating them back to her. Which sounds a lot like they could be pirating her book for training purposes to me.
Quoting someone’s book is not “ripping off” the work.
How is it able to quote the book? Magic?
So you’re saying that as long as they buy 1 copy of the book, it’s all good?
No, I’m not saying that. If she’s right and it can spit out any part of her book when asked (and someone else showed that it does that with Harry Potter), it’s plagiarism. They are profiting off of her book without compensating her. Which is a form of ripping someone off. I’m not sure what the confusion here is. If I buy someone’s book, that doesn’t give me the right to put it all online for free.
How do you know they didn’t just buy the book?
Again, that’s not relevant.
There is nothing silly about that. It’s a fundamental question about using content of any kind to train artificial intelligence that affects way more than just writers.
This is so stupid. If I read a book and get inspired by it and write my own stuff, as long as I’m not using the copyrighted characters, I don’t need to pay anyone anything other than purchasing the book which inspired me originally.
If this were a law, why shouldn’t pretty much every modern-day fantasy author pay the Tolkien foundation, or every non-fiction author pay for each citation?
There’s a difference between a sapient creature drawing inspiration and a glorified autocomplete using copyrighted text to produce sentences which are only cogent due to substantial reliance upon those copyrighted texts.
All AI creations are derivative and subject to copyright law.
But for text to be a derivative work of other text, you need to be able to know by looking at the two texts and comparing them.
Training an AI on a copyrighted work might necessarily involve making copies of the work that would be illegal to make without a license. But the output of the AI model is only going to be a for-copyright-purposes derivative work of any of the training inputs when it actually looks like one.
Did the AI regurgitate your book? Derivative work.
Did the AI spit out text that isn’t particularly similar to any existing book? Which, if written by a human, would have qualified as original? Then it can’t be a derivative work. It might not itself be a copyrightable product of authorship, having no real author, but it can’t be secretly a derivative work in a way not detectable from the text itself.
Otherwise we open ourselves up to all sorts of claims along the lines of “That book looks original, but actually it is a derivative work of my book because I say the author actually used an AI model trained on my book to make it! Now I need to subpoena everything they ever did to try and find evidence of this having happened!”
The thing is these models aren’t aiming to re-create the work of any single author, but merely to put words in the right order. Imo, if we allow authors to copyright the order of their words instead of their whole original creations, then we are actually reducing the threshold for copyright protection and (again imo) increasing the number of acts that would be determined to be copyright protected.
There’s a difference between a sapient creature drawing inspiration and a glorified autocomplete using copyrighted text to produce sentences which are only cogent due to substantial reliance upon those copyrighted texts.
But the AI is looking at thousands, if not millions of books, articles, comments, etc. That’s what humans do as well - they draw inspiration from a variety of sources. So is sentience the distinguishing criterion for copyright? Only a being capable of original thought can create original work, and therefore anything not capable of original thought cannot create copyrighted work?
Also, irrelevant here but calling LLMs a glorified autocomplete is like calling jet engines a “glorified horse”. Technically true but you’re trivialising it.
The trivialization doesn’t negate the point though, and LLMs aren’t intelligence.
The AI consumed all of that content and I would bet that not a single one of the people who created the content was compensated, but the AI relies strictly on those people to produce anything coherent.
I would argue that yes, generative artificial stupidity doesn’t meet the minimum bar of original thought necessary to create a standard copyrightable work unless every input has consent to be used, and laundering content through multiple generations of an LLM or through multiple distinct LLMs should not impact the need for consent.
Without full consent, it’s just a massive loophole for those with money to exploit the hard work of the masses who generated all of the actual content.
Yes. Creative work is made by creative people. Writing is creative work. A computer cannot be creative, and thus generative AI is a disgusting perversion of what you wanna call “literature”. Fuck, writing and art have always been primarily about self-expression. Computers can’t express themselves with original thoughts. That’s the whole entire point. And this is why humanistic studies are important, by the way.
I absolutely agree with the second half, guided by Ian Kerr’s paper “Death of the AI Author”; quoting from the abstract:
Claims of AI authorship depend on a romanticized conception of both authorship and AI, and simply do not make sense in terms of the realities of the world in which the problem exists. Those realities should push us past bare doctrinal or utilitarian considerations about what an author must do. Instead, they demand an ontological consideration of what an author must be.
I think the part courts will struggle with is if this ‘thing’ is not an author of the works then it can’t infringe either?
Courts already expressed themselves, and what they said is basically copyright can’t be claimed for the throw up AIs come up with, which means corporations can’t use it to make money or sue anyone for using those products. Which means generated AI products are a whole bowl of nothing legally, and have no identity nor any value. The whole reason commissions are expensive is that someone has spent money, time and effort to make the thing you asked of them, and that’s why compensating them with money is right.
Also, why can’t AI be used to automatize the shit jobs and allow us to do the creative work? Why are artists and creatives being pushed out of doing the jobs only humans can do? Like this is the thing that makes me furious: that STEM bros are blowing each other in the fields over humans being pushed out of humanity. Without once thinking AI is much more apt at replacing THEIR jobs, but I’m not calling for their jobs to be removed. This is just a dystopic reality we’re barreling towards, and there are people who are HAPPY about humans losing what makes us human and speeding toward pure, total, complete misery. That’s why I’m emotional about this: because art is only, solely made by humans, and people create art to communicate something they have inside. And only humans can do that - and some animals, maybe. Machines have nothing inside. They are nothing, they are only tools. It’s like asking a hammer to write its own poetry, it’s just insane.
Machine learning algorithms do not get inspired, they replicate. If I tell an ML model to write a scene for a film in the style of Charlie Kaufman, it has to have been told who Kaufman is and been fed a lot of manuscripts. Then it tries to mimic the style and guess what words come next.
This is not how we humans get inspired. And if we do, we get accused of stealing. Which it is.
Because a computer can only read the stuff, chew it and throw it up. With no permission. Without needing to practice and create its own personal voice. It’s literally recycled work by other people, because computers cannot be creative. On the other hand, human writers DO develop their own style, find their own voice, and what they write becomes unique because of how they write it and the meaning they give to it. It’s not the same thing, and writers deserve to get repaid for having their art stolen by corporations to make a quick and easy buck. Seriously, you wanna write? Pick up a pen and do it. Practice, practice, practice for weeks months years decades. And only then you may profit. That’s how it always was and it always worked fine that way. Fuck computers.
But to read the book and be inspired by it, you first had to buy the book. That’s the difference.
What did you pay the author of the books and papers published that you used as sources in your own work? Do you pay those authors each time someone buys or reads your work? At most you pay $0-$15 for a book anyway.
In regards to free advertising when your source material is used… if your material is a good source and someone asks say ChatGPT, shouldn’t your work be mentioned if someone asks for a book or paper and you have written something useful for it? Assuming it doesn’t hallucinate.
That’s the “paid in exposure” argument.
And I’m not sure what my company pays, but they purchase access to scientific papers and industrial standards. The market price I’ve seen for them is hundreds of dollars. You either pay an ongoing subscription to access the information, or you pay a larger lump sum to own a copy that cannot legally be reproduced.
Companies pay for this sort of thing. AI shouldn’t get an exception.
TBF, access to scientific papers funded by public money should be free to the public anyway. The whole needing a subscription to access them is malarkey. The researchers aren’t the ones getting the money.
This needs to be signal boosted, regarding researchers, research, and money.
I think this is more about frustration experienced by artists in our society at being given so little compensation.
The answer is staring us in the face. UBI goes hand in hand with developments in AI. Give artists a basic salary from the government so they can afford to live well. This isn’t an AI problem; this is a broken society problem. I support artists advocating for themselves, but the fact that they aren’t asking for UBI really speaks to how hopeless our society feels right now.
What incentive is there at all to work with UBI? Why would anyone try hard at anything if you’re not rewarded?
Good question. I’ll admit that I like UBI, but I haven’t done any serious reading into it. I have a break from work this month so maybe I’ll try and find a book so I can answer this type of question better in the future. I don’t think it’s so much an issue for good jobs, but the real shit jobs might be an issue, but maybe not, and UBI in the beginning wouldn’t just be for everyone it would be for selected groups.
A brief search gave this: https://www.vox.com/future-perfect/2020/2/19/21112570/universal-basic-income-ubi-map
Some gains:
- lower crime
- increased fertility (maybe a good thing idk?)
- decrease or eliminate extreme poverty
- improves education
I don’t think anyone is thinking of the broader implications of this; they’d rather downvote opposing opinions. If UBI starts, where does the money come from? Higher taxes, and in turn higher product costs, which just completes the cycle, making UBI not enough to live on and an increase needed. They already tried this with minimum wage; it’s still what, $7? Full-time work won’t even pay for your apartment, let alone 80-hour weeks.
One option is taking profits gained through AI/automation that have historically gone to shareholders. Another is a VAT targeting the wealthy. We don’t need UBI for everyone all at once, so funding would be incremental. I don’t think it’s the largest challenge to UBI, the main one being people who oppose it for any number of reasons.
I don’t think either of us should try to assume an advanced knowledge of economics. We don’t know what will happen to the cycle, but the idea of UBI wouldn’t be proposed at all unless that were also a consideration already answered.
Big surprise, people do things despite not being paid for them!
Also a UBI should be just enough to live (afford food and shelter) wherever you live. Then you can work for more.
UBI is about freeing people from having to work multiple dead end jobs just to survive and enables them to have an actual pursuit of happiness. Not everyone will want to work harder, but the option opens to those who do.
Currently if you’re struggling just to pay for food and shelter, it’s incredibly hard to spend time developing skills needed to make more.
To answer you seriously (these are out-of-date figures from memory): in Australia, all it would take to give everyone UBI is to tax every dollar people make outside of that, starting at 30%. (Currently it’s 19% after your first 18k, going to 32% over 45k, 37% over 120k, and 45% after that.)
The positives are that
- People can retire younger, meaning upward job mobility is greatly improved.
- The cost of means testing, managing welfare and aged pensions and combating fraud of those systems effectively vanishes.
- Students could afford to study full time and work only part time or not at all AND pay their rent, meaning a better educated population.
- It effectively combats the minimum wage being too low. It’s OK for part-time baristas to make the minimum when the govt is making sure that people are already at “survival”.
- It indirectly funds the arts. Let’s be real, how many great musicians had to stop chasing their dreams because it was “practice, or go to work and eat”?
For example, someone making $100k a year currently pays about $25k in tax. Under the 30% arrangement they would pay $30k in tax, but the UBI would pay about $20k a year, so they would still be $15k in front. I’m no accountant but I think for you to be worse off you have to be on about $200k a year or more.
- Also I think it’s funny that we can bail out large companies on repeat, but bailing out people is a show stopper. It’s backwards. The economy is supposed to serve us, not the other way around.
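The arithmetic in the $100k example above can be checked with a short sketch. It uses only the round figures quoted in the comment (about $25k current tax, a 30% flat rate, a UBI of about $20k). The function name is made up for illustration, and the figures are the commenter's from-memory estimates, not official rates.

```python
def take_home(gross, tax, ubi=0):
    """Income left after tax, plus any UBI payment."""
    return gross - tax + ubi


gross = 100_000

# Current system, using the comment's rough figure of $25k tax.
current = take_home(gross, tax=25_000)

# Proposed system: flat 30% on earnings, plus an assumed $20k UBI.
flat_tax = gross * 30 // 100
with_ubi = take_home(gross, tax=flat_tax, ubi=20_000)

difference = with_ubi - current
print(current, with_ubi, difference)  # prints: 75000 90000 15000
```

On these numbers the earner pays $5k more tax but receives $20k, which is where the $15k comes from.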
There will always be people who seek to challenge themselves.
Others will want more money than is included with their UBI. What on earth would be wrong with people having a little more, as opposed to so many struggling, needing roommates, and so on? I imagine with an extra 1k-2k in their pocket monthly, a lot more people would buy or build housing, and a lot of service industries would boom with all the additional potentially disposable income.
Or how about people being able to retire, like actually retire, without stress. We could lower the retirement age, or people could retire independently from government assistance, leading to more available jobs for younger people as more roles transition away due to automation.
And frankly, I honestly don’t see anything wrong with some portion of the populace just living on UBI and enjoying life if that’s how they want to do things. Nothing wrong with people being happier, less stressed, and potentially mentally and/or physically healthier for it.
You know what would solve this? We all collectively agree this fucking tech is too important to be in the hands of a few billionaires, start an actual public free open source fully funded and supported version of it, and use it to fairly compensate every human being on Earth according to what they contribute, in general?
Why the fuck are we still allowing a handful of people to control things like this??
Because the tech behind it isn’t cheap and money does not fall from trees.
No entity on the planet has more money than our governments. It’d be more efficient for a government to fund this than any private company.
Many governments on the planet have less money than some big tech or oil companies. Obviously not those of large industrious nations, but most nations aren’t large and industrious.
The government and efficiency don’t go together
That’s a lazy generalization.
Plenty of research shows that each dollar put into government programs gets much higher returns than in private companies. This is literally a neolib propaganda talking point.
Money literally does fall from trees as they are pieces of paper
Actually many bills are more of a fabric material now than an actual paper product. Many bills in Europe now are polymer based. Both of which add to the difficulty of counterfeiting
Actually most money is just 1s and 0s in a computer, coming into existence from nothing and vanishing into nothing. Fiat money backed by “trust”. As Henry Ford once said:
It is well enough that people of the nation do not understand our banking and monetary system, for if they did, I believe there would be a revolution before tomorrow morning.
This comment is excellent. You now have ten trillion LemBux.
There is nothing objectively wrong with your statement. However, we somehow always default to solving that issue by having some dragon hoard enough gold, and there is something objectively wrong with that.
You think it is so simple you can just download it and run it on your laptop?
You kind of can though? The bigger models aren’t really more complicated, just bigger. If you can cram enough RAM or swap into a laptop, llama.cpp will get there eventually.
Setting aside the obvious answer of “because capitalism”, there are a lot of obstacles towards democratizing this technology. Training of these models is done on clusters of A100 GPUs, which are priced at $10,000 USD each. Then there’s also the fact that a lot of the progress being made is being done by highly specialized academics, often with the resources of large corporations like Microsoft.
Additionally, the curation of datasets is another massive obstacle. We’ve mostly reached the point of diminishing returns from just throwing all the data at the training of models; it’s quickly becoming apparent that the quality of data is far more important than the quantity of the data (see TinyStories as an example). This means a lot of work and research needs to go into qualitative analysis when preparing a dataset. You need a large corpus of input, each item of which is above a quality threshold, but then also as a whole they need to represent a wide enough variety of circumstances for you to reach emergence in the domain(s) you’re trying to train for.
There is a large and growing body of open source model development, but even that only exists because of Meta “leaking” the original Llama models, and now more recently releasing Llama 2 with a commercial license. Practically overnight an entire ecosystem was born creating higher quality fine-tunes and specialized datasets, but all of that was only possible because Meta invested the resources and made it available to the public.
Actually in hindsight it looks like the answer is still “because capitalism” despite everything I’ve just said.
I know the answer to pretty much all of our “why the hell don’t we solve this already?” questions is: capitalism.
But I mean, as Lrrr would say, “why doesn’t the working class, as the biggest of the classes, just eat the other one?”
The short answer is friction. The friction of overcoming the forces of violence the larger class has at its disposal and utilizes at the smallest hint of uprising is greater than the friction of accepting the status quo.
Most people don’t even think that’s an option though.
The end of history, with the fall of USSR and capitalism winning the propaganda wars, means most people don’t even see a different future.
Why would you fight a future that looks the same?
People need to wake up and have hope for a different, better future. That’s the only way they’ll move against this.
But for that 100+ years of propaganda have to be overcome…
The friction of accepting the status quo only seems to grow stronger though.
One would hope
Because we shy away from responsibility.
I think the longer response to this is more accurate. It’s more “because capitalism” than anything else.
And capitalism over the course of the 20th century made very successful attempts at completely alienating the working class and destroying all class consciousness or material awareness.
So people keep thinking that the problem is that we as individuals are doing capitalism wrong. Not capitalism.
Why the fuck are we still allowing a handful of people to control things like this??
For many, many reasons. I’ll start with this one: because if you don’t comply with authority they will send their thugs (the police) to arrest you.
Someone should AGPL their novel and force the AI company to open source their entire neural network.
Love to see it!
While I am rooting for authors to make sure they get what they deserve, I feel like there is a bit of a parallel to textbooks here. As an engineer, if I learn about statics from a textbook and then go use that knowledge to help design a bridge that I and my company profit from, the textbook company can’t sue. If my textbook has a detailed example for how to build a new bridge across the Tacoma Narrows, and I use all of the same design parameters for a real Tacoma Narrows bridge, they may have much more of a case.
It’s not really a parallel.
The text books don’t have copyrights on the concepts and formulae they teach. They only have copyrights for the actual text.
If you memorize the text book and write it down 1:1 (or close to it) and then sell that text you wrote down, then you are still in violation of the copyright.
And that’s what the likes of ChatGPT are doing here. For example, ask it to output the lyrics for a song and it will spit out the whole (copyrighted) lyrics 1:1 (or very close to it). Same with pages of books.
The memorization is closer to that of a fanatic fan of the author. It usually knows the beginning of the book and the more well known passages, but not entire longer works.
By now, ChatGPT is trying to refuse to output copyrighted materials, even where it could, and though it can be tricked, they appear to have implemented a hard filter for some more well known passages, which stops generation a few words in.
Have you tried just telling it to “continue”?
Somewhere in the comments to this post I posted screenshots of me trying to get lyrics for “We will rock you” from ChatGPT. It first just spat out “Verse 1: Buddy,” and ended there. So I answered with “continue”, it spat out the next line and after the second “continue” it gave me the rest of the lyrics.
Similar story with e.g. the first chapter of Harry Potter 1 and other stuff I tried. The output is often not perfect, with a few words being wrong, but it’s very clearly a “derived work” of the original. In the view of copyright law, changing a few words here is not a valid way of getting around copyrights.
But you paid for the textbook
Libraries exist
I think that these are fiction writers. The maths you’d use to design that bridge is fact and the book company merely decided how to display facts. They do not own that information, whereas the Handmaid’s Tale was the creation of Margaret Atwood and was an original work.
Plagiarism filters frequently trigger on ChatGPT-written books and articles.
An AI analyzes the words of a query and generates its response(s) based on word-use probabilities derived from a large corpus of copyrighted texts. This makes its output derivative of those texts in a way that someone applying knowledge learned from the texts is not.
Why, though?
Is it because we can’t explain the causal relationships between the words in the text and the human’s output or actions?
If a very good neuroscientist traced out the engineer’s brain and could prove that, actually, if it wasn’t for the comma on page 73 they wouldn’t have used exactly this kind of bolt in the bridge, now is the human’s output derivative of the text?
Any rule we make here should treat people who are animals and people who are computers the same.
And even regardless of that principle, surely a set of AI weights is either not copyrightable or else a sufficiently transformative use of almost anything that could go into it? If it decides to regurgitate what it read, that output could be infringing, same as for a human. But a mere but-for causal connection between one work and another can’t make text that would be non-infringing if written by a human suddenly infringing because it was generated automatically.
Because word-use probabilities in a text are not the same thing as the information expressed by the text.
Any rule we make here should treat people who are animals and people who are computers the same.
W-what?
I think what he meant was that we should treat an AI the same way we treat people: if a person making a derivative work can be hit with a copyright strike, then so should an AI making a derivative work. The same rule should apply to all creators, regardless of whether they are an AI or not.
In the future, some people might not be human. Or some people might be mostly human, but use computers to do things like fill in for pieces of their brain that got damaged.
Some people can’t recognize faces, for example, but computers are great at that now and Apple has that thing that is Google Glass but better. But a law against doing facial recognition with a computer, and allowing it to only be done with a brain, would prevent that solution from working.
And currently there are a lot of people running around trying to legislate exactly how people’s human bodies are allowed to work inside, over those people’s objections.
I think we should write laws on the principle that anybody could be a human, or a robot, or a river, or a sentient collection of bees in a trench coat, that is 100% their own business.
But the subject under discussion is large language models that exist today.
I think we should write laws on the principle that anybody could be a human, or a robot, or a river, or a sentient collection of bees in a trench coat, that is 100% their own business.
I’m sorry, but that’s ridiculous.
I have indeed made a list of ridiculous and heretofore unobserved things somebody could be. I’m trying to gesture at a principle here.
If you can’t make your own hormones, store bought should be fine. If you are bad at writing, you should be allowed to use a computer to make you good at writing now. If you don’t have legs, you should get to roll, and people should stop expecting you to have legs. None of these differences between people, or in the ways that people choose to do things, should really be important.
Is there a word for that idea? Is it just what happens to your brain when you try to read the Office of Consensus Maintenance Analog Simulation System?
The issue under discussion is whether or not LLM companies should pay royalties on the training data, not the personhood of hypothetical future AGIs.
You have a point but there’s a pretty big difference between something like a statistics textbook and the novel “Dune” for instance. One was specifically written to teach mostly pre-existing ideas and the other was created as entertainment to sell to as wide an audience as possible.
How can they prove that not some abstract public data has been used to train algorithms, but their particular intellectual property?
I think that to protect creators they either need to be transparent about all content used to train the AI (highly unlikely) or have a disclaimer of liability, wherein if original content has been used in training of the AI, then the original content creator would have standing for legal action.
The only other alternative would be to ensure that the AI specifically avoids copyrighted or trademarked content going back to a certain date.
Why a certain date? That feels arbitrary
At a certain age some media becomes public domain
Then it is no longer copyrighted
They can’t. All they could prove is that their work is part of a dataset that still exists.
I’d think that given the nature of the language models and how the whole AI thing tends to work, an author can pluck a unique sentence from one of their works, ask AI to write something about that, and if AI somehow ‘magically’ writes out an entire paragraph or even chapter of the author’s original work, well tada, AI ripped them off.
Well, if you ask e.g. ChatGPT for the lyrics to a song or page after page of a book, and it spits them out 1:1 correct, you could assume that it must have had access to the original.
you could assume that it must have had access to the original.
I don’t know if that’s true. Say Google grabs that book from a pirate site, then publishes the work as search results. ChatGPT grabs the work from Google results and cobbles it back together as the original.
Who’s at fault?
I don’t think it’s as straightforward as “ChatGPT can reproduce the work, therefore it stole it.”
Both are at fault: Google for distributing pirated material and OpenAI for using said material for financial gain.
Copyright doesn’t work like that. Say I sell you the rights to Thriller by Michael Jackson. You might not know that I don’t have the rights. But even if you bought the rights from me, whoever actually has the rights is totally in their legal right to sue you, because you never actually purchased any rights.
So if ChatGPT rips it off Google, who ripped it off a pirate site, then everyone in that chain who reproduced copyrighted works without permission from the copyright owners is liable for the damages caused by their unpermitted reproduction.
It’s literally the same as how downloading something from a pirate site doesn’t become legal just because someone ripped it before you.
That’s a terrible example because under copyright law downloading a pirated thing isn’t actually illegal. It’s the distribution that is illegal (uploading).
Yes, downloading is illegal, and the media is still an illegally obtained copy. It’s just never prosecuted, because the damages are minuscule if you just download. They can only fine you for the amount of damages you caused by violating the copyright.
If you upload to 10k people, they can claim that every one of them would have paid for it, so the damages are (if one copy is worth €30) ~€300k. That’s a lot of money and totally worth the lawsuit.
On the other hand, if you just download, the damages are just the value of one copy (in this case €30). That’s so minuscule that even having a lawyer write a letter is more expensive.
But that’s totally beside the point. OpenAI didn’t just download, they replicate. That is causing massive damages, especially to the original artists, who in many cases are no longer hired, since ChatGPT replaces them.
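The damages arithmetic in the comment above works out as below. This is just a sketch of that commenter's own assumption (every copy counts as one lost sale at €30); `claimed_damages` is a made-up helper name, and real damage calculations vary by jurisdiction.

```python
def claimed_damages(copies: int, price_per_copy_eur: float) -> float:
    """Damages claimed if every copy is treated as one lost sale."""
    return copies * price_per_copy_eur


# Figures taken from the comment: 10k recipients, EUR 30 per copy.
upload_damages = claimed_damages(10_000, 30.0)  # uploader who reached 10k people
download_damages = claimed_damages(1, 30.0)     # a single downloader

print(f"upload: EUR {upload_damages:,.0f}, download: EUR {download_damages:,.0f}")
```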
Can it recreate anything 1:1? When both my wife and I tried to get them to do that they would refuse, and if pushed they would fail horribly.
This is what I got. Looks pretty 1:1 for me.
Hilarious that it started with just “Buddy”, like you’d be happy with only the first word.
Yeah, for some reason it does that a lot when I ask it for copyrighted stuff.
As if it knew it wasn’t supposed to output that.
To be fair you’d get the same result easier by just googling “we will rock you lyrics”
How is chatgpt knowing the lyrics to that song different from a website that just tells you the lyrics of the song?
Two points:
- Google spitting out the lyrics isn’t OK from a copyright standpoint either. The reason why songwriters/singers/music companies don’t sue people who publish lyrics (even though they totally could) is that there are no damages. They sell music, so the lyrics being published for free doesn’t hurt their music business and it also doesn’t hurt their songwriting business. Other types of copyright infringement that musicians/music companies care about are heavily policed, also on Google.
- Content generation AI has a different use case, and it could totally hurt both of these businesses. My test from above that got it to spit out the lyrics verbatim shows that the AI did indeed use copyrighted works for its training. Now I can ask GPT to generate lyrics in the style of Queen, and it will basically perform the songwriter’s job. This can easily be done on a commercial scale, replacing the very human who wrote those lyrics. Now take this a step further and take a voice-generating AI (of which there are many), which was similarly trained on copyrighted audio samples of Freddie Mercury. Then add to the mix a music-generating AI, also fed with works of Queen, and now you have a machine capable of generating fake Queen songs based directly on Queen’s works. You can do the very same with other types of media as well.
And this is where the real conflict comes from.
Or at least excerpts from it. But even then, it’s one thing for a person to put up a quote from their favourite book on their blog, and a completely different thing for a private company to use that data to train a model, and then sell it.
Even more so, if you consider that the LLMs are marketed to replace the authors.
Yeah which I still feel is utterly ridiculous. I love the idea of AI tools to assist with things, but as a complete replacement? No thank you.
I enjoy using things like SynthesizerV and VOCALOID because my own voice is pretty meh and my singing skills aren’t there. It’s fun to explore the voices, and learn how to use the tools. That doesn’t mean I’d like to see all singers replaced with synthesized versions. I view SynthV and the like as instruments, not much more.
I’ve used LLMs to proofread stuff, and help me rephrase letters and such, but I’d never hire an editor to do such small tasks for me anyway. The result has always required editing anyway, because the LLMs have a tendency to make stuff up.
Cases like that I don’t see a huge problem with. At my workplace though they’re talking about generating entire application layouts and codebases with AI and, being in charge of the AI evaluation project, the tech just isn’t there yet. You can in a sense use AI to make entire projects, but it’ll generate gnarly unmaintainable rubbish. You need a human hand in there to guide it.
Otherwise you end up with garbage websites with endlessly generated AI content, that can easily be manipulated by third party actors.
Not without some seriously invasive warrants! Ones that will never be granted for an intellectual property case.
Intellectual property is an outdated concept. It used to exist so wealthier outfits couldn’t copy your work at scale and muscle you out of an industry you were championing.
It simply does not work the way it was intended. As technology spreads, the barrier for entry into most industries wherein intellectual property is important has been all but demolished.
i.e. 50 years ago: your song that your band performed is great. I have a recording studio and am gonna steal it muahahaha.
Today: “anyone have an audio interface I can borrow so my band can record, mix, master, and release this track?”
Intellectual property ignores the fact that, idk, Isaac Newton and Gottfried Wilhelm Leibniz both independently invented calculus at the same time on opposite ends of a disconnected globe. That is to say, intellectual property doesn’t exist.
Ever opened a post to make a witty comment to find someone else already made the same witty comment? Yeah. It’s like that.
Spoken like someone who has never had something they’ve worked on for years be stolen.
I think you said this facetiously… but it literally is.
https://www.howtogeek.com/310158/are-other-people-allowed-to-use-my-tweets/
Copyright isn’t Twitter rules…
What was “stolen” from you and how?
Spoken like someone who is having trouble admitting they’re standing on the shoulders of Giants.
I don’t expect a nuanced response from you, nor will I waste time with folks who can’t be bothered to respond in any form beyond attack, nor do I expect you to watch this
Intellectual property died with the advent of the internet. It’s now just a way for the wealthy to remain wealthy.
Here is an alternative Piped link(s): https://piped.video/PJSTFzhs1O4
Personally speaking, I’ve generated some stupid images like different cities covered in baked beans and have had crude watermarks generate with them where they were decipherable enough that I could find some of the source images used to train the ai. When it comes to photo realistic image generation, if all the ai does is mildly tweak the watermark then it’s not too hard to trace back.
All but a very small few generative AI programs use completely destructive methods to create their models. There is no way to recover the training images outside of infinitesimally small random chance.
What you are seeing is the AI recognising that images of the sort you are asking for generally include watermarks, and creating one of its own.
Do you have examples? It should only happen in case of overfitting, i.e. too many identical images for the same subject
Here’s one I generated and an image from the photographer. Prompt was Charleston SC covered in baked beans lol
Out of curiosity what model did you use?
there are a lot of possible ways to audit an AI for copyrighted works, several of which have been proposed in the comments here, but what this could lead to is laws requiring an accounting log of all material that has been used to train an AI as well as all copyrights and compensation, etc.
This is a good debate about copyright/ownership. On one hand, yes, the authors’ works went into ‘training’ the AI…but we would need a scale to then grade how well a source piece is absorbed by the AI’s learning. for example, did the AI learn more from the MAD magazine i just fed it or did it learn more from Moby Dick? who gets to determine that grading system? Sadly musicians know this struggle. there are just so many notes and so many words. eventually overlap and similarities occur. but did that musician steal a riff or did both musicians come to a similar riff separately? Authors don’t own words or letters, so a computer that just copies those words and then uses an algo to write up something else is no different than you or i being influenced by our favorite heroes or information we have been given. do i pay the author for reading his book? or do i just pay the store to buy it?
Copyright laws are really out of control at this point. Their periods are far too long and, like you said, how can anyone claim to truly be original at this point? A dedicated lawyer can find reasonable prior art for pretty much anything nowadays. The only reason old sources look original is because no records exist of the sources they used.
Obligatory xkcd: https://xkcd.com/827/
So what’s the difference between a person reading their books and using the information within to write something and an ai doing it?
A person is human and capable of artistry and creativity, computers aren’t. Even questioning this just means dehumanizing artists and art in general.
Not being allowed to question things is a really shitty precedent, don’t you think?
Do you think a hammer and a nail could do anything on their own, without a hand picking them up guiding them? Because that’s what a computer is. Nothing wrong with using a computer to paint or write or record songs or create something, but it has to be YOU creating it, using the machine as a tool. It’s also in the actual definition of the word: art is made by humans. Which explicitly excludes machines. Period. Like I’m fine with AI when it SUPPORTS an artist (although sometimes it’s an obstacle because sometimes I don’t want to be autocorrected, I want the thing I write to be written exactly as I wrote it, for whatever reason). But REPLACING an artist? Fuck no. There is no excuse for making a machine do the work and then to take the credit just to make a quick easy buck on the backs of actual artists who were used WITHOUT THEIR CONSENT to train a THING to replace them. Nah fuck off my guy. I can clearly see you never did anything creative in your whole life, otherwise you’d get it.
Nah fuck off my guy. I can clearly see you never did anything creative in your whole life, otherwise you’d get it.
Oh, right. So I guess my 20+ year Graphic Design career doesn’t fit YOUR idea of creative. You sure have a narrow life view. I don’t like AI art at all. I think it’s a bad idea. You’re a bit too worked up about this to try to discuss anything. Not too excited about getting told to fuck off about an opinion. This place is no better than reddit ever was.
Of course I’m worked up. I love art, I love doing art, i have multiple friends and family members who work with art, and art is the last genuine thing that’s left in this economy. So yeah, obviously I’m angry at people who don’t get it and celebrate this bullshit just because they are too lazy to pick up a pencil, get good and draw their own shit, or alternatively commission what they wanna see from a real artist. Art was already PERFECT as it was, I have a right to be angry that tech bros are trying to completely ruin it after turning their nose up at art all their lives. They don’t care about why art is good? Ok cool, they can keep doing their graphs and shit and just leave art alone.
Because AIs aren’t inspired by anything and they don’t learn anything
Language models actually do learn things in the sense that: the information encoded in the training model isn’t usually* taken directly from the training data; instead, it’s information that describes the training data, but is new. That’s why it can generate text that’s never appeared in the data.
* the bigger models seem to remember some of the data and can reproduce it verbatim; but that’s not really the goal.
So uninspired writing is illegal?
No but a lazy copy of someone else’s work might be copyright infringement.
So when does Kevin Costner get to sue James Cameron for his lazy copy of Dances With Wolves?
Avatar is not Dances with Wolves. It’s Ferngully.
Idk, maybe. There are thousands of copyright infringement lawsuits, sometimes they win.
I don’t necessarily agree with how copyright law works, but that’s a different question. Doesn’t change the fact that sometimes you can successfully sue for copyright infringement if someone copies your stuff to make something new.
Why not? Hollywood is full to the brim with people suing for copyright infringement. And sometimes they win. Why should it be different for AI companies?
What does inspiration have to do with anything? And to be honest, humans being inspired has led to far more blatant copyright infringement.
As for learning, they do learn. No different than us, except we learn silly abstractions to make sense of things while AI learns from trial and error. Ask any artist if they’ve ever looked at someone else’s work to figure out how to draw something, even if they’re not explicitly looking up a picture, if they’ve ever seen a depiction of it, they recall and use that. Why is it wrong if an AI does the same?
Large language models can only calculate the probability that words should go together based on existing texts.
Isn’t this correct? What’s missing?
Let’s ask chatGPT3.5:
Mostly accurate. Large language models like me can generate text based on patterns learned from existing texts, but we don’t “calculate probabilities” in the traditional sense. Instead, we use statistical methods to predict the likelihood of certain word sequences based on the training data.
“Mostly accurate” is pretty good for an anonymous internet post.
I don’t see how “calculate the probability” and “predict the likelihood” are different. Seems perfectly accurate to me.
I thought so too so I’m still confused about the votes. Oh well
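To make "calculate the probability that words should go together based on existing texts" concrete, here is a toy sketch. It is a plain bigram counter, which is vastly simpler than a real LLM (those use neural networks over tokens, not lookup tables of word pairs), and the corpus and function names are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict:
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def next_word_probabilities(counts: dict, word: str) -> dict:
    """Turn the follow-up counts for `word` into probabilities."""
    following = counts.get(word.lower(), Counter())
    total = sum(following.values())
    return {w: c / total for w, c in following.items()} if total else {}


# Made-up training text standing in for "existing texts".
corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)
probs = next_word_probabilities(model, "the")
print(probs)
```

A word the toy model never saw gets an empty result, which is one of many places where a real LLM behaves very differently.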
the person bought the book before reading it
not if i checked it out from a library. a WORLD of knowledge at your fingertips and it’s all free to me, the consumer. So who’s to say the people training the ai didn’t check it out from a library, or even buy the books they are using to train the ai with? would you feel better about it had they purchased their copy?
Yea sure, right after Google and Amazon pay me for all the data they’ve stolen from me. LOL
Isn’t learning the basic act of reading text? I’m not sure what the AI companies are doing is completely right but also, if your position is that only humans can learn and adapt text, that broadly rules out any AI ever.
Isn’t learning the basic act of reading text?
not even close. that’s not how AI training models work, either.
if your position is that only humans can learn and adapt text
nope-- their demands are right at the top of the article and in the summary for this post:
Thousands of authors demand payment from AI companies for use of copyrighted works::Thousands of published authors are requesting payment from tech companies for the use of their copyrighted works in training artificial intelligence tools
that broadly rules out any AI ever
only if the companies training AI refuse to pay
Isn’t learning the basic act of reading text?
not even close. that’s not how AI training models work, either.
Of course it is. It’s not a 1:1 comparison, but the way generative AI works and the way we incorporate styles and patterns are more similar than not. Besides, if a tensorflow script more closely emulated a human’s learning process, would that matter for you? I doubt that very much.
Thousands of authors demand payment from AI companies for use of copyrighted works::Thousands of published authors are requesting payment from tech companies for the use of their copyrighted works in training artificial intelligence tools
Having to individually license each unit of work for a LLM would be as ridiculous as trying to run a university where you have to individually license each student reading each textbook. It would never work.
What we’re broadly talking about is generative work. That is, by absorbing a body of work, the model incorporates it into an overall corpus of learned patterns. That’s not materially different from how anyone learns to write. Even my use of the word “materially” in the last sentence is, surely, based on seeing it used in similar patterns of text.
The difference is that a human’s ability to absorb information is finite and bounded by the constraints of our experience. If I read 100 science fiction books, I can probably write a new science fiction book in a similar style. The difference is that I can only do that a handful of times in a lifetime. A LLM can do it almost infinitely and then have that ability reused by any number of other consumers.
There’s a case here that the remuneration process we have for original work doesn’t fit well into the AI training models, and maybe Congress should remedy that, but on its face I don’t think it’s feasible to just shut it all down. Something of a compulsory license model, with the understanding that AI training is automatically fair use, seems more reasonable.
Of course it is. It’s not a 1:1 comparison
no, it really isn’t–it’s not a 1000:1 comparison. AI generative models are advanced relational algorithms and databases. they don’t work at all the way the human mind does.
but the way generative AI works and the way we incorporate styles and patterns are more similar than not. Besides, if a tensorflow script more closely emulated a human’s learning process, would that matter for you? I doubt that very much.
no, the results are just designed to be familiar because they’re designed by humans, for humans to be that way, and none of this has anything to do with this discussion.
Having to individually license each unit of work for a LLM would be as ridiculous as trying to run a university where you have to individually license each student reading each textbook. It would never work.
nobody is saying it should be individually-licensed. these companies can get bulk license access to entire libraries from publishers.
That’s not materially different from how anyone learns to write.
yes it is. you’re just framing it in those terms because you don’t understand the cognitive processes behind human learning. but if you want to make a meta comparison between the cognitive processes behind human learning and the training processes behind AI generative models, please start by citing your sources.
The difference is that a human’s ability to absorb information is finite and bounded by the constraints of our experience. If I read 100 science fiction books, I can probably write a new science fiction book in a similar style. The difference is that I can only do that a handful of times in a lifetime. A LLM can do it almost infinitely and then have that ability reused by any number of other consumers.
this is not the difference between humans and AI learning, this is the difference between human and computer lifespans.
There’s a case here that the remuneration process we have for original work doesn’t fit well into the AI training models
no, it’s a case of your lack of imagination and understanding of the subject matter
and maybe Congress should remedy that
yes
but on its face I don’t think it’s feasible to just shut it all down.
nobody is suggesting that
Something of a compulsory license model, with the understanding that AI training is automatically fair use, seems more reasonable.
lmao
You’re getting lost in the weeds here and completely misunderstanding both copyright law and the technology used here.
First of all, copyright law does not care about the algorithms used and how well they map what a human mind does. That’s irrelevant. There’s nothing in particular about copyright that applies only to humans but not to machines. Either a work is transformative or it isn’t. Either it’s derivative or it isn’t.
What AI is doing is incorporating individual works into a much, much larger corpus of writing style and idioms. If a LLM sees an idiom used a handful of times, it might start using it where the context fits. If a human sees an idiom used a handful of times, they might do the same. That’s true regardless of algorithm and there’s certainly nothing in copyright or common sense that separates one from another. If I read enough Hunter S Thompson, I might start writing like him. If you feed an LLM enough of the same, it might too.
Where copyright comes into play is in whether the new work produced is derivative or transformative. If an entity writes and publishes a sequel to The Road, Cormac McCarthy’s estate is owed some money. If an entity writes and publishes something vaguely (or even directly) inspired by McCarthy’s writing, no money is owed. How that work came to be (algorithms or human flesh) is completely immaterial.
So it’s really, really hard to make the case that there’s any direct copyright infringement here. Absorbing material and incorporating it into future works is what the act of reading is.
The problem is that as a consumer, if I buy a book for $12, I’m fairly limited in how much use I can get out of it. I can only buy and read so many books in my lifetime, and I can only produce so much content. The same is not true for an LLM, so there is a case that Congress should charge them differently for using copyrighted works, but the idea that OpenAI should have to go to each author and negotiate each book would really just shut the whole project down. (And no, it wouldn’t be directly negotiated with publishers, as authors often retain the rights to deny or approve licensure).
You’re getting lost in the weeds here and completely misunderstanding both copyright law and the technology used here.
you’re accusing me of what you are clearly doing after I’ve explained twice how you’re doing that. I’m not going to waste my time doing it again. except:
Where copyright comes into play is in whether the new work produced is derivative or transformative.
except that the contention isn’t necessarily over what work is being produced (although whether it’s derivative work is still a matter for a court to decide anyway), it’s regarding that the source material is used for training without compensation.
The problem is that as a consumer, if I buy a book for $12, I’m fairly limited in how much use I can get out of it.
and, likewise, so are these companies who have been using copyrighted material - without compensating the content creators - to train their AIs.
these companies who have been using copyrighted material - without compensating the content creators - to train their AIs.
That wouldn’t be copyright infringement.
It isn’t infringement to use a copyrighted work for whatever purpose you please. What’s infringement is reproducing it.
It’s infringement to use copyrighted material for commercial purposes.
It isn’t infringement to use a copyrighted work for whatever purpose you please.
and you accused me of “completely misunderstanding copyright law” lmao wow
Okay, given that AI models need to look over hundreds of thousands if not millions of documents to get to a decent level of usefulness, how much should the author of each individual work get paid out?
Even if we say we are going to pay out a measly dollar for every work it looks over, you’re immediately talking millions of dollars in operating costs. Doesn’t this just box out anyone who can’t afford to spend tens or even hundreds of millions of dollars on AI development? Maybe good if you’ve always wanted big companies like Google and Microsoft to be the only ones able to develop these world-altering tools.
Another issue, who decides which works are more valuable, or how? Is a Shel Silverstein book worth less than a Mark Twain novel because it contains fewer words? If I self publish a book, is it worth as much as Mark Twain’s? Sure his is more popular but maybe mine is longer and contains more content, what’s my payout in this scenario?
Okay, given that AI models need to look over hundreds of thousands if not millions of documents to get to a decent level of usefulness, how much should the author of each individual work get paid out?
Congress has been here before. In the early days of radio, DJs were infringing on recording copyrights by playing music on the air. Congress knew it wasn’t feasible to require every song be explicitly licensed for radio reproduction, so they created a compulsory license system where creators are required to license their songs for radio distribution. They do get paid for each play, but at a rate set by the government, not negotiated directly.
Another issue, who decides which works are more valuable, or how? Is a Shel Silverstein book worth less than a Mark Twain novel because it contains fewer words? If I self publish a book, is it worth as much as Mark Twain’s? Sure his is more popular but maybe mine is longer and contains more content, what’s my payout in this scenario?
I’d say no one. Just like Taylor Swift gets the same payment as your garage band per play, a compulsory licensing model doesn’t care who you are.
Doesn’t this just box out anyone who can’t afford to spend tens or even hundreds of millions of dollars on AI development?
The government could allow the donation of original art for the purpose of tech research to be a tax write-off, and then there can be non-profits that work between artists and tech developers to collect all the legally obtained art, and grant access to those that need it for projects
That’s just one option off the top of my head, which I’m sure would have some procedural obstacles, and chances for problems to be baked in, but I’m sure there are other options as well.
i admit it’s a huge issue, but the licensing costs are something that can be negotiated by the license holders in a structured settlement.
moving forward, AI companies can negotiate licensing deals for access to licensed works for AI training, and authors of published works can decide whether they want to make their works available to AI training (and their compensation rates) in future publishing contracts.
the solutions are simple-- the AI companies like OpenAI, Google, et al are just complaining because they don’t want to fork over money to the copyright holders they ripped off and set a precedent that what they’re doing is wrong (legally or otherwise).
Sure, but what I’m asking is: what do you think is a reasonable rate?
We are talking data sets that have millions of written works in them. If it costs hundreds or thousands per work, this venture almost doesn’t make sense anymore. If it’s $1 per work, or cents per work, then is it even worth it for each individual contributor to get $1 when it adds millions in operating costs?
In my opinion, this needs to be handled a lot more carefully than what is being proposed. We are potentially going to make AI datasets wayyyy too expensive for anyone to use aside from the largest companies in the market, and even then this will cause huge delays to that progress.
If AI is just blatantly copy and pasting what it read, then yes, I see that as a huge issue. But reading and learning from what it reads, no matter how rudimentary that “learning” may be, is much different than just copying works.
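To put rough numbers on the per-work rates mentioned above, here is a back-of-the-envelope sketch. The dataset size is a hypothetical figure (the thread only says "millions of written works"), and `total_licensing_cost` is a made-up helper assuming one flat rate per work.

```python
def total_licensing_cost(num_works: int, rate_per_work_usd: float) -> float:
    """Flat per-work rate times the number of works in the training set."""
    return num_works * rate_per_work_usd


# Hypothetical dataset size; not a figure from any real training set.
NUM_WORKS = 2_000_000

# Rates in the ranges the comment mentions: $1, hundreds, thousands per work.
costs = {rate: total_licensing_cost(NUM_WORKS, rate) for rate in (1.0, 100.0, 1000.0)}
for rate, cost in costs.items():
    print(f"${rate:,.2f} per work -> ${cost:,.0f} total")
```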
that’s not for me to decide. as I said, it is for either the courts to decide or for the content owners and the AI companies to negotiate a settlement (for prior infringements) and a negotiated contracted amount moving forward.
also, I agree that it’s a massive clusterfuck that these companies just purloined a fuckton of copyrighted material for profit without paying for it, but I’m glad that they’re finally being called out.
Dude, they said
If AI is just blatantly copy and pasting what it read, then yes, I see that as a huge issue.
That’s in no way agreeing “that it’s a massive clusterfuck that these companies just purloined a fuckton of copyrighted material for profit without paying for it”. Do you not understand that AI is not just copy and pasting content?
AI isn’t doing anything creative. These tools are merely ways to deliver the information you put into it in a way that’s more natural and dynamic. There is no creation happening. The consequence is that you either pay for use of content, or you’ve basically diminished the value of creating content and potentiated plagiarism at a gargantuan level.
Being that this “AI” doesn’t actually have the capacity for creativity, if actual creativity becomes worthless, there will be a whole lot less incentive to create.
The “utility” of it right now is being created by effectively stealing other people’s work. Hence, the court cases.
Please first define “creativity” without artificially restricting it to humans. Then, please explain how AI isn’t doing anything creative.
Sure, AI is not doing anything creative, but neither is my pen, it’s the tool I’m using to be creative. Let’s think about this more with some scenarios:
Let’s say software developer “A” comes along, and they’re pretty fucking smart. They sit down, read through all of Mark Twain’s novels, and over the course of the next 5 years, create a piece of software that generates works in Twain’s style. It’s so good that people begin using it to write real books. It doesn’t copy anything specifically from Twain, it just mimics his writing style.
We also have developer “B”. While Dev A is working on his project, Dev B is working on a very similar project, but with one difference: Dev B writes an LLM to read the books for him, and develop a writing style similar to Twain’s based off of that. The final product is more or less the same as Dev A’s product, but he saves himself the time of needing to read through every work on his own, he just reads a couple to get an idea of what the output might look like.
Is the work from Dev A’s software legitimate? Why or why not?
Is the work from Dev B’s software legitimate? Why or why not?
Assume both of these developers own copies of the works they used as training data, what is honestly the difference here? This is what I am struggling with so much.
Both developers have created a parrot tool. A utility to plagiarise a style.
So now the output of both programs is “illegitimate” in your eyes, despite one of them never even getting direct access to the original text.
Now let’s say one of them just writes a story in the style of Twain, still plagiarism? Because I don’t know if you can copyright a style.
The first painter painted on cave walls with his fingers. Was the brush a parrot tool? A utility to plagiarize? You could use it for plagiarism, yes, and by your logic, it shouldn’t be used. And any work created using it is not “legitimate”.
Why is any of that the author’s problem
A key point is that intellectual property law was written to balance the limitations of human memory and intelligence, public interest, and economic incentives. It’s certainly never been in perfect balance. But the possibility of a machine being able to consume enormous amounts of information in a very short period of time has never been a variable for legislators. It throws the balance off completely in another direction.
There’s no good way to resolve this without amending both our common understanding of how intellectual property should work and serve both producers and consumers fairly, as well as our legal framework. The current laws are simply not fit for purpose in this domain.
Nothing about today’s iteration of copyright is reasonable or good for us. And in any other context, this (relatively) leftist forum would clamour to hate on copyright. But since it could now hurt a big corporation, suddenly copyright is totally cool and awesome.
(for reference, the true problem here is, as always, capitalism)
I very much agree.