Sad to see you leave (not really, tho’), love to watch you go!
Edit: I bet if any AI developing company would stop acting and being so damned shady and would just ASK FOR PERMISSION, they’d receive a huge amount of data from all over. There are a lot of people who would like to see AGI become a real thing, but not if it’s being developed by greedy and unscrupulous shitheads. As it stands now, I think the only ones who are actually doing it for the R&D and not as eye-candy to glitz away people’s money for aesthetically believable nonsense are a handful of start-up-likes with (not in a condescending way) kids who’ve yet to have their dreams and idealism trampled.
But what data would it be?
Part of the “gobble all the data” perspective is that you need a broad corpus to be meaningfully useful. Not many people are going to give a $892 billion market cap when your model is a genius about a handful of narrow subjects that you could get deep volunteer support on.
OTOH maybe there’s a sane business in narrow, siloed (cheap, efficient, and with more bounded expectations) AI products: the reinvention of the “expert system” with clear guardrails, the image generator that only does seaside background landscapes but can’t generate a cat to save its life, the LLM that’s a prettified version of a knowledgebase search and NOTHING MORE.
You’ve highlighted exactly why I also fundamentally disagree with the current trend of all things AI being for-profit. This should be 100% non-profit and driven purely by scientific goals, in which case using copyrighted data wouldn’t even be an issue in the first place… It’d be like literally giving someone access to a public library.
Edit: but to focus on this specific instance, where we have to deal with the here-and-now, I could see them receiving, say, 60-75% of what they have now, hassle-free. At the very least, and uniformly distributed. Again, AI development isn’t what irks most people, it’s calling plagiarism generators and search engine fuck-ups AI and selling them back to the people who generated the databases - or, worse, working toward replacing those people entirely with LLMs! - they used for those abhorrences.
Train the AI to be factually correct instead and sell it as an easy-to-use knowledge base? Aces! Train the AI to write better code and sell it as an on-board stackoverflow Jr.? Amazing! Even having it as a mini-assistant on your phone so that you have someone to pester you to get the damned laundry out of the washing machine before it starts to stink is a neat thing, but that would require less advertising and shoving down our throats, and more accepting the fact that you can still do that with five taps and a couple of alarm entries.
Edit 2: oh, and another thing which would require a buttload of humility, but would alleviate a lot of tension would be getting it to cite and link to its sources every time! Have it be transformative enough to give you the gist without shifting into plagiarism, then send you to the source for the details!
In Spain we trained an AI using a mix of public resources available for AI training and public resources (legislation, congress sessions, etc). And the AI turned out quite good. Obviously not top of the line, but very good overall.
It was a public project not a private company.
I have conflicting feelings about this whole thing. If you are selling the result of training like OpenAI does (and every other company), then I feel like it’s absolutely and clearly not fair use. It’s just theft with extra steps.
On the other hand, what about open source projects and individuals who aren’t selling or competing with the owners of the training material? I feel like that would be fair use.
What keeps me up at night is if training is never fair use, then the natural result is that AI becomes monopolized by big companies with deep pockets who can pay for an infinite amount of random content licensing, and then we are all forever at their mercy for this entire branch of technology.
The practical, socioeconomic, and ethical considerations are really complex, but all I ever see discussed are these hard-line binary stances that would only have awful corporate-empowering consequences, either because they can steal content freely or because they are the only ones that will have the resources to control the technology.
Oh no! How will I generate a picture of Sam Altman blowing himself now!?
Wdym? He removed his rib or something?
I was thinking more of a Sam 1 and Sam 2 type situation.
Photoshop, just like the rest of us.
If AI gets to use copyrighted material for free and makes a profit off of the results, that means piracy is 1000% Legal. Excuse me while I go and download a car!!
No, stop! You wouldn’t!
I would, and a house. I’m a menace!
DAMMIT ALL TO HELL!
…This must be DEI’s fault.
Thanks a lot, Obama
I also downloaded Obama as well. Now I’m a Super Menace.
I can tell you it’s not. I downloaded all the DEI documents and read them, sitting in my new house. :)
All you have to do is present credible evidence that these companies are distributing copyrighted works or a direct substitute for those copyrighted works. They have filters to specifically exclude matches though, so it doesn’t really happen.
That’s a good litmus test. If asking/paying artists to train your AI destroys your business model, maybe you’re the arsehole. ;)
This particular vein of “pro-copyright” thought continuously baffles me. Copyright has not, was not intended to, and does not currently, pay artists.
It’s totally valid to hate these AI companies. But it’s absolutely just industry propaganda to think that copyright was protecting your data on your behalf.
Copyright has not, was not intended to, and does not currently, pay artists.
You are correct, copyright is ownership, not income. I own the copyright for all my work (but not work for hire) and what I do with it is my discretion.
What is income, is the content I sell for the price acceptable to the buyer. Copyright (as originally conceived) is my protection so someone doesn’t take my work and use it to undermine my skillset. That’s one of the reasons why penalties for copyright infringement don’t need actual damages, and why Facebook (and other AI companies) are starting to sweat bullets and hire lawyers.
That said, as a creative who relied on artistic income and pays other creatives appropriately, modern copyright law is far, far overreaching and in need of major overhaul. Gatekeeping was never the intent of early copyright and can fuck right off; if I paid for it, they don’t get to say no.
Gatekeeping absolutely was the intention of copyright, not to provide artists with income.
By gatekeeping I mean the use of digital methods to verify or restrict use of purchased copyright material after a sale such as Digital rights management, encryption such as CSS/AACS/HDCP, or obfuscation.
The whole “you didn’t buy a copy, you bought a license” BS undermines what copyright was supposed to be IMO.
Copyright does not give the holder control over every “use”, especially something as vague as “using it to undermine their skillset”.
Copyright gives the rights holder a limited monopoly on three activities: to make and sell copies of their works, to create derivative works, and to perform or display their works publicly.
Not all uses involve making a copy, derivative, or performance.
Bingo. I was being more general in my response, but that is the more technical way of putting it.
modern copyright law is far, far overreaching and in need of major overhaul.
https://rufuspollock.com/papers/optimal_copyright_term.pdf
This research paper from Rufus Pollock in 2009 suggests that the optimal timeframe for copyright is 15 years. I’ve been referencing this for, well, 16 years now, a year longer than the optimum copyright range. If I recall correctly I first saw this referenced by Mike Masnick of techdirt.
Copyright has not, was not intended to, and does not currently, pay artists.
Wrong in all points.
Copyright has paid artists (though maybe not enough). Copyright was intended to do that (though maybe not that alone). Copyright does currently pay artists (maybe not in your country, I don’t know that).
Wrong in all points.
No, actually, I’m not at all. In fact, I’m totally right:
https://www.youtube.com/watch?v=mhBpI13dxkI
Copyright originated to protect printers, not artists: to create a monopoly around a means of distribution.
How many artists do you know? You must know a few. How many of them have received any income through copyright? I dare you to, in good faith, try and identify even one individual you personally know, engaged in creative work, who makes any meaningful amount of money through copyright.
I know quite a few people who rely on royalties for a good chunk of their income. That includes musicians, visual artists and film workers.
Saying it doesn’t exist seems very ignorant.
Cool. What artists?
Any experienced union film director, editor, DOP, writer, sound designer comes to mind (at least where I’m from)
Cool. Name one. A specific one that we can directly reference, where they themselves can make that claim. Not a secondary source, but a primary one. And specifically, not the production companies either, keeping in mind that the argument that I’m making is that copyright law was intended to protect those who control the means of production and the production system itself. Not the artists.
The artists I know, and I know several. They make their money the way almost all people make money, by contracting for their time and services, or through selling tickets and merchandise, and through patreon subscriptions: in other words, the way artists and creatives have always made their money. The “product” in the sense of their music or art being a product, is given away practically for free. In fact, actually for free in the case of the most successful artists I know personally. If they didn’t give this “product” of their creativity away for free, they would not be able to survive.
There is practically 0 revenue through copyright. Production companies like Universal make money through copyright. Copyright was built for, historically intended for, and is currently used for the protection of production systems: not artists.
You forgot to link a legitimate source.
A lecture from a professional free software developer and activist whose focus is the legal history and relevance of copyright isn’t a legitimate source? His website: https://questioncopyright.org/promise/index.html
The anti-intellectualism of the modern era baffles me.
Also, he’s on the fediverse!
@kfogel@kfogel.org
YouTube is not a legitimate source. The prof is fine but video only links are for the semi literate. It is frankly rude to post a minor comment and expect people to endure a video when a decent reader can absorb the main points from text in 20 seconds.
removed by mod
I know several artists living off of selling their copyrighted work, and no one in the history of the Internet has ever watched a 55 minute YouTube video someone linked to support their argument.
Cool. What artist?
Edit because I didn’t read the second half of your comment. If you are too up your own ass and anti-intellectual to educate yourself on this matter, maybe just don’t have an opinion.
Not only that, but their business model doesn’t hold up if they were required to provide their model weights for free because the material that went into it was “free”.
There’s also an argument that if the business was that reliant on free things to start with, then it shouldn’t be a business.
No-one would bat their eyes if the CEO of a real estate company was sobbing that it’s the end of the rental market, because the company is no longer allowed to get houses for free.
Businesses relying on free things. Logging, mining, ranching, and oil come to mind. Extracting free resources of the land belonging to the public, destroying those public lands and selling those resources back to the public at an exorbitant markup.
You misspelled capitalism.
Unregulated capitalism. That’s why people in dominant market positions want less regulation.
Entrenched companies often want more regulation to prevent startup competition. Pulling the ladder up behind them.
To be fair, they want more regulation on others, not on them. Especially if they’re doing shady things.
Extracting free resources of the land
Not to be contrarian, but there is a cost to extract those “free” resources; like labor, equipment, transportation, lobbying (AKA: bribes for the non-Americans), processing raw material into something useful, research and development, et cetera.
While true, they tend not to bear the costs of the environmental damage, at least when these activities are poorly regulated.
Was about to post the same thing
deleted by creator
If basic economics get you upset, then alright.
Bye o/
Agribusiness in shambles after draining the water table (it is still free)
The entire internet is built on free things.
Just saying.
Doesn’t mean that businesses should be allowed to be.
even the top phds can learn things off the amount of books that openai could easily purchase, assuming they can convince a judge that if the works aren’t pirated the “learning” is fair use. however, they’re all pirating and then regurgitating the works which wouldn’t really be legal even if a human did it.
also, they can’t really say how they need fair use and open standards and shit and in the next breath be begging trump to ban chinese models. the cool thing about allowing china to have global influence is that they will start to respect IP more… or the US can just copy their shit until they do.
imo that would have been the play against tik tok etc. just straight up we will not protect the IP of your company (as in technical IP not logo, etc.) until you do the same. even if it never happens, we could at least have a direct tik tok knock off and it could “compete” for american eyes rather than some blanket ban bullshit.
Interesting copyright question: if I own a copy of a book, can I feed it to a local AI installation for personal use?
Can a library train a local AI installation on everything it has and then allow use of that on their library computers? <— this one could breathe new life into libraries
First off, I’m far from being a lawyer, but it was covered in a couple classes.
According to law as I know it, question 1 yes if there is no encryption, and question 2 no.
In reality, if you keep it for personal use, artists don’t care. A library however, isn’t personal use and they have to jump through more hoops than a circus especially when it comes to digital media.
But you raise a great point! I’d love to see a law library train AI for in-house use and test the system!
I wonder if there’s some validity to what OpenAI is saying though (but I certainly don’t completely agree with them).
If the US makes it too costly to train AI models, then maybe China will relax any copyright laws so that Chinese AI models can be trained quickly and cheaply. This might result in China developing better AI models than the US.
Maybe the US should require AI companies to pay a large chunk of their profits to copyright holders. So copyright holders would be compensated, but an AI company would only have to pay if they generate profits.
Maybe someone more knowledgeable in this field will tell me I’m totally wrong.
No, it means that copyrights should not exist in the first place.
These fuckers are the first ones to send tons of lawyers whenever you republish or use any of their IP. Fuck these idiots.
Good. I hope this is what happens.
- LLM algorithms can be maintained and sold to corpos to scrape their own data so they can use them for in house tools, or re-sell them to their own clients.
- Open Source LLMs can be made available for end users to do the same with their own data, or scrape what’s available in the public domain for whatever they want so long as they don’t re-sell
- Altman can go fuck himself
So what Altman is saying here is that without the low hanging fruit of human generated training data, the AI race is over.
He’s either full of shit or this AI bubble is about to burst.
looks good
The only way this would be ok is if openai was actually open. make the entire damn thing free and open source, and most of the complaints will go away.
Good. Fuck AI
Corporations trying to profit by closing off vast tracts of human output are bumping into other corporations trying to mine it for profit.
TLDR: “we should be able to steal other people’s work, or we’ll go crying to daddy Trump. But DeepSeek shouldn’t be able to steal from the stuff we stole, because China and open source”