I’m rather curious to see how the EU’s privacy laws are going to handle this.
(Original article is from Fortune, but Yahoo Finance doesn’t have a paywall)
I’m not an AI expert, and I wouldn’t say it is too hard, but I believe removing a specific piece of data from a model is like trying to remove excess salt from a stew. You can add things to make the stew less salty but you can’t really remove the salt.
The alternative, which is a lot of effort but boo-hoo for big tech, is to throw out the model and start over without the data in question. These companies would do well to start with models built on public or royalty free data and then add more risky data on top of that (so you only have to rebake starting from the “public” version).
sounds like big tech shouldn’t have spent the last decade investing in a kitchen refit so that they could make stew really well but nothing else
Replace salt with poison or an allergenic substance and if fully holds. If a batch has been contaminated, then yes, you should try again.
But now that the cat is out of the bag, other companies are less willing to let something be scrap able due to how valuable it can be.
I think big tech knew this, that they can only build these models on unfiltered data before the AI craze.
If there’s something illegal in your dish, you throw it out. It’s not a question. I don’t care that you spent a lot of time and money on it. “I spent a lot of time preparing the circumstances leading to this crime” is not an excuse, neither is “if I have to face consequences for committing this crime, I might lose money”.
Perhaps long pig stew could serve as an apt comparison, lol
Fuck no.
It’s illegal to be gay in many places, should we throw out any AI that isn’t homophobic as shit?
It will probably be way shittier without all the private data they put in the first time too.
I work in this field a good bit, and you’re largely correct. That’s a great analogy of trying to remove salt from a stew. The only issue with that analogy is that that’s technically possible still by distilling the stew and recovering the salt. Even though it would destroy the stew.
At the point that pii data is in the model, it’s fully baked. It’d be like trying to get the eggs out of a baked cake. The chemical composition has changed into something else completely.
That’s how building a model works today. Like baking a cake.
I’m order to remove or even identify pii data in ML models or LLMs today, we’d need a whole new way of baking a cake that would keep the eggs separate from the cake until just before you tried to take a bite out of it. The tools today don’t allow you to do anything like that. They bake you a complete cake.