

It’s a little deeper than that: a lot of advertising works on engagement-based heuristics. Today most people would call it “AI”, but it’s fundamentally just a reinforcement learning network that trains itself constantly on user interactions. It’s difficult-to-impossible to determine why input X is associated with output Y, but we can measure in aggregate how subtle changes propagate across engagement metrics.
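A toy illustration of that loop, as a minimal sketch only. It uses an epsilon-greedy multi-armed bandit, the simplest kind of reinforcement learner, swapped in here for the far larger networks real ad systems use. The class, its parameters and the click probabilities are all hypothetical simulation inputs. The learner picks an ad variant, sees click or no click, and updates on every interaction. Afterward, all you can inspect is aggregate state (impressions and click-through rates per variant), not a reason for any single choice.

```python
import random


class EpsilonGreedyAdPicker:
    """Hypothetical online learner: shows one of n ad variants and learns from each click."""

    def __init__(self, n_variants: int, epsilon: float = 0.1, seed: int = 0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows = [0] * n_variants   # impressions per variant
        self.clicks = [0] * n_variants  # clicks per variant

    def estimate(self, i: int) -> float:
        # Running click-through rate; 0.0 until the variant has been shown once.
        return self.clicks[i] / self.shows[i] if self.shows[i] else 0.0

    def choose(self) -> int:
        # Explore a random variant with probability epsilon, else exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.shows))
        return max(range(len(self.shows)), key=self.estimate)

    def update(self, i: int, clicked: bool) -> None:
        # Each interaction shifts the counts in place; no log of past updates is kept.
        self.shows[i] += 1
        self.clicks[i] += int(clicked)


def simulate(true_ctrs, epsilon=0.1, steps=20_000, seed=0):
    """Run the picker against simulated users whose click probabilities are made up."""
    picker = EpsilonGreedyAdPicker(len(true_ctrs), epsilon=epsilon, seed=seed)
    users = random.Random(seed + 1)
    for _ in range(steps):
        i = picker.choose()
        picker.update(i, users.random() < true_ctrs[i])
    return picker


def overall_ctr(picker) -> float:
    # The aggregate engagement metric you can actually compare between configurations.
    return sum(picker.clicks) / sum(picker.shows)


if __name__ == "__main__":
    ctrs = [0.02, 0.10, 0.03]  # hypothetical true click-through rates
    a = simulate(ctrs, epsilon=0.10)
    b = simulate(ctrs, epsilon=0.05)  # a "subtle change" to the learner
    print("A shows per variant:", a.shows, "CTR:", round(overall_ctr(a), 4))
    print("B shows per variant:", b.shows, "CTR:", round(overall_ctr(b), 4))
```

Comparing A and B is the aggregate, A/B-style measurement: you can see which configuration gets more clicks overall, but the counts say nothing about why any particular ad was shown to any particular user.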
It is absolutely truthful to say we don’t know how a modern reinforcement learning network got to the state it’s in today, because transactions on the network usually aren’t journaled, just periodically snapshotted for A/B testing.
To be clear, that’s not an excuse for undesirable heuristic behavior. Somebody somewhere made the choice to do this, and they should be liable for the output of their code.
“Open source” in ML is a really bad description for what it is. “Free binary with a bit of metadata” would be more accurate. The code used to create DeepSeek is not open source, nor are the training datasets. 99% of “open source” models are this way. The only interesting part of the open sourcing is the architecture used to run the models, as it lends a lot of insight into the training process and allows for derivatives via post-training.
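To make the “free binary with a bit of metadata” point concrete: many open-weight checkpoints on Hugging Face are distributed as .safetensors files. Per that format's documentation, such a file is an 8-byte little-endian header length, then a JSON header listing each tensor's name, dtype, shape and byte offsets (plus an optional "__metadata__" map of strings), then the raw tensor bytes. The minimal sketch below writes a hypothetical one-tensor toy file in that layout and reads the header back using only the standard library. The function names and the toy tensor are illustrative assumptions. Nothing in the file describes the training code or the training data.

```python
import json
import os
import struct
import tempfile


def read_safetensors_header(path: str) -> dict:
    """Return the JSON header of a .safetensors file without loading any tensor bytes."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # 8-byte little-endian header size
        return json.loads(f.read(header_len))


def summarize(header: dict) -> None:
    # Everything besides the raw numbers: tensor names, dtypes, shapes, free-form metadata.
    print("metadata:", header.get("__metadata__", {}))
    for name, info in header.items():
        if name != "__metadata__":
            print(f"{name}: dtype={info['dtype']} shape={info['shape']}")


def write_toy_file(path: str) -> None:
    # Hypothetical one-tensor checkpoint standing in for a real multi-gigabyte model.
    data = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)  # a 2x2 float32 tensor, 16 bytes
    header = {
        "__metadata__": {"format": "pt"},
        "layer.weight": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, len(data)]},
    }
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))
        f.write(header_bytes)
        f.write(data)


def demo() -> dict:
    # Write the toy file to a temporary directory, read its header back, and print it.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.safetensors")
        write_toy_file(path)
        header = read_safetensors_header(path)
    summarize(header)
    return header


if __name__ == "__main__":
    demo()
```

Pointed at a real shard, read_safetensors_header should list its tensors the same way: names, shapes and dtypes, but no trace of the pipeline or data that produced the numbers.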