Can we discuss how it’s possible that the paid model (GPT-4) got worse and the free one (GPT-3.5) got better? Is it because the free one is being trained on a larger pool of users, or what?
It’s because the research in question used a really small and unrepresentative dataset. I want to see these findings reproduced on a proper task collection.
True, checking whether a number is prime is a very narrow task for ChatGPT, but it’s in line with other reports of a progressive dumbing down.
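For a sense of how small that task really is, here’s a minimal sketch of scoring yes/no primality answers against ground truth. The prompt wording and the `get_model_answer` stub are hypothetical placeholders, not the actual harness from the study being discussed:

```python
def is_prime(n: int) -> bool:
    """Ground-truth primality check by trial division."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def get_model_answer(question: str) -> str:
    """Placeholder for a real model call; this stub always answers 'yes'."""
    return "yes"


def score(numbers: list[int]) -> float:
    """Fraction of questions where the model's yes/no matches the ground truth."""
    correct = 0
    for n in numbers:
        answer = get_model_answer(f"Is {n} a prime number? Answer yes or no.")
        predicted_prime = answer.strip().lower().startswith("yes")
        correct += predicted_prime == is_prime(n)
    return correct / len(numbers)


if __name__ == "__main__":
    # A stub that always says "yes" gets 3 of these 5 right (0.6).
    print(score([7, 15, 97, 221, 7919]))
```

The point of the sketch: on such a narrow task, even a trivial always-yes or always-no policy can score well or badly depending on how the test numbers are chosen, which is why results on it alone don’t say much about overall capability.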