Exactly. Even for non-AI queries, the difference between a cached response and a live one is an order of magnitude.
At this point, a lot of people just care about the ‘feel’ of anti-AI articles even if the substance is BS though.
And then people just feed whatever gets clicks and shares.
Google’s TPUs can’t handle LLMs lol. What do you mean “exactly”?
In fact, Gemini was trained on, and is served using, TPUs.
Google said its TPUs allow Gemini to run “significantly faster” than earlier, less-capable models.
Did you think Google’s only TPUs are the ones in Pixel phones, and not know that they also have server TPUs?