> The LLM doesn’t know how to say ‘I don’t know’ because the makers can’t have that. What’s the point? The model doesn’t care a bit about accuracy or human safety. It is just playing a crossword game, or whatever better metaphor you prefer.
It's not that "the makers can't have that"; it's that it's statistically very uncommon for people to write down, in the same place, <some question> followed by <I don't know>.
It happens, but *usually* if we don't know something, we don't write that. We write answers to things we think we know.
So models trained on all the things we've written are only seeing the output from when we thought we had answers. Every time we *don't* have an answer, we go back to the drawing board until we do, then write *that*.
They weren't being trained on the intermediate failures until very recently.
(and even those are mostly simulated)