Why Does AI Sound So Confident When It's Wrong?

TL;DR: A language model generates text by predicting the most plausible next word, over and over. It’s optimizing for “sounds right,” not “is right,” so a true answer and a made-up one are produced the exact same way, in the exact same confident tone. There’s no separate step where it checks whether what it’s saying is true, and (by default) no “I don’t know” setting. So the confidence you hear tells you nothing about whether it’s correct. For now: fluent does not mean true.

The scary thing about AI getting something wrong is that the tone stays identical to when it’s right. No hesitation, no hedging, no tell. You read a confident, well-organized paragraph, it sounds completely reasonable, and then you check it and the whole thing was made up.

I find this genuinely interesting, so I spent a while trying to understand why it happens. Here’s my current take: the mental model that made it click for me.

What’s actually going on: it predicts the next word, not the truth

The core thing to know is that the model is predicting text, one word at a time.

The technical name is next-token prediction (a token is a word or a piece of a word): look at the text so far, and guess the most plausible next word. That’s it. Then it looks at the now-slightly-longer text and guesses again, one piece at a time, until a whole answer exists.

Think of your phone’s autocomplete. Type “I’m running a bit” and it offers “late.” It doesn’t know your schedule; it knows that across billions of sentences, “late” is what usually follows. A language model is that idea scaled up enormously and made much better at it, but it’s the same move underneath.
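Here’s a minimal sketch of that guess-append-repeat loop, to make it concrete. Everything in it is an assumption for illustration: the four-sentence corpus is made up, and `count_followers`, `predict_next`, and `complete` are hypothetical helper names. A real model looks at far more context than the previous word and learns from vastly more text, but the loop has roughly this shape.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus standing in for "billions of sentences".
SENTENCES = [
    "i'm running a bit late",
    "i'm running a bit late",
    "i'm running a bit behind",
    "sorry i'm late",
]


def count_followers(sentences):
    """For every word, count which words come right after it."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            followers[current][nxt] += 1
    return followers


def predict_next(followers, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    options = followers.get(word)
    if not options:
        return None
    return options.most_common(1)[0][0]


def complete(followers, prompt, max_new_words=5):
    """Append the most plausible next word, then repeat on the longer text."""
    words = prompt.split()
    for _ in range(max_new_words):
        nxt = predict_next(followers, words[-1])
        if nxt is None:  # nothing ever followed this word in the corpus
            break
        words.append(nxt)
    return " ".join(words)


FOLLOWERS = count_followers(SENTENCES)
print(predict_next(FOLLOWERS, "bit"))            # late
print(complete(FOLLOWERS, "i'm running a bit"))  # i'm running a bit late
```

Notice that nothing in there knows anything about your schedule. It only knows what usually comes next.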

The important part: what it’s optimizing for is plausibility. Does this read like something a person would write? Accuracy is a separate question. Those two usually line up, because true statements are common in its training data. But when they come apart, it’ll happily pick the fluent-sounding option and hand you a sentence that flows perfectly and is also wrong. It isn’t lying to you. It just has no step where it stops to check whether what it’s saying is true before it says it, and nothing wired up to make it hold back when it’s unsure. (Side note: that same one-word-at-a-time process is also why it hands you a different answer each time you re-ask. That’s a separate thing from being wrong, and I get into it in “Why Does AI Give a Different Answer Every Time You Ask?”)

So why does it sound so sure of itself?

Because it learned to talk from confident writing, and confidence is just another pattern it copies.

Almost everything it trained on — articles, textbooks, documentation, answers — is written in a fairly assertive voice. People state things. So the model picked up that register along with everything else. Its default output sounds self-assured because the text it’s imitating sounds self-assured. (The fine-tuning it gets afterward, where humans rate its answers, tends to push the same way: direct, helpful-sounding replies score better.)

And here’s the catch: it has no separate dial for “actually, I’m not sure about this one.” When a person doesn’t know, they slow down, they hedge, they say “I think?” The model doesn’t, by default. Whether it’s repeating a rock-solid fact or inventing something on the spot, the output comes out equally smooth and equally certain. To it, “I know this” and “I’m guessing” look almost the same on the way out.
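A toy way to picture this, with heavy caveats: the scores below are invented, `softmax` and `pick` are hypothetical helpers rather than any real model’s API, and I’m not claiming a real model’s internals look like this. The sketch only shows that once a single word is picked and printed, whatever spread of probabilities sat behind it is gone from the text.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores.values())
    exps = {tok: math.exp(s - peak) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def pick(scores):
    """Return the top-scoring token and its probability."""
    probs = softmax(scores)
    token = max(probs, key=probs.get)
    return token, probs[token]


# Made-up scores: one lopsided case, one near three-way tie.
KNOWN = {"Paris": 9.0, "Lyon": 2.0, "Nice": 1.0}
GUESS = {"1987": 1.2, "1986": 1.1, "1989": 1.0}

known_token, known_prob = pick(KNOWN)
guess_token, guess_prob = pick(GUESS)

# The printed sentences have the same shape; the probabilities are dropped.
print(f"The answer is {known_token}.")
print(f"The answer is {guess_token}.")
print(f"hidden from the reader: {known_prob:.2f} vs {guess_prob:.2f}")
```

In this toy, the first pick has a probability above 0.99 and the second about 0.37, yet both sentences read identically.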

[Figure: two identical AI answer cards with identical full confidence bars, one tagged TRUE and one tagged MADE UP. Caption: Same confident tone. One’s true, one’s invented. The tone won’t tell you which.]

Why doesn’t it just say “I don’t know”?

This is the part I found most interesting, and it turns out there’s a real answer beyond the mechanism: it was effectively trained to guess rather than abstain.

OpenAI researchers made this argument in a 2025 paper, “Why Language Models Hallucinate.” Their point, roughly: the standard ways we train and evaluate these models reward guessing over admitting uncertainty. On a typical benchmark, a confident guess that happens to be right earns points, while “I don’t know” earns nothing, the same as a wrong answer. It’s like a multiple-choice exam where blanks score zero: if you’re unsure, guessing is the better strategy. Do that across enough training, and the model learns the same lesson a test-taking student does: always put something down.
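The exam arithmetic is easy to check. This is my own sketch of the right-or-nothing grading described above, not code from the paper. The function name `expected_score_binary` and the probabilities are assumptions for illustration.

```python
def expected_score_binary(p_correct, abstain):
    """Expected points when right = 1 and both wrong and "I don't know" = 0."""
    if abstain:
        return 0.0
    # A wrong guess costs nothing, so only the chance of being right counts.
    return p_correct * 1.0


for p in (0.9, 0.5, 0.1):
    guess = expected_score_binary(p, abstain=False)
    skip = expected_score_binary(p, abstain=True)
    print(f"p(correct)={p:.1f}  guess={guess:.2f}  abstain={skip:.2f}")
```

Even at a 10% chance of being right, guessing scores better on average than abstaining, so under this grading there is never a reason to say “I don’t know.”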

So the “always answer, never abstain” behavior looks more like a habit we accidentally trained in by grading the wrong thing. (I’ve written before about how our benchmarks can end up rewarding the wrong thing — this is a pretty clean example of it.) The encouraging flip side, which the same paper makes, is that this is fixable: change the scoring to give credit for a well-placed “I don’t know,” and you’d expect less confident nonsense. Some newer models are already being nudged this way.
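To see how a scoring change could shift the incentive, here’s a sketch of one possible rule: 1 point for a right answer, 0 for “I don’t know,” and a penalty for a wrong answer. The penalty values and function names are my assumptions, picked for illustration. I’m not presenting this as the exact scheme any benchmark uses.

```python
def expected_score_if_answering(p_correct, wrong_penalty):
    """Expected points for answering: +1 if right, -wrong_penalty if wrong."""
    return p_correct - (1.0 - p_correct) * wrong_penalty


def should_answer(p_correct, wrong_penalty):
    """Answer only when it beats the 0 points earned by abstaining."""
    return expected_score_if_answering(p_correct, wrong_penalty) > 0.0


def break_even_confidence(wrong_penalty):
    """Confidence at which answering and abstaining score the same."""
    return wrong_penalty / (1.0 + wrong_penalty)


for penalty in (0.0, 1.0, 3.0):
    threshold = break_even_confidence(penalty)
    print(f"penalty={penalty:.0f}  answer only above p={threshold:.2f}")
```

With no penalty the threshold is 0, so any guess is worth making. With a penalty of 3, answering only pays off above 75% confidence, and below that “I don’t know” is the better move.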

I don’t want to oversell it. The same paper is honest that you won’t get this to zero — some baseline error rate is baked in, clean data or not.

When is AI most likely to make things up?

It confabulates most when it has the least to go on — obscure, very recent, or hyper-specific things.

Some niche tool’s exact flag, what happened last week, what a particular book says on a particular page — the model’s “data” on these is thin. But it still can’t not answer (see the section above), so the next-word machine runs anyway and produces a complete, fluent-looking response by filling the gaps with whatever fits the pattern. The less it actually has, the more it’s improvising.
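Here’s a deliberately crude analogy in code. Real models don’t work by lookup tables like this. The `FACTS` list and the `answer_capital` helper are invented for illustration. The point is only the shape of the failure: when the specific data is missing, the function falls back on the overall pattern and still produces a complete sentence.

```python
from collections import Counter

# Made-up "training data": (country, capital) mentions. Bhutan never appears.
FACTS = [
    ("france", "paris"), ("france", "paris"), ("france", "paris"),
    ("italy", "rome"), ("italy", "rome"),
    ("japan", "tokyo"),
]


def answer_capital(country, facts):
    """Always return a fluent sentence, plus a note on what it was based on."""
    specific = Counter(cap for c, cap in facts if c == country)
    if specific:
        capital = specific.most_common(1)[0][0]
        basis = "specific data"
    else:
        # Nothing on this country: use whatever capital is most common overall.
        overall = Counter(cap for _, cap in facts)
        capital = overall.most_common(1)[0][0]
        basis = "pattern only"
    # Note what is missing: there is no branch that declines to answer.
    sentence = f"The capital of {country.title()} is {capital.title()}."
    return sentence, basis


print(answer_capital("japan", FACTS))   # ('The capital of Japan is Tokyo.', 'specific data')
print(answer_capital("bhutan", FACTS))  # ('The capital of Bhutan is Paris.', 'pattern only')
```

Both sentences come out in the same form. Only the second value, which the reader of the sentence never sees, tells you one of them was improvised.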

A decent rule of thumb: the more obscure, specific, or precision-dependent your question, the higher you should turn your skepticism. The wrong answers it gives in those spots come out just as polished as the right ones.

How do you actually work with something like this?

You don’t have to distrust everything — you just separate two things in your head: fluent and correct.

Fluent and helpful is great for drafts, brainstorming, rephrasing, getting unstuck. I take all of that at face value. But anything load-bearing (a name, a number, a date, a claim I’m about to repeat or act on), I check. This is basically the same instinct as “no evidence, no completion”: a confident-sounding output isn’t proof of anything until you’ve seen the evidence. It’s also the habit I lean on hardest in my day-to-day setup across ChatGPT, Claude, and Gemini.

The mental model that works for me is treating it like a fast, widely-read, very articulate friend who occasionally bluffs with a completely straight face. You’ll listen to them, you’ll get a lot of value from them, and on the stuff that matters you’ll quietly double-check.

One caveat: this is a snapshot

I should be honest that all of the above is simplified, and there’s plenty I’ve left out (and don’t fully understand). It’s also a moving target. People are actively working on getting models to express calibrated uncertainty, to say “I’m not sure,” to cite and verify before answering. It’s plausible that in a couple of years “AI bluffs with total confidence” stops being such a reliable complaint.

But at least for now, the assured tone is worth treating as decoration. It comes free with every answer, right or wrong, so it isn’t evidence either way.

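Prefer to read this in Chinese? See 為什麼 AI 唬爛的時候，口氣跟講真話一模一樣？ (“Why does AI sound exactly the same when it’s bluffing as when it’s telling the truth?”)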
