Hallucination on KbWen Blog

Why Does AI Sound So Confident When It's Wrong?

KbWen — Tue, 02 Jun 2026 15:45:00 +0800

TL;DR: A language model generates text by predicting the most plausible next word, over and over. It’s optimizing for “sounds right,” not “is right,” so a true answer and a made-up one are produced the exact same way, in the exact same confident tone. There’s no separate step where it checks whether what it’s saying is true, and (by default) no “I don’t know” setting. So the confidence you hear tells you nothing about whether it’s correct. This is a simplified picture and the models are improving, but for now: fluent does not mean true.

The scary thing about AI getting something wrong isn’t the error itself. It’s that the tone is identical to when it’s right. No hesitation, no hedging, no tell. You read a confident, well-organized paragraph, it sounds completely reasonable, and then you check it and the whole thing was made up.

I find this genuinely interesting, so I spent a while trying to understand why it happens. Here’s my current take. It’s simplified, probably not perfectly accurate, but it’s the mental model that made it click for me.

What’s actually going on: it predicts the next word, not the truth

The core thing to know is that the model isn’t deciding whether something is true. It’s predicting text.

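Next-token prediction means: look at the text so far and guess the most plausible next word. (Strictly, it predicts a token, which is often a whole word and sometimes a piece of one, but “word” is close enough here.) That’s it. Then it looks at the now-slightly-longer text and guesses the next word again, one piece at a time, until a whole answer exists.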
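To make the loop concrete, here’s a toy sketch. The NEXT_WORD_PROBS table is made up, and it only looks at the previous word; a real model conditions on the whole context, works with tokens, and has billions of learned parameters instead of a lookup table. What the sketch keeps is the shape of the loop: pick the most plausible next word, append it, repeat.

```python
# A toy "model": made-up probabilities for the next word, given only the previous word.
# Real models look at the whole context and use tokens, not whole words.
NEXT_WORD_PROBS = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "weather": 0.2},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"down": 0.8, "<end>": 0.2},
    "down": {"<end>": 1.0},
}

def generate(max_words: int = 10) -> list[str]:
    """Repeatedly append the most plausible next word (greedy decoding)."""
    words = ["<start>"]
    for _ in range(max_words):
        options = NEXT_WORD_PROBS.get(words[-1], {"<end>": 1.0})
        next_word = max(options, key=options.get)  # most plausible, not "most true"
        if next_word == "<end>":
            break
        words.append(next_word)
    return words[1:]  # drop the start marker

print(" ".join(generate()))
```

Always taking the single top option is called greedy decoding; real systems often sample from the probabilities instead, but either way the choice is driven by plausibility.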

Think of your phone’s autocomplete. Type “I’m running a bit” and it offers “late.” It doesn’t know your schedule; it knows that across billions of sentences, “late” is what usually follows. A language model is that idea scaled up enormously and made much better at it, but it’s the same move underneath.
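Here’s the autocomplete idea as a toy: count which word follows “running a bit” in a tiny made-up corpus and suggest the most common one. The corpus and the helper name next_word_counts are my own inventions for illustration; real keyboards and models are far more sophisticated, but “count what usually comes next” is the core intuition.

```python
from collections import Counter

# A tiny, made-up corpus standing in for "billions of sentences".
corpus = [
    "i'm running a bit late",
    "sorry, running a bit late today",
    "the train is running a bit late",
    "i'm running a bit behind",
    "feeling a bit tired",
]

def next_word_counts(corpus: list[str], context: str) -> Counter:
    """Count which word immediately follows `context` across the corpus."""
    counts = Counter()
    ctx = context.split()
    n = len(ctx)
    for sentence in corpus:
        words = sentence.split()
        # Only positions where the context is followed by at least one more word.
        for i in range(len(words) - n):
            if words[i:i + n] == ctx:
                counts[words[i + n]] += 1
    return counts

counts = next_word_counts(corpus, "running a bit")
print(counts.most_common())       # every observed continuation, most frequent first
print(counts.most_common(1)[0][0])  # the single suggestion
```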

The important part: what it’s optimizing for is plausibility (does this read like something a person would write?), not accuracy. The two usually line up, because true statements are common in its training data. But when they come apart, it’ll happily pick the fluent-sounding option and hand you a sentence that flows perfectly and is also wrong. It isn’t lying to you. It just has no step where it checks whether what it’s saying is true before it says it, and nothing wired up to make it hold back when it’s unsure.
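A toy version of plausibility and truth coming apart, assuming a made-up set of snippets where a common misconception (Sydney) shows up more often than the correct fact (Canberra is Australia’s capital). A real model has seen plenty of correct text about this and gets it right; the sketch only shows that frequency, not truth, decides the output, and that nothing in the function checks facts.

```python
from collections import Counter

# Made-up training snippets: here the misconception happens to be more common.
snippets = (
    ["the capital of australia is sydney"] * 3
    + ["the capital of australia is canberra"] * 2
)

def most_plausible_completion(snippets: list[str], prompt: str) -> str:
    """Return the continuation seen most often after `prompt`.
    Nothing here checks whether that continuation is true."""
    continuations = Counter(
        s[len(prompt):].strip() for s in snippets if s.startswith(prompt)
    )
    return continuations.most_common(1)[0][0]

print(most_plausible_completion(snippets, "the capital of australia is"))
```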

So why does it sound so sure of itself?

Because it learned to talk from confident writing, and confidence is just another pattern it copies.

Almost everything it trained on — articles, textbooks, documentation, answers — is written in a fairly assertive voice. People state things. So the model picked up that register along with everything else. Its default output sounds self-assured because the text it’s imitating sounds self-assured. (The fine-tuning it gets afterward, where humans rate its answers, tends to push the same way: direct, helpful-sounding replies score better.)

And here’s the catch: it has no separate dial for “actually, I’m not sure about this one.” When a person doesn’t know, they slow down, they hedge, they say “I think?” The model doesn’t, by default. Whether it’s repeating a rock-solid fact or inventing something on the spot, the output comes out equally smooth and equally certain. To it, “I know this” and “I’m guessing” look almost the same on the way out.
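One way to picture the missing dial, as a sketch under simplifying assumptions: suppose the model has internal probabilities over candidate answers (the numbers below are invented). Picking the top one and turning it into text throws the probability away, so a 97% answer and a 31% answer come out in the same sentence. Real models’ probabilities are messier than this and aren’t a clean measure of “knowing,” so treat it as a picture, not the actual mechanism.

```python
def answer_text(candidates: dict[str, float]) -> str:
    """Pick the most probable candidate and phrase it as an answer."""
    best = max(candidates, key=candidates.get)
    # The probability is discarded here: the sentence template is identical
    # whether the top option had 97% or 31% of the probability mass.
    return f"The answer is {best}."

# Invented internal probabilities for two questions.
well_known = {"Paris": 0.97, "Lyon": 0.02, "Marseille": 0.01}
obscure = {"1987": 0.31, "1989": 0.30, "1991": 0.29, "1993": 0.10}

for dist in (well_known, obscure):
    best = max(dist, key=dist.get)
    print(f"{answer_text(dist)}  (hidden top probability: {dist[best]:.2f})")
```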

Same confident tone — one’s true, one’s invented. The tone won’t tell you which.

So confidence is just not usable as evidence that it’s right. That’s the single most useful thing to keep in mind, I think.

Why doesn’t it just say “I don’t know”?

This is the part I found most interesting, and it turns out there’s a real answer beyond the mechanism: it was effectively trained to guess rather than abstain.

OpenAI researchers made this argument in a 2025 paper, “Why Language Models Hallucinate.” Their point, roughly: the standard ways we train and evaluate these models reward guessing over admitting uncertainty. On a typical benchmark, a confident guess that happens to be right earns points, while “I don’t know” earns nothing, the same as a wrong answer. It’s like a multiple-choice exam where blanks score zero: if you’re unsure, guessing is the better strategy. Repeat that across enough training and evaluation, and the model learns the same lesson a test-taking student does: always put something down.
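The exam arithmetic is easy to check. This sketch assumes plain binary grading (1 point for a right answer, 0 for a wrong one, 0 for “I don’t know”) and a guess that is right with probability p; the function name is mine, not from the paper.

```python
def expected_score_binary(p_correct: float, abstain: bool) -> float:
    """Expected points under binary grading:
    right answer = 1, wrong answer = 0, "I don't know" = 0."""
    if abstain:
        return 0.0
    return p_correct * 1.0 + (1 - p_correct) * 0.0

for p in (0.9, 0.5, 0.1):
    guess = expected_score_binary(p, abstain=False)
    idk = expected_score_binary(p, abstain=True)
    print(f"chance right = {p}: guessing scores {guess:.2f}, abstaining scores {idk:.2f}")
```

For any chance of being right above zero, guessing has a positive expected score and abstaining has zero, so under this grading guessing never loses.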

So the “always answer, never abstain” behavior isn’t a random quirk. It’s closer to a habit we accidentally trained in by grading the wrong thing. (I’ve written before about how our benchmarks can end up rewarding the wrong thing; this is a pretty clean example of it.) The encouraging flip side, which the same paper makes, is that this is fixable: change the scoring so a confident wrong answer costs more than an honest “I don’t know,” and you’d expect less confident nonsense. Some newer models are already being nudged this way.
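And here’s a sketch of the fix, assuming a scoring rule where a right answer earns 1, “I don’t know” earns 0, and a wrong answer costs t / (1 − t) points for some confidence target t. I believe the paper proposes a rule of roughly this shape, but treat the exact formula and the t = 0.75 value as my illustration. With that penalty, answering only pays off when the chance of being right is above t.

```python
def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected points when right = +1, wrong = -wrong_penalty, "I don't know" = 0."""
    return p_correct * 1.0 - (1 - p_correct) * wrong_penalty

def should_answer(p_correct: float, t: float) -> bool:
    """With a wrong-answer penalty of t / (1 - t), answering has a positive
    expected score (mathematically) exactly when p_correct > t."""
    penalty = t / (1 - t)
    return expected_score(p_correct, penalty) > 0.0

t = 0.75  # hypothetical confidence target: a wrong answer costs 0.75 / 0.25 = 3 points
for p in (0.9, 0.75, 0.5):
    print(f"chance right = {p}: answer? {should_answer(p, t)}")
```

At exactly p = t, answering and abstaining both have an expected score of zero, which is why the check uses a strict “greater than.”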

I don’t want to oversell it, though. The same paper is honest that you won’t get this to zero: it argues that pretraining by itself produces some baseline rate of errors, even if the training data were perfectly clean. So the realistic goal isn’t “no more mistakes,” it’s “fewer confident ones, and more that arrive flagged as uncertain.”

When is AI most likely to make things up?

It confabulates most when it has the least to go on — obscure, very recent, or hyper-specific things.

Some niche tool’s exact flag, what happened last week, what a particular book says on a particular page — the model’s “data” on these is thin. But it still can’t not answer (see the section above), so the next-word machine runs anyway and produces a complete, fluent-looking response by filling the gaps with whatever fits the pattern. The less it actually has, the more it’s improvising. And it improvises just as smoothly.
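A toy sketch of “can’t not answer”: the counts below are invented, and suggest_flag plus the tool name some-niche-tool are hypothetical. When a tool is well covered, the answer comes from what was actually seen; when it isn’t, the function falls back to whatever flag is most common overall, because it has no “I don’t know” branch. Real models don’t literally work like this lookup, but the two answers coming back looking equally plausible is the point.

```python
from collections import Counter

# Invented counts: how often each flag appeared alongside each tool in "training".
seen_flags = {
    "git": Counter({"--amend": 900, "--force": 700}),
}
# Flags seen across all tools; used as the fallback when a tool has no data.
all_flags = Counter({"--verbose": 5000, "--help": 4000, "--force": 3000})

def suggest_flag(tool: str) -> str:
    """Return a flag for `tool`. There is deliberately no "I don't know" branch."""
    flags = seen_flags.get(tool)
    if flags:
        # Well-covered tool: answer from what was actually seen.
        return flags.most_common(1)[0][0]
    # Thin data: fill the gap with the most common pattern overall.
    return all_flags.most_common(1)[0][0]

print(suggest_flag("git"))              # grounded in (toy) data
print(suggest_flag("some-niche-tool"))  # improvised, but looks just as plausible
```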

A decent rule of thumb: the more obscure, specific, or precision-dependent your question, the higher you should turn your skepticism. The wrong answers it gives in those spots tend to be the most polished ones.

How do you actually work with something like this?

You don’t have to distrust everything — you just separate two things in your head: fluent and correct.

Fluent and helpful is great for drafts, brainstorming, rephrasing, getting unstuck. I take all of that at face value. But anything load-bearing (a name, a number, a date, a claim I’m about to repeat or act on), I don’t take its word for. I check. This is basically the same instinct as “no evidence, no completion”: a confident-sounding output isn’t proof of anything until you’ve seen the evidence. It’s also the habit I lean on hardest in my day-to-day setup across ChatGPT, Claude, and Gemini.
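If you want a crude helper for spotting load-bearing details, here’s a sketch. The patterns (any digit, a month name, two capitalized words in a row) are my own rough heuristics, not any standard, and they will both over- and under-flag; the idea is only to surface sentences worth checking by hand, not to do the checking.

```python
import re

# Rough, hypothetical heuristics for "load-bearing" details worth checking by hand.
CHECK_PATTERNS = {
    "number": re.compile(r"\d"),
    "date": re.compile(
        r"\b(?:January|February|March|April|May|June|July|August|"
        r"September|October|November|December)\b"
    ),
    # Two capitalized words in a row, e.g. a person's name (many false positives).
    "name": re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"),
}

def sentences_to_verify(text: str) -> list[tuple[str, list[str]]]:
    """Split text into sentences and return those containing a digit, a month
    name, or something that looks like a proper name, along with the reasons."""
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        reasons = [label for label, pattern in CHECK_PATTERNS.items() if pattern.search(sentence)]
        if reasons:
            flagged.append((sentence, reasons))
    return flagged

draft = (
    "This draft reads well. "
    "The library was released in March 2019 by Jane Doe. "
    "Try rephrasing the intro."
)
for sentence, reasons in sentences_to_verify(draft):
    print(f"CHECK ({', '.join(reasons)}): {sentence}")
```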

The mental model that works for me is treating it like a fast, widely read, very articulate friend who occasionally bluffs with a completely straight face. You’ll listen to them, you’ll get a lot of value from them, and on the stuff that matters you’ll quietly double-check. That’s about the right distance.

One caveat: this is a snapshot

I should be honest that all of the above is simplified, and there’s plenty I’ve left out (and don’t fully understand). It’s also a moving target. People are actively working on getting models to express calibrated uncertainty, to say “I’m not sure,” to cite and verify before answering. It’s plausible that in a couple of years “AI bluffs with total confidence” stops being such a reliable complaint, and this post ages out.

But at least for now: next time it answers you in that perfectly assured tone, it’s worth quietly adding a footnote in your head — sounding right isn’t the same as being right.

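Read this post in Chinese: 為什麼 AI 唬爛的時候，口氣跟講真話一模一樣？ (“Why Does AI Sound Exactly the Same When It’s Bluffing as When It’s Telling the Truth?”)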

Why Does AI Sound Exactly the Same When It’s Bluffing as When It’s Telling the Truth?

KbWen — Tue, 02 Jun 2026 15:00:00 +0800

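TL;DR: Here’s roughly how I understand it: all AI does, from start to finish, is “look at the words so far and guess the smoothest next word.” It optimizes for “does this flow, does this sound like proper language,” not “is this right.” So a right answer and a wrong one take the same effort and come out in the same tone, because to it they’re the same thing. It has no built-in “actually, I don’t know” button; by default it just continues the text neatly. Sounding sure and actually knowing are two different things. (This is a simplified picture, and the models keep improving, so take it for what it’s worth.)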

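You’ve probably been bluffed by it too. You ask AI something, it answers in an orderly, confident tone, it seems perfectly reasonable, and then you look it up and the whole paragraph was made up. What’s infuriating isn’t that it was wrong; it’s that it was wrong so naturally, without the slightest trace of guilt.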

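I’ve always found this pretty interesting. How on earth can it do that? I think I’ve since figured out a bit of it, so here’s my own understanding, which may not be right.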

It isn’t telling “right” from “wrong” at all

First, the most central point: it isn’t actually judging what’s true or false.

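You can think of it as a souped-up version of your phone’s keyboard. Type “Today I’m really” and the keyboard pops up “happy,” “tired,” “busy” for you to pick from, right? How does it know to suggest those? Because in the huge pile of sentences it has seen, those are the words that most naturally follow “Today I’m really.” It doesn’t understand how your day is going; it just knows which word makes the continuation sound most like something a person would say.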
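The keyboard’s suggestion bar, as a toy sketch: the follower counts below are invented for illustration, and the hypothetical keyboard_suggestions function just returns the k most frequent words. Nothing in it knows how your day actually went.

```python
from collections import Counter

# Invented counts of words that followed "today I'm really" in some corpus.
followers = Counter({"tired": 120, "happy": 95, "busy": 80, "hungry": 30})

def keyboard_suggestions(counts: Counter, k: int = 3) -> list[str]:
    """Return the k most frequent next words, like a phone keyboard's suggestion bar.
    Only frequencies are involved; nothing here knows how your day went."""
    return [word for word, _ in counts.most_common(k)]

print(keyboard_suggestions(followers))
```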

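Put plainly, AI is this same thing scaled up enormously. All it does, start to finish, is look at the string of words so far, guess “what’s the smoothest next word,” output it, then look at the now-longer string and guess the next one, stitching a whole passage together one word at a time. (If you’re curious what a “word” actually looks like from its side, it’s something called a token; I talk about it in “What Is a Token? Why Do LLMs Only Read Tokens?”. Skipping that won’t affect your understanding of this post at all.)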
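For the curious, here’s a toy illustration of text being split into pieces before the model sees it. The vocabulary is made up and the greedy longest-match rule is a simplification of my own; real tokenizers (BPE-style ones, for example) learn their pieces from data and can split the same text differently.

```python
# A made-up vocabulary of word pieces (not from any real tokenizer).
VOCAB = {"un", "believ", "able", "token", "iz", "ation", "s", " "}

def toy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match split into known pieces; characters that match
    nothing in the vocabulary become single-character pieces."""
    pieces, i = [], 0
    longest = max(len(v) for v in vocab)
    while i < len(text):
        # Try the longest possible piece first, shrinking until something fits.
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in vocab or size == 1:
                pieces.append(chunk)
                i += size
                break
    return pieces

print(toy_tokenize("unbelievable tokenization", VOCAB))
```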

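The key point: what the whole process chases is “smoothness,” “does it sound right,” not “is it correct.” The two often happen to line up, since smooth-sounding sentences are usually also correct, but they are not the same thing. Once they diverge, it will pick “smooth” without hesitation and hand you a sentence that flows nicely but is wrong. It isn’t deliberately lying to you; it simply lacks a “check first, then decide whether to say it this way” step, and nobody vets what it says before it says it.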

So where does the “very confident” tone come from?

This is the key to why its bluffs work.

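The material it learned to talk from is a huge pile of human-written text. And when people write, their tone is usually quite assertive: articles, tutorials, encyclopedias, answers, everyone states things flatly. It read all of that and picked up that “confident voice” along the way. So what it says by default sounds like it knows exactly what it’s talking about, because that is exactly what it’s imitating. (On top of that, the later layer of fine-tuning based on human ratings pushes in the same direction: direct, assertive, useful-feeling answers usually score higher.)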

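The problem is that it never grew a separate “hmm, I’m actually not sure about this one” button. When a real person doesn’t know, they stumble, they say “I’m guessing,” they frown. It doesn’t. When it doesn’t know, it still hands you a passage with exactly the same smoothness and exactly the same certainty. For it, the states “I know” and “I don’t know” look almost identical on output.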

The same confident tone, one true and one made up: from the tone alone, you can’t tell which.

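So “it sounds very confident” really can’t be used as evidence that “it’s right.” Not at all. That’s probably the one line I think is most worth remembering.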

When is it most likely to talk nonsense with a straight face?

Follow this logic and you can guess the answer: it makes things up most on topics where it actually has little material to go on.

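Very obscure, very recent, very detailed things (the parameters of some small tool hardly anyone has written about, something that happened just last week, what a particular page of some book says): its material on these is thin. But it can’t not respond. Once the “guess the next word” mechanism starts, it still produces something that reads as complete. The thinner the material, the more it fills the gaps with imagination, and it fills them just as smoothly.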

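So here’s a pretty useful instinct: the more obscure, specific, and precision-dependent your question, the louder your internal alarm should be. Its crashes in these spots tend to be the most graceful-looking.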

How I get along with it myself

Once you know this, there’s no need to be afraid of it; you just adjust your mindset a little.

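My approach is simple: keep “smooth” and “correct” as two separate things in my head. If it says something smoothly and nicely, I take it as-is; it’s great as a draft or a source of ideas. But anything with a specific name, a number, a date, or anything I plan to actually use, I don’t believe just because it said so. I check it myself. I mentioned this habit in my earlier post, “A Few Small Habits from Keeping Three AIs Open Every Day”; this post fills in the reason behind it, since what it describes is essentially why that habit is worth building.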

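Bottom line: treat it like an eloquent, well-informed friend who occasionally bluffs you with a straight face. You’ll listen to them, but on the things that matter you’ll double-check yourself, right? That’s about the right distance.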

Finally: this is just my current understanding

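To be honest, the whole explanation above is simplified; if you really dig into the details, there’s a lot I haven’t covered (and don’t necessarily fully understand). And this stuff keeps changing. People are already working on getting models to say “I’m not so sure,” to state how confident they are, and to verify before answering. Maybe in a year or two the complaint “AI loves to bluff confidently” will itself be outdated, and this post can be retired.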

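But at least for now, the next time it answers you in that utterly certain tone, you can quietly add a line in your head: sounding smooth doesn’t mean it knows.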