<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Hallucination on KbWen Blog</title>
    <link>https://www.kbwen.com/tags/hallucination/</link>
    <description>KbWen&#39;s personal tech blog: learning notes and hands-on lessons on Python, machine learning, deep learning, data engineering, and AI development.</description>
    <generator>Hugo</generator>
    <language>zh-tw</language>
    <image>
      <url>https://www.kbwen.com/images/og-default.png</url>
      <title>KbWen Blog</title>
      <link>https://www.kbwen.com/</link>
    </image>
    
    <lastBuildDate>Tue, 02 Jun 2026 15:45:00 +0800</lastBuildDate><atom:link href="https://www.kbwen.com/tags/hallucination/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Why Does AI Sound So Confident When It&#39;s Wrong?</title>
      <link>https://www.kbwen.com/why-ai-sounds-confident-when-wrong/</link>
      <pubDate>Tue, 02 Jun 2026 15:45:00 +0800</pubDate><dc:creator>KbWen</dc:creator>
      <guid>https://www.kbwen.com/why-ai-sounds-confident-when-wrong/</guid>
      <description>AI&#39;s most dangerous trait isn&#39;t that it&#39;s wrong sometimes. It&#39;s that its tone when wrong is identical to its tone when right. Here&#39;s my plain-language take on why, including why it won&#39;t just say &#39;I don&#39;t know&#39;.</description>
      <content:encoded><![CDATA[<blockquote>
<p><strong>TL;DR</strong>: A language model generates text by predicting the next most-plausible word, over and over. It&rsquo;s optimizing for <em>sounds right</em>, not <em>is right</em> — so a true answer and a made-up one are produced the exact same way, in the exact same confident tone. There&rsquo;s no separate step where it checks whether what it&rsquo;s saying is true, and (by default) no &ldquo;I don&rsquo;t know&rdquo; setting. So the confidence you hear tells you nothing about whether it&rsquo;s correct. This is a simplified picture and the models are improving, but for now: fluent does not mean true.</p>
</blockquote>
<p>The scary thing about AI getting something wrong isn&rsquo;t the error itself. It&rsquo;s that the tone is identical to when it&rsquo;s right. No hesitation, no hedging, no tell. You read a confident, well-organized paragraph, it sounds completely reasonable, and then you check it and the whole thing was made up.</p>
<p>I find this genuinely interesting, so I spent a while trying to understand why it happens. Here&rsquo;s my current take. It&rsquo;s simplified, probably not perfectly accurate, but it&rsquo;s the mental model that made it click for me.</p>
<h2 id="whats-actually-going-on-it-predicts-the-next-word-not-the-truth">What&rsquo;s actually going on: it predicts the next word, not the truth</h2>
<p>The core thing to know is that the model isn&rsquo;t deciding whether something is true. It&rsquo;s predicting text.</p>
<p><strong>Next-token prediction means: look at the words so far, and guess the most plausible next word.</strong> That&rsquo;s it. Then it looks at the now-slightly-longer text and guesses the next word again, one piece at a time, until a whole answer exists.</p>
<p>Think of your phone&rsquo;s autocomplete. Type &ldquo;I&rsquo;m running a bit&rdquo; and it offers &ldquo;late.&rdquo; It doesn&rsquo;t know your schedule; it knows that across billions of sentences, &ldquo;late&rdquo; is what usually follows. A language model is that idea scaled up enormously and made much better at it, but it&rsquo;s the same move underneath.</p>
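<p>To make the autocomplete analogy concrete, here is a toy sketch in Python. It is not how a real language model works internally (those are neural networks operating on tokens). It is a hypothetical word-pair counter, with a made-up three-sentence corpus and made-up function names, that always picks whichever word most often followed the previous one. Notice that nothing in it checks whether the result is true.</p>

```python
from collections import Counter, defaultdict

# Tiny made-up corpus, purely for illustration.
CORPUS = [
    "i am running a bit late",
    "i am running a bit late today",
    "i am running a bit behind",
]


def build_counts(sentences):
    """Count which word follows each word across the corpus."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def complete(prompt, counts, max_words=5):
    """Repeatedly append the most frequent follower of the last word."""
    words = prompt.split()
    for _ in range(max_words):
        followers = counts.get(words[-1])
        if not followers:
            break  # the last word was never followed by anything
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


counts = build_counts(CORPUS)
print(complete("i am running a bit", counts, max_words=1))
```

<p>With this corpus the script prints "i am running a bit late", because "late" followed "bit" more often than "behind" did. Swap in a different corpus and the same prompt gets a different, equally fluent continuation: the output reflects what was common, not what is correct.</p>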
<p>The important part: what it&rsquo;s optimizing for is <em>plausibility</em> (does this read like something a person would write?), not <em>accuracy</em>. The two usually line up, because true statements are common in its training data. But when they come apart, it&rsquo;ll happily pick the fluent-sounding option and hand you a sentence that flows perfectly and is also wrong. It isn&rsquo;t lying to you. It just has no step where it checks whether what it&rsquo;s saying is true before it says it, and nothing wired up to make it hold back when it&rsquo;s unsure.</p>
<h2 id="so-why-does-it-sound-so-sure-of-itself">So why does it sound so sure of itself?</h2>
<p>Because it learned to talk from confident writing, and confidence is just another pattern it copies.</p>
<p>Almost everything it trained on — articles, textbooks, documentation, answers — is written in a fairly assertive voice. People state things. So the model picked up that register along with everything else. Its default output <em>sounds</em> self-assured because the text it&rsquo;s imitating sounds self-assured. (The fine-tuning it gets afterward, where humans rate its answers, tends to push the same way: direct, helpful-sounding replies score better.)</p>
<p>And here&rsquo;s the catch: it has no separate dial for &ldquo;actually, I&rsquo;m not sure about this one.&rdquo; When a person doesn&rsquo;t know, they slow down, they hedge, they say &ldquo;I think?&rdquo; The model doesn&rsquo;t, by default. Whether it&rsquo;s repeating a rock-solid fact or inventing something on the spot, the output comes out equally smooth and equally certain. To it, &ldquo;I know this&rdquo; and &ldquo;I&rsquo;m guessing&rdquo; look almost the same on the way out.</p>
<p><img
  src="/images/figures/fig-confident-twins-en.png"
  alt="Two identical AI answer cards with identical full confidence bars, one tagged TRUE and one tagged MADE UP"
  loading="lazy"
  fetchpriority="auto"
  decoding="async" width="1040" height="470"
>
</p>
<p><em>Same confident tone — one&rsquo;s true, one&rsquo;s invented. The tone won&rsquo;t tell you which.</em></p>
<p>So confidence is just not usable as evidence that it&rsquo;s right. That&rsquo;s the single most useful thing to keep in mind, I think.</p>
<h2 id="why-doesnt-it-just-say-i-dont-know">Why doesn&rsquo;t it just say &ldquo;I don&rsquo;t know&rdquo;?</h2>
<p>This is the part I found most interesting, and it turns out there&rsquo;s a real answer beyond the mechanism: it was effectively <em>trained</em> to guess rather than abstain.</p>
<p>OpenAI researchers made this argument in a 2025 paper, <a href="https://openai.com/index/why-language-models-hallucinate/"><em>Why Language Models Hallucinate</em></a>. Their point, roughly: the standard ways we train and evaluate these models reward guessing over admitting uncertainty. On a typical benchmark, a confident guess that happens to be right earns points, while &ldquo;I don&rsquo;t know&rdquo; earns nothing — same as a wrong answer. It&rsquo;s like a multiple-choice exam where blanks score zero: if you&rsquo;re unsure, guessing is the better strategy. Do that across enough training, and the model learns the same lesson a test-taking student does — always put something down.</p>
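<p>The exam analogy can be put into numbers. The sketch below is my own illustration, not code or figures from the paper: the function name, the 25% chance of being right, and the one-point penalty are all assumptions picked for the example.</p>

```python
def expected_score(p_correct, reward_right=1.0, penalty_wrong=0.0):
    """Average score for guessing when the guess is right with probability p_correct."""
    return p_correct * reward_right - (1.0 - p_correct) * penalty_wrong


ABSTAIN_SCORE = 0.0  # "I don't know" earns nothing under either scheme here

# Binary grading: a wrong answer costs nothing, so any guess beats abstaining.
binary = expected_score(0.25)

# Hypothetical alternative: a wrong answer costs one point.
penalized = expected_score(0.25, penalty_wrong=1.0)

print(binary, penalized)
```

<p>Under binary grading the guess averages 0.25 against 0 for abstaining, so guessing wins. With the assumed penalty the guess averages -0.5, so abstaining wins. A model tuned against the first kind of scoreboard has no reason to ever leave a blank.</p>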
<p>So the &ldquo;always answer, never abstain&rdquo; behavior isn&rsquo;t a random quirk. It&rsquo;s closer to a habit we accidentally trained in by grading the wrong thing. (I&rsquo;ve written before about how <a href="/benchmark-saturation-is-a-verification-problem/">our benchmarks can end up rewarding the wrong thing</a>, and this is a pretty clean example of it.) The encouraging flip side, a point the same paper makes, is that this is fixable: change the scoring to give credit for a well-placed &ldquo;I don&rsquo;t know,&rdquo; and you&rsquo;d expect less confident nonsense. Some newer models are already being nudged this way.</p>
<p>I don&rsquo;t want to oversell it, though. The same paper is honest that you won&rsquo;t get this to zero — it argues some baseline rate of error is baked into how these models are trained at all, clean data or not. So the realistic goal isn&rsquo;t &ldquo;no more mistakes,&rdquo; it&rsquo;s &ldquo;fewer confident ones, and more that arrive flagged as uncertain.&rdquo;</p>
<h2 id="when-is-ai-most-likely-to-make-things-up">When is AI most likely to make things up?</h2>
<p>It confabulates most when it has the least to go on — obscure, very recent, or hyper-specific things.</p>
<p>Some niche tool&rsquo;s exact flag, what happened last week, what a particular book says on a particular page — the model&rsquo;s &ldquo;data&rdquo; on these is thin. But it still can&rsquo;t <em>not</em> answer (see the section above), so the next-word machine runs anyway and produces a complete, fluent-looking response by filling the gaps with whatever fits the pattern. The less it actually has, the more it&rsquo;s improvising. And it improvises just as smoothly.</p>
<p>A decent rule of thumb: the more obscure, specific, or precision-dependent your question, the higher you should turn your skepticism. The wrong answers it gives in those spots tend to be the most polished ones.</p>
<h2 id="how-do-you-actually-work-with-something-like-this">How do you actually work with something like this?</h2>
<p>You don&rsquo;t have to distrust everything — you just separate two things in your head: <em>fluent</em> and <em>correct</em>.</p>
<p>Fluent and helpful is great for drafts, brainstorming, rephrasing, getting unstuck. I take all of that at face value. But for anything load-bearing (a name, a number, a date, a claim I&rsquo;m about to repeat or act on), I don&rsquo;t take its word for it. I check. This is basically the same instinct as <a href="/no-evidence-no-completion-verification-principle/">&ldquo;no evidence, no completion&rdquo;</a>: a confident-sounding output isn&rsquo;t proof of anything until you&rsquo;ve seen the evidence. It&rsquo;s also the habit I lean on hardest in my <a href="/how-i-use-chatgpt-claude-gemini/">day-to-day setup across ChatGPT, Claude, and Gemini</a>.</p>
<p>The mental model that works for me is treating it like a fast, widely-read, very articulate friend who occasionally bluffs with a completely straight face. You&rsquo;ll listen to them, you&rsquo;ll get a lot of value from them, and on the stuff that matters you&rsquo;ll quietly double-check. That&rsquo;s about the right distance.</p>
<h2 id="one-caveat-this-is-a-snapshot">One caveat: this is a snapshot</h2>
<p>I should be honest that all of the above is simplified, and there&rsquo;s plenty I&rsquo;ve left out (and don&rsquo;t fully understand). It&rsquo;s also a moving target. People are actively working on getting models to express calibrated uncertainty, to say &ldquo;I&rsquo;m not sure,&rdquo; to cite and verify before answering. It&rsquo;s plausible that in a couple of years &ldquo;AI bluffs with total confidence&rdquo; stops being such a reliable complaint, and this post ages out.</p>
<p>But at least for now: next time it answers you in that perfectly assured tone, it&rsquo;s worth quietly adding a footnote in your head — <em>sounding right isn&rsquo;t the same as being right.</em></p>
<p><em>Prefer to read this in Chinese? See <a href="/why-ai-sounds-so-confident-when-its-wrong/">為什麼 AI 唬爛的時候，口氣跟講真話一模一樣？</a></em></p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Why Does AI Sound Exactly the Same When It&#39;s Bluffing as When It&#39;s Telling the Truth?</title>
      <link>https://www.kbwen.com/why-ai-sounds-so-confident-when-its-wrong/</link>
      <pubDate>Tue, 02 Jun 2026 15:00:00 +0800</pubDate><dc:creator>KbWen</dc:creator>
      <guid>https://www.kbwen.com/why-ai-sounds-so-confident-when-its-wrong/</guid>
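      <description>What makes AI so good at fooling people isn&#39;t that it gets things wrong. It&#39;s that its tone when it&#39;s wrong is exactly the same as when it&#39;s right. A plain-language look, from the angle of &#39;it just keeps guessing the next smoothest word&#39;, at why sounding sure isn&#39;t the same as knowing.</description>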
      <content:encoded><![CDATA[<blockquote>
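<p><strong>TL;DR</strong>: Here&rsquo;s roughly how I understand it. What AI does, from start to finish, is &ldquo;look at the words so far and guess the next smoothest word.&rdquo; It&rsquo;s optimizing for &ldquo;does this flow, does this sound like something a person would say,&rdquo; not &ldquo;is this correct.&rdquo; So getting it right and getting it wrong take the same effort and come out in the same tone, because to the model they are the same activity. It has no built-in &ldquo;actually, I don&rsquo;t know&rdquo; button; its default is to finish the sentence as gracefully as it can. How sure it sounds and whether it actually knows are two separate things. (This is a simplified account, and the models keep improving, so take it with a grain of salt.)</p>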
</blockquote>
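<p>You&rsquo;ve probably been fooled by it too. You ask AI something, it answers in a well-organized, confident way, it all seems reasonable, and then you check and the whole thing was made up. What&rsquo;s infuriating isn&rsquo;t that it was wrong. It&rsquo;s that it was wrong so naturally, without the slightest sign of guilt.</p>
<p>I&rsquo;ve always found this pretty interesting. How does it manage to do that? I eventually worked out part of it, so here&rsquo;s my own understanding, which may not be entirely right.</p>
<h2 id="它根本沒在分對跟錯">It isn&rsquo;t sorting &ldquo;right&rdquo; from &ldquo;wrong&rdquo; at all</h2>
<p>The core point first: it isn&rsquo;t actually judging whether anything is true or false.</p>
<p>Think of it as a massively upgraded phone keyboard. You type &ldquo;Today I&rsquo;m really&rdquo; and the keyboard suggests &ldquo;happy,&rdquo; &ldquo;tired,&rdquo; or &ldquo;busy,&rdquo; right? How does it know to suggest those? Because in the huge pile of sentences it has seen, those are the words that most smoothly follow that phrase. It doesn&rsquo;t understand how your day is going; it only knows which word, once attached, sounds most like something a person would say.</p>
<p>Put plainly, AI is that same thing scaled up enormously. All it ever does is look at the string of words so far, guess &ldquo;what is the next smoothest word,&rdquo; emit it, then look at the now-longer string and guess again, building a whole passage one word at a time. (If you&rsquo;re curious what a &ldquo;word&rdquo; looks like from its point of view, it&rsquo;s something called a token, which I cover in <a href="/what-is-token-in-llm/">Token 是什麼？LLM 為何只讀 Token？</a> (&ldquo;What Is a Token? Why Do LLMs Only Read Tokens?&rdquo;). You can skip that and still follow everything here.)</p>
<p>The key point: what it&rsquo;s chasing throughout is &ldquo;smooth,&rdquo; as in &ldquo;does this sound right,&rdquo; not &ldquo;is this correct.&rdquo; The two often happen to agree, since a sentence that flows is usually also true, but they aren&rsquo;t the same thing. When they diverge, it picks &ldquo;smooth&rdquo; without hesitation and hands you a sentence that flows well and is wrong. It isn&rsquo;t deliberately deceiving you. It simply lacks a step for &ldquo;verify first, then decide whether to say this,&rdquo; so nothing vets the sentence before it goes out.</p>
<h2 id="那很有自信的口氣是哪來的">So where does the &ldquo;very confident&rdquo; tone come from?</h2>
<p>This is the key to how it fools people.</p>
<p>The material it learned to talk from is a huge pile of human-written text. And when people write, the tone is usually fairly assertive: articles, tutorials, encyclopedias, answers, all stated flatly. It read all of that and picked up the &ldquo;confident voice&rdquo; along the way. So its default output sounds self-assured, because that is the voice it&rsquo;s imitating. (The later fine-tuning layer, which uses human ratings, pushes the same way: direct, assertive, helpful-seeming answers tend to score higher.)</p>
<p>The problem is that it never grew a separate button for &ldquo;hmm, I&rsquo;m actually not sure about this one.&rdquo; When a real person doesn&rsquo;t know, they stammer, say &ldquo;I&rsquo;m just guessing,&rdquo; or frown. It doesn&rsquo;t. When it doesn&rsquo;t know, it still hands you a passage with exactly the same smoothness and exactly the same certainty. For the model, the states &ldquo;I know&rdquo; and &ldquo;I don&rsquo;t know&rdquo; look almost identical in the output.</p>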
<p><img
  src="/images/figures/fig-confident-twins-zh.png"
  alt="Two identical AI answer cards, both with full confidence bars, one labeled 'true' and one labeled 'made up'"
  loading="lazy"
  fetchpriority="auto"
  decoding="async" width="1040" height="470"
>
</p>
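<p><em>The same confident tone, one true and one made up. From the tone alone you can&rsquo;t tell them apart.</em></p>
<p>So &ldquo;it sounds very confident&rdquo; really cannot be used as evidence that &ldquo;it is correct.&rdquo; Not even a little. That is probably the one sentence I think is most worth remembering.</p>
<h2 id="它什麼時候最會一本正經地胡說">When is it most likely to talk nonsense with a straight face?</h2>
<p>Follow the logic and you can guess: it makes things up most on topics where it has little real material to draw on.</p>
<p>Very obscure, very recent, or very detailed things (a parameter of some small tool few people have written about, something that happened only last week, what a certain book says on a certain page) are where its material is thin. But it can&rsquo;t leave the sentence unfinished: once the &ldquo;guess the next word&rdquo; mechanism starts, it still produces something that reads as complete. The thinner the material, the more it fills the blanks with imagination, and it fills them just as smoothly.</p>
<p>So here is a handy intuition: the more obscure, specific, or precision-demanding your question, the louder your internal alarm should be. These are the spots where it crashes most gracefully.</p>
<h2 id="我自己是怎麼跟它相處的">How I get along with it myself</h2>
<p>Once you know this, there&rsquo;s no need to be afraid of it. You just adjust your mindset.</p>
<p>My approach is simple: keep &ldquo;smooth&rdquo; and &ldquo;correct&rdquo; separate in my head. When it speaks fluently and pleasantly, I take that as is; it&rsquo;s great for drafts and inspiration. But anything with a name, a number, or a date attached, or anything I plan to actually use, I don&rsquo;t believe just because it said so. I check it myself. I also mentioned this habit in an earlier post, <a href="/daily-habits-using-ai-chatbots/">我每天開著三個 AI 的幾個小習慣</a> (&ldquo;A Few Small Habits from Keeping Three AIs Open Every Day&rdquo;); this post supplies the reason that habit is worth building.</p>
<p>In short: treat it like an articulate, well-read friend who occasionally bluffs you with a straight face. You&rsquo;ll listen to them, but you&rsquo;ll confirm the important things yourself, right? That&rsquo;s about the right distance.</p>
<h2 id="最後這只是我現在的理解">Finally, this is only my current understanding</h2>
<p>To be honest, the whole account above is simplified. Dig into the details and there is a lot I left out (and don&rsquo;t necessarily fully understand). It also keeps changing. People are already working on getting models to say &ldquo;I&rsquo;m not sure,&rdquo; to report how confident they are, and to verify before answering. In a year or two, the claim that &ldquo;AI loves to bluff confidently&rdquo; may itself be out of date, and this post can be retired.</p>
<p>But at least for now, the next time it answers you in that supremely certain tone, you can quietly add a note in your head: sounding smooth doesn&rsquo;t mean it knows.</p>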
]]></content:encoded>
    </item>
    
  </channel>
</rss>
