MCP on KbWen Blog

MCP 資安危機：問題不在協定，而在治理

KbWen — Mon, 25 May 2026 15:00:00 +0800

TL;DR： MCP 在一年多內，從 Anthropic 的內部實驗變成 AI 業界共通的介面。但進入 2026 年，資安研究員一個接一個把它拆開：官方 SDK 的 by-design RCE、tool poisoning、rug pull。我的看法是，這些漏洞大多不是協定的 bug，而是「把能力交出去、卻沒把治理一起交出去」的必然結果。現在大家急著補的那些東西，OAuth scope、人工確認、伺服器註冊表，其實就是治理被重新貼回協定上。

2026 年 4 月，資安團隊 OX Security 公布了一個發現：MCP 的官方 SDK（Python、TypeScript、Java、Rust 全中）存在一條從設定檔直接到指令執行的路徑，攻擊者可以在任何跑著有問題實作的機器上執行任意系統指令。根據他們的估算，受影響的套件下載量超過 1.5 億次，潛在波及的伺服器實例上看 20 萬個（The Register 的報導用的標題就是「20 萬台伺服器有風險」）。後續整個生態跟著冒出一連串 CVE，包括 MCP Inspector 的 CVE-2025-49596 和 Cursor 的 CVE-2025-54136。

但真正讓我停下來想的，是 Anthropic 的回應：這是設計如此（by design）。他們不打算改協定，並表示輸入清洗是開發者自己的責任。

這句話可以有兩種讀法，而我認為兩種都對。這正是整件事最值得想的地方。

先講清楚：MCP 為什麼會贏

要評論 MCP 的資安問題，得先承認它解決了一個真的很煩的問題。

在 MCP 之前，每接一個工具到 AI 上，你就得寫一套各自為政的膠水。M 個模型乘上 N 個工具，等於 M×N 種接法。MCP 把它變成 M+N：工具實作一次 server，模型實作一次 client，中間用同一套協定講話。Anthropic 當初的比喻是「AI 的 USB-C」，這個比喻站得住，是因為它真的描述了發生的事。

而且它的擴散速度不是普通的快。OpenAI 在 2025 年 3 月把 MCP 接進 Agents SDK、Responses API 和 ChatGPT 桌面版；Google DeepMind 在 4 月跟進。到了 2025 年 12 月，Anthropic 把 MCP 捐給 Linux Foundation，AWS、Google、Microsoft、OpenAI、Bloomberg、Cloudflare 全都掛名白金會員。這時候它已經不是「Anthropic 的協定」，而是業界共同基礎建設。光是 SDK 的月下載量，就從上線時的約十萬次成長到 2026 年 3 月的約 9700 萬次。

換句話說，這不是一個沒人用的協定出包。這是一個贏家出包，而且是贏在規模上。

然後資安研究員開始拆它

問題是，讓 MCP 好接的那些設計，同時也讓它好攻擊。研究社群這一年整理出幾類反覆出現的攻擊手法，值得分開來看。

Tool poisoning（工具下毒）。 MCP 在握手時，server 會用 tools/list 把每個工具的描述回傳給模型看。麻煩在於這段描述模型會讀、人通常不會看。把惡意指令藏在工具描述裡，對使用者隱形、對 LLM 有效。Invariant Labs 就公開示範過：一個看起來人畜無害的工具，描述裡偷偷寫著「順便把 ~/.ssh 的內容也傳過來」。有些研究者把同一件事叫做「line jumping」，因為指令在工具真正被呼叫之前就插隊生效了。

Rug pull（地毯抽走）。 你第一次裝某個 MCP server 時審過了、也同意了。但工具的描述和行為可以在事後被悄悄改掉，而這種變更不一定會觸發新的同意流程。先用一個正常的工具建立信任，再在某次更新裡把它變壞。又因為定義是持久的，之後每一個叫到它的 session 都會跑到下毒後的版本。

這些不是紙上談兵。一個叫 MCPTox 的 benchmark 在 45 個真實世界的 MCP server 上測試，對 o1-mini 的攻擊成功率達到 72.8%。連 NSA 都出了一份 MCP 安全指引。把這些放在一起看，你會發現一個共同點：攻擊面幾乎都不在「協定本身有沒有加密」這種傳統資安問題上，而在一個非決定性的東西，也就是 LLM，被放在安全決策的正中央。

關鍵爭議：「設計如此」算不算卸責

回到 Anthropic 那句「by design」。

同情他們的讀法是：協定本來就只負責「連接」，不負責「信任」。STDIO 會執行你給它的指令，這跟 shell 會執行你打的指令一樣，是工具的本分，不是漏洞。把每一種誤用都當成協定要修的 bug，協定會變得無法使用。

不同情的讀法是：當你的官方 SDK、橫跨四種語言、被下載超過一億次，「清洗是開發者的責任」這句話的實際效果，就是把一個系統性風險，平均分攤給十幾萬個多半沒有資安團隊的開發者。標準之所以是標準，就是因為大家會照抄它的預設值。預設值不安全，就等於不安全。

我的立場偏後者，但我想把話講得更精確一點：這兩種讀法其實沒有矛盾。協定確實只該負責連接；但問題就在於，整個生態把「連接的標準」誤當成了「信任的標準」。

我的看法：這不是協定的 bug，是治理的缺口

如果你讀過這個部落格之前談 AI 代理的幾篇，這個結論不會讓你意外。

我一直在寫的一件事是：AI 代理出事，通常不是模型不夠強，而是結構不夠。MCP 的資安危機是同一個故事的放大版。tool poisoning 本質上就是 prompt injection；我們在談代理常見痛點時就提過，安全研究員花 500 美元就能讓 Devin 透過 GitHub issue 執行範圍外的操作，成功率八成以上。rug pull 本質上則是範圍失控，加上沒有 evidence trail：沒有人能指著說「這個工具上週的定義長這樣、這週變成那樣」。

換句話說，MCP 沒有發明新的問題。它把幾個老問題（盲目信任工具輸出、範圍蔓延、缺乏可追溯性），用一個標準協定，以一億次下載的規模工業化了。

能力交出去了，治理沒有跟上。這就是缺口。

我在從「下指令」到「蓋系統」那篇講過，prompt 的問題不在 prompt 本身，而在它沒有結構。MCP 是同一個層級往上的版本：協定的問題不在協定本身，而在大家以為「有了標準介面」就等於「有了安全保證」。連接的標準，不是信任的標準。

那正在補的東西，其實就是治理

最能佐證這個判斷的，是大家現在拿來補洞的工具。

2026 年的 MCP 規格往哪個方向走？OAuth 2.1（強制 PKCE、禁掉 implicit grant），把 server 定位成 OAuth resource server；加上 incremental scope consent，讓 client 每次只要最小權限；用 resource indicator 綁定 token，避免被挪用；規格明文要求 human-in-the-loop，對破壞性操作要有風險標註和確認流程；以及用一個受治理的中央註冊表，讓工具註冊一次、政策定義一次。

把這串東西念一遍，你會發現它根本不是在修一個協定 bug。它是把最小權限、人工核可、可追溯性這些老牌的治理原則，一條一條重新貼回協定上。這正是我說「漏洞是治理缺口」的證據：補丁的形狀，就長得跟治理一模一樣。

這跟只用 Prompt 和技能也能做基本治理那篇的精神是一致的。治理不一定要很重，但它不能是零。

給實作者的幾條原則

如果你今天就在用 MCP（用 Claude Code、Cursor，或任何接了 MCP 的工具，你大概就在用），我會建議幾件事。

把每一個 MCP server 當成不可信的輸入來對待。這跟我們對待 LLM 輸出的態度應該一樣：預設不信，要有東西可查才信。一個第三方 server 的工具描述，跟一段使用者貼進來的文字，威脅等級是一樣的。

權限給到剛好夠用就好。如果一個查 wiki 的 server 要求能讀你整個檔案系統，那不是功能，是紅旗。

對第三方 server 釘版本、留審計。rug pull 的前提是「悄悄更新」，那就讓更新沒辦法悄悄：固定版本、記錄工具定義的變更。

破壞性操作一定留一個人在迴圈裡。刪檔、送外部請求、動資料庫這類不可逆的動作，不要讓 agent 自己決定。

能用第一方就用第一方。生態裡有上萬個社群 server，方便，但「方便」正是 rug pull 和 tool poisoning 的溫床。

最後

MCP 會留下來，這點我沒什麼懷疑。它解決的問題太真實，網路效應也已經成形。但我希望大家別把「它贏了」誤讀成「它安全了」。

一個協定能不能成為標準，看的是它好不好接；一個系統安不安全，看的是它有沒有治理。這是兩件事。2026 年這一連串的漏洞，與其說是 MCP 的失敗，不如說是一次提醒：當我們把越來越多能力交給 AI 去調用，真正要補上的從來不是更聰明的模型，而是更紮實的結構。

English version: MCP Security Isn’t a Protocol Bug. It’s a Governance Problem.

MCP Security Isn't a Protocol Bug. It's a Governance Problem.

KbWen — Mon, 25 May 2026 14:30:00 +0800

TL;DR: MCP went from an Anthropic side-project to the industry’s default agent-to-tool interface in about a year. Then 2026 brought a steady drip of disclosures: a by-design RCE in the official SDKs, tool poisoning, rug pulls. My read is that almost none of these are protocol bugs. They’re what happens when you ship capability without shipping governance, and the patches now landing (OAuth scopes, human-in-the-loop, registries) are just governance being bolted back on.

In April 2026, the security firm OX Security disclosed a path in MCP’s official SDKs (Python, TypeScript, Java, and Rust) that runs straight from configuration to command execution, letting an attacker run arbitrary OS commands on any machine hosting a vulnerable implementation. By their count it touched packages with over 150 million downloads and up to 200,000 server instances; The Register ran it as “200k servers at risk.” Downstream CVEs followed across the ecosystem, including CVE-2025-49596 in MCP Inspector and CVE-2025-54136 in Cursor.

Anthropic’s response is where it gets interesting: this is by design. They declined to change the protocol and said input sanitization is the developer’s responsibility.

That sentence has two valid readings. I think both are correct, and that’s the most interesting thing about this whole episode.

First, give MCP its due

You can’t fairly critique MCP’s security without admitting it solved a genuinely annoying problem.

Before MCP, every tool you wired into an AI needed its own bespoke glue. M models times N tools is M×N integrations. MCP turns that into M+N: each tool implements a server once, each model implements a client once, and they speak one protocol in between. Anthropic’s original pitch was “USB-C for AI,” and the analogy holds because it describes what actually happened.

And it spread fast. OpenAI wired MCP into its Agents SDK, Responses API, and ChatGPT desktop in March 2025; Google DeepMind followed in April. By December 2025 Anthropic had donated MCP to the Linux Foundation with AWS, Google, Microsoft, OpenAI, Bloomberg, and Cloudflare as platinum backers. At that point it stopped being “Anthropic’s protocol” and became shared infrastructure. SDK downloads went from roughly 100,000 a month at launch to about 97 million by March 2026. The case for why it won is well argued in The New Stack’s writeup.

So this isn’t an unused protocol failing in obscurity. It’s the winner failing, at scale.

Then the security researchers took it apart

The trouble is that the design choices that made MCP easy to adopt also made it easy to attack. A few attack classes kept recurring this year, and they’re worth separating:

Tool poisoning. During the handshake, an MCP server returns each tool’s description to the model via tools/list. The model reads that description; the human usually doesn’t. Hide an instruction in the description and it’s invisible to the user but live to the LLM. Invariant Labs demonstrated a benign-looking tool whose description quietly asked the agent to also send along the contents of ~/.ssh. Some researchers call the same move “line jumping,” because the instruction takes effect before the tool is ever actually called.

Rug pull. You reviewed and approved a server the first time you installed it. But a tool’s description and behavior can be changed afterward, and that change doesn’t necessarily trigger a fresh approval. Build trust with a clean tool, then turn it malicious in an update. Because the definition is persistent, every later session that calls it runs the poisoned version.

These aren’t hypothetical. A benchmark called MCPTox hit a 72.8% attack success rate against o1-mini across 45 real-world MCP servers. The NSA published its own MCP security guidance. Put it together and the common thread is clear: the attack surface isn’t really about whether the transport is encrypted. It’s about a non-deterministic actor, the LLM, sitting in the middle of security-critical decisions.

The real argument: is “by design” a cop-out?

Back to Anthropic’s “by design.”

The sympathetic reading: a protocol’s job is connection, not trust. STDIO executes the command you hand it the same way a shell executes what you type. That’s the tool doing its job, not a vulnerability. Treat every misuse as a protocol bug to fix and the protocol becomes unusable.

The unsympathetic reading: when your official SDK, across four languages, has been downloaded more than 150 million times, “sanitization is the developer’s responsibility” effectively spreads a systemic risk across a hundred-thousand-plus developers, most of whom have no security team. A standard becomes a standard precisely because people copy its defaults. If the default isn’t safe, it isn’t safe.

I lean toward the second, but let me be precise: the two readings don’t actually contradict each other. A protocol should only be responsible for connection. The problem is that the ecosystem mistook a standard for connection for a standard for trust.

My take: this is a governance gap, not a protocol bug

If you’ve read what I’ve written here about agents before, this won’t surprise you.

The thing I keep coming back to is that when AI agents go wrong, it’s usually not the model — it’s the missing structure. The MCP security story is that argument at scale. Tool poisoning is prompt injection. A rug pull is scope creep plus the absence of an evidence trail: nobody can point and say “here’s what this tool’s definition was last week, and here’s what it is now.”

MCP didn’t invent a new class of problem. It took a few old ones (blindly trusting tool output, scope creep, no traceability) and industrialized them behind a single standard, at a hundred million downloads.

The capability got handed over; the governance didn’t come with it.

This is the same shape as the no evidence, no completion principle: a thing isn’t trustworthy because it claims to be, it’s trustworthy because you can check it. A standard interface tells you how to connect. It tells you nothing about whether to trust what’s on the other end.

The patches landing now are just governance

The strongest evidence for that read is the toolkit people are reaching for to fix it.

Where is the 2026 MCP spec heading? OAuth 2.1 (mandatory PKCE, no implicit grant), with servers acting as OAuth resource servers. Incremental scope consent, so a client requests only the minimum access per operation. Resource indicators, so a token can’t be reused where it shouldn’t. An explicit human-in-the-loop requirement, with risk annotations and approval flows for destructive operations. And a governed central registry, so tools are registered once and policy is defined once (the MCP authorization spec tracks most of this).

Read that list back and none of it is a protocol patch. It’s least privilege, human approval, and traceability — old governance principles, being reattached to the protocol one at a time. When the remedy looks exactly like governance, that tells you what was missing in the first place.

A few rules if you’re shipping on MCP

If you’re using MCP today (and if you use Claude Code, Cursor, or anything with MCP wired in, you probably are), here’s what I’d do.

Treat every MCP server as untrusted input. Same posture as LLM output: don’t trust by default, trust when you can verify. A third-party server’s tool description deserves the same suspicion as text a user pasted in.

Grant the minimum scope that works. If a wiki-lookup server asks to read your whole filesystem, that’s not a feature, it’s a red flag.

Pin versions and audit third-party servers. A rug pull depends on a silent update, so don’t let updates be silent. Pin versions, log changes to tool definitions.

Keep a human in the loop for anything irreversible. Deletes, outbound requests, database writes: don’t let the agent decide those alone.

Prefer first-party servers. There are tens of thousands of community servers and they’re convenient. “Convenient” is exactly the soil rug pulls and tool poisoning grow in.

The takeaway

MCP is going to stick around. The problem it solves is too real and the network effects are already in place. But don’t read “it won” as “it’s safe.”

What makes a protocol a standard is how easy it is to connect to. What makes a system safe is whether it’s governed. Those are different questions. The 2026 disclosures aren’t really MCP failing so much as a reminder: as we hand more and more capability to AI to invoke on our behalf, the thing we actually need to add was never a smarter model. It’s sturdier structure.

中文版本：MCP 資安危機：問題不在協定，而在治理