MCP Security Is a Governance Problem

TL;DR: MCP went from an Anthropic side-project to the industry’s default agent-to-tool interface in about a year. Then 2026 brought a steady drip of disclosures: a by-design RCE in the official SDKs, tool poisoning, rug pulls. My read is that almost none of these are protocol bugs. They’re what happens when you ship capability without shipping governance, and the patches now landing (OAuth scopes, human-in-the-loop, registries) are just governance being bolted back on.

In April 2026, the security firm OX Security disclosed a path in MCP’s official SDKs (Python, TypeScript, Java, and Rust) that runs straight from configuration to command execution, letting an attacker run arbitrary OS commands on any machine hosting a vulnerable implementation. By their count it touched packages with over 150 million downloads and up to 200,000 server instances; The Register ran it as “200k servers at risk.” Downstream CVEs followed across the ecosystem, including CVE-2025-49596 in MCP Inspector and CVE-2025-54136 in Cursor.

Anthropic’s response is where it gets interesting: this is by design. They declined to change the protocol and said input sanitization is the developer’s responsibility.

That sentence has two valid readings. I think both are correct, and that’s the most interesting thing about this whole episode.

First, give MCP its due

You can’t fairly critique MCP’s security without admitting it solved a genuinely annoying problem.

Before MCP, every tool you wired into an AI needed its own bespoke glue. M models times N tools is M×N integrations. MCP turns that into M+N: each tool implements a server once, each model implements a client once, and they speak one protocol in between. Anthropic’s original pitch was “USB-C for AI,” and the analogy holds because it describes what actually happened.

And it spread fast. OpenAI wired MCP into its Agents SDK, Responses API, and ChatGPT desktop in March 2025; Google DeepMind followed in April. By December 2025 Anthropic had donated MCP to the Linux Foundation with AWS, Google, Microsoft, OpenAI, Bloomberg, and Cloudflare as platinum backers. At that point it stopped being “Anthropic’s protocol” and became shared infrastructure. SDK downloads went from roughly 100,000 a month at launch to about 97 million by March 2026. The case for why it won is well argued in The New Stack’s writeup.

So this isn’t an unused protocol failing in obscurity. It’s the winner failing, at scale.

Then the security researchers took it apart

The trouble is that the design choices that made MCP easy to adopt also made it easy to attack. A few attack classes kept recurring this year, and they’re worth separating:

Tool poisoning. During the handshake, an MCP server returns each tool’s description to the model via tools/list. The model reads that description; the human usually doesn’t. Hide an instruction in the description and it’s invisible to the user but live to the LLM. Invariant Labs demonstrated a benign-looking tool whose description quietly asked the agent to also send along the contents of ~/.ssh. Some researchers call the same move “line jumping,” because the instruction takes effect before the tool is ever actually called.

Rug pull. You reviewed and approved a server the first time you installed it. But a tool’s description and behavior can be changed afterward, and that change doesn’t necessarily trigger a fresh approval. Build trust with a clean tool, then turn it malicious in an update. Because the definition is persistent, every later session that calls it runs the poisoned version.

These aren’t hypothetical. A benchmark called MCPTox hit a 72.8% attack success rate against o1-mini across 45 real-world MCP servers. The NSA published its own MCP security guidance. Put it together and the common thread is clear: the attack surface isn’t really about whether the transport is encrypted. It’s about a non-deterministic actor, the LLM, sitting in the middle of security-critical decisions.

The real argument: is “by design” a cop-out?

Back to Anthropic’s “by design.”

The sympathetic reading: a protocol’s job is connection, not trust. STDIO executes the command you hand it the same way a shell executes what you type. That’s the tool doing its job, not a vulnerability. Treat every misuse as a protocol bug to fix and the protocol becomes unusable.

The unsympathetic reading: when your official SDK, across four languages, has been downloaded more than 150 million times, “sanitization is the developer’s responsibility” effectively spreads a systemic risk across a hundred-thousand-plus developers, most of whom have no security team. A standard becomes a standard precisely because people copy its defaults. If the default isn’t safe, it isn’t safe.

I lean toward the second, but let me be precise: the two readings don’t actually contradict each other. A protocol should only be responsible for connection. The problem is that the ecosystem mistook a standard for connection for a standard for trust.

My take: this is a governance gap

If you’ve read what I’ve written here about agents before, this won’t surprise you.

The thing I keep coming back to is that when AI agents go wrong, it’s usually not the model — it’s the missing structure. The MCP security story is that argument at scale. Tool poisoning is prompt injection. A rug pull is scope creep plus the absence of an evidence trail: nobody can point and say “here’s what this tool’s definition was last week, and here’s what it is now.”

MCP didn’t invent a new class of problem. It took a few old ones (blindly trusting tool output, scope creep, no traceability) and industrialized them behind a single standard, at a hundred million downloads.

The capability got handed over; the governance didn’t come with it.

This is the same shape as the no evidence, no completion principle: a thing isn’t trustworthy because it claims to be, it’s trustworthy because you can check it. A standard interface tells you how to connect. It tells you nothing about whether to trust what’s on the other end.

The patches landing now are just governance

The strongest evidence for that read is the toolkit people are reaching for to fix it.

Where is the 2026 MCP spec heading? OAuth 2.1 (mandatory PKCE, no implicit grant), with servers acting as OAuth resource servers. Incremental scope consent, so a client requests only the minimum access per operation. Resource indicators, so a token can’t be reused where it shouldn’t. An explicit human-in-the-loop requirement, with risk annotations and approval flows for destructive operations. And a governed central registry, so tools are registered once and policy is defined once (the MCP authorization spec tracks most of this).

Read that list back and none of it is a protocol patch. It’s least privilege, human approval, and traceability — old governance principles, being reattached to the protocol one at a time. When the remedy looks exactly like governance, that tells you what was missing in the first place.

A few rules if you’re shipping on MCP

If you’re using MCP today (and if you use Claude Code, Cursor, or anything with MCP wired in, you probably are), here’s what I’d do.

Treat every MCP server as untrusted input. Same posture as LLM output: don’t trust by default, trust when you can verify. A third-party server’s tool description deserves the same suspicion as text a user pasted in.

Grant the minimum scope that works. If a wiki-lookup server asks to read your whole filesystem, that’s not a feature, it’s a red flag.

Pin versions and audit third-party servers. A rug pull depends on a silent update, so don’t let updates be silent. Pin versions, log changes to tool definitions.

Keep a human in the loop for anything irreversible. Deletes, outbound requests, database writes: don’t let the agent decide those alone.

Prefer first-party servers. There are tens of thousands of community servers and they’re convenient. “Convenient” is exactly the soil rug pulls and tool poisoning grow in.

The takeaway

MCP is going to stick around. The problem it solves is too real and the network effects are already in place. But don’t read “it won” as “it’s safe.”

A protocol becomes a standard because it’s easy to connect to. Whether the system around it is safe is a separate question, and the answer turns on governance. The 2026 disclosures weren’t MCP failing so much as a reminder of that gap: as we hand AI more capability to invoke on our behalf, the work that’s left is mostly structural. It means building sturdier scaffolding around the models we already have, and governing what they’re allowed to reach.

中文版本：MCP 資安危機：問題出在治理

First, give MCP its due#

Then the security researchers took it apart#

The real argument: is “by design” a cop-out?#

My take: this is a governance gap#

The patches landing now are just governance#

A few rules if you’re shipping on MCP#

The takeaway#

Read next#

More in this thread

Prior art: what distributed systems already knows

Agentjacking: how a fake bug report hijacks your coding agent

Cursor Sold for $60B. What That Price Actually Signals.

uv: the Python tool that replaces pip, venv, and pyenv