Governance

Editor's picks

Why AI Agents Fail in Production

Why AI agents fail: most failures trace to governance gaps (phase gates, state handoffs, capability boundaries) more than to the model itself, and the two need completely different fixes. How to tell them apart.

Governance

AI Systems

When an AI says "done," ask it to show you

An AI's 'done' sounds the same whether the work happened or not. The fix is one small habit: don't take its word for it; ask it to show you a result you can check yourself, sized to the task.

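AI Systems

When the AI says "done," how do you confirm it really finished?

When an AI reports "done," the message reads almost the same whether it truly finished, did half and worked around the rest, or misunderstood the task entirely. Rather than judging whether that sentence can be trusted, build one reflex: show me something I can check myself, such as a commit, test output, or a diff.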

AI Systems

Claude Fable 5: First Public Mythos-Class Model, One Day In

Anthropic released Claude Fable 5 on June 9 — the first publicly available Mythos-class model, one tier above Opus. What it is, what it costs, the June 22 deadline on the subscription window, and what changed when I pointed three real projects at it for a day.

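AI Systems

What is Claude Fable 5? The first public Mythos-class model, plus my first-day impressions

Anthropic released Claude Fable 5, the first public Mythos-class model, on June 9. This post covers how it relates to Opus 4.8, its pricing, and the free subscription window that ends June 22, plus notes from a first day of running three projects on it: its adherence to governance process is real, and so is its appetite for tokens.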

AI Systems

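How to make an AI agent follow the process: gates only keep records, they don't block

The gates in a process don't actually stop the AI agent at run time; what they demand is a receipt that can't be altered. What really has teeth isn't the gate but a record that can't be erased or denied.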

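AI Systems

Benchmark saturation is really a verification problem

GSM8k at 99%, MMLU just over 90, HLE already in the 40s by mid-2026. Each new "harder benchmark" looks like it solves the problem, but the structural issue hasn't changed: we have never verified what the model learned, only measured whether it has seen the material.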

Agent

AI Systems

When an AI says "done," ask it to show you

AI Systems

How to make an AI agent follow the process: gates only keep records, they don't block

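AI Systems: technical, for engineers

The MCP security crisis: the problem is governance

MCP (Model Context Protocol) became an AI industry standard within a year, yet 2026 brought a string of security disclosures: RCE, tool poisoning, rug pulls. This post gathers views from several experts and offers my own: the layer that really needs fixing is governance.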

Agentic OS

AI Systems

When an AI says "done," ask it to show you

AI Systems

When the AI says "done," how do you confirm it really finished?

AI Systems

How to make an AI agent follow the process: gates only keep records, they don't block

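AI Systems

The truth about token costs: tier them, but not too finely

Treat tokens as a design variable, not an end-of-month bill: coarse-grained tasks have no cost ceiling, but slicing too finely doesn't save more either. Caching makes over-decomposition more expensive, so the point is to find the right granularity.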

AI Systems

Token Economics of AI Agent Governance

Governance has a bounded, knowable token cost; ungoverned agent work tends not to. And task granularity has its own price: caching can make over-decomposition cost more than it looks.

AI Systems

No evidence, no completion

No evidence, no completion: the one rule that closes most AI agent failures. A task isn't done until it produces a verifiable artifact (commit SHA, test output).

Claude Code

AI Systems

When an AI says "done," ask it to show you

AI Systems

When the AI says "done," how do you confirm it really finished?

AI Systems

Claude Fable 5: First Public Mythos-Class Model, One Day In

AI Systems

What is Claude Fable 5? The first public Mythos-class model, plus my first-day impressions

AI Systems

No evidence, no completion

AI Systems

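Work Log: a cross-session memory mechanism

Does your AI agent lose its memory with every new conversation? Work Log records task progress and decisions in a single markdown file so Claude Code can pick up across sessions without having the background re-explained each time.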

Architecture

AI Systems

How to make an AI agent follow the process: gates only keep records, they don't block

AI Systems: technical, for engineers

The MCP security crisis: the problem is governance

AI Systems

MCP Security Is a Governance Problem

MCP became the industry's default agent-to-tool interface in barely a year, then 2026 brought a wave of RCE, tool poisoning, and rug-pull disclosures. Weighing the expert debate, my take: the real exposure is a governance gap that better protocol design alone won't close.

AI Systems

Prior art: what distributed systems already knows

AI agent governance maps onto distributed systems patterns: audit logs, delivery acknowledgment, idempotency, least privilege. The prior art already exists.

LLM

AI Systems

Claude Fable 5: First Public Mythos-Class Model, One Day In

AI Systems

What is Claude Fable 5? The first public Mythos-class model, plus my first-day impressions

AI Systems

Benchmark saturation is really a verification problem

AI Systems

LLM Benchmark Saturation Is a Verification Problem

GSM8k at 99%, MMLU in the 88-94% noise band, HLE already in the mid-40s by mid-2026. Each round of harder benchmarks looks like progress, but the field never solved the underlying problem: we measure correlation with a test distribution and call it capability.

Token Economics

AI Systems

Benchmarks

AI Systems

Benchmark saturation is really a verification problem

AI Systems

LLM Benchmark Saturation Is a Verification Problem

All posts (17)

When an AI says "done," ask it to show you
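When the AI says "done," how do you confirm it really finished?
Claude Fable 5: First Public Mythos-Class Model, One Day In
What is Claude Fable 5? The first public Mythos-class model, plus my first-day impressions
How to make an AI agent follow the process: gates only keep records, they don't block
Benchmark saturation is really a verification problem
LLM Benchmark Saturation Is a Verification Problem
The MCP security crisis: the problem is governance
MCP Security Is a Governance Problem
The truth about token costs: tier them, but not too finely
Token Economics of AI Agent Governance
No evidence, no completion
Work Log: a cross-session memory mechanism
Prior art: what distributed systems already knows
Basic governance with just prompts and skills
Why AI Agents Fail in Production
Common AI agent pain points and what we tried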