Posts

LLM Benchmark Saturation Is a Verification Problem

GSM8K at 99%, MMLU inside the 88-94% noise band, HLE already in the mid-40s by mid-2026. Each round of harder benchmarks looks like progress, but the field never solved the underlying problem: we measure correlation with a test distribution and call it capability.

Python

Python List Comprehensions: Read Them as For-Loops

A relaxed take on Python list comprehensions: translate them back into the equivalent for-loop, and check what's actually true about variable leaking and speed on Python 3.14.

Python · Easy Read

Python List Comprehensions: One Line in Place of a For-Loop

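A plain-language look at Python list comprehensions: translate them back into an ordinary for-loop, and test on Python 3.14 what actually happens with variable leaking and performance.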

A three-stage evolution diagram: a small four-line atomic skill on the left, a cluster of overlapping skills in the middle (pattern emerging), and a taller seventeen-line production skill on the right, connected by dashed timeline arrows

AI Systems

The Skill Your Annoyed Prompt Becomes

Your first Claude Code skill won't look like the polished examples in tutorials. It'll look like a prompt you've typed three times in a row, saved into a four-line markdown file. This post walks through that minimum shape, shows the three things that break, and compares it to a real seventeen-line production-grade skill from the framework I use daily.

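A three-tier evolution diagram: a 4-line atomic skill draft on the left, a cluster of atomic skills in the middle, and a mature 17-line production skill on the right, strung along a timeline by thin lines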

AI Systems

How to Write Your First Skill: Start from an Annoyed Prompt

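Your first skill won't look like the mature, production-grade forms in the books. It will look like "the same prompt you typed three times over." Starting there is much easier than learning backwards from a mature framework's skill.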

A cutaway view of two files side by side: a small 13-line dispatcher on the left, a heavier protocol file on the right, connected by a thin line

AI Systems

What a 13-Line Skill Leaves Out

I asked Claude to draft me a skill that calls OpenAI's Codex CLI. It came back as thirteen lines of markdown. The thirteen lines aren't the skill; they point to where the skill actually lives. That split between dispatcher and contract is what separates a skill from a prompt.

A cutaway comparison of two files: the 13-line /codex-cli skill on the left, and the heavier workflow it points to on the right

AI Systems

A 13-Line Skill: AI Drafted It, and I Only Understood It Afterward

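I asked an AI to write me a skill that calls Codex CLI from Claude Code, and it gave me 13 lines of markdown. Thirteen lines is small, but the real difference between a skill and a prompt isn't in those 13 lines. It's in the thing they point to.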

AI Systems · Technical, for Engineers

The MCP Security Crisis: The Problem Isn't the Protocol, It's Governance

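MCP (Model Context Protocol) became the AI industry's standard within a year, yet 2026 brought a string of security vulnerabilities: RCE, tool poisoning, rug pulls. This post gathers views from multiple experts and offers my own: this isn't a bug in the protocol, it's a gap in governance.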

AI Systems

MCP Security Isn't a Protocol Bug. It's a Governance Problem.

MCP became the industry's default agent-to-tool interface in barely a year, then 2026 brought a wave of RCE, tool poisoning, and rug-pull disclosures. Weighing the expert debate, my take: these aren't protocol bugs, they're a governance gap.

A SKILL.md file overlaid with an API contract diagram (inputs, outputs, scope), showing that the two actually have the same shape

AI Systems

Skill Boundary Design: From Capability to Contract

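How predictable a skill is depends largely on how clearly its boundaries are drawn. Treat it as a capability list and it wanders. Treat it as a contract (agreed inputs, outputs, and what it won't touch) and it behaves more like a well-designed API.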