TL;DR: A skill I asked Claude to draft came back as thirteen lines of markdown. Less than a function. The thirteen lines aren’t the skill — they’re a dispatcher pointing to a longer file where the contract actually lives. That split is what separates a skill from a prompt. And the part the model was reliably wrong about — the real interface of the external tool the skill talks to — is the part a human still has to verify.
I have three earlier posts on what an AI skill is. What Makes an AI Skill Different from a Prompt? puts it inside a stack. Skill Design as Interface Design frames it as API design. Skill 邊界設計 is the Chinese sibling on boundary discipline.
This one is different. I’m going to take a real skill apart.
The one I’m picking is /codex-cli. It’s a Claude Code slash command that lets me dispatch a task to OpenAI’s codex CLI from inside my Claude Code session: I type /codex-cli fix the typo in README, the skill takes it from there.
I picked it because it’s small enough to fit on one screen.
The thirteen lines
This is what Claude wrote when I asked for a first version:
# /codex-cli
Execute the canonical workflow: `.agent/workflows/codex-cli.md`
## Execution
Follow every step in `.agent/workflows/codex-cli.md` sequentially.
The user's task description is: $ARGUMENTS
- [OPTIONAL MODULE] Requires globally installed `codex` CLI
(`npm install -g @openai/codex`).
- If CLI is unavailable, inform the user and fall back to native execution.
- End response with ⚡ ACX.
That’s it. Thirteen lines. No prompt engineering, no system message, no chain-of-thought scaffolding. The first time I saw it I assumed I’d asked the wrong question.
It’s actually a skill. Just not all of one.
What the thirteen lines actually do
The header is the trigger — typing the slash command in Claude Code loads this file. The $ARGUMENTS placeholder is where my task description gets substituted in; “fix the typo in README” becomes the input the skill operates on. The two bullets at the bottom are a fallback clause: what happens when the external tool isn’t there.
That’s enough to make the file a skill rather than a prompt. It receives an input, names a protocol to execute, and defines a fallback. A prompt would have padded that space with personas and instructions. This skill is a dispatcher — it points elsewhere and gets out of the way.
The skill lives somewhere else
There’s a quiet line in those thirteen:
Execute the canonical workflow:
.agent/workflows/codex-cli.md
The file it points to is much heavier than thirteen lines. The dispatcher is thin precisely because it outsources the substance.
If you open that workflow, the shape of what’s inside is roughly this: before invoking the external tool, confirm it’s actually installed and authenticated; if not, fall back. Before forwarding the user’s task, wrap it in a small set of guardrails — don’t modify files outside the agreed scope, don’t refactor what wasn’t asked for, stop and ask if the scope is unclear. After the tool returns, read the diff and verify it stayed inside the scope; if it didn’t, roll back the unauthorized changes and record what happened.
That’s the contract. The dispatcher gets the input to it; the contract is what makes the skill behave like a skill.
It’s worth lingering on the difference. A well-written prompt says “please be careful.” A skill says “after you run, read the diff, and if you touched anything I didn’t authorize, undo it.” One is a request to the model. The other is an enforceable check that runs whether or not the model felt careful that day.
What the model got wrong
Now the part that’s harder to write about. The first version of this skill — the thirteen lines and the longer workflow it pointed to — was drafted by Claude. I asked, it produced something that looked professional, and I committed it.
Then I tried to run it.
The flags it used to invoke codex were the right shape, but several of them didn’t exist in the version of the CLI I had installed. The model wasn’t lying. It was confidently extrapolating from how an industrial-strength CLI is supposed to look, producing names that sounded like the flags such a tool ought to have without being the ones this tool actually had. Running codex --help for the first time told a different story than the workflow had assumed. It took a couple of rounds of correction to bring the workflow into alignment with a tool that actually existed.
This is the part of “I had AI draft my skill” that doesn’t usually make it into the writeup. The skill’s shape — dispatcher up front, fallback declared, contract pointed at — was reliably good. The fidelity to the real external interface was reliably suspect.
That divergence has a useful generalization. Models trained on a lot of CLIs develop a strong intuition for what the contract surface of a tool tends to look like, but no way to verify that this particular tool exposes that particular surface.
Collaboration with taste
If you take only one thing from this, take this: letting an agent draft a skill is fine. The skeleton it produces is usually right — the dispatcher, the placeholder, the fallback note, the pointer to a deeper protocol. What it can’t do is confirm, on your behalf, that the names and flags it used line up with the tools you actually have.
That verification is the work. Not because the model is bad at writing skills, but because boundary calibration only happens through contact with real failure modes. A skill that invokes an invented flag misbehaves silently the first time someone runs it. The model has no feedback loop to catch that; you do.
A useful framing has settled out for me from a few of these: AI is good at shape, humans are good at fit. Skill design happens at the seam between them. The agent drafts the structure; the human looks at the structure and asks whether each line it produced refers to something that actually exists.
The dispatcher can be thirteen lines. The contract can’t be lazy. What this implies is that a good skill is often surprisingly small at the dispatcher layer and surprisingly careful at the contract layer — and “I had AI draft this” isn’t a problem as long as the human runs the verification step the model structurally can’t. The visible labor, the thirteen-line file, isn’t where the load is. The load is in what that file points to, and in the moments when you ran the tool, checked the flags against --help, and corrected the parts that were confidently wrong.
One closing note. If you’re about to author your own skill, the conventions differ across tools — Claude Code’s official skills docs, OpenAI’s Codex CLI reference, Cursor’s .cursor/rules, and the AGENTS.md convention each express the dispatcher-and-contract idea slightly differently. Check the source-of-truth docs before committing to a pattern. That includes cross-checking this post — blog posts go stale; the docs don’t wait for you.
Agentic OS is open source: github.com/KbWen/agentic-os
Read next
- What Makes an AI Skill Different from a Prompt? — the stack-level framing this post is built on
- Skill Design as Interface Design — the conceptual sibling: skills as API contracts
- Skill 邊界設計:從能力到合約 — Chinese companion on capability-to-contract boundary discipline