Two task cards: a red 'no artifact' card versus a teal card with commit SHA and 47 passing tests as verified evidence

No evidence, no completion

TL;DR: “No evidence, no completion” is a single structural principle: a task isn’t done until the agent produces an artifact that exists outside the conversation and can be checked independently. It sounds trivial. In practice it closes off most common agent failure modes with one rule, because specifying what evidence looks like before the task runs forces you to define what “done” actually means. In the previous post in this series I described an agent that said a feature was done (I asked for the commit SHA, none existed, and two of the three modules were unchanged). The failure had a name: no external completion criterion existed, so the agent supplied its own. That gap has a one-rule fix. ...

2026-05-22 · 6 min read · 1097 words · KbWen · EN
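
As a rough illustration of the rule in the teaser above, here is a minimal sketch of a completion gate. Everything in it is hypothetical and not taken from the post: the `Evidence` fields, the `can_complete` function, and the `known_commits` set, which stands in for whatever external record (for example, the branch's actual commit list) you would check against.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical evidence record: the fields are assumptions chosen to mirror
# the "commit SHA and passing tests" example, not a schema from the post.
@dataclass(frozen=True)
class Evidence:
    commit_sha: str
    tests_passed: int
    tests_failed: int

# Accept abbreviated (7+) or full (40) lowercase hex SHAs.
SHA_PATTERN = re.compile(r"^[0-9a-f]{7,40}$")

def can_complete(evidence: Optional[Evidence], known_commits: Iterable[str]) -> bool:
    """Allow completion only when evidence exists and checks out externally."""
    if evidence is None:
        # No artifact at all: the agent's own claim is not enough.
        return False
    if not SHA_PATTERN.match(evidence.commit_sha):
        return False
    # The SHA must match a commit in a record the agent does not control.
    if not any(full.startswith(evidence.commit_sha) for full in known_commits):
        return False
    # At least one test ran, and none failed.
    return evidence.tests_passed > 0 and evidence.tests_failed == 0
```

With a known commit `3f2a9c1d5e7b8a6f4c0d1e2f3a4b5c6d7e8f9a0b`, `can_complete(Evidence("3f2a9c1", 47, 0), known)` passes the gate, while `can_complete(None, known)` and a SHA that is absent from the record are both rejected. The point of the sketch is only that the criterion is written down before the task runs and is checked against something outside the conversation.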
A message queue with a dropped no-ACK message mapped to a passing phase pipeline, showing the same pattern

Prior art: what distributed systems already knows

TL;DR: The governance problems that make AI agents unpredictable (unverified completions, state loss between sessions, unconstrained scope) are structurally identical to problems distributed systems engineering solved with audit logs, delivery acknowledgment, state machines, and least-privilege access. The one genuine difference is non-determinism: an agent given the same open-ended task twice will do something different, which means governance needs to front-load constraints rather than just catch failures after the fact. But the rest of the pattern library applies directly. ...

2026-05-22 · 6 min read · 1130 words · KbWen · EN
Two fractal trees — one chaotic and orange, one orderly and teal — contrasting ungoverned vs governed agent behavior

Why AI Agents Go Wrong: It's Not the Model

TL;DR: “The agent did something wrong” usually gets diagnosed as a model problem. Most of the time it isn’t. Capability failures (wrong reasoning) and governance failures (no structure to catch wrong reasoning) look identical from the outside but need completely different fixes. This post is about telling them apart, and why most teams are currently solving the wrong one. The agent said the feature was done. I asked for the commit SHA. There wasn’t one. When I checked the branch, two of the three modules it described implementing hadn’t changed. ...

2026-05-22 · 9 min read · 1771 words · KbWen · EN