Five days ago we published a self-assessment against the 12-Factor Agents benchmark and scored ourselves 11 out of 12 — in public, with the gap named. Factor 9, the error-handling factor, was the one we hadn't built.

Today it shipped. This is what it does, and why the gap mattered more than a missing checklist item would suggest.

The loop everyone catches

Classic agent loop detection works on repetition. An agent calls the same tool, gets the same result, and tries again. Trajectory reconstruction hashes each step's result; when the same hash repeats across steps, you have a cycle. PromptKing has done this since the trajectory viewer shipped — cycle count, capacity leakage percentage, and an At-Risk escalation the moment a loop crosses your org's threshold.
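As a rough illustration of hash-based cycle detection, here is a minimal Python sketch. It is not PromptKing's code: the function names, the JSON canonicalization, and the threshold of 2 are all assumptions made for the example.

```python
import hashlib
import json
from collections import Counter


def result_hash(result) -> str:
    """Stable hash of one step's result (assumes it is JSON-serializable)."""
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def count_cycles(step_results) -> int:
    """Count steps whose result hash repeats an earlier step's hash."""
    seen = Counter(result_hash(r) for r in step_results)
    return sum(n - 1 for n in seen.values() if n > 1)


def is_at_risk(step_results, cycle_threshold: int = 2) -> bool:
    """Hypothetical escalation rule: flag once repeats reach the threshold."""
    return count_cycles(step_results) >= cycle_threshold


# Three identical results followed by one different result.
stuck = [{"tool": "search", "output": "no rows"}] * 3 + [
    {"tool": "search", "output": "1 row"}
]

# Three failures, each with a different error, so no hash ever repeats.
storm = [
    {"tool": "search", "error": "timeout"},
    {"tool": "search", "error": "bad parameter: limit"},
    {"tool": "search", "error": "403 forbidden"},
]
```

In this sketch the stuck run registers two repeats and is flagged, while the three distinct failures register none.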

That catches the agent stuck in a rut. It does not catch the agent falling down the stairs.

The loop nobody catches

Consider an agent that tries a tool call and fails. It retries with a tweaked parameter — fails differently. Retries again — a third, distinct error. Each attempt consumes capacity. Each produces a unique result. No hash ever repeats, so cycle detection sees a busy agent making progress, right up until the invoice says otherwise.

We call this a retry storm, and it is the signature failure mode of production agents in 2026: not the infinite loop, but the infinite variety of failure. The 12-Factor Agents framework's Factor 9 exists precisely because error handling is where agent runs quietly turn into capacity drains.

Detect the cadence, not the content

The fix is a shift in what you measure. Cycle detection asks: is the agent producing the same result? Factor 9 asks: is the agent failing step after step, regardless of how creatively?

PromptKing's implementation reads each trajectory node for a canonical error outcome, counts the longest run of consecutive failures in time order, and compares it to a threshold your org controls (default: three in a row). Cross that line and the trajectory escalates to At-Risk with an explicit, human-readable signal — the streak length, named as a retry storm.
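The streak count described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the shipped implementation: the `Node` shape, the function names, the signal wording, and the choice to treat a streak equal to the threshold as a breach are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical trajectory node: when it ran and whether it failed."""
    timestamp: float
    is_error: bool


def longest_failure_streak(nodes: Iterable[Node]) -> int:
    """Longest run of consecutive failed nodes, taken in time order."""
    longest = current = 0
    for node in sorted(nodes, key=lambda n: n.timestamp):
        current = current + 1 if node.is_error else 0
        longest = max(longest, current)
    return longest


def retry_storm_signal(nodes: Iterable[Node], threshold: int = 3) -> Optional[str]:
    """Return a human-readable signal if the streak reaches the threshold."""
    streak = longest_failure_streak(nodes)
    if streak >= threshold:  # assumption: reaching the threshold counts as a breach
        return f"Retry storm: {streak} consecutive failed steps (threshold {threshold})"
    return None


# Nodes deliberately out of order; the streak is computed after sorting by time.
nodes = [
    Node(4.0, True), Node(1.0, True), Node(3.0, False),
    Node(2.0, True), Node(5.0, True), Node(6.0, True),
]
```

Because the check looks only at whether each step failed, the content of the errors never enters the calculation, which is why distinct failures cannot hide from it.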

Two design choices matter here.

First, escalation is not enforcement. A Factor 9 breach flows into the same simulate-before-enforce path as every other governance signal: the trajectory is flagged, a policy simulation shows what an action would do, and a human approves before anything real happens. Simulation is the control plane. The circuit breaker is the safety net. Factor 9 never acts on its own — it makes sure a human sees the storm while it's still cheap.
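One way to picture the simulate-before-enforce ordering is a small state guard that refuses to act without a simulation and a named human approver. The Python sketch below is purely illustrative: the `Escalation` class, its methods, and the message strings are assumptions, not PromptKing's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Escalation:
    """Hypothetical record of one flagged trajectory moving through review."""
    trajectory_id: str
    signal: str
    simulation: Optional[str] = None
    approved_by: Optional[str] = None
    log: List[str] = field(default_factory=list)

    def simulate(self, action: str) -> str:
        """Describe what the action would do, without doing it."""
        self.simulation = f"Would {action} for trajectory {self.trajectory_id}"
        self.log.append("simulated")
        return self.simulation

    def approve(self, reviewer: str) -> None:
        """A human signs off, but only once a simulation exists."""
        if self.simulation is None:
            raise RuntimeError("cannot approve before a simulation exists")
        self.approved_by = reviewer
        self.log.append("approved")

    def enforce(self) -> str:
        """Act only after human approval; otherwise refuse."""
        if self.approved_by is None:
            raise PermissionError("no human approval; refusing to act")
        self.log.append("enforced")
        return f"Enforced after approval by {self.approved_by}"


def try_enforce(escalation: Escalation) -> str:
    """Helper for the example: report a refusal instead of raising."""
    try:
        return escalation.enforce()
    except PermissionError as exc:
        return f"blocked: {exc}"
```

In this sketch, calling `enforce()` on a freshly flagged escalation is blocked; it succeeds only after `simulate()` and `approve()` have run in that order.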

Second, it works with zero schema changes. Detection is pure logic in the trajectory reconstruction layer, reading data the connectors already produce. No new tables, no new data collected, metadata only — the same trust posture as everything else in the platform.

Why publish the gap first

Scoring yourself 11/12 in public and naming the missing factor is uncomfortable. It's also the whole point. Governance that explains itself earns faster adoption — and that applies to the governance vendor too. The benchmark is now 12/12, and the receipt is a dated pair of posts anyone can read in order.

If your agents retry, they can storm. The question is whether anyone is counting.

PromptKing governs AI spend and agent behavior across Anthropic, Microsoft 365 Copilot, GitHub Copilot, Google Gemini, AWS Bedrock, and IBM watsonx — via vendor-native APIs, metadata only. See it at promptking32.com.

Factor 9: The Loop Your Loop Detector Can't See
