AI Trending May 31, 2026 · 6 min read

9arm Skills — Debug Mantras, Post-Mortems, and Code Review as Agent Skills

How thananon's compact Claude Code plugin turns debugging from a guessing game into a repeatable four-step discipline, and code reviews into structured outsider scrutiny.

#Claude Code #debugging #code review #agent skills #post-mortem #developer tools

After looking at large methodology systems like Superpowers, it’s worth examining a more focused take: 9arm-skills by thananon — a compact Claude Code plugin with four carefully engineered skills for debugging, post-mortems, code review, and management communication.

What makes this project interesting isn’t the number of skills (just 4), but the craft of each one. These are some of the most precisely written agent skill files I’ve seen.

The Skills

1. Debug Mantra — Reproducible Debugging as a Forced Process

The debug-mantra skill enforces a four-step discipline that the agent must follow in order before proposing any fix:

  1. Reproduce reliably — Can the issue be reproduced? If not, stop. Don’t hypothesize. Flaky bugs need rate-raising before they’re debuggable.
  2. Know the fail path — Debugger first, then source tracing + knob enumeration, then in-code instrumentation. Escalate only when the prior tactic fails.
  3. Falsify the hypothesis — When a candidate root cause surfaces, run the disproof first. If the hypothesis survives, it’s real. Generate 3-5 ranked hypotheses, not one.
  4. Every run is a breadcrumb — Maintain a running ledger of every experiment. Before declaring a hypothesis correct, verify it against every prior observation.
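
To make steps 3 and 4 concrete, here is a minimal sketch of an experiment ledger that checks a hypothesis against every prior run. It is not taken from the skill itself: the `Run` and `Ledger` classes, the `conditions` keys, and the recorded outcomes are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Run:
    """One experiment: the conditions it ran under and whether the bug appeared."""
    label: str
    conditions: dict
    failed: bool


@dataclass
class Ledger:
    """Running record of every experiment (hypothetical structure)."""
    runs: list = field(default_factory=list)

    def record(self, label: str, conditions: dict, failed: bool) -> None:
        self.runs.append(Run(label, conditions, failed))

    def contradictions(self, predicts_failure: Callable[[dict], bool]) -> list:
        """Return every recorded run whose outcome the hypothesis gets wrong."""
        return [r for r in self.runs if predicts_failure(r.conditions) != r.failed]


ledger = Ledger()
ledger.record("baseline", {"num_streams": 4, "cache": True}, failed=True)
ledger.record("two streams", {"num_streams": 2, "cache": True}, failed=False)
ledger.record("cache off", {"num_streams": 4, "cache": False}, failed=True)


def hypothesis_a(conditions: dict) -> bool:
    """Hypothesis A: the bug only appears when the cache is on."""
    return conditions["cache"]


def hypothesis_b(conditions: dict) -> bool:
    """Hypothesis B: the bug only appears with more than two streams."""
    return conditions["num_streams"] > 2


bad_a = [r.label for r in ledger.contradictions(hypothesis_a)]
bad_b = [r.label for r in ledger.contradictions(hypothesis_b)]
print("A contradicted by:", bad_a)
print("B contradicted by:", bad_b)
```

Hypothesis A is contradicted by two of the three recorded runs, so it can be dropped without another experiment. Hypothesis B is consistent with every breadcrumb so far, which makes it the one worth trying to disprove next.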

The skill has a unique pattern: the agent recites a mantra block verbatim at the start of every debugging session:

  1. First is reproducibility. Can the issue be reproduced reliably?
  2. Know the fail path. Debugger first; then source trace + knob enumeration; then in-code instrumentation.
  3. Question your hypothesis. What would disprove it?
  4. Every run is a breadcrumb. Cross-reference all of them.

This recitation isn’t for the user — it’s a forcing function for the agent itself. The operating rules are explicit:

The discipline about reproducibility is particularly sharp. There are three tiers:

2. Post-Mortem — Canonical Engineering Records

The post-mortem skill writes the engineering record of a fixed bug. It’s designed to be the artifact that lets future-you recover the mental model fast.

It refuses to draft without four required inputs:

The structure has 9 sections, with Summary, Root Cause, Fix, and Validation as mandatory. Code identifiers are first-class citizens — function names, file paths, struct fields, commit SHAs are expected, not stripped. The audience is engineers.
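
As a rough illustration of treating the mandatory sections as a hard gate, the sketch below checks a draft before it is accepted. The four section names come from the description above; the `missing_mandatory` helper, the dict-based draft, and the sample text are assumptions for illustration, not the skill's actual mechanism.

```python
# The four mandatory section names, as listed in the description above.
MANDATORY_SECTIONS = ("Summary", "Root Cause", "Fix", "Validation")


def missing_mandatory(draft: dict) -> list:
    """Return mandatory section names that are absent or blank in the draft."""
    return [name for name in MANDATORY_SECTIONS if not draft.get(name, "").strip()]


# Hypothetical draft: "Fix" is present but blank, "Validation" is missing.
draft = {
    "Summary": "Placeholder summary of the incident.",
    "Root Cause": "Placeholder root cause naming the exact code path.",
    "Fix": "   ",
}

gaps = missing_mandatory(draft)
if gaps:
    print("Refusing to finalize; missing sections:", ", ".join(gaps))
```

A whitespace-only section counts as missing here, on the assumption that an empty heading is no better than an absent one.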

The skill is opinionated about tone:

The worked example in the skill — a Tada hang in dumbModel — is a masterclass in how to write a post-mortem. It walks through the root cause (scratchBuf == NULL due to a skipped cross-stream event), names the prior fix attempt (PR #5612) and why it was wrong, documents the exact experiment that nailed the cause (forcing numStreams = 2 made the bug disappear), and states validation coverage honestly.

3. Scrutinize — Outsider Code Review

The scrutinize skill performs an end-to-end review of a plan, PR, or code change from an outsider perspective. It’s structured as a 4-step workflow:

  1. Intent — State the goal in one sentence. If you can’t, stop. Then ask: is there a simpler way? Consider doing nothing, using something that already exists, a smaller change that solves 90% with 10% risk, or solving at a different layer.
  2. Trace — Walk the actual code path end-to-end, not just the diff lines. Include unchanged code on either side. Bugs hide at the seams.
  3. Verify — For each claim the change makes, walk the path explicitly and check if it actually produces that behavior. Test edge cases, error paths, concurrent callers.
  4. Report — Output one tight section per finding, ordered by severity. Close with a one-line verdict: ship / fix-then-ship / rework / reject.
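
The Report step can be sketched as a small data structure plus a renderer. The four fields per finding (Finding, Why it matters, Evidence, Suggested change) and the four verdicts come from the skill as described in this post; the severity levels, the `Finding` class, the `render_report` function, and the sample findings are hypothetical.

```python
from dataclasses import dataclass

# Assumed severity levels; the post only says findings are ordered by severity.
SEVERITY_ORDER = {"critical": 0, "major": 1, "minor": 2}
VERDICTS = ("ship", "fix-then-ship", "rework", "reject")


@dataclass
class Finding:
    severity: str
    finding: str
    why_it_matters: str
    evidence: str
    suggested_change: str


def render_report(findings: list, verdict: str) -> str:
    """Render one section per finding, most severe first, then a one-line verdict."""
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict}")
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity])
    sections = [
        f"[{f.severity}] {f.finding}\n"
        f"  Why it matters: {f.why_it_matters}\n"
        f"  Evidence: {f.evidence}\n"
        f"  Suggested change: {f.suggested_change}"
        for f in ordered
    ]
    sections.append(f"Verdict: {verdict}")
    return "\n\n".join(sections)


# Hypothetical findings, deliberately listed out of severity order.
findings = [
    Finding("minor", "Log message omits the request id",
            "Harder to correlate failures", "Error branch of the handler",
            "Include the request id in the message"),
    Finding("critical", "Retry loop has no upper bound",
            "A dead dependency hangs every caller", "Unchanged caller upstream of the diff",
            "Cap retries and surface the error"),
]

report = render_report(findings, "fix-then-ship")
print(report)
```

Rejecting an unknown verdict keeps the closing line limited to the four allowed outcomes, so the report cannot end on a vague conclusion.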

Key operating rules:

The “intent first” step is the most valuable. Before reviewing line by line, the skill steps back and asks whether the change should exist at all. This catches the class of problems where the code is correct but unnecessary.

4. Management Talk — Engineering to Leadership Translation

The management-talk skill rewrites engineer-to-engineer content for engineering-org leadership (VPs, directors, PMs, release managers). It’s the companion to post-mortem — the post-mortem owns the engineering truth, and management-talk reframes it for leadership.

The translation rules are precise:

Keep: Product names, framework names, JIRA keys, PR numbers, customer/workload identifiers. These are the bridge between engineering and leadership tracking.

Strip: Function names, file paths, struct fields, commit SHAs, code expressions, internal data-structure jargon.

Translate: Mechanism into one or two sentences of plain-English cause-and-effect. Don’t strip so much that you lose meaning: terms like “race condition”, “synchronization”, “uninitialized buffer”, and “fast-path” are concepts leadership reads fluently.

Don’t: Hedge, restate the obvious, tell leadership how to do their job, or include engineering-process minutiae.
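
One way to picture the Keep/Strip split is a heuristic check that flags engineering identifiers left in a leadership summary. This is only a sketch under loose assumptions: the regular expressions are rough approximations I chose for illustration, and the sample sentences, the JIRA key `GPU-1234`, the function name, the file path, and the commit SHA are all invented. The skill itself is a prompt, not a regex filter.

```python
import re

# Rough, assumed patterns for things the Strip rule targets.
STRIP_PATTERNS = {
    "file path": re.compile(r"\b[\w/-]+\.(?:c|h|cc|cpp|py|go|rs|ts|js)\b"),
    "function call": re.compile(r"\b\w+\(\)"),
    # Hex run of 7 to 40 characters containing at least one digit.
    "commit SHA": re.compile(r"\b(?=[0-9a-f]*\d)[0-9a-f]{7,40}\b"),
}


def leaked_identifiers(text: str) -> list:
    """Return (kind, match) pairs for engineering identifiers found in text."""
    hits = []
    for kind, pattern in STRIP_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((kind, match.group(0)))
    return hits


# Hypothetical engineer-facing sentence and its leadership rewrite.
engineer_text = (
    "Fixed in a1b2c3d: init_buffers() in src/alloc.c skipped the event "
    "(GPU-1234, PR #5612)."
)
leadership_text = (
    "An uninitialized buffer caused the hang; the fix shipped in PR #5612 (GPU-1234)."
)

engineer_hits = leaked_identifiers(engineer_text)
leadership_hits = leaked_identifiers(leadership_text)
print(engineer_hits)
print(leadership_hits)
```

The JIRA key and PR number pass through untouched in both sentences, which matches the Keep rule: they are the identifiers leadership uses for tracking.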

What I Learned

This project is small but sharp. A few takeaways:

  1. Recitation as a forcing function. The debug-mantra’s “recite verbatim” pattern is clever — it forces the agent to internalize the process before executing. I’m adding this to my own doppelganger’s debugging workflow.

  2. Refusal is a feature. The post-mortem skill refuses to draft without four required inputs, and debug-mantra stops if there’s no repro. When the prerequisites are missing, saying “no” is more valuable than producing plausible-sounding output. This is something most agent skill authors get wrong.

  3. The post-mortem + management-talk pair. Having two skills that own different versions of the same content (engineering truth vs. leadership summary) is a clean design pattern. The post-mortem keeps code identifiers; management-talk strips them. Each serves its audience without compromise.

  4. Output format is part of the spec. The scrutinize skill specifies exact output format (Finding + Why it matters + Evidence + Suggested change). The post-mortem specifies 9 sections with mandatory vs. conditional markers. This makes the agent’s output predictable and reviewable.

If you’re building Claude Code skills or agent workflows, I’d recommend reading through the 9arm-skills source. At only 4 skills, it’s digestible in one sitting — and each skill contains design patterns worth borrowing.

