After looking at large methodology systems like Superpowers, it’s worth examining a more focused take: 9arm-skills by thananon — a compact Claude Code plugin with four carefully engineered skills for debugging, post-mortems, code review, and management communication.
What makes this project interesting isn’t the number of skills (just 4) but the craft of each one. These are some of the most precisely written agent skill files I’ve seen.
The Skills
1. Debug Mantra — Reproducible Debugging as a Forced Process
The debug-mantra skill enforces a four-step discipline that the agent must follow in order before proposing any fix:
- Reproduce reliably — Can the issue be reproduced? If not, stop. Don’t hypothesize. Flaky bugs need rate-raising before they’re debuggable.
- Know the fail path — Debugger first, then source tracing + knob enumeration, then in-code instrumentation. Escalate only when the prior tactic fails.
- Falsify the hypothesis — When a candidate root cause surfaces, run the disproof first. If the hypothesis survives, it’s real. Generate 3-5 ranked hypotheses, not one.
- Every run is a breadcrumb — Maintain a running ledger of every experiment. Before declaring a hypothesis correct, verify it against every prior observation.
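To make the last two steps concrete, here is a minimal sketch of a run ledger that checks a hypothesis against every prior observation. It is not taken from the skill itself: the `Run`, `Hypothesis`, and `Ledger` names and the `num_streams` and `cache` knobs are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Run:
    """One experiment: the knob settings used and whether the bug appeared."""
    description: str
    settings: Dict[str, object]
    bug_observed: bool


@dataclass
class Hypothesis:
    """A candidate root cause, expressed as a prediction over knob settings."""
    name: str
    predicts_bug: Callable[[Dict[str, object]], bool]


@dataclass
class Ledger:
    """Running record of every experiment (the breadcrumbs)."""
    runs: List[Run] = field(default_factory=list)

    def record(self, description: str, settings: Dict[str, object],
               bug_observed: bool) -> None:
        self.runs.append(Run(description, settings, bug_observed))

    def contradictions(self, hypothesis: Hypothesis) -> List[Run]:
        # A run contradicts the hypothesis when prediction and observation differ.
        return [run for run in self.runs
                if hypothesis.predicts_bug(run.settings) != run.bug_observed]


# Hypothetical knobs and observations, for illustration only.
ledger = Ledger()
ledger.record("baseline", {"num_streams": 1, "cache": True}, bug_observed=True)
ledger.record("force two streams", {"num_streams": 2, "cache": True}, bug_observed=False)
ledger.record("cache disabled", {"num_streams": 1, "cache": False}, bug_observed=True)

stale_cache = Hypothesis("stale cache", lambda s: s["cache"] is True)
single_stream = Hypothesis("single-stream path", lambda s: s["num_streams"] == 1)

for h in (stale_cache, single_stream):
    bad = ledger.contradictions(h)
    print(h.name, "->", "consistent" if not bad else [r.description for r in bad])
```

In this toy ledger the "stale cache" hypothesis is contradicted by two earlier runs, while "single-stream path" is consistent with all three. Being consistent with the ledger is necessary but not sufficient; the disproof experiment still has to be run.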
The skill has a unique pattern: the agent recites a mantra block verbatim at the start of every debugging session:
- First is reproducibility. Can the issue be reproduced reliably?
- Know the fail path. Debugger first; then source trace + knob enumeration; then in-code instrumentation.
- Question your hypothesis. What would disprove it?
- Every run is a breadcrumb. Cross-reference all of them.
This recitation isn’t for the user — it’s a forcing function for the agent itself. The operating rules are explicit:
- Recite verbatim, never paraphrase
- If the user says “skip the mantra,” skip the recital but still apply the four steps silently
- If you catch yourself proposing a fix without a reliable repro, stop and return to step 1
The discipline about reproducibility is particularly sharp. There are three tiers:
- Reliable repro → proceed
- Flaky repro → raise the rate first (loop, parallelize, inject sleeps) until the bug reproduces on roughly 50% of runs
- No repro at all → stop. Say so explicitly. Do not proceed to hypothesize.
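The three tiers can be sketched as a small decision function. This is an illustration under stated assumptions, not the skill's own logic: `failure_rate` and `next_step` are hypothetical helpers, `attempt` stands in for whatever repro script you have, and treating 50% as the cut-off for "usable" is my reading of the flaky tier.

```python
import itertools
from typing import Callable


def failure_rate(attempt: Callable[[], bool], runs: int = 20) -> float:
    """Call `attempt` repeatedly; it returns True when the bug reproduces."""
    failures = sum(1 for _ in range(runs) if attempt())
    return failures / runs


def next_step(rate: float, target: float = 0.5) -> str:
    """Map a measured failure rate onto the three tiers (threshold assumed)."""
    if rate == 0.0:
        return "stop: no repro, do not hypothesize"
    if rate < target:
        return "raise the rate: loop, parallelize, inject sleeps"
    return "proceed: repro is usable"


# Stand-in for a flaky repro that fails one run in four.
flaky = itertools.cycle([True, False, False, False])
rate = failure_rate(lambda: next(flaky), runs=20)
print(rate, "->", next_step(rate))
```

In practice `attempt` would wrap the real repro command and report whether it failed; the point of measuring first is that "flaky" becomes a number you can push upward before any hypothesis is allowed.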
2. Post-Mortem — Canonical Engineering Records
The post-mortem skill writes the engineering record of a fixed bug. It’s designed to be the artifact that lets future-you recover the mental model fast.
It refuses to draft without four required inputs:
- Reliable repro exists (deterministic or high-rate flake)
- Root cause is known (the mechanism, not a hypothesis)
- Fix is identified (PR/commit/branch pointer)
- Fix is validated (original repro now passes)
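The refusal gate above can be sketched as a precondition check. The field and function names here (`PostMortemInputs`, `missing_inputs`, `draft_or_refuse`) are hypothetical and the refusal wording is mine; the skill expresses this in prose instructions, not code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PostMortemInputs:
    """The four required inputs; None or False means not yet established."""
    repro: Optional[str] = None          # how the bug is reproduced
    root_cause: Optional[str] = None     # the mechanism, not a hypothesis
    fix_pointer: Optional[str] = None    # PR, commit, or branch
    fix_validated: bool = False          # original repro now passes


def missing_inputs(inputs: PostMortemInputs) -> List[str]:
    """Return the names of required inputs that are still absent."""
    checks = [
        ("reliable repro", bool(inputs.repro)),
        ("known root cause", bool(inputs.root_cause)),
        ("identified fix", bool(inputs.fix_pointer)),
        ("validated fix", inputs.fix_validated),
    ]
    return [name for name, ok in checks if not ok]


def draft_or_refuse(inputs: PostMortemInputs) -> str:
    """Refuse explicitly rather than draft from incomplete inputs."""
    missing = missing_inputs(inputs)
    if missing:
        return "Cannot draft post-mortem. Missing: " + ", ".join(missing)
    return "Inputs complete. Drafting post-mortem."


# Illustrative values only.
partial = PostMortemInputs(
    repro="loop script fails 9 of 10 runs",
    root_cause="event skipped on one stream",
)
print(draft_or_refuse(partial))
```

Listing exactly which inputs are missing keeps the refusal actionable: the engineer knows what to go establish before asking again.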
The structure has 9 sections, with Summary, Root Cause, Fix, and Validation as mandatory. Code identifiers are first-class citizens — function names, file paths, struct fields, commit SHAs are expected, not stripped. The audience is engineers.
The skill is opinionated about tone:
- Mechanism over narrative — “which function skipped which event under which gate,” not “a synchronization issue”
- No hedging — “We believe” / “appears to” / “may have” are dropped
- Blameless — describe the gap, not the person
- Validation coverage honesty — “validated on Llama-2-70B / 8 GPUs / DeepSpeed; not retested on other workloads” is information, not a hole
The worked example in the skill — a Tada hang in dumbModel — is a masterclass in how to write a post-mortem. It walks through the root cause (scratchBuf == NULL due to a skipped cross-stream event), names the prior fix attempt (PR #5612) and why it was wrong, documents the exact experiment that nailed the cause (forcing numStreams = 2 made the bug disappear), and states validation coverage honestly.
3. Scrutinize — Outsider Code Review
The scrutinize skill performs an end-to-end review of a plan, PR, or code change from an outsider perspective. It’s structured as a 4-step workflow:
- Intent — State the goal in one sentence. If you can’t, stop. Then ask: is there a simpler way? Consider doing nothing, using something that already exists, a smaller change that solves 90% with 10% risk, or solving at a different layer.
- Trace — Walk the actual code path end-to-end, not just the diff lines. Include unchanged code on either side. Bugs hide at the seams.
- Verify — For each claim the change makes, walk the path explicitly and check if it actually produces that behavior. Test edge cases, error paths, concurrent callers.
- Report — Output one tight section per finding, ordered by severity. Close with a one-line verdict: ship / fix-then-ship / rework / reject.
Key operating rules:
- No rubber-stamps — “LGTM” is not an output
- Cite or it didn’t happen — every claim references a specific file:line
- One simpler-alternative pass is mandatory, even on small changes
- No flattery, no hedging — state the finding
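As a rough sketch of how the Report step and these rules fit together, the code below models one finding per section, ordered by severity, with a mandatory file:line citation and a closing verdict. The four verdicts come from the skill as described above, and the four fields follow its output format (Finding, Why it matters, Evidence, Suggested change). The severity labels, the `Finding` class, `render_report`, and the example file names are assumptions for illustration.

```python
import re
from dataclasses import dataclass
from typing import List

SEVERITY_ORDER = {"critical": 0, "major": 1, "minor": 2}  # assumed labels
VERDICTS = ("ship", "fix-then-ship", "rework", "reject")
CITATION = re.compile(r"^\S+:\d+$")  # e.g. src/queue.py:42


@dataclass
class Finding:
    severity: str
    finding: str
    why_it_matters: str
    evidence: str          # must be a file:line citation
    suggested_change: str


def render_report(findings: List[Finding], verdict: str) -> str:
    """Render findings by severity and close with a one-line verdict."""
    if verdict not in VERDICTS:
        raise ValueError(f"verdict must be one of {VERDICTS}")
    for f in findings:
        # Cite or it didn't happen: reject findings without file:line evidence.
        if not CITATION.match(f.evidence):
            raise ValueError(f"finding lacks a file:line citation: {f.finding}")
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity])
    sections = [
        f"[{f.severity}] {f.finding}\n"
        f"Why it matters: {f.why_it_matters}\n"
        f"Evidence: {f.evidence}\n"
        f"Suggested change: {f.suggested_change}"
        for f in ordered
    ]
    return "\n\n".join(sections + [f"Verdict: {verdict}"])


# Hypothetical findings and paths, for illustration only.
findings = [
    Finding("minor", "Retry count is hard-coded", "Cannot be tuned per caller",
            "src/client.py:88", "Accept it as a parameter"),
    Finding("critical", "Lock released before write completes",
            "Concurrent callers can read a partial record",
            "src/queue.py:42", "Release the lock after the write returns"),
]
print(render_report(findings, "fix-then-ship"))
```

Because an empty findings list still requires a verdict from the fixed set, there is no code path that produces a bare "LGTM".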
The “intent first” step is the most valuable. Before reviewing line-by-line, the skill steps back and asks whether the change should exist at all. This catches the class of problems where the code is correct but unnecessary.
4. Management Talk — Engineering to Leadership Translation
The management-talk skill rewrites engineer-to-engineer content for engineering-org leadership (VPs, directors, PMs, release managers). It’s the companion to post-mortem — the post-mortem owns the engineering truth, and management-talk reframes it for leadership.
The translation rules are precise:
Keep: Product names, framework names, JIRA keys, PR numbers, customer/workload identifiers. These are the bridge between engineering and leadership tracking.
Strip: Function names, file paths, struct fields, commit SHAs, code expressions, internal data-structure jargon.
Translate: Mechanism into one or two sentences of plain-English cause-and-effect. Don’t strip so much that you lose meaning — “race condition, synchronization, uninitialized buffer, fast-path” are concepts leadership reads fluently.
Don’t: Hedge, restate the obvious, tell leadership how to do their job, or include engineering-process minutiae.
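The Keep and Strip rules lend themselves to a simple lint pass over a leadership draft. The sketch below flags three kinds of engineering detail that should have been stripped while leaving PR numbers and JIRA-style keys alone. The patterns are my own rough heuristics, not rules from the skill, and the sample sentences, PR number, and ticket key are invented.

```python
import re
from typing import List, Tuple

# Heuristic patterns for details the Strip rule targets (assumed, not exhaustive).
STRIP_PATTERNS = {
    "function call": re.compile(r"\b\w+\(\)"),
    "file path": re.compile(r"\b[\w/-]+\.(?:c|h|cc|cpp|py|go|rs|ts)\b"),
    # 7 to 40 hex characters containing at least one digit.
    "commit SHA": re.compile(r"\b(?=[0-9a-f]*\d)[0-9a-f]{7,40}\b"),
}


def leaked_identifiers(text: str) -> List[Tuple[str, str]]:
    """Return (kind, match) pairs for engineering details left in the text."""
    hits = []
    for kind, pattern in STRIP_PATTERNS.items():
        hits.extend((kind, match) for match in pattern.findall(text))
    return hits


# Invented examples: an engineer-facing sentence and its leadership rewrite.
engineering = ("Fixed in PR #1234 (PROJ-42): init_buffers() in src/stream.c "
               "skipped an event; see commit 3f9a2c1.")
leadership = ("A race condition left a buffer uninitialized; "
              "fixed in PR #1234 (PROJ-42).")

print(leaked_identifiers(engineering))
print(leaked_identifiers(leadership))
```

A check like this only catches what was left in. Whether the remaining sentence still carries the cause-and-effect is the Translate rule's job, and that part needs a reader, not a regex.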
What I Learned
This project is small but sharp. A few takeaways:
- Recitation as a forcing function. The debug-mantra’s “recite verbatim” pattern is clever: it forces the agent to internalize the process before executing. I’m adding this to my own doppelganger’s debugging workflow.
- Refusal is a feature. The post-mortem skill refuses to draft without four required inputs. The debug-mantra stops if there’s no repro. Saying “no” is more valuable than producing plausible-sounding output. This is something most agent skill authors get wrong.
- The post-mortem + management-talk pair. Having two skills that own different versions of the same content (engineering truth vs. leadership summary) is a clean design pattern. The post-mortem keeps code identifiers; management-talk strips them. Each serves its audience without compromise.
- Output format is part of the spec. The scrutinize skill specifies exact output format (Finding + Why it matters + Evidence + Suggested change). The post-mortem specifies 9 sections with mandatory vs. conditional markers. This makes the agent’s output predictable and reviewable.
If you’re building Claude Code skills or agent workflows, I’d recommend reading through the 9arm-skills source. At only 4 skills, it’s digestible in one sitting — and each skill contains design patterns worth borrowing.