From cebaec23f917777653e12d6d6d7c5d9623594333 Mon Sep 17 00:00:00 2001 From: andrewstellman Date: Thu, 26 Mar 2026 22:37:23 -0400 Subject: [PATCH 1/7] quality-playbook v1.1.0: add regression test generation and startup banner --- skills/quality-playbook/SKILL.md | 13 ++++- .../references/review_protocols.md | 56 +++++++++++++++++++ 2 files changed, 67 insertions(+), 2 deletions(-) diff --git a/skills/quality-playbook/SKILL.md b/skills/quality-playbook/SKILL.md index 93b6f34d8..4d09aa35b 100644 --- a/skills/quality-playbook/SKILL.md +++ b/skills/quality-playbook/SKILL.md @@ -1,15 +1,22 @@ --- name: quality-playbook -description: 'Explore any codebase from scratch and generate six quality artifacts: a quality constitution (QUALITY.md), spec-traced functional tests, a code review protocol, an integration testing protocol, a multi-model spec audit (Council of Three), and an AI bootstrap file (AGENTS.md). Works with any language (Python, Java, Scala, TypeScript, Go, Rust, etc.). Use this skill whenever the user asks to set up a quality playbook, generate functional tests from specifications, create a quality constitution, build testing protocols, audit code against specs, or establish a repeatable quality system for a project. Also trigger when the user mentions ''quality playbook'', ''spec audit'', ''Council of Three'', ''fitness-to-purpose'', ''coverage theater'', or wants to go beyond basic test generation to build a full quality system grounded in their actual codebase.' +description: "Explore any codebase from scratch and generate six quality artifacts: a quality constitution (QUALITY.md), spec-traced functional tests, a code review protocol with regression test generation, an integration testing protocol, a multi-model spec audit (Council of Three), and an AI bootstrap file (AGENTS.md). Works with any language (Python, Java, Scala, TypeScript, Go, Rust, etc.). 
Use this skill whenever the user asks to set up a quality playbook, generate functional tests from specifications, create a quality constitution, build testing protocols, audit code against specs, or establish a repeatable quality system for a project. Also trigger when the user mentions 'quality playbook', 'spec audit', 'Council of Three', 'fitness-to-purpose', 'coverage theater', or wants to go beyond basic test generation to build a full quality system grounded in their actual codebase." license: Complete terms in LICENSE.txt metadata: - version: 1.0.0 + version: 1.1.0 author: Andrew Stellman github: https://github.com/andrewstellman/ --- # Quality Playbook Generator +**When this skill starts, display this banner before doing anything else:** + +``` +Quality Playbook v1.1.0 — by Andrew Stellman +https://github.com/andrewstellman/ +``` + Generate a complete quality system tailored to a specific codebase. Unlike test stub generators that work mechanically from source code, this skill explores the project first — understanding its domain, architecture, specifications, and failure history — then produces a quality playbook grounded in what it finds. ## Why This Exists @@ -231,6 +238,8 @@ Key sections: bootstrap files, focus areas mapped to architecture, and these man - Grep before claiming missing - Do NOT suggest style changes — only flag things that are incorrect +**Phase 2: Regression tests.** After the review produces BUG findings, write regression tests in `quality/test_regression.*` that reproduce each bug. Each test should fail on the current implementation, confirming the bug is real. Report results as a confirmation table (BUG CONFIRMED / FALSE POSITIVE / NEEDS INVESTIGATION). See `references/review_protocols.md` for the full regression test protocol. + ### File 4: `quality/RUN_INTEGRATION_TESTS.md` **Read `references/review_protocols.md`** for the template. 
diff --git a/skills/quality-playbook/references/review_protocols.md b/skills/quality-playbook/references/review_protocols.md index 11ef0b252..3f3b0cb94 100644 --- a/skills/quality-playbook/references/review_protocols.md +++ b/skills/quality-playbook/references/review_protocols.md @@ -50,6 +50,62 @@ For each file reviewed: - Overall assessment: SHIP IT / FIX FIRST / NEEDS DISCUSSION ``` +### Phase 2: Regression Tests for Confirmed Bugs + +After the code review produces findings, write regression tests that reproduce each BUG finding. This transforms the review from "here are potential bugs" into "here are proven bugs with failing tests." + +**Why this matters:** A code review finding without a reproducer is an opinion. A finding with a failing test is a fact. Across multiple codebases (Go, Rust, Python), regression tests written from code review findings have confirmed bugs at a high rate — including data races, cross-tenant data leaks, state machine violations, and silent context loss. The regression tests also serve as the acceptance criteria for fixing the bugs: when the test passes, the bug is fixed. + +**How to generate regression tests:** + +1. **For each BUG finding**, write a test that: + - Targets the exact code path and line numbers from the finding + - Fails on the current implementation, confirming the bug exists + - Uses mocking/monkeypatching to isolate from external services + - Includes the finding description in the test docstring for traceability + +2. **Name the test file** `quality/test_regression.*` using the project's language: + - Python: `quality/test_regression.py` + - Go: `quality/regression_test.go` (or in the relevant package's test directory) + - Rust: `quality/regression_tests.rs` or a `tests/regression_*.rs` file in the relevant crate + - Java: `quality/RegressionTest.java` + - TypeScript: `quality/regression.test.ts` + +3. 
**Each test should document its origin:** + ``` + # Python example + def test_webhook_signature_raises_on_malformed_input(): + """[BUG from 2026-03-26-reviewer.md, line 47] + Webhook signature verification raises instead of returning False + on malformed signatures, risking 500 instead of clean 401.""" + + // Go example + func TestRestart_DataRace_DirectFieldAccess(t *testing.T) { + // BUG from 2026-03-26-claude.md, line 3707 + // Restart() writes mutex-protected fields without acquiring the lock + } + ``` + +4. **Run the tests and report results** as a confirmation table: + ``` + | Finding | Test | Result | Confirmed? | + |---------|------|--------|------------| + | Webhook signature raises on malformed input | test_webhook_signature_... | FAILED (expected) | YES — bug confirmed | + | Queued messages deleted before processing | test_message_queue_... | FAILED (expected) | YES — bug confirmed | + | Thread active check fails open | test_is_thread_active_... | PASSED (unexpected) | NO — needs investigation | + ``` + +5. **If a test passes unexpectedly**, investigate — either the finding was a false positive, or the test doesn't exercise the right code path. Report as NEEDS INVESTIGATION, not as a confirmed bug. + +**Language-specific tips:** + +- **Go:** Use `go test -race` to confirm data race findings. The race detector is definitive — if it fires, the race is real. +- **Rust:** Use `#[should_panic]` or assert on specific error conditions. For atomicity bugs, assert on cleanup state after injected failures. +- **Python:** Use `monkeypatch` or `unittest.mock.patch` to isolate external dependencies. Use `pytest.raises` for exception-path bugs. +- **Java:** Use Mockito or similar to isolate dependencies. Use `assertThrows` for exception-path bugs. 
+ +**Save the regression test output** alongside the code review: if the review is at `quality/code_reviews/2026-03-26-reviewer.md`, the regression tests go in `quality/test_regression.*` and the confirmation results go in the review file as an addendum or in `quality/results/`. + ### Why These Guardrails Matter These four guardrails often improve AI code review quality by reducing vague and hallucinated findings: From 57794b77fa4566370f6a07e27d4fa1cb9d44ca70 Mon Sep 17 00:00:00 2001 From: andrewstellman Date: Thu, 26 Mar 2026 22:39:34 -0400 Subject: [PATCH 2/7] Regenerate docs/README.skills.md for quality-playbook v1.1.0 --- docs/README.skills.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/README.skills.md b/docs/README.skills.md index 59df49100..858ea2730 100644 --- a/docs/README.skills.md +++ b/docs/README.skills.md @@ -218,7 +218,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-skills) for guidelines on how to | [publish-to-pages](../skills/publish-to-pages/SKILL.md) | Publish presentations and web content to GitHub Pages. Converts PPTX, PDF, HTML, or Google Slides to a live GitHub Pages URL. Handles repo creation, file conversion, Pages enablement, and returns the live URL. Use when the user wants to publish, deploy, or share a presentation or HTML file via GitHub Pages. | `scripts/convert-pdf.py`
`scripts/convert-pptx.py`
`scripts/publish.sh` | | [pytest-coverage](../skills/pytest-coverage/SKILL.md) | Run pytest tests with coverage, discover lines missing coverage, and increase coverage to 100%. | None | | [python-mcp-server-generator](../skills/python-mcp-server-generator/SKILL.md) | Generate a complete MCP server project in Python with tools, resources, and proper configuration | None | -| [quality-playbook](../skills/quality-playbook/SKILL.md) | Explore any codebase from scratch and generate six quality artifacts: a quality constitution (QUALITY.md), spec-traced functional tests, a code review protocol, an integration testing protocol, a multi-model spec audit (Council of Three), and an AI bootstrap file (AGENTS.md). Works with any language (Python, Java, Scala, TypeScript, Go, Rust, etc.). Use this skill whenever the user asks to set up a quality playbook, generate functional tests from specifications, create a quality constitution, build testing protocols, audit code against specs, or establish a repeatable quality system for a project. Also trigger when the user mentions 'quality playbook', 'spec audit', 'Council of Three', 'fitness-to-purpose', 'coverage theater', or wants to go beyond basic test generation to build a full quality system grounded in their actual codebase. | `LICENSE.txt`
`references/constitution.md`
`references/defensive_patterns.md`
`references/functional_tests.md`
`references/review_protocols.md`
`references/schema_mapping.md`
`references/spec_audit.md`
`references/verification.md` | +| [quality-playbook](../skills/quality-playbook/SKILL.md) | Explore any codebase from scratch and generate six quality artifacts: a quality constitution (QUALITY.md), spec-traced functional tests, a code review protocol with regression test generation, an integration testing protocol, a multi-model spec audit (Council of Three), and an AI bootstrap file (AGENTS.md). Works with any language (Python, Java, Scala, TypeScript, Go, Rust, etc.). Use this skill whenever the user asks to set up a quality playbook, generate functional tests from specifications, create a quality constitution, build testing protocols, audit code against specs, or establish a repeatable quality system for a project. Also trigger when the user mentions 'quality playbook', 'spec audit', 'Council of Three', 'fitness-to-purpose', 'coverage theater', or wants to go beyond basic test generation to build a full quality system grounded in their actual codebase. | `LICENSE.txt`
`references/constitution.md`
`references/defensive_patterns.md`
`references/functional_tests.md`
`references/review_protocols.md`
`references/schema_mapping.md`
`references/spec_audit.md`
`references/verification.md` | | [quasi-coder](../skills/quasi-coder/SKILL.md) | Expert 10x engineer skill for interpreting and implementing code from shorthand, quasi-code, and natural language descriptions. Use when collaborators provide incomplete code snippets, pseudo-code, or descriptions with potential typos or incorrect terminology. Excels at translating non-technical or semi-technical descriptions into production-quality code. | None | | [readme-blueprint-generator](../skills/readme-blueprint-generator/SKILL.md) | Intelligent README.md generation prompt that analyzes project documentation structure and creates comprehensive repository documentation. Scans .github/copilot directory files and copilot-instructions.md to extract project information, technology stack, architecture, development workflow, coding standards, and testing approaches while generating well-structured markdown documentation with proper formatting, cross-references, and developer-focused content. | None | | [refactor](../skills/refactor/SKILL.md) | Surgical code refactoring to improve maintainability without changing behavior. Covers extracting functions, renaming variables, breaking down god functions, improving type safety, eliminating code smells, and applying design patterns. Less drastic than repo-rebuilder; use for gradual improvements. 
| None | From 12f0430c9fbd3f3ea0e36b8be46baa87a20868bb Mon Sep 17 00:00:00 2001 From: andrewstellman Date: Sat, 28 Mar 2026 21:57:41 -0400 Subject: [PATCH 3/7] quality-playbook v1.2.0: state machine analysis and missing safeguard detection --- skills/quality-playbook/SKILL.md | 23 ++++++- .../references/defensive_patterns.md | 63 ++++++++++++++++++- 2 files changed, 82 insertions(+), 4 deletions(-) diff --git a/skills/quality-playbook/SKILL.md b/skills/quality-playbook/SKILL.md index 4d09aa35b..b5242ec42 100644 --- a/skills/quality-playbook/SKILL.md +++ b/skills/quality-playbook/SKILL.md @@ -1,9 +1,9 @@ --- name: quality-playbook -description: "Explore any codebase from scratch and generate six quality artifacts: a quality constitution (QUALITY.md), spec-traced functional tests, a code review protocol with regression test generation, an integration testing protocol, a multi-model spec audit (Council of Three), and an AI bootstrap file (AGENTS.md). Works with any language (Python, Java, Scala, TypeScript, Go, Rust, etc.). Use this skill whenever the user asks to set up a quality playbook, generate functional tests from specifications, create a quality constitution, build testing protocols, audit code against specs, or establish a repeatable quality system for a project. Also trigger when the user mentions 'quality playbook', 'spec audit', 'Council of Three', 'fitness-to-purpose', 'coverage theater', or wants to go beyond basic test generation to build a full quality system grounded in their actual codebase." +description: "Explore any codebase from scratch and generate six quality artifacts: a quality constitution (QUALITY.md), spec-traced functional tests, a code review protocol with regression test generation, an integration testing protocol, a multi-model spec audit (Council of Three), and an AI bootstrap file (AGENTS.md). Includes state machine completeness analysis and missing safeguard detection. 
Works with any language (Python, Java, Scala, TypeScript, Go, Rust, etc.). Use this skill whenever the user asks to set up a quality playbook, generate functional tests from specifications, create a quality constitution, build testing protocols, audit code against specs, or establish a repeatable quality system for a project. Also trigger when the user mentions 'quality playbook', 'spec audit', 'Council of Three', 'fitness-to-purpose', 'coverage theater', or wants to go beyond basic test generation to build a full quality system grounded in their actual codebase." license: Complete terms in LICENSE.txt metadata: - version: 1.1.0 + version: 1.2.0 author: Andrew Stellman github: https://github.com/andrewstellman/ --- @@ -13,7 +13,7 @@ metadata: **When this skill starts, display this banner before doing anything else:** ``` -Quality Playbook v1.1.0 — by Andrew Stellman +Quality Playbook v1.2.0 — by Andrew Stellman https://github.com/andrewstellman/ ``` @@ -158,6 +158,21 @@ This is the most important step. Search for defensive code patterns — each one Minimum bar: at least 2–3 defensive patterns per core source file. If you find fewer, you're skimming — read function bodies, not just signatures. +### Step 5a: Trace State Machines + +If the project has any kind of state management — status fields, lifecycle phases, workflow stages, mode flags — trace the state machine completely. This catches a category of bugs that defensive pattern analysis alone misses: states that exist but aren't handled. + +**How to find state machines:** Search for status/state fields in models, enums, or constants (e.g., `status`, `state`, `phase`, `mode`). Search for guards that check status before allowing actions (e.g., `if status == "running"`, `match self.state`). Search for state transitions (assignments to status fields). + +**For each state machine you find:** + +1. **Enumerate all possible states.** Read the enum, the constants, or grep for every value the field is assigned. 
List them all.
+2. **For each consumer of state** (UI handlers, API endpoints, control flow guards), check: does it handle every possible state? A `switch`/`match` without a meaningful default, or an `if/elif` chain that doesn't cover all states, is a gap.
+3. **For each state transition**, check: can you reach every state? Are there states you can enter but never leave? Are there states that block operations that should be available?
+4. **Record gaps as findings.** A status guard that allows action X for "running" but not for "stuck" is a real bug if the user needs to perform action X on stuck processes. A process that enters a terminal state but never triggers cleanup is a real bug.
+
+**Why this matters:** State machine gaps produce bugs that are invisible during normal operation but surface under stress or edge conditions — exactly when you need the system to work. A batch processor that can't be killed when it's in "stuck" status, a watcher that never self-terminates after all work completes, and a UI that refuses to resume a "pending" run are all symptoms of incomplete state handling. These bugs don't show up in defensive pattern analysis because the code isn't defending against them — it's simply not handling them at all.
+
 ### Step 5b: Map Schema Types
 
 If the project has a validation layer (Pydantic models in Python, JSON Schema, TypeScript interfaces/Zod schemas, Java Bean Validation annotations, Scala case class codecs), read the schema definitions now. For every field you found a defensive pattern for, record what the schema accepts vs. rejects.
@@ -179,6 +194,8 @@ Every project has a different failure profile. This step uses **two sources**
 - "What produces correct-looking output that is actually wrong?" — This is the most dangerous class of bug: output that passes all checks but is subtly corrupted.
 - "What happens at 10x scale that doesn't happen at 1x?" — Chunk boundaries, rate limits, timeout cascading, memory pressure.
- "What happens when this process is killed at the worst possible moment?" — Mid-write, mid-transaction, mid-batch-submission. +- "What information does the user need before committing to an irreversible or expensive operation?" — Pre-run cost estimates, confirmation of scope (especially when fan-out or expansion will multiply the work), resource warnings. If the system can silently commit the user to hours of processing or significant cost without showing them what they're about to do, that's a missing safeguard. Search for operations that start long-running processes, submit batch jobs, or trigger expansion/fan-out — and check whether the user sees a preview, estimate, or confirmation with real numbers before the point of no return. +- "What happens when a long-running process finishes — does it actually stop?" — Polling loops, watchers, background threads, and daemon processes that run until completion should have explicit termination conditions. If the loop checks "is there more work?" but never checks "is all work done?", it will run forever after completion. This is especially common in batch processors and queue consumers. Generate realistic failure scenarios from this knowledge. You don't need to have observed these failures — you know from training that they happen to systems of this type. Write them as **architectural vulnerability analyses** with specific quantities and consequences. Frame each as "this architecture permits the following failure mode" — not as a fabricated incident report. Use concrete numbers to make the severity non-negotiable: "If the process crashes mid-write during a 10,000-record batch, `save_state()` without an atomic rename pattern will leave a corrupted state file — the next run gets JSONDecodeError and cannot resume without manual intervention." Then ground them in the actual code you explored: "Read persistence.py line ~340 (save_state): verify temp file + rename pattern." 
diff --git a/skills/quality-playbook/references/defensive_patterns.md b/skills/quality-playbook/references/defensive_patterns.md index b515013ed..05070576a 100644 --- a/skills/quality-playbook/references/defensive_patterns.md +++ b/skills/quality-playbook/references/defensive_patterns.md @@ -133,8 +133,69 @@ fn test_defensive_pattern_name() { } ``` +## State Machine Patterns + +State machines are a special category of defensive pattern. When you find status fields, lifecycle phases, or mode flags, trace the full state machine — see SKILL.md Step 5a for the complete process. + +**How to find state machines:** + +| Language | Grep pattern | +|---|---| +| Python | `status`, `state`, `phase`, `mode`, `== "running"`, `== "pending"` | +| Java | `enum.*Status`, `enum.*State`, `.getStatus()`, `switch.*status` | +| Scala | `sealed trait.*State`, `case object`, `status match` | +| TypeScript | `status:`, `state:`, `Status =`, `switch.*status` | +| Go | `Status`, `State`, `type.*Phase`, `switch.*status` | +| Rust | `enum.*State`, `enum.*Status`, `match.*state` | + +**For each state machine found:** + +1. List every possible state value (read the enum or grep for assignments) +2. For each handler/consumer that checks state, verify it handles ALL states +3. Look for states you can enter but never leave (terminal state without cleanup) +4. Look for operations that should be available in a state but are blocked by an incomplete guard + +**Converting state machine gaps to scenarios:** + +```markdown +### Scenario N: [Status] blocks [operation] + +**Requirement tag:** [Req: inferred — from handler() status guard] + +**What happened:** The [handler] only allows [operation] when status is "[allowed_states]", but the system can enter "[missing_state]" status (e.g., due to [condition]). When this happens, the user cannot [operation] and has no workaround through the interface. 
+ +**The requirement:** [operation] must be available in all states where the user would reasonably need it, including [missing_state]. + +**How to verify:** Set up a [entity] in "[missing_state]" status. Attempt [operation]. Assert it succeeds or provides a clear error with a workaround. +``` + +## Missing Safeguard Patterns + +Search for operations that commit the user to expensive, irreversible, or long-running work without adequate preview or confirmation: + +| Pattern | What to look for | +|---|---| +| Pre-commit information gap | Operations that start batch jobs, fan-out expansions, or API calls without showing estimated cost, scope, or duration | +| Silent expansion | Fan-out or multiplication steps where the final work count isn't known until runtime, with no warning shown | +| No termination condition | Polling loops, watchers, or daemon processes that check for new work but never check whether all work is done | +| Retry without backoff | Error handling that retries immediately or on a fixed interval without exponential backoff, risking rate limit floods | + +**Converting missing safeguards to scenarios:** + +```markdown +### Scenario N: No [safeguard] before [operation] + +**Requirement tag:** [Req: inferred — from init_run()/start_watch() behavior] + +**What happened:** [Operation] commits the user to [consequence] without showing [missing information]. In practice, a [example] fanned out from [small number] to [large number] units with no warning, resulting in [cost/time consequence]. + +**The requirement:** Before committing to [operation], display [safeguard] showing [what the user needs to see]. + +**How to verify:** Initiate [operation] and assert that [safeguard information] is displayed before the point of no return. +``` + ## Minimum Bar You should find at least 2–3 defensive patterns per source file in the core logic modules. If you find fewer, read function bodies more carefully — not just signatures and comments. 
-For a medium-sized project (5–15 source files), expect to find 15–30 defensive patterns total. Each one should produce at least one boundary test. +For a medium-sized project (5–15 source files), expect to find 15–30 defensive patterns total. Each one should produce at least one boundary test. Additionally, trace at least one state machine if the project has status/state fields, and check at least one long-running operation for missing safeguards. From 8d1f8c448ba9be961e19a17aee1b48ff8ac242bf Mon Sep 17 00:00:00 2001 From: andrewstellman Date: Sat, 28 Mar 2026 22:44:38 -0400 Subject: [PATCH 4/7] Regenerate docs/README.skills.md for quality-playbook v1.2.0 --- docs/README.skills.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/README.skills.md b/docs/README.skills.md index 259ea5d84..308d3b34a 100644 --- a/docs/README.skills.md +++ b/docs/README.skills.md @@ -221,7 +221,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-skills) for guidelines on how to | [publish-to-pages](../skills/publish-to-pages/SKILL.md) | Publish presentations and web content to GitHub Pages. Converts PPTX, PDF, HTML, or Google Slides to a live GitHub Pages URL. Handles repo creation, file conversion, Pages enablement, and returns the live URL. Use when the user wants to publish, deploy, or share a presentation or HTML file via GitHub Pages. | `scripts/convert-pdf.py`
`scripts/convert-pptx.py`
`scripts/publish.sh` | | [pytest-coverage](../skills/pytest-coverage/SKILL.md) | Run pytest tests with coverage, discover lines missing coverage, and increase coverage to 100%. | None | | [python-mcp-server-generator](../skills/python-mcp-server-generator/SKILL.md) | Generate a complete MCP server project in Python with tools, resources, and proper configuration | None | -| [quality-playbook](../skills/quality-playbook/SKILL.md) | Explore any codebase from scratch and generate six quality artifacts: a quality constitution (QUALITY.md), spec-traced functional tests, a code review protocol with regression test generation, an integration testing protocol, a multi-model spec audit (Council of Three), and an AI bootstrap file (AGENTS.md). Works with any language (Python, Java, Scala, TypeScript, Go, Rust, etc.). Use this skill whenever the user asks to set up a quality playbook, generate functional tests from specifications, create a quality constitution, build testing protocols, audit code against specs, or establish a repeatable quality system for a project. Also trigger when the user mentions 'quality playbook', 'spec audit', 'Council of Three', 'fitness-to-purpose', 'coverage theater', or wants to go beyond basic test generation to build a full quality system grounded in their actual codebase. | `LICENSE.txt`
`references/constitution.md`
`references/defensive_patterns.md`
`references/functional_tests.md`
`references/review_protocols.md`
`references/schema_mapping.md`
`references/spec_audit.md`
`references/verification.md` | +| [quality-playbook](../skills/quality-playbook/SKILL.md) | Explore any codebase from scratch and generate six quality artifacts: a quality constitution (QUALITY.md), spec-traced functional tests, a code review protocol with regression test generation, an integration testing protocol, a multi-model spec audit (Council of Three), and an AI bootstrap file (AGENTS.md). Includes state machine completeness analysis and missing safeguard detection. Works with any language (Python, Java, Scala, TypeScript, Go, Rust, etc.). Use this skill whenever the user asks to set up a quality playbook, generate functional tests from specifications, create a quality constitution, build testing protocols, audit code against specs, or establish a repeatable quality system for a project. Also trigger when the user mentions 'quality playbook', 'spec audit', 'Council of Three', 'fitness-to-purpose', 'coverage theater', or wants to go beyond basic test generation to build a full quality system grounded in their actual codebase. | `LICENSE.txt`
`references/constitution.md`
`references/defensive_patterns.md`
`references/functional_tests.md`
`references/review_protocols.md`
`references/schema_mapping.md`
`references/spec_audit.md`
`references/verification.md` | | [quasi-coder](../skills/quasi-coder/SKILL.md) | Expert 10x engineer skill for interpreting and implementing code from shorthand, quasi-code, and natural language descriptions. Use when collaborators provide incomplete code snippets, pseudo-code, or descriptions with potential typos or incorrect terminology. Excels at translating non-technical or semi-technical descriptions into production-quality code. | None | | [readme-blueprint-generator](../skills/readme-blueprint-generator/SKILL.md) | Intelligent README.md generation prompt that analyzes project documentation structure and creates comprehensive repository documentation. Scans .github/copilot directory files and copilot-instructions.md to extract project information, technology stack, architecture, development workflow, coding standards, and testing approaches while generating well-structured markdown documentation with proper formatting, cross-references, and developer-focused content. | None | | [refactor](../skills/refactor/SKILL.md) | Surgical code refactoring to improve maintainability without changing behavior. Covers extracting functions, renaming variables, breaking down god functions, improving type safety, eliminating code smells, and applying design patterns. Less drastic than repo-rebuilder; use for gradual improvements. 
| None | From 3ef77d9615f213b96842e888392701a47e0b4d0f Mon Sep 17 00:00:00 2001 From: Andrew Stellman Date: Sat, 28 Mar 2026 22:52:16 -0400 Subject: [PATCH 5/7] Switch SKILL.md description to single quotes per repo convention Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- skills/quality-playbook/SKILL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/skills/quality-playbook/SKILL.md b/skills/quality-playbook/SKILL.md index b5242ec42..7fff189bd 100644 --- a/skills/quality-playbook/SKILL.md +++ b/skills/quality-playbook/SKILL.md @@ -1,6 +1,6 @@ --- name: quality-playbook -description: "Explore any codebase from scratch and generate six quality artifacts: a quality constitution (QUALITY.md), spec-traced functional tests, a code review protocol with regression test generation, an integration testing protocol, a multi-model spec audit (Council of Three), and an AI bootstrap file (AGENTS.md). Includes state machine completeness analysis and missing safeguard detection. Works with any language (Python, Java, Scala, TypeScript, Go, Rust, etc.). Use this skill whenever the user asks to set up a quality playbook, generate functional tests from specifications, create a quality constitution, build testing protocols, audit code against specs, or establish a repeatable quality system for a project. Also trigger when the user mentions 'quality playbook', 'spec audit', 'Council of Three', 'fitness-to-purpose', 'coverage theater', or wants to go beyond basic test generation to build a full quality system grounded in their actual codebase." +description: 'Explore any codebase from scratch and generate six quality artifacts: a quality constitution (QUALITY.md), spec-traced functional tests, a code review protocol with regression test generation, an integration testing protocol, a multi-model spec audit (Council of Three), and an AI bootstrap file (AGENTS.md). Includes state machine completeness analysis and missing safeguard detection. 
Works with any language (Python, Java, Scala, TypeScript, Go, Rust, etc.). Use this skill whenever the user asks to set up a quality playbook, generate functional tests from specifications, create a quality constitution, build testing protocols, audit code against specs, or establish a repeatable quality system for a project. Also trigger when the user mentions ''quality playbook'', ''spec audit'', ''Council of Three'', ''fitness-to-purpose'', ''coverage theater'', or wants to go beyond basic test generation to build a full quality system grounded in their actual codebase.' license: Complete terms in LICENSE.txt metadata: version: 1.2.0 From 7c542f10e1c68d4dec25440d2849af709344ab3f Mon Sep 17 00:00:00 2001 From: Andrew Stellman Date: Sat, 28 Mar 2026 22:55:51 -0400 Subject: [PATCH 6/7] Use language-appropriate filenames for regression tests in SKILL.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- skills/quality-playbook/SKILL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/skills/quality-playbook/SKILL.md b/skills/quality-playbook/SKILL.md index 7fff189bd..859d091f2 100644 --- a/skills/quality-playbook/SKILL.md +++ b/skills/quality-playbook/SKILL.md @@ -255,7 +255,7 @@ Key sections: bootstrap files, focus areas mapped to architecture, and these man - Grep before claiming missing - Do NOT suggest style changes — only flag things that are incorrect -**Phase 2: Regression tests.** After the review produces BUG findings, write regression tests in `quality/test_regression.*` that reproduce each bug. Each test should fail on the current implementation, confirming the bug is real. Report results as a confirmation table (BUG CONFIRMED / FALSE POSITIVE / NEEDS INVESTIGATION). See `references/review_protocols.md` for the full regression test protocol. 
+**Phase 2: Regression tests.** After the review produces BUG findings, write regression tests that reproduce each bug, using the language-appropriate filenames described in `references/review_protocols.md` (for example, Go: `regression_test.go`, TypeScript: `regression.test.ts`). Each test should fail on the current implementation, confirming the bug is real. Report results as a confirmation table (BUG CONFIRMED / FALSE POSITIVE / NEEDS INVESTIGATION). See `references/review_protocols.md` for the full regression test protocol. ### File 4: `quality/RUN_INTEGRATION_TESTS.md` From 31971f3a72ecf5025366a9eebc158b401e9ff3c3 Mon Sep 17 00:00:00 2001 From: andrewstellman Date: Sat, 28 Mar 2026 23:00:14 -0400 Subject: [PATCH 7/7] Regenerate docs/README.skills.md --- docs/README.skills.md | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/README.skills.md b/docs/README.skills.md index 308d3b34a..c12c553ca 100644 --- a/docs/README.skills.md +++ b/docs/README.skills.md @@ -221,7 +221,6 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-skills) for guidelines on how to | [publish-to-pages](../skills/publish-to-pages/SKILL.md) | Publish presentations and web content to GitHub Pages. Converts PPTX, PDF, HTML, or Google Slides to a live GitHub Pages URL. Handles repo creation, file conversion, Pages enablement, and returns the live URL. Use when the user wants to publish, deploy, or share a presentation or HTML file via GitHub Pages. | `scripts/convert-pdf.py`
`scripts/convert-pptx.py`
`scripts/publish.sh` | | [pytest-coverage](../skills/pytest-coverage/SKILL.md) | Run pytest tests with coverage, discover lines missing coverage, and increase coverage to 100%. | None | | [python-mcp-server-generator](../skills/python-mcp-server-generator/SKILL.md) | Generate a complete MCP server project in Python with tools, resources, and proper configuration | None | -| [quality-playbook](../skills/quality-playbook/SKILL.md) | Explore any codebase from scratch and generate six quality artifacts: a quality constitution (QUALITY.md), spec-traced functional tests, a code review protocol with regression test generation, an integration testing protocol, a multi-model spec audit (Council of Three), and an AI bootstrap file (AGENTS.md). Includes state machine completeness analysis and missing safeguard detection. Works with any language (Python, Java, Scala, TypeScript, Go, Rust, etc.). Use this skill whenever the user asks to set up a quality playbook, generate functional tests from specifications, create a quality constitution, build testing protocols, audit code against specs, or establish a repeatable quality system for a project. Also trigger when the user mentions 'quality playbook', 'spec audit', 'Council of Three', 'fitness-to-purpose', 'coverage theater', or wants to go beyond basic test generation to build a full quality system grounded in their actual codebase. | `LICENSE.txt`
`references/constitution.md`
`references/defensive_patterns.md`
`references/functional_tests.md`
`references/review_protocols.md`
`references/schema_mapping.md`
`references/spec_audit.md`
`references/verification.md` | | [quasi-coder](../skills/quasi-coder/SKILL.md) | Expert 10x engineer skill for interpreting and implementing code from shorthand, quasi-code, and natural language descriptions. Use when collaborators provide incomplete code snippets, pseudo-code, or descriptions with potential typos or incorrect terminology. Excels at translating non-technical or semi-technical descriptions into production-quality code. | None | | [readme-blueprint-generator](../skills/readme-blueprint-generator/SKILL.md) | Intelligent README.md generation prompt that analyzes project documentation structure and creates comprehensive repository documentation. Scans .github/copilot directory files and copilot-instructions.md to extract project information, technology stack, architecture, development workflow, coding standards, and testing approaches while generating well-structured markdown documentation with proper formatting, cross-references, and developer-focused content. | None | | [refactor](../skills/refactor/SKILL.md) | Surgical code refactoring to improve maintainability without changing behavior. Covers extracting functions, renaming variables, breaking down god functions, improving type safety, eliminating code smells, and applying design patterns. Less drastic than repo-rebuilder; use for gradual improvements. | None |