fix: address high + medium priority audit findings from #234 #237

Merged
Alan-Jowett merged 3 commits into microsoft:main from Alan-Jowett:fix/issue-234-audit-findings
Apr 9, 2026

Conversation

@Alan-Jowett
Member

Summary

Addresses all High and Medium priority findings from the library health audit (#234). 11 files changed across protocols, templates, and manifest — net token savings of ~8,500 tokens across the guardrail corpus (amplified 74× across all templates).

Changes

High Priority

| Finding | File | Change |
|---|---|---|
| P3-001 | self-verification.md | Rule 2 (Citation Audit) compressed to cross-reference anti-hallucination Rules 1–4 instead of restating them. ~7,400 token savings. |
| P4-001 | review-code.md | Subjective abstraction check ("not too much, not too little") decomposed into 3 observable checks: >3 unrelated concepts, duplicates sibling logic, unclear responsibility. |

Medium — Guardrail Determinism (74× blast radius)

| Finding | File | Change |
|---|---|---|
| P4-005/006 | anti-hallucination.md | L27: "reasonable conclusion" → "stated chain of logical steps". L48: "low confidence" anchored to ≥2 ASSUMED premises from Rule 1. |
| P4-008 | self-verification.md | Rule 6: "structurally similar" operationalized as "same section headings, same item count (±20%), same classification labels". |
| P3-002/P1-001 | self-verification.md | Rule 3 coverage statement deduplicated; it now cross-references operational-constraints Rule 9. ~1,140 token savings. |
| P4-002/003 | review-code.md | Subjective maintainability checks replaced with 4 readability indicators + 4 design violation checks. |
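The P4-008 criterion is concrete enough to express as a predicate. The following is a minimal sketch, not code from this repository: `OutputShape` and `structurally_similar` are hypothetical names, and measuring the ±20% tolerance relative to the first run's item count is an assumption, since the protocol text quoted above does not say which run is the baseline.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class OutputShape:
    """Hypothetical summary of one generated output (not a type from this repo)."""
    headings: List[str]
    item_count: int
    labels: Set[str] = field(default_factory=set)


def structurally_similar(a: OutputShape, b: OutputShape, tolerance: float = 0.20) -> bool:
    """Return True when two runs meet all three criteria named in P4-008."""
    same_headings = a.headings == b.headings
    same_labels = a.labels == b.labels
    # Assumption: the tolerance is relative to the first run's item count.
    if a.item_count == 0:
        count_ok = b.item_count == 0
    else:
        count_ok = abs(a.item_count - b.item_count) <= tolerance * a.item_count
    return same_headings and count_ok and same_labels


run_1 = OutputShape(["Summary", "Findings"], 10, {"High", "Medium"})
run_2 = OutputShape(["Summary", "Findings"], 12, {"High", "Medium"})
run_3 = OutputShape(["Summary", "Findings"], 13, {"High", "Medium"})

print(structurally_similar(run_1, run_2))  # 12 items is within 20% of 10
print(structurally_similar(run_1, run_3))  # 13 items is outside 20% of 10
```

Treating the three criteria as a conjunction means a run that matches on headings but drifts on labels still counts as a determinism failure, which appears to be the intent of replacing the subjective wording.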

Medium — Structural

| Finding | File | Change |
|---|---|---|
| P1-002 | adversarial-falsification.md | applicable_to updated to match 6 actual template consumers (was listing 3 non-consumers, missing 5 real ones). |
| P1-003 | 4 orphan protocols | definition-of-done, tool-reliability-defense, input-clarity-gate, fixed-point-verification: applicable_to updated from [] to composable with documented intended use cases. |
| P1-004 | plan-implementation.md | plan-refactoring merged into plan-implementation with a mode parameter (implementation \| refactoring). plan-refactoring.md deleted. |
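The P1-002 drift (3 listed non-consumers, 5 missing consumers, 6 actual consumers) is a pair of set differences. The sketch below only illustrates that arithmetic. The function name and every template name in it are hypothetical placeholders, not the real consumers of adversarial-falsification.

```python
from typing import Set, Tuple


def applicable_to_drift(declared: Set[str], actual: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (non_consumers, missing) for one protocol's applicable_to list."""
    non_consumers = declared - actual  # listed, but no template uses the protocol
    missing = actual - declared        # templates use the protocol, but are not listed
    return non_consumers, missing


# Placeholder names chosen only to reproduce the counts reported for P1-002.
declared = {"template-a", "stale-x", "stale-y", "stale-z"}
actual = {"template-a", "template-b", "template-c",
          "template-d", "template-e", "template-f"}

non_consumers, missing = applicable_to_drift(declared, actual)
print(sorted(non_consumers))
print(sorted(missing))
```

With 6 actual consumers and 5 of them missing, the old list can have contained only one correct entry, which is why the fix replaced the list rather than patching it.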

Validation

  • python tests/validate-manifest.py — manifest ↔ template protocol sync passes
  • ✅ All edits preserve existing protocol independence (no conflicting instructions when composed)

Not Addressed (Low priority — deferred)

P1-005, P1-006, P1-007, P1-010, P1-011, P3-005, P3-006 — per scope decision.

Closes #234

High priority:
- P3-001: Compress self-verification Rule 2 — replace restated content
  with cross-reference to anti-hallucination Rules 1-4 (~7,400 token savings)
- P4-001: Decompose review-code abstraction check into 3 observable checks

Medium — guardrail determinism (74x blast radius):
- P4-005/P4-006: Replace 'reasonable conclusion' with 'stated chain of
  logical steps'; anchor 'low confidence' to ASSUMED premise count
- P4-008: Operationalize 'structurally similar' as same headings, same
  item count (+-20%), same classification labels
- P3-002/P1-001: Deduplicate 4-field coverage statement — self-verification
  Rule 3 now cross-references operational-constraints Rule 9 (~1,140 token savings)
- P4-002/P4-003: Replace subjective maintainability checks with 4 readability
  indicators and 4 design violation checks

Medium — structural:
- P1-002: Update adversarial-falsification applicable_to to match actual
  template consumers
- P1-003: Document 4 orphan protocols as user-composable with intended
  use cases
- P1-004: Merge plan-implementation and plan-refactoring into single
  template with mode parameter

Closes microsoft#234

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 8, 2026 23:45
Contributor

Copilot AI left a comment

Pull request overview

This PR addresses the High and Medium priority findings from library health audit #234 by tightening guardrail determinism, deduplicating overlapping guardrail content (token savings), and cleaning up structural inconsistencies (template/protocol applicability and template consolidation).

Changes:

  • Compressed/deduplicated guardrail text by cross-referencing existing protocols (notably self-verification → anti-hallucination and operational-constraints).
  • Replaced subjective checks with more operationalized criteria in review-code and guardrails.
  • Consolidated planning templates by merging refactoring planning into plan-implementation and removing plan-refactoring.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 10 comments.

| File | Description |
|---|---|
| templates/review-code.md | Replaces subjective maintainability prompts with a more checkable checklist. |
| templates/plan-refactoring.md | Deleted as part of consolidation into plan-implementation. |
| templates/plan-implementation.md | Adds mode to support implementation vs refactoring planning in one template. |
| protocols/reasoning/fixed-point-verification.md | Marks protocol as user-composable and documents intended use. |
| protocols/guardrails/tool-reliability-defense.md | Marks protocol as user-composable and documents intended use. |
| protocols/guardrails/self-verification.md | Deduplicates citation/coverage guidance via cross-references; tightens determinism definition. |
| protocols/guardrails/input-clarity-gate.md | Marks protocol as user-composable and documents intended use. |
| protocols/guardrails/definition-of-done.md | Marks protocol as user-composable and documents intended use. |
| protocols/guardrails/anti-hallucination.md | Replaces subjective wording with more explicit/deterministic criteria. |
| protocols/guardrails/adversarial-falsification.md | Updates applicable_to list to reflect actual template consumers. |
| manifest.yaml | Removes plan-refactoring template entry; updates plan-implementation description accordingly. |

- self-verification: add epistemic label as explicit remediation option
  in Citation Audit (not just citation or removal)
- applicable_to: revert 'composable' sentinel to '[]' per CONTRIBUTING.md
  convention (definition-of-done, input-clarity-gate,
  tool-reliability-defense, fixed-point-verification)
- review-code: replace non-observable maintainability checks ('10-second
  scan', '3 unrelated concepts') with concrete, checkable criteria
- plan-implementation: add explicit mode validation (only 'implementation'
  or 'refactoring'; default to 'implementation'), update input_contract
  type to include source-code for refactoring mode
- docs: update CATALOG.md, README.md, getting-started.md to reference
  plan-implementation with mode=refactoring instead of deleted
  plan-refactoring template

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
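The mode rule added to plan-implementation in the commit above (only 'implementation' or 'refactoring', defaulting to 'implementation') can be pictured as a small validator. This is a sketch under assumptions: the template itself is a Markdown prompt rather than code, `resolve_mode` is a hypothetical name, and the choice to default only when the mode is omitted while rejecting unknown values is one reading of the commit message, not confirmed behavior.

```python
from typing import Optional

VALID_MODES = ("implementation", "refactoring")
DEFAULT_MODE = "implementation"


def resolve_mode(mode: Optional[str] = None) -> str:
    """Return a validated planning mode.

    Assumption: an omitted or blank mode falls back to the default,
    while any other unrecognized value is rejected.
    """
    if mode is None or mode.strip() == "":
        return DEFAULT_MODE
    normalized = mode.strip().lower()
    if normalized not in VALID_MODES:
        raise ValueError(
            f"mode must be one of {VALID_MODES}, got {mode!r}"
        )
    return normalized


print(resolve_mode())               # falls back to the default mode
print(resolve_mode("refactoring"))  # accepted as given
```

Rejecting unknown values instead of silently defaulting keeps a typo such as `refactor` from producing an implementation plan when a refactoring plan was intended.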
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.

The anti-hallucination protocol uses [ASSUMPTION] as the inline tag for
the ASSUMED category. Using [ASSUMED] here conflicted with the canonical
tag set and introduced inconsistency within the same sentence.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Alan-Jowett Alan-Jowett merged commit 8363a14 into microsoft:main Apr 9, 2026
4 checks passed
@Alan-Jowett Alan-Jowett deleted the fix/issue-234-audit-findings branch April 9, 2026 15:30

Development

Successfully merging this pull request may close these issues.

Library health audit findings: guardrail compression, determinism fixes, structural cleanup
