fix: address high + medium priority audit findings from #234 (#237)
Merged
Alan-Jowett merged 3 commits into microsoft:main on Apr 9, 2026
High priority:
- P3-001: Compress self-verification Rule 2 by replacing restated content with a cross-reference to anti-hallucination Rules 1-4 (~7,400 token savings)
- P4-001: Decompose review-code abstraction check into 3 observable checks

Medium, guardrail determinism (74x blast radius):
- P4-005/P4-006: Replace 'reasonable conclusion' with 'stated chain of logical steps'; anchor 'low confidence' to ASSUMED premise count
- P4-008: Operationalize 'structurally similar' as same headings, same item count (±20%), same classification labels
- P3-002/P1-001: Deduplicate 4-field coverage statement; self-verification Rule 3 now cross-references operational-constraints Rule 9 (~1,140 token savings)
- P4-002/P4-003: Replace subjective maintainability checks with 4 readability indicators and 4 design violation checks

Medium, structural:
- P1-002: Update adversarial-falsification applicable_to to match actual template consumers
- P1-003: Document 4 orphan protocols as user-composable with intended use cases
- P1-004: Merge plan-implementation and plan-refactoring into single template with mode parameter

Closes microsoft#234

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
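The 'structurally similar' criterion from P4-008 (same headings, same item count within ±20%, same classification labels) could be checked mechanically along the following lines. This is a minimal sketch, not code from the PR: the `OutputShape` type and the `structurally_similar` function are hypothetical, and two interpretations are assumed here, namely that headings must match in order and that the 20% tolerance is measured against the larger of the two item counts.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class OutputShape:
    """Hypothetical summary of one generated output (not part of the PR)."""
    headings: List[str]
    item_count: int
    labels: Set[str] = field(default_factory=set)


def structurally_similar(a: OutputShape, b: OutputShape,
                         tolerance: float = 0.20) -> bool:
    """Return True when two outputs satisfy all three criteria."""
    # Criterion 1: same headings (assumed here to mean same order as well).
    if a.headings != b.headings:
        return False
    # Criterion 2: same classification labels.
    if a.labels != b.labels:
        return False
    # Criterion 3: item counts within the tolerance.
    # Assumption: the tolerance is relative to the larger count.
    larger = max(a.item_count, b.item_count)
    if larger == 0:
        return True
    return abs(a.item_count - b.item_count) <= tolerance * larger


if __name__ == "__main__":
    first = OutputShape(["Summary", "Findings"], 10, {"High", "Medium"})
    second = OutputShape(["Summary", "Findings"], 12, {"High", "Medium"})
    print(structurally_similar(first, second))
```

Turning the phrase into three independent boolean checks is what makes the guardrail repeatable: two reviewers applying it to the same pair of outputs should reach the same verdict.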
Pull request overview
This PR addresses the High and Medium priority findings from library health audit #234 by tightening guardrail determinism, deduplicating overlapping guardrail content (token savings), and cleaning up structural inconsistencies (template/protocol applicability and template consolidation).
Changes:
- Compressed/deduplicated guardrail text by cross-referencing existing protocols (notably self-verification → anti-hallucination and operational-constraints).
- Replaced subjective checks with more operationalized criteria in review-code and guardrails.
- Consolidated planning templates by merging refactoring planning into plan-implementation and removing plan-refactoring.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| templates/review-code.md | Replaces subjective maintainability prompts with a more checkable checklist. |
| templates/plan-refactoring.md | Deleted as part of consolidation into plan-implementation. |
| templates/plan-implementation.md | Adds mode to support implementation vs refactoring planning in one template. |
| protocols/reasoning/fixed-point-verification.md | Marks protocol as user-composable and documents intended use. |
| protocols/guardrails/tool-reliability-defense.md | Marks protocol as user-composable and documents intended use. |
| protocols/guardrails/self-verification.md | Deduplicates citation/coverage guidance via cross-references; tightens determinism definition. |
| protocols/guardrails/input-clarity-gate.md | Marks protocol as user-composable and documents intended use. |
| protocols/guardrails/definition-of-done.md | Marks protocol as user-composable and documents intended use. |
| protocols/guardrails/anti-hallucination.md | Replaces subjective wording with more explicit/deterministic criteria. |
| protocols/guardrails/adversarial-falsification.md | Updates applicable_to list to reflect actual template consumers. |
| manifest.yaml | Removes plan-refactoring template entry; updates plan-implementation description accordingly. |
- self-verification: add epistemic label as explicit remediation option
in Citation Audit (not just citation or removal)
- applicable_to: revert 'composable' sentinel to '[]' per CONTRIBUTING.md
convention (definition-of-done, input-clarity-gate,
tool-reliability-defense, fixed-point-verification)
- review-code: replace non-observable maintainability checks ('10-second
scan', '3 unrelated concepts') with concrete, checkable criteria
- plan-implementation: add explicit mode validation (only 'implementation'
or 'refactoring'; default to 'implementation'), update input_contract
type to include source-code for refactoring mode
- docs: update CATALOG.md, README.md, getting-started.md to reference
plan-implementation with mode=refactoring instead of deleted
plan-refactoring template
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
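The mode validation described for plan-implementation (only 'implementation' or 'refactoring', defaulting to 'implementation') can be illustrated with a small sketch. The template itself is a prompt document, so this Python function is purely illustrative: `resolve_mode`, `VALID_MODES`, and `DEFAULT_MODE` are hypothetical names, and it is an assumption that a missing value falls back to the default while an unrecognized value is rejected.

```python
from typing import Optional

# Hypothetical constants mirroring the two modes named in the commit message.
VALID_MODES = ("implementation", "refactoring")
DEFAULT_MODE = "implementation"


def resolve_mode(mode: Optional[str] = None) -> str:
    """Return a validated mode, falling back to the default when none is given."""
    # Assumption: an absent or blank mode means "use the default".
    if mode is None or mode.strip() == "":
        return DEFAULT_MODE
    normalized = mode.strip().lower()
    # Assumption: anything other than the two known modes is an error.
    if normalized not in VALID_MODES:
        raise ValueError(
            f"mode must be one of {VALID_MODES}, got {mode!r}"
        )
    return normalized


if __name__ == "__main__":
    print(resolve_mode())
    print(resolve_mode("refactoring"))
```

Rejecting unknown values rather than silently defaulting keeps a typo such as 'refactor' from producing an implementation plan by accident; if the template instead intends every invalid value to fall back to the default, the `raise` would become a `return DEFAULT_MODE`.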
The anti-hallucination protocol uses [ASSUMPTION] as the inline tag for the ASSUMED category. Using [ASSUMED] here conflicted with the canonical tag set and introduced inconsistency within the same sentence. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Addresses all High and Medium priority findings from the library health audit (#234). 11 files changed across protocols, templates, and manifest — net token savings of ~8,500 tokens across the guardrail corpus (amplified 74× across all templates).
Changes
High Priority
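- self-verification.md: Rule 2 now cross-references anti-hallucination Rules 1–4 instead of restating them. ~7,400 token savings.
- review-code.md: abstraction check decomposed into 3 observable checks.
Medium: Guardrail Determinism (74× blast radius)
- anti-hallucination.md: 'reasonable conclusion' replaced with 'stated chain of logical steps'; 'low confidence' anchored to ASSUMED premise count.
- self-verification.md: 'structurally similar' operationalized as same headings, same item count (±20%), same classification labels.
- self-verification.md: 4-field coverage statement deduplicated; Rule 3 now cross-references operational-constraints Rule 9. ~1,140 token savings.
- review-code.md: subjective maintainability checks replaced with 4 readability indicators and 4 design violation checks.
Medium: Structural
- adversarial-falsification.md: applicable_to updated to match 6 actual template consumers (was listing 3 non-consumers, missing 5 real ones).
- definition-of-done, tool-reliability-defense, input-clarity-gate, fixed-point-verification: applicable_to updated from [] to composable with documented intended use cases.
- plan-implementation.md: plan-refactoring merged into plan-implementation with a mode parameter (implementation|refactoring). plan-refactoring.md deleted.
Validation
- python tests/validate-manifest.py: manifest ↔ template protocol sync passes
Not Addressed (Low priority, deferred)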
P1-005, P1-006, P1-007, P1-010, P1-011, P3-005, P3-006 — per scope decision.
Closes #234