diff --git a/.flowr/flows/planning-flow.yaml b/.flowr/flows/planning-flow.yaml index e956196..5fcb4fa 100644 --- a/.flowr/flows/planning-flow.yaml +++ b/.flowr/flows/planning-flow.yaml @@ -143,7 +143,7 @@ states: out: [] conditions: feature-baselined: - feature-status: ==BASELINED + baseline-confirmed: ==verified committed-to-main-locally: committed-to-main-locally: ==verified next: diff --git a/.opencode/knowledge/requirements/feature-discovery.md b/.opencode/knowledge/requirements/feature-discovery.md index 0b77355..bf05845 100644 --- a/.opencode/knowledge/requirements/feature-discovery.md +++ b/.opencode/knowledge/requirements/feature-discovery.md @@ -13,7 +13,7 @@ last-updated: 2026-05-08 - Feature boundaries respect bounded context borders, aggregate transactional boundaries, and module dependency order per [[requirements/feature-boundaries]]. Features that span boundaries are flagged for splitting. - Rules are derived systematically from three sources: domain events, aggregate invariants, and commands per [[requirements/rule-derivation]]. Every rule traces to at least one domain model artifact. - Gaps discovered during feature discovery (a bounded context with no feature, a quality attribute with no enforcing feature, a domain event with no corresponding rule) are flagged, not silently filled. -- Features have a lifecycle of increasing specificity: `Status: ELICITING` through discovery and breakdown, advancing to `BASELINED` after baseline confirmation. +- Features progress through a lifecycle of increasing specificity: an empty file with a description → coarse Rules (Business) and Constraints → full Rule blocks with @id-tagged Examples. 
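To make the end state of that lifecycle concrete, a hypothetical fragment of a fully elaborated .feature file (feature name, rule, and @id are invented for illustration, not taken from this diff):

```gherkin
Feature: Account withdrawal

  A customer can withdraw funds from their account.

  Rule: A withdrawal must never exceed the available balance

    As an account holder
    I want overdraft attempts to be rejected
    So that my balance can never go negative

    @id:9f3c2a1b
    Example: Withdrawal exceeding the balance is rejected
      Given an account with balance 50
      When the account holder withdraws 80
      Then the withdrawal is rejected and the balance remains 50
```

The `@id:` tag carries the 8-char random hex suffix that the selection pipeline later greps for.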
## Concepts @@ -26,9 +26,9 @@ last-updated: 2026-05-08 **Gap Analysis**: Systematically verify coverage across three dimensions: (1) every bounded context from the domain model is covered by at least one feature, (2) every quality attribute from the product definition is enforced by at least one feature's constraints, (3) every critical domain event is traceable to at least one business rule. Uncovered areas indicate missing features or gaps in the domain model itself. Flag both. **Feature Lifecycle**: Features follow a lifecycle of increasing specificity across phases: -1. **Discovery**: Feature boundaries identified, coarse business rules written, constraints scoped. Status: ELICITING. -2. **Breakdown**: Coarse rules expanded into full Rule blocks with As a/I want/So that format. INVEST validation applied. Status remains ELICITING. -3. **Example Writing and Baseline**: Given/When/Then Examples written, pre-mortems applied, baseline confirmed. Status advances to BASELINED. +1. **Discovery**: Feature boundaries identified, coarse business rules written, constraints scoped. +2. **Breakdown**: Coarse rules expanded into full Rule blocks with As a/I want/So that format. INVEST validation applied. +3. **Example Writing and Baseline**: Given/When/Then Examples written, pre-mortems applied, baseline confirmed (feature now has @id-tagged Examples). ## Content @@ -36,7 +36,7 @@ last-updated: 2026-05-08 Feature discovery is two sequential activities: -1. **Boundary identification** (discover-features skill): Use the delivery order as backbone. Map each step to bounded contexts and aggregates from the domain model. Split candidates that span contexts or aggregates. Name features and write descriptions per [[requirements/feature-boundaries]]. Create .feature files with title, description, Status: ELICITING, and an empty Questions table. +1. **Boundary identification** (discover-features skill): Use the delivery order as backbone. 
Map each step to bounded contexts and aggregates from the domain model. Split candidates that span contexts or aggregates. Name features and write descriptions per [[requirements/feature-boundaries]]. Create .feature files with title, description, and an empty Questions table. 2. **Rule derivation** (discover-rules skill): For each feature, assign domain model artifacts (entities, events, invariants, commands) based on bounded context membership. Derive behavioral rules from events, structural rules from invariants, and action rules from commands per [[requirements/rule-derivation]]. Map quality attributes to constraints. Write coarse Rules (Business) bullets and Constraints into each .feature file. diff --git a/.opencode/knowledge/requirements/gherkin.md b/.opencode/knowledge/requirements/gherkin.md index 4317aa4..b4ac731 100644 --- a/.opencode/knowledge/requirements/gherkin.md +++ b/.opencode/knowledge/requirements/gherkin.md @@ -13,6 +13,7 @@ last-updated: 2026-04-29 - `Then` must be a single, observable, measurable outcome; no "and" combining multiple behaviours in one `Then`. - Bug Examples use `@bug` and require both a specific feature test and a Hypothesis property test. - After criteria commit, Examples are frozen; changes require `@deprecated` on the old Example and a new Example with a new `@id`. +- Two Examples with the same `Then` outcome but different input values test the same behaviour; partition by behaviour outcome, not by input value (Wynne, 2015; Adzic, 2011). ## Concepts @@ -26,6 +27,8 @@ last-updated: 2026-04-29 **Bug Examples**: When a defect is reported, add an `@bug` Example. Implement both a specific `@id` test and a Hypothesis property test covering the whole class of inputs. Both are required. +**Behavioral Distinctness**: Two Examples are behavior-distinct only when they produce different `Then` outcomes (Wynne, 2015; Adzic, 2011). 
Partitioning by behaviour outcome rather than by input value avoids the combinatorial explosion of value-distinct testing. Two Examples with the same `Then` but different input values test the same behaviour — keep one, discard the duplicates. For action and behavioural rules, each distinct outcome gets one representative Example. For structural (invariant) rules, one representative Example suffices because the invariant holds across all inputs; full coverage is deferred to a Hypothesis property test per [[software-craft/test-design#concepts]]. + ## Content ### Declarative vs Imperative @@ -84,6 +87,7 @@ Implement both: - Multiple behaviours in one Example: split them - Examples that test implementation details ("Then: the Strategy pattern is used") - Imperative UI steps instead of declarative behaviour descriptions +- Two examples with the same `Then` but different input values: duplicate behaviour coverage per [[requirements/gherkin#concepts]] ### Feature File Path Convention @@ -98,4 +102,5 @@ Test path conventions (`tests/features//`), the feature-test vs un - [[requirements/invest]]: story quality criteria for rules - [[requirements/moscow]]: prioritizing Examples as Must/Should/Could - [[requirements/decomposition]]: splitting Rules with too many Examples -- [[requirements/pre-mortem]]: finding hidden failure modes in rules \ No newline at end of file +- [[requirements/pre-mortem]]: finding hidden failure modes in rules +- [[software-craft/test-design]]: property-based testing for invariant rules \ No newline at end of file diff --git a/.opencode/knowledge/requirements/pre-mortem.md b/.opencode/knowledge/requirements/pre-mortem.md index 3643d91..47dec2a 100644 --- a/.opencode/knowledge/requirements/pre-mortem.md +++ b/.opencode/knowledge/requirements/pre-mortem.md @@ -9,8 +9,9 @@ last-updated: 2026-04-29 ## Key Takeaways - Prospective hindsight catches approximately 30% more issues than forward-looking review (Klein, 1998); frame the question as "it already 
failed: why?" to activate explanation mode. -- Apply the pre-mortem at three levels of granularity: specification (missing observable behaviours), architecture (design principle violations), and implementation (design self-declaration). +- Apply the pre-mortem at four levels of granularity: specification (missing observable behaviours), behavior (failure modes per distinct outcome), architecture (design principle violations), and implementation (design self-declaration). - At specification: "Imagine this feature was built exactly as described, all tests pass, but it doesn't work for the user. What would be missing?" +- At behavior: "Imagine this specific behaviour went wrong in production — how?" Run per distinct `Then` outcome after grouping Examples per [[requirements/gherkin#concepts]]; add Examples for surfaced failure modes. - At architecture: for each candidate class check [[software-craft/object-calisthenics#key-takeaways]] and [[software-craft/solid#key-takeaways]]; for each external dependency check [[architecture/hexagonal#key-takeaways]]; for each noun check if it serves double duty across modules. - All pre-mortems are enforced by condition gates in the flow: they are not optional exercises. @@ -20,6 +21,12 @@ last-updated: 2026-04-29 **Specification Pre-Mortem**: Ask "What observable behaviours must we prove for this Rule to be complete?" This surfaces hidden requirements that forward-looking analysis misses. +**Behavior Pre-Mortem**: Ask "Imagine this specific behaviour went wrong in production — how would it fail?" Once Examples are grouped by distinct `Then` outcome per [[requirements/gherkin#concepts]], run this pre-mortem for each outcome independently. The framing varies by rule type: +- **Action rules**: "A user performs this action. What subtle real-world conditions would cause it to produce the wrong result?" (e.g., concurrent writes, stale reads, rounding, timezone shifts) +- **Behavioural rules**: "The system applies this business rule. 
What edge-case inputs would expose a gap in the logic?" (e.g., boundary crossing, empty/zero/null, ordering dependency) +- **Structural/invariant rules**: "This invariant must always hold. What counterexamples would break it?" — surface candidate counterexamples, then capture them in a Hypothesis property test per [[software-craft/test-design#concepts]] rather than as additional BDD Examples. +Add Examples for the failure modes surfaced. This is a distinct level from specification pre-mortem: specification asks "what behaviours are missing from the rule?"; behavior asks "how could this specific outcome fail in production?" per the prospective hindsight mechanism (Klein, 1998). + **Architecture Pre-Mortem**: Ask "In 6 months this design is a mess. What mistakes did we make?" Check each candidate class per [[software-craft/object-calisthenics]] and [[software-craft/solid]]. Check each external dependency per [[architecture/hexagonal]]. Check each noun for cross-module double duty. **Flow Condition Gates**: Pre-mortem completion is enforced by condition gates in the flow YAML. Self-declaration uses explicit AGREE/DISAGREE commitments (a commitment device (Cialdini, 2001) that makes the declaration psychologically binding). Adversarial framing during pre-mortem analysis ("find what's wrong" rather than "confirm it's right") uses adversarial collaboration (Mellers et al., 2001) to produce stronger reasoning. @@ -34,6 +41,16 @@ Ask: Record the findings in the feature's Questions section or as additional Rules. +### Behavior Pre-Mortem + +Once Examples are grouped by distinct `Then` outcome per [[requirements/gherkin#concepts]], run for each outcome: + +- **Action rules**: "A user performs this action. What subtle real-world conditions would cause it to produce the wrong result?" (e.g., concurrent writes, stale reads, rounding, timezone shifts) +- **Behavioural rules**: "The system applies this business rule. What edge-case inputs would expose a gap in the logic?" 
(e.g., boundary crossing, empty/zero/null, ordering dependency) +- **Structural/invariant rules**: "This invariant must always hold. What counterexamples would break it?" — surface candidate counterexamples, then capture them in a Hypothesis property test per [[software-craft/test-design#concepts]] rather than as additional BDD Examples. + +Add Examples for the failure modes surfaced. This is a distinct level from specification pre-mortem: specification asks "what behaviours are missing from the rule?"; behavior asks "how could this specific outcome fail in production?" + ### Architecture Pre-Mortem Ask: @@ -58,4 +75,5 @@ The design self-declaration covers YAGNI, KISS, DRY, Object Calisthenics per [[s - [[software-craft/tdd]]: design self-declaration subsumes the implementation pre-mortem - [[software-craft/object-calisthenics]]: ObjCal-7 (two instance variables) checked in architecture pre-mortem - [[software-craft/smell-catalogue]]: pattern smells checked in implementation pre-mortem -- [[software-craft/solid]]: SOLID checks in implementation pre-mortem \ No newline at end of file +- [[software-craft/solid]]: SOLID checks in implementation pre-mortem +- [[software-craft/test-design]]: property-based testing for structural/invariant rules \ No newline at end of file diff --git a/.opencode/knowledge/requirements/wsjf.md b/.opencode/knowledge/requirements/wsjf.md index 3041116..073aef8 100644 --- a/.opencode/knowledge/requirements/wsjf.md +++ b/.opencode/knowledge/requirements/wsjf.md @@ -12,7 +12,7 @@ last-updated: 2026-05-04 - Value (1-5) maps to Kano categories: 5=Must-have (core workflow blocked), 4=High, 3=Medium (performance), 2=Low (delighter), 1=Minimal (cosmetic). - Effort (1-5) maps to complexity: 1=Trivial (no new domain concepts), 2=Small (one new entity), 3=Medium (cross-cutting), 4=Large (multiple entities), 5=Very large (spans modules). 
- Dependency=1 features are ineligible regardless of WSJF score; ties broken by Value; if all features have Dependency=1, resolve the blocking dependency first. -- Only features with `Status: BASELINED` are eligible for WSJF scoring; WIP limit is 1. +- Only features with @id-tagged Examples (confirmed by baseline) are eligible for WSJF scoring; WIP limit is 1. ## Concepts @@ -76,7 +76,7 @@ Estimate implementation complexity: ### Prerequisites -- Only features with `Status: BASELINED` are eligible for WSJF scoring +- Only features with @id-tagged Examples (confirmed by baseline) are eligible for WSJF scoring - WIP limit of 1: only one feature in progress at a time - The PO selects and moves the feature; no other agent moves feature files diff --git a/.opencode/knowledge/software-craft/test-design.md b/.opencode/knowledge/software-craft/test-design.md index 917735b..be6af00 100644 --- a/.opencode/knowledge/software-craft/test-design.md +++ b/.opencode/knowledge/software-craft/test-design.md @@ -13,6 +13,7 @@ last-updated: 2026-04-29 - Test coupling exists on a spectrum: feature tests (most resilient) > unit contract tests > property-based tests > white-box tests (most brittle, avoid). - One observable behaviour per test: each test should fail for exactly one reason and pass for exactly one reason. - Hard-coded values are acceptable when the test only requires that value; parameterising prematurely couples the test to assumptions about future needs. +- Property tests: all invariant/structural rules, not just @bug Examples. Examples alone cannot prove an invariant (MacIver, 2016). ## Concepts @@ -26,6 +27,8 @@ last-updated: 2026-04-29 **Semantic Depth**. A test that exists for an @id tag but exercises domain logic directly instead of through the entry point described in the acceptance criterion has correct structural traceability but wrong semantic depth. 
Every @id test must exercise the entry point the AC describes: if the AC specifies a command-line invocation, the test must invoke the command handler; if the AC specifies an API call, the test must call the API endpoint. Structural traceability (every @id has a test function) without semantic depth (every @id test exercises the right entry point) creates a false sense of coverage. Tests exist for every example but don't verify the actual user-facing behavior. +**Invariant Property Tests**. Structural (invariant) rules describe properties that must hold across all inputs, not specific behaviours. Examples alone cannot prove an invariant — they only confirm it holds for the selected cases (MacIver, 2016). When a Rule asserts an invariant (e.g., "total must equal sum of parts," "output must be sorted," "balance must never go negative"), the specification pre-mortem and behavior pre-mortem surface candidate counterexamples. These counterexamples become assertions in a Hypothesis property test (`tests/unit/`) that verifies the invariant across a generated range of inputs, catching failure modes that no finite set of hand-picked Examples could have found. 
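As a minimal sketch of such a property test (the `Account` class and its `withdraw` method are hypothetical stand-ins for a domain aggregate, not part of any real codebase here):

```python
from dataclasses import dataclass

from hypothesis import given, strategies as st


@dataclass
class Account:
    """Hypothetical domain aggregate with the invariant: balance never negative."""

    balance: int

    def withdraw(self, amount: int) -> bool:
        # Reject overdrafts so the invariant holds for every input.
        if amount > self.balance:
            return False
        self.balance -= amount
        return True


@given(
    start=st.integers(min_value=0, max_value=10_000),
    amount=st.integers(min_value=0, max_value=10_000),
)
def test_balance_never_negative(start: int, amount: int) -> None:
    account = Account(balance=start)
    account.withdraw(amount)
    # Verified across the generated range, including the
    # counterexample class "amount > start" that a pre-mortem surfaces.
    assert account.balance >= 0
```

Hypothesis generates the input range, so the counterexample class lives in the strategy bounds rather than in hand-picked Examples.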
+ ## Content ### Test Coupling Spectrum @@ -34,7 +37,7 @@ last-updated: 2026-04-29 |---|---|---|---| | Feature test | Observable behaviour through public interface | Highest | Every @id acceptance criterion | | Unit contract test | Module protocol (inputs, outputs, invariants) | High | Complex domain logic with clear contracts | -| Property test | Invariants across input ranges | Moderate | Bug @id requirements; edge-case classes | +| Property test | Invariants across input ranges | Moderate | Bug @id requirements; all structural/invariant rules | | White-box test | Internal state or private methods | Lowest | Legacy characterization only | ### Semantic Alignment Examples @@ -58,7 +61,7 @@ last-updated: 2026-04-29 |-----------|----------|-------------| | `tests/features//` | BDD scenario tests: one test per `@id` tag in the feature file | `@id` tag required | | `tests/unit/` | Unit contract tests: coverage-boosting tests for implementation branches not covered by BDD examples | No `@id` tag | -| `tests/unit/` | Property tests: invariant verification across input ranges | No `@id` tag (except `@bug` examples) | +| `tests/unit/` | Property tests: invariant verification across input ranges | No `@id` tag (except `@bug` examples); all structural/invariant rules must have one | **Rule:** `tests/features/` is exclusively for BDD scenario tests that trace back to `@id` tags in the feature file. Coverage-boosting tests that exercise implementation branches not covered by any `@id` example are unit contract tests and belong in `tests/unit/`, not `tests/features/`. A test without an `@id` tag in `tests/features/` violates the traceability contract. 
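As an illustrative layout of the convention above (the `order_checkout` slug and file names are hypothetical):

```
tests/
├── features/
│   └── order_checkout/            # BDD scenario tests; every test carries an @id
│       └── test_order_checkout.py
└── unit/
    ├── test_pricing_contract.py   # unit contract tests, no @id tag
    └── test_pricing_property.py   # Hypothesis property tests for invariant rules
```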
@@ -67,4 +70,5 @@ last-updated: 2026-04-29 - [[software-craft/tdd]]: the RED-GREEN-REFACTOR cycle that produces these tests - [[software-craft/code-review]]: reviewing whether tests meet these quality criteria - [[requirements/gherkin]]: the specification format that drives test design -- [[software-craft/stub-design]]: creating typed stubs that maintain semantic alignment \ No newline at end of file +- [[software-craft/stub-design]]: creating typed stubs that maintain semantic alignment +- [[requirements/pre-mortem]]: behavior pre-mortem surfaces counterexamples for property tests \ No newline at end of file diff --git a/.opencode/skills/confirm-baseline/SKILL.md b/.opencode/skills/confirm-baseline/SKILL.md index 49c0856..33ffadb 100644 --- a/.opencode/skills/confirm-baseline/SKILL.md +++ b/.opencode/skills/confirm-baseline/SKILL.md @@ -10,4 +10,3 @@ Available knowledge: [[requirements/decomposition#key-takeaways]]. `in` artifact 1. Verify all Examples have `@id` tags. If any are missing, the feature is not ready for baseline. 2. Verify the feature passes decomposition checks per [[requirements/decomposition#key-takeaways]]: no more than 2 concerns, no more than 8 Must Examples. 3. Verify all planning artifacts are present and consistent. -4. Verify feature status is BASELINED. diff --git a/.opencode/skills/discover-features/SKILL.md b/.opencode/skills/discover-features/SKILL.md index 89ad4c1..99c3ac2 100644 --- a/.opencode/skills/discover-features/SKILL.md +++ b/.opencode/skills/discover-features/SKILL.md @@ -14,5 +14,5 @@ Available knowledge: [[requirements/feature-boundaries]], [[requirements/feature 5. Name each feature per [[requirements/feature-boundaries#content]]: use the delivery step name, validated for clarity and specificity. 6. Write a description for each feature per [[requirements/feature-boundaries#content]]: what it provides, which context it serves, why it exists, key entities. 7. 
Identify cross-cutting quality attributes from product_definition.md that will become Constraints — note which features they distribute to per [[requirements/feature-boundaries#content]] — but do NOT write Constraints yet; discover-rules will write them. -8. Create a `.feature` file from the template at `.templates/docs/features/feature.feature.template` for each feature with title, description, Status: ELICITING, and an empty Questions table. Do NOT write Rules (Business) or Constraints — those come from the discover-rules skill. +8. Create a `.feature` file from the template at `.templates/docs/features/.feature.template` for each feature with title, description, and an empty Questions table. Do NOT write Rules (Business) or Constraints — those come from the discover-rules skill. 9. Run context coverage gap analysis per [[requirements/feature-discovery#content]]: every bounded context covered by at least one feature? IF any gap → add a Questions entry flagging it. \ No newline at end of file diff --git a/.opencode/skills/select-feature/SKILL.md b/.opencode/skills/select-feature/SKILL.md index 37b9e96..bfa1c70 100644 --- a/.opencode/skills/select-feature/SKILL.md +++ b/.opencode/skills/select-feature/SKILL.md @@ -1,18 +1,39 @@ --- name: select-feature -description: "Select the next feature to develop: by delivery order for the first feature, by WSJF for subsequent features" +description: "Select the next feature to develop by detecting delivery status from disk evidence, following delivery order" --- # Select Feature -Available knowledge: [[requirements/wsjf]]. `in` artifacts: read all before starting work. +`in` artifacts: read all before starting work. -1. Discover available features by listing `docs/features/` (or the project's feature directory). -2. IF no feature files exist → exit via `no_features`; features need discovery first. -3. IF more than one feature has `Status: BASELINED` → stop; WIP limit is 1. -4. 
Verify that architecture covers the candidate features by checking `domain_model.md` for relevant bounded contexts and `product_definition.md` for technology stack. -5. IF features have `Status: ELICITING` (no BASELINED features yet) → this is a first-run selection: - - Select the first feature by delivery order from `product_definition.md`. The delivery order was established during discovery and already reflects business priority and technical dependencies. - - Skip WSJF scoring: there's nothing to compare against. -6. IF features have `Status: BASELINED` (subsequent runs) → score per [[requirements/wsjf]] and select the highest WSJF score among Dependency=0 candidates. +1. List available feature files in `docs/features/`. +2. IF no feature files exist → exit via `no-features`; features need discovery first. +3. Read `product_definition.md` to obtain the delivery order (ordered list of feature slugs). +4. For each feature slug in delivery order, determine delivery status with a single pipeline + — do NOT open or read individual feature or test files: + + a. Extract every @id tag from the feature file and the matching test function + hex suffixes from the test directory, then compare: + + diff \ + <(grep -oP '@id:\K\w+' docs/features/.feature | sort -u) \ + <(grep -rh "def test__" tests/features// 2>/dev/null \ + | grep -oP 'test__\K\w+' | sort -u) + + Both sides are deduplicated with `sort -u` so the comparison is set-based; without `-u` on the second side, two test functions sharing an @id suffix would wrongly flag a fully covered feature. + + - Diff produces output → some @id tags lack matching test functions → + feature is incomplete (select it). + - Diff is clean → all @id tags have matching test functions. + + b. If diff is clean, run the tests scoped to that feature's test directory + using the project's test runner (see Project Commands table). + - Any failures → feature is incomplete (select it). + - All pass → feature is delivered (skip). + + c. If the test directory does not exist (grep returns nothing), the + diff reports every @id tag as unmatched → feature is incomplete (select it). + +5. Select the first incomplete feature by delivery order. +6. 
IF every feature in the delivery order is delivered (diff clean + tests pass for all) → + exit via `no-features`. 7. Set the `feature-id` session param to the selected feature's filename stem (without `.feature` extension). diff --git a/.opencode/skills/write-bdd-features/SKILL.md b/.opencode/skills/write-bdd-features/SKILL.md index b97c1b9..0b05d99 100644 --- a/.opencode/skills/write-bdd-features/SKILL.md +++ b/.opencode/skills/write-bdd-features/SKILL.md @@ -11,8 +11,13 @@ Available knowledge: [[requirements/gherkin]], [[requirements/moscow]], [[requir 2. Run a pre-mortem per [[requirements/pre-mortem]] for each Rule before writing any Examples. All Rules must have their pre-mortems completed before any Examples are written. 3. IF hidden failure modes surface from the pre-mortem → add Examples to cover them per [[requirements/gherkin#key-takeaways]]. 4. Write Example blocks per [[requirements/gherkin#concepts]]: declarative, single observable outcome per Then, using `Example:` keyword not `Scenario:`. -5. Classify each Example per [[requirements/moscow#concepts]]; MoSCoW classification is for internal triage only: do NOT add Must/Should/Could tags to Examples in the .feature file. -6. IF a Rule has more than 8 Must Examples → split the Rule per [[requirements/decomposition#key-takeaways]]. -7. IF a Rule spans more than 2 concerns → split per [[requirements/decomposition#key-takeaways]]. -8. Assign `@id` tags to all Examples that don't already have one. If the feature file already uses an ID format (e.g., sequential, semantic, custom), respect it. If no format is established, use 8-char random hex. After a feature is BASELINED, all Example blocks are immutable. Changes require `@deprecated` on the old Example (preserving the original @id) and a new Example with a new @id. `@id` tags are for traceability only; do NOT add priority tags (e.g. @must, @should) to Examples. -9. 
Evaluate each Rule's Examples for quality: observable (single outcome per Then), declarative (behaviour not steps), distinct (no duplicate coverage), and pre-mortem coverage. Every criterion that fails is a hard blocker: fix before advancing. +5. For each Rule, verify Examples cover distinct behaviours per [[requirements/gherkin#concepts]]: + a) Group Examples by `Then` outcome. Same outcome = same behaviour. Keep one representative per outcome. Discard duplicates. + b) For each distinct outcome, run the behavior-level pre-mortem per [[requirements/pre-mortem#concepts]]. + c) Add Examples targeting the failure modes surfaced. + d) Structural (invariant) rules: one representative Example suffices. Defer full coverage to a Hypothesis property test per [[software-craft/test-design#concepts]]. +6. Classify each Example per [[requirements/moscow#concepts]]; MoSCoW classification is for internal triage only: do NOT add Must/Should/Could tags to Examples in the .feature file. +7. IF a Rule has more than 8 Must Examples → split the Rule per [[requirements/decomposition#key-takeaways]]. +8. IF a Rule spans more than 2 concerns → split per [[requirements/decomposition#key-takeaways]]. +9. Assign `@id` tags to all Examples that don't already have one. If the feature file already uses an ID format (e.g., sequential, semantic, custom), respect it. If no format is established, use 8-char random hex. After a feature's baseline is confirmed, all Example blocks are immutable. Changes require `@deprecated` on the old Example (preserving the original @id) and a new Example with a new @id. `@id` tags are for traceability only; do NOT add priority tags (e.g. @must, @should) to Examples. +10. Evaluate each Rule's Examples for quality: observable (single outcome per Then), declarative (behaviour not steps), distinct (no duplicate coverage), and pre-mortem coverage. Every criterion that fails is a hard blocker: fix before advancing. 
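To illustrate step 5a with a hypothetical rule (names and @id invented): these two Examples share the same `Then` outcome, so they test the same behaviour and only one survives grouping:

```gherkin
Rule: A withdrawal must never exceed the available balance

  @id:4be1a9c0
  Example: Withdrawal exceeding the balance is rejected
    Given an account with balance 50
    When the account holder withdraws 80
    Then the withdrawal is rejected

  # Same Then outcome, different input value — duplicate coverage; discard.
  Example: Large withdrawal exceeding the balance is rejected
    Given an account with balance 50
    When the account holder withdraws 9000
    Then the withdrawal is rejected
```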
diff --git a/.opencode/skills/write-test/SKILL.md b/.opencode/skills/write-test/SKILL.md index 7cfacea..81ce41f 100644 --- a/.opencode/skills/write-test/SKILL.md +++ b/.opencode/skills/write-test/SKILL.md @@ -7,7 +7,7 @@ description: "Write a failing test body for one BDD example" Available knowledge: [[software-craft/tdd]], [[software-craft/test-design]], [[software-craft/smell-catalogue]], [[software-craft/object-calisthenics]], [[software-craft/solid]]. `in` artifacts: read all before starting work. -1. Pick the next unimplemented `@id` from the feature file: order by fewest dependencies first per [[software-craft/tdd#concepts]]. +1. Pick the next unimplemented `@id` from the feature file: order by fewest dependencies first per [[software-craft/tdd#concepts]]. IF the `@id` belongs to a structural (invariant) Rule → also generate a Hypothesis property test in `tests/unit/` per [[software-craft/test-design#concepts]], using the counterexamples surfaced by the behavior pre-mortem per [[requirements/pre-mortem#concepts]]. 2. Write a failing test that specifies the expected behavior per [[software-craft/tdd#key-takeaways]]. Preserve the full docstring from the test stub. The Gherkin steps (Given/When/Then) are immutable specification content for traceability and must not be removed, shortened, or reformatted. 3. IF a spec gap or inconsistency is discovered → do NOT modify specification documents (domain_model.md, glossary.md, product_definition.md, ADRs, feature files). Flag it in output notes. The SE may ONLY modify production code and test code. 4. Run `task test-fast` to confirm the test fails for the right reason (RED) per [[software-craft/tdd#key-takeaways]]. 
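A sketch of step 2, with hypothetical names throughout (`Account` stands in for the production entry point; in a real RED phase it would not exist yet, so the test would fail — a minimal implementation is included here only to keep the sketch self-contained). The Gherkin steps stay verbatim in the docstring:

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical production object; normally imported, not defined in the test."""

    balance: int

    def withdraw(self, amount: int) -> bool:
        if amount > self.balance:
            return False
        self.balance -= amount
        return True


def test__a1b2c3d4() -> None:
    """
    Example: Withdrawal exceeding the balance is rejected
      Given an account with balance 50
      When the account holder withdraws 80
      Then the withdrawal is rejected and the balance remains 50
    """
    # The docstring above is immutable specification content: it is the
    # traceability link back to the @id:a1b2c3d4 Example in the feature file.
    account = Account(balance=50)
    assert account.withdraw(80) is False
    assert account.balance == 50
```

The `test__a1b2c3d4` name matches the `test__<hex>` convention that the selection pipeline greps for.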
diff --git a/.templates/docs/features/feature.feature.template b/.templates/docs/features/.feature.template similarity index 96% rename from .templates/docs/features/feature.feature.template rename to .templates/docs/features/.feature.template index 0f79973..e1fb0f8 100644 --- a/.templates/docs/features/feature.feature.template +++ b/.templates/docs/features/.feature.template @@ -3,8 +3,6 @@ Feature: <2–4 sentence description of what this feature does and why it exists. Written in plain language, always kept current by the PO.> - Status: ELICITING | BASELINED (YYYY-MM-DD) - Rules (Business): - diff --git a/AGENTS.md b/AGENTS.md index f9aa62b..3a1b0bd 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -20,7 +20,7 @@ Post-mortem analysis shows these practices prevent most project failures. Violat When creating a document, use the template in `.templates/` that matches the artifact type. Strip the `.templates/` prefix and `.template` suffix to determine the destination path. For example: - `.templates/docs/adr/ADR_YYYYMMDD_.md.template` → `docs/adr/ADR_20260430_my_decision.md` -- `.templates/docs/features/feature.feature.template` → `docs/features/my_feature.feature` +- `.templates/docs/features/.feature.template` → `docs/features/my_feature.feature` - `.templates/docs/interview-notes/IN_YYYYMMDD_.md.template` → `docs/interview-notes/IN_20260430_session_management.md` If no template exists for an artifact type, create the document without one.