From bd833087b748ffa50e177ceae6efa0f968134c2e Mon Sep 17 00:00:00 2001 From: hyperpolymath <6759885+hyperpolymath@users.noreply.github.com> Date: Wed, 3 Jun 2026 00:49:09 +0100 Subject: [PATCH 1/3] docs(architecture): cross-repo proof DAG design (closes C14 estate blocker) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Closes the estate-onboarding blocker C14 surfaced in the "what stops echidna from running proof work across the estate" audit: when ephapax `formal/` changes, downstream consumers (proven, valence-shell, verisimdb) do not re-verify until manual notice. The DAG turns that drift window from "weeks" into "the next dispatcher cycle". Design only — no impl. Captures: * Where edges live (consumer-declared `.machine_readable/ proof-deps.a2ml` ingested by echidna; hypatia dispatches). * A2ML schema v1.0 mirroring the C12 manifest tolerance shape (unknown fields ignored, forward-compat). * Discovery: manual v1; automatic v2 (parking on verisimdb#3 cross-prover canonicalisation). * Trigger semantics: path-grained, coalesced, backpressured, fail-open on dispatcher unreachable. * Failure semantics for malformed deps, unreachable refs, dependency cycles, fail_open overrides. * Concrete walk-through: ephapax→{valence-shell, verisimdb, proven}. * Four open decisions explicitly flagged for the owner: edge ref policy, issue vs comment, echo-types special case, fail_open default. * Acceptance criteria for the future impl PR. Estate context: this design names known producer/consumer pairs already in the estate (ephapax, echo-types, kategoria, tropical-resource-typing, vcl-ut) and aligns with the existing echo-types audit rule ([[feedback_proofs_must_check_and_cross_doc_echo_types]]). Cross-references to the rest of the audit punch list: C12 (echidnabot manifest), C15 (hypatia push-fixes wiring), C16 (GNN feedback loop), D22 (estate trust badge), F25 (verisimdb#3 cross-prover RDF). Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/architecture/CROSS-REPO-PROOF-DAG.md | 306 ++++++++++++++++++++++ 1 file changed, 306 insertions(+) create mode 100644 docs/architecture/CROSS-REPO-PROOF-DAG.md diff --git a/docs/architecture/CROSS-REPO-PROOF-DAG.md b/docs/architecture/CROSS-REPO-PROOF-DAG.md new file mode 100644 index 00000000..494f1da8 --- /dev/null +++ b/docs/architecture/CROSS-REPO-PROOF-DAG.md @@ -0,0 +1,306 @@ + + +# Cross-Repo Proof Dependency DAG + +**Status**: Design draft — implementation not yet started. +**Closes**: estate blocker **C14** from the 2026-06-03 audit +("when ephapax `formal/` changes, downstream consumers don't re-verify"). + +--- + +## Problem + +The estate carries several producer/consumer proof relationships that +the current per-repo CI shape does not honour: + +| Producer | Path | Known consumers | +|-----------------|-------------------------------|---------------------------------------------------| +| `ephapax` | `formal/PRESERVATION-*.v` | `valence-shell`, `verisimdb`, `proven` | +| `echo-types` | `formal/echo/*.v` | `ephapax` (L3 obligations), `valence-shell` | +| `kategoria` | `formal/*.v` | `verisimdb`, downstream HP type-checker repos | +| `tropical-resource-typing` | `formal/*.v` | `verisimdb` (Q1 foundation) | +| `vcl-ut` | `formal/*.v` | `verisimdb` (P3, V2 foundation pack) | + +Today, when a producer's proofs change, no automatic signal walks the +DAG to re-verify downstream. Transitive correctness drift goes +unnoticed until a manual reviewer notices, or until a consumer's own CI +happens to re-run against the new producer SHA. That can be weeks. + +This is **C14** in the 2026-06-03 audit's punch list. + +--- + +## Scope (this doc) vs out-of-scope + +In-scope: +- Where the DAG lives. +- Edge format (TOML / A2ML). +- Edge-discovery rules (manual + automatic). +- Trigger semantics (which webhook fires which downstream jobs). +- Failure semantics (degraded mode when a consumer can't be reached). + +Out-of-scope: +- Term-level proof translation between Coq ↔ Lean ↔ Agda — that is + cross-prover canonicalisation, blocked on `verisimdb#3`. +- Per-theorem dependency tracking (i.e. "only re-verify proofs whose + imports actually touched the producer hunk"). v1 is repo-grain only; + per-theorem precision is a follow-up once the v1 plumbing is real. + +--- + +## Where the DAG lives + +**Recommendation: edges declared per-consumer; aggregation in echidna.** + +```text + consumer repo echidna (estate dispatcher) + ───────────── ───────────────────────────── + .machine_readable/ ┌─────────────────┐ + proof-deps.a2ml ───── ingest ───────► │ proof_dag.toml │ + │ (synthesised │ + consumer repo │ per-fetch) │ + ───────────── │ │ + .machine_readable/ └─────────┬───────┘ + proof-deps.a2ml ───── ingest ───────► │ + ▼ + ┌─────────────────┐ + │ hypatia │ + │ (push fixes / │ + │ enqueue jobs) │ + └─────────┬───────┘ + │ + ▼ + ┌─────────────────┐ + │ echidnabot │ + │ (per-repo CI) │ + └─────────────────┘ +``` + +Rationale: + +- **Edges live in the consumer**, not the producer. Producers shouldn't + need to know who depends on them; the reverse direction (consumer + declares what it pulls from) matches how Coq/Lean/Agda imports already + work and means no producer-side coordination is needed when a new + consumer onboards. +- **Aggregation in echidna** because echidna is already the estate's + prover-aware dispatcher (`src/rust/provers/`, prover-class scheduling, + trust-bridge), and the dispatch decision needs prover metadata that + echidnabot doesn't carry. +- **Trigger orchestration in hypatia** because hypatia is the estate-CI + intelligence layer that already coordinates `gitbot-fleet` and the + "push committed fixes to remotes" contract (see the C15 PR). + +--- + +## Edge format + +A new file `proof-deps.a2ml` joins the existing +`.machine_readable/bot_directives/` family: + +```toml +# .machine_readable/proof-deps.a2ml +schema_version = "1.0" +directive_type = "proof-deps" + +# Each [[depends_on]] entry is one edge from this consumer to a producer. +[[depends_on]] +producer = "hyperpolymath/ephapax" +ref = "main" # branch | tag | exact-sha +paths = ["formal/PRESERVATION-*.v"] # producer-side globs +local_uses = ["formal/L3-ECHO/**/*.v"] # consumer-side proofs that + # transitively rely on producer +rationale = "L3 echo obligations re-use ephapax's preservation_l3." +# Optional gating override: +fail_open = false # if true: producer red doesn't block consumer CI + # (default false — producer red ⇒ consumer red). + +[[depends_on]] +producer = "hyperpolymath/echo-types" +ref = "main" +paths = ["formal/echo/*.v"] +local_uses = ["formal/L3-ECHO/EchoAxioms.v"] +rationale = "Echo-types canonical axioms; echo-types audit recorded." +``` + +Schema notes: + +- `producer` is `/` — platform-agnostic; same shape works + for Codeberg / Radicle once those adapters land. +- `ref` is normally `main`; per-edge SHA pins are allowed for + reproducibility-critical consumers (e.g. release branches). +- `paths` are producer-side globs — used both for edge invalidation + ("did this hunk touch a tracked path?") and for the v2 per-theorem + story. +- `local_uses` documents *why* the edge exists. Mostly for humans, but + hypatia surfaces it in the PR comment so reviewers know what failed. +- Tolerance: unknown fields are ignored. Same forward-compat shape as + the C12 manifest. + +--- + +## Discovery + +Two discovery paths feed the DAG: + +1. **Manual** — author commits `proof-deps.a2ml` declaring producers. + This is the only path that v1 honours. +2. **Automatic** (v2, after `verisimdb#3`) — extract producer references + from Coq `Require Import`, Lean `import`, Agda `open import`. The + estate's per-language imports vary too widely to do this safely in + v1; a missing edge surfaces as a stale-proof failure within one + producer-change cycle and the author adds it manually. + +The aggregator (`echidna ingest-proof-deps`) walks each repo's +`proof-deps.a2ml` via the same `PlatformAdapter::get_file_contents` path +echidnabot already uses for `bot_directives/echidnabot.a2ml`. Edges are +cached for 24h with ETag revalidation. No edge → no behaviour change. + +--- + +## Trigger semantics + +```text + producer push to tracked path + │ + ▼ + webhook hits echidna ─────► consult ingested DAG, find consumers + │ (forward edges by `paths` glob match) + ▼ + for each consumer: + enqueue echidnabot job + payload: { repo, ref, reason: "upstream change", + upstream: { producer, sha, paths } } + │ + ▼ + echidnabot runs that repo's proof check (per C12 manifest) + │ + ▼ + result → comment on producer commit + open / update consumer issue + "ephapax@abc1234 broke valence-shell@formal/L3-ECHO/EchoAxioms.v" +``` + +Trigger details: + +- Path-grained: only producers' commits that touch a tracked `paths` + glob fan out. Touching producer README, CI, or non-tracked source + does not enqueue anything. +- Coalesced: multiple pushes within a 60s window collapse to one + downstream job per consumer. +- Backpressure: hypatia's existing concurrency limits (per-repo + semaphore + global cap) apply. Cascade fan-out is bounded. +- Fail-open: if echidna's DAG cache is stale or unreachable, fall back + to "no edges" — never block producer CI on the dispatcher being down. + +--- + +## Failure semantics + +| Condition | Behaviour | +|--------------------------------------------|--------------------------------------------| +| Consumer's `proof-deps.a2ml` is malformed | Log warning; treat as "no edges". No CI block on the consumer. | +| Producer ref not reachable (deleted branch / private repo) | Edge skipped; warning surfaced in the consumer's next PR. | +| Consumer CI red after upstream-triggered run | Issue opened in consumer repo, label `blocked-on:#`; `fail_open` overrides this to advisory-only. | +| Producer red + `fail_open = false` | Consumer's `merge_block` (per C12 manifest) gates merges to consumer-main. | +| DAG cycle (A → B → A) | Detected at ingest; cycle edge dropped + filed as `echidna#` with both repos labelled. v1 logs only; v2 hard-fails ingestion. | + +--- + +## Concrete walk-through + +```text +2026-06-XX 10:00 ephapax main pushes commit abc1234 + touching formal/PRESERVATION-L3.v + + 10:00:02 echidna webhook receives push event + matches DAG glob "formal/PRESERVATION-*.v" + consumers: { valence-shell, verisimdb, proven } + + 10:00:03 hypatia enqueues 3 echidnabot jobs: + valence-shell @ main (proof-set: formal/L3-ECHO/**/*.v) + verisimdb @ main (proof-set: formal/foundation-pack/**) + proven @ main (proof-set: formal/echo-leak/*.v) + + 10:00:08 echidnabot dispatches Coq jobs (per each repo's + C12 manifest provers list) + + 10:01:42 results land: + valence-shell GREEN + verisimdb RED (preservation_l3 fails on L2-modality lemma) + proven GREEN + + 10:01:44 echidna posts comment on ephapax abc1234: + "Downstream impact: verisimdb red — preservation_l3 fails + against L2-modality lemma at line 412." + Opens issue verisimdb#NNN with full diff + the failing + proof excerpt. Adds label "blocked-on:ephapax@abc1234". +``` + +--- + +## Open decisions for the owner + +1. **Edge ref policy**: do all consumers default to `ref = "main"`, or + should release-branch consumers be required to pin a SHA? Tradeoff: + pinned SHAs are reproducible but go stale; `main` is current but can + silently flip. +2. **Issue spam vs PR comment**: when a producer's main breaks a + consumer, do we open an issue per consumer (high signal, can pile + up) or just comment on the producer commit (no follow-up nudge)? +3. **Echo-types special case**: echo-types is the L3 canonical axioms + producer for ephapax (per estate policy + [[feedback_proofs_must_check_and_cross_doc_echo_types]]). Should + echo-types ↔ ephapax edges be auto-injected by echidna without a + `proof-deps.a2ml` declaration, or stay opt-in like everything else? +4. **`fail_open` default**: producer red ⇒ consumer red is the safer + default but it means an upstream's bad-day red-CI cascades. Should + `fail_open = true` be the default and `fail_open = false` be the + opt-in for hard-coupled consumers? + +--- + +## What this design unblocks + +Closing C14 makes three other audit blockers more tractable: + +- **C16** (GNN outcome-feedback loop) — per-edge success/failure + signals are exactly the cross-repo training data the GNN currently + doesn't see. +- **F25** (cross-prover RDF alignment, blocked on `verisimdb#3`) — the + DAG gives an empirical ground-truth set of co-evolving proof pairs, + which is the input that translation work needs. +- **D22** (no single green "trust pipeline verified" badge) — the DAG + defines what "fully verified estate" means: every edge green. + +--- + +## Acceptance criteria for the implementation PR (out of scope here) + +When this design is implemented: + +- [ ] `echidna ingest-proof-deps` walks all repos that echidna already + knows about, parses any `.machine_readable/proof-deps.a2ml`, + and builds `proof_dag.toml` at the daemon level. +- [ ] Producer webhooks fan out via `EdgeEnvelope` events on the + existing `dispatch.rs` channel. +- [ ] hypatia's enqueue path accepts `EdgeEnvelope` and routes to + echidnabot exactly as it does for direct-push events. +- [ ] Cycle detection runs at ingest and logs (v1) / hard-fails (v2). +- [ ] Three estate-shaped fixtures: ephapax→valence-shell, + echo-types→ephapax, kategoria→verisimdb. +- [ ] Sad-path: unreachable producer ref, malformed `proof-deps.a2ml`, + consumer mid-rebase. All degrade gracefully. + +--- + +## See also + +- C12 PR — per-repo manifest schema v2.0 (echidnabot). +- C15 PR — hypatia push-fixes wiring (gitbot-fleet). +- `docs/architecture/CORRECTNESS-ARCHITECTURE.md` — the soundness + invariant the DAG must not violate. +- `docs/architecture/VERISIM-ER-SCHEMA.md` — where ingested edges + eventually persist for cross-prover analytics. From 0d27d43019307b6bc44705ce5ee19a5987e97400 Mon Sep 17 00:00:00 2001 From: hyperpolymath <6759885+hyperpolymath@users.noreply.github.com> Date: Wed, 3 Jun 2026 01:07:07 +0100 Subject: [PATCH 2/3] docs(handover): correct B7 audit claim + add HP smoke fixtures + onboarding doc MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The 2026-06-03 estate-audit punch list characterised B7 as "HP type-checker ecosystem (13 provers) corpus-only — Rust adapters not scaffolded in src/rust/provers/." That is incorrect. Verified state of the tree (2026-06-03): * src/rust/provers/hp_ecosystem.rs:50-153 — HPEcosystemBackend, a full ProverBackend impl that pattern-matches on ProverKind to select typell / katagoria / tropical-type-check + a discipline tag; implements parse_file / parse_string / apply_tactic / verify_proof. * src/rust/provers/mod.rs:1774-1824 — ProverFactory::create routes TypeLL + KatagoriaVerifier through HPEcosystemBackend, and 39 other *TypeChecker variants (incl. Echo + Tropical) through TypedWasmBackend::for_kind. * tests/common/mod.rs:114/185/245 + tests/gnn_augment_integration.rs:545 already cover HP-ecosystem kinds via is_hp_ecosystem() dispatch. So the backends exist and are wired. What is actually outstanding for estate-scale dispatch on these provers is downstream of the scaffold: 1. Upstream binaries (typell, katagoria, tropical-type-check) are not packaged in any Containerfile / manifests/live-provers.scm — a CI run gets "binary not found" rather than a verify result. 2. No smoke fixtures for the three Tier-8-canonical provers (echo / tropical / katagoria) — this PR adds them. 3. No documented onboarding flow for new HP disciplines, even though the pattern is mechanical and touches ~4 files. This PR delivers what is contained-scope: * docs/handover/B7-AUDIT-CORRECTION.md — correction memo with the file:line evidence above. * tests/fixtures/hp/echo_trivial.tll * tests/fixtures/hp/tropical_trivial.tll * tests/fixtures/hp/katagoria_trivial.k (three minimal smoke fixtures, single identity / refl goal each). * docs/HP-BACKEND-ONBOARDING.md — step-by-step checklist for adding a new HP discipline, with line references into the 4 files an author must touch. Out of scope (deferred to other audit items): * Packaging the upstream binaries — that's part of L3 → L1 hand-off criteria in the D18 PR. * Per-discipline GNN training extraction — that's the deferred TypeDiscipline Phase-2 work (audit item F26). Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/HP-BACKEND-ONBOARDING.md | 151 +++++++++++++++++++++++++ docs/handover/B7-AUDIT-CORRECTION.md | 111 ++++++++++++++++++ tests/fixtures/hp/echo_trivial.tll | 12 ++ tests/fixtures/hp/katagoria_trivial.k | 10 ++ tests/fixtures/hp/tropical_trivial.tll | 12 ++ 5 files changed, 296 insertions(+) create mode 100644 docs/HP-BACKEND-ONBOARDING.md create mode 100644 docs/handover/B7-AUDIT-CORRECTION.md create mode 100644 tests/fixtures/hp/echo_trivial.tll create mode 100644 tests/fixtures/hp/katagoria_trivial.k create mode 100644 tests/fixtures/hp/tropical_trivial.tll diff --git a/docs/HP-BACKEND-ONBOARDING.md b/docs/HP-BACKEND-ONBOARDING.md new file mode 100644 index 00000000..b4f9aa29 --- /dev/null +++ b/docs/HP-BACKEND-ONBOARDING.md @@ -0,0 +1,151 @@ + + +# Onboarding a new HP type-discipline backend + +This is the working checklist for adding a new HP-ecosystem type +discipline to echidna. The 41 existing variants (TypeLL, +KatagoriaVerifier, TropicalTypeChecker, EchoTypeChecker, ...) all +followed this same pattern. + +For background on what already exists, see +`docs/handover/B7-AUDIT-CORRECTION.md`. + +## Where the wiring lives + +Adding a new discipline `Foo` (i.e. `ProverKind::FooTypeChecker`) +touches exactly four files: + +| File | What to add | +|-------------------------------------------------|-------------------------------------------------------------------| +| `src/rust/provers/mod.rs` | Enum variant + parser arm + is_hp_ecosystem + default_executable | +| `src/rust/provers/hp_ecosystem.rs::upstream()` | One match arm: `Foo => ("typell", "foo")` | +| `src/rust/provers/typed_wasm.rs::type_info_for` | `TypeInfo` selector for the discipline (the typed_wasm route) | +| `tests/fixtures/hp/foo_trivial.tll` | One smoke fixture exercising the new wire | + +If the discipline ships its own CLI (i.e. not `typell --discipline=foo` +but a standalone binary), it routes through `HPEcosystemBackend` and +the `upstream()` arm returns `("foo-cli", "foo")`. If it lives under +`typell`, it routes through `TypedWasmBackend` via the discipline +parameter. + +The deciding question: **does the upstream maintainer ship a separate +binary?** Examples: + +- `KatagoriaVerifier` → standalone `katagoria` binary → `HPEcosystemBackend`. +- `TropicalTypeChecker` → standalone `tropical-type-check` binary → either path (currently `TypedWasmBackend`). +- `EchoTypeChecker` → bundled under `typell` → `TypedWasmBackend`. + +## Step-by-step + +### 1. Add the enum variant + +`src/rust/provers/mod.rs`, in the `ProverKind` enum (around line 294): + +```rust +pub enum ProverKind { + // ... + FooTypeChecker, + // ... +} +``` + +### 2. Parser arm + +Same file, around line 567 (the string-to-ProverKind parser): + +```rust +"footypechecker" | "foo" => Ok(ProverKind::FooTypeChecker), +``` + +### 3. `is_hp_ecosystem` arm + +Same file, around line 434: + +```rust +pub fn is_hp_ecosystem(&self) -> bool { + matches!(self, + // ... + | ProverKind::FooTypeChecker + ) +} +``` + +### 4. `default_executable` arm + +Same file, around line 1360. Either: + +```rust +ProverKind::FooTypeChecker => "typell", // bundled +// or +ProverKind::FooTypeChecker => "foo-cli", // standalone +``` + +### 5. Factory routing arm + +Same file, `ProverFactory::create` (around line 1822). Add to the +appropriate match arm: + +- Standalone binary: add to the `HPEcosystemBackend` arm at line + 1776-1778. +- Bundled discipline: add to the `TypedWasmBackend` arm at lines + 1784-1822. + +### 6. (Bundled discipline only) `upstream()` arm + +`src/rust/provers/hp_ecosystem.rs`, the `upstream()` match around +line 63: + +```rust +ProverKind::FooTypeChecker => ("typell", "foo"), +``` + +### 7. `TypeInfo` selector + +`src/rust/provers/typed_wasm.rs::type_info_for` — return the discipline's +parametric `TypeInfo` so the unified verifier knows what to check. + +### 8. Smoke fixture + +`tests/fixtures/hp/foo_trivial.tll`: + +``` +# SPDX-License-Identifier: MPL-2.0 +#discipline: foo + +theorem foo_identity : ... . +Proof. + foo_refl. +Qed. +``` + +### 9. Run the smoke + +`tests/gnn_augment_integration.rs::test_hp_ecosystem_gnn_wires_top_premise` +already loops over `is_hp_ecosystem()` kinds, so the new variant gets +GNN-wiring coverage for free. A direct backend smoke takes one more +line in `tests/common/mod.rs` if you want it. + +## What you DON'T need to touch + +- **Confidence levels** — the trust bridge reads + `is_hp_ecosystem()` once; no per-discipline tuning. +- **Corpus extractors** — the HP corpus is built by + `crates/typed_wasm` and indexed by Sigma parameters; new disciplines + inherit the existing pipeline. +- **Result formatter** — `result_formatter.rs` reads `ProverKind` + via Display; the variant name is the user-facing label. + +## When the upstream binary doesn't ship + +For early-development disciplines where the upstream tool isn't yet +buildable (e.g. an unreleased Wokelang variant), the backend will +return a "binary not found" runtime error. That is the right +behaviour — echidna's trust bridge surfaces it as +`ProverOutcome::Unsupported` and the caller sees a clear signal. + +Do **not** stub the backend to return spurious success. Estate +consumers (per the C12 per-repo manifest) configure `[provers] +disabled = ["foo"]` to opt out cleanly. diff --git a/docs/handover/B7-AUDIT-CORRECTION.md b/docs/handover/B7-AUDIT-CORRECTION.md new file mode 100644 index 00000000..b9aa71e7 --- /dev/null +++ b/docs/handover/B7-AUDIT-CORRECTION.md @@ -0,0 +1,111 @@ + + +# B7 audit correction — HP ecosystem backends are NOT corpus-only + +**Status**: Correction. +**Origin**: The 2026-06-03 "what stops echidna from running proof work +across the estate" punch list listed **B7** as: + +> _HP type-checker ecosystem (13 provers) corpus-only — KatagoriaVerifier, +> Modal/Session/Choreographic/Epistemic/Refinement/Echo/Dependent/QTT/ +> Effect-Row/Tropical/TypeLL backends missing. Rust adapters not +> scaffolded in `src/rust/provers/`. Corpus contributes to vocab/training +> only._ + +That characterisation is incorrect. This memo replaces B7's "scaffold +missing adapters" framing with what is actually missing. + +## What is actually in tree (verified 2026-06-03) + +### Unified `HPEcosystemBackend` + +`src/rust/provers/hp_ecosystem.rs:50-153` defines +`HPEcosystemBackend`, a full `ProverBackend` implementation that: + +- Pattern-matches on `ProverKind` to select the upstream CLI + (`typell`, `katagoria`, `tropical-type-check`) and discipline tag + (`echo`, `session`, `qtt`, `effect-row`, ...) — see + `hp_ecosystem.rs::upstream()` lines 63-126. +- Implements `parse_file`, `parse_string`, `apply_tactic`, and + `verify_proof` against the upstream binary — lines 156-330. +- Injects a `#discipline:` header when the source lacks one (lines + 196-205), so estate consumers don't need to know the wire format. + +### Routing in `ProverFactory::create` + +`src/rust/provers/mod.rs:1774-1824` routes: + +| ProverKind | Backend | +|-------------------------------------------------------|--------------------------------------------------| +| `TypeLL`, `KatagoriaVerifier` | `hp_ecosystem::HPEcosystemBackend` | +| 39 other `*TypeChecker` variants incl. `EchoTypeChecker`, `TropicalTypeChecker` | `typed_wasm::TypedWasmBackend::for_kind` | + +Both routes exist on `main`. None of the 41 enum variants falls +through to "unsupported". + +### Test infrastructure + +`tests/common/mod.rs:114`, `:185`, `:245` already special-case +`k.is_hp_ecosystem()` for executable / args / default-binary resolution +in the prover-smoke harness. `tests/gnn_augment_integration.rs:545` +covers `test_hp_ecosystem_gnn_wires_top_premise`. + +## What is genuinely outstanding for these backends + +The audit's underlying intuition (estate proof work won't run end-to-end +on these provers) is correct — but the gap is downstream of the +scaffold, not the scaffold itself: + +1. **Upstream binaries are not packaged.** `typell`, `katagoria`, and + `tropical-type-check` are referenced from `hp_ecosystem.rs::upstream()` + but none of them ship in any of echidna's Containerfiles or the + Guix manifest. `manifests/live-provers.scm` covers Z3, CVC5, Lean4, + Coq, Agda, Isabelle, Idris2, F*, Dafny, TLAPS — the HP triad is + absent. A live CI run can compile and dispatch but the subprocess + exits with "binary not found". + +2. **No smoke fixture for the three named provers.** `tests/chapel_fixtures/` + has `coq_trivial.v` and `lean_trivial.lean`; there is no equivalent + `echo_trivial.tll`, `tropical_trivial.tll`, or `katagoria_trivial.k`. + This PR adds the minimum three. + +3. **No discoverability for "how do I add a new HP discipline."** + Adding a discipline today means editing the 39-arm match in + `hp_ecosystem.rs::upstream()`, the 41-arm match in + `ProverFactory::create`, and matching arms in `is_hp_ecosystem()` + and `default_executable()`. There is no documented onboarding flow. + +## What this PR delivers + +- This correction memo (`docs/handover/B7-AUDIT-CORRECTION.md`). +- Three smoke fixtures (`tests/fixtures/hp/`): + - `echo_trivial.tll` — discipline `echo`, single identity goal. + - `tropical_trivial.tll` — discipline `tropical`, resource-aware identity. + - `katagoria_trivial.k` — discipline `verify`, single isomorphism. +- A short onboarding doc (`docs/HP-BACKEND-ONBOARDING.md`) listing the + ~4 files an author must touch to add a new HP discipline, with line + references. + +What it does **not** deliver: + +- The upstream `typell` / `katagoria` / `tropical-type-check` binaries + (those live in `developer-ecosystem/katagoria`, + `verification-ecosystem/tropical-resource-typing`, and + `verification-ecosystem/typell` and are owner-managed). +- A live CI run gate (that needs the binaries packaged first; see + the L3 path checklist in the D18 PR). +- Per-discipline GNN training data extraction (TypeDiscipline Phase-2 + deferred — audit item F26). + +## Cross-references + +- `docs/handover/TODO.md` P4 — Wave-4 + HP ecosystem expansion (now + partially superseded for the HP rows). +- `docs/PROVER_COUNT.md` Tier-8 — canonical 41-variant inventory. +- C12 PR (echidnabot manifest) — per-repo opt-in shape that consumers + using these backends will adopt. +- D18 PR (L3 gate checklist) — packaging the upstream binaries is one + of the L3 → L1 hand-off criteria. diff --git a/tests/fixtures/hp/echo_trivial.tll b/tests/fixtures/hp/echo_trivial.tll new file mode 100644 index 00000000..8d528e2f --- /dev/null +++ b/tests/fixtures/hp/echo_trivial.tll @@ -0,0 +1,12 @@ +# SPDX-License-Identifier: MPL-2.0 +#discipline: echo +# +# Smoke fixture: echo type-discipline single-goal identity. +# Exercises HPEcosystemBackend with ProverKind::EchoTypeChecker. +# The L3 (echo) obligation reduces to its own type — the simplest +# possible witness that the backend wiring round-trips. + +theorem echo_identity : forall (A : EchoType), A |- A. +Proof. + echo_refl. +Qed. diff --git a/tests/fixtures/hp/katagoria_trivial.k b/tests/fixtures/hp/katagoria_trivial.k new file mode 100644 index 00000000..d560110d --- /dev/null +++ b/tests/fixtures/hp/katagoria_trivial.k @@ -0,0 +1,10 @@ +// SPDX-License-Identifier: MPL-2.0 +// discipline: verify +// +// Smoke fixture: katagoria categorical-isomorphism single-goal. +// Exercises HPEcosystemBackend with ProverKind::KatagoriaVerifier. +// The trivial isomorphism A ≅ A reduces by identity functor; smallest +// possible witness that the verifier wiring round-trips. + +theorem id_isomorphism (A : Object) : A ≅ A := by + refl diff --git a/tests/fixtures/hp/tropical_trivial.tll b/tests/fixtures/hp/tropical_trivial.tll new file mode 100644 index 00000000..a5e420bf --- /dev/null +++ b/tests/fixtures/hp/tropical_trivial.tll @@ -0,0 +1,12 @@ +# SPDX-License-Identifier: MPL-2.0 +#discipline: tropical +# +# Smoke fixture: tropical resource-typing single-goal identity. +# Exercises TypedWasmBackend with ProverKind::TropicalTypeChecker. +# Cost = 0 (additive identity in the tropical semiring); type checks +# trivially under the standard resource-tracking judgement. + +theorem tropical_zero_cost : forall (A : TropicalType[cost=0]), A |- A. +Proof. + tropical_refl. +Qed. From d9b4e7faf95d6f6036c9cdb4fd69bc86218de2b4 Mon Sep 17 00:00:00 2001 From: hyperpolymath <6759885+hyperpolymath@users.noreply.github.com> Date: Wed, 3 Jun 2026 01:09:36 +0100 Subject: [PATCH 3/3] docs(handover): L3 gate path checklist (closes D18 estate blocker) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Closes the estate-onboarding blocker D18 surfaced in the "what stops echidna from running proof work across the estate" audit: L3 → L1 hand-off requires Tier-1 green ≥7 days + all 4 waves landed-or-deferred. As of today, the gate is NOT satisfied — but crucially, NO ONE HAD ENUMERATED EXACTLY WHY, so L1 (Cap'n Proto) and L2 (Chapel full integration) stayed parked on a vague signal. This doc enumerates the six L3 acceptance criteria across the L3 prompt and TODO.md, and walks each one through its sub-criteria with current status: Criterion 1 — Wave-1 (Tier-1 every-PR): 2/3 sub-rows ✅, 1 UNKNOWN (green-window data not collected) Criterion 2 — Wave-2 (Tier-2 nightly): 2/3 sub-rows ✅, 1 UNVERIFIED (CI cycle outcome not recorded) Criterion 3 — Wave-3 (Tier-3 weekly): 4/5 sub-rows ✅, 1 UNVERIFIED (container-ci.yml weekly green-count not recorded) Criterion 4 — Wave-4 (Tier-4 quarterly): 2/3 sub-rows ✅, 1 owner choice (one execution OR explicit defer; both satisfy the L3 prompt's "land or defer with rationale" phrasing) Criterion 5 — Tier-1 green ≥7 days on main: 0/3 sub-rows (needs gh + GITHUB_TOKEN; previous sessions lacked them — STATE.a2ml triage records the block) Criterion 6 — Operational: 6a guix shell end-to-end: ❌ TODO 6b Dafny stub upgrade audit: ⚠ UNKNOWN 6c VeriSimDB record emission: ⚠ PARTIAL (gated on verisimdb#ghcr-publish.yml upstream) Minimum sufficient path documented at the end (~one session if gh auth is available). Without gh access, criteria 1/2/3/5 cannot close. The doc cross-references the other four audit-pass PRs (C12, C14, C15, B7) so the L3 gate context stays linked to the surrounding work. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/handover/L3-GATE-PATH.adoc | 276 ++++++++++++++++++++++++++++++++ 1 file changed, 276 insertions(+) create mode 100644 docs/handover/L3-GATE-PATH.adoc diff --git a/docs/handover/L3-GATE-PATH.adoc b/docs/handover/L3-GATE-PATH.adoc new file mode 100644 index 00000000..886e3f3f --- /dev/null +++ b/docs/handover/L3-GATE-PATH.adoc @@ -0,0 +1,276 @@ +// SPDX-License-Identifier: MPL-2.0 +// SPDX-FileCopyrightText: 2026 Jonathan D.A. Jewell +:toc: left +:toclevels: 2 +:sectnums: +:icons: font + += L3 → L1 gate path: precise checklist + +_Closes audit item *D18* from the 2026-06-03_ +_"what stops echidna from running proof work across the estate"_ +_punch list._ + +Status as of *2026-06-03*: **gate NOT satisfied**. L1 (Cap'n Proto) and +L2 (Chapel full integration) are blocked until every box below is +ticked. This document enumerates exactly what remains. + +== Where the criteria come from + +`docs/handover/L3-LIVE-PROVER-CI-PROMPT.md` "Hand to L1 when": + +* All four waves land or are explicitly deferred with rationale in + `STATE.a2ml`. +* Tier-1 has run green on main for ≥7 days. + +`docs/handover/TODO.md` adds two operational pre-requisites (P0, P1). + +These five criteria are unpacked below. + +== Criterion 1 — Wave-1 (Tier-1, every PR) + +Wave-1 is the every-PR Tier-1 matrix (9 backends): coq, agda, lean4, +isabelle, z3, cvc5, eprover, alt-ergo, why3. + +[%header,cols="2,1,3"] +|=== +| Sub-criterion | Status | Notes + +| Wave-1 manifest exists +| ✅ DONE +| `manifests/live-provers.scm` + `.github/workflows/live-provers.yml` + (b022bf4, 2026-04-18). + +| Wave-1 binaries reachable on CI +| ✅ DONE +| 9 backends present; verified locally 2026-04-19 (18/18 tests pass). + +| Wave-1 green on main for ≥7 days +| ⚠ UNKNOWN +| `STATE.a2ml [session-…l3-triage] wave-1-status = "UNKNOWN — run data + inaccessible"`. Needs a `gh run list --workflow live-provers.yml + --branch main --status success` count over the last 7 days. Cannot + be ticked without that evidence. +|=== + +**To close:** run the gh query, confirm ≥7 consecutive successful +nightly invocations on main, and amend `STATE.a2ml` to record the +window. + +== Criterion 2 — Wave-2 (Tier-2, nightly) + +Wave-2 is the nightly Tier-2 matrix (10 backends): idris2, isabelle, +dafny, fstar, tlaps, vampire, gnatprove, glpk, spass, minizinc. (HOL +Light deferred to Wave-3.) + +[%header,cols="2,1,3"] +|=== +| Sub-criterion | Status | Notes + +| Wave-2 installers landed (local) +| ✅ DONE +| Real provisioning commands shipped 2026-04-19 (9a4aeeb + 6717b12). + 18/18 local tests pass; 13 real-version returns; 5 auto-skipped. + +| Wave-2 installers green in CI +| ⚠ UNVERIFIED +| TODO.md P0 row: "Wave-2 installers (idris2 / isabelle / dafny / + fstar / tlaps) are local-pass but CI-unverified." Likely failure + modes documented: Isabelle 500 MB download timeout, tlapm release + URL drift, `fstar.exe` symlink resolution, apt mirror changes. + +| HOL Light explicitly deferred to Wave-3 +| ✅ DONE +| Deferred + rationale (no prebuilt binary, opam build ~20 min + + camlp5). Recorded in STATE.a2ml. +|=== + +**To close:** wait for the next `0 3 * * *` UTC nightly to fire and +fix any red matrix cells in-place. Then run the 7-day green window +check (Criterion 1, sub-row 3) against the Tier-2 matrix. + +== Criterion 3 — Wave-3 (Tier-3, weekly) + +Wave-3 is the weekly Tier-3 container matrix (9 backends): tamarin, +proverif, imandra, scip, or-tools, hol4, acl2, twelf, metamath. + +[%header,cols="2,1,3"] +|=== +| Sub-criterion | Status | Notes + +| Wave-3 containers landed +| ✅ DONE +| PR #73 (2026-05-18) consolidated `.containerization/Containerfile.wave3` — + one shared `rust-builder`, `--target` per prover. + +| All non-proprietary backends runtime-smoke-verified REAL +| ✅ DONE +| 8/8 confirmed by exercising the binary in the built image (not by + trusting the build log). Five silent-stub failures fixed at-source + during verification: scip dead-download, tamarin dead-download, + metamath missing libc6-dev, or-tools missing build-number, twelf/acl2 + baked build paths, proverif opam-solve. + +| Imandra documented as honest licence-gated stub +| ✅ DONE +| Real Rust adapter + Idris2 ABI proofs retained; `IMANDRA_TOKEN` + secret path documented; KEEP decision recorded 2026-05-18. + +| Container CI workflow green +| ⚠ UNVERIFIED +| `container-ci.yml` weekly tier3-container matrix is allow-fail. No + recent green-run count on main. + +| PR #74 (proverif opam) resolved +| ✅ DONE +| Closed by the 2026-05-18 source-tarball pivot. +|=== + +**To close:** verify `container-ci.yml` weekly Tier-3 runs are not +silently red-but-allow-fail; record green-run count in `STATE.a2ml`. + +== Criterion 4 — Wave-4 (Tier-4, quarterly) + +Wave-4 is the quarterly allow-fail Tier-4 list (19 backends): Mizar, +Nuprl, PVS, Minlog, Dedukti, Arend, KeY, Prism, UPPAAL, ViPER, NuSMV, +Spin, TLC, CBMC, Seahorn, dReal, Boogie, Kissat, Alloy. + +[%header,cols="2,1,3"] +|=== +| Sub-criterion | Status | Notes + +| Per-backend mock-only rationale in STATE.a2ml +| ✅ DONE +| `STATE.a2ml [wave-4-rationale]` carries one-line rationale per + backend (1ca9862, 2026-04-26). + +| Scaffold placeholder in `live-provers.yml` +| ✅ DONE +| Tier-4 placeholder job announces the list. + +| At least one Wave-4 quarterly run executed +| ⚠ UNVERIFIED +| No evidence in STATE.a2ml that the placeholder has ever fired. + Possibly cron-pinned but never visited. +|=== + +**To close:** check that the quarterly cron has fired at least once +since scaffolding, OR explicitly defer Wave-4 with a STATE.a2ml note +"Wave-4 is scaffold-only and acceptable in the L3 hand-off — to be +revisited when a maintainer contributes a Containerfile". + +Both readings satisfy the L3 prompt's phrasing +("...all four waves land *or are explicitly deferred with rationale* +in STATE.a2ml"). The choice is the owner's. + +== Criterion 5 — Tier-1 green on main for ≥7 days + +Same as Criterion 1, sub-row 3 — broken out here because the L3 prompt +states it as a top-level acceptance criterion. + +[%header,cols="2,1,3"] +|=== +| Sub-criterion | Status | Notes + +| Run data accessible from a host with `gh` + auth +| ❌ TODO +| `STATE.a2ml [session-…l3-triage]` blocked on this. Local sessions + have lacked `gh` + GITHUB_TOKEN. + +| Tier-1 nightly green count ≥7 over last 7 days on main +| ❌ TODO +| Requires the above. Suggested query: ++ +[source,shell] +---- +gh run list \ + --workflow live-provers.yml \ + --branch main \ + --status success \ + --limit 14 \ + --json conclusion,startedAt +---- + +| Record green window in STATE.a2ml +| ❌ TODO +| Single key, e.g. `l3-tier1-green-window = "2026-MM-DD .. 2026-MM-DD"`. +|=== + +== Criterion 6 — Operational pre-requisites + +From `docs/handover/TODO.md`: + +[%header,cols="2,1,3"] +|=== +| Sub-criterion | Status | Notes + +| `guix shell -m manifests/live-provers.scm -- just test-live` + end-to-end on a dev machine +| ❌ TODO +| Stated as L3 acceptance in the prompt. Has not been run end-to-end + on a fresh dev host (Guix-extend was rejected; sealed container is + the fallback for the not-in-Guix tail). + +| Dafny stub-shape upgrade (165 LoC → deep) +| ⚠ UNKNOWN +| Stated as L3 acceptance in the prompt. No commit found that closes + this; needs an audit pass against `src/rust/provers/dafny.rs`. + +| VeriSimDB integration emits live-run records +| ⚠ PARTIAL +| Per `feedback_verisimdb_policy`, write paths exist in + `dispatch.rs::emit_live_result` (2026-04-26). CI workflow gated on + upstream `verisimdb#ghcr-publish.yml` — runbook YAML kept but not + merged upstream. Tracked separately by audit item A5. +|=== + +== Summary + +[%header,cols="1,4,1"] +|=== +| Criterion | Description | Status + +| 1 | Wave-1 (every-PR Tier-1) green ≥7 days | ⚠ UNKNOWN (data not collected) +| 2 | Wave-2 (nightly Tier-2) green ≥1 cycle | ⚠ UNVERIFIED +| 3 | Wave-3 (weekly Tier-3) green ≥1 cycle | ⚠ UNVERIFIED +| 4 | Wave-4 (quarterly Tier-4) executed or explicitly deferred | ⚠ owner choice +| 5 | Tier-1 green ≥7 days on main (data) | ❌ TODO +| 6a | `guix shell` local end-to-end | ❌ TODO +| 6b | Dafny stub upgrade | ⚠ UNKNOWN +| 6c | VeriSimDB live-record emission | ⚠ PARTIAL (gated on verisimdb#ghcr) +|=== + +**Minimum sufficient path to close the gate:** + +. Run the `gh run list` query for the last 14 nightlies of + `live-provers.yml` on main; record the green window in + `STATE.a2ml`. (Closes Criteria 1, 2, 5.) +. Same query against `container-ci.yml` for Tier-3 weekly. + (Closes Criterion 3 sub-row 4.) +. Owner decision on Wave-4: execute one quarterly run OR formally + defer with a STATE.a2ml note. (Closes Criterion 4 sub-row 3.) +. One dev-host run of `guix shell -m manifests/live-provers.scm -- + just test-live`. (Closes Criterion 6a.) +. Audit `src/rust/provers/dafny.rs` against the "deep wiring" bar + used for Coq / Lean4. If it falls short, file a follow-up; if it + meets the bar, record that in STATE.a2ml. (Closes Criterion 6b.) +. (Criterion 6c stays parked on `verisimdb#ghcr-publish.yml` — that + is upstream-blocked and not on this gate's critical path.) + +Estimated effort: **one session** if the gh tokens are available. +Without gh access, Criteria 1/2/3/5 cannot close, full stop. + +== Cross-references + +* `docs/handover/L3-LIVE-PROVER-CI-PROMPT.md` — original L3 mission. +* `docs/handover/TODO.md` — P0 / P1 rows referenced above. +* `docs/handover/L1-CAPNPROTO-PROMPT.md` — the prompt that fires + once this gate closes. +* `docs/handover/L2-CHAPEL-PROMPT.md` — blocked behind L1 + this gate. +* C12 PR (echidnabot manifest) — per-repo opt-in shape for consumers. +* C14 PR (cross-repo proof DAG design) — what L1's Cap'n Proto schemas + must carry. +* C15 PR (gitbot-fleet hypatia wiring) — the closed-loop signal the + L3 gate depends on for VeriSimDB record quality. +* B7 PR (HP backend correction) — confirms the HP ecosystem is not + blocking L3.