diff --git a/docs/decisions/0015-backend-file-lock-primitive.md b/docs/decisions/0015-backend-file-lock-primitive.md new file mode 100644 index 00000000..b96f4d7b --- /dev/null +++ b/docs/decisions/0015-backend-file-lock-primitive.md @@ -0,0 +1,125 @@ + + + +# 15. Backend-enforced file-lock primitive — spike + +Date: 2026-05-24 + +## Status + +Deferred (2026-05-24) — ADR-0016 (cross-host federation stop-gap) was +chosen as the next build over this one because it closes a more +user-visible survey gap without altering the verified backend core. +The bridge-layer advisory path-claims shipped in PR #142/#143 remain +the current answer for in-flight conflict signalling. + +This ADR stays on file as the design-of-record for a backend-enforced +lock primitive should "advisory warning is not enough" become a stated +requirement. Reopen by flipping to "Proposed" and scheduling alongside +the next P-0x proof-obligation cycle. + +## Context + +The multi-agent MCP survey identified that two comparator servers +(`rinadelph/Agent-MCP`, `AndrewDavidRivers/multi-agent-coordination-mcp`) +ship **hard file locks** at claim time, whereas `local-coord-mcp` has +only the bridge-layer **advisory** path-claims added in PR #142 / PR #143. +The survey marked file-level locks as a clear gap relative to those two. + +This spike evaluates promoting path-claims from "bridge-only advisory +warning" to a **backend-enforced lock primitive** in the verified Idris2 +ABI + Zig FFI. + +## What the change does + +1. Extend the `LocalCoord` Idris2 ABI with a new tool surface: + `coord_lock_paths(token, task, paths[])` and + `coord_unlock_paths(token, task)`. Paths are interned, normalised, and + the backend maintains an authoritative `task → segment[][]` map + alongside the existing claim map. +2. The lock check runs **inside** `coord_claim_task` when `paths` is + present: if any declared path segment-overlaps an existing locked + path held by another peer, the claim is **rejected** (not annotated). + Today's bridge-layer overlap scan becomes a projection of the + backend's authoritative state. +3. New proof obligation **P-08: LockSoundness** in + `cartridges/local-coord-mcp/abi/LocalCoord/Locks.idr`, discharged by + construction: + - **Mutual exclusion** — no two distinct tasks simultaneously hold + overlapping paths (segment-prefix-disjoint). + - **Lock-claim composition** — a granted claim's path-locks survive + until `coord_unlock_paths` or watchdog expiry; never silently + released by a different peer. + - **Watchdog interaction** — when the claim's role-based TTL fires + (P-03 WatchdogTermination), the lock-set is released atomically + with the claim. +4. Cartridge schema bump: `cartridge.json` adds the two new tools and + widens `coord_claim_task` to declare `path_locked` as a rejection + reason in its output schema. Coherence test (`dispatch_test.js`) + already enforces bridge↔cartridge sync, so the bridge tool list and + dispatcher must update in lockstep. + +## Cost + +| Surface | Work | +|---|---| +| Idris2 ABI | New `Locks.idr` module + P-08 discharge. Roughly 200-300 lines of Idris2; the segment-prefix proof reduces to list-prefix induction on `List String`, which is constructive (no new `believe_me`). | +| Zig FFI | `boj_coord_lock_paths` / `boj_coord_unlock_paths` exports + an internal `PathLockTable` (radix tree or sorted-array; flat array is fine at expected scale). ~150 LOC. | +| Bridge | Demote `path-claims.js` to a stateless projection — query the backend's lock table on demand rather than maintaining its own. Or delete it. | +| Cartridge schema | Two new tool entries; one output-shape widening on `coord_claim_task`. | +| Tests | New Idris2 totality proofs (P-08), Zig unit tests, bridge integration tests, bench update. | + +Estimated: 2-4 working days end-to-end. + +## Benefit + +- Closes the survey gap **completely**: backend-enforced locks rather + than an advisory layer. Strongest answer. +- Idris2 proof gives a guarantee neither Agent-MCP nor multi-agent-coord + has — they ship runtime-checked locks, we'd ship a *proved-correct* + lock primitive. Aligns with the AAA Formal posture (P-01..P-07). +- The bridge-layer code shrinks (path-claims.js largely becomes a + pass-through projection), which is a healthy direction. + +## Risks / open questions + +- **Architectural regression risk.** Today's design is explicit: the + verified backend stays minimal and the bridge layers advisory + features. Adding state to the backend (path-locks) is the first time + a non-task-id concept enters the proved core. P-08 must not weaken + the existing proofs — needs careful composition argument. +- **Schema-evolution lock-in.** Once `coord_lock_paths` is in the + cartridge manifest, removing it is a breaking change. Path-semantics + (segment-prefix matching) gets baked into the wire contract. +- **Defines vs. enforces.** What does the backend do when a *non-path* + claim conflicts with a locked path? The current advisory layer is + silent in that case; an enforcer must take a position. +- **Failure modes change.** Today a clashing path is a *warning*; under + this proposal it's a *claim rejection*. Existing rate-limit (5 + rejections / 10 min → 30s cooldown) applies; either way, agents need + retry logic. + +## Verdict + +Strongest technical answer to the survey gap, but the largest +architectural commitment in this branch of work. The proof load +(P-08) is tractable — segment-prefix mutual exclusion is structurally +inductive — but the *design* commitment (locks as a first-class backend +concept) deserves the user's explicit sign-off, not an implementer's +judgement call. + +**Recommendation:** Choose this only if "advisory warning is not +enough" is a stated requirement. Otherwise, prefer ADR-0016 +(cross-host federation stop-gap), which closes a different and arguably +more user-visible survey gap (Ruflo being the only comparator with real +cross-host story) without altering the verified core. + +## Out of scope + +- File-range locks (byte ranges within a file). Not addressed — paths + are still whole-file. The Perforce-style range lock would be a + separate, larger ADR. +- Lock fairness / queueing on contention. This spike rejects on + overlap; a queue would be a P-09 extension. +- Cross-host locks. Locks here are still localhost-bound; combining + with ADR-0010 or ADR-0016 is future work. diff --git a/docs/decisions/0016-mtls-federation-stopgap.md b/docs/decisions/0016-mtls-federation-stopgap.md new file mode 100644 index 00000000..ec090b57 --- /dev/null +++ b/docs/decisions/0016-mtls-federation-stopgap.md @@ -0,0 +1,144 @@ + + + +# 16. mTLS + ed25519 federation stop-gap — spike + +Date: 2026-05-24 + +## Status + +Accepted (2026-05-24) — chosen over ADR-0015 because it closes the +more user-visible survey gap (Ruflo as the only comparator with a +cross-host story) and is additive: loopback bus and Idris2-verified +intra-host ABI stay untouched. Positioned as the v1 transport; +ADR-0010 (DID + ML-DSA-87 + ML-KEM-1024 + federated quarantine) +remains the v2 upgrade path. + +**Implementation plan:** phased rollout, each phase its own PR off +`main` after this ADR lands. + +| Phase | Scope | Estimated | +|---|---|---| +| 1 | Identity foundation — ed25519 keypair, pubkey export, `known_peers.toml` parser, `coord-tui --print-pubkey` flag. No transport. | ~1 day | +| 2 | Envelope sign/verify on the loopback bus (round-trip validation before any wire work). | ~1 day | +| 3 | TLS listener on `:7746` + mTLS handshake gated by `known_peers.toml`. | ~1-2 days | +| 4 | New `coord_add_peer` / `coord_remove_peer` / `coord_list_remote_peers` tools — bridge + cartridge.json sync. | ~0.5 day | +| 5 | Idris2 `Federation.idr` — envelope-boundary signature soundness obligation. | ~1 day | +| 6 | Tests — two-instance round-trip, signature tampering, replay protection. | ~0.5 day | + +## Context + +The multi-agent MCP survey identified `ruvnet/ruflo` as the only +comparator shipping a real cross-host story (mTLS + ed25519 federation +across machines). `local-coord-mcp` is currently localhost-only +(`127.0.0.1:7745`) — closing this gap is one of the two outstanding +items from PR #143's spike. + +ADR-0010 already proposes the *ambitious* answer: DID-based identity, +ML-DSA-87 signing, ML-KEM-1024 key exchange, federated quarantine. It +sits at "Proposed (RFC)" status, tracked under epic #87 item 3, and is +a substantial cross-cartridge build effort. + +This spike evaluates a deliberately **smaller** stop-gap: ship +Ruflo-equivalent cross-host federation using mTLS + ed25519 *now*, +without blocking on the post-quantum work in ADR-0010. The two are +compatible: this becomes the v1 transport, ADR-0010 becomes the v2 +upgrade path. + +## What the change does + +1. **Transport.** Add a TLS-terminating endpoint alongside the loopback + bus — `https://:7746/coord/federated/inbox` for inbound + federated envelopes. Loopback bus (`:7745`) stays the default for + intra-machine traffic and is untouched. +2. **Identity.** Each peer generates an ed25519 keypair on first run + (stored under `~/.cache/coord-tui/peer.key`). The public key is the + peer's federated identity; the private key signs outbound envelopes. + No DID method, no PKI hierarchy — leaf-only trust. +3. **Peer trust.** A new file `~/.config/coord-tui/known_peers.toml` + maps peer-id → `{host, port, pubkey}`. Federation only works between + peers that have explicitly added each other (manual key exchange via + `coord-tui --print-pubkey` and a shared channel). No discovery; no + registry. +4. **mTLS.** Both ends of the federated link present client certs + derived from their ed25519 identity keys (SPKI-pinned, not + CA-signed). Connection fails closed if the presented pubkey doesn't + match `known_peers.toml`. +5. **Envelope wrapping.** Outbound coord messages destined for a remote + peer are signed (ed25519 over the canonical envelope bytes), wrapped + in `{from, to, signature, payload}`, sent over the mTLS connection. + Receiver verifies signature against `known_peers.toml`, then injects + into the local bus as if it had arrived locally. +6. **New tools.** `coord_add_peer(peer_id, host, port, pubkey)`, + `coord_remove_peer(peer_id)`, `coord_list_remote_peers()`. No tool + schema change to the existing `coord_send` / `coord_claim_task` + surface — federation is transport-layer. + +## Cost + +| Surface | Work | +|---|---| +| Zig adapter | TLS listener (use `std.crypto.tls`), ed25519 sign/verify (`std.crypto.sign.Ed25519` — already available), known_peers parser, envelope wrap/unwrap. ~400-600 LOC. | +| Idris2 ABI | New `Federation.idr` — but only proof obligations on the *envelope* boundary (signature verification soundness), not on the transport. ~150 lines; reuses existing `SafeHTTP` patterns. Class (J) believe_me may be unavoidable for `std.crypto.tls` opaque primitives — same axiom posture as the existing `Char`/`String` primitives. | +| Bridge | New 3 `coord_*` tool entries in `tools.js` + cartridge.json; dispatcher routes them to the backend. No envelope-shape changes. | +| coord-tui | `--print-pubkey` flag, `~/.config/coord-tui/known_peers.toml` reader, optional `coord-add-peer` shell helper. | +| Tests | Round-trip test on two backend instances on different ports; signature-tampering rejection test; replay-protection (envelope timestamp + nonce). | + +Estimated: 4-6 working days end-to-end. + +## Benefit + +- Closes the survey's "cross-host transport" gap directly: matches + Ruflo's posture, doesn't claim more. +- Additive — the existing loopback bus and Idris2-verified intra-host + ABI are untouched. No proof regression. +- Sets up ADR-0010 as a clean v2 upgrade (swap ed25519→ML-DSA-87, + add ML-KEM, add DID resolution) once that work is prioritised. + Wire format is the same shape; only the cryptographic primitives + change. +- Real user-visible feature — agents on Jonathan's workstation can + coordinate with agents on a build server, a sandbox VM, or a + teammate's machine. + +## Risks / open questions + +- **Trust bootstrap is manual.** Exchanging pubkeys via a shared + channel is the right answer for a v1 (it's how SSH known_hosts + works), but it does mean federation has a per-peer setup cost. ADR- + 0010's DID + cartridge-index resolution removes this; v1 doesn't. +- **No federated quarantine.** Tier-2+ envelopes from a remote peer + hit the *local* master for review — no upstream supervision model. + This is a feature gap relative to ADR-0010, not a bug, but worth + naming. +- **Crypto agility.** ed25519 in a post-quantum world has a finite + shelf life. Estate standards already commit to ML-DSA-87. This + spike's stop-gap framing is honest about that: v1 lasts until ADR- + 0010 lands, not forever. CHANGELOG and README should say so. +- **Idris2 transport-layer proofs.** TLS internals are opaque to the + Idris2 backend-assurance harness — same axiomatisation posture as + the existing 5 `believe_me` sites. Net new principled assumption, + not a regression. + +## Verdict + +Smaller scope than ADR-0015, lower architectural risk than ADR-0010, +and directly closes the only survey gap where `local-coord-mcp` +clearly trails a comparator. The "this is explicitly a stop-gap" +framing keeps the door open for ADR-0010 without making it a +blocker. + +**Recommendation:** Prefer this over ADR-0015 if the goal is "ship a +visible new capability that the survey identified as missing". +Combine with ADR-0015 only if file-locks are a stated user need; +otherwise ADR-0015 is gold-plating relative to the survey's actual +findings. + +## Out of scope + +- DID identity (ADR-0010, v2 upgrade). +- Post-quantum crypto (ADR-0010, v2 upgrade). +- Discovery / registry (deliberate — manual known_peers keeps the + trust story legible). +- Federated quarantine semantics (would land alongside DID work). +- Wire format negotiation for v1↔v2 transition — addressed when + ADR-0010 implementation starts.