feat(chapel): L2.3 cancel-token preemption#154
Merged
Conversation
…s self-SIGKILL on CAS win Wave-2 follow-on to PR #146. Closes the explicitly-deferred "kill in-flight losers" gap from L2.2: once the CAS-winner is selected, every still-polling loser observes the shared `cancelToken.cancelled` flag on its next poll tick and SIGKILLs its own subprocess instead of running to its per-prover timeout. Loser wall is now bounded by `pollInterval ≈ 100 ms` rather than `max(losers)`. Changes: - `parallel_proof_search.chpl`: - New `class CancelToken { var cancelled: atomic bool }` — class (not record) so that all `coforall` tasks share the same instance by reference. - `tryProver` signature gains `cancelToken: borrowed CancelToken? = nil`. Existing callers (sequential, best-of) pass nil and behave identically. - tryProver's poll loop now checks `cancelToken!.cancelled.read()` each tick; on observing true, SIGKILLs the subprocess and returns a preempted `ProofResult` with `exitCode = -5` (distinct from -3 timeout / -4 subprocess error). - `parallelProofSearchSpeculative` creates `new owned CancelToken()` and pairs the successful `winnerIdx.compareAndSwap` with `cancelToken.cancelled.write(true)` so only the SAME prover that wins the CAS triggers the signal (no double-write). - `proofs/agda/ParallelSoundness.agda`: - Header docstring extended to spell out that the existing `cancellation-safety` theorem IS the L2.3 preemption claim: once a head witness is true, the verdict is true regardless of the tail. The L2.3 code change preserves this — preempted losers land in the result table with success=false but are never returned because winnerIdx is set before any cancellation could race the CAS. - No new Agda lemma — the existing theorem already covers L2.3. - `docs/decisions/2026-05-30-chapel-l23-cancel-token.md`: - ADR addendum to chapel-rehabilitation.md covering the decision rationale, the soundness argument, three race-condition counter-arguments, expected speedup, reproduction recipe, and Wave-3 follow-ups (real-corpus bench, preemption telemetry, reason-string on the token). - Cross-link from rehab ADR's Wave-2 plan to the new ADR. Verification: `chpl --library --static -o /tmp/lib_test parallel_proof_search.chpl chapel_ffi_exports.chpl -I ../zig_ffi` compiles cleanly. `just bench-chapel-mrr` still returns correct speculative winners; the trivial fixtures don't exercise the new code path enough to show a wall-clock improvement (the bench writeup predicted this — real corpus is Wave-3). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 30, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Wave-2 follow-on to PR #146. Closes the explicitly-deferred "kill
in-flight losers" gap from L2.2: once the CAS-winner is selected,
every still-polling loser observes the shared `cancelToken.cancelled`
flag on its next poll tick and SIGKILLs its own subprocess instead of
running to its per-prover timeout.
After this PR, loser wall is bounded by `pollInterval ≈ 100 ms`
rather than `max(losers)`.
What changed
`parallel_proof_search.chpl`
(not record) so all `coforall` tasks share by reference.
Sequential / best-of callers pass nil and behave identically.
each tick; on observing true, SIGKILL → preempted result with
`exitCode = -5` (distinct from -3 timeout / -4 subprocess error).
`winnerIdx.compareAndSwap` with `cancelToken.cancelled.write(true)`
so only the prover that wins the CAS triggers the signal.
`proofs/agda/ParallelSoundness.agda`
`cancellation-safety` theorem IS the L2.3 preemption claim.
No new lemma needed — `speculative (true ∷ bs) = true` for
arbitrary `bs` is exactly what preemption preserves.
ADR at `docs/decisions/2026-05-30-chapel-l23-cancel-token.md`
covering rationale, soundness, three race-condition counter-arguments,
expected speedup, reproduction recipe, Wave-3 follow-ups.
Cross-link from the rehab ADR's Wave-2 plan to the new ADR.
Test plan
🤖 Generated with Claude Code