[FIX] Rectify: Cross-Campaign Reaper Kills Actively Executing Dispatches #3396

Merged
Trecek merged 4 commits into develop from cross-campaign-reaper-kills-actively-executing-dispatches-no/3355
May 31, 2026

@Trecek (Collaborator) commented May 31, 2026

Summary

The dispatch reaper (_dispatch_reaper.py) kills actively-executing dispatches from other campaigns because it has no awareness of execution activity. Its existing guards cannot distinguish an "orphaned process from a crashed campaign" from an "actively-executing process doing real work."

The architectural solution is a dispatch-level heartbeat sidecar (.heartbeat file co-located with dispatch state files) combined with a reaper activity gate that checks the heartbeat mtime before killing. The heartbeat file is written by _run_dispatch() via a new _dispatch_heartbeat async context manager, is touch()ed every 30s, and is deleted on normal completion.

Closes #3355

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/remediation-20260530-231830-883536/.autoskillit/temp/rectify/rectify_cross_campaign_reaper_immunity_2026-05-30_233000.md

🤖 Generated with Claude Code via AutoSkillit

Token Usage Summary

| Step | Model | count | uncached | output | cache_read | peak_ctx | turns | cache_write | time |
|---|---|---|---|---|---|---|---|---|---|
| rectify* | opus[1m] | 1 | 7.7k | 20.3k | 2.0M | 116.5k | 266 | 106.6k | 19m 47s |
| review_approach* | sonnet | 1 | 52 | 8.5k | 208.7k | 53.7k | 105 | 42.5k | 7m 21s |
| dry_walkthrough* | opus | 1 | 636 | 10.5k | 1.6M | 89.5k | 101 | 128.7k | 4m 41s |
| implement* | sonnet | 1 | 2.1k | 18.2k | 2.3M | 90.9k | 102 | 75.5k | 5m 52s |
| audit_impl* | sonnet | 1 | 78 | 9.7k | 276.6k | 41.8k | 44 | 36.7k | 5m 48s |
| prepare_pr* | sonnet | 1 | 80.5k | 3.6k | 166.8k | 27.8k | 21 | 40.8k | 1m 18s |
| compose_pr* | sonnet | 1 | 57.2k | 1.5k | 163.9k | 27.8k | 15 | 15.5k | 43s |
| review_pr* | sonnet | 1 | 1.8k | 34.9k | 1.2M | 87.5k | 88 | 72.3k | 9m 39s |
| resolve_review* | opus | 1 | 54 | 8.1k | 838.8k | 53.8k | 43 | 63.8k | 5m 25s |
| Total | | | 150.3k | 115.5k | 8.8M | 116.5k | | 582.3k | 1h 0m |

* Step used a non-Anthropic provider; caching behavior may differ.

Token Efficiency

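| Step | LoC Changed | cache_read/LoC | cache_write/LoC | output/LoC |
|---|---|---|---|---|
| rectify | 0 | | | |
| review_approach | 0 | | | |
| dry_walkthrough | 0 | | | |
| implement | 372 | 6169.6 | 203.0 | 48.8 |
| audit_impl | 0 | | | |
| prepare_pr | 0 | | | |
| compose_pr | 0 | | | |
| review_pr | 0 | | | |
| resolve_review | 9 | 93197.2 | 7084.1 | 905.1 |
| Total | 381 | 23110.2 | 1528.4 | 303.0 |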

Model Usage Breakdown

| Model | steps | uncached | output | cache_read | cache_write | time |
|---|---|---|---|---|---|---|
| opus[1m] | 1 | 7.7k | 20.3k | 2.0M | 106.6k | 19m 47s |
| sonnet | 6 | 141.9k | 76.5k | 4.3M | 283.3k | 30m 43s |
| opus | 2 | 690 | 18.6k | 2.5M | 192.4k | 10m 6s |

Trecek and others added 4 commits May 31, 2026 00:36
…hes from reaper

Introduces a dispatch-level heartbeat file (dispatch-{id}.heartbeat) co-located
with dispatch state files so the cross-campaign reaper can detect active execution
across process boundaries. The reaper now skips identity-confirmed dispatches whose
heartbeat mtime is within the configurable grace period (default 90s), eliminating
the kill vector against actively-executing dispatches from other campaigns.

- _dispatch_reaper.py: add _is_dispatch_heartbeating() helper and heartbeat_grace_seconds
  gate inside if identity_confirmed: before dry_run check; add heartbeat_grace_seconds
  param to both reap_stale_dispatches() and reap_stale_dispatches_async()
- _api.py: add _dispatch_heartbeat() async context manager (write/touch/cleanup) and
  nest it inside execution_marker() around dispatch_food_truck() call
- _lifespan.py: pass heartbeat_grace_seconds=90.0 to both boot call sites
- tests: add 4 new reaper heartbeat tests and 3 new dispatch marker tests;
  update 3 strict mock assertions to include heartbeat_grace_seconds kwarg

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…test mock

Two failures:
1. fleet/_api.py used path.write_text() for heartbeat file creation, violating
   the test_no_direct_write_text_in_src architectural rule. Replaced with
   atomic_write() from core.io.
2. test_run_dispatch_heartbeat_mtime_is_fresh monkeypatched dispatch_food_truck
   with a mock returning None, causing AttributeError on skill_result.subtype.
   Mock now returns a proper SkillResult.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Cross-package submodule imports are prohibited by REQ-IMP-001/002.
atomic_write is re-exported from core/__init__.pyi.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ispatch_heartbeating

Aligns the private helper parameter name with its public-facing callers
(reap_stale_dispatches, reap_stale_dispatches_async) for naming symmetry
across the call chain.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Trecek Trecek force-pushed the cross-campaign-reaper-kills-actively-executing-dispatches-no/3355 branch from e2642ab to 4f82e0c Compare May 31, 2026 07:36
@Trecek Trecek added this pull request to the merge queue May 31, 2026
Merged via the queue into develop with commit af2bb8f May 31, 2026
3 checks passed
@Trecek Trecek deleted the cross-campaign-reaper-kills-actively-executing-dispatches-no/3355 branch May 31, 2026 07:47