Rectify: Channel B Resume-Boundary False-Fire by Trecek · Pull Request #3387 · TalonT-Org/AutoSkillit

Trecek · 2026-05-31T03:49:48Z

Summary

Channel B's _session_log_monitor has no resume-boundary awareness. Phase 2 unconditionally initializes scan_pos = 0 and last_size = 0, causing it to read the entire JSONL file — including content from the original session that already contains the completion marker. On resumed sessions (claude --resume), this produces a false ChannelBStatus.COMPLETION that kills the session in ~14 seconds with 0 tokens.

The architectural weakness is that _session_log_monitor treats every JSONL file as if it were freshly created, with no concept of a content boundary between prior and current session activity. The fix makes Phase 2's scan position a function of file state at discovery time, eliminating the entire class of stale-content false-fires.

Closes #3360

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/remediation-20260530-172850-168963/.autoskillit/temp/rectify/rectify_channel_b_resume_boundary_2026-05-30_172850_part_a.md

🤖 Generated with Claude Code via AutoSkillit

Token Usage Summary

Step	Model	count	uncached	output	cache_read	peak_ctx	turns	cache_write	time
rectify*	opus[1m]	1	93	37.3k	3.0M	151.4k	241	173.3k	24m 29s
review_approach*	sonnet	1	4.7k	8.3k	199.6k	52.0k	133	40.1k	7m 52s
dry_walkthrough*	opus	2	2.4k	34.5k	1.3M	80.6k	164	97.2k	15m 39s
implement*	sonnet	2	144	7.9k	650.5k	47.0k	73	32.7k	8m 16s
audit_impl*	sonnet	2	1.0k	13.2k	3.8M	72.4k	142	247.5k	15m 18s
prepare_pr*	sonnet	1	94.2k	3.3k	193.0k	27.6k	22	40.5k	1m 12s
compose_pr*	sonnet	1	37.9k	1.4k	162.6k	27.6k	14	15.5k	41s
review_pr*	sonnet	3	506	125.8k	4.0M	119.0k	207	329.1k	29m 32s
resolve_review*	opus	3	195	48.3k	5.9M	91.2k	219	273.9k	32m 48s
Total			141.2k	280.0k	19.2M	151.4k		1.2M	2h 15m

* Step used a non-Anthropic provider; caching behavior may differ.

Token Efficiency

Step	LoC Changed	cache_read/LoC	cache_write/LoC	output/LoC
rectify	0	—	—	—
review_approach	0	—	—	—
dry_walkthrough	0	—	—	—
implement	195	3335.9	167.8	40.3
audit_impl	0	—	—	—
prepare_pr	0	—	—	—
compose_pr	0	—	—	—
review_pr	0	—	—	—
resolve_review	47	124837.1	5827.4	1028.0
Total	242	79375.1	5164.7	1157.1

Model Usage Breakdown

Model	steps	uncached	output	cache_read	cache_write	time
opus[1m]	1	93	37.3k	3.0M	173.3k	24m 29s
sonnet	6	138.4k	159.9k	9.0M	705.5k	1h 2m
opus	2	2.6k	82.8k	7.2M	371.0k	48m 27s

Trecek

AutoSkillit PR Review — Verdict: approved_with_comments

Trecek · 2026-05-31T04:44:36Z

+        last_size = len(_initial_content.encode("utf-8"))
+    except OSError:
+        logger.warning(
+            "session_log_phase2_init_read_failed",


[warning] bugs: scan_pos is a Unicode char count (len of decoded str) while last_size is a UTF-8 byte count. The debug log at L310-L311 emits both as if comparable, but when the file contains multi-byte characters they diverge silently. Add a comment clarifying the units or use explicit variable names (scan_pos_chars / last_size_bytes).

Valid observation — flagged for design decision. scan_pos (char count for read_text slicing) and last_size (byte count for stat comparison) intentionally track different units for different consumers. Renaming variables or adding unit-suffix comments is a design decision for a follow-up.

Trecek · 2026-05-31T04:44:36Z

        assert result.session_id == ""


+class TestResumeBoundary:


[warning] cohesion: TestResumeBoundary is added to test_process_session_log_monitor.py only. test_process_channel_b.py exercises the same scan_pos boundary through subprocess helpers but has no explicit resume-boundary assertion class — a regression in the channel-B integration path would not surface as a named boundary-test failure.

Valid observation — flagged for design decision. Adding a parallel TestResumeBoundary class in test_process_channel_b.py would exercise the same scan_pos boundary through subprocess helpers. This is a meaningful test infrastructure decision beyond the scope of this bugfix PR.

Trecek

AutoSkillit review: warning-only findings detected. See inline comments — no blocking changes required.

Trecek

Observations accumulated from 2 local review rounds:

Trecek · 2026-05-31T04:47:12Z

+        f.flush()
+    # Delay must exceed session_id_timeout + Phase 1 poll so Phase 2 initializes
+    # scan_pos from discovery boundary before the marker arrives.
+    time.sleep(1.0)


Observation from local review round 0:

Timing-based synchronization between subprocess time.sleep(1.0) and the async monitor poll cadence is fragile under xdist -n 4 load. A readiness sentinel (flag file) would make ordering guaranteed rather than probabilistic.

Evidence: The sleep(1.0) is intentionally generous (1.0s vs actual constraint of ~0.02s), designed to avoid flakiness under WSL2+xdist load. Replacing with a flag-file sentinel is a meaningful redesign beyond PR scope.

Trecek · 2026-05-31T04:47:12Z

+        f.write(json.dumps(init) + "\\n")
+        f.flush()
+    # Delay must exceed session_id_timeout + Phase 1 poll so Phase 2 initializes
+    # scan_pos from discovery boundary before the marker arrives.


Observation from local review round 1:

Hard-coded 1.0s sleep to exceed session_id_timeout + Phase 1 poll is a timing assumption. The actual timeout values passed to the monitor are not visible here to confirm 1.0s is sufficient under heavy xdist load.

Evidence: Same underlying concern classified DISCUSS in round 0 (line 38). The 1.0s sleep is intentionally generous (actual constraint ~0.02s), designed to absorb WSL2+xdist event-loop jitter. Replacing with flag-file sentinel is a meaningful redesign beyond PR scope.

Trecek · 2026-05-31T04:47:12Z

+        f.flush()
+    # Delay must exceed _session_id_timeout + Phase 1 poll so the monitor
+    # discovers the file and initializes Phase 2 before the marker arrives.
+    time.sleep(1.0)


Observation from local review round 1:

1.0s sleep repeated across multiple script variants with the same comment. A shared helper or named constant would reduce fragility.

Evidence: The 1.0s values live inside textwrap.dedent subprocess script strings, making extraction to a named constant non-trivial. This is a test infrastructure design decision beyond PR scope.

On resumed sessions (claude --resume), the JSONL file already contains records from the prior session including the completion marker. Previously, Phase 2 initialized scan_pos=0 and last_size=0 causing an immediate false-fire COMPLETION result within ~14 seconds with 0 tokens. Initialize both scan_pos (character offset) and last_size (byte size) from the file's state at Phase 1 discovery time so Phase 2 only scans content written after monitoring began, eliminating stale-content false-fires. Adds TestResumeBoundary with three regression tests covering: - pre-existing completion marker must not false-fire (was failing) - new marker appended after monitoring starts still fires correctly - stale threshold triggers correctly with no new writes (was failing)

Tests that pre-populated JSONL files with completion markers and expected immediate COMPLETION now append the marker after monitoring starts, matching the new Phase 2 behavior where scan_pos starts at the file's discovery boundary. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Subprocess scripts in test_process_channel_b.py now write an initial non-marker record before appending the completion marker, ensuring the marker arrives after Phase 2 initialization. Heartbeat orphan detection tests in test_process_heartbeat.py now append records after monitoring starts rather than pre-populating. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…hannel B tests Scripts now delay writing the completion marker by 1-2s (up from 0.15s) to ensure Phase 2 initializes scan_pos from the discovery boundary before the marker arrives. Tests that don't write stdout reduce _session_id_timeout to 0.01s so Phase 1 starts promptly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Derive last_size from the same read_text() content used for scan_pos, closing the TOCTOU window where a write between read_text() and stat() could cause the poll loop to never detect the completion marker. Add logger.warning on OSError fallback to scan_pos=0 so operators can distinguish a transient read failure from intentional boundary-skip. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The comment referenced _phase1_poll=1.0 but the test passes 0.01 after commit b28e328. Update to match the actual constraint. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Move json import to module level, removing three inline occurrences in TestResumeBoundary. Revise docstring of test_phase2_no_spurious_read_on_preexisting_content to match what is actually asserted (STALE status, not no-read invariant). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…or handler The comment about "two reads in separate try/except blocks" was left stale after commit edf7f1d merged them into one block. Removed the misleading comment. Added exc_info=True to the OSError handler so errno/strerror are preserved in production logs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…tatus Replace isinstance(result_box[0].status, ChannelBStatus) — which is trivially true for any status value — with an assertion checking the expected COMPLETION status. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… test The marker is written after 0.3s, so 5s provides ample headroom while avoiding a 30s CI hang if the marker write fails. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Reorder the _session_id_timeout docstring so the JSONL marker timing comes first, followed by the session ID extraction constraint, matching the logical flow of the test setup. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Trecek force-pushed the channel-b-false-fires-on-stale-completion-marker-when-resumi/3360 branch 2 times, most recently from 1139207 to 3ce98db Compare May 31, 2026 04:33

Trecek commented May 31, 2026

View reviewed changes

Trecek and others added 11 commits May 30, 2026 21:54

fix(review): correct stale comment about _phase1_poll value

bcf1a9d

The comment referenced _phase1_poll=1.0 but the test passes 0.01 after commit b28e328. Update to match the actual constraint. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix(review): reduce stale_threshold from 30s to 5s in resume boundary…

00fe3a2

… test The marker is written after 0.3s, so 5s provides ample headroom while avoiding a 30s CI hang if the marker write fails. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Trecek force-pushed the channel-b-false-fires-on-stale-completion-marker-when-resumi/3360 branch from 3ce98db to 1c01bf9 Compare May 31, 2026 04:54

Trecek added this pull request to the merge queue May 31, 2026

Merged via the queue into develop with commit 6d58d85 May 31, 2026
3 checks passed

Trecek deleted the channel-b-false-fires-on-stale-completion-marker-when-resumi/3360 branch May 31, 2026 05:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rectify: Channel B Resume-Boundary False-Fire#3387

Rectify: Channel B Resume-Boundary False-Fire#3387
Trecek merged 11 commits into
developfrom
channel-b-false-fires-on-stale-completion-marker-when-resumi/3360

Trecek commented May 31, 2026 •

edited

Loading

Uh oh!

Trecek left a comment

Uh oh!

Uh oh!

Uh oh!

Trecek May 31, 2026

Uh oh!

Trecek May 31, 2026

Uh oh!

Uh oh!

Uh oh!

Trecek May 31, 2026

Uh oh!

Trecek May 31, 2026

Uh oh!

Trecek left a comment

Uh oh!

Trecek left a comment

Uh oh!

Trecek May 31, 2026

Uh oh!

Trecek May 31, 2026

Uh oh!

Trecek May 31, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Trecek commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Implementation Plan

Token Usage Summary

Token Efficiency

Model Usage Breakdown

Uh oh!

Trecek left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Trecek May 31, 2026

Choose a reason for hiding this comment

Uh oh!

Trecek May 31, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Trecek May 31, 2026

Choose a reason for hiding this comment

Uh oh!

Trecek May 31, 2026

Choose a reason for hiding this comment

Uh oh!

Trecek left a comment

Choose a reason for hiding this comment

Uh oh!

Trecek left a comment

Choose a reason for hiding this comment

Uh oh!

Trecek May 31, 2026

Choose a reason for hiding this comment

Uh oh!

Trecek May 31, 2026

Choose a reason for hiding this comment

Uh oh!

Trecek May 31, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Trecek commented May 31, 2026 •

edited

Loading