Rectify: Channel B Resume-Boundary False-Fire#3387
Conversation
1139207 to
3ce98db
Compare
Trecek
left a comment
There was a problem hiding this comment.
AutoSkillit PR Review — Verdict: approved_with_comments
| last_size = len(_initial_content.encode("utf-8")) | ||
| except OSError: | ||
| logger.warning( | ||
| "session_log_phase2_init_read_failed", |
There was a problem hiding this comment.
[warning] bugs: scan_pos is a Unicode char count (len of decoded str) while last_size is a UTF-8 byte count. The debug log at L310-L311 emits both as if comparable, but when the file contains multi-byte characters they diverge silently. Add a comment clarifying the units or use explicit variable names (scan_pos_chars / last_size_bytes).
There was a problem hiding this comment.
Valid observation — flagged for design decision. scan_pos (char count for read_text slicing) and last_size (byte count for stat comparison) intentionally track different units for different consumers. Renaming variables or adding unit-suffix comments is a design decision for a follow-up.
| assert result.session_id == "" | ||
|
|
||
|
|
||
| class TestResumeBoundary: |
There was a problem hiding this comment.
[warning] cohesion: TestResumeBoundary is added to test_process_session_log_monitor.py only. test_process_channel_b.py exercises the same scan_pos boundary through subprocess helpers but has no explicit resume-boundary assertion class — a regression in the channel-B integration path would not surface as a named boundary-test failure.
There was a problem hiding this comment.
Valid observation — flagged for design decision. Adding a parallel TestResumeBoundary class in test_process_channel_b.py would exercise the same scan_pos boundary through subprocess helpers. This is a meaningful test infrastructure decision beyond the scope of this bugfix PR.
Trecek
left a comment
There was a problem hiding this comment.
AutoSkillit review: warning-only findings detected. See inline comments — no blocking changes required.
Trecek
left a comment
There was a problem hiding this comment.
Observations accumulated from 2 local review rounds:
| f.flush() | ||
| # Delay must exceed session_id_timeout + Phase 1 poll so Phase 2 initializes | ||
| # scan_pos from discovery boundary before the marker arrives. | ||
| time.sleep(1.0) |
There was a problem hiding this comment.
Observation from local review round 0:
Timing-based synchronization between subprocess time.sleep(1.0) and the async monitor poll cadence is fragile under xdist -n 4 load. A readiness sentinel (flag file) would make ordering guaranteed rather than probabilistic.
Evidence: The sleep(1.0) is intentionally generous (1.0s vs actual constraint of ~0.02s), designed to avoid flakiness under WSL2+xdist load. Replacing with a flag-file sentinel is a meaningful redesign beyond PR scope.
| f.write(json.dumps(init) + "\\n") | ||
| f.flush() | ||
| # Delay must exceed session_id_timeout + Phase 1 poll so Phase 2 initializes | ||
| # scan_pos from discovery boundary before the marker arrives. |
There was a problem hiding this comment.
Observation from local review round 1:
Hard-coded 1.0s sleep to exceed session_id_timeout + Phase 1 poll is a timing assumption. The actual timeout values passed to the monitor are not visible here to confirm 1.0s is sufficient under heavy xdist load.
Evidence: Same underlying concern classified DISCUSS in round 0 (line 38). The 1.0s sleep is intentionally generous (actual constraint ~0.02s), designed to absorb WSL2+xdist event-loop jitter. Replacing with flag-file sentinel is a meaningful redesign beyond PR scope.
| f.flush() | ||
| # Delay must exceed _session_id_timeout + Phase 1 poll so the monitor | ||
| # discovers the file and initializes Phase 2 before the marker arrives. | ||
| time.sleep(1.0) |
There was a problem hiding this comment.
Observation from local review round 1:
1.0s sleep repeated across multiple script variants with the same comment. A shared helper or named constant would reduce fragility.
Evidence: The 1.0s values live inside textwrap.dedent subprocess script strings, making extraction to a named constant non-trivial. This is a test infrastructure design decision beyond PR scope.
On resumed sessions (claude --resume), the JSONL file already contains records from the prior session including the completion marker. Previously, Phase 2 initialized scan_pos=0 and last_size=0 causing an immediate false-fire COMPLETION result within ~14 seconds with 0 tokens. Initialize both scan_pos (character offset) and last_size (byte size) from the file's state at Phase 1 discovery time so Phase 2 only scans content written after monitoring began, eliminating stale-content false-fires. Adds TestResumeBoundary with three regression tests covering: - pre-existing completion marker must not false-fire (was failing) - new marker appended after monitoring starts still fires correctly - stale threshold triggers correctly with no new writes (was failing)
Tests that pre-populated JSONL files with completion markers and expected immediate COMPLETION now append the marker after monitoring starts, matching the new Phase 2 behavior where scan_pos starts at the file's discovery boundary. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Subprocess scripts in test_process_channel_b.py now write an initial non-marker record before appending the completion marker, ensuring the marker arrives after Phase 2 initialization. Heartbeat orphan detection tests in test_process_heartbeat.py now append records after monitoring starts rather than pre-populating. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…hannel B tests Scripts now delay writing the completion marker by 1-2s (up from 0.15s) to ensure Phase 2 initializes scan_pos from the discovery boundary before the marker arrives. Tests that don't write stdout reduce _session_id_timeout to 0.01s so Phase 1 starts promptly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Derive last_size from the same read_text() content used for scan_pos, closing the TOCTOU window where a write between read_text() and stat() could cause the poll loop to never detect the completion marker. Add logger.warning on OSError fallback to scan_pos=0 so operators can distinguish a transient read failure from intentional boundary-skip. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The comment referenced _phase1_poll=1.0 but the test passes 0.01 after commit b28e328. Update to match the actual constraint. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move json import to module level, removing three inline occurrences in TestResumeBoundary. Revise docstring of test_phase2_no_spurious_read_on_preexisting_content to match what is actually asserted (STALE status, not no-read invariant). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…or handler The comment about "two reads in separate try/except blocks" was left stale after commit edf7f1d merged them into one block. Removed the misleading comment. Added exc_info=True to the OSError handler so errno/strerror are preserved in production logs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tatus Replace isinstance(result_box[0].status, ChannelBStatus) — which is trivially true for any status value — with an assertion checking the expected COMPLETION status. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… test The marker is written after 0.3s, so 5s provides ample headroom while avoiding a 30s CI hang if the marker write fails. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reorder the _session_id_timeout docstring so the JSONL marker timing comes first, followed by the session ID extraction constraint, matching the logical flow of the test setup. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
3ce98db to
1c01bf9
Compare
Summary
Channel B's
_session_log_monitorhas no resume-boundary awareness. Phase 2 unconditionally initializesscan_pos = 0andlast_size = 0, causing it to read the entire JSONL file — including content from the original session that already contains the completion marker. On resumed sessions (claude --resume), this produces a falseChannelBStatus.COMPLETIONthat kills the session in ~14 seconds with 0 tokens.The architectural weakness is that
_session_log_monitortreats every JSONL file as if it were freshly created, with no concept of a content boundary between prior and current session activity. The fix makes Phase 2's scan position a function of file state at discovery time, eliminating the entire class of stale-content false-fires.Closes #3360
Implementation Plan
Plan file:
/home/talon/projects/autoskillit-runs/remediation-20260530-172850-168963/.autoskillit/temp/rectify/rectify_channel_b_resume_boundary_2026-05-30_172850_part_a.md🤖 Generated with Claude Code via AutoSkillit
Token Usage Summary
* Step used a non-Anthropic provider; caching behavior may differ.
Token Efficiency
Model Usage Breakdown