[FIX] Session-scoped git write detection and FS sentinel disambiguation#3398

Merged
Trecek merged 4 commits into develop from write-evidence-detection-git-writes-detected-false-positive/3372
May 31, 2026

@Trecek Trecek commented May 31, 2026

Summary

Two complementary false-signal bugs in the write-evidence detection subsystem share a root cause pattern: pre/post comparisons without proper baseline handling. Bug 1 (git_writes_detected false positive) uses a global git state check instead of a session-scoped delta. Bug 2 (fs_writes_detected false negative) silently skips directories created mid-session because the pre-scan omits them and the post-scan conflates "missing key" with "scan error" via a shared None sentinel.

This is at least the fifth fix to this subsystem. The architectural solution introduces session-scoped baselines and sentinel disambiguation, following the CloneSnapshot/detect_contamination pattern already proven in clone_guard.py.

Part B will cover broader regression test hardening and the discriminated union for DirSnapshot — implement as a separate task.
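
As a rough illustration of what a discriminated union for DirSnapshot could look like, here is a minimal sketch. The class names `Scanned`, `Missing`, and `ScanFailed` and both helper functions are hypothetical, chosen for illustration only; the actual Part B design may differ.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Scanned:
    """The directory existed and was scanned successfully."""
    entries: Dict[str, int]  # relative path -> mtime in nanoseconds


@dataclass(frozen=True)
class Missing:
    """The directory did not exist when the scan ran."""


@dataclass(frozen=True)
class ScanFailed:
    """The scan raised an error, so no baseline is available."""
    reason: str


# Hypothetical union type: each state is its own class instead of
# overloading None or {} with more than one meaning.
DirSnapshot = Union[Scanned, Missing, ScanFailed]


def can_compare(pre: DirSnapshot) -> bool:
    """Only a failed scan makes a pre/post comparison meaningless."""
    return not isinstance(pre, ScanFailed)


def baseline_entries(pre: DirSnapshot) -> Dict[str, int]:
    """Return the entries to diff against; a missing dir is an empty baseline."""
    if isinstance(pre, Scanned):
        return pre.entries
    if isinstance(pre, Missing):
        return {}
    raise ValueError(f"no baseline available: {pre.reason}")
```

With explicit variants, "missing directory" and "scan error" can no longer be confused by a truthiness or `is None` check.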

Requirements

Two complementary false-signal bugs in the write-evidence detection subsystem:

  1. git_writes_detected false positive: _detect_branch_divergence() reports True for every session on any branch already ahead of origin/HEAD, regardless of whether the session committed anything. On a branch 977 commits ahead, every session gets git_writes_detected=true unconditionally.

  2. fs_writes_detected false negative: When a skill creates its output directory mid-session (e.g., mkdir -p .autoskillit/temp/review-approach/), the pre-session _stat_snapshot scan skips it (directory doesn't exist yet), and the post-session comparison short-circuits on the missing key. fs_writes_detected remains false despite confirmed file writes.

Both bugs share a root cause pattern: the pre/post comparison model lacks proper baseline handling for pre-existing state.

Closes #3372

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/remediation-20260530-232504-860060/.autoskillit/temp/rectify/rectify_write_evidence_false_signals_2026-05-30_233500.md

🤖 Generated with Claude Code via AutoSkillit

Token Usage Summary

| Step | Model | count | uncached | output | cache_read | peak_ctx | turns | cache_write | time |
|---|---|---|---|---|---|---|---|---|---|
| rectify* | opus[1m] | 1 | 70 | 22.4k | 1.8M | 98.6k | 221 | 88.2k | 17m 49s |
| review_approach* | sonnet | 1 | 9.8k | 7.4k | 154.5k | 61.6k | 99 | 67.4k | 5m 7s |
| dry_walkthrough* | opus | 1 | 57 | 9.9k | 1.4M | 83.9k | 170 | 67.4k | 6m 45s |
| implement* | sonnet | 1 | 358 | 40.1k | 3.2M | 100.1k | 125 | 84.6k | 12m 31s |
| audit_impl* | sonnet | 1 | 78 | 10.4k | 270.5k | 38.9k | 71 | 34.3k | 5m 45s |
| prepare_pr* | sonnet | 1 | 76.4k | 3.8k | 224.9k | 33.2k | 23 | 27.1k | 1m 28s |
| compose_pr* | sonnet | 1 | 70.4k | 1.8k | 191.7k | 27.8k | 16 | 15.5k | 48s |
| review_pr* | sonnet | 1 | 150 | 59.7k | 1.1M | 102.7k | 59 | 87.6k | 13m 43s |
| resolve_review* | opus | 1 | 41 | 7.7k | 1.4M | 73.2k | 45 | 57.2k | 8m 10s |
| Total | | | 157.4k | 163.2k | 9.8M | 102.7k | | 529.2k | 1h 12m |

* Step used a non-Anthropic provider; caching behavior may differ.

Token Efficiency

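| Step | LoC Changed | cache_read/LoC | cache_write/LoC | output/LoC |
|---|---|---|---|---|
| rectify | 0 | | | |
| review_approach | 0 | | | |
| dry_walkthrough | 0 | | | |
| implement | 280 | 11470.9 | 302.1 | 143.3 |
| audit_impl | 0 | | | |
| prepare_pr | 0 | | | |
| compose_pr | 0 | | | |
| review_pr | 0 | | | |
| resolve_review | 7 | 196844.4 | 8176.4 | 1099.0 |
| Total | 287 | 34016.5 | 1844.0 | 568.8 |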

Model Usage Breakdown

| Model | steps | uncached | output | cache_read | cache_write | time |
|---|---|---|---|---|---|---|
| opus[1m] | 1 | 70 | 22.4k | 1.8M | 88.2k | 17m 49s |
| sonnet | 6 | 157.2k | 123.3k | 5.2M | 316.5k | 39m 23s |
| opus | 2 | 98 | 17.5k | 2.8M | 124.6k | 14m 56s |

Trecek and others added 4 commits May 31, 2026 01:00
Add failing tests for two bugs in the write-evidence detection subsystem:
- T-ZW-GIT-FP: git_writes_detected false positive on pre-diverged branch
- T-WE-FS-FN: fs_writes_detected false negative on late-created watch directory
- T-WE-FS-SENTINEL: sentinel disambiguation (None/OSError vs {}/missing-dir)

Also updates TestGitWritesClonePath docstring to clarify it tests
_detect_branch_divergence in isolation (not the production call path).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Bug 1 fix (git_writes_detected false positive):
- Add _detect_session_git_writes(cwd, pre_session_sha) to _headless_git.py.
  Compares pre/post HEAD SHAs — a session-scoped delta. A branch that was
  already ahead of origin before the session returns False correctly.
- Replace _detect_branch_divergence call site in _headless_execute.py.
- Update __init__.py facade re-export (_detect_branch_divergence remains
  in _headless_git.py for existing tests that import it directly).
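
The session-scoped delta described above can be sketched roughly as follows. This is not the code from _headless_git.py: the names `read_head_sha` and `detect_session_git_writes` and the injectable `head_reader` parameter are illustrative assumptions, and returning False when either SHA is unavailable is also an assumption about the intended behaviour.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional


def read_head_sha(cwd: Path) -> Optional[str]:
    """Return the current HEAD SHA in cwd, or None if git cannot resolve it."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=cwd,
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # git is not installed, or cwd does not exist.
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


def detect_session_git_writes(
    cwd: Path,
    pre_session_sha: Optional[str],
    head_reader: Callable[[Path], Optional[str]] = read_head_sha,
) -> bool:
    """Report True only if HEAD moved during the session.

    Assumption: with no usable baseline or no readable post-session HEAD,
    the function reports False rather than guessing.
    """
    if pre_session_sha is None:
        return False  # no baseline was captured
    post_session_sha = head_reader(cwd)
    if post_session_sha is None:
        return False  # HEAD could not be read after the session
    # How far the branch is ahead of origin plays no part in the answer.
    return post_session_sha != pre_session_sha
```

Capturing `read_head_sha(cwd)` before the session and passing it in afterwards means a branch that was already 977 commits ahead still yields False when the session itself commits nothing.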

Bug 2 fix (fs_writes_detected false negative on late-created dirs):
- Pre-scan loop now stores {} for watch dirs that don't exist at session
  start, distinct from None (OSError). The post-scan guard (pre is not None)
  passes for {}, so session-created directories with files are detected.
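
A minimal sketch of the sentinel disambiguation, assuming a snapshot is a plain dict keyed by relative path. The functions `stat_snapshot`, `pre_scan`, `fs_writes_detected`, and `demo` are illustrative stand-ins, not the code in _headless_execute.py.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, List, Optional, Tuple

Snapshot = Dict[str, Tuple[int, int]]  # relative path -> (mtime_ns, size)


def stat_snapshot(root: Path) -> Snapshot:
    """Record mtime and size for every file under root (may raise OSError)."""
    snap: Snapshot = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = Path(dirpath) / name
            st = path.stat()
            snap[str(path.relative_to(root))] = (st.st_mtime_ns, st.st_size)
    return snap


def pre_scan(watch_dirs: List[Path]) -> Dict[Path, Optional[Snapshot]]:
    """Build the session baseline, keeping 'missing' and 'error' distinct."""
    baseline: Dict[Path, Optional[Snapshot]] = {}
    for d in watch_dirs:
        if not d.is_dir():
            baseline[d] = {}  # absent at session start: valid, empty baseline
            continue
        try:
            baseline[d] = stat_snapshot(d)
        except OSError:
            baseline[d] = None  # scan error: no trustworthy baseline
    return baseline


def fs_writes_detected(
    baseline: Dict[Path, Optional[Snapshot]], watch_dirs: List[Path]
) -> bool:
    """Compare post-session state against the baseline for each watch dir."""
    for d in watch_dirs:
        pre = baseline.get(d)
        if pre is None:
            continue  # the pre-scan failed, so there is nothing to compare
        try:
            post = stat_snapshot(d) if d.is_dir() else {}
        except OSError:
            continue
        if post != pre:
            return True
    return False


def demo() -> Tuple[bool, bool]:
    """Return detection results before and after a mid-session mkdir + write."""
    with tempfile.TemporaryDirectory() as tmp:
        watch = Path(tmp) / "review-approach"
        baseline = pre_scan([watch])
        before = fs_writes_detected(baseline, [watch])
        watch.mkdir(parents=True)
        (watch / "report.md").write_text("findings")
        after = fs_writes_detected(baseline, [watch])
        return before, after
```

Because `{}` passes the `pre is None` guard, a directory created mid-session is compared against an empty baseline and its new files register as writes, while a genuine scan error still opts out of the comparison.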

Test 1D: add unit tests for _detect_session_git_writes (all four code paths).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… else branch

The session-scoped git detection and FS sentinel disambiguation fix added 4 lines
to _headless_execute.py (else branch for missing watch dirs). Update the arch
budget from 595 to 600.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ndex

Replace fragile call_count index with a `raised` flag that fires OSError
on the first _stat_snapshot call matching the target watch_dir. More
robust if call ordering or number of watch_dirs changes.
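
The flag-based approach might look something like the sketch below. The factory name `make_failing_snapshot` and its parameters are hypothetical; the actual test presumably patches _stat_snapshot through its own fixtures.

```python
from pathlib import Path
from typing import Callable, Dict


def make_failing_snapshot(
    real_snapshot: Callable[[Path], Dict], target: Path
) -> Callable[[Path], Dict]:
    """Wrap a snapshot function so its first call on `target` raises OSError.

    The trigger is keyed on the path argument rather than on a call counter,
    so adding or reordering watch dirs does not change which call fails.
    """
    raised = False

    def fake_snapshot(path: Path) -> Dict:
        nonlocal raised
        if path == target and not raised:
            raised = True
            raise OSError("simulated scan failure")
        return real_snapshot(path)

    return fake_snapshot
```

A test would install the returned function in place of the real snapshot helper (for example with `monkeypatch.setattr` in pytest) and then assert that the target directory's baseline ends up as the error sentinel.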

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Trecek Trecek force-pushed the write-evidence-detection-git-writes-detected-false-positive/3372 branch from 3752308 to 6e4d972 May 31, 2026 08:00
@Trecek Trecek added this pull request to the merge queue May 31, 2026
Merged via the queue into develop with commit 3ee3b40 May 31, 2026
3 checks passed
@Trecek Trecek deleted the write-evidence-detection-git-writes-detected-false-positive/3372 branch May 31, 2026 08:12