diff --git a/FEATURE_REBUILD/ObservedBehavior.md b/FEATURE_REBUILD/ObservedBehavior.md index 6e5583a2..f3131987 100644 --- a/FEATURE_REBUILD/ObservedBehavior.md +++ b/FEATURE_REBUILD/ObservedBehavior.md @@ -33,3 +33,12 @@ Observed runtime behavior for the optional Web UI dashboard feature on source br 4. `GET /api/metrics/timeseries` keeps `{requests, errors, latencies}` arrays of `{t, v}`. 5. Audit API and export endpoints remain backward-compatible. 6. Shared metrics persistence remains process-safe. + +## Supplemental Observation Tooling + +1. `scripts/xcode_approval_harness.py` provides deterministic MCP startup probes for + `xcrun mcpbridge` or `mcpbridge-wrapper`. +2. The harness records timestamped send/receive events, idle timeouts, EOF boundaries, and + `notifications/tools/list_changed` so approval races can be reconstructed after a run. +3. Use `--pause-before-step tools-list --pause-seconds <seconds>` when you need a stable window to + click **Allow** in Xcode before discovery requests continue. diff --git a/FEATURE_REBUILD/Workplan.md b/FEATURE_REBUILD/Workplan.md index c99984dd..844a0f9c 100644 --- a/FEATURE_REBUILD/Workplan.md +++ b/FEATURE_REBUILD/Workplan.md @@ -163,6 +163,59 @@ - Rollback: - Revert release-note and packaging metadata changes only. +#### ✅ T-010 (P1): Build Xcode approval observation harness +- Status: ✅ Completed (2026-03-10) +- Deps: none +- Parallelizable with: T-009 +- Touched files: + - `scripts/xcode_approval_harness.py` + - `tests/unit/test_xcode_approval_harness.py` + - `docs/troubleshooting.md` + - `FEATURE_REBUILD/ObservedBehavior.md` +- Acceptance criteria: + - [x] Harness can execute deterministic MCP handshake scenarios against `xcrun mcpbridge` + or the wrapper command. + - [x] Harness logs timestamped send/receive events, EOF, and timeout boundaries so approval + races can be reconstructed after a run.
+ - [x] Harness can hold and replay `initialize`, `notifications/initialized`, `tools/list`, + `resources/list`, and `prompts/list` steps with configurable delays around manual Xcode + approval. + - [x] Harness records whether `notifications/tools/list_changed` is observed after approval. +- Verification commands: + - `pytest tests/unit/test_xcode_approval_harness.py -v` + - `python3 scripts/xcode_approval_harness.py --help` +- Rollback: + - Remove the harness script/tests/docs note and fall back to ad hoc manual probing. + +#### ✅ T-011 (P1): Emit synthetic broker tools/list_changed on catalog warm-up +- Status: ✅ Completed (2026-03-10) +- **Description:** Extend the broker so clients can learn that the Xcode tool catalog became + available after approval even when upstream `xcrun mcpbridge` never emits + `notifications/tools/list_changed` itself. Reuse the existing broker warm-up probes and + synthesize a client-facing `tools/list_changed` only when the cached catalog transitions from + cold to ready or materially changes after reconnect. +- **Priority:** P1 +- **Dependencies:** T-010 +- **Parallelizable:** no +- **Outputs/Artifacts:** + - `src/mcpbridge_wrapper/broker/daemon.py` + - `src/mcpbridge_wrapper/broker/transport.py` + - `tests/unit/test_broker_daemon.py` + - `tests/unit/test_broker_transport.py` + - `SPECS/INPROGRESS/T-011_Emit_synthetic_broker_tools_list_changed_on_catalog_warm-up.md` + - `SPECS/INPROGRESS/T-011_Validation_Report.md` +- **Acceptance Criteria:** + - [x] Broker emits a synthetic `notifications/tools/list_changed` when its internal cached + `tools/list` transitions from empty/unavailable to a non-empty ready catalog. + - [x] Broker re-emits the synthetic notification when reconnect produces a materially changed + non-empty tool catalog, but does not spam clients on repeated empty retry probes. + - [x] Existing `tools/list` readiness gating and cache-hit behavior remain unchanged for + clients that explicitly call `tools/list`. 
+ - [x] Unit tests cover warm-up, reconnect, and no-op retry behavior for the synthetic + notification path. + - [x] Validation notes document whether Cursor/Zed visibly react to the synthetic signal + without a manual MCP toggle. + ## Acceptance Criteria (rolled up) 1. Web UI API and dashboard contracts remain backward-compatible. diff --git a/SPECS/ARCHIVE/INDEX.md b/SPECS/ARCHIVE/INDEX.md index f9d0e307..aa863108 100644 --- a/SPECS/ARCHIVE/INDEX.md +++ b/SPECS/ARCHIVE/INDEX.md @@ -6,6 +6,8 @@ | Task ID | Folder | Archived | Verdict | |---------|--------|----------|---------| +| T-011 | [T-011_Emit_synthetic_broker_tools_list_changed_on_catalog_warm-up/](T-011_Emit_synthetic_broker_tools_list_changed_on_catalog_warm-up/) | 2026-03-10 | PASS | +| T-010 | [T-010_Build_Xcode_approval_observation_harness/](T-010_Build_Xcode_approval_observation_harness/) | 2026-03-10 | PASS | | P8-T2 | [P8-T2_Prepare_for_Release_0.4.3/](P8-T2_Prepare_for_Release_0.4.3/) | 2026-03-10 | PASS | | P1-T13 | [P1-T13_Document_stale_editable_install_version_mismatch_in_troubleshooting_guide/](P1-T13_Document_stale_editable_install_version_mismatch_in_troubleshooting_guide/) | 2026-03-10 | PASS | | P2-T8 | [P2-T8_Gate_broker_tools_list_on_warmed_tool_catalog/](P2-T8_Gate_broker_tools_list_on_warmed_tool_catalog/) | 2026-03-10 | PASS | @@ -208,6 +210,8 @@ | File | Description | |------|-------------| +| [REVIEW_t011_synthetic_broker_tools_list_changed.md](_Historical/REVIEW_t011_synthetic_broker_tools_list_changed.md) | Review report for T-011 | +| [REVIEW_t010_xcode_approval_harness.md](_Historical/REVIEW_t010_xcode_approval_harness.md) | Review report for T-010 | | [REVIEW_release_0.4.3_preparation.md](P8-T2_Prepare_for_Release_0.4.3/REVIEW_release_0.4.3_preparation.md) | Review report for P8-T2 | | [REVIEW_p1_t13_editable_install_troubleshooting.md](_Historical/REVIEW_p1_t13_editable_install_troubleshooting.md) | Review report for P1-T13 | | 
[REVIEW_p2_t8_tools_catalog_gate.md](_Historical/REVIEW_p2_t8_tools_catalog_gate.md) | Review report for P2-T8 | @@ -648,3 +652,5 @@ | 2026-03-10 | P2-T8 | Archived REVIEW_p2_t8_tools_catalog_gate report | | 2026-03-10 | P1-T13 | Archived Document_stale_editable_install_version_mismatch_in_troubleshooting_guide (PASS) | | 2026-03-10 | P1-T13 | Archived REVIEW_p1_t13_editable_install_troubleshooting report | +| 2026-03-10 | T-010 | Archived Build_Xcode_approval_observation_harness (PASS) | +| 2026-03-10 | T-010 | Archived REVIEW_t010_xcode_approval_harness report | diff --git a/SPECS/ARCHIVE/T-010_Build_Xcode_approval_observation_harness/T-010_Build_Xcode_approval_observation_harness.md b/SPECS/ARCHIVE/T-010_Build_Xcode_approval_observation_harness/T-010_Build_Xcode_approval_observation_harness.md new file mode 100644 index 00000000..9c071562 --- /dev/null +++ b/SPECS/ARCHIVE/T-010_Build_Xcode_approval_observation_harness/T-010_Build_Xcode_approval_observation_harness.md @@ -0,0 +1,139 @@ +# PRD: T-010 — Build Xcode approval observation harness + +**Task ID:** T-010 +**Priority:** P1 +**Status:** Planned +**Date:** 2026-03-10 +**Owner:** Codex + +## Problem Statement + +We need a deterministic way to observe how `xcrun mcpbridge` and `mcpbridge-wrapper` +behave around the Xcode GUI approval flow. Manual probing has shown multiple race-shaped +outcomes across Cursor and Zed, but the repo lacks a repeatable harness that can capture +the exact order and timing of MCP messages before and after the user clicks **Allow** in +Xcode. + +## Goals + +1. Provide a repo-local CLI harness that can run scripted MCP startup scenarios. +2. Capture timestamped protocol events, timeout windows, and EOF boundaries in a format + suitable for post-run analysis. +3. Make it easy to answer whether readiness appears via: + - normal `initialize` + `tools/list` responses, + - reconnect / EOF patterns, + - or late `notifications/tools/list_changed`. + +## Non-Goals + +1. 
Automate the Xcode GUI approval click itself. +2. Change broker/runtime behavior in production code. +3. Add a packaged user-facing command to the published PyPI distribution. + +## Deliverables + +1. `scripts/xcode_approval_harness.py` + - CLI entrypoint for deterministic MCP observation runs. +2. `tests/unit/test_xcode_approval_harness.py` + - Unit coverage for scenario parsing, event formatting, and timeout handling. +3. `docs/troubleshooting.md` + - Short operator note pointing to the harness for approval-race diagnostics. +4. `FEATURE_REBUILD/ObservedBehavior.md` + - Document the harness as a repeatable observation tool for external Xcode behavior. +5. `SPECS/INPROGRESS/T-010_Validation_Report.md` + - Captured quality-gate results and a brief manual smoke-run note. + +## Functional Requirements + +1. The harness MUST support launching either: + - `xcrun mcpbridge`, or + - an arbitrary command supplied after `--`. +2. The harness MUST send newline-delimited JSON-RPC messages in a deterministic sequence. +3. The harness MUST support per-step delays so runs can pause around manual Xcode approval. +4. The harness MUST timestamp every sent and received event relative to process start. +5. The harness MUST record: + - outgoing request/notification payloads, + - incoming responses/notifications, + - EOF, + - timeout markers, + - process exit code. +6. The harness MUST highlight whether `notifications/tools/list_changed` was observed. +7. The harness MUST be usable without modifying installed package metadata or editor config. + +## CLI Design + +Planned flags: + +- `--scenario <name>`: named scenario preset. +- `--step-delay <seconds>`: default delay between scripted steps. +- `--read-timeout <seconds>`: timeout for waiting on incoming lines. +- `--output <path>`: optional JSONL event log file. +- `--pretty`: also print human-readable event summary to stdout. +- `--`: optional command override; defaults to `xcrun mcpbridge`. + +Planned scenario presets: + +1. 
`approval-probe` + - `initialize` + - `notifications/initialized` + - `tools/list` + - `resources/list` + - `prompts/list` +2. `tools-only` + - `initialize` + - `notifications/initialized` + - repeated `tools/list` + +## Output Contract + +Each event line SHOULD be JSON with at least: + +```json +{ + "t_ms": 1234, + "direction": "send|recv|meta", + "event": "jsonrpc|timeout|eof|exit", + "summary": "tools/list", + "payload": {} +} +``` + +The harness SHOULD print a final summary including: + +- observed response IDs +- whether a non-empty `tools/list` was seen +- whether `notifications/tools/list_changed` was seen +- whether EOF occurred before readiness + +## Test Plan + +Automated: + +1. Parse scenario presets into the expected ordered message list. +2. Format JSONL event records deterministically. +3. Recognize `tools/list_changed`, `tools/list`, EOF, and timeout markers. +4. Validate `--help` and CLI argument parsing. + +Manual smoke: + +1. Run harness against the live `xcrun mcpbridge`. +2. Pause for Xcode approval. +3. Confirm event log is written and summary renders. + +## Verification Commands + +```bash +pytest tests/unit/test_xcode_approval_harness.py -v +python3 scripts/xcode_approval_harness.py --help +ruff check src/ tests/ scripts/ +mypy src/ +pytest --cov +``` + +## Risks + +1. Real Xcode approval timing is inherently nondeterministic; the harness can observe it + but cannot stabilize it. +2. Live smoke runs depend on a local Xcode session and cannot be asserted in CI. +3. `xcrun mcpbridge` may emit no explicit approval event, leaving the harness to infer + readiness from message timing and catalog contents. 
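The Output Contract above can be checked offline once a run completes. A minimal post-run analysis sketch in Python, assuming the JSONL event shape documented in the Output Contract (the `load_events` and `summarize` helpers are illustrative, not part of the harness itself):

```python
import json
from pathlib import Path
from typing import Any


def load_events(path: Path) -> list[dict[str, Any]]:
    """Parse one JSON event object per line, skipping blank lines."""
    return [json.loads(line) for line in path.read_text().splitlines() if line.strip()]


def summarize(events: list[dict[str, Any]]) -> dict[str, Any]:
    """Reduce a harness event log to the readiness questions listed under Goals."""
    saw_list_changed = any(
        e["direction"] == "recv" and e["summary"] == "notifications/tools/list_changed"
        for e in events
    )
    # First EOF timestamp, if any, tells us whether the stream died before readiness.
    eof_at_ms = next((e["t_ms"] for e in events if e["event"] == "eof"), None)
    timeout_count = sum(1 for e in events if e["event"] == "timeout")
    return {
        "saw_tools_list_changed": saw_list_changed,
        "eof_at_ms": eof_at_ms,
        "timeout_count": timeout_count,
    }
```

Feeding a captured log through a helper like this answers the Goals questions (normal responses vs. reconnect/EOF vs. late `list_changed`) without rereading the raw timeline by hand.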
diff --git a/SPECS/ARCHIVE/T-010_Build_Xcode_approval_observation_harness/T-010_Validation_Report.md b/SPECS/ARCHIVE/T-010_Build_Xcode_approval_observation_harness/T-010_Validation_Report.md new file mode 100644 index 00000000..12168701 --- /dev/null +++ b/SPECS/ARCHIVE/T-010_Build_Xcode_approval_observation_harness/T-010_Validation_Report.md @@ -0,0 +1,169 @@ +# Validation Report: T-010 — Build Xcode approval observation harness + +**Date:** 2026-03-10 +**Verdict:** PASS + +--- + +## Acceptance Criteria + +| # | Criterion | Status | +|---|-----------|--------| +| 1 | Harness can execute deterministic MCP handshake scenarios against `xcrun mcpbridge` or a wrapper command | ✅ PASS | +| 2 | Harness records timestamped send/receive events, EOF boundaries, and idle timeout windows | ✅ PASS | +| 3 | Harness can pause around named startup steps to create a stable manual Xcode approval window | ✅ PASS | +| 4 | Harness records whether `notifications/tools/list_changed` appears after approval | ✅ PASS | +| 5 | Unit coverage for scenario construction, parsing, summaries, and CLI behavior passes | ✅ PASS | +| 6 | Documentation points operators to the harness for approval-race investigation | ✅ PASS | + +--- + +## Evidence + +### Functional behavior + +- Added `scripts/xcode_approval_harness.py`, a repo-local CLI harness that drives scripted + MCP startup sequences against either the default `xcrun mcpbridge` target or an override + command provided after `--`. +- The harness uses the same baseline MCP protocol version (`2024-11-05`) that the current + broker path sends to Apple's bridge, so live probes stay aligned with the repo's existing + transport behavior. 
+- Implemented two deterministic scenarios: + - `approval-probe` for `initialize`, `notifications/initialized`, `tools/list`, + `resources/list`, `resources/templates/list`, and `prompts/list` + - `tools-only` for repeated `tools/list` probing after initialization +- The harness records: + - outbound JSON-RPC steps with timestamps + - inbound stdout JSON-RPC responses/notifications with parsed summaries + - stderr text lines + - idle timeout boundaries + - stdout/stderr EOF markers + - final process exit code +- Added summary extraction that reports response IDs, observed `tools/list` sizes, + whether any non-empty tool catalog was seen, whether `notifications/tools/list_changed` + appeared, and whether stdout hit EOF. + +### Regression coverage + +- `tests/unit/test_xcode_approval_harness.py` + - verifies both scenario layouts + - verifies unknown scenario rejection + - verifies stable parsing/formatting of JSON-RPC and plain-text lines + - verifies default command handling and `--` override behavior + - verifies timeout validation + - verifies summary extraction for late `notifications/tools/list_changed` + +### Documentation updates + +- `docs/troubleshooting.md` + - added a new "Protocol observation harness" note to the Xcode approval race section +- `Sources/XcodeMCPWrapper/Documentation.docc/Troubleshooting.md` + - added the matching DocC note +- `FEATURE_REBUILD/ObservedBehavior.md` + - recorded the harness as supplemental observation tooling for approval-race analysis + +### Live smoke observation + +The harness was exercised against the published broker entrypoint: + +```bash +python3 scripts/xcode_approval_harness.py \ + --pretty \ + --scenario tools-only \ + --step-delay 0.2 \ + --read-timeout 2.0 \ + --final-read-timeout 2.0 \ + --output logs/xcode-approval-harness-smoke.jsonl \ + -- \ + uvx --isolated --no-config --from mcpbridge-wrapper mcpbridge-wrapper --broker +``` + +This specific run was performed without a synchronized click on the Xcode 
**Allow** dialog, +so the captured timeline represents pre-approval startup behavior rather than post-approval +catalog recovery. + +Observed result: + +- no JSON-RPC response arrived within the configured 2-second read windows +- no `notifications/tools/list_changed` was observed +- the emitted JSONL and pretty timeline captured the startup stall precisely, which is the + intended debugging value of the harness + +Summary emitted by the harness: + +```json +{ + "events_recorded": 13, + "response_ids": [], + "saw_non_empty_tools_list": false, + "saw_stdout_eof": false, + "saw_tools_list_changed": false, + "timeout_count": 6, + "tools_list_sizes": [] +} +``` + +### Interactive synchronized approval run + +The harness was then exercised directly against `xcrun mcpbridge` with a synchronized user +click on the Xcode **Allow** dialog during the scripted pause before `tools/list`: + +```bash +python3 scripts/xcode_approval_harness.py \ + --pretty \ + --scenario approval-probe \ + --step-delay 0.2 \ + --pause-before-step tools-list \ + --pause-seconds 30 \ + --read-timeout 3.0 \ + --final-read-timeout 8.0 \ + --output logs/xcode-approval-direct.jsonl +``` + +Observed result: + +- `initialize` returned immediately before the manual approval window +- after the synchronized **Allow** click, the first `tools/list` returned a full catalog of + 20 tools +- `resources/list`, `resources/templates/list`, and `prompts/list` all returned successfully +- no `notifications/tools/list_changed` was observed at any point in the run +- stdout remained open throughout the observation window; the harness terminated the child + process itself at the end of the capture + +Summary emitted by the harness: + +```json +{ + "events_recorded": 21, + "response_ids": [1, 2, 3, 4, 5], + "saw_non_empty_tools_list": true, + "saw_stdout_eof": false, + "saw_tools_list_changed": false, + "timeout_count": 7, + "tools_list_sizes": [20] +} +``` + +--- + +## Quality Gate Results + +| Gate | Result | 
+|------|--------| +| `pytest tests/unit/test_xcode_approval_harness.py -v` | 10 passed | +| `python3 scripts/xcode_approval_harness.py --help` | PASS | +| `ruff check scripts/xcode_approval_harness.py tests/unit/test_xcode_approval_harness.py` | All checks passed | +| `ruff check src/` | All checks passed | +| `mypy src/` | Success: no issues found in 20 source files | +| `pytest --cov` | 912 passed, 5 skipped, 2 warnings; coverage 91.55% | +| `python3 scripts/check_doc_sync.py --all` | DocC sync checks passed | + +--- + +## Changed Files + +- `scripts/xcode_approval_harness.py` +- `tests/unit/test_xcode_approval_harness.py` +- `docs/troubleshooting.md` +- `Sources/XcodeMCPWrapper/Documentation.docc/Troubleshooting.md` +- `FEATURE_REBUILD/ObservedBehavior.md` diff --git a/SPECS/ARCHIVE/T-011_Emit_synthetic_broker_tools_list_changed_on_catalog_warm-up/T-011_Emit_synthetic_broker_tools_list_changed_on_catalog_warm-up.md b/SPECS/ARCHIVE/T-011_Emit_synthetic_broker_tools_list_changed_on_catalog_warm-up/T-011_Emit_synthetic_broker_tools_list_changed_on_catalog_warm-up.md new file mode 100644 index 00000000..e22def6f --- /dev/null +++ b/SPECS/ARCHIVE/T-011_Emit_synthetic_broker_tools_list_changed_on_catalog_warm-up/T-011_Emit_synthetic_broker_tools_list_changed_on_catalog_warm-up.md @@ -0,0 +1,111 @@ +# PRD: T-011 — Emit synthetic broker tools/list_changed on catalog warm-up + +**Task ID:** T-011 +**Priority:** P1 +**Status:** Planned +**Date:** 2026-03-10 +**Owner:** Codex + +## Objective + +Extend the broker so connected clients receive a synthetic +`notifications/tools/list_changed` when the broker's cached `tools/list` catalog first becomes +usable after Xcode approval, or when a later reconnect produces a materially different non-empty +catalog. The goal is to give strict MCP clients an explicit catalog-change hint even when +`xcrun mcpbridge` itself never emits that notification. + +## Context + +`T-010` established two relevant facts: + +1. 
During pre-approval startup, the broker can remain alive while no usable `tools/list` + response is yet available. +2. In a synchronized direct run against `xcrun mcpbridge`, Xcode approval surfaced readiness via + a successful `tools/list` returning 20 tools, but no `notifications/tools/list_changed` + arrived from upstream. + +The broker already has the right warm-up mechanism: an internal `initialize` probe followed by +repeated internal `tools/list` probes with bounded backoff until a non-empty catalog is cached. +This task should reuse that mechanism rather than adding a second polling loop. + +## Deliverables + +1. Broker state changes in `src/mcpbridge_wrapper/broker/daemon.py` +2. Client-notification plumbing in `src/mcpbridge_wrapper/broker/transport.py` +3. Unit coverage in: + - `tests/unit/test_broker_daemon.py` + - `tests/unit/test_broker_transport.py` +4. Validation report: + - `SPECS/INPROGRESS/T-011_Validation_Report.md` + +## Success Criteria + +1. When the broker cache transitions from cold/unavailable to a non-empty tool catalog, a single + synthetic `notifications/tools/list_changed` is broadcast to connected clients. +2. Repeated empty retry probes do not emit notifications. +3. A reconnect that yields the same non-empty catalog does not emit duplicate change + notifications. +4. A reconnect that yields a different non-empty catalog emits one new synthetic notification. +5. Existing `tools/list` readiness gating and cached-response behavior remain unchanged. + +## Test-First Plan + +Before changing runtime code, add or update tests that prove: + +1. Warm-up success triggers exactly one synthetic catalog-change notification. +2. Empty or invalid probe results do not trigger it. +3. Reconnect with unchanged catalog is silent. +4. Reconnect with changed catalog re-triggers the notification. +5. Transport broadcast formatting matches a normal MCP notification shape and reaches all active + sessions. 
+ +## Implementation Plan + +### Phase 1: Broker state transition hook + +- Inputs: + - current probe handling in `BrokerDaemon._read_upstream_loop` + - current cache fields `_tools_list_cache` and `_tools_catalog_ready` +- Outputs: + - explicit helper for deciding whether a new cached catalog represents a meaningful change + - one broker-side trigger point when synthetic notification should be emitted +- Verification: + - targeted daemon unit tests for warm-up and reconnect transitions + +### Phase 2: Transport broadcast path + +- Inputs: + - existing `UnixSocketServer._broadcast` + - MCP notification shape for `notifications/tools/list_changed` +- Outputs: + - transport helper to emit a synthetic notification to all sessions + - no change to client request/response ID handling +- Verification: + - transport unit tests asserting all clients receive the notification once + +### Phase 3: Validation and operator notes + +- Inputs: + - full broker test suite results + - manual or scripted observation notes where practical +- Outputs: + - validation report capturing quality-gate results and observed client behavior +- Verification: + - `pytest` + - `ruff check src/` + - `mypy src/` + - `pytest --cov` + +## Decision Points + +1. The notification should be synthetic only on broker-owned cache transitions, not on every + upstream `tools/list` response. +2. Catalog comparison should be based on normalized cached payload content, not object identity. +3. The broker should not try to fabricate a new `tools/list` payload for clients; it should only + emit the MCP change notification and let clients decide whether to refetch. + +## Notes + +If implementation confirms improved behavior in Cursor or Zed after approval, capture that in the +validation report and consider a short troubleshooting note follow-up only if the behavior becomes +user-visible enough to document. 
diff --git a/SPECS/ARCHIVE/T-011_Emit_synthetic_broker_tools_list_changed_on_catalog_warm-up/T-011_Validation_Report.md b/SPECS/ARCHIVE/T-011_Emit_synthetic_broker_tools_list_changed_on_catalog_warm-up/T-011_Validation_Report.md new file mode 100644 index 00000000..2dad15b0 --- /dev/null +++ b/SPECS/ARCHIVE/T-011_Emit_synthetic_broker_tools_list_changed_on_catalog_warm-up/T-011_Validation_Report.md @@ -0,0 +1,121 @@ +# Validation Report: T-011 — Emit synthetic broker tools/list_changed on catalog warm-up + +**Date:** 2026-03-10 +**Verdict:** PASS + +--- + +## Acceptance Criteria + +| # | Criterion | Status | +|---|-----------|--------| +| 1 | Broker emits one synthetic `notifications/tools/list_changed` when a cold catalog first becomes non-empty | ✅ PASS | +| 2 | Empty retry probes do not emit synthetic notifications | ✅ PASS | +| 3 | Reconnect with an unchanged non-empty catalog remains silent | ✅ PASS | +| 4 | Reconnect with a materially changed non-empty catalog emits one new notification | ✅ PASS | +| 5 | Existing readiness gating and cached `tools/list` behavior remain intact | ✅ PASS | + +--- + +## Evidence + +### Functional behavior + +- Added broker-side catalog fingerprinting so synthetic change signaling only fires on meaningful + non-empty catalog transitions. +- Reused the existing internal warm-up probe loop rather than introducing a second watcher or + slower polling path. +- Added transport support for a synthetic `notifications/tools/list_changed` broadcast. +- Queued that synthetic notification for sessions that have not yet completed the MCP lifecycle, + then flushed it immediately after the client sends `notifications/initialized`. +- Preserved the current readiness gate and cached `tools/list` fast path. 
+ +### Regression coverage + +- `tests/unit/test_broker_daemon.py` + - verifies first non-empty catalog emits one synthetic notification + - verifies empty probe results stay silent + - verifies unchanged reconnect catalogs do not re-emit + - verifies changed reconnect catalogs re-emit exactly once +- `tests/unit/test_broker_transport.py` + - verifies synthetic notifications broadcast to initialized sessions + - verifies uninitialized sessions queue the notification instead of receiving it too early + - verifies queued notification flushes immediately after `notifications/initialized` + +### Interactive broker observation + +The approval harness from `T-010` was exercised against the local broker entrypoint so the +observation used the current unmerged implementation: + +```bash +env PYTHONPATH=src python3 -m mcpbridge_wrapper --broker-stop + +python3 scripts/xcode_approval_harness.py \ + --pretty \ + --scenario approval-probe \ + --step-delay 0.2 \ + --pause-before-step tools-list \ + --pause-seconds 30 \ + --read-timeout 10.0 \ + --final-read-timeout 20.0 \ + --output logs/xcode-approval-broker-autospawn-queued.jsonl \ + -- \ + env PYTHONPATH=src python3 -m mcpbridge_wrapper --broker +``` + +Observed result: + +- the broker auto-spawned and completed `initialize` +- after the synchronized Xcode **Allow** click, the harness observed + `notifications/tools/list_changed` +- the notification arrived only after the client had sent `notifications/initialized` +- the subsequent `tools/list` returned a full catalog of 20 tools +- this confirms the broker can compensate for missing upstream `list_changed` signaling on the + approval path without regressing MCP lifecycle ordering + +Summary emitted by the harness: + +```json +{ + "events_recorded": 21, + "response_ids": [1, 2, 4, 5], + "saw_non_empty_tools_list": true, + "saw_stdout_eof": false, + "saw_tools_list_changed": true, + "timeout_count": 7, + "tools_list_sizes": [20] +} +``` + +Raw protocol trace: + +- 
`logs/xcode-approval-broker-autospawn-queued.jsonl` + +### Scope note + +This task validates protocol-level behavior inside the broker. It does not claim that every MCP +client will always refresh UI state without reconnect; client-specific reaction to +`notifications/tools/list_changed` remains the responsibility of Cursor, Zed, or any other MCP +consumer. + +--- + +## Quality Gate Results + +| Gate | Result | +|------|--------| +| `pytest tests/unit/test_broker_daemon.py tests/unit/test_broker_transport.py -q` | 116 passed | +| `ruff check src/` | All checks passed | +| `mypy src/` | Success: no issues found in 20 source files | +| `pytest` | 920 passed, 5 skipped, 2 warnings | +| `pytest --cov` | 920 passed, 5 skipped; coverage 91.57% | + +--- + +## Changed Files + +- `src/mcpbridge_wrapper/broker/daemon.py` +- `src/mcpbridge_wrapper/broker/transport.py` +- `src/mcpbridge_wrapper/broker/types.py` +- `tests/unit/test_broker_daemon.py` +- `tests/unit/test_broker_transport.py` diff --git a/SPECS/ARCHIVE/_Historical/REVIEW_t010_xcode_approval_harness.md b/SPECS/ARCHIVE/_Historical/REVIEW_t010_xcode_approval_harness.md new file mode 100644 index 00000000..75fad49c --- /dev/null +++ b/SPECS/ARCHIVE/_Historical/REVIEW_t010_xcode_approval_harness.md @@ -0,0 +1,42 @@ +## REVIEW REPORT — T-010 Xcode Approval Harness + +**Scope:** `origin/main..HEAD` +**Files:** 10 + +### Summary Verdict +- [x] Approve +- [ ] Approve with comments +- [ ] Request changes +- [ ] Block + +### Critical Issues + +- None. + +### Secondary Issues + +- None. + +### Architectural Notes + +- The harness is appropriately isolated under `scripts/`, so the task adds operational + diagnostics without changing the published wrapper runtime path. +- The final branch revision aligns the harness `initialize` payload with the broker's + existing `2024-11-05` protocol version, which keeps approval-race observations + representative of the repo's actual Apple bridge traffic. 
+- The validation report now explicitly states that the captured live smoke was performed + without a synchronized Xcode **Allow** click, so readers do not over-interpret the + resulting startup stall as post-approval behavior. + +### Tests + +- `pytest tests/unit/test_xcode_approval_harness.py -v` — PASS (`10 passed`) +- `ruff check scripts/xcode_approval_harness.py tests/unit/test_xcode_approval_harness.py` — PASS +- `ruff check src/` — PASS +- `mypy src/` — PASS +- `pytest --cov` — PASS (`912 passed, 5 skipped, 2 warnings`; coverage `91.55%`) +- `python3 scripts/check_doc_sync.py --all` — PASS + +### Next Steps + +- No actionable review findings. FOLLOW-UP is skipped. diff --git a/SPECS/ARCHIVE/_Historical/REVIEW_t011_synthetic_broker_tools_list_changed.md b/SPECS/ARCHIVE/_Historical/REVIEW_t011_synthetic_broker_tools_list_changed.md new file mode 100644 index 00000000..bf2b8b5f --- /dev/null +++ b/SPECS/ARCHIVE/_Historical/REVIEW_t011_synthetic_broker_tools_list_changed.md @@ -0,0 +1,44 @@ +## REVIEW REPORT — T-011 Synthetic Broker Tools/List Changed + +**Scope:** `1b48065..HEAD` +**Files:** 10 + +### Summary Verdict +- [x] Approve +- [ ] Approve with comments +- [ ] Request changes +- [ ] Block + +### Critical Issues + +- None. + +### Secondary Issues + +- None. + +### Architectural Notes + +- Reusing the existing broker warm-up probe loop is the correct choice. It avoids a second + polling mechanism and keeps readiness logic centralized in the daemon. +- Queueing the synthetic `notifications/tools/list_changed` until the client sends + `notifications/initialized` preserves MCP lifecycle ordering and avoids emitting a late-ready + signal into a pre-initialized session. +- Fingerprinting the non-empty catalog before broadcasting keeps reconnect behavior quiet when + the tool catalog is unchanged, which matches the task goal of signaling meaningful change + rather than probe churn. 
+- Residual interoperability risk remains client-side: some MCP clients may still ignore + `notifications/tools/list_changed` or require a manual reconnect despite the broker now + behaving correctly at the protocol layer. + +### Tests + +- `pytest tests/unit/test_broker_daemon.py tests/unit/test_broker_transport.py -q` — PASS (`116 passed`) +- `ruff check src/` — PASS +- `mypy src/` — PASS +- `pytest` — PASS (`920 passed, 5 skipped, 2 warnings`) +- `pytest --cov` — PASS (`920 passed, 5 skipped`; coverage `91.57%`) + +### Next Steps + +- No actionable review findings. FOLLOW-UP is skipped. diff --git a/SPECS/INPROGRESS/next.md b/SPECS/INPROGRESS/next.md index 323b4c9e..da125517 100644 --- a/SPECS/INPROGRESS/next.md +++ b/SPECS/INPROGRESS/next.md @@ -1,20 +1,16 @@ -# Next Task: None +# Next Task: Unselected ## Selected Task -- No active task selected. +No active task is selected in `SPECS/INPROGRESS/`. -## Description - -All tasks currently tracked in `SPECS/Workplan.md` are archived or completed. Select a new -task only after a follow-up or new workplan entry is created. - -## Outputs - -- None. +The most recently completed task was `T-011 — Emit synthetic broker tools/list_changed on catalog warm-up`, +archived on `2026-03-10` with verdict `PASS`. 
## Recently Archived +- `2026-03-10` — `T-011` archived with verdict `PASS` +- `2026-03-10` — `T-010` archived with verdict `PASS` - `2026-03-10` — `P8-T2` archived with verdict `PASS` - `2026-03-10` — `P1-T13` archived with verdict `PASS` - `2026-03-10` — `P2-T8` archived with verdict `PASS` diff --git a/Sources/XcodeMCPWrapper/Documentation.docc/Troubleshooting.md b/Sources/XcodeMCPWrapper/Documentation.docc/Troubleshooting.md index fb8e045c..c5c04777 100644 --- a/Sources/XcodeMCPWrapper/Documentation.docc/Troubleshooting.md +++ b/Sources/XcodeMCPWrapper/Documentation.docc/Troubleshooting.md @@ -72,6 +72,20 @@ claude mcp add --transport stdio xcode -- uvx --from 'mcpbridge-wrapper[webui]' tail -50 "$HOME/.mcpbridge_wrapper/broker.log" | grep -E "EOF|reconnect|ready|tools" ``` +**Protocol observation harness:** To capture the raw MCP sequence around approval, run: + +```bash +python3 scripts/xcode_approval_harness.py \ + --pretty \ + --pause-before-step tools-list \ + --pause-seconds 10 \ + --output logs/xcode-approval.jsonl +``` + +The harness records timestamped send/receive events, idle timeouts, EOF boundaries, and +whether `notifications/tools/list_changed` appears after approval. To observe wrapper or +broker behavior instead of direct `xcrun mcpbridge`, pass the target command after `--`. + ## Zed reconnects with "Context server request timeout" after first approval If Zed first shows a green connection with 0 tools and later turns red with diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md index 67d2a365..046904f3 100644 --- a/docs/troubleshooting.md +++ b/docs/troubleshooting.md @@ -104,6 +104,30 @@ tail -50 "$HOME/.mcpbridge_wrapper/broker.log" | grep -E "EOF|reconnect|ready|to After a successful approval and client reload you should see no new EOF entries and the upstream should remain stable. 
+**Protocol observation harness:** If you need raw MCP evidence instead of editor UI symptoms, +run the repo-local harness and inspect its JSONL output: + +```bash +python3 scripts/xcode_approval_harness.py \ + --pretty \ + --pause-before-step tools-list \ + --pause-seconds 10 \ + --output logs/xcode-approval.jsonl +``` + +The harness records timestamped send/receive events, idle timeouts, EOF boundaries, and whether +`notifications/tools/list_changed` is ever observed after the Xcode dialog is approved. To +observe wrapper or broker behavior instead of direct `xcrun mcpbridge`, pass the target command +after `--`: + +```bash +python3 scripts/xcode_approval_harness.py \ + --pretty \ + --output logs/broker-approval.jsonl \ + -- \ + uvx --isolated --no-config --from mcpbridge-wrapper mcpbridge-wrapper --broker +``` + **Zed-specific escalation — green `0 tools` becomes red `Context server request timeout`:** If Zed first shows a green connection with 0 tools and later turns red with diff --git a/scripts/xcode_approval_harness.py b/scripts/xcode_approval_harness.py new file mode 100644 index 00000000..c275428a --- /dev/null +++ b/scripts/xcode_approval_harness.py @@ -0,0 +1,633 @@ +#!/usr/bin/env python3 +"""Observe MCP startup behavior around the Xcode approval dialog.""" + +from __future__ import annotations + +import argparse +import json +import queue +import subprocess +import threading +import time +from dataclasses import dataclass +from pathlib import Path +from typing import Any + +DEFAULT_COMMAND = ["xcrun", "mcpbridge"] +DEFAULT_PROTOCOL_VERSION = "2024-11-05" +EOF_MARKER = object() + + +@dataclass(frozen=True) +class ScenarioStep: + """One deterministic outbound MCP message.""" + + name: str + payload: dict[str, Any] + + +@dataclass(frozen=True) +class Event: + """One recorded harness event.""" + + t_ms: int + direction: str + event: str + summary: str + payload: dict[str, Any] | None = None + line: str | None = None + + def to_dict(self) -> dict[str, Any]: 
+ """Return a stable JSON-serializable representation.""" + data: dict[str, Any] = { + "t_ms": self.t_ms, + "direction": self.direction, + "event": self.event, + "summary": self.summary, + } + if self.payload is not None: + data["payload"] = self.payload + if self.line is not None: + data["line"] = self.line + return data + + +def build_scenario(name: str) -> list[ScenarioStep]: + """Return the ordered JSON-RPC sequence for a named scenario.""" + initialize = { + "jsonrpc": "2.0", + "id": 1, + "method": "initialize", + "params": { + "protocolVersion": DEFAULT_PROTOCOL_VERSION, + "capabilities": {}, + "clientInfo": {"name": "xcode-approval-harness", "version": "1.0"}, + }, + } + initialized = { + "jsonrpc": "2.0", + "method": "notifications/initialized", + "params": {}, + } + + scenarios: dict[str, list[ScenarioStep]] = { + "approval-probe": [ + ScenarioStep("initialize", initialize), + ScenarioStep("initialized-notification", initialized), + ScenarioStep( + "tools-list", + {"jsonrpc": "2.0", "id": 2, "method": "tools/list", "params": {}}, + ), + ScenarioStep( + "resources-list", + {"jsonrpc": "2.0", "id": 3, "method": "resources/list", "params": {}}, + ), + ScenarioStep( + "resources-templates-list", + { + "jsonrpc": "2.0", + "id": 4, + "method": "resources/templates/list", + "params": {}, + }, + ), + ScenarioStep( + "prompts-list", + {"jsonrpc": "2.0", "id": 5, "method": "prompts/list", "params": {}}, + ), + ], + "tools-only": [ + ScenarioStep("initialize", initialize), + ScenarioStep("initialized-notification", initialized), + ScenarioStep( + "tools-list-1", + {"jsonrpc": "2.0", "id": 2, "method": "tools/list", "params": {}}, + ), + ScenarioStep( + "tools-list-2", + {"jsonrpc": "2.0", "id": 3, "method": "tools/list", "params": {}}, + ), + ScenarioStep( + "tools-list-3", + {"jsonrpc": "2.0", "id": 4, "method": "tools/list", "params": {}}, + ), + ], + } + try: + return scenarios[name] + except KeyError as exc: + raise ValueError(f"Unknown scenario '{name}'.") from 
exc + + +def summarize_message(msg: dict[str, Any]) -> str: + """Return a compact human-readable JSON-RPC summary.""" + method = msg.get("method") + if isinstance(method, str): + return method + + msg_id = msg.get("id") + if "result" in msg: + result = msg.get("result") + if isinstance(result, dict) and isinstance(result.get("tools"), list): + return f"result#{msg_id} tools/list ({len(result['tools'])} tools)" + return f"result#{msg_id}" + + if "error" in msg: + error = msg.get("error") + if isinstance(error, dict): + code = error.get("code") + return f"error#{msg_id} ({code})" + return f"error#{msg_id}" + + return f"message#{msg_id}" + + +def event_from_json_line( + *, + t_ms: int, + direction: str, + raw_line: str, +) -> Event: + """Parse one JSON line into an Event with a stable summary.""" + line = raw_line.rstrip("\n") + try: + payload = json.loads(line) + except json.JSONDecodeError: + return Event( + t_ms=t_ms, + direction=direction, + event="text", + summary=f"{direction}-text", + line=line, + ) + + if isinstance(payload, dict): + return Event( + t_ms=t_ms, + direction=direction, + event="jsonrpc", + summary=summarize_message(payload), + payload=payload, + line=line, + ) + + return Event( + t_ms=t_ms, + direction=direction, + event="json", + summary=f"{direction}-json", + payload={"value": payload}, + line=line, + ) + + +def format_event_pretty(event: Event) -> str: + """Return a stable one-line human-readable event string.""" + return f"[{event.t_ms:>6} ms] {event.direction.upper():<4} {event.event:<7} {event.summary}" + + +def make_parser() -> argparse.ArgumentParser: + """Build the CLI parser.""" + parser = argparse.ArgumentParser( + description="Observe xcrun mcpbridge or wrapper behavior around Xcode approval." 
+ ) + parser.add_argument( + "--scenario", + default="approval-probe", + choices=("approval-probe", "tools-only"), + help="Named outbound message sequence to run.", + ) + parser.add_argument( + "--step-delay", + type=float, + default=0.25, + help="Seconds to wait between outbound steps (default: 0.25).", + ) + parser.add_argument( + "--pause-before-step", + default=None, + help="Pause before sending the named step.", + ) + parser.add_argument( + "--pause-seconds", + type=float, + default=0.0, + help="Seconds to pause before the selected step (default: 0).", + ) + parser.add_argument( + "--read-timeout", + type=float, + default=2.0, + help="Idle-read timeout after each step in seconds (default: 2).", + ) + parser.add_argument( + "--final-read-timeout", + type=float, + default=5.0, + help="Idle-read timeout after the last step in seconds (default: 5).", + ) + parser.add_argument( + "--output", + type=Path, + default=None, + help="Optional JSONL path for recorded events.", + ) + parser.add_argument( + "--pretty", + action="store_true", + help="Print live event lines as they are recorded.", + ) + parser.add_argument( + "command", + nargs=argparse.REMAINDER, + help="Optional command override after '--'. 
Defaults to 'xcrun mcpbridge'.",
+    )
+    return parser
+
+
+def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
+    """Parse CLI arguments."""
+    args = make_parser().parse_args(argv)
+    if args.step_delay < 0 or args.pause_seconds < 0:
+        raise SystemExit("step-delay and pause-seconds must be non-negative.")
+    if args.read_timeout <= 0 or args.final_read_timeout <= 0:
+        raise SystemExit("read-timeout and final-read-timeout must be positive.")
+    command = args.command
+    if command and command[0] == "--":
+        command = command[1:]
+    args.command = command or DEFAULT_COMMAND.copy()
+    return args
+
+
+class EventRecorder:
+    """Collect and optionally persist recorded events."""
+
+    def __init__(self, *, output_path: Path | None, pretty: bool) -> None:
+        """Create a recorder with optional JSONL output and pretty-printing."""
+        self.events: list[Event] = []
+        self._output_path = output_path
+        self._pretty = pretty
+        self._output_file = None
+        if output_path is not None:
+            output_path.parent.mkdir(parents=True, exist_ok=True)
+            self._output_file = output_path.open("w", encoding="utf-8")
+
+    def close(self) -> None:
+        """Close any open output file."""
+        if self._output_file is not None:
+            self._output_file.close()
+            self._output_file = None
+
+    def record(self, event: Event) -> None:
+        """Append an event to memory, optional JSONL, and optional pretty output."""
+        self.events.append(event)
+        if self._output_file is not None:
+            self._output_file.write(json.dumps(event.to_dict(), sort_keys=True) + "\n")
+            self._output_file.flush()
+        if self._pretty:
+            print(format_event_pretty(event))
+
+
+def _read_stream(
+    stream: Any,
+    stream_name: str,
+    start_time: float,
+    out_queue: queue.Queue[tuple[str, int, object]],
+) -> None:
+    """Read lines from one subprocess stream into a queue."""
+    for raw_line in iter(stream.readline, ""):
+        t_ms = int((time.monotonic() - start_time) * 1000)
+        out_queue.put((stream_name, t_ms, raw_line))
+    t_ms = int((time.monotonic() - start_time) * 
1000) + out_queue.put((stream_name, t_ms, EOF_MARKER)) + + +def _record_queue_item( + recorder: EventRecorder, + *, + stream_name: str, + t_ms: int, + raw: object, +) -> None: + """Record one queued stdout/stderr item.""" + if raw is EOF_MARKER: + recorder.record( + Event( + t_ms=t_ms, + direction="meta", + event="eof", + summary=f"{stream_name}-eof", + ) + ) + return + direction = "recv" if stream_name == "stdout" else "meta" + recorder.record(event_from_json_line(t_ms=t_ms, direction=direction, raw_line=str(raw))) + + +def _drain_queued_stream_events( + recorder: EventRecorder, + event_queue: queue.Queue[tuple[str, int, object]], +) -> set[str]: + """Drain currently available queued subprocess events without emitting a timeout marker.""" + eof_streams: set[str] = set() + while True: + try: + stream_name, t_ms, raw = event_queue.get_nowait() + except queue.Empty: + return eof_streams + _record_queue_item( + recorder, + stream_name=stream_name, + t_ms=t_ms, + raw=raw, + ) + if raw is EOF_MARKER: + eof_streams.add(stream_name) + + +def record_subprocess_events( + recorder: EventRecorder, + event_queue: queue.Queue[tuple[str, int, object]], + start_time: float, + idle_timeout: float, +) -> None: + """Drain queued subprocess events until the queue is idle.""" + eof_streams: set[str] = set() + deadline = time.monotonic() + idle_timeout + while True: + remaining = deadline - time.monotonic() + if remaining <= 0: + recorder.record( + Event( + t_ms=int((time.monotonic() - start_time) * 1000), + direction="meta", + event="timeout", + summary=f"idle-timeout ({idle_timeout:.2f}s)", + ) + ) + return + try: + stream_name, t_ms, raw = event_queue.get(timeout=remaining) + except queue.Empty: + recorder.record( + Event( + t_ms=int((time.monotonic() - start_time) * 1000), + direction="meta", + event="timeout", + summary=f"idle-timeout ({idle_timeout:.2f}s)", + ) + ) + return + + deadline = time.monotonic() + idle_timeout + _record_queue_item( + recorder, + stream_name=stream_name, 
+ t_ms=t_ms, + raw=raw, + ) + if raw is EOF_MARKER: + eof_streams.add(stream_name) + eof_streams.update(_drain_queued_stream_events(recorder, event_queue)) + if eof_streams == {"stdout", "stderr"}: + return + + +def flush_subprocess_events( + recorder: EventRecorder, + event_queue: queue.Queue[tuple[str, int, object]], + threads: list[threading.Thread], + *, + start_time: float, + max_wait_seconds: float = 0.5, +) -> None: + """Best-effort drain of late stdout/stderr events after subprocess shutdown.""" + deadline = time.monotonic() + max_wait_seconds + while True: + for thread in threads: + remaining = deadline - time.monotonic() + if remaining <= 0: + break + thread.join(timeout=min(0.05, max(remaining, 0.0))) + + _drain_queued_stream_events(recorder, event_queue) + + if event_queue.empty() and all(not thread.is_alive() for thread in threads): + return + if time.monotonic() >= deadline: + _drain_queued_stream_events(recorder, event_queue) + return + + +def summarize_events(events: list[Event]) -> dict[str, Any]: + """Build a compact verdict-oriented summary.""" + response_ids: list[int | str] = [] + tools_sizes: list[int] = [] + saw_tools_list_changed = False + saw_stdout_eof = False + timeout_count = 0 + + for event in events: + if event.event == "timeout": + timeout_count += 1 + if event.event == "eof" and event.summary == "stdout-eof": + saw_stdout_eof = True + if event.payload is None or event.direction != "recv": + continue + payload = event.payload + if payload.get("method") == "notifications/tools/list_changed": + saw_tools_list_changed = True + msg_id = payload.get("id") + if msg_id is not None: + response_ids.append(msg_id) + result = payload.get("result") + if isinstance(result, dict) and isinstance(result.get("tools"), list): + tools_sizes.append(len(result["tools"])) + + return { + "events_recorded": len(events), + "response_ids": response_ids, + "tools_list_sizes": tools_sizes, + "saw_non_empty_tools_list": any(size > 0 for size in tools_sizes), + 
"saw_tools_list_changed": saw_tools_list_changed, + "saw_stdout_eof": saw_stdout_eof, + "timeout_count": timeout_count, + } + + +def _record_stdin_write_failure( + recorder: EventRecorder, + *, + start_time: float, + step_name: str, + line: str, + process: subprocess.Popen[str], + exc: BaseException, +) -> None: + """Record an outbound write failure without aborting the whole harness.""" + recorder.record( + Event( + t_ms=int((time.monotonic() - start_time) * 1000), + direction="meta", + event="write-error", + summary=f"stdin-write-failed ({step_name})", + payload={ + "step": step_name, + "error": str(exc), + "returncode": process.poll(), + }, + line=line, + ) + ) + + +def run_harness(args: argparse.Namespace) -> int: + """Execute the configured scenario against the target command.""" + scenario = build_scenario(args.scenario) + if args.pause_before_step is not None and args.pause_before_step not in { + step.name for step in scenario + }: + raise SystemExit(f"Unknown step '{args.pause_before_step}' for scenario '{args.scenario}'.") + + start_time = time.monotonic() + recorder = EventRecorder(output_path=args.output, pretty=args.pretty) + event_queue: queue.Queue[tuple[str, int, object]] = queue.Queue() + + process = subprocess.Popen( + args.command, + stdin=subprocess.PIPE, + stdout=subprocess.PIPE, + stderr=subprocess.PIPE, + text=True, + bufsize=1, + ) + recorder.record( + Event( + t_ms=0, + direction="meta", + event="spawn", + summary="process-started", + payload={"command": args.command, "pid": process.pid, "scenario": args.scenario}, + ) + ) + + threads = [ + threading.Thread( + target=_read_stream, + args=(process.stdout, "stdout", start_time, event_queue), + daemon=True, + ), + threading.Thread( + target=_read_stream, + args=(process.stderr, "stderr", start_time, event_queue), + daemon=True, + ), + ] + for thread in threads: + thread.start() + + try: + for index, step in enumerate(scenario): + if index > 0 and args.step_delay > 0: + 
time.sleep(args.step_delay) + + if step.name == args.pause_before_step and args.pause_seconds > 0: + recorder.record( + Event( + t_ms=int((time.monotonic() - start_time) * 1000), + direction="meta", + event="pause", + summary=f"pause-before {step.name} ({args.pause_seconds:.2f}s)", + ) + ) + time.sleep(args.pause_seconds) + + line = json.dumps(step.payload, separators=(",", ":")) + if process.stdin is None: + _record_stdin_write_failure( + recorder, + start_time=start_time, + step_name=step.name, + line=line, + process=process, + exc=RuntimeError("stdin pipe is unavailable"), + ) + record_subprocess_events( + recorder, + event_queue, + start_time, + min(args.read_timeout, 0.25), + ) + break + try: + process.stdin.write(line + "\n") + process.stdin.flush() + except (BrokenPipeError, OSError, ValueError) as exc: + _record_stdin_write_failure( + recorder, + start_time=start_time, + step_name=step.name, + line=line, + process=process, + exc=exc, + ) + record_subprocess_events( + recorder, + event_queue, + start_time, + min(args.read_timeout, 0.25), + ) + break + recorder.record( + Event( + t_ms=int((time.monotonic() - start_time) * 1000), + direction="send", + event="jsonrpc", + summary=step.name, + payload=step.payload, + line=line, + ) + ) + record_subprocess_events(recorder, event_queue, start_time, args.read_timeout) + + record_subprocess_events(recorder, event_queue, start_time, args.final_read_timeout) + finally: + if process.poll() is None: + process.terminate() + try: + process.wait(timeout=2.0) + except subprocess.TimeoutExpired: + process.kill() + process.wait(timeout=2.0) + flush_subprocess_events( + recorder, + event_queue, + threads, + start_time=start_time, + ) + recorder.record( + Event( + t_ms=int((time.monotonic() - start_time) * 1000), + direction="meta", + event="exit", + summary=f"process-exit ({process.returncode})", + payload={"returncode": process.returncode}, + ) + ) + recorder.close() + + summary = summarize_events(recorder.events) + 
print(json.dumps(summary, indent=2, sort_keys=True)) + if args.output is not None: + print(f"Event log written to {args.output}") + return 0 if process.returncode in (0, None, -15) else process.returncode + + +def main(argv: list[str] | None = None) -> int: + """CLI entrypoint.""" + args = parse_args(argv) + return run_harness(args) + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/src/mcpbridge_wrapper/broker/daemon.py b/src/mcpbridge_wrapper/broker/daemon.py index 8ad21b2d..1a623dca 100644 --- a/src/mcpbridge_wrapper/broker/daemon.py +++ b/src/mcpbridge_wrapper/broker/daemon.py @@ -92,6 +92,8 @@ def __init__( self._upstream_initialized: asyncio.Event = asyncio.Event() # Cached tools/list result (JSON string); None until first successful probe. self._tools_list_cache: str | None = None + # Normalized fingerprint of the last non-empty cached tools/list result. + self._tools_catalog_fingerprint: str | None = None # Set once a usable tools/list response has been cached for clients. self._tools_catalog_ready: asyncio.Event = asyncio.Event() # Background retry task for broker-internal tools/list warm-up probes. 
@@ -421,6 +423,36 @@ def _cancel_tools_probe_retry(self) -> None: self._tools_probe_retry_task = None self._reset_tools_probe_retry_backoff() + def _fingerprint_tools_catalog(self, result: dict[str, Any]) -> str | None: + """Return a normalized fingerprint for a non-empty tools/list result.""" + tools = result.get("tools") if isinstance(result, dict) else None + if not isinstance(tools, list) or not tools: + return None + return json.dumps(result, sort_keys=True, separators=(",", ":")) + + async def _cache_tools_list_result( + self, + *, + line: str, + result: dict[str, Any], + fingerprint: str, + ) -> None: + """Cache a ready tools/list result and optionally emit a synthetic change notice.""" + previous_fingerprint = self._tools_catalog_fingerprint + self._cancel_tools_probe_retry() + self._tools_list_cache = line + self._tools_catalog_ready.set() + self._tools_catalog_fingerprint = fingerprint + logger.info( + "tools/list cache populated with %d tool(s) (%d bytes).", + len(result["tools"]), + len(line), + ) + + if previous_fingerprint != fingerprint and self._transport is not None: + await self._transport.emit_tools_list_changed() + logger.info("Synthetic notifications/tools/list_changed broadcast to clients.") + async def _rollback_startup(self) -> None: """Roll back a failed :meth:`start` sequence. @@ -589,15 +621,16 @@ async def _read_upstream_loop(self) -> None: # Broker's own tools/list probe response received — cache it. 
if isinstance(msg, dict) and "result" in msg: result = msg.get("result") - tools = result.get("tools") if isinstance(result, dict) else None - if isinstance(tools, list) and tools: - self._cancel_tools_probe_retry() - self._tools_list_cache = line - self._tools_catalog_ready.set() - logger.info( - "tools/list cache populated with %d tool(s) (%d bytes).", - len(tools), - len(line), + fingerprint = ( + self._fingerprint_tools_catalog(result) + if isinstance(result, dict) + else None + ) + if isinstance(result, dict) and fingerprint is not None: + await self._cache_tools_list_result( + line=line, + result=result, + fingerprint=fingerprint, ) else: self._tools_list_cache = None diff --git a/src/mcpbridge_wrapper/broker/transport.py b/src/mcpbridge_wrapper/broker/transport.py index a612ce6c..1761853e 100644 --- a/src/mcpbridge_wrapper/broker/transport.py +++ b/src/mcpbridge_wrapper/broker/transport.py @@ -47,6 +47,15 @@ logger = logging.getLogger(__name__) +_TOOLS_LIST_CHANGED_NOTIFICATION = json.dumps( + { + "jsonrpc": "2.0", + "method": "notifications/tools/list_changed", + "params": {}, + }, + separators=(",", ":"), +) + # Bit-shift for ID namespacing: session_id occupies the upper bits. _SESSION_SHIFT = 20 _ID_MASK = (1 << _SESSION_SHIFT) - 1 # 0xFFFFF @@ -194,6 +203,14 @@ def sessions(self) -> dict[int, ClientSession]: """Currently connected client sessions (read-only view).""" return self._sessions + async def emit_tools_list_changed(self) -> None: + """Broadcast a synthetic MCP tools/list_changed notification.""" + for session in list(self._sessions.values()): + if session.initialized: + await self._write_to_session(session, _TOOLS_LIST_CHANGED_NOTIFICATION) + else: + session.pending_tools_list_changed = True + async def start(self) -> None: """Bind to the Unix socket and begin accepting connections. 
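Reviewer note: `emit_tools_list_changed` defers the synthetic notification for sessions that have not finished the MCP initialize handshake and flushes it once their `notifications/initialized` arrives. A minimal sketch of that queue-then-flush pattern (the `Session`, `emit`, and `on_initialized` names are illustrative stand-ins, not the transport's real API):

```python
import json
from dataclasses import dataclass, field

NOTE = json.dumps(
    {"jsonrpc": "2.0", "method": "notifications/tools/list_changed", "params": {}},
    separators=(",", ":"),
)


@dataclass
class Session:
    initialized: bool = False
    pending: bool = False
    sent: list[str] = field(default_factory=list)


def emit(sessions: list[Session]) -> None:
    # Initialized clients get the notification now; the rest get it queued.
    for s in sessions:
        if s.initialized:
            s.sent.append(NOTE)
        else:
            s.pending = True


def on_initialized(s: Session) -> None:
    # Flush at most one queued notification once the client finishes initialization.
    s.initialized = True
    if s.pending:
        s.pending = False
        s.sent.append(NOTE)


ready, late = Session(initialized=True), Session()
emit([ready, late])
assert ready.sent == [NOTE] and late.sent == []
on_initialized(late)
assert late.sent == [NOTE]
```

This mirrors the contract the transport tests below exercise: initialized sessions receive the line immediately, while an uninitialized session receives exactly one queued copy after its `notifications/initialized`.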
@@ -572,6 +589,11 @@ async def _process_client_line(self, session: ClientSession, line: str) -> None: session.session_id, remapped_line, ) + if method_name == "notifications/initialized": + session.initialized = True + if session.pending_tools_list_changed: + session.pending_tools_list_changed = False + await self._write_to_session(session, _TOOLS_LIST_CHANGED_NOTIFICATION) except Exception as exc: logger.warning( "Failed to write to upstream from session %d: %s", diff --git a/src/mcpbridge_wrapper/broker/types.py b/src/mcpbridge_wrapper/broker/types.py index a203b75c..101e4df2 100644 --- a/src/mcpbridge_wrapper/broker/types.py +++ b/src/mcpbridge_wrapper/broker/types.py @@ -84,6 +84,10 @@ class ClientSession: int_id_map: dict[int, int] = field(default_factory=dict) # Reverse map: local_seq → original_id (int or str) for O(1) restoration id_restore: dict[int, int | str] = field(default_factory=dict) + # Set once the client finishes the MCP initialize -> initialized lifecycle. + initialized: bool = False + # Synthetic tools/list_changed notification to flush after client initialization. 
+ pending_tools_list_changed: bool = False # Shared monotonic counter for allocating local alias IDs within this session _next_local_id: int = field(default=0, repr=False) diff --git a/tests/unit/test_broker_daemon.py b/tests/unit/test_broker_daemon.py index 62eec0a3..9aa0bb2e 100644 --- a/tests/unit/test_broker_daemon.py +++ b/tests/unit/test_broker_daemon.py @@ -1246,6 +1246,51 @@ async def _readline() -> bytes: cached = _json.loads(daemon._tools_list_cache) assert cached["result"]["tools"][0]["name"] == "BuildProject" + @pytest.mark.asyncio + async def test_first_ready_tools_catalog_emits_synthetic_list_changed( + self, tmp_path: Path + ) -> None: + """The first non-empty broker catalog emits one synthetic change notification.""" + cfg = _make_config(tmp_path) + transport = MagicMock() + transport.start = AsyncMock() + transport.stop = AsyncMock() + transport.emit_tools_list_changed = AsyncMock() + daemon = BrokerDaemon(cfg, transport=transport) + + init_response = ( + '{"jsonrpc":"2.0","id":0,"result":{"protocolVersion":"2024-11-05","capabilities":{}}}' + ) + tools_response = '{"jsonrpc":"2.0","id":-1,"result":{"tools":[{"name":"BuildProject"}]}}' + + call_count = 0 + + async def _readline() -> bytes: + nonlocal call_count + call_count += 1 + if call_count == 1: + return (init_response + "\n").encode() + if call_count == 2: + return (tools_response + "\n").encode() + daemon._stop_event.set() + return b"" + + proc = _make_mock_process() + proc.stdout.readline = _readline + proc.stdin.drain = AsyncMock() + proc.stdin.write = MagicMock() + + with patch( + "mcpbridge_wrapper.broker.daemon.asyncio.create_subprocess_exec", + new=AsyncMock(return_value=proc), + ): + await daemon.start() + if daemon._read_task: + with contextlib.suppress(asyncio.TimeoutError): + await asyncio.wait_for(daemon._read_task, timeout=1.0) + + transport.emit_tools_list_changed.assert_awaited_once() + @pytest.mark.asyncio async def test_empty_tools_list_probe_keeps_catalog_gate_closed(self, 
tmp_path: Path) -> None: """Empty tool catalogs must not be cached as a valid broker warm-up result.""" @@ -1291,6 +1336,56 @@ async def _readline() -> bytes: assert daemon._tools_list_cache is None assert not daemon.tools_catalog_ready.is_set() + @pytest.mark.asyncio + async def test_empty_tools_list_probe_does_not_emit_synthetic_list_changed( + self, tmp_path: Path + ) -> None: + """Empty retry probes must not broadcast synthetic catalog-change notifications.""" + cfg = _make_config(tmp_path) + transport = MagicMock() + transport.start = AsyncMock() + transport.stop = AsyncMock() + transport.emit_tools_list_changed = AsyncMock() + daemon = BrokerDaemon(cfg, transport=transport) + + init_response = ( + '{"jsonrpc":"2.0","id":0,"result":{"protocolVersion":"2024-11-05","capabilities":{}}}' + ) + empty_tools_response = '{"jsonrpc":"2.0","id":-1,"result":{"tools":[]}}' + + call_count = 0 + + async def _readline() -> bytes: + nonlocal call_count + call_count += 1 + if call_count == 1: + return (init_response + "\n").encode() + if call_count == 2: + return (empty_tools_response + "\n").encode() + daemon._stop_event.set() + return b"" + + proc = _make_mock_process() + proc.stdout.readline = _readline + proc.stdin.drain = AsyncMock() + proc.stdin.write = MagicMock() + + with patch( + "mcpbridge_wrapper.broker.daemon.asyncio.create_subprocess_exec", + new=AsyncMock(return_value=proc), + ): + await daemon.start() + if daemon._read_task: + with contextlib.suppress(asyncio.TimeoutError): + await asyncio.wait_for(daemon._read_task, timeout=1.0) + retry_task = daemon._tools_probe_retry_task + if retry_task is not None: + retry_task.cancel() + with contextlib.suppress(asyncio.CancelledError): + await retry_task + + transport.emit_tools_list_changed.assert_not_awaited() + @pytest.mark.asyncio async def test_empty_tools_list_probe_retries_until_catalog_ready(self, tmp_path: Path) -> None: """An empty warm-up probe must trigger a retry instead of a permanent outage.""" @@ -1353,6 
+1448,56 @@ def _write(data: bytes) -> None: assert daemon.tools_catalog_ready.is_set() assert sum('"method":"tools/list"' in msg for msg in sent_messages) == 2 + @pytest.mark.asyncio + async def test_same_catalog_after_reconnect_does_not_emit_synthetic_list_changed( + self, tmp_path: Path + ) -> None: + """Reconnects that reproduce the same ready catalog remain silent.""" + cfg = _make_config(tmp_path) + transport = MagicMock() + transport.start = AsyncMock() + transport.stop = AsyncMock() + transport.emit_tools_list_changed = AsyncMock() + daemon = BrokerDaemon(cfg, transport=transport) + + daemon._tools_catalog_fingerprint = '{"tools":[{"name":"BuildProject"}]}' + daemon._tools_list_cache = None + daemon._tools_catalog_ready.clear() + + await daemon._cache_tools_list_result( + line='{"jsonrpc":"2.0","id":-1,"result":{"tools":[{"name":"BuildProject"}]}}', + result={"tools": [{"name": "BuildProject"}]}, + fingerprint='{"tools":[{"name":"BuildProject"}]}', + ) + + assert daemon.tools_catalog_ready.is_set() + transport.emit_tools_list_changed.assert_not_awaited() + + @pytest.mark.asyncio + async def test_changed_catalog_after_reconnect_emits_synthetic_list_changed( + self, tmp_path: Path + ) -> None: + """Reconnects that change the ready catalog emit one new notification.""" + cfg = _make_config(tmp_path) + transport = MagicMock() + transport.start = AsyncMock() + transport.stop = AsyncMock() + transport.emit_tools_list_changed = AsyncMock() + daemon = BrokerDaemon(cfg, transport=transport) + + daemon._tools_catalog_fingerprint = '{"tools":[{"name":"BuildProject"}]}' + daemon._tools_list_cache = None + daemon._tools_catalog_ready.clear() + + await daemon._cache_tools_list_result( + line='{"jsonrpc":"2.0","id":-1,"result":{"tools":[{"name":"BuildProject"},{"name":"XcodeRead"}]}}', + result={"tools": [{"name": "BuildProject"}, {"name": "XcodeRead"}]}, + fingerprint='{"tools":[{"name":"BuildProject"},{"name":"XcodeRead"}]}', + ) + + assert 
daemon.tools_catalog_ready.is_set() + transport.emit_tools_list_changed.assert_awaited_once() + def test_tools_list_probe_retry_backoff_is_bounded_and_resets(self, tmp_path: Path) -> None: """Retry delays should back off and reset after cancellation/success transitions.""" cfg = _make_config(tmp_path) diff --git a/tests/unit/test_broker_transport.py b/tests/unit/test_broker_transport.py index 2e566255..a5212e2b 100644 --- a/tests/unit/test_broker_transport.py +++ b/tests/unit/test_broker_transport.py @@ -178,6 +178,77 @@ async def test_non_object_json_is_silently_dropped(self, tmp_path: Any) -> None: s1.writer.write.assert_not_called() +class TestSyntheticToolsListChangedNotification: + @pytest.mark.asyncio + async def test_emit_tools_list_changed_broadcasts_to_all_clients(self, tmp_path: Any) -> None: + server = _make_server(tmp_path) + s1 = _make_session(1) + s2 = _make_session(2) + s1.initialized = True + s2.initialized = True + server._sessions[1] = s1 + server._sessions[2] = s2 + + await server.emit_tools_list_changed() + + for session in (s1, s2): + call_arg = session.writer.write.call_args[0][0] + decoded = json.loads(call_arg.rstrip(b"\n")) + assert decoded == { + "jsonrpc": "2.0", + "method": "notifications/tools/list_changed", + "params": {}, + } + + @pytest.mark.asyncio + async def test_emit_tools_list_changed_without_clients_is_noop(self, tmp_path: Any) -> None: + server = _make_server(tmp_path) + await server.emit_tools_list_changed() + + @pytest.mark.asyncio + async def test_emit_tools_list_changed_queues_for_uninitialized_clients( + self, tmp_path: Any + ) -> None: + server = _make_server(tmp_path) + session = _make_session(1) + server._sessions[1] = session + + await server.emit_tools_list_changed() + + session.writer.write.assert_not_called() + assert session.pending_tools_list_changed is True + + @pytest.mark.asyncio + async def test_initialized_notification_flushes_queued_tools_list_changed( + self, tmp_path: Any + ) -> None: + server = 
_make_server(tmp_path) + session = _make_session(1) + server._sessions[1] = session + + await server.emit_tools_list_changed() + await server._process_client_line( + session, + json.dumps( + { + "jsonrpc": "2.0", + "method": "notifications/initialized", + "params": {}, + } + ), + ) + + session.writer.write.assert_called_once() + decoded = json.loads(session.writer.write.call_args[0][0].rstrip(b"\n")) + assert decoded == { + "jsonrpc": "2.0", + "method": "notifications/tools/list_changed", + "params": {}, + } + assert session.initialized is True + assert session.pending_tools_list_changed is False + + # --------------------------------------------------------------------------- # route_upstream_response — targeted responses # --------------------------------------------------------------------------- diff --git a/tests/unit/test_xcode_approval_harness.py b/tests/unit/test_xcode_approval_harness.py new file mode 100644 index 00000000..b49763e4 --- /dev/null +++ b/tests/unit/test_xcode_approval_harness.py @@ -0,0 +1,354 @@ +#!/usr/bin/env python3 +"""Tests for scripts/xcode_approval_harness.py.""" + +import argparse +import io +import json +import queue +import sys +import threading +import time +from pathlib import Path +from typing import Optional + +import pytest + +sys.path.insert(0, str(Path(__file__).parent.parent.parent / "scripts")) + +from xcode_approval_harness import ( # noqa: E402 + DEFAULT_PROTOCOL_VERSION, + EOF_MARKER, + Event, + EventRecorder, + build_scenario, + event_from_json_line, + format_event_pretty, + parse_args, + record_subprocess_events, + run_harness, + summarize_events, +) + + +class TestBuildScenario: + """Scenario construction tests.""" + + def test_approval_probe_contains_expected_methods(self) -> None: + """The approval-probe scenario exercises the expected discovery methods.""" + steps = build_scenario("approval-probe") + methods = [step.payload.get("method") for step in steps] + initialize = steps[0].payload + + assert methods == [ + 
"initialize", + "notifications/initialized", + "tools/list", + "resources/list", + "resources/templates/list", + "prompts/list", + ] + assert initialize["params"]["protocolVersion"] == DEFAULT_PROTOCOL_VERSION + + def test_tools_only_repeats_tools_list(self) -> None: + """The tools-only scenario repeats tools/list after initialization.""" + steps = build_scenario("tools-only") + assert [step.payload.get("method") for step in steps].count("tools/list") == 3 + + def test_unknown_scenario_raises(self) -> None: + """Unknown scenario names fail clearly.""" + with pytest.raises(ValueError, match="Unknown scenario"): + build_scenario("missing") + + +class TestEventParsing: + """Event formatting and parsing tests.""" + + def test_event_from_json_line_summarizes_tools_result(self) -> None: + """tools/list result summaries include tool count.""" + line = json.dumps( + { + "jsonrpc": "2.0", + "id": 2, + "result": {"tools": [{"name": "XcodeRead"}, {"name": "XcodeWrite"}]}, + } + ) + + event = event_from_json_line(t_ms=125, direction="recv", raw_line=line) + + assert event.summary == "result#2 tools/list (2 tools)" + assert event.payload is not None + assert event.payload["result"]["tools"][0]["name"] == "XcodeRead" + + def test_event_from_json_line_keeps_text_when_not_json(self) -> None: + """Non-JSON lines are preserved as text events.""" + event = event_from_json_line(t_ms=50, direction="meta", raw_line="stderr line") + + assert event.event == "text" + assert event.summary == "meta-text" + assert event.line == "stderr line" + + def test_format_event_pretty_is_stable(self) -> None: + """Pretty format keeps columns stable for live observation.""" + pretty = format_event_pretty( + Event(t_ms=42, direction="send", event="jsonrpc", summary="tools-list") + ) + + assert pretty == "[ 42 ms] SEND jsonrpc tools-list" + + +class TestArgParsing: + """CLI parsing and defaults.""" + + def test_parse_args_uses_default_command(self) -> None: + """Missing command override falls back to xcrun 
mcpbridge.""" + args = parse_args(["--scenario", "tools-only"]) + assert args.command == ["xcrun", "mcpbridge"] + assert args.scenario == "tools-only" + + def test_parse_args_accepts_command_override_after_dash_dash(self) -> None: + """A remainder command after '--' replaces the default target.""" + args = parse_args(["--pretty", "--", "uvx", "--from", "mcpbridge-wrapper"]) + assert args.command == ["uvx", "--from", "mcpbridge-wrapper"] + assert args.pretty is True + + def test_parse_args_rejects_negative_read_timeout(self) -> None: + """Timeout arguments must be positive.""" + with pytest.raises(SystemExit, match="read-timeout"): + parse_args(["--read-timeout", "-1"]) + + +class TestSummaries: + """Aggregate summary extraction.""" + + def test_summarize_events_detects_tools_catalog_and_list_changed(self) -> None: + """Summary tracks both tool-catalog readiness and late list changes.""" + events = [ + Event( + t_ms=0, + direction="recv", + event="jsonrpc", + summary="result#2 tools/list (0 tools)", + payload={"jsonrpc": "2.0", "id": 2, "result": {"tools": []}}, + ), + Event( + t_ms=100, + direction="recv", + event="jsonrpc", + summary="notifications/tools/list_changed", + payload={ + "jsonrpc": "2.0", + "method": "notifications/tools/list_changed", + "params": {}, + }, + ), + Event( + t_ms=200, + direction="recv", + event="jsonrpc", + summary="result#3 tools/list (20 tools)", + payload={ + "jsonrpc": "2.0", + "id": 3, + "result": {"tools": [{"name": "XcodeRead"} for _ in range(20)]}, + }, + ), + Event(t_ms=300, direction="meta", event="eof", summary="stdout-eof"), + Event(t_ms=400, direction="meta", event="timeout", summary="idle-timeout (2.00s)"), + ] + + summary = summarize_events(events) + + assert summary["response_ids"] == [2, 3] + assert summary["tools_list_sizes"] == [0, 20] + assert summary["saw_non_empty_tools_list"] is True + assert summary["saw_tools_list_changed"] is True + assert summary["saw_stdout_eof"] is True + assert summary["timeout_count"] == 1 
+
+
+class _BrokenPipeStdin:
+    """Minimal stdin stub that fails on the first write."""
+
+    def write(self, _data: str) -> None:
+        raise BrokenPipeError("simulated broken pipe")
+
+    def flush(self) -> None:
+        return None
+
+
+class _FakePopen:
+    """Minimal subprocess stub for early child-exit tests."""
+
+    def __init__(self) -> None:
+        self.stdin = _BrokenPipeStdin()
+        self.stdout = io.StringIO("")
+        self.stderr = io.StringIO("mock startup failure\n")
+        self.pid = 4242
+        self.returncode = 7
+
+    def poll(self) -> int:
+        return self.returncode
+
+    def terminate(self) -> None:
+        return None
+
+    def wait(self, timeout: Optional[float] = None) -> int:
+        del timeout
+        return self.returncode
+
+    def kill(self) -> None:
+        return None
+
+
+class _DelayedStream:
+    """Blocking stream stub: readline() yields fed lines, then EOF after close()."""
+
+    def __init__(self) -> None:
+        self._lines: list[str] = []
+        self._closed = False
+        self._condition = threading.Condition()
+
+    def feed(self, *lines: str) -> None:
+        with self._condition:
+            self._lines.extend(lines)
+            self._condition.notify_all()
+
+    def close(self) -> None:
+        with self._condition:
+            self._closed = True
+            self._condition.notify_all()
+
+    def readline(self) -> str:
+        with self._condition:
+            while not self._lines and not self._closed:
+                self._condition.wait(timeout=0.1)
+            if self._lines:
+                return self._lines.pop(0)
+            return ""
+
+
+class _TerminatingBrokenPipePopen:
+    """Process stub that emits stderr only during terminate/wait cleanup."""
+
+    def __init__(self) -> None:
+        self.stdin = _BrokenPipeStdin()
+        self.stdout = _DelayedStream()
+        self.stderr = _DelayedStream()
+        self.pid = 4343
+        self.returncode: Optional[int] = None
+
+    def poll(self) -> Optional[int]:
+        return self.returncode
+
+    def terminate(self) -> None:
+        self.returncode = 9
+        self.stderr.feed("late shutdown stderr\n")
+        self.stdout.close()
+        self.stderr.close()
+
+    def wait(self, timeout: Optional[float] = None) -> int:
+        del timeout
+        if self.returncode is None:
+            time.sleep(0.01)
+            self.terminate()
+        return self.returncode or 0
+
+    def kill(self) -> None:
+        self.terminate()
+
+
+class TestRunHarness:
+    """Behavioral tests for the live harness runner."""
+
+    def test_run_harness_records_write_failure_and_exits_cleanly(
+        self,
+        monkeypatch: pytest.MonkeyPatch,
+        tmp_path: Path,
+        capsys: pytest.CaptureFixture[str],
+    ) -> None:
+        """Early child exit should produce a summary and a write-error event, not a traceback."""
+        monkeypatch.setattr(
+            "xcode_approval_harness.subprocess.Popen", lambda *args, **kwargs: _FakePopen()
+        )
+
+        output_path = tmp_path / "events.jsonl"
+        args = argparse.Namespace(
+            scenario="tools-only",
+            pause_before_step=None,
+            pause_seconds=0.0,
+            step_delay=0.0,
+            read_timeout=0.1,
+            final_read_timeout=0.1,
+            output=output_path,
+            pretty=False,
+            command=["fake-command"],
+        )
+
+        exit_code = run_harness(args)
+
+        captured = capsys.readouterr()
+        assert exit_code == 7
+        assert '"events_recorded"' in captured.out
+        assert "Traceback" not in captured.err
+
+        events = [json.loads(line) for line in output_path.read_text(encoding="utf-8").splitlines()]
+        assert any(event["event"] == "write-error" for event in events)
+        assert any(event["event"] == "exit" for event in events)
+
+    def test_run_harness_flushes_late_shutdown_events_before_exit(
+        self,
+        monkeypatch: pytest.MonkeyPatch,
+        tmp_path: Path,
+    ) -> None:
+        """Late stderr/EOF queued during shutdown should be recorded before the exit event."""
+        monkeypatch.setattr(
+            "xcode_approval_harness.subprocess.Popen",
+            lambda *args, **kwargs: _TerminatingBrokenPipePopen(),
+        )
+
+        output_path = tmp_path / "events.jsonl"
+        args = argparse.Namespace(
+            scenario="tools-only",
+            pause_before_step=None,
+            pause_seconds=0.0,
+            step_delay=0.0,
+            read_timeout=0.1,
+            final_read_timeout=0.1,
+            output=output_path,
+            pretty=False,
+            command=["fake-command"],
+        )
+
+        exit_code = run_harness(args)
+
+        events = [json.loads(line) for line in output_path.read_text(encoding="utf-8").splitlines()]
+        event_names = [event["event"] for event in events]
+        summaries = [event["summary"] for event in events]
+        assert exit_code == 9
+        assert "meta-text" in summaries
+        assert "stderr-eof" in summaries
+        # event_names and summaries are parallel per-event lists, so an index
+        # into one can be compared against an index into the other.
+        assert event_names.index("exit") > summaries.index("stderr-eof")
+
+
+class TestRecordSubprocessEvents:
+    """Direct queue-drain behavior."""
+
+    def test_record_subprocess_events_returns_after_both_eof_without_timeout(self) -> None:
+        """Two EOF markers should end the read loop without a synthetic idle-timeout."""
+        recorder = EventRecorder(output_path=None, pretty=False)
+        event_queue: queue.Queue[tuple[str, int, object]] = queue.Queue()
+        event_queue.put(("stdout", 10, "ready\n"))
+        event_queue.put(("stdout", 11, EOF_MARKER))
+        event_queue.put(("stderr", 12, EOF_MARKER))
+
+        record_subprocess_events(
+            recorder,
+            event_queue,
+            start_time=time.monotonic(),
+            idle_timeout=0.05,
+        )
+
+        summaries = [event.summary for event in recorder.events]
+        assert "stdout-eof" in summaries
+        assert "stderr-eof" in summaries
+        assert not any(event.event == "timeout" for event in recorder.events)
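
Reviewer note: the `TestSummaries` expectations above fully pin down the shape of the `summarize_events` output. For orientation only, the aggregation over recorded JSONL event lines can be sketched as below. `summarize_event_lines` and its exact field handling are assumptions for illustration, not the harness's actual code (the real harness operates on `Event` objects via `summarize_events`):

```python
import json


def summarize_event_lines(jsonl_lines):
    """Hypothetical sketch: fold JSONL event records into the summary shape the tests assert on."""
    summary = {
        "response_ids": [],
        "tools_list_sizes": [],
        "saw_non_empty_tools_list": False,
        "saw_tools_list_changed": False,
        "saw_stdout_eof": False,
        "timeout_count": 0,
    }
    for line in jsonl_lines:
        event = json.loads(line)
        payload = event.get("payload") or {}
        result = payload.get("result")
        if isinstance(result, dict) and "tools" in result:
            # A tools/list response: record its request id and catalog size.
            summary["response_ids"].append(payload.get("id"))
            size = len(result["tools"])
            summary["tools_list_sizes"].append(size)
            if size:
                summary["saw_non_empty_tools_list"] = True
        if payload.get("method") == "notifications/tools/list_changed":
            summary["saw_tools_list_changed"] = True
        if event.get("event") == "eof" and event.get("summary") == "stdout-eof":
            summary["saw_stdout_eof"] = True
        if event.get("event") == "timeout":
            summary["timeout_count"] += 1
    return summary
```

Feeding this sketch the same five records used in `test_summarize_events_detects_tools_catalog_and_list_changed` reproduces the asserted summary values.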