Merged
18 commits
2f25da7
Branch for T-010: Xcode approval harness
SoundBlaster Mar 10, 2026
9dfaa49
Select task T-010: Build Xcode approval observation harness
SoundBlaster Mar 10, 2026
8491128
Plan task T-010: Build Xcode approval observation harness
SoundBlaster Mar 10, 2026
27bb491
Implement T-010: add Xcode approval observation harness
SoundBlaster Mar 10, 2026
eb2c340
Archive T-010: Xcode approval observation harness
SoundBlaster Mar 10, 2026
79acae0
Align T-010 harness protocol version with broker
SoundBlaster Mar 10, 2026
7036368
Archive review for T-010 harness
SoundBlaster Mar 10, 2026
c39ea11
Record interactive approval trace for T-010
SoundBlaster Mar 10, 2026
de9940d
Add T-011 follow-up for synthetic broker catalog change
SoundBlaster Mar 10, 2026
1b48065
Plan task T-011: Emit synthetic broker tools/list_changed on catalog …
SoundBlaster Mar 10, 2026
85e94fb
Implement T-011: emit synthetic broker catalog change notifications
SoundBlaster Mar 10, 2026
d26bf17
Archive task T-011: Emit synthetic broker tools/list_changed on catal…
SoundBlaster Mar 10, 2026
1f953a8
Review T-011: synthetic broker tools/list_changed
SoundBlaster Mar 10, 2026
63d220b
Archive REVIEW_t011_synthetic_broker_tools_list_changed report
SoundBlaster Mar 10, 2026
5705652
Format T-011 broker transport test file
SoundBlaster Mar 10, 2026
81a73a6
Handle early harness stdin write failures
SoundBlaster Mar 10, 2026
9b4ee58
Fix harness shutdown drain and Python 3.9 test compatibility
SoundBlaster Mar 10, 2026
dddd550
Refine harness shutdown tracing and broker catalog caching
SoundBlaster Mar 10, 2026
9 changes: 9 additions & 0 deletions FEATURE_REBUILD/ObservedBehavior.md
@@ -33,3 +33,12 @@ Observed runtime behavior for the optional Web UI dashboard feature on source br
4. `GET /api/metrics/timeseries` keeps `{requests, errors, latencies}` arrays of `{t, v}`.
5. Audit API and export endpoints remain backward-compatible.
6. Shared metrics persistence remains process-safe.

## Supplemental Observation Tooling

1. `scripts/xcode_approval_harness.py` provides deterministic MCP startup probes for
`xcrun mcpbridge` or `mcpbridge-wrapper`.
2. The harness records timestamped send/receive events, idle timeouts, EOF boundaries, and
`notifications/tools/list_changed` so approval races can be reconstructed after a run.
3. Use `--pause-before-step tools-list --pause-seconds <N>` when you need a stable window to
click **Allow** in Xcode before discovery requests continue.
53 changes: 53 additions & 0 deletions FEATURE_REBUILD/Workplan.md
@@ -163,6 +163,59 @@
- Rollback:
  - Revert release-note and packaging metadata changes only.

#### ✅ T-010 (P1): Build Xcode approval observation harness
- Status: ✅ Completed (2026-03-10)
- Deps: none
- Parallelizable with: T-009
- Touched files:
  - `scripts/xcode_approval_harness.py`
  - `tests/unit/test_xcode_approval_harness.py`
  - `docs/troubleshooting.md`
  - `FEATURE_REBUILD/ObservedBehavior.md`
- Acceptance criteria:
  - [x] Harness can execute deterministic MCP handshake scenarios against `xcrun mcpbridge`
    or the wrapper command.
  - [x] Harness logs timestamped send/receive events, EOF, and timeout boundaries so approval
    races can be reconstructed after a run.
  - [x] Harness can hold and replay `initialize`, `notifications/initialized`, `tools/list`,
    `resources/list`, and `prompts/list` steps with configurable delays around manual Xcode
    approval.
  - [x] Harness records whether `notifications/tools/list_changed` is observed after approval.
- Verification commands:
  - `pytest tests/unit/test_xcode_approval_harness.py -v`
  - `python3 scripts/xcode_approval_harness.py --help`
- Rollback:
  - Remove the harness script/tests/docs note and fall back to ad hoc manual probing.

#### ✅ T-011 (P1): Emit synthetic broker tools/list_changed on catalog warm-up
- Status: ✅ Completed (2026-03-10)
- **Description:** Extend the broker so clients can learn that the Xcode tool catalog became
  available after approval even when upstream `xcrun mcpbridge` never emits
  `notifications/tools/list_changed` itself. Reuse the existing broker warm-up probes and
  synthesize a client-facing `tools/list_changed` only when the cached catalog transitions from
  cold to ready or materially changes after reconnect.
- **Priority:** P1
- **Dependencies:** T-010
- **Parallelizable:** no
- **Outputs/Artifacts:**
  - `src/mcpbridge_wrapper/broker/daemon.py`
  - `src/mcpbridge_wrapper/broker/transport.py`
  - `tests/unit/test_broker_daemon.py`
  - `tests/unit/test_broker_transport.py`
  - `SPECS/INPROGRESS/T-011_Emit_synthetic_broker_tools_list_changed_on_catalog_warm-up.md`
  - `SPECS/INPROGRESS/T-011_Validation_Report.md`
- **Acceptance Criteria:**
  - [x] Broker emits a synthetic `notifications/tools/list_changed` when its internal cached
    `tools/list` transitions from empty/unavailable to a non-empty ready catalog.
  - [x] Broker re-emits the synthetic notification when reconnect produces a materially changed
    non-empty tool catalog, but does not spam clients on repeated empty retry probes.
  - [x] Existing `tools/list` readiness gating and cache-hit behavior remain unchanged for
    clients that explicitly call `tools/list`.
  - [x] Unit tests cover warm-up, reconnect, and no-op retry behavior for the synthetic
    notification path.
  - [x] Validation notes document whether Cursor/Zed visibly react to the synthetic signal
    without a manual MCP toggle.
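
The emission rules in the criteria above can be sketched as a small pure function. This is an illustration under stated assumptions, not the broker's actual code: the names `catalog_fingerprint`, `should_emit_list_changed`, and `list_changed_notification` are hypothetical, and "materially changed" is interpreted here as a change in the set of tool names.

```python
import json
from typing import Optional, Sequence


def catalog_fingerprint(tools: Optional[Sequence[dict]]) -> Optional[str]:
    """Return a stable fingerprint for a non-empty catalog, or None if cold."""
    if not tools:
        return None  # empty or unavailable catalogs count as "cold"
    # Assumption: a "material" change is a change in the set of tool names.
    names = sorted(str(tool.get("name", "")) for tool in tools)
    return json.dumps(names)


def should_emit_list_changed(
    previous: Optional[Sequence[dict]], current: Optional[Sequence[dict]]
) -> bool:
    """Decide whether to synthesize notifications/tools/list_changed."""
    new_fingerprint = catalog_fingerprint(current)
    if new_fingerprint is None:
        return False  # repeated empty retry probes never notify
    # Emits on cold -> ready and on ready -> different ready catalog.
    return catalog_fingerprint(previous) != new_fingerprint


def list_changed_notification() -> dict:
    """Build the JSON-RPC notification (no "id", so no response is expected)."""
    return {"jsonrpc": "2.0", "method": "notifications/tools/list_changed"}
```

Keeping the decision separate from the transport makes the warm-up, reconnect, and no-op retry cases straightforward to unit test without a live upstream.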

## Acceptance Criteria (rolled up)

1. Web UI API and dashboard contracts remain backward-compatible.
6 changes: 6 additions & 0 deletions SPECS/ARCHIVE/INDEX.md
@@ -6,6 +6,8 @@

| Task ID | Folder | Archived | Verdict |
|---------|--------|----------|---------|
| T-011 | [T-011_Emit_synthetic_broker_tools_list_changed_on_catalog_warm-up/](T-011_Emit_synthetic_broker_tools_list_changed_on_catalog_warm-up/) | 2026-03-10 | PASS |
| T-010 | [T-010_Build_Xcode_approval_observation_harness/](T-010_Build_Xcode_approval_observation_harness/) | 2026-03-10 | PASS |
| P8-T2 | [P8-T2_Prepare_for_Release_0.4.3/](P8-T2_Prepare_for_Release_0.4.3/) | 2026-03-10 | PASS |
| P1-T13 | [P1-T13_Document_stale_editable_install_version_mismatch_in_troubleshooting_guide/](P1-T13_Document_stale_editable_install_version_mismatch_in_troubleshooting_guide/) | 2026-03-10 | PASS |
| P2-T8 | [P2-T8_Gate_broker_tools_list_on_warmed_tool_catalog/](P2-T8_Gate_broker_tools_list_on_warmed_tool_catalog/) | 2026-03-10 | PASS |
@@ -208,6 +210,8 @@

| File | Description |
|------|-------------|
| [REVIEW_t011_synthetic_broker_tools_list_changed.md](_Historical/REVIEW_t011_synthetic_broker_tools_list_changed.md) | Review report for T-011 |
| [REVIEW_t010_xcode_approval_harness.md](_Historical/REVIEW_t010_xcode_approval_harness.md) | Review report for T-010 |
| [REVIEW_release_0.4.3_preparation.md](P8-T2_Prepare_for_Release_0.4.3/REVIEW_release_0.4.3_preparation.md) | Review report for P8-T2 |
| [REVIEW_p1_t13_editable_install_troubleshooting.md](_Historical/REVIEW_p1_t13_editable_install_troubleshooting.md) | Review report for P1-T13 |
| [REVIEW_p2_t8_tools_catalog_gate.md](_Historical/REVIEW_p2_t8_tools_catalog_gate.md) | Review report for P2-T8 |
@@ -648,3 +652,5 @@
| 2026-03-10 | P2-T8 | Archived REVIEW_p2_t8_tools_catalog_gate report |
| 2026-03-10 | P1-T13 | Archived Document_stale_editable_install_version_mismatch_in_troubleshooting_guide (PASS) |
| 2026-03-10 | P1-T13 | Archived REVIEW_p1_t13_editable_install_troubleshooting report |
| 2026-03-10 | T-010 | Archived Build_Xcode_approval_observation_harness (PASS) |
| 2026-03-10 | T-010 | Archived REVIEW_t010_xcode_approval_harness report |
@@ -0,0 +1,139 @@
# PRD: T-010 — Build Xcode approval observation harness

**Task ID:** T-010
**Priority:** P1
**Status:** Planned
**Date:** 2026-03-10
**Owner:** Codex

## Problem Statement

We need a deterministic way to observe how `xcrun mcpbridge` and `mcpbridge-wrapper`
behave around the Xcode GUI approval flow. Manual probing has shown multiple race-shaped
outcomes across Cursor and Zed, but the repo lacks a repeatable harness that can capture
the exact order and timing of MCP messages before and after the user clicks **Allow** in
Xcode.

## Goals

1. Provide a repo-local CLI harness that can run scripted MCP startup scenarios.
2. Capture timestamped protocol events, timeout windows, and EOF boundaries in a format
suitable for post-run analysis.
3. Make it easy to answer whether readiness appears via:
   - normal `initialize` + `tools/list` responses,
   - reconnect / EOF patterns,
   - or late `notifications/tools/list_changed`.

## Non-Goals

1. Automate the Xcode GUI approval click itself.
2. Change broker/runtime behavior in production code.
3. Add a packaged user-facing command to the published PyPI distribution.

## Deliverables

1. `scripts/xcode_approval_harness.py`
   - CLI entrypoint for deterministic MCP observation runs.
2. `tests/unit/test_xcode_approval_harness.py`
   - Unit coverage for scenario parsing, event formatting, and timeout handling.
3. `docs/troubleshooting.md`
   - Short operator note pointing to the harness for approval-race diagnostics.
4. `FEATURE_REBUILD/ObservedBehavior.md`
   - Document the harness as a repeatable observation tool for external Xcode behavior.
5. `SPECS/INPROGRESS/T-010_Validation_Report.md`
   - Captured quality-gate results and a brief manual smoke-run note.

## Functional Requirements

1. The harness MUST support launching either:
   - `xcrun mcpbridge`, or
   - an arbitrary command supplied after `--`.
2. The harness MUST send newline-delimited JSON-RPC messages in a deterministic sequence.
3. The harness MUST support per-step delays so runs can pause around manual Xcode approval.
4. The harness MUST timestamp every sent and received event relative to process start.
5. The harness MUST record:
   - outgoing request/notification payloads,
   - incoming responses/notifications,
   - EOF,
   - timeout markers,
   - process exit code.
6. The harness MUST highlight whether `notifications/tools/list_changed` was observed.
7. The harness MUST be usable without modifying installed package metadata or editor config.
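
Requirement 5 asks for timeout and EOF markers alongside incoming lines. One way to get all three from a blocking stream is a reader thread feeding a queue, sketched below. The names `read_events` and `_pump`, the `max_timeouts` parameter, and the `(kind, text)` tuple shape are assumptions made for illustration, not the harness's actual design.

```python
import queue
import threading
from typing import IO, Iterator, Tuple

_EOF = object()  # sentinel pushed by the reader thread when the stream ends


def _pump(stream: IO[str], lines: "queue.Queue") -> None:
    """Copy lines from a blocking stream into a queue, then signal EOF."""
    for line in stream:
        lines.put(line)
    lines.put(_EOF)


def read_events(
    stream: IO[str], read_timeout: float, max_timeouts: int = 1
) -> Iterator[Tuple[str, str]]:
    """Yield ("line", text), ("timeout", ""), or ("eof", "") markers in order."""
    lines: "queue.Queue" = queue.Queue()
    threading.Thread(target=_pump, args=(stream, lines), daemon=True).start()
    consecutive_timeouts = 0
    while True:
        try:
            item = lines.get(timeout=read_timeout)
        except queue.Empty:
            consecutive_timeouts += 1
            yield ("timeout", "")
            if consecutive_timeouts >= max_timeouts:
                return  # give up after this many idle windows in a row
            continue
        if item is _EOF:
            yield ("eof", "")
            return
        consecutive_timeouts = 0
        yield ("line", item.rstrip("\n"))
```

In a real run the `stream` argument would be the child's stdout from `subprocess.Popen(..., text=True)`. The process exit code would be recorded separately after `wait()`.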

## CLI Design

Planned flags:

- `--scenario <name>`: named scenario preset.
- `--step-delay <seconds>`: default delay between scripted steps.
- `--read-timeout <seconds>`: timeout for waiting on incoming lines.
- `--output <path>`: optional JSONL event log file.
- `--pretty`: also print human-readable event summary to stdout.
- `--`: optional command override; defaults to `xcrun mcpbridge`.
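
The planned flags could be wired up roughly as follows. This is a sketch only: `split_command`, `build_parser`, and `parse_cli` are hypothetical names, and every default value except the `xcrun mcpbridge` command is a placeholder rather than something the PRD specifies. The argument list is split at `--` by hand so the child command never reaches `argparse`.

```python
import argparse
from typing import List, Tuple

DEFAULT_COMMAND = ["xcrun", "mcpbridge"]


def split_command(argv: List[str]) -> Tuple[List[str], List[str]]:
    """Split argv at the first "--" into harness flags and the child command."""
    if "--" in argv:
        index = argv.index("--")
        return argv[:index], argv[index + 1:]
    return argv, []


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="xcode_approval_harness.py")
    # Defaults below are placeholders, not the harness's real defaults.
    parser.add_argument("--scenario", default="approval-probe",
                        choices=["approval-probe", "tools-only"])
    parser.add_argument("--step-delay", type=float, default=0.0)
    parser.add_argument("--read-timeout", type=float, default=5.0)
    parser.add_argument("--output", default=None)
    parser.add_argument("--pretty", action="store_true")
    return parser


def parse_cli(argv: List[str]) -> Tuple[argparse.Namespace, List[str]]:
    """Parse harness flags and return them with the command to launch."""
    flags, command = split_command(argv)
    args = build_parser().parse_args(flags)
    return args, command or list(DEFAULT_COMMAND)
```

For example, `parse_cli(["--scenario", "tools-only", "--", "mcpbridge-wrapper"])` would select the `tools-only` preset and launch the wrapper instead of `xcrun mcpbridge`.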

Planned scenario presets:

1. `approval-probe`
   - `initialize`
   - `notifications/initialized`
   - `tools/list`
   - `resources/list`
   - `prompts/list`
2. `tools-only`
   - `initialize`
   - `notifications/initialized`
   - repeated `tools/list`
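
A preset can be expanded into an ordered list of newline-delimited JSON-RPC messages along these lines. The sketch assumes names (`SCENARIOS`, `build_messages`, `encode_line`) that are not from the harness, a repeat count of two for `tools-only`, and placeholder `initialize` parameters; the protocol version and client info shown are illustrative values only.

```python
import json
from typing import Dict, List

# Method names per preset, in send order. The tools-only repeat count is assumed.
SCENARIOS: Dict[str, List[str]] = {
    "approval-probe": [
        "initialize",
        "notifications/initialized",
        "tools/list",
        "resources/list",
        "prompts/list",
    ],
    "tools-only": [
        "initialize",
        "notifications/initialized",
        "tools/list",
        "tools/list",
    ],
}


def build_messages(scenario: str) -> List[dict]:
    """Expand a preset into JSON-RPC messages with sequential request ids."""
    messages = []
    next_id = 1
    for method in SCENARIOS[scenario]:
        message: dict = {"jsonrpc": "2.0", "method": method}
        if method == "initialize":
            # Placeholder params; the real harness aligns these with the broker.
            message["params"] = {
                "protocolVersion": "2024-11-05",
                "capabilities": {},
                "clientInfo": {"name": "approval-harness-sketch", "version": "0"},
            }
        if not method.startswith("notifications/"):
            # Requests carry an id; notifications do not expect a response.
            message["id"] = next_id
            next_id += 1
        messages.append(message)
    return messages


def encode_line(message: dict) -> str:
    """Serialize one message as a single newline-terminated JSON line."""
    return json.dumps(message, separators=(",", ":")) + "\n"
```

Because ids are assigned in send order, a later response can be matched back to the step that triggered it.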

## Output Contract

Each event line SHOULD be JSON with at least:

```json
{
  "t_ms": 1234,
  "direction": "send|recv|meta",
  "event": "jsonrpc|timeout|eof|exit",
  "summary": "tools/list",
  "payload": {}
}
```
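
A record in this shape could be produced by a small logger like the one below. This is a sketch, not the harness's implementation: the `EventLog` class and its injectable `clock` are assumptions, and timestamps are measured from when the log object is created, which stands in for process start.

```python
import json
import time
from typing import Callable, List, Optional


class EventLog:
    """Collect JSONL event lines with millisecond timestamps from log creation."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._start = clock()  # assumed to coincide with child process start
        self.lines: List[str] = []

    def record(
        self,
        direction: str,
        event: str,
        summary: str,
        payload: Optional[dict] = None,
    ) -> str:
        """Append one event and return its serialized JSON line."""
        t_ms = round((self._clock() - self._start) * 1000)
        line = json.dumps(
            {
                "t_ms": t_ms,
                "direction": direction,
                "event": event,
                "summary": summary,
                "payload": payload if payload is not None else {},
            },
            sort_keys=True,  # stable key order keeps output deterministic
        )
        self.lines.append(line)
        return line
```

Injecting the clock lets unit tests assert exact `t_ms` values without sleeping, and a monotonic clock avoids negative deltas if the wall clock is adjusted mid-run.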

The harness SHOULD print a final summary including:

- observed response IDs
- whether a non-empty `tools/list` was seen
- whether `notifications/tools/list_changed` was seen
- whether EOF occurred before readiness
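
The four summary fields can be derived in a single pass over the recorded events. The sketch below assumes events are dicts in the Output Contract shape and treats "readiness" as the first non-empty `tools/list` result; the function name `summarize` and the returned key names are hypothetical.

```python
from typing import Dict, Iterable, List


def summarize(events: Iterable[dict]) -> dict:
    """Reduce an ordered event stream to the final summary fields."""
    sent_methods: Dict[object, str] = {}  # request id -> method that was sent
    response_ids: List[object] = []
    nonempty_tools = False
    list_changed = False
    eof_before_ready = False

    for record in events:
        payload = record.get("payload") or {}
        if record["event"] == "eof":
            # EOF only counts as "before readiness" if no catalog was seen yet.
            eof_before_ready = eof_before_ready or not nonempty_tools
            continue
        if record["event"] != "jsonrpc":
            continue  # timeout and exit markers do not affect these fields
        if record["direction"] == "send":
            if "id" in payload:
                sent_methods[payload["id"]] = payload.get("method", "")
        elif record["direction"] == "recv":
            if payload.get("method") == "notifications/tools/list_changed":
                list_changed = True
            elif "id" in payload and "method" not in payload:
                response_ids.append(payload["id"])
                tools = (payload.get("result") or {}).get("tools")
                if sent_methods.get(payload["id"]) == "tools/list" and tools:
                    nonempty_tools = True

    return {
        "response_ids": response_ids,
        "nonempty_tools_list": nonempty_tools,
        "tools_list_changed": list_changed,
        "eof_before_ready": eof_before_ready,
    }
```

Matching responses to requests by id, rather than by arrival order, keeps the summary correct when responses are delayed by the approval prompt.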

## Test Plan

Automated:

1. Parse scenario presets into the expected ordered message list.
2. Format JSONL event records deterministically.
3. Recognize `tools/list_changed`, `tools/list`, EOF, and timeout markers.
4. Validate `--help` and CLI argument parsing.

Manual smoke:

1. Run harness against the live `xcrun mcpbridge`.
2. Pause for Xcode approval.
3. Confirm event log is written and summary renders.

## Verification Commands

```bash
pytest tests/unit/test_xcode_approval_harness.py -v
python3 scripts/xcode_approval_harness.py --help
ruff check src/ tests/ scripts/
mypy src/
pytest --cov
```

## Risks

1. Real Xcode approval timing is inherently nondeterministic; the harness can observe it
but cannot stabilize it.
2. Live smoke runs depend on a local Xcode session and cannot be asserted in CI.
3. `xcrun mcpbridge` may emit no explicit approval event, leaving the harness to infer
readiness from message timing and catalog contents.