From 6db534f585565583ee1c4f400c9d73b853004cc8 Mon Sep 17 00:00:00 2001
From: Declan Brady
Date: Mon, 22 Jun 2026 10:01:56 -0400
Subject: [PATCH 1/7] feat(harness): public adk facade for the unified surface + docs (AGX1-375); cleanup
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Re-export UnifiedEmitter, SpanTracer, TurnUsage, TurnResult, HarnessTurn, StreamTaskMessage, OpenSpan, CloseSpan, and SpanSignal from agentex.lib.adk so agent authors can import the unified harness surface from the canonical public facade instead of the internal core.harness path (AGX1-375).

Add adk/docs/harness.md covering the canonical StreamTaskMessage* stream, convert__to_agentex_events taps, HarnessTurn protocol, UnifiedEmitter (yield_turn vs auto_send_turn), tracing-on-by-default, TurnUsage/TurnResult, and per-channel usage examples.

Dead-code sweep: nothing safe to remove — the _async modules have not been migrated to use the new auto_send path yet (those are separate migration PRs), and the intentionally-kept _*_tracing handlers are not present in this branch.

Co-Authored-By: Claude Sonnet 4.6
---
 adk/docs/harness.md             | 196 ++++++++++++++++++++++++++++++++
 src/agentex/lib/adk/__init__.py |  23 ++++
 2 files changed, 219 insertions(+)
 create mode 100644 adk/docs/harness.md

diff --git a/adk/docs/harness.md b/adk/docs/harness.md
new file mode 100644
index 000000000..80fda9133
--- /dev/null
+++ b/adk/docs/harness.md
@@ -0,0 +1,196 @@
+# Unified Harness Surface
+
+The unified harness surface gives every agent harness (pydantic-ai, LangGraph, OpenAI Agents, and future parsers) a single, shared path to streaming, message persistence, and tracing. The Agentex `StreamTaskMessage*` event stream is the canonical wire format. A harness tap produces that stream once; the shared machinery delivers it and derives spans from it.
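To make "produce the stream once, derive spans from it" concrete, here is a minimal, self-contained sketch of the tool-span rules that the Tracing section attributes to `SpanDeriver`. It assumes simplified stand-in event classes (`Start`, `Delta`, `Full`, `Done`) and a hypothetical `derive_tool_spans` function in place of the real `StreamTaskMessage*` types and `SpanDeriver` API. Reasoning spans are omitted.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Union


# Hypothetical stand-ins for the StreamTaskMessage* union (not the real agentex types).
@dataclass
class Start:
    index: int
    kind: str  # "text", "reasoning", or "tool_request"
    tool_call_id: Optional[str] = None


@dataclass
class Delta:
    index: int
    text: str


@dataclass
class Full:
    kind: str  # "tool_request" or "tool_response"
    tool_call_id: str


@dataclass
class Done:
    index: int


Event = Union[Start, Delta, Full, Done]


def derive_tool_spans(events: Iterable[Event]) -> list[tuple[str, str]]:
    """Return ("open" | "close", tool_call_id) signals for the tool calls in one stream."""
    signals: list[tuple[str, str]] = []
    pending: dict[int, str] = {}  # slot index -> tool_call_id awaiting its Done
    opened: set[str] = set()      # tool_call_ids that currently have an open span
    for ev in events:
        if isinstance(ev, Start) and ev.kind == "tool_request" and ev.tool_call_id:
            pending[ev.index] = ev.tool_call_id  # a streamed tool request begins
        elif isinstance(ev, Done) and ev.index in pending:
            call_id = pending.pop(ev.index)  # Start + Done on that index -> open span
            opened.add(call_id)
            signals.append(("open", call_id))
        elif isinstance(ev, Full) and ev.kind == "tool_request":
            opened.add(ev.tool_call_id)  # one-shot tool request -> open span
            signals.append(("open", ev.tool_call_id))
        elif isinstance(ev, Full) and ev.kind == "tool_response" and ev.tool_call_id in opened:
            opened.discard(ev.tool_call_id)  # matching response -> close span
            signals.append(("close", ev.tool_call_id))
    return signals


stream: list[Event] = [
    Start(index=0, kind="text"),
    Delta(index=0, text="Checking the weather."),
    Done(index=0),
    Start(index=1, kind="tool_request", tool_call_id="call_1"),
    Done(index=1),
    Full(kind="tool_response", tool_call_id="call_1"),
]

# The same list feeds delivery (here, just the text) and span derivation.
print("".join(ev.text for ev in stream if isinstance(ev, Delta)))  # Checking the weather.
print(derive_tool_spans(stream))  # [('open', 'call_1'), ('close', 'call_1')]
```

A response whose `tool_call_id` was never opened produces no signal. This mirrors the "whose `tool_call_id` was opened" condition in the documented mapping.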
+
+All public names are re-exported from `agentex.lib.adk`:
+
+```python
+from agentex.lib.adk import (
+    UnifiedEmitter,
+    SpanTracer,
+    TurnUsage,
+    TurnResult,
+    HarnessTurn,
+    StreamTaskMessage,
+    OpenSpan,
+    CloseSpan,
+    SpanSignal,
+)
+```
+
+The implementation lives at `src/agentex/lib/core/harness/`.
+
+---
+
+## The canonical stream: `StreamTaskMessage`
+
+`StreamTaskMessage` is a union of the four wire-protocol update types:
+
+```
+StreamTaskMessageStart - opens a content slot (text, reasoning, tool request, ...)
+StreamTaskMessageDelta - appends a token/fragment to an open slot
+StreamTaskMessageFull - posts a complete message in one shot (tool response, ...)
+StreamTaskMessageDone - closes an open slot
+```
+
+Every harness tap produces a sequence of these. Everything downstream (delivery, tracing) reads the same sequence.
+
+---
+
+## Per-harness taps: `convert__to_agentex_events`
+
+A tap is an async generator that translates the harness's native event stream into `StreamTaskMessage*` events. The currently shipped taps are:
+
+| Harness | Tap function | Exported from |
+|---|---|---|
+| pydantic-ai | `convert_pydantic_ai_to_agentex_events` | `agentex.lib.adk` |
+| LangGraph | `convert_langgraph_to_agentex_events` | `agentex.lib.adk` |
+
+Taps for claude-code and codex will be added in subsequent PRs (AGX1-420, AGX1-421) and exported from `agentex.lib.adk` in the same way.
+
+---
+
+## `HarnessTurn` protocol
+
+`HarnessTurn` is the interface a harness turn object must satisfy to plug into `UnifiedEmitter`:
+
+```python
+@runtime_checkable
+class HarnessTurn(Protocol):
+    @property
+    def events(self) -> AsyncIterator[StreamTaskMessage]: ...
+
+    def usage(self) -> TurnUsage: ...
+```
+
+`events` is the canonical stream for this turn. `usage()` is valid only after `events` is exhausted (async generators cannot cleanly return a value to the consumer, so usage travels out-of-band).
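The out-of-band `usage()` rule is easiest to see in a toy turn. This is a minimal illustration under stated assumptions. `ToyUsage`, `ToyHarnessTurn`, and `WordTurn` are hypothetical stand-ins, not agentex classes. Events are plain strings instead of `StreamTaskMessage*`. The `RuntimeError` guard is an illustrative choice, not documented `HarnessTurn` behavior.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Protocol, runtime_checkable


@dataclass
class ToyUsage:
    """Hypothetical stand-in for TurnUsage with two of its fields."""
    output_tokens: int = 0
    num_llm_calls: int = 0


@runtime_checkable
class ToyHarnessTurn(Protocol):
    """Same shape as the documented HarnessTurn, with str events for brevity."""

    @property
    def events(self) -> AsyncIterator[str]: ...

    def usage(self) -> ToyUsage: ...


class WordTurn:
    """Hypothetical turn that 'streams' a canned reply, one word per event."""

    def __init__(self, reply: str) -> None:
        self._reply = reply
        self._usage = ToyUsage()
        self._exhausted = False

    async def _tap(self) -> AsyncIterator[str]:
        self._usage.num_llm_calls += 1
        for word in self._reply.split():
            self._usage.output_tokens += 1  # usage accumulates while events flow
            yield word
        self._exhausted = True  # reached only once the consumer drains the stream

    @property
    def events(self) -> AsyncIterator[str]:
        return self._tap()

    def usage(self) -> ToyUsage:
        # Illustrative guard; the documented contract only states the rule.
        if not self._exhausted:
            raise RuntimeError("usage() is only valid after events is exhausted")
        return self._usage


async def drive(turn: ToyHarnessTurn) -> tuple[list[str], ToyUsage]:
    seen = [event async for event in turn.events]  # exhaust events first...
    return seen, turn.usage()                       # ...then read usage out-of-band


words, usage = asyncio.run(drive(WordTurn("hello unified harness")))
print(words)  # ['hello', 'unified', 'harness']
print(usage)  # ToyUsage(output_tokens=3, num_llm_calls=1)
```

An async generator cannot hand a return value to an `async for` consumer. The turn therefore keeps its counters on `self` and exposes them through `usage()` once the stream is drained.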
+
+---
+
+## `TurnUsage`
+
+Token counts and cost for one turn, harness-independent:
+
+```python
+class TurnUsage(BaseModel):
+    model: str | None = None
+    input_tokens: int | None = None
+    output_tokens: int | None = None
+    cached_input_tokens: int | None = None
+    reasoning_tokens: int | None = None
+    total_tokens: int | None = None
+    cost_usd: float | None = None
+    duration_ms: int | None = None
+    num_llm_calls: int = 0
+    num_tool_calls: int = 0
+    num_reasoning_blocks: int = 0
+```
+
+Field names align with `agentex.lib.core.observability.llm_metrics` for easy conversion.
+
+---
+
+## `UnifiedEmitter`
+
+`UnifiedEmitter` ties a turn's canonical stream, tracing context, and delivery mode together. Construct one per turn with the task/trace context from the request:
+
+```python
+emitter = UnifiedEmitter(
+    task_id=params.task.id,
+    trace_id=params.task.id,  # or None to disable tracing
+    parent_span_id=turn_span.id if turn_span else None,
+)
+```
+
+**Tracing is on by default** when `trace_id` is provided. To disable it explicitly, pass `tracer=False`. To inject a custom `SpanTracer` (e.g. in tests), pass it as `tracer=`.
+
+### Delivery mode 1: `yield_turn` (sync HTTP ACP)
+
+For sync ACP agents that return events directly over the HTTP response:
+
+```python
+@acp.on_message_send
+async def handle(params):
+    turn = MyHarnessTurn(params)  # implements HarnessTurn
+    async for event in emitter.yield_turn(turn):
+        yield event
+```
+
+`yield_turn` forwards each event to the caller and traces spans as a side effect. It is a passthrough when `tracer` is `None`.
+
+### Delivery mode 2: `auto_send_turn` (async/Temporal)
+
+For async or Temporal agents that push to the task stream via Redis:
+
+```python
+result: TurnResult = await emitter.auto_send_turn(turn, created_at=workflow.now())
+```
+
+`auto_send_turn` drives `adk.streaming` contexts for every message in the stream, derives and records spans, and returns a `TurnResult` with the final text and usage. Pass `created_at` under Temporal to back-date message timestamps deterministically.
+
+---
+
+## `TurnResult`
+
+```python
+class TurnResult(BaseModel):
+    final_text: str = ""
+    usage: TurnUsage = TurnUsage()
+```
+
+Returned by `auto_send_turn`. `final_text` is the last text segment of the turn (multi-step runs return only the final segment, matching `stream_langgraph_events` / `stream_pydantic_ai_events` semantics).
+
+---
+
+## Tracing: span derivation
+
+Spans are derived from the canonical stream by `SpanDeriver` (pure, no `adk` dependency) and dispatched to `adk.tracing` by `SpanTracer`. The mapping:
+
+- `StreamTaskMessageStart(ToolRequestContent)` + `StreamTaskMessageDone` on that index -> tool span open (keyed by `tool_call_id`)
+- `StreamTaskMessageFull(ToolResponseContent)` whose `tool_call_id` was opened -> tool span close
+- `StreamTaskMessageFull(ToolRequestContent)` (harnesses that emit tool calls as Full) -> opens a tool span; matching `Full(ToolResponseContent)` closes it
+- `StreamTaskMessageStart(ReasoningContent)` + `StreamTaskMessageDone` -> reasoning span
+
+`SpanTracer` is `SpanDeriver`'s consumer. You can inject a custom `SpanTracer` via `UnifiedEmitter(tracer=)` for advanced use or testing.
+
+---
+
+## Usage examples by channel
+
+### Sync ACP (pydantic-ai tap)
+
+```python
+import agentex.lib.adk as adk
+from agentex.lib.adk import UnifiedEmitter, convert_pydantic_ai_to_agentex_events, create_pydantic_ai_tracing_handler
+
+@acp.on_message_send
+async def handle(params):
+    task_id = params.task.id
+    async with adk.tracing.span(trace_id=task_id, name="message", ...) as turn_span:
+        emitter = UnifiedEmitter(
+            task_id=task_id,
+            trace_id=task_id,
+            parent_span_id=turn_span.id if turn_span else None,
+        )
+        tap = convert_pydantic_ai_to_agentex_events(pydantic_stream)
+        # wrap tap in a HarnessTurn then yield_turn, or yield directly:
+        async for event in tap:
+            yield event
+```
+
+For the pre-unified sync path the tap is still yielded directly; `UnifiedEmitter.yield_turn` is the forward-looking integration point when a `HarnessTurn` wrapper is available.
+
+### Async Temporal (auto-send)
+
+```python
+from agentex.lib.adk import UnifiedEmitter
+
+emitter = UnifiedEmitter(
+    task_id=task_id,
+    trace_id=task_id,
+    parent_span_id=parent_span_id,
+)
+result = await emitter.auto_send_turn(turn, created_at=workflow.now())
+# result.final_text — last text segment
+# result.usage — TurnUsage (tokens, cost, ...)
+```
diff --git a/src/agentex/lib/adk/__init__.py b/src/agentex/lib/adk/__init__.py
index f6713be7c..fedd52f7a 100644
--- a/src/agentex/lib/adk/__init__.py
+++ b/src/agentex/lib/adk/__init__.py
@@ -27,6 +27,19 @@
 from agentex.lib.adk._modules.tasks import TasksModule
 from agentex.lib.adk._modules.tracing import TracingModule
 
+# Unified harness surface (AGX1-375)
+from agentex.lib.core.harness import (
+    UnifiedEmitter,
+    SpanTracer,
+    OpenSpan,
+    CloseSpan,
+    SpanSignal,
+    StreamTaskMessage,
+    TurnUsage,
+    TurnResult,
+    HarnessTurn,
+)
+
 from agentex.lib.adk import providers
 from agentex.lib.adk import utils
 
@@ -69,6 +82,16 @@
     "convert_codex_to_agentex_events",
     "CodexTurn",
     "codex_usage_to_turn_usage",
+    # Unified harness surface (AGX1-375)
+    "UnifiedEmitter",
+    "SpanTracer",
+    "OpenSpan",
+    "CloseSpan",
+    "SpanSignal",
+    "StreamTaskMessage",
+    "TurnUsage",
+    "TurnResult",
+    "HarnessTurn",
     # Providers
     "providers",
     # Utils
From 7655ad9fe1d9d227e4c08c06512cbde1f52725b6 Mon Sep 17 00:00:00 2001
From: Declan Brady
Date: Mon, 22 Jun 2026 10:08:58 -0400
Subject: [PATCH 2/7] docs: PR 10 post-merge cleanup plan (remove deprecated tracing handlers, workaround markers, optional adk.harness namespace)

Documents the deferred, breaking-ish cleanup to run as PR 10 once the whole harness-surface stack merges, the deprecation window passes, and the golden agent is migrated off the bespoke paths.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 .../plans/2026-06-22-pr10-harness-cleanup.md | 49 +++++++++++++++++++
 1 file changed, 49 insertions(+)
 create mode 100644 docs/superpowers/plans/2026-06-22-pr10-harness-cleanup.md

diff --git a/docs/superpowers/plans/2026-06-22-pr10-harness-cleanup.md b/docs/superpowers/plans/2026-06-22-pr10-harness-cleanup.md
new file mode 100644
index 000000000..f14ed788d
--- /dev/null
+++ b/docs/superpowers/plans/2026-06-22-pr10-harness-cleanup.md
@@ -0,0 +1,49 @@
+# PR 10 — Post-Merge Harness Cleanup Plan
+
+Date: 2026-06-22
+Status: Plan — execute as **PR 10**, only AFTER the whole harness-surface stack merges and the deprecation/migration preconditions below hold.
+Repo: `scale-agentex-python`
+
+## Why this is a separate, later PR
+
+The harness-surface stack (foundation #412, conformance #414, migrations #415/#416/#417/#420/#421, facade+docs #423) was built **additively** so nothing regressed and the stack stayed reviewable. That deliberately left behind a few transitional artifacts — deprecated-but-kept shims, resolved-workaround comments, and a flat public namespace that grew as taps were added. Removing them is a breaking-ish, cross-cutting change that should NOT happen inside the feature PRs. PR 10 does that cleanup once it's safe.
+
+## Preconditions (do not start PR 10 until ALL hold)
+
+1. **Entire stack merged** to `next`/main: #412, #414, #415, #416, #417, #420, #421, #423.
+2. **Deprecation window observed** (or a minor/major version boundary) for the publicly-deprecated symbols below — they were only docstring-deprecated, never runtime-warned, so external code may still import them.
+3. **Golden agent migrated** off the bespoke paths (per the adoption plan, #422 → implementation in `agentex-agents`): it no longer constructs the deprecated tracing handlers or any pre-unified converter path. Grep the golden agent + any other internal consumers first.
+4. **No external consumers** depend on the removed symbols (check downstream usage; add a changelog/release note for the removal).
+
+## Scope — what PR 10 removes / consolidates
+
+### 1. Delete the deprecated bespoke tracing handlers (primary item)
+These were superseded by `SpanTracer`/`UnifiedEmitter` (which derive spans from the canonical stream) and only docstring-deprecated:
+- `src/agentex/lib/adk/_modules/_pydantic_ai_tracing.py` — `create_pydantic_ai_tracing_handler`, `AgentexPydanticAITracingHandler`.
+- `src/agentex/lib/adk/_modules/_langgraph_tracing.py` — `create_langgraph_tracing_handler`, `AgentexLangGraphTracingHandler`.
+- Any openai bespoke tracing shim deprecated in #416 (`sync_provider.py` `SyncStreamingModel`/`SyncStreamingProvider` if applicable).
+Remove the modules (or the deprecated symbols), their `adk/__init__.py` exports, and all references/tests that only existed to exercise the deprecated path. Keep any genuinely-shared helpers they used if still referenced elsewhere.
+
+### 2. Remove resolved-workaround markers and transitional comments
+Now that AGX1-377/378 are fixed in the foundation and the migrations dropped their workarounds, delete the leftover transitional breadcrumbs:
+- Any remaining `# AGX1-377`/`# AGX1-378` "workaround/limitation" comments in `auto_send.py`, the per-harness turns/async helpers, and the conformance runner (the coalescing is gone; `created_at` is restored; streamed tool delivery works).
+- Stale docstring notes that describe behavior that has since changed (e.g. "created_at limitation", "coalescing workaround").
+Keep comments that document *current* contracts; only remove ones describing now-removed transitional state.
+
+### 3. (Optional) Introduce an `adk.harness` namespace to de-crowd the flat facade
+#423 exposed the surface flat on `agentex.lib.adk` for consistency. With five `convert__to_agentex_events` taps + `Turn`s + `UnifiedEmitter`/types, the flat namespace is crowded. Consider a dedicated `agentex.lib.adk.harness` submodule that re-exports the surface, while keeping flat `adk.*` re-exports for one release (back-compat), then dropping the flat ones in a later major. Decide with the team; this is polish, not required. If done, update the #423 docs page (`adk/docs/harness.md`) accordingly.
+
+### 4. Remove any vestigial simple-conformance-runner paths
+All harnesses now register into the cross-channel conformance runner (#414). If, after merge, any simple/determinism-only runner code path or the standalone `derive_all`-based test remains unused, remove it (or keep `derive_all` only if it's still a useful primitive). Verify nothing imports a removed helper.
+
+### 5. De-duplicate per-harness `_*_sync` / `_*_async` if anything remains
+The async helpers (`stream_pydantic_ai_events`, `stream_langgraph_events`, `run_agent_streamed_auto_send`) now delegate to `UnifiedEmitter.auto_send_turn`. Confirm no hand-rolled `adk.streaming` streaming loops remain in those modules post-merge; remove any leftover dead branches.
+
+## Verification
+- Grep the whole repo (and confirm with the golden agent / known consumers) for each removed symbol — zero references before deletion.
+- Full `./scripts/test` on Python 3.12 AND 3.13 (run the two versions separately or in shorter scoped batches — the dual-version `./scripts/test` in one shot has tripped a 600s no-output watchdog; prefer scoped runs or background with periodic output).
+- `./scripts/lint` clean (whole-repo ruff + pyright).
+- Changelog / release note documenting the removal of the deprecated public symbols.
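The grep step in the verification list could be scripted so that it can gate the deletion commit. The following is a minimal sketch, assuming a plain filesystem walk over a chosen set of file types is good enough. `find_references`, `main`, and `SCANNED_SUFFIXES` are hypothetical helpers, not existing repo tooling. The symbol list is copied from scope item 1.

```python
import re
from pathlib import Path

# Deprecated symbols named in scope item 1 of this plan.
REMOVED_SYMBOLS = (
    "create_pydantic_ai_tracing_handler",
    "AgentexPydanticAITracingHandler",
    "create_langgraph_tracing_handler",
    "AgentexLangGraphTracingHandler",
)
# Assumption: these suffixes cover code, docs, and tutorial manifests.
SCANNED_SUFFIXES = {".py", ".md", ".toml", ".yaml", ".yml"}


def find_references(root: Path, symbols: tuple[str, ...] = REMOVED_SYMBOLS) -> dict[str, list[str]]:
    """Map each symbol to the 'relative/path:line' locations that still mention it."""
    # Word boundaries avoid false hits on longer identifiers that contain a symbol name.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, symbols)) + r")\b")
    hits: dict[str, list[str]] = {symbol: [] for symbol in symbols}
    for path in sorted(root.rglob("*")):
        if ".git" in path.parts or not path.is_file() or path.suffix not in SCANNED_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in pattern.finditer(line):
                hits[match.group(1)].append(f"{path.relative_to(root)}:{lineno}")
    return hits


def main(argv: list[str]) -> int:
    """CLI-style entry point: returns 1 while any removed symbol is still referenced."""
    root = Path(argv[0]) if argv else Path(".")
    blocking = {s: locs for s, locs in find_references(root).items() if locs}
    for symbol, locations in blocking.items():
        print(f"{symbol}: {len(locations)} reference(s)")
        for location in locations:
            print(f"  {location}")
    return 1 if blocking else 0
```

Calling `sys.exit(main(sys.argv[1:]))` from a throwaway script produces a non-zero exit while any reference remains. That lets the "zero references before deletion" check run in CI rather than by eye.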
+ +## Risk +Removing publicly-exported (deprecated) symbols is a breaking change — gate PR 10 on the version-bump policy and on confirming the golden agent + any external consumers are migrated. Everything here is recoverable from history; sequence it as the final, deliberate cleanup of the harness-surface workstream. From a6f2d894730312b9e87cf40adea7c21069585bd3 Mon Sep 17 00:00:00 2001 From: Declan Brady Date: Mon, 22 Jun 2026 12:01:39 -0400 Subject: [PATCH 3/7] docs(harness): drop unused deprecated import from sync example [greptile] MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The sync ACP example imported create_pydantic_ai_tracing_handler (deprecated) but never used it — only UnifiedEmitter and convert_pydantic_ai_to_agentex_events are referenced. Remove it so the example doesn't steer readers toward the deprecated handler. Co-Authored-By: Claude Opus 4.8 (1M context) --- adk/docs/harness.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/adk/docs/harness.md b/adk/docs/harness.md index 80fda9133..6a9d8947a 100644 --- a/adk/docs/harness.md +++ b/adk/docs/harness.md @@ -161,7 +161,7 @@ Spans are derived from the canonical stream by `SpanDeriver` (pure, no `adk` dep ```python import agentex.lib.adk as adk -from agentex.lib.adk import UnifiedEmitter, convert_pydantic_ai_to_agentex_events, create_pydantic_ai_tracing_handler +from agentex.lib.adk import UnifiedEmitter, convert_pydantic_ai_to_agentex_events @acp.on_message_send async def handle(params): From 4797963868597af399f7a89fae28251dbc6dbf8c Mon Sep 17 00:00:00 2001 From: Declan Brady Date: Mon, 22 Jun 2026 16:26:08 -0400 Subject: [PATCH 4/7] docs(harness-cleanup): add cross-PR duplication consolidation to PR 10 plan Folds the review finding (duplicated _FakeTracing/_FakeSpan test doubles, copy-pasted determinism test, five parallel harness-turn usage normalizers, three divergent sync-path structures, competing adk/__init__.py edits, and 
tutorial-scaffolding drift) into the existing post-merge cleanup plan as scope items 6-12, flagged non-breaking so they can land independently of the deprecation-gated removals.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 .../plans/2026-06-22-pr10-harness-cleanup.md | 42 +++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/docs/superpowers/plans/2026-06-22-pr10-harness-cleanup.md b/docs/superpowers/plans/2026-06-22-pr10-harness-cleanup.md
index f14ed788d..d9b9c337f 100644
--- a/docs/superpowers/plans/2026-06-22-pr10-harness-cleanup.md
+++ b/docs/superpowers/plans/2026-06-22-pr10-harness-cleanup.md
@@ -39,8 +39,50 @@ All harnesses now register into the cross-channel conformance runner (#414). If,
 ### 5. De-duplicate per-harness `_*_sync` / `_*_async` if anything remains
 The async helpers (`stream_pydantic_ai_events`, `stream_langgraph_events`, `run_agent_streamed_auto_send`) now delegate to `UnifiedEmitter.auto_send_turn`. Confirm no hand-rolled `adk.streaming` streaming loops remain in those modules post-merge; remove any leftover dead branches.
 
+### 6. Consolidate duplicated test scaffolding (cross-PR review finding)
+
+The migration PRs were reviewed independently, so each re-introduced the same test doubles instead of sharing them. After merge these are the concrete duplicates:
+
+- **`_FakeTracing` / `_FakeSpan`** are defined ~9 times: the foundation tests already carry three copies (`tests/lib/core/harness/test_tracer.py`, `test_emitter.py`, `conformance/runner.py`), and the integration suites add six more — `tests/lib/core/harness/test_harness_pydantic_ai_{sync,async,temporal}.py` (#415) and `test_harness_langgraph_{sync,async,temporal}.py` (#417) each redefine them.
+- **`_run_yield_turn`** is duplicated between the pydantic-ai and langgraph integration suites.
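Here is a sketch of what the shared `tests/lib/core/harness/_fakes.py` could contain. It assumes the duplicated doubles mainly stand in for an `adk.tracing.span(...)`-style async context manager whose span exposes an `.id`, as in the harness docs' `turn_span.id`. This plan does not show the current `_FakeTracing`/`_FakeSpan` copies, so the recorded fields and the `run_yield_turn` signature are hypothetical.

```python
import asyncio
import itertools
from contextlib import asynccontextmanager
from dataclasses import dataclass, field
from typing import Any, AsyncIterator, TypeVar

T = TypeVar("T")


@dataclass
class FakeSpan:
    """What a test typically asserts on; these fields are assumptions."""
    id: str
    name: str
    kwargs: dict[str, Any]
    closed: bool = False


@dataclass
class FakeTracing:
    """Stand-in for an adk.tracing-like object that records spans instead of exporting them."""
    spans: list[FakeSpan] = field(default_factory=list)
    _ids: Any = field(default_factory=lambda: itertools.count(1), repr=False)

    @asynccontextmanager
    async def span(self, *, name: str, **kwargs: Any) -> AsyncIterator[FakeSpan]:
        span = FakeSpan(id=f"span-{next(self._ids)}", name=name, kwargs=kwargs)
        self.spans.append(span)
        try:
            yield span
        finally:
            span.closed = True  # mirrors the span ending when the context exits


async def run_yield_turn(events: AsyncIterator[T]) -> list[T]:
    """Drain an async event stream (for example emitter.yield_turn(turn)) into a list."""
    return [event async for event in events]


async def demo() -> tuple[list[str], FakeTracing]:
    tracing = FakeTracing()

    async def fake_events() -> AsyncIterator[str]:
        yield "start"
        yield "done"

    async with tracing.span(name="message", trace_id="task-1"):
        events = await run_yield_turn(fake_events())
    return events, tracing


events, tracing = asyncio.run(demo())
print(events)  # ['start', 'done']
print([(s.id, s.name, s.closed) for s in tracing.spans])  # [('span-1', 'message', True)]
```

A `conftest.py` `fake_tracing` fixture would simply return `FakeTracing()`. The module and fixture options are therefore not mutually exclusive.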
+
+Extract a single shared module — `tests/lib/core/harness/_fakes.py` (`FakeSpan`, `FakeTracing`, `run_yield_turn`) or a `conftest.py` `fake_tracing` fixture — and have every harness test import it. Delete the per-file copies.
+
+### 7. Parametrize the generic conformance determinism test once
+
+`test_span_derivation_is_deterministic` is copy-pasted into every per-harness conformance module (`conformance/test__conformance.py`) on top of the copy already in the shared `conformance/test_conformance.py`. It is harness-agnostic — it only re-derives over registered fixtures. Keep ONE parametrized version in the shared conformance module driven by `all_fixtures()`, and delete the per-harness copies (the per-harness modules keep only their fixture registration + cross-channel assertions).
+
+### 8. Extract a shared harness-turn usage-normalization helper
+
+The five `HarnessTurn` implementations — `_pydantic_ai_turn.py`, `_langgraph_turn.py`, `providers/_modules/openai_turn.py`, `_claude_code_turn.py`, `_codex_turn.py` (134–214 lines each) — are not copy-paste, but they repeat the same shape: wrap a tap's event stream and normalize provider usage into `TurnUsage`. Pull the common normalization into a shared primitive in the foundation (e.g. `core/harness/usage.py` `normalize_usage(...)` or a `HarnessTurnBase` mixin), leaving each module only its provider-specific mapping. Do NOT force-fit harnesses whose usage shape genuinely diverges (codex is the largest for a reason — check before collapsing).
+
+### 9. Converge the three sync-path structures
+
+"Sync delivery" was implemented three different ways across the migrations: openai modifies `providers/_modules/sync_provider.py` + adds `openai_turn.py`; pydantic-ai/langgraph modify their existing `_*_sync.py`; claude/codex add new `_claude_code_sync.py` / `_codex_sync.py`. Pick one structural convention and align the five harnesses to it so the sync path reads the same everywhere. (Overlaps item 5 — do them together.)
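For item 8, the shared primitive could be as small as a key-mapping function. This sketch assumes each harness can hand over its usage as a flat mapping. The plan only floats the name `normalize_usage(...)`, so the signature here, `TurnUsageSketch`, and both key maps are hypothetical. The key names are illustrative and are not taken from the five turn modules.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass
class TurnUsageSketch:
    """Stand-in with a subset of the documented TurnUsage fields (not the real model)."""
    model: Optional[str] = None
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None
    cached_input_tokens: Optional[int] = None
    reasoning_tokens: Optional[int] = None
    total_tokens: Optional[int] = None
    num_llm_calls: int = 0


# Fields a provider key map may target; model and call counts are passed explicitly.
_MAPPABLE = {f.name for f in fields(TurnUsageSketch)} - {"model", "num_llm_calls"}


def normalize_usage(
    raw: Mapping[str, Any],
    key_map: Mapping[str, str],
    *,
    model: Optional[str] = None,
    num_llm_calls: int = 1,
) -> TurnUsageSketch:
    """Translate provider usage keys into TurnUsage-style fields via a per-harness key map."""
    values: dict[str, Any] = {}
    for provider_key, field_name in key_map.items():
        if field_name not in _MAPPABLE:
            raise ValueError(f"unknown or reserved TurnUsage field: {field_name!r}")
        if raw.get(provider_key) is not None:
            values[field_name] = raw[provider_key]
    usage = TurnUsageSketch(model=model, num_llm_calls=num_llm_calls, **values)
    # Derive the total only when the provider did not report one.
    if usage.total_tokens is None and usage.input_tokens is not None and usage.output_tokens is not None:
        usage.total_tokens = usage.input_tokens + usage.output_tokens
    return usage


# Illustrative key maps only; the real turn modules' mappings may differ.
CHAT_STYLE_KEYS = {"prompt_tokens": "input_tokens", "completion_tokens": "output_tokens", "total_tokens": "total_tokens"}
MESSAGES_STYLE_KEYS = {"input_tokens": "input_tokens", "output_tokens": "output_tokens", "cache_read_input_tokens": "cached_input_tokens"}

a = normalize_usage({"prompt_tokens": 12, "completion_tokens": 30, "total_tokens": 42}, CHAT_STYLE_KEYS)
b = normalize_usage({"input_tokens": 10, "output_tokens": 5, "cache_read_input_tokens": 4}, MESSAGES_STYLE_KEYS)
print(a.total_tokens, b.total_tokens, b.cached_input_tokens)  # 42 15 4
```

Keeping the provider-specific part as data (a key map) leaves room for the plan's caveat. A harness whose usage shape genuinely diverges, such as codex (which the plan flags), can flatten its usage first or keep its own mapping.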
+
+### 10. Reconcile the competing `adk/__init__.py` edits
+
+`src/agentex/lib/adk/__init__.py` is edited by three PRs — claude (#420, +9), codex (#421, +6), and the facade (#423, +22). Once merged, the facade in #423 should be the single source of the public surface; fold the claude/codex ad-hoc export additions into it and drop the duplicates. (Subsumed by the facade work in item 3 — track here so it isn't missed.)
+
+### 11. Tutorial-agent consistency pass
+
+The 15 tutorial projects (5 harnesses × sync/async/temporal) are intentionally tailored per harness, so there is no code to dedupe — but the scaffolding drifted and should be standardized:
+- **Naming:** `harness_` (pydantic-ai, langgraph, codex) vs numeric prefixes `060_/130_/140_` (openai, claude_code). Pick one convention and rename.
+- **`.dockerignore`:** byte-identical in pydantic-ai/openai/claude, **absent in langgraph and codex**. Add the shared file everywhere (or none).
+- **`conftest.py`:** present only in codex (one per tier). Either promote it to the shared tutorial test setup or remove if unneeded.
+
+### 12. Decide integration-test coverage parity
+
+Only pydantic-ai (#415) and langgraph (#417) ship `test_harness_*_{sync,async,temporal}` integration suites + CI live-matrix rows; openai/claude/codex (#416/#420/#421) ship only conformance + turn tests. Either add the missing suites (and their matrix rows — note #415's matrix comment already invites PRs 5–8 to do so) or document the intentional difference. The two existing matrix-job definitions are near-identical and should collapse to one matrix once item 6's shared fakes land.
+
+> **Sequencing note:** items 6–9 and 11–12 are **non-breaking refactors** (tests, internal helpers, examples) — they only need the stack merged (precondition 1), NOT the deprecation window / consumer-migration gates that items 1–2 require. They can land as their own earlier cleanup PR if PR 10's breaking removals are blocked on the version-bump policy. Item 10 rides with item 3.
+
+(Also noted, no action: #417 already carries a `tests/lib/adk/test_pydantic_ai_async.py` change via a shared tracing-handler fix — recorded here only so it isn't mistaken for stray duplication during cleanup.)
+
 ## Verification
 - Grep the whole repo (and confirm with the golden agent / known consumers) for each removed symbol — zero references before deletion.
+- After the test-scaffolding consolidation (items 6–7): the shared `_fakes` module / fixture is the only definition of `_FakeTracing`/`_FakeSpan`, and the determinism test exists once — grep confirms no per-file/per-harness copies remain.
+- After the turn/sync consolidation (items 8–9): the five turn modules import the shared usage helper and the sync path follows one convention; harness conformance + integration suites stay green.
 - Full `./scripts/test` on Python 3.12 AND 3.13 (run the two versions separately or in shorter scoped batches — the dual-version `./scripts/test` in one shot has tripped a 600s no-output watchdog; prefer scoped runs or background with periodic output).
 - `./scripts/lint` clean (whole-repo ruff + pyright).
 - Changelog / release note documenting the removal of the deprecated public symbols.
From 289fd609175ca0722758d138baea9aec46b054f1 Mon Sep 17 00:00:00 2001
From: Declan Brady
Date: Mon, 22 Jun 2026 16:28:58 -0400
Subject: [PATCH 5/7] docs(harness-cleanup): plan retirement of duplicate pre-unified tutorials

The migrations added a second set of langgraph/pydantic-ai/openai tutorial agents alongside the ones already on next. The pre-existing ones demonstrate the deprecated bespoke-tracing path (they import create_*_tracing_handler), so retiring/migrating them is a hard prerequisite of item 1's symbol removal, not optional. Adds item 13 (with a replace-in-place vs keep-new decision), cross-references item 1, and gates item 13 alongside item 1.
Co-Authored-By: Claude Opus 4.8 (1M context)
---
 .../plans/2026-06-22-pr10-harness-cleanup.md | 23 ++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/docs/superpowers/plans/2026-06-22-pr10-harness-cleanup.md b/docs/superpowers/plans/2026-06-22-pr10-harness-cleanup.md
index d9b9c337f..54c71622b 100644
--- a/docs/superpowers/plans/2026-06-22-pr10-harness-cleanup.md
+++ b/docs/superpowers/plans/2026-06-22-pr10-harness-cleanup.md
@@ -24,6 +24,8 @@ These were superseded by `SpanTracer`/`UnifiedEmitter` (which derive spans from
 - Any openai bespoke tracing shim deprecated in #416 (`sync_provider.py` `SyncStreamingModel`/`SyncStreamingProvider` if applicable).
 Remove the modules (or the deprecated symbols), their `adk/__init__.py` exports, and all references/tests that only existed to exercise the deprecated path. Keep any genuinely-shared helpers they used if still referenced elsewhere.
 
+**Note — these symbols still have tutorial consumers (see item 13):** the pre-unified example agents (`examples/tutorials/.../030_langgraph`, `100_langgraph`, `040_pydantic_ai`, `110_pydantic_ai`, etc.) import `create_langgraph_tracing_handler` / `create_pydantic_ai_tracing_handler`. Deleting the symbols breaks those tutorials, so item 13 (retire/migrate them) is a hard prerequisite of this removal, not optional polish.
+
 ### 2. Remove resolved-workaround markers and transitional comments
 Now that AGX1-377/378 are fixed in the foundation and the migrations dropped their workarounds, delete the leftover transitional breadcrumbs:
 - Any remaining `# AGX1-377`/`# AGX1-378` "workaround/limitation" comments in `auto_send.py`, the per-harness turns/async helpers, and the conformance runner (the coalescing is gone; `created_at` is restored; streamed tool delivery works).
@@ -75,7 +77,25 @@ Only pydantic-ai (#415) and langgraph (#417) ship `test_harness_*_{sync,async,te
 
 Only pydantic-ai (#415) and langgraph (#417) ship `test_harness_*_{sync,async,temporal}` integration suites + CI live-matrix rows; openai/claude/codex (#416/#420/#421) ship only conformance + turn tests. Either add the missing suites (and their matrix rows — note #415's matrix comment already invites PRs 5–8 to do so) or document the intentional difference. The two existing matrix-job definitions are near-identical and should collapse to one matrix once item 6's shared fakes land.
 
-> **Sequencing note:** items 6–9 and 11–12 are **non-breaking refactors** (tests, internal helpers, examples) — they only need the stack merged (precondition 1), NOT the deprecation window / consumer-migration gates that items 1–2 require. They can land as their own earlier cleanup PR if PR 10's breaking removals are blocked on the version-bump policy. Item 10 rides with item 3.
+### 13. Retire the duplicate pre-unified framework tutorials
+
+The migrations added a **second** set of framework tutorials alongside the ones already on `next`, so langgraph / pydantic-ai / openai now each have two demonstrations of the same framework:
+
+| Framework | Pre-existing (pre-unified, on `next`) | New (unified surface, harness PRs) |
+| --- | --- | --- |
+| langgraph | `00_sync/030_langgraph`, `10_async/00_base/100_langgraph`, `10_async/10_temporal/130_langgraph` | `harness_langgraph` ×3 (#417) |
+| pydantic-ai | `00_sync/040_pydantic_ai`, `10_async/00_base/110_pydantic_ai`, `10_async/10_temporal/110_pydantic_ai` | `harness_pydantic_ai` ×3 (#415) |
+| openai | `00_sync/050_openai_agents_local_sandbox`, `10_async/00_base/120_…`, `10_async/10_temporal/120_…` | `060/130/140_harness_openai` (#416) |
+
+The old ones demonstrate the **deprecated pre-unified path** — verified: `040_pydantic_ai` imports `create_pydantic_ai_tracing_handler` + `convert_*(tracing_handler=...)`; `030_langgraph`/`100_langgraph` import `create_langgraph_tracing_handler` (+ `stream_langgraph_events`). The new `harness_*` agents are their unified-surface (`UnifiedEmitter` + `Turn`) replacements. So this is the tutorial-facing half of item 1's removal.
+
+**Decision needed — which set survives:**
+- **(a) Replace in place (preferred):** port the unified-surface implementation into the existing numbered slots (`040_pydantic_ai`, `030_langgraph`, …) and delete the new `harness_*` dirs. Keeps the established numbered tutorial sequence and resolves item 11's naming inconsistency for free.
+- **(b) Keep the new dirs, delete the old:** simpler diff, but leaves the `harness_*` vs numbered naming split (then settle naming per item 11) and orphans the old numbers.
+
+Either way: pick one set per framework, delete the other, fix any tutorial index/README that links the removed dirs, and confirm the surviving agents don't import the item-1 deprecated symbols. claude-code and codex are net-new (the existing `090_claude_agents_sdk_mvp` is the Claude **Agents SDK**, not the claude-code CLI harness) — nothing to retire there.
+
+> **Sequencing note:** items 6–9 and 11–12 are **non-breaking refactors** (tests, internal helpers, examples) — they only need the stack merged (precondition 1), NOT the deprecation window / consumer-migration gates that items 1–2 require. They can land as their own earlier cleanup PR if PR 10's breaking removals are blocked on the version-bump policy. Item 10 rides with item 3; **item 13 is gated with item 1** (the old tutorials import the symbols item 1 removes), so do them in the same PR.
 
 (Also noted, no action: #417 already carries a `tests/lib/adk/test_pydantic_ai_async.py` change via a shared tracing-handler fix — recorded here only so it isn't mistaken for stray duplication during cleanup.)
 
@@ -83,6 +103,7 @@ Only pydantic-ai (#415) and langgraph (#417) ship `test_harness_*_{sync,async,te
 - Grep the whole repo (and confirm with the golden agent / known consumers) for each removed symbol — zero references before deletion.
 - After the test-scaffolding consolidation (items 6–7): the shared `_fakes` module / fixture is the only definition of `_FakeTracing`/`_FakeSpan`, and the determinism test exists once — grep confirms no per-file/per-harness copies remain.
 - After the turn/sync consolidation (items 8–9): the five turn modules import the shared usage helper and the sync path follows one convention; harness conformance + integration suites stay green.
+- After the tutorial retirement (item 13): exactly one tutorial agent per framework per tier remains, none import the item-1 deprecated symbols, and the tutorial CI/test job + any index/README links resolve (no references to deleted dirs).
 - Full `./scripts/test` on Python 3.12 AND 3.13 (run the two versions separately or in shorter scoped batches — the dual-version `./scripts/test` in one shot has tripped a 600s no-output watchdog; prefer scoped runs or background with periodic output).
 - `./scripts/lint` clean (whole-repo ruff + pyright).
 - Changelog / release note documenting the removal of the deprecated public symbols.
From 8680163fdd2aa56459e8fa4da003aa670eb0ac7c Mon Sep 17 00:00:00 2001
From: Declan Brady
Date: Mon, 22 Jun 2026 16:34:12 -0400
Subject: [PATCH 6/7] docs(harness-cleanup): settle tutorial naming on numbered paradigm

Resolves the open naming decision in items 11/13: standardize on the numbered NNN_ convention and rename the new harness_* tutorials off the bare scheme, folding the rename into the replace-in-place retirement of the pre-unified tutorials (pydantic-ai/langgraph/openai take over their old numbered slots; codex gets fresh numbers; claude_code already compliant).
Co-Authored-By: Claude Opus 4.8 (1M context) --- .../plans/2026-06-22-pr10-harness-cleanup.md | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/docs/superpowers/plans/2026-06-22-pr10-harness-cleanup.md b/docs/superpowers/plans/2026-06-22-pr10-harness-cleanup.md index 54c71622b..651febbe1 100644 --- a/docs/superpowers/plans/2026-06-22-pr10-harness-cleanup.md +++ b/docs/superpowers/plans/2026-06-22-pr10-harness-cleanup.md @@ -69,7 +69,12 @@ The five `HarnessTurn` implementations — `_pydantic_ai_turn.py`, `_langgraph_t ### 11. Tutorial-agent consistency pass The 15 tutorial projects (5 harnesses × sync/async/temporal) are intentionally tailored per harness, so there is no code to dedupe — but the scaffolding drifted and should be standardized: -- **Naming:** `harness_` (pydantic-ai, langgraph, codex) vs numeric prefixes `060_/130_/140_` (openai, claude_code). Pick one convention and rename. +- **Naming — standardize on the numbered `NNN_` paradigm** (matches every pre-existing tutorial). Rename the new harness agents off the bare `harness_*` scheme: + - `harness_pydantic_ai` and `harness_langgraph` (the bare-named sync/async/temporal dirs) → take the numbered slots of the pre-unified tutorials they replace (`040_pydantic_ai`, `110_pydantic_ai`; `030_langgraph`, `100_langgraph`, `130_langgraph`). This is the same move as item 13's replace-in-place — do the rename and the old-tutorial retirement as one step rather than twice. + - `060_harness_openai` / `130_harness_openai` / `140_harness_openai` → drop the `harness_` infix so they read `NNN_openai_*` like the rest, again folding into the openai retirement in item 13. + - `harness_codex` → assign fresh `NNN_codex` numbers consistent with the sequence (net-new; no old slot to reuse). + - `claude_code` (`060/130/140_claude_code`) already follows the numbered paradigm — no rename. 
+ - Because the tutorials job discovers by `manifest.yaml` glob, the renames don't need a CI/allowlist change, but update any tutorial index / README / cross-links that reference the old paths (see item 13 verification). - **`.dockerignore`:** byte-identical in pydantic-ai/openai/claude, **absent in langgraph and codex**. Add the shared file everywhere (or none). - **`conftest.py`:** present only in codex (one per tier). Either promote it to the shared tutorial test setup or remove if unneeded. @@ -89,11 +94,9 @@ The migrations added a **second** set of framework tutorials alongside the ones The old ones demonstrate the **deprecated pre-unified path** — verified: `040_pydantic_ai` imports `create_pydantic_ai_tracing_handler` + `convert_*(tracing_handler=...)`; `030_langgraph`/`100_langgraph` import `create_langgraph_tracing_handler` (+ `stream_langgraph_events`). The new `harness_*` agents are their unified-surface (`UnifiedEmitter` + `Turn`) replacements. So this is the tutorial-facing half of item 1's removal. -**Decision needed — which set survives:** -- **(a) Replace in place (preferred):** port the unified-surface implementation into the existing numbered slots (`040_pydantic_ai`, `030_langgraph`, …) and delete the new `harness_*` dirs. Keeps the established numbered tutorial sequence and resolves item 11's naming inconsistency for free. -- **(b) Keep the new dirs, delete the old:** simpler diff, but leaves the `harness_*` vs numbered naming split (then settle naming per item 11) and orphans the old numbers. +**Decision — replace in place, numbered paradigm (settled):** port each unified-surface (`harness_*`) implementation into the numbered slot of the pre-unified tutorial it supersedes (`harness_pydantic_ai` → `040_pydantic_ai` / `110_pydantic_ai`; `harness_langgraph` → `030_langgraph` / `100_langgraph` / `130_langgraph`; `*_harness_openai` → the `NNN_openai_*` slots) and delete the old deprecated dirs. 
This keeps the established numbered sequence and is the same operation as item 11's rename — execute them together (the rename *is* the retirement). codex is net-new, so it takes fresh `NNN_codex` numbers; claude-code is already numbered. The rejected alternative (keep the bare `harness_*` dirs, delete the old) is not taken — it would orphan the existing numbers and leave the naming split. -Either way: pick one set per framework, delete the other, fix any tutorial index/README that links the removed dirs, and confirm the surviving agents don't import the item-1 deprecated symbols. claude-code and codex are net-new (the existing `090_claude_agents_sdk_mvp` is the Claude **Agents SDK**, not the claude-code CLI harness) — nothing to retire there. +For every framework: confirm the surviving agent does not import the item-1 deprecated symbols, and fix any tutorial index/README that links the removed dirs. The existing `090_claude_agents_sdk_mvp` is the Claude **Agents SDK** (not the claude-code CLI harness), so it stays. > **Sequencing note:** items 6–9 and 11–12 are **non-breaking refactors** (tests, internal helpers, examples) — they only need the stack merged (precondition 1), NOT the deprecation window / consumer-migration gates that items 1–2 require. They can land as their own earlier cleanup PR if PR 10's breaking removals are blocked on the version-bump policy. Item 10 rides with item 3; **item 13 is gated with item 1** (the old tutorials import the symbols item 1 removes), so do them in the same PR. From f58cfc923ef864cf68f35a57844c72f5f3c92145 Mon Sep 17 00:00:00 2001 From: Declan Brady Date: Mon, 22 Jun 2026 18:50:48 -0400 Subject: [PATCH 7/7] docs(harness-cleanup): move PR 10 plan out of PR 9 into the pr10 branch PR 9 stays scoped to the public adk facade + adk/docs/harness.md. The PR 10 cleanup plan now lives on declan-scale/pr10-harness-cleanup. 
Co-Authored-By: Claude Opus 4.8 (1M context) --- .../plans/2026-06-22-pr10-harness-cleanup.md | 115 ------------------ 1 file changed, 115 deletions(-) delete mode 100644 docs/superpowers/plans/2026-06-22-pr10-harness-cleanup.md diff --git a/docs/superpowers/plans/2026-06-22-pr10-harness-cleanup.md b/docs/superpowers/plans/2026-06-22-pr10-harness-cleanup.md deleted file mode 100644 index 651febbe1..000000000 --- a/docs/superpowers/plans/2026-06-22-pr10-harness-cleanup.md +++ /dev/null @@ -1,115 +0,0 @@ -# PR 10 — Post-Merge Harness Cleanup Plan - -Date: 2026-06-22 -Status: Plan — execute as **PR 10**, only AFTER the whole harness-surface stack merges and the deprecation/migration preconditions below hold. -Repo: `scale-agentex-python` - -## Why this is a separate, later PR - -The harness-surface stack (foundation #412, conformance #414, migrations #415/#416/#417/#420/#421, facade+docs #423) was built **additively** so nothing regressed and the stack stayed reviewable. That deliberately left behind a few transitional artifacts — deprecated-but-kept shims, resolved-workaround comments, and a flat public namespace that grew as taps were added. Removing them is a breaking-ish, cross-cutting change that should NOT happen inside the feature PRs. PR 10 does that cleanup once it's safe. - -## Preconditions (do not start PR 10 until ALL hold) - -1. **Entire stack merged** to `next`/main: #412, #414, #415, #416, #417, #420, #421, #423. -2. **Deprecation window observed** (or a minor/major version boundary) for the publicly-deprecated symbols below — they were only docstring-deprecated, never runtime-warned, so external code may still import them. -3. **Golden agent migrated** off the bespoke paths (per the adoption plan, #422 → implementation in `agentex-agents`): it no longer constructs the deprecated tracing handlers or any pre-unified converter path. Grep the golden agent + any other internal consumers first. -4. 
**No external consumers** depend on the removed symbols (check downstream usage; add a changelog/release note for the removal). - -## Scope — what PR 10 removes / consolidates - -### 1. Delete the deprecated bespoke tracing handlers (primary item) -These were superseded by `SpanTracer`/`UnifiedEmitter` (which derive spans from the canonical stream) and only docstring-deprecated: -- `src/agentex/lib/adk/_modules/_pydantic_ai_tracing.py` — `create_pydantic_ai_tracing_handler`, `AgentexPydanticAITracingHandler`. -- `src/agentex/lib/adk/_modules/_langgraph_tracing.py` — `create_langgraph_tracing_handler`, `AgentexLangGraphTracingHandler`. -- Any openai bespoke tracing shim deprecated in #416 (`sync_provider.py` `SyncStreamingModel`/`SyncStreamingProvider` if applicable). -Remove the modules (or the deprecated symbols), their `adk/__init__.py` exports, and all references/tests that only existed to exercise the deprecated path. Keep any genuinely-shared helpers they used if still referenced elsewhere. - -**Note — these symbols still have tutorial consumers (see item 13):** the pre-unified example agents (`examples/tutorials/.../030_langgraph`, `100_langgraph`, `040_pydantic_ai`, `110_pydantic_ai`, etc.) import `create_langgraph_tracing_handler` / `create_pydantic_ai_tracing_handler`. Deleting the symbols breaks those tutorials, so item 13 (retire/migrate them) is a hard prerequisite of this removal, not optional polish. - -### 2. Remove resolved-workaround markers and transitional comments -Now that AGX1-377/378 are fixed in the foundation and the migrations dropped their workarounds, delete the leftover transitional breadcrumbs: -- Any remaining `# AGX1-377`/`# AGX1-378` "workaround/limitation" comments in `auto_send.py`, the per-harness turns/async helpers, and the conformance runner (the coalescing is gone; `created_at` is restored; streamed tool delivery works). -- Stale docstring notes that describe behavior that has since changed (e.g. 
"created_at limitation", "coalescing workaround"). -Keep comments that document *current* contracts; only remove ones describing now-removed transitional state. - -### 3. (Optional) Introduce an `adk.harness` namespace to de-crowd the flat facade -#423 exposed the surface flat on `agentex.lib.adk` for consistency. With five `convert__to_agentex_events` taps + `Turn`s + `UnifiedEmitter`/types, the flat namespace is crowded. Consider a dedicated `agentex.lib.adk.harness` submodule that re-exports the surface, while keeping flat `adk.*` re-exports for one release (back-compat), then dropping the flat ones in a later major. Decide with the team; this is polish, not required. If done, update the #423 docs page (`adk/docs/harness.md`) accordingly. - -### 4. Remove any vestigial simple-conformance-runner paths -All harnesses now register into the cross-channel conformance runner (#414). If, after merge, any simple/determinism-only runner code path or the standalone `derive_all`-based test remains unused, remove it (or keep `derive_all` only if it's still a useful primitive). Verify nothing imports a removed helper. - -### 5. De-duplicate per-harness `_*_sync` / `_*_async` if anything remains -The async helpers (`stream_pydantic_ai_events`, `stream_langgraph_events`, `run_agent_streamed_auto_send`) now delegate to `UnifiedEmitter.auto_send_turn`. Confirm no hand-rolled `adk.streaming` streaming loops remain in those modules post-merge; remove any leftover dead branches. - -### 6. Consolidate duplicated test scaffolding (cross-PR review finding) - -The migration PRs were reviewed independently, so each re-introduced the same test doubles instead of sharing them. 
After merge these are the concrete duplicates: - -- **`_FakeTracing` / `_FakeSpan`** are defined ~9 times: the foundation tests already carry three copies (`tests/lib/core/harness/test_tracer.py`, `test_emitter.py`, `conformance/runner.py`), and the integration suites add six more — `tests/lib/core/harness/test_harness_pydantic_ai_{sync,async,temporal}.py` (#415) and `test_harness_langgraph_{sync,async,temporal}.py` (#417) each redefine them. -- **`_run_yield_turn`** is duplicated between the pydantic-ai and langgraph integration suites. - -Extract a single shared module — `tests/lib/core/harness/_fakes.py` (`FakeSpan`, `FakeTracing`, `run_yield_turn`) or a `conftest.py` `fake_tracing` fixture — and have every harness test import it. Delete the per-file copies. - -### 7. Parametrize the generic conformance determinism test once - -`test_span_derivation_is_deterministic` is copy-pasted into every per-harness conformance module (`conformance/test__conformance.py`) on top of the copy already in the shared `conformance/test_conformance.py`. It is harness-agnostic — it only re-derives over registered fixtures. Keep ONE parametrized version in the shared conformance module driven by `all_fixtures()`, and delete the per-harness copies (the per-harness modules keep only their fixture registration + cross-channel assertions). - -### 8. Extract a shared harness-turn usage-normalization helper - -The five `HarnessTurn` implementations — `_pydantic_ai_turn.py`, `_langgraph_turn.py`, `providers/_modules/openai_turn.py`, `_claude_code_turn.py`, `_codex_turn.py` (134–214 lines each) — are not copy-paste, but they repeat the same shape: wrap a tap's event stream and normalize provider usage into `TurnUsage`. Pull the common normalization into a shared primitive in the foundation (e.g. `core/harness/usage.py` `normalize_usage(...)` or a `HarnessTurnBase` mixin), leaving each module only its provider-specific mapping. 
Do NOT force-fit harnesses whose usage shape genuinely diverges (codex is the largest for a reason — check before collapsing). - -### 9. Converge the three sync-path structures - -"Sync delivery" was implemented three different ways across the migrations: openai modifies `providers/_modules/sync_provider.py` + adds `openai_turn.py`; pydantic-ai/langgraph modify their existing `_*_sync.py`; claude/codex add new `_claude_code_sync.py` / `_codex_sync.py`. Pick one structural convention and align the five harnesses to it so the sync path reads the same everywhere. (Overlaps item 5 — do them together.) - -### 10. Reconcile the competing `adk/__init__.py` edits - -`src/agentex/lib/adk/__init__.py` is edited by three PRs — claude (#420, +9), codex (#421, +6), and the facade (#423, +22). Once merged, the facade in #423 should be the single source of the public surface; fold the claude/codex ad-hoc export additions into it and drop the duplicates. (Subsumed by the facade work in item 3 — track here so it isn't missed.) - -### 11. Tutorial-agent consistency pass - -The 15 tutorial projects (5 harnesses × sync/async/temporal) are intentionally tailored per harness, so there is no code to dedupe — but the scaffolding drifted and should be standardized: -- **Naming — standardize on the numbered `NNN_` paradigm** (matches every pre-existing tutorial). Rename the new harness agents off the bare `harness_*` scheme: - - `harness_pydantic_ai` and `harness_langgraph` (the bare-named sync/async/temporal dirs) → take the numbered slots of the pre-unified tutorials they replace (`040_pydantic_ai`, `110_pydantic_ai`; `030_langgraph`, `100_langgraph`, `130_langgraph`). This is the same move as item 13's replace-in-place — do the rename and the old-tutorial retirement as one step rather than twice. - - `060_harness_openai` / `130_harness_openai` / `140_harness_openai` → drop the `harness_` infix so they read `NNN_openai_*` like the rest, again folding into the openai retirement in item 13. 
- - `harness_codex` → assign fresh `NNN_codex` numbers consistent with the sequence (net-new; no old slot to reuse). - - `claude_code` (`060/130/140_claude_code`) already follows the numbered paradigm — no rename. - - Because the tutorials job discovers by `manifest.yaml` glob, the renames don't need a CI/allowlist change, but update any tutorial index / README / cross-links that reference the old paths (see item 13 verification). -- **`.dockerignore`:** byte-identical in pydantic-ai/openai/claude, **absent in langgraph and codex**. Add the shared file everywhere (or none). -- **`conftest.py`:** present only in codex (one per tier). Either promote it to the shared tutorial test setup or remove if unneeded. - -### 12. Decide integration-test coverage parity - -Only pydantic-ai (#415) and langgraph (#417) ship `test_harness_*_{sync,async,temporal}` integration suites + CI live-matrix rows; openai/claude/codex (#416/#420/#421) ship only conformance + turn tests. Either add the missing suites (and their matrix rows — note #415's matrix comment already invites PRs 5–8 to do so) or document the intentional difference. The two existing matrix-job definitions are near-identical and should collapse to one matrix once item 6's shared fakes land. - -### 13. 
Retire the duplicate pre-unified framework tutorials - -The migrations added a **second** set of framework tutorials alongside the ones already on `next`, so langgraph / pydantic-ai / openai now each have two demonstrations of the same framework: - -| Framework | Pre-existing (pre-unified, on `next`) | New (unified surface, harness PRs) | -| --- | --- | --- | -| langgraph | `00_sync/030_langgraph`, `10_async/00_base/100_langgraph`, `10_async/10_temporal/130_langgraph` | `harness_langgraph` ×3 (#417) | -| pydantic-ai | `00_sync/040_pydantic_ai`, `10_async/00_base/110_pydantic_ai`, `10_async/10_temporal/110_pydantic_ai` | `harness_pydantic_ai` ×3 (#415) | -| openai | `00_sync/050_openai_agents_local_sandbox`, `10_async/00_base/120_…`, `10_async/10_temporal/120_…` | `060/130/140_harness_openai` (#416) | - -The old ones demonstrate the **deprecated pre-unified path** — verified: `040_pydantic_ai` imports `create_pydantic_ai_tracing_handler` + `convert_*(tracing_handler=...)`; `030_langgraph`/`100_langgraph` import `create_langgraph_tracing_handler` (+ `stream_langgraph_events`). The new `harness_*` agents are their unified-surface (`UnifiedEmitter` + `Turn`) replacements. So this is the tutorial-facing half of item 1's removal. - -**Decision — replace in place, numbered paradigm (settled):** port each unified-surface (`harness_*`) implementation into the numbered slot of the pre-unified tutorial it supersedes (`harness_pydantic_ai` → `040_pydantic_ai` / `110_pydantic_ai`; `harness_langgraph` → `030_langgraph` / `100_langgraph` / `130_langgraph`; `*_harness_openai` → the `NNN_openai_*` slots) and delete the old deprecated dirs. This keeps the established numbered sequence and is the same operation as item 11's rename — execute them together (the rename *is* the retirement). codex is net-new, so it takes fresh `NNN_codex` numbers; claude-code is already numbered. 
The rejected alternative (keep the bare `harness_*` dirs, delete the old) is not taken — it would orphan the existing numbers and leave the naming split. - -For every framework: confirm the surviving agent does not import the item-1 deprecated symbols, and fix any tutorial index/README that links the removed dirs. The existing `090_claude_agents_sdk_mvp` is the Claude **Agents SDK** (not the claude-code CLI harness), so it stays. - -> **Sequencing note:** items 6–9 and 11–12 are **non-breaking refactors** (tests, internal helpers, examples) — they only need the stack merged (precondition 1), NOT the deprecation window / consumer-migration gates that items 1–2 require. They can land as their own earlier cleanup PR if PR 10's breaking removals are blocked on the version-bump policy. Item 10 rides with item 3; **item 13 is gated with item 1** (the old tutorials import the symbols item 1 removes), so do them in the same PR. - -(Also noted, no action: #417 already carries a `tests/lib/adk/test_pydantic_ai_async.py` change via a shared tracing-handler fix — recorded here only so it isn't mistaken for stray duplication during cleanup.) - -## Verification -- Grep the whole repo (and confirm with the golden agent / known consumers) for each removed symbol — zero references before deletion. -- After the test-scaffolding consolidation (items 6–7): the shared `_fakes` module / fixture is the only definition of `_FakeTracing`/`_FakeSpan`, and the determinism test exists once — grep confirms no per-file/per-harness copies remain. -- After the turn/sync consolidation (items 8–9): the five turn modules import the shared usage helper and the sync path follows one convention; harness conformance + integration suites stay green. -- After the tutorial retirement (item 13): exactly one tutorial agent per framework per tier remains, none import the item-1 deprecated symbols, and the tutorial CI/test job + any index/README links resolve (no references to deleted dirs). 
-- Full `./scripts/test` on Python 3.12 AND 3.13 (run the two versions separately or in shorter scoped batches — the dual-version `./scripts/test` in one shot has tripped a 600s no-output watchdog; prefer scoped runs or background with periodic output). -- `./scripts/lint` clean (whole-repo ruff + pyright). -- Changelog / release note documenting the removal of the deprecated public symbols. - -## Risk -Removing publicly-exported (deprecated) symbols is a breaking change — gate PR 10 on the version-bump policy and on confirming the golden agent + any external consumers are migrated. Everything here is recoverable from history; sequence it as the final, deliberate cleanup of the harness-surface workstream.