refactor(harness)!: post-merge cleanup + init templates for the unified harness surface (PR 10) #425

Open
declan-scale wants to merge 12 commits into next from declan-scale/pr10-harness-cleanup

declan-scale commented Jun 22, 2026

Summary

Post-merge cleanup and consolidation of the unified harness-surface stack (#412/#414#421, #423), plus agentex init template parity for the harnesses that lacked it. Now that #423 is merged, this targets next directly.

Non-breaking

  • Shared test fakes — extracted tests/lib/core/harness/_fakes.py (FakeSpan/FakeTracing), removing ~9 duplicated copies; parametrized the conformance determinism test to one runtime-iterating guard that covers every harness fixture.
  • harness.md refresh — all five taps, the canonical UnifiedEmitter.yield_turn example (fixes the unused-emitter snippet), no stale references.
  • Integration-test parity — added test_harness_{openai,claude_code,codex}_{sync,async,temporal} suites (+76 tests) and extended the live-matrix to all five harnesses.
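
As an illustration of the shared test fakes described above, here is a minimal sketch of a recording tracing double. Only the names FakeSpan, FakeTracing, started_names, and ended_outputs come from this PR; the method names `start_span` / `end_span` and the field layout are assumptions made for the example, and the real `_fakes.py` may look different.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeSpan:
    """Hypothetical span record; the real FakeSpan may carry more fields."""

    name: str
    input: Any = None
    output: Any = None


@dataclass
class FakeTracing:
    """Records span starts and ends in memory instead of calling a tracing backend."""

    started: list[FakeSpan] = field(default_factory=list)
    ended: list[FakeSpan] = field(default_factory=list)

    # Assumed method names, chosen for this sketch only.
    def start_span(self, name: str, input: Any = None) -> FakeSpan:
        span = FakeSpan(name=name, input=input)
        self.started.append(span)
        return span

    def end_span(self, span: FakeSpan, output: Any = None) -> None:
        span.output = output
        self.ended.append(span)

    @property
    def started_names(self) -> list[str]:
        # Convenience view so older assertions can compare plain name lists.
        return [s.name for s in self.started]

    @property
    def ended_outputs(self) -> list[Any]:
        # Convenience view over the outputs of ended spans, in end order.
        return [s.output for s in self.ended]


tracing = FakeTracing()
span = tracing.start_span("turn", input={"prompt": "hi"})
tracing.end_span(span, output="done")
```

Keeping the raw `started` / `ended` lists and layering read-only views on top is one way a single superset fake can serve tests that were written against differently shaped local copies.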

Breaking

  • Removed deprecated tracing handlers: create_langgraph_tracing_handler / create_pydantic_ai_tracing_handler (and their handler classes) are gone; span tracing is derived from the canonical stream by UnifiedEmitter. The five affected agentex init templates were migrated to the unified surface.
  • Filesystem consolidation — every harness is now exactly _<harness>_sync.py + _<harness>_turn.py under adk/_modules/; the OpenAI harness Turn + tap moved into _modules with back-compat shims left at providers/_modules/{openai_turn,sync_provider}.py (the larger Temporal/MCP provider stays under adk.providers). Public facade names are unchanged.
  • Tutorial consolidation: the duplicate pre-unified tutorials are retired and their unified-surface replacements moved into the numbered NNN_<name> slots (fixing a 060/130/140 numbering collision); codex took fresh 070/140/150 slots.
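
The back-compat shim pattern mentioned above can be sketched as follows. To keep the example runnable without the real package, it builds two in-memory modules under a made-up `demo_pkg` namespace; the module names and the placeholder class are assumptions for illustration and do not reproduce the actual agentex files.

```python
import importlib
import sys
import types


class OpenAITurn:
    """Placeholder standing in for the real Turn class."""


# Canonical location (stand-in for the new _modules/_openai_turn.py).
canonical = types.ModuleType("demo_pkg._modules._openai_turn")
canonical.OpenAITurn = OpenAITurn
sys.modules[canonical.__name__] = canonical

# Shim at the old location: it holds no logic and only re-exports the
# canonical object, so existing importers keep working during the
# deprecation window.
shim = types.ModuleType("demo_pkg.providers._modules.openai_turn")
shim.OpenAITurn = canonical.OpenAITurn
shim.__all__ = ["OpenAITurn"]
sys.modules[shim.__name__] = shim

# Both paths resolve to the very same class object.
old_path = importlib.import_module("demo_pkg.providers._modules.openai_turn")
new_path = importlib.import_module("demo_pkg._modules._openai_turn")
same_object = old_path.OpenAITurn is new_path.OpenAITurn
```

In a real package the shim would typically be a one-line `from ... import OpenAITurn` in the old file; the point of the sketch is that identity is preserved, so `isinstance` checks behave the same through either path.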

Feature

  • New agentex init templates — claude-code and codex across all three tiers (sync / async / temporal), plus the missing default-openai-agents (async-base). TemplateType 12 → 19; all render to valid Python under test_init_templates.

Docs

  • CHANGELOG breaking-changes note; removed the now-landed PR 10 plan doc and the stale unified-harness planning doc.

Test plan

  • uv run --all-packages --all-extras pytest tests/lib/core/harness/ tests/lib/adk/ tests/lib/cli/ — green (584+ passed)
  • ./scripts/lint — ruff + pyright clean
  • test_init_templates.py — all 19 template types render to valid Python

Notes

  • Greptile P1 fixed: the determinism test had silently dropped per-harness coverage (import-time parametrization); now iterates all_fixtures() at runtime with a conformance conftest.py guaranteeing registration.
  • The breaking removals are gated in-repo (zero references confirmed); confirm the golden agent in agentex-agents is migrated before release.
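
The import-time parametrization problem in the first note can be reproduced with a small registry. This is a sketch under assumptions: `all_fixtures()` is named in the PR, but its return type, the `register` helper, and the fixture names below are hypothetical.

```python
from typing import Callable

# Hypothetical fixture registry: name -> factory producing a canonical event list.
_REGISTRY: dict[str, Callable[[], list[str]]] = {}


def register(name: str, factory: Callable[[], list[str]]) -> None:
    _REGISTRY[name] = factory


def all_fixtures() -> dict[str, Callable[[], list[str]]]:
    return dict(_REGISTRY)


# A generic fixture registered before the shared test module is imported.
register("generic_text", lambda: ["start", "delta", "end"])

# Import-time snapshot: this is effectively what a parametrize decorator
# evaluated at collection time captures.
FROZEN_AT_IMPORT = list(all_fixtures())

# A per-harness module that gets imported (and registers) later.
register("codex_tool_call", lambda: ["start", "tool_call", "end"])


def check_determinism_at_runtime() -> list[str]:
    """Iterate the registry when the test runs, not when it is collected."""
    fixtures = all_fixtures()
    # Guard: fail loudly if per-harness fixtures never registered.
    assert any(name.startswith("codex_") for name in fixtures), "per-harness fixtures missing"
    checked = []
    for name, factory in fixtures.items():
        # Determinism check: two independent builds must agree.
        assert factory() == factory()
        checked.append(name)
    return checked


checked_names = check_determinism_at_runtime()
```

The frozen list misses the late registration while the runtime loop sees it, which is why a guard assertion plus eager imports in a conftest.py make coverage independent of collection order.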

🤖 Generated with Claude Code

Greptile Summary

Post-merge cleanup for the unified harness surface: removes deprecated create_langgraph_tracing_handler / create_pydantic_ai_tracing_handler, consolidates filesystem layout into _<harness>_sync.py + _<harness>_turn.py, and extracts shared FakeSpan/FakeTracing test doubles. Adds agentex init templates for claude-code and codex across all three tiers, plus the missing default-openai-agents async template.

  • Conformance test hardening: test_span_derivation_is_deterministic moved from import-time parametrization to a runtime loop over all_fixtures(), paired with a new conftest.py that eagerly imports every per-harness conformance module; this fixes the silent fixture-set truncation noted in a prior review.
  • Filesystem consolidation: _langgraph_messages.py absorbed into _langgraph_sync.py; OpenAITurn moved to _modules/_openai_turn.py with a back-compat shim retained at the old providers/_modules/openai_turn.py path; __init__.py re-wired accordingly.
  • New init templates — six claude-code and codex templates (sync/default/temporal) plus default-openai-agents; the new default-openai-agents template imports OpenAITurn from the back-compat shim path rather than from the agentex.lib.adk package, unlike every other Turn-based template in the same family.

Confidence Score: 4/5

Safe to merge with one follow-up: the new default-openai-agents template imports OpenAITurn from the back-compat shim instead of the canonical adk package.

The conformance fix, FakeTracing consolidation, and filesystem cleanup are all correct. The only active defect is the new default-openai-agents template importing from the back-compat shim while ClaudeCodeTurn and CodexTurn are exported from agentex.lib.adk — users who scaffold with that template get an import path that diverges from every other Turn-based template in the family and is routed through the compatibility layer.

src/agentex/lib/cli/templates/default-openai-agents/project/acp.py.j2 and src/agentex/lib/adk/__init__.py: OpenAITurn should be added to the adk package exports to match ClaudeCodeTurn/CodexTurn.
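
One way to catch this kind of export gap is a small parity check over the package's `__all__`. The sketch below uses an in-memory stand-in module named `demo_adk` and placeholder classes; the helper `missing_exports` is hypothetical and is not part of agentex.

```python
import types


class ClaudeCodeTurn: ...
class CodexTurn: ...
class OpenAITurn: ...


# Stand-in for the package facade, mirroring the state the review describes:
# two Turn classes exported, OpenAITurn absent.
adk = types.ModuleType("demo_adk")
adk.ClaudeCodeTurn = ClaudeCodeTurn
adk.CodexTurn = CodexTurn
adk.__all__ = ["ClaudeCodeTurn", "CodexTurn"]

EXPECTED_TURNS = ["ClaudeCodeTurn", "CodexTurn", "OpenAITurn"]


def missing_exports(module: types.ModuleType, expected: list[str]) -> list[str]:
    """Return the expected names that the module does not list in __all__."""
    exported = set(getattr(module, "__all__", ()))
    return sorted(name for name in expected if name not in exported)


before = missing_exports(adk, EXPECTED_TURNS)

# The suggested follow-up: export OpenAITurn alongside the other Turn classes.
adk.OpenAITurn = OpenAITurn
adk.__all__.append("OpenAITurn")

after = missing_exports(adk, EXPECTED_TURNS)
```

With the export in place, a template could import every Turn class from the package facade instead of reaching into the compatibility layer.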

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/agentex/lib/cli/commands/init.py | Adds 7 new TemplateType entries (claude-code/codex/openai-agents across sync/async/temporal tiers); project_files map and CLI prompts are updated consistently. |
| src/agentex/lib/cli/templates/default-openai-agents/project/acp.py.j2 | New async OpenAI Agents template; imports OpenAITurn from the back-compat shim path instead of from agentex.lib.adk like the other new Turn templates do. |
| tests/lib/core/harness/conformance/test_conformance.py | test_span_derivation_is_deterministic converted from parametrize-at-import to runtime iteration over all_fixtures(), fixing silent fixture-set truncation; guard assertion ensures per-harness fixtures are registered. |
| tests/lib/core/harness/conformance/conftest.py | New conftest eagerly imports all per-harness conformance modules so all_fixtures() is fully populated regardless of pytest collection order. |
| tests/lib/core/harness/_fakes.py | New shared FakeSpan/FakeTracing doubles replacing ~9 duplicated in-file fakes; richest shape with convenience properties (started_names, started_pairs, ended_outputs) for backward-compatible assertions. |
| src/agentex/lib/adk/__init__.py | Removes deprecated create_langgraph_tracing_handler and create_pydantic_ai_tracing_handler; re-routes stream_langgraph_events from _langgraph_turn and emit_langgraph_messages from _langgraph_sync. OpenAITurn is not added to __all__, leaving it without a stable adk-level export. |
| src/agentex/lib/adk/providers/_modules/openai_turn.py | Replaced implementation with a one-line back-compat re-export shim; existing importers continue to work. |
| src/agentex/lib/adk/_modules/_langgraph_sync.py | emit_langgraph_messages absorbed from the deleted _langgraph_messages.py; implementation is identical, only consolidated into one file. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph "adk/__init__.py (public surface)"
        A1[stream_langgraph_events]
        A2[emit_langgraph_messages]
        A3[ClaudeCodeTurn]
        A4[CodexTurn]
        A5["OpenAITurn ❌ not exported"]
    end

    subgraph "_modules/ (canonical)"
        B1[_langgraph_turn.py\nstream_langgraph_events]
        B2[_langgraph_sync.py\nemit_langgraph_messages]
        B3[_claude_code_turn.py\nClaudeCodeTurn]
        B4[_codex_turn.py\nCodexTurn]
        B5[_openai_turn.py\nOpenAITurn]
    end

    subgraph "providers/_modules/ (shims)"
        C1[openai_turn.py\n→ re-exports OpenAITurn]
    end

    subgraph "Templates"
        T1[default-claude-code\nfrom agentex.lib.adk import ClaudeCodeTurn ✓]
        T2[default-codex\nfrom agentex.lib.adk import CodexTurn ✓]
        T3[default-openai-agents\nfrom providers._modules.openai_turn import OpenAITurn ⚠️]
    end

    A1 --> B1
    A2 --> B2
    A3 --> B3
    A4 --> B4
    C1 --> B5
    T1 --> A3
    T2 --> A4
    T3 --> C1

Reviews (3): Last reviewed commit: "fix(tests): use relative imports for the..."

Comment thread docs/superpowers/plans/2026-06-22-pr10-harness-cleanup.md Outdated
Comment thread docs/superpowers/plans/2026-06-22-pr10-harness-cleanup.md Outdated
Comment thread docs/superpowers/plans/2026-06-22-pr10-harness-cleanup.md Outdated
Base automatically changed from declan-scale/pr9-harness-cleanup to next June 23, 2026 00:10
declan-scale and others added 9 commits June 22, 2026 20:11
…ified-harness planning doc

PR 10 is stacked on PR 9 (#423). The plan sequences the 13 cleanup-scope items
plus a filesystem layout/naming consolidation (every harness -> _<harness>_sync.py
+ _<harness>_turn.py under _modules/, openai moved out of providers/_modules/)
and a harness.md docs refresh. The pre-unified unified-harness-surface plan doc is
removed now that the stack is merged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s, canonical yield_turn example)

Completes the taps table with claude-code/codex/openai, names the per-harness
HarnessTurn wrappers, and replaces the pre-unified sync example (which left
UnifiedEmitter unused) with the canonical emitter.yield_turn(turn) flow.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rminism test once

Adds tests/lib/core/harness/_fakes.py with a single superset FakeSpan/FakeTracing
(started/ended/ended_spans plus started_names/started_pairs/ended_outputs views) and
migrates every consumer off its local copy. Keeps the conformance determinism test
once (parametrized over all_fixtures) and drops the per-harness copies. run_yield_turn
and test_langgraph_sync_unified's _FakeTracingBackend left in place (genuinely divergent).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s onto the numbered paradigm

Replaces the pre-unified langgraph/pydantic-ai/openai tutorials (which imported the
deprecated create_*_tracing_handler) with their unified-surface harness_* counterparts,
moved into the numbered NNN_<name> slots. codex takes fresh numbers (070/140/150).
Fixes the 060/130/140 numbering collision between harness_openai and claude_code by
folding openai into the old openai slots (050/120, renamed _openai_agents). Adds the
shared .dockerignore to langgraph/codex tutorials. 090_claude_agents_sdk_mvp untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…emplates to the unified surface

BREAKING CHANGE: removes create_langgraph_tracing_handler / create_pydantic_ai_tracing_handler
and their handler classes (AgentexLangGraphTracingHandler / AgentexPydanticAITracingHandler) from
the public adk surface. Span tracing is now derived from the canonical stream by UnifiedEmitter.

Migrates the five sync-/default-/temporal- pydantic-ai and langgraph CLI templates onto
UnifiedEmitter + the per-harness Turn wrappers (mirroring the migrated tutorials), drops the now-dead
tracing_handler parameter from the pydantic-ai sync/async/turn modules, deletes the deprecated-path
tests, and trims the resolved AGX1-377/378 workaround markers to plain current-contract comments.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…_<harness>_turn.py

Folds the pydantic-ai/langgraph _async + _langgraph_messages helpers into their
turn/sync modules (stream_*_events -> _<h>_turn.py, emit_langgraph_messages ->
_langgraph_sync.py); public facade names are unchanged. Relocates the OpenAI harness
Turn + convert_openai_to_agentex_events tap into _modules/_openai_turn.py /
_modules/_openai_sync.py, leaving back-compat shims at providers/_modules/{openai_turn,
sync_provider}.py so the adk.providers namespace + CLI template keep working (the larger
openai.py Temporal/MCP provider stays under adk.providers). Merges the duplicate
_sync / _sync_unified test modules into one per harness.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… claude_code, codex

Brings integration-test parity across all five harnesses (was pydantic-ai + langgraph only):
9 new test_harness_<h>_{sync,async,temporal}.py suites built on the shared _fakes, with
native-stream shapes drawn from each harness's turn + conformance tests. Extends the
harness-integration.yml live-matrix to all five harnesses and generalizes the trigger glob.
Temporal suites assert the auto_send delivery + created_at threading (no harness has a
separate temporal stream helper), documented per file.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…x stale tutorial refs

Documents the deprecated-tracing-handler removal and the _modules consolidation /
openai relocation (with back-compat shim window) under CHANGELOG Unreleased, and
updates the sync_provider deprecation note to the renamed openai tutorial slots.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ile P1)

The shared test_span_derivation_is_deterministic parametrized over all_fixtures()
at COLLECTION time, which froze the set to the 5 generic fixtures registered before
test_conformance.py was imported — silently dropping per-harness determinism coverage
when the per-harness copies were removed in Batch A2. Make it iterate all_fixtures()
at RUN time (after all modules are collected) with a guard asserting per-harness
fixtures are present, and add a conformance/conftest.py that eagerly imports every
per-harness module so coverage is independent of collection order / run scope.

Also fixes the plan doc per greptile: make the openai tutorial mapping explicitly
delete-and-replace the old slots, and correct the Batch I/H verification to expect the
sync_provider/openai_turn shims (used by the sync-openai template + base sync tutorials)
rather than zero references.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
declan-scale force-pushed the declan-scale/pr10-harness-cleanup branch from bdca8ad to 55f2cac June 23, 2026 00:17
declan-scale and others added 3 commits June 22, 2026 20:19
…the cleanup has landed

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nc openai-agents

Adds scaffolding for the harnesses that lacked an init path: claude-code and codex
across all three tiers (sync / default-async / temporal), plus the missing
default-openai-agents (async-base) variant. Each template uses the unified harness
surface (UnifiedEmitter + the harness *Turn), mirrors the migrated tutorials, and is
wired into the TemplateType enum, project-file map, and the sync/async/temporal init
menus. All 19 template types render to valid Python under test_init_templates.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pyright can't statically resolve the absolute `tests.lib.core.harness._fakes` import
(it only works at pytest runtime via the rootdir on sys.path), failing ./scripts/lint.
Switch every consumer to a relative import (matching the conformance package's
convention) and re-sort the affected import blocks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
declan-scale changed the title from "refactor(harness): post-merge cleanup of the unified harness surface (PR 10)" to "refactor(harness)!: post-merge cleanup + init templates for the unified harness surface (PR 10)" Jun 23, 2026