refactor(harness)!: post-merge cleanup + init templates for the unified harness surface (PR 10) #425
Open
declan-scale wants to merge 12 commits into
…ified-harness planning doc PR 10 is stacked on PR 9 (#423). The plan sequences the 13 cleanup-scope items plus a filesystem layout/naming consolidation (every harness -> _<harness>_sync.py + _<harness>_turn.py under _modules/, openai moved out of providers/_modules/) and a harness.md docs refresh. The pre-unified unified-harness-surface plan doc is removed now that the stack is merged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s, canonical yield_turn example) Completes the taps table with claude-code/codex/openai, names the per-harness HarnessTurn wrappers, and replaces the pre-unified sync example (which left UnifiedEmitter unused) with the canonical emitter.yield_turn(turn) flow. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rminism test once Adds tests/lib/core/harness/_fakes.py with a single superset FakeSpan/FakeTracing (started/ended/ended_spans plus started_names/started_pairs/ended_outputs views) and migrates every consumer off its local copy. Keeps the conformance determinism test once (parametrized over all_fixtures) and drops the per-harness copies. run_yield_turn and test_langgraph_sync_unified's _FakeTracingBackend left in place (genuinely divergent). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s onto the numbered paradigm Replaces the pre-unified langgraph/pydantic-ai/openai tutorials (which imported the deprecated create_*_tracing_handler) with their unified-surface harness_* counterparts, moved into the numbered NNN_<name> slots. codex takes fresh numbers (070/140/150). Fixes the 060/130/140 numbering collision between harness_openai and claude_code by folding openai into the old openai slots (050/120, renamed _openai_agents). Adds the shared .dockerignore to langgraph/codex tutorials. 090_claude_agents_sdk_mvp untouched. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…emplates to the unified surface BREAKING CHANGE: removes create_langgraph_tracing_handler / create_pydantic_ai_tracing_handler and their handler classes (AgentexLangGraphTracingHandler / AgentexPydanticAITracingHandler) from the public adk surface. Span tracing is now derived from the canonical stream by UnifiedEmitter. Migrates the five sync-/default-/temporal- pydantic-ai and langgraph CLI templates onto UnifiedEmitter + the per-harness Turn wrappers (mirroring the migrated tutorials), drops the now-dead tracing_handler parameter from the pydantic-ai sync/async/turn modules, deletes the deprecated-path tests, and trims the resolved AGX1-377/378 workaround markers to plain current-contract comments. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…_<harness>_turn.py
Folds the pydantic-ai/langgraph _async + _langgraph_messages helpers into their
turn/sync modules (stream_*_events -> _<h>_turn.py, emit_langgraph_messages ->
_langgraph_sync.py); public facade names are unchanged. Relocates the OpenAI harness
Turn + convert_openai_to_agentex_events tap into _modules/_openai_turn.py /
_modules/_openai_sync.py, leaving back-compat shims at providers/_modules/{openai_turn,
sync_provider}.py so the adk.providers namespace + CLI template keep working (the larger
openai.py Temporal/MCP provider stays under adk.providers). Merges the duplicate
_sync / _sync_unified test modules into one per harness.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… claude_code, codex
Brings integration-test parity across all five harnesses (was pydantic-ai + langgraph only):
9 new test_harness_<h>_{sync,async,temporal}.py suites built on the shared _fakes, with
native-stream shapes drawn from each harness's turn + conformance tests. Extends the
harness-integration.yml live-matrix to all five harnesses and generalizes the trigger glob.
Temporal suites assert the auto_send delivery + created_at threading (no harness has a
separate temporal stream helper), documented per file.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…x stale tutorial refs Documents the deprecated-tracing-handler removal and the _modules consolidation / openai relocation (with back-compat shim window) under CHANGELOG Unreleased, and updates the sync_provider deprecation note to the renamed openai tutorial slots. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ile P1) The shared test_span_derivation_is_deterministic parametrized over all_fixtures() at COLLECTION time, which froze the set to the 5 generic fixtures registered before test_conformance.py was imported — silently dropping per-harness determinism coverage when the per-harness copies were removed in Batch A2. Make it iterate all_fixtures() at RUN time (after all modules are collected) with a guard asserting per-harness fixtures are present, and add a conformance/conftest.py that eagerly imports every per-harness module so coverage is independent of collection order / run scope. Also fixes the plan doc per greptile: make the openai tutorial mapping explicitly delete-and-replace the old slots, and correct the Batch I/H verification to expect the sync_provider/openai_turn shims (used by the sync-openai template + base sync tutorials) rather than zero references. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
bdca8ad to 55f2cac
…the cleanup has landed Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nc openai-agents Adds scaffolding for the harnesses that lacked an init path: claude-code and codex across all three tiers (sync / default-async / temporal), plus the missing default-openai-agents (async-base) variant. Each template uses the unified harness surface (UnifiedEmitter + the harness *Turn), mirrors the migrated tutorials, and is wired into the TemplateType enum, project-file map, and the sync/async/temporal init menus. All 19 template types render to valid Python under test_init_templates. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pyright can't statically resolve the absolute `tests.lib.core.harness._fakes` import (it only works at pytest runtime via the rootdir on sys.path), failing ./scripts/lint. Switch every consumer to a relative import (matching the conformance package's convention) and re-sort the affected import blocks. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Post-merge cleanup and consolidation of the unified harness-surface stack (#412/#414–#421, #423), plus `agentex init` template parity for the harnesses that lacked it. Now that #423 is merged, this targets `next` directly.
Non-breaking
- Shared test doubles in `tests/lib/core/harness/_fakes.py` (`FakeSpan`/`FakeTracing`), removing ~9 duplicated copies; the conformance determinism test is collapsed into one runtime-iterating guard that covers every harness fixture.
- `harness.md` refresh: all five taps, the canonical `UnifiedEmitter.yield_turn` example (fixes the unused-emitter snippet), no stale references.
- Added `test_harness_{openai,claude_code,codex}_{sync,async,temporal}` suites (+76 tests) and extended the live-matrix to all five harnesses.
Breaking
- `create_langgraph_tracing_handler` / `create_pydantic_ai_tracing_handler` (and their handler classes) are gone; span tracing is derived from the canonical stream by `UnifiedEmitter`. The five affected `agentex init` templates were migrated to the unified surface.
- Every harness is now `_<harness>_sync.py` + `_<harness>_turn.py` under `adk/_modules/`; the OpenAI harness Turn + tap moved into `_modules` with back-compat shims left at `providers/_modules/{openai_turn,sync_provider}.py` (the larger Temporal/MCP provider stays under `adk.providers`). Public facade names are unchanged.
- Tutorials moved onto the numbered `NNN_<name>` paradigm (fixing a 060/130/140 numbering collision); codex took fresh `070/140/150` slots.
Feature
- New `agentex init` templates: claude-code and codex across all three tiers (sync / async / temporal), plus the missing `default-openai-agents` (async-base). `TemplateType` grows from 12 to 19 types; all render to valid Python under `test_init_templates`.
Docs
Test plan
- `uv run --all-packages --all-extras pytest tests/lib/core/harness/ tests/lib/adk/ tests/lib/cli/`: green (584+ passed)
- `./scripts/lint`: ruff + pyright clean
- `test_init_templates.py`: all 19 template types render to valid Python
Notes
- The determinism test iterates `all_fixtures()` at runtime, with a conformance `conftest.py` guaranteeing registration.
- … `agentex-agents` is migrated before release.
🤖 Generated with Claude Code
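The difference between collection-time and run-time iteration over `all_fixtures()` can be sketched in plain Python. This is a minimal illustration, not the repository's conformance code: the registry, `register_fixture`, `derive_spans`, and the fixture names are all hypothetical stand-ins, and only the name `all_fixtures` comes from the PR.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the conformance fixture registry; the real
# registry and fixture names in the repository may look different.
_REGISTRY: Dict[str, Callable[[], List[str]]] = {}


def register_fixture(name: str, factory: Callable[[], List[str]]) -> None:
    _REGISTRY[name] = factory


def all_fixtures() -> List[str]:
    return sorted(_REGISTRY)


def derive_spans(events: List[str]) -> List[tuple]:
    # Toy deterministic derivation: pair each event with its position.
    return list(enumerate(events))


# Generic fixtures exist before the shared test module is imported.
register_fixture("generic_text", lambda: ["text_delta", "text_done"])
register_fixture("generic_tool", lambda: ["tool_call", "tool_result"])

# Decorator arguments are evaluated at import time, so something like
# @pytest.mark.parametrize("name", all_fixtures()) captures this snapshot.
FROZEN_AT_IMPORT = all_fixtures()

# Per-harness modules are imported later and register more fixtures.
register_fixture("codex_turn", lambda: ["reasoning", "text_done"])
register_fixture("openai_turn", lambda: ["tool_call", "text_done"])


def check_determinism_at_runtime(required: set) -> List[str]:
    # Read the registry when the test body runs, after all registrations.
    names = all_fixtures()
    missing = required - set(names)
    # Guard: fail loudly if per-harness fixtures were never registered.
    assert not missing, f"fixtures not registered: {sorted(missing)}"
    for name in names:
        events = _REGISTRY[name]()
        assert derive_spans(events) == derive_spans(events), name
    return names


if __name__ == "__main__":
    print("frozen at import:", FROZEN_AT_IMPORT)
    print("seen at runtime: ", check_determinism_at_runtime({"codex_turn", "openai_turn"}))
```

The import-time snapshot only ever sees the two generic fixtures, while the run-time check sees all four and refuses to pass if the per-harness ones are absent. An eagerly importing `conftest.py` plays the role of the late `register_fixture` calls here, so registration does not depend on collection order.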
Greptile Summary
Post-merge cleanup for the unified harness surface: removes deprecated `create_langgraph_tracing_handler` / `create_pydantic_ai_tracing_handler`, consolidates filesystem layout into `_<harness>_sync.py` + `_<harness>_turn.py`, and extracts shared `FakeSpan`/`FakeTracing` test doubles. Adds `agentex init` templates for claude-code and codex across all three tiers, plus the missing `default-openai-agents` async template.
- `test_span_derivation_is_deterministic` moved from import-time parametrization to a runtime loop over `all_fixtures()`, paired with a new `conftest.py` that eagerly imports every per-harness conformance module; this fixes the silent fixture-set truncation noted in a prior review.
- `_langgraph_messages.py` absorbed into `_langgraph_sync.py`; `OpenAITurn` moved to `_modules/_openai_turn.py` with a back-compat shim retained at the old `providers/_modules/openai_turn.py` path; `__init__.py` re-wired accordingly.
- … `default-openai-agents`; the new `default-openai-agents` template imports `OpenAITurn` from the back-compat shim path rather than from the `agentex.lib.adk` package, unlike every other Turn-based template in the same family.
Confidence Score: 4/5
Safe to merge with one follow-up: the new default-openai-agents template imports OpenAITurn from the back-compat shim instead of the canonical adk package.
The conformance fix, FakeTracing consolidation, and filesystem cleanup are all correct. The only active defect is the new default-openai-agents template importing from the back-compat shim while ClaudeCodeTurn and CodexTurn are exported from agentex.lib.adk — users who scaffold with that template get an import path that diverges from every other Turn-based template in the family and is routed through the compatibility layer.
`src/agentex/lib/cli/templates/default-openai-agents/project/acp.py.j2` and `src/agentex/lib/adk/__init__.py`: `OpenAITurn` should be added to the `adk` package exports to match `ClaudeCodeTurn`/`CodexTurn`.
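To make the shim concern concrete, here is a small self-contained sketch of a back-compat shim that re-exports a class from its canonical module. The module names (`demo_canonical_openai_turn`, `demo_shim_openai_turn`) and the placeholder class are hypothetical, and the deprecation warning is an assumption for illustration; the PR does not state whether the real shim warns.

```python
import sys
import types
import warnings

# Hypothetical stand-in for the canonical module (the PR's
# _modules/_openai_turn.py); the class body is a placeholder.
canonical = types.ModuleType("demo_canonical_openai_turn")


class OpenAITurn:
    """Placeholder for the real Turn wrapper."""


canonical.OpenAITurn = OpenAITurn
sys.modules[canonical.__name__] = canonical

# Hypothetical stand-in for the shim at the old import path.
shim = types.ModuleType("demo_shim_openai_turn")


def _shim_getattr(name: str):
    # Module-level __getattr__ (PEP 562) resolves the old name lazily.
    if name == "OpenAITurn":
        # Assumption: the shim warns; the real shim may re-export silently.
        warnings.warn(
            "import OpenAITurn from the canonical package instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return canonical.OpenAITurn
    raise AttributeError(f"module {shim.__name__!r} has no attribute {name!r}")


shim.__getattr__ = _shim_getattr
sys.modules[shim.__name__] = shim


def import_via_shim():
    # Import through the old path and record any warnings it raises.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        from demo_shim_openai_turn import OpenAITurn as turn_cls
    return turn_cls, [w.category for w in caught]


if __name__ == "__main__":
    turn_cls, categories = import_via_shim()
    print(turn_cls is canonical.OpenAITurn, categories)
```

Both paths resolve to the same class object, so a template that imports through the shim still works. The cost is that scaffolded projects start out on the compatibility path, which is why exporting the class from the public package and importing it from there is the cleaner default.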
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph "adk/__init__.py (public surface)"
        A1[stream_langgraph_events]
        A2[emit_langgraph_messages]
        A3[ClaudeCodeTurn]
        A4[CodexTurn]
        A5["OpenAITurn ❌ not exported"]
    end
    subgraph "_modules/ (canonical)"
        B1[_langgraph_turn.py\nstream_langgraph_events]
        B2[_langgraph_sync.py\nemit_langgraph_messages]
        B3[_claude_code_turn.py\nClaudeCodeTurn]
        B4[_codex_turn.py\nCodexTurn]
        B5[_openai_turn.py\nOpenAITurn]
    end
    subgraph "providers/_modules/ (shims)"
        C1[openai_turn.py\n→ re-exports OpenAITurn]
    end
    subgraph "Templates"
        T1[default-claude-code\nfrom agentex.lib.adk import ClaudeCodeTurn ✓]
        T2[default-codex\nfrom agentex.lib.adk import CodexTurn ✓]
        T3[default-openai-agents\nfrom providers._modules.openai_turn import OpenAITurn ⚠️]
    end
    A1 --> B1
    A2 --> B2
    A3 --> B3
    A4 --> B4
    C1 --> B5
    T1 --> A3
    T2 --> A4
    T3 --> C1
Reviews (3): Last reviewed commit: "fix(tests): use relative imports for the..."