feat(claude-code): stream-json parser tap for the unified harness surface #420

Merged
declan-scale merged 11 commits into next from declan-scale/pr7-claude-code
Jun 22, 2026

@declan-scale declan-scale commented Jun 22, 2026

Contributor

Summary

  • Ports the golden agent's _StreamJsonProcessor to a pure SDK tap: convert_claude_code_to_agentex_events + ClaudeCodeTurn + usage normalization (claude_code_usage_to_turn_usage)
  • Cross-channel conformance fixtures (text-only, tool-call-result, thinking-block, multi-step) added to the shared conformance runner
  • Orchestration (sandbox/secret/MCP) stays in the golden agent; no deployable agent is added and no CI live-matrix row is needed

🤖 Generated with Claude Code

Greptile Summary

This PR ports the golden agent's _StreamJsonProcessor to a reusable SDK tap — convert_claude_code_to_agentex_events + ClaudeCodeTurn — that converts Claude Code's stream-json stdout into canonical StreamTaskMessage* events consumable by the unified harness. Three tutorial agents (sync, async-base, async-Temporal) are added to demonstrate the tap wired to a real subprocess, along with a comprehensive unit-test suite and cross-channel conformance fixtures.

  • _claude_code_sync.py: stateful async generator that maps all envelope types (assistant/user content blocks, stream_event triples, result) to harness events; per-block dedup via _streamed_block_indexes correctly handles multi-turn and multi-thinking-block streams.
  • _claude_code_turn.py: ClaudeCodeTurn wraps the generator as a HarnessTurn; claude_code_usage_to_turn_usage defensively maps all known CLI usage key shapes and preserves real zeros vs. absent fields.
  • Tutorial agents: all three use an injectable _spawn_claude seam with concurrent stderr drain and try/finally subprocess/task cleanup, resolving the deadlock and leak concerns from the prior review pass.
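
The per-block dedup described in the first bullet can be sketched roughly as below. This is a minimal illustration, not the SDK's implementation: the function name `emit_text`, the simplified envelope shapes, and the `(source, index, text)` tuples standing in for `StreamTaskMessage*` events are all assumptions made for the example.

```python
from typing import Any, Iterable, Iterator


def emit_text(envelopes: Iterable[dict[str, Any]]) -> Iterator[tuple[str, int, str]]:
    """Yield (source, block_index, text), skipping assistant blocks already streamed.

    Hypothetical stand-in for the tap: real events are richer than tuples.
    """
    streamed: set[int] = set()  # block indexes claimed by stream_event deltas
    for env in envelopes:
        if env.get("type") == "stream_event":
            event = env.get("event", {})
            if event.get("type") == "content_block_delta":
                idx = event["index"]
                streamed.add(idx)  # claim this block so it is not re-emitted later
                yield ("delta", idx, event["delta"].get("text", ""))
        elif env.get("type") == "assistant":
            for idx, block in enumerate(env["message"]["content"]):
                # Per-block check: only skip a block whose index was actually streamed.
                if block.get("type") == "text" and idx not in streamed:
                    yield ("full", idx, block["text"])
            # Materialised-message boundary: forget claimed indexes so a later
            # turn's block at the same index is not dropped.
            streamed.clear()
```

Keying the check on the block index, rather than on a turn-wide flag, is what lets a later non-streamed block at index 0 still be emitted after an earlier streamed one.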

Confidence Score: 5/5

Safe to merge; all blocking issues from the previous review pass are addressed and no new defects were found.

The per-block streaming dedup, multi-thinking-block index claiming, stderr deadlock, and subprocess/task leak are all correctly resolved. The remaining observations are cleanup items that do not affect correctness or runtime behaviour.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/agentex/lib/adk/_modules/_claude_code_sync.py | Core stream-json parser tap; previous issues (per-block dedup, thinking-stream reset, stderr deadlock) are all addressed. Two unused accumulation buffers (_text_buf, _thinking_buf) and the tool-response name gap remain as minor cleanup items. |
| src/agentex/lib/adk/_modules/_claude_code_turn.py | ClaudeCodeTurn and usage normalization; defensive handling of all known cost/usage key shapes, correct lazy iterator caching, and clean protocol conformance assertion. |
| tests/lib/adk/test_claude_code_sync.py | Comprehensive unit tests covering text, thinking, tool calls/results, multi-turn dedup, on_result callback, and index monotonicity. |
| tests/lib/core/harness/conformance/test_claude_code_conformance.py | Cross-channel conformance fixtures; _run_pure_async correctly replaces asyncio.run() to avoid running-loop conflicts, but lacks explicit coroutine cleanup on unexpected exceptions. |
| examples/tutorials/00_sync/060_claude_code/project/acp.py | Sync tutorial ACP handler; stderr deadlock and subprocess/task-leak issues from previous review are resolved with concurrent stderr drain and try/finally cleanup. |
| examples/tutorials/10_async/10_temporal/140_claude_code/project/activities.py | Temporal activity spawning claude CLI; correctly passes session_id for multi-turn resume, identical subprocess cleanup pattern as sync/async tutorials. |
| examples/tutorials/10_async/10_temporal/140_claude_code/project/workflow.py | Temporal workflow persisting session_id across turns for multi-turn Claude Code sessions; correctly delegates subprocess I/O to an activity. |
| src/agentex/lib/core/harness/types.py | Minor addition of HarnessTurn protocol and TurnUsage/TurnResult models; clean and backwards-compatible. |

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant CLI as claude CLI (stdout)
    participant Tap as convert_claude_code_to_agentex_events
    participant Turn as ClaudeCodeTurn
    participant Emitter as UnifiedEmitter

    CLI->>Tap: stream_event / content_block_start(text)
    Tap->>Emitter: StreamTaskMessageStart(TextContent)
    CLI->>Tap: stream_event / content_block_delta(text_delta)
    Tap->>Emitter: StreamTaskMessageDelta(TextDelta)
    CLI->>Tap: stream_event / content_block_stop
    Tap->>Emitter: StreamTaskMessageDone

    CLI->>Tap: assistant tool_use block
    Tap->>Emitter: StreamTaskMessageStart(ToolRequestContent)
    Tap->>Emitter: StreamTaskMessageDone

    CLI->>Tap: user tool_result block
    Tap->>Emitter: StreamTaskMessageFull(ToolResponseContent)

    CLI->>Tap: result envelope
    Tap->>Turn: on_result stores _result_envelope
    Turn->>Emitter: usage() returns TurnUsage after stream exhausted

Reviews (14): Last reviewed commit: "fix(claude-code): address Greptile revie..."

@declan-scale declan-scale force-pushed the declan-scale/agx1-373-conformance-equivalence branch from 2e820c7 to 37421b6 Compare June 22, 2026 13:48
@declan-scale declan-scale force-pushed the declan-scale/pr7-claude-code branch from 84de7ca to 8e65988 Compare June 22, 2026 13:48
Comment thread src/agentex/lib/adk/_modules/_claude_code_sync.py
Comment thread tests/lib/core/harness/conformance/test_claude_code_conformance.py Outdated
@declan-scale declan-scale force-pushed the declan-scale/agx1-373-conformance-equivalence branch from 37421b6 to df3461c Compare June 22, 2026 14:13
@declan-scale declan-scale force-pushed the declan-scale/pr7-claude-code branch 2 times, most recently from a3c635d to da16e0e Compare June 22, 2026 14:37
@declan-scale declan-scale force-pushed the declan-scale/agx1-373-conformance-equivalence branch from ccbd5cf to e3fa1cc Compare June 22, 2026 15:14
@declan-scale declan-scale force-pushed the declan-scale/pr7-claude-code branch from da16e0e to ba51b5b Compare June 22, 2026 15:17
Comment thread examples/tutorials/00_sync/060_claude_code/project/acp.py Outdated
@declan-scale declan-scale force-pushed the declan-scale/pr7-claude-code branch from 21e8269 to 2107343 Compare June 22, 2026 15:57
Comment thread src/agentex/lib/adk/_modules/_claude_code_sync.py
Comment thread examples/tutorials/00_sync/060_claude_code/project/acp.py Outdated
@declan-scale declan-scale force-pushed the declan-scale/agx1-373-conformance-equivalence branch from c8c63d1 to 05120f3 Compare June 22, 2026 18:47
@declan-scale declan-scale force-pushed the declan-scale/pr7-claude-code branch from 87079d0 to e7c4c5e Compare June 22, 2026 18:47
@declan-scale declan-scale force-pushed the declan-scale/agx1-373-conformance-equivalence branch from 05120f3 to c9a907c Compare June 22, 2026 19:54
@declan-scale declan-scale force-pushed the declan-scale/pr7-claude-code branch from e7c4c5e to 5d8b352 Compare June 22, 2026 19:54
@declan-scale declan-scale force-pushed the declan-scale/agx1-373-conformance-equivalence branch from c9a907c to a04bf5e Compare June 22, 2026 20:01
Base automatically changed from declan-scale/agx1-373-conformance-equivalence to next June 22, 2026 20:09
declan-scale and others added 7 commits June 22, 2026 16:10
…face (convert_claude_code_to_agentex_events + ClaudeCodeTurn)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ified surface via local CLI subprocess

Three tutorial agents demonstrating ClaudeCodeTurn + UnifiedEmitter:
- examples/tutorials/00_sync/060_claude_code/ (sync, yield_turn)
- examples/tutorials/10_async/00_base/130_claude_code/ (async, auto_send_turn)
- examples/tutorials/10_async/10_temporal/140_claude_code/ (Temporal, auto_send_turn + workflow.now())

Each spawns `claude -p --output-format stream-json --verbose` via asyncio.create_subprocess_exec.
Offline unit tests (17 passing) inject a fake subprocess via the _spawn_claude seam.
CI auto-discovers via the manifest.yaml find pattern in agentex-tutorials-test.yml.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ESTS + add offline coverage (matches codex; prevents tutorials-CI failure with no claude CLI)

Each of the three claude-code tutorial test_agent.py files previously ran
live client.agents.send_message* calls unconditionally. The agentex-tutorials
CI runner has no claude CLI and no ANTHROPIC_API_KEY, so pytest would fail
and turn the Test-Tutorial-Agents workflow red.

Fix mirrors the codex-agent pattern (pr8-codex):
- Gate all live tests behind @pytest.mark.skipif(not CLAUDE_LIVE_TESTS)
  so they skip automatically in CI.
- Add TestClaudeCodeOffline inline in each test_agent.py, driving
  ClaudeCodeTurn + UnifiedEmitter with a fake async iterator of recorded
  stream-json lines. Asserts events yielded, StreamTaskMessageDone present,
  usage populated, no CLI or API key needed.

Result: pytest tests/test_agent.py exits 0 in the tutorials runner
(offline tests PASS, live tests SKIP).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tile]

_saw_text_stream / _saw_thinking_stream were turn-wide latches: once any block
was streamed via stream_event deltas, every later assistant text/thinking block
that was NOT streamed got skipped (model output silently dropped). Use only the
precise per-block signal (idx in _streamed_block_indexes) and reset that set
(and the thinking once-guard) at each materialised-message boundary, so an
earlier turn's block index can't linger and drop a later turn's block at the
same index. Drop the now-dead _saw_text_stream latch. Add a regression test for
a non-streamed text block in a later turn.

Also (P2) replace asyncio.run() at module import in the claude-code conformance
test with a loop-free driver: the fixture conversion only iterates in-memory
envelopes, so it never suspends on real I/O. asyncio.run() at import raises when
a loop is already running (programmatic pytest, Jupyter, session-scoped loops);
the manual driver is unaffected by ambient loop state.
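
A loop-free driver of the kind described above might look like the sketch below. It is an illustration under assumptions, not the test's actual `_run_pure_async`: the names `run_pure_async` and `collect` are hypothetical, and the driver only works for coroutines that never suspend on real I/O.

```python
from typing import Any, Coroutine, Iterable, TypeVar

T = TypeVar("T")


def run_pure_async(coro: Coroutine[Any, Any, T]) -> T:
    """Drive a coroutine to completion without an event loop.

    Valid only when the coroutine never yields to a loop (no real I/O, no sleeps).
    """
    try:
        coro.send(None)  # runs until the coroutine finishes or suspends
    except StopIteration as stop:
        return stop.value  # the coroutine's return value
    # The coroutine suspended, so it expected a loop: clean up and fail loudly.
    coro.close()
    raise RuntimeError("coroutine suspended; it needs a real event loop")


async def collect(envelopes: Iterable[Any]) -> list[Any]:
    """Hypothetical fixture conversion: iterate in-memory items via an async generator."""

    async def source():
        for env in envelopes:
            yield env

    return [env async for env in source()]
```

Because nothing in `collect` awaits real I/O, a single `send(None)` runs it to completion, which is why the approach is indifferent to whether an ambient loop is already running.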

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…dlock [greptile]

All three claude-code tutorial spawn helpers pass --verbose and capture stderr
via asyncio.subprocess.PIPE but never read it. A non-trivial session can write
enough verbose output to fill the OS pipe buffer, at which point Claude Code
blocks on its stderr write while we block reading stdout — a hard deadlock.
Drain stderr in a concurrent background task (awaited after proc.wait()) so
stdout never stalls. Applied to 060_claude_code (sync), 130_claude_code
(async base), and 140_claude_code (temporal).
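
The drain pattern can be sketched as follows. This is a simplified illustration rather than the tutorials' `_spawn_claude` code: the helper name `read_stdout_draining_stderr` is hypothetical, and a small Python child that floods stderr stands in for the real CLI.

```python
import asyncio
import sys

# Hypothetical child standing in for the CLI: writes far more stderr than a
# typical OS pipe buffer holds, then a single stdout line.
CHILD = (
    "import sys\n"
    "sys.stderr.write('x' * 1_000_000)\n"
    "sys.stderr.flush()\n"
    "print('{\"type\": \"result\"}')\n"
)


async def read_stdout_draining_stderr(*argv: str) -> tuple[list[str], int]:
    """Return (stdout lines, stderr byte count) without deadlocking on stderr."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    assert proc.stdout is not None and proc.stderr is not None
    # Read stderr concurrently so the child never blocks on a full pipe
    # while this coroutine is waiting on stdout.
    stderr_task = asyncio.create_task(proc.stderr.read())
    lines = [raw.decode().strip() async for raw in proc.stdout]
    await proc.wait()
    stderr_bytes = await stderr_task  # awaited only after the process has exited
    return lines, len(stderr_bytes)


def demo() -> tuple[list[str], int]:
    return asyncio.run(read_stdout_draining_stderr(sys.executable, "-c", CHILD))
```

Without the background `stderr_task`, the child in this sketch would be expected to stall on its stderr write before ever reaching the stdout line.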

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ates [greptile]

_saw_thinking_stream is set on a thinking block's first delta (to claim its
index in _streamed_block_indexes) but was only reset at the materialised-message
boundary. If one streaming turn produced a SECOND thinking block, the guard
stayed True, the second block's index was never claimed, and the final assistant
envelope re-emitted it — duplicate Start/Delta/Done. Reset the guard on
content_block_stop so it is scoped per thinking block. Adds a regression test
with two streamed thinking blocks in one turn.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…onment [greptile]

The tutorial _spawn_claude async generators spawned a child process and a
background stderr-drain task but had no try/finally, so if the consumer
abandoned the generator mid-stream (task cancellation / client disconnect) both
were leaked. Wrap the stdout loop in try/finally: cancel and await the stderr
task, and terminate+reap the process if it is still running. Applied to all
three tutorials (060 sync, 130 async-base, 140 temporal).
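
A rough sketch of the try/finally cleanup, assuming a generator shaped like the one described: `spawn_lines` and `abandon_after_first_line` are hypothetical names, the `procs` list exists only so the example can inspect the child afterwards, and an endless Python child stands in for the CLI.

```python
import asyncio
import contextlib
import sys
from typing import AsyncIterator

# Hypothetical long-running child: prints a line every 10 ms, forever.
CHILD = (
    "import time\n"
    "while True:\n"
    "    print('line', flush=True)\n"
    "    time.sleep(0.01)\n"
)


async def spawn_lines(argv: list[str], procs: list) -> AsyncIterator[str]:
    proc = await asyncio.create_subprocess_exec(
        *argv, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
    )
    procs.append(proc)  # exposed only so the caller can check the return code
    stderr_task = asyncio.create_task(proc.stderr.read())
    try:
        async for raw in proc.stdout:
            yield raw.decode().strip()
        await proc.wait()
    finally:
        # Runs on normal exhaustion, cancellation, and aclose() alike.
        stderr_task.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await stderr_task
        if proc.returncode is None:
            proc.terminate()  # stop the child if it is still running
            await proc.wait()  # reap it so no zombie process is left behind


async def abandon_after_first_line() -> tuple[str, bool]:
    procs: list = []
    gen = spawn_lines([sys.executable, "-c", CHILD], procs)
    first = await gen.__anext__()
    await gen.aclose()  # consumer walks away mid-stream
    return first, procs[0].returncode is not None
```

Calling `aclose()` raises `GeneratorExit` at the suspended `yield`, so the `finally` block is what guarantees both the drain task and the child are cleaned up.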

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@declan-scale declan-scale force-pushed the declan-scale/pr7-claude-code branch from 5d8b352 to 594681d Compare June 22, 2026 20:11
declan-scale and others added 3 commits June 22, 2026 17:00
…mports

The per-tutorial [tool.pytest.ini_options] block (only asyncio_mode="auto")
made pytest treat the tutorial's own pyproject.toml as the rootdir config, so
examples/tutorials/pytest.ini (pythonpath=.) was never discovered and
`from test_utils...` failed with ModuleNotFoundError in the live test.
asyncio_mode="auto" was redundant: every async test already uses an explicit
@pytest.mark.asyncio marker.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The async 130/140 live tests called client.tasks.create, which does not exist
on TasksResource (AttributeError) — it was previously masked by the test_utils
ModuleNotFoundError. Switch to client.agents.create_task(...).result, the same
create-task + send_event + poll pattern the langgraph async tutorials use.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…kflow code

The temporal tutorial spawned the CLI subprocess directly in the workflow
signal handler. Temporal runs signal-handler bodies on its deterministic
sandbox event loop, which does not implement asyncio.create_subprocess_exec —
so the worker crashed with NotImplementedError and no tool_request/response was
ever produced. (Replay was never the issue; the sandboxed loop is.)

Move the subprocess + ClaudeCodeTurn + UnifiedEmitter.auto_send_turn into a new
run_claude_code_turn activity (project/activities.py), register it on the
worker, and have the signal handler delegate via workflow.execute_activity. The
activity returns final_text + session_id so multi-turn resume still works.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…session_id)

- TurnUsage.num_llm_calls is now Optional and defaults to None: it is
  provider-reported (from claude-code's num_turns) and may be absent, so None
  ("not reported") is distinct from a real 0. claude_code_usage_to_turn_usage
  returns None when num_turns is missing (real zeros still preserved). Counted
  fields (num_tool_calls / num_reasoning_blocks) stay int=0.
- Removed a dead `full_text` assignment in the thinking content_block_stop path.
- Added a public ClaudeCodeTurn.session_id accessor (mirrors CodexTurn) so the
  temporal activity no longer reaches into the private _result_envelope.
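
The "absent vs. real zero" distinction for num_llm_calls can be illustrated with a small sketch. The dataclass `TurnUsageSketch` and the helper `usage_from_result` are hypothetical stand-ins for TurnUsage and claude_code_usage_to_turn_usage; only the `num_turns` key and the three field names come from the commit message.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TurnUsageSketch:
    """Hypothetical stand-in for TurnUsage, reduced to the fields discussed above."""

    num_llm_calls: Optional[int] = None  # provider-reported; None means "not reported"
    num_tool_calls: int = 0              # counted locally, so 0 is always meaningful
    num_reasoning_blocks: int = 0        # counted locally, so 0 is always meaningful


def usage_from_result(envelope: dict[str, Any]) -> TurnUsageSketch:
    raw = envelope.get("num_turns")
    # Check the type explicitly: `raw or None` would collapse a real 0 into None.
    is_count = isinstance(raw, int) and not isinstance(raw, bool)
    return TurnUsageSketch(num_llm_calls=raw if is_count else None)
```

With this shape, `usage_from_result({"num_turns": 0})` keeps the reported zero, while `usage_from_result({})` leaves the field as None.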

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@declan-scale declan-scale merged commit 904339c into next Jun 22, 2026
53 checks passed
@declan-scale declan-scale deleted the declan-scale/pr7-claude-code branch June 22, 2026 22:21
@stainless-app stainless-app Bot mentioned this pull request Jun 22, 2026