feat(claude-code): stream-json parser tap for the unified harness surface #420
Merged
danielmillerp approved these changes on Jun 22, 2026
Base automatically changed from declan-scale/agx1-373-conformance-equivalence to next on June 22, 2026 20:09
…face (convert_claude_code_to_agentex_events + ClaudeCodeTurn)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ified surface via local CLI subprocess

Three tutorial agents demonstrating ClaudeCodeTurn + UnifiedEmitter:
- examples/tutorials/00_sync/060_claude_code/ (sync, yield_turn)
- examples/tutorials/10_async/00_base/130_claude_code/ (async, auto_send_turn)
- examples/tutorials/10_async/10_temporal/140_claude_code/ (Temporal, auto_send_turn + workflow.now())

Each spawns `claude -p --output-format stream-json --verbose` via asyncio.create_subprocess_exec. Offline unit tests (17 passing) inject a fake subprocess via the _spawn_claude seam. CI auto-discovers the tutorials via the manifest.yaml find pattern in agentex-tutorials-test.yml.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
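The `_spawn_claude` seam above is what makes offline tests possible: the turn logic only needs "something that yields stdout lines for an argv", so a test can swap in recorded lines. The sketch below illustrates that idea under stated assumptions: `SpawnFn`, `build_claude_argv`, `read_envelopes` and `fake_spawn` are hypothetical names, and passing the prompt as the trailing positional argument is an assumption, not necessarily how the tutorials invoke the CLI.

```python
import asyncio
import json
from typing import Any, AsyncIterator, Callable, Dict, List

# Hypothetical seam type: given an argv, yield raw stdout lines.
SpawnFn = Callable[[List[str]], AsyncIterator[str]]


def build_claude_argv(prompt: str) -> List[str]:
    """argv for one non-interactive turn; prompt placement is an assumption."""
    return ["claude", "-p", "--output-format", "stream-json", "--verbose", prompt]


async def read_envelopes(prompt: str, spawn: SpawnFn) -> List[Dict[str, Any]]:
    """Run one turn through the injected seam and parse each JSON line."""
    envelopes: List[Dict[str, Any]] = []
    async for line in spawn(build_claude_argv(prompt)):
        line = line.strip()
        if line:  # stream-json is one JSON object per line; skip blanks
            envelopes.append(json.loads(line))
    return envelopes


async def fake_spawn(argv: List[str]) -> AsyncIterator[str]:
    """Offline stand-in: replays recorded lines instead of starting a process."""
    assert argv[0] == "claude"
    for line in ('{"type": "system", "subtype": "init"}', "", '{"type": "result", "num_turns": 1}'):
        yield line


if __name__ == "__main__":
    envs = asyncio.run(read_envelopes("hello", fake_spawn))
    print([e["type"] for e in envs])
```

In production the same `read_envelopes` call would receive a real subprocess-backed spawner; only the seam changes, not the parsing path.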
…ESTS + add offline coverage (matches codex; prevents tutorials-CI failure with no claude CLI)

Each of the three claude-code tutorial test_agent.py files previously ran live client.agents.send_message* calls unconditionally. The agentex-tutorials CI runner has no claude CLI and no ANTHROPIC_API_KEY, so pytest would fail and turn the Test-Tutorial-Agents workflow red.

The fix mirrors the codex-agent pattern (pr8-codex):
- Gate all live tests behind @pytest.mark.skipif(not CLAUDE_LIVE_TESTS) so they skip automatically in CI.
- Add TestClaudeCodeOffline inline in each test_agent.py, driving ClaudeCodeTurn + UnifiedEmitter with a fake async iterator of recorded stream-json lines. It asserts that events are yielded, that a StreamTaskMessageDone is present, and that usage is populated, with no CLI or API key needed.

Result: pytest tests/test_agent.py exits 0 in the tutorials runner (offline tests PASS, live tests SKIP).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
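A minimal sketch of how a `CLAUDE_LIVE_TESTS` gate could be computed, assuming it is read from an environment variable of the same name and that only explicit truthy strings enable it (both are assumptions; the commit only names the constant). `env_flag` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

# Assumed set of values that count as "on".
_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, environ: Optional[Mapping[str, str]] = None) -> bool:
    """True only when the variable is set to an explicit truthy value."""
    env = os.environ if environ is None else environ
    return env.get(name, "").strip().lower() in _TRUTHY


# Evaluated once at import; unset (as on the tutorials CI runner) means False.
CLAUDE_LIVE_TESTS = env_flag("CLAUDE_LIVE_TESTS")

# Live tests would then be decorated with:
#   @pytest.mark.skipif(not CLAUDE_LIVE_TESTS, reason="set CLAUDE_LIVE_TESTS=1 to run")
# while the offline class runs unconditionally against recorded lines.
```

Defaulting to "off" is the point of the design: a runner that knows nothing about the flag gets the safe, offline-only behaviour.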
…tile] _saw_text_stream / _saw_thinking_stream were turn-wide latches: once any block was streamed via stream_event deltas, every later assistant text/thinking block that was NOT streamed got skipped (model output silently dropped). Use only the precise per-block signal (idx in _streamed_block_indexes) and reset that set (and the thinking once-guard) at each materialised-message boundary, so an earlier turn's block index can't linger and drop a later turn's block at the same index. Drop the now-dead _saw_text_stream latch. Add a regression test for a non-streamed text block in a later turn.

Also (P2): replace asyncio.run() at module import in the claude-code conformance test with a loop-free driver. The fixture conversion only iterates in-memory envelopes, so it never suspends on real I/O. asyncio.run() at import raises when a loop is already running (programmatic pytest, Jupyter, session-scoped loops); the manual driver is unaffected by ambient loop state.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
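The P2 fix relies on a property of coroutines: one that never awaits anything that yields to an event loop finishes on its first `send(None)`. A loop-free driver in that spirit might look like the sketch below; `drive` and `collect` are hypothetical names, not the conformance test's actual helpers.

```python
from typing import Any, AsyncIterator, Coroutine, List, TypeVar

T = TypeVar("T")


def drive(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine that never suspends, without any event loop."""
    try:
        coro.send(None)
    except StopIteration as stop:
        return stop.value  # finished in one step: the normal case here
    # Getting here means the coroutine yielded to a loop (real I/O, sleep, ...).
    coro.close()
    raise RuntimeError("coroutine suspended; it needs a real event loop")


async def collect(agen: AsyncIterator[T]) -> List[T]:
    """Drain an async iterator into a list."""
    return [item async for item in agen]
```

Usage: `drive(collect(converter(envelopes)))` works at import time whether or not a loop is already running, as long as the converter only iterates in-memory data. If it ever awaits real I/O, `drive` fails loudly instead of hanging.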
…dlock [greptile] All three claude-code tutorial spawn helpers pass --verbose and capture stderr via asyncio.subprocess.PIPE but never read it. A non-trivial session can write enough verbose output to fill the OS pipe buffer, at which point Claude Code blocks on its stderr write while we block reading stdout: a hard deadlock. Drain stderr in a concurrent background task (awaited after proc.wait()) so stdout never stalls. Applied to 060_claude_code (sync), 130_claude_code (async base), and 140_claude_code (temporal).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ates [greptile] _saw_thinking_stream is set on a thinking block's first delta (to claim its index in _streamed_block_indexes) but was only reset at the materialised-message boundary. If one streaming turn produced a SECOND thinking block, the guard stayed True, the second block's index was never claimed, and the final assistant envelope re-emitted it, producing duplicate Start/Delta/Done events. Reset the guard on content_block_stop so it is scoped per thinking block. Adds a regression test with two streamed thinking blocks in one turn.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
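The dedup rules from this commit and the previous one can be modelled as a small state machine: claim a block's index when it streams, scope the thinking guard to a single block (reset on `content_block_stop`), emit a materialised block only if its own index was not streamed, and clear the bookkeeping at each message boundary. This is a hypothetical model, not the tap's code; `StreamDedup` and the `(kind, index, payload)` event tuples are illustrative stand-ins for `_streamed_block_indexes`, the thinking guard and the real `StreamTaskMessage*` types.

```python
from typing import List, Set, Tuple

Event = Tuple[str, int, str]  # (kind, block index, payload): hypothetical shape


class StreamDedup:
    def __init__(self) -> None:
        self.streamed: Set[int] = set()  # plays the role of _streamed_block_indexes
        self.thinking_open = False       # plays the role of the thinking guard

    def thinking_delta(self, index: int, text: str) -> List[Event]:
        events: List[Event] = []
        if not self.thinking_open:
            # First delta of THIS thinking block: claim its index, emit Start once.
            self.streamed.add(index)
            self.thinking_open = True
            events.append(("start", index, "thinking"))
        events.append(("delta", index, text))
        return events

    def text_delta(self, index: int, text: str) -> List[Event]:
        # Text Start would come from content_block_start; omitted here.
        self.streamed.add(index)
        return [("delta", index, text)]

    def block_stop(self, index: int) -> List[Event]:
        # Per-block reset: a second thinking block in the same turn
        # will claim its own index instead of being re-emitted later.
        self.thinking_open = False
        return [("done", index, "")]

    def assistant_message(self, blocks: List[Tuple[int, str, str]]) -> List[Event]:
        # Materialised envelope: emit only blocks whose own index did not stream.
        events: List[Event] = [
            ("full", idx, text) for idx, _kind, text in blocks if idx not in self.streamed
        ]
        # Message boundary: forget indexes so they cannot suppress a later message.
        self.streamed.clear()
        self.thinking_open = False
        return events
```

The two regression scenarios from these commits map directly onto it: two streamed thinking blocks in one turn are both suppressed in the final envelope, and a later, non-streamed text block at index 0 is still emitted.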
…onment [greptile] The tutorial _spawn_claude async generators spawned a child process and a background stderr-drain task but had no try/finally, so if the consumer abandoned the generator mid-stream (task cancellation or client disconnect), both were leaked. Wrap the stdout loop in try/finally: cancel and await the stderr task, and terminate and reap the process if it is still running. Applied to all three tutorials (060 sync, 130 async-base, 140 temporal).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
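The deadlock and leak fixes above combine into one spawn pattern. Below is a sketch of a generic helper in that shape, assuming any argv (the tutorials pass the `claude` command line). `spawn_lines` is a hypothetical name, and the raised line limit and the error on non-zero exit are additions of this sketch rather than documented tutorial behaviour.

```python
import asyncio
import sys
from typing import AsyncIterator, List


async def spawn_lines(argv: List[str]) -> AsyncIterator[str]:
    """Yield stdout lines from argv while draining stderr concurrently."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
        limit=1 << 20,  # raise StreamReader's 64 KiB default line limit
    )
    assert proc.stdout is not None and proc.stderr is not None
    # Background drain: a chatty --verbose child can never block on a full
    # stderr pipe while we wait on stdout.
    stderr_task = asyncio.create_task(proc.stderr.read())
    try:
        async for raw in proc.stdout:
            yield raw.decode("utf-8", errors="replace").rstrip("\r\n")
        await proc.wait()
        stderr_text = (await stderr_task).decode("utf-8", errors="replace")
        if proc.returncode:
            raise RuntimeError(f"exit {proc.returncode}: {stderr_text[-2000:]}")
    finally:
        # Runs on normal exit AND when the consumer abandons the generator.
        if not stderr_task.done():
            stderr_task.cancel()
            try:
                await stderr_task
            except asyncio.CancelledError:
                pass
        if proc.returncode is None:
            try:
                proc.terminate()
            except ProcessLookupError:
                pass  # already exited between the check and the signal
            await proc.wait()  # reap so no zombie is left behind


if __name__ == "__main__":
    async def _demo() -> None:
        noisy = "import sys; sys.stderr.write('x' * 200000); print('ok')"
        async for line in spawn_lines([sys.executable, "-c", noisy]):
            print(line)

    asyncio.run(_demo())
```

The demo writes well over a typical 64 KiB pipe buffer to stderr before printing anything to stdout, which is the situation that deadlocked before the drain task was added.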
…mports

The per-tutorial [tool.pytest.ini_options] block (only asyncio_mode="auto") made pytest treat the tutorial's own pyproject.toml as the rootdir config, so examples/tutorials/pytest.ini (pythonpath=.) was never discovered and `from test_utils...` failed with ModuleNotFoundError in the live test. asyncio_mode="auto" was redundant: every async test already uses an explicit @pytest.mark.asyncio marker.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The async 130/140 live tests called client.tasks.create, which does not exist on TasksResource (AttributeError). This was previously masked by the test_utils ModuleNotFoundError. Switch to client.agents.create_task(...).result, the same create-task + send_event + poll pattern the langgraph async tutorials use.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…kflow code

The temporal tutorial spawned the CLI subprocess directly in the workflow signal handler. Temporal runs signal-handler bodies on its deterministic sandbox event loop, which does not implement asyncio.create_subprocess_exec, so the worker crashed with NotImplementedError and no tool_request/response was ever produced. (Replay was never the issue; the sandboxed loop is.) Move the subprocess + ClaudeCodeTurn + UnifiedEmitter.auto_send_turn into a new run_claude_code_turn activity (project/activities.py), register it on the worker, and have the signal handler delegate via workflow.execute_activity. The activity returns final_text + session_id so multi-turn resume still works.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
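The split described above keeps the workflow side to plain, deterministic data: the activity does all subprocess work and hands back `final_text` + `session_id`, and the workflow only stores them. A sketch of that workflow-side state follows, under assumptions: `ClaudeTurnResult` and `WorkflowConversation` are hypothetical names, and threading the session id back through the Claude Code CLI's `--resume` flag is an assumed wiring that the commit does not spell out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClaudeTurnResult:
    """Plain-data payload a run_claude_code_turn-style activity could return."""

    final_text: str
    session_id: Optional[str]


@dataclass
class WorkflowConversation:
    """Deterministic workflow-side state; it never touches a subprocess."""

    session_id: Optional[str] = None
    replies: List[str] = field(default_factory=list)

    def resume_args(self) -> List[str]:
        # Assumed wiring: the next activity passes the session id via --resume.
        return ["--resume", self.session_id] if self.session_id else []

    def apply(self, result: ClaudeTurnResult) -> None:
        # Keep the newest session id so the next turn resumes the same session.
        if result.session_id:
            self.session_id = result.session_id
        self.replies.append(result.final_text)


# Inside the signal handler (temporalio; shown as comments only):
#   result = await workflow.execute_activity(
#       run_claude_code_turn, params,
#       start_to_close_timeout=timedelta(minutes=10),  # assumed timeout
#   )
#   self.conversation.apply(result)
```

Because only this dataclass crosses the activity boundary, the workflow stays replay-safe while the non-deterministic CLI process lives entirely in the activity worker.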
…session_id)
- TurnUsage.num_llm_calls is now Optional and defaults to None: it is
provider-reported (from claude-code's num_turns) and may be absent, so None
("not reported") is distinct from a real 0. claude_code_usage_to_turn_usage
returns None when num_turns is missing (real zeros still preserved). Counted
fields (num_tool_calls / num_reasoning_blocks) stay int=0.
- Removed a dead `full_text` assignment in the thinking content_block_stop path.
- Added a public ClaudeCodeTurn.session_id accessor (mirrors CodexTurn) so the
temporal activity no longer reaches into the private _result_envelope.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
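The distinction drawn above between "not reported" (`None`) and a real `0` is easy to lose with a careless `x or 0`. Below is a sketch of a mapping that keeps it, plus a `session_id` accessor over the stored result envelope. It is an illustration, not the SDK's `claude_code_usage_to_turn_usage`: the token fields and the `usage` / `num_turns` / `session_id` envelope keys are assumptions, and `usage_from_result` and `TurnSessionSketch` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class TurnUsage:
    input_tokens: int = 0          # assumed field
    output_tokens: int = 0         # assumed field
    num_tool_calls: int = 0        # counted by the tap, so 0 is meaningful
    num_reasoning_blocks: int = 0  # counted by the tap
    num_llm_calls: Optional[int] = None  # provider-reported; None = "not reported"


def usage_from_result(
    result: Mapping[str, Any], num_tool_calls: int = 0, num_reasoning_blocks: int = 0
) -> TurnUsage:
    usage = result.get("usage") or {}
    num_turns = result.get("num_turns")
    return TurnUsage(
        input_tokens=int(usage.get("input_tokens") or 0),
        output_tokens=int(usage.get("output_tokens") or 0),
        num_tool_calls=num_tool_calls,
        num_reasoning_blocks=num_reasoning_blocks,
        # Explicit None check: a missing key stays None, a real 0 stays 0.
        num_llm_calls=None if num_turns is None else int(num_turns),
    )


class TurnSessionSketch:
    """Public accessor instead of reaching into the private envelope."""

    def __init__(self) -> None:
        self._result_envelope: Optional[Mapping[str, Any]] = None

    def on_result(self, envelope: Mapping[str, Any]) -> None:
        self._result_envelope = envelope

    @property
    def session_id(self) -> Optional[str]:
        env = self._result_envelope
        return None if env is None else env.get("session_id")
```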
Summary
`_StreamJsonProcessor` to a pure SDK tap: `convert_claude_code_to_agentex_events` + `ClaudeCodeTurn` + usage normalization (`claude_code_usage_to_turn_usage`)

🤖 Generated with Claude Code
Greptile Summary
This PR ports the golden agent's `_StreamJsonProcessor` to a reusable SDK tap (`convert_claude_code_to_agentex_events` + `ClaudeCodeTurn`) that converts Claude Code's `stream-json` stdout into canonical `StreamTaskMessage*` events consumable by the unified harness. Three tutorial agents (sync, async-base, async-Temporal) are added to demonstrate the tap wired to a real subprocess, along with a comprehensive unit-test suite and cross-channel conformance fixtures.

- `_claude_code_sync.py`: stateful async generator that maps all envelope types (assistant/user content blocks, stream_event triples, result) to harness events; per-block dedup via `_streamed_block_indexes` correctly handles multi-turn and multi-thinking-block streams.
- `_claude_code_turn.py`: `ClaudeCodeTurn` wraps the generator as a `HarnessTurn`; `claude_code_usage_to_turn_usage` defensively maps all known CLI usage key shapes and preserves real zeros vs. absent fields.
- `_spawn_claude` seam with concurrent stderr drain and `try`/`finally` subprocess/task cleanup, resolving the deadlock and leak concerns from the prior review pass.

Confidence Score: 5/5
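The envelope-to-event mapping summarised above can be sketched as a small synchronous generator. This is a minimal illustration, not `_claude_code_sync.py`: `convert_lines` and `sink` are hypothetical names, the `(kind, detail)` tuples stand in for the real `StreamTaskMessage*` types, per-block dedup and non-streamed text blocks are omitted, and the envelope field names (`event`, `content_block`, `delta`, `message.content`) are assumed from Anthropic's streaming event shapes.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Tuple

Event = Tuple[str, Any]  # hypothetical (kind, detail) stand-in for StreamTaskMessage*


def convert_lines(lines: Iterable[str], sink: Dict[str, Any]) -> Iterator[Event]:
    for line in lines:
        if not line.strip():
            continue  # newline-delimited JSON; tolerate blank lines
        env = json.loads(line)
        kind = env.get("type")
        if kind == "stream_event":
            ev = env.get("event", {})
            etype = ev.get("type")
            if etype == "content_block_start":
                yield ("start", ev.get("content_block", {}).get("type"))
            elif etype == "content_block_delta":
                delta = ev.get("delta", {})
                if delta.get("type") == "text_delta":
                    yield ("delta", delta.get("text", ""))
            elif etype == "content_block_stop":
                yield ("done", ev.get("index"))  # detail: block index
        elif kind == "assistant":
            for block in env.get("message", {}).get("content", []):
                if block.get("type") == "tool_use":
                    yield ("start", "tool_use")
                    yield ("done", block.get("id"))  # detail: tool_use id
        elif kind == "user":
            content = env.get("message", {}).get("content", [])
            for block in content if isinstance(content, list) else []:
                if isinstance(block, dict) and block.get("type") == "tool_result":
                    yield ("full", block.get("tool_use_id"))
        elif kind == "result":
            # Stored, not emitted: usage is read after the stream is exhausted.
            sink["result"] = env
```

Keeping the `result` envelope out of the event stream mirrors the `usage()`-after-exhaustion contract: callers drain the events first, then read usage from what was captured.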
Safe to merge; all blocking issues from the previous review pass are addressed and no new defects were found.
The per-block streaming dedup, multi-thinking-block index claiming, stderr deadlock, and subprocess/task leak are all correctly resolved. The remaining observations are cleanup items that do not affect correctness or runtime behaviour.
No files require special attention.
Important Files Changed
Sequence Diagram

```mermaid
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant CLI as claude CLI (stdout)
    participant Tap as convert_claude_code_to_agentex_events
    participant Turn as ClaudeCodeTurn
    participant Emitter as UnifiedEmitter
    CLI->>Tap: stream_event / content_block_start(text)
    Tap->>Emitter: StreamTaskMessageStart(TextContent)
    CLI->>Tap: stream_event / content_block_delta(text_delta)
    Tap->>Emitter: StreamTaskMessageDelta(TextDelta)
    CLI->>Tap: stream_event / content_block_stop
    Tap->>Emitter: StreamTaskMessageDone
    CLI->>Tap: assistant tool_use block
    Tap->>Emitter: StreamTaskMessageStart(ToolRequestContent)
    Tap->>Emitter: StreamTaskMessageDone
    CLI->>Tap: user tool_result block
    Tap->>Emitter: StreamTaskMessageFull(ToolResponseContent)
    CLI->>Tap: result envelope
    Tap->>Turn: on_result stores _result_envelope
    Turn->>Emitter: usage() returns TurnUsage after stream exhausted
```

Reviews (14): Last reviewed commit: "fix(claude-code): address Greptile revie..."