Skip to content

feat(langgraph): migrate LangGraph harness onto unified surface#417

Merged
declan-scale merged 15 commits into
nextfrom
declan-scale/pr5-langgraph
Jun 22, 2026
Merged

feat(langgraph): migrate LangGraph harness onto unified surface#417
declan-scale merged 15 commits into
nextfrom
declan-scale/pr5-langgraph

Conversation

@declan-scale

@declan-scale declan-scale commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Summary

Migrates the LangGraph harness onto the unified harness surface introduced in PR 4 (pydantic-ai). Implements 12 tasks covering the new LangGraphTurn adapter, bespoke helper rewrites, offline integration tests, conformance fixtures, tutorial agents, and CI matrix.

New surface:

turn = LangGraphTurn(stream, model=model_name)
# Sync HTTP ACP
async for event in emitter.yield_turn(turn):
    yield event
# Async / temporal
result = await emitter.auto_send_turn(turn)

Key implementation points:

  • LangGraphTurn wraps LangGraph astream() and implements HarnessTurn (tasks 1-2)
  • stream_langgraph_events reimplemented on UnifiedEmitter (task 4)
  • _langgraph_tracing.py create_langgraph_tracing_handler marked deprecated with warnings.warn(DeprecationWarning) (task 3)
  • AGX1-377 documented: LangGraph emits tool requests as StreamTaskMessageFull (not Start+Delta+Done); SpanDeriver does not produce tool spans from Full events today (tracked in AGX1-373)
  • Usage timing: LangGraphTurn.usage() is populated via on_final_ai_message callback during event iteration; TurnResult.usage is a pre-iteration snapshot — callers should read turn.usage() after auto_send_turn returns
  • Added AsyncGenerator return type annotation to convert_langgraph_to_agentex_events and _generate_events to fix pyright inference (was treating them as coroutines)

Tests added (tasks 5-8, 219 passing):

  • test_langgraph_sync.py: 11 unit tests for convert_langgraph_to_agentex_events + deprecation
  • test_langgraph_turn.py: 19 unit tests for LangGraphTurn + langgraph_usage_to_turn_usage
  • test_langgraph_async.py: 6 characterization tests for the unified stream_langgraph_events
  • test_langgraph_sync_unified.py: 6 passthrough + span derivation tests
  • test_langgraph_conformance.py: 4 conformance fixtures (text-only, single-tool, reasoning, multi-step)
  • test_harness_langgraph_sync.py: 6 offline integration tests (yield channel)
  • test_harness_langgraph_async.py: 7 offline integration tests (auto_send channel)
  • test_harness_langgraph_temporal.py: 5 offline integration tests (temporal channel)

Tutorial agents (task 9):

  • examples/tutorials/00_sync/harness_langgraph/ (s-harness-langgraph) — sync, yield_turn
  • examples/tutorials/10_async/00_base/harness_langgraph/ (a-harness-langgraph) — async, auto_send_turn
  • examples/tutorials/10_async/10_temporal/harness_langgraph/ (at-harness-langgraph) — temporal, LangGraphPlugin + emit_langgraph_messages

CI (task 10): Enabled live-matrix job in harness-integration.yml with 3-way matrix over [sync, async, temporal] running offline LangGraph integration tests.

Test plan

  • uv run --all-packages --all-extras pytest tests/lib/core/harness/ tests/lib/adk/ -v — 219 passed
  • ./scripts/lint — 0 errors, 0 warnings (ruff + pyright)
  • Live agent smoke test (requires running AgentEx server + LLM keys)

🤖 Generated with Claude Code

Greptile Summary

Migrates the LangGraph harness onto the unified LangGraphTurn + UnifiedEmitter surface introduced for pydantic-ai, eliminating ~165 lines of bespoke Redis-streaming code and aligning all three channels (sync HTTP yield, async Redis, Temporal) behind the same adapter pattern.

  • LangGraphTurn (new): wraps graph.astream() and accumulates token usage additively across multi-step LLM calls via _accumulate_turn_usage, fixing the silent per-call overwrite from the old bespoke path.
  • _langgraph_sync.py: adds on_final_ai_message callback for lazy usage capture and fixes the reasoning-block StreamTaskMessageStart content type from TextContent to ReasoningContent (matching the conformance fixture and downstream schema expectations).
  • CI matrix extended to [pydantic_ai, langgraph] × [sync, async, temporal] with 219 new offline tests covering unit, integration, and conformance layers.

Confidence Score: 5/5

Safe to merge — core logic is correct, all three channels work, and the one reasoning-content-type fix is guarded by a new assertion in the test suite.

The implementation is well-structured: bespoke Redis code is cleanly replaced, multi-step usage now accumulates additively instead of overwriting, and the reasoning-block content-type fix is confirmed by a targeted test. The only finding is a stale docstring claiming SpanDeriver does not handle Full tool events, while the tests in this same PR assert it does — a documentation inconsistency that does not affect runtime behavior.

The module docstring in tests/lib/core/harness/test_harness_langgraph_async.py and both acp.py tutorial files carry a stale claim about SpanDeriver not producing tool spans from Full events; those comments should be updated before they mislead future contributors.

Important Files Changed

Filename Overview
src/agentex/lib/adk/_modules/_langgraph_turn.py New LangGraphTurn HarnessTurn adapter — correctly accumulates usage across multi-step LLM calls via _accumulate_turn_usage; the known eager-snapshot limitation for TurnResult.usage is documented.
src/agentex/lib/adk/_modules/_langgraph_sync.py Fixes reasoning-block Start content type (now ReasoningContent instead of TextContent); adds on_final_ai_message callback for usage capture; overall event sequencing is correct.
src/agentex/lib/adk/_modules/_langgraph_async.py Stripped 165 lines of bespoke Redis-streaming code and replaced with a two-line LangGraphTurn + UnifiedEmitter.auto_send_turn delegation; public signature preserved.
src/agentex/lib/adk/_modules/_langgraph_tracing.py Marked deprecated via docstrings only (no runtime warnings.warn); test confirms no DeprecationWarning is emitted, keeping callers safe under -W error.
tests/lib/core/harness/test_harness_langgraph_async.py Good coverage of the async auto_send channel; module docstring incorrectly states that tool spans are not produced, contradicting the passing test assertions.
tests/lib/core/harness/test_harness_langgraph_sync.py Thorough coverage of sync yield channel — text, tool calls, multi-step, tracing, and usage capture all tested.
tests/lib/core/harness/conformance/test_langgraph_conformance.py Four canonical fixtures (text-only, single-tool, reasoning, multi-step) registered with the cross-channel runner; span-derivation determinism guard also added.
.github/workflows/harness-integration.yml Extends CI matrix to [pydantic_ai, langgraph] x [sync, async, temporal]; path triggers and job names updated consistently.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Agent as Agent ACP Handler
    participant LGT as LangGraphTurn
    participant Conv as convert_langgraph_to_agentex_events
    participant UE as UnifiedEmitter
    participant SD as SpanDeriver
    participant Backend as Streaming Backend (Redis / HTTP)

    Agent->>LGT: LangGraphTurn(graph.astream(), model)
    Agent->>UE: yield_turn(turn) OR auto_send_turn(turn)

    UE->>LGT: iterate turn.events
    LGT->>Conv: "convert_langgraph_to_agentex_events(stream, on_final_ai_message=_capture)"

    loop LangGraph messages events
        Conv-->>LGT: StreamTaskMessageStart / Delta / Done (text or reasoning)
        LGT-->>UE: event
        UE->>SD: derive span signal
        UE-->>Backend: stream text delta
    end

    loop LangGraph updates events
        Conv-->>LGT: StreamTaskMessageFull(ToolRequestContent)
        LGT-->>UE: Full event
        UE->>SD: open tool span
        UE-->>Backend: Full message
        Conv->>Conv: on_final_ai_message(_capture) accumulate TurnUsage
        Conv-->>LGT: StreamTaskMessageFull(ToolResponseContent)
        LGT-->>UE: Full event
        UE->>SD: close tool span
        UE-->>Backend: Full message
    end

    UE-->>Agent: TurnResult(final_text) / yield events
    Agent->>LGT: turn.usage() accumulated TurnUsage
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant Agent as Agent ACP Handler
    participant LGT as LangGraphTurn
    participant Conv as convert_langgraph_to_agentex_events
    participant UE as UnifiedEmitter
    participant SD as SpanDeriver
    participant Backend as Streaming Backend (Redis / HTTP)

    Agent->>LGT: LangGraphTurn(graph.astream(), model)
    Agent->>UE: yield_turn(turn) OR auto_send_turn(turn)

    UE->>LGT: iterate turn.events
    LGT->>Conv: "convert_langgraph_to_agentex_events(stream, on_final_ai_message=_capture)"

    loop LangGraph messages events
        Conv-->>LGT: StreamTaskMessageStart / Delta / Done (text or reasoning)
        LGT-->>UE: event
        UE->>SD: derive span signal
        UE-->>Backend: stream text delta
    end

    loop LangGraph updates events
        Conv-->>LGT: StreamTaskMessageFull(ToolRequestContent)
        LGT-->>UE: Full event
        UE->>SD: open tool span
        UE-->>Backend: Full message
        Conv->>Conv: on_final_ai_message(_capture) accumulate TurnUsage
        Conv-->>LGT: StreamTaskMessageFull(ToolResponseContent)
        LGT-->>UE: Full event
        UE->>SD: close tool span
        UE-->>Backend: Full message
    end

    UE-->>Agent: TurnResult(final_text) / yield events
    Agent->>LGT: turn.usage() accumulated TurnUsage
Loading

Comments Outside Diff (1)

  1. src/agentex/lib/adk/_modules/_langgraph_sync.py, line 147-153 (link)

    P1 Reasoning block StreamTaskMessageStart uses wrong content type

    When a reasoning model emits a block of type "reasoning", the code opens the stream with TextContent(type="text", ...) instead of ReasoningContent. Downstream consumers that dispatch on content.type (e.g. rendering pipelines, the SpanDeriver text-span logic) will receive a TextContent wrapper for what is actually a reasoning block, then see a ReasoningContentDelta arrive — a type mismatch that will confuse or break those consumers. ReasoningContent is also not imported in this file, confirming the intended type was never used. The conformance fixture _REASONING correctly shows ReasoningContent as the expected start content, but it constructs the events by hand and never runs them through the actual converter, so no test catches this today.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: src/agentex/lib/adk/_modules/_langgraph_sync.py
    Line: 147-153
    
    Comment:
    **Reasoning block `StreamTaskMessageStart` uses wrong content type**
    
    When a reasoning model emits a block of type `"reasoning"`, the code opens the stream with `TextContent(type="text", ...)` instead of `ReasoningContent`. Downstream consumers that dispatch on `content.type` (e.g. rendering pipelines, the `SpanDeriver` text-span logic) will receive a `TextContent` wrapper for what is actually a reasoning block, then see a `ReasoningContentDelta` arrive — a type mismatch that will confuse or break those consumers. `ReasoningContent` is also not imported in this file, confirming the intended type was never used. The conformance fixture `_REASONING` correctly shows `ReasoningContent` as the expected start content, but it constructs the events by hand and never runs them through the actual converter, so no test catches this today.
    
    How can I resolve this? If you propose a fix, please make it concise.

    Fix in Claude Code

Reviews (13): Last reviewed commit: "test(harness): num_llm_calls is None (no..." | Re-trigger Greptile

Comment on lines +95 to +119
def __init__(self, stream: Any, model: str | None = None) -> None:
self._stream = stream
self._model = model
self._usage: TurnUsage = TurnUsage(model=model)

@property
def events(self) -> AsyncIterator[StreamTaskMessage]:
return self._generate_events()

async def _generate_events(self) -> AsyncGenerator[StreamTaskMessage, None]:
def _capture(ai_msg: Any) -> None:
usage_metadata = getattr(ai_msg, "usage_metadata", None)
if usage_metadata is not None:
self._usage = langgraph_usage_to_turn_usage(usage_metadata, self._model)

async for ev in convert_langgraph_to_agentex_events(self._stream, on_final_ai_message=_capture):
yield ev

def usage(self) -> TurnUsage:
"""Return the usage captured from the last AIMessage in the stream.

Valid only after ``events`` has been fully consumed.
Returns a zero-usage ``TurnUsage`` if the model did not report usage.
"""
return self._usage

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 TurnResult.usage is always empty when using auto_send_turn

LangGraphTurn populates self._usage lazily via the on_final_ai_message callback, which fires during event iteration. However, UnifiedEmitter.auto_send_turn passes usage=turn.usage() as an argument to auto_send before iteration begins (Python evaluates all arguments before the call). By the time the stream is consumed and _capture updates self._usage, the pre-iteration snapshot has already been handed to TurnResult.

Concretely: every caller that reads result.usage after await emitter.auto_send_turn(turn) gets TurnUsage(model=model) — zero token counts regardless of what the model reported. The PR description documents the workaround ("callers should read turn.usage() after auto_send_turn returns"), but TurnResult.usage existing with silent stale data is a trap for every future user of this API.

The fix belongs in emitter.py: call turn.usage() after await auto_send(turn.events, ...) returns, then construct the TurnResult from the now-populated usage.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agentex/lib/adk/_modules/_langgraph_turn.py
Line: 95-119

Comment:
**`TurnResult.usage` is always empty when using `auto_send_turn`**

`LangGraphTurn` populates `self._usage` lazily via the `on_final_ai_message` callback, which fires _during_ event iteration. However, `UnifiedEmitter.auto_send_turn` passes `usage=turn.usage()` as an argument to `auto_send` _before_ iteration begins (Python evaluates all arguments before the call). By the time the stream is consumed and `_capture` updates `self._usage`, the pre-iteration snapshot has already been handed to `TurnResult`.

Concretely: every caller that reads `result.usage` after `await emitter.auto_send_turn(turn)` gets `TurnUsage(model=model)` — zero token counts regardless of what the model reported. The PR description documents the workaround ("callers should read `turn.usage()` after `auto_send_turn` returns"), but `TurnResult.usage` existing with silent stale data is a trap for every future user of this API.

The fix belongs in `emitter.py`: call `turn.usage()` _after_ `await auto_send(turn.events, ...)` returns, then construct the `TurnResult` from the now-populated usage.

How can I resolve this? If you propose a fix, please make it concise.

Fix in Claude Code

@declan-scale declan-scale force-pushed the declan-scale/pr5-langgraph branch from dc5c81d to 68572d5 Compare June 18, 2026 21:12
@declan-scale

Copy link
Copy Markdown
Contributor Author

@greptile review

@declan-scale declan-scale force-pushed the declan-scale/unified-harness-surface branch from b4b8b33 to da780a1 Compare June 22, 2026 13:48
@declan-scale declan-scale force-pushed the declan-scale/pr5-langgraph branch from 68572d5 to 40b4cbe Compare June 22, 2026 13:53
@declan-scale declan-scale changed the base branch from declan-scale/unified-harness-surface to declan-scale/agx1-373-conformance-equivalence June 22, 2026 13:53
@declan-scale declan-scale force-pushed the declan-scale/agx1-373-conformance-equivalence branch from 37421b6 to df3461c Compare June 22, 2026 14:13
@declan-scale declan-scale force-pushed the declan-scale/pr5-langgraph branch 2 times, most recently from 4fd2ff4 to e03a584 Compare June 22, 2026 14:37
Comment thread src/agentex/lib/adk/_modules/_langgraph_turn.py
@declan-scale declan-scale force-pushed the declan-scale/agx1-373-conformance-equivalence branch from ccbd5cf to e3fa1cc Compare June 22, 2026 15:14
@declan-scale declan-scale force-pushed the declan-scale/pr5-langgraph branch 2 times, most recently from 734b298 to 6ac00e3 Compare June 22, 2026 15:54
@declan-scale declan-scale force-pushed the declan-scale/agx1-373-conformance-equivalence branch from c8c63d1 to 05120f3 Compare June 22, 2026 18:47
@declan-scale declan-scale force-pushed the declan-scale/pr5-langgraph branch from a367469 to af6a4b2 Compare June 22, 2026 18:47
@declan-scale declan-scale force-pushed the declan-scale/agx1-373-conformance-equivalence branch from 05120f3 to c9a907c Compare June 22, 2026 19:54
@declan-scale declan-scale force-pushed the declan-scale/pr5-langgraph branch from af6a4b2 to 002d5f9 Compare June 22, 2026 19:54
@declan-scale declan-scale force-pushed the declan-scale/agx1-373-conformance-equivalence branch from c9a907c to a04bf5e Compare June 22, 2026 20:01
Base automatically changed from declan-scale/agx1-373-conformance-equivalence to next June 22, 2026 20:09
@declan-scale declan-scale force-pushed the declan-scale/pr5-langgraph branch from 002d5f9 to bdd528b Compare June 22, 2026 20:11
declan-scale and others added 9 commits June 22, 2026 18:24
Adds an additive on_final_ai_message=None parameter to
convert_langgraph_to_agentex_events so callers can capture AIMessage
usage_metadata without re-traversing the stream. No behavior change when
omitted. Also adds a DeprecationWarning to create_langgraph_tracing_handler
and its module docstring, pointing to the unified harness surface, and
updates the sync module docstring with the preferred unified path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements LangGraphTurn (HarnessTurn protocol) that wraps a LangGraph
astream() event stream and captures usage from AIMessage.usage_metadata
via the on_final_ai_message callback. Implements langgraph_usage_to_turn_usage
that maps all UsageMetadata fields (input/output/total/cache_read/reasoning)
onto the framework-agnostic TurnUsage model. Zero token counts are preserved.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…pre-refactor)

Records the current bespoke behavior as a contract test. After Task 4 rewrites
the internals to use UnifiedEmitter + LangGraphTurn, these tests must still pass
to confirm behavioral parity.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…urface

Replaces the bespoke Redis-streaming loop with UnifiedEmitter.auto_send_turn(
LangGraphTurn(...)), matching the pattern established for pydantic-ai. Public
signature preserved identically. Behavioral difference: tool calls/responses
are now posted via streaming_task_message_context (not adk.messages.create),
and final_text accumulates all text across the turn. Updates the characterization
test to document these unified-surface semantics.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Verifies yield_turn(LangGraphTurn) produces identical events to direct
iteration, and documents the AGX1-377 behavior (LangGraph Full tool events
don't produce SpanDeriver spans today; cross-channel equivalence comes with
AGX1-373).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-step)

Registers LangGraph-specific conformance fixtures with the shared harness
conformance runner. Documents the AGX1-377 behavior (tool requests are Full
events, not Start+Done). Span derivation is deterministic for all 4 fixtures.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ral channels

Adds 18 offline integration tests across the three delivery channels using
fake LangGraph event streams and fake streaming backends. Documents the
AGX1-377 behavior (Full events don't produce tool spans). Notes the usage
capture timing: turn.usage() is the authoritative post-iteration value since
auto_send_turn evaluates usage eagerly before events are consumed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Task 9: add 3 deployable tutorial agents that demonstrate the unified
harness surface side-by-side with the bespoke reference examples:
- examples/tutorials/00_sync/harness_langgraph/ (s-harness-langgraph)
  uses UnifiedEmitter.yield_turn(LangGraphTurn(stream))
- examples/tutorials/10_async/00_base/harness_langgraph/ (a-harness-langgraph)
  uses UnifiedEmitter.auto_send_turn(LangGraphTurn(stream))
- examples/tutorials/10_async/10_temporal/harness_langgraph/ (at-harness-langgraph)
  follows 130_langgraph pattern (LangGraphPlugin + emit_langgraph_messages)

Task 10: enable live-matrix CI job in harness-integration.yml with a
3-way matrix over [sync, async, temporal] running offline integration tests.
Also add test_harness_langgraph_*.py to PR path triggers.

Task 11 (pyright fixes): annotate convert_langgraph_to_agentex_events and
_generate_events with AsyncGenerator return types so pyright infers them as
async generators rather than coroutines. Add start_time to Span construction
in test_langgraph_sync_unified.py fake tracing backend.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…racing handler (PR 5/6)

AGX1-378: wire workflow_now_if_in_workflow() into stream_langgraph_events so
Temporal callers get deterministic message timestamps, matching the pattern
used by the openai/litellm providers.

Deprecation alignment: remove runtime warnings.warn from
create_langgraph_tracing_handler (and unused import warnings) to match PR 4/6
pydantic-ai convention. Deprecation remains in docstrings on module, class,
and function. Callers under -W error are no longer broken.

Test alignment after rebase onto unified-harness-surface (b4b8b33):
- FakeStreamingModule.streaming_task_message_context in test_langgraph_async.py
  and test_pydantic_ai_async.py updated to accept **kw (foundation now passes
  created_at).
- Three "no tool spans for Full events" tests updated to assert the new
  SpanDeriver behaviour: Full(ToolRequestContent) opens a span,
  Full(ToolResponseContent) closes it.
- Two "accumulates all text" multi-step tests corrected to last-segment
  semantics (auto_send resets final_text_parts on each new Start(TextContent)).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
declan-scale and others added 5 commits June 22, 2026 18:26
…-373)

Rewrites test_langgraph_conformance.py to use the cross-channel runner
from PR #414 (run_cross_channel_conformance, LogicalDelivery) instead of
the simpler derive_all-only API it was written against.

The four fixtures (text-only, single-tool, reasoning, multi-step) are
retained as canonical StreamTaskMessage* sequences. Each is now exercised
by test_cross_channel_equivalence (yield_events vs auto_send logical
deliveries and span signals) plus the backward-compat
test_span_derivation_is_deterministic guard.

LangGraph tool requests arrive as Full events from the "updates" stream;
auto_send handles them via open+close, yielding the same LogicalDelivery
on both channels. No coalesce_tool_requests option is needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_capture overwrote self._usage on every AIMessage, so a multi-step turn (text
-> tool decision -> final text) reported only the last LLM call's tokens and
silently dropped the rest — undercounting in any billing/monitoring that reads
turn.usage(). Accumulate additively across calls via _accumulate_turn_usage
(None+None stays None; real 0 contributes 0). Add a test asserting summed
input/output/total/cache/reasoning tokens across two AIMessages.

The separate 06-18 "TurnResult.usage empty via auto_send_turn" comment is
resolved by the foundation (emitter reads turn.usage() after stream exhaustion).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e [greptile]

The sync harness_langgraph tutorial set turn_span.output to
{"final_output": turn.usage().model_dump()} — token metrics under a key that
means the assistant's text, producing misleading AGENT_WORKFLOW trace data
versus the async tutorial. Accumulate text deltas during the yield loop (as the
030_langgraph tutorial does) and store {"final_output": final_text, "usage":
...} so the final output is the text and usage stays available under its own key.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nt [greptile]

The sync converter opened a reasoning stream with a StreamTaskMessageStart
wrapping TextContent, even though the deltas are ReasoningContentDelta — so the
Start's content type contradicted its payload (and downstream span/UI handling
of reasoning). Emit ReasoningContent for the reasoning Start. Add a test
asserting the reasoning block opens a ReasoningContent Start + ReasoningContentDelta.

Also refresh the stale test_harness_langgraph_sync.py module docstring: it
claimed the SpanDeriver does NOT produce tool spans for LangGraph Full events,
but the foundation now opens/closes tool spans from Full(ToolRequest/Response)
(asserted by test_tracer_produces_tool_spans_for_full_events).

Note: the "TurnResult.usage pre-iteration snapshot" item is already resolved at
root — UnifiedEmitter.auto_send_turn reads turn.usage() after auto_send drains
the stream.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- _langgraph_sync.py: reasoning StreamTaskMessageStart now sets style="active".
  The AgentEx server's StreamTaskMessageStartEntity rejects reasoning.style=None
  (enum), which killed the sync stream — breaking harness_langgraph and the
  pre-existing 030_langgraph that share this emitter.
- temporal harness tools.py: give get_weather a native async coroutine so
  tools_node's `await tool.ainvoke(...)` runs on the workflow loop instead of
  LangChain's run_in_executor fallback (NotImplementedError in the deterministic
  Temporal workflow sandbox).
- test_langgraph_sync.py: assert the reasoning Start carries a non-null style.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@declan-scale declan-scale force-pushed the declan-scale/pr5-langgraph branch from cb03bbc to 4ceb1cf Compare June 22, 2026 22:28
The shared TurnUsage.num_llm_calls became Optional (None = "not reported")
when it landed on next. The openai / pydantic-ai turn tests still asserted
== 0 for the default (no-usage / pre-exhaustion) construction path, which now
yields None — matching the token fields. Update those three assertions to
`is None`; real-zero cases (provider reported 0) are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@declan-scale declan-scale merged commit d344228 into next Jun 22, 2026
67 checks passed
@declan-scale declan-scale deleted the declan-scale/pr5-langgraph branch June 22, 2026 22:45
@stainless-app stainless-app Bot mentioned this pull request Jun 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants