Python: Fix doubled tool_call arguments in MESSAGES_SNAPSHOT when streaming#4200
Conversation
|
I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer. |
There was a problem hiding this comment.
Pull request overview
This PR fixes a bug where MESSAGES_SNAPSHOT events contained doubled function.arguments strings when streaming tool calls. The issue occurred when streaming providers send a full-arguments replay after incremental deltas, causing the arguments to be accumulated twice. The fix adds duplicate detection logic (similar to the existing text content handling) to skip re-accumulating arguments when the incoming delta equals the already-accumulated value.
Changes:
- Added duplicate detection guard in
_emit_tool_call()to prevent doubling tool call arguments in MESSAGES_SNAPSHOT events - Follows the existing pattern from
_emit_text()for handling provider full-content replays - Includes detailed inline comments explaining the fix and referencing issue #4194
|
@microsoft-github-policy-service agree |
Python Test Coverage Report •
Python Unit Test Overview
|
||||||||||||||||||||||||||||||
moonbox3
left a comment
There was a problem hiding this comment.
LGTM. Thanks for fixing, @LEDazzio01.
When streaming with client-side tools, some providers send a full-
arguments replay after the streaming deltas complete. The `_emit_tool_call`
function unconditionally appends every arguments delta to the internal
`flow.tool_calls_by_id` tracking dictionary via `+=`. When the replay
contains the exact same complete arguments string that was already
accumulated from prior deltas, the arguments get doubled (e.g.,
`{"todoText":"buy groceries"}{"todoText":"buy groceries"}`).
This causes `MESSAGES_SNAPSHOT` events to contain invalid doubled JSON in
`tool_calls[].function.arguments`, breaking any client or middleware that
relies on snapshots for state reconstruction.
The fix adds a guard (mirroring the existing duplicate guard in
`_emit_text`) that detects when the incoming delta exactly equals the
already-accumulated arguments string, indicating a full-arguments replay
rather than an incremental delta. In this case the append is skipped,
preventing the doubling.
The `ToolCallArgsEvent` deltas are still emitted correctly for real-time
streaming — only the internal snapshot accumulator is guarded.
Fixes microsoft#4194
Address Copilot review feedback: 1. Move duplicate full-arguments replay detection BEFORE emitting ToolCallArgsEvent, for consistency with _emit_text() which returns early without emitting any events on replay detection. 2. Add test_emit_tool_call_skips_duplicate_full_arguments_replay() to verify the duplicate detection behavior for tool call arguments, matching the existing test pattern for text content.
3b31dee to
6f2f9fa
Compare
Motivation and Context
Fixes #4194 —
MESSAGES_SNAPSHOTevents contain doubledfunction.argumentsstrings for tool calls when streaming with client-side tools.When a streaming provider sends a full-arguments replay after incremental deltas complete,
_emit_tool_call()unconditionally appends the replay to the accumulated arguments via+=, causing the arguments string to double (e.g.,{"todoText":"buy groceries"}{"todoText":"buy groceries"}). This breaks any client or middleware that relies onMESSAGES_SNAPSHOTfor state reconstruction, since the doubled string is not valid JSON.Description
Root cause: In
_run_common.py, the_emit_tool_callfunction tracks accumulated arguments inflow.tool_calls_by_id[tool_call_id]["function"]["arguments"]using+=. Unlike_emit_text()which already has a guard against full-message replays,_emit_tool_callhad no equivalent protection.Fix: Add a duplicate detection guard (mirroring the existing pattern in
_emit_text) that checks whether the incoming delta exactly equals the already-accumulated arguments string. When matched, this indicates a full-arguments replay rather than an incremental delta, and the append is skipped:Key properties:
ToolCallArgsEventdeltas are still emitted correctly for real-time streaming — only the internal snapshot accumulator is guarded_emit_text()for text content replay detectionContribution Checklist