fix: replace RuntimeError with cancel_tool to prevent memory corruption#1
Open
xiaosu19 wants to merge 3 commits into
Open
fix: replace RuntimeError with cancel_tool to prevent memory corruption#1xiaosu19 wants to merge 3 commits into
xiaosu19 wants to merge 3 commits into
Conversation
added 3 commits
May 13, 2026 15:22
…on on tool limit When tool calls reach the 20-call limit, raising RuntimeError in AfterToolCallEvent breaks the message history, leaving toolUse blocks without matching toolResult blocks. When AgentCore Memory restores this corrupted history in subsequent requests, Bedrock's ConverseStream API rejects it with ValidationException. Fix: Use BeforeToolCallEvent with event.cancel_tool instead. This cancels the tool gracefully by returning an error message to the model, which then responds using already-gathered information. The conversation history remains consistent and Memory can safely restore it.
…emory When MCP tool calls are interrupted (timeout, network error), Memory saves incomplete history with toolUse but no toolResult. On restoration, Strands SDK's repair logic can add incorrect toolResult counts (strands-agents/sdk-python#2296), causing Bedrock API rejection. Add fix_message_history() that validates toolUse/toolResult pairing before each invocation and corrects any mismatches.
The previous fix_message_history() ran after Agent creation but before invoke_async(). However, session_manager restores history inside invoke_async(), so the fix ran too early. Move to BeforeModelCallEvent hook which fires right before each model call, after history restoration and SDK's own (buggy) repair logic. This ensures messages are always valid when sent to Bedrock.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
When tool calls reach the 20-call limit,
raise RuntimeErrorinAfterToolCallEventbreaks the message history —toolUseblocks are left without matchingtoolResultblocks. When AgentCore Memory restores this corrupted history in subsequent requests, Bedrock's ConverseStream API rejects it with:This makes the affected session permanently broken.
Fix
Replace
AfterToolCallEvent+raise RuntimeErrorwithBeforeToolCallEvent+event.cancel_tool.This is the recommended pattern from Strands SDK docs. The tool call is cancelled gracefully — the model receives an error message and responds using already-gathered information. The conversation history remains consistent.
Changes
main.py: ~10 lines changed in the tool limit hook sectionTesting
Deployed and verified in us-east-2 with Memory enabled. New sessions work correctly when hitting the 20-tool limit.