-
Notifications
You must be signed in to change notification settings - Fork 5
feat: token budget tracking and message compaction for ReactAgent (#352) #370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Add module-level constants (DEFAULT_COMPACTION_THRESHOLD, PRESERVE_RECENT_PAIRS, DEFAULT_CONTEXT_WINDOW) and instance attributes for token budget tracking. Implement _estimate_message_tokens, _estimate_conversation_tokens, and _should_compact methods using len(str)/4 heuristic. Env var CODEFRAME_REACT_COMPACT_THRESHOLD configures threshold with validation (0.5-0.95 range, clamping).
Tier 1 (_compact_tool_results): Replace verbose tool result content with short summaries while preserving error results. Tier 2 (_remove_intermediate_steps): Remove redundant file reads (same file read without intervening edit) and passed test outputs. Tier 3 (_summarize_old_messages): Replace oldest messages with a single summary preserving file paths, errors, and architectural decisions. All tiers respect PRESERVE_RECENT_PAIRS boundary.
Add compact_conversation orchestrator that runs tiers 1->2->3 in order, stopping when under threshold. Add get_token_stats for diagnostics. Replace _trim_messages call in _react_loop with compact_conversation. Emit AGENT_COMPACTION event and verbose output when compaction occurs. Add AGENT_COMPACTION to EventType.
Add helper _build_conversation, tier edge case tests (token savings, unique reads), orchestration tests (tier1 early stop, all tiers, multiple rounds), and regression guard tests confirming existing ReactAgent behavior is unaffected. Total: 51 compaction tests.
Tier 2 intermediate step removal was matching any test output containing "passed", even when failures were also present. Now requires absence of "failed" and "error" to avoid discarding useful failure context.
WalkthroughAdded token-budget tracking and a three-tier conversation compaction system to ReactAgent, plus a new Changes
Sequence Diagram(s)sequenceDiagram
participant ReactLoop as React Loop
participant Agent as ReactAgent
participant Tier1 as Tier 1: ToolResults
participant Tier2 as Tier 2: IntermediateSteps
participant Tier3 as Tier 3: Summarizer
participant Emitter as EventEmitter
ReactLoop->>Agent: iteration / _react_loop
Agent->>Agent: _estimate_conversation_tokens(messages)
Agent->>Agent: _should_compact(messages)
alt compaction needed
Agent->>Tier1: _compact_tool_results(messages)
Tier1-->>Agent: (messages', tokens_saved)
Agent->>Tier2: _remove_intermediate_steps(messages')
Tier2-->>Agent: (messages'', tokens_saved)
Agent->>Tier3: _summarize_old_messages(messages'')
Tier3-->>Agent: (messages_final, tokens_saved)
Agent->>Agent: _finalize_compaction(messages_final, stats)
Agent->>Emitter: emit(AGENT_COMPACTION, stats)
Emitter-->>ReactLoop: event published
end
Agent-->>ReactLoop: continue with compacted messages
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
No actionable comments were generated in the recent review. 🎉 🧹 Recent nitpick comments
Comment |
Add token budget tracking and multi‑tier message compaction to
|
Code Review: Token Budget Tracking and Message Compaction (#352)OverviewThis PR implements a well-designed 3-tier message compaction strategy for the ReactAgent to prevent context window overflow. The implementation is comprehensive with excellent test coverage (52 tests) and follows the project's v2 architectural principles. ✅ Strengths1. Architecture & Design
2. Test Coverage
3. Observability
4. Configuration
🔍 Issues & ConcernsCritical: Mutating Input MessagesLocation: react_agent.py:955 The _compact_tool_results method mutates the input messages list in place. While the calling code in compact_conversation passes list(messages) copies, this creates a shallow copy. The dictionaries themselves are still shared references, so mutating messages[i] can affect the original data structure. Impact: Potential unintended side effects if the messages list is used elsewhere after compaction. Fix: Create deep copies of messages being modified, or document that the method expects ownership of the list. Moderate: Tier 2 Assumption About Message PairingLocation: react_agent.py:980 The code assumes strict alternating assistant/user message pairing. If the conversation history ever contains consecutive assistant or user messages, this logic will misalign. Recommendations:
Minor: Token Estimation Edge CasesLocation: react_agent.py:872, 875 Issue 1: JSON serialization of tool_calls and tool_results happens on every token estimation call. For large conversations, this could become expensive. Issue 2: No handling of messages with neither content nor tool_calls/results. Empty messages return 0 tokens but should probably count for message overhead. Suggestion: Add a base token count per message (~5 tokens for role/metadata). 📊 Performance ConsiderationsPositive:
Potential Concern:
📝 Documentation & Code QualityStrengths:
Minor Issues:
✅ Testing QualityExcellent coverage:
One gap: No test for multiple compactions within a single run() execution. 📋 Recommendations SummaryMust Fix (Critical):
Should Fix (Moderate):
Nice to Have (Minor):
🎯 VerdictOverall: Strong implementation ✨ This PR successfully implements the acceptance criteria from #352 with excellent test coverage and thoughtful design. The 3-tier compaction strategy is well-architected and the code is clean. The critical issue around message mutation should be addressed before merge. The moderate issues are worth fixing but not blockers. The minor suggestions can be addressed in follow-up PRs if desired. Recommendation: Request changes for the mutation issue, then approve after fix. Great work on this feature! The token budget management will significantly improve the agent's ability to handle long-running tasks without hitting context limits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@codeframe/core/react_agent.py`:
- Around line 1054-1103: The compaction logic in the method that builds
summary_msg ignores any existing "[Summary]" user messages in old_messages,
causing previously preserved file_paths/errors/decisions to be dropped on
subsequent compactions; update the code that loops old_messages (the block that
builds file_paths, errors, decisions) to first detect and parse any existing
summary messages (role == "user" and content.startswith("[Summary]")) and
extract the previously saved file paths, error snippets, and decision snippets
(parse patterns like "analyzed N files (...)", "errors encountered (...)", and
"architectural decisions: ...") into the file_paths/errors/decisions lists
before processing tool_calls/tool_results/assistant content so earlier preserved
data is merged rather than overwritten.
- Around line 353-360: The AGENT_COMPACTION event currently emits only
compact_stats; modify the emission so compact_stats includes the current task id
for traceability by adding something like compact_stats["task_id"] =
self.task_id (or the appropriate task identifier property) immediately before
calling self._emit(EventType.AGENT_COMPACTION, compact_stats) inside the
compacting block in react_agent.py (see compact_conversation, compact_stats,
_emit, and EventType.AGENT_COMPACTION).
- Around line 928-953: The summaries currently use the single variable tool_name
(from _extract_tool_name_from_pair) for all tool_results, which mislabels
multi-tool responses; before iterating tool_results, build a mapping of
tool_call_id -> tool name by extracting the tool calls from the corresponding
assistant message (e.g., inspect messages[i - 1] for a tool calls list or
construct a dict from any existing tool-call metadata), then inside the loop use
mapping.get(tr["tool_call_id"], tool_name) to pick the correct name for each tr
(falling back to the old _extract_tool_name_from_pair result if no mapping entry
exists) when composing summary and updating saved.
In `@tests/core/test_react_agent_compaction.py`:
- Around line 825-858: In test_compact_conversation_called_in_loop, after
calling agent.run("task-1") add assertions to verify the MockProvider recorded
LLM calls (e.g. assert provider.call_count >= 1 or assert len(provider.calls) >=
1) so the test not only checks compact_conversation was invoked but also that
the provider was actually called during the react loop; place these assertions
immediately after agent.run and before asserting compact_called.
- Include task_id in AGENT_COMPACTION event payload for SSE correlation - Map tool results to actual tool names via tool_call_id for multi-tool calls - Preserve prior [Summary] messages across compaction rounds - Add defensive copy in compact_conversation to avoid caller mutation - Add role validation in tier 2 before assuming assistant/user pairing - Add MockProvider call_count assertion in integration test
Responses to Code Review FeedbackThanks for the thorough review! Here's what was addressed in b7e56db: FixedCritical: Mutating Input Messages Moderate: Tier 2 Message Pairing Assumption Intentionally SkippedMinor: Base token overhead per message Minor: JSON serialization optimization |
Follow-Up Review: All Issues Resolved ✅I've reviewed the latest commit (fix: address PR review feedback for compaction system) and can confirm that all previously identified issues have been properly addressed. ✅ Fixed Issues from Previous Review1. Critical: Message Mutation Issue - RESOLVED
2. Moderate: Role Validation in Tier 2 - RESOLVED
3. Moderate: Tool Name Mapping - RESOLVED
4. Minor: Prior Summary Preservation - RESOLVED
5. Minor: Event Traceability - RESOLVED
🎯 Final AssessmentCode Quality: Excellent
Test Coverage: Comprehensive
Performance: Appropriate
Observability: Strong
📋 RecommendationAPPROVE ✅ This PR is ready to merge. The implementation successfully delivers all acceptance criteria from issue #352:
The latest commit addressed all review feedback thoroughly and professionally. Great work on this feature! 🎉 The token budget management will significantly improve the ReactAgent's ability to handle long-running tasks without hitting context limits. |
|
|
||
| # Build a map of file reads and edits in the compactable zone | ||
| # Process pairs: assistant at even index, user at odd index | ||
| for i in range(0, cutoff - 1, 2): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟢 Low
core/react_agent.py:987 Suggestion: _remove_intermediate_steps relies on even/odd indices and a truncated scan, which breaks when tier‑3 adds a user summary and can miss the last assistant step. Iterate by roles instead (find each assistant and its following user) and scan to the true end. Optionally emit the tier‑3 summary as a system message to avoid affecting pairing.
🚀 Want me to fix this? Reply ex: "fix it for me".
🤖 Prompt for AI
In file codeframe/core/react_agent.py around line 987:
Suggestion: `_remove_intermediate_steps` relies on even/odd indices and a truncated scan, which breaks when tier‑3 adds a `user` summary and can miss the last `assistant` step. Iterate by roles instead (find each `assistant` and its following `user`) and scan to the true end. Optionally emit the tier‑3 summary as a `system` message to avoid affecting pairing.
Summary
Implements #352: Adds context window management to the ReAct loop with token tracking and 3-tier message compaction strategy.
len(text) / 4conservative heuristicCODEFRAME_REACT_COMPACT_THRESHOLDenvironment variable (0.5-0.95)Acceptance Criteria
len(text) / 4)CODEFRAME_REACT_COMPACT_THRESHOLDenv varTest Plan
Implementation Notes
_trim_messages()character-based approach with smart token-aware compaction_trim_messages()kept but marked deprecated (not deleted to avoid breaking any external callers)AGENT_COMPACTIONevent type for observabilityCloses #352
Summary by CodeRabbit
New Features
Tests