Fix: race condition in update_chat_ctx deletes server-created function calls #4960

Open
StianHanssen wants to merge 4 commits into livekit:main from StianHanssen:fix-tool-sync-can-cause-item-loss

@StianHanssen
Contributor

Summary

update_chat_ctx can delete in-flight function_call items from the OpenAI Realtime server, causing cascading "failed to insert item: previous_item_id not found" corruption of _remote_chat_ctx.

The root cause is a timing gap between two context-tracking structures:

  • _remote_chat_ctx: Updated immediately when the server sends conversation.item.added
  • _agent._chat_ctx: Updated later, only when tool execution starts (_tool_execution_started_cb)

If update_chat_ctx runs during this window (e.g. from summarization), the diff sees the function_call in the remote context but not in the local one, treats it as intentionally removed, and sends a delete event. The _is_content_empty guard only protects message items; function_call items pass through unconditionally.
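The window described above can be reproduced with a toy diff. This is only a sketch under simplifying assumptions: `Item` and `naive_delete_ids` are hypothetical stand-ins, not the plugin's actual types or its `_create_update_chat_ctx_events` logic.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical, minimal chat item: only an ID and a type."""

    id: str
    type: str  # "message" or "function_call"


def naive_delete_ids(remote: list[Item], local: list[Item]) -> list[str]:
    """Return IDs present remotely but absent locally, as a plain diff would."""
    local_ids = {item.id for item in local}
    return [item.id for item in remote if item.id not in local_ids]


# The server has already announced fc_1 (conversation.item.added) ...
remote_ctx = [Item("msg_1", "message"), Item("fc_1", "function_call")]
# ... but tool execution has not started, so the agent-side context lacks it.
local_ctx = [Item("msg_1", "message")]

to_delete = naive_delete_ids(remote_ctx, local_ctx)
print(to_delete)  # ['fc_1']
```

The diff cannot tell "not yet seen locally" from "removed on purpose", so the in-flight function call ends up in the delete list.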

I created a unit test gist that replicates the exact pipeline and demonstrates how update_chat_ctx deletes in-flight function_call items. Note that the "test" passes when the failure scenario occurs.

Fix

Track in-flight function calls with an _inflight_fc_ids set on the OpenAI RealtimeSession. Items enter the set when the server creates them and leave when the agent framework acknowledges them (tool execution starts, is rejected, or is interrupted). _create_update_chat_ctx_events skips deletion for any item still in the set.
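A minimal sketch of that bookkeeping, assuming items are reduced to plain ID strings. `InflightGuard` and its method names are hypothetical; only the `_inflight_fc_ids` set is taken from the description above.

```python
from __future__ import annotations


class InflightGuard:
    """Toy stand-in for the session-side bookkeeping; not the plugin's real class."""

    def __init__(self) -> None:
        self._inflight_fc_ids: set[str] = set()

    def on_item_added(self, item_id: str, item_type: str) -> None:
        # The server created the item: remember function calls until acknowledged.
        if item_type == "function_call":
            self._inflight_fc_ids.add(item_id)

    def on_fc_processed(self, item_id: str) -> None:
        # Tool execution started, was rejected, or was interrupted.
        self._inflight_fc_ids.discard(item_id)

    def ids_to_delete(self, remote_ids: list[str], local_ids: list[str]) -> list[str]:
        # Delete only items that are missing locally AND not in flight.
        local = set(local_ids)
        return [
            item_id
            for item_id in remote_ids
            if item_id not in local and item_id not in self._inflight_fc_ids
        ]


guard = InflightGuard()
guard.on_item_added("fc_1", "function_call")

remote = ["msg_old", "msg_1", "fc_1"]
local = ["msg_1"]

during_window = guard.ids_to_delete(remote, local)
print(during_window)  # ['msg_old']

guard.on_fc_processed("fc_1")
after_ack = guard.ids_to_delete(remote, local)
print(after_ack)  # ['msg_old', 'fc_1']
```

Once the call has been acknowledged, its absence from the local context is treated as intentional again and the normal diff applies.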

A runtime_checkable Protocol (DualChatContextSyncSession) bridges the signalling between livekit-agents (where tool execution happens) and livekit-plugins-openai (where the set lives), without modifying the abstract RealtimeSession base class.
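As a rough illustration of the bridging idea, the sketch below uses `typing.Protocol` with `runtime_checkable` so the caller can test a session with `isinstance` instead of relying on the base class. The method name `mark_fc_processed`, the helper `notify_fc_processed`, and both session classes are assumptions made for this example; the PR's actual signatures may differ.

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable


@runtime_checkable
class DualChatContextSyncSession(Protocol):
    """Sessions that track in-flight function calls (method name is hypothetical)."""

    def mark_fc_processed(self, item_ids: list[str]) -> None: ...


class PlainSession:
    """A session without in-flight tracking; it does not satisfy the protocol."""


class TrackingSession:
    """A session that keeps an in-flight set and satisfies the protocol structurally."""

    def __init__(self) -> None:
        self.inflight_fc_ids: set[str] = {"fc_1", "fc_2"}

    def mark_fc_processed(self, item_ids: list[str]) -> None:
        self.inflight_fc_ids.difference_update(item_ids)


def notify_fc_processed(session: object, item_ids: list[str]) -> bool:
    """Signal the session only if it supports the protocol; report whether it did."""
    if isinstance(session, DualChatContextSyncSession):
        session.mark_fc_processed(item_ids)
        return True
    return False


tracking = TrackingSession()
print(notify_fc_processed(PlainSession(), ["fc_1"]))  # False
print(notify_fc_processed(tracking, ["fc_1"]))  # True
print(tracking.inflight_fc_ids)  # {'fc_2'}
```

Because the check is structural, sessions from other plugins are simply skipped, and the abstract base class needs no new method.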

Future consideration

This fix only tracks function_call items. function_call_output items are currently client-initiated (manual_function_calls=True), so they enter _agent._chat_ctx before _remote_chat_ctx and are not vulnerable to this race. If auto_tool_reply_generation is enabled in a future configuration (server-generated outputs), a guard should be added to cover function_call_output items as well.

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

Comment on lines 2682 to +2683
await utils.aio.cancel_and_wait(exe_task)
_notify_fc_processed(function_calls)

🟡 Incomplete cleanup of _inflight_fc_ids on interruption leaks stale entries that permanently block deletion

When a speech is interrupted, _notify_fc_processed(function_calls) is called at line 2683 to release all in-flight function call IDs. However, function_calls is populated by the _read_fnc_stream task, which was already cancelled at line 2632 (await utils.aio.cancel_and_wait(*tasks)). This means function_calls may be incomplete — it might not include function calls whose IDs were already added to _inflight_fc_ids by the server's conversation.item.added event.

Root Cause: timing gap between server event and stream reader

The _inflight_fc_ids set is populated in _handle_conversion_item_added (realtime_model.py:1510-1511) when the server sends conversation.item.added (step 3 in the event sequence). However, a function call only enters function_calls after response.output_item.done (step 8) pushes it to function_ch, and then _read_fnc_stream reads it from the tee.

Two scenarios cause a leak:

  1. Response cancelled before response.output_item.done: The server creates the function_call item (step 3) but cancels the response before step 8. The function call never reaches function_ch, so it's never in function_calls.

  2. _read_fnc_stream cancelled before reading buffered items: Even if step 8 fires and the item is in the tee buffer, _read_fnc_stream is cancelled at agent_activity.py:2632 before reading it.

In both cases, the stale ID remains in _inflight_fc_ids indefinitely. Every future update_chat_ctx call sees the function_call in _remote_chat_ctx but not in local context, yet the _inflight_fc_ids guard (realtime_model.py:1189-1190) prevents deletion. The item permanently occupies space in the server's conversation context window.

Impact: After an interruption during function call generation, orphaned function_call items can accumulate in the remote context and can never be cleaned up by update_chat_ctx (e.g., during summarization), causing gradual context window waste.

Prompt for agents
In livekit-agents/livekit/agents/voice/agent_activity.py, at lines 2681-2684 (the interrupted return path), the _notify_fc_processed(function_calls) call uses an incomplete list because _read_fnc_stream was already cancelled. To fix this, instead of relying on function_calls (which comes from the cancelled stream reader), the cleanup should directly clear all inflight IDs for this generation. One approach: add a method to the DualChatContextSyncSession protocol (and its implementations in realtime_model.py and realtime_model_beta.py) like clear_all_inflight_fcs() that does self._inflight_fc_ids.clear(), and call it in the interrupted path. Alternatively, track which response's function calls are inflight (keyed by response_id) so only the relevant IDs are cleared. The simplest safe fix is to call a method that clears all inflight IDs when the generation is interrupted, since no other generation should be active concurrently.

@StianHanssen
Contributor Author

Ah, yes, function_calls can be incomplete if _read_fnc_stream was cancelled before consuming all items from the stream. However, this is expected and covered by a secondary clean-up path: _handle_conversion_item_deleted in realtime_model.py calls self._inflight_fc_ids.discard(event.item_id) whenever the server deletes an item. During interruption, the server typically deletes/truncates these items, which triggers the clean-up.

In the worst case (server retains the item), the ID lingers in _inflight_fc_ids, which prevents update_chat_ctx from deleting it, but I believe that is the correct behaviour, since the item legitimately exists on the server and deleting it would cause the cascade corruption this fix prevents.
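The two outcomes described in this reply can be sketched as follows. `ToySession` and its handler names are hypothetical simplifications; the only behaviour taken from the discussion is that a server-side delete discards the ID from the in-flight set.

```python
from __future__ import annotations


class ToySession:
    """Toy model of the remote context plus the in-flight set."""

    def __init__(self) -> None:
        self.remote_ids: list[str] = []
        self.inflight_fc_ids: set[str] = set()

    def handle_item_added(self, item_id: str) -> None:
        # Server created a function_call item: track it as in flight.
        self.remote_ids.append(item_id)
        self.inflight_fc_ids.add(item_id)

    def handle_item_deleted(self, item_id: str) -> None:
        # Secondary clean-up path: a server-side delete releases the ID.
        if item_id in self.remote_ids:
            self.remote_ids.remove(item_id)
        self.inflight_fc_ids.discard(item_id)


session = ToySession()
session.handle_item_added("fc_1")
session.handle_item_added("fc_2")

# Interruption: neither call was acknowledged by the agent framework.
# Assume the server deletes fc_1 but retains fc_2.
session.handle_item_deleted("fc_1")

print(sorted(session.inflight_fc_ids))  # ['fc_2']
print(session.remote_ids)  # ['fc_2']
```

Here fc_2 stays guarded because it still exists in the remote context, which is the worst case described above.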

@longcw
Contributor

longcw commented Feb 27, 2026

thanks for the investigation on this issue.

the core issue is that callers base their modifications on agent.chat_ctx (which is missing inflight items), but the diff runs against _remote_chat_ctx (which has them). Instead of tracking inflight IDs, what if we ensure the context passed to update_chat_ctx is always derived from the latest remote state?

The flow would be:

  1. Caller snapshots context for summarization (v1) — from agent or session, doesn't matter
  2. Summarize v1 → summary
  3. At sync time, update_chat_ctx takes a fresh copy of _remote_chat_ctx (v2)
  4. Replace the items from v1 in v2 with summary, leaving everything else (e.g. inflight FC1) untouched
  5. Diff _remote_chat_ctx vs v2
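The steps above can be sketched roughly as follows, with items reduced to ID strings. `rebase_summary` and `diff_delete_ids` are hypothetical helper names for this illustration, not functions from the codebase.

```python
from __future__ import annotations


def rebase_summary(remote: list[str], snapshot: list[str], summary_id: str) -> list[str]:
    """Build v2: a fresh copy of remote where the snapshot items collapse into one summary.

    If none of the snapshot items are present in remote, the copy is returned unchanged.
    """
    snapshot_ids = set(snapshot)
    result: list[str] = []
    inserted = False
    for item_id in remote:
        if item_id in snapshot_ids:
            # Put the summary where the first summarized item used to be.
            if not inserted:
                result.append(summary_id)
                inserted = True
            continue
        result.append(item_id)
    return result


def diff_delete_ids(remote: list[str], target: list[str]) -> list[str]:
    """IDs that exist remotely but not in the target context."""
    target_ids = set(target)
    return [item_id for item_id in remote if item_id not in target_ids]


v1 = ["msg_1", "msg_2"]  # snapshot taken before summarization
remote_now = ["msg_1", "msg_2", "fc_1"]  # fc_1 arrived after the snapshot

v2 = rebase_summary(remote_now, v1, "summary_1")
print(v2)  # ['summary_1', 'fc_1']
print(diff_delete_ids(remote_now, v2))  # ['msg_1', 'msg_2']
```

In this example fc_1 survives because it was never part of the snapshot, so the rebase leaves it in place.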

@StianHanssen
Contributor Author

StianHanssen commented Feb 27, 2026

@longcw Thank you for your thoughts!
Sorry, I don't fully understand your suggestion.

  4. Replace the items from v1 in v2 with summary, leaving everything else (e.g. inflight FC1) untouched

In your approach, when updating from v1 to v2, how do you know which items to change to make v2? How do you know if a FunctionCall is inflight or intentionally deleted from v1?

@longcw
Contributor

longcw commented Feb 27, 2026

How do you know if a FunctionCall is inflight or intentionally deleted from v1?

I see your point: that approach only protects function calls that arrived after the snapshot.
