Fix: race condition in update_chat_ctx deletes server-created function calls by StianHanssen · Pull Request #4960 · livekit/agents

StianHanssen · 2026-02-26T13:50:21Z

Summary

update_chat_ctx can delete in-flight function_call items from the OpenAI Realtime server, causing cascading "failed to insert item: previous_item_id not found" corruption of _remote_chat_ctx.

The root cause is a timing gap between two context-tracking structures:

_remote_chat_ctx: Updated immediately when the server sends conversation.item.added
_agent._chat_ctx: Updated later, only when tool execution starts (_tool_execution_started_cb)

If update_chat_ctx runs during this window (e.g. from summarization), the diff sees the function_call in remote but not in local, treats it as intentionally removed, and sends a delete event. The _is_content_empty guard only protects message items; function_call items pass through unconditionally.
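The window described above can be modelled in a few lines. This is a simplified sketch, not the plugin's actual diff code: `Item` and `ids_to_delete` are hypothetical stand-ins for the chat-context types and for the deletion half of the diff.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    id: str
    type: str  # "message" or "function_call"


def ids_to_delete(remote: list[Item], local: list[Item]) -> list[str]:
    """Naive diff: anything present remotely but absent locally is deleted."""
    local_ids = {item.id for item in local}
    return [item.id for item in remote if item.id not in local_ids]


# The server has already announced the function call (conversation.item.added)...
remote_ctx = [Item("msg_1", "message"), Item("fc_1", "function_call")]
# ...but tool execution has not started yet, so the agent-side context lacks it.
local_ctx = [Item("msg_1", "message")]

race_deletions = ids_to_delete(remote_ctx, local_ctx)
print(race_deletions)  # ['fc_1']
```

In this model the diff cannot tell "not yet acknowledged locally" apart from "removed on purpose", which is the ambiguity the fix addresses.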

I created a unit test gist that replicates the exact pipeline and demonstrates how update_chat_ctx deletes in-flight function_call items. Note that the "test" passes when the failure scenario occurs.

Fix

Track in-flight function calls with an _inflight_fc_ids set on the OpenAI RealtimeSession. Items enter the set when the server creates them and leave when the agent framework acknowledges them (tool execution starts, is rejected, or is interrupted). _create_update_chat_ctx_events skips deletion for any item still in the set.
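A minimal sketch of the guard, assuming a simplified session that only deals in item IDs. Only `_inflight_fc_ids` is taken from this PR; the class name `InflightGuardSketch` and its method names are hypothetical.

```python
from __future__ import annotations


class InflightGuardSketch:
    """Hypothetical stand-in for the session-side bookkeeping."""

    def __init__(self) -> None:
        self._inflight_fc_ids: set[str] = set()

    def on_server_item_added(self, item_id: str, item_type: str) -> None:
        # Server-created function calls enter the set immediately.
        if item_type == "function_call":
            self._inflight_fc_ids.add(item_id)

    def on_function_call_processed(self, item_id: str) -> None:
        # Called once tool execution starts, is rejected, or is interrupted.
        self._inflight_fc_ids.discard(item_id)

    def ids_to_delete(self, remote_ids: list[str], local_ids: list[str]) -> list[str]:
        # Skip deletion for anything the agent has not acknowledged yet.
        local = set(local_ids)
        return [
            item_id
            for item_id in remote_ids
            if item_id not in local and item_id not in self._inflight_fc_ids
        ]


guard = InflightGuardSketch()
guard.on_server_item_added("fc_1", "function_call")
remote = ["msg_old", "msg_1", "fc_1"]
local = ["msg_1"]

during_window = guard.ids_to_delete(remote, local)  # fc_1 is protected
guard.on_function_call_processed("fc_1")
after_ack = guard.ids_to_delete(remote, local)  # fc_1 is deletable again
```

Once the call is acknowledged, an absence from the local context is treated as an intentional removal again, so normal diff behaviour resumes.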

A runtime_checkable Protocol (DualChatContextSyncSession) bridges the signalling between livekit-agents (where tool execution happens) and livekit-plugins-openai (where the set lives), without modifying the abstract RealtimeSession base class.
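The bridging pattern might look roughly like the sketch below. The protocol name comes from this PR, but the method name `notify_function_call_processed`, the helper `notify_fc_processed`, and both session classes are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Iterable, Protocol, runtime_checkable


@runtime_checkable
class DualChatContextSyncSession(Protocol):
    # Hypothetical method name; the PR's actual signature may differ.
    def notify_function_call_processed(self, item_id: str) -> None: ...


class TrackingSession:
    """Stand-in for a session that tracks in-flight function calls."""

    def __init__(self) -> None:
        self.inflight_fc_ids: set[str] = {"fc_1", "fc_2"}

    def notify_function_call_processed(self, item_id: str) -> None:
        self.inflight_fc_ids.discard(item_id)


class PlainSession:
    """Stand-in for a session type that does not implement the protocol."""


def notify_fc_processed(session: object, item_ids: Iterable[str]) -> bool:
    # isinstance against a runtime_checkable Protocol only checks that the
    # method exists, so unrelated session types are skipped without changes
    # to any shared base class.
    if not isinstance(session, DualChatContextSyncSession):
        return False
    for item_id in item_ids:
        session.notify_function_call_processed(item_id)
    return True


tracking = TrackingSession()
notified = notify_fc_processed(tracking, ["fc_1"])
skipped = notify_fc_processed(PlainSession(), ["fc_1"])
```

The structural check keeps the framework side decoupled from the plugin: any session exposing the method participates, and everything else is ignored.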

Future consideration

This fix only tracks function_call items. function_call_output items are currently client-initiated (manual_function_calls=True), so they enter _agent._chat_ctx before _remote_chat_ctx and are not vulnerable to this race. If auto_tool_reply_generation is enabled in a future configuration (server-generated outputs), a guard should be added to cover function_call_output items as well.

devin-ai-integration

Devin Review found 1 new potential issue.

devin-ai-integration · 2026-02-26T14:05:17Z

livekit-agents/livekit/agents/voice/agent_activity.py

            await utils.aio.cancel_and_wait(exe_task)
+            _notify_fc_processed(function_calls)


🟡 Incomplete cleanup of _inflight_fc_ids on interruption leaks stale entries that permanently block deletion

When a speech is interrupted, _notify_fc_processed(function_calls) is called at line 2683 to release all in-flight function call IDs. However, function_calls is populated by the _read_fnc_stream task, which was already cancelled at line 2632 (await utils.aio.cancel_and_wait(*tasks)). This means function_calls may be incomplete — it might not include function calls whose IDs were already added to _inflight_fc_ids by the server's conversation.item.added event.

Root Cause: timing gap between server event and stream reader

The _inflight_fc_ids set is populated in _handle_conversion_item_added (realtime_model.py:1510-1511) when the server sends conversation.item.added (step 3 in the event sequence). However, a function call only enters function_calls after response.output_item.done (step 8) pushes it to function_ch, and then _read_fnc_stream reads it from the tee.

Two scenarios cause a leak:

Response cancelled before response.output_item.done: The server creates the function_call item (step 3) but cancels the response before step 8. The function call never reaches function_ch, so it's never in function_calls.

_read_fnc_stream cancelled before reading buffered items: Even if step 8 fires and the item is in the tee buffer, _read_fnc_stream is cancelled at agent_activity.py:2632 before reading it.

In both cases, the stale ID remains in _inflight_fc_ids indefinitely. Every future update_chat_ctx call sees the function_call in _remote_chat_ctx but not in local context, yet the _inflight_fc_ids guard (realtime_model.py:1189-1190) prevents deletion. The item permanently occupies space in the server's conversation context window.

Impact: After an interruption during function call generation, orphaned function_call items can accumulate in the remote context and can never be cleaned up by update_chat_ctx (e.g., during summarization), causing gradual context window waste.

Prompt for agents

In livekit-agents/livekit/agents/voice/agent_activity.py, at lines 2681-2684 (the interrupted return path), the _notify_fc_processed(function_calls) call uses an incomplete list because _read_fnc_stream was already cancelled. To fix this, instead of relying on function_calls (which comes from the cancelled stream reader), the cleanup should directly clear all inflight IDs for this generation. One approach: add a method to the DualChatContextSyncSession protocol (and its implementations in realtime_model.py and realtime_model_beta.py) like clear_all_inflight_fcs() that does self._inflight_fc_ids.clear(), and call it in the interrupted path. Alternatively, track which response's function calls are inflight (keyed by response_id) so only the relevant IDs are cleared. The simplest safe fix is to call a method that clears all inflight IDs when the generation is interrupted, since no other generation should be active concurrently.
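The two alternatives named in the prompt above could be sketched as follows. This is illustrative only: `clear_all_inflight_fcs` is the name suggested in the review, while `ResponseScopedInflight` and its other methods are hypothetical and do not exist in the PR.

```python
from __future__ import annotations

from collections import defaultdict


class ResponseScopedInflight:
    """Hypothetical tracker that groups in-flight IDs by response_id."""

    def __init__(self) -> None:
        self._by_response: dict[str, set[str]] = defaultdict(set)

    def add(self, response_id: str, item_id: str) -> None:
        self._by_response[response_id].add(item_id)

    def discard(self, item_id: str) -> None:
        for ids in self._by_response.values():
            ids.discard(item_id)

    def clear_response(self, response_id: str) -> None:
        # Narrow option: release only the interrupted response's IDs.
        self._by_response.pop(response_id, None)

    def clear_all_inflight_fcs(self) -> None:
        # Blunt option: release everything on interruption.
        self._by_response.clear()

    def all_ids(self) -> set[str]:
        return set().union(*self._by_response.values())


tracker = ResponseScopedInflight()
tracker.add("resp_1", "fc_1")
tracker.add("resp_1", "fc_2")
tracker.add("resp_2", "fc_3")

tracker.clear_response("resp_1")
remaining = tracker.all_ids()
```

Keying by response keeps the cleanup independent of what the cancelled stream reader managed to consume, at the cost of carrying the response ID through the item-added handler.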

Ah, yes, function_calls can be incomplete if _read_fnc_stream was cancelled before consuming all items from the stream. However, this is expected and covered by a secondary clean-up path: _handle_conversion_item_deleted in realtime_model.py calls self._inflight_fc_ids.discard(event.item_id) whenever the server deletes an item. During interruption, the server typically deletes/truncates these items, which triggers the clean-up.

In the worst case (server retains the item), the ID lingers in _inflight_fc_ids, which prevents update_chat_ctx from deleting it, but I believe that is the correct behaviour, since the item legitimately exists on the server and deleting it would cause the cascade corruption this fix prevents.
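A rough model of the secondary clean-up path and of the worst case described above. `SessionSketch` and its handler names are hypothetical simplifications of the real event handlers; only `_inflight_fc_ids` and the `discard` call mirror the PR.

```python
from __future__ import annotations


class SessionSketch:
    """Hypothetical, simplified model of the session's event handling."""

    def __init__(self) -> None:
        self._inflight_fc_ids: set[str] = set()
        self._remote_ids: list[str] = []

    def handle_item_added(self, item_id: str, item_type: str) -> None:
        self._remote_ids.append(item_id)
        if item_type == "function_call":
            self._inflight_fc_ids.add(item_id)

    def handle_item_deleted(self, item_id: str) -> None:
        # Secondary clean-up: a server-side delete also releases the ID.
        if item_id in self._remote_ids:
            self._remote_ids.remove(item_id)
        self._inflight_fc_ids.discard(item_id)


session = SessionSketch()
session.handle_item_added("fc_1", "function_call")
session.handle_item_added("fc_2", "function_call")

# Interruption: the framework never acknowledges either call.
# The server deletes fc_1 but retains fc_2.
session.handle_item_deleted("fc_1")

lingering = set(session._inflight_fc_ids)
still_remote = list(session._remote_ids)
```

In this model the lingering ID always corresponds to an item that still exists remotely, which matches the argument that keeping it protected is the safer outcome.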

longcw · 2026-02-27T04:40:11Z

thanks for the investigation on this issue.

the core issue is that callers base their modifications on agent.chat_ctx (which is missing inflight items), but the diff runs against _remote_chat_ctx (which has them). Instead of tracking inflight IDs, what if we ensure the context passed to update_chat_ctx is always derived from the latest remote state?

The flow would be:

1. Caller snapshots context for summarization (v1), from agent or session, doesn't matter
2. Summarize v1 → summary
3. At sync time, update_chat_ctx takes a fresh copy of _remote_chat_ctx (v2)
4. Replace the items from v1 in v2 with summary, leaving everything else (e.g. inflight FC1) untouched
5. Diff _remote_chat_ctx vs v2
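The proposed flow can be sketched on plain ID lists. This is one reading of the suggestion rather than code from the PR; `rebase_summary` and the sample IDs are hypothetical.

```python
from __future__ import annotations


def rebase_summary(
    remote_ids: list[str], snapshot_ids: list[str], summary_id: str
) -> list[str]:
    """Build v2 from the fresh remote state: swap snapshot items for one summary."""
    snapshot = set(snapshot_ids)
    v2: list[str] = []
    inserted = False
    for item_id in remote_ids:
        if item_id in snapshot:
            # The first snapshot item is replaced by the summary; the rest are dropped.
            if not inserted:
                v2.append(summary_id)
                inserted = True
            continue
        # Anything not in the snapshot (e.g. an in-flight call) is kept as-is.
        v2.append(item_id)
    return v2


v1 = ["msg_1", "msg_2"]                  # snapshot taken for summarization
remote_now = ["msg_1", "msg_2", "fc_1"]  # fc_1 arrived after the snapshot

v2 = rebase_summary(remote_now, v1, "summary_1")
deleted = [item_id for item_id in remote_now if item_id not in v2]
```

Note that this sketch keeps every remote item that is absent from the snapshot, so it has no way to express an intentional removal of such an item.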

StianHanssen · 2026-02-27T05:50:13Z

@longcw Thank you for your thoughts!
Sorry, I am not fully understanding your suggestion.

> Replace the items from v1 in v2 with summary, leaving everything else (e.g. inflight FC1) untouched

In your approach, when updating from v1 to v2, how do you know which items to change to make v2? How do you know if a FunctionCall is inflight or intentionally deleted from v1?

longcw · 2026-02-27T06:20:34Z

> How do you know if a FunctionCall is inflight or intentionally deleted from v1?

I see your point, that approach only protects function calls that arrived after the snapshot.

StianHanssen added 3 commits February 26, 2026 22:26

feat(realtime.py): add a protocol for realtime sessions with dual cha…

1c7c834

…t context state

feat(openai/realtime): add diff guard for inflight function calls

0a9747d

feat(agent_actiity.py): notify realtime sessions with dual chat conte…

1a3dadc

…xt state that tool call is processed

fix(agent_activity.py): ensure function call processed is always sent

d8b2f99

devin-ai-integration bot reviewed Feb 26, 2026

StianHanssen wants to merge 4 commits into livekit:main from StianHanssen:fix-tool-sync-can-cause-item-loss
