
fix: file descriptors leaks #632

Open
LautaroPetaccio wants to merge 6 commits into livekit:main from LautaroPetaccio:fix/file-descriptors-leaks


@LautaroPetaccio

When multiple participants connect and disconnect rapidly from a room, native FFI handles, stream controllers, and event listeners accumulate without being released. Each leaked resource holds an open file descriptor. Under sustained churn (the typical pattern for applications where participants join and leave frequently), the process eventually exhausts its FD limit and crashes. Additionally, stream readers accumulate buffered data that is never released, causing gradual memory growth.

There are eleven independent leak vectors, each small on its own, but compounding under load:

  1. In-progress stream controllers survive disconnect().
    Room.disconnect() removed event listeners and called removeAllListeners(), but never closed the ReadableStreamDefaultController instances stored in byteStreamControllers and textStreamControllers. If a sender disconnects mid-transfer (so no trailer is ever received), the receiving side's stream stays open forever. Each open stream holds a controller reference and any buffered chunks, preventing GC.

  2. Track publication handles are never disposed on remote participant disconnect.
    When the room processes a participantDisconnected event, it deletes the participant from the remoteParticipants map and emits an event, but never touches the participant's trackPublications. Each TrackPublication wraps an FfiHandle that maps to a native resource. Dropping the JS reference without calling dispose() means the native side never frees the handle. A participant that published M tracks leaks M handles when it disconnects, so N such disconnects leak N×M handles.

  3. Audio and video stream native handles leak on normal stream end.
    AudioStreamSource and VideoStreamSource register an event listener on FfiClient in their constructor and create an FfiHandle for the native stream. On eos (end-of-stream), they removed the listener and closed the controller but never called ffiHandle.dispose(). The handle was only disposed in the cancel() path. Streams that end normally (the common case) leaked the native handle. Additionally, AudioStreamSource.cancel() did not close the frameProcessor, so cancelling a noise-cancelled stream leaked the processor.

  4. AudioResampler has no way to release its native handle.
    The class creates an FfiHandle in its constructor but exposes no close() or dispose() method. Every resampler instance leaks for the lifetime of the process.

  5. AudioSource.close() leaves a dangling setTimeout.
    captureFrame() schedules a timeout that calls this.release after the current queue drains. close() disposed the native handle and set closed = true, but never cleared the timeout. The callback fires after disposal, referencing freed native state. While not a direct FD leak, it causes use-after-free on the native side that can prevent the handle from being fully released.

  6. Concurrent getSid() calls each register independent listeners.
    Each call creates its own RoomSidChanged + Disconnected listener pair. If multiple calls race, only one resolves the SID and cleans up its listener — the rest stay attached until disconnect, at which point only the Disconnected listener is cleaned per call. The RoomSidChanged listeners from already-resolved calls persist.

  7. FfiClient.waitFor() listeners leak when their predicate never matches.
    waitFor() registers a listener on the FfiClient EventEmitter that is only removed when the predicate returns true. If the room disconnects before the expected FFI callback arrives (e.g. a publishData call is in-flight during disconnect), the listener stays attached forever. Every await waitFor() call site (~28 across participant.ts, room.ts, and audio_source.ts) is affected. Under rapid connect/disconnect cycles, this is the primary source of unbounded listener accumulation on the singleton FfiClient.

  8. WritableStream.abort() in streamText/streamBytes doesn't close the remote stream.
    The abort() handlers only logged the error. If a WritableStream is aborted due to a write failure, the remote side's ReadableStreamDefaultController stays open waiting for chunks that will never arrive. This is the same leak pattern as vector 1, but triggered from the sender side.

  9. Stream reader async iterator lock not released on done/error.
    Both ByteStreamReader and TextStreamReader async iterators only released the reader lock in the explicit return() method. If the stream completed normally (done: true) or an error was caught in next(), the lock stayed held. Consumers whose iteration never reaches return() (e.g. a for-await loop that runs to completion or fails inside next(), manual next() calls, or readAll()) would leave the underlying ReadableStream locked and unable to be garbage-collected.

  10. AudioMixer leaks orphaned setTimeout timers on every mixing iteration.
    getContribution() uses Promise.race([iterator.next(), this.timeout(ms)]) to race the stream against a deadline. When the iterator wins (the normal case), the losing timeout's setTimeout is never cleared. It fires harmlessly but creates an orphaned timer on every iteration. Under sustained mixing this creates a steady stream of wasted work, and in pathological cases can pressure the timer heap.

  11. TextStreamReader.receivedChunks accumulates indefinitely (memory leak).
    The receivedChunks Map stores every chunk received during the stream's lifetime for version-based deduplication and progress tracking. It is never cleared — not when the stream is fully consumed, not on error, and not when the consumer breaks out of iteration. For large or frequent text streams, this keeps all chunk data in memory for the lifetime of the reader object, causing gradual memory growth.

How

1. Close stream controllers in Room.disconnect() — room.ts

After the FFI disconnect callback completes but before removing listeners, iterate both byteStreamControllers and textStreamControllers, call controller.close() on each (wrapped in try/catch since a controller may already be closed), and clear the maps. This ensures streams that never received a trailer are properly terminated.
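
A minimal sketch of this cleanup, assuming the controllers live in a Map keyed by stream ID. closeAllControllers is a hypothetical helper name, not the code in this PR, and its return value exists only to make the behaviour observable:

```typescript
// Hypothetical helper (not from the PR): close every pending controller,
// then empty the map so nothing keeps the streams reachable.
function closeAllControllers<T>(
  controllers: Map<string, ReadableStreamDefaultController<T>>,
): number {
  let closed = 0;
  controllers.forEach((controller) => {
    try {
      controller.close();
      closed += 1;
    } catch {
      // close() throws a TypeError if the stream is already closed or errored.
    }
  });
  controllers.clear();
  return closed;
}

// A stream whose sender vanished mid-transfer: no trailer ever closes it.
const pendingControllers = new Map<string, ReadableStreamDefaultController<Uint8Array>>();
const orphanedStream = new ReadableStream<Uint8Array>({
  start(controller) {
    pendingControllers.set('stream-1', controller);
  },
});
```

Calling closeAllControllers(pendingControllers) from the disconnect path ends orphanedStream for its reader, and a second call is a no-op because the map is already empty.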

2. Dispose track publication handles on participant disconnect — room.ts

In the participantDisconnected event handler, before deleting the participant from remoteParticipants and emitting the event, loop through participant.trackPublications, call ffiHandle.dispose() on each publication, and clear the map. The emit happens after cleanup so consumers see the participant in a clean state.
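
The ordering can be sketched as follows. HandleLike, PublicationLike, ParticipantLike and removeParticipant are hypothetical stand-ins for FfiHandle, TrackPublication, RemoteParticipant and the event handler; only the dispose, clear, delete, emit order is taken from the description above:

```typescript
// Hypothetical shapes; the real classes carry far more state.
interface HandleLike {
  dispose(): void;
}
interface PublicationLike {
  sid: string;
  ffiHandle: HandleLike;
}
interface ParticipantLike {
  identity: string;
  trackPublications: Map<string, PublicationLike>;
}

function removeParticipant(
  remoteParticipants: Map<string, ParticipantLike>,
  identity: string,
  emitDisconnected: (participant: ParticipantLike) => void,
): void {
  const participant = remoteParticipants.get(identity);
  if (!participant) return;

  // Release every native handle before the JS references are dropped.
  participant.trackPublications.forEach((pub) => pub.ffiHandle.dispose());
  participant.trackPublications.clear();

  remoteParticipants.delete(identity);
  // Emit last, so listeners observe a participant with no live publications.
  emitDisconnected(participant);
}
```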

3. Dispose native handles on audio/video stream EOS — audio_stream.ts, video_stream.ts

In the eos case of both AudioStreamSource.onEvent and VideoStreamSource.onEvent, add this.ffiHandle.dispose() after closing the controller. This makes the EOS path symmetric with the cancel() path. Also add this.frameProcessor?.close() to AudioStreamSource.cancel() so both termination paths release the processor.
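
One way to keep the two termination paths symmetric is to route both through a single teardown, sketched below. StreamSourceSketch, NativeHandle and FrameProcessorLike are hypothetical names, and listener removal and controller.close() are left out. The run-once guard is an assumption of this sketch, added because nothing here establishes that dispose() is safe to call twice:

```typescript
// Hypothetical stand-ins for FfiHandle and the frame processor.
interface NativeHandle {
  dispose(): void;
}
interface FrameProcessorLike {
  close(): void;
}

class StreamSourceSketch {
  private released = false;

  constructor(
    private readonly ffiHandle: NativeHandle,
    private readonly frameProcessor?: FrameProcessorLike,
  ) {}

  // Called when the native side reports end-of-stream.
  onEos(): void {
    this.release();
  }

  // Called when the consumer cancels the stream.
  cancel(): void {
    this.release();
  }

  // Shared teardown: runs at most once, whichever path gets there first.
  private release(): void {
    if (this.released) return;
    this.released = true;
    this.ffiHandle.dispose();
    this.frameProcessor?.close();
  }
}
```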

4. Add close() to AudioResampler — audio_resampler.ts

Add a public close() method that calls this.#ffiHandle.dispose(). This follows the same pattern as AudioSource.close() and VideoSource.close(), giving callers a way to release the native resampler when done.

5. Clear timeout in AudioSource.close() — audio_source.ts

Before disposing the handle, check for a pending this.timeout and call clearTimeout() + set it to undefined. This prevents the scheduled release callback from firing after the native handle is freed.
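
A minimal sketch of the ordering, assuming the pending timer is stored in a timeout field. AudioSourceSketch, scheduleRelease and hasPendingRelease are hypothetical and stand in for the real captureFrame() bookkeeping:

```typescript
class AudioSourceSketch {
  private timeout?: ReturnType<typeof setTimeout>;
  closed = false;
  releaseCalls = 0;

  constructor(private readonly handle: { dispose(): void }) {}

  // Stand-in for the timer that captureFrame() schedules.
  scheduleRelease(delayMs: number): void {
    if (this.timeout !== undefined) clearTimeout(this.timeout);
    this.timeout = setTimeout(() => {
      this.timeout = undefined;
      this.releaseCalls += 1;
    }, delayMs);
  }

  hasPendingRelease(): boolean {
    return this.timeout !== undefined;
  }

  close(): void {
    // Cancel the pending callback first so it can never run after disposal.
    if (this.timeout !== undefined) {
      clearTimeout(this.timeout);
      this.timeout = undefined;
    }
    this.handle.dispose();
    this.closed = true;
  }
}
```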

6. Deduplicate getSid() listeners — room.ts

Add a private sidPromise?: Promise<string> field. On the first getSid() call that needs to wait, create the promise and store it. Subsequent concurrent calls return the same promise. When the promise resolves or rejects, sidPromise is cleared so future calls after a reconnect work normally. This ensures exactly one RoomSidChanged + Disconnected listener pair exists regardless of how many callers are waiting.
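
The shared-promise idea can be sketched like this. RoomSidSketch and its Set-based listener list are hypothetical simplifications, and the Disconnected rejection path is omitted for brevity. Note that getSid() is deliberately not an async function here, because an async function would wrap the stored promise in a fresh one on every call:

```typescript
type SidListener = (sid: string) => void;

class RoomSidSketch {
  private sid = '';
  private sidPromise?: Promise<string>;
  // Stands in for RoomSidChanged listeners on the room's emitter.
  private sidListeners = new Set<SidListener>();

  getSid(): Promise<string> {
    if (this.sid !== '') return Promise.resolve(this.sid);
    if (this.sidPromise) return this.sidPromise; // reuse the in-flight wait

    const pending = new Promise<string>((resolve) => {
      const onSid: SidListener = (sid) => {
        this.sidListeners.delete(onSid);
        this.sidPromise = undefined; // later calls start from a clean state
        resolve(sid);
      };
      this.sidListeners.add(onSid);
    });
    this.sidPromise = pending;
    return pending;
  }

  // Simulates the RoomSidChanged event arriving.
  emitSidChanged(sid: string): void {
    this.sid = sid;
    Array.from(this.sidListeners).forEach((listener) => listener(sid));
  }

  listenerCount(): number {
    return this.sidListeners.size;
  }
}
```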

7. Add AbortSignal support to FfiClient.waitFor() — ffi_client.ts, room.ts, participant.ts

waitFor() now accepts an optional { signal?: AbortSignal } parameter. When the signal fires, the listener is removed from the EventEmitter and the promise rejects. The Room class holds an AbortController that is aborted in disconnect() and reset in connect(). Its signal is passed to LocalParticipant on construction, which threads it into all 14 waitFor() call sites. This ensures every pending FFI listener is cleaned up when the room disconnects, regardless of whether the expected callback ever arrives.
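
A sketch of an abortable waitFor(), assuming a simplified Set-based emitter in place of the real FfiClient EventEmitter. FfiClientSketch, listenerCount() and the error message are hypothetical; only the optional { signal } parameter and the remove-then-reject behaviour come from the description above:

```typescript
type FfiListener<E> = (event: E) => void;

class FfiClientSketch<E> {
  private listeners = new Set<FfiListener<E>>();

  emit(event: E): void {
    Array.from(this.listeners).forEach((listener) => listener(event));
  }

  listenerCount(): number {
    return this.listeners.size;
  }

  waitFor(predicate: (event: E) => boolean, options?: { signal?: AbortSignal }): Promise<E> {
    return new Promise<E>((resolve, reject) => {
      const signal = options?.signal;
      if (signal?.aborted) {
        reject(new Error('waitFor aborted'));
        return;
      }
      // Abort path: detach the listener, then reject the waiter.
      const onAbort = () => {
        this.listeners.delete(onEvent);
        reject(new Error('waitFor aborted'));
      };
      // Match path: detach from both the emitter and the signal.
      const onEvent: FfiListener<E> = (event) => {
        if (!predicate(event)) return;
        this.listeners.delete(onEvent);
        signal?.removeEventListener('abort', onAbort);
        resolve(event);
      };
      this.listeners.add(onEvent);
      signal?.addEventListener('abort', onAbort, { once: true });
    });
  }
}
```

With this shape, a single AbortController owned by the room can release every pending waiter at once by calling abort() during disconnect.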

8. Send trailer on WritableStream abort — participant.ts

Both streamText() and streamBytes() abort handlers now send a DataStream_Trailer with the error reason (best-effort, wrapped in try/catch since the connection may already be gone). This closes the remote side's stream controller instead of leaving it open waiting for data that will never arrive.
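
The abort handler can be sketched with a plain WritableStream. TrailerSketch, createOutgoingStream, sendChunk and sendTrailer are hypothetical; the real DataStream_Trailer has its own fields and is sent through the FFI layer:

```typescript
// Hypothetical trailer shape, reduced to what the sketch needs.
interface TrailerSketch {
  streamId: string;
  reason: string;
}

function createOutgoingStream(
  streamId: string,
  sendChunk: (chunk: string) => Promise<void>,
  sendTrailer: (trailer: TrailerSketch) => Promise<void>,
): WritableStream<string> {
  return new WritableStream<string>({
    write: (chunk) => sendChunk(chunk),
    // Normal completion: an empty reason marks a clean end.
    close: () => sendTrailer({ streamId, reason: '' }),
    // Abort: still tell the receiver, so its controller does not wait forever.
    async abort(reason) {
      try {
        await sendTrailer({ streamId, reason: String(reason) });
      } catch {
        // Best effort: the connection may already be gone.
      }
    },
  });
}
```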

9. Release reader lock on done/error in stream iterators — stream_reader.ts

Both ByteStreamReader and TextStreamReader async iterators now call reader.releaseLock() in the done and catch paths of next(), not just in return(). This ensures the lock is released regardless of how the iteration ends — normal completion, error, or explicit return().
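
A minimal sketch of an async iterator that releases the lock on every exit path. iterateAndRelease is a hypothetical free function rather than the readers' actual Symbol.asyncIterator implementation, and the released flag is an addition of this sketch to keep repeated cleanup harmless:

```typescript
function iterateAndRelease<T>(stream: ReadableStream<T>): AsyncIterableIterator<T> {
  const reader = stream.getReader();
  let released = false;
  const release = () => {
    if (released) return;
    released = true;
    reader.releaseLock();
  };

  return {
    [Symbol.asyncIterator]() {
      return this;
    },
    async next(): Promise<IteratorResult<T>> {
      try {
        const result = await reader.read();
        if (result.done) {
          release(); // normal completion
          return { done: true, value: undefined };
        }
        return { done: false, value: result.value };
      } catch (err) {
        release(); // read failed
        throw err;
      }
    },
    async return(): Promise<IteratorResult<T>> {
      release(); // consumer stopped early
      return { done: true, value: undefined };
    },
  };
}
```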

10. Cancel losing timeout in AudioMixer.timeoutRace() — audio_mixer.ts

Replace the old timeout() method (which returned an uncancellable promise) with timeoutRace(), which returns both the race promise and a clearTimeout handle. After the race resolves in getContribution(), cancel() is called immediately to clear the losing timer. This eliminates orphaned setTimeout callbacks on every mixing iteration.
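
A sketch of a cancellable race under assumed names: this standalone timeoutRace function, the result and cancel fields, and the 'timeout' sentinel value are illustrative, not the mixer's actual signature:

```typescript
function timeoutRace<T>(
  promise: Promise<T>,
  ms: number,
): { result: Promise<T | 'timeout'>; cancel: () => void } {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<'timeout'>((resolve) => {
    timer = setTimeout(() => resolve('timeout'), ms);
  });
  return {
    result: Promise.race([promise, deadline]),
    // Clears the deadline timer; safe to call whichever side won.
    cancel: () => {
      if (timer !== undefined) {
        clearTimeout(timer);
        timer = undefined;
      }
    },
  };
}

// Usage: always cancel once the race has settled.
async function nextOrTimeout<T>(next: Promise<T>, ms: number): Promise<T | 'timeout'> {
  const race = timeoutRace(next, ms);
  try {
    return await race.result;
  } finally {
    race.cancel();
  }
}
```

Putting cancel() in a finally block means the timer is cleared even when the raced promise rejects.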

11. Clear receivedChunks on stream completion — stream_reader.ts

Add this.receivedChunks.clear() in all three termination paths of the TextStreamReader async iterator: the done path (stream fully consumed), the catch path (error during read), and the return() path (consumer breaks out of for-await early). This ensures the buffered chunk data can be garbage-collected as soon as the stream iteration ends, rather than being held for the lifetime of the TextStreamReader instance.

@changeset-bot

changeset-bot bot commented Mar 31, 2026

⚠️ No Changeset found

Latest commit: 1a418c5

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.


devin-ai-integration[bot]

This comment was marked as resolved.


@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 2 new potential issues.

View 8 additional findings in Devin Review.


Comment on lines +109 to 112
// Dispose the native handle so the FD is released on stream end,
// not just when cancel() is called explicitly by the consumer.
this.ffiHandle.dispose();
this.frameProcessor?.close();

🔴 Double-dispose of native FfiHandle when EOS fires with queued frames and consumer cancels

The new EOS handler calls this.ffiHandle.dispose() at line 111, and the cancel() method also calls this.ffiHandle.dispose() at line 123. Per the WHATWG ReadableStream spec, when controller.close() is called (line 108), enqueued frames are still drained by the consumer. If there are queued frames and the consumer cancels before reading them all (e.g., by breaking out of a for await...of loop), the stream is still in the "readable" state (close pending), so the runtime invokes cancel() on the underlying source — leading to ffiHandle.dispose() being called a second time on an already-freed native handle. The same double-invocation applies to this.frameProcessor?.close() on lines 112 and 126.

Prompt for agents
In packages/livekit-rtc/src/audio_stream.ts, add a boolean flag (e.g. `private disposed = false`) to AudioStreamSource. In both the EOS handler (around line 107-113) and the cancel() method (around line 121-127), check and set the flag before calling ffiHandle.dispose() and frameProcessor?.close(). For example:

  private disposed = false;

  // In onEvent EOS case:
  case 'eos':
    FfiClient.instance.off(FfiClientEvent.FfiEvent, this.onEvent);
    this.controller.close();
    if (!this.disposed) {
      this.disposed = true;
      this.ffiHandle.dispose();
      this.frameProcessor?.close();
    }
    break;

  // In cancel():
  cancel() {
    FfiClient.instance.off(FfiClientEvent.FfiEvent, this.onEvent);
    if (!this.disposed) {
      this.disposed = true;
      this.ffiHandle.dispose();
      this.frameProcessor?.close();
    }
  }

Comment on lines +68 to +70
// Dispose the native handle so the FD is released on stream end,
// not just when cancel() is called explicitly by the consumer.
this.ffiHandle.dispose();

🔴 Double-dispose of native FfiHandle in VideoStreamSource when EOS fires with queued frames and consumer cancels

Same issue as in AudioStreamSource: the new EOS handler at line 70 calls this.ffiHandle.dispose(), and cancel() at line 81 also calls this.ffiHandle.dispose(). If EOS fires while there are still enqueued video frames and the consumer cancels the stream before draining them (e.g., breaking from for await...of), the ReadableStream invokes both the EOS cleanup and cancel(), disposing the native handle twice.

Prompt for agents
In packages/livekit-rtc/src/video_stream.ts, add a boolean flag (e.g. `private disposed = false`) to VideoStreamSource. Guard both the EOS handler (around line 65-71) and cancel() (around line 79-82) so that ffiHandle.dispose() is only called once:

  private disposed = false;

  // In onEvent EOS case:
  case 'eos':
    FfiClient.instance.off(FfiClientEvent.FfiEvent, this.onEvent);
    this.controller.close();
    if (!this.disposed) {
      this.disposed = true;
      this.ffiHandle.dispose();
    }
    break;

  // In cancel():
  cancel() {
    FfiClient.instance.off(FfiClientEvent.FfiEvent, this.onEvent);
    if (!this.disposed) {
      this.disposed = true;
      this.ffiHandle.dispose();
    }
  }
