Feat/barge in #1002

Open
lukasIO wants to merge 48 commits into main from feat/barge-in
Conversation

@lukasIO lukasIO commented Jan 29, 2026

Description

Adds improved interruption handling capabilities.

  • new TurnHandling config
  • support for barge-in detection

TODO

  • ensure migrated options are exposed consistently and redundant fields are being deprecated

Summary by CodeRabbit

  • New Features

    • Added adaptive interruption detection (barge-in) supporting HTTP and WebSocket transports.
    • Added model usage tracking across LLM, TTS, and STT with detailed token and duration metrics.
    • Added response latency metrics including time-to-first-token and time-to-first-byte.
  • Improvements

    • Restructured turn handling configuration for better organization.
    • Enhanced stream channel with abort and input stream management.
    • Exposed model and provider identification across inference services.
    • Improved session options structure with backward compatibility.
  • Chores

    • Updated examples with new logging patterns and configuration.

changeset-bot bot commented Jan 29, 2026

🦋 Changeset detected

Latest commit: cfe0362

The changes in this PR will be included in the next version bump.

coderabbitai bot commented Jan 29, 2026

Walkthrough

This PR introduces an adaptive interruption detection system for voice agents with HTTP and WebSocket transport layers, refactors turn handling configuration into a centralized structure, adds comprehensive model usage metrics collection and telemetry enhancements, and extends inference implementations to expose provider and model metadata. Several voice pipeline components are updated to integrate the interruption detector, and backward compatibility is maintained through configuration migration utilities.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Interruption Detection System (`agents/src/inference/interruption/*`) | New complete interruption detection module including types, error handling, caching, HTTP/WebSocket transports, streaming orchestration, defaults, utilities, and detector implementation supporting both adaptive and fallback modes with timeout and retry mechanisms. |
| Turn Handling Configuration (`agents/src/voice/turn_config/*`) | New configuration structures (TurnHandlingConfig, EndpointingConfig, InterruptionConfig) and migration utilities that consolidate turn detection, endpointing, and interruption options with backward-compatible migration from legacy flat options. |
| Voice Pipeline Integration (`agents/src/voice/agent.ts`, `agent_activity.ts`, `agent_session.ts`, `audio_recognition.ts`) | Integration of AdaptiveInterruptionDetector into voice session, agent activity, and audio recognition with new options, lifecycle hooks, overlap speech tracking, stream wiring, and interruption event handling. |
| Stream Channel Enhancement (`agents/src/stream/stream_channel.ts`) | Added generic error type parameter, abort(error: E) method for error-driven closure, and addStreamInput(stream) method for composing multiple input streams. |
| Model Usage Metrics (`agents/src/metrics/model_usage.*`) | New ModelUsageCollector class aggregating LLM, TTS, and STT metrics with per-provider/model usage tracking, token accounting (including cached and modality-specific tokens), and SessionReport integration. |
| Telemetry & Observability (`agents/src/telemetry/trace_types.ts`, `agents/src/voice/generation.ts`, `agents/src/voice/report.ts`, `agents/src/telemetry/traces.ts`) | New trace attributes for interruption metrics, response timing (TTFT/TTFB), E2E latency, and OpenTelemetry provider names; timing instrumentation in LLM/TTS inference; model usage integration into session reports. |
| Provider & Model Accessors (`agents/src/inference/llm.ts`, `stt.ts`, `tts.ts`, `agents/src/llm/llm.ts`, `realtime.ts`, `agents/src/stt/stt.ts`, `agents/src/tts/tts.ts`, `plugins/*/src/{llm,stt,tts,realtime}*.ts`) | New model() and provider() getters across base inference classes and plugin implementations (OpenAI, Google, Deepgram, Cartesia) for consistent provider/model identification and metrics metadata. |
| Inference URL Resolution (`agents/src/inference/utils.ts`, `agents/src/inference/llm.ts`, `stt.ts`, `tts.ts`) | New constants (DEFAULT_INFERENCE_URL, STAGING_INFERENCE_URL) and getDefaultInferenceUrl() function for environment-aware inference endpoint selection; refactored LLM/STT/TTS to use centralized URL resolution. |
| Metrics Base Types (`agents/src/metrics/base.ts`, `agents/src/metrics/index.ts`) | Added MetricsMetadata type and optional metadata fields to LLM/STT/TTS/RealtimeModel metrics; added sessionDurationMs to RealtimeModelMetrics; new exports for model usage utilities. |
| Usage Collection & Deprecation (`agents/src/metrics/usage_collector.ts`) | Added UsageSummary interface and deprecation notice to UsageCollector; new ModelUsageCollector available as replacement with per-model aggregation. |
| Configuration & Examples (`examples/src/*.ts`, `.changeset/*`) | Updated example files to use new turn handling configuration structure, removed legacy UsageCollector calls, added session usage logging via shutdown callbacks; added changesets documenting additions. |
| Dependencies & Build (`agents/package.json`, `.github/workflows/test.yml`, `.changeset/config.json`) | Added ofetch dependency for HTTP transport; commented out test:examples step in CI workflow; reformatted changeset configuration. |
| Tests & Utilities (`agents/src/inference/interruption/utils.test.ts`, `agents/src/voice/turn_config/utils.test.ts`, `agents/src/metrics/model_usage.test.ts`, `agents/src/voice/agent.test.ts`) | New comprehensive test suites for sliding window utilities, configuration migration, and model usage aggregation; added test logger initialization. |

Sequence Diagram(s)

sequenceDiagram
    participant Client as AudioClient
    participant AR as AudioRecognition
    participant ID as AdaptiveInterruptionDetector
    participant Transport as HTTP/WS Transport
    participant Inference as Inference Service
    
    Client->>AR: pushFrame(AudioFrame)
    AR->>AR: Buffer & resample audio
    
    rect rgba(100, 150, 200, 0.5)
        Note over AR,ID: Interruption Detection Flow
        AR->>ID: stream.pushFrame(audio)
        ID->>Transport: Accumulate frames
        Transport->>Inference: POST /bargein (HTTP) or WS message
        Inference-->>Transport: Predictions + probabilities
    end
    
    rect rgba(200, 100, 100, 0.5)
        Note over Transport,AR: Event Generation & Handling
        Transport->>Transport: Build InterruptionEvent
        Transport->>ID: Emit InterruptionEvent
        ID->>AR: onInterruption(event)
        AR->>AR: Update overlap speech tracking
        AR->>AR: Coordinate STT/EOU handling
    end
    
    AR-->>Client: InterruptionEvent
    AR-->>Client: Metrics (with interruption metadata)
sequenceDiagram
    participant Agent as Agent
    participant Session as AgentSession
    participant Activity as AgentActivity
    participant Metrics as MetricsCollector
    participant Report as SessionReport
    
    Agent->>Session: Create with turnHandling config
    Session->>Activity: Initialize with InterruptionConfig
    Activity->>Activity: Setup AdaptiveInterruptionDetector
    
    rect rgba(100, 200, 100, 0.5)
        Note over Session,Metrics: Metrics Collection Flow
        Activity->>Metrics: emit('llm_metrics', ...)
        Activity->>Metrics: emit('tts_metrics', ...)
        Activity->>Metrics: emit('stt_metrics', ...)
    end
    
    rect rgba(200, 150, 100, 0.5)
        Note over Metrics,Report: Model Usage Aggregation
        Metrics->>Metrics: ModelUsageCollector.collect()
        Metrics->>Metrics: Aggregate by provider/model
        Metrics->>Report: Include modelUsage array
    end
    
    Session->>Report: Generate final report
    Report-->>Agent: SessionReport with usage breakdown
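
The "Aggregate by provider/model" step in the diagram above could look roughly like the sketch below. This is an illustration only, not the PR's ModelUsageCollector: the `UsageEvent` shape, its field names, and the `aggregateUsage` function are hypothetical, and the real metrics types carry more fields (cached tokens, durations, modality-specific counts).

```typescript
// Hypothetical metric event; NOT the PR's actual metrics type.
interface UsageEvent {
  provider: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
}

// One aggregated row per provider/model pair.
interface UsageTotals extends UsageEvent {
  requests: number;
}

// Sum token counts and request counts per provider/model pair.
function aggregateUsage(events: UsageEvent[]): UsageTotals[] {
  const totals = new Map<string, UsageTotals>();
  for (const ev of events) {
    const key = `${ev.provider}/${ev.model}`;
    const entry = totals.get(key) ?? {
      provider: ev.provider,
      model: ev.model,
      inputTokens: 0,
      outputTokens: 0,
      requests: 0,
    };
    entry.inputTokens += ev.inputTokens;
    entry.outputTokens += ev.outputTokens;
    entry.requests += 1;
    totals.set(key, entry);
  }
  // Map iteration follows insertion order, so rows appear in first-seen order.
  return Array.from(totals.values());
}
```

Keying on the combined provider/model string keeps two providers that expose a model with the same name from being merged into one row.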

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • chenghao-mou
  • theomonnom

🐰 Hop-skip through the code with glee,
Interruption detection flows so free,
Turn handling config, metrics that gleam,
A voice-agent refactor dream!

🚥 Pre-merge checks | ❌ 3
❌ Failed checks (2 warnings, 1 inconclusive)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description is incomplete and does not follow the repository template. It lacks detailed change documentation, pre-review checklist completion, testing details, and rationale for included changes. | Complete the description template with detailed 'Changes Made' section documenting all major additions (interruption detector, HTTP/WebSocket transports, turn config, telemetry), confirm all checklist items, document testing approach, and explain deprecation/migration strategy for legacy options. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 36.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'Feat/barge in' is vague and uses non-descriptive formatting that doesn't clearly convey the scope of changes. The PR encompasses extensive interruption handling refactoring beyond just barge-in detection. | Consider revising to 'Add interruption detection with barge-in support and turn handling config' or similar to better reflect the full scope of changes including new interruption detection system, turn handling configuration, and stream channel extensions. |

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 245bc66 and 63eccca.

📒 Files selected for processing (8)
  • agents/package.json
  • agents/src/voice/agent_activity.ts
  • agents/src/voice/audio_recognition.ts
  • examples/src/basic_agent.ts
  • examples/src/cartesia_tts.ts
  • examples/src/hedra/hedra_avatar.ts
  • plugins/openai/src/llm.ts
  • plugins/openai/src/tts.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • examples/src/hedra/hedra_avatar.ts
  • plugins/openai/src/llm.ts
  • agents/package.json
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Add SPDX-FileCopyrightText and SPDX-License-Identifier headers to all newly added files with '// SPDX-FileCopyrightText: 2025 LiveKit, Inc.' and '// SPDX-License-Identifier: Apache-2.0'

Files:

  • examples/src/cartesia_tts.ts
  • plugins/openai/src/tts.ts
  • examples/src/basic_agent.ts
  • agents/src/voice/agent_activity.ts
  • agents/src/voice/audio_recognition.ts
**/*.{ts,tsx}?(test|example|spec)

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

When testing inference LLM, always use full model names from agents/src/inference/models.ts (e.g., 'openai/gpt-4o-mini' instead of 'gpt-4o-mini')

Files:

  • examples/src/cartesia_tts.ts
  • plugins/openai/src/tts.ts
  • examples/src/basic_agent.ts
  • agents/src/voice/agent_activity.ts
  • agents/src/voice/audio_recognition.ts
**/*.{ts,tsx}?(test|example)

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Initialize logger before using any LLM functionality with initializeLogger({ pretty: true }) from '@livekit/agents'

Files:

  • examples/src/cartesia_tts.ts
  • plugins/openai/src/tts.ts
  • examples/src/basic_agent.ts
  • agents/src/voice/agent_activity.ts
  • agents/src/voice/audio_recognition.ts
🧠 Learnings (3)
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Applies to **/*.{ts,tsx}?(test|example) : Initialize logger before using any LLM functionality with `initializeLogger({ pretty: true })` from '@livekit/agents'

Applied to files:

  • examples/src/cartesia_tts.ts
  • examples/src/basic_agent.ts
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Applies to **/*.{ts,tsx}?(test|example|spec) : When testing inference LLM, always use full model names from `agents/src/inference/models.ts` (e.g., 'openai/gpt-4o-mini' instead of 'gpt-4o-mini')

Applied to files:

  • examples/src/basic_agent.ts
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Use `pnpm build && pnpm dlx tsx ./examples/src/my_agent.ts dev|download-files --log-level=debug|info(default)` to run example agents from the examples directory

Applied to files:

  • examples/src/basic_agent.ts
🧬 Code graph analysis (1)
examples/src/basic_agent.ts (4)
agents/src/voice/agent.ts (1)
  • session (214-216)
plugins/openai/src/realtime/realtime_model.ts (1)
  • session (316-318)
plugins/openai/src/realtime/realtime_model_beta.ts (1)
  • session (304-306)
agents/src/index.ts (2)
  • voice (40-40)
  • metrics (40-40)
🔇 Additional comments (10)
plugins/openai/src/tts.ts (1)

35-46: Clean, low‑risk accessors for model/provider metadata.
Getters are straightforward and defensive; no behavior changes beyond exposing metadata.

agents/src/voice/agent_activity.ts (5)

98-223: Interruption state wiring looks consistent.

Initialization of interruption flags and defaults is cohesive with the new detection flow.


308-321: AudioRecognition hookup is clean.

Passing the interruption detector and endpointing delay boundaries into AudioRecognition looks correct.


413-442: Runtime turnDetection propagation is solid.

The updateOptions path correctly syncs internal flags and forwards the change to AudioRecognition.


725-785: Interruption gating and onInterruption hook integration look good.

The min-words gating and new onInterruption path are well-aligned with the updated flow.


1348-1496: Telemetry metadata propagation is a nice touch.

Passing model/provider into LLM/TTS inference improves trace quality without side effects.

agents/src/voice/audio_recognition.ts (4)

116-257: Interruption pipeline wiring looks good.

The sentinel flow and overlap/agent speech hooks are well integrated.


407-424: STT hold/flush logic is coherent.

The buffering and release flow reads cleanly and should be stable under overlap.


900-961: Interruption task lifecycle looks robust.

Forwarding frames and handling output events is well-structured and cancel-safe.


846-855: Nice addition of STT model/provider attributes.

This improves traceability for user-turn spans.

@lukasIO lukasIO requested a review from chenghao-mou January 29, 2026 10:32
@lukasIO lukasIO requested a review from toubatbrian January 30, 2026 14:30
@lukasIO lukasIO marked this pull request as ready for review January 30, 2026 14:30
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
agents/src/voice/agent_activity.ts (1)

914-924: ⚠️ Potential issue | 🟡 Minor

Use the new turnHandling.interruption.minWords in logs to avoid undefined values.

The log currently reads the deprecated options.minInterruptionWords. It should reflect the new config path.

🛠️ Suggested fix
         this.logger.info(
           {
             wordCount,
-            minInterruptionWords: this.agentSession.options.minInterruptionWords,
+            minInterruptionWords: this.agentSession.options.turnHandling.interruption?.minWords,
           },
           'skipping user input, word count below minimum interruption threshold',
         );
🤖 Fix all issues with AI agents
In `@agents/src/inference/interruption/interruption_stream.ts`:
- Around line 372-374: The close() implementation on InterruptionStreamBase
currently only closes inputStream and never removes the stream from
AdaptiveInterruptionDetector's Set, causing leaks and stale updateOptions()
calls; modify the close() method (the async close(): Promise<void> in
interruption_stream.ts) to also unregister/remove this stream from the detector
(e.g., call a detector.unregister/removeStream method or delete it from the
detector's internal Set such as detector.streams.delete(this)) after closing the
inputStream so closed streams are no longer tracked or receive updateOptions().
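
A minimal sketch of the unregister-on-close idea described above. The `Detector` and `DetectorStream` classes are hypothetical stand-ins for AdaptiveInterruptionDetector and InterruptionStreamBase; the real close() is async and also closes the input stream, which is omitted here.

```typescript
// Hypothetical stand-in for the detector that tracks its open streams.
class Detector {
  readonly streams = new Set<DetectorStream>();

  createStream(): DetectorStream {
    const stream = new DetectorStream(this);
    this.streams.add(stream);
    return stream;
  }

  // Forward new options to every stream that is still registered.
  updateOptions(threshold: number): void {
    for (const stream of Array.from(this.streams)) {
      stream.threshold = threshold;
    }
  }
}

// Hypothetical stand-in for the stream; only the bookkeeping is shown.
class DetectorStream {
  threshold = 0.5;
  closed = false;
  private readonly detector: Detector;

  constructor(detector: Detector) {
    this.detector = detector;
  }

  close(): void {
    if (this.closed) return;
    this.closed = true;
    // Unregister so the detector stops tracking and updating this stream.
    this.detector.streams.delete(this);
  }
}
```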

In `@agents/src/inference/interruption/ws_transport.ts`:
- Around line 180-191: Cache lookups fail and durations are wrong because
entries are keyed with performance.now() but looked up with message.created_at
and durations mix performance.now() with Date-based timestamps; fix by switching
to a single time base (use millisecond Unix time via
Date.now()/message.created_at) for cache keys and all timing math: when
constructing InterruptionCacheEntry (and when populating state.cache), compute
const eventTsMs = message.created_at ?? Date.now(), use
state.cache.get(eventTsMs) / state.cache.set(eventTsMs, ...) and compute
totalDurationInS as (Date.now() - eventTsMs)/1000 while keeping
detectionDelayInS as (Date.now() - state.overlapSpeechStartedAt)/1000; apply the
same change where similar logic exists (the other InterruptionCacheEntry
creation blocks around the 228-239 and 268-272 regions).
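
The single-time-base approach could be sketched as follows. `InterruptionCache`, `CacheEntry`, and the injected `now` clock are hypothetical names for illustration, and the sketch assumes the server echoes back the same created_at value it received.

```typescript
// Hypothetical cache entry; the PR's InterruptionCacheEntry has more fields.
interface CacheEntry {
  createdAtMs: number;
  totalDurationInS?: number;
}

// Cache keyed by the Unix-millisecond timestamp that is sent as created_at.
class InterruptionCache {
  private readonly entries = new Map<number, CacheEntry>();
  private readonly now: () => number;

  // The clock is injectable so the same time base is used everywhere.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Record an outgoing request and return the key to send as created_at.
  track(): number {
    const createdAtMs = this.now();
    this.entries.set(createdAtMs, { createdAtMs });
    return createdAtMs;
  }

  // Look up the entry by the echoed created_at and compute its duration
  // using the same clock that produced the key.
  resolve(createdAtMs: number): CacheEntry | undefined {
    const entry = this.entries.get(createdAtMs);
    if (entry) {
      entry.totalDurationInS = (this.now() - createdAtMs) / 1000;
    }
    return entry;
  }
}
```

Because the key and the duration math share one clock, a lookup by the echoed value cannot miss because of a time-base mismatch.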

In `@agents/src/stream/stream_channel.ts`:
- Around line 24-40: The addStreamInput async IIFE must observe and handle
errors and abort() must be idempotent: inside addStreamInput (function
addStreamInput / the async IIFE using reader.read() and writer.write()) wrap the
loop in try/catch and on any error call the existing abort(error) (or
writer.abort(error) via abort()) so rejections are observed and optionally
surface/log the error, keeping reader.releaseLock() in the finally; also ensure
the IIFE returns its promise (or at least attach .catch(...) to it) to avoid
unhandled rejections. For abort() (the abort: (error: E) => { ... } function)
add an isClosed check/guard like close() uses so calling abort() multiple times
is a no-op (return a resolved promise) and only calls writer.abort(error) once
while setting isClosed = true.
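
The two requested behaviors (observed pump errors and an idempotent abort) might be sketched like this. It is a simplified illustration, not the PR's stream channel: `WriterLike` and `createChannelInput` are hypothetical, and an AsyncIterable stands in for the ReadableStream reader, so there is no reader.releaseLock() step.

```typescript
// Hypothetical minimal writer interface standing in for a WritableStream writer.
interface WriterLike<T, E> {
  write(chunk: T): Promise<void>;
  abort(error: E): Promise<void>;
}

function createChannelInput<T, E = Error>(writer: WriterLike<T, E>) {
  let isClosed = false;

  // Idempotent: only the first call reaches the underlying writer.
  const abort = async (error: E): Promise<void> => {
    if (isClosed) return;
    isClosed = true;
    await writer.abort(error);
  };

  // Pump an input into the writer. Any read or write failure aborts the
  // channel instead of surfacing as an unhandled rejection.
  const addStreamInput = async (input: AsyncIterable<T>): Promise<void> => {
    try {
      for await (const chunk of input) {
        if (isClosed) break;
        await writer.write(chunk);
      }
    } catch (err) {
      await abort(err as E);
    }
  };

  return { abort, addStreamInput };
}
```

Returning the pump's promise lets callers await or attach handlers to it, which is one way to satisfy the "ensure the IIFE returns its promise" part of the request.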

In `@agents/src/voice/audio_recognition.ts`:
- Around line 254-287: The current gating for building eventsToEmit treats
emitFromIndex === 0 as falsy and also compares shouldFlush to undefined; update
the condition so it explicitly checks emitFromIndex for null and that
shouldFlush is true. Replace the final expression that computes eventsToEmit to
use an explicit null check (e.g., emitFromIndex !== null) and a boolean check of
shouldFlush (e.g., shouldFlush === true or simply shouldFlush) so that slice(0)
is allowed and events from index 0 are not dropped; keep using
this.transcriptBuffer.slice(emitFromIndex) when the checks pass.
- Around line 302-329: The function shouldHoldSttEvent never initializes
this.inputStartedAt, so timestamp-based holding/clearing never works; when you
detect the first STT event with valid timestamps (e.g., ev.alternatives[0]
exists, startTime !== endTime and endTime > 0) set this.inputStartedAt to a
reference epoch that aligns the event's endTime with now (for example:
this.inputStartedAt = Date.now() - alternative.endTime * 1000 if endTime is in
seconds, or use Date.now() / 1000 - alternative.endTime if you keep seconds) so
subsequent comparisons in shouldHoldSttEvent and flushHeldTranscripts use the
same timebase; place this initialization inside shouldHoldSttEvent before any
timestamp comparisons when inputStartedAt is undefined.
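
For the first item, the explicit null check could look like the sketch below. `selectEventsToEmit` is a hypothetical helper, not a function from the PR; it only shows why a truthiness test on the index is wrong when 0 is a valid starting position.

```typescript
// Return the buffered events to emit, or an empty list when nothing should
// be flushed. `emitFromIndex` is null when no starting point was found;
// 0 is a valid index and must not be treated as "missing".
function selectEventsToEmit<T>(
  buffer: readonly T[],
  emitFromIndex: number | null,
  shouldFlush: boolean,
): T[] {
  if (emitFromIndex === null || !shouldFlush) return [];
  return buffer.slice(emitFromIndex);
}
```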

In `@agents/src/voice/events.ts`:
- Around line 28-29: The file is missing the InterruptionEvent type integration:
import or define InterruptionEvent, add factory helpers
createUserInterruptionDetectedEvent and createUserNonInterruptionDetectedEvent
mirroring the pattern of existing event creators, and include the two new event
types (UserInterruptionDetected and UserNonInterruptionDetected) in the
AgentEvent union so the public AgentSession API payloads are properly typed;
reference the existing event creator functions and AgentEvent union to match
shape and typing conventions.

In `@examples/src/basic_agent.ts`:
- Around line 61-69: The turnHandling block is currently passed at the root of
the AgentSession config but must be nested inside the options property; update
the AgentSession constructor call to move the turnHandling object (including
turnDetection: new livekit.turnDetector.MultilingualModel(), interruption, and
preemptiveGeneration) under options so it matches the AgentSessionOptions /
SessionOptions shape, leaving voiceOptions and other root-level keys unchanged
(e.g., ensure AgentSession({... options: { turnHandling: { ... } },
voiceOptions: { ... } })).
🟡 Minor comments (14)
agents/src/voice/turn_config/endpointing.ts-1-3 (1)

1-3: ⚠️ Potential issue | 🟡 Minor

Fix SPDX header year to match repo guidelines.

Line 1 uses 2026; new TS files should use 2025 in the SPDX-FileCopyrightText line.

🔧 Suggested fix
-// SPDX-FileCopyrightText: 2026 LiveKit, Inc.
+// SPDX-FileCopyrightText: 2025 LiveKit, Inc.

As per coding guidelines, "Add SPDX-FileCopyrightText and SPDX-License-Identifier headers to all newly added files with '// SPDX-FileCopyrightText: 2025 LiveKit, Inc.' and '// SPDX-License-Identifier: Apache-2.0'".

.github/workflows/test.yml-49-53 (1)

49-53: ⚠️ Potential issue | 🟡 Minor

Example tests are now skipped in CI.

Lines 49-53 comment out pnpm test:examples, so example regressions won’t be caught on PRs. If this is temporary, please add a tracking issue or move it to a secrets-enabled workflow/scheduled job.

agents/src/inference/interruption/utils.test.ts-1-3 (1)

1-3: ⚠️ Potential issue | 🟡 Minor

SPDX copyright year should be 2025.

Suggested fix
-// SPDX-FileCopyrightText: 2026 LiveKit, Inc.
+// SPDX-FileCopyrightText: 2025 LiveKit, Inc.
 //
 // SPDX-License-Identifier: Apache-2.0

As per coding guidelines: SPDX headers should use 2025.

agents/src/voice/turn_config/interruption.ts-1-3 (1)

1-3: ⚠️ Potential issue | 🟡 Minor

SPDX copyright year should be 2025.

Suggested fix
-// SPDX-FileCopyrightText: 2026 LiveKit, Inc.
+// SPDX-FileCopyrightText: 2025 LiveKit, Inc.
 //
 // SPDX-License-Identifier: Apache-2.0

As per coding guidelines: SPDX headers should use 2025.

agents/src/voice/turn_config/turn_handling.ts-1-3 (1)

1-3: ⚠️ Potential issue | 🟡 Minor

SPDX copyright year should be 2025.

Suggested fix
-// SPDX-FileCopyrightText: 2026 LiveKit, Inc.
+// SPDX-FileCopyrightText: 2025 LiveKit, Inc.
 //
 // SPDX-License-Identifier: Apache-2.0

As per coding guidelines: SPDX headers should use 2025.

agents/src/inference/interruption/types.ts-1-3 (1)

1-3: ⚠️ Potential issue | 🟡 Minor

SPDX copyright year should be 2025.

The SPDX header specifies 2026, but the coding guidelines require 2025.

Suggested fix
-// SPDX-FileCopyrightText: 2026 LiveKit, Inc.
+// SPDX-FileCopyrightText: 2025 LiveKit, Inc.
 //
 // SPDX-License-Identifier: Apache-2.0

As per coding guidelines: "Add SPDX-FileCopyrightText and SPDX-License-Identifier headers to all newly added files with '// SPDX-FileCopyrightText: 2025 LiveKit, Inc.'"

agents/src/inference/interruption/errors.ts-1-3 (1)

1-3: ⚠️ Potential issue | 🟡 Minor

SPDX copyright year should be 2025.

Suggested fix
-// SPDX-FileCopyrightText: 2026 LiveKit, Inc.
+// SPDX-FileCopyrightText: 2025 LiveKit, Inc.
 //
 // SPDX-License-Identifier: Apache-2.0

As per coding guidelines: SPDX headers should use 2025.

agents/src/voice/turn_config/utils.test.ts-1-3 (1)

1-3: ⚠️ Potential issue | 🟡 Minor

SPDX copyright year should be 2025.

Suggested fix
-// SPDX-FileCopyrightText: 2026 LiveKit, Inc.
+// SPDX-FileCopyrightText: 2025 LiveKit, Inc.
 //
 // SPDX-License-Identifier: Apache-2.0

As per coding guidelines: SPDX headers should use 2025.

agents/src/voice/agent.ts-102-104 (1)

102-104: ⚠️ Potential issue | 🟡 Minor

Legacy turnDetection option is not honored in constructor.

The deprecated turnDetection option from AgentOptions is not destructured or used in the constructor. While it's marked deprecated, it should still work for backward compatibility. Currently, only turnHandling.turnDetection is used.

🐛 Proposed fix for backward compatibility
   constructor(options: AgentOptions<UserData>) {
-    const { id, instructions, chatCtx, tools, stt, vad, llm, tts, turnHandling } = options;
+    const { id, instructions, chatCtx, tools, stt, vad, llm, tts, turnHandling, turnDetection: legacyTurnDetection, allowInterruptions: legacyAllowInterruptions } = options;
     // ... existing code ...
 
     this.turnHandling = turnHandling; // TODO migrate legacy options to new turn handling config when turnConfig is unset
 
-    this.turnDetection = this.turnHandling?.turnDetection;
+    this.turnDetection = this.turnHandling?.turnDetection ?? legacyTurnDetection;
     // ... existing code ...
 
     this._interruptionDetection = this.turnHandling?.interruption.mode;
     if (this.turnHandling?.interruption.mode !== undefined) {
       this._allowInterruptions = !!this.turnHandling.interruption.mode;
+    } else if (legacyAllowInterruptions !== undefined) {
+      this._allowInterruptions = legacyAllowInterruptions;
     }

Also applies to: 127-129

agents/src/voice/turn_config/utils.ts-28-45 (1)

28-45: ⚠️ Potential issue | 🟡 Minor

Mutation after as const may cause TypeScript errors.

The object turnHandling is declared with as const on line 41, which marks the literal as deeply readonly. However, line 44 then assigns to turnHandling.interruption.mode. This may cause a TypeScript compilation error, and the assertion contradicts the later mutation in any case.

🐛 Proposed fix: Remove `as const` or restructure
   const turnHandling: TurnHandlingConfig = {
     turnDetection: turnDetection,
     interruption: {
       discardAudioIfUninterruptible: mergedOptions?.discardAudioIfUninterruptible,
       minDuration: mergedOptions?.minInterruptionDuration,
       minWords: mergedOptions?.minInterruptionWords,
     },
     endpointing: {
       minDelay: mergedOptions?.minEndpointingDelay,
       maxDelay: mergedOptions?.maxEndpointingDelay,
     },

     ...mergedOptions.turnHandling,
-  } as const;
+  };

   if (mergedOptions?.allowInterruptions === false) {
     turnHandling.interruption.mode = false;
   }
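
A self-contained sketch of the migration without the assertion. The interfaces below are simplified assumptions built from the field names in the diff above; `migrateLegacyOptions`, `LegacyOptions`, and `MigratedTurnHandling` are hypothetical names, and the `mode` type is a guess.

```typescript
// Hypothetical subset of the legacy flat options named in the diff.
interface LegacyOptions {
  allowInterruptions?: boolean;
  discardAudioIfUninterruptible?: boolean;
  minInterruptionDuration?: number;
  minInterruptionWords?: number;
  minEndpointingDelay?: number;
  maxEndpointingDelay?: number;
}

// Hypothetical simplified target shape; the `mode` type is an assumption.
interface MigratedTurnHandling {
  interruption: {
    mode?: string | false;
    discardAudioIfUninterruptible?: boolean;
    minDuration?: number;
    minWords?: number;
  };
  endpointing: { minDelay?: number; maxDelay?: number };
}

function migrateLegacyOptions(legacy: LegacyOptions): MigratedTurnHandling {
  // Declared without `as const`, so the object stays mutable below.
  const turnHandling: MigratedTurnHandling = {
    interruption: {
      discardAudioIfUninterruptible: legacy.discardAudioIfUninterruptible,
      minDuration: legacy.minInterruptionDuration,
      minWords: legacy.minInterruptionWords,
    },
    endpointing: {
      minDelay: legacy.minEndpointingDelay,
      maxDelay: legacy.maxEndpointingDelay,
    },
  };

  // The legacy boolean maps onto disabling the interruption mode.
  if (legacy.allowInterruptions === false) {
    turnHandling.interruption.mode = false;
  }
  return turnHandling;
}
```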
agents/src/voice/agent.ts-150-153 (1)

150-153: ⚠️ Potential issue | 🟡 Minor

Potential undefined access on interruption.mode.

turnHandling.interruption is typed as Partial<InterruptionConfig>, so reading .mode from it is safe and yields undefined when unset. The current logic, however, assumes that interruption itself exists: if turnHandling is provided without an interruption field, this.turnHandling?.interruption.mode would throw.

🛡️ Safer access pattern
-    this._interruptionDetection = this.turnHandling?.interruption.mode;
-    if (this.turnHandling?.interruption.mode !== undefined) {
+    this._interruptionDetection = this.turnHandling?.interruption?.mode;
+    if (this.turnHandling?.interruption?.mode !== undefined) {
       this._allowInterruptions = !!this.turnHandling.interruption.mode;
     }
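
The safer access pattern, combined with a fallback to the legacy flag, could be sketched as below. `resolveAllowInterruptions`, `TurnHandlingLike`, the `'adaptive'` mode string used in the usage note, and the default of `true` are all assumptions for illustration, not the PR's API.

```typescript
// Hypothetical minimal shapes; the `mode` type is an assumption.
interface InterruptionLike {
  mode?: string | false;
}
interface TurnHandlingLike {
  interruption?: InterruptionLike;
}

// Resolve whether interruptions are allowed without assuming that
// `interruption` is present on the config.
function resolveAllowInterruptions(
  turnHandling: TurnHandlingLike | undefined,
  legacyAllowInterruptions?: boolean,
  fallback = true, // assumed default, not taken from the PR
): boolean {
  // Optional chaining at both levels: neither a missing config nor a
  // missing `interruption` field throws.
  const mode = turnHandling?.interruption?.mode;
  if (mode !== undefined) return !!mode;
  return legacyAllowInterruptions ?? fallback;
}
```

With this shape, `resolveAllowInterruptions({})` returns the fallback instead of throwing, and a value such as `{ interruption: { mode: 'adaptive' } }` takes precedence over the legacy flag.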
agents/src/inference/interruption/interruption_stream.ts-1-3 (1)

1-3: ⚠️ Potential issue | 🟡 Minor

SPDX header year should be 2025 for new files.

Update the year to match the required header format.

🔧 Suggested fix
-// SPDX-FileCopyrightText: 2026 LiveKit, Inc.
+// SPDX-FileCopyrightText: 2025 LiveKit, Inc.

As per coding guidelines: **/*.{ts,tsx,js,jsx}: Add SPDX-FileCopyrightText and SPDX-License-Identifier headers to all newly added files with '// SPDX-FileCopyrightText: 2025 LiveKit, Inc.' and '// SPDX-License-Identifier: Apache-2.0'.

agents/src/inference/interruption/http_transport.ts-1-3 (1)

1-3: ⚠️ Potential issue | 🟡 Minor

SPDX header year should be 2025 for new files.

Update the copyright year to match the required header format.

🔧 Suggested fix
-// SPDX-FileCopyrightText: 2026 LiveKit, Inc.
+// SPDX-FileCopyrightText: 2025 LiveKit, Inc.

As per coding guidelines: **/*.{ts,tsx,js,jsx}: Add SPDX-FileCopyrightText and SPDX-License-Identifier headers to all newly added files with '// SPDX-FileCopyrightText: 2025 LiveKit, Inc.' and '// SPDX-License-Identifier: Apache-2.0'.

agents/src/inference/interruption/http_transport.ts-42-77 (1)

42-77: ⚠️ Potential issue | 🟡 Minor

Clarify timestamp semantics to prevent time-base mismatch in totalDurationInS calculation.

The code sends a local performance.now() timestamp to the server as created_at, then uses the server's returned created_at value to calculate totalDurationInS in the caller (line 143: (performance.now() - createdAt) / 1000). This assumes the server echoes back a value in the same time base, but the /bargein endpoint contract doesn't guarantee this. If the server returns a different timestamp (server time, epoch, etc.), the duration calculation becomes invalid.

Rename the local request timestamp to startedAt to clarify that it measures request duration. Separately, the returned createdAt field should be documented or renamed to clarify it represents the server's received timestamp, not the duration reference point.

🛠️ Suggested fix
 export async function predictHTTP(
   data: Int16Array,
   predictOptions: PredictOptions,
   options: PostOptions,
 ): Promise<PredictResponse> {
-  const createdAt = performance.now();
+  const startedAt = performance.now();
   const url = new URL(`/bargein`, options.baseUrl);
   url.searchParams.append('threshold', predictOptions.threshold.toString());
   url.searchParams.append('min_frames', predictOptions.minFrames.toFixed());
-  url.searchParams.append('created_at', createdAt.toFixed());
+  url.searchParams.append('created_at', startedAt.toFixed());

   let retryCount = 0;
   const response = await ofetch(url.toString(), {
@@
   const { created_at, is_bargein, probabilities } = predictEndpointResponseSchema.parse(response);

   return {
     createdAt: created_at,
     isBargein: is_bargein,
     probabilities,
-    predictionDurationInS: (performance.now() - createdAt) / 1000,
+    predictionDurationInS: (performance.now() - startedAt) / 1000,
   };
 }
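
To make the separation of the two timestamps concrete, here is a sketch under stated assumptions. `timedPredict` and its `send` parameter are hypothetical stand-ins for the HTTP call, and the clock is passed in explicitly; in real code it would typically be `() => performance.now()`.

```typescript
// Response fields as named in the diff above.
interface EndpointResponse {
  created_at: number;
  is_bargein: boolean;
  probabilities: number[];
}

interface PredictResult {
  createdAt: number;
  isBargein: boolean;
  probabilities: number[];
  predictionDurationInS: number;
}

// `send` is a hypothetical stand-in for the request; `now` is a local clock.
async function timedPredict(
  send: () => Promise<EndpointResponse>,
  now: () => number,
): Promise<PredictResult> {
  // Local start time; only ever compared with the same local clock.
  const startedAt = now();
  const { created_at, is_bargein, probabilities } = await send();
  return {
    // Passed through untouched; its time base is whatever the server uses.
    createdAt: created_at,
    isBargein: is_bargein,
    probabilities,
    predictionDurationInS: (now() - startedAt) / 1000,
  };
}
```

The duration never depends on the returned created_at, so it stays valid even if the server responds with epoch time or any other time base.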
🧹 Nitpick comments (9)
examples/src/basic_agent.ts (1)

72-75: Potential duplication: preemptiveGeneration is set in both turnHandling and voiceOptions.

preemptiveGeneration: true is set at line 68 inside turnHandling and again at line 74 inside voiceOptions. Once the type issue is resolved, consider whether both locations are necessary or if one should take precedence.

agents/src/inference/interruption/defaults.ts (1)

39-52: Verify consistency between inferenceTimeout and REMOTE_INFERENCE_TIMEOUT_IN_S.

inferenceTimeout is set to 1_000 (milliseconds) while REMOTE_INFERENCE_TIMEOUT_IN_S is 1.0 (seconds). While they represent the same duration, consider using REMOTE_INFERENCE_TIMEOUT_IN_S * 1000 for clarity and to ensure they stay in sync if the constant changes.

🔧 Suggested improvement for consistency
 export const interruptionOptionDefaults: InterruptionOptions = {
   sampleRate: SAMPLE_RATE,
   threshold: THRESHOLD,
   minFrames: Math.ceil(MIN_INTERRUPTION_DURATION_IN_S * FRAMES_PER_SECOND),
   maxAudioDurationInS: MAX_AUDIO_DURATION_IN_S,
   audioPrefixDurationInS: AUDIO_PREFIX_DURATION_IN_S,
   detectionIntervalInS: DETECTION_INTERVAL_IN_S,
-  inferenceTimeout: 1_000,
+  inferenceTimeout: REMOTE_INFERENCE_TIMEOUT_IN_S * 1000,
   baseUrl: DEFAULT_BASE_URL,
agents/src/voice/turn_config/utils.ts (1)

26-26: Handle case where both voiceOptions and sessionOptions are undefined.

When both voiceOptions and sessionOptions are undefined, object spread ignores them, so { ...undefined, ...undefined } evaluates to {}. This works: structuredClone receives an empty object, and mergedOptions.turnHandling on line 40 is simply undefined. The explicit fallback below does not change behavior; it only makes the undefined case visible to readers.

🛡️ Defensive handling suggestion
-  const mergedOptions = structuredClone({ ...voiceOptions, ...sessionOptions });
+  const mergedOptions = structuredClone({ ...(voiceOptions ?? {}), ...(sessionOptions ?? {}) });
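To make the spread behavior concrete, here is a minimal sketch. The DemoOptions shape and both helper names are hypothetical and are not taken from the PR; the sketch only shows that spreading undefined copies no properties, so the guarded and unguarded forms should produce the same result.

```typescript
// Hypothetical option shape, for illustration only (not the PR's types).
interface DemoOptions {
  turnHandling?: { mode: string };
  preemptiveGeneration?: boolean;
}

// Unguarded merge: spreading undefined copies no properties.
function mergeDemoOptions(a?: DemoOptions, b?: DemoOptions): DemoOptions {
  return { ...a, ...b };
}

// Guarded merge: same behavior, the `?? {}` only makes the intent explicit.
function mergeDemoOptionsGuarded(a?: DemoOptions, b?: DemoOptions): DemoOptions {
  return { ...(a ?? {}), ...(b ?? {}) };
}

const mergedEmptyDemo = mergeDemoOptions(undefined, undefined);
const mergedGuardedDemo = mergeDemoOptionsGuarded(undefined, undefined);
```

In both cases later arguments win over earlier ones, which matches the usual expectation that session-level options override voice-level ones.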
agents/src/inference/interruption/ws_transport.ts (2)

114-119: Catch block discards the parse error.

A warning is logged, but the catch clause at line 117 omits the error binding, so the original error information is lost. Consider including the error in the log for debugging purposes.

🔍 Proposed improvement
       try {
         const message = wsMessageSchema.parse(JSON.parse(data.toString()));
         handleMessage(message);
-      } catch {
-        logger.warn({ data: data.toString() }, 'Failed to parse WebSocket message');
+      } catch (err) {
+        logger.warn({ data: data.toString(), err }, 'Failed to parse WebSocket message');
       }

315-318: reconnect() doesn't actively reconnect.

The reconnect() method only calls close() and relies on a subsequent ensureConnection() call to re-establish the connection. However, there's no guarantee that ensureConnection() will be called after reconnect(). If this is intentional (lazy reconnection on next transform), the comment should clarify this behavior. Otherwise, consider awaiting ensureConnection() within reconnect().

♻️ Option A: Active reconnection
   async function reconnect(): Promise<void> {
     close();
-    // Connection will be re-established on next ensureConnection call
+    await ensureConnection();
   }
♻️ Option B: Clarify lazy reconnection in comment
   async function reconnect(): Promise<void> {
     close();
-    // Connection will be re-established on next ensureConnection call
+    // Connection will be lazily re-established on next transform() call via ensureConnection()
+    // This is intentional to avoid reconnecting if the stream is being closed
   }
agents/src/inference/interruption/utils.ts (2)

43-60: Type safety concern in setOrUpdate with Record<string, unknown> cast.

The cast to Record<string, unknown> on line 56 bypasses TypeScript's type checking. While this is acceptable for the current use case, it could allow setting invalid field names. Consider adding runtime validation or documenting the expected usage.


127-139: Performance: O(n×w) complexity for sliding window calculation.

The slidingWindowMinMax function creates a new array slice for each window position (line 135), resulting in O(n×w) time complexity. For large probability arrays, consider using a monotonic deque for O(n) complexity.

For current use cases with small window sizes (likely 2-3 frames based on MIN_INTERRUPTION_DURATION_IN_S), this is acceptable but may become a bottleneck with longer audio durations.
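As a rough illustration of the monotonic-deque idea, the sketch below computes the maximum of every full window in O(n). The name slidingWindowMaxSketch and its exact semantics are assumptions for illustration; the PR's slidingWindowMinMax may combine window minima and maxima differently.

```typescript
// Returns the maximum of every full window of `windowSize` consecutive values.
// Each index is pushed and popped at most once, so the total work is O(n).
function slidingWindowMaxSketch(values: number[], windowSize: number): number[] {
  if (windowSize <= 0 || values.length < windowSize) return [];
  const result: number[] = [];
  // Indices into `values`; from `head` onward their values are in decreasing order.
  const deque: number[] = [];
  // Logical front of the deque; advancing it avoids an O(w) Array.shift().
  let head = 0;
  for (let i = 0; i < values.length; i++) {
    // Drop the front index once it falls outside the window ending at i.
    if (head < deque.length && deque[head]! <= i - windowSize) head++;
    // Drop trailing indices whose values can no longer be a window maximum.
    while (deque.length > head && values[deque[deque.length - 1]!]! <= values[i]!) {
      deque.pop();
    }
    deque.push(i);
    // The front of the deque is the maximum of the current full window.
    if (i >= windowSize - 1) result.push(values[deque[head]!]!);
  }
  return result;
}

// Example use: the smallest of the per-window maxima.
const demoWindowMaxima = slidingWindowMaxSketch([1, 3, 2, 5, 4], 2);
const demoMinOfMaxima = Math.min(...demoWindowMaxima);
```

A sliding minimum follows the same pattern with the comparison reversed. For windows of 2-3 frames the simpler slice-based version is likely just as fast in practice.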

agents/src/inference/interruption/interruption_detector.ts (2)

25-26: Public options property exposes mutable internal state.

The options property is public, allowing external code to directly mutate the detector's configuration without triggering updateOptions(). Consider making it readonly or private with a getter.

♻️ Encapsulate options
 export class AdaptiveInterruptionDetector extends (EventEmitter as new () => TypedEventEmitter<InterruptionCallbacks>) {
-  options: InterruptionOptions;
+  private _options: InterruptionOptions;
   private readonly _label: string;
   // ... later in constructor ...
-    this.options = {
+    this._options = {
   // ... add getter ...
+  get options(): Readonly<InterruptionOptions> {
+    return this._options;
+  }

81-94: Duplicate minFrames calculation logic.

The formula Math.ceil(minInterruptionDurationInS * FRAMES_PER_SECOND) appears here and in defaults.ts (line 42) and updateOptions (line 179). Consider extracting this to a shared helper function to ensure consistency.

♻️ Extract helper function

Add to defaults.ts:

export function computeMinFrames(minInterruptionDurationInS: number): number {
  return Math.ceil(minInterruptionDurationInS * FRAMES_PER_SECOND);
}

Then use it consistently across files.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0bb2745 and 96d6b57.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (27)
  • .changeset/config.json
  • .github/workflows/test.yml
  • agents/package.json
  • agents/src/inference/interruption/defaults.ts
  • agents/src/inference/interruption/errors.ts
  • agents/src/inference/interruption/http_transport.ts
  • agents/src/inference/interruption/interruption_cache_entry.ts
  • agents/src/inference/interruption/interruption_detector.ts
  • agents/src/inference/interruption/interruption_stream.ts
  • agents/src/inference/interruption/types.ts
  • agents/src/inference/interruption/utils.test.ts
  • agents/src/inference/interruption/utils.ts
  • agents/src/inference/interruption/ws_transport.ts
  • agents/src/stream/stream_channel.ts
  • agents/src/telemetry/trace_types.ts
  • agents/src/voice/agent.ts
  • agents/src/voice/agent_activity.ts
  • agents/src/voice/agent_session.ts
  • agents/src/voice/audio_recognition.ts
  • agents/src/voice/events.ts
  • agents/src/voice/turn_config/endpointing.ts
  • agents/src/voice/turn_config/interruption.ts
  • agents/src/voice/turn_config/turn_handling.ts
  • agents/src/voice/turn_config/utils.test.ts
  • agents/src/voice/turn_config/utils.ts
  • examples/package.json
  • examples/src/basic_agent.ts
💤 Files with no reviewable changes (1)
  • examples/package.json
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Add SPDX-FileCopyrightText and SPDX-License-Identifier headers to all newly added files with '// SPDX-FileCopyrightText: 2025 LiveKit, Inc.' and '// SPDX-License-Identifier: Apache-2.0'

Files:

  • agents/src/inference/interruption/utils.test.ts
  • agents/src/voice/turn_config/utils.test.ts
  • agents/src/voice/turn_config/turn_handling.ts
  • agents/src/voice/turn_config/endpointing.ts
  • agents/src/voice/turn_config/interruption.ts
  • examples/src/basic_agent.ts
  • agents/src/inference/interruption/ws_transport.ts
  • agents/src/inference/interruption/interruption_cache_entry.ts
  • agents/src/inference/interruption/errors.ts
  • agents/src/inference/interruption/types.ts
  • agents/src/inference/interruption/utils.ts
  • agents/src/stream/stream_channel.ts
  • agents/src/voice/turn_config/utils.ts
  • agents/src/inference/interruption/defaults.ts
  • agents/src/voice/events.ts
  • agents/src/inference/interruption/interruption_detector.ts
  • agents/src/voice/audio_recognition.ts
  • agents/src/voice/agent.ts
  • agents/src/inference/interruption/http_transport.ts
  • agents/src/inference/interruption/interruption_stream.ts
  • agents/src/telemetry/trace_types.ts
  • agents/src/voice/agent_session.ts
  • agents/src/voice/agent_activity.ts
**/*.{ts,tsx}?(test|example|spec)

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

When testing inference LLM, always use full model names from agents/src/inference/models.ts (e.g., 'openai/gpt-4o-mini' instead of 'gpt-4o-mini')

Files:

  • Same 23 files as listed under **/*.{ts,tsx,js,jsx} above
**/*.{ts,tsx}?(test|example)

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Initialize logger before using any LLM functionality with initializeLogger({ pretty: true }) from '@livekit/agents'

Files:

  • Same 23 files as listed under **/*.{ts,tsx,js,jsx} above
🧠 Learnings (5)
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Applies to examples/src/test_*.ts : For plugin component debugging (STT, TTS, LLM), create test example files prefixed with `test_` under the examples directory and run with `pnpm build && node ./examples/src/test_my_plugin.ts`

Applied to files:

  • .github/workflows/test.yml
  • agents/src/voice/turn_config/utils.test.ts
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Use `pnpm build && pnpm dlx tsx ./examples/src/my_agent.ts dev|download-files --log-level=debug|info(default)` to run example agents from the examples directory

Applied to files:

  • .github/workflows/test.yml
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Applies to **/*.{ts,tsx}?(test|example|spec) : When testing inference LLM, always use full model names from `agents/src/inference/models.ts` (e.g., 'openai/gpt-4o-mini' instead of 'gpt-4o-mini')

Applied to files:

  • agents/src/inference/interruption/utils.test.ts
  • agents/src/voice/agent_session.ts
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Applies to **/{examples,test}/**/*.test.ts : Include both basic streaming and tool calling tests to verify full LLM functionality

Applied to files:

  • agents/src/inference/interruption/utils.test.ts
  • agents/src/voice/turn_config/utils.test.ts
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Applies to **/*.{ts,tsx}?(test|example) : Initialize logger before using any LLM functionality with `initializeLogger({ pretty: true })` from '@livekit/agents'

Applied to files:

  • examples/src/basic_agent.ts
🧬 Code graph analysis (12)
agents/src/inference/interruption/utils.test.ts (1)
agents/src/inference/interruption/utils.ts (1)
  • slidingWindowMinMax (127-140)
agents/src/voice/turn_config/utils.test.ts (5)
agents/src/log.ts (1)
  • initializeLogger (34-40)
agents/src/voice/turn_config/utils.ts (1)
  • migrateLegacyOptions (14-63)
agents/src/voice/turn_config/turn_handling.ts (1)
  • defaultTurnHandlingConfig (41-45)
agents/src/voice/turn_config/endpointing.ts (1)
  • defaultEndpointingConfig (23-26)
agents/src/voice/turn_config/interruption.ts (1)
  • defaultInterruptionConfig (42-49)
agents/src/voice/turn_config/turn_handling.ts (3)
agents/src/voice/agent_session.ts (1)
  • TurnDetectionMode (123-123)
agents/src/voice/turn_config/endpointing.ts (2)
  • EndpointingConfig (7-21)
  • defaultEndpointingConfig (23-26)
agents/src/voice/turn_config/interruption.ts (2)
  • InterruptionConfig (7-40)
  • defaultInterruptionConfig (42-49)
agents/src/inference/interruption/interruption_cache_entry.ts (1)
agents/src/inference/interruption/utils.ts (1)
  • estimateProbability (115-125)
agents/src/stream/stream_channel.ts (1)
agents/src/stream/index.ts (3)
  • StreamChannel (7-7)
  • createStreamChannel (7-7)
  • IdentityTransform (5-5)
agents/src/voice/turn_config/utils.ts (5)
agents/src/voice/agent_session.ts (2)
  • AgentSessionOptions (139-152)
  • InternalSessionOptions (109-111)
agents/src/voice/turn_config/turn_handling.ts (2)
  • TurnHandlingConfig (12-34)
  • defaultTurnHandlingConfig (41-45)
agents/src/voice/agent_activity.ts (1)
  • turnDetection (379-382)
agents/src/voice/turn_config/endpointing.ts (1)
  • defaultEndpointingConfig (23-26)
agents/src/voice/turn_config/interruption.ts (1)
  • defaultInterruptionConfig (42-49)
agents/src/inference/interruption/interruption_detector.ts (2)
agents/src/inference/interruption/types.ts (1)
  • InterruptionOptions (33-46)
agents/src/inference/interruption/interruption_stream.ts (2)
  • InterruptionStreamBase (77-375)
  • stream (338-340)
agents/src/voice/agent.ts (3)
agents/src/voice/agent_session.ts (1)
  • TurnDetectionMode (123-123)
agents/src/voice/turn_config/turn_handling.ts (1)
  • TurnHandlingConfig (12-34)
agents/src/voice/turn_config/interruption.ts (1)
  • InterruptionConfig (7-40)
agents/src/inference/interruption/http_transport.ts (4)
agents/src/inference/interruption/defaults.ts (1)
  • intervalForRetry (28-37)
agents/src/inference/interruption/interruption_cache_entry.ts (1)
  • InterruptionCacheEntry (10-47)
agents/src/inference/interruption/types.ts (1)
  • InterruptionEvent (17-28)
agents/src/inference/utils.ts (1)
  • createAccessToken (10-19)
agents/src/inference/interruption/interruption_stream.ts (4)
agents/src/inference/interruption/types.ts (9)
  • AgentSpeechStarted (59-61)
  • AgentSpeechEnded (63-65)
  • OverlapSpeechStarted (67-71)
  • OverlapSpeechEnded (73-75)
  • Flush (77-79)
  • InterruptionSentinel (84-89)
  • InterruptionEvent (17-28)
  • InterruptionOptions (33-46)
  • ApiConnectOptions (51-55)
agents/src/inference/interruption/interruption_cache_entry.ts (1)
  • InterruptionCacheEntry (10-47)
agents/src/inference/interruption/defaults.ts (2)
  • apiConnectDefaults (18-22)
  • FRAMES_PER_SECOND (14-14)
agents/src/inference/interruption/utils.ts (1)
  • BoundedCache (10-109)
agents/src/voice/agent_session.ts (5)
agents/src/voice/turn_config/turn_handling.ts (2)
  • TurnHandlingConfig (12-34)
  • InternalTurnHandlingConfig (36-39)
agents/src/inference/interruption/types.ts (1)
  • InterruptionEvent (17-28)
agents/src/voice/run_context.ts (1)
  • UnknownUserData (8-8)
agents/src/voice/turn_config/interruption.ts (1)
  • InterruptionConfig (7-40)
agents/src/voice/turn_config/utils.ts (1)
  • migrateLegacyOptions (14-63)
agents/src/voice/agent_activity.ts (4)
agents/src/voice/audio_recognition.ts (1)
  • RecognitionHooks (42-53)
agents/src/voice/agent_session.ts (1)
  • TurnDetectionMode (123-123)
agents/src/inference/interruption/interruption_detector.ts (1)
  • AdaptiveInterruptionDetector (25-189)
agents/src/inference/interruption/types.ts (1)
  • InterruptionEvent (17-28)
🪛 GitHub Actions: Test
examples/src/basic_agent.ts

[error] 61-61: TS2353: Object literal may only specify known properties, and 'turnHandling' does not exist in type 'AgentSessionOptions'.

🔇 Additional comments (34)
.changeset/config.json (1)

11-11: Formatting-only change looks good.

Line 11 keeps the fixed package groups intact; no behavioral change expected.

agents/src/telemetry/trace_types.ts (1)

54-59: Telemetry attribute additions look consistent.

Lines 54-59 follow the existing lk.* naming and section placement.

agents/package.json (1)

72-72: No action needed — ofetch 1.5.1 provides CommonJS compatibility.

ofetch publishes both ESM and CJS builds with proper conditional exports. The CJS entry point (./dist/node.cjs) is configured in package exports and the legacy main field for broader resolver compatibility. No require() consumers will break.

agents/src/inference/interruption/types.ts (1)

6-89: Well-structured type definitions.

The enum, interfaces, and union types are clearly documented and provide a clean contract for the interruption detection system. The sentinel types using discriminated unions (type property) enable type-safe stream control signal handling.
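For readers unfamiliar with the pattern, a discriminated union of sentinels could look roughly like the sketch below. The Demo* names, the type strings, and the startedAtMs field are hypothetical; the actual definitions in types.ts may differ.

```typescript
// Hypothetical sentinel shapes; the PR's actual fields may differ.
interface DemoAgentSpeechStarted {
  type: 'agent-speech-started';
}
interface DemoOverlapSpeechStarted {
  type: 'overlap-speech-started';
  startedAtMs: number;
}
interface DemoFlush {
  type: 'flush';
}
type DemoSentinel = DemoAgentSpeechStarted | DemoOverlapSpeechStarted | DemoFlush;

// Switching on the shared `type` property narrows `s` in each branch.
function describeSentinelDemo(s: DemoSentinel): string {
  switch (s.type) {
    case 'agent-speech-started':
      return 'agent started speaking';
    case 'overlap-speech-started':
      // `startedAtMs` is only accessible because `s` was narrowed here.
      return `overlap began at ${s.startedAtMs} ms`;
    case 'flush':
      return 'flush buffered audio';
    default: {
      // Compile-time exhaustiveness check: fails if a new variant is unhandled.
      const unreachable: never = s;
      return unreachable;
    }
  }
}
```

The never assignment in the default branch means that adding a new sentinel type without handling it becomes a compile error rather than a silent fallthrough.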

agents/src/inference/interruption/utils.test.ts (1)

7-30: Good test coverage for slidingWindowMinMax.

Tests cover all important edge cases: empty arrays, arrays shorter than window, window size of 1, and boundary conditions where array length equals window size. The inline comments clearly explain the expected calculations.

agents/src/voice/turn_config/interruption.ts (1)

7-49: Well-designed configuration interface.

The InterruptionConfig interface is thoroughly documented with clear JSDoc comments explaining each field's purpose and default value. The as const satisfies InterruptionConfig pattern ensures type safety while preserving literal types in the default object.
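A minimal sketch of the as const satisfies pattern, using a hypothetical DemoInterruptionConfig with made-up fields and values (the real InterruptionConfig has its own fields and defaults). It requires TypeScript 4.9 or later.

```typescript
// Hypothetical config shape, for illustration only.
interface DemoInterruptionConfig {
  mode: 'adaptive' | 'vad' | false;
  minDurationMs: number;
}

// `satisfies` checks the object against the interface without widening it,
// and `as const` keeps the literal types of each property.
const demoInterruptionDefaults = {
  mode: 'adaptive',
  minDurationMs: 500,
} as const satisfies DemoInterruptionConfig;

// Compiles only because `mode` kept its literal type 'adaptive';
// with a plain `: DemoInterruptionConfig` annotation this line would not type-check.
const demoDefaultMode: 'adaptive' = demoInterruptionDefaults.mode;
```

A typo such as mode: 'adaptve' would still be rejected at compile time, which is the type-safety half of the pattern.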

agents/src/voice/turn_config/utils.test.ts (2)

11-13: Good practice: Logger initialization in beforeAll.

Properly initializes the logger before tests run, which aligns with the coding guidelines for test files.


15-100: Comprehensive migration test coverage.

The tests thoroughly verify:

  • Default configuration when no options provided
  • Legacy flat field migration to nested turnHandling structure
  • allowInterruptions: false correctly sets interruption.mode to false
  • Precedence rules between options and voiceOptions
  • Explicit turnHandling overriding legacy fields
  • Preservation of turnDetection at both levels
agents/src/voice/turn_config/turn_handling.ts (1)

8-45: Clean type hierarchy for turn handling configuration.

Good design separating TurnHandlingConfig (public API with Partial<> for optional overrides) from InternalTurnHandlingConfig (internal use with fully concrete types). The JSDoc clearly documents the turn detection strategies and fallback behavior.

agents/src/inference/interruption/errors.ts (1)

7-25: Well-designed custom error class.

The InterruptionDetectionError provides useful metadata (timestamp, label, recoverable) for error handling and debugging. The recoverable flag allows callers to determine whether to retry or fail permanently.
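As a sketch of how a caller might use such metadata, the example below defines a hypothetical DemoInterruptionError with the three fields named above and a retry check. The class name, constructor signature, and shouldRetryDemo helper are assumptions and do not reproduce the PR's InterruptionDetectionError.

```typescript
// Hypothetical error class carrying the metadata described above.
class DemoInterruptionError extends Error {
  readonly timestamp: number;
  readonly label: string;
  readonly recoverable: boolean;

  constructor(message: string, timestamp: number, label: string, recoverable: boolean) {
    super(message);
    this.name = 'DemoInterruptionError';
    this.timestamp = timestamp;
    this.label = label;
    this.recoverable = recoverable;
  }
}

// Retry only errors that are explicitly marked as recoverable.
function shouldRetryDemo(err: unknown): boolean {
  return err instanceof DemoInterruptionError && err.recoverable;
}

const demoTimeoutError = new DemoInterruptionError('inference timed out', 1_700_000_000_000, 'demo-detector', true);
const demoFatalError = new DemoInterruptionError('invalid credentials', 1_700_000_000_000, 'demo-detector', false);
```

Unknown error types fall through to "do not retry", which is usually the safer default.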

agents/src/inference/interruption/defaults.ts (2)

1-6: LGTM on SPDX headers and imports.

The file correctly includes the SPDX copyright and license headers. Consider verifying that ApiConnectOptions is intentionally imported from interruption_stream.js rather than types.ts, which also defines this interface per the relevant code snippets.


28-37: Exponential backoff implementation looks correct.

The intervalForRetry function correctly implements exponential backoff with jitter (0-25% randomization). The formula baseInterval * 2^attempt + jitter aligns with standard retry patterns.
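The formula can be sketched as follows. This is an illustrative version, not the PR's intervalForRetry: the function name, the parameter order, and the choice to scale jitter by the exponential delay (rather than by the base interval) are assumptions. The random source is injectable so the bounds are easy to check.

```typescript
// Exponential backoff with up to 25% additive jitter.
// `random` must return a value in [0, 1); it defaults to Math.random.
function backoffIntervalSketch(
  attempt: number,
  baseIntervalMs: number,
  random: () => number = Math.random,
): number {
  // Delay doubles with every attempt: base, 2*base, 4*base, ...
  const exponential = baseIntervalMs * 2 ** attempt;
  // Jitter spreads out retries from many clients that failed at the same time.
  const jitter = exponential * 0.25 * random();
  return exponential + jitter;
}

// With a 100 ms base, attempt 3 lands between 800 ms and 1000 ms.
const demoBackoffMin = backoffIntervalSketch(3, 100, () => 0);
const demoBackoffMax = backoffIntervalSketch(3, 100, () => 1);
```

In production code a cap on the maximum delay is common as well; whether the PR applies one is not visible from this review.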

agents/src/voice/turn_config/utils.ts (3)

1-3: LGTM on SPDX headers.


65-68: LGTM on stripUndefined implementation.

Clean and effective utility for preventing undefined values from shadowing defaults during object spread.


70-76: LGTM on mergeWithDefaults implementation.

Correctly applies defaults while using stripUndefined to preserve default values when config fields are undefined.
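The reason this matters is that an explicit undefined in a spread overwrites the default. The sketch below shows the idea with hypothetical helper names and a made-up config shape; it is not the PR's implementation of stripUndefined or mergeWithDefaults.

```typescript
// Copies only the properties whose value is not undefined.
function stripUndefinedSketch<T extends object>(obj: Partial<T>): Partial<T> {
  const out = {} as Partial<T>;
  for (const key of Object.keys(obj) as Array<keyof T>) {
    const value = obj[key];
    if (value !== undefined) out[key] = value;
  }
  return out;
}

// Applies defaults first, then only the overrides that are actually set.
function mergeWithDefaultsSketch<T extends object>(defaults: T, config?: Partial<T>): T {
  if (config === undefined) return { ...defaults };
  return { ...defaults, ...stripUndefinedSketch<T>(config) };
}

// Hypothetical config, for illustration only.
interface DemoEndpointing {
  minDelayMs: number;
  maxDelayMs: number;
}
const demoEndpointingDefaults: DemoEndpointing = { minDelayMs: 500, maxDelayMs: 3000 };
const demoOverrides: Partial<DemoEndpointing> = { minDelayMs: undefined, maxDelayMs: 6000 };

// A naive spread lets the explicit undefined shadow the default.
const demoNaiveMerge = { ...demoEndpointingDefaults, ...demoOverrides };
// The stripped merge keeps the default for minDelayMs.
const demoSafeMerge = mergeWithDefaultsSketch(demoEndpointingDefaults, demoOverrides);
```

Note that null is deliberately not stripped here, so callers can still use it as an explicit value if the config type allows it.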

agents/src/inference/interruption/ws_transport.ts (2)

320-350: LGTM on TransformStream implementation.

The transform logic correctly handles both Int16Array audio chunks and passthrough of InterruptionEvent objects. Error handling and cleanup via flush() are properly implemented.


1-3: ⚠️ Potential issue | 🟡 Minor

Update copyright year to 2026.

The SPDX copyright year is 2025 but should be 2026 per coding guidelines.

📝 Proposed fix
-// SPDX-FileCopyrightText: 2025 LiveKit, Inc.
+// SPDX-FileCopyrightText: 2026 LiveKit, Inc.
 //
 // SPDX-License-Identifier: Apache-2.0
⛔ Skipped due to learnings
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Applies to **/*.{ts,tsx,js,jsx} : Add SPDX-FileCopyrightText and SPDX-License-Identifier headers to all newly added files with '// SPDX-FileCopyrightText: 2025 LiveKit, Inc.' and '// SPDX-License-Identifier: Apache-2.0'
agents/src/inference/interruption/interruption_cache_entry.ts (3)

1-5: LGTM on SPDX headers and imports.


10-35: LGTM on class structure and constructor.

The mutable design is intentionally documented to support the setOrUpdate pattern from the cache. Constructor defaults are sensible with numeric durations defaulting to 0.


37-46: LGTM on probability getter and default factory.

The getter correctly delegates to estimateProbability which handles edge cases (returns 0 for insufficient data per the relevant code snippet showing if (probabilities.length < minWindow) { return 0; }).

agents/src/inference/interruption/utils.ts (3)

1-4: LGTM on SPDX headers and imports.


10-25: LGTM on BoundedCache set/get implementation.

The FIFO eviction strategy is correctly implemented by removing the first inserted key when size exceeds the limit.
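A minimal sketch of FIFO eviction on top of Map, which iterates keys in insertion order. The FifoCacheSketch class is hypothetical and covers only set/get; the PR's BoundedCache has additional methods (such as setOrUpdate) that are not shown here.

```typescript
// Bounded cache that evicts the oldest inserted key when it grows past maxSize.
class FifoCacheSketch<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly maxSize: number;

  constructor(maxSize: number) {
    this.maxSize = maxSize;
  }

  set(key: K, value: V): void {
    // Updating an existing key keeps its original insertion position.
    this.entries.set(key, value);
    if (this.entries.size > this.maxSize) {
      // Map iterates in insertion order, so the first key is the oldest one.
      const oldest = this.entries.keys().next();
      if (!oldest.done) this.entries.delete(oldest.value);
    }
  }

  get(key: K): V | undefined {
    return this.entries.get(key);
  }

  get size(): number {
    return this.entries.size;
  }
}

const demoFifoCache = new FifoCacheSketch<number, string>(2);
demoFifoCache.set(1, 'a');
demoFifoCache.set(2, 'b');
demoFifoCache.set(3, 'c'); // evicts key 1, the first one inserted
```

Unlike an LRU cache, reads do not refresh an entry's position, so eviction order depends only on when each key was first inserted.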


115-125: LGTM on estimateProbability.

Correctly handles edge cases by returning 0 when there's insufficient data for the minimum window size.

agents/src/voice/agent.ts (2)

32-33: LGTM on new type imports and AgentOptions additions.

The deprecation annotation on turnDetection and addition of turnHandling properly signals the migration path to users.

Also applies to: 68-69, 76-77


194-200: LGTM on new getters.

The interruptionDetection and allowInterruptions getters correctly expose the internal state derived from the turn handling configuration.

agents/src/inference/interruption/interruption_detector.ts (4)

1-15: LGTM on SPDX headers and imports.


47-49: Validation message is slightly misleading.

The condition maxAudioDurationInS > 3.0 rejects values strictly greater than 3.0, and the error message says "must be less than or equal to 3.0", so the check and the message agree. The behavior is correct; only the phrasing of the message could be made more explicit about the limit.


170-188: LGTM on updateOptions implementation.

The method correctly propagates option updates to all active streams and awaits all promises with Promise.all(). This ensures consistent configuration across all streams.


153-157: Usage is correct and intentional.

The createStream() method passing an empty object {} as the second argument to InterruptionStreamBase is valid and by design. The constructor accepts apiOptions: Partial<ApiConnectOptions>, which makes all properties optional. The implementation merges this with apiConnectDefaults, allowing callers to pass an empty object to use all defaults. This is an idiomatic TypeScript pattern and requires no changes.

Likely an incorrect or invalid review comment.

agents/src/inference/interruption/http_transport.ts (1)

105-174: LGTM: transport keeps stream flowing even on inference failures.

Graceful error handling and event emission look solid.

agents/src/voice/audio_recognition.ts (1)

851-914: LGTM: interruption task wiring is clean and abort-safe.

The split between forwardTask and eventReader with abort handling looks solid.

agents/src/inference/interruption/interruption_stream.ts (1)

158-319: LGTM: two-stage pipeline cleanly separates framing and transport.

The audioTransformer -> transport -> eventEmitter composition is easy to reason about.

agents/src/voice/agent_session.ts (1)

73-265: LGTM: legacy option migration keeps API compatibility while moving to SessionOptions.

The constructor flow and defaults look consistent with the new turn-handling config.

agents/src/voice/agent_activity.ts (1)

307-318: Nice: interruption detector is cleanly threaded into AudioRecognition setup.

The wiring keeps the recognition pipeline cohesive.


Comment on lines 302 to 329
private shouldHoldSttEvent(ev: SpeechEvent): boolean {
if (!this.isInterruptionEnabled) {
return false;
}
if (this.isAgentSpeaking) {
return true;
}

if (this.ignoreUserTranscriptUntil === undefined) {
return false;
}
// sentinel events are always held until we have something concrete to release them
if (!ev.alternatives || ev.alternatives.length === 0) {
return true;
}

const alternative = ev.alternatives[0];

if (
this.inputStartedAt &&
alternative.startTime !== alternative.endTime &&
alternative.endTime > 0 &&
alternative.endTime + this.inputStartedAt < this.ignoreUserTranscriptUntil
) {
return true;
}
return false;
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's check if the file exists and explore its structure
git ls-files | grep -i audio_recognition

Repository: livekit/agents-js

Length of output: 98


🏁 Script executed:

# Look at the full audio_recognition.ts file to understand the class structure
wc -l agents/src/voice/audio_recognition.ts

Repository: livekit/agents-js

Length of output: 103


🏁 Script executed:

# Get the file to see where inputStartedAt is declared and used
rg "inputStartedAt" agents/src/voice/audio_recognition.ts -B 2 -A 2

Repository: livekit/agents-js

Length of output: 822


🏁 Script executed:

# Check the flushHeldTranscripts method mentioned in the comment
rg "flushHeldTranscripts" agents/src/voice/audio_recognition.ts -B 5 -A 15

Repository: livekit/agents-js

Length of output: 2120


🏁 Script executed:

# Look at the onSTTEvent method to see the current implementation
rg "private async onSTTEvent|private onSTTEvent" agents/src/voice/audio_recognition.ts -A 30

Repository: livekit/agents-js

Length of output: 1190


🏁 Script executed:

# Search for all assignments/initializations of inputStartedAt
rg "inputStartedAt\s*=" agents/src/voice/audio_recognition.ts -B 2 -A 2

Repository: livekit/agents-js

Length of output: 43


🏁 Script executed:

# Check the constructor and initialization section
rg "constructor|private inputStartedAt" agents/src/voice/audio_recognition.ts -A 20 | head -50

Repository: livekit/agents-js

Length of output: 1386


🏁 Script executed:

# Look for any place where inputStartedAt might be set in event handlers or methods
rg "this\.inputStartedAt" agents/src/voice/audio_recognition.ts -B 1 -A 1

Repository: livekit/agents-js

Length of output: 486


🏁 Script executed:

# Check if there's a reset or clear method that might initialize it
rg "reset|clear|initialize" agents/src/voice/audio_recognition.ts -i | grep -i "inputstarted\|method"

Repository: livekit/agents-js

Length of output: 43


🏁 Script executed:

# Check what class this is and if there's inheritance
rg "class AudioRecognition|export class|extends" agents/src/voice/audio_recognition.ts -A 5 | head -20

Repository: livekit/agents-js

Length of output: 262


🏁 Script executed:

# Look for where ignoreUserTranscriptUntil is set to understand the pattern
rg "ignoreUserTranscriptUntil\s*=" agents/src/voice/audio_recognition.ts -B 3 -A 3

Repository: livekit/agents-js

Length of output: 1352


🏁 Script executed:

# Check if there's any method that might set inputStartedAt indirectly
rg "Date\.now|endTime|startTime" agents/src/voice/audio_recognition.ts | head -20

Repository: livekit/agents-js

Length of output: 1072


🏁 Script executed:

# Look for the complete flushHeldTranscripts method to see the full logic
sed -n '600,700p' agents/src/voice/audio_recognition.ts

Repository: livekit/agents-js

Length of output: 4142


🏁 Script executed:

# Let's view the method around line 302-329 to see shouldHoldSttEvent in full context
sed -n '300,350p' agents/src/voice/audio_recognition.ts

Repository: livekit/agents-js

Length of output: 1431


🏁 Script executed:

# Get the complete onSTTEvent method
sed -n '220,320p' agents/src/voice/audio_recognition.ts

Repository: livekit/agents-js

Length of output: 2823


🏁 Script executed:

# Check if there are any other places that might be responsible for initializing inputStartedAt
rg "reset|clear|interrupt" agents/src/voice/audio_recognition.ts -i -B 2 -A 5 | head -60

Repository: livekit/agents-js

Length of output: 2378


🏁 Script executed:

# Look at the method around line 220 to understand when onSTTEvent is called
sed -n '180,230p' agents/src/voice/audio_recognition.ts

Repository: livekit/agents-js

Length of output: 1663


🏁 Script executed:

# Check if inputStartedAt is ever reset/cleared or reassigned
rg "inputStartedAt\s*(=|:)" agents/src/voice/audio_recognition.ts

Repository: livekit/agents-js

Length of output: 43


🏁 Script executed:

# Let's see the full onSTTEvent method to understand the flow better
sed -n '330,450p' agents/src/voice/audio_recognition.ts

Repository: livekit/agents-js

Length of output: 4448


🏁 Script executed:

# Check if there's any event listener or callback that might be setting timestamps
rg "startTime|endTime" agents/src/voice/audio_recognition.ts -B 1 -A 1 | head -40

Repository: livekit/agents-js

Length of output: 803


🏁 Script executed:

# Verify the complete picture - check what SpeechEvent.alternatives[].endTime represents (its units/type)
rg "interface SpeechEvent|type SpeechEvent" agents/src -A 20 | head -50

Repository: livekit/agents-js

Length of output: 3458


🏁 Script executed:

# Check STT implementations to understand the endTime contract
fd -e ts -e tsx | xargs rg "endTime" | grep -i "speech\|stt\|transcript" | head -20

Repository: livekit/agents-js

Length of output: 1354


🏁 Script executed:

# Search for SpeechEvent definition
rg "class SpeechEvent|interface.*SpeechEvent|type.*SpeechEvent" agents/src -A 15

Repository: livekit/agents-js

Length of output: 16679


Initialize inputStartedAt on the first STT event with valid timestamps; otherwise held transcripts get cleared and timestamp-based filtering never works.

inputStartedAt is never set, so flushHeldTranscripts() clears the buffer and shouldHoldSttEvent() cannot compare transcript timestamps. Set it on the first STT event with valid endTime to establish the reference point for timestamp calculations.

🛠️ Suggested fix (adjust units as needed)
   private async onSTTEvent(ev: SpeechEvent) {
+    const firstAlt = ev.alternatives?.[0];
+    if (this.inputStartedAt === undefined && firstAlt?.endTime && firstAlt.endTime > 0) {
+      // If endTime is already in ms, remove the * 1000.
+      this.inputStartedAt = Date.now() - firstAlt.endTime * 1000;
+    }
     if (
       this.turnDetectionMode === 'manual' &&
🤖 Prompt for AI Agents
In `@agents/src/voice/audio_recognition.ts` around lines 302 - 329, The function
shouldHoldSttEvent never initializes this.inputStartedAt, so timestamp-based
holding/clearing never works; when you detect the first STT event with valid
timestamps (e.g., ev.alternatives[0] exists, startTime !== endTime and endTime >
0) set this.inputStartedAt to a reference epoch that aligns the event's endTime
with now (for example: this.inputStartedAt = Date.now() - alternative.endTime *
1000 if endTime is in seconds, or use Date.now() / 1000 - alternative.endTime if
you keep seconds) so subsequent comparisons in shouldHoldSttEvent and
flushHeldTranscripts use the same timebase; place this initialization inside
shouldHoldSttEvent before any timestamp comparisons when inputStartedAt is
undefined.
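
The timebase alignment described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: `SttAlternative`, `deriveInputStartedAt`, and `transcriptEndWallClock` are hypothetical names, and `endTime` is assumed to be in seconds from stream start.

```typescript
// Hypothetical shape: only the fields this sketch needs.
interface SttAlternative {
  startTime: number; // seconds from stream start (assumed unit)
  endTime: number; // seconds from stream start (assumed unit)
}

/**
 * Returns the wall-clock time (ms) at which the input stream is assumed to
 * have started, or undefined when the alternative has no usable timestamps.
 */
function deriveInputStartedAt(nowMs: number, alt?: SttAlternative): number | undefined {
  if (!alt || alt.endTime <= 0 || alt.startTime === alt.endTime) {
    return undefined;
  }
  // Align "now" with the end of the transcript: start = now - elapsed.
  return nowMs - alt.endTime * 1000;
}

/** Converts a transcript-relative end time into wall-clock milliseconds. */
function transcriptEndWallClock(inputStartedAt: number, alt: SttAlternative): number {
  return inputStartedAt + alt.endTime * 1000;
}
```

Once the reference point exists, every later comparison can use `transcriptEndWallClock` so that transcripts and `ignoreUserTranscriptUntil` share one timebase.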

Comment on lines +28 to +29
UserInterruptionDetected = 'user_interruption_detected',
UserNonInterruptionDetected = 'user_non_interruption_detected',

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

rg -n -C3 "user_interruption_detected|UserInterruptionDetected" agents/src

Repository: livekit/agents-js

Length of output: 3493


🏁 Script executed:

rg -n -C3 "user_non_interruption_detected|UserNonInterruptionDetected" agents/src

Repository: livekit/agents-js

Length of output: 3209


🏁 Script executed:

# Also check the events.ts file to understand the event structure
cat -n agents/src/voice/events.ts

Repository: livekit/agents-js

Length of output: 8320


Add InterruptionEvent type definition and payload typings to events.ts.

The new UserInterruptionDetected and UserNonInterruptionDetected events (lines 28-29) are emitted through the public AgentSession API, but their payload type (InterruptionEvent) is not integrated into events.ts. Following the established pattern for other events in this file:

  • Import or define the InterruptionEvent type
  • Add createUserInterruptionDetectedEvent and createUserNonInterruptionDetectedEvent helper functions
  • Add both event types to the AgentEvent union (currently lines 254-263)
🤖 Prompt for AI Agents
In `@agents/src/voice/events.ts` around lines 28 - 29, The file is missing the
InterruptionEvent type integration: import or define InterruptionEvent, add
factory helpers createUserInterruptionDetectedEvent and
createUserNonInterruptionDetectedEvent mirroring the pattern of existing event
creators, and include the two new event types (UserInterruptionDetected and
UserNonInterruptionDetected) in the AgentEvent union so the public AgentSession
API payloads are properly typed; reference the existing event creator functions
and AgentEvent union to match shape and typing conventions.
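
A rough sketch of what the requested typings could look like. The payload fields (`timestamp`, `probability`) and the `createdAt` property are assumptions made for illustration; the real `InterruptionEvent` and the existing creator helpers in events.ts may be shaped differently.

```typescript
// Illustrative payload only; the real InterruptionEvent fields may differ.
interface InterruptionEvent {
  timestamp: number; // ms since epoch (assumed)
  probability: number; // detector confidence in [0, 1] (assumed)
}

type UserInterruptionDetectedEvent = {
  type: 'user_interruption_detected';
  createdAt: number;
  interruption: InterruptionEvent;
};

type UserNonInterruptionDetectedEvent = {
  type: 'user_non_interruption_detected';
  createdAt: number;
  interruption: InterruptionEvent;
};

// Stand-in for the two new members of the AgentEvent union.
type InterruptionAgentEvent = UserInterruptionDetectedEvent | UserNonInterruptionDetectedEvent;

function createUserInterruptionDetectedEvent(
  interruption: InterruptionEvent,
  createdAt: number = Date.now(),
): UserInterruptionDetectedEvent {
  return { type: 'user_interruption_detected', createdAt, interruption };
}

function createUserNonInterruptionDetectedEvent(
  interruption: InterruptionEvent,
  createdAt: number = Date.now(),
): UserNonInterruptionDetectedEvent {
  return { type: 'user_non_interruption_detected', createdAt, interruption };
}

// The string discriminant lets listeners narrow the payload type.
function isInterruption(ev: InterruptionAgentEvent): ev is UserInterruptionDetectedEvent {
  return ev.type === 'user_interruption_detected';
}
```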

coderabbitai[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@agents/src/voice/audio_recognition.ts`:
- Around line 881-889: The cleanup closure may call eventReader.releaseLock()
before eventReader is assigned, causing a TypeError; update cleanup (the async
function defined alongside abortHandler) to defensively check that eventReader
is defined (and optionally has releaseLock as a function) before calling
releaseLock, and likewise guard await stream.close() with a null/undefined
check; ensure you still removeEventListener('abort', abortHandler)
unconditionally but only call eventReader.releaseLock() and stream.close() when
those objects are non-null to avoid exceptions during early aborts.
- Around line 256-269: In flushHeldTranscripts(), avoid dropping held
transcripts when inputStartedAt is undefined: instead of clearing
transcriptBuffer and resetting ignoreUserTranscriptUntil when inputStartedAt is
falsy, retain the buffer and either log a warning or postpone flushing until
inputStartedAt is set; update the logic in flushHeldTranscripts (referencing
inputStartedAt, transcriptBuffer, ignoreUserTranscriptUntil) to preserve events
and emit a warning if onEndOfAgentSpeech triggers before any audio frames, or
queue a retry to flush once inputStartedAt becomes defined.
🧹 Nitpick comments (1)
agents/src/voice/audio_recognition.ts (1)

1035-1043: Consider adding an explicit return false for clarity.

The getter implicitly returns undefined for modes like 'manual' and 'stt'. While this works in boolean contexts, an explicit return false would make the intent clearer.

♻️ Suggested improvement
   private get vadBaseTurnDetection() {
     if (typeof this.turnDetectionMode === 'object') {
       return false;
     }
 
     if (this.turnDetectionMode === undefined || this.turnDetectionMode === 'vad') {
       return true;
     }
+    return false;
   }
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b5c541f and 76bd4e8.

📒 Files selected for processing (1)
  • agents/src/voice/audio_recognition.ts
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Add SPDX-FileCopyrightText and SPDX-License-Identifier headers to all newly added files with '// SPDX-FileCopyrightText: 2025 LiveKit, Inc.' and '// SPDX-License-Identifier: Apache-2.0'

Files:

  • agents/src/voice/audio_recognition.ts
**/*.{ts,tsx}?(test|example|spec)

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

When testing inference LLM, always use full model names from agents/src/inference/models.ts (e.g., 'openai/gpt-4o-mini' instead of 'gpt-4o-mini')

Files:

  • agents/src/voice/audio_recognition.ts
**/*.{ts,tsx}?(test|example)

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Initialize logger before using any LLM functionality with initializeLogger({ pretty: true }) from '@livekit/agents'

Files:

  • agents/src/voice/audio_recognition.ts
🔇 Additional comments (12)
agents/src/voice/audio_recognition.ts (12)

1-71: LGTM!

The imports and interface extensions are well-structured. The RecognitionHooks extension with onInterruption and the new AudioRecognitionOptions fields follow the existing patterns in this file.


110-119: LGTM!

The new private fields for interruption state management are well-typed and appropriately scoped.


132-155: LGTM!

The conditional stream setup correctly handles both cases (with and without interruption detection). The 3-way tee for the interruption path and the fallback 2-way tee maintain backward compatibility.


174-197: LGTM!

The start() and stop() methods correctly manage the interruption task lifecycle. The early return in createInterruptionTask handles the case when interruption detection is not configured.


199-250: LGTM!

The agent speech lifecycle methods properly coordinate the interruption state transitions and sentinel dispatching. The use of Math.min for ignoreUserTranscriptUntil correctly handles overlapping end-of-speech events.


319-346: LGTM!

The shouldHoldSttEvent logic correctly handles edge cases: events without alternatives are held, and timestamp comparisons are guarded by inputStartedAt being defined. The initialization of inputStartedAt in createInterruptionTask provides a stable reference point.


348-366: LGTM!

The method has appropriate defensive checks and error handling. Returning a boolean for success/failure allows callers to react accordingly.


390-407: LGTM!

The interruption-aware STT event handling correctly buffers events during interruption states and flushes them when appropriate. The exemption for RECOGNITION_USAGE events is sensible.


579-583: LGTM!

The method cleanly dispatches interruption events to the registered hook.


892-906: LGTM!

The forwardTask correctly sets inputStartedAt once using nullish coalescing assignment, addressing the previous concern about timestamp drift. The async IIFE pattern with proper cleanup in finally is appropriate.


908-934: LGTM!

The event reading loop correctly uses Promise.race for cancellation support and has proper error handling that distinguishes between abort-induced and unexpected errors.


1001-1010: LGTM!

The close method properly cleans up interruption resources in the correct order: canceling the task before closing the channel.


Comment on lines 256 to 269
private async flushHeldTranscripts() {
if (
!this.isInterruptionEnabled ||
this.ignoreUserTranscriptUntil === undefined ||
this.transcriptBuffer.length === 0
) {
return;
}

if (!this.inputStartedAt) {
this.transcriptBuffer = [];
this.ignoreUserTranscriptUntil = undefined;
return;
}

⚠️ Potential issue | 🟡 Minor

Transcripts may be dropped if inputStartedAt is not yet set when flush is called.

When inputStartedAt is undefined (no audio frames processed yet), the buffer is cleared and events are lost. This could happen if onEndOfAgentSpeech is called before any audio frames flow through the interruption task.

Consider holding the events instead of dropping them, or logging a warning when this edge case occurs.

🛠️ Suggested defensive logging
     if (!this.inputStartedAt) {
+      if (this.transcriptBuffer.length > 0) {
+        this.logger.warn(
+          `flushHeldTranscripts: dropping ${this.transcriptBuffer.length} events because inputStartedAt is not set`,
+        );
+      }
       this.transcriptBuffer = [];
       this.ignoreUserTranscriptUntil = undefined;
       return;
     }
🤖 Prompt for AI Agents
In `@agents/src/voice/audio_recognition.ts` around lines 256 - 269, In
flushHeldTranscripts(), avoid dropping held transcripts when inputStartedAt is
undefined: instead of clearing transcriptBuffer and resetting
ignoreUserTranscriptUntil when inputStartedAt is falsy, retain the buffer and
either log a warning or postpone flushing until inputStartedAt is set; update
the logic in flushHeldTranscripts (referencing inputStartedAt, transcriptBuffer,
ignoreUserTranscriptUntil) to preserve events and emit a warning if
onEndOfAgentSpeech triggers before any audio frames, or queue a retry to flush
once inputStartedAt becomes defined.
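
One way to retain events instead of dropping them, sketched in isolation. `HeldTranscripts` is a hypothetical helper, not a class in the PR; it only shows the "warn and defer" behaviour suggested above.

```typescript
// Hypothetical helper: buffers events until a reference time is known.
class HeldTranscripts<T> {
  private buffer: T[] = [];
  private inputStartedAt: number | undefined = undefined;
  private readonly warn: (msg: string) => void;

  constructor(warn: (msg: string) => void) {
    this.warn = warn;
  }

  hold(ev: T): void {
    this.buffer.push(ev);
  }

  /** Sets the reference time once; later calls are ignored. */
  setInputStartedAt(ms: number): void {
    if (this.inputStartedAt === undefined) {
      this.inputStartedAt = ms;
    }
  }

  /** Returns the events to emit. Keeps the buffer intact when no reference time exists yet. */
  flush(): T[] {
    if (this.buffer.length === 0) {
      return [];
    }
    if (this.inputStartedAt === undefined) {
      this.warn(`flush deferred: ${this.buffer.length} held events, inputStartedAt not set`);
      return [];
    }
    const out = this.buffer;
    this.buffer = [];
    return out;
  }
}
```

A caller would retry `flush()` after the first audio frame sets the reference time, so an early `onEndOfAgentSpeech` no longer loses transcripts.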

Comment on lines +881 to +889
const cleanup = async () => {
try {
signal.removeEventListener('abort', abortHandler);
eventReader.releaseLock();
await stream.close();
} catch (e) {
this.logger.debug('createInterruptionTask: error during abort handler:', e);
}
};

⚠️ Potential issue | 🟡 Minor

Potential TypeError if cleanup is called before eventReader is assigned.

The cleanup function references eventReader at line 884, but eventReader is declared at line 909. If signal.aborted is already true when entering the function, or if abort fires during forwardTask before line 909 executes, eventReader will be undefined and releaseLock() will throw.

🛠️ Suggested fix
   const cleanup = async () => {
     try {
       signal.removeEventListener('abort', abortHandler);
-      eventReader.releaseLock();
+      eventReader?.releaseLock();
       await stream.close();
     } catch (e) {
       this.logger.debug('createInterruptionTask: error during abort handler:', e);
     }
   };
🤖 Prompt for AI Agents
In `@agents/src/voice/audio_recognition.ts` around lines 881 - 889, The cleanup
closure may call eventReader.releaseLock() before eventReader is assigned,
causing a TypeError; update cleanup (the async function defined alongside
abortHandler) to defensively check that eventReader is defined (and optionally
has releaseLock as a function) before calling releaseLock, and likewise guard
await stream.close() with a null/undefined check; ensure you still
removeEventListener('abort', abortHandler) unconditionally but only call
eventReader.releaseLock() and stream.close() when those objects are non-null to
avoid exceptions during early aborts.
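
A self-contained sketch of the guarded cleanup. The `AbortSignalLike`, `ReaderLike`, and `ClosableStream` interfaces are minimal stand-ins introduced here so the example does not depend on DOM or Node typings; they are not types from the PR.

```typescript
// Minimal stand-ins for the objects the cleanup touches (assumed shapes).
interface AbortSignalLike {
  removeEventListener(type: 'abort', handler: () => void): void;
}
interface ReaderLike {
  releaseLock(): void;
}
interface ClosableStream {
  close(): Promise<void>;
}

async function cleanup(
  signal: AbortSignalLike,
  abortHandler: () => void,
  eventReader: ReaderLike | undefined,
  stream: ClosableStream | undefined,
  logDebug: (msg: string, err: unknown) => void,
): Promise<void> {
  // Always detach the listener, even if nothing else was initialised.
  signal.removeEventListener('abort', abortHandler);
  try {
    // Optional chaining makes an early abort (reader not yet assigned) a no-op.
    eventReader?.releaseLock();
    await stream?.close();
  } catch (e) {
    logDebug('error during cleanup', e);
  }
}
```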

coderabbitai[bot]

This comment was marked as resolved.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
examples/src/cartesia_tts.ts (1)

26-39: ⚠️ Potential issue | 🟠 Major

Add logger initialization before LLM usage.

The file must call initializeLogger({ pretty: true }) before using any LLM functionality, as required by the examples guideline. Currently, log() is called without prior logger initialization, and openai.LLM() is instantiated without a properly initialized logger.

🔧 Proposed fix
 import {
   type JobContext,
   type JobProcess,
   WorkerOptions,
   cli,
   defineAgent,
+  initializeLogger,
   log,
   metrics,
   voice,
 } from '@livekit/agents';
@@
   entry: async (ctx: JobContext) => {
+    initializeLogger({ pretty: true });
     const agent = new voice.Agent({
       instructions:
         "You are a helpful assistant, you can hear the user's message and respond to it.",
     });
examples/src/basic_agent.ts (1)

26-55: ⚠️ Potential issue | 🟠 Major

Add logger initialization before using LLM/STT/TTS inference.

The example creates inference models (LLM, STT, TTS) before initializing the logger. Add initializeLogger({ pretty: true }) at the start of the entry function, following the pattern used in other examples.

🔧 Proposed fix
 import {
   type JobContext,
   type JobProcess,
   WorkerOptions,
   cli,
   defineAgent,
   inference,
   llm,
+  initializeLogger,
   log,
   metrics,
   voice,
 } from '@livekit/agents';
@@
   entry: async (ctx: JobContext) => {
+    initializeLogger({ pretty: true });
     const agent = new voice.Agent({
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 245bc66 and 63eccca.

📒 Files selected for processing (8)
  • agents/package.json
  • agents/src/voice/agent_activity.ts
  • agents/src/voice/audio_recognition.ts
  • examples/src/basic_agent.ts
  • examples/src/cartesia_tts.ts
  • examples/src/hedra/hedra_avatar.ts
  • plugins/openai/src/llm.ts
  • plugins/openai/src/tts.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • examples/src/hedra/hedra_avatar.ts
  • plugins/openai/src/llm.ts
  • agents/package.json
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Add SPDX-FileCopyrightText and SPDX-License-Identifier headers to all newly added files with '// SPDX-FileCopyrightText: 2025 LiveKit, Inc.' and '// SPDX-License-Identifier: Apache-2.0'

Files:

  • examples/src/cartesia_tts.ts
  • plugins/openai/src/tts.ts
  • examples/src/basic_agent.ts
  • agents/src/voice/agent_activity.ts
  • agents/src/voice/audio_recognition.ts
**/*.{ts,tsx}?(test|example|spec)

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

When testing inference LLM, always use full model names from agents/src/inference/models.ts (e.g., 'openai/gpt-4o-mini' instead of 'gpt-4o-mini')

Files:

  • examples/src/cartesia_tts.ts
  • plugins/openai/src/tts.ts
  • examples/src/basic_agent.ts
  • agents/src/voice/agent_activity.ts
  • agents/src/voice/audio_recognition.ts
**/*.{ts,tsx}?(test|example)

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Initialize logger before using any LLM functionality with initializeLogger({ pretty: true }) from '@livekit/agents'

Files:

  • examples/src/cartesia_tts.ts
  • plugins/openai/src/tts.ts
  • examples/src/basic_agent.ts
  • agents/src/voice/agent_activity.ts
  • agents/src/voice/audio_recognition.ts
🧠 Learnings (3)
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Applies to **/*.{ts,tsx}?(test|example) : Initialize logger before using any LLM functionality with `initializeLogger({ pretty: true })` from '@livekit/agents'

Applied to files:

  • examples/src/cartesia_tts.ts
  • examples/src/basic_agent.ts
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Applies to **/*.{ts,tsx}?(test|example|spec) : When testing inference LLM, always use full model names from `agents/src/inference/models.ts` (e.g., 'openai/gpt-4o-mini' instead of 'gpt-4o-mini')

Applied to files:

  • examples/src/basic_agent.ts
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Use `pnpm build && pnpm dlx tsx ./examples/src/my_agent.ts dev|download-files --log-level=debug|info(default)` to run example agents from the examples directory

Applied to files:

  • examples/src/basic_agent.ts
🧬 Code graph analysis (1)
examples/src/basic_agent.ts (4)
agents/src/voice/agent.ts (1)
  • session (214-216)
plugins/openai/src/realtime/realtime_model.ts (1)
  • session (316-318)
plugins/openai/src/realtime/realtime_model_beta.ts (1)
  • session (304-306)
agents/src/index.ts (2)
  • voice (40-40)
  • metrics (40-40)
🔇 Additional comments (10)
plugins/openai/src/tts.ts (1)

35-46: Clean, low‑risk accessors for model/provider metadata.
Getters are straightforward and defensive; no behavior changes beyond exposing metadata.

agents/src/voice/agent_activity.ts (5)

98-223: Interruption state wiring looks consistent.

Initialization of interruption flags and defaults is cohesive with the new detection flow.


308-321: AudioRecognition hookup is clean.

Passing the interruption detector and endpointing delay boundaries into AudioRecognition looks correct.


413-442: Runtime turnDetection propagation is solid.

The updateOptions path correctly syncs internal flags and forwards the change to AudioRecognition.


725-785: Interruption gating and onInterruption hook integration look good.

The min-words gating and new onInterruption path are well-aligned with the updated flow.


1348-1496: Telemetry metadata propagation is a nice touch.

Passing model/provider into LLM/TTS inference improves trace quality without side effects.

agents/src/voice/audio_recognition.ts (4)

116-257: Interruption pipeline wiring looks good.

The sentinel flow and overlap/agent speech hooks are well integrated.


407-424: STT hold/flush logic is coherent.

The buffering and release flow reads cleanly and should be stable under overlap.


900-961: Interruption task lifecycle looks robust.

Forwarding frames and handling output events is well-structured and cancel-safe.


846-855: Nice addition of STT model/provider attributes.

This improves traceability for user-turn spans.


devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

View 17 additional findings in Devin Review.


Comment on lines +2607 to +2613
);
return undefined;
}
}

if (
(interruptionDetection !== undefined && interruptionDetection === false) ||

🔴 AdaptiveInterruptionDetector created when mode is undefined (default), causing unintended API calls

The resolveInterruptionDetector method creates an AdaptiveInterruptionDetector when the interruption mode is undefined (the default), even though the user never opted in to adaptive interruption detection. This means every session meeting the prerequisites (VAD + streaming aligned STT + pipeline LLM) silently activates barge-in inference, making HTTP or WebSocket requests to the LiveKit inference service.

Root Cause

The guard logic in resolveInterruptionDetector only returns undefined for mode === false or mode === 'vad'. When mode is undefined (the defaultInterruptionConfig.mode), the function falls through and creates a detector:

// Line 2607-2611
if (
  (interruptionDetection !== undefined && interruptionDetection === false) ||
  interruptionDetection === 'vad'
) {
  return undefined;
}

const detector = new AdaptiveInterruptionDetector(); // ← always reached when mode is undefined

Since undefined !== undefined is false, the first condition short-circuits to false, and undefined === 'vad' is also false. The code falls through to create the detector.

Expected: The detector should only be created when mode === 'adaptive'.
Actual: The detector is created whenever mode is undefined (the default) and prerequisites are met.

Impact: Every default-configured session with VAD and streaming STT will make unnecessary inference API calls to the barge-in detection service, wasting resources and potentially incurring costs. Users must explicitly set mode: false to prevent this.

Suggested change
   );
   return undefined;
   }
   }
   if (
     (interruptionDetection !== undefined && interruptionDetection === false) ||
-    interruptionDetection === 'vad'
+    interruptionDetection === 'vad' ||
+    interruptionDetection === undefined
   ) {
     return undefined;
   }
   const detector = new AdaptiveInterruptionDetector();
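
An alternative to enumerating every opt-out value is to test for the opt-in value directly. The sketch below assumes the mode values named in this comment (`'adaptive'`, `'vad'`, `false`, `undefined`); `resolveDetector`, `prerequisitesMet`, and `createDetector` are hypothetical names used only for illustration.

```typescript
type InterruptionMode = 'adaptive' | 'vad' | false | undefined;

/**
 * Creates a detector only when the user explicitly opted in.
 * `createDetector` stands in for `new AdaptiveInterruptionDetector()`.
 */
function resolveDetector<D>(
  mode: InterruptionMode,
  prerequisitesMet: boolean,
  createDetector: () => D,
): D | undefined {
  if (mode !== 'adaptive' || !prerequisitesMet) {
    return undefined;
  }
  return createDetector();
}
```

With a positive check, a new mode value added later cannot fall through to detector creation by accident.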

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

View 21 additional findings in Devin Review.


Comment on lines 1077 to 1085
   private get vadBaseTurnDetection() {
-    return ['vad', undefined].includes(this.turnDetectionMode);
+    if (typeof this.turnDetectionMode === 'object') {
+      return false;
+    }
+
+    if (this.turnDetectionMode === undefined || this.turnDetectionMode === 'vad') {
+      return true;
+    }
   }

🟡 vadBaseTurnDetection getter missing explicit return for non-'vad' string modes

The vadBaseTurnDetection getter in AudioRecognition does not have an explicit return false at the end. When turnDetectionMode is a string other than 'vad' (e.g., 'stt', 'manual', 'realtime_llm'), the function implicitly returns undefined.

Root Cause

At agents/src/voice/audio_recognition.ts:1077-1085, the getter handles typeof === 'object' and undefined || 'vad' but has no fallback return false for other string values. While undefined is falsy and currently behaves identically to false in all call sites (e.g., if (this.vadBaseTurnDetection || this.userTurnCommitted)), this is fragile and could cause subtle issues if the return value is ever compared with === false or used in a more type-strict context.

Impact: Currently no functional impact since undefined is falsy, but violates the expected boolean contract of the getter.

Suggested change
   private get vadBaseTurnDetection() {
     if (typeof this.turnDetectionMode === 'object') {
       return false;
     }
     if (this.turnDetectionMode === undefined || this.turnDetectionMode === 'vad') {
       return true;
     }
+    return false;
   }
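
A standalone sketch of the same logic with an explicit `boolean` return type. With strict null checks enabled, TypeScript should then reject a version that is missing the final return. The `TurnDetectionMode` union is an assumption assembled from the mode names mentioned in this comment, and the free function stands in for the getter.

```typescript
// Assumed union; the real turnDetectionMode type may include other members.
type TurnDetectionMode = 'vad' | 'stt' | 'manual' | 'realtime_llm' | object | undefined;

function vadBaseTurnDetection(mode: TurnDetectionMode): boolean {
  // A detector instance (object) means model-based turn detection, not VAD.
  if (typeof mode === 'object') {
    return false;
  }
  // Only the default (undefined) and explicit 'vad' use VAD-based detection.
  return mode === undefined || mode === 'vad';
}
```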

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 3 new potential issues.

View 23 additional findings in Devin Review.


}

-    if (ev.speechDuration >= this.agentSession.options.minInterruptionDuration) {
+    if (ev.speechDuration >= this.agentSession.options.turnHandling.interruption?.minDuration) {

🔴 Unit mismatch: speechDuration (seconds) compared against minDuration (milliseconds) in onVADInferenceDone

In onVADInferenceDone, ev.speechDuration is documented in seconds (agents/src/vad.ts:34: /** Duration of the speech segment in seconds. */), but minDuration (default 500) is in milliseconds (from agents/src/voice/turn_config/interruption.ts:22: /** Minimum speech length in milliseconds */). The comparison ev.speechDuration >= ...minDuration effectively requires >=500 seconds of speech to trigger an interruption, making the duration-based check permanently disabled.

Root Cause

The old code had the same comparison (ev.speechDuration >= this.agentSession.options.minInterruptionDuration) and was already broken. The migration to the new nested config at agents/src/voice/agent_activity.ts:725 preserved the unit mismatch:

if (ev.speechDuration >= this.agentSession.options.turnHandling.interruption?.minDuration) {

With the default minDuration of 500 (ms) and speechDuration of e.g. 0.5 (seconds for 500ms of speech), the comparison 0.5 >= 500 is always false.

Impact: The speech-duration-based interruption guard in onVADInferenceDone never fires. The agent never calls interruptByAudioActivity() based on VAD speech duration alone, relying entirely on other paths (interim transcripts, final transcripts) for interruption.

Suggested change
-    if (ev.speechDuration >= this.agentSession.options.turnHandling.interruption?.minDuration) {
+    if (ev.speechDuration * 1000 >= this.agentSession.options.turnHandling.interruption?.minDuration) {
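
Encoding the units in parameter names is one way to keep this kind of mismatch from returning. A minimal sketch, assuming `speechDuration` is in seconds and `minDuration` in milliseconds as documented above; `exceedsMinInterruptionDuration` is a hypothetical helper, and treating an unset threshold as "never exceeded" simply mirrors how `x >= undefined` evaluates to false.

```typescript
/**
 * Compares a VAD speech duration (seconds) against a threshold (milliseconds).
 * Returns false when no threshold is configured (assumption, see lead-in).
 */
function exceedsMinInterruptionDuration(
  speechDurationSec: number,
  minDurationMs: number | undefined,
): boolean {
  if (minDurationMs === undefined) {
    return false;
  }
  return speechDurationSec * 1000 >= minDurationMs;
}
```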

Comment on lines +34 to +49
const turnHandling: TurnHandlingConfig = {
interruption: {
discardAudioIfUninterruptible: mergedOptions?.discardAudioIfUninterruptible,
minDuration: mergedOptions?.minInterruptionDuration,
minWords: mergedOptions?.minInterruptionWords,
},
endpointing: {
minDelay: mergedOptions?.minEndpointingDelay,
maxDelay: mergedOptions?.maxEndpointingDelay,
},

...mergedOptions.turnHandling,
// Restore original turnDetection after spread to preserve class instance with methods
// (structuredClone converts class instances to plain objects, losing prototype methods)
turnDetection: originalTurnDetection,
} as const;

🟡 Deprecated flat interruption/endpointing fields silently dropped when turnHandling sub-object is also provided

In migrateLegacyOptions, the initial turnHandling object is built from deprecated flat fields (e.g. discardAudioIfUninterruptible, minInterruptionWords), then ...mergedOptions.turnHandling is spread over it. Since JavaScript spread does a shallow merge, if the user also sets turnHandling.interruption, the entire interruption sub-object is replaced — losing any deprecated flat fields that were set.

Root Cause

At agents/src/voice/turn_config/utils.ts:34-49:

const turnHandling: TurnHandlingConfig = {
    interruption: {
      discardAudioIfUninterruptible: mergedOptions?.discardAudioIfUninterruptible,
      minDuration: mergedOptions?.minInterruptionDuration,
      minWords: mergedOptions?.minInterruptionWords,
    },
    endpointing: { ... },
    ...mergedOptions.turnHandling,  // <-- replaces entire `interruption`/`endpointing` objects
    turnDetection: originalTurnDetection,
  };

For example, if a user passes { discardAudioIfUninterruptible: false, turnHandling: { interruption: { minDuration: 2000 } } }, the spread replaces the entire interruption object with { minDuration: 2000 }, losing discardAudioIfUninterruptible: false. After mergeWithDefaults, the field reverts to its default of true.

Impact: Users who mix deprecated flat fields with the new nested turnHandling config will have their deprecated field values silently ignored, potentially changing agent behavior (e.g. audio discarding being unexpectedly re-enabled).

Prompt for agents
In agents/src/voice/turn_config/utils.ts lines 34-49, the shallow spread of `...mergedOptions.turnHandling` replaces entire `interruption` and `endpointing` sub-objects, losing any deprecated flat fields that were also set. To fix this, deep-merge the legacy fields with the turnHandling sub-objects instead of relying on object-level spread. For example, build the `interruption` object as: `interruption: { ...legacyInterruptionFields, ...mergedOptions.turnHandling?.interruption }` where `legacyInterruptionFields` contains the deprecated flat field values (with undefined stripped). Apply the same approach for the `endpointing` sub-object. This ensures deprecated flat fields are preserved unless explicitly overridden by the nested config.
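
The per-sub-object merge described in the prompt above could look roughly like this. The option and config field names are taken from the snippet in this comment; `stripUndefined`, `migrateTurnHandling`, and the simplified interfaces are hypothetical and omit `turnDetection` and the defaults step.

```typescript
interface InterruptionConfig {
  discardAudioIfUninterruptible?: boolean;
  minDuration?: number;
  minWords?: number;
}
interface EndpointingConfig {
  minDelay?: number;
  maxDelay?: number;
}
interface LegacyOptions {
  discardAudioIfUninterruptible?: boolean;
  minInterruptionDuration?: number;
  minInterruptionWords?: number;
  minEndpointingDelay?: number;
  maxEndpointingDelay?: number;
  turnHandling?: { interruption?: InterruptionConfig; endpointing?: EndpointingConfig };
}

// Drops keys whose value is undefined so they cannot overwrite set values.
function stripUndefined<T extends object>(obj: T): Partial<T> {
  const out: Partial<T> = {};
  for (const key of Object.keys(obj) as (keyof T)[]) {
    if (obj[key] !== undefined) {
      out[key] = obj[key];
    }
  }
  return out;
}

function migrateTurnHandling(options: LegacyOptions): {
  interruption: InterruptionConfig;
  endpointing: EndpointingConfig;
} {
  const legacyInterruption = stripUndefined<InterruptionConfig>({
    discardAudioIfUninterruptible: options.discardAudioIfUninterruptible,
    minDuration: options.minInterruptionDuration,
    minWords: options.minInterruptionWords,
  });
  const legacyEndpointing = stripUndefined<EndpointingConfig>({
    minDelay: options.minEndpointingDelay,
    maxDelay: options.maxEndpointingDelay,
  });
  return {
    // Nested values win, but legacy fields survive when the nested key is absent.
    interruption: {
      ...legacyInterruption,
      ...stripUndefined<InterruptionConfig>(options.turnHandling?.interruption ?? {}),
    },
    endpointing: {
      ...legacyEndpointing,
      ...stripUndefined<EndpointingConfig>(options.turnHandling?.endpointing ?? {}),
    },
  };
}
```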


export const defaultEndpointingConfig = {
minDelay: 500,
maxDelay: 3000,

🔴 maxEndpointingDelay default silently halved from 6000ms to 3000ms

The new defaultEndpointingConfig.maxDelay at agents/src/voice/turn_config/endpointing.ts:25 is 3000 ms, but the previous defaultVoiceOptions.maxEndpointingDelay was 6000 ms. This is a breaking change in default behavior that is not documented in the changeset.

Impact

All existing users who rely on the default maxEndpointingDelay will see the maximum turn-end wait time drop from 6 seconds to 3 seconds. This means the agent will now decide the user's turn is complete much sooner during long pauses, potentially cutting off users who take longer pauses while thinking or speaking in complex sentences. The end-of-turn detection model's predictions are clamped between minDelay and maxDelay, so this directly affects how long the agent waits before responding.

Compare agents/src/voice/turn_config/endpointing.ts:25 (maxDelay: 3000) with the git-baseline default of maxEndpointingDelay: 6000 in the old defaultVoiceOptions.

Suggested change
-  maxDelay: 3000,
+  maxDelay: 6000,
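
To make the effect of the clamp concrete, here is a small sketch. It assumes the end-of-turn logic produces a candidate delay in milliseconds that is then bounded by `minDelay` and `maxDelay`; `clampEndpointingDelay` is a hypothetical name and the real computation may differ.

```typescript
interface EndpointingConfig {
  minDelay: number; // ms
  maxDelay: number; // ms
}

/** Bounds a candidate end-of-turn delay (ms) to the configured window. */
function clampEndpointingDelay(candidateMs: number, cfg: EndpointingConfig): number {
  return Math.min(cfg.maxDelay, Math.max(cfg.minDelay, candidateMs));
}

// A 4.5 s candidate is cut to 3 s under the new default but kept under the old one.
const underNewDefault = clampEndpointingDelay(4500, { minDelay: 500, maxDelay: 3000 });
const underOldDefault = clampEndpointingDelay(4500, { minDelay: 500, maxDelay: 6000 });
```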

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
devin-ai-integration[bot]

This comment was marked as resolved.

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

View 30 additional findings in Devin Review.


Comment on lines +328 to +343
transform(chunk, controller) {
  // Pass through InterruptionEvents unchanged
  if (!(chunk instanceof Int16Array)) {
    controller.enqueue(chunk);
    return;
  }

  const state = getState();
  if (!state.overlapSpeechStartedAt) return;

  try {
    sendAudioData(chunk);
  } catch (err) {
    logger.error({ err }, 'Failed to send audio data over WebSocket');
  }
},

🔴 WebSocket transport reconnect() permanently disconnects because ensureConnection() is never re-called

After reconnect() is invoked (e.g., when options are updated via updateOptions()), the WebSocket is closed and ws is set to null. However, the transform handler never calls ensureConnection() before sendAudioData(). ensureConnection() is only called once in the start() handler at stream creation time.

Root Cause

The reconnect() function at agents/src/inference/interruption/ws_transport.ts:316-319 calls close() which sets ws = null, then relies on a future ensureConnection() call to re-establish the connection:

async function reconnect(): Promise<void> {
    close();  // sets ws = null
    // Connection will be re-established on next ensureConnection call
}

But the transform handler at line 328 calls sendAudioData(chunk) directly, which immediately throws "WebSocket not connected" because ws is null:

function sendAudioData(audioSlice: Int16Array): void {
    if (!ws || ws.readyState !== WebSocket.OPEN) {
      throw new Error('WebSocket not connected');
    }
    ...

The error is caught at lines 340-342 and only logged, so all subsequent audio data is silently dropped. The WebSocket is never re-established because ensureConnection() (agents/src/inference/interruption/ws_transport.ts:131) is only invoked in the start() callback (lines 324-325), which runs once during TransformStream initialization.

Impact: After any call to updateOptions() on an InterruptionStreamBase using WebSocket transport (useProxy=true), barge-in detection permanently stops working for that stream. All audio chunks are silently discarded.
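The failure mode can be reproduced in isolation with a small sketch. Everything here is a hypothetical stand-in: `FakeSocket` replaces the real WebSocket, and `Transport` only mimics the `ensureConnection()` / `reconnect()` / `sendAudioData()` relationship described above, not the actual code in ws_transport.ts.

```typescript
// Hypothetical stand-in for a WebSocket: tracks open state and what was sent.
class FakeSocket {
  open = true;
  sent: Int16Array[] = [];
  send(data: Int16Array): void {
    if (!this.open) throw new Error('socket closed');
    this.sent.push(data);
  }
  close(): void {
    this.open = false;
  }
}

// Hypothetical transport mirroring the call relationships in the review.
class Transport {
  private ws: FakeSocket | null = null;
  connectCount = 0;

  async ensureConnection(): Promise<void> {
    if (this.ws && this.ws.open) return; // already connected, nothing to do
    this.ws = new FakeSocket();
    this.connectCount++;
  }

  reconnect(): void {
    this.ws?.close();
    this.ws = null; // relies on a later ensureConnection() call
  }

  sendAudioData(chunk: Int16Array): void {
    if (!this.ws || !this.ws.open) throw new Error('WebSocket not connected');
    this.ws.send(chunk);
  }

  // Current behavior: send directly; errors are swallowed, so the chunk is dropped.
  transformWithoutEnsure(chunk: Int16Array): boolean {
    try {
      this.sendAudioData(chunk);
      return true;
    } catch {
      return false;
    }
  }

  // Proposed behavior: re-establish the connection before every send.
  async transformWithEnsure(chunk: Int16Array): Promise<boolean> {
    try {
      await this.ensureConnection();
      this.sendAudioData(chunk);
      return true;
    } catch {
      return false;
    }
  }
}

async function demo(): Promise<{ dropped: boolean; recovered: boolean; connects: number }> {
  const transport = new Transport();
  await transport.ensureConnection(); // what start() does once
  transport.reconnect(); // what updateOptions() triggers
  const dropped = !transport.transformWithoutEnsure(new Int16Array(160));
  const recovered = await transport.transformWithEnsure(new Int16Array(160));
  return { dropped, recovered, connects: transport.connectCount };
}

demo().then((result) => console.log(result));
```

Because `ensureConnection()` returns early when the socket is already open, calling it on every chunk should add little overhead in the steady state; whether that holds for the real implementation depends on how it handles a connection attempt that is still in flight.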

Suggested change
- transform(chunk, controller) {
-   // Pass through InterruptionEvents unchanged
-   if (!(chunk instanceof Int16Array)) {
-     controller.enqueue(chunk);
-     return;
-   }
-   const state = getState();
-   if (!state.overlapSpeechStartedAt) return;
-   try {
-     sendAudioData(chunk);
-   } catch (err) {
-     logger.error({ err }, 'Failed to send audio data over WebSocket');
-   }
- },
+ async transform(chunk, controller) {
+   // Pass through InterruptionEvents unchanged
+   if (!(chunk instanceof Int16Array)) {
+     controller.enqueue(chunk);
+     return;
+   }
+   const state = getState();
+   if (!state.overlapSpeechStartedAt) return;
+   try {
+     await ensureConnection();
+     sendAudioData(chunk);
+   } catch (err) {
+     logger.error({ err }, 'Failed to send audio data over WebSocket');
+   }
+ },

@livekit livekit deleted a comment from kbpranay Feb 17, 2026