feat(voice): add console audio IO and SessionHost audio routing by toubatbrian · Pull Request #1694 · livekit/agents-js

toubatbrian · 2026-06-02T22:00:09Z

Summary

Second PR in the series porting Python's TCP console session support to agents-js (follows #1693, the transport + updateIo plumbing). This adds the audio IO that lets a console-mode session exchange audio with a local broker (the LiveKit CLI lk session daemon).

TcpAudioInput (agents/src/voice/console_io.ts): resamples inbound audio_input frames from the 48 kHz wire rate to the 24 kHz agent rate and feeds them into the base AudioInput stream that the STT pipeline reads from.
TcpAudioOutput: resamples the agent's TTS frames back up to the wire rate, streams them as audio_output messages, and drives the flush/clear playout handshake. After a flush, the agent turn waits until the broker reports audio_playback_finished; if the buffer is cleared first, the output reports an interruption with a clamped playback position.
SessionHost now accepts optional audioInput/audioOutput and routes inbound audio_input / audio_playback_finished messages to them in recvLoop.
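The SessionHost routing amounts to a switch over the inbound message case. A minimal sketch, assuming simplified hypothetical types: `ConsoleFrame`, `InboundMessage`, `AudioInputSink`, `AudioOutputSink`, and `SessionHostSketch` are illustrative names, the inbound case names are assumed by analogy with the `audioPlaybackFlush` case shown in the review diffs, and the real recvLoop handles other message kinds as well.

```typescript
// Hypothetical, simplified shapes. The real messages are protobuf
// AgentSessionMessage values; the inbound case names are assumed here.
interface ConsoleFrame {
  data: Int16Array;
  sampleRate: number; // 48 kHz on the wire
  numChannels: number;
}

type InboundMessage =
  | { case: 'audioInput'; value: ConsoleFrame }
  | { case: 'audioPlaybackFinished'; value: Record<string, never> }
  | { case: undefined; value?: undefined };

interface AudioInputSink {
  pushFrame(frame: ConsoleFrame): void;
}

interface AudioOutputSink {
  notifyPlayoutFinished(): void;
}

class SessionHostSketch {
  // Both IO objects are optional, as in the PR.
  constructor(
    private readonly audioInput?: AudioInputSink,
    private readonly audioOutput?: AudioOutputSink,
  ) {}

  // One receive-loop step: returns true if the message was routed to audio IO.
  dispatch(msg: InboundMessage): boolean {
    switch (msg.case) {
      case 'audioInput':
        if (!this.audioInput) return false;
        this.audioInput.pushFrame(msg.value);
        return true;
      case 'audioPlaybackFinished':
        if (!this.audioOutput) return false;
        this.audioOutput.notifyPlayoutFinished();
        return true;
      default:
        return false; // other message kinds are handled elsewhere (not shown)
    }
  }
}
```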

Notes / divergences from the Python port

Python's TcpAudioInput uses a stdlib queue + run_in_executor to bridge the producer and consumer event loops under JobExecutorType.THREAD. The JS console job runs in-process on a single event loop, so a StreamChannel is sufficient — no cross-thread queue.
Time is tracked in milliseconds, following JS conventions; PlaybackFinishedEvent.playbackPosition is still reported in seconds to match the base AudioOutput contract.
The SessionHost audio fields are typed via import type from console_io.ts (the TS equivalent of Python's TYPE_CHECKING import) so there's no runtime import cycle.
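Because the JS console job runs producer and consumer on one event loop, a plain buffered channel is enough to bridge them. The sketch below is a hypothetical `SimpleChannel`, not the `StreamChannel` the PR uses; it only illustrates the single-loop pattern and the drop-after-close behavior that the test plan lists for TcpAudioInput.

```typescript
// Hypothetical stand-in for StreamChannel: producer and consumer share one
// event loop, so a plain array plus pending-reader callbacks is enough.
class SimpleChannel<T> {
  private buffer: T[] = [];
  private readers: Array<(item: T | undefined) => void> = [];
  private closed = false;

  // Returns false and drops the item once the channel is closed.
  push(item: T): boolean {
    if (this.closed) return false;
    const reader = this.readers.shift();
    if (reader) reader(item); // hand directly to a waiting consumer
    else this.buffer.push(item);
    return true;
  }

  // Resolves with the next item, or undefined once closed and drained.
  read(): Promise<T | undefined> {
    if (this.buffer.length > 0) return Promise.resolve(this.buffer.shift());
    if (this.closed) return Promise.resolve(undefined);
    return new Promise((resolve) => this.readers.push(resolve));
  }

  close(): void {
    this.closed = true;
    // Wake any pending readers with end-of-stream.
    for (const reader of this.readers.splice(0)) reader(undefined);
  }

  get pending(): number {
    return this.buffer.length;
  }
}
```

No locks or executor threads are involved: every push and read runs on the same loop, which is the property the note above relies on.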

Reference

Ports from Python livekit-agents cli/tcp_console.py (TcpAudioInput/TcpAudioOutput) and voice/remote_session.py (SessionHost._dispatch_transport_message).

Test plan

New agents/src/voice/console_io.test.ts (5 cases, all green):

TcpAudioInput resamples 48 kHz wire frames to 24 kHz and exposes them on the stream
TcpAudioInput drops frames pushed after close
TcpAudioOutput streams resampled audio_output + audio_playback_flush, and the flush handshake completes (uninterrupted) on notifyPlayoutFinished
TcpAudioOutput reports interruption (clamped position) when the buffer is cleared mid-playout
SessionHost routes audio_input -> pushFrame and audio_playback_finished -> notifyPlayoutFinished

Also verified: pnpm build:agents, ESLint, and Prettier clean on the changed files. The existing room-based path is untouched.
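The flush/clear handshake and the milliseconds-to-seconds reporting exercised by the TcpAudioOutput test cases above can be sketched as a small tracker. Assumptions: `PlayoutTracker` and its methods are hypothetical; the interrupted position comes from a caller-supplied estimate (how the PR derives it is not shown in this thread); and the methods also return the event synchronously only to make the sketch easy to check.

```typescript
// Assumed value, per the PR description: 24 kHz agent rate (48 kHz on the wire).
// A 10 ms frame is 240 samples at 24 kHz and 480 samples on the wire.
const AGENT_SAMPLE_RATE = 24000;

// Simplified stand-in for the base class's PlaybackFinishedEvent.
interface PlaybackFinished {
  playbackPosition: number; // seconds, as the base AudioOutput contract expects
  interrupted: boolean;
}

class PlayoutTracker {
  private pushedMs = 0; // audio sent for the current segment, in ms
  private waiter: ((ev: PlaybackFinished) => void) | undefined = undefined;

  // Record a captured TTS frame of `samples` samples at the agent rate.
  captureFrame(samples: number): void {
    this.pushedMs += (samples / AGENT_SAMPLE_RATE) * 1000;
  }

  // Segment end: the agent turn awaits this promise.
  flush(): Promise<PlaybackFinished> {
    return new Promise((resolve) => {
      this.waiter = resolve;
    });
  }

  // Broker sent audio_playback_finished: the whole segment played.
  notifyPlayoutFinished(): PlaybackFinished {
    return this.finish(this.pushedMs, false);
  }

  // Buffer cleared mid-playout. `playedMsEstimate` is a hypothetical input,
  // e.g. derived from wall-clock time since playback started.
  clearBuffer(playedMsEstimate: number): PlaybackFinished {
    return this.finish(playedMsEstimate, true);
  }

  private finish(playedMs: number, interrupted: boolean): PlaybackFinished {
    // Clamp so the reported position never exceeds the audio actually sent.
    const clampedMs = Math.min(Math.max(playedMs, 0), this.pushedMs);
    const event: PlaybackFinished = { playbackPosition: clampedMs / 1000, interrupted };
    const waiter = this.waiter;
    this.waiter = undefined;
    this.pushedMs = 0; // next segment starts from zero
    waiter?.(event); // unblock the agent turn awaiting flush()
    return event;
  }
}
```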

Follow-up (next stacked PR)

PR3: unregistered/console run path on the worker + JobContext fake-job support + CLI console subcommand wiring the transport + audio IO + SessionHost together.

Made with Cursor

Port the Python tcp_console audio IO to agents-js. TcpAudioInput resamples inbound audio_input frames from the 48 kHz wire rate to the 24 kHz agent rate and feeds them to the STT pipeline; TcpAudioOutput resamples the agent's TTS frames back up, streams them as audio_output messages, and drives the flush/clear playout handshake (blocking the agent turn until the broker reports audio_playback_finished, or reporting an interruption when the buffer is cleared). SessionHost now accepts optional audio IO and routes inbound audio_input/audio_playback_finished messages to it.

Co-authored-by: Cursor <cursoragent@cursor.com>


changeset-bot · 2026-06-02T22:00:20Z

🦋 Changeset detected

Latest commit: 126992f

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 34 packages

Name	Type
@livekit/agents	Patch
@livekit/agents-plugin-anam	Patch
@livekit/agents-plugin-assemblyai	Patch
@livekit/agents-plugin-baseten	Patch
@livekit/agents-plugin-bey	Patch
@livekit/agents-plugin-cartesia	Patch
@livekit/agents-plugin-cerebras	Patch
@livekit/agents-plugin-deepgram	Patch
@livekit/agents-plugin-elevenlabs	Patch
@livekit/agents-plugin-fishaudio	Patch
@livekit/agents-plugin-google	Patch
@livekit/agents-plugin-hedra	Patch
@livekit/agents-plugin-hume	Patch
@livekit/agents-plugin-inworld	Patch
@livekit/agents-plugin-lemonslice	Patch
@livekit/agents-plugin-liveavatar	Patch
@livekit/agents-plugin-livekit	Patch
@livekit/agents-plugin-minimax	Patch
@livekit/agents-plugin-mistral	Patch
@livekit/agents-plugin-mistralai	Patch
@livekit/agents-plugin-neuphonic	Patch
@livekit/agents-plugin-openai	Patch
@livekit/agents-plugin-perplexity	Patch
@livekit/agents-plugin-phonic	Patch
@livekit/agents-plugin-resemble	Patch
@livekit/agents-plugin-rime	Patch
@livekit/agents-plugin-runway	Patch
@livekit/agents-plugin-sarvam	Patch
@livekit/agents-plugin-silero	Patch
@livekit/agents-plugin-soniox	Patch
@livekit/agents-plugin-tavus	Patch
@livekit/agents-plugins-test	Patch
@livekit/agents-plugin-trugen	Patch
@livekit/agents-plugin-xai	Patch


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 126992faef


chatgpt-codex-connector · 2026-06-02T22:02:37Z

+    void this.transport.sendMessage(
+      new pb.AgentSessionMessage({
+        message: {
+          case: 'audioPlaybackFlush',
+          value: new pb.AgentSessionMessage_ConsoleIO_AudioPlaybackFlush(),
+        },
+      }),
+    );


Drain the resampler before flushing playback

When a TTS segment ends, this sends audioPlaybackFlush without first emitting any frames returned by this.resampler.flush(). AudioResampler can hold tail samples until it is flushed (the existing resampling paths in utils.ts/generation.ts explicitly drain it at stream boundaries), so console playback can clip the end of each response or carry buffered samples into the next segment before the broker is told playback is complete. Please write the flushed resampled frames before sending the playback flush.
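A sketch of the suggested ordering, under assumed minimal interfaces: `Resampler` is modeled on the push/flush pattern the reviewers describe for `AudioResampler`, and `Frame`, `OutboundMessage`, `Transport`, and `DrainingOutput` are hypothetical stand-ins for the real audio frames, protobuf messages, and `SessionTransport`.

```typescript
// Hypothetical minimal types. Frames are reduced to a sample count; the real
// code uses rtc AudioFrame values and protobuf AgentSessionMessage messages.
interface Frame {
  samples: number;
}

// Modeled on the push/flush pattern the reviewers describe for AudioResampler.
interface Resampler {
  push(frame: Frame): Frame[];
  flush(): Frame[]; // returns any buffered tail samples
}

type OutboundMessage = { case: 'audioOutput'; frame: Frame } | { case: 'audioPlaybackFlush' };

interface Transport {
  sendMessage(msg: OutboundMessage): Promise<void>;
}

class DrainingOutput {
  constructor(
    private readonly transport: Transport,
    private readonly resampler: Resampler,
  ) {}

  captureFrame(frame: Frame): void {
    for (const out of this.resampler.push(frame)) {
      void this.transport.sendMessage({ case: 'audioOutput', frame: out });
    }
  }

  flush(): void {
    // Send the resampler's buffered tail first so it stays in this segment...
    for (const tail of this.resampler.flush()) {
      void this.transport.sendMessage({ case: 'audioOutput', frame: tail });
    }
    // ...and only then mark the segment as complete for the broker.
    void this.transport.sendMessage({ case: 'audioPlaybackFlush' });
  }
}
```

The ordering here relies on `sendMessage` taking its place in the write order synchronously when called, as the fake transport in a test would; a real transport that defers that step could still let fire-and-forget calls interleave.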


devin-ai-integration

Devin Review found 2 potential issues (5 additional findings are available in Devin Review).

devin-ai-integration · 2026-06-02T22:07:53Z

+  override flush(): void {
+    super.flush();
+    void this.transport.sendMessage(
+      new pb.AgentSessionMessage({
+        message: {
+          case: 'audioPlaybackFlush',
+          value: new pb.AgentSessionMessage_ConsoleIO_AudioPlaybackFlush(),
+        },
+      }),
+    );


🟡 Internal resampler not flushed in TcpAudioOutput.flush(), losing tail audio and leaking samples across segments

When TcpAudioOutput.flush() is called, the internal AudioResampler (24 kHz → 48 kHz) is not flushed. The resampler buffers a small number of trailing samples internally during the push() call in captureFrame() (console_io.ts:111). On flush, those buffered samples are never sent to the broker. Instead, they silently leak into the first frame of the next audio segment (when the next captureFrame() pushes data through the same resampler). This is inconsistent with how every other resampler in the codebase is used—see generation.ts:901-904, utils.ts:768-795, fallback_adapter.ts:366,529, recorder_io.ts:284—where resampler.flush() is always called at segment boundaries. The remaining frames should be sent as audioOutput messages before the audioPlaybackFlush marker.

Prompt for agents

In TcpAudioOutput.flush() (console_io.ts:120-129), the internal AudioResampler (this.resampler, 24kHz -> 48kHz) is never flushed, losing its internal buffer of trailing samples. The fix needs to call this.resampler.flush() and send the remaining resampled frames via the transport BEFORE sending the audioPlaybackFlush message. However, since flush() is a synchronous void method (matching the base class signature), the transport.sendMessage calls for flushed frames must also be fire-and-forget (using void). The key concern is ordering: the flushed audio frames must be sent before the flush marker so the broker receives them as part of the current segment. Since both are dispatched via void (fire-and-forget) in the same synchronous call, the microtask ordering should be preserved. Add a loop like: for (const resampled of this.resampler.flush()) { void this.transport.sendMessage(...audioOutput message with rtcFrameToConsole(resampled)...); } before the audioPlaybackFlush sendMessage call.


devin-ai-integration · 2026-06-02T22:07:54Z

+  private interrupted = new Future();
+
+  constructor(transport: SessionTransport) {
+    super(AGENT_SAMPLE_RATE, undefined, { pause: true });


🟡 TcpAudioOutput declares pause capability but never implements pause()/resume()

The constructor passes { pause: true } to the base class (console_io.ts:92), causing canPause to return true. However, TcpAudioOutput does not override pause() or resume(), and with no nextInChain, the inherited base methods are no-ops. When the pipeline's resumeFalseInterruption feature is enabled, agent_activity.ts:3842-3849 checks canPause and uses pause()/resume() to handle suspected false interruptions. Because these calls do nothing on TcpAudioOutput, audio continues playing during the "paused" state, defeating the false-interruption detection. Compare with RoomAudioOutput (room_io/_output.ts:389,400-415) which also declares { pause: true } but actually implements both methods.

Suggested change

-    super(AGENT_SAMPLE_RATE, undefined, { pause: true });
+    super(AGENT_SAMPLE_RATE, undefined, { pause: false });
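The reviewer's point is that a capability flag should match what the class actually overrides. A hypothetical sketch (`OutputBase`, `ConsoleOutputSketch`, and `handleSuspectedInterruption` are illustrative names, not the agents-js API) of a caller that only relies on pause()/resume() when `canPause` is true:

```typescript
// Hypothetical simplified base: capabilities are advertised once, at construction.
class OutputBase {
  constructor(private readonly capabilities: { pause: boolean }) {}

  get canPause(): boolean {
    return this.capabilities.pause;
  }

  // Base no-ops, standing in for the inherited methods the review describes.
  pause(): void {}
  resume(): void {}
}

// Console output after the suggested change: no pause()/resume() override,
// so it does not advertise pause support.
class ConsoleOutputSketch extends OutputBase {
  constructor() {
    super({ pause: false });
  }
}

// Hypothetical caller modeled on the false-interruption handling: it only
// relies on pause() when the output claims it can pause.
function handleSuspectedInterruption(output: OutputBase): 'paused' | 'not-paused' {
  if (output.canPause) {
    output.pause();
    return 'paused';
  }
  return 'not-paused';
}
```

With `{ pause: false }`, the caller takes the non-pausing path instead of calling a no-op `pause()` while the broker keeps playing audio.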


toubatbrian added the verified-port label Jun 3, 2026

toubatbrian mentioned this pull request on Jun 3, 2026 in #1706, feat(console): console CLI runner + AgentsConsole session wiring (text mode) (open, 4 tasks).


toubatbrian wants to merge 1 commit into main from brian/node-console-audio.
