
feat(voice): add console audio IO and SessionHost audio routing#1694

Open
toubatbrian wants to merge 1 commit into main from brian/node-console-audio
Conversation

@toubatbrian
Contributor

Summary

Second PR in the series porting Python's TCP console session support to agents-js (follows #1693, the transport + updateIo plumbing). This adds the audio IO that lets a console-mode session exchange audio with a local broker (the LiveKit CLI lk session daemon).

  • TcpAudioInput (agents/src/voice/console_io.ts) — resamples inbound audio_input frames from the 48 kHz wire rate to the 24 kHz agent rate and feeds them into the base AudioInput stream the STT pipeline reads from.
  • TcpAudioOutput — resamples the agent's TTS frames back up to the wire rate, streams them as audio_output messages, and drives the flush/clear playout handshake: a flush blocks the agent turn until the broker reports audio_playback_finished, or reports an interruption (with a clamped playback position) when the buffer is cleared.
  • SessionHost now accepts optional audioInput/audioOutput and routes inbound audio_input / audio_playback_finished messages to them in recvLoop.
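
The flush/clear playout handshake described above can be sketched roughly as follows. This is a simplified model, not the code in this PR: `SketchPlayoutTracker`, its callback, and the `elapsedMs` argument to `clearBuffer` are hypothetical names chosen for illustration, and no transport or resampling is involved. It only shows the two ways a segment can end: the broker reports that playback finished, or the buffer is cleared and an interruption is reported with a clamped position.

```typescript
// Hypothetical, simplified model of the flush/clear handshake (not the PR's code).
type PlaybackFinished = { playbackPosition: number; interrupted: boolean }; // position in seconds

class SketchPlayoutTracker {
  private pushedMs = 0; // audio handed to the broker for the current segment, in ms
  private flushed = false; // true once the segment end has been signalled
  private readonly onFinished: (ev: PlaybackFinished) => void;

  constructor(onFinished: (ev: PlaybackFinished) => void) {
    this.onFinished = onFinished;
  }

  // Record the duration of a frame that was sent to the broker.
  captureFrame(durationMs: number): void {
    this.pushedMs += durationMs;
  }

  // Segment end: from here on the turn waits for the broker's report.
  flush(): void {
    this.flushed = true;
  }

  // Broker reported audio_playback_finished: the whole segment was played.
  notifyPlayoutFinished(): void {
    if (!this.flushed) return;
    this.finish(this.pushedMs, false);
  }

  // Buffer cleared mid-playout. elapsedMs is an assumed wall-clock estimate;
  // it is clamped so the reported position never exceeds what was pushed.
  clearBuffer(elapsedMs: number): void {
    if (this.pushedMs === 0) return;
    const playedMs = Math.min(Math.max(elapsedMs, 0), this.pushedMs);
    this.finish(playedMs, true);
  }

  private finish(playedMs: number, interrupted: boolean): void {
    this.pushedMs = 0;
    this.flushed = false;
    // Time is tracked in ms internally and reported in seconds.
    this.onFinished({ playbackPosition: playedMs / 1000, interrupted });
  }
}

const playoutEvents: PlaybackFinished[] = [];
const tracker = new SketchPlayoutTracker((ev) => {
  playoutEvents.push(ev);
});

// Segment 1: played to completion.
tracker.captureFrame(500);
tracker.flush();
tracker.notifyPlayoutFinished();

// Segment 2: cleared mid-playout with an elapsed estimate larger than what was pushed.
tracker.captureFrame(250);
tracker.clearBuffer(800);
```

In this sketch the second event reports 0.25 s rather than 0.8 s, because the position is clamped to the amount of audio actually pushed for the segment.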

Notes / divergences from the Python port

  • Python's TcpAudioInput uses a stdlib queue + run_in_executor to bridge the producer and consumer event loops under JobExecutorType.THREAD. The JS console job runs in-process on a single event loop, so a StreamChannel is sufficient — no cross-thread queue.
  • Time is tracked in milliseconds per the JS conventions; PlaybackFinishedEvent.playbackPosition is reported in seconds to match the base AudioOutput contract.
  • The SessionHost audio fields are typed via import type from console_io.ts (the TS equivalent of Python's TYPE_CHECKING import) so there's no runtime import cycle.

Reference

Ports from Python livekit-agents cli/tcp_console.py (TcpAudioInput/TcpAudioOutput) and voice/remote_session.py (SessionHost._dispatch_transport_message).

Test plan

New agents/src/voice/console_io.test.ts (5 cases, all green):

  • TcpAudioInput resamples 48 kHz wire frames to 24 kHz and exposes them on the stream
  • TcpAudioInput drops frames pushed after close
  • TcpAudioOutput streams resampled audio_output + audio_playback_flush, and the flush handshake completes (uninterrupted) on notifyPlayoutFinished
  • TcpAudioOutput reports interruption (clamped position) when the buffer is cleared mid-playout
  • SessionHost routes audio_input -> pushFrame and audio_playback_finished -> notifyPlayoutFinished
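
As a rough illustration of the routing exercised by the last test case, the sketch below dispatches on a message's `case` field. The message shapes, the two interfaces, and `SketchSessionHost` are hypothetical stand-ins, not the protobuf types or the `SessionHost` class from this PR; only the method names `pushFrame` and `notifyPlayoutFinished` come from the description above.

```typescript
// Hypothetical message shapes and handler interfaces, for illustration only.
type InboundMessage =
  | { case: 'audioInput'; samples: Int16Array }
  | { case: 'audioPlaybackFinished' }
  | { case: 'other' };

interface SketchInputSink {
  pushFrame(samples: Int16Array): void;
}

interface SketchOutputSink {
  notifyPlayoutFinished(): void;
}

class SketchSessionHost {
  // Both sinks are optional, mirroring "accepts optional audioInput/audioOutput".
  private readonly audioInput: SketchInputSink | undefined;
  private readonly audioOutput: SketchOutputSink | undefined;

  constructor(opts: { audioInput?: SketchInputSink; audioOutput?: SketchOutputSink } = {}) {
    this.audioInput = opts.audioInput;
    this.audioOutput = opts.audioOutput;
  }

  // Route one inbound message; unrelated messages are ignored here.
  dispatch(msg: InboundMessage): void {
    switch (msg.case) {
      case 'audioInput':
        this.audioInput?.pushFrame(msg.samples);
        break;
      case 'audioPlaybackFinished':
        this.audioOutput?.notifyPlayoutFinished();
        break;
      default:
        break;
    }
  }
}

const routedCalls: string[] = [];
const sketchHost = new SketchSessionHost({
  audioInput: {
    pushFrame: (samples) => {
      routedCalls.push(`pushFrame:${samples.length}`);
    },
  },
  audioOutput: {
    notifyPlayoutFinished: () => {
      routedCalls.push('notifyPlayoutFinished');
    },
  },
});

sketchHost.dispatch({ case: 'audioInput', samples: new Int16Array(480) });
sketchHost.dispatch({ case: 'audioPlaybackFinished' });
sketchHost.dispatch({ case: 'other' });
```

Because both sinks are optional, a host constructed without audio IO simply drops these messages in this sketch, which is one way to leave a non-audio path unaffected.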

Also verified: pnpm build:agents, ESLint, and Prettier clean on the changed files. The existing room-based path is untouched.

Follow-up (next stacked PR)

  • PR3: unregistered/console run path on the worker + JobContext fake-job support + CLI console subcommand wiring the transport + audio IO + SessionHost together.

Made with Cursor

Port the Python tcp_console audio IO to agents-js. TcpAudioInput resamples
inbound audio_input frames from the 48 kHz wire rate to the 24 kHz agent
rate and feeds them to the STT pipeline; TcpAudioOutput resamples the
agent's TTS frames back up, streams them as audio_output messages, and
drives the flush/clear playout handshake (blocking the agent turn until the
broker reports audio_playback_finished, or reporting an interruption when
the buffer is cleared). SessionHost now accepts optional audio IO and routes
inbound audio_input/audio_playback_finished messages to it.

Co-authored-by: Cursor <cursoragent@cursor.com>
@changeset-bot

changeset-bot Bot commented Jun 2, 2026

🦋 Changeset detected

Latest commit: 126992f

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 34 packages
Name Type
@livekit/agents Patch
@livekit/agents-plugin-anam Patch
@livekit/agents-plugin-assemblyai Patch
@livekit/agents-plugin-baseten Patch
@livekit/agents-plugin-bey Patch
@livekit/agents-plugin-cartesia Patch
@livekit/agents-plugin-cerebras Patch
@livekit/agents-plugin-deepgram Patch
@livekit/agents-plugin-elevenlabs Patch
@livekit/agents-plugin-fishaudio Patch
@livekit/agents-plugin-google Patch
@livekit/agents-plugin-hedra Patch
@livekit/agents-plugin-hume Patch
@livekit/agents-plugin-inworld Patch
@livekit/agents-plugin-lemonslice Patch
@livekit/agents-plugin-liveavatar Patch
@livekit/agents-plugin-livekit Patch
@livekit/agents-plugin-minimax Patch
@livekit/agents-plugin-mistral Patch
@livekit/agents-plugin-mistralai Patch
@livekit/agents-plugin-neuphonic Patch
@livekit/agents-plugin-openai Patch
@livekit/agents-plugin-perplexity Patch
@livekit/agents-plugin-phonic Patch
@livekit/agents-plugin-resemble Patch
@livekit/agents-plugin-rime Patch
@livekit/agents-plugin-runway Patch
@livekit/agents-plugin-sarvam Patch
@livekit/agents-plugin-silero Patch
@livekit/agents-plugin-soniox Patch
@livekit/agents-plugin-tavus Patch
@livekit/agents-plugins-test Patch
@livekit/agents-plugin-trugen Patch
@livekit/agents-plugin-xai Patch


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 126992faef

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +122 to +129
void this.transport.sendMessage(
  new pb.AgentSessionMessage({
    message: {
      case: 'audioPlaybackFlush',
      value: new pb.AgentSessionMessage_ConsoleIO_AudioPlaybackFlush(),
    },
  }),
);

P2: Drain the resampler before flushing playback

When a TTS segment ends, this sends audioPlaybackFlush without first emitting any frames returned by this.resampler.flush(). AudioResampler can hold tail samples until it is flushed (the existing resampling paths in utils.ts/generation.ts explicitly drain it at stream boundaries), so console playback can clip the end of each response or carry buffered samples into the next segment before the broker is told playback is complete. Please write the flushed resampled frames before sending the playback flush.

Contributor

@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 2 potential issues.

View 5 additional findings in Devin Review.


Comment on lines +120 to +129
override flush(): void {
  super.flush();
  void this.transport.sendMessage(
    new pb.AgentSessionMessage({
      message: {
        case: 'audioPlaybackFlush',
        value: new pb.AgentSessionMessage_ConsoleIO_AudioPlaybackFlush(),
      },
    }),
  );

🟡 Internal resampler not flushed in TcpAudioOutput.flush(), losing tail audio and leaking samples across segments

When TcpAudioOutput.flush() is called, the internal AudioResampler (24 kHz → 48 kHz) is not flushed. The resampler buffers a small number of trailing samples internally during the push() call in captureFrame() (console_io.ts:111). On flush, those buffered samples are never sent to the broker. Instead, they silently leak into the first frame of the next audio segment (when the next captureFrame() pushes data through the same resampler). This is inconsistent with how every other resampler in the codebase is used—see generation.ts:901-904, utils.ts:768-795, fallback_adapter.ts:366,529, recorder_io.ts:284—where resampler.flush() is always called at segment boundaries. The remaining frames should be sent as audioOutput messages before the audioPlaybackFlush marker.

Prompt for agents
In TcpAudioOutput.flush() (console_io.ts:120-129), the internal AudioResampler (this.resampler, 24kHz -> 48kHz) is never flushed, losing its internal buffer of trailing samples. The fix needs to call this.resampler.flush() and send the remaining resampled frames via the transport BEFORE sending the audioPlaybackFlush message. However, since flush() is a synchronous void method (matching the base class signature), the transport.sendMessage calls for flushed frames must also be fire-and-forget (using void). The key concern is ordering: the flushed audio frames must be sent before the flush marker so the broker receives them as part of the current segment. Since both are dispatched via void (fire-and-forget) in the same synchronous call, the microtask ordering should be preserved. Add a loop like: for (const resampled of this.resampler.flush()) { void this.transport.sendMessage(...audioOutput message with rtcFrameToConsole(resampled)...); } before the audioPlaybackFlush sendMessage call.
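
The ordering concern can be illustrated with a toy model. The sketch below is not the PR's code and does not use the real `AudioResampler`: `ToyUpsampler2x` is a hypothetical linear-interpolating 2x upsampler that, like many streaming resamplers, holds back the last input sample until more input arrives or `flush()` is called. `SketchTcpOutput` and the `WireMessage` shape are likewise assumptions. It shows the tail being drained and sent as audio before the flush marker.

```typescript
// Toy 2x upsampler (hypothetical): holds the last input sample until more
// input arrives or flush() is called. Not the real AudioResampler algorithm.
class ToyUpsampler2x {
  private held: number | null = null;

  push(input: number[]): number[] {
    const out: number[] = [];
    for (const sample of input) {
      if (this.held !== null) {
        // Emit the held sample plus the midpoint to the new sample.
        out.push(this.held, (this.held + sample) / 2);
      }
      this.held = sample;
    }
    return out;
  }

  // Emit whatever is still buffered and reset the internal state.
  flush(): number[] {
    if (this.held === null) return [];
    const tail = [this.held, this.held];
    this.held = null;
    return tail;
  }
}

type WireMessage =
  | { case: 'audioOutput'; samples: number[] }
  | { case: 'audioPlaybackFlush' };

class SketchTcpOutput {
  private readonly resampler = new ToyUpsampler2x();
  private readonly send: (msg: WireMessage) => void;

  constructor(send: (msg: WireMessage) => void) {
    this.send = send;
  }

  captureFrame(samples: number[]): void {
    const resampled = this.resampler.push(samples);
    if (resampled.length > 0) this.send({ case: 'audioOutput', samples: resampled });
  }

  flush(): void {
    // Drain buffered tail samples first so they belong to the current segment...
    const tail = this.resampler.flush();
    if (tail.length > 0) this.send({ case: 'audioOutput', samples: tail });
    // ...and only then tell the broker the segment is complete.
    this.send({ case: 'audioPlaybackFlush' });
  }
}

const sentMessages: WireMessage[] = [];
const sketchOutput = new SketchTcpOutput((msg) => {
  sentMessages.push(msg);
});
sketchOutput.captureFrame([0, 2, 4]);
sketchOutput.flush();
```

Without the drain step in `flush()`, the held sample in this toy model would be emitted at the start of the next segment's first `captureFrame()` call, which is the cross-segment leak both reviews describe.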

private interrupted = new Future();

constructor(transport: SessionTransport) {
  super(AGENT_SAMPLE_RATE, undefined, { pause: true });

🟡 TcpAudioOutput declares pause capability but never implements pause()/resume()

The constructor passes { pause: true } to the base class (console_io.ts:92), causing canPause to return true. However, TcpAudioOutput does not override pause() or resume(), and with no nextInChain, the inherited base methods are no-ops. When the pipeline's resumeFalseInterruption feature is enabled, agent_activity.ts:3842-3849 checks canPause and uses pause()/resume() to handle suspected false interruptions. Because these calls do nothing on TcpAudioOutput, audio continues playing during the "paused" state, defeating the false-interruption detection. Compare with RoomAudioOutput (room_io/_output.ts:389,400-415) which also declares { pause: true } but actually implements both methods.
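
A minimal model of the mismatch, under stated assumptions: `SketchBaseOutput`, the two subclasses, and `onSuspectedFalseInterruption` are hypothetical and only mimic the shape described in this comment (a capability flag set through the constructor, base `pause()`/`resume()` that do nothing, and a caller that gates on `canPause`). They are not the actual `AudioOutput` or `agent_activity.ts` code.

```typescript
// Hypothetical minimal model of an output base class; names are illustrative.
class SketchBaseOutput {
  readonly canPause: boolean;
  playing = true;

  constructor(capabilities: { pause: boolean }) {
    this.canPause = capabilities.pause;
  }

  // Base implementations do nothing, as with an output that has no next in chain.
  pause(): void {}
  resume(): void {}
}

// Declares the capability but inherits the no-op methods.
class DeclaresButDoesNotPause extends SketchBaseOutput {
  constructor() {
    super({ pause: true });
  }
}

// Declares the capability and backs it with real behaviour.
class ActuallyPauses extends SketchBaseOutput {
  constructor() {
    super({ pause: true });
  }

  override pause(): void {
    this.playing = false;
  }

  override resume(): void {
    this.playing = true;
  }
}

// Caller logic: on a suspected false interruption, pause only if supported.
function onSuspectedFalseInterruption(out: SketchBaseOutput): void {
  if (out.canPause) out.pause();
}

const mismatchedOutput = new DeclaresButDoesNotPause();
const honestOutput = new ActuallyPauses();
onSuspectedFalseInterruption(mismatchedOutput);
onSuspectedFalseInterruption(honestOutput);
// mismatchedOutput.playing is still true; honestOutput.playing is now false.
```

In this model the caller cannot tell the two outputs apart, so the flag either has to be backed by overrides or set to `false`, which is the choice the suggested change makes.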

Suggested change
- super(AGENT_SAMPLE_RATE, undefined, { pause: true });
+ super(AGENT_SAMPLE_RATE, undefined, { pause: false });
