feat(tts): add FallbackAdapter for TTS failover support #1022
toubatbrian merged 31 commits into livekit:main
…anisms for TTS instances
…ror handling during TTS synthesis
…kAdapter for enhanced TTS synthesis
…e method for task cancellation and resource cleanup
…ror reporting for TTS instance failures
…apter for better readability and maintainability
…module accessibility
…tries and cleanup
🦋 Changeset detected
Latest commit: 2b95856
The changes in this PR will be included in the next version bump. This PR includes changesets to release 19 packages.
📝 Walkthrough
Adds an exported FallbackAdapter TTS that orchestrates multiple TTS providers with per-provider availability tracking and recovery, unified sample-rate selection and optional per-provider resampling, and public APIs: synthesize, stream, getStreamingInstance, close, plus an AvailabilityChangedEvent type.
Sequence Diagram
sequenceDiagram
participant Client
participant FallbackAdapter
participant TTS1 as TTS_Instance_1
participant TTS2 as TTS_Instance_2
participant Resampler
participant OutputStream
Client->>FallbackAdapter: synthesize(text) / stream(options)
activate FallbackAdapter
FallbackAdapter->>TTS1: request synthesis / start stream
activate TTS1
alt TTS1 returns audio
TTS1-->>FallbackAdapter: audio chunks
else TTS1 errors/unavailable
TTS1-->>FallbackAdapter: error
FallbackAdapter->>FallbackAdapter: mark TTS1 unavailable & schedule recovery
FallbackAdapter->>TTS2: attempt synthesis / start stream (fallback)
activate TTS2
TTS2-->>FallbackAdapter: audio chunks or error
deactivate TTS2
end
deactivate TTS1
alt resampling required
FallbackAdapter->>Resampler: resample chunks to target sample rate/channels
Resampler-->>FallbackAdapter: resampled chunks
end
FallbackAdapter->>OutputStream: write audio chunks
OutputStream-->>Client: deliver audio
deactivate FallbackAdapter
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
…sing block syntax for clearTimeout
…ures during TTS synthesis
…nal options for TTS instance management
@toubatbrian This is ready for review. Please let me know if anything needs to be changed.
… and improve retry logic
…ream and FallbackSynthesizeStream
…rt signals and ensure proper stream closure
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ac026870e7
agents/src/tts/fallback_adapter.ts
Outdated
// Use cached resampler for this TTS instance
const resampler = status.resampler;
if (resampler) {
  for (const frame of resampler.push(audio.frame)) {
Create a resampler per stream to avoid cross-talk
The cached resampler is stored per TTS instance and then reused by every FallbackChunkedStream/FallbackSynthesizeStream call. AudioResampler is stateful (it buffers samples until flush()), so if two syntheses run concurrently on the same TTS instance, their frames will be interleaved in the shared resampler and the flush() from one stream can drain buffered audio from the other. This leads to corrupted or missing audio when the adapter is used for parallel syntheses (a common pattern with multiple speakers/requests). Consider instantiating a new resampler per stream or per call instead of sharing the instance from status.
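The cross-talk described above can be reproduced with a toy model. The sketch below is only an illustration: `ToyResampler` is a hypothetical stand-in that buffers samples and releases complete frames, and it is not the real AudioResampler API. It shows one shared instance mixing two streams, then one instance per stream keeping them apart.

```typescript
// Hypothetical stand-in for a stateful resampler: it buffers samples and
// releases only complete frames. This is NOT the real AudioResampler API.
class ToyResampler {
  private buffer: number[] = [];

  constructor(private readonly frameSize: number) {}

  // Buffer incoming samples and return every complete frame.
  push(samples: number[]): number[][] {
    this.buffer.push(...samples);
    const frames: number[][] = [];
    while (this.buffer.length >= this.frameSize) {
      frames.push(this.buffer.splice(0, this.frameSize));
    }
    return frames;
  }

  // Drain whatever is still buffered.
  flush(): number[] {
    return this.buffer.splice(0, this.buffer.length);
  }
}

// Shared instance: samples from stream A (1s) and stream B (2s) end up in one frame.
const shared = new ToyResampler(4);
shared.push([1, 1]); // stream A, stays buffered
const mixedFrames = shared.push([2, 2]); // stream B completes the frame with A's samples

// Shared instance: B's flush() drains what A had buffered.
shared.push([1, 1, 1]); // stream A
const stolenByB = shared.flush(); // stream B finishes first

// One resampler per stream keeps the buffers separate.
const resamplerA = new ToyResampler(4);
const resamplerB = new ToyResampler(4);
resamplerA.push([1, 1, 1]);
const flushedB = resamplerB.flush(); // nothing of A's leaks into B
const flushedA = resamplerA.flush(); // A gets its own samples back
```

The per-stream variant costs one extra allocation per synthesis, which seems a fair trade for isolation when syntheses can overlap.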
…e createResamplerForTTS method for better management of audio resampling
@toubatbrian I added a fix for that edge case.
@toubatbrian any update on this PR?
agents/src/tts/fallback_adapter.ts
Outdated
const processOutputPromise = processOutput();
let outputError: unknown = null;
try {
  await processOutputPromise;
Should we wait for both promises here?
@toubatbrian Actually, we should not wait for both to finish. The input side only completes once the entire text has been generated, so awaiting it would add delay when switching to the next TTS. I think we should handle the two independently and keep receiving text from the LLM as it arrives.
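A minimal sketch of that reasoning, using timers in place of real streams. Everything here is hypothetical (`attemptWithEarlyFailure`, the delays, the error message); it only shows that awaiting the output side alone lets the fallback start while input forwarding is still in progress.

```typescript
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

interface AttemptResult {
  failedOver: boolean;
  inputDoneAtFailover: boolean;
}

// Hypothetical single attempt against a provider whose output fails early.
async function attemptWithEarlyFailure(): Promise<AttemptResult> {
  let inputDone = false;

  // Input side: stands in for forwarding LLM text to the TTS. It only
  // finishes once the whole text has been generated (about 100 ms here).
  const forwardInput = (async () => {
    for (let i = 0; i < 5; i++) {
      await sleep(20);
    }
    inputDone = true;
  })();
  // Keep a handler attached so a failure on this side never becomes an
  // unhandled promise rejection.
  forwardInput.catch(() => {});

  // Output side: stands in for reading audio. This provider fails after 10 ms.
  const processOutput = (async () => {
    await sleep(10);
    throw new Error('provider returned no audio');
  })();

  try {
    await processOutput; // await only the output side
    return { failedOver: false, inputDoneAtFailover: inputDone };
  } catch {
    // Switch to the next provider right away; input is still being forwarded.
    return { failedOver: true, inputDoneAtFailover: inputDone };
  }
}
```

Had the code awaited both promises before reacting, the switch would have waited for the full 100 ms of input instead of roughly 10 ms.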
…esizeStream for improved stream management and error handling
  return tts;
}
// Wrap non-streaming TTS with StreamAdapter
return new StreamAdapter(tts, new basic.SentenceTokenizer());
🟡 New StreamAdapter with leaked event listeners created on every fallback attempt for non-streaming TTS
getStreamingInstance() at line 144 creates a new StreamAdapter wrapping the underlying TTS instance every time it is called for a non-streaming TTS. Each StreamAdapter constructor adds metrics_collected and error event listeners to the underlying TTS instance (agents/src/tts/stream_adapter.ts:24-29) that are never removed.
Root Cause
Each call to getStreamingInstance(i) for a non-streaming TTS (line 144) does:
return new StreamAdapter(tts, new basic.SentenceTokenizer());
The StreamAdapter constructor at agents/src/tts/stream_adapter.ts:24-29 adds listeners:
this.#tts.on('metrics_collected', (metrics) => {
  this.emit('metrics_collected', metrics);
});
this.#tts.on('error', (error) => {
  this.emit('error', error);
});
These listeners are never removed because the StreamAdapter instance is ephemeral and not tracked. Over many fallback attempts (e.g., repeated failures and retries across multiple stream() calls), listeners accumulate on the underlying TTS instance, eventually triggering Node.js's MaxListenersExceededWarning and causing a memory leak.
Impact: Memory leak and potential MaxListenersExceededWarning in long-running applications with non-streaming TTS providers that experience repeated failures.
Prompt for agents
Cache the StreamAdapter instances per TTS index instead of creating new ones each time. In the FallbackAdapter class, add a private field like `private _streamAdapters: Map<number, StreamAdapter> = new Map();` and modify `getStreamingInstance` to check the cache first:
getStreamingInstance(index: number): TTS {
  const tts = this.ttsInstances[index]!;
  if (tts.capabilities.streaming) {
    return tts;
  }
  let adapter = this._streamAdapters.get(index);
  if (!adapter) {
    adapter = new StreamAdapter(tts, new basic.SentenceTokenizer());
    this._streamAdapters.set(index, adapter);
  }
  return adapter;
}
Also make sure to close the cached StreamAdapters in the close() method.
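To make the leak concrete, here is a small self-contained model. `CountingTTS` and `FakeStreamAdapter` are hypothetical stand-ins written for this illustration (they are not the LiveKit classes); the only behaviour they copy from the report is that the adapter registers a listener in its constructor.

```typescript
type Listener = (payload: unknown) => void;

// Hypothetical stand-in for a TTS instance that can report its listener count.
class CountingTTS {
  private listeners = new Map<string, Listener[]>();

  on(event: string, fn: Listener): void {
    const list = this.listeners.get(event) ?? [];
    list.push(fn);
    this.listeners.set(event, list);
  }

  listenerCount(event: string): number {
    return this.listeners.get(event)?.length ?? 0;
  }
}

// Hypothetical stand-in for StreamAdapter: registers a listener on construction.
class FakeStreamAdapter {
  constructor(tts: CountingTTS) {
    tts.on('error', () => {});
  }
}

// Uncached: every call wraps the TTS again and adds another listener.
const leakyTTS = new CountingTTS();
for (let i = 0; i < 20; i++) {
  new FakeStreamAdapter(leakyTTS);
}
const leakyCount = leakyTTS.listenerCount('error');

// Cached per index: the wrapper is created once and then reused.
const cachedTTS = new CountingTTS();
const adapterCache = new Map<number, FakeStreamAdapter>();

function getCachedAdapter(index: number): FakeStreamAdapter {
  let adapter = adapterCache.get(index);
  if (!adapter) {
    adapter = new FakeStreamAdapter(cachedTTS);
    adapterCache.set(index, adapter);
  }
  return adapter;
}

for (let i = 0; i < 20; i++) {
  getCachedAdapter(0);
}
const cachedCount = cachedTTS.listenerCount('error');
```

In this model the uncached path ends with 20 listeners on the instance and the cached path with 1, which is the difference the suggested `_streamAdapters` map is meant to produce.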
…tream to prevent unhandled promise rejections
for (const tts of this.ttsInstances) {
  tts.removeAllListeners('metrics_collected');
  tts.removeAllListeners('error');
}
🟡 close() uses removeAllListeners which strips listeners added by other code on shared TTS instances
The close() method calls tts.removeAllListeners('metrics_collected') and tts.removeAllListeners('error') on each underlying TTS instance. This removes all listeners for those events, not just the ones registered by the FallbackAdapter.
Detailed Explanation
At lines 279-282, the close() method does:
for (const tts of this.ttsInstances) {
  tts.removeAllListeners('metrics_collected');
  tts.removeAllListeners('error');
}
If the TTS instances are shared with other parts of the application (e.g., used directly elsewhere, or wrapped by another adapter), their listeners for metrics_collected and error will be silently removed. Additionally, the StreamAdapter instances created by getStreamingInstance (agents/src/tts/fallback_adapter.ts:144) also register listeners on the original TTS instances — these are correctly cleaned up, but any listeners registered by the TTS provider itself or other consumers are also removed.
Impact: Other code that registered metrics_collected or error listeners on the TTS instances will stop receiving events after the FallbackAdapter is closed.
Fix: Store references to the specific listener functions added in setupEventForwarding and use tts.off(event, listener) to remove only those specific listeners.
Prompt for agents
In the FallbackAdapter class, store references to the specific listener functions created in setupEventForwarding(), then in close() use tts.off() to remove only those specific listeners instead of removeAllListeners.
For example, add a private field:
private _eventListeners: Map<TTS, { metrics: Function; error: Function }> = new Map();
In setupEventForwarding(), store the listeners:
const metricsListener = (metrics) => this.emit('metrics_collected', metrics);
const errorListener = (error) => this.emit('error', error);
tts.on('metrics_collected', metricsListener);
tts.on('error', errorListener);
this._eventListeners.set(tts, { metrics: metricsListener, error: errorListener });
In close(), remove only the specific listeners:
for (const [tts, listeners] of this._eventListeners) {
  tts.off('metrics_collected', listeners.metrics);
  tts.off('error', listeners.error);
}
this._eventListeners.clear();
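The effect of the targeted removal can be checked with a small model. `EmitterTTS` and `ForwardingAdapter` below are hypothetical stand-ins written for this sketch, not the real classes; the point is only that a listener registered by another consumer keeps receiving events after the adapter closes.

```typescript
type Handler = (payload: unknown) => void;

// Hypothetical stand-in for a TTS instance with on/off/emit.
class EmitterTTS {
  private handlers = new Map<string, Set<Handler>>();

  on(event: string, fn: Handler): void {
    const set = this.handlers.get(event) ?? new Set<Handler>();
    set.add(fn);
    this.handlers.set(event, set);
  }

  off(event: string, fn: Handler): void {
    this.handlers.get(event)?.delete(fn);
  }

  emit(event: string, payload: unknown): void {
    this.handlers.get(event)?.forEach((fn) => fn(payload));
  }
}

// Hypothetical adapter that remembers exactly which listeners it registered.
class ForwardingAdapter {
  readonly forwarded: unknown[] = [];
  private tracked = new Map<EmitterTTS, { metrics: Handler; error: Handler }>();

  constructor(instances: EmitterTTS[]) {
    for (const tts of instances) {
      const metrics: Handler = (m) => {
        this.forwarded.push(m);
      };
      const error: Handler = (e) => {
        this.forwarded.push(e);
      };
      tts.on('metrics_collected', metrics);
      tts.on('error', error);
      this.tracked.set(tts, { metrics, error });
    }
  }

  // Remove only the adapter's own listeners; other consumers are untouched.
  close(): void {
    this.tracked.forEach((listeners, tts) => {
      tts.off('metrics_collected', listeners.metrics);
      tts.off('error', listeners.error);
    });
    this.tracked.clear();
  }
}

const sharedTTS = new EmitterTTS();
const seenByOtherConsumer: unknown[] = [];
sharedTTS.on('metrics_collected', (m) => {
  seenByOtherConsumer.push(m);
});

const forwardingAdapter = new ForwardingAdapter([sharedTTS]);
sharedTTS.emit('metrics_collected', 'before-close');
forwardingAdapter.close();
sharedTTS.emit('metrics_collected', 'after-close');
```

After `close()`, the adapter has forwarded only the first event, while the other consumer has seen both.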
…ucing streamOutputCompleted flag to manage TTS stream completion
…of FallbackSynthesizeStream to improve robustness and prevent unhandled errors
… FallbackSynthesizeStream for improved maintainability
… audio verification to prevent silent failures
@toubatbrian, any update here?
@toubatbrian Thanks man
This adds a FallbackAdapter for TTS that lets you configure multiple TTS providers and automatically switches to the next one if the current one fails. It handles both connection errors and silent failures where the TTS connects but doesn't return any audio. Failed providers are automatically tested in the background and restored when they come back online. It also normalizes sample rates across different providers so you can mix and match TTS services without worrying about audio format differences.
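The behaviour described above can be sketched in a few lines. This is a simplified model under stated assumptions, not the adapter's implementation: `Provider`, `SimpleFallback`, and `recover` are hypothetical names, audio is modelled as a plain number array, and recovery is an explicit call here rather than a background task. Sample-rate normalization is left out.

```typescript
// Hypothetical provider shape; the real adapter wraps LiveKit TTS instances.
interface Provider {
  name: string;
  synthesize(text: string): Promise<number[]>; // audio modelled as samples
}

class SimpleFallback {
  private available: boolean[];

  constructor(private readonly providers: Provider[]) {
    this.available = providers.map(() => true);
  }

  // Try providers in order, skipping the ones currently marked unavailable.
  async synthesize(text: string): Promise<{ provider: string; audio: number[] }> {
    for (let i = 0; i < this.providers.length; i++) {
      const provider = this.providers[i];
      if (!provider || !this.available[i]) continue;
      try {
        const audio = await provider.synthesize(text);
        // A provider that connects but returns nothing counts as a failure.
        if (audio.length === 0) throw new Error(`${provider.name} returned no audio`);
        return { provider: provider.name, audio };
      } catch {
        this.available[i] = false; // skipped until recover() succeeds
      }
    }
    throw new Error('all TTS providers failed');
  }

  // Probe a provider and restore it if it produces audio again.
  async recover(index: number, probeText = 'test'): Promise<boolean> {
    const provider = this.providers[index];
    if (!provider) return false;
    let ok = false;
    try {
      ok = (await provider.synthesize(probeText)).length > 0;
    } catch {
      ok = false;
    }
    this.available[index] = ok;
    return ok;
  }
}

async function fallbackDemo(): Promise<{ first: string; recovered: boolean; second: string }> {
  const primaryState = { healthy: false };
  const fallback = new SimpleFallback([
    {
      name: 'primary',
      synthesize: async () => {
        if (!primaryState.healthy) throw new Error('connection refused');
        return [1, 2, 3];
      },
    },
    { name: 'silent', synthesize: async () => [] },
    { name: 'backup', synthesize: async () => [4, 5, 6] },
  ]);

  const first = (await fallback.synthesize('hello')).provider;
  primaryState.healthy = true; // the primary comes back online
  const recovered = await fallback.recover(0);
  const second = (await fallback.synthesize('hello again')).provider;
  return { first, recovered, second };
}
```

In the demo the first request lands on `backup` because `primary` throws and `silent` yields no audio; once `recover(0)` succeeds, the next request goes back to `primary`.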