
feat(google-cloud): add Google Cloud Text-to-Speech plugin #1671

Open
mshivam019 wants to merge 9 commits into livekit:main from mshivam019:feat/google-cloud-tts

Conversation

@mshivam019
Contributor

Summary

  • Adds @livekit/agents-plugin-google-cloud using the @google-cloud/text-to-speech client library
  • gRPC bidirectional streaming: SynthesizeStream via TextToSpeechClient.streamingSynthesize() with sentence tokenization
  • REST one-shot synthesis: ChunkedStream via TextToSpeechClient.synthesizeSpeech() with LINEAR16 WAV -> PCM extraction
  • updateOptions() for runtime voice/language/gender changes
  • Credentials follow the standard Google Cloud auth chain: credentials object -> keyFilename -> GOOGLE_APPLICATION_CREDENTIALS -> ADC
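
The "LINEAR16 WAV -> PCM extraction" step can be sketched roughly as follows. This is a minimal illustration, not the code from plugins/google-cloud/src/tts.ts: the helper name extractPcmFromWav is hypothetical, and it assumes the response carries a standard RIFF/WAVE header whose "data" chunk holds the samples.

```typescript
// Hypothetical helper, not taken from the PR: locate the "data" chunk of a
// RIFF/WAVE buffer and return its payload as raw PCM bytes.
function extractPcmFromWav(wav: Uint8Array): Uint8Array {
  const view = new DataView(wav.buffer, wav.byteOffset, wav.byteLength);
  const tag = (offset: number): string =>
    String.fromCharCode(wav[offset]!, wav[offset + 1]!, wav[offset + 2]!, wav[offset + 3]!);

  // No RIFF/WAVE header: assume the payload is already raw PCM.
  if (wav.byteLength < 12 || tag(0) !== 'RIFF' || tag(8) !== 'WAVE') {
    return wav;
  }

  // Walk the chunk list: 4-byte id, 4-byte little-endian size, then payload.
  let offset = 12;
  while (offset + 8 <= wav.byteLength) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true);
    const start = offset + 8;
    if (id === 'data') {
      // Clamp in case the declared size exceeds the bytes actually present.
      const end = Math.min(start + size, wav.byteLength);
      return wav.subarray(start, end);
    }
    // Chunks are word-aligned, so odd sizes carry one padding byte.
    offset = start + size + (size % 2);
  }
  throw new Error('WAV buffer has no "data" chunk');
}
```

Walking the chunk list instead of skipping a fixed 44 bytes keeps the sketch correct when extra chunks (for example LIST) precede the audio data.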

Why this plugin vs the existing google.beta.TTS

The existing google.beta.TTS (Gemini TTS) uses @google/genai, which does not support streaming synthesis. This plugin uses the @google-cloud/text-to-speech client, which supports gRPC bidirectional streaming. Streaming is needed for Gemini Flash TTS models like gemini-3.1-flash-tts-preview as well as classic models (Journey, Chirp 3, Standard, WaveNet).

The LiveKit docs show "Available in: [ ] Node.js, [x] Python" — this closes the Node.js gap.

Credentials

import { TTS } from '@livekit/agents-plugin-google-cloud';

// Option 1: credentials object
const ttsFromCredentials = new TTS({
  credentials: { client_email: '...', private_key: '...' },
});

// Option 2: key file path
const ttsFromKeyFile = new TTS({
  keyFilename: '/path/to/service-account.json',
});

// Option 3: GOOGLE_APPLICATION_CREDENTIALS env var or gcloud ADC (auto-detected)
const ttsFromEnvironment = new TTS();

Streaming synthesis

  • Uses TextToSpeechClient.streamingSynthesize() for gRPC bidirectional streaming
  • Input text sentence-tokenized via tokenize.basic.SentenceTokenizer
  • Audio decoded via AudioByteStream, emitted as SynthesizedAudio frames
  • Proper abort signal handling: cleans up gRPC call via call.cancel()/call.destroy()
  • AudioByteStream.flush() called on stream end to prevent trailing audio truncation
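
To show why the final flush matters, here is a minimal sketch of a fixed-size framer. FrameChunker is a hypothetical stand-in written for this comment, not the AudioByteStream class from @livekit/agents, and its method names are assumptions. Bytes that do not fill a whole frame stay buffered until flush() is called, so skipping the flush would drop the tail of the audio.

```typescript
// Hypothetical stand-in for a fixed-size audio framer; this is NOT the
// AudioByteStream implementation from @livekit/agents.
class FrameChunker {
  private readonly frameBytes: number;
  private pending: Uint8Array = new Uint8Array(0);

  constructor(frameBytes: number) {
    this.frameBytes = frameBytes;
  }

  // Append bytes and return every complete frame that is now available.
  push(chunk: Uint8Array): Uint8Array[] {
    const merged = new Uint8Array(this.pending.length + chunk.length);
    merged.set(this.pending, 0);
    merged.set(chunk, this.pending.length);

    const frames: Uint8Array[] = [];
    let offset = 0;
    while (merged.length - offset >= this.frameBytes) {
      frames.push(merged.slice(offset, offset + this.frameBytes));
      offset += this.frameBytes;
    }
    // Keep the incomplete remainder for the next push() or flush().
    this.pending = merged.slice(offset);
    return frames;
  }

  // Emit whatever is left as a final, possibly shorter, frame.
  flush(): Uint8Array[] {
    if (this.pending.length === 0) return [];
    const tail = this.pending;
    this.pending = new Uint8Array(0);
    return [tail];
  }
}
```

With a 4-byte frame size, pushing 6 bytes yields one frame and leaves 2 bytes buffered; only flush() releases them.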

Files changed

| File | Change |
| --- | --- |
| plugins/google-cloud/package.json | New package: @livekit/agents-plugin-google-cloud |
| plugins/google-cloud/src/tts.ts | TTS, SynthesizeStream, ChunkedStream classes |
| plugins/google-cloud/src/models.ts | TTSModel, TTSGender, TTSLanguage types |
| plugins/google-cloud/src/index.ts | Plugin registration + exports |
| plugins/google-cloud/tsconfig.json | Extends root tsconfig |
| plugins/google-cloud/tsup.config.ts | Extends root tsup config |
| plugins/google-cloud/api-extractor.json | Extends shared API extractor config |
| plugins/google-cloud/README.md | Installation, auth, usage docs |
| .changeset/google-cloud-tts-plugin.md | Changeset entry |
| pnpm-lock.yaml | Lock @google-cloud/text-to-speech dependency |

Usage

import { TTS } from '@livekit/agents-plugin-google-cloud';

// Streaming synthesis with Gemini Flash TTS
const streamingTts = new TTS({
  modelName: 'gemini-3.1-flash-tts-preview',
  voiceName: 'Zephyr',
  language: 'en-IN',
});

// Non-streaming with standard voices
const standardTts = new TTS({
  language: 'en-US',
  voiceName: 'en-US-Standard-H',
  streaming: false,
});

Verification

  • tsc --noEmit — zero errors
  • eslint src/**/*.ts — zero errors
  • tsup build — CJS + ESM bundles compile cleanly

Add @livekit/agents-plugin-google-cloud using the
@google-cloud/text-to-speech client library. Supports both gRPC
bidirectional streaming and REST-based synthesis.

The existing google.beta.TTS uses @google/genai (Gemini API) which
does not support streaming. This plugin uses the Google Cloud TTS
client which supports streaming with Gemini Flash TTS models like
gemini-3.1-flash-tts-preview, as well as standard models (Journey,
Chirp 3, Standard, WaveNet).

Credentials follow the standard Google Cloud auth chain: credentials
object -> keyFilename -> GOOGLE_APPLICATION_CREDENTIALS -> ADC.
devin-ai-integration[bot]

This comment was marked as resolved.

- Remove queue.close() from ChunkedStream finally (base class handles retry)
- Remove tokenizer.close() from SynthesizeStream finally (breaks retry path)
- Skip toLiveKitTtsError wrapping for existing APIConnectionError/APIStatusError
- Fix voiceName type from TTSLanguage to string (semantically misleading)
- Log warning when gender overrides explicit voiceName
- Restore updateOptions method (dropped during squash)
@changeset-bot

changeset-bot Bot commented Jun 1, 2026

🦋 Changeset detected

Latest commit: cca2c86

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 35 packages
| Name | Type |
| --- | --- |
| @livekit/agents-plugin-google-cloud | Patch |
| @livekit/agents | Patch |
| @livekit/agents-plugin-anam | Patch |
| @livekit/agents-plugin-assemblyai | Patch |
| @livekit/agents-plugin-baseten | Patch |
| @livekit/agents-plugin-bey | Patch |
| @livekit/agents-plugin-cartesia | Patch |
| @livekit/agents-plugin-cerebras | Patch |
| @livekit/agents-plugin-deepgram | Patch |
| @livekit/agents-plugin-elevenlabs | Patch |
| @livekit/agents-plugin-fishaudio | Patch |
| @livekit/agents-plugin-google | Patch |
| @livekit/agents-plugin-hedra | Patch |
| @livekit/agents-plugin-hume | Patch |
| @livekit/agents-plugin-inworld | Patch |
| @livekit/agents-plugin-lemonslice | Patch |
| @livekit/agents-plugin-liveavatar | Patch |
| @livekit/agents-plugin-livekit | Patch |
| @livekit/agents-plugin-minimax | Patch |
| @livekit/agents-plugin-mistral | Patch |
| @livekit/agents-plugin-mistralai | Patch |
| @livekit/agents-plugin-neuphonic | Patch |
| @livekit/agents-plugin-openai | Patch |
| @livekit/agents-plugin-perplexity | Patch |
| @livekit/agents-plugin-phonic | Patch |
| @livekit/agents-plugin-resemble | Patch |
| @livekit/agents-plugin-rime | Patch |
| @livekit/agents-plugin-runway | Patch |
| @livekit/agents-plugin-sarvam | Patch |
| @livekit/agents-plugin-silero | Patch |
| @livekit/agents-plugin-soniox | Patch |
| @livekit/agents-plugin-tavus | Patch |
| @livekit/agents-plugins-test | Patch |
| @livekit/agents-plugin-trugen | Patch |
| @livekit/agents-plugin-xai | Patch |


devin-ai-integration[bot]

This comment was marked as resolved.

Avoid wrapping existing API errors, prevent streaming audio flush after gRPC errors, and ensure streaming call/tokenizer cleanup does not interfere with retry handling. Resolve the pnpm lockfile conflict after rebasing on main.
Use the Google gax cancellable unary call for synthesizeSpeech so aborting a ChunkedStream cancels the in-flight RPC. Pass the connection timeout through CallOptions and add updateOptions warnings for gender-derived Standard voices overriding voice selection.
Reject pending streaming writes when the gRPC stream closes before drain, and destroy failed streaming calls with an error so concurrent tasks settle during cleanup. Treat DEADLINE_EXCEEDED as retryable for Google Cloud TTS errors.
Treat gax CANCELLED rejections as normal ChunkedStream aborts when the abort signal is set. Also document that Google Cloud TTS numeric provider errors are gRPC status codes with explicit retryability.
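
The error handling described in the commits above could be sketched roughly like this. The numeric values are the standard gRPC status codes; everything else is illustrative. classifyGrpcError and the exact retryable set are assumptions: the commit messages only confirm that DEADLINE_EXCEEDED is retryable and that CANCELLED counts as a normal abort when the abort signal is set.

```typescript
// Numeric values follow the gRPC status code definitions.
const GrpcStatus = {
  CANCELLED: 1,
  DEADLINE_EXCEEDED: 4,
  RESOURCE_EXHAUSTED: 8,
  UNAVAILABLE: 14,
} as const;

// Assumption: only DEADLINE_EXCEEDED is confirmed by the commit messages;
// the other entries are illustrative choices for this sketch.
const RETRYABLE_CODES = new Set<number>([
  GrpcStatus.DEADLINE_EXCEEDED,
  GrpcStatus.RESOURCE_EXHAUSTED,
  GrpcStatus.UNAVAILABLE,
]);

type Classification = 'aborted' | 'retryable' | 'fatal';

// Hypothetical helper: decide how a failed call should be treated.
function classifyGrpcError(code: number | undefined, abortRequested: boolean): Classification {
  // A CANCELLED rejection caused by our own abort is not an error.
  if (code === GrpcStatus.CANCELLED && abortRequested) return 'aborted';
  if (code !== undefined && RETRYABLE_CODES.has(code)) return 'retryable';
  return 'fatal';
}
```

A CANCELLED status without a pending abort is still classified as fatal here, since in that case the cancellation did not originate from the caller.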
devin-ai-integration[bot]

This comment was marked as resolved.

Keep a no-op error listener attached when destroying a failed Google Cloud streaming call with an error. Node streams may emit destroy errors asynchronously, so removing the listener immediately can still produce an unhandled error.
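
A minimal sketch of that pattern with a plain Node stream is below. destroyWithError is a hypothetical helper name, and PassThrough merely stands in for the gRPC call object, which is assumed here to behave like a Node stream.

```typescript
import { PassThrough, type Readable } from 'node:stream';

// Hypothetical helper: destroy a stream with an error without risking an
// unhandled 'error' event.
function destroyWithError(stream: Readable, error: Error): void {
  // Node emits 'error' on a later tick, so the listener has to stay attached
  // after destroy() returns; removing it right away would be too early.
  stream.on('error', () => {});
  stream.destroy(error);
}

// PassThrough stands in for the streaming call in this sketch.
const call = new PassThrough();
destroyWithError(call, new Error('stream failed'));
```

Without the listener, the asynchronous 'error' event would have no handler and Node would raise it as an uncaught exception.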
devin-ai-integration[bot]

This comment was marked as resolved.

Abort per-attempt tokenization instead of closing the shared SynthesizeStream input queue. This lets cleanup settle without poisoning the base retry path. Add abortable AsyncIterableQueue.next support for the plugin cleanup path.
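
The idea of an abortable next() can be sketched as follows. AbortableQueue is a hypothetical, simplified queue written for this comment; it is not the AsyncIterableQueue from @livekit/agents and its signatures are assumptions. The point is that aborting one pending read rejects only that read, while the queue itself stays open for a later retry attempt.

```typescript
// Hypothetical, simplified queue; NOT the AsyncIterableQueue implementation.
class AbortableQueue<T> {
  private items: T[] = [];
  private waiters: Array<(value: T) => void> = [];

  put(item: T): void {
    const waiter = this.waiters.shift();
    if (waiter) waiter(item);
    else this.items.push(item);
  }

  // Resolve with the next item, or reject if the signal aborts first.
  next(signal?: AbortSignal): Promise<T> {
    if (signal?.aborted) return Promise.reject(new Error('aborted'));
    if (this.items.length > 0) return Promise.resolve(this.items.shift() as T);

    return new Promise<T>((resolve, reject) => {
      const onAbort = (): void => {
        // Drop this waiter so a later put() is not consumed by it.
        this.waiters = this.waiters.filter((w) => w !== waiter);
        reject(new Error('aborted'));
      };
      const waiter = (value: T): void => {
        signal?.removeEventListener('abort', onAbort);
        resolve(value);
      };
      this.waiters.push(waiter);
      signal?.addEventListener('abort', onAbort, { once: true });
    });
  }
}
```

Because the aborted waiter is removed rather than the queue being closed, items put after the abort are still delivered to the next reader.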
devin-ai-integration[bot]

This comment was marked as resolved.
