feat(google-cloud): add Google Cloud Text-to-Speech plugin #1671
Open
mshivam019 wants to merge 9 commits into
Add @livekit/agents-plugin-google-cloud using the @google-cloud/text-to-speech client library. Supports both gRPC bidirectional streaming and REST-based synthesis. The existing google.beta.TTS uses @google/genai (Gemini API) which does not support streaming. This plugin uses the Google Cloud TTS client which supports streaming with Gemini Flash TTS models like gemini-3.1-flash-tts-preview, as well as standard models (Journey, Chirp 3, Standard, WaveNet). Credentials follow the standard Google Cloud auth chain: credentials object -> keyFilename -> GOOGLE_APPLICATION_CREDENTIALS -> ADC.
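The credential order described above can be pictured with a small sketch. The option shape and the function name `resolveCredentialSource` are hypothetical and only illustrate the precedence. In practice the Google auth library resolves GOOGLE_APPLICATION_CREDENTIALS and ADC on its own when no explicit credentials are passed.

```typescript
// Hypothetical option shape for illustration; the plugin's real options may differ.
interface CredentialOptions {
  credentials?: { client_email: string; private_key: string };
  keyFilename?: string;
}

type CredentialSource =
  | 'credentials'
  | 'keyFilename'
  | 'GOOGLE_APPLICATION_CREDENTIALS'
  | 'ADC';

// Returns which source would win under the precedence stated in the PR description.
function resolveCredentialSource(
  opts: CredentialOptions,
  env: Record<string, string | undefined>,
): CredentialSource {
  if (opts.credentials) return 'credentials';
  if (opts.keyFilename) return 'keyFilename';
  if (env['GOOGLE_APPLICATION_CREDENTIALS']) return 'GOOGLE_APPLICATION_CREDENTIALS';
  // Nothing explicit: fall back to Application Default Credentials.
  return 'ADC';
}
```

Passing the environment as a parameter keeps the sketch easy to exercise without touching process state.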
- Remove queue.close() from ChunkedStream finally (base class handles retry)
- Remove tokenizer.close() from SynthesizeStream finally (breaks retry path)
- Skip toLiveKitTtsError wrapping for existing APIConnectionError/APIStatusError
- Fix voiceName type from TTSLanguage to string (semantically misleading)
- Log warning when gender overrides explicit voiceName
- Restore updateOptions method (dropped during squash)
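A rough sketch of the "skip wrapping" item, assuming a converter roughly like the one named in the commit. The two error classes below are local stand-ins for the LiveKit ones, and `toTtsError` is a hypothetical name, not the plugin's actual helper.

```typescript
// Local stand-ins for the LiveKit error classes named in the commit message.
class APIConnectionError extends Error {}

class APIStatusError extends Error {
  statusCode: number;
  constructor(message: string, statusCode: number) {
    super(message);
    this.statusCode = statusCode;
  }
}

// Hypothetical converter: errors that are already API errors pass through
// untouched, so their status and retry information is not lost by re-wrapping.
function toTtsError(err: unknown): Error {
  if (err instanceof APIConnectionError || err instanceof APIStatusError) {
    return err;
  }
  const message = err instanceof Error ? err.message : String(err);
  return new APIConnectionError(`Google Cloud TTS request failed: ${message}`);
}
```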
🦋 Changeset detected. Latest commit: cca2c86. The changes in this PR will be included in the next version bump. This PR includes changesets to release 35 packages.
Avoid wrapping existing API errors, prevent streaming audio flush after gRPC errors, and ensure streaming call/tokenizer cleanup does not interfere with retry handling. Resolve the pnpm lockfile conflict after rebasing on main.
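To illustrate the flush behaviour this commit describes, here is a minimal sketch. `FrameBuffer` is a simplified, hypothetical stand-in for a byte-to-frame buffer and is not the real AudioByteStream API; `finishStream` is likewise an assumed helper name.

```typescript
// Simplified stand-in for a byte-to-frame buffer (not the real AudioByteStream).
class FrameBuffer {
  private pending = new Uint8Array(0);
  private frameBytes: number;

  constructor(frameBytes: number) {
    this.frameBytes = frameBytes;
  }

  // Appends a chunk and returns every complete frame now available.
  write(chunk: Uint8Array): Uint8Array[] {
    const merged = new Uint8Array(this.pending.length + chunk.length);
    merged.set(this.pending, 0);
    merged.set(chunk, this.pending.length);
    const frames: Uint8Array[] = [];
    let offset = 0;
    while (merged.length - offset >= this.frameBytes) {
      frames.push(merged.slice(offset, offset + this.frameBytes));
      offset += this.frameBytes;
    }
    this.pending = merged.slice(offset);
    return frames;
  }

  // Returns the trailing partial frame, if any, and clears the buffer.
  flush(): Uint8Array[] {
    const rest = this.pending;
    this.pending = new Uint8Array(0);
    return rest.length > 0 ? [rest] : [];
  }
}

// Only flush trailing audio when the stream ended cleanly; after an error the
// partial frame is dropped so stale audio is not emitted ahead of a retry.
function finishStream(buffer: FrameBuffer, streamError?: Error): Uint8Array[] {
  return streamError ? [] : buffer.flush();
}
```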
Use the Google gax cancellable unary call for synthesizeSpeech so aborting a ChunkedStream cancels the in-flight RPC. Pass the connection timeout through CallOptions and add updateOptions warnings for gender-derived Standard voices overriding voice selection.
Reject pending streaming writes when the gRPC stream closes before drain, and destroy failed streaming calls with an error so concurrent tasks settle during cleanup. Treat DEADLINE_EXCEEDED as retryable for Google Cloud TTS errors.
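A small sketch of the retryability decision, using the standard gRPC status code numbers. Only DEADLINE_EXCEEDED is stated as retryable in this commit; treating UNAVAILABLE as retryable is an assumption made here for illustration, and `isRetryableGrpcStatus` is a hypothetical name.

```typescript
// Standard gRPC status code numbers (subset).
const GrpcStatus = {
  CANCELLED: 1,
  INVALID_ARGUMENT: 3,
  DEADLINE_EXCEEDED: 4,
  RESOURCE_EXHAUSTED: 8,
  UNAVAILABLE: 14,
} as const;

// DEADLINE_EXCEEDED comes from the commit message; UNAVAILABLE is an
// assumption for this sketch and may not match the plugin's actual list.
const RETRYABLE_CODES = new Set<number>([
  GrpcStatus.DEADLINE_EXCEEDED,
  GrpcStatus.UNAVAILABLE,
]);

function isRetryableGrpcStatus(code: number): boolean {
  return RETRYABLE_CODES.has(code);
}
```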
Treat gax CANCELLED rejections as normal ChunkedStream aborts when the abort signal is set. Also document that Google Cloud TTS numeric provider errors are gRPC status codes with explicit retryability.
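The abort handling can be sketched as a predicate over the rejection and the abort signal. `isExpectedAbort` is a hypothetical helper name, and reading the status from a numeric `code` property is an assumption about the error shape.

```typescript
const GRPC_CANCELLED = 1; // standard gRPC status code for CANCELLED

// A CANCELLED rejection counts as a normal abort only if we asked for it,
// i.e. the abort signal is already set. Otherwise it is a real error.
function isExpectedAbort(err: unknown, signal: AbortSignal): boolean {
  if (!signal.aborted) return false;
  if (typeof err !== 'object' || err === null) return false;
  return (err as { code?: unknown }).code === GRPC_CANCELLED;
}
```

A caller would typically return quietly when this is true and rethrow (or convert) the error otherwise.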
Keep a no-op error listener attached when destroying a failed Google Cloud streaming call with an error. Node streams may emit destroy errors asynchronously, so removing the listener immediately can still produce an unhandled error.
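A minimal sketch of that cleanup pattern. `DestroyableCall` is a hypothetical structural type covering just the two stream methods used, and `destroyWithError` is an assumed helper name.

```typescript
// Structural subset of a stream's API, declared locally for this sketch.
interface DestroyableCall {
  on(event: 'error', listener: (err: Error) => void): unknown;
  destroy(err?: Error): unknown;
}

// Attach a no-op 'error' listener before destroying and leave it in place.
// The destroy error may be emitted on a later tick, so removing the listener
// right after destroy() could still surface an unhandled 'error' event.
function destroyWithError(call: DestroyableCall, err: Error): void {
  call.on('error', () => {});
  call.destroy(err);
}
```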
Abort per-attempt tokenization instead of closing the shared SynthesizeStream input queue. This lets cleanup settle without poisoning the base retry path. Add abortable AsyncIterableQueue.next support for the plugin cleanup path.
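One way to picture an abortable next() is the toy queue below. It is a sketch under assumptions and not the actual AsyncIterableQueue implementation: `AbortableQueue`, `pendingWaiters`, and `size` are hypothetical names. The key point is that aborting settles only the waiting consumer and leaves the queue itself open for the next attempt.

```typescript
class AbortableQueue<T> {
  private items: T[] = [];
  private waiters: Array<(result: IteratorResult<T>) => void> = [];

  get pendingWaiters(): number {
    return this.waiters.length;
  }

  get size(): number {
    return this.items.length;
  }

  put(item: T): void {
    const waiter = this.waiters.shift();
    if (waiter) waiter({ value: item, done: false });
    else this.items.push(item);
  }

  // Resolves with the next item, or with done=true if the signal aborts first.
  // Aborting does not close the queue; later put()/next() calls still work.
  next(signal?: AbortSignal): Promise<IteratorResult<T>> {
    if (this.items.length > 0) {
      return Promise.resolve({ value: this.items.shift() as T, done: false });
    }
    if (signal?.aborted) {
      return Promise.resolve({ value: undefined, done: true });
    }
    return new Promise((resolve) => {
      const waiter = (result: IteratorResult<T>) => {
        signal?.removeEventListener('abort', onAbort);
        resolve(result);
      };
      const onAbort = () => {
        // Drop only this waiter so a later put() is not consumed by it.
        this.waiters = this.waiters.filter((w) => w !== waiter);
        resolve({ value: undefined, done: true });
      };
      this.waiters.push(waiter);
      signal?.addEventListener('abort', onAbort, { once: true });
    });
  }
}
```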
Summary
- @livekit/agents-plugin-google-cloud using the @google-cloud/text-to-speech client library
- SynthesizeStream via TextToSpeechClient.streamingSynthesize() with sentence tokenization
- ChunkedStream via TextToSpeechClient.synthesizeSpeech() with LINEAR16 WAV -> PCM extraction
- updateOptions() for runtime voice/language/gender changes
- credentials object -> keyFilename -> GOOGLE_APPLICATION_CREDENTIALS -> ADC
Why this plugin vs the existing google.beta.TTS
The existing google.beta.TTS (Gemini TTS) uses @google/genai, which does not support streaming synthesis. This plugin uses the @google-cloud/text-to-speech client, which supports gRPC bidirectional streaming, needed for Gemini Flash TTS models like gemini-3.1-flash-tts-preview as well as classic models (Journey, Chirp 3, Standard, WaveNet).
The LiveKit docs show "Available in: [ ] Node.js, [x] Python"; this closes the Node.js gap.
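The LINEAR16 WAV -> PCM extraction mentioned in the summary could look roughly like the sketch below. It assumes a standard RIFF/WAVE container and a hypothetical helper name `extractPcmFromWav`; the plugin's actual extraction logic may differ.

```typescript
// Four-character RIFF tags encoded as big-endian 32-bit integers.
const TAG_RIFF = 0x52494646; // "RIFF"
const TAG_WAVE = 0x57415645; // "WAVE"
const TAG_DATA = 0x64617461; // "data"

// Walks the RIFF chunks and returns the bytes of the "data" chunk (raw PCM).
// Assumption: input without a RIFF/WAVE header is treated as raw PCM already.
function extractPcmFromWav(wav: Uint8Array): Uint8Array {
  const view = new DataView(wav.buffer, wav.byteOffset, wav.byteLength);
  if (
    wav.byteLength < 12 ||
    view.getUint32(0, false) !== TAG_RIFF ||
    view.getUint32(8, false) !== TAG_WAVE
  ) {
    return wav;
  }
  let offset = 12;
  while (offset + 8 <= wav.byteLength) {
    const tag = view.getUint32(offset, false);
    const size = view.getUint32(offset + 4, true); // chunk sizes are little-endian
    if (tag === TAG_DATA) {
      const start = offset + 8;
      const end = Math.min(start + size, wav.byteLength);
      return wav.subarray(start, end);
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error('WAV data chunk not found');
}
```

Walking the chunks, rather than skipping a fixed 44-byte header, tolerates extra chunks placed before the audio data.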
Credentials
Streaming synthesis
- TextToSpeechClient.streamingSynthesize() for gRPC bidirectional streaming
- tokenize.basic.SentenceTokenizer
- AudioByteStream, emitted as SynthesizedAudio frames
- call.cancel() / call.destroy()
- AudioByteStream.flush() called on stream end to prevent trailing audio truncation
Files changed
- plugins/google-cloud/package.json: @livekit/agents-plugin-google-cloud
- plugins/google-cloud/src/tts.ts: TTS, SynthesizeStream, ChunkedStream classes
- plugins/google-cloud/src/models.ts: TTSModel, TTSGender, TTSLanguage types
- plugins/google-cloud/src/index.ts
- plugins/google-cloud/tsconfig.json
- plugins/google-cloud/tsup.config.ts
- plugins/google-cloud/api-extractor.json
- plugins/google-cloud/README.md
- .changeset/google-cloud-tts-plugin.md
- pnpm-lock.yaml: @google-cloud/text-to-speech dependency
Usage
Verification
- tsc --noEmit: zero errors
- eslint src/**/*.ts: zero errors
- tsup build: CJS + ESM bundles compile cleanly