OmniVoice

Batteries-included voice pipeline framework for Go. This package provides a unified interface for speech-to-text (STT) and text-to-speech (TTS) with all providers included.

For a minimal dependency footprint, use omnivoice-core instead.

Features

  • 🎯 Unified Interface: Single API for all STT and TTS providers
  • 🗂️ Provider Registry: Get providers by name - no need to import individual provider packages
  • 🔌 Multiple Providers: OpenAI, Deepgram, ElevenLabs, Twilio, Telnyx
  • Streaming Support: Real-time transcription and synthesis
  • 🚀 Easy Integration: Import and use with minimal configuration

Installation

go get github.com/plexusone/omnivoice

Quick Start

import (
    "github.com/plexusone/omnivoice"
    _ "github.com/plexusone/omnivoice/providers/all" // Register all providers
)

Usage

package main

import (
    "context"
    "log"
    "os"

    "github.com/plexusone/omnivoice"
    _ "github.com/plexusone/omnivoice/providers/all"
)

func main() {
    ctx := context.Background()

    // Get providers by name using the registry
    sttProvider, err := omnivoice.GetSTTProvider("deepgram",
        omnivoice.WithAPIKey(os.Getenv("DEEPGRAM_API_KEY")))
    if err != nil {
        log.Fatal(err)
    }

    ttsProvider, err := omnivoice.GetTTSProvider("elevenlabs",
        omnivoice.WithAPIKey(os.Getenv("ELEVENLABS_API_KEY")))
    if err != nil {
        log.Fatal(err)
    }

    // Transcribe audio
    result, err := sttProvider.TranscribeFile(ctx, "audio.mp3", omnivoice.TranscriptionConfig{
        Language:             "en",
        EnableWordTimestamps: true,
    })
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("Transcription: %s", result.Text)

    // Synthesize speech
    audio, err := ttsProvider.Synthesize(ctx, "Hello, world!", omnivoice.SynthesisConfig{
        VoiceID: "pNInz6obpgDQGcFmaJgB", // Adam
    })
    if err != nil {
        log.Fatal(err)
    }
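    // audio.Audio contains the audio bytes. The variable must be used,
    // otherwise the Go compiler rejects the program ("declared and not used").
    log.Printf("Synthesized %d bytes of audio", len(audio.Audio))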
}

Provider Registry

Get providers by name at runtime - no need to import individual provider packages:

// Available providers: "openai", "elevenlabs", "deepgram", "twilio"
ttsProvider, _ := omnivoice.GetTTSProvider("elevenlabs", omnivoice.WithAPIKey(key))
sttProvider, _ := omnivoice.GetSTTProvider("deepgram", omnivoice.WithAPIKey(key))

// List registered providers
fmt.Println(omnivoice.ListTTSProviders()) // [openai elevenlabs deepgram twilio]
fmt.Println(omnivoice.ListSTTProviders()) // [openai elevenlabs deepgram twilio]
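
For readers curious how a name-based registry like this can work, here is a minimal, self-contained sketch of the general Go pattern (the same idea `database/sql` uses for drivers): provider packages register a factory from `init()`, and a blank import triggers that registration. This is not OmniVoice's actual implementation. `TTSProvider`, `Factory`, `Register`, `Get`, `List`, and `fakeProvider` are all hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// TTSProvider is a hypothetical, trimmed-down provider interface for this sketch.
type TTSProvider interface {
	Name() string
}

// Factory builds a provider from an API key (hypothetical signature).
type Factory func(apiKey string) (TTSProvider, error)

var (
	mu        sync.RWMutex
	factories = map[string]Factory{}
)

// Register stores a factory under a name. A provider package would call
// this from its init() function, so a blank import is enough to register it.
func Register(name string, f Factory) {
	mu.Lock()
	defer mu.Unlock()
	factories[name] = f
}

// Get looks up a factory by name and builds the provider.
func Get(name, apiKey string) (TTSProvider, error) {
	mu.RLock()
	f, ok := factories[name]
	mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("unknown provider %q", name)
	}
	return f(apiKey)
}

// List returns the registered names in sorted order.
func List() []string {
	mu.RLock()
	defer mu.RUnlock()
	names := make([]string, 0, len(factories))
	for n := range factories {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// fakeProvider stands in for a real provider implementation.
type fakeProvider struct{ name string }

func (p fakeProvider) Name() string { return p.name }

func init() {
	Register("fake", func(apiKey string) (TTSProvider, error) {
		return fakeProvider{name: "fake"}, nil
	})
}

func main() {
	p, err := Get("fake", "example-key")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(p.Name(), List())

	// Asking for an unregistered name returns an error instead of panicking.
	if _, err := Get("missing", ""); err != nil {
		fmt.Println("error:", err)
	}
}
```

The point of the pattern is that callers depend only on the registry package. Which providers are available is decided by which packages are linked in, for example through a single blank import of an "all providers" package.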

Language Codes

OmniVoice accepts language codes in BCP 47 format: either a two-letter ISO 639-1 code (en) or a code with a region subtag (en-US).

Common codes:

Code  | Language
en    | English
en-US | English (US)
en-GB | English (UK)
es    | Spanish
es-MX | Spanish (Mexico)
fr    | French
de    | German
it    | Italian
pt    | Portuguese
pt-BR | Portuguese (Brazil)
ja    | Japanese
ko    | Korean
zh    | Chinese
zh-CN | Chinese (Simplified)
zh-TW | Chinese (Traditional)
ar    | Arabic
hi    | Hindi
ru    | Russian

Notes:

  • Use simple codes (en) for broad compatibility across providers
  • Use regional variants (en-US) when accent/dialect matters for TTS
  • Provider support varies; see provider documentation for full language lists
  • STT providers generally support automatic language detection when no code is specified
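
The notes above suggest a simple fallback strategy: try the regional variant first, then the base language, and leave the code empty to let the provider auto-detect. The sketch below shows one way to do that in application code. It is not part of the OmniVoice API. `languageCandidates`, `pickLanguage`, and the `supported` set are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// languageCandidates returns the codes to try, most specific first.
// "en-US" yields ["en-US", "en"]; "en" yields ["en"]; "" yields nil.
func languageCandidates(code string) []string {
	code = strings.TrimSpace(code)
	if code == "" {
		return nil // empty: let the provider auto-detect the language
	}
	candidates := []string{code}
	// Accept both "en-US" and the common "en_US" spelling.
	if i := strings.IndexAny(code, "-_"); i > 0 {
		candidates = append(candidates, strings.ToLower(code[:i]))
	}
	return candidates
}

// pickLanguage returns the first candidate found in supported,
// or "" when none matches.
func pickLanguage(code string, supported map[string]bool) string {
	for _, c := range languageCandidates(code) {
		if supported[c] {
			return c
		}
	}
	return ""
}

func main() {
	// Hypothetical set of codes a provider accepts.
	supported := map[string]bool{"en": true, "es": true, "pt-BR": true}

	fmt.Println(pickLanguage("en-US", supported))     // en
	fmt.Println(pickLanguage("pt-BR", supported))     // pt-BR
	fmt.Printf("%q\n", pickLanguage("ja", supported)) // ""
}
```

An empty result can be passed through as "no language specified", which matches the note that STT providers generally auto-detect in that case.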

Included Providers

Provider   | STT           | TTS             | Registry Name
OpenAI     | Whisper       | TTS-1/TTS-1-HD  | "openai"
ElevenLabs | Scribe        | Multilingual v2 | "elevenlabs"
Deepgram   | Nova-2        | Aura            | "deepgram"
Twilio     | Media Streams | Media Streams   | "twilio"

Related Packages

License

MIT License - see LICENSE for details.
