pronsim-cli

Command-line client for the Pronunciation Similarity API.

Installation

Requires Python 3.11+.

pip install git+https://github.com/deepgram/pronsim-cli.git

Configuration

Create a .env file in your working directory (the CLI loads it automatically):

PRONSIM_CLI_BASE_URL=https://your-api-host:8080
PRONSIM_API_TOKEN=your-token-here

Or pass options directly:

pronsim --base-url https://your-api-host:8080 --token YOUR_TOKEN health

Global options

| Option | Env var | Default | Description |
| --- | --- | --- | --- |
| `--base-url` | `PRONSIM_CLI_BASE_URL` | `http://localhost:8080` | API base URL |
| `--timeout` | | `120` | HTTP timeout in seconds |
| `--token` | `PRONSIM_API_TOKEN` | | Bearer token for authentication |
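
The options table can be read as a lookup order. The sketch below assumes that an explicit option wins over an environment variable, which in turn wins over the default; that ordering is conventional but is not confirmed by this README. `resolve_settings` is a hypothetical helper, not part of pronsim-cli.

```python
import os

# Defaults taken from the global options table.
DEFAULTS = {"base_url": "http://localhost:8080", "timeout": 120, "token": None}
# Only these two options have a documented environment variable.
ENV_VARS = {"base_url": "PRONSIM_CLI_BASE_URL", "token": "PRONSIM_API_TOKEN"}


def resolve_settings(cli_args, environ=None):
    """Hypothetical helper: merge CLI args, env vars, and defaults.

    Assumed precedence (not confirmed by the README): CLI > env > default.
    """
    environ = os.environ if environ is None else environ
    settings = {}
    for name, default in DEFAULTS.items():
        if cli_args.get(name) is not None:
            settings[name] = cli_args[name]
        elif name in ENV_VARS and environ.get(ENV_VARS[name]):
            settings[name] = environ[ENV_VARS[name]]
        else:
            settings[name] = default
    return settings
```

For example, `resolve_settings({"timeout": 30}, {"PRONSIM_API_TOKEN": "your-token-here"})` keeps the default base URL, takes the timeout from the command line, and takes the token from the environment.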

Commands

Health & info

pronsim health                  # Check API health
pronsim list-words              # List all words with exemplar counts

Exemplar upload

# Single exemplar
pronsim upload-exemplar audio.wav hello --save-embeddings -o out.json

# Batch upload from JSON file
pronsim upload-exemplars exemplars.json --batch-size 10 --save-embeddings -o out.json

Batch input format (exemplars.json):

{
  "hello": ["recordings/hello_1.wav", "recordings/hello_2.wav"],
  "world": ["recordings/world_1.wav"]
}
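
One way to produce a file in this format is to group recordings by word. The sketch below assumes files are named `{word}_{n}.wav`, as in the example paths; that naming scheme and the `build_exemplar_manifest` and `write_manifest` helpers are illustrative assumptions, not something the CLI requires.

```python
import json
from collections import defaultdict
from pathlib import Path


def build_exemplar_manifest(recordings_dir):
    """Group .wav files by the word encoded in their file name.

    Assumes names like hello_1.wav, where everything before the last
    underscore is the word. This is an illustrative convention only.
    """
    manifest = defaultdict(list)
    for path in sorted(Path(recordings_dir).glob("*.wav")):
        word, sep, _index = path.stem.rpartition("_")
        if not sep:
            continue  # skip files that do not match the assumed pattern
        manifest[word].append(path.as_posix())
    return dict(manifest)


def write_manifest(manifest, out_path):
    """Write the word -> file list mapping as JSON."""
    Path(out_path).write_text(json.dumps(manifest, indent=2), encoding="utf-8")
```

Calling `write_manifest(build_exemplar_manifest("recordings"), "exemplars.json")` would then yield a file suitable for `pronsim upload-exemplars`.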

Scoring

# Single file
pronsim score audio.wav "say hello to the world" hello --save-embeddings -o result.json

# Batch score from JSON file
pronsim score-batch scores.json --batch-size 10 --save-embeddings -o results.json

Batch input format (scores.json):

[
  {"audio": "utterances/sample1.wav", "text": "say hello to the world", "word": "hello"},
  {"audio": "utterances/sample2.wav", "text": "hello world", "word": "world"}
]
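
A malformed entry can be caught locally before any request is sent. The sketch below is a hypothetical pre-flight check, not part of the CLI: it verifies the three keys shown in the example and, as an additional assumption, that the target word occurs in the text. The API itself may not enforce that last rule.

```python
import json

REQUIRED_KEYS = ("audio", "text", "word")  # keys shown in the example format


def check_score_entries(entries):
    """Return a list of human-readable problems; empty means none found."""
    if not isinstance(entries, list):
        return ["top-level JSON value must be a list"]
    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: expected an object")
            continue
        missing = [k for k in REQUIRED_KEYS if not isinstance(entry.get(k), str)]
        if missing:
            problems.append(f"entry {i}: missing or non-string keys {missing}")
            continue
        # Assumption: the scored word should appear in the spoken text.
        if entry["word"].lower() not in entry["text"].lower().split():
            problems.append(f"entry {i}: word {entry['word']!r} not found in text")
    return problems


def check_score_file(path):
    """Load a scores.json-style file and check its entries."""
    with open(path, encoding="utf-8") as fh:
        return check_score_entries(json.load(fh))
```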

Exemplar retrieval & management

pronsim get-exemplars hello --save-embeddings --output-dir ./embeddings
pronsim get-exemplar HASH --audio --embedding --output-dir ./downloads
pronsim get-centroid hello --save --output-dir ./centroids
pronsim delete-exemplar HASH

Batch processing

Both upload-exemplars and score-batch support:

--batch-size N (default 10) — limits how many files are sent per API request.
--resume — when used with -o, skips input entries that already have successful results in the output file. Re-run after a partial failure to retry only failed chunks.
-o OUTPUT — output JSON is atomically updated after each batch chunk, so the file always contains valid JSON even if the process is interrupted.

If a batch chunk fails, the error is logged to stderr and an "error" key is added to that chunk's entries in the output. Processing continues with the next chunk.
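
The behaviour described in this section (chunking, per-chunk error capture, atomic output, resume) can be sketched as follows. This is not the CLI's actual implementation: the output layout (a dict keyed by input id), the `process_chunk` callback, and the write-to-temp-then-`os.replace` strategy are all assumptions chosen for illustration.

```python
import json
import os
import tempfile


def atomic_write_json(path, data):
    """Write JSON to a temp file in the same directory, then rename over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)
        os.replace(tmp, path)  # readers see either the old or the new file
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


def run_batches(items, process_chunk, output_path, batch_size=10, resume=False):
    """items: dict of id -> input. process_chunk: hypothetical callable taking
    a list of (id, input) pairs and returning a dict of id -> result dict."""
    results = {}
    if resume and os.path.exists(output_path):
        with open(output_path, encoding="utf-8") as fh:
            results = json.load(fh)
    # With resume, keep only entries that lack a successful result.
    pending = [(k, v) for k, v in items.items()
               if not (resume and k in results and "error" not in results[k])]
    for start in range(0, len(pending), batch_size):
        chunk = pending[start:start + batch_size]
        try:
            results.update(process_chunk(chunk))
        except Exception as exc:
            # Mark every entry of the failed chunk, then continue.
            for key, _ in chunk:
                results[key] = {"error": str(exc)}
        atomic_write_json(output_path, results)
    return results
```

Writing the temp file in the same directory as the target keeps the final rename on a single filesystem, which is what makes the replace step safe against interruption.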

Embeddings

When --save-embeddings is used, embeddings are saved as .npy files (float16 NumPy arrays) next to the source audio files.

Naming conventions:

Scoring: {audio_stem}_{word}_embedding.npy
Exemplar upload: {audio_stem}_{word}_exemplar_embedding.npy
Exemplar retrieval: {hash}_{word}_exemplar_embedding.npy
Centroid: {word}_centroid_embedding.npy
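
To locate a saved embedding from a script, the naming conventions can be expressed as small helpers. The function names below are hypothetical; only the file name patterns come from this README.

```python
from pathlib import Path


def scoring_embedding_path(audio_path, word):
    """{audio_stem}_{word}_embedding.npy, next to the source audio file."""
    audio = Path(audio_path)
    return audio.with_name(f"{audio.stem}_{word}_embedding.npy")


def upload_embedding_path(audio_path, word):
    """{audio_stem}_{word}_exemplar_embedding.npy, next to the source audio."""
    audio = Path(audio_path)
    return audio.with_name(f"{audio.stem}_{word}_exemplar_embedding.npy")


def retrieved_embedding_name(exemplar_hash, word):
    """{hash}_{word}_exemplar_embedding.npy (file name only)."""
    return f"{exemplar_hash}_{word}_exemplar_embedding.npy"


def centroid_embedding_name(word):
    """{word}_centroid_embedding.npy (file name only)."""
    return f"{word}_centroid_embedding.npy"
```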

Load a saved embedding:

import numpy as np
emb = np.load("sample1_hello_embedding.npy")  # float16 array
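
Because float16 has limited precision, it is usually safer to upcast before doing arithmetic on a loaded embedding. The sketch below compares an embedding with a centroid using cosine similarity. The metric is only an illustration, since this README does not say how the API computes its scores, and the arrays are synthetic stand-ins for files loaded with `np.load`.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity of two vectors, flattened and computed in float32."""
    a = np.asarray(a, dtype=np.float32).ravel()
    b = np.asarray(b, dtype=np.float32).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


# Synthetic float16 vectors standing in for loaded .npy embeddings.
emb = np.array([1.0, 0.0, 1.0], dtype=np.float16)
centroid = np.array([1.0, 0.0, 0.0], dtype=np.float16)
print(cosine_similarity(emb, centroid))
```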
