Formal, implementation-agnostic specifications for a voice operating system — a platform that provides a stable application binary interface for voice-interactive applications.
This repository is the source of truth for how components talk to each other and what their data shapes mean. The specs are written generically so they can be implemented by any tool, in any language, and adopted beyond their origin project.
A voice OS is not a voice assistant. A voice assistant is a product that answers questions. A voice OS is a platform: it defines the boundary between user input and computation, arbitrates which application handles each utterance, manages conversation state across interactions, and provides a stable ABI that arbitrary third-party applications run against without knowing anything about each other.
The analogy to a general-purpose OS is direct:
| OS concept | Voice OS equivalent |
|---|---|
| Process scheduler | Pipeline plugin ordering (PIPELINE-1 §5–6) |
| IPC / message passing | The bus and MSG-1 envelope |
| Shared memory | Session carrier (SESSION-1, SESSION-2) |
| Process supervision | Handler-lifecycle trio (PIPELINE-1 §8) |
| Loadable kernel modules | Pipeline plugins, transformer plugins |
| System call ABI | The match(utterances, lang, session) → Match contract |
The consequence is that OpenVoiceOS (OVOS) is not a chatbot, not an LLM wrapper, and not a monolithic product. It is a runtime: swap the scheduler (pipeline ordering), the NLU engines (pipeline plugins), the dialogue policy (converse / context), the output layer (TTS, display), or any combination, and the ABI stays stable while the rest keeps working. A skill written against the intent stack runs on any conformant orchestrator, under any pipeline configuration, in any language OVOS supports.
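As a rough illustration of the "system call" row in the table above, the sketch below shows one way the match(utterances, lang, session) → Match contract could look in Python. It is not taken from the specs: the `Match` fields, the `KeywordPlugin` class, the `dispatch` helper, the "first match wins" rule, and the use of `None` for "no match" are all assumptions made for the example. The normative shapes live in PIPELINE-1 and the SESSION specs.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol


@dataclass
class Match:
    """Hypothetical result shape; the real fields are defined by the specs."""
    handler_id: str
    utterance: str
    data: dict = field(default_factory=dict)


class PipelinePlugin(Protocol):
    """Anything exposing this method can sit in the pipeline (assumed shape)."""

    def match(self, utterances: list[str], lang: str, session: dict) -> Optional[Match]:
        ...


class KeywordPlugin:
    """Toy plugin: claims an utterance if it contains a fixed keyword."""

    def __init__(self, keyword: str, handler_id: str) -> None:
        self.keyword = keyword
        self.handler_id = handler_id

    def match(self, utterances: list[str], lang: str, session: dict) -> Optional[Match]:
        for utt in utterances:
            if self.keyword in utt.lower():
                return Match(self.handler_id, utt)
        return None  # assumption: None signals "not mine"


def dispatch(plugins: list[PipelinePlugin], utterances: list[str],
             lang: str, session: dict) -> Optional[Match]:
    """Ask each plugin in configured order; the first Match wins (assumed policy)."""
    for plugin in plugins:
        result = plugin.match(utterances, lang, session)
        if result is not None:
            return result
    return None


plugins = [
    KeywordPlugin("stop", "stop.handler"),
    KeywordPlugin("weather", "weather.skill"),
]
result = dispatch(plugins, ["what is the weather"], "en-US", {})
```

The point of the sketch is that `dispatch` knows nothing about what a plugin does internally. Reordering or replacing the entries in `plugins` changes which handler wins without touching the plugins themselves.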
⚠️ Draft. Specs in this repository are at Draft status. Implementations are being brought into conformance progressively; current OVOS behaviour may not yet match these documents. Where it diverges, that is a known implementation bug — not a defect in the specification (see Authority below). The notice will be removed when a spec reaches a stable status.
The specs exist to make three things possible:
- Interoperability. Multiple implementations — engines, hosts, plugins, even non-OVOS assistants — can target the same observable contract instead of reverse-engineering each other's code.
- Stability. Implementation churn no longer drifts the contract. Each spec is a versioned document; behaviour changes go through a pull request with a version bump.
- Adoption beyond OVOS. The specs are written implementation-agnostically so other voice-assistant projects can adopt the same formats, grammar, and bus contracts without buying into OVOS as a whole.
The specs cover formats and contracts only. They do not mandate implementation choices — programming language, internal design, storage, threading, transport — those are the implementer's. What they fix is the observable contract.
These specifications are prescriptive, not descriptive. They define the intended architecture; they are not a transcript of how any current code behaves. Where an implementation — in OpenVoiceOS or any other project — diverges from a spec here, that divergence is a bug in the implementation, not in the specification.
Anyone is free to adopt these specifications and free to propose changes to them via pull request (see Contributing below). Adoption is voluntary; conformance, once adopted, is not.
| ID | Document | Version | Status |
|---|---|---|---|
| OVOS-INTENT-1 | Sentence Template Grammar | 2 | Draft |
| OVOS-INTENT-2 | Locale Resource Formats | 2 | Draft · v3 in review (PR #4) |
| OVOS-INTENT-3 | Intent Definition | 1 | Draft |
| OVOS-INTENT-4 | Intent and Entity Registration | 2 | Draft — in review (PR #9) |
| ID | Document | Version | Status |
|---|---|---|---|
| OVOS-MSG-1 | Bus Message | 2 | Draft |
| OVOS-SESSION-1 | Session Carrier Wire Shape | 1 | Draft |
| OVOS-SESSION-2 | Session Lifecycle and State Ownership | 1 | Draft |
| ID | Document | Version | Status |
|---|---|---|---|
| OVOS-PIPELINE-1 | Utterance Lifecycle and Pipeline | 2 | Draft |
| OVOS-TRANSFORM-1 | Transformer Plugins | 1 | Draft — in review (PR #20) |
| OVOS-CONTEXT-1 | Intent Context | 1 | Draft — in review (PR #18) |
| OVOS-CONVERSE-1 | Active Handlers and Interactive Response | 1 | Draft — in review (PR #25) |
| OVOS-STOP-1 | Stop Pipeline Plugin | 1 | Draft — in review (PR #33) |
| OVOS-AUDIO-IN-1 | Audio Input Service | 1 | Draft |
Each spec carries its own scope statement, design rationale, and conformance section in its header. Open the document for the full picture — the tables above are an index.
Reading order by role:
- Writing a skill? INTENT-1 → INTENT-2 → INTENT-3. INTENT-4 only if you need the registration wire format.
- Building a pipeline plugin? PIPELINE-1, then SESSION-1 + SESSION-2, then the role spec (CONVERSE-1, CONTEXT-1, TRANSFORM-1, or STOP-1).
- Building an orchestrator? MSG-1 → SESSION-1 → SESSION-2 → PIPELINE-1, then INTENT-4, CONTEXT-1, CONVERSE-1, TRANSFORM-1, STOP-1.
- Surveying the architecture? appendix/overview.md §1 for the three-stack narrative.
For background — design rationale, comparisons with other systems, the catalogue of known divergences from current code, and known gaps — see APPENDIX.md (index) or browse by topic under appendix/. For term definitions, see GLOSSARY.md. For the version history of each spec, see CHANGELOG.md.
ovos-spec-tools is a reference implementation: a dependency-light Python package providing the sentence-template expander, the locale resource loader, the dialog renderer, language matching, and the ovos-spec-lint linter. Components that don't want to reimplement the spec machinery themselves can depend on it. It is also the intended home of the planned conformance corpus.
The bus stack (OVOS-MSG-1) does not yet have a comparable ground-up reference implementation; ovos-bus-client is the closest existing match but predates the spec.
Specifications are versioned documents, not living wikis. Any change to a spec — however small — MUST be submitted as a pull request, never committed directly.
Each PR that alters normative content MUST:
- bump the spec's Version field in its header;
- add a corresponding entry to CHANGELOG.md.
A version identifies an exact, citable state of a document, so implementations and conformance results can name the version they target.
PRs that touch only the non-normative material — APPENDIX.md and appendix/ files, GLOSSARY.md, this README, examples — do not require a version bump.
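The contribution rules above could be approximated by a small pre-merge check along the following lines. This is only a sketch under stated assumptions: the function names, the `README.md` filename, the `examples/` directory, and the spec path used in the usage lines are hypothetical. It also treats any change to a spec file as normative, which is stricter than the rule as written, and it counts CHANGELOG.md itself as non-normative.

```python
from pathlib import PurePosixPath

# Assumption: top-level files treated as non-normative material.
NON_NORMATIVE_FILES = {"APPENDIX.md", "GLOSSARY.md", "README.md", "CHANGELOG.md"}
# Assumption: directories holding only non-normative material.
NON_NORMATIVE_DIRS = {"appendix", "examples"}


def is_normative(path: str) -> bool:
    """File-level approximation: anything outside the non-normative set counts."""
    p = PurePosixPath(path)
    if p.parts and p.parts[0] in NON_NORMATIVE_DIRS:
        return False
    return str(p) not in NON_NORMATIVE_FILES


def check_pr(changed_files: list[str], version_bumped: bool) -> list[str]:
    """Return a list of rule violations; an empty list means the PR passes."""
    problems = []
    if any(is_normative(f) for f in changed_files):
        if not version_bumped:
            problems.append("normative change without a Version bump")
        if "CHANGELOG.md" not in changed_files:
            problems.append("normative change without a CHANGELOG.md entry")
    return problems


# Hypothetical spec path, used only for illustration.
spec_only = check_pr(["specs/OVOS-MSG-1.md"], version_bumped=False)
docs_only = check_pr(["appendix/overview.md", "GLOSSARY.md"], version_bumped=False)
complete = check_pr(["specs/OVOS-MSG-1.md", "CHANGELOG.md"], version_bumped=True)
```

Here `spec_only` reports both violations, while `docs_only` and `complete` come back empty. A real check would also need to read the Version field from the spec header before and after the change, which is left out here.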
These specifications were produced as part of a documentation and interoperability effort for OpenVoiceOS, funded by NLnet's NGI0 Commons Fund under grant agreement No 101135429.
