OpenVoiceOS/architecture

OVOS Formal Specifications

Formal, implementation-agnostic specifications for a voice operating system — a platform that provides a stable application binary interface for voice-interactive applications.

This repository is the source of truth for how components talk to each other and what their data shapes mean. The specs are written generically so they can be implemented by any tool, in any language, and adopted beyond their origin project.

What a voice operating system is

A voice OS is not a voice assistant. A voice assistant is a product that answers questions. A voice OS is a platform: it defines the boundary between user input and computation, arbitrates which application handles each utterance, manages conversation state across interactions, and provides a stable ABI that arbitrary third-party applications run against without knowing anything about each other.

The analogy to a general-purpose OS is direct:

| OS concept | Voice OS equivalent |
| --- | --- |
| Process scheduler | Pipeline plugin ordering (PIPELINE-1 §5–6) |
| IPC / message passing | The bus and MSG-1 envelope |
| Shared memory | Session carrier (SESSION-1, SESSION-2) |
| Process supervision | Handler-lifecycle trio (PIPELINE-1 §8) |
| Loadable kernel modules | Pipeline plugins, transformer plugins |
| System call ABI | The match(utterances, lang, session) → Match contract |

The consequence is that OVOS is not a chatbot, not an LLM wrapper, and not a monolithic product. It is a runtime: swap the scheduler (pipeline ordering), the NLU engines (pipeline plugins), the dialogue policy (converse / context), the output layer (TTS, display), or any combination — the ABI stays stable and the rest keeps working. A skill written against the intent stack runs on any conformant orchestrator, under any pipeline configuration, in any language OVOS supports.
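To make the "system call ABI" row concrete, here is a minimal Python sketch of an orchestrator consulting pipeline plugins in their configured order. Only the call shape match(utterances, lang, session) → Match comes from this README. The Session and Match fields, the KeywordPlugin class, the dispatch helper, the None return for "no match", and the first-match-wins policy are all hypothetical illustrations; PIPELINE-1 and SESSION-1 define the real behaviour and data shapes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Protocol


@dataclass
class Session:
    # Hypothetical stand-in for the session carrier; SESSION-1 defines the real shape.
    session_id: str
    data: dict = field(default_factory=dict)


@dataclass
class Match:
    # Hypothetical result type; these field names are illustrative only.
    handler: str
    utterance: str


class PipelinePlugin(Protocol):
    # The one part taken from the README: match(utterances, lang, session) -> Match.
    # Returning None for "no match" is an assumption of this sketch.
    def match(self, utterances: list[str], lang: str, session: Session) -> Optional[Match]:
        ...


class KeywordPlugin:
    """Toy plugin that claims an utterance containing a fixed keyword."""

    def __init__(self, keyword: str, handler: str) -> None:
        self.keyword = keyword
        self.handler = handler

    def match(self, utterances: list[str], lang: str, session: Session) -> Optional[Match]:
        for utt in utterances:
            if self.keyword in utt.lower():
                return Match(handler=self.handler, utterance=utt)
        return None


def dispatch(pipeline: list[PipelinePlugin], utterances: list[str],
             lang: str, session: Session) -> Optional[Match]:
    """Ask each plugin in configured order; the first non-None match wins (assumed policy)."""
    for plugin in pipeline:
        result = plugin.match(utterances, lang, session)
        if result is not None:
            return result
    return None


pipeline = [KeywordPlugin("stop", "stop.handler"), KeywordPlugin("weather", "weather.skill")]
session = Session(session_id="default")
print(dispatch(pipeline, ["What is the weather"], "en-US", session))
```

Under these assumptions, reordering the pipeline list changes which plugin wins an ambiguous utterance without any plugin changing, which is the sense in which the scheduler can be swapped while the contract stays fixed.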

⚠️ Draft. Specs in this repository are at Draft status. Implementations are being brought into conformance progressively; current OVOS behaviour may not yet match these documents. Where it diverges, that is a known implementation bug — not a defect in the specification (see Authority below). The notice will be removed when a spec reaches a stable status.


Goals

The specs exist to make three things possible:

  • Interoperability. Multiple implementations — engines, hosts, plugins, even non-OVOS assistants — can target the same observable contract instead of reverse-engineering each other's code.
  • Stability. Implementation churn no longer drifts the contract. Each spec is a versioned document; behaviour changes go through a pull request with a version bump.
  • Adoption beyond OVOS. The specs are written implementation-agnostically so other voice-assistant projects can adopt the same formats, grammar, and bus contracts without buying into OVOS as a whole.

The specs cover formats and contracts only. They do not mandate implementation choices — programming language, internal design, storage, threading, transport — those are the implementer's. What they fix is the observable contract.


Authority

These specifications are prescriptive, not descriptive. They define the intended architecture; they are not a transcript of how any current code behaves. Where an implementation — in OpenVoiceOS or any other project — diverges from a spec here, that divergence is a bug in the implementation, not in the specification.

Anyone is free to adopt these specifications and free to propose changes to them via pull request (see Contributing below). Adoption is voluntary; conformance, once adopted, is not.


Specifications

Intent stack — what a skill defines

| ID | Document | Version | Status |
| --- | --- | --- | --- |
| OVOS-INTENT-1 | Sentence Template Grammar | 2 | Draft |
| OVOS-INTENT-2 | Locale Resource Formats | 2 | Draft · v3 in review (PR #4) |
| OVOS-INTENT-3 | Intent Definition | 1 | Draft |
| OVOS-INTENT-4 | Intent and Entity Registration | 2 | Draft · in review (PR #9) |

Bus stack — how components talk

| ID | Document | Version | Status |
| --- | --- | --- | --- |
| OVOS-MSG-1 | Bus Message | 2 | Draft |
| OVOS-SESSION-1 | Session Carrier Wire Shape | 1 | Draft |
| OVOS-SESSION-2 | Session Lifecycle and State Ownership | 1 | Draft |

Orchestrator stack — what processes utterances

| ID | Document | Version | Status |
| --- | --- | --- | --- |
| OVOS-PIPELINE-1 | Utterance Lifecycle and Pipeline | 2 | Draft |
| OVOS-TRANSFORM-1 | Transformer Plugins | 1 | Draft · in review (PR #20) |
| OVOS-CONTEXT-1 | Intent Context | 1 | Draft · in review (PR #18) |
| OVOS-CONVERSE-1 | Active Handlers and Interactive Response | 1 | Draft · in review (PR #25) |
| OVOS-STOP-1 | Stop Pipeline Plugin | 1 | Draft · in review (PR #33) |
| OVOS-AUDIO-IN-1 | Audio Input Service | 1 | Draft |

Each spec carries its own scope statement, design rationale, and conformance section in its header. Open the document for the full picture — the tables above are an index.

Reading order by role:

  • Writing a skill? INTENT-1 → INTENT-2 → INTENT-3. INTENT-4 only if you need the registration wire format.
  • Building a pipeline plugin? PIPELINE-1, then SESSION-1 + SESSION-2, then the role spec (CONVERSE-1, CONTEXT-1, TRANSFORM-1, or STOP-1).
  • Building an orchestrator? MSG-1 → SESSION-1 → SESSION-2 → PIPELINE-1, then INTENT-4, CONTEXT-1, CONVERSE-1, TRANSFORM-1, STOP-1.
  • Surveying the architecture? appendix/overview.md §1 for the three-stack narrative.

For background — design rationale, comparisons with other systems, the catalogue of known divergences from current code, and known gaps — see APPENDIX.md (index) or browse by topic under appendix/. For term definitions, see GLOSSARY.md. For the version history of each spec, see CHANGELOG.md.


Reference implementation

ovos-spec-tools is a reference implementation — a dependency-light Python package providing the sentence-template expander, the locale resource loader, the dialog renderer, language matching, and the ovos-spec-lint linter. Components that don't want to reimplement the spec machinery themselves can depend on it. It is also the intended home of the planned conformance corpus.

The bus stack (OVOS-MSG-1) does not yet have a comparable ground-up reference implementation; ovos-bus-client is the closest existing match but predates the spec.


Contributing

Specifications are versioned documents, not living wikis. Any change to a spec — however small — MUST be submitted as a pull request, never committed directly.

Each PR that alters normative content MUST:

  • bump the spec's Version field in its header;
  • add a corresponding entry to CHANGELOG.md.

A version identifies an exact, citable state of a document, so implementations and conformance results can name the version they target.

PRs that touch only the non-normative material — APPENDIX.md and appendix/ files, GLOSSARY.md, this README, examples — do not require a version bump.
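The two PR rules above can be sketched as a small check, for example as a starting point for a CI job. This is an illustration under stated assumptions, not tooling that exists in this repository: the function names, the README.md file name, the examples directory name, and the OVOS-MSG-1.md spec file name used in the usage lines are hypothetical. The sketch also simplifies by treating every change to a spec file as normative, and it expects the caller to supply the set of spec files whose Version field changed.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Non-normative material listed in the Contributing section. "README.md" and
# the "examples" directory are assumed names; CHANGELOG.md is excluded because
# it records changes rather than specifying behaviour (also an assumption).
NON_NORMATIVE_FILES = {"APPENDIX.md", "GLOSSARY.md", "README.md", "CHANGELOG.md"}
NON_NORMATIVE_DIRS = {"appendix", "examples"}


def is_normative(path: str) -> bool:
    """Treat any file outside the non-normative set as a spec (simplification)."""
    parts = PurePosixPath(path).parts
    if not parts or parts[0] in NON_NORMATIVE_DIRS:
        return False
    return path not in NON_NORMATIVE_FILES


def check_pr(changed_files: list[str], version_bumped: set[str]) -> list[str]:
    """Return a list of problems; an empty list means both rules are met.

    version_bumped holds the spec files whose Version header changed in the
    PR. Detecting that from a diff is out of scope for this sketch.
    """
    problems = []
    normative = [f for f in changed_files if is_normative(f)]
    for spec in normative:
        if spec not in version_bumped:
            problems.append(f"{spec}: Version field not bumped")
    if normative and "CHANGELOG.md" not in changed_files:
        problems.append("CHANGELOG.md: missing entry for normative change")
    return problems


# Hypothetical file names, for illustration only.
print(check_pr(["GLOSSARY.md", "appendix/overview.md"], set()))
print(check_pr(["OVOS-MSG-1.md"], set()))
```

A PR that touches only the glossary and appendix yields no problems, while a spec change with neither a version bump nor a changelog entry is reported on both counts.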


Credits

These specifications were produced as part of a documentation and interoperability effort for OpenVoiceOS, funded by NLnet's NGI0 Commons Fund under grant agreement No 101135429.
