← APPENDIX.md · Non-normative
⚠️ AI-generated draft — not yet fully reviewed. This content was produced by a large language model (Claude Code) and has not yet been fully reviewed for accuracy, completeness, or consistency with the specifications. The normative specifications themselves are human-reviewed; this appendix is supplementary context. Readers should verify claims before relying on them.
Two patterns recur across the spec family and are worth a dedicated treatment.
Under OVOS-MSG-1's source / destination / session model,
the bus is not just an internal transport — it is the
substrate higher-level systems plug into without modifying
the assistant core. Two mechanics make that work:
single-flip routing (§3.1.1), which keeps the routing pair
correct end-to-end without per-component effort; forward vs
reply discipline (§3.1.2), which ensures non-dispatch
emissions are routed correctly through layer-2 transports; and
no central state or correlation (§3.1.3), which makes
layer-2 systems composable. HiveMind is the canonical example
of what all three together enable (§3.1.4).
The most important bus invariant in OVOS, and the one most
often reinvented incorrectly. The routing pair (source,
destination) flips exactly once per conversational turn,
performed by ovos-core, before the intent dispatch is emitted.
From that point on, every handler-side emission is already
addressed back at the user.
Three steps:
-
The user side emits. An external component — microphone service, chat UI, satellite client, test harness — emits an utterance with
sourceset to itself:context: { source: "audio", destination: null, session: {...} } -
ovos-core flips, then dispatches. When the intent service matches an intent it derives the dispatch via
Message.reply(match_type, data)(ovos-core/.../service.py:340). The.replyrule of MSG-1 §5.2 swaps the routing pair:context: { source: "ovos-core", destination: "audio", session: {...} }The dispatch goes out on the per-intent topic
<skill_id>:<intent_name>. The flip has already classified the message as going back at the user, even though a skill handler is what actually runs. -
The handler
.forwards. Every message the skill emits in response —speak, the handler lifecycle trio, GUI events — usesMessage.forward(...)(ovos-workshop/.../ovos.py:1461, 1472, …)..forwardpreservescontextunchanged, so every handler emission is already addressed back at the original user-side component.
Two consequences fall out:
- The boundary is user ↔ assistant, not core ↔ handler. Skill handlers are on OVOS's side of the boundary; from outside, OVOS is one thing. The user doesn't know or care which skill answered them.
- Handler authors never write addressing code. Because
.forwardpreserves the flipped pair, no skill anywhere needs to understandsource/destination. Get the inversion right once in ovos-core, and every downstream skill is automatically correct.
What this rules out: no per-hop addressing (handlers don't
pick their own destination); no second flip (handlers
.forward, they don't .reply to the dispatch); the dispatch
topic <skill_id>:<intent_name> selects the handler, not
destination (the destination belongs to the user).
Implementers using .reply where .forward is appropriate
produce mis-routed messages that work in local tests but
silently break layer-2 routing.
The single-flip model above applies to the dispatch path, but the same choice arises whenever any component emits a Message derived from one it received. The rule is:
- Use
forwardwhen the new Message travels in the same direction as the source — toward the same destination. A handler emittingspeak, a session sync, or a lifecycle event during a dispatch usesforward; the user-side client receives it because thedestinationwas already set by the earlierreplyflip. - Use
replywhen the new Message travels back toward the sender of the source — a responder answering a requester. A skill responding toovos.stop.ping, a handler answering a<owner_id>.converse.requestpoll, or a plugin answering an introspection request all usereply; the flip routes the answer back to whoever asked.
The practical consequence for layer-2 systems (satellites, gateways,
HiveMind nodes): routing metadata is the only mechanism a transport
has to decide where a Message goes. A component that uses reply
where forward is correct sends the Message back at itself instead
of toward the user; a component that uses forward where reply is
correct broadcasts with no destination instead of targeting the
requester. Both mistakes work on a single-node local bus (where
everything is broadcast anyway) and silently fail in layer-2
deployments.
The specs enforce this consistently:
| Emission | Derivation | Reason |
|---|---|---|
| Handler lifecycle trio (PIPELINE-1 §8) | forward |
Travels toward the client alongside the dispatch |
ovos.session.sync from a handler (SESSION-2 §2.7) |
forward |
Session update travels toward the client |
ovos.stop.pong (STOP-1 §4.2) |
reply |
Response back to the stop plugin that pinged |
<owner_id>.converse.response (CONVERSE-1 §4.2) |
reply |
Response back to the converse plugin that polled |
| Pipeline introspection response (PIPELINE-1 §10.2) | reply |
Response back to the observer that requested |
The bus is fully asynchronous. OVOS does not centrally
correlate request/response chains, and does not centrally
track per-conversation state. There is no per-message
identifier, no in-reply-to field, no host-side index mapping
a .response back to its request, no shared "current
conversation" record.
session.session_id identifies an interaction channel —
nothing more. Two messages sharing a session_id are on the
same channel, but the spec guarantees nothing about ordering,
state continuity, or pending requests.
Every component — skills, pipeline plugins, external clients,
layer-2 systems — owns whatever state it needs. An asker that
wants .response correlation keeps its own outstanding-request
table; a skill that wants conversational memory keeps its own
per-session store; a layer-2 system that wants per-peer state
keys on session_id. Whatever a later consumer needs is in
the Message (data / context / session) or out of
band — never recovered from a hidden host-side index.
This is what lets layer-2 systems plug in cleanly: if OVOS kept a central correlation index or a central conversation state, every layer-2 system would have to replicate it, hook into it, or work around it. Because OVOS keeps neither, they compose without contention.
Several real concerns are deferred by this stance and are
listed under §7 (Known gaps): multi-turn conversation, the
other session knobs current OVOS carries beyond session_id
and lang (persona_id, time_format, date_format,
system_unit, tts_preferences, …), and the eventual shape
of conversational state. The async-by-default model means
those future specs only need to define what the state is,
not how it travels.
The single-flip routing model and the no-central-state
design make layer-2 federation composable without modifying
the assistant core. A remote peer is just another user-side
emitter: it sets source to its peer ID, populates session
with its own named session, and emits a Message. The
orchestrator runs the same .reply flip; response messages
carry destination == peer ID; the bridge (watching the bus)
routes them back over the transport. The
session_id == "default" rule keeps device-local TTS on the
device's speakers because remote sessions carry their own
session_id and never "default".
Layer-2 bridges also inherit the session-field
preference/policy split without extra mechanism: client
sessions populate the preference fields
(pipeline, <type>_transformers) to request behaviour;
the bridge populates the policy fields
(blacklisted_pipelines, blacklisted_<type>_transformers)
from the peer's grant. PIPELINE-1 §5.5 and TRANSFORM-1 §5.3
compose them deterministically at the orchestrator.
The piece that sits around the intent and bus stacks — the
multi-stage orchestrator that decides which engine even gets
a turn, runs converse / fallback / common_query / ocp /
persona stages, and produces the universal
ovos.utterance.handled end-marker — is what makes OVOS
structurally distinctive (HA and Rhasspy have no equivalent
layer).
The plugin abstraction is already in current code:
OVOSPipelineFactory loads pipeline plugins by id at startup,
the orchestrator holds them in a pipeline_plugins dict
keyed on pipeline_id, and the default Session.pipeline is
an ordered list of plugin identifiers (with a migration map
translating legacy padatious_high-style names into modern
ovos-padatious-pipeline-plugin-high-style ones). The
official ovos-padatious-pipeline-plugin,
ovos-adapt-pipeline-plugin, ovos-converse-pipeline-plugin,
ovos-fallback-pipeline-plugin,
ovos-common-query-pipeline-plugin,
ovos-ocp-pipeline-plugin, and the persona plugins all
already conform to this model.
OVOS-PIPELINE-1's contribution is therefore a prescriptive refinement, not a wholesale new abstraction. It:
- formalizes the plugin contract (the
matchshape, theMatchresult, the side-effect-free discipline); - defines
<skill_id>:<intent_name>dispatch polymorphism so a plugin can bundle its own handler (a language-model persona, a chatbot) as a first-class participant alongside skill-owned handlers; - prescribes the universal
ovos.utterance.handledend-marker on every terminal path; - renames the
mycroft.skill.handler.*trio →ovos.intent.handler.*.
The current high/medium/low confidence-tier convention is
compatible with PIPELINE-1 and out of scope for the spec.
From the bus's perspective each tier is already a distinct
pipeline_id in the session's pipeline list (e.g.
padatious_high, padatious_medium, padatious_low), which
is exactly what the spec prescribes. How a Python plugin
class internally serves multiple pipeline_ids — one class
with match_high / match_medium / match_low methods,
three separate plugin instances, an orchestrator-side
suffix-decoding helper — is implementation choice the spec
does not constrain.
Three properties make the resulting model unusually expressive:
- All plugins are equivalent. No spec-level distinction
between intent engines, converse handlers, fallbacks,
language-model personas, classic chatbots, anything else.
They all expose the same
matchcontract. - Skills and plugin-bundled handlers are indistinguishable as handler owners. From outside, the assistant responded — the user does not know or care whether a skill matched against a registered intent or a language-model plugin generated the response on the fly.
- The engine-agnostic intent contract is already
realized, not hypothetical. OVOS persona plugins
(
ovos-persona,ovos-persona-server,ovos-claude-plugin,ovos-openai-plugin, etc.) plug into the pipeline as first-class language-model stages. The ordered chain (deterministic keyword engines before fuzzy template engines before language-model fallbacks last) is also how the system bounds generalization in practice.
What OVOS-PIPELINE-1 deliberately leaves out: per-plugin
behavioural contracts. A converse plugin, a fallback
plugin, a persona plugin: each defines itself. PIPELINE-1
only defines the contract every plugin conforms to and the
universal utterance lifecycle around the iteration.
The spec family does not define new transport protocols and does not aim to replace existing ones. Where an external voice-assistant protocol — Wyoming, OpenAI Chat Completions, MCP tool calls, hassil templates, MQTT-based stacks — already exists and serves a population, the spec family is designed to interoperate with it through three well-defined injection points. An adapter that plugs an external protocol into the right injection point is a third-party implementation concern; the spec family makes the integration shape predictable.
1. Pipeline plugins (OVOS-PIPELINE-1 §3) — the dispatch-layer
adapter. A pipeline plugin wraps an external matcher,
consumes the utterance, and returns a Match with the
plugin's own pipeline_id as skill_id. The external
protocol becomes a first-class participant in the dispatch
surface, indistinguishable from a skill from the bus's
perspective. This is how language-model APIs, deterministic
template matchers, and external intent classifiers attach.
2. Transformer chains (OVOS-TRANSFORM-1 §3) — the artifact-pipeline adapter. A transformer wraps an external protocol that operates on an audio, text, or rendered-output artifact but does not claim intents. Examples: a bidirectional-translation service at the utterance and dialog chains; an external STT-confidence validator at the utterance chain; a content-policy filter at the dialog or TTS chain; an acoustic-event detector at the audio chain.
3. Bus boundary (OVOS-MSG-1 §3.4) — the wire-level adapter. A bridge component subscribes to the bus, translates to and from an external transport, and either operates entirely external (Wyoming-style audio / STT / TTS services talking over TCP to a bridge that proxies the OVOS bus) or remotes the whole bus (HiveMind-style layer-2 substrates). The single-flip routing of §3.1.1, the forward/reply discipline of §3.1.2, and the no-central-state stance of §3.1.3 are what make the bus-boundary adapter feasible without modifying the assistant core.
- Wyoming (the component protocol used by Home Assistant
Voice and its ecosystem) operates at the audio-input / STT /
intent / TTS service boundary. A Wyoming bridge sits at the
bus boundary (§3.1, injection point 3 above): translate
Wyoming's
transcriptevent into anovos.utterance.handleemission and translate the assistant'sspeakMessages into Wyoming'ssynthesizeevent. Pipeline plugins are unaffected; Wyoming components plug in under the utterance lifecycle, not into it. - OpenAI Chat Completions and compatible APIs (the
de-facto LLM interface). A persona-style pipeline plugin
wraps an OpenAI-compatible client (§3 of PIPELINE-1,
injection point 1 above). The plugin emits
Matchwithskill_id = <pipeline_id>and bundles its own handler using the dispatch polymorphism of OVOS-PIPELINE-1 §7. The user sees a normal response; the LLM is a first-class intent owner. - MCP (Model Context Protocol) and similar agent-tool protocols. A pipeline plugin can expose OVOS intents to an MCP client (the OVOS-INTENT-4 §10 introspection topics enumerate available intents) or call out to MCP tools from within a plugin-bundled handler. Either direction sits at injection point 1.
- hassil templates and the Home Assistant
intentscorpus. A pipeline plugin can wrap hassil as a deterministic template matcher (injection point 1). Separately, the OVOS-INTENT-1 / hassil grammar lineage is close enough that a translation tool between OVOS-INTENT-2 locale resources and HA'sintentsYAML is mostly mechanical — both formats are template-and-vocabulary YAML at the same level of abstraction. Such a tool would let the HAintentscorpus and the OVOS locale corpus cross-pollinate without either project changing its format. This is concrete planned tooling, not just an architectural possibility (§7). - MQTT-based stacks (Rhasspy 2.x, miscellaneous IoT voice systems). Bridge at the bus boundary (injection point 3), same shape as Wyoming.
- A2A and other agent-bus protocols. Same shape as MCP; pipeline-plugin wrapper or bus-boundary bridge depending on whether the protocol participates in intent dispatch or in cross-process bus routing.
The three injection points are not exhaustive of where adapters could go — a determined integrator can hook almost anywhere — but they are the points the spec family deliberately designs to keep clean. Any new protocol that needs deeper integration than the three points permit is a signal that the protocol genuinely overlaps the assistant's own architecture rather than complementing it, at which point the integration is a co-architecture decision rather than an adapter.