← APPENDIX.md · Non-normative
⚠️ AI-generated draft — not yet fully reviewed. This content was produced by a large language model (Claude Code) and has not yet been fully reviewed for accuracy, completeness, or consistency with the specifications. The normative specifications themselves are human-reviewed; this appendix is supplementary context. Readers should verify claims before relying on them.
Short notes on why the specifications make the choices they do — the reasoning, not the requirement. Cross-reference into the normative sections.
- ASR-normalized input, no escaping (INTENT-1 §2). The grammar targets voice input. By contract, text reaching an engine is already lowercased, punctuation-stripped, single-spaced. Bracket metacharacters therefore cannot occur as literal input, so no escape mechanism is needed. A simplification bought by scoping the grammar to voice.
- Templates are training data (INTENT-1 §4). Enumerating every phrasing is futile for natural speech. A template describes the shape of the training data; the engine generalizes. This is why expansion is defined precisely but matching is not.
- An intent is not an event (INTENT-3 §1). Necessary for an open skill ecosystem — see §2.2.
- Two non-interoperable methods (INTENT-3 §2). Keyword and template intents describe a command in fundamentally different shapes. Rather than forcing one model, the spec keeps both and makes engines declare which they accept. The cost is that a developer must choose per intent and know which engines an installation runs.
- Slot typing is deferred (INTENT-1 §5.3). Interpreting a slot value as a number or date is inseparable from how ASR output is normalized — and normalization is not yet specified. Specifying typing first would be incoherent, so a value is, for now, an opaque sequence of words.
- `.blacklist` vs `excluded` (INTENT-3 §4.2, §5.4). The template grammar is purely generative: it cannot express "not this". Template intents therefore need a separate `.blacklist` artifact for suppression. Keyword intents express the same idea natively with the `excluded` constraint role. The asymmetry follows from the grammar, not from inconsistency.
- No regular expressions (INTENT-3 §4.4). Free-form structured text is a slot: use a template intent and the slot extractor. Regexes are also notoriously hard to localize, which conflicts with the per-language model.
- Inline vocabulary references reuse `.voc` (INTENT-1 §3.7). A reusable template fragment and a keyword vocabulary are the same thing (a named, slot-free phrase set), so `<name>` resolves to a `.voc` rather than introducing a new file role. The change is one grammar token plus an expander step.
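The input contract in the first note can be illustrated with a minimal sketch. It assumes that "punctuation-stripped" means deleting every character that is neither alphanumeric nor whitespace; the names `normalize_asr`, `has_metacharacters`, and the `METACHARACTERS` set are hypothetical, and the real normalization rules are (per the slot-typing note) not yet specified.

```python
# Hypothetical set of grammar metacharacters, for illustration only.
METACHARACTERS = set("[]()<>{}|")


def normalize_asr(text: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    lowered = text.lower()
    # Assumption: "punctuation-stripped" deletes anything that is neither
    # alphanumeric nor whitespace.
    kept = "".join(ch for ch in lowered if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def has_metacharacters(text: str) -> bool:
    """True if any (hypothetical) grammar metacharacter survives in text."""
    return any(ch in METACHARACTERS for ch in text)


utterance = normalize_asr("Turn ON the (kitchen)  [light]!")
```

Once input is normalized this way, no bracket character can reach the engine, which is why the grammar can skip an escape mechanism.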
- One spec, not two. Envelope + routing + derivations are tightly coupled: every routing key lives in `context`, every derivation manipulates routing, and all of them formalize existing OVOS code. Splitting them was tried; the split did not survive the derivations (which can only meaningfully be defined where the routing keys are), so they were merged into a single bus-message spec. The session carrier, by contrast, did split out cleanly into OVOS-SESSION-1.
- `context` is extensible by design. Only the keys other systems already key behaviour off (`source`, `destination`, `session`) are given normative meaning. Everything else (GUI routing, tracing, security) is layered by other specs without touching the envelope.
- `source` / `destination` are informational, not authorization (MSG-1 §3.3). The bus is not a security boundary. Layer-2 systems (HiveMind) build authentication and routing enforcement on top of the pair without OVOS itself learning about peers.
- The boundary is user ↔ assistant, not core ↔ handler. The `(source, destination)` pair marks who is currently talking to whom across one boundary only: the external participant on one side, the assistant (core and every skill handler) on the other. The flip happens once per conversational turn (§3.1.1), not on every internal hop.
- No central correlation, no central state (MSG-1 §5.4, §3.1.2 above). The bus is fully asynchronous. Components that need correlation or state own it themselves, keyed on `session.session_id`. Multi-turn conversation, intent context, cross-skill state, and similar concerns are deferred to other specifications.
- Topic naming conventions (MSG-1 v2 §2.1.2). The conventions other specs in the family already follow are now codified as SHOULD-rules: dot-separated hierarchy with `:` reserved for component-pair shapes; stable ecosystem-identifying root; verb-tense pattern for the trailing segment; request/terminal pairs sharing a root verb (`handle` ↔ `handled`); `.response` suffix for response derivations; per-instance `<root>.<domain>.<id>.<verb>` form.
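The "flip once per turn" behaviour of the `(source, destination)` pair can be sketched as below. This is an illustrative stand-in, not the OVOS `Message` class: the dataclass, the topic names under `example.`, and the participant ids are all hypothetical, and the `forward` / `reply` semantics are inferred from the notes above rather than quoted from MSG-1.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    """Simplified stand-in for a bus message: topic, data, context."""
    msg_type: str
    data: dict = field(default_factory=dict)
    context: dict = field(default_factory=dict)

    def forward(self, msg_type: str, data: Optional[dict] = None) -> "Message":
        # Internal hop: the context, and therefore the pair, is unchanged.
        return Message(msg_type, data or {}, dict(self.context))

    def reply(self, msg_type: str, data: Optional[dict] = None) -> "Message":
        # Turn boundary: the assistant answers the external participant,
        # so source and destination swap.
        context = dict(self.context)
        context["source"] = self.context.get("destination")
        context["destination"] = self.context.get("source")
        return Message(msg_type, data or {}, context)


incoming = Message(
    "example.query.handle",
    {"utterance": "what time is it"},
    {"source": "satellite-1", "destination": "assistant",
     "session": {"session_id": "abc"}},
)
hop = incoming.forward("example.query.resolve")        # internal hop, no flip
answer = hop.reply("example.query.handle.response")    # one flip per turn
```

The session rides along untouched in both derivations, which is what lets components key their own state on `session.session_id`.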
- Why a separate session spec. `Message.context.session` is a load-bearing carrier claimed by multiple specs (PIPELINE-1, CONTEXT-1, TRANSFORM-1); without a single owner, its wire contract drifts. SESSION-1 consolidates the wire shape and fixes a registry mechanism so future specs claim fields without amending SESSION-1 itself.
- Prescriptive, not descriptive. Only the fields normatively claimed by other specs are recognized. Implementations carrying extra per-session state (current OVOS Session has `persona_id`, `system_unit`, `time_format`, `date_format`, `location`, `is_speaking`, `is_recording`, …) are non-normative under v1: they ride through as opaque pass-through and can be claimed by future per-domain specs.
- Omission means "let the orchestrator decide". Single deferral mechanism: omitted single field, empty `session: {}`, absent `session`, explicit `session_id: "default"`. All are equivalent on the wire, and all resolve at consumption to deployment defaults filled by each consumer. No `null`, no sentinels.
- Language signals. Six BCP-47 fields with normative meanings but stage-dependent consolidation: `lang` (user preference, base), `secondary_langs` (additional understood languages, constrains lang-detect predictions and fallback selection), `output_lang` (renderer's preferred output language; simplifies the bidirectional-translation transformer to a fallback role), `stt_lang` / `request_lang` / `detected_lang` (per-utterance signals from STT, emitter, and lang-detect respectively). `request_lang` is an emitter-reported hint (per-wakeword language assignment in multi-wakeword setups), not an override.
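The single deferral mechanism can be sketched as a consumer-side resolver. This is a minimal illustration under stated assumptions: `resolve_session`, the `DEPLOYMENT_DEFAULTS` values, and the pipeline ids `plugin_a` / `plugin_b` are hypothetical; only the field names `session_id`, `lang`, and `pipeline` come from the notes in this appendix.

```python
# Hypothetical deployment defaults; real values are deployment-specific.
DEPLOYMENT_DEFAULTS = {
    "session_id": "default",
    "lang": "en-US",
    "pipeline": ["plugin_a", "plugin_b"],
}


def resolve_session(context: dict, defaults: dict = DEPLOYMENT_DEFAULTS) -> dict:
    """Fill every omitted session field from the consumer's defaults."""
    carried = context.get("session") or {}
    resolved = dict(defaults)
    resolved.update(carried)
    return resolved


equivalent_contexts = [
    {},                                        # absent `session`
    {"session": {}},                           # empty `session: {}`
    {"session": {"session_id": "default"}},    # explicit default id
]
resolved = [resolve_session(ctx) for ctx in equivalent_contexts]

# Omitted single field: `pipeline` is not carried, so the default applies.
partial = resolve_session({"session": {"session_id": "abc", "lang": "pt-PT"}})
```

Because omission is the only way to defer, the resolver needs no special handling for `null` or sentinel values.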
- Registrations are broadcast, which is already how OVOS works. Skills emit registration messages on the bus; plugins that care about a particular registration kind subscribe to the corresponding topic. There has never been a central routing party in OVOS; INTENT-4 just gives this existing model normative topic names. The legacy bus topics (`padatious:register_intent`, `register_vocab`, etc.) are renamed into the `ovos.intent.*` namespace; see §5.7 for the mapping. Migration is mostly a string replacement.
- No "no plugin claimed" error. Following from the broadcast model: a registration that no plugin consumes is silently dropped. The producer gets no signal; the introspection topics (`ovos.intent.list` / `ovos.intent.describe`) are the supported way to verify what the orchestrator's passive index recorded.
- The orchestrator passively indexes; it does not gate. The introspection topics serve from a passive registration index built by listening to broadcasts (this is new; current OVOS has no central index). The index reflects what skills declared, not what plugins actually match against: it is observability-only.
- Skill self-identification on every emission (INTENT-4 §3.1). Every Message a skill emits or modifies in place carries `Message.context["skill_id"]`. Enforcement is structural on the dispatch path: the orchestrator stamps `context.skill_id` from the `<skill_id>:<intent_name>` dispatch topic prefix (PIPELINE-1 §7.1), and skill emissions via `forward` / `reply` inherit automatically.
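The broadcast model and the passive index can be sketched with a toy in-process bus. Everything here is illustrative: the `Bus` and `PassiveIndex` classes, the payload shape, and both topic names are hypothetical (the normative topic names live in INTENT-4 and are not reproduced in this appendix).

```python
from collections import defaultdict

# Hypothetical topic names, for illustration only.
KEYWORD_TOPIC = "ovos.intent.register.keyword-example"
TEMPLATE_TOPIC = "ovos.intent.register.template-example"


class Bus:
    """Tiny broadcast bus: every subscriber of a topic is called."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, topic, handler):
        self._handlers[topic].append(handler)

    def emit(self, topic, data, context):
        # Broadcast: no routing party, and no error if nobody listens.
        for handler in self._handlers[topic]:
            handler(data, context)


class PassiveIndex:
    """Records what skills declared; never accepts or rejects anything."""

    def __init__(self, bus):
        self.declared = []
        for topic in (KEYWORD_TOPIC, TEMPLATE_TOPIC):
            bus.on(topic, self._record)

    def _record(self, data, context):
        self.declared.append((context["skill_id"], data["intent_name"]))


bus = Bus()
index = PassiveIndex(bus)

# A plugin that only cares about keyword registrations.
claimed_by_plugin = []
bus.on(KEYWORD_TOPIC, lambda data, ctx: claimed_by_plugin.append(data["intent_name"]))

bus.emit(KEYWORD_TOPIC, {"intent_name": "play"}, {"skill_id": "music.example"})
# No plugin consumes template registrations here: silently dropped.
bus.emit(TEMPLATE_TOPIC, {"intent_name": "pause"}, {"skill_id": "music.example"})
```

The second registration reaches no plugin and raises nothing, yet the index still records it, which is why the index is observability-only.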
- The plugin model is already in place; PIPELINE-1 refines it (§3.2). The current orchestrator already loads plugins by id through `OVOSPipelineFactory` and iterates `Session.pipeline`. PIPELINE-1 tightens the contract rather than introducing the abstraction.
- Orchestrator and plugin contracts live in one spec, since the orchestrator's job is iterating plugins and translating their matches into bus events. Splitting them would leave neither coherent.
- Plugin contract is minimal. `match(utterances, lang, session) → Match | None`. Side-effect-free during `match`; everything else (state, registrations, language-model calls, response generation) is plugin-internal black box. The smaller the contract, the wider the set of plugins it accommodates.
- `lang` parameter is propagation-only. The orchestrator passes `lang` through from `Message.data.lang`; it MUST NOT synthesize a value from `session.lang` or any per-utterance signal field when `data.lang` is absent. Absence is a faithful "unknown" signal; consumer-side fallback policy is the consumer's.
- Tier conventions are out of scope. The current high / medium / low suffix is implementation strategy: from the bus, each tier is already a distinct `pipeline_id` in `Session.pipeline`. The current convention is compatible with PIPELINE-1 unchanged.
- Skills and plugins are equivalent handler owners. The dispatch topic `<skill_id>:<intent_name>` is uniform: for a pure-matcher plugin the `skill_id` is the matched skill's id; for a plugin that bundles its own handler (e.g. a language-model persona) `skill_id == pipeline_id`. Both are addressed the same way.
- Universal `ovos.utterance.handled` end-marker on every terminal path. One reserved invariant lets observers count turns, route fallbacks, and know "the assistant is idle now" without per-stage knowledge.
- Three-stage composition (PIPELINE-1 §5.5): preference (from `session.pipeline` or default-session pipeline) → availability (drop unloaded plugins) → policy (drop denylisted). Mirrors TRANSFORM-1 §5.3 exactly. The same shape supports the client-requests/layer-2-enforces split (§3.1).
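The three-stage composition can be sketched as a pure function. This is a minimal illustration, assuming the denylist is read from the `blacklisted_pipelines` session field; the function name `compose_pipeline` and the plugin ids other than `stop_high` are hypothetical.

```python
def compose_pipeline(session, default_pipeline, loaded_plugins):
    """Preference -> availability -> policy, in that order."""
    # 1. Preference: the session's own ordering, else the default.
    preferred = session.get("pipeline") or default_pipeline
    # 2. Availability: drop ids with no loaded plugin.
    available = [pid for pid in preferred if pid in loaded_plugins]
    # 3. Policy: drop denylisted ids (assumed to come from the session).
    denied = set(session.get("blacklisted_pipelines", []))
    return [pid for pid in available if pid not in denied]


loaded = {"stop_high", "plugin_a", "plugin_b", "plugin_c"}
default = ["stop_high", "plugin_c"]

requested = {
    "pipeline": ["stop_high", "plugin_a", "plugin_missing", "plugin_b"],
    "blacklisted_pipelines": ["plugin_a"],
}
effective = compose_pipeline(requested, default, loaded)
fallback = compose_pipeline({}, default, loaded)
```

Each stage only removes entries, so the client's requested order is preserved among whatever survives.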
- Lifts intent context out of Adapt. The Adapt-specific `add_context` / `remove_context` mechanism, and the legacy `mycroft.skill.set_cross_context` / `remove_cross_context` fan-out for cross-skill use, are Adapt-only at the matcher level; Padatious and other engines ignore them. CONTEXT-1 generalizes the mechanism into a session-bound, decaying flat key/value store consumed by every intent engine uniformly via `requires_context` and `excludes_context` declarations.
- Two explicit scopes encoded in the key shape. `private` (orchestrator auto-prefixes with `<skill_id>:`) and `shared` (flat, cross-skill). The current OVOS code models the same distinction informally (`MycroftSkill.set_context` auto-prefixes with `alphanumeric_skill_id`; `set_cross_skill_context` fans out via a bus event); CONTEXT-1 names the scopes explicitly and routes both through one bus surface.
- Why private is the default. A skill that calls `ovos.context.set` without specifying `scope` gets a private entry. This optimises for the safer case: a cross-skill leak from an accidentally-shared entry is harder to debug than a cross-skill miss from an accidentally-private entry. The current Adapt `set_context` pattern is effectively skill-private; the default preserves migration fidelity. Cross-skill coordination is a conscious decision that deserves an explicit `scope: "shared"`.
- Prior art for the negative gate. Three in-tree intent engines under `/plugins-pipeline/` (jurebes, nebulento, and palavreado) independently implement `exclude_context` as a first-class negative gate. CONTEXT-1's `excludes_context` adopts the same primitive at the spec level, addressing patterns ("fire once", "modal suppression") that positive gating alone cannot express.
- Engine-side mutation as a sanctioned non-bus pathway. The Adapt pipeline plugin auto-injects matched entities into context inside `match()`, which conflicts with PIPELINE-1 §4.2's side-effect-free `match` rule. CONTEXT-1 §5.3 carves an explicit window between match-accept and dispatch-emit for engine-side session mutation, with the orchestrator (not the bus) carrying the write. This both legitimizes the established practice and resolves the PIPELINE-1 contradiction.
- Eight-level lifecycle-position owner precedence (CONTEXT-1 §5.2). When a Message carries multiple component-identity keys (`skill_id`, `pipeline_id`, the six `<type>_transformer_ids`) from a derivation chain that crossed component boundaries, the orchestrator picks the owner by lifecycle position: the latest stage to run is the most specific.
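The two key shapes and the positive/negative gates can be sketched together. This is an illustration only: `context_key` and `gate_passes` are hypothetical helpers, the store is a plain dict with decay omitted, and gate declarations are assumed to name fully resolved keys.

```python
def context_key(skill_id, key, scope="private"):
    """Encode the scope in the key shape; private is the default."""
    if scope == "private":
        return f"{skill_id}:{key}"   # auto-prefixed with <skill_id>:
    if scope == "shared":
        return key                   # flat, visible across skills
    raise ValueError(f"unknown scope: {scope!r}")


def gate_passes(store, requires_context=(), excludes_context=()):
    """Positive gate: all required keys present. Negative gate: none excluded."""
    has_required = all(key in store for key in requires_context)
    has_excluded = any(key in store for key in excludes_context)
    return has_required and not has_excluded


store = {}
store[context_key("timer.example", "awaiting_confirm")] = True
store[context_key("timer.example", "media_playing", scope="shared")] = True

positive_only = gate_passes(store, requires_context=["media_playing"])
with_negative = gate_passes(
    store,
    requires_context=["media_playing"],
    excludes_context=["timer.example:awaiting_confirm"],
)
```

The second call shows what positive gating alone cannot express: the intent is suppressed because a key is present, not because one is missing.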
- Spec'd as an architectural pattern, not a feature list. An orchestrator MAY implement chains at any subset of six injection points (audio, utterance, metadata, intent, dialog, TTS); a null-implementation is conformant. For each chain it does implement, the per-type contract binds. Each injection point's existence is justified by what the lifecycle holds at that exact moment — what's possible there that isn't possible elsewhere.
- Intent transformers as the system-typing home. INTENT-1 §5.3 defers slot value typing pending a text normalization specification. TRANSFORM-1 §3.4 is the spec'd injection home for typing: a deployer ships date / number / duration parsing once, and every skill receives typed values in `Match.slots` regardless of which engine matched. The OVOS analogue of ASK's `AMAZON.DATE` and Dialogflow's `@sys.date-time`, but as an injected enrichment rather than a built-in engine feature.
- Concrete in-tree plugins as prior art. Nine plugins live under `/plugins-transformer/` today, covering five of the six injection points: utterance transformers (`ovos-utterance-normalizer`, `ovos-utterance-corrections-plugin`, `ovos-transcription-validator-plugin`, `ovos-utterance-plugin-cancel`, `ovos-bidirectional-translation-plugin`); dialog transformers (`ovos-dialog-normalizer-plugin`, `ovos-bidirectional-translation-plugin`, `ovos-dialog-transformer-openai-plugin`); audio transformers (`ovos-audio-transformer-plugin-speechbrain-langdetect`, `ovos-audio-transformer-plugin-ggwave`, `ovos-audio-transformer-redis-publish`); intent transformers (`ovos-keyword-template-matcher`, `ovos-ahocorasick-ner-plugin`). The `bidirectional-translation` plugin exercises the cross-chain coordination via `Message.context` that TRANSFORM-1 §7 formalizes.
- Ascending priority. TRANSFORM-1 §4 specifies ascending priority (lower = earlier, default 50). Current OVOS sorts transformer chains descending (`ovos_core/transformers.py:53,117,205`, `reverse=True`); the spec aligns with the ascending convention already used by fallback skills (`fallback_service.py:49`, default 101 = run last) and the natural "stages count up" reading. Bringing current plugins into conformance only requires flipping relative priorities, not rewriting.
- Cancellation aligned with prior plugin convention. Two existing utterance transformers (`ovos-utterance-plugin-cancel`, `ovos-transcription-validator-plugin`) already signal the lifecycle should abort by returning empty utterance lists with `{canceled: true, cancel_word: <reason>}` context keys. TRANSFORM-1 §8 keeps the convention, renaming `cancel_word` to `cancel_reason` (the structured concept the field encodes) and adding orchestrator-stamped `cancel_by: <transformer_id>`. The spec's `ovos.utterance.cancelled` terminal event sits alongside `ovos.intent.unmatched`, keeping cancellation and failure observably distinct on the bus.
- `lang` parameter is bidirectional (TRANSFORM-1 §3.0). Four of the six per-type contracts (audio, utterance, dialog, TTS) take `lang` as input and return it as output. A bidirectional-translation transformer that takes Spanish in and produces English out returns the destination language; the orchestrator writes the chain's final `lang` back into `Message.data.lang` for downstream stages. Language-detector and clearing cases fall out of the same channel.
- Per-type self-identification keys, list-valued. TRANSFORM-1 §1.3 claims six `Message.context` keys (one per transformer type) rather than a single generic key. Role matters: a Message may have been touched by multiple types in sequence, and a multi-type plugin (e.g., both utterance and dialog) would be ambiguous in a single-key model. Keys are lists because transformers chain: the full per-type chain is preserved in order.
- Per-type denylists complete the policy surface. TRANSFORM-1 §5.2 claims six `blacklisted_<type>_transformers` session fields, paralleling the six `<type>_transformers` chain-ordering fields of §5.1 and the `pipeline` / `blacklisted_pipelines` pair of PIPELINE-1 §5. Three-stage composition (preference → availability → policy) in §5.3 mirrors PIPELINE-1 §5.5 exactly.
- The per-type "explosion" is deliberate. Twelve flat session fields (six chain-orderings + six denylists) plus six `Message.context` attribution keys. A prefix-encoded single namespace would require prefix parsing at every lookup; the per-type partition matches the existing registry and chain-ordering structure. Under SESSION-1 §3.4's SHOULD-omit rule the common case carries zero of these on the wire.
- Language signals live in SESSION-1. Language signals (`stt_lang`, `request_lang`, `detected_lang`, alongside `lang`, `secondary_langs`, `output_lang`) are session-scoped fields with normative meanings but a non-binding consolidation order: the right priority is stage-dependent. TRANSFORM-1 §7.1 names which transformer types are natural producers of which signals; consolidation is the consumer's decision per SESSION-1 §3.2.7.
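Ascending priority, list-valued attribution, and the cancellation convention can be sketched in one utterance chain. This is an illustration under assumptions: the dict-based transformer shape, the `run_utterance_chain` function, and the three example transformers are hypothetical; the context key `utterance_transformer_ids` is derived from the `<type>_transformer_ids` pattern named above.

```python
DEFAULT_PRIORITY = 50  # lower runs earlier


def run_utterance_chain(transformers, utterances, context):
    """Run transformers in ascending priority; stop on cancellation."""
    ordered = sorted(transformers, key=lambda t: t.get("priority", DEFAULT_PRIORITY))
    context = dict(context)
    for transformer in ordered:
        utterances, updates = transformer["transform"](utterances, dict(context))
        context.update(updates)
        # List-valued attribution: the chain is preserved in order.
        context.setdefault("utterance_transformer_ids", []).append(transformer["id"])
        if not utterances and context.get("canceled"):
            # Stamped by the orchestrator, not by the transformer itself.
            context["cancel_by"] = transformer["id"]
            break
    return utterances, context


def lowercase(utterances, context):
    return [u.lower() for u in utterances], {}


def cancel_on_never_mind(utterances, context):
    if any("never mind" in u for u in utterances):
        return [], {"canceled": True, "cancel_reason": "never mind"}
    return utterances, {}


def shout(utterances, context):
    return [u.upper() for u in utterances], {}


chain = [
    {"id": "late-example", "priority": 90, "transform": shout},
    {"id": "cancel-example", "transform": cancel_on_never_mind},  # default 50
    {"id": "normalizer-example", "priority": 10, "transform": lowercase},
]
utterances, context = run_utterance_chain(chain, ["Never Mind"], {})
```

The registration order of `chain` is deliberately scrambled; only the priorities decide that the normalizer runs first and that the priority-90 transformer never runs after the cancel.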
The most common reader question on first encountering STOP-1 is
why a pipeline plugin and not a skill. Stop sounds like an
ordinary intent: a user utterance ("stop", "cancel") matched and
handled. A skill that registers a stop intent and implements a
stop handler looks like the obvious shape. STOP-1 deliberately
lifts stop into the pipeline layer instead, and the reasons are
load-bearing — a skill cannot implement the cascade defined in
STOP-1 §4 even in principle.
Pre-emption requires evaluation-layer ordering control, not handler-layer dispatch. Stop's defining property is that it pre-empts every other matching stage — active converse polls, response-mode delivery, ordinary intent matching. Pipeline plugins are evaluated in declared order with first-match-wins; STOP-1 §7 positions the stop plugin first so it gets the first opportunity to claim every utterance. A skill's intent handler runs only after intent matching has already selected it, by which point converse and intent matchers have already had their say. The escape-hatch property lives at the pipeline-iteration layer, not the handler layer; a skill is at the wrong layer to own it.
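The pre-emption argument rests on first-match-wins iteration, which can be sketched as follows. This is a toy model: `first_match`, the callable-based plugin shape, the dict used as a match, and every id except `stop_high` are hypothetical.

```python
def first_match(pipeline, plugins, utterances, lang, session):
    """Ask each plugin in declared order; the first non-None match wins."""
    for pipeline_id in pipeline:
        match = plugins[pipeline_id](utterances, lang, session)
        if match is not None:
            return pipeline_id, match
    return None


calls = []  # records which plugins were actually consulted


def stop_plugin(utterances, lang, session):
    calls.append("stop_high")
    if "stop" in utterances:
        return {"skill_id": "timer.example", "intent_name": "stop"}
    return None


def greedy_intent_plugin(utterances, lang, session):
    # Would happily claim any utterance, including "stop".
    calls.append("intent-example")
    return {"skill_id": "chat.example", "intent_name": "anything"}


plugins = {"stop_high": stop_plugin, "intent-example": greedy_intent_plugin}
pipeline = ["stop_high", "intent-example"]  # stop is positioned first
winner = first_match(pipeline, plugins, ["stop"], "en-US", {})
```

Because the stop plugin sits first, the greedy matcher is never consulted for "stop"; a skill handler has no comparable position in the iteration.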
The cascade target is decided before dispatch. STOP-1 §4.1
consults `session.active_handlers`, performs the ping-pong
filter, picks the most recently activated responder by
`activated_at`, and emits a `Match` whose `skill_id`
is the chosen target. The orchestrator then dispatches
`<target>:stop` directly using its ordinary routing rules. A
skill matching stop utterances would itself become the dispatch
target, and would then have to re-emit synthetic dispatches at
other skills — bypassing the orchestrator's routing model and
losing the standard handler-lifecycle trio for the actual stop.
Match-phase target selection is what reduces the cascade to a
single clean PIPELINE-1 dispatch instead of two-step orchestration.
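Match-phase target selection can be sketched as below. This is a simplified illustration: the entry shape of `active_handlers` (a list of dicts with `skill_id` and `activated_at`), the helper names, and the dict used as a match are assumptions, and the ping-pong filter is omitted because this appendix does not describe it.

```python
def select_stop_target(session):
    """Pick the most recently activated handler as the stop target."""
    handlers = session.get("active_handlers", [])
    if not handlers:
        return None  # nothing to stop, so the plugin does not match
    latest = max(handlers, key=lambda handler: handler["activated_at"])
    return {"skill_id": latest["skill_id"], "intent_name": "stop"}


def dispatch_topic(match):
    """Ordinary <skill_id>:<intent_name> dispatch topic."""
    return f"{match['skill_id']}:{match['intent_name']}"


session = {
    "active_handlers": [
        {"skill_id": "music.example", "activated_at": 100.0},
        {"skill_id": "timer.example", "activated_at": 250.0},
    ]
}
match = select_stop_target(session)
topic = dispatch_topic(match)
```

The target is already fixed inside the match, so the orchestrator needs only one ordinary dispatch rather than a second round of synthetic messages.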
`Match.updated_session` carries the post-stop session state.
STOP-1 §6.2 requires the stopped handler to be removed from
`active_handlers` via `Match.updated_session` so the cleared
state propagates through the rest of the utterance lifecycle.
Skills have no Match to mutate; their handlers receive the
dispatch session and may mutate it from within the handler
boundary, but cannot communicate session changes that apply
to the dispatch itself.
The reserved-name authority lives at the spec / pipeline
layer. STOP-1 §2 reserves `stop` across every OVOS-INTENT-4
registration in the deployment, enforced by the orchestrator's
malformed-payload treatment of competing registrations. The
authority to define what stop means globally — and to police
skill-level attempts to claim the name — cannot live inside any
single skill that itself uses the name.
Confidence-tier interleaving is a pipeline-ordering concern.
STOP-1 §7 describes `stop_high` / `stop_medium` / `stop_low`
interleaved with other pipeline plugins of comparable confidence.
A skill has no analogous handle on inter-stage ordering; intent
confidence is consumed by intent matchers, not by the outer
pipeline that decides which matcher runs first.
The two layers cooperate by design. A skill MAY — and per STOP-1
§9 SHOULD — provide its own stop handler: every skill that
participates in the cascade implements a stop intent handler
subscribed to `<own_skill_id>:stop`. The pipeline plugin matches
and selects; the skill stops. Stop is one of the few cases in
the spec set where the pipeline / skill split is not
substitutable.