← APPENDIX.md · Non-normative

⚠️ AI-generated draft — not yet fully reviewed. This content was produced by a large language model (Claude Code) and has not yet been fully reviewed for accuracy, completeness, or consistency with the specifications. The normative specifications themselves are human-reviewed; this appendix is supplementary context. Readers should verify claims before relying on them.

2. Comparison with other voice-assistant systems

The OVOS specifications occupy territory adjacent to several existing voice-assistant systems. This section positions OVOS's design choices relative to each comparator. The summary in §2.5 records where the voice OS leads architecturally, where it follows, and where it makes a deliberately different choice.

2.1 Home Assistant and Rhasspy — shared grammar lineage

OVOS, Home Assistant (HA), and Rhasspy share a common lineage. The bracket-expansion grammar of OVOS-INTENT-1 — (a|b) alternatives, [optional] segments, {slot} placeholders — is the same family as HA's hassil sentence templates and Rhasspy's sentences.ini. The syntax is not novel. What is distinctive about the OVOS approach is everything around the grammar.
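The shared bracket-expansion family can be made concrete with a small expander. The sketch below is a hypothetical recursive-descent implementation. The names `expand` and `_parse` are illustrative and are not taken from any OVOS, hassil, or Rhasspy codebase. It handles nested (a|b) alternatives and [optional] segments, and it passes {slot} placeholders through untouched so an engine can deal with them.

```python
from __future__ import annotations


def _parse(s: str, i: int, closer: str | None) -> tuple[list[str], int]:
    """Expand s[i:] up to `closer`; return the expansions and the index after it."""
    options: list[str] = []  # finished alternatives at this nesting level
    current = [""]           # partial expansions of the alternative being built
    while i < len(s):
        c = s[i]
        if c == closer:
            return options + current, i + 1
        if c in ")]":
            raise ValueError(f"unexpected {c!r} at position {i}")
        if c == "|":
            options += current  # close this alternative, start the next one
            current = [""]
            i += 1
        elif c in "([":
            inner, i = _parse(s, i + 1, ")" if c == "(" else "]")
            if c == "[":
                inner = inner + [""]  # an optional segment may also be omitted
            current = [head + tail for head in current for tail in inner]
        else:
            current = [head + c for head in current]  # literal text, including {slot}
            i += 1
    if closer is not None:
        raise ValueError(f"unbalanced template: missing {closer!r}")
    return options + current, i


def expand(template: str) -> list[str]:
    """Expand (a|b) alternatives and [optional] segments, leaving {slot} untouched."""
    expansions, _ = _parse(template, 0, None)
    # Collapse double spaces left by omitted optionals; drop duplicates, keeping order.
    return list(dict.fromkeys(" ".join(e.split()) for e in expansions))


print(expand("turn (on|off) [the] {device}"))
# ['turn on the {device}', 'turn on {device}', 'turn off the {device}', 'turn off {device}']
```

Rhasspy-style closed grammars stop here and recognize only this set. The OVOS position treats the same list as training samples for whatever engine sits behind the contract.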

What OVOS does differently:

  • An implementation-agnostic specification. HA and Rhasspy have no format-level specification independent of their implementation: the code is the contract. OVOS now has one, which is what lets multiple engines (and other assistants) implement the same contract.
  • Engine-agnostic matching. OVOS-INTENT-1 §4 treats templates as training data and leaves matching, scoring, and generalization to the engine. HA matches with hassil, a deterministic template matcher; Rhasspy compiles templates into a closed ASR grammar. The OVOS contract accommodates a deterministic matcher, a neural classifier, or an LLM behind one interface.
  • Templates are training data, not a closed grammar. A capable OVOS engine generalizes beyond the authored samples. Rhasspy's closed-grammar model is deterministic and offline-guaranteed but brittle — an utterance not derivable from sentences.ini cannot be recognized at all.
  • A multi-stage pipeline (§3.2). Intent engines account for just two of the many stage kinds. Neither HA nor Rhasspy exposes an intent layer this structured.
  • An intent is bound to one handler, owned by one skill (OVOS-INTENT-3 §1). This follows necessarily from the open skill ecosystem (see §2.2).
  • A bus substrate that can be opened to layer-2 systems (§3.1). Neither HA nor Rhasspy exposes its bus this openly.
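The engine-agnostic contract described in the list above can be sketched as a typed interface. `IntentEngine`, `train`, and `match` are hypothetical names chosen for illustration; they are not the OVOS plugin API. Two toy engines sit behind the same interface. The first behaves like a closed grammar and rejects anything that was not authored. The second scores by word overlap and so generalizes slightly beyond the samples.

```python
from __future__ import annotations

from typing import Optional, Protocol


class IntentEngine(Protocol):
    """Hypothetical engine contract: templates go in as training data, matches come out."""

    def train(self, intent_name: str, samples: list[str]) -> None: ...

    def match(self, utterance: str) -> Optional[tuple[str, float]]: ...


class ExactMatchEngine:
    """Closed-grammar behavior: only utterances identical to a sample are recognized."""

    def __init__(self) -> None:
        self._samples: dict[str, str] = {}

    def train(self, intent_name: str, samples: list[str]) -> None:
        for sample in samples:
            self._samples[sample.lower().strip()] = intent_name

    def match(self, utterance: str) -> Optional[tuple[str, float]]:
        name = self._samples.get(utterance.lower().strip())
        return (name, 1.0) if name is not None else None


class WordOverlapEngine:
    """Toy generalizing engine: Jaccard word overlap lets unseen phrasings still match."""

    def __init__(self, threshold: float = 0.5) -> None:
        self._samples: list[tuple[set[str], str]] = []
        self._threshold = threshold

    def train(self, intent_name: str, samples: list[str]) -> None:
        for sample in samples:
            self._samples.append((set(sample.lower().split()), intent_name))

    def match(self, utterance: str) -> Optional[tuple[str, float]]:
        words = set(utterance.lower().split())
        best: Optional[tuple[str, float]] = None
        for sample_words, name in self._samples:
            union = words | sample_words
            if not union:
                continue
            score = len(words & sample_words) / len(union)
            if best is None or score > best[1]:
                best = (name, score)
        return best if best is not None and best[1] >= self._threshold else None


def recognize(engine: IntentEngine, utterance: str) -> Optional[tuple[str, float]]:
    # The caller depends only on the contract, never on how the engine matches.
    engine.train("weather", ["what is the weather", "what is the weather today"])
    return engine.match(utterance)


print(recognize(ExactMatchEngine(), "what is the weather like"))   # None
print(recognize(WordOverlapEngine(), "what is the weather like"))  # ('weather', 0.8)
```

The calling code stays the same when the engine changes, and that is the property the contract is meant to protect. A real engine would replace the Jaccard score with a trained classifier or an LLM call.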

What HA and Rhasspy do better:

  • Reusable template fragments. hassil has expansion_rules and Rhasspy has <rule> references: named, reusable sub-templates that let authors share common fragments (politeness prefixes, articles, recurring phrasings). OVOS-INTENT-1 version 2 closes this gap with the <name> inline vocabulary reference, which expands a named .voc in place, reusing the existing slot-free format rather than adding a new construct.
  • i18n corpus maturity. HA's community intents repository is a large, managed, professionally translated corpus covering many languages. OVOS has the tooling counterpart in ovos-localize (§1.4), so the gap here is the scale and maturity of the corpus, not the absence of tooling.
  • Concrete, testable completeness. HA and Rhasspy ship systems where the hard parts — matching, number and range handling, slot typing — are solved concretely. The OVOS specs deliberately defer some of these (slot typing to a future normalization spec; matching to the engine). That deferral is intellectually consistent but means the specs' value depends on the engines and tooling that fill the gaps.
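An engine or loader might implement the <name> reference by rewriting it into an ordinary alternative group before bracket expansion. The following is a minimal sketch built on that assumption. `inline_vocab` and the in-memory `vocab` mapping, which stands in for loaded .voc files, are hypothetical. The set of characters permitted in a name is also a guess.

```python
from __future__ import annotations

import re

# <name> references; the allowed name characters are an assumption of this sketch.
_VOCAB_REF = re.compile(r"<(\w[\w.-]*)>")


def inline_vocab(template: str, vocab: dict[str, list[str]]) -> str:
    """Rewrite each <name> as an alternative group built from the lines of name.voc."""

    def substitute(m: re.Match) -> str:
        name = m.group(1)
        if name not in vocab:
            raise KeyError(f"unknown vocabulary reference <{name}>")
        # Wrapping in (...) keeps each .voc line scoped as one alternative.
        return "(" + "|".join(vocab[name]) + ")"

    return _VOCAB_REF.sub(substitute, template)


# Contents of a hypothetical please.voc, one phrasing per line.
vocab = {"please": ["please", "could you", "would you kindly"]}
print(inline_vocab("[<please>] turn on the {device}", vocab))
# [(please|could you|would you kindly)] turn on the {device}
```

The output is an ordinary template, so the rest of the toolchain needs no new construct to handle it.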

2.2 Closed domain vs open ecosystem

The sharpest difference between OVOS and HA is not technical but structural. Home Assistant is a curated, closed domain: home automation, with a vendor-managed intent vocabulary. HA can treat an intent such as HassTurnOn as a shared contract honoured uniformly across hundreds of integrations and many languages, because HA controls and curates that vocabulary.

OVOS is an open ecosystem. Skills are arbitrary third-party Python packages: installed by pip, developed independently, and running as arbitrary in-process code. A skill can do anything; OVOS voice-enables anything. In that setting a shared global intent vocabulary is not a missing feature; it is incoherent. When skills are unbounded, an intent must be private to the skill that defines it and bound directly to that skill's handler. OVOS-INTENT-3's "an intent is not an event" stance is therefore the correct model for an open ecosystem, just as HA's shared-vocabulary model is correct for a curated one. The two models are right for different platforms; neither is universally better.
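A registry keyed by owner and name gives a concrete picture of the skill-private model. In this hypothetical sketch, two independently developed skills each define a `play` intent, and neither knows the other exists. The `IntentRegistry` class, the colon-joined qualified name, and the skill ids are all illustrative.

```python
from __future__ import annotations

from typing import Callable, Dict

Handler = Callable[[Dict[str, str]], str]


class IntentRegistry:
    """Hypothetical registry in which every intent is owned by exactly one skill."""

    def __init__(self) -> None:
        self._handlers: dict[tuple[str, str], Handler] = {}

    def register(self, skill_id: str, intent_name: str, handler: Handler) -> str:
        key = (skill_id, intent_name)
        if key in self._handlers:
            raise ValueError(f"{skill_id} already registered {intent_name!r}")
        self._handlers[key] = handler
        # The skill id qualifies the name, so equal names in different skills never collide.
        return f"{skill_id}:{intent_name}"

    def dispatch(self, qualified_name: str, slots: dict[str, str]) -> str:
        skill_id, intent_name = qualified_name.split(":", 1)
        return self._handlers[(skill_id, intent_name)](slots)


registry = IntentRegistry()
music = registry.register("music-skill", "play", lambda s: f"playing {s['query']}")
radio = registry.register("radio-skill", "play", lambda s: f"tuning to {s['query']}")
print(registry.dispatch(radio, {"query": "jazz fm"}))  # tuning to jazz fm
```

There is no global `play` contract that both skills would have to honor; each name resolves only to its owner's handler.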

2.3 Rasa — closest comparator for intent context

Rasa's "active forms" and slot mappings perform context-aware matching, but they are baked into the policy engine: you cannot get that context-aware behavior from a Rasa NLU pipeline without Rasa policies. OVOS-CONTEXT-1 instead separates three concerns: gating (requires_context / excludes_context, §6 / §6.1 of that spec), match-time capture (the context-supplied capture rule, §7), and engine matching hints (engine-internal use of values, §6). As a result, every intent engine that consumes OVOS-INTENT-3 registrations can gate uniformly without buying into a particular dialog policy.
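The separation between gating and capture can be sketched as two independent functions. The `IntentSpec` record and the function names are hypothetical. The field names borrow the requires_context / excludes_context terms. The capture rule shown here, in which an empty slot is filled from a same-named context key, is an assumption about how §7 behaves and is not taken from the spec.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IntentSpec:
    """Hypothetical registration record; field names mirror the CONTEXT-1 terms."""
    name: str
    slots: list[str] = field(default_factory=list)
    requires_context: set[str] = field(default_factory=set)
    excludes_context: set[str] = field(default_factory=set)


def is_eligible(intent: IntentSpec, context: dict[str, str]) -> bool:
    """Gating: every required key is present and no excluded key is present."""
    return all(k in context for k in intent.requires_context) and not any(
        k in context for k in intent.excludes_context
    )


def capture_from_context(intent: IntentSpec, slots: dict[str, str],
                         context: dict[str, str]) -> dict[str, str]:
    """Capture (assumed rule): a slot the utterance left empty is taken from context."""
    filled = dict(slots)
    for slot in intent.slots:
        if slot not in filled and slot in context:
            filled[slot] = context[slot]
    return filled


# Fire-once: the intent is gated out as soon as its own handler sets "greeted".
greet = IntentSpec("greet", excludes_context={"greeted"})
session: dict[str, str] = {}
print(is_eligible(greet, session))  # True
session["greeted"] = "yes"
print(is_eligible(greet, session))  # False

# Follow-up: eligible only once a location is in context, which then fills the slot.
followup = IntentSpec("weather_followup", slots=["location", "day"],
                      requires_context={"location"})
print(capture_from_context(followup, {"day": "tomorrow"}, {"location": "Lisbon"}))
# {'day': 'tomorrow', 'location': 'Lisbon'}
```

Neither function needs to know how the utterance was matched, so any engine could reuse them.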

Rasa wins on conversation-level evaluation infrastructure — story-based testing, end-to-end success metrics — for which the OVOS specs have no analogue yet (§7 catalogues this as a known gap).

Rasa's NLU pipeline is also the closest analogue to OVOS-TRANSFORM-1's utterance / metadata / intent chains, but it is a single sequence per language model, and it has no equivalent of the policy/preference split (TRANSFORM-1 §5.3). TRANSFORM-1's six-injection-point model is genuinely more expressive.

2.4 Amazon ASK / Alexa Skills Kit, Google Dialogflow

Both are closed-domain, centrally trained stacks. Their built-in entity-type systems (AMAZON.DATE, @sys.date-time) are what OVOS-TRANSFORM-1 §3.4 replicates as an injectable, deployer-replaceable, engine-agnostic contract. At the spec level OVOS is therefore strictly more flexible. However, OVOS defers the typed value formats themselves (date encoding, number representation, duration units) to a future text-normalization spec (§7), whereas ASK and Dialogflow ship them as built-ins.
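A registry of extractor functions keyed by type name is one way to picture an injectable, deployer-replaceable entity contract. Everything in this sketch is hypothetical: `EntityRegistry`, the `"number"` type name, and both extractors. Values come back as plain strings because, as noted above, the typed value formats are deferred.

```python
from __future__ import annotations

import re
from typing import Callable, Optional

Extractor = Callable[[str], Optional[str]]


class EntityRegistry:
    """Hypothetical injectable registry: entity type name -> extractor function."""

    def __init__(self) -> None:
        self._extractors: dict[str, Extractor] = {}

    def register(self, entity_type: str, extractor: Extractor) -> None:
        # A later registration replaces an earlier one; this is how a deployer
        # swaps the default for, say, a locale-aware implementation.
        self._extractors[entity_type] = extractor

    def extract(self, entity_type: str, text: str) -> Optional[str]:
        return self._extractors[entity_type](text)


def digits_only(text: str) -> Optional[str]:
    """Default number extractor: the first run of ASCII digits."""
    found = re.search(r"[0-9]+", text)
    return found.group(0) if found else None


_NUMBER_WORDS = {"one": "1", "two": "2", "three": "3", "four": "4", "five": "5"}


def digits_or_words(text: str) -> Optional[str]:
    """Deployer replacement: also accepts a few English number words."""
    found = digits_only(text)
    if found is not None:
        return found
    for word in text.lower().split():
        if word in _NUMBER_WORDS:
            return _NUMBER_WORDS[word]
    return None


entities = EntityRegistry()
entities.register("number", digits_only)
print(entities.extract("number", "set a timer for five minutes"))  # None
entities.register("number", digits_or_words)
print(entities.extract("number", "set a timer for five minutes"))  # 5
```

The engine only ever calls `extract`, so replacing an extractor changes no engine code.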

Neither ASK nor Dialogflow has a session.pipeline equivalent (the assistant picks one matcher per skill), and neither has anything like the layer-2 substrate of OVOS-MSG-1 §3.4. ASK has built-in intents (AMAZON.HelpIntent), but they are handled inside the skill; Dialogflow has fallback intents, but they lack first-class dispatch identity. OVOS-PIPELINE-1's dispatch polymorphism (skill_id == pipeline_id for plugin-bundled handlers) lets a non-skill component advertise its own intent identity on the bus, indistinguishable from a skill. This capability is original to this architecture.
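Under dispatch polymorphism, the dispatcher routes on the owner id and nothing else. In the hypothetical sketch below, a pipeline plugin claims an utterance under its own pipeline_id and then handles it, so its dispatch path is identical to a skill's. `Dispatcher`, `Match`, and `PersonaPipeline` are illustrative names, not OVOS classes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Match:
    skill_id: str  # handler owner; for a plugin-bundled handler this equals its pipeline_id
    intent_name: str
    utterance: str


class Dispatcher:
    """Hypothetical dispatch surface that routes on owner id, never on owner kind."""

    def __init__(self) -> None:
        self._owners: dict[str, Callable[[Match], str]] = {}

    def register_owner(self, owner_id: str, handle: Callable[[Match], str]) -> None:
        self._owners[owner_id] = handle

    def dispatch(self, match: Match) -> str:
        return self._owners[match.skill_id](match)


class PersonaPipeline:
    """Toy pipeline plugin (e.g. an LLM persona) that owns the handler it matches to."""

    pipeline_id = "persona-pipeline"

    def match(self, utterance: str) -> Match:
        # Claims the utterance under its own id, exactly as a skill's intent would be.
        return Match(skill_id=self.pipeline_id, intent_name="chat", utterance=utterance)

    def handle(self, match: Match) -> str:
        return f"persona reply to: {match.utterance}"


dispatcher = Dispatcher()
dispatcher.register_owner("weather-skill", lambda m: "it is sunny")
persona = PersonaPipeline()
dispatcher.register_owner(persona.pipeline_id, persona.handle)

print(dispatcher.dispatch(Match("weather-skill", "current_weather", "is it sunny")))
print(dispatcher.dispatch(persona.match("tell me a joke")))  # persona reply to: tell me a joke
```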

2.5 Summary — where the voice OS leads, follows, and differs

OVOS leads architecturally in three places:

  • The pipeline-plugin model with first-class dispatch polymorphism. No comparator lets a non-skill component (LLM persona, chatbot, fallback) be a first-class handler owner on the same dispatch surface.
  • The six-injection-point transformer chain with per-session preference/policy separation. Nothing in HA, Rhasspy, Rasa, ASK, or Dialogflow has a comparable lifecycle-uniform extensibility surface.
  • Negative gating (excludes_context "match if absent") in CONTEXT-1. ASK/Dialogflow contexts are purely positive; Rasa forms are not engine-agnostic; HA has no context model. The fire-once and modal-suppression patterns fall out of negative gating.

OVOS follows where ecosystem investment matters more than architecture:

  • HA's translation corpus scale (the intents repository).
  • ASK / Dialogflow's typed entity systems.
  • Rasa's conversation-level evaluation infrastructure.

OVOS makes a deliberately different choice in two places:

  • Engine-agnostic templates as training data (OVOS-INTENT-1 §4) rather than Rhasspy-style closed grammars. The trade-off: generalization beyond authored samples vs. offline-deterministic recognition guarantees.
  • Open skill ecosystem with skill-private intents (OVOS-INTENT-3 §1) rather than HA-style curated vocabulary. The trade-off: skill author freedom vs. cross-integration vocabulary sharing.