test(desktop): lock chat prompt ↔ tool-registry & schema consistency by eulicesl · Pull Request #7573 · BasedHardware/omi

eulicesl · 2026-05-31T16:28:33Z

Summary

Problem: Two prompt↔runtime drifts fail silently and read to the user as "the AI is dumb": (1) a tool declared in the chat prompt's <tools> block but not dispatched by ChatToolExecutor.execute(_:) → the model emits calls that come back "Unknown tool"; (2) a hardcoded SQL example referencing a table that no longer exists → runtime SQL errors. Neither is caught by the compiler.
What changed: Adds three Set<String> registries on ChatToolExecutor as the source of truth for the tool names it dispatches, and a PromptSchemaConsistencyTests suite that validates the prompt's formal regions against those registries and the schema annotations.
What did NOT change (scope boundary): No prompt rewrite, no runtime behavior change. Tests are scoped to the formal <tools> block and **Common SQL patterns:** section — prose narrative is intentionally out of scope (a regex over narrative would false-positive on ordinary English near SQL keywords).

Linked Issue / Bounty

N/A — independent contribution.

Root Cause (bug fixes only)

N/A — this is a regression-guard / test PR. It prevents a class of silent drift rather than fixing a current break (the tests pass against today's prompt).

How It Works

ChatToolExecutor gains piMonoChatToolNames (the 7 default chat tools), onboardingOnlyToolNames, and allRegisteredToolNames (their union), documented to stay in lockstep with the execute(_:) cases.
PromptSchemaConsistencyTests extracts the <tools> block and the **Common SQL patterns:** section from ChatPrompts.desktopChat and asserts:
- every **tool**: marker is in allRegisteredToolNames, and every piMonoChatToolNames entry has a marker (both directions);
- the prompt's "You have N tools" claim equals piMonoChatToolNames.count;
- every table referenced in the SQL-patterns section exists in ChatPrompts.tableAnnotations (or a small FTS/aux allowlist);
- no columnAnnotations set is orphaned after its table was removed.

Change Type

Chore / infra (test guardrail)

Scope

Desktop app (Swift/macOS)

Security Impact

New permissions or capabilities introduced? No
Auth or token handling changed? No
Data access scope changed (screen data, OCR, user data)? No
New or changed network calls? No
Plugin or tool execution surface changed? No — the registries are a declarative mirror of the existing execute(_:) dispatch; nothing new is executable.

Testing

Built locally — xcrun swift build -c debug --package-path Desktop → Build complete!
Manual verification: compile + test run on macOS.
Existing tests pass
New tests added — PromptSchemaConsistencyTests (5 cases), all green against the current prompt.

Human Verification

Verified scenarios: clean debug build against current main; all 5 new tests pass — confirming the shipping prompt's <tools> block, tool-count claim, and SQL examples are already consistent with the executor and schema (no drift today).
Edge cases checked: the tool-name extraction regex is [a-z0-9_\-]+ (not [a-z_]+) so a future hyphenated/numeric tool name can't slip past the check; SQL extraction covers FROM/JOIN/INSERT INTO/UPDATE/DELETE FROM.
What I did not verify (and why): no live omi.db introspection — the ChatPrompts annotation dictionaries are themselves what gets serialized into the prompt's schema, so validating against them gives the same guarantee without a DB fixture.

Evidence

Test output:

PromptSchemaConsistencyTests — Executed 5 tests, 0 failures
Build complete!

Docs

N/A — no docs impact

Performance, Privacy, and Reliability

No runtime impact (static sets + tests). Reliability: turns a class of silent prompt↔runtime drift into a CI failure at edit time.

Migration / Backward Compatibility

Backward compatible? Yes
Migration needed? No — additive registries + a new test file.

Risks and Mitigations

Risk: The registries could fall out of sync with execute(_:) if a tool is added without updating them. Mitigation: that's exactly what the tests catch — a new prompt tool with no registry entry (or vice versa) fails CI. The registries are documented as needing to track the execute(_:) cases.
Risk: The section-extraction regexes assume the prompt keeps its <tools>…</tools> and **Common SQL patterns:** structure. Mitigation: if those markers disappear, the tests XCTFail loudly rather than silently passing.

AI Disclosure

Tools used: Claude Code (Claude Opus).
AI-assisted portions: the registries, the tests, and this PR description.
How I verified correctness: ran the local debug build and the full PromptSchemaConsistencyTests suite (all green); confirmed the registries match the executor's actual execute(_:) dispatch cases.

A tool declared in the chat prompt's <tools> block but not dispatched by ChatToolExecutor.execute(_:) makes the model emit calls that fail with "Unknown tool"; a hardcoded SQL example referencing a dropped table fails at runtime. Both read to the user as "the AI is dumb", and neither is caught by the type checker. Adds three Set<String> registries on ChatToolExecutor as the source of truth for the tool names it dispatches (the 7 piMono chat tools, the onboarding-only tools, and their union), kept in lockstep with the execute(_:) cases. PromptSchemaConsistencyTests then validates the formal regions of the prompt against them: - every **tool**: marker in the <tools> block is a registered tool, and every piMono tool has a marker (both directions); - the "You have N tools" claim matches the registry size; - every table referenced in the Common SQL patterns section exists in tableAnnotations (or the FTS aux set); - no column-annotation set orphaned after its table was removed. Scoped to the formal <tools> block and SQL-patterns section so prose narrative doesn't trigger false matches. All 5 pass against the current prompt. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8288ec7d83

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

greptile-apps · 2026-05-31T16:32:10Z

Greptile Summary

This PR adds a declarative tool-name registry to ChatToolExecutor and a new PromptSchemaConsistencyTests suite that validates the chat prompt's <tools> block and SQL patterns section against those registries and ChatPrompts schema annotations — turning silent prompt↔runtime drift into a CI failure.

Adds three Set<String> statics (piMonoChatToolNames, onboardingOnlyToolNames, allRegisteredToolNames) as the authoritative list of dispatchable tools, with documentation tying them to the execute(_:) switch cases.
Adds 5 XCTest cases covering tool-name bidirectional consistency, the prompt's tool-count claim, SQL table references against tableAnnotations, and orphaned column annotations.

Confidence Score: 3/5

Safe to merge for today's prompt, but the registry is missing 7 dispatcher-known backend tools, so the guard it adds won't fire correctly if those tools are ever introduced into the formal prompt block.

The core purpose of the PR — catching prompt↔executor drift — works for the 17 tools it does enumerate. However, allRegisteredToolNames omits the 7 backend RAG tools that execute(_:) already dispatches via executeBackendTool. The documented invariant ("every tool the executor knows how to dispatch") is incorrect today; and if one of those tools is later added to desktopChat's <tools> block, the test that's supposed to accept it will reject it instead, inverting the intended behavior of the regression guard.

desktop/Desktop/Sources/Providers/ChatToolExecutor.swift — the allRegisteredToolNames union needs a backendRagToolNames set added to include the 7 omitted dispatch cases.

Important Files Changed

Filename	Overview
desktop/Desktop/Sources/Providers/ChatToolExecutor.swift	Adds three static Set registries (piMonoChatToolNames, onboardingOnlyToolNames, allRegisteredToolNames), but allRegisteredToolNames omits the 7 backend RAG tools (get_conversations, search_conversations, get_memories, search_memories, get_action_items, create_action_item, update_action_item) that execute(_:) already dispatches, making the documented invariant incorrect.
desktop/Desktop/Tests/PromptSchemaConsistencyTests.swift	New test suite with 5 well-structured tests; the extractToolNames regex could match inline param_name: markers inside tool descriptions, creating false failures if the prompt format ever gains parameter-level bold markers in the tools block.

_{Reviews (1): Last reviewed commit: "test(desktop): lock chat prompt ↔ tool-r..." | Re-trigger Greptile}

Review on BasedHardware#7573: - allRegisteredToolNames omitted the 7 backend RAG tools execute(_:) already dispatches via executeBackendTool (get_conversations, search_conversations, get_memories, search_memories, get_action_items, create_action_item, update_action_item). If a prompt ever declared one in its <tools> block, testDesktopChatToolNamesAreAllRegistered would false-fail despite working dispatch. Added a backendRagToolNames set and unioned it in, so the registered set is a true superset of dispatch. - Removed the misleading onboardingOnlyToolNames docstring line implying save_knowledge_graph is a member (it lives only in piMonoChatToolNames). - Anchored extractToolNames to a line-start marker (^[ \t]*\*\*name\*\*:) so parameter docs written with the same bold convention — inline or as `- **query**:` bullets — can't be harvested as tool names and trip the check. All 5 tests still pass against the current prompt. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

eulicesl · 2026-05-31T19:30:53Z

Closing this one — on reflection it's a standalone test/guardrail PR and the project keeps test scaffolding out of upstream. The ChatToolExecutor registries here exist only to feed the consistency test; runtime tool dispatch uses its own switch in execute(_:) and doesn't reference them, so there's no functional code to keep. The actual chat-reliability changes are in #7552, #7555, and #7559. Thanks for the reviews!

github-actions · 2026-05-31T19:31:02Z

Hey @eulicesl 👋

Thank you so much for taking the time to contribute to Omi! We truly appreciate you putting in the effort to submit this pull request.

After careful review, we've decided not to merge this particular PR. Please don't take this personally — we genuinely try to merge as many contributions as possible, but sometimes we have to make tough calls based on:

Project standards — Ensuring consistency across the codebase
User needs — Making sure changes align with what our users need
Code best practices — Maintaining code quality and maintainability
Project direction — Keeping aligned with our roadmap and vision

Your contribution is still valuable to us, and we'd love to see you contribute again in the future! If you'd like feedback on how to improve this PR or want to discuss alternative approaches, please don't hesitate to reach out.

Thank you for being part of the Omi community! 💜

chatgpt-codex-connector Bot reviewed May 31, 2026

View reviewed changes

Comment thread desktop/Desktop/Sources/Providers/ChatToolExecutor.swift Outdated

greptile-apps Bot reviewed May 31, 2026

View reviewed changes

Comment thread desktop/Desktop/Sources/Providers/ChatToolExecutor.swift Outdated

Comment thread desktop/Desktop/Sources/Providers/ChatToolExecutor.swift

Comment thread desktop/Desktop/Tests/PromptSchemaConsistencyTests.swift

eulicesl closed this May 31, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(desktop): lock chat prompt ↔ tool-registry & schema consistency#7573

test(desktop): lock chat prompt ↔ tool-registry & schema consistency#7573
eulicesl wants to merge 2 commits into
BasedHardware:mainfrom
eulicesl:feat/desktop-chat-prompt-schema-tests

eulicesl commented May 31, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

greptile-apps Bot commented May 31, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

eulicesl commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

eulicesl commented May 31, 2026

Summary

Linked Issue / Bounty

Root Cause (bug fixes only)

How It Works

Change Type

Scope

Security Impact

Testing

Human Verification

Evidence

Docs

Performance, Privacy, and Reliability

Migration / Backward Compatibility

Risks and Mitigations

AI Disclosure

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

greptile-apps Bot commented May 31, 2026

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Uh oh!

Uh oh!

Uh oh!

Uh oh!

eulicesl commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant