[feat] Add {{mustache}} rendering (prompt unification WP-B3)#4393
junaway wants to merge 18 commits.
📝 Walkthrough

This PR introduces comprehensive RFC and implementation documentation for the prompt-runtime unification effort, including the overarching problem space (RFC), the WP-B3 mustache rendering workstream specification, research foundation, multi-phase implementation plan, QA strategy, and status tracking, plus updates to the main documentation to cross-link and clarify the rollout behavior.

Changes

Prompt Runtime Unification and Mustache Rendering
🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (inconclusive)
Pull request overview
Adds the WP-B3 design workspace and updates the prompt-runtime-unification documentation set to describe introducing a mustache template mode (nested-only dotted lookup + brace escaping) as the default for newly created apps/prompt configs, while preserving legacy curly behavior for existing configs.
Changes:
- Added a new WP-B3 documentation workspace (RFC, research notes, plan, QA plan, status tracking).
- Added/updated the parent prompt-runtime-unification RFC to include `mustache` semantics and rollout sequencing.
- Updated the prompt-runtime-unification README index to reference the new WP-B3 workspace and clarify new-app default behavior.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/README.md | Entry point for the WP-B3 workspace and its scope/files. |
| docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/status.md | Tracks current decisions, open questions, next steps, and validation commands. |
| docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/rfc.md | Defines proposed mustache subset semantics, escaping, compatibility, and dependency evaluation plan. |
| docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/research.md | Maps current runtime touchpoints and evaluates Mustache library options. |
| docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/plan.md | Phased implementation plan for adding mustache support and defaults. |
| docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/qa.md | Test strategy covering renderer behavior, compatibility, and call-site adoption. |
| docs/design/prompt-runtime-unification/rfc.md | Parent RFC describing the broader runtime/frontend unification effort and template-format semantics. |
| docs/design/prompt-runtime-unification/README.md | Index updates to reference WP-B3 and refine wording around defaults/curly visibility. |
Comments suppressed due to low confidence (6)
docs/design/prompt-runtime-unification/rfc.md:46
- The referenced implementation paths here look outdated for this repo checkout: `PromptTemplate` lives in `sdks/python/agenta/sdk/utils/types.py`, and the judge helper `_format_with_template` is in `sdks/python/agenta/sdk/engines/running/handlers.py` (there is no `api/sdk/agenta/...` path). Updating these links would keep the RFC actionable for readers.
* Config lives under `parameters.prompt`: `messages`, `template_format`, `input_keys`, and `llm_config`.
* Rendering goes through `PromptTemplate.format(**inputs)` in `api/sdk/agenta/sdk/types.py`, which supports `curly`, `fstring`, and `jinja2`.
* Completion exposes top-level `inputs` keys as variables. Chat exposes the same keys except `messages`, which is appended as typed messages after rendering (not exposed as a template variable).
**LLM-as-a-judge** is close in behavior but uses a separate runtime path.
* Config is a flat evaluator shape: `prompt_template`, `model`, `response_type`, `json_schema`, `correct_answer_key`, `threshold`, `version`, optional `template_format`.
* Renders messages through `_format_with_template` in `api/sdk/agenta/sdk/workflows/handlers.py`. It supports the same three formats as `PromptTemplate.format`; the default depends on evaluator `version` — `fstring` for v2, `curly` for v3+.
docs/design/prompt-runtime-unification/rfc.md:58
- This “Current State” bullet about rendering behavior is no longer accurate in the current code: the judge path renders `json_schema` via `render_json_like(...)`, and `_format_with_template` doesn’t ‘return original content with a warning’ for supported formats. Please update these bullets to match the current runtime behavior so the RFC doesn’t contradict the implementation.
* **Provider/model resolution.** Chat and completion use workflow provider settings; the judge manually extracts a fixed provider-key set and therefore cannot reliably use custom or self-hosted models configured in the UI.
* **Rendering.** Each service has different rendering behavior:
* `PromptTemplate.format` raises on Jinja errors; `_format_with_template` returns the original content with a warning.
* Chat and completion recursively render `llm_config.response_format`. The judge builds `response_format` from `response_type` / `json_schema` and does not render variables inside `json_schema`.
docs/design/prompt-runtime-unification/rfc.md:93
- Markdown formatting issue: there are extra `**` characters at the end of this bold sentence, which will render incorrectly. It should likely be a single bold span followed by a period.
The basic rule should be: **native JSON stays native until template rendering****.**
docs/design/prompt-runtime-unification/rfc.md:162
- Several typos/grammar issues in this section reduce clarity: “All services (chat, completion, chat)” repeats chat; “providers settings” should be “provider settings”; and “The should all support …” is missing a subject (likely “They”).
* All services (chat, completion, chat) should resolve providers settings using the same path. As such:
* The should all support custom/self-hosted models configured in the UI
docs/design/prompt-runtime-unification/rfc.md:163
- This bullet has mismatched parentheses and is missing a closing parenthesis after “explicitly set”, which makes it hard to read. Consider rephrasing/splitting the sentence to avoid nested parentheses.
* LLM-as-a-judge must not inject unsupported optional parameters such as `temperature` (the default should be None unless explicitly set (just like we currently do in chat/completion).
docs/design/prompt-runtime-unification/rfc.md:176
- Minor punctuation/spacing issues: there’s an extra space before the semicolon in “acceptable ;” and the sentence ends with “welcome..”. Cleaning this up will improve readability.
* The variables panel (right side of the playground) shows:
* variables discovered from the prompt
* variables available from the current testcase or trace context, labeled with source and type
* The prompt editor provides autocomplete for available variables. A degraded solution with only top-level autocomplete is definitely acceptable ; a full solution with full nested autocompletion is surely welcome..
Actionable comments posted: 4
🧹 Nitpick comments (6)
docs/design/prompt-runtime-unification/rfc.md (2)
199-290 (⚡ Quick win): Clarify status of "JP's notes" sections.
The document contains four "JP's notes" sections (at lines 199-215, 225-232, 245-253, 276-290) that appear to be implementation details or developer notes. These inline notes may not belong in the formal RFC, or should be clearly marked as non-normative implementation guidance to distinguish them from requirements.
Consider either:
- Moving these notes to a separate implementation guide or the WP-B3 planning documents
- Clearly marking them as "Non-normative implementation notes" if they should remain
- Removing them if they've served their purpose during draft review
144-145 (⚡ Quick win): Strengthen caveat about partial Mustache implementation.
The RFC mentions that mustache templates "do not implement the full mustache spec (no sections or partials)", but this critical limitation is embedded in a long sentence. Given that the WP-B3 RFC (lines 31-48) explicitly lists all unsupported features, users might be surprised if they expect standard Mustache behavior.
Consider adding a prominent callout or note immediately after introducing the `mustache` format to highlight that this is "Mustache-compatible variable substitution" rather than full Mustache. This aligns with the WP-B3 RFC's "Agenta's Mustache-compatible variable substitution mode, not full Mustache" language.

docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/rfc.md (1)
57-58 (💤 Low value): Clarify why chevron is "too old" for SDK dependency.
Line 57 states "Do not use `noahmorrison/chevron` directly. It is too old for a new SDK runtime dependency." While the directive is clear, explaining why it's too old (unmaintained? incompatible? security issues?) would help future maintainers understand the decision.

Consider adding a brief explanation, such as:

Do not use `noahmorrison/chevron` directly. The package is unmaintained (last release 2016) and not suitable for a new SDK runtime dependency.

docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/research.md (1)
79-114 (⚡ Quick win): Consider flagging the dependency decision more prominently.
The evaluation recommends `langchain_core.utils.mustache` (line 98), but the dependency note (line 114) is somewhat buried. Since adding `langchain-core` to the SDK is a significant architectural decision, consider:
- Adding a decision box or callout at the top of the "Library Evaluation" section.
- Specifying concrete acceptance criteria for the Phase 2 spike (package size threshold, import time threshold, transitive dependency count).
- Defining a clear fallback plan if langchain-core is rejected (e.g., "implement local tokenizer using the reference patterns from langchain-core source").
This ensures reviewers and implementers understand the dependency is conditional and has an escape hatch.
docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/plan.md (1)
16-27 (⚡ Quick win): Consider whether Phase 2 timing is optimal.
Phase 2 (library spike) is scheduled after Phase 1 (resolver implementation). This ordering could lead to rework if the langchain_core evaluation reveals that its tokenizer or renderer has incompatible behavior that affects resolver design.
Two options:
- Move Phase 2 before Phase 1: evaluate the library first, then design resolvers around the chosen tokenizer.
- Keep current order: design resolvers independently, then adapt the library integration to match Agenta's resolver contract (this is what the research doc recommends at lines 98-106).
The current order is defensible if the resolver semantics are non-negotiable product requirements (which they appear to be). If so, consider adding a note in Phase 2 explicitly stating: "Resolver semantics from Phase 1 are fixed requirements; library integration must adapt to them, not vice versa."
docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/status.md (1)
50-57 (⚡ Quick win): Consider adding tentative recommendations for open questions.
The open questions are well-identified, but some could benefit from tentative recommendations to guide Phase 2 evaluation:
- Line 53 (langchain-core acceptability): Add tentative threshold, e.g., "Proceed if package adds <5MB and <10 transitive deps; otherwise implement local tokenizer."
- Line 54 (unsupported constructs): Add tentative direction, e.g., "Preferred: raise explicit `UnsupportedConstructError` with helpful message pointing to jinja2 for advanced logic."
- Line 55 (`{{.}}` vs `{{$}}`): The note already recommends `{{$}}` and keeping `{{.}}` invalid; consider promoting this to the Decisions section if it's settled.

This would make Phase 2 (library spike) more deterministic without requiring another design review cycle.
📒 Files selected for processing (8)
- docs/design/prompt-runtime-unification/README.md
- docs/design/prompt-runtime-unification/rfc.md
- docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/README.md
- docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/plan.md
- docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/qa.md
- docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/research.md
- docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/rfc.md
- docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/status.md
Addresses three PR #4393 review findings (WPB3-018/019/020):

- chatPrompts.ts: extractVariablesFromText missed mustache/curly/jinja2 tags with inner whitespace ({{ name }}). Mustache treats {{ name }} and {{name}} as equivalent, so the {{ }} patterns now allow optional spaces.
- TokenPlugin.tsx: the default-branch comment overstated coverage by claiming an "fstring fallback"; the {{ }} regexes do not match fstring's {...} placeholders. Comment corrected to state reality.
- types.py: PromptTemplate.template_format defaults to `curly`, but the field description called mustache the default. Reworded so the model default (curly, legacy compat) is distinct from the mustache default that app-creation flows/interfaces set explicitly.

Tests: whitespace token-extraction cases added to chatPromptsMustache.test.ts (via the public extractPromptTemplateContext). entity-ui vitest 13 passed; @agenta/shared + @agenta/ui types:check clean; entity-ui lint clean; ruff format + check clean on types.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
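A rough Python analogue of the whitespace fix may help when reviewing it. The real change is a TypeScript regex in chatPrompts.ts, which is not shown here, so the pattern and the `extract_variables` name below are illustrative assumptions. The sketch treats `{{ name }}` and `{{name}}` as the same variable and skips empty `{{ }}`; it does not filter section tags such as `{{#items}}`, which a real extractor would have to handle.

```python
import re

# Hypothetical analogue of the chatPrompts.ts fix: optional whitespace is
# allowed inside {{ }}, so "{{ name }}" and "{{name}}" yield the same name.
# The first captured character must be non-space, so "{{ }}" never matches.
_VARIABLE = re.compile(r"\{\{\s*([^{}\s][^{}]*?)\s*\}\}")


def extract_variables(text: str) -> list[str]:
    """Return unique variable names in first-seen order."""
    seen: dict[str, None] = {}
    for match in _VARIABLE.finditer(text):
        seen.setdefault(match.group(1), None)
    return list(seen)


print(extract_variables("Hi {{ name }}, meet {{name}} and {{ user.email }}."))
# ['name', 'user.email']
```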
Addresses two PR #4393 review findings (WPB3-021/022):

- _mustache-templates.mdx: the value-coercion table claimed dict/list render as compact JSON with "no extra whitespace" (e.g. {"x": 1}). The renderer uses json.dumps(ensure_ascii=False) with default separators, so the real output is {"x": 1, "y": 2} (spaces after : and ,). Reworded the row to match; renderer unchanged (curly-matching behavior is intended).
- Parent RFC + README: the {{$...}} description still used the superseded "pre-rendered as JSONPath ... then the resulting template is rendered" framing, implying JSONPath results are fed back through the engine. WPB3-010 fixed only the wp-b3 doc set; this extends the same correction to the parent docs. Reworded every occurrence (rfc.md / README.md) to shield -> render -> substitute-last (inert data, never re-parsed); also tightened wp-b3 rfc.md.

Verification: render-helper + structured-rendering suites 185 passed (covers all four modes incl. jinja2's shared-JSONPath path); ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
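A minimal sketch of the coercion rule this commit documents, assuming string values pass through unchanged. `coerce_value` is a hypothetical name, and the `str()` fallback for other types is an assumption for illustration rather than confirmed renderer behavior.

```python
import json
from typing import Any


def coerce_value(value: Any) -> str:
    """Turn a substituted value into text (illustrative sketch)."""
    if isinstance(value, (dict, list)):
        # Default separators are ", " and ": ", hence the spaced output.
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, str):
        return value
    return str(value)  # assumption: fallback for other scalar types


print(coerce_value({"x": 1, "y": 2}))  # {"x": 1, "y": 2}
print(coerce_value(["é", 3]))          # ["é", 3]
```

`ensure_ascii=False` keeps non-ASCII characters such as `é` as-is instead of emitting `\u00e9` escapes.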
Make explicit in summary.md and web-handoff.md that adding mustache touched
the other renderers via a shared {{$...}} JSONPath helper:
- curly: functionally equivalent (output unchanged; now the reference behavior).
- jinja2: refactored onto the shared helper, behavior preserved.
- fstring: untouched.
- error-contract change spans all formats but is only newly observable for
mustache/jinja2: the "Unreplaced variables in <format> template" message now
interpolates the real format instead of the hardcoded "curly". curly wording
is identical to before; fstring never raises this error so the branch is
dormant for it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
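To make the shared-helper idea concrete, here is a self-contained sketch of the shield -> render -> substitute-last flow. Everything in it is an illustrative assumption: `render_with_jsonpath`, the NUL-delimited sentinel, the toy engine standing in for `mystace`/Jinja, and the dotted-key resolver standing in for full JSONPath. The text after "Unreplaced variables in <format> template" is also assumed, and jinja2's `{% raw %}`/`{# #}` skipping is not modeled.

```python
import json
import re
from typing import Any, Callable

# Illustrative sketch only: names, sentinel format, toy engine, and the
# dotted-key resolver are assumptions, not Agenta's _render_with_jsonpath.
JSONPATH_TAG = re.compile(r"\{\{\s*(\$[^{}]*?)\s*\}\}")  # {{$...}} tags
SENTINEL = re.compile(r"\x00JP(\d+)\x00")                 # shielded slots


class UnresolvedVariablesError(ValueError):
    """Stand-in for the SDK error of the same name."""


def resolve_path(path: str, context: dict[str, Any]) -> Any:
    """Toy JSONPath: supports only '$' and dotted keys such as '$.a.b'."""
    node: Any = context
    for key in (k for k in path[1:].split(".") if k):
        if not isinstance(node, dict) or key not in node:
            raise KeyError(path)
        node = node[key]
    return node


def toy_engine(template: str, context: dict[str, Any]) -> str:
    """Stand-in engine: replaces {{name}} with the context value ('' if absent)."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(context.get(m.group(1), "")),
        template,
    )


def render_with_jsonpath(
    template: str,
    context: dict[str, Any],
    engine: Callable[[str, dict[str, Any]], str],
    template_format: str,
) -> str:
    values: list[str] = []
    missing: list[str] = []

    def shield(match: re.Match) -> str:
        # 1. Shield: resolve the tag now and hide it behind a sentinel.
        path = match.group(1)
        try:
            value = resolve_path(path, context)
        except KeyError:
            missing.append(path)
            value = ""
        values.append(value if isinstance(value, str) else json.dumps(value, ensure_ascii=False))
        return f"\x00JP{len(values) - 1}\x00"

    shielded = JSONPATH_TAG.sub(shield, template)
    if missing:
        # The format name is interpolated rather than hardcoded as "curly".
        raise UnresolvedVariablesError(
            f"Unreplaced variables in {template_format} template: {missing}"
        )
    # 2. Render: the engine sees ordinary tags plus inert sentinels.
    rendered = engine(shielded, context)
    # 3. Substitute last: values are spliced in as plain text, never re-parsed.
    return SENTINEL.sub(lambda m: values[int(m.group(1))], rendered)


ctx = {"name": "Ada", "doc": {"title": "{{name}}", "n": 2}}
print(render_with_jsonpath("Hi {{name}}: {{$.doc.title}}", ctx, toy_engine, "mustache"))
# Hi Ada: {{name}}
```

Because JSONPath values are spliced in only after the engine has run, a value that itself contains `{{name}}` comes out as literal text. That is the never-re-parsed property the commit describes.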
```ts
// Default: {{ }} variable tokens only. Covers "curly" and "mustache" —
// mustache shares curly's {{name}} delimiters for plain variables, so it
// tokenizes through this path. (fstring also falls through to here, but its
// {...} single-brace placeholders are NOT matched by these {{ }} regexes.)
const full = /\{\{[^{}]*\}\}/
const input = /\{\{[^{}]*$/
const exact = /^\{\{[^{}]*\}\}$/
```
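For readers checking the comment's claim, the same three patterns can be exercised with Python's `re` module. This is a transliteration for illustration, not code from the PR: `\Z` stands in for JavaScript's `$`, which without the `m` flag matches only at the end of the input. Reading `input` as "an unclosed token being typed at the end of the text" is an assumption based on the pattern's shape.

```python
import re

# Transliterations of the TokenPlugin default-branch regexes (illustrative).
FULL = re.compile(r"\{\{[^{}]*\}\}")       # a complete {{ ... }} token anywhere
INPUT = re.compile(r"\{\{[^{}]*\Z")        # an unclosed {{ ... at the very end
EXACT = re.compile(r"^\{\{[^{}]*\}\}\Z")   # the whole text is a single token


def classify(text: str) -> tuple[bool, bool, bool]:
    """Return (full, input, exact) match flags for a piece of editor text."""
    return (
        FULL.search(text) is not None,
        INPUT.search(text) is not None,
        EXACT.search(text) is not None,
    )


# An fstring-style {name} placeholder matches none of the three patterns.
for sample in ["{{ name }}", "{{name}}", "{name}", "Hello {{ na"]:
    print(repr(sample), classify(sample))
```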
```python
try:
    ...  # engine render call (elided in this review excerpt)
except Exception as exc:
    raise MustacheTemplateError(
        f"Mustache template error in content: '{template}'. Error: {exc}"
    ) from exc
```
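The excerpt above only wraps engine errors. The other rejections listed in the PR summary (partials, empty placeholders, NUL bytes) could be checked before rendering; the following is a hypothetical sketch of such a pre-check, not the PR's code. JSON-Pointer tags are omitted because their syntax is not given here, and only the default `{{ }}` delimiters are considered, so a template that swaps delimiters would bypass these patterns.

```python
import re


class MustacheTemplateError(ValueError):
    """Stand-in for the SDK's MustacheTemplateError (hypothetical definition)."""


_PARTIAL_TAG = re.compile(r"\{\{\s*>")   # {{>name}} or {{> name}}
_EMPTY_TAG = re.compile(r"\{\{\s*\}\}")  # {{}} or {{   }}


def check_mustache_template(template: str) -> None:
    """Hypothetical pre-render check for constructs that must fail clearly.

    Only the default {{ }} delimiters are checked; JSON-Pointer tags are
    not covered.
    """
    if "\x00" in template:
        raise MustacheTemplateError("NUL bytes are not allowed in mustache templates")
    if _PARTIAL_TAG.search(template):
        raise MustacheTemplateError("Mustache partials ({{>...}}) are not supported")
    if _EMPTY_TAG.search(template):
        raise MustacheTemplateError("Empty placeholder in mustache template")


check_mustache_template("Hi {{name}} {{#items}}{{title}}{{/items}}")  # passes
try:
    check_mustache_template("{{> header}}")
except MustacheTemplateError as exc:
    print(exc)
```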
Summary
Implements WP-B3 of the prompt runtime unification RFC: adds `mustache` as the fourth prompt `template_format` and makes it the default for newly created apps, prompt configs, and LLM-as-a-judge evaluators. It builds on the low-level renderer from WP-B1 and the structured renderer from WP-B2. `mustache` is real Mustache (via the `mystace` engine) plus the one Agenta extension that the other `{{ }}` formats (`curly`, `jinja2`) already carry: tags that start with `{{$` are resolved as JSONPath against the render context. Existing `curly`, `fstring`, and `jinja2` prompts are untouched: old apps keep their declared format, and only new creation paths write `mustache`.

This is primarily a backend/SDK package. The frontend changes are the minimal type-and-picker surface needed to load, preserve, and select `mustache`; the larger playground/native-JSON work stays in the frontend follow-up packages (WP-F2/F3).

What's in it
SDK rendering (`sdks/python/agenta/sdk/utils/`)

- `templating.py`: new `_render_mustache(...)`; `TemplateMode` widened to include `"mustache"`. Rendering follows the same shield-and-substitute model the other formats use: `{{$...}}` JSONPath tags are shielded from the engine, the rest is rendered by `mystace`, and the resolved JSONPath values are substituted into the output last, as inert text that is never re-parsed. Partials (`{{>...}}`), empty placeholders, JSON-Pointer tags, NUL bytes, and engine parse errors fail clearly.
- `types.py`: `PromptTemplate` accepts `mustache` and keeps its public `TemplateFormatError` surface for chat/completion callers.
- `rendering.py`: type-widening only; `render_messages(...)` / `render_json_like(...)` work unchanged once the mode is accepted.

Effect on the other renderers (`curly` / `jinja2` / `fstring`)

Adding `mustache` was done by extracting one shared `{{$...}}` JSONPath helper (`_render_with_jsonpath`) rather than a mustache-only path, so the other formats are touched to varying degrees:

- `curly`: functionally equivalent. Its output is unchanged: it already resolved `{{$...}}` as inert data, and `resolvers.py` has zero diff. It is now the reference behavior the other two `{{ }}` formats match, rather than a special case.
- `jinja2`: refactored onto the shared helper, behavior preserved. `_render_jinja2` no longer renders directly; it routes through `_render_with_jsonpath`, so `{{$...}}` is shielded from Jinja, the engine runs, and resolved values are substituted last as inert data (`{% raw %}` / `{# #}` spans are skipped and left to Jinja). Same rendered output, now sharing curly's JSONPath contract.
- `fstring`: untouched. Still `template.format(**context)`; no JSONPath, no change.
- The `TemplateFormatError` message for unresolved variables now interpolates the actual `template_format` instead of the hardcoded literal `"curly"` (`types.py`, both the chat/completion and structured paths). For `curly` the wording is identical to before ("…in curly template…"). What changed is that, after the JSONPath unification, an unresolved `{{$...}}` tag can now raise `UnresolvedVariablesError` from mustache and jinja2 too, so the interpolation is what keeps their error message correctly labeled (previously they would have been mislabeled "curly"). `fstring` never raises this error (it uses `str.format`, surfacing `KeyError`), so for fstring the branch is dormant: the change applies to it in principle but is not currently triggerable.

Engine config (`sdks/python/agenta/sdk/engines/running/`)

- `interfaces.py`: the mustache default lands here for all three workflow types:
  - `llm_v0_interface`: the `template_format` schema scalar widens its enum to `["mustache", "curly", "fstring", "jinja2"]` and flips `default` from `curly` to `mustache` (this is what new LLM/completion apps inherit, and the dropdown default).
  - `chat_v0_interface` and `completion_v0_interface`: built-in default config flips `"template_format"` from `curly` to `mustache`.
- `handlers.py`: `auto_ai_critique_v0` learns a v5 default of `mustache` (v2 → `fstring`, v3/v4 → `curly` unchanged). An explicit `template_format` always wins over the version default; old judge revisions keep their original behavior.
- `builtin.py`: the built-in `auto_ai_critique` template bumps to version `5` / `template_format="mustache"`.

Backend resource (`api/oss/src/resources/evaluators/evaluators.py`)

- … `5` and carry an explicit hidden `template_format: "mustache"` field, so newly created judges render with mustache.

Error contract
- `MustacheTemplateError`: unsupported partial, empty placeholder, JSON-Pointer tag, NUL byte, or `mystace` parse error.
- `UnresolvedVariablesError`: an unresolved `curly` placeholder or a failed `{{$...}}` JSONPath tag, in any of `mustache` / `jinja2` / `curly` (cross-format parity).
- `TemplateFormatError`: the public `PromptTemplate` surface, preserved.

Frontend (type + picker surface only)
- `template_format` unions widened to include `"mustache"` across the editor token plugin, chat-message components, prompt schema control, and the shared chat-prompt extractor. `mustache` shares curly's `{{name}}` extraction/highlighting path.
- `templateFormatOptions.ts`: the picker now offers only `mustache` and `jinja2` to new prompts. `curly`/`fstring` are legacy: hidden from the picker, but a prompt that already stores one keeps it visible and selectable (no silent coercion). Restores hiding that had regressed; pinned by a unit test.
- `resolveTemplateFormat(...)` is reused in the workflow molecule so `mustache` is preserved instead of coerced.

Docs
- `rfc.md`: dependency choice (`mystace` vs `chevron`, with `langchain_core` considered and rejected), the three intentional Mustache deviations, the JSONPath compatibility requirement, and the security note (narrow context, never-re-parse).
- `_mustache-templates.mdx`: draft how-to (variables, sections, `{{$...}}`, value coercion, what's unsupported, and escaping literal `{{ }}`).
- `escape-analysis.md`: standalone analysis of the escape question raised in review. No backslash escape exists in `mystace` or `langchain_core`/`chevron`; the canonical literal-brace mechanism is the Mustache delimiter swap (and `{% raw %}` for jinja2). Decision: document now, defer a backslash escape unless real demand appears for literal `{{` in curly.
- `findings.md`, `research.md`, `plan.md`, `qa.md`, `status.md`, `README.md`: design workspace and review-findings record.

Compatibility
- `curly`/`fstring`/`jinja2` behavior is unchanged.
- … `mustache`. Old judge revisions keep their per-version default.

Validation
- `cd sdks/python && uv run ruff format` + `uv run ruff check`: clean.
- `cd sdks/python && uv run pytest oss/tests/pytest/unit -q`: green (mustache coverage across JSONPath resolution, sections, value coercion, partial/empty/JSON-Pointer/NUL rejection, cross-format `{{$...}}` parity, `PromptTemplate`, and LLM-as-a-judge).
- `pnpm --filter @agenta/entity-ui test`: picker and mustache-extraction regression pins pass.
- `pnpm lint-fix` + `types:check` on the touched web packages: clean.
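The judge's per-version default described under Engine config (an explicit value wins; otherwise v2 → `fstring`, v3/v4 → `curly`, v5 → `mustache`) reduces to a small lookup. This is a hypothetical sketch: `judge_template_format` is an invented name, versions are assumed to be integers, and versions outside 2-5 are treated as undocumented rather than guessed.

```python
from typing import Optional

# Hypothetical sketch; only versions 2-5 are described in the PR text.
_VERSION_DEFAULTS = {2: "fstring", 3: "curly", 4: "curly", 5: "mustache"}


def judge_template_format(version: int, explicit: Optional[str] = None) -> str:
    """An explicit template_format always wins; otherwise use the version default."""
    if explicit is not None:
        return explicit
    try:
        return _VERSION_DEFAULTS[version]
    except KeyError:
        raise ValueError(f"no documented default for evaluator version {version}") from None


print(judge_template_format(5))                    # mustache
print(judge_template_format(3, explicit="jinja2")) # jinja2
```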