docs: adopt upstream improvements.md log + voice-providers cheat-sheet

dhruva-reddy · dhruva-reddy · commit 6cd7e2e30ccf · 2026-05-01T12:59:35.000-07:00
## ELI5

**Problem.** Two of our customer-fork repos (`gitops-mudflap`, `gitops-amazon3p`) kept their own running notes about engine quirks ("man, this is annoying when X happens"). Those notes never made it back upstream, so every new customer hit the same friction. There was also no convention for *anyone*, human or AI, to leave behind a "this should be better" trail.

**What this fix does.** Adopts the customer-log format upstream (severity-ranked, evidence-tagged) and seeds it with 20 entries catalogued from both customer logs. Adds a voice-provider cheat-sheet under `docs/learnings/` so the most common 400-rejection class (`voice.speed` on Cartesia) becomes a one-page lookup. Updates `.gitignore` to stop sweeping AI agent handoff scratch (`.agent/`, `.claude/handoffs/`) into commits via `git add -A`. Adds a CLAUDE.md section telling future contributors how to log new entries.

**Outcome you'll notice.** Every fresh customer clone of this template inherits the running log on day one. When you hit something annoying, you append an entry in the same change instead of carrying it as folklore. As later stacks land, rows in the triage table flip from `Open` to `RESOLVED` so the file becomes a living changelog.

---

Land all the zero-engine-change cleanups in one small PR so the rest of the stack starts from a clean docs surface.

- improvements.md (NEW, repo root): adopt the severity-ranked,
  evidence-tagged catalog format from the Amazon3p customer log. Seeds 20
  entries catalogued from gitops-mudflap and gitops-amazon3p. Triage table
  rows flip from Open → RESOLVED as later stacks land.
- docs/learnings/voice-providers.md (NEW): per-provider voice block
  cheat-sheet (Cartesia vs 11labs vs OpenAI/Azure/Rime/LMNT/Minimax/
  Neuphonic/SmallestAI). Closes the manual-lookup half of #9.
- docs/learnings/README.md: route the new entry from the index.
- AGENTS.md: document multi-file push (closes #14) + voice-providers
  routing row.
- CLAUDE.md: add Improvements log section instructing future contributors
  (humans + AI agents) to append entries when they hit friction.
- .gitignore: cover .agent/, .agent/handoffs/, .claude/handoffs/ so
  git add -A doesn't sweep PII handoff scratch (closes #13). Drop the
  legacy "requested improvements.md" line since the local-only convention
  is superseded by upstream's improvements.md.

Closes improvements.md #13, #14. Partial #9 (doc cheat-sheet half).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
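
The cheat-sheet added in this change is a manual lookup. As an illustration only, the same field layout could be checked mechanically before a push. The sketch below is not part of this PR or of the engine: `lint_voice_block`, `FORBIDDEN_TOP_LEVEL`, and `RANGES` are hypothetical names, and the rules are transcribed from `docs/learnings/voice-providers.md` as written in this change, not from the Vapi schema.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical rule table, transcribed from docs/learnings/voice-providers.md.
# Maps provider -> top-level `voice.*` keys the cheat-sheet says will 400.
FORBIDDEN_TOP_LEVEL: Dict[str, Dict[str, str]] = {
    "cartesia": {
        "speed": "use voice.generationConfig.speed",
        "enableSsmlParsing": "Cartesia parses SSML natively; no flag exists",
        "stability": "11labs field",
        "similarityBoost": "11labs field",
    },
    "11labs": {
        "generationConfig": "Cartesia path; use voice.speed",
    },
}

# Numeric ranges listed in the cheat-sheet, keyed by (provider, dotted path).
RANGES: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("11labs", "speed"): (0.7, 1.2),
    ("11labs", "stability"): (0.0, 1.0),
    ("11labs", "similarityBoost"): (0.0, 1.0),
    ("cartesia", "generationConfig.speed"): (0.6, 1.5),
    ("cartesia", "generationConfig.volume"): (0.5, 2.0),
    ("cartesia", "experimentalControls.speed"): (-1.0, 1.0),
}


def _get(block: Dict[str, Any], dotted: str) -> Optional[Any]:
    """Follow a dotted path through nested dicts; return None if any key is absent."""
    node: Any = block
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def lint_voice_block(voice: Dict[str, Any]) -> List[str]:
    """Return human-readable problems for one parsed `voice` block (empty if none)."""
    provider = voice.get("provider")
    problems: List[str] = []

    # Misplaced fields: keys that belong to a different provider's layout.
    for key, hint in FORBIDDEN_TOP_LEVEL.get(provider, {}).items():
        if key in voice:
            problems.append(f"voice.{key} is not valid for {provider}: {hint}")

    # Out-of-range numbers, checked only where the cheat-sheet gives a range.
    for (rule_provider, path), (low, high) in RANGES.items():
        if rule_provider != provider:
            continue
        value = _get(voice, path)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and not low <= value <= high:
            problems.append(f"voice.{path}={value} is outside {low}-{high} for {provider}")

    return problems
```

Calling `lint_voice_block({"provider": "cartesia", "speed": 1.1})` returns a single problem pointing at `voice.generationConfig.speed`. Providers without an entry in either table (OpenAI, Azure, and so on) pass through unchecked, which mirrors the cheat-sheet's deferral to the provider docs for those fields.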
diff --git a/.gitignore b/.gitignore
@@ -20,8 +20,12 @@ Thumbs.db
 
 tmp/
 
+# Local snapshots written by `npm run push` for `npm run rollback` recovery.
+# Operator-local; not shared.
+.vapi-state.*.snapshots/
+
 # Local agent state
 .claude/
-
-# Local-only audit notes (not part of the upstream repo)
-requested improvements.md
+.agent/
+.agent/handoffs/
+.claude/handoffs/
diff --git a/AGENTS.md b/AGENTS.md
@@ -31,6 +31,7 @@ This project manages **Vapi voice agent configurations** as code. All resources
 | Building outbound calling agents | `docs/learnings/outbound-agents.md` |
 | Voicemail detection / VM vs human classification | `docs/learnings/voicemail-detection.md` |
 | Enforcing call time limits / graceful call ending | `docs/learnings/call-duration.md` |
+| Voice provider field cheat-sheet (Cartesia vs 11labs vs OpenAI etc.) | `docs/learnings/voice-providers.md` |
 
 ---
 
@@ -50,6 +51,7 @@ This project manages **Vapi voice agent configurations** as code. All resources
 | Pull latest from Vapi               | `npm run pull -- <org>`, `--force`, or `--bootstrap`                              |
 | Pull one known remote resource      | `npm run pull -- <org> --type assistants --id <uuid>`                             |
 | Push only one file                  | `npm run push -- <org> resources/<org>/assistants/my-agent.md`                    |
+| Push multiple specific files        | `npm run push -- <org> <path1> <path2>` (one state-file rewrite at the end)       |
 | Test a call                         | `npm run call -- <org> -a <assistant-name>`                                       |
 
 ---
@@ -744,6 +746,7 @@ npm run pull -- <org> --type squads --id <uuid>    # Pull one known remote resou
 npm run push -- <org>                              # Push all local changes to Vapi
 npm run push -- <org> assistants                   # Push only assistants
 npm run push -- <org> resources/<org>/assistants/my-agent.md  # Push single file
+npm run push -- <org> <path1> <path2>              # Push multiple specific files (one state write)
 npm run apply -- <org>                             # Pull then push (full sync)
 
 # Testing
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -26,6 +26,26 @@ When both files exist, follow both. If guidance overlaps, treat `AGENTS.md` as t
    - WebSocket transport → `docs/learnings/websocket.md`
    - Call time limits / graceful ending → `docs/learnings/call-duration.md`
 
+## Improvements log
+
+This repo maintains an upstream-only running log at `improvements.md` (repo
+root). It tracks engine friction, footguns, and improvement ideas surfaced
+during real customer work — both before and after fixes land.
+
+**When you (Claude or human) hit something that makes you go "this should be
+better," append or update an entry in `improvements.md` in the same change.**
+The format is **Problem → Current behavior → Risk → Current mitigation →
+Possible fix → Status**, ordered by severity / blast radius. Cite source
+file paths with line numbers so future readers can verify your claims.
+
+When a fix lands, mark the entry `[RESOLVED YYYY-MM-DD] (#<PR-number>)` at
+the top — don't delete it. The history is the point.
+
+Customer-fork logs (`gitops-mudflap/improvements.md`,
+`gitops-amazon3p/improvements.md`) feed upstream: when an entry there is
+generic enough to apply across customers, surface it here in the same
+revision.
+
 ## Test-Call CLI Notes
 
 When debugging a customer issue with `npm run call -- <org> -s <squad>`:
diff --git a/docs/learnings/README.md b/docs/learnings/README.md
@@ -26,7 +26,7 @@ Each file targets a specific topic so you can load only the context you need.
 | Bulk-dialing from a CSV (Outbound Call Campaigns) | [outbound-campaigns.md](outbound-campaigns.md) |
 | Voicemail detection / VM vs human classification | [voicemail-detection.md](voicemail-detection.md) |
 | Enforcing call time limits / graceful call ending | [call-duration.md](call-duration.md) |
-| Authoring YAML resource files (scalar coercion, frontmatter conventions) | [yaml-conventions.md](yaml-conventions.md) |
+| Voice provider field cheat-sheet (Cartesia vs 11labs vs others) | [voice-providers.md](voice-providers.md) |
 
 ---
 
@@ -44,7 +44,7 @@ Gotchas and silent defaults for each resource type:
 | [structured-outputs.md](structured-outputs.md) | Schema type gotchas, assistant_ids, default models, target modes, KPI patterns |
 | [simulations.md](simulations.md) | Personalities, evaluation comparators, chat-mode gotcha, missing references, full `/eval/simulation/*` API reference |
 | [webhooks.md](webhooks.md) | Default server messages, timeouts, unreachable servers, credential resolution, payload shape |
-| [yaml-conventions.md](yaml-conventions.md) | YAML 1.1 boolean coercion (`off`/`yes`/`no`), whitespace-truthy gotchas, discriminated-union sentinels, deprecated-field footguns, multi-line block scalars, anchors/aliases, frontmatter fence rules |
+| [voice-providers.md](voice-providers.md) | Per-provider voice block layout (Cartesia vs 11labs vs OpenAI/Azure/Rime/LMNT/Minimax/Neuphonic/SmallestAI) — saves 400s at push time |
 
 ### Troubleshooting Runbooks
 
diff --git a/docs/learnings/voice-providers.md b/docs/learnings/voice-providers.md
@@ -0,0 +1,91 @@
+# Voice Providers — Field Cheat-Sheet
+
+The `voice` block on an assistant, or `membersOverrides.voice` on a squad, is **provider-specific**. The same conceptual field (e.g. "speed") lives at a different path depending on the provider. The Vapi platform rejects misplaced fields with a generic `property X should not exist` 400 and does not point to the correct path. This page is the lookup table.
+
+> **When a 400 says "property X should not exist":** check this page for the provider's field layout before re-pushing. The engine has no schema awareness, so it accepts whatever you write and the error surfaces only after the push reaches the API.
+
+---
+
+## Quick lookup
+
+| Field | 11labs | Cartesia (sonic-3) | OpenAI / Azure / Rime / LMNT / Minimax / Neuphonic / SmallestAI |
+|-------|--------|---------------------|------------------------------------------------------------------|
+| Speech rate | `voice.speed` (0.7–1.2) | `voice.generationConfig.speed` (0.6–1.5) | `voice.speed` |
+| Stability / consistency | `voice.stability` (0.0–1.0) | — (not exposed) | — |
+| Voice similarity | `voice.similarityBoost` (0.0–1.0) | — | — |
+| SSML parsing | `voice.enableSsmlParsing: true` | (parsed natively, no flag) | varies — see provider docs |
+| Pronunciation dictionary | — | `voice.pronunciationDictId` | — |
+| Volume control | — | `voice.generationConfig.volume` (0.5–2.0) | — |
+| Emotion / accent (experimental) | — | `voice.experimentalControls.emotion`, `voice.experimentalControls.speed` (-1 to 1, older API) | — |
+
+---
+
+## 11labs
+
+```yaml
+voice:
+  provider: 11labs
+  voiceId: <uuid-or-name>
+  model: eleven_turbo_v2          # or eleven_flash_v2_5
+  speed: 1.05                      # 0.7–1.2
+  stability: 0.6                   # 0.0–1.0; higher = less expressive variation
+  similarityBoost: 0.75            # 0.0–1.0; higher = closer to source voice
+  enableSsmlParsing: true          # required for `<break>`, `<flush/>`, etc.
+```
+
+Common pitfalls:
+- `voice.generationConfig.*` — **does not exist** for 11labs. That's a Cartesia path. Push will 400.
+- Forgetting `enableSsmlParsing: true` — SSML tags will be spoken literally.
+
+---
+
+## Cartesia (sonic-3)
+
+```yaml
+voice:
+  provider: cartesia
+  model: sonic-3
+  voiceId: <uuid>
+  pronunciationDictId: pdict_<id>  # optional; dropped on dashboard voice swaps (see warning below)
+  generationConfig:
+    speed: 1.1                     # 0.6–1.5
+    volume: 1.0                    # 0.5–2.0
+  experimentalControls:
+    speed: 0.0                     # -1 to 1 (older API path)
+    emotion: ["positivity:high"]
+```
+
+**Forbidden at top level for Cartesia (will 400):**
+- `voice.speed` — use `voice.generationConfig.speed` instead.
+- `voice.enableSsmlParsing` — Cartesia parses SSML (`<break time='0.4s'/>`, `<speed ratio='0.9'/>`) natively from the text stream; no opt-in flag exists.
+- `voice.stability`, `voice.similarityBoost` — those are 11labs fields.
+
+**Pronunciation dictionary warning:** changing the `voiceId` in the Vapi dashboard's voice picker silently drops `pronunciationDictId` from the resource. If you swap the Cartesia voice via the dashboard, re-attach the dictionary on the next pull or it will be gone. Treat `(voiceId, pronunciationDictId)` as one atomic unit during edits.
+
+---
+
+## OpenAI / Azure / Rime / LMNT / Minimax / Neuphonic / SmallestAI
+
+```yaml
+voice:
+  provider: openai           # or azure, rime, lmnt, minimax, neuphonic, smallestai
+  voiceId: <provider-voice-id>
+  model: <provider-model>    # e.g. tts-1-hd for openai
+  speed: 1.0                 # top-level for these providers
+```
+
+These providers expose `speed` at the top of the `voice` block. Refer to the [Vapi voice provider docs](https://docs.vapi.ai/providers/voice) for additional provider-specific fields (instructions, language hints, etc.).
+
+---
+
+## Switching providers
+
+When migrating an assistant or squad member from Cartesia to 11labs (or vice versa), the field layout flips. If you carry over `generationConfig` from a Cartesia config to an 11labs voice, the next push will 400. Always rewrite the voice block from the target provider's template; do not patch in place.
+
+If a customer changes the provider on the dashboard and your local YAML still has the old nesting, `pull` will overwrite it cleanly — but a subsequent `push` from a stale branch will 400. Pull first, then edit.
+
+---
+
+## Adding a new provider
+
+If you find yourself reaching for a provider not in the table above, append a row here in the same PR. The cheat-sheet only stays useful if it grows with the platform.
diff --git a/improvements.md b/improvements.md