Commit 6cd7e2e

docs: adopt upstream improvements.md log + voice-providers cheat-sheet
## ELI5

**Problem.** Two of our customer-fork repos (`gitops-mudflap`, `gitops-amazon3p`) kept their own running notes about engine quirks ("man, this is annoying when X happens"). Those notes never made it back upstream, so every new customer hit the same friction. There was also no convention for *anyone*, human or AI, to leave behind a "this should be better" trail.

**What this fix does.** Adopts the customer-log format upstream (severity-ranked, evidence-tagged) and seeds it with 20 entries catalogued from both customer logs. Adds a voice-provider cheat-sheet under `docs/learnings/` so the most common 400-rejection class (`voice.speed` on Cartesia) becomes a one-page lookup. Updates `.gitignore` to stop sweeping AI agent handoff scratch (`.agent/`, `.claude/handoffs/`) into commits via `git add -A`. Adds a CLAUDE.md section telling future contributors how to log new entries.

**Outcome you'll notice.** Every fresh customer clone of this template inherits the running log on day one. When you hit something annoying, you append an entry in the same change instead of carrying it as folklore. As later stacks land, rows in the triage table flip from `Open` to `RESOLVED` so the file becomes a living changelog.

---

Land all the zero-engine-change cleanups in one small PR so the rest of the stack starts from a clean docs surface.

- improvements.md (NEW, repo root): adopt the severity-ranked, evidence-tagged catalog format from the Amazon3p customer log. Seeds 20 entries catalogued from gitops-mudflap and gitops-amazon3p. Triage table rows flip from Open → RESOLVED as later stacks land.
- docs/learnings/voice-providers.md (NEW): per-provider voice block cheat-sheet (Cartesia vs 11labs vs OpenAI/Azure/Rime/LMNT/Minimax/Neuphonic/SmallestAI). Closes the manual-lookup half of #9.
- docs/learnings/README.md: route the new entry from the index.
- AGENTS.md: document multi-file push (closes #14) + voice-providers routing row.
- CLAUDE.md: add Improvements log section instructing future contributors (humans + AI agents) to append entries when they hit friction.
- .gitignore: cover .agent/, .agent/handoffs/, .claude/handoffs/ so git add -A doesn't sweep PII handoff scratch (closes #13). Drop the legacy "requested improvements.md" line since the local-only convention is superseded by upstream's improvements.md.

Closes improvements.md #13, #14. Partial #9 (doc cheat-sheet half).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
1 parent 94e4df4 commit 6cd7e2e

6 files changed

Lines changed: 1079 additions & 5 deletions

.gitignore

Lines changed: 7 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -20,8 +20,12 @@ Thumbs.db
2020

2121
tmp/
2222

23+
# Local snapshots written by `npm run push` for `npm run rollback` recovery.
24+
# Operator-local; not shared.
25+
.vapi-state.*.snapshots/
26+
2327
# Local agent state
2428
.claude/
25-
26-
# Local-only audit notes (not part of the upstream repo)
27-
requested improvements.md
29+
.agent/
30+
.agent/handoffs/
31+
.claude/handoffs/

AGENTS.md

Lines changed: 3 additions & 0 deletions
@@ -31,6 +31,7 @@ This project manages **Vapi voice agent configurations** as code. All resources
3131
| Building outbound calling agents | `docs/learnings/outbound-agents.md` |
3232
| Voicemail detection / VM vs human classification | `docs/learnings/voicemail-detection.md` |
3333
| Enforcing call time limits / graceful call ending | `docs/learnings/call-duration.md` |
34+
| Voice provider field cheat-sheet (Cartesia vs 11labs vs OpenAI etc.) | `docs/learnings/voice-providers.md` |
3435

3536
---
3637

@@ -50,6 +51,7 @@ This project manages **Vapi voice agent configurations** as code. All resources
5051
| Pull latest from Vapi | `npm run pull -- <org>`, `--force`, or `--bootstrap` |
5152
| Pull one known remote resource | `npm run pull -- <org> --type assistants --id <uuid>` |
5253
| Push only one file | `npm run push -- <org> resources/<org>/assistants/my-agent.md` |
54+
| Push multiple specific files | `npm run push -- <org> <path1> <path2>` (one state-file rewrite at the end) |
5355
| Test a call | `npm run call -- <org> -a <assistant-name>` |
5456

5557
---
@@ -744,6 +746,7 @@ npm run pull -- <org> --type squads --id <uuid> # Pull one known remote resou
744746
npm run push -- <org> # Push all local changes to Vapi
745747
npm run push -- <org> assistants # Push only assistants
746748
npm run push -- <org> resources/<org>/assistants/my-agent.md # Push single file
749+
npm run push -- <org> <path1> <path2> # Push multiple specific files (one state write)
747750
npm run apply -- <org> # Pull then push (full sync)
748751
749752
# Testing

CLAUDE.md

Lines changed: 20 additions & 0 deletions
@@ -26,6 +26,26 @@ When both files exist, follow both. If guidance overlaps, treat `AGENTS.md` as t
2626
- WebSocket transport → `docs/learnings/websocket.md`
2727
- Call time limits / graceful ending → `docs/learnings/call-duration.md`
2828

29+
## Improvements log
30+
31+
This repo maintains an upstream-only running log at `improvements.md` (repo
32+
root). It tracks engine friction, footguns, and improvement ideas surfaced
33+
during real customer work — both before and after fixes land.
34+
35+
**When you (Claude or human) hit something that makes you go "this should be
36+
better," append or update an entry in `improvements.md` in the same change.**
37+
The format is **Problem → Current behavior → Risk → Current mitigation →
38+
Possible fix → Status**, ordered by severity / blast radius. Cite source
39+
file paths with line numbers so future readers can verify your claims.
40+
41+
When a fix lands, mark the entry `[RESOLVED YYYY-MM-DD] (#<PR-number>)` at
42+
the top — don't delete it. The history is the point.
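
As an illustration of the resolved marker described above, a minimal Python sketch could build the string as follows. The helper name `resolved_marker` and the example date and PR number are hypothetical; nothing in this commit indicates the repo ships such a function.

```python
from datetime import date


def resolved_marker(resolved_on: date, pr_number: int) -> str:
    """Format the `[RESOLVED YYYY-MM-DD] (#<PR-number>)` marker (hypothetical helper)."""
    # date.isoformat() always yields YYYY-MM-DD, matching the documented format.
    return f"[RESOLVED {resolved_on.isoformat()}] (#{pr_number})"


if __name__ == "__main__":
    # Example values only; substitute the real merge date and PR number.
    print(resolved_marker(date(2025, 1, 2), 42))
```

Generating the marker from a `date` object rather than typing it by hand keeps the date zero-padded, so entries stay sortable and greppable.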
43+
44+
Customer-fork logs (`gitops-mudflap/improvements.md`,
45+
`gitops-amazon3p/improvements.md`) feed upstream: when an entry there is
46+
generic enough to apply across customers, surface it here in the same
47+
revision.
48+
2949
## Test-Call CLI Notes
3050

3151
When debugging a customer issue with `npm run call -- <org> -s <squad>`:

docs/learnings/README.md

Lines changed: 2 additions & 2 deletions
@@ -26,7 +26,7 @@ Each file targets a specific topic so you can load only the context you need.
2626
| Bulk-dialing from a CSV (Outbound Call Campaigns) | [outbound-campaigns.md](outbound-campaigns.md) |
2727
| Voicemail detection / VM vs human classification | [voicemail-detection.md](voicemail-detection.md) |
2828
| Enforcing call time limits / graceful call ending | [call-duration.md](call-duration.md) |
29-
| Authoring YAML resource files (scalar coercion, frontmatter conventions) | [yaml-conventions.md](yaml-conventions.md) |
29+
| Voice provider field cheat-sheet (Cartesia vs 11labs vs others) | [voice-providers.md](voice-providers.md) |
3030

3131
---
3232

@@ -44,7 +44,7 @@ Gotchas and silent defaults for each resource type:
4444
| [structured-outputs.md](structured-outputs.md) | Schema type gotchas, assistant_ids, default models, target modes, KPI patterns |
4545
| [simulations.md](simulations.md) | Personalities, evaluation comparators, chat-mode gotcha, missing references, full `/eval/simulation/*` API reference |
4646
| [webhooks.md](webhooks.md) | Default server messages, timeouts, unreachable servers, credential resolution, payload shape |
47-
| [yaml-conventions.md](yaml-conventions.md) | YAML 1.1 boolean coercion (`off`/`yes`/`no`), whitespace-truthy gotchas, discriminated-union sentinels, deprecated-field footguns, multi-line block scalars, anchors/aliases, frontmatter fence rules |
47+
| [voice-providers.md](voice-providers.md) | Per-provider voice block layout (Cartesia vs 11labs vs OpenAI/Azure/Rime/LMNT/Minimax/Neuphonic/SmallestAI) — saves 400s at push time |
4848

4949
### Troubleshooting Runbooks
5050

docs/learnings/voice-providers.md

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
1+
# Voice Providers — Field Cheat-Sheet
2+
3+
The `voice` block on an assistant, or `membersOverrides.voice` on a squad, is **provider-specific**. The same conceptual field (e.g. "speed") lives at a different path depending on the provider. The Vapi platform rejects misplaced fields with a generic `property X should not exist` 400 and does not point to the correct path. This page is the lookup table.
4+
5+
> **When a 400 says "property X should not exist":** check this page for the provider's field layout before re-pushing. The engine has no schema awareness and will accept whatever you write, then surface the error only after the push reaches the API.
6+
7+
---
8+
9+
## Quick lookup
10+
11+
| Field | 11labs | Cartesia (sonic-3) | OpenAI / Azure / Rime / LMNT / Minimax / Neuphonic / SmallestAI |
12+
|-------|--------|---------------------|------------------------------------------------------------------|
13+
| Speech rate | `voice.speed` (0.7–1.2) | `voice.generationConfig.speed` (0.6–1.5) | `voice.speed` |
14+
| Stability / consistency | `voice.stability` (0.0–1.0) | — (not exposed) ||
15+
| Voice similarity | `voice.similarityBoost` (0.0–1.0) |||
16+
| SSML parsing | `voice.enableSsmlParsing: true` | (parsed natively, no flag) | varies — see provider docs |
17+
| Pronunciation dictionary || `voice.pronunciationDictId` ||
18+
| Volume control || `voice.generationConfig.volume` (0.5–2.0) ||
19+
| Emotion / accent (experimental) || `voice.experimentalControls.emotion`, `voice.experimentalControls.speed` (-1 to 1, older API) ||
20+
21+
---
22+
23+
## 11labs
24+
25+
```yaml
26+
voice:
27+
provider: 11labs
28+
voiceId: <uuid-or-name>
29+
model: eleven_turbo_v2 # or eleven_flash_v2_5
30+
speed: 1.05 # 0.7–1.2
31+
stability: 0.6 # 0.0–1.0; higher = less expressive variation
32+
similarityBoost: 0.75 # 0.0–1.0; higher = closer to source voice
33+
enableSsmlParsing: true # required for `<break>`, `<flush/>`, etc.
34+
```
35+
36+
Common pitfalls:
37+
- `voice.generationConfig.*` — **does not exist** for 11labs. That's a Cartesia path. Push will 400.
38+
- Forgetting `enableSsmlParsing: true` — SSML tags will be spoken literally.
39+
40+
---
41+
42+
## Cartesia (sonic-3)
43+
44+
```yaml
45+
voice:
46+
provider: cartesia
47+
model: sonic-3
48+
voiceId: <uuid>
49+
pronunciationDictId: pdict_<id> # optional but sticky — see warning below
50+
generationConfig:
51+
speed: 1.1 # 0.6–1.5
52+
volume: 1.0 # 0.5–2.0
53+
experimentalControls:
54+
speed: 0.0 # -1 to 1 (older API path)
55+
emotion: ["positivity:high"]
56+
```
57+
58+
**Forbidden at top level for Cartesia (will 400):**
59+
- `voice.speed` — use `voice.generationConfig.speed` instead.
60+
- `voice.enableSsmlParsing` — Cartesia parses SSML (`<break time='0.4s'/>`, `<speed ratio='0.9'/>`) natively from the text stream; no opt-in flag exists.
61+
- `voice.stability`, `voice.similarityBoost` — those are 11labs fields.
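
The pitfalls listed for 11labs and Cartesia could be caught locally before a push. Below is a minimal Python sketch, assuming the voice block has already been parsed into a dict. The function name `misplaced_voice_fields` and the rule table are hypothetical and encode only what this page states, not the Vapi API schema.

```python
# Hypothetical pre-push lint. The rules are copied from this page only;
# they are not derived from the Vapi API schema and may be incomplete.
FORBIDDEN_TOP_LEVEL = {
    "cartesia": {
        "speed": "use generationConfig.speed",
        "enableSsmlParsing": "Cartesia parses SSML natively",
        "stability": "11labs-only field",
        "similarityBoost": "11labs-only field",
    },
    "11labs": {
        "generationConfig": "Cartesia-only path",
    },
}


def misplaced_voice_fields(voice: dict) -> list[str]:
    """Return one message per top-level key this page lists as invalid for the provider."""
    rules = FORBIDDEN_TOP_LEVEL.get(voice.get("provider"), {})
    return [f"voice.{key}: {hint}" for key, hint in rules.items() if key in voice]


if __name__ == "__main__":
    bad = {"provider": "cartesia", "voiceId": "example-id", "speed": 1.1}
    for message in misplaced_voice_fields(bad):
        print(message)
```

Providers without an entry in the rule table pass through with no findings, so the check can only under-report; the API remains the source of truth.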
62+
63+
**Pronunciation dictionary warning:** changing the `voiceId` in the Vapi dashboard's voice picker silently drops `pronunciationDictId` from the resource. If you swap the Cartesia voice via the dashboard, re-attach the dictionary on the next pull or it will be gone. Treat `(voiceId, pronunciationDictId)` as one atomic unit during edits.
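
One way to notice the silent drop is to compare the voice block before and after a pull. This is a sketch under the assumption that both versions are available as parsed dicts; the function name `dropped_dictionary` and the sample IDs are hypothetical.

```python
def dropped_dictionary(before: dict, after: dict) -> bool:
    """True when voiceId changed and a previously set pronunciationDictId is now missing."""
    voice_changed = before.get("voiceId") != after.get("voiceId")
    had_dict = bool(before.get("pronunciationDictId"))
    lost_dict = not after.get("pronunciationDictId")
    return voice_changed and had_dict and lost_dict


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    local = {"provider": "cartesia", "voiceId": "voice-a", "pronunciationDictId": "pdict_example"}
    pulled = {"provider": "cartesia", "voiceId": "voice-b"}
    if dropped_dictionary(local, pulled):
        print("pronunciationDictId was dropped; re-attach it before the next push")
```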
64+
65+
---
66+
67+
## OpenAI / Azure / Rime / LMNT / Minimax / Neuphonic / SmallestAI
68+
69+
```yaml
70+
voice:
71+
provider: openai # or azure, rime, lmnt, minimax, neuphonic, smallestai
72+
voiceId: <provider-voice-id>
73+
model: <provider-model> # e.g. tts-1-hd for openai
74+
speed: 1.0 # top-level for these providers
75+
```
76+
77+
These providers expose `speed` at the top of the `voice` block. Refer to the [Vapi voice provider docs](https://docs.vapi.ai/providers/voice) for additional provider-specific fields (instructions, language hints, etc.).
78+
79+
---
80+
81+
## Switching providers
82+
83+
When migrating an assistant or squad member from Cartesia to 11labs (or vice versa), the field layout flips. If you carry over `generationConfig` from a Cartesia config to an 11labs voice, the next push will 400. Always rewrite the voice block from the target provider's template; do not patch in place.
84+
85+
If a customer changes the provider on the dashboard and your local YAML still has the old nesting, `pull` will overwrite it cleanly — but a subsequent `push` from a stale branch will 400. Pull first, then edit.
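
The "rewrite from the target template, do not patch in place" advice can be sketched in Python. The version below is illustrative, not the engine's behavior: `rebuild_voice` and `read_speed` are hypothetical names, the ranges are copied from the Quick lookup table, and both clamping the rate and treating a speed value as comparable across providers are assumptions.

```python
# Speed ranges copied from the Quick lookup table on this page.
SPEED_RANGE = {"11labs": (0.7, 1.2), "cartesia": (0.6, 1.5)}


def read_speed(voice: dict):
    """Read the speech rate from wherever the source provider keeps it."""
    if voice.get("provider") == "cartesia":
        return voice.get("generationConfig", {}).get("speed")
    return voice.get("speed")


def rebuild_voice(old: dict, provider: str, voice_id: str, model: str) -> dict:
    """Build a fresh voice block for `provider`, carrying over only the speech rate."""
    new = {"provider": provider, "voiceId": voice_id, "model": model}
    speed = read_speed(old)
    if speed is None:
        return new
    # Assumption: clamp into the target range instead of failing the migration.
    low, high = SPEED_RANGE.get(provider, (speed, speed))
    speed = min(max(speed, low), high)
    if provider == "cartesia":
        new["generationConfig"] = {"speed": speed}
    else:
        new["speed"] = speed
    return new
```

Because the new block starts from an empty template, provider-specific leftovers such as `generationConfig.volume` or `stability` cannot leak across and trigger a 400.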
86+
87+
---
88+
89+
## Adding a new provider
90+
91+
If you find yourself reaching for a provider not in the table above, append a row here in the same PR. The cheat-sheet only stays useful if it grows with the platform.
