fishaudio · Kilerd · Jun 16, 2026 · Jun 16, 2026 · Jun 16, 2026
diff --git a/.mintlify/skills/fish-audio-api/SKILL.md b/.mintlify/skills/fish-audio-api/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: fish-audio-api
-description: Write direct HTTP / WebSocket calls to the Fish Audio platform (TTS, ASR, voice models, wallet, real-time TTS streaming) without depending on the Python or JavaScript SDK. Use when the user asks to call Fish Audio from curl, a language without an official SDK, an edge/runtime environment that cannot install the SDK, or when they explicitly want raw REST / WebSocket code. Covers authentication, endpoint URLs, required headers, request / response schemas, MessagePack vs JSON vs multipart encoding rules, multi-speaker dialogue, and the WebSocket streaming protocol.
+description: Write direct HTTP / WebSocket calls to the Fish Audio platform (TTS, ASR, voice design, voice models, wallet, real-time TTS streaming) without depending on the Python or JavaScript SDK. Use when the user asks to call Fish Audio from curl, a language without an official SDK, an edge/runtime environment that cannot install the SDK, or when they explicitly want raw REST / WebSocket code. Covers authentication, endpoint URLs, required headers, request / response schemas, MessagePack vs JSON vs multipart encoding rules, multi-speaker dialogue, voice-design candidate generation, and the WebSocket streaming protocol.
 ---
 
 # Fish Audio Raw API Skill
@@ -27,6 +27,7 @@ This file condenses those into rules an agent can apply directly.
 | --- | --- | --- |
 | POST | `/v1/tts` | Text-to-Speech (streams audio bytes) |
 | POST | `/v1/asr` | Speech-to-Text (returns JSON transcript) |
+| POST | `/v1/voice-design` | Voice Design (returns generated voice candidates) |
 | GET | `/model` | List voice models |
 | POST | `/model` | Create voice model (voice cloning) |
 | GET | `/model/{id}` | Get voice model metadata |
@@ -239,6 +240,77 @@ r.raise_for_status()
 print(r.json()["text"])
 ```
 
+## Voice Design — `POST /v1/voice-design`
+
+Required headers:
+
+- `Authorization: Bearer <FISH_API_KEY>`
+- `Content-Type: application/json`
+- `model: voice-design-1` (required; currently the only public Voice Design model)
+
+Response: JSON `{ candidates: VoiceDesignCandidate[] }`. Each candidate includes `audio_base64`; decode it to write the generated audio bytes to a file. The current candidate audio payload is WAV bytes encoded as base64.
+
+### Request body fields (VoiceDesignRequest)
+
+| Field                     | Type           | Default      | Notes                                                                       |
+| ------------------------- | -------------- | ------------ | --------------------------------------------------------------------------- |
+| `instruction`             | string         | — (required) | Voice design prompt. 1 to 2000 characters.                                  |
+| `reference_text`          | string \| null | null         | Optional preview text to read in the generated voice. Up to 300 characters. |
+| `language`                | string \| null | null         | Optional language hint such as `en`, `zh`, or `ja`.                         |
+| `n`                       | int            | 2            | Number of candidates. Range: 1 to 4.                                        |
+| `speed`                   | number         | 1.0          | Speaking speed multiplier. Must be greater than 0 and at most 3.             |
+| `num_step`                | int            | 32           | Diffusion steps. Range: 1 to 128.                                           |
+| `guidance_scale`          | number         | 2.0          | Prompt guidance. Must be at least 0.                                        |
+| `instruct_guidance_scale` | number         | 0.0          | Instruction guidance. Must be at least 0.                                   |
+| `seed`                    | int \| null    | null         | Optional deterministic seed for candidate generation.                       |
+
+Do **not** send MessagePack, multipart form data, inline reference audio, or service-internal fields such as `features`, `features_json_file`, or `include_audio_base64`.
+
+### curl
+
+```bash
+curl --request POST https://api.fish.audio/v1/voice-design \
+  --header "Authorization: Bearer $FISH_API_KEY" \
+  --header "Content-Type: application/json" \
+  --header "model: voice-design-1" \
+  --data '{
+    "instruction": "Warm, confident studio narrator with a natural tone",
+    "reference_text": "Welcome to Fish Audio.",
+    "language": "en",
+    "n": 2
+  }' | jq -r '.candidates[0].audio_base64' | base64 --decode > voice.wav
+```
+
+### Python
+
+```python
+import base64
+import os
+import httpx
+
+r = httpx.post(
+    "https://api.fish.audio/v1/voice-design",
+    headers={
+        "Authorization": f"Bearer {os.environ['FISH_API_KEY']}",
+        "Content-Type": "application/json",
+        "model": "voice-design-1",
+    },
+    json={
+        "instruction": "Warm, confident studio narrator with a natural tone",
+        "reference_text": "Welcome to Fish Audio.",
+        "language": "en",
+        "n": 2,
+    },
+    timeout=120,
+)
+r.raise_for_status()
+candidate = r.json()["candidates"][0]
+with open("voice.wav", "wb") as f:
+    f.write(base64.b64decode(candidate["audio_base64"]))
+```
+
+Billing: one successful generation request is charged once, even when it returns multiple candidates. Authentication, validation, balance, concurrency, and service errors are not billed.
+
 ## Voice models — `/model`
 
 ### List: `GET /model`

diff --git a/api-reference/endpoint/openapi-v1/voice-design.mdx b/api-reference/endpoint/openapi-v1/voice-design.mdx
@@ -0,0 +1,49 @@
+---
+openapi: post /v1/voice-design
+title: "Voice Design"
+description: "Generate candidate voices from a prompt"
+icon: "wand-magic-sparkles"
+iconType: "solid"
+---
+
+<Warning>
+This endpoint only accepts `application/json`.
+
+You must include the `model: voice-design-1` header. Extra request fields are rejected.
+
+</Warning>
+
+<Note>
+  A successful request returns generated voice candidates with `audio_base64`
+  audio payloads. Decode the base64 value to write the candidate audio to a
+  file.
+</Note>
+
+## Example
+
+```bash
+curl --request POST https://api.fish.audio/v1/voice-design \
+  --header "Authorization: Bearer $FISH_API_KEY" \
+  --header "Content-Type: application/json" \
+  --header "model: voice-design-1" \
+  --data '{
+    "instruction": "Warm, confident studio narrator with a natural tone",
+    "reference_text": "Welcome to Fish Audio.",
+    "language": "en",
+    "n": 2,
+    "speed": 1,
+    "num_step": 32,
+    "guidance_scale": 2,
+    "instruct_guidance_scale": 0,
+    "seed": 42
+  }'
+```
+
+## Usage notes
+
+- `instruction` is required and must be 1 to 2000 characters.
+- `reference_text` is optional preview text and can be up to 300 characters.
+- `n` controls how many candidates are returned. The supported range is 1 to 4.
+- `seed` is optional and can help reproduce candidate generation.
+- The endpoint is stateless: it does not create batches, samples, voice models, or presigned URLs.
+- Billing happens once per successful generation request, not once per candidate.
diff --git a/api-reference/introduction.mdx b/api-reference/introduction.mdx
@@ -30,6 +30,10 @@ Use our [/model endpoint](/api-reference/endpoint/model/create-model) to create
 
 Use our [/v1/tts endpoint](/api-reference/endpoint/openapi-v1/text-to-speech) to generate speech.
 
+## Design a Voice
+
+Use our [/v1/voice-design endpoint](/api-reference/endpoint/openapi-v1/voice-design) to generate candidate voices from a prompt.
+
 ## Real-time Streaming
 
 Use our [Python SDK](/features/realtime-streaming) or [JavaScript SDK](/features/realtime-streaming) for real-time audio streaming with WebSocket.