From bb0d4d8580012bf22d9a876fbf7a8fcf27d8aaa7 Mon Sep 17 00:00:00 2001 From: Kilerd Chan Date: Wed, 17 Jun 2026 11:45:52 +0800 Subject: [PATCH 1/2] docs: document traceparent propagation --- .mintlify/skills/fish-audio-api/SKILL.md | 8 + api-reference/asyncapi.yml | 4 + .../endpoint/openapi-v1/speech-to-text.mdx | 7 + .../text-to-speech-stream-with-timestamps.mdx | 6 + .../endpoint/openapi-v1/text-to-speech.mdx | 6 + api-reference/endpoint/websocket/tts-live.mdx | 8 + api-reference/introduction.mdx | 4 + api-reference/observability.mdx | 189 ++++++++++++++++++ docs.json | 1 + 9 files changed, 233 insertions(+) create mode 100644 api-reference/observability.mdx diff --git a/.mintlify/skills/fish-audio-api/SKILL.md b/.mintlify/skills/fish-audio-api/SKILL.md index d62c05f..a0d6af9 100644 --- a/.mintlify/skills/fish-audio-api/SKILL.md +++ b/.mintlify/skills/fish-audio-api/SKILL.md @@ -17,6 +17,7 @@ This file condenses those into rules an agent can apply directly. - Base URL: `https://api.fish.audio` - WebSocket base: `wss://api.fish.audio` - Auth (all endpoints): `Authorization: Bearer ` +- Optional distributed tracing for inference APIs: send W3C `traceparent` on `/v1/tts`, `/v1/tts/stream/with-timestamp`, `/v1/asr`, and `/v1/tts/live` to bind the caller's business trace to Fish Audio inference spans. - Get API keys: `https://fish.audio/app/api-keys` - Never hardcode keys — read from an env var like `FISH_API_KEY`. - Errors are JSON `{status, message}` for 401 / 402 / 404, and an array of `{loc, type, msg, ctx, in}` for 422 (validation). @@ -45,6 +46,10 @@ Required headers: - `Content-Type: application/json` **or** `application/msgpack` - `model: s2-pro` (required). Values: `s1`, `s2-pro`. Default to `s2-pro` unless the user explicitly asks otherwise. +Optional tracing header: + +- `traceparent: 00---` for W3C distributed tracing. + Response: streaming audio bytes (`Transfer-Encoding: chunked`) in the format set by `format`. Write to a file or pipe to a player. 
There is **no JSON wrapper** on success. ### Request body fields (TTSRequest) @@ -197,6 +202,8 @@ await pipeline(Readable.fromWeb(res.body), createWriteStream("out.mp3")); Required headers: `Authorization`. Content type: `multipart/form-data` or `application/msgpack`. +Optional tracing header: `traceparent` in W3C Trace Context format. + Form fields: - `audio` (binary, required) @@ -370,6 +377,7 @@ For low-latency / streaming TTS (e.g. LLM token stream → speech). All frames a - `Authorization: Bearer ` - `model: s2-pro` (or `s1`) — **required** +- `traceparent` — optional W3C Trace Context header for distributed tracing. Browser WebSocket clients cannot set custom headers directly; use a server-side proxy when trace propagation is required. ### Event sequence diff --git a/api-reference/asyncapi.yml b/api-reference/asyncapi.yml index d8c40f3..ad173aa 100644 --- a/api-reference/asyncapi.yml +++ b/api-reference/asyncapi.yml @@ -65,12 +65,16 @@ channels: - s1 - s2-pro description: TTS model to use for this session. Use `s2-pro` for multi-speaker dialogue synthesis. + traceparent: + type: string + description: Optional W3C Trace Context header used to correlate your business trace with Fish Audio inference spans. description: | Real-time TTS streaming channel. Clients send text chunks and receive audio chunks concurrently. 
## Connection Headers - `Authorization: Bearer ` - Required for authentication (see security section) - `model: ` - Required to specify which TTS model to use (see bindings) + - `traceparent: ` - Optional W3C Trace Context header for distributed tracing operations: receiveText: diff --git a/api-reference/endpoint/openapi-v1/speech-to-text.mdx b/api-reference/endpoint/openapi-v1/speech-to-text.mdx index 660a31a..1a1a3e9 100644 --- a/api-reference/endpoint/openapi-v1/speech-to-text.mdx +++ b/api-reference/endpoint/openapi-v1/speech-to-text.mdx @@ -9,3 +9,10 @@ iconType: "solid" This BETA endpoint only accepts `application/form-data` and `application/msgpack`. + + + This endpoint accepts the optional W3C `traceparent` header for distributed + tracing across your business service, Fish Audio edge, upstream ASR, and + alignment spans. See [Tracing & Performance + Analysis](/api-reference/observability). + diff --git a/api-reference/endpoint/openapi-v1/text-to-speech-stream-with-timestamps.mdx b/api-reference/endpoint/openapi-v1/text-to-speech-stream-with-timestamps.mdx index 96b5b96..c287130 100644 --- a/api-reference/endpoint/openapi-v1/text-to-speech-stream-with-timestamps.mdx +++ b/api-reference/endpoint/openapi-v1/text-to-speech-stream-with-timestamps.mdx @@ -11,6 +11,12 @@ iconType: "solid" one JSON payload with a base64-encoded audio chunk. + + This endpoint accepts the optional W3C `traceparent` header for distributed + tracing across your business service and Fish Audio inference. See [Tracing & + Performance Analysis](/api-reference/observability). 
+ + Use this endpoint when you need both progressive audio delivery and text-to-audio alignment data, such as karaoke-style highlighting, word or diff --git a/api-reference/endpoint/openapi-v1/text-to-speech.mdx b/api-reference/endpoint/openapi-v1/text-to-speech.mdx index f581b93..59dc7ce 100644 --- a/api-reference/endpoint/openapi-v1/text-to-speech.mdx +++ b/api-reference/endpoint/openapi-v1/text-to-speech.mdx @@ -14,6 +14,12 @@ For best results, upload reference audio using the [create model](/api-reference To upload audio clips directly, without pre-uploading, serialize the request body with MessagePack as per the [instructions](/features/text-to-speech#direct-api-messagepack). + + This endpoint accepts the optional W3C `traceparent` header for distributed + tracing across your business service and Fish Audio inference. See [Tracing & + Performance Analysis](/api-reference/observability). + + Audio formats supported: - WAV / PCM diff --git a/api-reference/endpoint/websocket/tts-live.mdx b/api-reference/endpoint/websocket/tts-live.mdx index b227946..b762486 100644 --- a/api-reference/endpoint/websocket/tts-live.mdx +++ b/api-reference/endpoint/websocket/tts-live.mdx @@ -10,6 +10,14 @@ iconType: "solid" The WebSocket TTS endpoint enables bidirectional streaming for low-latency text-to-speech generation with MessagePack serialization. + + The WebSocket upgrade request accepts the optional W3C `traceparent` header + for distributed tracing across your business service and Fish Audio inference. + Browser WebSocket clients cannot set custom headers directly; use a trusted + server-side proxy when you need trace propagation from the browser. See + [Tracing & Performance Analysis](/api-reference/observability). + + The `request` payload inside `StartEvent` uses the same parameters as the HTTP [Text to Speech API](/api-reference/endpoint/openapi-v1/text-to-speech). For more detailed field guidance, model-specific behavior, and examples, see that page. 
In WebSocket mode, `request.text` is typically empty in `StartEvent`, and the text content is sent through subsequent `TextEvent` messages. diff --git a/api-reference/introduction.mdx b/api-reference/introduction.mdx index fa4e842..d2d41cb 100644 --- a/api-reference/introduction.mdx +++ b/api-reference/introduction.mdx @@ -22,6 +22,10 @@ Every error returns a JSON body with a `message` and a `status`. See [Errors](/a Fish Audio publishes a canonical OpenAPI schema at [https://api.fish.audio/openapi.json](https://api.fish.audio/openapi.json). When working with AI coding agents or IDE assistants, mention this schema URL as part of your prompt or project context so the agent can understand Fish Audio's endpoints, request and response models, authentication requirements, and supported parameters directly from the machine-readable API contract. +## Distributed Tracing + +Fish Audio inference APIs accept the W3C `traceparent` header so your business-side trace and Fish Audio's inference-side trace can share the same trace ID. See [Tracing & Performance Analysis](/api-reference/observability) for supported endpoints, examples, and enterprise performance analysis details. + ## Create a Voice Clone Use our [/model endpoint](/api-reference/endpoint/model/create-model) to create a voice clone model. diff --git a/api-reference/observability.mdx b/api-reference/observability.mdx new file mode 100644 index 0000000..a820542 --- /dev/null +++ b/api-reference/observability.mdx @@ -0,0 +1,189 @@ +--- +title: "Tracing & Performance Analysis" +description: "Correlate Fish Audio inference spans with your own distributed traces" +icon: "chart-line" +iconType: "solid" +--- + +Fish Audio inference APIs accept the standard W3C Trace Context `traceparent` +header. Send this header when your application already has an active trace and +you want Fish Audio edge, inference, alignment, and upstream ASR spans to appear +under the same trace ID. 
+ +Supported inference surfaces: + +- `POST /v1/tts` +- `POST /v1/tts/stream/with-timestamp` +- `POST /v1/asr` +- `wss://api.fish.audio/v1/tts/live` + +If `traceparent` is omitted or invalid, Fish Audio starts a new trace for the +request. Tracing does not change authentication, rate limits, billing, request +priority, or generated output. + +## Header Format + +Use the W3C `traceparent` format: + +```text +traceparent: 00-<trace-id>-<parent-id>-<trace-flags> +``` + +Example: + +```text +traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01 +``` + +In this example, `4bf92f3577b34da6a3ce929d0e0e4736` is the trace ID. Fish Audio +continues that trace ID when it calls the inference backend and related services. + + + Prefer letting your tracing SDK inject `traceparent` instead of hand-building + it. For each Fish Audio request, create a child span in your business service + and inject that span context into the outgoing HTTP or WebSocket headers. + + +## Bind Business and Inference Traces + +To bind your business-side trace to Fish Audio's inference-side trace: + +1. Start or continue a trace in your application for the user workflow. +2. Create a child span around the Fish Audio request. +3. Inject that span context into the request headers. +4. Send the request with `traceparent`. +5. Use the same trace ID in your observability tool to inspect both your + business spans and Fish Audio inference spans. + +For multiple Fish Audio calls in one workflow, keep the same trace ID by using +the same parent trace, but let your tracing SDK create a fresh parent/span ID for +each outgoing request. 
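The rule of one shared trace ID with a fresh parent/span ID per outgoing request can be illustrated without a tracing SDK. The following is a minimal sketch using only the Python standard library; the helper names `new_trace_id`, `new_span_id`, and `build_traceparent` are hypothetical and are not part of any Fish Audio or OpenTelemetry API. It assumes version `00` of the W3C Trace Context format, where the trace ID is 32 lowercase hex characters, the parent ID is 16, and all-zero values are invalid. In a real service, letting the tracing SDK inject the header is still preferable.

```python
import secrets


def new_trace_id() -> str:
    """Return a random 16-byte trace ID as 32 lowercase hex characters."""
    while True:
        trace_id = secrets.token_hex(16)
        if trace_id != "0" * 32:  # all-zero trace IDs are invalid
            return trace_id


def new_span_id() -> str:
    """Return a random 8-byte parent/span ID as 16 lowercase hex characters."""
    while True:
        span_id = secrets.token_hex(8)
        if span_id != "0" * 16:  # all-zero parent IDs are invalid
            return span_id


def build_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Assemble a version-00 traceparent value from its four fields."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


# One user workflow that makes two Fish Audio calls (for example ASR, then TTS):
# the trace ID is shared, while each outgoing request gets its own parent ID.
workflow_trace_id = new_trace_id()
asr_traceparent = build_traceparent(workflow_trace_id, new_span_id())
tts_traceparent = build_traceparent(workflow_trace_id, new_span_id())
```

Both header values carry the same second field, so spans from both requests can be looked up under a single trace ID, while the third field differs per request.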
+ +## REST Example + +```bash +TRACEPARENT="00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01" + +curl --request POST https://api.fish.audio/v1/tts \ + --header "Authorization: Bearer $FISH_API_KEY" \ + --header "Content-Type: application/json" \ + --header "model: s2-pro" \ + --header "traceparent: $TRACEPARENT" \ + --data '{ + "text": "Hello from a traced Fish Audio request.", + "reference_id": "model-id", + "format": "mp3" + }' \ + --output out.mp3 +``` + +## OpenTelemetry Examples + + + + + ```python + import os + import requests + from opentelemetry import trace + from opentelemetry.propagate import inject + + tracer = trace.get_tracer("my-service") + + with tracer.start_as_current_span("fish_audio.tts"): + headers = { + "Authorization": f"Bearer {os.environ['FISH_API_KEY']}", + "Content-Type": "application/json", + "model": "s2-pro", + } + inject(headers) + + response = requests.post( + "https://api.fish.audio/v1/tts", + headers=headers, + json={ + "text": "Hello from a traced request.", + "reference_id": "model-id", + "format": "mp3", + }, + timeout=60, + ) + response.raise_for_status() + ``` + + + + + ```javascript + import { context, propagation, trace } from "@opentelemetry/api"; + + const tracer = trace.getTracer("my-service"); + const span = tracer.startSpan("fish_audio.tts"); + const ctx = trace.setSpan(context.active(), span); + + try { + await context.with(ctx, async () => { + const headers = { + Authorization: `Bearer ${process.env.FISH_API_KEY}`, + "Content-Type": "application/json", + model: "s2-pro", + }; + propagation.inject(ctx, headers); + + const response = await fetch("https://api.fish.audio/v1/tts", { + method: "POST", + headers, + body: JSON.stringify({ + text: "Hello from a traced request.", + reference_id: "model-id", + format: "mp3", + }), + }); + + if (!response.ok) { + throw new Error(`${response.status} ${await response.text()}`); + } + }); + } finally { + span.end(); + } + ``` + + + + +## WebSocket Tracing + +Pass 
`traceparent` on the WebSocket upgrade request: + +```javascript +import WebSocket from "ws"; + +const ws = new WebSocket("wss://api.fish.audio/v1/tts/live", { + headers: { + Authorization: `Bearer ${process.env.FISH_API_KEY}`, + model: "s2-pro", + traceparent: "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01", + }, +}); +``` + + + Browser WebSocket APIs do not allow custom headers. For browser clients, + inject `traceparent` from a trusted server-side proxy or use REST endpoints + where your frontend can send the header through `fetch`. + + +## Enterprise Performance Analysis + + + Fish Audio supports trace-based performance analysis for enterprise customers + with a signed enterprise agreement. When this support is enabled, share the + W3C trace ID with Fish Audio support so we can correlate your business span + with Fish Audio edge routing, backend inference, reference-audio encoding, + TTFT, response-time, alignment, and upstream ASR spans. + + +Only share the trace ID or `traceparent` value needed for investigation. Do not +place API keys, user identifiers, transcripts, audio URLs, or other sensitive +data in trace IDs or span names. 
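When only the trace ID is needed for an investigation, it can be pulled out of a `traceparent` value before it is shared. The following is a minimal sketch, assuming the version-`00` layout shown under Header Format; `extract_trace_id` is a hypothetical helper for illustration, not a function of any Fish Audio SDK.

```python
import re
from typing import Optional

# Version 00: "00-<32 hex trace ID>-<16 hex parent ID>-<2 hex flags>", lowercase hex.
_TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def extract_trace_id(traceparent: str) -> Optional[str]:
    """Return the trace ID from a version-00 traceparent, or None if it is malformed."""
    match = _TRACEPARENT_RE.fullmatch(traceparent.strip())
    if match is None:
        return None
    trace_id, parent_id, _flags = match.groups()
    # All-zero trace IDs and parent IDs are invalid in W3C Trace Context.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id


trace_id = extract_trace_id(
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
)
```

Returning `None` for malformed input keeps an invalid value from being passed along as if it were a usable trace ID.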
diff --git a/docs.json b/docs.json index d9b086d..dac004f 100644 --- a/docs.json +++ b/docs.json @@ -162,6 +162,7 @@ "group": "API Reference", "pages": [ "api-reference/introduction", + "api-reference/observability", "api-reference/errors" ] }, From 9d1b1fb3221b99f276e58f08374762c7cd9e4359 Mon Sep 17 00:00:00 2001 From: Kilerd Chan Date: Wed, 17 Jun 2026 13:06:46 +0800 Subject: [PATCH 2/2] docs: remove endpoint tracing callouts --- .mintlify/skills/fish-audio-api/SKILL.md | 9 +-------- api-reference/asyncapi.yml | 4 ---- api-reference/endpoint/openapi-v1/speech-to-text.mdx | 7 ------- .../openapi-v1/text-to-speech-stream-with-timestamps.mdx | 6 ------ api-reference/endpoint/openapi-v1/text-to-speech.mdx | 6 ------ api-reference/endpoint/websocket/tts-live.mdx | 8 -------- 6 files changed, 1 insertion(+), 39 deletions(-) diff --git a/.mintlify/skills/fish-audio-api/SKILL.md b/.mintlify/skills/fish-audio-api/SKILL.md index a0d6af9..dd05985 100644 --- a/.mintlify/skills/fish-audio-api/SKILL.md +++ b/.mintlify/skills/fish-audio-api/SKILL.md @@ -17,7 +17,7 @@ This file condenses those into rules an agent can apply directly. - Base URL: `https://api.fish.audio` - WebSocket base: `wss://api.fish.audio` - Auth (all endpoints): `Authorization: Bearer ` -- Optional distributed tracing for inference APIs: send W3C `traceparent` on `/v1/tts`, `/v1/tts/stream/with-timestamp`, `/v1/asr`, and `/v1/tts/live` to bind the caller's business trace to Fish Audio inference spans. +- Optional distributed tracing for inference APIs: see `https://docs.fish.audio/api-reference/observability`. - Get API keys: `https://fish.audio/app/api-keys` - Never hardcode keys — read from an env var like `FISH_API_KEY`. - Errors are JSON `{status, message}` for 401 / 402 / 404, and an array of `{loc, type, msg, ctx, in}` for 422 (validation). @@ -46,10 +46,6 @@ Required headers: - `Content-Type: application/json` **or** `application/msgpack` - `model: s2-pro` (required). Values: `s1`, `s2-pro`. 
Default to `s2-pro` unless the user explicitly asks otherwise. -Optional tracing header: - -- `traceparent: 00---` for W3C distributed tracing. - Response: streaming audio bytes (`Transfer-Encoding: chunked`) in the format set by `format`. Write to a file or pipe to a player. There is **no JSON wrapper** on success. ### Request body fields (TTSRequest) @@ -202,8 +198,6 @@ await pipeline(Readable.fromWeb(res.body), createWriteStream("out.mp3")); Required headers: `Authorization`. Content type: `multipart/form-data` or `application/msgpack`. -Optional tracing header: `traceparent` in W3C Trace Context format. - Form fields: - `audio` (binary, required) @@ -377,7 +371,6 @@ For low-latency / streaming TTS (e.g. LLM token stream → speech). All frames a - `Authorization: Bearer ` - `model: s2-pro` (or `s1`) — **required** -- `traceparent` — optional W3C Trace Context header for distributed tracing. Browser WebSocket clients cannot set custom headers directly; use a server-side proxy when trace propagation is required. ### Event sequence diff --git a/api-reference/asyncapi.yml b/api-reference/asyncapi.yml index ad173aa..d8c40f3 100644 --- a/api-reference/asyncapi.yml +++ b/api-reference/asyncapi.yml @@ -65,16 +65,12 @@ channels: - s1 - s2-pro description: TTS model to use for this session. Use `s2-pro` for multi-speaker dialogue synthesis. - traceparent: - type: string - description: Optional W3C Trace Context header used to correlate your business trace with Fish Audio inference spans. description: | Real-time TTS streaming channel. Clients send text chunks and receive audio chunks concurrently. 
## Connection Headers - `Authorization: Bearer ` - Required for authentication (see security section) - `model: ` - Required to specify which TTS model to use (see bindings) - - `traceparent: ` - Optional W3C Trace Context header for distributed tracing operations: receiveText: diff --git a/api-reference/endpoint/openapi-v1/speech-to-text.mdx b/api-reference/endpoint/openapi-v1/speech-to-text.mdx index 1a1a3e9..660a31a 100644 --- a/api-reference/endpoint/openapi-v1/speech-to-text.mdx +++ b/api-reference/endpoint/openapi-v1/speech-to-text.mdx @@ -9,10 +9,3 @@ iconType: "solid" This BETA endpoint only accepts `application/form-data` and `application/msgpack`. - - - This endpoint accepts the optional W3C `traceparent` header for distributed - tracing across your business service, Fish Audio edge, upstream ASR, and - alignment spans. See [Tracing & Performance - Analysis](/api-reference/observability). - diff --git a/api-reference/endpoint/openapi-v1/text-to-speech-stream-with-timestamps.mdx b/api-reference/endpoint/openapi-v1/text-to-speech-stream-with-timestamps.mdx index c287130..96b5b96 100644 --- a/api-reference/endpoint/openapi-v1/text-to-speech-stream-with-timestamps.mdx +++ b/api-reference/endpoint/openapi-v1/text-to-speech-stream-with-timestamps.mdx @@ -11,12 +11,6 @@ iconType: "solid" one JSON payload with a base64-encoded audio chunk. - - This endpoint accepts the optional W3C `traceparent` header for distributed - tracing across your business service and Fish Audio inference. See [Tracing & - Performance Analysis](/api-reference/observability). 
- - Use this endpoint when you need both progressive audio delivery and text-to-audio alignment data, such as karaoke-style highlighting, word or diff --git a/api-reference/endpoint/openapi-v1/text-to-speech.mdx b/api-reference/endpoint/openapi-v1/text-to-speech.mdx index 59dc7ce..f581b93 100644 --- a/api-reference/endpoint/openapi-v1/text-to-speech.mdx +++ b/api-reference/endpoint/openapi-v1/text-to-speech.mdx @@ -14,12 +14,6 @@ For best results, upload reference audio using the [create model](/api-reference To upload audio clips directly, without pre-uploading, serialize the request body with MessagePack as per the [instructions](/features/text-to-speech#direct-api-messagepack). - - This endpoint accepts the optional W3C `traceparent` header for distributed - tracing across your business service and Fish Audio inference. See [Tracing & - Performance Analysis](/api-reference/observability). - - Audio formats supported: - WAV / PCM diff --git a/api-reference/endpoint/websocket/tts-live.mdx b/api-reference/endpoint/websocket/tts-live.mdx index b762486..b227946 100644 --- a/api-reference/endpoint/websocket/tts-live.mdx +++ b/api-reference/endpoint/websocket/tts-live.mdx @@ -10,14 +10,6 @@ iconType: "solid" The WebSocket TTS endpoint enables bidirectional streaming for low-latency text-to-speech generation with MessagePack serialization. - - The WebSocket upgrade request accepts the optional W3C `traceparent` header - for distributed tracing across your business service and Fish Audio inference. - Browser WebSocket clients cannot set custom headers directly; use a trusted - server-side proxy when you need trace propagation from the browser. See - [Tracing & Performance Analysis](/api-reference/observability). - - The `request` payload inside `StartEvent` uses the same parameters as the HTTP [Text to Speech API](/api-reference/endpoint/openapi-v1/text-to-speech). For more detailed field guidance, model-specific behavior, and examples, see that page. 
In WebSocket mode, `request.text` is typically empty in `StartEvent`, and the text content is sent through subsequent `TextEvent` messages.