49 changes: 14 additions & 35 deletions developer-guide/core-features/fine-grained-control.mdx
@@ -14,8 +14,14 @@ import { AudioTranscript } from "/snippets/audio-transcript.jsx";
<AudioTranscript page="core-features-fine-grained-control" />
</Visibility>

-<Card title="Try it live in the API playground" icon="flask" href="/api-reference/endpoint/openapi-v1/text-to-speech" horizontal>
-Put your phoneme or paralanguage tags into the `text` field and send a real request to hear the result.
+<Card
+  title="Try it live in the API playground"
+  icon="flask"
+  href="/api-reference/endpoint/openapi-v1/text-to-speech"

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use relative internal links instead of absolute internal paths.

Lines 20 and 99 use absolute internal paths. Switch them to relative paths to match the repository's documentation rules.

Suggested update
-  href="/api-reference/endpoint/openapi-v1/text-to-speech"
+  href="../../api-reference/endpoint/openapi-v1/text-to-speech"
-For pauses, laughter, breathing, emotion, tone, and other paralinguistic cues, use the [Emotion Control](/developer-guide/core-features/emotions) guide.
+For pauses, laughter, breathing, emotion, tone, and other paralinguistic cues, use the [Emotion Control](./emotions) guide.

As per coding guidelines, "Use relative paths for internal links" and "Do not use absolute URLs for internal links."

Also applies to: 99-99

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@developer-guide/core-features/fine-grained-control.mdx` at lines 20 and 99,
the href attribute (line 20) and the Markdown link target (line 99) use
absolute internal paths starting with a forward slash instead of relative
paths. Convert href="/api-reference/endpoint/openapi-v1/text-to-speech" and the
/developer-guide/core-features/emotions link target to relative path syntax
that references each target document based on the current file's location in
the directory structure, following the repository's documentation guideline to
use relative paths for all internal links.

Source: Coding guidelines
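As a rough cross-check of the relative forms requested above, they can be computed from the directory that holds the `.mdx` file. This is a minimal Python sketch that assumes links resolve against the file's folder (as the `../../api-reference/...` suggestion implies) and that targets are written without file extensions; `to_relative_link` is a hypothetical helper, not part of any docs tooling, and the actual resolution rules of the docs platform should be confirmed in a preview build.

```python
import posixpath


def to_relative_link(target: str, current_file: str) -> str:
    """Rewrite a root-absolute docs link so it is relative to current_file.

    Assumption: links resolve against the folder that holds the .mdx file.
    """
    if not target.startswith("/"):
        return target  # already relative (or an external URL); leave it alone
    base_dir = posixpath.dirname(current_file) or "."
    rel = posixpath.relpath(target.lstrip("/") or ".", base_dir)
    # Prefix same-folder targets with "./" so they read as explicitly relative.
    if rel == ".." or rel.startswith("../"):
        return rel
    return "./" + rel


current = "developer-guide/core-features/fine-grained-control.mdx"
print(to_relative_link("/api-reference/endpoint/openapi-v1/text-to-speech", current))
print(to_relative_link("/developer-guide/core-features/emotions", current))
```

For this file, the two calls return `../../api-reference/endpoint/openapi-v1/text-to-speech` and `./emotions`; the Emotion Control page is a sibling in `core-features/`, so it needs no `../`.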

+  horizontal
+>
+Put your phoneme or paralanguage tags into the `text` field and send a real
+request to hear the result.
</Card>

## Getting Started
@@ -24,7 +30,7 @@ To use fine-grained control, you can use either our SDK, API, or Playground.

SDK/API: Phoneme tags are preserved by text normalization, so you can keep the default normalization behavior for pronunciation control. Set `"normalize": false` only when you want to prevent normalization from rewriting the surrounding text, such as numbers, dates, or URLs.

-Playground: You can use V1.6 Control Model, without setting any other options.
+Playground: Use `s2-pro` or `s1`. Both models support phoneme control with `<|phoneme_start|>` and `<|phoneme_end|>` tags.

<Note>
Disabling normalization may reduce the stability of reading numbers, dates,
@@ -88,41 +94,14 @@ Japanese:
<|phoneme_start|>ha0shi1ga0<|phoneme_end|>見えます。
```

-## Paralanguage
-
-Paralanguage controls allow you to add natural speech elements and pauses to make the generated speech sound more human-like. There are two main types of controls:
-
-### Pause Words
-
-You can use common pause words like "um", "uh", "嗯", "啊" to control the rhythm of the speech.
-
-### Special Effects
+## Emotion and Paralanguage Control

-The following special effects can be added using parentheses:
+For pauses, laughter, breathing, emotion, tone, and other paralinguistic cues, use the [Emotion Control](/developer-guide/core-features/emotions) guide.

-| Effect | Description | First Available | Stage |
-| ---------------- | ------------------ | --------------- | ------------ |
-| `(break)` | Short pause | V1.6 | Experimental |
-| `(long-break)` | Extended pause | V1.6 | Experimental |
-| `(breath)` | Breathing sound | V1.6 | Experimental |
-| `(laugh)` | Laughter sound | V1.6 | Experimental |
-| `(cough)` | Coughing sound | V1.6 | Experimental |
-| `(lip-smacking)` | Lip smacking sound | V1.6 | Experimental |
-| `(sigh)` | Sighing sound | V1.6 | Experimental |
-
-<Warning>
-The effects `(laugh)`, `(cough)`, `(lip-smacking)`, and `(sigh)` are
-developing. You may need to repeat them multiple times for better results.
-</Warning>
-
-Example:
-
-```text
-I am, um, an (break) engineer.
-```
+`s2-pro` uses `[bracket]` cues such as `[laughing]` and `[break]`. `s1` uses `(parentheses)` syntax. See the Emotion Control guide for the current supported marker lists and examples.

-You can combine paralanguage and phoneme control in the same text:
+You can combine emotion or paralanguage cues and phoneme control in the same text:

```text
-I am, um, an (break) <|phoneme_start|>EH1 N JH AH0 N IH1 R<|phoneme_end|>.
+[calmly] I am an <|phoneme_start|>EH1 N JH AH0 N IH1 R<|phoneme_end|>.
```
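The combined example at the end of the diff can also be assembled in code. This is a minimal Python sketch, assuming a JSON request body whose `text` field carries the tagged string and whose optional `normalize` flag behaves as described in the SDK/API note (normalization stays on unless you opt out). `phoneme` and `build_payload` are hypothetical helpers, not SDK functions, and any other fields the endpoint may require are deliberately left out.

```python
import json

# Tag strings exactly as written in the guide.
PHONEME_START = "<|phoneme_start|>"
PHONEME_END = "<|phoneme_end|>"


def phoneme(pronunciation: str) -> str:
    """Wrap an explicit pronunciation (e.g. "EH1 N JH AH0 N IH1 R") in phoneme tags."""
    return f"{PHONEME_START}{pronunciation.strip()}{PHONEME_END}"


def build_payload(text: str, normalize: bool = True) -> dict:
    """Build a minimal, hypothetical request body with only "text" and "normalize".

    Other fields the endpoint may require are intentionally omitted.
    """
    payload = {"text": text}
    if not normalize:
        # Phoneme tags survive normalization, so only opt out when numbers,
        # dates, or URLs must be left exactly as written.
        payload["normalize"] = False
    return payload


text = "[calmly] I am an " + phoneme("EH1 N JH AH0 N IH1 R") + "."
print(json.dumps(build_payload(text), ensure_ascii=False))
```

Keeping the tag strings in constants avoids typos in the `<|...|>` delimiters, and leaving `normalize` out by default matches the guide's advice to keep normalization on for pronunciation control.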