Allow configuring keepalive_ping_timeout_seconds for tts.stream_websocket (AsyncFishAudio) #47

@stackpiles-naka

Description

I'm using AsyncFishAudio with tts.stream_websocket to stream TTS audio over WebSocket in a conversational application.

When generating relatively long responses, I frequently hit a WebSocketNetworkError coming from httpx_ws during ws.receive_bytes(). After some investigation, I found that increasing keepalive_ping_timeout_seconds on the underlying aconnect_ws call from the default 20 seconds to 60 seconds completely eliminates these errors for my use case.

Right now, the only way I can change this timeout is by modifying the library source directly, which is not maintainable.

Environment

  • Library: fishaudio (AsyncFishAudio)
  • Feature: AsyncTTSClient.stream_websocket / client.tts.stream_websocket(...)
  • Transport: WebSocket via httpx_ws.aconnect_ws
  • Use case: Long-form, streamed conversational TTS (responses can be quite long)

Current behavior

  • AsyncTTSClient.stream_websocket internally calls:
 async with aconnect_ws(
     "/v1/tts/live",
     client=self._client.client,
     headers={"model": model, "Authorization": f"Bearer {self._client.api_key}"}
 ) as ws:
     ...
  • For long TTS generations, there can be periods where no audio chunks or other frames are received for more than 20 seconds.
  • In those cases, httpx_ws raises a WebSocketNetworkError, which bubbles up to my application and breaks the TTS stream.

Workaround

If I patch the library locally and change the aconnect_ws call to:

async with aconnect_ws(
    "/v1/tts/live",
    client=self._client.client,
    headers={"model": model, "Authorization": f"Bearer {self._client.api_key}"},
    keepalive_ping_timeout_seconds=60,
) as ws:
    ...

...then the WebSocketNetworkError no longer occurs, and long TTS responses stream successfully. However, this requires modifying the installed package, which is fragile and hard to maintain across upgrades.
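A runtime stopgap can avoid editing files inside site-packages by injecting the same keyword argument through a wrapper. This is only a minimal sketch under stated assumptions. `with_default_kwargs` is a hypothetical helper of my own. The fishaudio module that actually references `aconnect_ws` is also an assumption and has not been checked. The approach only works if that module looks up `aconnect_ws` as a module-level name at call time (for example after `from httpx_ws import aconnect_ws`). It is still tied to library internals, which is another reason a public option would help.

```python
import functools
from typing import Any, Callable


def with_default_kwargs(func: Callable[..., Any], **defaults: Any) -> Callable[..., Any]:
    """Wrap `func` so `defaults` are applied unless the caller passes them explicitly."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # Caller-supplied keyword arguments take precedence over the injected defaults.
        merged = {**defaults, **kwargs}
        # aconnect_ws returns an async context manager, so returning the call result
        # unchanged keeps `async with wrapper(...) as ws:` working.
        return func(*args, **merged)

    return wrapper


# Hypothetical usage; the fishaudio module path below is an assumption and must be
# checked against the installed version:
#
#   import httpx_ws
#   import fishaudio.some_internal_module as ws_module  # hypothetical path
#   ws_module.aconnect_ws = with_default_kwargs(
#       httpx_ws.aconnect_ws, keepalive_ping_timeout_seconds=60
#   )
```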

Requested / expected behavior

It would be great if the keepalive timeout were configurable from the public API, for example by:

  • Adding an optional parameter to:
 async def stream_websocket(
     self,
     text_stream: AsyncIterable[Union[str, TextEvent, FlushEvent]],
     *,
     reference_id: Optional[str] = None,
     references: Optional[List[ReferenceAudio]] = None,
     format: Optional[AudioFormat] = None,
     latency: Optional[LatencyMode] = None,
     speed: Optional[float] = None,
     config: TTSConfig = TTSConfig(),
     model: Model = "s1",
     keepalive_ping_timeout_seconds: Optional[float] = None,  # for example
 ):
     ...

and forwarding it to aconnect_ws only when it is set, so that the default keeps the current behavior.

  • Alternatively, exposing this via a client-level configuration or a RequestOptions-like object.
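The pass-through from the first option could look roughly like the sketch below. `build_aconnect_ws_kwargs` is a hypothetical helper, not part of fishaudio, and this is not the library's actual implementation. It forwards the timeout only when the caller sets it. The reason is that `keepalive_ping_timeout_seconds` in httpx_ws already accepts `None` as a value, so passing `None` explicitly would not necessarily mean "use the default".

```python
from typing import Any, Dict, Optional


def build_aconnect_ws_kwargs(
    model: str,
    api_key: str,
    keepalive_ping_timeout_seconds: Optional[float] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for httpx_ws.aconnect_ws (hypothetical helper)."""
    kwargs: Dict[str, Any] = {
        "headers": {"model": model, "Authorization": f"Bearer {api_key}"},
    }
    # Only forward the timeout when the caller set it, so that by default
    # aconnect_ws keeps using its own built-in default.
    if keepalive_ping_timeout_seconds is not None:
        kwargs["keepalive_ping_timeout_seconds"] = keepalive_ping_timeout_seconds
    return kwargs


# Inside stream_websocket this could then be used roughly as:
#
#   async with aconnect_ws(
#       "/v1/tts/live",
#       client=self._client.client,
#       **build_aconnect_ws_kwargs(
#           model, self._client.api_key, keepalive_ping_timeout_seconds
#       ),
#   ) as ws:
#       ...
```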

Questions

  1. Is keepalive_ping_timeout_seconds intended to be user-configurable for long-running TTS streams?
  2. Would you be open to a PR that adds an optional parameter (or configuration mechanism) to control this timeout without patching the library source?
  3. Is there a recommended pattern in this library for configuring WebSocket-level timeouts for TTS streaming?

Having an official way to configure this timeout would make it much easier to support long-form conversational TTS without resorting to local patches. Thanks!

Metadata

Labels: enhancement (New feature or request)
