feat: add xai asr and tts extensions by BenWeekes · Pull Request #2146 · TEN-framework/ten-framework

BenWeekes · 2026-04-24T08:51:35Z

Summary

add xai_asr_python and xai_tts_python TEN extensions
add xAI graph variants to the voice-assistant example
add xAI env/config wiring and usage notes for local and demo testing
add standalone test coverage for both new extensions

What Changed

added ai_agents/agents/ten_packages/extension/xai_asr_python
- WebSocket STT client with explicit audio.done finalize flow
- bounded reconnect handling with fatal escalation for non-recoverable auth/config failures
- result mapping for interim, locked, and final transcript states
- dump support, metrics, and vendor error classification
added ai_agents/agents/ten_packages/extension/xai_tts_python
- WebSocket TTS client using PCM output
- bounded reconnect / backoff with auth failures treated as fatal
- cancellation and request-finalization behavior aligned with AsyncTTS2BaseExtension
- config redaction, lifecycle logging, and invalid punctuation-only request handling
updated ai_agents/agents/examples/voice-assistant/tenapp/manifest.json
- add xAI extension dependencies
updated ai_agents/agents/examples/voice-assistant/tenapp/property.json
- add voice_assistant_xai_asr
- add voice_assistant_xai_tts
- add voice_assistant_xai_full
updated ai_agents/agents/examples/voice-assistant/README.md
- document graph selection and xAI usage
updated ai_agents/.env.example
- add XAI_API_KEY

Testing

standalone TTS tests: 23 passed
standalone ASR tests: 17 passed
validated the demo graph wiring on voice_assistant_xai_full

Notes

companion repo docs updates were pushed to docs/progressive-disclosure in TEN Framework
companion portal guide updates were pushed to improve/tts-extension-guide in TEN Portal

github-actions · 2026-04-24T09:01:58Z

test

github-actions · 2026-04-24T09:03:21Z

placeholder being replaced

github-actions · 2026-04-24T09:04:44Z

edit test

github-actions · 2026-04-24T09:05:50Z

Code Review: xAI ASR and TTS Extensions

Good overall implementation - both extensions follow the TEN Framework patterns well, have bounded reconnect logic, proper fatal/non-fatal error classification, config redaction, and solid test coverage (17 ASR + 23 TTS tests). The graph wiring and README updates are clean.

Below are the issues worth addressing before merge, ordered by severity.

github-actions · 2026-04-24T09:07:03Z

Code Review: xAI ASR and TTS Extensions

Good overall implementation - both extensions follow the TEN Framework patterns well, have bounded reconnect logic, proper fatal/non-fatal error classification, config redaction, and solid test coverage (17 ASR + 23 TTS tests). The graph wiring and README updates are clean.

Below are the issues worth addressing before merge, ordered by severity.

Bugs / Correctness

1. XAIASRConfig.to_json() returns Python repr, not JSON

In xai_asr_python/config.py, to_json() ends with return str(config_dict). str() on a dict produces Python __repr__ output (single-quoted keys, True/False instead of true/false) which is not valid JSON. This should be return json.dumps(config_dict). If this string is ever parsed downstream it will fail.

2. _message_handler re-fires on_open() on duplicate transcript.created

In recognition.py, when a transcript.created event arrives inside the running _message_handler loop, callback.on_open() is called again. That re-triggers the buffer-flush and timeline reset in the extension. If the server re-sends this event mid-session it will corrupt in-flight audio state. Guard this branch so on_open() is only forwarded once per connection.

3. Redundant copy.deepcopy after json.loads in _flush_buffered_audio_frames

json.loads already returns a fresh object tree, so wrapping it in copy.deepcopy is unnecessary overhead on every buffered frame.

4. _ensure_connection does not verify the socket is still open (TTS)

self._ws could reference a closed connection if stop() or cancel() raised during the finally block. Consider also checking self._ws.state == State.OPEN, mirroring how XAIASRRecognition.is_connected() works.

API Key Validation Leaks Test Logic into Production

5. xai_tts_python/config.py validate() accepts "test" prefix

The validation accepts api_key values starting with "test" to bypass format checking in tests, but the error message says the key must start with "xai-". Remove the "test" branch and update tests to use a properly formatted dummy key like "xai-test-key" (which already passes the xai- check).

Missing Manifest Properties

6. xai_asr_python/manifest.json omits dump, dump_path, and finalize_timeout_ms

These three properties are present in property.json and documented in the README but absent from api.property.properties in the manifest. Other extensions in this repo declare all settable properties in the manifest so TMAN can discover and validate them. Add: dump (bool), dump_path (string), finalize_timeout_ms (int32).

Fatal-Error String Matching is Fragile

7. ASR _is_fatal_connection_error and on_error

Both use substring scans on free-form error strings, including the literal string "api key" which could match unrelated messages. The TTS extension handles this correctly by catching InvalidStatus and threading a numeric status_code through XAITTSConnectionException. The ASR path should do the same - surface the status code through the recognition callback rather than scanning the message text.

Minor / Style

8. Missing license header in xai_asr_python/__init__.py - the TTS counterpart has the Apache-2.0 header; the ASR one does not.

9. Missing newlines at end of files: xai_asr_python/requirements.txt and xai_asr_python/tests/bin/start both show no trailing newline (git diff reports no newline at end of file) in the diff.

10. validate() shadows Pydantic's built-in in both configs. Both XAIASRConfig and XAITTSConfig define a validate() method that shadows the Pydantic class-level validator name. Consider renaming to validate_config() or check_required_fields().

11. Duplicated conftest.py - both test directories contain byte-for-byte identical conftest.py. Consider extracting to a shared fixture module.

12. Stale test log string in xai_tts_python/tests/test_basic.py: text="Hello, this is a test of the Deepgram TTS extension." should reference xAI.

Summary

Severity	Count
Bug / Correctness	4
Validation / Security	2
Missing config surface	1
Minor / Style	6

The top priorities before merge are items 1 (invalid JSON from to_json()), 5 (test bypass in production validate()), and 6 (missing manifest properties). The rest are good-to-fix but not blockers.

github-actions · 2026-04-24T10:01:32Z

PR Review: xAI ASR and TTS Extensions - test post

github-actions · 2026-04-24T10:02:19Z

PR Review: xAI ASR and TTS Extensions. Issues to fix: (1) on_open called twice in recognition.py - both start() and _message_handler call the callback on transcript.created, causing duplicate metrics and buffer flushes if server re-sends; (2) buf potentially unbound in xai_asr_python send_audio - if lock_buf raises, buf is undefined in the except handler, initialize buf=None before the try block; (3) property_en_hotwords.json is identical to property_en.json with no hotword impl - remove or implement. Issues to address: (4) XAIASRConfig.validate() shadows Pydantic class method, rename to validate_config(); (5) TTS api key startswith('test') check is too broad; (6) turn_status values 1 and 2 are undocumented magic numbers; (7) isinstance str check for text_normalization is dead code since the field is typed bool; (8) missing trailing newlines in requirements.txt and tests/bin/start; (9) test_basic.py copy-paste error says Deepgram not xAI. Positives: great reconnect logic with backoff and fatal escalation, API key redaction, _contains_spoken_content guard, comprehensive test coverage (timeout, flush, punctuation, reconnect-after-error, auth failure), clean agent.py update for the new event types. Generated by Claude Code

github-actions · 2026-04-24T14:55:54Z

Code Review: xAI ASR + TTS Extensions

Great overall structure. The two extensions follow existing repo conventions closely and come with thorough test coverage (17 ASR + 23 TTS tests). The reconnection logic, error classification, and sensitive-data redaction are all solid. A few issues worth addressing before merge.

Bugs

1. Potential UnboundLocalError in xai_asr_python/extension.py send_audio

If frame.lock_buf() itself raises (e.g. the frame is already invalid), buf is never bound, and the except block's frame.unlock_buf(buf) will throw a second exception masking the original. Prefer the standard try/finally pattern:

buf = frame.lock_buf()
try:
    audio_data = bytes(buf)
    ...
    return True
except Exception as e:
    self.ten_env.log_error(f"Error sending audio: {e}")
    return False
finally:
    frame.unlock_buf(buf)

Code Quality

2. _connection_successful flag in ReconnectManager is dead code

self._connection_successful is assigned in __init__ and handle_reconnect but never read. Remove it.

3. try/except TypeError for dynamic dispatch is fragile (reconnect_manager.py)

handle_reconnect catches TypeError to detect whether the error handler accepts 1 or 2 arguments. This silently swallows unrelated TypeErrors raised inside the handler. Since the only call-site is XAIASRExtension._handle_reconnect, use a consistent 2-arg signature throughout.

4. Empty on_deinit override in TTS extension (extension.py)

The override only calls super() with no additional logic. Delete it.

5. BUILD.gn in xai_tts_python/manifest.json package includes

BUILD.gn is auto-generated and in .gitignore. AGENTS.md says not to include auto-generated files. xai_asr_python correctly omits it. Remove from xai_tts_python's manifest.

Completeness / Consistency

6. dump, dump_path, and finalize_timeout_ms missing from ASR manifest property schema

xai_asr_python/manifest.json only declares params under api.property.properties. The fields dump, dump_path, and finalize_timeout_ms are used by XAIASRConfig but are absent from the schema, so they are invisible to TMAN Designer and TEN's property system. Compare with xai_tts_python/manifest.json which correctly declares dump and dump_path.

7. property_en_hotwords.json is byte-for-byte identical to property_en.json

Both files in xai_asr_python/tests/configs/ have the same content. The name implies hotwords support which xAI STT does not expose here. Either add a meaningful difference or remove the file.

Minor Nits

8. Misleading validation error message for TTS API key (config.py)

validate() allows keys starting with xai- OR test, but the error message only mentions xai-. Update the message to reflect both accepted prefixes.

9. Copy-paste error in TTS test (test_basic.py)

The basic test sends "Hello, this is a test of the Deepgram TTS extension." — should reference xAI, not Deepgram.

Summary

Severity	Item
Bug	`lock_buf` exception path can produce `UnboundLocalError` in `send_audio`
Code quality	Dead `_connection_successful` flag in `ReconnectManager`
Code quality	`try/except TypeError` for dispatch in `handle_reconnect`
Code quality	Pointless `on_deinit` override in TTS extension
Code quality	`BUILD.gn` in TTS manifest package includes
Completeness	`dump`/`dump_path`/`finalize_timeout_ms` missing from ASR manifest schema
Completeness	`property_en_hotwords.json` identical to `property_en.json`
Nit	Misleading API key error message
Nit	"Deepgram" copy-paste in xAI TTS test

wangyimin-agora · 2026-04-30T08:16:11Z

+    def _calculate_request_event_interval_ms(self) -> int:
+        if self.sent_ts is None:
+            return 0
+        return int((datetime.now() - self.sent_ts).total_seconds() * 1000)


The interval time is formulate is "current time - first audio receive time", not first request send time

wangyimin-agora · 2026-04-30T08:22:11Z

+                code=ModuleErrorCode.NON_FATAL_ERROR,
+                vendor_info=ModuleErrorVendorInfo(vendor=self.vendor()),
+            )
+            await self._finalize_request(TTSAudioEndReason.ERROR, error=error)


_finalize_request only for handling last request of the turn.
if t.text_input_end:

wangyimin-agora · 2026-04-30T08:34:27Z

+        if start_ms <= 0:
+            start_ms = int(datetime.now().timestamp() * 1000)
+
+        transcript_result = TTSTextResult(


TTSTextResult is not necessary. Actually, we only use it when we can get a word-level timestamp. if this tts vendor cannot provide, we can remove it

wangyimin-agora · 2026-04-30T08:35:43Z

+                message=e.body,
+            ),
+        )
+        await self._finalize_request(TTSAudioEndReason.ERROR, error=error)


_finalize_request only for handling last request of the turn.
if t.text_input_end:

github-actions · 2026-04-30T10:09:28Z

Code Review: PR #2146 — Add xAI ASR and TTS Extensions

Overview

This PR adds two well-structured TEN extensions — xai_asr_python (WebSocket STT) and xai_tts_python (WebSocket TTS) — along with three new voice-assistant graph variants and comprehensive test coverage. Overall this is high-quality work following the project's existing patterns. A few issues worth addressing before merge.

Issues

Bug: `TypeError` swallowed in `ReconnectManager.handle_reconnect`

reconnect_manager.py:

try:
    await error_handler(error, vendor_info)
except TypeError:
    await error_handler(error)

Using exception handling to detect function signatures is fragile — if error_handler raises a genuine TypeError internally, the manager silently retries without vendor_info. The callers (XAIASRExtension.send_asr_error) have a stable signature, so this safety valve isn't needed. Call await error_handler(error, vendor_info) directly (or await error_handler(error) when vendor_name is None).

Bug: `TIMEOUT_CODE = 10105` in `const.py` is defined but never used

xai_asr_python/const.py defines TIMEOUT_CODE = 10105 but nothing in the ASR extension uses it. Should be wired up or removed.

Design smell: `_ensure_client` accesses private internals

xai_tts_python/extension.py:

ws = getattr(self.client, "_ws", None)
if ws is not None and getattr(ws, "state", None) == State.OPEN:
    return

This reaches into XAITTSClient's private state. XAITTSClient should expose an is_connected() -> bool method (as XAIASRRecognition already does) and _ensure_client should call that instead.

Fragile API key prefix validation in TTS config

xai_tts_python/config.py:

if not self.api_key.startswith("xai-"):
    raise ValueError("API key must start with 'xai-'")

This will break if xAI changes their key format, and there is no equivalent check in the ASR extension. Either apply it consistently or drop it — the server will reject invalid keys with a 401 anyway.

Test config: `property_en_hotwords.json` is a copy of `property_en.json`

xai_asr_python/tests/configs/property_en_hotwords.json and property_en.json are byte-for-byte identical. If hotwords aren't implemented yet, remove the duplicate rather than leaving a misleading placeholder.

Minor Issues

Docstring/code mismatch: ReconnectManager docstring says max_attempts=4, but extension.py instantiates it with max_attempts=10. Update the docstring.
Unnecessary override: xai_tts_python/extension.py's on_deinit does nothing except call super() — remove it.
Reconnect cost on cancel: XAITTSClient.cancel() closes the WebSocket and sets _needs_reconnect = True, causing a full TLS + WS handshake on the next utterance after every barge-in. Consider draining the socket instead of closing it, or pre-warming the connection after cancel. Not a blocker, but a known latency trade-off worth documenting.
State reset scattered across request_tts: The ~6 fields reset in multiple code paths (_request_text, _request_metadata, total_audio_bytes, etc.) would benefit from a private _reset_request_state() helper to avoid future divergence.
Duplicate api_key check: start_connection in xai_asr_python/extension.py re-checks for an empty API key even though config.validate() already raised on it during on_init. The guard is harmless but adds noise.

What's Good

Reconnection logic is well-designed: bounded retries with exponential backoff, auth failures correctly escalated as fatal, successful connections reset the counter.
Both extensions handle the audio.done/finalize flow correctly, including timeout handling and the _close_expected guard against spurious reconnects.
Config redaction with encrypt() before logging is consistently applied in both extensions.
Excellent test coverage — 40 tests covering: 401 errors, timeout, reconnect-after-error, sequential requests, flush, dump verification, TTFB metrics, config redaction, and language normalization.
The agent.py fix silently dropping tts_audio_start/tts_audio_end is a clean way to handle the new data events without polluting the log with warnings.
Property graphs are correct and follow the established pattern.

Summary

The implementation is solid and well-tested. The TypeError-swallowing pattern in ReconnectManager and the private-field access in _ensure_client are the most important items to address. The rest are cleanup/polish.

…sing (#2146)

# Conflicts: # ai_agents/.env.example

github-actions · 2026-05-14T12:09:44Z

Code Review — `feat: add xai asr and tts extensions`

Thanks for the thorough work — the extensions are well-structured, follow the existing *_asr_python / *_tts_python conventions, and ship with substantial test coverage (17 ASR + 23 TTS). Most of the heavy lifting (config validation, redaction, dump support, finalize flow, bounded reconnect with auth-failure escalation) is in good shape. A few concrete issues below, ordered roughly by severity.

Bugs

1. XAIASRExtension._handle_reconnect() passes kwargs that ReconnectManager.handle_reconnect() does not accept — this will raise TypeError the first time a reconnect happens.

xai_asr_python/extension.py calls:

success = await self.reconnect_manager.handle_reconnect(
    connection_func=self._connect_recognition,
    error_handler=self.send_asr_error,
    vendor_name=self.vendor(),       # not a parameter
    vendor_code=\"connect_failed\",    # not a parameter
)

But ReconnectManager.handle_reconnect is defined as:

async def handle_reconnect(self, connection_func, error_handler=None) -> bool:

The current test_reconnect.py invokes the manager directly with positional args, so the unit tests don't catch this. Either drop the extra kwargs from the call site or extend the manager's signature (and forward vendor_name/vendor_code into a ModuleErrorVendorInfo so vendor info actually surfaces through the error path — currently error_handler is invoked with only a ModuleError, which means send_asr_error would receive no vendor info on a reconnect-failure escalation).

2. XAIASRExtension.send_audio — possible UnboundLocalError — if frame.lock_buf() itself raises, buf is undefined when the except branch tries frame.unlock_buf(buf). Initialize buf = None (and skip the unlock if still None), or move lock_buf() out of the try so the unlock-on-error path is reachable only when buf was successfully obtained.

3. XAITTSClient._connect() has no open_timeout — an unreachable host can hang the websocket connect indefinitely. Compare with xai_asr_python/recognition.py which does pass open_timeout=timeout. Recommend adding open_timeout=10 (or surfacing as config).

Code quality / minor

4. XAITTSConfig.validate() requires api_key.startswith(\"xai-\") — this couples the extension to the current xAI key prefix. If xAI ever issues a key with a different prefix (or you ever want to test against a proxy/gateway), this breaks. Consider relaxing to a non-empty check, or at least making it a warn-then-proceed.

5. XAIASRConfig.validate() shadows pydantic's BaseModel.validate — works under pydantic v2 (where validate is removed), but the name collision is confusing. Consider renaming to validate_params() for clarity (the TTS config has the same naming).

6. XAITTSConfig.update_params + XAITTSClient._build_ws_url double-handles text_normalization — update_params already coerces to bool, so the isinstance(self.config.text_normalization, str) branch in _build_ws_url is dead. Drop one of the two coercions.

7. XAIASRRecognition._message_handler — when start() consumes the initial transcript.created event, then a second transcript.created from the server during the handler loop will fire on_open() again (re-emitting connect-delay metrics, re-setting audio_timeline). Likely benign in practice, but an early-return after the first on_open would make it deterministic.

8. ASR fatal-error classification is limited to 401/403 — _is_fatal_connection_error doesn't consider non-recoverable cases like 400/404 (bad URL/params), and conversely 429 (rate-limited) is recoverable but currently treated like any other error. Worth widening the fatal set and adding 429 → non-fatal-with-backoff.

9. tests/configs/property_en_hotwords.json is byte-identical to property_en.json — looks like a placeholder. Either populate it with hotword-specific config or remove it.

10. Voice-assistant property.json graph duplication — three new graphs (*_xai_asr, *_xai_tts, *_xai_full) duplicate ~600 lines of node + connection wiring. Not a blocker (matches existing convention), but worth flagging for a future refactor of the example.

11. Inconsistent env-var defaulting in graphs — extension property.json uses ${env:XAI_API_KEY|} (empty default), example graphs use ${env:XAI_API_KEY} (no default). The latter will surface a clearer error if unset; pick one and apply consistently.

12. manifest.json version 0.4.0 — unusual starting version for a brand-new extension; assume this is a copy-paste from an existing template — fine to leave but worth noting.

Security

API key handled via ${env:...} and redacted via utils.encrypt in to_str/to_json ✅
recognition.py and xai_tts.py log full WS URL including query params (Connecting to xAI STT: {url}). Query params don't carry the API key (which is in the Authorization header), so no leak — but consider redacting URL logs anyway in case future params include sensitive data.
.env.example correctly leaves XAI_API_KEY= empty ✅.

Performance

TTS audio chunk timestamps are computed from accumulated bytes — straightforward and correct.
Both clients use a single websocket per request lifetime with bounded backoff (4 attempts ASR / 5 attempts TTS). Reasonable.
ASR buffered_frames is drained on on_open — good. No flow-control on the audio send path; if upstream produces faster than xAI accepts, await self._ws.send(...) will provide natural backpressure ✅.

Test coverage

Good unit-level coverage for both extensions:

ASR: config validation, redaction, language normalization, finalize-with-timeout, dump path, reconnect escalation, metric emission, vendor error, ASR result shape.
TTS: dump, basic audio, flush, error paths (empty params, invalid key), TTFB metric, params, robustness (empty/whitespace text), state machine.

Gaps to consider:

No test exercises XAIASRExtension._handle_reconnect end-to-end, which is why bug Update README.md #1 went undetected. A test that drives on_close (without _stop_requested / _close_expected) and asserts the manager call signature would catch it.
No test for send_audio failure path raising before lock_buf returns (bug modify playground default settings，fix voice type error #2).
No test for XAITTSClient._connect timeout behavior (bug Fix speech sdk #3).

Once the three bugs above are addressed, this is in good shape to merge. Nice work overall.

wangyimin-agora · 2026-05-18T03:19:04Z

+        return int(duration_sec * 1000)
+
+    def _calculate_request_event_interval_ms(self) -> int:
+        return self._request_event_interval_ms


Interval time is the last audio receive time - first audio receive time. not ttfb.
It represents the audio length of the current request id

Fixed in 2820e94. _calculate_request_event_interval_ms now returns _last_audio_chunk_ts - _first_audio_chunk_ts. The TTFB metric emitted via send_tts_ttfb_metrics is unchanged.

request_event_interval_ms now measures last audio chunk arrival minus first audio chunk arrival, representing the wall-clock audio receive window for the request. Previously the field stored TTFB which was the wrong semantic per maintainer feedback. The TTFB metric emitted via send_tts_ttfb_metrics is unchanged.

github-actions · 2026-05-18T04:10:09Z

Review of xAI ASR/TTS extensions

Thanks for the thorough work — the test coverage is impressive (40 tests) and the bounded-reconnect / fatal escalation pattern is consistent. Below are issues grouped by severity.

Bugs

UnboundLocalError in xai_asr_python/extension.py:send_audio — if frame.lock_buf() raises, buf is referenced in the except block before assignment. Initialize buf = None before the try, or move the lock outside.
Wrong audio duration for non-PCM encodings in recognition.py:send_audio_frame:
```
duration_ms = int(len(audio_data) / (sample_rate / 1000 * 2))
```
Hard-codes 2 bytes/sample, but encoding=mulaw|alaw (advertised in config) is 1 byte/sample. Audio timeline math drifts for those encodings.
text_normalization boolean coercion in xai_tts_python/config.py:update_params:
```
self.text_normalization = bool(params.pop(\"text_normalization\", self.text_normalization))
```
bool(\"false\") → True. The xai_tts.py _build_ws_url then attempts a string-vs-bool branch, but at this point the string is already gone. Parse explicitly.
stop_connection sets _stop_requested = False at the end, so a recognition.close() callback firing on_close() immediately after stop will trigger an unintended reconnect. The reset belongs in start_connection, not stop_connection.
Reconnect-manager success contract is ambiguous: handle_reconnect returns True when the function executed, not when the connection succeeded — but the docstring promises mark_connection_successful is what determines real success. The caller in extension._handle_reconnect exits the loop on True, which means a transient failure where _connect_recognition doesn't raise (e.g., the websocket opens and immediately closes) will silently stop retrying without ever marking success.
reconnect_manager.py hard-codes MODULE_NAME_ASR even though it's structured as a generic helper. If reused (the docstring implies generality), TTS errors will be reported under module asr.
error_handler duck-typing in ReconnectManager.handle_reconnect (try ... except TypeError) — pick one signature and stick with it. Catching TypeError to detect arity differences is fragile and will mask real TypeErrors thrown inside the handler.
Race in XAIASRRecognition.send_audio_frame — checks is_connected() then await self.ready_event.wait(). If the socket dies between the two, the await may hang until the message handler finally fires on_close. Consider awaiting with a small timeout or removing the second wait once is_connected returns true.
request_tts over-length path (xai_tts_python/extension.py): _request_text_length is incremented before the 15000-char check, so the failed request leaves stale state until the generic exception handler resets it via _finalize_request. The check should run on the prospective sum before mutating state.
XAITTSExtension._ensure_client reaches into client._ws to inspect socket state. Add an is_connected() method on the client; the current coupling will silently break if the attribute is renamed.

Code quality / consistency

Config shape diverges between the two extensions: XAIASRConfig keeps everything under params; XAITTSConfig pops most fields out into typed attributes. Pick one approach so future maintainers don't have to re-learn the layout per extension.
XAIASRConfig.validate() and XAITTSConfig.validate() shadow Pydantic's reserved-ish names. Rename to validate_config() or use @model_validator to avoid v2 footguns and confusion.
api_key.startswith(\"xai-\") check in TTS config is brittle — vendor key prefixes change. Either drop or limit to a soft warning.
xai_asr_python/manifest.json includes \"docs/**\", xai_tts_python/manifest.json includes \"BUILD.gn\" and \"**.tent\" — these are copy-paste artifacts, neither directory/file exists in the package.
tests/mock.py in xai_asr_python is empty (license header only) — remove.
tests/configs/property_en_hotwords.json is byte-identical to property_en.json — either parameterize or remove.
finalize swallows asyncio.CancelledError instead of re-raising — this typically breaks structured cancellation upstream; log + re-raise is the safer pattern.
Magic constants: byte_limit=1024 * 1024 * 10, MAX_CONNECT_ATTEMPTS = 5, WS_RECV_TIMEOUT = 8.0 — 8s between audio chunks during normal speech is fine, but document that this is a hard ceiling; long sentences with high-latency networks could trip it.
xai_tts.py:_build_ws_url duplicates a list of "known" keys that already exist as fields in XAITTSConfig. Drive it from the model's field set to avoid drift.
tts_audio_start/tts_audio_end swallowed in agent.py — a silent return for these names is fine, but a one-line comment explaining why (these are vendor-emitted lifecycle data the main loop doesn't model) would help future readers.
Test files mix asyncio.run(_run()) with pytest-asyncio-style elsewhere in the repo. Not blocking, but inconsistent.

Security / robustness

Redaction is api_key-only. If users put auth tokens or signed URL components in params.*, they'll appear in logs via to_json/to_str. Consider a deny-list of additional sensitive substrings (token, secret, signature).
InvalidStatus import path (websockets.exceptions) — websockets>=15 reorganized exceptions; pin or guard against import errors. Same for websockets.protocol.State and websockets.asyncio.client.ClientConnection — these are 13+ APIs.
test_invalid_api_key_error mocks websockets.connect with Exception(\"401 Unauthorized - Invalid API key\") and relies on substring matching in _connect. Real 401 handling depends on InvalidStatus.response.status_code; this test passes despite that path not being exercised. Consider mocking with an actual InvalidStatus-shaped object.

Performance

Per-frame int(...) and datetime.now() calls in request_tts / _process_tts_text are fine, but _get_next_audio_chunk_timestamp_ms recomputes _calculate_audio_duration_ms on every chunk (O(n) over chunk count for log/dispatch). Cache the running duration.
audio_dumper.push_bytes is awaited inside send_audio on the hot path. If the dump filesystem is slow, ASR ingest backpressures. Fire-and-forget into a queue if dump perf matters.

Test coverage notes

Strong on lifecycle, finalize, reconnect ceiling, redaction, and error paths.
Missing:
- Mid-stream WebSocket disconnect during active synthesis (only error events tested, not transport drops).
- mulaw/alaw encodings (only pcm is tested) — relevant given the duration-calculation bug above.
- cancel_tts mid-stream (tests cover tts_flush but not the cancellation path that calls client.cancel()).
- Multiple concurrent request_tts interleavings.

Smaller nits

xai_asr_python/extension.py:204: self.config = XAIASRConfig.model_validate_json(\"{}\") after a failed validate creates an invalid object; downstream code should not assume self.config.params[\"api_key\"] exists.
READMEs reference cd xai_asr_python / cd xai_tts_python but the actual paths are ai_agents/agents/ten_packages/extension/....
extension.py (TTS) line if not self.api_key.startswith(\"xai-\"): — the test config files use \"xai-test-key\" (good), but property_invalid.json uses \"invalid\" and property_miss_required.json uses \"\" — both fail the empty check first, so the prefix check is never exercised in tests.

Overall: the implementation is solid and the test count is reassuring; the most important fixes before merge are the non-PCM duration math (#2), the reconnect success contract (#5/#6), and the buf UnboundLocalError (#1). The rest can land as follow-ups.

- asr send_audio: initialize buf before try to avoid UnboundLocalError if frame.lock_buf() raises - asr send_audio_frame: derive bytes_per_sample from encoding so mulaw/alaw duration math no longer hardcodes 2 bytes/sample - tts config: parse text_normalization string explicitly so "false" coerces to False rather than True via bool() - asr stop_connection: drop stop/close-expected resets at function exit; they belong in start_connection and were racing with the on_close callback - tts client: add is_connected() so the extension does not poke at private _ws state to decide if reconnect is needed - tts request_tts: check prospective length before incrementing the request text length counter so over-length requests do not leave stale state

- drop empty test mock module and duplicate hotwords test config - drop package manifest entries for paths that do not exist (docs/, BUILD.gn, **.tent) in xai_asr_python and xai_tts_python - extend sensitive-handling redaction beyond api_key to cover any params field whose name contains key/token/secret/signature/password

BenWeekes · 2026-05-18T04:28:14Z

Latest commits address all open threads:

2820e94b7 — request_event_interval_ms now last_audio_chunk_ts − first_audio_chunk_ts per the extension.py:160 comment
5ccb387e2 — six bugs flagged by claude-review:
- xai_asr_python.send_audio UnboundLocalError on buf
- recognition.send_audio_frame duration math hardcoded 2 bytes/sample (mulaw/alaw broken)
- xai_tts_python.config.update_params bool("false") coerced to True
- xai_asr_python.stop_connection reset _stop_requested at exit, racing with on_close
- _ensure_client reached into client._ws — moved to XAITTSClient.is_connected()
- request_tts over-length check ran after mutating _request_text_length
1523e7268 — manifest cleanups (dropped non-existent docs/**, BUILD.gn, **.tent includes), removed empty test mock and duplicate hotwords config, extended sensitive-key redaction beyond api_key (token / secret / signature / password)

Latest main merged in (849590024). Standalone counts still green (ASR 18, TTS 23; guarder 9 + 1 deselected / 16). Ready for another look.

github-actions · 2026-05-18T04:30:25Z

Code Review: feat: add xai asr and tts extensions

Overall, this is a well-structured, well-tested addition that follows the established patterns of other ASR/TTS extensions in this repo. Configs are validated, secrets are redacted before logging, reconnect/backoff is bounded, and there is solid mock-based test coverage on both sides. Below are the issues I found, grouped roughly by severity.

Bugs and correctness

ASR audio duration ignores channels -- in recognition.py send_audio_frame, bytes_per_sample is 1 or 2 and duration_ms is computed without dividing by channel count. The config exposes multichannel and channels but the math does not factor channels in. For stereo input, every frame reported duration will be 2x actual, drifting audio_timeline and downstream alignment. Either factor channels into the divisor or document/enforce mono only.
XAITTSConfig.text_normalization is typed bool but build_ws_url checks for str -- pydantic coerces the field to bool, so the isinstance str branch is dead code. Just assign the bool directly and let encode_query_value do the conversion.
validate() shadows pydantic API -- both XAIASRConfig.validate and XAITTSConfig.validate define an instance method named validate. BaseModel.validate is a classmethod (deprecated in v1, gone in v2). Renaming to check or validate_config or wiring into a model_validator after-mode avoids surprises.
reconnect_manager.handle_reconnect swallows real TypeErrors -- the try/except TypeError pattern detects callable arity by catching TypeError, but if error_handler is a 2-arg callable that raises TypeError for unrelated reasons (string formatting, dict access), it will silently retry with one argument and either succeed (hiding the bug) or raise a confusing second error. Detect arity explicitly with inspect.signature once, or commit to the 2-arg shape since you control all callers.
TTS first-byte timeout may be too short -- WS_RECV_TIMEOUT = 8.0 in xai_tts.py is applied to each recv, including the first chunk after text.done. Cold-path TTFB on a fresh connection can exceed 8s under load; you will surface a spurious Timeout error and force a reconnect. Consider a separate longer first-chunk timeout or making this configurable via params.
TTS 15000-char limit is hard-coded and metrics may double-count -- request_tts caps text at 15000 chars then increments both metrics_add_input_characters and metrics_add_output_characters with the same len(prepared_text). Output-character metrics typically track the synthesized text actually returned, not the input again. Worth confirming you are not double-counting. The cap should also be sourced from xAI documented limit and ideally a config knob.
finalize_request does not reset total_audio_bytes or audio_start_sent -- they get reset on the next request-id transition in request_tts, but if the same request_id is somehow re-driven (guarded by current_request_finished only after a normal finalize, not after cancel_tts), state can leak across logical requests. Reset all of audio_start_sent, total_audio_bytes, request_text_length, current_request_finished in one place after finalize for symmetry with the request-start reset.
TTS on_init error path emits with empty request_id and an empty vendor_info -- ModuleErrorVendorInfo with no code or message produces a vendor block where vendor=xai and the rest is blank; downstream parsing that expects both fields may misclassify. Either drop the vendor_info here or fill it in.
ASR start_connection returns early on missing key but does not reset stop flags -- stop_requested = False and close_expected = False are set after the early-return guard, so a subsequent retry path that calls stop_connection followed by start_connection enters with stale state. Move the resets above the guard.
emit_asr_result start-time origin is implicit -- start_ms is seconds-from-stream-start coerced to ms, then passed into audio_timeline.get_audio_duration_before_time. The translation only makes sense if the timeline shares the xAI start origin. Worth a comment or a fixture/test that uses a non-trivial FakeTimeline to lock the contract.

Code quality and best practices

Mismatched validate shape between ASR and TTS -- ASR XAIASRConfig.validate requires params api_key; TTS XAITTSConfig.validate additionally enforces api_key.startswith xai-. The ASR side accepts arbitrary key prefixes (its tests/configs/property_invalid.json literally uses the string invalid), so the test passes but real key-format mistakes will not be caught the same way. Pick one policy and apply both.
build_ws_url filtering is brittle -- the drop-these-keys, pass-the-rest-through set is duplicated between update_params (which already pops them) and build_ws_url. After update_params, self.config.params should already be free of those keys. Keeping both lists in sync is a future-bug magnet.
Inconsistent addon imports -- addon.py (TTS) does a deferred import to avoid a circular import; addon.py (ASR) imports eagerly. Be consistent.
recognition.py message_handler silently drops unknown event types -- adding a log_debug for the else branch will help when xAI ships new event types you have not mapped.
xai_tts.py connect detects 401 via substring match -- InvalidStatus should already cover the structured case; the substring fallback can misfire on unrelated errors that happen to include 401 or Unauthorized in their body. Consider dropping it and relying on InvalidStatus.
xai_tts.py get always sends text.done after a single text.delta -- that matches the test mocks but means there is no streaming-input mode for partial deltas; the docstring/README should mention this if it is the intended contract.
Punctuation-only rejection wording -- TTS request_tts only rejects punctuation-only input when both request_text_length is 0 and text_input_end is true. Punctuation in the middle of a multi-chunk request is allowed (good), but the error message could mention this caveat to make debugging easier.
tests/test_subtitle_alignment.py skip change is unrelated to xAI -- converting an error to pytest.skip is a behavior change to a shared integration test. Worth calling out separately or splitting into its own commit so it is reviewable on its own merit.

Performance

Reconnect ceiling of 10 in XAIASRExtension.on_init (vs default 4 in the manager) means up to about 40s of backoff before fatal escalation. Reasonable for transient network blips, but the manager docstring still advertises default 4. Update the docstring or the default.
XAITTSClient.get decodes base64 and yields chunks one-at-a-time. For large responses this is fine; just be aware that backpressure is implicit (the consumer must keep up with the WS). The 8s recv timeout catches a hung consumer indirectly, but that is not the same as proper backpressure.

Security

API key redaction via utils.encrypt in both XAIASRConfig.to_json and XAITTSConfig.to_str -- good.
API keys are sent via Authorization Bearer header rather than query string -- good.
The ASR side logs the connection URL at INFO. No key in it, but it includes language, endpointing, diarize, etc. -- fine, but worth confirming nothing sensitive ever sneaks into params passthrough to the URL.
SENSITIVE_SUBSTRINGS is duplicated in both xai_asr_python/config.py and xai_tts_python/config.py. Other extensions in this repo seem to repeat this pattern too, but consider promoting it into ten_ai_base.utils to avoid drift between vendors.

Test coverage

Solid unit-level mock coverage for both extensions; good split between client-mock tests (state machine) and config-only tests.
Missing: a test that exercises cancel_tts for the TTS extension and asserts the INTERRUPTED reason flows through finalize_request.
Missing: an ASR test with a non-trivial audio_timeline that proves the start_ms translation in emit_asr_result is doing what the code claims.
Missing: a test for the multichannel / channels config path (or a test that asserts non-mono input is rejected, depending on which way you decide to fix issue 1 above).
The ASR/TTS standalone tests pass claim in the PR description matches the unit tests; it would be helpful to also note whether the integration tts_guarder suite was run against the new TTS extension.

Nits

addon.py (ASR) has a stray blank line at the top of on_create_instance. Cosmetic.
README in xai_asr_python lists multichannel / diarize as supported but the code does not expose any way to consume diarization output -- clarify whether it is pass-through to the vendor only.

Summary

The code is in good shape and matches the patterns already established in sibling extensions. The most important things to address before merging are (1) the channels-aware duration calculation in ASR send_audio_frame, (2) the dead text_normalization branch in build_ws_url, (3) the TypeError-as-arity-detection in ReconnectManager, and (4) tightening the validate method naming. The rest are quality-of-life fixes that can land in follow-ups.

…onse (#2146) _first_audio_chunk_ts was being set inside the EVENT_TTS_TTFB_METRIC branch. That event is yielded by the client just before the first EVENT_TTS_RESPONSE, so the recorded "first chunk arrival" was a few microseconds earlier than the actual first chunk receive in the extension. Move the assignment into the first EVENT_TTS_RESPONSE so request_event_interval_ms is literally last_audio_chunk_ts - first_audio_chunk_ts as documented.

BenWeekes · 2026-05-18T04:36:15Z

Follow-up on 2820e94b7: _first_audio_chunk_ts was being set in the EVENT_TTS_TTFB_METRIC branch, but the client yields TTFB just before the first EVENT_TTS_RESPONSE from the same audio.delta message — so the recorded "first chunk arrival" was a few microseconds earlier than the actual first chunk processed by the extension.

Fixed in 31259ac6f — _first_audio_chunk_ts now assigned in the first EVENT_TTS_RESPONSE (None-guarded). request_event_interval_ms is now literally last_audio_chunk_ts − first_audio_chunk_ts, matching the doc note in #2165.

github-actions · 2026-05-18T04:39:24Z

PR Review: xAI ASR + TTS extensions

Nice scope: the new extensions are well-tested (40+ unit tests), the graph wiring covers all four ASR/TTS combinations, and the redact/reconnect/dump scaffolding follows the patterns from neighboring vendor extensions. Below are concrete points worth addressing before merge.

Bugs / correctness

`xai_asr_python/extension.py:send_audio` holds the audio frame locked across `await`s.
```python
buf = frame.lock_buf()
audio_data = bytes(buf) # already a copy
if self.audio_dumper:
await self.audio_dumper.push_bytes(audio_data)
await self.recognition.send_audio_frame(audio_data)
frame.unlock_buf(buf)
```
`bytes(buf)` already produces a detached copy, so the lock can (and should) be released immediately after. Holding it across two awaits adds latency and increases the chance the buffer is starved. Other vendor extensions in this repo unlock before awaiting.
`xai_asr_python/recognition.py:start` leaks the websocket on a bad startup event. If the first frame isn't `transcript.created`, the function raises but never closes `self.websocket`. Same path: if `wait_for(recv)` times out, the websocket is left open. Wrap the post-connect handshake in `try/except` and `await self.websocket.close()` on failure.
`xai_tts_python/xai_tts.py:get` sends `text.done` after every delta.
```python
await self._ws.send(json.dumps({"type": "text.delta", "delta": text}))
await self._ws.send(json.dumps({"type": "text.done"}))
```
For each call to `request_tts`, the input stream is closed regardless of whether `text_input_end` is true. Multi-chunk LLM streaming through this client effectively becomes N independent synthesis requests, which defeats partial-text streaming and may cause stuttering or extra TTFB cost between chunks. Either send `text.done` only on the final chunk, or document that this is the intended one-shot model.
`xai_tts_python/xai_tts.py:get` is not safe under concurrent `cancel`. The websocket is shared mutable state; `cancel()` calls `await self._ws.close()` while `get()` may be mid-`await self._ws.send(...)` or `recv()`. This can surface as confusing exceptions rather than clean cancellation. An `asyncio.Lock` around the active streaming session would make the intent explicit.
`xai_tts_python/xai_tts.py:_build_ws_url` has dead branching for `text_normalization`. `update_params` already coerces it to `bool` via `_coerce_bool`, so the `if isinstance(..., str)` branch never runs. Drop the conditional and just use `self.config.text_normalization`.
Stale doc/comment on `reconnect_manager.py`. The class docstring says "default: 4" for `max_attempts`, but `extension.py` constructs it with `max_attempts=10`. Update the comment or revisit the ceiling — 10 retries with the 4s cap means ~30s of reconnect storm before fatal escalation.
`xai_asr_python/extension.py:on_init` falls back to an empty config on validation failure (`self.config = XAIASRConfig.model_validate_json("{}")`). Subsequent lifecycle calls then operate on a half-initialized extension; this is brittle. Either set `self.config = None` and gate every method on it (already partially done), or refuse to enter `running` state entirely.

API / brittleness

`xai_tts_python/config.py:validate` enforces `api_key.startswith("xai-")`. Hard-coded prefix checks tend to age badly when vendors change key formats; one of the test fixtures (`property_invalid.json` uses `"invalid"`) only fails today because of this prefix, not because of a real auth check. Consider dropping the prefix check and letting the 401 handling fail loud at connect time instead — it's already wired through `_handle_connection_error`.
`XAIASRConfig.validate` shadows a Pydantic concept — easy to confuse with `model_validate`. `validate_config()` (or a `model_validator(mode="after")`) would be clearer and let you drop the explicit `apply_defaults()` + `validate()` two-step.
README lists five voices but nothing validates `voice_id`. Minor, but if xAI rejects an unknown voice the user sees a generic 4xx; a clearer client-side error would help.

Test coverage

The unit-test coverage for happy paths and finalize/error contracts is solid (`test_finalize.py`, `test_reconnect.py`, `test_robustness.py`, `test_error_msg.py`).
Gaps worth closing:
- No test for the websocket leak in (modify playground default settings，fix voice type error #2) — a bad startup event should not leave a connected socket.
- No test for the multi-chunk send pattern (Fix speech sdk #3) — assert behavior when two non-final `request_tts` calls arrive on the same `request_id`.
- `test_subtitle_alignment.py` was changed from `FileNotFoundError` to `pytest.skip`. That's the right call for providers without word-level timing, but it now also masks a missing config for providers that should support it. Consider keying the skip on a per-provider list rather than purely on file presence.

Nits

`agent.py` swallows `tts_audio_start`/`tts_audio_end` data names with an unconditional `return`. This is fine but worth a one-line comment clarifying why main_python intentionally ignores them (so a future reader doesn't assume a bug).
`xai_tts_python/property.json` and the in-tree graph configs default `XAI_API_KEY` with both `${env:XAI_API_KEY|}` (with empty fallback) and `${env:XAI_API_KEY}` (no fallback). Pick one for consistency — the empty-fallback form pairs better with the explicit "API key required" runtime error message.
`xai_tts_python/xai_tts.py` constants `MAX_CONNECT_ATTEMPTS=5` and the cap of `4.0s` are duplicated in `ReconnectManager`. Centralizing or at least documenting why the TTS path uses 5/4s while the ASR path uses 10/4s would help future ops triage.

Overall: solid first cut. Items 1–4 above are the ones I'd block on; the rest are polish.

…gnition (#2146) finalize() guarded self.recognition at entry but then re-accessed it after awaiting send_audio_done and wait_for_done. The on_close callback nulls self.recognition, so a close firing mid-finalize made the later self.recognition.done_event access raise AttributeError. Capture a local reference at the top of finalize and use it throughout so the call stays consistent even if on_close clears the attribute mid-flight.

github-actions · 2026-05-18T05:11:43Z

Review: xAI ASR & TTS extensions (#2146)

Thanks for the substantial contribution — both extensions follow the established _python extension layout, ship with extensive unit tests, and the graph wiring in voice-assistant looks consistent with the existing Deepgram/ElevenLabs variants. Below are the issues I'd want addressed before merge, grouped by severity.

Bugs / correctness

Encoding allow-list inconsistency (xai_asr_python).
config.py validates encoding against {"pcm", "mulaw", "alaw"}, but recognition.py:send_audio_frame computes bytes_per_sample for {"mulaw", "ulaw", "alaw"}. ulaw will never reach that branch (config rejects it first), and any ulaw user will get a confusing "Unsupported encoding" error even though the framing code was clearly written to handle it. Pick one canonical set — I'd suggest {"pcm", "mulaw", "alaw"} everywhere, since xAI's docs use mulaw.
Multi-channel duration math is wrong (xai_asr_python/recognition.py, send_audio_frame).
The duration calc divides only by sample_rate * bytes_per_sample and ignores channels. The config schema accepts channels and multichannel, so a user setting channels=2 will get a duration that is 2x the real wallclock — and audio_timeline then drives every downstream start_ms. Either multiply by channels here or refuse channels != 1 in validate().
text_normalization query encoding is dead/buggy logic (xai_tts.py, _build_ws_url).
XAITTSConfig.update_params already coerces text_normalization to a bool via _coerce_bool, so isinstance(self.config.text_normalization, str) is always False — meaning the conditional always falls through to the bool path. The code reads as if a string fallback was intended, but is unreachable. Either remove the dead branch, or do the coercion in one place (keep it in the config and let _encode_query_value handle the bool→"true"/"false" conversion that already exists below).
_ensure_connection doesn't notice a closed socket (xai_tts.py).
It only checks if not self._ws. If the websocket object exists but state != OPEN (e.g., a server-initiated close between requests), the next await self._ws.send(...) will raise ConnectionClosed. The recovery path does catch this (via the outer except Exception in request_tts → _reconnect_client), but it is a wasted error round-trip. Use the existing is_connected() helper here.
InvalidStatus.response.status_code may not exist on older websockets releases.
xai_tts.py references e.response.status_code. With websockets>=15.0.1 this is fine, but the requirement is unpinned upper-bound — and on older 12.x releases the attribute is just status_code. Worth either pinning more strictly or guarding with getattr.
403 should be fatal in TTS too (xai_tts_python/extension.py, _handle_connection_error).
The ASR path treats {401, 403} as fatal, but TTS only treats 401. xAI returns 403 for disabled/insufficient-quota keys; reconnecting forever in that case is the wrong behavior. Align with ASR.
Open websocket leaked when startup handshake fails (xai_asr_python/recognition.py, start).
If wait_for(self.websocket.recv(), timeout) times out, or first_event["type"] != "transcript.created", the function raises but never closes self.websocket. The caller (start_connection) catches the exception and may go into reconnect, leaving the original socket open. Wrap startup in a try/except that closes self.websocket on any failure.
recorder_map grows unbounded (xai_tts_python/extension.py).
Each new request_id adds a PCMWriter to self.recorder_map, but nothing ever removes it. _finalize_request calls flush() but doesn't del the entry. With dump=True in long-running sessions this will leak file handles and memory. Pop and close/flush after finalize.
.env.example trailing-newline change is gratuitous.
The diff converts the previous MURF_TTS_KEY= (no trailing newline) to one with a newline. Not a real bug, but it makes the diff noisier than needed for an unrelated file.

Design / API

XAIASRConfig.validate() shadows BaseModel.validate.
Pydantic v2 removed the legacy classmethod, but reusing the name is still confusing — readers will assume it is pydantic's hook and try to call it as a classmethod. Rename to validate_config() (this is the convention used in some other extensions).
API-key prefix check is over-eager (xai_tts_python/config.py, validate).
if not self.api_key.startswith("xai-") will break any user with a future-issued key that uses a different prefix, and the only thing it catches is a typo'd key — which the server will reject anyway. Drop it; let the websocket handshake produce the real auth error.
reconnect_manager.py's TypeError-fallback for the error-handler signature is fragile.
The try: error_handler(error, vendor_info) / except TypeError: error_handler(error) block swallows real TypeErrors raised inside send_asr_error (e.g., type bugs in caller code) and silently retries with a different signature, hiding the real exception. Since the only caller passes self.send_asr_error which accepts both args, just call it with two args directly — drop the fallback.
bit_rate / optimize_streaming_latency aren't validated.
Anything int passes XAITTSConfig.validate(). Negative or zero values would silently land in the URL and produce vendor errors. Either constrain ranges or document acceptable values.
Two near-identical _SENSITIVE_SUBSTRINGS blocks.
Both xai_asr_python/config.py and xai_tts_python/config.py redefine the same constant + helper. If ten_ai_base.utils.encrypt is the canonical home, consider also putting the substring list there (or at minimum factor a single helper inside the xai package to avoid drift).

Style / nits

The default XAI_DEFAULT_PARAMS in ASR config silently overwrites unrelated keys via setdefault — fine, but adding a brief comment about the intent (these are the minimum params xAI expects on the URL) would help maintainers.
XAITTSConfig.update_params returns None and mutates in place; pairs awkwardly with pydantic's normal validator hooks. A model_post_init or model_validator(mode="after") would be more idiomatic and run automatically.
TIMEOUT_CODE = 10105 in const.py is defined but never used — drop or wire it up.
Consider documenting the voice_assistant_xai_full test commands in the README's "Testing" section so reviewers can repro the demo path.

Test coverage

Test counts are healthy (40 across both extensions) and cover finalize, reconnect, error mapping, dumps, and metrics — well above the bar for an extension PR. Suggested gaps:

ASR: a test asserting bytes_per_sample math for mulaw and (if you keep multi-channel support) channels=2.
TTS: a test asserting that 403 responses are surfaced as FATAL_ERROR (today only 401 is asserted).
TTS: a test asserting recorder_map is cleaned up after _finalize_request.
Reconnect: a test that validates the state when mark_connection_successful is called from on_open after _handle_reconnect has already returned success=True — currently the success path is implicit.

Security

Sensitive-value redaction on log output looks correct (encrypt is applied to anything matching key|token|secret|signature|password). Worth double-checking that to_str(sensitive_handling=False) is never called from production logging — a quick grep shows it is only used in tests today, but a default of True everywhere would prevent foot-guns.
${env:XAI_API_KEY|} defaulting to empty in property.json is consistent with the other extensions; the config layer surfaces a proper FATAL_ERROR when missing. Good.

Housekeeping

PR title feat: add xai asr and tts extensions matches the project convention.
manifest.json versions: xai_tts_python is 0.1.1 while xai_asr_python is 0.1.0 — make sure the skew is intentional.

Overall direction is good — the fixes above are mostly localized and shouldn't require restructuring. Happy to re-review once these are addressed.

xai_asr_python: - multi-channel duration math: bytes_per_ms now multiplies by channels so channels=2 reports correct wall-clock instead of double - encoding allow-list consistency: framing now uses {pcm, mulaw, alaw} to match config.validate - startup websocket leak: close the underlying socket if the first-message handshake raises before the message handler task is started - drop the TypeError fallback in reconnect_manager.handle_reconnect; the single caller passes the two-arg form, so the fallback only masked real type bugs in error_handler - delete unused TIMEOUT_CODE constant xai_tts_python: - 403 is now classified as FATAL_ERROR alongside 401 (xAI returns 403 for disabled/insufficient-quota keys) - recorder_map entries are popped after _finalize_request to avoid leaking PCMWriter file handles in long-running sessions - drop dead isinstance(text_normalization, str) branch in _build_ws_url now that update_params coerces to bool via _coerce_bool - _ensure_connection uses is_connected() so a non-OPEN socket triggers a reconnect instead of being reused - _connect uses getattr for InvalidStatus.response.status_code so older websockets releases that expose status_code directly are handled - drop the xai- prefix check on api_key; real auth failures surface from the server handshake Tests added: - xai_asr_python.tests.test_send_audio_duration covers pcm vs mulaw and channels=2 duration math - xai_tts_python.tests.test_review_feedback covers 403 -> FATAL_ERROR classification and recorder_map cleanup after finalize

github-actions · 2026-05-18T05:37:09Z

Review of PR #2146 — `feat: add xai asr and tts extensions`

Overall this is a well-structured contribution: bounded reconnect, fatal/non-fatal classification on auth, dump support, redaction of secrets, and solid unit-test coverage on both extensions. A few correctness and consistency notes below.

Potential bugs

ASR may stop reconnecting after a finalize-induced close — extension.py:1411-1419
In on_close, when _close_expected is True (set during finalize), the handler resets the flag and returns without reconnecting. If the xAI WebSocket closes after transcript.done (which is what _close_expected exists to handle), the extension will sit disconnected; the next send_audio will simply return False, and there is no path back to start_connection. If xAI keeps the socket open after audio.done this is fine, but worth confirming against the vendor and/or adding a re-arm path. A targeted test (close after finalize, then send a new audio frame) would lock this down.
Potential double _finalize_request on TTS cancel race — extension.py (xai_tts) cancel_tts + _process_tts_text
cancel_tts calls self.client.cancel() (which closes the WS) and then _finalize_request(INTERRUPTED). If request_tts is concurrently inside _process_tts_text, the closed socket raises in client.get(), yields EVENT_TTS_ERROR, and that branch also calls _finalize_request(ERROR, …). Result: two tts_audio_end for the same request. Consider gating finalize on the current request_id (e.g. if self.current_request_id != request_id: return) or a _finalized flag.
Auth detection via substring match — xai_tts.py:5658-5662
```
if \"401\" in error_message or \"Unauthorized\" in error_message:
    raise XAITTSConnectionException(status_code=401, body=error_message)
```
A vendor message that incidentally contains 401 (e.g. \"request id 401abc...\") would be classified as fatal auth. The InvalidStatus branch above already handles real 401/403 status codes — consider dropping the string fallback, or restricting it to a strict prefix.
apply_defaults is the wrong gate for required fields — config.py (asr)
apply_defaults populates a default language, sample_rate, etc., but validate() only checks api_key/sample_rate/encoding. If a caller passes params: {}, defaults silently fill everything in. That's probably intentional, but the language default of \"en\" ends up appended to the WebSocket URL even when the caller may have wanted vendor-side detection. A short note in the README about default behavior would help.
Word text field naming — extension.py:1351
ASRWord(word=word.get(\"text\", \"\"), …). If xAI ever switches its word object to word/token instead of text, this silently emits empty strings. Worth either logging a warning when text is missing or asserting it during partial parsing.

Consistency / code quality

Two different config patterns in one PR. ASR (XAIASRConfig) keeps everything inside params: dict[str, Any] and reads via params.get(...). TTS (XAITTSConfig) hoists known keys to typed model fields and pops them out of params. Both are reasonable, but living side-by-side in the same PR makes the codebase harder to navigate. Picking one shape (ideally the typed-field one — it's stricter and gives you to_str for free) and converting the other would pay off.
ReconnectManager docstring drifts from real call site. The docstring claims max_attempts: 4 and lists delays 0.5s, 1s, 2s, 4s, but the extension instantiates it with max_attempts=10 (extension.py:1088-1090). With the same backoff curve and the 4s cap, that's ~31s of retries before fatal. Either update the docstring, or move the 10 into a named constant with a comment about why the integration guarder needs that ceiling.
Magic numbers. if self._request_text_length + len(prepared_text) > 15000: (xai_tts extension.py) and the sample-rate allowlist in both configs would read better as module-level constants (e.g. XAI_TTS_MAX_TEXT_LENGTH = 15000, SUPPORTED_SAMPLE_RATES = {...}). Same for the {401, 403} set duplicated across _is_fatal_connection_error and _handle_connection_error.
Dead code in test. XAIAsrExtensionTester.stop_test_if_checking_failed (tests/test_error_check.py) is defined but never called.
config.py:apply_defaults dead else branch. params = self.params if isinstance(self.params, dict) else {} — pydantic guarantees params is a dict via Field(default_factory=dict).
xai_tts_python requirements. websockets>=15.0.1 is fine, but pinning pydantic without any version is unusual for this repo — most other extensions either omit it (provided by base) or pin a minimum. Worth checking whether pydantic belongs in requirements.txt at all here, since ten_ai_base already depends on it.

Performance / observability

TTFB metric only fires on the first audio chunk of the first request after reset_ttfb. That's correct, but the chunk timestamp computed in _get_next_audio_chunk_timestamp_ms uses int(datetime.now().timestamp() * 1000) for the start anchor instead of _first_audio_chunk_ts. The two are taken at almost the same moment so it shouldn't matter, but using _first_audio_chunk_ts directly would make the relationship explicit.
finalize_timeout_ms = 2000 default may be aggressive on slow networks for long utterances. Worth surfacing this in the README so integrators know to bump it.

Security

API keys are redacted in to_json/to_str via utils.encrypt. ✅
WS URL is logged at info level (recognition.py:1637) — query string contains the same scalar params as the connection, but no secret. The bearer token is sent in Authorization header, not the URL. ✅
One small note: if a user puts a sensitive value in an unrecognized params key (e.g. a custom auth_token), the TTS _build_ws_url will pass it through to the query string unredacted, and that URL gets logged. Consider redacting sensitive-looking unknown keys before logging the URL, or never logging the full URL.

Test coverage

40 standalone tests is a strong baseline. A few gaps worth filling:

ASR: a test for the post-finalize close → next-audio path described in (1).
TTS: a test for the cancel/finalize race in (2), even if it's a deterministic asyncio interleaving.
TTS: a test confirming tts_audio_start is emitted exactly once per request when audio is produced.

Nothing here is a blocker for the feature itself — most are robustness or polish. Nice work overall.

BenWeekes · 2026-05-18T05:43:23Z

Second auto-review pass addressed in 78dac2ade.

ASR

recognition.send_audio_frame duration math now multiplies by channels so channels=2 reports correct wall-clock instead of double.
Encoding allow-list aligned to {pcm, mulaw, alaw} across config and framing (dropped unreachable ulaw).
recognition.start now closes the websocket if the first-message handshake raises before the message task is started.
reconnect_manager.handle_reconnect drops the TypeError fallback for error_handler arity; the single caller passes two args, so the fallback only masked real type bugs.
Removed unused TIMEOUT_CODE constant.

TTS

403 now classified as FATAL_ERROR alongside 401 (xAI returns 403 for disabled / insufficient-quota keys).
recorder_map entries popped after _finalize_request to avoid leaking PCMWriter file handles in long-running sessions.
Dead isinstance(text_normalization, str) branch removed; update_params already coerces to bool.
_ensure_connection uses is_connected() so a non-OPEN socket triggers reconnect.
_connect uses getattr for InvalidStatus.response.status_code to handle older websockets releases.
Dropped xai- prefix check on api_key; real auth failures surface from the handshake.

Tests added

xai_asr_python/tests/test_send_audio_duration.py — PCM, mulaw, and stereo PCM duration math.
xai_tts_python/tests/test_review_feedback.py — 403 → FATAL classification, 500 → NON_FATAL, recorder_map cleanup after finalize (incl. flush-failure path).

Results on 78dac2ade

TTS standalone: 27 passed
ASR standalone: 21 passed
TTS guarder: 15 passed, 1 skipped (test_subtitle_alignment opt-in)
ASR guarder: 9 passed, 1 deselected (test_long_duration_stream, repo-default skip)

) task check flagged 9 xAI files for black reformatting. Apply task format output so the formatter check passes on dev/xai.

github-actions · 2026-05-18T06:01:55Z

PR #2146 Review — xAI ASR/TTS Extensions

Thanks for the comprehensive submission — overall structure follows the existing extension conventions well, and test coverage is impressive (40+ tests across the two extensions). Below are findings, ordered by severity.

Bugs / Correctness

1. TTS sample width is hard-coded to 2 bytes, breaking mu-law/A-law duration math (xai_tts_python/extension.py:101-102)

def synthesize_audio_sample_width(self) -> int:
    return 2

XAITTSConfig.validate() accepts codecs mulaw, ulaw, alaw (1 byte/sample) and mp3 (variable bitrate), but _calculate_audio_duration_ms() uses this constant 2-byte width unconditionally. With mu-law/A-law output the reported request_total_audio_duration_ms will be 2× too long, and the TTS audio chunk timestamps in _get_next_audio_chunk_timestamp_ms() will drift. Suggest deriving width from self.config.codec (and skipping duration math entirely for mp3).

2. Codec allow-list likely overstates vendor support (xai_tts_python/config.py:51)

if self.codec not in {"pcm", "mp3", "wav", "mulaw", "ulaw", "alaw"}:

The README and graph properties only document pcm. If xAI's WebSocket API only supports pcm/mp3 today, narrow this set so users get a clear validation error rather than a cryptic vendor failure. (wav over a streaming WebSocket is also unusual — that's typically a container, not a stream codec.)

3. ASR _handle_reconnect floods the bus with error events on each retry (xai_asr_python/extension.py:897-918 + reconnect_manager.py:96-122)

Every failed retry calls error_handler and emits a non-fatal send_asr_error (with vendor info). With max_attempts=10, a transient outage produces 10 separate error events before terminal escalation. Consider only emitting a single "reconnecting" notice and the terminal fatal error, and keeping per-attempt failures at log level. The same pattern in start_connection means the first failure produces an error from start_connection AND another N from _handle_reconnect.

4. Reconnect manager type signature is inconsistent with how it's called (xai_asr_python/reconnect_manager.py:49-58)

error_handler: Optional[Callable[[ModuleError], Awaitable[None]]] = None,

But the call site invokes error_handler(error, vendor_info). Either the type hint should be Callable[..., Awaitable[None]] / a Protocol with both signatures, or always pass vendor_info (possibly None). send_asr_error already accepts both, so the conditional branch isn't necessary.

5. ASR start() rejects any non-transcript.created first event, including legitimate error frames (xai_asr_python/recognition.py:128-141)

if first_event.get("type") != "transcript.created":
    raise RuntimeError(f"Unexpected xAI STT startup event: {first_event.get('type')}")

If xAI returns an error event first (e.g. quota exceeded after auth succeeds), it surfaces as RuntimeError("Unexpected xAI STT startup event: error") rather than the actual error message, and the fatal/non-fatal classification in _extract_connection_status_code won't extract a code from it. Special-case error here and propagate the real reason.

6. TTS get() may break protocol on websocket reuse

The client sends {"type":"text.done"} after each text.delta, then reuses the same WebSocket for the next request_tts. If the xAI server treats text.done as session-ending (closes after the final audio.done), the second request will hang in _ws.recv() until WS_RECV_TIMEOUT (8s). Worth confirming with the vendor whether the socket is multi-request or single-shot, and either reconnecting after every audio.done or documenting/testing that reuse works.

7. Hard-coded MODULE_NAME_ASR in ReconnectManager (xai_asr_python/reconnect_manager.py)

Currently fine since only ASR uses it, but the class is otherwise generic — pass module_name in the constructor so it's reusable / not surprising to a future caller.

Code quality / consistency

8. XAIASRConfig.validate() / XAITTSConfig.validate() shadow Pydantic v2 conventions

Pydantic v2 has model_validate as the conventional class-level entry point; defining an instance validate() works but is easy to misread. Consider renaming to validate_config() or using a model_validator(mode="after") so the check runs automatically on construction.

9. ASR manifest version 0.4.0 for a brand-new extension (xai_asr_python/manifest.json:4)

The TTS manifest in this same PR starts at 0.1.1. Suggest aligning to 0.1.0 unless there's an external reason for the higher version.

10. Duplicated _is_sensitive_key and _SENSITIVE_SUBSTRINGS across ASR and TTS configs

Identical helpers in xai_asr_python/config.py and xai_tts_python/config.py. Worth promoting to a shared helper in ten_ai_base.utils alongside encrypt.

11. test_subtitle_alignment.py weakens existing coverage

-    raise FileNotFoundError(f"Config file not found: {config_file_path}")
+    pytest.skip(f"Config file not found: {config_file_path}. ...")

Turning a missing config into a skip means a typo in a future provider's config filename is silently green. If the goal is "xAI doesn't have word-level timing," a cleaner alternative is to gate via SUPPORTED_TTS_EXTENSIONS rather than file existence.

12. ASR validate() doesn't validate endpointing, channels, or language

Only sample_rate and encoding are checked. Negative/zero endpointing, channel counts above 2, or unknown language codes will silently flow into the URL.

13. TTS bit_rate is sent only for codec == mp3, but it's never validated

Default is 128000. If xAI rejects values outside a supported set, surfacing that at config time is friendlier than at connect time.

14. Hardcoded URL wss://api.x.ai/v1/stt appears in three places (config.py, recognition.py, property.json). The recognition._build_url() fallback ignores XAI_DEFAULT_PARAMS. Centralize the default in one place.

15. XAITTSExtension.cancel_tts() calls _finalize_request(INTERRUPTED) while request_tts() may still be iterating

Worth tracing whether the in-flight _process_tts_text loop will also hit EVENT_TTS_END/EVENT_TTS_ERROR after cancel() closes the socket and emit a second finalize. The _audio_start_sent flag guards tts_audio_start but there's no guard on send_tts_audio_end.

Security

16. _is_sensitive_key matches any field containing "key"

Safe-failing, just noting that future fields like voice_key would be over-redacted.

17. URL is logged at info level (recognition.py:1113)

The current URL is safe (Bearer token in header), but if xAI ever moves the key into a query param like some other vendors do, logging the URL would leak it. Consider scrubbing query strings before logging.

Performance

18. recognition._message_handler() json.loads every frame on the hot path

Not a real concern at typical ASR rates, just calling it out: if xAI sends partials at high frequency, a select-by-prefix could trim parsing.

19. ASR start_connection opens then immediately closes if already connected (xai_asr_python/extension.py:642-643)

if self.is_connected():
    await self.stop_connection()

Forces a close + reconnect on every start_connection call. Probably intentional, but worth a comment explaining why.

Testing

Coverage is solid. Three suggestions:

20. No test asserts duration for mu-law/A-law TTS — would have caught issue #1.

21. No test exercises the reuse-across-requests TTS path — would catch issue #6. The mocked tests bypass the real _ws_url send/recv cycle entirely.

22. test_error_check.py (ASR) connects to the real wss://api.x.ai/v1/stt with api_key=invalid_key. That test will fail in air-gapped CI. Consider mocking websockets.connect like the TTS tests do.

Minor

xai_asr_python/addon.py has an unconventional blank line after the method signature.
_extract_connection_status_code and _is_fatal_connection_error are duplicated between ASR extension.py and TTS xai_tts.py. Move to a shared util.
requirements.txt pins websockets>=15.0.1 but doesn't pin pydantic. If the rest of the repo standardizes pins, match the convention.
The TTS README lists 5 voices but the config validator doesn't constrain voice_id. Either constrain or document why it's free-form.

Summary

The architecture and lifecycle handling (start / connect / finalize / reconnect / stop) are clean and follow patterns from neighboring extensions. The most important things to address before merge are #1 (sample width), #3 (error event flooding), #5 (startup error event handling), and #6 (websocket reuse semantics) — those are correctness/UX bugs visible in production. The rest are quality / consistency polish.

Nice work on the test coverage and the explicit audio.done finalize flow.

…ension (#2146) pylint W0611 flagged LOG_CATEGORY_VENDOR as unused in xai_tts_python/extension.py after recent edits removed the only usage. The ci lint job treats any pylint warning as fatal so the import has to go.

github-actions · 2026-05-18T06:18:18Z

PR Review: feat: add xai asr and tts extensions

Thanks for the substantial contribution — the extensions are well-structured, follow existing patterns from sibling extensions, and the test coverage is generous (40 test functions across both extensions). Below is a focused review with the most useful findings up top.

🔴 Bugs / correctness

1. XAITTSClient._connect_exp_cnt is never reset between reconnect cycles — xai_tts_python/xai_tts.py, _connect_with_backoff
After a fully-failed _connect_with_backoff cycle, _connect_exp_cnt ends at MAX_CONNECT_ATTEMPTS (5) and is only reset to 0 in _connect() on success. The next call to _connect_with_backoff (from _reconnect() or _ensure_connection()) will see while self._connect_exp_cnt < MAX_CONNECT_ATTEMPTS evaluate False, skip the loop entirely, and immediately raise XAITTSConnectionException(status_code=503, body="xAI TTS connection failed") (with last_error=None). The TTS client effectively becomes permanently un-reconnectable after one bad burst. Recommend resetting _connect_exp_cnt = 0 at the entry of _connect_with_backoff (or using a local counter for the per-cycle bound).

2. _connect_with_backoff ignores cancellation — xai_tts_python/xai_tts.py
stop() sets _is_cancelled = True but the backoff loop doesn't check it. A stop() during a 2–4s asyncio.sleep will still keep retrying. Consider checking self._is_cancelled in the loop and bailing early.

3. send_audio holds the audio buffer locked across awaits — xai_asr_python/extension.py, send_audio
The pattern is:
```python
buf = frame.lock_buf()
audio_data = bytes(buf)
if self.audio_dumper:
await self.audio_dumper.push_bytes(audio_data) # await with buffer locked
await self.recognition.send_audio_frame(audio_data) # await with buffer locked
frame.unlock_buf(buf)
```
`audio_data = bytes(buf)` already makes a copy, so the buffer can be unlocked immediately after that line. Holding the lock across `await`s can stall upstream pipeline frames unnecessarily. Most sibling ASR extensions copy-then-unlock first.

4. Double-unlock risk on success path — same function
`buf` is not set to `None` after `frame.unlock_buf(buf)` in the success branch. If anything between the unlock and `return True` raised, the `except` branch would unlock again. Set `buf = None` immediately after `unlock_buf` to make the cleanup branch safe under any future edits.

5. XAITTSConfig.update_params() is destructive and not idempotent — xai_tts_python/config.py
It uses `params.pop(...)` to extract typed fields out of the `params` dict. Calling `update_params()` a second time would no longer find `api_key`, `base_url`, etc. and would silently fall back to the (already-set) instance fields. That's fine today (called once in `on_init`) but it's a footgun. Prefer `params.get(...)` and then explicitly drop the extracted keys, so the call is idempotent and intent is clear.

6. XAIASRConfig.validate() shadows a Pydantic name historically used internally
Pydantic v2 removed the legacy `BaseModel.validate` classmethod, so this technically doesn't clash today, but `model_validate` is the canonical entry point and a new `validate()` instance method on a Pydantic model can confuse readers (and tooling). Consider renaming to `validate_config()` or `check()`. Same applies to `XAITTSConfig.validate()`.

🟡 Code quality / style

7. Magic numbers scattered

`15000` character limit in `xai_tts_python/extension.py` — surface as a module-level constant or config field.
`1024 * 1024 * 10` in `buffer_strategy()` — sibling extensions usually name this.
Reconnect ceilings (`max_attempts=10` in extension, `max_attempts=4` in manager docstring) — the docstring still says "default: 4" while the extension overrides to 10. Worth a sentence noting why ASR uses a higher ceiling than the manager default.

8. XAIASRRecognitionCallback uses @abstractmethod without ABC/ABCMeta
The `@abstractmethod` decorations are decorative only here — `XAIASRRecognitionCallback` is a plain class, so abstractness isn't enforced. Either inherit from `abc.ABC`, switch to `typing.Protocol`, or drop the decorators.

9. Reconnect-manager docstring drift
The class docstring claims "Exponential backoff strategy with maximum delay cap: 0.5s, 1s, 2s, 4s (capped)" but the extension passes `max_attempts=10`, which yields `0.5, 1, 2, 4, 4, 4, 4, 4, 4, 4`. Update the docstring or parameterize the example.

10. on_init error path leaves a partially-initialized ASR config
On exception during config parse/validate, the ASR extension does `self.config = XAIASRConfig.model_validate_json("{}")` and sends a FATAL error, but `start_connection()` will then re-detect the missing api_key and emit a second FATAL error. Not strictly broken (the upstream control plane likely shuts the extension down on the first FATAL), but the duplicate error path is noisy. Consider an explicit `self._init_failed = True` short-circuit.

11. _emit_asr_result + final=True with empty text
`finalize()` calls `_emit_asr_result(payload, final=True, locked=False)` only `if payload and payload.get("text")`, so an empty-text final from the vendor never reaches `_emit_asr_result`. But the guard `if not text and not final: return` inside the function still allows empty-text finals to propagate downstream. If that's intentional (so consumers see a "final empty" signal), worth a comment; if not, the guard should be `if not text: return`.

🟢 Security

API keys redacted via `utils.encrypt(...)` before logging — good. `_is_sensitive_key` substring matching is a reasonable belt-and-suspenders for `params` blobs.
Bearer token sent in `Authorization` header — appropriate (not in URL query params, where it would land in proxy logs).
However, `XAITTSClient._build_ws_url` iterates `self.config.params.items()` (after `update_params()` has popped typed keys) and forwards anything left in the dict into the URL. If a user accidentally puts a sensitive value (e.g., a non-standard `auth_token` key) into `params`, it will land in the URL as a query string. Consider re-applying the `_is_sensitive_key` filter on the leftover params here as well.
No path-traversal validation on `dump_path`. Today this is operator-supplied via property, so it's lower priority, but `os.makedirs(dump_path, ...)` will create arbitrary directories. If extensions are ever exposed to untrusted config sources, this needs hardening.

🟡 Performance

TTS reconnect backoff caps at 4s after 4 attempts (0.5, 1, 2, 4, 4); ASR caps at the same. Once the `_connect_exp_cnt` bug (Update README.md #1) is fixed, consider whether you want jitter to avoid thundering-herd reconnects on multi-replica deployments.
`request_tts` re-opens a fresh websocket per `cancel()`. For high-cancel-rate workloads (frequent barge-ins) this may add latency; an option to keep the connection warm with a `text.cancel` sentinel (if the vendor supports one) would amortize handshake cost. Not blocking — just worth noting in a follow-up.

🟢 Test coverage

Solid breadth: connection failure, reconnect ceiling, vendor errors, dump, finalize, params, metrics, robustness, state machine, review feedback.
`test_reconnect.py` validates the manager-level escalation, but I didn't see a regression test for the TTS `_connect_exp_cnt` exhaustion case (item Update README.md #1). A unit test that runs `_connect_with_backoff` to exhaustion, then calls it again with a now-working underlying `_connect`, would have caught the bug above.
`test_subtitle_alignment.py` change (raise → `pytest.skip`) is a nice quality-of-life improvement, but the skip message could mention which providers the test still applies to (the file already has `SUPPORTED_TTS_EXTENSIONS = {"cartesia_tts"}`).

🟢 Misc / polish

`agent.py` change to silently drop `tts_audio_start`/`tts_audio_end` events (rather than logging "Unhandled data") is reasonable, but worth a one-line comment explaining that those events are consumed elsewhere in the graph and the main agent intentionally ignores them.
`recognition.py` raises `RuntimeError` for non-`transcript.created` first event but does not include the full event in the message — including the raw payload would help debugging.
`xai_asr_python/init.py` and `xai_tts_python/init.py` only export `addon`, matching siblings — good.
Conventional-commits + branch-naming look right per `AGENTS.md`.

Summary

Overall this is a strong, well-tested extension pair. The most important item to address before merge is #1 (TTS reconnect counter exhaustion) — it's a latent bug that will surface in production after the first sustained xAI outage. #2, #3, #5 are also worth fixing before merge. The rest are polish.

Nice work — happy to re-review once the TTS reconnect fix lands.

@AbstractMethod

Critical fix: - xai_tts_python.xai_tts.XAITTSClient._connect_with_backoff used the persistent self._connect_exp_cnt as the loop predicate. After a fully failed cycle the counter sat at MAX_CONNECT_ATTEMPTS and the next call skipped the loop, raising 503 with no real attempt. Switch to a local attempt counter scoped to each cycle so the client recovers after an outage. Add regression test test_connect_backoff_recovers_after_exhausted_cycle. Cancellation: - _connect_with_backoff now checks self._is_cancelled and bails with 499 instead of continuing to retry during shutdown. New test test_connect_backoff_respects_cancellation covers this. Bug fixes: - xai_asr_python.send_audio: copy then unlock immediately instead of holding the audio buffer locked across awaits. set buf=None after unlock so the except branch cannot double-unlock under future edits. - xai_tts_python.config.update_params: switch from params.pop() to params.get() + explicit cleanup so a second call is idempotent. - Rename XAIASRConfig.validate / XAITTSConfig.validate to validate_config to avoid shadowing the pydantic BaseModel name. Update callers and tests. - xai_asr_python.on_init now sets _init_failed when the config fails to parse so start_connection short-circuits instead of emitting a second FATAL error for the same root cause. - _emit_asr_result drops the empty-text-with-final carve-out; finalize already filters on payload.get("text"). Quality: - Surface magic constants: MAX_REQUEST_TEXT_CHARS in xai_tts_python, AUDIO_BUFFER_BYTE_LIMIT and RECONNECT_MAX_ATTEMPTS in xai_asr_python. - XAIASRRecognitionCallback now inherits from abc.ABC so the @AbstractMethod decorators are enforced. - Reconnect manager docstring updated to reflect that callers can override max_attempts (ASR uses 10). - Recognition.start RuntimeError on unexpected first event now includes the raw payload for easier debugging. - TTS backoff adds 80-120% jitter so multi-replica deployments do not reconnect at the same millisecond after a shared outage. Security: - XAITTSClient._build_ws_url now drops leftover params whose name looks sensitive (key/token/secret/signature/password) instead of forwarding them into the URL where they could land in proxy logs.

…equest (#2146) XAITTSClient.cancel() sets both _is_cancelled and _needs_reconnect. The next get() call previously reached _reconnect()/_ensure_connection() with _is_cancelled still True; after the cancellation guard was added to _connect_with_backoff() this caused the backoff loop to short-circuit with a 499 and the request following a barge-in to fail immediately. Move the _is_cancelled = False reset to the top of get() (after the empty-text early return, before _reconnect) so a fresh request always clears the per-request cancel flag before any connection work. Add test_get_after_cancel_reconnects_instead_of_failing to lock in the recovery behavior.

feat(ai-agents): add xai speech extensions

920218b

BenWeekes requested review from halajohn and plutoless as code owners April 24, 2026 08:51

fix(ai-agents): route tts lifecycle events in xai graphs

871f777

fix(ai-agents): stabilize xai asr guarder flows (#2146)

88d8ca4

wangyimin-agora reviewed Apr 30, 2026

View reviewed changes

fix(ai-agents): address xai speech review feedback (#2146)

50f116c

BenWeekes pushed a commit that referenced this pull request Apr 30, 2026

merge: bring xai review fixes into dev/ben-graphs (#2146)

e42e01c

fix(ai-agents): remove xai tts text result emission (#2146)

6f6e9f5

BenWeekes pushed a commit that referenced this pull request Apr 30, 2026

test(ai-agents): skip optional tts subtitle alignment when config mis…

df3f0b9

…sing (#2146)

BenWeekes pushed a commit that referenced this pull request Apr 30, 2026

merge: bring xai tts text-result removal into dev/ben-graphs (#2146)

39a1109

Merge remote-tracking branch 'origin/main' into dev/xai

8495900

# Conflicts: # ai_agents/.env.example

wangyimin-agora reviewed May 18, 2026

View reviewed changes

Ubuntu added 2 commits May 18, 2026 04:19

BenWeekes mentioned this pull request May 18, 2026

docs: progressive disclosure updates from xai extension work #2165

Open

2 tasks

style(ai-agents): apply black formatting to xai speech extensions (#2146

0ff99b3

) task check flagged 9 xAI files for black reformatting. Apply task format output so the formatter check passes on dev/xai.

Ubuntu added 2 commits May 18, 2026 06:38

wangyoucao577 approved these changes May 18, 2026

View reviewed changes

wangyoucao577 merged commit 4e645cd into main May 18, 2026
33 of 34 checks passed

wangyoucao577 deleted the dev/xai branch May 18, 2026 08:29

Conversation

BenWeekes commented Apr 24, 2026

Summary

What Changed

Testing

Notes

Uh oh!

github-actions Bot commented Apr 24, 2026

Uh oh!

github-actions Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: xAI ASR and TTS Extensions

Uh oh!

github-actions Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: xAI ASR and TTS Extensions

Bugs / Correctness

API Key Validation Leaks Test Logic into Production

Missing Manifest Properties

Fatal-Error String Matching is Fragile

Minor / Style

Summary

Uh oh!

github-actions Bot commented Apr 24, 2026

Uh oh!

github-actions Bot commented Apr 24, 2026

Uh oh!

github-actions Bot commented Apr 24, 2026

Code Review: xAI ASR + TTS Extensions

Bugs

Code Quality

Completeness / Consistency

Minor Nits

Summary

Uh oh!

wangyimin-agora Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

wangyimin-agora Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wangyimin-agora Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

wangyimin-agora Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Apr 30, 2026

Code Review: PR #2146 — Add xAI ASR and TTS Extensions

Overview

Issues

Bug: TypeError swallowed in ReconnectManager.handle_reconnect

Bug: TIMEOUT_CODE = 10105 in const.py is defined but never used

Design smell: _ensure_client accesses private internals

Fragile API key prefix validation in TTS config

Test config: property_en_hotwords.json is a copy of property_en.json

Minor Issues

What's Good

Summary

Uh oh!

github-actions Bot commented May 14, 2026

Code Review — feat: add xai asr and tts extensions

Bugs

Code quality / minor

Security

Performance

Test coverage

Uh oh!

wangyimin-agora May 18, 2026

Choose a reason for hiding this comment

Uh oh!

BenWeekes May 18, 2026

Choose a reason for hiding this comment

github-actions Bot commented Apr 24, 2026 •

edited

Loading

github-actions Bot commented Apr 24, 2026 •

edited

Loading

github-actions Bot commented Apr 24, 2026 •

edited

Loading

github-actions Bot commented Apr 24, 2026 •

edited

Loading

wangyimin-agora Apr 30, 2026 •

edited

Loading

Bug: `TypeError` swallowed in `ReconnectManager.handle_reconnect`

Bug: `TIMEOUT_CODE = 10105` in `const.py` is defined but never used

Design smell: `_ensure_client` accesses private internals

Test config: `property_en_hotwords.json` is a copy of `property_en.json`

Code Review — `feat: add xai asr and tts extensions`

Review of PR #2146 — `feat: add xai asr and tts extensions`