Skip to content

feat: add xai asr and tts extensions#2146

Merged
wangyoucao577 merged 16 commits into
mainfrom
dev/xai
May 18, 2026
Merged

feat: add xai asr and tts extensions#2146
wangyoucao577 merged 16 commits into
mainfrom
dev/xai

Conversation

@BenWeekes
Copy link
Copy Markdown
Contributor

Summary

  • add xai_asr_python and xai_tts_python TEN extensions
  • add xAI graph variants to the voice-assistant example
  • add xAI env/config wiring and usage notes for local and demo testing
  • add standalone test coverage for both new extensions

What Changed

  • added ai_agents/agents/ten_packages/extension/xai_asr_python
    • WebSocket STT client with explicit audio.done finalize flow
    • bounded reconnect handling with fatal escalation for non-recoverable auth/config failures
    • result mapping for interim, locked, and final transcript states
    • dump support, metrics, and vendor error classification
  • added ai_agents/agents/ten_packages/extension/xai_tts_python
    • WebSocket TTS client using PCM output
    • bounded reconnect / backoff with auth failures treated as fatal
    • cancellation and request-finalization behavior aligned with AsyncTTS2BaseExtension
    • config redaction, lifecycle logging, and invalid punctuation-only request handling
  • updated ai_agents/agents/examples/voice-assistant/tenapp/manifest.json
    • add xAI extension dependencies
  • updated ai_agents/agents/examples/voice-assistant/tenapp/property.json
    • add voice_assistant_xai_asr
    • add voice_assistant_xai_tts
    • add voice_assistant_xai_full
  • updated ai_agents/agents/examples/voice-assistant/README.md
    • document graph selection and xAI usage
  • updated ai_agents/.env.example
    • add XAI_API_KEY

Testing

  • standalone TTS tests: 23 passed
  • standalone ASR tests: 17 passed
  • validated the demo graph wiring on voice_assistant_xai_full

Notes

  • companion repo docs updates were pushed to docs/progressive-disclosure in TEN Framework
  • companion portal guide updates were pushed to improve/tts-extension-guide in TEN Portal

@github-actions
Copy link
Copy Markdown

test

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 24, 2026

placeholder being replaced

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 24, 2026

edit test

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 24, 2026

Code Review: xAI ASR and TTS Extensions

Good overall implementation - both extensions follow the TEN Framework patterns well, have bounded reconnect logic, proper fatal/non-fatal error classification, config redaction, and solid test coverage (17 ASR + 23 TTS tests). The graph wiring and README updates are clean.

Below are the issues worth addressing before merge, ordered by severity.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 24, 2026

Code Review: xAI ASR and TTS Extensions

Good overall implementation - both extensions follow the TEN Framework patterns well, have bounded reconnect logic, proper fatal/non-fatal error classification, config redaction, and solid test coverage (17 ASR + 23 TTS tests). The graph wiring and README updates are clean.

Below are the issues worth addressing before merge, ordered by severity.


Bugs / Correctness

1. XAIASRConfig.to_json() returns Python repr, not JSON

In xai_asr_python/config.py, to_json() ends with return str(config_dict). str() on a dict produces Python __repr__ output (single-quoted keys, True/False instead of true/false) which is not valid JSON. This should be return json.dumps(config_dict). If this string is ever parsed downstream it will fail.

2. _message_handler re-fires on_open() on duplicate transcript.created

In recognition.py, when a transcript.created event arrives inside the running _message_handler loop, callback.on_open() is called again. That re-triggers the buffer-flush and timeline reset in the extension. If the server re-sends this event mid-session it will corrupt in-flight audio state. Guard this branch so on_open() is only forwarded once per connection.

3. Redundant copy.deepcopy after json.loads in _flush_buffered_audio_frames

json.loads already returns a fresh object tree, so wrapping it in copy.deepcopy is unnecessary overhead on every buffered frame.

4. _ensure_connection does not verify the socket is still open (TTS)

self._ws could reference a closed connection if stop() or cancel() raised during the finally block. Consider also checking self._ws.state == State.OPEN, mirroring how XAIASRRecognition.is_connected() works.


API Key Validation Leaks Test Logic into Production

5. xai_tts_python/config.py validate() accepts "test" prefix

The validation accepts api_key values starting with "test" to bypass format checking in tests, but the error message says the key must start with "xai-". Remove the "test" branch and update tests to use a properly formatted dummy key like "xai-test-key" (which already passes the xai- check).


Missing Manifest Properties

6. xai_asr_python/manifest.json omits dump, dump_path, and finalize_timeout_ms

These three properties are present in property.json and documented in the README but absent from api.property.properties in the manifest. Other extensions in this repo declare all settable properties in the manifest so TMAN can discover and validate them. Add: dump (bool), dump_path (string), finalize_timeout_ms (int32).


Fatal-Error String Matching is Fragile

7. ASR _is_fatal_connection_error and on_error

Both use substring scans on free-form error strings, including the literal string "api key" which could match unrelated messages. The TTS extension handles this correctly by catching InvalidStatus and threading a numeric status_code through XAITTSConnectionException. The ASR path should do the same - surface the status code through the recognition callback rather than scanning the message text.


Minor / Style

8. Missing license header in xai_asr_python/__init__.py - the TTS counterpart has the Apache-2.0 header; the ASR one does not.

9. Missing newlines at end of files: xai_asr_python/requirements.txt and xai_asr_python/tests/bin/start both show no trailing newline (git diff reports no newline at end of file) in the diff.

10. validate() shadows Pydantic's built-in in both configs. Both XAIASRConfig and XAITTSConfig define a validate() method that shadows the Pydantic class-level validator name. Consider renaming to validate_config() or check_required_fields().

11. Duplicated conftest.py - both test directories contain byte-for-byte identical conftest.py. Consider extracting to a shared fixture module.

12. Stale test log string in xai_tts_python/tests/test_basic.py: text="Hello, this is a test of the Deepgram TTS extension." should reference xAI.


Summary

Severity Count
Bug / Correctness 4
Validation / Security 2
Missing config surface 1
Minor / Style 6

The top priorities before merge are items 1 (invalid JSON from to_json()), 5 (test bypass in production validate()), and 6 (missing manifest properties). The rest are good-to-fix but not blockers.

@github-actions
Copy link
Copy Markdown

PR Review: xAI ASR and TTS Extensions - test post

@github-actions
Copy link
Copy Markdown

PR Review: xAI ASR and TTS Extensions. Issues to fix: (1) on_open called twice in recognition.py - both start() and _message_handler call the callback on transcript.created, causing duplicate metrics and buffer flushes if server re-sends; (2) buf potentially unbound in xai_asr_python send_audio - if lock_buf raises, buf is undefined in the except handler, initialize buf=None before the try block; (3) property_en_hotwords.json is identical to property_en.json with no hotword impl - remove or implement. Issues to address: (4) XAIASRConfig.validate() shadows Pydantic class method, rename to validate_config(); (5) TTS api key startswith('test') check is too broad; (6) turn_status values 1 and 2 are undocumented magic numbers; (7) isinstance str check for text_normalization is dead code since the field is typed bool; (8) missing trailing newlines in requirements.txt and tests/bin/start; (9) test_basic.py copy-paste error says Deepgram not xAI. Positives: great reconnect logic with backoff and fatal escalation, API key redaction, _contains_spoken_content guard, comprehensive test coverage (timeout, flush, punctuation, reconnect-after-error, auth failure), clean agent.py update for the new event types. Generated by Claude Code

@github-actions
Copy link
Copy Markdown

Code Review: xAI ASR + TTS Extensions

Great overall structure. The two extensions follow existing repo conventions closely and come with thorough test coverage (17 ASR + 23 TTS tests). The reconnection logic, error classification, and sensitive-data redaction are all solid. A few issues worth addressing before merge.

Bugs

1. Potential UnboundLocalError in xai_asr_python/extension.py send_audio

If frame.lock_buf() itself raises (e.g. the frame is already invalid), buf is never bound, and the except block's frame.unlock_buf(buf) will throw a second exception masking the original. Prefer the standard try/finally pattern:

buf = frame.lock_buf()
try:
    audio_data = bytes(buf)
    ...
    return True
except Exception as e:
    self.ten_env.log_error(f"Error sending audio: {e}")
    return False
finally:
    frame.unlock_buf(buf)

Code Quality

2. _connection_successful flag in ReconnectManager is dead code

self._connection_successful is assigned in __init__ and handle_reconnect but never read. Remove it.

3. try/except TypeError for dynamic dispatch is fragile (reconnect_manager.py)

handle_reconnect catches TypeError to detect whether the error handler accepts 1 or 2 arguments. This silently swallows unrelated TypeErrors raised inside the handler. Since the only call-site is XAIASRExtension._handle_reconnect, use a consistent 2-arg signature throughout.

4. Empty on_deinit override in TTS extension (extension.py)

The override only calls super() with no additional logic. Delete it.

5. BUILD.gn in xai_tts_python/manifest.json package includes

BUILD.gn is auto-generated and in .gitignore. AGENTS.md says not to include auto-generated files. xai_asr_python correctly omits it. Remove from xai_tts_python's manifest.

Completeness / Consistency

6. dump, dump_path, and finalize_timeout_ms missing from ASR manifest property schema

xai_asr_python/manifest.json only declares params under api.property.properties. The fields dump, dump_path, and finalize_timeout_ms are used by XAIASRConfig but are absent from the schema, so they are invisible to TMAN Designer and TEN's property system. Compare with xai_tts_python/manifest.json which correctly declares dump and dump_path.

7. property_en_hotwords.json is byte-for-byte identical to property_en.json

Both files in xai_asr_python/tests/configs/ have the same content. The name implies hotwords support which xAI STT does not expose here. Either add a meaningful difference or remove the file.

Minor Nits

8. Misleading validation error message for TTS API key (config.py)

validate() allows keys starting with xai- OR test, but the error message only mentions xai-. Update the message to reflect both accepted prefixes.

9. Copy-paste error in TTS test (test_basic.py)

The basic test sends "Hello, this is a test of the Deepgram TTS extension." — should reference xAI, not Deepgram.

Summary

Severity Item
Bug lock_buf exception path can produce UnboundLocalError in send_audio
Code quality Dead _connection_successful flag in ReconnectManager
Code quality try/except TypeError for dispatch in handle_reconnect
Code quality Pointless on_deinit override in TTS extension
Code quality BUILD.gn in TTS manifest package includes
Completeness dump/dump_path/finalize_timeout_ms missing from ASR manifest schema
Completeness property_en_hotwords.json identical to property_en.json
Nit Misleading API key error message
Nit "Deepgram" copy-paste in xAI TTS test

def _calculate_request_event_interval_ms(self) -> int:
if self.sent_ts is None:
return 0
return int((datetime.now() - self.sent_ts).total_seconds() * 1000)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interval time is formulate is "current time - first audio receive time", not first request send time

code=ModuleErrorCode.NON_FATAL_ERROR,
vendor_info=ModuleErrorVendorInfo(vendor=self.vendor()),
)
await self._finalize_request(TTSAudioEndReason.ERROR, error=error)
Copy link
Copy Markdown

@wangyimin-agora wangyimin-agora Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_finalize_request only for handling last request of the turn.
if t.text_input_end:

if start_ms <= 0:
start_ms = int(datetime.now().timestamp() * 1000)

transcript_result = TTSTextResult(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TTSTextResult is not necessary. Actually, we only use it when we can get a word-level timestamp. if this tts vendor cannot provide, we can remove it

message=e.body,
),
)
await self._finalize_request(TTSAudioEndReason.ERROR, error=error)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_finalize_request only for handling last request of the turn.
if t.text_input_end:

@github-actions
Copy link
Copy Markdown

Code Review: PR #2146 — Add xAI ASR and TTS Extensions

Overview

This PR adds two well-structured TEN extensions — xai_asr_python (WebSocket STT) and xai_tts_python (WebSocket TTS) — along with three new voice-assistant graph variants and comprehensive test coverage. Overall this is high-quality work following the project's existing patterns. A few issues worth addressing before merge.


Issues

Bug: TypeError swallowed in ReconnectManager.handle_reconnect

reconnect_manager.py:

try:
    await error_handler(error, vendor_info)
except TypeError:
    await error_handler(error)

Using exception handling to detect function signatures is fragile — if error_handler raises a genuine TypeError internally, the manager silently retries without vendor_info. The callers (XAIASRExtension.send_asr_error) have a stable signature, so this safety valve isn't needed. Call await error_handler(error, vendor_info) directly (or await error_handler(error) when vendor_name is None).

Bug: TIMEOUT_CODE = 10105 in const.py is defined but never used

xai_asr_python/const.py defines TIMEOUT_CODE = 10105 but nothing in the ASR extension uses it. Should be wired up or removed.

Design smell: _ensure_client accesses private internals

xai_tts_python/extension.py:

ws = getattr(self.client, "_ws", None)
if ws is not None and getattr(ws, "state", None) == State.OPEN:
    return

This reaches into XAITTSClient's private state. XAITTSClient should expose an is_connected() -> bool method (as XAIASRRecognition already does) and _ensure_client should call that instead.

Fragile API key prefix validation in TTS config

xai_tts_python/config.py:

if not self.api_key.startswith("xai-"):
    raise ValueError("API key must start with 'xai-'")

This will break if xAI changes their key format, and there is no equivalent check in the ASR extension. Either apply it consistently or drop it — the server will reject invalid keys with a 401 anyway.

Test config: property_en_hotwords.json is a copy of property_en.json

xai_asr_python/tests/configs/property_en_hotwords.json and property_en.json are byte-for-byte identical. If hotwords aren't implemented yet, remove the duplicate rather than leaving a misleading placeholder.


Minor Issues

  • Docstring/code mismatch: ReconnectManager docstring says max_attempts=4, but extension.py instantiates it with max_attempts=10. Update the docstring.
  • Unnecessary override: xai_tts_python/extension.py's on_deinit does nothing except call super() — remove it.
  • Reconnect cost on cancel: XAITTSClient.cancel() closes the WebSocket and sets _needs_reconnect = True, causing a full TLS + WS handshake on the next utterance after every barge-in. Consider draining the socket instead of closing it, or pre-warming the connection after cancel. Not a blocker, but a known latency trade-off worth documenting.
  • State reset scattered across request_tts: The ~6 fields reset in multiple code paths (_request_text, _request_metadata, total_audio_bytes, etc.) would benefit from a private _reset_request_state() helper to avoid future divergence.
  • Duplicate api_key check: start_connection in xai_asr_python/extension.py re-checks for an empty API key even though config.validate() already raised on it during on_init. The guard is harmless but adds noise.

What's Good

  • Reconnection logic is well-designed: bounded retries with exponential backoff, auth failures correctly escalated as fatal, successful connections reset the counter.
  • Both extensions handle the audio.done/finalize flow correctly, including timeout handling and the _close_expected guard against spurious reconnects.
  • Config redaction with encrypt() before logging is consistently applied in both extensions.
  • Excellent test coverage — 40 tests covering: 401 errors, timeout, reconnect-after-error, sequential requests, flush, dump verification, TTFB metrics, config redaction, and language normalization.
  • The agent.py fix silently dropping tts_audio_start/tts_audio_end is a clean way to handle the new data events without polluting the log with warnings.
  • Property graphs are correct and follow the established pattern.

Summary

The implementation is solid and well-tested. The TypeError-swallowing pattern in ReconnectManager and the private-field access in _ensure_client are the most important items to address. The rest are cleanup/polish.

# Conflicts:
#	ai_agents/.env.example
@github-actions
Copy link
Copy Markdown

Code Review — feat: add xai asr and tts extensions

Thanks for the thorough work — the extensions are well-structured, follow the existing *_asr_python / *_tts_python conventions, and ship with substantial test coverage (17 ASR + 23 TTS). Most of the heavy lifting (config validation, redaction, dump support, finalize flow, bounded reconnect with auth-failure escalation) is in good shape. A few concrete issues below, ordered roughly by severity.

Bugs

1. XAIASRExtension._handle_reconnect() passes kwargs that ReconnectManager.handle_reconnect() does not accept — this will raise TypeError the first time a reconnect happens.

xai_asr_python/extension.py calls:

success = await self.reconnect_manager.handle_reconnect(
    connection_func=self._connect_recognition,
    error_handler=self.send_asr_error,
    vendor_name=self.vendor(),       # not a parameter
    vendor_code=\"connect_failed\",    # not a parameter
)

But ReconnectManager.handle_reconnect is defined as:

async def handle_reconnect(self, connection_func, error_handler=None) -> bool:

The current test_reconnect.py invokes the manager directly with positional args, so the unit tests don't catch this. Either drop the extra kwargs from the call site or extend the manager's signature (and forward vendor_name/vendor_code into a ModuleErrorVendorInfo so vendor info actually surfaces through the error path — currently error_handler is invoked with only a ModuleError, which means send_asr_error would receive no vendor info on a reconnect-failure escalation).

2. XAIASRExtension.send_audio — possible UnboundLocalError — if frame.lock_buf() itself raises, buf is undefined when the except branch tries frame.unlock_buf(buf). Initialize buf = None (and skip the unlock if still None), or move lock_buf() out of the try so the unlock-on-error path is reachable only when buf was successfully obtained.

3. XAITTSClient._connect() has no open_timeout — an unreachable host can hang the websocket connect indefinitely. Compare with xai_asr_python/recognition.py which does pass open_timeout=timeout. Recommend adding open_timeout=10 (or surfacing as config).

Code quality / minor

4. XAITTSConfig.validate() requires api_key.startswith(\"xai-\") — this couples the extension to the current xAI key prefix. If xAI ever issues a key with a different prefix (or you ever want to test against a proxy/gateway), this breaks. Consider relaxing to a non-empty check, or at least making it a warn-then-proceed.

5. XAIASRConfig.validate() shadows pydantic's BaseModel.validate — works under pydantic v2 (where validate is removed), but the name collision is confusing. Consider renaming to validate_params() for clarity (the TTS config has the same naming).

6. XAITTSConfig.update_params + XAITTSClient._build_ws_url double-handles text_normalizationupdate_params already coerces to bool, so the isinstance(self.config.text_normalization, str) branch in _build_ws_url is dead. Drop one of the two coercions.

7. XAIASRRecognition._message_handler — when start() consumes the initial transcript.created event, then a second transcript.created from the server during the handler loop will fire on_open() again (re-emitting connect-delay metrics, re-setting audio_timeline). Likely benign in practice, but an early-return after the first on_open would make it deterministic.

8. ASR fatal-error classification is limited to 401/403_is_fatal_connection_error doesn't consider non-recoverable cases like 400/404 (bad URL/params), and conversely 429 (rate-limited) is recoverable but currently treated like any other error. Worth widening the fatal set and adding 429 → non-fatal-with-backoff.

9. tests/configs/property_en_hotwords.json is byte-identical to property_en.json — looks like a placeholder. Either populate it with hotword-specific config or remove it.

10. Voice-assistant property.json graph duplication — three new graphs (*_xai_asr, *_xai_tts, *_xai_full) duplicate ~600 lines of node + connection wiring. Not a blocker (matches existing convention), but worth flagging for a future refactor of the example.

11. Inconsistent env-var defaulting in graphs — extension property.json uses ${env:XAI_API_KEY|} (empty default), example graphs use ${env:XAI_API_KEY} (no default). The latter will surface a clearer error if unset; pick one and apply consistently.

12. manifest.json version 0.4.0 — unusual starting version for a brand-new extension; assume this is a copy-paste from an existing template — fine to leave but worth noting.

Security

  • API key handled via ${env:...} and redacted via utils.encrypt in to_str/to_json
  • recognition.py and xai_tts.py log full WS URL including query params (Connecting to xAI STT: {url}). Query params don't carry the API key (which is in the Authorization header), so no leak — but consider redacting URL logs anyway in case future params include sensitive data.
  • .env.example correctly leaves XAI_API_KEY= empty ✅.

Performance

  • TTS audio chunk timestamps are computed from accumulated bytes — straightforward and correct.
  • Both clients use a single websocket per request lifetime with bounded backoff (4 attempts ASR / 5 attempts TTS). Reasonable.
  • ASR buffered_frames is drained on on_open — good. No flow-control on the audio send path; if upstream produces faster than xAI accepts, await self._ws.send(...) will provide natural backpressure ✅.

Test coverage

Good unit-level coverage for both extensions:

  • ASR: config validation, redaction, language normalization, finalize-with-timeout, dump path, reconnect escalation, metric emission, vendor error, ASR result shape.
  • TTS: dump, basic audio, flush, error paths (empty params, invalid key), TTFB metric, params, robustness (empty/whitespace text), state machine.

Gaps to consider:

Once the three bugs above are addressed, this is in good shape to merge. Nice work overall.

return int(duration_sec * 1000)

def _calculate_request_event_interval_ms(self) -> int:
return self._request_event_interval_ms
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interval time is the last audio receive time - first audio receive time. not ttfb.
It represents the audio length of the current request id

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2820e94. _calculate_request_event_interval_ms now returns _last_audio_chunk_ts - _first_audio_chunk_ts. The TTFB metric emitted via send_tts_ttfb_metrics is unchanged.

request_event_interval_ms now measures last audio chunk arrival minus
first audio chunk arrival, representing the wall-clock audio receive
window for the request. Previously the field stored TTFB which was the
wrong semantic per maintainer feedback.

The TTFB metric emitted via send_tts_ttfb_metrics is unchanged.
@github-actions
Copy link
Copy Markdown

Review of xAI ASR/TTS extensions

Thanks for the thorough work — the test coverage is impressive (40 tests) and the bounded-reconnect / fatal escalation pattern is consistent. Below are issues grouped by severity.

Bugs

  1. UnboundLocalError in xai_asr_python/extension.py:send_audio — if frame.lock_buf() raises, buf is referenced in the except block before assignment. Initialize buf = None before the try, or move the lock outside.

  2. Wrong audio duration for non-PCM encodings in recognition.py:send_audio_frame:

    duration_ms = int(len(audio_data) / (sample_rate / 1000 * 2))

    Hard-codes 2 bytes/sample, but encoding=mulaw|alaw (advertised in config) is 1 byte/sample. Audio timeline math drifts for those encodings.

  3. text_normalization boolean coercion in xai_tts_python/config.py:update_params:

    self.text_normalization = bool(params.pop(\"text_normalization\", self.text_normalization))

    bool(\"false\")True. The xai_tts.py _build_ws_url then attempts a string-vs-bool branch, but at this point the string is already gone. Parse explicitly.

  4. stop_connection sets _stop_requested = False at the end, so a recognition.close() callback firing on_close() immediately after stop will trigger an unintended reconnect. The reset belongs in start_connection, not stop_connection.

  5. Reconnect-manager success contract is ambiguous: handle_reconnect returns True when the function executed, not when the connection succeeded — but the docstring promises mark_connection_successful is what determines real success. The caller in extension._handle_reconnect exits the loop on True, which means a transient failure where _connect_recognition doesn't raise (e.g., the websocket opens and immediately closes) will silently stop retrying without ever marking success.

  6. reconnect_manager.py hard-codes MODULE_NAME_ASR even though it's structured as a generic helper. If reused (the docstring implies generality), TTS errors will be reported under module asr.

  7. error_handler duck-typing in ReconnectManager.handle_reconnect (try ... except TypeError) — pick one signature and stick with it. Catching TypeError to detect arity differences is fragile and will mask real TypeErrors thrown inside the handler.

  8. Race in XAIASRRecognition.send_audio_frame — checks is_connected() then await self.ready_event.wait(). If the socket dies between the two, the await may hang until the message handler finally fires on_close. Consider awaiting with a small timeout or removing the second wait once is_connected returns true.

  9. request_tts over-length path (xai_tts_python/extension.py): _request_text_length is incremented before the 15000-char check, so the failed request leaves stale state until the generic exception handler resets it via _finalize_request. The check should run on the prospective sum before mutating state.

  10. XAITTSExtension._ensure_client reaches into client._ws to inspect socket state. Add an is_connected() method on the client; the current coupling will silently break if the attribute is renamed.

Code quality / consistency

  • Config shape diverges between the two extensions: XAIASRConfig keeps everything under params; XAITTSConfig pops most fields out into typed attributes. Pick one approach so future maintainers don't have to re-learn the layout per extension.
  • XAIASRConfig.validate() and XAITTSConfig.validate() shadow Pydantic's reserved-ish names. Rename to validate_config() or use @model_validator to avoid v2 footguns and confusion.
  • api_key.startswith(\"xai-\") check in TTS config is brittle — vendor key prefixes change. Either drop or limit to a soft warning.
  • xai_asr_python/manifest.json includes \"docs/**\", xai_tts_python/manifest.json includes \"BUILD.gn\" and \"**.tent\" — these are copy-paste artifacts, neither directory/file exists in the package.
  • tests/mock.py in xai_asr_python is empty (license header only) — remove.
  • tests/configs/property_en_hotwords.json is byte-identical to property_en.json — either parameterize or remove.
  • finalize swallows asyncio.CancelledError instead of re-raising — this typically breaks structured cancellation upstream; log + re-raise is the safer pattern.
  • Magic constants: byte_limit=1024 * 1024 * 10, MAX_CONNECT_ATTEMPTS = 5, WS_RECV_TIMEOUT = 8.08s between audio chunks during normal speech is fine, but document that this is a hard ceiling; long sentences with high-latency networks could trip it.
  • xai_tts.py:_build_ws_url duplicates a list of "known" keys that already exist as fields in XAITTSConfig. Drive it from the model's field set to avoid drift.
  • tts_audio_start/tts_audio_end swallowed in agent.py — a silent return for these names is fine, but a one-line comment explaining why (these are vendor-emitted lifecycle data the main loop doesn't model) would help future readers.
  • Test files mix asyncio.run(_run()) with pytest-asyncio-style elsewhere in the repo. Not blocking, but inconsistent.

Security / robustness

  • Redaction is api_key-only. If users put auth tokens or signed URL components in params.*, they'll appear in logs via to_json/to_str. Consider a deny-list of additional sensitive substrings (token, secret, signature).
  • InvalidStatus import path (websockets.exceptions) — websockets>=15 reorganized exceptions; pin or guard against import errors. Same for websockets.protocol.State and websockets.asyncio.client.ClientConnection — these are 13+ APIs.
  • test_invalid_api_key_error mocks websockets.connect with Exception(\"401 Unauthorized - Invalid API key\") and relies on substring matching in _connect. Real 401 handling depends on InvalidStatus.response.status_code; this test passes despite that path not being exercised. Consider mocking with an actual InvalidStatus-shaped object.

Performance

  • Per-frame int(...) and datetime.now() calls in request_tts / _process_tts_text are fine, but _get_next_audio_chunk_timestamp_ms recomputes _calculate_audio_duration_ms on every chunk (O(n) over chunk count for log/dispatch). Cache the running duration.
  • audio_dumper.push_bytes is awaited inside send_audio on the hot path. If the dump filesystem is slow, ASR ingest backpressures. Fire-and-forget into a queue if dump perf matters.

Test coverage notes

  • Strong on lifecycle, finalize, reconnect ceiling, redaction, and error paths.
  • Missing:
    • Mid-stream WebSocket disconnect during active synthesis (only error events tested, not transport drops).
    • mulaw/alaw encodings (only pcm is tested) — relevant given the duration-calculation bug above.
    • cancel_tts mid-stream (tests cover tts_flush but not the cancellation path that calls client.cancel()).
    • Multiple concurrent request_tts interleavings.

Smaller nits

  • xai_asr_python/extension.py:204: self.config = XAIASRConfig.model_validate_json(\"{}\") after a failed validate creates an invalid object; downstream code should not assume self.config.params[\"api_key\"] exists.
  • READMEs reference cd xai_asr_python / cd xai_tts_python but the actual paths are ai_agents/agents/ten_packages/extension/....
  • extension.py (TTS) line if not self.api_key.startswith(\"xai-\"): — the test config files use \"xai-test-key\" (good), but property_invalid.json uses \"invalid\" and property_miss_required.json uses \"\" — both fail the empty check first, so the prefix check is never exercised in tests.

Overall: the implementation is solid and the test count is reassuring; the most important fixes before merge are the non-PCM duration math (#2), the reconnect success contract (#5/#6), and the buf UnboundLocalError (#1). The rest can land as follow-ups.

Ubuntu added 2 commits May 18, 2026 04:19
- asr send_audio: initialize buf before try to avoid UnboundLocalError if
  frame.lock_buf() raises
- asr send_audio_frame: derive bytes_per_sample from encoding so mulaw/alaw
  duration math no longer hardcodes 2 bytes/sample
- tts config: parse text_normalization string explicitly so "false" coerces
  to False rather than True via bool()
- asr stop_connection: drop stop/close-expected resets at function exit; they
  belong in start_connection and were racing with the on_close callback
- tts client: add is_connected() so the extension does not poke at private
  _ws state to decide if reconnect is needed
- tts request_tts: check prospective length before incrementing the request
  text length counter so over-length requests do not leave stale state
- drop empty test mock module and duplicate hotwords test config
- drop package manifest entries for paths that do not exist (docs/, BUILD.gn,
  **.tent) in xai_asr_python and xai_tts_python
- extend sensitive-handling redaction beyond api_key to cover any params
  field whose name contains key/token/secret/signature/password
@BenWeekes
Copy link
Copy Markdown
Contributor Author

Latest commits address all open threads:

  • 2820e94b7request_event_interval_ms now last_audio_chunk_ts − first_audio_chunk_ts per the extension.py:160 comment
  • 5ccb387e2 — six bugs flagged by claude-review:
    • xai_asr_python.send_audio UnboundLocalError on buf
    • recognition.send_audio_frame duration math hardcoded 2 bytes/sample (mulaw/alaw broken)
    • xai_tts_python.config.update_params bool("false") coerced to True
    • xai_asr_python.stop_connection reset _stop_requested at exit, racing with on_close
    • _ensure_client reached into client._ws — moved to XAITTSClient.is_connected()
    • request_tts over-length check ran after mutating _request_text_length
  • 1523e7268 — manifest cleanups (dropped non-existent docs/**, BUILD.gn, **.tent includes), removed empty test mock and duplicate hotwords config, extended sensitive-key redaction beyond api_key (token / secret / signature / password)

Latest main merged in (849590024). Standalone counts still green (ASR 18, TTS 23; guarder 9 + 1 deselected / 16). Ready for another look.

@github-actions
Copy link
Copy Markdown

Code Review: feat: add xai asr and tts extensions

Overall, this is a well-structured, well-tested addition that follows the established patterns of other ASR/TTS extensions in this repo. Configs are validated, secrets are redacted before logging, reconnect/backoff is bounded, and there is solid mock-based test coverage on both sides. Below are the issues I found, grouped roughly by severity.

Bugs and correctness

  1. ASR audio duration ignores channels -- in recognition.py send_audio_frame, bytes_per_sample is 1 or 2 and duration_ms is computed without dividing by channel count. The config exposes multichannel and channels but the math does not factor channels in. For stereo input, every frame reported duration will be 2x actual, drifting audio_timeline and downstream alignment. Either factor channels into the divisor or document/enforce mono only.

  2. XAITTSConfig.text_normalization is typed bool but build_ws_url checks for str -- pydantic coerces the field to bool, so the isinstance str branch is dead code. Just assign the bool directly and let encode_query_value do the conversion.

  3. validate() shadows pydantic API -- both XAIASRConfig.validate and XAITTSConfig.validate define an instance method named validate. BaseModel.validate is a classmethod (deprecated in v1, gone in v2). Renaming to check or validate_config or wiring into a model_validator after-mode avoids surprises.

  4. reconnect_manager.handle_reconnect swallows real TypeErrors -- the try/except TypeError pattern detects callable arity by catching TypeError, but if error_handler is a 2-arg callable that raises TypeError for unrelated reasons (string formatting, dict access), it will silently retry with one argument and either succeed (hiding the bug) or raise a confusing second error. Detect arity explicitly with inspect.signature once, or commit to the 2-arg shape since you control all callers.

  5. TTS first-byte timeout may be too short -- WS_RECV_TIMEOUT = 8.0 in xai_tts.py is applied to each recv, including the first chunk after text.done. Cold-path TTFB on a fresh connection can exceed 8s under load; you will surface a spurious Timeout error and force a reconnect. Consider a separate longer first-chunk timeout or making this configurable via params.

  6. TTS 15000-char limit is hard-coded and metrics may double-count -- request_tts caps text at 15000 chars then increments both metrics_add_input_characters and metrics_add_output_characters with the same len(prepared_text). Output-character metrics typically track the synthesized text actually returned, not the input again. Worth confirming you are not double-counting. The cap should also be sourced from xAI documented limit and ideally a config knob.

  7. finalize_request does not reset total_audio_bytes or audio_start_sent -- they get reset on the next request-id transition in request_tts, but if the same request_id is somehow re-driven (guarded by current_request_finished only after a normal finalize, not after cancel_tts), state can leak across logical requests. Reset all of audio_start_sent, total_audio_bytes, request_text_length, current_request_finished in one place after finalize for symmetry with the request-start reset.

  8. TTS on_init error path emits with empty request_id and an empty vendor_info -- ModuleErrorVendorInfo with no code or message produces a vendor block where vendor=xai and the rest is blank; downstream parsing that expects both fields may misclassify. Either drop the vendor_info here or fill it in.

  9. ASR start_connection returns early on missing key but does not reset stop flags -- stop_requested = False and close_expected = False are set after the early-return guard, so a subsequent retry path that calls stop_connection followed by start_connection enters with stale state. Move the resets above the guard.

  10. emit_asr_result start-time origin is implicit -- start_ms is seconds-from-stream-start coerced to ms, then passed into audio_timeline.get_audio_duration_before_time. The translation only makes sense if the timeline shares the xAI start origin. Worth a comment or a fixture/test that uses a non-trivial FakeTimeline to lock the contract.

Code quality and best practices

  • Mismatched validate shape between ASR and TTS -- ASR XAIASRConfig.validate requires params api_key; TTS XAITTSConfig.validate additionally enforces api_key.startswith xai-. The ASR side accepts arbitrary key prefixes (its tests/configs/property_invalid.json literally uses the string invalid), so the test passes but real key-format mistakes will not be caught the same way. Pick one policy and apply both.
  • build_ws_url filtering is brittle -- the drop-these-keys, pass-the-rest-through set is duplicated between update_params (which already pops them) and build_ws_url. After update_params, self.config.params should already be free of those keys. Keeping both lists in sync is a future-bug magnet.
  • Inconsistent addon imports -- addon.py (TTS) does a deferred import to avoid a circular import; addon.py (ASR) imports eagerly. Be consistent.
  • recognition.py message_handler silently drops unknown event types -- adding a log_debug for the else branch will help when xAI ships new event types you have not mapped.
  • xai_tts.py connect detects 401 via substring match -- InvalidStatus should already cover the structured case; the substring fallback can misfire on unrelated errors that happen to include 401 or Unauthorized in their body. Consider dropping it and relying on InvalidStatus.
  • xai_tts.py get always sends text.done after a single text.delta -- that matches the test mocks but means there is no streaming-input mode for partial deltas; the docstring/README should mention this if it is the intended contract.
  • Punctuation-only rejection wording -- TTS request_tts only rejects punctuation-only input when both request_text_length is 0 and text_input_end is true. Punctuation in the middle of a multi-chunk request is allowed (good), but the error message could mention this caveat to make debugging easier.
  • tests/test_subtitle_alignment.py skip change is unrelated to xAI -- converting an error to pytest.skip is a behavior change to a shared integration test. Worth calling out separately or splitting into its own commit so it is reviewable on its own merit.

Performance

  • Reconnect ceiling of 10 in XAIASRExtension.on_init (vs default 4 in the manager) means up to about 40s of backoff before fatal escalation. Reasonable for transient network blips, but the manager docstring still advertises default 4. Update the docstring or the default.
  • XAITTSClient.get decodes base64 and yields chunks one-at-a-time. For large responses this is fine; just be aware that backpressure is implicit (the consumer must keep up with the WS). The 8s recv timeout catches a hung consumer indirectly, but that is not the same as proper backpressure.

Security

  • API key redaction via utils.encrypt in both XAIASRConfig.to_json and XAITTSConfig.to_str -- good.
  • API keys are sent via Authorization Bearer header rather than query string -- good.
  • The ASR side logs the connection URL at INFO. No key in it, but it includes language, endpointing, diarize, etc. -- fine, but worth confirming nothing sensitive ever sneaks into params passthrough to the URL.
  • SENSITIVE_SUBSTRINGS is duplicated in both xai_asr_python/config.py and xai_tts_python/config.py. Other extensions in this repo seem to repeat this pattern too, but consider promoting it into ten_ai_base.utils to avoid drift between vendors.

Test coverage

  • Solid unit-level mock coverage for both extensions; good split between client-mock tests (state machine) and config-only tests.
  • Missing: a test that exercises cancel_tts for the TTS extension and asserts the INTERRUPTED reason flows through finalize_request.
  • Missing: an ASR test with a non-trivial audio_timeline that proves the start_ms translation in emit_asr_result is doing what the code claims.
  • Missing: a test for the multichannel / channels config path (or a test that asserts non-mono input is rejected, depending on which way you decide to fix issue 1 above).
  • The ASR/TTS standalone tests pass claim in the PR description matches the unit tests; it would be helpful to also note whether the integration tts_guarder suite was run against the new TTS extension.

Nits

  • addon.py (ASR) has a stray blank line at the top of on_create_instance. Cosmetic.
  • README in xai_asr_python lists multichannel / diarize as supported but the code does not expose any way to consume diarization output -- clarify whether it is pass-through to the vendor only.

Summary

The code is in good shape and matches the patterns already established in sibling extensions. The most important things to address before merging are (1) the channels-aware duration calculation in ASR send_audio_frame, (2) the dead text_normalization branch in build_ws_url, (3) the TypeError-as-arity-detection in ReconnectManager, and (4) tightening the validate method naming. The rest are quality-of-life fixes that can land in follow-ups.

…onse (#2146)

_first_audio_chunk_ts was being set inside the EVENT_TTS_TTFB_METRIC branch.
That event is yielded by the client just before the first EVENT_TTS_RESPONSE,
so the recorded "first chunk arrival" was a few microseconds earlier than the
actual first chunk receive in the extension. Move the assignment into the
first EVENT_TTS_RESPONSE so request_event_interval_ms is literally
last_audio_chunk_ts - first_audio_chunk_ts as documented.
@BenWeekes
Copy link
Copy Markdown
Contributor Author

Follow-up on 2820e94b7: _first_audio_chunk_ts was being set in the EVENT_TTS_TTFB_METRIC branch, but the client yields TTFB just before the first EVENT_TTS_RESPONSE from the same audio.delta message — so the recorded "first chunk arrival" was a few microseconds earlier than the actual first chunk processed by the extension.

Fixed in 31259ac6f_first_audio_chunk_ts now assigned in the first EVENT_TTS_RESPONSE (None-guarded). request_event_interval_ms is now literally last_audio_chunk_ts − first_audio_chunk_ts, matching the doc note in #2165.

@github-actions
Copy link
Copy Markdown

PR Review: xAI ASR + TTS extensions

Nice scope: the new extensions are well-tested (40+ unit tests), the graph wiring covers all four ASR/TTS combinations, and the redact/reconnect/dump scaffolding follows the patterns from neighboring vendor extensions. Below are concrete points worth addressing before merge.

Bugs / correctness

  1. `xai_asr_python/extension.py:send_audio` holds the audio frame locked across `await`s.
    ```python
    buf = frame.lock_buf()
    audio_data = bytes(buf) # already a copy
    if self.audio_dumper:
    await self.audio_dumper.push_bytes(audio_data)
    await self.recognition.send_audio_frame(audio_data)
    frame.unlock_buf(buf)
    ```
    `bytes(buf)` already produces a detached copy, so the lock can (and should) be released immediately after. Holding it across two awaits adds latency and increases the chance the buffer is starved. Other vendor extensions in this repo unlock before awaiting.

  2. `xai_asr_python/recognition.py:start` leaks the websocket on a bad startup event. If the first frame isn't `transcript.created`, the function raises but never closes `self.websocket`. Same path: if `wait_for(recv)` times out, the websocket is left open. Wrap the post-connect handshake in `try/except` and `await self.websocket.close()` on failure.

  3. `xai_tts_python/xai_tts.py:get` sends `text.done` after every delta.
    ```python
    await self._ws.send(json.dumps({"type": "text.delta", "delta": text}))
    await self._ws.send(json.dumps({"type": "text.done"}))
    ```
    For each call to `request_tts`, the input stream is closed regardless of whether `text_input_end` is true. Multi-chunk LLM streaming through this client effectively becomes N independent synthesis requests, which defeats partial-text streaming and may cause stuttering or extra TTFB cost between chunks. Either send `text.done` only on the final chunk, or document that this is the intended one-shot model.

  4. `xai_tts_python/xai_tts.py:get` is not safe under concurrent `cancel`. The websocket is shared mutable state; `cancel()` calls `await self._ws.close()` while `get()` may be mid-`await self._ws.send(...)` or `recv()`. This can surface as confusing exceptions rather than clean cancellation. An `asyncio.Lock` around the active streaming session would make the intent explicit.

  5. `xai_tts_python/xai_tts.py:_build_ws_url` has dead branching for `text_normalization`. `update_params` already coerces it to `bool` via `_coerce_bool`, so the `if isinstance(..., str)` branch never runs. Drop the conditional and just use `self.config.text_normalization`.

  6. Stale doc/comment on `reconnect_manager.py`. The class docstring says "default: 4" for `max_attempts`, but `extension.py` constructs it with `max_attempts=10`. Update the comment or revisit the ceiling — 10 retries with the 4s cap means ~30s of reconnect storm before fatal escalation.

  7. `xai_asr_python/extension.py:on_init` falls back to an empty config on validation failure (`self.config = XAIASRConfig.model_validate_json("{}")`). Subsequent lifecycle calls then operate on a half-initialized extension; this is brittle. Either set `self.config = None` and gate every method on it (already partially done), or refuse to enter `running` state entirely.

API / brittleness

  1. `xai_tts_python/config.py:validate` enforces `api_key.startswith("xai-")`. Hard-coded prefix checks tend to age badly when vendors change key formats; one of the test fixtures (`property_invalid.json` uses `"invalid"`) only fails today because of this prefix, not because of a real auth check. Consider dropping the prefix check and letting the 401 handling fail loud at connect time instead — it's already wired through `_handle_connection_error`.

  2. `XAIASRConfig.validate` shadows a Pydantic concept — easy to confuse with `model_validate`. `validate_config()` (or a `model_validator(mode="after")`) would be clearer and let you drop the explicit `apply_defaults()` + `validate()` two-step.

  3. README lists five voices but nothing validates `voice_id`. Minor, but if xAI rejects an unknown voice the user sees a generic 4xx; a clearer client-side error would help.

Test coverage

  • The unit-test coverage for happy paths and finalize/error contracts is solid (`test_finalize.py`, `test_reconnect.py`, `test_robustness.py`, `test_error_msg.py`).
  • Gaps worth closing:
    • No test for the websocket leak in (modify playground default settings,fix voice type error #2) — a bad startup event should not leave a connected socket.
    • No test for the multi-chunk send pattern (Fix speech sdk #3) — assert behavior when two non-final `request_tts` calls arrive on the same `request_id`.
    • `test_subtitle_alignment.py` was changed from `FileNotFoundError` to `pytest.skip`. That's the right call for providers without word-level timing, but it now also masks a missing config for providers that should support it. Consider keying the skip on a per-provider list rather than purely on file presence.

Nits

  • `agent.py` swallows `tts_audio_start`/`tts_audio_end` data names with an unconditional `return`. This is fine but worth a one-line comment clarifying why main_python intentionally ignores them (so a future reader doesn't assume a bug).
  • `xai_tts_python/property.json` and the in-tree graph configs default `XAI_API_KEY` with both `${env:XAI_API_KEY|}` (with empty fallback) and `${env:XAI_API_KEY}` (no fallback). Pick one for consistency — the empty-fallback form pairs better with the explicit "API key required" runtime error message.
  • `xai_tts_python/xai_tts.py` constants `MAX_CONNECT_ATTEMPTS=5` and the cap of `4.0s` are duplicated in `ReconnectManager`. Centralizing or at least documenting why the TTS path uses 5/4s while the ASR path uses 10/4s would help future ops triage.

Overall: solid first cut. Items 1–4 above are the ones I'd block on; the rest are polish.

…gnition (#2146)

finalize() guarded self.recognition at entry but then re-accessed it after
awaiting send_audio_done and wait_for_done. The on_close callback nulls
self.recognition, so a close firing mid-finalize made the later
self.recognition.done_event access raise AttributeError. Capture a local
reference at the top of finalize and use it throughout so the call stays
consistent even if on_close clears the attribute mid-flight.
@github-actions
Copy link
Copy Markdown

Review: xAI ASR & TTS extensions (#2146)

Thanks for the substantial contribution — both extensions follow the established _python extension layout, ship with extensive unit tests, and the graph wiring in voice-assistant looks consistent with the existing Deepgram/ElevenLabs variants. Below are the issues I'd want addressed before merge, grouped by severity.

Bugs / correctness

  1. Encoding allow-list inconsistency (xai_asr_python).
    config.py validates encoding against {"pcm", "mulaw", "alaw"}, but recognition.py:send_audio_frame computes bytes_per_sample for {"mulaw", "ulaw", "alaw"}. ulaw will never reach that branch (config rejects it first), and any ulaw user will get a confusing "Unsupported encoding" error even though the framing code was clearly written to handle it. Pick one canonical set — I'd suggest {"pcm", "mulaw", "alaw"} everywhere, since xAI's docs use mulaw.

  2. Multi-channel duration math is wrong (xai_asr_python/recognition.py, send_audio_frame).
    The duration calc divides only by sample_rate * bytes_per_sample and ignores channels. The config schema accepts channels and multichannel, so a user setting channels=2 will get a duration that is 2x the real wallclock — and audio_timeline then drives every downstream start_ms. Either multiply by channels here or refuse channels != 1 in validate().

  3. text_normalization query encoding is dead/buggy logic (xai_tts.py, _build_ws_url).
    XAITTSConfig.update_params already coerces text_normalization to a bool via _coerce_bool, so isinstance(self.config.text_normalization, str) is always False — meaning the conditional always falls through to the bool path. The code reads as if a string fallback was intended, but is unreachable. Either remove the dead branch, or do the coercion in one place (keep it in the config and let _encode_query_value handle the bool→"true"/"false" conversion that already exists below).

  4. _ensure_connection doesn't notice a closed socket (xai_tts.py).
    It only checks if not self._ws. If the websocket object exists but state != OPEN (e.g., a server-initiated close between requests), the next await self._ws.send(...) will raise ConnectionClosed. The recovery path does catch this (via the outer except Exception in request_tts_reconnect_client), but it is a wasted error round-trip. Use the existing is_connected() helper here.

  5. InvalidStatus.response.status_code may not exist on older websockets releases.
    xai_tts.py references e.response.status_code. With websockets>=15.0.1 this is fine, but the requirement is unpinned upper-bound — and on older 12.x releases the attribute is just status_code. Worth either pinning more strictly or guarding with getattr.

  6. 403 should be fatal in TTS too (xai_tts_python/extension.py, _handle_connection_error).
    The ASR path treats {401, 403} as fatal, but TTS only treats 401. xAI returns 403 for disabled/insufficient-quota keys; reconnecting forever in that case is the wrong behavior. Align with ASR.

  7. Open websocket leaked when startup handshake fails (xai_asr_python/recognition.py, start).
    If wait_for(self.websocket.recv(), timeout) times out, or first_event["type"] != "transcript.created", the function raises but never closes self.websocket. The caller (start_connection) catches the exception and may go into reconnect, leaving the original socket open. Wrap startup in a try/except that closes self.websocket on any failure.

  8. recorder_map grows unbounded (xai_tts_python/extension.py).
    Each new request_id adds a PCMWriter to self.recorder_map, but nothing ever removes it. _finalize_request calls flush() but doesn't del the entry. With dump=True in long-running sessions this will leak file handles and memory. Pop and close/flush after finalize.

  9. .env.example trailing-newline change is gratuitous.
    The diff converts the previous MURF_TTS_KEY= (no trailing newline) to one with a newline. Not a real bug, but it makes the diff noisier than needed for an unrelated file.

Design / API

  1. XAIASRConfig.validate() shadows BaseModel.validate.
    Pydantic v2 removed the legacy classmethod, but reusing the name is still confusing — readers will assume it is pydantic's hook and try to call it as a classmethod. Rename to validate_config() (this is the convention used in some other extensions).

  2. API-key prefix check is over-eager (xai_tts_python/config.py, validate).
    if not self.api_key.startswith("xai-") will break any user with a future-issued key that uses a different prefix, and the only thing it catches is a typo'd key — which the server will reject anyway. Drop it; let the websocket handshake produce the real auth error.

  3. reconnect_manager.py's TypeError-fallback for the error-handler signature is fragile.
    The try: error_handler(error, vendor_info) / except TypeError: error_handler(error) block swallows real TypeErrors raised inside send_asr_error (e.g., type bugs in caller code) and silently retries with a different signature, hiding the real exception. Since the only caller passes self.send_asr_error which accepts both args, just call it with two args directly — drop the fallback.

  4. bit_rate / optimize_streaming_latency aren't validated.
    Anything int passes XAITTSConfig.validate(). Negative or zero values would silently land in the URL and produce vendor errors. Either constrain ranges or document acceptable values.

  5. Two near-identical _SENSITIVE_SUBSTRINGS blocks.
    Both xai_asr_python/config.py and xai_tts_python/config.py redefine the same constant + helper. If ten_ai_base.utils.encrypt is the canonical home, consider also putting the substring list there (or at minimum factor a single helper inside the xai package to avoid drift).

Style / nits

  • The default XAI_DEFAULT_PARAMS in ASR config silently overwrites unrelated keys via setdefault — fine, but adding a brief comment about the intent (these are the minimum params xAI expects on the URL) would help maintainers.
  • XAITTSConfig.update_params returns None and mutates in place; pairs awkwardly with pydantic's normal validator hooks. A model_post_init or model_validator(mode="after") would be more idiomatic and run automatically.
  • TIMEOUT_CODE = 10105 in const.py is defined but never used — drop or wire it up.
  • Consider documenting the voice_assistant_xai_full test commands in the README's "Testing" section so reviewers can repro the demo path.

Test coverage

Test counts are healthy (40 across both extensions) and cover finalize, reconnect, error mapping, dumps, and metrics — well above the bar for an extension PR. Suggested gaps:

  • ASR: a test asserting bytes_per_sample math for mulaw and (if you keep multi-channel support) channels=2.
  • TTS: a test asserting that 403 responses are surfaced as FATAL_ERROR (today only 401 is asserted).
  • TTS: a test asserting recorder_map is cleaned up after _finalize_request.
  • Reconnect: a test that validates the state when mark_connection_successful is called from on_open after _handle_reconnect has already returned success=True — currently the success path is implicit.

Security

  • Sensitive-value redaction on log output looks correct (encrypt is applied to anything matching key|token|secret|signature|password). Worth double-checking that to_str(sensitive_handling=False) is never called from production logging — a quick grep shows it is only used in tests today, but a default of True everywhere would prevent foot-guns.
  • ${env:XAI_API_KEY|} defaulting to empty in property.json is consistent with the other extensions; the config layer surfaces a proper FATAL_ERROR when missing. Good.

Housekeeping

  • PR title feat: add xai asr and tts extensions matches the project convention.
  • manifest.json versions: xai_tts_python is 0.1.1 while xai_asr_python is 0.1.0 — make sure the skew is intentional.

Overall direction is good — the fixes above are mostly localized and shouldn't require restructuring. Happy to re-review once these are addressed.

xai_asr_python:
- multi-channel duration math: bytes_per_ms now multiplies by channels so
  channels=2 reports correct wall-clock instead of double
- encoding allow-list consistency: framing now uses {pcm, mulaw, alaw} to
  match config.validate
- startup websocket leak: close the underlying socket if the first-message
  handshake raises before the message handler task is started
- drop the TypeError fallback in reconnect_manager.handle_reconnect; the
  single caller passes the two-arg form, so the fallback only masked real
  type bugs in error_handler
- delete unused TIMEOUT_CODE constant

xai_tts_python:
- 403 is now classified as FATAL_ERROR alongside 401 (xAI returns 403 for
  disabled/insufficient-quota keys)
- recorder_map entries are popped after _finalize_request to avoid leaking
  PCMWriter file handles in long-running sessions
- drop dead isinstance(text_normalization, str) branch in _build_ws_url
  now that update_params coerces to bool via _coerce_bool
- _ensure_connection uses is_connected() so a non-OPEN socket triggers a
  reconnect instead of being reused
- _connect uses getattr for InvalidStatus.response.status_code so older
  websockets releases that expose status_code directly are handled
- drop the xai- prefix check on api_key; real auth failures surface from
  the server handshake

Tests added:
- xai_asr_python.tests.test_send_audio_duration covers pcm vs mulaw and
  channels=2 duration math
- xai_tts_python.tests.test_review_feedback covers 403 -> FATAL_ERROR
  classification and recorder_map cleanup after finalize
@github-actions
Copy link
Copy Markdown

Review of PR #2146feat: add xai asr and tts extensions

Overall this is a well-structured contribution: bounded reconnect, fatal/non-fatal classification on auth, dump support, redaction of secrets, and solid unit-test coverage on both extensions. A few correctness and consistency notes below.

Potential bugs

  1. ASR may stop reconnecting after a finalize-induced closeextension.py:1411-1419
    In on_close, when _close_expected is True (set during finalize), the handler resets the flag and returns without reconnecting. If the xAI WebSocket closes after transcript.done (which is what _close_expected exists to handle), the extension will sit disconnected; the next send_audio will simply return False, and there is no path back to start_connection. If xAI keeps the socket open after audio.done this is fine, but worth confirming against the vendor and/or adding a re-arm path. A targeted test (close after finalize, then send a new audio frame) would lock this down.

  2. Potential double _finalize_request on TTS cancel raceextension.py (xai_tts) cancel_tts + _process_tts_text
    cancel_tts calls self.client.cancel() (which closes the WS) and then _finalize_request(INTERRUPTED). If request_tts is concurrently inside _process_tts_text, the closed socket raises in client.get(), yields EVENT_TTS_ERROR, and that branch also calls _finalize_request(ERROR, …). Result: two tts_audio_end for the same request. Consider gating finalize on the current request_id (e.g. if self.current_request_id != request_id: return) or a _finalized flag.

  3. Auth detection via substring matchxai_tts.py:5658-5662

    if \"401\" in error_message or \"Unauthorized\" in error_message:
        raise XAITTSConnectionException(status_code=401, body=error_message)

    A vendor message that incidentally contains 401 (e.g. \"request id 401abc...\") would be classified as fatal auth. The InvalidStatus branch above already handles real 401/403 status codes — consider dropping the string fallback, or restricting it to a strict prefix.

  4. apply_defaults is the wrong gate for required fieldsconfig.py (asr)
    apply_defaults populates a default language, sample_rate, etc., but validate() only checks api_key/sample_rate/encoding. If a caller passes params: {}, defaults silently fill everything in. That's probably intentional, but the language default of \"en\" ends up appended to the WebSocket URL even when the caller may have wanted vendor-side detection. A short note in the README about default behavior would help.

  5. Word text field namingextension.py:1351
    ASRWord(word=word.get(\"text\", \"\"), …). If xAI ever switches its word object to word/token instead of text, this silently emits empty strings. Worth either logging a warning when text is missing or asserting it during partial parsing.

Consistency / code quality

  1. Two different config patterns in one PR. ASR (XAIASRConfig) keeps everything inside params: dict[str, Any] and reads via params.get(...). TTS (XAITTSConfig) hoists known keys to typed model fields and pops them out of params. Both are reasonable, but living side-by-side in the same PR makes the codebase harder to navigate. Picking one shape (ideally the typed-field one — it's stricter and gives you to_str for free) and converting the other would pay off.

  2. ReconnectManager docstring drifts from real call site. The docstring claims max_attempts: 4 and lists delays 0.5s, 1s, 2s, 4s, but the extension instantiates it with max_attempts=10 (extension.py:1088-1090). With the same backoff curve and the 4s cap, that's ~31s of retries before fatal. Either update the docstring, or move the 10 into a named constant with a comment about why the integration guarder needs that ceiling.

  3. Magic numbers. if self._request_text_length + len(prepared_text) > 15000: (xai_tts extension.py) and the sample-rate allowlist in both configs would read better as module-level constants (e.g. XAI_TTS_MAX_TEXT_LENGTH = 15000, SUPPORTED_SAMPLE_RATES = {...}). Same for the {401, 403} set duplicated across _is_fatal_connection_error and _handle_connection_error.

  4. Dead code in test. XAIAsrExtensionTester.stop_test_if_checking_failed (tests/test_error_check.py) is defined but never called.

  5. config.py:apply_defaults dead else branch. params = self.params if isinstance(self.params, dict) else {} — pydantic guarantees params is a dict via Field(default_factory=dict).

  6. xai_tts_python requirements. websockets>=15.0.1 is fine, but pinning pydantic without any version is unusual for this repo — most other extensions either omit it (provided by base) or pin a minimum. Worth checking whether pydantic belongs in requirements.txt at all here, since ten_ai_base already depends on it.

Performance / observability

  1. TTFB metric only fires on the first audio chunk of the first request after reset_ttfb. That's correct, but the chunk timestamp computed in _get_next_audio_chunk_timestamp_ms uses int(datetime.now().timestamp() * 1000) for the start anchor instead of _first_audio_chunk_ts. The two are taken at almost the same moment so it shouldn't matter, but using _first_audio_chunk_ts directly would make the relationship explicit.

  2. finalize_timeout_ms = 2000 default may be aggressive on slow networks for long utterances. Worth surfacing this in the README so integrators know to bump it.

Security

  • API keys are redacted in to_json/to_str via utils.encrypt. ✅
  • WS URL is logged at info level (recognition.py:1637) — query string contains the same scalar params as the connection, but no secret. The bearer token is sent in Authorization header, not the URL. ✅
  • One small note: if a user puts a sensitive value in an unrecognized params key (e.g. a custom auth_token), the TTS _build_ws_url will pass it through to the query string unredacted, and that URL gets logged. Consider redacting sensitive-looking unknown keys before logging the URL, or never logging the full URL.

Test coverage

40 standalone tests is a strong baseline. A few gaps worth filling:

  • ASR: a test for the post-finalize close → next-audio path described in (1).
  • TTS: a test for the cancel/finalize race in (2), even if it's a deterministic asyncio interleaving.
  • TTS: a test confirming tts_audio_start is emitted exactly once per request when audio is produced.

Nothing here is a blocker for the feature itself — most are robustness or polish. Nice work overall.

@BenWeekes
Copy link
Copy Markdown
Contributor Author

Second auto-review pass addressed in 78dac2ade.

ASR

  • recognition.send_audio_frame duration math now multiplies by channels so channels=2 reports correct wall-clock instead of double.
  • Encoding allow-list aligned to {pcm, mulaw, alaw} across config and framing (dropped unreachable ulaw).
  • recognition.start now closes the websocket if the first-message handshake raises before the message task is started.
  • reconnect_manager.handle_reconnect drops the TypeError fallback for error_handler arity; the single caller passes two args, so the fallback only masked real type bugs.
  • Removed unused TIMEOUT_CODE constant.

TTS

  • 403 now classified as FATAL_ERROR alongside 401 (xAI returns 403 for disabled / insufficient-quota keys).
  • recorder_map entries popped after _finalize_request to avoid leaking PCMWriter file handles in long-running sessions.
  • Dead isinstance(text_normalization, str) branch removed; update_params already coerces to bool.
  • _ensure_connection uses is_connected() so a non-OPEN socket triggers reconnect.
  • _connect uses getattr for InvalidStatus.response.status_code to handle older websockets releases.
  • Dropped xai- prefix check on api_key; real auth failures surface from the handshake.

Tests added

  • xai_asr_python/tests/test_send_audio_duration.py — PCM, mulaw, and stereo PCM duration math.
  • xai_tts_python/tests/test_review_feedback.py — 403 → FATAL classification, 500 → NON_FATAL, recorder_map cleanup after finalize (incl. flush-failure path).

Results on 78dac2ade

  • TTS standalone: 27 passed
  • ASR standalone: 21 passed
  • TTS guarder: 15 passed, 1 skipped (test_subtitle_alignment opt-in)
  • ASR guarder: 9 passed, 1 deselected (test_long_duration_stream, repo-default skip)

)

task check flagged 9 xAI files for black reformatting. Apply task format
output so the formatter check passes on dev/xai.
@github-actions
Copy link
Copy Markdown

PR #2146 Review — xAI ASR/TTS Extensions

Thanks for the comprehensive submission — overall structure follows the existing extension conventions well, and test coverage is impressive (40+ tests across the two extensions). Below are findings, ordered by severity.

Bugs / Correctness

1. TTS sample width is hard-coded to 2 bytes, breaking mu-law/A-law duration math (xai_tts_python/extension.py:101-102)

def synthesize_audio_sample_width(self) -> int:
    return 2

XAITTSConfig.validate() accepts codecs mulaw, ulaw, alaw (1 byte/sample) and mp3 (variable bitrate), but _calculate_audio_duration_ms() uses this constant 2-byte width unconditionally. With mu-law/A-law output the reported request_total_audio_duration_ms will be 2× too long, and the TTS audio chunk timestamps in _get_next_audio_chunk_timestamp_ms() will drift. Suggest deriving width from self.config.codec (and skipping duration math entirely for mp3).

2. Codec allow-list likely overstates vendor support (xai_tts_python/config.py:51)

if self.codec not in {"pcm", "mp3", "wav", "mulaw", "ulaw", "alaw"}:

The README and graph properties only document pcm. If xAI's WebSocket API only supports pcm/mp3 today, narrow this set so users get a clear validation error rather than a cryptic vendor failure. (wav over a streaming WebSocket is also unusual — that's typically a container, not a stream codec.)

3. ASR _handle_reconnect floods the bus with error events on each retry (xai_asr_python/extension.py:897-918 + reconnect_manager.py:96-122)

Every failed retry calls error_handler and emits a non-fatal send_asr_error (with vendor info). With max_attempts=10, a transient outage produces 10 separate error events before terminal escalation. Consider only emitting a single "reconnecting" notice and the terminal fatal error, and keeping per-attempt failures at log level. The same pattern in start_connection means the first failure produces an error from start_connection AND another N from _handle_reconnect.

4. Reconnect manager type signature is inconsistent with how it's called (xai_asr_python/reconnect_manager.py:49-58)

error_handler: Optional[Callable[[ModuleError], Awaitable[None]]] = None,

But the call site invokes error_handler(error, vendor_info). Either the type hint should be Callable[..., Awaitable[None]] / a Protocol with both signatures, or always pass vendor_info (possibly None). send_asr_error already accepts both, so the conditional branch isn't necessary.

5. ASR start() rejects any non-transcript.created first event, including legitimate error frames (xai_asr_python/recognition.py:128-141)

if first_event.get("type") != "transcript.created":
    raise RuntimeError(f"Unexpected xAI STT startup event: {first_event.get('type')}")

If xAI returns an error event first (e.g. quota exceeded after auth succeeds), it surfaces as RuntimeError("Unexpected xAI STT startup event: error") rather than the actual error message, and the fatal/non-fatal classification in _extract_connection_status_code won't extract a code from it. Special-case error here and propagate the real reason.

6. TTS get() may break protocol on websocket reuse

The client sends {"type":"text.done"} after each text.delta, then reuses the same WebSocket for the next request_tts. If the xAI server treats text.done as session-ending (closes after the final audio.done), the second request will hang in _ws.recv() until WS_RECV_TIMEOUT (8s). Worth confirming with the vendor whether the socket is multi-request or single-shot, and either reconnecting after every audio.done or documenting/testing that reuse works.

7. Hard-coded MODULE_NAME_ASR in ReconnectManager (xai_asr_python/reconnect_manager.py)

Currently fine since only ASR uses it, but the class is otherwise generic — pass module_name in the constructor so it's reusable / not surprising to a future caller.

Code quality / consistency

8. XAIASRConfig.validate() / XAITTSConfig.validate() shadow Pydantic v2 conventions

Pydantic v2 has model_validate as the conventional class-level entry point; defining an instance validate() works but is easy to misread. Consider renaming to validate_config() or using a model_validator(mode="after") so the check runs automatically on construction.

9. ASR manifest version 0.4.0 for a brand-new extension (xai_asr_python/manifest.json:4)

The TTS manifest in this same PR starts at 0.1.1. Suggest aligning to 0.1.0 unless there's an external reason for the higher version.

10. Duplicated _is_sensitive_key and _SENSITIVE_SUBSTRINGS across ASR and TTS configs

Identical helpers in xai_asr_python/config.py and xai_tts_python/config.py. Worth promoting to a shared helper in ten_ai_base.utils alongside encrypt.

11. test_subtitle_alignment.py weakens existing coverage

-    raise FileNotFoundError(f"Config file not found: {config_file_path}")
+    pytest.skip(f"Config file not found: {config_file_path}. ...")

Turning a missing config into a skip means a typo in a future provider's config filename is silently green. If the goal is "xAI doesn't have word-level timing," a cleaner alternative is to gate via SUPPORTED_TTS_EXTENSIONS rather than file existence.

12. ASR validate() doesn't validate endpointing, channels, or language

Only sample_rate and encoding are checked. Negative/zero endpointing, channel counts above 2, or unknown language codes will silently flow into the URL.

13. TTS bit_rate is sent only for codec == mp3, but it's never validated

Default is 128000. If xAI rejects values outside a supported set, surfacing that at config time is friendlier than at connect time.

14. Hardcoded URL wss://api.x.ai/v1/stt appears in three places (config.py, recognition.py, property.json). The recognition._build_url() fallback ignores XAI_DEFAULT_PARAMS. Centralize the default in one place.

15. XAITTSExtension.cancel_tts() calls _finalize_request(INTERRUPTED) while request_tts() may still be iterating

Worth tracing whether the in-flight _process_tts_text loop will also hit EVENT_TTS_END/EVENT_TTS_ERROR after cancel() closes the socket and emit a second finalize. The _audio_start_sent flag guards tts_audio_start but there's no guard on send_tts_audio_end.

Security

16. _is_sensitive_key matches any field containing "key"

Safe-failing, just noting that future fields like voice_key would be over-redacted.

17. URL is logged at info level (recognition.py:1113)

The current URL is safe (Bearer token in header), but if xAI ever moves the key into a query param like some other vendors do, logging the URL would leak it. Consider scrubbing query strings before logging.

Performance

18. recognition._message_handler() json.loads every frame on the hot path

Not a real concern at typical ASR rates, just calling it out: if xAI sends partials at high frequency, a select-by-prefix could trim parsing.

19. ASR start_connection opens then immediately closes if already connected (xai_asr_python/extension.py:642-643)

if self.is_connected():
    await self.stop_connection()

Forces a close + reconnect on every start_connection call. Probably intentional, but worth a comment explaining why.

Testing

Coverage is solid. Three suggestions:

20. No test asserts duration for mu-law/A-law TTS — would have caught issue #1.

21. No test exercises the reuse-across-requests TTS path — would catch issue #6. The mocked tests bypass the real _ws_url send/recv cycle entirely.

22. test_error_check.py (ASR) connects to the real wss://api.x.ai/v1/stt with api_key=invalid_key. That test will fail in air-gapped CI. Consider mocking websockets.connect like the TTS tests do.

Minor

  • xai_asr_python/addon.py has an unconventional blank line after the method signature.
  • _extract_connection_status_code and _is_fatal_connection_error are duplicated between ASR extension.py and TTS xai_tts.py. Move to a shared util.
  • requirements.txt pins websockets>=15.0.1 but doesn't pin pydantic. If the rest of the repo standardizes pins, match the convention.
  • The TTS README lists 5 voices but the config validator doesn't constrain voice_id. Either constrain or document why it's free-form.

Summary

The architecture and lifecycle handling (start / connect / finalize / reconnect / stop) are clean and follow patterns from neighboring extensions. The most important things to address before merge are #1 (sample width), #3 (error event flooding), #5 (startup error event handling), and #6 (websocket reuse semantics) — those are correctness/UX bugs visible in production. The rest are quality / consistency polish.

Nice work on the test coverage and the explicit audio.done finalize flow.

…ension (#2146)

pylint W0611 flagged LOG_CATEGORY_VENDOR as unused in xai_tts_python/extension.py
after recent edits removed the only usage. The ci lint job treats any pylint
warning as fatal so the import has to go.
@github-actions
Copy link
Copy Markdown

PR Review: feat: add xai asr and tts extensions

Thanks for the substantial contribution — the extensions are well-structured, follow existing patterns from sibling extensions, and the test coverage is generous (40 test functions across both extensions). Below is a focused review with the most useful findings up top.

🔴 Bugs / correctness

1. XAITTSClient._connect_exp_cnt is never reset between reconnect cyclesxai_tts_python/xai_tts.py, _connect_with_backoff
After a fully-failed _connect_with_backoff cycle, _connect_exp_cnt ends at MAX_CONNECT_ATTEMPTS (5) and is only reset to 0 in _connect() on success. The next call to _connect_with_backoff (from _reconnect() or _ensure_connection()) will see while self._connect_exp_cnt < MAX_CONNECT_ATTEMPTS evaluate False, skip the loop entirely, and immediately raise XAITTSConnectionException(status_code=503, body="xAI TTS connection failed") (with last_error=None). The TTS client effectively becomes permanently un-reconnectable after one bad burst. Recommend resetting _connect_exp_cnt = 0 at the entry of _connect_with_backoff (or using a local counter for the per-cycle bound).

2. _connect_with_backoff ignores cancellationxai_tts_python/xai_tts.py
stop() sets _is_cancelled = True but the backoff loop doesn't check it. A stop() during a 2–4s asyncio.sleep will still keep retrying. Consider checking self._is_cancelled in the loop and bailing early.

3. send_audio holds the audio buffer locked across awaitsxai_asr_python/extension.py, send_audio
The pattern is:
```python
buf = frame.lock_buf()
audio_data = bytes(buf)
if self.audio_dumper:
await self.audio_dumper.push_bytes(audio_data) # await with buffer locked
await self.recognition.send_audio_frame(audio_data) # await with buffer locked
frame.unlock_buf(buf)
```
`audio_data = bytes(buf)` already makes a copy, so the buffer can be unlocked immediately after that line. Holding the lock across `await`s can stall upstream pipeline frames unnecessarily. Most sibling ASR extensions copy-then-unlock first.

4. Double-unlock risk on success path — same function
`buf` is not set to `None` after `frame.unlock_buf(buf)` in the success branch. If anything between the unlock and `return True` raised, the `except` branch would unlock again. Set `buf = None` immediately after `unlock_buf` to make the cleanup branch safe under any future edits.

5. XAITTSConfig.update_params() is destructive and not idempotentxai_tts_python/config.py
It uses `params.pop(...)` to extract typed fields out of the `params` dict. Calling `update_params()` a second time would no longer find `api_key`, `base_url`, etc. and would silently fall back to the (already-set) instance fields. That's fine today (called once in `on_init`) but it's a footgun. Prefer `params.get(...)` and then explicitly drop the extracted keys, so the call is idempotent and intent is clear.

6. XAIASRConfig.validate() shadows a Pydantic name historically used internally
Pydantic v2 removed the legacy `BaseModel.validate` classmethod, so this technically doesn't clash today, but `model_validate` is the canonical entry point and a new `validate()` instance method on a Pydantic model can confuse readers (and tooling). Consider renaming to `validate_config()` or `check()`. Same applies to `XAITTSConfig.validate()`.

🟡 Code quality / style

7. Magic numbers scattered

  • `15000` character limit in `xai_tts_python/extension.py` — surface as a module-level constant or config field.
  • `1024 * 1024 * 10` in `buffer_strategy()` — sibling extensions usually name this.
  • Reconnect ceilings (`max_attempts=10` in extension, `max_attempts=4` in manager docstring) — the docstring still says "default: 4" while the extension overrides to 10. Worth a sentence noting why ASR uses a higher ceiling than the manager default.

8. XAIASRRecognitionCallback uses @abstractmethod without ABC/ABCMeta
The `@abstractmethod` decorations are decorative only here — `XAIASRRecognitionCallback` is a plain class, so abstractness isn't enforced. Either inherit from `abc.ABC`, switch to `typing.Protocol`, or drop the decorators.

9. Reconnect-manager docstring drift
The class docstring claims "Exponential backoff strategy with maximum delay cap: 0.5s, 1s, 2s, 4s (capped)" but the extension passes `max_attempts=10`, which yields `0.5, 1, 2, 4, 4, 4, 4, 4, 4, 4`. Update the docstring or parameterize the example.

10. on_init error path leaves a partially-initialized ASR config
On exception during config parse/validate, the ASR extension does `self.config = XAIASRConfig.model_validate_json("{}")` and sends a FATAL error, but `start_connection()` will then re-detect the missing api_key and emit a second FATAL error. Not strictly broken (the upstream control plane likely shuts the extension down on the first FATAL), but the duplicate error path is noisy. Consider an explicit `self._init_failed = True` short-circuit.

11. _emit_asr_result + final=True with empty text
`finalize()` calls `_emit_asr_result(payload, final=True, locked=False)` only `if payload and payload.get("text")`, so an empty-text final from the vendor never reaches `_emit_asr_result`. But the guard `if not text and not final: return` inside the function still allows empty-text finals to propagate downstream. If that's intentional (so consumers see a "final empty" signal), worth a comment; if not, the guard should be `if not text: return`.

🟢 Security

  • API keys redacted via `utils.encrypt(...)` before logging — good. `_is_sensitive_key` substring matching is a reasonable belt-and-suspenders for `params` blobs.
  • Bearer token sent in `Authorization` header — appropriate (not in URL query params, where it would land in proxy logs).
  • However, `XAITTSClient._build_ws_url` iterates `self.config.params.items()` (after `update_params()` has popped typed keys) and forwards anything left in the dict into the URL. If a user accidentally puts a sensitive value (e.g., a non-standard `auth_token` key) into `params`, it will land in the URL as a query string. Consider re-applying the `_is_sensitive_key` filter on the leftover params here as well.
  • No path-traversal validation on `dump_path`. Today this is operator-supplied via property, so it's lower priority, but `os.makedirs(dump_path, ...)` will create arbitrary directories. If extensions are ever exposed to untrusted config sources, this needs hardening.

🟡 Performance

  • TTS reconnect backoff caps at 4s after 4 attempts (0.5, 1, 2, 4, 4); ASR caps at the same. Once the `_connect_exp_cnt` bug (Update README.md #1) is fixed, consider whether you want jitter to avoid thundering-herd reconnects on multi-replica deployments.
  • `request_tts` re-opens a fresh websocket per `cancel()`. For high-cancel-rate workloads (frequent barge-ins) this may add latency; an option to keep the connection warm with a `text.cancel` sentinel (if the vendor supports one) would amortize handshake cost. Not blocking — just worth noting in a follow-up.

🟢 Test coverage

  • Solid breadth: connection failure, reconnect ceiling, vendor errors, dump, finalize, params, metrics, robustness, state machine, review feedback.
  • `test_reconnect.py` validates the manager-level escalation, but I didn't see a regression test for the TTS `_connect_exp_cnt` exhaustion case (item Update README.md #1). A unit test that runs `_connect_with_backoff` to exhaustion, then calls it again with a now-working underlying `_connect`, would have caught the bug above.
  • `test_subtitle_alignment.py` change (raise → `pytest.skip`) is a nice quality-of-life improvement, but the skip message could mention which providers the test still applies to (the file already has `SUPPORTED_TTS_EXTENSIONS = {"cartesia_tts"}`).

🟢 Misc / polish

  • `agent.py` change to silently drop `tts_audio_start`/`tts_audio_end` events (rather than logging "Unhandled data") is reasonable, but worth a one-line comment explaining that those events are consumed elsewhere in the graph and the main agent intentionally ignores them.
  • `recognition.py` raises `RuntimeError` for non-`transcript.created` first event but does not include the full event in the message — including the raw payload would help debugging.
  • `xai_asr_python/init.py` and `xai_tts_python/init.py` only export `addon`, matching siblings — good.
  • Conventional-commits + branch-naming look right per `AGENTS.md`.

Summary

Overall this is a strong, well-tested extension pair. The most important item to address before merge is #1 (TTS reconnect counter exhaustion) — it's a latent bug that will surface in production after the first sustained xAI outage. #2, #3, #5 are also worth fixing before merge. The rest are polish.

Nice work — happy to re-review once the TTS reconnect fix lands.

Ubuntu added 2 commits May 18, 2026 06:38
Critical fix:
- xai_tts_python.xai_tts.XAITTSClient._connect_with_backoff used the
  persistent self._connect_exp_cnt as the loop predicate. After a fully
  failed cycle the counter sat at MAX_CONNECT_ATTEMPTS and the next call
  skipped the loop, raising 503 with no real attempt. Switch to a local
  attempt counter scoped to each cycle so the client recovers after an
  outage. Add regression test
  test_connect_backoff_recovers_after_exhausted_cycle.

Cancellation:
- _connect_with_backoff now checks self._is_cancelled and bails with 499
  instead of continuing to retry during shutdown. New test
  test_connect_backoff_respects_cancellation covers this.

Bug fixes:
- xai_asr_python.send_audio: copy then unlock immediately instead of
  holding the audio buffer locked across awaits. set buf=None after
  unlock so the except branch cannot double-unlock under future edits.
- xai_tts_python.config.update_params: switch from params.pop() to
  params.get() + explicit cleanup so a second call is idempotent.
- Rename XAIASRConfig.validate / XAITTSConfig.validate to
  validate_config to avoid shadowing the pydantic BaseModel name. Update
  callers and tests.
- xai_asr_python.on_init now sets _init_failed when the config fails
  to parse so start_connection short-circuits instead of emitting a
  second FATAL error for the same root cause.
- _emit_asr_result drops the empty-text-with-final carve-out; finalize
  already filters on payload.get("text").

Quality:
- Surface magic constants: MAX_REQUEST_TEXT_CHARS in xai_tts_python,
  AUDIO_BUFFER_BYTE_LIMIT and RECONNECT_MAX_ATTEMPTS in xai_asr_python.
- XAIASRRecognitionCallback now inherits from abc.ABC so the
  @AbstractMethod decorators are enforced.
- Reconnect manager docstring updated to reflect that callers can
  override max_attempts (ASR uses 10).
- Recognition.start RuntimeError on unexpected first event now includes
  the raw payload for easier debugging.
- TTS backoff adds 80-120% jitter so multi-replica deployments do not
  reconnect at the same millisecond after a shared outage.

Security:
- XAITTSClient._build_ws_url now drops leftover params whose name looks
  sensitive (key/token/secret/signature/password) instead of forwarding
  them into the URL where they could land in proxy logs.
…equest (#2146)

XAITTSClient.cancel() sets both _is_cancelled and _needs_reconnect. The
next get() call previously reached _reconnect()/_ensure_connection() with
_is_cancelled still True; after the cancellation guard was added to
_connect_with_backoff() this caused the backoff loop to short-circuit
with a 499 and the request following a barge-in to fail immediately.

Move the _is_cancelled = False reset to the top of get() (after the
empty-text early return, before _reconnect) so a fresh request always
clears the per-request cancel flag before any connection work.

Add test_get_after_cancel_reconnects_instead_of_failing to lock in the
recovery behavior.
@wangyoucao577 wangyoucao577 merged commit 4e645cd into main May 18, 2026
33 of 34 checks passed
@wangyoucao577 wangyoucao577 deleted the dev/xai branch May 18, 2026 08:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants