feat: add xai asr and tts extensions#2146
Conversation
|
test |
|
placeholder being replaced |
|
edit test |
Code Review: xAI ASR and TTS ExtensionsGood overall implementation - both extensions follow the TEN Framework patterns well, have bounded reconnect logic, proper fatal/non-fatal error classification, config redaction, and solid test coverage (17 ASR + 23 TTS tests). The graph wiring and README updates are clean. Below are the issues worth addressing before merge, ordered by severity. |
Code Review: xAI ASR and TTS ExtensionsGood overall implementation - both extensions follow the TEN Framework patterns well, have bounded reconnect logic, proper fatal/non-fatal error classification, config redaction, and solid test coverage (17 ASR + 23 TTS tests). The graph wiring and README updates are clean. Below are the issues worth addressing before merge, ordered by severity. Bugs / Correctness1. XAIASRConfig.to_json() returns Python repr, not JSON In 2. _message_handler re-fires on_open() on duplicate transcript.created In 3. Redundant copy.deepcopy after json.loads in _flush_buffered_audio_frames
4. _ensure_connection does not verify the socket is still open (TTS)
API Key Validation Leaks Test Logic into Production5. xai_tts_python/config.py validate() accepts "test" prefix The validation accepts Missing Manifest Properties6. xai_asr_python/manifest.json omits dump, dump_path, and finalize_timeout_ms These three properties are present in Fatal-Error String Matching is Fragile7. ASR _is_fatal_connection_error and on_error Both use substring scans on free-form error strings, including the literal string Minor / Style8. Missing license header in 9. Missing newlines at end of files: 10. 11. Duplicated 12. Stale test log string in Summary
The top priorities before merge are items 1 (invalid JSON from |
|
PR Review: xAI ASR and TTS Extensions - test post |
|
PR Review: xAI ASR and TTS Extensions. Issues to fix: (1) on_open called twice in recognition.py - both start() and _message_handler call the callback on transcript.created, causing duplicate metrics and buffer flushes if server re-sends; (2) buf potentially unbound in xai_asr_python send_audio - if lock_buf raises, buf is undefined in the except handler, initialize buf=None before the try block; (3) property_en_hotwords.json is identical to property_en.json with no hotword impl - remove or implement. Issues to address: (4) XAIASRConfig.validate() shadows Pydantic class method, rename to validate_config(); (5) TTS api key startswith('test') check is too broad; (6) turn_status values 1 and 2 are undocumented magic numbers; (7) isinstance str check for text_normalization is dead code since the field is typed bool; (8) missing trailing newlines in requirements.txt and tests/bin/start; (9) test_basic.py copy-paste error says Deepgram not xAI. Positives: great reconnect logic with backoff and fatal escalation, API key redaction, _contains_spoken_content guard, comprehensive test coverage (timeout, flush, punctuation, reconnect-after-error, auth failure), clean agent.py update for the new event types. Generated by Claude Code |
Code Review: xAI ASR + TTS ExtensionsGreat overall structure. The two extensions follow existing repo conventions closely and come with thorough test coverage (17 ASR + 23 TTS tests). The reconnection logic, error classification, and sensitive-data redaction are all solid. A few issues worth addressing before merge. Bugs1. Potential If buf = frame.lock_buf()
try:
audio_data = bytes(buf)
...
return True
except Exception as e:
self.ten_env.log_error(f"Error sending audio: {e}")
return False
finally:
frame.unlock_buf(buf)Code Quality2.
3.
4. Empty The override only calls 5.
Completeness / Consistency6.
7. Both files in Minor Nits8. Misleading validation error message for TTS API key (
9. Copy-paste error in TTS test ( The basic test sends Summary
|
| def _calculate_request_event_interval_ms(self) -> int: | ||
| if self.sent_ts is None: | ||
| return 0 | ||
| return int((datetime.now() - self.sent_ts).total_seconds() * 1000) |
There was a problem hiding this comment.
The interval time is formulate is "current time - first audio receive time", not first request send time
| code=ModuleErrorCode.NON_FATAL_ERROR, | ||
| vendor_info=ModuleErrorVendorInfo(vendor=self.vendor()), | ||
| ) | ||
| await self._finalize_request(TTSAudioEndReason.ERROR, error=error) |
There was a problem hiding this comment.
_finalize_request only for handling last request of the turn.
if t.text_input_end:
| if start_ms <= 0: | ||
| start_ms = int(datetime.now().timestamp() * 1000) | ||
|
|
||
| transcript_result = TTSTextResult( |
There was a problem hiding this comment.
TTSTextResult is not necessary. Actually, we only use it when we can get a word-level timestamp. if this tts vendor cannot provide, we can remove it
| message=e.body, | ||
| ), | ||
| ) | ||
| await self._finalize_request(TTSAudioEndReason.ERROR, error=error) |
There was a problem hiding this comment.
_finalize_request only for handling last request of the turn.
if t.text_input_end:
Code Review: PR #2146 — Add xAI ASR and TTS ExtensionsOverviewThis PR adds two well-structured TEN extensions — IssuesBug:
|
# Conflicts: # ai_agents/.env.example
Code Review —
|
| return int(duration_sec * 1000) | ||
|
|
||
| def _calculate_request_event_interval_ms(self) -> int: | ||
| return self._request_event_interval_ms |
There was a problem hiding this comment.
Interval time is the last audio receive time - first audio receive time. not ttfb.
It represents the audio length of the current request id
There was a problem hiding this comment.
Fixed in 2820e94. _calculate_request_event_interval_ms now returns _last_audio_chunk_ts - _first_audio_chunk_ts. The TTFB metric emitted via send_tts_ttfb_metrics is unchanged.
request_event_interval_ms now measures last audio chunk arrival minus first audio chunk arrival, representing the wall-clock audio receive window for the request. Previously the field stored TTFB which was the wrong semantic per maintainer feedback. The TTFB metric emitted via send_tts_ttfb_metrics is unchanged.
Review of xAI ASR/TTS extensionsThanks for the thorough work — the test coverage is impressive (40 tests) and the bounded-reconnect / fatal escalation pattern is consistent. Below are issues grouped by severity. Bugs
Code quality / consistency
Security / robustness
Performance
Test coverage notes
Smaller nits
Overall: the implementation is solid and the test count is reassuring; the most important fixes before merge are the non-PCM duration math (#2), the reconnect success contract (#5/#6), and the |
- asr send_audio: initialize buf before try to avoid UnboundLocalError if frame.lock_buf() raises - asr send_audio_frame: derive bytes_per_sample from encoding so mulaw/alaw duration math no longer hardcodes 2 bytes/sample - tts config: parse text_normalization string explicitly so "false" coerces to False rather than True via bool() - asr stop_connection: drop stop/close-expected resets at function exit; they belong in start_connection and were racing with the on_close callback - tts client: add is_connected() so the extension does not poke at private _ws state to decide if reconnect is needed - tts request_tts: check prospective length before incrementing the request text length counter so over-length requests do not leave stale state
- drop empty test mock module and duplicate hotwords test config - drop package manifest entries for paths that do not exist (docs/, BUILD.gn, **.tent) in xai_asr_python and xai_tts_python - extend sensitive-handling redaction beyond api_key to cover any params field whose name contains key/token/secret/signature/password
|
Latest commits address all open threads:
Latest |
Code Review: feat: add xai asr and tts extensionsOverall, this is a well-structured, well-tested addition that follows the established patterns of other ASR/TTS extensions in this repo. Configs are validated, secrets are redacted before logging, reconnect/backoff is bounded, and there is solid mock-based test coverage on both sides. Below are the issues I found, grouped roughly by severity. Bugs and correctness
Code quality and best practices
Performance
Security
Test coverage
Nits
SummaryThe code is in good shape and matches the patterns already established in sibling extensions. The most important things to address before merging are (1) the channels-aware duration calculation in ASR send_audio_frame, (2) the dead text_normalization branch in build_ws_url, (3) the TypeError-as-arity-detection in ReconnectManager, and (4) tightening the validate method naming. The rest are quality-of-life fixes that can land in follow-ups. |
…onse (#2146) _first_audio_chunk_ts was being set inside the EVENT_TTS_TTFB_METRIC branch. That event is yielded by the client just before the first EVENT_TTS_RESPONSE, so the recorded "first chunk arrival" was a few microseconds earlier than the actual first chunk receive in the extension. Move the assignment into the first EVENT_TTS_RESPONSE so request_event_interval_ms is literally last_audio_chunk_ts - first_audio_chunk_ts as documented.
|
Follow-up on Fixed in |
PR Review: xAI ASR + TTS extensionsNice scope: the new extensions are well-tested (40+ unit tests), the graph wiring covers all four ASR/TTS combinations, and the redact/reconnect/dump scaffolding follows the patterns from neighboring vendor extensions. Below are concrete points worth addressing before merge. Bugs / correctness
API / brittleness
Test coverage
Nits
Overall: solid first cut. Items 1–4 above are the ones I'd block on; the rest are polish. |
…gnition (#2146) finalize() guarded self.recognition at entry but then re-accessed it after awaiting send_audio_done and wait_for_done. The on_close callback nulls self.recognition, so a close firing mid-finalize made the later self.recognition.done_event access raise AttributeError. Capture a local reference at the top of finalize and use it throughout so the call stays consistent even if on_close clears the attribute mid-flight.
Review: xAI ASR & TTS extensions (#2146)Thanks for the substantial contribution — both extensions follow the established Bugs / correctness
Design / API
Style / nits
Test coverageTest counts are healthy (40 across both extensions) and cover finalize, reconnect, error mapping, dumps, and metrics — well above the bar for an extension PR. Suggested gaps:
Security
Housekeeping
Overall direction is good — the fixes above are mostly localized and shouldn't require restructuring. Happy to re-review once these are addressed. |
xai_asr_python:
- multi-channel duration math: bytes_per_ms now multiplies by channels so
channels=2 reports correct wall-clock instead of double
- encoding allow-list consistency: framing now uses {pcm, mulaw, alaw} to
match config.validate
- startup websocket leak: close the underlying socket if the first-message
handshake raises before the message handler task is started
- drop the TypeError fallback in reconnect_manager.handle_reconnect; the
single caller passes the two-arg form, so the fallback only masked real
type bugs in error_handler
- delete unused TIMEOUT_CODE constant
xai_tts_python:
- 403 is now classified as FATAL_ERROR alongside 401 (xAI returns 403 for
disabled/insufficient-quota keys)
- recorder_map entries are popped after _finalize_request to avoid leaking
PCMWriter file handles in long-running sessions
- drop dead isinstance(text_normalization, str) branch in _build_ws_url
now that update_params coerces to bool via _coerce_bool
- _ensure_connection uses is_connected() so a non-OPEN socket triggers a
reconnect instead of being reused
- _connect uses getattr for InvalidStatus.response.status_code so older
websockets releases that expose status_code directly are handled
- drop the xai- prefix check on api_key; real auth failures surface from
the server handshake
Tests added:
- xai_asr_python.tests.test_send_audio_duration covers pcm vs mulaw and
channels=2 duration math
- xai_tts_python.tests.test_review_feedback covers 403 -> FATAL_ERROR
classification and recorder_map cleanup after finalize
Review of PR #2146 —
|
|
Second auto-review pass addressed in ASR
TTS
Tests added
Results on
|
PR #2146 Review — xAI ASR/TTS ExtensionsThanks for the comprehensive submission — overall structure follows the existing extension conventions well, and test coverage is impressive (40+ tests across the two extensions). Below are findings, ordered by severity. Bugs / Correctness1. TTS sample width is hard-coded to 2 bytes, breaking mu-law/A-law duration math ( def synthesize_audio_sample_width(self) -> int:
return 2
2. Codec allow-list likely overstates vendor support ( if self.codec not in {"pcm", "mp3", "wav", "mulaw", "ulaw", "alaw"}:The README and graph properties only document 3. ASR Every failed retry calls 4. Reconnect manager type signature is inconsistent with how it's called ( error_handler: Optional[Callable[[ModuleError], Awaitable[None]]] = None,But the call site invokes 5. ASR if first_event.get("type") != "transcript.created":
raise RuntimeError(f"Unexpected xAI STT startup event: {first_event.get('type')}")If xAI returns an 6. TTS The client sends 7. Hard-coded Currently fine since only ASR uses it, but the class is otherwise generic — pass Code quality / consistency8. Pydantic v2 has 9. ASR manifest version The TTS manifest in this same PR starts at 10. Duplicated Identical helpers in 11. - raise FileNotFoundError(f"Config file not found: {config_file_path}")
+ pytest.skip(f"Config file not found: {config_file_path}. ...")Turning a missing config into a skip means a typo in a future provider's config filename is silently green. If the goal is "xAI doesn't have word-level timing," a cleaner alternative is to gate via 12. ASR Only 13. TTS Default is 14. Hardcoded URL 15. Worth tracing whether the in-flight Security16. Safe-failing, just noting that future fields like 17. URL is logged at info level ( The current URL is safe (Bearer token in header), but if xAI ever moves the key into a query param like some other vendors do, logging the URL would leak it. Consider scrubbing query strings before logging. Performance18. Not a real concern at typical ASR rates, just calling it out: if xAI sends partials at high frequency, a select-by-prefix could trim parsing. 19. ASR if self.is_connected():
await self.stop_connection()Forces a close + reconnect on every TestingCoverage is solid. Three suggestions: 20. No test asserts duration for mu-law/A-law TTS — would have caught issue #1. 21. No test exercises the reuse-across-requests TTS path — would catch issue #6. The mocked tests bypass the real 22. Minor
SummaryThe architecture and lifecycle handling (start / connect / finalize / reconnect / stop) are clean and follow patterns from neighboring extensions. The most important things to address before merge are #1 (sample width), #3 (error event flooding), #5 (startup error event handling), and #6 (websocket reuse semantics) — those are correctness/UX bugs visible in production. The rest are quality / consistency polish. Nice work on the test coverage and the explicit |
…ension (#2146) pylint W0611 flagged LOG_CATEGORY_VENDOR as unused in xai_tts_python/extension.py after recent edits removed the only usage. The ci lint job treats any pylint warning as fatal so the import has to go.
PR Review: feat: add xai asr and tts extensionsThanks for the substantial contribution — the extensions are well-structured, follow existing patterns from sibling extensions, and the test coverage is generous (40 test functions across both extensions). Below is a focused review with the most useful findings up top. 🔴 Bugs / correctness1. 2. 3. 4. Double-unlock risk on success path — same function 5. 6. 🟡 Code quality / style7. Magic numbers scattered
8. 9. Reconnect-manager docstring drift 10. 11. 🟢 Security
🟡 Performance
🟢 Test coverage
🟢 Misc / polish
SummaryOverall this is a strong, well-tested extension pair. The most important item to address before merge is #1 (TTS reconnect counter exhaustion) — it's a latent bug that will surface in production after the first sustained xAI outage. #2, #3, #5 are also worth fixing before merge. The rest are polish. Nice work — happy to re-review once the TTS reconnect fix lands. |
Critical fix:
- xai_tts_python.xai_tts.XAITTSClient._connect_with_backoff used the
persistent self._connect_exp_cnt as the loop predicate. After a fully
failed cycle the counter sat at MAX_CONNECT_ATTEMPTS and the next call
skipped the loop, raising 503 with no real attempt. Switch to a local
attempt counter scoped to each cycle so the client recovers after an
outage. Add regression test
test_connect_backoff_recovers_after_exhausted_cycle.
Cancellation:
- _connect_with_backoff now checks self._is_cancelled and bails with 499
instead of continuing to retry during shutdown. New test
test_connect_backoff_respects_cancellation covers this.
Bug fixes:
- xai_asr_python.send_audio: copy then unlock immediately instead of
holding the audio buffer locked across awaits. set buf=None after
unlock so the except branch cannot double-unlock under future edits.
- xai_tts_python.config.update_params: switch from params.pop() to
params.get() + explicit cleanup so a second call is idempotent.
- Rename XAIASRConfig.validate / XAITTSConfig.validate to
validate_config to avoid shadowing the pydantic BaseModel name. Update
callers and tests.
- xai_asr_python.on_init now sets _init_failed when the config fails
to parse so start_connection short-circuits instead of emitting a
second FATAL error for the same root cause.
- _emit_asr_result drops the empty-text-with-final carve-out; finalize
already filters on payload.get("text").
Quality:
- Surface magic constants: MAX_REQUEST_TEXT_CHARS in xai_tts_python,
AUDIO_BUFFER_BYTE_LIMIT and RECONNECT_MAX_ATTEMPTS in xai_asr_python.
- XAIASRRecognitionCallback now inherits from abc.ABC so the
@AbstractMethod decorators are enforced.
- Reconnect manager docstring updated to reflect that callers can
override max_attempts (ASR uses 10).
- Recognition.start RuntimeError on unexpected first event now includes
the raw payload for easier debugging.
- TTS backoff adds 80-120% jitter so multi-replica deployments do not
reconnect at the same millisecond after a shared outage.
Security:
- XAITTSClient._build_ws_url now drops leftover params whose name looks
sensitive (key/token/secret/signature/password) instead of forwarding
them into the URL where they could land in proxy logs.
…equest (#2146) XAITTSClient.cancel() sets both _is_cancelled and _needs_reconnect. The next get() call previously reached _reconnect()/_ensure_connection() with _is_cancelled still True; after the cancellation guard was added to _connect_with_backoff() this caused the backoff loop to short-circuit with a 499 and the request following a barge-in to fail immediately. Move the _is_cancelled = False reset to the top of get() (after the empty-text early return, before _reconnect) so a fresh request always clears the per-request cancel flag before any connection work. Add test_get_after_cancel_reconnects_instead_of_failing to lock in the recovery behavior.
Summary
xai_asr_pythonandxai_tts_pythonTEN extensionsvoice-assistantexampleWhat Changed
ai_agents/agents/ten_packages/extension/xai_asr_pythonaudio.donefinalize flowai_agents/agents/ten_packages/extension/xai_tts_pythonAsyncTTS2BaseExtensionai_agents/agents/examples/voice-assistant/tenapp/manifest.jsonai_agents/agents/examples/voice-assistant/tenapp/property.jsonvoice_assistant_xai_asrvoice_assistant_xai_ttsvoice_assistant_xai_fullai_agents/agents/examples/voice-assistant/README.mdai_agents/.env.exampleXAI_API_KEYTesting
23 passed17 passedvoice_assistant_xai_fullNotes
docs/progressive-disclosurein TEN Frameworkimprove/tts-extension-guidein TEN Portal