Match native audio source format to device, recreate on sample-rate mismatch #303

Open
MaxHeimbrock wants to merge 3 commits into main from max/mic-samplerate-recreate

@MaxHeimbrock

Problem

When using a Bluetooth headset (or any time Unity's audio output configuration differs from the hardcoded defaults), microphone capture fails with "sample_rate and num_channels don't match" and the RtcAudioSource metadata-mismatch warning.

Root cause: RtcAudioSource created the native (Rust) audio source with a fixed sample_rate (48000) and num_channels (2). But captured frames flow through Unity's audio graph (AudioProbe.OnAudioFilterRead) at the DSP output configuration, which a Bluetooth headset can change at runtime. The Rust native source does not resample — NativeAudioSource::capture_frame rejects any frame whose rate/channels differ from how the source was configured.
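
The rejection described above can be modeled with a small sketch. This is an illustration only, not the actual LiveKit Rust code: `NativeSourceModel`, `Frame`, and `CaptureError` are hypothetical names, and the 16 kHz mono frame in the usage note is an assumed example of a format that differs from the hardcoded 48000 Hz stereo configuration.

```rust
/// Hypothetical stand-in for the native audio source (not the real type).
/// It remembers the format it was configured with and never resamples.
#[derive(Debug)]
pub struct NativeSourceModel {
    sample_rate: u32,
    num_channels: u32,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum CaptureError {
    /// The frame's (sample_rate, num_channels) differs from the source's.
    FormatMismatch { expected: (u32, u32), got: (u32, u32) },
}

/// A captured audio frame with interleaved 16-bit samples.
pub struct Frame {
    pub sample_rate: u32,
    pub num_channels: u32,
    pub samples: Vec<i16>,
}

impl NativeSourceModel {
    pub fn new(sample_rate: u32, num_channels: u32) -> Self {
        Self { sample_rate, num_channels }
    }

    /// Accepts a frame only if its format matches the configured one.
    /// Returns the number of samples accepted on success.
    pub fn capture_frame(&self, frame: &Frame) -> Result<usize, CaptureError> {
        if frame.sample_rate != self.sample_rate || frame.num_channels != self.num_channels {
            return Err(CaptureError::FormatMismatch {
                expected: (self.sample_rate, self.num_channels),
                got: (frame.sample_rate, frame.num_channels),
            });
        }
        Ok(frame.samples.len())
    }
}
```

With a source built as `NativeSourceModel::new(48_000, 2)`, a 48000 Hz stereo frame is accepted, while a 16000 Hz mono frame returns `FormatMismatch`. That is the failure mode this PR works around on the Unity side.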

Changes

  • RtcAudioSource: initializes the native source from Unity's actual output configuration (AudioSettings.GetConfiguration → sample rate + channels from speaker mode), falling back to the platform defaults when Unity can't report one. Adds a runtime safety net: a captured frame whose format doesn't match the live source is dropped, and the native source is recreated to match. Recreation is coalesced and marshaled from the audio thread to the main thread via the captured Unity SynchronizationContext. Handle is now mutable and a NativeSourceChanged event fires on recreation.
  • LocalAudioTrack / Track: a published track is bound to a specific source handle at creation and can't follow a new one in place, so LocalAudioTrack listens for NativeSourceChanged and transparently rebuilds — unpublish → re-create the FFI track onto the new handle (SwapHandle) → republish — keeping the same instance so callers' references stay valid.
  • Participant.PublishTrack: remembers (participant, options) on the audio track so it can republish itself after a recreation. Initial publish path is unchanged.
  • MeetManager (Meet sample): removed the redundant Microphone.Start(null, …, 44100).
  • SineWaveAudioSource (tests): passes its explicit sample rate to the base constructor so test behavior stays deterministic (no rebuild churn).
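
The drop-and-recreate safety net in the first bullet can be sketched as a small state machine. The actual change lives in the C# RtcAudioSource and posts work through Unity's SynchronizationContext; the Rust model below only illustrates the coalescing logic, and every name in it (`SourceModel`, `on_frame`, `drain_on_main_thread`) is hypothetical. A mutex stands in for whatever synchronization the real audio callback uses.

```rust
use std::sync::Mutex;

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Format {
    pub sample_rate: u32,
    pub num_channels: u32,
}

struct State {
    live: Format,            // format the native source currently uses
    pending: Option<Format>, // format requested by the audio thread, if any
    recreations: u32,        // how many times the source was recreated
}

/// Hypothetical model of the source wrapper; not the SDK's API.
pub struct SourceModel {
    state: Mutex<State>,
}

impl SourceModel {
    pub fn new(initial: Format) -> Self {
        Self { state: Mutex::new(State { live: initial, pending: None, recreations: 0 }) }
    }

    /// Audio thread: returns true if the frame would be forwarded,
    /// false if it is dropped because its format does not match.
    pub fn on_frame(&self, frame: Format) -> bool {
        let mut s = self.state.lock().unwrap();
        if frame == s.live {
            return true;
        }
        // Coalesce: repeated mismatches overwrite one pending request
        // instead of queueing one recreation per dropped frame.
        s.pending = Some(frame);
        false
    }

    /// Main thread: performs at most one recreation per pending request.
    /// Returns true if the source was recreated.
    pub fn drain_on_main_thread(&self) -> bool {
        let mut s = self.state.lock().unwrap();
        match s.pending.take() {
            Some(format) => {
                s.live = format;
                s.recreations += 1;
                true
            }
            None => false,
        }
    }

    pub fn live(&self) -> Format {
        self.state.lock().unwrap().live
    }

    pub fn recreations(&self) -> u32 {
        self.state.lock().unwrap().recreations
    }
}
```

In this model, any number of mismatched frames arriving before the main thread runs results in a single recreation, after which frames in the new format are forwarded again. The point where `drain_on_main_thread` returns true corresponds to where a NativeSourceChanged-style notification would fire.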

The hybrid approach preserves the existing publish-before-start contract: the native source still exists at construction, so existing callers and E2E tests that publish without starting the source continue to work.

Verification

  • Compiled the Runtime, PlayModeTests, and Meet Assembly-CSharp assemblies cleanly against the generated rsp files.
  • Not yet run here: the live E2E PlayMode tests (require livekit-server --dev) and an on-device Bluetooth connect/disconnect repro.

🤖 Generated with Claude Code

MaxHeimbrock and others added 3 commits June 11, 2026 15:03
The native (Rust) audio source was created with a hardcoded sample rate
(48000) and channel count (2). Microphone frames, however, arrive at
Unity's actual DSP output configuration, which can differ — most notably
when a Bluetooth headset connects and switches the output rate. The Rust
native source does not resample; it rejects frames whose rate/channels
don't match its configuration with "sample_rate and num_channels don't
match", producing the metadata-mismatch warning and capture failures.

RtcAudioSource now initializes the native source from Unity's real output
configuration (AudioSettings.GetConfiguration) instead of hardcoded
defaults, and adds a runtime safety net: when a captured frame's format
does not match the live source, the frame is dropped and the native
source is recreated to match (coalesced and marshaled to the main
thread). Because a track is bound to a specific source handle at creation
and cannot follow a new one in place, LocalAudioTrack listens for the
recreation and transparently rebuilds + republishes itself onto the new
handle, keeping the same instance so callers' references stay valid.

Sources that know their exact format (SineWaveAudioSource) pass it
explicitly to keep behavior deterministic. Also removes the redundant
Microphone.Start in the Meet sample.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replace the base RtcAudioSource constructor's `int channels = 2` hint with
two explicit constructors: a device-capture one that takes only the source
type and reads both sample rate and channel count from Unity's audio
configuration (falling back to the platform defaults), and an explicit one
for sources that generate a known fixed format. Either way the format is
corrected from the first captured frame, so the initial values are just a
starting point.

MicrophoneSource and BasicAudioSource now use device mode (no channel
hint); BasicAudioSource drops its unused `channels` parameter.
SineWaveAudioSource declares its exact (sampleRate, channels).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>