Match native audio source format to device, recreate on sample-rate mismatch #303
Open
MaxHeimbrock wants to merge 3 commits into
The native (Rust) audio source was created with a hardcoded sample rate (48000) and channel count (2). Microphone frames, however, arrive at Unity's actual DSP output configuration, which can differ, most notably when a Bluetooth headset connects and switches the output rate. The Rust native source does not resample; it rejects frames whose rate/channels don't match its configuration with "sample_rate and num_channels don't match", producing the metadata-mismatch warning and capture failures.

RtcAudioSource now initializes the native source from Unity's real output configuration (AudioSettings.GetConfiguration) instead of hardcoded defaults, and adds a runtime safety net: when a captured frame's format does not match the live source, the frame is dropped and the native source is recreated to match (coalesced and marshaled to the main thread).

Because a track is bound to a specific source handle at creation and cannot follow a new one in place, LocalAudioTrack listens for the recreation and transparently rebuilds and republishes itself onto the new handle, keeping the same instance so callers' references stay valid.

Sources that know their exact format (SineWaveAudioSource) pass it explicitly to keep behavior deterministic. Also removes the redundant Microphone.Start in the Meet sample.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the base RtcAudioSource constructor's `int channels = 2` hint with two explicit constructors: a device-capture one that takes only the source type and reads both sample rate and channel count from Unity's audio configuration (falling back to the platform defaults), and an explicit one for sources that generate a known fixed format. Either way the format is corrected from the first captured frame, so the initial values are just a starting point.

MicrophoneSource and BasicAudioSource now use device mode (no channel hint); BasicAudioSource drops its unused `channels` parameter. SineWaveAudioSource declares its exact (sampleRate, channels).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Problem
When using a Bluetooth headset (or any time Unity's audio output configuration differs from the hardcoded defaults), microphone capture fails with `sample_rate and num_channels don't match` and the `RtcAudioSource` metadata-mismatch warning.

Root cause: `RtcAudioSource` created the native (Rust) audio source with a fixed `sample_rate` (48000) and `num_channels` (2). But captured frames flow through Unity's audio graph (`AudioProbe.OnAudioFilterRead`) at the DSP output configuration, which a Bluetooth headset can change at runtime. The Rust native source does not resample: `NativeAudioSource::capture_frame` rejects any frame whose rate/channels differ from how the source was configured.

Changes
- `RtcAudioSource`: initializes the native source from Unity's actual output configuration (`AudioSettings.GetConfiguration`: sample rate, plus channels derived from the speaker mode), falling back to the platform defaults when Unity can't report one. Adds a runtime safety net: a captured frame whose format doesn't match the live source is dropped, and the native source is recreated to match. Recreation is coalesced and marshaled from the audio thread to the main thread via the captured `UnitySynchronizationContext`. `Handle` is now mutable and a `NativeSourceChanged` event fires on recreation.
- `LocalAudioTrack` / `Track`: a published track is bound to a specific source handle at creation and can't follow a new one in place, so `LocalAudioTrack` listens for `NativeSourceChanged` and transparently rebuilds (unpublish → re-create the FFI track onto the new handle (`SwapHandle`) → republish), keeping the same instance so callers' references stay valid.
- `Participant.PublishTrack`: remembers `(participant, options)` on the audio track so it can republish itself after a recreation. Initial publish path is unchanged.
- `MeetManager` (Meet sample): removed the redundant `Microphone.Start(null, …, 44100)`.
- `SineWaveAudioSource` (tests): passes its explicit sample rate to the base constructor so test behavior stays deterministic (no rebuild churn).

The hybrid approach preserves the existing publish-before-start contract: the native source still exists at construction, so existing callers and E2E tests that publish without starting the source continue to work.
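For reviewers who want the drop-and-recreate logic in isolation, here is a minimal, self-contained model of it. It is a sketch only and is not the SDK's code: `Format`, `StrictSource`, `Capturer`, `on_frame`, and `pump_main_thread` are hypothetical names invented for illustration, the model is single-threaded (the real change hops from the audio thread to the main thread), and the only detail taken from this PR is that the source rejects mismatched frames instead of resampling.

```rust
/// Audio format of a frame or a source (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Format {
    sample_rate: u32,
    num_channels: u32,
}

/// Stand-in for a strict native source: it never resamples and
/// rejects any frame whose format differs from its own.
struct StrictSource {
    format: Format,
    accepted: usize,
}

impl StrictSource {
    fn new(format: Format) -> Self {
        StrictSource { format, accepted: 0 }
    }

    fn capture_frame(&mut self, frame: Format) -> Result<(), String> {
        if frame != self.format {
            return Err("sample_rate and num_channels don't match".to_string());
        }
        self.accepted += 1;
        Ok(())
    }
}

/// Wrapper that drops mismatched frames and coalesces recreate requests.
struct Capturer {
    source: StrictSource,
    /// Latest format seen on a mismatched frame; at most one request is queued.
    pending: Option<Format>,
    recreations: usize,
}

impl Capturer {
    fn new(initial: Format) -> Self {
        Capturer { source: StrictSource::new(initial), pending: None, recreations: 0 }
    }

    /// Models the audio-thread callback. Returns true if the frame was captured.
    fn on_frame(&mut self, frame: Format) -> bool {
        if frame != self.source.format {
            // Drop the frame and remember the format we want; repeated
            // mismatches overwrite the same slot, so they coalesce.
            self.pending = Some(frame);
            return false;
        }
        // Formats agree again, so any queued request is stale.
        self.pending = None;
        self.source.capture_frame(frame).is_ok()
    }

    /// Models the main-thread step. Returns true if the source was recreated.
    fn pump_main_thread(&mut self) -> bool {
        match self.pending.take() {
            Some(format) => {
                self.source = StrictSource::new(format);
                self.recreations += 1;
                true
            }
            None => false,
        }
    }
}
```

In this model, a `Capturer` created at 48000 Hz / 2 channels drops every frame that arrives in another format (the 16000 Hz / 1 channel value used in the test is purely illustrative), and one call to `pump_main_thread` recreates the source once no matter how many frames were dropped in between. A listener such as the `NativeSourceChanged` handler described above would hang off the point where `pump_main_thread` returns true.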
Verification
livekit-server --dev) and an on-device Bluetooth connect/disconnect repro.

🤖 Generated with Claude Code