feat: voice-activity streaming mode & inner-vad for speech-to-text module by IgorSwat · Pull Request #1160 · software-mansion/react-native-executorch

IgorSwat · 2026-05-20T13:09:00Z

Description

This PR introduces changes focused on the voice-activity-detection (VAD) module and its use within the library:

Native-side VAD streaming: introduces a continuous voice-activity-detection mechanism with a user-friendly callback system. Example usage from the demo app:

  await model.stream({
    onSpeechBegin: () => {...},
    onSpeechEnd: () => {...},
    options: {...},
  });

VAD x STT integration: adds an option to use voice-activity detection within the speech-to-text (STT) module, significantly improving the effective performance of the STT.
Demo apps: introduces a new screen in the speech demo app, VoiceActivityDetectionScreen, and changes the behavior of SpeechToTextScreen by adding a toggle that switches the VAD submodule for STT on or off.
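
To illustrate the callback contract described above (one onSpeechBegin per speech segment, one onSpeechEnd once it is over), here is a minimal sketch. It is not the library's implementation: the real VAD is model-based and runs natively, while this stand-in uses a simple energy threshold. The class name EnergyVad, the insert method, and both tuning parameters are assumptions made up for this example.

```typescript
// Hypothetical sketch: a tiny energy-based detector, NOT the library's model-based VAD.
type VadCallbacks = {
  onSpeechBegin: () => void;
  onSpeechEnd: () => void;
};

class EnergyVad {
  private speaking = false;
  private quietFrames = 0;
  private callbacks: VadCallbacks;
  private threshold: number; // mean-square energy threshold (assumed value)
  private hangoverFrames: number; // quiet frames required before speech ends (assumed value)

  constructor(callbacks: VadCallbacks, threshold = 0.01, hangoverFrames = 2) {
    this.callbacks = callbacks;
    this.threshold = threshold;
    this.hangoverFrames = hangoverFrames;
  }

  // Feed one frame of samples; callbacks fire on state transitions only.
  insert(frame: Float32Array): void {
    let energy = 0;
    for (let i = 0; i < frame.length; i++) {
      energy += frame[i] * frame[i];
    }
    energy /= Math.max(frame.length, 1);

    if (energy >= this.threshold) {
      this.quietFrames = 0;
      if (!this.speaking) {
        this.speaking = true;
        this.callbacks.onSpeechBegin();
      }
    } else if (this.speaking) {
      this.quietFrames += 1;
      if (this.quietFrames >= this.hangoverFrames) {
        this.speaking = false;
        this.quietFrames = 0;
        this.callbacks.onSpeechEnd();
      }
    }
  }
}
```

The hangover counter keeps a short pause inside a sentence from being reported as the end of speech. An STT pipeline gated this way would only need to transcribe audio collected between the two callbacks.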

Introduces a breaking change?

Yes
No

Type of change

Bug fix (change which fixes an issue)
New feature (change which adds functionality)
Documentation update (improves or adds clarity to existing documentation)
Other (chores, tests, code style improvements etc.)

Tested on

iOS
Android

Testing instructions

To test VAD streaming: run the VoiceActivityDetectionScreen within the Speech demo app.
To test the VAD and STT integration: run the SpeechToTextScreen within the Speech demo app, with the VAD toggle on.

Screenshots

Related issues

#1118

Checklist

I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have updated the documentation accordingly
My changes generate no new warnings

Additional notes

msluszniak · 2026-05-21T14:01:02Z

Please also fix these warnings:

msluszniak · 2026-05-21T15:39:26Z

+      }
+    })();
+
+    while (this.isStreaming && !finished) {


stream() resolves as soon as this.isStreaming flips, but the native loop only re-checks the flag at the top of the next iteration — so for up to timeout + one inference after await streamStop() returns, the native streamer is still alive, can still queue callInvoker_->invokeAsync callbacks, and still touches audioBuffer_. If the caller then runs unload() (or the host object is destroyed) we're in UAF / use-after-unload territory.

Two options: (a) actually join — stream() doesn't resolve until the native stream() call returns, and streamStop() awaits that; or (b) document explicitly that unload() is not safe immediately after streamStop() and that callbacks may fire after the promise resolves. (a) is the safer contract.
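
Option (a) can be sketched in plain TypeScript as follows. This is an illustration of the join contract only, not the code under review: JoinedStreamer, the step callback (standing in for one native inference), and the iterations counter are hypothetical names.

```typescript
// Hypothetical streamer: names mirror the discussion, not the actual VADModule code.
class JoinedStreamer {
  private isStreaming = false;
  private loop: Promise<void> | null = null;
  iterations = 0;

  // Starts the loop; the returned promise resolves only after the loop has fully exited.
  stream(step: () => Promise<void>): Promise<void> {
    if (this.loop) {
      throw new Error("stream already running");
    }
    this.isStreaming = true;
    const loop = (async () => {
      try {
        while (this.isStreaming) {
          await step(); // stands in for one native inference
          this.iterations += 1;
        }
      } finally {
        this.loop = null;
      }
    })();
    this.loop = loop;
    return loop;
  }

  // Flips the flag, then waits for the in-flight iteration to finish (the "join").
  async streamStop(): Promise<void> {
    this.isStreaming = false;
    if (this.loop) {
      await this.loop;
    }
  }

  get running(): boolean {
    return this.loop !== null;
  }
}
```

Under this contract, code that runs after `await streamStop()` can assume no further iteration is in flight, so a subsequent teardown does not race with the loop.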

msluszniak · 2026-05-21T15:39:28Z

+    runForward((inst) => inst.stream(input));
+
+  const streamInsert = (waveform: Float32Array) =>
+    runForward((inst) => {


Both streamInsert and streamStop go through runForward, which gates on isGenerating. While stream() is running, isGenerating is true, so every streamInsert(buffer) call rejects with "model is currently generating" — which is what Jakub hit on the rapid-tap repro. That thread is marked resolved, but I don't see the fix in either useVAD.ts or VADModule.ts. At minimum streamInsert (a buffer push) must bypass runForward for the streaming API to function; arguably streamStop should bypass too — you may want to stop precisely because inference is stuck.

If it's true, then that's a Module Factory issue - and I don't think this PR is a good place to fix it.
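
The gating behaviour discussed in this thread can be reproduced with a small stand-alone sketch. It assumes a single busy flag shared by every gated call, as the review comment describes. GatedModule and its method names are hypothetical and do not reflect the real module factory.

```typescript
// Hypothetical gate: mimics the described behaviour, not the real module factory.
class GatedModule {
  private isGenerating = false;
  private release: (() => void) | null = null;
  readonly buffer: Float32Array[] = [];

  // Rejects when another gated call is already in flight.
  private async runForward<T>(fn: () => Promise<T> | T): Promise<T> {
    if (this.isGenerating) {
      throw new Error("model is currently generating");
    }
    this.isGenerating = true;
    try {
      return await fn();
    } finally {
      this.isGenerating = false;
    }
  }

  // Long-running gated call: holds the gate until streamStopUngated() is called.
  stream(): Promise<void> {
    return this.runForward<void>(
      () =>
        new Promise<void>((resolve) => {
          this.release = () => resolve();
        })
    );
  }

  // Goes through the gate, so it rejects while stream() is active.
  streamInsertGated(chunk: Float32Array): Promise<void> {
    return this.runForward<void>(() => {
      this.buffer.push(chunk);
    });
  }

  // Plain buffer push that bypasses the gate.
  streamInsertUngated(chunk: Float32Array): void {
    this.buffer.push(chunk);
  }

  // Stop signal that bypasses the gate, so it works even while the gate is held.
  streamStopUngated(): void {
    if (this.release) {
      this.release();
      this.release = null;
    }
  }
}
```

While stream() holds the gate, streamInsertGated rejects with the same message quoted in the review, whereas the ungated variants still get through. Whether the bypass belongs in the module or in the factory is the open question in this thread.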

chmjkb

Besides my previous comment, I think it looks good. Great work!

IgorSwat requested review from chmjkb and msluszniak May 20, 2026 13:09

IgorSwat force-pushed the @is/vad-streaming branch from 694fe4f to 1c2411e Compare May 20, 2026 13:15

IgorSwat changed the base branch from main to @is/speech-to-text-ultimate May 20, 2026 13:26

chmjkb requested changes May 20, 2026

msluszniak reviewed May 20, 2026

IgorSwat force-pushed the @is/speech-to-text-ultimate branch from 02113ff to 6bba141 Compare May 20, 2026 15:46

msluszniak reviewed May 20, 2026

Comment thread ...ve-executorch/common/rnexecutorch/models/voice_activity_detection/VoiceActivityDetection.cpp Outdated

chmjkb requested changes May 21, 2026

Comment thread apps/speech/screens/SpeechToTextScreen.tsx

Comment thread apps/speech/screens/VoiceActivityDetectionScreen.tsx

Base automatically changed from @is/speech-to-text-ultimate to main May 21, 2026 08:20

IgorSwat added 8 commits May 21, 2026 10:34

Add CoreML whisper models

c1a1e97

Update urls & audio-api

8780a78

Add CoreML whisper models

3a97a75

Implement VAD streaming

b94d5f8

Integrate VAD with STT

0d2dbf1

Fix wrong include issue

b8cf8fa

Rebase with other PR changes

4782eda

Bump audio-api version

0ea858d

IgorSwat force-pushed the @is/vad-streaming branch from 1c2411e to 0ea858d Compare May 21, 2026 08:55

IgorSwat added 2 commits May 21, 2026 11:38

Apply review suggestions

790fb9c

Fix demo app keyboard behavior

dc5113d

msluszniak assigned IgorSwat May 21, 2026

msluszniak added the feature PRs that implement a new feature label May 21, 2026

Update demos & change default STT model for iOS simulator

177ce98

IgorSwat requested a review from benITo47 May 21, 2026 12:49

chmjkb reviewed May 21, 2026

Comment thread docs/docs/03-hooks/01-natural-language-processing/useSpeechToText.md Outdated

chmjkb reviewed May 21, 2026

Comment thread docs/docs/04-typescript-api/01-natural-language-processing/VADModule.md Outdated

benITo47 reviewed May 21, 2026

Comment thread ...ages/react-native-executorch/common/rnexecutorch/models/voice_activity_detection/Constants.h Outdated

benITo47 requested changes May 21, 2026

msluszniak reviewed May 21, 2026

Comment thread ...ve-executorch/common/rnexecutorch/models/voice_activity_detection/VoiceActivityDetection.cpp Outdated

msluszniak reviewed May 21, 2026

Comment thread ...ve-executorch/common/rnexecutorch/models/voice_activity_detection/VoiceActivityDetection.cpp Outdated

msluszniak reviewed May 21, 2026

Comment thread ...ages/react-native-executorch/common/rnexecutorch/models/voice_activity_detection/Constants.h Outdated

msluszniak reviewed May 21, 2026

Comment thread packages/react-native-executorch/common/rnexecutorch/models/voice_activity_detection/Utils.cpp Outdated

msluszniak linked an issue May 21, 2026 that may be closed by this pull request

Implement continuous voice activity detection #1118

Open

chmjkb reviewed May 22, 2026

Apply review suggestions

6efe72e

This was referenced May 22, 2026

VAD: duration constants use hop units while neighbours use ms #1172

Open

VAD streaming: streamInsert/streamStop blocked by isGenerating gate during stream() #1173

Open
