feat: voice-activity streaming mode & inner-vad for speech-to-text module#1160
IgorSwat wants to merge 12 commits into
    }
  })();

  while (this.isStreaming && !finished) {
stream() resolves as soon as this.isStreaming flips, but the native loop only re-checks the flag at the top of the next iteration — so for up to timeout + one inference after await streamStop() returns, the native streamer is still alive, can still queue callInvoker_->invokeAsync callbacks, and still touches audioBuffer_. If the caller then runs unload() (or the host object is destroyed) we're in UAF / use-after-unload territory.
Two options: (a) actually join — stream() doesn't resolve until the native stream() call returns, and streamStop() awaits that; or (b) document explicitly that unload() is not safe immediately after streamStop() and that callbacks may fire after the promise resolves. (a) is the safer contract.
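For illustration, option (a) could look roughly like the sketch below. This is not the PR's code: `FakeNativeStreamer` and `JoinedStream` are hypothetical stand-ins, and the native loop is simulated with a timer. The point is only that `stream()` and `streamStop()` share one promise that settles when the native call has actually returned.

```typescript
// Hypothetical stand-in for the native streamer: it re-checks the stop flag
// only at the top of each iteration, as described in the comment above.
class FakeNativeStreamer {
  stopRequested = false;
  alive = false;

  async run(onResult: (text: string) => void): Promise<void> {
    this.alive = true;
    let i = 0;
    while (!this.stopRequested) {
      // Simulates one timeout + inference step.
      await new Promise<void>((resolve) => setTimeout(resolve, 10));
      onResult(`chunk-${i++}`);
    }
    this.alive = false;
  }
}

// Hypothetical wrapper showing the "join" contract.
class JoinedStream {
  private native = new FakeNativeStreamer();
  private running: Promise<void> | null = null;

  // Resolves only once the native loop has returned.
  stream(onResult: (text: string) => void): Promise<void> {
    if (this.running) {
      return Promise.reject(new Error("stream already running"));
    }
    this.native.stopRequested = false;
    const run = async (): Promise<void> => {
      try {
        await this.native.run(onResult);
      } finally {
        this.running = null;
      }
    };
    this.running = run();
    return this.running;
  }

  // Requests a stop, then joins by awaiting the promise stream() returned.
  async streamStop(): Promise<void> {
    this.native.stopRequested = true;
    if (this.running) {
      await this.running;
    }
  }

  get nativeAlive(): boolean {
    return this.native.alive;
  }
}
```

With this shape, anything the caller does after `await streamStop()` (including an `unload()`) runs only after the last callback has been delivered and the loop has exited.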
    runForward((inst) => inst.stream(input));

  const streamInsert = (waveform: Float32Array) =>
    runForward((inst) => {
Both streamInsert and streamStop go through runForward, which gates on isGenerating. While stream() is running, isGenerating is true, so every streamInsert(buffer) call rejects with "model is currently generating" — which is what Jakub hit on the rapid-tap repro. That thread is marked resolved, but I don't see the fix in either useVAD.ts or VADModule.ts. At minimum streamInsert (a buffer push) must bypass runForward for the streaming API to function; arguably streamStop should bypass too — you may want to stop precisely because inference is stuck.
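A minimal sketch of the bypass, under assumptions: `StreamingInstance` and `makeController` are hypothetical names, and the gate below only approximates what `runForward` does with `isGenerating`. The real factory may differ.

```typescript
// Hypothetical instance shape; the real module's interface may differ.
interface StreamingInstance {
  stream(): Promise<void>;
  streamInsert(waveform: Float32Array): void;
  streamStop(): void;
}

function makeController(inst: StreamingInstance) {
  let isGenerating = false;

  // Approximation of the gate: only one inference-style call at a time.
  async function runForward<T>(
    fn: (i: StreamingInstance) => Promise<T> | T
  ): Promise<T> {
    if (isGenerating) {
      throw new Error("model is currently generating");
    }
    isGenerating = true;
    try {
      return await fn(inst);
    } finally {
      isGenerating = false;
    }
  }

  return {
    // Long-running call: holds the gate for the whole stream.
    stream: () => runForward((i) => i.stream()),
    // Buffer push and stop go straight to the instance, bypassing the gate.
    streamInsert: (waveform: Float32Array) => inst.streamInsert(waveform),
    streamStop: () => inst.streamStop(),
    // For contrast: the gated variant, which rejects while stream() runs.
    gatedStreamInsert: (waveform: Float32Array) =>
      runForward((i) => i.streamInsert(waveform)),
  };
}
```

While `stream()` is pending, `gatedStreamInsert` rejects with the "model is currently generating" error, whereas the ungated `streamInsert` and `streamStop` still reach the instance.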
If it's true, then that's a Module Factory issue - and I don't think this PR is a good place to fix it.
chmjkb left a comment
besides my previous comment, I think it looks good, great work!

Description
This PR introduces changes focused on the voice-activity-detection module and its use within the library:
VoiceActivityDetectionScreen and changes the behavior of SpeechToTextScreen, adding a toggle to switch the VAD submodule for STT on/off.
Introduces a breaking change?
Type of change
Tested on
Testing instructions
VoiceActivityDetectionScreen within the Speech demo app.
SpeechToTextScreen within the Speech demo app, with VAD toggle on.
Screenshots
Related issues
#1118
Checklist
Additional notes