feat(voice): desktop 🎙 composer button + Tauri voice commands (slice 3) #175
Merged
Surface the existing core whisper.cpp engine via a `/voice` slash command and add the settings schema for it. No mic capture yet: this is the safe, self-contained foundation per docs/VOICE_INPUT.md.

Core:
- Add VoiceConfig (provider | binPath | modelPath) to settings types, re-exported from @deepcode/core (the JSON schema already had the block).
- New detectVoice() (voice/detect.ts): resolves the whisper binary (settings.binPath, else whisper-cli/whisper on PATH) and the model (settings.modelPath, else ~/.deepcode/models/whisper-base.en.bin). It never throws; missing pieces become `problems`. Injectable probes for deterministic tests.
- validateSettingsShallow now flags an unknown voice.provider.

CLI:
- /voice reports readiness or prints actionable setup steps (+ per-issue detail); `/voice setup` always shows install instructions.
- SessionContext gains an optional `home` (honors --home) for the default model-path probe; wired in the REPL.

Tests: 9 core detection cases, 1 schema case, 3 CLI messaging cases.

Updates the /voice BEHAVIOR_PARITY row (✗ → ✓, 🔄 → 🟡).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
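The detection contract described in this commit (resolve binary and model, never throw, report `problems`, accept injectable probes) can be sketched roughly as follows. This is an illustrative sketch, not the code in voice/detect.ts: the `VoiceProbes` and `VoiceStatus` shapes, the problem strings, and the choice to report a configured-but-missing `binPath` as a problem instead of falling back to PATH are all assumptions.

```typescript
// Hypothetical shapes; the real VoiceConfig / detectVoice signatures may differ.
interface VoiceConfig {
  provider?: string;
  binPath?: string;
  modelPath?: string;
}

// Injectable probes so tests never touch the real filesystem or PATH.
interface VoiceProbes {
  which(cmd: string): string | null; // path of a command on PATH, or null
  exists(path: string): boolean; // true when a file exists at the path
  home: string; // home directory used for the default model path
}

interface VoiceStatus {
  ready: boolean;
  binPath: string | null;
  modelPath: string | null;
  problems: string[];
}

function detectVoice(config: VoiceConfig, probes: VoiceProbes): VoiceStatus {
  const problems: string[] = [];

  // Binary: explicit setting first, otherwise whisper-cli, then whisper, on PATH.
  let binPath: string | null = null;
  if (config.binPath) {
    if (probes.exists(config.binPath)) binPath = config.binPath;
    else problems.push(`voice.binPath not found: ${config.binPath}`);
  } else {
    binPath = probes.which("whisper-cli") ?? probes.which("whisper");
    if (!binPath) problems.push("whisper binary not found on PATH");
  }

  // Model: explicit setting first, otherwise the default under the home directory.
  const candidate =
    config.modelPath ?? `${probes.home}/.deepcode/models/whisper-base.en.bin`;
  const modelPath = probes.exists(candidate) ? candidate : null;
  if (!modelPath) problems.push(`model not found: ${candidate}`);

  // Missing pieces are reported, never thrown.
  return { ready: problems.length === 0, binPath, modelPath, problems };
}
```

Passing fake probes makes each case deterministic: a test can simulate a missing binary or model without installing or removing anything.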
Type /voice in the REPL to dictate: record from the mic, transcribe locally with whisper.cpp, and pre-fill the input line with the transcript to edit before sending. Builds on slice 1's detection. Spec: docs/VOICE_INPUT.md.

Core:
- voice/record.ts: detectRecorder() finds ffmpeg / rec / sox on PATH; buildRecordArgs() builds the 16 kHz mono WAV command per tool + OS (avfoundation on macOS, alsa on Linux; rec/sox use the default device); recordToWav() spawns it and stops on an AbortSignal (SIGINT so the WAV trailer flushes). A non-zero exit after abort is expected; a non-zero exit without one rejects (e.g. no mic). Injectable which/spawn for tests.
- VoiceConfig gains optional inputDevice (ffmpeg override); schema updated.

CLI:
- voice-capture.ts: orchestrates detect → record (Enter to stop) → WhisperCppProvider.transcribe → delete the temp WAV (+ .txt side-file) → return transcript + status lines. Handles not-ready / no-recorder / no-speech / failures gracefully.
- /voice now triggers capture when the REPL wires ctx.voiceCapture; falls back to readiness/setup output otherwise. `/voice setup` still forces the install steps. Setup lines extracted to pure, reused helpers.
- REPL wires voiceCapture and pre-fills the next prompt via rl.write() once the transcript is ready (ctx.prefillInput).

Docs: VOICE_INPUT.md usage now describes the /voice flow (was Ctrl+V) + a recorder-install section; BEHAVIOR_PARITY /voice row updated for CLI capture.

Tests: 9 core recorder cases (detect/buildArgs/record orchestration) + 3 new CLI cases (capture callback, cancel/empty, setup bypass). Real-mic end-to-end is manual (no audio hardware in CI). core 661 / cli 151, typecheck + lint + format all clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
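As a rough illustration of the per-tool, per-OS argument building and the exit-code rule described in this commit, the sketch below builds a 16 kHz mono WAV command line for each recorder. It is not the code in voice/record.ts: the function signatures, the default devices (`:0` for avfoundation, `default` for alsa), the `-y` overwrite flag, and the `recordingFailed` helper are assumptions made for the example.

```typescript
type Recorder = "ffmpeg" | "rec" | "sox";
type Platform = "darwin" | "linux";

// Hypothetical signature; returns the argv (without the program name).
function buildRecordArgs(
  tool: Recorder,
  platform: Platform,
  outPath: string,
  inputDevice?: string, // optional ffmpeg input override
): string[] {
  if (tool === "ffmpeg") {
    // avfoundation on macOS, alsa on Linux; device defaults are assumptions.
    const input =
      platform === "darwin"
        ? ["-f", "avfoundation", "-i", inputDevice ?? ":0"]
        : ["-f", "alsa", "-i", inputDevice ?? "default"];
    // -ar 16000 sets the sample rate, -ac 1 forces mono, -y overwrites the file.
    return ["-y", ...input, "-ar", "16000", "-ac", "1", outPath];
  }
  if (tool === "rec") {
    // rec records from the default device; -r is the rate, -c the channel count.
    return ["-r", "16000", "-c", "1", outPath];
  }
  // sox needs -d to select the default input device.
  return ["-d", "-r", "16000", "-c", "1", outPath];
}

// A non-zero (or signal-terminated) exit only counts as a failure when the
// caller did not abort the recording.
function recordingFailed(exitCode: number | null, aborted: boolean): boolean {
  return !aborted && exitCode !== 0;
}
```

Keeping the argument builder pure means the per-tool and per-OS branches can be covered in CI without audio hardware; only the spawn step needs a real device.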
Adds local voice dictation to the Mac desktop client, mirroring the CLI: click 🎙 to record, click again to stop → transcribe with whisper.cpp → splice the text into the composer. Spec: docs/VOICE_INPUT.md.

Rust (src-tauri):
- voice.rs: voice_status (detect whisper bin + model + ffmpeg), voice_start / voice_stop / voice_cancel. The in-flight recording Child lives in tauri-managed VoiceState between start/stop. Desktop uses ffmpeg and stops it gracefully by writing `q` to stdin (flushes a valid WAV), then runs whisper and deletes the temp clip. parse_whisper_output ported from core.
- Registered in lib.rs (+ .manage(VoiceState)).
- Entitlements: com.apple.security.device.audio-input; Info.plist: NSMicrophoneUsageDescription (merged into the bundle by Tauri).

Renderer:
- lib/voice.ts: typed voice_* wrappers + pure insertTranscript() helper.
- lib/use-voice.ts: idle→recording→transcribing state machine; probes voice_status on mount to disable the button (with a tooltip) when unset.
- Repl.tsx: 🎙 button in the composer toolbar; transcript splices at the caret. index.css: button styles incl. a recording pulse.
- preview-app.tsx: mock voice_* so the dev harness can exercise the button.

Docs: VOICE_INPUT.md desktop usage; BEHAVIOR_PARITY /voice row now covers CLI + desktop.

Testing: cargo test (4 voice cases: parse/expand-home/ffmpeg-args/status) + cargo build clean (no warnings). Renderer: 6 voice.ts cases (IPC names + insertTranscript); desktop suite 60 pass; typecheck + vite build + lint + format all clean. Verified the full idle→record→stop→insert flow in the mock-Tauri preview harness (screenshots). The real-microphone round-trip (avfoundation capture + TCC permission) needs manual on-device verification: CI compiles neither Rust nor the mic path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
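Since parse_whisper_output is described as a port from core, the parsing step can be sketched in TypeScript. This is only a guess at the behaviour ("timestamps + log lines stripped"): the function name `parseWhisperOutput`, the timestamp pattern, and the list of prefixes treated as log lines are assumptions for illustration, not the actual rules in core or voice.rs.

```typescript
// Leading "[00:00:00.000 --> 00:00:02.500]" segment marker printed before each line.
const TIMESTAMP = /^\[\d{2}:\d{2}:\d{2}\.\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}\.\d{3}\]\s*/;

// Assumed log-line prefixes; the real filter may use a different list.
const LOG_LINE = /^(whisper_|ggml_|main:|system_info:)/;

// Hypothetical helper: turns raw whisper stdout into a single transcript line.
function parseWhisperOutput(raw: string): string {
  return raw
    .split(/\r?\n/)
    .map((line) => line.trim())
    // Drop blank lines and anything that looks like engine logging.
    .filter((line) => line.length > 0 && !LOG_LINE.test(line))
    // Remove the timestamp prefix, keeping only the spoken text.
    .map((line) => line.replace(TIMESTAMP, "").trim())
    .filter((line) => line.length > 0)
    .join(" ");
}
```

An empty result from a helper like this is one way the caller could detect the no-speech case and skip inserting anything.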
c2a7cff to cabd3a6
♻️ Rebased onto current Note: base is now
## Summary

Slice 3 of `/voice`: voice dictation in the Mac desktop client, mirroring the CLI. Click the 🎙 button in the composer to record, click again to stop → it transcribes locally with whisper.cpp and splices the text into the input. Fully on-device.

## Changes

### Rust (`src-tauri`)

- `voice.rs`:
  - `voice_status`: detect whisper binary + model (settings or defaults) + ffmpeg; returns readiness + problems.
  - `voice_start` / `voice_stop` / `voice_cancel`: the in-flight recording `Child` lives in tauri-managed `VoiceState` between start and stop. Desktop uses ffmpeg and stops it gracefully by writing `q` to stdin (flushes a valid WAV trailer), then runs whisper, parses the transcript, and deletes the temp clip.
  - `parse_whisper_output` ported from core (timestamps + log lines stripped).
- Registered in `lib.rs` (+ `.manage(VoiceState)`).
- `Entitlements.plist`: `com.apple.security.device.audio-input`.
- `Info.plist`: `NSMicrophoneUsageDescription` (Tauri merges it into the bundle) → the OS mic-permission prompt on first use.

### Renderer

- `lib/voice.ts`: typed `voice_*` wrappers + a pure `insertTranscript()` helper (splices at the caret with smart spacing).
- `lib/use-voice.ts`: `idle → recording → transcribing` state machine; probes `voice_status` on mount to disable the button (with a tooltip) when voice isn't set up.
- `Repl.tsx`: 🎙 button in the composer toolbar; the transcript is spliced at the cursor.
- `index.css`: button styling incl. a recording pulse.
- `preview-app.tsx`: mock `voice_*` so the dev harness can exercise the button.

Docs: `VOICE_INPUT.md` desktop usage; `BEHAVIOR_PARITY.md` `/voice` row now covers CLI + desktop.

## Testing

- `cargo test`: 4 voice cases (parse / expand-home / ffmpeg-args / status-with-missing-paths); `cargo build` clean (no warnings).
- Renderer: 6 `voice.ts` cases (IPC command names + `insertTranscript`); full desktop suite 60 pass; `pnpm typecheck`, `vite build`, `pnpm lint` (0 errors), `pnpm format:check` all clean.
- Not verified: the real-microphone path in a built `.app` with a mic + a whisper model. The control flow, command wiring, arg-building, and transcript parsing are unit-tested; the actual record→audio→whisper round-trip needs a manual on-device pass.

🤖 Generated with Claude Code
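To make "splices at the caret with smart spacing" concrete, here is a minimal sketch of what a pure `insertTranscript()` helper could look like. The signature, the `InsertResult` return shape, and the exact spacing rule (add a space on either side only when the neighbouring character is not already whitespace) are assumptions; the helper in `lib/voice.ts` may behave differently.

```typescript
// Hypothetical return shape: the new composer text and where the caret ends up.
interface InsertResult {
  text: string;
  caret: number;
}

function insertTranscript(value: string, caret: number, transcript: string): InsertResult {
  const clean = transcript.trim();
  // Clamp the caret so out-of-range positions cannot corrupt the splice.
  const pos = Math.max(0, Math.min(caret, value.length));
  if (clean.length === 0) return { text: value, caret: pos };

  const before = value.slice(0, pos);
  const after = value.slice(pos);

  // Add a separating space only when the neighbour is not already whitespace.
  const lead = before.length > 0 && !/\s$/.test(before) ? " " : "";
  const trail = after.length > 0 && !/^\s/.test(after) ? " " : "";

  const inserted = lead + clean + trail;
  return { text: before + inserted + after, caret: before.length + inserted.length };
}

// Example: dictating "big" with the caret after "hello" in "hello world".
const example = insertTranscript("hello world", 5, "big");
console.log(example.text, example.caret);
```

Because the helper takes and returns plain values, it can be unit-tested without a DOM, which fits the description's note that `insertTranscript` is covered by the renderer's `voice.ts` cases.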