feat(voice): desktop 🎙 composer button + Tauri voice commands (slice 3) by oratis · Pull Request #175 · oratis/deepcode

oratis · 2026-06-08T05:57:17Z

Summary

Slice 3 of /voice: voice dictation in the Mac desktop client, mirroring the CLI. Click the 🎙 button in the composer to record, click again to stop → it transcribes locally with whisper.cpp and splices the text into the input. Fully on-device.

Stacked on #174 (slice 2), which is stacked on #173 (slice 1). Base is feat/voice-cli-capture. Merge the chain bottom-up.

Changes

Rust (src-tauri)

voice.rs:
- voice_status — detect whisper binary + model (settings or defaults) + ffmpeg; returns readiness + problems.
- voice_start / voice_stop / voice_cancel: the in-flight recording Child lives in Tauri-managed VoiceState between start and stop. Desktop uses ffmpeg and stops it gracefully by writing `q` to its stdin (flushing a valid WAV trailer), then runs whisper, parses the transcript, and deletes the temp clip.
- parse_whisper_output ported from core (timestamps + log lines stripped).
Registered in lib.rs (+ .manage(VoiceState)).
Entitlements.plist: com.apple.security.device.audio-input. Info.plist: NSMicrophoneUsageDescription (Tauri merges it into the bundle), which makes macOS show the mic-permission prompt on first use.
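To illustrate what "timestamps + log lines stripped" means for parse_whisper_output, here is a minimal sketch in TypeScript (the language of core's voice modules). It is not the ported Rust function. The segment format `[hh:mm:ss.mmm --> hh:mm:ss.mmm]  text` and the list of log-line prefixes are assumptions about whisper.cpp's stdout, and `parseWhisperOutput` is a hypothetical name.

```typescript
// Hypothetical sketch of a whisper.cpp stdout parser (not the project's code).
// ASSUMPTION: segment lines look like "[00:00:00.000 --> 00:00:02.500]   Hello."
const TIMESTAMP =
  /^\s*\[\d{2}:\d{2}:\d{2}[.,]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[.,]\d{3}\]\s*/;
// ASSUMPTION: diagnostic lines start with one of these prefixes.
const LOG_PREFIXES = ["whisper_", "main:", "system_info:", "ggml_", "output_txt:"];

function parseWhisperOutput(stdout: string): string {
  const pieces: string[] = [];
  for (const rawLine of stdout.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue;
    // Skip log/diagnostic output.
    if (LOG_PREFIXES.some((prefix) => line.startsWith(prefix))) continue;
    // Strip a leading [start --> end] timestamp and keep the spoken text.
    const text = line.replace(TIMESTAMP, "").trim();
    if (text !== "") pieces.push(text);
  }
  // Join segments with single spaces and collapse internal whitespace runs.
  return pieces.join(" ").replace(/\s+/g, " ").trim();
}
```

Lines without a timestamp (for example, output produced with timestamps turned off) pass through unchanged. An empty result is what a caller could treat as "no speech".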

Renderer

lib/voice.ts — typed voice_* wrappers + a pure insertTranscript() helper (splices at the caret with smart spacing).
lib/use-voice.ts — idle → recording → transcribing state machine; probes voice_status on mount to disable the button (with a tooltip) when voice isn't set up.
Repl.tsx — 🎙 button in the composer toolbar; the transcript is spliced at the cursor. index.css — button styling incl. a recording pulse.
preview-app.tsx — mock voice_* so the dev harness can exercise the button.
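The "splices at the caret with smart spacing" behaviour of insertTranscript() can be sketched as a pure function. The signature below, `(value, caret, transcript) → { value, caret }`, is a hypothetical assumption, not the one in lib/voice.ts. So are the rules: add a space only when the neighbouring character is not already whitespace.

```typescript
// Hypothetical sketch of a caret-aware transcript splice (not lib/voice.ts itself).
interface Spliced {
  value: string; // new composer text
  caret: number; // new caret position
}

function insertTranscript(value: string, caret: number, transcript: string): Spliced {
  const text = transcript.trim();
  // Clamp the caret into the valid range of the current text.
  const at = Math.max(0, Math.min(caret, value.length));
  if (text === "") return { value, caret: at };

  const before = value.slice(0, at);
  const after = value.slice(at);
  // Insert a separator only where the neighbour is not already whitespace.
  const needsLeadingSpace = before.length > 0 && !/\s$/.test(before);
  const needsTrailingSpace = after.length > 0 && !/^\s/.test(after);

  const insert = (needsLeadingSpace ? " " : "") + text + (needsTrailingSpace ? " " : "");
  // The caret lands just past the inserted text, including any trailing separator.
  return { value: before + insert + after, caret: before.length + insert.length };
}
```

Keeping the helper pure, with no DOM access, is what makes it unit-testable without a browser. The component only has to read `selectionStart`, call the helper, and write back the value and caret.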

Docs: VOICE_INPUT.md desktop usage; BEHAVIOR_PARITY.md /voice row now covers CLI + desktop.

Testing

Rust: cargo test — 4 voice cases (parse / expand-home / ffmpeg-args / status-with-missing-paths); cargo build clean (no warnings).
Renderer: 6 voice.ts cases (IPC command names + insertTranscript); full desktop suite: 60 passing; pnpm typecheck, vite build, pnpm lint (0 errors), pnpm format:check all clean.
Preview harness: verified the full idle → record (red ⏹) → stop → transcript spliced into the composer flow with the mock-Tauri preview (screenshots below).

⚠️ The real-microphone round-trip is not verified. There's no audio hardware in CI, CI doesn't compile Rust, and the avfoundation capture + macOS TCC permission flow can only be exercised on a packaged .app with a mic + a whisper model. The control flow, command wiring, arg-building, and transcript parsing are unit-tested; the actual record→audio→whisper round-trip needs a manual on-device pass.

🤖 Generated with Claude Code

Surface the existing core whisper.cpp engine via a `/voice` slash command and add the settings schema for it. No mic capture yet: this is the safe, self-contained foundation per docs/VOICE_INPUT.md.

Core:
- Add VoiceConfig (provider | binPath | modelPath) to settings types, re-exported from @deepcode/core (the JSON schema already had the block).
- New detectVoice() (voice/detect.ts): resolves the whisper binary (settings.binPath, else whisper-cli/whisper on PATH) and the model (settings.modelPath, else ~/.deepcode/models/whisper-base.en.bin). It never throws; missing pieces become `problems`. Injectable probes keep the tests deterministic.
- validateSettingsShallow now flags an unknown voice.provider.

CLI:
- /voice reports readiness or prints actionable setup steps (+ per-issue detail); `/voice setup` always shows install instructions.
- SessionContext gains an optional `home` (honors --home) for the default model-path probe; wired in the REPL.

Tests: 9 core detection cases, 1 schema case, 3 CLI messaging cases.

Updates the /voice BEHAVIOR_PARITY row (✗ → ✓, 🔄 → 🟡).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
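The resolution order described for detectVoice() can be sketched with injectable probes. This is not the code in voice/detect.ts. The `Probes` interface, the result shape, and the choice to check that a configured binPath exists are assumptions made for illustration. The default model path and the whisper-cli → whisper fallback come from the commit message.

```typescript
// Hypothetical sketch of detectVoice()-style resolution (not voice/detect.ts).
interface VoiceSettings {
  binPath?: string;
  modelPath?: string;
}

interface Probes {
  which: (cmd: string) => string | null; // resolve a command on PATH
  exists: (path: string) => boolean; // does a file exist?
}

interface VoiceDetection {
  ready: boolean;
  binPath: string | null;
  modelPath: string;
  problems: string[];
}

function expandHome(p: string, home: string): string {
  return p === "~" || p.startsWith("~/") ? home + p.slice(1) : p;
}

// Never throws itself (assuming the probes don't): every gap becomes a `problems` entry.
function detectVoice(settings: VoiceSettings, home: string, probes: Probes): VoiceDetection {
  const problems: string[] = [];

  let binPath: string | null = null;
  if (settings.binPath) {
    const configured = expandHome(settings.binPath, home);
    if (probes.exists(configured)) binPath = configured;
    else problems.push(`voice.binPath not found: ${configured}`);
  } else {
    // Prefer the newer whisper-cli name, then fall back to whisper.
    binPath = probes.which("whisper-cli") ?? probes.which("whisper");
    if (binPath === null) problems.push("whisper.cpp binary not found on PATH (whisper-cli or whisper)");
  }

  const modelPath = settings.modelPath
    ? expandHome(settings.modelPath, home)
    : `${home}/.deepcode/models/whisper-base.en.bin`;
  if (!probes.exists(modelPath)) problems.push(`whisper model not found: ${modelPath}`);

  return { ready: problems.length === 0, binPath, modelPath, problems };
}
```

Passing `home` in explicitly mirrors the optional `home` on SessionContext (honoring --home). Passing the probes in lets tests use fakes instead of touching the real PATH or filesystem.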

Type /voice in the REPL to dictate: record from the mic, transcribe locally with whisper.cpp, and pre-fill the input line with the transcript to edit before sending. Builds on slice 1's detection. Spec: docs/VOICE_INPUT.md.

Core:
- voice/record.ts: detectRecorder() finds ffmpeg / rec / sox on PATH; buildRecordArgs() builds the 16 kHz mono WAV command per tool + OS (avfoundation on macOS, alsa on Linux; rec/sox use the default device); recordToWav() spawns it and stops on an AbortSignal. It sends SIGINT so the WAV trailer flushes: a non-zero exit after an abort is expected, while a non-zero exit without one rejects (e.g. no mic). Injectable which/spawn for tests.
- VoiceConfig gains an optional inputDevice (ffmpeg override); schema updated.

CLI:
- voice-capture.ts: orchestrates detect → record (Enter to stop) → WhisperCppProvider.transcribe → delete the temp WAV (+ .txt side-file) → return transcript + status lines. Handles not-ready / no-recorder / no-speech / failures gracefully.
- /voice now triggers capture when the REPL wires ctx.voiceCapture, and falls back to readiness/setup output otherwise. `/voice setup` still forces the install steps. Setup lines are extracted to pure, reused helpers.
- The REPL wires voiceCapture and pre-fills the next prompt via rl.write() once the transcript is ready (ctx.prefillInput).

Docs: VOICE_INPUT.md usage now describes the /voice flow (was Ctrl+V) + a recorder-install section; BEHAVIOR_PARITY /voice row updated for CLI capture.

Tests: 9 core recorder cases (detect/buildArgs/record orchestration) + 3 new CLI cases (capture callback, cancel/empty, setup bypass). Real-mic end-to-end is manual (no audio hardware in CI). core 661 / cli 151, typecheck + lint + format all clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
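The per-tool, per-OS command building attributed to buildRecordArgs() can be sketched as a pure function. The exact flags here are assumptions, not the project's arguments. The sketch uses ffmpeg's `-f avfoundation -i ":0"` (first audio device, no video) on macOS and `-f alsa -i default` on Linux. `rec` and `sox -d` record from the default device. Every variant asks for 16 kHz, mono, 16-bit output. The `Recorder` and `RecordCommand` types are hypothetical.

```typescript
// Hypothetical sketch of buildRecordArgs()-style command building (not voice/record.ts).
type Recorder = "ffmpeg" | "rec" | "sox";
type Platform = "darwin" | "linux";

interface RecordCommand {
  cmd: string;
  args: string[];
}

// Every variant targets a 16 kHz, mono, 16-bit PCM WAV, the format whisper.cpp reads.
function buildRecordArgs(
  tool: Recorder,
  platform: Platform,
  outPath: string,
  inputDevice?: string // optional ffmpeg device override (cf. VoiceConfig.inputDevice)
): RecordCommand {
  if (tool === "ffmpeg") {
    const input =
      platform === "darwin"
        ? ["-f", "avfoundation", "-i", inputDevice ?? ":0"] // ":0" = first audio device, no video
        : ["-f", "alsa", "-i", inputDevice ?? "default"];
    return {
      cmd: "ffmpeg",
      args: ["-hide_banner", "-loglevel", "error", ...input, "-ac", "1", "-ar", "16000", "-c:a", "pcm_s16le", "-y", outPath],
    };
  }
  if (tool === "rec") {
    // rec records from the default device; options before the file describe the output.
    return { cmd: "rec", args: ["-q", "-r", "16000", "-c", "1", "-b", "16", outPath] };
  }
  // sox: "-d" selects the default audio device as the input.
  return { cmd: "sox", args: ["-q", "-d", "-r", "16000", "-c", "1", "-b", "16", outPath] };
}
```

Returning `{ cmd, args }` rather than spawning directly keeps the function trivially testable. The caller (recordToWav() in the CLI, or the desktop's ffmpeg launcher) owns the process and its stop signal.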

Adds local voice dictation to the Mac desktop client, mirroring the CLI: click 🎙 to record, click again to stop → transcribe with whisper.cpp → splice the text into the composer. Spec: docs/VOICE_INPUT.md.

Rust (src-tauri):
- voice.rs: voice_status (detect whisper bin + model + ffmpeg), voice_start / voice_stop / voice_cancel. The in-flight recording Child lives in Tauri-managed VoiceState between start/stop. Desktop uses ffmpeg and stops it gracefully by writing `q` to stdin (flushes a valid WAV), then runs whisper and deletes the temp clip. parse_whisper_output ported from core.
- Registered in lib.rs (+ .manage(VoiceState)).
- Entitlements: com.apple.security.device.audio-input; Info.plist: NSMicrophoneUsageDescription (merged into the bundle by Tauri).

Renderer:
- lib/voice.ts: typed voice_* wrappers + pure insertTranscript() helper.
- lib/use-voice.ts: idle→recording→transcribing state machine; probes voice_status on mount to disable the button (with a tooltip) when voice isn't set up.
- Repl.tsx: 🎙 button in the composer toolbar; transcript splices at the caret. index.css: button styles incl. a recording pulse.
- preview-app.tsx: mock voice_* so the dev harness can exercise the button.

Docs: VOICE_INPUT.md desktop usage; BEHAVIOR_PARITY /voice row now covers CLI + desktop.

Testing: cargo test (4 voice cases: parse/expand-home/ffmpeg-args/status) + cargo build clean (no warnings). Renderer: 6 voice.ts cases (IPC names + insertTranscript); desktop suite 60 pass; typecheck + vite build + lint + format all clean. Verified the full idle→record→stop→insert flow in the mock-Tauri preview harness (screenshots). The real-microphone round-trip (avfoundation capture + TCC permission) needs manual on-device verification, since CI compiles neither Rust nor the mic path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
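The idle → recording → transcribing cycle that use-voice.ts is described as implementing can be expressed as a small reducer. This sketch only illustrates such a state machine. The event names, the `error` field on idle, and the rule that unrelated events are ignored are assumptions, not taken from the hook. `VoiceUiState` is named to avoid confusion with the Rust-side VoiceState.

```typescript
// Hypothetical sketch of the composer button's voice state machine.
type VoiceUiState =
  | { phase: "idle"; error?: string } // ready; optionally shows the last failure
  | { phase: "recording" } // 🎙 clicked, mic capture in progress
  | { phase: "transcribing" }; // capture stopped, whisper running

type VoiceEvent =
  | { type: "start" } // first click
  | { type: "stop" } // second click
  | { type: "transcribed" } // transcript spliced into the composer
  | { type: "cancel" }
  | { type: "failed"; error: string };

function voiceReducer(state: VoiceUiState, event: VoiceEvent): VoiceUiState {
  switch (state.phase) {
    case "idle":
      // Starting a new recording clears any previous error.
      return event.type === "start" ? { phase: "recording" } : state;
    case "recording":
      if (event.type === "stop") return { phase: "transcribing" };
      if (event.type === "cancel") return { phase: "idle" };
      if (event.type === "failed") return { phase: "idle", error: event.error };
      return state; // ignore a duplicate start while recording
    case "transcribing":
      if (event.type === "transcribed") return { phase: "idle" };
      if (event.type === "failed") return { phase: "idle", error: event.error };
      return state; // the button stays inert until whisper finishes
  }
}
```

A hook could hold this in `useReducer` and issue voice_start / voice_stop alongside the start / stop transitions. How use-voice.ts actually sequences the IPC calls is not spelled out in the PR description.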

oratis · 2026-06-08T06:32:20Z

♻️ Rebased onto current main (resolving conflicts after #172 /tasks + /background landed) and retargeted the base to main so CI runs here. ✅ CI green.

Note: base is now main (not the stacked feature branch), so this PR's diff is cumulative — it includes the lower slice(s) too. Clean merge options:

Simplest: squash-merge #175 (this PR) alone, since it contains all three slices, then close the others; or
Per-slice: merge bottom-up, re-stacking each remaining branch with git rebase --onto main <old-base> after each merge (drops the already-merged slice).

oratis changed the base branch from feat/voice-cli-capture to main June 8, 2026 06:13

oratis closed this Jun 8, 2026

oratis reopened this Jun 8, 2026

t and others added 5 commits June 8, 2026 14:25

ci: trigger checks (PR retargeted to main)

22d709f

ci: trigger checks (PR retargeted to main)

cabd3a6

oratis force-pushed the feat/voice-desktop branch from c2a7cff to cabd3a6 June 8, 2026 06:29

oratis mentioned this pull request Jun 8, 2026

feat(voice): CLI mic capture + transcribe via /voice (slice 2) #174

Closed

oratis merged commit 38fcc3a into main Jun 8, 2026
3 checks passed

oratis mentioned this pull request Jun 8, 2026

feat(voice): /voice setup check + whisper.cpp detection (slice 1) #173

Closed

oratis deleted the feat/voice-desktop branch June 8, 2026 06:43

oratis merged 5 commits into main from feat/voice-desktop
