
feat(voice): desktop 🎙 composer button + Tauri voice commands (slice 3)#175

Merged
oratis merged 5 commits into main from feat/voice-desktop on Jun 8, 2026

Conversation

@oratis oratis commented Jun 8, 2026

Owner

Summary

Slice 3 of /voice: voice dictation in the Mac desktop client, mirroring the CLI. Click the 🎙 button in the composer to record, click again to stop → it transcribes locally with whisper.cpp and splices the text into the input. Fully on-device.

Stacked on #174 (slice 2), which is stacked on #173 (slice 1). Base is feat/voice-cli-capture. Merge the chain bottom-up.

Changes

Rust (src-tauri)

  • voice.rs:
    • voice_status — detect whisper binary + model (settings or defaults) + ffmpeg; returns readiness + problems.
    • voice_start / voice_stop / voice_cancel — the in-flight recording Child lives in tauri-managed VoiceState between start and stop. Desktop uses ffmpeg and stops it gracefully by writing q to stdin (flushes a valid WAV trailer), then runs whisper, parses the transcript, and deletes the temp clip.
    • parse_whisper_output ported from core (timestamps + log lines stripped).
  • Registered in lib.rs (+ .manage(VoiceState)).
  • Entitlements.plist: com.apple.security.device.audio-input. Info.plist: NSMicrophoneUsageDescription (Tauri merges it into the bundle) → the OS mic-permission prompt on first use.
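
Since parse_whisper_output is described as a port of the core parser, its job can be sketched in TypeScript. This is a minimal illustration, not the code in voice.rs: the function name `parseWhisperOutput`, the timestamp pattern, and the list of log-line prefixes are assumptions about what whisper.cpp prints.

```typescript
// Hypothetical sketch; the real parser in core / voice.rs may differ.
// Assumed segment prefix: "[00:00:00.000 --> 00:00:02.000]".
const TIMESTAMP = /^\[\d{2}:\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}:\d{2}\.\d{3}\]\s*/;
// Assumed log-line prefixes; adjust to whatever the local whisper build emits.
const LOG_LINE = /^(whisper_\w+|ggml_\w+|main|system_info):/;

function parseWhisperOutput(raw: string): string {
  return raw
    .split(/\r?\n/)
    .map((line) => line.trim())
    // Drop blank lines and anything that looks like a log line.
    .filter((line) => line.length > 0 && !LOG_LINE.test(line))
    // Strip the leading timestamp from each remaining segment.
    .map((line) => line.replace(TIMESTAMP, "").trim())
    .filter((line) => line.length > 0)
    .join(" ");
}
```

Joining segments with a single space keeps the result suitable for splicing into a one-line composer input.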

Renderer

  • lib/voice.ts — typed voice_* wrappers + a pure insertTranscript() helper (splices at the caret with smart spacing).
  • lib/use-voice.ts: idle → recording → transcribing state machine; probes voice_status on mount to disable the button (with a tooltip) when voice isn't set up.
  • Repl.tsx — 🎙 button in the composer toolbar; the transcript is spliced at the cursor. index.css — button styling incl. a recording pulse.
  • preview-app.tsx — mock voice_* so the dev harness can exercise the button.
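
The "splices at the caret with smart spacing" behaviour of insertTranscript() could look roughly like the sketch below. The signature, the `SpliceResult` shape, and the exact spacing rule are assumptions for illustration; only the helper's name and purpose come from this PR.

```typescript
// Hypothetical shape: the new input value plus where the caret should land.
interface SpliceResult {
  value: string;
  caret: number;
}

function insertTranscript(value: string, caret: number, transcript: string): SpliceResult {
  const text = transcript.trim();
  if (text.length === 0) return { value, caret }; // nothing to insert

  // Clamp the caret so an out-of-range position cannot corrupt the slice.
  const pos = Math.max(0, Math.min(caret, value.length));
  const before = value.slice(0, pos);
  const after = value.slice(pos);

  // Assumed "smart spacing": pad only where the neighbour is not whitespace.
  const lead = before.length > 0 && !/\s$/.test(before) ? " " : "";
  const trail = after.length > 0 && !/^\s/.test(after) ? " " : "";
  const inserted = lead + text + trail;

  return { value: before + inserted + after, caret: before.length + inserted.length };
}
```

Keeping the helper pure (no DOM access) is what makes it unit-testable alongside the IPC wrappers.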

Docs: VOICE_INPUT.md desktop usage; BEHAVIOR_PARITY.md /voice row now covers CLI + desktop.

Testing

  • Rust: cargo test — 4 voice cases (parse / expand-home / ffmpeg-args / status-with-missing-paths); cargo build clean (no warnings).
  • Renderer: 6 voice.ts cases (IPC command names + insertTranscript); full desktop suite 60 pass; pnpm typecheck, vite build, pnpm lint (0 errors), pnpm format:check all clean.
  • Preview harness: verified the full idle → record (red ⏹) → stop → transcript spliced into the composer flow with the mock-Tauri preview (screenshots below).
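
The idle → recording → transcribing flow exercised in the preview harness can be modelled as a small pure transition function. This is a sketch under assumptions: the event names and the `nextPhase` helper are hypothetical, and the actual hook in lib/use-voice.ts may track more state (errors, readiness, tooltips).

```typescript
type VoicePhase = "idle" | "recording" | "transcribing";
// Hypothetical events: a button click, a finished or failed transcription, a cancel.
type VoiceEvent = "toggle" | "transcribed" | "failed" | "cancel";

function nextPhase(phase: VoicePhase, event: VoiceEvent): VoicePhase {
  switch (phase) {
    case "idle":
      // First click starts recording; everything else is ignored.
      return event === "toggle" ? "recording" : "idle";
    case "recording":
      // Second click stops the recorder and hands the clip to whisper.
      if (event === "toggle") return "transcribing";
      return event === "cancel" || event === "failed" ? "idle" : "recording";
    case "transcribing":
      // Clicks are ignored until the transcript (or an error) arrives.
      return event === "transcribed" || event === "failed" ? "idle" : "transcribing";
  }
}
```

Ignoring clicks while transcribing avoids starting a second recording before the first transcript has been spliced in.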

⚠️ The real-microphone round-trip is not verified: there's no audio hardware in CI, CI doesn't compile Rust, and the avfoundation capture + macOS TCC permission flow can only be exercised on a packaged .app with a mic and a whisper model. The control flow, command wiring, arg-building, and transcript parsing are unit-tested; the actual record→audio→whisper round-trip needs a manual on-device pass.

🤖 Generated with Claude Code

@oratis oratis changed the base branch from feat/voice-cli-capture to main June 8, 2026 06:13
@oratis oratis closed this Jun 8, 2026
@oratis oratis reopened this Jun 8, 2026
t and others added 5 commits June 8, 2026 14:25
Surface the existing core whisper.cpp engine via a `/voice` slash command
and add the settings schema for it. No mic capture yet — this is the safe,
self-contained foundation per docs/VOICE_INPUT.md.

Core:
- Add VoiceConfig (provider | binPath | modelPath) to settings types,
  re-exported from @deepcode/core (the JSON schema already had the block).
- New detectVoice() (voice/detect.ts): resolves the whisper binary
  (settings.binPath, else whisper-cli/whisper on PATH) and the model
  (settings.modelPath, else ~/.deepcode/models/whisper-base.en.bin),
  never throws — missing pieces become `problems`. Injectable probes for
  deterministic tests.
- validateSettingsShallow now flags an unknown voice.provider.
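
The detection rules above (explicit settings first, then PATH and the default model path, with failures collected as `problems` rather than thrown) can be sketched as follows. The `VoiceProbes` and `VoiceStatus` shapes, the argument order, and the problem strings are assumptions; only the lookup order and default path come from the commit message.

```typescript
interface VoiceConfig {
  provider?: string;
  binPath?: string;
  modelPath?: string;
}

// Hypothetical injectable probes so tests never touch the real filesystem.
interface VoiceProbes {
  which(name: string): string | undefined; // resolve a binary on PATH
  exists(path: string): boolean;
}

interface VoiceStatus {
  ready: boolean;
  binPath: string | undefined;
  modelPath: string | undefined;
  problems: string[];
}

function detectVoice(config: VoiceConfig, home: string, probes: VoiceProbes): VoiceStatus {
  const problems: string[] = [];

  let binPath: string | undefined;
  if (config.binPath !== undefined) {
    if (probes.exists(config.binPath)) binPath = config.binPath;
    else problems.push(`whisper binary not found at ${config.binPath}`);
  } else {
    binPath = probes.which("whisper-cli") ?? probes.which("whisper");
    if (binPath === undefined) problems.push("no whisper-cli or whisper binary on PATH");
  }

  const candidate = config.modelPath ?? `${home}/.deepcode/models/whisper-base.en.bin`;
  let modelPath: string | undefined;
  if (probes.exists(candidate)) modelPath = candidate;
  else problems.push(`model not found at ${candidate}`);

  // Missing pieces are reported, never thrown.
  return { ready: problems.length === 0, binPath, modelPath, problems };
}
```

Passing the probes in as an argument is one way to get the deterministic tests the commit mentions: each case supplies fake `which` and `exists` functions.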

CLI:
- /voice reports readiness or prints actionable setup steps (+ per-issue
  detail); `/voice setup` always shows install instructions.
- SessionContext gains an optional `home` (honors --home) for the default
  model-path probe; wired in the REPL.

Tests: 9 core detection cases, 1 schema case, 3 CLI messaging cases.
Updates the /voice BEHAVIOR_PARITY row (✗ → ✓, 🔄 → 🟡).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Type /voice in the REPL to dictate: record from the mic, transcribe locally
with whisper.cpp, and pre-fill the input line with the transcript to edit
before sending. Builds on slice 1's detection. Spec: docs/VOICE_INPUT.md.

Core:
- voice/record.ts: detectRecorder() finds ffmpeg / rec / sox on PATH;
  buildRecordArgs() builds the 16 kHz mono WAV command per tool + OS
  (avfoundation on macOS, alsa on Linux; rec/sox use the default device);
  recordToWav() spawns it and stops on an AbortSignal (SIGINT so the WAV
  trailer flushes — a non-zero exit after abort is expected, a non-zero
  exit without one rejects, e.g. no mic). Injectable which/spawn for tests.
- VoiceConfig gains optional inputDevice (ffmpeg override); schema updated.
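
A per-tool, per-OS argument builder for 16 kHz mono WAV capture might look like the sketch below. The function signature, the default device strings (`:0` for avfoundation, `default` for alsa), and the extra quiet/overwrite flags are assumptions, not the contents of voice/record.ts.

```typescript
type Recorder = "ffmpeg" | "rec" | "sox";
type Platform = "darwin" | "linux";

function buildRecordArgs(
  tool: Recorder,
  platform: Platform,
  outPath: string,
  inputDevice?: string, // optional ffmpeg device override from settings
): string[] {
  switch (tool) {
    case "ffmpeg": {
      // Assumed defaults: first avfoundation audio device on macOS, "default" on ALSA.
      const input =
        platform === "darwin"
          ? ["-f", "avfoundation", "-i", inputDevice ?? ":0"]
          : ["-f", "alsa", "-i", inputDevice ?? "default"];
      // -ac 1 = mono, -ar 16000 = 16 kHz, -y = overwrite the temp file.
      return ["-hide_banner", "-loglevel", "error", ...input, "-ac", "1", "-ar", "16000", "-y", outPath];
    }
    case "rec":
      // rec records from the default input device.
      return ["-q", "-r", "16000", "-c", "1", "-b", "16", outPath];
    case "sox":
      // -d tells sox to read from the default audio device.
      return ["-q", "-d", "-r", "16000", "-c", "1", "-b", "16", outPath];
  }
}
```

Returning a plain string array keeps the builder pure, so the spawn call can be injected and the arguments asserted on directly in tests.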

CLI:
- voice-capture.ts: orchestrates detect → record (Enter to stop) →
  WhisperCppProvider.transcribe → delete the temp WAV (+ .txt side-file) →
  return transcript + status lines. Handles not-ready / no-recorder /
  no-speech / failures gracefully.
- /voice now triggers capture when the REPL wires ctx.voiceCapture; falls
  back to readiness/setup output otherwise. `/voice setup` still forces the
  install steps. Setup lines extracted to pure, reused helpers.
- REPL wires voiceCapture and pre-fills the next prompt via rl.write() once
  the transcript is ready (ctx.prefillInput).

Docs: VOICE_INPUT.md usage now describes the /voice flow (was Ctrl+V) + a
recorder-install section; BEHAVIOR_PARITY /voice row updated for CLI capture.

Tests: 9 core recorder cases (detect/buildArgs/record orchestration) + 3 new
CLI cases (capture callback, cancel/empty, setup bypass). Real-mic end-to-end
is manual (no audio hardware in CI). core 661 / cli 151, typecheck + lint +
format all clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Adds local voice dictation to the Mac desktop client, mirroring the CLI:
click 🎙 to record, click again to stop → transcribe with whisper.cpp →
splice the text into the composer. Spec: docs/VOICE_INPUT.md.

Rust (src-tauri):
- voice.rs: voice_status (detect whisper bin + model + ffmpeg), voice_start /
  voice_stop / voice_cancel. The in-flight recording Child lives in
  tauri-managed VoiceState between start/stop. Desktop uses ffmpeg and stops
  it gracefully by writing `q` to stdin (flushes a valid WAV), then runs
  whisper and deletes the temp clip. parse_whisper_output ported from core.
- Registered in lib.rs (+ .manage(VoiceState)).
- Entitlements: com.apple.security.device.audio-input; Info.plist:
  NSMicrophoneUsageDescription (merged into the bundle by Tauri).

Renderer:
- lib/voice.ts: typed voice_* wrappers + pure insertTranscript() helper.
- lib/use-voice.ts: idle→recording→transcribing state machine; probes
  voice_status on mount to disable the button (with a tooltip) when unset.
- Repl.tsx: 🎙 button in the composer toolbar; transcript splices at the
  caret. index.css: button styles incl. a recording pulse.
- preview-app.tsx: mock voice_* so the dev harness can exercise the button.

Docs: VOICE_INPUT.md desktop usage; BEHAVIOR_PARITY /voice row now covers
CLI + desktop.

Testing: cargo test (4 voice cases: parse/expand-home/ffmpeg-args/status) +
cargo build clean (no warnings). Renderer: 6 voice.ts cases (IPC names +
insertTranscript); desktop suite 60 pass; typecheck + vite build + lint +
format all clean. Verified the full idle→record→stop→insert flow in the
mock-Tauri preview harness (screenshots). The real-microphone round-trip
(avfoundation capture + TCC permission) needs manual on-device verification —
CI compiles neither Rust nor the mic path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

oratis commented Jun 8, 2026

Owner Author

♻️ Rebased onto current main (resolves conflicts after #172 /tasks+/background landed) and retargeted the base to main so CI runs here. ✅ CI green.

Note: base is now main (not the stacked feature branch), so this PR's diff is cumulative — it includes the lower slice(s) too. Clean merge options:

@oratis oratis merged commit 38fcc3a into main Jun 8, 2026
3 checks passed
@oratis oratis deleted the feat/voice-desktop branch June 8, 2026 06:43