feat(voice): CLI mic capture + transcribe via /voice (slice 2) by oratis · Pull Request #174 · oratis/deepcode

oratis · 2026-06-08T05:42:28Z

Summary

Slice 2 of /voice: interactive CLI dictation. Type /voice in the REPL → it records from the mic, transcribes locally with whisper.cpp, and pre-fills the input line with the transcript to edit before sending. Fully local — no audio leaves the machine.

Stacked on #173 (slice 1). Base is feat/voice-setup-detect; review/merge that first (or merge this into it). The slice-1 diff is not part of this PR's net change.

Per the agreed decisions: /voice command trigger (not a Ctrl+V keybinding), and auto-detect ffmpeg or sox (ffmpeg preferred, then rec/sox).

Changes

Core (@deepcode/core)

voice/record.ts:
- detectRecorder() — finds ffmpeg / rec / sox on PATH (preference order). Injectable which for tests.
- buildRecordArgs() — pure, per-tool/per-OS argv for a 16 kHz mono WAV: ffmpeg uses avfoundation (macOS) / alsa (Linux); rec/sox capture the default device. Throws on unsupported ffmpeg platforms without a device.
- recordToWav() — spawns the recorder, stops on an AbortSignal via SIGINT (so ffmpeg/sox flush a valid WAV trailer). A non-zero exit after abort resolves; without an abort it rejects (e.g. no mic). Injectable spawn.
VoiceConfig gains optional inputDevice (ffmpeg override); JSON schema updated.
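
A minimal sketch of how the detection and argv-building steps above could fit together. The function names follow this PR's description, but the bodies, the `Which` type, the exact flags, and the default devices (`:0` for avfoundation, `default` for ALSA) are assumptions for illustration, not the code in voice/record.ts. How the real code handles an explicit inputDevice on other platforms is not stated here; the sketch simply passes it to `-i`.

```typescript
// Illustrative only: names follow the PR text, bodies and defaults are assumptions.
type Recorder = "ffmpeg" | "rec" | "sox";
type Which = (bin: string) => string | null; // injectable PATH lookup

// Preference order from the PR description: ffmpeg, then rec, then sox.
const PREFERENCE: readonly Recorder[] = ["ffmpeg", "rec", "sox"];

function detectRecorder(which: Which): { tool: Recorder; path: string } | null {
  for (const tool of PREFERENCE) {
    const path = which(tool);
    if (path) return { tool, path };
  }
  return null;
}

function buildRecordArgs(
  tool: Recorder,
  platform: string,
  outPath: string,
  inputDevice?: string,
): string[] {
  if (tool !== "ffmpeg") {
    // rec reads the default device implicitly; sox needs -d to select it.
    const format = ["-r", "16000", "-c", "1", "-b", "16", outPath];
    return tool === "sox" ? ["-q", "-d", ...format] : ["-q", ...format];
  }
  const backend =
    platform === "darwin" ? "avfoundation" : platform === "linux" ? "alsa" : undefined;
  const fallback =
    platform === "darwin" ? ":0" : platform === "linux" ? "default" : undefined;
  const device = inputDevice ?? fallback;
  if (device === undefined) {
    throw new Error(`ffmpeg capture on ${platform} needs an explicit inputDevice`);
  }
  const input = backend ? ["-f", backend, "-i", device] : ["-i", device];
  // 16 kHz mono WAV, overwriting any stale temp file.
  return ["-hide_banner", "-loglevel", "error", ...input, "-ac", "1", "-ar", "16000", "-y", outPath];
}
```

Keeping `buildRecordArgs` pure means the per-OS cases can be asserted on directly, without spawning anything.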

CLI

voice-capture.ts — orchestration: detect → record (Enter to stop) → WhisperCppProvider.transcribe → delete the temp WAV (+ .txt side-file) → return transcript + display lines. Handles not-ready, no-recorder, no-speech, and recorder/transcription failures gracefully.
/voice triggers capture when the REPL wires ctx.voiceCapture; otherwise falls back to the slice-1 readiness/setup output. /voice setup still forces install steps. Setup/ready lines extracted to pure, reusable helpers.
REPL wires voiceCapture and pre-fills the next prompt with the transcript via rl.write() (ctx.prefillInput), so the user edits before submitting.
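
The capture flow described above (detect, record, transcribe, clean up, pre-fill) might be sketched roughly as follows. Everything here is hypothetical: `CaptureDeps`, `captureVoice`, `prefill`, the message strings, and the `<wav>.txt` side-file name are assumptions chosen to show the shape of the orchestration, and the whisper "not ready" branch is omitted for brevity.

```typescript
// Illustrative sketch: every name here is hypothetical, not the PR's actual API.
interface CaptureDeps {
  detectRecorder(): string | null;
  record(wavPath: string, stop: AbortSignal): Promise<void>;
  transcribe(wavPath: string): Promise<string>;
  remove(path: string): Promise<void>;
}

interface CaptureResult {
  transcript: string | null; // null means there is nothing to pre-fill
  lines: string[]; // status lines for the REPL to display
}

async function captureVoice(
  deps: CaptureDeps,
  wavPath: string,
  stop: AbortSignal,
): Promise<CaptureResult> {
  if (!deps.detectRecorder()) {
    return { transcript: null, lines: ["No mic recorder found. Install ffmpeg or sox."] };
  }
  try {
    await deps.record(wavPath, stop);
    const text = (await deps.transcribe(wavPath)).trim();
    if (!text) return { transcript: null, lines: ["No speech detected."] };
    return { transcript: text, lines: ["Transcript ready. Edit it, then press Enter."] };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { transcript: null, lines: [`Voice capture failed: ${reason}`] };
  } finally {
    // Best-effort cleanup of the temp WAV and the assumed .txt side-file.
    await deps.remove(wavPath).catch(() => undefined);
    await deps.remove(`${wavPath}.txt`).catch(() => undefined);
  }
}

// Pre-fill the next prompt; a Node readline Interface has both methods.
function prefill(rl: { prompt(): void; write(data: string): void }, result: CaptureResult): void {
  rl.prompt();
  if (result.transcript) rl.write(result.transcript);
}
```

Returning a result object instead of throwing keeps every failure mode on the same path as success, so the REPL only has to print `lines` and decide whether to pre-fill.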

Docs

VOICE_INPUT.md: Usage now documents the /voice flow (was Ctrl+V) + a "Install a mic recorder" section + inputDevice.
BEHAVIOR_PARITY.md: /voice row updated for CLI capture (still 🟡 — desktop pending).

Testing

pnpm typecheck — clean; pnpm lint — 0 errors; pnpm format:check — clean
core: 661 passed / 16 skipped (9 new recorder cases: detect / buildArgs per-OS / record orchestration incl. abort + no-mic)
cli: 151 passed (3 new: capture callback pre-fills, cancel/empty no-fill, /voice setup bypasses capture)
⚠️ Real-microphone end-to-end is manual — there's no audio hardware in CI, so the spawn boundary (recorder + whisper) is covered with injected fakes; the actual record→transcribe round-trip needs a local mic + model to verify.
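
To illustrate what covering the spawn boundary with injected fakes can look like, here is a hedged sketch of a `recordToWav`-style function that takes its spawn function as a parameter. The `ChildLike` and `SpawnFn` types and the parameter order are assumptions for this example; the real signature in voice/record.ts is not shown in this PR description.

```typescript
// Illustrative sketch: ChildLike and SpawnFn are hypothetical stand-ins for
// the parts of a child process that the recorder logic needs.
interface ChildLike {
  kill(signal: "SIGINT"): unknown;
  once(event: "exit" | "error", listener: (arg: any) => void): unknown;
}
type SpawnFn = (cmd: string, args: string[]) => ChildLike;

function recordToWav(
  spawn: SpawnFn,
  cmd: string,
  args: string[],
  signal: AbortSignal,
): Promise<void> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args);
    let aborted = false;

    const onAbort = () => {
      aborted = true;
      // SIGINT lets ffmpeg/sox finish writing the WAV before exiting.
      child.kill("SIGINT");
    };

    child.once("error", (err: Error) => {
      signal.removeEventListener("abort", onAbort);
      reject(err);
    });
    child.once("exit", (code: number | null) => {
      signal.removeEventListener("abort", onAbort);
      // A non-zero exit is expected after an abort; otherwise it is a failure.
      if (code === 0 || aborted) resolve();
      else reject(new Error(`${cmd} exited with code ${code}`));
    });

    if (signal.aborted) onAbort();
    else signal.addEventListener("abort", onAbort, { once: true });
  });
}
```

A test can then pass a fake child that records the signal it was killed with and emits `exit` on demand, which exercises both the abort path and the no-mic path without audio hardware.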

Follow-up

Slice 3: desktop 🎙 record button + mic permission via a Tauri command, same whisper backend.

🤖 Generated with Claude Code

Surface the existing core whisper.cpp engine via a `/voice` slash command and add the settings schema for it. No mic capture yet — this is the safe, self-contained foundation per docs/VOICE_INPUT.md.

Core:
- Add VoiceConfig (provider | binPath | modelPath) to settings types, re-exported from @deepcode/core (the JSON schema already had the block).
- New detectVoice() (voice/detect.ts): resolves the whisper binary (settings.binPath, else whisper-cli/whisper on PATH) and the model (settings.modelPath, else ~/.deepcode/models/whisper-base.en.bin), never throws — missing pieces become `problems`. Injectable probes for deterministic tests.
- validateSettingsShallow now flags an unknown voice.provider.

CLI:
- /voice reports readiness or prints actionable setup steps (+ per-issue detail); `/voice setup` always shows install instructions.
- SessionContext gains an optional `home` (honors --home) for the default model-path probe; wired in the REPL.

Tests: 9 core detection cases, 1 schema case, 3 CLI messaging cases.

Updates the /voice BEHAVIOR_PARITY row (✗ → ✓, 🔄 → 🟡).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
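
The "never throws, missing pieces become `problems`" behaviour of detectVoice() in this commit message could be sketched along these lines. The `Probes` and `VoiceStatus` shapes, the message strings, and the reading of `home` as the directory that replaces `~` are assumptions for illustration; the sketch also assumes the injected probes themselves do not throw.

```typescript
// Illustrative sketch: types and messages are hypothetical, not voice/detect.ts.
interface VoiceConfig {
  provider?: string;
  binPath?: string;
  modelPath?: string;
}

interface Probes {
  which(bin: string): string | null; // PATH lookup
  exists(path: string): boolean; // file existence check
  home: string; // assumed to stand in for "~"
}

interface VoiceStatus {
  ready: boolean;
  binPath: string | null;
  modelPath: string | null;
  problems: string[];
}

function detectVoice(cfg: VoiceConfig, probes: Probes): VoiceStatus {
  const problems: string[] = [];

  // Binary: explicit setting first, else whisper-cli, else whisper on PATH.
  let binPath: string | null = null;
  if (cfg.binPath) {
    if (probes.exists(cfg.binPath)) binPath = cfg.binPath;
    else problems.push(`voice.binPath not found: ${cfg.binPath}`);
  } else {
    binPath = probes.which("whisper-cli") ?? probes.which("whisper");
    if (!binPath) problems.push("whisper.cpp binary not found on PATH");
  }

  // Model: explicit setting first, else the default location under home.
  const candidate = cfg.modelPath ?? `${probes.home}/.deepcode/models/whisper-base.en.bin`;
  const modelPath = probes.exists(candidate) ? candidate : null;
  if (!modelPath) problems.push(`model not found: ${candidate}`);

  return { ready: problems.length === 0, binPath, modelPath, problems };
}
```

Collecting every problem instead of failing on the first lets a `/voice`-style command print all missing pieces in one pass.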

Type /voice in the REPL to dictate: record from the mic, transcribe locally with whisper.cpp, and pre-fill the input line with the transcript to edit before sending. Builds on slice 1's detection. Spec: docs/VOICE_INPUT.md.

Core:
- voice/record.ts: detectRecorder() finds ffmpeg / rec / sox on PATH; buildRecordArgs() builds the 16 kHz mono WAV command per tool + OS (avfoundation on macOS, alsa on Linux; rec/sox use the default device); recordToWav() spawns it and stops on an AbortSignal (SIGINT so the WAV trailer flushes — a non-zero exit after abort is expected, a non-zero exit without one rejects, e.g. no mic). Injectable which/spawn for tests.
- VoiceConfig gains optional inputDevice (ffmpeg override); schema updated.

CLI:
- voice-capture.ts: orchestrates detect → record (Enter to stop) → WhisperCppProvider.transcribe → delete the temp WAV (+ .txt side-file) → return transcript + status lines. Handles not-ready / no-recorder / no-speech / failures gracefully.
- /voice now triggers capture when the REPL wires ctx.voiceCapture; falls back to readiness/setup output otherwise. `/voice setup` still forces the install steps. Setup lines extracted to pure, reused helpers.
- REPL wires voiceCapture and pre-fills the next prompt via rl.write() once the transcript is ready (ctx.prefillInput).

Docs: VOICE_INPUT.md usage now describes the /voice flow (was Ctrl+V) + a recorder-install section; BEHAVIOR_PARITY /voice row updated for CLI capture.

Tests: 9 core recorder cases (detect/buildArgs/record orchestration) + 3 new CLI cases (capture callback, cancel/empty, setup bypass). Real-mic end-to-end is manual (no audio hardware in CI). core 661 / cli 151, typecheck + lint + format all clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

oratis · 2026-06-08T06:32:17Z

♻️ Rebased onto current main (resolves conflicts after #172 /tasks+/background landed) and retargeted the base to main so CI runs here. ✅ CI green.

Note: base is now main (not the stacked feature branch), so this PR's diff is cumulative — it includes the lower slice(s) too. Clean merge options:

- Simplest: squash-merge #175, "feat(voice): desktop 🎙 composer button + Tauri voice commands (slice 3)", alone (it contains all three slices), then close the others; or
- Per-slice: merge bottom-up, re-stacking each remaining branch with git rebase --onto main <old-base> after each merge (this drops the already-merged slice).

…175)

Local, on-device speech-to-text via whisper.cpp — no audio leaves the machine.

- Core: VoiceConfig (binPath/modelPath/provider/inputDevice) + detectVoice(); existing WhisperCppProvider surfaced. detectRecorder/recordToWav (ffmpeg/sox).
- CLI: /voice records → transcribes → pre-fills the input line to edit; /voice setup prints install steps.
- Desktop: 🎙 composer button + Tauri voice_status/start/stop/cancel (ffmpeg, stdin-q graceful stop); mic entitlement + NSMicrophoneUsageDescription.
- Docs: docs/VOICE_INPUT.md + BEHAVIOR_PARITY /voice row.

Squashes the three review slices (PRs #173, #174, #175). Real-microphone round-trip needs manual on-device verification (no audio hardware / Rust in CI).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

oratis · 2026-06-08T06:43:34Z

✅ Rolled into main via #175 (squash-merged — contains all three voice slices: setup/detection + CLI capture + desktop 🎙). Closing as merged.

oratis mentioned this pull request Jun 8, 2026

feat(voice): desktop 🎙 composer button + Tauri voice commands (slice 3) #175

Merged

oratis changed the base branch from feat/voice-setup-detect to main June 8, 2026 06:13

oratis closed this Jun 8, 2026

oratis reopened this Jun 8, 2026

t and others added 3 commits June 8, 2026 14:25

ci: trigger checks (PR retargeted to main)

22d709f

oratis force-pushed the feat/voice-cli-capture branch from de934e5 to 22d709f Compare June 8, 2026 06:29

oratis closed this Jun 8, 2026

oratis deleted the feat/voice-cli-capture branch June 8, 2026 06:43

oratis wants to merge 3 commits into main from feat/voice-cli-capture
