feat(voice): CLI mic capture + transcribe via /voice (slice 2) #174
Closed
oratis wants to merge 3 commits into
Surface the existing core whisper.cpp engine via a `/voice` slash command and add the settings schema for it. No mic capture yet; this is the safe, self-contained foundation per docs/VOICE_INPUT.md.

Core:
- Add VoiceConfig (provider | binPath | modelPath) to settings types, re-exported from @deepcode/core (the JSON schema already had the block).
- New detectVoice() (voice/detect.ts): resolves the whisper binary (settings.binPath, else whisper-cli/whisper on PATH) and the model (settings.modelPath, else ~/.deepcode/models/whisper-base.en.bin). It never throws; missing pieces become `problems`. Injectable probes for deterministic tests.
- validateSettingsShallow now flags an unknown voice.provider.

CLI:
- /voice reports readiness or prints actionable setup steps (+ per-issue detail); `/voice setup` always shows install instructions.
- SessionContext gains an optional `home` (honors --home) for the default model-path probe; wired in the REPL.

Tests: 9 core detection cases, 1 schema case, 3 CLI messaging cases.

Updates the /voice BEHAVIOR_PARITY row (✗ → ✓, 🔄 → 🟡).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
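To make the "never throws; missing pieces become `problems`" contract concrete, here is a minimal sketch of how such a detection function could look. It is not the code from voice/detect.ts: the `VoiceProbes` and `VoiceStatus` shapes, the parameter order, and the problem strings are all hypothetical, and `home` is assumed to be the user's home directory.

```typescript
// Hypothetical shapes; the real VoiceConfig / detectVoice signatures are not shown in this PR.
interface VoiceConfig {
  provider?: string;
  binPath?: string;
  modelPath?: string;
}

// Injectable probes so tests never touch the real PATH or filesystem.
interface VoiceProbes {
  which(cmd: string): string | null; // resolved path of a command on PATH, or null
  exists(file: string): boolean; // true if a file exists at this path
}

interface VoiceStatus {
  ready: boolean;
  binPath: string | null;
  modelPath: string | null;
  problems: string[];
}

function detectVoice(config: VoiceConfig, home: string, probes: VoiceProbes): VoiceStatus {
  const problems: string[] = [];

  // Binary: an explicit setting wins, otherwise try whisper-cli, then whisper, on PATH.
  let binPath: string | null = null;
  if (config.binPath) {
    if (probes.exists(config.binPath)) binPath = config.binPath;
    else problems.push(`voice.binPath not found: ${config.binPath}`);
  } else {
    binPath = probes.which("whisper-cli") ?? probes.which("whisper");
    if (!binPath) problems.push("no whisper-cli or whisper binary on PATH");
  }

  // Model: an explicit setting wins, otherwise the default location under home.
  const candidate = config.modelPath ?? `${home}/.deepcode/models/whisper-base.en.bin`;
  const modelPath = probes.exists(candidate) ? candidate : null;
  if (!modelPath) problems.push(`model not found: ${candidate}`);

  // Nothing is thrown: every missing piece is reported as a problem instead.
  return { ready: problems.length === 0, binPath, modelPath, problems };
}
```

Because the probes are passed in, a test can describe a machine ("whisper is on PATH, the model is missing") with two small lambdas and assert on the returned `problems` list.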
Type /voice in the REPL to dictate: record from the mic, transcribe locally with whisper.cpp, and pre-fill the input line with the transcript to edit before sending. Builds on slice 1's detection. Spec: docs/VOICE_INPUT.md.

Core:
- voice/record.ts: detectRecorder() finds ffmpeg / rec / sox on PATH; buildRecordArgs() builds the 16 kHz mono WAV command per tool + OS (avfoundation on macOS, alsa on Linux; rec/sox use the default device); recordToWav() spawns it and stops on an AbortSignal. It sends SIGINT so the WAV trailer flushes: a non-zero exit after abort is expected, while a non-zero exit without one rejects (e.g. no mic). Injectable which/spawn for tests.
- VoiceConfig gains optional inputDevice (ffmpeg override); schema updated.

CLI:
- voice-capture.ts: orchestrates detect → record (Enter to stop) → WhisperCppProvider.transcribe → delete the temp WAV (+ .txt side-file) → return transcript + status lines. Handles not-ready / no-recorder / no-speech / failures gracefully.
- /voice now triggers capture when the REPL wires ctx.voiceCapture; falls back to readiness/setup output otherwise. `/voice setup` still forces the install steps. Setup lines extracted to pure, reused helpers.
- REPL wires voiceCapture and pre-fills the next prompt via rl.write() once the transcript is ready (ctx.prefillInput).

Docs: VOICE_INPUT.md usage now describes the /voice flow (was Ctrl+V) + a recorder-install section; BEHAVIOR_PARITY /voice row updated for CLI capture.

Tests: 9 core recorder cases (detect/buildArgs/record orchestration) + 3 new CLI cases (capture callback, cancel/empty, setup bypass). Real-mic end-to-end is manual (no audio hardware in CI). core 661 / cli 151, typecheck + lint + format all clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
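The per-tool, per-OS argument building described above can be sketched as a pure function. This is an illustration under stated assumptions, not the contents of voice/record.ts: the signature, the default devices (`:0` on macOS, `default` on Linux), and the handling of an explicit device on other platforms are guesses. The ffmpeg and SoX flags themselves (`-f`, `-i`, `-ar`, `-ac`, `-y`, `-d`, `-r`, `-c`) are standard options of those tools.

```typescript
type Recorder = "ffmpeg" | "rec" | "sox";

// Assumed mapping from Node's process.platform values to ffmpeg input formats.
const FFMPEG_FORMATS: Record<string, { format: string; device: string }> = {
  darwin: { format: "avfoundation", device: ":0" }, // "[video]:[audio]": no video, first audio device
  linux: { format: "alsa", device: "default" },
};

function buildRecordArgs(
  tool: Recorder,
  platform: string,
  outFile: string,
  inputDevice?: string, // hypothetical parameter mirroring VoiceConfig.inputDevice
): string[] {
  // rec records from the default device; the options configure the output file.
  if (tool === "rec") return ["-r", "16000", "-c", "1", outFile];
  // sox needs -d to name the default audio device as its input.
  if (tool === "sox") return ["-d", "-r", "16000", "-c", "1", outFile];

  const known = FFMPEG_FORMATS[platform];
  if (!known && !inputDevice) {
    throw new Error(`ffmpeg capture on ${platform} needs an explicit input device`);
  }
  const args: string[] = [];
  // Assumption: without a known format, leave -f out and pass the device as given.
  if (known) args.push("-f", known.format);
  args.push("-i", inputDevice ?? known!.device);
  // 16 kHz, mono, overwrite the temp file if it already exists.
  args.push("-ar", "16000", "-ac", "1", "-y", outFile);
  return args;
}
```

Keeping this function free of I/O means every tool and platform combination can be checked by comparing arrays, with no recorder installed.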
de934e5 to
22d709f
Owner
Author
|
♻️ Rebased onto current Note: base is now
|
oratis added a commit that referenced this pull request Jun 8, 2026
…175)

Local, on-device speech-to-text via whisper.cpp; no audio leaves the machine.

- Core: VoiceConfig (binPath/modelPath/provider/inputDevice) + detectVoice(); existing WhisperCppProvider surfaced. detectRecorder/recordToWav (ffmpeg/sox).
- CLI: /voice records → transcribes → pre-fills the input line to edit; /voice setup prints install steps.
- Desktop: 🎙 composer button + Tauri voice_status/start/stop/cancel (ffmpeg, stdin-q graceful stop); mic entitlement + NSMicrophoneUsageDescription.
- Docs: docs/VOICE_INPUT.md + BEHAVIOR_PARITY /voice row.

Squashes the three review slices (PRs #173, #174, #175). Real-microphone round-trip needs manual on-device verification (no audio hardware / Rust in CI).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
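For readers unfamiliar with the recorder lookup, the preference order stated in this PR (ffmpeg first, then `rec`, then `sox`) with an injectable `which` could look roughly like the sketch below. The `Which` type and the return shape are hypothetical and are not taken from the merged code.

```typescript
type Recorder = "ffmpeg" | "rec" | "sox";

// Hypothetical probe: returns the resolved path of a command on PATH, or null.
type Which = (cmd: string) => string | null;

// Preference order as stated in the PR: ffmpeg, then rec, then sox.
const RECORDER_PREFERENCE: readonly Recorder[] = ["ffmpeg", "rec", "sox"];

function detectRecorder(which: Which): { tool: Recorder; path: string } | null {
  for (const tool of RECORDER_PREFERENCE) {
    const found = which(tool);
    if (found) return { tool, path: found };
  }
  // No recorder installed: the caller decides how to report this.
  return null;
}
```

Returning `null` instead of throwing lets the caller turn a missing recorder into setup instructions, in keeping with the "handles no-recorder gracefully" behavior described for the CLI.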
Owner
Author
|
✅ Rolled into |
Summary

Slice 2 of `/voice`: interactive CLI dictation. Type `/voice` in the REPL and it records from the mic, transcribes locally with whisper.cpp, and pre-fills the input line with the transcript to edit before sending. Fully local: no audio leaves the machine.

Per the agreed decisions: `/voice` command trigger (not a Ctrl+V keybinding), and auto-detect ffmpeg or sox (ffmpeg preferred, then `rec`/`sox`).

Changes

Core (`@deepcode/core`)

- `voice/record.ts`:
  - `detectRecorder()`: finds `ffmpeg` / `rec` / `sox` on PATH (preference order). Injectable `which` for tests.
  - `buildRecordArgs()`: pure, per-tool/per-OS argv for a 16 kHz mono WAV. ffmpeg uses `avfoundation` (macOS) / `alsa` (Linux); `rec`/`sox` capture the default device. Throws on unsupported ffmpeg platforms without a device.
  - `recordToWav()`: spawns the recorder and stops on an `AbortSignal` via SIGINT (so ffmpeg/sox flush a valid WAV trailer). A non-zero exit after abort resolves; without an abort it rejects (e.g. no mic). Injectable `spawn`.
- `VoiceConfig` gains optional `inputDevice` (ffmpeg override); JSON schema updated.

CLI

- `voice-capture.ts`: orchestration of detect → record (Enter to stop) → `WhisperCppProvider.transcribe` → delete the temp WAV (+ `.txt` side-file) → return transcript + display lines. Handles not-ready, no-recorder, no-speech, and recorder/transcription failures gracefully.
- `/voice` triggers capture when the REPL wires `ctx.voiceCapture`; otherwise falls back to the slice-1 readiness/setup output. `/voice setup` still forces install steps. Setup/ready lines extracted to pure, reusable helpers.
- The REPL wires `voiceCapture` and pre-fills the next prompt with the transcript via `rl.write()` (`ctx.prefillInput`), so the user edits before submitting.

Docs

- `VOICE_INPUT.md`: Usage now documents the `/voice` flow (was Ctrl+V) + a "Install a mic recorder" section + `inputDevice`.
- `BEHAVIOR_PARITY.md`: `/voice` row updated for CLI capture (still 🟡, desktop pending).

Testing

- `pnpm typecheck`: clean; `pnpm lint`: 0 errors; `pnpm format:check`: clean
- New CLI cases include `/voice setup` bypassing capture.

Follow-up
🤖 Generated with Claude Code