feat(voice): CLI mic capture + transcribe via /voice (slice 2)#174

Closed
oratis wants to merge 3 commits into main from feat/voice-cli-capture

oratis commented Jun 8, 2026

Owner

Summary

Slice 2 of /voice: interactive CLI dictation. Type /voice in the REPL → it records from the mic, transcribes locally with whisper.cpp, and pre-fills the input line with the transcript to edit before sending. Fully local — no audio leaves the machine.

Stacked on #173 (slice 1). Base is feat/voice-setup-detect; review/merge that first (or merge this into it). The slice-1 diff is not part of this PR's net change.

Per the agreed decisions: /voice command trigger (not a Ctrl+V keybinding), and auto-detect ffmpeg or sox (ffmpeg preferred, then rec/sox).

Changes

Core (@deepcode/core)

  • voice/record.ts:
    • detectRecorder() — finds ffmpeg / rec / sox on PATH (preference order). Injectable which for tests.
    • buildRecordArgs() — pure, per-tool/per-OS argv for a 16 kHz mono WAV: ffmpeg uses avfoundation (macOS) / alsa (Linux); rec/sox capture the default device. Throws on unsupported ffmpeg platforms without a device.
    • recordToWav() — spawns the recorder, stops on an AbortSignal via SIGINT (so ffmpeg/sox flush a valid WAV trailer). A non-zero exit after abort resolves; without an abort it rejects (e.g. no mic). Injectable spawn.
  • VoiceConfig gains optional inputDevice (ffmpeg override); JSON schema updated.
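
The recorder detection and argv construction described above could look roughly like the following. This is a minimal sketch, not the PR's code: the function names come from the description, but the signatures, the `Which` type, the default device strings (`":0"`, `"default"`), and the choice to omit `-f` when only an `inputDevice` is given on other platforms are all assumptions made for illustration.

```typescript
// Hypothetical sketch; signatures and argv details are assumptions.
type RecorderTool = "ffmpeg" | "rec" | "sox";
type Which = (cmd: string) => string | null;

// Preference order stated in the PR: ffmpeg first, then rec, then sox.
const RECORDER_PREFERENCE: readonly RecorderTool[] = ["ffmpeg", "rec", "sox"];

// Returns the first recorder that the injected `which` can resolve, or null.
function detectRecorder(which: Which): { tool: RecorderTool; path: string } | null {
  for (const tool of RECORDER_PREFERENCE) {
    const path = which(tool);
    if (path) return { tool, path };
  }
  return null;
}

// Pure function: builds the argv for a 16 kHz mono WAV recording.
function buildRecordArgs(
  tool: RecorderTool,
  platform: string,
  outPath: string,
  inputDevice?: string,
): string[] {
  if (tool === "ffmpeg") {
    let input: string[];
    if (platform === "darwin") {
      input = ["-f", "avfoundation", "-i", inputDevice ?? ":0"];
    } else if (platform === "linux") {
      input = ["-f", "alsa", "-i", inputDevice ?? "default"];
    } else if (inputDevice) {
      // Assumption: pass the device through without forcing an input format.
      input = ["-i", inputDevice];
    } else {
      throw new Error(`ffmpeg capture on ${platform} needs an explicit inputDevice`);
    }
    return [...input, "-ar", "16000", "-ac", "1", "-y", outPath];
  }
  // rec records from the default device; sox needs -d to select it.
  const output = ["-r", "16000", "-c", "1", "-b", "16", outPath];
  return tool === "sox" ? ["-d", ...output] : output;
}
```

Keeping `buildRecordArgs()` free of I/O is what lets the per-OS cases be tested without a recorder installed.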

CLI

  • voice-capture.ts — orchestration: detect → record (Enter to stop) → WhisperCppProvider.transcribe → delete the temp WAV (+ .txt side-file) → return transcript + display lines. Handles not-ready, no-recorder, no-speech, and recorder/transcription failures gracefully.
  • /voice triggers capture when the REPL wires ctx.voiceCapture; otherwise falls back to the slice-1 readiness/setup output. /voice setup still forces install steps. Setup/ready lines extracted to pure, reusable helpers.
  • REPL wires voiceCapture and pre-fills the next prompt with the transcript via rl.write() (ctx.prefillInput), so the user edits before submitting.
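
As an illustration of the capture flow listed above (detect, record until Enter, transcribe, clean up), here is a minimal sketch with every side effect injected. `CaptureDeps`, `captureVoice`, the message strings, and the `<wav>.txt` side-file name are hypothetical; the real `voice-capture.ts` may be structured differently.

```typescript
// Hypothetical sketch; the interface and messages are assumptions.
interface CaptureDeps {
  detectRecorder(): string | null;
  record(wavPath: string, signal: AbortSignal): Promise<void>;
  transcribe(wavPath: string): Promise<string>;
  remove(path: string): Promise<void>;
  waitForEnter(): Promise<void>;
}

interface CaptureResult {
  transcript: string | null;
  lines: string[]; // status lines for the REPL to display
}

async function captureVoice(deps: CaptureDeps, wavPath: string): Promise<CaptureResult> {
  if (!deps.detectRecorder()) {
    return { transcript: null, lines: ["No recorder found. Install ffmpeg or sox."] };
  }
  const stop = new AbortController();
  try {
    const recording = deps.record(wavPath, stop.signal);
    // Enter ends the recording; a recorder failure (e.g. no mic) rejects the race early.
    await Promise.race([deps.waitForEnter(), recording]);
    stop.abort();
    await recording;
    const text = (await deps.transcribe(wavPath)).trim();
    if (!text) return { transcript: null, lines: ["No speech detected."] };
    return { transcript: text, lines: ["Transcript ready. Edit it, then press Enter to send."] };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { transcript: null, lines: [`Voice capture failed: ${reason}`] };
  } finally {
    // Always delete the temp WAV and its .txt side-file; ignore files that are already gone.
    for (const path of [wavPath, `${wavPath}.txt`]) {
      await deps.remove(path).catch(() => undefined);
    }
  }
}
```

A REPL could then pass a non-null `transcript` to readline's `rl.write()` so that the text lands on the input line for editing rather than being submitted directly.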

Docs

  • VOICE_INPUT.md: Usage now documents the /voice flow (was Ctrl+V) + an "Install a mic recorder" section + inputDevice.
  • BEHAVIOR_PARITY.md: /voice row updated for CLI capture (still 🟡 — desktop pending).

Testing

  • pnpm typecheck — clean; pnpm lint — 0 errors; pnpm format:check — clean
  • core: 661 passed / 16 skipped (9 new recorder cases: detect / buildArgs per-OS / record orchestration incl. abort + no-mic)
  • cli: 151 passed (3 new: capture callback pre-fills, cancel/empty no-fill, /voice setup bypasses capture)
  • ⚠️ Real-microphone end-to-end is manual — there's no audio hardware in CI, so the spawn boundary (recorder + whisper) is covered with injected fakes; the actual record→transcribe round-trip needs a local mic + model to verify.

Follow-up

  • Slice 3: desktop 🎙 record button + mic permission via a Tauri command, same whisper backend.

🤖 Generated with Claude Code

@oratis oratis changed the base branch from feat/voice-setup-detect to main June 8, 2026 06:13
@oratis oratis closed this Jun 8, 2026
@oratis oratis reopened this Jun 8, 2026
t and others added 3 commits June 8, 2026 14:25
Surface the existing core whisper.cpp engine via a `/voice` slash command
and add the settings schema for it. No mic capture yet — this is the safe,
self-contained foundation per docs/VOICE_INPUT.md.

Core:
- Add VoiceConfig (provider | binPath | modelPath) to settings types,
  re-exported from @deepcode/core (the JSON schema already had the block).
- New detectVoice() (voice/detect.ts): resolves the whisper binary
  (settings.binPath, else whisper-cli/whisper on PATH) and the model
  (settings.modelPath, else ~/.deepcode/models/whisper-base.en.bin),
  never throws — missing pieces become `problems`. Injectable probes for
  deterministic tests.
- validateSettingsShallow now flags an unknown voice.provider.
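
A sketch of the detection logic described in the Core list above, assuming injected probes. The `VoiceProbes` and `VoiceStatus` shapes and the problem strings are hypothetical; only the lookup order and the default model path come from the commit message.

```typescript
// Hypothetical sketch; type shapes and messages are assumptions.
interface VoiceSettings {
  binPath?: string;
  modelPath?: string;
}

interface VoiceProbes {
  which(cmd: string): string | null;
  exists(path: string): boolean;
  home: string;
}

interface VoiceStatus {
  ready: boolean;
  binPath: string | null;
  modelPath: string | null;
  problems: string[];
}

function detectVoice(settings: VoiceSettings, probes: VoiceProbes): VoiceStatus {
  const problems: string[] = [];
  // A throwing probe counts as "not found", so detection itself never throws.
  const safe = <T>(probe: () => T, fallback: T): T => {
    try {
      return probe();
    } catch {
      return fallback;
    }
  };

  // Binary: explicit setting wins, else whisper-cli, else whisper on PATH.
  let binPath: string | null = null;
  if (settings.binPath) {
    const configured = settings.binPath;
    if (safe(() => probes.exists(configured), false)) binPath = configured;
    else problems.push(`whisper binary not found at ${configured}`);
  } else {
    binPath = safe<string | null>(
      () => probes.which("whisper-cli") ?? probes.which("whisper"),
      null,
    );
    if (!binPath) problems.push("whisper-cli / whisper not found on PATH");
  }

  // Model: explicit setting wins, else the default under the home directory.
  const candidate = settings.modelPath ?? `${probes.home}/.deepcode/models/whisper-base.en.bin`;
  const modelPath = safe(() => probes.exists(candidate), false) ? candidate : null;
  if (!modelPath) problems.push(`model not found at ${candidate}`);

  return { ready: problems.length === 0, binPath, modelPath, problems };
}
```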

CLI:
- /voice reports readiness or prints actionable setup steps (+ per-issue
  detail); `/voice setup` always shows install instructions.
- SessionContext gains an optional `home` (honors --home) for the default
  model-path probe; wired in the REPL.

Tests: 9 core detection cases, 1 schema case, 3 CLI messaging cases.
Updates the /voice BEHAVIOR_PARITY row (✗ → ✓, 🔄 → 🟡).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Type /voice in the REPL to dictate: record from the mic, transcribe locally
with whisper.cpp, and pre-fill the input line with the transcript to edit
before sending. Builds on slice 1's detection. Spec: docs/VOICE_INPUT.md.

Core:
- voice/record.ts: detectRecorder() finds ffmpeg / rec / sox on PATH;
  buildRecordArgs() builds the 16 kHz mono WAV command per tool + OS
  (avfoundation on macOS, alsa on Linux; rec/sox use the default device);
  recordToWav() spawns it and stops on an AbortSignal (SIGINT so the WAV
  trailer flushes — a non-zero exit after abort is expected, a non-zero
  exit without one rejects, e.g. no mic). Injectable which/spawn for tests.
- VoiceConfig gains optional inputDevice (ffmpeg override); schema updated.
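
The abort handling in recordToWav() can be sketched as below. This is an assumption-laden illustration rather than the committed code: `ChildLike` and `SpawnLike` are hypothetical minimal interfaces standing in for Node's `ChildProcess` and `spawn`, and the error message is invented.

```typescript
// Hypothetical sketch; the interfaces stand in for node:child_process.
interface ChildLike {
  kill(signal: string): boolean;
  once(event: "exit" | "error", listener: (arg: unknown) => void): unknown;
}
type SpawnLike = (cmd: string, args: string[]) => ChildLike;

function recordToWav(
  cmd: string,
  args: string[],
  signal: AbortSignal,
  spawn: SpawnLike,
): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const child = spawn(cmd, args);
    let aborted = false;

    const onAbort = (): void => {
      aborted = true;
      // SIGINT (not SIGKILL) lets ffmpeg/sox finalize the WAV before exiting.
      child.kill("SIGINT");
    };
    const cleanup = (): void => signal.removeEventListener("abort", onAbort);

    // Register exit/error handlers before wiring the abort, so an
    // already-aborted signal cannot fire an exit that nobody observes.
    child.once("error", (err) => {
      cleanup();
      reject(err);
    });
    child.once("exit", (code) => {
      cleanup();
      // After an abort a non-zero exit is expected; without one it is a failure.
      if (code === 0 || aborted) resolve();
      else reject(new Error(`recorder exited with code ${String(code)} (is a microphone available?)`));
    });

    if (signal.aborted) onAbort();
    else signal.addEventListener("abort", onAbort, { once: true });
  });
}
```

With the real `spawn` from `node:child_process` passed in, the caller would abort the controller when the user presses Enter; tests can pass a fake that records the signal it was sent.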

CLI:
- voice-capture.ts: orchestrates detect → record (Enter to stop) →
  WhisperCppProvider.transcribe → delete the temp WAV (+ .txt side-file) →
  return transcript + status lines. Handles not-ready / no-recorder /
  no-speech / failures gracefully.
- /voice now triggers capture when the REPL wires ctx.voiceCapture; falls
  back to readiness/setup output otherwise. `/voice setup` still forces the
  install steps. Setup lines extracted to pure, reused helpers.
- REPL wires voiceCapture and pre-fills the next prompt via rl.write() once
  the transcript is ready (ctx.prefillInput).

Docs: VOICE_INPUT.md usage now describes the /voice flow (was Ctrl+V) + a
recorder-install section; BEHAVIOR_PARITY /voice row updated for CLI capture.

Tests: 9 core recorder cases (detect/buildArgs/record orchestration) + 3 new
CLI cases (capture callback, cancel/empty, setup bypass). Real-mic end-to-end
is manual (no audio hardware in CI). core 661 / cli 151, typecheck + lint +
format all clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@oratis oratis force-pushed the feat/voice-cli-capture branch from de934e5 to 22d709f June 8, 2026 06:29

oratis commented Jun 8, 2026

Owner Author

♻️ Rebased onto current main (resolves conflicts after #172 /tasks+/background landed) and retargeted the base to main so CI runs here. ✅ CI green.

Note: base is now main (not the stacked feature branch), so this PR's diff is cumulative — it includes the lower slice(s) too. Clean merge options:

oratis added a commit that referenced this pull request Jun 8, 2026
…175)

Local, on-device speech-to-text via whisper.cpp — no audio leaves the machine.

- Core: VoiceConfig (binPath/modelPath/provider/inputDevice) + detectVoice();
  existing WhisperCppProvider surfaced. detectRecorder/recordToWav (ffmpeg/sox).
- CLI: /voice records → transcribes → pre-fills the input line to edit;
  /voice setup prints install steps.
- Desktop: 🎙 composer button + Tauri voice_status/start/stop/cancel (ffmpeg,
  stdin-q graceful stop); mic entitlement + NSMicrophoneUsageDescription.
- Docs: docs/VOICE_INPUT.md + BEHAVIOR_PARITY /voice row.

Squashes the three review slices (PRs #173, #174, #175). Real-microphone
round-trip needs manual on-device verification (no audio hardware / Rust in CI).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

oratis commented Jun 8, 2026

Owner Author

✅ Rolled into main via #175 (squash-merged — contains all three voice slices: setup/detection + CLI capture + desktop 🎙). Closing as merged.

@oratis oratis closed this Jun 8, 2026
@oratis oratis deleted the feat/voice-cli-capture branch June 8, 2026 06:43