This repository was archived by the owner on Apr 28, 2026. It is now read-only.

Improve speaker identification using autoresearch/evo approach #1

@ComputelessComputer

Problem

Current speaker identification relies on:

  1. Cosine similarity between speaker embeddings and stored profiles (scoreSpeakerProfile in store.ts)
  2. Hard-coded thresholds — 0.7 (≥3 samples) or 0.74 (<3 samples) minimum score, plus a 0.04 margin gate when score < 0.82
  3. Simple centroid + best-sample scoring: Math.max(bestSampleScore, centroidScore)

This works but the thresholds and scoring strategy were hand-tuned. There is no benchmark to measure how well speaker ID actually performs, and no systematic way to improve it.
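
For reference, the rules listed above can be sketched roughly as follows. This is not the code in store.ts: the `Profile` shape, the plain-mean `centroid`, and the `recommend` helper are hypothetical stand-ins, and the margin gate is assumed to compare the best score against the runner-up profile.

```typescript
// Hypothetical profile shape; the real StoredSpeakerProfile may differ.
interface Profile {
  id: string;
  samples: number[][]; // stored per-sample embeddings
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  const denom = Math.sqrt(na) * Math.sqrt(nb);
  return denom === 0 ? 0 : dot / denom;
}

// Plain mean of the samples (a simplification of normalizedEmbeddingCentroid).
function centroid(samples: number[][]): number[] {
  const dim = samples[0].length;
  const sum = new Array<number>(dim).fill(0);
  for (const s of samples) {
    for (let i = 0; i < dim; i++) sum[i] += s[i];
  }
  return sum.map((v) => v / samples.length);
}

// Item 3: take the better of the centroid score and the best single sample.
function scoreProfile(embedding: number[], profile: Profile): number {
  const centroidScore = cosineSimilarity(embedding, centroid(profile.samples));
  const bestSampleScore = Math.max(
    ...profile.samples.map((s) => cosineSimilarity(embedding, s)),
  );
  return Math.max(bestSampleScore, centroidScore);
}

// Item 2: hard-coded minimum score plus a margin gate for weaker matches.
function recommend(embedding: number[], profiles: Profile[]): string | null {
  const ranked = profiles
    .filter((p) => p.samples.length > 0) // empty profiles cannot be scored
    .map((p) => ({ p, score: scoreProfile(embedding, p) }))
    .sort((x, y) => y.score - x.score);
  if (ranked.length === 0) return null;
  const best = ranked[0];
  const minScore = best.p.samples.length >= 3 ? 0.7 : 0.74;
  if (best.score < minScore) return null;
  // Assumption: "margin" means the lead over the second-best profile.
  const runnerUp = ranked.length > 1 ? ranked[1].score : -Infinity;
  if (best.score < 0.82 && best.score - runnerUp < 0.04) return null;
  return best.p.id;
}
```

Every constant in `recommend` is a candidate search dimension for the loop described below.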

Approach: Autoresearch loop (Evo-style)

Use evo or its approach: an autoresearch loop that discovers a benchmark, runs a baseline, and then spawns parallel agents to beat it. Key properties:

  • Tree search over greedy hill-climb — multiple forks from any committed improvement
  • N parallel agents in git worktrees — each tries a different hypothesis
  • Shared failure traces — agents don't repeat each other's mistakes
  • Regression gates — changes that break existing correct matches get discarded

What to measure

Build a benchmark dataset of meetings with ground-truth speaker labels. Metrics:

  • Accuracy — % of speakers correctly identified against known profiles
  • Precision/Recall — false matches vs missed matches
  • Confidence calibration — does a 0.85 confidence actually mean 85% correct?
  • Threshold sensitivity — how much do results change with threshold tweaks?
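
A minimal sketch of how the first three metrics could be computed from benchmark output. The `Trial` shape and the `evaluate` function are hypothetical, and calibration is summarized here as expected calibration error (ECE) over equal-width confidence bins, which is one common choice rather than something this issue prescribes.

```typescript
// Hypothetical benchmark record: one diarized speaker in one meeting.
interface Trial {
  truth: string | null; // ground-truth profile id, null if the speaker has no profile
  predicted: string | null; // suggested profile id, null if nothing was suggested
  confidence: number; // score attached to the suggestion, expected in [0, 1]
}

interface Metrics {
  accuracy: number;
  precision: number;
  recall: number;
  ece: number;
}

function evaluate(trials: Trial[], bins = 10): Metrics {
  let correct = 0; // includes correct "no suggestion" outcomes
  let suggested = 0;
  let correctSuggested = 0;
  let known = 0; // speakers that do have a stored profile
  const binCount = new Array<number>(bins).fill(0);
  const binConf = new Array<number>(bins).fill(0);
  const binHit = new Array<number>(bins).fill(0);

  for (const t of trials) {
    const hit = t.predicted === t.truth;
    if (hit) correct++;
    if (t.truth !== null) known++;
    if (t.predicted !== null) {
      suggested++;
      if (hit) correctSuggested++;
      // Clamp so out-of-range confidences still land in a valid bin.
      const b = Math.max(0, Math.min(bins - 1, Math.floor(t.confidence * bins)));
      binCount[b]++;
      binConf[b] += t.confidence;
      if (hit) binHit[b]++;
    }
  }

  // ECE: weighted gap between observed hit rate and mean confidence per bin.
  let ece = 0;
  for (let b = 0; b < bins; b++) {
    if (binCount[b] === 0) continue;
    const gap = Math.abs(binHit[b] / binCount[b] - binConf[b] / binCount[b]);
    ece += (binCount[b] / suggested) * gap;
  }

  return {
    accuracy: trials.length > 0 ? correct / trials.length : 0,
    precision: suggested > 0 ? correctSuggested / suggested : 0,
    recall: known > 0 ? correctSuggested / known : 0,
    ece,
  };
}
```

Threshold sensitivity could then be measured by re-running `evaluate` over the same trials while sweeping each threshold in small steps and recording how far the metrics move.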

What to explore

The autoresearch loop should explore improvements across the full stack:

Scoring strategy (store.ts)

  • Weighted combination of centroid + sample scores instead of simple Math.max
  • Top-K sample averaging instead of single best sample
  • Score normalization across profiles (relative ranking vs absolute threshold)
  • Adaptive thresholds based on profile quality (sample count, embedding variance)
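
As an illustration of the first two ideas, here is a sketch that blends the centroid score with a top-K sample mean. The function names, the default `k = 3`, and the default `centroidWeight = 0.5` are placeholders for the loop to tune, not proposed values.

```typescript
// Mean of the k highest per-sample similarity scores.
function topKMean(scores: number[], k: number): number {
  if (scores.length === 0) return 0;
  const top = [...scores].sort((a, b) => b - a).slice(0, Math.max(1, k));
  return top.reduce((sum, v) => sum + v, 0) / top.length;
}

// Weighted combination of the centroid score and the top-K sample mean.
// centroidWeight = 0 with k = 1 reduces to the best single sample score.
function blendedScore(
  sampleScores: number[],
  centroidScore: number,
  k = 3,
  centroidWeight = 0.5,
): number {
  const w = Math.min(1, Math.max(0, centroidWeight));
  return w * centroidScore + (1 - w) * topKMean(sampleScores, k);
}
```

Unlike Math.max, the blend cannot be dominated by one unusually close sample, which is the behavior the benchmark would need to confirm is actually an improvement.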

Embedding quality (Swift layer)

  • Segment selection strategy — which diarization segments to embed (currently all)
  • Minimum segment duration filtering
  • Embedding aggregation — mean vs weighted mean vs attention-pooled centroids
  • Per-sample quality scoring (reject noisy/short segments)
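
A sketch of minimum-duration filtering combined with a duration-weighted mean. It is written in TypeScript for consistency with the other examples here, although the real logic would live in the Swift layer; the `SegmentEmbedding` shape and the 1.0 s default cutoff are assumptions.

```typescript
// Hypothetical per-segment record produced by the embedding step.
interface SegmentEmbedding {
  durationSec: number;
  vector: number[];
}

function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

// Drops segments shorter than minDurationSec, then averages the remaining
// unit vectors weighted by duration. Returns null if nothing survives.
function durationWeightedCentroid(
  segments: SegmentEmbedding[],
  minDurationSec = 1.0,
): number[] | null {
  const kept = segments.filter((s) => s.durationSec >= minDurationSec);
  if (kept.length === 0) return null;
  const dim = kept[0].vector.length;
  const sum = new Array<number>(dim).fill(0);
  for (const seg of kept) {
    const unit = l2Normalize(seg.vector);
    for (let i = 0; i < dim; i++) sum[i] += seg.durationSec * unit[i];
  }
  return l2Normalize(sum);
}
```

Normalizing each segment before weighting keeps a single high-magnitude vector from dominating, so only duration decides how much a segment contributes.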

Profile management

  • Automatic outlier detection in stored samples
  • Profile convergence metrics — when does a profile have "enough" samples?
  • Cross-meeting consistency checks
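
One possible shape for outlier detection is a leave-one-out check: compare each stored sample against the mean of the other samples and flag it when the similarity is low. The helper names and the 0.5 cutoff below are illustrative assumptions only.

```typescript
function similarity(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  const denom = Math.sqrt(na) * Math.sqrt(nb);
  return denom === 0 ? 0 : dot / denom;
}

function meanOf(vectors: number[][]): number[] {
  const dim = vectors[0].length;
  const sum = new Array<number>(dim).fill(0);
  for (const v of vectors) {
    for (let i = 0; i < dim; i++) sum[i] += v[i];
  }
  return sum.map((x) => x / vectors.length);
}

// Returns the indices of samples that disagree with the rest of the profile.
// Profiles with fewer than 3 samples are left alone: there is no majority yet.
function findOutliers(samples: number[][], minSimilarity = 0.5): number[] {
  if (samples.length < 3) return [];
  const outliers: number[] = [];
  for (let i = 0; i < samples.length; i++) {
    const others = samples.filter((_, j) => j !== i);
    if (similarity(samples[i], meanOf(others)) < minSimilarity) outliers.push(i);
  }
  return outliers;
}
```

Leaving the candidate sample out of the mean matters for small profiles, where a single bad sample would otherwise pull the centroid toward itself and mask the problem.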

Matching logic

  • Two-stage matching: fast centroid screen → detailed sample comparison
  • Speaker verification (1:1) vs identification (1:N) distinction
  • Temporal priors — if speaker A was in the last 3 meetings, they're likely in this one
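
The two-stage idea could look roughly like this: a cheap centroid screen builds a shortlist, and only shortlisted profiles get the per-sample comparison. The `SpeakerProfile` shape, `shortlistSize`, and `screenFloor` are hypothetical, and the final accept/reject thresholds are deliberately left out.

```typescript
// Hypothetical profile shape with a precomputed centroid.
interface SpeakerProfile {
  id: string;
  centroid: number[];
  samples: number[][];
}

function cosSim(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  const denom = Math.sqrt(na) * Math.sqrt(nb);
  return denom === 0 ? 0 : dot / denom;
}

function twoStageMatch(
  embedding: number[],
  profiles: SpeakerProfile[],
  shortlistSize = 5,
  screenFloor = 0.5,
): { id: string; score: number } | null {
  // Stage 1: one comparison per profile, keep the closest few.
  const shortlist = profiles
    .map((p) => ({ p, screen: cosSim(embedding, p.centroid) }))
    .filter((x) => x.screen >= screenFloor)
    .sort((a, b) => b.screen - a.screen)
    .slice(0, shortlistSize);

  // Stage 2: detailed per-sample comparison on the shortlist only.
  let best: { id: string; score: number } | null = null;
  for (const { p } of shortlist) {
    let score = -1;
    for (const s of p.samples) score = Math.max(score, cosSim(embedding, s));
    if (best === null || score > best.score) best = { id: p.id, score };
  }
  return best;
}
```

The screen only pays off once the number of profiles and samples per profile is large; with a handful of profiles, its main value would be as a place to apply a different threshold than the detailed stage.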

Current architecture reference

Diarization → Segments with speaker IDs (Swift/CoreML sortformer)
     ↓
Embedding extraction → Per-speaker embedding vectors (Swift speech_bridge)
     ↓
Profile matching → cosine similarity against stored profiles (TS store.ts)
     ↓
Suggestion → recommendSpeakerProfile() returns best match above threshold

Key files:

  • src/store.ts: cosineSimilarity(), scoreSpeakerProfile(), recommendSpeakerProfile(), normalizedEmbeddingCentroid()
  • src-tauri/swift-permissions/src/speech_bridge.swift: embedding extraction, diarization segment processing, centroid computation
  • src-tauri/src/lib.rs: StoredSpeakerProfile, analyze_speaker_embeddings command
  • src-tauri/src/asr.rs: SpeakerEmbeddingPayload, FileSpeakerEmbeddingPayload

Implementation plan

  1. Build benchmark harness — collect ground-truth labeled meetings, define metrics, run baseline
  2. Set up evo — point it at the speaker ID codebase, configure benchmark as the optimization target
  3. Run optimization loop — let parallel agents explore scoring, thresholds, embedding strategies
  4. Gate on regression — any change must not regress existing correct matches
  5. Ship the winner — commit the best-performing configuration
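
The regression gate in step 4 can be stated precisely. A sketch, assuming the harness emits one labeled result per benchmark trial; the `LabeledResult` shape and function names are hypothetical.

```typescript
// Hypothetical harness output for one benchmark trial.
interface LabeledResult {
  trialId: string;
  truth: string | null; // ground-truth profile id, null if unknown speaker
  predicted: string | null; // what the configuration under test suggested
}

// Lists trials the baseline got right that the candidate gets wrong or drops.
function regressions(baseline: LabeledResult[], candidate: LabeledResult[]): string[] {
  const candidateById = new Map<string, LabeledResult>();
  for (const r of candidate) candidateById.set(r.trialId, r);

  const broken: string[] = [];
  for (const b of baseline) {
    if (b.predicted !== b.truth) continue; // baseline was already wrong here
    const c = candidateById.get(b.trialId);
    if (c === undefined || c.predicted !== b.truth) broken.push(b.trialId);
  }
  return broken;
}

// A candidate passes only if it breaks none of the baseline's correct matches.
function passesRegressionGate(baseline: LabeledResult[], candidate: LabeledResult[]): boolean {
  return regressions(baseline, candidate).length === 0;
}
```

Returning the broken trial ids, not just a boolean, gives the shared failure traces something concrete to record when a candidate is discarded.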
