The initial concept mixed local development with remote deployment workflows. In practice, that created friction. The real breakthrough was architectural, not algorithmic: make the Pi itself the primary development machine and treat everything else as secondary.
That one shift changed almost every decision:
- Faster loop times (no constant deploy/sync wait states)
- Fewer "works here but not there" environment mismatches
- Easier direct testing of audio devices, GPIO behavior, and LCD drawing
- Better confidence in changes because hardware is in the loop every day
This project became local-first and script-first:
- bootstrap scripts for environment consistency
- verification script for runtime dependencies
- direct execution paths for detector, TreblingTuner, and diagnostics
Early tuning attempts exposed a common pitfall: visual output looked wrong mostly because detection was unstable, not because rendering was broken.
The stabilization sequence evolved like this:
- Establish robust gates for RMS, confidence, and Hz range.
- Add smoothing and hysteresis so vibrato and glissando do not thrash card selection.
- Add note-off behavior so stale notes disappear after a short silence.
- Add startup noise calibration so thresholds adapt to room conditions.
- Add timestamped logs to inspect event timing and identify over-sensitivity.
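The gate-plus-hysteresis idea above can be sketched roughly like this. All names, thresholds, and frame counts are illustrative assumptions, not the project's actual values:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NoteGate:
    """Sketch of detection gating + hysteresis (illustrative values)."""
    rms_floor: float = 0.01      # reject frames quieter than this
    conf_floor: float = 0.8      # reject low-confidence detections
    hz_lo: float = 60.0          # plausible pitch band
    hz_hi: float = 2000.0
    hold_frames: int = 3         # agreeing frames required to switch notes

    current: Optional[str] = None
    _candidate: Optional[str] = None
    _streak: int = 0

    def update(self, note: str, hz: float, rms: float, conf: float) -> Optional[str]:
        # Hard gates: weak, uncertain, or out-of-band frames change nothing.
        if rms < self.rms_floor or conf < self.conf_floor \
                or not (self.hz_lo <= hz <= self.hz_hi):
            return self.current
        # Hysteresis: a new note must persist for hold_frames before the
        # display switches, so vibrato or a glissando pass-through cannot
        # thrash card selection.
        if note == self._candidate:
            self._streak += 1
        else:
            self._candidate, self._streak = note, 1
        if self._streak >= self.hold_frames:
            self.current = note
        return self.current
```

A single flickering frame of a different note leaves the displayed card untouched; only a sustained change flips it.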
A key lesson was source quality:
- Synthetic or heavily processed instrument apps introduced modulation artifacts.
- Cleaner references (sine source, direct speaker tests, acoustic recorder) produced more meaningful tuning data.
This led to a practical rule: tune the detector against controlled sources first, then validate against real instruments.
Two separate calibration concerns emerged and were split intentionally:
- Background/noise calibration
- Staff anchoring calibration
Noise calibration:
- A short countdown at startup
- User instructed not to play
- Gate derived from percentile + margin, not fixed constants only
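The percentile-plus-margin gate can be sketched as follows; the percentile, margin, and function name are illustrative assumptions:

```python
def noise_gate(rms_samples: list, percentile: float = 95.0,
               margin: float = 1.5) -> float:
    """Derive the RMS gate from observed room noise, not fixed constants.

    rms_samples: per-frame RMS collected during the silent startup
    countdown. The gate sits at the chosen percentile of that noise,
    scaled by a safety margin.
    """
    ordered = sorted(rms_samples)
    # Nearest-rank percentile, without pulling in numpy.
    idx = min(len(ordered) - 1,
              round(percentile / 100.0 * (len(ordered) - 1)))
    return ordered[idx] * margin
```

Because the gate tracks whatever the room actually sounds like during the countdown, the same build adapts to a quiet bedroom and a noisy workshop without retuning constants.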
Staff anchoring calibration:
- In relative mode, collect stable natural notes before locking display mapping
- Avoid instant anchoring from accidental/noisy detections
- Support button-triggered recalibration on hardware
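The anchoring rule can be sketched like this; the function, parameter names, and sample count are illustrative assumptions:

```python
from typing import Optional

def lock_anchor(stable_detections: list, min_samples: int = 5) -> Optional[str]:
    """Only lock the staff mapping after enough stable NATURAL notes.

    Accidentals and one-off noisy hits never anchor the display; the
    anchor is the most frequently observed natural note.
    """
    naturals = [n for n in stable_detections
                if "#" not in n and "b" not in n[1:]]
    if len(naturals) < min_samples:
        return None  # keep collecting; no anchor yet
    return max(set(naturals), key=naturals.count)
```

A button-triggered recalibration would simply clear the collected detections and run this again.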
That split made behavior much easier to reason about. Before this, one calibration concept overloaded too many responsibilities.
A central product question: should display always reflect absolute octave labels, or should it optimize readability for single-note learners?
The implemented approach supports both:
- Absolute mode: strict scientific mapping
- Relative mode (default): keep note movement readable in a fixed staff window
In relative mode, octave intent is conveyed by arrows when a note falls outside the display window, while placement favors the nearest position on the staff to keep movement continuous.
This mirrors how musicians actually use a practice aid in this context:
- They read external sheet music
- They play one note at a time
- They glance at the device to confirm pitch class and staff-relative location
The system therefore prioritizes stable letter correctness and readable staff continuity.
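One way to picture relative-mode placement, assuming an integer staff geometry of one unit per line/space with letter positions repeating every seven steps (all names and the window size are hypothetical):

```python
from typing import Optional, Tuple

def place_relative(letter: str, prev_pos: int,
                   window: Tuple[int, int] = (0, 10)) -> Tuple[int, Optional[str]]:
    """Sketch: pick the on-staff position for a letter in relative mode.

    Returns (position, arrow), where arrow is None, "up", or "down".
    """
    base = "CDEFGAB".index(letter)
    lo, hi = window
    # Candidate positions for this letter across nearby octaves.
    candidates = [base + 7 * k for k in range(-3, 4)]
    # Continuity: prefer the placement nearest the previous note.
    pos = min(candidates, key=lambda p: abs(p - prev_pos))
    if pos < lo:
        return lo, "down"  # true pitch lies below the window: show arrow
    if pos > hi:
        return hi, "up"    # true pitch lies above the window: show arrow
    return pos, None
```

The letter is always correct; the octave choice simply minimizes jumping, which matches how a single-note learner scans the display.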
Seton2 started as adjacent generation logic and was integrated into the project as a first-class asset pipeline.
Reasons:
- Deterministic card generation from config
- No manual image editing loop
- Easy reproducibility across clean installs
- Ability to regenerate at different resolutions or style settings
The generation stack settled into:
- LilyPond for staff/note engraving
- Pillow for card composition (border, label, arrows)
- JSON manifest as runtime contract
This made TreblingTuner simpler: load manifest + card images and focus on mapping logic.
Ghost notes were added to answer a real learner question: "Where else is this same letter on the staff?"
The first implementation worked visually in many cases but exposed subtle rendering bugs:
- artifacts from copied ledger fragments
- occasional misalignment on extreme cards
- arrow/overlay collisions
The eventual fix was structural:
- build per-label ghost templates from in-range source cards
- map ghost placement from stable staff geometry via each card's draw_at reference
- avoid one-template-fits-all assumptions
Result: cleaner overlays with predictable positioning, especially for high-arrow cards.
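The placement half of that fix reduces to a lookup over known staff geometry. In this sketch, `staff_pos` maps note names to integer staff positions and stands in for the real draw_at geometry (both names are assumptions):

```python
def ghost_positions(shown: str, staff_pos: dict,
                    window: range = range(0, 11)) -> list:
    """For the card being shown, find where else its letter sits on the
    staff, so a per-label ghost template can be drawn at each spot.

    Only in-range positions qualify, which is what keeps extreme
    high-arrow cards from inheriting broken one-size-fits-all overlays.
    """
    letter = shown[0]
    return sorted(pos for note, pos in staff_pos.items()
                  if note != shown and note[0] == letter and pos in window)
```

Building the ghost template per label (rather than one template for all) is what eliminated the copied-ledger artifacts.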
Initial card generation used a broad musical range (A2..G9) for flexibility. In practice, this created clutter and confusion for the target instruments.
The project now supports explicit range bounds (min_note / max_note) and defaults to a practical span (C4..D7).
Why this mattered:
- fewer assets to carry and inspect
- less ambiguity in out-of-range behavior
- tighter alignment with recorder/ocarina training use-cases
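A bound check like min_note/max_note reduces to comparing pitch names numerically. This sketch converts scientific pitch names to MIDI numbers (C4 = 60); flats are omitted for brevity, and the function names are illustrative:

```python
NOTE_INDEX = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def midi_number(name: str) -> int:
    """Scientific pitch name ('C4', 'F#5') to a MIDI number, C4 = 60."""
    letter = name[0]
    sharp = name[1] == "#"
    octave = int(name[2:] if sharp else name[1:])
    return 12 * (octave + 1) + NOTE_INDEX[letter] + (1 if sharp else 0)

def in_range(name: str, min_note: str = "C4", max_note: str = "D7") -> bool:
    # Defaults mirror the practical C4..D7 span described above.
    return midi_number(min_note) <= midi_number(name) <= midi_number(max_note)
```

Filtering generation through a check like this is what shrinks the old A2..G9 sprawl down to the assets the target instruments actually need.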
A related cleanup step pruned generated files not listed in the manifest, avoiding accidental drift between disk contents and runtime behavior.
Testing moved through four maturity levels:
Level 1: Manual ad hoc playback
- quick checks, low confidence, hard to compare runs
Level 2: Logged interactive sessions
- timestamps + emitted states
- better for diagnosing jitter and thresholds
Level 3: Controlled audio diagnostics
- loopback-style scripts (looptest.py, zelda_tuning.py)
- repeatable pitch validation against known sequences
Level 4: Mapping simulation harness
- synthetic melody/range sequences to validate staff mapping decisions without audio hardware noise
The key insight: hardware testing and logic simulation complement each other. One catches electrical/acoustic realities; the other catches algorithmic edge cases quickly.
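A Level 4 harness can be as small as this sketch: drive the mapper with synthetic (note, expected position) pairs instead of live audio, so mapping decisions are checked deterministically. The `mapper` callable and case shape are assumptions for illustration:

```python
def run_mapping_harness(cases, mapper):
    """Check a staff mapper against synthetic expectations.

    cases:  iterable of (note_name, expected_position) pairs
    mapper: any callable from note name to staff position
    Returns a list of human-readable mismatch descriptions (empty = pass).
    """
    failures = []
    for note, expected in cases:
        got = mapper(note)
        if got != expected:
            failures.append(f"{note}: expected {expected}, got {got}")
    return failures
```

Because no microphone is involved, a failure here is always an algorithmic bug, never an acoustic artifact, which is exactly the separation the four levels are after.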
The cleanup pass focused on three principles:
- remove obvious trash and stale artifacts
- keep diagnostics that materially help hardware debugging
- tighten docs so the next person can run and reason without archaeology
Actions included:
- removing throwaway logs and smoke-test leftovers
- removing an unused import and stale cache artifacts
- tightening README and config docs
- preserving practical scripts for audio and runtime verification
This codebase is optimized for:
- local Pi development
- repeatable setup scripts
- single-note instrument practice
- readable staff mapping with optional strict absolute mode
- clear regeneration path for visual assets
It is not optimized for:
- polyphonic detection
- generalized DAW-grade pitch tracking
- multi-instrument notation systems with key-signature-aware enharmonic policy
Those can be layered on later without rewriting the core, as long as its boundaries stay explicit.
Likely next phase (already discussed in direction):
- game-oriented layer on top of stable note/card pipeline
- timing scoring and intonation scoring
- optional instrument presets for Seton2 generation ranges
Because detection, mapping, and rendering are now separated, these can be added with minimal churn.
The biggest success was reducing ambiguity: ambiguous environments, ambiguous calibration behavior, ambiguous mapping rules, and ambiguous assets all create instability.
Once each of those became explicit (scripts, countdowns, manifests, range bounds, dedicated diagnostics), quality stopped depending on luck and started depending on reproducible process.
That is the real milestone of this build.