Draft
51 commits
b8a30f6
feat: add bat environment
Kinvert Jun 9, 2026
50ab5c2
feat: visualize bat chirps and reflections
Kinvert Jun 9, 2026
b30a23b
feat: improve bat randomization training
Kinvert Jun 9, 2026
4bc4aa1
feat: switch bat sonar to per-ear frequency bins
Kinvert Jun 9, 2026
83af6ca
perf: bucket bat echoes by arrival tick
Kinvert Jun 9, 2026
fb794b3
fix: constrain bat to forward dynamics
Kinvert Jun 9, 2026
89613c9
tune bat defaults from bat1 sweep
Kinvert Jun 9, 2026
2b64c83
add harder bat curriculum start level
Kinvert Jun 9, 2026
4dead34
test bat bug wall bounce
Kinvert Jun 9, 2026
5f7ac48
add bat chirp budget metrics
Kinvert Jun 9, 2026
40b8804
retune bat chirp budget pressure
Kinvert Jun 9, 2026
6a30d14
penalize overlapping bat chirps
Kinvert Jun 9, 2026
1c1f629
stabilize bat sonar curriculum metrics
Kinvert Jun 9, 2026
e7454e1
simplify bat chirp budget curriculum
Kinvert Jun 9, 2026
7d99f84
add bat raylib chirp audio
Kinvert Jun 9, 2026
fb1f591
penalize early bug echo chirps
Kinvert Jun 9, 2026
f923149
prepare bat timing sweep
Kinvert Jun 9, 2026
2fe1b24
Add bat corner reflectors and sweep defaults
Kinvert Jun 10, 2026
732f87b
Add Bat MP4 recording export
Kinvert Jun 10, 2026
b3d65ba
Tune Bat chirp source and distance curriculum
Kinvert Jun 10, 2026
ba6734a
Add Bat inbound maneuver curriculum
Kinvert Jun 10, 2026
a22f01f
Prepare Bat reflector strength sweep
Kinvert Jun 10, 2026
18358a5
Set Bat sweep budget to sixteen runs
Kinvert Jun 10, 2026
653dc44
Keep Bat sweep run count CLI controlled
Kinvert Jun 10, 2026
a055440
Tighten Bat observations and sweep setup
Kinvert Jun 10, 2026
8d9428a
Remove Bat config fallback guards
Kinvert Jun 11, 2026
ac8165a
Remove more Bat fallback guards
Kinvert Jun 11, 2026
227dec5
Remove remaining Bat spawn bloat
Kinvert Jun 11, 2026
9917db4
Simplify Bat helper scaffolding
Kinvert Jun 11, 2026
513387a
Slim Bat logging and recording code
Kinvert Jun 11, 2026
cb92935
Move Bat audio helpers
Kinvert Jun 11, 2026
41f7651
Document Bat sensing research
Kinvert Jun 11, 2026
3544257
Add sweepable Bat ear directivity
Kinvert Jun 11, 2026
ac61d3b
Add Bat bug wing sidebands
Kinvert Jun 11, 2026
ff094bd
Set Bat defaults to ewgh6l5l
Kinvert Jun 11, 2026
3a438a0
Hardcode stable bat constants
Kinvert Jun 11, 2026
5dd5b19
Hardcode bat episode constants
Kinvert Jun 11, 2026
21eb101
Clean up bat symbol names
Kinvert Jun 11, 2026
b59371c
Clean up stale bat metrics
Kinvert Jun 12, 2026
8b4cdfd
Clean up bat step logic
Kinvert Jun 12, 2026
11a092e
Clean up bat action and reflector code
Kinvert Jun 12, 2026
57ec5f1
Simplify bat curriculum and echo helpers
Kinvert Jun 12, 2026
ffea318
Remove stale bat chirp budget field
Kinvert Jun 12, 2026
7967d44
Trim bat curriculum component logs
Kinvert Jun 12, 2026
fa59159
Simplify fixed chirp perf reference
Kinvert Jun 12, 2026
eb24a21
Remove impossible bat max speed guard
Kinvert Jun 12, 2026
3fd1f0f
Remove dead bat echo energy accumulator
Kinvert Jun 12, 2026
426be02
Clean up bat environment internals
Kinvert Jun 12, 2026
c9da7aa
Render bat echo frequency history
Kinvert Jun 12, 2026
aa3b383
Add bat observation render gauges
Kinvert Jun 13, 2026
822681a
Clean up bat env internals
Kinvert Jun 13, 2026
450 changes: 450 additions & 0 deletions BAT9_SWEEP_ANALYSIS.md

Large diffs are not rendered by default.

512 changes: 512 additions & 0 deletions BAT_CURRICULUM.md

Large diffs are not rendered by default.

65 changes: 65 additions & 0 deletions BAT_PRIORITIES.md
@@ -0,0 +1,65 @@
# Bat Priorities

Current near-term priorities for the Bat PufferLib environment.

## 0. Video capture with audio

- RayLib can render and play audio, but it does not natively encode MP4.
- Preferred path: keep RayLib as the renderer/audio source, capture frames/audio
during eval, and use `ffmpeg` to mux an MP4.
- A future helper should make this feel like one command, but avoid embedding an
MP4 encoder in the env.
- Existing GIF capture remains useful for quick silent demos.
- Later render polish: play audible reflection blips in addition to emitted
chirps. Keep this eval-only. Bug reflections and static wall/obstacle
reflections should likely use distinguishable volume, timbre, panning, or
marker sounds so the debug audio stays interpretable.

## 1. Add episode timer observation

- Add a normalized episode timer observation so the policy knows urgency.
- For the current `max_steps = 512` Bat episode budget, expose a float in
`[0, 1]` representing elapsed time from `0` ticks to timeout. If the budget is
later changed to exactly `500`, scale the same way from `0..500`.
- The Bat8 visual evals show a likely failure mode where policies chirp too
little, settle into circling, and time out. Without a timer observation, the
policy has no direct signal that it is running out of episode time.
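
A minimal sketch of the normalization, written in Python for illustration only (the env itself is native code, and `episode_timer_norm` is a hypothetical name, not an existing symbol):

```python
def episode_timer_norm(tick: int, max_steps: int = 512) -> float:
    """Elapsed episode fraction in [0, 1]: 0 at reset, 1 at timeout."""
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    # Clamp so an off-by-one at the terminal step can never leave [0, 1].
    return min(max(tick / max_steps, 0.0), 1.0)
```

Because the value is a fraction of the budget, the same function covers a later change to `max_steps = 500` without retuning.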

## 2. Bug-reflection chirp timing penalty

- Replace broad "chirp before all echoes clear" pressure with bug-specific
timing pressure.
- Penalize a valid chirp if it is emitted before the previous chirp's expected
bug reflection has returned.
- Scale the penalty by remaining wait fraction, so chirping immediately after a
prior chirp is worse than chirping shortly before the bug echo arrives.
- Keep the coefficient sweepable through `chirp_overlap_penalty`.
- Do not penalize based on all static wall/obstacle reflections; clutter may
legitimately require reacquisition chirps.
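
One possible shape for this penalty, as a hedged Python sketch. Only `chirp_overlap_penalty` comes from these notes; the function name and the two timing arguments are assumptions for illustration:

```python
def chirp_overlap_penalty_term(
    ticks_since_prev_chirp: float,
    expected_bug_echo_ticks: float,
    chirp_overlap_penalty: float,
) -> float:
    """Penalty (>= 0) for chirping before the previous chirp's bug echo returns."""
    if expected_bug_echo_ticks <= 0.0:
        return 0.0  # no pending bug echo, so nothing to wait for
    remaining = expected_bug_echo_ticks - ticks_since_prev_chirp
    # Fraction of the wait still outstanding: 1 right after the prior chirp,
    # 0 once the bug echo has (or should have) arrived.
    wait_fraction = min(max(remaining / expected_bug_echo_ticks, 0.0), 1.0)
    return chirp_overlap_penalty * wait_fraction
```

Static wall and obstacle echoes never enter this calculation, which matches the last bullet above.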

## 3. Resume performance work

- Use level 7 and level 10 evals as visual sanity checks.
- Focus on harder-level failures where the bat spends chirps before acquiring
the bug.
- Keep reward shaping minimal and prefer terminal/curriculum/perf pressure where
possible.

## 4. Prepare the next sweep

- Make sure the next sweep includes any new timing penalty coefficient ranges.
- Sweep `chirp_cooldown_ticks` in a bounded range. Current range is `6..18`.
- Keep `max_chirps_per_episode` fixed at `15` for this sweep so budget does
not confound timing penalty and cooldown effects.
- Cap policy sweep size at `hidden_size = 64..256` and `num_layers = 2..4` so
overnight sweeps do not waste runs on very slow oversized networks.
- Keep sweep ranges bounded so runs cannot become extremely slow from oversized
policies or excessive env settings.
- Watch `perf`, `base_perf`, `curriculum_perf`, `chirps_emitted`,
`chirp_overlap_fraction`, `chirp_tempo_ratio`, `collision`, and SPS.

## Priority judgment

The current ordering favors demos over performance: the video/audio capture
work is useful for demos, but the bug-reflection timing penalty is more likely
to improve level 7/10 performance before the next sweep.
260 changes: 260 additions & 0 deletions BAT_SONAR_OBSERVATION_NOTES.md
@@ -0,0 +1,260 @@
# Bat Sonar Observation Notes

Status: design note for current and future Bat agents

Workspace: `/home/claude/pathfinder`

Related spec: `BAT_SPEC.md`

## Purpose

This note records the intended next observation and echo model for the Bat
environment. The first implementation was deliberately simplified to get a
trainable baseline. The next rung should make active echolocation real: the bat
should hear frequency energy only when echoes from its own chirps return.

## Retired Scaffold Implementation

The first Bat observation was a fast synthetic feature extractor, not a true
chirp-return audio model. It has been retired, but the notes are kept here so
future agents understand why the env moved away from it.

Retired layout:

- `left_range_energy[16]`
- `left_doppler_energy[16]`
- `right_range_energy[16]`
- `right_doppler_energy[16]`
- `chirp_age_norm`
- `last_chirp_start_freq_norm`
- `last_chirp_end_freq_norm`
- `last_chirp_duration_norm`
- `forward_speed_norm`
- `turn_rate_norm`

Total size: `70`.

Each frame, the env recomputed echo features from the current bat, bug, wall,
and obstacle positions. The bug was one strong moving reflector. Walls and
obstacle edges were sampled into static point reflectors. For each reflector,
the env computed approximate left-ear and right-ear path lengths, attenuation,
left/right gain, and a normalized Doppler value. It then deposited energy into
range-indexed observation slots.

This was useful for a first baseline, but it was too informative:

- The bat got fresh echo-like information every frame, even if it did not
chirp.
- Chirp start frequency, end frequency, and duration did not materially affect
the acoustic observation.
- The Doppler channels were scalar range-indexed values, not FFT bins.
- Range was exposed as direct binned path length instead of being inferred from
echo return timing.

## Current Target Model

The observation should be per-tick binaural frequency energy:

- `left_freq_bins[N]`
- `right_freq_bins[N]`
- chirp metadata
- cooldown/age metadata
- self-motion metadata

No explicit delay/range bins are needed in the observation. Distance should be
implicit in time. The policy should infer range from when frequency energy
returns after a chirp.

Current layout:

- `left_freq_bins[16]`
- `right_freq_bins[16]`
- `chirp_age_norm`
- `chirp_cooldown_norm`
- `last_chirp_start_freq_norm`
- `last_chirp_end_freq_norm`
- `last_chirp_duration_norm`
- `forward_speed_norm`
- `turn_rate_norm`

Total size: `39`.

If 16 bins is too coarse after implementation, use 24 bins per ear for a total
size of `55`.

## Event-Driven Echo Model

Do not synthesize raw audio and do not run an FFT per environment step. Use an
analytic event model that directly deposits echo energy into frequency bins at
the tick when the echo reaches each ear.

When a chirp is emitted:

1. Break the chirp into a small number of time slices.
2. For each slice, compute the emitted frequency from chirp start frequency,
end frequency, and duration.
3. For each reflector, compute when that slice reaches the reflector.
4. Compute when the reflected sound reaches the left ear and right ear.
5. Compute returned amplitude, ear gain, and Doppler-shifted frequency.
6. Enqueue an echo event for each ear.

Each echo event should store:

- receive time in continuous ticks or seconds
- target ear
- returned normalized frequency
- intensity
- source chirp identifier or chirp birth tick, if useful for debugging

On each env tick:

1. Clear left/right frequency bins.
2. Process all echo events whose receive time falls in the current tick window.
3. Deposit event intensity into the relevant frequency bin, with optional
fractional spill into neighboring bins.
4. Add a small configurable noise floor.
5. Apply bounded compression, such as `log1p(k * energy) / log1p(k)`.
6. Append chirp and self-motion metadata.

This produces the desired behavior:

- No chirp means no new echo energy, aside from noise or any intentionally
modeled lingering sensor state.
- A low-to-high chirp creates a time-coded return pattern.
- Multiple reflectors can overlap naturally in the same tick and frequency
bin.
- Range must be inferred from echo timing, not from a direct range channel.
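
The per-tick steps above (clear, select, deposit, noise, compress) can be sketched in Python as follows. This is an illustration under assumptions, not the env's code: events are plain tuples, the noise floor is a constant offset rather than random noise, energy is clamped to `1.0` before compression so the output stays in `[0, 1]`, and the metadata step is omitted.

```python
import math

NUM_BINS = 16  # per ear, matching the 16-bin layout in this note


def compress(energy: float, k: float = 10.0) -> float:
    """Bounded compression log1p(k * e) / log1p(k); maps [0, 1] onto [0, 1]."""
    return math.log1p(k * energy) / math.log1p(k)


def tick_bins(events, tick, noise_floor=0.0, k=10.0):
    """Return [left_bins, right_bins] for one tick.

    events: iterable of (receive_tick, ear, freq_norm, intensity),
    with ear 0 = left, 1 = right and freq_norm in [0, 1].
    """
    bins = [[0.0] * NUM_BINS, [0.0] * NUM_BINS]  # step 1: clear
    for receive_tick, ear, freq_norm, intensity in events:
        if not (tick <= receive_tick < tick + 1):  # step 2: current window only
            continue
        pos = min(max(freq_norm, 0.0), 1.0) * (NUM_BINS - 1)
        lo = int(pos)
        hi = min(lo + 1, NUM_BINS - 1)
        frac = pos - lo
        bins[ear][lo] += intensity * (1.0 - frac)  # step 3: fractional spill
        bins[ear][hi] += intensity * frac
    # steps 4-5: add the noise floor, clamp, then compress
    return [
        [compress(min(e + noise_floor, 1.0), k) for e in ear_bins]
        for ear_bins in bins
    ]
```

With no events in the window and `noise_floor=0.0`, every bin is exactly zero, which is the "no chirp means no new echo energy" property.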

## Example: Two-Frequency Chirp and Two Targets

Assume two frequency bins: low and high.

The bat emits a two-slice chirp:

- slice 0: high frequency
- slice 1: low frequency

There are two static targets, one near and one far. With zero Doppler, the
per-tick ear spectrum could look like:

```text
[0, 0] sound still traveling
[0, 0] sound still traveling
[0, 1] near target returns high slice
[1, 1] near target returns low slice, far target returns high slice
[1, 0] far target returns low slice
[0, 0] no active returns
```

This is the intended observation style. It is not a delay-bin representation.
The temporal sequence itself contains the delay/range information.
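
The table above can be reproduced with a toy simulation. The round-trip delays of 2 and 3 ticks and the one-slice-per-tick emission are assumptions chosen to match the example; bins are ordered `[low, high]` and store presence only, where real bins would sum intensity.

```python
LOW, HIGH = 0, 1  # bin indices, ordered [low, high]


def simulate(slices, round_trip_ticks, num_ticks):
    """slices: list of (emit_tick, freq_bin); round_trip_ticks: one delay per target."""
    rows = [[0, 0] for _ in range(num_ticks)]
    for emit_tick, freq_bin in slices:
        for delay in round_trip_ticks:
            arrive = emit_tick + delay
            if arrive < num_ticks:
                rows[arrive][freq_bin] = 1  # mark that this bin hears a return
    return rows


# Slice 0 (high) leaves at tick 0, slice 1 (low) at tick 1.
# The near target echoes after 2 ticks, the far target after 3.
rows = simulate(slices=[(0, HIGH), (1, LOW)], round_trip_ticks=[2, 3], num_ticks=6)
for row in rows:
    print(row)
```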

## Timing and Physics Notes

Echo timing is two-way:

```text
emit position -> reflector -> ear
```

For static reflectors, the approximate return time is:

```text
t_receive = t_emit
+ distance(chirp_origin, reflector) / sound_speed
+ distance(reflector, ear_at_receive) / sound_speed
```

For moving reflectors, such as the bug, the hit time should use predicted
reflector position at the time of impact. A linear-motion approximation is good
enough for the next implementation.
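
A sketch of the return-time calculation with a linearly moving reflector, in Python for illustration. The assumptions here are not from the env: positions are 2D tuples, `sound_speed` defaults to 343 in arbitrary length units per second, the reflector is slower than sound, and the ear position is taken as fixed during the return leg instead of solving for `ear_at_receive`.

```python
import math


def hit_delay(origin, refl_pos, refl_vel, sound_speed):
    """Seconds until a wavefront leaving `origin` reaches a linearly moving reflector.

    Solves |refl_pos + refl_vel * t - origin| = sound_speed * t for t >= 0.
    """
    dx, dy = refl_pos[0] - origin[0], refl_pos[1] - origin[1]
    vx, vy = refl_vel
    a = vx * vx + vy * vy - sound_speed ** 2  # negative for a subsonic reflector
    b = 2.0 * (dx * vx + dy * vy)
    c = dx * dx + dy * dy
    disc = b * b - 4.0 * a * c  # non-negative whenever a < 0
    return (-b - math.sqrt(disc)) / (2.0 * a)


def receive_time(t_emit, origin, refl_pos, refl_vel, ear_pos, sound_speed=343.0):
    """t_emit + outbound leg to the predicted hit point + return leg to the ear."""
    t_hit = hit_delay(origin, refl_pos, refl_vel, sound_speed)
    hx = refl_pos[0] + refl_vel[0] * t_hit
    hy = refl_pos[1] + refl_vel[1] * t_hit
    back = math.hypot(hx - ear_pos[0], hy - ear_pos[1]) / sound_speed
    return t_emit + t_hit + back
```

For a static reflector (`refl_vel = (0, 0)`) this reduces to the two-distance formula above.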

Doppler should be based on the rate of change of the acoustic path length:

```text
doppler_shift ~= -path_length_rate / sound_speed
```

Static walls and obstacles can still have Doppler from bat self-motion. The
moving bug additionally contributes target radial velocity.
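
As a hedged illustration of the path-length-rate rule, the sketch below treats the emitter and ear as co-located on the bat, so the acoustic path is twice the bat-to-reflector range. That monostatic simplification, the 2D tuples, and the default `sound_speed` are assumptions, not the env's model.

```python
import math


def doppler_shift(bat_pos, bat_vel, refl_pos, refl_vel, sound_speed=343.0):
    """Fractional frequency shift ~= -d(path_length)/dt / sound_speed."""
    rx, ry = refl_pos[0] - bat_pos[0], refl_pos[1] - bat_pos[1]
    dist = math.hypot(rx, ry)
    if dist == 0.0:
        return 0.0  # direction undefined; treat as no shift
    # Range rate: positive when bat and reflector are separating.
    range_rate = (
        (refl_vel[0] - bat_vel[0]) * rx + (refl_vel[1] - bat_vel[1]) * ry
    ) / dist
    path_length_rate = 2.0 * range_rate  # out-and-back path
    return -path_length_rate / sound_speed
```

A bat flying toward a static wall gets a positive shift from self-motion alone, and a bug receding from a stationary bat gets a negative one.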

Use fractional receive times internally. The env control tick can stay at
`1/60` second while echo events are scheduled at sub-tick times and deposited
into the nearest tick or split across adjacent ticks.
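
Splitting a sub-tick arrival across adjacent ticks could look like the sketch below. The linear weighting and the function name are assumptions; the note only requires that events land in the nearest tick or are split across neighbors.

```python
import math


def split_across_ticks(receive_time_s, intensity, tick_dt=1.0 / 60.0):
    """Return [(tick_index, share), ...] with shares summing to `intensity`."""
    t = receive_time_s / tick_dt  # continuous tick coordinate
    base = math.floor(t)
    frac = t - base
    if frac == 0.0:
        return [(base, intensity)]
    # Linear split: the closer the arrival is to a tick, the larger its share.
    return [(base, intensity * (1.0 - frac)), (base + 1, intensity * frac)]
```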

## Chirp Overlap and Memory

Without explicit delay bins, the policy needs temporal memory to infer range.
The observation at a single tick only says what frequency energy is arriving
now. It does not directly say how long ago that sound was emitted unless the
policy remembers the chirp sequence or the env provides reliable chirp-age
metadata.

For the next rung, use one active chirp at a time:

- `chirp_cooldown_ticks >= max_echo_return_ticks`
- include `chirp_age_norm`
- include last chirp start frequency, end frequency, and duration

This keeps return timing interpretable before adding overlapping chirps. Later
curriculum stages can reduce cooldown and allow ambiguity from multiple active
chirps.
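
One way to derive `max_echo_return_ticks` for the cooldown constraint is from the longest first-order acoustic path in the arena. In this sketch `max_path_length` (out and back), `sound_speed`, and `chirp_duration_s` are assumed inputs in consistent units; none of them are env parameters named in these notes.

```python
import math


def max_echo_return_ticks(max_path_length, sound_speed,
                          tick_dt=1.0 / 60.0, chirp_duration_s=0.0):
    """Ticks after chirp start by which every first-order echo has arrived."""
    # The last slice leaves chirp_duration_s after the first one.
    max_return_s = max_path_length / sound_speed + chirp_duration_s
    # The small epsilon keeps exact multiples from rounding up a whole tick.
    return math.ceil(max_return_s / tick_dt - 1e-9)
```

Setting `chirp_cooldown_ticks` to at least this value keeps one chirp's echoes from overlapping the next chirp's.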

## Performance Constraints

The target is high SPS. Avoid raw waveform buffers, convolution, and per-step
FFT.

Use:

- a fixed upper bound on active chirps
- a fixed upper bound on echo events
- static reflector precomputation after reset
- direct frequency-bin deposition
- simple geometric attenuation and ear gain
- first-order reflections only

The expected work per tick should stay near:

```text
active_chirps * chirp_slices * reflectors * ears
```

With small constants, this remains cheap C code and should preserve the spirit
of the current native PufferLib env.

## Implementation Direction

The next implementation should replace current range/Doppler observation
generation with an event queue.

Suggested data structures:

- `ChirpEvent`: emitted chirp metadata, birth time, origin, frequency sweep
- `Reflector`: position, velocity, strength, normal or type
- `EchoEvent`: receive time, ear, frequency, intensity
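
For illustration, the three structures could be laid out as below. This is a Python sketch of the fields listed above, not the env's native structs; the field names, the `is_bug` flag standing in for "type", and the linear frequency sweep in `freq_at` are all assumptions.

```python
from dataclasses import dataclass
from typing import Tuple

Vec2 = Tuple[float, float]


@dataclass
class ChirpEvent:
    birth_time: float
    origin: Vec2
    start_freq_norm: float
    end_freq_norm: float
    duration: float

    def freq_at(self, t_since_birth: float) -> float:
        """Emitted frequency, assuming a linear sweep clamped to the chirp."""
        if self.duration <= 0.0:
            return self.end_freq_norm
        u = min(max(t_since_birth / self.duration, 0.0), 1.0)
        return self.start_freq_norm + u * (self.end_freq_norm - self.start_freq_norm)


@dataclass
class Reflector:
    position: Vec2
    velocity: Vec2
    strength: float
    is_bug: bool = False  # static walls/obstacles vs. the moving bug


@dataclass
class EchoEvent:
    receive_time: float
    ear: int  # 0 = left, 1 = right
    freq_norm: float
    intensity: float
    chirp_id: int = -1  # optional, for debugging
```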

Suggested tests:

- no chirp produces no echo energy beyond noise
- single static reflector returns at expected two-way travel time
- left and right ears receive slightly different timings/intensities off-axis
- two chirp slices and two reflectors produce the expected overlapping bin
pattern
- moving bug shifts frequency in the expected Doppler direction
- cooldown prevents ambiguous overlapping chirps in the initial curriculum
- bug echo progress reward only fires when the echo-derived bug path is shorter
than the previous bug echo path
- static echoes never receive bug echo progress reward

## Non-Goals for the Next Rung

Do not add raw audio synthesis yet.

Do not add an actual FFT dependency yet.

Do not add full wave acoustics.

Do not add multi-bounce reverberation yet.

Do not expose direct range bins if the goal is to force temporal echolocation.