Update tests by vcharraut · Pull Request #462 · Emerge-Lab/PufferDrive

vcharraut · 2026-05-30T23:31:34Z

Restructure test suite & consolidate CI workflows

Summary

Reorganizes the test suite into a clear unit_tests/ vs smoke_tests/ split, collapses the fragmented CI workflows into a single ci.yml, and adds a deterministic CPU smoke test for the training pipeline. Net: −685 / +473 lines, far fewer workflow files, and tests that pass from any working directory.

Test layout

tests/
├── unit_tests/          # fast, isolated
│   ├── test_drive_config.py        (moved from tests/)
│   ├── test_drive_map_types.py     (moved)
│   ├── test_eval_manager.py        (moved)
│   ├── test_map_cache.py           (moved)
│   ├── test_single_agent_yaml.py   (moved)
│   ├── test_geometry.py            (moved from pufferlib/ocean/benchmark/)
│   ├── test_map_metrics.py         (moved from benchmark/)
│   ├── test_road_edges.py          (moved from benchmark/)
│   └── test_ttc.py                 (moved from benchmark/)
└── smoke_tests/         # end-to-end pipeline
    ├── test_drive_train.py         (new: deterministic golden-based smoke)
    ├── test_drive_eval.py          (new: stub, "to fill")
    ├── test_validation_replay_html.py (moved)
    ├── test_simulator_perf.py      (moved)
    └── data/drive_smoke_golden.json (committed golden)

Moved the geometry/metric tests out of pufferlib/ocean/benchmark/ into tests/unit_tests/ so all tests live under tests/.
New test_drive_train.py — runs the real load_config → load_env → load_policy → PuffeRL pipeline for 5 CPU epochs and compares PPO + env metrics against a committed golden (np.isclose, tunable via SMOKE_RTOL/SMOKE_ATOL). Replaces the old tests/test_drive_train.py.
Path fixes: every relocated test computed repo-root by walking up a fixed number of levels from __file__; bumped each by one level so fixtures (pufferlib/resources/drive/binaries, launcher YAML, golden) resolve from the new depth.
test_simulator_perf.py: changed the hardcoded cwd-relative map_dir="resources/drive/binaries" to an absolute __file__-based path pointing at binaries/carla (top-level dir holds no .bin files since maps were split into subdirs). It now runs under bare pytest from the repo root.
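
The golden comparison described for test_drive_train.py can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the helper name `compare_to_golden`, the metric names, and the default tolerance values are all assumptions; only `np.isclose` and the `SMOKE_RTOL`/`SMOKE_ATOL` variable names come from the description above.

```python
import json
import os

import numpy as np

# Tolerances are read from the environment, as the PR describes; the default
# values used here are assumptions made for this sketch.
RTOL = float(os.environ.get("SMOKE_RTOL", "1e-5"))
ATOL = float(os.environ.get("SMOKE_ATOL", "1e-8"))


def compare_to_golden(metrics, golden, rtol=RTOL, atol=ATOL):
    """Return a list of readable mismatches (empty when every metric matches)."""
    mismatches = []
    for key, expected in golden.items():
        if key not in metrics:
            mismatches.append(f"{key}: missing from run")
            continue
        actual = metrics[key]
        if not np.isclose(actual, expected, rtol=rtol, atol=atol):
            mismatches.append(f"{key}: got {actual!r}, expected {expected!r}")
    return mismatches


# Hypothetical metric names and values, for illustration only.
golden = json.loads('{"policy_loss": 0.125, "episode_return": 3.5}')
metrics = {"policy_loss": 0.125, "episode_return": 3.5000001}
assert compare_to_golden(metrics, golden, rtol=1e-5, atol=1e-8) == []
```

Collecting all mismatches before failing, rather than asserting on the first one, tends to make a drifted golden easier to diagnose from a single CI log.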

CI workflows

Replaced 5 overlapping workflows (utest.yml, smoke-train-test.yml, train-ci.yml, perf-ci.yml, training-test.yml) and the dead reusable _cleanup.yml with a single ci.yml containing three parallel jobs:

| Job | Runs |
|---|---|
| `unit-tests` | `pytest tests/unit_tests` |
| `smoke-tests` | `test_drive_train.py` + `test_validation_replay_html.py` |
| `perf-tests` | `test_simulator_perf.py` |

Dropped stale tests/ini_parser/ C tests and the deleted test_drive_scenario_length.py from CI.
Removed df -h debug noise; bumped actions/* to v4/v5; TMPDIR/PIP_NO_CACHE_DIR scoped to the install step (runner.temp isn't valid in job-level env).
install.yml slimmed to a pure cross-platform install matrix (dropped its duplicate pre-commit job, now py3.12/3.11/3.10).
pre-commit.yml bumped to Python 3.13.

Known trade-off / follow-ups

The three CI jobs each repeat pip install -e . + build_ext (no shared filesystem across jobs). Can be cut to one build via a build job + artifact if desired.
tests/smoke_tests/test_drive_eval.py is a stub ("to fill").

🤖 Generated with Claude Code

Copilot

Pull request overview

This PR reorganizes and substantially expands the test suite. New unit tests cover interaction features (TTC/geometry), map-metric road-edge distances, the EvalManager config/dispatch/aggregation logic, and Python load_config behavior. Smoke coverage is moved under tests/smoke_tests/ and a deterministic CPU smoke training test with a committed golden file is added. CI workflows are consolidated into ci.yml plus a separate perf-ci.yml, and the old C INI parser harness and several legacy workflows/tests are removed. Path resolution in existing tests is corrected for the new tests/unit_tests/ depth.

Changes:

New unit tests for benchmark interaction/map metrics and EvalManager; new smoke training test with golden file; new simulator perf smoke.
Workflow consolidation (ci.yml, simplified install.yml/perf-ci.yml) and removal of utest.yml, train-ci.yml, training-test.yml, _cleanup.yml.
Removal of legacy tests (test_drive_train.py, test_drive_scenario_length.py) and the C INI parser test harness; parents[1]/two-dirname REPO_ROOT bumped to the new depth.
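
The REPO_ROOT depth bump can be illustrated with a small sketch. The `/repo` prefix is a hypothetical checkout location, and `PurePosixPath` is used only so the example behaves the same on every platform; the actual tests presumably derive the path from `__file__`.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical checkout location; only the relative layout matters here.
old_location = PurePosixPath("/repo/tests/test_map_cache.py")
new_location = PurePosixPath("/repo/tests/unit_tests/test_map_cache.py")

# Before the move, parents[1] was the repository root.
assert old_location.parents[1] == PurePosixPath("/repo")

# After the move, the same index stops one level short, at tests/.
assert new_location.parents[1] == PurePosixPath("/repo/tests")

# Bumping the index by one restores the repository root.
assert new_location.parents[2] == PurePosixPath("/repo")

# The dirname-based variant needs one extra call for the same reason.
d = posixpath.dirname
assert d(d(str(old_location))) == "/repo"
assert d(d(d(str(new_location)))) == "/repo"

# In a real test module this would typically read:
# REPO_ROOT = Path(__file__).resolve().parents[2]
```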

Reviewed changes

Copilot reviewed 20 out of 26 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| tests/unit_tests/test_ttc.py | New TTC tests using 3-timestep central-difference setup |
| tests/unit_tests/test_geometry.py | New box signed-distance + invalid/multi-rollout tests |
| tests/unit_tests/test_map_metrics.py | New road-edge signed distance and full-pipeline tests |
| tests/unit_tests/test_road_edges.py | Manual visualization script without pytest test functions |
| tests/unit_tests/test_eval_manager.py | Extensive EvalManager parsing/dispatch/rollout/render tests |
| tests/unit_tests/test_drive_config.py | Python `load_config` tests (with inline-comment test skipped) |
| tests/unit_tests/test_single_agent_yaml.py | REPO_ROOT updated for new path depth |
| tests/unit_tests/test_map_cache.py | REPO_ROOT updated for new path depth |
| tests/unit_tests/test_drive_map_types.py | REPO_ROOT updated for new path depth |
| tests/smoke_tests/test_drive_train.py | New deterministic smoke training test with golden comparison |
| tests/smoke_tests/test_simulator_perf.py | New CI perf smoke (passes wrong `map_dir`) |
| tests/smoke_tests/test_validation_replay_html.py | REPO_ROOT updated for new path depth |
| tests/smoke_tests/data/drive_smoke_golden.json | Committed golden metrics for the smoke train test |
| tests/test_drive_train.py / tests/test_drive_scenario_length.py | Removed in favor of new smoke/map-type tests |
| tests/ini_parser/* | C INI parser test harness removed |
| .github/workflows/ci.yml | New consolidated unit + smoke job |
| .github/workflows/perf-ci.yml | Reworked to run new perf smoke |
| .github/workflows/install.yml | Simplified install matrix; pre-commit job dropped |
| .github/workflows/{utest,train-ci,training-test,_cleanup}.yml | Removed |

Comments suppressed due to low confidence (2)

tests/smoke_tests/test_simulator_perf.py:17

The map_dir here points to resources/drive/binaries, but (a) the path is missing the pufferlib/ prefix that the bundled fixtures live under, and (b) that directory contains only sub-directories (carla/, nuplan/, …), not .bin files. Drive.__init__ resolves map_dir with os.listdir(...) filtering on .endswith(".bin"), so self.map_files will be empty and the constructor will fail (or the run will produce no maps). The perf CI job runs this file directly and will fail. Point map_dir at a concrete fixture directory such as pufferlib/resources/drive/binaries/carla.

tests/smoke_tests/test_simulator_perf.py:5

json, warnings, and Path are imported but never used. They can be removed to keep the smoke test minimal.
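
The map_dir problem raised in the first suppressed comment can be reproduced in isolation. The sketch below is not the Drive constructor; it only mimics the `os.listdir` plus `.endswith(".bin")` filter that the comment describes, using a temporary directory and a hypothetical file name (`map_000.bin`).

```python
import os
import tempfile


def list_bin_files(map_dir):
    """Mimic the filter described above: keep only entries ending in .bin."""
    return sorted(f for f in os.listdir(map_dir) if f.endswith(".bin"))


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: maps live in subdirectories, none at the top level.
    carla_dir = os.path.join(root, "carla")
    os.makedirs(carla_dir)
    os.makedirs(os.path.join(root, "nuplan"))
    open(os.path.join(carla_dir, "map_000.bin"), "wb").close()

    top_level = list_bin_files(root)      # os.listdir does not recurse
    in_carla = list_bin_files(carla_dir)  # a concrete fixture directory

assert top_level == []
assert in_carla == ["map_000.bin"]
```

Because `os.listdir` is not recursive, pointing at the parent directory yields an empty map list, which is consistent with the suggested fix of targeting a concrete subdirectory such as carla.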

vcharraut added 2 commits May 31, 2026 01:23

Update tests

d6b8c93

Fix tests

35dff0e

Copilot AI review requested due to automatic review settings May 30, 2026 23:31

Copilot started reviewing on behalf of vcharraut May 30, 2026 23:31

vcharraut added 2 commits May 31, 2026 01:34

Update CI workflows and test simulator configuration for improved performance

fdc0f3d

Consolidate performance tests into CI workflow and remove separate perf-ci.yml file

fe25a6f

Copilot AI reviewed May 30, 2026

Comment thread tests/smoke_tests/test_drive_train.py Outdated

vcharraut and others added 6 commits May 31, 2026 01:38

Update Python version matrix in install workflow

daf80c2

Potential fix for pull request finding

4ae27dc

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Update Python version in pre-commit workflow to 3.13

fa04603

Reorder push and pull_request triggers in installation workflow

58bd6c9

Add dummy CPU smoke test for PufferDrive evaluation pipeline

37a1c7f

Update drive smoke test data and adjust training configuration parameters

08d0cee

vcharraut wants to merge 10 commits into emerge/temp_training from vcha/smoke-tests
