Update tests #462
Pull request overview
This PR reorganizes and substantially expands the test suite. New unit tests cover interaction features (TTC/geometry), map-metric road-edge distances, the EvalManager config/dispatch/aggregation logic, and Python load_config behavior. Smoke coverage is moved under tests/smoke_tests/ and a deterministic CPU smoke training test with a committed golden file is added. CI workflows are consolidated into ci.yml plus a separate perf-ci.yml, and the old C INI parser harness and several legacy workflows/tests are removed. Path resolution in existing tests is corrected for the new tests/unit_tests/ depth.
Changes:
- New unit tests for benchmark interaction/map metrics and EvalManager; new smoke training test with golden file; new simulator perf smoke.
- Workflow consolidation (`ci.yml`, simplified `install.yml`/`perf-ci.yml`) and removal of `utest.yml`, `train-ci.yml`, `training-test.yml`, `_cleanup.yml`.
- Removal of legacy tests (`test_drive_train.py`, `test_drive_scenario_length.py`) and the C INI parser test harness; `parents[1]`/two-`dirname` `REPO_ROOT` bumped to the new depth.
Reviewed changes
Copilot reviewed 20 out of 26 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/unit_tests/test_ttc.py | New TTC tests using 3-timestep central-difference setup |
| tests/unit_tests/test_geometry.py | New box signed-distance + invalid/multi-rollout tests |
| tests/unit_tests/test_map_metrics.py | New road-edge signed distance and full-pipeline tests |
| tests/unit_tests/test_road_edges.py | Manual visualization script without pytest test functions |
| tests/unit_tests/test_eval_manager.py | Extensive EvalManager parsing/dispatch/rollout/render tests |
| tests/unit_tests/test_drive_config.py | Python load_config tests (with inline-comment test skipped) |
| tests/unit_tests/test_single_agent_yaml.py | REPO_ROOT updated for new path depth |
| tests/unit_tests/test_map_cache.py | REPO_ROOT updated for new path depth |
| tests/unit_tests/test_drive_map_types.py | REPO_ROOT updated for new path depth |
| tests/smoke_tests/test_drive_train.py | New deterministic smoke training test with golden comparison |
| tests/smoke_tests/test_simulator_perf.py | New CI perf smoke (passes wrong map_dir) |
| tests/smoke_tests/test_validation_replay_html.py | REPO_ROOT updated for new path depth |
| tests/smoke_tests/data/drive_smoke_golden.json | Committed golden metrics for the smoke train test |
| tests/test_drive_train.py / tests/test_drive_scenario_length.py | Removed in favor of new smoke/map-type tests |
| tests/ini_parser/* | C INI parser test harness removed |
| .github/workflows/ci.yml | New consolidated unit + smoke job |
| .github/workflows/perf-ci.yml | Reworked to run new perf smoke |
| .github/workflows/install.yml | Simplified install matrix; pre-commit job dropped |
| .github/workflows/{utest,train-ci,training-test,_cleanup}.yml | Removed |
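The "REPO_ROOT updated for new path depth" rows come down to one extra parent hop. The sketch below illustrates the idea with a hypothetical helper, `repo_root`, and a made-up `/repo` prefix; the actual tests may compute the root differently (for example with nested `os.path.dirname` calls).

```python
from pathlib import PurePosixPath


def repo_root(test_file: str, depth: int) -> PurePosixPath:
    """Return the ancestor `depth` levels above the directory containing test_file.

    Hypothetical helper for illustration. A real test would more likely write
    Path(__file__).resolve().parents[depth] directly.
    """
    return PurePosixPath(test_file).parents[depth]


# Before the move: <repo>/tests/test_map_cache.py -> parents[1] is the repo root.
old_root = repo_root("/repo/tests/test_map_cache.py", 1)

# After the move: <repo>/tests/unit_tests/test_map_cache.py -> parents[2] is the repo root.
new_root = repo_root("/repo/tests/unit_tests/test_map_cache.py", 2)

# Leaving the index at 1 after the move would silently point at <repo>/tests instead.
stale_root = repo_root("/repo/tests/unit_tests/test_map_cache.py", 1)
```

Anchoring on the file's own location rather than the working directory is what lets the tests resolve fixtures regardless of where `pytest` is invoked.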
Comments suppressed due to low confidence (2)
tests/smoke_tests/test_simulator_perf.py:17
- The `map_dir` here points to `resources/drive/binaries`, but (a) the path is missing the `pufferlib/` prefix that the bundled fixtures live under, and (b) that directory contains only sub-directories (`carla/`, `nuplan/`, …), not `.bin` files. `Drive.__init__` resolves `map_dir` with `os.listdir(...)` filtering on `.endswith(".bin")`, so `self.map_files` will be empty and the constructor will fail (or the run will produce no maps). The perf CI job runs this file directly and will fail. Point `map_dir` at a concrete fixture directory such as `pufferlib/resources/drive/binaries/carla`.
tests/smoke_tests/test_simulator_perf.py:5
- `json`, `warnings`, and `Path` are imported but never used. They can be removed to keep the smoke test minimal.
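The `map_dir` failure mode in the first suppressed comment can be reproduced in isolation. This is a minimal sketch, not the actual `Drive.__init__` code: `list_bin_maps` is a hypothetical stand-in for the `os.listdir(...)` + `.endswith(".bin")` filtering the comment describes, and `town01.bin` is a made-up file name.

```python
import os
import tempfile


def list_bin_maps(map_dir: str) -> list[str]:
    """Hypothetical stand-in: list only the .bin files directly inside map_dir."""
    return sorted(name for name in os.listdir(map_dir) if name.endswith(".bin"))


with tempfile.TemporaryDirectory() as binaries:
    # Mimic the layout from the review: the top level holds only sub-directories.
    carla_dir = os.path.join(binaries, "carla")
    os.makedirs(carla_dir)
    open(os.path.join(carla_dir, "town01.bin"), "wb").close()

    # os.listdir is not recursive, so the top-level directory yields no maps.
    top_level_maps = list_bin_maps(binaries)
    # Pointing at the concrete fixture directory finds the map.
    carla_maps = list_bin_maps(carla_dir)
```

Because the listing is not recursive, pointing `map_dir` at the parent of `carla/` yields an empty list even though maps exist one level down.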
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Restructure test suite & consolidate CI workflows
Summary
Reorganizes the test suite into a clear `unit_tests/` vs `smoke_tests/` split, collapses the fragmented CI workflows into a single `ci.yml`, and adds a deterministic CPU smoke test for the training pipeline. Net: −685 / +473 lines, far fewer workflow files, and tests that pass from any working directory.
Test layout
- Moved the tests from `pufferlib/ocean/benchmark/` into `tests/unit_tests/` so all tests live under `tests/`.
- `test_drive_train.py`: runs the real `load_config → load_env → load_policy → PuffeRL` pipeline for 5 CPU epochs and compares PPO + env metrics against a committed golden (`np.isclose`, tunable via `SMOKE_RTOL`/`SMOKE_ATOL`). Replaces the old `tests/test_drive_train.py`.
- Several tests derive `REPO_ROOT` from `__file__`; bumped each by one level so fixtures (`pufferlib/resources/drive/binaries`, launcher YAML, golden) resolve from the new depth.
- `test_simulator_perf.py`: changed the hardcoded cwd-relative `map_dir="resources/drive/binaries"` to an absolute `__file__`-based path pointing at `binaries/carla` (the top-level dir holds no `.bin` files since maps were split into subdirs). It now runs under bare `pytest` from the repo root.
CI workflows
Replaced 5 overlapping workflows (`utest.yml`, `smoke-train-test.yml`, `train-ci.yml`, `perf-ci.yml`, `training-test.yml`) and the dead reusable `_cleanup.yml` with a single `ci.yml` containing three parallel jobs:
| Job | Runs |
|---|---|
| `unit-tests` | `pytest tests/unit_tests` |
| `smoke-tests` | `test_drive_train.py` + `test_validation_replay_html.py` |
| `perf-tests` | `test_simulator_perf.py` |
- Removed the `tests/ini_parser/` C tests and the deleted `test_drive_scenario_length.py` from CI.
- Removed `df -h` debug noise; bumped `actions/*` to v4/v5; `TMPDIR`/`PIP_NO_CACHE_DIR` scoped to the install step (`runner.temp` isn't valid in job-level `env`).
- `install.yml` slimmed to a pure cross-platform install matrix (dropped its duplicate pre-commit job, now py3.12/3.11/3.10).
- `pre-commit.yml` bumped to Python 3.13.
Known trade-off / follow-ups
- Each job runs its own `pip install -e .` + `build_ext` (no shared filesystem across jobs). Can be cut to one build via a `build` job + artifact if desired.
- `tests/smoke_tests/test_drive_eval.py` is a stub ("to fill").
🤖 Generated with Claude Code
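For reviewers unfamiliar with the golden-file comparison described under Test layout, the check could look roughly like the sketch below. Everything here is an assumption for illustration: `compare_to_golden` is a hypothetical helper, the metric names are made up, and the fallback tolerances are simply NumPy's `np.isclose` defaults, which may not match what the real test uses.

```python
import os

import numpy as np


def compare_to_golden(metrics: dict, golden: dict, rtol: float, atol: float) -> list[str]:
    """Return the names of golden metrics that are missing or out of tolerance."""
    mismatches = []
    for name, expected in golden.items():
        actual = metrics.get(name)
        if actual is None or not np.isclose(actual, expected, rtol=rtol, atol=atol):
            mismatches.append(name)
    return mismatches


# Tolerances overridable from the environment, mirroring SMOKE_RTOL / SMOKE_ATOL.
# The fallback values are an assumption (np.isclose defaults).
rtol = float(os.environ.get("SMOKE_RTOL", "1e-5"))
atol = float(os.environ.get("SMOKE_ATOL", "1e-8"))

# Hypothetical metric names and values, not taken from drive_smoke_golden.json.
golden = {"policy_loss": 0.125, "entropy": 1.5}
close_run = {"policy_loss": 0.125 + 1e-9, "entropy": 1.5}
drifted_run = {"policy_loss": 0.2}

close_result = compare_to_golden(close_run, golden, rtol=1e-5, atol=1e-8)
drifted_result = compare_to_golden(drifted_run, golden, rtol=1e-5, atol=1e-8)
```

Returning the list of mismatched names, rather than asserting on the first failure, lets a single test failure report every metric that drifted.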