Update tests #462

Open
vcharraut wants to merge 10 commits into emerge/temp_training from vcha/smoke-tests

vcharraut (Collaborator) commented May 30, 2026

Restructure test suite & consolidate CI workflows

Summary

Reorganizes the test suite into a clear unit_tests/ vs smoke_tests/ split, collapses the fragmented CI workflows into a single ci.yml, and adds a deterministic CPU smoke test for the training pipeline. Net: −685 / +473 lines, six workflow files replaced by one, and tests that pass from any working directory.

Test layout

tests/
├── unit_tests/          # fast, isolated
│   ├── test_drive_config.py        (moved from tests/)
│   ├── test_drive_map_types.py     (moved)
│   ├── test_eval_manager.py        (moved)
│   ├── test_map_cache.py           (moved)
│   ├── test_single_agent_yaml.py   (moved)
│   ├── test_geometry.py            (moved from pufferlib/ocean/benchmark/)
│   ├── test_map_metrics.py         (moved from benchmark/)
│   ├── test_road_edges.py          (moved from benchmark/)
│   └── test_ttc.py                 (moved from benchmark/)
└── smoke_tests/         # end-to-end pipeline
    ├── test_drive_train.py         (new: deterministic golden-based smoke)
    ├── test_drive_eval.py          (new: stub, "to fill")
    ├── test_validation_replay_html.py (moved)
    ├── test_simulator_perf.py      (moved)
    └── data/drive_smoke_golden.json (committed golden)
  • Moved the geometry/metric tests out of pufferlib/ocean/benchmark/ into tests/unit_tests/ so all tests live under tests/.
  • New test_drive_train.py — runs the real load_config → load_env → load_policy → PuffeRL pipeline for 5 CPU epochs and compares PPO + env metrics against a committed golden (np.isclose, tunable via SMOKE_RTOL/SMOKE_ATOL). Replaces the old tests/test_drive_train.py.
  • Path fixes: every relocated test computed repo-root by walking up a fixed number of levels from __file__; bumped each by one level so fixtures (pufferlib/resources/drive/binaries, launcher YAML, golden) resolve from the new depth.
  • test_simulator_perf.py: changed the hardcoded cwd-relative map_dir="resources/drive/binaries" to an absolute __file__-based path pointing at binaries/carla (top-level dir holds no .bin files since maps were split into subdirs). It now runs under bare pytest from the repo root.
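The golden comparison described for test_drive_train.py can be sketched roughly as follows. This is a minimal illustration, not the test's actual code: the helper name `diff_against_golden`, the metric names, and the default tolerance values are assumptions. Only the use of np.isclose and the SMOKE_RTOL/SMOKE_ATOL environment variables comes from the description above.

```python
import json
import os

import numpy as np

# Tolerances come from the environment, mirroring the SMOKE_RTOL / SMOKE_ATOL
# knobs named in the PR. The default values here are assumptions.
RTOL = float(os.environ.get("SMOKE_RTOL", "1e-5"))
ATOL = float(os.environ.get("SMOKE_ATOL", "1e-8"))


def diff_against_golden(metrics, golden, rtol=RTOL, atol=ATOL):
    """Return a list of readable mismatches; an empty list means the run matches."""
    mismatches = []
    for key, expected in golden.items():
        if key not in metrics:
            mismatches.append(f"{key}: missing from run")
            continue
        actual = metrics[key]
        if not np.isclose(actual, expected, rtol=rtol, atol=atol):
            mismatches.append(f"{key}: got {actual!r}, expected {expected!r}")
    return mismatches


# Hypothetical metric names and values, for illustration only.
golden = json.loads('{"policy_loss": 0.125, "entropy": 1.5}')
run = {"policy_loss": 0.125, "entropy": 1.5000001}
print(diff_against_golden(run, golden, rtol=1e-5, atol=1e-8))  # []
```

Collecting every mismatch before failing, rather than asserting on the first one, tends to make a golden regression easier to read when several metrics drift at once.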

CI workflows

Replaced 5 overlapping workflows (utest.yml, smoke-train-test.yml, train-ci.yml, perf-ci.yml, training-test.yml) and the dead reusable _cleanup.yml with a single ci.yml containing three parallel jobs:

| Job | Runs |
| --- | --- |
| unit-tests | pytest tests/unit_tests |
| smoke-tests | test_drive_train.py + test_validation_replay_html.py |
| perf-tests | test_simulator_perf.py |

  • Dropped stale tests/ini_parser/ C tests and the deleted test_drive_scenario_length.py from CI.
  • Removed df -h debug noise; bumped actions/* to v4/v5; TMPDIR/PIP_NO_CACHE_DIR scoped to the install step (runner.temp isn't valid in job-level env).
  • install.yml slimmed to a pure cross-platform install matrix (dropped its duplicate pre-commit job, now py3.12/3.11/3.10).
  • pre-commit.yml bumped to Python 3.13.

Known trade-off / follow-ups

  • The three CI jobs each repeat pip install -e . + build_ext (no shared filesystem across jobs). Can be cut to one build via a build job + artifact if desired.
  • tests/smoke_tests/test_drive_eval.py is a stub ("to fill").

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 30, 2026 23:31

Copilot AI left a comment

Pull request overview

This PR reorganizes and substantially expands the test suite. New unit tests cover interaction features (TTC/geometry), map-metric road-edge distances, the EvalManager config/dispatch/aggregation logic, and Python load_config behavior. Smoke coverage is moved under tests/smoke_tests/ and a deterministic CPU smoke training test with a committed golden file is added. CI workflows are consolidated into ci.yml plus a separate perf-ci.yml, and the old C INI parser harness and several legacy workflows/tests are removed. Path resolution in existing tests is corrected for the new tests/unit_tests/ depth.

Changes:

  • New unit tests for benchmark interaction/map metrics and EvalManager; new smoke training test with golden file; new simulator perf smoke.
  • Workflow consolidation (ci.yml, simplified install.yml/perf-ci.yml) and removal of utest.yml, train-ci.yml, training-test.yml, _cleanup.yml.
  • Removal of legacy tests (test_drive_train.py, test_drive_scenario_length.py) and the C INI parser test harness; parents[1]/two-dirname REPO_ROOT bumped to the new depth.
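To illustrate why the parents[...] index has to grow by one when a test moves one directory deeper, here is a small sketch. The helper `repo_root` and the `/repo` prefix are hypothetical. A real test would typically start from `Path(__file__).resolve()`; `PurePosixPath` is used here only so the example runs without touching the filesystem.

```python
from pathlib import PurePosixPath


def repo_root(test_file, levels_up):
    """Walk up a fixed number of directories from a test file's location."""
    # parents[0] is the file's own directory, parents[1] its parent, and so on.
    return PurePosixPath(test_file).parents[levels_up]


# Before the move: <repo>/tests/test_map_cache.py, so parents[1] is the repo root.
old = repo_root("/repo/tests/test_map_cache.py", 1)

# After the move: one level deeper, so the index must be bumped to 2.
new = repo_root("/repo/tests/unit_tests/test_map_cache.py", 2)

# Leaving the index at 1 silently resolves to tests/ instead of the repo root.
stale = repo_root("/repo/tests/unit_tests/test_map_cache.py", 1)

print(old, new, stale)  # /repo /repo /repo/tests

# Fixture paths are then built from the root rather than from the cwd.
fixture = new / "pufferlib" / "resources" / "drive" / "binaries" / "carla"
print(fixture)  # /repo/pufferlib/resources/drive/binaries/carla
```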

Reviewed changes

Copilot reviewed 20 out of 26 changed files in this pull request and generated 1 comment.

Show a summary per file
| File | Description |
| --- | --- |
| tests/unit_tests/test_ttc.py | New TTC tests using 3-timestep central-difference setup |
| tests/unit_tests/test_geometry.py | New box signed-distance + invalid/multi-rollout tests |
| tests/unit_tests/test_map_metrics.py | New road-edge signed distance and full-pipeline tests |
| tests/unit_tests/test_road_edges.py | Manual visualization script without pytest test functions |
| tests/unit_tests/test_eval_manager.py | Extensive EvalManager parsing/dispatch/rollout/render tests |
| tests/unit_tests/test_drive_config.py | Python load_config tests (with inline-comment test skipped) |
| tests/unit_tests/test_single_agent_yaml.py | REPO_ROOT updated for new path depth |
| tests/unit_tests/test_map_cache.py | REPO_ROOT updated for new path depth |
| tests/unit_tests/test_drive_map_types.py | REPO_ROOT updated for new path depth |
| tests/smoke_tests/test_drive_train.py | New deterministic smoke training test with golden comparison |
| tests/smoke_tests/test_simulator_perf.py | New CI perf smoke (passes wrong map_dir) |
| tests/smoke_tests/test_validation_replay_html.py | REPO_ROOT updated for new path depth |
| tests/smoke_tests/data/drive_smoke_golden.json | Committed golden metrics for the smoke train test |
| tests/test_drive_train.py / tests/test_drive_scenario_length.py | Removed in favor of new smoke/map-type tests |
| tests/ini_parser/* | C INI parser test harness removed |
| .github/workflows/ci.yml | New consolidated unit + smoke job |
| .github/workflows/perf-ci.yml | Reworked to run new perf smoke |
| .github/workflows/install.yml | Simplified install matrix; pre-commit job dropped |
| .github/workflows/{utest,train-ci,training-test,_cleanup}.yml | Removed |

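The test_ttc.py entry mentions a 3-timestep central-difference setup. As a rough illustration of that idea only, the sketch below estimates velocity at the middle of three samples and derives a simple 1-D time-to-collision from it. The function names, the scenario, and the point-mass TTC definition are all assumptions; the repository's actual TTC metric is not shown in this PR and may differ.

```python
import numpy as np


def central_difference_velocity(positions, dt):
    """Velocity at the middle of three samples: (x[t+1] - x[t-1]) / (2 * dt)."""
    positions = np.asarray(positions, dtype=float)
    return (positions[2] - positions[0]) / (2.0 * dt)


def time_to_collision(gap, closing_speed):
    """Gap divided by closing speed; infinite when the gap is not shrinking."""
    return gap / closing_speed if closing_speed > 0 else float("inf")


# Hypothetical 1-D scenario: x-positions at three timesteps, dt = 0.5 s.
dt = 0.5
follower = [0.0, 5.0, 10.0]   # moving at 10 m/s
leader = [30.0, 32.5, 35.0]   # moving at 5 m/s

v_follower = central_difference_velocity(follower, dt)
v_leader = central_difference_velocity(leader, dt)

# Evaluate the gap at the middle timestep, where the velocity estimate applies.
gap = leader[1] - follower[1]
ttc = time_to_collision(gap, v_follower - v_leader)
print(ttc)  # 5.5
```

Three timesteps is the minimum needed for a central difference, which is presumably why the tests are built around that window.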
Comments suppressed due to low confidence (2)

tests/smoke_tests/test_simulator_perf.py:17

  • The map_dir here points to resources/drive/binaries, but (a) the path is missing the pufferlib/ prefix that the bundled fixtures live under, and (b) that directory contains only sub-directories (carla/, nuplan/, …), not .bin files. Drive.__init__ resolves map_dir with os.listdir(...) filtering on .endswith(".bin"), so self.map_files will be empty and the constructor will fail (or the run will produce no maps). The perf CI job runs this file directly and will fail. Point map_dir at a concrete fixture directory such as pufferlib/resources/drive/binaries/carla.

tests/smoke_tests/test_simulator_perf.py:5

  • json, warnings, and Path are imported but never used. They can be removed to keep the smoke test minimal.
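The map_dir failure mode described in the first suppressed comment can be reproduced in isolation. The sketch below only mimics the os.listdir plus .endswith(".bin") filter that the comment attributes to Drive.__init__; the helper `list_bin_files`, the temporary directory layout, and the file name `town01.bin` are hypothetical.

```python
import os
import tempfile


def list_bin_files(map_dir):
    """Non-recursive scan: only .bin entries directly inside map_dir are returned."""
    return sorted(f for f in os.listdir(map_dir) if f.endswith(".bin"))


with tempfile.TemporaryDirectory() as root:
    binaries = os.path.join(root, "binaries")
    carla = os.path.join(binaries, "carla")
    os.makedirs(carla)
    # Hypothetical map file, placed in the subdirectory as after the map split.
    open(os.path.join(carla, "town01.bin"), "wb").close()

    top_level = list_bin_files(binaries)  # sees only the carla/ directory entry
    subdir = list_bin_files(carla)        # sees the actual map file

print(top_level, subdir)  # [] ['town01.bin']
```

Because os.listdir does not descend into subdirectories, pointing map_dir at the parent directory yields an empty list, which matches the reviewer's suggestion to target a concrete subdirectory such as carla.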

Comment thread tests/smoke_tests/test_drive_train.py Outdated