WIP: drive: add goal_mode and goal_on_lane knobs #463

Open. eugenevinitsky wants to merge 1 commit into emerge/temp_training from ev/goal-terminate-mode

@eugenevinitsky
Summary

Two orthogonal knobs covering goal placement and episode-end semantics on goal reach. Both default to current behavior — no existing run changes.

| knob | values | effect |
| --- | --- | --- |
| `goal_on_lane` | `True` (default) / `False` | `True`: existing route-based placement (on lane, in front of agent). `False`: each goal at a uniformly random drivable point anywhere on the map via the new `pick_random_drivable_position` helper. |
| `goal_mode` | `"continue"` (default) / `"terminate"` | `continue`: existing behavior; reaching a goal advances `current_goal_idx`. `terminate`: reaching the goal sets `terminals[i]=1` (no truncation flag, so PPO does not bootstrap V); the env then calls `add_log` + `c_reset` to advance to the next scenario. |

These compose with the existing `target_type` (`static`/`dynamic`), which is unchanged — it still controls only obs format.
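The "no truncation flag, so PPO does not bootstrap V" point can be illustrated with a generic one-step value target. This is a minimal sketch, not PufferLib's actual advantage code; the function name `one_step_target` and its arguments are hypothetical.

```python
def one_step_target(reward, next_value, terminal, truncated, gamma=0.99):
    """One-step value target for a single transition (illustrative only).

    terminal:  the episode truly ended (e.g. goal reached in "terminate"
               mode), so there is no future return to bootstrap from.
    truncated: the episode was cut off (e.g. a time limit), so the value
               of the next state still stands in for the future return.
    """
    if terminal and not truncated:
        # True termination: the target is just the immediate reward.
        return reward
    # Ongoing or truncated episode: bootstrap from V(next_state).
    return reward + gamma * next_value


# Goal reached under goal_mode="terminate": no bootstrap.
print(one_step_target(1.0, 10.0, terminal=True, truncated=False))
# Time-limit cutoff: the next-state value is still used.
print(one_step_target(1.0, 10.0, terminal=False, truncated=True, gamma=0.5))
```

Setting only the terminal flag therefore treats a goal reach as the real end of the task rather than an arbitrary cutoff.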

What's untouched

When `goal_on_lane=True` (default), `compute_goals` falls through past the new branch and runs the existing `compute_new_route` + path-based placement exactly as before. When `goal_mode="continue"` (default), `c_step`'s termination block is identical to today.
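The branch described above might look roughly like the following in Python. This is a sketch under assumptions: the real logic lives in C in `drive.h`, the lane representation (lists of `(x, y)` points) is invented for illustration, and sampling segments in proportion to their length is one way to get a uniform pick along lane centerlines, which may differ from what `pick_random_drivable_position` actually does. `place_goals` and `route_goals_fn` are hypothetical names.

```python
import math
import random


def pick_random_drivable_position(lanes, rng):
    """Sample a point uniformly along the total lane centerline length.

    lanes: list of polylines, each a list of (x, y) tuples (assumed format).
    """
    segments, weights = [], []
    for lane in lanes:
        for (x0, y0), (x1, y1) in zip(lane, lane[1:]):
            length = math.hypot(x1 - x0, y1 - y0)
            if length > 0.0:  # skip degenerate segments
                segments.append(((x0, y0), (x1, y1)))
                weights.append(length)
    if not segments:
        raise ValueError("no drivable segments available")
    # Longer segments are proportionally more likely to be chosen.
    (x0, y0), (x1, y1) = rng.choices(segments, weights=weights, k=1)[0]
    t = rng.random()  # uniform position along the chosen segment
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))


def place_goals(num_goals, lanes, goal_on_lane, route_goals_fn, rng):
    """Scattered branch first; otherwise fall through to route placement."""
    if not goal_on_lane:
        return [pick_random_drivable_position(lanes, rng) for _ in range(num_goals)]
    return route_goals_fn(num_goals)  # stands in for the existing route path
```

With `goal_on_lane=True` the scattered branch is never entered, which is why the existing placement is unaffected by default.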

Why not extend `target_type` to a third value?

An earlier draft had `target_type="terminate"`, but `target_type` is purely an obs-format knob (3-float position vs 5-float position+heading). Stuffing episode-end semantics into it would conflate orthogonal concerns. Splitting them out into `goal_mode` keeps each knob doing one thing.

Files

| file | change |
| --- | --- |
| `pufferlib/ocean/drive/drive.h` | `GOAL_MODE_*` defines, `goal_mode` + `goal_on_lane` in Drive struct, `pick_random_drivable_position` helper, scattered branch in `compute_goals`, `c_step` terminate-on-reach block. |
| `pufferlib/ocean/drive/binding.c` | Unpack both new kwargs. |
| `pufferlib/ocean/env_binding.h` | Export `GOAL_MODE_CONTINUE` / `GOAL_MODE_TERMINATE` constants. |
| `pufferlib/ocean/drive/drive.py` | Validate string values, plumb into `_env_init_kwargs`. |
| `pufferlib/config/ocean/drive.ini` | Defaults under `[env]`. |
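As a rough illustration of the validate-then-plumb step in `drive.py`, the sketch below maps the string knob to an integer constant before it is handed to the C side. The constant values (0 and 1) and the helper name `resolve_goal_knobs` are assumptions for illustration; the actual values come from `env_binding.h`.

```python
# Assumed integer values; the real ones are exported by env_binding.h.
GOAL_MODE_CONTINUE = 0
GOAL_MODE_TERMINATE = 1

_GOAL_MODES = {
    "continue": GOAL_MODE_CONTINUE,
    "terminate": GOAL_MODE_TERMINATE,
}


def resolve_goal_knobs(goal_mode="continue", goal_on_lane=True):
    """Validate both knobs and return the kwargs passed on to the C env."""
    if goal_mode not in _GOAL_MODES:
        raise ValueError(
            f"goal_mode must be one of {sorted(_GOAL_MODES)}, got {goal_mode!r}"
        )
    if not isinstance(goal_on_lane, bool):
        raise ValueError(f"goal_on_lane must be a bool, got {goal_on_lane!r}")
    return {
        "goal_mode": _GOAL_MODES[goal_mode],
        "goal_on_lane": int(goal_on_lane),  # C side receives 0 or 1
    }
```

Failing fast in Python keeps a typo such as `goal_mode="terminal"` from silently reaching the C struct as an unexpected integer.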

Phase-1 scope (worth knowing for review)

  • Single-agent focus. In multi-agent runs the per-agent goal-reach terminal flags work as written (each agent's terminal fires when its own goal is reached), but `add_log` + `c_reset` ends the env-wide episode for everyone on the first reach. Multi-agent semantics (per-agent reset without disrupting others) are Phase 2.
  • Lane rewards stay on in scattered mode. The agent still has a random-walk path from spawn, so `reward_lane_align`/`reward_lane_center` still fire on that arbitrary path. They may conflict with the scattered goal: either the model learns to balance the two, or the lane rewards are overridden to 0 via the launcher yaml.
  • No A* path to scattered goal. The agent navigates freeform to the scattered goal using only the position beacon in its obs. Phase 2 could compute a route from spawn → goal for cleaner lane reward shaping.
  • `goal_radius` applies as-is. When randomization is off, the `drive.ini` value (default 2.0) flows through. For scattered mode you'll likely want a wider radius via a launcher yaml override.
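The Phase-1 multi-agent caveat can be sketched as follows: each agent's terminal flag is set independently, but a single reach resets the whole env. This is an illustrative Python model of the described behavior, not the C code in `c_step`; the function name `goal_reach_step`, the `<=` comparison, and the data layout are assumptions.

```python
import math


def goal_reach_step(positions, goals, goal_radius=2.0, goal_mode="continue"):
    """Return (terminals, reset_env) for one step.

    positions, goals: per-agent (x, y) tuples (assumed layout).
    In "terminate" mode, any agent within goal_radius of its goal gets
    terminals[i] = 1, and the first reach triggers an env-wide reset.
    """
    terminals = [0] * len(positions)
    reached_any = False
    for i, ((x, y), (gx, gy)) in enumerate(zip(positions, goals)):
        if math.hypot(gx - x, gy - y) <= goal_radius:
            reached_any = True
            if goal_mode == "terminate":
                terminals[i] = 1
    # Phase 1: one agent's reach ends the episode for everyone.
    reset_env = goal_mode == "terminate" and reached_any
    return terminals, reset_env


# Agent 0 is within the radius of its goal; agent 1 is far from its own.
print(goal_reach_step([(0.0, 0.0), (10.0, 10.0)],
                      [(1.0, 0.0), (0.0, 0.0)],
                      goal_mode="terminate"))
```

In the example, agent 1 is reset along with agent 0 even though its own terminal flag stays 0, which is the behavior Phase 2 is meant to refine.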

Test plan

  • `python setup.py build_ext --inplace --force` builds clean (only pre-existing `maps_checked` warning).
  • Smoke: `Drive(...)` constructs and steps in both default (continue/on_lane) and (terminate/scattered) modes without crashing.
  • Cluster training run on `single_agent_speed_run.yaml` with `env.goal_mode=terminate env.goal_on_lane=False` overrides — verify the policy learns to reach scattered goals.

🤖 Generated with Claude Code

Two orthogonal knobs covering goal placement and episode-end semantics on
goal reach, both defaulting to current behavior:

  goal_on_lane=True (default) / False
    True  -> existing route-based placement (on lane, in front of agent).
    False -> each goal at a uniformly random drivable point anywhere on the
             map, via the new pick_random_drivable_position helper (mirrors
             spawn_agent's lane+geometry pick, sans collision check).

  goal_mode="continue" (default) / "terminate"
    continue  -> existing behavior: reaching a goal advances current_goal_idx;
                 episode keeps running until scenario_length or the inactive
                 threshold trips.
    terminate -> reaching the goal sets terminals[i]=1 for that agent (no
                 truncation flag, so PPO does not bootstrap V); env then
                 add_log + c_reset to advance to the next scenario.

target_type is unchanged -- it still controls obs format (static/dynamic) and
is orthogonal to both new knobs. compute_goals's existing route path is
untouched when goal_on_lane=True.

Files: drive.h struct + defines + compute_goals branch + c_step terminate
hook, env_binding.h exposes GOAL_MODE_* constants, binding.c unpacks both
new kwargs, drive.py validates strings + plumbs through _env_init_kwargs,
drive.ini gives the defaults.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 31, 2026 00:11
@eugenevinitsky eugenevinitsky changed the title drive: add goal_mode and goal_on_lane knobs WIP: drive: add goal_mode and goal_on_lane knobs May 31, 2026

Copilot AI left a comment

Pull request overview

Adds two orthogonal Drive env knobs: goal_mode (continue default vs terminate, controlling episode-end on goal reach) and goal_on_lane (True default vs False, controlling whether goals are placed along the agent's route or scattered at uniformly random drivable points). Both default to the existing behavior.

Changes:

  • Define GOAL_MODE_CONTINUE/GOAL_MODE_TERMINATE, add fields to the Drive struct, branch compute_goals on goal_on_lane, and end episode on first reached goal when in terminate mode.
  • Plumb the two new kwargs from Python through binding.c and export the new int constants from env_binding.h.
  • Validate the new string values in Drive.__init__ and add documented defaults to drive.ini.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| pufferlib/ocean/drive/drive.h | New mode constants, struct fields, pick_random_drivable_position, scattered branch in compute_goals, and terminate-on-reach block in c_step. |
| pufferlib/ocean/drive/binding.c | Unpacks goal_mode and goal_on_lane kwargs into the env. |
| pufferlib/ocean/env_binding.h | Exports GOAL_MODE_CONTINUE/GOAL_MODE_TERMINATE to Python. |
| pufferlib/ocean/drive/drive.py | New constructor args with validation; passed through _env_init_kwargs. |
| pufferlib/config/ocean/drive.ini | Adds goal_mode and goal_on_lane defaults under [env]. |

Comment on lines +79 to +84
; Episode end on goal reach - options: "continue" (default), "terminate"
goal_mode = "continue"
; True: place goals along the agent's route (existing behavior, on-lane and
; in front of the agent). False: scatter each goal at a uniformly random
; drivable point anywhere on the map.
goal_on_lane = True
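For readers wondering how the quoted `"continue"` and the bare `True` in the snippet above become Python values, here is one plausible way to read such an `[env]` section using only the standard library. Whether PufferLib's own config loader works exactly like this is an assumption; `load_env_section` is a hypothetical helper.

```python
import ast
import configparser

INI_TEXT = """
[env]
; Episode end on goal reach - options: "continue" (default), "terminate"
goal_mode = "continue"
goal_on_lane = True
"""


def load_env_section(text):
    """Parse the [env] section, turning literal-looking values into Python objects."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    values = {}
    for key, raw in parser["env"].items():
        try:
            # '"continue"' -> 'continue', 'True' -> True, '2.0' -> 2.0
            values[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            values[key] = raw  # keep unparseable values as plain strings
    return values


print(load_env_section(INI_TEXT))
```

Under this reading, the quotes around `"continue"` matter: they make the value a Python string literal, while `True` evaluates to a boolean.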

2 participants