WIP: drive: add goal_mode and goal_on_lane knobs by eugenevinitsky · Pull Request #463 · Emerge-Lab/PufferDrive

eugenevinitsky · 2026-05-31T00:11:45Z

Summary

Two orthogonal knobs covering goal placement and episode-end semantics on goal reach. Both default to current behavior — no existing run changes.

| knob | values | effect |
|---|---|---|
| `goal_on_lane` | `True` (default) / `False` | `True`: existing route-based placement (on lane, in front of the agent). `False`: each goal is placed at a uniformly random drivable point anywhere on the map via the new `pick_random_drivable_position` helper. |
| `goal_mode` | `"continue"` (default) / `"terminate"` | `continue`: existing behavior; reaching a goal advances `current_goal_idx`. `terminate`: reaching the goal sets `terminals[i]=1` (no truncation flag, so PPO does not bootstrap V); the env then calls `add_log` and `c_reset` to move on to the next scenario. |
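For intuition about scattered mode, here is a minimal Python sketch of sampling "a uniformly random drivable point". It is not the C helper. It assumes the drivable area is approximated by lane centerlines given as `(x, y)` polylines and samples uniformly by arc length, whereas the real `pick_random_drivable_position` is described in the commit message as mirroring `spawn_agent`'s lane+geometry pick. The `lanes` layout and `polyline_length` are hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def polyline_length(pts: List[Point]) -> float:
    """Total length of a polyline given as (x, y) vertices."""
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))


def pick_random_drivable_position(lanes: List[List[Point]], rng: random.Random) -> Point:
    """Hypothetical analogue of the C helper.

    Assumes "drivable" is approximated by lane centerlines and samples a
    point uniformly by arc length across all of them.
    """
    lengths = [polyline_length(lane) for lane in lanes]
    total = sum(lengths)
    if total <= 0.0:
        raise ValueError("no drivable geometry to sample from")

    # Draw a distance along the concatenation of all lanes...
    remaining = rng.uniform(0.0, total)
    for lane, length in zip(lanes, lengths):
        if remaining > length:
            remaining -= length
            continue
        # ...then walk this lane's segments to the one containing that distance.
        for a, b in zip(lane, lane[1:]):
            seg = math.dist(a, b)
            if seg > 0.0 and remaining <= seg:
                t = remaining / seg
                return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            remaining -= seg
        return lane[-1]  # guard against floating-point drift
    return lanes[-1][-1]
```

Weighting by length keeps long lanes from being under-sampled; picking a lane index uniformly and then a vertex would over-sample short lanes and densely sampled stretches of geometry.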

These compose with the existing `target_type` (`static`/`dynamic`), which is unchanged — it still controls only obs format.

What's untouched

When `goal_on_lane=True` (default), `compute_goals` falls through past the new branch and runs the existing `compute_new_route` + path-based placement exactly as before. When `goal_mode="continue"` (default), `c_step`'s termination block is identical to today.
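The two goal-reach behaviors can be summarized with a small Python model of the per-step check. This is a hypothetical sketch, not the `c_step` code: the `Agent`/`StepResult` types, the integer values of `GOAL_MODE_*`, the `<= goal_radius` reach test, and clamping `current_goal_idx` at the last goal are all assumptions. It also reflects the Phase-1 caveat in this PR: in terminate mode, any agent's reach requests an env-wide reset.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical values: the real GOAL_MODE_* constants come from env_binding.h.
GOAL_MODE_CONTINUE = 0
GOAL_MODE_TERMINATE = 1


@dataclass
class Agent:
    pos: Tuple[float, float]
    goals: List[Tuple[float, float]]
    current_goal_idx: int = 0


@dataclass
class StepResult:
    terminals: List[int]
    truncations: List[int]
    reset_env: bool = False  # caller would then do add_log + c_reset


def handle_goal_reach(agents: List[Agent], goal_mode: int, goal_radius: float) -> StepResult:
    n = len(agents)
    result = StepResult(terminals=[0] * n, truncations=[0] * n)
    for i, agent in enumerate(agents):
        goal = agent.goals[agent.current_goal_idx]
        # Assumed reach test: within goal_radius of the current goal.
        if math.dist(agent.pos, goal) > goal_radius:
            continue
        if goal_mode == GOAL_MODE_CONTINUE:
            # Existing behavior: advance to the next goal; the episode keeps running.
            # Clamping at the last goal is an assumption of this sketch.
            agent.current_goal_idx = min(agent.current_goal_idx + 1, len(agent.goals) - 1)
        elif goal_mode == GOAL_MODE_TERMINATE:
            # True terminal for this agent; truncations stay 0, so V is not bootstrapped.
            result.terminals[i] = 1
            # Phase 1: the first reach ends the episode for every agent.
            result.reset_env = True
        else:
            raise ValueError(f"unknown goal_mode {goal_mode}")
    return result
```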

Why not extend `target_type` to a third value?

An earlier draft used `target_type="terminate"`, but `target_type` is purely an obs-format knob (3-float position vs. 5-float position + heading). Stuffing episode-end semantics into it would conflate orthogonal concerns. Splitting them into a separate `goal_mode` keeps each knob doing one thing.

Files

| file | change |
|---|---|
| `pufferlib/ocean/drive/drive.h` | `GOAL_MODE_*` defines, `goal_mode` + `goal_on_lane` in the `Drive` struct, `pick_random_drivable_position` helper, scattered branch in `compute_goals`, `c_step` terminate-on-reach block. |
| `pufferlib/ocean/drive/binding.c` | Unpack both new kwargs. |
| `pufferlib/ocean/env_binding.h` | Export `GOAL_MODE_CONTINUE` / `GOAL_MODE_TERMINATE` constants. |
| `pufferlib/ocean/drive/drive.py` | Validate string values, plumb into `_env_init_kwargs`. |
| `pufferlib/config/ocean/drive.ini` | Defaults under `[env]`. |
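A sketch of the `drive.py` side of this change: validate the string, map it to an integer constant, and pass both knobs on as kwargs. The helper name `build_goal_kwargs`, the integer constant values, and passing `goal_on_lane` as an `int` are hypothetical; the real code plumbs through `_env_init_kwargs` and gets `GOAL_MODE_*` from the binding.

```python
from typing import Dict

# Hypothetical integer values; the real ones are exported by env_binding.h.
GOAL_MODE_CONTINUE = 0
GOAL_MODE_TERMINATE = 1

GOAL_MODES: Dict[str, int] = {
    "continue": GOAL_MODE_CONTINUE,
    "terminate": GOAL_MODE_TERMINATE,
}


def build_goal_kwargs(goal_mode: str = "continue", goal_on_lane: bool = True) -> Dict[str, int]:
    """Validate the two knobs and convert them to what the C side expects."""
    if goal_mode not in GOAL_MODES:
        raise ValueError(f"goal_mode must be one of {sorted(GOAL_MODES)}, got {goal_mode!r}")
    if not isinstance(goal_on_lane, bool):
        raise TypeError(f"goal_on_lane must be a bool, got {type(goal_on_lane).__name__}")
    # Assumption: binding.c unpacks both kwargs as plain integers.
    return {"goal_mode": GOAL_MODES[goal_mode], "goal_on_lane": int(goal_on_lane)}
```

With these hypothetical constants, `build_goal_kwargs("terminate", False)` returns `{"goal_mode": 1, "goal_on_lane": 0}`. Rejecting unknown strings in Python gives a readable error instead of letting an out-of-range integer reach the C side.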

Phase-1 scope (worth knowing for review)

- **Single-agent focus.** Multi-agent runs with per-agent goal-reach terminal flags work as written (each agent's terminal fires when its own goal is reached), but `add_log + c_reset` ends the env-wide episode for every agent on the first reach. Multi-agent semantics (per-agent reset without disrupting the others) are deferred to Phase 2.
- **Lane rewards stay on in scattered mode.** The agent still has a random-walk path from spawn, so `reward_lane_align`/`reward_lane_center` still fire on that arbitrary path. They may conflict with the scattered goal: the model has to learn to balance them, or the lane rewards can be overridden to 0 via the launcher yaml.
- **No A\* path to the scattered goal.** The agent navigates freeform to the scattered goal using only the position beacon in its obs. Phase 2 could compute a route from spawn → goal for cleaner lane reward shaping.
- **`goal_radius` applies as-is.** When randomization is off, the `drive.ini` value (default 2.0) flows through. For scattered mode you will likely want a wider radius via a launcher yaml override.
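The Phase-2 idea of routing from spawn to the scattered goal could take the form of a standard A* search over a lane-connectivity graph. This sketch is illustrative only: the `nodes`/`edges` representation (lane waypoints joined by directed edges with Euclidean costs) is an assumption, and nothing like it exists in this PR. Straight-line distance is an admissible heuristic here because every edge cost is itself a straight-line distance.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple


def astar_route(
    nodes: Dict[int, Tuple[float, float]],
    edges: Dict[int, List[int]],
    start: int,
    goal: int,
) -> Optional[List[int]]:
    """A* over a directed waypoint graph with Euclidean edge costs.

    Returns the node sequence from start to goal, or None if unreachable.
    """
    def h(n: int) -> float:
        # Straight-line distance: admissible because edge costs are Euclidean.
        return math.dist(nodes[n], nodes[goal])

    g_cost = {start: 0.0}
    came_from: Dict[int, int] = {}
    frontier = [(h(start), 0.0, start)]
    while frontier:
        _, g, cur = heapq.heappop(frontier)
        if cur == goal:
            path = [cur]
            while cur in came_from:
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        if g > g_cost[cur]:
            continue  # stale entry superseded by a cheaper path
        for nxt in edges.get(cur, []):
            cand = g + math.dist(nodes[cur], nodes[nxt])
            if cand < g_cost.get(nxt, math.inf):
                g_cost[nxt] = cand
                came_from[nxt] = cur
                heapq.heappush(frontier, (cand + h(nxt), cand, nxt))
    return None
```

A route like this would give the lane rewards a path that actually leads to the goal, instead of the arbitrary random-walk path they currently follow.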

Test plan

- `python setup.py build_ext --inplace --force` builds clean (only the pre-existing `maps_checked` warning).
- Smoke: `Drive(...)` constructs and steps without crashing in both the default (continue/on_lane) and the terminate/scattered modes.
- Cluster training run on `single_agent_speed_run.yaml` with `env.goal_mode=terminate env.goal_on_lane=False` overrides, to verify that the policy learns to reach scattered goals.

🤖 Generated with Claude Code

Two orthogonal knobs covering goal placement and episode-end semantics on goal reach, both defaulting to current behavior:

- goal_on_lane=True (default) / False
  - True -> existing route-based placement (on lane, in front of agent).
  - False -> each goal at a uniformly random drivable point anywhere on the map, via the new pick_random_drivable_position helper (mirrors spawn_agent's lane+geometry pick, sans collision check).
- goal_mode="continue" (default) / "terminate"
  - continue -> existing behavior: reaching a goal advances current_goal_idx; the episode keeps running until scenario_length or the inactive threshold trips.
  - terminate -> reaching the goal sets terminals[i]=1 for that agent (no truncation flag, so PPO does not bootstrap V); the env then calls add_log + c_reset to advance to the next scenario.

target_type is unchanged -- it still controls obs format (static/dynamic) and is orthogonal to both new knobs. compute_goals's existing route path is untouched when goal_on_lane=True.

Files: drive.h struct + defines + compute_goals branch + c_step terminate hook; env_binding.h exposes GOAL_MODE_* constants; binding.c unpacks both new kwargs; drive.py validates strings and plumbs through _env_init_kwargs; drive.ini gives the defaults.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
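To make "no truncation flag, so PPO does not bootstrap V" concrete, here is a generic GAE sketch in Python (not pufferlib's implementation). A terminal step zeroes the `V(s_{t+1})` term and cuts the trace, so the advantage for reaching the goal is just the reward minus the current value estimate. Setups that treat time-limit truncation separately typically keep that bootstrap term for truncated steps, which is the behavior terminate mode avoids at the goal.

```python
from typing import List


def gae_advantages(
    rewards: List[float],
    values: List[float],       # V(s_t)
    next_values: List[float],  # V(s_{t+1})
    terminals: List[int],      # 1 where s_{t+1} is a true terminal
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Generic GAE: a terminal step drops the bootstrap term and cuts the trace."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - terminals[t]
        delta = rewards[t] + gamma * next_values[t] * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
    return advantages
```

With `rewards=[1.0]`, `values=[0.5]`, `next_values=[10.0]`, the advantage is `0.5` when `terminals=[1]` and roughly `10.4` when `terminals=[0]`; the second case is the bootstrapped estimate.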

Copilot

Pull request overview

Adds two orthogonal Drive env knobs: goal_mode (continue default vs terminate, controlling episode-end on goal reach) and goal_on_lane (True default vs False, controlling whether goals are placed along the agent's route or scattered at uniformly random drivable points). Both default to the existing behavior.

Changes:

- Define GOAL_MODE_CONTINUE/GOAL_MODE_TERMINATE, add fields to the Drive struct, branch compute_goals on goal_on_lane, and end the episode on the first reached goal when in terminate mode.
- Plumb the two new kwargs from Python through binding.c and export the new int constants from env_binding.h.
- Validate the new string values in Drive.__init__ and add documented defaults to drive.ini.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.


| File | Description |
|---|---|
| pufferlib/ocean/drive/drive.h | New mode constants, struct fields, `pick_random_drivable_position`, scattered branch in `compute_goals`, and terminate-on-reach block in `c_step`. |
| pufferlib/ocean/drive/binding.c | Unpacks `goal_mode` and `goal_on_lane` kwargs into the env. |
| pufferlib/ocean/env_binding.h | Exports `GOAL_MODE_CONTINUE`/`GOAL_MODE_TERMINATE` to Python. |
| pufferlib/ocean/drive/drive.py | New constructor args with validation; passed through `_env_init_kwargs`. |
| pufferlib/config/ocean/drive.ini | Adds `goal_mode` and `goal_on_lane` defaults under `[env]`. |


```diff
+; Episode end on goal reach - options: "continue" (default), "terminate"
+goal_mode = "continue"
+; True: place goals along the agent's route (existing behavior, on-lane and
+; in front of the agent). False: scatter each goal at a uniformly random
+; drivable point anywhere on the map.
+goal_on_lane = True
```
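The values in this hunk are written as Python literals (`"continue"` with quotes, `True` capitalized). Assuming the config loader evaluates values with `ast.literal_eval` (an assumption; the loader is not part of this PR), they arrive in Python as a `str` and a `bool`. A minimal stdlib sketch, where `load_env_section` is a hypothetical helper:

```python
import ast
import configparser
from typing import Any, Dict

INI_TEXT = """
[env]
; Episode end on goal reach - options: "continue" (default), "terminate"
goal_mode = "continue"
; True: place goals along the agent's route.
goal_on_lane = True
"""


def load_env_section(text: str) -> Dict[str, Any]:
    """Parse [env] and evaluate each value as a Python literal when possible."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    parsed: Dict[str, Any] = {}
    for key, raw in parser["env"].items():
        try:
            parsed[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            parsed[key] = raw  # leave bare words such as terminate as strings
    return parsed
```

If a loader instead returned raw strings, `goal_on_lane` would arrive as the non-empty string `"True"`, and even `"False"` would be truthy, which is one reason to validate types on the Python side.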


Copilot AI review requested due to automatic review settings May 31, 2026 00:11

Copilot started reviewing on behalf of eugenevinitsky May 31, 2026 00:11

eugenevinitsky changed the title ~~drive: add goal_mode and goal_on_lane knobs~~ WIP: drive: add goal_mode and goal_on_lane knobs May 31, 2026

Copilot AI reviewed May 31, 2026

eugenevinitsky wants to merge 1 commit into `emerge/temp_training` from `ev/goal-terminate-mode`


2 participants
