Conversation
…nd removed everything else to prevent redundancies
…sampling logic for initial goals, configurable params for spawn settings
Greptile Summary

This PR implements random agent spawning with configurable dimensions and collision-free placement. Agents are spawned with randomized vehicle dimensions (width, length, height) on drivable lanes, with rejection sampling to prevent collisions and offroad placements. Initial goals are sampled using the 2.0 goal sampling logic.

Key Changes:
Critical Issues Found:
Confidence Score: 1/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Python as Drive.py
    participant Binding as binding.c
    participant Core as drive.h
    Python->>Binding: shared() with random_agents mode
    Binding->>Binding: Distribute agents across envs
    Binding-->>Python: agent_offsets, map_ids, env_count
    Python->>Binding: env_init() for each env
    Binding->>Core: Initialize Drive struct
    Core->>Core: load_map_binary()
    Core->>Core: set_active_agents()
    Core->>Core: spawn_agents_with_counts()
    loop For each agent
        Core->>Core: spawn_agent()
        Core->>Core: Random vehicle dimensions
        loop Rejection sampling (MAX_SPAWN_ATTEMPTS)
            Core->>Core: get_random_point_on_lane()
            Core->>Core: check_spawn_collision()
            Core->>Core: check_spawn_offroad()
            alt Valid spawn
                Core->>Core: break
            else Invalid
                Core->>Core: continue
            end
        end
        Core->>Core: Update agent position & state
    end
    Core->>Core: init_goal_positions()
    Core->>Core: sample_new_goal() for each agent
    Core-->>Python: Environment ready
```
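The rejection loop in the diagram can be reduced to a small sketch. All names here are illustrative assumptions (the real code lives in `drive.h` and checks 2D collision and off-road polygons; this keeps only the sampling structure, on a 1D "lane", and assumes a `MAX_SPAWN_ATTEMPTS` value):

```c
#include <stdbool.h>
#include <stdlib.h>

#define MAX_SPAWN_ATTEMPTS 100 /* assumed value; the real constant lives in drive.h */

/* Hypothetical reduction of the per-agent rejection loop: draw candidate
 * positions on a 1D "lane" and reject any candidate within min_gap of an
 * already-placed agent. */
static bool try_spawn_1d(const float *placed, int n_placed, float lane_len,
                         float min_gap, float *out_x) {
    for (int attempt = 0; attempt < MAX_SPAWN_ATTEMPTS; attempt++) {
        float x = lane_len * ((float)rand() / (float)RAND_MAX);
        bool valid = true;
        for (int i = 0; i < n_placed; i++) {
            float d = placed[i] - x;
            if (d < 0.0f) d = -d;
            if (d < min_gap) { valid = false; break; } /* collision: resample */
        }
        if (valid) { *out_x = x; return true; }
    }
    return false; /* attempts exhausted; caller decides how to handle this */
}
```

Returning false on exhaustion mirrors the diagram's bounded loop: the environment can report a spawn error instead of spinning forever on a crowded map.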
```c
float mod_heading = atan2f(agent->heading_y, agent->heading_x);
float cos_theta = dot/(mod_to_pt * mod_heading);
```

Incorrect cosine calculation: `mod_heading` uses `atan2f`, which returns an angle, not a magnitude. The cosine is already computed via the dot product formula cos(theta) = dot / (|v1| * |v2|), where |heading| should be the magnitude (1.0 for normalized heading vectors), not an `atan2f` result.

Suggested change:

```diff
-float mod_heading = atan2f(agent->heading_y, agent->heading_x);
+float mod_heading = sqrtf(agent->heading_x * agent->heading_x + agent->heading_y * agent->heading_y);
 float cos_theta = dot/(mod_to_pt * mod_heading);
```
There was a problem hiding this comment.
Pull request overview
This PR adds a new random_agents initialization mode to the Ocean Drive environment, enabling randomized per-environment agent counts and randomized vehicle dimensions/poses during spawning, with additional spawn-time collision/off-road rejection and integration with goal sampling.
Changes:
- Extend INI/env config and Python/C bindings to support `random_agents` mode and spawn dimension parameters.
- Implement random agent spawning logic in the C core, including collision/off-road checks and goal initialization behavior.
- Update demo/visualization entrypoints and default `drive.ini` settings to exercise the new mode.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| `pufferlib/ocean/env_config.h` | Adds new spawn/agent-count config fields and parses `init_mode` strings. |
| `pufferlib/ocean/drive/visualize.c` | Wires spawn settings into visualization env init; randomizes agent counts in random mode. |
| `pufferlib/ocean/drive/error.h` | Extends error types and adjusts error string formatting. |
| `pufferlib/ocean/drive/drive.py` | Adds Python-side config for random agent counts/dimensions and passes through to bindings. |
| `pufferlib/ocean/drive/drive.h` | Implements spawn rejection sampling, off-road checks, random init mode wiring, and goal sampling adjustments. |
| `pufferlib/ocean/drive/drive.c` | Updates demo to use spawn settings and random agent counts; changes default weights/map. |
| `pufferlib/ocean/drive/datatypes.h` | Adds `free_agents` helper to free an array of agents safely. |
| `pufferlib/ocean/drive/binding.c` | Adds random agent-count partitioning in `shared()` and passes spawn settings into env init. |
| `pufferlib/config/ocean/drive.ini` | Updates defaults to enable `random_agents` and sets spawn/agent-count parameters. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Changed counts to max supported by sim, max_controlled, num_created and removed everything else to prevent redundancies
* Fixed memory leaks
…ed defaults to align with requirements
Force-pushed from e646569 to 09e725d
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.
```c
// Design Choice(we don't have wide vehicles on roads)
if (spawn_width > spawn_length)
    spawn_width = spawn_length;
float spawn_height = 1.5f; // Design Choice: Doesn't matter as we don't have flying cars in 2026
```

`spawn_height` is currently hardcoded to 1.5f, so the configured `env->spawn_settings.h` is ignored even though it's parsed and passed through config/bindings. Use the configured spawn height (and keep the z-centering adjustment) so spawns match the INI/Python parameters.

Suggested change:

```diff
-float spawn_height = 1.5f; // Design Choice: Doesn't matter as we don't have flying cars in 2026
+float spawn_height = spawn_settings.h;
```
```diff
 c_reset(&env);
 c_render(&env);
-Weights *weights = load_weights("resources/drive/puffer_drive_resampling_speed_lane.bin");
+Weights *weights = load_weights("best_policy_with_reward_conditioning.bin");
```

`load_weights("best_policy_with_reward_conditioning.bin")` appears to reference a file that isn't in the repo and also drops the `resources/drive/` prefix used elsewhere. This breaks the demo by default unless the artifact is committed and the path is corrected.

Is this comment correct? It's true that I don't see this file
```c
// Ensure last env can still meet min_agents_per_env requirement
int upper = (remaining - max_agents_per_env < min_agents_per_env) ? remaining - min_agents_per_env
                                                                  : max_agents_per_env;
count = min_agents_per_env + rand() % (upper - min_agents_per_env + 1);
```

The upper bound computation for `count = min_agents_per_env + rand() % (upper - min_agents_per_env + 1)` can yield `upper < min_agents_per_env` for infeasible combinations of remaining, min_agents_per_env, and max_agents_per_env (e.g. remaining=9, min=5, max=6), leading to modulo-by-zero/UB. Add a feasibility check and a fallback strategy (e.g. error out, relax the min for the final env, or adjust prior counts) before computing the modulo range.

Suggested change:

```diff
-count = min_agents_per_env + rand() % (upper - min_agents_per_env + 1);
+if (upper < min_agents_per_env) {
+    /* Infeasible to satisfy min_agents_per_env for all envs with remaining agents.
+     * Fall back by relaxing the minimum for this env: fill up to max_agents_per_env
+     * (or all remaining if fewer). This avoids a zero/negative modulo range.
+     */
+    count = remaining > max_agents_per_env ? max_agents_per_env : remaining;
+} else {
+    count = min_agents_per_env + rand() % (upper - min_agents_per_env + 1);
+}
```
```ini
reward_lane_align = 0
reward_lane_center = 0
```
```diff
 ; Determines the target distance to the new goal in the case of goal_behavior = generate_new_goals.
 ; Large numbers will select a goal point further away from the agent's current position.
-min_goal_distance = 0.5
+min_goal_distance = 5.0
```
```c
// Initialization modes
#define INIT_ALL_VALID 0
#define INIT_ONLY_CONTROLLABLE_AGENTS 1
#define RANDOM_AGENTS 2
```
I think this is sort of a bad name, it implies all sorts of things that aren't just "we randomize the number of agents in the scene". It could be called INIT_VARIABLE_AGENT_NUMBER even if that's mouthy
```c
if (env->init_mode == RANDOM_AGENTS) {
    // TODO: Needs to be handled later if we want GOAL_RESPAWN for random agents
    spawn_agents_with_counts(env);
```
A little puzzled by this. Anytime we need to respawn a single agent we respawn the whole thing?
```diff
 render = True
 render_async = False # Render interval of below 50 might cause process starvation and slowness in training
-render_interval = 250
+render_interval = 1000
```
Rendering was taking a really long amount of time even with async on so I just increased the interval
This only happens when max_agents is very high; previously, we were working with just 32 so it was not a problem
```c
env->num_agents = max_agents;
if (env->init_mode == INIT_VARIABLE_AGENT_NUMBER) {
    env->spawn_settings.max_agents_in_sim = max_agents_per_env; // Random Agents only supports controlled agents
```
Random Agents is not a good name to use throughout the codebase for this feature, it can mean too many things
```python
    f"Please reduce num_maps, add more maps to {map_dir}, or set allow_fewer_maps=True."
)

self.use_all_maps = use_all_maps
```
i'm sort of struggling to see why these use_all_maps changes showed up in this particular PR
This was a tiny bug with debugging using drive.py
This was the exact place of issue: https://github.com/Emerge-Lab/PufferDrive/pull/292/changes#diff-fae14af53080a7325bdddc95a97c9675183e4d0bb11f6e537d4c62e8bbb0eee6L441
```diff
-agent->log_velocity_y = (float *)malloc(array_size * sizeof(float));
-agent->log_heading = (float *)malloc(array_size * sizeof(float));
-agent->log_valid = (int *)malloc(array_size * sizeof(int));
+allocate_agent_trajectories(agent);
```
I am a tiny bit puzzled why this was pulled into a function
I didn't want to like bloat the code and the log trajectories was required for goal respawn behavior
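A sketch of what such a grouping helper might look like (struct and function names are assumptions; the PR's actual signature isn't shown in this thread), including the partial-failure cleanup that inline mallocs make easy to forget:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical grouping of the per-agent log buffers behind one call, so
 * every spawn path allocates, and later frees, the same set of arrays. */
typedef struct {
    float *log_velocity_y;
    float *log_heading;
    int *log_valid;
} AgentLogsSketch;

static bool alloc_agent_logs(AgentLogsSketch *a, size_t n) {
    a->log_velocity_y = (float *)malloc(n * sizeof(float));
    a->log_heading = (float *)malloc(n * sizeof(float));
    a->log_valid = (int *)malloc(n * sizeof(int));
    if (!a->log_velocity_y || !a->log_heading || !a->log_valid) {
        /* Clean up any partial allocation instead of leaking. */
        free(a->log_velocity_y);
        free(a->log_heading);
        free(a->log_valid);
        return false;
    }
    return true;
}
```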
```c
    return (s >= 0 && s <= 1 && t >= 0 && t <= 1);
}

static bool check_offroad(Drive *env, Agent *agent) {
```
did we not have an offroad check elsewhere we could reuse?
Oh I guess you just needed to reuse it in a few spots, good
No like, this needs to be reworked, cuz it's a good practice
PufferDrive/pufferlib/ocean/drive/drive.h, line 2207 in 8fd053f
But yea, there is no single function for check_offroad
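For reference, the parameter test in the snippet above (`s` and `t` both in [0, 1]) is the tail of the standard parametric 2D segment-intersection check. A self-contained sketch, with illustrative names:

```c
#include <stdbool.h>

/* Solve p + s*r = q + t*u for the two segments and accept when both
 * parameters fall in [0, 1]. r and u are direction vectors (end - start). */
static bool segments_intersect(float px, float py, float rx, float ry,
                               float qx, float qy, float ux, float uy) {
    float denom = rx * uy - ry * ux; /* 2D cross product of directions */
    if (denom == 0.0f)
        return false; /* parallel; collinear overlap not handled in this sketch */
    float s = ((qx - px) * uy - (qy - py) * ux) / denom;
    float t = ((qx - px) * ry - (qy - py) * rx) / denom;
    return (s >= 0 && s <= 1 && t >= 0 && t <= 1);
}
```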
```c
float spawn_length = random_uniform(spawn_settings.min_l, spawn_settings.max_l);
float spawn_width = random_uniform(spawn_settings.min_w, spawn_settings.max_w);

// Design Choice(we don't have wide vehicles on roads)
if (spawn_width > spawn_length)
    spawn_width = spawn_length;
```
btw, this is sort of a biased sampler. It'll tend to sample a lot of squares. Just a heads up. This is pretty fast, so it's possible you can just rejection sample until you get a valid sample
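A sketch of the rejection-sampling alternative the reviewer suggests (helper names are ours; it assumes the width and length ranges overlap so a valid pair exists, otherwise the loop would never terminate):

```c
#include <stdlib.h>

/* Stand-in for the codebase's random_uniform. */
static float random_uniform_sketch(float lo, float hi) {
    return lo + (hi - lo) * ((float)rand() / (float)RAND_MAX);
}

/* Resample (width, length) pairs until width <= length, instead of clamping
 * width down to length. */
static void sample_dims(float min_w, float max_w, float min_l, float max_l,
                        float *width, float *length) {
    do {
        *width = random_uniform_sketch(min_w, max_w);
        *length = random_uniform_sketch(min_l, max_l);
    } while (*width > *length); /* reject invalid pairs; fast for typical ranges */
}
```

Clamping maps every width > length draw onto the width == length line, inflating the share of square footprints; resampling keeps the pair uniform over the valid width <= length region.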
- Spawning logic added with a random number of agents and randomized dimensions
- Integrated with the 2.0 goal sampling for initial goals
- Collision and off-road checks during spawns