vec_log per-agent population mean - average over agents instead of over completed episodes by eugenevinitsky · Pull Request #459 · Emerge-Lab/PufferDrive

eugenevinitsky · 2026-05-30T17:47:19Z

Previously we were logging anytime N agents finished, which biased the metrics towards agents that finished frequently. Now we report an average over all the agent slots.

Summary

We use per-agent EMA-based aggregation in `vec_log` so the cross-agent mean reported in wandb has one weight per agent regardless of completion frequency — no per-emission bias and no multi-emission residual bias.

For each agent slot the env keeps a smoothed `Log` state:

```
slot ← α · slot + (1 − α) · episode_log # on every completion (after first)
slot ← episode_log # on the slot's first completion
```

`prepare_log` then sums slots flagged `has_data`, sets `env->log.n = num_with_data`, and does not reset. `vec_log`'s sum-across-envs / divide-by-`aggregate.n` step then produces a population mean: every agent that has ever completed contributes one term, every emission.

Why EMA?

We need some way of providing a good estimate of the average performance of the agents. We don't want to accumulate forever, as that'd be slow to update, and we could use a fixed size number of logs per agent, but this is a nice compromise.

Replaces the existing 'sum-over-completed-episodes / completed-episode-count' aggregation in vec_log with a two-stage per-agent mean: each agent slot contributes one window-mean (mean over its completions since last emit), then those means are averaged across agents. An agent that completes 10 short infraction-terminated episodes now has the same weight in the metric as an agent that completes one full clean episode. drive.h: - New per-env state: per_agent_log_sum[], per_agent_log_count[], per_agent_log_capacity. Grow-only realloc handles REPLAY scenarios where active_agent_count can vary across c_reset. - add_log now stamps the completing agent's slot instead of env->log, and bumps per_agent_log_count[i] instead of env->log.n. - New prepare_log: drains each agent's slot to (sum / count), sums across agents into env->log, sets env->log.n = #agents-with-data. Resets the per-agent buffers in place. - env_static_car_count and static_car_count are folded into each agent's slot so the population mean recovers the env's per-scenario value. drive/binding.c: - New vec_prepare_log binding (MY_METHODS) that calls prepare_log on every env in a VecEnv. drive/drive.py: - Calls binding.vec_prepare_log right before binding.vec_log every report_interval steps. env_binding.h: - Relaxes the vec_log gate from `aggregate.n < num_agents` (wait until enough completed episodes) to `aggregate.n < 1` (emit if anyone has data). For Drive this means "emit if any env has at least one agent with a window-mean"; for other ocean envs sharing this header it means smaller batches emitted more often, which the Python-side mean_and_log in pufferl.py already amortizes via its 0.25s rate-limit. - num_agents arg kept on the C signature for caller-API compatibility, but is now unused. Validated locally: `python setup.py build_ext --inplace --force` clean, existing scenario-length and single-agent-yaml tests pass, and a 64-agent smoke run emits one log dict per scenario_length window with n=num_agents. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR changes how Drive environment metrics are aggregated in vec_log to remove completion-rate bias by switching from “mean over completed episodes” to a two-stage “per-agent window mean, then population mean across agent slots” approach. It also adds a Drive-specific vec_prepare_log step to drain per-agent accumulators into env->log immediately before vec_log.

Changes:

Add per-agent log accumulators in Drive and a prepare_log() drain step to produce per-agent population means.
Add a new C binding vec_prepare_log and call it from drive.py before vec_log at each report_interval.
Relax vec_log’s emission gate in env_binding.h from aggregate.n < num_agents to aggregate.n < 1 (API arg retained but unused).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
pufferlib/ocean/env_binding.h	Changes `vec_log` emission gating to emit with any data (ignores `num_agents` arg).
pufferlib/ocean/drive/drive.py	Calls new `binding.vec_prepare_log()` prior to `binding.vec_log()`.
pufferlib/ocean/drive/drive.h	Adds per-agent log sum/count buffers; introduces `ensure_per_agent_log_capacity()` and `prepare_log()`; rewrites `add_log()` to accumulate per-agent.
pufferlib/ocean/drive/binding.c	Exposes `vec_prepare_log` Python binding to call `prepare_log()` across VecEnv envs.

Comments suppressed due to low confidence (1)

pufferlib/ocean/drive/binding.c:55

vec_prepare_log_py doesn’t validate its argument count. Most other vec_* entrypoints in env_binding.h explicitly check PyTuple_Size and raise a clear TypeError; without this, accidental extra args will be silently ignored and missing args will lead to confusing errors inside unpack_vecenv.

        prepare_log((Drive *) vec->envs[i]);
    }
    Py_RETURN_NONE;
}

static int my_put(Env *env, PyObject *args, PyObject *kwargs) {
    PyObject *obs = PyDict_GetItemString(kwargs, "observations");
    if (!PyObject_TypeCheck(obs, &PyArray_Type)) {
        PyErr_SetString(PyExc_TypeError, "Observations must be a NumPy array");
        return 1;

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+static void ensure_per_agent_log_capacity(Drive *env, int needed) {
+    if (env->per_agent_log_capacity >= needed) {
+        return;
+    }
+    int old_cap = env->per_agent_log_capacity;
+    env->per_agent_log_sum = (Log *) realloc(env->per_agent_log_sum, needed * sizeof(Log));
+    env->per_agent_log_count = (int *) realloc(env->per_agent_log_count, needed * sizeof(int));
+    memset(&env->per_agent_log_sum[old_cap], 0, (needed - old_cap) * sizeof(Log));
+    memset(&env->per_agent_log_count[old_cap], 0, (needed - old_cap) * sizeof(int));
+    env->per_agent_log_capacity = needed;
+}


+        // Emit whenever any env has data. With Drive's per-agent prepare_log
+        // path, aggregate.n is the cross-env count of agents that contributed
+        // a window-mean (not completed-episode count), so the meaningful gate
+        // is "at least one contribution." Other ocean envs that don't run
+        // prepare_log retain completed-episode-count semantics and now emit
+        // smaller batches more often — the Python-side mean_and_log
+        // (pufferl.py) re-averages across emissions in its rate-limit window.
+        if (aggregate.n < 1) {
+            return dict;


Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replaces the window-mean per-agent state (sum + count, reset on every emit) with a per-agent EMA that persists across emissions. Each agent slot tracks slot = alpha*slot + (1 - alpha)*new_episode_log on every completion, with the first completion seeding the slot directly. prepare_log then emits the population mean across agents flagged has_data, never resetting the EMAs. Removes the residual completion-rate bias that the previous PR design still carried: fast-completers no longer dominate the multi-emission average that pufferl.py's mean_and_log produces, because every agent now contributes a single smoothed value to every emission rather than appearing in some windows and not others. New env config knob log_ema_alpha (default 0.95, half-life ~14 episodes per agent) controls smoothing. alpha=0 collapses to "most recent episode only", alpha->1 freezes the slot at its first observation. Files: - drive.h: rename per_agent_log_sum to per_agent_log_ema, add per_agent_has_data flag, add log_ema_alpha field. add_log builds a fresh episode_log snapshot, then seeds the slot or EMA-blends. prepare_log skips no-data slots and sums (no division) into env->log; vec_log divides by env->log.n = num_with_data downstream. - binding.c: unpack log_ema_alpha kwarg. - drive.py: __init__ takes log_ema_alpha=0.95, plumbs through env_kwargs. - drive.ini: log_ema_alpha = 0.95 under [env]. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

T1: gate skips emissions until the first agent completes an episode; first log dict appears exactly when the scenario truncates. T2: once every agent has completed at least one episode, dict["n"] equals num_agents (population fully represented in the cross-agent mean). T3: prepare_log preserves per-agent EMA state across emissions -- two consecutive emissions with no intervening completions produce identical metric values, locking in that the EMA buffers are not reset on emit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

active_agent_count is set once by set_active_agents (called only from init() and never from c_reset), so the grow-only ensure_per_agent_log_capacity helper only ever did real work on its first call and the comment justifying it as defense against varying populations was inaccurate. Allocate the three per-agent arrays in init() right after set_active_agents and drop the helper, the per-step ensure call in add_log, and the now-unused per_agent_log_capacity field. prepare_log iterates active_agent_count directly. Also trim the struct-field, prepare_log, episode_log-snapshot, and env-composition comments that were either describing wiring history or referencing downstream consumers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The eval branch of vec_log only checked env[0]'s log.n before dividing every env's log fields by their own n, so envs other than env[0] with no data would divide by zero. The empty-data short-circuit also returned the unused outer dict instead of the half-built list, leaking it to the GC. Rewrite the eval branch to per-env-check inside the loop: skip the divide-and-emit for envs with n=0 and leave their list slot as an empty dict, so the list shape stays vec->num_envs regardless of which envs have contributed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

test_drive_train called load_config which reads sys.argv; under pytest that contains pytest's own flags and the argparser bailed with SystemExit(2). Patch sys.argv to a single-element list inside the test so pytest discovery works (the existing train-ci.yml bare-script invocation is unaffected). Add tests/test_drive_per_agent_logging.py and tests/test_drive_train.py to the utest pytest pipeline. test_drive_train runs in its own pytest step because the test calls os._exit(0) on success to dodge worker-thread hangs, which would otherwise mask any earlier-test failures with exit 0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Returning vec->num_envs dicts with empty entries for no-data envs poisons downstream eval aggregation: pufferlib/ocean/benchmark/evaluators/base.py treats a missing "n" key as 1, so each empty dict inflates total_n and the weighted-mean denominator without contributing any numerator -- per-metric means get diluted. Worse, multi_scenario._should_stop counts emissions against num_scenarios, so cold-start steps with all-empty lists could terminate the eval loop early with garbage data. Return a list of only the populated dicts (length <= vec->num_envs). When no env has data the list is empty, drive.py's `if log: info.append(log)` guard skips it, and downstream counters keep ticking on real emissions only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

log_ema_alpha = 0.707 (was 0.95) so the per-agent EMA's effective half-life is ~2 episodes -- responsive enough to track training progress within a few emissions per agent. Drive.__init__ default tracks drive.ini. Also join a split f-string in test_drive_per_agent_logging.py to make ruff-format happy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 30, 2026 17:47

Copilot started reviewing on behalf of eugenevinitsky May 30, 2026 17:47 View session

drive/binding: drop forward-decl wiring comments

dda8ccc

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI reviewed May 30, 2026

View reviewed changes

Eugene Vinitsky and others added 9 commits May 30, 2026 13:54

env_binding: focus vec_log gate comment on the divide-by-zero invariant

ff62899

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

env_binding: drop unused num_agents extraction in vec_log

e952bcb

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

eugenevinitsky changed the title ~~WIP: vec_log per-agent population mean (remove completion-rate bias)~~ vec_log per-agent population mean - average over agents instead of over completed episodes May 30, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

vec_log per-agent population mean - average over agents instead of over completed episodes#459

vec_log per-agent population mean - average over agents instead of over completed episodes#459
eugenevinitsky wants to merge 11 commits into
emerge/temp_trainingfrom
ev/per-agent-logging

eugenevinitsky commented May 30, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

eugenevinitsky commented May 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why EMA?

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

eugenevinitsky commented May 30, 2026 •

edited

Loading