Metta-AI/NeuralMMO-interpretability
Welcome to the Platform!

Documentation is hosted on GitHub Pages (github.io).

Installation

After cloning this repo, run:

pip install -e .[dev]

Training

To check that the installation was successful, run the following command in --debug mode (with tracking disabled):

python train.py --debug --no-track

To log the training process, edit the wandb section in config.yaml and remove --no-track from the command line. The config.yaml file contains various configuration settings for the project.
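The exact layout of config.yaml is repo-specific, but a wandb section conventionally carries at least an entity and a project; a hedged sketch (both values below are placeholders, not taken from this repo):

```yaml
wandb:
  entity: your-wandb-entity    # placeholder: your W&B username or team
  project: nmmo-baselines      # placeholder: the W&B project to log runs under
```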

Agent zoo and your custom policy

This baseline comes with four different models under the agent_zoo directory: neurips23_start_kit, yaofeng, takeru, and hybrid. You can use any of these models by specifying the -a argument.

python train.py -a hybrid

You can also create your own policy by creating a new module under the agent_zoo directory, which should contain Policy, Recurrent, and RewardWrapper classes.
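The shape of such a module can be sketched as below. This is a hypothetical skeleton only: in the real repo these classes subclass the baseline's network and wrapper base classes (not shown here), and the constructor signatures will differ.

```python
# agent_zoo/my_policy/__init__.py -- hypothetical skeleton of a custom
# agent_zoo module. The real classes subclass the baseline's policy,
# recurrent-wrapper, and reward-wrapper bases; only the required names
# and rough responsibilities are shown.

class Policy:
    """Maps observations to action logits (sketch only)."""
    def __init__(self, env=None):
        self.env = env

class Recurrent:
    """Wraps a Policy with a recurrent core, e.g. an LSTM (sketch only)."""
    def __init__(self, policy):
        self.policy = policy

class RewardWrapper:
    """Reshapes the environment reward signal for training (sketch only)."""
    def __init__(self, env=None):
        self.env = env
```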

Curriculum Learning using Syllabus

The training script supports automatic curriculum learning using the Syllabus library. To use it, add --syllabus to the command line.

python train.py --syllabus

Replay generation

The policies directory contains a set of trained policies. For your models, create a directory and copy the checkpoint files to it. To generate a replay, run the following command:

python train.py -m replay -p policies

The replay file ends with .replay.lzma. You can view the replay using the web viewer.
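The .replay.lzma extension indicates LZMA compression, so if you want to inspect a replay outside the web viewer, Python's standard library can decompress it (the structure of the decompressed payload is not documented here, so this sketch only returns raw bytes):

```python
import lzma

def read_replay(path):
    """Decompress a .replay.lzma file and return its raw bytes.

    The payload format is repo-specific; inspect the returned bytes
    to see how to parse them further.
    """
    with lzma.open(path, "rb") as f:
        return f.read()
```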

Evaluation

The evaluation script supports the pvp and pve modes. The pve mode spawns all agents using only one policy. The pvp mode spawns groups of agents, each controlled by a different policy.

To evaluate models in the policies directory, run the following command:

python evaluate.py policies pvp -r 10

With -r 10, this generates 10 results JSON files in the same directory, each containing the results of 200 episodes. The task-completion metrics can then be viewed with:

python analysis/proc_eval_result.py policies
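If you want to post-process the results files yourself rather than use proc_eval_result.py, collecting them is straightforward; a minimal sketch (the per-file schema is repo-specific, so this only gathers the parsed JSON objects):

```python
import glob
import json
import os

def load_eval_results(results_dir):
    """Load every *.json results file in results_dir into a list of dicts.

    Files are read in sorted order; interpreting their contents is left
    to the caller, since the schema depends on the evaluation script.
    """
    results = []
    for path in sorted(glob.glob(os.path.join(results_dir, "*.json"))):
        with open(path) as f:
            results.append(json.load(f))
    return results
```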

Interpretability Tools

A pipeline for extracting, clustering, and analyzing hidden-state activations from trained policies. The typical workflow is:

  1. Extract activations from a trained policy
  2. Cluster and analyze the activation space
  3. Visualize agent behavior and cluster structure

1. Extract Activations (extract_activations.py)

Records per-timestep hidden-state activations from the action decoder, along with observations and actions, during policy evaluation.

# Extract from a single policy
python extract_activations.py pve -p takeru -o activation_data

# List available policies
python extract_activations.py --list

# Quick smoke test
python extract_activations.py --smoke-test

Outputs a directory per policy under activation_data/ containing activations.json with per-timestep records of activations, observations, and actions.
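Loading those records back for your own analysis can be sketched as follows. The record field name "activation" is an assumption about the schema of activations.json, not something documented above; adjust it to match the real files.

```python
import json

def load_activation_vectors(path, vector_key="activation"):
    """Read an activations.json file and return its activation vectors.

    Assumes the file holds a list of per-timestep records; `vector_key`
    (default "activation") is a guess at the field holding the
    hidden-state vector and may need adjusting for the real schema.
    """
    with open(path) as f:
        records = json.load(f)
    return [rec[vector_key] for rec in records if vector_key in rec]
```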

2. Analyze Activations (analyze_activations.py)

Clusters activation vectors using UMAP + HDBSCAN and tests whether clusters correspond to interpretable behavioral features. Computes Cohen's d effect sizes vs random baselines, checks for temporal/agent confounds, and generates multiple visualization modes.

# Standard clustering analysis
python analyze_activations.py activation_data/takeru_200M --subsample 10

# High-D clustering: PCA first, then HDBSCAN, then UMAP for viz
python analyze_activations.py activation_data/takeru_200M \
    --cluster-before-umap --pre-cluster-dims 30

# PCA-feature correlation analysis only (fast, no UMAP/HDBSCAN)
python analyze_activations.py activation_data/takeru_200M \
    --metric-only --cluster-before-umap --pre-cluster-dims 30

# All visualization modes
python analyze_activations.py activation_data/takeru_200M \
    --cluster-before-umap --pre-cluster-dims 30 \
    --feature-scatter --umap-pairs --dendrogram-explorer

Behavioral features tracked (14 total): n_visible_entities, n_visible_npcs, n_visible_players, self_health, self_food, self_water, self_gold, max_combat_level, in_combat, n_inventory_items, tick, is_moving, is_attacking, is_trading.
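The Cohen's d effect sizes comparing a feature's values inside a cluster against a random baseline follow the standard pooled-standard-deviation formula; a generic implementation (a sketch of the statistic, not the repo's exact code):

```python
import math

def cohens_d(group_a, group_b):
    """Cohen's d with pooled standard deviation.

    Measures how many pooled standard deviations separate the means of
    two samples, e.g. a behavioral feature inside one cluster (group_a)
    versus a random baseline sample (group_b).
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Unbiased sample variances (n - 1 in the denominator).
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    pooled = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled
```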

Visualization modes:

| Flag | Output | Description |
| --- | --- | --- |
| (default) | umap_scatter.png | UMAP embedding colored by HDBSCAN cluster |
| (default) | cluster_features.png | Heatmap of Cohen's d per feature per cluster |
| (default) | umap_metric_scatter.png | UMAP colored by rolling-average feature values (opacity-scaled) |
| --cluster-before-umap | pca_pairs.png | PCA direction pair plots (PC1&2, PC3&4, ...) |
| --cluster-before-umap | pca_feature_correlations.png | Heatmap of Pearson r between PCs and features, with multivariate R² |
| --cluster-before-umap | pca_feature_profiles.png | Per-feature scatter + binned mean profiles along top 3 correlated PCs |
| --feature-scatter | feature_scatter.png | All 91 feature-pair scatter matrix |
| --umap-pairs | umap_pairs.png | N-D UMAP direction pair plots on raw activations |
| --dendrogram-explorer | dendrogram_explorer.html | Interactive Plotly slider over HDBSCAN condensed tree hierarchy |

Example outputs (takeru 200M policy, 30-D PCA, subsample=20):

  • umap_scatter.png — UMAP embedding colored by HDBSCAN cluster labels.
  • cluster_features.png — Cohen's d effect sizes per cluster; identifies which behavioral features distinguish each cluster from random baselines.
  • umap_metric_scatter.png — UMAP colored by rolling-average feature values (opacity = magnitude).
  • pca_feature_correlations.png — PCA-feature correlation heatmap with multivariate R² sidebar; shows which behavioral features are captured by each principal component.
  • pca_feature_profiles.png — Per-feature scatter + binned mean profiles along the top 3 correlated PCs (axes synced so the scatter maps directly onto the trend line).
  • feature_scatter.png — All 91 feature-pair scatter matrix (rolling averages).
  • umap_pairs.png — 10-D UMAP direction pair plots.
  • pca_pairs.png — PCA direction pair plots.

Key options:

| Option | Default | Description |
| --- | --- | --- |
| --subsample N | 10 | Keep every Nth record per trajectory |
| --hdbscan-min-cluster | 15 | HDBSCAN min_cluster_size |
| --umap-neighbors | 15 | UMAP n_neighbors |
| --pre-cluster-dims | 30 | PCA dimensions before clustering |
| --umap-pairs-dims | 10 | Number of UMAP dimensions for pairs plot |
| --metric-only | off | Skip HDBSCAN and 2D UMAP; still runs PCA if --cluster-before-umap |

3. Agent Life Visualization (agent_life_visualization.py)

Plots per-agent timelines showing health, food, water, combat level, inventory, and actions over the agent's lifespan. Optionally overlays cluster assignments.

# Visualize the p75-lifespan agent
python agent_life_visualization.py activation_data/takeru_200M

# Sample 5 random agents, with cluster overlay
python agent_life_visualization.py activation_data/takeru_200M \
    --num_agents 5 --cluster_dir analysis_results/run_dir \
    --output agent_plots/

4. Cluster Life Phase Analysis (cluster_life_phase.py)

Checks whether cluster assignments follow a life-phase pattern (e.g. early-game vs late-game behavior). Bins each agent's lifespan into phases and plots the cluster distribution across phases as a stacked area chart.

python cluster_life_phase.py activation_data/takeru_200M \
    --cluster-dir analysis_results/run_dir \
    --n-bins 20 --min-lifespan 50
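The binning idea can be sketched in plain Python: map each timestep of an agent's life to a phase index in [0, n_bins), then tally cluster labels per phase (a generic sketch of the technique, not the script's actual code):

```python
from collections import Counter

def life_phase_distribution(lifespans, n_bins=20, min_lifespan=50):
    """Tally cluster labels per life phase.

    lifespans: dict mapping agent_id -> list of cluster labels, one per
    timestep of that agent's life (index 0 = birth).
    Returns a list of n_bins Counters, where phase_counts[p][label] is
    how often `label` occurs in phase p across all long-lived agents.
    """
    phase_counts = [Counter() for _ in range(n_bins)]
    for labels in lifespans.values():
        lifespan = len(labels)
        if lifespan < min_lifespan:
            continue  # skip agents that died too early to bin cleanly
        for t, label in enumerate(labels):
            # Scale timestep t to a phase index; clamp the final step.
            phase = min(t * n_bins // lifespan, n_bins - 1)
            phase_counts[phase][label] += 1
    return phase_counts
```

Normalizing each phase's Counter to proportions gives the stacked-area chart described above.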

About

Interpretability on NeuralMMO tournament winners