Metta-AI/NeuralMMO-interpretability
Welcome to the Platform!

Documentation is hosted on GitHub Pages (github.io).

Installation

After cloning this repo, run:

pip install -e .[dev]

Training

To check that the installation was successful, run the following command in --debug mode (with tracking disabled):

python train.py --debug --no-track

To log the training process, edit the wandb section in config.yaml and remove --no-track from the command line. The config.yaml file contains various configuration settings for the project.
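The exact layout of config.yaml is repo-specific, but a wandb section conventionally carries at least an entity and a project; a hedged sketch (both values below are placeholders, not taken from this repo):

```yaml
wandb:
  entity: your-wandb-entity    # placeholder: your W&B username or team
  project: nmmo-baselines      # placeholder: the W&B project to log runs under
```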

Agent zoo and your custom policy

This baseline comes with four different models under the agent_zoo directory: neurips23_start_kit, yaofeng, takeru, and hybrid. You can use any of these models by specifying the -a argument.

python train.py -a hybrid

You can also create your own policy by creating a new module under the agent_zoo directory, which should contain Policy, Recurrent, and RewardWrapper classes.
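The shape of such a module can be sketched as below. This is a hypothetical skeleton only: in the real repo these classes subclass the baseline's network and wrapper base classes (not shown here), and the constructor signatures will differ.

```python
# agent_zoo/my_policy/__init__.py -- hypothetical skeleton of a custom
# agent_zoo module. The real classes subclass the baseline's policy,
# recurrent-wrapper, and reward-wrapper bases; only the required names
# and rough responsibilities are shown.

class Policy:
    """Maps observations to action logits (sketch only)."""
    def __init__(self, env=None):
        self.env = env

class Recurrent:
    """Wraps a Policy with a recurrent core, e.g. an LSTM (sketch only)."""
    def __init__(self, policy):
        self.policy = policy

class RewardWrapper:
    """Reshapes the environment reward signal for training (sketch only)."""
    def __init__(self, env=None):
        self.env = env
```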

Curriculum Learning using Syllabus

The training script supports automatic curriculum learning using the Syllabus library. To use it, add --syllabus to the command line.

python train.py --syllabus

Replay generation

The policies directory contains a set of trained policies. For your models, create a directory and copy the checkpoint files to it. To generate a replay, run the following command:

python train.py -m replay -p policies

The replay file ends with .replay.lzma. You can view the replay using the web viewer.
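The .replay.lzma extension indicates LZMA compression, so if you want to inspect a replay outside the web viewer, Python's standard library can decompress it (the structure of the decompressed payload is not documented here, so this sketch only returns raw bytes):

```python
import lzma

def read_replay(path):
    """Decompress a .replay.lzma file and return its raw bytes.

    The payload format is repo-specific; inspect the returned bytes
    to see how to parse them further.
    """
    with lzma.open(path, "rb") as f:
        return f.read()
```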

Evaluation

The evaluation script supports the pvp and pve modes. The pve mode spawns all agents using only one policy. The pvp mode spawns groups of agents, each controlled by a different policy.

To evaluate models in the policies directory, run the following command:

python evaluate.py policies pvp -r 10

With -r 10, this generates 10 results JSON files in the same directory, each containing the results of 200 episodes. The task-completion metrics can then be viewed with:

python analysis/proc_eval_result.py policies
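If you want to post-process the results files yourself rather than use proc_eval_result.py, collecting them is straightforward; a minimal sketch (the per-file schema is repo-specific, so this only gathers the parsed JSON objects):

```python
import glob
import json
import os

def load_eval_results(results_dir):
    """Load every *.json results file in results_dir into a list of dicts.

    Files are read in sorted order; interpreting their contents is left
    to the caller, since the schema depends on the evaluation script.
    """
    results = []
    for path in sorted(glob.glob(os.path.join(results_dir, "*.json"))):
        with open(path) as f:
            results.append(json.load(f))
    return results
```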

Interpretability Tools

A pipeline for extracting, clustering, and analyzing hidden-state activations from trained policies. The typical workflow is:

  1. Extract activations from a trained policy
  2. Cluster and analyze the activation space
  3. Visualize agent behavior and cluster structure

1. Extract Activations (extract_activations.py)

Records per-timestep hidden-state activations from the action decoder, along with observations and actions, during policy evaluation.

# Extract from a single policy
python extract_activations.py pve -p takeru -o activation_data

# List available policies
python extract_activations.py --list

# Quick smoke test
python extract_activations.py --smoke-test

Outputs a directory per policy under activation_data/ containing activations.json with per-timestep records of activations, observations, and actions.
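Loading those records back for your own analysis can be sketched as follows. The record field name "activation" is an assumption about the schema of activations.json, not something documented above; adjust it to match the real files.

```python
import json

def load_activation_vectors(path, vector_key="activation"):
    """Read an activations.json file and return its activation vectors.

    Assumes the file holds a list of per-timestep records; `vector_key`
    (default "activation") is a guess at the field holding the
    hidden-state vector and may need adjusting for the real schema.
    """
    with open(path) as f:
        records = json.load(f)
    return [rec[vector_key] for rec in records if vector_key in rec]
```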

2. Analyze Activations (analyze_activations.py)

Clusters activation vectors using UMAP + HDBSCAN and tests whether clusters correspond to interpretable behavioral features. Computes Cohen's d effect sizes vs random baselines, checks for temporal/agent confounds, and generates multiple visualization modes.

# Standard clustering analysis
python analyze_activations.py activation_data/takeru_200M --subsample 10

# High-D clustering: PCA first, then HDBSCAN, then UMAP for viz
python analyze_activations.py activation_data/takeru_200M \
    --cluster-before-umap --pre-cluster-dims 30

# PCA-feature correlation analysis only (fast, no UMAP/HDBSCAN)
python analyze_activations.py activation_data/takeru_200M \
    --metric-only --cluster-before-umap --pre-cluster-dims 30

# All visualization modes
python analyze_activations.py activation_data/takeru_200M \
    --cluster-before-umap --pre-cluster-dims 30 \
    --feature-scatter --umap-pairs --dendrogram-explorer

Behavioral features tracked (14 total): n_visible_entities, n_visible_npcs, n_visible_players, self_health, self_food, self_water, self_gold, max_combat_level, in_combat, n_inventory_items, tick, is_moving, is_attacking, is_trading.
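The Cohen's d effect sizes comparing a feature's values inside a cluster against a random baseline follow the standard pooled-standard-deviation formula; a generic implementation (a sketch of the statistic, not the repo's exact code):

```python
import math

def cohens_d(group_a, group_b):
    """Cohen's d with pooled standard deviation.

    Measures how many pooled standard deviations separate the means of
    two samples, e.g. a behavioral feature inside one cluster (group_a)
    versus a random baseline sample (group_b).
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Unbiased sample variances (n - 1 in the denominator).
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    pooled = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled
```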

Visualization modes:

| Flag | Output | Description |
| --- | --- | --- |
| (default) | umap_scatter.png | UMAP embedding colored by HDBSCAN cluster |
| (default) | cluster_features.png | Heatmap of Cohen's d per feature per cluster |
| (default) | umap_metric_scatter.png | UMAP colored by rolling-average feature values (opacity-scaled) |
| --cluster-before-umap | pca_pairs.png | PCA direction pair plots (PC1&2, PC3&4, ...) |
| --cluster-before-umap | pca_feature_correlations.png | Heatmap of Pearson r between PCs and features, with multivariate R² |
| --cluster-before-umap | pca_feature_profiles.png | Per-feature scatter + binned mean profiles along top 3 correlated PCs |
| --feature-scatter | feature_scatter.png | All 91 feature-pair scatter matrix |
| --umap-pairs | umap_pairs.png | N-D UMAP direction pair plots on raw activations |
| --dendrogram-explorer | dendrogram_explorer.html | Interactive Plotly slider over HDBSCAN condensed tree hierarchy |

Example outputs (takeru 200M policy, 30-D PCA, subsample=20):

  • umap_scatter.png — UMAP embedding colored by HDBSCAN cluster labels.
  • cluster_features.png — Cohen's d effect sizes per cluster; identifies which behavioral features distinguish each cluster from random baselines.
  • umap_metric_scatter.png — UMAP colored by rolling-average feature values (opacity = magnitude).
  • pca_feature_correlations.png — PCA-feature correlation heatmap with multivariate R² sidebar; shows which behavioral features are captured by each principal component.
  • pca_feature_profiles.png — Per-feature scatter + binned mean profiles along the top 3 correlated PCs (axes synced so the scatter maps directly onto the trend line).
  • feature_scatter.png — All 91 feature-pair scatter matrix (rolling averages).
  • umap_pairs.png — 10-D UMAP direction pair plots.
  • pca_pairs.png — PCA direction pair plots.

Key options:

| Option | Default | Description |
| --- | --- | --- |
| --subsample N | 10 | Keep every Nth record per trajectory |
| --hdbscan-min-cluster | 15 | HDBSCAN min_cluster_size |
| --umap-neighbors | 15 | UMAP n_neighbors |
| --pre-cluster-dims | 30 | PCA dimensions before clustering |
| --umap-pairs-dims | 10 | Number of UMAP dimensions for pairs plot |
| --metric-only | off | Skip HDBSCAN and 2D UMAP; still runs PCA if --cluster-before-umap |

3. Agent Life Visualization (agent_life_visualization.py)

Plots per-agent timelines showing health, food, water, combat level, inventory, and actions over the agent's lifespan. Optionally overlays cluster assignments.

# Visualize the p75-lifespan agent
python agent_life_visualization.py activation_data/takeru_200M

# Sample 5 random agents, with cluster overlay
python agent_life_visualization.py activation_data/takeru_200M \
    --num_agents 5 --cluster_dir analysis_results/run_dir \
    --output agent_plots/

4. Cluster Life Phase Analysis (cluster_life_phase.py)

Checks whether cluster assignments follow a life-phase pattern (e.g. early-game vs late-game behavior). Bins each agent's lifespan into phases and plots the cluster distribution across phases as a stacked area chart.

python cluster_life_phase.py activation_data/takeru_200M \
    --cluster-dir analysis_results/run_dir \
    --n-bins 20 --min-lifespan 50
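The binning idea can be sketched in plain Python: map each timestep of an agent's life to a phase index in [0, n_bins), then tally cluster labels per phase (a generic sketch of the technique, not the script's actual code):

```python
from collections import Counter

def life_phase_distribution(lifespans, n_bins=20, min_lifespan=50):
    """Tally cluster labels per life phase.

    lifespans: dict mapping agent_id -> list of cluster labels, one per
    timestep of that agent's life (index 0 = birth).
    Returns a list of n_bins Counters, where phase_counts[p][label] is
    how often `label` occurs in phase p across all long-lived agents.
    """
    phase_counts = [Counter() for _ in range(n_bins)]
    for labels in lifespans.values():
        lifespan = len(labels)
        if lifespan < min_lifespan:
            continue  # skip agents that died too early to bin cleanly
        for t, label in enumerate(labels):
            # Scale timestep t to a phase index; clamp the final step.
            phase = min(t * n_bins // lifespan, n_bins - 1)
            phase_counts[phase][label] += 1
    return phase_counts
```

Normalizing each phase's Counter to proportions gives the stacked-area chart described above.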

About

Interpretability on NeuralMMO tournament winners