GitHub - KAIST-Visual-AI-Group/DFP: Official Implementation of Drifting Field Policy: A One-Step Generative Policy via Wasserstein Gradient Flow

Drifting Field Policy: A One-Step Generative Policy via Wasserstein Gradient Flow

Juil Koo · Mingue Park · Jiwon Choi · Yunhong Min · Minhyuk Sung

KAIST

Preprint

DFP directly updates a one-step generative policy in action space through a drifting-field objective, avoiding ODE trajectory-level credit assignment during offline-to-online RL fine-tuning.

Overview

Drifting Field Policy (DFP) is a non-ODE one-step generative policy for offline-to-online reinforcement learning. DFP represents the policy as a single-pass pushforward map from Gaussian noise to actions, and frames policy improvement as a Wasserstein-2 gradient flow toward the soft policy improvement target.

Because the exact soft target is intractable, DFP uses a practical top-K critic-selected action surrogate: it samples candidate actions from the current policy, selects high-value actions with the critic, and trains the drifting field toward those actions. This keeps inference one-step while directly applying online reward signals at the action level.
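The candidate-selection step described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the repository's implementation: `one_step_policy`, `toy_critic`, and `select_top_k` are hypothetical names, the policy is a toy linear map with a `tanh` squashing, the critic is a hand-written quadratic, and the drifting-field loss itself is not shown.

```python
import numpy as np

def one_step_policy(weights, obs, noise):
    """Toy stand-in for the one-step policy: maps (observation, noise) to an action in one pass."""
    return np.tanh(np.concatenate([obs, noise], axis=-1) @ weights)

def toy_critic(obs, actions):
    """Toy stand-in for Q(s, a): larger when the action is close to a fixed target."""
    target = np.full(actions.shape[-1], 0.5)
    return -np.sum((actions - target) ** 2, axis=-1)

def select_top_k(obs, weights, num_candidates, k, rng):
    """Sample candidate actions from the current policy and keep the k with the highest critic value."""
    action_dim = weights.shape[1]
    # Assumption: the noise has the same dimensionality as the action.
    noise = rng.standard_normal((num_candidates, action_dim))
    obs_batch = np.broadcast_to(obs, (num_candidates, obs.shape[-1]))
    candidates = one_step_policy(weights, obs_batch, noise)
    q_values = toy_critic(obs_batch, candidates)
    # argsort is ascending, so take the last k indices and reverse them.
    top_idx = np.argsort(q_values)[-k:][::-1]
    return candidates[top_idx], q_values[top_idx]

rng = np.random.default_rng(0)
obs_dim, action_dim = 3, 2
weights = 0.5 * rng.standard_normal((obs_dim + action_dim, action_dim))
obs = rng.standard_normal(obs_dim)

targets, target_q = select_top_k(obs, weights, num_candidates=32, k=4, rng=rng)
print(targets.shape)  # the k selected actions that would serve as training targets
print(target_q)       # their critic values, highest first
```

In this sketch the selected actions play the role of the surrogate targets; inference still needs only a single call to `one_step_policy`.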

This release contains:

drift: Drifting Field Policy.
meanflow: Mean Velocity Policy comparison backbone.
acfql: QC/FQL baseline retained from the original action-chunking codebase.

Environment and Requirements

Tested Environment

Python: 3.10
CUDA: 12.x
Benchmarks: Robomimic, OGBench

Installation

conda env create -f environment.yml
conda activate dfp

Or install the pip dependencies manually:

conda create -n dfp python=3.10 pip -y
conda activate dfp
pip install -r requirements.txt

Datasets

Robomimic

Place the low-dimensional Robomimic datasets under the standard Robomimic directory:

~/.robomimic/lift/mh/low_dim_v15.hdf5
~/.robomimic/can/mh/low_dim_v15.hdf5
~/.robomimic/square/mh/low_dim_v15.hdf5

If your datasets live elsewhere, set:

export ROBOMIMIC_DATASET_DIR=/path/to/robomimic

The datasets can be downloaded from the Robomimic dataset page: https://robomimic.github.io/docs/datasets/robomimic_v0.1.html
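Before launching a run it can help to check that the files are where the code will look. The helper below is a hypothetical sketch, not part of the repository: it assumes that ROBOMIMIC_DATASET_DIR replaces the ~/.robomimic root and that the task/mh/low_dim_v15.hdf5 layout shown above is kept underneath it.

```python
import os
from pathlib import Path

def robomimic_dataset_path(task, dataset_type="mh", filename="low_dim_v15.hdf5"):
    """Resolve the expected dataset path for a task (hypothetical helper).

    Assumption: ROBOMIMIC_DATASET_DIR, when set, replaces the ~/.robomimic root
    and the <task>/<dataset_type>/<filename> layout stays the same.
    """
    root = Path(os.environ.get("ROBOMIMIC_DATASET_DIR", Path.home() / ".robomimic"))
    return root / task / dataset_type / filename

for task in ("lift", "can", "square"):
    path = robomimic_dataset_path(task)
    status = "found" if path.is_file() else "missing"
    print(f"{status}: {path}")
```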

OGBench Cube-Quadruple

For cube-quadruple, we use the 100M offline dataset (cube-quadruple-play-100m-v0):

wget -r -np -nH --cut-dirs=2 -A "*.npz" \
  https://rail.eecs.berkeley.edu/datasets/ogbench/cube-quadruple-play-100m-v0/

Pass the downloaded directory with:

--ogbench_dataset_dir=/path/to/cube-quadruple-play-100m-v0
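Because the download is large and recursive, a quick listing of what actually arrived can catch an incomplete transfer. The following is a minimal sketch with a hypothetical helper name (`list_npz_files`); it only counts .npz files and their sizes and makes no assumption about the arrays stored inside them.

```python
from pathlib import Path

def list_npz_files(dataset_dir):
    """Return (path, size_in_bytes) for every .npz file under dataset_dir (hypothetical helper)."""
    root = Path(dataset_dir).expanduser()
    if not root.is_dir():
        return []
    return [(p, p.stat().st_size) for p in sorted(root.rglob("*.npz"))]

# Replace the placeholder with the directory passed to --ogbench_dataset_dir.
files = list_npz_files("/path/to/cube-quadruple-play-100m-v0")
total_gib = sum(size for _, size in files) / 2**30
print(f"{len(files)} .npz files, {total_gib:.2f} GiB")
```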

MVP Baseline Note

MVP (Mean Velocity Policy, run with --agent_config=meanflow) is our main comparison baseline. Since no official implementation was available, we implemented it ourselves. Most hyperparameters follow the MVP paper, but for the cube-triple experiments we set ivc_lambda=0 because it gave the strongest performance in our runs.

We were not able to fully reproduce the performance reported in the MVP paper across all settings, so the MVP results in our experiments use the best-performing configuration we found.

Usage

The main results are offline-to-online runs. Each command first trains on the offline dataset and then continues online fine-tuning in the same run.

# DFP
MUJOCO_GL=egl python main.py --agent_config=drift --run_group=reproduce --env_name=cube-triple-play-singletask-task2-v0 --sparse=False --horizon_length=5

# MVP
MUJOCO_GL=egl python main.py --agent_config=meanflow --run_group=reproduce --env_name=cube-triple-play-singletask-task2-v0 --sparse=False --horizon_length=5

# QC-BFN
MUJOCO_GL=egl python main.py --run_group=reproduce --agent.actor_type=best-of-n --agent.actor_num_samples=32 --env_name=cube-triple-play-singletask-task2-v0 --sparse=False --horizon_length=5

# QC-FQL
MUJOCO_GL=egl python main.py --run_group=reproduce --agent.alpha=100 --env_name=cube-triple-play-singletask-task2-v0 --sparse=False --horizon_length=5

# BFN
MUJOCO_GL=egl python main.py --run_group=reproduce --agent.actor_type=best-of-n --agent.actor_num_samples=4 --env_name=cube-triple-play-singletask-task2-v0 --sparse=False --horizon_length=1

# FQL
MUJOCO_GL=egl python main.py --run_group=reproduce --agent.alpha=100 --env_name=cube-triple-play-singletask-task2-v0 --sparse=False --horizon_length=1

The default agent is acfql, so the QC-BFN, QC-FQL, BFN, and FQL commands do not need an explicit --agent_config=acfql.

To run a different environment, override --env_name; for cube-quadruple, also pass --ogbench_dataset_dir:

MUJOCO_GL=egl python main.py \
  --agent_config=drift \
  --run_group=reproduce \
  --env_name=cube-quadruple-play-100m-singletask-task3-v0 \
  --ogbench_dataset_dir=/path/to/cube-quadruple-play-100m-v0 \
  --seed=42
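When sweeping over methods or seeds, the method-specific flags listed above can be collected in one place. The script below is a hypothetical convenience sketch, not something shipped with the repository: `METHOD_FLAGS` and `build_command` are illustrative names, and every flag is copied from the commands in this section. It only prints the command lines; it does not launch them.

```python
import shlex

# Flags that differ between methods, copied from the commands in this section.
METHOD_FLAGS = {
    "DFP": ["--agent_config=drift", "--horizon_length=5"],
    "MVP": ["--agent_config=meanflow", "--horizon_length=5"],
    "QC-BFN": ["--agent.actor_type=best-of-n", "--agent.actor_num_samples=32", "--horizon_length=5"],
    "QC-FQL": ["--agent.alpha=100", "--horizon_length=5"],
    "BFN": ["--agent.actor_type=best-of-n", "--agent.actor_num_samples=4", "--horizon_length=1"],
    "FQL": ["--agent.alpha=100", "--horizon_length=1"],
}

def build_command(method, env_name, seed=None, run_group="reproduce"):
    """Assemble the argv list for one offline-to-online run (hypothetical helper)."""
    argv = [
        "python", "main.py",
        *METHOD_FLAGS[method],
        f"--run_group={run_group}",
        f"--env_name={env_name}",
        "--sparse=False",
    ]
    if seed is not None:
        argv.append(f"--seed={seed}")
    return argv

for method in METHOD_FLAGS:
    argv = build_command(method, "cube-triple-play-singletask-task2-v0", seed=42)
    print("MUJOCO_GL=egl " + shlex.join(argv))
```

Keeping the flags in a dictionary makes it explicit that the baselines differ from each other only in the actor settings and in horizon_length.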

Online-Only From an Offline Checkpoint

To skip offline training and start online fine-tuning from a saved offline checkpoint, pass the checkpoint and set restore_epoch to the offline training horizon:

MUJOCO_GL=egl python main.py \
  --agent_config=drift \
  --run_group=reproduce \
  --env_name=cube-triple-play-singletask-task3-v0 \
  --restore_path=/path/to/params_offline_final.pkl \
  --restore_epoch=1000000 \
  --seed=42

Repository Layout

| Path | Description |
| --- | --- |
| `agents/` | DFP, MVP, and QC/FQL baseline agents |
| `config/` | Main, evaluation, optimizer, and agent configs |
| `envs/` | Robomimic, OGBench, and D4RL environment utilities |
| `utils/` | Datasets, networks, drifting loss, logging, and Flax utilities |

Citation

If you find our work useful, please consider citing:

@article{koo2026drifting,
  title={Drifting Field Policy: A One-Step Generative Policy via Wasserstein Gradient Flow},
  author={Koo, Juil and Park, Mingue and Choi, Jiwon and Min, Yunhong and Sung, Minhyuk},
  journal={arXiv preprint arXiv:2605.07727},
  year={2026}
}

Acknowledgements

This repository builds on the Q-chunking/FQL codebase.
