MCAP Data Loader

A Python library for loading and processing MCAP data files in a form better suited to machine learning and robotics training pipelines.

Features

Dataset-style APIs for iterating MCAP data as episodes/samples
Built-in statistics utilities (dataset-level and episode-level)
Convenient access to topics and attachments
Integration CLI for training with LeRobot using MCAP as the dataset backend

Installation

Install from PyPI:

```
pip install mcap-data-loader
```

Or install from source:

```
git clone https://github.com/OpenGHz/MCAP-DataLoader.git --depth 1
cd MCAP-DataLoader
pip install -e .
```

Quickstart (basic usage)

A basic example showing how to load MCAP files from a directory, inspect statistics, and iterate through episodes/samples:

```python
from mcap_data_loader.datasets.mcap_dataset import (
    McapFlatBuffersEpisodeDataset,
    McapFlatBuffersEpisodeDatasetConfig,
)
from pprint import pprint

dataset = McapFlatBuffersEpisodeDataset(
    McapFlatBuffersEpisodeDatasetConfig(
        data_root="data/example",
        # keys typically include topic names and optional special fields (e.g. "log_stamps")
        keys=["/follow/arm/joint_state/position", "log_stamps"],
    )
)

print(f"All files: {dataset.all_files}")
print(f"Dataset length: {len(dataset)}")

print("Dataset statistics:")
pprint(dataset.statistics())

for episode in dataset:
    print(f"Current file: {episode.config.data_root}")

    for sample in episode:
        print(f"Sample keys: {sample.keys()}")
        break

    print(f"Episode length: {len(episode)}")
    print(f"All topics: {episode.reader.all_topic_names()}")
    print(f"All attachments: {episode.reader.all_attachment_names()}")

    print("Episode statistics:")
    pprint(episode.statistics())
    print("----" * 10)
```

More examples and detailed usage can be found in the examples directory.
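The quickstart prints both dataset-level and episode-level statistics. This README does not document the exact structure `statistics()` returns or how the library derives one level from the other, so the following is only a minimal sketch of one standard way to pool per-episode count, mean, and standard deviation into dataset-level values (the law of total variance) without revisiting every sample. `FeatureStats`, `stats_from_values`, and `pool_stats` are hypothetical names, not part of the library.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class FeatureStats:
    """Summary of one scalar feature (hypothetical structure, not the library's)."""
    count: int
    mean: float
    std: float  # population standard deviation


def stats_from_values(values: list[float]) -> FeatureStats:
    """Compute count/mean/std for a single episode's values."""
    return FeatureStats(len(values), statistics.fmean(values), statistics.pstdev(values))


def pool_stats(episodes: list[FeatureStats]) -> FeatureStats:
    """Combine per-episode stats into dataset-level stats.

    Law of total variance: the pooled variance is the count-weighted mean of
    the episode variances plus the count-weighted spread of the episode means
    around the pooled mean.
    """
    total = sum(e.count for e in episodes)
    if total == 0:
        raise ValueError("cannot pool statistics over zero samples")
    mean = sum(e.count * e.mean for e in episodes) / total
    variance = sum(e.count * (e.std**2 + (e.mean - mean) ** 2) for e in episodes) / total
    return FeatureStats(total, mean, math.sqrt(variance))


# Two toy "episodes" of a single scalar feature.
episode_a = stats_from_values([1.0, 2.0, 3.0])
episode_b = stats_from_values([4.0, 5.0])
dataset_stats = pool_stats([episode_a, episode_b])
print(dataset_stats)
```

Pooling `[1, 2, 3]` and `[4, 5]` this way gives mean 3 and standard deviation √2, the same result as computing directly over `[1, 2, 3, 4, 5]`.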

Integration with LeRobot training

MCAP Data Loader provides a CLI to train LeRobot models using MCAP data files. This allows you to use MCAP datasets directly as the training data source for LeRobot, without needing to convert them into a different format.

This feature requires LeRobot in your environment. Install it from PyPI (version 0.4.3 has been tested):

```
pip install lerobot
```

Train with an MCAP dataset

Run:

```
mcap_lerobot_train -c configs/config.yaml
```

Recommended: place your config file under a configs/ directory in your current working directory.

Configuration reference

The top-level keys follow the standard LeRobot training configuration; an additional mcap section holds the MCAP-specific loading settings:

```yaml
batch_size: 2
num_workers: 1
policy:
  type: act
  push_to_hub: false
  chunk_size: 2
  n_action_steps: 2

dataset:
  root: data
  repo_id: example
  streaming: true

mcap:
  states:
    - /follow/arm/joint_state/position
    - /follow/eef/joint_state/position
  actions:
    - /lead/arm/pose/position
    - /lead/arm/pose/orientation
  images:
    - /env_camera/color/image_raw
```

The topics listed under states and actions are loaded and concatenated to form the observation.state and action features that LeRobot expects, i.e. the low-dimensional state and action inputs of the training data. Each topic listed under images becomes its own entry under observation.images, named after the first part of the topic name: in the example above, /env_camera/color/image_raw becomes observation.images.env_camera.
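The mapping from the mcap section to LeRobot feature keys can be illustrated with a small, self-contained sketch. It assumes each decoded sample is a dict from topic name to a flat list of numbers (or an image object), that topics are concatenated in the order they are listed in the config, and that "first part of the name" means the first path segment of the topic. `assemble_frame` and `image_feature_key` are hypothetical helpers, not the library's internals.

```python
from typing import Any

# The mcap section of the example config, written as a Python dict.
MCAP_CFG: dict[str, list[str]] = {
    "states": ["/follow/arm/joint_state/position", "/follow/eef/joint_state/position"],
    "actions": ["/lead/arm/pose/position", "/lead/arm/pose/orientation"],
    "images": ["/env_camera/color/image_raw"],
}


def image_feature_key(topic: str) -> str:
    """Map an image topic to a feature name using its first path segment (assumed rule)."""
    first_segment = topic.strip("/").split("/")[0]
    return f"observation.images.{first_segment}"


def assemble_frame(sample: dict[str, Any], mcap_cfg: dict[str, list[str]]) -> dict[str, Any]:
    """Build one LeRobot-style frame from per-topic values (hypothetical layout)."""
    frame: dict[str, Any] = {
        # Concatenate state topics in the order they appear in the config.
        "observation.state": [x for topic in mcap_cfg["states"] for x in sample[topic]],
        # Same for action topics.
        "action": [x for topic in mcap_cfg["actions"] for x in sample[topic]],
    }
    for topic in mcap_cfg.get("images", []):
        frame[image_feature_key(topic)] = sample[topic]
    return frame


# Dummy values standing in for one decoded sample.
sample = {
    "/follow/arm/joint_state/position": [0.1, 0.2],
    "/follow/eef/joint_state/position": [0.5],
    "/lead/arm/pose/position": [1.0, 2.0, 3.0],
    "/lead/arm/pose/orientation": [0.0, 0.0, 0.0, 1.0],
    "/env_camera/color/image_raw": "<image>",
}
frame = assemble_frame(sample, MCAP_CFG)
print(sorted(frame))
# ['action', 'observation.images.env_camera', 'observation.state']
```

Because the vectors are concatenated, the dimensions of observation.state and action are the sums of the per-topic dimensions (3 and 7 in this toy sample), so the order of topics in the config determines which slice of each vector belongs to which topic.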

Notes:

dataset.root and dataset.repo_id are reused to specify the MCAP dataset root directory and dataset name.
LeRobot-style command-line overrides are supported and take precedence over values in the config file. For example:
```
mcap_lerobot_train -c configs/config.yaml --dataset.repo_id=example_task
```
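The precedence rule for overrides can be pictured as a dotted key path written on top of the values loaded from the file. The sketch below is only an illustration of that rule, not LeRobot's parser: `apply_overrides` is a hypothetical helper, and it keeps every override value as a string, whereas LeRobot's own CLI also converts types and validates field names.

```python
import copy
from typing import Any


def apply_overrides(config: dict[str, Any], overrides: list[str]) -> dict[str, Any]:
    """Return a copy of `config` with dotted `--a.b=value` overrides applied on top."""
    merged = copy.deepcopy(config)  # leave the file-loaded config untouched
    for arg in overrides:
        key, sep, raw_value = arg.removeprefix("--").partition("=")
        if not sep:
            raise ValueError(f"expected --key=value, got {arg!r}")
        *parents, leaf = key.split(".")
        node = merged
        for part in parents:
            node = node.setdefault(part, {})  # create missing sections
        node[leaf] = raw_value  # sketch keeps values as strings; no type parsing
    return merged


# Values as loaded from configs/config.yaml (abridged).
file_config = {"dataset": {"root": "data", "repo_id": "example", "streaming": True}}
merged = apply_overrides(file_config, ["--dataset.repo_id=example_task"])
print(merged["dataset"])
# {'root': 'data', 'repo_id': 'example_task', 'streaming': True}
```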

Train with LeRobot’s original dataset format

To train on a dataset in LeRobot’s original format while still using this CLI, add the --ori flag:

```
mcap_lerobot_train -c configs/ori.yaml --ori
```

Make sure the dataset path in your config points to the actual LeRobot dataset location.

Help / supported CLI args

Show supported parameters:

```
mcap_lerobot_train -h
```

If the output is long, redirect to a file:

```
mcap_lerobot_train -h > lerobot_help.txt
```

License

See LICENSE.
