A Python library for loading and processing MCAP data files in a form suited to machine learning and robotics training pipelines.
- Dataset-style APIs for iterating MCAP data as episodes/samples
- Built-in statistics utilities (dataset-level and episode-level)
- Convenient access to topics and attachments
- Integration CLI for training with LeRobot using MCAP as the dataset backend
Install from PyPI:
pip install mcap-data-loader
Or install from source:
git clone https://github.com/OpenGHz/MCAP-DataLoader.git --depth 1
cd MCAP-DataLoader
pip install -e .
A basic example showing how to load MCAP files from a directory, inspect statistics, and iterate through episodes/samples:
from mcap_data_loader.datasets.mcap_dataset import (
    McapFlatBuffersEpisodeDataset,
    McapFlatBuffersEpisodeDatasetConfig,
)
from pprint import pprint

dataset = McapFlatBuffersEpisodeDataset(
    McapFlatBuffersEpisodeDatasetConfig(
        data_root="data/example",
        # keys typically include topic names and optional special fields (e.g. "log_stamps")
        keys=["/follow/arm/joint_state/position", "log_stamps"],
    )
)
print(f"All files: {dataset.all_files}")
print(f"Dataset length: {len(dataset)}")
print("Dataset statistics:")
pprint(dataset.statistics())
for episode in dataset:
    print(f"Current file: {episode.config.data_root}")
    for sample in episode:
        print(f"Sample keys: {sample.keys()}")
        break
    print(f"Episode length: {len(episode)}")
    print(f"All topics: {episode.reader.all_topic_names()}")
    print(f"All attachments: {episode.reader.all_attachment_names()}")
    print("Episode statistics:")
    pprint(episode.statistics())
    print("----" * 10)
More examples and detailed usage can be found in the examples directory.
MCAP Data Loader provides a CLI to train LeRobot models using MCAP data files. This allows you to use MCAP datasets directly as the training data source for LeRobot, without needing to convert them into a different format.
You should have LeRobot installed in your environment to use this feature. You can install it from PyPI (0.4.3 is tested):
pip install lerobot
Run:
mcap_lerobot_train -c configs/config.yaml
Recommended: place your config file under a configs/ directory in your current working directory.
The top level is the standard LeRobot configuration, with an additional mcap section for MCAP dataset loading settings:
batch_size: 2
num_workers: 1
policy:
  type: act
  push_to_hub: false
  chunk_size: 2
  n_action_steps: 2
dataset:
  root: data
  repo_id: example
  streaming: true
mcap:
  states:
    - /follow/arm/joint_state/position
    - /follow/eef/joint_state/position
  actions:
    - /lead/arm/pose/position
    - /lead/arm/pose/orientation
  images:
    - /env_camera/color/image_raw
The topics listed under states and actions are loaded and concatenated to form the observation.state and action fields that LeRobot requires, serving as the low-dimensional state and action inputs in the training data. Each topic under images is added to the observation.images field, with the first part of the topic name (e.g., env_camera in the example above) used as the suffix, giving observation.images.env_camera for use during training.
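The mapping described above can be illustrated with a small, self-contained sketch. It is not the library's implementation: `build_lerobot_sample` and `image_key` are hypothetical helpers, plain Python lists stand in for whatever array type the loader actually returns, and the sample values are made up for illustration.

```python
def image_key(topic):
    """Build an image field name from the first segment of a topic name."""
    first_segment = topic.strip("/").split("/")[0]
    return f"observation.images.{first_segment}"


def build_lerobot_sample(sample, states, actions, images):
    """Assemble LeRobot-style fields from one sample (hypothetical helper)."""
    out = {
        # Concatenate per-topic vectors in the order given in the config.
        "observation.state": [v for topic in states for v in sample[topic]],
        "action": [v for topic in actions for v in sample[topic]],
    }
    for topic in images:
        out[image_key(topic)] = sample[topic]
    return out


# Toy sample: lists and a placeholder string stand in for real arrays and images.
sample = {
    "/follow/arm/joint_state/position": [0.1, 0.2, 0.3],
    "/follow/eef/joint_state/position": [0.5],
    "/lead/arm/pose/position": [1.0, 2.0, 3.0],
    "/lead/arm/pose/orientation": [0.0, 0.0, 0.0, 1.0],
    "/env_camera/color/image_raw": "<image placeholder>",
}

result = build_lerobot_sample(
    sample,
    states=["/follow/arm/joint_state/position", "/follow/eef/joint_state/position"],
    actions=["/lead/arm/pose/position", "/lead/arm/pose/orientation"],
    images=["/env_camera/color/image_raw"],
)
print(sorted(result))
```

Here the two state topics contribute 3 + 1 values to observation.state, and the two action topics contribute 3 + 4 values to action, in the order they appear in the config.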
Notes:
- dataset.root and dataset.repo_id are reused to specify the MCAP dataset root directory and dataset name.
- Command-line overrides compatible with LeRobot are supported and take the highest priority (they override values in the config file). For example:
mcap_lerobot_train -c configs/config.yaml --dataset.repo_id=example_task
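The precedence rule (command-line values win over the config file) can be sketched as follows. This is only an illustration of the merge order, not how the CLI or LeRobot parses arguments; `apply_overrides` is a hypothetical helper, and override values are kept as plain strings.

```python
import copy


def apply_overrides(config, args):
    """Return a copy of config with --a.b=value overrides applied (hypothetical helper)."""
    merged = copy.deepcopy(config)
    for arg in args:
        if not arg.startswith("--") or "=" not in arg:
            continue  # skip arguments that are not key=value overrides
        dotted_key, value = arg[2:].split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = merged
        for name in parents:
            # Walk (or create) the nested section for each dotted segment.
            node = node.setdefault(name, {})
        node[leaf] = value  # values stay strings in this sketch
    return merged


# Values as they would be read from the config file.
file_config = {"dataset": {"root": "data", "repo_id": "example"}}

merged = apply_overrides(file_config, ["--dataset.repo_id=example_task"])
print(merged["dataset"])
```

The file value of dataset.repo_id is replaced by example_task, while dataset.root, which has no override, keeps its value from the file.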
If you want to use LeRobot’s original data format (while still using this CLI), add --ori:
mcap_lerobot_train -c configs/ori.yaml --ori
Make sure the dataset path in your config points to the actual LeRobot dataset location.
Show supported parameters:
mcap_lerobot_train -h
If the output is long, redirect to a file:
mcap_lerobot_train -h > lerobot_help.txt
See LICENSE.