Cut PyTorch model loading time from 15s to 0.2s with zero-copy shared memory caching.
Overmind is a non-intrusive caching library that dramatically speeds up PyTorch model loading by storing serialized models in shared memory. Once a model is loaded, subsequent loads from any process take milliseconds instead of seconds.
Named after the Overmind from StarCraft, it coordinates model caching across processes like the Overmind coordinates the Zerg Swarm.
Note that the package name on PyPI is overmind-cache, since overmind is taken.
- Fast model loading - First load caches to shared memory; subsequent loads are ~5x faster
- Process-agnostic - Cache persists across process restarts via a background server
- Non-intrusive - Just add one line of code; no changes to model loading logic
- Memory efficient - Multiple processes share the same cached tensors in memory
- Broad compatibility - Works with diffusers, transformers, bitsandbytes quantization, and vanilla torch.load
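The core idea is that cached tensor data lives in a named shared-memory block that any process can map without copying. The snippet below is a minimal sketch of that idea using Python's standard multiprocessing.shared_memory; it is purely illustrative and not Overmind's actual implementation:

```python
import numpy as np
import torch
from multiprocessing import shared_memory

# Producer: copy a tensor's data into a named shared-memory block once.
tensor = torch.randn(4, 4)
arr = tensor.numpy()
shm = shared_memory.SharedMemory(create=True, size=arr.nbytes, name="demo_weights")
shared_arr = np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)
shared_arr[:] = arr

# Consumer (could be another process): map the same block by name and rebuild
# the tensor as a view over the shared buffer, with no extra copy. In a real
# setup the shape/dtype metadata would be shared alongside the block.
shm2 = shared_memory.SharedMemory(name="demo_weights")
view = np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm2.buf)
restored = torch.from_numpy(view)
print(torch.equal(tensor, restored))  # True

# Cleanup; in real use a cache server would own the block's lifetime.
del restored, view, shared_arr
shm2.close()
shm.close()
shm.unlink()
```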
```bash
pip install overmind-cache
```

Or install from source:

```bash
git clone https://github.com/taichi-dev/overmind.git
cd overmind
pip install -e .
```

Add a single line at the top of your script to automatically accelerate all supported model loading:
```python
import overmind.api
overmind.api.monkey_patch_all()

# Your existing code works unchanged!
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
)
pipeline.to('cuda')

# First run: ~24s
# Subsequent runs: ~1s (mostly spent in .to('cuda'))
```

If you prefer not to monkey-patch, use the load function directly:
```python
import torch
from overmind.api import load
from diffusers import DiffusionPipeline

pipeline = load(
    DiffusionPipeline.from_pretrained,
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
)
```

Overmind automatically patches these loading functions:
| Library | Functions |
|---|---|
| Diffusers | DiffusionPipeline.from_pretrained, ModelMixin.from_pretrained, SchedulerMixin.from_pretrained, FromSingleFileMixin.from_single_file |
| Transformers | PreTrainedModel.from_pretrained, PreTrainedTokenizerBase.from_pretrained, AutoProcessor.from_pretrained, pipeline |
| PyTorch | torch.load, torch.jit.load |
| Safetensors | safetensors.torch.load_file |
| TorchVision | vgg16, vgg19 |
| OpenCLIP | create_model_and_transforms |
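For example, once monkey_patch_all() has been called, an ordinary torch.load call is served from the cache as well. The checkpoint path below is a placeholder used only for illustration:

```python
import overmind.api
overmind.api.monkey_patch_all()

import torch

# "checkpoint.pt" is a placeholder path; any torch.load call is now cached.
state_dict = torch.load("checkpoint.pt", map_location="cpu")
```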
Create an overmind.cfg file in your package root to add custom patch points:
```
# overmind.cfg
mylib.models::MyModel.from_pretrained
mylib.utils::load_checkpoint
```
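Each entry appears to take the form module.path::attribute, i.e. an importable module followed by a callable inside it. A hypothetical mylib matching the entries above might look like this:

```python
# mylib/models.py  (hypothetical package referenced by the cfg above)
class MyModel:
    @classmethod
    def from_pretrained(cls, name, **kwargs):
        # Existing loading logic; Overmind wraps this classmethod via the
        # mylib.models::MyModel.from_pretrained entry.
        ...

# mylib/utils.py
def load_checkpoint(path):
    # Existing checkpoint loader; wrapped via mylib.utils::load_checkpoint.
    ...
```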
```bash
# Start the server manually (usually auto-started)
overmind-server

# Start as a daemon
overmind-server --daemon

# List cached models
overmind-list

# Shut down the server (clears the cache)
overmind-shutdown
```

The following environment variables control Overmind's behavior:

| Variable | Description |
|---|---|
| OVERMIND_DISABLE | Set to any value to disable Overmind, falling back to a local cache |
| OVERMIND_NO_LOCAL_CACHE | Disable local caching as well |
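For example, to disable Overmind for a single script, set the variable before anything from Overmind is imported (this assumes the variable is read when overmind.api initializes):

```python
import os

# Assumption: OVERMIND_DISABLE is read when Overmind initializes, so set it
# before importing overmind.api.
os.environ["OVERMIND_DISABLE"] = "1"

import overmind.api
overmind.api.monkey_patch_all()  # falls back to the local cache
```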
Loading a Stable Diffusion ControlNet pipeline with VAE on Linux (Intel i9-11900K, RTX 4090), using demo-vae.py as an example:

| Run | vae | depth | edge | pipeline | to('cuda') | Total |
|---|---|---|---|---|---|---|
| w/o Overmind (2nd+) | 1.18s | 0.98s | 1.41s | 1.65s | 0.91s | 6.16s |
| w/ Overmind (1st) | 5.44s | 5.17s | 5.41s | 7.29s | 0.86s | 24.20s |
| w/ Overmind (2nd+) | 0.00s | 0.01s | 0.01s | 0.20s | 0.87s | 1.12s |
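A simple way to reproduce this kind of measurement is to time each step. The sketch below is a generic example, not the actual demo-vae.py (which additionally loads VAE and ControlNet models):

```python
import time

import torch
import overmind.api
overmind.api.monkey_patch_all()

from diffusers import DiffusionPipeline

t0 = time.perf_counter()
pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)
t1 = time.perf_counter()
pipeline.to("cuda")
t2 = time.perf_counter()

print(f"load: {t1 - t0:.2f}s  to('cuda'): {t2 - t1:.2f}s")
```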
The first load with Overmind is slower due to pickling overhead. Subsequent loads are 5-6x faster than without Overmind, with the only remaining cost being the to('cuda') transfer.
Apache 2.0
Contributions are welcome! Please feel free to submit a Pull Request.
Developed by Taichi Graphics for production AI inference workloads.