BalatroBench

Benchmark LLMs playing Balatro

BalatroBench is a benchmark analysis tool and leaderboard for BalatroLLM runs. It processes the game data from those runs and generates interactive leaderboards comparing LLMs and strategies playing Balatro.

Note

You can download all the run and benchmark data from Kaggle.

🚀 Related Projects

BalatroBot: API for developing Balatro bots
BalatroLLM: Play Balatro with LLMs
BalatroBench: Benchmark LLMs playing Balatro

📚 Documentation

Important

This is the documentation for analyzing the run artifacts produced by BalatroLLM. BalatroBench parses that data and presents it as a website.

Requirements

uv - Python package manager (installation steps below)
Node.js - JavaScript runtime (includes npm), required only for the Playwright tests
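Before installing, you can confirm that the required tools are on your PATH. This is a bash sketch using a hypothetical helper, require_cmds, which is not part of the repository.

```shell
# Hypothetical helper (not part of the repository): report any listed command
# that is not on PATH and return non-zero if at least one is missing.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# uv is always needed; node and npm only for the Playwright tests.
require_cmds uv node npm || echo "Install the missing tools listed above." >&2
```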

Installation

Follow these steps to set up BalatroBench:

Install uv

Install the uv Python package manager:

```
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

See the uv installation docs for more options.

Clone the repository

```
git clone https://github.com/coder/balatrobench.git
cd balatrobench
```

Configure environment variables

Copy the example environment file and fill in your values:
```
cp .envrc.example .envrc
```

Edit .envrc and set the following variables (required for uploading benchmarks to the CDN):

- BUNNY_BASE_URL - BunnyCDN base URL
- BUNNY_STORAGE_ZONE - Storage zone name
- BUNNY_API_KEY - API key for authentication
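
Before uploading, a quick check that all three variables are set can catch a missing value early. check_bunny_env is a hypothetical bash helper (it relies on bash's indirect expansion, ${!name}), not part of the repository; run it after source .envrc.

```shell
# Hypothetical pre-upload check (not part of the repository): return non-zero
# and name each BunnyCDN variable that is unset or empty.
check_bunny_env() {
  local missing=0 var
  for var in BUNNY_BASE_URL BUNNY_STORAGE_ZONE BUNNY_API_KEY; do
    # ${!var:-} is the value of the variable whose name is stored in $var.
    if [ -z "${!var:-}" ]; then
      echo "not set: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```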

Install dependencies

```
make install
```

This runs uv sync for Python packages and npm install for Playwright tests.

Activate the environment

```
source .envrc
```

Alternatively, use direnv to automatically load the environment:

```
# Install direnv, then allow the directory
direnv allow
```

Install browser binaries (first time only)

```
npx playwright install chromium
```

Generating Benchmarks

Generate benchmark data from BalatroLLM runs:

```
# Analyze runs from a specific directory
balatrobench --input-dir /path/to/runs/v1.0.0

# Custom output directory
balatrobench --input-dir /path/to/runs/v1.0.0 --output-dir /path/to/output

# Enable WebP conversion for screenshots
balatrobench --input-dir /path/to/runs/v1.0.0 --webp
```
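
If you keep several run versions side by side (for example /path/to/runs/v1.0.0 and /path/to/runs/v1.1.0), a small loop can process each one. generate_all is a hypothetical helper that uses only the --input-dir and --output-dir flags shown above; writing each version to its own subdirectory of the output directory is an assumption of this sketch, not documented balatrobench behavior.

```shell
# Hypothetical helper: run balatrobench once for every version directory under
# a runs root. Writing each version to <output_root>/<version> is an assumption
# of this sketch, not documented balatrobench behavior.
generate_all() {
  local runs_root=$1 output_root=$2 dir name
  for dir in "$runs_root"/*/; do
    [ -d "$dir" ] || continue   # the glob matched nothing
    name=$(basename "$dir")
    balatrobench --input-dir "${dir%/}" --output-dir "$output_root/$name" || return 1
  done
}
```

For example: generate_all /path/to/runs /path/to/output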

Starting the Website

Serve the site locally:

```
make serve
```

This will start a local server at http://localhost:8000 and automatically open it in your browser.

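The site detects its environment from the hostname: localhost is treated as development, and any other host as production. To override the detection, add a query parameter to the URL: ?env=development or ?env=production (for example, http://localhost:8000/?env=production).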
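
To make the precedence explicit, here is a bash sketch of the rule as described above. detect_env is a hypothetical helper; the site's real detection runs in the browser and may differ in edge cases (for example, how 127.0.0.1 is treated).

```shell
# Sketch of the documented rule only, not the site's code. Hypothetical helper.
detect_env() {
  local url=$1 re='[?&]env=(development|production)(&|#|$)' rest host
  # 1. An explicit ?env=development or ?env=production overrides detection.
  if [[ $url =~ $re ]]; then
    echo "${BASH_REMATCH[1]}"
    return 0
  fi
  # 2. Otherwise use the hostname: drop the scheme, then path/query/fragment, then port.
  rest=${url#*://}
  host=${rest%%[/?#]*}
  host=${host%%:*}
  if [ "$host" = "localhost" ]; then
    echo development
  else
    echo production
  fi
}
```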

Running Tests

Run the Playwright end-to-end tests and the balatrobench tests:

```
make test
```

Note

Although playwright.config.js includes a webServer configuration, the server may not start reliably on its own. If tests fail to connect, start the server manually first:

```
make serve  # In a separate terminal
make test   # Run tests
```
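
When starting the server manually, waiting for it to respond before running the tests avoids racing its startup. wait_for_server is a hypothetical bash helper, not part of the repository, that polls a URL with curl once per second until it responds or the timeout (in seconds) expires.

```shell
# Hypothetical helper: poll URL with curl once per second until it responds
# successfully, or give up after TIMEOUT seconds (default 30).
wait_for_server() {
  local url=$1 timeout=${2:-30} waited=0
  until curl --silent --fail --output /dev/null "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "no response from $url after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

For example, with make serve running in another terminal: wait_for_server http://localhost:8000 30 && make test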
