BalatroBench

Benchmark LLMs playing Balatro


BalatroBench is a benchmark analysis tool and leaderboard for BalatroLLM runs. It processes game data and generates interactive leaderboards comparing LLM models and strategies playing Balatro.

Note

You can download all the data for runs and benchmarks from Kaggle.

🚀 Related Projects

📚 Documentation

Important

This is the documentation for analyzing run artifacts produced by BalatroLLM. This project parses that data and displays it as a website.

Requirements

  • uv - Python package manager (installation steps below)
  • Node.js - JavaScript runtime (includes npm), required only for the Playwright tests

Installation

Follow these steps to set up BalatroBench:

  1. Install uv

    Install the uv Python package manager:

    # macOS/Linux
    curl -LsSf https://astral.sh/uv/install.sh | sh
    
    # Windows
    powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

    See the uv installation docs for more options.

  2. Clone the repository

    git clone https://github.com/coder/balatrobench.git
    cd balatrobench

  3. Configure environment variables

    Copy the example environment file and fill in your values:

    cp .envrc.example .envrc

    Edit .envrc and set the following variables, which are required for uploading benchmarks to the CDN:

    • BUNNY_BASE_URL - BunnyCDN base URL
    • BUNNY_STORAGE_ZONE - Storage zone name
    • BUNNY_API_KEY - API key for authentication

  4. Install dependencies

    make install

    This runs uv sync for Python packages and npm install for Playwright tests.

  5. Activate the environment

    source .envrc

    Alternatively, use direnv to automatically load the environment:

    # Install direnv, then allow the directory
    direnv allow

  6. Install browser binaries (first time only)

    npx playwright install chromium
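
After activating the environment, it can help to confirm that the three variables from step 3 are actually visible to Python before attempting an upload. The following is a minimal sketch, not part of BalatroBench itself: the helper name `missing_vars` is hypothetical, and it assumes that an empty value should count as unset.

```python
import os

# Variable names taken from the .envrc step above.
REQUIRED_VARS = ("BUNNY_BASE_URL", "BUNNY_STORAGE_ZONE", "BUNNY_API_KEY")


def missing_vars(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to the current process environment; passing a dict
    makes the check easy to try out without touching real settings.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing variables:", ", ".join(missing))
    else:
        print("All BunnyCDN variables are set.")
```

If the script reports missing variables, re-run `source .envrc` (or `direnv allow`) in the same shell and try again.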

Generating Benchmarks

Generate benchmark data from BalatroLLM runs:

# Analyze runs from a specific directory
balatrobench --input-dir /path/to/runs/v1.0.0

# Custom output directory
balatrobench --input-dir /path/to/runs/v1.0.0 --output-dir /path/to/output

# Enable WebP conversion for screenshots
balatrobench --input-dir /path/to/runs/v1.0.0 --webp
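
When several run versions sit side by side, the same command can be assembled programmatically. The sketch below only builds the argument lists from the three flags shown above and prints them. The helper names, the `runs` directory layout (one subdirectory per version), and the idea that `--output-dir` and `--webp` can be combined in one call are assumptions, not documented behavior.

```python
import shlex
from pathlib import Path


def build_command(input_dir, output_dir=None, webp=False):
    """Build a balatrobench argument list from the documented flags only."""
    cmd = ["balatrobench", "--input-dir", str(input_dir)]
    if output_dir is not None:
        cmd += ["--output-dir", str(output_dir)]
    if webp:
        cmd.append("--webp")
    return cmd


def commands_for_versions(runs_root, webp=False):
    """One command per subdirectory of runs_root (assumed layout: runs/<version>/)."""
    root = Path(runs_root)
    return [build_command(d, webp=webp) for d in sorted(root.iterdir()) if d.is_dir()]


if __name__ == "__main__":
    # "runs" is a hypothetical directory name; adjust it to your own layout.
    root = Path("runs")
    if root.is_dir():
        for cmd in commands_for_versions(root, webp=True):
            # Print rather than execute, so each command can be reviewed first.
            print(shlex.join(cmd))
```

Printing the commands instead of running them keeps the sketch side-effect free; passing each list to `subprocess.run(cmd, check=True)` would execute them.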

Starting the Website

Serve the site locally:

make serve

This starts a local server at http://localhost:8000 and opens it in your browser.

The environment is detected automatically: localhost is treated as development, and any other host as production. To override this, add the query parameter ?env=development or ?env=production to the URL.
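
The detection rule can be illustrated with a short sketch. This is not the site's actual code (which runs in the browser); it is a Python restatement of the rule as described, with a hypothetical function name `detect_env`. Two details are assumptions: only the literal hostname `localhost` counts as development, and an unrecognized `env` value is ignored in favor of automatic detection.

```python
from urllib.parse import parse_qs, urlparse

VALID_ENVS = ("development", "production")


def detect_env(url):
    """Return 'development' or 'production' for a page URL."""
    parsed = urlparse(url)
    # An explicit ?env=... query parameter takes priority when it is valid.
    override = parse_qs(parsed.query).get("env", [None])[0]
    if override in VALID_ENVS:
        return override
    # Otherwise fall back to the hostname rule.
    return "development" if parsed.hostname == "localhost" else "production"


if __name__ == "__main__":
    for url in (
        "http://localhost:8000/",
        "http://localhost:8000/?env=production",
        "https://example.com/",
    ):
        print(url, "->", detect_env(url))
```

The override is useful for checking production behavior while the site is still served from localhost.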

Running Tests

The test suite combines Playwright end-to-end tests with the balatrobench tests:

make test

Note

Although playwright.config.js includes a webServer configuration, the server may not auto-start reliably. If tests fail to connect, manually start the server first:

make serve  # In a separate terminal
make test   # Run tests
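
If the server is started manually, the tests can still race against its startup. One way to avoid that is to poll the URL until it answers and only then run the tests. The following is a minimal sketch under stated assumptions: `wait_for_server` is a hypothetical helper, the timeout values are arbitrary, and the default URL is the local address that `make serve` uses.

```python
import subprocess
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://localhost:8000", timeout=30.0, interval=0.5):
    """Poll url until it responds; return True if it did within timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status, so it is up.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: wait and retry.
            time.sleep(interval)
    return False


if __name__ == "__main__":
    if wait_for_server():
        # Run the test target once the server is reachable.
        subprocess.run(["make", "test"], check=True)
    else:
        print("Server did not respond; start it with `make serve` first.")
```

Run this from the repository root in a second terminal after starting `make serve`.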
