Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -16,3 +16,4 @@ IMPLEMENTATION-PLAN.md
DISCOVERY-SUMMARY.md
IMPLEMENTATION-ROADMAP.md
RESUMPTION-PROMPT.md
CLAUDE.md.bak
87 changes: 42 additions & 45 deletions CLAUDE.md
@@ -1,24 +1,21 @@
# Model Colosseum

## Project Overview
A local-first Tauri 2.0 desktop app for evaluating Ollama models across three modes: Arena (model vs model debates with Elo ratings), Benchmark (custom test suites with manual + auto-judge scoring), and Sparring Ring (structured human vs AI debates with scorecards). All modes feed a unified leaderboard backed by SQLite. macOS-only, dark theme, arena/colosseum aesthetic.
Local-first Tauri 2.x desktop app for evaluating Ollama models: Arena (model vs model debates, Elo ratings), Benchmark (custom test suites, TTFT/TPS metrics, manual + auto-judge scoring), and Sparring Ring (human vs AI debates, scorecards). All modes feed a unified leaderboard backed by SQLite. macOS-only, dark theme, arena/colosseum aesthetic.

## Stack

## Tech Stack
- Runtime: Tauri 2.x (Rust backend + webview frontend)
- Frontend: React 19 + TypeScript 5.x strict mode
- Build: Vite 6.x with `@tauri-apps/vite-plugin`
- Styling: Tailwind CSS 4.x (dark theme, gold/amber accents)
- State: Zustand 5.x
- Routing: React Router 7.x
- Charts: Recharts 2.x
- State: Zustand 5.x; Routing: React Router 7.x; Charts: Recharts 2.x
- Database: SQLite via `rusqlite` 0.31+ (bundled, WAL mode)
- HTTP: `reqwest` 0.12+ (async streaming)
- Async: `tokio` 1.x
- System info: `sysinfo` 0.31+
- LLM: Ollama REST API (localhost:11434)
- HTTP: `reqwest` 0.12+; Async: `tokio` 1.x; System info: `sysinfo` 0.31+
- LLM: Ollama REST API (`localhost:11434`)

## Architecture
React frontend communicates with Rust backend via Tauri IPC (`invoke` for commands, `listen` for streaming events). Rust backend owns all Ollama communication, SQLite access, and Elo calculations. Frontend is purely presentational + state management.

React frontend → Tauri IPC (`invoke` / `listen`) → Rust backend. Rust owns all Ollama communication, SQLite access, and Elo calculations. Frontend is purely presentational + state management.

Key modules:
- `src-tauri/src/db.rs` — SQLite connection, migrations, schema (13 tables), seed data
@@ -29,46 +26,46 @@ Key modules:
- `src-tauri/src/elo.rs` — Elo rating calculations (67 tests)
- `src-tauri/src/prompts.rs` — System prompt templates (arena, formal, socratic, sparring, scorecard judge)

## Development Conventions
- TypeScript strict mode. No `any` types.
- React: Functional components with hooks only. No class components.
- Rust: `clippy` clean. `cargo fmt` on save.
- File naming: `snake_case.rs` for Rust, `PascalCase.tsx` for React components, `camelCase.ts` for utilities
- Git commits: conventional commits (`feat:`, `fix:`, `refactor:`, `chore:`)
- All Tauri commands return `Result<T, String>` — handle errors in Rust, display in frontend
- Database writes wrapped in explicit transactions
- No unwrap() in production Rust code — use ? operator or proper error handling
## Build / Test / Run

## Current Phase
**v1.0.0 — Feature Complete** (all phases done, audit remediation applied)
```bash
pnpm install # install deps
pnpm tauri dev # dev server (hot reload)
pnpm tauri build # production build
pnpm test # runs: cd src-tauri && cargo test
cargo clippy -- -D warnings # lint (must pass clean)
cargo fmt # format on save
```

Review comment on lines +36 to +37:

P2: Point Rust tooling at src-tauri manifest

In the documented repo-root workflow, these Rust commands fail because this repository has no root `Cargo.toml`; the only manifest is `src-tauri/Cargo.toml`, and running `cargo clippy -- -D warnings` from `/workspace/ModelColosseum` exits with `could not find Cargo.toml`. Cargo's own help lists `--manifest-path <PATH>` for clippy, so this block should either `cd src-tauri` or pass `--manifest-path src-tauri/Cargo.toml` for the Rust lint/format steps; otherwise agents following CLAUDE.md cannot run the stated clean gate.

- [x] **Phase 0: Foundation** — Tauri 2.0 scaffold, SQLite (13 tables, WAL), Ollama REST client, Elo module
- [x] **Phase 1: Arena Mode** — Debate engine (freestyle/formal/socratic), vote + Elo, leaderboard, history
- [x] **Phase 2: Benchmark** — CRUD suites/prompts, runner with TTFT/TPS metrics, manual + auto-judge scoring, blind comparison, hardware metrics, import/export
- [x] **Phase 3: Sparring Ring** — Human vs AI debates, 3 difficulty levels, 4-phase structure, scorecards, user Elo
- [x] **Phase 4: Polish** — 3 debate formats, topic suggestions, settings page, blind test, animations, skeleton loading, export (Markdown/CSV/JSON)
- [x] **Audit** — Security hardening (configurable Ollama URL, query limit caps, settings key whitelist), accessibility (ARIA attributes), error handling, 67 Rust tests
## Conventions

- TypeScript strict mode; type with `unknown` + narrowing, never `any`
- React functional components with hooks only; no class components
- Rust: `clippy` clean, `cargo fmt` on save; use `?` or proper error handling — no `unwrap()` in production code
- File naming: `snake_case.rs`, `PascalCase.tsx`, `camelCase.ts`
- Tauri commands return `Result<T, String>` — handle errors in Rust, surface to frontend
- Database writes in explicit transactions
- Data directory: `~/.model-colosseum/` — the only storage location (`colosseum.db` lives here)
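
As an illustration of the `Result<T, String>` and no-`unwrap()` conventions above, here is a minimal sketch using only the standard library. The function name `parse_manual_score` and its error messages are hypothetical and do not come from the repository; the 1-5 range is taken from the manual scoring scale in Key Decisions. A real command would presumably also carry the `#[tauri::command]` attribute.

```rust
/// Parses a manual benchmark score (1-5) from raw user input.
/// Hypothetical helper: the name and error wording are illustrative only.
pub fn parse_manual_score(input: &str) -> Result<u8, String> {
    // `?` propagates the parse failure as a String instead of panicking.
    let score: u8 = input
        .trim()
        .parse()
        .map_err(|e| format!("invalid score {input:?}: {e}"))?;

    // Range check returns an error value rather than asserting.
    if !(1..=5).contains(&score) {
        return Err(format!("score {score} is outside 1-5"));
    }
    Ok(score)
}
```

Because the error type is a plain `String`, the frontend can display the message from a rejected `invoke` call directly.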

## Gotchas

- Use Tauri v2 APIs only — import paths are `@tauri-apps/api` v2; v1 APIs are incompatible
- Use `rusqlite` directly, not `tauri-plugin-sql` — needed for WAL mode, migrations, concurrent access
- Always health-check Ollama before calling it; handle absence gracefully (`localhost:11434`)
- Network calls to localhost Ollama only — no telemetry, no cloud, no external endpoints
- Concurrent streaming: runs concurrent with auto sequential fallback when models > 40B combined (prevents OOM)
- Ollama streaming: NDJSON line-by-line parsing, not SSE
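
The NDJSON point above can be sketched as a small line buffer: HTTP chunks may end mid-line, so bytes are accumulated and only complete newline-terminated lines are handed on. The type name `NdjsonBuffer` is hypothetical, and this is a standard-library sketch rather than the code in the repository; each returned line would then go to a JSON parser of your choice.

```rust
/// Accumulates raw bytes from a streaming response and yields complete lines.
/// Hypothetical type for illustration; not taken from the project source.
#[derive(Default)]
pub struct NdjsonBuffer {
    pending: Vec<u8>,
}

impl NdjsonBuffer {
    /// Appends a chunk and returns every complete, non-empty line it finished.
    /// Any trailing partial line stays buffered until a later chunk ends it.
    pub fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.pending.extend_from_slice(chunk);
        let mut lines = Vec::new();
        // Drain one line (including its '\n') at a time from the front.
        while let Some(pos) = self.pending.iter().position(|&b| b == b'\n') {
            let line: Vec<u8> = self.pending.drain(..=pos).collect();
            let text = String::from_utf8_lossy(&line).trim().to_string();
            if !text.is_empty() {
                lines.push(text);
            }
        }
        lines
    }
}
```

Buffering bytes rather than strings means a multi-byte UTF-8 character split across two chunks is only decoded once its line is complete.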

## Key Decisions

## Key Decisions Made
| Decision | Choice | Rationale |
|----------|--------|-----------|
| Concurrent streaming | Concurrent with auto sequential fallback when models > 40B combined | User wants dramatic visual. Fallback prevents OOM. |
| Database access | rusqlite directly, not tauri-plugin-sql | More control over WAL mode, migrations, concurrent access |
| Elo parameters | Start 1500, K=40→32→24 based on game count | Standard chess Elo with decay to stabilize ratings |
| Concurrent streaming | Concurrent + auto sequential fallback (>40B combined) | Dramatic visual; fallback prevents OOM |
| Database access | `rusqlite` directly, not `tauri-plugin-sql` | WAL mode, migrations, concurrent access |
| Elo parameters | Start 1500, K=40→32→24 by game count | Standard chess Elo with decay to stabilize |
| Benchmark scoring | 1-5 manual, 1-10 auto-judge normalized | Fast manual scoring, more granular auto-judge |
| App modes | Arena → Benchmark → Sparring (build order) | Arena builds all shared infra, others plug in |
| DB location | ~/.model-colosseum/colosseum.db | Standard macOS app data location |
| Ollama streaming | NDJSON line-by-line parsing, not SSE | That's what Ollama returns |

## Do NOT
- Do not scaffold the entire project in one session — follow the phased plan strictly
- Do not use Tauri v1 APIs or import paths — this is Tauri 2.x (`@tauri-apps/api` v2)
- Do not use `tauri-plugin-sql` — we use `rusqlite` directly
- Do not use `unwrap()` in Rust production code — use `?` or proper error handling
- Do not make any network calls except to localhost Ollama (no telemetry, no cloud)
- Do not use class components in React — hooks only
- Do not store any data outside `~/.model-colosseum/` — single source of truth
- Do not assume Ollama is running — always health check first and handle absence gracefully
| DB location | `~/.model-colosseum/colosseum.db` | Standard macOS app data location |
| Ollama streaming | NDJSON line-by-line, not SSE | That's what Ollama returns |
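
The Elo row above (start 1500, K decaying 40→32→24 by game count) can be sketched with the standard Elo expected-score formula. The table does not state the game-count thresholds, so the cut-offs of 10 and 30 games below are assumptions for illustration only, as are the function names; the actual values live in `src-tauri/src/elo.rs`.

```rust
/// Starting rating, from the Key Decisions table.
pub const START_RATING: f64 = 1500.0;

/// K-factor decays 40 -> 32 -> 24 as more games are played.
/// ASSUMPTION: the cut-offs (10 and 30 games) are illustrative, not from the source.
pub fn k_factor(games_played: u32) -> f64 {
    match games_played {
        0..=9 => 40.0,
        10..=29 => 32.0,
        _ => 24.0,
    }
}

/// Standard Elo expected score of player A against player B.
pub fn expected_score(rating_a: f64, rating_b: f64) -> f64 {
    1.0 / (1.0 + 10f64.powf((rating_b - rating_a) / 400.0))
}

/// New rating for A after one game; `score` is 1.0 for a win,
/// 0.5 for a draw, and 0.0 for a loss.
pub fn updated_rating(rating_a: f64, rating_b: f64, score: f64, games_played: u32) -> f64 {
    rating_a + k_factor(games_played) * (score - expected_score(rating_a, rating_b))
}
```

With two fresh models at 1500 each, the expected score is 0.5, so a first win moves the winner up by half of K under these assumptions.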

<!-- portfolio-context:start -->
# Portfolio Context