Documentation

Guides for getting started, running evaluations, and understanding the results.

New here? Start with the QUICKSTART at the repo root. It explains what this tool does and walks you through a 30-second offline demo. Once you've seen the demo, the guides below take you deeper.

All guides

| Guide | Audience | Description |
|---|---|---|
| how-to-run-live-eval.md | All users | End-to-end: credentials → config → run → results |
| how-to-custom-dataset.md | All users | Bring your own prompts (JSONL, CSV, or SQL) |
| how-to-interpret-results.md | All users | Every chart and metric explained in plain language |
| how-to-resume-and-scale.md | Intermediate | Checkpoint/resume and 1,000-prompt scaling |
| how-to-compare-runs.md | Intermediate | Diff two evaluation runs to track improvements |
| how-to-foundry-eval-sdk.md | Intermediate | Cloud-based grading via Microsoft Foundry |
| foundry-cost-latency-design.md | Advanced | Why cost/latency use `python` graders in Foundry |
| methodology.md | Advanced | Scoring rubrics, bias mitigation, cost formula |
| architecture.md | Advanced | Component diagram and data flow |
| faq.md | All users | Troubleshooting and common questions |