This project uses OpenSpec. All meaningful changes start from specs and tracked changes under openspec/.
- Read the relevant capability spec in `openspec/specs/`.
- Check active changes in `openspec/changes/`.
- If your work changes behavior or scope, create or update a change proposal before editing code.
- Keep the branch focused and short-lived.
- Run validation and a review pass before merge.
For AI-specific guidance, see AGENTS.md.
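Before picking up work, it helps to see which capability specs and active changes already exist under `openspec/`. A minimal sketch of a listing helper (the `list_entries` function name is ours, not part of OpenSpec):

```shell
#!/bin/sh
# Print the immediate subdirectories of an OpenSpec tree, one per line.
# Usage: list_entries openspec/specs
#        list_entries openspec/changes
list_entries() {
    find "$1" -mindepth 1 -maxdepth 1 -type d | sort
}
```

Running `list_entries openspec/changes` then shows one directory per active change proposal.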
```sh
git clone https://github.com/LessUp/llm-speed.git
cd llm-speed
python3 -m venv .venv
. .venv/bin/activate
pip install -U pip setuptools wheel
pip install -r requirements.txt pytest hypothesis ruff pre-commit
```

To build the CUDA extension locally:
```sh
pip install -e .
```

Run the CPU-safe validation and hooks:

```sh
ruff check cuda_llm_ops/ tests/ benchmarks/
pytest tests/ -v -m "not cuda"
pre-commit run --all-files
```

GPU-specific validation can be run separately with:

```sh
pytest tests/ -v -m cuda
```

Manage change proposals with the OpenSpec slash commands:

```
/opsx:propose <change-name>
/opsx:apply <change-name>
/opsx:archive <change-name>
```

- Keep one branch per change or cleanup slice.
- Merge promptly after a coherent slice; avoid stale local/cloud branch drift.
- Use `/review` or an equivalent review pass before merge.
- Prefer concise, project-specific docs and config over generic process sprawl.
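The `-m "not cuda"` and `-m cuda` test filters above presume a registered `cuda` pytest marker. A minimal registration sketch, assuming markers are declared in `pyproject.toml` (the layout is illustrative, not taken from the repo):

```toml
[tool.pytest.ini_options]
markers = [
    "cuda: tests that require a CUDA-capable GPU",
]
```

Registering the marker keeps `pytest --strict-markers` from rejecting it and documents the split between CPU-safe and GPU-only tests.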
Use conventional commits such as:
```
fix(ci): simplify cpu-safe validation
docs(readme): align setup with OpenSpec workflow
```
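Messages in this style can be checked mechanically. A minimal sketch of such a check (the `is_conventional` helper and its type list are our assumptions, not a project hook):

```shell
#!/bin/sh
# Return success if $1 looks like "type(scope): subject" or "type: subject".
is_conventional() {
    echo "$1" | grep -Eq '^(feat|fix|docs|refactor|perf|test|build|ci|chore)(\([a-z0-9-]+\))?: .+'
}
```

For example, `is_conventional "fix(ci): simplify cpu-safe validation" && echo ok` succeeds, while a free-form subject line fails the check.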
- C/CUDA: `clangd` + `cmake --preset default`
- Python: `pyright`/`basedpyright` + `pyrightconfig.json`
- Hooks: `pre-commit` with Ruff, clang-format, and file hygiene
- MCP: optional; prefer `gh`, OpenSpec commands, and targeted subagents unless a heavier integration clearly pays off
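The hooks entry above can be realized with a `.pre-commit-config.yaml`. A minimal sketch using the standard Ruff and clang-format mirror repos (the `rev` values are placeholders to pin yourself, not versions taken from this repo):

```yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.9   # placeholder; pin to your tested revision
    hooks:
      - id: ruff
  - repo: https://github.com/pre-commit/mirrors-clang-format
    rev: v18.1.8  # placeholder
    hooks:
      - id: clang-format
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0   # placeholder
    hooks:
      - id: end-of-file-fixer
      - id: trailing-whitespace
```

The last repo covers the "file hygiene" part; `pre-commit run --all-files` then exercises the same checks CI would.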
The repository is being normalized for final stabilization. `bf16-support` and `flashattention-backward` remain deferred backlog items and are not part of the active closeout path.