Shipping AI dev tools for the next billion code editors.
code-llm-completion is a focused training & serving stack for fine-tuning open
code LLMs (StarCoder2-7B, DeepSeek-Coder-6.7B) into fast, IDE-grade
completion models — with first-class support for low-resource languages that
the big copilots ignore (Indonesian PHP/Laravel codebases, Lua game scripts,
in-house DSLs, etc.).
We care about three things, in this order:
- Latency. A completion that arrives in 800 ms is worth more than a perfect one in 4 s.
- Acceptance rate. Suggestions you actually press Tab on.
- Deployability. One `make serve` and you have an OpenAI-compatible endpoint your editor can talk to.
The developer tools market is on a tear (~$30B and climbing), but ~95% of the oxygen is sucked up by hosted, closed copilots. There is a real, paying segment that needs:
- on-prem inference (regulated industries, gov, enterprise IP)
- finetunes on private monorepos
- support for languages outside the JS/Python/Java core
This repo is the training half of that thesis.
git clone https://github.com/0xzuky/code-llm-completion
cd code-llm-completion
pip install -r requirements.txt
make train CONFIG=configs/starcoder2_7b_lora.yaml
make eval MODEL=runs/starcoder2-7b-id-php
make serve MODEL=runs/starcoder2-7b-id-php PORT=8080

All entrypoints live in `bin/` and are plain argparse CLIs: no hidden frameworks, no YAML-only config soup. Read the `--help` and you'll understand it in 30 seconds.
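
Once `make serve` is up, any HTTP client can talk to it. Here is a minimal sketch of an editor-side call, not the plugin code from this repo. It assumes the shim exposes the usual OpenAI `/v1/completions` route on the port from the quickstart and that the served model is named after its run directory; `build_completion_request` and `complete` are hypothetical helper names.

```python
import json
import urllib.request

# Assumptions: host, route and model name are illustrative, not taken from the repo.
BASE_URL = "http://localhost:8080"       # PORT=8080 from the quickstart
MODEL = "runs/starcoder2-7b-id-php"      # assumed: model is served under its run path


def build_completion_request(prompt: str, max_tokens: int = 64,
                             base_url: str = BASE_URL,
                             model: str = MODEL) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style text-completion request."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,   # short completions keep latency low
        "temperature": 0.0,         # greedy decoding keeps suggestions stable
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, timeout_s: float = 4.0) -> str:
    """Send the request and return the text of the first suggestion."""
    req = build_completion_request(prompt)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With the server running, `complete("<?php\nfunction hitungTotal(array $items): int {\n")` should return the suggested continuation as a string. The 4 s timeout mirrors the latency budget above: a suggestion that takes longer is not worth showing.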
| Model | HumanEval pass@1 | id-PHP pass@1 | p50 latency (A100/MI250X) |
|---|---|---|---|
| CodeLlama-7B-Instruct | 34.8 | 11.2 | 1.4s / 1.6s |
| DeepSeek-Coder-6.7B-base | 49.4 | 19.0 | 1.1s / 1.2s |
| ours (DS-Coder + LoRA, id) | 47.9 | 38.6 | 0.9s / 0.95s |
The point isn't to beat HumanEval. The point is the id-PHP column.
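
For readers who want to reproduce numbers like these on their own eval set, here is a sketch of how the two metrics are commonly computed. It uses the standard unbiased pass@k estimator and a plain median for p50. Whether `eval_codellm` computes them exactly this way is an assumption, and the function names are hypothetical.

```python
from math import comb
from statistics import median


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them pass the tests."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Mean pass@k in percent over (n_samples, n_correct) pairs, one per problem."""
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


def p50_latency(latencies_s: list[float]) -> float:
    """Median request latency in seconds."""
    return median(latencies_s)
```

For k = 1 the estimator reduces to the fraction of samples that pass, so with one greedy sample per problem pass@1 is simply the share of problems solved.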
- HuggingFace `transformers` + `Trainer`
- DeepSpeed ZeRO-3 (with CPU offload, the 7B base plus adapters fit on 1× MI250X)
- `peft` (LoRA, rank 16, alpha 32) and `bitsandbytes` 4-bit base
- vLLM for serving, with a thin OpenAI-compatible shim for editor plugins
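
To give a feel for what rank 16 / alpha 32 on a 4-bit base means in practice, here is some back-of-the-envelope arithmetic. The rank, alpha and 4-bit figures come from the list above. The hidden size, layer count and number of adapted matrices per layer are assumptions for illustration, so plug in the values from your own base model and config.

```python
# Values from the stack list above.
LORA_RANK = 16
LORA_ALPHA = 32

# Assumptions for illustration only (not stated in this README):
HIDDEN_SIZE = 4096       # hidden width typical of 7B Llama-style models
NUM_LAYERS = 32          # number of transformer blocks
ADAPTED_PER_LAYER = 2    # e.g. only the query and value projections


def lora_scaling(rank: int, alpha: int) -> float:
    """Factor applied to the low-rank update: W + (alpha / rank) * B @ A."""
    return alpha / rank


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return rank * d_in + d_out * rank   # A is rank x d_in, B is d_out x rank


def total_lora_params() -> int:
    """Adapter parameters across all layers under the assumptions above."""
    per_matrix = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, LORA_RANK)
    return per_matrix * ADAPTED_PER_LAYER * NUM_LAYERS


def weights_gb(n_params: float, bits: int) -> float:
    """Approximate weight storage in GB, ignoring quantization metadata."""
    return n_params * bits / 8 / 1e9
```

Under these assumptions the adapters add about 8.4M trainable parameters, and a 6.7B-parameter base stored at 4 bits needs roughly 3.35 GB for weights, versus about 13.4 GB at 16 bits. Activations, optimizer state and quantization constants come on top of that.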
Hosted-copilot economics work because NVIDIA H100s are cheap for them. For everyone else trying to ship a competing dev tool, MI250X / MI300X under ROCm is the only path to a per-token cost that survives contact with real usage. DeepSpeed ZeRO-3, bitsandbytes (HIP fork), and vLLM all run on ROCm today; this repo's training scripts target that path natively. The AMD Developer Cloud credits unblock the multi-week 7B fine-tunes we currently queue on borrowed compute.
bin/                 # executable CLIs
  train_codellm      # full training loop
  eval_codellm       # HumanEval + custom evals
  serve_codellm      # vLLM OpenAI-compatible server
  build_dataset      # data pipeline driver
src/codellm/         # importable library
data_pipeline/       # dedup, license filter, FIM packer
benchmarks/          # eval harnesses and result dumps
configs/             # training configs (yaml)
Makefile             # the only thing you need to memorize
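
The `data_pipeline/` entry above mentions dedup and a FIM (fill-in-the-middle) packer. As a rough sketch of those two steps, and not the repo's actual implementation: the code below drops byte-identical files and rewrites a sample into prefix-suffix-middle order. The function names are hypothetical, and the sentinel strings are the StarCoder-style ones. Other base models use different sentinels, so check the tokenizer before reusing this.

```python
import hashlib

# Assumed sentinels: StarCoder-style FIM tokens. DeepSeek-Coder and other
# base models use different strings, so read them from the tokenizer.
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"


def dedup_exact(files: list[str]) -> list[str]:
    """Drop byte-identical duplicates, keeping first occurrences in order."""
    seen: set[str] = set()
    unique = []
    for text in files:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique


def to_fim_psm(text: str, start: int, end: int) -> str:
    """Reorder text as prefix, suffix, middle so the model learns to fill text[start:end]."""
    if not 0 <= start <= end <= len(text):
        raise ValueError("need 0 <= start <= end <= len(text)")
    prefix, middle, suffix = text[:start], text[start:end], text[end:]
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"
```

For example, `to_fim_psm("abcdef", 2, 4)` yields `<fim_prefix>ab<fim_suffix>ef<fim_middle>cd`. In training the split points would normally be sampled at random per document. At serving time the editor sends the text before and after the cursor in the same layout and the model generates the middle.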
Apache-2.0. Use it, fork it, ship a product. Just don't sue us.