0xzuky/code-llm-completion

code-llm-completion

Shipping AI dev tools for the next billion code editors.

code-llm-completion is a focused training and serving stack for fine-tuning open code LLMs (StarCoder2-7B, DeepSeek-Coder-6.7B) into fast, IDE-grade completion models, with first-class support for low-resource languages that the big copilots ignore (Indonesian PHP/Laravel codebases, Lua game scripts, in-house DSLs, etc.).

We care about three things, in this order:

  1. Latency. A completion that arrives in 800 ms is worth more than a perfect one in 4 s.
  2. Acceptance rate. Suggestions you actually press Tab on.
  3. Deployability. One make serve and you have an OpenAI-compatible endpoint your editor can talk to.

Why this exists

The developer tools market is on a tear (~$30B and climbing), but ~95% of the oxygen is sucked up by hosted, closed copilots. There is a real, paying segment that needs:

  • on-prem inference (regulated industries, gov, enterprise IP)
  • finetunes on private monorepos
  • support for languages outside the JS/Python/Java core

This repo is the training half of that thesis.

Quickstart

git clone https://github.com/0xzuky/code-llm-completion
cd code-llm-completion
pip install -r requirements.txt
make train CONFIG=configs/starcoder2_7b_lora.yaml
make eval  MODEL=runs/starcoder2-7b-id-php
make serve MODEL=runs/starcoder2-7b-id-php PORT=8080

All entrypoints live in bin/ and are plain argparse CLIs: no hidden frameworks, no YAML-only config soup. Run any of them with --help and you'll understand it in 30 seconds.
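Once `make serve` is running, an editor plugin talks to it over HTTP. The sketch below shows roughly what such a request could look like using only the Python standard library. It assumes the server exposes the usual OpenAI-style `/v1/completions` route on the port from the Quickstart and that the model is registered under its run path; both are assumptions, so check the output of `serve_codellm --help` for the real values. The function names are hypothetical and not part of this repo.

```python
import json
import urllib.request


def build_completion_request(
    prompt,
    base_url="http://localhost:8080",      # PORT=8080 from the Quickstart
    model="runs/starcoder2-7b-id-php",     # assumption: model id equals the run path
    max_tokens=64,
):
    """Build (but do not send) a POST request for an OpenAI-style completions route."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,  # deterministic output suits inline completion
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",  # assumption: standard OpenAI-style path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server up, `complete("<?php\nfunction hitungTotal(array $items): int {\n")` would return the suggested continuation as a string. Building the request separately from sending it keeps the payload easy to inspect and test without a running server.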

Benchmarks (preliminary, internal eval set)

| Model | HumanEval pass@1 | id-PHP pass@1 | p50 latency (A100 / MI250X) |
|---|---|---|---|
| CodeLlama-7B-Instruct | 34.8 | 11.2 | 1.4 s / 1.6 s |
| DeepSeek-Coder-6.7B-base | 49.4 | 19.0 | 1.1 s / 1.2 s |
| ours (DS-Coder + LoRA, id) | 47.9 | 38.6 | 0.9 s / 0.95 s |

The point isn't to beat HumanEval. The point is the second column.
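For readers unfamiliar with the metric, pass@1 is the fraction of problems solved by a single sampled completion. A minimal sketch of the standard unbiased pass@k estimator introduced with HumanEval follows; whether `eval_codellm` computes its numbers exactly this way is an assumption, and `benchmark_score` is a hypothetical helper name.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k draw contains a correct one.
        return 1.0
    # 1 minus the probability that all k drawn samples are incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results, k=1):
    """Mean pass@k over all problems, as a percentage.

    results is a list of (n, c) pairs, one per problem.
    """
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With one greedy sample per problem (n = 1, k = 1), this reduces to the plain percentage of problems whose single completion passes the tests.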

Stack

  • HuggingFace transformers + Trainer
  • DeepSpeed ZeRO-3 (with CPU offload, so the 7B base plus adapters fit on 1× MI250X)
  • peft (LoRA, rank 16, alpha 32) and bitsandbytes 4-bit base
  • vLLM for serving, with a thin OpenAI-compatible shim for editor plugins
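To give a feel for why LoRA on a 4-bit base is cheap, here is a back-of-the-envelope sketch using the rank and alpha from the list above. The 4096-wide projection is an illustrative assumption rather than a value read from either base model's config, and the memory figure covers weights only (no activations, optimizer state, or quantization metadata).

```python
def lora_scaling(rank, alpha):
    """Scale applied to the low-rank update B @ A in standard LoRA: alpha / rank."""
    return alpha / rank


def lora_params(d_out, d_in, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    A has shape rank x d_in and B has shape d_out x rank,
    so the total is rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


def weight_bytes(num_params, bits_per_param):
    """Approximate storage for the weights alone."""
    return num_params * bits_per_param / 8


RANK, ALPHA = 16, 32  # values from the Stack list
HIDDEN = 4096         # assumption: illustrative square projection size

scaling = lora_scaling(RANK, ALPHA)
adapter = lora_params(HIDDEN, HIDDEN, RANK)
full = HIDDEN * HIDDEN
base_gb = weight_bytes(7e9, 4) / 1e9

print(f"scaling factor: {scaling}")
print(f"adapter params per projection: {adapter} of {full} ({100 * adapter / full:.2f}%)")
print(f"4-bit weights for a 7B base: ~{base_gb:.1f} GB")
```

Under these assumptions each adapted projection trains 131,072 parameters, under 1% of the 16,777,216 in the full matrix, and the quantized base weights take roughly 3.5 GB.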

Why AMD ROCm

Hosted-copilot economics work because NVIDIA H100s are cheap for them. For everyone else trying to ship a competing dev tool, MI250X / MI300X under ROCm is the only path to a per-token cost that survives contact with real usage. DeepSpeed ZeRO-3, bitsandbytes (HIP fork), and vLLM all run on ROCm today; this repo's training scripts target that path natively. The AMD Developer Cloud credits unblock the multi-week 7B fine-tunes we currently queue on borrowed compute.

Repo layout

bin/                    # executable CLIs
  train_codellm         # full training loop
  eval_codellm          # HumanEval + custom evals
  serve_codellm         # vLLM OpenAI-compatible server
  build_dataset         # data pipeline driver
src/codellm/            # importable library
data_pipeline/          # dedup, license filter, FIM packer
benchmarks/             # eval harnesses and result dumps
configs/                # training configs (yaml)
Makefile                # the only thing you need to memorize
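The FIM packer in data_pipeline/ prepares fill-in-the-middle training samples, which is what lets a completion model use the code after the cursor as well as before it. The sketch below illustrates the general idea in prefix-suffix-middle (PSM) order with StarCoder-family sentinel tokens. It is not the repo's implementation: the function names and the `fim_rate` default are hypothetical, it works on characters rather than token ids, and other base models use different sentinel strings.

```python
import random

# StarCoder-family sentinel tokens; other base models use different strings.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def split_for_fim(text, rng):
    """Cut text at two random character offsets into (prefix, middle, suffix)."""
    lo, hi = sorted(rng.randint(0, len(text)) for _ in range(2))
    return text[:lo], text[lo:hi], text[hi:]


def to_fim_example(text, rng, fim_rate=0.5):
    """Return a PSM-ordered FIM training string, or the text unchanged.

    With probability 1 - fim_rate the sample stays a plain left-to-right
    document. fim_rate=0.5 is an illustrative default, not the repo's setting.
    """
    if rng.random() >= fim_rate:
        return text
    prefix, middle, suffix = split_for_fim(text, rng)
    # The model sees prefix and suffix first, then learns to generate the middle.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"
```

Passing an explicit `random.Random(seed)` keeps dataset builds reproducible. At inference time an editor plugin would send the same layout with the middle left empty, so the model generates the text at the cursor.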

License

Apache-2.0. Use it, fork it, ship a product. Just don't sue us.
