Background
setup_env.py already builds llama-server as part of the cmake step (it lives at build/bin/llama-server after a successful build). This binary provides a fully OpenAI-compatible HTTP API (/v1/chat/completions, /v1/completions, /v1/models) — the same interface as llama.cpp's server.
The README currently documents only run_inference.py for inference, so the server binary goes unnoticed by most users even though it is already built.
What this unlocks
- Drop-in replacement for OpenAI API in downstream tools (LangChain, Open WebUI, custom apps) without code changes
- Persistent model loading (no 2-3s cold-start per request)
- Integration with job queues or proxy layers that speak OpenAI protocol
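To make the "drop-in" point concrete, here is a minimal sketch using only the Python standard library. The helper name is hypothetical, and the host, port, and model name simply mirror the example invocation in this issue; llama-server treats the model field as free-form when serving a single model.

```python
import json
import urllib.request

def build_chat_request(prompt, model="bitnet", base_url="http://127.0.0.1:8080"):
    """Build an OpenAI-style chat completion request for a local llama-server.

    Hypothetical helper for illustration; /v1/chat/completions is the
    endpoint llama-server exposes.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("Hello")
print(req.full_url)  # http://127.0.0.1:8080/v1/chat/completions
# With a server running, send it with:
#   body = json.load(urllib.request.urlopen(req))
#   print(body["choices"][0]["message"]["content"])
```

Because the request shape is plain OpenAI protocol, the same server also works unchanged with the official OpenAI SDKs by pointing their base URL at http://127.0.0.1:8080/v1.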
Minimal usage (after build)
./build/bin/llama-server --model models/BitNet-b1.58-2B-4T-gguf/ggml-model-i2_s.gguf --host 127.0.0.1 --port 8080 --parallel 1 --ctx-size 4096
# Then:
curl http://127.0.0.1:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"model":"bitnet","messages":[{"role":"user","content":"Hello"}]}'
Question
Would the team be interested in a PR that:
- Documents this capability in the README (a single section — no code changes)
- Optionally adds a minimal Python wrapper script (consistent with the repo's Python-first style) to make the invocation discoverable
Happy to contribute either or both if there's interest. Flagging as a question first rather than opening a cold PR.
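For reference, the wrapper script mentioned above could be as small as the sketch below. The script name, flag names, and defaults are illustrative assumptions (not existing repo code); the idea is just to assemble the llama-server command line the way run_inference.py assembles its own invocation.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper sketch: launch the already-built llama-server binary
with an OpenAI-compatible HTTP API. Names and defaults are illustrative."""
import argparse

def build_server_cmd(args):
    # Assemble the argv for build/bin/llama-server from parsed arguments.
    return [
        "build/bin/llama-server",
        "--model", args.model,
        "--host", args.host,
        "--port", str(args.port),
        "--ctx-size", str(args.ctx_size),
    ]

def parse_args(argv=None):
    p = argparse.ArgumentParser(
        description="Serve a BitNet GGUF model over an OpenAI-compatible HTTP API"
    )
    p.add_argument("-m", "--model", required=True, help="path to the GGUF model file")
    p.add_argument("--host", default="127.0.0.1")
    p.add_argument("--port", type=int, default=8080)
    p.add_argument("--ctx-size", type=int, default=4096)
    return p.parse_args(argv)

cmd = build_server_cmd(
    parse_args(["-m", "models/BitNet-b1.58-2B-4T-gguf/ggml-model-i2_s.gguf"])
)
print(" ".join(cmd))
# To actually launch the server: subprocess.run(cmd)
```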