fix(server): expose loaded model metadata by derogab · Pull Request #287 · antirez/ds4

derogab · 2026-05-29T16:23:42Z

Summary

Fixes model metadata reporting for /v1/models and /v1/models/<id>.

The server loads exactly one GGUF at startup, but previously model discovery advertised both deepseek-v4-flash and deepseek-v4-pro. This could mislead clients into thinking both variants were available at the same time.

This PR makes model metadata reflect the loaded GGUF:

Flash GGUF exposes only deepseek-v4-flash
Pro GGUF exposes only deepseek-v4-pro
/v1/models/<id> returns 404 for the non-loaded model id

The model field of a request still does not select a different GGUF; inference always uses the single loaded model.
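
The rule above can be modeled as a small lookup. This is only a Python sketch of the described behavior, not the server's actual code: the function names `list_models` and `get_model` are hypothetical, and the response bodies contain only the fields visible in the test output in this PR (`id`, `name`, and the `unknown model` error). The real responses may carry additional fields.

```python
# Hypothetical model of the behavior described in this PR.
# The real server is not written in Python; names and shapes are assumptions.

def list_models(loaded_id, loaded_name):
    """Body for GET /v1/models: only the loaded model is advertised."""
    return {"data": [{"id": loaded_id, "name": loaded_name}]}


def get_model(loaded_id, loaded_name, requested_id):
    """Return (http_status, body) for GET /v1/models/<id>."""
    if requested_id == loaded_id:
        return 200, {"id": loaded_id, "name": loaded_name}
    # Any id other than the loaded one is reported as unknown.
    return 404, {
        "error": {"message": "unknown model", "type": "invalid_request_error"}
    }
```

With a Flash GGUF loaded, `get_model("deepseek-v4-flash", "DeepSeek V4 Flash", "deepseek-v4-pro")` yields a 404 in this sketch, matching the non-loaded model case in the Testing section.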

Commits

77c27a5 fix(server): report pro model name in models endpoint
059d6cf fix(server): expose only loaded model metadata

Testing

Tested on Mac CPU build, 36 GB:

./ds4_test --server

  Result:

  server:
  server: OK
  ds4 tests: ok

Tested on NVIDIA DGX Spark / GB10, 128 GB:

make ds4_test
./ds4_test --server

  Result:

  server:
  server: OK
  ds4 tests: ok

Live CUDA server test on NVIDIA DGX Spark / GB10, 128 GB, with Flash GGUF:

./ds4-server --cuda --ctx 2048 --tokens 256 --host 127.0.0.1 --port 8000
curl -s http://127.0.0.1:8000/v1/models | jq -r '.data[] | "\(.id) => \(.name)"'

  Result:
  deepseek-v4-flash => DeepSeek V4 Flash

Loaded model endpoint:
curl -s http://127.0.0.1:8000/v1/models/deepseek-v4-flash | jq -r '.id + " => " + .name'

  Result:

  deepseek-v4-flash => DeepSeek V4 Flash

Non-loaded model endpoint:
curl -s -w '\nHTTP %{http_code}\n' http://127.0.0.1:8000/v1/models/deepseek-v4-pro

  Result:
  
  {"error":{"message":"unknown model","type":"invalid_request_error"}}
  
  HTTP 404

Model discovery should not advertise Flash and Pro as simultaneously available because requests always run against the single loaded GGUF.
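
Because only one id is advertised, a client can discover which variant is loaded before sending requests. The following is a minimal Python sketch under stated assumptions: the helper names `loaded_model_id` and `fetch_models` are hypothetical, the base URL is the one used in the live test above, and the response is assumed to have a `data` list whose entries carry an `id`, as the `jq` filter above implies.

```python
import json
import urllib.request


def loaded_model_id(models_response):
    """Return the single advertised id from a parsed /v1/models body."""
    ids = [m["id"] for m in models_response.get("data", [])]
    if len(ids) != 1:
        # With this PR exactly one model should be listed; anything else
        # suggests a different server version or an unexpected response.
        raise ValueError(f"expected exactly one model, got {ids}")
    return ids[0]


def fetch_models(base_url="http://127.0.0.1:8000"):
    """Fetch and parse /v1/models (assumes a server is listening at base_url)."""
    with urllib.request.urlopen(f"{base_url}/v1/models") as resp:
        return json.load(resp)
```

Usage against a running server would be `loaded_model_id(fetch_models())`. The parsing step is kept separate from the network call so it can be checked without a server.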

derogab added 2 commits May 29, 2026 17:04

fix(server): report pro model name in models endpoint

77c27a5

fix(server): expose only loaded model metadata

059d6cf

derogab wants to merge 2 commits into antirez:main from derogab:fix/model-name
