fix(server): expose loaded model metadata #287

Open: derogab wants to merge 2 commits into antirez:main from derogab:fix/model-name

derogab commented May 29, 2026

Summary

Fixes model metadata reporting for /v1/models and /v1/models/<id>.

The server loads exactly one GGUF at startup, but model discovery previously advertised both deepseek-v4-flash and deepseek-v4-pro. This could mislead clients into thinking both variants were available at the same time.

This PR makes model metadata reflect the loaded GGUF:

  • Flash GGUF exposes only deepseek-v4-flash
  • Pro GGUF exposes only deepseek-v4-pro
  • /v1/models/<id> returns 404 for the non-loaded model id

The model field in a request still does not select a different GGUF; inference always uses the single loaded model.
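The behavior described above can be modeled in a few lines. This is only an illustrative Python sketch, not the server's actual code: the function names `list_models` and `get_model` and the `KNOWN_MODELS` table are hypothetical, and the display name for the Pro variant is assumed by analogy with the Flash name shown in the test output.

```python
from typing import Dict, Tuple

# Hypothetical catalog. "DeepSeek V4 Flash" appears in this PR's test output;
# the Pro display name is an assumption made for illustration.
KNOWN_MODELS: Dict[str, str] = {
    "deepseek-v4-flash": "DeepSeek V4 Flash",
    "deepseek-v4-pro": "DeepSeek V4 Pro",
}


def list_models(loaded_id: str) -> dict:
    """Model of GET /v1/models: advertise only the loaded model."""
    return {"data": [{"id": loaded_id, "name": KNOWN_MODELS[loaded_id]}]}


def get_model(loaded_id: str, requested_id: str) -> Tuple[int, dict]:
    """Model of GET /v1/models/<id>: 404 unless <id> is the loaded model."""
    if requested_id != loaded_id:
        error = {"message": "unknown model", "type": "invalid_request_error"}
        return 404, {"error": error}
    return 200, {"id": loaded_id, "name": KNOWN_MODELS[loaded_id]}
```

With a Flash GGUF loaded, `get_model("deepseek-v4-flash", "deepseek-v4-pro")` yields status 404 in this sketch, matching the error body shown in the Testing section.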

Commits

  • 77c27a5 fix(server): report pro model name in models endpoint
  • 059d6cf fix(server): expose only loaded model metadata

Testing

Tested on Mac CPU build, 36 GB:

./ds4_test --server

  Result:

  server:
  server: OK
  ds4 tests: ok

Tested on NVIDIA DGX Spark / GB10, 128 GB:

make ds4_test
./ds4_test --server

  Result:

  server:
  server: OK
  ds4 tests: ok

Live CUDA server test on NVIDIA DGX Spark / GB10, 128 GB, with Flash GGUF:

./ds4-server --cuda --ctx 2048 --tokens 256 --host 127.0.0.1 --port 8000
curl -s http://127.0.0.1:8000/v1/models | jq -r '.data[] | "\(.id) => \(.name)"'

  Result:
  deepseek-v4-flash => DeepSeek V4 Flash

Loaded model endpoint:
curl -s http://127.0.0.1:8000/v1/models/deepseek-v4-flash | jq -r '.id + " => " + .name'

  Result:

  deepseek-v4-flash => DeepSeek V4 Flash

Non-loaded model endpoint:
curl -s -w '\nHTTP %{http_code}\n' http://127.0.0.1:8000/v1/models/deepseek-v4-pro

  Result:
  
  {"error":{"message":"unknown model","type":"invalid_request_error"}}
  
  HTTP 404
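The three manual checks above could also be folded into one helper that inspects responses a client has already fetched and parsed. The following is a minimal Python sketch under that assumption; `check_discovery` is a hypothetical name and is not part of ds4 or its test suite.

```python
from typing import List


def check_discovery(models_body: dict, loaded_id: str, other_status: int) -> List[str]:
    """Return a list of problems; an empty list means discovery looks correct.

    models_body  -- parsed JSON from GET /v1/models
    loaded_id    -- id of the GGUF the server was started with
    other_status -- HTTP status from GET /v1/models/<non-loaded id>
    """
    problems = []
    ids = [m["id"] for m in models_body.get("data", [])]
    if ids != [loaded_id]:
        problems.append(f"expected only {loaded_id!r}, got {ids!r}")
    if other_status != 404:
        problems.append(
            f"expected HTTP 404 for the non-loaded id, got {other_status}"
        )
    return problems
```

Returning a list of messages instead of raising on the first failure lets one run report both a stale model list and a wrong status code.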

derogab added 2 commits May 29, 2026 17:04
Model discovery should not advertise Flash and Pro as simultaneously
available because requests always run against the single loaded GGUF.