chore(model): remove inert --name flag from obol model setup custom #509
bussyjd wants to merge 5 commits
The `--name` flag on `obol model setup custom` was documented as
informational only and never participated in any routing or persistence:
- ModelEntry has only `model_name` (route key) + `litellm_params`; the
CLI `--name` value was never written to either.
- `detectProvider` (used by `obol model list/status`) inspects
`entry.ModelName` + `entry.LiteLLMParams.Model` prefixes; the `--name`
string never reached it.
- It was only echoed back in two log lines and passed as a UI label to
`RestartLiteLLM` on the hot-add fallback path.
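The points above can be illustrated with a minimal Go sketch. The struct shapes are an assumption based only on the field names mentioned in this message (`model_name`, `litellm_params`, `ModelName`, `LiteLLMParams.Model`); `buildEntry`, the `openai/` prefix, and the endpoint values are hypothetical and are not the repository's code. The sketch only shows that an entry with no field for a display name cannot be affected by one.

```go
package main

import "fmt"

// LiteLLMParams and ModelEntry are assumed, simplified shapes; the real
// types in internal/model may carry more fields.
type LiteLLMParams struct {
	Model   string
	APIBase string
	APIKey  string
}

type ModelEntry struct {
	ModelName     string // route key
	LiteLLMParams LiteLLMParams
}

// buildEntry is a hypothetical helper. It accepts displayName only to show
// that there is no field in ModelEntry where it could be stored.
func buildEntry(displayName, modelName, endpoint, apiKey string) ModelEntry {
	_ = displayName // dropped: nothing persists it
	return ModelEntry{
		ModelName: modelName,
		LiteLLMParams: LiteLLMParams{
			Model:   "openai/" + modelName, // assumed provider prefix
			APIBase: endpoint,
			APIKey:  apiKey,
		},
	}
}

func main() {
	a := buildEntry("foo", "my/model", "http://localhost:8000/v1", "sk-test")
	b := buildEntry("bar", "my/model", "http://localhost:8000/v1", "sk-test")
	fmt.Println("entries identical:", a == b)
	fmt.Println("route key:", a.ModelName)
}
```

Two calls that differ only in the display name produce identical entries, which is the sense in which the flag was inert.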
This caused confusion in QA: an operator running

    obol model setup custom --name foo --model my/model ...

would later call the route as `foo` and get LiteLLM's

    BadRequestError: ... There are no healthy deployments for this model.

(This is the same error message that the operator at #v0.10.0-rc1-upgrade-report
attributed to a cache-survives-stack-up bug. Five fresh-cluster probes
on rc3 could not reproduce the cache bug; the consistent reproducer
turned out to be calling the route by the user-given `--name` rather
than the actual registered `--model` value.)
Changes:
- cmd/obol/model.go: drop --name flag from modelSetupCustomCommand
- internal/model/model.go: drop name parameter from AddCustomEndpoint;
fallback RestartLiteLLM label now uses modelName
- flows/lib.sh: route_llm_via_obol_cli no longer reads OBOL_LLM_NAME or
passes --name
- flows/buy-external.sh: OBOL_LLM_NAME env var removed (orphan)
- CLAUDE.md / monetize-guide SKILL.md / llm-routing.md: example commands
and env-var lists drop --name / OBOL_LLM_NAME
Update: added the frontend rc2 pin on top of this branch.
Update: rc2 exposed a cache-control issue in the admin shell ( I fixed that in frontend PR #338, merged it, and cut
Updated this PR from frontend v0.1.25-rc3 to v0.1.25-rc4.

Why rc4: rc3 was cut before frontend release aggregation PR #337 landed on main, so it had the admin HTML no-store fix but missed the consolidated marketplace/drain/CSRF/RBAC bundle. rc4 is tagged from current frontend main at 9a56791 and includes #337 plus the no-store header fix.

Frontend release: https://github.com/ObolNetwork/obol-stack-front-end/releases/tag/v0.1.25-rc4

Local checks on this branch:
Summary
What changed:
- Removed the `--name` flag from `obol model setup custom`.
- Removed the `name` parameter from the `model.AddCustomEndpoint(cfg, u, endpoint, modelName, apiKey)` signature.
- Removed the `OBOL_LLM_NAME` env var from `flows/lib.sh::route_llm_via_obol_cli` and `flows/buy-external.sh`.
- Updated the `CLAUDE.md`, `internal/embed/skills/monetize-guide/SKILL.md`, and `.agents/skills/obol-stack-dev/references/llm-routing.md` example commands and env-var lists.

Why it matters:
The `--name` flag was 100% inert: it never reached `ModelEntry`, never persisted to the LiteLLM ConfigMap, and never influenced `obol model list/prefer/status/sync/remove` or any routing. The string was echoed in two log lines and passed as a UI label to `RestartLiteLLM` on the fallback path. Nothing else. Its own help text described it as informational only and noted that LiteLLM keys the route by `--model`, not `--name`.

The trap: operators run `obol model setup custom --name foo --model my/model …`, then call the route as `foo`, and get LiteLLM's `BadRequestError: ... There are no healthy deployments for this model.`

This was the same error string in the v0.10.0-rc1 upgrade report attributed to a "cache survives `obol stack up`" bug. After five fresh-cluster probes on rc3 could not reproduce the cache bug, the reliable reproducer turned out to be calling the route by the user-given `--name` rather than the `--model` value LiteLLM actually keys on. Removing the flag eliminates that UX trap.

Risk level: low
Commit under test: b9ff172 (this PR), parent f8df92e (tag v0.10.0-rc3)
Base branch: main
Scope
Validation
CI checks:
- `go test ./cmd/obol/ ./internal/model/ -count=1`
- `bash -n flows/*.sh`
- (… `obol model setup custom` end-to-end)

Unit tests:
Integration tests:
Flow tests:
- `.tmp/release-smoke-20260521-134647/flow-02-stack-init-up.log`
- `.tmp/release-smoke-20260521-134647/flow-05-network.log`
- `.tmp/release-smoke-20260521-134647/flow-07-sell-verify.log`
- `.tmp/release-smoke-20260521-134647/flow-08-buy.log`
- `.tmp/release-smoke-20260521-134647/flow-09-lifecycle.log`
- `.tmp/release-smoke-20260521-134647/flow-10-anvil-facilitator.log`
- … `/v1/models` returns 401 to unauthenticated probe
- … `endpoint validation failed: ... context deadline exceeded`: Unsloth Studio 27B cold-start exceeds `ValidateCustomEndpoint`'s 60s timeout. CLI parsed args correctly; no `--name` regression.
- … `:11434`; QA host runs Unsloth
- … `obol model setup custom` validation. Preflight (Alice ETH, Bob USDC, facilitator) all PASS.

Release smoke:
Failure attribution: Zero failures involve `--name` parsing. `grep "unknown flag\|flag provided but not defined" .tmp/release-smoke-*/*.log` returns nothing. Every `obol model setup custom` invocation parsed arguments and reached endpoint validation correctly. All 5 failures are upstream of the CLI:
- … `/v1/models` requires bearer token; flow-01's simple unauthenticated probe gets 401.
- … `ValidateCustomEndpoint`'s 60s timeout. Surfaces in flow-03 and flow-11.
- `flow-06-sell-setup.sh` preflight checks `localhost:11434`; QA host runs Unsloth, not Ollama.

A vLLM/llama.cpp QA host without auth would not hit any of these. Six flows pass cleanly including all on-chain payment flows (flow-08 buy, flow-09 lifecycle).
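For readers who want the same failure-attribution check in a Go test helper rather than `grep`, a minimal sketch follows. The regular expression mirrors the two patterns in the `grep` command above; `firstFlagParseError` and the second sample log line are hypothetical and were written only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// flagParseError matches the two messages the grep in this PR searches for.
var flagParseError = regexp.MustCompile(`unknown flag|flag provided but not defined`)

// firstFlagParseError returns the first line that looks like a flag-parsing
// failure, and whether one was found.
func firstFlagParseError(lines []string) (string, bool) {
	for _, line := range lines {
		if flagParseError.MatchString(line) {
			return line, true
		}
	}
	return "", false
}

func main() {
	sample := []string{
		// Message quoted in this PR: an upstream failure, not a flag error.
		"endpoint validation failed: ... context deadline exceeded",
		// Invented line showing what a regression would look like.
		"flag provided but not defined: -name",
	}
	if line, found := firstFlagParseError(sample); found {
		fmt.Println("flag regression:", line)
	} else {
		fmt.Println("no flag-parsing errors")
	}
}
```

A real check would read the `.tmp/release-smoke-*/*.log` files line by line instead of using an in-memory sample.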
Live Chain Evidence
Do not include private keys, seed phrases, passwords, hostnames, personal paths, or raw bearer tokens.
Network: base-sepolia (flow-08, flow-11)
RPC/provider: default free-tier fallback (no paid RPC set this run)
Facilitator: `https://x402.gcp.obol.tech` (reachable, supports Base Sepolia exact)
Contracts and tokens:
- `0x036CbD53842c5426634e7929541eC2318f3dCF7e`

Wallet roles:
- `0xC0De030F6C37f490594F93fB99e2756703c4297E`
- `0x57b0eF875DeB5A37301F1640E469a2129Da9490E`

Balances:
- `0x036CbD…CF7e`

Transaction receipts: none on-chain this run (PR is CLI-only, no settlement path touched).
Runtime Evidence
QA environment:
Images:
- `nousresearch/hermes-agent`
- `ghcr.io/berriai/litellm:main-stable`
- `ghcr.io/obolnetwork/<name>:latest` (locally built, OBOL_FORCE_REBUILD_LOCAL_DEV_IMAGES=true)

Kubernetes / stack:
- `smart-dinosaur` (validation phase), `wondrous-crane` (release-smoke); both torn down
- `cleanup_stacks` trap removed test workspaces on exit

Model and routing:
- `kubectl get secret litellm-secrets -n llm`; Unsloth JWT from POST `/api/auth/login`

Artifacts and logs:
- `.tmp/release-smoke-20260521-134647/RELEASE_REPORT.md`
- `obol-0:qa.0`

Demo readiness:
Review Notes
Known gaps:
- Unsloth Studio is not a clean fit for `obol model setup custom`: it requires a bearer JWT and has slow cold-start for large GGUFs that exceeds the 60s validation timeout. Adding first-class Unsloth support (alongside Ollama) would let release-smoke run cleanly on hosts without vLLM. Out of scope for this PR.
- The `obol model setup custom` hot-add path fails with `[Errno 30] Read-only file system: '/etc/litellm/config.yaml'` because the ConfigMap is mounted RO. The CLI correctly falls back to a deployment restart, so users aren't blocked, but every custom-endpoint setup pays the full ~90s rollout cost. Worth a follow-up.
- `TestWarnIfNoChatModel_EmitsWarnWhenNoModels` in `internal/stack` is broken on `main` HEAD as of this PR (asserts warn-on-stderr but the message lands on stdout). Predates this branch.

Follow-ups:
- First-class Unsloth support in `obol model setup` (auth handling, longer first-call timeout).
- Fix `TestWarnIfNoChatModel_EmitsWarnWhenNoModels` (separate issue).
- `agent_render.go::agentPodSpec` still lacks an init container.

Reviewer focus:
- Confirm that the `AddCustomEndpoint(cfg, u, endpoint, modelName, apiKey)` signature (no `name`) is acceptable: the function is only called from `cmd/obol/model.go` and has no test mocks. Repo-wide grep returns zero stale callers.
- Confirm that the `RestartLiteLLM(cfg, u, modelName)` fallback label change is OK: the third arg is a UI-only string.
- Confirm that the `OBOL_LLM_NAME` removal from `flows/buy-external.sh` doesn't break any external automation referencing it (none found in this repo or in `~/.config`).