chore(model): remove inert --name flag from `obol model setup custom` by bussyjd · Pull Request #509 · ObolNetwork/obol-stack

bussyjd · 2026-05-21T10:09:26Z

Summary

What changed:

Removed --name flag from obol model setup custom.
Dropped name parameter from model.AddCustomEndpoint(cfg, u, endpoint, modelName, apiKey) signature.
Removed OBOL_LLM_NAME env var from flows/lib.sh::route_llm_via_obol_cli and flows/buy-external.sh.
Updated CLAUDE.md, internal/embed/skills/monetize-guide/SKILL.md, and .agents/skills/obol-stack-dev/references/llm-routing.md example commands and env-var lists.

Why it matters:
The --name flag was 100% inert — it never reached ModelEntry, never persisted to the LiteLLM ConfigMap, and never influenced obol model list/prefer/status/sync/remove or any routing. The string was echoed in two log lines and passed as a UI label to RestartLiteLLM on the fallback path. Nothing else. Its help text said "informational only — LiteLLM keys the route by --model, not --name".

The trap: operators run obol model setup custom --name foo --model my/model …, then call the route as foo, and get:

litellm.BadRequestError: You passed in model=foo.
There are no healthy deployments for this model.

This was the same error string in the v0.10.0-rc1 upgrade report attributed to a "cache survives obol stack up" bug. After five fresh-cluster probes on rc3 could not reproduce the cache bug, the reliable reproducer turned out to be calling the route by the user-given --name rather than the --model value LiteLLM actually keys on. Removing the flag eliminates that UX trap.
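
The trap can be illustrated with a toy registry in Go. This is a sketch only, not LiteLLM's or obol's actual code: `routeTable`, `register`, `resolve`, and the `example.invalid` endpoint are hypothetical names chosen for illustration. It assumes only what the description above states: routes are keyed by the `--model` value, and the `--name` label is never stored.

```go
package main

import (
	"errors"
	"fmt"
)

// routeTable is a hypothetical stand-in for the proxy's route list:
// entries are keyed by model name only.
type routeTable map[string]string // model name -> upstream endpoint

var errNoDeployment = errors.New("no healthy deployments for this model")

// register mimics the old CLI behaviour: the label is accepted but never stored.
func (t routeTable) register(label, model, endpoint string) {
	_ = label // inert: at most echoed in logs, never used as a key
	t[model] = endpoint
}

// resolve looks a route up by the name the caller sends as "model".
func (t routeTable) resolve(name string) (string, error) {
	endpoint, ok := t[name]
	if !ok {
		return "", fmt.Errorf("model=%s: %w", name, errNoDeployment)
	}
	return endpoint, nil
}

func main() {
	routes := routeTable{}
	routes.register("foo", "my/model", "http://example.invalid/v1")

	// Calling by the label fails because the label was never a key.
	if _, err := routes.resolve("foo"); err != nil {
		fmt.Println("by --name: ", err)
	}
	// Calling by the model value succeeds.
	endpoint, _ := routes.resolve("my/model")
	fmt.Println("by --model:", endpoint)
}
```

With the flag removed, the only identifier an operator supplies is the one the route is actually keyed on.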

Risk level: low

Commit under test: b9ff172 (this PR), parent f8df92e (tag v0.10.0-rc3)

Base branch: main

Scope

Validation

CI checks:

| Check | Status | Link |
| --- | --- | --- |
| Unit tests (touched packages) | ✅ pass | local: `go test ./cmd/obol/ ./internal/model/ -count=1` |
| Full unit suite | ⚠️ 1 pre-existing fail | local, see below |
| Shell syntax | ✅ pass | local: `bash -n flows/*.sh` |
| Release-smoke (flows 01-12) | ⚠️ 5 fails, all env-related | local, see report below |
| Live cluster smoke (`obol model setup custom` end-to-end) | ✅ pass | local |

Unit tests:

$ go test ./... -count=1
ok      github.com/ObolNetwork/obol-stack/cmd/obol      1.476s
ok      github.com/ObolNetwork/obol-stack/internal/model      0.731s
ok      ... (39 packages total)
FAIL    github.com/ObolNetwork/obol-stack/internal/stack      8.202s
  └── TestWarnIfNoChatModel_EmitsWarnWhenNoModels

PRE-EXISTING: Reproduces on clean `main` HEAD (f8df92e / v0.10.0-rc3 tag)
with this PR's changes stashed. Test asserts the warn is on stderr but
the message arrives on stdout. Unrelated to this PR.
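
The shape of that failure can be sketched as follows. This is not the code in internal/stack: `warnIfNoModels`, `capture`, and the warning text are hypothetical, and the sketch only assumes what is stated above, namely that the test checks stderr while the message is written to stdout.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// warnIfNoModels is a hypothetical stand-in for the function under test.
// It writes its warning to whichever writer the caller supplies.
func warnIfNoModels(models []string, w io.Writer) {
	if len(models) == 0 {
		fmt.Fprintln(w, "warning: no chat model configured")
	}
}

// capture runs the warning path and returns what landed on each stream.
// toStderr selects which buffer the warning is wired to.
func capture(toStderr bool) (stdout, stderr string) {
	var outBuf, errBuf bytes.Buffer
	if toStderr {
		warnIfNoModels(nil, &errBuf)
	} else {
		warnIfNoModels(nil, &outBuf)
	}
	return outBuf.String(), errBuf.String()
}

func main() {
	// Wired to stdout: an assertion on stderr finds nothing.
	out, errOut := capture(false)
	fmt.Printf("wired to stdout: stdout=%q stderr=%q\n", out, errOut)

	// Wired to stderr: the same assertion would pass.
	out, errOut = capture(true)
	fmt.Printf("wired to stderr: stdout=%q stderr=%q\n", out, errOut)
}
```

Either the writer or the assertion has to change for such a test to pass; which one is correct is a question for the follow-up issue, not this PR.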

Integration tests:

SKIPPED — internal/openclaw integration tests expect host Ollama on
:11434; QA host runs Unsloth Studio on :8888 instead. Not exercised
by this change either way (--name was never used by Ollama setup path).

Flow tests:

| Flow | Network | QA machine label | Worktree | Result | Artifacts / notes |
| --- | --- | --- | --- | --- | --- |
| flow-02-stack-init-up | n/a | macOS Docker Desktop | none (in-tree) | PASS | `.tmp/release-smoke-20260521-134647/flow-02-stack-init-up.log` |
| flow-05-network | n/a | macOS Docker Desktop | none (in-tree) | PASS | `.tmp/release-smoke-20260521-134647/flow-05-network.log` |
| flow-07-sell-verify | n/a | macOS Docker Desktop | none (in-tree) | PASS | `.tmp/release-smoke-20260521-134647/flow-07-sell-verify.log` |
| flow-08-buy | base-sepolia | macOS Docker Desktop | none (in-tree) | PASS | `.tmp/release-smoke-20260521-134647/flow-08-buy.log` |
| flow-09-lifecycle | n/a | macOS Docker Desktop | none (in-tree) | PASS | `.tmp/release-smoke-20260521-134647/flow-09-lifecycle.log` |
| flow-10-anvil-facilitator | local | macOS Docker Desktop | none (in-tree) | PASS | `.tmp/release-smoke-20260521-134647/flow-10-anvil-facilitator.log` |
| flow-01-prerequisites | n/a | macOS Docker Desktop | none (in-tree) | FAIL (env) | Unsloth `/v1/models` returns 401 to unauthenticated probe |
| flow-03-inference | n/a | macOS Docker Desktop | none (in-tree) | FAIL (env) | `endpoint validation failed: ... context deadline exceeded`: Unsloth Studio 27B cold-start exceeds `ValidateCustomEndpoint`'s 60s timeout. CLI parsed args correctly, no `--name` regression. |
| flow-04-agent | n/a | macOS Docker Desktop | none (in-tree) | FAIL (env) | cascades from flow-03 (no model registered) |
| flow-06-sell-setup | n/a | macOS Docker Desktop | none (in-tree) | FAIL (env) | hard-coded preflight for Ollama at `:11434`; QA host runs Unsloth |
| flow-11-dual-stack | base-sepolia | macOS Docker Desktop | none (in-tree) | FAIL (env) | same Unsloth cold-start timeout during alice's `obol model setup custom` validation. Preflight (Alice ETH, Bob USDC, facilitator) all PASS. |

Release smoke:

$ OBOL_LLM_ENDPOINT=http://host.k3d.internal:8888/v1 \
  OBOL_LLM_MODEL=unsloth/Qwen3.6-27B-MTP-GGUF \
  OBOL_LLM_API_KEY=<unsloth-studio-jwt> \
  bash flows/release-smoke.sh

| Flow                       | Result | FAIL lines | SKIP lines | Exit code |
| -------------------------- | ------ | ---------: | ---------: | --------: |
| flow-01-prerequisites      | FAIL   |          1 |          0 |         1 |
| flow-02-stack-init-up      | PASS   |          0 |          0 |         0 |
| flow-03-inference          | FAIL   |          6 |          0 |         1 |
| flow-04-agent              | FAIL   |          1 |          0 |         1 |
| flow-05-network            | PASS   |          0 |          0 |         0 |
| flow-06-sell-setup         | FAIL   |          1 |          0 |         1 |
| flow-07-sell-verify        | PASS   |          0 |          0 |         0 |
| flow-10-anvil-facilitator  | PASS   |          0 |          0 |         0 |
| flow-08-buy                | PASS   |          0 |          0 |         0 |
| flow-09-lifecycle          | PASS   |          0 |          0 |         0 |
| flow-11-dual-stack         | FAIL   |          0 |          0 |         1 |

Release smoke failed: 5 flow(s)

Failure attribution: Zero failures involve `--name` parsing. `grep "unknown flag\|flag provided but not defined" .tmp/release-smoke-*/*.log` returns nothing. Every `obol model setup custom` invocation parsed arguments and reached endpoint validation correctly. All 5 failures are upstream of the CLI:

Unsloth Studio auth — /v1/models requires bearer token; flow-01's simple unauthenticated probe gets 401.
Unsloth Studio cold-start — first inference call on the 27B GGUF triggers model load, exceeding ValidateCustomEndpoint's 60s timeout. Surfaces in flow-03 and flow-11.
flow-06 Ollama hardcode — flow-06-sell-setup.sh preflight checks localhost:11434; QA host runs Unsloth, not Ollama.
flow-04 — cascades from flow-03 leaving LiteLLM without the routed model.

A vLLM/llama.cpp QA host without auth would not hit any of these. Six flows pass cleanly including all on-chain payment flows (flow-08 buy, flow-09 lifecycle).
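
The cold-start failure mode is an ordinary deadline race, which the following Go sketch reproduces at small scale. It is not `ValidateCustomEndpoint` itself: `probe` and `coldStartDemo` are hypothetical helpers, the backend is a local `httptest` server, and the timings (200ms load, 50ms vs 2s deadline) are scaled-down assumptions standing in for the model load and the 60s timeout.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// probe issues one GET bounded by a deadline, loosely mirroring a validation call.
func probe(url string, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	return nil
}

// coldStartDemo probes a deliberately slow backend twice: once with a deadline
// shorter than the simulated load time, once with a longer one.
func coldStartDemo() (shortTimedOut bool, longErr error) {
	slow := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-time.After(200 * time.Millisecond): // simulated model load
			w.WriteHeader(http.StatusOK)
		case <-r.Context().Done(): // client gave up first
		}
	}))
	defer slow.Close()

	shortErr := probe(slow.URL, 50*time.Millisecond)
	shortTimedOut = errors.Is(shortErr, context.DeadlineExceeded)
	longErr = probe(slow.URL, 2*time.Second)
	return shortTimedOut, longErr
}

func main() {
	timedOut, longErr := coldStartDemo()
	fmt.Println("short deadline hit context deadline exceeded:", timedOut)
	fmt.Println("long deadline error:", longErr)
}
```

The same endpoint passes or fails purely on the ratio of load time to deadline, which is why these failures are attributed to the environment rather than to argument parsing.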

Live Chain Evidence

Do not include private keys, seed phrases, passwords, hostnames, personal paths, or raw bearer tokens.

Network: base-sepolia (flow-08, flow-11)

RPC/provider: default free-tier fallback (no paid RPC set this run)

Facilitator: https://x402.gcp.obol.tech (reachable, supports Base Sepolia exact)

Contracts and tokens:

| Name | Address | Version / notes |
| --- | --- | --- |
| USDC (Base Sepolia) | `0x036CbD53842c5426634e7929541eC2318f3dCF7e` | facilitator-default |

Wallet roles:

| Role | Address | Source |
| --- | --- | --- |
| Alice / seller / register | `0xC0De030F6C37f490594F93fB99e2756703c4297E` | flow-11 derived from REMOTE_SIGNER_PRIVATE_KEY |
| Bob / buyer / payer | `0x57b0eF875DeB5A37301F1640E469a2129Da9490E` | flow-11 derived from REMOTE_SIGNER_PRIVATE_KEY (2nd derive) |
| Facilitator / receiver | n/a | hosted x402-rs |

Balances:

| Token | Address | Before | After | Expected delta | Actual delta |
| --- | --- | --- | --- | --- | --- |
| USDC | `0x036CbD…CF7e` | Bob 4.95 USDC | Bob 4.95 USDC | 0 (no purchase fired; flow-11 failed at LLM setup) | 0 |

Transaction receipts: none on-chain this run (PR is CLI-only, no settlement path touched).

Runtime Evidence

QA environment:

| Item | Value |
| --- | --- |
| OS / arch | macOS Darwin 25.5.0 / arm64 |
| Backend | Docker Desktop + k3d v5.8.3 |
| Tool versions | go1.25.5, k3d 5.8.3, helm 3.x, helmfile 1.4.x, kubectl 1.35.x |
| QA agent/model | Hermes (nousresearch/hermes-agent:v2026.5.7) + Unsloth Studio serving unsloth/Qwen3.6-27B-MTP-GGUF |

Images:

| Component | Image | Tag / digest | Source |
| --- | --- | --- | --- |
| obol-agent (Hermes) | `nousresearch/hermes-agent` | v2026.5.7 | docker.io |
| LiteLLM | `ghcr.io/berriai/litellm:main-stable` | upstream | ghcr |
| x402-verifier / serviceoffer-controller / x402-buyer / demo-server / public-storefront | `ghcr.io/obolnetwork/<name>` | `:latest` (locally built, OBOL_FORCE_REBUILD_LOCAL_DEV_IMAGES=true) | in-tree Dockerfiles |

Kubernetes / stack:

| Item | Value |
| --- | --- |
| Stack IDs | `smart-dinosaur` (validation phase), `wondrous-crane` (release-smoke), both torn down |
| Namespaces | standard set (llm, x402, hermes-obol-agent, erpc, monitoring, traefik, obol-frontend) |
| Pod readiness | all default infra pods Running 2/2 or 1/1 during validation |
| Cleanup result | release-smoke's `cleanup_stacks` trap removed test workspaces on exit |

Model and routing:

| Item | Value |
| --- | --- |
| Agent/model used | Hermes → LiteLLM → claude-sonnet-4-6 (Anthropic via ANTHROPIC_API_KEY) and unsloth/Qwen3.6-27B-MTP-GGUF (Unsloth Studio on host) |
| LiteLLM route | claude-sonnet-4-6 + paid/* + anthropic/* + unsloth/Qwen3.6-27B-MTP-GGUF |
| Paid endpoint status | not exercised this PR |
| Auth token source | LITELLM_MASTER_KEY from `kubectl get secret litellm-secrets -n llm`; Unsloth JWT from POST `/api/auth/login` |

Artifacts and logs:

| Artifact | Location / link | Notes |
| --- | --- | --- |
| Release-smoke run | `.tmp/release-smoke-20260521-134647/` | 11 per-flow logs + `RELEASE_REPORT.md` |
| Pre-merge live smoke | tmux pane `obol-0:qa.0` | 4 chat probes, all HTTP 200 |

Demo readiness:

| Item | Status | Notes |
| --- | --- | --- |
| Seller visible / registered | n/a | not in scope of this PR |
| Buyer discovery works | ✅ | flow-08 buy and flow-09 lifecycle both PASS in release-smoke |
| Paid route works | ✅ | flow-08 PASS |
| Settlement visible on-chain | n/a | no settlement triggered this PR |

Review Notes

Known gaps:

Unsloth Studio is not natively supported by obol model setup custom: it requires a bearer JWT, and its cold start for large GGUFs exceeds the 60s validation timeout. Adding first-class Unsloth support (alongside Ollama) would let release-smoke run cleanly on hosts without vLLM. Out of scope for this PR.
A separate bug surfaced during validation: obol model setup custom hot-add path fails with [Errno 30] Read-only file system: '/etc/litellm/config.yaml' because the ConfigMap is mounted RO. The CLI correctly falls back to a deployment restart, so users aren't blocked, but every custom-endpoint setup pays the full ~90s rollout cost. Worth a follow-up.
Pre-existing unit-test failure: TestWarnIfNoChatModel_EmitsWarnWhenNoModels in internal/stack is broken on main HEAD as of this PR (asserts warn-on-stderr but the message lands on stdout). Predates this branch.

Follow-ups:

Native Unsloth support in obol model setup (auth handling, longer first-call timeout).
Fix hot-add by writing the merged config to an emptyDir or making the mount RW.
Fix TestWarnIfNoChatModel_EmitsWarnWhenNoModels (separate issue).
Re-validate Issue #2 (per-agent Hermes crashloop) on a Linux k3d host. It cannot be reproduced on macOS Docker Desktop due to virtiofs ownership translation, but source inspection on rc3 confirms agent_render.go::agentPodSpec still lacks an init container.

Reviewer focus:

Confirm the signature change AddCustomEndpoint(cfg, u, endpoint, modelName, apiKey) (no name) is acceptable — the function is only called from cmd/obol/model.go and has no test mocks. Repo-wide grep returns zero stale callers.
Confirm the RestartLiteLLM(cfg, u, modelName) fallback label change is OK — the third arg is a UI-only string.
Confirm OBOL_LLM_NAME removal from flows/buy-external.sh doesn't break any external automation referencing it (none found in this repo or in ~/.config).

The `--name` flag on `obol model setup custom` was documented as informational only and never participated in any routing or persistence:

- ModelEntry has only `model_name` (route key) + `litellm_params`; the CLI `--name` value was never written to either.
- `detectProvider` (used by `obol model list/status`) inspects `entry.ModelName` + `entry.LiteLLMParams.Model` prefixes; the `--name` string never reached it.
- It was only echoed back in two log lines and passed as a UI label to `RestartLiteLLM` on the hot-add fallback path.

This caused confusion in QA: an operator running obol model setup custom --name foo --model my/model ... would later call the route as `foo` and get LiteLLM's BadRequestError: ... There are no healthy deployments for this model. (The same error message the operator at #v0.10.0-rc1-upgrade-report attributed to a cache-survives-stack-up bug. Five fresh-cluster probes on rc3 could not reproduce the cache bug; the consistent reproducer turned out to be calling the route by the user-given `--name` rather than the actual registered `--model` value.)

Changes:

- cmd/obol/model.go: drop --name flag from modelSetupCustomCommand
- internal/model/model.go: drop name parameter from AddCustomEndpoint; fallback RestartLiteLLM label now uses modelName
- flows/lib.sh: route_llm_via_obol_cli no longer reads OBOL_LLM_NAME or passes --name
- flows/buy-external.sh: OBOL_LLM_NAME env var removed (orphan)
- CLAUDE.md / monetize-guide SKILL.md / llm-routing.md: example commands and env-var lists drop --name / OBOL_LLM_NAME

bussyjd · 2026-05-24T13:41:58Z

Update: added the frontend rc2 pin on top of this branch.

obol-stack-front-end release: v0.1.25-rc2
frontend merge commit/tag SHA: ad5816c3d29054ab4811633832ffb958cc4b02cc
Docker multi-arch index digest: sha256:0a54d01401256c70a21d03ea348d4f2a449c30e4ee2e8a530b3e1f3a4c0cf327
obol-stack commit: ffd2d8e (updates internal/embed/infrastructure/values/obol-frontend.yaml.gotmpl)
local validation: go build ./... and go test ./internal/embed/... pass
running cluster obol-stack-working-longhorn has been rolled to the same digest and is 1/1 ready.

bussyjd · 2026-05-24T14:11:32Z

Update: rc2 exposed a cache-control issue in the admin shell (Cache-Control: s-maxage=31536000 on prerendered HTML), which is why the browser could keep showing the old frontend even after the pod image changed.

I fixed that in frontend PR #338, merged it, and cut v0.1.25-rc3. This PR is now repinned from rc2 to rc3.

obol-stack-front-end release: v0.1.25-rc3
frontend commit/tag SHA: e75272363a97809aba27333e2d9d4930b7e1710c
Docker multi-arch index digest: sha256:6b7cde94dc73e877d7a3888b055914343e2237ad282652734260554c7eeb8db3
obol-stack commit: 3a8ef47
local validation: go build ./... and go test ./internal/embed/... pass
running cluster obol-stack-working-longhorn has been rolled to the rc3 digest and is 1/1 ready
verified / and /marketplace now return Cache-Control: no-store, no-cache, max-age=0, must-revalidate.
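
The before/after header values quoted in this comment can be checked with a small helper. This is a minimal sketch, not the verification actually run: `isUncacheable` is a hypothetical function, and treating the presence of `no-store` as the pass criterion is a simplifying assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// isUncacheable reports whether a Cache-Control header value carries the
// no-store directive. Directive names are matched case-insensitively.
func isUncacheable(cacheControl string) bool {
	for _, directive := range strings.Split(cacheControl, ",") {
		if strings.EqualFold(strings.TrimSpace(directive), "no-store") {
			return true
		}
	}
	return false
}

func main() {
	// Header values as quoted for rc2 and rc3 respectively.
	rc2 := "s-maxage=31536000"
	rc3 := "no-store, no-cache, max-age=0, must-revalidate"

	fmt.Println("rc2 uncacheable:", isUncacheable(rc2))
	fmt.Println("rc3 uncacheable:", isUncacheable(rc3))
}
```

A check like this could be pointed at the `Cache-Control` header returned for `/` and `/marketplace` after each frontend bump, so that a long `s-maxage` on prerendered HTML is caught before the pin lands.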

bussyjd · 2026-05-24T14:28:22Z

Updated this PR from frontend v0.1.25-rc3 to v0.1.25-rc4.

Why rc4: rc3 was cut before frontend release aggregation PR #337 landed on main, so it had the admin HTML no-store fix but missed the consolidated marketplace/drain/CSRF/RBAC bundle. rc4 is tagged from current frontend main at 9a56791 and includes #337 plus the no-store header fix.

Frontend release: https://github.com/ObolNetwork/obol-stack-front-end/releases/tag/v0.1.25-rc4
Docker index digest: sha256:143633300757bec467a8818aa8aa99ec30d70f5096ffe4a075e66b6adc6014a0

Local checks on this branch:

go build ./...
go test ./internal/embed/...

bussyjd added 2 commits May 21, 2026 14:07

Merge branch 'main' into chore/remove-unused-model-name-flag

5310f0d

bussyjd mentioned this pull request May 24, 2026

fix: resolve marketplace bundle architecture blockers #541

Merged

chore(frontend): bump to v0.1.25-rc2

ffd2d8e

chore(frontend): bump to v0.1.25-rc3

3a8ef47

chore(frontend): bump to v0.1.25-rc4

b28b169

bussyjd wants to merge 5 commits into main from chore/remove-unused-model-name-flag
