diff --git a/.agents/skills/obol-stack-dev/SKILL.md b/.agents/skills/obol-stack-dev/SKILL.md
index 33ab1661..4de88f55 100644
--- a/.agents/skills/obol-stack-dev/SKILL.md
+++ b/.agents/skills/obol-stack-dev/SKILL.md
@@ -2,7 +2,7 @@
 name: obol-stack-dev
 description: Obol Stack development and QA runbook. Use when working on obol-stack flows, x402 seller/buyer tests, live Base Sepolia OBOL smoke, Anvil fork regressions, ERC-8004 registration, LiteLLM paid routing, release-smoke, cloudflared, Renovate image bumps, or remote QA worktrees.
 metadata:
-  version: "3.0.0"
+  version: "3.1.0"
   domain: infrastructure
   role: specialist
   scope: development-and-testing
@@ -17,6 +17,7 @@ Operational router. Load only the reference for the task. **Do not delegate unde
 | Need | Read |
 |---|---|
 | Local build, env vars, force-rebuild, CLI surface | `references/dev.md` |
+| PR trains, ordered merge/collapse, release candidate gate | `references/release-train.md` |
 | Release-smoke broken — what to check first | `references/release-smoke-debugging.md` |
 | Live OBOL smoke, flow choice, Bob derivation, success criteria | `references/paid-flows.md` |
 | LiteLLM model setup, paid/* route, port-forward | `references/llm-routing.md` |
diff --git a/.agents/skills/obol-stack-dev/references/release-train.md b/.agents/skills/obol-stack-dev/references/release-train.md
new file mode 100644
index 00000000..7c502c97
--- /dev/null
+++ b/.agents/skills/obol-stack-dev/references/release-train.md
@@ -0,0 +1,156 @@
+# PR And Release Train
+
+Use this when asked to review or merge a set of obol-stack PRs, pin a frontend RC, handle GHAS/Renovate comments, or cut a release candidate. This is the orchestration layer; load the other references for the specific smoke, LLM, paid-flow, or remote-QA details.
+
+## Inputs To Nail Down
+
+- PR range and exclusions, for example "all PRs greater than #509 except #542".
+- Target base branch and whether the work should merge existing PRs, collapse them, or open fix PRs.
+- Release tag, frontend image tag, and whether the release is draft, prerelease, or ready.
+- Validation target: local unit tests, running cluster upgrade, live OBOL smoke, fork smoke, or full `flows/release-smoke.sh`.
+- Any required OpenAI-compatible QA LLM endpoint and model. Keep endpoint details in the shell environment or private notes, not in skill files, commit messages, PR text, or release text.
+
+## Train Shape
+
+```mermaid
+flowchart LR
+    A["Inventory PRs and checks"] --> B["Architectural review"]
+    B --> C{"Incorrect or risky?"}
+    C -- "yes" --> D["Open fix/<topic> PR"]
+    C -- "no" --> E["Mark ready / merge in order"]
+    D --> F["Parallel targeted validation"]
+    F --> E
+    E --> G["Upgrade running cluster"]
+    G --> H["Release-smoke gate"]
+    H --> I{"Green enough to release?"}
+    I -- "no" --> J["Record blockers, do not claim green"]
+    I -- "yes" --> K["Template-based non-draft RC release"]
+```
+
+## Inventory
+
+Start with source-of-truth state, not memory:
+
+```bash
+gh pr list --state open --limit 100 --json number,title,headRefName,baseRefName,isDraft,mergeStateStatus,statusCheckRollup,updatedAt
+gh pr view <number> --json number,title,body,headRefName,baseRefName,isDraft,mergeStateStatus,statusCheckRollup,reviewDecision,commits,files,comments,reviews
+```
+
+Build a table with number, topic, branch, draft status, checks, review status, dependency order, and whether it changes runtime behavior, release artifacts, CI, chart manifests, or docs only.
+
+## Architectural Review
+
+For each PR, review the diff in dependency order. The decision is not "does it compile"; it is whether the change preserves the stack contracts:
+
+- No regression in public/private route boundaries. Frontend, eRPC, storefront, `/.well-known/agent-registration.json`, and `/skill.md` stay intentionally exposed; agent internals do not become public.
+- No loss of x402 semantics: `PurchaseRequest Ready=True`, paid HTTP 200, exact balance deltas, on-chain transfer, and buyer route hot-add remain required evidence.
+- No dev/prod image confusion. Under `OBOL_DEVELOPMENT=true`, running pods must use the local images intended by the branch.
+- No release-only migration or wrapper unless the repo already has a durable helper. Prefer release notes warnings and operator directions when the product is not yet production-released.
+- No narrowing of supported chain names, model endpoint forms, or URL forms unless the caller and tests prove the old form is dead.
+- No broad cleanup. Delete only clusters, worktrees, containers, or ports whose ownership is recorded by the current worktree or explicitly confirmed.
+
+Subagents are useful for sidecar trace work, but the main agent owns the final architectural judgement. Give subagents bounded questions such as "trace all callers of this field" or "verify this PR cannot expose a private route"; do not delegate the whole train.
+
+## Fix PRs
+
+When a PR is architecturally wrong, open a minimal fix branch:
+
+```bash
+git switch -c fix/<short-topic>
+```
+
+PR descriptions should be self-contained and should not mention Codex or local host details. Include:
+
+- What invariant was violated.
+- Why the fix is the smallest correct change.
+- A Mermaid diagram when the behavior crosses controllers, charts, tunnels, buyers, or releases.
+- Exact validation run and result.
+- Remaining risk or follow-up, if any.
+
+Diagram template:
+
+```mermaid
+sequenceDiagram
+    participant User
+    participant CLI as obol CLI
+    participant K8s as Kubernetes
+    participant Controller
+    participant Service as Runtime service
+    User->>CLI: command / upgrade / smoke
+    CLI->>K8s: apply intended manifests
+    K8s->>Controller: reconcile desired state
+    Controller->>Service: publish route or config
+    Service-->>User: validated behavior
+```
+
+## GHAS, Renovate, And Image Pins
+
+Treat bot comments as review input, not noise:
+
+- Read the exact comment and affected line before changing anything.
+- For GitHub Actions and third-party images, prefer current versions pinned by immutable SHA or digest when the repo pattern expects it.
+- Check whether Renovate has a matching manager/rule for frontend RC images and digest updates. If it failed to open a bump, fix the rule and validate it with the narrowest available Renovate config check.
+- For frontend RCs, verify both the repo pin and the running pod image/digest after cluster upgrade.
+- Do not mark the train done until PR checks and security comments are either fixed or explicitly documented as non-actionable with evidence.
+
+## Merge And Collapse Order
+
+Merge from the oldest/base dependency forward. After each merge or collapse step:
+
+```bash
+git fetch origin
+git log --oneline --decorate --graph --max-count=30 origin/main
+gh pr view <number> --json state,mergedAt,mergeCommit,isDraft,mergeStateStatus,statusCheckRollup
+```
+
+Before merging the next PR, confirm that the step just completed did not regress earlier behavior:
+
+- Branch head contains the expected commits and did not drop earlier fixes.
+- Required CI checks are complete or the reason for bypass is recorded.
+- Any running-cluster upgrade still points at the expected backend and frontend images.
+- Release notes and PR descriptions still match the final merged code, not an earlier draft.
+
+## Release Candidate Gate
+
+A release candidate is not ready just because the GitHub release exists. Gate it in this order:
+
+1. Start the body from `.github/release-template.md`.
+2. Keep the generated `What's Changed`, `New Contributors`, and `Full Changelog` sections at the bottom of the body.
+3. Include warnings and operator directions for known upgrade issues only after validating the upgrade path or explicitly labeling the warning as unverified.
+4. Run the smoke set required by the release. For full RCs, use `flows/release-smoke.sh` with live and fork flags when credentials and RPC capacity are available.
+5. Fill the release body with the actual smoke report: command, artifact path, pass/fail table, failed flow names, and current blockers.
+6. Only make the RC non-draft when the release body and validation evidence are complete.
+
+If any smoke flow fails, say exactly what failed. Do not present a release as green when the report is red or partially blocked.
+
+## Running-Cluster Upgrade Check
+
+Before testing an upgrade against a live local cluster:
+
+```bash
+k3d cluster list
+kubectl get pods -A
+kubectl get deploy -A -o wide
+```
+
+Identify the active stack ID, frontend image, backend component images, ports, and any parallel obol-stack clusters. Use tmux for long-running commands or shared sudo prompts. Clean up only stale stacks that are not the target and whose ownership is clear.
+
+After the upgrade:
+
+```bash
+kubectl get deploy -A -o wide
+kubectl get pods -A
+```
+
+Then run the targeted flow or the full release smoke. Record the log and artifact directory paths in the PR or release description.
+
+## Final Report
+
+End with a short, auditable status:
+
+- PRs reviewed, fixed, merged, skipped, or left blocked.
+- Bot comments resolved or remaining.
+- Image pins and Renovate rules checked.
+- Smoke command, report path, and pass/fail summary.
+- Release URL and draft/prerelease status.
+- Cleanup performed and any cluster/worktree intentionally left running.
diff --git a/.github/workflows/helm-template-smoke.yml b/.github/workflows/helm-template-smoke.yml
index 9c27bc5f..a4744a61 100644
--- a/.github/workflows/helm-template-smoke.yml
+++ b/.github/workflows/helm-template-smoke.yml
@@ -26,7 +26,7 @@ jobs:
       - name: Set up Helm
         uses: azure/setup-helm@dda3372f752e03dde6b3237bc9431cdc2f7a02a2 # v5.0.0
         with:
-          version: v3.20.1   # match obolup.sh pinned version
+          version: v3.21.0   # match obolup.sh pinned version
 
       - name: helm template ./base
         run: |
diff --git a/cmd/obol/model.go b/cmd/obol/model.go
index c47568ee..89c4fc4c 100644
--- a/cmd/obol/model.go
+++ b/cmd/obol/model.go
@@ -267,6 +267,7 @@ func modelSetupCustomCommand(cfg *config.Config) *cli.Command {
 			&cli.StringFlag{Name: "endpoint", Usage: "Full base URL (e.g. http://host:8000/v1)", Required: true},
 			&cli.StringFlag{Name: "model", Usage: "Model identifier at the endpoint — this is also the LiteLLM model_name the agent will call", Required: true},
 			&cli.StringFlag{Name: "api-key", Usage: "API key (optional, some endpoints don't require it)"},
+			&cli.BoolFlag{Name: "disable-thinking", Usage: "Ask the model not to use its thinking (extended reasoning) mode when answering"},
 			&cli.BoolFlag{Name: "no-sync", Usage: "Skip the agent model sync (batch with other model commands, then run `obol model sync` once)"},
 		},
 		Action: func(ctx context.Context, cmd *cli.Command) error {
@@ -275,7 +276,10 @@ func modelSetupCustomCommand(cfg *config.Config) *cli.Command {
 			modelName := cmd.String("model")
 			apiKey := cmd.String("api-key")
 
-			if err := model.AddCustomEndpoint(cfg, u, endpoint, modelName, apiKey); err != nil {
+			options := model.CustomEndpointOptions{
+				DisableThinking: cmd.Bool("disable-thinking"),
+			}
+			if err := model.AddCustomEndpointWithOptions(cfg, u, endpoint, modelName, apiKey, options); err != nil {
 				return err
 			}
 
diff --git a/flows/flow-01-prerequisites.sh b/flows/flow-01-prerequisites.sh
index db4f2a1a..be495055 100755
--- a/flows/flow-01-prerequisites.sh
+++ b/flows/flow-01-prerequisites.sh
@@ -9,8 +9,12 @@ run_step "Docker daemon running" docker info
 # LLM endpoint must be serving. Full QA uses an OpenAI-compatible
 # vLLM/llama.cpp endpoint; local development can still use Ollama.
 if [ -n "${OBOL_LLM_ENDPOINT:-}" ]; then
-    run_step_grep "OpenAI-compatible LLM endpoint serving models" "data|id" \
-        curl -sf "${OBOL_LLM_ENDPOINT%/}/models"
+    step "OpenAI-compatible LLM endpoint returns final chat content"
+    if preflight_openai_llm_endpoint; then
+        pass "LLM endpoint usable for model ${OBOL_LLM_MODEL:-qwen36-deep}"
+    else
+        fail "LLM endpoint did not pass OpenAI-compatible chat preflight"
+    fi
 else
     run_step_grep "Ollama serving models" "models" curl -sf http://localhost:11434/api/tags
 fi
diff --git a/flows/flow-03-inference.sh b/flows/flow-03-inference.sh
index 9233056c..ada2600d 100755
--- a/flows/flow-03-inference.sh
+++ b/flows/flow-03-inference.sh
@@ -60,22 +60,62 @@ else
 fi
 
 # §3d: Tool-call passthrough
+tool_call_name() {
+    python3 -c '
+import json
+import sys
+
+try:
+    data = json.load(sys.stdin)
+except Exception:
+    sys.exit(1)
+
+choices = data.get("choices") or []
+if not choices:
+    sys.exit(1)
+
+message = choices[0].get("message") or {}
+for call in message.get("tool_calls") or []:
+    function = call.get("function") or {}
+    if function.get("name") == "get_weather":
+        print("get_weather")
+        sys.exit(0)
+
+sys.exit(1)
+'
+}
+
 step "Tool-call passthrough"
 tool_out=$(curl -sf --max-time 120 -X POST http://localhost:8001/v1/chat/completions \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer $LITELLM_KEY" \
     -d '{
         "model":"'"$LITELLM_MODEL"'",
-        "messages":[{"role":"user","content":"What is the weather in London?"}],
+        "messages":[{"role":"user","content":"Call the get_weather tool for London. Do not answer in text."}],
         "tools":[{"type":"function","function":{"name":"get_weather","description":"Get current weather","parameters":{"type":"object","properties":{"location":{"type":"string"}},"required":["location"]}}}],
-        "max_tokens":100,"stream":false
+        "tool_choice":{"type":"function","function":{"name":"get_weather"}},
+        "temperature":0,"max_tokens":100,"stream":false
     }' 2>&1) || true
 
-if echo "$tool_out" | grep -q "tool_calls\|get_weather"; then
+if echo "$tool_out" | tool_call_name >/dev/null 2>&1; then
     pass "Tool-call passthrough works"
 else
-    # Small/local models may not reliably support tool calls — soft fail
-    fail "Tool-call not returned (model may not support it) — ${tool_out:0:200}"
+    # Some OpenAI-compatible endpoints accept tools but reject forced tool_choice.
+    tool_out=$(curl -sf --max-time 120 -X POST http://localhost:8001/v1/chat/completions \
+        -H "Content-Type: application/json" \
+        -H "Authorization: Bearer $LITELLM_KEY" \
+        -d '{
+            "model":"'"$LITELLM_MODEL"'",
+            "messages":[{"role":"user","content":"Call the get_weather tool with location London. Do not answer in text."}],
+            "tools":[{"type":"function","function":{"name":"get_weather","description":"Get current weather","parameters":{"type":"object","properties":{"location":{"type":"string"}},"required":["location"]}}}],
+            "temperature":0,"max_tokens":100,"stream":false
+        }' 2>&1) || true
+
+    if echo "$tool_out" | tool_call_name >/dev/null 2>&1; then
+        pass "Tool-call passthrough works"
+    else
+        fail "Tool-call not returned (model may not support it) — ${tool_out:0:200}"
+    fi
 fi
 
 cleanup_pid "$PF_PID"
diff --git a/flows/flow-04-agent.sh b/flows/flow-04-agent.sh
index bb593eb6..23154c2a 100755
--- a/flows/flow-04-agent.sh
+++ b/flows/flow-04-agent.sh
@@ -117,10 +117,11 @@ if [ -n "${OBOL_LLM_ENDPOINT:-}" ] && [ "$model_name" != "${OBOL_LLM_MODEL:-qwen
     exit 0
 fi
 
+llm_payload_suffix="$(llm_disable_thinking_payload_suffix)"
 out=$(curl -sf --max-time 120 -X POST "http://localhost:${AGENT_PF_PORT}/v1/chat/completions" \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer $TOKEN" \
-    -d "{\"model\":\"$model_name\",\"messages\":[{\"role\":\"user\",\"content\":\"What is 2+2?\"}],\"max_tokens\":50,\"stream\":false}" 2>&1) || true
+    -d "{\"model\":\"$model_name\",\"messages\":[{\"role\":\"user\",\"content\":\"What is 2+2?\"}],\"max_tokens\":50,\"stream\":false${llm_payload_suffix}}" 2>&1) || true
 
 if echo "$out" | grep -q "choices"; then
     pass "Agent inference returned response"
@@ -138,7 +139,7 @@ step "Agent answers 'hello' without parroting tool catalogue (model rank regress
 hello_out=$(curl -sf --max-time 120 -X POST "http://localhost:${AGENT_PF_PORT}/v1/chat/completions" \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer $TOKEN" \
-    -d "{\"model\":\"$model_name\",\"messages\":[{\"role\":\"user\",\"content\":\"hello\"}],\"max_tokens\":150,\"stream\":false}" 2>&1) || true
+    -d "{\"model\":\"$model_name\",\"messages\":[{\"role\":\"user\",\"content\":\"hello\"}],\"max_tokens\":150,\"stream\":false${llm_payload_suffix}}" 2>&1) || true
 hello_content=$(echo "$hello_out" | python3 -c "
 import json, sys
 try:
diff --git a/flows/flow-06-sell-setup.sh b/flows/flow-06-sell-setup.sh
index 6882002a..1240f43e 100755
--- a/flows/flow-06-sell-setup.sh
+++ b/flows/flow-06-sell-setup.sh
@@ -20,14 +20,16 @@ else
     fail "CRD API group/version unexpected: group=$crd_group, version=$crd_version"
 fi
 run_step_grep "x402 verifier running" "Running" "$OBOL" kubectl get pods -n x402 --no-headers
-# x402-verifier has 2 replicas for high availability (CLAUDE.md: "2 replicas")
-step "x402-verifier has 2 replicas (high availability)"
+# The embedded x402 manifest intentionally runs one verifier replica in local
+# stacks. Keep the smoke assertion aligned with the shipped manifest; HA belongs
+# to production sizing, not the single-node release-smoke cluster.
+step "x402-verifier has 1 replica (local stack sizing)"
 verifier_replicas=$("$OBOL" kubectl get deployment x402-verifier -n x402 \
     -o jsonpath='{.spec.replicas}' 2>&1) || true
-if [ "$verifier_replicas" = "2" ]; then
-    pass "x402-verifier: 2 replicas (HA payment gate)"
+if [ "$verifier_replicas" = "1" ]; then
+    pass "x402-verifier: 1 replica (local payment gate)"
 else
-    fail "x402-verifier replica count: $verifier_replicas (expected 2)"
+    fail "x402-verifier replica count: $verifier_replicas (expected 1)"
 fi
 # x402-verifier service must be on port 8080 (matches ForwardAuth address :8080/verify)
 step "x402-verifier service on port 8080"
diff --git a/flows/flow-11-dual-stack.sh b/flows/flow-11-dual-stack.sh
index a0a19677..4fb48a8f 100755
--- a/flows/flow-11-dual-stack.sh
+++ b/flows/flow-11-dual-stack.sh
@@ -608,6 +608,8 @@ except Exception as e:
 wait_for_paid_inference() {
     local attempts="${1:-24}"
     local delay="${2:-5}"
+    local transient_retries="${PAID_INFERENCE_TRANSIENT_RETRIES:-1}"
+    local transient_seen=0
     local out=""
     local i
 
@@ -617,9 +619,14 @@ wait_for_paid_inference() {
             printf '%s\n' "$out"
             return 0
         fi
-        if echo "$out" | grep -q "Payment verification failed" || \
-           echo "$out" | grep -q "ERROR=503" || \
-           echo "$out" | grep -q "ServiceUnavailableError"; then
+        if echo "$out" | paid_inference_pending_error; then
+            sleep "$delay"
+            continue
+        fi
+        if echo "$out" | paid_inference_transient_error && [ "$transient_seen" -lt "$transient_retries" ]; then
+            transient_seen=$((transient_seen + 1))
+            echo "RETRY_TRANSIENT=${transient_seen}/${transient_retries}: paid inference hit transient timeout/error" >&2
+            printf '%s\n' "$out" >&2
             sleep "$delay"
             continue
         fi
@@ -1271,6 +1278,7 @@ else
 fi
 
 step "Bob's agent: discover Alice via ERC-8004 registry"
+llm_payload_suffix="$(llm_disable_thinking_payload_suffix)"
 discover_response=$(curl -sf --max-time 300 \
     -X POST "http://localhost:${BOB_AGENT_PORT}/v1/chat/completions" \
     -H "Authorization: Bearer $BOB_TOKEN" \
@@ -1282,7 +1290,7 @@ discover_response=$(curl -sf --max-time 300 \
             \"content\": \"Search the ERC-8004 agent identity registry on Base Sepolia for recently registered AI inference services that support x402 payments. Use the discovery skill to scan for agents. Look for one named 'Dual-Stack Test Inference' or similar with natural_language_processing skills. Report what you find — the agent ID, name, endpoint URL, and whether it supports x402.\"
         }],
         \"max_tokens\": 4000,
-	        \"stream\": false
+	        \"stream\": false${llm_payload_suffix}
 	    }" 2>&1 || true)
 
 discover_content=$(extract_assistant_content "$discover_response" 2>/dev/null || true)
@@ -1341,7 +1349,7 @@ else
                 \"content\": \"Use the buy-x402 skill and your terminal tool. Run exactly once: ERPC_URL=http://erpc.erpc.svc.cluster.local/rpc ERPC_NETWORK=base-sepolia python3 $BOB_OBOL_SKILLS_DIR/buy-x402/scripts/buy.py buy alice-inference --endpoint $TUNNEL_URL/services/alice-inference/v1/chat/completions --model $OBOL_LLM_MODEL --count $FLOW11_BUY_COUNT\"
             }],
             \"max_tokens\": 4000,
-	            \"stream\": false
+	            \"stream\": false${llm_payload_suffix}
 	        }" 2>&1 || true)
 
     buy_content=$(extract_assistant_content "$buy_response" 2>/dev/null || true)
diff --git a/flows/flow-13-dual-stack-obol.sh b/flows/flow-13-dual-stack-obol.sh
index 3b2cea6f..eac09169 100755
--- a/flows/flow-13-dual-stack-obol.sh
+++ b/flows/flow-13-dual-stack-obol.sh
@@ -870,29 +870,14 @@ else
 fi
 
 # ═════════════════════════════════════════════════════════════════
-# 34. AGENT DISCOVERS ALICE (via skill.md or ERC-8004)
+# 34. BOB AGENT POD DISCOVERS ALICE VIA SERVICE CATALOG
 # ═════════════════════════════════════════════════════════════════
 
-step "Bob's agent: discover Alice's OBOL service"
-discover_response=$(curl -sf --max-time 300 \
-    -X POST "http://localhost:${BOB_AGENT_PORT}/v1/chat/completions" \
-    -H "Authorization: Bearer $BOB_TOKEN" \
-    -H "Content-Type: application/json" \
-    -d "{
-        \"model\": \"$BOB_AGENT_RUNTIME-agent\",
-        \"messages\": [{
-            \"role\": \"user\",
-            \"content\": \"Search the local ERC-8004 registry on Base Sepolia (chain 84532) for the agent named 'Dual-Stack OBOL Test Inference'. Use the discovery skill or fetch $TUNNEL_URL/skill.md. Report the agent's ID, name, endpoint, and the asset symbol it requires for x402 payments.\"
-        }],
-        \"max_tokens\": 4000,
-        \"stream\": false
-    }" 2>&1 || true)
-discover_content=$(extract_assistant_content "$discover_response" 2>/dev/null || true)
-echo "${discover_content:0:500}"
-# Discovery is informational only on this flow. The structural proof that the
-# agent can reach Alice is the next "buy" step + the PurchaseRequest CR going
-# Ready=True. Natural-language assertions on agent responses are brittle.
-pass "Agent discovery prompt issued (success will be confirmed by buy + PurchaseRequest CR)"
+step "Bob's agent pod: discover Alice's OBOL service in /api/services.json"
+# This check proves Bob's agent pod can reach Alice's public catalog without
+# burning a long LLM turn. The structural agent proof remains the next step:
+# Hermes must invoke buy.py and create the PurchaseRequest.
+assert_bob_service_catalog_contains "alice-obol-inference" "OBOL"
 
 # ═════════════════════════════════════════════════════════════════
 # 35. BUY 5 AUTHS VIA buy.py (Permit2-aware on integration branch)
diff --git a/flows/flow-14-live-obol-base-sepolia.sh b/flows/flow-14-live-obol-base-sepolia.sh
index 4b0f5bb8..a5ee7ed8 100755
--- a/flows/flow-14-live-obol-base-sepolia.sh
+++ b/flows/flow-14-live-obol-base-sepolia.sh
@@ -924,29 +924,14 @@ else
 fi
 
 # ═════════════════════════════════════════════════════════════════
-# 29. AGENT DISCOVERS ALICE (via ERC-8004 / skill.md)
+# 29. BOB AGENT POD DISCOVERS ALICE VIA SERVICE CATALOG
 # ═════════════════════════════════════════════════════════════════
 
-step "Bob's agent: discover Alice's OBOL service"
-discover_response=$(curl -sf --max-time 300 \
-    -X POST "http://localhost:${BOB_AGENT_PORT}/v1/chat/completions" \
-    -H "Authorization: Bearer $BOB_TOKEN" \
-    -H "Content-Type: application/json" \
-    -d "{
-        \"model\": \"$BOB_AGENT_RUNTIME-agent\",
-        \"messages\": [{
-            \"role\": \"user\",
-            \"content\": \"Search the ERC-8004 registry on Base Sepolia for the agent named 'Live OBOL Base Sepolia Test Inference'. Use the discovery skill or fetch $TUNNEL_URL/skill.md. Report the agent's ID, name, endpoint, and the asset symbol it requires for x402 payments.\"
-        }],
-        \"max_tokens\": 4000,
-        \"stream\": false
-    }" 2>&1 || true)
-discover_content=$(extract_assistant_content "$discover_response" 2>/dev/null || true)
-echo "${discover_content:0:500}"
-# Discovery is informational only on this flow. The structural proof that the
-# agent can reach Alice is the next "buy" step + the PurchaseRequest CR going
-# Ready=True.
-pass "Agent discovery prompt issued (success will be confirmed by buy + PurchaseRequest CR)"
+step "Bob's agent pod: discover Alice's OBOL service in /api/services.json"
+# This check proves Bob's agent pod can reach Alice's public catalog without
+# burning a long LLM turn. The structural agent proof remains the next step:
+# Hermes must invoke buy.py and create the PurchaseRequest.
+assert_bob_service_catalog_contains "alice-obol-inference" "OBOL"
 
 # ═════════════════════════════════════════════════════════════════
 # 30. BUY 5 AUTHS VIA buy.py (Permit2-aware on integration branch)
diff --git a/flows/lib-dual-stack.sh b/flows/lib-dual-stack.sh
index 5d6bbea1..e650bc7a 100644
--- a/flows/lib-dual-stack.sh
+++ b/flows/lib-dual-stack.sh
@@ -329,6 +329,72 @@ except Exception as e:
 " 2>/dev/null || true
 }
 
+assert_bob_service_catalog_contains() {
+    local service_name="$1"
+    local token_symbol="$2"
+    local expected_path="${3:-/services/$service_name}"
+    local catalog_url="${TUNNEL_URL%/}/api/services.json"
+    local out i
+
+    out=""
+    for i in $(seq 1 12); do
+        out=$(bob kubectl exec -i -n "$BOB_AGENT_NS" "deploy/$BOB_AGENT_DEPLOY" -c "$BOB_AGENT_CONTAINER" -- \
+            env CATALOG_URL="$catalog_url" SERVICE_NAME="$service_name" TOKEN_SYMBOL="$token_symbol" EXPECTED_PATH="$expected_path" \
+            python3 - <<'PY' 2>&1 || true
+import json
+import os
+import sys
+import urllib.request
+from urllib.parse import urlparse
+
+url = os.environ["CATALOG_URL"]
+service_name = os.environ["SERVICE_NAME"]
+token_symbol = os.environ["TOKEN_SYMBOL"].upper()
+expected_path = os.environ["EXPECTED_PATH"]
+
+with urllib.request.urlopen(
+    urllib.request.Request(url, headers={"Accept": "application/json"}),
+    timeout=20,
+) as resp:
+    status = resp.status
+    services = json.loads(resp.read(200000))
+
+entry = next((svc for svc in services if svc.get("name") == service_name), None)
+if entry is None:
+    raise RuntimeError(f"{service_name} not present")
+
+asset = entry.get("asset") or {}
+endpoint_path = urlparse(entry.get("endpoint", "")).path
+problems = []
+if status != 200:
+    problems.append(f"HTTP {status}")
+if endpoint_path != expected_path:
+    problems.append(f"endpoint.path={endpoint_path!r}")
+if (asset.get("symbol") or "").upper() != token_symbol:
+    problems.append(f"asset.symbol={asset.get('symbol')!r}")
+if asset.get("transferMethod") != "permit2":
+    problems.append(f"asset.transferMethod={asset.get('transferMethod')!r}")
+if entry.get("network") != "base-sepolia" and entry.get("caip2Network") != "eip155:84532":
+    problems.append(f"network={entry.get('network')!r}/{entry.get('caip2Network')!r}")
+if problems:
+    raise RuntimeError("; ".join(problems))
+
+print(f"HTTP {status} {service_name} {entry.get('endpoint')} {asset.get('symbol')} {asset.get('transferMethod')}")
+PY
+        )
+        if printf '%s' "$out" | grep -q '^HTTP 200 '; then
+            echo "$out"
+            pass "Agent pod found $service_name ($token_symbol) in service catalog"
+            return 0
+        fi
+        echo "${out:0:500}"
+        sleep 5
+    done
+
+    fail "Agent pod could not find $service_name ($token_symbol) in $catalog_url"
+    emit_metrics; exit 1
+}
+
 purchase_request_status() {
     bob kubectl get purchaserequests.obol.org -n "$BOB_AGENT_NS" --no-headers 2>&1 || true
 }
@@ -347,23 +413,127 @@ except Exception as e:
 " 2>&1 || true
 }
 
-# Send the long single-shot buy prompt to Bob's agent. The prompt expands
-# against the caller's environment (BOB_AGENT_PORT, BOB_TOKEN,
-# BOB_AGENT_RUNTIME, BOB_OBOL_SKILLS_DIR, TUNNEL_URL, OBOL_LLM_MODEL).
-_agent_buy_send_prompt() {
-    curl -sf --max-time 300 \
+AGENT_CHAT_HTTP_STATUS=""
+AGENT_CHAT_CURL_EXIT=""
+AGENT_CHAT_ERROR=""
+AGENT_CHAT_BODY=""
+
+_agent_chat_payload() {
+    local prompt="$1"
+    local max_tokens="${2:-4000}"
+
+    DUAL_STACK_AGENT_MODEL="$BOB_AGENT_RUNTIME-agent" \
+    DUAL_STACK_AGENT_PROMPT="$prompt" \
+    DUAL_STACK_AGENT_MAX_TOKENS="$max_tokens" \
+    DUAL_STACK_DISABLE_THINKING="${OBOL_LLM_DISABLE_THINKING:-false}" \
+        python3 - <<'PY'
+import json
+import os
+
+payload = {
+    "model": os.environ["DUAL_STACK_AGENT_MODEL"],
+    "messages": [{"role": "user", "content": os.environ["DUAL_STACK_AGENT_PROMPT"]}],
+    "max_tokens": int(os.environ.get("DUAL_STACK_AGENT_MAX_TOKENS") or "4000"),
+    "stream": False,
+}
+if os.environ.get("DUAL_STACK_DISABLE_THINKING") == "true":
+    payload["chat_template_kwargs"] = {"enable_thinking": False}
+print(json.dumps(payload, separators=(",", ":")))
+PY
+}
+
+_agent_chat_send() {
+    local prompt="$1"
+    local max_tokens="${2:-4000}"
+    local timeout="${3:-300}"
+    local payload body_file err_file http_status rc
+
+    AGENT_CHAT_HTTP_STATUS=""
+    AGENT_CHAT_CURL_EXIT=""
+    AGENT_CHAT_ERROR=""
+    AGENT_CHAT_BODY=""
+
+    payload=$(_agent_chat_payload "$prompt" "$max_tokens") || return 1
+    body_file=$(mktemp)
+    err_file=$(mktemp)
+    rc=0
+    http_status=$(curl -sS --max-time "$timeout" \
+        -o "$body_file" \
+        -w "%{http_code}" \
         -X POST "http://localhost:${BOB_AGENT_PORT}/v1/chat/completions" \
         -H "Authorization: Bearer $BOB_TOKEN" \
         -H "Content-Type: application/json" \
-        -d "{
-            \"model\": \"$BOB_AGENT_RUNTIME-agent\",
-            \"messages\": [{
-                \"role\": \"user\",
-                \"content\": \"Use the buy-x402 skill and your terminal tool. Run exactly once: ERPC_URL=http://erpc.erpc.svc.cluster.local/rpc ERPC_NETWORK=base-sepolia python3 $BOB_OBOL_SKILLS_DIR/buy-x402/scripts/buy.py buy alice-obol --endpoint $TUNNEL_URL/services/alice-obol-inference/v1/chat/completions --model $OBOL_LLM_MODEL --count 5\"
-            }],
-            \"max_tokens\": 4000,
-            \"stream\": false
-        }" 2>&1 || true
+        --data-binary "$payload" 2>"$err_file") || rc=$?
+
+    AGENT_CHAT_HTTP_STATUS="$http_status"
+    AGENT_CHAT_CURL_EXIT="$rc"
+    AGENT_CHAT_ERROR="$(cat "$err_file" 2>/dev/null || true)"
+    AGENT_CHAT_BODY="$(cat "$body_file" 2>/dev/null || true)"
+    rm -f "$body_file" "$err_file"
+    return 0
+}
+
+_agent_chat_transient_error() {
+    {
+        printf 'HTTP_STATUS=%s\n' "$AGENT_CHAT_HTTP_STATUS"
+        printf 'CURL_EXIT=%s\n' "$AGENT_CHAT_CURL_EXIT"
+        printf '%s\n' "$AGENT_CHAT_ERROR"
+        printf '%s\n' "$AGENT_CHAT_BODY"
+    } | grep -qiE "HTTP_STATUS=503|CURL_EXIT=28|Loading model|ServiceUnavailableError|TimeoutError|timed out|context canceled|deadline exceeded|upstream request timeout"
+}
+
+_agent_chat_status_ok() {
+    [ "$AGENT_CHAT_CURL_EXIT" = "0" ] && [ "$AGENT_CHAT_HTTP_STATUS" = "200" ]
+}
+
+_agent_chat_failure_preview() {
+    printf 'http=%s curl=%s %s %s' \
+        "${AGENT_CHAT_HTTP_STATUS:-unknown}" \
+        "${AGENT_CHAT_CURL_EXIT:-unknown}" \
+        "${AGENT_CHAT_ERROR:0:180}" \
+        "${AGENT_CHAT_BODY:0:300}"
+}
+
+_agent_ready_preflight() {
+    local marker="OBOL_AGENT_READY"
+    local content i
+    local attempts="${AGENT_READY_PREFLIGHT_ATTEMPTS:-3}"
+    local timeout="${AGENT_READY_PREFLIGHT_TIMEOUT:-300}"
+
+    for i in $(seq 1 "$attempts"); do
+        _agent_chat_send "Reply exactly $marker" 64 "$timeout"
+        if _agent_chat_status_ok; then
+            content=$(extract_assistant_content "$AGENT_CHAT_BODY" 2>/dev/null || true)
+            if printf '%s' "$content" | grep -Fq "$marker"; then
+                echo "  Agent readiness preflight OK (attempt $i)"
+                return 0
+            fi
+            if _agent_chat_transient_error; then
+                echo "  Agent readiness transient (attempt $i/$attempts): ${content:0:300}"
+                sleep 10
+                continue
+            fi
+            fail "Agent readiness preflight returned unexpected content: ${content:0:300}"
+            emit_metrics; exit 1
+        fi
+        if _agent_chat_transient_error; then
+            echo "  Agent readiness transient (attempt $i/$attempts): $(_agent_chat_failure_preview)"
+            sleep 10
+            continue
+        fi
+        fail "Agent readiness preflight failed: $(_agent_chat_failure_preview)"
+        emit_metrics; exit 1
+    done
+
+    fail "Agent readiness preflight did not clear transient errors after $attempts attempts: $(_agent_chat_failure_preview)"
+    emit_metrics; exit 1
+}
+
+_agent_buy_send_prompt() {
+    _agent_chat_send \
+        "Use the buy-x402 skill and your terminal tool. Run exactly once: ERPC_URL=http://erpc.erpc.svc.cluster.local/rpc ERPC_NETWORK=base-sepolia python3 $BOB_OBOL_SKILLS_DIR/buy-x402/scripts/buy.py buy alice-obol --endpoint $TUNNEL_URL/services/alice-obol-inference/v1/chat/completions --model $OBOL_LLM_MODEL --count 5" \
+        4000 \
+        300
 }
 
 _agent_buy_pr_exists() {
@@ -371,62 +541,62 @@ _agent_buy_pr_exists() {
         -o name 2>/dev/null | grep -q .
 }
 
-# 1-retry wrapper for the agent buy prompt at flow-13/14 step 46. The QA LLM
-# (qwen36-deep, 27B-class — see OBOL_LLM_MODEL default) occasionally narrates a
-# fabricated failure on the long single-shot buy prompt instead of actually
-# invoking the bash tool. When that happens, no PurchaseRequest is created and
-# step 47 fails with "PurchaseRequest CR not ready" — even though buy.py was
-# never invoked. The smaller qwen36-fast (~4B) flakes much more often; deep is
-# the new default for that reason. See plans/inference-v1337-followup-20260514.md.
-#
-# Strategy: poll for the PR for up to 60s after the first prompt; if absent,
-# print a LOUD warning flagging this as agent unreliability and re-send the
-# prompt once. If still absent after the retry, step 47 fails as before.
+# 1-retry wrapper for the agent buy prompt at flow-13/14 step 46. The smoke
+# proof remains structural: Bob's agent must create the PurchaseRequest, then
+# the flow waits for Ready=True, sidecar auths, paid HTTP 200, settlement, and
+# exact balance deltas. This wrapper only adds an agent/LiteLLM readiness
+# preflight and bounded retries for transient model-loading failures.
 agent_buy_with_retry() {
-    local response content retried=0 i
+    local content attempt i max_attempts=2
 
-    response=$(_agent_buy_send_prompt)
-    content=$(extract_assistant_content "$response" 2>/dev/null || true)
-    echo "${content:0:500}"
-    if [ -z "$(printf '%s' "$content" | tr -d '[:space:]')" ]; then
-        echo "  ! Agent returned no final assistant text; confirming purchase via PurchaseRequest CR"
-    fi
-    if printf '%s' "$content" | agent_response_refused; then
-        fail "Agent refused to run buy.py: ${content:0:500}"
-        emit_metrics; exit 1
-    fi
+    _agent_ready_preflight
 
-    # Wait up to 60s for the controller to reconcile the PR. Healthy runs see
-    # it within ~5s; the long ceiling absorbs cluster-cold-start jitter.
-    for i in $(seq 1 12); do
-        _agent_buy_pr_exists && break
-        sleep 5
-    done
+    for attempt in $(seq 1 "$max_attempts"); do
+        _agent_buy_send_prompt
 
-    if ! _agent_buy_pr_exists; then
-        echo ""
-        echo "  ╔════════════════════════════════════════════════════════════════════════╗"
-        echo "  ║  WARN: agent did NOT create a PurchaseRequest after 60s.               ║"
-        echo "  ║  Documented LLM flake on the long single-shot buy prompt — agent       ║"
-        echo "  ║  narrated a fabricated failure instead of invoking buy.py.             ║"
-        echo "  ║  Re-prompting ONCE.                                                    ║"
-        echo "  ║  If this fires regularly: confirm OBOL_LLM_MODEL=qwen36-deep (default) ║"
-        echo "  ║  not qwen36-fast (4B), or escalate to qwen36-35b-heretic, or add a     ║"
-        echo "  ║  non-agent fallback path.                                              ║"
-        echo "  ║  Ref: plans/inference-v1337-followup-20260514.md                       ║"
-        echo "  ╚════════════════════════════════════════════════════════════════════════╝"
-        echo ""
-        retried=1
-        response=$(_agent_buy_send_prompt)
-        content=$(extract_assistant_content "$response" 2>/dev/null || true)
-        echo "  RETRY response: ${content:0:500}"
+        if ! _agent_chat_status_ok; then
+            if _agent_chat_transient_error && [ "$attempt" -lt "$max_attempts" ]; then
+                echo "  Agent buy transient (attempt $attempt/$max_attempts): $(_agent_chat_failure_preview)"
+                sleep 10
+                continue
+            fi
+            fail "Agent buy request failed: $(_agent_chat_failure_preview)"
+            emit_metrics; exit 1
+        fi
+
+        content=$(extract_assistant_content "$AGENT_CHAT_BODY" 2>/dev/null || true)
+        echo "${content:0:500}"
+        if [ -z "$(printf '%s' "$content" | tr -d '[:space:]')" ]; then
+            echo "  ! Agent returned no final assistant text; confirming purchase via PurchaseRequest CR"
+        fi
+        if _agent_chat_transient_error && [ "$attempt" -lt "$max_attempts" ]; then
+            echo "  Agent buy transient content (attempt $attempt/$max_attempts): ${content:0:300}"
+            sleep 10
+            continue
+        fi
         if printf '%s' "$content" | agent_response_refused; then
-            fail "Agent refused to run buy.py on retry: ${content:0:500}"
+            fail "Agent refused to run buy.py: ${content:0:500}"
             emit_metrics; exit 1
         fi
-    fi
 
-    pass "Agent buy prompt issued (retry=$retried; success will be confirmed by PurchaseRequest CR)"
+        # Wait up to 60s for buy.py to create the PurchaseRequest. Healthy runs
+        # see it within ~5s; the long ceiling absorbs cluster cold-start jitter.
+        for i in $(seq 1 12); do
+            if _agent_buy_pr_exists; then
+                pass "Agent buy created PurchaseRequest (attempt=$attempt; Ready=True confirmed next)"
+                return 0
+            fi
+            sleep 5
+        done
+
+        if [ "$attempt" -lt "$max_attempts" ]; then
+            echo "  ! Agent did not create PurchaseRequest after 60s; re-prompting once"
+            continue
+        fi
+    done
+
+    fail "Agent did not create PurchaseRequest after $max_attempts attempts"
+    emit_metrics; exit 1
 }
 
 extract_assistant_content() {
@@ -490,6 +660,8 @@ except Exception as e:
 wait_for_paid_inference() {
     local attempts="${1:-24}"
     local delay="${2:-5}"
+    local transient_retries="${PAID_INFERENCE_TRANSIENT_RETRIES:-1}"
+    local transient_seen=0
     local out=""
     local i
 
@@ -499,9 +671,14 @@ wait_for_paid_inference() {
             printf '%s\n' "$out"
             return 0
         fi
-        if echo "$out" | grep -q "Payment verification failed" || \
-           echo "$out" | grep -q "ERROR=503" || \
-           echo "$out" | grep -q "ServiceUnavailableError"; then
+        if echo "$out" | paid_inference_pending_error; then
+            sleep "$delay"
+            continue
+        fi
+        if echo "$out" | paid_inference_transient_error && [ "$transient_seen" -lt "$transient_retries" ]; then
+            transient_seen=$((transient_seen + 1))
+            echo "RETRY_TRANSIENT=${transient_seen}/${transient_retries}: paid inference hit transient timeout/error" >&2
+            printf '%s\n' "$out" >&2
             sleep "$delay"
             continue
         fi
diff --git a/flows/lib.sh b/flows/lib.sh
index e1121b48..87c831e7 100755
--- a/flows/lib.sh
+++ b/flows/lib.sh
@@ -564,6 +564,158 @@ bootstrap_flow_workspace() {
     done
 }
 
+# Validate that OBOL_LLM_ENDPOINT is OpenAI-compatible and returns final
+# assistant content for the configured OBOL_LLM_MODEL.
+#
+# Activated when OBOL_LLM_ENDPOINT is set (for example,
+# http://127.0.0.1:8000/v1 on a QA machine). The endpoint must be
+# OpenAI-compatible, such as vLLM or llama.cpp.
+# OBOL_LLM_MODEL is the upstream model id (default qwen36-deep, 27B-class).
+# qwen36-fast (4B) is faster but flakes on long single-shot agent prompts; see
+# the flow-13/14 step 46 retry-wrapper rationale in lib-dual-stack.sh.
+preflight_openai_llm_endpoint() {
+    local out rc
+
+    rc=0
+    out=$(OBOL_LLM_ENDPOINT="${OBOL_LLM_ENDPOINT:-}" \
+    OBOL_LLM_MODEL="${OBOL_LLM_MODEL:-qwen36-deep}" \
+    OBOL_LLM_API_KEY="${OBOL_LLM_API_KEY:-}" \
+    python3 - <<'PY' 2>&1
+import json
+import os
+import sys
+import urllib.error
+import urllib.request
+
+endpoint = os.environ["OBOL_LLM_ENDPOINT"].rstrip("/")
+model = os.environ["OBOL_LLM_MODEL"]
+api_key = os.environ.get("OBOL_LLM_API_KEY", "")
+marker = "OBOL_LLM_PREFLIGHT_OK"
+
+if not endpoint:
+    print("OBOL_LLM_ENDPOINT is empty", file=sys.stderr)
+    sys.exit(2)
+
+
+def request_json(path, payload=None, timeout=30):
+    data = None
+    headers = {}
+    method = "GET"
+    if payload is not None:
+        data = json.dumps(payload).encode()
+        headers["Content-Type"] = "application/json"
+        method = "POST"
+    if api_key:
+        headers["Authorization"] = "Bearer " + api_key
+    req = urllib.request.Request(endpoint + path, data=data, headers=headers, method=method)
+    try:
+        with urllib.request.urlopen(req, timeout=timeout) as resp:
+            body = resp.read()
+            return json.loads(body.decode() or "{}")
+    except urllib.error.HTTPError as exc:
+        body = exc.read().decode(errors="replace")[:300]
+        raise RuntimeError(f"HTTP {exc.code}: {body}") from None
+    except urllib.error.URLError as exc:
+        raise RuntimeError(f"network error: {exc.reason}") from None
+    except json.JSONDecodeError as exc:
+        raise RuntimeError(f"invalid JSON response: {exc}") from None
+
+
+def model_ids(models_body):
+    ids = []
+    data = models_body.get("data")
+    if isinstance(data, list):
+        for item in data:
+            if isinstance(item, dict) and isinstance(item.get("id"), str):
+                ids.append(item["id"])
+    return ids
+
+
+def content_from_message(message):
+    content = message.get("content") or ""
+    if isinstance(content, list):
+        parts = []
+        for part in content:
+            if isinstance(part, dict) and isinstance(part.get("text"), str):
+                parts.append(part["text"])
+            elif isinstance(part, str):
+                parts.append(part)
+        content = " ".join(parts) if parts else json.dumps(content, separators=(",", ":"))
+    return " ".join(str(content).split())
+
+
+def chat(disable_thinking):
+    payload = {
+        "model": model,
+        "messages": [
+            {"role": "user", "content": f"Reply exactly: {marker}"}
+        ],
+        "temperature": 0,
+        "max_tokens": 64,
+        "stream": False,
+    }
+    if disable_thinking:
+        payload["chat_template_kwargs"] = {"enable_thinking": False}
+    body = request_json("/chat/completions", payload=payload, timeout=75)
+    choices = body.get("choices")
+    if not choices:
+        raise RuntimeError("chat response has no choices")
+    message = choices[0].get("message") or {}
+    content = content_from_message(message)
+    reasoning = message.get("reasoning_content") or message.get("reasoning") or ""
+    return content, bool(reasoning)
+
+
+errors = []
+try:
+    ids = model_ids(request_json("/models", timeout=20))
+except Exception as exc:
+    print(f"LLM preflight failed: /models unavailable ({exc})", file=sys.stderr)
+    sys.exit(1)
+
+if ids and model not in ids:
+    sample = ", ".join(ids[:12])
+    more = "" if len(ids) <= 12 else f", ... ({len(ids)} total)"
+    print(f"LLM preflight failed: model {model!r} not listed by /models (saw: {sample}{more})", file=sys.stderr)
+    sys.exit(1)
+
+for disable_thinking in (False, True):
+    try:
+        content, reasoning = chat(disable_thinking)
+    except Exception as exc:
+        errors.append(f"disable_thinking={disable_thinking}: {exc}")
+        continue
+    if content and marker in content:
+        suffix = " with enable_thinking=false" if disable_thinking else ""
+        print(f"LLM_PREFLIGHT_OK model={model} content_chars={len(content)}{suffix}")
+        sys.exit(0)
+    if content:
+        errors.append(f"disable_thinking={disable_thinking}: final content missed marker: {content[:120]!r}")
+    elif reasoning:
+        errors.append(f"disable_thinking={disable_thinking}: reasoning was present but final content was empty")
+    else:
+        errors.append(f"disable_thinking={disable_thinking}: final content was empty")
+
+print("LLM preflight failed: /chat/completions did not return usable final content", file=sys.stderr)
+for err in errors:
+    print("  - " + err, file=sys.stderr)
+sys.exit(1)
+PY
+) || rc=$?
+
+    printf '%s\n' "$out"
+    if [ "$rc" -eq 0 ] && echo "$out" | grep -q "enable_thinking=false"; then
+        export OBOL_LLM_DISABLE_THINKING=true
+    fi
+    return "$rc"
+}
+
+llm_disable_thinking_payload_suffix() {
+    if [ "${OBOL_LLM_DISABLE_THINKING:-false}" = "true" ]; then
+        printf ',"chat_template_kwargs":{"enable_thinking":false}'
+    fi
+}
+
 # Repoint a stack at a QA LLM via the canonical `obol model` CLI.
 #
 # Activated when OBOL_LLM_ENDPOINT is set (for example,
@@ -577,6 +729,10 @@ bootstrap_flow_workspace() {
 # helmfile rollout at the end):
 #   1. obol model setup custom --endpoint … --model … --no-sync
 #      (validates the endpoint, patches LiteLLM, hot-adds the model.)
+#      If the LLM preflight proved the endpoint needs enable_thinking=false,
+#      the route stores that provider-specific extra body in LiteLLM so agent
+#      calls inherit it too; callers like Hermes do not preserve arbitrary
+#      request extension fields.
 #   2. obol model prefer <model> --no-sync
 #      (configured LiteLLM order is the primary-model contract.)
 #   3. obol model sync
@@ -594,6 +750,9 @@ route_llm_via_obol_cli() {
         if [ -n "${OBOL_LLM_API_KEY:-}" ]; then
             args+=(--api-key "$OBOL_LLM_API_KEY")
         fi
+        if [ "${OBOL_LLM_DISABLE_THINKING:-false}" = "true" ]; then
+            args+=(--disable-thinking)
+        fi
         $runner "${args[@]}" || return 1
         $runner model prefer "$model" --no-sync || return 1
 
@@ -888,6 +1047,14 @@ paid_inference_content_invalid() {
     grep -qiE "thinking process|analy[sz]e the (user )?(input|request)|chain[- ]of[- ]thought|step[- ]by[- ]step|\\*\\*(Services|Tools|Skills|Functionality)\\*\\*|^[[:space:]]*[1-9]\\..*\\*\\*(Hermes|Skills|Terminal|Todo|Vision)"
 }
 
+paid_inference_pending_error() {
+    grep -qiE "Payment verification failed|ERROR=503|ServiceUnavailableError"
+}
+
+paid_inference_transient_error() {
+    grep -qiE "ERROR=524|524: A timeout occurred|TimeoutError|timed out|context canceled|deadline exceeded|upstream request timeout"
+}
+
 assert_obol_kubeconfig() {
     local expected actual
 
diff --git a/flows/release-smoke.sh b/flows/release-smoke.sh
index 2c3d34d2..148e0d67 100755
--- a/flows/release-smoke.sh
+++ b/flows/release-smoke.sh
@@ -155,6 +155,18 @@ release-smoke: OBOL_LLM_ENDPOINT must be set when RELEASE_SMOKE_INCLUDE_OBOL=tru
     export OBOL_LLM_MODEL=qwen36-deep      # 27B-class default; or whatever the endpoint serves
 
   See .claude/skills/obol-stack-dev/references/qa-model-envs.md.
+EOF
+        exit 2
+    fi
+
+    if ! preflight_openai_llm_endpoint; then
+        cat >&2 <<EOF
+release-smoke: OBOL_LLM_ENDPOINT is set but did not pass the OpenAI-compatible
+               chat preflight for model ${OBOL_LLM_MODEL:-qwen36-deep}.
+
+  The OBOL flows depend on the endpoint returning final assistant content, not
+  only reasoning metadata or an empty response. Fix OBOL_LLM_ENDPOINT /
+  OBOL_LLM_MODEL before spending time on the cluster flows.
 EOF
         exit 2
     fi
diff --git a/internal/embed/embed_image_pin_test.go b/internal/embed/embed_image_pin_test.go
index 1517bf76..051b8c0a 100644
--- a/internal/embed/embed_image_pin_test.go
+++ b/internal/embed/embed_image_pin_test.go
@@ -268,3 +268,28 @@ func TestEmbeddedImages_CloudflaredHelmTagIsDigestPinned(t *testing.T) {
 			strings.TrimSpace(tagLine))
 	}
 }
+
+func TestEmbeddedLiteLLMConfigUsesWritableRuntimeCopy(t *testing.T) {
+	data, err := ReadInfrastructureFile("base/templates/llm.yaml")
+	if err != nil {
+		t.Fatalf("read llm.yaml: %v", err)
+	}
+	text := string(data)
+
+	if strings.Contains(text, "mountPath: /etc/litellm/config.yaml") {
+		t.Fatalf("LiteLLM still mounts the ConfigMap directly at /etc/litellm/config.yaml; /model/new must write to a writable runtime copy")
+	}
+
+	for _, want := range []string{
+		"initContainers:",
+		"name: prepare-litellm-config",
+		"name: litellm-config-source",
+		"name: litellm-config-work",
+		"mountPath: /etc/litellm",
+		"emptyDir:",
+	} {
+		if !strings.Contains(text, want) {
+			t.Fatalf("LiteLLM writable config pattern missing %q", want)
+		}
+	}
+}
diff --git a/internal/embed/infrastructure/base/templates/llm.yaml b/internal/embed/infrastructure/base/templates/llm.yaml
index ec02c0d8..ae7d89c2 100644
--- a/internal/embed/infrastructure/base/templates/llm.yaml
+++ b/internal/embed/infrastructure/base/templates/llm.yaml
@@ -67,7 +67,10 @@ subsets:
 
 ---
 # LiteLLM configuration: base paid/* route plus provider and purchased models.
-# Models are persisted here so LiteLLM reloads survive pod restarts.
+# This ConfigMap is the Kubernetes source of truth. The pod copies it into a
+# writable emptyDir before LiteLLM starts so /model/new can update the live
+# router and persist to its configured YAML path without trying to write back to
+# a read-only ConfigMap volume.
 apiVersion: v1
 kind: ConfigMap
 metadata:
@@ -187,13 +190,35 @@ spec:
         fsGroup: 65532
         seccompProfile:
           type: RuntimeDefault
+      initContainers:
+        - name: prepare-litellm-config
+          image: ghcr.io/obolnetwork/litellm:sha-9b3e569@sha256:ac453f9cdfa3752efa38998aa5bbf4f9a67e642a68b27a647aaf667c083ddc51
+          imagePullPolicy: IfNotPresent
+          securityContext:
+            allowPrivilegeEscalation: false
+            readOnlyRootFilesystem: true
+            capabilities:
+              drop: ["ALL"]
+          command:
+            - python3
+            - -c
+          args:
+            - |
+              with open("/config-src/config.yaml", "rb") as src, open("/config/config.yaml", "wb") as dst:
+                  dst.write(src.read())
+          volumeMounts:
+            - name: litellm-config-source
+              mountPath: /config-src
+              readOnly: true
+            - name: litellm-config-work
+              mountPath: /config
       containers:
         - name: litellm
           # Obol fork of LiteLLM with config-only model management API.
           # No Postgres required — /model/new and /model/delete work via
           # in-memory router + config.yaml persistence.
           # Source: https://github.com/ObolNetwork/litellm
-          image: ghcr.io/obolnetwork/litellm:sha-c16b156@sha256:9f112b51ac5a57d73cdd54103fb98d24eabaddd8689a9a285884dca6456dc86e
+          image: ghcr.io/obolnetwork/litellm:sha-9b3e569@sha256:ac453f9cdfa3752efa38998aa5bbf4f9a67e642a68b27a647aaf667c083ddc51
           imagePullPolicy: IfNotPresent
           # PSS Restricted: drop all caps, no privilege escalation, RO rootfs.
           # Python writes are funneled to the emptyDir mounts below.
@@ -228,9 +253,8 @@ spec:
             - name: HF_HOME
               value: /home/litellm/.cache/huggingface
           volumeMounts:
-            - name: litellm-config
-              mountPath: /etc/litellm/config.yaml
-              subPath: config.yaml
+            - name: litellm-config-work
+              mountPath: /etc/litellm
             - name: litellm-tmp
               mountPath: /tmp
             - name: litellm-home
@@ -262,10 +286,10 @@ spec:
           resources:
             requests:
               cpu: 100m
-              memory: 256Mi
+              memory: 512Mi
             limits:
               cpu: "1"
-              memory: 1Gi
+              memory: 2Gi
         - name: x402-buyer
           # Pinned by sha256 digest (multi-arch manifest list, amd64+arm64)
           # so the deployed sidecar is byte-for-byte identical across QA
@@ -324,12 +348,19 @@ spec:
             - name: x402-buyer-state
               mountPath: /state
       volumes:
-        - name: litellm-config
+        - name: litellm-config-source
           configMap:
             name: litellm-config
             items:
               - key: config.yaml
                 path: config.yaml
+        # Runtime copy of litellm-config. ConfigMap volumes are read-only, but
+        # LiteLLM's config-backed /model/new path persists to config.yaml after
+        # updating the live router. Keep the source ConfigMap immutable and give
+        # LiteLLM a pod-local writable working file instead.
+        - name: litellm-config-work
+          emptyDir:
+            sizeLimit: 1Mi
         # Writable /tmp for Python tempfile / multipart uploads. Sized
         # modestly — LiteLLM streams responses rather than buffering them.
         - name: litellm-tmp
diff --git a/internal/embed/infrastructure/base/templates/x402-prometheus-rules.yaml b/internal/embed/infrastructure/base/templates/x402-prometheus-rules.yaml
index 5115a7d4..68b71326 100644
--- a/internal/embed/infrastructure/base/templates/x402-prometheus-rules.yaml
+++ b/internal/embed/infrastructure/base/templates/x402-prometheus-rules.yaml
@@ -131,6 +131,35 @@ spec:
               1e-9
             )
 
+        # Age of the last successful payment per (offer, chain). Used by
+        # the frontend My Purchases drawer to render "last settlement:
+        # 12s ago" labels without joining against the buyer sidecar or
+        # chasing on-chain receipts.
+        #
+        # The underlying gauge
+        # `obol_x402_verifier_last_payment_success_seconds` stamps
+        # `time.Now().Unix()` (seconds since epoch, UTC) into the same
+        # `(offer_namespace, offer_name, chain, asset_symbol)` label set
+        # used by `obol_x402_verifier_charged_requests_total`, right
+        # next to the counter `.Inc()` in the verifier's success branch.
+        # Until an offer settles at least once, no series exists for it
+        # and the rule's `max(...)` returns no samples — the frontend
+        # then renders "no settlements yet" rather than a misleading
+        # "X seconds ago since 1970". A pod restart clears the gauge
+        # (in-process state). Prometheus retains the old samples, but an
+        # instant selector omits a series once it goes stale, so the age
+        # can be absent after a rollout until the next settlement.
+        #
+        # `max by (...)` collapses the asset_symbol dimension on the
+        # input gauge: a single (offer, chain) almost always pins one
+        # asset, but `max` is safe even if a route re-pins from USDC
+        # to OBOL — the most recent settlement still wins.
+        - record: x402:last_payment_age_seconds:by_offer_chain
+          expr: |
+            time() - max by (offer_namespace, offer_name, chain) (
+              obol_x402_verifier_last_payment_success_seconds
+            )
+
     - name: x402.alerting
       rules:
         # Payment-failure ratio crossed 10% over the last hour for a paid
diff --git a/internal/embed/infrastructure/cloudflared/values.yaml b/internal/embed/infrastructure/cloudflared/values.yaml
index 8ff1670c..5f0553d9 100644
--- a/internal/embed/infrastructure/cloudflared/values.yaml
+++ b/internal/embed/infrastructure/cloudflared/values.yaml
@@ -5,7 +5,7 @@ transport:
 
 image:
   repository: cloudflare/cloudflared
-  tag: "2026.3.0@sha256:6b599ca3e974349ead3286d178da61d291961182ec3fe9c505e1dd02c8ac31b0"
+  tag: "2026.5.0@sha256:59bab8d3aceec09bf6bdb07d6beca0225ca5cd7ab79436a87ea97978fe1dc4f9"
 
 metrics:
   address: "0.0.0.0:2000"
diff --git a/internal/embed/infrastructure/helmfile.yaml b/internal/embed/infrastructure/helmfile.yaml
index 65008dcc..631c792b 100644
--- a/internal/embed/infrastructure/helmfile.yaml
+++ b/internal/embed/infrastructure/helmfile.yaml
@@ -17,7 +17,7 @@ repositories:
 # Single source of truth: change this to switch networks
 values:
   - network: mainnet
-  - gatewayApiVersion: v1.4.1
+  - gatewayApiVersion: v1.5.1
   # Default the cloudflared release to enabled. `obol stack up` overrides via
   # `--state-values-set cloudflared.enabled=false` when it detects a running
   # quick tunnel that should be preserved across the sync.
diff --git a/internal/embed/skills/buy-x402/scripts/buy.py b/internal/embed/skills/buy-x402/scripts/buy.py
index 1dbe8b1b..0b5e3765 100644
--- a/internal/embed/skills/buy-x402/scripts/buy.py
+++ b/internal/embed/skills/buy-x402/scripts/buy.py
@@ -800,7 +800,11 @@ def _presign_auths(signer_address, pay_to, price, chain, usdc_addr, count, payme
     print(f"Pre-signing {count} payment authorizations ...")
     for i in range(count):
         if transfer_method == "permit2":
-            valid_after = str(max(0, int(time.time()) - 600))
+            # Permit2 validates against chain time, not the buyer host clock.
+            # Forked/local chains only advance block timestamps when a tx is
+            # mined, so a wall-clock-based "now - slack" can still be ahead of
+            # chain time and the facilitator rejects with PaymentTooEarly().
+            valid_after = "0"
             expiry_window = max(int(payment.get("maxTimeoutSeconds", 60)), 300)
             deadline = str(int(time.time()) + expiry_window)
             permit2_nonce = str(int.from_bytes(secrets.token_bytes(32), "big"))
diff --git a/internal/model/model.go b/internal/model/model.go
index 1b8042ea..873d8e6f 100644
--- a/internal/model/model.go
+++ b/internal/model/model.go
@@ -78,6 +78,10 @@ type LiteLLMParams struct {
 	Model   string `yaml:"model"`
 	APIBase string `yaml:"api_base,omitempty"`
 	APIKey  string `yaml:"api_key,omitempty"`
+	// ExtraBody is merged by LiteLLM into every upstream request for this
+	// model. It is intentionally opt-in because many OpenAI-compatible servers
+	// reject unknown provider-specific fields.
+	ExtraBody map[string]any `yaml:"extra_body,omitempty"`
 	// CacheControlInjectionPoints is a LiteLLM directive that tells the proxy
 	// to attach Anthropic-style `cache_control: {type: ephemeral}` markers to
 	// specific messages on every request to this model. We pin the system
@@ -85,6 +89,24 @@ type LiteLLMParams struct {
 	CacheControlInjectionPoints []CacheControlInjection `yaml:"cache_control_injection_points,omitempty"`
 }
 
+// CustomEndpointOptions controls optional per-request behavior for custom
+// OpenAI-compatible endpoints.
+type CustomEndpointOptions struct {
+	DisableThinking bool
+}
+
+func (o CustomEndpointOptions) extraBody() map[string]any {
+	if !o.DisableThinking {
+		return nil
+	}
+
+	return map[string]any{
+		"chat_template_kwargs": map[string]any{
+			"enable_thinking": false,
+		},
+	}
+}
+
 // CacheControlInjection is one entry in LiteLLM's
 // cache_control_injection_points list. Either Role or Index narrows which
 // message in the request gets the cache_control marker.
@@ -522,13 +544,18 @@ func hotAddModels(cfg *config.Config, u *ui.UI, entries []ModelEntry) error {
 	kubeconfigPath := filepath.Join(cfg.ConfigDir, "kubeconfig.yaml")
 
 	for _, entry := range entries {
+		params := map[string]any{
+			"model":    entry.LiteLLMParams.Model,
+			"api_base": entry.LiteLLMParams.APIBase,
+			"api_key":  entry.LiteLLMParams.APIKey,
+		}
+		if len(entry.LiteLLMParams.ExtraBody) > 0 {
+			params["extra_body"] = entry.LiteLLMParams.ExtraBody
+		}
+
 		body := map[string]any{
-			"model_name": entry.ModelName,
-			"litellm_params": map[string]any{
-				"model":    entry.LiteLLMParams.Model,
-				"api_base": entry.LiteLLMParams.APIBase,
-				"api_key":  entry.LiteLLMParams.APIKey,
-			},
+			"model_name":     entry.ModelName,
+			"litellm_params": params,
 		}
 		bodyJSON, err := json.Marshal(body)
 		if err != nil {
@@ -809,6 +836,10 @@ func RemoveModel(cfg *config.Config, u *ui.UI, modelName string) error {
 // model" behavior an operator running `obol model setup custom` wants when
 // they re-run the command.
 func AddCustomEndpoint(cfg *config.Config, u *ui.UI, endpoint, modelName, apiKey string) error {
+	return AddCustomEndpointWithOptions(cfg, u, endpoint, modelName, apiKey, CustomEndpointOptions{})
+}
+
+func AddCustomEndpointWithOptions(cfg *config.Config, u *ui.UI, endpoint, modelName, apiKey string, options CustomEndpointOptions) error {
 	kubectlBinary := filepath.Join(cfg.BinDir, "kubectl")
 	kubeconfigPath := filepath.Join(cfg.ConfigDir, "kubeconfig.yaml")
 
@@ -824,7 +855,7 @@ func AddCustomEndpoint(cfg *config.Config, u *ui.UI, endpoint, modelName, apiKey
 	validationEndpoint = strings.Replace(validationEndpoint, "host.k3d.internal", "localhost", 1)
 
 	validationEndpoint = strings.Replace(validationEndpoint, "host.docker.internal", "localhost", 1)
-	if err := ValidateCustomEndpoint(validationEndpoint, modelName, apiKey); err != nil {
+	if err := ValidateCustomEndpointWithOptions(validationEndpoint, modelName, apiKey, options); err != nil {
 		return fmt.Errorf("endpoint validation failed: %w", err)
 	}
 
@@ -836,7 +867,7 @@ func AddCustomEndpoint(cfg *config.Config, u *ui.UI, endpoint, modelName, apiKey
 		u.Infof("Cluster endpoint: %s (translated from %s)", clusterEndpoint, endpoint)
 	}
 
-	entry := buildCustomEndpointEntry(modelName, clusterEndpoint, apiKey)
+	entry := buildCustomEndpointEntryWithOptions(modelName, clusterEndpoint, apiKey, options)
 
 	u.Infof("Adding custom endpoint (model: %s) to LiteLLM config", modelName)
 
@@ -855,11 +886,16 @@ func AddCustomEndpoint(cfg *config.Config, u *ui.UI, endpoint, modelName, apiKey
 	return nil
 }
 
-// ValidateCustomEndpoint validates that a custom OpenAI-compatible endpoint works.
-// It runs a 2-step validation: reachability check, then inference probe.
-// The inference probe is the definitive test — some servers (e.g., mlx-lm) don't
-// list the loaded model in /models but accept it for inference.
 func ValidateCustomEndpoint(endpoint, modelName, apiKey string) error {
+	return ValidateCustomEndpointWithOptions(endpoint, modelName, apiKey, CustomEndpointOptions{})
+}
+
+// ValidateCustomEndpointWithOptions validates that a custom OpenAI-compatible
+// endpoint works. It runs a 2-step validation: reachability check, then
+// inference probe. The inference probe is the definitive test — some servers
+// (e.g., mlx-lm) don't list the loaded model in /models but accept it for
+// inference.
+func ValidateCustomEndpointWithOptions(endpoint, modelName, apiKey string, options CustomEndpointOptions) error {
 	client := &http.Client{Timeout: 60 * time.Second}
 
 	authHeader := ""
@@ -899,11 +935,15 @@ func ValidateCustomEndpoint(endpoint, modelName, apiKey string) error {
 	}
 
 	// Step 2: Inference probe — the definitive test
-	probePayload, _ := json.Marshal(map[string]any{ //nolint:errchkjson // map[string]any is safe, keys/values are controlled
+	probe := map[string]any{
 		"model":      modelName,
 		"messages":   []map[string]string{{"role": "user", "content": "ping"}},
 		"max_tokens": 1,
-	})
+	}
+	for k, v := range options.extraBody() {
+		probe[k] = v
+	}
+	probePayload, _ := json.Marshal(probe) //nolint:errchkjson // map[string]any is safe, keys/values are controlled
 	completionsURL := strings.TrimRight(endpoint, "/") + "/chat/completions"
 
 	probeReq, err := http.NewRequest(http.MethodPost, completionsURL, bytes.NewReader(probePayload))
@@ -1313,12 +1353,17 @@ func buildModelEntries(provider string, models []string) []ModelEntry {
 // standalone helper so the entry shape is unit-testable without going
 // through the full kubectl-driven AddCustomEndpoint path.
 func buildCustomEndpointEntry(modelName, clusterEndpoint, apiKey string) ModelEntry {
+	return buildCustomEndpointEntryWithOptions(modelName, clusterEndpoint, apiKey, CustomEndpointOptions{})
+}
+
+func buildCustomEndpointEntryWithOptions(modelName, clusterEndpoint, apiKey string, options CustomEndpointOptions) ModelEntry {
 	entry := ModelEntry{
 		ModelName: modelName,
 		LiteLLMParams: LiteLLMParams{
-			Model:   "openai/" + modelName,
-			APIBase: clusterEndpoint,
-			APIKey:  apiKey,
+			Model:     "openai/" + modelName,
+			APIBase:   clusterEndpoint,
+			APIKey:    apiKey,
+			ExtraBody: options.extraBody(),
 		},
 	}
 	if apiKey == "" {
diff --git a/internal/model/model_test.go b/internal/model/model_test.go
index 405989ca..ef0556e4 100644
--- a/internal/model/model_test.go
+++ b/internal/model/model_test.go
@@ -153,6 +153,9 @@ func TestBuildCustomEndpointEntry(t *testing.T) {
 		if entry.LiteLLMParams.APIKey != "secret-key" {
 			t.Errorf("api_key = %q, want secret-key", entry.LiteLLMParams.APIKey)
 		}
+		if entry.LiteLLMParams.ExtraBody != nil {
+			t.Errorf("extra_body = %+v, want nil by default", entry.LiteLLMParams.ExtraBody)
+		}
 	})
 
 	t.Run("empty api_key falls back to none", func(t *testing.T) {
@@ -174,6 +177,17 @@ func TestBuildCustomEndpointEntry(t *testing.T) {
 			t.Errorf("ModelName = %q, want qwen3:9b-mlx unchanged", entry.ModelName)
 		}
 	})
+
+	t.Run("disable thinking stores LiteLLM extra_body", func(t *testing.T) {
+		entry := buildCustomEndpointEntryWithOptions("qwen36", "http://host:8000/v1", "", CustomEndpointOptions{DisableThinking: true})
+		kwargs, ok := entry.LiteLLMParams.ExtraBody["chat_template_kwargs"].(map[string]any)
+		if !ok {
+			t.Fatalf("extra_body missing chat_template_kwargs: %+v", entry.LiteLLMParams.ExtraBody)
+		}
+		if got, ok := kwargs["enable_thinking"].(bool); !ok || got {
+			t.Fatalf("enable_thinking = %#v, want false", kwargs["enable_thinking"])
+		}
+	})
 }
 
 func TestExpandWildcard(t *testing.T) {
@@ -497,6 +511,39 @@ func TestValidateCustomEndpoint(t *testing.T) {
 		}
 	})
 
+	t.Run("disable thinking is sent in inference probe", func(t *testing.T) {
+		var probe map[string]any
+		srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+			w.Header().Set("Content-Type", "application/json")
+
+			switch r.URL.Path {
+			case "/v1/models":
+				fmt.Fprint(w, `{"data":[{"id":"test-model"}]}`)
+			case "/v1/chat/completions":
+				if err := json.NewDecoder(r.Body).Decode(&probe); err != nil {
+					t.Errorf("decode probe: %v", err) // Errorf, not Fatalf: the handler runs off the test goroutine
+				}
+				fmt.Fprint(w, `{"choices":[{"message":{"content":"pong"}}]}`)
+			default:
+				http.NotFound(w, r)
+			}
+		}))
+		defer srv.Close()
+
+		err := ValidateCustomEndpointWithOptions(srv.URL+"/v1", "test-model", "", CustomEndpointOptions{DisableThinking: true})
+		if err != nil {
+			t.Fatalf("unexpected error: %v", err)
+		}
+
+		kwargs, ok := probe["chat_template_kwargs"].(map[string]any)
+		if !ok {
+			t.Fatalf("probe missing chat_template_kwargs: %+v", probe)
+		}
+		if got, ok := kwargs["enable_thinking"].(bool); !ok || got {
+			t.Fatalf("enable_thinking = %#v, want false", kwargs["enable_thinking"])
+		}
+	})
+
 	t.Run("inference probe returns empty choices", func(t *testing.T) {
 		srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
 			w.Header().Set("Content-Type", "application/json")
diff --git a/internal/schemas/service-catalog.schema.json b/internal/schemas/service-catalog.schema.json
index 58dbc7c4..fb02742a 100644
--- a/internal/schemas/service-catalog.schema.json
+++ b/internal/schemas/service-catalog.schema.json
@@ -78,8 +78,7 @@
         "payTo",
         "network",
         "description",
-        "isDemo",
-        "available"
+        "isDemo"
       ],
       "properties": {
         "name": {
@@ -154,14 +153,10 @@
         "registrationPending": {
           "type": "boolean"
         },
-        "available": {
-          "type": "boolean",
-          "description": "False during a drain window. Catalog consumers should treat unset as true for backwards compatibility."
-        },
         "drainEndsAt": {
           "type": "string",
           "format": "date-time",
-          "description": "RFC3339 timestamp at which the offer's HTTPRoute will be torn down. Set only when available=false."
+          "description": "RFC3339 timestamp at which the offer's HTTPRoute will be torn down. Set only when the offer is draining. Catalog consumers should detect drain via the presence of this field."
         }
       }
     }
diff --git a/internal/schemas/service_catalog.go b/internal/schemas/service_catalog.go
index eb8bba78..6f839135 100644
--- a/internal/schemas/service_catalog.go
+++ b/internal/schemas/service_catalog.go
@@ -40,19 +40,12 @@ type ServiceCatalogEntry struct {
 	// ERC-8004 discovery via the chain still resolves to the prior state.
 	RegistrationPending bool `json:"registrationPending,omitempty"`
 
-	// Available is false when the offer is in its drain window. Buyers
-	// can still complete in-flight payments until DrainEndsAt, but
-	// discovery surfaces should advertise the wind-down so external
-	// observers can react. When false, DrainEndsAt is set to the RFC3339
-	// timestamp at which the HTTPRoute will be torn down. Catalog
-	// consumers should treat unset Available (the default-true field) as
-	// "available" for backwards compatibility — the field is only written
-	// false during drain.
-	Available bool `json:"available"`
-
 	// DrainEndsAt is the RFC3339 timestamp at which the offer's
-	// HTTPRoute will be removed. Set only when Available=false. Buyers
-	// SHOULD migrate to alternative providers before this time.
+	// HTTPRoute will be removed. Set ONLY when the offer is draining.
+	// Consumers detect a drain window with `if (entry.drainEndsAt)`:
+	// active offers serialize without this field, so the schema stays
+	// purely additive vs. pre-drain catalogs. Buyers SHOULD migrate to
+	// alternative providers before this time.
 	DrainEndsAt string `json:"drainEndsAt,omitempty"`
 }
 
diff --git a/internal/serviceoffercontroller/render.go b/internal/serviceoffercontroller/render.go
index 0f65fa89..b892daa2 100644
--- a/internal/serviceoffercontroller/render.go
+++ b/internal/serviceoffercontroller/render.go
@@ -753,8 +753,6 @@ func serviceDefWithDrain(offer *monetizeapi.ServiceOffer, svc erc8004.ServiceDef
 	if offer == nil || !offer.IsDraining() || offer.DrainExpired(time.Now()) {
 		return svc
 	}
-	available := false
-	svc.Available = &available
 	svc.DrainEndsAt = offer.DrainEndsAt().UTC().Format(time.RFC3339)
 	return svc
 }
@@ -795,9 +793,8 @@ func buildSkillCatalogMarkdown(offers []*monetizeapi.ServiceOffer, baseURL strin
 		}
 		// Drained offers (post-grace-period) have no live route — drop
 		// them from the catalog entirely. Draining offers (pre-expiry)
-		// stay in the catalog with available=false + drainEndsAt set so
-		// buyers can see the wind-down via discovery before the route
-		// disappears.
+		// stay in the catalog with draining status + drainEndsAt so buyers
+		// can see the wind-down via discovery before the route disappears.
 		if offer.DrainExpired(now) {
 			continue
 		}
@@ -827,16 +824,16 @@ func buildSkillCatalogMarkdown(offers []*monetizeapi.ServiceOffer, baseURL strin
 	}
 
 	lines = append(lines, "## Services", "")
-	lines = append(lines, "| Service | Type | Model | Price | Available | Endpoint |")
-	lines = append(lines, "|---------|------|-------|-------|-----------|----------|")
+	lines = append(lines, "| Service | Type | Model | Price | Status | Endpoint |")
+	lines = append(lines, "|---------|------|-------|-------|--------|----------|")
 	for _, offer := range ready {
 		modelName := offer.Spec.Model.Name
 		if modelName == "" {
 			modelName = "—"
 		}
-		availability := "yes"
+		status := "available"
 		if offer.IsDraining() {
-			availability = fmt.Sprintf("draining (ends %s)", offer.DrainEndsAt().UTC().Format(time.RFC3339))
+			status = fmt.Sprintf("draining · ends `%s`", offer.DrainEndsAt().UTC().Format(time.RFC3339))
 		}
 		lines = append(lines, fmt.Sprintf(
 			"| [%s](#%s) | %s | %s | %s | %s | `%s%s` |",
@@ -845,7 +842,7 @@ func buildSkillCatalogMarkdown(offers []*monetizeapi.ServiceOffer, baseURL strin
 			fallbackOfferType(offer),
 			modelName,
 			describeOfferPrice(offer),
-			availability,
+			status,
 			baseURL,
 			offer.EffectivePath(),
 		))
@@ -863,10 +860,7 @@ func buildSkillCatalogMarkdown(offers []*monetizeapi.ServiceOffer, baseURL strin
 		lines = append(lines, fmt.Sprintf("- **Pay To**: `%s`", firstNonEmpty(offer.Spec.Payment.PayTo, "—")))
 		lines = append(lines, fmt.Sprintf("- **Network**: %s", firstNonEmpty(offer.Spec.Payment.Network, "—")))
 		if offer.IsDraining() {
-			lines = append(lines, "- **Available**: false (draining)")
 			lines = append(lines, fmt.Sprintf("- **Drain ends at**: %s", offer.DrainEndsAt().UTC().Format(time.RFC3339)))
-		} else {
-			lines = append(lines, "- **Available**: true")
 		}
 		description := offer.Spec.Registration.Description
 		if description == "" {
@@ -979,7 +973,6 @@ func buildServiceCatalogJSON(offers []*monetizeapi.ServiceOffer, baseURL string)
 			modelName = offer.Status.AgentResolution.Model
 		}
 
-		available := !offer.IsDraining()
 		drainEndsAt := ""
 		if offer.IsDraining() {
 			drainEndsAt = offer.DrainEndsAt().UTC().Format(time.RFC3339)
@@ -997,7 +990,6 @@ func buildServiceCatalogJSON(offers []*monetizeapi.ServiceOffer, baseURL string)
 			Description:         desc,
 			IsDemo:              offer.Namespace == "demo",
 			RegistrationPending: offerAwaitingRegistration(offer),
-			Available:           available,
 			DrainEndsAt:         drainEndsAt,
 		}
 
diff --git a/internal/serviceoffercontroller/render_test.go b/internal/serviceoffercontroller/render_test.go
index 286e1763..a68d7061 100644
--- a/internal/serviceoffercontroller/render_test.go
+++ b/internal/serviceoffercontroller/render_test.go
@@ -440,11 +440,8 @@ func TestBuildRegistrationServices_IncludesDrainMetadata(t *testing.T) {
 		t.Fatalf("services = %+v, want web + A2A", services)
 	}
 	for _, svc := range services {
-		if svc.Available == nil {
-			t.Fatalf("%s missing available=false drain marker: %+v", svc.Name, svc)
-		}
-		if *svc.Available {
-			t.Fatalf("%s available = true, want false during drain: %+v", svc.Name, svc)
+		if svc.Available != nil {
+			t.Fatalf("%s.Available = %v, want nil (drain is signalled via DrainEndsAt only): %+v", svc.Name, *svc.Available, svc)
 		}
 		if _, err := time.Parse(time.RFC3339, svc.DrainEndsAt); err != nil {
 			t.Fatalf("%s drainEndsAt = %q is not RFC3339: %v", svc.Name, svc.DrainEndsAt, err)
@@ -474,8 +471,8 @@ func TestBuildIdentityRegistrationServices_IncludesDrainMetadata(t *testing.T) {
 		t.Fatalf("services = %+v, want web + MCP", services)
 	}
 	for _, svc := range services {
-		if svc.Available == nil || *svc.Available {
-			t.Fatalf("%s missing available=false drain marker: %+v", svc.Name, svc)
+		if svc.Available != nil {
+			t.Fatalf("%s.Available = %v, want nil (drain is signalled via DrainEndsAt only): %+v", svc.Name, *svc.Available, svc)
 		}
 		if _, err := time.Parse(time.RFC3339, svc.DrainEndsAt); err != nil {
 			t.Fatalf("%s drainEndsAt = %q is not RFC3339: %v", svc.Name, svc.DrainEndsAt, err)
@@ -648,6 +645,67 @@ func TestBuildSkillCatalogMarkdown(t *testing.T) {
 	}
 }
 
+// TestBuildSkillCatalogMarkdown_DrainAdditiveDetail locks in the
+// pure-additive markdown surface: active offers must NOT emit a
+// `- **Available**:` detail bullet (that bullet was removed when drain
+// landed). Draining offers may have a `- **Drain ends at**:` bullet
+// but never a separate Available bullet, because consumers detect
+// drain solely via the timestamp's presence.
+func TestBuildSkillCatalogMarkdown_DrainAdditiveDetail(t *testing.T) {
+	readyCond := []monetizeapi.Condition{{Type: "Ready", Status: "True"}}
+	activeOffer := &monetizeapi.ServiceOffer{
+		ObjectMeta: metav1.ObjectMeta{Name: "alpha", Namespace: "llm"},
+		Spec: monetizeapi.ServiceOfferSpec{
+			Type: "http",
+			Payment: monetizeapi.ServiceOfferPayment{
+				Network: "base",
+				PayTo:   "0x1111111111111111111111111111111111111111",
+				Price:   monetizeapi.ServiceOfferPriceTable{PerRequest: "0.001"},
+			},
+		},
+		Status: monetizeapi.ServiceOfferStatus{Conditions: readyCond},
+	}
+
+	drainAt := metav1.NewTime(time.Now())
+	grace := metav1.Duration{Duration: time.Hour}
+	drainingOffer := &monetizeapi.ServiceOffer{
+		ObjectMeta: metav1.ObjectMeta{Name: "bravo", Namespace: "llm"},
+		Spec: monetizeapi.ServiceOfferSpec{
+			Type:             "http",
+			DrainAt:          &drainAt,
+			DrainGracePeriod: &grace,
+			Payment: monetizeapi.ServiceOfferPayment{
+				Network: "base",
+				PayTo:   "0x2222222222222222222222222222222222222222",
+				Price:   monetizeapi.ServiceOfferPriceTable{PerRequest: "0.001"},
+			},
+		},
+		Status: monetizeapi.ServiceOfferStatus{Conditions: readyCond},
+	}
+
+	content := buildSkillCatalogMarkdown(
+		[]*monetizeapi.ServiceOffer{activeOffer, drainingOffer},
+		"https://example.com",
+	)
+
+	if strings.Contains(content, "- **Available**:") {
+		t.Errorf("markdown contains `- **Available**:` bullet; drain wire is additive (drainEndsAt only):\n%s", content)
+	}
+	if !strings.Contains(content, "| [alpha](#alpha) | http | — | 0.001 USDC/request | available |") {
+		t.Errorf("active offer status missing `available` table signal:\n%s", content)
+	}
+	if !strings.Contains(content, "- **Drain ends at**:") {
+		t.Errorf("draining offer missing `- **Drain ends at**:` bullet:\n%s", content)
+	}
+	// Table header should expose Status, not the legacy Available column.
+	if strings.Contains(content, "| Available |") {
+		t.Errorf("markdown table header still has `Available` column; expected `Status`:\n%s", content)
+	}
+	if !strings.Contains(content, "| Status |") {
+		t.Errorf("markdown table header missing `Status` column:\n%s", content)
+	}
+}
+
 func TestBuildSkillCatalogHTTPRoute(t *testing.T) {
 	route := buildSkillCatalogHTTPRoute()
 	if route.GetName() != skillCatalogRouteName {
@@ -856,16 +914,28 @@ func TestBuildServiceCatalogJSON_ExcludesNonReady(t *testing.T) {
 	if services[0].Name != "ready-svc" {
 		t.Errorf("got %q, want ready-svc — filter pipeline leaked another offer", services[0].Name)
 	}
-	if !services[0].Available {
-		t.Errorf("ready-svc.available = false, want true (offer is not draining)")
+
+	// Pure-additive wire schema: active offers must serialize without
+	// `available` (no field at all). Consumers detect drain via the
+	// presence of `drainEndsAt`, not via a legacy `available` boolean.
+	var raw []map[string]any
+	if err := json.Unmarshal([]byte(jsonStr), &raw); err != nil {
+		t.Fatalf("invalid raw JSON: %v\n%s", err, jsonStr)
+	}
+	if _, ok := raw[0]["available"]; ok {
+		t.Errorf("ready-svc JSON contains `available` key; drain wire schema must be additive (drainEndsAt only)")
+	}
+	if _, ok := raw[0]["drainEndsAt"]; ok {
+		t.Errorf("ready-svc JSON contains `drainEndsAt`; should only appear on draining offers")
 	}
 }
 
 // TestBuildServiceCatalogJSON_DrainLifecycle covers the three drain
-// states explicitly: pre-drain (available=true, no drainEndsAt), mid-drain
-// (in catalog, available=false, drainEndsAt populated), and drain-expired
-// (filtered out of the catalog because the controller has torn down the
-// underlying route).
+// states explicitly under the pure-additive wire schema: pre-drain
+// (no `available` key, no `drainEndsAt`), mid-drain (no `available`
+// key, only `drainEndsAt` populated), and drain-expired (filtered out
+// of the catalog because the controller has torn down the underlying
+// route). Consumers detect drain with `if (entry.drainEndsAt)`.
 func TestBuildServiceCatalogJSON_DrainLifecycle(t *testing.T) {
 	readyCond := []monetizeapi.Condition{{Type: "Ready", Status: "True"}}
 	mkOffer := func(name string) monetizeapi.ServiceOffer {
@@ -901,39 +971,41 @@ func TestBuildServiceCatalogJSON_DrainLifecycle(t *testing.T) {
 	exp.Spec.DrainGracePeriod = &expGrace
 
 	jsonStr := buildServiceCatalogJSON([]*monetizeapi.ServiceOffer{&pre, &mid, &exp}, "https://example.com")
-	var services []schemas.ServiceCatalogEntry
-	if err := json.Unmarshal([]byte(jsonStr), &services); err != nil {
+	var raw []map[string]any
+	if err := json.Unmarshal([]byte(jsonStr), &raw); err != nil {
 		t.Fatalf("invalid JSON: %v\n%s", err, jsonStr)
 	}
-	if len(services) != 2 {
-		t.Fatalf("expected 2 services (pre + mid; expired filtered out), got %d: %+v", len(services), services)
+	if len(raw) != 2 {
+		t.Fatalf("expected 2 services (pre + mid; expired filtered out), got %d: %+v", len(raw), raw)
 	}
 
-	byName := map[string]schemas.ServiceCatalogEntry{}
-	for _, s := range services {
-		byName[s.Name] = s
+	byName := map[string]map[string]any{}
+	for _, s := range raw {
+		name, _ := s["name"].(string)
+		byName[name] = s
 	}
-	if pre, ok := byName["pre"]; !ok {
+	if entry, ok := byName["pre"]; !ok {
 		t.Fatal("pre-drain offer missing from catalog")
 	} else {
-		if !pre.Available {
-			t.Errorf("pre.available = false, want true")
+		if _, has := entry["available"]; has {
+			t.Errorf("pre entry contains `available` key; drain wire schema must be additive")
 		}
-		if pre.DrainEndsAt != "" {
-			t.Errorf("pre.drainEndsAt = %q, want empty", pre.DrainEndsAt)
+		if _, has := entry["drainEndsAt"]; has {
+			t.Errorf("pre entry contains `drainEndsAt` key; should only appear on draining offers")
 		}
 	}
-	if mid, ok := byName["mid"]; !ok {
+	if entry, ok := byName["mid"]; !ok {
 		t.Fatal("mid-drain offer missing from catalog")
 	} else {
-		if mid.Available {
-			t.Errorf("mid.available = true, want false (offer is draining)")
+		if _, has := entry["available"]; has {
+			t.Errorf("mid entry contains `available` key; drain wire schema must be additive (drainEndsAt only)")
 		}
-		if mid.DrainEndsAt == "" {
-			t.Errorf("mid.drainEndsAt is empty, want RFC3339 timestamp")
+		drainEndsAt, has := entry["drainEndsAt"].(string)
+		if !has || drainEndsAt == "" {
+			t.Errorf("mid entry missing `drainEndsAt`; should be populated for draining offers")
 		}
-		if _, err := time.Parse(time.RFC3339, mid.DrainEndsAt); err != nil {
-			t.Errorf("mid.drainEndsAt = %q is not RFC3339: %v", mid.DrainEndsAt, err)
+		if _, err := time.Parse(time.RFC3339, drainEndsAt); err != nil {
+			t.Errorf("mid.drainEndsAt = %q is not RFC3339: %v", drainEndsAt, err)
 		}
 	}
 	if _, ok := byName["expired"]; ok {
diff --git a/internal/stack/stack.go b/internal/stack/stack.go
index 9ff1a489..22662f36 100644
--- a/internal/stack/stack.go
+++ b/internal/stack/stack.go
@@ -11,6 +11,7 @@ import (
 	"os"
 	"os/exec"
 	"path/filepath"
+	"reflect"
 	"runtime"
 	"strconv"
 	"strings"
@@ -482,7 +483,7 @@ func syncDefaults(cfg *config.Config, u *ui.UI, kubeconfigPath string, dataDir s
 		u.Dim("  Tear down only if you really want to: obol stack down")
 
 		if previousLiteLLMConfig != "" {
-			if restoreErr := restoreLiteLLMConfig(cfg, kubeconfigPath, previousLiteLLMConfig); restoreErr != nil {
+			if _, restoreErr := restoreLiteLLMConfig(cfg, kubeconfigPath, previousLiteLLMConfig); restoreErr != nil {
 				u.Warnf("Failed to restore LiteLLM config after Helmfile error: %v", restoreErr)
 			}
 		}
@@ -492,11 +493,25 @@ func syncDefaults(cfg *config.Config, u *ui.UI, kubeconfigPath string, dataDir s
 
 	u.Success("Default infrastructure deployed")
 
+	restoredLiteLLMConfig := false
 	if previousLiteLLMConfig != "" {
-		if err := restoreLiteLLMConfig(cfg, kubeconfigPath, previousLiteLLMConfig); err != nil {
+		var err error
+		restoredLiteLLMConfig, err = restoreLiteLLMConfig(cfg, kubeconfigPath, previousLiteLLMConfig)
+		if err != nil {
 			u.Warnf("Failed to restore LiteLLM config after base migration: %v", err)
 		}
 	}
+	if restoredLiteLLMConfig {
+		// Helmfile may have restarted the pod while the chart-default
+		// ConfigMap was in place. Restart once after restoring user models so
+		// LiteLLM's writable runtime copy is seeded from the restored source of
+		// truth before autoConfigureLLM decides no further model change is
+		// needed.
+		if err := model.RestartLiteLLM(cfg, u, "restored LiteLLM config"); err != nil {
+			u.Warnf("LiteLLM restart after config restore failed: %v", err)
+			u.Dim("  The ConfigMap is restored; run `obol model setup` or `obol stack up` again if model routing looks stale.")
+		}
+	}
 
 	// Populate the x402-verifier CA bundle from the host so TLS verification of
 	// the facilitator works without needing to run `obol sell pricing` first.
@@ -1424,24 +1439,30 @@ func preserveLiteLLMConfigForHelm(cfg *config.Config, kubeconfigPath string) (st
 	return raw, nil
 }
 
-func restoreLiteLLMConfig(cfg *config.Config, kubeconfigPath, raw string) error {
+func restoreLiteLLMConfig(cfg *config.Config, kubeconfigPath, raw string) (bool, error) {
 	if strings.TrimSpace(raw) == "" {
-		return nil
+		return false, nil
 	}
 
 	kubectlBinary := filepath.Join(cfg.BinDir, "kubectl")
-	if current, err := kubectl.Output(kubectlBinary, kubeconfigPath,
-		"get", "configmap", "litellm-config", "-n", "llm", "-o", "jsonpath={.data.config\\.yaml}"); err == nil && strings.TrimSpace(current) != "" {
+	current := ""
+	if currentRaw, err := kubectl.Output(kubectlBinary, kubeconfigPath,
+		"get", "configmap", "litellm-config", "-n", "llm", "-o", "jsonpath={.data.config\\.yaml}"); err == nil && strings.TrimSpace(currentRaw) != "" {
+		current = currentRaw
 		merged, err := mergeLiteLLMConfig(current, raw)
 		if err != nil {
-			return err
+			return false, err
 		}
 		raw = merged
 	}
+	if litellmConfigSemanticallyEqual(current, raw) {
+		return false, nil
+	}
 
 	manifest := configMapFieldOwnershipManifest("litellm-config", "llm", "config.yaml", raw)
 
-	return kubectl.ApplyServerSideForceConflicts(kubectlBinary, kubeconfigPath, manifest, "helm")
+	err := kubectl.ApplyServerSideForceConflicts(kubectlBinary, kubeconfigPath, manifest, "helm")
+	return err == nil, err
 }
 
 func mergeLiteLLMConfig(currentRaw, previousRaw string) (string, error) {
@@ -1486,6 +1507,17 @@ func mergeLiteLLMConfig(currentRaw, previousRaw string) (string, error) {
 	return string(merged), nil
 }
 
+func litellmConfigSemanticallyEqual(aRaw, bRaw string) bool {
+	var a, b model.LiteLLMConfig
+	if err := yaml.Unmarshal([]byte(aRaw), &a); err != nil {
+		return strings.TrimSpace(aRaw) == strings.TrimSpace(bRaw)
+	}
+	if err := yaml.Unmarshal([]byte(bRaw), &b); err != nil {
+		return strings.TrimSpace(aRaw) == strings.TrimSpace(bRaw)
+	}
+	return reflect.DeepEqual(a, b)
+}
+
 func configMapFieldOwnershipManifest(name, namespace, key, value string) []byte {
 	var b strings.Builder
 
diff --git a/internal/stack/stack_test.go b/internal/stack/stack_test.go
index 6c63e030..edc0f02d 100644
--- a/internal/stack/stack_test.go
+++ b/internal/stack/stack_test.go
@@ -604,6 +604,56 @@ model_list:
 	t.Fatalf("merged config missing paid route:\n%s", merged)
 }
 
+func TestLiteLLMConfigSemanticEqualIgnoresFormatting(t *testing.T) {
+	a := `model_list:
+  - model_name: "paid/*"
+    litellm_params:
+      model: "openai/*"
+      api_base: "http://127.0.0.1:8402/v1"
+      api_key: "unused"
+litellm_settings:
+  drop_params: true
+`
+	b := `litellm_settings:
+    drop_params: true
+model_list:
+- model_name: paid/*
+  litellm_params:
+    model: openai/*
+    api_base: http://127.0.0.1:8402/v1
+    api_key: unused
+`
+	if !litellmConfigSemanticallyEqual(a, b) {
+		t.Fatal("semantically equivalent LiteLLM configs compared unequal")
+	}
+}
+
+func TestSyncDefaultsRestartsLiteLLMAfterConfigRestore_SourceGuard(t *testing.T) {
+	src, err := os.ReadFile("stack.go")
+	if err != nil {
+		t.Fatalf("read stack.go: %v", err)
+	}
+	body := string(src)
+	start := strings.Index(body, "func syncDefaults(")
+	if start < 0 {
+		t.Fatal("syncDefaults not found")
+	}
+	end := strings.Index(body[start+1:], "\nfunc ")
+	if end < 0 {
+		t.Fatal("could not delimit syncDefaults body")
+	}
+	fn := body[start : start+1+end]
+	restoreIdx := strings.Index(fn, "restoredLiteLLMConfig, err = restoreLiteLLMConfig")
+	restartIdx := strings.Index(fn, "model.RestartLiteLLM(cfg, u, \"restored LiteLLM config\")")
+	autoIdx := strings.Index(fn, "autoConfigureLLM(cfg, u)")
+	if restoreIdx < 0 || restartIdx < 0 || autoIdx < 0 {
+		t.Fatalf("syncDefaults must restore ConfigMap, restart LiteLLM, then auto-configure; restore=%d restart=%d auto=%d", restoreIdx, restartIdx, autoIdx)
+	}
+	if !(restoreIdx < restartIdx && restartIdx < autoIdx) {
+		t.Fatalf("syncDefaults order wrong: restore=%d restart=%d auto=%d", restoreIdx, restartIdx, autoIdx)
+	}
+}
+
 func TestConfigMapFieldOwnershipManifestUsesLiteralBlock(t *testing.T) {
 	manifest := string(configMapFieldOwnershipManifest("litellm-config", "llm", "config.yaml", "model_list:\n  - model_name: paid/*\n"))
 
diff --git a/obolup.sh b/obolup.sh
index 4ff28c28..592b74e5 100755
--- a/obolup.sh
+++ b/obolup.sh
@@ -55,19 +55,19 @@ fi
 # Pinned dependency versions
 # Update these versions to upgrade dependencies across all installations
 # renovate: datasource=github-releases depName=kubernetes/kubernetes
-readonly KUBECTL_VERSION="1.35.3"
+readonly KUBECTL_VERSION="1.36.1"
 # renovate: datasource=github-releases depName=helm/helm
-readonly HELM_VERSION="3.20.1"
+readonly HELM_VERSION="3.21.0"
 # renovate: datasource=github-releases depName=k3d-io/k3d
 readonly K3D_VERSION="5.8.3"
 # renovate: datasource=github-releases depName=helmfile/helmfile
-readonly HELMFILE_VERSION="1.4.3"
+readonly HELMFILE_VERSION="1.5.2"
 # renovate: datasource=github-releases depName=derailed/k9s
 readonly K9S_VERSION="0.50.18"
 # renovate: datasource=github-releases depName=databus23/helm-diff
-readonly HELM_DIFF_VERSION="3.15.4"
+readonly HELM_DIFF_VERSION="3.15.7"
 # renovate: datasource=github-releases depName=ollama/ollama
-readonly OLLAMA_VERSION="0.20.2"
+readonly OLLAMA_VERSION="0.24.0"
 # Must match internal/openclaw/OPENCLAW_VERSION (without "v" prefix).
 # Tested by TestOpenClawVersionConsistency.
 readonly OPENCLAW_VERSION="2026.4.21"
diff --git a/renovate.json b/renovate.json
index 1317c83f..5c21ccde 100644
--- a/renovate.json
+++ b/renovate.json
@@ -57,22 +57,9 @@
     },
     {
       "customType": "regex",
-      "description": "Update LiteLLM image tag",
+      "description": "Update cloudflared image tag and digest",
       "matchStrings": [
-        "image:\\s*ghcr\\.io/berriai/litellm:(?<currentValue>[\\w.-]+)"
-      ],
-      "fileMatch": [
-        "^internal/embed/infrastructure/base/templates/llm\\.yaml$"
-      ],
-      "datasourceTemplate": "docker",
-      "depNameTemplate": "ghcr.io/berriai/litellm",
-      "versioningTemplate": "loose"
-    },
-    {
-      "customType": "regex",
-      "description": "Update cloudflared image tag",
-      "matchStrings": [
-        "repository:\\s*cloudflare/cloudflared\\s*\\n\\s*tag:\\s*[\"']?(?<currentValue>[\\w.-]+)[\"']?"
+        "repository:\\s*cloudflare/cloudflared\\s*\\n\\s*tag:\\s*[\"']?(?<currentValue>[\\w.-]+)(?:@(?<currentDigest>sha256:[a-f0-9]+))?[\"']?"
       ],
       "fileMatch": [
         "^internal/embed/infrastructure/cloudflared/values\\.yaml$"
@@ -263,7 +250,7 @@
         "docker"
       ],
       "matchPackageNames": [
-        "ghcr.io/berriai/litellm"
+        "ghcr.io/obolnetwork/litellm"
       ],
       "labels": [
         "renovate/litellm"
@@ -271,7 +258,8 @@
       "schedule": [
         "before 6am on monday"
       ],
-      "groupName": "LiteLLM updates"
+      "groupName": "LiteLLM updates",
+      "pinDigests": true
     },
     {
       "description": "Group cloudflared updates",
@@ -287,7 +275,8 @@
       "schedule": [
         "before 6am on monday"
       ],
-      "groupName": "cloudflared updates"
+      "groupName": "cloudflared updates",
+      "pinDigests": true
     },
     {
       "description": "Batch obolup.sh tool version updates into a single PR",
diff --git a/tests/test_buy_autorefill.py b/tests/test_buy_autorefill.py
index 8f498494..dfd33c94 100644
--- a/tests/test_buy_autorefill.py
+++ b/tests/test_buy_autorefill.py
@@ -173,6 +173,46 @@ def test_paid_request_failure_hint_for_transient(self):
             )
         self.assertIn("transient error", buf.getvalue())
 
+    def test_permit2_auths_are_immediately_valid_on_chain(self):
+        mod = load_buy_module()
+        signer = "0x57b0ef875deb5a37301f1640e469a2129da9490e"
+        pay_to = "0xc0de030f6c37f490594f93fb99e2756703c4297e"
+        asset = "0x210bbd033630e5e611b7922d70b0caabe64636d9"
+        payment = {
+            "scheme": "exact",
+            "network": "eip155:84532",
+            "amount": "1000000000000000",
+            "asset": asset,
+            "payTo": pay_to,
+            "maxTimeoutSeconds": 60,
+            "extra": {
+                "assetTransferMethod": "permit2",
+                "name": "Obol Network",
+                "version": "1",
+            },
+        }
+
+        with mock.patch.object(mod, "_supports_erc20_permit", return_value=False), \
+             mock.patch.object(mod, "_signer_post", return_value={"signature": "0x" + ("11" * 65)}) as signer_post, \
+             mock.patch.object(mod.time, "time", return_value=1779730000):
+            auths = mod._presign_auths(
+                signer,
+                pay_to,
+                "1000000000000000",
+                "base-sepolia",
+                asset,
+                1,
+                payment=payment,
+                extensions={},
+            )
+
+        typed_data = signer_post.call_args.args[1]
+        self.assertEqual(typed_data["message"]["witness"]["validAfter"], "0")
+        self.assertEqual(
+            auths[0]["payment"]["payload"]["permit2Authorization"]["witness"]["validAfter"],
+            "0",
+        )
+
     def test_build_active_auth_pool_appends_new_auths(self):
         mod = load_buy_module()
         existing = [{"nonce": "a"}, {"nonce": "b"}, {"nonce": "c"}]