docs(migration): bedag/raw → base release ownership transfer script #528
Closed
bussyjd wants to merge 5 commits into
PR #481 only repaired hermes-<id> volumes after hermes.Sync (master agent). Child agents live under agent-<name> and are provisioned by the controller or agent-factory without that path, so hermes-data stayed 1000:1000 while Hermes runs as 10000:10000 and crash-looped on Permission denied under /data/.hermes. Extend EnsureHermesDataPVCOwnership to agent-<name>/hermes-data, call it from obol agent new and obol sell demo quant, and add obol agent repair-perms for factory-only creates that cannot docker-exec the k3d node from in-cluster. Co-authored-by: Cursor <cursoragent@cursor.com>
Replace host-side Hermes PVC ownership repair with Kubernetes fsGroup and keep only a tiny k3d fallback.
PR #523 moved 6 bedag/raw helmfile releases into the base chart so there's one source of truth for what ships in each namespace. Fresh installs work. Existing clusters being upgraded from pre-#523 obol-stack fail at `helm upgrade base` with:

`Error: UPGRADE FAILED: <resource> exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-name" must equal "base"`

This blocks `obol stack up` until the operator manually re-annotates ~10 resources (Namespaces, HTTPRoutes, Middlewares, ConfigMaps, PrometheusRule, PodMonitor, ClusterRole/Binding).

Adds hack/migrate-bedag-raw-to-base.sh, which finds all such orphans and re-annotates them in bulk. The script is idempotent, so it is safe to re-run.

Surfaced by the 14-PR integration test campaign; see plans/integration-test-results-final-20260524.md Bug #2.
Superseded by bundle PR #536; closing in favor of the consolidated merge target. Original branch and history preserved.
Summary
PR #523 relocates 6 `bedag/raw` helmfile releases into the `base` chart so the stack has one source of truth for what ships in the `erpc`, `obol-frontend`, and `llm` namespaces. Fresh installs are unaffected. Clusters created before PR #523 fail at `helm upgrade base` with `invalid ownership metadata` because Helm refuses to adopt resources owned by another release.

This PR ships a one-shot migration script and operator documentation so existing clusters can be upgraded without hand-fixing ~10 resources.
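To make the re-annotation concrete, here is a minimal sketch of the metadata Helm looks at before it will adopt an existing resource: the `meta.helm.sh/release-name` and `meta.helm.sh/release-namespace` annotations plus the `app.kubernetes.io/managed-by=Helm` label. The helper `adoption_commands`, the resource name `erpc`, and the release namespace `default` are illustrative assumptions, not taken from `hack/migrate-bedag-raw-to-base.sh`. The helper only prints the `kubectl` commands, so it can be tried without a cluster.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not from the actual migration script): print the
# kubectl commands that hand one resource over to a Helm release.
#   $1 release name          $2 release namespace (assumed; check your install)
#   $3 resource kind         $4 resource name
#   $5 resource namespace (omit for cluster-scoped kinds such as Namespace)
adoption_commands() {
  local release="$1" release_ns="$2" kind="$3" name="$4" ns="${5:-}"
  local target="$kind $name"
  if [ -n "$ns" ]; then
    target="-n $ns $target"
  fi
  # Helm validates both annotations and the managed-by label on adoption.
  echo "kubectl annotate --overwrite $target meta.helm.sh/release-name=$release meta.helm.sh/release-namespace=$release_ns"
  echo "kubectl label --overwrite $target app.kubernetes.io/managed-by=Helm"
}

# Placeholder values: an HTTPRoute named "erpc" in the erpc namespace.
adoption_commands base default httproute erpc erpc
```

Printing the commands instead of executing them keeps the sketch reviewable; piping the output to `bash` would apply them against the current kube context.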
Symptom
Before / After
When to run
Before `obol stack up`, against any cluster created before PR #523 merges.
Releases handled

| Release | Namespace |
| --- | --- |
| `obol-frontend-rbac` | `obol-frontend` |
| `obol-frontend-httproute` | `obol-frontend` |
| `erpc-httproute` | `erpc` |
| `erpc-x402-middleware` | `erpc` |
| `erpc-metadata` | `erpc` |
| `llm-buyer-podmonitor` | `llm` |
| `x402-verifier-podmonitor` | `x402` (partial-upgrade clusters from before PR #513) |

Plus three resources that may exist with no Helm ownership at all and need to be adopted into `base`: `namespace/erpc`, `namespace/obol-frontend`, `prometheusrule/x402-verifier`.

Files
- `hack/migrate-bedag-raw-to-base.sh`: the migration script (executable).
- `docs/upgrade-from-pre-pr-523.md`: operator-facing upgrade guide.
- `.github/release-template.md`: release-notes entry under Breaking changes / Migration notes pointing future release authors at the script.
Test plan
- `bash -n hack/migrate-bedag-raw-to-base.sh` (syntax check) passes.
- [...] against a pre-#523 cluster and confirmed to unblock `obol stack up`: every `invalid ownership metadata` failure resolved on the next upgrade.
- [...] (every resource reports `already on base, skipping`).
- [...] found) and does not affect `obol stack up`.

Surfaced by the 14-PR integration test campaign; see `plans/integration-test-results-final-20260524.md` Bug #2.
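The idempotency claim can be illustrated with a small decision function. This is a sketch under assumptions, not the script's actual logic: `plan_action` is a hypothetical name, and it takes the current value of a resource's `meta.helm.sh/release-name` annotation (empty when the annotation is unset) and returns what a migration pass would do with it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: map a resource's current Helm owner to an action.
#   $1 current value of meta.helm.sh/release-name ("" if unset)
#   $2 target release name (defaults to "base")
plan_action() {
  local current_owner="$1" target="${2:-base}"
  if [ "$current_owner" = "$target" ]; then
    echo "skip"      # already on the target release: a re-run changes nothing
  elif [ -z "$current_owner" ]; then
    echo "adopt"     # no Helm ownership metadata at all
  else
    echo "transfer"  # still owned by one of the old bedag/raw releases
  fi
}

plan_action "erpc-httproute"   # prints: transfer
plan_action ""                 # prints: adopt
plan_action "base"             # prints: skip
```

Because the `skip` branch is checked first, a second pass over an already-migrated cluster takes no action, which matches the `already on base, skipping` behaviour reported in the test plan.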