fix(controller/render): Restricted PSS securityContext on httpd workloads #529
Closed
bussyjd wants to merge 5 commits into
PR #481 only repaired hermes-<id> volumes after hermes.Sync (master agent). Child agents live under agent-<name> and are provisioned by the controller or agent-factory without that path, so hermes-data stayed 1000:1000 while Hermes runs as 10000:10000, and the pods crash-looped on Permission denied under /data/.hermes. Extend EnsureHermesDataPVCOwnership to agent-<name>/hermes-data, call it from obol agent new and obol sell demo quant, and add obol agent repair-perms for factory-only creates that cannot docker-exec the k3d node from in-cluster.

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace host-side Hermes PVC ownership repair with Kubernetes fsGroup and keep only a tiny k3d fallback.
PR #521 enforces the Restricted Pod Security Standard on the x402 and llm namespaces. The controller renders two httpd-based Deployments (the obol-skill-md publisher and the agentidentity-default-registration well-known/agent-registration.json publisher) without a securityContext, so PSS admission rejects them and they never start. Result: the marketplace API returns STACK_UNREACHABLE because skill-md isn't reachable.

Adds a Restricted-compliant securityContext to both renderers:

pod: runAsNonRoot, runAsUser=1000, runAsGroup=1000, seccompProfile=RuntimeDefault, fsGroup=1000
container: allowPrivilegeEscalation=false, drop ALL capabilities

Both Deployments already bind httpd to 8080, which is safe for a non-root user, so no port change is required.

Surfaced by the 14-PR integration test campaign. The integration test workaround patched the running Deployments manually: plans/integration-test-results-final-20260524.md Bug #3.
bussyjd (author):

Superseded by bundle PR #536; closing in favor of the consolidated merge target. Original branch and history preserved.
Summary
Cross-PR interaction fix surfaced by the 14-PR integration test campaign (Bug #3).
PR #521 enforces the Restricted Pod Security Standard on the `x402` (and `llm`) namespaces. The serviceoffer-controller renders two httpd-based Deployments without a `securityContext`:

- `obol-skill-md`: publishes `/skill.md` and `api/services.json`
- `agentidentity-<name>-registration`: publishes `/.well-known/agent-registration.json`

Both pods are rejected at PSS admission, so they never start. The marketplace API then returns `STACK_UNREACHABLE` because skill-md isn't reachable.

Fix
Adds Restricted-PSS-compliant `securityContext` blocks to both render functions in `internal/serviceoffercontroller/render.go`:

- Pod: `runAsNonRoot: true`, `runAsUser: 1000`, `runAsGroup: 1000`, `fsGroup: 1000`, `seccompProfile.type: RuntimeDefault`
- Container (`httpd`): `allowPrivilegeEscalation: false`, `capabilities.drop: ["ALL"]`

The two `securityContext` payloads are factored into helpers (`restrictedPodSecurityContext`, `restrictedContainerSecurityContext`) so future controller-rendered workloads can reuse the same Restricted defaults.

Both Deployments already bind httpd to `8080` (`httpd -f -p 8080 -h /www`). Ports at or above 1024 are unprivileged on Linux by default, so non-root UID 1000 can bind 8080 even with every capability dropped (including `CAP_NET_BIND_SERVICE`). No port or Service changes were required.

Why UID 1000
That's the canonical busybox non-root UID and the only Linux user/group the upstream `busybox:1.36` image exposes besides root. The httpd payload is a read-only ConfigMap projection, and the new `fsGroup: 1000` keeps the projected volumes readable.

Tests
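The renderer tests for this change assert Restricted-PSS fields on the rendered Deployment. As a hedged illustration of that kind of invariant, here is a standalone checker. It is not the PR's `assertRestrictedPSS`: `PodSpec`, `Container`, and `restrictedViolations` are hypothetical names, the structs are a trimmed stand-in for the real pod spec, and only the subset of Restricted rules the new helpers address is checked (runAsNonRoot, non-zero runAsUser, seccomp type, allowPrivilegeEscalation, dropping ALL capabilities).

```go
package main

import "fmt"

// Hypothetical, trimmed-down view of a rendered pod: only the fields that the
// Restricted-PSS rules below look at. Not the real corev1.PodSpec.
type Container struct {
	Name                     string
	AllowPrivilegeEscalation *bool
	DropCapabilities         []string
	SeccompType              string // "" means "inherit the pod-level value"
}

type PodSpec struct {
	RunAsNonRoot *bool
	RunAsUser    *int64
	SeccompType  string
	Containers   []Container
}

func contains(xs []string, want string) bool {
	for _, x := range xs {
		if x == want {
			return true
		}
	}
	return false
}

// restrictedViolations lists the Restricted-PSS rules p breaks. It covers only a
// subset of the standard and simplifies runAsNonRoot to a pod-level check (the
// real rule also accepts setting it on every container instead).
func restrictedViolations(p PodSpec) []string {
	var out []string
	if p.RunAsNonRoot == nil || !*p.RunAsNonRoot {
		out = append(out, "pod: runAsNonRoot must be true")
	}
	if p.RunAsUser != nil && *p.RunAsUser == 0 {
		out = append(out, "pod: runAsUser must not be 0")
	}
	for _, c := range p.Containers {
		seccomp := c.SeccompType
		if seccomp == "" {
			seccomp = p.SeccompType
		}
		if seccomp != "RuntimeDefault" && seccomp != "Localhost" {
			out = append(out, c.Name+": seccompProfile.type must be RuntimeDefault or Localhost")
		}
		// Restricted requires an explicit false; leaving it unset is a violation.
		if c.AllowPrivilegeEscalation == nil || *c.AllowPrivilegeEscalation {
			out = append(out, c.Name+": allowPrivilegeEscalation must be false")
		}
		if !contains(c.DropCapabilities, "ALL") {
			out = append(out, c.Name+": capabilities.drop must include ALL")
		}
	}
	return out
}

func main() {
	yes, no := true, false
	uid := int64(1000)
	hardened := PodSpec{
		RunAsNonRoot: &yes, RunAsUser: &uid, SeccompType: "RuntimeDefault",
		Containers: []Container{{Name: "httpd", AllowPrivilegeEscalation: &no, DropCapabilities: []string{"ALL"}}},
	}
	// Roughly the pre-fix shape: an httpd container with no securityContext at all.
	unhardened := PodSpec{Containers: []Container{{Name: "httpd"}}}
	fmt.Println(len(restrictedViolations(hardened)))
	for _, v := range restrictedViolations(unhardened) {
		fmt.Println(v)
	}
}
```

Checking the rendered object in a unit test, as the PR does, catches a dropped field before it ever reaches a cluster where admission would reject the pod.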
- `TestBuildSkillCatalogDeployment_RestrictedPSS`
- `TestBuildAgentIdentityRegistrationDeployment_RestrictedPSS`

`assertRestrictedPSS` asserts every Restricted-PSS field on the rendered Deployment, so regressions show up at the renderer rather than at PSS admission in a live cluster.

Test plan
- `go build ./...` clean
- `go test ./internal/serviceoffercontroller/...` green
- … `securityContext` is removed (covered by helper invariants)
- … `Ready` and `/skill.md` + `/.well-known/agent-registration.json` resolve through Traefik

Related
- `plans/integration-test-results-final-20260524.md` Bug #3 (manual workaround)