sec: migrate standard Dockerfiles to Chainguard golden base images #159

Merged
scale-ballen merged 10 commits into main from sec/golden-standard-dockerfiles
Mar 12, 2026

@scale-ballen scale-ballen commented Mar 11, 2026

Summary

Migrate standard (non-FIPS) agentex and agentex-ui Dockerfiles from public Docker Hub base images to hardened Chainguard golden base images hosted in artifacts prod ECR (022465994601). This aligns standard builds with the FIPS variants already in production.

Base Image Changes

| Service | Before | After |
| --- | --- | --- |
| agentex | python:3.12-slim (Docker Hub) | golden/chainguard/python:3.12-dev (artifacts prod ECR) |
| agentex-ui | node:20 (Docker Hub) | golden/chainguard/node:20-dev (artifacts prod ECR) |

Dockerfile Changes (agentex/Dockerfile)

  • Multi-stage build: base → dev → docs-builder → production
  • Dependency installation: Uses uv sync with UV_PROJECT_ENVIRONMENT=/usr for deterministic lockfile-based installs (improvement over FIPS variant's uv pip install --system)
  • Production stage: Targeted binary copies (uvicorn, ddtrace-run, python3, python3.12) instead of blanket /usr/bin copy — reduces attack surface
  • Runtime dependencies: Only postgresql-client and libpq in production (no build tools)
  • User: Runs as nonroot (UID 65532) in production
  • ENTRYPOINT cleared: Both dev and production stages set ENTRYPOINT [] to override Chainguard's default python entrypoint, which breaks shell commands in docker-compose
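
A rough sketch of how the stages described above might fit together, not the merged Dockerfile. The stage names, `UV_PROJECT_ENVIRONMENT=/usr`, the four copied binaries, and the `apk` packages come from this description; the `GOLDEN_PYTHON` arg name, the `/app` layout, and the assumption that `uv` is available in the golden image and that `uvicorn` and `ddtrace` are declared in `pyproject.toml` are illustrative only.

```dockerfile
# Sketch only: GOLDEN_PYTHON is a hypothetical arg name introduced here.
ARG GOLDEN_PYTHON=022465994601.dkr.ecr.us-west-2.amazonaws.com/golden/chainguard/python:3.12-dev

FROM ${GOLDEN_PYTHON} AS base
USER root
# bash is added for docker-compose shell commands
RUN apk add --no-cache bash
# Make uv install into the system prefix instead of a project .venv
ENV UV_PROJECT_ENVIRONMENT=/usr
WORKDIR /app
COPY pyproject.toml ./
# Production dependencies only (assumes uv is present in the base image)
RUN uv sync --no-dev --no-install-project

FROM base AS dev
# Dev tooling is layered on top of the production dependency set
RUN uv sync --group dev --no-install-project
# Clear Chainguard's default python entrypoint so shell commands run as given
ENTRYPOINT []

FROM ${GOLDEN_PYTHON} AS production
USER root
RUN apk add --no-cache postgresql-client libpq
# Installed packages, plus only the executables needed at runtime
COPY --from=base /usr/lib/python3.12 /usr/lib/python3.12
COPY --from=base /usr/bin/uvicorn /usr/bin/ddtrace-run /usr/bin/
COPY --from=base /usr/bin/python3 /usr/bin/python3.12 /usr/bin/
WORKDIR /app
USER nonroot
ENTRYPOINT []
```

Because production starts again from the golden image rather than from `base`, compilers and other build tools installed in `base` stay behind unless a `COPY --from=base` names them explicitly.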

Dockerfile Changes (agentex-ui/Dockerfile)

  • Structurally identical to the FIPS variant; only the base image tag differs (node:20-dev vs node-fips:20-dev)

CI/Integration Test Changes

  • integration-tests.yml: Added AWS OIDC credentials + artifacts prod ECR login so integration tests can pull golden base images during docker-compose build
  • id-token: write scoped to job level: Moved from workflow-level permissions to only the run-integration-tests job (principle of least privilege per Greptile review)
  • docker-compose.yml: Added args: SOURCE_DIR: agentex to both agentex and agentex-temporal-worker services — docker-compose builds from repo root where code is at agentex/, not public/agentex/ (the default for CI builds in the parent agentex repo)

Greptile Review Fixes

  1. Removed busybox from base stage: Chainguard deliberately excludes busybox to minimize attack surface. Re-adding it undermines the security model. bash alone is sufficient for shell compatibility.
  2. Scoped id-token: write to job level: Only the run-integration-tests job needs OIDC federation for AWS ECR access. Other jobs (changes, discover-agent-images) don't need token minting capability.
  3. Cleared ENTRYPOINT on dev stage: Chainguard Python images set ENTRYPOINT ["python"], which causes docker-compose bash -c "..." commands to be interpreted as python bash -c "...". Fixed by adding ENTRYPOINT [] to the dev stage.

Integration Test Failures Fixed

  1. "/public/agentex/src": not found — Dockerfile defaults SOURCE_DIR=public/agentex for CI builds from the parent agentex repo root. Docker-compose builds from scale-agentex repo root where the path is agentex/. Fixed by passing SOURCE_DIR: agentex as a build arg in docker-compose.yml.
  2. /usr/bin/python: can't open file '/app/bash' — Chainguard's ENTRYPOINT ["python"] caused shell commands to be passed as arguments to Python. Fixed by adding ENTRYPOINT [] on the dev stage.
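
Both failures trace back to a few Dockerfile lines. The fragment below is a minimal sketch of the two fixes under stated assumptions: the `src` directory is taken from the error message in item 1, while the single `dev` stage built directly on the golden image and the `/app` layout are simplifications for illustration.

```dockerfile
# Sketch only: simplified to one stage; the real dev stage builds on base.
FROM 022465994601.dkr.ecr.us-west-2.amazonaws.com/golden/chainguard/python:3.12-dev AS dev

# The default suits builds run from the parent agentex repo root.
# docker-compose in scale-agentex overrides it with SOURCE_DIR=agentex.
ARG SOURCE_DIR=public/agentex
WORKDIR /app
# With the default value and the wrong build context, this COPY is the
# instruction that fails with "/public/agentex/src": not found.
COPY ${SOURCE_DIR}/src ./src

# With the inherited ENTRYPOINT ["python"], a compose command of
# bash -c "..." runs as python bash -c "...". Clearing the entrypoint
# lets the command execute directly.
ENTRYPOINT []
```

Outside docker-compose, the same override can be passed on the command line with `docker build --build-arg SOURCE_DIR=agentex .` from the scale-agentex repo root.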

Test plan

  • CI lint and typecheck pass
  • Unit tests pass
  • Integration tests pass (docker-compose builds with golden base images)
  • Production stage builds successfully in parent agentex repo CI
  • Images start and serve traffic correctly at runtime

🤖 Generated with Claude Code

Greptile Summary

This PR migrates the agentex and agentex-ui Dockerfiles from public Docker Hub base images (python:3.12-slim, node:20) to hardened Chainguard golden images hosted in the artifacts-prod ECR (022465994601), aligning standard builds with the FIPS variants already in production. CI is updated with AWS OIDC + ECR login so integration tests can pull these images.

Key changes:

  • agentex/Dockerfile: Introduces a clean multi-stage build (base → dev / docs-builder → production). base uses uv sync --no-dev, dev layers on --group dev, and production copies only targeted binaries (uvicorn, ddtrace-run, python3, python3.12) rather than all of /usr/bin, reducing attack surface. Both dev and production stages clear Chainguard's default ENTRYPOINT ["python"] to allow docker-compose shell commands.
  • agentex-ui/Dockerfile: Straightforward base image swap to node:20-dev, with ENTRYPOINT [] and USER nonroot (UID 65532) applied consistently.
  • .github/workflows/integration-tests.yml: AWS OIDC credentials + ECR login added to the run-integration-tests job; id-token: write is correctly scoped at the job level only, not the workflow level.
  • agentex/docker-compose.yml: SOURCE_DIR: agentex build arg added to both services so local builds resolve paths from the repo root correctly.

Issue found:

  • The comment # Conditionally copy docs from builder stage (line 80 of agentex/Dockerfile) is misleading — the COPY --from=docs-builder instruction is unconditional and ARG INCLUDE_DOCS=false has no effect. This is a new comment added in this PR that incorrectly describes the pre-existing behavior.

Confidence Score: 4/5

  • Safe to merge — this is a well-structured security hardening change that addresses all previous review findings.
  • All critical issues from the prior review (build-tool binary leakage into production, busybox re-installation, missing ENTRYPOINT reset, dev deps in production, workflow-level id-token) have been resolved. The only new finding is a misleading comment on the docs copy — a non-blocking style issue. The pre-existing uv.lock non-determinism concern remains but is not a regression from this PR.
  • agentex/Dockerfile — minor: the INCLUDE_DOCS arg is still never evaluated as a condition (pre-existing) and a new misleading comment was added that implies the docs copy is conditional when it is not.

Important Files Changed

| Filename | Overview |
| --- | --- |
| agentex/Dockerfile | Multi-stage Chainguard migration. Previous issues (build-tool leakage, busybox, no-dev) are resolved. One new minor issue: the # Conditionally copy docs comment is misleading because the COPY --from=docs-builder is unconditional. The pre-existing uv sync without uv.lock issue (flagged in prior review) persists but is not a regression from this PR. |
| .github/workflows/integration-tests.yml | Adds AWS OIDC credentials + ECR login steps so integration tests can pull Chainguard golden base images during docker-compose build. id-token: write is correctly scoped to the run-integration-tests job only (not workflow-level), addressing the prior review finding. Order of steps (credentials before ECR login) is correct. |
| agentex-ui/Dockerfile | Clean swap of node:20 for Chainguard's node:20-dev. ENTRYPOINT [] correctly overrides Chainguard's default node entrypoint. USER root / USER nonroot (UID 65532) pattern is correct. Build tools left in the single-stage image is intentional (needed by Sharp at build time). |
| agentex/docker-compose.yml | Adds args: SOURCE_DIR: agentex to both agentex and agentex-temporal-worker services, overriding the public/agentex default so local docker-compose builds resolve source paths correctly from the repo root. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    BASE["base stage\nFROM chainguard/python:3.12-dev\nuv sync --no-dev\n(prod deps only)"]
    DEV["dev stage\nFROM base\nuv sync --group dev\n+ ENTRYPOINT []"]
    DOCS["docs-builder stage\nFROM base\nuv sync --group docs\nmkdocs build"]
    PROD["production stage\nFROM chainguard/python:3.12-dev\napk add postgresql-client libpq\nCOPY /usr/lib/python3.12 from base\nCOPY targeted binaries from base\nUSER nonroot"]

    BASE --> DEV
    BASE --> DOCS
    BASE --> PROD
    DOCS -->|"COPY --from=docs-builder\n/app/docs/site (unconditional)"| PROD

    subgraph ECR ["022465994601.dkr.ecr.us-west-2.amazonaws.com"]
        IMG["golden/chainguard/python:3.12-dev"]
    end

    ECR --> BASE
    ECR --> PROD
Prompt To Fix All With AI
This is a comment left during a code review.
Path: agentex/Dockerfile
Line: 80-81

Comment:
**Misleading "Conditionally" comment — copy is unconditional**

The comment on line 80 says `# Conditionally copy docs from builder stage`, but the `COPY` instruction on line 81 has no guard and always executes. The `ARG INCLUDE_DOCS=false` declared on line 54 is never referenced in a conditional expression, so documentation assets are always baked into the production image regardless of the build arg value.

Either use a Dockerfile shell-form `RUN` with a conditional, or remove the misleading comment and rename the arg to reflect its true state:

```suggestion
# Docs site is always included in the production image
COPY --from=docs-builder /app/docs/site /app/docs/site
```

If conditional inclusion is actually desired in the future, note that Dockerfile has no native `if` syntax for `COPY --from`; a common pattern is to gate the copy via a separate build target or a `RUN` script that only copies if the arg is set.

How can I resolve this? If you propose a fix, please make it concise.
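
If conditional inclusion is wanted later, one possible shape for the "separate build target" pattern mentioned in the comment is sketched below. It relies on `FROM` expanding a global `ARG`, which `COPY --from` does not do. The stage names `docs-true`, `docs-false`, and `docs-selected` and the `GOLDEN_PYTHON` arg are hypothetical, and the docs-builder body assumes `mkdocs.yml` lives in `docs/` and that `uv` is available in the image; none of this is taken from the merged Dockerfile.

```dockerfile
# Sketch only: global ARGs (declared before the first FROM) can be used in FROM lines.
ARG INCLUDE_DOCS=false
ARG GOLDEN_PYTHON=022465994601.dkr.ecr.us-west-2.amazonaws.com/golden/chainguard/python:3.12-dev

FROM ${GOLDEN_PYTHON} AS docs-builder
USER root
ENV UV_PROJECT_ENVIRONMENT=/usr
WORKDIR /app
COPY pyproject.toml ./
RUN uv sync --group docs --no-install-project
# Assumes mkdocs.yml sits in docs/, so the default output is /app/docs/site
COPY docs ./docs
WORKDIR /app/docs
RUN mkdocs build

# Variant selected when INCLUDE_DOCS=true: the built site
FROM docs-builder AS docs-true

# Variant selected when INCLUDE_DOCS=false: WORKDIR creates an empty
# directory, so the later COPY still has a source to read
FROM scratch AS docs-false
WORKDIR /app/docs/site

# FROM expands the global ARG and picks one variant by name
FROM docs-${INCLUDE_DOCS} AS docs-selected

FROM ${GOLDEN_PYTHON} AS production
COPY --from=docs-selected /app/docs/site /app/docs/site
USER nonroot
ENTRYPOINT []
```

With BuildKit, stages that the selected target does not depend on are skipped, so a default build would not run `mkdocs build` at all; `docker build --build-arg INCLUDE_DOCS=true .` would opt in.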

Last reviewed commit: 88218f6

Migrate from Docker Hub base images to ECR-mirrored Chainguard golden images:
- agentex: python:3.12-slim → golden/chainguard/python:3.12-dev
- agentex-ui: node:20 → golden/chainguard/node:20-dev

Mirrors the pattern established in the FIPS Dockerfiles (PR #308).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@scale-ballen scale-ballen requested a review from a team as a code owner March 11, 2026 18:17
- Replace blanket COPY --from=base /usr/bin with targeted copies of
  only the console_scripts needed at runtime (uvicorn, ddtrace-run,
  python3, python3.12), preventing build tools (gcc, make) from leaking
  into the production image
- Switch docs-builder from uv sync --group docs to uv pip install
  --system --group docs for deterministic builds and consistency with
  the rest of the Dockerfile
- Use mkdocs build directly instead of uv run mkdocs build since
  packages are now installed to system Python

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
scale-ballen and others added 3 commits March 11, 2026 14:39
The golden Chainguard base image requires ECR authentication which is
unavailable in integration test CI (scale-agentex repo lacks the IAM
role). Add configurable BASE_IMAGE ARG defaulting to golden image for
production builds, with docker-compose overriding to python:3.12-alpine
for local dev and CI. Also adds bash to system deps for docker-compose
command compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Revert the BASE_IMAGE workaround and instead properly authenticate
with ECR in CI. Adds AWS credentials config (github-action-agentex
role) and egp-prod ECR login to integration-tests.yml so docker
compose can pull golden Chainguard base images.

Requires Terracode-Infra change to add scaleapi/scale-agentex:* to
the github-action-agentex IAM role OIDC subjects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
uv pip install does not support --group flag. Revert to uv sync
(matching original Dockerfile) with UV_PROJECT_ENVIRONMENT=/usr
for Chainguard's Python prefix. Addresses Greptile findings #3 and #4.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
scale-ballen and others added 2 commits March 11, 2026 15:23
Switch from the overprivileged github-action-agentex role to the new
github-action-scale-agentex-ecr-read role which only grants ECR read
access to golden/* repos. Addresses Greptile review finding about
excessive permissions for a public repository.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…atibility

The Dockerfile defaults SOURCE_DIR=public/agentex (for CI builds from repo
root), but docker-compose builds from the scale-agentex repo root where the
path is agentex/. Override the arg so integration tests can find source files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
scale-ballen and others added 2 commits March 12, 2026 12:02
Chainguard Python images set ENTRYPOINT ["python"], so docker-compose
commands like `bash -c "..."` get interpreted as `python bash -c "..."`.
Clear the entrypoint on the dev stage so shell commands work correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove busybox from base stage apk install — Chainguard deliberately
  excludes it to minimize attack surface; bash alone is sufficient
- Move id-token: write from workflow-level to run-integration-tests job
  only, following principle of least privilege (Greptile review)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use `uv sync --no-dev` in base stage so dev-only packages (test runners,
linters, debug tools) don't leak into production via the COPY --from=base
of /usr/lib/python3.12. Dev stage still gets them via `uv sync --group dev`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@scale-ballen scale-ballen merged commit b413297 into main Mar 12, 2026
51 of 71 checks passed
@scale-ballen scale-ballen deleted the sec/golden-standard-dockerfiles branch March 12, 2026 21:49