Context
SDK PR #2522 (ARG ordering fix) restored registry cache hits for apt-get/npm layers across SDK bumps, cutting per-image build time from ~322s to ~154s (validation in #544). This recovers the Feb baseline of ~5-6 hours for 433 images.
However, each image still runs a full docker buildx build that pulls the SWE-bench base image, resolves registry cache, and copies the builder output. With 433 images, this adds up.
Proposal: pre-built base images
Split the build into two independently-tagged layers:
Layer 1 — Repo base (rarely changes):
ghcr.io/openhands/eval-base:{SWEBENCH_IMAGE_ID}
Contains everything from the SWE-bench base through apt/npm setup. Only rebuilds when the Dockerfile base layers or upstream SWE-bench image change.
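As a rough illustration of the tagging and invalidation logic for Layer 1, the sketch below maps a SWE-bench image ID to a base image reference and computes a fingerprint that changes when either rebuild trigger changes. The function names and the idea of storing a fingerprint next to the image are assumptions for illustration, not existing code in `build_utils.py`.

```python
import hashlib
import re
from typing import Optional

BASE_REPO = "ghcr.io/openhands/eval-base"  # repository name taken from the proposal


def base_image_ref(swebench_image_id: str) -> str:
    """Map a SWE-bench image ID to the Layer 1 image reference (hypothetical helper)."""
    # Docker tags allow [A-Za-z0-9_.-], at most 128 characters,
    # and must not start with '.' or '-'.
    tag = re.sub(r"[^A-Za-z0-9_.-]", "_", swebench_image_id).lstrip(".-")[:128]
    if not tag:
        raise ValueError(f"cannot derive a tag from {swebench_image_id!r}")
    return f"{BASE_REPO}:{tag}"


def base_fingerprint(dockerfile_base_stage: str, upstream_digest: str) -> str:
    """Hash the two inputs that should trigger a base rebuild (assumed scheme)."""
    h = hashlib.sha256()
    h.update(dockerfile_base_stage.encode("utf-8"))
    h.update(b"\0")  # separator so the two inputs cannot run together
    h.update(upstream_digest.encode("utf-8"))
    return h.hexdigest()


def needs_rebuild(stored: Optional[str], current: str) -> bool:
    """Rebuild when no fingerprint is recorded or when it differs."""
    return stored != current
```

The fingerprint could be kept as an image label or in a small manifest; either choice is an implementation detail this sketch leaves open.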
Layer 2 — SDK agent (changes per SDK commit):
FROM ghcr.io/openhands/eval-base:${BASE_TAG}
COPY --from=builder /agent-server /agent-server
ENTRYPOINT ["/agent-server/.venv/bin/python", "-m", "openhands.agent_server"]
Just a COPY onto the cached base — ~5-10s per image.
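One way the Layer 2 build could be invoked from Python is sketched below. It only assembles the `docker buildx build` argument list and runs nothing. The agent repository name, the Dockerfile name, and the tag scheme are hypothetical. Note that for `FROM ...:${BASE_TAG}` to resolve, the Dockerfile must declare `ARG BASE_TAG` before the `FROM` line.

```python
from typing import List

AGENT_REPO = "ghcr.io/openhands/eval-agent"  # hypothetical repository name


def layer2_build_cmd(
    base_tag: str,
    sdk_sha: str,
    dockerfile: str = "Dockerfile.agent",  # hypothetical file name
    context: str = ".",
) -> List[str]:
    """Assemble the buildx command for the SDK agent layer (does not execute it)."""
    return [
        "docker", "buildx", "build",
        "--file", dockerfile,
        # Selects which pre-built base image the FROM line resolves to.
        "--build-arg", f"BASE_TAG={base_tag}",
        # Assumed tag scheme: SDK commit plus base tag.
        "--tag", f"{AGENT_REPO}:{sdk_sha}-{base_tag}",
        "--push",
        context,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` by whatever part of the workflow currently launches builds.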
Expected impact
| Scenario | Current (with ARG fix) | With base split |
| --- | --- | --- |
| Same SDK (all cached in GHCR) | 3-12 min | 3-12 min |
| New SDK, same SWE-bench instances | ~5-6 hours | ~30-45 min |
| New SDK + new SWE-bench base | ~5-6 hours | ~5-6 hours |
The common case (new SDK commit, same instances) would go from hours to under an hour.
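As a back-of-the-envelope check, the per-image figures quoted in this issue can be multiplied out over 433 images. These are sequential totals only. The workflow's build concurrency is not stated here, so they are upper bounds on wall-clock time, not predictions.

```python
N_IMAGES = 433  # image count from this issue


def sequential_total_seconds(per_image_seconds: float, n_images: int = N_IMAGES) -> float:
    """Total build time if images were built strictly one after another."""
    return per_image_seconds * n_images


# Per-image times quoted in the Context section.
before_fix_hours = sequential_total_seconds(322) / 3600
after_fix_hours = sequential_total_seconds(154) / 3600

# Estimated COPY-only cost of Layer 2 (5-10 s per image).
copy_only_minutes = (
    sequential_total_seconds(5) / 60,
    sequential_total_seconds(10) / 60,
)

print(f"before ARG fix: {before_fix_hours:.1f} h sequential")
print(f"after ARG fix:  {after_fix_hours:.1f} h sequential")
print(f"COPY-only:      {copy_only_minutes[0]:.0f}-{copy_only_minutes[1]:.0f} min sequential")
```

The sequential total after the ARG fix is well above the quoted ~5-6 hours, which suggests the current workflow already overlaps builds. The same overlap would apply to the COPY-only step.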
Trade-offs
- Adds operational complexity: two image repositories (base and agent) to manage, plus invalidation logic for base images
- Requires changes to build_utils.py and the workflow
- The ARG fix already handles the regression; this is a further optimization
Related