Split base image from SDK layer to avoid full 9h+ rebuilds on SDK changes #538

@VascoSch92

Description

Context

SDK PR #2522 (ARG ordering fix) restored registry cache hits for apt-get/npm layers across SDK bumps, cutting per-image build time from ~322s to ~154s (validation in #544). This recovers the Feb baseline of ~5-6 hours for 433 images.

However, each image still runs a full docker buildx build that pulls the SWE-bench base image, resolves registry cache, and copies the builder output. With 433 images, this adds up.

Proposal: pre-built base images

Split the build into two independently-tagged layers:

Layer 1 — Repo base (rarely changes):

```
ghcr.io/openhands/eval-base:{SWEBENCH_IMAGE_ID}
```

Contains everything from the SWE-bench base through apt/npm setup. Only rebuilds when the Dockerfile base layers or upstream SWE-bench image change.
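As a sketch of the Layer 1 logic (hypothetical helper names, not the actual `build_utils.py` API): tag the base on the SWE-bench image ID and skip the build whenever that tag already exists in GHCR.

```python
# Illustrative only: names and the registry-listing mechanism are assumptions.

def eval_base_ref(swebench_image_id: str) -> str:
    """One base image per upstream SWE-bench image ID, per the proposed scheme."""
    return f"ghcr.io/openhands/eval-base:{swebench_image_id}"

def base_needs_build(swebench_image_id: str, registry_tags: set[str]) -> bool:
    """Rebuild only when the tag is absent from the registry; registry_tags
    would come from a GHCR listing call in the real workflow."""
    return eval_base_ref(swebench_image_id) not in registry_tags
```

In the common case (same SWE-bench instances), every base tag is already present, so Layer 1 is skipped entirely.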

Layer 2 — SDK agent (changes per SDK commit):

```dockerfile
FROM ghcr.io/openhands/eval-base:${BASE_TAG}
COPY --from=builder /agent-server /agent-server
ENTRYPOINT ["/agent-server/.venv/bin/python", "-m", "openhands.agent_server"]
```

Just a COPY onto the cached base — ~5-10s per image.
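Stitched together, a run would build each missing base once, then do the cheap COPY-only agent build for every image. A sketch with hypothetical names (not the actual workflow code):

```python
def plan_builds(instance_ids, registry_tags):
    """Yield ("base", iid) only for bases missing from the registry,
    then a cheap ("agent", iid) step for every image."""
    def base_ref(iid):
        return f"ghcr.io/openhands/eval-base:{iid}"

    for iid in instance_ids:
        if base_ref(iid) not in registry_tags:
            yield ("base", iid)
    for iid in instance_ids:
        yield ("agent", iid)
```

On an SDK bump with unchanged instances, the first loop yields nothing and only the fast agent builds remain.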

Expected impact

| Scenario | Current (with ARG fix) | With base split |
| --- | --- | --- |
| Same SDK (all cached in GHCR) | 3-12 min | 3-12 min |
| New SDK, same SWE-bench instances | ~5-6 hours | ~30-45 min |
| New SDK + new SWE-bench base | ~5-6 hours | ~5-6 hours |

The common case (new SDK commit, same instances) would go from hours to under an hour.
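Back-of-envelope check on that figure (assuming 433 images and the ~5-10 s per-image COPY cost stated above):

```python
# Serial lower/upper bounds; the ~30-45 min estimate presumably also
# assumes some build concurrency on top of this.
images = 433
copy_secs = (5, 10)  # per-image Layer 2 cost from the proposal
serial_minutes = tuple(images * s / 60 for s in copy_secs)
# roughly 36-72 min if run strictly serially
```

Even the serial upper bound is well under the current ~5-6 hours.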

Trade-offs

  • Adds operational complexity: two image registries, invalidation logic for base images
  • Requires changes to build_utils.py and the workflow
  • The ARG fix already handles the regression; this is a further optimization
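One option for the base-image invalidation logic (purely illustrative; the tag scheme above uses only the SWE-bench image ID) is to fold a short digest of the base Dockerfile layers into the tag, so that editing those layers automatically forces a rebuild:

```python
import hashlib

def eval_base_ref_with_digest(swebench_image_id: str, base_dockerfile: str) -> str:
    """Append a content digest so base-layer edits invalidate the tag.
    Hypothetical variant, not the scheme stated in the proposal."""
    digest = hashlib.sha256(base_dockerfile.encode()).hexdigest()[:12]
    return f"ghcr.io/openhands/eval-base:{swebench_image_id}-{digest}"
```

This trades slightly uglier tags for not needing any separate "did the base Dockerfile change?" bookkeeping.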

Labels: enhancement (New feature or request)