Split base image from SDK layer to avoid full 9h+ rebuilds on SDK changes #538

## Context

SDK PR [#2522](https://github.com/OpenHands/software-agent-sdk/pull/2522) (ARG ordering fix) restored registry cache hits for the `apt-get`/`npm` layers across SDK bumps, cutting per-image build time from ~322s to ~154s ([validation in #544](https://github.com/OpenHands/benchmarks/issues/544)). This recovers the February baseline of ~5-6 hours for 433 images.

However, each image still runs a full `docker buildx build` that pulls the SWE-bench base image, resolves the registry cache, and copies the builder output. At ~154s each, 433 images still amount to roughly 18.5 hours of cumulative build time before any parallelism.

## Proposal: pre-built base images

Split the build into two independently-tagged layers:

**Layer 1 — Repo base** (rarely changes):
```
ghcr.io/openhands/eval-base:{SWEBENCH_IMAGE_ID}
```
Contains everything from the SWE-bench base through apt/npm setup. Only rebuilds when the Dockerfile base layers or upstream SWE-bench image change.

**Layer 2 — SDK agent** (changes per SDK commit):
```dockerfile
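# Assumes `ARG BASE_TAG` is declared before the first FROM (global scope)
# and that the `builder` stage is defined earlier in this Dockerfile.
FROM ghcr.io/openhands/eval-base:${BASE_TAG}
COPY --from=builder /agent-server /agent-server
ENTRYPOINT ["/agent-server/.venv/bin/python", "-m", "openhands.agent_server"]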
```

This layer is just a `COPY` onto the cached base, so it should take an estimated ~5-10s per image.
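
The two-phase flow could be orchestrated roughly as follows. This is a minimal sketch in Python (the language of `build_utils.py`), not the existing implementation: the function names, the `eval-agent` repository, the `Dockerfile.base`/`Dockerfile.agent` file names, the `SWEBENCH_IMAGE` build arg, and the `{base_tag}-{sdk_sha}` tag scheme are all hypothetical. Only `ghcr.io/openhands/eval-base` and `BASE_TAG` come from the proposal.

```python
BASE_REPO = "ghcr.io/openhands/eval-base"    # from the proposal
AGENT_REPO = "ghcr.io/openhands/eval-agent"  # hypothetical name for Layer 2 images


def base_build_cmd(swebench_image: str, base_tag: str) -> list[str]:
    """Layer 1: build the repo base (slow, but rarely needed)."""
    return [
        "docker", "buildx", "build",
        "--file", "Dockerfile.base",  # hypothetical file name
        "--build-arg", f"SWEBENCH_IMAGE={swebench_image}",  # hypothetical ARG
        "--tag", f"{BASE_REPO}:{base_tag}",
        "--push", ".",
    ]


def agent_build_cmd(base_tag: str, sdk_sha: str) -> list[str]:
    """Layer 2: COPY the agent server onto an already-published base."""
    return [
        "docker", "buildx", "build",
        "--file", "Dockerfile.agent",  # hypothetical file name
        "--build-arg", f"BASE_TAG={base_tag}",
        "--tag", f"{AGENT_REPO}:{base_tag}-{sdk_sha}",
        "--push", ".",
    ]


def plan_builds(
    images: dict[str, str],        # base_tag -> upstream SWE-bench image
    existing_base_tags: set[str],  # base tags already present in the registry
    sdk_sha: str,
) -> list[list[str]]:
    """Return the build commands needed for one SDK commit, in order."""
    cmds: list[list[str]] = []
    for base_tag, swebench_image in images.items():
        if base_tag not in existing_base_tags:
            # Base is missing or was invalidated: pay the slow build once.
            cmds.append(base_build_cmd(swebench_image, base_tag))
        cmds.append(agent_build_cmd(base_tag, sdk_sha))
    return cmds
```

With every base tag already published, `plan_builds` returns only Layer 2 commands, which is the "new SDK, same SWE-bench instances" case. How `existing_base_tags` is obtained (for example, by querying the registry) is left out of this sketch.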

## Expected impact

| Scenario | Current (with ARG fix) | With base split |
|---|---|---|
| Same SDK (all cached in GHCR) | 3-12 min | 3-12 min |
| **New SDK, same SWE-bench instances** | **~5-6 hours** | **~30-45 min** |
| New SDK + new SWE-bench base | ~5-6 hours | ~5-6 hours |

The common case (new SDK commit, same instances) would go from hours to under an hour.
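
As a rough sanity check on these figures, the per-image timings quoted in this issue can be multiplied out. The sketch below computes sequential totals only. Actual wall-clock time depends on how many builds run concurrently, which this issue does not state, and the ~5-10s figure is itself an estimate.

```python
NUM_IMAGES = 433  # image count quoted in this issue


def sequential_hours(per_image_seconds: float, num_images: int = NUM_IMAGES) -> float:
    """Total build time in hours if images were built one after another."""
    return per_image_seconds * num_images / 3600


scenarios = [
    ("before ARG fix (~322s/image)", 322),
    ("with ARG fix (~154s/image)", 154),
    ("base split, low estimate (~5s/image)", 5),
    ("base split, high estimate (~10s/image)", 10),
]

for label, seconds in scenarios:
    print(f"{label}: {sequential_hours(seconds):.1f} h")
# before ARG fix (~322s/image): 38.7 h
# with ARG fix (~154s/image): 18.5 h
# base split, low estimate (~5s/image): 0.6 h
# base split, high estimate (~10s/image): 1.2 h
```

The sequential total for the split build (about 36-72 minutes) is the same order of magnitude as the ~30-45 min estimate in the table, so the estimate looks plausible even with modest concurrency.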

## Trade-offs

- Adds operational complexity: two sets of images to manage in the registry (base and agent), plus invalidation logic for base images
- Requires changes to `build_utils.py` and the workflow
- The ARG fix already handles the regression; this is a further optimization
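
One way to keep the invalidation logic small is to derive the base tag from everything the base layer depends on, so that a changed input yields a new tag and a stale base is never reused. The sketch below is an assumption, not part of the proposal: the proposal tags base images with `{SWEBENCH_IMAGE_ID}` alone, and the `base_tag` function and its hash suffix are hypothetical.

```python
import hashlib
import re


def base_tag(swebench_image_id: str, base_dockerfile: str) -> str:
    """Hypothetical tag for a Layer 1 image: '<sanitized id>-<12 hex chars>'.

    The suffix hashes both the SWE-bench image ID and the text of the base
    Dockerfile stages, so editing either one produces a different tag.
    """
    digest = hashlib.sha256()
    digest.update(swebench_image_id.encode("utf-8"))
    digest.update(b"\0")  # separator so the two inputs cannot run together
    digest.update(base_dockerfile.encode("utf-8"))

    # Docker tags allow only [A-Za-z0-9_.-], must not start with '.' or '-',
    # and are limited to 128 characters.
    safe_id = re.sub(r"[^A-Za-z0-9_.-]", "_", swebench_image_id).lstrip(".-")
    return f"{safe_id[:115]}-{digest.hexdigest()[:12]}"
```

With this scheme the workflow only needs to check whether the computed tag already exists in the registry. The upstream SWE-bench image changing under an unchanged ID would not be detected unless its digest is also fed into the hash.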

## Related

- #544 — Root cause investigation (ARG ordering)
- #531 — Master tracker
- SDK [#2522](https://github.com/OpenHands/software-agent-sdk/pull/2522) — ARG fix (merged)
