9 changes: 9 additions & 0 deletions .github/workflows/release.yml
@@ -311,13 +311,22 @@ jobs:
password: ${{ secrets.ORG_ACCESS_TOKEN }}

- name: Set up Buildx
id: buildx
uses: docker/setup-buildx-action@d7f5e7f509e45cec5c76c4d5afdd7de93d0b3df5 # v4.1.0
with:
version: "lab:latest"
driver: cloud
endpoint: "docker/make-product-smarter"
install: true

# Purge the shared cloud builder's cache before building. The release
# builds 7 image variants (cpu/cuda on amd64+arm64) on one cloud builder,
# and accumulated cache from previous runs eventually fills its disk —
# surfacing as "no space left on device" while unpacking the (growing)
# upstream llama.cpp image snapshots. Starting clean avoids that.
- name: Free build cache on cloud builder
run: docker buildx prune -af --builder ${{ steps.buildx.outputs.name }}

- name: Build CPU image
uses: docker/build-push-action@f9f3042f7e2789586610d6e8b85c8f03e5195baf
with:
2 changes: 1 addition & 1 deletion .versions
@@ -5,4 +5,4 @@ VLLM_UPSTREAM_VERSION=0.19.0
VLLM_METAL_RELEASE=v0.2.0-20260420-142150
DIFFUSERS_RELEASE=v0.1.0-20260216-000000
SGLANG_VERSION=0.5.6
-LLAMA_SERVER_VERSION=b9501
+LLAMA_SERVER_VERSION=b9592
4 changes: 2 additions & 2 deletions Dockerfile
@@ -1,9 +1,9 @@
# syntax=docker/dockerfile:1

ARG GO_VERSION=1.25
-ARG LLAMA_SERVER_VERSION=b9501
+ARG LLAMA_SERVER_VERSION=b9592

critical

The pull request title and description state that a docker buildx prune -af step is being added to the release workflow to resolve the ResourceExhausted cache issue. However, the changes in this PR consist only of version bumps in .versions, Dockerfile, and the llama.cpp submodule. The CI/CD workflow file containing the prune step is missing from this pull request. Please include the workflow changes so that the cache is pruned before building.

ARG LLAMA_SERVER_VARIANT=cpu
-ARG LLAMA_UPSTREAM_IMAGE=ghcr.io/ggml-org/llama.cpp:server-vulkan-b9501
+ARG LLAMA_UPSTREAM_IMAGE=ghcr.io/ggml-org/llama.cpp:server-vulkan-b9592
Comment on lines +4 to +6

suggestion (bug_risk): Reduce the risk of version skew between LLAMA_SERVER_VERSION and LLAMA_UPSTREAM_IMAGE.

Both variables currently encode b9592 separately. To prevent future mismatches, derive LLAMA_UPSTREAM_IMAGE from LLAMA_SERVER_VERSION (e.g., ARG LLAMA_UPSTREAM_IMAGE=ghcr.io/...:server-vulkan-${LLAMA_SERVER_VERSION}) or otherwise ensure a single source of truth for this version.

Suggested change
ARG LLAMA_SERVER_VERSION=b9592
ARG LLAMA_SERVER_VARIANT=cpu
ARG LLAMA_UPSTREAM_IMAGE=ghcr.io/ggml-org/llama.cpp:server-vulkan-b9501
ARG LLAMA_UPSTREAM_IMAGE=ghcr.io/ggml-org/llama.cpp:server-vulkan-b9592
ARG LLAMA_SERVER_VERSION=b9592
ARG LLAMA_SERVER_VARIANT=cpu
ARG LLAMA_UPSTREAM_IMAGE=ghcr.io/ggml-org/llama.cpp:server-vulkan-${LLAMA_SERVER_VERSION}
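
If a single source of truth is not practical, the skew described above could also be caught by a small pre-build check. The following is only a sketch under stated assumptions: `check_llama_version_skew` is a hypothetical helper that is not part of this PR, and it assumes `.versions` holds a `LLAMA_SERVER_VERSION=` line and the Dockerfile holds `ARG LLAMA_SERVER_VERSION=` and `ARG LLAMA_UPSTREAM_IMAGE=` lines in the form shown in this diff.

```shell
#!/bin/sh
# Sketch only: check_llama_version_skew is a hypothetical helper, not part of this PR.
# Usage: check_llama_version_skew <versions-file> <dockerfile>
# Returns 0 when the llama.cpp build tag agrees in all three places, 1 otherwise.
check_llama_version_skew() {
    versions_file=$1
    dockerfile=$2

    # Tag pinned in .versions, e.g. "b9592".
    pinned=$(sed -n 's/^LLAMA_SERVER_VERSION=//p' "$versions_file")
    # Default value of the Dockerfile build argument.
    arg_default=$(sed -n 's/^ARG LLAMA_SERVER_VERSION=//p' "$dockerfile")
    # Text after the last "-" in the upstream image reference,
    # e.g. "b9592" from "...:server-vulkan-b9592".
    image_tag=$(sed -n 's/^ARG LLAMA_UPSTREAM_IMAGE=.*-//p' "$dockerfile")

    if [ -z "$pinned" ] || [ -z "$arg_default" ] || [ -z "$image_tag" ]; then
        echo "could not read all three version fields" >&2
        return 1
    fi

    # An image tag written as ${LLAMA_SERVER_VERSION} is derived from the
    # ARG default, so it cannot drift from it.
    if [ "$image_tag" = '${LLAMA_SERVER_VERSION}' ]; then
        image_tag=$arg_default
    fi

    if [ "$pinned" != "$arg_default" ] || [ "$pinned" != "$image_tag" ]; then
        echo "version skew: .versions=$pinned ARG=$arg_default image=$image_tag" >&2
        return 1
    fi
    echo "llama.cpp version consistent: $pinned"
}
```

One possible use would be to source this in a workflow step and call `check_llama_version_skew .versions Dockerfile` before the image builds, so a partial version bump fails fast instead of producing mismatched images.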


ARG VERSION=dev

2 changes: 1 addition & 1 deletion llamacpp/native/vendor/llama.cpp
Submodule llama.cpp updated 340 files