fix: handle opaque whiteouts before layer extraction #95

Open
hiroTamada wants to merge 3 commits into main from fix/opaque-whiteout

hiroTamada commented Feb 13, 2026

Summary

  • Fixes /bin/sh: no such file or directory crash on images with layers that replace directories (e.g. Python base images)
  • Opaque whiteouts (.wh..wh..opq) must clear directories before extraction, not after — otherwise the current layer's own files get deleted along with lower layer files
  • Pre-scans each layer's tar for opaque whiteout markers, clears those directories, then extracts

Test plan

  • Deploy with Python base image that was previously failing
  • Deploy with Node.js base image (regression check)
  • Verify /bin/sh exists in booted VM

🤖 Generated with Claude Code


Note

Medium Risk
Changes core layer-unpack semantics and disk flush behavior inside the builder VM; mistakes could yield incorrect rootfs contents or flaky image readiness, though scope is limited to the in-VM erofs fast-path.

Overview
Fixes in-VM erofs creation for images that rely on OCI opaque whiteouts by pre-scanning each layer tarball for .wh..wh..opq, clearing affected directories before extracting the layer, and then applying regular .wh.* deletions post-extract.

Hardens the final erofs writeout by replacing the previous sync-only flush with fsync on the output file + source directory and an umount-based flush fallback, and relaxes TestCreateImage_Idempotent to accept additional in-progress statuses (pulling, converting).

Written by Cursor Bugbot for commit 9d43543. This will update automatically on new commits.

hiroTamada and others added 2 commits February 12, 2026 20:17
Opaque whiteouts (.wh..wh..opq) mean "replace this directory entirely
with this layer's contents." The previous approach processed them after
extraction, which deleted the current layer's own files (e.g. /bin/sh)
along with the lower layer files.

Now: pre-scan each layer's tar for opaque whiteouts, clear those
directories before extraction, then extract. Regular whiteouts are
still processed after extraction. This fixes /bin/sh not found errors
on images with layers that replace directories like /bin/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
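For the regular whiteouts that are still processed after extraction, the mapping from a marker entry to the path it deletes could look roughly like this. The function name `whiteoutTarget` is hypothetical and the sketch only covers the name handling, not the deletion itself.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

const (
	whiteoutPrefix = ".wh."
	opaqueMarker   = ".wh..wh..opq"
)

// whiteoutTarget reports the path that a regular whiteout entry deletes.
// ok is false for ordinary files and for the opaque marker, which is
// handled separately before extraction. Hypothetical helper.
func whiteoutTarget(entry string) (target string, ok bool) {
	base := path.Base(entry)
	if base == opaqueMarker || !strings.HasPrefix(base, whiteoutPrefix) {
		return "", false
	}
	name := strings.TrimPrefix(base, whiteoutPrefix)
	return path.Join(path.Dir(entry), name), true
}

func main() {
	// Sample entry names, chosen for illustration.
	for _, e := range []string{"usr/lib/.wh.old.so", "bin/.wh..wh..opq", "bin/sh"} {
		if t, ok := whiteoutTarget(e); ok {
			fmt.Printf("%s deletes %s\n", e, t)
		}
	}
}
```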
The previous sync command was insufficient to flush writes through the
virtio-blk layer. This caused the host to read stale/incomplete data
from the source volume, resulting in LZ4 decompression errors when
booting the erofs rootfs.

Now: fsync the erofs file and directory entry, then unmount the source
volume entirely before reporting the result. This guarantees all writes
have reached the host-side block device file.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

    for _, e := range entries {
        os.RemoveAll(filepath.Join(opaqueDir, e.Name()))
    }
}

Path traversal in pre-scan enables directory clearing outside exportDir

Low Severity

The pre-scan processes raw tar -tf output entries without sanitizing path traversal sequences. filepath.Join(exportDir, filepath.Dir(entry)) with an entry containing ../ resolves outside exportDir — e.g. a crafted entry like ../../../etc/.wh..wh..opq resolves opaqueDir to /etc, causing os.RemoveAll on every entry in that directory. Notably, tar -tf shows raw archive paths (including ../) while tar -xf strips them by default, creating an asymmetry where the pre-scan clears directories that extraction would never actually touch.

The idempotent create test only expected "pending" or "ready" but the
second call can also return "pulling" or "converting" depending on
timing. Add all valid intermediate statuses to fix the flaky test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
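The relaxed check could be expressed as a small lookup, sketched here with only the four statuses named in the commit message. The helper name `isAcceptableStatus` is hypothetical, the real test may be structured differently, and the API may define further statuses that are not shown in this PR page.

```go
package main

import "fmt"

// acceptableStatuses holds only the four statuses named in the commit
// message; the real API may define more (assumption).
var acceptableStatuses = map[string]bool{
	"pending":    true,
	"pulling":    true,
	"converting": true,
	"ready":      true,
}

// isAcceptableStatus reports whether a repeated create call returned a
// status the idempotency check should tolerate. Hypothetical helper.
func isAcceptableStatus(status string) bool {
	return acceptableStatuses[status]
}

func main() {
	// "failed" is a made-up status used only to show a rejected value.
	for _, s := range []string{"pulling", "failed"} {
		fmt.Printf("%s acceptable: %v\n", s, isAcceptableStatus(s))
	}
}
```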