fix(ilp): fix disk-full handling for SF buffer creation #25
Open
mtopolnik wants to merge 2 commits into
openCleanRW(path, size) and allocate(fd, size) both advanced the
logical EOF, which became a problem once allocate gained a
never-shrinks contract: callers that did openCleanRW(path, sz) then
allocate(fd, sz) (the only production pattern) hit the
target == currentSize short-circuit and got no block reservation.
SIGBUS protection on mmap stores was silently disabled.
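The failure mode above can be illustrated with a toy model. This is a sketch only: the class and method names are hypothetical and it tracks just two numbers per file (logical EOF and reserved bytes), not real file descriptors or the actual native code.

```java
// Toy model, not the real Files API: all names here are hypothetical.
final class SparseFileModel {
    long eof;       // logical end of file, in bytes
    long reserved;  // bytes backed by real disk blocks

    // Old behaviour: create/truncate AND advance EOF to size; the file stays sparse.
    void oldOpenCleanRW(long size) {
        eof = size;
        reserved = 0;
    }

    // New behaviour: create or truncate to empty, nothing else.
    void openCleanRW() {
        eof = 0;
        reserved = 0;
    }

    // Never-shrinks allocate with the target == currentSize short-circuit.
    void allocate(long size) {
        long target = Math.max(size, eof);
        if (target == eof) {
            return; // nothing to extend, so no blocks get reserved
        }
        reserved += target - eof; // reserve blocks for the newly added range
        eof = target;
    }

    public static void main(String[] args) {
        SparseFileModel before = new SparseFileModel();
        before.oldOpenCleanRW(4096);
        before.allocate(4096);
        System.out.println("old pattern reserved=" + before.reserved); // 0

        SparseFileModel after = new SparseFileModel();
        after.openCleanRW();
        after.allocate(4096);
        System.out.println("new pattern reserved=" + after.reserved); // 4096
    }
}
```

In the model, the old pattern reaches the requested EOF with zero reserved bytes, which is the state in which a later mmap store can fault on a full disk.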
Clean split of responsibilities:
- openCleanRW(path): sole owner of file lifecycle -- create or
  truncate to empty, open RW. The size parameter is gone from the
  JNI, Java, and FilesFacade API.
- allocate(fd, size): sole owner of "extend EOF and reserve real
  blocks for [0, target)." Cross-platform contract:
  * Linux: posix_fallocate(fd, currentSize, target - currentSize),
    followed by ftruncate only on the sparse-fallback path.
  * macOS: fcntl(F_PREALLOCATE) passes newBytes (not the full
    target) to fst_length, fixing a long-standing over-allocation
    on non-empty files; ftruncate(fd, target) advances EOF.
  * Windows: FILE_ALLOCATION_INFO + FILE_END_OF_FILE_INFO,
    short-circuiting when the file is already at target.
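The arithmetic shared by the three platform paths can be worked through in a few lines. This is a minimal sketch, assuming target = max(size, currentSize) as the never-shrinks rule; AllocatePlan is a hypothetical helper for illustration, not a class from this change.

```java
// Hypothetical helper: computes what an allocate(fd, size) call would ask for.
final class AllocatePlan {
    final long target;    // logical EOF after the call
    final long newBytes;  // bytes to reserve beyond the current EOF

    AllocatePlan(long currentSize, long requestedSize) {
        // Never shrink: a smaller request leaves the file at its current size.
        this.target = Math.max(requestedSize, currentSize);
        this.newBytes = target - currentSize;
    }

    // True when the file is already at target and nothing needs reserving.
    boolean isNoOp() {
        return newBytes == 0;
    }

    public static void main(String[] args) {
        AllocatePlan grow = new AllocatePlan(4096, 16384);
        // Passing the full target beyond EOF would request 16384 bytes,
        // i.e. currentSize (4096) more than needed; newBytes is 12288.
        System.out.println("target=" + grow.target + " newBytes=" + grow.newBytes);

        AllocatePlan shrink = new AllocatePlan(16384, 4096);
        System.out.println("target=" + shrink.target + " noOp=" + shrink.isNoOp());
    }
}
```

For a 4096-byte file grown to 16384 bytes, the reservation covers 12288 new bytes starting at offset 4096, which corresponds to the offset and length arguments in the Linux bullet and to newBytes in the macOS bullet.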
Production callers updated:
- MmapSegment.create: openCleanRW(ptr) + allocate(fd, sizeBytes).
  Restores ENOSPC-at-create semantics for SF buffers.
- AckWatermark.open: openCleanRW(path) + allocate(fd, FILE_SIZE) on
  the wrong-/missing-size branch; the correct-size branch still uses
  openRW to preserve the previous session's watermark.
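The create-then-allocate caller pattern can be approximated in plain Java NIO. This is a sketch under stated assumptions, not the code in this change: the real implementation is native (JNI), whereas here the hypothetical allocate stand-in writes zeros to extend the file, which surfaces a full disk as an IOException at creation time but does not guarantee block reservation on copy-on-write or compressing filesystems.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files; // java.nio.file.Files, not the project's Files class
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class CreateThenAllocateSketch {

    // Stand-in for openCleanRW(path): create or truncate to empty, open read-write.
    static FileChannel openCleanRW(Path path) throws IOException {
        return FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE);
    }

    // Stand-in for allocate(fd, size): extends the file to max(size, currentSize)
    // by writing zeros past the current EOF. Returns false on any I/O failure,
    // which includes a full disk.
    static boolean allocate(FileChannel ch, long size) {
        try {
            long current = ch.size();
            long target = Math.max(size, current); // never shrink
            ByteBuffer zeros = ByteBuffer.allocate(64 * 1024);
            long pos = current;
            while (pos < target) {
                zeros.clear();
                zeros.limit((int) Math.min(zeros.capacity(), target - pos));
                while (zeros.hasRemaining()) {
                    pos += ch.write(zeros, pos);
                }
            }
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("sf-sketch", ".bin");
        try (FileChannel ch = openCleanRW(path)) {
            if (!allocate(ch, 1 << 20)) {
                // The caller fails here, at creation, rather than on a later store.
                throw new IOException("could not allocate " + path);
            }
            System.out.println("size=" + ch.size());
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

The point of the pattern is that the failure is reported synchronously to the code that creates the file, so it can be turned into a recoverable exception before any region is mapped.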
Tests updated for the new signatures; the full client test suite
passes (2219 / 2219).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Splits the responsibilities of Files.openCleanRW and Files.allocate and unifies the allocate contract across Linux, macOS, and Windows.

The two functions previously overlapped: both advanced logical EOF, and the old allocate on POSIX simply called ftruncate (leaving the file sparse), while on macOS it also over-allocated by currentSize on non-empty files. The single production caller pattern -- openCleanRW(path, sz) then allocate(fd, sz) -- left a sparse file on POSIX and a correctly reserved one on Windows. When the disk filled later, a producer thread storing into the mmap'd region would trigger a SIGBUS that aborts the JVM (Linux/macOS) or an in-page exception (Windows).

After:
- openCleanRW(path) owns the file lifecycle: open RW, truncate to empty. No size parameter.
- allocate(fd, size) owns "extend EOF and reserve real disk blocks for [0, target)." target = max(size, currentSize); the file never shrinks. ENOSPC / ERROR_DISK_FULL surface as a clean false return. Same observable behaviour on all three platforms.

End-user impact: SF buffer creation hitting a full disk now fails synchronously with a MmapSegmentException the sender can recover from, instead of crashing the JVM later.

Tradeoffs
- Files.openCleanRW and FilesFacade.openCleanRW lost their size parameter. Callers that need a sized file follow with Files.allocate (reserves blocks; fails on ENOSPC) or Files.truncate (sparse; faster). Two production call sites and several test sites updated.
- On filesystems that do not support posix_fallocate / F_PREALLOCATE, allocate falls back to ftruncate and leaves blocks sparse; the SIGBUS risk re-emerges on that filesystem only. Documented in the Files.allocate Javadoc. Windows has no equivalent fallback.
- On macOS, allocate now passes target - currentSize to F_PREALLOCATE (the correct beyond-EOF length under F_PEOFPOSMODE) instead of the full target. Behaviour change on non-empty files: it stops requesting duplicate allocation for the existing region.

Test plan