
fix(snapshot): align file cursor between slots in dump_dirty#5696

Closed
ejc3 wants to merge 1 commit into firecracker-microvm:main from ejc3:fix-dump-dirty-multi-slot-cursor

Conversation


@ejc3 ejc3 commented Feb 18, 2026

Summary

Fix diff snapshot corruption for VMs with multiple KVM memory slots (x86_64 VMs with >3GB RAM).

GuestMemorySlot::dump_dirty processes dirty pages using seek+write operations but did not advance the file cursor to the end of each slot. When a slot ends with clean pages, the cursor is short by those pages' size. All subsequent slots' data is then written at wrong file offsets, corrupting the snapshot.

Root Cause

In src/vmm/src/vstate/memory.rs, the dump_dirty per-slot function tracks skip_size for trailing clean pages but only seeks when the next dirty page is found. If the slot ends with clean pages, no seek occurs — the file cursor remains at slot_start + last_dirty_page_end instead of slot_start + slot_size.

For single-slot VMs (≤3GB on x86_64, below the MMIO gap), this is harmless — there's no subsequent slot to misalign. For multi-slot VMs (>3GB, where the MMIO gap at ~3GB splits memory into two KVM slots), slot 1's data is shifted backward by the size of slot 0's trailing clean pages.

Impact

  • Diff snapshots of VMs with >3GB RAM produce corrupted memory files
  • ~47,000 pages (185MB) written to wrong offsets in a 4GB VM
  • On restore: InvalidAvailIdx panics (virtio avail ring reads stale data from wrong offset)
  • Full snapshots are unaffected (they dump all pages sequentially, no dirty bitmap)
  • VMs with ≤3GB RAM are unaffected (single memory slot)

Fix

Track the total cursor advancement (`cursor_offset`) through the dump loop. After processing all bitmap entries, seek forward by `slot.len() - cursor_offset` to align the cursor at the exact slot boundary.

Test Plan

Added test_dump_dirty_multi_slot_cursor_alignment regression test:

  • Creates two memory regions with known data patterns
  • Marks only the first page of region 1 as dirty (127 trailing clean pages)
  • Marks all pages of region 2 as dirty
  • Verifies region 2 data in the diff dump matches a full dump at the same offset
cargo test -p vmm --lib -- test_dump_dirty
running 2 tests
test vstate::memory::tests::test_dump_dirty ... ok
test vstate::memory::tests::test_dump_dirty_multi_slot_cursor_alignment ... ok

Also verified end-to-end with 4GB and 8GB VMs:

  • 4GB: diff snapshot restore passes (1.0s warm start)
  • 8GB: diff snapshot restore passes (2.0s warm start)

When taking a diff snapshot of a VM with multiple KVM memory slots
(e.g., x86_64 VMs with >3GB RAM where the MMIO gap splits memory
into two regions), `GuestMemorySlot::dump_dirty` did not advance
the file cursor to the end of each slot after processing its dirty
bitmap. If a slot ended with clean (non-dirty) pages, the cursor
remained short by the size of those trailing pages.

This caused all subsequent slots' data to be written at incorrect
file offsets, corrupting the diff snapshot. On restore, the shifted
data produced invalid virtio queue state (e.g., avail ring idx
reading stale values), triggering `InvalidAvailIdx` panics.

The bug only affects VMs with multiple memory slots — single-slot
VMs (≤3GB on x86_64) were unaffected because there is no subsequent
slot whose cursor would be misaligned.

Fix: track total cursor advancement through the dump loop and seek
to `slot.len() - cursor_offset` at the end of each slot, ensuring
the cursor is always aligned at the exact slot boundary.

Signed-off-by: EJ Ciramella <ejc3@meta.com>
@ilstam ilstam self-assigned this Feb 18, 2026
@ilstam ilstam added the "Status: Awaiting review" label Feb 18, 2026
ilstam (Contributor) commented Feb 20, 2026

Hello EJ,

Thank you very much for reporting this.

This is indeed a bug in Firecracker.

I think there is a slightly simpler way to implement the fix. Also, I would rather extend the existing test_dump_dirty() than introduce a new one.

Normally we'd share the feedback here and ask you to send a new revision, but given the importance of the bug we want to move a bit faster here in order to get a release with the fix out sooner.

I've been told that GitHub allows me to force-push to the branch associated with this PR, but personally I find this weird.

Hence, I have created a separate PR: #5705

I would appreciate your reviews and feedback on the new PR.

Again, thank you for reporting this.

Ilias

@ilstam ilstam closed this Feb 20, 2026
ejc3 (Author) commented Feb 20, 2026

All good, thank you for doing the work to get this landed.

Manciukic (Contributor) commented:

@ejc3 thanks again for reporting the issue and proposing a fix! The release with the fix is now live: https://github.com/firecracker-microvm/firecracker/releases/tag/v1.14.2

