Persist snapshot to disk #1373
Pull request overview
Adds first-class support for persisting Snapshots to disk and rehydrating sandboxes from those snapshots to avoid ELF parsing / guest init on cold start, using file-backed mappings for (near) zero-copy loads.
Changes:
- Implement Snapshot::to_file(), Snapshot::from_file(), and Snapshot::from_file_unchecked() with a versioned on-disk header + mmappable memory blob.
- Add MultiUseSandbox::from_snapshot() fast-path for sandbox creation directly from an in-memory or disk-loaded snapshot.
- Introduce file-backed ReadonlySharedMemory::from_file() and refactor SandboxMemoryLayout to expose enough stable layout fields for serialization.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/hyperlight_host/src/sandbox/uninitialized_evolve.rs | Updates layout API usage (peb_address() accessor). |
| src/hyperlight_host/src/sandbox/uninitialized.rs | Ensures snapshot-based sandbox creation registers default HostPrint via FunctionRegistry::with_default_host_print(). |
| src/hyperlight_host/src/sandbox/snapshot.rs | Defines snapshot file format + (de)serialization, file load/save APIs, sregs I/O, and extensive tests. |
| src/hyperlight_host/src/sandbox/initialized_multi_use.rs | Adds MultiUseSandbox::from_snapshot() instantiation path. |
| src/hyperlight_host/src/sandbox/host_funcs.rs | Adds FunctionRegistry::with_default_host_print() helper and makes default writer private. |
| src/hyperlight_host/src/mem/shared_mem.rs | Adds ReadonlySharedMemory::from_file() for file-backed snapshot memory mappings. |
| src/hyperlight_host/src/mem/mgr.rs | Updates layout field access for scratch I/O buffer sizes. |
| src/hyperlight_host/src/mem/memory_region.rs | Routes Snapshot regions through the Windows surrogate “ReadOnlyFile” mapping path. |
| src/hyperlight_host/src/mem/layout.rs | Refactors SandboxMemoryLayout to store key sizes directly and computes offsets via methods; updates PEB writing. |
| src/hyperlight_host/src/hypervisor/hyperlight_vm/x86_64.rs | Factors out apply_sregs() and updates peb_address() usage. |
| src/hyperlight_host/benches/benchmarks.rs | Adds benchmarks for snapshot file save/load and cold start via snapshot. |
| src/hyperlight_common/src/mem.rs | Adds write_to() helpers for GuestMemoryRegion and HyperlightPEB. |
| docs/snapshot-file-implementation-plan.md | Adds a detailed design/format plan and future work notes. |
Squash of hyperlight-dev#1373 by Ludvig Liljenberg onto current upstream main. Ports his three-commit series (layout refactor, design doc, persistence) as a single commit on this branch so we can iterate it without touching his fork.
Highlights:
- Snapshot::to_file(path): write a sandbox snapshot to disk (header + page-aligned blob + CoW bitmap + guard-page padding)
- Snapshot::from_file(path): mmap it back with zero copy
- MultiUseSandbox::from_snapshot(): instantiate a sandbox directly from a persisted snapshot, bypassing ELF parsing and guest init
- ReadonlySharedMemory::from_file: the shared-memory primitive under both of the above, with Linux (mmap(MAP_PRIVATE)) and Windows (CreateFileMappingA + MapViewOfFile) zero-copy paths
See docs/snapshot-file-implementation-plan.md for the wire format.
The Windows code path currently maps the file as read-only shared (PAGE_READONLY / FILE_MAP_READ) rather than true copy-on-write (PAGE_WRITECOPY / FILE_MAP_COPY). That works for the boot path on WHP because guest writes go through the surrogate's own mapping, but breaks the contract for anything that writes directly through the host view. A follow-up commit on this branch switches it to true CoW so the API matches the Linux semantics end-to-end.
Based-on: hyperlight-dev#1373
Authored-by: Ludvig Liljenberg <ludfjig@users.noreply.github.com>
Signed-off-by: danbugs <danilochiarlone@gmail.com>
The Windows path in ReadonlySharedMemory::from_file_windows was
created with PAGE_READONLY + FILE_MAP_READ. That matches the name
('ReadonlySharedMemory') but not the semantics the caller needs: a
sandbox loaded from a snapshot still has to be a writable view of
the guest's memory from the host's perspective, so WHP/MSHV can
service copy-on-write faults the guest takes on first write.
A read-only mapping triggers an access violation on the host thread
the moment the guest touches any page, before the VMM can vector
the fault into the in-kernel CoW path.
Switch to PAGE_WRITECOPY + FILE_MAP_COPY — the Windows equivalent
of Linux's mmap(MAP_PRIVATE) that Linux's from_file path already
uses. Reads still come from the backing file; writes transparently
allocate private copy-on-write pages.
Follow-up to hyperlight-dev#1373; depends on that PR
landing first.
Signed-off-by: danbugs <danilochiarlone@gmail.com>
andreiltd
left a comment
Looks good! I think the biggest issue that has to be addressed here is to NOT trust the header values parsed from the binary, and to validate them before transforming them into the layout.
I would also recommend using winnow to improve mechanics and safety of parsing.
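To illustrate the validation point above, here is a minimal sketch using only the standard library (not winnow). Every name in it (RawLayoutFields, ValidatedLayout, validate_layout, MAX_REGION_SIZE) is hypothetical and not taken from the PR, and the specific limits are assumptions chosen for illustration. The idea is only that raw header values become a distinct validated type before any layout is built from them.

```rust
use std::convert::TryFrom;

/// Hypothetical header fields exactly as read from the (untrusted) file.
#[derive(Debug, Clone, Copy)]
pub struct RawLayoutFields {
    pub input_data_size: u64,
    pub output_data_size: u64,
    pub heap_size: u64,
    pub scratch_size: u64,
    pub blob_len: u64,
}

/// Sizes that passed validation; only this type should feed layout code.
#[derive(Debug, PartialEq, Eq)]
pub struct ValidatedLayout {
    pub input_data_size: usize,
    pub output_data_size: usize,
    pub heap_size: usize,
    pub scratch_size: usize,
    pub blob_len: usize,
}

const PAGE_SIZE: u64 = 4096;
/// Illustrative per-region cap (an assumption, not a Hyperlight constant).
const MAX_REGION_SIZE: u64 = 1 << 32;

/// Rejects zero or oversized values and anything that cannot fit in usize.
fn checked_size(name: &str, value: u64) -> Result<usize, String> {
    if value == 0 || value > MAX_REGION_SIZE {
        return Err(format!("{} out of range: {}", name, value));
    }
    usize::try_from(value).map_err(|_| format!("{} does not fit in usize", name))
}

/// Validates every raw field against the actual file length before use.
pub fn validate_layout(raw: &RawLayoutFields, file_len: u64) -> Result<ValidatedLayout, String> {
    if raw.blob_len % PAGE_SIZE != 0 {
        return Err(format!("blob_len {} is not page-aligned", raw.blob_len));
    }
    if raw.blob_len > file_len {
        return Err(format!("blob_len {} exceeds file length {}", raw.blob_len, file_len));
    }
    Ok(ValidatedLayout {
        input_data_size: checked_size("input_data_size", raw.input_data_size)?,
        output_data_size: checked_size("output_data_size", raw.output_data_size)?,
        heap_size: checked_size("heap_size", raw.heap_size)?,
        scratch_size: checked_size("scratch_size", raw.scratch_size)?,
        blob_len: checked_size("blob_len", raw.blob_len)?,
    })
}
```

Keeping the raw and validated types separate means code that builds the layout cannot accidentally be handed unchecked values.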
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
Replace the racy 'inner continue, outer continue, quit' pattern with 'detach, quit' inside the breakpoint commands. After the previous inner continue, the inferior could exit and the gdb stub could close the remote before gdb dispatched the outer continue, producing 'Remote connection closed' and a non-zero exit. The new shape lets the host run the guest call to completion on its own after detach, with no pending remote work in gdb. Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
simongdavies
left a comment
This is great work @ludfjig !!
I've left a couple of bits of feedback but my biggest comments are:
Discovery
There should be a way for a host to inspect an hls file, so it can discover what host functions it requires, what hypervisor it needs, etc.
Something like
let snapshot_info = Snapshot::inspect_file("some.hls");
Process Wide Snapshot Cache
We should do the validation of the snapshot once and cache it (and remove the unchecked load). This way we pay the price for the validation once per process, this is likely the happy path for usage and should give the advantages of unchecked whilst retaining integrity checks.
static SNAPSHOT_CACHE: OnceLock<Mutex<HashMap<[u8; 32], Weak<Snapshot>>>> = OnceLock::new();
pub fn from_file_cached(
    path: impl AsRef<std::path::Path>,
) -> crate::Result<Arc<Self>> {
    let cache = SNAPSHOT_CACHE.get_or_init(|| Mutex::new(HashMap::new()));
    // Read just enough to extract the blob_hash (cache key).
    // This is ~1.2 KiB: one syscall, no mmap.
    let declared_hash = Self::read_blob_hash(path.as_ref())?;
    // Fast path: cache hit.
    {
        let map = cache.lock().map_err(|_| {
            crate::new_error!("snapshot cache mutex poisoned")
        })?;
        if let Some(weak) = map.get(&declared_hash) {
            if let Some(arc) = weak.upgrade() {
                return Ok(arc);
            }
        }
    }
    // Slow path: full load with hash verification.
    let snapshot = Arc::new(Self::from_file(path)?);
    // Double-check: another thread may have loaded the same
    // file while we were doing the full load. Prefer theirs
    // to avoid duplicate mmaps.
    {
        let mut map = cache.lock().map_err(|_| {
            crate::new_error!("snapshot cache mutex poisoned")
        })?;
        if let Some(weak) = map.get(&snapshot.hash) {
            if let Some(existing) = weak.upgrade() {
                // Someone else loaded it first: use theirs,
                // our load is dropped.
                return Ok(existing);
            }
        }
        map.insert(snapshot.hash, Arc::downgrade(&snapshot));
        // Opportunistic cleanup of expired entries.
        map.retain(|_, v| v.strong_count() > 0);
    }
    Ok(snapshot)
}
Last thing, can we break this up into smaller PRs please, will make it easier to do a more thorough review of the changes. Thanks!
// without rewriting the header invalidates the always-checked
// header hash.
let blob_hash: [u8; 32] = blake3::hash(self.memory.as_slice()).into();
let mut hasher = blake3::Hasher::new();
The comment above says that blob_hash is part of header_hash, but it doesn't seem to be included in the data hashed for header_hash?
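For reference, a minimal sketch of the chaining the code comment describes: the blob hash is computed first and its bytes are then fed into the header hash. To stay dependency-free this uses FNV-1a as a stand-in for blake3, which is NOT cryptographically secure. The function and struct names are hypothetical, and the real header covers more fields than shown here.

```rust
/// FNV-1a over several byte chunks. Stand-in for the real hash (blake3 in
/// the PR); chosen only so the sketch needs no external crates.
fn fnv1a(chunks: &[&[u8]]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for chunk in chunks {
        for &b in chunk.iter() {
            h ^= u64::from(b);
            h = h.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }
    h
}

/// Hypothetical pair of digests stored in the file header.
#[derive(Debug, PartialEq, Eq)]
pub struct Hashes {
    pub blob_hash: u64,
    pub header_hash: u64,
}

/// Computes blob_hash, then includes its bytes in the data that
/// header_hash covers, so swapping the blob changes header_hash too.
pub fn compute_hashes(header_fields: &[u8], blob: &[u8]) -> Hashes {
    let blob_hash = fnv1a(&[blob]);
    let blob_hash_bytes = blob_hash.to_le_bytes();
    let header_hash = fnv1a(&[header_fields, &blob_hash_bytes[..]]);
    Hashes { blob_hash, header_hash }
}
```

If the blob hash bytes are left out of the second call, a load that skips blob verification would accept a swapped blob, which is the concern raised here.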
// the guest resumes in the correct CPU state.
#[cfg(not(feature = "i686-guest"))]
if matches!(snapshot.entrypoint(), super::snapshot::NextAction::Call(_)) {
    let sregs = snapshot.sregs().cloned().unwrap_or_else(|| {
Is this correct? Shouldn't it be an error if there are no sregs when the entrypoint is NextAction::Call, since the VM should already be initialised and the sregs should have been captured?
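A rough sketch of the stricter behaviour suggested here, with simplified stand-in types. The enum payloads, the Sregs struct, and the sregs_for function are assumptions for illustration and do not mirror the real definitions.

```rust
/// Simplified stand-in for the snapshot entrypoint kind.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NextAction {
    Initialise(u64),
    Call(u64),
}

/// Simplified stand-in for the captured special registers.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Sregs {
    pub cr3: u64,
}

/// Returns the sregs to apply. A Call snapshot without captured sregs is
/// treated as an error rather than silently falling back to defaults.
pub fn sregs_for(action: NextAction, captured: Option<&Sregs>) -> Result<Sregs, String> {
    match (action, captured) {
        (NextAction::Call(_), None) => {
            Err("snapshot resumes via Call but has no captured sregs".to_string())
        }
        (_, Some(s)) => Ok(s.clone()),
        // An Initialise snapshot has not run guest init yet, so defaults apply.
        (NextAction::Initialise(_), None) => Ok(Sregs::default()),
    }
}
```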
/// and may allow conversion between versions), an ABI mismatch means
/// the memory blob is incompatible and the snapshot must be
/// regenerated from the guest binary.
const SNAPSHOT_ABI_VERSION: u32 = 1;
I think it's a bit fragile to hope that someone will increment this if there are any breaking changes. I think a combination of tests and const asserts would help massively; we should be able to get pretty much full coverage such that the asserts/tests fail, forcing the developer to update the tests/asserts/ABI version.
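As a sketch of what such const asserts could look like: HeaderSketch below is a made-up struct, not the real header or any in-guest type, and the expected numbers apply only to it. The point is that a layout change stops the build until someone revisits the assertion and, with it, the version constant.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical #[repr(C)] struct standing in for a layout-sensitive type.
#[repr(C)]
pub struct HeaderSketch {
    pub abi_version: u32,
    pub arch: u32,
    pub blob_len: u64,
    pub blob_hash: [u8; 32],
}

pub const SNAPSHOT_ABI_VERSION: u32 = 1;

// Compile-time layout checks: 4 + 4 + 8 + 32 bytes, 8-byte aligned.
// Changing the struct breaks the build here, prompting a version review.
const _: () = assert!(size_of::<HeaderSketch>() == 48);
const _: () = assert!(align_of::<HeaderSketch>() == 8);
```

A unit test that pins a known-good serialized byte string against the current version number would complement these checks for changes that keep the size the same.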
let seg = |s: RawSegmentRegister| CommonSegmentRegister {
    base: s.base,
    limit: s.limit as u32,
    selector: s.selector as u16,
Should we be checking the length of these before truncating (or storing the correct sizes)? This is done for init_data_permissions on line 448.
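A small sketch of the checked alternative to `as` casts. The struct names and the assumption that the raw fields are stored as u64 are illustrative only; the real RawSegmentRegister may use different widths.

```rust
use std::convert::TryFrom;

/// Hypothetical on-disk form, assuming every field is stored as u64.
#[derive(Debug, Clone, Copy)]
pub struct RawSegment {
    pub base: u64,
    pub limit: u64,
    pub selector: u64,
}

/// Hypothetical in-memory form with the narrower architectural widths.
#[derive(Debug, PartialEq, Eq)]
pub struct Segment {
    pub base: u64,
    pub limit: u32,
    pub selector: u16,
}

/// Narrows the raw fields, returning an error instead of silently
/// truncating when a value does not fit the target width.
pub fn narrow_segment(raw: RawSegment) -> Result<Segment, String> {
    let limit = u32::try_from(raw.limit)
        .map_err(|_| format!("segment limit {:#x} exceeds u32", raw.limit))?;
    let selector = u16::try_from(raw.selector)
        .map_err(|_| format!("segment selector {:#x} exceeds u16", raw.selector))?;
    Ok(Segment { base: raw.base, limit, selector })
}
```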
I agree we should let users inspect a snapshot, and maybe even provide a caching mechanism like you propose (although at first glance my impression is that we could let users do this themselves), but maybe those could be follow-ups, as there's definitely some more design churn required for them?
I have 15 pretty atomic and separate commits that should aid review. Do you feel strongly about separate PRs still? More than half of the entire diff is just tests too.
I actually think both these things are pretty important to the fundamental use case for this feature. As a user, without inspect there is no way for me to know what host_funcs I should be providing. If I load a snapshot and don't provide the host functions it expects, there is an error message telling me that, but with the inspect API I can detect this and not even try to load the file. More so with the hypervisor tag: it's trial and error which file to load, whereas with this API a host could enumerate available snapshots and only attempt to load ones which should work.
The caching mechanism, again, is I think something we should offer as a core part of the API:
- It allows us to get rid of the unchecked version of the API. We shouldn't have this API; we should verify and validate snapshots always, anything else is just opening a path to hard-to-debug/diagnose issues. It seems the only reason we have this API is to avoid the cost of the hash verification; having the cache allows us to amortize the cost of the verification across every sandbox created with that snapshot file across the life of the process.
- This is the primary use case for snapshot persistence.
- It uses fewer resources (most of this arguably is trivial but everything helps; for example, if we have N sandboxes created from one snapshot file it uses 1 x SnapshotSize virtual memory vs N x SnapshotSize virtual memory).
- The cache also helps in some of the snapshot file modification scenarios. For example, if the snapshot file was modified on disk after the first load (assuming headers were still intact), the subsequent snapshot loads would be consistent.
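To make the inspect use case concrete, here is a sketch of the check a host could run before attempting a load. It assumes, hypothetically, that an inspect call yields the required host functions as (name, signature) string pairs. No such API exists in the PR as described, and the function name below is made up.

```rust
use std::collections::HashMap;

/// Returns the names of host functions the snapshot requires that the host
/// either does not provide or provides with a different signature.
/// Signatures are modelled as plain strings purely for illustration.
pub fn unsatisfied_host_functions(
    required: &[(String, String)],
    provided: &HashMap<String, String>,
) -> Vec<String> {
    let mut missing = Vec::new();
    for (name, signature) in required {
        // The host set must be a superset by name AND signature.
        if provided.get(name) != Some(signature) {
            missing.push(name.clone());
        }
    }
    missing
}
```

A host enumerating several snapshot files could skip any file for which this returns a non-empty list, instead of discovering the mismatch from a load error.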
Agreed, the commits make it easier, but as a reviewer I have to review the entire PR in one session; splitting it up means I can do it in smaller, digestible chunks. I think this would be a great test of GH PR stacks. I just asked for access to the preview; if we get it we should try using it on this PR, I think.
persist sandbox snapshots to disk
Adds Snapshot::to_file / Snapshot::from_file and MultiUseSandbox::from_snapshot, so a sandbox can be reconstructed from a saved snapshot without going through UninitializedSandbox::evolve(). Works in-process and across processes.
public API
HostFunctions must be a superset (by name and signature) of those registered when the snapshot was taken. The optional SandboxConfiguration overrides runtime fields (interrupt knobs, guest_core_dump, guest_debug_info). Layout fields (input_data_size, output_data_size, heap_size, scratch_size) are always taken from the snapshot.
from_file_unchecked is provided for trusted environments. It still verifies the header hash but skips the memory blob hash, making large snapshots load in constant time.
file format
All header structs are #[repr(C)] POD types deriving bytemuck::Pod + Zeroable, so the byte layout is whatever those structs declare and there are no separate offsets to keep in sync.
The trailing PAGE_SIZE padding exists because Windows read-only file mappings cannot extend beyond the file's actual size, so the file must contain bytes for the trailing guard. Linux ignores the padding (its guard pages come from an anonymous mmap reservation).
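A sketch of the size arithmetic this implies, under the assumptions that the page size is 4096 bytes, that the memory blob starts at the first page boundary after the header, and that nothing else sits between them. The real format has more sections, so treat the function names and the layout as illustrative only.

```rust
pub const PAGE_SIZE: u64 = 4096;

/// Rounds n up to the next multiple of PAGE_SIZE, or None on overflow.
pub fn align_up(n: u64) -> Option<u64> {
    n.checked_add(PAGE_SIZE - 1).map(|v| v & !(PAGE_SIZE - 1))
}

/// Total file length for a header of header_len bytes followed by a
/// page-aligned blob and one trailing page of guard padding.
pub fn file_len(header_len: u64, blob_len: u64) -> Option<u64> {
    let blob_offset = align_up(header_len)?;
    blob_offset.checked_add(blob_len)?.checked_add(PAGE_SIZE)
}
```

With these assumptions, a 100-byte header and an 8192-byte blob give a 16384-byte file: one page for the header, two for the blob, and one for the trailing guard.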
versioning and portability tags
- RawPreamble.format_version. Wire format of the file. Bumped when header byte layout or section ordering changes. May be convertible across versions.
- RawHeaderV1.abi_version. ABI of the memory blob contents. Bumped independently when in-guest data layouts change. A mismatch means the snapshot must be regenerated from the guest binary.
- arch tag distinguishes guest arch (x86_64, aarch64, i686), so an i686-guest snapshot cannot be loaded into an amd64-guest build.
- hypervisor tag distinguishes KVM, MSHV, and WHP. Segment register hidden-cache fields (unusable, type_, granularity, db) differ between hypervisors for the same architectural state, so cross-hypervisor loads are rejected by default.
what is and is not persisted
Persisted: the snapshot region (guest code, PEB, heap, init data, page tables), all sregs, and the names + signatures of host functions registered at snapshot time.
Not persisted:
- sandbox_id. Process-local counter, fresh ID assigned on load (see "sandbox identity and restore" below).
- LoadInfo. Debug-only, reconstructible from ELF if needed (see "known limitations").
- regions. Always empty after snapshot construction (mapped-region contents are absorbed into the memory blob, see "known limitations").
The scratch region is recreated fresh on load and re-initialised by copying page tables from snapshot to scratch and writing I/O buffer metadata.
The vCPU special registers are persisted because the guest init code installs a GDT, IDT, TSS, and segment descriptors that differ from the standard 64-bit defaults.
A header_hash over preamble || header || sregs || host_funcs is always verified, even by from_file_unchecked. A separate blob_hash over the memory blob is verified by from_file and skipped by from_file_unchecked. Because blob_hash is itself among the bytes covered by header_hash, swapping a memory blob without rewriting the header invalidates the always-checked header hash.
gdb / crashdump
The guest_debug_info and guest_core_dump fields of SandboxConfiguration are honored by from_snapshot, so gdb and core dumps work after loading from disk. HyperlightVm::new installs a one-shot entry breakpoint for both Initialise and Call snapshots so the gdb stub event loop enters on the first vCPU run regardless of how the sandbox was constructed. The breakpoint is removed on first hit by the run loop.
sandbox identity and restore
MultiUseSandbox::restore requires the supplied snapshot to share the sandbox's sandbox_id so it can reuse the underlying memory layout safely. Ids are process-local atomic counters and are not persisted to disk. Every call to Snapshot::from_file assigns a fresh id, and from_snapshot copies that id onto the resulting sandbox.
So sandboxes built from clones of the same in-memory Arc<Snapshot> are mutually restore-compatible, while sandboxes from independent from_file calls (even of the same path) are not.
performance (Linux/KVM)
End-to-end wall-clock from zero state to a completed Echo guest call.
known limitations
- A from_snapshot sandbox lacks binary_path and AT_ENTRY for Call snapshots, and mem_profile lacks accurate traces. The file format would need extending to fix these.
- max_guest_log_level is not plumbed through from_snapshot. It is also intrinsically ineffective for Call snapshots; this should be addressed orthogonally to this PR.
- guest-counter does not work on from_snapshot-built sandboxes.
- Mapped-region contents are absorbed into the memory blob (snapshot.regions() is empty after load).
- to_file overwrites the target path non-atomically. A crash mid-write can leave a partially written file. Concurrent writers to the same path are not serialised.
future work
- Typed error variants.
- Fuzz target for from_file.
- CoW overlay layers.
- Cross-hypervisor portability via sregs normalisation.
- Huge page support (MAP_HUGETLB).
- Atomic to_file via temp + rename + fsync.
- OCI distribution.
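For the atomic to_file item, a minimal standard-library sketch of the temp + fsync + rename pattern. The function name and the fixed ".tmp" suffix are assumptions. A fixed suffix does not protect concurrent writers from each other, and syncing the parent directory after the rename is omitted here.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Writes bytes to a sibling temp file, flushes it to disk, then renames it
/// over the target so readers see either the old or the new file in full.
pub fn write_atomically(path: &Path, bytes: &[u8]) -> io::Result<()> {
    // Build "<path>.tmp" next to the target so the rename stays on one filesystem.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = PathBuf::from(tmp_name);
    {
        let mut file = File::create(&tmp)?;
        file.write_all(bytes)?;
        // Flush data and metadata before the rename makes the file visible.
        file.sync_all()?;
    }
    fs::rename(&tmp, path)
}
```

On POSIX systems the rename replaces an existing target atomically, so a crash before the rename leaves the old file intact; the stale temp file would need separate cleanup.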