Skip to content

Conversation

@syntactically
Copy link
Member

This combines a few commits from the original guest-CoW PR, all of which involve properly CoW'ing the page tables into the scratch region, dealing with this in snapshots, etc.

Previously, this was inconsistent.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
@syntactically syntactically added kind/enhancement For PRs adding features, improving functionality, docs, tests, etc. Guest-COW PRs that form part of the Guest-COW change labels Jan 28, 2026
Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
whp can now map any region as long as it contains a file
handle (which, presently, only snapshot and scratch regions do---not
user-provided regions).

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
This fixes an error introduced by a merge conflict in a previous PR.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
@syntactically syntactically force-pushed the lm/pt-scratch branch 2 times, most recently from 1b10f54 to 22173e0 Compare January 28, 2026 06:53
Previously, in 7ca0a78, MAP_SHARED was replaced with MAP_PRIVATE,
primarily to bring the Miri and non-Miri code paths closer together,
since it did not appear to have any detrimental effect.  However, this
seems to cause issues with the scratch mapping on mshv on Linux. It is
not entirely clear what mshv is doing here; possibly something like
"mapping a page which has never been written to" causes the problem.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
As per the comments preexisting in
src/hyperlight_common/src/layout.rs. This avoids some fairly
cumbersome proliferation of #[cfg(feature = "init-paging")], and may
be developed further in the future.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
Instead of updating page tables in place, and using the heap region
memory for new page tables, this changes the hyperlight_guest virtual
memory subsystem to treat snapshot page tables as copy-on-write and to
allocate all new page tables in the scratch region.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
These were already being CoW'd.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
Now that some important things are being written to the scratch
region, this makes the snapshot process aware of it. In particular,
rather than just copying the current snapshot region into a new
snapshot, the snapshot process now traverses the guest page tables and
picks all pages that are mapped into guest virtual memory, placing
them into the new snapshot along with new page tables which account
for all physical memory having moved to the snapshot region.

Several tests are temporarily disabled in this commit. One,
`snapshot_restore_handles_remapping_correctly`, will be re-enabled in
the next commit; its failure is due to an issue with state in the
guest page mapping process not being reset on snapshot. Several
others, related to the stack guard page, are also disabled, because
the snapshot process loses the context, kept in the host, of which
physical page is the stack guard page.  These will be re-enabled in an
upcoming patch series, which moves stack management fully into the
guest.

This commit also disables an optimisation around restoring to a
just-taken snapshot in certain cases; see the comment at the head of
`MultiUseSandbox::restore` for details.  Note that the optimisation of
taking a new snapshot immediately after a restore remains intact.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
Previously, the guest tried to infer the addresses of the snapshot
page tables from the values of cr3 that it saw at various times, but
this was brittle and irritating to do without writing to the wrong
address at times.  This replaces that mechanism with an extra word in
the metadata portion of the scratch region that explicitly provides
this information.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
Remove the unused constant `SCRATCH_TOP_USED_OFFSET`.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
Copy link
Member

@andreiltd andreiltd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great improvement! I'm looking forward to removing more special regions from shared memory. Having an extra pair of eyes on the changes would be great also.

@jsturtevant jsturtevant mentioned this pull request Jan 28, 2026
17 tasks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more of a meta comment, but I've always felt this was fragile. Is there any reason we don't just replace all the steps in ci with the just commands? that way they stay in sync automatically.

if gpa >= scratch_base {
(scratch, (gpa - scratch_base) as usize)
} else {
(snap, gpa as usize - SandboxMemoryLayout::BASE_ADDRESS)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes attempt to subtract with overflow in debug mode. I see its handled later so things don't go wrong but would it be better move checks here so this function isn't mis used at a later point in time. Adding a unit test would help validate this as well

}
.filter_map(move |(gva, gpa, bm)| {
// the scratch map doesn't count
if gva > scratch_base_gva(scratch_size) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible that gva could match scratch base? should this be >= like the line below?

#[allow(clippy::useless_conversion)]
let host_region_base: usize = rgn.host_region.start.into();
return Some(unsafe {
std::slice::from_raw_parts((host_region_base + off) as *const u8, PAGE_SIZE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are reading the whole Page size here, is it possible that the off + PAGE_SIZE > region_len?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Guest-COW PRs that form part of the Guest-COW change kind/enhancement For PRs adding features, improving functionality, docs, tests, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants