Skip to content

Conversation

@ludfjig
Copy link
Contributor

@ludfjig ludfjig commented Dec 16, 2025

MSRs will be added in another PR.

  • Removed padding fields from fpu register to simplify comparisons. Not sure why I added them to begin with, we don't need C repr on these. It could possibly allow more efficient into() implementation for kvm/mshv due to single memcyp, but that seems like premature optimization to me
  • Reset vcpu state on snapshot::restore
  • Added bunch of tests
  • MSRs are not included in this PR. Will add MSRs in future PR

Addresses #791 partially

@ludfjig ludfjig added the kind/enhancement For PRs adding features, improving functionality, docs, tests, etc. label Dec 16, 2025
@ludfjig ludfjig force-pushed the reset_vcpu branch 3 times, most recently from 9cfcc9c to 8507c7a Compare December 16, 2025 19:52
xsave
}

fn hyperlight_vm(code: &[u8]) -> Result<HyperlightVm> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this duplicating alot of the configuration code we do during VM setup in the test? how will we maintain this with the actual setup? What if there is a difference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is calling HyperlightVm::new so they should be the same

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am concerned about the logic setting up the exclusive memory, stack cookie, etc. It seems like alot of the same in set_up_hypervisor_partition but simplier and having that not be the same might cause difference in configuration that might not be caught. This seems like a good case of doing integration testing at the host API and guest api layer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it does indeed do a lot of setup... But that's more related to memory, and whether it's different or not shouldn't affect these tests which exercises vcpu state tests (I think)

Copy link
Contributor Author

@ludfjig ludfjig Jan 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After rebasing on top of main branch with changes that changed snapshotting, it got even worse. But I still believe it's fine, because memory-specifics is irrelevant to these tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love this but I guess it might be fine. We might be able to add an abstraction that "preps the vm" but I haven't really thought this through fully.

@ludfjig ludfjig force-pushed the reset_vcpu branch 11 times, most recently from d09b1fc to 18d4ff9 Compare December 17, 2025 06:46
@ludfjig ludfjig requested a review from Copilot December 17, 2025 06:53

This comment was marked as outdated.

@ludfjig ludfjig marked this pull request as ready for review December 17, 2025 07:26
@ludfjig ludfjig force-pushed the reset_vcpu branch 4 times, most recently from 180efe7 to 6cf483b Compare January 12, 2026 23:35
Copy link
Contributor

@jsturtevant jsturtevant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great to see. Great work figuring out the differences between the hypervisors and paying attention and documenting the differences. I learned a bunch.

My main concern is around the readability and maintenance of the tests. I've left a few suggestions on how we could maybe clean them up to be more concise and easier to reason about

@ludfjig ludfjig force-pushed the reset_vcpu branch 7 times, most recently from 1e8d95a to 71d6a38 Compare January 22, 2026 00:09
- build-guests
strategy:
fail-fast: true
fail-fast: false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this purposeful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no it just helps me debugging, will remove this before merging 👍

@ludfjig ludfjig force-pushed the reset_vcpu branch 2 times, most recently from 2ccb53d to 2091ef4 Compare January 26, 2026 20:59
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
Copy link
Contributor

@dblnz dblnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work! I left some comments below.

Comment on lines +880 to +885
// Resets the following vCPU state:
// - General purpose registers
// - Debug registers
// - XSAVE (includes FPU/SSE state with proper FCW and MXCSR defaults)
// - Special registers (with saved PML4 if feature enabled)
// TODO: check if other state needs to be reset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit

Suggested change
// Resets the following vCPU state:
// - General purpose registers
// - Debug registers
// - XSAVE (includes FPU/SSE state with proper FCW and MXCSR defaults)
// - Special registers (with saved PML4 if feature enabled)
// TODO: check if other state needs to be reset
/// Resets the following vCPU state:
/// - General purpose registers
/// - Debug registers
/// - XSAVE (includes FPU/SSE state with proper FCW and MXCSR defaults)
/// - Special registers (with saved PML4 if feature enabled)
/// TODO: check if other state needs to be reset

}

// if this fails, then the sandbox will not have been restored so keep it poisoned (if it is)
self.vm.reset_vcpu().map_err(HyperlightVmError::Restore)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the sandbox is not poisoned (good state), and it fails here?
Won't it be partially reset?
I mean, 15 lines above we run the restore_snapshot method, which restores the memory, so if this fails, it will remain not poisoned, with the memory reset, but not the registers, right?


/// The hypervisor types available for the current platform
#[derive(PartialEq, Eq, Debug)]
#[derive(PartialEq, Eq, Debug, Copy, Clone)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated question: Why is the HypervisorType defined twice?
Once here, the other in sandbox/hypervisor.rs.

let mut buf = XSave::default(); // default is zeroed 4KB buffer

// Copy XCOMP_BV (offset 520-527) - preserves feature mask + compacted bit
buf.buffer[520..528].copy_from_slice(&current_xsave.buffer[520..528]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to preserve this in kvm as well?

// - Bytes 0-1: FCW (x87 FPU Control Word)
// - Bytes 24-27: MXCSR
// - Bytes 512-519: XSTATE_BV (bitmap of valid state components)
xsave.region[0] = FP_CONTROL_WORD_DEFAULT as u32;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is writing 32 bits to a 16 bit field. We only write the 16 bits in mshv. Is this intended?

}

fn reset_xsave(&self) -> std::result::Result<(), RegisterError> {
const XSAVE_MIN_SIZE: usize = 576; // 512 legacy + 64 header
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the same constant as in whp, consider sharing it?

}

let provided_size = std::mem::size_of_val(xsave) as u32;
if buffer_size_needed > provided_size {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in kvm and mshv we do an explicit check for std::mem::size_of_val(xsave) != KVM_XSAVE_SIZE_BYTES and here we don't. it means we might provide a buffer to overwrite values and only set certain amount. this is just test code but could help with strange bugs where we think we are doing something different.

// Check FPU is reset to defaults
assert_fpu_reset(hyperlight_vm.vm.as_ref());

// Verify MXCSR via xsave on KVM
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might just leave a comment here, that using fpu to check this doesn't work on kvm and is why we do the extra read via xsave

hyperlight_vm.reset_vcpu().unwrap();

// Check registers are reset to defaults
assert_regs_reset(hyperlight_vm.vm.as_ref());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is much cleaner thanks!

Copy link
Contributor

@jsturtevant jsturtevant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good. A few comments to address otherwise I like it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/enhancement For PRs adding features, improving functionality, docs, tests, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants