Reset more vcpu state on snapshot::restore #1120
base: main
Conversation
Force-pushed from 9cfcc9c to 8507c7a
| xsave
| }
|
| fn hyperlight_vm(code: &[u8]) -> Result<HyperlightVm> {
Isn't this duplicating a lot of the configuration code we do during VM setup in the test? How will we maintain this alongside the actual setup? What if there is a difference?
This is calling HyperlightVm::new, so they should be the same.
I am concerned about the logic setting up the exclusive memory, stack cookie, etc. It looks like a lot of the same work as set_up_hypervisor_partition, but simpler, and having the two diverge might cause configuration differences that go uncaught. This seems like a good case for doing integration testing at the host API and guest API layer.
Yeah, it does indeed do a lot of setup... But that's mostly related to memory, and whether it's different or not shouldn't affect these tests, which exercise vCPU state (I think).
After rebasing on top of the main branch, whose changes altered snapshotting, it got even worse. But I still believe it's fine, because the memory specifics are irrelevant to these tests.
I don't love this, but I guess it might be fine. We might be able to add an abstraction that "preps the VM", but I haven't really thought this through fully.
Force-pushed from d09b1fc to 18d4ff9
Force-pushed from 180efe7 to 6cf483b
jsturtevant left a comment
This is great to see. Great work figuring out the differences between the hypervisors and documenting them. I learned a bunch.
My main concern is around the readability and maintenance of the tests. I've left a few suggestions on how we could maybe clean them up to be more concise and easier to reason about.
Force-pushed from 1e8d95a to 71d6a38
| - build-guests
| strategy:
| fail-fast: true
| fail-fast: false
Is this purposeful?
No, it just helps me with debugging; I'll remove this before merging 👍
Force-pushed from 2ccb53d to 2091ef4
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
dblnz left a comment
Good work! I left some comments below.
| // Resets the following vCPU state:
| // - General purpose registers
| // - Debug registers
| // - XSAVE (includes FPU/SSE state with proper FCW and MXCSR defaults)
| // - Special registers (with saved PML4 if feature enabled)
| // TODO: check if other state needs to be reset
Nit
Suggested change:
| /// Resets the following vCPU state:
| /// - General purpose registers
| /// - Debug registers
| /// - XSAVE (includes FPU/SSE state with proper FCW and MXCSR defaults)
| /// - Special registers (with saved PML4 if feature enabled)
| /// TODO: check if other state needs to be reset
| }
|
| // if this fails, then the sandbox will not have been restored so keep it poisoned (if it is)
| self.vm.reset_vcpu().map_err(HyperlightVmError::Restore)?;
What happens if the sandbox is not poisoned (good state) and it fails here? Won't it be partially reset?
I mean, 15 lines above we run the restore_snapshot method, which restores the memory, so if this fails, the sandbox will remain unpoisoned with the memory restored but the registers not reset, right?
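For illustration, here is a minimal, self-contained sketch of the scenario being raised; `Sandbox`, its fields, and `RestoreError` are hypothetical stand-ins rather than the crate's real types, and the point is only the ordering of the two steps and one way a failure in the second step could be surfaced:

```rust
#[derive(Debug)]
struct RestoreError(&'static str);

struct Sandbox {
    poisoned: bool,
}

impl Sandbox {
    fn restore_memory(&mut self) -> Result<(), RestoreError> {
        Ok(()) // memory snapshot is applied first
    }

    fn reset_vcpu(&mut self) -> Result<(), RestoreError> {
        Err(RestoreError("vcpu reset failed")) // simulate the failure case
    }

    fn restore(&mut self) -> Result<(), RestoreError> {
        self.restore_memory()?;
        // If the register reset fails after memory has already been rolled
        // back, mark the sandbox poisoned instead of leaving memory restored
        // but registers stale.
        if let Err(e) = self.reset_vcpu() {
            self.poisoned = true;
            return Err(e);
        }
        Ok(())
    }
}
```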
| /// The hypervisor types available for the current platform
| #[derive(PartialEq, Eq, Debug)]
| #[derive(PartialEq, Eq, Debug, Copy, Clone)]
Unrelated question: why is HypervisorType defined twice? Once here, and once in sandbox/hypervisor.rs.
| let mut buf = XSave::default(); // default is zeroed 4KB buffer
|
| // Copy XCOMP_BV (offset 520-527) - preserves feature mask + compacted bit
| buf.buffer[520..528].copy_from_slice(&current_xsave.buffer[520..528]);
Do we need to preserve this in KVM as well?
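As a point of reference, preserving XCOMP_BV on the KVM side could look roughly like the sketch below, assuming the kvm-bindings layout where the 4 KiB XSAVE area is exposed as `region: [u32; 1024]`; the function name is made up for illustration:

```rust
/// Copy XCOMP_BV (bytes 520-527 of the XSAVE area) from the current state
/// into a freshly zeroed reset buffer, keeping the feature-mask bits and the
/// compacted-format bit intact. Illustrative sketch only.
fn preserve_xcomp_bv(current: &[u32; 1024], reset: &mut [u32; 1024]) {
    // Bytes 520..528 correspond to u32 words 130 and 131 (520 / 4 = 130).
    reset[130] = current[130];
    reset[131] = current[131];
}
```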
| // - Bytes 0-1: FCW (x87 FPU Control Word)
| // - Bytes 24-27: MXCSR
| // - Bytes 512-519: XSTATE_BV (bitmap of valid state components)
| xsave.region[0] = FP_CONTROL_WORD_DEFAULT as u32;
This is writing 32 bits to a 16-bit field; in mshv we only write the 16 bits. Is this intended?
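If only the FCW is meant to change, a masked write would leave the adjacent FSW bytes (2-3), which share the same u32 word, untouched. A sketch, assuming the `[u32; 1024]` view of the XSAVE area used in the diff above:

```rust
/// 0x037F is the architectural x87 default control word after FINIT/reset;
/// the crate's own FP_CONTROL_WORD_DEFAULT constant may be defined elsewhere.
const FP_CONTROL_WORD_DEFAULT: u16 = 0x037F;

/// Write only bytes 0-1 (FCW) of the legacy XSAVE region, preserving
/// bytes 2-3 (FSW) that live in the same u32 word. Sketch only.
fn set_default_fcw(region: &mut [u32; 1024]) {
    region[0] = (region[0] & 0xFFFF_0000) | u32::from(FP_CONTROL_WORD_DEFAULT);
}
```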
| }
|
| fn reset_xsave(&self) -> std::result::Result<(), RegisterError> {
| const XSAVE_MIN_SIZE: usize = 576; // 512 legacy + 64 header
This is the same constant as in whp; consider sharing it?
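A sketch of what sharing it might look like; where the constant would live is an assumption:

```rust
// In some module visible to both the whp and mshv backends (path hypothetical):

/// Minimum XSAVE area size: 512-byte legacy region + 64-byte XSAVE header.
pub const XSAVE_MIN_SIZE: usize = 576;
```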
| }
|
| let provided_size = std::mem::size_of_val(xsave) as u32;
| if buffer_size_needed > provided_size {
In kvm and mshv we do an explicit check for std::mem::size_of_val(xsave) != KVM_XSAVE_SIZE_BYTES, but here we don't. That means we might provide a larger buffer and only overwrite part of it. This is just test code, but the stricter check could help avoid strange bugs where we think we are doing one thing and are actually doing another.
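A sketch of the stricter check being suggested, mirroring the kvm/mshv behaviour; the parameter names and the error type are placeholders:

```rust
/// Reject any buffer whose size differs from what the hypervisor reports,
/// rather than only rejecting buffers that are too small. Sketch only.
fn check_xsave_size(buffer_size_needed: usize, provided_size: usize) -> Result<(), String> {
    if provided_size != buffer_size_needed {
        return Err(format!(
            "xsave buffer size mismatch: hypervisor reports {buffer_size_needed} bytes, caller provided {provided_size}"
        ));
    }
    Ok(())
}
```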
| // Check FPU is reset to defaults
| assert_fpu_reset(hyperlight_vm.vm.as_ref());
|
| // Verify MXCSR via xsave on KVM
Might just leave a comment here noting that using the FPU to check this doesn't work on KVM, which is why we do the extra read via xsave.
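For example, the extra read amounts to pulling MXCSR straight out of the raw XSAVE image (bytes 24-27 of the 512-byte legacy region); the helper name below is made up for illustration:

```rust
/// Read MXCSR from a raw XSAVE/FXSAVE image. MXCSR sits at byte offset 24
/// of the legacy region, independent of which hypervisor produced the image.
fn mxcsr_from_xsave(xsave: &[u8]) -> u32 {
    u32::from_le_bytes(xsave[24..28].try_into().expect("xsave image too small"))
}
```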
| hyperlight_vm.reset_vcpu().unwrap();
|
| // Check registers are reset to defaults
| assert_regs_reset(hyperlight_vm.vm.as_ref());
This is much cleaner, thanks!
jsturtevant left a comment
Looking good. A few comments to address, otherwise I like it!
MSRs will be added in another PR.
Considered an into() implementation for kvm/mshv due to the single memcpy, but that seems like premature optimization to me.
Addresses #791 partially.