igvm: Add CPUID 0xD sub-functions 11 and 12 for CET xstate#60

Open
souradeep100 wants to merge 1 commit into microsoft:main from souradeep100:main

@souradeep100 souradeep100 commented Mar 31, 2026

The SNP CPUID table template only includes CPUID 0xD (Extended State Enumeration) sub-functions 0 through 8. On hosts that support CET (Control-flow Enforcement Technology), CPUID 0xD:1 advertises CET_U (bit 11) and CET_S (bit 12) in the XSS supervisor state mask (ECX).

When the guest kernel processes CPUID 0xD:1 and finds CET xstate bits enabled, it calls snp_cpuid_calc_xsave_size() to compute the total xsave area size by iterating over all enabled xfeature sub-functions. Since sub-functions 11 and 12 are missing from the SNP CPUID table, the lookup fails with xfeatures_found != xfeatures_en, returning 0. This causes the kernel to call sev_es_terminate(), which sends GHCB_MSR_TERM_REQ (VMGEXIT 0x100) and crashes the VM.

Add sub-functions 11 (CET_U) and 12 (CET_S) to the CPUID 0xD entries in the SNP CPUID table template. The PSP firmware populates the actual EAX/EBX/ECX/EDX values at launch time from hardware. On hosts without CET support, these entries are harmlessly zeroed out by the PSP.
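
The failure mode described above can be modeled in a few lines. This is a simplified Python sketch of the size calculation, not the kernel source: `CpuidEntry` and `calc_xsave_size` are names made up for illustration, and the AVX entry (size 256, offset 576) and the zero offsets for the CET entries are illustrative values. It assumes the kernel skips bits 0 and 1 (x87/SSE, which live in the legacy area) when checking that every enabled feature was found.

```python
from typing import NamedTuple


class CpuidEntry(NamedTuple):
    """Hypothetical stand-in for one row of the SNP CPUID table."""
    eax_in: int  # CPUID leaf
    ecx_in: int  # CPUID sub-function
    eax: int     # for 0xD sub-functions >= 2: size of the component in bytes
    ebx: int     # for 0xD sub-functions >= 2: offset in the standard format


XSAVE_LEGACY_AND_HEADER = 0x240  # 512-byte legacy area + 64-byte header


def calc_xsave_size(table, xfeatures_en, compacted):
    """Return the xsave area size, or 0 if an enabled feature has no entry."""
    found = 0
    size = XSAVE_LEGACY_AND_HEADER
    for e in table:
        if e.eax_in != 0xD or not (1 < e.ecx_in < 64):
            continue
        bit = 1 << e.ecx_in
        if not (xfeatures_en & bit) or (found & bit):
            continue
        found |= bit
        if compacted:
            size += e.eax
        else:
            size = max(size, e.eax + e.ebx)
    wanted = xfeatures_en & ~0b11  # bits 0 and 1 have no sub-function entry
    return size if found == wanted else 0


# x87 | SSE | AVX in XCR0, plus CET_U | CET_S in XSS.
xfeatures_en = 0x7 | 0x1800

table_before = [CpuidEntry(0xD, 2, 256, 576)]
table_after = table_before + [
    CpuidEntry(0xD, 11, 16, 0),  # CET_U
    CpuidEntry(0xD, 12, 24, 0),  # CET_S
]

print(calc_xsave_size(table_before, xfeatures_en, compacted=True))  # 0
print(calc_xsave_size(table_after, xfeatures_en, compacted=True))   # 872
```

With the entries for sub-functions 11 and 12 absent, the found mask never matches the enabled mask, so the function returns 0, which is the condition that leads to termination in the guest.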

Testing:

  Reproducer (on Azure DC16as_cc_v5 with CET, /dev/mshv):

  igvmgen -kernel bzImage \
    -append "console=ttyS0 root=/dev/vda1 rw" \
    -boot_mode x64 -vtl 0 -svme 1 -encrypted_page 1 \
    -pvalidate_opt 1 -o output.bin

  cloud-hypervisor --cpus boot=1,nested=off --memory size=512M \
    --disk path=osdisk.img path=cloudinit \
    --igvm output.bin --platform sev_snp=on -v

  Before fix (with Cloud Hypervisor GHCB CPUID handler patched):

    Guest kernel panics during xsave size computation:
    x86/fpu: misordered xstate at 576
    sev_es_terminate() -> VMGEXIT 0x100 (GHCB_MSR_TERM_REQ)

  After fix: VM boots successfully to login prompt.

Signed-off-by: Souradeep Chakrabarti <schakrabarti@microsoft.com>

@weltling
Member

@KenGordon ping for a review.

Thanks

@KenGordon
Contributor

Can you explain which generation has these bits, maybe linking to some AMD docs, and confirm that you have tested this on Milan and Genoa please?

@souradeep100
Author

souradeep100 commented Apr 3, 2026

The AMD APM (https://docs.amd.com/v/u/en-US/40332_4.09_APM_PUB) documents the XSS bits CET_U (bit 11) and CET_S (bit 12).

  • Milan (Zen 3): Does not enumerate these bits. The fix is harmless on Milan: the PSP fills zeros for sub-functions 11/12 since the hardware doesn't report them.
  • Genoa (Zen 4): Does enumerate CET_U/CET_S. This is where the bug manifests: the host reports XSS = 0x1800 (bits 11+12 set), but the IGVM CPUID table was missing the corresponding 0xD:11 and 0xD:12 entries, causing the snp_cpuid_calc_xsave_size() mismatch and VMGEXIT 0x100 termination.

Our Azure host (AMD EPYC 7763, actually a Milan part, but the Azure CVM SKU exposes the CET bits via the hypervisor) showed:

  • CPUID 0xD:1 ECX = 0x1800 → XSS advertises CET_U (bit 11) + CET_S (bit 12)
  • CPUID 0xD:11 → CET_U: size=16, supervisor state
  • CPUID 0xD:12 → CET_S: size=24, supervisor state

The igvmgen table was only going up to sub-function 8, so 11 and 12 were missing — causing the snp_cpuid_calc_xsave_size() mismatch and the VMGEXIT 0x100 crash.
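
As a quick cross-check of the numbers above, the mask can be decoded and compared against the sub-functions the template covered. This is a minimal sketch; `enabled_xfeatures` and `missing_subfunctions` are hypothetical helper names and are not part of igvmgen.

```python
def enabled_xfeatures(mask: int) -> list[int]:
    """Return the bit positions set in an XCR0/XSS-style feature mask."""
    return [bit for bit in range(64) if mask & (1 << bit)]


def missing_subfunctions(mask: int, template_subfunctions: set[int]) -> list[int]:
    """Return enabled feature bits that have no 0xD sub-function in the template."""
    return [b for b in enabled_xfeatures(mask) if b not in template_subfunctions]


xss_mask = 0x1800                # CPUID 0xD:1 ECX as reported on the host
old_template = set(range(0, 9))  # sub-functions 0 through 8

print(enabled_xfeatures(xss_mask))                           # [11, 12]
print(missing_subfunctions(xss_mask, old_template))          # [11, 12]
print(missing_subfunctions(xss_mask, old_template | {11, 12}))  # []
```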

@souradeep100
Author

souradeep100 commented Apr 6, 2026

@KenGordon ping for review
