igvminspect: add tool to extract and dump igvm file content#3033
igvminspect: add tool to extract and dump igvm file content#3033justus-camp-microsoft wants to merge 1 commit intomicrosoft:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds an igvmfilegen diff subcommand to help investigate IGVM reproducibility issues by extracting IGVM contents into structured temp directories and invoking diffoscope on the extracted trees.
Changes:
- Add
Diffsubcommand wiring toigvmfilegenCLI. - Implement IGVM extraction + region naming/coalescing +
diffoscopeinvocation in a newdiffmodule. - Document the new tool/subcommand in the Guide and link it from
SUMMARY.md.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| vm/loader/igvmfilegen/src/main.rs | Adds Diff subcommand to CLI and dispatches to the new diff implementation. |
| vm/loader/igvmfilegen/src/diff.rs | New extraction/coalescing logic and diffoscope runner for comparing IGVM outputs. |
| vm/loader/igvmfilegen/Cargo.toml | Adds tempfile dependency for managing extracted temp directories. |
| Guide/src/dev_guide/dev_tools/igvmfilegen.md | New documentation page describing igvmfilegen diff usage and workflows. |
| Guide/src/SUMMARY.md | Adds the new igvmfilegen doc page to the Guide table of contents. |
| Cargo.lock | Locks tempfile as a dependency due to the new module. |
| let component = lookup_map_name(map, *gpa).unwrap_or("unmapped").to_string(); | ||
| // Deduplicate pages at the same GPA (different compatibility masks | ||
| // produce duplicate PageData entries with identical content). | ||
| if seen_gpas.insert(*gpa) { | ||
| page_data_entries.push(PageDataEntry { |
| /// Parse an IGVM .map file, extracting all layout entries across all isolation sections. | ||
| /// Deduplicates by (start_gpa, end_gpa). | ||
| fn parse_map_file(path: &Path) -> anyhow::Result<Vec<MapEntry>> { | ||
| let content = fs_err::read_to_string(path).context("reading map file")?; | ||
| let mut entries: Vec<MapEntry> = Vec::new(); | ||
| let mut seen: std::collections::HashSet<(u64, u64)> = std::collections::HashSet::new(); | ||
| let mut in_layout = false; | ||
|
|
ca19f4d to
b2e9e62
Compare
b2e9e62 to
d7ae1f0
Compare
There was a problem hiding this comment.
Pull request overview
Introduces a new igvminspect CLI tool under vm/loader/ to inspect IGVM binaries and diff two IGVM outputs (with their .bin.map files) via diffoscope, supporting reproducibility investigations. As part of the split, the legacy dump functionality is removed from igvmfilegen and documented separately in the Guide.
Changes:
- Add new
igvminspectcrate withdumpanddiffsubcommands (diff extracts IGVM parts and runs diffoscope). - Remove the
Dumpsubcommand fromigvmfilegen(moving inspection responsibilities toigvminspect). - Update the Guide SUMMARY and add dev-tool pages for
igvmfilegenandigvminspect.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| vm/loader/igvminspect/src/main.rs | New CLI entrypoint defining dump/diff subcommands and tracing setup. |
| vm/loader/igvminspect/src/diff.rs | Extraction logic + map parsing + diffoscope invocation for IGVM diffs. |
| vm/loader/igvminspect/Cargo.toml | New crate manifest and dependencies. |
| vm/loader/igvmfilegen/src/main.rs | Removes Dump subcommand from igvmfilegen. |
| Guide/src/SUMMARY.md | Adds links for igvmfilegen and new igvminspect pages. |
| Guide/src/dev_guide/dev_tools/igvminspect.md | New documentation for usage and workflows for diffing IGVMs. |
| Guide/src/dev_guide/dev_tools/igvmfilegen.md | New short doc entry for igvmfilegen. |
| Cargo.toml | Adds vm/loader/igvminspect to workspace members. |
| Cargo.lock | Adds igvminspect package entry. |
Comments suppressed due to low confidence (1)
vm/loader/igvmfilegen/src/main.rs:53
- The PR title/description says “igvmfilegen: add diff subcommand…”, but the diff functionality is implemented in a new
igvminspecttool andigvmfilegenonly removes the olddumpsubcommand. Please update the PR title/description to match the actual change (or rename/move as intended) to avoid confusion for reviewers/users.
#[derive(Parser)]
#[clap(name = "igvmfilegen", about = "Tool to generate IGVM files")]
enum Options {
/// Build an IGVM file according to a manifest
Manifest {
/// Config manifest file path
#[clap(short, long = "manifest")]
manifest: PathBuf,
| let image = fs_err::read(file_path).context("reading input file")?; | ||
| let fixed_header = IGVM_FIXED_HEADER::read_from_prefix(image.as_bytes()) | ||
| .expect("Invalid fixed header") | ||
| .0; // TODO: zerocopy: use-rest-of-range (https://github.com/microsoft/openvmm/issues/759) | ||
|
|
||
| let igvm_data = IgvmFile::new_from_binary(&image, None).expect("should be valid"); | ||
| println!("Total file size: {} bytes\n", fixed_header.total_file_size); |
There was a problem hiding this comment.
dump currently panics on invalid/unexpected input via .expect(...) (fixed header parse and IGVM parse). Since this tool processes arbitrary IGVM binaries, please return a contextual anyhow error instead of panicking so failures are reported cleanly (and consistent with the repo’s “don’t panic on untrusted input” guidance).
| for region in ®ions { | ||
| let total = name_counts[®ion.component]; | ||
| let idx = name_indices.entry(region.component.clone()).or_insert(0); | ||
| let ext = extension_for_component(®ion.component); | ||
| let filename = if total == 1 { | ||
| format!("{}.{ext}", region.component) | ||
| } else { | ||
| format!("{}_{}.{ext}", region.component, idx) | ||
| }; | ||
| *idx += 1; | ||
|
|
||
| fs_err::write(regions_dir.join(&filename), ®ion.data)?; | ||
| writeln!( |
There was a problem hiding this comment.
Region/component names from the .bin.map file are used directly in output filenames. A crafted map entry containing path separators or .. could cause path traversal (writing outside the temp dir) or create unintended nested paths. Please sanitize component names into a safe filename (e.g., replace / and \\, strip .., and/or use a conservative allowlist) and keep the original name only in regions.txt metadata if needed.
| let left_igvm = | ||
| IgvmFile::new_from_binary(&left_data, None).map_err(|e| anyhow::anyhow!("{e:?}"))?; | ||
| let right_igvm = | ||
| IgvmFile::new_from_binary(&right_data, None).map_err(|e| anyhow::anyhow!("{e:?}"))?; |
There was a problem hiding this comment.
IGVM parse errors are currently converted with map_err(|e| anyhow!("{e:?}")), which loses structured context and doesn’t indicate which side failed. Prefer preserving the original error and adding context like "parsing left/right IGVM file" for more actionable output.
| let left_igvm = | |
| IgvmFile::new_from_binary(&left_data, None).map_err(|e| anyhow::anyhow!("{e:?}"))?; | |
| let right_igvm = | |
| IgvmFile::new_from_binary(&right_data, None).map_err(|e| anyhow::anyhow!("{e:?}"))?; | |
| let left_igvm = IgvmFile::new_from_binary(&left_data, None) | |
| .with_context(|| "parsing left IGVM file")?; | |
| let right_igvm = IgvmFile::new_from_binary(&right_data, None) | |
| .with_context(|| "parsing right IGVM file")?; |
| /// Parse an IGVM .map file, extracting all layout entries across all isolation sections. | ||
| /// Deduplicates by (start_gpa, end_gpa). | ||
| fn parse_map_file(path: &Path) -> anyhow::Result<Vec<MapEntry>> { | ||
| let content = fs_err::read_to_string(path).context("reading map file")?; | ||
| let mut entries: Vec<MapEntry> = Vec::new(); | ||
| let mut seen: std::collections::HashSet<(u64, u64)> = std::collections::HashSet::new(); | ||
| let mut in_layout = false; | ||
|
|
||
| for line in content.lines() { | ||
| if line.starts_with("IGVM file layout:") { | ||
| in_layout = true; | ||
| continue; | ||
| } | ||
| if in_layout { | ||
| let trimmed = line.trim(); | ||
| if trimmed.is_empty() | ||
| || trimmed.starts_with("IGVM file required memory:") | ||
| || trimmed.starts_with("IGVM file relocatable regions:") | ||
| || trimmed.starts_with("IGVM file isolation:") | ||
| { | ||
| in_layout = false; | ||
| continue; | ||
| } | ||
| // Parse: " 0000000000100000 - 0000000000700000 (0x600000 bytes) uefi-image" | ||
| if let Some(entry) = parse_map_line(trimmed) { | ||
| if seen.insert((entry.start_gpa, entry.end_gpa)) { | ||
| entries.push(entry); | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| entries.sort_by_key(|e| e.start_gpa); | ||
| Ok(entries) | ||
| } | ||
|
|
||
| fn parse_map_line(line: &str) -> Option<MapEntry> { | ||
| // Format: "0000000000100000 - 0000000000700000 (0x600000 bytes) uefi-image" | ||
| let parts: Vec<&str> = line.splitn(2, ')').collect(); | ||
| if parts.len() != 2 { | ||
| return None; | ||
| } | ||
| let name = parts[1].trim().to_string(); | ||
| let addr_part = parts[0]; // "0000000000100000 - 0000000000700000 (0x600000 bytes" | ||
| let tokens: Vec<&str> = addr_part.split_whitespace().collect(); | ||
| if tokens.len() < 3 || tokens[1] != "-" { | ||
| return None; | ||
| } | ||
| let start_gpa = u64::from_str_radix(tokens[0], 16).ok()?; | ||
| let end_gpa = u64::from_str_radix(tokens[2], 16).ok()?; | ||
| Some(MapEntry { | ||
| start_gpa, | ||
| end_gpa, | ||
| name, | ||
| }) | ||
| } |
There was a problem hiding this comment.
parse_map_file / parse_map_line implement non-trivial parsing of the .bin.map format but have no unit tests. Adding a few focused tests (valid line, invalid line, multiple isolation sections, dedup behavior) would help keep the tool reliable as the map format evolves.
| ```bash | ||
| # Download artifacts from two different runs | ||
| gh run download <run-id-a> --repo microsoft/openvmm \ | ||
| --name x64-openhcl-igvm --dir /tmp/build-a | ||
| gh run download <run-id-a> --repo microsoft/openvmm \ | ||
| --name x64-openhcl-igvm-extras --dir /tmp/extras-a | ||
| gh run download <run-id-b> --repo microsoft/openvmm \ | ||
| --name x64-openhcl-igvm --dir /tmp/build-b | ||
| gh run download <run-id-b> --repo microsoft/openvmm \ | ||
| --name x64-openhcl-igvm-extras --dir /tmp/extras-b | ||
|
|
There was a problem hiding this comment.
This page contains multiple very long lines (e.g., gh run download ... --repo ... commands) that exceed the Guide style guide’s 80-character wrap guideline. Please wrap the prose and consider splitting long command invocations across lines more consistently to keep the docs readable in narrow renderers.
| @@ -0,0 +1,7 @@ | |||
| # igvmfilegen | |||
|
|
|||
| `igvmfilegen` is the tool that assembles IGVM files for OpenHCL from a | |||
There was a problem hiding this comment.
I feel like either we should fully flesh out this page or we shouldn't have it at all. Having this template page just means we'll forget to add more stuff to it later.
| let image = fs_err::read(file_path).context("reading input file")?; | ||
| let fixed_header = IGVM_FIXED_HEADER::read_from_prefix(image.as_bytes()) | ||
| .expect("Invalid fixed header") | ||
| .0; // TODO: zerocopy: use-rest-of-range (https://github.com/microsoft/openvmm/issues/759) |
There was a problem hiding this comment.
Might as well address this TODO now?
| cargo run -p igvminspect -- dump --filepath <file.bin> | ||
| ``` | ||
|
|
||
| ## `igvminspect diff` |
There was a problem hiding this comment.
Would it be better/cleaner to only provide an extract command and let people bring their own diff tools? It feels odd to hardcode to one specific tool. We could still recommend diffoscope in the guide ofc. But it feels odd to me that all we're doing is extracting twice and running an external tool, when we could just provide "extract" and get a much simpler interface with the same power.
There was a problem hiding this comment.
Yeah you're definitely right that this should just be an extract command - doing the diffoscope as part of this has actually been annoying me and I just end up running diffoscope on the extracted stuff anyways. I'll push an update that just pulls the IGVM into its parts and call it Extract
d7ae1f0 to
1db816a
Compare
|
Guide and PR title/description need updating |
1db816a to
ce53b7f
Compare
| .expect("Invalid fixed header"); | ||
|
|
||
| let igvm_data = IgvmFile::new_from_binary(&image, None).expect("should be valid"); |
There was a problem hiding this comment.
dump currently panics on malformed input via expect(...) when reading the fixed header and parsing the IGVM file. Since the IGVM path is user-provided, this should return a structured error (e.g., anyhow::bail! / Context) instead of aborting the process.
| .expect("Invalid fixed header"); | |
| let igvm_data = IgvmFile::new_from_binary(&image, None).expect("should be valid"); | |
| .context("invalid IGVM fixed header")?; | |
| let igvm_data = | |
| IgvmFile::new_from_binary(&image, None).context("parsing IGVM file")?; |
| let filename = if total == 1 { | ||
| format!("{}.{ext}", region.component) | ||
| } else { | ||
| format!("{}_{}.{ext}", region.component, idx) | ||
| }; |
There was a problem hiding this comment.
region.component (derived from the optional map file) is used directly to form output filenames. A crafted map entry containing path separators or .. could escape regions_dir (path traversal) or create invalid filenames. Consider validating/sanitizing component names (e.g., reject any name with path separators/components other than Normal, or map to a safe slug) before writing files.
| }; | ||
|
|
||
| let data = fs_err::read(igvm_path).context("reading IGVM file")?; | ||
| let igvm = IgvmFile::new_from_binary(&data, None).map_err(|e| anyhow::anyhow!("{e:?}"))?; |
There was a problem hiding this comment.
IgvmFile::new_from_binary errors are converted into a debug-formatted string with map_err(|e| anyhow!("{e:?}")), which drops the original error as a source and loses actionable context. Prefer preserving the error and adding context like "parsing IGVM file " (e.g., with_context).
| let igvm = IgvmFile::new_from_binary(&data, None).map_err(|e| anyhow::anyhow!("{e:?}"))?; | |
| let igvm = IgvmFile::new_from_binary(&data, None) | |
| .with_context(|| format!("parsing IGVM file {}", igvm_path.display()))?; |
| pub fn extract_igvm_file( | ||
| igvm_path: &Path, | ||
| map_path: Option<&Path>, | ||
| output_dir: &Path, | ||
| ) -> anyhow::Result<()> { |
There was a problem hiding this comment.
The new extract_igvm_file behavior (map parsing + region coalescing + output layout) is non-trivial but has no unit tests. Other tool crates in this repo include tests; adding tests for parse_map_file and write_coalesced_regions (including edge cases like multiple layout blocks and unmapped regions) would help prevent regressions.
ce53b7f to
1aa0cc2
Compare
This pulls the Dump command out into a new crate
igvminspectand also introduces an Extract command, which uses the map file to pull the IGVM file into its parts. This is useful for diffing reproducible builds as diffoscope doesn't have support for IGVM files, but has support for each of its constituent parts.