Skip to content

igvminspect: add tool to extract and dump igvm file content#3033

Open
justus-camp-microsoft wants to merge 1 commit intomicrosoft:mainfrom
justus-camp-microsoft:igvm_file_diff
Open

igvminspect: add tool to extract and dump igvm file content#3033
justus-camp-microsoft wants to merge 1 commit intomicrosoft:mainfrom
justus-camp-microsoft:igvm_file_diff

Conversation

@justus-camp-microsoft
Copy link
Copy Markdown
Contributor

@justus-camp-microsoft justus-camp-microsoft commented Mar 17, 2026

This pulls the Dump command out into a new crate igvminspect and also introduces an Extract command, which uses the map file to pull the IGVM file into its parts. This is useful for diffing reproducible builds as diffoscope doesn't have support for IGVM files, but has support for each of its constituent parts.

@justus-camp-microsoft justus-camp-microsoft requested review from a team as code owners March 17, 2026 19:52
Copilot AI review requested due to automatic review settings March 17, 2026 19:52
@github-actions github-actions bot added the Guide label Mar 17, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an igvmfilegen diff subcommand to help investigate IGVM reproducibility issues by extracting IGVM contents into structured temp directories and invoking diffoscope on the extracted trees.

Changes:

  • Add Diff subcommand wiring to igvmfilegen CLI.
  • Implement IGVM extraction + region naming/coalescing + diffoscope invocation in a new diff module.
  • Document the new tool/subcommand in the Guide and link it from SUMMARY.md.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
vm/loader/igvmfilegen/src/main.rs Adds Diff subcommand to CLI and dispatches to the new diff implementation.
vm/loader/igvmfilegen/src/diff.rs New extraction/coalescing logic and diffoscope runner for comparing IGVM outputs.
vm/loader/igvmfilegen/Cargo.toml Adds tempfile dependency for managing extracted temp directories.
Guide/src/dev_guide/dev_tools/igvmfilegen.md New documentation page describing igvmfilegen diff usage and workflows.
Guide/src/SUMMARY.md Adds the new igvmfilegen doc page to the Guide table of contents.
Cargo.lock Locks tempfile as a dependency due to the new module.

Comment on lines +154 to +158
let component = lookup_map_name(map, *gpa).unwrap_or("unmapped").to_string();
// Deduplicate pages at the same GPA (different compatibility masks
// produce duplicate PageData entries with identical content).
if seen_gpas.insert(*gpa) {
page_data_entries.push(PageDataEntry {
Comment on lines +26 to +33
/// Parse an IGVM .map file, extracting all layout entries across all isolation sections.
/// Deduplicates by (start_gpa, end_gpa).
fn parse_map_file(path: &Path) -> anyhow::Result<Vec<MapEntry>> {
let content = fs_err::read_to_string(path).context("reading map file")?;
let mut entries: Vec<MapEntry> = Vec::new();
let mut seen: std::collections::HashSet<(u64, u64)> = std::collections::HashSet::new();
let mut in_layout = false;

Comment thread vm/loader/igvminspect/src/extract.rs
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Introduces a new igvminspect CLI tool under vm/loader/ to inspect IGVM binaries and diff two IGVM outputs (with their .bin.map files) via diffoscope, supporting reproducibility investigations. As part of the split, the legacy dump functionality is removed from igvmfilegen and documented separately in the Guide.

Changes:

  • Add new igvminspect crate with dump and diff subcommands (diff extracts IGVM parts and runs diffoscope).
  • Remove the Dump subcommand from igvmfilegen (moving inspection responsibilities to igvminspect).
  • Update the Guide SUMMARY and add dev-tool pages for igvmfilegen and igvminspect.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
vm/loader/igvminspect/src/main.rs New CLI entrypoint defining dump/diff subcommands and tracing setup.
vm/loader/igvminspect/src/diff.rs Extraction logic + map parsing + diffoscope invocation for IGVM diffs.
vm/loader/igvminspect/Cargo.toml New crate manifest and dependencies.
vm/loader/igvmfilegen/src/main.rs Removes Dump subcommand from igvmfilegen.
Guide/src/SUMMARY.md Adds links for igvmfilegen and new igvminspect pages.
Guide/src/dev_guide/dev_tools/igvminspect.md New documentation for usage and workflows for diffing IGVMs.
Guide/src/dev_guide/dev_tools/igvmfilegen.md New short doc entry for igvmfilegen.
Cargo.toml Adds vm/loader/igvminspect to workspace members.
Cargo.lock Adds igvminspect package entry.
Comments suppressed due to low confidence (1)

vm/loader/igvmfilegen/src/main.rs:53

  • The PR title/description says “igvmfilegen: add diff subcommand…”, but the diff functionality is implemented in a new igvminspect tool and igvmfilegen only removes the old dump subcommand. Please update the PR title/description to match the actual change (or rename/move as intended) to avoid confusion for reviewers/users.
#[derive(Parser)]
#[clap(name = "igvmfilegen", about = "Tool to generate IGVM files")]
enum Options {
    /// Build an IGVM file according to a manifest
    Manifest {
        /// Config manifest file path
        #[clap(short, long = "manifest")]
        manifest: PathBuf,

Comment on lines +70 to +76
let image = fs_err::read(file_path).context("reading input file")?;
let fixed_header = IGVM_FIXED_HEADER::read_from_prefix(image.as_bytes())
.expect("Invalid fixed header")
.0; // TODO: zerocopy: use-rest-of-range (https://github.com/microsoft/openvmm/issues/759)

let igvm_data = IgvmFile::new_from_binary(&image, None).expect("should be valid");
println!("Total file size: {} bytes\n", fixed_header.total_file_size);
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dump currently panics on invalid/unexpected input via .expect(...) (fixed header parse and IGVM parse). Since this tool processes arbitrary IGVM binaries, please return a contextual anyhow error instead of panicking so failures are reported cleanly (and consistent with the repo’s “don’t panic on untrusted input” guidance).

Copilot uses AI. Check for mistakes.
Comment on lines +351 to +363
for region in &regions {
let total = name_counts[&region.component];
let idx = name_indices.entry(region.component.clone()).or_insert(0);
let ext = extension_for_component(&region.component);
let filename = if total == 1 {
format!("{}.{ext}", region.component)
} else {
format!("{}_{}.{ext}", region.component, idx)
};
*idx += 1;

fs_err::write(regions_dir.join(&filename), &region.data)?;
writeln!(
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Region/component names from the .bin.map file are used directly in output filenames. A crafted map entry containing path separators or .. could cause path traversal (writing outside the temp dir) or create unintended nested paths. Please sanitize component names into a safe filename (e.g., replace / and \\, strip .., and/or use a conservative allowlist) and keep the original name only in regions.txt metadata if needed.

Copilot uses AI. Check for mistakes.
Comment thread vm/loader/igvminspect/src/diff.rs Outdated
Comment on lines +392 to +395
let left_igvm =
IgvmFile::new_from_binary(&left_data, None).map_err(|e| anyhow::anyhow!("{e:?}"))?;
let right_igvm =
IgvmFile::new_from_binary(&right_data, None).map_err(|e| anyhow::anyhow!("{e:?}"))?;
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IGVM parse errors are currently converted with map_err(|e| anyhow!("{e:?}")), which loses structured context and doesn’t indicate which side failed. Prefer preserving the original error and adding context like "parsing left/right IGVM file" for more actionable output.

Suggested change
let left_igvm =
IgvmFile::new_from_binary(&left_data, None).map_err(|e| anyhow::anyhow!("{e:?}"))?;
let right_igvm =
IgvmFile::new_from_binary(&right_data, None).map_err(|e| anyhow::anyhow!("{e:?}"))?;
let left_igvm = IgvmFile::new_from_binary(&left_data, None)
.with_context(|| "parsing left IGVM file")?;
let right_igvm = IgvmFile::new_from_binary(&right_data, None)
.with_context(|| "parsing right IGVM file")?;

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

Comment on lines +26 to +81
/// Parse an IGVM .map file, extracting all layout entries across all isolation sections.
/// Deduplicates by (start_gpa, end_gpa).
fn parse_map_file(path: &Path) -> anyhow::Result<Vec<MapEntry>> {
let content = fs_err::read_to_string(path).context("reading map file")?;
let mut entries: Vec<MapEntry> = Vec::new();
let mut seen: std::collections::HashSet<(u64, u64)> = std::collections::HashSet::new();
let mut in_layout = false;

for line in content.lines() {
if line.starts_with("IGVM file layout:") {
in_layout = true;
continue;
}
if in_layout {
let trimmed = line.trim();
if trimmed.is_empty()
|| trimmed.starts_with("IGVM file required memory:")
|| trimmed.starts_with("IGVM file relocatable regions:")
|| trimmed.starts_with("IGVM file isolation:")
{
in_layout = false;
continue;
}
// Parse: " 0000000000100000 - 0000000000700000 (0x600000 bytes) uefi-image"
if let Some(entry) = parse_map_line(trimmed) {
if seen.insert((entry.start_gpa, entry.end_gpa)) {
entries.push(entry);
}
}
}
}

entries.sort_by_key(|e| e.start_gpa);
Ok(entries)
}

fn parse_map_line(line: &str) -> Option<MapEntry> {
// Format: "0000000000100000 - 0000000000700000 (0x600000 bytes) uefi-image"
let parts: Vec<&str> = line.splitn(2, ')').collect();
if parts.len() != 2 {
return None;
}
let name = parts[1].trim().to_string();
let addr_part = parts[0]; // "0000000000100000 - 0000000000700000 (0x600000 bytes"
let tokens: Vec<&str> = addr_part.split_whitespace().collect();
if tokens.len() < 3 || tokens[1] != "-" {
return None;
}
let start_gpa = u64::from_str_radix(tokens[0], 16).ok()?;
let end_gpa = u64::from_str_radix(tokens[2], 16).ok()?;
Some(MapEntry {
start_gpa,
end_gpa,
name,
})
}
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse_map_file / parse_map_line implement non-trivial parsing of the .bin.map format but have no unit tests. Adding a few focused tests (valid line, invalid line, multiple isolation sections, dedup behavior) would help keep the tool reliable as the map format evolves.

Copilot uses AI. Check for mistakes.
Comment on lines +99 to +109
```bash
# Download artifacts from two different runs
gh run download <run-id-a> --repo microsoft/openvmm \
--name x64-openhcl-igvm --dir /tmp/build-a
gh run download <run-id-a> --repo microsoft/openvmm \
--name x64-openhcl-igvm-extras --dir /tmp/extras-a
gh run download <run-id-b> --repo microsoft/openvmm \
--name x64-openhcl-igvm --dir /tmp/build-b
gh run download <run-id-b> --repo microsoft/openvmm \
--name x64-openhcl-igvm-extras --dir /tmp/extras-b

Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This page contains multiple very long lines (e.g., gh run download ... --repo ... commands) that exceed the Guide style guide’s 80-character wrap guideline. Please wrap the prose and consider splitting long command invocations across lines more consistently to keep the docs readable in narrow renderers.

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,7 @@
# igvmfilegen

`igvmfilegen` is the tool that assembles IGVM files for OpenHCL from a
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like either we should fully flesh out this page or we shouldn't have it at all. Having this template page just means we'll forget to add more stuff to it later.

Comment thread vm/loader/igvminspect/src/main.rs Outdated
let image = fs_err::read(file_path).context("reading input file")?;
let fixed_header = IGVM_FIXED_HEADER::read_from_prefix(image.as_bytes())
.expect("Invalid fixed header")
.0; // TODO: zerocopy: use-rest-of-range (https://github.com/microsoft/openvmm/issues/759)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well address this TODO now?

cargo run -p igvminspect -- dump --filepath <file.bin>
```

## `igvminspect diff`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better/cleaner to only provide an extract command and let people bring their own diff tools? It feels odd to hardcode to one specific tool. We could still recommend diffoscope in the guide ofc. But it feels odd to me that all we're doing is extracting twice and running an external tool, when we could just provide "extract" and get a much simpler interface with the same power.

Copy link
Copy Markdown
Contributor Author

@justus-camp-microsoft justus-camp-microsoft Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you're definitely right that this should just be an extract command - doing the diffoscope as part of this has actually been annoying me and I just end up running diffoscope on the extracted stuff anyways. I'll push an update that just pulls the IGVM into its parts and call it Extract

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 1, 2026

@smalis-msft
Copy link
Copy Markdown
Contributor

Guide and PR title/description need updating

@justus-camp-microsoft justus-camp-microsoft changed the title igvmfilegen: add diff subcommand for IGVM reproducibility investigations igvminspect: add tool to extract and dump igvm file content Apr 1, 2026
Copilot AI review requested due to automatic review settings April 1, 2026 21:31
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 4 comments.

Comment thread vm/loader/igvminspect/src/main.rs Outdated
Comment on lines +63 to +65
.expect("Invalid fixed header");

let igvm_data = IgvmFile::new_from_binary(&image, None).expect("should be valid");
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dump currently panics on malformed input via expect(...) when reading the fixed header and parsing the IGVM file. Since the IGVM path is user-provided, this should return a structured error (e.g., anyhow::bail! / Context) instead of aborting the process.

Suggested change
.expect("Invalid fixed header");
let igvm_data = IgvmFile::new_from_binary(&image, None).expect("should be valid");
.context("invalid IGVM fixed header")?;
let igvm_data =
IgvmFile::new_from_binary(&image, None).context("parsing IGVM file")?;

Copilot uses AI. Check for mistakes.
Comment on lines +355 to +359
let filename = if total == 1 {
format!("{}.{ext}", region.component)
} else {
format!("{}_{}.{ext}", region.component, idx)
};
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

region.component (derived from the optional map file) is used directly to form output filenames. A crafted map entry containing path separators or .. could escape regions_dir (path traversal) or create invalid filenames. Consider validating/sanitizing component names (e.g., reject any name with path separators/components other than Normal, or map to a safe slug) before writing files.

Copilot uses AI. Check for mistakes.
Comment thread vm/loader/igvminspect/src/extract.rs Outdated
};

let data = fs_err::read(igvm_path).context("reading IGVM file")?;
let igvm = IgvmFile::new_from_binary(&data, None).map_err(|e| anyhow::anyhow!("{e:?}"))?;
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IgvmFile::new_from_binary errors are converted into a debug-formatted string with map_err(|e| anyhow!("{e:?}")), which drops the original error as a source and loses actionable context. Prefer preserving the error and adding context like "parsing IGVM file " (e.g., with_context).

Suggested change
let igvm = IgvmFile::new_from_binary(&data, None).map_err(|e| anyhow::anyhow!("{e:?}"))?;
let igvm = IgvmFile::new_from_binary(&data, None)
.with_context(|| format!("parsing IGVM file {}", igvm_path.display()))?;

Copilot uses AI. Check for mistakes.
Comment on lines +382 to +386
pub fn extract_igvm_file(
igvm_path: &Path,
map_path: Option<&Path>,
output_dir: &Path,
) -> anyhow::Result<()> {
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new extract_igvm_file behavior (map parsing + region coalescing + output layout) is non-trivial but has no unit tests. Other tool crates in this repo include tests; adding tests for parse_map_file and write_coalesced_regions (including edge cases like multiple layout blocks and unmapped regions) would help prevent regressions.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants