feat(dwarf): make --dwarf remap work on real DWARF + witness oracle (v0.20.0)#209
feat(dwarf): make --dwarf remap work on real DWARF + witness oracle (v0.20.0)#209avrabe wants to merge 3 commits into
Conversation
…130) Falsifying the v0.19.0 single-source remap against a real Rust component (hello_rust.wasm) surfaced four address classes the synthetic oracle missed, plus gimli's all-or-nothing failure mode. All fixed: - low_pc is the function BODY START (locals-decl byte), not the first instruction — the locals/prefix region now maps linearly (locals are preserved verbatim). Previously every low_pc underflowed to a miss. - Exclusive-end addresses (high_pc-as-addr, range ends, line end_sequence) map to output_body_end. - LLVM emits fixed-width zero-padded LEBs for relocatable indices, so DWARF rows can land inside operand bytes — these resolve to the CONTAINING operator (exact on boundaries) for correct line attribution. - Linear-memory/data addresses (DW_OP_addr) at/beyond the code extent pass through unchanged (correct under multi-memory fusion). - Per-address tombstoning replaces all-or-nothing strip: gimli's Dwarf::from aborts on a single unmappable address, which on real modules (dead-code gaps) stripped ALL DWARF. convert_address is now total — correct offset where mappable, identity for data/tombstones, 0xFFFF_FFFF for unmappable code. Guarantee strengthens to "every emitted address is correct or an ignored tombstone, never wrong". Witness oracle (tests/dwarf_remap_witness.rs, gimli dev-dep): fuses a real Rust component with --dwarf remap and asserts every non-tombstone subprogram low_pc equals some fused function's component-provenance code_range.start (the pulseengine/witness contract, in-tree). On the fixture all but 9 of 225 functions map exactly; the 9 are dead code. LS-D-1 amended with the v0.20.0 hardening. Multi-source still strips (gimli writer-API blocker, see #208). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Automated review for PR #209pulseengine/meld: Verdict: 💬 Comment Summary: The commit introduces a real-DWARF hardening and per-address tombstoning feature in MeldCore. It includes several bug fixes and improvements to address issues found during testing. Findings: 0 mechanical (rivet) · 1 from local AI model. Findings (1):
Generated by a local AI model and post-validated against a strict JSON contract. Each finding includes the verbatim line being criticised — verify by reading the file at the cited location. Reviewed at |
LS-N verification gate
Approved Failed LS entries(none) Missing regression tests
Updated automatically by |
Mythos delta-pass (auto)✅ NO FINDINGS across 1 Tier-5 file(s)
Auto-run via |
) The Mythos auto-scan flagged that `AddressRemap::translate` could emit a plausible-but-wrong address at a function boundary. Input bodies are contiguous, so one address is both function A's exclusive end (A.input_end) and the next function B's start (B.input_start); the BTreeMap range lookup selects B and returned B.output_body_start, which differs from A.output_body_end when A and B are not adjacent in the fused output (interleaved / reordered functions). A's high_pc-as-addr, range-list end, and end_sequence then mapped to a wrong offset. `translate` now detects the divergent-boundary case (prev.output_body_end != span.output_body_start) and returns None so the caller tombstones the address rather than emit a wrong one. When the two interpretations coincide (input order preserved — the single-source case meld emits) the boundary resolves cleanly. Pinned by `translate_ambiguous_boundary_tombstones_when_reordered`; LS-D-1 amended. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…(Mythos #209) Second Mythos auto-scan finding: AddressRemap::code_extent used the max input_end over REGISTERED spans, so a source module's dropped trailing function (DCE'd, no span) has input addresses exceeding code_extent. convert_address then classified them as linear-memory data and passed them through as stale input-module offsets instead of tombstoning. code_extent is now the input module's full code-section extent (every input function, via module_function_layouts which parses them all), so a dropped function's address stays in the code range and tombstones. Pinned by `code_extent_covers_dropped_trailing_function`; LS-D-1 amended. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
v0.19.0 shipped
DwarfHandling::Remapvalidated on synthetic DWARF. This PR runs the falsification the witness-integration epic (#130) called for — fusing a real Rust component (hello_rust.wasm) with--dwarf remapand checking the output DWARF against real compiler output. That immediately falsified the v0.19.0 remap (it stripped on the real fixture), surfacing four address-handling gaps and gimli's all-or-nothing failure mode. All are now fixed, and a strong end-to-end oracle pins the result.This is the #3 (witness proof) + the multi-source documentation (#208) slice of the planned v0.20.0 (async #142 follows as v0.21.0).
What the falsification found (and fixed)
low_pcmissedlow_pcis the function body start (locals-decl byte), not the first instruction[low,high)—high_pc/range-end/end_sequencepoint one past the last instructionaddr == input_end→output_body_endDW_OP_addrlinear-memory locations share the address space>= code_extent(correct under multi-memory)gimli::write::Dwarf::fromis all-or-nothing onconvert_addressconvert_address,0xFFFF_FFFFfor unmappable codeThe guarantee strengthens from "all-or-nothing correct-or-strip" to "every emitted address is correct or an ignored tombstone — never a plausible-but-wrong address."
The oracle (
tests/dwarf_remap_witness.rs)Fuses a real Rust component with
--dwarf remap, parses the output DWARF withgimli, and asserts every non-tombstoneDW_TAG_subprogramlow_pcequals some fused function's component-provenancecode_range.start— two independently-derived facts (DWARF vs. re-parsed code section) that only line up if the remap math is right. This is thepulseengine/witnesscode-offset → source-line contract, checked in-tree.Result on the fixture: all but 9 of 225 functions map exactly; the 9 are dead code meld dropped, correctly tombstoned.
Scope / residual
strip— merging independent DWARF unit sets is blocked by gimli's writer API; analysis + relocator design filed as DWARF Phase 2 follow-up: multi-source DWARF merge (cross-module .debug_* relocator) #208.gimliadded as a dev-dependency (already a normal dep) for the oracle.dwarf.rsis Tier-5, so this PR gets the Mythos delta scan.run_ls_verification.py([ OK ] LS-D-1).🤖 Generated with Claude Code