Shrink cwasm `.wasmtime.{traps,addrmap}` sections #13628
alexcrichton wants to merge 2 commits into
This commit scratches an itch I've had for a long time about how we
encode traps into a final `*.cwasm`. The trap section is frequently a
substantial portion of a `*.cwasm`, often around 10-15% of its total
size. The goal of this commit is to shrink this section by at least a
factor of two, and it currently shrinks it by ~75%.
The basic problem with this section is that it encodes 5 bytes of
information per trap: the u32 pc offset and the u8 trap code. The
previous encoding spent all 5 bytes on every trap, which is generally
not the most efficient method. The other constraint for this section,
however, is that we want O(log N) search time to find the trap code for
a particular trapping offset, so a linear scan of the whole section is
a bit too much to bite off here.
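For comparison, here is a minimal Rust sketch of the kind of flat layout being replaced: sorted pc offsets plus one trap-code byte each, i.e. 5 bytes per trap, with an exact-match binary search. `FlatTraps` and its methods are hypothetical names, not Wasmtime's actual types, and the parallel-array layout is an assumption made for illustration.

```rust
/// Hypothetical flat layout (assumption): one u32 pc offset plus one u8
/// trap code per trap, i.e. 5 bytes per entry.
struct FlatTraps {
    offsets: Vec<u32>, // sorted ascending so we can binary search
    codes: Vec<u8>,    // codes[i] is the trap code for offsets[i]
}

impl FlatTraps {
    fn new(mut traps: Vec<(u32, u8)>) -> Self {
        traps.sort_by_key(|&(offset, _)| offset);
        let offsets: Vec<u32> = traps.iter().map(|&(o, _)| o).collect();
        let codes: Vec<u8> = traps.iter().map(|&(_, c)| c).collect();
        FlatTraps { offsets, codes }
    }

    /// Serialized size: 4 bytes of offset + 1 byte of code per trap.
    fn encoded_len(&self) -> usize {
        self.offsets.len() * 4 + self.codes.len()
    }

    /// O(log N) exact-match lookup of the trap code for `pc`.
    fn lookup(&self, pc: u32) -> Option<u8> {
        self.offsets.binary_search(&pc).ok().map(|i| self.codes[i])
    }
}
```

Three traps cost 15 bytes here regardless of how close together their offsets are, which is the redundancy the delta encoding targets.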
The general idea of this new encoding is as follows:
* Split the entire list of traps for a `*.cwasm` into fixed-size
  blocks, here defined as 128 traps per block.
* A fixed-width index is created which maps from first-pc-in-block to
  where-block-is-encoded. Binary searching this index provides the
  O(log N) search.
* Each block is encoded as:
  * First, a default trap code byte: currently the most common trap in
    the block.
  * Next, for each entry in the block,
    `uleb((offset - prev_offset) << 1 | different_trap)` is encoded.
    This delta-encodes the offsets, which is the main source of
    compression. The lowest bit, if set, means that the uleb is
    followed by a trap byte indicating which trap this offset
    corresponds to.
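The per-block encoding above can be sketched in Rust roughly as follows. This is a minimal illustration, not Wasmtime's implementation: it assumes entries are already sorted by offset and that the first delta is relative to a caller-supplied `prev_offset` (for example the block's first pc from the index). `encode_block`, `decode_block`, and the LEB128 helpers are hypothetical names, and ties for the most common trap code are broken arbitrarily (here the highest code wins).

```rust
/// Append `value` as unsigned LEB128 (7 bits per byte, high bit = "more").
fn write_uleb(out: &mut Vec<u8>, mut value: u64) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Read an unsigned LEB128 value starting at `*pos`, advancing `*pos`.
fn read_uleb(bytes: &[u8], pos: &mut usize) -> u64 {
    let mut result = 0u64;
    let mut shift = 0u32;
    loop {
        let byte = bytes[*pos];
        *pos += 1;
        result |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return result;
        }
        shift += 7;
    }
}

/// Hypothetical block encoder: default code byte, then one
/// `uleb(delta << 1 | different_trap)` per entry, plus a code byte
/// only for entries whose code differs from the default.
fn encode_block(entries: &[(u32, u8)], mut prev_offset: u32) -> Vec<u8> {
    // Pick the most common trap code in this block as the default.
    let mut counts = [0usize; 256];
    for &(_, code) in entries {
        counts[usize::from(code)] += 1;
    }
    let default = (0..=255u8).max_by_key(|&c| counts[usize::from(c)]).unwrap();

    let mut out = vec![default];
    for &(offset, code) in entries {
        let different = code != default;
        write_uleb(&mut out, (u64::from(offset - prev_offset) << 1) | u64::from(different));
        if different {
            out.push(code);
        }
        prev_offset = offset;
    }
    out
}

/// Decode a whole block produced by `encode_block`.
fn decode_block(bytes: &[u8], mut prev_offset: u32) -> Vec<(u32, u8)> {
    let default = bytes[0];
    let mut pos = 1;
    let mut entries = Vec::new();
    while pos < bytes.len() {
        let word = read_uleb(bytes, &mut pos);
        let offset = prev_offset + (word >> 1) as u32;
        // Low bit set: an explicit trap code byte follows.
        let code = if word & 1 == 1 {
            pos += 1;
            bytes[pos - 1]
        } else {
            default
        };
        entries.push((offset, code));
        prev_offset = offset;
    }
    entries
}
```

For example, encoding `(0x10, 1)`, `(0x14, 1)`, `(0x20, 7)`, `(0x90, 1)` relative to `0x10` yields the 7 bytes `[1, 0x00, 0x08, 0x19, 7, 0xE0, 0x01]` instead of 20 bytes in a flat 5-byte format; only the `0x20` entry, whose code differs from the block default `1`, pays for an extra trap byte.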
Overall this reduces the original 5-bytes-per-trap overhead to roughly
1.5 bytes per trap, which shaves ~75% off the size of this section.
Trap lookup is still O(log N), with a slightly higher constant factor
than before.
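The two-level lookup can be sketched in Rust as well, under assumptions not stated in the source: the index holds hypothetical `(first_pc, byte_offset)` pairs, each block's first delta is relative to its own first pc, and a block ends where the next one starts. Binary searching the index with `partition_point` is the O(log N) part; the linear decode is bounded by the 128 traps in one block, which is where the slightly higher constant factor comes from. `CompressedTraps` and `TRAPS_PER_BLOCK` are illustrative names only.

```rust
const TRAPS_PER_BLOCK: usize = 128;

fn write_uleb(out: &mut Vec<u8>, mut value: u64) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

fn read_uleb(bytes: &[u8], pos: &mut usize) -> u64 {
    let (mut result, mut shift) = (0u64, 0u32);
    loop {
        let byte = bytes[*pos];
        *pos += 1;
        result |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return result;
        }
        shift += 7;
    }
}

struct CompressedTraps {
    index: Vec<(u32, u32)>, // (first pc in block, byte offset of block in `data`)
    data: Vec<u8>,          // concatenated encoded blocks
}

impl CompressedTraps {
    /// `traps` must be sorted by pc offset.
    fn new(traps: &[(u32, u8)]) -> Self {
        let (mut index, mut data) = (Vec::new(), Vec::new());
        for block in traps.chunks(TRAPS_PER_BLOCK) {
            index.push((block[0].0, data.len() as u32));
            // Default code byte = most common code in the block.
            let mut counts = [0usize; 256];
            for &(_, code) in block {
                counts[usize::from(code)] += 1;
            }
            let default = (0..=255u8).max_by_key(|&c| counts[usize::from(c)]).unwrap();
            data.push(default);
            let mut prev = block[0].0;
            for &(pc, code) in block {
                let different = code != default;
                write_uleb(&mut data, (u64::from(pc - prev) << 1) | u64::from(different));
                if different {
                    data.push(code);
                }
                prev = pc;
            }
        }
        CompressedTraps { index, data }
    }

    fn lookup(&self, pc: u32) -> Option<u8> {
        // O(log N): last block whose first pc is <= `pc`.
        let i = self.index.partition_point(|&(first, _)| first <= pc).checked_sub(1)?;
        let start = self.index[i].1 as usize;
        let end = self.index.get(i + 1).map_or(self.data.len(), |&(_, s)| s as usize);
        let block = &self.data[start..end];
        // Linear scan within a single block (at most TRAPS_PER_BLOCK entries).
        let default = block[0];
        let (mut pos, mut cur) = (1, self.index[i].0);
        while pos < block.len() {
            let word = read_uleb(block, &mut pos);
            cur += (word >> 1) as u32;
            let code = if word & 1 == 1 {
                pos += 1;
                block[pos - 1]
            } else {
                default
            };
            if cur == pc {
                return Some(code);
            }
            if cur > pc {
                return None;
            }
        }
        None
    }
}
```

For instance, 1,000 traps spaced 4 bytes apart, with every 50th trap using a different code, encode to 8 index entries and 1,028 data bytes in this sketch, since every delta fits in a single uleb byte.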
The 128-traps-per-block factor is fairly arbitrary at this time, but
some analysis showed that it was a good sweet spot: blocks are not too
big while still getting the lion's share of the compression benefits.
This commit mirrors the previous commit for the `.wasmtime.addrmap` section of binaries. The encoding has the same structure, but each block is encoded slightly differently to account for the different nature of the address map section. Notably, the lowest bit of each entry's pc-delta payload indicates whether the entry has a "none" position. If a position is available then it's sleb-encoded as a delta from the previous position. The goal is to compress 8 bytes per entry down to ~2 bytes per entry, which this commit largely achieves: each entry tends to be close to the previous entry both pc-wise and source-position-wise.

Overall this shrinks the `.wasmtime.addrmap` section by ~75% locally. For a `libpython.so`, this optimization and the previous one together shave 8M off a 25M binary, saving ~30% of the total file size.

cc bytecodealliance#3547 - note, though, that this doesn't close the issue: this only compresses the section better, it doesn't remove extraneous entries which won't ever be needed.
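The address-map block encoding can be sketched in Rust the same way. This sketch assumes each entry is a `u32` pc plus an optional `u32` source position (8 bytes in a flat layout), that a set low bit means a position is present, and that the sleb delta is taken from the previous *present* position in the block, starting at 0. The description does not pin down these details, and `encode_addrmap_block`, `decode_addrmap_block`, and the LEB128 helpers are hypothetical names.

```rust
fn write_uleb(out: &mut Vec<u8>, mut value: u64) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

fn read_uleb(bytes: &[u8], pos: &mut usize) -> u64 {
    let (mut result, mut shift) = (0u64, 0u32);
    loop {
        let byte = bytes[*pos];
        *pos += 1;
        result |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return result;
        }
        shift += 7;
    }
}

/// Signed LEB128: stop once the remaining bits are pure sign extension.
fn write_sleb(out: &mut Vec<u8>, mut value: i64) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7; // arithmetic shift preserves the sign
        let sign_bit = byte & 0x40 != 0;
        if (value == 0 && !sign_bit) || (value == -1 && sign_bit) {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

fn read_sleb(bytes: &[u8], pos: &mut usize) -> i64 {
    let (mut result, mut shift) = (0i64, 0u32);
    loop {
        let byte = bytes[*pos];
        *pos += 1;
        result |= i64::from(byte & 0x7f) << shift;
        shift += 7;
        if byte & 0x80 == 0 {
            // Sign-extend when the last byte's bit 6 is set.
            if shift < 64 && byte & 0x40 != 0 {
                result |= -1i64 << shift;
            }
            return result;
        }
    }
}

/// Per entry: uleb(pc_delta << 1 | has_pos), then sleb(pos delta) if present.
fn encode_addrmap_block(entries: &[(u32, Option<u32>)], first_pc: u32) -> Vec<u8> {
    let (mut out, mut prev_pc, mut prev_pos) = (Vec::new(), first_pc, 0i64);
    for &(pc, pos) in entries {
        write_uleb(&mut out, (u64::from(pc - prev_pc) << 1) | u64::from(pos.is_some()));
        if let Some(p) = pos {
            write_sleb(&mut out, i64::from(p) - prev_pos);
            prev_pos = i64::from(p);
        }
        prev_pc = pc;
    }
    out
}

fn decode_addrmap_block(bytes: &[u8], first_pc: u32) -> Vec<(u32, Option<u32>)> {
    let (mut entries, mut at, mut pc, mut prev_pos) = (Vec::new(), 0, first_pc, 0i64);
    while at < bytes.len() {
        let word = read_uleb(bytes, &mut at);
        pc += (word >> 1) as u32;
        let pos = if word & 1 == 1 {
            prev_pos += read_sleb(bytes, &mut at);
            Some(prev_pos as u32)
        } else {
            None // "none" position: no sleb follows
        };
        entries.push((pc, pos));
    }
    entries
}
```

Because positions can move backwards (for example from 1004 to 990), the position delta needs a signed encoding, which is why this sketch uses sleb here rather than uleb.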
This PR is me scratching an itch I've had for quite some time, notably that the `.wasmtime.{traps,addrmap}` sections are generally huge and pretty inefficiently encoded. These sections are inserted into all `*.cwasm` outputs by default and can often represent over half the size of a compiled module, which I find pretty wasteful. I was curious to throw the problem at an LLM to see if it had recommendations on alternative encoding schemes, and I feel that the result here is pretty understandable and low-complexity while shrinking both of these sections by ~75% relative to their current encodings. The end result is a 30% size reduction of a `libpython.cwasm`, from 25M to 17M, which is a relatively huge improvement. The new encodings are drop-in replacements for the previous encoding/searching APIs. They share their general structure but not much of their internals, so the two can still diverge over time if necessary.