Int overflow check with a faster approach#394
Open
lemire wants to merge 3 commits into
Open
Conversation
The previous commit detects multi-wrap u64 overflow at the max_digits
boundary by re-parsing the digits through a checked multiply-add loop
(O(max_digits)). Replace that with the constant-time check used in
simdjson: the leading digit plus a single threshold comparison.
For a max_digits-length value, min_safe_u64(base) == base^(max_digits-1)
is the smallest such value and also the width of each leading-digit band
[d*ms, (d+1)*ms). Since that width is < 2^64, the only band that can
straddle 2^64 is d == dmax (the largest leading digit that still fits),
and there it straddles at most once, so a single threshold dmax*ms
separates wrapped from non-wrapped values. A leading digit above dmax
always overflows; below dmax always fits. dmax and the threshold derive
from the existing min_safe_u64 table, so no new tables are needed and
dmax*ms cannot itself overflow.
Add a programmatic, self-verifying test for parse_int_string overflow
detection covering bases 2..36, complementing the hand-picked strings
added earlier. Every generated input is cross-checked against an
independent trusted oracle (a plain 64-bit checked multiply-add); on
success the parsed value is also compared exactly and full consumption
of the input is asserted.
Per base it exercises:
- an exact-boundary sweep of the 64 values straddling 2^64
(UINT64_MAX-31 .. 2^64+31), built by walking the digit string;
- UINT64_MAX, 2^64 and the all-max-digit value, each also with
leading zeros;
- random max_digits-length values across every leading digit, with
the heaviest sampling on the lead == dmax band that straddles 2^64,
and full coverage of lead > dmax (the multi-wrap region the naive
min_safe check accepted by mistake);
- max_digits-1 (never overflows) and max_digits+1 (always overflows).
A small signed (int64_t) section checks the exact INT64_MIN/INT64_MAX
limits round-trip and that INT64_MAX+1 / INT64_MIN-1 are rejected in
every base.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
PR #393 by @sahvx655-wq found a bug in the overflow check when parsing integers. However, the check was not optimal. This updated PR brings a cheaper approach.