repr: Harden proto/Row decoding against malformed input#36984
Open
def- wants to merge 12 commits into
Open
Conversation
`Decimal::from_packed_bcd` calls the C function `decPackedToNumber`,
which segfaults on empty bcd input. Reachable from untrusted proto
bytes (anyone able to send a `ProtoRow` could crash the process via a
`ProtoNumeric { bcd: [] }` datum). Reject the empty case before
descending into the FFI.
`push_range_with` returned a `Result`, but the proto decode path unconditionally `.expect(...)`ed it — meaning any `ProtoRange` that `push_range_with` rejects (e.g. lower > upper from an attacker-crafted proto) panicked the process. Propagate the error to the caller via `?`, matching the rest of the proto decode path.
`<CheckedTimestamp<_> as RustType<ProtoNaiveDateTime>>::from_proto`
constructed the struct directly (`Self { t: ... }`), bypassing the
range validation that `from_timestamplike` enforces. Out-of-range
values pushed into a `Row` cleanly, but `read_datum` then called
`from_timestamplike(...).expect(...)` while iterating and panicked —
reachable from untrusted proto bytes.
Go through `from_timestamplike` in `from_proto` so the value is
rejected at decode time.
Same shape as the `CheckedTimestamp` fix: `Date::from_proto`
constructed `Date { days: proto.days }` directly, bypassing the range
validation that `from_pg_epoch` enforces. Out-of-range days pushed
into a `Row` cleanly, then `read_datum` panicked on
`Date::from_pg_epoch(days).expect(...)` while iterating.
Go through `from_pg_epoch` in `from_proto` so the value is rejected
at decode time.
`SqlScalarType::{List,Record,Map}` from `ProtoScalarType` called
`x.custom_id.map(|id| id.into_rust().unwrap())` — a malformed
`ProtoCatalogItemId` inside any of those three variants panicked the
process. Reachable from untrusted proto bytes (an attacker-crafted
`ProtoRow` containing a list/record/map value with a bad custom_id).
Propagate via `transpose()?` instead.
The non-migration branch zipped `proto.names` and `proto.metadata` via `zip_eq`, which panics on length mismatch — reachable from untrusted proto bytes. Check the lengths explicitly and return `InvalidFieldError`.
`RowPacker::push_range_with` documents a panic if a finite range bound expression pushes `Datum::Null`. The proto-decoding path called it with a closure that just forwarded the proto datum, which would happily push a `Null` from a malicious or corrupt `ProtoRangeInner.lower/upper` and crash the process. Detect the `Null` ProtoDatum up front and return a decode error instead. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The `uuid` crate panics while constructing a parse error for some short,
brace-wrapped inputs (e.g. `{}`, `{\0}`) — it mis-slices the input
("byte range starts at 1 but ends at 0"). `strconv::parse_uuid` is reached
from `Value::decode_text` for a client-supplied UUID bind parameter, so
this is a client-triggerable crash. Reject anything that can't be a UUID
(shorter than 32 chars, or non-ASCII) before handing it to the crate, and
route the two pgrepr UUID text-decode call sites through `parse_uuid`
(they previously called `Uuid::parse_str` directly, bypassing the guard).
Found by the value_decode_text cargo-fuzz target.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`RowPacker::push_range_with` asserted (`assert!`, so a panic even in release) that a range's bounds are non-null, of consistent datum kind, and the expected count. These bounds are reconstructed from an untrusted `ProtoRow`, so a crafted/corrupted range (e.g. an Array lower and a Map upper) panicked the decoder. Return a new `InvalidRangeError::InvalidRangeData` (with a matching proto variant) instead; valid callers never hit it. Found by the row_proto_roundtrip cargo-fuzz target. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`DatumDictIter` debug-asserts that map keys are unique and strictly ascending, but the `ProtoRow` -> `Row` decode packed a dict's elements verbatim without checking. A crafted proto with duplicate or out-of-order keys therefore decoded to a malformed map that tripped the assert (a panic in debug builds) the next time it was iterated. Validate the key order while decoding and reject a violation as a decode error. Regression for the row_proto_roundtrip cargo-fuzz finding. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`CheckedTimestamp::round_to_precision` — applied by the string->timestamp and
string->timestamptz casts before storage — truncates to microsecond (or, for
`date_trunc`-style precision, millisecond) resolution via
`truncate_microseconds`/`truncate_milliseconds`. Both rebuilt the time with
`NaiveTime::from_hms_{micro,milli}_opt`, whose leap-second sub-second range
(>= 1s) is only accepted when `sec == 59`.
A parsed `:60` literal is stored as chrono's leap representation, and time-zone
math can leave that 1s sub-second on a second other than `:59` (e.g.
`12:30:56` with a 1,000,000,000ns sub-second). The constructor then returned
`None` and the `.unwrap()` panicked, so e.g. `SELECT TIMESTAMPTZ '...:60...'`
aborted the process.
Truncate with `with_nanosecond` instead: it accepts the whole [0, 2s) range at
any second, so it preserves the value (leap included) and never fails here.
Found by the strconv timestamp round-trip fuzz target.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
twos_complement_be_to_numeric_inner read input[0] without first checking that the slice is non-empty, panicking with "index out of bounds: the len is 0 but the index is 0" on an empty byte string. That input is reachable from the wire: an Avro `decimal` logical type whose unscaled value is encoded as zero-length `bytes` flows straight from a Kafka message body through AvroFlatDecoder into this function, so a single crafted (or buggy-producer) message crashes Avro source decoding — an availability bug for ingestion. Our own encoder never emits an empty byte string (zero is the single byte 0x00), and Java's Avro decoder likewise rejects empty input (BigInteger throws "Zero length BigInteger"), so the right behaviour is a clean decode error, not a silent zero. The caller already maps the Err to a DecodeError, so the malformed row now surfaces as a decode error instead of aborting the worker. Found by the new avro_decode_fuzzed_schema fuzz target. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Hardens
mz-reprproto/Row decoding against malformed/untrusted bytes — these paths are reachable from persisted state and the wire, so a panic is an availability bug. Replaces panics/asserts with proper decode errors and validates ranges (Date, CheckedTimestamp, ProtoNumeric, ProtoRange, ProtoRelationDesc, ProtoRow dict ordering, uuid parsing, leap-second truncation, Avro decimal).Found by the cargo-fuzz suite (separate infra PR). Each fix has a regression test.
🤖 Generated with Claude Code