Skip to content

avro: Bound and validate decoding against malformed input#36986

Open
def- wants to merge 6 commits into
MaterializeInc:mainfrom
def-:fuzz-avro
Open

avro: Bound and validate decoding against malformed input#36986
def- wants to merge 6 commits into
MaterializeInc:mainfrom
def-:fuzz-avro

Conversation

@def-

@def- def- commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Hardens the mz-avro decoder against adversarial input (Avro bytes/schemas arrive from Kafka and an external registry, so a panic/OOM is an availability bug): bound per-block array/map lengths and object counts by remaining input, cap object-container block byte length, bound schema-parse/value-decode recursion, and fix two schema-resolution panics on unmatched named types.

Found by the cargo-fuzz suite (separate infra PR). Each fix has a regression test.

🤖 Generated with Claude Code

def- and others added 6 commits June 11, 2026 10:57
Avro encodes arrays and maps as a series of blocks prefixed by a count.
The block decoders fed that count straight into `Vec::with_capacity` /
the per-block loop, so a malicious or corrupt object-container file
claiming a count near `i64::MAX` triggered a `capacity_overflow` panic
under libfuzzer. Reject any block length beyond a fixed 2^24 cap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Avro schemas can reference themselves (record fields naming the
enclosing record) and may nest arbitrarily through `array`/`map`/
`record`/`union`. Both `SchemaParser::parse_inner` and
`GeneralDeserializer::deserialize` recursed without a depth limit,
so a malicious schema plus matching wire bytes blew the stack under
libfuzzer. Cap each at 128 nested levels.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reader::read_block_next read the block byte length straight off the
wire as an i64 and fed it to `fill_buf` → `Vec::resize`, so a
malicious file claiming a multi-trillion-byte block sized the
allocation request past ASan's 1 TB limit (and would otherwise OOM).
Route the value through `util::safe_len` to enforce the existing
MAX_ALLOCATION_BYTES (512 MiB) cap that the rest of the decoder
already respects.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The object-container reader took a data block's object count straight from
the wire as `messages_remaining` with no bound (the block's byte length was
already `safe_len`-capped, but the count wasn't). A crafted block with a
huge count and a zero-byte schema (e.g. `null`, whose values encode to no
bytes) made the reader spin decoding billions of empty values — a CPU hang
on untrusted source bytes. Cap the count with `safe_len`, like every other
wire-read length (this also rejects negative counts, which wrap to a huge
`usize`).

Found by the reader_decode cargo-fuzz target (a >30s timeout at ~43 MB RSS).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`resolve_schemas` reconciles a writer schema against a reader schema while
decoding Kafka Avro, so a panic on an adversarial schema pair from an external
registry is an availability bug. Two such panics, both found by the new
schema_resolve fuzz target:

1. Resolving a named type (record/enum/fixed) against a union it does not appear
   in built the "no match" error with `other.get_human_name(writer.root)`, but
   `other` is the *reader's* concrete node, so this indexed the writer's (here
   empty) `named` table out of bounds. Use `reader.root`.

2. When a record resolution failed *after* a nested named type had already been
   resolved, the cleanup did a single `named.pop()`, removing the nested entry
   and orphaning this node's `None` placeholder. Union resolution stores rather
   than propagates the error, so the orphan survived into the returned schema and
   a later `Option::unwrap` panicked. Roll the allocation state back to the
   pre-resolution length, and drop the corresponding name-index entries, instead.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Decoding an Avro array/map reads a per-block element count off the wire and
then builds a Vec of that many elements. The count was only checked against
MAX_BLOCK_LEN (16M); nothing tied it to how much input was actually left. When
the elements consume ~no bytes each (an empty record, a null), a tiny message
can claim a 16M-element block and drive an unbounded allocation: a 163-byte
input grew the decoder to >4GB (a single 576MB = 16M * size_of::<Value>()
allocation, repeated across fields), OOM-killing the decode.

A block can't legitimately hold more elements than there are bytes left to
decode them from, so add a cheap `Skip::remaining_input` hint (known for the
slice/cursor readers the Kafka path uses, None for opaque streams) and reject
any array/map block whose claimed length exceeds it. Normal data has at least
one byte per element, so the bound never trips; only the small-message-huge-
count amplification is rejected, as a clean decode error.

Found by the avro_decode_fuzzed_schema fuzz target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@def- def- marked this pull request as ready for review June 12, 2026 10:40
@def- def- requested a review from a team as a code owner June 12, 2026 10:40
@def- def- requested a review from martykulma June 12, 2026 10:40

@martykulma martykulma left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Since we're introducing bounds that don't exist in the AVRO spec, this PR should also update the docs so customers are aware.

There's a subtle issue with remaining_input, see inline for the test case. Could we just drop the remaining input check and rely on the maximum element count * size_of::<Value>? We might want to drop the max element count in that case.

Comment thread src/avro/src/decode.rs
Comment on lines +661 to +667
if let Some(remaining) = self.r.remaining_input() {
if len > remaining {
return Err(AvroError::Decode(DecodeError::Custom(format!(
"Avro array block length {len} exceeds remaining input ({remaining} bytes)"
))));
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subtle issue here for zero-byte elements

    #[mz_ore::test]
    fn valid_null_array_falsely_rejected() {
        // `remaining_input` assumes >=1 byte per element, false for null
        // (and other zero-byte) elements, so after the count VARINT, only
        // the 1-byte terminator remains. count >= 2 trips the check.
        use std::str::FromStr;

        use super::{AvroDeserializer, GeneralDeserializer};
        use crate::encode::encode_to_vec;
        use crate::types::Value;
        use crate::{Schema, ValueDecoder};

        let schema = Schema::from_str(r#"{"type": "array", "items": "null"}"#).unwrap();
        let value = Value::Array(vec![Value::Null, Value::Null]);
        let body = encode_to_vec(&value, &schema);

        let dsr = GeneralDeserializer {
            schema: schema.top_node(),
        };
        let mut reader: &[u8] = &body;
        let res = dsr.deserialize(&mut reader, ValueDecoder);
        assert!(
            res.is_ok(),
            "an encoder-produced array of nulls should round-trip, but got: {res:?}"
        );
    }

Comment thread src/avro/src/decode.rs
/// from the wire. Without it, a malicious or corrupt file can claim up
/// to `i64::MAX` items and the generic array/map decode loop will run
/// until it eventually OOMs or hits `Vec` capacity-overflow.
const MAX_BLOCK_LEN: usize = 1 << 24;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would you be opposed to calling this MAX_BLOCK_ELEMENTS - as I was reading this, I kept thinking this was bytes not element count.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants