avro: Bound and validate decoding against malformed input by def- · Pull Request #36986 · MaterializeInc/materialize

def- · 2026-06-11T11:39:27Z

Hardens the mz-avro decoder against adversarial input (Avro bytes/schemas arrive from Kafka and an external registry, so a panic/OOM is an availability bug): bound per-block array/map lengths and object counts by remaining input, cap object-container block byte length, bound schema-parse/value-decode recursion, and fix two schema-resolution panics on unmatched named types.

Found by the cargo-fuzz suite (separate infra PR). Each fix has a regression test.

🤖 Generated with Claude Code

Avro encodes arrays and maps as a series of blocks prefixed by a count. The block decoders fed that count straight into `Vec::with_capacity` / the per-block loop, so a malicious or corrupt object-container file claiming a count near `i64::MAX` triggered a `capacity_overflow` panic under libfuzzer. Reject any block length beyond a fixed 2^24 cap. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Avro schemas can reference themselves (record fields naming the enclosing record) and may nest arbitrarily through `array`/`map`/ `record`/`union`. Both `SchemaParser::parse_inner` and `GeneralDeserializer::deserialize` recursed without a depth limit, so a malicious schema plus matching wire bytes blew the stack under libfuzzer. Cap each at 128 nested levels. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Reader::read_block_next read the block byte length straight off the wire as an i64 and fed it to `fill_buf` → `Vec::resize`, so a malicious file claiming a multi-trillion-byte block sized the allocation request past ASan's 1 TB limit (and would otherwise OOM). Route the value through `util::safe_len` to enforce the existing MAX_ALLOCATION_BYTES (512 MiB) cap that the rest of the decoder already respects. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The object-container reader took a data block's object count straight from the wire as `messages_remaining` with no bound (the block's byte length was already `safe_len`-capped, but the count wasn't). A crafted block with a huge count and a zero-byte schema (e.g. `null`, whose values encode to no bytes) made the reader spin decoding billions of empty values — a CPU hang on untrusted source bytes. Cap the count with `safe_len`, like every other wire-read length (this also rejects negative counts, which wrap to a huge `usize`). Found by the reader_decode cargo-fuzz target (a >30s timeout at ~43 MB RSS). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

`resolve_schemas` reconciles a writer schema against a reader schema while decoding Kafka Avro, so a panic on an adversarial schema pair from an external registry is an availability bug. Two such panics, both found by the new schema_resolve fuzz target: 1. Resolving a named type (record/enum/fixed) against a union it does not appear in built the "no match" error with `other.get_human_name(writer.root)`, but `other` is the *reader's* concrete node, so this indexed the writer's (here empty) `named` table out of bounds. Use `reader.root`. 2. When a record resolution failed *after* a nested named type had already been resolved, the cleanup did a single `named.pop()`, removing the nested entry and orphaning this node's `None` placeholder. Union resolution stores rather than propagates the error, so the orphan survived into the returned schema and a later `Option::unwrap` panicked. Roll the allocation state back to the pre-resolution length, and drop the corresponding name-index entries, instead. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Decoding an Avro array/map reads a per-block element count off the wire and then builds a Vec of that many elements. The count was only checked against MAX_BLOCK_LEN (16M); nothing tied it to how much input was actually left. When the elements consume ~no bytes each (an empty record, a null), a tiny message can claim a 16M-element block and drive an unbounded allocation: a 163-byte input grew the decoder to >4GB (a single 576MB = 16M * size_of::<Value>() allocation, repeated across fields), OOM-killing the decode. A block can't legitimately hold more elements than there are bytes left to decode them from, so add a cheap `Skip::remaining_input` hint (known for the slice/cursor readers the Kafka path uses, None for opaque streams) and reject any array/map block whose claimed length exceeds it. Normal data has at least one byte per element, so the bound never trips; only the small-message-huge- count amplification is rejected, as a clean decode error. Found by the avro_decode_fuzzed_schema fuzz target. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

martykulma

Nice! Since we're introducing bounds that don't exist in the AVRO spec, this PR should also update the docs so customers are aware.

There's a subtle issue with remaining_input, see inline for the test case. Could we just drop the remaining input check and rely on the maximum element count * size_of::<Value>? We might want to drop the max element count in that case.

martykulma · 2026-06-12T13:02:20Z

+            if let Some(remaining) = self.r.remaining_input() {
+                if len > remaining {
+                    return Err(AvroError::Decode(DecodeError::Custom(format!(
+                        "Avro array block length {len} exceeds remaining input ({remaining} bytes)"
+                    ))));
+                }
+            }


subtle issue here for zero-byte elements

#[mz_ore::test] fn valid_null_array_falsely_rejected() { // `remaining_input` assumes >=1 byte per element, false for null // (and other zero-byte) elements, so after the count VARINT, only // the 1-byte terminator remains. count >= 2 trips the check. use std::str::FromStr; use super::{AvroDeserializer, GeneralDeserializer}; use crate::encode::encode_to_vec; use crate::types::Value; use crate::{Schema, ValueDecoder}; let schema = Schema::from_str(r#"{"type": "array", "items": "null"}"#).unwrap(); let value = Value::Array(vec![Value::Null, Value::Null]); let body = encode_to_vec(&value, &schema); let dsr = GeneralDeserializer { schema: schema.top_node(), }; let mut reader: &[u8] = &body; let res = dsr.deserialize(&mut reader, ValueDecoder); assert!( res.is_ok(), "an encoder-produced array of nulls should round-trip, but got: {res:?}" ); }

martykulma · 2026-06-12T13:16:29Z

+/// from the wire. Without it, a malicious or corrupt file can claim up
+/// to `i64::MAX` items and the generic array/map decode loop will run
+/// until it eventually OOMs or hits `Vec` capacity-overflow.
+const MAX_BLOCK_LEN: usize = 1 << 24;


nit: would you be opposed to calling this MAX_BLOCK_ELEMENTS - as I was reading this, I kept thinking this was bytes not element count.

def- and others added 6 commits June 11, 2026 10:57

def- mentioned this pull request Jun 11, 2026

tests: Add the cargo-fuzz fuzzing suite #36982

Draft

def- marked this pull request as ready for review June 12, 2026 10:40

def- requested a review from a team as a code owner June 12, 2026 10:40

def- requested a review from martykulma June 12, 2026 10:40

martykulma reviewed Jun 12, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

avro: Bound and validate decoding against malformed input#36986

avro: Bound and validate decoding against malformed input#36986
def- wants to merge 6 commits into
MaterializeInc:mainfrom
def-:fuzz-avro

def- commented Jun 11, 2026 •

edited

Loading

Uh oh!

martykulma left a comment

Uh oh!

martykulma Jun 12, 2026

Uh oh!

martykulma Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

def- commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

martykulma left a comment

Choose a reason for hiding this comment

Uh oh!

martykulma Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

martykulma Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

def- commented Jun 11, 2026 •

edited

Loading