avro: Bound and validate decoding against malformed input#36986
Open
def- wants to merge 6 commits into
Open
Conversation
Avro encodes arrays and maps as a series of blocks prefixed by a count. The block decoders fed that count straight into `Vec::with_capacity` / the per-block loop, so a malicious or corrupt object-container file claiming a count near `i64::MAX` triggered a `capacity_overflow` panic under libfuzzer. Reject any block length beyond a fixed 2^24 cap. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Avro schemas can reference themselves (record fields naming the enclosing record) and may nest arbitrarily through `array`/`map`/ `record`/`union`. Both `SchemaParser::parse_inner` and `GeneralDeserializer::deserialize` recursed without a depth limit, so a malicious schema plus matching wire bytes blew the stack under libfuzzer. Cap each at 128 nested levels. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reader::read_block_next read the block byte length straight off the wire as an i64 and fed it to `fill_buf` → `Vec::resize`, so a malicious file claiming a multi-trillion-byte block sized the allocation request past ASan's 1 TB limit (and would otherwise OOM). Route the value through `util::safe_len` to enforce the existing MAX_ALLOCATION_BYTES (512 MiB) cap that the rest of the decoder already respects. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The object-container reader took a data block's object count straight from the wire as `messages_remaining` with no bound (the block's byte length was already `safe_len`-capped, but the count wasn't). A crafted block with a huge count and a zero-byte schema (e.g. `null`, whose values encode to no bytes) made the reader spin decoding billions of empty values — a CPU hang on untrusted source bytes. Cap the count with `safe_len`, like every other wire-read length (this also rejects negative counts, which wrap to a huge `usize`). Found by the reader_decode cargo-fuzz target (a >30s timeout at ~43 MB RSS). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`resolve_schemas` reconciles a writer schema against a reader schema while decoding Kafka Avro, so a panic on an adversarial schema pair from an external registry is an availability bug. Two such panics, both found by the new schema_resolve fuzz target: 1. Resolving a named type (record/enum/fixed) against a union it does not appear in built the "no match" error with `other.get_human_name(writer.root)`, but `other` is the *reader's* concrete node, so this indexed the writer's (here empty) `named` table out of bounds. Use `reader.root`. 2. When a record resolution failed *after* a nested named type had already been resolved, the cleanup did a single `named.pop()`, removing the nested entry and orphaning this node's `None` placeholder. Union resolution stores rather than propagates the error, so the orphan survived into the returned schema and a later `Option::unwrap` panicked. Roll the allocation state back to the pre-resolution length, and drop the corresponding name-index entries, instead. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Decoding an Avro array/map reads a per-block element count off the wire and then builds a Vec of that many elements. The count was only checked against MAX_BLOCK_LEN (16M); nothing tied it to how much input was actually left. When the elements consume ~no bytes each (an empty record, a null), a tiny message can claim a 16M-element block and drive an unbounded allocation: a 163-byte input grew the decoder to >4GB (a single 576MB = 16M * size_of::<Value>() allocation, repeated across fields), OOM-killing the decode. A block can't legitimately hold more elements than there are bytes left to decode them from, so add a cheap `Skip::remaining_input` hint (known for the slice/cursor readers the Kafka path uses, None for opaque streams) and reject any array/map block whose claimed length exceeds it. Normal data has at least one byte per element, so the bound never trips; only the small-message-huge- count amplification is rejected, as a clean decode error. Found by the avro_decode_fuzzed_schema fuzz target. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
martykulma
reviewed
Jun 12, 2026
martykulma
left a comment
Contributor
There was a problem hiding this comment.
Nice! Since we're introducing bounds that don't exist in the AVRO spec, this PR should also update the docs so customers are aware.
There's a subtle issue with remaining_input, see inline for the test case. Could we just drop the remaining input check and rely on the maximum element count * size_of::<Value>? We might want to drop the max element count in that case.
Comment on lines
+661
to
+667
| if let Some(remaining) = self.r.remaining_input() { | ||
| if len > remaining { | ||
| return Err(AvroError::Decode(DecodeError::Custom(format!( | ||
| "Avro array block length {len} exceeds remaining input ({remaining} bytes)" | ||
| )))); | ||
| } | ||
| } |
Contributor
There was a problem hiding this comment.
subtle issue here for zero-byte elements
#[mz_ore::test]
fn valid_null_array_falsely_rejected() {
// `remaining_input` assumes >=1 byte per element, false for null
// (and other zero-byte) elements, so after the count VARINT, only
// the 1-byte terminator remains. count >= 2 trips the check.
use std::str::FromStr;
use super::{AvroDeserializer, GeneralDeserializer};
use crate::encode::encode_to_vec;
use crate::types::Value;
use crate::{Schema, ValueDecoder};
let schema = Schema::from_str(r#"{"type": "array", "items": "null"}"#).unwrap();
let value = Value::Array(vec![Value::Null, Value::Null]);
let body = encode_to_vec(&value, &schema);
let dsr = GeneralDeserializer {
schema: schema.top_node(),
};
let mut reader: &[u8] = &body;
let res = dsr.deserialize(&mut reader, ValueDecoder);
assert!(
res.is_ok(),
"an encoder-produced array of nulls should round-trip, but got: {res:?}"
);
}| /// from the wire. Without it, a malicious or corrupt file can claim up | ||
| /// to `i64::MAX` items and the generic array/map decode loop will run | ||
| /// until it eventually OOMs or hits `Vec` capacity-overflow. | ||
| const MAX_BLOCK_LEN: usize = 1 << 24; |
Contributor
There was a problem hiding this comment.
nit: would you be opposed to calling this MAX_BLOCK_ELEMENTS - as I was reading this, I kept thinking this was bytes not element count.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Hardens the
mz-avrodecoder against adversarial input (Avro bytes/schemas arrive from Kafka and an external registry, so a panic/OOM is an availability bug): bound per-block array/map lengths and object counts by remaining input, cap object-container block byte length, bound schema-parse/value-decode recursion, and fix two schema-resolution panics on unmatched named types.Found by the cargo-fuzz suite (separate infra PR). Each fix has a regression test.
🤖 Generated with Claude Code