persist: bound fetch read-ahead on all replicas by antiguru · Pull Request #36988 · MaterializeInc/materialize

antiguru · 2026-06-11T11:52:10Z

Motivation

The persist fetch semaphore bounds the bytes of fetched-but-not-yet-decoded parts held in memory. It was gated on is_cc_active and, where active, sized to the full process memory limit, under the assumption that this data spills to lgalloc-backed disk. That assumption no longer holds: there is no spill disk, the data is heap-resident, and the cc/non-cc distinction is gone. So the semaphore was effectively unbounded (MAX_PERMITS) on the replicas that matter, and persist_source could read ahead without bound during hydration. A production heap profile showed fetched-but-undecoded blobs dominating the heap with a large transient spike over steady state.

Relates to CLU-113. Overlaps with #36910, which addresses the same problem but folds in the larger shard_source deasync refactor; this is the minimal standalone fix.

Description

Size the semaphore from the announced memory limit on every replica, and lower the default budget from 1.0x to 0.1x of the limit. The read-ahead buffers are heap-resident and must coexist with the arrangements being built in the same scarce memory, so the bound has to sit well below the full limit. The mechanism is otherwise unchanged: the permit is acquired before download and held with the part until after decode, and a single part larger than the budget still acquires the whole budget rather than deadlocking.

The flag docs are rewritten to describe a hard heap bound instead of an lgalloc spill budget.

This caps read-ahead overshoot; it does not make an undersized replica fit. If an arrangement alone approaches the limit, hydration can still OOM.

Verification

The bounded-memory suite never passed --announce-memory-limit, so the semaphore was unbounded there and the suite never exercised persist's memory budget. clusterd now announces its limit in every scenario, mirroring production, and a new persist-source-read-ahead scenario re-hydrates a large index that must stay within the announced limit. A unit test asserts the semaphore is sized from the memory limit regardless of is_cc_active.

The new scenario's clusterd_memory threshold is an estimate, not yet empirically confirmed; it needs a bounded-memory CI run (and possibly minimization-search) to tune. Activating the announced limit also changes the memory behavior of the existing scenarios, which CI should validate.

🤖 Generated with Claude Code

The persist fetch semaphore bounds the bytes of fetched-but-not-yet-decoded parts held in memory. It was gated on `is_cc_active` and, where active, sized to the full process memory limit, both under the assumption that this data spills to lgalloc-backed disk. That assumption no longer holds: there is no spill disk, the data is heap-resident, and the cc/non-cc distinction is gone. As a result the semaphore was effectively disabled (`MAX_PERMITS`) on the replicas that matter, so persist_source could read ahead without bound during hydration. A production heap profile showed fetched-but-undecoded blobs dominating the heap, with a large transient spike over steady state. Size the semaphore from the announced memory limit on every replica, and lower the default budget from 1.0x to 0.1x of the limit: the read-ahead buffers must now coexist with the arrangements being built in the same scarce heap, so the bound has to sit well below the full limit. Rewrite the flag docs to describe a hard heap bound rather than an lgalloc spill budget. Also make the bounded-memory suite start clusterd with `--announce-memory-limit`, mirroring production. Without it the semaphore was unbounded in the suite, so it never exercised persist's memory budget. Add a `persist-source-read-ahead` scenario that re-hydrates a large index and must stay within the announced limit. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

antiguru force-pushed the persist-fetch-backpressure branch from a0bd7b4 to fa932e3 Compare June 12, 2026 07:57

antiguru marked this pull request as ready for review June 12, 2026 14:37

antiguru requested a review from a team as a code owner June 12, 2026 14:37

antiguru requested review from DAlperin, def- and petrosagg June 12, 2026 14:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

persist: bound fetch read-ahead on all replicas#36988

persist: bound fetch read-ahead on all replicas#36988
antiguru wants to merge 1 commit into
MaterializeInc:mainfrom
antiguru:persist-fetch-backpressure

antiguru commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

antiguru commented Jun 11, 2026

Motivation

Description

Verification

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant