feat: cap spill merge fan-in #23066

Open
Kevin-Li-2025 wants to merge 1 commit into apache:main from Kevin-Li-2025:feat/spill-merge-fan-in
@Kevin-Li-2025

Which issue does this PR close?

Rationale for this change

External sort merge phases currently select spill files based only on memory reservation. With many small spills, a single phase can open enough files to exceed the process file-descriptor limit.

What changes are included in this PR?

  • Add datafusion.runtime.max_spill_merge_fan_in (0 preserves the current unlimited behavior).
  • Clamp non-zero values to at least 2 during merge selection so each pass makes progress.
  • Support builder configuration and dynamic SQL SET / RESET / SHOW.
  • Add unit, runtime SQL, SQLLogicTest, information schema, and generated documentation coverage.
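The first two bullets can be read as a small selection rule. The following is a minimal sketch of that rule, not the code in this PR: `effective_fan_in` and `select_for_merge` are hypothetical names, spill files are represented only by their sizes, and the memory check is a simplified stand-in for the existing reservation-based selection.

```rust
/// Hypothetical helper: turn the configured value into an effective cap.
/// 0 means "no cap"; any other value is raised to at least 2 so that a
/// merge pass always combines two or more files.
fn effective_fan_in(configured: usize) -> Option<usize> {
    match configured {
        0 => None,
        n => Some(n.max(2)),
    }
}

/// Hypothetical selection step: walk the spill files in order and count how
/// many can join the next merge pass. A file is admitted only while both the
/// memory budget and the fan-in cap allow it.
fn select_for_merge(
    spill_sizes: &[usize],
    memory_budget: usize,
    configured_fan_in: usize,
) -> usize {
    // With no cap configured, only the memory budget limits the selection.
    let cap = effective_fan_in(configured_fan_in).unwrap_or(usize::MAX);
    let mut used = 0usize;
    let mut selected = 0usize;
    for &size in spill_sizes {
        if selected >= cap {
            break; // fan-in cap reached
        }
        if used.saturating_add(size) > memory_budget {
            break; // memory budget exhausted
        }
        used = used.saturating_add(size);
        selected += 1;
    }
    selected
}
```

Under these assumptions, four 10-byte spills with a 100-byte budget are all selected when the setting is 0, three are selected when it is 3, and two are selected when it is 1 because of the clamp. The sketch does not handle the case where fewer than two files fit in memory; real code has to deal with that separately.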

Are there any user-facing changes?

Users can cap the number of spill files opened in one external merge pass. The default remains unchanged.
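To illustrate the trade-off behind the cap, here is a rough model of how the number of merge passes grows as the fan-in shrinks. It is a simplification and not DataFusion's implementation: `merge_passes` is a hypothetical function, memory limits are ignored, and every pass is assumed to replace up to `fan_in` spill files with a single merged file.

```rust
/// Hypothetical model: count the passes needed to reduce `files` spill files
/// to one when each pass opens at most `fan_in` files and writes one merged
/// file back. Values below 2 are raised to 2, because a pass that merges a
/// single file would never reduce the file count.
fn merge_passes(mut files: usize, fan_in: usize) -> usize {
    let fan_in = fan_in.max(2);
    let mut passes = 0;
    while files > 1 {
        // Open at most `fan_in` files in this pass.
        let merged = files.min(fan_in);
        // The merged inputs are replaced by one output file.
        files = files - merged + 1;
        passes += 1;
    }
    passes
}
```

In this model, 100 spill files need 1 pass when all of them may be opened at once and 15 passes with a fan-in of 8, while never holding more than 8 files open in a single pass.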

How was this change tested?

  • cargo test -p datafusion-execution test_max_spill_merge_fan_in_builder_and_dynamic_update --lib
  • cargo test -p datafusion-physical-plan spill_merge_fan_in --lib
  • cargo test -p datafusion --test core_integration test_max_spill_merge_fan_in_runtime_config
  • cargo test -p datafusion-sqllogictest --test sqllogictests -- set_variable.slt
  • cargo check -p datafusion
  • cargo clippy -p datafusion-execution -p datafusion-physical-plan -p datafusion --lib -- -D warnings
  • cargo fmt --all -- --check
  • dev/update_config_docs.sh

github-actions bot added the documentation, core, sqllogictest, execution, and physical-plan labels on Jun 21, 2026
Labels

core (Core DataFusion crate), documentation (Improvements or additions to documentation), execution (Related to the execution crate), physical-plan (Changes to the physical-plan crate), sqllogictest (SQL Logic Tests (.slt))

Development

Successfully merging this pull request may close these issues.

Add a configurable cap on spill-file merge fan-in (max open files during external merge)