Skip to content

minor: Validate batch_size configuration when setting it#23054

Open
2010YOUY01 wants to merge 3 commits into
apache:mainfrom
2010YOUY01:safe-batch-size
Open

minor: Validate batch_size configuration when setting it#23054
2010YOUY01 wants to merge 3 commits into
apache:mainfrom
2010YOUY01:safe-batch-size

Conversation

@2010YOUY01

@2010YOUY01 2010YOUY01 commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

In datafusion-cli:

DataFusion CLI v53.1.0
> set datafusion.execution.batch_size=0;
0 row(s) fetched.
Elapsed 0.001 seconds.

A '0' batch_size is a invalid config value, and it's widely used in most operators to control behavior. If operators don't validate batch_size, the 0 value might cause bugs/panics. This PR rejects it when setting the config.

I triggered a bug on my first attempt:

> set datafusion.execution.batch_size=0;
0 row(s) fetched.
Elapsed 0.000 seconds.

> select * from generate_series(100);
+-------+
| value |
+-------+
+-------+
0 row(s) fetched.
Elapsed 0.004 seconds.

What changes are included in this PR?

Implement a custom type for non-zero usize configuration value on batch_size, and rejects '0' value on initialization. Rust native NonZero<u32> is not used, because custom type allows to provide better error message on errors, and the implementation complexity is similar.

Note other invalid inputs like '-1' 'a' are already rejected and covered by existing tests.

Are these changes tested?

Yes, sqllogictest

Are there any user-facing changes?

No

@github-actions github-actions Bot added optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) common Related to common crate execution Related to the execution crate functions Changes to functions implementation physical-plan Changes to the physical-plan crate labels Jun 20, 2026
Comment thread datafusion/common/src/config.rs Outdated
pub const fn new(value: usize) -> Self {
match NonZeroUsize::new(value) {
Some(value) => Self(value),
None => panic!("value must be greater than 0"),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets return config_err it would super confusing if client crashes

@2010YOUY01 2010YOUY01 Jun 21, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default constructor for configurations is wrapped inside config_namespace! macro, and it can't handle ? operator on result, so we have to panic here.

But I agree this unsafe API should not be exposed to the caller, so in b37fe74, a private helper is added for unsafe construction, its input is guaranteed to be always valid, since it's only for the hard-coded default value, outside caller can only use the safe try_new()


/// Creates a [`ConfigNonZeroUsize`], returning a configuration error if
/// `value` is zero.
pub fn try_new(value: usize) -> Result<Self> {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe new with Result<Self> is more obvious choice?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above comment.

@comphead comphead left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @2010YOUY01 just some minors

Comment thread datafusion/common/src/config.rs Outdated
Comment thread datafusion/common/src/config.rs Outdated
@github-actions github-actions Bot added the auto detected api change Auto detected API change label Jun 21, 2026
@2010YOUY01

Copy link
Copy Markdown
Contributor Author

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details

error: `cargo metadata` exited with an error:     Updating crates.io index
error: failed to get `wasm-streams` as a dependency of package `reqwest v0.12.28`
    ... which satisfies dependency `reqwest = "^0.12"` (locked to 0.12.28) of package `object_store v0.13.2`
    ... which satisfies dependency `object_store = "^0.13.2"` (locked to 0.13.2) of package `datafusion-benchmarks v54.0.0 (/home/runner/work/datafusion/datafusion/benchmarks)`

Caused by:
  failed to load source for dependency `wasm-streams`

Caused by:
  unable to update registry `crates-io`

Caused by:
  download of wa/sm/wasm-streams failed

Caused by:
  curl failed

Caused by:
  [16] Error in the HTTP2 framing layer

Looks like it's a false positive due to the flaky Github CI runner.

@Jefffrey Jefffrey removed the auto detected api change Auto detected API change label Jun 21, 2026
Comment thread nonzero_test Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this file?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate core Core DataFusion crate execution Related to the execution crate functions Changes to functions implementation optimizer Optimizer rules physical-plan Changes to the physical-plan crate sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants