docs: align Spark batch_size docs with 8192 default by harsh-ande · Pull Request #323 · lance-format/lance-spark

harsh-ande · 2026-03-21T17:49:07Z

This PR addresses #153.

Spark currently hardcodes read batch_size in lance-spark, which keeps the connector docs tied to a Spark-owned default even though lance-core already provides the underlying default behavior.

This update changes the read path so Spark only forwards batch_size when the user explicitly sets it. When batch_size is omitted, the connector now inherits the lance-core default.

Changes:

- make read batch_size optional in LanceSparkReadOptions
- only call ScanOptions.batchSize(...) when batch_size is explicitly provided
- update Spark read docs to describe inherited-default behavior instead of pinning Spark to 8192
- keep the tuning examples for explicit larger batch sizes
- replace the default read batch_size regression with unset-vs-explicit-override coverage
- remove the unrelated write-side doc/test changes from this PR
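
For reviewers who want the shape of the read-path change at a glance, here is a minimal, self-contained sketch of the "forward only when explicitly set" pattern. It is not the code in this PR: `BatchSizeForwardingSketch`, `ScanOptionsSketch`, `parseBatchSize`, and `buildScanOptions` are hypothetical stand-ins, the options map is a simplification of how read options reach the connector, and 16384 is just an illustrative override value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

final class BatchSizeForwardingSketch {

    /** Hypothetical stand-in for a scan-options builder; not the real lance-core class. */
    static final class ScanOptionsSketch {
        // Empty means "not set here", so the underlying library's default would apply.
        private OptionalInt batchSize = OptionalInt.empty();

        ScanOptionsSketch batchSize(int rows) {
            this.batchSize = OptionalInt.of(rows);
            return this;
        }

        OptionalInt getBatchSize() {
            return batchSize;
        }
    }

    /** Returns the user-supplied batch_size, or empty when the option is absent. */
    static OptionalInt parseBatchSize(Map<String, String> readOptions) {
        String raw = readOptions.get("batch_size");
        return raw == null ? OptionalInt.empty() : OptionalInt.of(Integer.parseInt(raw.trim()));
    }

    /** Forwards batch_size to the scan options only when the user provided one. */
    static ScanOptionsSketch buildScanOptions(Map<String, String> readOptions) {
        ScanOptionsSketch scan = new ScanOptionsSketch();
        parseBatchSize(readOptions).ifPresent(rows -> scan.batchSize(rows));
        return scan;
    }

    public static void main(String[] args) {
        Map<String, String> unset = new HashMap<>();
        Map<String, String> explicit = new HashMap<>();
        explicit.put("batch_size", "16384");

        // Unset: nothing is forwarded, so the value stays empty.
        System.out.println(buildScanOptions(unset).getBatchSize().isPresent());
        // Explicit override: the user's value is forwarded as-is.
        System.out.println(buildScanOptions(explicit).getBatchSize().getAsInt());
    }
}
```

The two cases in `main` mirror the unset-vs-explicit-override coverage described above: the connector never substitutes its own number, so a change to the lance-core default would need no change on the Spark side.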

Verification:

- ran JAVA_HOME=/opt/homebrew/opt/openjdk/libexec/openjdk.jdk/Contents/Home PATH=/opt/homebrew/opt/openjdk/bin:$PATH mvn -pl lance-spark-base_2.12 -Dspotless.skip=true -Dtest=LanceSparkReadOptionsJsonTest,LanceDatasetReadTest test
- result: BUILD SUCCESS

Notes:

- the issue title mentions block_size, but the scan-throughput setting here is batch_size
- this PR intentionally stays read-path only

hamersaw

Rather than setting this to 8192 to coincide with the lance-core default, would it make sense to use an Optional value for it? If unset, then we let it fall back to the lance-core default. This way, if the lance-core default ever changes, we do not need to make updates through this repo.

harsh-ande · 2026-03-24T21:28:18Z

Thanks for your response @hamersaw, that makes a lot of sense. I’ll update the Spark read path so batch_size is only forwarded when explicitly set, and otherwise it will inherit the lance-core default. I’ll also adjust the docs/tests to reflect that instead of pinning Spark to 8192.

harsh-ande · 2026-03-24T22:10:28Z

@hamersaw I updated the PR so Spark no longer hardcodes the read batch_size default. If batch_size is unset, we now leave it unset in lance-spark and let lance-core apply its default. I also updated the read docs to describe the inherited-default behavior and changed the regression coverage to test unset vs explicit override.

Requesting you to please take a look when you get time. Thank you.

hamersaw · 2026-03-25T14:32:22Z

@harsh-ande looks like some of the CI checks are failing. Let's get those fixed up and then I'll make a final pass.

docs: align Spark batch_size docs with 8192 default

0192ff2

github-actions bot added the documentation Improvements or additions to documentation label Mar 21, 2026

harsh-ande mentioned this pull request Mar 21, 2026

perf: set default block size to bigger #153

Open

hamersaw reviewed Mar 23, 2026

Merge remote-tracking branch 'origin' into docs-batch-size-default-8192

a96185b

address pr feedback - inherit Spark read batch_size default from lanc…

eb44df8

…e-core

harsh-ande requested a review from hamersaw March 24, 2026 22:10

harsh-ande wants to merge 3 commits into lance-format:main from harsh-ande:docs-batch-size-default-8192
