docs: align Spark batch_size docs with 8192 default #323
harsh-ande wants to merge 3 commits into lance-format:main
hamersaw left a comment
Rather than setting this to 8192 to coincide with the lance-core default, would it make sense to use an Optional value for it? If it is unset, we fall back to the lance-core default. That way, if the lance-core default ever changes, we do not need to make updates in this repo.
Thanks for your response @hamersaw, that makes a lot of sense. I’ll update the Spark read path so
@hamersaw I updated the PR so Spark no longer hardcodes the read `batch_size`. Could you please take a look when you get time? Thank you.
@harsh-ande it looks like some of the CI checks are failing. Let's get those fixed, and then I'll make a final pass.
This PR addresses #153.
Spark currently hardcodes read `batch_size` in `lance-spark`, which keeps the connector docs tied to a Spark-owned default even though lance-core already provides the underlying default behavior.

This update changes the read path so Spark only forwards `batch_size` when the user explicitly sets it. When `batch_size` is omitted, the connector now inherits the lance-core default.

Changes:
- `batch_size` optional in `LanceSparkReadOptions`
- `ScanOptions.batchSize(...)` when `batch_size` is explicitly provided
- `8192`
- `batch_size` regression with unset-vs-explicit-override coverage

Verification:
`JAVA_HOME=/opt/homebrew/opt/openjdk/libexec/openjdk.jdk/Contents/Home PATH=/opt/homebrew/opt/openjdk/bin:$PATH mvn -pl lance-spark-base_2.12 -Dspotless.skip=true -Dtest=LanceSparkReadOptionsJsonTest,LanceDatasetReadTest test`
`BUILD SUCCESS`

Notes:
- `block_size`, but the scan-throughput setting here is `batch_size`
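
For readers following the thread, here is a minimal sketch of the "forward only when set" pattern the description refers to. It is not the code in this PR: `FakeScanOptions`, `parseBatchSize`, and `buildScan` are hypothetical stand-ins invented for illustration, and the stand-in default of 8192 is taken from the PR title rather than read from lance-core.

```java
import java.util.Map;
import java.util.OptionalInt;

class BatchSizeForwardingSketch {

  /** Hypothetical stand-in for a scan-options builder; NOT the real lance-core class. */
  static final class FakeScanOptions {
    // Assumption for illustration: the core library owns this default.
    static final int CORE_DEFAULT_BATCH_SIZE = 8192;
    private int batchSize = CORE_DEFAULT_BATCH_SIZE;

    FakeScanOptions batchSize(int value) {
      this.batchSize = value;
      return this;
    }

    int effectiveBatchSize() {
      return batchSize;
    }
  }

  /** Parses the user-facing "batch_size" option; empty when the key is absent. */
  static OptionalInt parseBatchSize(Map<String, String> options) {
    String raw = options.get("batch_size");
    if (raw == null) {
      return OptionalInt.empty();
    }
    int value = Integer.parseInt(raw.trim());
    if (value <= 0) {
      throw new IllegalArgumentException("batch_size must be positive: " + value);
    }
    return OptionalInt.of(value);
  }

  /** Forwards batch_size to the builder only when the user supplied it. */
  static FakeScanOptions buildScan(Map<String, String> options) {
    FakeScanOptions scan = new FakeScanOptions();
    parseBatchSize(options).ifPresent(scan::batchSize);
    return scan;
  }

  public static void main(String[] args) {
    // Unset: the connector never touches the value, so the core default applies.
    System.out.println(buildScan(Map.of()).effectiveBatchSize());
    // Explicit override: the user's value is forwarded.
    System.out.println(buildScan(Map.of("batch_size", "512")).effectiveBatchSize());
  }
}
```

In this sketch the connector side holds no numeric default at all, so a change to the core default would need no connector update, which is the point raised in review.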