fix: dfbench respects DATAFUSION_RUNTIME_MEMORY_LIMIT env var#20631
fix: dfbench respects DATAFUSION_RUNTIME_MEMORY_LIMIT env var#20631adriangb merged 1 commit intoapache:mainfrom
Conversation
0e734ab to
1c8ec9b
Compare
|
@alamb a small fix |
| let mut rt_builder = RuntimeEnvBuilder::new(); | ||
| const NUM_TRACKED_CONSUMERS: usize = 5; | ||
| if let Some(memory_limit) = self.memory_limit { | ||
| // Use CLI --memory-limit if provided, otherwise fall back to |
There was a problem hiding this comment.
Perhaps we could use https://docs.rs/datafusion/latest/datafusion/prelude/struct.SessionConfig.html#method.from_env and pick up any environment variables, rather than just the memory limit?
There was a problem hiding this comment.
That only loads the SessionConfig options, the memory limit is part of the RuntimeEnv which comes into play via SessionContext (SessionConfig + RuntimeEnv).
So we'd have to do some refactoring to create SessionContext::from_env(), or add the memory limit to SessionConfig and add RuntimeEnv::new_from_config(&config) or something.
|
Looks like there is a small conflict with this PR |
I'll fix 😄 |
When no `--memory-limit` CLI flag is provided, `runtime_env_builder()` now falls back to the `DATAFUSION_RUNTIME_MEMORY_LIMIT` environment variable. This makes the memory pool configuration consistent with how other `DATAFUSION_*` env vars are handled via `SessionConfig::from_env()`. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1c8ec9b to
341bf3e
Compare
|
@alamb should we have a |
Which issue does this PR close?
N/A — discovered while running benchmarks with
bench.sh.Rationale for this change
When running benchmarks via
bench.sh/dfbench, settingDATAFUSION_RUNTIME_MEMORY_LIMIT=2Gis ignored for memory pool enforcement. MostDATAFUSION_*env vars work becauseSessionConfig::from_env()picks them up, but the memory limit is a special case — it requires constructing aMemoryPoolin theRuntimeEnv, whichdfbenchonly did when--memory-limitwas passed as a CLI flag.What changes are included in this PR?
In
runtime_env_builder(), whenself.memory_limit(CLI flag) isNone, fall back to reading theDATAFUSION_RUNTIME_MEMORY_LIMITenv var using the existingparse_memory_limit()function. The CLI flag still takes precedence when provided.Are these changes tested?
Yes — added
test_runtime_env_builder_reads_env_varwhich sets the env var, constructs aCommonOptwith no CLI memory limit, and verifies the resultingRuntimeEnvhas aFinite(2GB)memory pool.Are there any user-facing changes?
dfbenchnow honors theDATAFUSION_RUNTIME_MEMORY_LIMITenvironment variable as a fallback when--memory-limitis not passed on the command line. No breaking changes — existing CLI flag behavior is unchanged.🤖 Generated with Claude Code