[VL] Reduce Velox scan SQL metrics by default to mitigate driver OOM by lifulong · Pull Request #12127 · apache/gluten

lifulong · 2026-05-22T07:41:13Z

What changes are proposed in this pull request?

Gluten jobs on the Velox backend are more prone to driver memory pressure than vanilla Spark in some production workloads. Investigation points to scan operators registering too many SQL metrics (accumulators).

Each BatchScanExecTransformer / FileSourceScanExecTransformer / HiveTableScanExecTransformer previously registered 30+ executor-side metrics per scan node.

Vanilla Spark is much leaner—for example, BatchScanExec only exposes numOutputRows (+ connector customMetrics), and FileSourceScanExec adds a small set of driver metrics (numFiles, metadataTime, etc.).

This gap increases driver heap usage and can contribute to driver OOM, especially on scan-heavy queries.
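
To make the scaling concrete, here is a rough back-of-the-envelope sketch. It assumes driver-side metric bookkeeping grows roughly with (scan nodes × metrics per scan × tasks); that model, the `metric_slots` helper, and the stage size are illustrative assumptions, not measurements from this PR. Only the "30+" and "9" figures come from the description.

```python
def metric_slots(scan_nodes: int, metrics_per_scan: int, tasks: int) -> int:
    """Rough count of per-task metric values the driver may track for one stage.

    Simplified model (an assumption): every task reports one value for every
    registered metric of every scan node in the stage.
    """
    return scan_nodes * metrics_per_scan * tasks


# Hypothetical stage: 4 scan nodes, 100,000 tasks.
full = metric_slots(4, 30, 100_000)     # "30+" executor metrics per scan node
minimal = metric_slots(4, 9, 100_000)   # 9 executor metrics in the minimal set

print(f"full={full:,} minimal={minimal:,}")
```

Under this simplified model, the minimal set tracks less than a third of the values for the same stage.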

(Driver heap dump analysis during the OOM: the largest memory-consuming object is LiveStageMetrics.)

(Gluten failed in the first scan stage, while vanilla Spark finished successfully with the same 12g driver memory.)

Introduce a Velox-only minimal scan metrics set by default, with an opt-in switch for full metrics collection (for debugging / advanced troubleshooting):
spark.gluten.sql.scan.detailedMetrics.enabled

ClickHouse backend is unchanged—this config does not affect CH scan metrics.

Default minimal metrics (Velox)
BatchScan (9 executor metrics):
rawInputRows, rawInputBytes, numOutputRows, outputBytes, scanTime, wallNanos, peakMemoryBytes, ioWaitTime, storageReadBytes

FileSourceScan / HiveTableScan — above plus Spark-aligned driver metrics:
numFiles, metadataTime, filesSize, numPartitions, pruningTime

Moved to full collection only (when detailed metrics enabled)
Examples include: numInputRows, inputVectors, inputBytes, outputVectors, cpuCount, numMemoryAllocations, skippedSplits, processedSplits, numDynamicFiltersAccepted, loadLazyVectorTime, skippedStrides, processedStrides, connector timing (preloadSplits, pageLoadTime, dataSourceAddSplitTime, dataSourceReadTime), storage cache details (storageReads, localReadBytes, ramReadBytes), etc.
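
The selection rule described above can be sketched as follows. This is a minimal illustration, not the actual Gluten code: the `scan_metrics` helper and the plain-dict `conf` are hypothetical, the switch is assumed to default to false (since it is described as opt-in), and the detailed list contains only some of the examples named above, not the complete set.

```python
DETAILED_METRICS_KEY = "spark.gluten.sql.scan.detailedMetrics.enabled"

# Default minimal executor metrics for Velox scans (from the list above).
MINIMAL_EXECUTOR_METRICS = [
    "rawInputRows", "rawInputBytes", "numOutputRows", "outputBytes",
    "scanTime", "wallNanos", "peakMemoryBytes", "ioWaitTime",
    "storageReadBytes",
]

# Spark-aligned driver metrics added for FileSourceScan / HiveTableScan.
DRIVER_METRICS = [
    "numFiles", "metadataTime", "filesSize", "numPartitions", "pruningTime",
]

# A few of the metrics that are only collected in detailed mode.
DETAILED_ONLY_EXAMPLES = [
    "numInputRows", "inputVectors", "inputBytes", "outputVectors",
    "cpuCount", "numMemoryAllocations", "skippedSplits", "processedSplits",
]


def scan_metrics(scan_type: str, conf: dict) -> list:
    """Return the metric names a scan node would register (hypothetical helper)."""
    # Assumption: the switch is off unless explicitly set to "true".
    detailed = str(conf.get(DETAILED_METRICS_KEY, "false")).lower() == "true"
    names = list(MINIMAL_EXECUTOR_METRICS)
    if scan_type in ("FileSourceScan", "HiveTableScan"):
        names += DRIVER_METRICS
    if detailed:
        names += DETAILED_ONLY_EXAMPLES
    return names


default_batch = scan_metrics("BatchScan", {})
default_file = scan_metrics("FileSourceScan", {})
detailed_batch = scan_metrics("BatchScan", {DETAILED_METRICS_KEY: "true"})
print(len(default_batch), len(default_file), len(detailed_batch))
```

With the switch unset, a BatchScan registers only the 9 minimal metrics and a FileSourceScan registers those plus the 5 driver metrics; setting the key to true adds the detailed ones.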

How was this patch tested?

WIP: testing in our production environment.