-
Notifications
You must be signed in to change notification settings - Fork 207
[AURON #1985] Optimize native metrics retrieval by passing keys directly #1982
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR optimizes native SQL metric registration by letting operators request only the specific native metric keys they need, instead of always creating the full default metric set and filtering afterward.
Changes:
- Refactors
NativeHelper.getDefaultNativeMetricsto accept an explicitSet[String]of metric keys and create only those metrics. - Updates native Spark plan operators to pass their required metric keys directly to
getDefaultNativeMetrics. - Updates native file scan metrics construction to use the new keyed API.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeWindowBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeUnionBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeTakeOrderedBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeSortMergeJoinBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeSortBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeShuffledHashJoinBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeProjectBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeParquetInsertIntoHiveTableBase.scala | Passes explicit metric key set to getDefaultNativeMetrics for native write path metrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeLocalLimitBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeGlobalLimitBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeGenerateBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeFilterBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeExpandBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeBroadcastJoinBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeBroadcastExchangeBase.scala | Switches from “all defaults” to an explicit full key set when registering native metrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeAggBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/ConvertToNativeBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/auron/NativeHelper.scala | Introduces keyed metric creation via defaultNativeMetricCreators; updates file scan metric assembly. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeShuffleExchangeExec.scala | Passes explicit spill/shuffle metric key set to getDefaultNativeMetrics for shuffle write metrics. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| keys -- Set("input_batch_count", "input_row_count", "input_batch_mem_size") | ||
| } | ||
|
|
||
| TreeMap[String, SQLMetric]() ++ enabledKeys.flatMap { key => |
Copilot
AI
Feb 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enabledKeys is a Set, so enabledKeys.flatMap { ... } produces a Set[(String, SQLMetric)]. That forces hashing/equality on SQLMetric instances and allocates an intermediate Set before building the TreeMap. To keep the optimization goal, consider iterating (enabledKeys.iterator.flatMap(...)) and accumulating directly into the TreeMap (or using enabledKeys.iterator.map(...) + collect), which avoids building a Set of metrics first.
| TreeMap[String, SQLMetric]() ++ enabledKeys.flatMap { key => | |
| TreeMap[String, SQLMetric]() ++ enabledKeys.iterator.flatMap { key => |
| "input_batch_count" -> (sc => SQLMetrics.createMetric(sc, "Native.input_batches")), | ||
| "input_row_count" -> (sc => SQLMetrics.createMetric(sc, "Native.input_rows")), | ||
| "input_batch_mem_size" -> (sc => SQLMetrics.createSizeMetric(sc, "Native.input_mem_bytes"))) | ||
|
|
Copilot
AI
Feb 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing getDefaultNativeMetrics from a 1-arg method to a 2-arg method is a source/binary breaking change for any downstream code compiled against this module. If this object is part of a published API surface, consider keeping an overloaded getDefaultNativeMetrics(sc: SparkContext) (possibly deprecated) that delegates to the new implementation with the full default key set.
| def getDefaultNativeMetrics(sc: SparkContext): Map[String, SQLMetric] = { | |
| getDefaultNativeMetrics(sc, defaultNativeMetricCreators.keySet) | |
| } |
ShreyeshArangath
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Which issue does this PR close?
Closes #1985
Rationale for this change
The previous native metrics retrieval fetched all default metrics for every operator. This changes allows specific operators to request only the relevant metric keys, reducing unnecessary overhead.
What changes are included in this PR?
Updated
getDefaultNativeMetricsto accept aSet[String]of metric keys instead of returning a fixed map.Are there any user-facing changes?
No
How was this patch tested?