What is the problem the feature request solves?
After #4328, CometSparkSessionExtensions.isCometLoaded now returns false and logs a warning when spark.comet.exec.shuffle.enabled is true (the default) but spark.shuffle.manager is not set to CometShuffleManager.
This surfaces a long-standing asymmetry in the Spark SQL test diffs:
dev/diffs/{3.4.3,3.5.8,4.0.2,4.1.1}.diff patch SparkSession.applyExtensions to globally add CometSparkSessionExtensions to every SparkSession built in the JVM whenever ENABLE_COMET=true.
The same diffs set spark.shuffle.manager=CometShuffleManager only inside SharedSparkSessionBase.sparkConf and TestHiveContext.
Test suites that build their own SparkConf/SparkSession outside those traits therefore get Comet installed but no shuffle manager. Before #4328 these suites still ran with Comet, just falling back to Spark's default shuffle. After #4328 Comet disables itself entirely for these suites and logs a warning per isCometLoaded invocation.
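The before/after behavior can be modeled in a few lines. This is a minimal Python sketch, not Comet's actual implementation: the function names, the plain dict standing in for the Spark conf, and the warning text are hypothetical, and the fully qualified CometShuffleManager class name is an assumption. Only the config keys come from the description above.

```python
# Simplified model of the gating described above; not Comet's real code.
# Assumption: fully qualified class name of CometShuffleManager.
COMET_SHUFFLE_MANAGER = (
    "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager"
)


def shuffle_manager_ok(conf: dict) -> bool:
    """True unless Comet shuffle is enabled without CometShuffleManager."""
    # spark.comet.exec.shuffle.enabled defaults to true, per the issue text.
    enabled = conf.get("spark.comet.exec.shuffle.enabled", "true").lower() == "true"
    manager = conf.get("spark.shuffle.manager", "")
    return (not enabled) or manager == COMET_SHUFFLE_MANAGER


def is_comet_loaded_before(conf: dict) -> bool:
    # Before #4328: a missing shuffle manager did not disable Comet; shuffles
    # fell back to Spark's default. (All other checks are omitted here.)
    return True


def is_comet_loaded_after(conf: dict, warnings: list) -> bool:
    # After #4328: Comet disables itself and logs a warning on every call.
    if not shuffle_manager_ok(conf):
        # Hypothetical message text, for illustration only.
        warnings.append("Comet disabled: CometShuffleManager is not registered")
        return False
    return True


# A suite that builds its own SparkConf: extension injected, no shuffle manager.
standalone_suite_conf = {}
# A suite using SharedSparkSessionBase: the diff sets the shuffle manager.
shared_session_conf = {"spark.shuffle.manager": COMET_SHUFFLE_MANAGER}

log = []
print(is_comet_loaded_before(standalone_suite_conf))      # True
print(is_comet_loaded_after(standalone_suite_conf, log))  # False
print(is_comet_loaded_after(shared_session_conf, log))    # True
print(len(log))                                           # 1
```

The standalone suite is the case this issue is about: it used to run with Comet and Spark's default shuffle, and now runs without Comet at all.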
The once-per-session log dedup landing alongside this issue keeps the log volume manageable, but the underlying questions are unanswered:
Are these suites supposed to be exercising Comet at all? If yes, the diff needs to register CometShuffleManager for them too (currently a coverage regression).
If no, should we suppress the warning when Comet was auto-injected by the diff rather than explicitly enabled by the user (i.e. only warn when the user set spark.comet.enabled=true themselves)?
The Spark 3.5.8, 4.0.2, and 3.4.3 diffs have the same shape as the 4.1.1 diff, so the suites observed emitting the warning on 4.1.1 should surface there too.
Describe the potential solution
Investigate and decide whether this matters. Possible directions:
Accept as-is: the warning is informative and the per-session dedup keeps log volume manageable. The Comet coverage lost in suites that run without CometShuffleManager is small.
Extend the diffs to register CometShuffleManager for the affected suites (per-suite patches, or a broader hook such as a system property picked up by every SparkConf).
Add an opt-out config (e.g. spark.comet.exec.shuffle.required) so users / tests can keep Comet enabled with Spark's default shuffle manager.
Restrict the disable-on-missing-shuffle-manager logic to sessions where the user explicitly set spark.comet.enabled=true, leaving auto-injected sessions in the prior "Comet on, default shuffle" mode.
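To compare the last two directions with the current behavior, here is a hedged Python sketch of how each would change the decision. Everything in it is illustrative: spark.comet.exec.shuffle.required does not exist in the merged code and its default of true is an assumption, explicitly_set is a hypothetical stand-in for however Comet would tell user-set keys apart from diff-injected ones, and the fully qualified class name is assumed.

```python
from enum import Enum

# Assumption: fully qualified class name of CometShuffleManager.
COMET_SHUFFLE_MANAGER = (
    "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager"
)


class Policy(Enum):
    CURRENT = "current"              # behavior after #4328
    OPT_OUT = "opt_out"              # opt-out config direction
    EXPLICIT_ONLY = "explicit_only"  # only disable when the user enabled Comet


def _flag(conf: dict, key: str, default: str) -> bool:
    return conf.get(key, default).lower() == "true"


def comet_stays_enabled(conf: dict, policy: Policy, explicitly_set=frozenset()) -> bool:
    """Return True if Comet would remain enabled under the given policy."""
    shuffle_enabled = _flag(conf, "spark.comet.exec.shuffle.enabled", "true")
    has_manager = conf.get("spark.shuffle.manager") == COMET_SHUFFLE_MANAGER
    if not shuffle_enabled or has_manager:
        return True  # nothing is missing, so no policy needs to decide
    if policy is Policy.OPT_OUT:
        # Hypothetical config; a default of "true" is assumed here.
        return not _flag(conf, "spark.comet.exec.shuffle.required", "true")
    if policy is Policy.EXPLICIT_ONLY:
        user_enabled = (
            "spark.comet.enabled" in explicitly_set
            and _flag(conf, "spark.comet.enabled", "false")
        )
        return not user_enabled
    return False  # Policy.CURRENT: disable whenever the manager is missing


auto_injected = {}  # Comet added by the test diff, nothing set by the suite
user_enabled = {"spark.comet.enabled": "true"}

print(comet_stays_enabled(auto_injected, Policy.CURRENT))        # False
print(comet_stays_enabled(
    {"spark.comet.exec.shuffle.required": "false"}, Policy.OPT_OUT))  # True
print(comet_stays_enabled(auto_injected, Policy.EXPLICIT_ONLY))  # True
print(comet_stays_enabled(
    user_enabled, Policy.EXPLICIT_ONLY, {"spark.comet.enabled"}))     # False
```

The opt-out needs each affected suite or user to set a key, while the explicit-only policy needs no per-suite change but depends on Comet being able to detect which keys were set explicitly.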
A further open question: should we add the spark.comet.exec.shuffle.required opt-out that the description of #4328 (feat: disable Comet by default when CometShuffleManager is not registered) promised but the merged code did not implement?
Spark 4.1.1 suites observed emitting the warning
Attribution is from the in-progress run https://github.com/apache/datafusion-comet/actions/runs/26177223270 (six of seven 4.1.1 jobs complete; sql_core-1 still running at filing time). Counts are pre-dedup.
BroadcastJoinSuiteAE
BroadcastJoinSuite
ExecutorSideSQLConfSuite
ExpressionInfoSuite
DisableUnnecessaryBucketedScanWithoutHiveSupportSuite
SparkSessionJobTaggingAndCancellationSuite
SparkSessionBuilderSuite
UISeleniumSuite
UISeleniumWithRocksDBBackendSuite
ParquetCommitterSuite
OrcFilterSuite
ParquetV1SchemaPruningSuite
OrcV2SchemaPruningSuite
Additional context
The description of #4328 promised a spark.comet.exec.shuffle.required config, but the merged code gates on COMET_EXEC_SHUFFLE_ENABLED instead.
The warning dedup keyed on SQLConf (per session) lands alongside this issue.
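For readers unfamiliar with the once-per-session dedup, the idea can be pictured as follows. This is a Python sketch under assumptions, not the code that landed: SessionConf is a hypothetical stand-in for a per-session SQLConf, and tracking warned confs in a weak set is one possible approach that the issue does not confirm.

```python
import logging
import weakref

logger = logging.getLogger("comet.sketch")


class SessionConf:
    """Hypothetical stand-in for a per-session SQLConf object."""

    def __init__(self, settings=None):
        self.settings = dict(settings or {})


# Weak references, so a finished session's conf can still be garbage collected.
_warned_confs = weakref.WeakSet()


def warn_once_per_session(conf: SessionConf, message: str) -> bool:
    """Log message the first time conf is seen; return True if it was logged."""
    if conf in _warned_confs:
        return False
    _warned_confs.add(conf)
    logger.warning(message)
    return True


session_a = SessionConf()
session_b = SessionConf()

# Repeated checks within one session produce a single warning.
results = [warn_once_per_session(session_a, "Comet disabled") for _ in range(3)]
print(results)                                               # [True, False, False]
# A different session gets its own warning.
print(warn_once_per_session(session_b, "Comet disabled"))    # True
```

The sketch is not thread-safe; a real implementation would need to guard the membership check and insert if sessions are checked concurrently.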