[SPARK-55639][K8S] Support Recovery-mode K8s Executors #54558
dongjoon-hyun wants to merge 5 commits into apache:master from
Conversation
Could you review this simplified version of
This feature makes sense to me but I have 2 questions:
Thank you for the review, @peter-toth.
Here, a successful recovery should be considered a job completion. If OOM kills executors one by one consecutively due to retries, the job eventually fails without moving to the next stage. And if we consider only a single stage, yes: the set of executors will not change further if there is no executor loss.
Ya, that's a valid corner case. Let me disable this feature for that configuration.
I addressed your comment and revised the PR title, @peter-toth.
Yes, I agree that recovering from an OOM is a huge win.
Would it make sense to set
Of course, that sounds better to me because it's the theoretical minimum. We simply need to revise our abstraction to match this code.
I understand your fail-fast requirement. Technically, you want to give users the right to disable the whole feature, right?
I believe the new
The PR is updated.
Please note that SPARK-55553 ("Document heterogeneous K8s executors") is planned to revisit all config tables and create a new document for this whole feature, @peter-toth.
Thank you, @peter-toth. I'll test more before merging this.
Updated the screenshot. BTW, I observed a funny situation where the recovery-mode executor is much faster than the normal executor in the following case. It's a little counter-intuitive, but it might be due to contention overhead in this simple SparkPi case. 😄
Merged to master.
What changes were proposed in this pull request?
This PR aims to support `Recovery-mode` K8s Executors.

Why are the changes needed?
Frequently, Spark jobs have skewed data and corresponding tasks. In the worst case, a single resource-hungry task becomes a serial killer of executors. The homogeneous executors are unable to handle this kind of situation.
So, this PR introduces a recovery-mode executor when the Spark driver detects an `OOM` situation from the failed executors. A recovery-mode executor aims to provide an executor serving only a single task. In other words, after an OOM, a new executor will be created with the same pod spec (memory and cores) except for the `ENV_EXECUTOR_CORES=spark.task.cpus` environment variable setting, to allow a single task per executor JVM.

EXAMPLE: OOM happens at `Executor 1` and `Executor 3` is created in recovery mode.

Does this PR introduce any user-facing change?
`spark.kubernetes.allocation.recoveryMode.enabled=false` will disable the whole feature.

How was this patch tested?
Pass the CIs with newly added test cases.
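For context, the single-task recovery idea described above can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual code: `PodSpec` and `recoveryPod` are invented names for the sketch, and only the `ENV_EXECUTOR_CORES=spark.task.cpus` override itself comes from this PR.

```scala
// Hypothetical sketch of the recovery-mode idea (not the PR's actual code).
// The replacement pod keeps the same memory and core requests; only the
// ENV_EXECUTOR_CORES environment variable is lowered to spark.task.cpus,
// so the executor JVM accepts a single task at a time.
case class PodSpec(memoryMi: Int, cores: Int, env: Map[String, String])

def recoveryPod(original: PodSpec, taskCpus: Int): PodSpec =
  original.copy(env = original.env + ("ENV_EXECUTOR_CORES" -> taskCpus.toString))

// Example: an OOM-killed 4-core executor is replaced by a pod whose JVM
// runs one task at a time, while the pod resources themselves are unchanged.
val replaced = recoveryPod(PodSpec(memoryMi = 4096, cores = 4, env = Map.empty), taskCpus = 1)
```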
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Gemini 3.1 Pro (High) on Antigravity