direct: Make cluster resize more resilient: fall back to regular update if resize fails due to INVALID_STATE #5716
Merged
10 commits (all by denik):

- 2f1e45b direct: fall back to edit when resize fails with INVALID_STATE
- c9b88bd changelog: add entry for cluster resize fallback fix (#5716)
- 36dde04 acc: add trace to resize-terminated-fallback script
- 94c2ec9 direct: pass PlanEntry to DoResize; reuse DoUpdate for fallback
- 0e73b99 changelog: update resize entry wording
- 8123da5 direct: keep update plan for terminated clusters; fix test to use sav…
- a672468 acc: remove redundant bundle plan text output
- 6f02aea changelog: prefix cluster resize entry with direct:
- f850de7 acc: enable cloud run for resize-terminated-fallback
- 7b95fb7 direct: log debug message when cluster resize falls back to edit
acceptance/bundle/resources/clusters/resize-terminated-fallback/databricks.yml.tmpl (14 additions, 0 deletions)

```yaml
bundle:
  name: test-bundle

workspace:
  root_path: ~/.bundle/$UNIQUE_NAME

resources:
  clusters:
    test_cluster:
      cluster_name: test-cluster-$UNIQUE_NAME
      spark_version: $DEFAULT_SPARK_VERSION
      node_type_id: $NODE_TYPE_ID
      instance_pool_id: $TEST_INSTANCE_POOL_ID
      num_workers: 2
```
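The template deploys the cluster with `num_workers: 2`; the script later bumps it to 3, a change the direct engine plans as a resize while the cluster is running. Commit 8123da5 keeps an update plan for terminated clusters, which suggests the choice between resize and update depends on the cluster state seen at plan time. A minimal sketch of such a rule, assuming (hypothetically) that `num_workers` is the only resizable field and that the state is a plain string; `Action` and `planClusterChange` are illustrative names:

```go
package main

import (
	"fmt"
	"slices"
)

// Action is a hypothetical plan action for a cluster resource.
type Action string

const (
	ActionSkip   Action = "skip"
	ActionResize Action = "resize"
	ActionUpdate Action = "update"
)

// resizableFields is an assumption of this sketch: the fields that can be
// changed through clusters/resize instead of a full clusters/edit.
var resizableFields = []string{"num_workers"}

// planClusterChange picks an action from the changed fields and the cluster
// state observed at plan time. Resize is chosen only when the cluster is
// running and every change is resizable; otherwise a full update is planned.
func planClusterChange(changed []string, state string) Action {
	if len(changed) == 0 {
		return ActionSkip
	}
	if state != "RUNNING" {
		return ActionUpdate
	}
	for _, field := range changed {
		if !slices.Contains(resizableFields, field) {
			return ActionUpdate
		}
	}
	return ActionResize
}

func main() {
	fmt.Println(planClusterChange([]string{"num_workers"}, "RUNNING"))    // resize
	fmt.Println(planClusterChange([]string{"num_workers"}, "TERMINATED")) // update
	fmt.Println(planClusterChange([]string{"spark_version"}, "RUNNING"))  // update
}
```

A plan-time rule like this cannot protect a saved plan: if the cluster stops between `bundle plan` and `bundle deploy --plan`, the entry still says resize, which is the gap the apply-time fallback closes.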
acceptance/bundle/resources/clusters/resize-terminated-fallback/out.test.toml (3 additions, 0 deletions; generated file, not rendered by default)
acceptance/bundle/resources/clusters/resize-terminated-fallback/output.txt (66 additions, 0 deletions)

```text

>>> [CLI] bundle deploy
Uploading bundle files to /Workspace/Users/[USERNAME]/.bundle/[UNIQUE_NAME]/files...
Deploying resources...
Updating deployment state...
Deployment complete!

=== Create a plan while cluster is running: should show resize

>>> [CLI] bundle plan -o json

=== Terminate the cluster before applying the saved plan

>>> [CLI] clusters get [CLUSTER_ID]
{
  "cluster_name": "test-cluster-[UNIQUE_NAME]",
  "num_workers": 2,
  "state": "TERMINATED"
}

=== Apply saved plan: resize fails with INVALID_STATE, falls back to edit

>>> [CLI] bundle deploy --plan plan.json
Uploading bundle files to /Workspace/Users/[USERNAME]/.bundle/[UNIQUE_NAME]/files...
Deploying resources...
Updating deployment state...
Deployment complete!

>>> print_requests.py //clusters/resize //clusters/edit
{
  "method": "POST",
  "path": "/api/2.1/clusters/resize",
  "body": {
    "cluster_id": "[CLUSTER_ID]",
    "num_workers": 3
  }
}
{
  "method": "POST",
  "path": "/api/2.1/clusters/edit",
  "body": {
    "autotermination_minutes": 60,
    "cluster_id": "[CLUSTER_ID]",
    "cluster_name": "test-cluster-[UNIQUE_NAME]",
    "instance_pool_id": "[TEST_INSTANCE_POOL_ID]",
    "num_workers": 3,
    "spark_version": "13.3.x-snapshot-scala2.12"
  }
}

=== Cluster should have new num_workers

>>> [CLI] clusters get [CLUSTER_ID]
{
  "cluster_name": "test-cluster-[UNIQUE_NAME]",
  "num_workers": 3
}

>>> [CLI] bundle destroy --auto-approve
The following resources will be deleted:
  delete resources.clusters.test_cluster

All files and directories at the following location will be deleted: /Workspace/Users/[USERNAME]/.bundle/[UNIQUE_NAME]

Deleting files...
Destroy complete!
```
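The recorded requests show the asymmetry behind the fallback: the resize body carries only `cluster_id` and `num_workers`, while the edit body carries the whole cluster spec. A sketch of building both bodies from one desired configuration, using a trimmed-down hypothetical `clusterSpec` struct whose JSON names follow the recorded bodies (`resizeBody` and `editBody` are illustrative helpers, not CLI functions):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clusterSpec is a hypothetical, trimmed-down model of the desired cluster
// configuration; JSON names follow the request bodies recorded by the test.
type clusterSpec struct {
	ClusterID              string `json:"cluster_id"`
	ClusterName            string `json:"cluster_name,omitempty"`
	SparkVersion           string `json:"spark_version"`
	InstancePoolID         string `json:"instance_pool_id,omitempty"`
	NumWorkers             int    `json:"num_workers"`
	AutoterminationMinutes int    `json:"autotermination_minutes,omitempty"`
}

// resizeBody keeps only what clusters/resize needs: the ID and the new size.
func resizeBody(s clusterSpec) ([]byte, error) {
	return json.Marshal(struct {
		ClusterID  string `json:"cluster_id"`
		NumWorkers int    `json:"num_workers"`
	}{s.ClusterID, s.NumWorkers})
}

// editBody sends the full desired spec (as in the recorded edit request),
// not just the changed field.
func editBody(s clusterSpec) ([]byte, error) {
	return json.Marshal(s)
}

func main() {
	spec := clusterSpec{
		ClusterID:              "1234",
		ClusterName:            "test-cluster",
		SparkVersion:           "13.3.x-snapshot-scala2.12",
		InstancePoolID:         "pool-1",
		NumWorkers:             3,
		AutoterminationMinutes: 60,
	}
	r, err := resizeBody(spec)
	if err != nil {
		panic(err)
	}
	e, err := editBody(spec)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(r)) // {"cluster_id":"1234","num_workers":3}
	fmt.Println(string(e))
}
```

Building both bodies from the same desired spec is what lets a failed resize be retried as an edit without losing the size change.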
acceptance/bundle/resources/clusters/resize-terminated-fallback/script (27 additions, 0 deletions)

```bash
envsubst < databricks.yml.tmpl > databricks.yml

cleanup() {
    trace $CLI bundle destroy --auto-approve
    rm -f out.requests.txt
}
trap cleanup EXIT

trace $CLI bundle deploy

CLUSTER_ID=$($CLI bundle summary -o json | jq -r '.resources.clusters.test_cluster.id')
echo "$CLUSTER_ID:CLUSTER_ID" >> ACC_REPLS

title "Create a plan while cluster is running: should show resize\n"
update_file.py databricks.yml "num_workers: 2" "num_workers: 3"
trace $CLI bundle plan -o json > plan.json

title "Terminate the cluster before applying the saved plan\n"
$CLI clusters delete "$CLUSTER_ID" > /dev/null
trace $CLI clusters get "$CLUSTER_ID" | jq '{cluster_name,num_workers,state}'

title "Apply saved plan: resize fails with INVALID_STATE, falls back to edit\n"
trace $CLI bundle deploy --plan plan.json
trace print_requests.py //clusters/resize //clusters/edit

title "Cluster should have new num_workers\n"
trace $CLI clusters get "$CLUSTER_ID" | jq '{cluster_name,num_workers}'
```
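The script's sequence (deploy, plan while running, terminate, apply the saved plan) can be replayed offline against an in-memory fake. `fakeClusters`, `applyResize`, and the rule that resize rejects any non-RUNNING cluster with INVALID_STATE are assumptions made for illustration; the real test runs locally and against a workspace (both `Local` and `Cloud` are enabled in test.toml).

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalidState stands in for an API error with code INVALID_STATE.
var errInvalidState = errors.New("INVALID_STATE")

// fakeClusters is a hypothetical in-memory stand-in for the Clusters API.
type fakeClusters struct {
	state      string
	numWorkers int
	requests   []string // request paths in call order
}

// Resize assumes the service only resizes running clusters.
func (f *fakeClusters) Resize(numWorkers int) error {
	f.requests = append(f.requests, "/api/2.1/clusters/resize")
	if f.state != "RUNNING" {
		return fmt.Errorf("cluster is %s: %w", f.state, errInvalidState)
	}
	f.numWorkers = numWorkers
	return nil
}

// Edit updates the configuration regardless of state.
func (f *fakeClusters) Edit(numWorkers int) error {
	f.requests = append(f.requests, "/api/2.1/clusters/edit")
	f.numWorkers = numWorkers
	return nil
}

// Delete terminates the cluster, as `clusters delete` does in the script.
func (f *fakeClusters) Delete() { f.state = "TERMINATED" }

// applyResize executes a saved "resize" plan entry, falling back to edit.
func applyResize(f *fakeClusters, numWorkers int) error {
	err := f.Resize(numWorkers)
	if errors.Is(err, errInvalidState) {
		return f.Edit(numWorkers)
	}
	return err
}

func main() {
	c := &fakeClusters{state: "RUNNING", numWorkers: 2}
	planned := 3 // planned while running, so the saved entry is a resize
	c.Delete()   // terminated before the saved plan is applied
	if err := applyResize(c, planned); err != nil {
		fmt.Println("deploy failed:", err)
		return
	}
	fmt.Println(c.requests, c.numWorkers)
	// [/api/2.1/clusters/resize /api/2.1/clusters/edit] 3
}
```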
acceptance/bundle/resources/clusters/resize-terminated-fallback/test.toml (8 additions, 0 deletions)

```toml
Local = true
Cloud = true
RecordRequests = true

Ignore = [".databricks", "databricks.yml", "plan.json"]

[EnvMatrix]
DATABRICKS_BUNDLE_ENGINE = ["direct"]
```
Review comment: Let's log (at info or debug level) that the resize failed and we fell back to an update; it might be useful later.
Reply: Good idea, added.