Skip to content

direct: fix persistent drift on model_serving_endpoints#5708

Open
pietern wants to merge 2 commits into
mainfrom
investigate-model-serving
Open

direct: fix persistent drift on model_serving_endpoints#5708
pietern wants to merge 2 commits into
mainfrom
investigate-model-serving

Conversation

@pietern

@pietern pietern commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Changes

Restore suppression of the backend-defaulted config.traffic_config (as a backend_default) and budget_policy_id (ignore_remote_changes) on model_serving_endpoints, make Changes.HasChange skip-aware so a suppressed-only change doesn't trigger a spurious section update, and teach the testserver to default traffic_config like the backend so the drift is covered locally.

Why

The backend always returns a defaulted config.traffic_config (100% to the served entity) even when the user specifies none. #5693 dropped the ignore_remote_changes entries for traffic_config and budget_policy_id, assuming the new "skip fields missing from RemoteType" auto-skip covered them — but both fields are present in RemoteType, so the auto-skip never fires and the direct engine reported a permanent update after every clean deploy.

traffic_config is a true backend default the user may also set, so backend_default (skip only when the local value is empty; real drift still detected when set) is the right tool rather than ignore_remote_changes. budget_policy_id is never echoed by GET, so it stays under ignore_remote_changes.

Making the testserver faithful surfaced a latent over-update: HasChange counted skipped changes, so the suppressed traffic_config triggered a spurious PUT /config on unrelated updates. HasChange is now skip-aware (matching HasChangeExcept).

Tests

running-endpoint now reproduces the drift locally (fails without the fix). Passes on both engines on real cloud (aws-prod-ucws). Full ./task test / fmt / lint / checks pass.

This PR and its description were written by Isaac.

pietern added 2 commits June 24, 2026 17:14
The backend always returns a defaulted config.traffic_config (100% to the
served entity) even when the user does not specify one. #5693 dropped the
suppression for it (and for budget_policy_id) assuming the new
missing-in-remote auto-skip covered them, but both are present in RemoteType
so the auto-skip never fires, and the direct engine reported a permanent
update after every clean deploy.

Suppress config.traffic_config as a backend_default (skip only when the user
leaves it empty; real drift is still detected if it is set) and keep
budget_policy_id under ignore_remote_changes (GET never echoes it).

Make Changes.HasChange skip-aware so a suppressed-only change under a section
no longer triggers a spurious section update (e.g. PUT /config on an
unrelated email/tags update).

Teach the testserver to default traffic_config like the backend so the
running-endpoint acceptance test reproduces the drift locally.

Co-authored-by: Isaac
Co-authored-by: Isaac
@pietern pietern temporarily deployed to test-trigger-is June 24, 2026 15:15 — with GitHub Actions Inactive
@pietern pietern temporarily deployed to test-trigger-is June 24, 2026 15:15 — with GitHub Actions Inactive
@github-actions

Copy link
Copy Markdown
Contributor

Approval status: pending

/acceptance/bundle/ - needs approval

14 files changed
Suggested: @denik
Also eligible: @janniklasrose, @andrewnester, @anton-107, @shreyas-goenka, @lennartkats-db

/bundle/ - needs approval

Files: bundle/deployplan/plan.go, bundle/direct/dresources/resources.yml
Suggested: @denik
Also eligible: @janniklasrose, @andrewnester, @anton-107, @shreyas-goenka, @lennartkats-db

General files (require maintainer)

Files: NEXT_CHANGELOG.md, libs/testserver/serving_endpoints.go
Based on git history:

  • @denik -- recent work in bundle/deployplan/, bundle/direct/dresources/, ./

Any maintainer (@andrewnester, @anton-107, @denik, @shreyas-goenka, @simonfaltum, @renaudhartert-db) can approve all areas.
See OWNERS for ownership rules.

@denik

denik commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

how did you find this?

Comment thread bundle/deployplan/plan.go
)

// HasChange checks if there are any changes for fields with the given prefix.
// HasChange checks if there are any actionable changes for fields with the given prefix.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HasChange is used in a few places:

bind.go
dresources/app.go
dresources/cluster.go
dresources/model_serving_endpoint.go
dresources/sql_warehouse.go
dresources/vector_search_endpoint.go

are all of them safe to change this? should we PR this change separately (maybe we a dedicated regression test).

@eng-dev-ecosystem-bot

Copy link
Copy Markdown
Collaborator

Integration test report

Commit: e52b568

Run: 28109124119

Env ❌​FAIL 🔄​flaky 💚​RECOVERED 🙈​SKIP ✅​pass 🙈​skip Time
💚​ aws linux 7 13 244 1024 5:15
💚​ aws windows 7 13 246 1022 6:46
💚​ aws-ucws linux 7 13 334 940 6:27
💚​ aws-ucws windows 7 13 336 938 7:34
🔄​ azure linux 5 15 243 1022 12:39
❌​ azure windows 1 2 1 15 246 1020 13:43
❌​ azure-ucws linux 3 1 1 15 335 936 14:14
❌​ azure-ucws windows 3 1 15 338 934 11:02
💚​ gcp linux 1 15 246 1024 5:09
💚​ gcp windows 1 15 248 1022 4:14
28 interesting tests: 13 SKIP, 6 flaky, 6 RECOVERED, 3 FAIL
Test Name aws linux aws windows aws-ucws linux aws-ucws windows azure linux azure windows azure-ucws linux azure-ucws windows gcp linux gcp windows
🔄​ TestAccept 💚​R 💚​R 💚​R 💚​R 🔄​f 💚​R 💚​R 💚​R 💚​R 💚​R
🔄​ TestAccept/bundle/deployment/bind/alert 🙈​s 🙈​s 🙈​s 🙈​s 🔄​f ✅​p ✅​p ✅​p ✅​p ✅​p
🔄​ TestAccept/bundle/deployment/bind/alert/DATABRICKS_BUNDLE_ENGINE=direct 🔄​f ✅​p ✅​p ✅​p ✅​p ✅​p
🔄​ TestAccept/bundle/generate/alert ✅​p ✅​p ✅​p ✅​p 🔄​f ✅​p ✅​p ✅​p ✅​p ✅​p
🔄​ TestAccept/bundle/generate/alert/DATABRICKS_BUNDLE_ENGINE=terraform ✅​p ✅​p ✅​p ✅​p 🔄​f ✅​p ✅​p ✅​p ✅​p ✅​p
🙈​ TestAccept/bundle/invariant/no_drift 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/permissions 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
💚​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions 💚​R 💚​R 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
💚​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct 💚​R 💚​R 💚​R 💚​R
💚​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 💚​R 💚​R 💚​R 💚​R
💚​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions 💚​R 💚​R 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
💚​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct 💚​R 💚​R 💚​R 💚​R
💚​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 💚​R 💚​R 💚​R 💚​R
🙈​ TestAccept/bundle/resources/postgres_branches/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/replace_existing 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/update_protected 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/without_branch_id 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_projects/update_display_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/synced_database_tables/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/recreate/embedding_dimension 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/ssh/connection 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🔄​ TestSyncFullFileSync ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p 🔄​f ✅​p ✅​p ✅​p
❌​ TestFetchRepositoryInfoAPI_FromRepo ✅​p ✅​p ✅​p ✅​p ✅​p 🔄​f ❌​F ❌​F ✅​p ✅​p
❌​ TestFetchRepositoryInfoAPI_FromRepo/root ✅​p ✅​p ✅​p ✅​p ✅​p 🔄​f ❌​F ❌​F ✅​p ✅​p
❌​ TestFetchRepositoryInfoAPI_FromRepo/subdir ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ❌​F ❌​F ✅​p ✅​p
Top 17 slowest tests (at least 2 minutes):
duration env testname
4:03 azure windows TestAccept/bundle/run_as/job_default/DATABRICKS_BUNDLE_ENGINE=direct
3:42 azure windows TestAccept/bundle/deployment/bind/job/generate-and-bind/DATABRICKS_BUNDLE_ENGINE=direct
3:28 azure windows TestAccept/bundle/generate/auto-bind/DATABRICKS_BUNDLE_ENGINE=terraform
3:12 azure windows TestAccept
3:08 aws windows TestAccept
3:03 aws-ucws windows TestAccept
2:58 azure-ucws windows TestAccept
2:48 azure windows TestAccept/bundle/deploy/mlops-stacks/DATABRICKS_BUNDLE_ENGINE=terraform
2:45 azure-ucws windows TestAccept/bundle/generate/auto-bind/DATABRICKS_BUNDLE_ENGINE=terraform
2:37 azure linux TestAccept/bundle/generate/auto-bind/DATABRICKS_BUNDLE_ENGINE=terraform
2:36 azure linux TestAccept/bundle/deployment/bind/job/generate-and-bind/DATABRICKS_BUNDLE_ENGINE=terraform
2:35 gcp windows TestAccept
2:27 azure windows TestAccept/bundle/deploy/mlops-stacks/DATABRICKS_BUNDLE_ENGINE=direct
2:26 azure-ucws windows TestAccept/bundle/deploy/mlops-stacks/DATABRICKS_BUNDLE_ENGINE=direct
2:13 azure linux TestAccept/bundle/deploy/mlops-stacks/DATABRICKS_BUNDLE_ENGINE=direct
2:12 azure-ucws windows TestAccept/bundle/destroy/jobs-and-pipeline/DATABRICKS_BUNDLE_ENGINE=terraform
2:06 azure linux TestAccept/bundle/destroy/jobs-and-pipeline/DATABRICKS_BUNDLE_ENGINE=direct

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants