
experimental/ssh: upload connect binaries over HTTP/1.1 #5703

Open
anton-107 wants to merge 1 commit into main from ssh-binary-upload-http11

@anton-107

Contributor

Why

databricks ssh connect uploads the ~14 MB CLI bundle to the workspace files
import-file API during the "Uploading binaries…" step. Behind some corporate
networks this fails with:

stream error: stream ID N; NO_ERROR; received from peer

That is an HTTP/2 stream reset: an intermediary (corporate egress proxy / VPN /
WAF / load balancer) tears down the large request body mid-upload. The symptom
has two causes that look identical:

  1. A genuine request-body size limit at the intermediary: this fails over
    HTTP/1.1 too, and only a network-policy change fixes it.
  2. An interop problem between Go's HTTP/2 transport and the proxy, which
    resets the stream with RST_STREAM(NO_ERROR) mid-body. Go can't retry a
    partially sent, non-idempotent POST, so it surfaces a fatal error.
    HTTP/1.1 avoids this entirely: a customer confirmed that
    GODEBUG=http2client=0 (which forces HTTP/1.1) fixed the upload with no
    network change.

A single large POST gains nothing from HTTP/2, so we can sidestep this whole
class of HTTP/2 resets for this one upload.

What

  • Upload the binaries over an HTTP/1.1-only client, scoped to that single
    upload; the rest of the connect flow keeps using HTTP/2.
  • Add filer.NewWorkspaceFilesClientWithClient so the upload can run on a
    dedicated client without copying or mutating the shared config.Config
    (it embeds a sync.Mutex).
  • newHTTP11Transport clones the resolved config's transport and disables
    HTTP/2 via a non-nil, empty TLSNextProto map.
  • Broaden the upload-error hint (isStreamResetError → isProxyUploadError)
    to also catch the HTTP/1.1 signatures of a genuine intermediary body-size
    limit (413, connection reset). Since the upload now always uses HTTP/1.1,
    the reworded message no longer points at HTTP/2; instead it tells the user
    to ask their network admin to allow large uploads or to switch networks.

Testing

  • Unit: isProxyUploadError (413 / connection-reset / HTTP/2 / negatives),
    newHTTP11Transport disables HTTP/2, NewWorkspaceFilesClientWithClient
    wiring.
  • Acceptance: acceptance/ssh/connection exercises the full upload over the new
    client against the testserver; recorded output is unchanged.
  • go vet and golangci-lint are clean.

See DECO-27497.

This pull request and its description were written by Isaac.

`databricks ssh connect` uploads the ~14 MB CLI bundle to the workspace
files import-file API during the "Uploading binaries..." step. Some
corporate proxies reset large HTTP/2 request bodies with
RST_STREAM(NO_ERROR), surfacing as `stream error: stream ID N; NO_ERROR;
received from peer` and aborting the connection. A single large POST gains
nothing from HTTP/2, so force HTTP/1.1 for this one upload client, which
sidesteps the HTTP/2 interop entirely while leaving the rest of the connect
flow on HTTP/2.

Add filer.NewWorkspaceFilesClientWithClient so the upload can run on a
dedicated HTTP/1.1 client without copying or mutating the shared config.
Broaden the upload-error hint to also cover the HTTP/1.1 signatures of a
genuine intermediary body-size limit (413, connection reset), since those
(not the HTTP/2 reset) are what remains after forcing HTTP/1.1.

See DECO-27497.

Co-authored-by: Isaac
@anton-107 anton-107 temporarily deployed to test-trigger-is June 24, 2026 13:44 — with GitHub Actions Inactive
@eng-dev-ecosystem-bot

Collaborator

Integration test report

Commit: 370cd2b

Run: 28103037799

Env 🟨​KNOWN 🔄​flaky 💚​RECOVERED 🙈​SKIP ✅​pass 🙈​skip Time
🟨​ aws linux 7 13 244 1024 5:41
🟨​ aws windows 7 13 246 1022 8:00
💚​ aws-ucws linux 7 13 334 940 5:17
💚​ aws-ucws windows 7 13 336 938 5:34
💚​ azure linux 1 15 247 1022 7:04
🔄​ azure windows 3 15 247 1020 6:25
💚​ azure-ucws linux 1 15 339 936 6:18
💚​ azure-ucws windows 1 15 341 934 5:50
💚​ gcp linux 1 15 246 1024 4:39
💚​ gcp windows 1 15 248 1022 5:13
22 interesting tests: 13 SKIP, 7 KNOWN, 2 flaky
Test Name aws linux aws windows aws-ucws linux aws-ucws windows azure linux azure windows azure-ucws linux azure-ucws windows gcp linux gcp windows
🟨​ TestAccept 🟨​K 🟨​K 💚​R 💚​R 💚​R 🔄​f 💚​R 💚​R 💚​R 💚​R
🔄​ TestAccept/bundle/generate/alert ✅​p ✅​p ✅​p ✅​p ✅​p 🔄​f ✅​p ✅​p ✅​p ✅​p
🔄​ TestAccept/bundle/generate/alert/DATABRICKS_BUNDLE_ENGINE=direct ✅​p ✅​p ✅​p ✅​p ✅​p 🔄​f ✅​p ✅​p ✅​p ✅​p
🙈​ TestAccept/bundle/invariant/no_drift 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/permissions 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions 🟨​K 🟨​K 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions 🟨​K 🟨​K 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 🟨​K 💚​R 💚​R
🙈​ TestAccept/bundle/resources/postgres_branches/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/replace_existing 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/update_protected 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/without_branch_id 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_projects/update_display_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/synced_database_tables/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/recreate/embedding_dimension 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/ssh/connection 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
Top 3 slowest tests (at least 2 minutes):
duration env testname
3:22 gcp windows TestAccept
3:21 aws-ucws windows TestAccept
3:08 azure-ucws windows TestAccept

@anton-107 anton-107 marked this pull request as ready for review June 24, 2026 14:42
@github-actions

Contributor

Approval status: pending

/libs/filer/ - needs approval

Files: libs/filer/workspace_files_client.go, libs/filer/workspace_files_client_test.go
Suggested: @Divyansh-db
Also eligible: @simonfaltum, @renaudhartert-db, @hectorcast-db, @parthban-db, @tanmay-db, @tejaskochar-db, @mihaimitrea-db, @chrisst, @rauchy

General files (require maintainer)

Files: experimental/ssh/internal/client/releases.go, experimental/ssh/internal/client/releases_test.go
Based on git history:

  • @pietern -- recent work in libs/filer/, experimental/ssh/internal/client/

Any maintainer (@andrewnester, @denik, @pietern, @shreyas-goenka, @simonfaltum, @renaudhartert-db) can approve all areas.
See OWNERS for ownership rules.
