Add bounded SEA API support for CloudFetch (UseBoundedSeaApi) by gopalldb · Pull Request #1468 · databricks/databricks-jdbc

gopalldb · 2026-05-25T09:30:06Z

Summary

Bounded SEA API compliance for CloudFetch. Brings the driver into alignment with the bounded SEA API contract for OSS drivers. Gated behind the UseBoundedSeaApi connection property (default 0, off).

When UseBoundedSeaApi=1 is set:

Sends row_offset on GetResultData requests: appended as a query parameter, as in /sql/statements/{id}/result/chunks/{idx}?row_offset=N. The value is derived from the previous link's row_offset + row_count. Required for future >100GB results and cluster-side fetch.
Forces StreamingChunkProvider, which discovers chunk links incrementally via next_chunk_index chaining (not total_chunk_count). Downloads from presigned URLs are still concurrent; "streaming" refers to link discovery, not data transfer. This bypasses the legacy RemoteChunkProvider, which pre-allocates a fixed-size chunk map from the deprecated total_chunk_count manifest field.
Batched link refresh reconciliation: during coalesced link refresh on URL expiry, newly discovered chunks in the server response are added to the provider's map (previously they were silently skipped). The end-of-stream flag is updated from the refresh response. This follows the per-chunk state-machine reconciliation in the bounded SEA API spec.
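
The row_offset derivation described above can be sketched roughly as follows. This is a minimal illustration, not the driver's code: `LinkInfo`, `RowOffsetSketch`, and both method names are hypothetical, and the path is copied from the description as written.

```java
/** Hypothetical stand-in for a chunk link; holds only the fields the summary mentions. */
final class LinkInfo {
  final long chunkIndex;
  final long rowOffset;
  final long rowCount;

  LinkInfo(long chunkIndex, long rowOffset, long rowCount) {
    this.chunkIndex = chunkIndex;
    this.rowOffset = rowOffset;
    this.rowCount = rowCount;
  }
}

final class RowOffsetSketch {
  /** Offset of the chunk that follows prev; 0 when there is no previous link (chunk 0). */
  static long nextRowOffset(LinkInfo prev) {
    return prev == null ? 0L : prev.rowOffset + prev.rowCount;
  }

  /** Request path with row_offset appended as a query parameter. */
  static String resultChunkPath(String statementId, long chunkIndex, long rowOffset) {
    return "/sql/statements/" + statementId + "/result/chunks/" + chunkIndex
        + "?row_offset=" + rowOffset;
  }
}
```

For example, if chunk 0 starts at row 0 and holds 1000 rows, the request for chunk 1 would carry row_offset=1000 under this sketch.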

Default: off (UseBoundedSeaApi=0). No behavior change unless explicitly enabled.
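
As a rough illustration of opting in, the property can be appended to the connection URL like any other semicolon-delimited driver property. The helper below is hypothetical, and the host and httpPath in the usage note are placeholders, not real endpoints.

```java
final class BoundedSeaUrlSketch {
  /** Appends UseBoundedSeaApi=1 or =0 to a semicolon-delimited JDBC URL. */
  static String withBoundedSea(String baseUrl, boolean enabled) {
    String separator = baseUrl.endsWith(";") ? "" : ";";
    return baseUrl + separator + "UseBoundedSeaApi=" + (enabled ? "1" : "0");
  }
}
```

Calling `withBoundedSea("jdbc:databricks://<host>:443;httpPath=<path>", true)` would yield a URL ending in `;UseBoundedSeaApi=1`. Leaving the property out keeps the default (off).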

What already works (no changes needed)

StreamingChunkProvider (the default SEA CloudFetch provider) already:

Uses lastLink.getNextChunkIndex() to chain to the next batch of links
Detects end-of-stream when nextChunkIndex == null on the last link
Does NOT depend on total_chunk_count for control flow
Downloads chunks concurrently via a thread pool (maxChunksInMemory parallel downloads)
Prefetches links ahead of the consumer via a dedicated prefetch thread
Has coalesced link refresh with refetchLock dedup on URL expiry
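
The chaining and end-of-stream behavior listed above can be sketched as a simple loop. This is an assumption-laden illustration: `ChunkLink`, `LinkChainingSketch`, and the `fetchLinksFrom` callback are hypothetical names, and the real provider does this on a prefetch thread rather than in one blocking call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongFunction;

/** Hypothetical link carrying only the chaining fields. */
final class ChunkLink {
  final long chunkIndex;
  final Long nextChunkIndex; // null marks the end of the stream

  ChunkLink(long chunkIndex, Long nextChunkIndex) {
    this.chunkIndex = chunkIndex;
    this.nextChunkIndex = nextChunkIndex;
  }
}

final class LinkChainingSketch {
  /** Walks link batches from chunk 0 until the last link of a batch has no next index. */
  static List<Long> discoverAll(LongFunction<List<ChunkLink>> fetchLinksFrom) {
    List<Long> discovered = new ArrayList<>();
    Long next = 0L;
    while (next != null) {
      List<ChunkLink> batch = fetchLinksFrom.apply(next);
      if (batch.isEmpty()) {
        break; // defensive assumption: an empty batch is treated as end of stream
      }
      for (ChunkLink link : batch) {
        discovered.add(link.chunkIndex);
      }
      // Chain from the last link of the batch; total_chunk_count is never consulted.
      next = batch.get(batch.size() - 1).nextChunkIndex;
    }
    return discovered;
  }
}
```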

Changes

| File | Change |
| --- | --- |
| `DatabricksJdbcUrlParams.java` | New `UseBoundedSeaApi` property (default `0`) |
| `IDatabricksConnectionContext.java` | New `isBoundedSeaApiEnabled()` method |
| `DatabricksConnectionContext.java` | Implementation |
| `DatabricksSdkClient.java` | Send `?row_offset=N` on GetResultData when gate is on |
| `ArrowStreamResult.java` | Force `StreamingChunkProvider` when gate is on |
| `SeaChunkLinkFetcher.java` | Updated comments |
| `StreamingChunkProvider.java` | Reconcile newly-discovered chunks from refresh response; update endOfStream from refresh |

NO_CHANGELOG=true

Test plan

ArrowStreamResultTest — 15 tests pass
DatabricksSdkClientTest — 42 tests pass
DatabricksConnectionContextTest — 131 tests pass
Total: 188 tests, 0 failures
With UseBoundedSeaApi=0 (default): zero behavior change
E2E test with UseBoundedSeaApi=1 against live warehouse (follow-up)

This pull request was AI-assisted by Isaac.

Part 1 of bounded SEA API compliance for CloudFetch:

1. New connection property UseBoundedSeaApi (default 0/off). When enabled:
   - Sends row_offset query parameter on GetResultData requests
   - Forces StreamingChunkProvider (which uses next_chunk_index, not total_chunk_count) even when streaming is explicitly disabled
2. StreamingChunkProvider already uses next_chunk_index for continuation and end-of-stream detection; no changes needed to its core logic.
3. Legacy RemoteChunkProvider (uses total_chunk_count) is bypassed when bounded SEA is enabled.

row_offset is derived from the previous link's row_offset + row_count and sent as a query parameter on /sql/statements/{id}/result/chunks/{idx}. This is required for future >100GB results and cluster-side fetch.

Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>

During coalesced link refresh, the server may return links for chunks not yet in the provider's map (newly discovered chunks beyond highestKnownChunkIndex). Previously these were silently skipped. Now: create new chunks from refresh response links, update highestKnownChunkIndex, and set endOfStreamReached from the response's hasMore flag. Follows the per-chunk state-machine reconciliation from the bounded SEA API spec.

Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
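
The reconciliation step in this commit might look roughly like the sketch below. The class, the `RefreshedLink` type, and the URL-only chunk map are hypothetical simplifications. The field names highestKnownChunkIndex and endOfStreamReached come from the commit message, but setting the flag only when hasMore is false is an assumption about how the flag is applied.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical link returned by a refresh call. */
final class RefreshedLink {
  final long chunkIndex;
  final String url;

  RefreshedLink(long chunkIndex, String url) {
    this.chunkIndex = chunkIndex;
    this.url = url;
  }
}

final class ReconcileSketch {
  final Map<Long, String> chunkUrls = new HashMap<>();
  long highestKnownChunkIndex = -1;
  boolean endOfStreamReached = false;

  /** Applies a refresh response; returns how many chunks were newly created. */
  synchronized int reconcile(List<RefreshedLink> links, boolean hasMore) {
    int created = 0;
    for (RefreshedLink link : links) {
      // Known chunks get a fresh URL; unknown chunks are added instead of skipped.
      if (chunkUrls.put(link.chunkIndex, link.url) == null) {
        created++;
      }
      highestKnownChunkIndex = Math.max(highestKnownChunkIndex, link.chunkIndex);
    }
    if (!hasMore) {
      endOfStreamReached = true; // assumption: never reset to false by a later refresh
    }
    return created;
  }
}
```

A non-zero return value is the point where a caller would want to schedule downloads for the new chunks, so they do not sit in a pending state.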

Fixes 3 gaps found by comparing with the legacy ChunkDownloadTask:

1. Outer catch(Throwable) + exception chaining in finally: uncaught exceptions were lost because the finally block created a generic exception without the original cause. Now captures uncaughtException and chains it, matching ChunkDownloadTask's pattern.
2. Thread context propagation: download threads had no connection context or statementId for telemetry/logging. Now captures caller's context via DatabricksThreadContextHolder and clears in finally.
3. Download timing: added task-level timing log (totalMs, retries) matching ChunkDownloadTask's diagnostics.

Also includes the RuntimeException catch (parity with PR databricks#1302).

Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
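
The first and third fixes follow a common pattern that can be sketched as below. This is not the actual task class: `DownloadTaskSketch`, its fields, and the exception message are hypothetical, and retries and thread-context handling are left out.

```java
final class DownloadTaskSketch {
  enum Status { PENDING, DOWNLOAD_SUCCEEDED, DOWNLOAD_FAILED }

  Status status = Status.PENDING;
  Exception failure;  // set only on failure; carries the original cause
  long totalMs = -1;  // task-level timing, filled in by the finally block

  void run(Runnable download) {
    long startNanos = System.nanoTime();
    Throwable uncaught = null;
    boolean succeeded = false;
    try {
      download.run();
      succeeded = true;
    } catch (Throwable t) {
      uncaught = t; // remember the cause instead of losing it
    } finally {
      if (succeeded) {
        status = Status.DOWNLOAD_SUCCEEDED;
      } else {
        // The status is set in one place, and the original throwable is chained as the cause.
        status = Status.DOWNLOAD_FAILED;
        failure = new RuntimeException("Chunk download failed", uncaught);
      }
      totalMs = (System.nanoTime() - startNanos) / 1_000_000;
    }
  }
}
```

With this shape, whoever inspects `failure` can reach the root cause through `getCause()` rather than seeing only a generic error.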

P0-1: Remove redundant chunk.setStatus(DOWNLOAD_FAILED) in inner catch; defer entirely to finally block. Fixes StreamingChunkDownloadTaskTest.
P0-2: Add NEXT_CHANGELOG.md entry under ### Added for UseBoundedSeaApi.
P1-1: Call triggerDownloads() after reconciliation creates new chunks from refresh response; prevents newly-discovered chunks sitting PENDING.
P1-2/P1-3: Un-gated changes (new chunk creation, EOS from refresh, triggerDownloads) are intentional parity fixes for all EnableStreamingChunkProvider=1 users. EnableStreamingChunkProvider defaults to off, so default users are unaffected.
P1-4: Revert RuntimeException from inner catch. DatabricksError is caught by outer catch(Throwable) and fails immediately (no retry), matching ChunkDownloadTask behavior exactly. NPE/ISE won't be retried.
P2-1: Always send row_offset (even 0 for chunk 0) when bounded SEA enabled; explicit is safer than relying on server default.
P2-3: Update nextLinkFetchIndex after reconciliation to avoid prefetch thread re-fetching chunks already discovered via refresh.
P2-5: Add "Requires server support" to connection property help text.

Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>

gopalldb added 3 commits May 25, 2026 14:58

gopalldb force-pushed the feature/bounded-sea-api-cloudfetch branch from e8e6bde to e9dcd7c Compare May 25, 2026 09:53

gopalldb wants to merge 4 commits into databricks:main from gopalldb:feature/bounded-sea-api-cloudfetch
