fix(export): stream large database dumps with keyset pagination (#59) #233

Open
trongtruong110-ux wants to merge 1 commit into outerbase:main from trongtruong110-ux:fix/stream-large-database-dumps

@trongtruong110-ux

Problem

dumpDatabaseRoute builds the entire dump — every row of every table — into a single in-memory string before responding (let dumpContent = '' … then new Blob([dumpContent])). For a large database this exceeds the isolate's memory limit and the request fails outright, which is what #59 reports.

Changes

Rewrites the SQL dump path to stream instead of buffer:

  • Bounded memory. Rows are read one page at a time and the response body is a pull-driven ReadableStream. The runtime only asks for the next page once the previous chunk has flushed downstream, so peak memory is ~one page regardless of database size.
  • Keyset pagination (O(n)). Pages use WHERE _rowid_ > ? ORDER BY _rowid_ LIMIT ? rather than LIMIT/OFFSET. OFFSET re-scans every skipped row on each page (O(n²)), which is unusable on a large table. WITHOUT ROWID tables (which have no _rowid_) fall back to LIMIT/OFFSET.
  • Correctness. Table identifiers are quoted; values are encoded as proper SQL literals — NULL, numbers, 0/1 for booleans, X'..' for blobs, and escaped strings. Internal sqlite_* tables are skipped.
  • A database/connection error that occurs before streaming starts still returns a clean 500; the table list is resolved eagerly for that reason.
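For reviewers who want the shape of the streaming approach without opening the diff, here is a minimal sketch of a pull-driven ReadableStream fed by keyset pages. It is not the code in this PR: `QueryFn`, `Row`, `streamTable`, `formatRow`, and the `__rowid` alias are hypothetical names, and the sketch assumes rowids are positive (SQLite's auto-assigned rowids start at 1) and fit in a JavaScript number.

```typescript
// Hypothetical row and query types; the real dataSource API is not shown in this description.
type Row = { __rowid: number; [column: string]: unknown };
type QueryFn = (sql: string, params: unknown[]) => Promise<Row[]>;

// Quote an identifier by wrapping it in double quotes and doubling embedded ones.
function quoteIdent(name: string): string {
  return '"' + name.replace(/"/g, '""') + '"';
}

// Build one keyset page query: seek past the last seen rowid instead of using OFFSET.
// The rowid is aliased so the cursor column has a predictable name in the result.
function keysetPageQuery(table: string, cursor: number, pageSize: number) {
  return {
    sql:
      `SELECT _rowid_ AS __rowid, * FROM ${quoteIdent(table)} ` +
      `WHERE _rowid_ > ? ORDER BY _rowid_ LIMIT ?`,
    params: [cursor, pageSize] as unknown[],
  };
}

// Each pull() reads at most one page, so the stream never holds the whole table.
function streamTable(
  query: QueryFn,
  table: string,
  pageSize: number,
  formatRow: (row: Row) => string
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  let cursor = 0; // assumption: all rowids are > 0
  return new ReadableStream<Uint8Array>({
    async pull(controller) {
      const { sql, params } = keysetPageQuery(table, cursor, pageSize);
      const rows = await query(sql, params);
      if (rows.length === 0) {
        controller.close();
        return;
      }
      // Remember where this page ended; the next pull() resumes after it.
      cursor = rows[rows.length - 1].__rowid;
      controller.enqueue(encoder.encode(rows.map(formatRow).join('')));
      // A short page means the table is exhausted, so no extra query is needed.
      if (rows.length < pageSize) controller.close();
    },
  });
}
```

The runtime calls `pull()` only when the stream's internal queue has room, which is what ties the read rate to how fast the client drains the response. A real formatter would also need to drop the `__rowid` alias column before emitting the INSERT.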

No route or public API change: existing callers of dumpDatabaseRoute(dataSource, config) are unaffected. The only signature change is an optional pageSize argument, added for tests and tuning.
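The reason for resolving the table list eagerly can be shown with a small sketch. It is illustrative only: `dumpRouteSketch`, `ListTables`, `TableBody`, the default page size of 1000, the error message, and the header values are all assumptions, not the project's actual helpers or settings.

```typescript
// Hypothetical helper types; the project's real dataSource and response helpers are not shown here.
type ListTables = () => Promise<string[]>;
type TableBody = (tables: string[], pageSize: number) => ReadableStream<Uint8Array>;

async function dumpRouteSketch(
  listTables: ListTables,
  body: TableBody,
  pageSize: number = 1000 // assumed default, optional so existing call sites keep working
): Promise<Response> {
  let tables: string[];
  try {
    // Resolve eagerly: a failure here happens before any byte is sent,
    // so the route can still answer with a clean 500.
    tables = (await listTables()).filter((t) => !t.startsWith('sqlite_'));
  } catch (err) {
    return new Response('Failed to create database dump', { status: 500 });
  }
  // Once the streaming response is returned, the status is committed;
  // later errors can only abort the stream.
  return new Response(body(tables, pageSize), {
    status: 200,
    headers: {
      // Placeholder header values for the sketch.
      'Content-Type': 'application/sql',
      'Content-Disposition': 'attachment; filename="database_dump.sql"',
    },
  });
}
```

If the table list were instead fetched inside the stream's first `pull()`, a connection failure would surface after the 200 status had already been sent.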

Tests

src/export/dump.test.ts is rewritten to cover: streamed output with headers, multi-page keyset pagination (asserting the cursor query is used and OFFSET is not), value encoding (NULL / number / boolean / blob / quote-escaping), the WITHOUT ROWID fallback, an empty database, and the 500 error path.
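As a rough illustration of the value-encoding cases the tests cover, here is a sketch of a literal encoder. `encodeSqlValue` is a hypothetical name and not necessarily how the PR implements it. Mapping non-finite numbers to NULL is an assumption of this sketch, not something the PR states.

```typescript
// Encode one JavaScript value as a SQLite literal (illustrative sketch).
function encodeSqlValue(value: unknown): string {
  if (value === null || value === undefined) return 'NULL';
  if (typeof value === 'boolean') return value ? '1' : '0';
  if (typeof value === 'number') {
    // Assumption: NaN and Infinity have no plain numeric literal, so emit NULL.
    return isFinite(value) ? String(value) : 'NULL';
  }
  if (value instanceof ArrayBuffer || value instanceof Uint8Array) {
    const bytes = value instanceof Uint8Array ? value : new Uint8Array(value);
    // Two lowercase hex digits per byte, wrapped as a blob literal.
    const hex = Array.from(bytes, (b) => (b < 16 ? '0' : '') + b.toString(16)).join('');
    return "X'" + hex + "'";
  }
  // Everything else is treated as text; single quotes are doubled.
  return "'" + String(value).replace(/'/g, "''") + "'";
}

console.log(encodeSqlValue(null));                         // NULL
console.log(encodeSqlValue(42));                           // 42
console.log(encodeSqlValue(true));                         // 1
console.log(encodeSqlValue(new Uint8Array([0, 255, 16]))); // X'00ff10'
console.log(encodeSqlValue("O'Brien"));                    // 'O''Brien'
```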

Follow-up

This fixes the memory blow-up for the synchronous /export/dump endpoint. For databases large enough to also exceed the Worker CPU-time limit within a single request, a follow-up can offload the dump to R2 via a Durable Object alarm. Happy to do that as a separate PR.

/claim #59

…rbase#59)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>