
Dataset bulk upload is extremely slow with many files #5586

@kunwp1

Description

What happened?

Uploading a large number of files at once to a dataset is extremely slow. For example, dropping 1100 one-byte files into a dataset takes roughly 12 minutes to finish.

Expected: uploading many files at once should be reasonably fast and keep the UI responsive.

How to reproduce?

  1. Open a dataset you have write access to (the dataset detail page).
  2. Drag-and-drop ~1100 small files (e.g., 1 byte each) into the uploader.
  3. Observe: the upload takes ~12 minutes and the UI is sluggish throughout.
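To reproduce step 2 without creating files by hand, a small Node.js script like the one below can generate the test set. It is only a sketch: the `makeReproFiles` helper, the output directory name, and the `file-NNNN.txt` naming scheme are hypothetical and not part of the project.

```typescript
import { mkdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: writes `count` one-byte files into `dir` so they can be
// dragged into the dataset uploader in a single drop.
function makeReproFiles(dir: string, count: number): string[] {
  mkdirSync(dir, { recursive: true });
  const paths: string[] = [];
  for (let i = 1; i <= count; i++) {
    // Zero-padded names so lexicographic order matches numeric order.
    const path = join(dir, `file-${String(i).padStart(4, "0")}.txt`);
    writeFileSync(path, "x"); // exactly one byte
    paths.push(path);
  }
  return paths;
}

const reproDir = join(tmpdir(), "dataset-upload-repro");
const created = makeReproFiles(reproDir, 1100);
console.log(`wrote ${created.length} files to ${reproDir}`);
```

Selecting everything in that directory and dropping it onto the uploader should match the scenario described above.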

Root cause

While files are uploading, the page re-renders the entire list of queued files over and over. That rendering runs on the same browser main thread that performs the uploads, so with ~1000 files the constant re-rendering crowds out the upload work and slows down the whole upload pipeline.

JavaScript on the page is single-threaded: the thread doing all that re-rendering is the same one that runs the upload callbacks, so each callback has to wait until the current re-render finishes.
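The blocking effect can be seen in isolation with a minimal TypeScript sketch: a zero-delay timer stands in for an upload callback, and a synchronous busy loop stands in for a long re-render. The names `busyWait` and `measureCallbackDelay` are illustrative only; the real uploader's callbacks and rendering cost are not modeled.

```typescript
// Spin on the main thread for `ms` milliseconds. This is a stand-in for a
// synchronous re-render of a long list (the real cost is an assumption).
function busyWait(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // intentionally empty: keep the thread busy
  }
}

// Schedule an "upload callback" with a 0 ms timer, then block the thread.
// Resolves with how long the callback actually had to wait.
function measureCallbackDelay(blockMs: number): Promise<number> {
  return new Promise((resolve) => {
    const scheduled = Date.now();
    setTimeout(() => resolve(Date.now() - scheduled), 0);
    busyWait(blockMs); // the timer cannot fire until this returns
  });
}

measureCallbackDelay(200).then((delay) => {
  console.log(`callback ran after ~${delay} ms instead of ~0 ms`);
});
```

In the page itself, the busy loop corresponds to a change-detection pass over every queued row: the longer the list, the longer each upload callback sits in the queue.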

Proposed fix

  • Virtualize the Pending and Finished lists (e.g. cdk-virtual-scroll) so only the visible rows are in the DOM — bounding both rendering and memory regardless of file count.
  • Replace the queuedFileNames getter with a cached field updated only when the queue changes, and add trackBy — so a change-detection pass no longer allocates an array or re-checks every row.
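The getter-vs-cached-field change and the `trackBy` callback can be sketched without Angular. This is a framework-free illustration under assumptions: the `QueuedFile` shape, the `path` field, the class names, and `trackByPath` are hypothetical and do not come from the actual component.

```typescript
// Hypothetical shape of a queued upload; the real component's types may differ.
interface QueuedFile {
  path: string;
}

// BEFORE: a getter recomputes the array on every read, and change detection
// reads template bindings on every pass.
class QueueWithGetter {
  constructor(private readonly queue: QueuedFile[]) {}
  get queuedFileNames(): string[] {
    return this.queue.map((f) => f.path);
  }
}

// AFTER: the array is rebuilt only when the queue itself changes, so reading it
// during change detection allocates nothing and returns the same reference.
class QueueWithCachedNames {
  private queue: QueuedFile[] = [];
  queuedFileNames: readonly string[] = [];

  enqueue(files: QueuedFile[]): void {
    this.queue = [...this.queue, ...files];
    this.refreshNames();
  }

  remove(path: string): void {
    this.queue = this.queue.filter((f) => f.path !== path);
    this.refreshNames();
  }

  private refreshNames(): void {
    this.queuedFileNames = this.queue.map((f) => f.path);
  }
}

// trackBy callback in the (index, item) shape Angular expects. Assumes paths
// are unique within one upload, so each row keeps its DOM node across updates.
function trackByPath(_index: number, path: string): string {
  return path;
}
```

In the template, the cached array would then feed the virtual list, for example a `<cdk-virtual-scroll-viewport itemSize="24">` wrapping `*cdkVirtualFor="let name of queuedFileNames; trackBy: trackByPath"`, with `trackByPath` exposed on the component; the 24 px `itemSize` is a placeholder for the real row height.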

Net effect: each change-detection pass becomes cheap (O(visible rows), no allocation), the main thread stays free to drive the uploads, and the DOM/memory stays small.
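A rough back-of-the-envelope model of that claim, as a sketch under loudly stated assumptions: one change-detection pass per finished file, each pass checks every rendered row, a 480 px viewport, and 24 px rows. None of these numbers are measured from the app; `visibleRange` and `rowChecks` are hypothetical helpers showing the fixed-row-height windowing a virtual scroller performs.

```typescript
// Index range [start, end) of rows a fixed-row-height virtual list renders.
// `buffer` adds extra rows above and below the viewport for smoother scrolling.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  total: number,
  buffer = 0,
): { start: number; end: number } {
  const first = Math.floor(scrollTop / rowHeight);
  const count = Math.ceil(viewportHeight / rowHeight);
  const start = Math.max(0, first - buffer);
  const end = Math.min(total, first + count + buffer);
  return { start, end };
}

// Assumed cost model: total row checks = passes x rows rendered per pass.
function rowChecks(passes: number, renderedRows: number): number {
  return passes * renderedRows;
}

const files = 1100;
const { start, end } = visibleRange(0, 480, 24, files);
const rendered = end - start; // 20 rows for a 480 px viewport with 24 px rows

console.log(rowChecks(files, files)); // full list: 1210000
console.log(rowChecks(files, rendered)); // virtualized: 22000
```

Under these assumptions the full list does about 55 times more row checks, and the gap grows linearly with the file count; the real ratio depends on how often change detection actually runs during an upload.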

Version/Branch

1.3.0-incubating-SNAPSHOT (main)

What browsers are you seeing the problem on?

Chrome (Chromium-based). The root cause is browser-agnostic, so other browsers are likely affected as well.

Relevant log output

For the slowness there is no error output — the upload simply takes ~12 minutes with a sluggish UI. Under memory pressure (larger files) Chrome shows an "Aw, Snap! — Out of memory" / renderer crash, typically with no JavaScript exception in the console.

Metadata

Labels

frontend: Changes related to the frontend GUI
