update umbra#951

Open
toschmidt wants to merge 1 commit into ClickHouse:main from umbra-db:schmidt/umbra26.06
@toschmidt

Contributor

Switch to parquet for loading the data and resolve the memory usage problem during loading on smaller instances.

Update the Umbra ClickBench definition: drop the primary key, ingest from
the Athena hits.parquet via the umbra.parquetview table function (instead
of a TSV COPY), and store the table with zstd compression (create.sql, the
Docker variant, reads /data/hits.parquet from the bind mount).

Switch the dataset download to hits.parquet (BENCH_DOWNLOAD_SCRIPT) to
match, require the loaded row count to equal exactly 99,997,497 (otherwise
a partial load sails through with implausibly fast timings on the
surviving subset), and run the Docker container with --privileged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
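
The exact row-count requirement described above can be illustrated with a minimal sketch. This is not the check used in this PR: the helper names `parse_count` and `check_load` are hypothetical, and it assumes the row count arrives as the text output of a `SELECT count(*)` query from some SQL client. Only the expected value 99,997,497 comes from the description.

```python
EXPECTED_ROWS = 99_997_497  # row count stated in the PR description


def parse_count(output: str) -> int:
    """Return the first all-digit line found in a SQL client's text output.

    Hypothetical helper: assumes the count is printed on its own line,
    possibly surrounded by a header, a separator, and a "(1 row)" footer.
    """
    for line in output.splitlines():
        token = line.strip().replace(",", "")
        if token.isdigit():
            return int(token)
    raise ValueError("no row count found in client output")


def check_load(output: str, expected: int = EXPECTED_ROWS) -> None:
    """Fail loudly unless the loaded row count equals the expected count."""
    actual = parse_count(output)
    if actual != expected:
        # A partial load would otherwise produce misleadingly fast timings.
        raise RuntimeError(f"loaded {actual} rows, expected {expected}")
```

Calling `check_load` right after the load step, before any query is timed, would make a truncated load abort the run instead of reporting results for a subset of the data.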