Skip to content

Add TPC-H to sqllogictests#6830

Open
AdamGS wants to merge 6 commits intodevelopfrom
adamg/sqllogictest-tpch
Open

Add TPC-H to sqllogictests#6830
AdamGS wants to merge 6 commits intodevelopfrom
adamg/sqllogictest-tpch

Conversation

@AdamGS
Copy link
Contributor

@AdamGS AdamGS commented Mar 6, 2026

This PR adds TPC-H to to our SLT suite, it requires generating data (script provided), and currently doesn't work due to subtle formatting differences between the crates.

One reason for those differences is that we actually run the queries on different schemas as they each infer a different schema.

I've also added some duckdb functionality as I tried to get it working, and I figure its worth merging it anyway.

@AdamGS AdamGS requested review from joseph-isaacs and myrrc March 6, 2026 17:26
@AdamGS AdamGS added changelog/chore A trivial change ext/duckdb Relates to the DuckDB integration ext/datafusion Relates to the DataFusion integration labels Mar 6, 2026
@codspeed-hq
Copy link

codspeed-hq bot commented Mar 6, 2026

Merging this PR will degrade performance by 10.2%

⚡ 3 improved benchmarks
❌ 1 regressed benchmark
✅ 976 untouched benchmarks
🆕 20 new benchmarks
⏩ 1466 skipped benchmarks1

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation take_map[(0.1, 0.5)] 2.6 ms 2.1 ms +23.63%
Simulation take_map[(0.1, 0.1)] 1,007.5 µs 908.9 µs +10.84%
Simulation take_map[(0.1, 1.0)] 4.2 ms 3.5 ms +20.81%
🆕 Simulation sequence_decompress_u32 N/A 4.1 ms N/A
🆕 Simulation sequence_compress_u32 N/A 5.2 ms N/A
🆕 Simulation decompress_utf8[(1000, 4)] N/A 42.6 µs N/A
🆕 Simulation decompress_utf8[(1000, 256)] N/A 28.6 µs N/A
🆕 Simulation decompress_utf8[(1000, 16)] N/A 30.9 µs N/A
🆕 Simulation decompress_utf8[(10000, 1024)] N/A 112.5 µs N/A
🆕 Simulation decompress_utf8[(100000, 4096)] N/A 944.7 µs N/A
🆕 Simulation decompress_utf8[(10000, 16)] N/A 135.9 µs N/A
🆕 Simulation decompress_utf8[(100000, 256)] N/A 958.2 µs N/A
🆕 Simulation decompress_utf8[(10000, 4)] N/A 209.8 µs N/A
🆕 Simulation decompress_utf8[(1000000, 4096)] N/A 9.3 ms N/A
🆕 Simulation decompress_utf8[(1000000, 8192)] N/A 9.3 ms N/A
🆕 Simulation decompress_utf8[(10000, 256)] N/A 113.2 µs N/A
🆕 Simulation decompress_utf8[(100000, 16)] N/A 1.2 ms N/A
🆕 Simulation decompress_utf8[(1000000, 256)] N/A 9.4 ms N/A
🆕 Simulation decompress_utf8[(1000000, 4)] N/A 19 ms N/A
🆕 Simulation decompress_utf8[(1000000, 16)] N/A 11.7 ms N/A
... ... ... ... ... ...

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.


Comparing adamg/sqllogictest-tpch (b9f1d55) with develop (5d6a3c8)2

Open in CodSpeed

Footnotes

  1. 1466 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

  2. No successful run was found on develop (4cbfb33) during the generation of this report, so 5d6a3c8 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

AdamGS added 3 commits March 6, 2026 17:34
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
@AdamGS AdamGS force-pushed the adamg/sqllogictest-tpch branch from c0b0735 to 1625151 Compare March 6, 2026 17:34
AdamGS added 2 commits March 6, 2026 17:36
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Copy link
Contributor

@myrrc myrrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM sans minor changes.

uses: ./.github/actions/setup-rust
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
- name: Install uv
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We set up uv to upload coverage results, worth removing it from there to avoid duplicate installation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a different job IMO, runs on a different machine

((high as i128) << 64) | (low as i128)
}

#[inline]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth defining in #[tests]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use it in ExtractedValue

@@ -0,0 +1,32 @@
#!/usr/bin/env bash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocker, but I'd avoid using bash if we can go with sh.
I see only &> (can be replaced with 1>/dev/null 2>&1) and
BASH_SOURCE. Replacing the second one is more tricky but I think something like $_
would work.

vortex_bail!("DuckDB HugeInt is not yet supported in Vortex");
}
ExtractedValue::UHugeInt(_) => {
vortex_bail!("DuckDB UHugeInt is not yet supported in Vortex");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this u128 int? We might add this soon.

Good to know duckdb has it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah duckdb seems to have both

Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/chore A trivial change ext/datafusion Relates to the DataFusion integration ext/duckdb Relates to the DuckDB integration

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants