
feat(encoding): support u128 bitpacking for decimal128 columns #6858

Draft
LuciferYang wants to merge 1 commit into lance-format:main from LuciferYang:feat/decimal128-u128-bitpacking

Conversation

@LuciferYang
Contributor

Closes #6857.

What

Extends the miniblock inline-bitpacking chooser to also consider bits = 128 and adds a scalar BitPacking kernel for u128, so decimal128 columns whose values fit in fewer than 128 bits no longer fall through to raw 128-bit storage.
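A minimal sketch of what a scalar (non-transposed) u128 bit-packer does. It assumes values are stored LSB-first and back to back in a stream of 128-bit words. `pack_u128` and `unpack_u128` are hypothetical names, and this is not the `BitPacking` kernel API from the PR, whose signature is not shown here.

```rust
/// Mask with the low `width` bits set (`width` in 1..=128).
fn low_mask(width: u32) -> u128 {
    if width == 128 { u128::MAX } else { (1u128 << width) - 1 }
}

/// Hypothetical scalar packer: keeps the low `width` bits of each value and
/// writes them back to back (LSB-first) into 128-bit words.
pub fn pack_u128(values: &[u128], width: u32) -> Vec<u128> {
    assert!((1..=128).contains(&width));
    let total_bits = values.len() * width as usize;
    let mut words = vec![0u128; total_bits.div_ceil(128)];
    let mask = low_mask(width);
    for (i, &v) in values.iter().enumerate() {
        let v = v & mask;
        let bit = i * width as usize;
        let (idx, off) = (bit / 128, (bit % 128) as u32);
        words[idx] |= v << off;
        // A value that straddles a word boundary spills its high bits into
        // the next word. off > 0 in this branch, so the shift is < 128.
        if off + width > 128 {
            words[idx + 1] |= v >> (128 - off);
        }
    }
    words
}

/// Hypothetical inverse of `pack_u128`: reads back `count` values.
pub fn unpack_u128(words: &[u128], width: u32, count: usize) -> Vec<u128> {
    assert!((1..=128).contains(&width));
    let mask = low_mask(width);
    (0..count)
        .map(|i| {
            let bit = i * width as usize;
            let (idx, off) = (bit / 128, (bit % 128) as u32);
            let mut v = words[idx] >> off;
            if off + width > 128 {
                // Pull the spilled high bits from the following word.
                v |= words[idx + 1] << (128 - off);
            }
            v & mask
        })
        .collect()
}
```

With `width = 24`, 100 values occupy ceil(2400 / 128) = 19 words (304 bytes) instead of 1,600 bytes at full 128-bit width.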

Impact

Measured on TPC-DS SF=100 store_sales (288 M rows, 12 × decimal128(7,2) columns):

Metric | Before | After
On-disk size | 34 GiB | 15.873 GiB
Bytes per row | ~127 | ~59

~53% reduction in on-disk size. Schema, row count, and file format version (v2.1) are unchanged.
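The Impact figures can be rechecked with a few lines of arithmetic. This assumes 1 GiB = 2^30 bytes and treats "288 M" as exactly 288,000,000 rows. The helper names are illustrative only.

```rust
/// 1 GiB in bytes (2^30).
const GIB: f64 = 1_073_741_824.0;

/// Average on-disk bytes per row for a file of `size_gib` GiB holding `rows` rows.
fn bytes_per_row(size_gib: f64, rows: f64) -> f64 {
    size_gib * GIB / rows
}

/// Fractional size reduction when going from `before` to `after`.
fn reduction(before: f64, after: f64) -> f64 {
    1.0 - after / before
}
```

`bytes_per_row(34.0, 288e6)` comes to about 126.8 and `bytes_per_row(15.873, 288e6)` to about 59.2. `reduction(34.0, 15.873)` is about 0.533. These are consistent with the reported ~127, ~59, and ~53%.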

Changes

  • rust/compression/bitpacking/src/lib.rs — scalar u128 BitPacking kernel.
  • rust/lance-encoding/src/encodings/physical/bitpacking.rs — u128 miniblock encode / decode wiring.
  • rust/lance-encoding/src/compression.rs — chooser now matches bits ∈ {8, 16, 32, 64, 128}.
  • rust/lance-encoding/src/statistics.rs — stat plumbing for the 128 case.

+454 / -23 across 4 files. No new public API and no on-wire format change: the 128-bit width is already valid for v2.1 readers, and the encoder simply didn't emit it before.
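A minimal sketch of the decision that the chooser and statistics changes feed into, under stated assumptions. The required width is taken as the position of the highest set bit across all values. Bitpacking is assumed to be chosen only when that width is below the column's full width. `max_bit_width_u128` and `should_bitpack` are hypothetical names, not Lance's API.

```rust
/// Hypothetical stand-in for the "max bit width" statistic: bits needed to
/// hold the largest value. OR-ing all values yields the same highest set bit
/// as the maximum, so no comparison is needed.
fn max_bit_width_u128(values: &[u128]) -> u32 {
    let or_all = values.iter().fold(0u128, |acc, &v| acc | v);
    128 - or_all.leading_zeros()
}

/// Hypothetical chooser check. Inline bitpacking applies only to the
/// supported physical widths (now including 128). Assumption: it is chosen
/// only when it actually narrows the values.
fn should_bitpack(bits_per_value: u32, max_bit_width: u32) -> bool {
    matches!(bits_per_value, 8 | 16 | 32 | 64 | 128) && max_bit_width < bits_per_value
}
```

For a decimal128 column whose largest unscaled value is 9,999,999, `max_bit_width_u128` returns 24, so `should_bitpack(128, 24)` is true. Before this change, 128 was not in the supported set.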

Testing

  • u128 round-trip unit tests (in-line and out-of-line value distributions, including width = 63 boundary).
  • End-to-end compress → decompress test through the miniblock path.
  • Full Spark V2 CTAS rewrite of TPC-DS SF=100 store_sales: row count and schema match, and the data reads back identically.

Notes

Scalar only — no FastLanes-transposed kernel for u128 in this PR; that's a natural follow-up if decode throughput becomes a bottleneck.

decimal128(7,2) columns were stored at full 128-bit width without
compression because BitPacking only supported u8/u16/u32/u64. This
adds scalar u128 bitpacking, reducing decimal128 storage from 131
bits/value to ~24 bits/value (5.6x compression on TPC-DS store_sales).

File size: 34 GiB → 16 GiB for store_sales SF=100.
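The ~24 bits/value figure lines up with the range of a decimal(7,2) column. Its unscaled integer is at most 9,999,999, which fits in 24 bits (2^24 = 16,777,216). The helper below checks this for any precision. It assumes non-negative unscaled values, and `bits_for_precision` is a hypothetical name, not a Lance function.

```rust
/// Bits needed for the largest non-negative unscaled value of a decimal
/// with `precision` digits, i.e. 10^precision - 1 (valid for precision <= 38).
fn bits_for_precision(precision: u32) -> u32 {
    let max_unscaled = 10u128.pow(precision) - 1;
    128 - max_unscaled.leading_zeros()
}
```

`bits_for_precision(7)` gives 24. `bits_for_precision(38)` gives 127, so even the widest decimal128 leaves one bit of headroom under this assumption.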

claude bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

github-actions bot added the enhancement (New feature or request) label on May 20, 2026
@codecov

codecov Bot commented May 20, 2026

Codecov Report

❌ Patch coverage is 89.20863% with 30 lines in your changes missing coverage. Please review.

File with missing lines | Patch % | Lines
...ance-encoding/src/encodings/physical/bitpacking.rs | 83.51% | 26 missing and 4 partials ⚠️


LuciferYang marked this pull request as draft on May 20, 2026 13:32
@LuciferYang
Contributor Author

LuciferYang commented May 20, 2026

Decoding performance regressed with this change. I need to explore possible optimizations before deciding whether to move forward.


Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support u128 bitpacking for decimal128 columns

1 participant