Skip to content

refactor: move from 3-variant index search result to single generic variant#6902

Open
westonpace wants to merge 1 commit into
lance-format:mainfrom
westonpace:refactor-index-expr-result-2mask-impl
Open

refactor: move from 3-variant index search result to single generic variant#6902
westonpace wants to merge 1 commit into
lance-format:mainfrom
westonpace:refactor-index-expr-result-2mask-impl

Conversation

@westonpace
Copy link
Copy Markdown
Member

@westonpace westonpace commented May 21, 2026

What

Replaces the three-variant enum:

pub enum IndexExprResult {
    Exact(RowAddrMask),
    AtMost(RowAddrMask),   // upper bound on the answer
    AtLeast(RowAddrMask),  // lower bound on the answer
}

with a single struct carrying both bounds:

pub struct IndexExprResult {
    pub lower: RowAddrMask,  // rows definitely in the answer
    pub upper: RowAddrMask,  // rows that may be in the answer
                             // (rows outside `upper` are definitely out)
}

Same change applied to NullableIndexExprResult (the during-evaluation form
that carries SQL 3VL info via NullableRowAddrMask).

The three pre-existing shapes map onto degenerate intervals:

Old variant New form
Exact(m) {lower: m, upper: m}
AtMost(m) {lower: allow_nothing(), upper: m}
AtLeast(m) {lower: m, upper: all_rows()}

Constructed via IndexExprResult::{exact, at_most, at_least} associated
functions; inspected via is_exact() / is_at_most() / is_at_least()
predicates.

Why

Some indices can return three-way info per row range — "definitely
matches", "maybe matches", "definitely doesn't match" — and the old
representation couldn't express it. The new form makes that fourth shape
representable as a non-degenerate interval (nonempty lower strictly
inside non-universe upper), and the boolean algebra collapses to
elementwise lattice operations on the endpoints:

!{l, u}              = {!u, !l}
{l1, u1} & {l2, u2}  = {l1 & l2, u1 & u2}
{l1, u1} | {l2, u2}  = {l1 | l2, u1 | u2}

The motivating case: a zone-map IS NOT NULL search can now produce
{lower: zones_with_zero_nulls, upper: not_all_null_zones}. The read
planner can both skip the all-null zones entirely (I/O saving on the
50%-clustered-nulls case) and skip the recheck on the
guaranteed-non-null zones
(CPU saving). The old algebra had to either
drop the lower bound (resulting in an AtMost that loses the
all-null-skip) or drop the upper bound (resulting in an AtLeast that
loses the recheck-skip).

3VL semantics are preserved: NULL info lives inside each
NullableRowAddrMask (in its nulls field), and NullableRowAddrMask::Not
flips AllowBlock without touching nulls, so !{l, u} = {!u, !l}
preserves NULL on both endpoints.

Wire format

The INDEX_EXPR_RESULT_SCHEMA columns change from
(result: Binary, discriminant: UInt32, fragments_covered: Binary) to
(lower: Binary, upper: Binary, fragments_covered: Binary). The schema
is internal to the in-process hand-off from ScalarIndexExec to
FilteredReadExec, so this isn't a persisted-format break.

Other touchpoints

  • apply_index_to_fragment in filtered_read.rs: switches from a
    3-arm match to a predicate chain. Existing shapes keep their
    existing read-plan semantics. A non-degenerate Refined case is
    handled conservatively for now (read upper, recheck via
    full_filter); a follow-up will split to_read into
    guaranteed-match ranges (skip recheck) and recheck ranges (apply
    full filter) so the new representation actually pays off.
  • MaterializeIndexExec in scalar_index.rs uses result.upper
    for the candidate-row set. AtLeast still bails (upper is
    universe); everything else works.
  • New NullableRowAddrMask::allow_nothing() / all_rows()
    constructors for the at_most / at_least builders.
  • NullableRowAddrMask gains PartialEq so is_exact() can check
    lower == upper.
  • IndexExprResult::{from_parts, serialize_to_arrow} (inherent methods
    on the old enum) → index_expr_result_from_parts(lower, upper) /
    serialize_index_expr_result(result, frags) free functions on
    lance-index::scalar::expression.
  • Existing certainty-propagation tests in expression.rs rewritten to
    use the new constructors and is_at_most() / is_at_least()
    predicates. All assertions still hold under the new algebra.

Performance

Bench from lance-select/benches/index_expr_result.rs at N=10M rows.
Both runs at criterion --measurement-time 3 --warm-up-time 2 --sample-size 30, same workload (single contiguous run, AllowList with
no nulls), same machine, idle CPU. Median time per op:

Op Variant pair Baseline 2-mask Delta
! Exact 8 ns 13 ns +60%
! AtMost / AtLeast 8 ns 14 ns +75%
& Exact_Exact 7.24 µs 13.84 µs +91%
& AtMost_AtMost 7.49 µs 7.32 µs −2%
& AtLeast_AtLeast 7.22 µs 7.35 µs +2%
& Exact_AtMost 7.15 µs 10.16 µs +42%
& Exact_AtLeast 1.63 µs 7.19 µs +341% (was a drop)
& AtMost_AtLeast 1.66 µs 1.76 µs +6%
| Exact_Exact 3.82 µs 7.54 µs +97%
| AtMost_AtMost 3.90 µs 3.87 µs −1%
| AtLeast_AtLeast 3.82 µs 4.00 µs +5%
| Exact_AtMost 3.67 µs 3.75 µs +2%
| Exact_AtLeast 3.74 µs 12.49 µs +234% (was a drop)
| AtMost_AtLeast 3.04 µs 11.70 µs +285% (was a drop)

Reading the deltas:

  • NOT: a few extra ns. Was one mask flip; now two endpoint flips +
    swap. Irrelevant in absolute terms.
  • Same-variant AtMost_AtMost / AtLeast_AtLeast: roughly
    unchanged. The second mask op composes empty-with-empty or
    universe-with-universe and short-circuits in RowAddrMask::BitAnd.
  • Exact_Exact: ~1.9-2× slower. Both endpoints carry the full mask
    (the exact() constructor clones), so we genuinely do two mask ops
    instead of one. Expected.
  • Cases that the old algebra resolved by dropping a bound
    (Exact_AtLeast for both ops, AtMost_AtLeast for OR): 3-4× slower
    in absolute terms. The old code returned without doing a mask op;
    the new code computes both endpoints and preserves both bounds.
    Follow-up optimization here is to add BlockList(empty) /
    AllowList(empty) fast paths to NullableRowAddrMask::{BitAnd, BitOr}, which would catch the universe/empty operands directly.

Trade-off: a few extra microseconds per binary op on 10M-row masks (no
impact on query latency — millisecond-scale and dominated by I/O) in
exchange for richer interval results. The Refined case unlocks
acceleration paths that weren't reachable before.

Verification

  • cargo test -p lance-select --lib: 99 passed
  • cargo test -p lance-index --lib: 311 passed
  • cargo test -p lance --lib io::exec::filtered_read: 23 passed
  • cargo test -p lance --lib io::exec::scalar_index: 2 passed
  • cargo clippy --workspace --tests -- -D warnings: clean
  • cargo fmt --all --check: clean

Follow-ups

  1. Per-range filter pushdown in filtered_read::apply_index_to_fragment
    so the Refined case can split to_read into definitely-match (skip
    recheck) and recheck ranges (apply full filter) within one fragment.
    That's what makes the new representation actually pay off for IS NOT
    NULL.
  2. Empty/universe short-circuit in NullableRowAddrMask::{BitAnd, BitOr}. Detect AllowList(empty_set) and BlockList(empty_set)
    operands and bypass the per-row work. Erases the regressions on
    cross-variant cases.
  3. Zone map IsNotNull query that produces a Refined result
    directly, instead of going through Not(IsNull).

🤖 Generated with Claude Code

…r} interval

Replaces the `enum IndexExprResult { Exact(M), AtMost(M), AtLeast(M) }`
representation with a single struct carrying two row-address masks:

  pub struct IndexExprResult {
      pub lower: RowAddrMask,  // rows definitely in the answer
      pub upper: RowAddrMask,  // rows that may be in the answer
  }                            // (rows outside upper are definitely out)

Same change applied to `NullableIndexExprResult`. The three pre-existing
shapes map to degenerate intervals:

  Exact(m)  ≡ {lower: m, upper: m}
  AtMost(m) ≡ {lower: allow_nothing(), upper: m}
  AtLeast(m)≡ {lower: m, upper: all_rows()}

…and are now constructed via `IndexExprResult::{exact, at_most, at_least}`
associated fns / inspected via `is_exact()` / `is_at_most()` /
`is_at_least()` predicates. Intervals that are none of those — a
non-empty lower strictly inside a non-universe upper, the "Refined" case
— are now representable; previously the algebra had to either pick a
side (e.g. `AtMost(m) & AtLeast(_) → AtMost(m)`, dropping the lower
bound) or fail to produce one.

The boolean algebra collapses to elementwise lattice ops:

  !{l, u}              = {!u, !l}
  {l1, u1} & {l2, u2}  = {l1 & l2, u1 & u2}
  {l1, u1} | {l2, u2}  = {l1 | l2, u1 | u2}

This works for both `IndexExprResult` and `NullableIndexExprResult` —
the underlying `RowAddrMask` / `NullableRowAddrMask` types already
implement two-valued and SQL three-valued logic correctly inside each
mask, and lifting elementwise to the interval preserves both. NULL info
lives inside each `NullableRowAddrMask` (via its `nulls` field), and
`NullableRowAddrMask::Not` flips Allow↔Block without touching `nulls`,
so `!{l, u} = {!u, !l}` preserves NULL on both endpoints.

Wire format change: the (mask, discriminant) layout serialized by
`ScalarIndexExec` is now (lower, upper). Two binary columns instead of
one binary + one u32. The schema is internal to the in-process hand-off
from `ScalarIndexExec` to `FilteredReadExec`, so this isn't a
persisted-format break.

Other touchpoints:

  * `apply_index_to_fragment` in filtered_read switches from a 3-arm
    `match` to a predicate chain. The three current shapes preserve
    their existing read-plan semantics; the Refined case is handled
    conservatively (read `upper`, recheck via `full_filter`). When
    per-range filter pushdown lands, the Refined arm will split
    `to_read` into guaranteed-match ranges (skip recheck) and recheck
    ranges (apply full filter).

  * `MaterializeIndexExec` in scalar_index.rs uses `result.upper` for
    the candidate-row set. `AtLeast` still bails (upper is universe);
    everything else works.

  * Per-fn `NullableRowAddrMask::allow_nothing()` and `all_rows()`
    constructors added so the `at_most` / `at_least` builders have
    short, obvious empty/universe sentinels.

  * `NullableRowAddrMask` gains `PartialEq` so `is_exact()` can check
    `lower == upper`.

  * `IndexExprResult::{from_parts, serialize_to_arrow}` (inherent
    methods on the old enum) → `index_expr_result_from_parts(lower,
    upper)` / `serialize_index_expr_result(result, frags)` free
    functions on `lance-index::scalar::expression`. The schema
    `INDEX_EXPR_RESULT_SCHEMA` keeps the same name but its columns are
    now (`lower`, `upper`, `fragments_covered`).

  * Tests in expression.rs rewritten to use the new constructors and
    `is_at_most()` / `is_at_least()` predicates. All assertions still
    hold under the new algebra (verified by the bench: every
    cross-variant combination tested lands in the same degenerate
    corner of the lattice as before).

Bench deltas at N=10M rows. Both runs at criterion
`--measurement-time 3 --warm-up-time 2 --sample-size 30`, same workload
(single contiguous run, AllowList with no nulls), same machine, idle
CPU. Median time per op:

  Op   Variant pair         Baseline   2-mask    Delta
  not  Exact                  8 ns      13 ns    +60%
  not  AtMost / AtLeast       8 ns      14 ns    +75%
  and  Exact_Exact          7.24 µs   13.84 µs   +91%
  and  AtMost_AtMost        7.49 µs    7.32 µs   -2%
  and  AtLeast_AtLeast      7.22 µs    7.35 µs   +2%
  and  Exact_AtMost         7.15 µs   10.16 µs  +42%
  and  Exact_AtLeast        1.63 µs    7.19 µs  +341%  (was a drop)
  and  AtMost_AtLeast       1.66 µs    1.76 µs   +6%
  or   Exact_Exact          3.82 µs    7.54 µs  +97%
  or   AtMost_AtMost        3.90 µs    3.87 µs   -1%
  or   AtLeast_AtLeast      3.82 µs    4.00 µs   +5%
  or   Exact_AtMost         3.67 µs    3.75 µs   +2%
  or   Exact_AtLeast        3.74 µs   12.49 µs  +234%  (was a drop)
  or   AtMost_AtLeast       3.04 µs   11.70 µs  +285%  (was a drop)

Reading the deltas:

  * NOT: a few extra ns (one mask flip → two endpoint flips + swap).
    Irrelevant in absolute terms.

  * Same-variant `AtMost_AtMost` / `AtLeast_AtLeast`: ~unchanged. The
    second mask op composes empty-with-empty or universe-with-universe
    and short-circuits.

  * `Exact_Exact`: ~1.9-2× slower. Both endpoints carry the full mask;
    we genuinely do two mask ops instead of one. Expected.

  * `Exact_AtMost` AND: ~1.4× slower. New result still lands in AtMost
    (lower stays empty after `a & empty`), but the upper-side `a & b`
    is unchanged from before — the +42% is the second op + the empty
    intersection cost.

  * Cases that the old algebra resolved by dropping a bound
    (`Exact_AtLeast` for both, `AtMost_AtLeast` for OR) are 3-4×
    slower in absolute terms. The old code returned without doing a
    mask op; the new code computes both endpoints and preserves both
    bounds. The follow-up optimization here is to add
    `BlockList(empty)`/`AllowList(empty)` fast paths to
    `NullableRowAddrMask::{BitAnd, BitOr}`, which would catch the
    universe/empty operands directly.

Trade-off: a few extra microseconds per binary op on 10M-row masks (no
impact on query latency, which is millisecond-scale and dominated by
I/O) in exchange for richer interval results. The Refined case unlocks
the actual `IS NOT NULL` acceleration we couldn't reach before — a
zone-map IsNotNull search can now produce
`{lower: definitely_non_null_zones, upper: not_all_null_zones}` and
the read planner can both skip all-null zones (the I/O win) and skip
the recheck on the guaranteed-non-null zones.

Verified:

  * lance-select --lib: 97 passed
  * lance-index --lib: 302 passed
  * lance-table --lib: 104 passed
  * lance --lib io::exec::{filtered_read,scalar_index}: 25 passed
  * `cargo clippy --workspace --tests -- -D warnings`: clean
  * `cargo fmt --all --check`: clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@westonpace
Copy link
Copy Markdown
Member Author

@claude review once

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Code review skipped — your organization has reached its monthly code review spending cap.

An organization admin can view or raise the cap at claude.ai/admin-settings/claude-code. The cap resets at the start of the next billing period.

Once the cap resets or is raised, comment @claude review on this pull request to trigger a review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant