Goal
Make Vortex statistics pluggable by modeling pruning stats as aggregate-function state exposed through physical bound expressions. The concrete success case is demonstrating a Bloom-filter zone-map stat for UTF-8 equality pruning added through plugins: a custom aggregate function, scalar function, and rewrite rule, without changing built-in pruning logic.
Direction
Stats are aggregate-function partials/results when the aggregate has pruning semantics. Not every aggregate is a useful pruning stat; the rewrite path should depend only on aggregates that can prove bounds.
Keep expressions physical for this epic. Falsification turns concrete predicates into normal Vortex expressions over physical scalar functions and stat(expr, AggregateFnRef). We are not adding a logical expression layer yet.
Use stat(expr, AggregateFnRef) as the bound-expression primitive. It returns the stat value for the current stats scope, or null when unavailable. Falsification produces expressions containing stat(...); simplification/execution decides whether anything is proven.
Aggregate functions advertise whether a stored aggregate can satisfy a requested aggregate through AggregateFnRef::can_satisfy(...). Exact descriptor matches are preferred; compatible approximate aggregates, such as bounded max satisfying max, may be used when the stored aggregate is a sound bound.
Zone maps store aggregate-function descriptors, using Display for AggregateFnRef, and use those descriptors as stats-table column names. At read time, the zone map lowers bound expressions by matching available descriptors against requested aggregates.
Expression expansion is acceptable for now. Rewrites may produce multiple physical proof expressions, and each zone map can lower unavailable aggregates to null.
All new stats-facing APIs should live under vortex-array/src/stats/. Scalar function implementations may live with scalar functions, but should be re-exported through vortex_array::stats.
Phase 1: Bound Expressions and Pruning Aggregates
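The satisfaction rule described under Direction (exact descriptor matches preferred, sound approximate bounds allowed) could be pictured roughly as follows. This is only a sketch under stated assumptions: the `Aggregate` enum, the `Satisfaction` enum, and `pick_stored` are hypothetical stand-ins invented for illustration, and the real AggregateFnRef::can_satisfy(...) signature may differ.

```rust
/// Hypothetical stand-in for an aggregate-function descriptor.
/// This is NOT the Vortex `AggregateFnRef` type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Aggregate {
    Min,
    Max,
    /// A lower bound on the true min, e.g. a truncated string.
    BoundedMin { max_len: usize },
    /// An upper bound on the true max, e.g. a truncated string rounded up.
    BoundedMax { max_len: usize },
    NullCount,
}

/// Hypothetical three-way answer: exact, approximate, or unusable.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Satisfaction {
    Exact,
    Approximate,
    Unusable,
}

impl Aggregate {
    /// Can a stored `self` answer a request for `requested`?
    pub fn can_satisfy(&self, requested: &Aggregate) -> Satisfaction {
        match (self, requested) {
            // Identical descriptors are always an exact match.
            (stored, wanted) if stored == wanted => Satisfaction::Exact,
            // A bounded max is a sound upper bound for max.
            (Aggregate::BoundedMax { .. }, Aggregate::Max) => Satisfaction::Approximate,
            // A bounded min is a sound lower bound for min.
            (Aggregate::BoundedMin { .. }, Aggregate::Min) => Satisfaction::Approximate,
            // Everything else is treated as unusable in this sketch.
            _ => Satisfaction::Unusable,
        }
    }
}

/// Pick a stored aggregate for a request, preferring exact matches.
pub fn pick_stored<'a>(stored: &'a [Aggregate], requested: &Aggregate) -> Option<&'a Aggregate> {
    stored
        .iter()
        .find(|s| s.can_satisfy(requested) == Satisfaction::Exact)
        .or_else(|| {
            stored
                .iter()
                .find(|s| s.can_satisfy(requested) == Satisfaction::Approximate)
        })
}
```

When nothing stored is exact or approximate, `pick_stored` returns `None`, which corresponds to lowering the requested stat to null.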
StatFn / stat(expr, AggregateFnRef) under vortex-array/src/stats/. (Add stats rewrite session API #7930; Thread scope dtype through stats rewrites #8024)
vortex_array::stats.
StatFn::new_expr(...) directly in tests and rewrites.
Min, Max, AllNull, AllNonNull, AllNan, AllNonNan, and bounded min/max variants for binary/string stats.
NullCount as a legacy bridge for existing stats, not as a pruning proof aggregate.
AggregateFnRef::can_satisfy(...) returns whether a stored aggregate is exact, approximate, or unusable for the request.
Stat slots. (Centralize aggregate stat bridge #7931; Add pruning aggregate functions #8025)
Min and Max map separately to Stat::Min and Stat::Max.
StatFn read existing legacy stats through the aggregate-to-Stat mapping. (Centralize aggregate stat bridge #7931; Add pruning aggregate functions #8025)
StatFn works for flat arrays and chunked arrays. (Remove chunked special case from stat execution #7928)
StatFn::new_expr(...) reading legacy pruning stats. (Add NullCount aggregate function #7933; Add pruning aggregate functions #8025)
Phase 2: Rewrite Registry
vortex-array/src/stats/session.rs for stats session state. (Add stats rewrite session API #7930; Thread scope dtype through stats rewrites #8024)
vortex-array/src/stats/rewrite.rs for rewrite traits and helpers. (Add stats rewrite session API #7930)
VortexSession. (Add stats rewrite session API #7930)
OR.
Phase 3: Built-In Rewrite Rules
StatFn::new_expr(...). (Add built-in stats rewrite rules #7935)
stat_falsification implementations.
StatsCatalog.
and/or rewrites. (Add built-in stats rewrite rules #7935)
between. (Add built-in stats rewrite rules #7935)
is_null and is_not_null. (Add built-in stats rewrite rules #7935)
list_contains. (Add built-in stats rewrite rules #7935)
LIKE pruning is tracked separately in #8026.
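To make the falsification idea concrete, here is a minimal sketch of how comparison and and/or predicates might be falsified from min/max stats, with an unavailable stat behaving like null. Everything in it is an assumption made for illustration: `Agg`, `Predicate`, `ZoneStats`, and `falsified` are hypothetical names, the column is a single i64 column, and rewriting and evaluation are collapsed into one step, whereas the plan above produces expressions containing stat(...) and leaves proving to simplification/execution.

```rust
use std::collections::HashMap;

/// Hypothetical aggregate keys for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Agg {
    Min,
    Max,
}

/// Hypothetical predicates over one i64 column.
pub enum Predicate {
    Gt(i64),
    Lt(i64),
    Eq(i64),
    And(Box<Predicate>, Box<Predicate>),
    Or(Box<Predicate>, Box<Predicate>),
}

/// Stats available for one zone; a missing entry plays the role of null.
pub struct ZoneStats(pub HashMap<Agg, i64>);

impl ZoneStats {
    /// Analogue of `stat(expr, aggregate)`: the value, or None when unavailable.
    pub fn stat(&self, agg: Agg) -> Option<i64> {
        self.0.get(&agg).copied()
    }
}

/// Three-valued OR: true wins, unknown otherwise unless both are false.
fn kleene_or(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(true), _) | (_, Some(true)) => Some(true),
        (Some(false), Some(false)) => Some(false),
        _ => None,
    }
}

/// Three-valued AND: false wins, unknown otherwise unless both are true.
fn kleene_and(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// Some(true) means "proven: no row in this zone can match".
/// None means the stats needed for a proof were unavailable.
pub fn falsified(pred: &Predicate, zone: &ZoneStats) -> Option<bool> {
    match pred {
        // x > v is impossible when max <= v.
        Predicate::Gt(v) => zone.stat(Agg::Max).map(|max| max <= *v),
        // x < v is impossible when min >= v.
        Predicate::Lt(v) => zone.stat(Agg::Min).map(|min| min >= *v),
        // x = v is impossible when min > v or max < v.
        Predicate::Eq(v) => kleene_or(
            zone.stat(Agg::Min).map(|min| min > *v),
            zone.stat(Agg::Max).map(|max| max < *v),
        ),
        // A conjunction is falsified if either side is.
        Predicate::And(a, b) => kleene_or(falsified(a, zone), falsified(b, zone)),
        // A disjunction is falsified only if both sides are.
        Predicate::Or(a, b) => kleene_and(falsified(a, zone), falsified(b, zone)),
    }
}

/// A zone is skipped only on a definite proof.
pub fn can_prune(pred: &Predicate, zone: &ZoneStats) -> bool {
    falsified(pred, zone) == Some(true)
}
```

The three-valued logic is the point of the sketch: a zone that stores only max can still falsify an equality through the `max < v` branch, while an unknown result never prunes.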
Phase 4: Zoned Layout Migration
ZoneMapLayout to lower StatFn bound expressions against existing zone stats. (Teach zoned pruning to lower StatFn #7937)
StatsCatalog zoned pruning path once StatFn lowering covers existing behavior. (Teach zoned pruning to lower StatFn #7937)
Phase 5: Aggregate-Function Zoned Stats
WARNING: this is the phase that changes the ZonedLayout serialized form
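One way to picture the serialized-form change: legacy metadata carries zone_len plus a Stat bitset, while the new form carries zone_len plus a repeated string of aggregate descriptors that double as stats-table column names. The sketch below is illustrative only. The bit positions, the descriptor strings ("max", "min", and so on), and the struct and function names are all hypothetical and do not reflect the actual Vortex layout or Display output.

```rust
/// Hypothetical legacy stat tags as (bit index, descriptor string).
/// Neither the bit positions nor the strings are taken from Vortex.
const LEGACY_STATS: [(u32, &str); 4] = [
    (0, "max"),
    (1, "min"),
    (2, "null_count"),
    (3, "nan_count"),
];

/// Hypothetical legacy metadata: zone_len followed by a stat bitset.
#[derive(Debug, PartialEq)]
pub struct LegacyZonedMetadata {
    pub zone_len: u32,
    pub stat_bits: u8,
}

/// Hypothetical new metadata: zone_len and repeated descriptor strings.
#[derive(Debug, PartialEq)]
pub struct ZonedMetadata {
    pub zone_len: u32,
    pub present_aggregates: Vec<String>,
}

/// Translate a legacy bitset into built-in descriptor strings.
pub fn migrate(legacy: &LegacyZonedMetadata) -> ZonedMetadata {
    let present_aggregates = LEGACY_STATS
        .iter()
        .filter(|(bit, _)| (legacy.stat_bits & (1u8 << *bit)) != 0)
        .map(|(_, name)| (*name).to_string())
        .collect();
    ZonedMetadata {
        zone_len: legacy.zone_len,
        present_aggregates,
    }
}

/// Find the stats-table column for a requested descriptor.
/// None corresponds to lowering the requested stat to null.
pub fn lower_column(meta: &ZonedMetadata, requested: &str) -> Option<usize> {
    meta.present_aggregates
        .iter()
        .position(|descriptor| descriptor.as_str() == requested)
}
```

This sketch matches descriptors by exact string equality only; a fuller version would presumably also consult AggregateFnRef::can_satisfy(...) so that a sound approximate aggregate can stand in for the requested one.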
AggregateFnRef, not Stat enum values.
Display for AggregateFnRef as the descriptor string.
Stat only as a compatibility bridge for existing array stats and legacy zoned metadata.
zone_len followed by a legacy Stat bitset.
zone_len and present_aggregates: repeated string.
Stat bitsets into built-in aggregate descriptor strings.
stat(expr, aggregate_fn) at read time by matching aggregate-function descriptors in the zone stats table. (Use aggregate descriptors for zoned stats #7938)
AggregateFnRef::can_satisfy(...) when a stored aggregate is a sound exact or approximate substitute for the requested aggregate.
Stat enum values. (Use aggregate descriptors for zoned stats #7938)
Phase 6: Plugin Bloom Proof
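As a rough picture of what the Bloom proof involves, the sketch below builds a per-zone Bloom filter over UTF-8 values and uses it to falsify an equality predicate. It is a toy under stated assumptions, not the plugin itself: `BloomFilter`, `bloom_aggregate`, and `falsify_eq` are hypothetical names, the sizing parameters are arbitrary, and hashing uses the standard library's `DefaultHasher`, whose algorithm is not guaranteed to stay the same across Rust releases, so a persisted filter would need a stable hash instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical Bloom filter acting as the aggregate's result for one zone.
pub struct BloomFilter {
    bits: Vec<u64>,
    num_bits: u64,
    num_hashes: u32,
}

/// Hash a string together with a seed to get independent-looking hashes.
fn seeded_hash(value: &str, seed: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    seed.hash(&mut hasher);
    value.hash(&mut hasher);
    hasher.finish()
}

/// Double hashing: position_i = (h1 + i * h2) mod num_bits.
fn bit_positions(value: &str, num_bits: u64, num_hashes: u32) -> Vec<u64> {
    let h1 = seeded_hash(value, 0);
    let h2 = seeded_hash(value, 1) | 1; // force odd so the step is never zero
    (0..u64::from(num_hashes))
        .map(|i| h1.wrapping_add(i.wrapping_mul(h2)) % num_bits)
        .collect()
}

impl BloomFilter {
    pub fn new(num_bits: u64, num_hashes: u32) -> Self {
        assert!(num_bits > 0 && num_hashes > 0);
        let words = ((num_bits + 63) / 64) as usize;
        BloomFilter { bits: vec![0; words], num_bits, num_hashes }
    }

    pub fn insert(&mut self, value: &str) {
        for p in bit_positions(value, self.num_bits, self.num_hashes) {
            self.bits[(p / 64) as usize] |= 1u64 << (p % 64);
        }
    }

    /// False means "definitely absent"; true means "possibly present".
    pub fn might_contain(&self, value: &str) -> bool {
        bit_positions(value, self.num_bits, self.num_hashes)
            .into_iter()
            .all(|p| (self.bits[(p / 64) as usize] & (1u64 << (p % 64))) != 0)
    }
}

/// Aggregate step: fold every value of a zone into one filter.
pub fn bloom_aggregate<'a>(
    values: impl IntoIterator<Item = &'a str>,
    num_bits: u64,
    num_hashes: u32,
) -> BloomFilter {
    let mut filter = BloomFilter::new(num_bits, num_hashes);
    for value in values {
        filter.insert(value);
    }
    filter
}

/// Rewrite target for `col = value`: proven false when the filter says absent.
/// A missing filter (None) yields no proof, mirroring a null stat.
pub fn falsify_eq(filter: Option<&BloomFilter>, value: &str) -> Option<bool> {
    filter.map(|f| !f.might_contain(value))
}
```

Because a Bloom filter has no false negatives, `falsify_eq` can only return `Some(true)` for values that were never inserted, which is what makes it a sound pruning proof; false positives merely cost a missed pruning opportunity.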
bloom_might_contain(filter, value) scalar function.
Phase 7: Satisfaction Follow-Up
OR.
Phase 8: Cleanup
StatsCatalog pruning path.
Status
In progress.