Skip to content

chore(tesseract): Collapse logical plan into Query + LogicalPlan root#10921

Draft
waralexrom wants to merge 21 commits into
masterfrom
tesseract-logical-plan-collapse
Draft

chore(tesseract): Collapse logical plan into Query + LogicalPlan root#10921
waralexrom wants to merge 21 commits into
masterfrom
tesseract-logical-plan-collapse

Conversation

@waralexrom
Copy link
Copy Markdown
Member

Summary

Collapse-then-decompose refactor of the Tesseract logical plan: the
former zoo of body types (AggregateMultipliedSubquery, KeysSubQuery,
MeasureSubquery, DimensionSubQuery) merges into a single Query
node with QueryKind role marker and an explicit LogicalPlan root
that owns the flat CTE pool. Multi-stage dimension planning moves to
the leaves and goes through a unified DimensionSubqueryPlanner
dispatcher; the physical builder switches off its context-based
multi-stage-dim lookup in favour of explicit MultiStageDimensionRef
joins.

waralexrom added 21 commits May 19, 2026 21:06
Reshapes the cubesqlplanner logical plan around two ideas:
- collapse specialised body types (KeysSubQuery, MeasureSubquery,
  AggregateMultipliedSubquery, DimensionSubQuery, multi-stage
  Calculation/Aggregation) into a single `Query` over a uniform
  `QuerySource` (`LogicalJoin` | `FullKeyAggregate` | `PreAggregation`);
- decompose Query's role via a `QueryKind` enum
  (`TopLevelOverCtes` | `Stage(StageKind)` | `AggregateMultiplied`
  | `LeafOverJoin` | `InternalFact(FactKind)` | `PreAggregationLeaf`);
- introduce a `LogicalPlan { ctes, root: PlanNode }` root container —
  not part of `PlanNode` — that owns the WITH-clause pool. Every CTE
  body (multi-stage stages, multiplied keys/measures, ex-DSQ bodies)
  surfaces as a `LogicalMultiStageMember` whose `.body` is always a
  `Rc<LogicalPlan>`; tree walkers cross the boundary explicitly.

Migration touchpoints:
- DimensionSubQuery is gone; ex-DSQ dimensions flow as
  `MultiStageDimensionRef` (`OnPrimaryKeys` / `OnOuterDimensions`).
- Per-Query CTE refs live on `Query.multi_stage_dimensions`; CTE
  bodies live on the surrounding `LogicalPlan.ctes`.
- New `PlanProcessor` renders the WITH clause; `QueryProcessor`
  no longer emits CTEs.
- `PreAggregationOptimizer.try_optimize` and `OriginalSqlCollector`
  accept `Rc<LogicalPlan>` and walk it via `collect_cube_names_from_plan`.

Re-enables `test_propagated_with_category_filter` (previously ignored
due to DSQ ungrouped-measure wrapping). All 900 active tests pass.
`LogicalPlan.root` is now `Rc<Query>` (strict) — the previous
`PlanNode` root forced runtime guards every time a caller wanted
`.schema()` / `.modifers()` off the root.

`LogicalMultiStageMember.body` is now a `MultiStageMemberBody` enum:
  - `Plan(Rc<LogicalPlan>)` — Query-rooted bodies (leaf, DSQ, MS
    inode, multiplied keys/measures/agg-multiplied).
  - `TimeSeries(Rc<MultiStageTimeSeries>)` — time-axis scaffold.
  - `RollingWindow(Rc<MultiStageRollingWindow>)` — window CTE.

TimeSeries and RollingWindow no longer need an empty-CTE-pool
wrapper. Pre-agg and `PlanProcessor` dispatch on the enum directly;
`expect_query_root`-style runtime guards disappear.
`QueryPlanner` now exposes two entry points:
  - `plan() -> Rc<LogicalPlan>` — top-level, owns a fresh `CteState`,
    packs the result.
  - `plan_into(cte_state) -> Rc<Query>` — sub-plan that flattens
    every CTE it produces into the caller's pool, returns just the
    root Query.

DSQ and multi-stage leaf bodies switch to `plan_into`. Inner CTEs
(multiplied-measure keys/measure/agg-multiplied bodies for cross-cube
leaves) now live alongside outer CTEs in a single top-level pool —
no more nested `LogicalPlan` inside member bodies.

`MultiStageMemberBody::Plan(Rc<LogicalPlan>)` is gone — bodies are
`Query(Rc<Query>)` / `TimeSeries(...)` / `RollingWindow(...)`.

`PreAggregationOptimizer` becomes a graph-walk: starts at `plan.root`
FK refs, follows them by name through `plan.ctes`, dispatching by
each member body's role (Leaf → simple-match, WholeSubtree →
schema+root-filter match, PassThrough → keep + recurse, NoRewrite →
keep). After the walk, unreachable members (orphans of replaced
bodies, e.g. keys/measure CTEs of a replaced agg-multiplied) are
pruned from the result. This makes flattening safe — pre-agg no
longer double-counts inner CTEs as separate matches.
CTE naming is now uniformly `cte_N` via `CteState.next_cte_name()`.
Previously DSQ bodies used `"{cube}_{dim}_dimension_subquery"` and
multiplied KS/MS bodies used `"{cube}_keys_subquery_{seq}"` etc.; the
semantic names were neither read by consumers nor deduped reliably,
and the cross-planner `QueryTools.next_cte_seq_id` counter is gone.

DSQ body now projects its dimension as a synthetic `MeasureSymbol`
built off the dim's own `compiled_path` — the new helper
`MeasureSymbol::new_synthetic_from_dimension(dim)` returns a
Calculated measure whose `full_name`/`alias` match the dim's. Errors
if the dim has no `sql` (e.g. Geo) or its type doesn't map to a
`CalculatedMeasureType`.

That kills the `MemberExpression`-with-`expr:`-prefix round-trip the
DSQ planner used to build, which in turn collapses
`MultiStageDimensionRef.exposed` into `body_column`: now the body's
projection alias and the outer-scope full_name are the same symbol,
so substitution needs only one ref.

Dedup of identical CTE bodies (by `kind` + `eq_as_state`) deferred —
see phase-B backlog.
Cleaner field name now that the body's projection symbol and the
outer-scope reference are the same `MemberSymbol` by construction —
no need to distinguish "the column the body projects" from "the
member the outer scope reads".

Also drops the stale FIXME and commented-out DSQ-planning block in
`aggregate_subquery_plan` — DSQ for outer dims is correctly handled
inside `keys_query` (via `collect_sub_query_dimensions_from_symbols`
over the outer dim set); the outer Query{AggregateMultiplied} itself
never needs DSQ refs of its own.
Replaces the manual `for pk in dimensions { if !already_in { push }}`
with `outer_dims.chain(pk_dims).unique_by(|d| d.full_name())`.
Same semantics, fewer moving parts.
`Query.filter` and `Query.modifers` are `#[builder(default)]` now —
when a caller doesn't set them, the builder constructs an empty
`LogicalFilter` / `LogicalQueryModifiers` instead of failing.

The MeasureSubquery (`InternalFact(Measures)`) and the outer
`AggregateMultiplied` bodies were always passing
`Rc::new(LogicalFilter::default())` + `Rc::new(LogicalQueryModifiers::default())`
through explicitly — matching the old `MeasureSubquery` struct, which
had no such fields. Same for the time-series-get-range, Stage and
DimensionCalc bodies (modifers are set, filter is default), and for
the pre-agg-rewrite `schema_and_filter` shape (filter is set, modifers
is default).

Drop those redundant setters at the call sites; builder semantics
stay identical.
The classic-compiler idiom `measures: { foo: { type: 'number', sql:
'count(*)' } }` previously fell through to `MeasureKind::Calculated`
in the tesseract planner, so under a multiplied join the planner
spun up a full aggregate-multiplied pipeline (keys/measure CTEs,
fk_aggregate). The measure CTE ended up projecting `pk` next to
`count(*)` without a GROUP BY, which Postgres rejects with
`column "..." must appear in the GROUP BY clause`.

Mirror the classic-compiler hack: detect the `count(*)` literal and
promote the measure to `MeasureKind::Count` with the cube's
primary-key sqls behind it. From that point on the regular Count
path applies — additive in multiplied joins, rendered as
`COUNT(DISTINCT pk)` when needed.

- New `SqlCall::is_count_star()` — true only for an empty-dependency,
  empty-filter-param/group, empty-security-context `SqlCall` whose
  template is the literal `count(*)` (case-insensitive, trim-safe).
- `MeasureKind::from_type_str` promotes a `number` + `count(*)` shape
  to `Count(Auto(pk_sqls))`.
- `MeasureSymbolFactory.build` now collects `pk_sqls` when the SQL is
  absent OR is `count(*)`, so the promoted Count has the keys it
  needs for `COUNT(DISTINCT pk)`.

Repro test: `tests::integration::chained_subquery::
test_or_and_filter_with_cross_cube_sub_query_dim` — mirrors the JS
sql-generation-logic.test.ts "where filter with operators OR & AND"
test on a visitors → visitor_checkins → cards chain with a
sub_query dim and a count(*) measure.
…er_references

The OnPrimaryKeys DSQ join was built as
  `cte_ref.pk_alias = Expr::new_member(pk_dim)`.
RHS, a `MemberExpression`, renders via `render_references`. By the
time the join condition was rendered, the outer SELECT had already
resolved the same PK dim through `ReferencesBuilder` and — since
`SingleSource::Cube` returns no column ref — picked it up from the
joined CTE's `TableReference` schema, registering
`pk_dim → cte.pk_alias` in `render_references`. The ON then
collapsed to `cte.pk_alias = cte.pk_alias`, a self-equal that
turned the LEFT JOIN into a CTE × cube cartesian and broke any
subsequent join condition referencing the DSQ dim.

Build each side per-source instead — same pattern the
full-key-aggregate strategies use:
- CTE side: `Expr::Reference(cte_alias, cte_schema.resolve_member_alias(dim))`.
- Cube side: `Expr::Reference(cube_alias, dim.name())` — direct
  column reference, bypassing `render_references` entirely.

Cube alias is threaded through `add_multi_stage_dimension_pk_join`
from `LogicalJoinProcessor`, where it's already known via
`default_alias_with_prefix`.

Repro test: `tests::integration::subquery_in_join::
test_sub_query_dim_in_join_condition` — mirrors the JS
sub-query-dimensions.test.ts "inserted at the right place of a
join" test with B → C join filtered by a sub_query dim from A.
When a measure is queried through a view, the view's `join_path:` is
baked into the underlying member's path during construction (e.g. a
view that exposes `child_val_avg` via `join_path:
many_to_one_root.many_to_one_child` produces an underlying measure
symbol with `path=[root, child]`, even though `is_view=false`). The
view-wrapper sits one level above and carries `is_view=true`;
`resolve_reference_chain` drops to the underlying, picking up its
inflated path on the way.

`collect_join_hints_for_measures` then pushes
`Vector([root, child])`, and `aggregate_subquery_measure` built the
MeasureSubquery's source `FROM root LEFT JOIN child` — multiplying
child rows back through root and breaking the very dedup the
subquery exists to do (e.g. `child.val_avg` over (foo, foo) came
back as 166.66 instead of 200).

Strip the join-chain prefix from each measure before collecting its
hints. Same helper as in `assert_measures_not_multiplied`. After
that the MeasureSubquery source is `FROM child` alone, matching
what the classic compiler emits for the same shape.

Repro test: `test_many_to_one_view_one_sum_multiplied` in
`member_expressions_on_views.rs` — mirrors the JS
member-expressions-on-views.test.ts "many_to_one_view > one_sum"
case with a multiplied seed.
When a multiplied measure list contains a member-expression with an
inner aggregate (e.g. `COUNT(DISTINCT {view.child_test_dim})`), the
MeasureSubquery CTE used to emit the expression verbatim, producing
invalid SQL (aggregate without GROUP BY) and breaking outer
re-aggregation by row-multiplying through the join chain.

The MeasureSubquery now splits member-expressions into their
dim/measure deps via `get_dependencies` + `resolve_reference_chain`,
projecting dim-deps as plain dimension columns and measure-deps
alongside the native measures. The member-expression itself stays on
the outer `Query{AggregateMultiplied}` schema; `QueryProcessor`
resolves render-references for it explicitly so its inner dim refs
bind to the CTE columns (`q_N.cube__col`) instead of the raw source
table.
Replaces the flat `Vec<LogicalMultiStageMember>` in `CteState` with
typed `CteEntry { role, members, state, body }` records. Each CTE
now carries:
- `CteRole` (Keys / MultiStageDimension / MultiStageMeasure /
  FactMeasure / MultipliedMeasureSubquery) — DSQ folds into
  MultiStageDimension as a degenerate one-level case;
- the members it projects (Vec, since fact/multiplied CTEs cover
  several measures at once);
- the `Rc<QueryProperties>` state it was planned under.

`next_cte_name(role)` takes a role and pulls from a per-kind counter,
so generated aliases are `keys_0`, `fact_measure_0`,
`multi_stage_measure_0`, etc. `add_member(...)` returns the final
CTE name — `body.name` for fresh entries, the existing entry's name
on same-body-name dedup — so callers can't pin a name that won't
appear in the WITH clause. `find_matching(role, members, state)`
exposes a `(role, members, state)` lookup over
`QueryProperties::eq_as_state` + member-set comparison; no callers
yet, wired up for future planner-level dedup.

All ten call-sites (multiplied x4, dim_subq, multi_stage tree x5) pass
role + members + state and use the returned name for downstream
refs.
Stores the consumer-facing `Rc<MultiStageSubqueryRef>` alongside each
CTE in `CteState` so future dedup hits (via `find_matching`) can hand
back the same ref a fresh builder would have produced. Required for
the upcoming bottom-up rewrite of the multi-stage planner where the
recursion returns refs directly instead of going through a
description tree + second-pass render.

All six `add_member` call-sites (multiplied x4, dim_subq, multi_stage
descr loop) build the ref up-front and pass it in. Existing behaviour
is unchanged — `cte_ref` is not consumed by anything yet.
Replaces the two-pass multi-stage planner (build a tree of
`MultiStageQueryDescription`s, then render each into a
`LogicalMultiStageMember` in a second loop) with a single bottom-up
recursion in `MultiStageQueryPlanner::build_multi_stage_cte` that
returns `Rc<MultiStageSubqueryRef>` directly. The body is rendered
and registered in `cte_state` on the way back up each stack frame;
dedup is now `cte_state.find_matching(role, [member], state)`.

`MultiStageMemberQueryPlanner` is gone — its render methods are now
free `pub(crate) fn build_*` in `member_query_planner.rs`, taking
explicit parameters and `&[Rc<MultiStageSubqueryRef>]` for children.
`MultiStageQueryDescription` (and `query_description.rs`) is deleted
along with the description tree; the time-series / time-series-range
literal-alias cache (`descriptions.iter().find(d.alias() == ...)`)
migrates to a local `TimeSeriesCache` on the planner.

Three bucketing tests with chained / multi-dim multi-stage dimension
setups are `#[ignore]`'d: the bottom-up
`child.schema().dimensions` union in `build_for_cte_dimension_query`
doesn't yet match the old `collect_all_non_multi_stage_dimension`
walk. Tracked in FIXME comments at the test sites; everything else
(917 tests) stays green.
…nner dispatcher

Collector now surfaces both `sub_query: true` and `multi_stage: true`
dimensions; DimensionSubqueryPlanner dispatches on the flavour and
currently only builds DSQ — multi-stage path will follow.

No behavior change: top-level MultiStageQueryPlanner still owns the
multi-stage dim build, and the dispatcher silently skips them.
DimensionSubqueryPlanner now owns the multi-stage dim build path via
the dispatcher introduced in the previous commit: when a dim is
flagged multi_stage, it delegates to
MultiStageQueryPlanner::build_multi_stage_dim_ref and returns a
MultiStageDimensionRef with the OnOuterDimensions join placeholder.

Top-level MultiStageQueryPlanner.plan_queries now skips dimension-
flavoured multi-stage members; only measures are walked there.
Physical builder is unchanged — multi-stage dim joins still go
through the schema().multi_stage_dimensions() walk.
…ns JOIN

Multi-stage dim CTEs now LEFT-JOIN onto the cube-join chain through
the MultiStageDimensionRef::OnOuterDimensions descriptor — populated
in build_multi_stage_dim_ref via the existing
LogicalSchema::multi_stage_join_dimensions. LogicalJoinProcessor
partitions ms-dim refs by join shape and applies pk vs outer accordingly.

Removes the legacy combined-key context lookup:
- MultiStageDimensionContext + the multi_stage_dimension_schemas /
  multi_stage_dimensions slots on PushDownBuilderContext.
- remove_multi_stage_dimensions / add_multi_stage_dimension /
  get_multi_stage_dimensions / add_multi_stage_dimension_schema.
- The schema-walk hack in QueryProcessor + DimensionCalc registration
  in PlanProcessor.

QueryProcessor still drops ms-dim refs already exposed by the first
FullKeyAggregate data input to avoid double-joining. Three bucketing
tests with chained multi-stage dims remain ignored — separate
recursive-walk regression in build_for_cte_dimension_query.
…tate

ref_for_member rebuilt the LogicalSchema from the upper state, so
extension dims added by deeper `add_group_by` directives never reached
outer consumers — chained multi-stage dim CTEs lost their join keys.
Replace with ref_for_body, which takes the just-rendered
LogicalMultiStageMember and copies its body's own output schema (via
new MultiStageMemberBody::schema accessor).

Knocks out two of the three bucketing regressions
(with_dimension_over_complex_dimension, with_two_dimensions).
with_two_dims_concated remains ignored — separate
pk_aggregate_keys_source UNION ALL column-set mismatch when chained
ms-dim CTEs project disjoint dim sets.
…hape

When a multi-stage dimension's only multi-stage dependencies are other
multi-stage *dimensions* (no measure children), build its body as a
leaf-style query over the cube join with ms-dim CTEs attached as
OnOuterDimensions LEFT JOINs, rather than a FullKeyAggregate UNION over
heterogeneous dim-child schemas (which can't merge disjoint dim sets
across cube boundaries).

`build_for_cte_dim_only_query` in member_query_planner; routing in
build_multi_stage_cte via `dim_only_deps`/`build_dim_only_child_refs`.

Third bucketing test (`with_two_dims_concated`) advances past the UNION
mismatch but trips on a render-references issue (`customer_id` resolved
from LEFT-joined ms-dim CTE pollutes cube JOIN ON). Remains ignored
with FIXME describing the new failure.
…-only DimensionCalc

For StageKind::DimensionCalc with a cube-join source (the new dim-only
shape), only resolve render-references for the stage's own
multi_stage_dimension. Extension/payload dims (e.g. `orders.customer_id`)
projected for consumer JOIN keys are projected without resolving:
otherwise `references_builder` finds them in the LEFT-joined ms-dim CTE
first (cube sources return None), sets a render_reference, and the
inner cube JOIN ON ends up rendered as
`ms_dim_X.col = ms_dim_Y.col` instead of `cube.col = cube.col`.

Unignores `test_bucketing_with_two_dims_concated`.
…into FullKeyAggregate path

Mirrors the JS `Calc-Groups > source product_category cross join` case
as a Rust integration test (currently ignored — see FIXME).

Plumbing to advance the case beyond "no FROM clause at all":

- FullKeyAggregateQueryPlanner now runs DimensionSubqueryPlanner over
  the outer query's dimensions and populates Query.multi_stage_dimensions.
  Without this, queries that have only multi-stage dimensions and no
  measures (top-level MultiStageQueryPlanner filters dims out) never
  triggered DSQ planner anywhere.
- YAML test-fixture view parser now supports inline `dimensions:` and
  `measures:` blocks, matching the JS view shape (cubes: becomes
  optional). Switch dims / multi-stage case-switch dims can now live
  directly on a view without an artificial host cube.

Test advances from "column 'source' does not exist" (no SQL) to
"column 'source_b__product_category' does not exist" (disjoint column
sets in case-switch branch CTEs hitting the FullKeyAggregate UNION).
FIXME describes the remaining gap.
@github-actions github-actions Bot added the rust Pull requests that update Rust code label May 20, 2026
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 20, 2026

Claude finished @waralexrom's task in 3m 13s —— View job


Review: Collapse logical plan into Query + LogicalPlan root

  • Gather context and understand the refactoring scope
  • Review new core types (Query, QueryKind, LogicalPlan)
  • Review logical plan changes and deleted types
  • Review physical plan builder changes
  • Review planner changes (dimension subquery, multi-stage, multiplied measures)
  • Review test coverage and new integration tests
  • Post comprehensive review

This is a well-structured collapse-then-decompose refactoring that consolidates the former zoo of logical plan body types (AggregateMultipliedSubquery, KeysSubQuery, MeasureSubquery, DimensionSubQuery) into a single Query node with an explicit QueryKind role marker and a flat LogicalPlan root owning the CTE pool. Overall, the design is sound and the code quality is high. Below are specific observations and suggestions.


Architecture & Design — Positive

  1. Explicit role marking via QueryKind is a major clarity win. The previous approach inferred roles from structural predicates (source / multistage_members shape); the new enum variants (TopLevelOverCtes, Stage(StageKind), AggregateMultiplied, LeafOverJoin, InternalFact(FactKind), PreAggregationLeaf) make every Query's pipeline position self-documenting.

  2. Flat CTE pool (LogicalPlan.ctes + CteState accumulator) eliminates nested pool bookkeeping. The drain_subquery_refs_from(baseline) pattern for scoped sub-planner calls (query_planner.rs:56-68) is clean and prevents inner refs from leaking.

  3. Unified MultiStageDimensionRef merging DSQ and multi-stage-dim patterns under one descriptor (OnPrimaryKeys / OnOuterDimensions) removes a whole category of join-wiring duplication in the physical builder.

  4. LogicalNode trait + PlanNode enum + match_plan_node! macro gives a solid foundation for generic tree traversal. The multi_stage_kind() method on PlanNode is a nice semantic-classification hook for passes that need leaf/stage distinction.


Issues & Suggestions

1. Typo: modifersmodifiers (persistent across codebase)

The field name modifers (missing an i) is used consistently throughout the PR:

  • query.rs:24modifers: Rc<LogicalQueryModifiers>
  • query.rs:46-47 — accessor fn modifers()
  • query.rs:58with_modifers
  • All consumer sites (query.rs, multiplied_measures_query_planner.rs:439,499, etc.)

This appears to be a pre-existing typo carried forward. Since this is a large refactor that introduces these fields freshly, now would be the ideal time to fix it before it spreads further. A codebase-wide rename of modifersmodifiers would prevent confusion.

Fix this →

2. CteRole::MultipliedMeasureSubquery conflation risk (cte_state.rs:21-23)

The doc comment on CteRole::MultipliedMeasureSubquery acknowledges that both the inner MeasureSubquery CTE and the outer Query{AggregateMultiplied} CTE share the same role, relying on member-set divergence to prevent dedup conflation:

/// `MultipliedMeasureSubquery` — MS-CTE (raw projection of dedup-by-pk
/// measures) and the outer `Query{AggregateMultiplied}` CTE that
/// stitches it to KS-CTE. They share the role today; the outer's
/// member-set differs from the inner's so dedup wouldn't conflate
/// them in practice.

This is fragile — if a future change causes the member sets to align, the dedup will silently produce incorrect plans. Consider splitting into two distinct roles (MultipliedMeasureInner / MultipliedMeasureOuter or similar) to make the invariant structural rather than relying on a coincidence.

3. find_matching() has no callers (cte_state.rs:152)

The doc comment says "Returned for future dedup wiring — no callers yet." However, find_matching is called from multi_stage_query_planner.rs:214:

if let Some(existing) = cte_state.find_matching(role, &[member.clone()], &state) {
    return Ok(existing.cte_ref.clone());
}

The doc comment is stale — it should be updated to reflect the actual usage. (Cosmetic, low priority.)

4. FullKeyAggregate::with_inputs reconstructs via builder but ignores semantic identity

In full_key_aggregate.rs:108-119, with_inputs checks for exactly 0 inputs then reconstructs itself through the builder. This is correct for tree-rewrite passes, but the reconstruction clones every field. Since FullKeyAggregate is a leaf in the LogicalNode tree (no child PlanNodes), the with_inputs(vec![]) call effectively returns a clone. Consider just returning self directly (via Ok(self)) when inputs are empty — avoids the allocation.

5. Missing Debug derive on several new types

MultiStageDimensionRef has #[derive(Debug)] but LogicalPlan, Query, QueryKind, FullKeyAggregate, and MultiStageSubqueryRef don't. While PrettyPrint is the primary debugging mechanism, Debug is useful for unwrap() / expect() error messages and assert_eq! diagnostics in tests. Consider adding Debug derives to the core types.

6. let _ = me; dead binding in split_member_expression_deps (multiplied_measures_query_planner.rs:322)

let _ = me;

This line has no effect — me was already moved into the if let binding and the for dep loop consumed it. This appears to be a leftover from development. It can be removed.

Fix this →

7. Pre-aggregation optimizer: NotMatched on any single CTE rolls back all usages

In optimizer.rs:226-229, if any CTE body fails to match a pre-aggregation, the optimizer truncates all usages and returns None for the entire plan:

CteRewriteOutcome::NotMatched => {
    self.usages.truncate(saved_usages_len);
    self.usage_counter = saved_counter;
    return Ok(None);
}

This all-or-nothing semantics is intentional (you can't partially rewrite a CTE graph), but it's worth documenting explicitly in the method doc that partial pre-agg matches are rejected — this isn't obvious from the function signature alone.

8. PhysycalNode typo in trait

The associated type is spelled PhysycalNode (missing i) in LogicalNodeProcessor:

type PhysycalNode = Rc<Select>;

This is pre-existing, but since the PR touches these lines, it could be corrected to PhysicalNode. (Cosmetic.)


Performance

The refactoring doesn't introduce new algorithmic bottlenecks. Key observations:

  • members_equivalent in cte_state.rs:165-179 sorts two Vec<String> on every find_matching call. For the typical multi-stage tree (< 20 CTEs), this is fine. If CTE counts grow significantly, consider normalizing member names at registration time.

  • Rc-heavy cloning is prevalent but matches the existing codebase idiom and avoids the complexity of lifetime-threaded references in a tree-rewriting context. No regression here.


Security

No security concerns. This is internal query-planning logic with no user-facing input handling beyond what the existing framework already validates. SQL generation goes through the existing template/parameterization pipeline.


Testing

Good coverage for a refactoring PR:

  • 3 new integration test files (chained_subquery.rs, subquery_in_join.rs, calc_groups.rs) covering DSQ in complex filters, DSQ in join conditions, and calc-group cross-joins.
  • Extended combinations.rs with test_aggregating_cross_cube_measure_with_fanout_dim (correctly #[ignore]d with a detailed FIXME).
  • New many_to_one_view_one_sum_multiplied snapshot test for member-expression multiplied-measure dedup.
  • SQL seed files + YAML schemas are well-structured for reproducibility.
  • Snapshot tests (insta) validate output stability.

Observation: The test_source_product_category_cross_join test in calc_groups.rs is #[ignore]d with a FIXME about disjoint column sets in FullKeyAggregate UNION ALL. This is a known limitation — consider tracking it as a follow-up issue if not already done.


Summary

This is a high-quality structural refactoring that significantly improves the readability and maintainability of the Tesseract logical plan. The QueryKind enum, flat CTE pool, and unified dimension-ref model are well-designed. The main actionable items are:

Priority Item
Low Fix modifersmodifiers typo (widespread)
Low Split CteRole::MultipliedMeasureSubquery into two roles
Trivial Remove dead let _ = me; binding
Trivial Update stale find_matching doc comment
Trivial Fix PhysycalNodePhysicalNode typo

None of these are blocking. The PR is well-structured and ready for merge from a code quality perspective.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.53%. Comparing base (3477776) to head (e551a35).
⚠️ Report is 10 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10921      +/-   ##
==========================================
+ Coverage   78.87%   83.53%   +4.65%     
==========================================
  Files         470      254     -216     
  Lines       92483    75869   -16614     
  Branches     3449        0    -3449     
==========================================
- Hits        72948    63374    -9574     
+ Misses      19031    12495    -6536     
+ Partials      504        0     -504     
Flag Coverage Δ
cube-backend ?
cubesql 83.53% <ø> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rust Pull requests that update Rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant