You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
leftJoin onto a subquery that itself contains a join returns 0 / undefined join key at initial materialization when the subquery source is indexed and its inner join has an unmatched row (incomplete #501) #1590
I've validated the bug against the latest version of DB packages (@tanstack/db@0.6.8, currently latest)
Describe the bug
A live query of the shape
from(parent).leftJoin({ sub: <subquery that itself contains an inner join> }, …)
returns an empty initial snapshot for the subquery — every parent row joins against undefined — when both of these hold:
the subquery's driving (left) collection is indexed on its join key (BTreeIndex, or equivalently autoIndex: 'eager'), and
the subquery's inner join has at least one unmatched right-side row (a right-side row that no left row joins to).
Drop the index, or remove the unmatched row, and the result is correct. The subquery in isolation is always correct (its own toArray returns the right rows) — only the outer leftJoin's view of the subquery's initial snapshot is wrong.
This is the same machinery as #494 ("Subquery doesn't work in leftJoin clause with auto indexing"), which was fixed by #501. #501's fix and tests cover a subquery that is a single filtered collection on the right of a leftJoin (from(c).where(…)). The bug remains for a subquery that itself contains a join, in the specific case where that inner join has an unmatched right-side row. The #494 symptom triad still holds for this case: leftJoin fails, innerJoin works, autoIndex: 'off' (no index) works.
Reproduction
localOnly collections + explicit BTreeIndex, all data present before the query is built. Pasteable as-is (uses only public @tanstack/db API); adjust imports for the in-repo suite. The two BUG tests currently fail; the three CONTROL tests pass.
import{describe,expect,it}from'vitest';import{BTreeIndex,coalesce,createCollection,createLiveQueryCollection,eq,localOnlyCollectionOptions,sum,typeCollection}from'@tanstack/db';// Reproduces against @tanstack/db@0.6.8.//// Shape: from(parent).leftJoin({ sub: <a subquery that itself contains a join> }).// When the subquery's driving collection carries a BTreeIndex on its join key AND// the subquery's inner join has at least one UNMATCHED right-side row, the outer// leftJoin sees an EMPTY initial snapshot of the subquery — every parent row joins// against `undefined`. Drop the index OR remove the unmatched row and it is correct.typeParent={id: string;name: string};typeChild={id: string;parent_id: string};typeGrandchild={id: string;parent_id: string;value: number};// Flush the IVM pipeline. The result is stable after the first tick (not a timing artifact).constsettle=async()=>{for(leti=0;i<5;i++)awaitnewPromise((r)=>setTimeout(r,0));};functionmk<Textends{id: string}>(prefix: string,indexParentId: boolean){constc=createCollection(localOnlyCollectionOptions<T,string>({id: `${prefix}-${crypto.randomUUID()}`,getKey: (i)=>i.id}));if(indexParentId){(casCollection<{id: string;parent_id: string},string>).createIndex((r)=>r.parent_id,{indexType: BTreeIndex});}returnc;}/** Outer leftJoin(parent, <subquery containing an inner join>). `grouped` toggles the aggregated shape. */functionbuildWrapped(parents: Collection<Parent,string>,children: Collection<Child,string>,grandchildren: Collection<Grandchild,string>,grouped: boolean){returncreateLiveQueryCollection({id: `wrapped-${crypto.randomUUID()}`,startSync: true,query: (q)=>{constbase=q.from({gc: grandchildren}).innerJoin({child: children},({ gc, child })=>eq(gc.parent_id,child.id));constsub=grouped
? base.groupBy(({ child })=>child.parent_id).select(({ gc, child })=>({parentId: child.parent_id,total: sum(gc.value)}))
: base.select(({ gc, child })=>({parentId: child.parent_id,total: gc.value}));returnq.from({parent: parents}).leftJoin({ sub },({ parent, sub })=>eq(sub.parentId,parent.id)).select(({ parent, sub })=>({id: parent.id,total: coalesce(sub?.total,0)}));}});}consttotals=(j: {toArray: {id: string;total: number}[]})=>Object.fromEntries(j.toArray.map((r)=>[r.id,r.total]));// Minimal data: 1 parent, 2 children (c1 has a grandchild, c2 is unmatched), 1 grandchild.constPARENTS: Parent[]=[{id: 'p1',name: 'One'}];constCHILDREN: Child[]=[{id: 'c1',parent_id: 'p1'},{id: 'c2',parent_id: 'p1'}// unmatched — never referenced by any grandchild];constGRANDCHILDREN: Grandchild[]=[{id: 'g1',parent_id: 'c1',value: 0.5}];functionsetup(indexGrandchildren: boolean,children=CHILDREN,grandchildren=GRANDCHILDREN){constparents=mk<Parent>('parents',false);constkids=mk<Child>('children',false);constgrandkids=mk<Grandchild>('grandchildren',indexGrandchildren);parents.insert(PARENTS);kids.insert(children);grandkids.insert(grandchildren);return{ parents, kids, grandkids };}describe('leftJoin onto an indexed subquery-with-join drops its initial snapshot',()=>{it('CONTROL — unindexed, unmatched row present → correct',async()=>{const{ parents, kids, grandkids }=setup(false);constj=buildWrapped(parents,kids,grandkids,false);awaitsettle();expect(totals(j)).toEqual({p1: 0.5});});it('CONTROL — indexed, every row matched (no unmatched row) → correct',async()=>{const{ parents, kids, grandkids }=setup(true,[{id: 'c1',parent_id: 'p1'},{id: 'c2',parent_id: 'p1'}],[{id: 'g1',parent_id: 'c1',value: 0.5},{id: 'g2',parent_id: 'c2',value: 0.3}]);constj=buildWrapped(parents,kids,grandkids,true);awaitsettle();expect(totals(j)).toEqual({p1: 0.8});});it('BUG — indexed + unmatched row → returns 0 (expected 0.5)',async()=>{const{ parents, kids, grandkids }=setup(true);constj=buildWrapped(parents,kids,grandkids,false);awaitsettle();// Currently FAILS: totals(j) === { p1: 0 }. The outer leftJoin output row is keyed// "[p1,undefined]" — it never sees the subquery's (correct) p1 row.expect(totals(j)).toEqual({p1: 0.5});});it('BUG — aggregated shape (groupBy + sum), indexed + unmatched row → returns 0',async()=>{const{ parents, kids, grandkids }=setup(true);constj=buildWrapped(parents,kids,grandkids,true);awaitsettle();expect(totals(j)).toEqual({p1: 0.5});// currently FAILS with { p1: 0 }});it('the subquery in isolation is correct even when indexed (only the outer join is wrong)',async()=>{const{ kids, grandkids }=setup(true);constsub=createLiveQueryCollection({id: `sub-${crypto.randomUUID()}`,startSync: true,query: (q)=>q.from({gc: grandkids}).innerJoin({child: kids},({ gc, child })=>eq(gc.parent_id,child.id)).select(({ gc, child })=>({parentId: child.parent_id,total: gc.value}))});awaitsettle();expect(sub.toArray.map((r)=>r.total)).toEqual([0.5]);// passes});});
Expected behavior
Either of these would resolve it:
Indexed and unindexed must return identical results. The BUG tests should produce { p1: 0.5 }, matching the unindexed control. (An index is a performance optimization; it must not change query results.)
If there is a legitimate constraint that prevents a subquery-with-a-join on the right of a leftJoin from being lazily loaded, throw/warn rather than silently returning an empty subquery snapshot (right now it fails silently — coalesce(undefined, 0) masks it as a plausible 0).
Actual behavior
Indexed + unmatched row:{ p1: 0 } (both the plain and the groupBy+sum shape).
Unindexed (same data):{ p1: 0.5 }.
Indexed, every row matched:{ p1: 0.8 } (correct).
Causal trace (observed, not inferred)
Logging the wrapped join's output row vs. the subquery's own toArray, with the minimal data + only the driving collection indexed:
So the subquery materializes the correct p1 row, but the outer leftJoin's output is keyed [p1, undefined] — it joined p1 against undefined, i.e. it never received the subquery's initial snapshot. With no index the same row is keyed [p1, p1] and is correct.
Minimization (each retained element is load-bearing — verified by removal)
Index: only an index on the subquery's driving (left) collection's join key is required. With that single index the bug fires; removing it makes it correct. Indexes on the parent or on the inner-join's right collection are irrelevant.
Unmatched inner-join row (c2, a child with no grandchild) is required. With every child matched, the result is correct regardless of the index. The unmatched row need not even belong to the same parent — it triggers the bug pointing at any/ghost parent.
groupBy / sum are NOT required — a plain (non-aggregated) subquery drops identically. The aggregated shape is included only because it is the real-world shape that surfaced this.
Progressive sync: nested toArray subqueries skip the fast-path snapshot #1533 ("Progressive sync: nested toArray subqueries skip the fast-path snapshot") — same theme (a subquery's initial snapshot not reaching the parent), but distinct: that is progressive-mode + toArray includes and the data is correct, just slow; here the data is wrong (silently 0) in eager/localOnly mode and is index-triggered.
Pre-aggregate the subquery into a dedicated one-row-per-key collection (maintained via subscribeChanges on the source collections) and do a plain keyed leftJoin onto that single-source collection instead of onto a subquery-with-a-join. A plain-collection right side avoids the lazy-indexed-subquery path entirely.
Notes on scope / honesty
The minimal repro above is deterministic: data inserted, then the query built, with the result stable across several macrotask ticks (not a flush-timing artifact). This first surfaced in a real app through a streaming trigger instead (the query was built before eagerly-synced data arrived), which I could not collapse into the same localOnly repro — but the all-present localOnly case above is a clean, self-contained reproduction tied directly to the #494/#501 lazy-indexed-subquery machinery, so the streaming detail is corroborating context rather than load-bearing.
@tanstack/db@0.6.8, currentlylatest)Describe the bug
A live query of the shape
returns an empty initial snapshot for the subquery — every parent row joins against
undefined— when both of these hold:BTreeIndex, or equivalentlyautoIndex: 'eager'), andDrop the index, or remove the unmatched row, and the result is correct. The subquery in isolation is always correct (its own
toArrayreturns the right rows) — only the outerleftJoin's view of the subquery's initial snapshot is wrong.This is the same machinery as #494 ("Subquery doesn't work in leftJoin clause with auto indexing"), which was fixed by #501. #501's fix and tests cover a subquery that is a single filtered collection on the right of a
leftJoin(from(c).where(…)). The bug remains for a subquery that itself contains a join, in the specific case where that inner join has an unmatched right-side row. The #494 symptom triad still holds for this case:leftJoinfails,innerJoinworks,autoIndex: 'off'(no index) works.Reproduction
localOnlycollections + explicitBTreeIndex, all data present before the query is built. Pasteable as-is (uses only public@tanstack/dbAPI); adjust imports for the in-repo suite. The twoBUGtests currently fail; the threeCONTROLtests pass.Expected behavior
Either of these would resolve it:
BUGtests should produce{ p1: 0.5 }, matching the unindexed control. (An index is a performance optimization; it must not change query results.)leftJoinfrom being lazily loaded, throw/warn rather than silently returning an empty subquery snapshot (right now it fails silently —coalesce(undefined, 0)masks it as a plausible0).Actual behavior
{ p1: 0 }(both the plain and thegroupBy+sumshape).{ p1: 0.5 }.{ p1: 0.8 }(correct).Causal trace (observed, not inferred)
Logging the wrapped join's output row vs. the subquery's own
toArray, with the minimal data + only the driving collection indexed:So the subquery materializes the correct
p1row, but the outerleftJoin's output is keyed[p1, undefined]— it joinedp1againstundefined, i.e. it never received the subquery's initial snapshot. With no index the same row is keyed[p1, p1]and is correct.Minimization (each retained element is load-bearing — verified by removal)
c2, a child with no grandchild) is required. With every child matched, the result is correct regardless of the index. The unmatched row need not even belong to the same parent — it triggers the bug pointing at any/ghost parent.groupBy/sumare NOT required — a plain (non-aggregated) subquery drops identically. The aggregated shape is included only because it is the real-world shape that surfaced this.leftJoinfails whileinnerJoinreturns the correct row — matching Subquery doesn't work in leftJoin clause with auto indexing #494's reported workaround.Relationship to existing issues
innerJoinworks,autoIndex: 'off'works) match this report exactly. Fix optimized join missing results #501's added tests (join-subquery.test.ts→ "subquery in JOIN clause - left join") cover a subquery that is a single filtered collection; this report is the case where the subquery itself contains a join and that join has an unmatched row. So this looks like an incomplete fix / regression of the same lazy-indexed-subquery path. (I could not tell from the outside whether Fix optimized join missing results #501 ever intended to cover the nested-join case — flagging it rather than asserting.)toArraysubqueries skip the fast-path snapshot") — same theme (a subquery's initial snapshot not reaching the parent), but distinct: that is progressive-mode +toArrayincludes and the data is correct, just slow; here the data is wrong (silently0) in eager/localOnlymode and is index-triggered.orderBymisses same-key value updates (TopKState.processElement) #1489 (chained live-queryorderBymisses same-key updates) — same general area (join/subquery update propagation), not a duplicate.Workaround
Pre-aggregate the subquery into a dedicated one-row-per-key collection (maintained via
subscribeChangeson the source collections) and do a plain keyedleftJoinonto that single-source collection instead of onto a subquery-with-a-join. A plain-collection right side avoids the lazy-indexed-subquery path entirely.Notes on scope / honesty
The minimal repro above is deterministic: data inserted, then the query built, with the result stable across several macrotask ticks (not a flush-timing artifact). This first surfaced in a real app through a streaming trigger instead (the query was built before eagerly-synced data arrived), which I could not collapse into the same
localOnlyrepro — but the all-presentlocalOnlycase above is a clean, self-contained reproduction tied directly to the #494/#501 lazy-indexed-subquery machinery, so the streaming detail is corroborating context rather than load-bearing.