Skip to content

leftJoin onto a subquery that itself contains a join returns 0 / undefined join key at initial materialization when the subquery source is indexed and its inner join has an unmatched row (incomplete #501) #1590

@vibl

Description

@vibl
  • I've validated the bug against the latest version of DB packages (@tanstack/db@0.6.8, currently latest)

Describe the bug

A live query of the shape

from(parent).leftJoin({ sub: <subquery that itself contains an inner join> }, …)

returns an empty initial snapshot for the subquery — every parent row joins against undefined — when both of these hold:

  1. the subquery's driving (left) collection is indexed on its join key (BTreeIndex, or equivalently autoIndex: 'eager'), and
  2. the subquery's inner join has at least one unmatched right-side row (a right-side row that no left row joins to).

Drop the index, or remove the unmatched row, and the result is correct. The subquery in isolation is always correct (its own toArray returns the right rows) — only the outer leftJoin's view of the subquery's initial snapshot is wrong.

This is the same machinery as #494 ("Subquery doesn't work in leftJoin clause with auto indexing"), which was fixed by #501. #501's fix and tests cover a subquery that is a single filtered collection on the right of a leftJoin (from(c).where(…)). The bug remains for a subquery that itself contains a join, in the specific case where that inner join has an unmatched right-side row. The #494 symptom triad still holds for this case: leftJoin fails, innerJoin works, autoIndex: 'off' (no index) works.

Reproduction

localOnly collections + explicit BTreeIndex, all data present before the query is built. Pasteable as-is (uses only public @tanstack/db API); adjust imports for the in-repo suite. The two BUG tests currently fail; the three CONTROL tests pass.

import { describe, expect, it } from 'vitest';
import {
  BTreeIndex,
  coalesce,
  createCollection,
  createLiveQueryCollection,
  eq,
  localOnlyCollectionOptions,
  sum,
  type Collection
} from '@tanstack/db';

// Reproduces against @tanstack/db@0.6.8.
//
// Shape: from(parent).leftJoin({ sub: <a subquery that itself contains a join> }).
// When the subquery's driving collection carries a BTreeIndex on its join key AND
// the subquery's inner join has at least one UNMATCHED right-side row, the outer
// leftJoin sees an EMPTY initial snapshot of the subquery — every parent row joins
// against `undefined`. Drop the index OR remove the unmatched row and it is correct.

type Parent = { id: string; name: string };
type Child = { id: string; parent_id: string };
type Grandchild = { id: string; parent_id: string; value: number };

// Flush the IVM pipeline. The result is stable after the first tick (not a timing artifact).
const settle = async () => {
  for (let i = 0; i < 5; i++) await new Promise((r) => setTimeout(r, 0));
};

function mk<T extends { id: string }>(prefix: string, indexParentId: boolean) {
  const c = createCollection(
    localOnlyCollectionOptions<T, string>({ id: `${prefix}-${crypto.randomUUID()}`, getKey: (i) => i.id })
  );
  if (indexParentId) {
    (c as Collection<{ id: string; parent_id: string }, string>).createIndex((r) => r.parent_id, {
      indexType: BTreeIndex
    });
  }
  return c;
}

/** Outer leftJoin(parent, <subquery containing an inner join>). `grouped` toggles the aggregated shape. */
function buildWrapped(
  parents: Collection<Parent, string>,
  children: Collection<Child, string>,
  grandchildren: Collection<Grandchild, string>,
  grouped: boolean
) {
  return createLiveQueryCollection({
    id: `wrapped-${crypto.randomUUID()}`,
    startSync: true,
    query: (q) => {
      const base = q
        .from({ gc: grandchildren })
        .innerJoin({ child: children }, ({ gc, child }) => eq(gc.parent_id, child.id));
      const sub = grouped
        ? base.groupBy(({ child }) => child.parent_id).select(({ gc, child }) => ({
            parentId: child.parent_id,
            total: sum(gc.value)
          }))
        : base.select(({ gc, child }) => ({
            parentId: child.parent_id,
            total: gc.value
          }));
      return q
        .from({ parent: parents })
        .leftJoin({ sub }, ({ parent, sub }) => eq(sub.parentId, parent.id))
        .select(({ parent, sub }) => ({ id: parent.id, total: coalesce(sub?.total, 0) }));
    }
  });
}

const totals = (j: { toArray: { id: string; total: number }[] }) =>
  Object.fromEntries(j.toArray.map((r) => [r.id, r.total]));

// Minimal data: 1 parent, 2 children (c1 has a grandchild, c2 is unmatched), 1 grandchild.
const PARENTS: Parent[] = [{ id: 'p1', name: 'One' }];
const CHILDREN: Child[] = [
  { id: 'c1', parent_id: 'p1' },
  { id: 'c2', parent_id: 'p1' } // unmatched — never referenced by any grandchild
];
const GRANDCHILDREN: Grandchild[] = [{ id: 'g1', parent_id: 'c1', value: 0.5 }];

function setup(indexGrandchildren: boolean, children = CHILDREN, grandchildren = GRANDCHILDREN) {
  const parents = mk<Parent>('parents', false);
  const kids = mk<Child>('children', false);
  const grandkids = mk<Grandchild>('grandchildren', indexGrandchildren);
  parents.insert(PARENTS);
  kids.insert(children);
  grandkids.insert(grandchildren);
  return { parents, kids, grandkids };
}

describe('leftJoin onto an indexed subquery-with-join drops its initial snapshot', () => {
  it('CONTROL — unindexed, unmatched row present → correct', async () => {
    const { parents, kids, grandkids } = setup(false);
    const j = buildWrapped(parents, kids, grandkids, false);
    await settle();
    expect(totals(j)).toEqual({ p1: 0.5 });
  });

  it('CONTROL — indexed, every row matched (no unmatched row) → correct', async () => {
    const { parents, kids, grandkids } = setup(
      true,
      [{ id: 'c1', parent_id: 'p1' }, { id: 'c2', parent_id: 'p1' }],
      [{ id: 'g1', parent_id: 'c1', value: 0.5 }, { id: 'g2', parent_id: 'c2', value: 0.3 }]
    );
    const j = buildWrapped(parents, kids, grandkids, true);
    await settle();
    expect(totals(j)).toEqual({ p1: 0.8 });
  });

  it('BUG — indexed + unmatched row → returns 0 (expected 0.5)', async () => {
    const { parents, kids, grandkids } = setup(true);
    const j = buildWrapped(parents, kids, grandkids, false);
    await settle();
    // Currently FAILS: totals(j) === { p1: 0 }. The outer leftJoin output row is keyed
    // "[p1,undefined]" — it never sees the subquery's (correct) p1 row.
    expect(totals(j)).toEqual({ p1: 0.5 });
  });

  it('BUG — aggregated shape (groupBy + sum), indexed + unmatched row → returns 0', async () => {
    const { parents, kids, grandkids } = setup(true);
    const j = buildWrapped(parents, kids, grandkids, true);
    await settle();
    expect(totals(j)).toEqual({ p1: 0.5 }); // currently FAILS with { p1: 0 }
  });

  it('the subquery in isolation is correct even when indexed (only the outer join is wrong)', async () => {
    const { kids, grandkids } = setup(true);
    const sub = createLiveQueryCollection({
      id: `sub-${crypto.randomUUID()}`,
      startSync: true,
      query: (q) =>
        q
          .from({ gc: grandkids })
          .innerJoin({ child: kids }, ({ gc, child }) => eq(gc.parent_id, child.id))
          .select(({ gc, child }) => ({ parentId: child.parent_id, total: gc.value }))
    });
    await settle();
    expect(sub.toArray.map((r) => r.total)).toEqual([0.5]); // passes
  });
});

Expected behavior

Either of these would resolve it:

  1. Indexed and unindexed must return identical results. The BUG tests should produce { p1: 0.5 }, matching the unindexed control. (An index is a performance optimization; it must not change query results.)
  2. If there is a legitimate constraint that prevents a subquery-with-a-join on the right of a leftJoin from being lazily loaded, throw/warn rather than silently returning an empty subquery snapshot (right now it fails silently — coalesce(undefined, 0) masks it as a plausible 0).

Actual behavior

  • Indexed + unmatched row: { p1: 0 } (both the plain and the groupBy+sum shape).
  • Unindexed (same data): { p1: 0.5 }.
  • Indexed, every row matched: { p1: 0.8 } (correct).

Causal trace (observed, not inferred)

Logging the wrapped join's output row vs. the subquery's own toArray, with the minimal data + only the driving collection indexed:

subquery-alone toArray (INDEXED):    [ { parentId: "p1", total: 0.5, $key: "p1" } ]
wrapped leftJoin  toArray (INDEXED): [ { id: "p1", total: 0, $key: "[p1,undefined]" } ]
wrapped leftJoin  toArray (UNINDEXED): [ { id: "p1", total: 0.5, $key: "[p1,p1]" } ]

So the subquery materializes the correct p1 row, but the outer leftJoin's output is keyed [p1, undefined] — it joined p1 against undefined, i.e. it never received the subquery's initial snapshot. With no index the same row is keyed [p1, p1] and is correct.

Minimization (each retained element is load-bearing — verified by removal)

  • Index: only an index on the subquery's driving (left) collection's join key is required. With that single index the bug fires; removing it makes it correct. Indexes on the parent or on the inner-join's right collection are irrelevant.
  • Unmatched inner-join row (c2, a child with no grandchild) is required. With every child matched, the result is correct regardless of the index. The unmatched row need not even belong to the same parent — it triggers the bug pointing at any/ghost parent.
  • groupBy / sum are NOT required — a plain (non-aggregated) subquery drops identically. The aggregated shape is included only because it is the real-world shape that surfaced this.
  • Outer join type: with this single index, leftJoin fails while innerJoin returns the correct row — matching Subquery doesn't work in leftJoin clause with auto indexing #494's reported workaround.
  • Minimal data: 1 parent, 2 children (one unmatched), 1 grandchild.

Relationship to existing issues

Workaround

Pre-aggregate the subquery into a dedicated one-row-per-key collection (maintained via subscribeChanges on the source collections) and do a plain keyed leftJoin onto that single-source collection instead of onto a subquery-with-a-join. A plain-collection right side avoids the lazy-indexed-subquery path entirely.

Notes on scope / honesty

The minimal repro above is deterministic: data inserted, then the query built, with the result stable across several macrotask ticks (not a flush-timing artifact). This first surfaced in a real app through a streaming trigger instead (the query was built before eagerly-synced data arrived), which I could not collapse into the same localOnly repro — but the all-present localOnly case above is a clean, self-contained reproduction tied directly to the #494/#501 lazy-indexed-subquery machinery, so the streaming detail is corroborating context rather than load-bearing.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions