Skip to content

spike: action-based conflict resolution prototype#6832

Draft
wjones127 wants to merge 1 commit into
lance-format:mainfrom
wjones127:spike/action-based-conflict-resolution
Draft

spike: action-based conflict resolution prototype#6832
wjones127 wants to merge 1 commit into
lance-format:mainfrom
wjones127:spike/action-based-conflict-resolution

Conversation

@wjones127
Copy link
Copy Markdown
Contributor

@wjones127 wjones127 commented May 18, 2026

Spike for #6448. Not for merge — published for reference when doing future work.

Includes a design doc plus a working Rust prototype of action-based transactions and conflict resolution.

What's here

  • rust/lance-table/design/action_based_conflict_resolution.md — design doc covering principles (criterion-based targeting, late binding, symbolic intra-txn references), the action catalog, lifecycle, worked examples, a first-pass conflict matrix (§5/§6), and a classification of every existing check_*_txn rule against the design (§11).
  • rust/lance-table/src/format/transaction/actions{.rs,/}Action trait, ManifestMask, transaction driver, and six action implementations: RemoveFragments, AddFragments, UpdateDeletionVector, InvalidateIndexCoverage, ChangeSchema, AddIndex.
  • Per-action unit tests + a conflict-resolution test suite covering:
    • update || update on different fragments → compatible
    • update || update same fragment, different rows → compatible
    • update || update same row → conflict
    • update || update same fragment, full rewrite → conflict
    • create_index then update (criterion path picks up new concrete index)
    • update then create_index (rebase trims removed fragments from trained bitmap)
    • two create_index over the same field with distinct UUIDs → both land
    • concurrent overwrite → last writer wins

24 tests, all passing. Clippy + fmt clean.

Why this approach makes N×N rules unnecessary

The acceptance criteria on #6448 call for "a conflict matrix covering all 14 action types." The prototype's premise is that the matrix is the wrong artifact — it should be emergent, not hand-authored.

Three mechanisms, all derived from per-action reads() and writes() masks, cover the bulk of conflict cases:

  1. Mask overlap (pure). Two actions' manifest paths intersect → driver fires rebase. Most fragment-id / field-id collisions resolve here without any per-pair logic.
  2. Criterion-based overlap (§3.1). A path carries a predicate (IndicesCoveringFields{fields=[7]}); the mask layer evaluates it against the peer's concrete path. This is what lets a Update(field=7) correctly invalidate a concurrently-added AddIndex(field=7) without either action knowing about the other.
  3. Peer-inspecting rebase (§7). When mask overlap fires, rebase can downcast the peer to make a finer decision — UpdateDeletionVector checking row-level overlap against a concurrent UDV is the canonical case.

Classifying the existing ~52 rules across all 14 check_*_txn methods (see §11):

  • ~24 fit one of the three patterns directly.
  • ~22 are catalog gaps — rules over actions §4 doesn't yet enumerate (UpdateConfig, AddBases, etc.). Add the action, the rule fits.
  • ~6 are true pattern gaps and the most valuable finding of the spike.

For an N-action catalog, that's O(N) action declarations + O(small) special-case rebase impls, not O(N²) hand-curated cells. Adding a new action means adding its mask declarations and any necessary peer-inspection logic in rebase; existing actions are untouched.

Worked example: merge-insert primary-key bloom filter

One existing rule that looks at first like it needs transaction-level metadata: inserted_rows_filter, the Bloom filter over primary keys a merge-insert claims as new. Two concurrent merge-inserts could both decide PK 42 doesn't exist and both insert it — the bloom detects the race.

This appears transaction-level because both txns need to compare blooms. In fact it fits pattern 3 cleanly: the bloom describes the PKs in the new rows of an AddFragments, so it lives on the action. AddFragments::rebase walks concurrent peers, finds other AddFragments actions, and:

  • both have a bloom with matching field_ids and blooms intersect → Retry
  • both have a bloom with non-matching field_ids → Retry (can't verify)
  • self has a bloom, peer has none (plain Append) → Retry (peer's PKs unknowable)

No txn-level surface required. Same shape as UDV row-overlap, just with a bloom instead of a roaring bitmap.

Genuine gaps that did surface

Three structural additions the prototype doesn't yet handle:

  1. Transaction-level conflict metadata for MemWAL merged_generations. This field appears on three different operation shells (Update, UpdateMemWalState, CreateIndex(MemWAL)) — same data, different wrappers. Unlike the PK bloom, it genuinely doesn't belong to any one action. Wants a TxnContext-level metadata surface parallel to action masks.
  2. Retryable vs Incompatible verdict. Today's resolver distinguishes "your transaction can't be made valid" from "your transaction is fine, just races; retry against the new manifest." The action trait collapses both into TransactionError. rebase should return Ok | Retry | Fail (or split the error type) so the driver can drive the retry loop.
  3. Restore as baseline reset. Forward rebase is meaningless for a Restore — the entire base is replaced. Needs a driver-level short-circuit. The PRD already puts Restore at the top-level proto oneof, so this is more a §7 documentation issue than a design gap.

The bulk of the remaining work is mechanical catalog completion (~10 more actions: RemoveIndex, UpdateConfig, AddFields, etc.) plus enriching the trait to return the three-way verdict.

Known TODOs / open questions (prototype-specific)

  • UDV row-disjoint merge currently passes rebase but apply overwrites rather than unions — real impl needs IO in rebase.
  • AddIndex introduces an OnNewFragments::{LeaveUncovered, Fail} policy knob that the design doc doesn't yet pin down.
  • InvalidateIndexCoverage/AddIndex operate on a prototype-only WorkingState { manifest, indices } wrapper since the real Manifest doesn't carry the index list inline.

Design doc + working prototype of action-based transactions and conflict
resolution. Not for merge — published for reference.

- rust/lance-table/design/action_based_conflict_resolution.md
- rust/lance-table/src/format/transaction/actions{.rs,/} — Action trait,
  ManifestMask, six action impls (RemoveFragments, AddFragments,
  UpdateDeletionVector, InvalidateIndexCoverage, ChangeSchema, AddIndex),
  and a conflict-resolution test suite covering update-update variants
  and update-create_index orderings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant