Skip to content

fix(document-api): anchor pure-insert word-diff groups to preceding EQUAL token (SD-3044)#3368

Open
tupizz wants to merge 1 commit into
mainfrom
tadeu/sd-3044-fix-tracked-rewrite-anchor
Open

fix(document-api): anchor pure-insert word-diff groups to preceding EQUAL token (SD-3044)#3368
tupizz wants to merge 1 commit into
mainfrom
tadeu/sd-3044-fix-tracked-rewrite-anchor

Conversation

@tupizz
Copy link
Copy Markdown
Contributor

@tupizz tupizz commented May 18, 2026

Demo

CleanShot 2026-05-18 at 16 04 19@2x

Summary

Fixes SD-3044: tracked text.rewrite was inserting replacement text at shifted positions, producing strings like JohnJames Smith address of [insert]... instead of John James Smith of [insert address].... The bug survived accept-all, so it was written into exported DOCX.

Root cause

In packages/super-editor/src/editors/v1/document-api-adapters/plan-engine/word-diff.ts, getWordChanges computes a word-level diff and groups consecutive delete/insert ops into single ops. For pure-insert groups (no delete, just inserts surrounded by EQUAL tokens), it tried to anchor the insertion to the preceding EQUAL token:

const prevStep = i > 0 ? steps[i - 1] : null;  // ← bug
if (prevStep && prevStep.type === 'equal') { /* … */ }
else if (result.length > 0) {                  // fallback that actually runs
  insertAt = lastOp.oldTo ?? lastOp.insertAt ?? 0;
}

But by the time this code runs, the inner while loop has already advanced i past the group, so steps[i - 1] is the last delete/insert of the current group — never an EQUAL. The prevStep.type === 'equal' branch was dead code, and every pure-insert group fell back to the previous result op's end.

The bug only triggers when Myers can find EQUAL tokens between insert-only groups. The customer payload happens to do this because the prefix/suffix trim splits [insert] into [insert (without the ], which is part of the shared suffix), making [insert token-equal between old and new. Two insert-only groups in [insert] of [insertJohn James Smith of [insert address both got insertAt = 8, piling John, James Smith and address at the first deletion site.

Fix

Capture groupStart before the inner while loop and use it to look at the step BEFORE the group:

+    const groupStart = i;
     let deleteStart = -1;
     let deleteEnd = -1;
     let insertText = '';
     while (i < steps.length && (steps[i].type === 'delete' || steps[i].type === 'insert')) {
       …
     }
     …
     } else if (insertText.length > 0) {
-      const prevStep = i > 0 ? steps[i - 1] : null;
+      const prevStep = groupStart > 0 ? steps[groupStart - 1] : null;

One-line change. The previously-dead prevStep.type === 'equal' branch now runs and the existing prevToken.offset + prevToken.text.length logic produces the correct anchor.

How to reproduce

The customer fixture is Simple Agreement for future equity (SAFE) (5).docx. Steps:

  1. pnpm dev and open the SuperDoc dev app.

  2. Upload the SAFE fixture.

  3. Run the Lighthouse payload via the dev console:

    window.superdocdev.editor.doc.mutations.apply({
      action: 'apply', atomic: true, changeMode: 'tracked',
      steps: [
        { id: 'parties-investor', op: 'text.rewrite',
          where: { by: 'block', nodeType: 'listItem', nodeId: '6FDB207C' },
          args: { replacement: { text: 'John James Smith of [insert address], [insert] ("Investor")' } } },
        { id: 'parties-company', op: 'text.rewrite',
          where: { by: 'block', nodeType: 'listItem', nodeId: '447A152E' },
          args: { replacement: { text: 'Working Title Group Limited a company incorporated in New Zealand having its registered office at 29 Park Hill Road, Birkenhead, Auckland, 0626, NZ (NZBN 9429050880331)("Company")' } } },
      ],
    });
  4. Click "Accept tracked changes".

Before the fix:

JohnJames Smith  address of [insert], [insert] ("Investor")
Working TitleGroup  Limited a company incorporated in NewZealand  having its registered office at 29 Park Hill Road, Birkenhead, Auckland, 0626, NZ (NZBN 9429050880331)("Company")

After the fix:

John James Smith of [insert address], [insert] ("Investor")
Working Title Group Limited a company incorporated in New Zealand having its registered office at 29 Park Hill Road, Birkenhead, Auckland, 0626, NZ (NZBN 9429050880331)("Company")

Test plan

  • New unit suite: packages/super-editor/src/editors/v1/document-api-adapters/plan-engine/word-diff.test.ts (8 tests) — covers identity, empty-side, single replace, insert-only with surrounding EQUAL, the exact Lighthouse suffix-trim shape, and prefix-only equal anchoring.
  • Integration coverage added to tracked-rewrite.integration.test.ts:
    • SD-3044: tracked rewrite with shared suffix anchors inserts correctly — parties-investor shape; explicit guards against JohnJames and Smith address.
    • SD-3044: tracked rewrite of long block preserves spacing across multiple equal anchors — parties-company shape; guards against PtyTitle, AustraliaNew, (ACNPark.
  • Full super-editor suite passes (1033 files, 13083 tests).
  • Manually verified end-to-end with the SAFE fixture in pnpm dev: applying the Lighthouse payload then accepting all tracked changes now produces the expected text for all four blocks (3BD84854, 7D51E57C, 6FDB207C, 447A152E).

Acceptance criteria (from ticket)

  • Lighthouse payload no longer produces JohnJames, PtyTitleGroup, TitleGroup (concatenated), or AustraliaNewZealand.
  • Accepting all tracked changes produces the expected replacement text with correct spacing.
  • Regression coverage exists for this payload shape (tracked-rewrite.integration.test.ts + word-diff.test.ts).
  • Existing granular tracked-rewrite behaviour preserved — full suite green, no tracked-rewrite.integration.test.ts cases regressed.

Related

  • Linked from: SD-2787 (granular diff tracked changes).
  • Blocks: IT-999 (AI edits add strange text in docs).

…QUAL token (SD-3044)

In tracked `text.rewrite`, when the word-level diff produces multiple
delete/insert groups separated by EQUAL tokens, pure-insert groups were
anchoring to the previous result op's end (`oldTo` / `insertAt`) instead
of the EQUAL token that precedes the group. `prevStep = steps[i - 1]`
read the last step of the just-processed group (always delete/insert),
so the `prevStep.type === 'equal'` branch was dead code.

For inputs like `[insert] of [insert` → `John James Smith of [insert
address`, the diff produced three groups with EQUAL tokens between them
(`of`, ` `, `[insert`). Both pure-insert groups fell back to `insertAt
= 8`, piling all granular insertions on the first deletion site and
producing strings like `JohnJames Smith  address of [insert]...` after
accept.

Capture `groupStart` before the inner while loop and read
`steps[groupStart - 1]` so the EQUAL-anchor branch actually fires.

Adds unit coverage for `getWordChanges` and two integration tests
mirroring the Lighthouse SAFE-document repro shape.
@linear-code
Copy link
Copy Markdown

linear-code Bot commented May 18, 2026

SD-3044

@tupizz tupizz marked this pull request as ready for review May 18, 2026 19:09
@tupizz tupizz requested a review from caio-pizzol May 18, 2026 19:09
@tupizz tupizz requested a review from a team as a code owner May 18, 2026 19:09
@tupizz tupizz requested a review from luccas-harbour May 18, 2026 19:10
@tupizz tupizz self-assigned this May 18, 2026
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants