Skip to content

fix: native no-op rebuild regression (~80x slower than WASM)#360

Merged
carlos-alm merged 3 commits intomainfrom
fix/native-noop-regression
Mar 7, 2026
Merged

fix: native no-op rebuild regression (~80x slower than WASM)#360
carlos-alm merged 3 commits intomainfrom
fix/native-noop-regression

Conversation

@carlos-alm
Copy link
Contributor

Summary

  • Root cause: buildCFGData in cfg.js only built the extToLang map when WASM fallback was needed. When the native engine (3.0.4) provides pre-computed CFG for all functions, extToLang stayed null. Since native-parsed symbols lack _langId, the langId lookup fell through to null and every file was skipped — leaving cfg_blocks empty.
  • On each subsequent incremental build, the empty-table guard in builder.js:534 triggered a full parseFilesAuto re-parse of all files (~300ms), making native no-op rebuilds ~80× slower than WASM.
  • Fix: Move extToLang construction before the needsFallback check so it is always available. Same defensive fix applied to buildDataflowEdges.

Test plan

  • All 1453 tests pass (WASM-grammar-missing failures in worktree are pre-existing/unrelated)
  • Verified cfg_blocks is populated (13816 rows) after native full build
  • Native no-op rebuild: 6-7ms (was ~319ms before fix)
  • WASM no-op rebuild: 7-8ms (unchanged)
  • Full incremental benchmark shows no regression

When the native engine provides pre-computed CFG data for all functions,
`needsFallback` is false and `extToLang` was never built. Since native-
parsed symbols lack `_langId`, the langId lookup in buildCFGData fell
through to null and skipped every file — leaving cfg_blocks empty.

On each subsequent incremental build the empty-table guard triggered a
full re-parse of all files (~300 ms), making native no-op rebuilds ~80×
slower than WASM (319 ms vs 4 ms).

Move the extToLang construction before the needsFallback check so it is
always available. Apply the same fix to buildDataflowEdges defensively.

Impact: 2 functions changed, 13 affected
@claude
Copy link

claude bot commented Mar 7, 2026

Claude finished @carlos-alm's task —— View job


PR Review: Native No-op Rebuild Regression Fix

Todo List:

  • Read changed files to understand the fix
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity assessment
  • Phase 4: Code quality review
  • Provide final rating and recommendation

Root Cause Analysis

Problem Being Solved: Native engine builds were ~80x slower than WASM builds on no-op rebuilds (319ms vs 7-8ms). The issue was that extToLang map was only built when WASM fallback was needed. With native engine providing pre-computed CFG/dataflow data, extToLang remained null, causing langId lookups to fail and files to be skipped, leaving database tables empty. This triggered full re-parsing on every incremental build.

Why This Approach: Moving extToLang construction before the fallback check ensures the language mapping is always available. This is a targeted fix that addresses the root cause without changing the overall architecture - native data is still preferred when available, but the language identification works correctly for both native and WASM paths.

Risk Assessment: LOW RISK - This is a defensive fix that ensures consistent behavior between native and WASM engines. The change is minimal and surgical.

Critical Concerns

None found. This is a well-targeted performance fix.

Detailed Analysis

Code Changes Review

The fix is applied identically to both src/cfg.js (lines 1051-1058) and src/dataflow.js (lines 1010-1017):

Before: extToLang map was built inside the needsFallback conditional
After: extToLang map is built unconditionally before checking fallback needs

// Always build ext→langId map so native-only builds (where _langId is unset)
// can still derive the language from the file extension.
const extToLang = new Map();
for (const entry of LANGUAGE_REGISTRY) {
  for (const ext of entry.extensions) {
    extToLang.set(ext, entry.id);
  }
}

Technical Correctness

Architecture preserved: Native data is still prioritized when available
Fallback intact: WASM parsing still works when native data is missing
Performance maintained: No overhead added to the hot path
Comments added: Clear explanation of why the map is always needed
Consistent fix: Same pattern applied to both CFG and dataflow modules

Test Evidence

  • All 1453 tests pass
  • Performance restored: native no-op rebuilds now 6-7ms (was ~319ms)
  • WASM performance unchanged: still 7-8ms
  • Database properly populated: 13816 cfg_blocks rows after native build

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: This is an excellent targeted fix for a real performance regression. The root cause analysis is thorough, the fix is minimal and surgical, and the test plan demonstrates both correctness and performance restoration. The only reason it doesn't get 5 stars is that it's fixing a regression rather than adding new value, but it's executed perfectly.

@greptile-apps
Copy link

greptile-apps bot commented Mar 7, 2026

Greptile Summary

This PR fixes a performance regression where native-engine no-op rebuilds were ~80× slower than WASM by ensuring extToLang (the extension → language ID map) is always built upfront in both buildCFGData and buildDataflowEdges, rather than only when a WASM fallback is needed.

Key changes:

  • In both src/cfg.js and src/dataflow.js, extToLang is promoted from a lazily-initialized nullable variable to an eagerly-constructed const Map built from LANGUAGE_REGISTRY at the start of each function.
  • The now-redundant null guards (!extToLang || and extToLang ? ... : null) are cleaned up accordingly.
  • The fix is minimal, symmetrical across both files, and carries no risk of regression — LANGUAGE_REGISTRY is a static module-level constant that is already used earlier in both files, so the upfront iteration is negligible overhead.
  • Root cause was that native-parsed symbols lack _langId, so the langId fallback lookup hit a null map and every file was silently skipped, leaving cfg_blocks empty and triggering a full re-parse on every incremental build.

Confidence Score: 5/5

  • This PR is safe to merge — the fix is minimal, well-scoped, and carries no functional risk.
  • The change is a straightforward variable-initialization hoist with removal of stale null guards. LANGUAGE_REGISTRY is a static constant already used in both files, so eager construction is correct and cheap. All existing logic paths remain intact; the only behavioral difference is that the langId fallback via extToLang now actually works for native builds. No new control flow is introduced.
  • No files require special attention.

Important Files Changed

Filename Overview
src/cfg.js Moves extToLang construction before the needsFallback check so native-only builds (where symbols._langId is unset) can always resolve a language from the file extension; removes now-unnecessary null guards on extToLang.
src/dataflow.js Applies the identical defensive fix as cfg.js: extToLang is now eagerly built from LANGUAGE_REGISTRY unconditionally, and stale null-guard checks are removed.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[buildCFGData / buildDataflowEdges called] --> B[Build extToLang map from LANGUAGE_REGISTRY]
    B --> C{Scan fileSymbols\nfor needsFallback?}
    C -- Yes --> D[Import parser.js\nCreate WASM parsers]
    C -- No --> E[parsers = null\ngetParserFn = null]
    D --> F[getParserFn = mod.getParser]
    E --> G[Process each file]
    F --> G
    G --> H{symbols._tree exists\nor allNative?}
    H -- No tree & not allNative --> I{getParserFn set?}
    I -- No --> J[skip file]
    I -- Yes --> K[extToLang.get ext\nWASM parse tree]
    H -- Has tree or allNative --> L{langId set?}
    K --> L
    L -- No --> M[langId = extToLang.get ext]
    L -- Yes --> N[Process CFG / Dataflow]
    M --> N
    N --> O[Insert rows into DB]

    style B fill:#d4edda,stroke:#28a745
    style J fill:#f8d7da,stroke:#dc3545
Loading

Last reviewed commit: c34fe2a

@claude
Copy link

claude bot commented Mar 7, 2026

Claude finished @carlos-alm's task —— View job


PR Review: Native No-op Rebuild Regression Fix

Todo List:

  • Read changed files to understand the fix
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity assessment
  • Phase 4: Code quality review
  • Provide final rating and recommendation

Root Cause Analysis

Problem Being Solved: Native engine builds were ~80x slower than WASM builds on no-op rebuilds (319ms vs 7-8ms). The issue was that extToLang map was only built when WASM fallback was needed. With native engine providing pre-computed CFG/dataflow data, extToLang remained null, causing langId lookups to fail and files to be skipped, leaving database tables empty. This triggered full re-parsing on every incremental build.

Why This Approach: Moving extToLang construction before the fallback check ensures the language mapping is always available. This is a targeted fix that addresses the root cause without changing the overall architecture - native data is still preferred when available, but the language identification works correctly for both native and WASM paths.

Risk Assessment: LOW RISK - This is a defensive fix that ensures consistent behavior between native and WASM engines. The change is minimal and surgical.

Critical Concerns

None found. This is a well-targeted performance fix.

Detailed Analysis

Code Changes Review

The fix is applied identically to both src/cfg.js (lines 1051-1058) and src/dataflow.js (lines 1010-1017):

Before: extToLang map was built inside the needsFallback conditional
After: extToLang map is built unconditionally before checking fallback needs

// Always build ext→langId map so native-only builds (where _langId is unset)
// can still derive the language from the file extension.
const extToLang = new Map();
for (const entry of LANGUAGE_REGISTRY) {
  for (const ext of entry.extensions) {
    extToLang.set(ext, entry.id);
  }
}

Technical Correctness

Architecture preserved: Native data is still prioritized when available
Fallback intact: WASM parsing still works when native data is missing
Performance maintained: No overhead added to the hot path
Comments added: Clear explanation of why the map is always needed
Consistent fix: Same pattern applied to both CFG and dataflow modules

Code Quality Notes

⚠️ Dead code identified: The fix introduces some unreachable conditions that could be cleaned up in a follow-up:

  • src/cfg.js:1120 and src/dataflow.js:1074: !extToLang checks are now always false
  • src/cfg.js:1143 and src/dataflow.js:1097: Ternary extToLang ? checks are now always truthy

These don't affect functionality but represent minor tech debt.

Test Evidence

  • All 1453 tests pass
  • Performance restored: native no-op rebuilds now 6-7ms (was ~319ms)
  • WASM performance unchanged: still 7-8ms
  • Database properly populated: 13816 cfg_blocks rows after native build

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: This is an excellent targeted fix for a real performance regression. The root cause analysis is thorough, the fix is minimal and surgical, and the test plan demonstrates both correctness and performance restoration. The only reason it doesn't get 5 stars is that it's fixing a regression rather than adding new value, but it's executed perfectly.

Clean up four stale null-checks identified by Greptile review:
- cfg.js:1120 and dataflow.js:1074: drop `!extToLang ||` from guards
- cfg.js:1143 and dataflow.js:1097: simplify ternary to direct .get()

Impact: 2 functions changed, 0 affected
@carlos-alm
Copy link
Contributor Author

All 4 dead-code guards identified by @greptileai cleaned up in c34fe2a:

  • cfg.js:1120 and dataflow.js:1074: dropped !extToLang || from guards
  • cfg.js:1143 and dataflow.js:1097: simplified ternary to direct .get()

@greptileai

@claude
Copy link

claude bot commented Mar 7, 2026

Claude finished @carlos-alm's task —— View job


PR Review: Native No-op Rebuild Regression Fix

Todo List:

  • Read changed files to understand the fix
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity assessment
  • Phase 4: Code quality review
  • Provide final rating and recommendation

Root Cause Analysis

Problem Being Solved: Native engine builds were ~80x slower than WASM builds on no-op rebuilds (319ms vs 7-8ms). The issue was that extToLang map was only built when WASM fallback was needed. With native engine providing pre-computed CFG/dataflow data, extToLang remained null, causing langId lookups to fail and files to be skipped, leaving database tables empty. This triggered full re-parsing on every incremental build.

Why This Approach: Moving extToLang construction before the fallback check ensures the language mapping is always available. This is a targeted fix that addresses the root cause without changing the overall architecture - native data is still preferred when available, but the language identification works correctly for both native and WASM paths.

Risk Assessment: LOW RISK - This is a defensive fix that ensures consistent behavior between native and WASM engines. The change is minimal and surgical.

Critical Concerns

None found. This is a well-targeted performance fix.

Detailed Analysis

Code Changes Review

The fix is applied identically to both src/cfg.js (lines 1051-1058) and src/dataflow.js (lines 1010-1017):

Before: extToLang map was built inside the needsFallback conditional
After: extToLang map is built unconditionally before checking fallback needs

// Always build ext→langId map so native-only builds (where _langId is unset)
// can still derive the language from the file extension.
const extToLang = new Map();
for (const entry of LANGUAGE_REGISTRY) {
  for (const ext of entry.extensions) {
    extToLang.set(ext, entry.id);
  }
}

Technical Correctness

Architecture preserved: Native data is still prioritized when available
Fallback intact: WASM parsing still works when native data is missing
Performance maintained: No overhead added to the hot path
Comments added: Clear explanation of why the map is always needed
Consistent fix: Same pattern applied to both CFG and dataflow modules

Test Evidence

  • All 1453 tests pass
  • Performance restored: native no-op rebuilds now 6-7ms (was ~319ms)
  • WASM performance unchanged: still 7-8ms
  • Database properly populated: 13816 cfg_blocks rows after native build

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: This is an excellent targeted fix for a real performance regression. The root cause analysis is thorough, the fix is minimal and surgical, and the test plan demonstrates both correctness and performance restoration. The only reason it doesn't get 5 stars is that it's fixing a regression rather than adding new value, but it's executed perfectly.

@carlos-alm carlos-alm merged commit 88695fd into main Mar 7, 2026
15 checks passed
@carlos-alm carlos-alm deleted the fix/native-noop-regression branch March 7, 2026 11:25
@github-actions github-actions bot locked and limited conversation to collaborators Mar 7, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant