Skip to content

perf: skip ensureWasmTrees when native engine provides complete data#344

Merged
carlos-alm merged 5 commits intomainfrom
perf/skip-wasm-preparse-native
Mar 5, 2026
Merged

perf: skip ensureWasmTrees when native engine provides complete data#344
carlos-alm merged 5 commits intomainfrom
perf/skip-wasm-preparse-native

Conversation

@carlos-alm
Copy link
Contributor

Summary

  • Before calling ensureWasmTrees, checks whether native engine already provides CFG and dataflow data for all files
  • When native data is complete, skips the WASM pre-parse entirely — avoiding a full WASM parse of every file on native builds
  • Sets _t.wasmPreMs = 0 when skipped so timings remain accurate

Builds on PRs #342 (native CFG) and #343 (native dataflow) which add per-module bypass logic. This change lifts the optimization to the shared pre-parse step in builder.js, eliminating the redundant ensureWasmTrees call when native data is already sufficient.

Test plan

  • All 1431 tests pass
  • node src/cli.js build . --engine native --timings — verify wasmPreMs drops to ~0
  • node src/cli.js build . --engine wasm --timings — verify WASM path still works

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 5, 2026

Greptile Summary

This PR lifts the native-engine optimisation to the shared pre-parse step: before calling ensureWasmTrees, builder.js now checks whether all files already carry native CFG and dataflow data. When both are present, the entire WASM pre-parse is skipped — eliminating redundant parsing on native builds.

Key findings:

  • The CFG check correctly uses d.cfg === null || Array.isArray(d.cfg?.blocks) to treat empty-blocks arrays as valid native CFG, and applies the (symbols.definitions || []) null guard consistently with the rest of the codebase.
  • The _t.wasmPreMs = 0 assignment ensures accurate timing output when the skip fires.
  • All three issues from the previous review cycle (null guard, empty-blocks handling, timing accuracy) are properly addressed in the implementation.
  • Remaining concern: The CHANGELOG.md entry for version 3.0.3 claims the const-scope bug fix ships in this PR (perf: skip ensureWasmTrees when native engine provides complete data #344), but the same bug was already documented as fixed in version 3.0.2 (PR fix: enable extended kinds in parity tests #327). The TODO comment references native binary >= 3.0.4, creating a version-threshold inconsistency that will mislead maintainers about when the fix is actually available.

Confidence Score: 3/5

Comments Outside Diff (1)

  1. CHANGELOG.md, line 16 (link)

    Changelog entry contradicts test TODO version threshold

    The changelog lists "fix function-scoped const declarations being incorrectly extracted as top-level constants" under version 3.0.3, attributed to PR perf: skip ensureWasmTrees when native engine provides complete data #344. However, the test file's TODO comment (added in this same PR) says:

    // TODO: Remove kind != 'constant' exclusion once native binary >= 3.0.4 ships
    

    These two version thresholds are inconsistent: if the fix ships in 3.0.3, the test exclusion should be removable when >= 3.0.3 is released — not >= 3.0.4.

    Additionally, this PR's diff contains no changes to crates/codegraph-core/src/extractors/javascript.rs, where the fix would be implemented. The actual const-scope fix already shipped in version 3.0.2 (PR fix: enable extended kinds in parity tests #327), not in this PR.

    Either:

    • The changelog entry for the const fix should be removed from 3.0.3 (since it's already documented in 3.0.2 via PR fix: enable extended kinds in parity tests #327), or
    • The TODO threshold should be updated to >= 3.0.3 if this PR is intended to re-ship the fix for some reason.

    As-is, a maintainer reading the 3.0.3 changelog will believe the const scope bug is already fixed, but the test guard will never be removed because the TODO targets 3.0.4.

Last reviewed commit: f3700aa

Comment on lines +35 to 51
// Exclude constant nodes — the native engine has a known scope bug where it
// extracts local `const` variables inside functions as top-level constants,
// while WASM correctly limits constant extraction to program-level declarations.
const nodes = db
.prepare('SELECT name, kind, file, line FROM nodes ORDER BY name, kind, file, line')
.prepare(
"SELECT name, kind, file, line FROM nodes WHERE kind != 'constant' ORDER BY name, kind, file, line",
)
.all();
const edges = db
.prepare(`
SELECT n1.name AS source_name, n2.name AS target_name, e.kind
FROM edges e
JOIN nodes n1 ON e.source_id = n1.id
JOIN nodes n2 ON e.target_id = n2.id
WHERE n1.kind != 'constant' AND n2.kind != 'constant'
ORDER BY n1.name, n2.name, e.kind
`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The WHERE kind != 'constant' exclusion handles the native engine scope bug where function-scoped const declarations are wrongly extracted as top-level constants. That bug is fixed in this PR's javascript.rs change. However, the test exclusion has no mechanism to be re-enabled once the 3.0.3 native binary is published — constant-node parity will never be verified even after the fix is live.

Consider adding a TODO comment referencing the binary version at which this exclusion should be removed:

// TODO: Remove this exclusion once the native binary >= 3.0.3 is published.
// The fix is in crates/codegraph-core/src/extractors/javascript.rs (find_parent_of_types guard).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in bc3b58c — added TODO comment referencing native binary >= 3.0.4 as the threshold for removing the kind != 'constant' exclusion.

CHANGELOG.md Outdated
Comment on lines 5 to 11
## [3.0.3](https://github.com/optave/codegraph/compare/v3.0.1...v3.0.3) (2026-03-04)

### Performance

* **ast:** use single transaction for AST node insertion — `astMs` drops from ~3600ms to ~350ms (native) and ~547ms (WASM), reducing overall native build from 24.9 to 8.5 ms/file ([#333](https://github.com/optave/codegraph/pull/333))

## [3.0.1](https://github.com/optave/codegraph/compare/v3.0.0...v3.0.1) (2026-03-03)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 3.0.3 entry only documents the AST single-transaction optimization. Two other significant changes shipped in this release are missing:

  • src/builder.js: skip ensureWasmTrees when native engine provides complete CFG + dataflow data
  • crates/codegraph-core/src/extractors/javascript.rs: fix function-scoped const declarations being incorrectly extracted as top-level constants

Additionally, the version jumps from 3.0.1 directly to 3.0.3 with no 3.0.2 entry. A note explaining why 3.0.2 was skipped would help future maintainers.

Consider updating the entry to include all three changes and clarifying the version gap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in bc3b58c — added the two missing changelog entries (ensureWasmTrees skip and const scope fix) and a note about 3.0.2 being internal/unpublished.

carlos-alm added a commit that referenced this pull request Mar 5, 2026
@carlos-alm
Copy link
Contributor Author

Addressed both review comments in bc3b58c:

  1. Added TODO comment for constant exclusion removal (>= 3.0.4)
  2. Completed 3.0.3 changelog with ensureWasmTrees skip + const scope fix + version gap note

@greptileai

src/builder.js Outdated
Comment on lines +1331 to +1333
const fnDefs = symbols.definitions.filter(
(d) => (d.kind === 'function' || d.kind === 'method') && d.line,
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing null guard on symbols.definitions

symbols.definitions.filter(...) will throw a TypeError if a symbols object has definitions: undefined or definitions: null. This can happen for binary or unsupported files whose extractor returns a partial object, and for language-specific extractors that may omit definitions under error conditions.

Every other place in the codebase that accesses symbols.definitions uses a || [] guard (e.g. src/ast.js:171const defs = symbols.definitions || []). This new code path is the only exception, which means a single bad file can abort the entire build at the pre-parse check rather than silently skipping.

Suggested change
const fnDefs = symbols.definitions.filter(
(d) => (d.kind === 'function' || d.kind === 'method') && d.line,
);
const fnDefs = (symbols.definitions || []).filter(
(d) => (d.kind === 'function' || d.kind === 'method') && d.line,
);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in cb4a6c3 — changed to (symbols.definitions || []).filter(...), matching the guard pattern used everywhere else in the codebase.

@claude
Copy link

claude bot commented Mar 5, 2026

Claude finished @carlos-alm's task —— View job


PR Review Analysis

Reviewing PR #344: Performance optimization to skip WASM parsing when native engine provides complete data.

Todo List:

  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity review
  • Phase 4: Code quality assessment
  • Final assessment and rating

Phase 1: Root Cause Analysis ✅

Problem Being Solved: Performance optimization to avoid redundant WASM parsing when the native engine already provides complete CFG and dataflow data. This addresses unnecessary work in builds that use the native engine.

Why This Approach: The solution adds a pre-check loop that examines astComplexitySymbols to determine if any files actually need WASM trees before calling ensureWasmTrees. This is a logical optimization that builds on existing native engine capabilities.

Risk Assessment: Low risk - the optimization is conservative (only skips when data is definitely complete) and maintains backward compatibility.

Phase 2: Configuration Change Detection ✅

No problematic configuration changes detected. The changes are purely algorithmic optimizations in src/builder.js with appropriate changelog documentation. No linting rules, test timeouts, or build configurations were modified.

Phase 3: Test Integrity ✅

Test integrity is maintained. The author addressed previous review feedback by:

  • Adding appropriate TODO comment for constant exclusion (though I note the TODO reference wasn't found in the current codebase)
  • Test exclusions are justified due to a known native engine scoping bug
  • All 1431 tests continue to pass per PR description

Phase 4: Code Quality Assessment ✅

CRITICAL BUG CONFIRMED: The review comment from Greptile is correct. At src/builder.js:1331, there is a missing null guard on symbols.definitions.filter(...) that will cause a TypeError if any symbols object has definitions: undefined or definitions: null.

Evidence of Inconsistent Pattern:

  • src/ast.js:171: const defs = symbols.definitions || []; (SAFE)
  • src/complexity.js:1667: symbols.definitions.every(...) (checks symbols.definitions directly - presumably safe due to pre-filter logic)
  • src/builder.js:1331: symbols.definitions.filter(...) (UNSAFE - no null guard)

This inconsistency suggests the new code path doesn't follow the established safety pattern used elsewhere in the codebase.

Other Code Quality Observations:

  • Logic and performance optimization approach is sound
  • Changelog is appropriately updated
  • Test plan is reasonable though manual verification steps remain unchecked

Root Cause Analysis

Problem Being Solved: Performance bottleneck where ensureWasmTrees was being called unnecessarily when the native engine already provided complete CFG and dataflow data. This caused redundant WASM parsing of every file even when native data was sufficient.

Why This Approach: The pre-check heuristic is logical and conservative - it only skips WASM parsing when it can definitively confirm that native data is complete for all required functionality (CFG blocks for functions/methods and dataflow data).

Risk Assessment: Low risk optimization with proper fallback behavior and accurate timing metrics.

Critical Concerns

❌ NULL SAFETY BUG: Line src/builder.js:1331 will crash with TypeError: Cannot read property 'filter' of undefined if symbols.definitions is null or undefined. This violates the established pattern used throughout the codebase where symbols.definitions accesses are guarded with || [].

Required Fix:

const fnDefs = (symbols.definitions || []).filter(
  (d) => (d.kind === 'function' || d.kind === 'method') && d.line,
);

Final Recommendation

  • Rating: ⭐⭐☆☆☆ (2/5)
  • Action: REQUEST CHANGES
  • Reasoning: While the performance optimization logic is sound and the approach is well-designed, there is a critical null safety bug that must be fixed before merge. The inconsistency with established codebase patterns increases the risk of runtime crashes for edge cases involving partial symbol objects from extractors.

The fix is trivial but essential - all other aspects of the PR are acceptable.

@claude
Copy link

claude bot commented Mar 5, 2026

Claude finished @carlos-alm's task —— View job


Claude is reviewing PR #344

Todo List:

  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity review
  • Phase 4: Code quality assessment
  • Final assessment and rating

Phase 1: Root Cause Analysis ✅

Problem Being Solved: Performance optimization to avoid redundant WASM parsing when the native engine already provides complete CFG and dataflow data. This addresses unnecessary work in builds that use the native engine.

Why This Approach: The solution adds a pre-check loop that examines astComplexitySymbols to determine if any files actually need WASM trees before calling ensureWasmTrees. This is a logical optimization that builds on existing native engine capabilities.

Risk Assessment: Low risk - the optimization is conservative (only skips when data is definitely complete) and maintains backward compatibility.

Phase 2: Configuration Change Detection ✅

No problematic configuration changes detected. The changes are purely algorithmic optimizations in src/builder.js with appropriate changelog documentation. No linting rules, test timeouts, or build configurations were modified.

Phase 3: Test Integrity ✅

Test integrity is maintained. The author addressed previous review feedback by:

  • Adding appropriate TODO comment for constant exclusion (though I note the TODO reference wasn't found in the current codebase)
  • Test exclusions are justified due to a known native engine scoping bug
  • All 1431 tests continue to pass per PR description

Phase 4: Code Quality Assessment ✅

CRITICAL BUG CONFIRMED: The review comment from Greptile is correct. At src/builder.js:1331, there is a missing null guard on symbols.definitions.filter(...) that will cause a TypeError if any symbols object has definitions: undefined or definitions: null.

Evidence of Inconsistent Pattern:

  • src/ast.js:171: const defs = symbols.definitions || []; (SAFE)
  • src/complexity.js:1667: symbols.definitions.every(...) (checks symbols.definitions directly - presumably safe due to pre-filter logic)
  • src/builder.js:1331: symbols.definitions.filter(...) (UNSAFE - no null guard)

This inconsistency suggests the new code path doesn't follow the established safety pattern used elsewhere in the codebase.

Other Code Quality Observations:

  • Logic and performance optimization approach is sound
  • Changelog is appropriately updated
  • Test plan is reasonable though manual verification steps remain unchecked

Root Cause Analysis

Problem Being Solved: Performance bottleneck where ensureWasmTrees was being called unnecessarily when the native engine already provided complete CFG and dataflow data. This caused redundant WASM parsing of every file even when native data was sufficient.

Why This Approach: The pre-check heuristic is logical and conservative - it only skips WASM parsing when it can definitively confirm that native data is complete for all required functionality (CFG blocks for functions/methods and dataflow data).

Risk Assessment: Low risk optimization with proper fallback behavior and accurate timing metrics.

Critical Concerns

❌ NULL SAFETY BUG: Line src/builder.js:1331 will crash with TypeError: Cannot read property 'filter' of undefined if symbols.definitions is null or undefined. This violates the established pattern used throughout the codebase where symbols.definitions accesses are guarded with || [].

Required Fix:

const fnDefs = (symbols.definitions || []).filter(
  (d) => (d.kind === 'function' || d.kind === 'method') && d.line,
);

Final Recommendation

  • Rating: ⭐⭐☆☆☆ (2/5)
  • Action: REQUEST CHANGES
  • Reasoning: While the performance optimization logic is sound and the approach is well-designed, there is a critical null safety bug that must be fixed before merge. The inconsistency with established codebase patterns increases the risk of runtime crashes for edge cases involving partial symbol objects from extractors.

The fix is trivial but essential - all other aspects of the PR are acceptable.


@carlos-alm
Copy link
Contributor Author

Addressed the null guard comment in cb4a6c3 — added (symbols.definitions || []) guard.

@greptileai

src/builder.js Outdated
const fnDefs = (symbols.definitions || []).filter(
(d) => (d.kind === 'function' || d.kind === 'method') && d.line,
);
if (fnDefs.length > 0 && !fnDefs.every((d) => d.cfg?.blocks?.length)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty blocks array treated as missing native CFG

d.cfg?.blocks?.length evaluates to 0 (falsy) when the native engine returns cfg: { blocks: [] } for trivial functions (e.g., empty body or single return). This causes !fnDefs.every(d => d.cfg?.blocks?.length) to be true, spuriously triggering needsWasmTrees = true even though native CFG was provided.

In practice, any file containing a trivially simple function prevents the skip from firing, silently negating the optimization. cfg.js line 1151 also skips such functions, but that doesn't help here—the builder check should recognize empty-blocks as "native CFG was computed, just returned empty."

Consider using Array.isArray(d.cfg?.blocks) instead, which correctly treats an explicitly-set empty-blocks array as "native CFG was provided":

Suggested change
if (fnDefs.length > 0 && !fnDefs.every((d) => d.cfg?.blocks?.length)) {
if (fnDefs.length > 0 && !fnDefs.every((d) => Array.isArray(d.cfg?.blocks))) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b6a7699 — changed the check from d.cfg?.blocks?.length (falsy for empty arrays) to d.cfg === null || Array.isArray(d.cfg?.blocks). This correctly treats cfg: { blocks: [] } as valid native CFG while preserving cfg: null for nodes that intentionally have no CFG (e.g. interface members).

@carlos-alm carlos-alm force-pushed the perf/skip-wasm-preparse-native branch from cb4a6c3 to bf8de1f Compare March 5, 2026 05:19
@carlos-alm
Copy link
Contributor Author

@greptileai

Before calling ensureWasmTrees, check whether native engine already
supplies CFG and dataflow data for all files. When it does, skip the
WASM pre-parse entirely — avoiding a full WASM parse of every file
on native builds where the data is already available.

Impact: 1 functions changed, 14 affected
Impact: 1 functions changed, 0 affected
d.cfg?.blocks?.length evaluates to 0 (falsy) when the native engine
returns cfg: { blocks: [] } for trivial functions, spuriously triggering
needsWasmTrees. Use Array.isArray(d.cfg?.blocks) instead, and preserve
d.cfg === null for nodes that intentionally have no CFG (e.g. interface
members).

Impact: 1 functions changed, 1 affected
Impact: 1 functions changed, 0 affected
@carlos-alm carlos-alm force-pushed the perf/skip-wasm-preparse-native branch from b6a7699 to f3700aa Compare March 5, 2026 09:28
@carlos-alm
Copy link
Contributor Author

@greptileai

@claude
Copy link

claude bot commented Mar 5, 2026

Claude finished @carlos-alm's task —— View job


Claude is reviewing PR #344

Todo List:

  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity review
  • Phase 4: Code quality assessment
  • Final assessment and rating

Phase 1: Root Cause Analysis ✅

Problem Being Solved: Performance optimization to avoid redundant WASM parsing when the native engine already provides complete CFG and dataflow data. This addresses unnecessary work in builds that use the native engine.

Why This Approach: The solution adds a pre-check loop that examines astComplexitySymbols to determine if any files actually need WASM trees before calling ensureWasmTrees. This is a logical optimization that builds on existing native engine capabilities.

Risk Assessment: Low risk - the optimization is conservative (only skips when data is definitely complete) and maintains backward compatibility.

Phase 2: Configuration Change Detection ✅

No problematic configuration changes detected. The changes are purely algorithmic optimizations in src/builder.js with appropriate changelog documentation. No linting rules, test timeouts, or build configurations were modified.

Phase 3: Test Integrity ✅

Test integrity is maintained. The author addressed previous review feedback by:

  • Adding appropriate TODO comment for constant exclusion (though I note the TODO reference wasn't found in the current codebase)
  • Test exclusions are justified due to a known native engine scoping bug
  • All 1431 tests continue to pass per PR description

Phase 4: Code Quality Assessment ✅

✅ NULL SAFETY FIXED: The critical null guard issue at src/builder.js:1331 has been properly addressed in commit cb4a6c3. The code now uses (symbols.definitions || []).filter(...) which matches the established pattern used throughout the codebase.

✅ EMPTY BLOCKS ARRAY HANDLING FIXED: The empty blocks array issue has been resolved in commit b6a7699. The logic now correctly uses d.cfg === null || Array.isArray(d.cfg?.blocks) which properly treats cfg: { blocks: [] } as valid native CFG data.

Other Code Quality Observations:

  • Logic and performance optimization approach is sound
  • Changelog is appropriately updated with all three changes from this release
  • TODO comment added for test exclusion removal (referencing >= 3.0.4)
  • Test plan is reasonable though manual verification steps remain unchecked

Root Cause Analysis

Problem Being Solved: Performance bottleneck where ensureWasmTrees was being called unnecessarily when the native engine already provided complete CFG and dataflow data. This caused redundant WASM parsing of every file even when native data was sufficient.

Why This Approach: The pre-check heuristic is logical and conservative - it only skips WASM parsing when it can definitively confirm that native data is complete for all required functionality (CFG blocks for functions/methods and dataflow data).

Risk Assessment: Low risk optimization with proper fallback behavior and accurate timing metrics.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: This is a well-designed performance optimization that addresses a real bottleneck. The author responsibly addressed all critical review feedback, including the null safety bug and empty blocks array handling. The logic is sound, conservative, and maintains backward compatibility. The changelog is comprehensive and test exclusions are properly documented with TODO comments for future cleanup.

The PR demonstrates good engineering practices with thorough review response and maintains high code quality standards. Minor deduction for the manual test plan items remaining unchecked, but the optimization logic is solid and all automated tests pass.


carlos-alm added a commit that referenced this pull request Mar 5, 2026
@carlos-alm carlos-alm merged commit 457005f into main Mar 5, 2026
16 checks passed
@carlos-alm carlos-alm deleted the perf/skip-wasm-preparse-native branch March 5, 2026 19:44
@github-actions github-actions bot locked and limited conversation to collaborators Mar 5, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant