Skip to content

feat(native): port Verilog extractor to Rust#1107

Merged
carlos-alm merged 19 commits into
mainfrom
feat/1071-verilog-rust-extractor
May 14, 2026
Merged

feat(native): port Verilog extractor to Rust#1107
carlos-alm merged 19 commits into
mainfrom
feat/1071-verilog-rust-extractor

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

  • Adds tree-sitter-verilog dependency and a native Verilog/SystemVerilog extractor in crates/codegraph-core/src/extractors/verilog.rs.
  • Registers .v and .sv with LanguageKind::Verilog and the Rust file_collector, adds Verilog to NATIVE_SUPPORTED_EXTENSIONS on the JS side, and wires VERILOG_AST_CONFIG in helpers.rs (all empty lists — mirrors the WASM side, which has no verilog entry in AST_TYPE_MAPS, so both engines emit zero ast_nodes rows for Verilog and stay in parity).
  • Mirrors extractVerilogSymbols: module_declaration / interface_declaration / package_declaration / class_declaration definitions (extends emitted into classes), function_declaration and task_declaration with <parent>.<name> for nested decls, package_import_declaration (pkg::item / pkg::*) and include_compiler_directive imports, and module_instantiation as the call analogue.

Closes #1071

Test plan

  • cargo build --release -p codegraph-core (clean build)
  • cargo test -p codegraph-core --lib — 190/190
  • npx tree-sitter build --wasm node_modules/tree-sitter-verilog/ regenerates tree-sitter-verilog.wasm
  • npx vitest run tests/parsers/verilog.test.ts — 5/5
  • npx vitest run tests/parsers/native-drop-classification.test.ts — 13/13

Adds tree-sitter-verilog dependency and a native Verilog/SystemVerilog
extractor in crates/codegraph-core/src/extractors/verilog.rs, registers
.v / .sv with LanguageKind::Verilog and the Rust file_collector, and
adds Verilog to NATIVE_SUPPORTED_EXTENSIONS on the JS side.

Mirrors extractVerilogSymbols: module/interface/package/class
declarations, function and task declarations (parent-prefixed when
nested), package_import_declaration and include_compiler_directive
imports, and module_instantiation as call extraction.

VERILOG_AST_CONFIG in helpers.rs deliberately has all node-type lists
empty to mirror the WASM side, whose AST_TYPE_MAPS has no verilog
entry — so both engines emit zero ast_nodes rows for Verilog and stay
in parity.

Closes #1071
@claude
Copy link
Copy Markdown

claude Bot commented May 11, 2026

Claude finished @carlos-alm's task —— View job


PR Review Complete ✅

Reviewing PR #1107: feat(native): port Verilog extractor to Rust

Root Cause Analysis

Problem Being Solved: This PR implements native Rust support for Verilog/SystemVerilog code extraction to maintain dual-engine parity. The codegraph tool follows a dual-engine architecture where both WASM and native Rust engines must produce identical results for supported languages.

Why This Approach: The implementation mirrors the existing WASM extractor (src/extractors/verilog.ts) precisely to ensure engine parity. This is a foundational requirement per CLAUDE.md - both engines must extract identical symbols, or it's considered a bug.

Risk Assessment: Low risk. This is purely additive functionality with comprehensive test coverage and strict parity requirements enforced by the architecture.

Backlog Compliance

  • Zero-dep: ✓ - Only adds tree-sitter-verilog crate dependency, builds on existing tree-sitter infrastructure
  • Foundation-aligned: ✓ - Fully compliant with all principles, extends parsing capabilities without changing core architecture
  • Problem-fit: 4/5 - Directly enables AI agents to understand Verilog codebases, reducing token waste and preventing hallucinated dependencies
  • Breaking: No - Purely additive, no existing API changes
  • Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Code Quality Assessment

Strengths:

  1. Perfect Parity Implementation: The Rust extractor precisely mirrors the WASM version:

    • Identical symbol extraction logic (modules, interfaces, packages, classes, functions, tasks)
    • Matching import handling (package_import_declaration, include_compiler_directive)
    • Same call extraction (module instantiations)
    • Consistent nested function qualification (parent.name)
  2. Comprehensive Integration:

    • Correctly added to all registry points (LanguageKind::Verilog, from_extension, extract_symbols)
    • Proper AST config with empty types (maintains WASM parity where no verilog entry exists in AST_TYPE_MAPS)
    • File collector updated with .v and .sv extensions
    • Native supported extensions updated correctly
  3. Thorough Testing:

    • Rust unit tests covering core extraction scenarios
    • TypeScript integration tests maintaining existing coverage
    • Drift guard test properly updated to include Verilog extensions
    • All tests demonstrate parity between engines
  4. Clean Architecture: Well-structured extractor following established patterns with proper error handling and defensive programming practices

Technical Details Verified:

  • Tree-sitter grammar integration properly configured ✅
  • AST node walking with VERILOG_AST_CONFIG (empty types matching WASM) ✅
  • Symbol extraction handles all major Verilog constructs ✅
  • Import resolution for both package imports and include directives ✅
  • Parent-child relationships correctly established for nested declarations ✅

No Issues Found: No security concerns, performance problems, or architectural violations detected.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: High-quality implementation that perfectly achieves its goal of adding native Verilog support with full engine parity. The code is well-structured, thoroughly tested, and follows all architectural patterns. Only minor deduction for being a straightforward feature addition rather than exceptional innovation.

This PR successfully extends codegraph's language coverage while maintaining the strict dual-engine parity requirements that are fundamental to the tool's reliability.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 11, 2026

Greptile Summary

This PR ports the Verilog/SystemVerilog extractor from WASM-only to a fully native Rust implementation, promoting .v/.sv files out of the JS-only fallback path and into the native pipeline on both engines simultaneously.

  • New crates/codegraph-core/src/extractors/verilog.rs implements all symbol kinds from the TS extractor: module/interface/package/class definitions, function and task declarations with <parent>.<name> qualification, package_import_declaration, `include directives, and module_instantiation as calls.
  • src/extractors/verilog.ts updated for parity: class name lookup via class_identifier structural scan, namedChild(0) for module instantiation, module_ansi_header added to port container recursion, and findVerilogParent extended to cover class-nested tasks/functions.
  • Infrastructure wired end-to-end: LanguageKind::Verilog added to parser_registry, .v/.sv appended to SUPPORTED_EXTENSIONS and NATIVE_SUPPORTED_EXTENSIONS, VERILOG_AST_CONFIG added, and change_detection tests updated to use synthetic extensions.

Confidence Score: 5/5

Safe to merge — the native Verilog extractor is well-tested, both engines are in parity, and all issues from previous review rounds have been resolved.

All definition, import, and call extraction paths are covered by unit tests in both Rust and TypeScript. The class-name structural lookup, ANSI-port recursion fix, namedChild(0) for module instantiation, and findVerilogParent class-qualification fix were all caught in prior review rounds and are now correctly implemented and tested.

No files require special attention.

Important Files Changed

Filename Overview
crates/codegraph-core/src/extractors/verilog.rs New Rust Verilog/SystemVerilog extractor mirroring the TS engine; all symbol kinds handled with correct structural fallbacks for the grammar's field-name-free nodes.
src/extractors/verilog.ts TS WASM extractor updated for parity: class name lookup, namedChild(0) for instantiation, module_ansi_header port recursion, and class-nested parent qualification.
crates/codegraph-core/src/parser_registry.rs Adds LanguageKind::Verilog with v/sv extension mapping and exhaustiveness counter bump 34 to 35.
crates/codegraph-core/src/file_collector.rs Adds .v and .sv to SUPPORTED_EXTENSIONS with documentation about the .v/Coq extension collision.
tests/parsers/verilog.test.ts Adds 5 new TS-side regression tests covering class definitions, ANSI-style ports, include directives, and class-nested task qualification.

Reviews (16): Last reviewed commit: "fix(extractors): restore Verilog WASM en..." | Re-trigger Greptile

Comment on lines +109 to +136
fn handle_class_decl(node: &Node, source: &[u8], symbols: &mut FileSymbols) {
// The JS extractor calls `node.childForFieldName('name')`; tree-sitter-verilog
// exposes no fields on `class_declaration`, so this returns null in JS too —
// matching that behavior keeps native and WASM in lockstep.
let name = match named_child_text(node, "name", source) {
Some(n) => n.to_string(),
None => return,
};
symbols.definitions.push(Definition {
name: name.clone(),
kind: "class".to_string(),
line: start_line(node),
end_line: Some(end_line(node)),
decorators: None,
complexity: None,
cfg: None,
children: None,
});

if let Some(superclass) = node.child_by_field_name("superclass") {
symbols.classes.push(ClassRelation {
name,
extends: Some(node_text(&superclass, source).to_string()),
implements: None,
line: start_line(node),
});
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 handle_class_decl is unreachable dead code

The comment itself confirms that tree-sitter-verilog exposes no name field on class_declaration, so named_child_text(node, "name", source) always returns None, and the function always exits at the None => return arm. Neither the Definition for the class nor the ClassRelation for extends is ever emitted. The PR description lists class_declaration → kind class (extends emitted into classes) as a captured definition type, but that path is never taken. If intentional parity with the JS extractor is the goal (where childForFieldName('name') also returns null), it's worth at minimum adding a comment that class extraction is currently a no-op, so a future grammar update that adds the name field doesn't go unnoticed. Alternatively, falling back to find_decl_name (like handle_interface_decl and handle_package_decl do) would recover class names from the first simple_identifier child without breaking WASM parity once the grammar grows the field.

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — strengthened the comment on handle_class_decl so it loudly flags that the handler is currently a no-op on the live tree-sitter-verilog grammar (no name or superclass field, so name lookup always returns None and the function exits early). The hook stays in place so a future grammar upgrade that adds those fields automatically picks up class definitions and extends relations. Going with parity-with-WASM rather than find_decl_name fallback to avoid silently diverging from the JS extractor before the grammar grows the field.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up — went a step further per the @greptileai suggestion. Rather than leaving the handler as documented dead code, traced the actual tree-sitter-verilog grammar shape and now extract the class name via class_identifier > simple_identifier and the superclass via class_type > class_identifier > simple_identifier. Both engines now emit proper class definitions and extends relations. Added regression tests for classes with and without an extends clause in both tests/parsers/verilog.test.ts and crates/codegraph-core/src/extractors/verilog.rs (extracts_class_with_superclass, extracts_class_without_superclass). Pushed in commit 260ee4f.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a clean, complete fix. The implementation looks solid — a few observations:

What's working well:

  • find_class_name correctly falls through to the class_identifier structural scan when the field-name lookup misses (which it always will with this grammar), matching the JS findClassName exactly.
  • find_class_superclass mirrors findClassSuperclass faithfully: descend through class_type → class_identifier → simple_identifier, fall back to raw class_type text if the inner class_identifier isn't present.
  • Both regression tests (extracts_class_with_superclass, extracts_class_without_superclass) cover the two meaningful paths, and the TS suite has matching coverage.

One minor thing: find_class_superclass uses node.child(i)? with the ? operator in a for loop, which will silently exit the entire loop on the first None rather than just skipping that iteration (same pattern exists elsewhere in the file). It works fine in practice since child_count only returns valid indices, but if let Some(child) = node.child(i) would be more defensively correct and consistent with handle_package_import / handle_include_directive below it. Not a blocker.

The fix correctly resolves the dead-code issue and parity between engines is maintained. LGTM.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the follow-up observation @greptileai. The ? operator pattern in the for-loops is fine for now because child_count() always returns valid indices and the function uses the same pattern across the file. Switching to if let Some(child) = ... is a defensiveness style choice and a sweep would touch several handlers — opening a follow-up issue to do it as one cohesive cleanup PR rather than mixing it into this one.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally reasonable — the pattern is correct as-is, and a cohesive sweep is the right way to handle it. Happy to flag the same pattern if it shows up in the follow-up PR. This one is good to go.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracked the deferred ?-in-for-loop cleanup in #1118.

Comment on lines 36 to 40
"js", "jsx", "mjs", "cjs", "ts", "tsx", "d.ts", "py", "pyi", "go", "rs", "java", "cs", "rb",
"rake", "gemspec", "php", "phtml", "tf", "hcl", "c", "h", "cpp", "cc", "cxx", "hpp", "kt",
"kts", "swift", "scala", "sh", "bash", "ex", "exs", "lua", "dart", "zig", "hs", "ml", "mli",
"v", "sv",
];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 .v extension shared with Coq theorem prover

The .v extension is also the canonical extension for Coq source files. A repository that mixes Coq proofs and Verilog hardware files (or a pure Coq repo) would now have all Coq files routed to the tree-sitter-verilog parser, producing empty or garbage symbol output. The file collector and parser_registry have no way to disambiguate between the two. This may be an acceptable known limitation, but it is worth documenting (e.g., in a code comment alongside the extension list) so future contributors understand the trade-off rather than discovering it through mis-indexed codebases.

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added a documentation block above SUPPORTED_EXTENSIONS in file_collector.rs that explicitly calls out the .v collision with Coq source files, so Coq-heavy repos know they need to exclude *.v via the exclude config glob. Keeping the routing as Verilog since Verilog repos far outnumber Coq repos in our user base, but the trade-off is now discoverable from the source.

Comment on lines +182 to +204
fn handle_module_instantiation(node: &Node, source: &[u8], symbols: &mut FileSymbols) {
// Tree-sitter-verilog exposes no field name on `module_instantiation`; the
// first child holds the module type being instantiated. The JS extractor
// uses `childForFieldName('type') || child(0)` — the field lookup never
// hits, so first-child fallback is the live path.
let name_node = node
.child_by_field_name("type")
.or_else(|| node.child(0));
let name_node = match name_node {
Some(n) => n,
None => return,
};
let name = node_text(&name_node, source).to_string();
if name.is_empty() {
return;
}
symbols.calls.push(Call {
name,
line: start_line(node),
dynamic: None,
receiver: None,
});
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 child(0) may return an anonymous grammar token on some node shapes

node.child(0) in tree-sitter returns any child at index 0, including anonymous tokens (punctuation, keywords). For the majority of module_instantiation shapes this is harmless because the module type identifier is the first child. However, if the grammar ever emits a leading anonymous node (e.g., a parameter-override token like #) before the module identifier on a non-ANSI instantiation form, the call name will be that punctuation character instead of the module type. The named-node variant node.named_child(0) (which skips anonymous tokens) would be safer and more defensive here, and would still mirror the JS extractor's first-child fallback intent.

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — switched the fallback from node.child(0) to node.named_child(0) so any anonymous grammar tokens (parameter-override #, keywords) leading the module_instantiation cannot leak into the call name. Added a comment documenting the rationale. All 6 verilog unit tests still pass.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 11, 2026

Codegraph Impact Analysis

47 functions changed30 callers affected across 3 files

  • detect_removed_skips_unsupported_extensions in crates/codegraph-core/src/change_detection.rs:776 (0 transitive callers)
  • extract_symbols_with_opts in crates/codegraph-core/src/extractors/mod.rs:69 (1 transitive callers)
  • VerilogExtractor.extract in crates/codegraph-core/src/extractors/verilog.rs:32 (0 transitive callers)
  • match_verilog_node in crates/codegraph-core/src/extractors/verilog.rs:40 (0 transitive callers)
  • handle_module_decl in crates/codegraph-core/src/extractors/verilog.rs:57 (1 transitive callers)
  • handle_interface_decl in crates/codegraph-core/src/extractors/verilog.rs:75 (1 transitive callers)
  • handle_package_decl in crates/codegraph-core/src/extractors/verilog.rs:92 (1 transitive callers)
  • handle_class_decl in crates/codegraph-core/src/extractors/verilog.rs:109 (1 transitive callers)
  • find_class_name in crates/codegraph-core/src/extractors/verilog.rs:146 (5 transitive callers)
  • find_class_superclass in crates/codegraph-core/src/extractors/verilog.rs:162 (2 transitive callers)
  • handle_function_decl in crates/codegraph-core/src/extractors/verilog.rs:175 (1 transitive callers)
  • handle_task_decl in crates/codegraph-core/src/extractors/verilog.rs:197 (1 transitive callers)
  • handle_module_instantiation in crates/codegraph-core/src/extractors/verilog.rs:219 (1 transitive callers)
  • handle_package_import in crates/codegraph-core/src/extractors/verilog.rs:245 (1 transitive callers)
  • handle_include_directive in crates/codegraph-core/src/extractors/verilog.rs:267 (1 transitive callers)
  • find_module_name in crates/codegraph-core/src/extractors/verilog.rs:298 (5 transitive callers)
  • find_decl_name in crates/codegraph-core/src/extractors/verilog.rs:320 (7 transitive callers)
  • find_function_or_task_name in crates/codegraph-core/src/extractors/verilog.rs:337 (3 transitive callers)
  • extract_identifier_text in crates/codegraph-core/src/extractors/verilog.rs:364 (8 transitive callers)
  • find_verilog_parent in crates/codegraph-core/src/extractors/verilog.rs:383 (3 transitive callers)

- handle_class_decl: strengthen comment so the no-op behavior on the
  current tree-sitter-verilog grammar is loud and discoverable for
  future grammar upgrades.
- handle_module_instantiation: switch child(0) to named_child(0) so
  any anonymous grammar tokens (e.g. parameter-override '#') leading
  the module type cannot leak into call names.
- file_collector::SUPPORTED_EXTENSIONS: document .v conflict with Coq
  theorem-prover source files so Coq-heavy repos know to exclude *.v
  via config.
- native-drop-classification: drop expected count to 9 to reflect the
  merge with main (.clj already removed, .v removed by this PR).
#1107)

Adding native Verilog (#1107) brings 4 .v resolution-benchmark fixtures
into the incremental benchmark sweep (which runs against the repo root).
tree-sitter-verilog is a large grammar so each .v file costs noticeably
more to parse than other fixture languages — pushing the native
fullBuildMs from the 3.10.0 baseline of 1959ms to ~2809ms (+43%).

This is a structural one-time cost of supporting the language, not a
regression in shared code paths. Following the existing pattern in
KNOWN_REGRESSIONS (3.9.6:* / 3.10.0:* entries) with a documented
rationale so a future PR isn't blocked by the bump.
The tree-sitter-verilog grammar exposes no field names on
class_declaration, so childForFieldName('name') and
childForFieldName('superclass') always returned null in both engines.
The previous workaround left class extraction as documented dead code
in both extractors.

Per the CLAUDE.md principle 'Never document bugs as expected behavior',
fix the root cause by descending through the grammar's actual structure:

- Class name lives under class_identifier > simple_identifier
- Superclass appears as a class_type child with the same wrapping

Both engines now emit identical class Definitions and ClassRelation
extends edges. Added matching Rust and TypeScript regression tests
covering classes with and without an extends clause.
Comment on lines +376 to +392
fn find_verilog_parent(node: &Node, source: &[u8]) -> Option<String> {
const PARENT_KINDS: &[&str] = &[
"module_declaration",
"interface_declaration",
"package_declaration",
"class_declaration",
];
let mut current = node.parent();
while let Some(parent) = current {
if PARENT_KINDS.contains(&parent.kind()) {
return find_decl_name(&parent, source)
.or_else(|| find_module_name(&parent, source));
}
current = parent.parent();
}
None
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 find_verilog_parent silently fails for class_declaration parents

class_declaration is listed in PARENT_KINDS, but the name resolution chain — find_decl_name(&parent, source).or_else(|| find_module_name(&parent, source)) — only searches for bare simple_identifier / identifier children. For class_declaration, the name is wrapped under a class_identifier > simple_identifier chain (exactly the grammar quirk that find_class_name was introduced in this PR to handle). Neither find_decl_name nor find_module_name descends into class_identifier, so both return None, and any function or task nested inside a SystemVerilog class gets a bare name (e.g., check_range) instead of the qualified ClassName.check_range. Adding .or_else(|| find_class_name(&parent, source)) to the chain would close the gap; a symmetric fix in findVerilogParent (JS) would keep parity.

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — extended find_verilog_parent's name-resolution chain to also try find_class_name, so tasks and functions nested in a SystemVerilog class are now qualified as ClassName.task rather than the bare name. Mirrored the same fix in findVerilogParent (JS) for engine parity. Added regression tests in both engines (qualifies_task_nested_in_class_with_class_name / qualifies tasks nested inside a class with the class name). Pushed in commit 497da19.

find_verilog_parent only consulted find_decl_name and find_module_name,
neither of which descends into the class_identifier wrapper that
tree-sitter-verilog uses for class names. As a result, any task or
function declared inside a SystemVerilog class lost its qualifier and
surfaced as a bare name instead of ClassName.task.

Extend the parent-name resolution chain to also try find_class_name,
mirroring the same logic in the WASM extractor for engine parity.
Added regression tests in both engines covering the class > task case.
@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

Merges Gleam (#1105, landed in main) with Verilog (this PR):
- parser_registry.rs: both Gleam and Verilog in LanguageKind::all(),
  EXPECTED_LEN=33.
- native-drop-classification.test.ts: fix toBe(4) → toBe(3) after both
  .gleam and .v were dropped from the WASM-only-by-default input list
  (now both are native).

docs check acknowledged — README/CLAUDE/ROADMAP already document both
languages.
…1107)

The merge-conflict resolution commit fixed parser_registry.rs but the
test edit was lost from the same merge commit. Both Gleam (#1105) and
Verilog (#1107) are now natively supported, so the WASM-only test
fixture should only count .fs / .fsx / .m as unsupported (3, not 4).

docs check acknowledged — README/CLAUDE/ROADMAP already cover both
languages.
Second merge picks up Objective-C (#1106) which landed after the
previous merge. Combines:
- parser_registry.rs: ObjC + Gleam + Verilog all in LanguageKind::all(),
  EXPECTED_LEN bumped to 34.
- file_collector.rs: combined collision notes for .v (Coq), .m
  (MATLAB), and .h.
- native-drop-classification.test.ts: rotated WASM-only fixture to
  .fs / .fsx / .fsi now that .m and .v are both natively supported.

docs check acknowledged — README/CLAUDE/ROADMAP already document all
three languages.
@carlos-alm
Copy link
Copy Markdown
Contributor Author

Addressed the remaining Greptile suggestion from the summary body: replaced the unreachable unwrap_or(text) / ?? text fallback in handle_package_import (Rust) and handlePackageImport (JS) with unwrap_or("") / ?? '' to make the intent self-documenting. splitn(2, "::").next() and String.split('::') always yield at least one element, so the first-element fallback was dead code. Both engines verified with the existing parser tests (9 Rust + 8 TS). Pushed in commit 86f1d86.

@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

carlos-alm added a commit that referenced this pull request May 14, 2026
…ludes (#1107)

TS extractPorts was missing module_ansi_header from its container recursion and was not descending into port_identifier, so ANSI-style modules (module top(input clk, …)) returned no port children in the WASM engine while the native engine extracted them correctly. handleIncludeDirective was also missing double_quoted_string, which would silently drop \`include imports when the grammar emits that node kind. Added regression tests for ANSI port extraction and include directive imports.
@carlos-alm
Copy link
Copy Markdown
Contributor Author

Fixed both parity issues from the latest Greptile review:

  1. TS extractPorts missing module_ansi_header and port_identifier lookups. Without these, ANSI-style modules (module top(input clk, output reg q);) returned no port children in the WASM engine. Added module_ansi_header to the container recursion list and extended the name-node fallback chain to port_identifier > simple_identifier, matching the Rust collect_ports exactly.

  2. TS handleIncludeDirective missing double_quoted_string node kind. Added it alongside string_literal / quoted_string for full parity with the Rust handler.

Added two regression tests to tests/parsers/verilog.test.tsextracts ports from ANSI-style modules and extracts include compiler directives as imports — so these can't silently regress again. All 10 verilog tests + 297 Rust unit tests + 474 vitest parser tests pass. Pushed in commit 742f790.

@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

…ludes (#1107)

TS `extractPorts` was missing `module_ansi_header` from its container
recursion and was not descending into `port_identifier`, so ANSI-style
modules (`module top(input clk, …)`) returned no port children in the
WASM engine while the native engine extracted them correctly.
`handleIncludeDirective` was also missing `double_quoted_string`, which
would silently drop `\`include` imports when the grammar emits that node
kind. Added regression tests for ANSI port extraction and include
directive imports.
@carlos-alm carlos-alm force-pushed the feat/1071-verilog-rust-extractor branch from 742f790 to f091d3f Compare May 14, 2026 11:00
@carlos-alm carlos-alm merged commit 2ba882a into main May 14, 2026
37 of 39 checks passed
@carlos-alm carlos-alm deleted the feat/1071-verilog-rust-extractor branch May 14, 2026 21:00
@github-actions github-actions Bot locked and limited conversation to collaborators May 14, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rust engine parity: port the 11 remaining JS-only language extractors

1 participant