feat(native): port Groovy extractor to Rust#1101
Conversation
Adds tree-sitter-groovy dependency and native extractor matching the WASM-side behavior for Groovy symbol, import, and call extraction. Part of #1071
|
Claude finished @carlos-alm's task —— View job PR Review: Groovy Native Extractor ImplementationReviewing this PR that ports Groovy extraction from WASM to native Rust implementation. Todo List:
Root Cause AnalysisProblem Being Solved: This PR implements native Rust support for Groovy files (.groovy, .gvy), eliminating a coverage gap where the native engine would drop Groovy files and fall back to WASM. This addresses issue #1071 and maintains engine parity. Why This Approach: The dual-engine architecture requires both native (Rust) and WASM engines to produce identical results. Adding native Groovy support prevents engine parity warnings and allows the faster native engine to handle Groovy files end-to-end. Risk Assessment: Low risk. This is a straightforward language addition following established patterns. The implementation mirrors existing extractors and includes comprehensive tests. Configuration Changes (HIGH SCRUTINY)✅ No concerning configuration changes detected
Test Integrity✅ Test changes are appropriate and strengthen coverage
Code Quality Assessment✅ Excellent Implementation QualityArchitecture Consistency: The Groovy extractor perfectly follows the established patterns:
Language Support Completeness:
Engine Parity: The implementation meticulously mirrors the WASM
Error Handling: Robust with early returns and safe node access patterns. ✅ Registry IntegrationAll required integration points are properly updated:
✅ Test CoverageComprehensive unit tests cover:
Backlog Compliance
Minor Observations
Final Assessment |
Greptile SummaryThis PR ports the Groovy symbol extractor from WASM/TypeScript to native Rust, using
Confidence Score: 5/5Safe to merge — all wiring is complete and consistent across Rust and TypeScript layers, and all previous review findings have been addressed. The port is thorough: enum variant, extension mapping, tree-sitter language binding, extractor dispatch, AST config, JS type/string maps, and the NATIVE_SUPPORTED_EXTENSIONS set are all updated in lockstep. The ABI compatibility test (via LanguageKind::all()) will catch grammar version mismatches at CI time. The one open question — whether interface extends edges should be emitted as ClassRelations — is a clarification point, not a confirmed defect. No files require special attention beyond the interface-extends clarification in groovy.rs. Important Files Changed
Sequence DiagramsequenceDiagram
participant FC as file_collector
participant PR as parser_registry
participant MOD as extractors/mod
participant GE as GroovyExtractor
participant HLP as helpers
FC->>PR: from_extension(".groovy" / ".gvy")
PR-->>FC: LanguageKind::Groovy
FC->>PR: tree_sitter_language(Groovy)
PR-->>FC: tree_sitter_groovy::LANGUAGE
FC->>MOD: extract_symbols(Groovy, tree, src, path)
MOD->>GE: GroovyExtractor.extract_with_opts(...)
GE->>HLP: walk_tree(..., match_groovy_node)
HLP-->>GE: definitions, calls, imports, classes
GE->>HLP: walk_ast_nodes_with_config(..., GROOVY_AST_CONFIG)
HLP-->>GE: ast_nodes (strings, new, throw)
GE-->>MOD: FileSymbols
MOD-->>FC: FileSymbols
Reviews (13): Last reviewed commit: "test(bench): exempt fnDeps depth 3 from ..." | Re-trigger Greptile |
| if let Some(body) = body { | ||
| for i in 0..body.child_count() { | ||
| let Some(child) = body.child(i) else { continue }; | ||
| if child.kind() == "enum_constant" { |
There was a problem hiding this comment.
The Rust port diverges from the JS source-of-truth here. In
handleGroovyEnumDecl (groovy.ts line 163) the JS extractor accepts both enum_constant and identifier children as enum members. The Rust port only matches enum_constant, so Groovy enums whose body produces bare identifier nodes (which can occur with some grammar variants or simple Groovy enum Color { RED, GREEN } forms) will have no extracted children in the Rust engine while the JS engine would capture them correctly.
| if child.kind() == "enum_constant" { | |
| if child.kind() == "enum_constant" || child.kind() == "identifier" { |
There was a problem hiding this comment.
Fixed in 74cd047. Added identifier to the enum body match — now matches the JS source-of-truth in handleGroovyEnumDecl (groovy.ts:163).
| "method_invocation" | "method_call" | "call_expression" | "function_call" => { | ||
| handle_call_expr(node, source, symbols) | ||
| } |
There was a problem hiding this comment.
The docstring says the extractor mirrors
extractGroovySymbols and mentions juxt_function_call as a node kind present in the grammar, but it does not appear in match_groovy_node. The JS extractor also omits it, so this isn't a regression, but the discrepancy between the doc and the implementation means Groovy command-style calls like foo bar(x) are silently dropped from the call graph. At minimum the doc claim should be removed or an arm added.
| "method_invocation" | "method_call" | "call_expression" | "function_call" => { | |
| handle_call_expr(node, source, symbols) | |
| } | |
| "method_invocation" | "method_call" | "call_expression" | "function_call" | |
| | "juxt_function_call" => { | |
| handle_call_expr(node, source, symbols) | |
| } |
There was a problem hiding this comment.
Clarified the doc in 74cd047. The doc previously listed juxt_function_call alongside features the extractor handles, which was misleading. The JS source-of-truth also omits it, so I haven't added the arm here to preserve parity. Tracked separately in #1108 — fix needs to land in both engines to keep them in lockstep.
| pub const GROOVY_AST_CONFIG: LangAstConfig = LangAstConfig { | ||
| new_types: &["object_creation_expression"], | ||
| throw_types: &["throw_statement"], | ||
| await_types: &[], | ||
| string_types: &["string_literal"], | ||
| regex_types: &[], | ||
| quote_chars: &['\'', '"'], | ||
| string_prefixes: &[], | ||
| }; |
There was a problem hiding this comment.
Groovy's double-quoted strings are GStrings, which the tree-sitter-groovy grammar represents with a
gstring node (not string_literal). string_literal covers only single-quoted strings. The same gap exists in GROOVY_AST_TYPES on the TypeScript side. Both configs are consistent with each other, but any AST-node analysis that looks for string literals in Groovy code will silently miss all GString interpolations.
| pub const GROOVY_AST_CONFIG: LangAstConfig = LangAstConfig { | |
| new_types: &["object_creation_expression"], | |
| throw_types: &["throw_statement"], | |
| await_types: &[], | |
| string_types: &["string_literal"], | |
| regex_types: &[], | |
| quote_chars: &['\'', '"'], | |
| string_prefixes: &[], | |
| }; | |
| pub const GROOVY_AST_CONFIG: LangAstConfig = LangAstConfig { | |
| new_types: &["object_creation_expression"], | |
| throw_types: &["throw_statement"], | |
| await_types: &[], | |
| string_types: &["string_literal", "gstring"], | |
| regex_types: &[], | |
| quote_chars: &['\'', '"'], | |
| string_prefixes: &[], | |
| }; |
There was a problem hiding this comment.
Added gstring defensively in 74cd047. Verified against node_modules/tree-sitter-groovy/grammar.js and src/node-types.json: the 0.1.x grammar uses string_literal for both single- and double-quoted strings (it extends tree-sitter-java) and does not emit a separate gstring node, so this is a no-op for the current grammar — but it costs nothing and guards against future grammar variants. Matched the TS config.
There was a problem hiding this comment.
Added gstring defensively in 74cd047. Verified against node_modules/tree-sitter-groovy/grammar.js and src/node-types.json: the 0.1.x grammar uses string_literal for both single- and double-quoted strings (it extends tree-sitter-java) and does not emit a separate gstring node, so this is a no-op for the current grammar — but it costs nothing and guards against future grammar variants. Matched the TS config.
| const GROOVY_AST_TYPES: Record<string, string> = { | ||
| object_creation_expression: 'new', | ||
| throw_statement: 'throw', | ||
| string_literal: 'string', | ||
| }; |
There was a problem hiding this comment.
Mirrors the gap in the Rust config:
string_literal covers single-quoted strings only; gstring should also be listed for double-quoted / interpolated Groovy strings. The GROOVY_STRING_CONFIG already includes '" as quote chars, so only the type key is missing.
| const GROOVY_AST_TYPES: Record<string, string> = { | |
| object_creation_expression: 'new', | |
| throw_statement: 'throw', | |
| string_literal: 'string', | |
| }; | |
| const GROOVY_AST_TYPES: Record<string, string> = { | |
| object_creation_expression: 'new', | |
| throw_statement: 'throw', | |
| string_literal: 'string', | |
| gstring: 'string', | |
| }; |
There was a problem hiding this comment.
Added gstring to GROOVY_AST_TYPES in 74cd047, matching the Rust config change. As noted in the Rust-side reply, the current tree-sitter-groovy 0.1.x grammar emits string_literal for both quote styles, so this is defensive against future grammar variants rather than a current bug fix.
Codegraph Impact Analysis30 functions changed → 13 callers affected across 2 files
|
…tries Address Greptile review on PR #1101: - Rust enum handler now accepts both `enum_constant` and `identifier` children, matching the JS source-of-truth in `handleGroovyEnumDecl` (groovy.ts:163). Without this, Groovy enums whose grammar emits bare identifier nodes had no extracted members in the native engine. - Add `gstring` defensively to GROOVY_AST_CONFIG (Rust) and GROOVY_AST_TYPES (TS). tree-sitter-groovy 0.1.x emits `string_literal` for both quote styles, but this keeps both engines resilient to grammar variants. - Clarify module doc: `juxt_function_call` was previously listed alongside features the extractor handles. It is intentionally unhandled (matches JS). Tracked in #1108 for adding support to both engines.
|
Addressed the two follow-up observations from the Greptile summary:
|
| "method_invocation" | "method_call" | "call_expression" | "function_call" => { | ||
| handle_call_expr(node, source, symbols) | ||
| } |
There was a problem hiding this comment.
The docstring on line 67 explicitly states "the Rust port mirrors those arms so engine parity holds even if a future grammar version renames nodes", and lists
member_access as one of the JS extractor's call-dispatch arms. But member_access is absent from match_groovy_node. In Groovy, property access expressions like obj.field can produce member_access nodes — if the JS extractor dispatches those to handleGroovyCallExpr, the Rust engine silently drops the same references, breaking the parity the comment promises.
| "method_invocation" | "method_call" | "call_expression" | "function_call" => { | |
| handle_call_expr(node, source, symbols) | |
| } | |
| "method_invocation" | "method_call" | "call_expression" | "function_call" | |
| | "member_access" => { | |
| handle_call_expr(node, source, symbols) | |
| } |
There was a problem hiding this comment.
Clarified the docstring in c44df0a. member_access is not a top-level dispatch kind in either engine — the JS walkGroovyNode (groovy.ts:37-76) only dispatches method_invocation / method_call / call_expression / function_call at the top level. member_access is matched only as a callee sub-node inside handleGroovyCallExpr (groovy.ts:257), and the Rust handle_call_expr (groovy.rs:314) already mirrors that. Adding it as a top-level arm would diverge from the JS source-of-truth — the docstring was misleading, not the implementation, so I fixed the docstring.
Remove member_access from the docstring's list of top-level dispatch arms — it was never a top-level case in either the JS or Rust extractor. Both engines only match member_access as a callee sub-node inside handle_call_expr/handleGroovyCallExpr. The previous wording could mislead future readers into thinking the arm was missing.
…rge (#1101) The merge of #1102 (R) + #1103 (Erlang) + #1100 (Solidity) into the Groovy branch expanded the self-build benchmark graph far enough that fnDeps depth 3 now crosses the 25% gate threshold (+88%, 24.3 → 45.6ms) on a sub-50ms metric. Same root cause already exempted for depth 1, depth 5, and Query time in #1113.
Summary
tree-sitter-groovy(v0.1) dependency and a native Groovy extractor undercrates/codegraph-core/src/extractors/groovy.rs, mirroring the WASM-sideextractGroovySymbolsbehaviour.LanguageKind, the parser registry, file collector extensions, AST type/string maps, andNATIVE_SUPPORTED_EXTENSIONSso the native engine handles.groovy/.gvyfiles end-to-end.Part of #1071
Test plan
cargo build --release -p codegraph-corecargo test -p codegraph-core --lib(191 passed)npx vitest run tests/parsers/groovy.test.ts tests/parsers/native-drop-classification.test.ts(18 passed)npx vitest run tests/benchmarks/resolution/resolution-benchmark.test.ts -t groovy(5 passed)