
feat(native): port Gleam extractor to Rust #1105

Merged
carlos-alm merged 10 commits into main from feat/1071-gleam-rust-extractor
May 14, 2026

@carlos-alm
Contributor

Summary

  • Adds tree-sitter-gleam dependency and a native Gleam extractor in crates/codegraph-core/src/extractors/gleam.rs.
  • Registers .gleam with LanguageKind::Gleam and the Rust file_collector, adds Gleam to NATIVE_SUPPORTED_EXTENSIONS on the JS side, and wires GLEAM_AST_TYPES / GLEAM_AST_CONFIG on both the native and JS sides so the two engines extract identical ast_nodes for Gleam source.
  • Mirrors extractGleamSymbols: module-level function definitions as function (with parameter children), type definitions as type/record/enum (mapped from the Gleam node kind), constants as variable, import declarations, and function-application call extraction.

Closes #1071

Test plan

  • cargo build --release -p codegraph-core (clean build)
  • cargo test -p codegraph-core --lib — 193/193
  • npx tree-sitter build --wasm regenerates tree-sitter-gleam.wasm
  • npx vitest run tests/parsers/gleam.test.ts — 4/4
  • npx vitest run tests/parsers/native-drop-classification.test.ts — 13/13

@claude

claude Bot commented May 11, 2026

Claude finished @carlos-alm's task.


Reviewing PR: feat(native): port Gleam extractor to Rust

Todo List:

  • Phase 1: Root cause analysis - understand the problem being solved
  • Phase 2: Configuration change detection - check for high-risk changes
  • Phase 3: Test integrity verification - ensure tests aren't weakened
  • Phase 4: Code quality assessment - review new Gleam extractor code
  • Phase 5: Backlog compliance evaluation - rate against BACKLOG.md criteria
  • Provide structured final assessment with rating

Phase 1: Root Cause Analysis ✅

Problem Being Solved: This PR adds native Gleam extraction as part of completing the dual-engine architecture. Issue #1071 tracks porting the remaining JS-only language extractors to Rust; this PR adds the missing native implementation for Gleam to match the existing WASM extractor.

Why This Approach: The approach follows the established dual-engine pattern - both native (Rust) and WASM engines must produce identical results for all supported languages. The PR properly mirrors the existing extractGleamSymbols WASM implementation.

Risk Assessment: Low risk - this is additive functionality that doesn't modify existing behavior for other languages.


Phase 2: Configuration Change Detection ✅

Configuration Changes Analyzed:

  • Cargo.toml: Only adds tree-sitter-gleam = "1.0.0" dependency - legitimate language support addition
  • package-lock.json: No new runtime dependencies, only lockfile update
  • No linting/build configuration changes detected
  • No timeout or coverage threshold modifications

Assessment: Clean addition - no concerning configuration relaxation found.


Phase 3: Test Integrity Verification ✅

Test Modifications Analyzed:

  • gleam.test.ts: New test file with proper coverage of Gleam extraction (functions, types, imports, calls)
  • native-drop-classification.test.ts: Clean modification - removes Gleam from unsupported list (expected with native support)
  • No test weakening - assertions remain strict, no timeouts increased, no coverage thresholds lowered
  • Test approach mirrors existing pattern - matches structure of other language tests

Assessment: Test changes are appropriate and strengthen coverage.


Phase 4: Code Quality Assessment ✅

Native Extractor Implementation (gleam.rs:1-432):

Strengths:

  • Perfect parity with WASM extractor - mirrors src/extractors/gleam.ts structure and logic
  • Comprehensive symbol extraction - functions, external functions, type definitions, type aliases, constants, imports, calls
  • Proper AST node handling - uses tree-sitter field access with fallback patterns
  • Complete test coverage - 7 unit tests covering all extraction scenarios
  • Follows established Rust patterns - matches other extractors in crates/codegraph-core/src/extractors/
  • Correct AST config integration - properly wires GLEAM_AST_CONFIG for engine parity

Code Quality Details:

  • Function extraction (lines 32-53): Handles parameters as children, includes complexity/CFG computation
  • Type definitions (lines 76-141): Extracts constructors as property children, handles both inline and grouped constructors
  • Import resolution (lines 185-241): Supports unqualified imports, aliases, defaults to module name
  • Call extraction (lines 243-286): Handles both direct calls and qualified module calls with receivers
  • Parameter extraction (lines 288-327): Comprehensive parameter handling with fallback patterns
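The call-extraction behavior described above (same-file calls versus qualified module calls with a receiver) can be illustrated with a small, self-contained sketch. It does not use tree-sitter; Callee, Call, and extract_call are hypothetical names, and it assumes the field label becomes the call name while the record text becomes the receiver, as the extracts_qualified_call_as_receiver_name test name suggests.

```rust
/// Hypothetical, simplified shapes of the callee in a Gleam call node.
#[derive(Debug)]
enum Callee {
    /// `greet("x")`: a bare identifier, i.e. a same-file call.
    Identifier(String),
    /// `io.println("x")`: a field access on an imported module.
    FieldAccess { record: String, label: String },
}

/// Simplified call record: the called name plus an optional receiver.
#[derive(Debug, PartialEq)]
struct Call {
    name: String,
    receiver: Option<String>,
}

/// Maps a callee to a call record. Assumption: the label becomes the
/// call name and the record text becomes the receiver.
fn extract_call(callee: &Callee) -> Call {
    match callee {
        // Same-file call: no receiver.
        Callee::Identifier(name) => Call {
            name: name.clone(),
            receiver: None,
        },
        // Qualified call: keep the module as the receiver.
        Callee::FieldAccess { record, label } => Call {
            name: label.clone(),
            receiver: Some(record.clone()),
        },
    }
}
```

Keeping the receiver separate from the name lets a later resolution step match `println` against the imported module `io` instead of against same-file definitions.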

Integration Points:

  • Registry integration - properly added to LanguageKind::Gleam and from_extension
  • File collector support - .gleam added to SUPPORTED_EXTENSIONS
  • AST config mirroring - both native and JS sides have matching GLEAM_AST_TYPES/GLEAM_AST_CONFIG

Assessment: High-quality implementation that correctly ports WASM functionality to native Rust.


Phase 5: Backlog Compliance Evaluation ✅

Root Cause Analysis

Problem Being Solved: Completes dual-engine architecture parity for Gleam language support (native Rust + WASM)
Why This Approach: Follows established dual-engine pattern - both engines must produce identical results
Risk Assessment: Low risk additive functionality

Backlog Compliance

  • Zero-dep: ✓ — adds tree-sitter-gleam which is a tree-sitter grammar (existing category)
  • Foundation-aligned: ✓ — maintains dual-engine parity (P1), builds on tree-sitter/SQLite (P4), embeddable-first (P5)
  • Problem-fit: 4/5 — directly enables AI agents to work with Gleam codebases, prevents "unsupported language" gaps
  • Breaking: No — purely additive, no existing API/CLI changes
  • Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Critical Concerns

None identified. This is a clean implementation that follows established patterns.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: Excellent implementation that completes dual-engine architecture for Gleam. Perfect parity between native and WASM extractors, comprehensive test coverage, follows all established patterns. Only missing the 5th star due to being incremental language support rather than architectural innovation.

@greptile-apps
Contributor

greptile-apps Bot commented May 11, 2026

Greptile Summary

This PR ports the Gleam symbol extractor from the WASM/JS engine to native Rust, adding tree-sitter-gleam as a Cargo dependency and wiring LanguageKind::Gleam throughout the parser registry, file collector, and change-detection layer. It also back-patches the JS extractor with two child(0) → namedChild(0) fixes that were identified during review.

  • New GleamExtractor (gleam.rs): faithfully mirrors extractGleamSymbols — functions, external functions, type definitions with constructors, type aliases, constants, imports (with qualified/unqualified/aliased variants), and qualified/unqualified call extraction.
  • Registration plumbing: .gleam added to SUPPORTED_EXTENSIONS, LanguageKind::Gleam added to parser_registry, GLEAM_AST_CONFIG / GLEAM_AST_TYPES / GLEAM_STRING_CONFIG kept in sync across both engines, and .gleam promoted from WASM-only to NATIVE_SUPPORTED_EXTENSIONS.
  • Test hygiene: unit tests cover all major node types; native-drop-classification tests updated to remove .gleam from the WASM-only bucket.

Confidence Score: 5/5

Safe to merge — the Rust extractor faithfully mirrors the JS extractor, previous review issues were all addressed in follow-up commits, and all tests pass.

The porting work is thorough: node-kind dispatch, named-child fallbacks, unqualified/aliased import handling, qualified call receiver extraction, and AST config parity are all correctly reflected in the Rust implementation. The two child(0) regressions in the JS extractor were also fixed in the same PR. Change-detection and file-collector plumbing updates are correct and consistent.

No files require special attention.

Important Files Changed

  • crates/codegraph-core/src/extractors/gleam.rs: New 446-line Rust extractor mirroring the JS extractGleamSymbols; handles functions, types, constants, imports, and calls with correct named-child fallbacks throughout.
  • crates/codegraph-core/src/parser_registry.rs: Adds LanguageKind::Gleam with correct extension, language string, tree-sitter language binding, and updates the EXPECTED_LEN sentinel.
  • crates/codegraph-core/src/file_collector.rs: Adds 'gleam' to SUPPORTED_EXTENSIONS and updates the comment to remove .gleam from the WASM-only example list.
  • src/extractors/gleam.ts: Back-patches two child(0) to namedChild(0) fixes in handleCall and the record fallback to maintain parity with the Rust extractor.
  • crates/codegraph-core/src/extractors/helpers.rs: Adds GLEAM_AST_CONFIG matching the JS GLEAM_AST_TYPES and GLEAM_STRING_CONFIG: double-quoted strings only, no new/throw/await/regex types.
  • src/ast-analysis/rules/index.ts: Adds GLEAM_AST_TYPES and GLEAM_STRING_CONFIG, registered under the 'gleam' key in AST_TYPE_MAPS and AST_STRING_CONFIGS.
  • tests/parsers/native-drop-classification.test.ts: Updates drop-classification tests to move .gleam out of the WASM-only bucket and adds an assertion that NATIVE_SUPPORTED_EXTENSIONS now includes .gleam.
  • crates/codegraph-core/src/change_detection.rs: Comment and test fixture updated to remove .gleam from the WASM-only skipped-extensions list; no logic changes.

Reviews (11): Last reviewed commit: "fix: resolve merge conflicts with main"

Comment on lines +62 to +76
    };

    symbols.definitions.push(Definition {
        name: node_text(&name_node, source).to_string(),
        kind: "function".to_string(),
        line: start_line(node),
        end_line: Some(end_line(node)),
        decorators: None,
        complexity: None,
        cfg: None,
        children: None,
    });
}

fn handle_type_definition(node: &Node, source: &[u8], symbols: &mut FileSymbols) {
Contributor

P2 handle_external_function drops parameter children

handle_function extracts parameters and stores them as children, but handle_external_function hard-codes children: None. External Gleam functions still have a full parameter list in their signatures, so callers that rely on children to understand arity or parameter names will get nothing for external functions. This creates a silent asymmetry: two functions with identical signatures produce different output depending on whether they are external.
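One way to remove the asymmetry, sketched under assumptions: route regular and external functions through a single builder so identical parameter lists always yield identical children. Definition here is a trimmed, hypothetical mirror of the extractor's struct, and the "parameter" kind and the None-when-empty rule are assumptions, not confirmed behavior of either engine.

```rust
/// Trimmed, hypothetical mirror of the extractor's Definition record.
#[derive(Debug, Clone, PartialEq)]
struct Definition {
    name: String,
    kind: String,
    line: u32,
    children: Option<Vec<Definition>>,
}

/// Builds parameter children. Assumption: a parameterless function
/// keeps `children: None`, matching the shape in the snippet above.
fn param_children(params: &[&str], line: u32) -> Option<Vec<Definition>> {
    if params.is_empty() {
        return None;
    }
    Some(
        params
            .iter()
            .map(|p| Definition {
                name: p.to_string(),
                kind: "parameter".to_string(), // assumed kind name
                line,
                children: None,
            })
            .collect(),
    )
}

/// Single builder shared by regular and external functions, so the
/// `external` keyword can no longer change the extracted children.
fn function_definition(name: &str, params: &[&str], line: u32) -> Definition {
    Definition {
        name: name.to_string(),
        kind: "function".to_string(),
        line,
        children: param_children(params, line),
    }
}
```

Any such change would need to land in both engines at once, which is why the author deferred it to #1110.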

Contributor Author

Tracking as a follow-up in #1110. Both engines (native Rust and WASM/JS) currently drop the parameter list for external functions — the Rust port faithfully mirrors existing WASM/JS behavior to keep dual-engine parity, but the silent asymmetry between regular and external Gleam functions is real and worth fixing in both engines together. Deferred to keep this PR scoped to "port to native" rather than "port + change extraction semantics across engines".

Comment on lines +264 to +266
let record = func_node
    .child_by_field_name("record")
    .or_else(|| func_node.child(0));
Contributor

P2 child(0) fallback for record may pick up anonymous punctuation nodes

func_node.child(0) returns the first child regardless of whether it is named or anonymous. In the Gleam tree-sitter grammar a field_access node's children include the . punctuation token, so the fallback could capture . as the receiver text instead of the module identifier. Prefer func_node.named_child(0) to skip anonymous punctuation tokens.

Suggested change
- let record = func_node
-     .child_by_field_name("record")
-     .or_else(|| func_node.child(0));
+ let record = func_node
+     .child_by_field_name("record")
+     .or_else(|| func_node.named_child(0));
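To see why named_child(0) is the safer fallback, here is a toy model of the two lookups. SyntaxChild, child0, and named_child0 are hypothetical stand-ins for tree-sitter's Node::child(0) and Node::named_child(0); the child list assumes an input where an anonymous "." token comes first, as can happen when error recovery runs on malformed source.

```rust
/// Toy syntax child: tree-sitter distinguishes named nodes (identifiers,
/// expressions) from anonymous tokens such as "." or "(".
#[derive(Debug, PartialEq)]
struct SyntaxChild {
    text: &'static str,
    named: bool,
}

/// Models `Node::child(0)`: the first child, named or anonymous.
fn child0(children: &[SyntaxChild]) -> Option<&SyntaxChild> {
    children.first()
}

/// Models `Node::named_child(0)`: the first child that is a named node,
/// skipping punctuation tokens.
fn named_child0(children: &[SyntaxChild]) -> Option<&SyntaxChild> {
    children.iter().find(|c| c.named)
}
```

With the children `[".", "io"]`, child0 yields "." as the receiver text, while named_child0 yields "io".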

Contributor Author

Fixed in b971244. Replaced the func_node.child(0) fallback for the record field with func_node.named_child(0) to skip anonymous punctuation tokens. Applied the same fix to the JS extractor (src/extractors/gleam.ts) to keep dual-engine parity.

}

fn handle_call(node: &Node, source: &[u8], symbols: &mut FileSymbols) {
    let func_node = match node.child_by_field_name("function").or_else(|| node.child(0)) {
Contributor

P2 handle_call first-child fallback also uses unnamed child

Same concern at the top-level func_node selection: node.child(0) can return anonymous tokens. node.named_child(0) is consistent with how other extractors resolve this.

Suggested change
- let func_node = match node.child_by_field_name("function").or_else(|| node.child(0)) {
+ let func_node = match node.child_by_field_name("function").or_else(|| node.named_child(0)) {

Contributor Author

Fixed in b971244. Replaced node.child(0) with node.named_child(0) in handle_call to skip anonymous punctuation tokens. Same fix applied to the JS extractor to keep dual-engine parity.

@github-actions
Contributor

github-actions Bot commented May 11, 2026

Codegraph Impact Analysis

30 functions changed → 15 callers affected across 3 files

  • detect_removed_skips_unsupported_extensions in crates/codegraph-core/src/change_detection.rs:776 (0 transitive callers)
  • GleamExtractor.extract in crates/codegraph-core/src/extractors/gleam.rs:11 (0 transitive callers)
  • match_gleam_node in crates/codegraph-core/src/extractors/gleam.rs:19 (0 transitive callers)
  • handle_function in crates/codegraph-core/src/extractors/gleam.rs:32 (1 transitive callers)
  • handle_external_function in crates/codegraph-core/src/extractors/gleam.rs:55 (1 transitive callers)
  • handle_type_definition in crates/codegraph-core/src/extractors/gleam.rs:76 (1 transitive callers)
  • handle_type_alias in crates/codegraph-core/src/extractors/gleam.rs:143 (1 transitive callers)
  • handle_constant in crates/codegraph-core/src/extractors/gleam.rs:164 (1 transitive callers)
  • handle_import in crates/codegraph-core/src/extractors/gleam.rs:185 (1 transitive callers)
  • handle_call in crates/codegraph-core/src/extractors/gleam.rs:247 (1 transitive callers)
  • extract_params in crates/codegraph-core/src/extractors/gleam.rs:295 (2 transitive callers)
  • parse_gleam in crates/codegraph-core/src/extractors/gleam.rs:341 (10 transitive callers)
  • extracts_public_function in crates/codegraph-core/src/extractors/gleam.rs:351 (0 transitive callers)
  • extracts_private_function in crates/codegraph-core/src/extractors/gleam.rs:360 (0 transitive callers)
  • extracts_qualified_call_as_receiver_name in crates/codegraph-core/src/extractors/gleam.rs:366 (0 transitive callers)
  • extracts_same_file_call in crates/codegraph-core/src/extractors/gleam.rs:378 (0 transitive callers)
  • extracts_import_module in crates/codegraph-core/src/extractors/gleam.rs:390 (0 transitive callers)
  • extracts_unqualified_imports in crates/codegraph-core/src/extractors/gleam.rs:398 (0 transitive callers)
  • extracts_aliased_import in crates/codegraph-core/src/extractors/gleam.rs:407 (0 transitive callers)
  • extracts_type_definition_with_constructors in crates/codegraph-core/src/extractors/gleam.rs:415 (0 transitive callers)

Replaces child(0) fallbacks in handleCall / handle_call with
named_child(0) in both the native Rust and WASM/JS Gleam extractors.

The Gleam tree-sitter grammar's field_access node includes the '.'
punctuation token as a child, so child(0) on field_access could
return '.' as the receiver text on malformed input. named_child(0)
skips anonymous tokens and is consistent across both engines.

The field accessors always succeed on valid Gleam, so this only
affects the defensive fallback path, but it removes a silent
asymmetry that Greptile flagged in review of #1105.
@carlos-alm
Contributor Author

@greptileai

"import" => handle_import(node, source, symbols),
"function_call" => handle_call(node, source, symbols),
_ => {}
}
Contributor

P1 Missing call node type in match_gleam_node

The JS extractor's walkGleamNode dispatches on both 'function_call' and 'call' node types, but the Rust match_gleam_node only matches "function_call". Any Gleam file parsed with a grammar version that emits call nodes (or if the grammar ever uses both interchangeably) will have its call sites silently skipped by the native extractor while the WASM engine extracts them correctly — breaking the dual-engine parity that this PR aims to guarantee.
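A minimal sketch of the dispatch fix, with a parity check against the node kinds the JS walker handles. Handler, dispatch, and JS_CALL_KINDS are hypothetical; the real match_gleam_node calls handler functions directly instead of returning a tag.

```rust
/// Hypothetical tags standing in for the real handle_* functions.
#[derive(Debug, PartialEq)]
enum Handler {
    Import,
    Call,
    Skip,
}

/// Hypothetical copy of the call node kinds the JS walkGleamNode dispatches on.
const JS_CALL_KINDS: [&str; 2] = ["function_call", "call"];

fn dispatch(kind: &str) -> Handler {
    match kind {
        "import" => Handler::Import,
        // Or-pattern: accept both spellings so either grammar variant is handled.
        "function_call" | "call" => Handler::Call,
        _ => Handler::Skip,
    }
}
```

A unit test that iterates JS_CALL_KINDS and asserts each one reaches the call handler turns this kind of drift into a test failure rather than silently skipped call sites.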

Suggested change
- "function_call" => handle_call(node, source, symbols),
+ "function_call" | "call" => handle_call(node, source, symbols),

Contributor Author

Fixed in a726c3a. Updated match_gleam_node to match both function_call and call node types, matching the JS walkGleamNode dispatch in src/extractors/gleam.ts. The 9 Gleam unit tests still pass.

@carlos-alm
Contributor Author

@greptileai

Merge origin/main (Solidity extractor) into the Gleam branch.

- Added "sol" alongside "gleam" in SUPPORTED_EXTENSIONS
- Kept both Gleam and Solidity variants in LanguageKind::all()
- Bumped EXPECTED_LEN from 28 to 29 to account for both new languages
@carlos-alm
Contributor Author

Addressed Greptile's latest review:

  1. Alias fallback divergence (cc8ae99) — Aligned the Rust handle_import alias logic with the JS extractor:

    • Use find_child(node, "identifier") as fallback after the alias field
    • Compare by node ID (a.id() != module_node.id()) instead of string equality to handle self-aliases correctly
  2. Missing aliased import test (cc8ae99): Added an extracts_aliased_import unit test covering import gleam/io as my_io → names == ["my_io"].
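The node-ID comparison can be modeled without tree-sitter. In this hypothetical sketch, ImportNode.id plays the role of Node::id() and import_alias returns the alias that ends up in names. It assumes that a self-alias such as `import io as io` should still record "io"; in this model, string equality silently discards it because the alias text matches the module text.

```rust
/// Hypothetical stand-in for a tree-sitter node: `id` is unique per node
/// (like `Node::id()`), while `text` can repeat across different nodes.
#[derive(Debug, Clone, Copy)]
struct ImportNode<'a> {
    id: usize,
    text: &'a str,
}

/// ID comparison: `alias_field` models `child_by_field_name("alias")`,
/// `first_identifier` models the `find_child(node, "identifier")` fallback.
/// The alias is rejected only when the fallback found the module node itself.
fn import_alias<'a>(
    module: ImportNode<'a>,
    alias_field: Option<ImportNode<'a>>,
    first_identifier: Option<ImportNode<'a>>,
) -> Option<&'a str> {
    alias_field
        .or(first_identifier)
        .filter(|a| a.id != module.id)
        .map(|a| a.text)
}

/// String comparison, for contrast: a distinct alias node whose text
/// equals the module text is wrongly treated as "no alias".
fn import_alias_by_text<'a>(
    module: ImportNode<'a>,
    alias_field: Option<ImportNode<'a>>,
    first_identifier: Option<ImportNode<'a>>,
) -> Option<&'a str> {
    alias_field
        .or(first_identifier)
        .filter(|a| a.text != module.text)
        .map(|a| a.text)
}
```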

Also resolved merge conflicts with main (a73376b). Erlang was merged in #1103, so the test fixtures referencing .erl as WASM-only had to be updated to .fsx, and Gleam and Erlang are now listed alongside each other in extractors/mod.rs, parser_registry.rs, file_collector.rs, and Cargo.lock.

@greptileai

@carlos-alm
Contributor Author

Addressed Greptile's remaining suggestion in ae689c8: added an explicit expect(NATIVE_SUPPORTED_EXTENSIONS.has('.gleam')).toBe(true) assertion in tests/parsers/native-drop-classification.test.ts, so a regression that removes .gleam from the set is caught directly rather than only transitively via the drift guard.

@carlos-alm
Contributor Author

@greptileai

@carlos-alm
Contributor Author

Resolved merge conflicts with main (5989531). Main has gained Groovy and R native extractors since this branch was last synced; merged the registry/file-collector/test changes so all three new extractors (Gleam, Groovy, R) coexist:

  • crates/codegraph-core/src/parser_registry.rs: LanguageKind::all() now includes Gleam alongside Groovy and R (EXPECTED_LEN = 32).
  • crates/codegraph-core/src/file_collector.rs: SUPPORTED_EXTENSIONS lists gleam, groovy, gvy, r, R together; the docstring on is_supported_extension no longer cites .gleam as WASM-only.
  • tests/parsers/native-drop-classification.test.ts: the "WASM-only languages" sample is reduced to .fs, .fsx, .v, .m since Gleam/Groovy/R are now natively supported.
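The thread does not show how the EXPECTED_LEN sentinel is implemented. One common shape for such a guard, sketched here with a hypothetical three-variant Lang enum, is a fixed-size array whose length is the constant, so the list and the count cannot drift apart without a compile error, plus an exhaustive match that forces every new variant to be handled.

```rust
/// Hypothetical three-variant enum; the real LanguageKind has many more.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Lang {
    Gleam,
    Groovy,
    R,
}

/// Bumped by hand whenever a language is added (32 in the real registry).
const EXPECTED_LEN: usize = 3;

impl Lang {
    /// The array type fixes the length: adding an entry without bumping
    /// EXPECTED_LEN (or bumping it without adding one) fails to compile.
    const fn all() -> [Lang; EXPECTED_LEN] {
        [Lang::Gleam, Lang::Groovy, Lang::R]
    }

    /// No wildcard arm: a new variant that is not handled here is a
    /// compile error rather than a silently unsupported language.
    fn extension(self) -> &'static str {
        match self {
            Lang::Gleam => "gleam",
            Lang::Groovy => "groovy",
            Lang::R => "r",
        }
    }
}
```

This is why each merge from main that brings in another language (Solidity, Groovy, R) also has to touch the constant.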

CI is green across all 27 checks (one transient macOS rustup-init flake on the first attempt; the rerun succeeded).

@carlos-alm carlos-alm merged commit 06b6536 into main May 14, 2026
47 of 51 checks passed
@carlos-alm carlos-alm deleted the feat/1071-gleam-rust-extractor branch May 14, 2026 07:03
@github-actions github-actions Bot locked and limited conversation to collaborators May 14, 2026

Development

Successfully merging this pull request may close these issues.

Rust engine parity: port the 11 remaining JS-only language extractors