feat(core): Go callee extraction + Rust language support #32

Merged
prosdev merged 11 commits into main from feat/core-phase5-go-rust-support
Apr 2, 2026
@prosdev prosdev commented Apr 1, 2026

Summary

  • Rust scanner: Full extraction — functions, structs, enums, traits, impl methods, imports, callees, doc comments (~530 lines)
  • Rust patterns: try operator, match expression, unsafe block, impl/trait definitions
  • Go callees: walkCallNodes for functions and methods — dev_refs now traces Go call chains
  • Go patterns: error handling (if err != nil), goroutines, defer, channels
  • Pipeline wiring: .go/.rs in EXTENSION_TO_LANGUAGE, rust in supportedLanguages, copy-wasm, test-utils
  • Fix: .go was missing from EXTENSION_TO_LANGUAGE — Go patterns never fired
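
The `.go` fix in the last bullet comes down to a missing map entry. Here is a minimal sketch of the lookup, assuming a plain extension-to-language map. The `Language` type, the entries other than `.go` and `.rs`, and the `resolveLanguage` signature are illustrative guesses, not the project's actual code:

```typescript
// Hypothetical language identifiers; the real set lives in the project.
type Language = "typescript" | "go" | "rust";

// Assumed shape of EXTENSION_TO_LANGUAGE. Before this PR the ".go" entry
// was absent, so lookups for Go files returned undefined and no Go
// pattern rules ran.
const EXTENSION_TO_LANGUAGE = new Map<string, Language>([
  [".ts", "typescript"],
  [".go", "go"],
  [".rs", "rust"],
]);

// Returns the language for a file extension, or undefined if unsupported.
function resolveLanguage(extension: string): Language | undefined {
  return EXTENSION_TO_LANGUAGE.get(extension.toLowerCase());
}
```

With both entries present, `resolveLanguage('.go')` and `resolveLanguage('.rs')` resolve to a language, which is what the two new pattern matcher tests check.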

Rust scanner highlights

  • impl_item.type field for method naming (works for both impl Type and impl Trait for Type)
  • Functions inside mod blocks captured (query matches at any depth, parent-chain filter excludes only impl methods)
  • Generic type param stripping via split('<')[0]: handles Container<T>, HashMap<String, Vec<u8>>, Wrapper<Option<T>>
  • Macros intentionally excluded from callees (explicit macro_invocation early return in walkCallNodes)
  • Doc comments via /// prefix, attributes (#[derive]) skipped during backward walk
  • Malformed file resilience: returns empty, no crash
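
The generic-stripping bullet can be illustrated with a short sketch. The helper names `stripGenerics` and `methodName` are hypothetical; only the `split('<')[0]` technique and the `Container.show` naming come from this PR:

```typescript
// Keep everything before the first '<'. This drops the whole generic
// parameter list, however deeply it is nested.
function stripGenerics(typeText: string): string {
  const [base = ""] = typeText.split("<");
  return base.trim();
}

// Hypothetical helper: build a "Type.method" name from the text of the
// impl block's type field and the method identifier.
function methodName(implTypeText: string, method: string): string {
  return `${stripGenerics(implTypeText)}.${method}`;
}

const examples = ["Container<T>", "HashMap<String, Vec<u8>>", "Wrapper<Option<T>>"];
for (const example of examples) {
  console.log(`${example} -> ${stripGenerics(example)}`);
}
```

Because the split happens at the first `<`, nested parameter lists need no bracket matching at all.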

Code review history

  • Review 1 (general): Misleading macro comment → fixed (actual skip added). Orphaned JSDoc → fixed. Attribute comment → added.
  • Review 2 (Rust expert): mod block support missing (CRITICAL) → fixed (removed source_file anchoring, added parent-chain filter). Greedy generic stripping → fixed (split('<')[0]). Both fixes have dedicated tests.
  • Review 3 (final pass): APPROVED. All findings verified fixed, no regressions.

Test plan

Automated (1758 tests, all passing)

  • 14 Step 0 grammar validation tests (permanent reference for node names)
  • 25 Rust scanner tests: functions, structs, enums, traits, methods, callees, generics, closures, macros, doc comments, async, mod blocks, nested generics, malformed, generated
  • 6 Go callee tests: functions, methods, full selector names, line numbers, dedup, no callees on structs
  • 2 new pattern matcher tests: resolveLanguage('.go') and resolveLanguage('.rs')
  • Full suite: 1758 passed, 39 skipped
  • Lint + typecheck clean

Manual verification — Rust (BurntSushi/ripgrep)

Cloned --depth 1, ran local build against it.

| Command | Result |
| --- | --- |
| dev index | 119 files, 3,226 components indexed in 8.5s. No crash. |
| dev map --depth 2 | Crate structure captured correctly: crates/searcher/, crates/printer/, crates/core/, etc. |
| dev refs "Searcher" | Found Searcher.new at crates/searcher/src/searcher/mod.rs:632. Callees: SearcherBuilder::new().build, SearcherBuilder::new. |
| dev search "grep pattern matching" | 10 results including GlobStrategic.is_match, grep-regex README, Glob.compile_matcher. Semantic search works on Rust code. |

Note: Hot paths show 0 refs because tree-sitter callees don't resolve target files (no file field). This is a known limitation — the dependency graph only has edges when callees include file paths. Cross-file resolution for tree-sitter languages is tracked as future work.
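
To make the note above concrete, here is a rough sketch of why callees without a file path yield no graph edges. The `Callee`, `Component`, and `Edge` shapes and the `buildFileEdges` function are assumptions for illustration. The only detail taken from this PR is that an edge exists only when the callee carries a file path:

```typescript
// Assumed shapes; the project's real types may differ.
interface Callee {
  name: string;
  line: number;
  file?: string; // tree-sitter scanners currently leave this unset
}

interface Component {
  name: string;
  file: string;
  callees: Callee[];
}

interface Edge {
  from: string;
  to: string;
}

// Emit one file-to-file edge per callee that has a resolved target file.
// Callees without a file are skipped, so they add nothing to the graph.
function buildFileEdges(components: readonly Component[]): Edge[] {
  const edges: Edge[] = [];
  for (const component of components) {
    for (const callee of component.callees) {
      if (callee.file === undefined) continue;
      edges.push({ from: component.file, to: callee.file });
    }
  }
  return edges;
}
```

In this model a Rust or Go component can list many callees and still have zero outgoing edges, which matches the 0 refs seen in hot paths.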

Manual verification — Go (cli/cli)

Cloned --depth 1, ran local build against it.

| Command | Result |
| --- | --- |
| Scan | 830 files, 5,933 components scanned successfully in 3.2s. |
| dev index | Failed: Antfly Linear Merge choked on 5,933 docs: decoding request: json: string unexpected end of JSON input. Scanner works, Antfly batch size is the bottleneck. |
| NewCmdRoot callees | 126 callees extracted including f.Config, fmt.Errorf, heredoc.Doc, versionCmd.Format, cmdutil.IsAuthCheckEnabled. Full selector text preserved. |
| Scale test (359 Go files) | 2,488 docs, 1,541 with callees, 16,520 total callee references. Go callee extraction works at scale. |

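Note: Antfly Linear Merge fails at ~6k docs. Tracked in scratchpad. Chunking the merge was tried in this PR and reverted, because the server treats each chunk as the full dataset and deletes the previous chunk's records. Remaining fix options: raise the Antfly JSON body limit or support streaming. This blocks full indexing of medium-large repos but does not affect the scanner itself.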

Known limitations (documented in scratchpad)

  1. Antfly batch size limit (~6k docs) blocks indexing large repos
  2. Rust/Go callees don't resolve target files (no cross-file edges in dependency graph)
  3. //! inner doc comments and /** */ block doc comments not extracted (v1)
  4. Trait default method bodies not extracted
  5. #[cfg(test)] inline test modules not detected as test files

🤖 Generated with Claude Code

prosdev and others added 11 commits April 1, 2026 16:33
Step 0 of Phase 5: validate tree-sitter-rust grammar node names before
building the scanner. All 14 tests confirm expected node types:
function_item, struct_item, enum_item, trait_item, impl_item (with type
field for both inherent and trait impls), use_declaration, call_expression
(for both bare and method calls), macro_invocation, visibility_modifier,
line_comment for doc comments, and generic impl blocks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RustScanner extracts functions, structs, enums, traits, impl methods,
imports, callees, and doc comments from Rust source files.

- impl_item.type field for method naming (works for both impl Type and
  impl Trait for Type)
- Generic type param stripping: Container<T>.show → Container.show
- Callee extraction via recursive AST walk (call_expression only,
  macro_invocation intentionally excluded)
- Doc comments via /// prefix (line_comment nodes)
- Visibility: pub, pub(crate), pub(super) → exported: true
- Async detection via text inspection before fn keyword
- Generated file skipping: target/ directory
- Malformed file resilience: returns empty, no crash

37 tests: 14 grammar validation + 23 scanner tests covering both
fixtures (simple + complex), impl patterns, generics, closures,
macros, malformed files, and generated file detection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
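
The doc comment handling in the commit above (`///` lines, with attributes skipped during the backward walk) could look roughly like this. The `Sibling` shape and the `extractDocComment` name are hypothetical, and the sketch works on a plain array of preceding siblings rather than real tree-sitter nodes:

```typescript
// Minimal stand-in for a syntax node: its grammar type and source text.
interface Sibling {
  type: string;
  text: string;
}

// Walk backward from the item over its preceding siblings. Attributes such
// as #[derive(...)] are skipped, "///" lines are collected, and anything
// else ends the walk.
function extractDocComment(preceding: readonly Sibling[]): string {
  const lines: string[] = [];
  for (let i = preceding.length - 1; i >= 0; i--) {
    const sibling = preceding[i];
    if (sibling === undefined) break;
    if (sibling.type === "attribute_item") continue;
    const isDocLine =
      sibling.type === "line_comment" &&
      sibling.text.startsWith("///") &&
      !sibling.text.startsWith("////"); // four slashes is an ordinary comment
    if (!isDocLine) break;
    lines.unshift(sibling.text.slice(3).trim());
  }
  return lines.join("\n");
}
```

Inner `//!` comments fail the `///` prefix check here, consistent with the v1 limitation listed in the PR description.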
- Go pattern rules: if err != nil, defer, goroutine, channel send
- Rust pattern rules: try operator, match expression, unsafe block,
  impl block, trait definition
- Add .go and .rs to EXTENSION_TO_LANGUAGE (fixes: Go patterns never fired)
- Add 'rust' to supportedLanguages in wasm-matcher
- Add go/rust to QUERIES_BY_LANGUAGE map in pattern-analysis-service
- Add 'rust' to SUPPORTED_LANGUAGES in copy-wasm.js
- Add Rust test file detection (tests/ dir, _test.rs) to test-utils
- Fix tests that used .rs as unsupported extension example

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
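
The Rust test file detection named in the commit above (`tests/` dir, `_test.rs`) might be sketched as a path check. `isRustTestFile` is a hypothetical name and the exact matching rules are assumptions. Only the two conventions themselves come from the commit message:

```typescript
// Path-based check only: a .rs file counts as a test file when it sits under
// a tests/ directory or its name ends in _test.rs.
function isRustTestFile(filePath: string): boolean {
  const normalized = filePath.replace(/\\/g, "/"); // tolerate Windows separators
  if (!normalized.endsWith(".rs")) return false;
  const inTestsDir = normalized.startsWith("tests/") || normalized.includes("/tests/");
  return inTestsDir || normalized.endsWith("_test.rs");
}
```

A check like this never looks at file contents, so inline `#[cfg(test)]` modules go undetected, as the known limitations note.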
Add walkCallNodes to GoScanner for function and method callee extraction.
Uses full selector text ("fmt.Println" not "Println") matching TS scanner.
6 new tests: callee extraction, full selector names, methods, line numbers,
deduplication, no callees on structs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
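
As a rough illustration of the Go callee behaviour in the commit above, the sketch below keeps the full selector text and deduplicates repeated calls. `GoCallSite` and `collectCallees` are hypothetical names. Deduplicating by name and keeping the first line number is also an assumption, since the commit only says that callees are deduplicated:

```typescript
// Stand-in for a Go call_expression: the full source text of the function
// being called (for example "fmt.Println") and the line of the call.
interface GoCallSite {
  functionText: string;
  line: number;
}

interface Callee {
  name: string;
  line: number;
}

// Keep the whole selector text as the callee name and record each distinct
// name once, at the line where it is first called.
function collectCallees(calls: readonly GoCallSite[]): Callee[] {
  const seen = new Set<string>();
  const callees: Callee[] = [];
  for (const call of calls) {
    if (seen.has(call.functionText)) continue;
    seen.add(call.functionText);
    callees.push({ name: call.functionText, line: call.line });
  }
  return callees;
}
```

Keeping the package qualifier distinguishes `fmt.Errorf` from an `Errorf` in another package, which a bare field name would not.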
Add Rust to CLAUDE.md scanner description, website Multi-Language
feature list, release notes (v0.12.0), and changeset.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Actually skip macro_invocation in Rust walkCallNodes (prevents
  capturing calls inside macros like vec![foo()])
- Fix orphaned JSDoc comment in go.ts (was between walkCallNodes and
  isExported after insertion)
- Add comment explaining attribute skip in doc comment extraction

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
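
The macro skip described in the commit above is an early return in the recursive walk. The following is a simplified sketch, not the scanner's actual code. `RustNode` is a hypothetical stand-in for a tree-sitter node, and `functionText` stands in for the text of the call's function field:

```typescript
// Simplified node: grammar type, line number, children, and (for calls)
// the text of the function being called.
interface RustNode {
  type: string;
  line: number;
  children: RustNode[];
  functionText?: string;
}

interface Callee {
  name: string;
  line: number;
}

// Depth-first walk that records call_expression nodes. Returning early on
// macro_invocation means nothing nested under a macro is ever visited.
function walkCallNodes(node: RustNode, out: Callee[]): void {
  if (node.type === "macro_invocation") return;
  if (node.type === "call_expression" && node.functionText !== undefined) {
    out.push({ name: node.functionText, line: node.line });
  }
  for (const child of node.children) {
    walkCallNodes(child, out);
  }
}
```

The early return sits before the call check, so a macro subtree is dropped whatever it contains.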
W1: Remove source_file anchoring from functions query so functions
    inside mod blocks are captured. Filter impl methods by checking
    parent chain (declaration_list > impl_item), not just declaration_list.

W3: Fix greedy generic stripping — use split('<')[0] instead of
    regex replace. Handles nested generics like Wrapper<Option<T>>.

2 new tests: functions inside mod blocks (pub + private), nested
generic type param stripping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
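
The W1 parent-chain filter from the commit above can be sketched as follows. `NodeLike`, `isImplMethod`, and `makeNode` are hypothetical; the node type names are the tree-sitter-rust names listed in this PR:

```typescript
// Minimal stand-in for a syntax node with a parent pointer.
interface NodeLike {
  type: string;
  parent: NodeLike | null;
}

// A function_item is an impl method only when its parent is a
// declaration_list whose own parent is an impl_item. Checking for
// declaration_list alone would also exclude functions inside mod blocks.
function isImplMethod(fn: NodeLike): boolean {
  const list = fn.parent;
  return (
    list !== null &&
    list.type === "declaration_list" &&
    list.parent !== null &&
    list.parent.type === "impl_item"
  );
}

// Small helper for building example trees.
function makeNode(type: string, parent: NodeLike | null): NodeLike {
  return { type, parent };
}
```

A function inside `mod inner { ... }` has the chain `function_item > declaration_list > mod_item`, so it passes the filter and is captured as a plain function.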
Track two limitations found during manual verification:
- Antfly Linear Merge fails at ~6k docs (blocks large repo indexing)
- Rust/Go callees don't resolve target files (no cross-file graph edges)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Antfly's merge endpoint fails on large JSON payloads (~6k+ docs).
Split documents into chunks of 3,000 before sending.

- Extract chunk() utility as a pure function in utils/chunking.ts
- AntflyVectorStore.linearMerge splits sorted docs into chunks,
  runs linearMergeChunk per batch, accumulates results
- Progress callbacks report across all chunks
- 10 new tests for chunk() (even/uneven splits, edge cases, large arrays)
- Update scratchpad with Antfly batch limit and callee file resolution

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
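
A pure `chunk()` utility like the one this commit extracts might look like the sketch below. The signature and the error behaviour for invalid sizes are assumptions, not the contents of utils/chunking.ts:

```typescript
// Split an array into consecutive slices of at most `size` items.
// The last slice may be shorter. The input array is not modified.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("chunk size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    chunks.push(items.slice(start, start + size));
  }
  return chunks;
}
```

At a chunk size of 3,000, the 5,933 docs from the cli/cli run would split into two chunks of 3,000 and 2,933.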
Chunking Linear Merge causes each chunk to delete the previous chunk's
records (server thinks each subset is the full dataset). Reverted to
single-call approach.

The Antfly payload size limit (~6k docs) is an Antfly-side issue that
needs a fix in the server (raise JSON body limit or support streaming).
Tracked in scratchpad. chunk() utility kept — useful elsewhere.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
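
The failure described in the revert above can be reproduced with a toy in-memory model. This is not Antfly's API. `linearMerge` here is a hypothetical function that only mimics the behaviour the commit message describes, where each call is treated as the complete dataset:

```typescript
interface Doc {
  id: string;
  body: string;
}

// Toy model of a full-dataset merge: after the call, the store contains
// exactly the documents in `docs`. Anything not in the request is deleted.
function linearMerge(store: Map<string, Doc>, docs: readonly Doc[]): void {
  const incoming = new Set(docs.map((doc) => doc.id));
  for (const id of Array.from(store.keys())) {
    if (!incoming.has(id)) store.delete(id);
  }
  for (const doc of docs) {
    store.set(doc.id, doc);
  }
}

const all: Doc[] = ["a", "b", "c", "d"].map((id) => ({ id, body: `doc ${id}` }));

// Sending the dataset in two chunks: the second call removes the first chunk.
const chunked = new Map<string, Doc>();
linearMerge(chunked, all.slice(0, 2));
linearMerge(chunked, all.slice(2));

// Sending everything in one call keeps all four documents.
const single = new Map<string, Doc>();
linearMerge(single, all);
```

In this model `chunked` ends up holding only `c` and `d`, while `single` holds all four documents, which is why the chunked approach was reverted rather than tuned.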
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@prosdev prosdev merged commit 1fdac2f into main Apr 2, 2026
1 check passed