The java_analyzer module is a language-specific analyzer component within the dependency analysis system. It uses the Tree-Sitter Java parser to analyze Java source files and extract Java components (classes, interfaces, enums, methods, etc.) along with their relationships (inheritance, implementation, field dependencies, method calls, and object creation patterns).
This module is part of the language_analyzers group, which provides specialized parsing for different programming languages. The java_analyzer specifically handles Java syntax parsing and semantic relationship detection.
The java_analyzer module serves three primary functions:
- Component Extraction: Identifies and extracts all Java top-level components (classes, interfaces, enums, records, annotations) and methods
- Relationship Detection: Identifies dependencies and interactions between components through inheritance, implementation, field types, method invocations, and object instantiation
- Semantic Analysis: Analyzes Java code structure to understand class hierarchies, method dependencies, and type relationships
The java_analyzer operates within a multi-layered architecture:
System Layers:
- Dependency Analysis System: Core services including AnalysisService, CallGraphAnalyzer, and RepoAnalyzer
- Language Analyzers: Nine language-specific analyzers (Python, JavaScript, TypeScript, Java, Kotlin, C#, C++, C, PHP)
- Parser Backend: Tree-Sitter Java language library
- Data Models: Node and CallRelationship representations
Key Relationships:
- RepoAnalyzer dispatches file analysis to appropriate language analyzers
- TreeSitterJavaAnalyzer uses Tree-Sitter Java for parsing
- Generated Nodes and CallRelationships are consumed by dependency graph builders
The TreeSitterJavaAnalyzer class has the following structure:
Main Components:
- `__init__`: Initializer accepting file path, content, and repository path
- `_analyze()`: Main analysis entry point
- `_extract_nodes()`: Identifies Java components (classes, interfaces, methods, etc.)
- `_extract_relationships()`: Detects dependencies between components
- Helper Methods: Type resolution, path utilities, and context tracking
Extraction Categories:
- Node Extraction: `class_declaration`, `interface_declaration`, `enum_declaration`, `record_declaration`, `annotation_type_declaration`, `method_declaration`
- Relationship Detection: Inheritance (extends), Implementation (implements), Field Type Use, Method Calls, Object Creation
The TreeSitterJavaAnalyzer class is the main analyzer responsible for parsing Java files and extracting structural information.
class TreeSitterJavaAnalyzer:
def __init__(self, file_path: str, content: str, repo_path: str = None)
# Main Analysis Methods
def _analyze() -> None
# Node Extraction
def _extract_nodes(node, top_level_nodes, lines) -> None
# Relationship Extraction
def _extract_relationships(node, top_level_nodes) -> None
# Utility Methods
def _get_module_path() -> str
def _get_relative_path() -> str
def _get_component_id(name: str, parent_class: str = None) -> str
def _find_containing_class(node, top_level_nodes) -> Optional[str]
def _find_containing_method(node) -> Optional[str]
def _find_variable_type(node, variable_name, top_level_nodes) -> Optional[str]
def _search_variable_declaration(block_node, variable_name) -> Optional[str]
def _is_primitive_type(type_name: str) -> bool
def _get_identifier_name(node) -> Optional[str]
def _get_type_name(node) -> Optional[str]

| Property | Type | Purpose |
|---|---|---|
| `file_path` | `Path` | Absolute path to the Java file being analyzed |
| `content` | `str` | Source code content of the Java file |
| `repo_path` | `str` | Repository root path for relative path calculation |
| `nodes` | `List[Node]` | Extracted Java components (classes, methods, etc.) |
| `call_relationships` | `List[CallRelationship]` | Detected relationships between components |
The analysis process follows these sequential phases:
- Initialization: Receive file path and content
- Parsing: Parse Java code using Tree-Sitter library
- AST Construction: Build Abstract Syntax Tree
- Node Extraction: Identify Java components (classes, interfaces, enums, records, annotations, methods)
- Relationship Detection: Identify five types of dependencies (inheritance, implementation, field types, method calls, object creation)
- Output Generation: Return accumulated nodes and relationships
The analyzer extracts the following Java language constructs:
┌─ class_declaration
│ └─ abstract class support
├─ interface_declaration
├─ enum_declaration
├─ record_declaration
├─ annotation_type_declaration
└─ method_declaration
└─ Nested within classes/interfaces
| Component Type | Tree-Sitter Type | Description |
|---|---|---|
| class | `class_declaration` | Standard Java classes |
| abstract class | `class_declaration` (with `abstract` modifier) | Abstract class definitions |
| interface | `interface_declaration` | Interface definitions |
| enum | `enum_declaration` | Enum type definitions |
| record | `record_declaration` | Record type definitions (preview in Java 14, standard since Java 16) |
| annotation | `annotation_type_declaration` | Custom annotation definitions |
| method | `method_declaration` | Method definitions |
The analyzer identifies five categories of dependencies:
Detected Pattern: class ChildClass extends ParentClass
Detection Logic:
- Searches for the `superclass` node in `class_declaration`
- Extracts parent class name from `type_identifier`
- Excludes primitive types
- Creates `CallRelationship` with caller as child, callee as parent
Detected Pattern: class MyClass implements Interface1, Interface2
Detection Logic:
- Looks for `super_interfaces` in class/enum/record declarations
- Iterates through `type_list` to find all implemented interfaces
- Filters out primitive types
- Creates relationship for each interface
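The inheritance and implementation patterns above can be sketched on a simplified tree. This is a minimal illustration, not the analyzer's code: `FakeNode` is a hypothetical stand-in for a Tree-Sitter node, the helper name `supertypes` is invented here, and relationships are shown as plain `(caller, callee)` tuples rather than `CallRelationship` objects.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class FakeNode:
    """Hypothetical stand-in for a Tree-Sitter node: a type, its text, its children."""
    type: str
    text: str = ""
    children: List["FakeNode"] = field(default_factory=list)

def supertypes(declaration: FakeNode) -> Tuple[List[str], List[str]]:
    """Return (extended classes, implemented interfaces) of one declaration."""
    parents: List[str] = []
    interfaces: List[str] = []
    for child in declaration.children:
        if child.type == "superclass":
            parents += [c.text for c in child.children if c.type == "type_identifier"]
        elif child.type == "super_interfaces":
            for type_list in child.children:
                if type_list.type == "type_list":
                    interfaces += [t.text for t in type_list.children
                                   if t.type == "type_identifier"]
    return parents, interfaces

# class MyClass extends Base implements Interface1, Interface2
my_class = FakeNode("class_declaration", children=[
    FakeNode("identifier", "MyClass"),
    FakeNode("superclass", children=[FakeNode("type_identifier", "Base")]),
    FakeNode("super_interfaces", children=[
        FakeNode("type_list", children=[
            FakeNode("type_identifier", "Interface1"),
            FakeNode("type_identifier", "Interface2"),
        ]),
    ]),
])

parents, interfaces = supertypes(my_class)
# One (caller, callee) pair per supertype, with the declaring class as caller.
relationships = [("MyClass", target) for target in parents + interfaces]
print(relationships)
```

Primitive filtering is omitted here for brevity; in the flow described above it would run before each pair is recorded.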
Detected Pattern: private MyType fieldName;
Detection Logic:
- Scans `field_declaration` nodes
- Identifies containing class for each field
- Extracts field type using `type_identifier` or `generic_type`
- Creates dependency from class to field type
Detected Pattern: objectName.methodName()
Resolution Steps:
- Extract Method Invocation node from AST
- Extract Object ID (object being called)
- Extract Method ID (method being invoked)
- Lookup Variable Type (resolve object type from context)
- Create Relationship (if type resolved successfully)
Detection Logic:
- Identifies `method_invocation` nodes
- Extracts object name and method name from invocation structure
- Attempts to resolve object type:
  - First checks if object is a known top-level class
  - Falls back to local variable type lookup
- Creates relationship if target type is identified
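The two-step receiver resolution described above can be sketched with plain collections. The function name `resolve_receiver_type` and the sample data are assumptions made for illustration; the analyzer itself works on AST nodes rather than pre-built dictionaries.

```python
from typing import Dict, List, Optional, Set, Tuple

def resolve_receiver_type(
    object_name: str,
    known_classes: Set[str],
    variable_types: Dict[str, str],
) -> Optional[str]:
    """Resolve the type of the object a method is invoked on."""
    # 1. The receiver may itself be a known top-level class (static-style call).
    if object_name in known_classes:
        return object_name
    # 2. Otherwise fall back to the declared type of a variable or field.
    return variable_types.get(object_name)

known_classes = {"Service", "Repository", "Helper"}
variable_types = {"repository": "Repository"}  # e.g. `private Repository repository;`

# (calling method, receiver object, invoked method)
invocations = [
    ("Service.process", "repository", "findAll"),
    ("Service.process", "Helper", "format"),
    ("Service.process", "cache", "clear"),  # receiver type unknown
]

relationships: List[Tuple[str, str]] = []
for caller, receiver, _method in invocations:
    target = resolve_receiver_type(receiver, known_classes, variable_types)
    if target is not None:  # a relationship is recorded only when a type is found
        relationships.append((caller, target))

print(relationships)
```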
Detected Pattern: new MyType()
Detection Logic:
- Identifies `object_creation_expression` nodes
- Extracts created type from type node
- Links creating class to created type
- Supports generic types
Extracts type name from various type node structures:
- `type_identifier`: Direct type reference
- `generic_type`: Generic type with type parameters
- `superclass`: Parent class type
Resolves variable type through:
- Local variable declaration search in containing method
- Field declaration search in containing class
- Tree traversal for nested scopes
Recursively searches code blocks for variable declarations:
- `local_variable_declaration`: Local variable scope
- Supports nested block structures
- Returns first matching type
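A minimal sketch of such a recursive search, assuming a simplified tree in which each `local_variable_declaration` has a type child and a `variable_declarator` child holding the identifier. `FakeNode`, `child_of_type`, and `declaration` are hypothetical helpers introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class FakeNode:
    """Hypothetical stand-in for a Tree-Sitter node."""
    type: str
    text: str = ""
    children: List["FakeNode"] = field(default_factory=list)

def child_of_type(node: FakeNode, *types: str) -> Optional[FakeNode]:
    return next((c for c in node.children if c.type in types), None)

def search_variable_declaration(block: FakeNode, variable_name: str) -> Optional[str]:
    """Depth-first search that returns the declared type of the first match."""
    for child in block.children:
        if child.type == "local_variable_declaration":
            type_node = child_of_type(child, "type_identifier", "generic_type")
            declarator = child_of_type(child, "variable_declarator")
            name_node = child_of_type(declarator, "identifier") if declarator else None
            if type_node and name_node and name_node.text == variable_name:
                return type_node.text
        # Descend into nested blocks (if/for/while bodies and so on).
        found = search_variable_declaration(child, variable_name)
        if found:
            return found
    return None

def declaration(type_name: str, var_name: str) -> FakeNode:
    return FakeNode("local_variable_declaration", children=[
        FakeNode("type_identifier", type_name),
        FakeNode("variable_declarator", children=[FakeNode("identifier", var_name)]),
    ])

# { Repository repo; if (...) { Data item; } }
body = FakeNode("block", children=[
    declaration("Repository", "repo"),
    FakeNode("if_statement", children=[
        FakeNode("block", children=[declaration("Data", "item")]),
    ]),
])
print(search_variable_declaration(body, "item"))
```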
Converts file path to module path:
/src/main/java/com/example/MyClass.java → com/example/MyClass
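One way to express this conversion, assuming the source root is marked by `src/main/java/` and the `.java` suffix is simply dropped. The function name `to_module_path` and the marker constant are illustrative and not taken from the analyzer.

```python
from pathlib import PurePosixPath

# Assumed source-root marker; the analyzer may locate the root differently.
SOURCE_ROOT = "src/main/java/"

def to_module_path(file_path: str) -> str:
    """Drop everything up to the source root, then drop the .java suffix."""
    normalized = file_path.replace("\\", "/")
    _, marker, tail = normalized.partition(SOURCE_ROOT)
    # Without a source root, fall back to the path minus its leading slash.
    remainder = tail if marker else normalized.lstrip("/")
    return str(PurePosixPath(remainder).with_suffix(""))

print(to_module_path("/src/main/java/com/example/MyClass.java"))
```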
Computes relative path from repository root for consistent component IDs
Generates unique component identifiers:
Format: {relative_path}::{component_name}
Example: src/main/java/Example.java::MyClass
src/main/java/Example.java::MyClass.myMethod
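The ID format above can be reproduced with a small sketch. The function names and the `/repo` root are assumptions for illustration; only the `{relative_path}::{component_name}` format comes from the description above.

```python
import os
from typing import Optional

def relative_path(file_path: str, repo_path: Optional[str]) -> str:
    """Path relative to the repository root, using forward slashes."""
    if not repo_path:
        return file_path.replace(os.sep, "/")
    return os.path.relpath(file_path, repo_path).replace(os.sep, "/")

def component_id(rel_path: str, name: str, parent_class: Optional[str] = None) -> str:
    """Build '{relative_path}::{component_name}', qualifying methods by class."""
    qualified = f"{parent_class}.{name}" if parent_class else name
    return f"{rel_path}::{qualified}"

rel = relative_path("/repo/src/main/java/Example.java", "/repo")
print(component_id(rel, "MyClass"))
print(component_id(rel, "myMethod", parent_class="MyClass"))
```

Deriving IDs from the repository-relative path keeps them stable across machines that check the repository out in different locations.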
Traverses AST upward to find enclosing class, supports:
- `class_declaration`
- `interface_declaration`
- `enum_declaration`
- `record_declaration`
- `annotation_type_declaration`
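The upward traversal can be sketched as a walk along parent links. `FakeNode` here is a hypothetical stand-in that stores its name directly; in a real Tree-Sitter tree the name would be read from an identifier child.

```python
from dataclasses import dataclass
from typing import Optional

CLASS_LIKE = {
    "class_declaration",
    "interface_declaration",
    "enum_declaration",
    "record_declaration",
    "annotation_type_declaration",
}

@dataclass
class FakeNode:
    """Hypothetical stand-in for a Tree-Sitter node with a parent link."""
    type: str
    name: str = ""
    parent: Optional["FakeNode"] = None

def find_containing_class(node: FakeNode) -> Optional[str]:
    """Walk toward the root and return the first class-like ancestor's name."""
    current = node.parent
    while current is not None:
        if current.type in CLASS_LIKE:
            return current.name
        current = current.parent
    return None

# class Service { void process() { repository.findAll(); } }
service = FakeNode("class_declaration", "Service")
class_body = FakeNode("class_body", parent=service)
method = FakeNode("method_declaration", "process", parent=class_body)
call = FakeNode("method_invocation", parent=method)

print(find_containing_class(call))
```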
Locates enclosing method and builds full method identifier
Filters out Java primitives and common built-in types:
Primitives: boolean, byte, char, double, float, int, long, short
Boxed Types: Boolean, Byte, Character, Double, Float, Integer, Long, Short
Built-ins: String, Object, List, Set, Map, Collection, Optional, void, Void
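The filter amounts to a set membership test. A minimal sketch using exactly the names listed above; the constant names and the lack of a leading underscore on the function are choices made for this example.

```python
PRIMITIVES = {"boolean", "byte", "char", "double", "float", "int", "long", "short"}
BOXED_TYPES = {"Boolean", "Byte", "Character", "Double", "Float", "Integer", "Long", "Short"}
BUILT_INS = {"String", "Object", "List", "Set", "Map", "Collection", "Optional", "void", "Void"}

# A single set keeps the lookup O(1) regardless of category.
FILTERED_TYPES = PRIMITIVES | BOXED_TYPES | BUILT_INS

def is_primitive_type(type_name: str) -> bool:
    """True for types that should not produce a dependency relationship."""
    return type_name in FILTERED_TYPES

for candidate in ("int", "String", "Repository"):
    print(candidate, is_primitive_type(candidate))
```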
def analyze_java_file(
    file_path: str,
    content: str,
    repo_path: str = None
) -> Tuple[List[Node], List[CallRelationship]]:
    """
    Analyze a Java source file and extract components and relationships.

    Args:
        file_path: Absolute path to the Java file
        content: Source code content
        repo_path: Repository root path for relative paths

    Returns:
        Tuple of:
        - List[Node]: Extracted Java components
        - List[CallRelationship]: Detected relationships
    """

See dependency_analyzer_models.md for complete Node model definition.
Key Node Fields:
- `id`: Unique component identifier
- `name`: Component name
- `component_type`: Type of component (class, method, interface, etc.)
- `file_path`: Absolute file path
- `relative_path`: Path relative to repository root
- `start_line`, `end_line`: Location in source file
- `source_code`: Source code snippet
- `display_name`: Human-readable component name
See dependency_analyzer_models.md for complete CallRelationship model definition.
Key Relationship Fields:
- `caller`: Source component ID
- `callee`: Target component ID
- `call_line`: Line number where dependency occurs
- `is_resolved`: Boolean indicating if relationship is fully resolved
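For orientation, the key fields listed above can be pictured as two small record types. This is only an illustrative stand-in: the field names come from the lists above, but the Python types, defaults, and the use of dataclasses are assumptions. The authoritative definitions are in dependency_analyzer_models.md.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """Illustrative stand-in; field types and defaults are assumed."""
    id: str
    name: str
    component_type: str
    file_path: str
    relative_path: str
    start_line: int
    end_line: int
    source_code: str = ""
    display_name: str = ""

@dataclass
class CallRelationship:
    """Illustrative stand-in; field types and defaults are assumed."""
    caller: str
    callee: str
    call_line: Optional[int] = None
    is_resolved: bool = False

dog = Node(
    id="file.java::Dog", name="Dog", component_type="class",
    file_path="/repo/file.java", relative_path="file.java",
    start_line=4, end_line=9,
)
extends = CallRelationship(caller=dog.id, callee="file.java::Animal",
                           call_line=4, is_resolved=True)
print(extends)
```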
The java_analyzer integrates with the broader dependency analysis system:
Integration Flow:
- RepoAnalyzer dispatches each Java file to TreeSitterJavaAnalyzer
- TreeSitterJavaAnalyzer produces two outputs:
- Extracted Nodes (components found in the file)
- Relationships (dependencies between components)
- DependencyGraphBuilder consumes both outputs to construct dependency graphs
- Analysis Pipeline orchestrates the complete workflow
The java_analyzer is one of nine language-specific analyzers. The other eight are documented in:
- python_analyzer.md - Python code analysis
- javascript_analyzer.md - JavaScript/Node.js analysis
- typescript_analyzer.md - TypeScript analysis
- kotlin_analyzer.md - Kotlin analysis
- csharp_analyzer.md - C# analysis
- cpp_analyzer.md - C++ analysis
- c_analyzer.md - C analysis
- php_analyzer.md - PHP analysis
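# Imports required by the setup below
import tree_sitter_java
from tree_sitter import Language, Parser

# Language setup
language_capsule = tree_sitter_java.language()
java_language = Language(language_capsule)
parser = Parser(java_language)
# Parsing (runs inside the analyzer, where self.content holds the file's source)
tree = parser.parse(bytes(self.content, "utf8"))
root = tree.root_node

The analyzer leverages the Tree-Sitter Java grammar for: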
- Complete Java syntax support (Java 8 through latest versions)
- Accurate AST construction
- Robust error recovery
- Performance-optimized parsing
Supported Java Features:
- Classes, interfaces, enums, records, annotations
- Method and field declarations
- Generics and type parameters
- Inner/nested classes
- Method invocations with various patterns
- Object creation expressions
- Inheritance and interface implementation
- Package declarations and imports (parsed but not extracted)
Process Steps:
- Parse Java file and get root AST node
- Split content into lines for source mapping
- Walk through entire AST tree recursively
- For each node, check its type:
- If class/interface/enum/record/annotation: extract component info
- If method: extract method with containing class
- Otherwise: skip to next sibling
- For extracted components:
- Generate unique component ID (file path + component name)
- Create Node object with metadata
- Add to accumulating nodes list
- Recursively process children and continue traversal
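The process steps above can be sketched as a recursive walk over a simplified tree. `FakeNode`, `java_class`, and the tuple output are hypothetical stand-ins for Tree-Sitter nodes and `Node` objects, and nested-class naming is simplified: a method is attributed to its nearest enclosing type only.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class FakeNode:
    """Hypothetical stand-in for a Tree-Sitter node."""
    type: str
    text: str = ""
    children: List["FakeNode"] = field(default_factory=list)

COMPONENT_KINDS = {
    "class_declaration": "class",
    "interface_declaration": "interface",
    "enum_declaration": "enum",
    "record_declaration": "record",
    "annotation_type_declaration": "annotation",
}

def name_of(node: FakeNode) -> Optional[str]:
    return next((c.text for c in node.children if c.type == "identifier"), None)

def is_abstract(node: FakeNode) -> bool:
    return any(c.type == "modifiers" and "abstract" in c.text.split()
               for c in node.children)

def extract_components(node: FakeNode, rel_path: str,
                       found: List[Tuple[str, str]],
                       enclosing: Optional[str] = None) -> None:
    """Append (component_id, component_type) pairs in traversal order."""
    name = name_of(node)
    if node.type in COMPONENT_KINDS and name:
        kind = COMPONENT_KINDS[node.type]
        if kind == "class" and is_abstract(node):
            kind = "abstract class"
        found.append((f"{rel_path}::{name}", kind))
        enclosing = name  # methods below are attributed to this type
    elif node.type == "method_declaration" and name and enclosing:
        found.append((f"{rel_path}::{enclosing}.{name}", "method"))
    for child in node.children:
        extract_components(child, rel_path, found, enclosing)

def java_class(name: str, modifiers: str, methods: List[str]) -> FakeNode:
    body = [FakeNode("method_declaration", children=[FakeNode("identifier", m)])
            for m in methods]
    return FakeNode("class_declaration", children=[
        FakeNode("modifiers", modifiers),
        FakeNode("identifier", name),
        FakeNode("class_body", children=body),
    ])

program = FakeNode("program", children=[
    java_class("Animal", "public abstract", ["makeSound"]),
    java_class("Dog", "public", ["makeSound"]),
])
components: List[Tuple[str, str]] = []
extract_components(program, "file.java", components)
for component in components:
    print(component)
```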
Process Steps:
- Walk through AST tree for relationship nodes
- Check node type and dispatch to appropriate handler:
- Class nodes: Check for inheritance (extends superclass)
- Class/Enum/Record nodes: Check for interface implementation
- Field declarations: Extract field type dependencies
- Method invocations: Analyze method calls on objects
- Object creation: Analyze new object instantiations
- Other nodes: Skip relationship analysis
- For identified relationships:
- Resolve the target type (class, interface, etc.)
- Filter out primitive and built-in types
- Create CallRelationship with source and target
- Add to accumulating relationships list
- Continue traversal to process all nodes
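As a rough sketch of the dispatch described above, the following walk handles only the field-declaration and object-creation branches. `FakeNode`, the abbreviated filter set, and the `(class, type)` tuple output are assumptions made for illustration; inheritance, implementation, and method-call handling are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class FakeNode:
    """Hypothetical stand-in for a Tree-Sitter node."""
    type: str
    text: str = ""
    children: List["FakeNode"] = field(default_factory=list)

TYPE_BEARING = {"field_declaration", "object_creation_expression"}
FILTERED = {"int", "boolean", "String", "Object", "List", "Map"}  # abbreviated

def type_name(node: FakeNode) -> Optional[str]:
    """First type child; a generic type is reduced to its base type."""
    for child in node.children:
        if child.type == "type_identifier":
            return child.text
        if child.type == "generic_type":
            return type_name(child)
    return None

def collect(node: FakeNode, out: List[Tuple[str, str]],
            owner: Optional[str] = None) -> None:
    """Record (owning class, referenced type) pairs while walking the tree."""
    if node.type == "class_declaration":
        owner = next((c.text for c in node.children if c.type == "identifier"), owner)
    elif node.type in TYPE_BEARING and owner:
        target = type_name(node)
        if target and target not in FILTERED:
            out.append((owner, target))
    for child in node.children:
        collect(child, out, owner)

# class Service { Repository repository; List<Data> cache; void run() { new Worker(); } }
service = FakeNode("class_declaration", children=[
    FakeNode("identifier", "Service"),
    FakeNode("class_body", children=[
        FakeNode("field_declaration", children=[FakeNode("type_identifier", "Repository")]),
        FakeNode("field_declaration", children=[
            FakeNode("generic_type", children=[
                FakeNode("type_identifier", "List"),
                FakeNode("type_arguments", children=[FakeNode("type_identifier", "Data")]),
            ]),
        ]),
        FakeNode("method_declaration", children=[
            FakeNode("identifier", "run"),
            FakeNode("block", children=[
                FakeNode("object_creation_expression",
                         children=[FakeNode("type_identifier", "Worker")]),
            ]),
        ]),
    ]),
])

pairs: List[Tuple[str, str]] = []
collect(service, pairs)
print(pairs)
```

The `List<Data>` field produces no pair because its base type `List` is in the filter set, which mirrors the basic generic handling described in this document.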
- Node Extraction: O(n) where n = number of AST nodes
- Relationship Detection: O(n) with additional lookups for variable type resolution
- Variable Type Resolution: O(m) where m = nodes in containing scope (optimized through block-level search)
- Nodes Storage: O(k) where k = number of components extracted
- Relationships Storage: O(r) where r = number of relationships found
- Tree-Sitter AST: O(n) automatically managed by parser
- One linear AST traversal per phase (node extraction, then relationship extraction)
- Early filtering of primitive types to reduce relationship count
- Localized variable type search within method/class scope
- Recursive descent visits each AST node once, so traversal cost is bounded by the size of the tree
- Missing or malformed Java files
- Invalid Tree-Sitter parse results (gracefully degraded)
- Missing type information (defaults to unresolved relationships)
- Primitive type filtering prevents spurious dependencies
The analyzer uses Python's logging module with configurable log levels:
import logging
logger = logging.getLogger(__name__)

Debug logs include:
- Missing superclass information
- Node extraction details
- Type resolution steps
- Type Resolution: Cannot fully resolve types without import information
  - Only handles local variables and fields
  - Cross-module types marked as unresolved
  - Type parameter and generic resolution are basic
- Visibility Scope: Does not take method visibility or access modifiers into account
  - All detected calls treated equally regardless of public/private scope
- Dynamic Types: Cannot resolve dynamic dispatch or reflection-based calls
  - Static analysis limitations inherent to the approach
- Complex Generics: Generic type parameter resolution is simplified
  - Nested generics may not be fully resolved
- Abstract classes: Detected and labeled with "abstract class" type
- Inner/nested classes: Component IDs include parent class references
- Method overloading: Methods distinguished only by name (signature not fully captured)
- Generic types: Basic extraction of base type from generic_type nodes
- Multi-interface implementation: Correctly handles comma-separated interface list
Input Java Code:
public abstract class Animal {
public abstract void makeSound();
}
public class Dog extends Animal {
@Override
public void makeSound() {
System.out.println("Woof");
}
}
Extracted Components:
Node(id="file.java::Animal", name="Animal", type="abstract class")
Node(id="file.java::Dog", name="Dog", type="class")
Node(id="file.java::Animal.makeSound", name="Animal.makeSound", type="method")
Node(id="file.java::Dog.makeSound", name="Dog.makeSound", type="method")
Extracted Relationships:
CallRelationship(caller="file.java::Dog", callee="file.java::Animal")
Input Java Code:
public interface Logger {
void log(String message);
}
public class ConsoleLogger implements Logger {
private PrintWriter writer;
@Override
public void log(String message) {
writer.println(message);
}
}
Extracted Relationships:
CallRelationship(caller="file.java::ConsoleLogger", callee="file.java::Logger")
CallRelationship(caller="file.java::ConsoleLogger", callee="PrintWriter")
Input Java Code:
public class Service {
private Repository repository;
public void process() {
List<Data> data = repository.findAll();
}
}
public class Repository {
public List<Data> findAll() { ... }
}
Extracted Relationships:
CallRelationship(caller="file.java::Service", callee="Repository")
CallRelationship(caller="file.java::Service.process", callee="Repository")
- Node Extraction
  - Class, interface, enum, record, annotation detection
  - Method extraction with containing class identification
  - Abstract class modifier detection
- Relationship Detection
  - Single-level and multi-level inheritance chains
  - Interface implementation (single and multiple)
  - Field type dependencies
  - Method invocation patterns
  - Object creation expressions
- Type Resolution
  - Local variable type inference
  - Field type resolution
  - Generic type handling
  - Primitive type filtering
- Path Generation
  - Relative path calculation
  - Component ID generation
  - Module path conversion
- Multi-file Java projects with cross-file dependencies
- Complex inheritance hierarchies
- Generics and parameterized types
- Nested and inner classes
- dependency_analyzer_models.md: Data models used by analyzer
- dependency_analysis_services.md: Higher-level analysis orchestration
- language_analyzers.md: Overview of all language analyzers
- c_analyzer.md: C analyzer implementation reference
- cpp_analyzer.md: C++ analyzer implementation reference
- csharp_analyzer.md: C# analyzer implementation reference
- kotlin_analyzer.md: Kotlin analyzer implementation reference
- python_analyzer.md: Python analyzer implementation reference
- javascript_analyzer.md: JavaScript analyzer implementation reference
- typescript_analyzer.md: TypeScript analyzer implementation reference
- php_analyzer.md: PHP analyzer implementation reference
The java_analyzer module provides robust Java code analysis through Tree-Sitter integration, extracting both Java language components and their interdependencies. It serves as a key component in the dependency analysis pipeline, enabling developers and tools to understand Java codebase structure and relationships.
By combining AST-based analysis with smart type resolution and relationship detection, the analyzer bridges the gap between low-level syntax and high-level semantic understanding of Java code.