The dependency_graph_construction module is responsible for parsing multi-language source code repositories and building comprehensive dependency graphs. It serves as the core analysis engine that transforms raw source code into structured component models with dependency relationships.
- Repository Parsing: Extract code components from multi-language repositories
- Component Modeling: Create `Node` objects representing code elements (classes, functions, methods, interfaces, etc.)
- Dependency Extraction: Identify and map relationships between code components
- Graph Construction: Build directed acyclic graphs representing component dependencies
- Leaf Node Identification: Identify entry points (leaf nodes) for documentation generation
codewiki/src/be/dependency_analyzer/
├── ast_parser.py (DependencyParser)
└── dependency_graphs_builder.py (DependencyGraphBuilder)
```
┌──────────────────────────────────────────────────────────────┐
│             Dependency Graph Construction Module             │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  DependencyGraphBuilder                                │  │
│  │  Orchestrates dependency analysis and graph building   │  │
│  └────────────────────────────────────────────────────────┘  │
│                              ↓                               │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  DependencyParser                                      │  │
│  │  Extracts components and builds dependency models      │  │
│  └────────────────────────────────────────────────────────┘  │
│                              ↓                               │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  AnalysisService (External)                            │  │
│  │  Multi-language AST analysis and call graph generation │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```
File: codewiki/src/be/dependency_analyzer/ast_parser.py
Purpose: Parses multi-language repositories to extract code components and their dependencies.
- Initialize with repository path and file patterns (include/exclude)
- Orchestrate multi-language AST analysis via `AnalysisService`
- Build component models from analysis results
- Map component IDs and legacy identifiers
- Persist dependency graphs to JSON
```python
class DependencyParser:
    """Parser for extracting code components from multi-language repositories."""

    def __init__(
        self,
        repo_path: str,
        include_patterns: List[str] = None,
        exclude_patterns: List[str] = None
    ): ...

    def parse_repository(
        self,
        filtered_folders: List[str] = None
    ) -> Dict[str, Node]: ...

    def _build_components_from_analysis(
        self,
        call_graph_result: Dict
    ): ...

    def save_dependency_graph(
        self,
        output_path: str
    ): ...
```

| Attribute | Type | Purpose |
|---|---|---|
| `repo_path` | `str` | Absolute path to the repository |
| `components` | `Dict[str, Node]` | Extracted code components keyed by component ID |
| `modules` | `Set[str]` | Set of identified modules |
| `include_patterns` | `List[str]` | File patterns to include (e.g., `*.py`, `*.js`) |
| `exclude_patterns` | `List[str]` | File patterns to exclude |
| `analysis_service` | `AnalysisService` | Multi-language analysis orchestrator |

- Initialization: Setup parser with repository configuration and filter patterns
- Repository Analysis: Invoke `AnalysisService` to perform multi-language AST analysis
- Component Extraction: Create `Node` objects from analysis results
- Dependency Mapping: Build dependency relationships between components
- Persistence: Save component graph to JSON format
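
The persistence step can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's actual code: `ComponentRecord` and `dump_components` are hypothetical names, and only three of the `Node` fields are modelled. The point it shows is that `depends_on` is a set, which JSON cannot represent directly, so each set is written out as a sorted list.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Set


@dataclass
class ComponentRecord:
    """Hypothetical stand-in for Node, reduced to three fields."""
    id: str
    component_type: str
    depends_on: Set[str] = field(default_factory=set)


def dump_components(components: Dict[str, ComponentRecord]) -> str:
    """Serialize components to JSON, turning each depends_on set into a sorted list."""
    payload = {}
    for component_id, record in components.items():
        entry = asdict(record)
        # Sets are not JSON-serializable; a sorted list also keeps output stable.
        entry["depends_on"] = sorted(record.depends_on)
        payload[component_id] = entry
    return json.dumps(payload, indent=2)


components = {
    "a.py::A": ComponentRecord("a.py::A", "class", {"b.py::helper", "b.py::B"}),
    "b.py::B": ComponentRecord("b.py::B", "class"),
}
serialized = dump_components(components)
```

Sorting the dependency list is a design choice of this sketch: it makes repeated runs produce identical files, which is convenient when graphs are cached or diffed.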
```python
parser = DependencyParser(
    repo_path="/path/to/repo",
    include_patterns=["*.py", "*.ts"],
    exclude_patterns=["*test*", "*node_modules*"]
)
components = parser.parse_repository()
parser.save_dependency_graph("dependency_graph.json")
```

File: codewiki/src/be/dependency_analyzer/dependency_graphs_builder.py
Purpose: High-level orchestrator that manages dependency analysis workflow and identifies leaf nodes for documentation.
- Coordinate dependency graph construction from configuration
- Create and manage output directories
- Filter leaf nodes based on component types
- Validate and sanitize leaf node identifiers
- Return structured results for downstream processing
```python
class DependencyGraphBuilder:
    """Handles dependency analysis and graph building."""

    def __init__(self, config: Config): ...

    def build_dependency_graph(
        self
    ) -> tuple[Dict[str, Any], List[str]]: ...
```

| Attribute | Type | Purpose |
|---|---|---|
| `config` | `Config` | System configuration with paths and patterns |

- Setup: Ensure output directory structure
- Configuration: Extract include/exclude patterns from config
- Parsing: Use `DependencyParser` to analyze repository
- Graph Construction: Build traversable dependency graph
- Leaf Node Identification: Extract entry points for documentation
- Validation: Filter invalid/error nodes and verify component existence
- Return: Components and validated leaf nodes
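
The leaf node identification step can be illustrated with a small sketch. It assumes the glossary's definition of a leaf node (a component that no other component depends on) and represents the graph as a plain dictionary from component ID to its `depends_on` set. The function name `find_leaf_nodes` is hypothetical, and the module's real traversal may differ.

```python
from typing import Dict, List, Set


def find_leaf_nodes(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return IDs that no other component depends on, in sorted order."""
    depended_upon: Set[str] = set()
    for component_id, dependencies in depends_on.items():
        # Ignore self-references so a recursive component can still be a leaf.
        depended_upon.update(dep for dep in dependencies if dep != component_id)
    return sorted(cid for cid in depends_on if cid not in depended_upon)


graph = {
    "app.py::Main": {"svc.py::Service"},
    "svc.py::Service": {"util.py::helper"},
    "util.py::helper": set(),
    "cli.py::Cli": {"svc.py::Service"},
}
leaves = find_leaf_nodes(graph)
```

In this example `Main` and `Cli` are the leaves, because nothing else in the graph depends on them.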
```python
# Valid types for leaf nodes
valid_types = {"class", "interface", "struct"}

# For C-based codebases with no OOP constructs
if not available_types.intersection(valid_types):
    valid_types.add("function")

# Filter criteria
keep_leaf_nodes = [
    leaf for leaf in leaf_nodes
    if isinstance(leaf, str)
    and leaf.strip() != ""
    and leaf in components
    and components[leaf].component_type in valid_types
    and not contains_error_keywords(leaf)
]
```

Source: dependency_analyzer_models
The Node class represents a code component with the following attributes:
```python
class Node(BaseModel):
    id: str                            # Unique component identifier
    name: str                          # Component name
    component_type: str                # Type: class, function, interface, etc.
    file_path: str                     # Absolute file path
    relative_path: str                 # Relative path from repository root
    depends_on: Set[str]               # Set of component IDs this depends on
    source_code: Optional[str]         # Source code snippet
    start_line: int                    # Starting line number
    end_line: int                      # Ending line number
    has_docstring: bool                # Whether component has documentation
    docstring: str                     # Documentation string
    parameters: Optional[List[str]]    # Function/method parameters
    node_type: Optional[str]           # Detailed node type
    base_classes: Optional[List[str]]  # Parent classes
    class_name: Optional[str]          # Containing class name
    display_name: Optional[str]        # Human-readable name
    component_id: Optional[str]        # Component identifier
```

```mermaid
graph TD
    A["Repository"] -->|Path & Patterns| B["DependencyGraphBuilder"]
    B -->|Initialize| C["DependencyParser"]
    C -->|Repository Analysis| D["AnalysisService"]
    D -->|Multi-Language AST Analysis| E["Language Analyzers"]
    E -->|Functions & Relationships| D
    D -->|Structured Results| F["Call Graph Results"]
    F -->|Functions List| C
    C -->|Build Components| G["Node Creation"]
    G -->|Component Objects| H["Component Dictionary"]
    H -->|Build Graph| I["Dependency Graph"]
    I -->|Traverse & Filter| J["Leaf Node Extraction"]
    J -->|Validated Nodes| K["Documentation Entry Points"]
    H -->|Persist| L["Dependency Graph JSON"]
    style A fill:#e1f5ff
    style K fill:#c8e6c9
    style L fill:#f3e5f5
```
```mermaid
graph LR
    A["Analysis Results"] -->|Functions| B["Component ID Mapping"]
    A -->|Relationships| C["Dependency Linking"]
    B -->|Create Nodes| D["Component Dictionary"]
    C -->|Link Dependencies| D
    D -->|Extract Modules| E["Module Set"]
    D -->|Process Relationships| F["Dependency Resolution"]
    F -->|Filter Valid| G["Final Dependencies"]
    G -->|Store| D
    style D fill:#fff3e0
    style G fill:#e0f2f1
```
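
The component building flow above can be sketched in a few lines. This is an illustration under assumptions rather than the parser's real logic: `build_components` is a hypothetical helper, components are reduced to their `depends_on` sets, the only alias registered is the legacy `file:name` form, and a relationship is kept only when both endpoints resolve to known components.

```python
from typing import Dict, List, Optional, Set, Tuple


def build_components(
    function_ids: List[str],
    relationships: List[Tuple[str, str]],
) -> Tuple[Dict[str, Set[str]], Set[str]]:
    """Return (depends_on per canonical ID, module set) from raw analysis output."""
    id_mapping: Dict[str, str] = {}
    components: Dict[str, Set[str]] = {}
    for canonical in function_ids:
        components[canonical] = set()
        id_mapping[canonical] = canonical
        # Register the legacy "file:name" alias for the canonical "file::name" form.
        id_mapping[canonical.replace("::", ":", 1)] = canonical

    def resolve(identifier: str) -> Optional[str]:
        return id_mapping.get(identifier)

    for caller, callee in relationships:
        caller_id, callee_id = resolve(caller), resolve(callee)
        # Keep only edges whose two endpoints are known components.
        if caller_id is not None and callee_id is not None:
            components[caller_id].add(callee_id)

    # The module of "file::name" is taken to be the part before "::".
    modules = {cid.split("::", 1)[0] for cid in components}
    return components, modules


components, modules = build_components(
    ["a.py::A", "b.py::helper"],
    [("a.py:A", "b.py::helper"), ("a.py::A", "os.path.join")],
)
```

Here the first relationship uses the legacy caller ID and is resolved, while the call to `os.path.join` is dropped because it does not map to any extracted component.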
```mermaid
graph TD
    A["Dependency Graph"] -->|Traverse| B["Get Leaf Nodes"]
    B -->|Initial List| C["Validation Filter"]
    C -->|Check Type| D{"Valid Component Type?"}
    D -->|Yes| E["Keep in Results"]
    D -->|No| F["Discard"]
    C -->|Check Validity| G{"Valid Identifier?"}
    G -->|No Error Keywords| E
    G -->|Contains Errors| F
    C -->|Check Existence| H{"In Components?"}
    H -->|Yes| E
    H -->|No| F
    E -->|Validated| I["Leaf Nodes List"]
    F -->|Filtered| J["Discard"]
    I -->|Return| K["Documentation Entries"]
    style I fill:#c8e6c9
    style K fill:#a5d6a7
```
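
The filter excerpts in this document call `contains_error_keywords` and read `available_types` without defining them. The sketch below fills both in so the validation flow can be run end to end. It is a guess at their shape, not the module's code: the keyword list is the `SKIP_KEYWORDS` list shown in this document, `available_types` is assumed to be the set of component types present in the parsed repository, and `SimpleNamespace` stands in for `Node`.

```python
from types import SimpleNamespace
from typing import Dict, List

SKIP_KEYWORDS = ("error", "exception", "failed", "invalid")


def contains_error_keywords(identifier: str) -> bool:
    """True when the identifier contains any skip keyword (case-insensitive)."""
    lowered = identifier.lower()
    return any(keyword in lowered for keyword in SKIP_KEYWORDS)


def filter_leaf_nodes(
    leaf_nodes: List[object],
    components: Dict[str, SimpleNamespace],
) -> List[str]:
    """Apply the type, existence, and identifier checks to candidate leaf nodes."""
    valid_types = {"class", "interface", "struct"}
    available_types = {c.component_type for c in components.values()}
    # Fall back to functions when the codebase has no class-like components.
    if not available_types & valid_types:
        valid_types.add("function")
    return [
        leaf for leaf in leaf_nodes
        if isinstance(leaf, str)
        and leaf.strip() != ""
        and leaf in components
        and components[leaf].component_type in valid_types
        and not contains_error_keywords(leaf)
    ]


c_components = {
    "main.c::main": SimpleNamespace(component_type="function"),
    "io.c::read_failed": SimpleNamespace(component_type="function"),
}
kept = filter_leaf_nodes(
    ["main.c::main", "io.c::read_failed", "", None, "missing.c::x"],
    c_components,
)
```

One consequence of substring matching is worth noting: in this sketch a legitimate component such as `ErrorHandler` is also discarded, because its identifier contains `error`.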
The module depends on the following services:
Source: dependency_analysis_services
Multi-language AST analysis and call graph generation:
- Analyzes repository structure
- Performs language-specific AST parsing
- Extracts function/class definitions
- Identifies inter-component relationships
- Supports 9+ programming languages
Source: language_analyzers
Language-specific AST parsers:
- Python AST Analyzer
- TypeScript/JavaScript Tree-sitter Analyzer
- Java Tree-sitter Analyzer
- C#, C++, C, PHP, Kotlin Analyzers
Components that consume this module's output:
Source: documentation_generation
Uses the component graph and leaf nodes to:
- Select entry points for documentation
- Generate module documentation
- Create dependency visualizations
Source: frontend_web_app
Consumes the dependency graph for:
- Repository submission processing
- Background job orchestration
- Analysis result caching
The module respects the following configuration:
```python
class DependencyGraphBuilderConfig:
    repo_path: str                         # Repository to analyze
    dependency_graph_dir: str              # Output directory for graphs
    include_patterns: Optional[List[str]]  # File patterns to include
    exclude_patterns: Optional[List[str]]  # File patterns to exclude
```

```python
# Include specific file types
include_patterns = ["*.py", "*.ts", "*.java"]

# Exclude test files and dependencies
exclude_patterns = ["*test*", "*spec*", "node_modules/*", "venv/*"]
```

```python
# Invalid identifier detection
SKIP_KEYWORDS = ['error', 'exception', 'failed', 'invalid']

invalid_identifiers = [
    identifier for identifier in leaf_nodes
    if any(keyword in identifier.lower() for keyword in SKIP_KEYWORDS)
]
```

```python
VALID_COMPONENT_TYPES = {"class", "interface", "struct"}

# For C-based projects without OOP constructs
if not has_oop_components:
    VALID_COMPONENT_TYPES.add("function")

# Validate each leaf node
filtered_nodes = [
    node for node in leaf_nodes
    if node in components
    and components[node].component_type in VALID_COMPONENT_TYPES
]
```

Format: {repo_name}_dependency_graph.json
Structure:
```json
{
  "path/to/file.py::ClassName.method_name": {
    "id": "path/to/file.py::ClassName.method_name",
    "name": "method_name",
    "component_type": "method",
    "file_path": "/absolute/path/to/file.py",
    "relative_path": "path/to/file.py",
    "depends_on": [
      "path/to/other.py::OtherClass",
      "path/to/util.py::helper_function"
    ],
    "source_code": "def method_name(...):\n ...",
    "start_line": 42,
    "end_line": 55,
    "has_docstring": true,
    "docstring": "Method documentation...",
    "node_type": "function",
    "class_name": "ClassName"
  }
}
```

Type: List[str]
Validated entry points for documentation generation:
```json
[
  "path/to/file.py::MainClass",
  "path/to/module.py::AnotherClass",
  "src/handler.ts::RequestHandler"
]
```

Maps multiple identifier formats to canonical component IDs:
```python
component_id_mapping = {
    # Canonical format: file::name
    "path/to/file.py::ComponentName": "path/to/file.py::ComponentName",
    # Legacy format: file:name
    "path/to/file.py:ComponentName": "path/to/file.py::ComponentName",
    # Alternative formats for different languages
    "path.to.module.ComponentName": "path/to/module.py::ComponentName"
}
```

Identifies modules from component IDs:
```
# From Python-style IDs
"path/to/file.py::ClassName" → module = "path/to/file.py"

# From dotted paths
"path.to.module.ClassName" → module = "path.to.module"
```

Resolves relationships between components:
```python
for relationship in relationships:
    caller_id = resolve_id(relationship.caller)
    callee_id = resolve_id(relationship.callee)
    # Skip relationships whose caller is unknown or whose callee cannot be resolved
    if caller_id in components and callee_id is not None:
        components[caller_id].depends_on.add(callee_id)
```

| Operation | Complexity | Notes |
|---|---|---|
| Parse Repository | O(n) | Linear in the number of files n |
| Extract Components | O(f) | Linear in the number of functions f |
| Build Dependencies | O(r) | Linear in the number of relationships r |
| Topological Sort | O(c + r) | Standard graph algorithm over c components and r relationships |
| Leaf Node Filtering | O(l) | Linear in the number of leaf nodes l |
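
The topological sort row refers to a standard algorithm. A minimal sketch using Kahn's algorithm is shown here to make the cost concrete, since each component and each dependency edge is visited once. The function name `topological_order` and the dictionary-of-sets input are assumptions for illustration. The document does not say which algorithm the module uses.

```python
from collections import deque
from typing import Dict, List, Set


def topological_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Order components so that every dependency precedes its dependents."""
    # Drop self-references and edges to components that are not in the graph.
    known = {
        cid: {dep for dep in deps if dep in depends_on and dep != cid}
        for cid, deps in depends_on.items()
    }
    dependents: Dict[str, List[str]] = {cid: [] for cid in known}
    for cid, deps in known.items():
        for dep in deps:
            dependents[dep].append(cid)

    pending = {cid: len(deps) for cid, deps in known.items()}
    ready = deque(sorted(cid for cid, count in pending.items() if count == 0))
    ordered: List[str] = []
    while ready:
        current = ready.popleft()
        ordered.append(current)
        for dependent in sorted(dependents[current]):
            pending[dependent] -= 1
            if pending[dependent] == 0:
                ready.append(dependent)

    if len(ordered) != len(known):
        raise ValueError("dependency cycle detected")
    return ordered


order = topological_order({
    "app::Main": {"svc::Service"},
    "svc::Service": {"util::helper"},
    "util::helper": set(),
    "cli::Cli": {"svc::Service"},
})
```

The `sorted` calls only make the output deterministic. They add a logarithmic factor and can be removed when ordering among independent components does not matter.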
- File Limit: Configurable maximum files to prevent analysis timeout
- Language Filtering: Pre-filter by supported languages for performance
- Pattern Matching: Use efficient glob patterns for file filtering
- Memory: Component dictionary grows linearly with code size
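
As an illustration of glob-based file filtering, the sketch below applies include and exclude patterns with the standard library's `fnmatch.fnmatchcase`. The helper name `filter_paths` is hypothetical, and the module may match patterns differently (for example against file names only rather than full relative paths).

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


def filter_paths(
    paths: Iterable[str],
    include_patterns: Optional[List[str]] = None,
    exclude_patterns: Optional[List[str]] = None,
) -> List[str]:
    """Keep paths matching any include pattern and no exclude pattern."""
    selected = []
    for path in paths:
        # An empty or missing include list means "include everything".
        if include_patterns and not any(fnmatchcase(path, p) for p in include_patterns):
            continue
        if exclude_patterns and any(fnmatchcase(path, p) for p in exclude_patterns):
            continue
        selected.append(path)
    return selected


paths = [
    "src/app.py",
    "src/app_test.py",
    "web/index.ts",
    "node_modules/lib/x.ts",
    "README.md",
]
kept = filter_paths(
    paths,
    include_patterns=["*.py", "*.ts"],
    exclude_patterns=["*test*", "node_modules/*"],
)
```

Because `*` in `fnmatch` also matches path separators, a broad pattern such as `*test*` excludes any path containing that substring, including a file like `src/latest.py`. Narrower patterns avoid such accidental matches.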
- Multi-language Repository: Verify parsing of mixed language projects
- Circular Dependencies: Handle cyclic relationships gracefully
- Pattern Filtering: Validate include/exclude pattern matching
- Invalid Identifiers: Ensure error nodes are filtered
- Large Repositories: Test with 1000+ components
- Edge Cases:
- Empty repositories
- Missing dependencies
- Malformed source code
- Unusual file structures
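```python
# repo_path and config are assumed to be supplied by the test setup (e.g. fixtures).

def test_parse_simple_python_repo():
    parser = DependencyParser(repo_path)
    components = parser.parse_repository()
    assert len(components) > 0


def test_pattern_filtering():
    parser = DependencyParser(
        repo_path,
        include_patterns=["*.py"],
        exclude_patterns=["*test*"]
    )
    components = parser.parse_repository()
    # Verify only .py files are analyzed
    assert all(c.relative_path.endswith(".py") for c in components.values())
    # Verify no test files are included
    assert not any("test" in c.relative_path for c in components.values())


def test_leaf_node_validation():
    builder = DependencyGraphBuilder(config)
    components, leaf_nodes = builder.build_dependency_graph()
    # Verify all leaf nodes are in components
    assert all(leaf in components for leaf in leaf_nodes)
    # Verify no error keywords in names
    assert not any(
        keyword in leaf.lower()
        for leaf in leaf_nodes
        for keyword in ("error", "exception", "failed", "invalid")
    )
```

- dependency_analysis_services - Multi-language analysis orchestration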
- language_analyzers - Language-specific AST parsing
- dependency_analyzer_models - Core data models
- documentation_generation - Generation from dependency graphs
- frontend_web_app - Web interface integration
| Term | Definition |
|---|---|
| Component | A code element (class, function, interface, etc.) extracted from source |
| Node | A Node object representing a component with metadata |
| Dependency | A relationship where component A uses/calls component B |
| Leaf Node | A component with no dependents; suitable as documentation entry point |
| Call Graph | A directed graph showing function/method call relationships |
| AST | Abstract Syntax Tree - parsed representation of source code |
| Tree-sitter | Parser generator for efficient syntax tree creation |
| Topological Sort | Algorithm to order nodes respecting dependencies |
- Incremental Analysis: Cache results and only re-analyze changed files
- Cycle Detection: Identify and report circular dependency chains
- Quality Metrics: Calculate component cohesion and coupling metrics
- Type Resolution: Full type information for better dependency detection
- Performance Optimization: Parallel multi-language analysis
- Extended Language Support: Add more programming languages
- Visualization: Generate interactive dependency visualizations
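
To give a sense of what the proposed cycle detection could look like, here is a small depth-first search that returns one circular dependency chain if any exists. It is only a sketch of the idea listed above, not a planned design: `find_cycle` is a hypothetical name, the graph is a plain dictionary from component ID to `depends_on` set, and the recursive form would need to be made iterative for very deep graphs.

```python
from typing import Dict, List, Optional, Set


def find_cycle(depends_on: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of IDs, or None if the graph is acyclic."""
    path: List[str] = []      # components on the current DFS path, in order
    on_path: Set[str] = set()
    done: Set[str] = set()    # components fully explored without finding a cycle

    def visit(node: str) -> Optional[List[str]]:
        if node in done:
            return None
        if node in on_path:
            # Slice the path from the first occurrence of node to close the loop.
            return path[path.index(node):] + [node]
        path.append(node)
        on_path.add(node)
        for dep in sorted(depends_on.get(node, ())):
            cycle = visit(dep)
            if cycle is not None:
                return cycle
        path.pop()
        on_path.discard(node)
        done.add(node)
        return None

    for start in sorted(depends_on):
        cycle = visit(start)
        if cycle is not None:
            return cycle
    return None


cycle = find_cycle({"a": {"b"}, "b": {"c"}, "c": {"a"}, "d": set()})
```

The returned list repeats the first component at the end, which makes the chain easy to print as `a -> b -> c -> a` in a report.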
Last Updated: 2024

Module Version: 1.0