The PHP Analyzer module is a language-specific analyzer that extracts code structure and dependency information from PHP files using the tree-sitter-php parser. It is part of the language_analyzers subsystem within the larger dependency_analysis_services framework.
The PHP Analyzer identifies and catalogues:
- Structural elements: Classes, interfaces, traits, enums, functions, and methods
- Dependency relationships: Use statements, class inheritance, interface implementation, object creation, and static method calls
- Metadata: Docstrings, parameters, return types, and base classes
- Namespace resolution: Fully qualified names through PHP's namespace and use mechanisms
Key Features:
- Tree-sitter based: Uses tree-sitter-php for robust AST parsing (handles both pure PHP and PHP mixed with HTML)
- Namespace-aware: Resolves class names to fully qualified names accounting for use statements and namespaces
- Recursion protected: Includes safeguards against stack overflow with MAX_RECURSION_DEPTH
- Template-aware: Skips PHP template files (Blade, Twig, PHTML) that are not relevant for dependency analysis
- Comprehensive relationships: Extracts multiple dependency types including inheritance, implementation, instantiation, and static calls
Responsibility (NamespaceResolver): Resolves PHP class names to their fully qualified form, taking the current namespace and use statements into account.
Key Methods:
- register_namespace(ns: str): Set the current namespace context
- register_use(fqn: str, alias: str = None): Register a use statement with optional alias
- resolve(name: str) -> str: Resolve a name to its fully qualified form
Behavior:
- Maintains a mapping of aliases to fully qualified names (use_map)
- Handles both simple use statements (use App\User;) and group use statements (use App\{User, Post};)
- Resolves partially qualified names by checking the use map
- Prepends namespace context when resolving unqualified names
- Normalizes backslash escaping throughout resolution
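The alias mapping described above can be sketched in a few lines. This is a minimal, string-based illustration and not the analyzer's code: the real module works on tree-sitter nodes, and the helper name `expand_use` is hypothetical. It assumes plain class imports and does not handle `use function` or `use const`.

```python
from typing import Dict


def expand_use(statement: str) -> Dict[str, str]:
    """Map each alias introduced by one PHP `use` statement to its FQN.

    Hypothetical helper for illustration only.
    """
    body = statement.strip().rstrip(";").strip()
    if body.startswith("use "):
        body = body[4:].strip()

    if "{" in body:
        # Group syntax: the text before "{" is a shared prefix.
        prefix, _, rest = body.partition("{")
        members = rest.rstrip("}").split(",")
        clauses = [prefix + m.strip() for m in members if m.strip()]
    else:
        # Simple syntax, possibly with several comma-separated imports.
        clauses = [c.strip() for c in body.split(",") if c.strip()]

    use_map: Dict[str, str] = {}
    for clause in clauses:
        fqn, _, alias = clause.partition(" as ")
        fqn = fqn.strip().lstrip("\\")
        # Without an explicit alias, the last name segment is the alias.
        alias = alias.strip() or fqn.rsplit("\\", 1)[-1]
        use_map[alias] = fqn
    return use_map


simple = expand_use(r"use App\Services\UserService as UserSvc;")
group = expand_use(r"use App\{User, Post};")
```

Here `simple` holds one entry keyed by `UserSvc`, while `group` holds one entry per member of the braces, each keyed by its short name.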
Example Flow:
Input: namespace "App\Models", use "Illuminate\Database\Model as BaseModel"
resolve("BaseModel") → "Illuminate\Database\Model"
resolve("BaseModel\Factory") → "Illuminate\Database\Model\Factory"
resolve("\stdClass") → "stdClass"
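The example flow above can be reproduced with a small stand-in class. This is a sketch under assumptions: the class name `NamespaceResolverSketch` and the exact resolution order are illustrative, and only the method names and the `current_namespace` / `use_map` fields come from this document.

```python
from typing import Dict, Optional


class NamespaceResolverSketch:
    """Illustrative stand-in for the documented NamespaceResolver."""

    def __init__(self) -> None:
        self.current_namespace: str = ""
        self.use_map: Dict[str, str] = {}

    def register_namespace(self, ns: str) -> None:
        self.current_namespace = ns.strip("\\")

    def register_use(self, fqn: str, alias: Optional[str] = None) -> None:
        fqn = fqn.strip("\\")
        # Default alias is the last segment of the imported name.
        self.use_map[alias or fqn.rsplit("\\", 1)[-1]] = fqn

    def resolve(self, name: str) -> str:
        # 1. A leading backslash means the name is already fully qualified.
        if name.startswith("\\"):
            return name.lstrip("\\")
        # 2. If the first segment is an alias, expand it.
        first, sep, rest = name.partition("\\")
        if first in self.use_map:
            return self.use_map[first] + sep + rest
        # 3. Otherwise the name is relative to the current namespace.
        if self.current_namespace:
            return self.current_namespace + "\\" + name
        return name


resolver = NamespaceResolverSketch()
resolver.register_namespace(r"App\Models")
resolver.register_use(r"Illuminate\Database\Model", "BaseModel")
```

With this setup, `resolver.resolve("BaseModel")` expands the alias, `resolver.resolve(r"\stdClass")` drops the leading backslash, and an unknown name such as `User` is placed under `App\Models`.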
Responsibility (TreeSitterPHPAnalyzer): Parses PHP files using tree-sitter and extracts nodes (classes, functions, methods) and their dependency relationships.
Key Methods:
- __init__(file_path, content, repo_path): Initialize with file content
- _analyze(): Main entry point for parsing and analysis (three-pass approach)
- _is_template_file(): Determine if file should be skipped (Blade, Twig, PHTML patterns)
- _extract_namespace_info(node): Extract namespace and use declarations (First Pass)
- _extract_use_statement(node): Parse complex use statements including group syntax
- _extract_nodes(node, lines, parent_class): Extract node definitions (Second Pass)
- _extract_relationships(node): Extract all dependency relationships (Third Pass)
- _add_use_relationships(node): Add relationships for use statements
- Relationship extraction handles: extends, implements, new, static calls, property promotion
- _get_component_id(name, parent_class): Generate unique component identifier
- _get_relative_path(): Calculate relative path from repository root
- _find_containing_class_name(node): Traverse parent nodes to find enclosing class
- _extract_parameters(node): Extract function/method parameter list
- _extract_base_classes(node): Get parent classes and interfaces
- _get_preceding_docstring(node, lines): Extract PHPDoc comments
- _is_primitive(type_name): Check if type is built-in PHP type
Data Structures:
- nodes: List[Node] - Extracted structural elements
- call_relationships: List[CallRelationship] - Extracted dependency relationships
- namespace_resolver: NamespaceResolver - Instance for name resolution
- _top_level_nodes: Dict[str, Node] - Cache of extracted top-level definitions
The analyzer uses a three-pass approach to ensure namespace and use statements are resolved before processing dependencies:
Pass 1: Extract Namespace Info
├─ Walk AST and find namespace_definition nodes
├─ Register current namespace with NamespaceResolver
├─ Find and register all use statements
└─ Build use_map for later name resolution
Pass 2: Extract Nodes
├─ Walk AST and identify structural elements
├─ For each: class, interface, trait, enum, function, method
├─ Create Node objects with metadata (docstring, parameters, base_classes)
└─ Cache top-level nodes in _top_level_nodes
Pass 3: Extract Relationships
├─ Walk AST and identify dependency patterns:
│ ├─ Use statements (imports)
│ ├─ Class inheritance (extends)
│ ├─ Interface implementation (implements)
│ ├─ Object creation (new)
│ ├─ Static method calls (::)
│ └─ Property promotion (PHP 8+)
├─ For each relationship, resolve names using namespace_resolver
└─ Create CallRelationship objects
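To make the ordering of the three passes concrete, here is a toy version that runs over a hand-built tree instead of a tree-sitter AST. Everything in it (`ToyNode`, `walk`, `analyze_toy`, and the node kinds) is hypothetical and chosen for illustration. The point it shows is that pass 3 can resolve names only because pass 1 has already filled the use map.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class ToyNode:
    kind: str                      # "program", "namespace", "use", "class", ...
    text: str = ""                 # the name carried by the node
    children: List["ToyNode"] = field(default_factory=list)


def walk(node: ToyNode) -> Iterator[ToyNode]:
    """Yield every node in the tree, parents before children."""
    yield node
    for child in node.children:
        yield from walk(child)


def analyze_toy(root: ToyNode) -> Tuple[List[str], List[Tuple[str, str]]]:
    # Pass 1: collect the namespace and the alias -> FQN map.
    namespace = ""
    use_map: Dict[str, str] = {}
    for n in walk(root):
        if n.kind == "namespace":
            namespace = n.text
        elif n.kind == "use":
            use_map[n.text.rsplit("\\", 1)[-1]] = n.text

    def resolve(name: str) -> str:
        if name in use_map:
            return use_map[name]
        return namespace + "\\" + name if namespace else name

    # Pass 2: collect definitions (only classes in this toy version).
    class_nodes = [n for n in walk(root) if n.kind == "class"]
    classes = [resolve(n.text) for n in class_nodes]

    # Pass 3: collect (caller, callee) pairs using the completed use_map.
    relationships: List[Tuple[str, str]] = []
    for cls in class_nodes:
        for n in walk(cls):
            if n.kind in ("extends", "new"):
                relationships.append((resolve(cls.text), resolve(n.text)))
    return classes, relationships


tree = ToyNode("program", children=[
    ToyNode("namespace", r"App\Models"),
    ToyNode("use", r"Illuminate\Database\Model"),
    ToyNode("class", "User", [ToyNode("extends", "Model"),
                              ToyNode("new", "Repository")]),
])
classes, relationships = analyze_toy(tree)
```

In this run `Model` resolves through the use map, while `Repository` has no import and therefore falls back to the current namespace.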
Components and their roles:
- tree-sitter-php Parser: Parses PHP source code into AST
- NamespaceResolver: Maintains namespace context and use statement mappings
- TreeSitterPHPAnalyzer: Main orchestrator that:
  - Uses tree-sitter-php to parse files
  - Uses NamespaceResolver to resolve qualified names
  - Extracts Node objects from AST
  - Extracts CallRelationship objects from AST
Output: Tuple of (List[Node], List[CallRelationship])
End to end, data flows through a parsing phase followed by the three passes:
- Parsing Phase: Content → tree-sitter-php Parser → AST Tree
- Namespace Resolution Phase: AST → Extract namespace and use statements → NamespaceResolver setup
- Node Extraction Phase: AST → Extract classes, methods, functions → List[Node]
- Relationship Extraction Phase: AST + NamespaceResolver → Extract dependencies → List[CallRelationship]
- Return: Tuple[List[Node], List[CallRelationship]] to caller
Core Components:
- Node: Represents extracted structural elements (classes, methods, functions)
  - Fields: id, name, component_type, file_path, docstring, parameters, base_classes
- CallRelationship: Represents dependency relationships between components
  - Fields: caller, callee, call_line, is_resolved
- NamespaceResolver: Resolves class names to fully qualified names
  - Fields: current_namespace, use_map
Relationships:
- TreeSitterPHPAnalyzer uses NamespaceResolver for name resolution
- TreeSitterPHPAnalyzer extracts and creates Node objects
- TreeSitterPHPAnalyzer extracts and creates CallRelationship objects
- Nodes and CallRelationships work together to represent the code dependency graph
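The two data models can be pictured as plain dataclasses. The field names below come from the list above; the types, defaults, the `Sketch` suffix, and the `build_adjacency` helper are assumptions made for this example and may differ from the real models in dependency_analyzer_models.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class NodeSketch:
    id: str
    name: str
    component_type: str            # e.g. "class", "method", "function"
    file_path: str
    docstring: Optional[str] = None
    parameters: List[str] = field(default_factory=list)
    base_classes: List[str] = field(default_factory=list)


@dataclass
class CallRelationshipSketch:
    caller: str
    callee: str
    call_line: Optional[int] = None
    is_resolved: bool = False


def build_adjacency(
    relationships: Iterable[CallRelationshipSketch],
) -> Dict[str, Set[str]]:
    """Group callees by caller: the simplest form of a dependency graph."""
    graph: Dict[str, Set[str]] = {}
    for rel in relationships:
        graph.setdefault(rel.caller, set()).add(rel.callee)
    return graph


user = NodeSketch(
    id="app/Models/User::User",
    name="User",
    component_type="class",
    file_path="app/Models/User.php",
    base_classes=["Illuminate.Database.Model"],
)
rels = [
    CallRelationshipSketch(user.id, "Illuminate.Database.Model", 6, True),
    CallRelationshipSketch(user.id, "App.Services.UserService", 8, True),
]
graph = build_adjacency(rels)
```

Nodes supply the vertices and relationships supply the edges, which is why the two lists are returned together.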
The PHP Analyzer identifies and extracts the following dependency relationships:
Use statements:
- Pattern: use Namespace\ClassName;
- Detected: At file/namespace level
- Relationship: File imports external class
- Example: use Illuminate\Database\Model; → File depends on Model class
Class inheritance:
- Pattern: class Child extends Parent { }
- Detected: In class declarations
- Relationship: Child class depends on Parent class
- Example: class User extends Model → User depends on Model
Interface implementation:
- Pattern: class MyClass implements MyInterface { }
- Detected: In class/enum declarations
- Relationship: Implementing class depends on interface
- Example: class Repository implements CacheableInterface → Repository depends on CacheableInterface
Object creation:
- Pattern: $obj = new ClassName();
- Detected: In object_creation_expression nodes
- Relationship: Caller depends on instantiated class
- Example: Within User class: new Repository() → User depends on Repository
Static method calls:
- Pattern: ClassName::staticMethod();
- Detected: In scoped_call_expression nodes
- Relationship: Caller depends on target class
- Example: Within Service: Logger::log() → Service depends on Logger
Property promotion (PHP 8+):
- Pattern: public function __construct(private UserRepository $repo) {}
- Detected: In property_promotion_parameter nodes
- Relationship: Class depends on injected class type
- Example: Service with promoted property of type UserRepository → Service depends on UserRepository
The analyzer maintains a comprehensive set of PHP primitives and built-in types to exclude from dependency relationships:
Built-in Type Keywords: string, int, float, bool, array, object, callable, iterable, mixed, void, null, false, true, never
Special Keywords: self, static, parent
Common Built-in Classes: Exception, Error, Throwable, Closure, Generator, Iterator, stdClass, DateTime, ArrayObject, etc.
These are excluded because they don't represent meaningful external dependencies.
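A filter of this kind might look like the following sketch. The set contents are copied from the lists above (without the entries hidden behind "etc."). The names `PRIMITIVES_SKETCH`, `is_primitive`, and `dependency_types` are hypothetical. Lowercasing before comparison is an assumption, motivated by PHP treating class names and type keywords case-insensitively.

```python
import re
from typing import List

# Lowercased copy of the documented exclusions; the real set is larger.
PRIMITIVES_SKETCH = frozenset({
    "string", "int", "float", "bool", "array", "object", "callable",
    "iterable", "mixed", "void", "null", "false", "true", "never",
    "self", "static", "parent",
    "exception", "error", "throwable", "closure", "generator",
    "iterator", "stdclass", "datetime", "arrayobject",
})


def is_primitive(type_name: str) -> bool:
    """True if the type should not produce a dependency edge."""
    cleaned = type_name.strip().lstrip("?").lstrip("\\").lower()
    return cleaned in PRIMITIVES_SKETCH


def dependency_types(type_hint: str) -> List[str]:
    """Split a union/intersection type hint and keep only class-like parts."""
    kept = []
    for part in re.split(r"[|&]", type_hint):
        name = part.strip(" ()").lstrip("?")
        if name and not is_primitive(name):
            kept.append(name)
    return kept


union = dependency_types("User|Admin|null")
nullable = dependency_types("?int")
```

Splitting on `|` and `&` before filtering means a hint like `User|Admin|null` still yields the two class names, while `?int` yields nothing.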
The analyzer automatically skips template files that contain mostly markup:
Extension Patterns:
- .blade.php (Laravel Blade)
- .phtml (PHP with HTML)
- .twig.php (Twig template)
Directory Patterns:
- views/
- templates/
- resources/views/
This prevents spurious analysis of view files that contain primarily HTML with embedded PHP.
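One way to implement such a check is sketched here. The suffix and directory values are the ones listed above; the function name and the exact matching rules (case-insensitive, directory matched as a whole path segment) are assumptions and may differ from `_is_template_file()`.

```python
TEMPLATE_SUFFIXES = (".blade.php", ".phtml", ".twig.php")
TEMPLATE_DIRECTORIES = ("views", "templates", "resources/views")


def is_template_file(file_path: str) -> bool:
    """Return True for paths that look like view templates."""
    normalized = file_path.replace("\\", "/").lower()
    # Extension patterns: str.endswith accepts a tuple of suffixes.
    if normalized.endswith(TEMPLATE_SUFFIXES):
        return True
    # Directory patterns: match whole path segments only, so that
    # "overviews/" is not mistaken for "views/".
    padded = "/" + normalized
    return any(f"/{directory}/" in padded for directory in TEMPLATE_DIRECTORIES)


checks = {
    path: is_template_file(path)
    for path in (
        "resources/views/home.blade.php",
        "app/views/list.php",
        "app/Models/User.php",
        "src/overviews/Thing.php",
    )
}
```

Running the check before parsing keeps template files from ever reaching tree-sitter.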
The PHP Analyzer is used by higher-level systems:
RepoAnalyzer (orchestrates all language analyzers)
├─ Delegates to: PHP Analyzer, Python Analyzer, JS Analyzer, ...
└─ Collects results from all analyzers
PHP Analyzer produces:
├─ Node objects (classes, methods, functions)
└─ CallRelationship objects (dependencies)
DependencyGraphBuilder (consumes analyzer output)
├─ Reads: Node and CallRelationship objects
└─ Creates: Repository with complete dependency graph
AnalysisService returns:
└─ AnalysisResult with full codebase dependency map
User Request (via CLI/Web)
↓
RepoAnalyzer (dependency_analysis_services)
├─ For each PHP file:
│ ├─ Read file content
│ ├─ Call: analyze_php_file(path, content, repo_path)
│ ├─ PHP Analyzer processes it
│ │ ├─ Extracts nodes (classes, functions, methods)
│ │ ├─ Extracts relationships (dependencies)
│ │ └─ Returns: (List[Node], List[CallRelationship])
│ └─ Store results
│
├─ Aggregate all language results
└─ Return to:
DependencyGraphBuilder
↓
Creates dependency graph structure
↓
AnalysisService (dependency_analysis_services)
↓
Returns AnalysisResult with full dependency map
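The per-file loop in this flow can be sketched as follows. This is not RepoAnalyzer's code: `analyze_repo` and the extension-to-analyzer mapping are hypothetical, and any callable with the `(file_path, content, repo_path)` signature of `analyze_php_file` can be plugged in.

```python
import os
from typing import Callable, Dict, List, Tuple

# Same call shape as analyze_php_file(file_path, content, repo_path).
Analyzer = Callable[[str, str, str], Tuple[list, list]]


def analyze_repo(repo_path: str, analyzers: Dict[str, Analyzer]) -> Tuple[list, list]:
    """Walk a repository, dispatch files by extension, aggregate results."""
    all_nodes: List = []
    all_relationships: List = []
    for dirpath, _dirnames, filenames in os.walk(repo_path):
        for filename in sorted(filenames):
            extension = os.path.splitext(filename)[1].lower()
            analyzer = analyzers.get(extension)
            if analyzer is None:
                continue  # no analyzer registered for this file type
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as handle:
                content = handle.read()
            nodes, relationships = analyzer(path, content, repo_path)
            all_nodes.extend(nodes)
            all_relationships.extend(relationships)
    return all_nodes, all_relationships


def stub_php_analyzer(file_path: str, content: str, repo_path: str):
    """Placeholder standing in for analyze_php_file in this sketch."""
    return [os.path.relpath(file_path, repo_path)], []
```

A call such as `analyze_repo("/path/to/project", {".php": analyze_php_file})` would then produce the combined lists that a graph builder consumes.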
# Main entry point for the PHP Analyzer
def analyze_php_file(
    file_path: str,         # Path to PHP file
    content: str,           # File contents
    repo_path: str = None   # Repository root for relative paths
) -> Tuple[List[Node], List[CallRelationship]]:
    """
    Analyze a PHP file and extract nodes and call relationships.

    Returns:
        Tuple of (extracted_nodes, dependency_relationships)
    """
    analyzer = TreeSitterPHPAnalyzer(file_path, content, repo_path)
    return analyzer.nodes, analyzer.call_relationships
The analyzer handles PHP's namespace system:
// File: app/Models/User.php
namespace App\Models;
use Illuminate\Database\Model;
use App\Services\UserService as UserSvc;
class User extends Model {
    // ...
}
Resolution Process:
- Registers namespace: App\Models
- Registers use: Model → Illuminate\Database\Model
- Registers use: UserSvc → App\Services\UserService
- When encountering extends Model:
  - Resolves Model to Illuminate\Database\Model
  - Creates relationship to Illuminate.Database.Model
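The last step turns a backslash-separated PHP name into the dot-separated form used in relationship targets. A conversion along these lines would do it. The helper name `to_dotted` is hypothetical, and the real analyzer may normalize names differently.

```python
def to_dotted(fqn: str) -> str:
    """Convert a PHP fully qualified name to dot-separated form."""
    # Dropping empty segments also removes a leading backslash.
    return ".".join(part for part in fqn.split("\\") if part)


callee = to_dotted(r"Illuminate\Database\Model")
```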
// Fully qualified
new \Some\Namespace\Class(); // Resolved as-is
// Alias
use Some\Namespace\Class as Alias;
new Alias(); // Resolved via use_map
// Partial qualified (first component aliased)
use Some\Namespace;
new Namespace\SubClass(); // Resolved by expanding alias
// Unqualified (namespace prepended)
namespace My\App;
new LocalClass(); // Resolved as My\App\LocalClass
The analyzer supports PHP 8+ features:
Constructor Property Promotion:
class Service {
    public function __construct(
        private UserRepository $repo, // Type hint creates dependency
        private Logger $logger
    ) {}
}
Named Types (PHP 7.4+):
public function getUserById(int $id): User {
    // Extracts User type as dependency
}
Union Types (PHP 8.0+) and Intersection Types (PHP 8.1+):
public function process(User|Admin $entity): void {}
MAX_RECURSION_DEPTH = 100
# Used in all recursive methods:
def _extract_nodes(self, node, lines, depth=0, parent_class=None):
    if depth > MAX_RECURSION_DEPTH:
        logger.warning(f"Max recursion depth reached in {self.file_path}")
        return
    # ... process node ...
    for child in node.children:
        self._extract_nodes(child, lines, depth + 1, parent_class)

def _analyze(self):
    try:
        # Parse and analyze
        ...
    except RecursionError:
        logger.warning(f"Max recursion depth exceeded in {self.file_path}")
    except Exception as e:
        logger.error(f"Error parsing PHP file {self.file_path}: {e}")
- Primitive type checking: Excludes analysis of built-in types
- Template file detection: Skips non-code files
- Node validation: Only creates nodes when both type and name are present
- Relationship validation: Filters out self-references and primitives
- Early template file detection: Skip analysis before parsing for non-code files
- Single tree-sitter parse: One parse per file (not multiple)
- Lazy docstring extraction: Only when specifically needed
- Caching of top-level nodes: _top_level_nodes dict for quick lookup
- String normalization: Backslash normalization once at resolution time
- Time Complexity: O(n) where n = AST node count (linear traversal with bounded depth)
- Space Complexity: O(m) where m = number of extracted nodes and relationships
- Handles PHP files efficiently via tree-sitter (C/C++ based)
- Bounded recursion depth prevents stack issues
- No external API calls or I/O during analysis
- Suitable for analyzing large codebases with thousands of PHP files
Extend the PHP_PRIMITIVES set to exclude additional types from dependency relationships:
PHP_PRIMITIVES: Set[str] = {
    "string", "int", "float", "bool", "array", "object",
    # ... add custom framework-specific types to exclude
}
Modify TEMPLATE_DIRECTORIES to skip additional directories:
TEMPLATE_DIRECTORIES: Set[str] = {
    "views", "templates", "resources/views",
    # Add custom pattern: "admin/templates"
}
Adjust MAX_RECURSION_DEPTH for deeply nested structures:
MAX_RECURSION_DEPTH = 100  # Increase if needed (default: 100)
- dependency_analysis_services: Orchestrates all language-specific analyzers
- language_analyzers: Overview of all language-specific analyzers (Python, JavaScript, TypeScript, Java, C, C++, C#, Kotlin)
- dependency_analyzer_models: Core data models (Node, CallRelationship, Repository)
- dependency_graph_construction: Constructs dependency graphs from extracted nodes
- dependency_analyzer_utils: Shared logging and utilities
from codewiki.src.be.dependency_analyzer.analyzers.php import analyze_php_file
# Raw string: PHP namespace backslashes (e.g. \U in \UserService) would
# otherwise be parsed as Python escape sequences and fail to compile.
php_code = r"""
<?php
namespace App\Models;
use Illuminate\Database\Model;
use App\Services\UserService;
class User extends Model {
    public function __construct(
        private UserService $service
    ) {}
}
"""
file_path = "app/Models/User.php"
repo_path = "/home/project"
# Argument order follows the signature: (file_path, content, repo_path)
nodes, relationships = analyze_php_file(file_path, php_code, repo_path)
# nodes contains:
# - Node(name="User", component_type="class", ...)
# relationships contains:
# - CallRelationship(caller="app/Models/User::User",
#                    callee="Illuminate.Database.Model", ...)
# - CallRelationship(caller="app/Models/User",
#                    callee="App.Services.UserService", ...)
from codewiki.src.be.dependency_analyzer.analysis.analysis_service import AnalysisService
service = AnalysisService()
result = service.analyze("/path/to/project", language="php")
# result.nodes: all PHP nodes found
# result.relationships: all PHP dependencies
# result.call_graph: complete dependency graph
The PHP Analyzer extracts structural and dependency information from PHP codebases. Using a three-pass analysis strategy and namespace resolution, it identifies classes, functions, methods, and their dependency relationships, handles PHP-specific features such as group use statements and constructor property promotion, and filters out template files and built-in types. Its output of Node and CallRelationship objects feeds the broader dependency analysis system.