Refactor Analyzer is a code analysis tool that finds similar code regions across your codebase using Locality-Sensitive Hashing (LSH). It surfaces refactoring opportunities by detecting duplicate or near-duplicate code that could be extracted into shared functions or modules.
The analyzer scans the entire codebase for regions of code that share similar patterns or functionality. It breaks each line into k-shingles (character n-grams) and summarizes them as MinHash signatures, which lets it detect both exact and near-exact duplicates across different files efficiently.
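To make the shingling and MinHash idea concrete, here is a minimal sketch. It is not the tool's actual implementation: the function names, the whitespace normalization, the choice of 64 hash functions, and the use of salted `blake2b` as the hash family are all assumptions made for illustration.

```python
import hashlib
from typing import List, Set


def shingles(line: str, k: int = 5) -> Set[str]:
    """Return the set of k-character substrings of a whitespace-normalized line."""
    text = " ".join(line.split())  # assumed normalization: collapse whitespace
    if len(text) <= k:
        return {text} if text else set()
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Exact Jaccard similarity: shared shingles / all shingles (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def _seeded_hash(seed: int, shingle: str) -> int:
    """Deterministic 64-bit hash; each seed behaves like a separate hash function."""
    digest = hashlib.blake2b(
        shingle.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items: Set[str], num_hashes: int = 64) -> List[int]:
    """For each seeded hash function, keep the minimum hash value over the set."""
    if not items:
        raise ValueError("cannot build a signature for an empty shingle set")
    return [min(_seeded_hash(seed, s) for s in items) for seed in range(num_hashes)]


def estimate_similarity(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of positions where two signatures agree (estimates Jaccard)."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


if __name__ == "__main__":
    a = shingles("total = price * quantity + tax")
    b = shingles("total = price * quantity + fee")
    print("exact Jaccard:   ", round(jaccard(a, b), 3))
    print("MinHash estimate:", estimate_similarity(minhash_signature(a), minhash_signature(b)))
```

The signature comparison only approximates the exact Jaccard value. More hash functions tighten the estimate at the cost of more work per line.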
- Codebase Preprocessing: Reads and normalizes all files in your project directory
- Shingling: Converts each line of code into k-shingles (5-character substrings) for comparison
- LSH Algorithm: Uses MinHash signatures and banding techniques to efficiently find similar lines
- Similarity Graph: Creates connections between similar lines based on configurable Jaccard similarity thresholds
- Region Expansion: Uses a sliding-window approach to identify contiguous blocks of similar code
- Results Generation: Outputs ranked similar regions that are candidates for refactoring
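The banding and region-expansion steps above can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: `lsh_candidate_pairs` and `expand_regions` are hypothetical names, signatures are assumed to be keyed by `(file, line_number)` tuples, and the region step grows diagonal runs of matched lines, which is a simple stand-in for the sliding-window logic that this README does not spell out.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

LineKey = Tuple[str, int]  # assumed key shape: (file path, line number)


def lsh_candidate_pairs(
    signatures: Dict[LineKey, Sequence[int]], bands: int
) -> Set[Tuple[LineKey, LineKey]]:
    """Return cross-file pairs whose signatures agree on every row of some band."""
    buckets = defaultdict(set)
    for key, sig in signatures.items():
        if len(sig) % bands != 0:
            raise ValueError("signature length must be divisible by the band count")
        rows = len(sig) // bands
        for band in range(bands):
            chunk = tuple(sig[band * rows:(band + 1) * rows])
            buckets[(band, chunk)].add(key)  # identical chunks share a bucket

    pairs = set()
    for keys in buckets.values():
        for left, right in combinations(sorted(keys), 2):
            if left[0] != right[0]:  # keep only pairs from different files
                pairs.add((left, right))
    return pairs


def expand_regions(
    matches: Set[Tuple[int, int]], min_length: int
) -> List[Tuple[int, int, int]]:
    """Group matched line pairs (i, j) between two files into contiguous runs.

    A run is (i, j), (i + 1, j + 1), ... and is reported as
    (start_i, start_j, length) when it spans at least min_length lines.
    """
    regions = []
    for i, j in sorted(matches):
        if (i - 1, j - 1) in matches:
            continue  # already covered by a run that started earlier
        length = 1
        while (i + length, j + length) in matches:
            length += 1
        if length >= min_length:
            regions.append((i, j, length))
    return regions


if __name__ == "__main__":
    sigs = {("a.py", 1): [1, 2, 3, 4], ("b.py", 7): [1, 2, 9, 9], ("c.py", 3): [5, 6, 7, 8]}
    print(lsh_candidate_pairs(sigs, bands=2))
    print(expand_regions({(10, 40), (11, 41), (12, 42), (20, 5)}, min_length=3))
```

Candidate pairs from banding are only candidates. In a full pipeline they would still be checked against the line-level similarity threshold before being passed to region expansion.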
- ✅ Efficient Similarity Detection: LSH algorithm enables fast comparison across large codebases
- ✅ Configurable Thresholds: Adjust sensitivity for both line-level and region-level similarity
- ✅ Cross-File Analysis: Only identifies similarities between different files (avoids false positives from same-file duplicates)
- ✅ Region-Based Results: Groups similar lines into meaningful code blocks rather than individual line matches
- ✅ Detailed Output: Provides exact line numbers and code content for each similar region
- ✅ JSON Export: Machine-readable results for integration with other tools
- ✅ Progress Tracking: Real-time feedback during analysis with step-by-step progress indicators
- ✅ CLI: Command-line tool with multiple commands and options
- Python 3.6 or higher
- Required dependencies (see requirements.txt)
- Clone the repository:
git clone https://github.com/cerredz/Codebase-Refactor-Detection.git
cd Codebase-Refactor-Detection
- Install dependencies:
pip install -r requirements.txt
The refactor analyzer includes a CLI with the following commands:
refactor --run
refactor --config
refactor --report
The report command provides a formatted, easy-to-read output of all similar regions found, including:
- 📁 File pairs with similar regions
- 📂 Full file paths
- 🔸 Actual code content for each similar region
- Visual separators for easy reading
Running the CLI without any arguments defaults to the --run command:
refactor
You can still run the analyzer using the original method:
python -m main
This will:
- Read the configuration from config.json
- Analyze the codebase specified in your configuration
- Generate similarity analysis results
- Save findings to results.json
Customize the analysis behavior by editing config.json:
{
  "region_length": 10,
  "candidate_threshold": 0.7,
  "line_threshold": 0.8
}
Configuration Parameters:
- region_length: Minimum number of consecutive similar lines required to constitute a refactorable region
- candidate_threshold: LSH candidate selection threshold (0.0-1.0); controls how selective the initial similarity detection is (lower = more candidates)
- line_threshold: Final similarity threshold (0.0-1.0) for deciding whether two lines are truly similar (higher = stricter)
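If you script around the analyzer, it can help to check config.json before a run. The sketch below is a hypothetical helper, not part of the tool: `load_config`, `validate_config`, and the fallback to default values are assumptions, and the defaults simply mirror the example values above. Note that Python's `json` module only accepts strict JSON, so a file containing `//` comments would fail to parse.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Assumed defaults, copied from the example configuration in this README.
DEFAULTS: Dict[str, Any] = {
    "region_length": 10,
    "candidate_threshold": 0.7,
    "line_threshold": 0.8,
}


def validate_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Raise ValueError if any of the three parameters is out of range."""
    length = config["region_length"]
    if isinstance(length, bool) or not isinstance(length, int) or length < 1:
        raise ValueError("region_length must be a positive integer")
    for name in ("candidate_threshold", "line_threshold"):
        value = config[name]
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0.0 <= value <= 1.0:
            raise ValueError(name + " must be a number between 0.0 and 1.0")
    return config


def load_config(path: str = "config.json") -> Dict[str, Any]:
    """Read config.json if it exists, fill in missing keys, and validate the result."""
    config = dict(DEFAULTS)
    file = Path(path)
    if file.is_file():
        config.update(json.loads(file.read_text(encoding="utf-8")))
    return validate_config(config)


if __name__ == "__main__":
    print(load_config())
```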
The tool generates results.json containing:
- Regions: Similar code blocks with exact line content
- File References: Source file paths for each similar region
- Line Numbers: Precise locations of similar code blocks
- Similarity Scores: Quantified similarity between regions
Use the --report command for a human-readable formatted output of the results.
🚧 Coming Soon:
- More CLI Commands: More in-depth CLI commands that give you more control over testing
- REST API: HTTP API for integration with IDEs and CI/CD pipelines
- Web Dashboard: Interactive web interface for visualizing refactoring opportunities
- IDE Plugins: Direct integration with popular code editors
Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.