A machine learning-based system for detecting forged signatures using image processing and deep learning techniques. This project helps in verifying the authenticity of handwritten signatures by analyzing key geometric and statistical features.
- Overview
- Features
- Dataset
- Installation
- Project Structure
- How It Works
- Usage
- Technical Details
- Results
- Future Improvements
This project implements an automated signature verification system that can distinguish between genuine and forged signatures. The system uses:
- Image Processing: Preprocessing signatures to extract meaningful features
- Feature Extraction: Calculating 9 geometric and statistical features from signatures
- Deep Learning: Using a Multi-Layer Perceptron (MLP) neural network for classification
- TensorFlow: Implementing the neural network model with TensorFlow 1.x
- Preprocesses signature images (RGB → Grayscale → Binary)
- Extracts 9 distinctive features from each signature
- Trains neural network models for each person's signature
- Classifies signatures as genuine or forged
- Handles individual signature verification
The system extracts the following features from signature images:
- Ratio - Ratio of signature pixels to total image area
- Centroid Y - Vertical position of signature center
- Centroid X - Horizontal position of signature center
- Eccentricity - Measure of how elongated the signature is
- Solidity - Ratio of signature pixels to convex hull pixels
- Skew X - Horizontal skewness of pixel distribution
- Skew Y - Vertical skewness of pixel distribution
- Kurtosis X - Kurtosis (peakedness versus flatness) of the horizontal pixel distribution
- Kurtosis Y - Kurtosis (peakedness versus flatness) of the vertical pixel distribution
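To make the skewness and kurtosis features concrete, here is a minimal NumPy sketch that treats the per-column and per-row ink counts as weighted distributions. The function name `projection_moments` and the use of excess kurtosis (subtracting 3) are assumptions for illustration, not necessarily what `SkewKurtosis()` does.

```python
import numpy as np

def projection_moments(binary):
    """Return (skew_x, skew_y, kurt_x, kurt_y) of the ink distribution.

    `binary` is a 2-D boolean array where True marks a signature pixel.
    Assumes ink is present in at least two distinct rows and columns,
    otherwise the standard deviation is zero and the division fails.
    """
    h, w = binary.shape
    col_counts = binary.sum(axis=0).astype(float)  # ink per column (x projection)
    row_counts = binary.sum(axis=1).astype(float)  # ink per row (y projection)

    def weighted_moments(positions, weights):
        total = weights.sum()
        mean = (positions * weights).sum() / total
        std = np.sqrt((((positions - mean) ** 2) * weights).sum() / total)
        skew = (((positions - mean) ** 3) * weights).sum() / (total * std ** 3)
        # Excess kurtosis: 0 for a normal distribution (an assumption here).
        kurt = (((positions - mean) ** 4) * weights).sum() / (total * std ** 4) - 3.0
        return skew, kurt

    skew_x, kurt_x = weighted_moments(np.arange(w), col_counts)
    skew_y, kurt_y = weighted_moments(np.arange(h), row_counts)
    return skew_x, skew_y, kurt_x, kurt_y
```

A symmetric blob of ink gives zero skewness on both axes; ink bunched toward one side of the image pushes the skewness away from zero.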
Note: The original signature images are not included in this repository but are available via Google Drive.
Drive Link: https://drive.google.com/drive/folders/1KcAvPwbwMEPS6yembqIJgoZG8Tm7m8ya?usp=sharing
- 39 individuals (Person IDs: 001-039)
- For each person:
  - 10 genuine signatures (training: 7, testing: 3)
  - 10 forged signatures (training: 7, testing: 3)
- Total samples: 780 signatures (390 genuine + 390 forged)
The project includes pre-generated feature files:
- Training/ - Training CSV files for each person (training_001.csv to training_039.csv)
- Testing/ - Testing CSV files for each person (testing_001.csv to testing_039.csv)
- Each training CSV contains 14 samples (7 genuine + 7 forged) with 9 features plus a classification label; by the split above, each testing CSV holds 6 samples (3 genuine + 3 forged)
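The file naming scheme above can be enumerated programmatically. This is a small sketch; the helper name `feature_csv_paths` and the `root` parameter are hypothetical, and only the folder and file names come from the description above.

```python
from pathlib import Path

NUM_PERSONS = 39  # person IDs 001-039, per the dataset description

def feature_csv_paths(root="."):
    """Return {person_id: (training_csv, testing_csv)} for all persons."""
    root = Path(root)
    paths = {}
    for n in range(1, NUM_PERSONS + 1):
        pid = f"{n:03d}"  # zero-padded to three digits, e.g. "007"
        paths[pid] = (
            root / "Training" / f"training_{pid}.csv",
            root / "Testing" / f"testing_{pid}.csv",
        )
    return paths

paths = feature_csv_paths()
```

Keeping the person ID as a zero-padded string avoids mismatches between an ID typed as `1` and a file named `training_001.csv`.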
- Python 3.7+
- TensorFlow 1.x (or TensorFlow 2.x with compatibility mode)
- NumPy
- Pandas
- Matplotlib
- SciPy
- scikit-image
Install the required packages:

    pip install numpy pandas matplotlib scipy scikit-image tensorflow keras

Or for TensorFlow 2.x compatibility:

    pip install tensorflow==2.x

Note: The code uses TensorFlow 1.x syntax. Under TensorFlow 2.x it runs through the tensorflow.compat.v1 module with tf.disable_v2_behavior().
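For reference, the compatibility import usually looks like the sketch below. The `try`/`except` wrapper and the `tf = None` fallback are additions for illustration, so the snippet also runs where TensorFlow is not installed; the project code may import TensorFlow differently.

```python
try:
    # TensorFlow 2.x ships the 1.x-style API under tensorflow.compat.v1.
    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()  # restores graph mode, placeholders, and sessions
except ImportError:
    tf = None  # TensorFlow is not installed in this environment
```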

    signature_forgery_detection/
    │
    ├── Code_sign.py              # Main Python implementation
    ├── Main_Code.ipynb           # Jupyter notebook version (Google Colab)
    ├── README.md                 # This file
    │
    ├── Training/                 # Training feature files
    │   ├── training_001.csv
    │   ├── training_002.csv
    │   └── ... (training_039.csv)
    │
    ├── Testing/                  # Testing feature files
    │   ├── testing_001.csv
    │   ├── testing_002.csv
    │   └── ... (testing_039.csv)
    │
    ├── TestFeatures/             # Test feature extraction
    │   └── testcsv.csv
    │
    └── TestFeatures/             # Temporary test files

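
    import numpy as np
    import matplotlib.image as mpimg

    def preproc(path):
        # rgbgrey() and greybin() are defined in Code_sign.py
        img = mpimg.imread(path)
        # Convert RGB to grayscale
        grey = rgbgrey(img)
        # Convert to binary using Otsu's threshold
        binimg = greybin(grey)
        # Crop to signature boundaries (slice ends are exclusive)
        r, c = np.where(binimg == 1)
        signimg = binimg[r.min():r.max(), c.min():c.max()]
        return signimg

Steps: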
- Reads RGB image
- Converts to grayscale
- Applies Gaussian filter (blur_radius=0.8) for noise reduction
- Uses Otsu's threshold for binarization
- Crops to signature bounds
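The steps above can be sketched end to end with NumPy alone. This is an illustrative stand-in, not the project's code: the hand-written `gaussian_blur` and `otsu_threshold` replace whatever library calls `greybin()` makes, all function names here are hypothetical, and the crop includes the last ink row and column.

```python
import numpy as np

def to_grey(rgb):
    """Average the colour channels of an (H, W, 3+) array."""
    return rgb[..., :3].astype(float).mean(axis=2)

def gaussian_blur(grey, sigma=0.8):
    """Separable Gaussian blur built from 1-D convolutions."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(grey, radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def otsu_threshold(grey, bins=256):
    """Return the threshold that maximises the between-class variance."""
    hist, edges = np.histogram(grey, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    hist = hist.astype(float)
    w0 = np.cumsum(hist)                  # pixel count at or below each bin
    w1 = w0[-1] - w0                      # pixel count above each bin
    sum0 = np.cumsum(hist * centers)
    mu0 = sum0 / np.maximum(w0, 1e-12)    # mean of the darker class
    mu1 = (sum0[-1] - sum0) / np.maximum(w1, 1e-12)  # mean of the brighter class
    between = w0 * w1 * (mu0 - mu1) ** 2
    return centers[np.argmax(between)]

def preprocess(rgb, sigma=0.8):
    """Grayscale -> blur -> Otsu binarisation -> crop to the ink bounding box."""
    grey = gaussian_blur(to_grey(rgb), sigma)
    ink = grey < otsu_threshold(grey)     # dark pixels are treated as ink
    rows, cols = np.nonzero(ink)
    return ink[rows.min():rows.max() + 1, cols.min():cols.max() + 1]
```

Blurring before thresholding suppresses isolated noisy pixels, which would otherwise stretch the bounding box and distort every feature computed from the crop.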
The system extracts 9 features:
- Ratio: Pixel density in cropped signature
- Centroid: Normalized center coordinates (x, y)
- Eccentricity & Solidity: Using scikit-image regionprops
- Skewness & Kurtosis: Statistical moments of pixel projections
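As a rough illustration of the shape features, the sketch below computes the ratio, the normalized centroid, and a moment-based eccentricity from a boolean ink mask. The function name `shape_features` is hypothetical, the eccentricity is derived from the pixel covariance and may differ slightly from the value scikit-image's `regionprops` reports, and solidity is omitted because it needs a convex hull.

```python
import numpy as np

def shape_features(ink):
    """Return (ratio, centroid_y, centroid_x, eccentricity) for a boolean mask."""
    h, w = ink.shape
    rows, cols = np.nonzero(ink)
    ratio = rows.size / (h * w)                 # ink pixels / total pixels
    cy, cx = rows.mean() / h, cols.mean() / w   # centroid normalised by image size

    # Eccentricity of the ellipse with the same second moments as the ink.
    cov = np.cov(np.vstack([rows, cols]), bias=True)
    small, large = np.linalg.eigvalsh(cov)      # eigenvalues in ascending order
    ecc = float(np.sqrt(1.0 - small / large)) if large > 0 else 0.0
    return ratio, cy, cx, ecc
```

An eccentricity near 0 means the ink is spread about equally in all directions, while values near 1 indicate a long, thin signature.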
Neural Network Structure:
- Input: 9 features
- Hidden Layer 1: 7 neurons (tanh activation)
- Hidden Layer 2: 10 neurons
- Hidden Layer 3: 30 neurons
- Output: 2 classes (genuine/forged)
Training Parameters:
- Learning Rate: 0.001
- Epochs: 1000 (or until loss < 0.0001)
- Optimizer: Adam
- Loss Function: Mean Squared Difference
- Activation: Softmax for output
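The loss and stopping rule listed above can be written out in a few lines of NumPy. This is a sketch under assumptions: the helper names are hypothetical, labels are taken to be one-hot encoded with 0 = forged and 1 = genuine, and the optimizer step itself is left out.

```python
import numpy as np

def one_hot(labels, n_classes=2):
    """Turn integer labels (0 = forged, 1 = genuine) into one-hot rows."""
    return np.eye(n_classes)[np.asarray(labels)]

def mean_squared_difference(pred, target):
    """Average of the element-wise squared differences."""
    return float(np.mean((np.asarray(pred) - np.asarray(target)) ** 2))

def should_stop(epoch, loss, max_epochs=1000, tol=1e-4):
    """Stop after max_epochs or once the loss drops below tol."""
    return epoch >= max_epochs or loss < tol

targets = one_hot([1, 0])
preds = np.array([[0.1, 0.9], [0.8, 0.2]])
loss = mean_squared_difference(preds, targets)
```

With only 14 training samples per person, the loss threshold can be reached well before 1000 epochs, so the early exit mostly serves to cut training short.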
The model outputs a probability distribution over 2 classes:
- Class 0: Forged signature
- Class 1: Genuine signature
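Turning the two-class output into a verdict is a comparison of the two probabilities. In this sketch the function name `verdict` is hypothetical, the returned strings mirror the messages the script prints, and resolving ties as "Forged Image" is an assumption made here to err on the cautious side.

```python
def verdict(probabilities):
    """Map [P(forged), P(genuine)] to the message shown to the user."""
    p_forged, p_genuine = probabilities
    return "Genuine Image" if p_genuine > p_forged else "Forged Image"
```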
Important: Update the file paths in Code_sign.py before running:

    genuine_image_paths = "path/to/genuine/signatures"
    forged_image_paths = "path/to/forged/signatures"

- Generate features from images:

    python Code_sign.py

- The script will:
  - Extract features from all training/testing images
  - Create CSV files in Training/ and Testing/ folders
  - Prompt for person ID and test image path
  - Classify the signature
- Open in Google Colab
- Mount Google Drive
- Update paths for your signature images
- Run cells sequentially
When running the script, you'll be prompted:

    Enter person's id : 001
    Enter path of signature image : path/to/signature.png

Output:
- "Genuine Image" - Signature is authentic
- "Forged Image" - Signature is forged
RGB Image → Grayscale → Gaussian Filter → Binary (Otsu) → Crop → Features
- rgbgrey(): Manual RGB to grayscale conversion
- greybin(): Binarization with noise removal
- Ratio(): Signature pixel density
- Centroid(): Center of mass (normalized)
- EccentricitySolidity(): Shape metrics
- SkewKurtosis(): Statistical distributions
Architecture: 4-layer MLP
- Layer 1: Linear → tanh (feature transformation)
- Layer 2: Linear
- Layer 3: Linear (deep representation)
- Output: Linear → tanh → softmax
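The layer stack described above corresponds to a forward pass like the following NumPy sketch. It is not the TensorFlow graph from `multilayer_perceptron()`: the layer sizes come from this README, while the random-normal weights, zero biases, and function names are assumptions for illustration.

```python
import numpy as np

LAYER_SIZES = [9, 7, 10, 30, 2]  # input, three hidden layers, output

def init_params(seed=0):
    """Random weights and zero biases; the initialisation scheme is an assumption."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(size=(m, n)), np.zeros(n))
            for m, n in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])]

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(x, params):
    """Run a batch of feature rows (N, 9) through the network."""
    (w1, b1), (w2, b2), (w3, b3), (w4, b4) = params
    h1 = np.tanh(x @ w1 + b1)     # Layer 1: linear -> tanh
    h2 = h1 @ w2 + b2             # Layer 2: linear
    h3 = h2 @ w3 + b3             # Layer 3: linear
    out = np.tanh(h3 @ w4 + b4)   # Output: linear -> tanh
    return softmax(out)           # -> softmax over the two classes
```

Because layers 2 and 3 have no nonlinearity, they compose into a single linear map, so the network's expressive power comes from the tanh in the first and last layers.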
Key Functions:
- multilayer_perceptron(): Network definition
- readCSV(): Data loading and preprocessing
- evaluate(): Training and testing
- trainAndTest(): Cross-validation
The system achieves different accuracy levels based on:
- Person-specific signatures
- Quality of input images
- Feature extraction quality
- Training Accuracy: ~95-98%
- Testing Accuracy: ~85-92%
- Varies by signature complexity
- Image Quality: Higher resolution = better features
- Signature Complexity: More distinctive signatures = better detection
- Forgery Skill: Skilled forgeries are harder to detect
- Model Parameters: Learning rate, epochs, network architecture
- Deep Learning Models
  - Implement CNNs for raw image analysis
  - Use Siamese networks for signature comparison
  - Transfer learning from pre-trained models
- Feature Engineering
  - Add texture features (LBP, Gabor filters)
  - Incorporate stroke-level analysis
  - Dynamic time warping for temporal features
- Data Augmentation
  - Rotation, scaling, noise addition
  - Synthetic forgery generation
  - Balanced dataset creation
- User Interface
  - Web-based upload and verification
  - Real-time visualization of features
  - Batch processing capabilities
- Model Improvements
  - Hyperparameter tuning
  - Ensemble methods
  - Attention mechanisms
  - Regularization techniques
- TensorFlow Version: Code uses TensorFlow 1.x syntax
- Hard-coded Paths: File paths need to be updated
- Dataset Dependency: Original images not in repository
- Limited to 39 Persons: Expand dataset for production
| Function | Purpose |
|---|---|
| rgbgrey() | RGB to grayscale conversion |
| greybin() | Grayscale to binary with noise removal |
| preproc() | Complete preprocessing pipeline |
| Ratio() | Extract signature pixel ratio |
| Centroid() | Calculate centroid coordinates |
| EccentricitySolidity() | Extract shape features |
| SkewKurtosis() | Calculate statistical features |
| getFeatures() | Extract all features |
| makeCSV() | Generate feature CSV files |
| testing() | Extract features for test image |
| readCSV() | Load training/testing data |
| multilayer_perceptron() | Define neural network |
| evaluate() | Train and evaluate model |
| trainAndTest() | Cross-validation testing |
- Dataset Required: Download signature images from the provided Google Drive link before running feature extraction
- Update Paths: Modify file paths in the code to match your system
- TensorFlow Version: Ensure compatibility with TensorFlow 1.x or use compatibility mode
- Test Features: The system works best with clean, high-contrast signature images
- Individual Models: Each person requires a separate trained model for best accuracy