# CNN-Based Image Recognition Project
```
CV-Image-Recognition-Using-CNN/
├── cnn_image_recognition/       <- Source code for use in this project.
│   ├── model_cnn_base/          <- Source code for one model's development.
│   │   ├── __init__.py
│   │   ├── model_config.yaml    <- Model configuration (compilation settings, training settings, etc.)
│   │   ├── predict.py           <- Code to run inference with the trained model.
│   │   └── train.py             <- Code to train the model.
│   ├── model_cnn_enhance/
│   │   ├── __init__.py
│   │   ├── model_config.yaml
│   │   ├── predict.py
│   │   └── train.py
│   ├── __init__.py
│   ├── config.py                <- Project default variables and configuration (default directories, MLflow URI, etc.)
│   ├── dataset.py               <- Scripts to download and extract data.
│   ├── evaluation.py            <- Scripts to evaluate the trained models.
│   ├── features.py              <- Scripts to prepare the train, validation, and test datasets.
│   ├── preprocessing.py         <- Scripts to preprocess image data for model training and evaluation.
│   ├── project_config.yaml      <- Default project settings.
│   └── utils.py                 <- Reusable utility functions.
├── data/
│   ├── external/                <- Data from third-party sources.
│   ├── interim/                 <- Intermediate data that has been transformed.
│   ├── processed/               <- The final datasets for modeling (train, validation, test).
│   └── raw/                     <- The original data.
├── models/                      <- Serialized trained models.
│   ├── model_cnn_base/
│   └── model_cnn_enhance/
├── mlruns/                      <- Experiment tracking, artifacts, and model registry.
├── notebooks/                   <- Jupyter notebooks used for EDA, experiments, etc.
│   └── omniglot_cnn.ipynb
├── reports/
│   └── figures/                 <- Generated graphics and figures used in reporting.
├── tests/                       <- Test scripts.
│   └── test_data.py
├── Makefile                     <- Makefile with convenience commands like `make data` or `make train`.
├── pyproject.toml
├── requirements.txt
└── setup.cfg
```
About:
In this project, the selected problem is the recognition and classification of handwritten characters from multiple writing systems and alphabets using a Convolutional Neural Network (CNN). The objective of the model is to identify and correctly classify handwritten characters from the Omniglot dataset, which consists of 1623 different characters across 50 writing systems. This problem is more complex than digit recognition due to the variety of character shapes and styles and the limited amount of data per character class.
A basic CNN model from the TensorFlow tutorials is selected as a starting point, to observe the effectiveness of a simple model before moving to more complex ones. The implementation consists of several key functions to preprocess the Omniglot dataset.
Step 1: Data Preparation
In this step, the Omniglot data is downloaded and extracted from its zip archive. These actions are performed through the data ingestion pipeline, with output written to the local data directory.
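The ingestion step can be sketched roughly as below. The URL and directory layout are illustrative assumptions; the commonly used Omniglot source is the `brendenlake/omniglot` GitHub repository, so verify the link before relying on it.

```python
# Sketch of the data ingestion pipeline: download a zip archive and
# extract it into the local raw data directory.
import urllib.request
import zipfile
from pathlib import Path

def download_and_extract(url: str, raw_dir: Path) -> Path:
    """Download a zip archive (if not cached) and extract it into raw_dir."""
    raw_dir.mkdir(parents=True, exist_ok=True)
    zip_path = raw_dir / url.rsplit("/", 1)[-1]
    if not zip_path.exists():                      # skip re-download if cached
        urllib.request.urlretrieve(url, zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(raw_dir)
    return raw_dir

# Example (assumed URL, verify before use):
# download_and_extract(
#     "https://github.com/brendenlake/omniglot/raw/master/python/images_background.zip",
#     Path("data/raw"),
# )
```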
Step 2: Exploratory Data Analysis and Dataset Split
In this step, EDA is performed on the Omniglot data to understand the class distribution and verify that the dataset is balanced across the available classes for model training. The data is then labelled and split into three sets (train, validation and test) with stratification enabled.
Observation from EDA: no class imbalance; all classes have the same number of samples. Dataset information:
- Number of classes: 1623
- Total number of images: 32460
- No. of samples per class: 20
Data Splitting Configuration:
- Train dataset size: 20774 images (64%)
- Validation dataset size: 5194 images (16%)
- Test dataset size: 6492 images (20%)
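A stratified split with these proportions can be sketched with scikit-learn's `train_test_split`; the two-stage 0.20/0.20 fractions reproduce the 64/16/20 ratio (0.16 / 0.80 = 0.20 of the remainder). Variable names are illustrative.

```python
# Sketch of the stratified 64/16/20 train/validation/test split.
from sklearn.model_selection import train_test_split

def split_dataset(paths, labels, seed=42):
    """Split into train/val/test (64/16/20) with per-class stratification."""
    # First carve out the 20% test set.
    x_rest, x_test, y_rest, y_test = train_test_split(
        paths, labels, test_size=0.20, stratify=labels, random_state=seed
    )
    # 16% of the whole dataset = 20% of the remaining 80%.
    x_train, x_val, y_train, y_val = train_test_split(
        x_rest, y_rest, test_size=0.20, stratify=y_rest, random_state=seed
    )
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)
```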
Step 3: Data Processing
In this step, each image and its label are preprocessed and formatted into a tensor dataset for TensorFlow model training.
Preprocessing steps:
- Image to be decoded as single channel.
- Resize to 28x28 pixels.
- Array value normalized [0,1].
- Reshape to expected tensor format.
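The four preprocessing steps above can be sketched as a single TensorFlow map function; the file format (PNG) and the dataset wiring are assumptions.

```python
# Sketch of the per-image preprocessing: decode as single channel,
# resize to 28x28, normalize to [0, 1], reshape to the expected tensor.
import tensorflow as tf

IMG_SIZE = 28

def preprocess_image(path: tf.Tensor, label: tf.Tensor):
    """Decode, resize, normalize, and reshape one image for training."""
    raw = tf.io.read_file(path)
    img = tf.io.decode_png(raw, channels=1)            # single grayscale channel
    img = tf.image.resize(img, [IMG_SIZE, IMG_SIZE])   # resize to 28x28 pixels
    img = tf.cast(img, tf.float32) / 255.0             # normalize to [0, 1]
    img = tf.reshape(img, [IMG_SIZE, IMG_SIZE, 1])     # expected tensor format
    return img, label

# Illustrative wiring into a tf.data.Dataset for model.fit:
# ds = (tf.data.Dataset.from_tensor_slices((paths, labels))
#       .map(preprocess_image, num_parallel_calls=tf.data.AUTOTUNE)
#       .batch(64).prefetch(tf.data.AUTOTUNE))
```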
Step 4: Model Architecture Design
In this step, the basic CNN classification model from the TensorFlow tutorial is implemented first, to test the effectiveness of a simple model (taken from: https://www.tensorflow.org/tutorials/images/cnn). The model architecture is implemented as sequential layers as below:
- Input Layer: Grayscale images with size 28x28 pixels are accepted.
- 3 Convolutional blocks:
- Block 1: 32 filters of size 3x3, followed by the MaxPooling and BatchNormalization.
- Block 2: 64 filters of size 3x3, followed by MaxPooling and BatchNormalization.
- Block 3: 64 filters of size 3x3, followed by Dropout of 20%.
- Flattening Layer: To convert the 2D features to 1D.
- Two Dense Layers:
- Layer 1: 64 neurons with ReLU activation.
- Layer 2: Output layer with one neuron per class.
On top of the basic model, BatchNormalization layers are added to accelerate convergence, and Dropout layers are added to reduce overfitting. The resulting model is returned when the function `cnn_model` is called.
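A minimal sketch of this architecture follows the bullet list above; details not stated in the text (e.g. ReLU activations, 2x2 pooling windows) are assumptions in line with the referenced TensorFlow tutorial.

```python
# Sketch of the basic CNN described above, returned by cnn_model().
import tensorflow as tf
from tensorflow.keras import layers, models

def cnn_model(num_classes: int = 1623) -> tf.keras.Model:
    model = models.Sequential([
        layers.Input(shape=(28, 28, 1)),          # 28x28 grayscale input
        # Block 1: 32 3x3 filters + MaxPooling + BatchNormalization
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.BatchNormalization(),
        # Block 2: 64 3x3 filters + MaxPooling + BatchNormalization
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.BatchNormalization(),
        # Block 3: 64 3x3 filters + 20% Dropout
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.Dropout(0.2),
        # Classification head: flatten, 64-unit dense, output layer
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes),                # logits, one per class
    ])
    return model
```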
Step 5: Model Training
In this step, all the function blocks are integrated to fetch the tensor datasets, set up the model, and configure training.
- First, the training and validation data are loaded through the crafted functions.
- Next, the CNN model is created, and a stopping criterion is implemented to reduce computational resources.
- Model Compilation configuration:
- Adam optimizer: used to save time and effort experimenting with the learning rate, since Adam dynamically adjusts the learning rate based on historical gradient magnitudes.
- Sparse categorical cross entropy: selected as the loss function for this multiclass classification problem; it minimises the difference between the true and predicted distributions by penalising incorrect predictions. The sparse variant is used because the target variable is provided as integers instead of one-hot encoded vectors.
- To enhance learning, a batch size of 64 is used during model fitting.
- MLflow experiment tracking is used to record the following:
- Model Parameters
- Metrics (Accuracy as general metrics for comparison)
- Model configuration artifacts
- Trained model logging.
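The compilation settings above can be sketched as below. The MLflow experiment name, run layout, and logged keys are illustrative assumptions, not the project's actual configuration.

```python
# Sketch of the model compilation described above.
import tensorflow as tf

def compile_model(model: tf.keras.Model) -> tf.keras.Model:
    model.compile(
        optimizer="adam",  # adaptive learning rate from gradient history
        # Labels are integer class ids, so the sparse variant is used;
        # from_logits=True because the output layer emits raw logits.
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    return model

# Illustrative MLflow tracking around model.fit (names are assumptions):
# import mlflow
# mlflow.set_experiment("omniglot_cnn")
# with mlflow.start_run():
#     mlflow.log_params({"batch_size": 64, "optimizer": "adam"})
#     history = model.fit(train_ds, validation_data=val_ds, epochs=50)
#     mlflow.log_metric("val_accuracy", history.history["val_accuracy"][-1])
#     mlflow.log_artifact("model_config.yaml")
#     mlflow.tensorflow.log_model(model, "model")
```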
Step 6: Model Evaluation
In this step, the trained models are evaluated through plots of the historical data in the MLflow experiment logs from model training.
- Graphs of training vs validation accuracy/loss over epochs are used to check the learning capability and generalisation of the model.
- Validation accuracy across training is also used to compare the learning capability of different models when selecting the final model to test.
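A learning-curve plot of this kind can be sketched as below; the `history` dictionary mirrors what Keras `model.fit` returns (or what the MLflow metric logs hold), and the output path is an assumption.

```python
# Sketch of the training-vs-validation accuracy/loss plot used in evaluation.
import matplotlib
matplotlib.use("Agg")          # headless backend for scripted runs
import matplotlib.pyplot as plt

def plot_learning_curve(history: dict, out_path: str = "learning_curve.png"):
    """Plot train/validation accuracy and loss over epochs, save to out_path."""
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))
    ax_acc.plot(history["accuracy"], label="train")
    ax_acc.plot(history["val_accuracy"], label="validation")
    ax_acc.set(title="Accuracy", xlabel="epoch")
    ax_acc.legend()
    ax_loss.plot(history["loss"], label="train")
    ax_loss.plot(history["val_loss"], label="validation")
    ax_loss.set(title="Loss", xlabel="epoch")
    ax_loss.legend()
    fig.savefig(out_path)
    return fig
```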
Step 7: Model Testing
In this step, two tests are conducted.
- Holdout sample testing: Test dataset is used to test the model prediction performance on unseen data.
- Cross validation: Full training dataset (Train+Validation) is used to perform 5-fold cross validation on the model performance.
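The 5-fold cross validation can be sketched as below; `model_fn` is an assumed factory that builds and compiles a fresh model for each fold, not a function from this project.

```python
# Sketch of stratified 5-fold cross validation on the combined
# train+validation data, building a fresh model per fold.
from sklearn.model_selection import StratifiedKFold

def cross_validate(model_fn, x, y, n_splits=5, epochs=5, batch_size=64):
    """Return per-fold validation accuracies from stratified k-fold CV."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    accuracies = []
    for train_idx, val_idx in skf.split(x, y):
        model = model_fn()                             # fresh weights per fold
        model.fit(x[train_idx], y[train_idx],
                  epochs=epochs, batch_size=batch_size, verbose=0)
        _, acc = model.evaluate(x[val_idx], y[val_idx], verbose=0)
        accuracies.append(acc)
    return accuracies
```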
Based on this basic model architecture with the parameters in Figure 1, the best test accuracy achieved for character classification is as below:
- Test accuracy: 51.76%
- Batch Size: 32
- Epochs: 50
Figure 1: Basic Model Architecture Summary

Figure 2 shows the graph of training and validation accuracy vs epoch; it shows signs of overfitting, as training accuracy continues to improve while validation accuracy does not.
Figure 2: Learning Curve (loss vs epochs) of Basic Model
For an image classification problem, 51.76% is relatively low. Possible reasons:
- The model suffered from overfitting, as shown in the learning curve, causing poor prediction on the test set.
- The basic model is too simple to recognise and predict across 1623 classes. With only 3 convolutional layers and a final 64-unit dense layer, feature extraction capability for complex variations may be limited. Furthermore, the shallow architecture may not capture hierarchical features, so it struggles with fine-grained distinctions between similar characters across alphabets.
- The training set is small: only 20774 images (64%) are used for training, so the model may not learn the character patterns and variations sufficiently.
To test these hypotheses, 5 experiments are conducted:
- Add a dropout layer (rate=0.2) after every Conv2D layer to increase regularisation and reduce overfitting.
- Add weight decay (L2, strength 0.001) to the Conv2D and Dense layers to reduce overfitting.
- Increase the training dataset size (from 64% to 72%, +2597 images) by reducing the validation size.
- An enhanced model architecture with increased complexity. (Refer to Figure 3 for the architecture and Figure 4 for the architecture comparison.)
- Add an early stopping criterion during model fitting to prevent overfitting.
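The first two experiments amount to layer-level changes to the basic model. A combined sketch (dropout after every Conv2D, plus L2 weight decay of 0.001 on the Conv2D and Dense kernels, both configurable) might look like:

```python
# Sketch of the dropout and L2 weight-decay experiment variants.
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

def cnn_model_regularized(num_classes=1623, dropout=0.2, l2=0.001):
    reg = regularizers.l2(l2)                    # weight decay on kernels
    return models.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, (3, 3), activation="relu", kernel_regularizer=reg),
        layers.Dropout(dropout),                 # dropout after every Conv2D
        layers.MaxPooling2D((2, 2)),
        layers.BatchNormalization(),
        layers.Conv2D(64, (3, 3), activation="relu", kernel_regularizer=reg),
        layers.Dropout(dropout),
        layers.MaxPooling2D((2, 2)),
        layers.BatchNormalization(),
        layers.Conv2D(64, (3, 3), activation="relu", kernel_regularizer=reg),
        layers.Dropout(dropout),
        layers.Flatten(),
        layers.Dense(64, activation="relu", kernel_regularizer=reg),
        layers.Dense(num_classes),
    ])
```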
Table below summarises the experimental results.
| Experiment | Test Accuracy (%) |
|---|---|
| Basic (Benchmark) | 51.76 |
| Implement dropout after Conv2D (0.2) | 66.65 (+14.89%) |
| Implement L2 Regularizer in Conv2D and Dense (0.001) | 52.66 (+0.9%) |
| Increase dataset size | 64.86 (+13.1%) |
| Increase model complexity | 72.74 (+20.98%) |
| Implement early stopping (patience=20) | 48.41 (-3.35%) |
As a result, the model evaluation using the test dataset yielded the following:
- Increase model complexity
This yielded the largest improvement in accuracy, an increment of 20.98% to a test accuracy of 72.74%. The learning curve of the enhanced model also shows overfitting. (Refer to Figure 2-1.)
Figure 3: Enhanced Model Architecture Summary
Figure 2-1: Learning Curve of Enhanced Model
Figure 4: Model Architecture Comparison

- Implement dropout after Conv2D (0.2)
This improved the test accuracy by 14.89% and reduced overfitting; the improvement can be observed in the validation curve over epochs. (Refer to Figure 2-2 below.)
Figure 2-2: Learning Curve of Basic Model (Before Dropout and After Dropout)
- Increase dataset size
This yielded a 13.1% increment in test accuracy. A larger dataset helps because the model sees more variation and receives more weight updates.
- Implement L2 Regularizer in Conv2D and Dense (0.001)
Only an insignificant improvement was observed; this is not as effective as dropout regularisation at the layers. (Refer to Figure 2-3.)
Figure 2-3: Learning Curve of Basic Model (Dropout vs L2 Weight Decay)
- Implement early stopping (patience=20)
Implementing the early stopping criterion reduced test accuracy by 3.35%; training stopped prematurely.
Based on these experiments, we can conclude that:
- A more complex model may be required to effectively recognise and classify a complex problem like handwritten character images.
- More data improved the model learning.
- Regularization with dropout layer reduced overfitting thus improved the model generalization.
Next, the improvement ideas are incorporated one by one into the enhanced model to check their effect — an ablation experiment with the enhanced model. A learning rate scheduler and early stopping are implemented to optimise the learning and the computational effort in training.
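The learning-rate scheduling and early stopping can be sketched with standard Keras callbacks. The `ReduceLROnPlateau` factor and patience values below are assumptions; patience=20 for early stopping follows the earlier experiment.

```python
# Sketch of the lr-scheduling and early-stopping callbacks for model.fit.
import tensorflow as tf

callbacks = [
    # Halve the learning rate when validation loss plateaus.
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_loss", factor=0.5, patience=5, min_lr=1e-5
    ),
    # Stop when validation loss stops improving, restoring the best weights.
    tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=20, restore_best_weights=True
    ),
]

# model.fit(train_ds, validation_data=val_ds, epochs=200, callbacks=callbacks)
```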
| Experiment | Test Accuracy (%) |
|---|---|
| Enhanced | 72.74 |
| Implement dropout after Conv2D (0.2) | 73.97 (+1.23%) |
| Increase dataset | 77.26 (+4.52%) |
| lr scheduling, Early stopping | 76.90 (+4.16%) |
Final model configuration: Enhanced CNN model (As in Figure 3)
Test Accuracy: 76.90%
CV Accuracy: 74.66%
Reference: Brenden M. Lake, Ruslan Salakhutdinov, and Joshua B. Tenenbaum. Human-level concept learning through probabilistic program induction. Science 350, 1332-1338 (2015). DOI: 10.1126/science.aab3050
Future Improvements:
- F-score/AUC metrics
- Pre-trained models like YOLO or ResNet50.
- MLOps flow improvements:
  - More robust logging of data configs and model configs
  - Experiment tracking is not flexible enough (names must be changed in the config file and script)
  - Logging of the dataset used is missing
  - Learning curves to be generated automatically
  - Run names to be more representative and automatic
  - Evaluation runs to be linked with model runs
  - Local directory model registry: more specific model names
  - CV evaluation to be flexible enough to handle different fitting configurations per model (lr schedule, early stopping)