Edge ML Quantization Benchmark (PyTorch)

📌 Overview

This project demonstrates hardware-aware optimization of deep learning models for edge and embedded deployment using post-training INT8 quantization in PyTorch.

A convolutional neural network (CNN) is trained on the MNIST dataset and then optimized with post-training dynamic quantization to reduce its size. The project benchmarks and visualizes the resulting trade-offs in accuracy, inference latency, and memory footprint.
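In PyTorch, post-training dynamic quantization is a one-call transformation. Below is a minimal sketch using the standard `torch.quantization.quantize_dynamic` API; the stand-in model is illustrative, not the project's actual CNN.

```python
import torch
import torch.nn as nn

# Illustrative stand-in for a trained FP32 model (see the architecture section below).
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

# Dynamic quantization converts the weights of the listed layer types to INT8;
# activations are quantized on the fly at inference time.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
```

Note that PyTorch's dynamic quantization targets layer types such as `nn.Linear` and `nn.LSTM` rather than convolutions; in a small CNN the fully connected layers typically hold most of the parameters, which is where the size reduction comes from.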

This work is designed to reflect real-world constraints faced in edge AI, embedded ML, and ML compiler workflows.


🎯 Objectives

  • Train a baseline CNN model for image classification
  • Apply post-training INT8 quantization
  • Benchmark:
    • Model size
    • Inference latency
    • Classification accuracy
  • Visualize optimization trade-offs
  • Demonstrate hardware-aware ML deployment skills

🧠 Key Concepts

  • Convolutional Neural Networks (CNNs)
  • Post-training Dynamic Quantization
  • INT8 vs FP32 inference
  • Model size vs latency trade-offs
  • Edge ML performance benchmarking

πŸ— Model Architecture

  • Input: 28×28 grayscale image
  • Conv Layers: Feature extraction
  • Max Pooling: Spatial downsampling
  • Fully Connected Layers: Classification
  • Output: 10 digit classes (0–9)
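A PyTorch module consistent with this description might look like the sketch below; the channel counts and hidden width are assumptions, not necessarily the values used in the notebook.

```python
import torch.nn as nn

class MnistCNN(nn.Module):
    """Small CNN for 28x28 grayscale digits (layer widths are illustrative)."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 1x28x28 -> 16x28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 16x14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> 32x14x14
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 32x7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```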

🛠 Technologies Used

  • Python
  • PyTorch
  • Torchvision
  • NumPy
  • Matplotlib
  • Jupyter Notebook

📊 Results Overview

The CNN model was trained for 3 epochs on the MNIST dataset and optimized using post-training INT8 dynamic quantization. Training converged smoothly with decreasing loss values, confirming stable learning behavior.
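A training loop along the following lines matches the reported setup; the optimizer, learning rate, and batch size are assumptions, and it reuses the hypothetical `MnistCNN` sketched above.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

train_data = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())
train_loader = DataLoader(train_data, batch_size=64, shuffle=True)

model = MnistCNN()                                          # architecture sketched above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # assumed optimizer and learning rate
criterion = nn.CrossEntropyLoss()

for epoch in range(3):                                      # 3 epochs, as reported below
    running_loss = 0.0
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"Epoch {epoch + 1}: loss {running_loss / len(train_loader):.4f}")
```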

🔹 Training Loss

  • Epoch 1: 0.1716
  • Epoch 2: 0.0566
  • Epoch 3: 0.0397

This demonstrates effective feature learning and fast convergence.


🔹 Performance Comparison

| Model            | Size (KB) | Accuracy (%) | Inference Time (s) |
|------------------|-----------|--------------|--------------------|
| FP32 (Original)  | 4690.61   | 98.97        | 5.04               |
| INT8 (Quantized) | 1232.20   | 98.98        | 5.00               |
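The three metrics can be measured with helpers along the following lines; `model_size_kb` and `evaluate` are hypothetical names, and the notebook may differ in details such as batch size and timing granularity.

```python
import os
import time
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

test_data = datasets.MNIST("data", train=False, download=True,
                           transform=transforms.ToTensor())
test_loader = DataLoader(test_data, batch_size=64)

def model_size_kb(model, path="tmp_weights.pt"):
    """Serialize the state dict and report its on-disk size in KB."""
    torch.save(model.state_dict(), path)
    size = os.path.getsize(path) / 1024
    os.remove(path)
    return size

def evaluate(model):
    """Return (accuracy %, total inference time in seconds) over the test set."""
    model.eval()
    correct, start = 0, time.perf_counter()
    with torch.no_grad():
        for images, labels in test_loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
    elapsed = time.perf_counter() - start
    return 100.0 * correct / len(test_data), elapsed
```

Calling both helpers on the FP32 model and on its quantized counterpart yields the kind of comparison tabulated above.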

🔹 Key Observations

  • Model size reduced by ~74% using INT8 quantization.
  • Accuracy was preserved, with no measurable degradation after quantization.
  • Inference latency was essentially unchanged in the CPU-only test environment, highlighting the importance of hardware-specific INT8 acceleration (e.g., ARM NEON, DSP, NPU) for realizing latency gains.

This result reflects a real-world edge AI trade-off: quantization significantly reduces memory footprint while latency gains depend on the underlying hardware and runtime backend.
