cazy8/AI-Based-Network-Intrusion-Detection-System

🛡️ AI-Powered Network Intrusion Detection System


📖 Introduction

AI-Powered Network Intrusion Detection System (NIDS) is an interactive machine learning web application that detects malicious activity in computer network traffic. Built with Python and Streamlit, it provides a complete end-to-end pipeline, from raw CSV ingestion to instant classification of individual traffic records.

What Problem Does It Solve?

Every day, millions of data packets flow through computer networks. Hidden within normal traffic (browsing, email, streaming) can be cyberattacks: DDoS floods, port scans, brute-force logins, bot activity, and more. Manually monitoring network logs is impractical at this scale.

This project uses machine learning to learn the difference between normal (BENIGN) traffic and the attack types labelled in the training data, and then classifies new, unseen traffic records with a confidence score for each class.

How It Works

CSV Upload → Data Cleaning → Feature Selection → Model Training → Evaluation → Live Detection
  1. Upload a network traffic CSV file (CIC-IDS2017/2018 format or similar).
  2. Explore the dataset: view statistics, class distribution, data quality, and feature correlations.
  3. Train an ML model: choose Random Forest, XGBoost, or Decision Tree with configurable hyperparameters.
  4. Evaluate: view accuracy, F1 score, precision, recall, the confusion matrix, ROC and precision-recall curves, and feature importance rankings.
  5. Simulate: feed individual traffic records (one row of flow features) into the trained model and get an instant classification (BENIGN or the attack type) with confidence percentages.
  6. Export: download the trained model (.joblib) for production deployment and export prediction logs as CSV.
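Step 2's data-quality checks can be approximated with a few pandas calls. The sketch below is an assumption about what such an audit computes, not the app's actual code; the function name `audit_dataframe` and the `Label` column name are illustrative.

```python
import numpy as np
import pandas as pd


def audit_dataframe(df: pd.DataFrame, label_col: str = "Label") -> dict:
    """Summarize common data-quality problems in a table of flow records."""
    numeric = df.select_dtypes(include="number")
    report = {
        "rows": len(df),
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
        # rate columns such as Flow Bytes/s can hold +/-inf (e.g. zero-duration flows)
        "infinite_values": int(np.isinf(numeric.to_numpy(dtype=float)).sum()),
        "dtypes": df.dtypes.astype(str).value_counts().to_dict(),
    }
    if label_col in df.columns:
        counts = df[label_col].value_counts()
        report["class_counts"] = counts.to_dict()
        # majority/minority ratio: a quick class-imbalance indicator
        report["imbalance_ratio"] = float(counts.max() / counts.min())
    return report
```

An imbalance ratio far above 1 (BENIGN traffic dwarfing rare classes such as Heartbleed) is a hint to weigh F1 and recall more heavily than raw accuracy.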

What the AI Detects

The system classifies network traffic into the classes present in the training data; for CIC-IDS2017-style labels these are:

  • ✅ BENIGN: normal, safe traffic (web browsing, video streaming, email)
  • 🚨 DDoS: Distributed Denial of Service attacks
  • 🚨 DoS Hulk / GoldenEye / Slowloris: Denial of Service attack variants
  • 🚨 PortScan: network reconnaissance/scanning
  • 🚨 Bot: botnet command-and-control traffic
  • 🚨 FTP-Patator / SSH-Patator: brute-force login attacks
  • 🚨 Web Attack: SQL injection, XSS, and brute force against web apps
  • 🚨 Infiltration: lateral movement inside a network
  • 🚨 Heartbleed: OpenSSL vulnerability exploitation

✨ Features

| Feature | Description |
|---|---|
| Multi-Model Support | Random Forest, XGBoost, Decision Tree: train and compare algorithms |
| Interactive Dashboard | 5-tab Streamlit UI with sidebar controls and real-time updates |
| Data Quality Audit | Auto-detects missing values, duplicates, class imbalance, and data types |
| Smart Feature Selection | Auto-selects 33 CIC-IDS features, or manual custom selection |
| 5 Performance Metrics | Accuracy, F1 Score, Precision, Recall, Threat Count |
| 4 Visualizations | Confusion Matrix, ROC Curve, Precision-Recall Curve, Feature Importance |
| Class Distribution | Donut chart showing the BENIGN vs. attack-type breakdown |
| Correlation Matrix | Heatmap of top feature correlations |
| Live Traffic Simulator | Manual input or random sampling with real-time prediction |
| Confidence Scoring | predict_proba() shows the probability of each class, not just the label |
| Attack Type Identification | Maps the prediction back to a human-readable label (e.g., "DDoS", "Bot") |
| Risk Level Assessment | Categorizes predictions as HIGH / LOW risk |
| Model Export | Download the trained model as .joblib for deployment |
| Scaler Export | Download the fitted StandardScaler for consistent preprocessing |
| Prediction Logging | Timestamped audit trail of all predictions, exportable as CSV |
| Feature Normalization | StandardScaler preprocessing so training and inference inputs are scaled identically |
| Stratified Splitting | Preserves class ratios in the train/test split |
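Confidence scoring and risk levels can be derived from one row of `predict_proba()` output. This is a minimal sketch; the function name `explain_prediction` and the HIGH/LOW rule (HIGH for any non-BENIGN prediction) are assumptions, since the exact risk rule used by the app is not specified here.

```python
import numpy as np


def explain_prediction(proba_row, class_names, benign_label="BENIGN"):
    """Turn one row of predict_proba() output into label, confidence and risk.

    Assumed rule: any class other than BENIGN is HIGH risk.
    """
    proba_row = np.asarray(proba_row, dtype=float)
    best = int(np.argmax(proba_row))
    label = class_names[best]
    return {
        "label": label,
        "confidence_pct": round(100.0 * proba_row[best], 2),
        "risk": "LOW" if label == benign_label else "HIGH",
        # per-class probabilities, highest first, for a confidence breakdown chart
        "breakdown": sorted(zip(class_names, proba_row.tolist()),
                            key=lambda kv: kv[1], reverse=True),
    }
```

With a model trained on LabelEncoder-encoded labels, the probability columns follow `model.classes_`, so `class_names = list(encoder.inverse_transform(model.classes_))` lines the names up with `model.predict_proba(scaler.transform(row))[0]`.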

🛠️ Complete Tech Stack

Core Language

| Technology | Version | Role |
|---|---|---|
| Python | 3.13 | Primary programming language for the entire application |

Web Framework

| Technology | Version | Role |
|---|---|---|
| Streamlit | 1.54 | Full-stack web framework: handles the frontend UI, sidebar controls, tabs, file uploads, buttons, sliders, metrics display, and server-side rendering. No HTML/CSS/JS needed. |

Machine Learning & Data Science

| Technology | Version | Role |
|---|---|---|
| scikit-learn | 1.8 | Core ML library: provides RandomForestClassifier, DecisionTreeClassifier, train_test_split, StandardScaler, LabelEncoder, accuracy_score, f1_score, precision_score, recall_score, confusion_matrix, classification_report, roc_curve, precision_recall_curve, auc |
| XGBoost | 3.2 | Gradient boosting library: provides XGBClassifier for high-performance ensemble learning with regularization |
| Pandas | 2.3 | Data manipulation: CSV loading, DataFrame operations, column cleaning, type conversion, statistical summaries |
| NumPy | 2.4 | Numerical computing: array operations, random generation, infinity/NaN handling, data type casting |

Data Visualization

| Technology | Version | Role |
|---|---|---|
| Matplotlib | 3.10 | Base plotting library: creates ROC curves, precision-recall curves, feature importance bar charts, confidence breakdown charts, custom colormaps |
| Seaborn | 0.13 | Statistical visualization: generates styled confusion matrix heatmaps and feature correlation heatmaps on top of Matplotlib |

Utilities

| Technology | Version | Role |
|---|---|---|
| Joblib | 1.5 | Model serialization: saves trained scikit-learn/XGBoost models and fitted scalers in the binary .joblib format for download and reuse |

Python Standard Library Modules Used

| Module | Role |
|---|---|
| time | Training duration measurement |
| gc | Garbage collection to free memory after training |
| io | In-memory byte streams (BytesIO) for model export |
| datetime | Timestamps for prediction logging |
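The `time`, `gc`, `io` and `datetime` roles above, together with Joblib, fit together roughly as follows: training is timed, a fitted object is serialized into an in-memory buffer (no temporary file), and predictions are appended to a timestamped log. All helper names are hypothetical; this is a sketch, not the app's code.

```python
import gc
import io
import time
from datetime import datetime

import joblib
import pandas as pd


def timed_fit(model, X, y):
    """Fit a model, return (model, seconds), then trigger garbage collection."""
    start = time.perf_counter()
    model.fit(X, y)
    elapsed = time.perf_counter() - start
    gc.collect()  # release temporaries created during training
    return model, elapsed


def to_joblib_bytes(obj) -> bytes:
    """Serialize a fitted model or scaler into memory, e.g. for a download button."""
    buffer = io.BytesIO()
    joblib.dump(obj, buffer)
    return buffer.getvalue()


def log_prediction(log: list, label: str, confidence: float) -> None:
    """Append a timestamped prediction to an in-memory audit log."""
    log.append({
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        "prediction": label,
        "confidence": confidence,
    })


def log_to_csv(log: list) -> str:
    """Render the prediction log as CSV text."""
    columns = ["timestamp", "prediction", "confidence"]
    return pd.DataFrame(log, columns=columns).to_csv(index=False)
```

In Streamlit, the bytes can be passed straight to `st.download_button("Download model", data=to_joblib_bytes(model), file_name="nids_model.joblib")`; the file name is only an example.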

Development & Deployment

| Tool | Role |
|---|---|
| Git | Version control |
| GitHub | Repository hosting and collaboration |
| pip | Package management |
| venv | Virtual environment isolation |

📸 Screenshots

Landing Page: screenshots/start.png

CSV Upload & Data Explorer: screenshots/csv.png, screenshots/dataset_preview.png

Model Training: screenshots/model_training.png

Performance Metrics: screenshots/performance_matrix.png

Intrusion Detection Alert: screenshots/intrusion_detected.png

🏗️ Architecture

┌────────────────────────────────────────────────────────────┐
│                     STREAMLIT FRONTEND                     │
│  ┌──────────┬──────────┬──────────┬──────────┬──────────┐  │
│  │   Data   │  Model   │  Perf.   │  Live    │  Export  │  │
│  │ Explorer │ Training │ Metrics  │ Simulate │  & Logs  │  │
│  └────┬─────┴────┬─────┴────┬─────┴────┬─────┴────┬─────┘  │
│       │          │          │          │          │        │
│  ┌────▼──────────▼──────────▼──────────▼──────────▼─────┐  │
│  │              ML PIPELINE (scikit-learn)              │  │
│  │                                                      │  │
│  │  ┌─────────────┐   ┌──────────────┐   ┌────────────┐ │  │
│  │  │ Preprocess  │──▶│   Training   │──▶│ Inference  │ │  │
│  │  │             │   │              │   │            │ │  │
│  │  │ LabelEncoder│   │ RandomForest │   │ predict()  │ │  │
│  │  │ StdScaler   │   │ XGBoost      │   │ predict    │ │  │
│  │  │ NaN/Inf fix │   │ DecisionTree │   │  _proba()  │ │  │
│  │  └─────────────┘   └──────────────┘   └────────────┘ │  │
│  └──────────────────────────────────────────────────────┘  │
│                                                            │
│  ┌──────────────────────────────────────────────────────┐  │
│  │         VISUALIZATION (matplotlib + seaborn)         │  │
│  │                                                      │  │
│  │  Confusion Matrix · ROC Curve · PR Curve             │  │
│  │  Feature Importance · Class Distribution             │  │
│  │  Correlation Matrix · Confidence Breakdown           │  │
│  │  Statistical Summary                                 │  │
│  └──────────────────────────────────────────────────────┘  │
│                                                            │
│  ┌──────────────────────────────────────────────────────┐  │
│  │               EXPORT (joblib + pandas)               │  │
│  │                                                      │  │
│  │  Model .joblib · Scaler .joblib · Prediction Log CSV │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────┘
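The Preprocess → Training → Inference chain in the diagram can also be packaged as a single scikit-learn Pipeline, so that one exported .joblib file carries both the scaler and the model. This is a sketch of that alternative, not necessarily how nids_main_csv.py is organized; `nids_pipeline` is a hypothetical name and the Random Forest settings are illustrative.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Scaling and classification bundled into one estimator: fit() fits the scaler
# and then the model; predict()/predict_proba() apply the same scaling first.
nids_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", RandomForestClassifier(n_estimators=100, random_state=42)),
])
```

The trade-off is that the scaler can no longer be exported separately, but inference code cannot accidentally skip or mismatch the preprocessing step.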

🚀 Installation

Prerequisites

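  • Python 3.11 or higher (the NumPy 2.4 release listed in the tech stack requires at least Python 3.11; the app was developed on Python 3.13)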
  • pip package manager

Setup

# Clone the repository
git clone https://github.com/cazy8/AI-Based-Network-Intrusion-Detection-System.git
cd AI-Based-Network-Intrusion-Detection-System

# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate        # Linux/Mac
# venv\Scripts\activate         # Windows

# Install dependencies
pip install -r requirements.txt

💡 Usage

# Run the Streamlit dashboard
streamlit run nids_main_csv.py

Then open your browser to http://localhost:8501 and:

  1. Upload a CIC-IDS CSV file via the sidebar (or use the included sample_dataset.csv)
  2. Explore your data in the Data Explorer tab: view the quality report, statistics, and distributions
  3. Configure model parameters in the sidebar: algorithm, split ratio, number of estimators, and depth
  4. Train the model by clicking "Train Model" and watch the progress bar
  5. Analyze performance metrics: accuracy, F1, confusion matrix, ROC/PR curves
  6. Simulate live traffic detection with confidence scores and risk levels
  7. Export the trained model and prediction logs for production use

📂 Dataset

This project works with CIC-IDS2017 and CIC-IDS2018 datasets from the Canadian Institute for Cybersecurity. A synthetic sample_dataset.csv (5,000 rows) is included in the repo for quick testing.

| Dataset | Link | Size |
|---|---|---|
| CIC-IDS2017 | Download | ~6 GB |
| CIC-IDS2018 | Download | ~16 GB |
| Sample (included) | sample_dataset.csv | 5,000 rows |

Key Features Used from Dataset

| Feature | Description |
|---|---|
| Destination Port | Target service port (80 = HTTP, 443 = HTTPS, 22 = SSH) |
| Flow Duration | Total duration of the network flow (μs) |
| Total Fwd/Bwd Packets | Packet count in the forward and backward directions |
| Packet Length Mean/Max/Std | Statistical properties of packet sizes |
| Flow Bytes/s | Data transfer rate |
| Flow Packets/s | Packet transmission rate |
| Flow IAT Mean/Std | Inter-arrival time between packets |
| SYN/ACK/PSH/RST/URG Flag Count | TCP flag statistics (key attack indicators) |
| Active/Idle Mean | Connection activity patterns |
| Init Win Bytes | Initial TCP window size (forward/backward) |
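Many published CIC-IDS2017 CSV files have header names with leading spaces (for example " Destination Port"), so a loader usually strips them before selecting features. The column list below is an assumed subset of CIC-IDS2017 header names matching the table above, not the app's exact 33-feature list, and `load_flows` is a hypothetical helper.

```python
import pandas as pd

# Assumed CIC-IDS2017-style header names for a subset of the features above;
# adjust to match the headers in your own CSV.
KEY_FEATURES = [
    "Destination Port", "Flow Duration", "Total Fwd Packets",
    "Total Backward Packets", "Packet Length Mean", "Flow Bytes/s",
    "Flow Packets/s", "Flow IAT Mean", "Flow IAT Std", "SYN Flag Count",
    "ACK Flag Count", "PSH Flag Count", "RST Flag Count", "URG Flag Count",
    "Active Mean", "Idle Mean", "Init_Win_bytes_forward", "Init_Win_bytes_backward",
]


def load_flows(source, features=KEY_FEATURES, label_col="Label"):
    """Read a flow CSV, strip header whitespace, keep the features that exist."""
    df = pd.read_csv(source, low_memory=False)
    df.columns = df.columns.str.strip()  # " Destination Port" -> "Destination Port"
    available = [c for c in features if c in df.columns]
    missing = sorted(set(features) - set(available))
    return df[available], df[label_col], missing
```

For example, `X, y, missing = load_flows("sample_dataset.csv")`; inspecting `missing` shows which expected columns a particular export lacks.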

🧠 ML Techniques Used

Algorithms

| Algorithm | Type | Strengths |
|---|---|---|
| Random Forest | Ensemble (bagging) | Robust, handles noise well, provides feature importances, parallelizable |
| XGBoost | Ensemble (boosting) | High accuracy, built-in L1/L2 regularization, supports sample weights for imbalanced data |
| Decision Tree | Single tree | Fast, interpretable, good baseline model |
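Selecting between the three algorithms from sidebar settings can be sketched as a small factory. The keyword arguments follow the scikit-learn and XGBoost estimator APIs; the function `make_model`, the display names, and the defaults are assumptions for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier


def make_model(name: str, n_estimators: int = 100, max_depth=None, random_state: int = 42):
    """Build one of the three supported classifiers from sidebar-style settings."""
    if name == "Random Forest":
        # n_jobs=-1 trains the independent trees in parallel
        return RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth,
                                      n_jobs=-1, random_state=random_state)
    if name == "XGBoost":
        from xgboost import XGBClassifier  # imported lazily so xgboost stays optional here
        return XGBClassifier(n_estimators=n_estimators, max_depth=max_depth,
                             random_state=random_state)
    if name == "Decision Tree":
        return DecisionTreeClassifier(max_depth=max_depth, random_state=random_state)
    raise ValueError(f"unknown model: {name!r}")
```

XGBClassifier expects integer class labels 0..K-1, which is one reason the LabelEncoder preprocessing step matters.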

Preprocessing

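| Technique | Purpose |
|---|---|
| LabelEncoder | Converts text labels ("BENIGN", "DDoS") to integers (0, 1, 2, ...), assigned in sorted label order |
| StandardScaler | Standardizes features to zero mean and unit variance. Scale matters for distance- and gradient-based models; the tree-based models used here split on thresholds and are largely unaffected by feature scale, so scaling mainly keeps training and inference inputs consistent |
| Inf/NaN Replacement | Replaces infinite values and missing data with 0 for model stability |
| Stratified Train/Test Split | Ensures each class (BENIGN, DDoS, etc.) is proportionally represented in both the train and test sets |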
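The four preprocessing steps can be combined as below. This is a sketch assuming features arrive as a pandas DataFrame and labels as a Series; `preprocess` is a hypothetical name. Note that the scaler is fitted on the training split only, so test-set statistics do not leak into training.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler


def preprocess(X: pd.DataFrame, y: pd.Series, test_size=0.2, random_state=42):
    """Clean, label-encode, stratify-split and standardize flow features."""
    # non-numeric junk -> NaN, then +/-inf and NaN -> 0
    X = X.apply(pd.to_numeric, errors="coerce")
    X = X.replace([np.inf, -np.inf], np.nan).fillna(0)

    encoder = LabelEncoder()
    y_enc = encoder.fit_transform(y)  # classes_ are the sorted unique labels

    X_train, X_test, y_train, y_test = train_test_split(
        X, y_enc, test_size=test_size, stratify=y_enc, random_state=random_state)

    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)  # fit on training rows only
    X_test = scaler.transform(X_test)        # reuse the training mean/std
    return X_train, X_test, y_train, y_test, encoder, scaler
```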

Evaluation Metrics

| Metric | What It Measures |
|---|---|
| Accuracy | Share of all predictions that are correct; can be misleading on imbalanced data where BENIGN traffic dominates |
| F1 Score | Harmonic mean of precision and recall (a balanced metric) |
| Precision | Of all flows predicted as attacks, how many really are attacks? Low precision means many false alarms |
| Recall | Of all real attacks, how many did the model catch? Recall = 1 − false negative rate |
| ROC-AUC | The model's ability to distinguish between classes across all decision thresholds |
| Confusion Matrix | Per-class breakdown of correct vs. incorrect predictions |
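The metrics above map directly onto scikit-learn functions. For a multiclass NIDS, precision, recall and F1 need an averaging mode; this sketch assumes `average="weighted"` (which mode the app uses is not stated) and computes multiclass ROC-AUC one-vs-rest from `predict_proba()` output. The function name `evaluate` is hypothetical.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)


def evaluate(y_true, y_pred, y_proba=None):
    """Compute headline metrics for a multiclass intrusion classifier.

    'weighted' averaging weights each class by its support; 'macro' would
    instead give rare attack classes an equal say.
    """
    metrics = {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred, average="weighted", zero_division=0),
        "precision": precision_score(y_true, y_pred, average="weighted", zero_division=0),
        "recall": recall_score(y_true, y_pred, average="weighted", zero_division=0),
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }
    if y_proba is not None:
        y_proba = np.asarray(y_proba)
        if y_proba.shape[1] == 2:  # binary: score of the positive class
            metrics["roc_auc"] = roc_auc_score(y_true, y_proba[:, 1])
        else:                      # multiclass: one-vs-rest, macro-averaged
            metrics["roc_auc"] = roc_auc_score(y_true, y_proba, multi_class="ovr")
    return metrics
```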

📁 Project Structure

AI-Based-Network-Intrusion-Detection-System/
├── nids_main_csv.py           # Main Streamlit application (500+ lines)
├── sample_dataset.csv         # Synthetic CIC-IDS test data (5,000 rows)
├── requirements.txt           # Python dependencies with version pins
├── .gitignore                 # Git ignore rules
├── LICENSE                    # MIT License
├── README.md                  # Project documentation (this file)
└── screenshots/               # App screenshots
    ├── start.png              # Landing page
    ├── csv.png                # CSV upload view
    ├── dataset_preview.png    # Data explorer
    ├── model_training.png     # Training interface
    ├── performance_matrix.png # Metrics dashboard
    └── intrusion_detected.png # Alert screen

🔮 Future Improvements

  • Deep Learning model (LSTM / Autoencoder) for anomaly detection
  • Real-time packet capture integration with Scapy / PyShark
  • REST API endpoint with FastAPI for production deployment
  • Docker containerization for easy deployment
  • Database-backed prediction logging (SQLite / PostgreSQL)
  • Model comparison dashboard: train and compare multiple models simultaneously
  • SHAP explainability for individual predictions
  • Email/Slack alerting on threat detection
  • Batch prediction mode for large CSV files
  • User authentication for multi-user environments

📄 License

This project is licensed under the MIT License; see the LICENSE file for details.


Built with ❤️ using Python & Streamlit
⭐ Star this repo if you found it helpful!

About

🛡️ AI-powered Network Intrusion Detection System: detects cyberattacks (DDoS, PortScan, Bot, Brute Force) in network traffic using Random Forest, XGBoost & Decision Tree. Built with Python, Streamlit, scikit-learn.
