🛡️ AI-Powered Network Intrusion Detection System
AI-Powered Network Intrusion Detection System (NIDS) is an interactive machine learning web application that detects malicious activity in computer network traffic. Built using Python and Streamlit, it provides a complete end-to-end pipeline, from raw data ingestion to real-time threat classification.
What Problem Does It Solve?
Every day, millions of data packets flow through computer networks. Hidden within normal traffic (browsing, email, streaming) can be cyberattacks: DDoS floods, port scans, brute-force logins, bot activity, and more. Manually monitoring network logs is impractical at this scale.
This project uses machine learning to learn the difference between normal (BENIGN) and malicious (ATTACK) traffic patterns automatically, and then classifies new, unseen traffic in real time with confidence scores.
CSV Upload → Data Cleaning → Feature Selection → Model Training → Evaluation → Live Detection
Upload a network traffic CSV file (CIC-IDS2017/2018 format or similar)
Explore the dataset: view statistics, class distribution, data quality, and feature correlations
Train an ML model: choose between Random Forest, XGBoost, or Decision Tree with configurable hyperparameters
Evaluate: view accuracy, F1-score, precision, recall, confusion matrix, ROC curves, precision-recall curves, and feature importance rankings
Simulate: feed individual network flow records into the trained model and get an instant classification (BENIGN / Attack Type) with confidence percentages
Export: download the trained model (.joblib) for production deployment, and export prediction logs as CSV
The system classifies network traffic into:
✅ BENIGN: Normal, safe traffic (web browsing, video streaming, emails)
🚨 DDoS: Distributed Denial of Service attacks
🚨 DoS Hulk / GoldenEye / Slowloris: Various Denial of Service attack variants
🚨 PortScan: Network reconnaissance/scanning
🚨 Bot: Botnet command & control traffic
🚨 FTP-Patator / SSH-Patator: Brute-force login attacks
🚨 Web Attack: SQL Injection, XSS, Brute Force on web apps
🚨 Infiltration: Lateral movement inside a network
🚨 Heartbleed: OpenSSL vulnerability exploitation
| Feature | Description |
| --- | --- |
| Multi-Model Support | Random Forest, XGBoost, Decision Tree: train and compare algorithms |
| Interactive Dashboard | 5-tab Streamlit UI with sidebar controls and real-time updates |
| Data Quality Audit | Auto-detects missing values, duplicates, class imbalance, data types |
| Smart Feature Selection | Auto-selects 33 CIC-IDS features, or manual custom selection |
| 5 Performance Metrics | Accuracy, F1 Score, Precision, Recall, Threat Count |
| 4 Visualizations | Confusion Matrix, ROC Curve, Precision-Recall Curve, Feature Importance |
| Class Distribution | Donut chart showing BENIGN vs attack type breakdown |
| Correlation Matrix | Heatmap of top feature correlations |
| Live Traffic Simulator | Manual input or random sampling with real-time prediction |
| Confidence Scoring | predict_proba(): shows the probability for each class, not just the label |
| Attack Type Identification | Maps prediction back to human-readable label (e.g., "DDoS", "Bot") |
| Risk Level Assessment | Categorizes predictions as HIGH / LOW risk |
| Model Export | Download trained model as .joblib for deployment |
| Scaler Export | Download fitted StandardScaler for consistent preprocessing |
| Prediction Logging | Timestamped audit trail of all predictions, exportable as CSV |
| Feature Normalization | StandardScaler preprocessing for consistent model performance |
| Stratified Splitting | Preserves class ratios in train/test split |
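The confidence scoring and risk level rows above can be illustrated with a small sketch. It assumes the model returns one probability per class (as `predict_proba()` does for a single row) and that any non-BENIGN prediction counts as HIGH risk. That rule, the helper name `assess_prediction`, and the class list are illustrative assumptions, not the application's actual code.

```python
def assess_prediction(probabilities, class_names, benign_label="BENIGN"):
    """Turn one row of class probabilities into a label, confidence and risk level.

    Hypothetical helper: the HIGH/LOW rule below is an assumption for illustration.
    """
    if len(probabilities) != len(class_names):
        raise ValueError("need exactly one probability per class")

    # Index of the most probable class
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    label = class_names[best]

    return {
        "label": label,
        "confidence_pct": round(100.0 * probabilities[best], 1),
        # Assumed rule: anything that is not benign is treated as high risk
        "risk": "LOW" if label == benign_label else "HIGH",
        # Per-class breakdown, as a confidence chart would show it
        "breakdown": dict(zip(class_names, probabilities)),
    }


result = assess_prediction([0.05, 0.90, 0.05], ["BENIGN", "DDoS", "PortScan"])
print(result["label"], result["confidence_pct"], result["risk"])
```

A stricter variant might also flag low-confidence BENIGN predictions for review; whether the application does so is not stated here.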
🛠️ Complete Tech Stack
| Technology | Version | Role |
| --- | --- | --- |
| Python | 3.13 | Primary programming language for the entire application |
| Streamlit | 1.54 | Full-stack web framework: handles frontend UI, sidebar controls, tabs, file uploads, buttons, sliders, metrics display, and server-side rendering. No HTML/CSS/JS needed. |
Machine Learning & Data Science
| Technology | Version | Role |
| --- | --- | --- |
| scikit-learn | 1.8 | Core ML library: provides RandomForestClassifier, DecisionTreeClassifier, train_test_split, StandardScaler, LabelEncoder, accuracy_score, f1_score, precision_score, recall_score, confusion_matrix, classification_report, roc_curve, precision_recall_curve, auc |
| XGBoost | 3.2 | Gradient boosting library: provides XGBClassifier for high-performance ensemble learning with regularization |
| Pandas | 2.3 | Data manipulation: CSV loading, DataFrame operations, column cleaning, type conversion, statistical summaries |
| NumPy | 2.4 | Numerical computing: array operations, random generation, infinity/NaN handling, data type casting |
| Technology | Version | Role |
| --- | --- | --- |
| Matplotlib | 3.10 | Base plotting library: creates ROC curves, precision-recall curves, feature importance bar charts, confidence breakdown charts, custom colormaps |
| Seaborn | 0.13 | Statistical visualization: generates styled confusion matrix heatmaps and feature correlation heatmaps on top of matplotlib |
| Joblib | 1.5 | Model serialization: saves trained sklearn/XGBoost models and fitted scalers to .joblib binary format for download and reuse |
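As a rough illustration of the Joblib role described above, the sketch below serializes a model into an in-memory buffer and reloads it. The tiny `DecisionTreeClassifier` and its toy data are stand-ins chosen for the example; the application's own export code may differ.

```python
import io

import joblib
from sklearn.tree import DecisionTreeClassifier

# Tiny stand-in model; the real app would export the classifier trained in the UI
X = [[0.0, 1.0], [1.0, 0.0], [0.1, 0.9], [0.9, 0.2]]
y = [0, 1, 0, 1]
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Serialize into memory instead of onto disk
buffer = io.BytesIO()
joblib.dump(model, buffer)
model_bytes = buffer.getvalue()  # raw bytes that a download widget could serve

# Round-trip check: reload from the bytes and compare predictions
restored = joblib.load(io.BytesIO(model_bytes))
same = list(restored.predict(X)) == list(model.predict(X))
print(len(model_bytes) > 0, same)
```

Bytes like these could be handed to a download control such as Streamlit's `st.download_button`. Because .joblib files are pickle-based, they should only be loaded from trusted sources, and the fitted scaler needs to be exported alongside the model so that new data is preprocessed the same way.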
Python Standard Library Modules Used
| Module | Role |
| --- | --- |
| time | Training duration measurement |
| gc | Garbage collection for memory management after training |
| io | In-memory byte streams for model export (BytesIO) |
| datetime | Timestamps for prediction logging |
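A timestamped prediction log with in-memory CSV export can be sketched with standard library modules alone. The column names and helper functions below are hypothetical, and this version uses the `csv` module where the application exports through pandas.

```python
import csv
import io
from datetime import datetime, timezone

prediction_log = []  # in a Streamlit app this list could live in st.session_state


def log_prediction(label, confidence_pct, risk):
    """Append one timestamped entry to the in-memory audit trail."""
    prediction_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "prediction": label,
        "confidence_pct": confidence_pct,
        "risk": risk,
    })


def log_as_csv_bytes(entries):
    """Render the log as UTF-8 CSV bytes without touching the disk."""
    text = io.StringIO()
    writer = csv.DictWriter(
        text, fieldnames=["timestamp", "prediction", "confidence_pct", "risk"]
    )
    writer.writeheader()
    writer.writerows(entries)
    return text.getvalue().encode("utf-8")


log_prediction("DDoS", 97.5, "HIGH")
log_prediction("BENIGN", 88.0, "LOW")
csv_bytes = log_as_csv_bytes(prediction_log)
print(csv_bytes.decode("utf-8").splitlines()[0])
```

Recording timestamps in UTC keeps the audit trail unambiguous when logs from several machines are merged.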
| Tool | Role |
| --- | --- |
| Git | Version control |
| GitHub | Repository hosting & collaboration |
| pip | Package management |
| venv | Virtual environment isolation |
[Screenshot: CSV Upload & Data Explorer]
[Screenshot: Intrusion Detection Alert]
┌──────────────────────────────────────────────────────────┐
│                    STREAMLIT FRONTEND                    │
│  ┌──────────┬──────────┬──────────┬──────────┬─────────┐ │
│  │   Data   │  Model   │  Perf.   │   Live   │ Export  │ │
│  │ Explorer │ Training │ Metrics  │ Simulate │ & Logs  │ │
│  └────┬─────┴────┬─────┴────┬─────┴────┬─────┴───┬─────┘ │
│       │          │          │          │         │       │
│  ┌────▼──────────▼──────────▼──────────▼─────────▼─────┐ │
│  │             ML PIPELINE (scikit-learn)              │ │
│  │                                                     │ │
│  │  ┌─────────────┐   ┌──────────────┐   ┌───────────┐ │ │
│  │  │ Preprocess  │──▶│   Training   │──▶│ Inference │ │ │
│  │  │             │   │              │   │           │ │ │
│  │  │ LabelEncoder│   │ RandomForest │   │ predict() │ │ │
│  │  │ StdScaler   │   │ XGBoost      │   │ predict   │ │ │
│  │  │ NaN/Inf fix │   │ DecisionTree │   │ _proba()  │ │ │
│  │  └─────────────┘   └──────────────┘   └───────────┘ │ │
│  └─────────────────────────────────────────────────────┘ │
│                                                          │
│  ┌─────────────────────────────────────────────────────┐ │
│  │        VISUALIZATION (matplotlib + seaborn)         │ │
│  │                                                     │ │
│  │  Confusion Matrix · ROC Curve · PR Curve            │ │
│  │  Feature Importance · Class Distribution            │ │
│  │  Correlation Matrix · Confidence Breakdown          │ │
│  │  Statistical Summary                                │ │
│  └─────────────────────────────────────────────────────┘ │
│                                                          │
│  ┌─────────────────────────────────────────────────────┐ │
│  │              EXPORT (joblib + pandas)               │ │
│  │                                                     │ │
│  │ Model .joblib · Scaler .joblib · Prediction Log CSV │ │
│  └─────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────┘
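Python 3.9 or higher (the project was developed on Python 3.13; recent NumPy and scikit-learn releases such as those listed in the tech stack may require Python 3.11 or newer)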
pip package manager
# Clone the repository
git clone https://github.com/cazy8/AI-Based-Network-Intrusion-Detection-System.git
cd AI-Based-Network-Intrusion-Detection-System
# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate # Linux/Mac
# venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Run the Streamlit dashboard
streamlit run nids_main_csv.py
Then open your browser to http://localhost:8501 and:
Upload a CIC-IDS CSV file via the sidebar (or use the included sample_dataset.csv)
Explore your data in the Data Explorer tab: view quality report, statistics, distributions
Configure model parameters in the sidebar: algorithm, split ratio, estimators, depth
Train the model by clicking "Train Model" and watch the progress bar
Analyze performance metrics: accuracy, F1, confusion matrix, ROC/PR curves
Simulate live traffic detection with confidence scores and risk levels
Export the trained model and prediction logs for production use
This project works with CIC-IDS2017 and CIC-IDS2018 datasets from the Canadian Institute for Cybersecurity. A synthetic sample_dataset.csv (5,000 rows) is included in the repo for quick testing.
| Dataset | Link | Size |
| --- | --- | --- |
| CIC-IDS2017 | Download | ~6 GB |
| CIC-IDS2018 | Download | ~16 GB |
| Sample (included) | sample_dataset.csv | 5,000 rows |
Key Features Used from Dataset
| Feature | Description |
| --- | --- |
| Destination Port | Target service port (80=HTTP, 443=HTTPS, 22=SSH) |
| Flow Duration | Total duration of the network flow (μs) |
| Total Fwd/Bwd Packets | Packet count in forward and backward direction |
| Packet Length Mean/Max/Std | Statistical properties of packet sizes |
| Flow Bytes/s | Data transfer rate |
| Flow Packets/s | Packet transmission rate |
| Flow IAT Mean/Std | Inter-arrival time between packets |
| SYN/ACK/PSH/RST/URG Flag Count | TCP flag statistics (key attack indicators) |
| Active/Idle Mean | Connection activity patterns |
| Init Win Bytes | Initial TCP window size (forward/backward) |
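Selecting these features from an uploaded file can be sketched as follows. Published CIC-IDS2017 CSVs are known to carry stray leading spaces in their header names (for example " Destination Port"), so the sketch strips them before looking columns up. The inline two-row CSV and the three-feature subset are made up for illustration.

```python
import io

import pandas as pd

# Two-row stand-in for an uploaded CSV; note the stray spaces in the header
raw_csv = io.StringIO(
    " Destination Port, Flow Duration,Flow Bytes/s, Label\n"
    "80,1200,3500.5,BENIGN\n"
    "22,50,120.0,SSH-Patator\n"
)

df = pd.read_csv(raw_csv)

# Normalize header names so features can be looked up by their documented names
df.columns = df.columns.str.strip()

# Hypothetical subset of the features listed above
wanted = ["Destination Port", "Flow Duration", "Flow Bytes/s"]
available = [name for name in wanted if name in df.columns]

# Coerce to numeric so that unparseable cells become NaN instead of raising
features = df[available].apply(pd.to_numeric, errors="coerce")
labels = df["Label"]
print(list(features.columns), labels.tolist())
```

Filtering against `df.columns` lets the same code tolerate files that lack some of the expected features.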
| Algorithm | Type | Strengths |
| --- | --- | --- |
| Random Forest | Ensemble (Bagging) | Robust, handles noise well, provides feature importances, parallelizable |
| XGBoost | Ensemble (Boosting) | High accuracy, built-in regularization (L1/L2), handles imbalanced data |
| Decision Tree | Single Tree | Fast, interpretable, good baseline model |
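One way the choice between these three algorithms could be wired to sidebar settings is a small factory function. This is a sketch under assumptions: `build_model` and its parameter defaults are hypothetical, and the XGBoost import is deferred so the example still runs where that package is not installed.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier


def build_model(name, n_estimators=100, max_depth=None, random_state=42):
    """Return an untrained classifier for the chosen algorithm (hypothetical helper)."""
    if name == "Random Forest":
        # n_jobs=-1 trains the trees in parallel on all CPU cores
        return RandomForestClassifier(
            n_estimators=n_estimators, max_depth=max_depth,
            n_jobs=-1, random_state=random_state,
        )
    if name == "Decision Tree":
        return DecisionTreeClassifier(max_depth=max_depth, random_state=random_state)
    if name == "XGBoost":
        # Imported lazily so the sketch still runs where xgboost is not installed
        from xgboost import XGBClassifier
        return XGBClassifier(
            n_estimators=n_estimators, max_depth=max_depth, random_state=random_state
        )
    raise ValueError(f"unknown algorithm: {name}")


model = build_model("Random Forest", n_estimators=10, max_depth=5)
model.fit([[0, 0], [1, 1], [0, 1], [1, 0]], [0, 1, 0, 1])
print(type(model).__name__, len(model.estimators_))
```

All three classifiers expose `fit`, `predict`, and `predict_proba`, so the rest of a pipeline can stay the same whichever one is selected.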
| Technique | Purpose |
| --- | --- |
| LabelEncoder | Converts text labels ("BENIGN", "DDoS") → numeric (0, 1, 2...) |
| StandardScaler | Normalizes features to zero mean and unit variance, so that features with large ranges do not dominate |
| Inf/NaN Replacement | Replaces infinity values and missing data with 0 for model stability |
| Stratified Train/Test Split | Ensures each class (BENIGN, DDoS, etc.) has proportional representation in both train and test sets |
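The four techniques above can be chained as in the sketch below, which runs on a small synthetic frame. The order of the steps and the decision to fit the scaler on the training split only are assumptions made for the example; the README does not state how the application sequences them.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Small synthetic frame standing in for cleaned CIC-IDS features
df = pd.DataFrame({
    "Flow Duration": [10.0, 2000.0, 35.0, 1800.0, 12.0, 2100.0, 40.0, 1750.0],
    "Flow Bytes/s": [np.inf, 150.0, np.nan, 170.0, 900.0, 160.0, 950.0, 155.0],
    "Label": ["BENIGN", "DDoS"] * 4,
})

# 1. Replace infinities and missing values with 0
X = df.drop(columns="Label").replace([np.inf, -np.inf], np.nan).fillna(0.0)

# 2. Encode text labels as integers (classes are sorted alphabetically)
encoder = LabelEncoder()
y = encoder.fit_transform(df["Label"])

# 3. Stratified split keeps the BENIGN/DDoS ratio in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# 4. Fit the scaler on training data only, then reuse it for the test data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(list(encoder.classes_), np.bincount(y_train), np.bincount(y_test))
```

`encoder.inverse_transform` maps numeric predictions back to labels such as "DDoS", and the same fitted `scaler` has to be applied to any traffic scored later.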
| Metric | What It Measures |
| --- | --- |
| Accuracy | % of all predictions that are correct |
| F1 Score | Harmonic mean of precision & recall (balanced metric) |
| Precision | Of all predicted attacks, how many are real attacks? (low precision means many false positives) |
| Recall | Of all real attacks, how many did the model catch? (low recall means many false negatives, i.e. missed attacks) |
| ROC-AUC | Model's ability to distinguish between classes across all thresholds |
| Confusion Matrix | Detailed breakdown of correct vs incorrect predictions per class |
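A worked example may help relate these metrics to each other. The sketch below computes them from binary confusion-matrix counts, treating "attack" as the positive class. The counts are invented for illustration, and for multi-class output the application would need some form of averaging, which is not specified here.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute the headline metrics from confusion-matrix counts (attack = positive)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 80 attacks caught, 20 false alarms,
# 10 attacks missed, 890 benign flows correctly passed
m = binary_metrics(tp=80, fp=20, fn=10, tn=890)
print({name: round(value, 3) for name, value in m.items()})
```

With these counts accuracy is 0.97 even though one attack in nine is missed (recall of about 0.889), which is why precision, recall, and F1 are reported alongside accuracy on imbalanced traffic.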
AI-Based-Network-Intrusion-Detection-System/
├── nids_main_csv.py            # Main Streamlit application (500+ lines)
├── sample_dataset.csv          # Synthetic CIC-IDS test data (5,000 rows)
├── requirements.txt            # Python dependencies with version pins
├── .gitignore                  # Git ignore rules
├── LICENSE                     # MIT License
├── README.md                   # Project documentation (this file)
└── screenshots/                # App screenshots
    ├── start.png               # Landing page
    ├── csv.png                 # CSV upload view
    ├── dataset_preview.png     # Data explorer
    ├── model_training.png      # Training interface
    ├── performance_matrix.png  # Metrics dashboard
    └── intrusion_detected.png  # Alert screen
This project is licensed under the MIT License; see the LICENSE file for details.
Built with ❤️ using Python & Streamlit
⭐ Star this repo if you found it helpful!