Advanced network traffic forecasting framework using SARIMA time series models on CESNET-TimeSeries-2023-2024 dataset. Includes automated retraining, comprehensive evaluation metrics (RMSE, SMAPE, R²), and production-ready HPC batch processing scripts.

KUNALSHAWW/TimeSeries-NetTraffic-Engine

🌐 TimeSeries-NetTraffic-Engine

Advanced Network Traffic Time Series Forecasting & Analysis Framework

Intelligent network traffic forecasting using state-of-the-art time series analysis on the CESNET-TimeSeries-2023-2024 dataset

Features • Quick Start • Documentation • Dataset • Examples • Contributing


🎯 Overview

TimeSeries-NetTraffic-Engine is a production-ready, enterprise-grade framework for network traffic forecasting and analysis using advanced time series modeling techniques. Built on top of the comprehensive CESNET-TimeSeries-2023-2024 dataset, this framework leverages SARIMA (Seasonal AutoRegressive Integrated Moving Average) models to predict network behavior patterns with high accuracy.

🎓 What is This Project About?

This framework provides researchers, network engineers, and data scientists with powerful tools to:

  • Forecast Network Traffic: Predict future network behavior using historical patterns
  • Analyze Time Series: Understand temporal patterns in network metrics across different aggregation levels
  • Evaluate Performance: Comprehensive evaluation using RMSE, SMAPE, and R² metrics
  • Scale Analysis: Process multiple IP addresses, institutions, and subnets simultaneously
  • Automated Retraining: Implement sliding window approaches for continuous model improvement

🏒 Real-World Applications

  • Network Capacity Planning: Predict bandwidth requirements and optimize infrastructure
  • Anomaly Detection: Identify unusual traffic patterns by comparing predictions with actual values
  • Resource Optimization: Allocate network resources efficiently based on forecasted demand
  • Security Analytics: Detect potential DDoS attacks or unusual traffic patterns
  • SLA Management: Ensure service level agreements through predictive maintenance

✨ Features

🚀 Core Capabilities

  • 📊 Multi-Scale Analysis: Support for 10-minute, 1-hour, and 1-day aggregation intervals
  • 🔄 Automated Retraining: Sliding window approach with configurable training and testing periods
  • 📈 18 Network Metrics: Comprehensive coverage including flows, packets, bytes, ASN diversity, port diversity, and TCP/UDP ratios
  • 🎯 High-Performance Forecasting: SARIMA model with optimized hyperparameters
  • 📉 Missing Value Handling: Intelligent gap-filling strategies for time series continuity
  • 🔍 Multi-Dataset Support: Works with IP addresses, institutions, and institution subnets
  • 📊 Visualization Suite: Rich plotting capabilities for exploratory data analysis
  • ⚡ Parallel Processing: Metacentrum scripts for large-scale batch processing

🛠️ Technical Features

  • Reproducible Research: Clear documentation of all preprocessing and modeling steps
  • Scalable Architecture: Designed for processing thousands of time series
  • Flexible Configuration: Easy customization of model parameters and evaluation settings
  • Production-Ready Code: Clean, well-documented, and maintainable codebase
  • Comprehensive Evaluation: Multiple statistical metrics for robust performance assessment

🏗️ Architecture

┌──────────────────────────────────────────────────────────────┐
│                    CESNET Time Series Dataset                │
│  ┌────────────┐  ┌─────────────┐  ┌──────────────────────┐   │
│  │IP Addresses│  │ Institutions│  │ Institution Subnets  │   │
│  └────────────┘  └─────────────┘  └──────────────────────┘   │
└────────────┬─────────────────────────────────────────────────┘
             │
             ▼
┌──────────────────────────────────────────────────────────────┐
│              Data Preprocessing & Gap Filling                │
│  • Missing value imputation                                  │
│  • Ratio metrics normalization (0.5)                         │
│  • Temporal alignment                                        │
└────────────┬─────────────────────────────────────────────────┘
             │
             ▼
┌──────────────────────────────────────────────────────────────┐
│                   SARIMA Modeling Engine                     │
│  Order: (p=1, d=1, q=1)                                      │
│  Seasonal Order: (P=1, D=1, Q=1, M=168)                      │
│  Training: 744 hours (31 days)                               │
│  Testing: 168 hours (7 days)                                 │
└────────────┬─────────────────────────────────────────────────┘
             │
             ▼
┌──────────────────────────────────────────────────────────────┐
│              Sliding Window Retraining Loop                  │
│  • Train on historical window                                │
│  • Forecast next period                                      │
│  • Slide window forward                                      │
│  • Repeat until dataset exhausted                            │
└────────────┬─────────────────────────────────────────────────┘
             │
             ▼
┌──────────────────────────────────────────────────────────────┐
│              Evaluation & Analysis                           │
│  • RMSE (Root Mean Squared Error)                            │
│  • SMAPE (Symmetric Mean Absolute Percentage Error)          │
│  • R² Score (Coefficient of Determination)                   │
│  • Statistical distributions                                 │
│  • Aggregate statistics                                      │
└──────────────────────────────────────────────────────────────┘
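
The retraining loop in the diagram can be sketched as an index generator. This is a minimal illustration, not the code from `sarima_retraining.py`: the function name `sliding_windows` is hypothetical, the window is assumed to keep a fixed length while it moves, and a final window that would run past the end of the series is assumed to be skipped. The series length of 2000 points is an arbitrary example value.

```python
def sliding_windows(n_points, train=744, test=168, stride=168):
    """Yield (train_start, train_end, test_end) index triples.

    The model is fit on [train_start, train_end) and forecasts
    [train_end, test_end). Windows that do not fit completely
    inside the series are skipped (an assumption of this sketch).
    """
    start = 0
    while start + train + test <= n_points:
        yield start, start + train, start + train + test
        start += stride


# Hypothetical series of 2000 hourly points
windows = list(sliding_windows(2000))
for train_start, train_end, test_end in windows:
    print(f"train [{train_start}, {train_end})  forecast [{train_end}, {test_end})")
```

Each triple can be used to slice a series for one fit-and-forecast round, for example `series[train_start:train_end]` for training.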

📊 Dataset

CESNET-TimeSeries-2023-2024 Dataset Structure

The framework works with the comprehensive CESNET network traffic dataset containing:

📁 Dataset Parts

| Dataset Part | Description | Granularity |
|---|---|---|
| IP Addresses (Sample) | Representative sample of individual IP addresses | Individual hosts |
| IP Addresses (Full) | Complete set of monitored IP addresses | Individual hosts |
| Institutions | Aggregated traffic per institution | Organizational level |
| Institution Subnets | Traffic per institution subnet | Network segment level |

⏰ Temporal Aggregations

  • 10 Minutes: High-resolution, short-term pattern analysis
  • 1 Hour: Medium-resolution, ideal for daily pattern detection
  • 1 Day: Low-resolution, long-term trend analysis

📈 Network Metrics (18 Total)

| Category | Metrics |
|---|---|
| Volume | n_flows, n_packets, n_bytes |
| ASN Diversity | sum_n_dest_asn, average_n_dest_asn, std_n_dest_asn |
| Port Diversity | sum_n_dest_ports, average_n_dest_ports, std_n_dest_ports |
| IP Diversity | sum_n_dest_ip, average_n_dest_ip, std_n_dest_ip |
| Protocol Ratios | tcp_udp_ratio_packets, tcp_udp_ratio_bytes |
| Direction Ratios | dir_ratio_packets, dir_ratio_bytes |
| Flow Characteristics | avg_duration, avg_ttl |

💻 Installation

Prerequisites

  • Python: 3.10.12 or higher (note that the pinned numpy 1.24.4 supports Python up to 3.11)
  • pip: Latest version
  • Operating System: Linux, macOS, or Windows
  • RAM: Minimum 8GB (16GB recommended for large-scale analysis)

Quick Installation

# Clone the repository
git clone https://github.com/KUNALSHAWW/TimeSeries-NetTraffic-Engine.git
cd TimeSeries-NetTraffic-Engine

# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install required dependencies
pip install pandas==2.2.2 \
            numpy==1.24.4 \
            matplotlib==3.8.0 \
            scikit-learn==1.5.0 \
            statsmodels==0.14.1 \
            seaborn==0.13.0

Alternative: Requirements File

# Create requirements.txt
cat > requirements.txt << EOF
pandas==2.2.2
numpy==1.24.4
matplotlib==3.8.0
scikit-learn==1.5.0
statsmodels==0.14.1
seaborn==0.13.0
EOF

# Install from requirements
pip install -r requirements.txt

🚀 Quick Start

1️⃣ Basic Example - Explore Time Series

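import pandas as pd
import matplotlib.pyplot as plt

# Load the timestamp lookup table (maps id_time to a timestamp)
df_times = pd.read_csv('cesnet-time-series-2023-2024/times/times_1_hour.csv')
df_times['time'] = pd.to_datetime(df_times['time'])

# Load network traffic data for one IP address
df = pd.read_csv('cesnet-time-series-2023-2024/ip_addresses_sample/agg_1_hour/1/103.csv')

# Attach timestamps by id_time. The traffic file can skip intervals with no data,
# so plotting it directly against the full timestamp column may fail on a length mismatch.
df = df.merge(df_times[['id_time', 'time']], on='id_time', how='left')

# Visualize n_flows metric
plt.figure(figsize=(15, 5))
plt.plot(df['time'], df['n_flows'])
plt.title('Network Flows Over Time')
plt.xlabel('Time')
plt.ylabel('Number of Flows')
plt.show()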

2️⃣ Jupyter Notebook Exploration

Launch the interactive example notebook:

jupyter notebook example.ipynb

This notebook provides:

  • ✅ Dataset loading and preprocessing
  • ✅ Time series visualization
  • ✅ SARIMA model training
  • ✅ Forecasting and evaluation

3️⃣ Command-Line SARIMA Retraining

python sarima_retraining.py \
    -p 1 -d 1 -q 1 \
    -P 1 -D 1 -Q 1 -M 168 \
    -t 744 -T 168 \
    --dataset ip_addresses_sample \
    --aggregation agg_1_hour \
    --metric n_flows \
    --id_ip 1/103.csv

Parameters Explained:

  • -p, -d, -q: ARIMA order (p=AR order, d=differencing, q=MA order)
  • -P, -D, -Q, -M: Seasonal ARIMA order (M=seasonal period)
  • -t: Training period (744 hours = 31 days)
  • -T: Testing period (168 hours = 7 days)
  • --dataset: Dataset part to use
  • --aggregation: Temporal aggregation level
  • --metric: Network metric to forecast
  • --id_ip: Specific IP/entity identifier
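
As an illustration of how these flags fit together, the sketch below defines an `argparse` parser with the same option names. It is a hypothetical reconstruction, not the parser from `sarima_retraining.py`: the defaults are simply the values from the command above, and `build_parser` is a name introduced here.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(description="SARIMA retraining (sketch)")
    # Non-seasonal ARIMA order
    parser.add_argument("-p", type=int, default=1, help="AR order")
    parser.add_argument("-d", type=int, default=1, help="differencing order")
    parser.add_argument("-q", type=int, default=1, help="MA order")
    # Seasonal order (argparse option names are case-sensitive)
    parser.add_argument("-P", type=int, default=1, help="seasonal AR order")
    parser.add_argument("-D", type=int, default=1, help="seasonal differencing")
    parser.add_argument("-Q", type=int, default=1, help="seasonal MA order")
    parser.add_argument("-M", type=int, default=168, help="seasonal period")
    # Window sizes, in aggregation steps
    parser.add_argument("-t", type=int, default=744, help="training period")
    parser.add_argument("-T", type=int, default=168, help="testing period")
    # Data selection
    parser.add_argument("--dataset", default="ip_addresses_sample")
    parser.add_argument("--aggregation", default="agg_1_hour")
    parser.add_argument("--metric", default="n_flows")
    parser.add_argument("--id_ip", required=True)
    return parser


args = build_parser().parse_args(["-p", "2", "-M", "24", "--id_ip", "1/103.csv"])
order = (args.p, args.d, args.q)
seasonal_order = (args.P, args.D, args.Q, args.M)
print(order, seasonal_order, args.t, args.T)
```

The two tuples have the shape that `SARIMAX` expects for its `order` and `seasonal_order` arguments.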

📚 Usage Examples

Example 1: Batch Processing with Shell Scripts

For processing multiple time series in parallel on HPC clusters:

# Process all IP addresses in sample dataset
./metacentrum_scripts/sarima_retraining_ip_addresses_sample.sh

# Process all institutions
./metacentrum_scripts/sarima_retraining_institutions.sh

# Process all institution subnets
./metacentrum_scripts/sarima_retraining_institution_subnets.sh
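
The contents of the Metacentrum scripts are not reproduced here. As a rough idea of what a batch driver has to produce, the sketch below builds one `sarima_retraining.py` command line per series from the flags documented in the Quick Start. The helper `build_command` and the second identifier `1/104.csv` are hypothetical.

```python
import shlex


def build_command(id_ip, metric="n_flows", dataset="ip_addresses_sample",
                  aggregation="agg_1_hour"):
    """Return the argument list for one sarima_retraining.py run."""
    return [
        "python", "sarima_retraining.py",
        "-p", "1", "-d", "1", "-q", "1",
        "-P", "1", "-D", "1", "-Q", "1", "-M", "168",
        "-t", "744", "-T", "168",
        "--dataset", dataset,
        "--aggregation", aggregation,
        "--metric", metric,
        "--id_ip", id_ip,
    ]


# Hypothetical identifiers; in practice they would be listed from the dataset directory
series_ids = ["1/103.csv", "1/104.csv"]
commands = [shlex.join(build_command(series_id)) for series_id in series_ids]
for command in commands:
    print(command)
```

Each printed line could be submitted as a separate cluster job or passed to `subprocess.run` as a list.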

Example 2: Custom Time Series Analysis

from statsmodels.tsa.statespace.sarimax import SARIMAX
from sklearn.metrics import root_mean_squared_error  # available in scikit-learn >= 1.4
import pandas as pd

# Configuration
ORDER = (1, 1, 1)
SEASONAL_ORDER = (1, 1, 1, 168)
TRAINING_PERIOD = 744
TESTING_PERIOD = 168

# Load data and split it into a training window and a hold-out window
df = pd.read_csv('your_time_series.csv')
train_data = df['n_flows'][:TRAINING_PERIOD]
test_data = df['n_flows'][TRAINING_PERIOD:TRAINING_PERIOD + TESTING_PERIOD]

# Train SARIMA model
model = SARIMAX(train_data, order=ORDER, seasonal_order=SEASONAL_ORDER)
results = model.fit(disp=False)

# Forecast the hold-out window
forecast = results.forecast(steps=TESTING_PERIOD)

# Evaluate
rmse = root_mean_squared_error(test_data, forecast)
print(f'RMSE: {rmse:.2f}')

Example 3: Visualizing Multiple Metrics

import matplotlib.pyplot as plt

# df must contain a 'time' column. The raw traffic files only carry id_time,
# so join them with the matching times file on id_time first.
metrics = ['n_flows', 'n_packets', 'n_bytes']
fig, axes = plt.subplots(len(metrics), 1, figsize=(15, 12), sharex=True)

for idx, metric in enumerate(metrics):
    axes[idx].plot(df['time'], df[metric])
    axes[idx].set_title(f'{metric} Over Time')
    axes[idx].set_ylabel(metric)
    axes[idx].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

⚙️ Model Configuration

SARIMA Hyperparameters

The default configuration is optimized for hourly network traffic data:

{
    "order": (1, 1, 1),           # (p, d, q) - ARIMA order
    "seasonal_order": (1, 1, 1, 168),  # (P, D, Q, M) - Seasonal ARIMA
    "training_period": 744,        # 31 days in hours
    "testing_period": 168,         # 7 days in hours
    "retraining_stride": 168       # Retrain every 7 days
}

Hyperparameter Tuning Guide

| Parameter | Description | Typical Range | Notes |
|---|---|---|---|
| p | AR order | 0-5 | Number of lag observations |
| d | Differencing order | 0-2 | Number of differences for stationarity |
| q | MA order | 0-5 | Size of moving average window |
| P | Seasonal AR order | 0-2 | Seasonal autoregressive order |
| D | Seasonal differencing | 0-1 | Seasonal differencing degree |
| Q | Seasonal MA order | 0-2 | Seasonal moving average order |
| M | Seasonal period | 24, 168, 8760 | Hours in day/week/year (for 1-hour aggregation) |
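
The seasonal period M is counted in aggregation steps, so the values 24 and 168 only apply to hourly data. The sketch below derives M for other intervals. The directory-style names `agg_10_minutes` and `agg_1_day` are assumed by analogy with `agg_1_hour`, and `seasonal_period` is a hypothetical helper.

```python
# Length of each aggregation interval in minutes (names partly assumed, see above)
AGGREGATION_MINUTES = {"agg_10_minutes": 10, "agg_1_hour": 60, "agg_1_day": 24 * 60}

# Length of each seasonal cycle in minutes
CYCLE_MINUTES = {"day": 24 * 60, "week": 7 * 24 * 60}


def seasonal_period(aggregation, cycle):
    """Return the number of aggregation steps in one seasonal cycle."""
    steps = CYCLE_MINUTES[cycle] // AGGREGATION_MINUTES[aggregation]
    if steps < 2:
        raise ValueError(f"{cycle!r} cycle is too short for {aggregation!r}")
    return steps


for aggregation in AGGREGATION_MINUTES:
    print(aggregation, "weekly M =", seasonal_period(aggregation, "week"))
```

Large values of M, such as 1008 for weekly seasonality on 10-minute data, make SARIMA fitting considerably slower and more memory-hungry, which is worth keeping in mind when choosing the aggregation level.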

📊 Evaluation Metrics

1. Root Mean Squared Error (RMSE)

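Measures the typical magnitude of prediction errors as the square root of the mean squared error. It equals the standard deviation of the errors only when the forecast is unbiased.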

$$ \text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} $$

Lower is better • Sensitive to outliers • Same units as target variable
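
A dependency-free sketch of the RMSE formula, shown only to make the computation concrete. The pipeline itself can rely on `sklearn.metrics.root_mean_squared_error`, and the sample values below are made up.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


actual = [100, 200, 300]
close_forecast = [110, 190, 310]    # every error is 10
outlier_forecast = [100, 200, 330]  # one error of 30, same total absolute error

print(rmse(actual, close_forecast))
print(rmse(actual, outlier_forecast))
```

Both forecasts have the same total absolute error, but the second one scores worse because squaring weights the single large miss more heavily.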

2. Symmetric Mean Absolute Percentage Error (SMAPE)

Percentage-based metric treating over/under-estimation equally.

$$ \text{SMAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \frac{|y_i - \hat{y}_i|}{(|y_i| + |\hat{y}_i|)/2} $$

Range: 0-200% with this definition • 0% = perfect • Symmetric • Scale-independent
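
A minimal sketch of the SMAPE formula as written above, with the halved denominator. One detail the formula leaves open is a point where both the actual and the predicted value are 0; this sketch assumes such a point contributes 0 to the sum. The sample values are made up.

```python
def smape(y_true, y_pred):
    """SMAPE in percent, using the (|y| + |y_hat|) / 2 denominator."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        denominator = (abs(actual) + abs(predicted)) / 2
        # Assumption: a 0-versus-0 point counts as a perfect match
        if denominator:
            total += abs(actual - predicted) / denominator
    return 100.0 * total / len(y_true)


print(smape([100, 200], [100, 200]))  # identical series
print(smape([100], [0]))              # worst case for a single point
```

With the halved denominator, a forecast of 0 against a non-zero actual value yields 200 for that point, so the metric is bounded by 200 rather than 100.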

3. R² Score (Coefficient of Determination)

Proportion of variance in the target variable explained by the model.

$$ R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} $$

Range: -∞ to 1 • 1 = perfect prediction • 0 = no better than always predicting the mean • Negative = worse than predicting the mean
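
The three reference points of the R² scale can be checked with a short sketch. The name `r_squared` is chosen here to avoid confusion with `sklearn.metrics.r2_score`, which computes the same quantity, and the sample values are made up.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    ss_tot = sum((a - mean_true) ** 2 for a in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1 - ss_res / ss_tot


actual = [1, 2, 3]
print(r_squared(actual, [1, 2, 3]))  # perfect forecast
print(r_squared(actual, [2, 2, 2]))  # always predicts the mean of the actual values
print(r_squared(actual, [3, 2, 1]))  # worse than predicting the mean
```

The middle case is the baseline that the score is measured against: a forecast only earns a positive R² if it beats a constant prediction at the mean of the observed values.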


📁 Project Structure

TimeSeries-NetTraffic-Engine/
│
├── 📓 example.ipynb                    # Interactive tutorial notebook
├── 📓 analyze-results.ipynb            # Results analysis and visualization
├── 🐍 sarima_retraining.py            # CLI tool for SARIMA retraining
│
├── 📁 metacentrum_scripts/            # HPC batch processing scripts
│   ├── run_ip_addresses_sample.sh
│   ├── run_institutions.sh
│   ├── run_institution_subnets.sh
│   ├── sarima_retraining_ip_addresses_sample.sh
│   ├── sarima_retraining_institutions.sh
│   └── sarima_retraining_institution_subnets.sh
│
├── 📁 cesnet-time-series-2023-2024/   # Dataset directory (not included)
│   ├── times/                         # Timestamp files
│   ├── ip_addresses_sample/           # Sample IP dataset
│   ├── ip_addresses_full/             # Full IP dataset
│   ├── institutions/                  # Institution-level data
│   └── institution_subnets/           # Subnet-level data
│
├── 📁 results/                        # Output directory for predictions
│   └── sarima-retraining/
│       └── results/
│
├── 📄 LICENSE                         # BSD 3-Clause License
└── 📄 README.md                       # This file

🔬 Advanced Usage

Custom Preprocessing Pipeline

import pandas as pd


def custom_fill_missing(train_df, train_time_ids, strategy='mean'):
    """
    Insert rows for missing time IDs and fill each column with a constant.

    Args:
        train_df: Training dataframe with an id_time column
        train_time_ids: Series of expected time IDs
        strategy: 'mean', 'median', or 'zero'
    """
    if strategy not in ('mean', 'median', 'zero'):
        raise ValueError(f"Unsupported strategy: {strategy}")

    # One new row per expected time ID that is absent from the training data
    missing_ids = train_time_ids[~train_time_ids.isin(train_df.id_time)].values
    df_missing = pd.DataFrame({"id_time": missing_ids})

    for column in train_df.columns:
        if column == "id_time":
            continue

        if strategy == 'mean':
            df_missing[column] = train_df[column].mean()
        elif strategy == 'median':
            df_missing[column] = train_df[column].median()
        elif strategy == 'zero':
            df_missing[column] = 0
        # Add more strategies as needed

    # Merge, restore chronological order, and keep the original column order
    combined = pd.concat([train_df, df_missing]).sort_values(by="id_time")
    return combined.reset_index(drop=True)[train_df.columns]

Multi-Metric Forecasting

import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Forecast several volume metrics for a single time series
# (df is a traffic dataframe loaded as in the Quick Start)
metrics = ['n_flows', 'n_packets', 'n_bytes']
predictions = {}

for metric in metrics:
    model = SARIMAX(df[metric], order=(1, 1, 1), seasonal_order=(1, 1, 1, 168))
    results = model.fit(disp=False)
    predictions[metric] = results.forecast(steps=168)

# Create prediction dataframe (one column per metric)
predictions_df = pd.DataFrame(predictions)

Parallel Processing with Joblib

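from glob import glob

import pandas as pd
from joblib import Parallel, delayed  # installed as a scikit-learn dependency
from statsmodels.tsa.statespace.sarimax import SARIMAX

def process_time_series(file_path, metric):
    """Fit SARIMA on the first 744 points of one series and forecast 168 steps."""
    df = pd.read_csv(file_path)
    # Same configuration as Example 2; replace with your own training logic
    model = SARIMAX(df[metric][:744], order=(1, 1, 1), seasonal_order=(1, 1, 1, 168))
    return model.fit(disp=False).forecast(steps=168)

# Collect the files to process (assumed layout; adjust the pattern to your dataset location)
file_list = sorted(glob('cesnet-time-series-2023-2024/ip_addresses_sample/agg_1_hour/*/*.csv'))

# Process multiple files in parallel, one worker per CPU core
results = Parallel(n_jobs=-1)(
    delayed(process_time_series)(file, 'n_flows')
    for file in file_list
)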

📈 Performance & Benchmarks

Computational Requirements

| Operation | Time (Avg) | Memory | Notes |
|---|---|---|---|
| Load 1-hour dataset | ~2 seconds | 50 MB | Per IP address |
| SARIMA training (744 points) | ~5-10 seconds | 200 MB | Single metric |
| Forecast (168 points) | ~1 second | 50 MB | Using fitted model |
| Complete retraining cycle | ~2-5 minutes | 500 MB | Full year, single metric |

Scalability

  • Single IP Address: ~5 minutes for full analysis (all metrics)
  • 100 IP Addresses: ~8 hours (with parallel processing)
  • 1000 IP Addresses: ~3 days (recommended: HPC cluster)

Recommended Hardware

| Scale | CPU | RAM | Storage |
|---|---|---|---|
| Small (< 100 time series) | 4 cores | 8 GB | 10 GB |
| Medium (100-1000 time series) | 16 cores | 32 GB | 50 GB |
| Large (1000+ time series) | 32+ cores | 64+ GB | 200+ GB |

🀝 Contributing

We welcome contributions from the community! Here's how you can help:

Ways to Contribute

  1. 🐛 Report Bugs: Open an issue with detailed reproduction steps
  2. 💡 Suggest Features: Share your ideas for improvements
  3. 📝 Improve Documentation: Help make our docs clearer
  4. 🔧 Submit Pull Requests: Contribute code improvements

Development Setup

# Fork and clone the repository
git clone https://github.com/KUNALSHAWW/TimeSeries-NetTraffic-Engine.git
cd TimeSeries-NetTraffic-Engine

# Create a development branch
git checkout -b feature/your-feature-name

# Make your changes and test thoroughly
# ...

# Commit with clear messages
git commit -m "Add: Description of your changes"

# Push to your fork
git push origin feature/your-feature-name

# Open a Pull Request on GitHub

Code Style Guidelines

  • Follow PEP 8 for Python code
  • Add docstrings to all functions
  • Include type hints where applicable
  • Write unit tests for new features
  • Update documentation for API changes

📄 License

This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.

Copyright (c) 2024, CESNET
All rights reserved.

Third-Party Licenses

  • pandas: BSD 3-Clause License
  • NumPy: BSD License
  • scikit-learn: BSD 3-Clause License
  • statsmodels: BSD License
  • matplotlib: PSF License

🙏 Acknowledgments

CESNET

Special thanks to CESNET for providing the comprehensive CESNET-TimeSeries-2023-2024 dataset, which makes this research and development possible.

Research Foundation

This work builds upon established research in:

  • Time series forecasting
  • Network traffic analysis
  • SARIMA modeling
  • Statistical learning

📖 Citation

If you use this framework in your research, please cite:

@software{timeseries_nettraffic_engine,
  title = {TimeSeries-NetTraffic-Engine: Advanced Network Traffic Time Series Forecasting Framework},
  author = {Kunal Shaw},
  year = {2024},
  url = {https://github.com/KUNALSHAWW/TimeSeries-NetTraffic-Engine},
  note = {Based on CESNET-TimeSeries-2023-2024 dataset}
}

🗺️ Roadmap

Current Version (v1.0)

  • ✅ SARIMA forecasting implementation
  • ✅ Multi-dataset support
  • ✅ Comprehensive evaluation metrics
  • ✅ Jupyter notebooks for exploration
  • ✅ HPC batch processing scripts

Upcoming Features (v2.0)

  • 🔄 LSTM/GRU deep learning models
  • 🔄 Prophet integration
  • 🔄 Real-time forecasting API
  • 🔄 Web-based visualization dashboard
  • 🔄 Automated hyperparameter tuning
  • 🔄 Anomaly detection module

Future Vision (v3.0)

  • 🌟 Multi-variate forecasting
  • 🌟 Ensemble methods
  • 🌟 Transfer learning across datasets
  • 🌟 Edge deployment capabilities
  • 🌟 Integration with network monitoring tools

⭐ Star History

If you find this project useful, please consider giving it a star ⭐ on GitHub!


Made with ❤️ by Data Scientists, for Data Scientists
