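This repository implements an integrated preprocessing, training, and evaluation pipeline for encrypted malicious traffic classification.
- Fusion model: MobileViT + CharBERT + Attention
- Extended model: Attention + Stacking
- Datasets: USTC-TFC2016, CICAndMal2017 (5 major classes), MFCP
- Constraints: always use three-channel RGB images, never grayscale; train and validate USTC and CIC separately
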
FusionModel/
├─ SourceData/
│  ├─ USTC-TFC2016/
│  ├─ CICAndMal2017/
│  └─ MFCP/
├─ configs/
│  ├─ dataset_profiles.yaml
│  └─ train_profiles.yaml
├─ src/
│  ├─ pipeline/
│  │  ├─ dataset_builder.py
│  │  ├─ pcap_session.py
│  │  ├─ feature_rgb.py
│  │  └─ split_audit.py
│  └─ fusion/
│     ├─ fusion_common.py
│     ├─ train_fusion_attention.py
│     ├─ train_fusion_attention_stacking.py
│     └─ run_attention_suite.py
├─ dataset/
└─ outputs/

cd /home/shuora/Repositories/Traffic/FusionModel
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt

Place the raw data under SourceData/:
- USTC: SourceData/USTC-TFC2016/*.pcap
- CIC: SourceData/CICAndMal2017/<Major>/<Subclass>/*.pcap
- MFCP: SourceData/MFCP/<Family>/**/*.pcap

Note: the CIC major classes are organized as Adware/Benign/Ransomware/SMSMalware/Scareware.
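
Before running the preprocessing step, it can help to confirm that the CIC directory layout matches the pattern described above. The following is a minimal sketch, not part of the repository: `count_cic_pcaps` and `CIC_MAJOR_CLASSES` are hypothetical names, and the only assumption taken from this README is the `<Major>/<Subclass>/*.pcap` layout.

```python
from pathlib import Path

# Hypothetical helper (not part of the repository). The class names come from
# the note above; the layout is SourceData/CICAndMal2017/<Major>/<Subclass>/*.pcap.
CIC_MAJOR_CLASSES = ["Adware", "Benign", "Ransomware", "SMSMalware", "Scareware"]


def count_cic_pcaps(source_root):
    """Return {major_class: number of .pcap files}; missing classes map to 0."""
    cic_root = Path(source_root) / "CICAndMal2017"
    counts = {}
    for major in CIC_MAJOR_CLASSES:
        major_dir = cic_root / major
        if not major_dir.is_dir():
            counts[major] = 0
            continue
        # Exactly one directory level (<Subclass>) between the major class and the pcaps.
        counts[major] = len(list(major_dir.glob("*/*.pcap")))
    return counts


if __name__ == "__main__":
    for name, n in count_cic_pcaps("SourceData").items():
        status = "ok" if n else "MISSING"
        print(f"{name:12s} {n:6d}  {status}")
```

A class reported as MISSING is not necessarily an error, but it is worth knowing about before building the dataset.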
General command:
python src/pipeline/dataset_builder.py --profile <profile_name>

Available profiles:
- ustc
- ustc_strict_nofallback
- ustc_strict_time80
- cic5_payload
- cic5_fullpacket
- cic4_fullpacket_l1024_hraw
- mfcp_payload

Common examples:
# USTC
python src/pipeline/dataset_builder.py --profile ustc
# CIC5 payload
python src/pipeline/dataset_builder.py --profile cic5_payload
# CIC5 full_packet
python src/pipeline/dataset_builder.py --profile cic5_fullpacket
# MFCP
python src/pipeline/dataset_builder.py --profile mfcp_payload

Rebuild options:
# Overwrite and rebuild existing image/bin files
python src/pipeline/dataset_builder.py --profile cic5_payload --overwrite
# Force a rebuild of the pcap index cache
python src/pipeline/dataset_builder.py --profile mfcp_payload --rebuild_index_cache

Preprocessing output directories:
- dataset/<name>/pcap_data/{Train,Test}/<class>/*.bin
- dataset/<name>/image_data/{Train,Test}/<class>/*.png
- dataset/<name>/reports/preprocess_summary_*.json
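
A quick sanity check after preprocessing is to count the files per split and class in both modalities. The sketch below relies only on the directory patterns listed above. The helper names are hypothetical, and the idea that `.bin` and `.png` counts should match per class is an assumption about the fusion setup, not something this README states.

```python
from pathlib import Path


def count_samples(dataset_dir, subdir, pattern):
    """Return {(split, class_name): file count} for one modality directory."""
    counts = {}
    base = Path(dataset_dir) / subdir
    for split in ("Train", "Test"):
        split_dir = base / split
        if not split_dir.is_dir():
            continue
        for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            counts[(split, class_dir.name)] = len(list(class_dir.glob(pattern)))
    return counts


def modality_mismatches(dataset_dir):
    """List (key, n_bin, n_png) where the two modalities disagree (assumed to be a problem)."""
    bins = count_samples(dataset_dir, "pcap_data", "*.bin")
    pngs = count_samples(dataset_dir, "image_data", "*.png")
    keys = sorted(set(bins) | set(pngs))
    return [(k, bins.get(k, 0), pngs.get(k, 0))
            for k in keys if bins.get(k, 0) != pngs.get(k, 0)]


if __name__ == "__main__":
    # "dataset/CIC5_payload" is an example path; substitute the dataset you built.
    for (split, cls), n_bin, n_png in modality_mismatches("dataset/CIC5_payload"):
        print(f"{split}/{cls}: {n_bin} bin vs {n_png} png")
```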
Run training with the profiles defined in configs/train_profiles.yaml:
# CIC5: attention + stacking
python src/fusion/run_attention_suite.py --profile cic5_balanced --mode all
# USTC: attention + stacking
python src/fusion/run_attention_suite.py --profile ustc_baseline --mode all
# MFCP: attention only
python src/fusion/run_attention_suite.py --profile mfcp_baseline --mode attention
# MFCP: attention + stacking only
python src/fusion/run_attention_suite.py --profile mfcp_baseline --mode attention_stacking

Optional archive options:
# Set the archive directory name
python src/fusion/run_attention_suite.py --profile ustc_baseline --mode all --archive_tag ustc_try_01
# Move files when archiving (the default is to copy)
python src/fusion/run_attention_suite.py --profile cic5_balanced --mode all --archive_move
# Disable automatic archiving
python src/fusion/run_attention_suite.py --profile cic5_balanced --mode all --no_archive

# Attention
python src/fusion/train_fusion_attention.py --dataset_name CIC5_payload --preset cic_balanced --batch_size 64 --num_workers 4 --prefetch_factor 2
# Attention + Stacking
python src/fusion/train_fusion_attention_stacking.py --dataset_name CIC5_payload --preset cic_balanced --batch_size 64 --num_workers 4 --prefetch_factor 2

Common dataset_name values:
- USTC-TFC2016
- CIC5_payload
- CIC5_fullpacket
- CIC4_fullpacket_l1024_hraw
- mfcp

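Results are written to outputs/ by default:
- outputs/logs/*.log: full training logs
- outputs/metrics_curve_*.png: per-epoch metric curves
- outputs/confusion_matrix_*.png: confusion matrices
- outputs/report_*.md: accuracy, macro-F1, and the classification report
- outputs/fusion_model_*.pth: fusion model weights
- outputs/meta_model_*.pkl: stacking meta-model
- outputs/archive/<timestamp>_<dataset>_<method>/: automatic archive directory
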
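- CharBERT fails to load (silent fallback is disabled): check that src/CharBERT/src is complete and importable.
- CIC is missing Benign: the cic5_* profiles process only the classes currently present; rerun after adding the missing data.
- num_workers: the default is 4. On Windows + CUDA it may be lowered to 0 automatically (to avoid WinError 1455); this fallback is not triggered on Ubuntu.
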
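
The num_workers fallback described above can be pictured roughly as follows. This is a sketch under stated assumptions, not the repository's implementation. `resolve_num_workers` and `loader_kwargs` are hypothetical names, and the exact trigger condition in the training scripts may differ. The one detail taken from PyTorch itself is that `DataLoader` rejects a `prefetch_factor` when `num_workers` is 0, so the sketch drops that key in the fallback case.

```python
import platform


def resolve_num_workers(requested=4, cuda_available=False, system=None):
    """Hypothetical rule: use 0 workers on Windows + CUDA, otherwise keep the request."""
    system = system or platform.system()
    if system == "Windows" and cuda_available:
        return 0
    return requested


def loader_kwargs(batch_size=64, num_workers=4, prefetch_factor=2,
                  cuda_available=False, system=None):
    """Build keyword arguments suitable for torch.utils.data.DataLoader."""
    workers = resolve_num_workers(num_workers, cuda_available, system)
    kwargs = {"batch_size": batch_size, "num_workers": workers}
    if workers > 0:
        # prefetch_factor is only valid when worker processes are used.
        kwargs["prefetch_factor"] = prefetch_factor
    return kwargs


if __name__ == "__main__":
    # In a real script, cuda_available would come from torch.cuda.is_available().
    print(loader_kwargs(cuda_available=True, system="Windows"))
    print(loader_kwargs(cuda_available=True, system="Linux"))
```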
cd /home/shuora/Repositories/Traffic/FusionModel
# Build the strict datasets
python src/pipeline/dataset_builder.py --profile ustc_strict_nofallback --dataset_root dataset
python src/pipeline/dataset_builder.py --profile ustc_strict_time80 --dataset_root dataset
# Check for train/test split leakage
python src/pipeline/split_audit.py --dataset_dir dataset/USTC-TFC2016 --output outputs/split_audit_USTC-TFC2016.json
python src/pipeline/split_audit.py --dataset_dir dataset/USTC-TFC2016-strict-nofallback --output outputs/split_audit_USTC-TFC2016-strict-nofallback.json
python src/pipeline/split_audit.py --dataset_dir dataset/USTC-TFC2016-strict-time80 --output outputs/split_audit_USTC-TFC2016-strict-time80.json
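# Fair comparison (same model and hyperparameters, different splits)
python src/fusion/run_attention_suite.py --profile ustc_baseline --mode attention --archive_tag ustc_baseline_splitcheck
python src/fusion/run_attention_suite.py --profile ustc_strict_time80_eval --mode attention --archive_tag ustc_strict_time80_splitcheck

Note: by default, ustc_strict_nofallback may produce Train-only classes. It is mainly intended as a counter-check that exposes how much the results depend on the split strategy, and it is not recommended as the source of the headline ACC/F1 results.
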
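
To give a sense of what a split-leakage check can look for, here is a minimal sketch that flags byte-identical `.bin` samples appearing in both Train and Test. It is not the logic of split_audit.py, whose actual criteria are not described here. The function names are hypothetical, and exact-duplicate detection is only the simplest form of leakage; samples that come from the same session or capture but differ in bytes would not be caught.

```python
import hashlib
from pathlib import Path


def file_digests(split_dir, pattern="*.bin"):
    """Map SHA-256 digest -> list of files with that content under split_dir."""
    digests = {}
    for path in sorted(Path(split_dir).rglob(pattern)):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        digests.setdefault(digest, []).append(path)
    return digests


def find_exact_leaks(dataset_dir):
    """Return {digest: (train_files, test_files)} for content present in both splits."""
    root = Path(dataset_dir) / "pcap_data"
    train = file_digests(root / "Train")
    test = file_digests(root / "Test")
    return {d: (train[d], test[d]) for d in train.keys() & test.keys()}


if __name__ == "__main__":
    leaks = find_exact_leaks("dataset/USTC-TFC2016")
    print(f"{len(leaks)} byte-identical sample(s) shared between Train and Test")
```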
cd /home/shuora/Repositories/Traffic/FusionModel
# Preprocess
python src/pipeline/dataset_builder.py --profile cic4_fullpacket_l1024_hraw --dataset_root dataset
# Train attention
python src/fusion/run_attention_suite.py --profile cic4_fullpacket_l1024_balanced --mode attention --archive_tag cic4_fp_l1024_attn
# Optional: attention + stacking
python src/fusion/run_attention_suite.py --profile cic4_fullpacket_l1024_balanced --mode attention_stacking --archive_tag cic4_fp_l1024_stack

cd /home/shuora/Repositories/Traffic/FusionModel
# 1) Preprocess
python src/pipeline/dataset_builder.py --profile mfcp_payload
# 2) Attention
python src/fusion/train_fusion_attention.py --dataset_name mfcp --preset none --epochs 24 --batch_size 64 --output_tag_prefix mfcp
# 3) Attention + Stacking
python src/fusion/train_fusion_attention_stacking.py --dataset_name mfcp --preset none --epochs 24 --batch_size 64 --output_tag_prefix mfcp