A comparative study of classical TF-IDF models and transformer-based approaches for SMS spam detection, with a focus on performance, cost, and deployment trade-offs.
Spam detection is a highly imbalanced text classification problem where evaluation metrics, threshold selection, and operational costs matter more than raw accuracy. This project explores whether modern transformer models meaningfully outperform classical NLP approaches on a real-world SMS dataset.
- UCI SMS Spam Collection
- ~5,500 SMS messages
- Binary labels: ham (legitimate) vs spam
- Strong class imbalance (~13% spam)
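The imbalance above is why raw accuracy is a poor yardstick here. A quick back-of-the-envelope check (the 13% figure is the approximate spam share from the dataset description):

```python
# With ~13% spam, a degenerate classifier that labels every message as ham
# already scores ~87% accuracy while catching zero spam.
spam_rate = 0.13  # approximate spam share in the SMS Spam Collection

accuracy_all_ham = 1 - spam_rate  # accuracy of the "always ham" baseline
recall_all_ham = 0.0              # it never flags a single spam message

print(f"always-ham accuracy: {accuracy_all_ham:.2f}, spam recall: {recall_all_ham}")
```

Any model worth deploying has to beat this trivial baseline on recall, not just accuracy.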
- TF-IDF + Logistic Regression
- TF-IDF + Multinomial Naive Bayes
- DistilBERT (fine-tuned for binary classification)
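As a rough illustration of the classical baselines, here is a minimal TF-IDF + Logistic Regression sketch. The toy messages, n-gram range, and `class_weight` setting are illustrative assumptions, not the project's actual configuration:

```python
# Minimal sketch of a TF-IDF + Logistic Regression spam baseline.
# Toy messages stand in for the real SMS Spam Collection data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

messages = [
    "WINNER!! Claim your free prize now, text STOP to opt out",
    "Congratulations, you won a guaranteed cash award, call now",
    "Free entry in a weekly competition, txt WIN to enter",
    "Are we still on for lunch tomorrow?",
    "Can you pick up milk on the way home",
    "Meeting moved to 3pm, see you there",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = spam, 0 = ham

# class_weight="balanced" reweights the loss to compensate for
# the strong ham/spam imbalance in the full dataset.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(messages, labels)

probs = model.predict_proba(["Claim your free cash prize now"])
print(probs)  # [P(ham), P(spam)] for the new message
```

Keeping the vectorizer and classifier in a single pipeline avoids leaking test-set vocabulary into the TF-IDF fit during cross-validation.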
- ROC and Precision-Recall curves
- Threshold tuning to reflect cost-sensitive decisions
- Error analysis of false positives and false negatives
- Comparison of performance gains vs computational cost
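The threshold-tuning step can be sketched with `precision_recall_curve`: rather than the default 0.5 cutoff, pick the lowest score threshold that keeps precision above a target, so legitimate messages are rarely flagged. The toy scores and the 0.95 target below are illustrative assumptions:

```python
# Cost-sensitive threshold selection on the precision-recall curve.
import numpy as np
from sklearn.metrics import precision_recall_curve

# Toy labels/scores standing in for y_test and model.predict_proba(X_test)[:, 1]
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])
y_score = np.array([0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.55, 0.6, 0.8, 0.9])

precision, recall, thresholds = precision_recall_curve(y_true, y_score)

target_precision = 0.95  # flag spam only when very confident
# precision[i] / recall[i] correspond to predicting spam for scores >= thresholds[i]
ok = precision[:-1] >= target_precision
threshold = thresholds[ok].min() if ok.any() else thresholds.max()

y_pred = (y_score >= threshold).astype(int)
print(f"chosen threshold: {threshold:.2f}")
```

Choosing the lowest qualifying threshold maximizes recall subject to the precision constraint, which matches the asymmetric cost of false positives (a flagged legitimate message) versus false negatives in SMS filtering.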
- Classical TF-IDF models already achieve very strong performance due to clear lexical signals in spam messages.
- DistilBERT improves recall and F1-score for spam detection, but the gains are incremental.
- Precision-Recall analysis highlights the importance of threshold selection over default accuracy metrics.
This project demonstrates that:
- Model complexity should be justified by problem complexity.
- Classical models often provide superior performance-to-cost ratios for simple text classification tasks.
- Transformers are most valuable when contextual understanding is essential, not by default.
Model artifacts are not committed to the repository. All results can be reproduced by running the provided notebooks.
text-classification/
├── data/
├── notebooks/
│   ├── 01_eda.ipynb
│   ├── 02_baselines.ipynb
│   └── 03_transformers.ipynb
├── README.md
└── requirements.txt