A comprehensive hybrid classification system designed to evaluate the credibility of news articles using high-performance machine learning (SVM) and agentic RAG (Retrieval-Augmented Generation) reasoning.
This project evolved from a standalone machine learning classifier into a full-scale Agentic Intelligence Platform. By combining traditional linguistic analysis with real-time web verification and LLM-based reasoning, it provides a deep, multi-dimensional assessment of news credibility.
Our system processes news through three rigorous layers of validation:
The core engine uses a Linear Support Vector Machine (SVM) trained on the WELFake dataset (72,000+ articles). It analyzes:
- Linguistic Fingerprints: Passive vs. active voice, sensationalism, and punctuation patterns.
- Statistical Patterns: TF-IDF vectorization with unigram and bigram analysis (10,000 max features).
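The statistical layer can be illustrated with a minimal, dependency-free sketch of TF-IDF over unigrams and bigrams. The actual system uses scikit-learn's `TfidfVectorizer` (with `ngram_range=(1, 2)` and `max_features=10000`); the helpers below are simplified stand-ins, not the production code:

```python
import math
from collections import Counter

def ngrams(tokens, n_max=2):
    """Unigrams and bigrams, mirroring ngram_range=(1, 2)."""
    grams = list(tokens)
    for i in range(len(tokens) - 1):
        grams.append(tokens[i] + " " + tokens[i + 1])
    return grams

def tfidf(corpus):
    """Return one {feature: weight} dict per document (raw tf * smoothed idf)."""
    docs = [ngrams(text.lower().split()) for text in corpus]
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # document frequency per n-gram
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            g: tf[g] * (math.log((1 + n) / (1 + df[g])) + 1)  # smoothed idf
            for g in tf
        })
    return vectors

vecs = tfidf(["breaking shocking news", "official report confirms news"])
```

Note how the rare bigram `"breaking shocking"` outweighs the common unigram `"news"`: n-grams shared across documents carry a lower idf, which is exactly what lets the SVM key on distinctive phrasing.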
The system performs real-time searches across global fact-checking repositories to find corroborating or conflicting evidence.
- Dynamic Scraping: Fetches the latest updates from Snopes, AP, PolitiFact, and Reuters.
- Consensus Analysis: Evaluates whether retrieved sources support or debunk the input claim.
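The consensus step can be pictured with a deliberately crude lexical sketch: classify each retrieved fact-check snippet as supporting or debunking, then aggregate. The word lists and function names here are hypothetical; the real system relies on vector retrieval and LLM judgment rather than keyword matching:

```python
# Hypothetical stance lexicons -- the production system does not use keyword lists.
SUPPORT = {"true", "confirmed", "accurate", "verified"}
DEBUNK = {"false", "hoax", "debunked", "misleading", "fabricated"}

def snippet_stance(snippet):
    """Crude lexical stance: +1 support, -1 debunk, 0 unclear."""
    words = set(snippet.lower().split())
    if words & DEBUNK:
        return -1
    if words & SUPPORT:
        return 1
    return 0

def consensus(snippets):
    """Aggregate per-snippet stances into a verdict and an agreement ratio."""
    stances = [snippet_stance(s) for s in snippets]
    score = sum(stances)
    voted = sum(1 for s in stances if s != 0)
    verdict = "supported" if score > 0 else "debunked" if score < 0 else "unclear"
    agreement = abs(score) / voted if voted else 0.0
    return verdict, agreement

verdict, agreement = consensus([
    "PolitiFact rates the claim false",
    "Snopes: this story is a hoax",
    "Reuters confirmed the event occurred",
])
```

With two debunking sources against one supporting one, the sketch returns `"debunked"` with a low agreement ratio, signalling to the downstream agent that the evidence is mixed.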
Powered by LangGraph and Groq (Llama 3.1), an autonomous agent synthesizes the ML signal and live evidence.
- Conflict Resolution: Resolves discrepancies between linguistic patterns (ML) and actual facts (RAG).
- Consolidated Verdict: Generates a professional rationale with confidence scoring.
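One way to picture the conflict-resolution step is the following hypothetical rule, which is *not* the actual LLM prompt logic: when the ML signal and the retrieved evidence disagree, live evidence outranks linguistic style, at reduced confidence:

```python
def resolve(ml_label, ml_conf, rag_verdict):
    """Combine the SVM label with the RAG consensus (illustrative rule only).

    ml_label:    "real" or "fake" from the classifier
    ml_conf:     classifier confidence in [0, 1]
    rag_verdict: "supported", "debunked", or "unclear" from retrieval
    """
    rag_label = {"supported": "real", "debunked": "fake"}.get(rag_verdict)
    if rag_label is None:          # no usable evidence: trust the ML signal
        return ml_label, ml_conf
    if rag_label == ml_label:      # agreement boosts confidence
        return ml_label, min(1.0, ml_conf + 0.2)
    return rag_label, 0.6          # conflict: facts outrank style, lower confidence

label, conf = resolve("real", 0.9, "debunked")  # -> ("fake", 0.6)
```

In the real system this arbitration is delegated to Llama 3.1, which can also explain *why* it sided with the evidence; the rule above only sketches the shape of the decision.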
```mermaid
graph TD
    A[User Input Article] --> B{Preprocessing}
    B --> C[Stage 1: SVM Classifier]
    B --> D[Stage 2: RAG Retriever]
    C --> |ML Signal| E[Agentic Brain - LangGraph]
    D --> |Live Evidence| E
    E --> F{Reasoning Engine - Llama 3.1}
    F --> G[Consolidated Verdict]
    G --> H[Premium Dashboard]
    G --> I[Automated PDF Report]
```
- Agentic Workflow: Built with LangGraph — four sequential nodes (predict → retrieve → reason → report) with explicit state management.
- Live RAG Integration: Real-time scraper and vector-based retrieval for fresh fact-checks.
- Premium Dark-Blue Dashboard: A custom-styled Streamlit UI with interactive charts, metrics, and progress bars.
- Automated PDF Reporting: Generates a professional deep-dive report (via FPDF2) for offline sharing.
- Session History & Analytics: Track trends in news credibility assessments over time.
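The four-node workflow can be sketched without LangGraph as a chain of functions passing an explicit state dict. The real graph is built with LangGraph's `StateGraph` in `milestone2/agent/graph.py`; the node bodies below are stand-in stubs, not the production logic:

```python
# Each node reads and extends a shared state dict, mirroring the
# predict -> retrieve -> reason -> report sequence with explicit state.
def predict(state):
    state["ml_label"] = "fake"                     # stand-in for the SVM prediction
    return state

def retrieve(state):
    state["evidence"] = ["Snopes: rated false"]    # stand-in for live RAG retrieval
    return state

def reason(state):
    # Stand-in for the Llama 3.1 reasoning call.
    state["verdict"] = "fake" if state["evidence"] else state["ml_label"]
    return state

def report(state):
    state["report"] = f"Verdict: {state['verdict']} ({len(state['evidence'])} sources)"
    return state

def run_graph(article):
    state = {"article": article}
    for node in (predict, retrieve, reason, report):
        state = node(state)
    return state

final = run_graph("Some viral headline...")
```

Keeping all intermediate signals in one state object is what lets the final node produce a report that cites both the ML label and the retrieved evidence, which is the same property the LangGraph version relies on.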
| Metric | Milestone 1 (20k Sample) | Milestone 2 (Full Optimization) |
|---|---|---|
| Accuracy | 94.30% | 96.55% |
| Precision | 93.96% | 95.96% |
| Recall | 94.88% | 96.88% |
| F1-Score | 94.42% | 96.42% |
| Category | Technology |
|---|---|
| Frontend | Streamlit, Vanilla CSS (Inter/Outfit Fonts) |
| ML Engine | Scikit-learn, Joblib, NLTK |
| Agentic Core | LangGraph, LangChain, Groq Cloud |
| Models | Linear SVM (Base), Llama 3.1 70B (Reasoning) |
| RAG / Search | FAISS |
- Python 3.9 or higher
- A Groq API key (create one in the Groq console)
```bash
# Clone the repository
git clone https://github.com/ashvin2005/AI_ML_project.git
cd AI_ML_project

# Install dependencies
pip install -r requirements.txt

# Launch the Streamlit server
streamlit run app_final.py
```
- Enter your Groq API Key in the sidebar.
- Paste an article and click "Run Analysis".
```
├── milestone1/
│   ├── app.py              # Legacy M1 UI (Pure ML)
│   ├── model.ipynb         # Model training and optimization
│   └── *.joblib            # Serialized SVM & Vectorizer
├── milestone2/
│   └── agent/              # Agentic reasoning logic
│       ├── graph.py        # LangGraph workflow definition
│       ├── retriever.py    # Live RAG & Scraping logic
│       └── reasoner.py     # Llama 3.1 reasoning templates
├── app_final.py            # Integrated UI (Final)
└── requirements.txt        # Full project dependencies
```