Data Scientist with 5+ years of experience driving product and business decisions through experimentation, causal inference, and statistical modeling at scale.
- 🏢 Currently Data Scientist @ Walmart Connect — building Brand WAMM models, NLP pipelines & causal inference frameworks
- 🎓 M.S. Computer Science — California State University, Long Beach (2021–2023)
- 🤖 Deep expertise in NLP, Transformer models, MLOps, and multi-cloud AI deployments
- 📊 Track record: 60% ↓ tagging effort · 40% ↓ runtime · 25% ↑ marketing effectiveness
- 🏆 Active Kaggle competitor — sharing notebooks, datasets & competition solutions
- 📍 Farragut, Tennessee · Open to Data Scientist & ML Engineer roles

I'm actively competing in ML challenges and sharing reproducible work with the Kaggle community:

- 🏁 Competitions — End-to-end ML/DL solutions with leaderboard results
- 📓 Notebooks — EDA walkthroughs, feature engineering guides & model experiments
- 📦 Datasets — Curated open datasets published for the community
- 💬 Discussions — Tips, insights & collaboration with fellow Kagglers

⭐ Visit kaggle.com/pathik1511 for my latest notebooks and competition results.

| Project | Description | Stack |
|---|---|---|
| ☁️ ATS Resume Screener | AI-powered Applicant Tracking System using Google Gemini for resume scoring | Python · Gemini · NLP |
| 🧠 Kidney Disease Classification | End-to-end deep learning pipeline with MLflow experiment tracking | PyTorch · MLflow · DVC |
| 🐔 Chicken Disease Detection | End-to-end ML project with CI/CD pipeline and cloud deployment | Python · DVC · Docker |
| ☕ Coffee Sales Forecasting | Sales prediction & inventory optimisation with ML models | Python · Scikit-learn · Pandas |
| 📝 Text-to-SQL | Natural language to SQL query generation using LLMs | Python · LLM · SQL |
| 🌿 Cassava Leaf Disease | Computer vision model for plant disease classification | TensorFlow · CNN · Kaggle |

- 60% reduction in manual tagging effort → spaCy & Hugging Face pipelines (Walmart Connect)
- 40% runtime reduction → Production WAMM framework in Python/SQL
- 25% boost in marketing effectiveness → Sentiment analysis, Naïve Bayes (Syntrons)
- 15% creative effectiveness improvement → NLP on customer reviews (Walmart Connect)
- 35% fraud detection improvement → Deep learning fraud detection (Syntrons)
- 50% data processing speed increase → PySpark big data optimisation


