- Cost: Scientific data is expensive; deep networks easily overfit small datasets.
- Generalization: small datasets demand early stopping, dropout, and robust train/validation splits to avoid overfitting (a minimal early-stopping loop is sketched after this list).
- Pretrained Models: Start with models trained on massive datasets (e.g., ImageNet).
- Feature Reuse: Early layers learn universal visual primitives useful for micrographs.
- Fine-Tuning: freeze the backbone and train only the new head, or unfreeze the late layers with a low learning rate (both strategies are sketched after this list).
- Data Augmentation: artificially expand the dataset via rotation, flipping, cropping, etc.
- Physics-Preserving: for materials images, rotations and flips preserve physical meaning, so labels stay valid (see the augmentation example after this list).
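To make the generalization bullet concrete, here is a minimal early-stopping loop. It is a sketch, not the course's code: `train_one_epoch` and `evaluate` are hypothetical callables, and the model is assumed to be a PyTorch module.

```python
import copy

def fit_with_early_stopping(model, train_one_epoch, evaluate,
                            patience=5, max_epochs=100):
    """Stop once validation loss fails to improve for `patience` epochs."""
    best_loss = float("inf")
    best_state = None
    stale_epochs = 0
    for epoch in range(max_epochs):
        train_one_epoch(model)              # one pass over the training split
        val_loss = evaluate(model)          # loss on a held-out validation split
        if val_loss < best_loss:
            best_loss, stale_epochs = val_loss, 0
            best_state = copy.deepcopy(model.state_dict())  # keep best weights
        else:
            stale_epochs += 1
            if stale_epochs >= patience:    # no improvement for a while: stop
                break
    model.load_state_dict(best_state)       # roll back to the best checkpoint
    return model
```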
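Both fine-tuning strategies from the bullet above, sketched with torchvision's ResNet-18 as one possible ImageNet-pretrained backbone. The learning rates and `num_classes` are illustrative placeholders, not values from the lecture.

```python
import torch
from torchvision import models

num_classes = 4  # placeholder: e.g., four microstructure classes

# Load an ImageNet-pretrained backbone and replace its classification head.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)

# Strategy 1: freeze the backbone, train only the new head.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("fc")
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

# Strategy 2: additionally unfreeze the last residual stage at a low rate,
# so pretrained features adapt gently while the new head learns quickly.
for param in model.layer4.parameters():
    param.requires_grad = True
optimizer = torch.optim.Adam([
    {"params": model.layer4.parameters(), "lr": 1e-5},
    {"params": model.fc.parameters(), "lr": 1e-3},
])
```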
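And one way to realize physics-preserving augmentation with torchvision: flips and exact 90-degree rotations leave microstructure labels intact because most micrographs have no preferred orientation. The Gaussian-noise amplitude is a placeholder for domain-specific detector noise, not a value from the source.

```python
import random
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.ToTensor(),
    # Exact 90-degree rotations on the CHW tensor (assumes square crops).
    transforms.Lambda(
        lambda x: torch.rot90(x, k=random.randint(0, 3), dims=(-2, -1))),
    # Domain-specific option: additive noise mimicking detector noise.
    transforms.Lambda(lambda x: x + 0.01 * torch.randn_like(x)),
])
```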
- Data challenges: small, expensive, messy datasets; overfitting risks.
- Augmentation taxonomy: traditional (flips, rotations); advanced (Mixup, CutMix; a Mixup sketch follows this list); domain-specific (simulated noise).
- Transfer learning: feature hierarchies, available backbones, fine-tuning strategies.
- Domain gaps: natural vs. scientific image domains, cross-material transfer, synthetic-to-real transfer.
- Practical guidance: backbone selection, validation with small data.
- Meta-learning: "learning to learn".
Summary for ML-PC Week 6:
- Addresses Data Scarcity in materials informatics.
- Explores Transfer Learning to leverage pretrained models.
- Discusses scientific Data Augmentation strategies.
- Evaluates the limits of cross-domain knowledge transfer.