@@ -295,13 +295,23 @@ \subsubsection{Unit I --- Experimental Data as a Learning Problem (Weeks
   Why ML failure modes are common in experimental science.
 \end{itemize}

-\textbf{Summary:} This unit introduces the transition from classical
-physics-based modeling to data-driven discovery in materials science. We
-explore the unique challenges of experimental materials data, including
-its multi-modal nature, high acquisition cost, and the fundamental
-Processing-Structure-Property-Performance (PSPP) relationships. Key
-concepts include data scales, measurement uncertainty, and the CRISP-DM
-process adapted for scientific workflows.
+\textbf{Summary:}
+
+\begin{itemize}
+\tightlist
+\item
+  Transition from physics-based to data-driven modeling
+\item
+  Experimental data challenges: multi-modal, high acquisition cost,
+  sparse
+\item
+  \textbf{PSPP} (Processing → Structure → Property → Performance) as a
+  data dependency graph (sketch below)
+\item
+  Data scales and measurement uncertainty
+\item
+  \textbf{CRISP-DM} workflow adapted for scientific labs
+\end{itemize}
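+
+A possible minimal sketch of \textbf{PSPP} as a directed dependency
+graph (Python; the stage names and example edges are illustrative
+assumptions, not course data):
+
+\begin{verbatim}
+# PSPP as a directed dependency graph; edges are illustrative.
+PSPP_EDGES = {
+    "processing": ["structure"],    # e.g. heat treatment -> grain size
+    "structure": ["property"],      # e.g. grain size -> yield strength
+    "property": ["performance"],    # e.g. strength -> component life
+    "performance": [],
+}
+
+def downstream(stage):
+    """All stages that depend on `stage`, depth-first."""
+    out = []
+    for nxt in PSPP_EDGES[stage]:
+        out.append(nxt)
+        out.extend(downstream(nxt))
+    return out
+
+print(downstream("processing"))  # ['structure', 'property', 'performance']
+\end{verbatim}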

 \textbf{Exercise:}\\
 Inspect real microscopy and process datasets; identify sources of bias
@@ -330,14 +340,18 @@ \subsubsection{Unit I --- Experimental Data as a Learning Problem (Weeks
   Relation to MFML refresher on PCA and covariance.
 \end{itemize}

-\textbf{Summary:} This unit bridges the gap between the physical process
-of data acquisition and the mathematical tools used to describe it. We
-analyze how signals are formed in characterization tools and how
-physical constraints (resolution, noise, sampling) act as priors for
-learning. We then introduce Principal Component Analysis (PCA) and
-Singular Value Decomposition (SVD) as fundamental techniques for
-discovering low-dimensional structure in high-dimensional experimental
-datasets.
+\textbf{Summary:}
+
+\begin{itemize}
+\tightlist
+\item
+  Physical signal formation as a learning prior
+\item
+  Resolution, noise, sampling as physical (not algorithmic) constraints
+\item
+  \textbf{PCA} and \textbf{SVD} for low-dimensional structure in
+  high-dimensional data (sketch below)
+\end{itemize}
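+
+A minimal sketch of the PCA-via-SVD idea on synthetic ``spectra''
+(assumes NumPy; the data are random stand-ins, not real measurements):
+
+\begin{verbatim}
+import numpy as np
+
+rng = np.random.default_rng(0)
+scores = rng.normal(size=(200, 2))        # 2 hidden factors
+loadings = rng.normal(size=(2, 500))      # 500 spectral channels
+X = scores @ loadings + 0.1 * rng.normal(size=(200, 500))
+
+Xc = X - X.mean(axis=0)                   # center before SVD
+U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
+
+explained = S**2 / np.sum(S**2)           # variance per component
+print(explained[:5])                      # two components dominate
+pc_scores = U[:, :2] * S[:2]              # 2-D embedding of each spectrum
+\end{verbatim}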

 \textbf{Exercise:}\\
 Fourier inspection of micrographs; effects of sampling and filtering.
@@ -363,25 +377,38 @@ \subsubsection{Unit I --- Experimental Data as a Learning Problem (Weeks
   Why ``good accuracy'' often means a broken pipeline.
 \end{itemize}

-\textbf{Summary:} This unit covers the often-overlooked half of an ML
-pipeline: data integrity, validation, and how performance is measured.
-We start with the measurement chain and systematic \textbf{data
-cleaning} --- handling missing values, outliers, and duplicates with a
-``fix at source'' mindset. We then build the \textbf{transformation
-toolbox}: centering, min--max and z-score scaling, physics-aware
-non-dimensionalisation, log transforms, differentiation, and
-frequency-domain views (FFT, triggering for time series). On the
-supervision side we examine \textbf{labels and uncertainty} ---
-inter-annotator variance, probabilistic labels, and a Bayesian view of
-priors, likelihoods, and posteriors --- and then formalize the
-\textbf{bias--variance} tradeoff with parsimony and regularization. A
-major focus is \textbf{Data Leakage} in materials workflows
-(pre-processing, temporal, and group/spatial), tackled with proper
-holdout, K-fold, LOOCV, and stratified validation. We close with the
-\textbf{error measures} that decide what ``good'' actually means:
-MAE/MSE/RMSE and \(R^2\) for regression, and confusion matrices,
-precision/recall, F1/Dice, IoU, and categorical cross-entropy for
-classification and segmentation.
+\textbf{Summary:}
+
+\begin{itemize}
+\tightlist
+\item
+  Measurement chain → \textbf{data cleaning}: missing values, outliers,
+  duplicates (``fix at source'')
+\item
+  \textbf{Transformation toolbox}: centering, min--max / z-score
+  scaling, non-dimensionalization, log, differentiation, FFT, triggering
+\item
+  \textbf{Labels and uncertainty}: inter-annotator variance,
+  probabilistic labels, Bayesian view (priors, likelihoods, posteriors)
+\item
+  \textbf{Bias--variance} tradeoff with parsimony and regularization
+\item
+  \textbf{Data leakage} in materials workflows: pre-processing,
+  temporal, group/spatial (leakage-safe sketch below)
+\item
+  \textbf{Validation}: holdout, K-fold, LOOCV, stratified
+\item
+  \textbf{Error measures}:
+
+  \begin{itemize}
+  \tightlist
+  \item
+    Regression: MAE, MSE, RMSE, \(R^2\)
+  \item
+    Classification / segmentation: confusion matrix, precision/recall,
+    F1/Dice, IoU, categorical cross-entropy
+  \end{itemize}
+\end{itemize}
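+
+A minimal sketch of leakage-safe validation (assumes scikit-learn; the
+data and the specimen grouping are synthetic stand-ins):
+
+\begin{verbatim}
+import numpy as np
+from sklearn.linear_model import Ridge
+from sklearn.model_selection import GroupKFold, cross_val_score
+from sklearn.pipeline import make_pipeline
+from sklearn.preprocessing import StandardScaler
+
+rng = np.random.default_rng(0)
+X = rng.normal(size=(120, 5))
+y = X[:, 0] + 0.1 * rng.normal(size=120)
+groups = np.repeat(np.arange(30), 4)  # 30 specimens x 4 measurements
+
+# Scaler sits INSIDE the pipeline, so it is fit on training folds only;
+# fitting it on all data first would be pre-processing leakage.
+model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
+
+# GroupKFold keeps all measurements of a specimen in one fold,
+# preventing group leakage across the train/test split.
+scores = cross_val_score(model, X, y, groups=groups,
+                         cv=GroupKFold(n_splits=5), scoring="r2")
+print(scores.mean())
+\end{verbatim}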

 \textbf{Exercise:}\\
 Construct a deliberately flawed ML pipeline and diagnose its failure.
@@ -410,15 +437,21 @@ \subsubsection{Unit II --- Representation Learning for Microstructures
   Transition to learned representations.
 \end{itemize}

-\textbf{Summary:} This unit marks the transition from classical,
-hand-crafted microstructure quantification (like grain size and phase
-fractions) to the modern paradigm of \textbf{learned representations}.
-We first review traditional stereological metrics and their limitations
-in capturing complex structural nuances. We then introduce the
-foundational unit of modern ML: the \textbf{artificial neuron}. By
-understanding weights, biases, and non-linear activation functions, we
-build the framework for Multi-Layer Perceptrons (MLPs) that can
-automatically learn optimal features from materials data.
+\textbf{Summary:}
+
+\begin{itemize}
+\tightlist
+\item
+  Classical stereological metrics (grain size, phase fractions) and
+  their limits
+\item
+  Transition to \textbf{learned representations}
+\item
+  The \textbf{artificial neuron}: weights, biases, non-linear
+  activations (sketch below)
+\item
+  \textbf{Multi-Layer Perceptrons (MLPs)} as automatic feature learners
+\end{itemize}
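+
+A minimal sketch of one neuron and a two-layer MLP forward pass
+(assumes NumPy; weights are random, untrained stand-ins):
+
+\begin{verbatim}
+import numpy as np
+
+def relu(z):
+    return np.maximum(0.0, z)
+
+def neuron(x, w, b):
+    # y = f(w . x + b): weighted sum, bias shift, non-linearity
+    return relu(np.dot(w, x) + b)
+
+rng = np.random.default_rng(0)
+x = rng.normal(size=8)                  # 8 input features
+
+W1, b1 = rng.normal(size=(16, 8)), np.zeros(16)  # hidden layer
+W2, b2 = rng.normal(size=(1, 16)), np.zeros(1)   # output layer
+
+print(neuron(x, W1[0], b1[0]))          # one hidden neuron's output
+h = relu(W1 @ x + b1)                   # learned intermediate features
+y = W2 @ h + b2                         # scalar prediction
+print(h.shape, y.shape)                 # (16,) (1,)
+\end{verbatim}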

 \textbf{Exercise:}\\
 Compare classical features vs simple NN-based features for
@@ -443,15 +476,23 @@ \subsubsection{Unit II --- Representation Learning for Microstructures
   Overfitting risks with small datasets.
 \end{itemize}

-\textbf{Summary:} This unit introduces \textbf{Convolutional Neural
-Networks (CNNs)}, the workhorse of modern computer vision, and applies
-them to materials characterization. We explore how convolutions allow
-networks to automatically learn hierarchical structure detectors---from
-simple edges to complex phase morphologies---while drastically reducing
-the number of parameters compared to standard MLPs. Through case studies
-in phase segmentation and defect detection, students learn the intuition
-behind filters, pooling, and the unique challenges of applying deep
-learning to high-resolution, noisy experimental micrographs.
+\textbf{Summary:}
+
+\begin{itemize}
+\tightlist
+\item
+  \textbf{Convolutional Neural Networks (CNNs)} for materials
+  characterization (sketch below)
+\item
+  Hierarchical structure detectors: edges → textures → phase
+  morphologies
+\item
+  Filters and pooling; parameter efficiency vs.~MLPs
+\item
+  Case studies: phase segmentation, defect detection
+\item
+  Practical challenges: high-resolution, noisy micrographs
+\end{itemize}
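+
+A minimal CNN sketch for classifying micrograph patches (assumes
+PyTorch; the layer sizes and the defect / no-defect task are
+illustrative assumptions):
+
+\begin{verbatim}
+import torch
+import torch.nn as nn
+
+class TinyCNN(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.features = nn.Sequential(
+            nn.Conv2d(1, 8, kernel_size=3, padding=1),   # edge-like filters
+            nn.ReLU(),
+            nn.MaxPool2d(2),                             # downsample 2x
+            nn.Conv2d(8, 16, kernel_size=3, padding=1),  # texture filters
+            nn.ReLU(),
+            nn.AdaptiveAvgPool2d(1),                     # global pooling
+        )
+        self.head = nn.Linear(16, 2)                     # 2 classes
+
+    def forward(self, x):
+        return self.head(self.features(x).flatten(1))
+
+model = TinyCNN()
+patch = torch.randn(4, 1, 64, 64)  # 4 grayscale 64x64 patches
+print(model(patch).shape)          # torch.Size([4, 2])
+\end{verbatim}
+
+The same small filters are reused across the whole image, which is
+where the parameter saving over an MLP comes from.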

 \textbf{Exercise:}\\
 Train a small CNN on microstructure images; analyze failure cases.
@@ -474,15 +515,21 @@ \subsubsection{Unit II --- Representation Learning for Microstructures
   When transfer learning helps---and when it does not.
 \end{itemize}

-\textbf{Summary:} This unit addresses the fundamental bottleneck of
-materials informatics: \textbf{Data Scarcity}. We explore how to build
-powerful deep learning models when only a few hundred labeled images or
-signals are available. The core focus is on \textbf{Transfer Learning},
-where we leverage knowledge from models pretrained on millions of
-natural images to accelerate learning and improve generalization on
-materials tasks. We also cover \textbf{Data Augmentation} strategies
-tailored for scientific data and discuss when and why transferring
-knowledge across different physical domains succeeds or fails.
+\textbf{Summary:}
+
+\begin{itemize}
+\tightlist
+\item
+  \textbf{Data scarcity} as the materials informatics bottleneck
+\item
+  \textbf{Transfer learning} from natural-image pretrained models
+  (sketch below)
+\item
+  Self-supervised pretraining as an alternative
+\item
+  \textbf{Data augmentation} tailored to scientific data
+\item
+  When cross-domain transfer succeeds vs.~fails
+\end{itemize}
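+
+A minimal transfer-learning sketch (assumes torchvision; the two-class
+materials task is an illustrative assumption):
+
+\begin{verbatim}
+import torch.nn as nn
+from torchvision import models
+
+# Reuse an ImageNet-pretrained ResNet-18 as a feature extractor.
+backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
+
+for p in backbone.parameters():   # freeze the pretrained filters
+    p.requires_grad = False
+
+# Replace the 1000-class head with a small trainable one.
+backbone.fc = nn.Linear(backbone.fc.in_features, 2)
+
+# Only the new head is optimized during fine-tuning:
+trainable = [p for p in backbone.parameters() if p.requires_grad]
+print(sum(p.numel() for p in trainable))  # tiny vs. the full network
+\end{verbatim}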

 \textbf{Exercise:}\\
 Fine-tune a pretrained model; compare against training from scratch.
@@ -509,16 +556,21 @@ \subsubsection{Unit III --- Learning from Processing Data (Weeks
   Relation to MFML concepts of generalization.
 \end{itemize}

-\textbf{Summary:} This unit explores the application of machine learning
-to \textbf{Time-Series Data}, specifically for monitoring and predicting
-materials processing outcomes. We introduce \textbf{Recurrent Neural
-Networks (RNNs)} and their advanced variants like \textbf{LSTMs}, which
-are designed to handle sequential dependencies. We discuss the critical
-preprocessing steps of signal smoothing and triggering required to
-handle noisy experimental logs. Through case studies in additive
-manufacturing and process stability, students learn how to build models
-that ``remember'' the processing history to predict future states and
-detect anomalies in real-time.
+\textbf{Summary:}
+
+\begin{itemize}
+\tightlist
+\item
+  \textbf{Time-series ML} for process monitoring and prediction
+\item
+  \textbf{RNNs} and \textbf{LSTMs} for sequential dependencies (sketch
+  below)
+\item
+  Preprocessing: signal smoothing, triggering on noisy logs
+\item
+  Case studies: additive manufacturing, process stability
+\item
+  Real-time anomaly detection from processing history
+\end{itemize}
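+
+A minimal LSTM sketch mapping a window of process sensor readings to a
+predicted next value (assumes PyTorch; sensor count, window length, and
+layer sizes are illustrative assumptions):
+
+\begin{verbatim}
+import torch
+import torch.nn as nn
+
+class ProcessLSTM(nn.Module):
+    def __init__(self, n_sensors=3, hidden=32):
+        super().__init__()
+        self.lstm = nn.LSTM(n_sensors, hidden, batch_first=True)
+        self.head = nn.Linear(hidden, 1)  # one target quantity
+
+    def forward(self, x):                 # x: (batch, time, sensors)
+        out, _ = self.lstm(x)             # hidden state at every step
+        return self.head(out[:, -1])      # predict from the last step
+
+model = ProcessLSTM()
+window = torch.randn(8, 50, 3)  # 8 windows, 50 steps, 3 sensors
+print(model(window).shape)      # torch.Size([8, 1])
+\end{verbatim}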

 \textbf{Exercise:}\\
 Predict a process outcome from time-series data using regression or
@@ -542,15 +594,23 @@ \subsubsection{Unit III --- Learning from Processing Data (Weeks
   Robustness as a design criterion.
 \end{itemize}

-\textbf{Summary:} This unit shifts the focus from model performance to
-\textbf{Model Reliability}. We explore the Bias-Variance tradeoff and
-the fundamental challenge of generalization---ensuring that an ML model
-works on new, unseen data from the factory floor. We introduce robust
-validation techniques like K-Fold and Stratified Cross-Validation to
-stabilize performance estimates on small materials datasets. A key focus
-is on \textbf{Process Robustness}, where we use sensitivity analysis to
-identify ``Process Windows''---regions in parameter space where material
-quality is maximized and insensitive to industrial noise.
+\textbf{Summary:}
+
+\begin{itemize}
+\tightlist
+\item
+  Shift from raw performance to \textbf{model reliability}
+\item
+  Bias--variance tradeoff and generalization to factory-floor data
+\item
+  Robust validation: K-fold and stratified cross-validation on small
+  datasets
+\item
+  \textbf{Process robustness} via sensitivity analysis (sketch below)
+\item
+  \textbf{Process windows}: parameter regions insensitive to industrial
+  noise
+\end{itemize}
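+
+A minimal sensitivity-analysis sketch (NumPy; \texttt{predict\_quality}
+is a hypothetical stand-in for any trained surrogate):
+
+\begin{verbatim}
+import numpy as np
+
+def predict_quality(temp, speed):
+    # Stand-in surrogate: quality peaks at temp=210, speed=1.2.
+    return -((temp - 210) / 30) ** 2 - ((speed - 1.2) / 0.5) ** 2
+
+rng = np.random.default_rng(0)
+
+def robustness(temp, speed, noise_t=5.0, noise_s=0.05, n=500):
+    """Mean and spread of predicted quality under input noise."""
+    t = temp + noise_t * rng.normal(size=n)
+    s = speed + noise_s * rng.normal(size=n)
+    q = predict_quality(t, s)
+    return q.mean(), q.std()
+
+# A robust operating point keeps quality high AND its spread small:
+for temp in (180, 210, 240):
+    print(temp, robustness(temp, speed=1.2))
+\end{verbatim}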

 \textbf{Exercise:}\\
 Analyze model robustness under perturbed process conditions.
@@ -573,16 +633,22 @@ \subsubsection{Unit III --- Learning from Processing Data (Weeks
   Physics-informed vs unconstrained regression.
 \end{itemize}

-\textbf{Summary:} This unit explores \textbf{Inverse Problems}---the
-cornerstone of materials design where we seek the processing parameters
-required to achieve a target microstructure or performance. We contrast
-these with causal forward problems and discuss why they are often
-ill-posed and multi-valued. We introduce \textbf{Physics-Informed
-Learning} as a way to solve these challenges by enriching models with
-physical transformations and constraints. Students learn how to build
-and interpret \textbf{Process Maps} and ``Process Corridors,'' using
-machine learning to visualize safe operating regions in complex
-experimental spaces.
+\textbf{Summary:}
+
+\begin{itemize}
+\tightlist
+\item
+  \textbf{Inverse problems}: target microstructure / performance →
+  processing parameters
+\item
+  Forward (causal) vs.~inverse (often ill-posed, multi-valued)
+\item
+  \textbf{Physics-informed learning}: physical transformations and
+  constraints
+\item
+  \textbf{Process maps} and \textbf{process corridors} for safe
+  operating regions (sketch below)
+\end{itemize}
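+
+A minimal process-map sketch: threshold a forward surrogate on a
+parameter grid (NumPy; the surrogate and the 0.8 quality threshold are
+illustrative assumptions):
+
+\begin{verbatim}
+import numpy as np
+
+def forward_model(temp, speed):
+    # Stand-in surrogate: processing parameters -> quality in [0, 1].
+    return np.exp(-((temp - 210) / 30) ** 2 - ((speed - 1.2) / 0.5) ** 2)
+
+temps = np.linspace(150, 270, 121)
+speeds = np.linspace(0.5, 2.0, 61)
+T, S = np.meshgrid(temps, speeds)
+
+quality = forward_model(T, S)
+window = quality > 0.8      # boolean "process window" mask
+
+# The inverse question "which parameters give quality > 0.8?" has a
+# whole region as its answer -- the multi-valuedness noted above.
+print(T[window].min(), T[window].max())  # safe temperature corridor
+\end{verbatim}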

 \textbf{Exercise:}\\
 Construct a simple ML-based process map; compare constrained vs
@@ -610,16 +676,22 @@ \subsubsection{Unit IV --- Uncertainty, Surrogates, and Automation
   Using ML without destroying physical meaning.
 \end{itemize}

-\textbf{Summary:} This unit focuses on the processing of
-high-dimensional \textbf{Characterization Signals} (like XRD, EDS, and
-EELS) using unsupervised learning. We introduce \textbf{K-Means
-Clustering} and \textbf{t-SNE} for the automatic identification and
-visualization of phases in large experimental libraries. We then explore
-\textbf{Autoencoders}---neural networks that learn to compress complex
-spectra into a low-dimensional ``latent space.'' This allows for
-advanced denoising and feature extraction, enabling scientists to handle
-the massive data volumes produced by modern high-throughput
-characterization tools without losing physical insight.
+\textbf{Summary:}
+
+\begin{itemize}
+\tightlist
+\item
+  Unsupervised ML on high-dimensional spectra (XRD, EDS, EELS)
+\item
+  \textbf{K-Means} and \textbf{t-SNE} for phase identification and
+  visualization (sketch below)
+\item
+  \textbf{Autoencoders}: compressing spectra into a low-dimensional
+  latent space
+\item
+  Denoising and feature extraction at high throughput without losing
+  physics
+\end{itemize}
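+
+A minimal phase-clustering sketch on synthetic two-peak ``spectra''
+(assumes scikit-learn; peak positions and noise level are illustrative
+assumptions):
+
+\begin{verbatim}
+import numpy as np
+from sklearn.cluster import KMeans
+from sklearn.decomposition import PCA
+
+rng = np.random.default_rng(0)
+chan = np.arange(300)                     # spectral channels
+
+def peak(center):
+    return np.exp(-((chan - center) / 8.0) ** 2)
+
+# Mixtures of two reference "phase" spectra plus noise.
+mix = rng.uniform(size=(400, 1))
+X = mix * peak(100) + (1 - mix) * peak(200)
+X += 0.05 * rng.normal(size=(400, 300))
+
+Z = PCA(n_components=5).fit_transform(X)  # compress 300 -> 5 dims
+labels = KMeans(n_clusters=2, n_init=10).fit_predict(Z)
+print(np.bincount(labels))                # spectra per cluster
+\end{verbatim}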

 \textbf{Exercise:}\\
 Apply PCA/NMF to spectral datasets; interpret components physically.
@@ -640,8 +712,26 @@ \subsubsection{Unit IV --- Uncertainty, Surrogates, and Automation
   ML as a control component, not just a predictor.
 \end{itemize}

-\textbf{Exercise:}\\
-Implement a simple ML-assisted autofocus or defect detector.
+\textbf{Summary:}
+
+\begin{itemize}
+\tightlist
+\item
+  \textbf{Autonomous characterization}: ML moves from passive analysis
+  to active instrument control
+\item
+  \textbf{Multi-modal data fusion} (SEM + EDS + process logs) via
+  Bayesian frameworks
+\item
+  \textbf{Reinforcement learning} for instrument tuning and process
+  optimization
+\item
+  Pipelines that autonomously find → characterize → decide the next
+  experiment (sketch below)
+\end{itemize}
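+
+A minimal ``decide the next experiment'' sketch via uncertainty
+sampling with a Gaussian-process surrogate (assumes scikit-learn; the
+measured function is a hypothetical stand-in):
+
+\begin{verbatim}
+import numpy as np
+from sklearn.gaussian_process import GaussianProcessRegressor
+from sklearn.gaussian_process.kernels import RBF
+
+def run_experiment(x):
+    return float(np.sin(3.0 * x))   # stand-in measurement
+
+X_done = np.array([[0.1], [0.5], [0.9]])  # parameters measured so far
+y_done = np.array([run_experiment(x[0]) for x in X_done])
+
+gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2))
+gp.fit(X_done, y_done)
+
+candidates = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
+mean, std = gp.predict(candidates, return_std=True)
+
+next_x = candidates[np.argmax(std)]  # most informative next point
+print(next_x)                        # characterize this one next
+\end{verbatim}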
+
+\textbf{Exercise:} Implement a simple ML-assisted autofocus or defect
+detector.

 \begin{center}\rule{0.5\linewidth}{0.5pt}\end{center}

@@ -767,18 +857,6 @@ \subsection{Lab Possibilities}\label{lab-possibilities}
   Multi-modal fusion of images, spectra, and process parameters.
 \end{itemize}

-\textbf{Summary:} This unit explores the cutting edge of
-\textbf{Autonomous Characterization}, where machine learning moves from
-passive data analysis to active instrument control. We introduce
-\textbf{Multi-Modal Data Fusion} techniques to combine information from
-diverse sensors like SEM images, EDS spectra, and process logs using
-Bayesian frameworks. We then discuss \textbf{Reinforcement Learning
-(RL)} as a tool for automating complex laboratory tasks, such as
-instrument tuning and process optimization. Through case studies in
-microscopy and industrial processing, students learn how to build
-integrated pipelines that can autonomously find, characterize, and
-decide the next steps of an experiment.
-
 \protect\phantomsection\label{refs}
 \begin{CSLReferences}{1}{0}
 \bibitem[\citeproctext]{ref-sandfeld2024materials}