Commit dfe1e8c

Refactor week summaries to be concise bullet points
1 parent 7a4e5a6 commit dfe1e8c

14 files changed

Lines changed: 520 additions & 555 deletions

week10_summary.md

Lines changed: 34 additions & 36 deletions
```diff
@@ -2,51 +2,49 @@
 
 ## Cross-Book Summary
 
-### 1. Clustering Spectral Data (Neuer Ch 5, McClarren Ch 4)
-- **K-Means Clustering:** A fundamental tool for grouping similar signals (e.g., XRD or EDS spectra). By minimizing the variance within clusters, we can automatically identify distinct phases or chemical environments in a dataset (Neuer Ch 5.3).
-- **Mini-Batch K-Means:** Essential for high-throughput characterization where millions of spectra are collected in a single mapping session.
-- **Visualization with t-SNE:** High-dimensional spectra (e.g., 2048 channels) are impossible to visualize directly. t-SNE projects these into 2D while preserving "neighborhood" relationships, making it easy to spot outliers or transitional states (Neuer Ch 5.4).
+### 1. Clustering Spectral Data
+- **K-Means:** Groups similar spectra (XRD/EDS) to identify distinct phases.
+- **Mini-Batch K-Means:** Speeds up high-throughput characterization.
+- **t-SNE:** Projects high-dimensional spectra to 2D to reveal outliers/relationships.
 
-### 2. Autoencoders for Signal Processing (McClarren Ch 8)
-- **Latent Representations:** An autoencoder learns to compress a spectrum into a few "latent variables" that capture the essential physical information (peak positions, intensities).
-- **Denoising:** By training an autoencoder to reconstruct a clean signal from a noisy input, we can effectively remove experimental fluctuations without the blurring associated with traditional filters (McClarren Ch 8.3.2).
-- **Non-linear Compression:** Unlike PCA, autoencoders can capture non-linear relationships in spectral data, enabling much higher compression ratios for massive characterization libraries (McClarren Ch 8.2).
+### 2. Autoencoders for Signal Processing
+- **Latent Representations:** Compresses spectra to essential physical information.
+- **Denoising:** Reconstructs clean signals from noisy inputs without blurring.
+- **Non-linear Compression:** Outperforms PCA for complex spectral libraries.
 
 ### 3. Scientific Integrity in ML
-- **Peak Preservation:** The goal of ML in characterization is to assist the scientist, not replace the physics. Models must be validated to ensure they do not "invent" peaks or smooth away critical structural information.
+- **Peak Preservation:** ML must assist, not invent or smooth away real physics.
 
----
+## 90-Minute Lecture Strategy
 
-## 90-Minute Lecture Strategy (50 Slides)
+### Part 1: High-Dimensional Signals
+- Digital footprint: XRD, EDS, EELS, Raman.
+- Manual vs. automated peak-picking.
+- Vector spectrum representation.
 
-### Part 1: High-Dimensional Signals (Slides 1-10)
-- The digital footprint of materials: XRD, EDS, EELS, and Raman.
-- Why manual peak-picking fails in high-throughput experiments.
-- The "Vector" representation of a spectrum.
+### Part 2: Clustering Structure
+- K-Means algorithm.
+- Elbow Method for phase counting.
+- Ternary alloy mapping.
 
-### Part 2: Discovering Structure with Clustering (Slides 11-20)
-- K-Means: Geometry and Algorithm.
-- The "Elbow Method": Deciding how many phases are in your sample.
-- Case Study: Mapping a ternary alloy system with K-Means.
+### Part 3: Visualizing the Unseen
+- t-SNE Stochastic Proximity.
+- Hidden relationships.
+- t-SNE distance pitfalls.
 
-### Part 3: Visualizing the Unseen (Slides 21-30)
-- t-SNE: The intuition of "Stochastic Proximity."
-- Finding "Hidden" relationships in spectral libraries.
-- Pitfalls: Why t-SNE distances can be misleading.
+### Part 4: Autoencoders & Denoising
+- Encoder-Bottleneck-Decoder.
+- Denoising characterization signals.
+- Bottlenecks as physical descriptors.
 
-### Part 4: Autoencoders & Denoising (Slides 31-45)
-- The Hourglass Architecture: Encoder, Bottleneck, Decoder.
-- Applications: Compressing leaf spectra (McClarren Ch 8.2).
-- Denoising characterization signals: Improving SNR with Deep Learning.
-- Feature extraction: Using the bottleneck as a physical descriptor.
-
-### Part 5: From Data to Discovery (Slides 46-50)
-- Real-time spectral analysis during experiments.
-- Ensuring physical consistency in ML outputs.
-- Summary: The automated characterization pipeline.
-
----
+### Part 5: Data to Discovery
+- Real-time spectral analysis.
+- Physical consistency in ML.
+- Automated pipelines.
 
 ## Quarto Website Update (Summary)
 **Summary for ML-PC Week 10:**
-This unit focuses on the processing of high-dimensional **Characterization Signals** (like XRD, EDS, and EELS) using unsupervised learning. We introduce **K-Means Clustering** and **t-SNE** for the automatic identification and visualization of phases in large experimental libraries. We then explore **Autoencoders**—neural networks that learn to compress complex spectra into a low-dimensional "latent space." This allows for advanced denoising and feature extraction, enabling scientists to handle the massive data volumes produced by modern high-throughput characterization tools without losing physical insight.
+- Processes high-dimensional Characterization Signals (XRD, EDS).
+- Employs K-Means and t-SNE for automated phase identification.
+- Uses Autoencoders for latent space compression and denoising.
+- Enhances high-throughput data analysis while preserving physics.
```
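
A minimal sketch of the clustering workflow the week-10 bullets describe, assuming scikit-learn and synthetic Gaussian-peak spectra (the peak positions, noise level, and cluster count below are illustrative, not taken from the course files):

```python
import numpy as np
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
channels = np.arange(2048)  # 2048-channel spectra, as in the old summary text

def synth_spectrum(peak_center):
    """Toy spectrum: one Gaussian peak plus measurement noise."""
    peak = np.exp(-0.5 * ((channels - peak_center) / 15.0) ** 2)
    return peak + rng.normal(0.0, 0.05, channels.size)

# Three "phases", distinguished only by peak position.
X = np.vstack([synth_spectrum(c)
               for c in rng.choice([400, 900, 1500], size=300)])

# K-Means groups spectra into k clusters; inertia_ feeds the elbow method.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.inertia_)

# Mini-Batch K-Means trades a little accuracy for speed on huge maps.
mb_labels = MiniBatchKMeans(n_clusters=3, batch_size=64,
                            random_state=0).fit_predict(X)

# t-SNE projects the 2048-channel spectra to 2D for visual inspection.
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(emb.shape)  # (300, 2)
```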

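The encoder-bottleneck-decoder ("hourglass") idea behind the autoencoder bullets, as a minimal PyTorch sketch; the layer sizes, latent dimension, and random stand-in data are illustrative, not the course's actual architecture:

```python
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    """Hourglass autoencoder: 2048 channels -> 8 latent variables -> 2048."""
    def __init__(self, n_channels=2048, n_latent=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_channels, 256), nn.ReLU(),
            nn.Linear(256, n_latent),          # bottleneck = latent descriptor
        )
        self.decoder = nn.Sequential(
            nn.Linear(n_latent, 256), nn.ReLU(),
            nn.Linear(256, n_channels),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DenoisingAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

clean = torch.rand(64, 2048)                   # stand-in for clean spectra
noisy = clean + 0.05 * torch.randn_like(clean)

for _ in range(10):                            # tiny illustrative training loop
    opt.zero_grad()
    loss = loss_fn(model(noisy), clean)        # reconstruct clean from noisy
    loss.backward()
    opt.step()

with torch.no_grad():
    latent = model.encoder(noisy)              # bottleneck as feature vector
print(latent.shape)                            # torch.Size([64, 8])
```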
week11_summary.md

Lines changed: 32 additions & 38 deletions
```diff
@@ -2,52 +2,46 @@
 
 ## Cross-Book Summary
 
-### 1. Multi-Modal Data Fusion (Murphy Ch 11, Neuer Ch 2)
-- **Beyond a Single Sensor:** In modern characterization, we often collect images (SEM), chemistry (EDS), and orientations (EBSD) simultaneously. Fusing these data streams provides a more complete physical picture than any single modality.
-- **Bayesian Sensor Fusion:** A mathematical framework for combining uncertain measurements. If two sensors (e.g., two thermocouples) provide conflicting information, the Bayesian posterior weights them by their respective precisions (inverse variances), allowing for robust state estimation (Murphy Ch 4.6.4).
-- **Latent Fusion:** Using autoencoders or PCA to find a shared low-dimensional embedding where different data types (images and spectra) can be compared and combined (Murphy Ch 19).
+### 1. Multi-Modal Data Fusion
+- **Beyond Single Sensors:** Fuse images (SEM), chemistry (EDS), and orientations (EBSD) for a complete physical picture.
+- **Bayesian Sensor Fusion:** Combines uncertain measurements using precision-weighted posteriors.
+- **Latent Fusion:** Autoencoders/PCA find shared embeddings to combine diverse data types.
 
-### 2. Reinforcement Learning for Control (McClarren Ch 9)
-- **The Autonomous Agent:** In RL, an agent learns to interact with an environment (e.g., a microscope or a furnace) to maximize a reward.
-- **The RL Loop:** State (current image), Action (adjusting focus/stigmation), and Reward (image sharpness/SNR).
-- **Policy Gradients:** A method for training deep neural networks to make a sequence of decisions that lead to an optimal scientific outcome (McClarren Ch 9.1).
-- **Case Study (McClarren):** Using RL to control the complex cooling cycles of glass, demonstrating the transition from monitoring to active control.
+### 2. Reinforcement Learning for Control
+- **Autonomous Agent:** Learns to interact with environments (e.g., microscopes) to maximize rewards.
+- **RL Loop:** State (image), Action (adjust focus), Reward (sharpness/SNR).
+- **Policy Gradients:** Train NNs for optimal scientific decision-making.
 
-### 3. Computer Vision in the Lab (ML-PC Index)
-- **Automated Workflows:** Using CNNs for real-time region-of-interest (ROI) detection, automated autofocus, and high-speed classification of diffraction patterns (e.g., EBSD Kikuchi bands).
+### 3. Computer Vision in the Lab
+- **Automated Workflows:** CNNs for real-time ROI detection, autofocus, and pattern classification.
 
----
+## 90-Minute Lecture Strategy
 
-## 90-Minute Lecture Strategy (50 Slides)
+### Part 1: Toward the Self-Driving Lab
+- The automation stack.
+- Autonomous Characterization: Scan, Analyze, Decide, Repeat.
 
-### Part 1: Toward the Self-Driving Lab (Slides 1-10)
-- The bottleneck of human-operated characterization.
-- The concept of "Autonomous Characterization": Scan, Analyze, Decide, Repeat.
-- Overview of the automation stack.
+### Part 2: ML-Assisted Instrument Tuning
+- Autofocus and Beam Alignment.
+- Real-time feedback loops.
 
-### Part 2: ML-Assisted Instrument Tuning (Slides 11-20)
-- Computer Vision for Autofocus and Beam Alignment.
-- Real-time feedback loops: Turning pixels into control signals.
-- Case Study: Automated EBSD mapping.
+### Part 3: Fusing Multi-Modal Data
+- Bayesian Fusion for sensor noise.
+- Multi-head NNs.
+- Combining XRD and EDS.
 
-### Part 3: Fusing Multi-Modal Data (Slides 21-35)
-- Why fuse? Structure vs. Chemistry vs. Properties.
-- Bayesian Fusion: Handling sensor noise and conflicts (Murphy Ch 11.4).
-- Multi-head NNs for multi-modal classification.
-- Case Study: Combining XRD and EDS for phase identification.
+### Part 4: RL for Lab Control
+- RL Framework overview.
+- Reward Functions for science.
+- Industrial glass processing control.
 
-### Part 4: Reinforcement Learning for Lab Control (Slides 36-45)
-- Introduction to the RL Framework (McClarren Ch 9).
-- Defining Reward Functions for scientific experiments.
-- Case Study: Closing the loop in industrial glass processing.
-
-### Part 5: Summary: The Integrated Pipeline (Slides 46-50)
-- The shift from "Post-mortem" analysis to "On-the-fly" discovery.
-- Challenges: Latency, safety, and physical limits of automation.
-- Summary: The vision of autonomous materials characterization.
-
----
+### Part 5: The Integrated Pipeline
+- "On-the-fly" discovery.
+- Automation challenges: Latency and safety.
 
 ## Quarto Website Update (Summary)
 **Summary for ML-PC Week 11:**
-This unit explores the cutting edge of **Autonomous Characterization**, where machine learning moves from passive data analysis to active instrument control. We introduce **Multi-Modal Data Fusion** techniques to combine information from diverse sensors like SEM images, EDS spectra, and process logs using Bayesian frameworks. We then discuss **Reinforcement Learning (RL)** as a tool for automating complex laboratory tasks, such as instrument tuning and process optimization. Through case studies in microscopy and industrial processing, students learn how to build integrated pipelines that can autonomously find, characterize, and decide the next steps of an experiment.
+- Explores Autonomous Characterization and active instrument control.
+- Introduces Multi-Modal Data Fusion (Bayesian and Latent).
+- Uses Reinforcement Learning (RL) for laboratory task automation.
+- Details building integrated pipelines for "on-the-fly" scientific discovery.
```
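
For Gaussian sensors, the "precision-weighted posteriors" bullet reduces to a closed form: each reading is weighted by its precision (inverse variance). A numpy sketch with invented thermocouple readings:

```python
import numpy as np

# Two sensors report the same temperature with different noise levels.
mu = np.array([352.0, 348.0])      # readings (e.g., two thermocouples)
sigma = np.array([2.0, 0.5])       # per-sensor standard deviations

# Posterior mean: mu_post = sum(mu_i / sigma_i^2) / sum(1 / sigma_i^2)
precision = 1.0 / sigma**2
mu_post = np.sum(precision * mu) / np.sum(precision)
sigma_post = np.sqrt(1.0 / np.sum(precision))

print(f"fused estimate: {mu_post:.2f} +/- {sigma_post:.2f}")
# The fused mean sits closer to the more precise sensor, and the fused
# uncertainty is smaller than either sensor's individual uncertainty.
```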

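The State/Action/Reward loop from the RL bullets, as a toy autofocus agent. This sketch is a greedy hill-climber standing in for the loop structure, not a policy-gradient method; the sharpness reward and its optimum are invented:

```python
import numpy as np

rng = np.random.default_rng(1)

def sharpness(focus):
    """Toy reward: image sharpness peaks at an (unknown) best focus."""
    return np.exp(-0.5 * ((focus - 3.7) / 1.0) ** 2) + rng.normal(0, 0.01)

focus, eps, step = 0.0, 0.2, 0.5   # state, exploration rate, action size
best_reward = sharpness(focus)

for _ in range(200):
    # Action: nudge focus up or down (epsilon-greedy exploration).
    if rng.random() < eps:
        action = rng.choice([-step, step])
    else:
        action = step if sharpness(focus + step) > sharpness(focus - step) else -step
    focus += action                 # state transition
    reward = sharpness(focus)       # observe reward
    best_reward = max(best_reward, reward)

print(f"final focus: {focus:.2f}, best sharpness seen: {best_reward:.2f}")
```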
week12_summary.md

Lines changed: 32 additions & 37 deletions
```diff
@@ -2,51 +2,46 @@
 
 ## Cross-Book Summary
 
-### 1. The Value of "Knowing what you don't know" (Neuer Ch 6, Murphy Ch 15)
-- **Epistemic vs. Aleatoric Uncertainty:**
-  - **Aleatoric:** The inherent randomness in the physical process (e.g., sensor noise).
-  - **Epistemic:** The model's ignorance due to lack of training data in a specific region of the parameter space.
-- **Danger of Overconfidence:** Standard neural networks often provide "point estimates" that can be wildly overconfident when extrapolating into unknown physical regimes.
+### 1. Knowing what you don't know
+- **Aleatoric vs. Epistemic:** Inherent physical noise vs. model ignorance.
+- **Overconfidence Danger:** Point estimates can fail silently in unknown regimes; uncertainty metrics are crucial.
 
-### 2. Gaussian Processes (GPs) (Murphy Ch 15, Bishop Ch 6)
-- **Distribution over Functions:** A GP defines a prior over an infinite space of functions. After seeing data, it provides a posterior distribution, yielding both a mean prediction and a variance (uncertainty).
-- **Kernels as Physical Priors:** The kernel function (e.g., Radial Basis Function or Matérn) encodes our assumptions about the smoothness and length scales of the physical phenomenon (Bishop Ch 6.4).
-- **Non-Parametric Nature:** Unlike NNs, GPs don't have a fixed number of parameters; they scale with the number of training points, making them ideal for "small but high-quality" materials datasets.
+### 2. Gaussian Processes (GPs)
+- **Distribution over Functions:** GP yields posterior mean and variance (uncertainty).
+- **Kernels as Physical Priors:** Encode assumptions about data smoothness/scale.
+- **Non-Parametric Nature:** Scales with data size, ideal for small, high-quality materials datasets.
 
-### 3. GP-Based Process Maps (ML-PC Index)
-- **Confidence Ribbons:** Visualizing the uncertainty allows engineers to see where a process map is reliable and where more experiments are needed.
-- **Kriging:** GP regression is closely related to Kriging, a method long used in geostatistics and now widely applied to interpolate materials property surfaces.
+### 3. GP-Based Process Maps
+- **Confidence Ribbons:** Visualize reliability to guide further experiments.
+- **Kriging:** Interpolates materials property surfaces using GP regression.
 
----
+## 90-Minute Lecture Strategy
 
-## 90-Minute Lecture Strategy (50 Slides)
+### Part 1: Uncertainty in Science
+- Risk management in materials processing.
+- Visualizing distributions and error bars.
 
-### Part 1: Uncertainty in Science (Slides 1-10)
-- Why a single number is never enough.
-- Risk management in materials processing: The cost of being wrong.
-- Visualizing distributions: Histograms, error bars, and density plots.
+### Part 2: GP Fundamentals
+- Function vs. Parameter space.
+- Kernels and "Similarity".
+- Conditional Gaussians and Variance.
 
-### Part 2: Gaussian Process Fundamentals (Slides 11-25)
-- The Bayesian viewpoint: Function space vs. Parameter space.
-- Kernels: How do we define "Similarity" between two material states?
-- The GP Math: Conditional Gaussians and Matrix Inversion.
-- Interpreting the Variance: Where does the "Shaded region" come from?
+### Part 3: GP Case Studies
+- Predicting tensile strength across parameters.
+- GP for Experimental Design.
+- Multi-Task GPs.
 
-### Part 3: GP Case Studies (Slides 26-40)
-- Case Study: Predicting tensile strength across a temperature-strain rate space.
-- GP for Experimental Design: Identifying the "Gaps" in a database.
-- Multi-Task GPs: Sharing information between related properties (e.g., Hardness and Yield Strength).
+### Part 4: Advanced Probabilistic ML
+- Mixture Density Networks (MDNs).
+- Dropout as Bayesian approximation.
 
-### Part 4: Advanced Probabilistic ML (Slides 41-45)
-- Mixture Density Networks (MDNs): Handling multi-modal uncertainties (Neuer Ch 6.4).
-- Dropout as a Bayesian approximation in deep NNs.
-
-### Part 5: Summary: Decision Making Under Uncertainty (Slides 46-50)
-- Using confidence intervals to define "Safe" process windows.
-- Summary: Building models that scientists can trust.
-
----
+### Part 5: Decision Making
+- Safe process windows via confidence intervals.
+- Building trustworthy models.
 
 ## Quarto Website Update (Summary)
 **Summary for ML-PC Week 12:**
-This unit introduces **Probabilistic Machine Learning**, focusing on the quantification of uncertainty in materials models. We explore why point estimates can be dangerous in engineering and introduce **Gaussian Processes (GPs)** as a powerful tool for uncertainty-aware regression. Students learn how kernels encode physical assumptions about data smoothness and how the resulting predictive distributions can be used to build robust process maps. We also discuss the difference between aleatoric (noise) and epistemic (ignorance) uncertainty and how to use confidence intervals to drive scientific decision-making.
+- Introduces Probabilistic Machine Learning for uncertainty quantification.
+- Differentiates aleatoric (noise) from epistemic (ignorance) uncertainty.
+- Uses Gaussian Processes (GPs) for uncertainty-aware regression.
+- Applies confidence intervals to map robust process windows.
```
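
A minimal sketch of the GP workflow in the week-12 bullets, assuming scikit-learn; the strength-versus-temperature data and kernel hyperparameters are invented. The posterior standard deviation (the "confidence ribbon") typically widens where training data are sparse:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy dataset: a property measured at a handful of temperatures.
X_train = np.array([[300.0], [400.0], [500.0], [700.0]])
y_train = np.array([510.0, 480.0, 430.0, 350.0])

# RBF length_scale encodes the assumed smoothness; WhiteKernel models noise.
kernel = RBF(length_scale=100.0) + WhiteKernel(noise_level=25.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_train, y_train)

# Posterior mean and std on a grid: the mean is the prediction, the std
# is the ribbon half-width (scaled by 1.96 for a ~95% interval).
X_grid = np.linspace(250, 800, 12).reshape(-1, 1)
mean, std = gp.predict(X_grid, return_std=True)
for x, m, s in zip(X_grid.ravel(), mean, std):
    print(f"T={x:5.0f} K  ->  {m:6.1f} +/- {1.96 * s:5.1f}")
```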

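The "Dropout as Bayesian approximation" bullet is usually realized as MC dropout: keep dropout active at inference and treat repeated stochastic forward passes as approximate posterior samples. A PyTorch sketch with an untrained, purely illustrative network (shapes only):

```python
import torch
import torch.nn as nn

# Small regressor with dropout; keeping dropout stochastic at inference
# turns repeated forward passes into approximate posterior samples.
model = nn.Sequential(
    nn.Linear(2, 64), nn.ReLU(), nn.Dropout(p=0.1),
    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.1),
    nn.Linear(64, 1),
)

x = torch.tensor([[450.0, 0.01]])   # e.g., (temperature, strain rate)

model.train()                        # keep dropout active on purpose
with torch.no_grad():
    samples = torch.stack([model(x) for _ in range(100)])

mean = samples.mean(dim=0)
std = samples.std(dim=0)             # epistemic spread across dropout masks
print(mean.item(), std.item())
```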