Commit dfe1e8c

Refactor week summaries to be concise bullet points
1 parent 7a4e5a6 commit dfe1e8c

14 files changed

Lines changed: 520 additions & 555 deletions

week10_summary.md

Lines changed: 34 additions & 36 deletions
```diff
@@ -2,51 +2,49 @@
 
 ## Cross-Book Summary
 
-### 1. Clustering Spectral Data (Neuer Ch 5, McClarren Ch 4)
-- **K-Means Clustering:** A fundamental tool for grouping similar signals (e.g., XRD or EDS spectra). By minimizing the variance within clusters, we can automatically identify distinct phases or chemical environments in a dataset (Neuer Ch 5.3).
-- **Mini-Batch K-Means:** Essential for high-throughput characterization where millions of spectra are collected in a single mapping session.
-- **Visualization with t-SNE:** High-dimensional spectra (e.g., 2048 channels) are impossible to visualize directly. t-SNE projects these into 2D while preserving "neighborhood" relationships, making it easy to spot outliers or transitional states (Neuer Ch 5.4).
+### 1. Clustering Spectral Data
+- **K-Means:** Groups similar spectra (XRD/EDS) to identify distinct phases.
+- **Mini-Batch K-Means:** Speeds up high-throughput characterization.
+- **t-SNE:** Projects high-dimensional spectra to 2D to reveal outliers/relationships.
 
-### 2. Autoencoders for Signal Processing (McClarren Ch 8)
-- **Latent Representations:** An autoencoder learns to compress a spectrum into a few "latent variables" that capture the essential physical information (peak positions, intensities).
-- **Denoising:** By training an autoencoder to reconstruct a clean signal from a noisy input, we can effectively remove experimental fluctuations without the blurring associated with traditional filters (McClarren Ch 8.3.2).
-- **Non-linear Compression:** Unlike PCA, autoencoders can capture non-linear relationships in spectral data, enabling much higher compression ratios for massive characterization libraries (McClarren Ch 8.2).
+### 2. Autoencoders for Signal Processing
+- **Latent Representations:** Compresses spectra to essential physical information.
+- **Denoising:** Reconstructs clean signals from noisy inputs without blurring.
+- **Non-linear Compression:** Outperforms PCA for complex spectral libraries.
 
 ### 3. Scientific Integrity in ML
-- **Peak Preservation:** The goal of ML in characterization is to assist the scientist, not replace the physics. Models must be validated to ensure they do not "invent" peaks or smooth away critical structural information.
+- **Peak Preservation:** ML must assist, not invent or smooth away real physics.
 
----
+## 90-Minute Lecture Strategy
 
-## 90-Minute Lecture Strategy (50 Slides)
+### Part 1: High-Dimensional Signals
+- Digital footprint: XRD, EDS, EELS, Raman.
+- Manual vs. automated peak-picking.
+- Vector spectrum representation.
 
-### Part 1: High-Dimensional Signals (Slides 1-10)
-- The digital footprint of materials: XRD, EDS, EELS, and Raman.
-- Why manual peak-picking fails in high-throughput experiments.
-- The "Vector" representation of a spectrum.
+### Part 2: Clustering Structure
+- K-Means algorithm.
+- Elbow Method for phase counting.
+- Ternary alloy mapping.
 
-### Part 2: Discovering Structure with Clustering (Slides 11-20)
-- K-Means: Geometry and Algorithm.
-- The "Elbow Method": Deciding how many phases are in your sample.
-- Case Study: Mapping a ternary alloy system with K-Means.
+### Part 3: Visualizing the Unseen
+- t-SNE Stochastic Proximity.
+- Hidden relationships.
+- t-SNE distance pitfalls.
 
-### Part 3: Visualizing the Unseen (Slides 21-30)
-- t-SNE: The intuition of "Stochastic Proximity."
-- Finding "Hidden" relationships in spectral libraries.
-- Pitfalls: Why t-SNE distances can be misleading.
+### Part 4: Autoencoders & Denoising
+- Encoder-Bottleneck-Decoder.
+- Denoising characterization signals.
+- Bottlenecks as physical descriptors.
 
-### Part 4: Autoencoders & Denoising (Slides 31-45)
-- The Hourglass Architecture: Encoder, Bottleneck, Decoder.
-- Applications: Compressing leaf spectra (McClarren Ch 8.2).
-- Denoising characterization signals: Improving SNR with Deep Learning.
-- Feature extraction: Using the bottleneck as a physical descriptor.
-
-### Part 5: From Data to Discovery (Slides 46-50)
-- Real-time spectral analysis during experiments.
-- Ensuring physical consistency in ML outputs.
-- Summary: The automated characterization pipeline.
-
----
+### Part 5: Data to Discovery
+- Real-time spectral analysis.
+- Physical consistency in ML.
+- Automated pipelines.
 
 ## Quarto Website Update (Summary)
 **Summary for ML-PC Week 10:**
-This unit focuses on the processing of high-dimensional **Characterization Signals** (like XRD, EDS, and EELS) using unsupervised learning. We introduce **K-Means Clustering** and **t-SNE** for the automatic identification and visualization of phases in large experimental libraries. We then explore **Autoencoders**—neural networks that learn to compress complex spectra into a low-dimensional "latent space." This allows for advanced denoising and feature extraction, enabling scientists to handle the massive data volumes produced by modern high-throughput characterization tools without losing physical insight.
+- Processes high-dimensional Characterization Signals (XRD, EDS).
+- Employs K-Means and t-SNE for automated phase identification.
+- Uses Autoencoders for latent space compression and denoising.
+- Enhances high-throughput data analysis while preserving physics.
```
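
A minimal sketch of the clustering workflow the week-10 bullets describe, assuming scikit-learn and synthetic Gaussian-peak spectra (the peak positions, noise level, and cluster count below are illustrative, not taken from the course files):

```python
import numpy as np
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
channels = np.arange(2048)  # 2048-channel spectra, as in the old summary text

def synth_spectrum(peak_center):
    """Toy spectrum: one Gaussian peak plus measurement noise."""
    peak = np.exp(-0.5 * ((channels - peak_center) / 15.0) ** 2)
    return peak + rng.normal(0.0, 0.05, channels.size)

# Three "phases", distinguished only by peak position.
X = np.vstack([synth_spectrum(c)
               for c in rng.choice([400, 900, 1500], size=300)])

# K-Means groups spectra into k clusters; inertia_ feeds the elbow method.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.inertia_)

# Mini-Batch K-Means trades a little accuracy for speed on huge maps.
mb_labels = MiniBatchKMeans(n_clusters=3, batch_size=64,
                            random_state=0).fit_predict(X)

# t-SNE projects the 2048-channel spectra to 2D for visual inspection.
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(emb.shape)  # (300, 2)
```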

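The encoder-bottleneck-decoder ("hourglass") idea behind the autoencoder bullets, as a minimal PyTorch sketch; the layer sizes, latent dimension, and random stand-in data are illustrative, not the course's actual architecture:

```python
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    """Hourglass autoencoder: 2048 channels -> 8 latent variables -> 2048."""
    def __init__(self, n_channels=2048, n_latent=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_channels, 256), nn.ReLU(),
            nn.Linear(256, n_latent),          # bottleneck = latent descriptor
        )
        self.decoder = nn.Sequential(
            nn.Linear(n_latent, 256), nn.ReLU(),
            nn.Linear(256, n_channels),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DenoisingAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

clean = torch.rand(64, 2048)                   # stand-in for clean spectra
noisy = clean + 0.05 * torch.randn_like(clean)

for _ in range(10):                            # tiny illustrative training loop
    opt.zero_grad()
    loss = loss_fn(model(noisy), clean)        # reconstruct clean from noisy
    loss.backward()
    opt.step()

with torch.no_grad():
    latent = model.encoder(noisy)              # bottleneck as feature vector
print(latent.shape)                            # torch.Size([64, 8])
```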
week11_summary.md

Lines changed: 32 additions & 38 deletions
```diff
@@ -2,52 +2,46 @@
 
 ## Cross-Book Summary
 
-### 1. Multi-Modal Data Fusion (Murphy Ch 11, Neuer Ch 2)
-- **Beyond a Single Sensor:** In modern characterization, we often collect images (SEM), chemistry (EDS), and orientations (EBSD) simultaneously. Fusing these data streams provides a more complete physical picture than any single modality.
-- **Bayesian Sensor Fusion:** A mathematical framework for combining uncertain measurements. If two sensors (e.g., two thermocouples) provide conflicting information, the Bayesian posterior weights them by their respective precisions (inverse variances), allowing for robust state estimation (Murphy Ch 4.6.4).
-- **Latent Fusion:** Using autoencoders or PCA to find a shared low-dimensional embedding where different data types (images and spectra) can be compared and combined (Murphy Ch 19).
+### 1. Multi-Modal Data Fusion
+- **Beyond Single Sensors:** Fuse images (SEM), chemistry (EDS), and orientations (EBSD) for a complete physical picture.
+- **Bayesian Sensor Fusion:** Combines uncertain measurements using precision-weighted posteriors.
+- **Latent Fusion:** Autoencoders/PCA find shared embeddings to combine diverse data types.
 
-### 2. Reinforcement Learning for Control (McClarren Ch 9)
-- **The Autonomous Agent:** In RL, an agent learns to interact with an environment (e.g., a microscope or a furnace) to maximize a reward.
-- **The RL Loop:** State (current image), Action (adjusting focus/stigmation), and Reward (image sharpness/SNR).
-- **Policy Gradients:** A method for training deep neural networks to make a sequence of decisions that lead to an optimal scientific outcome (McClarren Ch 9.1).
-- **Case Study (McClarren):** Using RL to control the complex cooling cycles of glass, demonstrating the transition from monitoring to active control.
+### 2. Reinforcement Learning for Control
+- **Autonomous Agent:** Learns to interact with environments (e.g., microscopes) to maximize rewards.
+- **RL Loop:** State (image), Action (adjust focus), Reward (sharpness/SNR).
+- **Policy Gradients:** Train NNs for optimal scientific decision-making.
 
-### 3. Computer Vision in the Lab (ML-PC Index)
-- **Automated Workflows:** Using CNNs for real-time region-of-interest (ROI) detection, automated autofocus, and high-speed classification of diffraction patterns (e.g., EBSD Kikuchi bands).
+### 3. Computer Vision in the Lab
+- **Automated Workflows:** CNNs for real-time ROI detection, autofocus, and pattern classification.
 
----
+## 90-Minute Lecture Strategy
 
-## 90-Minute Lecture Strategy (50 Slides)
+### Part 1: Toward the Self-Driving Lab
+- The automation stack.
+- Autonomous Characterization: Scan, Analyze, Decide, Repeat.
 
-### Part 1: Toward the Self-Driving Lab (Slides 1-10)
-- The bottleneck of human-operated characterization.
-- The concept of "Autonomous Characterization": Scan, Analyze, Decide, Repeat.
-- Overview of the automation stack.
+### Part 2: ML-Assisted Instrument Tuning
+- Autofocus and Beam Alignment.
+- Real-time feedback loops.
 
-### Part 2: ML-Assisted Instrument Tuning (Slides 11-20)
-- Computer Vision for Autofocus and Beam Alignment.
-- Real-time feedback loops: Turning pixels into control signals.
-- Case Study: Automated EBSD mapping.
+### Part 3: Fusing Multi-Modal Data
+- Bayesian Fusion for sensor noise.
+- Multi-head NNs.
+- Combining XRD and EDS.
 
-### Part 3: Fusing Multi-Modal Data (Slides 21-35)
-- Why fuse? Structure vs. Chemistry vs. Properties.
-- Bayesian Fusion: Handling sensor noise and conflicts (Murphy Ch 11.4).
-- Multi-head NNs for multi-modal classification.
-- Case Study: Combining XRD and EDS for phase identification.
+### Part 4: RL for Lab Control
+- RL Framework overview.
+- Reward Functions for science.
+- Industrial glass processing control.
 
-### Part 4: Reinforcement Learning for Lab Control (Slides 36-45)
-- Introduction to the RL Framework (McClarren Ch 9).
-- Defining Reward Functions for scientific experiments.
-- Case Study: Closing the loop in industrial glass processing.
-
-### Part 5: Summary: The Integrated Pipeline (Slides 46-50)
-- The shift from "Post-mortem" analysis to "On-the-fly" discovery.
-- Challenges: Latency, safety, and physical limits of automation.
-- Summary: The vision of autonomous materials characterization.
-
----
+### Part 5: The Integrated Pipeline
+- "On-the-fly" discovery.
+- Automation challenges: Latency and safety.
 
 ## Quarto Website Update (Summary)
 **Summary for ML-PC Week 11:**
-This unit explores the cutting edge of **Autonomous Characterization**, where machine learning moves from passive data analysis to active instrument control. We introduce **Multi-Modal Data Fusion** techniques to combine information from diverse sensors like SEM images, EDS spectra, and process logs using Bayesian frameworks. We then discuss **Reinforcement Learning (RL)** as a tool for automating complex laboratory tasks, such as instrument tuning and process optimization. Through case studies in microscopy and industrial processing, students learn how to build integrated pipelines that can autonomously find, characterize, and decide the next steps of an experiment.
+- Explores Autonomous Characterization and active instrument control.
+- Introduces Multi-Modal Data Fusion (Bayesian and Latent).
+- Uses Reinforcement Learning (RL) for laboratory task automation.
+- Details building integrated pipelines for "on-the-fly" scientific discovery.
```
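
For Gaussian sensors, the "precision-weighted posteriors" bullet reduces to a closed form: each reading is weighted by its precision (inverse variance). A numpy sketch with invented thermocouple readings:

```python
import numpy as np

# Two sensors report the same temperature with different noise levels.
mu = np.array([352.0, 348.0])      # readings (e.g., two thermocouples)
sigma = np.array([2.0, 0.5])       # per-sensor standard deviations

# Posterior mean: mu_post = sum(mu_i / sigma_i^2) / sum(1 / sigma_i^2)
precision = 1.0 / sigma**2
mu_post = np.sum(precision * mu) / np.sum(precision)
sigma_post = np.sqrt(1.0 / np.sum(precision))

print(f"fused estimate: {mu_post:.2f} +/- {sigma_post:.2f}")
# The fused mean sits closer to the more precise sensor, and the fused
# uncertainty is smaller than either sensor's individual uncertainty.
```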

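The State/Action/Reward loop from the RL bullets, as a toy autofocus agent. This sketch is a greedy hill-climber standing in for the loop structure, not a policy-gradient method; the sharpness reward and its optimum are invented:

```python
import numpy as np

rng = np.random.default_rng(1)

def sharpness(focus):
    """Toy reward: image sharpness peaks at an (unknown) best focus."""
    return np.exp(-0.5 * ((focus - 3.7) / 1.0) ** 2) + rng.normal(0, 0.01)

focus, eps, step = 0.0, 0.2, 0.5   # state, exploration rate, action size
best_reward = sharpness(focus)

for _ in range(200):
    # Action: nudge focus up or down (epsilon-greedy exploration).
    if rng.random() < eps:
        action = rng.choice([-step, step])
    else:
        action = step if sharpness(focus + step) > sharpness(focus - step) else -step
    focus += action                 # state transition
    reward = sharpness(focus)       # observe reward
    best_reward = max(best_reward, reward)

print(f"final focus: {focus:.2f}, best sharpness seen: {best_reward:.2f}")
```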
week12_summary.md

Lines changed: 32 additions & 37 deletions
```diff
@@ -2,51 +2,46 @@
 
 ## Cross-Book Summary
 
-### 1. The Value of "Knowing what you don't know" (Neuer Ch 6, Murphy Ch 15)
-- **Epistemic vs. Aleatoric Uncertainty:**
-  - **Aleatoric:** The inherent randomness in the physical process (e.g., sensor noise).
-  - **Epistemic:** The model's ignorance due to lack of training data in a specific region of the parameter space.
-- **Danger of Overconfidence:** Standard neural networks often provide "point estimates" that can be wildly overconfident when extrapolating into unknown physical regimes.
+### 1. Knowing what you don't know
+- **Aleatoric vs. Epistemic:** Inherent physical noise vs. model ignorance.
+- **Overconfidence Danger:** Point estimates can fail silently in unknown regimes; uncertainty metrics are crucial.
 
-### 2. Gaussian Processes (GPs) (Murphy Ch 15, Bishop Ch 6)
-- **Distribution over Functions:** A GP defines a prior over an infinite space of functions. After seeing data, it provides a posterior distribution, yielding both a mean prediction and a variance (uncertainty).
-- **Kernels as Physical Priors:** The kernel function (e.g., Radial Basis Function or Matérn) encodes our assumptions about the smoothness and length scales of the physical phenomenon (Bishop Ch 6.4).
-- **Non-Parametric Nature:** Unlike NNs, GPs don't have a fixed number of parameters; they scale with the number of training points, making them ideal for "small but high-quality" materials datasets.
+### 2. Gaussian Processes (GPs)
+- **Distribution over Functions:** GP yields posterior mean and variance (uncertainty).
+- **Kernels as Physical Priors:** Encode assumptions about data smoothness/scale.
+- **Non-Parametric Nature:** Scales with data size, ideal for small, high-quality materials datasets.
 
-### 3. GP-Based Process Maps (ML-PC Index)
-- **Confidence Ribbons:** Visualizing the uncertainty allows engineers to see where a process map is reliable and where more experiments are needed.
-- **Kriging:** GP regression is closely related to Kriging, a method long used in geostatistics and now widely applied to interpolate materials property surfaces.
+### 3. GP-Based Process Maps
+- **Confidence Ribbons:** Visualize reliability to guide further experiments.
+- **Kriging:** Interpolates materials property surfaces using GP regression.
 
----
+## 90-Minute Lecture Strategy
 
-## 90-Minute Lecture Strategy (50 Slides)
+### Part 1: Uncertainty in Science
+- Risk management in materials processing.
+- Visualizing distributions and error bars.
 
-### Part 1: Uncertainty in Science (Slides 1-10)
-- Why a single number is never enough.
-- Risk management in materials processing: The cost of being wrong.
-- Visualizing distributions: Histograms, error bars, and density plots.
+### Part 2: GP Fundamentals
+- Function vs. Parameter space.
+- Kernels and "Similarity".
+- Conditional Gaussians and Variance.
 
-### Part 2: Gaussian Process Fundamentals (Slides 11-25)
-- The Bayesian viewpoint: Function space vs. Parameter space.
-- Kernels: How do we define "Similarity" between two material states?
-- The GP Math: Conditional Gaussians and Matrix Inversion.
-- Interpreting the Variance: Where does the "Shaded region" come from?
+### Part 3: GP Case Studies
+- Predicting tensile strength across parameters.
+- GP for Experimental Design.
+- Multi-Task GPs.
 
-### Part 3: GP Case Studies (Slides 26-40)
-- Case Study: Predicting tensile strength across a temperature-strain rate space.
-- GP for Experimental Design: Identifying the "Gaps" in a database.
-- Multi-Task GPs: Sharing information between related properties (e.g., Hardness and Yield Strength).
+### Part 4: Advanced Probabilistic ML
+- Mixture Density Networks (MDNs).
+- Dropout as Bayesian approximation.
 
-### Part 4: Advanced Probabilistic ML (Slides 41-45)
-- Mixture Density Networks (MDNs): Handling multi-modal uncertainties (Neuer Ch 6.4).
-- Dropout as a Bayesian approximation in deep NNs.
-
-### Part 5: Summary: Decision Making Under Uncertainty (Slides 46-50)
-- Using confidence intervals to define "Safe" process windows.
-- Summary: Building models that scientists can trust.
-
----
+### Part 5: Decision Making
+- Safe process windows via confidence intervals.
+- Building trustworthy models.
 
 ## Quarto Website Update (Summary)
 **Summary for ML-PC Week 12:**
-This unit introduces **Probabilistic Machine Learning**, focusing on the quantification of uncertainty in materials models. We explore why point estimates can be dangerous in engineering and introduce **Gaussian Processes (GPs)** as a powerful tool for uncertainty-aware regression. Students learn how kernels encode physical assumptions about data smoothness and how the resulting predictive distributions can be used to build robust process maps. We also discuss the difference between aleatoric (noise) and epistemic (ignorance) uncertainty and how to use confidence intervals to drive scientific decision-making.
+- Introduces Probabilistic Machine Learning for uncertainty quantification.
+- Differentiates aleatoric (noise) from epistemic (ignorance) uncertainty.
+- Uses Gaussian Processes (GPs) for uncertainty-aware regression.
+- Applies confidence intervals to map robust process windows.
```
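
A minimal sketch of the GP workflow in the week-12 bullets, assuming scikit-learn; the strength-versus-temperature data and kernel hyperparameters are invented. The posterior standard deviation (the "confidence ribbon") typically widens where training data are sparse:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy dataset: a property measured at a handful of temperatures.
X_train = np.array([[300.0], [400.0], [500.0], [700.0]])
y_train = np.array([510.0, 480.0, 430.0, 350.0])

# RBF length_scale encodes the assumed smoothness; WhiteKernel models noise.
kernel = RBF(length_scale=100.0) + WhiteKernel(noise_level=25.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_train, y_train)

# Posterior mean and std on a grid: the mean is the prediction, the std
# is the ribbon half-width (scaled by 1.96 for a ~95% interval).
X_grid = np.linspace(250, 800, 12).reshape(-1, 1)
mean, std = gp.predict(X_grid, return_std=True)
for x, m, s in zip(X_grid.ravel(), mean, std):
    print(f"T={x:5.0f} K  ->  {m:6.1f} +/- {1.96 * s:5.1f}")
```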

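The "Dropout as Bayesian approximation" bullet is usually realized as MC dropout: keep dropout active at inference and treat repeated stochastic forward passes as approximate posterior samples. A PyTorch sketch with an untrained, purely illustrative network (shapes only):

```python
import torch
import torch.nn as nn

# Small regressor with dropout; keeping dropout stochastic at inference
# turns repeated forward passes into approximate posterior samples.
model = nn.Sequential(
    nn.Linear(2, 64), nn.ReLU(), nn.Dropout(p=0.1),
    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.1),
    nn.Linear(64, 1),
)

x = torch.tensor([[450.0, 0.01]])   # e.g., (temperature, strain rate)

model.train()                        # keep dropout active on purpose
with torch.no_grad():
    samples = torch.stack([model(x) for _ in range(100)])

mean = samples.mean(dim=0)
std = samples.std(dim=0)             # epistemic spread across dropout masks
print(mean.item(), std.item())
```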