Commit 1f6a3eb

Quarto GHA Workflow Runner committed: Built site for gh-pages
Parent: a3f69bf

29 files changed: 171 additions & 308 deletions

.nojekyll

Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
-01bdab7e
+b00719d5

_tex/index.tex

Lines changed: 22 additions & 8 deletions

@@ -350,6 +350,9 @@ \subsubsection{Unit I --- Experimental Data as a Learning Problem (Weeks
 \emph{Lecture: Tuesday, 28.04.2026, 14:15-15:45 \textbar{} Exercise:
 Thursday, 30.04.2026, 16:15-17:45}
 
+\textbf{Slides:}
+\href{https://pelzlab.science/public_presentations/ml_for_characterization_and_processing/unit03_data_quality/01_intro.html}{Open}
+
 \begin{itemize}
 \tightlist
 \item
@@ -360,14 +363,25 @@ \subsubsection{Unit I --- Experimental Data as a Learning Problem (Weeks
 Why ``good accuracy'' often means a broken pipeline.
 \end{itemize}
 
-\textbf{Summary:} This unit focuses on the most critical and often
-overlooked part of the ML pipeline: data integrity. We discuss
-systematic data cleaning and normalization techniques while highlighting
-the unique challenges of labeling experimental materials data, such as
-inter-annotator variance. A major focus is on \textbf{Data Leakage},
-specifically how spatial and physical correlations in materials samples
-can lead to deceptively high model performance. We introduce robust
-validation strategies to ensure models generalize to truly unseen data.
+\textbf{Summary:} This unit covers the often-overlooked half of an ML
+pipeline: data integrity, validation, and how performance is measured.
+We start with the measurement chain and systematic \textbf{data
+cleaning} --- handling missing values, outliers, and duplicates with a
+``fix at source'' mindset. We then build the \textbf{transformation
+toolbox}: centering, min--max and z-score scaling, physics-aware
+non-dimensionalisation, log transforms, differentiation, and
+frequency-domain views (FFT, triggering for time series). On the
+supervision side we examine \textbf{labels and uncertainty} ---
+inter-annotator variance, probabilistic labels, and a Bayesian view of
+priors, likelihoods, and posteriors --- and then formalize the
+\textbf{bias--variance} tradeoff with parsimony and regularization. A
+major focus is \textbf{Data Leakage} in materials workflows
+(pre-processing, temporal, and group/spatial), tackled with proper
+holdout, K-fold, LOOCV, and stratified validation. We close with the
+\textbf{error measures} that decide what ``good'' actually means:
+MAE/MSE/RMSE and \(R^2\) for regression, and confusion matrices,
+precision/recall, F1/Dice, IoU, and categorical cross-entropy for
+classification and segmentation.
 
 \textbf{Exercise:}\\
 Construct a deliberately flawed ML pipeline and diagnose its failure.
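The new summary above highlights pre-processing leakage alongside z-score scaling: test-set statistics must never inform the transform. As a minimal, hedged sketch of that idea (the function names below are invented for illustration and are not from the course repository):

```python
# Leakage-safe z-score scaling: statistics come from the TRAINING split
# only, then get applied unchanged to every other split.
from statistics import mean, pstdev

def fit_scaler(train):
    """Estimate mean and std on the training split only."""
    mu = mean(train)
    sigma = pstdev(train) or 1.0  # guard against constant features
    return mu, sigma

def transform(values, mu, sigma):
    """Apply the training-set statistics to any split."""
    return [(v - mu) / sigma for v in values]

train = [2.0, 4.0, 6.0, 8.0]
test = [5.0]

mu, sigma = fit_scaler(train)          # fit on train only ...
train_z = transform(train, mu, sigma)  # ... then transform both splits
test_z = transform(test, mu, sigma)    # no test statistics leak in
```

Fitting the scaler on the full dataset instead would be exactly the kind of deceptively harmless step the exercise's "deliberately flawed pipeline" is meant to expose.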

index-meca.zip

-233 KB
Binary file not shown.

index-preview.html

Lines changed: 3 additions & 2 deletions

@@ -153,7 +153,7 @@
 window.document.addEventListener("DOMContentLoaded", function (_event) {
 document.body.classList.add('hypothesis-enabled');
 });
-</script>
+</script> <script defer="" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml-full.js" type="text/javascript"></script>
 <link rel="stylesheet" href="styles.css">
 <meta name="citation_title" content="Machine Learning in Materials Processing &amp; Characterization">
 <meta name="citation_abstract" content="This course teaches how machine learning can be applied to experimental data
@@ -401,12 +401,13 @@ <h4 data-number="1.3.1.2" class="anchored" data-anchor-id="week-2-physics-of-dat
 <section id="week-3-data-quality-labels-and-leakage" class="level4" data-number="1.3.1.3">
 <h4 data-number="1.3.1.3" class="anchored" data-anchor-id="week-3-data-quality-labels-and-leakage"><span class="header-section-number">1.3.1.3</span> Week 3 – Data quality, labels, and leakage</h4>
 <p><em>Lecture: Tuesday, 28.04.2026, 14:15-15:45 | Exercise: Thursday, 30.04.2026, 16:15-17:45</em></p>
+<p><strong>Slides:</strong> <a href="https://pelzlab.science/public_presentations/ml_for_characterization_and_processing/unit03_data_quality/01_intro.html">Open</a></p>
 <ul>
 <li>Annotation uncertainty and inter-annotator variance.</li>
 <li>Train/test leakage in materials workflows.</li>
 <li>Why “good accuracy” often means a broken pipeline.</li>
 </ul>
-<p><strong>Summary:</strong> This unit focuses on the most critical and often overlooked part of the ML pipeline: data integrity. We discuss systematic data cleaning and normalization techniques while highlighting the unique challenges of labeling experimental materials data, such as inter-annotator variance. A major focus is on <strong>Data Leakage</strong>, specifically how spatial and physical correlations in materials samples can lead to deceptively high model performance. We introduce robust validation strategies to ensure models generalize to truly unseen data.</p>
+<p><strong>Summary:</strong> This unit covers the often-overlooked half of an ML pipeline: data integrity, validation, and how performance is measured. We start with the measurement chain and systematic <strong>data cleaning</strong> — handling missing values, outliers, and duplicates with a “fix at source” mindset. We then build the <strong>transformation toolbox</strong>: centering, min–max and z-score scaling, physics-aware non-dimensionalisation, log transforms, differentiation, and frequency-domain views (FFT, triggering for time series). On the supervision side we examine <strong>labels and uncertainty</strong> — inter-annotator variance, probabilistic labels, and a Bayesian view of priors, likelihoods, and posteriors — and then formalize the <strong>bias–variance</strong> tradeoff with parsimony and regularization. A major focus is <strong>Data Leakage</strong> in materials workflows (pre-processing, temporal, and group/spatial), tackled with proper holdout, K-fold, LOOCV, and stratified validation. We close with the <strong>error measures</strong> that decide what “good” actually means: MAE/MSE/RMSE and <span class="math inline">\(R^2\)</span> for regression, and confusion matrices, precision/recall, F1/Dice, IoU, and categorical cross-entropy for classification and segmentation.</p>
 <p><strong>Exercise:</strong><br>
 Construct a deliberately flawed ML pipeline and diagnose its failure.</p>
 <hr>

index.docx

489 Bytes
Binary file not shown.

index.embed.ipynb

Lines changed: 7 additions & 5 deletions

@@ -13,7 +13,7 @@
 "\n",
 "This course teaches how machine learning can be applied to experimental data from materials processing and characterization. The focus lies on images, spectra, time-series, and processing parameters, and on understanding how physical data formation interacts with learning algorithms. Students learn to build robust, uncertainty-aware ML pipelines for real experimental workflows, avoiding common pitfalls such as data leakage, overfitting, and spurious correlations."
 ],
-"id": "f51de559-b328-412e-a061-d83943d738fd"
+"id": "ec8e522f-188b-40d4-8405-b13ac06c92bf"
 },
 {
 "cell_type": "raw",
@@ -76,7 +76,7 @@
 "}\n",
 "</style>"
 ],
-"id": "4d0b7fd1-a823-4047-905a-375f327a501c"
+"id": "91fb63c7-5649-4983-b27c-f804838b71f7"
 },
 {
 "cell_type": "raw",
@@ -106,7 +106,7 @@
 " <strong>How to use this course site.</strong> Use this page as the central hub for syllabus, lecture structure, reading, notebooks, and course materials. Formal announcements and enrollment remain on StudOn; code and openly shared resources live in the linked GitHub repository.\n",
 "</div>"
 ],
-"id": "df4fb92a-b630-4c8a-9316-7f88737ed552"
+"id": "927882ed-8b4d-4699-aede-5b041368f773"
 },
 {
 "cell_type": "markdown",
@@ -176,11 +176,13 @@
 "\n",
 "*Lecture: Tuesday, 28.04.2026, 14:15-15:45 \\| Exercise: Thursday, 30.04.2026, 16:15-17:45*\n",
 "\n",
+"**Slides:** [Open](https://pelzlab.science/public_presentations/ml_for_characterization_and_processing/unit03_data_quality/01_intro.html)\n",
+"\n",
 "- Annotation uncertainty and inter-annotator variance.\n",
 "- Train/test leakage in materials workflows.\n",
 "- Why “good accuracy” often means a broken pipeline.\n",
 "\n",
-"**Summary:** This unit focuses on the most critical and often overlooked part of the ML pipeline: data integrity. We discuss systematic data cleaning and normalization techniques while highlighting the unique challenges of labeling experimental materials data, such as inter-annotator variance. A major focus is on **Data Leakage**, specifically how spatial and physical correlations in materials samples can lead to deceptively high model performance. We introduce robust validation strategies to ensure models generalize to truly unseen data.\n",
+"**Summary:** This unit covers the often-overlooked half of an ML pipeline: data integrity, validation, and how performance is measured. We start with the measurement chain and systematic **data cleaning** — handling missing values, outliers, and duplicates with a “fix at source” mindset. We then build the **transformation toolbox**: centering, min–max and z-score scaling, physics-aware non-dimensionalisation, log transforms, differentiation, and frequency-domain views (FFT, triggering for time series). On the supervision side we examine **labels and uncertainty** — inter-annotator variance, probabilistic labels, and a Bayesian view of priors, likelihoods, and posteriors — and then formalize the **bias–variance** tradeoff with parsimony and regularization. A major focus is **Data Leakage** in materials workflows (pre-processing, temporal, and group/spatial), tackled with proper holdout, K-fold, LOOCV, and stratified validation. We close with the **error measures** that decide what “good” actually means: MAE/MSE/RMSE and $R^2$ for regression, and confusion matrices, precision/recall, F1/Dice, IoU, and categorical cross-entropy for classification and segmentation.\n",
 "\n",
 "**Exercise:** \n",
 "Construct a deliberately flawed ML pipeline and diagnose its failure.\n",
@@ -383,7 +385,7 @@
 "\n",
 "**Summary:** This unit explores the cutting edge of **Autonomous Characterization**, where machine learning moves from passive data analysis to active instrument control. We introduce **Multi-Modal Data Fusion** techniques to combine information from diverse sensors like SEM images, EDS spectra, and process logs using Bayesian frameworks. We then discuss **Reinforcement Learning (RL)** as a tool for automating complex laboratory tasks, such as instrument tuning and process optimization. Through case studies in microscopy and industrial processing, students learn how to build integrated pipelines that can autonomously find, characterize, and decide the next steps of an experiment."
 ],
-"id": "56ea3dd0-e7f8-4777-b8b1-7b73ce006df2"
+"id": "80a417a5-c2df-4d28-9de8-2be3b782a69e"
 }
 ],
 "nbformat": 4,

index.html

Lines changed: 31 additions & 1 deletion

@@ -66,6 +66,35 @@
 });
 </script>
 
+<script src="https://cdnjs.cloudflare.com/polyfill/v3/polyfill.min.js?features=es6"></script>
+<script defer="" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml-full.js" type="text/javascript"></script>
+
+<script type="text/javascript">
+const typesetMath = (el) => {
+  if (window.MathJax) {
+    // MathJax Typeset
+    window.MathJax.typeset([el]);
+  } else if (window.katex) {
+    // KaTeX Render
+    var mathElements = el.getElementsByClassName("math");
+    var macros = [];
+    for (var i = 0; i < mathElements.length; i++) {
+      var texText = mathElements[i].firstChild;
+      if (mathElements[i].tagName == "SPAN" && texText && texText.data) {
+        window.katex.render(texText.data, mathElements[i], {
+          displayMode: mathElements[i].classList.contains('display'),
+          throwOnError: false,
+          macros: macros,
+          fleqn: false
+        });
+      }
+    }
+  }
+}
+window.Quarto = {
+  typesetMath
+};
+</script>
 
 <link rel="stylesheet" href="styles.css">
 <meta name="citation_title" content="Machine Learning in Materials Processing &amp; Characterization">
@@ -312,12 +341,13 @@ <h4 data-number="1.3.1.2" class="anchored" data-anchor-id="week-2-physics-of-dat
 <section id="week-3-data-quality-labels-and-leakage" class="level4" data-number="1.3.1.3">
 <h4 data-number="1.3.1.3" class="anchored" data-anchor-id="week-3-data-quality-labels-and-leakage"><span class="header-section-number">1.3.1.3</span> Week 3 – Data quality, labels, and leakage</h4>
 <p><em>Lecture: Tuesday, 28.04.2026, 14:15-15:45 | Exercise: Thursday, 30.04.2026, 16:15-17:45</em></p>
+<p><strong>Slides:</strong> <a href="https://pelzlab.science/public_presentations/ml_for_characterization_and_processing/unit03_data_quality/01_intro.html">Open</a></p>
 <ul>
 <li>Annotation uncertainty and inter-annotator variance.</li>
 <li>Train/test leakage in materials workflows.</li>
 <li>Why “good accuracy” often means a broken pipeline.</li>
 </ul>
-<p><strong>Summary:</strong> This unit focuses on the most critical and often overlooked part of the ML pipeline: data integrity. We discuss systematic data cleaning and normalization techniques while highlighting the unique challenges of labeling experimental materials data, such as inter-annotator variance. A major focus is on <strong>Data Leakage</strong>, specifically how spatial and physical correlations in materials samples can lead to deceptively high model performance. We introduce robust validation strategies to ensure models generalize to truly unseen data.</p>
+<p><strong>Summary:</strong> This unit covers the often-overlooked half of an ML pipeline: data integrity, validation, and how performance is measured. We start with the measurement chain and systematic <strong>data cleaning</strong> — handling missing values, outliers, and duplicates with a “fix at source” mindset. We then build the <strong>transformation toolbox</strong>: centering, min–max and z-score scaling, physics-aware non-dimensionalisation, log transforms, differentiation, and frequency-domain views (FFT, triggering for time series). On the supervision side we examine <strong>labels and uncertainty</strong> — inter-annotator variance, probabilistic labels, and a Bayesian view of priors, likelihoods, and posteriors — and then formalize the <strong>bias–variance</strong> tradeoff with parsimony and regularization. A major focus is <strong>Data Leakage</strong> in materials workflows (pre-processing, temporal, and group/spatial), tackled with proper holdout, K-fold, LOOCV, and stratified validation. We close with the <strong>error measures</strong> that decide what “good” actually means: MAE/MSE/RMSE and <span class="math inline">\(R^2\)</span> for regression, and confusion matrices, precision/recall, F1/Dice, IoU, and categorical cross-entropy for classification and segmentation.</p>
 <p><strong>Exercise:</strong><br>
 Construct a deliberately flawed ML pipeline and diagnose its failure.</p>
 <hr>

index.out.ipynb

Lines changed: 7 additions & 5 deletions

@@ -13,7 +13,7 @@
 "\n",
 "This course teaches how machine learning can be applied to experimental data from materials processing and characterization. The focus lies on images, spectra, time-series, and processing parameters, and on understanding how physical data formation interacts with learning algorithms. Students learn to build robust, uncertainty-aware ML pipelines for real experimental workflows, avoiding common pitfalls such as data leakage, overfitting, and spurious correlations."
 ],
-"id": "b73b41b1-31ba-4784-92f9-8084868a8922"
+"id": "f23b30ad-8a34-41cd-bf2e-e8074d2ca14b"
 },
 {
 "cell_type": "raw",
@@ -76,7 +76,7 @@
 "}\n",
 "</style>"
 ],
-"id": "fe078047-86ca-4ab4-8bdf-248b53f0e63e"
+"id": "99dc06b5-04b5-4a15-b07b-c8f865219c08"
 },
 {
 "cell_type": "raw",
@@ -106,7 +106,7 @@
 " <strong>How to use this course site.</strong> Use this page as the central hub for syllabus, lecture structure, reading, notebooks, and course materials. Formal announcements and enrollment remain on StudOn; code and openly shared resources live in the linked GitHub repository.\n",
 "</div>"
 ],
-"id": "6ffb6083-0168-465b-b006-157d7832997e"
+"id": "8cd1a24b-7f9c-4901-82b2-a1f956ed0dda"
 },
 {
 "cell_type": "markdown",
@@ -176,11 +176,13 @@
 "\n",
 "*Lecture: Tuesday, 28.04.2026, 14:15-15:45 \\| Exercise: Thursday, 30.04.2026, 16:15-17:45*\n",
 "\n",
+"**Slides:** [Open](https://pelzlab.science/public_presentations/ml_for_characterization_and_processing/unit03_data_quality/01_intro.html)\n",
+"\n",
 "- Annotation uncertainty and inter-annotator variance.\n",
 "- Train/test leakage in materials workflows.\n",
 "- Why “good accuracy” often means a broken pipeline.\n",
 "\n",
-"**Summary:** This unit focuses on the most critical and often overlooked part of the ML pipeline: data integrity. We discuss systematic data cleaning and normalization techniques while highlighting the unique challenges of labeling experimental materials data, such as inter-annotator variance. A major focus is on **Data Leakage**, specifically how spatial and physical correlations in materials samples can lead to deceptively high model performance. We introduce robust validation strategies to ensure models generalize to truly unseen data.\n",
+"**Summary:** This unit covers the often-overlooked half of an ML pipeline: data integrity, validation, and how performance is measured. We start with the measurement chain and systematic **data cleaning** — handling missing values, outliers, and duplicates with a “fix at source” mindset. We then build the **transformation toolbox**: centering, min–max and z-score scaling, physics-aware non-dimensionalisation, log transforms, differentiation, and frequency-domain views (FFT, triggering for time series). On the supervision side we examine **labels and uncertainty** — inter-annotator variance, probabilistic labels, and a Bayesian view of priors, likelihoods, and posteriors — and then formalize the **bias–variance** tradeoff with parsimony and regularization. A major focus is **Data Leakage** in materials workflows (pre-processing, temporal, and group/spatial), tackled with proper holdout, K-fold, LOOCV, and stratified validation. We close with the **error measures** that decide what “good” actually means: MAE/MSE/RMSE and $R^2$ for regression, and confusion matrices, precision/recall, F1/Dice, IoU, and categorical cross-entropy for classification and segmentation.\n",
 "\n",
 "**Exercise:** \n",
 "Construct a deliberately flawed ML pipeline and diagnose its failure.\n",
@@ -385,7 +387,7 @@
 "\n",
 "Sandfeld, Stefan. 2024. *Materials Data Science: Introduction to Data Mining, Machine Learning, and Data-Driven Predictions for Materials Science and Engineering*. Springer Nature."
 ],
-"id": "921109e0-2219-40d8-b8a9-9398719dd610"
+"id": "ab3eae36-98a4-4c35-8a83-b928deea2b5d"
 }
 ],
 "nbformat": 4,

index.pdf

2.53 KB
Binary file not shown.
