From a9e5c15c86b465c2a0c80415c7d4e4a5ebb25cbc Mon Sep 17 00:00:00 2001
From: Alon Freund <alonf@il.ibm.com>
Date: Wed, 27 May 2026 11:18:21 +0300
Subject: [PATCH 01/12] Trim CLAUDE.md and split out architecture + tutorial
 conventions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

CLAUDE.md was 393 lines and contained content Claude could infer
from code (import paths, full directory tree, weight compatibility
example, Switch Configuration JSON). Applying the Anthropic best-
practice test — "would removing this cause Claude to make mistakes?"
— cut it to 204 lines.

Key changes:
- Remove Import Paths, full Project Structure tree, Architecture
  section, Weight Compatibility, and Switch Configuration JSON
- Fix the SingleSwitch description: the old text claimed "N
  transformer layers + linear projection head + ~1-2% of parameters",
  all of which are wrong. The actual implementation is a single
  attention head with a one-hot dim-0 pattern and an attention-based
  cumsum, with negligible parameter cost
- Architecture theory moved to docs/ARCHITECTURE.md so it stays
  accessible but doesn't bloat the per-session context
- Create tutorials/CLAUDE.md: Claude loads it automatically when
  reading any tutorials/ file, keeping notebook conventions
  (cell ordering, HF login cell, duration comments, utility modules)
  scoped to the directory where they apply
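
For reviewers new to the "attention-based cumsum" idea, the sketch
below shows the general technique. It is not the SingleSwitch code.
It assumes (hypothetically) a causal head whose queries and keys are
all the same one-hot vector on dim 0. Every position then attends
uniformly to its prefix and outputs the prefix mean. Rescaling by the
position count turns that mean into a running sum of the value
vectors. The helper names are illustrative only.

```python
import math


def causal_attention(q, k, v):
    """Single-head softmax attention where position t sees positions 0..t only."""
    out = []
    for t in range(len(q)):
        scores = [sum(a * b for a, b in zip(q[t], k[s])) for s in range(t + 1)]
        peak = max(scores)
        weights = [math.exp(x - peak) for x in scores]  # numerically stable softmax
        total = sum(weights)
        out.append([sum(w * v[s][d] for s, w in enumerate(weights)) / total
                    for d in range(len(v[0]))])
    return out


def attention_cumsum(values):
    """Running sum of `values` computed with one causal attention pass.

    Hypothetical setup: every query and key is the one-hot vector e_0, so all
    scores are equal and softmax is uniform over the prefix. Position t then
    outputs mean(values[0..t]), and multiplying by t + 1 recovers the sum.
    """
    n = len(values)
    e0 = [1.0, 0.0]
    prefix_means = causal_attention([e0] * n, [e0] * n, values)
    return [[x * (t + 1) for x in row] for t, row in enumerate(prefix_means)]
```

Suppose the values are one-hot "adapter fired" indicators. Then
attention_cumsum([[0, 0], [1, 0], [0, 0], [0, 1]]) returns the running
counts [[0, 0], [1, 0], [1, 0], [1, 1]] as floats. This is a per-token
tally of which control tokens have appeared so far, computed without
any projection head.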
---
 CLAUDE.md            | 217 +++----------------------------------------
 docs/ARCHITECTURE.md |  76 +++++++++++++++
 tutorials/CLAUDE.md  |  40 ++++++++
 3 files changed, 130 insertions(+), 203 deletions(-)
 create mode 100644 docs/ARCHITECTURE.md
 create mode 100644 tutorials/CLAUDE.md

diff --git a/CLAUDE.md b/CLAUDE.md
index 404065e..5c9ab89 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -4,116 +4,22 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 ## Repository Overview
 
-**granite-switch** implements **Granite Switch**, a system for building and deploying Granite models with embedded LoRA adapters. The system is a single unified Python package (`granite_switch`) with optional extras for different backends.
-
-1. **Building models with embedded adapters** - Combine a base Granite model with multiple LoRA adapters into a single checkpoint
-2. **Automatic adapter control** - Activate adapters via special control tokens or chat templates
-3. **Fast inference** - Deploy with vLLM for speedup over standard HuggingFace inference
-4. **Optional trainable switching** - Train a router to automatically select adapters per-token
+**granite-switch** is a single Python package (`granite_switch`) for building and deploying Granite models with embedded LoRA adapters. Two backends share the same weight format: `granite_switch.hf` (HuggingFace; model building and training) and `granite_switch.vllm` (production inference, 10-20x faster than standard HF inference via Punica kernels + PagedAttention).
 
 ## Project Structure
 
-```
-granite-switch/
-├── pyproject.toml                       # Single package definition with optional extras
-├── src/
-│   └── granite_switch/                  # Unified package
-│       ├── __init__.py                  # Core exports (GraniteSwitchConfig, __version__)
-│       ├── config.py                    # Unified GraniteSwitchConfig
-│       │
-│       ├── composer/                    # Compose system (requires [compose] extra)
-│       │   ├── __init__.py
-│       │   ├── adapter_discovery.py     # Adapter discovery and resolution
-│       │   ├── adapter_loader.py        # Adapter weight loading
-│       │   ├── arch.py                  # Architecture definitions
-│       │   ├── compose_granite_switch.py  # Main compose script (CLI entry point)
-│       │   ├── compose_utils.py           # GraniteSwitchComposer class
-│       │   ├── tokenizer_setup.py       # Tokenizer configuration for control tokens
-│       │   ├── validator.py             # Compose validation checks
-│       │   ├── weight_remapper.py       # Adapter name remapping (AdapterRemapper)
-│       │   ├── weight_transfer.py       # Base model weight transfer
-│       │   └── reporting/               # Compose reporting utilities
-│       │       ├── __init__.py
-│       │       ├── adapter_analysis.py
-│       │       ├── compose_report.py
-│       │       ├── hiding_constant_report.py
-│       │       ├── model_card.py
-│       │       └── population_table.py
-│       │
-│       ├── hf/                          # HuggingFace backend (requires [hf] extra)
-│       │   ├── __init__.py              # Registers with transformers AutoConfig/AutoModel
-│       │   ├── modeling_granite_switch.py
-│       │   ├── core/
-│       │   │   ├── __init__.py
-│       │   │   └── lora.py              # SwitchedLoRALinear, MergedSwitchedLoRALinear
-│       │   └── switch/
-│       │       ├── __init__.py
-│       │       └── single.py            # SingleSwitch (HF attention backends)
-│       │
-│       └── vllm/                        # vLLM backend (requires [vllm] extra)
-│           ├── __init__.py              # register() for vLLM plugin system
-│           ├── granite_switch_model.py
-│           ├── core/
-│           │   ├── __init__.py
-│           │   ├── lora.py              # SwitchedLoRALinear (Punica kernels)
-│           │   ├── lora_kernel_meta.py
-│           │   └── decoder.py           # Decoder layers
-│           └── switch/
-│               ├── __init__.py
-│               └── single.py            # SingleSwitch (vLLM Attention)
-│
-├── tests/                               # All tests
-│   ├── unit/                            # Unit tests (fastest, CPU)
-│   ├── hf/                              # HuggingFace-specific tests
-│   ├── vllm/                            # vLLM-specific tests
-│   ├── composer/                        # Compose system tests
-│   ├── integration/                     # Cross-backend integration tests
-│   ├── regression/                      # Regression tests (hf/, vllm/, integration/, shared/, tools/)
-│   └── shared/                          # Shared test utilities and parametrized cases
-│
-├── scratch/                             # Throwaway debug/diagnostic scripts (gitignored)
-├── docs/                                # Documentation
-├── tutorials/                           # Tutorials and how-to guides
-├── CLAUDE.md                            # This file
-└── README.md
-```
+Key layout rules (run `find src/` or `find tests/` for the full tree):
+
+- `src/granite_switch/` — unified package; `composer/`, `hf/`, `vllm/` match the optional extras
+- `tests/` — official test suite only; subdirs: `unit/`, `hf/`, `vllm/`, `composer/`, `integration/`, `regression/`, `shared/`
+- `scratch/` — gitignored; use this for throwaway diagnostic scripts (not `tests/`)
+- `tutorials/` — notebooks and guides; see `tutorials/CLAUDE.md` for conventions
 
 ## Installation (local/dev)
 
 ```bash
-# Core package only (config)
-pip install -e .
-
-# With HuggingFace backend
-pip install -e ".[hf]"
-
-# With vLLM backend
-pip install -e ".[vllm]"
-
-# With compose tools
-pip install -e ".[compose]"
-
-# Everything (development)
-pip install -e ".[dev]"
-```
-
-## Import Paths
-
-```python
-# Config (shared by all backends)
-from granite_switch import GraniteSwitchConfig
-from granite_switch.config import GraniteSwitchConfig  # equivalent
-
-# HuggingFace backend
-from granite_switch.hf import GraniteSwitchForCausalLM
-from granite_switch.hf.core.lora import SwitchedLoRALinear
-from granite_switch.hf.switch.single import SingleSwitch
-
-# vLLM backend (auto-registered via plugin entry point)
-from granite_switch.vllm import register
-
-# Compose system
-from granite_switch.composer import GraniteSwitchComposer
+pip install -e ".[dev]"         # everything (recommended for development)
+pip install -e ".[hf,compose]"  # HF + composer only (no vLLM)
 ```
 
 ## File Organization Convention
@@ -154,25 +60,13 @@ debugging, or exploratory scripts in `tests/`. Use `scratch/` instead (it is git
 ### Composing Models
 
 ```bash
-# Compose with HuggingFace adapters
 python -m granite_switch.composer.compose_granite_switch \
   --adapters ibm-granite/granitelib-rag-r1.0
-
-# Multiple adapters
-python -m granite_switch.composer.compose_granite_switch \
-  --adapters ibm-granite/granitelib-rag-r1.0 your-org/extra-adapter
-
-# Custom output directory
-python -m granite_switch.composer.compose_granite_switch \
-  --adapters ibm-granite/granitelib-rag-r1.0 --output ./my-custom-model
 ```
 
 ### Testing
 
-**Always use `-v -s --tb=short`** when running tests. `-v` (verbose) prints each test name as
-it starts, giving real-time progress visibility. `-s` disables output capture so `print()`
-statements inside tests appear immediately instead of being swallowed. Without these, long-running
-test files produce no output until they finish. `-x` (fail fast) stops on the first failure —
+**Always use `-v -s --tb=short`** when running tests, and add `-x` (fail fast) to stop on the first failure;
 no point running 200 more tests after something breaks.
 
 **Check GPU availability first** — the underlying hardware can change between sessions:
@@ -181,9 +75,6 @@ no point running 200 more tests after something breaks.
 python -c "import torch; print('GPU' if torch.cuda.is_available() else 'CPU only')"
 ```
 
-This determines which tests can run. vLLM and integration tests require a GPU; unit and HF tests
-run on CPU.
-
 **Run tests incrementally by directory**, in order of speed — don't run the full suite as a
 single command:
 
@@ -201,9 +92,6 @@ pytest tests/vllm/test_model_forward.py -v -s --tb=short -x
 
 # 4. Integration tests last (slowest, GPU required)
 pytest tests/integration/ -v -s --tb=short -x
-
-# Run a specific test pattern when debugging
-pytest tests/ -k "pattern" -v -s --tb=short -x
 ```
 
 ### vLLM Deployment
@@ -221,90 +109,14 @@ python -m vllm.entrypoints.openai.api_server \
   --port 8000
 ```
 
-## Architecture
-
-### Granite Switch Model
-
-The Granite Switch extends the base Granite model with:
-
-1. **Embedded LoRA Adapters** (frozen during inference)
-   - Multiple task/domain-specific adapters embedded in the same checkpoint
-   - Each adapter has LoRA weights (lora_A, lora_B) stacked in tensors
-   - Controlled via special tokens or router-selected indices
-
-2. **Control Tokens**
-   - Each adapter has a control token `<|adapter|>` that fires the switch
-   - KV hiding uses group-based control dimensions (K=finfo.min, Q=per-adapter policy)
-   - Control tokens are KV-hidden to prevent cross-request interference
-
-3. **Chat Template Integration**
-   - Maps adapter names to control tokens
-   - Automatic token placement based on adapter type (ALORA vs LORA)
-
-4. **Optional Trainable Router** (SingleSwitch)
-   - N transformer layers that compute adapter indices per-token
-   - Linear projection head to num_adapters dimensions
-   - ~1-2% of total model parameters
-
-### Two Backends
-
-#### HuggingFace Backend (`granite_switch.hf`)
-
-**Purpose**: Model building and optional router training
-
-- Full `transformers` integration (`PreTrainedModel`, `GenerationMixin`)
-- Training with `Trainer` API
-- Standard PyTorch operations
-
-#### vLLM Backend (`granite_switch.vllm`)
-
-**Purpose**: Fast production inference (10-20x speedup)
-
-- Punica kernels for optimized LoRA computation
-- PagedAttention for efficient KV cache
-- Continuous batching, tensor/pipeline parallelism
-- OpenAI-compatible API server
-
-### Weight Compatibility
-
-Both backends share the same weight format:
-
-```python
-# Built/trained with HuggingFace
-model_hf.save_pretrained("./checkpoint")
-
-# Loaded directly with vLLM
-llm = LLM(model="./checkpoint")
-```
-
 ## Key Configuration Parameters
 
-### Granite-Specific Parameters
-
 - **`attention_multiplier`**: Attention score scaling (instead of `1/sqrt(head_dim)`)
 - **`logits_scaling`**: Applied to final logits (main architectural difference with Llama)
 - **`residual_multiplier`**: Applied to residual connections
 - **`embedding_multiplier`**: Applied to input embeddings
 
-Always use config values - never hardcode these parameters.
-
-### Switch Configuration
-
-```json
-{
-  "model_type": "granite_switch",
-  "architectures": ["GraniteSwitchForCausalLM"],
-  "num_adapters": 4,
-  "adapter_token_ids": [100, 101, 102, 103],
-  "adapter_names": ["adapter_0", "adapter_1", "adapter_2", "adapter_3"],
-  "hiding_groups": {"all_controls": ["adapter_0", "adapter_1", "adapter_2", "adapter_3"]},
-  "hiding_policy": {"base": ["all_controls"], "adapter_0": ["all_controls"], "...": "..."},
-  "lora_rank": 8,
-  "lora_alpha": 8.0,
-  "switch_head_dim": 32,
-  "control_dims": 32
-}
-```
+Always use config values — never hardcode these parameters.
 
 ## Common Gotchas
 
@@ -335,8 +147,8 @@ Always load from config, never hardcode.
 ### 5. End-to-End Tests Must Use Compose Infrastructure
 
 No test should manually assemble `GraniteSwitchConfig` or call `transfer_base_weights`
-directly.  All model construction must go through `GraniteSwitchComposer` so that the
-compose pipeline itself is what's being tested.  If the composer can't handle a use case
+directly. All model construction must go through `GraniteSwitchComposer` so that the
+compose pipeline itself is what's being tested. If the composer can't handle a use case
 (e.g., zero-adapter skinning), extend the composer — don't work around it in tests.
 
 ### 6. HF Attention Backends and Causal Masking
@@ -372,6 +184,7 @@ skipped for this reason.
 
 ## Documentation
 
+- `docs/ARCHITECTURE.md` - Architecture overview (control tokens, backends, SingleSwitch)
 - `docs/GIT_WORKFLOW.md` - Git branching strategy and commit guidelines
 - `docs/SUPPORTED_MODELS.md` - Model compatibility
 
@@ -379,8 +192,6 @@ skipped for this reason.
 
 **See [docs/GIT_WORKFLOW.md](docs/GIT_WORKFLOW.md) for complete git workflow guidelines.**
 
-**Quick reference:**
-
 - **Branch naming**: `feature/ticket-ID-description` or `bugfix/ticket-ID-description`
 - **Workflow**: Branch from `main` → develop → rebase → PR → merge → delete branch
 - **Critical**: Always verify comments match code before committing (see GIT_WORKFLOW.md)
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
new file mode 100644
index 0000000..66401bb
--- /dev/null
+++ b/docs/ARCHITECTURE.md
@@ -0,0 +1,76 @@
+# Architecture
+
+## Granite Switch Model
+
+The Granite Switch extends the base Granite model with:
+
+### 1. Embedded LoRA Adapters (frozen during inference)
+
+Multiple task/domain-specific adapters are embedded in the same checkpoint. Each adapter has
+LoRA weights (`lora_A`, `lora_B`) stacked in tensors and is activated via special control tokens
+or router-selected indices.
+
+### 2. Control Tokens
+
+Each adapter has a control token `<|adapter|>` that fires the switch. KV hiding uses
+group-based control dimensions (`K` set to `finfo.min`, `Q` set by the per-adapter hiding policy). Control tokens are
+KV-hidden to prevent cross-request interference.
+
+### 3. Chat Template Integration
+
+The tokenizer chat template maps adapter names to control tokens and places them automatically
+based on adapter type:
+
+- **ALORA adapters**: token placed either in the user message (by matching the invocation
+  sequence) or right before the generation prompt
+- **LORA adapters**: token placed at the start of the sequence
+
+### 4. Optional Trainable Router (SingleSwitch)
+
+SingleSwitch is a single attention head that uses a one-hot dim-0 pattern to compute per-token
+adapter indices via an attention-based cumsum. It has no decoder layers and no projection head;
+its only parameter is a vocab-size lookup table, so its cost is negligible relative to the full model.
+
+---
+
+## Two Backends
+
+Both backends share the same checkpoint format: a checkpoint written with `save_pretrained` in the HF backend loads directly in vLLM.
+
+### HuggingFace Backend (`granite_switch.hf`)
+
+Full `transformers` integration (`PreTrainedModel`, `GenerationMixin`). Used for training and
+debugging. Uses fused QKV and gate-up projections, which changes floating-point reduction order
+relative to the upstream `GraniteMoeHybridForCausalLM` (see Common Gotchas #9 in `CLAUDE.md`).
+
+### vLLM Backend (`granite_switch.vllm`)
+
+Production inference backend (10-20x speedup). Uses Punica kernels for optimized LoRA
+computation, PagedAttention for efficient KV cache, and supports continuous batching and
+tensor/pipeline parallelism. Registered as a vLLM plugin via the `granite_switch.vllm` entry point.
+
+---
+
+## Key Configuration Fields
+
+These fields are specific to Granite Switch and not present in base Granite:
+
+| Field | Description |
+|---|---|
+| `num_adapters` | Number of embedded LoRA adapters |
+| `adapter_token_ids` | Token IDs for each adapter's control token |
+| `adapter_names` | Human-readable names for each adapter |
+| `hiding_groups` | Named groups of adapters for KV hiding |
+| `hiding_policy` | Per-adapter KV hiding rules |
+| `lora_rank` | LoRA rank (same for all adapters) |
+| `lora_alpha` | LoRA alpha scaling factor |
+| `control_dims` | Number of KV dimensions reserved for control |
+
+### Granite-Specific Parameters (inherited from base model)
+
+- **`attention_multiplier`**: Attention score scaling (replaces `1/sqrt(head_dim)`)
+- **`logits_scaling`**: Applied to final logits (main architectural difference with Llama)
+- **`residual_multiplier`**: Applied to residual connections
+- **`embedding_multiplier`**: Applied to input embeddings
+
+Always load these from config — never hardcode.
diff --git a/tutorials/CLAUDE.md b/tutorials/CLAUDE.md
new file mode 100644
index 0000000..7306b5c
--- /dev/null
+++ b/tutorials/CLAUDE.md
@@ -0,0 +1,40 @@
+# CLAUDE.md — tutorials/
+
+This file provides guidance when working on notebooks and guides in this directory.
+Claude loads it automatically when reading any file under `tutorials/`.
+
+## Notebook Cell Ordering
+
+Every notebook follows this cell order:
+
+1. `%pip install ...` — dependencies
+2. HF login cell (see below)
+3. Imports
+4. Configuration (model path, ports, constants)
+5. Long-running steps (corpus build, model load, vLLM launch)
+
+## HF Login Cell
+
+Every notebook that downloads gated HF models (`ibm-granite/`) must have a dedicated cell
+immediately after the `%pip install` cell:
+
+```python
+from huggingface_hub import notebook_login
+notebook_login()  # needed to pull ibm-granite models from the Hub
+```
+
+Use cell id `hf-login-call` for consistency.
+
+## Duration Comments
+
+Add a comment such as `# Estimated duration: ~2 min on A100, ~7 min on T4` to cells that download models or
+launch vLLM. Put these in **notebook cells only** — not in code files under `src/`.
+
+## Utility Modules
+
+These live in `src/granite_switch/tutorials/` and are imported by notebooks:
+
+- `vllm_server.py` — `launch_vllm()`, `wait_for_server()` (reads the vLLM log and prints
+  stage-based progress), `kill_stale_vllm_processes()`
+- `chroma_loader.py` — `load_or_build_chroma()`: builds corpus on GPU, frees GPU memory with
+  `torch.cuda.empty_cache()`, then switches to CPU for queries so vLLM can use the full GPU

From db0d2b504ccf9e0dae0bc811240924950a0f0fb9 Mon Sep 17 00:00:00 2001
From: Alon Freund <alonf@il.ibm.com>
Date: Wed, 27 May 2026 11:41:51 +0300
Subject: [PATCH 02/12] Add validate-links and tutorial-notebook skills

Installs two Claude Code skills in .claude/skills/:

- validate-links: scans all .ipynb/.md/.py files for broken local
  links, stale labels (link text names the wrong file), and broken
  first-party imports after renames or restructuring. Proposes fixes;
  never edits without user confirmation.

- tutorial-notebook: 15-item checklist + template for polishing or
  creating tutorial notebooks. Covers structure, correctness bugs,
  imports, comments, diagrams, demo coverage, and next-steps wiring.

References added:
- tutorials/CLAUDE.md: when to invoke each skill
- docs/GIT_WORKFLOW.md: run /validate-links before any PR that
  touches notebooks or docs

.gitignore updated to track .claude/skills/ while keeping local
Claude settings (settings.json, etc.) ignored.
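
The broken-local-link part of validate-links can be approximated
mechanically. The following is a minimal sketch, not the skill's
implementation:

- It covers Markdown files only.
- It assumes simple inline [text](target) links.
- It skips targets with a URL scheme and pure #anchors.
- It ignores link titles, reference-style links, .ipynb/.py files,
  and the stale-label and import checks.

find_broken_links is a hypothetical helper name.

```python
import re
from pathlib import Path

LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")  # inline [text](target) links only
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")  # http:, https:, mailto:, ...


def find_broken_links(root):
    """Return (markdown_file, target) pairs whose local target does not exist."""
    root = Path(root)
    broken = []
    for md in sorted(root.rglob("*.md")):
        for target in LINK_RE.findall(md.read_text(encoding="utf-8")):
            if SCHEME_RE.match(target) or target.startswith("#"):
                continue  # external URL or in-page anchor: out of scope here
            path = target.split("#", 1)[0]  # "guide.md#setup" -> "guide.md"
            if not (md.parent / path).exists():
                broken.append((md.relative_to(root).as_posix(), target))
    return broken
```

Each target is resolved relative to the linking file's directory.
This matches how GitHub resolves relative links in rendered Markdown.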
---
 .claude/skills/tutorial-notebook/SKILL.md | 319 ++++++++++++++++++
 .claude/skills/validate-links/SKILL.md    | 388 ++++++++++++++++++++++
 .gitignore                                |   4 +-
 docs/GIT_WORKFLOW.md                      |   6 +
 tutorials/CLAUDE.md                       |   8 +
 5 files changed, 724 insertions(+), 1 deletion(-)
 create mode 100644 .claude/skills/tutorial-notebook/SKILL.md
 create mode 100644 .claude/skills/validate-links/SKILL.md

diff --git a/.claude/skills/tutorial-notebook/SKILL.md b/.claude/skills/tutorial-notebook/SKILL.md
new file mode 100644
index 0000000..5734df6
--- /dev/null
+++ b/.claude/skills/tutorial-notebook/SKILL.md
@@ -0,0 +1,319 @@
+---
+name: tutorial-notebook
+description: Polish an existing Jupyter tutorial notebook — or scaffold a new one from scratch — so it teaches a first-time reader clearly. Enforces a standard template (title → metadata → intro → prerequisites → numbered sections → next steps), catches common bugs (broken imports, silent data loads, stale thresholds, dead code), and adds the load-bearing polish items that actually move the needle (diagrams, explanatory comments, adapter descriptions, link validation). Use when the user asks to "improve a notebook", "make a notebook perfect", "apply the template", or is creating a new tutorial and wants it to match their existing ones.
+---
+
+# Tutorial Notebook Skill
+
+This skill produces tutorial notebooks that a first-time reader can open, execute, and understand without needing to consult other docs. It was distilled from an end-to-end polish of `tutorials/notebooks/rag_flow.ipynb` and the lessons learned about which changes earned their place.
+
+## Core principle
+
+**Comments, cells, and sections earn their place by answering WHY, not what.** If removing a comment or splitting a cell wouldn't help a future reader make or avoid a decision, don't add it. Apply every checklist item below through that lens — don't mechanically tick boxes.
+
+## Two modes
+
+- **Polish mode** — user hands you an existing notebook. Read it end-to-end first; don't edit before you understand the whole narrative. Make changes one at a time so each can be verified.
+- **Scaffold mode** — user wants a new notebook from scratch. Start from the [template skeleton](#template-skeleton) below and fill in content. Still apply the full checklist before declaring done.
+
+In both modes, always check against the same rubric — that's what makes notebooks feel like siblings instead of cousins.
+
+## Interaction rhythm
+
+- **Never batch large rewrites.** Propose changes one at a time, each with a brief "why." Let the user say yes/no/adjust before moving on.
+- **Push back honestly when asked to do something that adds noise rather than clarity.** If a requested change would hurt the first-time reader (e.g., collapsing load-bearing reference material behind closed `<details>`, or wrapping unrelated functions in a namespace class), say so with reasoning — don't silently comply.
+- **Show diffs, not summaries.** After an edit, show the actual changed lines. "Summary of changes: …" is less trustworthy than the diff itself.
+- **Verify after each code edit.** `python3 -c "import ast; ast.parse(open(file).read())"` for Python files, `python3 -c "import json; json.load(open(nb))"` for notebooks. Also run `git diff --stat` to sanity-check scope.
+
+---
+
+## The checklist
+
+Work through this in order. Each item has: **what**, **why**, and **how to check**.
+
+### 1. Correctness and bugs (do these FIRST — everything else is polish)
+
+These are the ones that make a notebook fail to run for a first-time user.
+
+- **Broken import paths.** Imports that assume the repo is on `sys.path` (e.g., `from tutorials.scripts.xyz import ...` when there are no `__init__.py` files) raise `ImportError` for any user who opens the notebook cold. Fix with `import sys; sys.path.insert(0, "../scripts")` + plain import. *Check:* open the notebook's directory, look at the import path, and ask "does this actually work without my PYTHONPATH being set right?"
+
+- **Threshold / constant mismatches between logic and display.** A function that *decides* using threshold `0.5` while the *display helper* badges using `0.4` will produce contradictory output ("🟢 passed AND 🔴 blocked"). Grep for the same threshold across all cells; make sure they agree. This was a real bug in `rag_flow.ipynb`.
+
+- **Unused constants and dead config.** Constants defined in the config cell that aren't referenced anywhere mislead readers into thinking they matter. Delete them. If the function they were meant for returns a string verdict instead of a score, the "threshold constant" is nonsense — that was the `ANSWERABILITY_THRESHOLD` case.
+
+- **Stale comments and docstrings.** `# QC returns CLEAR when...` — what's QC? If a function name appears in a docstring, make sure it's the *current* name, not an internal abbreviation.
+
+### 2. Template structure
+
+Every tutorial notebook should follow this shape. Deviations need a reason.
+
+```
+# H1 Title                          ← cell 0 starts here
+Metadata line (Duration only — full Prerequisites section follows below)
+Intro paragraph (what it demonstrates, one or two sentences)
+Why this approach (if the choice isn't obvious — e.g., "Why vLLM:")
+What you'll learn (bullets — first bullet names the concrete deliverable as a learning outcome, remaining bullets are transferable conceptual takeaways)
+
+## Prerequisites                   ← still cell 0, or next cell
+1. Install
+2. Get artifacts (models, data)
+3. Start servers
+4. Verify
+Pointer to the softer-intro notebook and PREREQUISITES.md for depth
+
+---                                 ← visual break; diagram lives in its own cell
+Intrinsics / components used        ← if the tutorial exercises multiple adapters/tools
+Pipeline / architecture diagram     ← image attachment, on its own cell
+
+## 1 · <section name>               ← numbered H2s, numbered 1..N
+[one-line intro explaining the section's purpose]
+<code cell(s)>
+
+## 2 · <section name>
+...
+
+## N · Next steps                   ← terminal section; numbering is a style call
+- Adapt to your own app (point at the reusable function)
+- Related tutorials / how-tos
+- External references (library docs, model cards)
+```
+
+**Subsection rules (H3):** use sparingly, only when one H2 has multiple distinct helpers. In `rag_flow.ipynb`, §5 splits into `5a · Display helpers (printing only - not part of the pipeline)` because the display utilities are conceptually separate from the pipeline function above them. Don't force subsections for sections that have one concept.
+
+### 3. Intro cell (cell 0) — the highest-leverage surface
+
+A cold reader decides in 10 seconds whether the notebook is for them. That decision happens in cell 0.
+
+- **H1 title** matches the subject of the tutorial, not the repo.
+- **Metadata line** directly under the title: `**Duration:** ~X min (first run)`. The full `## Prerequisites` section is right below, so no need to link to it from the metadata line.
+- **Motivation paragraphs** should be two short paragraphs, not one 90-word wall. First paragraph: what this demonstrates. Second paragraph (optional, italicized): the constraint that explains *why this approach*. Example: `*Why vLLM:* the mellea intrinsics API currently supports vLLM only.`
+- **What you'll learn:** 3-5 bullets - one consolidated list, no separate "What you'll build" section. Lead with a bullet that names the concrete deliverable phrased as a learning outcome (e.g., `"How to build a 7-turn conversation that exercises every step of the pipeline"`), then follow with bullets about transferable conceptual takeaways. Bullets should not be a list of cells - `"how to call foo()"` is too mechanical; `"how to chain multiple intrinsics into one RAG pipeline"` is right.
+- **Adapters used callout:** directly after the "What you'll learn" bullets, add a one-line `**Adapters used:**` paragraph that names which adapter libraries (and specific intrinsics within them) the notebook exercises, each linked to its HuggingFace repo. Example: `**Adapters used:** intrinsics from the [Core](https://huggingface.co/ibm-granite/granitelib-core-r1.0) library (\`context-attribution\`, \`uncertainty\`) and the [Guardian](https://huggingface.co/ibm-granite/granitelib-guardian-r1.0) library (\`guardian-core\`).` This lets a reader skimming the intro instantly see whether the notebook touches the capability they care about, without reading the full body. Keep it to one sentence; list only adapters the notebook actually *invokes* (not ones mentioned in reference tables). If the notebook has no "What you'll learn" list (freeform intro), place the callout immediately before the Prerequisites section. Keep this list in sync with any top-level README's "where used in tutorials" column — mismatches surface fast.
+- **Prerequisites section:** numbered checklist with copy-pasteable commands. Every installation step, every "start this server" step, and a verification command (`curl ...`) the reader can run before moving on. Don't just link to `PREREQUISITES.md` - inline what they need, *then* link to the full doc for depth.
+
+### 4. Component/adapter introduction table
+
+If the tutorial uses more than 2–3 distinct libraries, adapters, or intrinsics, add a compact reference table near the top (right before the pipeline diagram). Two columns:
+
+| Component | Role |
+|-----------|------|
+| `foo.bar` | One-line description of what it does in *this* tutorial. |
+
+Not an API reference — just enough that a reader skimming the notebook knows what each name means before they hit it in code.
+
+### 5. Pipeline / architecture diagram
+
+A diagram is usually worth including. Default to adding one when the notebook executes a multi-step flow with branching, or when a conceptual illustration would help a reader form the right mental model before reading code. Skip a diagram only when the flow is trivially linear (e.g., a two-cell "load model, run inference" demo) and a picture would add nothing a section header doesn't already convey.
+
+**Two acceptable formats, both in use in this repo:**
+
+1. **Image cell attachment** - markdown cell containing `![image.png](attachment:image.png)` with the PNG embedded under that cell's `attachments` metadata. Used by `granite_switch_with_hf.ipynb`. Renders everywhere (GitHub, nbviewer, JupyterLab, VS Code).
+2. **Mermaid rendered from a code cell** - a Python cell that defines a Mermaid source string and renders it (e.g., via `IPython.display`). Used by `rag_flow.ipynb`. Easier to keep in sync with code labels and edit in-place.
+
+Pick whichever fits the diagram and the notebook. Don't mix both styles within one notebook.
+
+**The skill cannot generate or attach a PNG itself.** When the chosen format is an image attachment, describe to the user exactly what the diagram should show (the steps, branches, terminal states, labels you'd want), leave a placeholder markdown cell with a TODO comment to reserve the slot, and ask the user to produce and attach the image. For Mermaid, the skill *can* author the source directly.
+
+**Diagram content rules** (apply when describing what the image should show):
+- **Include every early-exit branch**, not just the happy path. A reader scanning the diagram should see all possible terminal states (e.g., `BLOCKED`, `UNANSWERABLE`, `DONE`).
+- **Match node labels to code.** If the display helper calls steps `[1a]`, `[1b]`, `[2]`... the diagram nodes should use the same tags. Shared vocabulary is the point.
+- **Match terminal emoji to the code's print output.** If `show_answer` prints ⛔ for blocks and 🔍 for unanswerable, the diagram's terminal nodes use the same glyphs. This makes the diagram a legend for the runtime output.
+- **Keep the diagram on its own cell.** Cell 0 has enough to carry without a diagram inside it.
+
+### 6. Code cells — structure
+
+- **Keep one concept per code cell within each `## N · ...` section.** If a cell runs past 80 lines doing two distinct things (e.g., `run_pipeline` + display helpers), split it under a markdown subsection header (`### Na · ...`). Each subsection needs a short intro markdown cell describing what the code does.
+
+- **Extract helpers when extracting *gains* clarity.** The 7-line `ChatContext` build logic at the top of `run_pipeline` was pure bootstrapping; lifting it into `_build_context(history, query)` let the main function read as a clean 7-step sequence. Extract when it tightens; don't extract for the sake of extraction.
+
+- **Don't extract "for namespacing."** A `Show` class containing three unrelated static methods adds ceremony without structure. Python's namespacing tool is the module/cell, not the class.
+
+- **Shell-out lines — `%pip` for installs, `!` for everything else.** Installer lines (`pip install`, `pip uninstall`, `conda install`) use the Jupyter line magic: `%pip install -q -e "/content/granite-switch[vllm]"`. Every other shell-out — `!git clone ...`, `!python -m ...`, `!python script.py`, `!huggingface-cli ...`, `!curl ...`, `!ls`, `!head`, etc. — keeps the `!` prefix. **Why:** `%pip` is a Jupyter line magic that always installs into the kernel running the notebook; `!pip` shells out to whatever `pip` is on PATH, which in Colab and managed-Jupyter environments often differs from the kernel's interpreter and produces the classic "installed fine, still ImportError" failure for the reader. Only `%pip`/`%conda` get this treatment — the others are not line magics and `%`-prefixing them would either no-op or error.
+
+  ```
+  %pip install -q -e "/content/granite-switch[vllm]"   # good — targets the running kernel
+  !pip install -q -e "/content/granite-switch[vllm]"   # bad — may install into the wrong interpreter
+  !git clone https://github.com/...                    # good — git is not a line magic
+  !python -m granite_switch.composer.compose_granite_switch ...   # good — same reason
+  ```
+
+### 7. Imports
+
+**Consolidate imports into a single cell near the top of the notebook, not scattered across cells.** One dedicated imports cell (typically right after the config cell, or merged with it if config is small) makes dependencies visible at a glance and matches standard Python/Jupyter convention. Readers scanning the notebook can see the full set of external dependencies in one place instead of discovering them cell-by-cell.
+
+**Placement:**
+- Put the imports cell early — after the intro/prerequisites markdown, before the first substantive code section.
+- If the config cell is small (a few constants), imports can live with it; otherwise keep imports in their own cell so neither drowns the other.
+- Group imports conventionally: stdlib first, third-party next, local/project last, with blank lines between groups.
+
+**Narrow exceptions — keep an import local to its cell only when:**
+- The import has a heavy side effect at import time (registers a plugin, mutates global state) and the reader needs to see it happen at that point in the narrative.
+- The import is genuinely optional/conditional (inside a `try/except` or guarded by a feature flag).
+
+**Never** write `# MelleaDocument is used later in §4` above an import as a workaround for scattered placement — consolidate instead. Pointer comments are a smell.
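+
+A rough way to enforce this rule is the hypothetical `scattered_import_cells` helper sketched below. It reads the notebook JSON and flags every code cell, after the first importing one, that has a module-level `import`:
+
+- Conditional imports pass automatically, because imports nested in `try/except` sit inside an `ast.Try` node.
+- Side-effect imports cannot be detected statically, so review any flagged cell before moving its import.
+
+```python
+import ast
+
+
+def scattered_import_cells(nb: dict) -> list[int]:
+    """Indices of code cells with module-level imports, excluding the first such cell."""
+    importing = []
+    for i, cell in enumerate(nb["cells"]):
+        if cell["cell_type"] != "code":
+            continue
+        src = cell["source"]
+        src = "".join(src) if isinstance(src, list) else src
+        # ast.parse rejects %magic and !shell lines, so blank them first.
+        lines = ["" if ln.lstrip().startswith(("%", "!")) else ln for ln in src.splitlines()]
+        try:
+            tree = ast.parse("\n".join(lines))
+        except SyntaxError:
+            continue  # e.g. a %%bash cell: not Python, so nothing to check
+        # Only direct children of the module count; try/except-guarded imports sit inside ast.Try.
+        if any(isinstance(node, (ast.Import, ast.ImportFrom)) for node in tree.body):
+            importing.append(i)
+    return importing[1:]
+```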
+
+### 8. Comments — earn or delete
+
+Default to writing no comments. Add one only when:
+
+- **The value is non-obvious.** `TOP_K = 20` deserves "balances recall against context budget; mt-rag-benchmark default." `VLLM_PORT = 8000` does not deserve a comment.
+- **The ordering matters and a refactor could break it.** `# Harm check must run BEFORE scope check so harmful+out-of-scope queries are labeled harmful, not merely out-of-scope.` Without this, someone will swap them for "fail fast" and introduce a silent regression.
+- **The value is a knob readers will tune.** `temperature=0.0` deserves "grounded RAG — we want the model to repeat the docs, not paraphrase. Also makes demos reproducible." Someone will bump it to 0.7 and break grounding.
+
+**Do not** write comments that restate what the code says (`# retrieve top-K documents` above `retrieve_top_k_documents(...)`). Delete them.
+
+### 9. Reference tables — every row parallel
+
+Reference tables (like "what `show_intermediates` displays at each step") must have every row in the same shape. If row 1 says `"badge + raw score. Exits early if ≥ 0.5"`, row 4 should say `"badge + verdict string. Exits early if unanswerable"` — same structure, same verb, same vocabulary. Asymmetric rows look like bugs to a reader.
+
+Also: the text in reference tables must match what the code prints. If the code renders `🟢 safe / 🔴 harmful`, the table says `🟢 safe / 🔴 harmful`, not `safe / flagged`.
+
+### 10. Display rendering
+
+- **Use `display(Markdown(...))`** for rich output in notebooks. Don't use ANSI color codes (`\x1b[32m...`) — they render in terminal only and look like garbage in rendered notebooks or exported HTML.
+- **For collapsible detail** (large outputs, reference tables that take vertical space), use `<details open><summary>...</summary>...</details>`. Default to `<details open>` for load-bearing content (users can collapse it); use closed `<details>` only for truly optional depth.
+- **Standard emoji glyphs** for status — keep them consistent across the notebook series: ⛔ block/refuse · 🔍 empty/unanswerable · ❓ clarification needed · ✅ pass/done · 🟢/🔴 safe/danger binary · 📄 document · 📚 collection · 🔖 citation/reference.
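+
+A small helper along these lines keeps the `<details>` markup consistent; `details_block` and `show_details` are hypothetical names. The blank lines around the body matter in CommonMark-style renderers. Without them, the body is swallowed into the raw HTML block, and Markdown inside it (tables, bold, links) does not render.
+
+```python
+def details_block(summary: str, body_md: str, open_: bool = True) -> str:
+    """Build a collapsible <details> block whose body still renders as Markdown."""
+    attr = " open" if open_ else ""
+    # Blank lines after </summary> and before </details> end the HTML block so the body is parsed as Markdown.
+    return f"<details{attr}><summary>{summary}</summary>\n\n{body_md}\n\n</details>"
+
+
+def show_details(summary: str, body_md: str, open_: bool = True) -> None:
+    """Display the block in a notebook; IPython is imported lazily."""
+    from IPython.display import Markdown, display
+
+    display(Markdown(details_block(summary, body_md, open_)))
+```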
+
+### 11. Helper scripts (supporting `.py` files)
+
+If the tutorial loads data or does heavy setup in a sibling script:
+
+- **Progress feedback for any operation > 5 seconds.** `tqdm` for downloads (use `httpx.stream()` with `Content-Length`), `tqdm` for batch processing, progress prints for shorter waits. Silent multi-minute operations make users think the notebook froze.
+- **Atomic writes for persistent state.** When writing a file the notebook will later re-read (e.g., extracted jsonl), write to `path.tmp` first, then `os.replace(tmp, path)`. A Ctrl-C mid-write produces a truncated file that silently breaks subsequent runs — one of the worst classes of bug to debug.
+- **Validate non-empty output loudly.** After parsing/loading, if the result has zero rows, raise `RuntimeError` with actionable guidance (`"Delete X and rerun"`), not a silent empty return.
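+- **Split timeouts.** Use `httpx.Timeout(120.0, connect=10.0)` instead of a flat `timeout=120`. The first argument sets the read/write/pool timeouts, and `connect=` overrides only the connection phase. The request fails fast on unreachable servers and stays patient on slow transfers.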
+- **Escalate GPU/CPU warnings.** `print("Notice: ...")` gets lost in notebook output. Use `warnings.warn(...)` with a concrete time estimate ("~10 min on GPU vs. hours on CPU") so users can abort before committing.
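+
+The atomic-write and non-empty rules combine into a pair of stdlib-only helpers, sketched below for a jsonl cache. The function names and the error message are illustrative. `os.replace` swaps the finished temp file into place in one step, which is atomic on POSIX when both paths share a filesystem. An interrupted run therefore leaves either the old file or no file, never a half-written one.
+
+```python
+import json
+import os
+from pathlib import Path
+
+
+def write_jsonl_atomic(path: Path, rows: list[dict]) -> None:
+    """Write rows to path so a Ctrl-C mid-write never leaves a truncated file behind."""
+    tmp = path.with_name(path.name + ".tmp")  # same directory, so same filesystem
+    with tmp.open("w", encoding="utf-8") as f:
+        for row in rows:
+            f.write(json.dumps(row) + "\n")
+    os.replace(tmp, path)  # swap in only after the write has fully completed
+
+
+def load_jsonl_nonempty(path: Path) -> list[dict]:
+    """Load rows, failing loudly with recovery guidance instead of returning []."""
+    with path.open(encoding="utf-8") as f:
+        rows = [json.loads(line) for line in f if line.strip()]
+    if not rows:
+        raise RuntimeError(f"{path} contains no rows. Delete {path} and rerun the setup cell.")
+    return rows
+```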
+
+### 12. Queries / demos — design intentionally
+
+If the notebook ends with runnable demo cells, the demo set should *tour every exit path* in the system. For a pipeline with {happy path, ambiguous, unanswerable, out-of-scope, harmful} outcomes, include one demo of each. A demo that only shows the happy path teaches half the system.
+
+Add one-line intent comments per demo: `# Q3 — resolves clarification: query rewrite uses history to reconstruct full question`. These teach what each demo *is testing*, beyond what the query text alone conveys.
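+
+One way to keep the demo set honest is to tag each demo with the exit it is meant to reach and check coverage, as in this sketch. The exit names, `DEMOS` layout, and `<...>` query placeholders are hypothetical; substitute whatever your pipeline actually returns.
+
+```python
+# Hypothetical exit names: replace with the outcomes your pipeline returns.
+EXIT_PATHS = {"answered", "ambiguous", "unanswerable", "out_of_scope", "harmful"}
+
+# (query, exit it should reach, one-line intent comment)
+DEMOS = [
+    ("<question the corpus answers>", "answered", "Q1: happy path, answer grounded in retrieved docs"),
+    ("<follow-up with a dangling pronoun>", "ambiguous", "Q2: no usable history, so the pipeline asks to clarify"),
+    ("<harmful request>", "harmful", "Q3: harm check blocks before retrieval runs"),
+]
+
+
+def exit_coverage(demos, exit_paths):
+    """Return (missing, unknown): exits no demo reaches, and demo tags that are not real exits."""
+    tagged = {expected for _, expected, _ in demos}
+    return sorted(exit_paths - tagged), sorted(tagged - exit_paths)
+```
+
+With this deliberately incomplete demo list, `exit_coverage(DEMOS, EXIT_PATHS)` reports `out_of_scope` and `unanswerable` as missing. That is exactly the half-tour the rule above warns about.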
+
+### 13. Next steps section
+
+Close every notebook with 3–5 bullets pointing the reader somewhere concrete:
+
+1. **Adapt-to-your-app pointer:** name the reusable function/class and remind the reader it's lift-able. `run_pipeline(query, history)` is stateless — copy it as a starting point.
+2. **Go deeper on this topic:** related how-to or tutorial in the repo.
+3. **Extend with custom content:** how to bring your own adapter / corpus / model.
+4. **Library deep-dive:** link to the framework's main repo/docs.
+5. **Browse alternatives:** catalog of other adapters/models the reader could try.
+
+Two bullets that both say "go compose your own model" is one bullet wasted — make sure every bullet opens a *distinct* next direction.
+
+**Inter-notebook wiring rule:**
+
+The granite-switch tutorial set uses descriptive filenames (no numeric prefixes), so wiring is judged by *content*, not by index. Two principles for the next-steps bullets:
+
+1. **The producer is reachable from every consumer.** `compose_granite_switch.ipynb` produces the checkpoint that every other notebook consumes. Every notebook except the producer itself should include a "compose your own checkpoint" bullet pointing to it. A reader who lands on any consumer should be one click from the producer - they shouldn't have to discover it by reading every sibling.
+2. **Don't link backward to softer-intro notebooks.** If a notebook is a deeper or harder version of another (e.g., `granite_switch_with_hf.ipynb` is the long-form version of `hello_adapter.ipynb`; `rag_flow.ipynb` is the long-form of `rag_101.ipynb`), the long-form should *not* link back to its softer sibling - the reader already passed it. The softer notebook *can* link forward to the long-form.
+
+Every notebook should also link to whatever logical follow-ups exist for the reader (the next pipeline to try, the comparison/race demo, the framework's main repo). Three to five bullets is the right shape; the general rules at the top of this section still apply.
+
+**Use same-directory relative paths** (`./name.ipynb`) when all notebooks live in one folder - not `../notebooks/name.ipynb`. After editing, run a link-resolution check to catch typos and stale filenames:
+
+```python
+import json, re, pathlib
+nbdir = pathlib.Path("tutorials/notebooks")
+for nb_path in sorted(nbdir.glob("*.ipynb")):
+    nb = json.loads(nb_path.read_text())
+    for c in nb["cells"]:
+        if c["cell_type"] != "markdown": continue
+        src = "".join(c["source"]) if isinstance(c["source"], list) else c["source"]
+        if "Next steps" not in src: continue
+        for href in re.findall(r"\]\((\./[^)]+\.ipynb)\)", src):
+            assert (nb_path.parent / href).resolve().exists(), f"{nb_path.name}: broken {href}"
+print("all next-steps links resolve")
+```
+
+When notebooks get renamed or split, the next-steps sections of *every other notebook in the series* go stale silently. Always re-run the link check after any rename. The repo also has a `validate-links` skill that runs this check across notebooks and markdown together - prefer it for cross-cutting validation.
+
+### 14. Links
+
+- **Every external link:** verify with `curl -s -o /dev/null -w "%{http_code}" <url>`. Expect a 2xx or 3xx status (typically 200, 301, or 302).
+- **Every internal link:** verify the file exists on disk. Relative paths should work from the notebook's directory (notebooks live in `tutorials/notebooks/`, so `../PREREQUISITES.md` resolves to `tutorials/PREREQUISITES.md`).
+- **Anchor links:** GitHub forms anchors by lowercasing heading text, dropping punctuation, and replacing spaces with hyphens, so `## Prerequisites` produces `#prerequisites`, not `#Prerequisites`. Check every in-notebook anchor.
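+
+To check anchors without opening a browser, an approximation of GitHub's slug rule covers most headings. The hypothetical `github_anchor` helper below lowercases the heading, drops punctuation and emoji, and turns each space into a hyphen. JupyterLab and other renderers build anchors differently (for example, they may preserve case), so treat this as a GitHub-specific check.
+
+```python
+import re
+
+
+def github_anchor(heading: str) -> str:
+    """Approximate GitHub's heading-to-anchor rule; other renderers may differ."""
+    text = heading.strip().lstrip("#").strip().lower()
+    # Keep letters, digits, underscores, hyphens, and spaces; drop punctuation and emoji.
+    text = re.sub(r"[^\w\- ]", "", text)
+    return "#" + text.replace(" ", "-")
+```
+
+Under this rule, the `·` separator in numbered headings disappears but both surrounding spaces survive, so `## 1 · Setup` becomes `#1--setup`.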
+
+### 15. Prose — light touch
+
+For prose-clarity passes: fix stale references (section counts that have changed, helper function signatures that have changed), tighten walls of text (split 90-word single paragraphs into two), and make sure section intros say *what* and *why*, not just the section name restated. Don't rewrite prose that's already clear — every unnecessary change is a chance to introduce a regression.
+
+---
+
+## Template skeleton (scaffold mode)
+
+For a brand-new notebook, start from this and fill it in. Delete sections that truly don't apply (e.g., no diagram if the flow is linear and has one step).
+
+```markdown
+# <Title — what the notebook accomplishes>
+
+**Duration:** ~N min (first run)
+
+This notebook demonstrates <one-sentence concrete pitch>. <One more sentence on scope.>
+
+*Why <key choice>:* <one-line constraint explanation, if the choice isn't self-evident>
+
+**What you'll learn:**
+- How to build <concrete deliverable - "a 7-turn conversation that exercises every step", "a composed model checkpoint with two adapters", etc.>
+- <Transferable takeaway 1>
+- <Transferable takeaway 2>
+
+**Adapters used:** intrinsics from the [<Library>](<hf-url>) library (`<adapter_1>`, `<adapter_2>`)<, and the [<Library>](<hf-url>) library (`<adapter_3>`)>.
+
+## Prerequisites
+
+1. **Install dependencies** (<GPU? CPU? which>):
+   ```bash
+   pip install "<extras>"
+   ```
+2. **Get <artifact>.** <How to obtain it, pointer to a ready-made option, pointer to a "compose your own" tutorial.>
+3. **Start <service>** (if applicable):
+   ```bash
+   <start command>
+   ```
+4. **Verify:** `<verification command>`
+
+<Pointer to softer-intro notebook if one exists, pointer to PREREQUISITES.md for depth.>
+```
+
+Then a second markdown cell with (if applicable) an intrinsics/components table and the diagram (an image attachment, or a Mermaid code cell; see section 5). Then numbered `## 1 · Section`, `## 2 · Section`, etc., each with a one-line intro markdown cell before its code. End with `## N · Next steps`.
+
+---
+
+## When working on an existing notebook
+
+1. **Read the whole thing first.** Don't edit until you understand the arc: what it teaches, what the demos tour, what the reader is expected to walk away with.
+2. **Identify the real bugs before the polish.** Run through section 1 of the checklist. A broken import is worth ten prose tweaks.
+3. **Propose changes one at a time.** Each should be justifiable in one sentence. If you can't justify it, don't do it.
+4. **After each change:** verify JSON validity, syntax, and that the diff scope matches what was planned. Show the diff.
+5. **When multiple tasks conflict:** defer to the principle that serves the first-time reader. For example, a reader scanning a notebook benefits from one consolidated imports cell (full dependency list visible at a glance) more than from imports scattered next to their use sites.
+
+---
+
+## Universal anti-patterns — push back if the user asks for these
+
+- **Classes that are just namespaces.** `class Show: @staticmethod def answer(r): ...` adds ceremony. Use functions in cells.
+- **Scattering imports across cells near their first use.** Hurts scan-ability; readers lose the single-glance view of what the notebook depends on. Consolidate into one imports cell near the top.
+- **Collapsing load-bearing reference material behind closed `<details>`.** `<details open>` is fine; closed is only for optional depth.
+- **Splitting "What you'll build" out as its own section.** Use one consolidated "What you'll learn" list; if the concrete deliverable is load-bearing, make it the first bullet phrased as a learning outcome (`"How to build <deliverable>"`) rather than a separate header.
+- **Renaming a section just because another notebook uses a different name.** Template consistency matters, but forcing "Prerequisites" when the existing "Before you start" reads better locally is cargo-cult.
+- **Numbering every heading mechanically.** "Next steps" as "`## 6 · Next steps`" vs. unnumbered `## Next steps` is a style call, not a correctness one. Only enforce when the user has said they want strict numbering.
+
+---
+
+## Verification checklist (before declaring done)
+
+- [ ] Notebook is valid JSON: `python3 -c "import json; json.load(open(PATH))"`.
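+- [ ] Every code cell parses: walk the cells, blank out IPython magic and shell lines (`%...`, `!...`), skip `%%` cell magics, then run `ast.parse(source)` on each one. Plain `ast.parse` rejects `%pip` and `!git clone` lines outright.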
+- [ ] Structural overview (`for i, c in enumerate(nb['cells']): print(i, c['cell_type'], first_line)`) shows one H1, numbered H2s, H3s only under H2s that have them, code-cell length sensible (no >120-line monsters unless justified).
+- [ ] All external URLs return 2xx/3xx.
+- [ ] All internal links point at files/anchors that exist.
+- [ ] Reference tables' badge glyphs match the code's actual print statements.
+- [ ] Diagram terminals match the code's actual exit names (`blocked`, `unanswerable`, etc.).
+- [ ] Intro cell has an **Adapters used:** callout naming every adapter library the notebook actually invokes, each linked to its HuggingFace repo.
+- [ ] No `!pip install` / `!pip uninstall` / `!conda install` lines anywhere — installer lines use `%pip` / `%conda` so they target the running kernel.
+- [ ] Imports are consolidated into one cell near the top, not scattered across cells next to their first use (narrow exceptions: side-effectful or conditional imports — see section 7).
+- [ ] Running the notebook top-to-bottom with "Run All" should complete cleanly (requires the runtime environment — document this as a manual step for the user).
+
+If any of these fails, fix before handing off. The skill is "produce a notebook that runs cleanly on first try for a cold reader" — missing verification undermines the whole exercise.
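+
+The JSON, parse, and installer checks can be scripted together, as in this sketch; `check_notebook` is a hypothetical helper. It blanks `%` and `!` lines rather than parsing them, and it skips `%%` cell magics whose body may not be Python. If IPython is installed, `IPython.core.inputtransformer2.TransformerManager().transform_cell(src)` is the more thorough option. It converts all IPython syntax into plain Python, including `x = !ls` assignments.
+
+```python
+import ast
+import json
+import re
+from pathlib import Path
+
+# `!pip install`, `!pip3 uninstall`, `!conda remove`, ...: these shell out to whatever is on PATH.
+SHELL_INSTALLER = re.compile(r"^\s*!\s*(pip3?|conda)\s+(install|uninstall|remove)\b")
+
+
+def check_notebook(path: Path) -> list[str]:
+    """Return checklist problems for one notebook (json.loads raises on invalid JSON)."""
+    nb = json.loads(path.read_text(encoding="utf-8"))
+    problems = []
+    for i, cell in enumerate(nb["cells"]):
+        if cell["cell_type"] != "code":
+            continue
+        src = cell["source"]
+        lines = ("".join(src) if isinstance(src, list) else src).splitlines()
+        if lines and lines[0].lstrip().startswith("%%"):
+            continue  # cell magic such as %%bash: the body may not be Python
+        for ln in lines:
+            if SHELL_INSTALLER.match(ln):
+                problems.append(f"cell {i}: use %pip/%conda instead of {ln.strip()!r}")
+        # ast.parse rejects %magic and !shell lines, so blank them before parsing.
+        py = "\n".join("" if ln.lstrip().startswith(("%", "!")) else ln for ln in lines)
+        try:
+            ast.parse(py)
+        except SyntaxError as exc:
+            problems.append(f"cell {i}: SyntaxError on line {exc.lineno}: {exc.msg}")
+    return problems
+```
+
+To run it over the whole series: `for p in sorted(Path("tutorials/notebooks").glob("*.ipynb")): print(p.name, check_notebook(p))`.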
diff --git a/.claude/skills/validate-links/SKILL.md b/.claude/skills/validate-links/SKILL.md
new file mode 100644
index 0000000..c303b3c
--- /dev/null
+++ b/.claude/skills/validate-links/SKILL.md
@@ -0,0 +1,388 @@
+---
+name: validate-links
+description: Validate local file links AND first-party Python imports across an entire repo (notebooks and markdown) and propose fixes for broken targets. Catches the kind of breakage that happens after renames, renumbering, or directory moves -- `[foo](./old_name.ipynb)` style links that silently 404 from GitHub/Colab/nbviewer, plus `from pkg.old_module import ...` imports that fail at notebook runtime. Use when the user asks to "validate links", "check links", "audit links", "verify links", "find broken links", "validate imports", or "make sure filenames align" after any restructuring.
+---
+
+# Link Validation Skill
+
+Find every local link in the repo's `.ipynb` and `.md` files, flag the ones whose targets don't exist on disk, and propose fixes. Also validate first-party Python imports inside notebook code cells and `.py` files -- a `from granite_switch.tutorials.rag_display import ask` line silently breaks the same way a stale markdown link does when a module moves or gets renamed. Read-only by default; fixes happen only after the user confirms.
+
+## What counts as a "local link"
+
+- Markdown link syntax `[text](target)` where `target` does **not** start with `http://` or `https://`.
+- Targets that point at a file with extension `.ipynb`, `.md`, `.py`, `.png`, `.jpg`, `.jpeg`, `.svg`, `.json`, or `.sh` (extend the list if the repo uses others; ask if unsure).
+- Inside `.ipynb`: only **markdown cells** (`cell_type == "markdown"`). Code cells are skipped — strings inside Python aren't links.
+- `attachment:` references (notebook-embedded images) are **not** local file links — skip them.
+
+## What counts as a "stale label"
+
+Display labels that look like a filename — `[`old_name.ipynb`](new_name.ipynb)`, `[old_name.ipynb](new_name.ipynb)`, or `[Title (old_name.ipynb)](new_name.ipynb)` — but where the filename in the label doesn't match the URL. After a rename, the URL often gets fixed while the label keeps the old name and silently lies about what the link points at. Treat these as fixable in the same pass as broken targets.
+
+A label is considered "filename-shaped" when it contains a token that ends in one of the tracked extensions (`.ipynb`, `.md`, `.py`, ...). Plain prose labels like `"the simple pipeline"` are not stale even if the URL changes -- only fix labels that purport to name the file.
+
+## What counts as a "broken import"
+
+A first-party Python import is **broken** when its dotted module path does not resolve to a file or package on disk under the repo's package roots. Example: `from granite_switch.tutorials.rag_display import ask` is valid only when `<root>/granite_switch/tutorials/rag_display.py` (or `<root>/granite_switch/tutorials/rag_display/__init__.py`) exists, where `<root>` is one of the configured package roots (typically `src/` for src-layout repos, or `.` otherwise).
+
+Scope:
+
+- Only **first-party** packages count. Determine the set of first-party top-level package names by listing the immediate child directories of each package root that contain an `__init__.py` (e.g., `src/granite_switch/` -> first-party name `granite_switch`). Imports whose top-level name is not in this set (`numpy`, `torch`, `os`, `json`, ...) are skipped -- this is not a substitute for a real linter.
+- Both forms are checked: `from A.B.C import name` and `import A.B.C [as alias]`.
+- Inside `.ipynb`: only **code cells** (`cell_type == "code"`). Skip cells whose first non-empty source line is a Jupyter magic (`%`, `%%`, `!`).
+- Inside `.py`: parse with `ast.parse`; this naturally ignores strings and comments and handles multi-line imports, parenthesized import lists, and relative imports. Relative imports (`from .foo import x`) are resolved against the importing file's package and are checked the same way.
+- A dotted path resolves if walking it from a package root lands on a directory with `__init__.py` at every intermediate step and a `.py` file or package directory at the leaf. The imported names themselves (`ask`, `show_answer`) are **not** verified -- that needs real import-time analysis.
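+
+The resolution rule above fits in a few lines. `module_path_resolves` is a hypothetical helper name, and `roots` are the package roots found as described below (for example `[Path("src")]`). It checks only that the files exist, not that the imported names are defined.
+
+```python
+from pathlib import Path
+
+
+def module_path_resolves(dotted: str, roots: list[Path]) -> bool:
+    """True if `a.b.c` maps to a package chain under any root (leaf: c.py or c/__init__.py)."""
+    parts = dotted.split(".")
+    for root in roots:
+        node = root
+        # Every intermediate segment must be a regular package (a directory with __init__.py).
+        for part in parts[:-1]:
+            node = node / part
+            if not (node / "__init__.py").is_file():
+                break
+        else:
+            leaf = node / parts[-1]
+            if leaf.with_suffix(".py").is_file() or (leaf / "__init__.py").is_file():
+                return True
+    return False
+```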
+
+Discovering package roots:
+
+1. If `pyproject.toml` has `[tool.setuptools.packages.find] where = ["src"]` (or `[tool.hatch.build.targets.wheel] packages = ["src/foo"]`, or `[tool.poetry] packages = [{ include = "foo", from = "src" }]`), use those.
+2. Otherwise default to `.` and `src/` if either contains a top-level dir with `__init__.py`.
+3. If the user's repo uses a layout the heuristic misses, ask before guessing.
+
+## Workflow
+
+### 1. Discover
+
+Run from the repo root:
+
+- List all `.ipynb`, `.md`, and `.py` files (respect `.gitignore` -- use `git ls-files '*.ipynb' '*.md' '*.py'` so you don't audit vendored copies in `node_modules/`, `.venv/`, etc.).
+- Build a set of every existing file path in the repo (`git ls-files`) -- this is what link targets are checked against.
+- Determine the import package roots and first-party package names per "What counts as a broken import" above. If `pyproject.toml` is missing or unreadable, fall back to `.` + `src/` and report which roots/packages were used so the user can correct the assumption.
+
+### 2. Scan
+
+For each file:
+
+- For `.md` -- read the raw text and run the **link** scan only.
+- For `.ipynb` -- parse JSON, iterate `cells`. Run the **link** scan against `markdown` cells (join `source` to a string). Run the **import** scan against `code` cells (skip cells whose first non-empty line is a `%`, `%%`, or `!` magic; otherwise concatenate `source` and feed to the import scanner).
+- For `.py` -- run the **import** scan only (parse with `ast.parse`; record the line number from the AST node for reporting).
+
+**Link scan:**
+
+- Run the link regex `\[([^\]]+)\]\(([^)]+)\)` against the text.
+- For each match, take the target, drop any `#anchor` fragment, resolve the path **relative to the file's directory** (so `../foo.md` from `tutorials/notebooks/x.ipynb` resolves to `tutorials/foo.md`).
+- A target is **broken** if the resolved path doesn't exist on disk.
+- Independently, flag the link as having a **stale label** when:
+  - the label contains a filename-shaped token (ends in a tracked extension), AND
+  - that token is not equal to `Path(target).name` (the basename of the URL, after stripping any `#anchor`).
+
+  Stale labels are reported even when the target itself resolves cleanly -- the URL works, but the label lies about what it points at.
+
+**Import scan:**
+
+- For notebook code cells, parse the joined source with `ast.parse`. Wrap in `try/except SyntaxError` and skip cells that fail to parse (rare, usually transient half-edited cells); note the cell index in the skip log so the user can investigate.
+- For `.py` files, parse the whole file the same way.
+- Walk `ast.Import` and `ast.ImportFrom` nodes. For each, build the dotted module path:
+  - `import a.b.c` -> `a.b.c` per alias.
+  - `from a.b import c, d` -> check `a.b` resolves; the imported names are not verified, but if `a.b.c` *also* resolves as a submodule path, prefer that interpretation when reporting (it makes the suggestion more specific).
+  - `from . import x` and `from .. import x` -> compute the absolute package by walking up from the file's package, then check that resolves.
+- Filter to first-party top-level names. Skip everything else.
+- A first-party dotted path is **broken** when no package root contains a matching directory-or-file chain (intermediate dirs need `__init__.py`; leaf can be either `<name>.py` or `<name>/__init__.py`).
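+
+A sketch of the AST walk described above follows. It turns every `import` statement into an absolute dotted path; `imported_modules` is a hypothetical name. Relative imports are resolved with the standard library's `importlib.util.resolve_name`, which needs the importing file's package (for `granite_switch/composer/foo.py`, that is `granite_switch.composer`). The first-party filter and the on-disk check remain separate steps.
+
+```python
+import ast
+import importlib.util
+
+
+def imported_modules(source: str, package: str = "") -> list[tuple[int, str]]:
+    """Return sorted (lineno, dotted_module) pairs for every import in `source`.
+
+    `package` is the importing file's package; relative imports are skipped when it is empty.
+    """
+    found = []
+    for node in ast.walk(ast.parse(source)):
+        if isinstance(node, ast.Import):
+            # `import a.b.c as z, os` -> one entry per alias
+            found.extend((node.lineno, alias.name) for alias in node.names)
+        elif isinstance(node, ast.ImportFrom):
+            if node.level == 0:
+                found.append((node.lineno, node.module))
+            elif package:
+                # `from ..x import y` -> "..x"; `from . import y` -> "."
+                rel = "." * node.level + (node.module or "")
+                try:
+                    found.append((node.lineno, importlib.util.resolve_name(rel, package)))
+                except (ImportError, ValueError):
+                    # Climbs above the top-level package: keep the raw form so it reports as broken.
+                    found.append((node.lineno, rel))
+    return sorted(found)
+```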
+
+### 3. Report
+
+Present a single report grouped by source file, in this shape:
+
+```
+BROKEN LINKS
+
+tutorials/notebooks/00_hello_adapter.ipynb (cell 0)
+  ./hello_mellea.ipynb                          -> closest match: 01_hello_mellea.ipynb
+  ../notebooks/03_compose_granite_switch.ipynb  -> closest match: 04_compose_granite_switch.ipynb
+
+docs/SOMETHING.md (line 42)
+  ../old/path/file.md                           -> no close match found
+
+STALE LABELS (target works, but the label names the wrong file)
+
+tutorials/README.md (line 15)
+  [03_01_old_name.ipynb](notebooks/03_01_new_name.ipynb)
+    -> label should be `03_01_new_name.ipynb`
+
+BROKEN IMPORTS (first-party module path does not resolve on disk)
+
+tutorials/notebooks/03_01_rag_101.ipynb (cell 4)
+  from granite_switch.tutorials.rag_displays import ask
+    -> closest match: granite_switch.tutorials.rag_display
+
+src/granite_switch/composer/old_helpers.py (line 12)
+  from granite_switch.composer.weight_remap import AdapterRemapper
+    -> closest match: granite_switch.composer.weight_remapper
+
+(package roots used: src/  |  first-party packages: granite_switch)
+```
+
+For each broken link, compute a "closest match" by:
+
+1. Take the basename of the broken target (`hello_mellea.ipynb`).
+2. Among all existing files in the repo with the same extension, prefer the one whose basename has the smallest edit distance (or contains the broken basename as a substring, or vice versa). Renumbering cases -- `03_compose_x.ipynb` vs `04_compose_x.ipynb` -- should match strongly.
+3. If no candidate is closer than ~50% similar, report "no close match found" rather than guessing.
+
+For each broken import, compute a closest match against the set of all valid first-party dotted paths (every `.py` file and package directory under each package root, expressed in dotted form). Use the same edit-distance heuristic, but match on the **full dotted path**, not just the leaf, so `granite_switch.tutorials.rag_displays` correctly suggests `granite_switch.tutorials.rag_display` rather than some unrelated `rag_display` elsewhere. As with links, suppress suggestions weaker than ~50% similar.
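+
+Both closest-match lookups can share `difflib.get_close_matches`, whose `cutoff=0.5` corresponds to the ~50% similarity floor. In this sketch, `first_party_dotted_paths` and `closest` are hypothetical helpers. The substring preference mentioned above is left out, and the dotted-path enumeration does not re-check `__init__.py` files along the way.
+
+```python
+import difflib
+from pathlib import Path
+
+
+def first_party_dotted_paths(root: Path, top_level: set[str]) -> list[str]:
+    """Every module and package under `root` for the given top-level packages, in dotted form."""
+    paths = set()
+    for top in top_level:
+        for f in (root / top).rglob("*.py"):
+            rel = f.relative_to(root).with_suffix("")
+            # pkg/__init__.py names the package itself; pkg/mod.py names pkg.mod.
+            parts = rel.parts[:-1] if rel.name == "__init__" else rel.parts
+            paths.add(".".join(parts))
+    return sorted(paths)
+
+
+def closest(broken: str, candidates: list[str]) -> list[str]:
+    """Up to 3 candidates at least ~50% similar, best first; [] means "no close match found"."""
+    return difflib.get_close_matches(broken, candidates, n=3, cutoff=0.5)
+```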
+
+Also note when a `BROKEN` link or import has multiple plausible matches (e.g., `02_govt_rag_pipeline.ipynb` is gone and the repo now has `03_01_*`, `03_02_*`, `03_03_*`) -- list them all and ask the user which one to use.
+
+### 4. Propose fixes
+
+After the report, ask the user:
+
+- **High-confidence renames** (single obvious match, just a number prefix change): show the exact replacements you'd make as a list and ask for approval as a batch.
+- **Ambiguous cases**: ask one question per ambiguous link or import, presenting candidates as options.
+- **No-match cases**: ask whether to drop the link/import, leave it, or point it somewhere else.
+- **Stale labels**: include label-only fixes in the same approval batch as the URL fixes. When a single broken link has both a broken URL *and* a stale label (common after a rename), propose fixing both at once - the user shouldn't have to approve the URL, run the skill again, and approve the label separately. Default to "yes, fix labels too" for filename-shaped labels; only ask separately when the label is something other than a bare filename (e.g. a sentence that happens to mention the old filename).
+- **Broken imports**: treat the same as broken links. High-confidence module-rename fixes (`weight_remap` -> `weight_remapper`) go in the batch; ambiguous ones become individual questions. When the same broken import appears in many files, propose a single repo-wide find-and-replace for that exact `from ... import` / `import ...` line and apply it everywhere at once -- a typo'd module name is almost never correct in one file and wrong in another.
+
+Do not edit anything until the user confirms.
+
+### 5. Apply fixes
+
+For `.md` and `.py` files, use `Edit` with a precise `old_string` that includes enough surrounding context to be unique.
+
+For `.ipynb` files, use `NotebookEdit` - `Edit` will refuse on notebooks. You'll need the cell's `id`, which you already saw in step 2; pass it as `cell_id`. Replace the full cell `new_source` with the corrected text.
+
+When a cell has multiple fixes pending (broken URL + stale label, or several broken imports, or a mix), apply them in the **same** `NotebookEdit`/`Edit` call. Two passes through the same cell wastes tool calls and risks the second edit racing a linter that reformats the file between Reads.
+
+After each edit, do not re-read the file - `NotebookEdit`/`Edit` errors loudly if the change failed.
+
+### 6. Verify
+
+Re-run the scanner from step 2. The report should show **0 broken links**, **0 stale labels**, *and* **0 broken imports**. If it still shows some, investigate - don't declare done. As a belt-and-braces check, also run `git grep -l <old_token>` for any string fragment that was renamed (e.g. `govt_rag`). The scanner only catches strings inside `[...](...)` syntax and parsed `import` statements. The same token may appear elsewhere (Colab badge URLs, prose, code comments, dynamic `importlib.import_module(...)` calls), where it's just as broken.
+
+## Reference scanner
+
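+This Python snippet implements the link half of steps 1-3. It is safe to paste verbatim into a `Bash` call as a quoted heredoc (`python3 - <<'EOF'` ... `EOF`), which keeps the shell from expanding anything inside it: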
+
+```python
+import json, re, subprocess
+from pathlib import Path
+
+repo = Path('.').resolve()
+tracked = subprocess.check_output(
+    ['git', 'ls-files'], cwd=repo, text=True
+).splitlines()
+existing = {(repo / p).resolve() for p in tracked}
+# Directories that contain tracked files - so dir-style links like
+# `[scripts/](scripts/)` or `[docs/](../docs/)` aren't false-positived.
+existing_dirs = set()
+for p in tracked:
+    for parent in (repo / p).resolve().parents:
+        existing_dirs.add(parent)
+
+link_re = re.compile(r'\[([^\]]+)\]\(([^)]+)\)')
+ext_ok = {'.ipynb', '.md', '.py', '.png', '.jpg', '.jpeg', '.svg', '.json', '.sh'}
+
+def scan_text(text, source_path, source_label):
+    """Return (broken, stale_labels) tuples for one file.
+
+    broken       : (source_label, target, basename)
+    stale_labels : (source_label, label_text, target, expected_label_token)
+    """
+    broken = []
+    stale = []
+    # Token in the label that looks like a filename (ends in a tracked ext).
+    label_filename_re = re.compile(
+        r'[\w./-]+\.(?:ipynb|md|py|png|jpg|jpeg|svg|json|sh)\b',
+        re.IGNORECASE,
+    )
+    for m in link_re.finditer(text):
+        label_text = m.group(1)
+        target = m.group(2).strip()
+        if target.startswith(('http://', 'https://', 'mailto:', '#', 'attachment:')):
+            continue
+        bare = target.split('#')[0].split('?')[0]
+        if not bare:
+            continue
+        ext = Path(bare).suffix.lower()
+        if ext and ext not in ext_ok:
+            continue
+        resolved = (source_path.parent / bare).resolve()
+        target_basename = Path(bare).name
+        target_ok = resolved in existing or (not ext and resolved in existing_dirs)
+        if not target_ok:
+            broken.append((source_label, target, target_basename))
+            # Don't double-report a broken link as also having a stale label;
+            # fixing the URL is the load-bearing part. The label gets fixed
+            # in the same edit per "What counts as a stale label" guidance.
+            continue
+        # Target resolves. Now check whether the label names a *different* file.
+        for tok_match in label_filename_re.finditer(label_text):
+            label_token = tok_match.group(0).split('/')[-1]
+            if label_token != target_basename:
+                stale.append((source_label, label_text, target, target_basename))
+                break
+    return broken, stale
+
+broken = []
+stale = []
+for rel in tracked:
+    p = repo / rel
+    if not p.exists():
+        continue
+    if p.suffix == '.md':
+        b, s = scan_text(p.read_text(encoding='utf-8'), p, rel)
+        broken += b; stale += s
+    elif p.suffix == '.ipynb':
+        try:
+            data = json.loads(p.read_text(encoding='utf-8'))
+        except Exception:
+            continue
+        for ci, cell in enumerate(data.get('cells', [])):
+            if cell.get('cell_type') != 'markdown':
+                continue
+            src = ''.join(cell.get('source', []))
+            b, s = scan_text(src, p, f'{rel} (cell {ci})')
+            broken += b; stale += s
+
+print('BROKEN LINKS')
+for label, target, _ in broken:
+    print(f'{label}\n  {target}')
+print(f'\n{len(broken)} broken link(s)')
+
+print('\nSTALE LABELS')
+for label, ltext, target, expected in stale:
+    print(f'{label}\n  [{ltext}]({target})  -> label should name {expected}')
+print(f'\n{len(stale)} stale label(s)')
+```
+
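+For closest-match suggestions, extend the script so each broken link also gets `difflib.get_close_matches(basename, [Path(f).name for f in tracked if Path(f).suffix.lower() == ext], n=3, cutoff=0.5)`, where `basename` is the broken target's file name and `ext` its lowercased suffix.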
+
+## Reference import scanner
+
+Drop-in companion to the link scanner above. Run from the repo root after `repo` and `tracked` are defined:
+
+```python
+import ast, json, tomllib
+from pathlib import Path
+
+def discover_package_roots(repo: Path):
+    """Return (roots, first_party_names) using pyproject.toml when possible."""
+    roots: list[Path] = []
+    pyproject = repo / 'pyproject.toml'
+    if pyproject.exists():
+        cfg = tomllib.loads(pyproject.read_text())
+        pkgs = cfg.get('tool', {}).get('setuptools', {}).get('packages', {})
+        find = pkgs.get('find', {}) if isinstance(pkgs, dict) else {}  # `packages` may be a plain list
+        roots.extend((repo / w).resolve() for w in find.get('where', []) or [])
+        # hatch / poetry / flit fallbacks omitted for brevity -- add as needed.
+    if not roots:
+        for cand in ('.', 'src'):
+            p = (repo / cand).resolve()
+            if p.exists():
+                roots.append(p)
+    first_party = set()
+    for r in roots:
+        if not r.exists():
+            continue
+        for child in r.iterdir():
+            if child.is_dir() and (child / '__init__.py').exists():
+                first_party.add(child.name)
+    return roots, first_party
+
+def module_resolves(dotted: str, roots: list[Path]) -> bool:
+    parts = dotted.split('.')
+    for root in roots:
+        cur = root
+        ok = True
+        for i, part in enumerate(parts):
+            is_last = i == len(parts) - 1
+            pkg_dir = cur / part
+            if pkg_dir.is_dir() and (pkg_dir / '__init__.py').exists():
+                cur = pkg_dir
+                continue
+            if is_last and (cur / f'{part}.py').exists():
+                return True
+            ok = False
+            break
+        if ok:
+            return True
+    return False
+
+def resolve_relative(file_path: Path, level: int, module: str | None, roots: list[Path]) -> str | None:
+    """Turn `from ..foo.bar import x` into an absolute dotted path, or None if outside any package root."""
+    for root in roots:
+        try:
+            rel = file_path.resolve().relative_to(root)
+        except ValueError:
+            continue
+        # Drop the file name; level 1 is the file's own package, each extra level walks up one.
+        pkg_parts = list(rel.parts[:-1])
+        if level - 1 >= len(pkg_parts):  # relative import beyond the top-level package
+            return None
+        base = pkg_parts[: len(pkg_parts) - (level - 1)]
+        tail = module.split('.') if module else []
+        return '.'.join(base + tail)
+    return None
+
+def scan_imports(source: str, source_label: str, file_path: Path,
+                 roots: list[Path], first_party: set[str]) -> list[tuple[str, str, int]]:
+    try:
+        tree = ast.parse(source)
+    except SyntaxError:
+        return []
+    out = []
+    for node in ast.walk(tree):
+        if isinstance(node, ast.Import):
+            for alias in node.names:
+                top = alias.name.split('.')[0]
+                if top in first_party and not module_resolves(alias.name, roots):
+                    out.append((source_label, f'import {alias.name}', node.lineno))
+        elif isinstance(node, ast.ImportFrom):
+            if node.level:  # relative
+                dotted = resolve_relative(file_path, node.level, node.module, roots)
+                if dotted is None:
+                    continue
+            else:
+                dotted = node.module or ''
+            if not dotted:
+                continue
+            top = dotted.split('.')[0]
+            if top not in first_party:
+                continue
+            # Fast path: `from pkg import submodule`, where `pkg.submodule` is itself a
+            # module on disk, resolves -- nothing to report.
+            if len(node.names) == 1 and module_resolves(f'{dotted}.{node.names[0].name}', roots):
+                continue
+            if not module_resolves(dotted, roots):
+                names = ', '.join(a.name for a in node.names)
+                out.append((source_label, f'from {dotted} import {names}', node.lineno))
+    return out
+
+# Usage:
+# roots, first_party = discover_package_roots(repo)
+# broken_imports: list[tuple[str, str, int]] = []
+# for rel in tracked:
+#     p = repo / rel
+#     if p.suffix == '.py':
+#         broken_imports += scan_imports(p.read_text(encoding='utf-8'), rel, p, roots, first_party)
+#     elif p.suffix == '.ipynb':
+#         data = json.loads(p.read_text(encoding='utf-8'))
+#         for ci, cell in enumerate(data.get('cells', [])):
+#             if cell.get('cell_type') != 'code':
+#                 continue
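+#             src_lines = cell.get('source', [])
+#             if isinstance(src_lines, str):  # nbformat allows a str or a list of lines
+#                 src_lines = src_lines.splitlines(keepends=True)
+#             first_nonblank = next((l for l in src_lines if l.strip()), '')
+#             if first_nonblank.lstrip().startswith('%%'):
+#                 continue  # cell magic (%%bash, %%writefile, ...): body may not be Python
+#             src = ''.join(l for l in src_lines if not l.lstrip().startswith(('%', '!')))
+#             broken_imports += scan_imports(src, f'{rel} (cell {ci})', p, roots, first_party)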
+```
+
+For closest-match import suggestions, build the set of all valid first-party dotted paths once (every `.py` file and package dir under the roots, expressed as dotted form), then `difflib.get_close_matches(broken_dotted, valid_dotted, n=3, cutoff=0.5)`.
+
+## Hard rules
+
+- **Never edit before the user confirms.** Even "obvious" renumbering fixes go through approval.
+- **Never delete a link target or import.** If a broken link or import has no plausible replacement, ask the user -- don't silently strip the link or comment out the import.
+- **Don't string-search code cells for links.** A string `"./old_name.ipynb"` inside a Python cell might be load-bearing test data, not a link. Code cells are scanned with `ast` for **imports only**, never with the link regex.
+- **Don't follow symlinks blindly.** If `git ls-files` lists a symlink, treat the symlink path as the file location for resolution purposes.
+- **Don't audit vendored trees.** `git ls-files` already excludes them; do not fall back to `find` or `glob` that would re-include `.venv/`, `node_modules/`, `dist/`, etc.
+- **Fix labels alongside URLs in the same edit.** When a link's target is renamed, the display label often becomes stale at the same time. Don't make the user run the skill twice -- propose URL + label fixes together, apply them in one `Edit`/`NotebookEdit` call per file (per cell for notebooks), and only break the work into separate approvals when a label is genuinely ambiguous (e.g. prose, not a bare filename).
+- **Only validate first-party imports.** `numpy`, `torch`, stdlib, etc. are out of scope -- this skill is checking whether the repo's own module paths still resolve after a rename, not running a real linter.
+- **Don't verify imported names.** `from pkg.mod import some_name` is checked only at the module level (`pkg.mod`). Confirming `some_name` actually exists requires importing the module, which is out of scope.
+
+## When NOT to use this skill
+
+- The user is asking about external URL liveness (HTTP 200 / 404) -- that's a different tool (link-checker against the network).
+- The user wants to audit cross-references inside a single notebook (e.g., section anchors) -- that's narrower and the regex above won't cover it.
+- The user wants a full static type/name check (verify imported attributes exist, catch unused imports, flag third-party version mismatches) -- use a real linter (`ruff`, `pyright`, `mypy`). This skill only checks that first-party module *paths* resolve on disk.
diff --git a/.gitignore b/.gitignore
index a5dcceb..9417ad2 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,8 +1,10 @@
 # OS files
 .DS_Store
 
-# Claude Code
+# Claude Code — ignore local settings but track shared skills
-.claude/
+.claude/*
+!.claude/skills/
+!.claude/skills/**
 
 # Python cache files
 *.pyc
diff --git a/docs/GIT_WORKFLOW.md b/docs/GIT_WORKFLOW.md
index d1a4ff0..06e3bbb 100644
--- a/docs/GIT_WORKFLOW.md
+++ b/docs/GIT_WORKFLOW.md
@@ -66,6 +66,12 @@ Before committing:
 2. **Check comments match code** — stale comments are worse than no comments
 3. **Update docs** if behavior changed
 
+## Before opening a PR that touches notebooks or docs
+
+Run `/validate-links` to catch broken local links, stale labels, and broken first-party imports
+introduced by any renames or restructuring. It scans all `.ipynb`, `.md`, and `.py` files and
+proposes fixes before anything goes to reviewers.
+
 ## Pull Requests
 
 - Target the `main` branch
diff --git a/tutorials/CLAUDE.md b/tutorials/CLAUDE.md
index 7306b5c..57ebd2c 100644
--- a/tutorials/CLAUDE.md
+++ b/tutorials/CLAUDE.md
@@ -30,6 +30,14 @@ Use cell id `hf-login-call` for consistency.
 Add `# Estimated duration: ~2 min on A100, ~7 min on T4` to cells that download models or
 launch vLLM. Put these in **notebook cells only** — not in code files under `src/`.
 
+## Skills
+
+- `/validate-links` — run before any PR that renames, moves, or restructures notebooks or docs.
+  Scans all `.ipynb`/`.md`/`.py` files for broken local links, stale labels, and broken
+  first-party imports. Proposes fixes; never edits without confirmation.
+- `/tutorial-notebook` — run when creating or polishing a notebook. Applies a 15-item checklist
+  (structure, bugs, imports, comments, diagrams, demo coverage, next-steps wiring).
+
 ## Utility Modules
 
 These live in `src/granite_switch/tutorials/` and are imported by notebooks:

From 7eba15b6114da426d3cb0b5dcd5a312f4b694144 Mon Sep 17 00:00:00 2001
From: Alon Freund <alonf@il.ibm.com>
Date: Wed, 27 May 2026 13:43:04 +0300
Subject: [PATCH 03/12] Fix stale version and compatibility claims in docs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- README.md: Python 3.9+ → 3.11–3.13, PyTorch 2.0+ → 2.10+; [dev] described as "Everything" → accurate description; [vllm20] removed wrong "CUDA 13+" claim
- PREREQUISITES.md: Python 3.10+ → 3.11–3.13; RAG adapter count 5 → 6 (granitelib-rag-r1.0 ships 6 adapters)
- build_your_own_adapter.md: custom adapters ARE supported via Mellea's Intrinsic API; updated Step 4 note to reflect this
---
 README.md                                  | 4 ++--
 tutorials/PREREQUISITES.md                 | 4 ++--
 tutorials/guides/build_your_own_adapter.md | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 6c0e673..c854aa2 100644
--- a/README.md
+++ b/README.md
@@ -48,10 +48,10 @@ Other install options depending on your use case:
 pip install "granite-switch[compose]"   # Compose modular models
 pip install "granite-switch[hf]"        # HuggingFace inference
 pip install "granite-switch[vllm20]"    # vLLM 0.20+ (requires CUDA 13+)
-pip install "granite-switch[dev]"       # Everything
+pip install "granite-switch[dev]"       # HF + vLLM 0.19.x + compose + tests
 ```
 
-Requires Python 3.9+ and PyTorch 2.0+. Two vLLM backends are available: `.[vllm]` for broad CUDA 12.x compatibility (0.19.x), and `.[vllm20]` for the latest performance improvements (CUDA 13+).
+Requires Python 3.11–3.13 and PyTorch 2.10+. Two vLLM backends are available: `.[vllm]` for vLLM 0.19.x, and `.[vllm20]` for vLLM 0.20.x.
 
 ### Compose a Model
 
diff --git a/tutorials/PREREQUISITES.md b/tutorials/PREREQUISITES.md
index 9203aff..a1bc7a7 100644
--- a/tutorials/PREREQUISITES.md
+++ b/tutorials/PREREQUISITES.md
@@ -15,7 +15,7 @@ Setup requirements for running Granite Switch tutorials.
 
 ### Python Version
 
-Python 3.10+ is required.
+Python 3.11–3.13 is required.
 
 ### Base Installation
 
@@ -84,7 +84,7 @@ Official IBM Granite adapter libraries (r1.0):
 
 | Library | Adapters | Purpose |
 |---------|----------|---------|
-| [ibm-granite/granitelib-rag-r1.0](https://huggingface.co/ibm-granite/granitelib-rag-r1.0) | 5 | RAG adapters (rewrite, answerability, citations, etc.) |
+| [ibm-granite/granitelib-rag-r1.0](https://huggingface.co/ibm-granite/granitelib-rag-r1.0) | 6 | RAG adapters (rewrite, answerability, citations, etc.) |
 | [ibm-granite/granitelib-core-r1.0](https://huggingface.co/ibm-granite/granitelib-core-r1.0) | 3 | Core adapters (certainty, requirements, attributions) |
 | [ibm-granite/granitelib-guardian-r1.0](https://huggingface.co/ibm-granite/granitelib-guardian-r1.0) | 4 | Guardian adapters (harm check, policy, factuality, etc.) |
 
diff --git a/tutorials/guides/build_your_own_adapter.md b/tutorials/guides/build_your_own_adapter.md
index 7af4d92..9a894ce 100644
--- a/tutorials/guides/build_your_own_adapter.md
+++ b/tutorials/guides/build_your_own_adapter.md
@@ -183,7 +183,7 @@ The base model's tokenizer and generation assets (`generation_config.json`, `mer
 
 ## Step 4: Use the Composed Model
 
-> **Note:** Custom (BYOA) adapters are not supported by [Mellea](https://github.com/generative-computing/mellea). Mellea only supports the official IBM Granite Library adapters. To invoke your custom adapters, use the chat template directly as shown below.
+> **Note:** The high-level Mellea wrappers (`guardian_check`, `rag.rewrite_question`, etc.) are built for the official IBM Granite Library adapters. Custom adapters can be invoked through Mellea's lower-level `Intrinsic` API — see [Bring Your Own Adapter with Mellea](mellea_build_your_own_adapter.md). To invoke adapters without Mellea at all, use the chat template directly as shown below.
 
 ### With HuggingFace
 

From a9c8b00174295d984ec9da23511cae08e775e659 Mon Sep 17 00:00:00 2001
From: AlonMalach <alonmalach@gmail.com>
Date: Wed, 27 May 2026 13:54:03 +0300
Subject: [PATCH 04/12] Push module-specific gotchas into child CLAUDE.md files
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Apply the best-practices doc's child-directory pattern: gotchas scoped to a
single backend get loaded on demand from src/granite_switch/{hf,vllm,composer}/CLAUDE.md
instead of paying token cost in every session.

Moved out of root:
- vLLM: Punica -1 index detail, TP row-parallel bias-doubling, deployment commands
- HF: eager-backend causal-masking quirk, fused-projections / bit-exact skip
- composer: e2e-tests-must-use-compose rule, compose CLI

Root keeps universal items (file org, test cadence, config params, control-token
generatability, ALORA/LORA placement, hidden-count offset) plus a pointer block
listing the child files. Drops root from 204 → 157 lines.
---
 CLAUDE.md                             | 63 ++++-----------------------
 src/granite_switch/composer/CLAUDE.md | 18 ++++++++
 src/granite_switch/hf/CLAUDE.md       | 23 ++++++++++
 src/granite_switch/vllm/CLAUDE.md     | 29 ++++++++++++
 4 files changed, 78 insertions(+), 55 deletions(-)
 create mode 100644 src/granite_switch/composer/CLAUDE.md
 create mode 100644 src/granite_switch/hf/CLAUDE.md
 create mode 100644 src/granite_switch/vllm/CLAUDE.md

diff --git a/CLAUDE.md b/CLAUDE.md
index 5c9ab89..48a4f79 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -57,13 +57,6 @@ debugging, or exploratory scripts in `tests/`. Use `scratch/` instead (it is git
 
 ## Development Commands
 
-### Composing Models
-
-```bash
-python -m granite_switch.composer.compose_granite_switch \
-  --adapters ibm-granite/granitelib-rag-r1.0
-```
-
 ### Testing
 
 **Always use `-v -s --tb=short`** when running tests. `-x` (fail fast) stops on the first failure —
@@ -94,21 +87,6 @@ pytest tests/vllm/test_model_forward.py -v -s --tb=short -x
 pytest tests/integration/ -v -s --tb=short -x
 ```
 
-### vLLM Deployment
-
-```bash
-# Verify plugin registration
-python -c "from vllm.plugins import load_general_plugins; \
-           from vllm import ModelRegistry; \
-           load_general_plugins(); \
-           print('OK' if 'GraniteSwitchForCausalLM' in ModelRegistry.get_supported_archs() else 'FAIL')"
-
-# Start API server
-python -m vllm.entrypoints.openai.api_server \
-  --model ./granite-with-all-aloras \
-  --port 8000
-```
-
 ## Key Configuration Parameters
 
 - **`attention_multiplier`**: Attention score scaling (instead of `1/sqrt(head_dim)`)
@@ -122,9 +100,8 @@ Always use config values — never hardcode these parameters.
 
 ### 1. Adapter Index Convention
 
-**Control tokens**: `0` = no adapter, `1+` = adapter indices
-
-**vLLM Punica kernels**: `-1` = no adapter (internal conversion: `adapter_indices - 1`)
+`0` = no adapter, `1+` = adapter index. (vLLM Punica kernels use a shifted convention internally —
+see `src/granite_switch/vllm/CLAUDE.md`.)
 
 ### 2. Control Token Generatability
 
@@ -144,43 +121,19 @@ model can produce any control token during generation.
 
 Always load from config, never hardcode.
 
-### 5. End-to-End Tests Must Use Compose Infrastructure
-
-No test should manually assemble `GraniteSwitchConfig` or call `transfer_base_weights`
-directly. All model construction must go through `GraniteSwitchComposer` so that the
-compose pipeline itself is what's being tested. If the composer can't handle a use case
-(e.g., zero-adapter skinning), extend the composer — don't work around it in tests.
-
-### 6. HF Attention Backends and Causal Masking
-
-The eager backend does NOT handle `attention_mask=None` as causal — it treats `None` as no mask
-(full attention). SDPA and FlashAttention handle `attention_mask=None` correctly via `is_causal`
-attribute on the module.
-
-The HF stress tests (`tests/hf/test_single_switch.py`) auto-detect which attention backends work on the
-current platform by probing each with a k=-inf GQA call at import time. Unavailable backends are skipped.
-
-### 7. Known Limitation: Hidden Count Offset When Position 0 is in a Hiding Group
+### 5. Hidden Count Offset When Position 0 is in a Hiding Group
 
 When position 0 is a control token in a hiding group (e.g., a LoRA prefix token with
 `add_bos_token=False`), `hidden_count` is off by 1, causing a 1-position RoPE offset. This is
 acceptable because adapter detection is exact and RoPE is robust to small positional shifts.
 
-### 8. Known Limitation: TP Row-Parallel Bias Doubling
-
-`SwitchedLoRALinear`'s row-parallel bypass path passes bias to all TP ranks instead of
-suppressing it for rank > 0. After all-reduce this doubles the bias. Not affected: all Granite
-architectures (4.0, 4.1) use `attention_bias=False` and `mlp_bias=False`.
+### Backend- and module-specific gotchas
 
-### 9. HF Backend Uses Fused Projections (Not Bit-Exact with Upstream HF)
+Loaded on demand from child CLAUDE.md files when you touch those modules:
 
-The GraniteSwitch HF backend uses fused QKV and gate-up projections, symmetric with the vLLM
-backend architecture. Upstream HuggingFace `GraniteMoeHybridForCausalLM` uses separate projections.
-Fused projections change the floating-point reduction order, so bit-exact skinning equivalence
-with the upstream HF model is not achievable. The vLLM skinning equivalence tests are the
-authoritative check — both the upstream and skinned models use the same fused-projection
-architecture there. The HF skinning tests in `tests/composer/test_skinning_equivalence.py` are
-skipped for this reason.
+- `src/granite_switch/hf/CLAUDE.md` — HF attention backends, fused projections vs upstream HF
+- `src/granite_switch/vllm/CLAUDE.md` — Punica `-1` index, TP row-parallel bias, deployment commands
+- `src/granite_switch/composer/CLAUDE.md` — compose-infra rule for e2e tests, compose CLI
 
 ## Documentation
 
diff --git a/src/granite_switch/composer/CLAUDE.md b/src/granite_switch/composer/CLAUDE.md
new file mode 100644
index 0000000..7a7fbc9
--- /dev/null
+++ b/src/granite_switch/composer/CLAUDE.md
@@ -0,0 +1,18 @@
+# CLAUDE.md — composer/
+
+Compose system: builds Granite Switch checkpoints from a base model + LoRA adapters. Loaded
+automatically when reading any file under `src/granite_switch/composer/`.
+
+## End-to-End Tests Must Use Compose Infrastructure
+
+No test should manually assemble `GraniteSwitchConfig` or call `transfer_base_weights` directly.
+All model construction must go through `GraniteSwitchComposer` so that the compose pipeline
+itself is what's being tested. If the composer can't handle a use case (e.g., zero-adapter
+skinning), extend the composer — don't work around it in tests.
+
+## Composing Models
+
+```bash
+python -m granite_switch.composer.compose_granite_switch \
+  --adapters ibm-granite/granitelib-rag-r1.0
+```
diff --git a/src/granite_switch/hf/CLAUDE.md b/src/granite_switch/hf/CLAUDE.md
new file mode 100644
index 0000000..0e8e935
--- /dev/null
+++ b/src/granite_switch/hf/CLAUDE.md
@@ -0,0 +1,23 @@
+# CLAUDE.md — hf/
+
+HuggingFace backend for training and debugging. Loaded automatically when reading any file under `src/granite_switch/hf/`.
+
+## HF Attention Backends and Causal Masking
+
+The eager backend does NOT handle `attention_mask=None` as causal — it treats `None` as no mask
+(full attention). SDPA and FlashAttention handle `attention_mask=None` correctly via the module's
+`is_causal` attribute.
+
+The HF stress tests (`tests/hf/test_single_switch.py`) auto-detect which attention backends work
+on the current platform by probing each with a k=-inf GQA call at import time. Unavailable
+backends are skipped.
+
+## Fused Projections (Not Bit-Exact with Upstream HF)
+
+The GraniteSwitch HF backend uses fused QKV and gate-up projections, symmetric with the vLLM
+backend architecture. Upstream HuggingFace `GraniteMoeHybridForCausalLM` uses separate
+projections. Fused projections change the floating-point reduction order, so bit-exact skinning
+equivalence with the upstream HF model is not achievable. The vLLM skinning equivalence tests
+are the authoritative check — both the upstream and skinned models use the same fused-projection
+architecture there. The HF skinning tests in `tests/composer/test_skinning_equivalence.py` are
+skipped for this reason.
diff --git a/src/granite_switch/vllm/CLAUDE.md b/src/granite_switch/vllm/CLAUDE.md
new file mode 100644
index 0000000..b1c6018
--- /dev/null
+++ b/src/granite_switch/vllm/CLAUDE.md
@@ -0,0 +1,29 @@
+# CLAUDE.md — vllm/
+
+vLLM backend for production inference. Loaded automatically when reading any file under `src/granite_switch/vllm/`.
+
+## Adapter Index Convention (vLLM-specific)
+
+Punica kernels use `-1` for "no adapter". The backend converts from the shared convention with
+`adapter_indices - 1`, so the shared `0` (no adapter) becomes `-1` and adapter `k` becomes `k - 1`.
+
+## Known Limitation: TP Row-Parallel Bias Doubling
+
+`SwitchedLoRALinear`'s row-parallel bypass path passes bias to all TP ranks instead of
+suppressing it for rank > 0, so the all-reduce sums one copy of the bias per rank (doubling it at
+TP=2). Not affected in practice: all Granite architectures (4.0, 4.1) use `attention_bias=False` and `mlp_bias=False`.
+
+## Deployment
+
+```bash
+# Verify plugin registration
+python -c "from vllm.plugins import load_general_plugins; \
+           from vllm import ModelRegistry; \
+           load_general_plugins(); \
+           print('OK' if 'GraniteSwitchForCausalLM' in ModelRegistry.get_supported_archs() else 'FAIL')"
+
+# Start API server
+python -m vllm.entrypoints.openai.api_server \
+  --model ./granite-with-all-aloras \
+  --port 8000
+```

From a627f88222a851a6320c1c7dfb47f520c90fb7f5 Mon Sep 17 00:00:00 2001
From: AlonMalach <alonmalach@gmail.com>
Date: Wed, 27 May 2026 13:56:20 +0300
Subject: [PATCH 05/12] Remove Llama references from docs

Llama is no longer supported. Drop the Granite-vs-Llama comparison gotcha and
the parenthetical "main architectural difference with Llama" framing on
logits_scaling in both CLAUDE.md and docs/ARCHITECTURE.md. Renumber the
remaining root-level gotchas (1-4).

Code references to Llama in src/granite_switch/vllm/core/decoder.py are kept:
they document why the RMSNorm dispatch helper exists (different vLLM model
classes use different calling conventions) and are not support claims.
---
 CLAUDE.md            | 12 ++----------
 docs/ARCHITECTURE.md |  2 +-
 2 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index 48a4f79..4185f4a 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -90,7 +90,7 @@ pytest tests/integration/ -v -s --tb=short -x
 ## Key Configuration Parameters
 
 - **`attention_multiplier`**: Attention score scaling (instead of `1/sqrt(head_dim)`)
-- **`logits_scaling`**: Applied to final logits (main architectural difference with Llama)
+- **`logits_scaling`**: Applied to final logits
 - **`residual_multiplier`**: Applied to residual connections
 - **`embedding_multiplier`**: Applied to input embeddings
 
@@ -113,15 +113,7 @@ model can produce any control token during generation.
 - **ALORA adapters**: Token placed either in user message by matching invocation sequence or right before generation prompt
 - **LORA adapters**: Token placed at sequence beginning
 
-### 4. Granite vs Llama Differences
-
-- Granite uses `logits_scaling` (typically 8.0)
-- Custom attention scaling via `attention_multiplier`
-- Different residual and embedding multipliers
-
-Always load from config, never hardcode.
-
-### 5. Hidden Count Offset When Position 0 is in a Hiding Group
+### 4. Hidden Count Offset When Position 0 is in a Hiding Group
 
 When position 0 is a control token in a hiding group (e.g., a LoRA prefix token with
 `add_bos_token=False`), `hidden_count` is off by 1, causing a 1-position RoPE offset. This is
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index 66401bb..7da3add 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -69,7 +69,7 @@ These fields are specific to Granite Switch and not present in base Granite:
 ### Granite-Specific Parameters (inherited from base model)
 
 - **`attention_multiplier`**: Attention score scaling (replaces `1/sqrt(head_dim)`)
-- **`logits_scaling`**: Applied to final logits (main architectural difference with Llama)
+- **`logits_scaling`**: Applied to final logits
 - **`residual_multiplier`**: Applied to residual connections
 - **`embedding_multiplier`**: Applied to input embeddings
 

From 6cb87fb0e9fbe7ff2501a8d9f66979915c9757d9 Mon Sep 17 00:00:00 2001
From: Alon Freund <alonf@il.ibm.com>
Date: Wed, 27 May 2026 14:06:48 +0300
Subject: [PATCH 06/12] Fix SUPPORTED_MODELS.md: remove wrong single-GPU claim,
 add 30B model

TP is supported and tested (tests cover TP specifically).
The single-GPU-only note was stale and excluded the 30B model.
---
 docs/SUPPORTED_MODELS.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/docs/SUPPORTED_MODELS.md b/docs/SUPPORTED_MODELS.md
index 0094911..9a28e2b 100644
--- a/docs/SUPPORTED_MODELS.md
+++ b/docs/SUPPORTED_MODELS.md
@@ -16,9 +16,6 @@ automatically from the HuggingFace `config.model_type` field.
 Any Granite model whose HuggingFace config has `model_type: granite` can be used
 as a base model. The table below lists representative examples.
 
-**Note:** Granite Switch currently supports single-GPU inference only. Models
-that do not fit in a single GPU's memory are not yet supported.
-
 #### Granite 4.x (`granite`)
 
 | Model Tag | Size | Variant |
@@ -26,6 +23,7 @@ that do not fit in a single GPU's memory are not yet supported.
 | `ibm-granite/granite-4.1-3b` | 3B | Dense, instruct |
 | `ibm-granite/granite-4.1-8b` | 8B | Dense, instruct |
 | `ibm-granite/granite-4.0-micro` | 3B | Dense, instruct |
+| `ibm-granite/granite-4.1-30b` | 30B | Dense, instruct |
 
 Base variants (`granite-4.1-3b-base`, `granite-4.1-8b-base`) are also supported.
 

From 893c4a08d20497aaa3232b16f26680b551501776 Mon Sep 17 00:00:00 2001
From: AlonMalach <alonmalach@gmail.com>
Date: Wed, 27 May 2026 14:36:20 +0300
Subject: [PATCH 07/12] Trim duplicated and default-knowledge sections from
 root CLAUDE.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Three small cuts that pass the "would removing this cause Claude to make
mistakes?" test:

1. Test Files section — drop the per-subdirectory enumeration; the same
   list already appears in Project Structure. Keep the load-bearing rule
   (regression tests only, use scratch/ for throwaway).

2. Naming Conventions — drop test_*.py (pytest default) and snake_case.py
   (PEP 8). Keep only the non-default UPPER_CASE.md rule, renamed to
   "Documentation Naming".

3. Git Workflow — collapse the bullet list that restated GIT_WORKFLOW.md.
   Keep one-line pointer plus the "never sign as Claude" rule (the only
   item not covered by the linked doc).

Drops root from 149 → 132 lines.
---
 CLAUDE.md | 33 ++++++++-------------------------
 1 file changed, 8 insertions(+), 25 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index 4185f4a..d3f629a 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -35,25 +35,14 @@ pip install -e ".[hf,compose]"  # HF + composer only (no vLLM)
 
 ### Test Files (Python)
 
-**All `test_*.py` test files MUST go in a `tests/` directory:**
+**`tests/` is for official regression tests ONLY.** Do NOT place throwaway diagnostic,
+debugging, or exploratory scripts in `tests/`. Use `scratch/` instead (it is gitignored).
+Running `pytest tests/` should only execute curated, maintained tests — never one-off
+investigations. Subdirectories are listed in Project Structure above.
 
-- **`tests/unit/`**: Unit tests (fastest, CPU-only)
-- **`tests/hf/`**: HuggingFace implementation tests
-- **`tests/vllm/`**: vLLM implementation tests
-- **`tests/composer/`**: Compose system tests
-- **`tests/integration/`**: Cross-implementation and end-to-end integration tests
-- **`tests/regression/`**: Regression tests (hf/, vllm/, integration/, shared/, tools/)
-- **`tests/shared/`**: Shared test utilities and parametrized cases
+### Documentation Naming
 
-**IMPORTANT: `tests/` is for official regression tests ONLY.** Do NOT place throwaway diagnostic,
-debugging, or exploratory scripts in `tests/`. Use `scratch/` instead (it is gitignored). Running
-`pytest tests/` should only execute curated, maintained tests — never one-off investigations.
-
-### Naming Conventions
-
-- **Test files**: `test_*.py`
-- **Documentation**: `UPPER_CASE.md`
-- **Scripts**: `snake_case.py`
+`UPPER_CASE.md` for docs under `docs/`.
 
 ## Development Commands
 
@@ -135,14 +124,8 @@ Loaded on demand from child CLAUDE.md files when you touch those modules:
 
 ## Git Workflow
 
-**See [docs/GIT_WORKFLOW.md](docs/GIT_WORKFLOW.md) for complete git workflow guidelines.**
-
-- **Branch naming**: `feature/ticket-ID-description` or `bugfix/ticket-ID-description`
-- **Workflow**: Branch from `main` → develop → rebase → PR → merge → delete branch
-- **Critical**: Always verify comments match code before committing (see GIT_WORKFLOW.md)
-- **Commit format**: Clear summary + explanation of WHAT changed and WHY
-
-When committing, **never sign as Claude** (per project instructions)
+See [docs/GIT_WORKFLOW.md](docs/GIT_WORKFLOW.md) for branch naming, commit format, and
+PR workflow. **When committing, never sign as Claude** (per project instructions).
 
 ## License
 

From 053760cb5c617a120f833cfffd5158160f5642d8 Mon Sep 17 00:00:00 2001
From: Alon Freund <alonf@il.ibm.com>
Date: Wed, 27 May 2026 14:51:32 +0300
Subject: [PATCH 08/12] Restore CUDA version remarks; use Python 3.11+ in
 README install section
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Keep CUDA 12.x / CUDA 13+ distinction — useful context for users choosing a vLLM backend
- Use '3.11+' instead of '3.11-3.13' — upper bound in pyproject.toml reflects untested versions, not incompatibility
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index c854aa2..7429de0 100644
--- a/README.md
+++ b/README.md
@@ -51,7 +51,7 @@ pip install "granite-switch[vllm20]"    # vLLM 0.20+ (requires CUDA 13+)
 pip install "granite-switch[dev]"       # HF + vLLM 0.19.x + compose + tests
 ```
 
-Requires Python 3.11–3.13 and PyTorch 2.10+. Two vLLM backends are available: `.[vllm]` for vLLM 0.19.x, and `.[vllm20]` for vLLM 0.20.x.
+Requires Python 3.11+ and PyTorch 2.10+. Two vLLM backends are available: `.[vllm]` for broad CUDA 12.x compatibility (0.19.x), and `.[vllm20]` for the latest performance improvements (CUDA 13+).
 
 ### Compose a Model
 

From c8f23d8311297509cbabd05196a6f4ba5550acf3 Mon Sep 17 00:00:00 2001
From: Alon Freund <alonf@il.ibm.com>
Date: Wed, 27 May 2026 15:04:59 +0300
Subject: [PATCH 09/12] =?UTF-8?q?Remove=20skills=20=E2=80=94=20will=20be?=
 =?UTF-8?q?=20added=20separately?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .claude/skills/tutorial-notebook/SKILL.md | 319 ------------------
 .claude/skills/validate-links/SKILL.md    | 388 ----------------------
 2 files changed, 707 deletions(-)
 delete mode 100644 .claude/skills/tutorial-notebook/SKILL.md
 delete mode 100644 .claude/skills/validate-links/SKILL.md

diff --git a/.claude/skills/tutorial-notebook/SKILL.md b/.claude/skills/tutorial-notebook/SKILL.md
deleted file mode 100644
index 5734df6..0000000
--- a/.claude/skills/tutorial-notebook/SKILL.md
+++ /dev/null
@@ -1,319 +0,0 @@
----
-name: tutorial-notebook
-description: Polish an existing Jupyter tutorial notebook — or scaffold a new one from scratch — so it teaches a first-time reader clearly. Enforces a standard template (title → metadata → intro → prerequisites → numbered sections → next steps), catches common bugs (broken imports, silent data loads, stale thresholds, dead code), and adds the load-bearing polish items that actually move the needle (diagrams, explanatory comments, adapter descriptions, link validation). Use when the user asks to "improve a notebook", "make a notebook perfect", "apply the template", or is creating a new tutorial and wants it to match their existing ones.
----
-
-# Tutorial Notebook Skill
-
-This skill produces tutorial notebooks that a first-time reader can open, execute, and understand without needing to consult other docs. It was distilled from an end-to-end polish of `tutorials/notebooks/rag_flow.ipynb` and the lessons from what made each change earn its place.
-
-## Core principle
-
-**Comments, cells, and sections earn their place by answering WHY, not what.** If removing a comment or splitting a cell wouldn't help a future reader make or avoid a decision, don't add it. Apply every checklist item below through that lens — don't mechanically tick boxes.
-
-## Two modes
-
-- **Polish mode** — user hands you an existing notebook. Read it end-to-end first; don't edit before you understand the whole narrative. Make changes one at a time so each can be verified.
-- **Scaffold mode** — user wants a new notebook from scratch. Start from the [template skeleton](#template-skeleton) below and fill in content. Still apply the full checklist before declaring done.
-
-In both modes, always check against the same rubric — that's what makes notebooks feel like siblings instead of cousins.
-
-## Interaction rhythm
-
-- **Never batch large rewrites.** Propose changes one at a time, each with a brief "why." Let the user say yes/no/adjust before moving on.
-- **Push back honestly when asked to do something that adds noise rather than clarity.** If a requested change would hurt the first-time reader (e.g., collapsing load-bearing reference material behind closed `<details>`, or wrapping unrelated functions in a namespace class), say so with reasoning — don't silently comply.
-- **Show diffs, not summaries.** After an edit, show the actual changed lines. "Summary of changes: …" is less trustworthy than the diff itself.
-- **Verify after each code edit.** `python3 -c "import ast; ast.parse(open(file).read())"` for Python files, `python3 -c "import json; json.load(open(nb))"` for notebooks. Also run `git diff --stat` to sanity-check scope.
-
----
-
-## The checklist
-
-Work through this in order. Each item has: **what**, **why**, and **how to check**.
-
-### 1. Correctness and bugs (do these FIRST — everything else is polish)
-
-These are the ones that make a notebook fail to run for a first-time user.
-
-- **Broken import paths.** Imports that assume the repo is on `sys.path` (e.g., `from tutorials.scripts.xyz import ...` when there are no `__init__.py` files) will `ImportError` for any user who opens the notebook cold. Fix with `import sys; sys.path.insert(0, "../scripts")` + plain import. *Check:* open the notebook's directory, look at the import path, and ask "does this actually work without my PYTHONPATH being set right?"
-
-- **Threshold / constant mismatches between logic and display.** A function that *decides* using threshold `0.5` while the *display helper* badges using `0.4` will produce contradictory output ("🟢 passed AND 🔴 blocked"). Grep for the same threshold across all cells; make sure they agree. This was a real bug in `rag_flow.ipynb`.
-
-- **Unused constants and dead config.** Constants defined in the config cell that aren't referenced anywhere mislead readers into thinking they matter. Delete them. If the function they were meant for returns a string verdict instead of a score, the "threshold constant" is nonsense — that was the `ANSWERABILITY_THRESHOLD` case.
-
-- **Stale comments and docstrings.** `# QC returns CLEAR when...` — what's QC? If a function name appears in a docstring, make sure it's the *current* name, not an internal abbreviation.
-
-### 2. Template structure
-
-Every tutorial notebook should follow this shape. Deviations need a reason.
-
-```
-# H1 Title                          ← cell 0 starts here
-Metadata line (Duration only — full Prerequisites section follows below)
-Intro paragraph (what it demonstrates, one or two sentences)
-Why this approach (if the choice isn't obvious — e.g., "Why vLLM:")
-What you'll learn (bullets — first bullet names the concrete deliverable as a learning outcome, remaining bullets are transferable conceptual takeaways)
-
-## Prerequisites                   <- still cell 0, or next cell
-1. Install
-2. Get artifacts (models, data)
-3. Start servers
-4. Verify
-Pointer to the softer-intro notebook and PREREQUISITES.md for depth
-
----                                 ← visual break; diagram lives in its own cell
-Intrinsics / components used        ← if the tutorial exercises multiple adapters/tools
-Pipeline / architecture diagram     ← image attachment, on its own cell
-
-## 1 · <section name>               ← numbered H2s, numbered 1..N
-[one-line intro explaining the section's purpose]
-<code cell(s)>
-
-## 2 · <section name>
-...
-
-## N · Next steps                   ← terminal section; numbering is a style call
-- Adapt to your own app (point at the reusable function)
-- Related tutorials / how-tos
-- External references (library docs, model cards)
-```
-
-**Subsection rules (H3):** use sparingly, only when one H2 has multiple distinct helpers. In `rag_flow.ipynb`, §5 splits into `5a · Display helpers (printing only - not part of the pipeline)` because the display utilities are conceptually separate from the pipeline function above them. Don't force subsections for sections that have one concept.
-
-### 3. Intro cell (cell 0) — the highest-leverage surface
-
-A cold reader decides in 10 seconds whether the notebook is for them. That decision happens in cell 0.
-
-- **H1 title** matches the subject of the tutorial, not the repo.
-- **Metadata line** directly under the title: `**Duration:** ~X min (first run)`. The full `## Prerequisites` section is right below, so no need to link to it from the metadata line.
-- **Motivation paragraphs** should be two short paragraphs, not one 90-word wall. First paragraph: what this demonstrates. Second paragraph (optional, italicized): the constraint that explains *why this approach*. Example: `*Why vLLM:* the mellea intrinsics API currently supports vLLM only.`
-- **What you'll learn:** 3-5 bullets - one consolidated list, no separate "What you'll build" section. Lead with a bullet that names the concrete deliverable phrased as a learning outcome (e.g., `"How to build a 7-turn conversation that exercises every step of the pipeline"`), then follow with bullets about transferable conceptual takeaways. Bullets should not be a list of cells - `"how to call foo()"` is too mechanical; `"how to chain multiple intrinsics into one RAG pipeline"` is right.
-- **Adapters used callout:** directly after the "What you'll learn" bullets, add a one-line `**Adapters used:**` paragraph that names which adapter libraries (and specific intrinsics within them) the notebook exercises, each linked to its HuggingFace repo. Example: `**Adapters used:** intrinsics from the [Core](https://huggingface.co/ibm-granite/granitelib-core-r1.0) library (\`context-attribution\`, \`uncertainty\`) and the [Guardian](https://huggingface.co/ibm-granite/granitelib-guardian-r1.0) library (\`guardian-core\`).` This lets a reader skimming the intro instantly see whether the notebook touches the capability they care about, without reading the full body. Keep it to one sentence; list only adapters the notebook actually *invokes* (not ones mentioned in reference tables). If the notebook has no "What you'll learn" list (freeform intro), place the callout immediately before the Prerequisites section. Keep this list in sync with any top-level README's "where used in tutorials" column — mismatches surface fast.
-- **Prerequisites section:** numbered checklist with copy-pasteable commands. Every installation step, every "start this server" step, and a verification command (`curl ...`) the reader can run before moving on. Don't just link to `PREREQUISITES.md` - inline what they need, *then* link to the full doc for depth.
-
-### 4. Component/adapter introduction table
-
-If the tutorial uses more than 2–3 distinct libraries, adapters, or intrinsics, add a compact reference table near the top (right before the pipeline diagram). Two columns:
-
-| Component | Role |
-|-----------|------|
-| `foo.bar` | One-line description of what it does in *this* tutorial. |
-
-Not an API reference — just enough that a reader skimming the notebook knows what each name means before they hit it in code.
-
-### 5. Pipeline / architecture diagram
-
-A diagram is usually worth including. Default to adding one when the notebook executes a multi-step flow with branching, or when a conceptual illustration would help a reader form the right mental model before reading code. Skip a diagram only when the flow is trivially linear (e.g., a two-cell "load model, run inference" demo) and a picture would add nothing a section header doesn't already convey.
-
-**Two acceptable formats, both in use in this repo:**
-
-1. **Image cell attachment** - markdown cell containing `![image.png](attachment:image.png)` with the PNG embedded under that cell's `attachments` metadata. Used by `granite_switch_with_hf.ipynb`. Renders everywhere (GitHub, nbviewer, JupyterLab, VS Code).
-2. **Mermaid rendered from a code cell** - a Python cell that defines a Mermaid source string and renders it (e.g., via `IPython.display`). Used by `rag_flow.ipynb`. Easier to keep in sync with code labels and edit in-place.
-
-Pick whichever fits the diagram and the notebook. Don't mix both styles within one notebook.
-
-**The skill cannot generate or attach a PNG itself.** When the chosen format is an image attachment, describe to the user exactly what the diagram should show (the steps, branches, terminal states, labels you'd want), leave a placeholder markdown cell with a TODO comment to reserve the slot, and ask the user to produce and attach the image. For Mermaid, the skill *can* author the source directly.
-
-**Diagram content rules** (apply when describing what the image should show):
-- **Include every early-exit branch**, not just the happy path. A reader scanning the diagram should see all possible terminal states (e.g., `BLOCKED`, `UNANSWERABLE`, `DONE`).
-- **Match node labels to code.** If the display helper calls steps `[1a]`, `[1b]`, `[2]`... the diagram nodes should use the same tags. Shared vocabulary is the point.
-- **Match terminal emoji to the code's print output.** If `show_answer` prints ⛔ for blocks and 🔍 for unanswerable, the diagram's terminal nodes use the same glyphs. This makes the diagram a legend for the runtime output.
-- **Keep the diagram on its own cell.** Cell 0 has enough to carry without a diagram inside it.
-
-### 6. Code cells — structure
-
-- **Each `## N · ...` section has at most one concept per code cell.** If a cell is >80 lines doing two distinct things (e.g., `run_pipeline` + display helpers), split it with a markdown divider (`### Na · ...`). Subsections need a short intro markdown cell describing what the code does.
-
-- **Extract helpers when extracting *gains* clarity.** The 7-line `ChatContext` build logic at the top of `run_pipeline` was pure bootstrapping; lifting it into `_build_context(history, query)` let the main function read as a clean 7-step sequence. Extract when it tightens; don't extract for the sake of extraction.
-
-- **Don't extract "for namespacing."** A `Show` class containing three unrelated static methods adds ceremony without structure. Python's namespacing tool is the module/cell, not the class.
-
-- **Shell-out lines — `%pip` for installs, `!` for everything else.** Installer lines (`pip install`, `pip uninstall`, `conda install`) use the Jupyter line magic: `%pip install -q -e "/content/granite-switch[vllm]"`. Every other shell-out — `!git clone ...`, `!python -m ...`, `!python script.py`, `!huggingface-cli ...`, `!curl ...`, `!ls`, `!head`, etc. — keeps the `!` prefix. **Why:** `%pip` is a Jupyter line magic that always installs into the kernel running the notebook; `!pip` shells out to whatever `pip` is on PATH, which in Colab and managed-Jupyter environments often differs from the kernel's interpreter and produces the classic "installed fine, still ImportError" failure for the reader. Only `%pip`/`%conda` get this treatment — the others are not line magics and `%`-prefixing them would either no-op or error.
-
-  ```
-  %pip install -q -e "/content/granite-switch[vllm]"   # good — targets the running kernel
-  !pip install -q -e "/content/granite-switch[vllm]"   # bad — may install into the wrong interpreter
-  !git clone https://github.com/...                    # good — git is not a line magic
-  !python -m granite_switch.composer.compose_granite_switch ...   # good — same reason
-  ```
-
-### 7. Imports
-
-**Consolidate imports into a single cell near the top of the notebook, not scattered across cells.** One dedicated imports cell (typically right after the config cell, or merged with it if config is small) makes dependencies visible at a glance and matches standard Python/Jupyter convention. Readers scanning the notebook can see the full set of external dependencies in one place instead of discovering them cell-by-cell.
-
-**Placement:**
-- Put the imports cell early — after the intro/prerequisites markdown, before the first substantive code section.
-- If the config cell is small (a few constants), imports can live with it; otherwise keep imports in their own cell so neither drowns the other.
-- Group imports conventionally: stdlib first, third-party next, local/project last, with blank lines between groups.
-
-**Narrow exceptions — keep an import local to its cell only when:**
-- The import has a heavy side effect at import time (registers a plugin, mutates global state) and the reader needs to see it happen at that point in the narrative.
-- The import is genuinely optional/conditional (inside a `try/except` or guarded by a feature flag).
-
-**Never** write `# MelleaDocument is used later in §4` above an import as a workaround for scattered placement — consolidate instead. Pointer comments are a smell.
-
-### 8. Comments — earn or delete
-
-Default to writing no comments. Add one only when:
-
-- **The value is non-obvious.** `TOP_K = 20` deserves "balances recall against context budget; mt-rag-benchmark default." `VLLM_PORT = 8000` does not deserve a comment.
-- **The ordering matters and a refactor could break it.** `# Harm check must run BEFORE scope check so harmful+out-of-scope queries are labeled harmful, not merely out-of-scope.` Without this, someone will swap them for "fail fast" and introduce a silent regression.
-- **The value is a knob readers will tune.** `temperature=0.0` deserves "grounded RAG — we want the model to repeat the docs, not paraphrase. Also makes demos reproducible." Someone will bump it to 0.7 and break grounding.
-
-**Do not** write comments that restate what the code says (`# retrieve top-K documents` above `retrieve_top_k_documents(...)`). Delete them.
-
-### 9. Reference tables — every row parallel
-
-Reference tables (like "what `show_intermediates` displays at each step") must have every row in the same shape. If row 1 says `"badge + raw score. Exits early if ≥ 0.5"`, row 4 should say `"badge + verdict string. Exits early if unanswerable"` — same structure, same verb, same vocabulary. Asymmetric rows look like bugs to a reader.
-
-Also: the text in reference tables must match what the code prints. If the code renders `🟢 safe / 🔴 harmful`, the table says `🟢 safe / 🔴 harmful`, not `safe / flagged`.
-
-### 10. Display rendering
-
-- **Use `display(Markdown(...))`** for rich output in notebooks. Don't use ANSI color codes (`\x1b[32m...`) — they render in terminal only and look like garbage in rendered notebooks or exported HTML.
-- **For collapsible detail** (large outputs, reference tables that take vertical space), use `<details open>...<summary>...</summary>...</details>`. Default to `<details open>` for load-bearing content (users can collapse it); use closed `<details>` only for truly optional depth.
-- **Standard emoji glyphs** for status — keep them consistent across the notebook series: ⛔ block/refuse · 🔍 empty/unanswerable · ❓ clarification needed · ✅ pass/done · 🟢/🔴 safe/danger binary · 📄 document · 📚 collection · 🔖 citation/reference.
-
-### 11. Helper scripts (supporting `.py` files)
-
-If the tutorial loads data or does heavy setup in a sibling script:
-
-- **Progress feedback for any operation > 5 seconds.** `tqdm` for downloads (use `httpx.stream()` with `Content-Length`), `tqdm` for batch processing, progress prints for shorter waits. Silent multi-minute operations make users think the notebook froze.
-- **Atomic writes for persistent state.** When writing a file the notebook will later re-read (e.g., extracted jsonl), write to `path.tmp` first, then `os.replace(tmp, path)`. A Ctrl-C mid-write produces a truncated file that silently breaks subsequent runs — one of the worst classes of bug to debug.
-- **Validate non-empty output loudly.** After parsing/loading, if the result has zero rows, raise `RuntimeError` with actionable guidance (`"Delete X and rerun"`), not a silent empty return.
-- **Split timeouts.** `httpx.Timeout(total_seconds, connect=10.0)` instead of a flat `timeout=120`. Fails fast on unreachable servers; patient on slow transfers.
-- **Escalate GPU/CPU warnings.** `print("Notice: ...")` gets lost in notebook output. Use `warnings.warn(...)` with a concrete time estimate ("~10 min on GPU vs. hours on CPU") so users can abort before committing.
-
-### 12. Queries / demos — design intentionally
-
-If the notebook ends with runnable demo cells, the demo set should *tour every exit path* in the system. For a pipeline with {happy path, ambiguous, unanswerable, out-of-scope, harmful} outcomes, include one demo of each. A demo that only shows the happy path teaches half the system.
-
-Add one-line intent comments per demo: `# Q3 — resolves clarification: query rewrite uses history to reconstruct full question`. These teach what each demo *is testing*, beyond what the query text alone conveys.
-
-### 13. Next steps section
-
-Close every notebook with 3–5 bullets pointing the reader somewhere concrete:
-
-1. **Adapt-to-your-app pointer:** name the reusable function/class and remind the reader it's lift-able. `run_pipeline(query, history)` is stateless — copy it as a starting point.
-2. **Go deeper on this topic:** related how-to or tutorial in the repo.
-3. **Extend with custom content:** how to bring your own adapter / corpus / model.
-4. **Library deep-dive:** link to the framework's main repo/docs.
-5. **Browse alternatives:** catalog of other adapters/models the reader could try.
-
-Two bullets that both say "go compose your own model" is one bullet wasted — make sure every bullet opens a *distinct* next direction.
-
-**Inter-notebook wiring rule:**
-
-The granite-switch tutorial set uses descriptive filenames (no numeric prefixes), so wiring is judged by *content*, not by index. Two principles for the next-steps bullets:
-
-1. **The producer is reachable from every consumer.** `compose_granite_switch.ipynb` produces the checkpoint that every other notebook consumes. Every notebook except the producer itself should include a "compose your own checkpoint" bullet pointing to it. A reader who lands on any consumer should be one click from the producer - they shouldn't have to discover it by reading every sibling.
-2. **Don't link backward to softer-intro notebooks.** If a notebook is a deeper or harder version of another (e.g., `granite_switch_with_hf.ipynb` is the long-form version of `hello_adapter.ipynb`; `rag_flow.ipynb` is the long-form of `rag_101.ipynb`), the long-form should *not* link back to its softer sibling - the reader already passed it. The softer notebook *can* link forward to the long-form.
-
-Every notebook should also link to whatever logical follow-ups exist for the reader (the next pipeline to try, the comparison/race demo, the framework's main repo). Three to five bullets is the right shape - see section 13's general rules.
-
-**Use same-directory relative paths** (`./name.ipynb`) when all notebooks live in one folder - not `../notebooks/name.ipynb`. After editing, run a link-resolution check to catch typos and stale filenames:
-
-```python
-import json, re, pathlib
-nbdir = pathlib.Path("tutorials/notebooks")
-for nb_path in sorted(nbdir.glob("*.ipynb")):
-    nb = json.loads(nb_path.read_text())
-    for c in nb["cells"]:
-        if c["cell_type"] != "markdown": continue
-        src = "".join(c["source"]) if isinstance(c["source"], list) else c["source"]
-        if "Next steps" not in src: continue
-        for href in re.findall(r"\]\((\./[^)]+\.ipynb)\)", src):
-            assert (nb_path.parent / href).resolve().exists(), f"{nb_path.name}: broken {href}"
-print("all next-steps links resolve")
-```
-
-When notebooks get renamed or split, the next-steps sections of *every other notebook in the series* go stale silently. Always re-run the link check after any rename. The repo also has a `validate-links` skill that runs this check across notebooks and markdown together - prefer it for cross-cutting validation.
-
-### 14. Links
-
-- **Every external link:** verify with `curl -s -o /dev/null -w "%{http_code}" <url>`. Expect 200/301/302.
-- **Every internal link:** verify the file exists on disk. Relative paths should work from the notebook's directory (notebooks live in `tutorials/notebooks/`, so `../PREREQUISITES.md` resolves to `tutorials/PREREQUISITES.md`).
-- **Anchor links:** Markdown lowercases heading text to form anchors. `## Prerequisites` produces `#prerequisites`, not `#Prerequisites`. Check every in-notebook anchor.
-
-### 15. Prose — light touch
-
-For prose-clarity passes: fix stale references (section counts that have changed, helper function signatures that have changed), tighten walls of text (split 90-word single paragraphs into two), and make sure section intros say *what* and *why*, not just the section name restated. Don't rewrite prose that's already clear — every unnecessary change is a chance to introduce a regression.
-
----
-
-## Template skeleton (scaffold mode)
-
-For a brand-new notebook, start from this and fill it in. Delete sections that truly don't apply (e.g., no diagram if the flow is linear and has one step).
-
-```markdown
-# <Title — what the notebook accomplishes>
-
-**Duration:** ~N min (first run)
-
-This notebook demonstrates <one-sentence concrete pitch>. <One more sentence on scope.>
-
-*Why <key choice>:* <one-line constraint explanation, if the choice isn't self-evident>
-
-**What you'll learn:**
-- How to build <concrete deliverable - "a 7-turn conversation that exercises every step", "a composed model checkpoint with two adapters", etc.>
-- <Transferable takeaway 1>
-- <Transferable takeaway 2>
-
-**Adapters used:** intrinsics from the [<Library>](<hf-url>) library (`<adapter_1>`, `<adapter_2>`)<, and the [<Library>](<hf-url>) library (`<adapter_3>`)>.
-
-## Prerequisites
-
-1. **Install dependencies** (<GPU? CPU? which>):
-   ```bash
-   pip install "<extras>"
-   ```
-2. **Get <artifact>.** <How to obtain it, pointer to a ready-made option, pointer to a "compose your own" tutorial.>
-3. **Start <service>** (if applicable):
-   ```bash
-   <start command>
-   ```
-4. **Verify:** `<verification command>`
-
-<Pointer to softer-intro notebook if one exists, pointer to PREREQUISITES.md for depth.>
-```
-
-Then a second markdown cell with (if applicable) an intrinsics/components table and the diagram (image attachment — see section 5). Then numbered `## 1 · Section`, `## 2 · Section`, etc., each with a one-line intro markdown cell before its code. End with `## N · Next steps`.
-
----
-
-## When working on an existing notebook
-
-1. **Read the whole thing first.** Don't edit until you understand the arc: what it teaches, what the demos tour, what the reader is expected to walk away with.
-2. **Identify the real bugs before the polish.** Run through section 1 of the checklist. A broken import is worth ten prose tweaks.
-3. **Propose changes one at a time.** Each should be justifiable in one sentence. If you can't justify it, don't do it.
-4. **After each change:** verify JSON validity, syntax, and that the diff scope matches what was planned. Show the diff.
-5. **When multiple tasks conflict:** defer to the principle that serves the first-time reader. For example, a reader scanning a notebook benefits from one consolidated imports cell (full dependency list visible at a glance) more than from imports scattered next to their use sites.
-
----
-
-## Universal anti-patterns — push back if the user asks for these
-
-- **Classes that are just namespaces.** `class Show: @staticmethod def answer(r): ...` adds ceremony. Use functions in cells.
-- **Scattering imports across cells near their first use.** Hurts scan-ability; readers lose the single-glance view of what the notebook depends on. Consolidate into one imports cell near the top.
-- **Collapsing load-bearing reference material behind closed `<details>`.** `<details open>` is fine; closed is only for optional depth.
-- **Splitting "What you'll build" out as its own section.** Use one consolidated "What you'll learn" list; if the concrete deliverable is load-bearing, make it the first bullet phrased as a learning outcome (`"How to build <deliverable>"`) rather than a separate header.
-- **Renaming a section just because another notebook uses a different name.** Template consistency matters, but forcing "Prerequisites" when the existing "Before you start" reads better locally is cargo-cult.
-- **Numbering every heading mechanically.** "Next steps" as "`## 6 · Next steps`" vs. unnumbered `## Next steps` is a style call, not a correctness one. Only enforce when the user has said they want strict numbering.
-
----
-
-## Verification checklist (before declaring done)
-
-- [ ] Notebook is valid JSON: `python3 -c "import json; json.load(open(PATH))"`.
-- [ ] Every code cell parses: walk cells, `ast.parse(source)` each one.
-- [ ] Structural overview (`for i, c in enumerate(nb['cells']): print(i, c['cell_type'], first_line)`) shows one H1, numbered H2s, H3s only under H2s that have them, code-cell length sensible (no >120-line monsters unless justified).
-- [ ] All external URLs return 2xx/3xx.
-- [ ] All internal links point at files/anchors that exist.
-- [ ] Reference tables' badge glyphs match the code's actual print statements.
-- [ ] Diagram terminals match the code's actual exit names (`blocked`, `unanswerable`, etc.).
-- [ ] Intro cell has an **Adapters used:** callout naming every adapter library the notebook actually invokes, each linked to its HuggingFace repo.
-- [ ] No `!pip install` / `!pip uninstall` / `!conda install` lines anywhere — installer lines use `%pip` / `%conda` so they target the running kernel.
-- [ ] Imports are consolidated into one cell near the top, not scattered across cells next to their first use (narrow exceptions: side-effectful or conditional imports — see section 7).
-- [ ] Running the notebook top-to-bottom with "Run All" should complete cleanly (requires the runtime environment — document this as a manual step for the user).
-
-If any of these fails, fix before handing off. The skill is "produce a notebook that runs cleanly on first try for a cold reader" — missing verification undermines the whole exercise.
diff --git a/.claude/skills/validate-links/SKILL.md b/.claude/skills/validate-links/SKILL.md
deleted file mode 100644
index c303b3c..0000000
--- a/.claude/skills/validate-links/SKILL.md
+++ /dev/null
@@ -1,388 +0,0 @@
----
-name: validate-links
-description: Validate local file links AND first-party Python imports across an entire repo (notebooks and markdown) and propose fixes for broken targets. Catches the kind of breakage that happens after renames, renumbering, or directory moves -- `[foo](./old_name.ipynb)` style links that silently 404 from GitHub/Colab/nbviewer, plus `from pkg.old_module import ...` imports that fail at notebook runtime. Use when the user asks to "validate links", "check links", "audit links", "verify links", "find broken links", "validate imports", or "make sure filenames align" after any restructuring.
----
-
-# Link Validation Skill
-
-Find every local link in the repo's `.ipynb` and `.md` files, flag the ones whose targets don't exist on disk, and propose fixes. Also validate first-party Python imports inside notebook code cells and `.py` files -- a `from granite_switch.tutorials.rag_display import ask` line silently breaks the same way a stale markdown link does when a module moves or gets renamed. Read-only by default; fixes happen only after the user confirms.
-
-## What counts as a "local link"
-
-- Markdown link syntax `[text](target)` where `target` does **not** start with `http://` or `https://`.
-- Targets that point at a file with extension `.ipynb`, `.md`, `.py`, `.png`, `.jpg`, `.svg`, `.json`, or `.sh` (extend the list if the repo uses others — ask if unsure).
-- Inside `.ipynb`: only **markdown cells** (`cell_type == "markdown"`). Code cells are skipped — strings inside Python aren't links.
-- `attachment:` references (notebook-embedded images) are **not** local file links — skip them.
-
-## What counts as a "stale label"
-
-Display labels that look like a filename — `[`old_name.ipynb`](new_name.ipynb)`, `[old_name.ipynb](new_name.ipynb)`, or `[Title (old_name.ipynb)](new_name.ipynb)` — but where the filename in the label doesn't match the URL. After a rename, the URL often gets fixed while the label keeps the old name and silently lies about what the link points at. Treat these as fixable in the same pass as broken targets.
-
-A label is considered "filename-shaped" when it contains a token that ends in one of the tracked extensions (`.ipynb`, `.md`, `.py`, ...). Plain prose labels like `"the simple pipeline"` are not stale even if the URL changes -- only fix labels that purport to name the file.
-
-## What counts as a "broken import"
-
-A first-party Python import is **broken** when its dotted module path does not resolve to a file or package on disk under the repo's package roots. Example: `from granite_switch.tutorials.rag_display import ask` is valid only when `<root>/granite_switch/tutorials/rag_display.py` (or `<root>/granite_switch/tutorials/rag_display/__init__.py`) exists, where `<root>` is one of the configured package roots (typically `src/` for src-layout repos, or `.` otherwise).
-
-Scope:
-
-- Only **first-party** packages count. Determine the set of first-party top-level package names by listing the immediate child directories of each package root that contain an `__init__.py` (e.g., `src/granite_switch/` -> first-party name `granite_switch`). Imports whose top-level name is not in this set (`numpy`, `torch`, `os`, `json`, ...) are skipped -- this is not a substitute for a real linter.
-- Both forms are checked: `from A.B.C import name` and `import A.B.C [as alias]`.
-- Inside `.ipynb`: only **code cells** (`cell_type == "code"`). Skip cells whose first non-empty source line is a Jupyter magic (`%`, `%%`, `!`).
-- Inside `.py`: parse with `ast.parse`; this naturally ignores strings and comments and handles multi-line imports, parenthesized import lists, and relative imports. Relative imports (`from .foo import x`) are resolved against the importing file's package and are checked the same way.
-- A dotted path resolves if walking it from a package root lands on a directory with `__init__.py` at every intermediate step and a `.py` file or package directory at the leaf. The imported names themselves (`ask`, `show_answer`) are **not** verified -- that needs real import-time analysis.
-
-Discovering package roots:
-
-1. If `pyproject.toml` has `[tool.setuptools.packages.find] where = ["src"]` (or `[tool.hatch.build.targets.wheel] packages = ["src/foo"]`, or `[tool.poetry] packages = [{ include = "foo", from = "src" }]`), use those.
-2. Otherwise default to `.` and `src/` if either contains a top-level dir with `__init__.py`.
-3. If the user's repo uses a layout the heuristic misses, ask before guessing.
-
-## Workflow
-
-### 1. Discover
-
-Run from the repo root:
-
-- List all `.ipynb`, `.md`, and `.py` files (respect `.gitignore` -- use `git ls-files '*.ipynb' '*.md' '*.py'` so you don't audit vendored copies in `node_modules/`, `.venv/`, etc.).
-- Build a set of every existing file path in the repo (`git ls-files`) -- this is what link targets are checked against.
-- Determine the import package roots and first-party package names per "What counts as a broken import" above. If `pyproject.toml` is missing or unreadable, fall back to `.` + `src/` and report which roots/packages were used so the user can correct the assumption.
-
-### 2. Scan
-
-For each file:
-
-- For `.md` -- read the raw text and run the **link** scan only.
-- For `.ipynb` -- parse JSON, iterate `cells`. Run the **link** scan against `markdown` cells (join `source` to a string). Run the **import** scan against `code` cells (skip cells whose first non-empty line is a `%`, `%%`, or `!` magic; otherwise concatenate `source` and feed to the import scanner).
-- For `.py` -- run the **import** scan only (parse with `ast.parse`; record the line number from the AST node for reporting).
-
-**Link scan:**
-
-- Run the link regex `\[([^\]]+)\]\(([^)]+)\)` against the text.
-- For each match, take the target, drop any `#anchor` fragment, resolve the path **relative to the file's directory** (so `../foo.md` from `tutorials/notebooks/x.ipynb` resolves to `tutorials/foo.md`).
-- A target is **broken** if the resolved path doesn't exist on disk.
-- Independently, flag the link as having a **stale label** when:
-  - the label contains a filename-shaped token (ends in a tracked extension), AND
-  - that token is not equal to `Path(target).name` (the basename of the URL, after stripping any `#anchor`).
-
-  Stale labels are reported even when the target itself resolves cleanly -- the URL works, but the label lies about what it points at.
-
-**Import scan:**
-
-- For notebook code cells, parse the joined source with `ast.parse`. Wrap in `try/except SyntaxError` and skip cells that fail to parse (rare, usually transient half-edited cells); note the cell index in the skip log so the user can investigate.
-- For `.py` files, parse the whole file the same way.
-- Walk `ast.Import` and `ast.ImportFrom` nodes. For each, build the dotted module path:
-  - `import a.b.c` -> `a.b.c` per alias.
-  - `from a.b import c, d` -> check `a.b` resolves; the imported names are not verified, but if `a.b.c` *also* resolves as a submodule path, prefer that interpretation when reporting (it makes the suggestion more specific).
-  - `from . import x` and `from .. import x` -> compute the absolute package by walking up from the file's package, then check that resolves.
-- Filter to first-party top-level names. Skip everything else.
-- A first-party dotted path is **broken** when no package root contains a matching directory-or-file chain (intermediate dirs need `__init__.py`; leaf can be either `<name>.py` or `<name>/__init__.py`).
-
-### 3. Report
-
-Present a single report grouped by source file, in this shape:
-
-```
-BROKEN LINKS
-
-tutorials/notebooks/00_hello_adapter.ipynb (cell 0)
-  ./hello_mellea.ipynb                          -> closest match: 01_hello_mellea.ipynb
-  ../notebooks/03_compose_granite_switch.ipynb  -> closest match: 04_compose_granite_switch.ipynb
-
-docs/SOMETHING.md (line 42)
-  ../old/path/file.md                           -> no close match found
-
-STALE LABELS (target works, but the label names the wrong file)
-
-tutorials/README.md (line 15)
-  [03_01_old_name.ipynb](notebooks/03_01_new_name.ipynb)
-    -> label should be `03_01_new_name.ipynb`
-
-BROKEN IMPORTS (first-party module path does not resolve on disk)
-
-tutorials/notebooks/03_01_rag_101.ipynb (cell 4)
-  from granite_switch.tutorials.rag_displays import ask
-    -> closest match: granite_switch.tutorials.rag_display
-
-src/granite_switch/composer/old_helpers.py (line 12)
-  from granite_switch.composer.weight_remap import AdapterRemapper
-    -> closest match: granite_switch.composer.weight_remapper
-
-(package roots used: src/  |  first-party packages: granite_switch)
-```
-
-For each broken link, compute a "closest match" by:
-
-1. Take the basename of the broken target (`hello_mellea.ipynb`).
-2. Among all existing files in the repo with the same extension, prefer the one whose basename has the smallest edit distance (or contains the broken basename as a substring, or vice versa). Renumbering cases -- `03_compose_x.ipynb` vs `04_compose_x.ipynb` -- should match strongly.
-3. If no candidate is closer than ~50% similar, report "no close match found" rather than guessing.
-
-For each broken import, compute a closest match against the set of all valid first-party dotted paths (every `.py` file and package directory under each package root, expressed in dotted form). Use the same edit-distance heuristic, but match on the **full dotted path**, not just the leaf, so `granite_switch.tutorials.rag_displays` correctly suggests `granite_switch.tutorials.rag_display` rather than some unrelated `rag_display` elsewhere. As with links, suppress suggestions weaker than ~50% similar.
-
-Also note when a `BROKEN` link or import has multiple plausible matches (e.g., `02_govt_rag_pipeline.ipynb` is gone and the repo now has `03_01_*`, `03_02_*`, `03_03_*`) -- list them all and ask the user which one to use.
-
-### 4. Propose fixes
-
-After the report, ask the user:
-
-- **High-confidence renames** (single obvious match, just a number prefix change): show the exact replacements you'd make as a list and ask for approval as a batch.
-- **Ambiguous cases**: ask one question per ambiguous link or import, presenting candidates as options.
-- **No-match cases**: ask whether to drop the link/import, leave it, or point it somewhere else.
-- **Stale labels**: include label-only fixes in the same approval batch as the URL fixes. When a single broken link has both a broken URL *and* a stale label (common after a rename), propose fixing both at once - the user shouldn't have to approve the URL, run the skill again, and approve the label separately. Default to "yes, fix labels too" for filename-shaped labels; only ask separately when the label is something other than a bare filename (e.g. a sentence that happens to mention the old filename).
-- **Broken imports**: treat the same as broken links. High-confidence module-rename fixes (`weight_remap` -> `weight_remapper`) go in the batch; ambiguous ones become individual questions. When the same broken import appears in many files, propose a single repo-wide find-and-replace for that exact `from ... import` / `import ...` line and apply it everywhere at once -- a typo'd module name is almost never correct in one file and wrong in another.
-
-Do not edit anything until the user confirms.
-
-### 5. Apply fixes
-
-For `.md` and `.py` files, use `Edit` with a precise `old_string` that includes enough surrounding context to be unique.
-
-For `.ipynb` files, use `NotebookEdit` - `Edit` will refuse on notebooks. You'll need the cell's `id`, which you already saw in step 2; pass it as `cell_id`. Replace the full cell `new_source` with the corrected text.
-
-When a cell has multiple fixes pending (broken URL + stale label, or several broken imports, or a mix), apply them in the **same** `NotebookEdit`/`Edit` call. Two passes through the same cell wastes tool calls and risks the second edit racing a linter that reformats the file between Reads.
-
-After each edit, do not re-read the file - `NotebookEdit`/`Edit` errors loudly if the change failed.
-
-### 6. Verify
-
-Re-run the scanner from step 2. The report should show **0 broken links**, **0 stale labels**, *and* **0 broken imports**. If it still shows some, investigate - don't declare done. As a belt-and-braces check, also `git ls-files | xargs grep -l <old_token>` for any string fragment that was renamed (e.g. `govt_rag`); the scanner only catches strings inside `[...](...)` syntax and parsed `import` statements, and the same token may appear elsewhere (Colab badge URLs, prose, code comments, dynamic `importlib.import_module(...)` calls) where it's just as broken.
-
-## Reference scanner
-
-This Python snippet implements steps 1-3 and is safe to copy verbatim into a `Bash` call:
-
-```python
-import json, re, subprocess
-from pathlib import Path
-
-repo = Path('.').resolve()
-tracked = subprocess.check_output(
-    ['git', 'ls-files'], cwd=repo, text=True
-).splitlines()
-existing = {(repo / p).resolve() for p in tracked}
-# Directories that contain tracked files - so dir-style links like
-# `[scripts/](scripts/)` or `[docs/](../docs/)` aren't false-positived.
-existing_dirs = set()
-for p in tracked:
-    for parent in (repo / p).resolve().parents:
-        existing_dirs.add(parent)
-
-link_re = re.compile(r'\[([^\]]+)\]\(([^)]+)\)')
-ext_ok = {'.ipynb', '.md', '.py', '.png', '.jpg', '.jpeg', '.svg', '.json', '.sh'}
-
-def scan_text(text, source_path, source_label):
-    """Return (broken, stale_labels) tuples for one file.
-
-    broken       : (source_label, target, basename)
-    stale_labels : (source_label, label_text, target, expected_label_token)
-    """
-    broken = []
-    stale = []
-    # Token in the label that looks like a filename (ends in a tracked ext).
-    label_filename_re = re.compile(
-        r'[\w./-]+\.(?:ipynb|md|py|png|jpg|jpeg|svg|json|sh)\b',
-        re.IGNORECASE,
-    )
-    for m in link_re.finditer(text):
-        label_text = m.group(1)
-        target = m.group(2).strip()
-        if target.startswith(('http://', 'https://', 'mailto:', '#', 'attachment:')):
-            continue
-        bare = target.split('#')[0].split('?')[0]
-        if not bare:
-            continue
-        ext = Path(bare).suffix.lower()
-        if ext and ext not in ext_ok:
-            continue
-        resolved = (source_path.parent / bare).resolve()
-        target_basename = Path(bare).name
-        target_ok = resolved in existing or (not ext and resolved in existing_dirs)
-        if not target_ok:
-            broken.append((source_label, target, target_basename))
-            # Don't double-report a broken link as also having a stale label;
-            # fixing the URL is the load-bearing part. The label gets fixed
-            # in the same edit per "What counts as a stale label" guidance.
-            continue
-        # Target resolves. Now check whether the label names a *different* file.
-        for tok_match in label_filename_re.finditer(label_text):
-            label_token = tok_match.group(0).split('/')[-1]
-            if label_token != target_basename:
-                stale.append((source_label, label_text, target, target_basename))
-                break
-    return broken, stale
-
-broken = []
-stale = []
-for rel in tracked:
-    p = repo / rel
-    if not p.exists():
-        continue
-    if p.suffix == '.md':
-        b, s = scan_text(p.read_text(), p, rel)
-        broken += b; stale += s
-    elif p.suffix == '.ipynb':
-        try:
-            data = json.loads(p.read_text())
-        except Exception:
-            continue
-        for ci, cell in enumerate(data.get('cells', [])):
-            if cell.get('cell_type') != 'markdown':
-                continue
-            src = ''.join(cell.get('source', []))
-            b, s = scan_text(src, p, f'{rel} (cell {ci})')
-            broken += b; stale += s
-
-print('BROKEN LINKS')
-for label, target, _ in broken:
-    print(f'{label}\n  {target}')
-print(f'\n{len(broken)} broken link(s)')
-
-print('\nSTALE LABELS')
-for label, ltext, target, expected in stale:
-    print(f'{label}\n  [{ltext}]({target})  -> label should name {expected}')
-print(f'\n{len(stale)} stale label(s)')
-```
-
-For closest-match suggestions, extend the script to compute `difflib.get_close_matches(basename, [Path(f).name for f in tracked if Path(f).suffix == ext], n=3, cutoff=0.5)`.
-
-## Reference import scanner
-
-Drop-in companion to the link scanner above. Run from the repo root after the `tracked` / `existing` sets are built:
-
-```python
-import ast, json, tomllib
-from pathlib import Path
-
-def discover_package_roots(repo: Path):
-    """Return (roots, first_party_names) using pyproject.toml when possible."""
-    roots: list[Path] = []
-    pyproject = repo / 'pyproject.toml'
-    if pyproject.exists():
-        cfg = tomllib.loads(pyproject.read_text())
-        find = cfg.get('tool', {}).get('setuptools', {}).get('packages', {}).get('find', {})
-        for w in find.get('where', []) or []:
-            roots.append((repo / w).resolve())
-        # hatch / poetry / flit fallbacks omitted for brevity -- add as needed.
-    if not roots:
-        for cand in ('.', 'src'):
-            p = (repo / cand).resolve()
-            if p.exists():
-                roots.append(p)
-    first_party = set()
-    for r in roots:
-        if not r.exists():
-            continue
-        for child in r.iterdir():
-            if child.is_dir() and (child / '__init__.py').exists():
-                first_party.add(child.name)
-    return roots, first_party
-
-def module_resolves(dotted: str, roots: list[Path]) -> bool:
-    parts = dotted.split('.')
-    for root in roots:
-        cur = root
-        ok = True
-        for i, part in enumerate(parts):
-            is_last = i == len(parts) - 1
-            pkg_dir = cur / part
-            if pkg_dir.is_dir() and (pkg_dir / '__init__.py').exists():
-                cur = pkg_dir
-                continue
-            if is_last and (cur / f'{part}.py').exists():
-                return True
-            ok = False
-            break
-        if ok:
-            return True
-    return False
-
-def resolve_relative(file_path: Path, level: int, module: str | None, roots: list[Path]) -> str | None:
-    """Turn `from ..foo.bar import x` into an absolute dotted path, or None if outside any package root."""
-    for root in roots:
-        try:
-            rel = file_path.resolve().relative_to(root)
-        except ValueError:
-            continue
-        # Drop the file name; walk up `level` package boundaries.
-        pkg_parts = list(rel.parts[:-1])
-        if level - 1 > len(pkg_parts):
-            return None
-        base = pkg_parts[: len(pkg_parts) - (level - 1)] if level > 1 else pkg_parts
-        tail = module.split('.') if module else []
-        return '.'.join(base + tail)
-    return None
-
-def scan_imports(source: str, source_label: str, file_path: Path,
-                 roots: list[Path], first_party: set[str]) -> list[tuple[str, str, int]]:
-    try:
-        tree = ast.parse(source)
-    except SyntaxError:
-        return []
-    out = []
-    for node in ast.walk(tree):
-        if isinstance(node, ast.Import):
-            for alias in node.names:
-                top = alias.name.split('.')[0]
-                if top in first_party and not module_resolves(alias.name, roots):
-                    out.append((source_label, f'import {alias.name}', node.lineno))
-        elif isinstance(node, ast.ImportFrom):
-            if node.level:  # relative
-                dotted = resolve_relative(file_path, node.level, node.module, roots)
-                if dotted is None:
-                    continue
-            else:
-                dotted = node.module or ''
-            if not dotted:
-                continue
-            top = dotted.split('.')[0]
-            if top not in first_party:
-                continue
-            # Prefer the more specific submodule form when a single `name` is imported
-            # and `dotted.name` itself resolves -- it makes the suggestion sharper.
-            if len(node.names) == 1 and module_resolves(f'{dotted}.{node.names[0].name}', roots):
-                continue
-            if not module_resolves(dotted, roots):
-                names = ', '.join(a.name for a in node.names)
-                out.append((source_label, f'from {dotted} import {names}', node.lineno))
-    return out
-
-# Usage:
-# roots, first_party = discover_package_roots(repo)
-# broken_imports: list[tuple[str, str, int]] = []
-# for rel in tracked:
-#     p = repo / rel
-#     if p.suffix == '.py':
-#         broken_imports += scan_imports(p.read_text(), rel, p, roots, first_party)
-#     elif p.suffix == '.ipynb':
-#         data = json.loads(p.read_text())
-#         for ci, cell in enumerate(data.get('cells', [])):
-#             if cell.get('cell_type') != 'code':
-#                 continue
-#             src_lines = cell.get('source', [])
-#             if not src_lines:
-#                 continue
-#             first_nonblank = next((l for l in src_lines if l.strip()), '')
-#             if first_nonblank.lstrip().startswith(('%', '!')):
-#                 continue
-#             src = ''.join(src_lines)
-#             broken_imports += scan_imports(src, f'{rel} (cell {ci})', p, roots, first_party)
-```
-
-For closest-match import suggestions, build the set of all valid first-party dotted paths once (every `.py` file and package dir under the roots, expressed as dotted form), then `difflib.get_close_matches(broken_dotted, valid_dotted, n=3, cutoff=0.5)`.
-
-## Hard rules
-
-- **Never edit before the user confirms.** Even "obvious" renumbering fixes go through approval.
-- **Never delete a link target or import.** If a broken link or import has no plausible replacement, ask the user -- don't silently strip the link or comment out the import.
-- **Don't string-search code cells for links.** A string `"./old_name.ipynb"` inside a Python cell might be load-bearing test data, not a link. Code cells are scanned with `ast` for **imports only**, never with the link regex.
-- **Don't follow symlinks blindly.** If `git ls-files` lists a symlink, treat the symlink path as the file location for resolution purposes.
-- **Don't audit vendored trees.** `git ls-files` already excludes them; do not fall back to `find` or `glob` that would re-include `.venv/`, `node_modules/`, `dist/`, etc.
-- **Fix labels alongside URLs in the same edit.** When a link's target is renamed, the display label often becomes stale at the same time. Don't make the user run the skill twice - propose URL + label fixes together, apply them in one `Edit`/`NotebookEdit` call per cell, and only break the work into separate approvals when a label is genuinely ambiguous (e.g. prose, not a bare filename).
-- **Only validate first-party imports.** `numpy`, `torch`, stdlib, etc. are out of scope -- this skill is checking whether the repo's own module paths still resolve after a rename, not running a real linter.
-- **Don't verify imported names.** `from pkg.mod import some_name` is checked only at the module level (`pkg.mod`). Confirming `some_name` actually exists requires importing the module, which is out of scope.
-
-## When NOT to use this skill
-
-- The user is asking about external URL liveness (HTTP 200 / 404) -- that's a different tool (link-checker against the network).
-- The user wants to audit cross-references inside a single notebook (e.g., section anchors) -- that's narrower and the regex above won't cover it.
-- The user wants a full static type/name check (verify imported attributes exist, catch unused imports, flag third-party version mismatches) -- use a real linter (`ruff`, `pyright`, `mypy`). This skill only checks that first-party module *paths* resolve on disk.

From 10b1e50910310532fab2eea858a742fc447fc0ca Mon Sep 17 00:00:00 2001
From: Alon Freund <alonf@il.ibm.com>
Date: Wed, 27 May 2026 15:09:13 +0300
Subject: [PATCH 10/12] Remove skill references until skills are added back

---
 docs/GIT_WORKFLOW.md | 6 ------
 tutorials/CLAUDE.md  | 8 --------
 2 files changed, 14 deletions(-)

diff --git a/docs/GIT_WORKFLOW.md b/docs/GIT_WORKFLOW.md
index 06e3bbb..d1a4ff0 100644
--- a/docs/GIT_WORKFLOW.md
+++ b/docs/GIT_WORKFLOW.md
@@ -66,12 +66,6 @@ Before committing:
 2. **Check comments match code** — stale comments are worse than no comments
 3. **Update docs** if behavior changed
 
-## Before opening a PR that touches notebooks or docs
-
-Run `/validate-links` to catch broken local links, stale labels, and broken first-party imports
-introduced by any renames or restructuring. It scans all `.ipynb`, `.md`, and `.py` files and
-proposes fixes before anything goes to reviewers.
-
 ## Pull Requests
 
 - Target the `main` branch
diff --git a/tutorials/CLAUDE.md b/tutorials/CLAUDE.md
index 57ebd2c..7306b5c 100644
--- a/tutorials/CLAUDE.md
+++ b/tutorials/CLAUDE.md
@@ -30,14 +30,6 @@ Use cell id `hf-login-call` for consistency.
 Add `# Estimated duration: ~2 min on A100, ~7 min on T4` to cells that download models or
 launch vLLM. Put these in **notebook cells only** — not in code files under `src/`.
 
-## Skills
-
-- `/validate-links` — run before any PR that renames, moves, or restructures notebooks or docs.
-  Scans all `.ipynb`/`.md`/`.py` files for broken local links, stale labels, and broken
-  first-party imports. Proposes fixes; never edits without confirmation.
-- `/tutorial-notebook` — run when creating or polishing a notebook. Applies a 15-item checklist
-  (structure, bugs, imports, comments, diagrams, demo coverage, next-steps wiring).
-
 ## Utility Modules
 
 These live in `src/granite_switch/tutorials/` and are imported by notebooks:

From 00bbea50a72020fede0cdbee251f20c34afa09f7 Mon Sep 17 00:00:00 2001
From: AlonMalach <alonmalach@gmail.com>
Date: Wed, 27 May 2026 15:52:42 +0300
Subject: [PATCH 11/12] Drop scratch/ convention from shared CLAUDE.md

scratch/ is gitignored, which makes it a per-developer convention rather than
a project rule. Mandating it in the shared CLAUDE.md presumes every developer
wants that workflow. The load-bearing rule for the project is "don't put
throwaway scripts in tests/" (because pytest tests/ would pick them up); the
scratch/ recommendation is just one possible workaround. Anyone who wants
that convention can put it in their own CLAUDE.local.md.

Removes two mentions: the Project Structure bullet and the "Use `scratch/` instead"
sentence in the Test Files section.
---
 CLAUDE.md | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index d3f629a..1326482 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -12,7 +12,6 @@ Key layout rules — full tree via `find src/` or `find tests/`:
 
 - `src/granite_switch/` — unified package; `composer/`, `hf/`, `vllm/` match the optional extras
 - `tests/` — official test suite only; subdirs: `unit/`, `hf/`, `vllm/`, `composer/`, `integration/`, `regression/`, `shared/`
-- `scratch/` — gitignored; use this for throwaway diagnostic scripts (not `tests/`)
 - `tutorials/` — notebooks and guides; see `tutorials/CLAUDE.md` for conventions
 
 ## Installation (local/dev)
@@ -36,9 +35,9 @@ pip install -e ".[hf,compose]"  # HF + composer only (no vLLM)
 ### Test Files (Python)
 
 **`tests/` is for official regression tests ONLY.** Do NOT place throwaway diagnostic,
-debugging, or exploratory scripts in `tests/`. Use `scratch/` instead (it is gitignored).
-Running `pytest tests/` should only execute curated, maintained tests — never one-off
-investigations. Subdirectories are listed in Project Structure above.
+debugging, or exploratory scripts in `tests/` — `pytest tests/` should only execute
+curated, maintained tests, never one-off investigations. Subdirectories are listed in
+Project Structure above.
 
 ### Documentation Naming
 

From dc04aeaf271355790b847580d3861301fc3b8de8 Mon Sep 17 00:00:00 2001
From: Alon Freund <alonf@il.ibm.com>
Date: Wed, 27 May 2026 17:08:29 +0300
Subject: [PATCH 12/12] Address Yair's review: revert out-of-scope doc fixes,
 remove granite-4.0-micro

- Revert README Python/PyTorch versions and [dev] description (moving to separate PR)
- Revert PREREQUISITES.md RAG adapter count (moving to separate PR)
- Remove granite-4.0-micro from SUPPORTED_MODELS.md per review comment
---
 README.md                  | 4 ++--
 docs/SUPPORTED_MODELS.md   | 1 -
 tutorials/PREREQUISITES.md | 2 +-
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 7429de0..6c0e673 100644
--- a/README.md
+++ b/README.md
@@ -48,10 +48,10 @@ Other install options depending on your use case:
 pip install "granite-switch[compose]"   # Compose modular models
 pip install "granite-switch[hf]"        # HuggingFace inference
 pip install "granite-switch[vllm20]"    # vLLM 0.20+ (requires CUDA 13+)
-pip install "granite-switch[dev]"       # HF + vLLM 0.19.x + compose + tests
+pip install "granite-switch[dev]"       # Everything
 ```
 
-Requires Python 3.11+ and PyTorch 2.10+. Two vLLM backends are available: `.[vllm]` for broad CUDA 12.x compatibility (0.19.x), and `.[vllm20]` for the latest performance improvements (CUDA 13+).
+Requires Python 3.9+ and PyTorch 2.0+. Two vLLM backends are available: `.[vllm]` for broad CUDA 12.x compatibility (0.19.x), and `.[vllm20]` for the latest performance improvements (CUDA 13+).
 
 ### Compose a Model
 
diff --git a/docs/SUPPORTED_MODELS.md b/docs/SUPPORTED_MODELS.md
index 9a28e2b..41f6c66 100644
--- a/docs/SUPPORTED_MODELS.md
+++ b/docs/SUPPORTED_MODELS.md
@@ -22,7 +22,6 @@ as a base model. The table below lists representative examples.
 |---|---|---|
 | `ibm-granite/granite-4.1-3b` | 3B | Dense, instruct |
 | `ibm-granite/granite-4.1-8b` | 8B | Dense, instruct |
-| `ibm-granite/granite-4.0-micro` | 3B | Dense, instruct |
 | `ibm-granite/granite-4.1-30b` | 30B | Dense, instruct |
 
 Base variants (`granite-4.1-3b-base`, `granite-4.1-8b-base`) are also supported.
diff --git a/tutorials/PREREQUISITES.md b/tutorials/PREREQUISITES.md
index a1bc7a7..aa75d0e 100644
--- a/tutorials/PREREQUISITES.md
+++ b/tutorials/PREREQUISITES.md
@@ -84,7 +84,7 @@ Official IBM Granite adapter libraries (r1.0):
 
 | Library | Adapters | Purpose |
 |---------|----------|---------|
-| [ibm-granite/granitelib-rag-r1.0](https://huggingface.co/ibm-granite/granitelib-rag-r1.0) | 6 | RAG adapters (rewrite, answerability, citations, etc.) |
+| [ibm-granite/granitelib-rag-r1.0](https://huggingface.co/ibm-granite/granitelib-rag-r1.0) | 5 | RAG adapters (rewrite, answerability, citations, etc.) |
 | [ibm-granite/granitelib-core-r1.0](https://huggingface.co/ibm-granite/granitelib-core-r1.0) | 3 | Core adapters (certainty, requirements, attributions) |
 | [ibm-granite/granitelib-guardian-r1.0](https://huggingface.co/ibm-granite/granitelib-guardian-r1.0) | 4 | Guardian adapters (harm check, policy, factuality, etc.) |