Skip to content

Update claude md#78

Open
freunda wants to merge 12 commits into
mainfrom
feature/trim-claude-md
Open

Update claude md#78
freunda wants to merge 12 commits into
mainfrom
feature/trim-claude-md

Conversation

@freunda
Copy link
Copy Markdown
Collaborator

@freunda freunda commented May 27, 2026

Summary

  • Trim CLAUDE.md (~393 → ~200 lines): removed sections that duplicate what Claude can infer from code (import paths, full directory tree, weight compatibility, config JSON examples). Kept only non-obvious, actionable rules.
  • Split architecture docs out of CLAUDE.md: created docs/ARCHITECTURE.md for theory (control tokens, backends, SingleSwitch). Fixed factually wrong SingleSwitch description (it's a single attention head with cumsum, not N transformer layers + projection head).
  • Push module-specific gotchas into child CLAUDE.md files: HF attention backend caveats → src/granite_switch/hf/CLAUDE.md, vLLM Punica/TP gotchas → src/granite_switch/vllm/CLAUDE.md, compose infra rule → src/granite_switch/composer/CLAUDE.md.
  • Create tutorials/CLAUDE.md: notebook cell ordering, HF login cell convention, duration comment placement, utility module references.
  • Fix stale claims in docs:
    • README.md: Python 3.11+, PyTorch 2.10+, CUDA version remarks preserved, [dev] description corrected
    • PREREQUISITES.md: Python 3.11+, RAG adapter count 5 → 6
    • SUPPORTED_MODELS.md: remove wrong single-GPU-only claim (TP is supported and tested), add 30B model
    • build_your_own_adapter.md: custom adapters are supported via Mellea's Intrinsic API — updated Step 4 note

Alon Freund and others added 11 commits May 27, 2026 11:18
CLAUDE.md was 393 lines and contained content Claude could infer
from code (import paths, full directory tree, weight compatibility
example, Switch Configuration JSON). Applying the Anthropic best-
practice test — "would removing this cause Claude to make mistakes?"
— cut it to 204 lines.

Key changes:
- Remove Import Paths, full Project Structure tree, Architecture
  section, Weight Compatibility, and Switch Configuration JSON
- Fix the SingleSwitch description: the old text claimed "N
  transformer layers + linear projection head + ~1-2% of parameters",
  all of which are wrong. Actual implementation is a single attention
  head with one-hot dim-0 pattern and attention-based cumsum, with
  negligible parameter cost
- Architecture theory moved to docs/ARCHITECTURE.md so it stays
  accessible but doesn't bloat the per-session context
- Create tutorials/CLAUDE.md: Claude loads it automatically when
  reading any tutorials/ file, keeping notebook conventions
  (cell ordering, HF login cell, duration comments, utility modules)
  scoped to the directory where they apply
Installs two Claude Code skills in .claude/skills/:

- validate-links: scans all .ipynb/.md/.py files for broken local
  links, stale labels (link text names the wrong file), and broken
  first-party imports after renames or restructuring. Proposes fixes;
  never edits without user confirmation.

- tutorial-notebook: 15-item checklist + template for polishing or
  creating tutorial notebooks. Covers structure, correctness bugs,
  imports, comments, diagrams, demo coverage, and next-steps wiring.

References added:
- tutorials/CLAUDE.md: when to invoke each skill
- docs/GIT_WORKFLOW.md: run /validate-links before any PR that
  touches notebooks or docs

.gitignore updated to track .claude/skills/ while keeping local
Claude settings (settings.json, etc.) ignored.
- README.md: Python 3.9+ → 3.11–3.13, PyTorch 2.0+ → 2.10+; [dev] described as "Everything" → accurate description; [vllm20] removed wrong "CUDA 13+" claim
- PREREQUISITES.md: Python 3.10+ → 3.11–3.13; RAG adapter count 5 → 6 (granitelib-rag-r1.0 ships 6 adapters)
- build_your_own_adapter.md: custom adapters ARE supported via Mellea's Intrinsic API; updated Step 4 note to reflect this
Apply the best-practices doc's child-directory pattern: gotchas scoped to a
single backend get loaded on demand from src/granite_switch/{hf,vllm,composer}/CLAUDE.md
instead of paying token cost in every session.

Moved out of root:
- vLLM: Punica -1 index detail, TP row-parallel bias-doubling, deployment commands
- HF: eager-backend causal-masking quirk, fused-projections / bit-exact skip
- composer: e2e-tests-must-use-compose rule, compose CLI

Root keeps universal items (file org, test cadence, config params, control-token
generatability, ALORA/LORA placement, hidden-count offset) plus a pointer block
listing the child files. Drops root from 204 → 157 lines.
Llama is no longer supported. Drop the Granite-vs-Llama comparison gotcha and
the parenthetical "main architectural difference with Llama" framing on
logits_scaling in both CLAUDE.md and docs/ARCHITECTURE.md. Renumber the
remaining root-level gotchas (1-4).

Code references to Llama in src/granite_switch/vllm/core/decoder.py are kept:
they document why the RMSNorm dispatch helper exists (different vLLM model
classes use different calling conventions) and are not support claims.
TP is supported and tested (tests cover TP specifically).
The single-GPU-only note was stale and excluded the 30B model.
Three small cuts that pass the "would removing this cause Claude to make
mistakes?" test:

1. Test Files section — drop the per-subdirectory enumeration; the same
   list already appears in Project Structure. Keep the load-bearing rule
   (regression tests only, use scratch/ for throwaway).

2. Naming Conventions — drop test_*.py (pytest default) and snake_case.py
   (PEP 8). Keep only the non-default UPPER_CASE.md rule, renamed to
   "Documentation Naming".

3. Git Workflow — collapse the bullet list that restated GIT_WORKFLOW.md.
   Keep one-line pointer plus the "never sign as Claude" rule (the only
   item not covered by the linked doc).

Drops root from 149 → 132 lines.
- Keep CUDA 12.x / CUDA 13+ distinction — useful context for users choosing a vLLM backend
- Use '3.11+' instead of '3.11-3.13' — upper bound in pyproject.toml reflects untested versions, not incompatibility
scratch/ is gitignored, which makes it a per-developer convention rather than
a project rule. Mandating it in the shared CLAUDE.md presumes every developer
wants that workflow. The load-bearing rule for the project is "don't put
throwaway scripts in tests/" (because pytest tests/ would pick them up); the
scratch/ recommendation is just one possible workaround. Anyone who wants
that convention can put it in their own CLAUDE.local.md.

Removes two mentions: the Project Structure bullet and the parenthetical in
the Test Files section.
Comment thread docs/SUPPORTED_MODELS.md Outdated
Comment thread README.md Outdated
Comment thread README.md
Comment thread tutorials/PREREQUISITES.md
…4.0-micro

- Revert README Python/PyTorch versions and [dev] description (moving to separate PR)
- Revert PREREQUISITES.md RAG adapter count (moving to separate PR)
- Remove granite-4.0-micro from SUPPORTED_MODELS.md per review comment
@AlonMalach AlonMalach force-pushed the feature/trim-claude-md branch from 5de1809 to dc04aea Compare May 27, 2026 14:39
@freunda freunda marked this pull request as ready for review May 27, 2026 15:54
@freunda freunda requested a review from antonpibm as a code owner May 27, 2026 15:54
@freunda freunda requested a review from yairallouche May 27, 2026 15:54
Comment thread docs/SUPPORTED_MODELS.md
| `ibm-granite/granite-4.1-3b` | 3B | Dense, instruct |
| `ibm-granite/granite-4.1-8b` | 8B | Dense, instruct |
| `ibm-granite/granite-4.0-micro` | 3B | Dense, instruct |
| `ibm-granite/granite-4.1-30b` | 30B | Dense, instruct |
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

granite-4.0-micro is still supported

Comment thread docs/ARCHITECTURE.md
sequence) or right before the generation prompt
- **LORA adapters**: token placed at sequence beginning

### 4. Optional Trainable Router (SingleSwitch)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we adapt this section to better reflect how the switch really works?

Comment thread docs/ARCHITECTURE.md

## Two Backends

Both backends share the same checkpoint format (`save_pretrained` / `from_pretrained`).
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from_pretrained is HF API. The checkpoint format is called Safetensors format

Comment thread docs/ARCHITECTURE.md

Full `transformers` integration (`PreTrainedModel`, `GenerationMixin`). Used for training and
debugging. Uses fused QKV and gate-up projections, which changes floating-point reduction order
relative to the upstream `GraniteMoeHybridForCausalLM` (see Common Gotchas #9 in `CLAUDE.md`).
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enumeration is no longer correct

Comment thread docs/ARCHITECTURE.md
| `num_adapters` | Number of embedded LoRA adapters |
| `adapter_token_ids` | Token IDs for each adapter's control token |
| `adapter_names` | Human-readable names for each adapter |
| `hiding_groups` | Named groups of adapters for KV hiding |
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not up-to date. Are you referring to the object args or the model config?

Comment thread docs/ARCHITECTURE.md
group-based control dimensions (`K=finfo.min`, `Q=per-adapter policy`). Control tokens are
KV-hidden to prevent cross-request interference.

### 3. Chat Template Integration
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add the fact that those are activated using the adapter_name arg in apply_chat_template?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants