diff --git a/docs/README.md b/docs/README.md index 23002b6..db84177 100644 --- a/docs/README.md +++ b/docs/README.md @@ -32,6 +32,8 @@ dotnet test BitNet-b1.58-Sharp.slnx - [Bucketing implementation plan v1.0](bucketing-implementation-plan-v1.0.md) - [DataGen guide](datagen-guide.md) - [Implementation plan](implementation-plan-v3.md) +- [Full implementation plan: real training + benchmarks + purity v1.0](full-implementation-plan-real-training-benchmarks-purity-v1.0.md) +- [Real training implementation plan v1.0](real-training-implementation-plan-v1.0.md) - [Releases and packaging](releases-and-packaging.md) - [Usage](usage.md) - [Training and visualization](training-and-visualization.md) diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md index 97aac87..8810920 100644 --- a/docs/SUMMARY.md +++ b/docs/SUMMARY.md @@ -6,6 +6,8 @@ - [Bucketing implementation plan v1.0](bucketing-implementation-plan-v1.0.md) - [DataGen guide](datagen-guide.md) - [Implementation plan v3 (active)](implementation-plan-v3.md) + - [Full implementation plan: real training + benchmarks + purity v1.0](full-implementation-plan-real-training-benchmarks-purity-v1.0.md) + - [Real training implementation plan v1.0](real-training-implementation-plan-v1.0.md) - [Implementation plan v2 (archived)](implementation-plan-v2.md) - [Implementation plan v1 (archived)](implementation-plan-v1.md) - [Benchmarking and model comparison](benchmarking.md) diff --git a/docs/full-implementation-plan-real-training-benchmarks-purity-v1.0.md b/docs/full-implementation-plan-real-training-benchmarks-purity-v1.0.md new file mode 100644 index 0000000..6e3d90d --- /dev/null +++ b/docs/full-implementation-plan-real-training-benchmarks-purity-v1.0.md @@ -0,0 +1,190 @@ +# Full Implementation Plan: Real Training + Enhanced Benchmarks + Repository Purity v1.0 +**Address All Three Issues in One Cohesive Plan** +**Core Repository – Strictly Domain-Agnostic** + +**Version:** 1.0 +**Date:** March 20, 2026 +**Status:** Ready-to-execute + +> 
**Dependency note:** WikiText-2 validation download and tokenization are being added in PR #27. This plan assumes that dependency merges first and then consumes those repository-local artifacts. + +--- + +## Table of Contents + +1. [Executive Summary & Success Criteria](#1-executive-summary--success-criteria) +2. [Prerequisites](#2-prerequisites) +3. [Overall Architecture](#3-overall-architecture) +4. [Phase 1: Enforce Repository Purity & Architecture Guidelines (1–2 days)](#4-phase-1-enforce-repository-purity--architecture-guidelines-12-days) +5. [Phase 2: Implement Real Training Loop (7–10 days)](#5-phase-2-implement-real-training-loop-710-days) +6. [Phase 3: Build Enhanced Benchmark Suite with TinyLlama-1.1B (6–8 days)](#6-phase-3-build-enhanced-benchmark-suite-with-tinyllama-11b-68-days) +7. [Phase 4: Create Improved Report that Surfaces Strengths & Deficiencies (3–4 days)](#7-phase-4-create-improved-report-that-surfaces-strengths--deficiencies-34-days) +8. [Phase 5: CI Integration & Release (2 days)](#8-phase-5-ci-integration--release-2-days) +9. [Full UML Catalog](#9-full-uml-catalog) +10. [Risk Register & Mitigation](#10-risk-register--mitigation) +11. [Timeline & Effort Estimates](#11-timeline--effort-estimates) + +--- + +## 1. Executive Summary & Success Criteria + +This plan replaces the stub training, expands benchmarks to include TinyLlama-1.1B, perplexity, and real-world task comparisons, and redesigns the report to clearly show where BitNet wins on speed and memory and where it still needs quality improvements. + +### Success Criteria + +- Training runs multiple epochs with real data and visibly reduces loss +- Benchmarks measure perplexity, reasoning, code, and efficiency on TinyLlama-1.1B +- Report shows zero-based quality delta and clearly flags deficiencies +- Repository remains 100% domain-agnostic with no vertical code + +--- + +## 2. 
Prerequisites

- Existing `BitNetModel`, `BitLinear`, tokenizer, and SpecFlow tests
- BenchmarkDotNet already added to the test project
- WikiText-2 validation set downloaded and pre-tokenized by PR #27

---

## 3. Overall Architecture

```mermaid
flowchart TD
    A[WikiText-2 Loader] --> B["Real Training Loop (Epochs + STE)"]
    B --> C["BenchmarkDotNet Suite (TinyLlama-1.1B)"]
    C --> D[Perplexity + Zero-Shot + Code + Efficiency]
    D --> E["Improved Report (Strengths vs Deficiencies)"]
```

---

## 4. Phase 1: Enforce Repository Purity & Architecture Guidelines (1–2 days)

1. Commit `docs/repo-alignment-guidelines.md` from the prior discussion.
2. Update the root `README.md` with a repository-purity banner and no vertical mentions.
3. Add a pull request template that requires a purity checklist.
4. Move any stray domain code, if present, to a new companion repository stub.

---

## 5. Phase 2: Implement Real Training Loop (7–10 days)

Replace the stub in `BitNetModel.cs` with a training API shaped like this:

```csharp
public TrainingReport Train(int epochs, IDataLoader loader)
{
    var optimizer = new AdamWOptimizer(3e-4f, 0.1f);
    var report = new TrainingReport();

    for (int e = 0; e < epochs; e++)
    {
        double totalLoss = 0;
        int count = 0;

        foreach (var batch in loader.GetBatches())
        {
            var logits = Forward(batch.Input);
            var loss = CrossEntropyLoss(logits, batch.Target);
            totalLoss += loss.Value * batch.Size;
            count += batch.Size;

            loss.BackwardWithSTE();
            optimizer.Step(Parameters);
            optimizer.ZeroGrad();
        }

        ReQuantizeAllLayers();
        report.AddEpoch(e, totalLoss / count);
    }

    return report;
}
```

Implement `IDataLoader`, `AdamWOptimizer`, and `CrossEntropyLoss` with STE support.

---

## 6. 
Phase 3: Build Enhanced Benchmark Suite with TinyLlama-1.1B (6–8 days) + +Create `tests/BitNetSharp.Tests/Benchmarks/TinyLlamaBenchmark.cs`: + +```csharp +[Config(typeof(BitNetBenchmarkConfig))] +public class TinyLlamaBenchmark +{ + [Benchmark] public void TrainingEpoch() => model.Train(1, wikiLoader); + [Benchmark] public double PerplexityBitNet() => model.CalculatePerplexity(wikiLoader); + [Benchmark] public double ARCEasyAccuracy() => model.EvaluateZeroShot(ARC_Easy); + [Benchmark] public double HumanEvalPass1() => model.EvaluateHumanEval(); +} +``` + +Add a WikiText-2 loader and zero-shot evaluators. + +--- + +## 7. Phase 4: Create Improved Report that Surfaces Strengths & Deficiencies (3–4 days) + +Update `ReportGenerator.cs` to emit a clear comparison table: + +```markdown +Category | Metric | BitNet | Traditional | Delta | Interpretation +----------------------|-------------------------|----------|-------------|----------------|------------------------------- +Language Modeling | WikiText-2 PPL | 18.4 | 17.1 | -7.6% | Minor quality gap +Reasoning | ARC-Easy Accuracy | 61% | 68% | -10.3% | Needs improvement +Code Generation | HumanEval Pass@1 | 19% | 25% | -24% | Significant deficiency +Efficiency | CPU Tokens/sec | 48 | 13 | +269% | Major win +Efficiency | Memory (MB) | 1,150 | 4,600 | 4× smaller | Strong advantage +``` + +Delta is zero-based: `0%` means parity, positive means better, and negative means worse. + +--- + +## 8. Phase 5: CI Integration & Release (2 days) + +- Add a nightly benchmark job in GitHub Actions +- Publish the report to `docs/benchmarks/latest.html` +- Tag a release when perplexity delta and speed targets are met + +--- + +## 9. Full UML Catalog + +### Full Pipeline + +```mermaid +flowchart TD + A[WikiText-2] --> B[Real Training] + B --> C[Enhanced Benchmarks] + C --> D[Improved Report] + D --> E[Actionable Insights] +``` + +--- + +## 10. 
Risk Register & Mitigation + +| Risk | Likelihood | Mitigation | +|------|------------|------------| +| Training still stub-like | High | Enforce a minimum of 3 epochs plus a real data loader | +| Report misleading | Medium | Use zero-based delta plus explicit better/worse labels | +| Scope creep | High | Require a purity checklist in every PR | + +--- + +## 11. Timeline & Effort Estimates + +| Phase | Estimate | +|------|----------| +| Phase 1: Enforce Repository Purity & Architecture Guidelines | 1–2 days | +| Phase 2: Implement Real Training Loop | 7–10 days | +| Phase 3: Build Enhanced Benchmark Suite with TinyLlama-1.1B | 6–8 days | +| Phase 4: Create Improved Report that Surfaces Strengths & Deficiencies | 3–4 days | +| Phase 5: CI Integration & Release | 2 days | +| **Total** | **19–26 days** | + +This plan keeps all work inside the core repository while remaining strictly domain-agnostic. It is intended to address stub training, benchmark quality, and report clarity as one coordinated roadmap. diff --git a/docs/real-training-implementation-plan-v1.0.md b/docs/real-training-implementation-plan-v1.0.md new file mode 100644 index 0000000..a2c3273 --- /dev/null +++ b/docs/real-training-implementation-plan-v1.0.md @@ -0,0 +1,219 @@ +# Implementation Plan for Real Training in BitNet-b1.58-Sharp v1.0 +**Replace Stub Training with Full Epochs, STE Backprop, Optimizer & Perplexity Validation** +**Core Repository – Domain-Agnostic** + +**Version:** 1.0 +**Date:** March 20, 2026 +**Status:** Ready-to-execute blueprint + +> **Dependency note:** WikiText-2 validation download and tokenization are being added in PR #27. This plan assumes that dependency merges first and then reuses those repository-local artifacts. + +--- + +## Table of Contents + +1. [Executive Summary & Success Criteria](#1-executive-summary--success-criteria) +2. [Prerequisites & Current State](#2-prerequisites--current-state) +3. [Overall Training Architecture](#3-overall-training-architecture) +4. 
[Phase 1: WikiText-2 Data Loader & Tokenization (2–3 days)](#4-phase-1-wikitext-2-data-loader--tokenization-23-days) +5. [Phase 2: Real Train Method with Epochs, Batches & STE (5–7 days)](#5-phase-2-real-train-method-with-epochs-batches--ste-57-days) +6. [Phase 3: AdamW Optimizer & Gradient Updates (3–4 days)](#6-phase-3-adamw-optimizer--gradient-updates-34-days) +7. [Phase 4: Perplexity Evaluation on WikiText-2 (2–3 days)](#7-phase-4-perplexity-evaluation-on-wikitext-2-23-days) +8. [Phase 5: BenchmarkDotNet Integration & Reporting (3–4 days)](#8-phase-5-benchmarkdotnet-integration--reporting-34-days) +9. [Phase 6: Final Validation & CI Integration (2 days)](#9-phase-6-final-validation--ci-integration-2-days) +10. [Full UML Catalog](#10-full-uml-catalog) +11. [Risk Register & Mitigation](#11-risk-register--mitigation) +12. [Timeline & Effort Estimates](#12-timeline--effort-estimates) + +--- + +## 1. Executive Summary & Success Criteria + +Goal: Replace the current stub training with a **real, measurable training loop** that performs multiple epochs, computes loss, applies STE backprop, updates weights via AdamW, and reports perplexity on WikiText-2. + +### Success Criteria + +- Training runs multiple epochs and visibly reduces loss +- Perplexity on WikiText-2 validation is computed and reported (BitNet vs FP16 baseline) +- BenchmarkDotNet measures training time, tokens/sec, memory, and perplexity delta +- Report includes side-by-side TinyLlama-1.1B comparison +- Training no longer finishes in seconds — realistic duration on CPU/GPU + +--- + +## 2. Prerequisites & Current State + +- Existing `BitNetModel` and `BitLinear` with STE forward pass already implemented +- WikiText-2 raw validation set downloaded and tokenized by PR #27 (one-time dependency) +- BenchmarkDotNet already added to the test project (from prior benchmark patches) + +--- + +## 3. 
Overall Training Architecture

```mermaid
flowchart TD
    A[WikiText-2 Validation Tokens] --> B["DataLoader (Batching)"]
    B --> C["BitNetModel.Train(epochs)"]
    C --> D[For each epoch]
    D --> E["Forward Pass (quantized)"]
    E --> F[Cross-Entropy Loss]
    F --> G[STE Backward]
    G --> H[AdamW Optimizer Step]
    H --> I[Periodic Re-quantization]
    I --> J[Perplexity Calculation]
    J --> K[Benchmark Report]
```

---

## 4. Phase 1: WikiText-2 Data Loader & Tokenization (2–3 days)

1. Consume the repository-local WikiText-2 artifacts added by PR #27.
2. Add a tokenizer helper to convert raw text to token IDs by reusing the existing tokenizer where needed.
3. Create a `WikiTextDataLoader` class that yields batches of shape `(batchSize, seqLen)`.
4. Cache or reuse the tokenized validation set in the test project for fast loading.

---

## 5. Phase 2: Real Train Method with Epochs, Batches & STE (5–7 days)

Update `BitNetModel` with a training API shaped like this:

```csharp
public TrainingReport Train(int epochs, IDataLoader dataLoader)
{
    var optimizer = new AdamWOptimizer(lr: 3e-4f, weightDecay: 0.1f);
    var report = new TrainingReport();

    for (int epoch = 0; epoch < epochs; epoch++)
    {
        double totalLoss = 0;
        int tokenCount = 0;

        foreach (var batch in dataLoader.GetBatches())
        {
            var logits = Forward(batch.Input); // quantized forward
            var loss = CrossEntropyLoss(logits, batch.Target);
            totalLoss += loss.Value * batch.Size;
            tokenCount += batch.Size;

            loss.BackwardWithSTE(); // straight-through estimator
            optimizer.Step(Parameters);
            optimizer.ZeroGrad();
        }

        report.AddEpoch(epoch, totalLoss / tokenCount);
        ReQuantizeAllLayers(); // periodic re-quantization
    }

    return report;
}
```

---

## 6. 
Phase 3: AdamW Optimizer & Gradient Updates (3–4 days)

Implement a simple `AdamWOptimizer` class, or reuse an existing one if present, with:

- Momentum
- Variance
- Weight decay
- Support for ternary weight scaling (`γ`)
- In-place updates compatible with `BitLinear`

---

## 7. Phase 4: Perplexity Evaluation on WikiText-2 (2–3 days)

Add a validation method to `BitNetModel`:

```csharp
public double CalculatePerplexity(IDataLoader validationLoader)
{
    double totalNLL = 0;
    int tokenCount = 0;

    foreach (var batch in validationLoader.GetBatches())
    {
        var logits = Forward(batch.Input);
        var loss = CrossEntropyLoss(logits, batch.Target);
        totalNLL += loss.Value * batch.Size;
        tokenCount += batch.Size;
    }

    return Math.Exp(totalNLL / tokenCount);
}
```

---

## 8. Phase 5: BenchmarkDotNet Integration & Reporting (3–4 days)

Update `TinyLlamaBenchmark.cs`, or create it if it is missing, with:

```csharp
[Benchmark]
public double PerplexityBitNet() => _bitnetModel.CalculatePerplexity(wikiLoader);

[Benchmark]
public void TrainingEpoch() => _bitnetModel.Train(1, trainingLoader);
```

Enhance the report generator to include:

- Training time per epoch
- Perplexity before and after training
- BitNet vs FP16 baseline comparison

---

## 9. Phase 6: Final Validation & CI Integration (2 days)

- Add an integration test that runs 3 epochs and verifies loss decreases
- Update CI to run the full benchmark suite on a nightly schedule
- Generate HTML and JSON reports with tables and charts

---

## 10. Full UML Catalog

### Training Loop Flow

```mermaid
flowchart TD
    A[WikiText-2 Loader] --> B[Epoch Loop]
    B --> C["Batch Forward (BitLinear)"]
    C --> D[Cross-Entropy Loss]
    D --> E[STE Backward]
    E --> F[AdamW Step]
    F --> G[Re-quantize]
    G --> H[Perplexity Calc]
```

---

## 11. 
Risk Register & Mitigation + +| Risk | Likelihood | Impact | Mitigation | +|------|------------|--------|------------| +| Training still too fast | High | High | Enforce a minimum of 3 epochs and a real WikiText loader | +| STE gradient issues | Medium | High | Add a unit test that verifies gradient flow on a small batch | +| Memory explosion | Low | Medium | Use a small batch size (8–32) plus gradient clipping | + +--- + +## 12. Timeline & Effort Estimates + +| Phase | Estimate | +|------|----------| +| Phase 1: WikiText-2 Data Loader & Tokenization | 2–3 days | +| Phase 2: Real Train Method with Epochs, Batches & STE | 5–7 days | +| Phase 3: AdamW Optimizer & Gradient Updates | 3–4 days | +| Phase 4: Perplexity Evaluation on WikiText-2 | 2–3 days | +| Phase 5: BenchmarkDotNet Integration & Reporting | 3–4 days | +| Phase 6: Final Validation & CI Integration | 2 days | +| **Total** | **17–23 days** | + +This plan is intentionally scoped to the core repository and remains domain-agnostic. It focuses on replacing stubbed training behavior with a measurable, benchmarked, paper-aligned training path.
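
---

## Appendix: AdamW Update Sketch

Phase 3 lists the optimizer's required features (momentum, variance, weight decay) without showing the update rule. The following is a minimal, self-contained sketch of one decoupled-weight-decay AdamW step. The class name and `Step(float[], float[])` signature are illustrative assumptions, not the repository's actual API; a real implementation would also need the ternary scaling factor (`γ`) handling and in-place `BitLinear` updates called out in that phase.

```csharp
using System;

// Hypothetical minimal AdamW sketch: names and signature are assumptions,
// not the repository's actual implementation.
public sealed class AdamWOptimizer
{
    private readonly float _lr, _weightDecay;
    private const float Beta1 = 0.9f, Beta2 = 0.999f, Eps = 1e-8f;
    private float[] _m, _v; // first/second moment estimates
    private int _t;         // timestep, used for bias correction

    public AdamWOptimizer(float lr, float weightDecay)
    {
        _lr = lr;
        _weightDecay = weightDecay;
    }

    public void Step(float[] parameters, float[] gradients)
    {
        _m ??= new float[parameters.Length];
        _v ??= new float[parameters.Length];
        _t++;

        for (int i = 0; i < parameters.Length; i++)
        {
            // Exponential moving averages of the gradient and its square.
            _m[i] = Beta1 * _m[i] + (1 - Beta1) * gradients[i];
            _v[i] = Beta2 * _v[i] + (1 - Beta2) * gradients[i] * gradients[i];

            // Bias-corrected moment estimates.
            float mHat = _m[i] / (1 - MathF.Pow(Beta1, _t));
            float vHat = _v[i] / (1 - MathF.Pow(Beta2, _t));

            // Decoupled weight decay: applied to the parameter directly,
            // not folded into the gradient (the "W" in AdamW).
            parameters[i] -= _lr * (mHat / (MathF.Sqrt(vHat) + Eps)
                                    + _weightDecay * parameters[i]);
        }
    }
}
```

For example, a single step with `lr = 0.1`, zero weight decay, and a gradient of `0.5` moves a parameter from `1.0` to roughly `0.9`: after bias correction the very first update reduces to a plain signed step of size `lr`.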