ChatGPU · May 28, 2026
diff --git a/.github/workflows/validate.yml b/.github/workflows/validate.yml
@@ -23,6 +23,18 @@ jobs:
         with: { python-version: '3.11' }
       - name: Validate graph.json
         run: python tools/validate_graph.py
+      - name: Lint extended cards (cross-link integrity, LaTeX, stub bodies)
+        run: python tools/lint_extended_cards.py
+        continue-on-error: true  # surfaced pre-existing broken card links; tracked for cleanup, not blocking until backlog clears
+      - name: Validate structured research layer
+        run: python tools/validate_research.py
+      - name: Rebuild research node overlay (for the 3D atlas visual encoding)
+        run: python tools/build_research_overlay.py
+      - name: Confirm research node overlay is in sync with sources
+        run: |
+          git diff --exit-code docs/data/research/node_overlay.json || (echo "node_overlay.json is out of sync; run python tools/build_research_overlay.py and commit"; exit 1)
+      - name: Run screenshot regression scaffold (no-op when Playwright is absent)
+        run: python tools/screenshot_regression.py
       - name: Check external deep links
         run: python tools/check_links.py --max-failures 5
         continue-on-error: true
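
Reviewer note: the "in sync" step above only stays green if `tools/build_research_overlay.py` writes byte-identical output for identical inputs, because `git diff --exit-code` compares bytes. The sketch below illustrates that idea only; `render_overlay` and `is_in_sync` are hypothetical helpers, not functions from this repository, and the real builder may serialize differently.

```python
import json


def render_overlay(overlay: dict) -> str:
    """Serialize an overlay deterministically (hypothetical helper)."""
    # Sorted keys and fixed indentation keep the text stable across runs,
    # so an unchanged overlay never produces a spurious diff.
    return json.dumps(overlay, sort_keys=True, indent=2, ensure_ascii=False) + "\n"


def is_in_sync(committed_text: str, overlay: dict) -> bool:
    """Return True when the committed file matches a fresh rebuild."""
    return committed_text == render_overlay(overlay)


# Toy usage: key order in the source data does not matter, content does.
committed = render_overlay({"b": 1, "a": 2})
same = is_in_sync(committed, {"a": 2, "b": 1})
changed = is_in_sync(committed, {"a": 3, "b": 1})
```

If the builder iterated over an unordered collection or embedded a timestamp, the same gate would fail intermittently even when no source file changed.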
diff --git a/README.md b/README.md
@@ -1,13 +1,18 @@
 # Autonomous-Driving Learning Atlas
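-> Autonomous-Driving Learning Atlas: a bilingual (Chinese/English) learning map for machine learning, reinforcement learning, and autonomous driving, built around an **interactive knowledge graph** for PhD-level researchers and spanning **introductory, advanced, and frontier** material.
+> Autonomous-Driving Research Atlas: a paper-production system for PhD-level and industry researchers, organized around **falsifiable claims, paper argument chains, scenarios and datasets, failure modes, and three-tier experiments**. It is bilingual (Chinese/English), presents the knowledge graph and the paper-production workbench side by side, and uses visuals only in service of the research structure.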
 
 [![Pages](https://github.com/ChatGPU/Autonomous-Driving-Learning-Atlas/actions/workflows/pages.yml/badge.svg)](https://github.com/ChatGPU/Autonomous-Driving-Learning-Atlas/actions/workflows/pages.yml)
 [![Validate](https://github.com/ChatGPU/Autonomous-Driving-Learning-Atlas/actions/workflows/validate.yml/badge.svg)](https://github.com/ChatGPU/Autonomous-Driving-Learning-Atlas/actions/workflows/validate.yml)
 [![Labs smoke](https://github.com/ChatGPU/Autonomous-Driving-Learning-Atlas/actions/workflows/labs_smoke.yml/badge.svg)](https://github.com/ChatGPU/Autonomous-Driving-Learning-Atlas/actions/workflows/labs_smoke.yml)
 [![License: MIT (code)](https://img.shields.io/badge/code-MIT-blue.svg)](LICENSE)
 [![License: CC BY 4.0 (prose)](https://img.shields.io/badge/prose-CC%20BY%204.0-lightgrey.svg)](LICENSE-CC)
 
-🌐 **Live atlas**：<https://chatgpu.github.io/Autonomous-Driving-Learning-Atlas/>
+🌐 **3D knowledge star map**: <https://chatgpu.github.io/Autonomous-Driving-Learning-Atlas/>  
+🛠 **Paper-production workbench**: <https://chatgpu.github.io/Autonomous-Driving-Learning-Atlas/workbench.html>
+
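+> The workbench is organized around six kinds of structured research nodes: **falsifiable claims** · **paper argument chains** · **scenarios** · **datasets / metrics** (with the boundaries of what each can and cannot demonstrate) · **failure modes** (with trigger conditions, diagnostic metrics, existing partial solutions, and publishable entry points) · **three-tier experiment plans** (minimal mechanism / public benchmark / stress test). Current coverage: 12 claims · 6 argument chains · 11 scenarios · 6 datasets · 6 metrics · 17 failure modes · 6 experiment plans. Every node is checked for structural integrity by `tools/validate_research.py`, and each iteration's revisions come with a cross-review report from an independent review agent (see `docs/data/research/cross_review_*.md`).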
+
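+> Covered directions include end-to-end planning (UniAD / PlanT / VADv2), vision-language-action models (DriveVLM / Agent-Driver / DiLu / CF-VLA), reinforcement-learning backbones (PPO / DQN / DAgger), offline reinforcement learning (CQL-style conservative penalties), world models (Dreamer-style latent imagination), safety constraints (Lagrangian / explicit constraint layers), audits of closed-loop evaluation protocols, and a falsifiable restatement of the Bitter Lesson.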
 
 <p align="center">
   <a href="https://chatgpu.github.io/Autonomous-Driving-Learning-Atlas/?node=paper:2212.10156">
@@ -91,7 +96,9 @@
 Autonomous-Driving-Learning-Atlas/
 ├── README.md / AGENTS.md / LICENSE / LICENSE-CC / CITATION.cff
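 ├── docs/                          # GitHub Pages root (interactive site)
-│   ├── index.html · atlas3d.css
+│   ├── index.html · atlas3d.css   # 3D knowledge star map (visual encoding bound to research dimensions)
+│   ├── workbench.html · workbench.css · js/workbench.js
+│   │                              # paper-production workbench: claims / argument chains / scenarios / failure modes / experiment plans / selection basket
 │   ├── js/                        # atlas-main · atlas-render · atlas-physics ·
 │   │                              # atlas-cards (incl. Mermaid rendering + dynamic insights)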
 │   ├── vendor/                    # KaTeX + auto-render · Mermaid · DOMPurify · marked · Three.js
@@ -100,6 +107,7 @@ Autonomous-Driving-Learning-Atlas/
 │       ├── graph_extended.json    # 489 nodes / 1440 edges (incl. paradigm/insight/validation/move/problem)
 │       ├── layout_positions.json  # stable 3D positions pre-baked by tools/precompute_layout.py
 │       ├── generated/             # multi-dimensional generated axes (decision / foundation / methodology / perception / wave-E stubs)
+│       ├── research/              # structured research layer (claims / chains / scenarios / datasets / metrics / failure_modes / experiment_plans + schema + node_overlay)
 │       └── cards/
 │           ├── *.md               # spine + Tier-S original paper cards (40 cards)
 │           └── extended/          # paradigm / insight / validation / move / problem / paper stub (200+ cards)
@@ -116,6 +124,8 @@ Autonomous-Driving-Learning-Atlas/
 │       └── lab_dreamer_cartpole_pixels/ # CartPole pixel RSSM + latent imagination
 ├── tools/
 │   ├── validate_graph.py · check_links.py · lint_extended_cards.py
+│   ├── validate_research.py            # quality gate for the structured research layer
+│   ├── build_research_overlay.py       # generates node_overlay.json from research/*.json
 │   ├── audit_card_meta_language.py     # scans cards for "meta-language leakage" phrases
 │   ├── merge_graph.py                  # seed + generated/*.json → graph_extended.json
 │   ├── repair_extended_graph.py        # rebuilds paradigm-validation-paper and problem back-references

diff --git a/docs/atlas3d.css b/docs/atlas3d.css
@@ -101,6 +101,10 @@ canvas#atlasCanvas {
 }
 .iconbtn:hover { background: rgba(108,177,255,0.22); border-color: rgba(108,177,255,0.55); }
 .iconbtn.active { background: rgba(255,170,85,0.22); border-color: rgba(255,170,85,0.55); color: var(--accent-warm); }
+.iconbtn.iconbtn-primary { background: rgba(167,243,208,0.22); border-color: rgba(167,243,208,0.55); color: #a7f3d0; font-weight: 600; }
+.iconbtn.iconbtn-primary:hover { background: rgba(167,243,208,0.32); border-color: rgba(167,243,208,0.75); color: #d6f5e6; }
+.iconbtn.iconbtn-subtle { opacity: 0.6; }
+.iconbtn.iconbtn-subtle:hover { opacity: 1; }
 
 /* ---------- side panels ---------- */
 .side-panel {
@@ -171,6 +175,12 @@ canvas#atlasCanvas {
 .legend .swatch { width: 16px; height: 3px; border-radius: 2px; flex-shrink: 0; }
 .legend .swatch.dashed { background-image: linear-gradient(90deg, currentColor 50%, transparent 50%); background-size: 6px 100%; }
 
+.research-legend .legend-dot { width: 14px; height: 14px; border-radius: 50%; flex-shrink: 0; box-shadow: 0 0 8px currentColor; }
+.research-legend .lg-evidence-3 { background: #a7f3d0; color: #a7f3d0; width: 16px; height: 16px; }
+.research-legend .lg-dispute    { background: #94a3b8; color: #94a3b8; opacity: 0.7; }
+.research-legend .lg-fb         { background: #fcd34d; color: #fcd34d; }
+.research-legend .lg-default    { background: #475569; color: #475569; box-shadow: none; }
+
 .time-row { display: flex; align-items: center; justify-content: space-between; margin-top: 6px; font-size: 12px; color: var(--ink-dim); }
 input#yearSlider { width: 100%; accent-color: var(--accent); }
 

diff --git a/docs/data/cards/paper_fujimoto2019_bcq.md b/docs/data/cards/paper_fujimoto2019_bcq.md
@@ -0,0 +1,56 @@
+---
+id: paper:fujimoto2019_bcq
+title: "BCQ — Off-Policy Deep Reinforcement Learning without Exploration"
+title_zh: "BCQ：无探索条件下的离线深度强化学习"
+kind: paper
+tier: A
+authors: [Fujimoto, S., Meger, D., Precup, D.]
+venue: "ICML 2019"
+year: 2019
+topic: deep_rl
+phase: core
+prereqs: [paper:mnih2015_dqn]
+extends: []
+parallel: []
+contested_by: []
+labs: []
+deep_links:
+  - {label: "PDF p.1 Abstract", url: "https://arxiv.org/pdf/1812.02900#page=1"}
+  - {label: "PDF p.3 §4 BCQ algorithm", url: "https://arxiv.org/pdf/1812.02900#page=3"}
+  - {label: "Official implementation (sfujim/BCQ)", url: "https://github.com/sfujim/BCQ"}
+bibtex: |
+  @inproceedings{fujimoto2019off,
+    title     = {Off-Policy Deep Reinforcement Learning without Exploration},
+    author    = {Fujimoto, Scott and Meger, David and Precup, Doina},
+    booktitle = {International Conference on Machine Learning},
+    year      = {2019}
+  }
+---
+
+## TL;DR
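+BCQ learns the support of the behavior policy with a conditional VAE and restricts candidate actions to the neighborhood of that support, which avoids the overestimation of out-of-distribution actions that affects offline Q-learning. It predates CQL and is among the first works to write a "behavior constraint" explicitly into an offline RL algorithm.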
+
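+## Math anchor
+Policy output:
+$$\pi(s) = \arg\max_{a_i + \xi_\phi(s, a_i, \Phi)} Q_\theta\big(s,\, a_i + \xi_\phi(s, a_i, \Phi)\big),\qquad \{a_i \sim G_\omega(s)\}_{i=1}^{n}$$
+Given a state $s$, the conditional VAE $G_\omega$, which approximates the behavior policy $\hat\pi_\beta$, generates $n$ candidate actions, and the perturbation network $\xi_\phi$ adjusts each candidate within $[-\Phi, \Phi]$. BCQ evaluates and maximizes $Q$ only over this set, which approximates $\arg\max_{a \in \operatorname{supp}\hat\pi_\beta(\cdot\mid s)} Q(s, a)$.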
+
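+## Architecture and method
+- **Behavior-policy density estimation**: trains a conditional VAE that, given a state, samples candidates close to the dataset's action distribution.
+- **Perturbation network**: adds a small learned perturbation on top of the VAE samples, so the policy can keep improving within a neighborhood of the support.
+- **Twin Q-networks**: two Q-networks as in TD3, except that the target is a convex combination of their minimum and maximum that weights the minimum more heavily, which curbs overestimation.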
+
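+## Engineering notes
+- The training pipeline is more complex than CQL's (VAE + twin Q + perturbation network), so it costs more to implement.
+- Its reliance on the support keeps it stable on well-covered datasets but limits policy improvement when coverage is poor.
+- It remains a competitive baseline on some D4RL subsets and, set against CQL, illustrates the two main routes: "constrain the actions" vs. "penalize the Q-values".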
+
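+## Known failure boundaries
+- When the dataset has coverage holes, the support is too narrow and the policy cannot improve meaningfully.
+- Bias in the VAE's own density estimate propagates into action selection.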
+
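+## Bitter-Lesson perspective
+In 2019 BCQ was among the first to acknowledge that offline RL must handle out-of-distribution actions explicitly, but it chose to "restrict the policy with a learned density estimate", which still brings in a fair number of hand-designed components (VAE + perturbation network + twin Q). Later work, CQL included, suggests that a comparable constraint can often be expressed through a single scalar hyperparameter $\alpha$, and simpler schemes have largely displaced BCQ's multi-module design on large datasets. This is consistent with the Bitter Lesson's view that fewer priors scale better.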
+
+## Related nodes
+- → [`paper:kumar2020_cql`](paper_kumar2020_cql.md): CQL carries BCQ's action-constraint idea forward into a simpler lower-bound estimate of Q.
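
Reviewer note: the BCQ card's policy rule (sample candidates near the data, perturb each slightly, keep the one with the highest Q) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the official implementation: `candidates` stands in for conditional-VAE samples, `perturb` and `q_value` are hypothetical stand-ins for the perturbation network and the Q-network, and plain clipping to `[-phi, phi]` replaces whatever bounding the real network uses.

```python
from typing import Callable, Sequence


def bcq_select_action(
    state: float,
    candidates: Sequence[float],
    perturb: Callable[[float, float], float],
    q_value: Callable[[float, float], float],
    phi: float = 0.05,
) -> float:
    """Return the perturbed candidate action with the highest Q-value."""
    best_action = None
    best_q = float("-inf")
    for a in candidates:
        # Bound the perturbation so the action stays close to the data.
        delta = max(-phi, min(phi, perturb(state, a)))
        a_perturbed = a + delta
        q = q_value(state, a_perturbed)
        if q > best_q:
            best_action, best_q = a_perturbed, q
    if best_action is None:
        raise ValueError("need at least one candidate action")
    return best_action


# Toy usage: Q peaks at a = 1.0, but the data only covers actions near 0.
def toy_q(s: float, a: float) -> float:
    return -(a - 1.0) ** 2


def toy_perturb(s: float, a: float) -> float:
    return 10.0  # always asks for a large positive shift


chosen = bcq_select_action(0.0, [-0.2, 0.0, 0.3], toy_perturb, toy_q, phi=0.05)
```

In the toy run the chosen action stays within `phi` of the best-covered candidate (about 0.35) instead of jumping to the unconstrained maximizer at 1.0, which is the behavior the card describes.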
diff --git a/docs/data/cards/paper_hafner2020_dreamer.md b/docs/data/cards/paper_hafner2020_dreamer.md
@@ -0,0 +1,59 @@
+---
+id: paper:hafner2020_dreamer
+title: "Dreamer — Dream to Control: Learning Behaviors by Latent Imagination"
+title_zh: "Dreamer：通过隐空间想象学习控制行为"
+kind: paper
+tier: S
+authors: [Hafner, D., Lillicrap, T., Ba, J., Norouzi, M.]
+venue: "ICLR 2020"
+year: 2020
+topic: deep_rl
+phase: frontier
+prereqs: [paper:world_models]
+extends: [paper:world_models]
+parallel: []
+contested_by: []
+labs: [lab_dreamer_cartpole_pixels]
+deep_links:
+  - {label: "PDF p.1 Abstract", url: "https://arxiv.org/pdf/1912.01603#page=1"}
+  - {label: "PDF p.3 §3 RSSM and actor-critic imagination", url: "https://arxiv.org/pdf/1912.01603#page=3"}
+  - {label: "PDF p.6 §5 DM Control and Atari results", url: "https://arxiv.org/pdf/1912.01603#page=6"}
+  - {label: "Official implementation (danijar/dreamer)", url: "https://github.com/danijar/dreamer"}
+bibtex: |
+  @inproceedings{hafner2020dream,
+    title     = {Dream to Control: Learning Behaviors by Latent Imagination},
+    author    = {Hafner, Danijar and Lillicrap, Timothy and Ba, Jimmy and Norouzi, Mohammad},
+    booktitle = {International Conference on Learning Representations},
+    year      = {2020}
+  }
+---
+
+## TL;DR
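+Dreamer runs imagined rollouts in the latent space learned by a Recurrent State Space Model (RSSM) and trains value and policy together with actor-critic. Under a limited budget of real environment samples, it clearly outperforms model-free baselines such as D4PG and A3C on DM Control visual tasks, with additional results on Atari-style discrete-action tasks.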
+
+## Math anchor
+Latent transition:
+$$h_t = f_\theta(h_{t-1}, z_{t-1}, a_{t-1}),\quad z_t \sim p_\theta(z_t \mid h_t),\quad \hat o_t \sim p_\theta(o_t \mid h_t, z_t)$$
+Actor-critic target on imagined rollouts:
+$$V_\lambda(z_\tau) = (1-\lambda)\sum_{n=1}^{H-1} \lambda^{n-1} V_n(z_\tau) + \lambda^{H-1} V_H(z_\tau)$$
+where $V_n$ is the $n$-step return estimate (the discounted sum of $n$ imagined rewards starting at $z_\tau$, bootstrapped with the critic's value at step $n$) and $H$ is the imagination horizon.
+
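+## Architecture and method
+- **RSSM world model**: compresses image observations into a low-dimensional latent state $z_t$ and learns both a deterministic and a stochastic transition path.
+- **Imagined rollouts**: rolls the latent state forward several steps to generate imagined trajectories; actor and critic learn entirely in imagination.
+- **Real samples are used only to learn the world model**: this markedly reduces how much policy learning depends on the real-sample budget.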
+
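+## Engineering notes
+- Compared with model-free baselines at a fixed real-sample budget, Dreamer's return curves are markedly higher and show lower variance.
+- When the imagination horizon is too long, the policy may learn to exploit model prediction errors rather than the true environment dynamics (see `failure_mode:world_model_compounding_imagination_error`).
+- The follow-ups DreamerV2 and DreamerV3 push the paradigm further on larger-scale tasks.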
+
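+## Known failure boundaries
+- Under extreme visual noise or in non-stationary environments, RSSM prediction error is large and imagined rollouts can hurt the policy instead of helping it.
+- When the latent space is too small, the world model cannot represent sufficiently rich dynamics.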
+
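+## Bitter-Lesson perspective
+Dreamer models the human cognitive trait of "imagining consequences in one's head" explicitly as a differentiable latent transition and leaves prediction accuracy to scaled-up training. It uses no hand-designed physics priors: the whole RSSM, from pixels to rewards, is learned from data. The larger the scale, the better the latent space can capture the true dynamics, which is consistent with the Bitter Lesson's emphasis on general methods plus compute.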
+
+## Companion lab
+[`labs/world_models/lab_dreamer_cartpole_pixels`](../../../labs/world_models/lab_dreamer_cartpole_pixels/) reproduces Dreamer's RSSM and latent-imagination training pipeline on CartPole pixel observations.
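
Reviewer note: the $\lambda$-return in the Dreamer card's math anchor is easy to get wrong by one index, so a small worked sketch may help readers of the card. It assumes a single imagined trajectory with `rewards[k]` for steps `0..H-1` and critic estimates `values[n]` for the state reached after `n` steps; the function names and the discount `gamma` are illustrative choices, not taken from the official code.

```python
from typing import Sequence


def n_step_return(
    rewards: Sequence[float], values: Sequence[float], n: int, gamma: float
) -> float:
    """Discounted sum of the first n rewards, bootstrapped with values[n]."""
    ret = sum(gamma**k * rewards[k] for k in range(n))
    return ret + gamma**n * values[n]


def lambda_return(
    rewards: Sequence[float],
    values: Sequence[float],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> float:
    """Exponentially weighted mix of n-step returns over horizon H."""
    horizon = len(rewards)
    if horizon < 1 or len(values) != horizon + 1:
        raise ValueError("need H >= 1 rewards and H + 1 value estimates")
    total = 0.0
    # (1 - lam) * sum_{n=1}^{H-1} lam^(n-1) * V_n
    for n in range(1, horizon):
        total += (1.0 - lam) * lam ** (n - 1) * n_step_return(rewards, values, n, gamma)
    # The remaining weight lam^(H-1) goes to the full-horizon return V_H.
    total += lam ** (horizon - 1) * n_step_return(rewards, values, horizon, gamma)
    return total


# Toy usage: three imagined steps with unit reward and a zero critic.
v_mix = lambda_return([1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], gamma=1.0, lam=0.5)
```

The weights $(1-\lambda)\lambda^{n-1}$ for $n < H$ plus $\lambda^{H-1}$ sum to one, so $\lambda = 0$ recovers the one-step target and $\lambda = 1$ recovers the full-horizon return.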
diff --git a/docs/data/cards/paper_kumar2020_cql.md b/docs/data/cards/paper_kumar2020_cql.md
@@ -0,0 +1,57 @@
+---
+id: paper:kumar2020_cql
+title: "CQL — Conservative Q-Learning for Offline Reinforcement Learning"
+title_zh: "CQL：离线强化学习的保守 Q 学习"
+kind: paper
+tier: S
+authors: [Kumar, A., Zhou, A., Tucker, G., Levine, S.]
+venue: "NeurIPS 2020"
+year: 2020
+topic: deep_rl
+phase: core
+prereqs: [paper:mnih2015_dqn, paper:schulman2017_ppo]
+extends: [paper:fujimoto2019_bcq]
+parallel: []
+contested_by: []
+labs: [lab_cql_offline_minigrid]
+deep_links:
+  - {label: "PDF p.1 Abstract", url: "https://arxiv.org/pdf/2006.04779#page=1"}
+  - {label: "PDF p.4 §3.2 Conservative loss derivation", url: "https://arxiv.org/pdf/2006.04779#page=4"}
+  - {label: "PDF p.6 §4 D4RL results", url: "https://arxiv.org/pdf/2006.04779#page=6"}
+  - {label: "Official implementation (aviralkumar2907/CQL)", url: "https://github.com/aviralkumar2907/CQL"}
+bibtex: |
+  @inproceedings{kumar2020conservative,
+    title     = {Conservative Q-Learning for Offline Reinforcement Learning},
+    author    = {Kumar, Aviral and Zhou, Aurick and Tucker, George and Levine, Sergey},
+    booktitle = {Advances in Neural Information Processing Systems},
+    year      = {2020}
+  }
+---
+
+## TL;DR
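+CQL adds a conservative penalty on unseen actions to the Bellman loss, so that the learned $Q$-function yields a provable lower bound on the true value of the policy. This avoids the Q-value overestimation of out-of-distribution actions in offline reinforcement learning.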
+
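+## Math anchor
+Conservative loss:
+$$\min_Q \alpha \cdot \mathbb{E}_{s\sim\mathcal{D}}\!\left[\log\sum_a \exp Q(s,a) - \mathbb{E}_{a\sim\hat\pi_\beta(\cdot\mid s)} Q(s,a)\right] + \tfrac12 \mathbb{E}_{(s,a,s')}\!\left[\big(Q(s,a) - \mathcal{B}^{\hat\pi} Q(s,a)\big)^2\right]$$
+where $\alpha$ sets the strength of the conservatism, $\hat\pi_\beta$ is the estimate of the behavior policy, and $\mathcal{B}^{\hat\pi}$ is the Bellman operator. The paper's analysis shows that an objective of this form, which includes the dataset-action term, lower-bounds the expected value of the policy, $\mathbb{E}_{a\sim\pi}[\hat Q_\text{CQL}(s,a)] \le V^\pi(s)$, for sufficiently large $\alpha$ (Theorem 3.2); the stricter pointwise bound $\hat Q_\text{CQL}(s,a) \le Q^\pi(s,a)$ (Theorem 3.1) is proved for the variant that omits that term.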
+
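+## Architecture and method
+- **Conservative Bellman update**: relative to standard Q-learning, every update step explicitly pushes down the Q-values of out-of-distribution actions.
+- **Tunable Lagrangian $\alpha$**: a Lagrangian variant of CQL adjusts $\alpha$ automatically as a dual variable, so that the conservative gap (the log-sum-exp of $Q$ minus the Q-value of dataset actions) stays near a target threshold $\tau$.
+- **Stacks on SAC or DQN**: CQL adds the conservative term as an extra loss on top of the original Q loss, leaving the rest of the implementation almost unchanged.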
+
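+## Engineering notes
+- Simple to implement: a few dozen lines of code suffice to add it to an existing SAC / DQN implementation.
+- Sensitive to the geometry of dataset coverage: with severely insufficient coverage the policy degenerates into imitating the dataset's mean behavior.
+- On D4RL MuJoCo and AntMaze it generally outperforms BC and offline SAC, and it has become one of the standard baselines for offline reinforcement learning.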
+
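+## Known failure boundaries
+- When the dataset has significant action-coverage holes, the conservative penalty can leave the policy stuck (see `failure_mode:offline_rl_extrapolation_error`).
+- When $\alpha$ is too large, the policy degenerates into imitating the dataset's average behavior.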
+
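+## Bitter-Lesson perspective
+By explicitly modeling the "untrustworthiness of out-of-distribution actions", CQL achieves genuine offline policy improvement without offloading this principled pessimism onto a hand-designed rule set. Compared with earlier offline methods that relied on hand-crafted whitelists, it compresses the human prior into a single scalar $\alpha$ and hands most of the work to a data-driven lower-bound estimate. Where compute permits large-scale offline training, this minimal-prior, maximal-data design is a concrete instance of the Bitter Lesson in offline reinforcement learning.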
+
+## Companion lab
+[`labs/rl_decision/lab_cql_offline_minigrid`](../../../labs/rl_decision/lab_cql_offline_minigrid/) provides side-by-side training of CQL against BC / offline SAC on MiniGrid, plus a visualization of Q-value overestimation.
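
Reviewer note: the conservative loss in the CQL card's math anchor is compact enough to sketch for the discrete-action case. The version below is an illustration under stated assumptions, not the official implementation: each state carries a list of Q-values over all actions, the expectation over $\hat\pi_\beta$ is approximated by the single logged action, and `td_targets` are precomputed Bellman targets. The names `cql_loss` and `logsumexp` are local helpers written for this note.

```python
import math
from typing import Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))) over a non-empty sequence."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def cql_loss(
    q_values: Sequence[Sequence[float]],
    data_actions: Sequence[int],
    td_targets: Sequence[float],
    alpha: float = 1.0,
) -> float:
    """alpha * conservative gap + 0.5 * squared Bellman error, batch-averaged."""
    n = len(q_values)
    if n == 0 or not (n == len(data_actions) == len(td_targets)):
        raise ValueError("inputs must be non-empty and of equal length")
    gap = 0.0
    bellman = 0.0
    for q_row, a, target in zip(q_values, data_actions, td_targets):
        # Pushes down Q over all actions while pushing up the logged action.
        gap += logsumexp(q_row) - q_row[a]
        bellman += 0.5 * (q_row[a] - target) ** 2
    return alpha * gap / n + bellman / n


# Toy usage: inflating the Q-value of an action absent from the data raises the loss.
flat = cql_loss([[0.0, 0.0]], [0], [0.0], alpha=1.0)
inflated = cql_loss([[0.0, 5.0]], [0], [0.0], alpha=1.0)
```

Because the log-sum-exp is never smaller than any single Q-value, the gap term is non-negative and grows when an unlogged action's Q-value is inflated, which is the mechanism the card describes.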