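diff --git a/docs/data/cards/extended/paper_pid_lagrangian.md b/docs/data/cards/extended/paper_pid_lagrangian.md new file mode 100644 index 0000000..1a7bd18 --- /dev/null +++ b/docs/data/cards/extended/paper_pid_lagrangian.md @@ -0,0 +1,56 @@ +--- +id: paper:pid_lagrangian +title: "Responsive Safety in RL by PID Lagrangian Methods" +title_zh: "PID-Lagrangian(PID 控制的乘子更新)" +kind: paper +tier: B +authors: [Stooke, A., Achiam, J., Abbeel, P.] +venue: "ICML 2020" +year: 2020 +topic: safety +phase: core +deep_links: + - {label: "arXiv 2007.03964", url: "https://arxiv.org/abs/2007.03964"} +--- + +# PID-Lagrangian (PID-controlled multiplier update) + +> Stooke et al. (2020) replace the pure gradient-ascent multiplier update in safe RL with a PID controller, so that the constraint response stays smoother whether the violation is growing or shrinking. It addresses the long-standing multiplier oscillation problem of [Lagrangian Safe RL](paper_lagrangian_safe_rl.md): pure gradient ascent is equivalent to an integral-only (I) controller, and PID adds the proportional and derivative terms to make the control more stable. + +## A minimal formula / Math anchor + +Treat the constraint violation $J^{C}(\pi) - d$ as the error signal $e(t)$; the PID controller updates the multiplier as + +$$ +\lambda(t) \;=\; K_P\,e(t)\ +\ K_I \int_0^t e(\tau)\,d\tau\ +\ K_D\,\frac{de(t)}{dt} +$$ + +- **P (proportional)**: pushes $\lambda$ up immediately when the constraint is violated, which speeds up the response. +- **I (integral)**: accumulates past violations, which is equivalent to the gradient ascent of the traditional Lagrangian method. +- **D (derivative)**: adjusts ahead of time based on the rate of change of the violation, which damps oscillation. + +In practice $\lambda \ge 0$ must hold, so $\lambda(t) = [\cdot]_+$. + +## Position in the graph + +PID-Lagrangian is a representative of the "engineering the multiplier update" line within [`paradigm:safe_rl`](paradigm_safe_rl.md), extending [Lagrangian Safe RL](paper_lagrangian_safe_rl.md) and [RCPO](paper_rcpo.md) (pure I controllers) to a PID controller. It responds to the core pain point of multiplier oscillation in [`insight:safety_emerges_from_constraint_lagrangian_not_reward_shaping`](insight_safety_emerges_from_constraint_lagrangian_not_reward_shaping.md), and through [`insight:alignment_is_constraint_satisfaction_over_generation`](insight_alignment_is_constraint_satisfaction_over_generation.md) it draws an analogy with preference alignment in RLHF: both are problems of "stable optimization under multiple constraints". + +## What actually matters in engineering + +- Tuning the PID parameters is more complex than tuning a single learning rate: $K_P, K_I, K_D$ need to be tuned jointly. Stooke et al. give a default configuration on Safety Gym. +- The D term is sensitive to noise: when the cost critic estimate is noisy, the D term can amplify oscillation, so the signal should be smoothed first (exponential moving average). +- PID tuning theory from control engineering can be borrowed: Ziegler-Nichols tuning has also been tried on Safe RL. +- In industry, an adaptive learning rate (Adam) plus a PID multiplier is a common combination, but Adam's second-moment estimate can interfere with the PID D term, so care is needed. +- 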
No official repository is linked here; implementations are commonly adapted from OpenAI safety-starter-agents. + +## Bitter-Lesson perspective + +On the Bitter Lesson axis, PID-Lagrangian is an engineering refinement: it does not challenge the underlying "constraint + dual" structure, and only sharpens the multiplier update rule with better control theory. Together with RCPO, it brings Lagrangian Safe RL to a level that is friendly to industrial deployment. + +## What to read next + +- [Lagrangian Safe RL](paper_lagrangian_safe_rl.md): direct predecessor +- [RCPO](paper_rcpo.md): the parallel reward-shaping line +- [CPO](paper_cpo_safe_rl.md): trust-region projection, for contrast +- [`insight:safety_emerges_from_constraint_lagrangian_not_reward_shaping`](insight_safety_emerges_from_constraint_lagrangian_not_reward_shaping.md) +- [`paradigm:safe_rl`](paradigm_safe_rl.md) diff --git a/docs/data/cards/extended/paper_rcpo.md b/docs/data/cards/extended/paper_rcpo.md new file mode 100644 index 0000000..10488dd --- /dev/null +++ b/docs/data/cards/extended/paper_rcpo.md @@ -0,0 +1,58 @@ +--- +id: paper:rcpo +title: "Reward Constrained Policy Optimization (RCPO)" +title_zh: "RCPO(奖励约束的策略优化)" +kind: paper +tier: B +authors: [Tessler, C., Mankowitz, D. J., Mannor, S.] +venue: "ICLR 2019" +year: 2018 +topic: safety +phase: core +deep_links: + - {label: "arXiv 1805.11074", url: "https://arxiv.org/abs/1805.11074"} +--- + +# RCPO (Reward Constrained Policy Optimization) + +> Tessler et al. (2018) propose Reward Constrained Policy Optimization (RCPO), which folds the constraint cost directly into the reward signal through a Lagrangian multiplier and then optimizes with a standard actor-critic. It simplifies the second-order projection of [CPO](paper_cpo_safe_rl.md) and is an early unifying framework for [Lagrangian Safe RL](paper_lagrangian_safe_rl.md), showing that "soft" constraints (reward shaping plus a multiplier) can come close to CPO's "hard" constraints in terms of convergence. + +## A minimal formula / Math anchor + +RCPO rewrites the CMDP as an equivalent scalar-reward problem + +$$ +\tilde R(s, a) \;=\; R(s, a)\ -\ \sum_i \lambda_i\,C_i(s, a) +$$ + +and then trains any actor-critic (A2C, PPO, SAC) on $\tilde R$ while performing dual ascent on $\lambda_i$: + +$$ +\lambda_i \;\leftarrow\; \big[\lambda_i\ +\ \eta_\lambda\,\big(J^{C_i}(\pi) - d_i\big)\big]_+ +$$ + +Tessler et al. prove that, under suitable learning-rate conditions, $(\pi, \lambda)$ converges to a saddle point of the CMDP. This gives a theoretical guarantee for "reward shaping plus a dynamic multiplier" as an engineering baseline for safe RL. + +## Position in the graph + +RCPO is an early unifying work on the Lagrangian line within [`paradigm:safe_rl`](paradigm_safe_rl.md), grounding the dual theory of [Altman 1999 CMDP](paper_altman_constrained_mdp.md) in the simplest possible reward shaping. It and [Lagrangian Safe 
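RL](paper_lagrangian_safe_rl.md) are parallel variants (differing in implementation details); it forms a "soft vs. hard constraint" contrast with [CPO](paper_cpo_safe_rl.md) and is further refined by [PID-Lagrangian](paper_pid_lagrangian.md) (a PID controller replaces gradient ascent). It implements [`move:apply_dual_lagrangian_to_safety_constraint`](move_apply_dual_lagrangian_to_safety_constraint.md). + +## What actually matters in engineering + +- The biggest risk of reward shaping is the chain from constraint violation to negative reward to a policy that "would rather do nothing than violate"; the scale of $C_i$ needs careful design. +- A multiplier learning rate $\eta_\lambda$ that is 1–2 orders of magnitude smaller than the policy learning rate is a common empirical setting. +- On benchmarks such as Safety Gym, RCPO and PPO-Lagrangian perform similarly, with neither consistently ahead across tasks. +- There is no official PyTorch repository; it is mostly implemented by adapting OpenAI safety-starter-agents, tianshou, d3rlpy, and similar codebases. +- License: varies by implementation. + +## Bitter-Lesson perspective + +RCPO sits close to the Bitter Lesson line: it does not require a second-order analytic solve (as CPO does), only turning the constraint into differentiable reward shaping plus simple multiplier ascent. This lets safe RL attach to any modern policy-gradient algorithm, a win for "general learner + simple constraint". + +## What to read next + +- [Lagrangian Safe RL](paper_lagrangian_safe_rl.md): parallel variant +- [CPO](paper_cpo_safe_rl.md): second-order projection, for contrast +- [PID-Lagrangian](paper_pid_lagrangian.md): further refinement of the multiplier update +- [Altman CMDP](paper_altman_constrained_mdp.md): theoretical foundation +- [`paradigm:safe_rl`](paradigm_safe_rl.md) diff --git a/docs/data/cards/extended/paper_safe_rl_carla.md b/docs/data/cards/extended/paper_safe_rl_carla.md new file mode 100644 index 0000000..8458397 --- /dev/null +++ b/docs/data/cards/extended/paper_safe_rl_carla.md @@ -0,0 +1,59 @@ +--- +id: paper:safe_rl_carla +title: "Safe Reinforcement Learning on Driving Benchmarks (CARLA / nuPlan)" +title_zh: "驾驶场景下的安全 RL(CARLA / nuPlan 实证)" +kind: paper +tier: B +authors: [Various — SafeDriver, ConstrainedDrive, Safe-Behavior-Cloning teams] +venue: "ICRA / CoRL / ITSC representative line" +year: 2022 +topic: safety +phase: frontier +deep_links: + - {label: "Safe RL for autonomous driving survey (arXiv 2305.10076)", url: "https://arxiv.org/abs/2305.10076"} + - {label: "arxiv-not-yet-available (specific paper index to be confirmed)", url: "https://arxiv.org/search/?query=safe+rl+CARLA"} +phase_note: "frontier" +--- + +# Safe RL in driving scenarios (CARLA / nuPlan empirical studies) + +> This card stands for a family of works that empirically evaluate [CPO](paper_cpo_safe_rl.md) / [Lagrangian Safe RL](paper_lagrangian_safe_rl.md) in closed loop on CARLA or nuPlan (such as 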
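SafeDriver, ConstrainedDrive, Safe-Behavior-Cloning), aligning the constrained-MDP framework directly with multi-constraint driving tasks that cover collisions, traffic violations, and comfort. It is the bridge that takes safe RL from Safety Gym to driving benchmarks. Note that this is a research thread rather than a single paper: there are dozens of related works on arXiv, and they are referenced here collectively as one graph node. + +## A minimal formula / Math anchor + +A typical CMDP setup for a driving task + +$$ +\max_\pi\ \mathbb E\!\left[\sum_t \gamma^t R_\text{progress}(s, a)\right]\quad \text{s.t.}\ \ +\begin{cases} +J^{C_\text{collision}}(\pi) \le 0.001 \\ +J^{C_\text{lane\_violation}}(\pi) \le 0.05 \\ +J^{C_\text{comfort}}(\pi) \le 1.0\,\text{m/s}^2 +\end{cases} +$$ + +The reward drives progress, while the constraints cover collision probability, violation rate, and maximum acceleration respectively. Any safe RL algorithm such as PPO-Lagrangian, CPO, or PID-Lagrangian can be used to solve it. + +## Position in the graph + +Driving safe-RL work projects [`paradigm:safe_rl`](paradigm_safe_rl.md) onto the [CARLA Leaderboard](paper_carla_leaderboard.md) / [nuPlan](paper_nuplan.md) closed-loop benchmarks. It complements [`paradigm:counterfactual_data_centric_safety`](paradigm_counterfactual_data_centric_safety.md): on the data side, counterfactual generation fills in rare samples; on the algorithm side, safe RL imposes the constraints. It responds to [`problem:exploration_in_safety_critical_systems`](problem_exploration_in_safety_critical_systems.md) (simulation allows "cost-free exploration", but safe RL makes sim-to-real smoother) and [`problem:reward_specification_for_safe_polite_driving`](problem_reward_specification_for_safe_polite_driving.md) (making implicit requirements such as "courtesy" explicit as constraints). + +## What actually matters in engineering + +- Simulator assumptions are the key risk: the NPC behavior in CARLA may leave a policy learned by safe RL poorly adapted to real road conditions. +- The multipliers for competing objectives (collision vs. progress vs. comfort) play against each other; it is common for one constraint to be badly violated early in training and only recovered later. +- On a benchmark driven by real logs such as nuPlan, ground-truth labels for "constraint violation" must be inferred from the logs, which is hard to do precisely in practice. +- There is no unified reference implementation; researchers typically customize on top of stable-baselines3 in a safety-gym style. +- Alignment with ISO 21448 SOTIF / RSS (Responsibility-Sensitive Safety) is key for industrial deployment, but academic papers rarely discuss it explicitly. + +## Bitter-Lesson perspective + +On the Bitter Lesson axis, driving safe RL is middle ground: it accepts the unified RL + constraint framework (general), but the constraints themselves (collision cost, violation rules) are still listed by hand. Handing constraint discovery entirely to the learner (for example inverse constrained RL) is the more radical Bitter Lesson direction, but it is not yet mature for driving. + +## What to read next + +- [CPO](paper_cpo_safe_rl.md): trust region plus constraint foundation +- [Lagrangian Safe RL](paper_lagrangian_safe_rl.md): dual baseline +- [`paradigm:safe_rl`](paradigm_safe_rl.md) +- [`problem:exploration_in_safety_critical_systems`](problem_exploration_in_safety_critical_systems.md) +- 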
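[`paradigm:counterfactual_data_centric_safety`](paradigm_counterfactual_data_centric_safety.md): data-side complement diff --git a/docs/data/cards/extended/paper_schulman2015_trpo.md b/docs/data/cards/extended/paper_schulman2015_trpo.md new file mode 100644 index 0000000..f7fd567 --- /dev/null +++ b/docs/data/cards/extended/paper_schulman2015_trpo.md @@ -0,0 +1,54 @@ +--- +id: paper:schulman2015_trpo +title: "Trust Region Policy Optimization (TRPO)" +title_zh: "TRPO(信赖域策略优化)" +kind: paper +tier: S +authors: [Schulman, J., Levine, S., Moritz, P., Jordan, M., Abbeel, P.] +venue: "ICML 2015" +year: 2015 +topic: deep_rl +phase: core +deep_links: + - {label: "arXiv 1502.05477", url: "https://arxiv.org/abs/1502.05477"} + - {label: "openai/spinningup TRPO", url: "https://spinningup.openai.com/en/latest/algorithms/trpo.html"} +--- + +# TRPO (Trust Region Policy Optimization) + +> TRPO restricts each policy improvement step to a trust region in which the KL divergence between the new and old policies does not exceed $\delta$, and gives a theoretical guarantee of approximately monotonic improvement. Its second-order solve (conjugate gradient + natural gradient) has a high engineering barrier, but it laid the "trust region" geometric foundation for later algorithms such as [PPO](paper_schulman2017_ppo.md) and [CPO](paper_cpo_safe_rl.md); much of modern constrained policy optimization builds on TRPO's theory. + +## A minimal formula / Math anchor + +Each TRPO policy update solves + +$$ +\max_\theta\ \mathbb E_{s, a \sim \pi_{\theta_\text{old}}}\!\left[\frac{\pi_\theta(a|s)}{\pi_{\theta_\text{old}}(a|s)}\,A^{\pi_{\theta_\text{old}}}(s, a)\right] \quad \text{s.t.}\ \ \mathbb E_s\big[D_{\mathrm{KL}}(\pi_{\theta_\text{old}} \| \pi_\theta)\big] \le \delta +$$ + +The core theorem on approximately monotonic improvement: $J(\pi_\theta) - J(\pi_{\theta_\text{old}}) \ge L(\pi_\theta) - C\,D_{\mathrm{KL}}^\text{max}$, where $L$ is the surrogate objective and $C$ is a constant that depends on the discount factor. Updating the policy under the KL constraint is, to second order, equivalent to taking a trust-region step along the natural-gradient direction of the Fisher information metric. + +## Position in the graph + +TRPO is a key pillar of [`paradigm:model_free_rl`](paradigm_model_free_rl.md), turning "how to keep policy improvement stable" from intuition into a geometric constraint with theoretical guarantees. It feeds directly into [PPO](paper_schulman2017_ppo.md) (which approximates the KL constraint with clipping), [CPO](paper_cpo_safe_rl.md) (which adds constraints within the TRPO framework), and [MPO](paper_mpo.md) (which replaces the second-order solve with an E-M algorithm). It is the original template for the move [`move:trust_region_step_for_monotonic_improvement`](move_trust_region_step_for_monotonic_improvement.md); much of modern constrained policy optimization builds on TRPO's geometry. + +## What actually matters in engineering + +- The second-order solve relies on the Fisher 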
information matrix via a conjugate-gradient approximation; implementation complexity is roughly an order of magnitude higher than for PPO. +- Choosing $\delta$: typical values are $\delta = 0.01 \sim 0.05$. Too large loses the monotonicity guarantee; too small slows convergence. +- Empirically, PPO's first-order clipping matches TRPO on most tasks, which is why PPO replaced TRPO as the industrial baseline. +- In the constrained setting (CPO), however, TRPO's second-order framework remains hard to replace: the joint second-order analytic solution of the KL constraint and the cost constraint is the core of CPO. +- Repository references: joschu/modular_rl is Schulman's original code; openai/baselines provides a classic implementation; CleanRL is a teaching-friendly version. +- License: varies by implementation, usually MIT. + +## Bitter-Lesson perspective + +On the Bitter Lesson axis, TRPO leans theoretical: it explicitly models "the stability of policy improvement" with second-order geometric tools. PPO simplifies it into an engineering-tractable version with a first-order approximation, a win for the Bitter Lesson in policy gradients. Yet TRPO's theoretical value has not been superseded: constrained RL methods such as CPO still need TRPO's second-order framework. + +## What to read next + +- [PPO](paper_schulman2017_ppo.md): first-order approximation +- [CPO](paper_cpo_safe_rl.md): TRPO with constraints +- [`move:trust_region_step_for_monotonic_improvement`](move_trust_region_step_for_monotonic_improvement.md) +- [SAC](paper_sac.md): an entirely different source of stability (maximum entropy) +- [`paradigm:model_free_rl`](paradigm_model_free_rl.md) diff --git a/docs/data/generated/wave_e_stubs.json b/docs/data/generated/wave_e_stubs.json index bb3b8e6..980acd6 100644 --- a/docs/data/generated/wave_e_stubs.json +++ b/docs/data/generated/wave_e_stubs.json @@ -10,7 +10,7 @@ "topic": "foundation_models", "phase": "core", "year": 2022, - "summary_zh": "Chinchilla 通过在 400 余次训练运行上拟合损失曲线,给出了在固定算力下参数量与训练 token 数应该同步放大的最优比例(约 1:20)。它揭示了 GPT-3 等早期大模型把算力过度投向参数而 token 不足,是 LLM scaling 的"compute-optimal" 修正标尺。" + "summary_zh": "Chinchilla 通过在 400 余次训练运行上拟合损失曲线,给出了在固定算力下参数量与训练 token 数应该同步放大的最优比例(约 1:20)。它揭示了 GPT-3 等早期大模型把算力过度投向参数而 token 不足,是 LLM scaling 的 compute-optimal 修正标尺。" }, { "id": "paper:watkins_dayan_qlearning", @@ -43,7 +43,7 @@ "topic": "foundation_models", "phase": "core", "year": 2020, - "summary_zh": "DDPM 把生成模型重新表述为"逐步加噪后再学习反向去噪"的过程,把图像生成的可训练目标压成预测每一步的噪声。它的简洁性与稳定性让扩散模型在两年内取代 GAN 成为图像、视频、动作生成的默认范式,也奠定了 Diffusion Policy、世界模型视频生成的方法学基础。" + "summary_zh": "DDPM 把生成模型重新表述为逐步加噪后再学习反向去噪的过程,把图像生成的可训练目标压成预测每一步的噪声。它的简洁性与稳定性让扩散模型在两年内取代 GAN 成为图像、视频、动作生成的默认范式,也奠定了 Diffusion Policy、世界模型视频生成的方法学基础。" }, { "id": "paper:lora", @@ -76,7 +76,7 @@ "topic": "deep_rl", "phase": "core", "year": 2015, - "summary_zh": "TRPO 
把策略改进步骤限制在新旧策略 KL 散度不超过 δ 的信赖域内,并给出近似单调改进的理论保证。它的二阶求解(共轭梯度 + 自然梯度)虽然工程门槛高,但为 PPO、CPO 等后续算法奠定了"信赖域"几何基石。" + "summary_zh": "TRPO 把策略改进步骤限制在新旧策略 KL 散度不超过 δ 的信赖域内,并给出近似单调改进的理论保证。它的二阶求解(共轭梯度 + 自然梯度)虽然工程门槛高,但为 PPO、CPO 等后续算法奠定了信赖域几何基石。" }, { "id": "paper:rcpo", @@ -125,7 +125,7 @@ {"source": "paper:ddpm", "target": "paper:diffusion_policy_chi2023", "rel": "prereq"}, {"source": "paper:ddpm", "target": "insight:residual_learning_unlocks_arbitrary_depth", "rel": "composes"}, - {"source": "paper:ddpm", "target": "move:add_noise_then_denoise_for_score_based_generation", "rel": "composes"}, + {"source": "paper:ddpm", "target": "move:diffusion_denoise_sampling", "rel": "composes"}, {"source": "paper:lora", "target": "insight:residual_learning_unlocks_arbitrary_depth", "rel": "composes"}, {"source": "paper:lora", "target": "paper:llama", "rel": "parallel"}, diff --git a/docs/data/graph_extended.json b/docs/data/graph_extended.json index 6694c60..7c20277 100644 --- a/docs/data/graph_extended.json +++ b/docs/data/graph_extended.json @@ -257,7 +257,7 @@ "phase": "prereq", "year": 2020, "card": "paper_gpt3.md", - "degree": 26 + "degree": 27 }, { "id": "paper:schulman2017_ppo", @@ -269,7 +269,7 @@ "phase": "core", "year": 2017, "card": "paper_schulman2017_ppo.md", - "degree": 12 + "degree": 13 }, { "id": "paper:mnih2015_dqn", @@ -281,7 +281,7 @@ "phase": "prereq", "year": 2015, "card": "paper_mnih2015_dqn.md", - "degree": 11 + "degree": 12 }, { "id": "paper:ross2011_dagger", @@ -1076,7 +1076,7 @@ "year": 2023, "summary_zh": "Diffusion Policy 把模仿学习中的策略改写成一个条件扩散模型,输入是最近若干帧观察,输出是未来若干步的动作序列。扩散过程的多模态表达能力让它能优雅处理人类示教中的动作多解性,从而在多种机器人操控任务上把成功率显著推高。", "label": "Diffusion Policy", - "degree": 5 + "degree": 6 }, { "id": "paper:redq", @@ -1100,7 +1100,7 @@ "year": 2020, "summary_zh": "CQL 在离线强化学习的 Q 损失里额外加了一个项,使得对数据外动作的 Q 估计被显式压低,从而得到真实 Q 的下界。这种保守化让从离线数据学到的策略在部署时不再倾向于挑那些没见过却看起来高分的动作,极大地缓解了离线 RL 的分布偏移问题。", "label": "Conservative Q-Learning", - "degree": 5 + "degree": 6 }, { "id": "paper:iql", @@ 
-1268,7 +1268,7 @@ "year": 2017, "summary_zh": "CPO 把 TRPO 的信赖域思想推广到带约束的马尔可夫决策过程,在每一步策略更新里同时保证新的策略在期望回报上有改进,并且不会让某些预期约束代价超过预算。它是第一个在大规模深度强化学习里直接对安全约束做硬性保证的算法,奠定了后续安全 RL 的范式。", "label": "Constrained Policy Optimization", - "degree": 6 + "degree": 8 }, { "id": "paper:lagrangian_safe_rl", @@ -1280,7 +1280,7 @@ "year": 2019, "summary_zh": "Lagrangian 风格的安全强化学习把每个约束写成一个不等式,引入对偶变量并和策略参数一起做交替优化。这种方法实现简单,可以套在 PPO、SAC 等任意策略梯度算法上,是安全 RL 的工业基线,但对超参数和奖励 / 约束尺度比较敏感。", "label": "Lagrangian Safe RL", - "degree": 6 + "degree": 9 }, { "id": "paper:shielded_rl", @@ -1652,7 +1652,7 @@ "year": 2015, "summary_zh": "把策略更新限制在新旧策略 KL 散度不超过给定阈值的范围内,给出有理论保证的近似单调改进。从 TRPO 的硬约束到 PPO 的截断比都是这一动作的不同实现形式。", "label": "Trust region step for monotonic improvement", - "degree": 4 + "degree": 5 }, { "id": "move:expectile_or_quantile_target_for_distributional_robustness", @@ -1832,7 +1832,7 @@ "year": 2019, "summary_zh": "RL 需要主动尝试未知动作以学习更好的策略,但在驾驶或机器人手术等领域,错误探索代价无法承受。如何在硬安全约束下保持有效探索是安全 RL 与控制理论的长期共同难题。", "label": "Exploration in safety-critical systems", - "degree": 3 + "degree": 4 }, { "id": "problem:planning_horizon_vs_compute_budget_tradeoff", @@ -1882,6 +1882,18 @@ "label": "Imitation learning alone cannot recover from compounding errors", "degree": 1 }, + { + "id": "insight:world_model_as_inner_simulator_unlocks_long_horizon_planning", + "label_zh": "世界模型作为内部模拟器解锁长时规划", + "kind": "insight", + "tier": "insight", + "topic": "world_models", + "phase": "core", + "year": 2018, + "summary_zh": "一旦智能体内部拥有可微、可分支的环境近似,规划就不再受真实交互成本限制,可以在想象中跑成千上万次试错。这条洞见把 RL 从样本贵的范式带向了样本高效的范式。", + "label": "World model as inner simulator unlocks long-horizon planning", + "degree": 4 + }, { "id": "insight:human_demonstrations_compress_implicit_reward_function", "label_zh": "人类示教其实把隐式奖励压缩在轨迹里", @@ -1904,7 +1916,7 @@ "year": 2017, "summary_zh": "把碰撞惩罚塞进标量奖励里会被到达目标的回报抵消,策略仍可能选择高风险路径。把安全单独写成约束并用 Lagrangian 优化,让安全要求与性能要求分别有各自的对偶变量调控,是更稳健的安全 RL 范式。", "label": "Safety emerges from constraint Lagrangian not reward shaping", - 
"degree": 3 + "degree": 4 }, { "id": "insight:offline_rl_is_actually_constrained_dynamic_programming", @@ -1916,7 +1928,7 @@ "year": 2021, "summary_zh": "之所以 CQL、IQL、Cal-QL 等离线 RL 方法都奏效,是因为它们用不同方式把价值迭代约束在数据集支撑内。一旦明白离线 RL 等价于在数据集分布上做带约束的动态规划,就能从单一框架推出大量算法变体。", "label": "Offline RL is actually constrained dynamic programming", - "degree": 4 + "degree": 6 }, { "id": "insight:tokenized_trajectories_let_planning_borrow_from_language_modeling", @@ -2024,7 +2036,7 @@ "year": 2017, "summary_zh": "安全 RL 把碰撞或代价违规等硬约束显式建模成约束马尔可夫决策过程,并用 Lagrangian、信赖域或形式化屏蔽确保策略改进的同时不破坏安全。它处于纯 RL 与控制论的交叉地带。", "label": "Safe RL", - "degree": 4 + "degree": 7 }, { "id": "paradigm:sequence_modeling_for_decision", @@ -2096,7 +2108,7 @@ "year": 2023, "summary_zh": "LLaMA 是 Meta 发布的开源大语言模型权重族,第一次以学术许可让研究者获得百亿到千亿参数规模的强基线。它直接催生了 Alpaca、Vicuna、Llama-2-Chat 等指令微调分支,使 VLM 与 VLA 研究不必再依赖封闭 API 就能复现训练流程。", "label": "LLaMA family", - "degree": 6 + "degree": 7 }, { "id": "paper:mistral", @@ -2996,7 +3008,7 @@ "year": 2022, "summary_zh": "经验上某些能力例如多步推理、零样本指令跟随只在模型规模与数据量越过某个阈值后才突然出现,而无法在小模型上靠精调获得。这一洞察支撑了苦涩教训所倡导的把工程资源投入到通用扩展而非领域规则上的方法论选择。", "label": "Scaling unlocks emergent capabilities", - "degree": 3 + "degree": 4 }, { "id": "insight:world_model_video_diffusion_is_implicit_physics_engine", @@ -3256,7 +3268,7 @@ "summary_zh": "扩散方法把生成问题转化为反向去噪过程,模型只需学习预测每一步的噪声或得分函数,并在推断时迭代采样得到样本。这一移动在图像生成中由 DDPM 奠基,在视频中扩展为视频扩散,在控制领域被 Diffusion Policy 重新解释为以条件为状态、以动作为样本的策略学习。在自动驾驶中,扩散过程可以同时生成多模态轨迹候选和反事实场景,统一了生成与决策两个传统上分离的问题。", "building_blocks": [], "label": "Score-based Denoising Sampling", - "degree": 3 + "degree": 4 }, { "id": "move:dual_system_fast_slow", @@ -3402,7 +3414,7 @@ "paper:he2015_resnet" ], "label": "Residual Learning Unlocks Arbitrary Depth", - "degree": 3 + "degree": 5 }, { "id": "insight:masked_prediction_yields_self_supervised_signal", @@ -3549,7 +3561,7 @@ "paper:gpt3" ], "label": "Scaling Laws Predict Capability Emergence", - "degree": 7 + "degree": 8 }, { "id": 
"insight:foundation_pretraining_decouples_data_from_task", @@ -3584,6 +3596,22 @@ "label": "Test-Time Compute Can Substitute Train-Time via Search", "degree": 3 }, + { + "id": "insight:imitation_data_compresses_unspecified_reward", + "label_zh": "模仿数据压缩了未明示的奖励函数", + "kind": "insight", + "tier": "insight", + "topic": "deep_rl", + "phase": "core", + "year": 2011, + "summary_zh": "模仿学习的洞见是专家演示隐式编码了一个研究者难以手工指定的奖励函数,从而避开了奖励设计的难题。它在 ALVINN 中首次用于驾驶,在 GAIL、AIRL 中通过对抗逆强化学习显式提取了奖励,在 DriveGPT 系列中被推到大规模驾驶日志预训练。对于自动驾驶研究的含义是,与其试图手工写出涵盖舒适、安全、效率、社会礼仪的复合奖励,不如先用大规模驾驶日志做模仿预训练,再用偏好对齐与少量精细奖励做微调。", + "building_blocks": [ + "concept:imitation_learning", + "paper:ross2011_dagger" + ], + "label": "Imitation Data Compresses an Unspecified Reward", + "degree": 5 + }, { "id": "insight:world_models_let_planning_be_done_in_imagination", "label_zh": "世界模型让规划在想象中进行", @@ -4048,7 +4076,7 @@ "concept:covariate_shift", "move:dataset_aggregation", "course:cs285", - "insight:human_demonstrations_compress_implicit_reward_function" + "insight:imitation_data_compresses_unspecified_reward" ], "label": "Trace: Dataset Aggregation for Imitation", "degree": 6 @@ -4108,7 +4136,7 @@ "move:diffusion_denoise_sampling", "paper:diffuser", "insight:diffusion_unifies_generation_and_decision", - "insight:human_demonstrations_compress_implicit_reward_function" + "insight:imitation_data_compresses_unspecified_reward" ], "label": "Trace: Diffusion Policy as Score-Based Action Sampler", "degree": 6 @@ -4172,7 +4200,7 @@ "concept:imitation_learning", "move:tokenize_modalities", "insight:tokenization_collapses_modality_gap", - "insight:human_demonstrations_compress_implicit_reward_function" + "insight:imitation_data_compresses_unspecified_reward" ], "label": "Trace: Decision Transformer (Offline RL via Sequence Modeling)", "degree": 8 @@ -6356,6 +6384,126 @@ "summary_zh": "这一范式主张:与其手工建模仿真世界,不如把真实驾驶日志神经重建成可重新渲染、可编辑的数字孪生,然后在其中扩增、对抗、闭环。它把\"采集-训练-评估\"三件事统一进同一套表示,正成为头部公司投入最大的方向之一。", "label": "Neural scene reconstruction as the 
simulation engine", "degree": 5 + }, + { + "id": "paper:chinchilla", + "label": "Chinchilla", + "label_zh": "Chinchilla(compute-optimal LLM scaling laws)", + "kind": "paper", + "tier": "A", + "topic": "foundation_models", + "phase": "core", + "year": 2022, + "summary_zh": "Chinchilla 通过在 400 余次训练运行上拟合损失曲线,给出了在固定算力下参数量与训练 token 数应该同步放大的最优比例(约 1:20)。它揭示了 GPT-3 等早期大模型把算力过度投向参数而 token 不足,是 LLM scaling 的 compute-optimal 修正标尺。", + "degree": 3 + }, + { + "id": "paper:watkins_dayan_qlearning", + "label": "Watkins & Dayan Q-learning", + "label_zh": "Watkins & Dayan Q 学习(收敛性证明)", + "kind": "paper", + "tier": "S", + "topic": "rl_foundations", + "phase": "prereq", + "year": 1992, + "summary_zh": "Watkins 1989 博士论文提出 Q-learning,Watkins & Dayan 1992 证明在所有 (s, a) 被无限访问且学习率满足 Robbins-Monro 条件下,Q 值会以概率 1 收敛到最优 Q*。这是几乎所有现代深度 RL 算法的理论起点。", + "degree": 2 + }, + { + "id": "paper:bear", + "label": "BEAR", + "label_zh": "BEAR(行为约束的离线 RL)", + "kind": "paper", + "tier": "B", + "topic": "deep_rl", + "phase": "core", + "year": 2019, + "summary_zh": "BEAR (Bootstrapping Error Accumulation Reduction) 在离线 RL 的 actor 损失里加入对行为策略的 MMD 距离约束,让学到的策略不偏离数据集支撑太远。它是 BCQ 之后离线 RL 显式行为约束路线的代表,启发了后续 CQL、IQL 的设计。", + "degree": 2 + }, + { + "id": "paper:ddpm", + "label": "DDPM", + "label_zh": "DDPM(去噪扩散概率模型)", + "kind": "paper", + "tier": "S", + "topic": "foundation_models", + "phase": "core", + "year": 2020, + "summary_zh": "DDPM 把生成模型重新表述为逐步加噪后再学习反向去噪的过程,把图像生成的可训练目标压成预测每一步的噪声。它的简洁性与稳定性让扩散模型在两年内取代 GAN 成为图像、视频、动作生成的默认范式,也奠定了 Diffusion Policy、世界模型视频生成的方法学基础。", + "degree": 3 + }, + { + "id": "paper:lora", + "label": "LoRA", + "label_zh": "LoRA(低秩适配的高效微调)", + "kind": "paper", + "tier": "A", + "topic": "foundation_models", + "phase": "core", + "year": 2021, + "summary_zh": "LoRA 把大模型权重 W 的微调改写成 W + B·A 的低秩残差形式,其中 A、B 远小于 W。这使百亿参数模型可以在单卡几 GB 显存里完成下游微调,是开源大模型生态在学界规模化的关键工程使能。", + "degree": 2 + }, + { + "id": "paper:altman_constrained_mdp", + "label": "Altman Constrained MDP", + "label_zh": "Altman 1999《约束 MDP》", + 
"kind": "paper", + "tier": "A", + "topic": "safety", + "phase": "prereq", + "year": 1999, + "summary_zh": "Altman 1999 的专著《约束马尔可夫决策过程》系统给出 CMDP 的形式化、可行性条件、对偶理论与求解算法。它是 CPO、Lagrangian Safe RL、RCPO、PID-Lagrangian 等所有现代安全 RL 算法的理论基底。", + "degree": 3 + }, + { + "id": "paper:schulman2015_trpo", + "label": "TRPO", + "label_zh": "TRPO(信赖域策略优化)", + "kind": "paper", + "tier": "S", + "topic": "deep_rl", + "phase": "core", + "year": 2015, + "summary_zh": "TRPO 把策略改进步骤限制在新旧策略 KL 散度不超过 δ 的信赖域内,并给出近似单调改进的理论保证。它的二阶求解(共轭梯度 + 自然梯度)虽然工程门槛高,但为 PPO、CPO 等后续算法奠定了信赖域几何基石。", + "degree": 3 + }, + { + "id": "paper:rcpo", + "label": "RCPO", + "label_zh": "RCPO(奖励约束的策略优化)", + "kind": "paper", + "tier": "B", + "topic": "safety", + "phase": "core", + "year": 2018, + "summary_zh": "Tessler 等 2018 提出 Reward Constrained Policy Optimization (RCPO),把约束代价直接通过 Lagrangian 乘子加进奖励信号,再用标准 actor-critic 优化。它简化了 CPO 的二阶投影,是 Lagrangian Safe RL 的早期统一框架。", + "degree": 2 + }, + { + "id": "paper:pid_lagrangian", + "label": "PID-Lagrangian", + "label_zh": "PID-Lagrangian(PID 控制的乘子更新)", + "kind": "paper", + "tier": "B", + "topic": "safety", + "phase": "core", + "year": 2020, + "summary_zh": "Stooke 等 2020 把安全 RL 中的乘子更新从纯梯度上升换成 PID 控制器,让约束响应在违反扩大或缩小时都更平滑。它解决了 Lagrangian Safe RL 长期存在的乘子震荡问题,是工业部署友好的安全 RL 改进。", + "degree": 2 + }, + { + "id": "paper:safe_rl_carla", + "label": "Safe RL on CARLA", + "label_zh": "驾驶场景下的安全 RL(CARLA / nuPlan 实证)", + "kind": "paper", + "tier": "B", + "topic": "safety", + "phase": "frontier", + "year": 2022, + "summary_zh": "代表一类把 CPO / Lagrangian Safe RL 在 CARLA 或 nuPlan 闭环上做实证的工作(如 SafeDriver、ConstrainedDrive),把约束 MDP 框架与碰撞、违规、舒适度多约束驾驶任务直接对齐。它是安全 RL 从 Safety Gym 走向驾驶基准的桥梁。", + "degree": 2 } ], "edges": [ @@ -7745,17 +7893,17 @@ "rel": "manifests" }, { - "source": "insight:world_models_let_planning_be_done_in_imagination", + "source": "insight:world_model_as_inner_simulator_unlocks_long_horizon_planning", "target": "paper:world_models", "rel": "manifests" }, { - "source": 
"insight:world_models_let_planning_be_done_in_imagination", + "source": "insight:world_model_as_inner_simulator_unlocks_long_horizon_planning", "target": "paper:dreamer_v3", "rel": "manifests" }, { - "source": "insight:world_models_let_planning_be_done_in_imagination", + "source": "insight:world_model_as_inner_simulator_unlocks_long_horizon_planning", "target": "paper:muzero", "rel": "manifests" }, @@ -8036,7 +8184,7 @@ }, { "source": "problem:long_horizon_credit_assignment_in_driving", - "target": "insight:world_models_let_planning_be_done_in_imagination", + "target": "insight:world_model_as_inner_simulator_unlocks_long_horizon_planning", "rel": "motivates" }, { @@ -9331,12 +9479,12 @@ }, { "source": "concept:imitation_learning", - "target": "insight:human_demonstrations_compress_implicit_reward_function", + "target": "insight:imitation_data_compresses_unspecified_reward", "rel": "composes" }, { "source": "paper:ross2011_dagger", - "target": "insight:human_demonstrations_compress_implicit_reward_function", + "target": "insight:imitation_data_compresses_unspecified_reward", "rel": "composes" }, { @@ -10150,7 +10298,7 @@ "rel": "composes" }, { - "source": "insight:human_demonstrations_compress_implicit_reward_function", + "source": "insight:imitation_data_compresses_unspecified_reward", "target": "validation:trace_dataset_aggregation_for_imitation", "rel": "composes" }, @@ -10255,7 +10403,7 @@ "rel": "composes" }, { - "source": "insight:human_demonstrations_compress_implicit_reward_function", + "source": "insight:imitation_data_compresses_unspecified_reward", "target": "validation:trace_diffusion_policy_as_score_based_action_sampler", "rel": "composes" }, @@ -10380,7 +10528,7 @@ "rel": "composes" }, { - "source": "insight:human_demonstrations_compress_implicit_reward_function", + "source": "insight:imitation_data_compresses_unspecified_reward", "target": "validation:trace_decision_transformer_offline_rl_via_sequence_modeling", "rel": "composes" }, @@ -12689,6 
+12837,126 @@ "target": "paper:clip", "rel": "covers" }, + { + "source": "paper:gpt3", + "target": "paper:chinchilla", + "rel": "extends" + }, + { + "source": "paper:chinchilla", + "target": "insight:scaling_laws_predict_capability_emergence", + "rel": "composes" + }, + { + "source": "paper:chinchilla", + "target": "insight:scaling_data_unlocks_capabilities_not_present_in_smaller_models", + "rel": "composes" + }, + { + "source": "paper:watkins_dayan_qlearning", + "target": "paper:mnih2015_dqn", + "rel": "prereq" + }, + { + "source": "paper:watkins_dayan_qlearning", + "target": "insight:offline_rl_is_actually_constrained_dynamic_programming", + "rel": "composes" + }, + { + "source": "paper:bear", + "target": "paper:cql", + "rel": "prereq" + }, + { + "source": "paper:bear", + "target": "insight:offline_rl_is_actually_constrained_dynamic_programming", + "rel": "composes" + }, + { + "source": "paper:ddpm", + "target": "paper:diffusion_policy_chi2023", + "rel": "prereq" + }, + { + "source": "paper:ddpm", + "target": "insight:residual_learning_unlocks_arbitrary_depth", + "rel": "composes" + }, + { + "source": "paper:ddpm", + "target": "move:diffusion_denoise_sampling", + "rel": "composes" + }, + { + "source": "paper:lora", + "target": "insight:residual_learning_unlocks_arbitrary_depth", + "rel": "composes" + }, + { + "source": "paper:lora", + "target": "paper:llama", + "rel": "parallel" + }, + { + "source": "paper:altman_constrained_mdp", + "target": "paper:cpo_safe_rl", + "rel": "prereq" + }, + { + "source": "paper:altman_constrained_mdp", + "target": "paper:lagrangian_safe_rl", + "rel": "prereq" + }, + { + "source": "paper:altman_constrained_mdp", + "target": "paradigm:safe_rl", + "rel": "composes" + }, + { + "source": "paper:schulman2015_trpo", + "target": "paper:schulman2017_ppo", + "rel": "prereq" + }, + { + "source": "paper:schulman2015_trpo", + "target": "paper:cpo_safe_rl", + "rel": "prereq" + }, + { + "source": "paper:schulman2015_trpo", + "target": 
"move:trust_region_step_for_monotonic_improvement", + "rel": "composes" + }, + { + "source": "paper:rcpo", + "target": "paper:lagrangian_safe_rl", + "rel": "parallel" + }, + { + "source": "paper:rcpo", + "target": "paradigm:safe_rl", + "rel": "composes" + }, + { + "source": "paper:pid_lagrangian", + "target": "paper:lagrangian_safe_rl", + "rel": "extends" + }, + { + "source": "paper:pid_lagrangian", + "target": "insight:safety_emerges_from_constraint_lagrangian_not_reward_shaping", + "rel": "composes" + }, + { + "source": "paper:safe_rl_carla", + "target": "paradigm:safe_rl", + "rel": "composes" + }, + { + "source": "paper:safe_rl_carla", + "target": "problem:exploration_in_safety_critical_systems", + "rel": "motivates" + }, { "source": "validation:trace_unified_planning_oriented_e2e_driving", "target": "paradigm:differentiable_end_to_end_imitation", diff --git a/docs/data/graph_extended.stats.json b/docs/data/graph_extended.stats.json index c0d57b8..982a258 100644 --- a/docs/data/graph_extended.stats.json +++ b/docs/data/graph_extended.stats.json @@ -1,8 +1,8 @@ { - "node_count": 489, - "edge_count": 1416, + "node_count": 499, + "edge_count": 1440, "by_kind": { - "paper": 174, + "paper": 184, "channel": 3, "course": 2, "essay": 1, @@ -16,9 +16,9 @@ }, "by_tier": { "spine": 14, - "S": 36, - "A": 71, - "B": 59, + "S": 39, + "A": 74, + "B": 63, "concept": 25, "lab": 11, "move": 129, @@ -34,14 +34,14 @@ "ssl_vision": 37, "math_foundations": 14, "companion_media": 2, - "rl_foundations": 8, - "deep_rl": 78, + "rl_foundations": 9, + "deep_rl": 80, "meta_philosophy": 3, "world_models": 32, "planning": 15, "control": 7, - "safety": 11, - "foundation_models": 15, + "safety": 15, + "foundation_models": 18, "alignment": 4, "llm_agent": 14, "reasoning": 8, @@ -57,17 +57,17 @@ "scene_understanding": 26 }, "by_rel": { - "prereq": 82, + "prereq": 89, "covers": 283, - "parallel": 108, + "parallel": 110, "contrasts": 23, - "extends": 35, + "extends": 37, "feeds": 88, "implements": 
14, "manifests": 168, "enables": 58, - "composes": 343, - "motivates": 143, + "composes": 355, + "motivates": 144, "validates": 64, "unsolved_by": 7 }, @@ -77,6 +77,7 @@ "foundation_axis": 93, "insights_and_validations": 69, "methodology_axis": 84, - "perception_axis": 75 + "perception_axis": 75, + "wave_e_stubs": 10 } } diff --git a/docs/data/layout_positions.json b/docs/data/layout_positions.json index 3deb3a6..b929756 100644 --- a/docs/data/layout_positions.json +++ b/docs/data/layout_positions.json @@ -1 +1 @@ -{"channel:3blue1brown":[214.91,83.558,89.998],"channel:ez_encoder_academy":[-51.848,102.702,102.514],"channel:mu_li_bilibili":[85.309,50.305,86.871],"concept:actor_critic":[-93.042,54.219,-416.842],"concept:bellman_eq":[6.969,-9.145,-356.63],"concept:bev":[142.675,200.545,-13.566],"concept:cot":[-150.903,-23.104,68.58],"concept:counterfactual":[-195.474,-151.583,-49.496],"concept:covariate_shift":[-165.127,206.853,-287.824],"concept:detr_query":[185.645,188.586,24.061],"concept:dqn":[-53.217,156.992,-366.803],"concept:imitation_learning":[-54.654,184.979,-112.837],"concept:mdp":[-35.766,2.446,-344.733],"concept:meta_action":[-213.885,-118.763,-56.173],"concept:policy_gradient":[-106.469,27.167,-355.412],"concept:ppo":[-129.814,114.886,-387.207],"concept:replay_buffer":[-94.731,158.571,-388.756],"concept:rlhf":[-329.485,118.865,-208.336],"concept:scaling_vs_knowledge":[-47.048,12.808,0.851],"concept:self_attention":[175.343,102.02,119.556],"concept:spiking_nn":[121.089,127.995,297.225],"concept:ssl":[152.499,-106.655,4.589],"concept:td_learning":[54.251,43.828,-334.008],"concept:tool_use":[-209.944,-26.291,115.363],"concept:transformer":[91.609,100.102,72.384],"concept:value_iteration":[73.504,-72.485,-319.291],"concept:vla":[-118.143,-83.484,43.83],"concept:vlm":[-40.651,-75.664,93.361],"course:cs285":[-120.537,135.116,-265.777],"course:zhao_rl":[-23.771,50.263,-354.776],"essay:bitter_lesson":[-16.319,55.141,-5.509],"insight:agent_loop_is_just_iterated_co
nditional_generation":[-302.726,40.767,177.676],"insight:alignment_is_constraint_satisfaction_over_generation":[-275.124,95.849,-130.456],"insight:attention_is_typed_entity_communication":[152.723,139.433,82.515],"insight:bev_is_planning_friendly_intermediate":[135.299,209.046,-76.198],"insight:bigger_model_plus_more_data_beats_clever_priors":[18.261,151.411,-290.33],"insight:closed_loop_evaluation_is_the_only_ground_truth_for_planners":[-234.959,174.342,74.016],"insight:contrastive_alignment_creates_zero_shot_transfer":[41.163,-142.612,75.213],"insight:control_theory_and_rl_meet_in_optimal_control":[100.322,-37.356,-388.24],"insight:counterfactual_replanning_separates_intent_from_execution":[-157.774,-97.976,-129.393],"insight:data_engine_loop_is_more_valuable_than_static_dataset":[-269.158,249.383,148.822],"insight:differentiable_rendering_is_universal_inverse_solver":[374.173,-13.053,-163.397],"insight:diffusion_unifies_generation_and_decision":[-118.71,115.787,-141.65],"insight:dual_system_fast_slow_loop_marries_reactive_and_deliberative_control":[-300.176,-80.73,106.913],"insight:dual_system_handles_latency_quality_tradeoff":[-107.355,-70.328,161.451],"insight:emergent_planning_from_next_token_prediction_alone":[-218.662,67.883,162.909],"insight:end_to_end_differentiable_beats_handcraft_when_signal_strong":[-4.795,169.43,11.072],"insight:event_driven_computation_matches_natural_sparsity_of_driving_scene":[243.088,124.105,318.247],"insight:event_sparse_compute_matches_energy_budget":[122.563,164.526,241.018],"insight:foundation_features_transfer_without_finetune":[176.443,-128.366,-48.353],"insight:foundation_model_decouples_perception_from_task_specific_training":[261.175,-65.812,100.025],"insight:foundation_pretraining_decouples_data_from_task":[85.015,-113.134,47.268],"insight:hardware_software_co_design_unlocks_orders_of_magnitude_efficiency":[236.738,86.32,346.859],"insight:human_demonstrations_compress_implicit_reward_function":[-118.136,190.358,-179.013],
"insight:imitation_learning_alone_cannot_recover_from_compounding_errors":[-103.118,346.075,-251.354],"insight:implicit_vs_explicit_is_a_continuum":[483.619,-9.457,-123.154],"insight:in_context_learning_emerges_at_scale":[-109.881,11.378,100.244],"insight:language_is_compressed_world_model_for_human_priors":[-74.35,6.788,186.094],"insight:long_tail_solved_by_synthesis_not_data_alone":[-96.43,-64.279,-84.404],"insight:masked_prediction_yields_self_supervised_signal":[131.388,-125.456,-35.722],"insight:multi_view_geometry_as_free_supervision":[330.227,-73.024,-132.381],"insight:occupancy_unifies_static_and_dynamic_scene":[335.429,112.163,-24.533],"insight:offline_metrics_co_evolve_with_methods_so_must_be_re_audited":[-272.978,226.867,-7.323],"insight:offline_rl_is_actually_constrained_dynamic_programming":[-116.202,342.317,-365.835],"insight:open_vocabulary_via_language_anchoring":[238.848,-100.507,72.507],"insight:open_weight_release_compounds_research_velocity":[-28.282,-57.601,323.039],"insight:residual_learning_unlocks_arbitrary_depth":[201.466,54.129,143.896],"insight:safety_constraints_via_lagrangian_dual":[-169.378,44.064,-408.053],"insight:safety_emerges_from_constraint_lagrangian_not_reward_shaping":[-259.955,32.34,-360.629],"insight:safety_emerges_from_layered_constraints_not_single_objective":[-201.903,62.641,474.573],"insight:scaling_data_unlocks_capabilities_not_present_in_smaller_models":[-1.272,11.531,137.189],"insight:scaling_laws_predict_capability_emergence":[3.482,0.614,41.61],"insight:set_prediction_eliminates_postprocessing_heuristics":[129.733,205.857,106.629],"insight:simulator_realism_is_lower_bound_on_training_value":[-26.405,159.736,182.583],"insight:symbolic_intermediate_enables_interpretability_and_transfer":[-182.771,-37.915,82.974],"insight:temporal_aggregation_buys_what_depth_sensor_buys":[255.765,185.701,-180.754],"insight:test_time_compute_substitutes_train_time_via_search":[-151.275,40.379,-293.936],"insight:tokenization_collapses_mod
ality_gap":[41.834,-36.759,74.052],"insight:tokenized_trajectories_let_planning_borrow_from_language_modeling":[53.709,95.476,-199.911],"insight:tool_use_extends_language_model_into_environment_grounded_actor":[-307.413,95.271,143.004],"insight:uncertainty_calibration_is_prerequisite_for_safe_delegation":[-319.92,85.938,431.011],"insight:world_model_video_diffusion_is_implicit_physics_engine":[-15.657,-196.239,-99.84],"insight:world_models_let_planning_be_done_in_imagination":[-103.456,-37.024,-178.028],"lab:lab00":[510.767,137.796,176.56],"lab:lab01":[85.024,-9.731,-310.014],"lab:lab02":[-211.521,204.488,-158.926],"lab:lab03":[140.623,276.209,151.562],"lab:lab04":[-92.304,257.393,165.306],"lab:lab05":[170.264,-91.244,-135.407],"lab:lab06":[194.294,193.915,269.635],"lab:lab07":[-330.168,-47.75,54.752],"lab:lab08":[-320.724,-54.852,176.262],"lab:lab09":[-222.255,-129.871,164.165],"lab:lab10":[-259.855,-182.204,-21.333],"move:add_auxiliary_perspective_supervision_to_bev":[207.665,217.547,-301.824],"move:add_entropy_bonus_to_encourage_exploration":[-68.785,128.475,-532.012],"move:add_explanation_head_to_promote_interpretability":[-370.136,7.002,414.881],"move:add_intrinsic_motivation_via_novelty_or_curiosity":[-341.454,196.748,-368.063],"move:add_lagrangian_safety_constraint_to_actor_critic":[-140.707,3.943,-485.234],"move:add_reflection_step_so_agent_critiques_its_own_output":[-343.996,-21.219,123.379],"move:add_shield_layer_that_rejects_unsafe_actions_at_inference":[-217.875,103.2,464.41],"move:apply_gae_to_smooth_advantage_estimation":[13.761,104.39,-516.62],"move:apply_uncertainty_quantification_via_deep_ensemble_or_evidential_layer":[-310.383,125.949,388.05],"move:augment_dataset_via_offline_scenario_perturbation":[-133.822,217.751,206.5],"move:augment_supervised_training_with_counterfactual_or_synthetic_data":[-117.686,-182.214,-71.913],"move:augment_via_counterfactual_object_insertion":[296.183,-8.333,-153.386],"move:auto_label_with_offline_model_then_human_in_l
oop_validate":[-301.791,263.071,204.255],"move:bootstrap_target_network_to_stabilize_off_policy_learning":[-95.058,201.514,-503.629],"move:bridge_sim_and_real_via_neural_reconstruction":[331.381,-22.774,-65.166],"move:cache_KV_state_across_frames_to_amortize_attention_cost":[272.703,201.283,193.106],"move:cache_kv_state_to_amortize_long_context":[-76.248,36.923,310.049],"move:carry_object_query_across_time_as_recurrent_state":[299.628,226.291,-193.153],"move:cast_continuous_action_as_discretized_token_sequence":[59.03,137.2,-224.976],"move:cast_reasoning_as_search_over_thought_tree":[-383.658,17.177,-126.245],"move:clipped_surrogate_objective":[-163.81,107.409,-425.333],"move:co_design_silicon_with_algorithm_for_minimum_energy":[204.993,102.876,389.207],"move:co_finetune_language_model_with_action_data_jointly":[-96.459,-207.882,204.426],"move:condition_on_language_meta_action_then_emit_low_level_action":[-188.927,-147.446,40.311],"move:condition_video_generative_model_on_control_action_for_world_model":[-54.696,-184.844,-135.968],"move:contrast_corner_case_against_normal_case_in_training":[-131.955,-221.861,-47.196],"move:contrastive_alignment":[94.117,-157.112,53.45],"move:cotrain_dynamics_model_with_policy_to_share_representations":[-30.697,-217.144,-277.649],"move:counterfactual_replan":[-190.579,-78.621,-82.663],"move:cross_attention_query":[72.881,133.337,76.506],"move:dataset_aggregation":[-217.197,244.313,-302.524],"move:decompose_scene_into_static_and_dynamic_streams":[423.181,-73.969,-84.457],"move:design_closed_loop_metric_correlated_with_real_world_safety":[-280.614,222.686,38.146],"move:diffusion_denoise_sampling":[-91.069,72.656,-155.272],"move:discrete_latent_state_for_world_model":[-135.682,-191.653,-257.16],"move:distill_internet_data_into_small_specialist":[174.151,-15.584,-95.278],"move:distill_large_VLM_into_small_realtime_specialist":[182.794,10.049,250.787],"move:distill_large_model_into_specialist_for_deployment":[-153.445,-44.75,194.493],"mov
e:distill_privileged_teacher_to_sensor_student":[1.776,258.529,-207.125],"move:double_q_to_reduce_overestimation":[-89.999,188.84,-556.208],"move:dual_system_fast_slow":[-137.667,-98.713,138.353],"move:embed_camera_geometry_into_positional_encoding":[238.65,325.238,-154.808],"move:emergent_segmentation_from_self_distillation":[268.555,-130.144,-29.228],"move:evaluate_open_loop_then_close_loop_for_realism":[-185.124,74.225,223.108],"move:expectile_or_quantile_target_for_distributional_robustness":[-193.717,297.089,-449.44],"move:expert_iteration_self_distillation":[-271.522,-4.376,-277.922],"move:fine_tune_with_instruction_data_then_align_with_preferences":[-270.479,129.404,-83.215],"move:formalize_safety_case_with_claim_evidence_assumption":[-328.575,-9.22,404.194],"move:freeze_giant_backbone_train_small_adapter":[121.14,-198.826,15.821],"move:freeze_visual_encoder_and_only_train_connector":[87.722,-189.482,175.175],"move:fuse_modalities_in_shared_intermediate_space":[132.788,206.563,-159.969],"move:guided_sampling_through_classifier_gradients_at_inference":[-174.177,236.868,-94.516],"move:hindsight_experience_relabeling":[-200.253,184.059,-551.078],"move:implement_spiking_neuron_with_surrogate_gradient_for_backprop":[160.287,169.376,312.388],"move:latent_imagination_rollout":[-66.144,-10.91,-172.926],"move:league_play_for_policy_diversity":[52.292,274.968,-458.123],"move:learn_motion_in_latent_space_then_decode":[91.091,-62.443,-132.758],"move:learn_world_model_then_plan_in_latent_imagination":[-128.745,-154.971,-217.7],"move:lift_2d_features_to_3d_via_learned_depth_distribution":[190.139,165.222,-142.638],"move:lift_2d_to_3d":[171.93,230.21,88.277],"move:long_horizon_via_hierarchical_subgoal":[-345.582,13.043,200.386],"move:make_camera_only_temporal_match_lidar":[286.741,207.385,-251.336],"move:make_pipeline_differentiable_via_shared_latent":[21.065,267.803,47.657],"move:masking_for_pretext":[106.909,-148.148,-24.334],"move:open_vocabulary_via_text_alignment":[305
.084,-98.322,48.558],"move:patchify_tokenization":[151.044,21.943,97.199],"move:perform_neural_architecture_search_with_latency_constraint":[341.309,59.36,256.01],"move:plan_via_cross_entropy_method_on_dynamics_model":[-205.891,-147.596,-295.294],"move:plan_with_mcts_in_learned_model":[-215.876,-103.789,-220.287],"move:plug_in_modality_encoder_to_frozen_language_model_via_projection":[32.492,-164.617,210.649],"move:pretrain_with_contrastive_alignment_between_modalities":[29.041,-225.998,138.974],"move:prompt_chain_with_explicit_persona_roles":[-389.376,59.397,63.74],"move:quantize_attention_to_int8_with_calibration":[225.796,1.958,231.816],"move:rasterize_differentiable_renderer_for_inverse_problem":[444.704,-0.324,-168.452],"move:replace_class_specific_box_with_class_agnostic_occupancy":[307.283,94.023,-6.944],"move:replace_dense_attention_with_sparse_event_driven_attention":[236.301,135.368,271.576],"move:replace_explicit_action_head_with_tokenized_action_sequence":[-90.572,-150.417,233.73],"move:replace_explicit_critic_with_diffusion_score":[-229.276,259.689,-180.269],"move:replace_explicit_module_with_implicit_function":[436.882,-38.585,-209.186],"move:replace_handcrafted_sfm_with_feedforward_transformer":[148.277,52.494,-110.142],"move:replace_softmax_attention_with_linear_kernel_for_long_sequence":[319.993,120.032,185.326],"move:replace_value_function_with_implicit_max_via_expectile":[-228.131,302.474,-413.375],"move:replay_and_target_net":[74.055,144.888,-418.812],"move:replay_buffer_prioritize_safety_critical_transitions":[-227.194,216.703,291.931],"move:reproject_3d_query_to_2d_for_feature_sampling":[298.42,232.262,-96.093],"move:residual_connection":[204.442,80.048,188.078],"move:reward_model_from_pairwise_human_preferences":[-215.063,187.931,-267.713],"move:rewrite_continuous_video_as_token_sequence_for_transformer_world_model":[2.656,-190.746,-149.86],"move:run_active_learning_loop_to_query_hardest_unlabeled_frames":[-272.538,307.69,230.019],"move:run_co
ntinual_learning_with_rehearsal_buffer_against_forgetting":[-213.186,316.184,232.296],"move:run_replay_simulation_with_perturbed_initial_conditions_for_robustness":[-194.842,242.448,141.601],"move:safety_shield_filters_unsafe_actions":[-127.061,-100.171,-511.532],"move:scale_data_then_let_emergent_capabilities_appear":[-17.176,-38.332,25.938],"move:scale_pretraining_then_fine_tune_with_minimal_labels":[147.871,-57.455,-70.259],"move:self_play_with_search":[-185.034,62.195,-326.321],"move:set_prediction_with_hungarian":[73.874,193.476,105.146],"move:share_LiDAR_camera_calibration_via_continuous_time_optimization":[-387.805,263.309,184.605],"move:share_queries_across_multiple_tasks":[96.891,282.251,-10.418],"move:specify_safety_constraint_as_signal_temporal_logic_then_verify":[-184.157,94.008,428.881],"move:speculative_decoding_with_draft_model":[-97.646,-6.517,367.866],"move:spike_event_compute":[77.219,163.072,237.16],"move:swap_implicit_for_explicit_primitives_when_compute_allows":[476.064,32.22,-91.453],"move:tile_attention_to_fit_SRAM_for_speedup":[319.609,170.192,204.894],"move:tokenize_continuous_signal_to_use_transformer":[269.893,-52.869,-20.975],"move:tokenize_modalities":[12.984,6.515,-23.087],"move:tokenize_pixel_frames_for_autoregressive_world_model":[-57.652,-140.174,-175.303],"move:tool_use_function_calling":[-255.862,-78.481,168.948],"move:track_metric_correlation_offline_vs_closed_loop_to_select_models":[-271.593,269.162,18.64],"move:treat_corner_case_as_OOD_detection_then_route_to_human":[-292.485,115.716,432.063],"move:treat_detection_as_set_prediction_with_learnable_queries":[173.195,220.667,-36.057],"move:treat_planning_as_autoregressive_trajectory_generation":[-114.908,2.616,218.84],"move:treat_planning_as_conditional_generation":[-142.515,208.032,-60.025],"move:trust_region_step_for_monotonic_improvement":[-126.305,49.525,-508.966],"move:turn_offline_dataset_into_supervised_sequence_prediction":[50.035,187.642,-245.357],"move:two_stage_coarse_to
_fine_trajectory":[37.495,305.869,-73.83],"move:use_difficulty_aware_curriculum_to_accelerate_RL":[-226.607,181.049,185.597],"move:use_diffusion_head_for_continuous_action":[-263.339,-31.161,-26.49],"move:use_event_camera_microsecond_latency_for_emergency_braking":[176.017,35.978,417.658],"move:use_geometry_as_self_supervision":[342.361,-75.092,-192.215],"move:use_language_explanation_as_auxiliary_supervision":[-187.087,-193.104,180.348],"move:use_n_step_returns_to_trade_bias_for_variance":[-45.969,63.843,-544.083],"move:use_pretrained_language_model_as_action_prior":[-180.576,120.583,15.517],"move:use_prioritized_replay_buffer":[-138.842,204.787,-442.571],"move:use_retrieval_augmented_memory_to_extend_context":[-329.605,66.643,193.24],"move:use_self_play_to_generate_unlimited_training_signal":[-360.167,66.097,-123.267],"move:use_visibility_mask_to_filter_supervision":[501.969,49.431,55.928],"move:use_world_model_rollout_as_critic_for_policy":[-131.386,-153.518,-125.007],"move:warm_start_rl_with_imitation_then_anneal":[-55.64,264.708,-289.625],"move:wrap_language_model_with_tool_calling_loop":[-345.313,110.99,184.205],"paper:2210.14222":[-56.954,159.74,66.597],"paper:2212.10156":[69.377,195.288,1.112],"paper:2307.01694":[145.821,103.735,219.577],"paper:2309.16292":[-206.251,27.489,86.392],"paper:2311.10813":[-222.327,47.016,117.728],"paper:2402.12289":[-98.694,-42.401,94.297],"paper:2508.10104":[110.494,-19.939,-7.236],"paper:2512.24426":[-141.789,-117.771,-22.519],"paper:3dgs":[414.276,18.258,-108.601],"paper:a3c_a2c":[-15.509,139.164,-471.181],"paper:ad_benchmarks":[-132.272,166.231,74.308],"paper:alphastar":[10.672,224.937,-394.026],"paper:apolloscape":[188.025,305.09,320.902],"paper:argoverse2":[-44.586,290.821,171.844],"paper:awq":[160.882,-57.461,198.646],"paper:bdd100k":[101.582,313.423,277.164],"paper:beit":[261.755,-41.267,26.41],"paper:bench2drive":[-145.065,242.116,43.691],"paper:bevdet":[239.427,118.968,-171.406],"paper:bevdet4d":[298.229,169.6,-229.192]
,"paper:bevformer_v2":[188.997,204.831,-235.592],"paper:bevfusion":[151.231,167.094,-119.36],"paper:blip2":[144.827,-168.581,47.728],"paper:bpref":[-294.419,143.217,-256.45],"paper:calql":[-71.705,306.987,-353.457],"paper:cambrian":[145.573,-180.848,141.2],"paper:carion2020":[111.624,157.602,34.386],"paper:carla_lb2":[-108.591,212.291,114.389],"paper:cilqr":[140.908,50.613,-426.237],"paper:claude":[-289.53,118.002,26.22],"paper:clip":[186.344,-116.22,64.242],"paper:codetraj":[-117.374,87.942,-18.239],"paper:commonroad":[-140.108,130.241,349.406],"paper:constitutional_ai":[-319.468,118.433,-78.246],"paper:cosmos":[-57.255,-165.801,-88.858],"paper:cot_wei2022":[-317.086,46.132,53.498],"paper:cpo_safe_rl":[-201.43,16.192,-462.31],"paper:cql":[-110.814,281.925,-345.365],"paper:debate":[-429.556,79.266,-42.241],"paper:decision_transformer":[-29.291,165.439,-188.708],"paper:depth_anything":[235.421,25.895,-117.445],"paper:detr3d":[238.142,210.707,-58.383],"paper:diffuser":[-142.654,159.717,-110.591],"paper:diffusion_planner":[-125.857,256.901,-97.527],"paper:diffusion_policy_chi2023":[-159.545,253.177,-160.633],"paper:dinov1":[220.38,-98.629,6.541],"paper:dinov2":[173.687,-49.523,1.053],"paper:distill_vlm":[21.216,-41.246,168.041],"paper:dit":[64.303,-103.672,-47.361],"paper:dreamer_v2":[-81.474,-150.353,-263.59],"paper:dreamer_v3":[-104.842,-107.091,-240.955],"paper:drivedreamer":[-75.479,-100.605,-99.411],"paper:drivelm":[-113.976,-189.616,117.273],"paper:drivemlm":[-224.396,6.514,218.031],"paper:drivinggaussian":[353.316,-22.318,-108.694],"paper:dvs_event_camera":[182.753,69.171,331.394],"paper:emernerf":[376.983,-72.756,-115.18],"paper:emma":[-133.907,-117.49,198.815],"paper:flamingo":[41.322,-181.282,159.659],"paper:flashattention":[229.487,150.306,132.995],"paper:florence":[243.508,-160.645,138.197],"paper:gaia1":[-44.895,-93.81,-113.273],"paper:gemini":[-167.285,-23.335,297.576],"paper:gpt3":[-83.743,31.368,59.603],"paper:gpt4":[-123.455,33.64,154.68],"paper:gpt4v"
:[-182.374,-71.733,260.171],"paper:gpt_driver":[-99.696,-48.29,217.834],"paper:gptq":[92.078,-25.912,170.901],"paper:grai":[171.554,103.995,447.233],"paper:gs_for_ad":[-137.193,255.753,203.545],"paper:he2015_resnet":[151.137,40.441,153.119],"paper:highway_env":[-183.948,184.387,332.897],"paper:ilqr_classic":[182.248,12.064,-443.887],"paper:impala":[32.254,159.938,-504.231],"paper:instructgpt":[-208.383,79.654,-20.962],"paper:interaction_dataset":[-198.496,293.287,33.051],"paper:interfuser":[-14.448,271.699,-129.909],"paper:internvl":[43.152,-115.921,262.874],"paper:iql":[-149.28,290.089,-371.902],"paper:iris_world_model":[3.471,-51.871,-156.943],"paper:iso26262":[-273.014,1.828,433.749],"paper:lagrangian_safe_rl":[-202.318,-22.459,-428.991],"paper:li2022bevformer":[137.173,152.858,-28.467],"paper:lidar_cam_calib":[-443.081,252.109,159.638],"paper:lift_splat_shoot":[213.508,137.07,-120.886],"paper:linear_attention":[294.851,119.924,103.132],"paper:lingo2":[-165.457,-168.489,109.688],"paper:llama":[-32.295,5.346,274.338],"paper:llava":[29.022,-102.421,105.067],"paper:lmdrive":[-144.159,29.055,279.422],"paper:loihi2":[143.791,105.397,342.546],"paper:lqr_classic":[160.074,-47.494,-424.895],"paper:lyft_l5":[-37.227,380.392,140.219],"paper:mae":[204.632,-76.648,-36.669],"paper:mamba":[340.683,133.628,127.622],"paper:mbrl_pets":[-126.857,-135.548,-297.584],"paper:metadrive":[-172.352,180.657,264.061],"paper:mile_driving":[-59.11,-183.114,-226.55],"paper:mistral":[-34.739,10.458,362.892],"paper:mnih2015_dqn":[-66.316,148.614,-448.908],"paper:most_simagents":[14.892,71.764,-171.867],"paper:mpc_book":[143.191,-2.561,-398.627],"paper:mpo":[-155.091,92.26,-500.773],"paper:muzero":[-184.569,-52.565,-240.187],"paper:navsim":[-232.013,241.774,27.627],"paper:nerf":[410.15,-43.819,-161.478],"paper:nuplan":[-189.66,219.366,37.474],"paper:nuplan_baselines":[-97.502,153.1,-55.468],"paper:occ3d":[437.619,47.163,54.709],"paper:occupancy_networks_tesla":[204.923,148.884,-0.833],"paper:oct
o":[-242.385,-171.638,91.666],"paper:openai_five":[23.254,130.921,-435.966],"paper:openocc_unic":[495.87,16.753,103.433],"paper:openvla":[-107.837,-167.484,172.396],"paper:palme":[-20.629,-114.801,212.51],"paper:pandaset":[43.165,335.299,252.972],"paper:pebble":[-286.607,168.332,-298.396],"paper:performer":[257.598,101.397,122.695],"paper:petr":[227.837,287.579,-117.712],"paper:petrv2":[183.182,329.771,-105.75],"paper:prism1":[-90.1,-199.79,-13.85],"paper:qwen":[29.783,-40.92,359.716],"paper:react":[-349.852,56.978,137.906],"paper:redq":[-140.369,196.991,-574.556],"paper:reflexion":[-402.29,23.099,185.452],"paper:rlhf_dpo":[-219.657,71.38,-140.202],"paper:roach":[4.603,303.374,-198.071],"paper:ross2011_dagger":[-131.731,222.682,-211.463],"paper:rt1":[-53.563,-232.074,237.148],"paper:rt2":[-46.062,-154.29,153.06],"paper:rtx":[-162.103,-213.509,147.606],"paper:sac":[-109.377,145.164,-506.574],"paper:sam":[216.737,-26.371,-3.877],"paper:schulman2017_ppo":[-69.354,84.336,-465.989],"paper:self_consistency":[-431.862,45.734,10.186],"paper:senna":[-187.823,-134.075,88.128],"paper:shielded_rl":[-178.154,-65.483,-477.914],"paper:shift_dataset":[-101.268,267.949,250.996],"paper:silver2017_alphazero":[-220.486,15.194,-205.262],"paper:simclr_mocov3":[224.755,-173.507,-60.284],"paper:simplebev":[290.56,101.405,-284.746],"paper:smarts":[-199.835,152.536,300.239],"paper:sora":[18.379,-147.503,-78.776],"paper:sotif_21448":[-259.619,72.135,427.783],"paper:streampetr":[251.096,227.633,-143.313],"paper:surroundocc":[243.966,63.906,-6.819],"paper:sutton_barto":[-10.702,74.894,-290.203],"paper:svd":[34.692,-197.326,-70.257],"paper:swiftsage":[-393.01,0.609,121.604],"paper:td3":[-50.379,185.389,-503.314],"paper:tesla_ai_day":[87.142,248.103,65.549],"paper:tesla_autolabel":[-160.225,270.985,154.333],"paper:thinktwice":[56.408,326.652,-155.794],"paper:tianjic":[150.591,112.447,386.215],"paper:toolformer":[-396.387,122.989,178.751],"paper:tot":[-400.181,26.717,-41.32],"paper:trajectory_tran
sformer":[56.277,141.668,-155.695],"paper:trajeglish":[41.041,77.124,-115.358],"paper:transfuser":[-25.396,230.873,-32.258],"paper:truenorth":[98.276,112.174,395.107],"paper:ul4600":[-314.106,17.564,441.597],"paper:v2x_sim":[-26.345,239.973,262.684],"paper:vadv2":[-37.792,195.303,20.823],"paper:vaswani2017":[113.427,73.447,51.409],"paper:veo":[99.032,-228.302,-143.473],"paper:verifier":[-453.76,36.139,-79.755],"paper:vggt":[263.779,32.037,-70.885],"paper:vilt":[303.024,-193.656,11.201],"paper:vit":[137.493,4.938,48.794],"paper:voyager":[-370.541,28.268,163.46],"paper:waymo_motion":[-142.31,316.259,71.534],"paper:waymo_scenario_mining":[-234.257,312.353,176.256],"paper:womd_pred":[-121.094,290.109,102.714],"paper:world_models":[-79.297,-73.712,-181.59],"paradigm:brain_inspired_event_sparse_compute":[105.187,135.139,261.081],"paradigm:brain_inspired_neuromorphic_co_design":[169.296,96.499,286.43],"paradigm:camera_first_autonomy":[190.875,182.016,-59.235],"paradigm:closed_loop_data_engine_centric_development":[-213.907,238.961,90.243],"paradigm:counterfactual_data_centric_safety":[-158.129,-114.229,-75.739],"paradigm:differentiable_end_to_end_imitation":[5.377,205.568,24.379],"paradigm:foundation_model_axis":[14.101,-10.888,102.814],"paradigm:foundation_model_zero_shot_driving_agent":[-131.195,-45.696,113.244],"paradigm:imitation_learning":[-85.232,223.489,-177.597],"paradigm:knowledge_driven_reflective_agent":[-194.115,-4.812,55.492],"paradigm:llm_agent_paradigm":[-259.055,24.662,128.573],"paradigm:model_based_rl":[-72.746,-62.663,-270.489],"paradigm:model_based_world_imagination_planning":[-86.587,-25.839,-143.022],"paradigm:model_free_rl":[-41.743,96.458,-415.781],"paradigm:modular_perception_to_planning_pipeline":[120.688,216.121,48.795],"paradigm:neural_scene_reconstruction_as_engine":[258.737,-41.916,-117.01],"paradigm:offline_rl":[-84.963,229.232,-292.454],"paradigm:optimal_control":[92.532,1.983,-418.749],"paradigm:safe_rl":[-143.041,-1.506,-442.113],"paradigm:
safety_by_constraint_layered_architecture":[-268.591,44.621,465.142],"paradigm:scaling_data_with_self_supervision":[117.201,-83.037,18.276],"paradigm:sequence_modeling_for_decision":[-33.436,121.486,-146.748],"paradigm:simulator_first_synthetic_data_centric":[-77.038,167.252,158.947],"paradigm:vla_paradigm":[-92.93,-114.691,90.313],"paradigm:world_model_paradigm":[-19.776,-101.482,-149.27],"problem:annotation_inconsistency_across_datasets":[406.829,99.683,4.003],"problem:auditability_of_decisions_for_regulatory_compliance":[-352.772,60.244,406.374],"problem:behavior_cloning_compounds_errors_over_time":[-82.215,298.053,-194.444],"problem:catastrophic_failure_on_rare_weather":[136.829,-11.823,-191.541],"problem:catastrophic_forgetting_after_action_finetuning":[-34.817,-214.358,145.671],"problem:catastrophic_forgetting_under_continual_learning":[-295.46,313.793,175.674],"problem:closed_loop_simulation_fidelity_gap":[-60.811,51.288,-53.491],"problem:counterfactual_reasoning_about_other_agents_intent":[-171.589,-164.823,-107.728],"problem:depth_ambiguity_in_low_parallax":[270.808,158.236,-282.935],"problem:distributional_shift_between_offline_data_and_deployment":[-101.05,312.896,-306.83],"problem:energy_budget_too_small_for_full_transformer_at_30fps":[274.656,100.219,272.829],"problem:evaluation_gap_between_offline_benchmark_and_closed_loop":[-236.669,73.152,231.468],"problem:exploration_in_safety_critical_systems":[-247.994,-7.485,-431.968],"problem:fine_grained_spatial_understanding_in_vision_language_model":[255.739,-175.317,54.444],"problem:grounding_language_token_to_continuous_physical_world":[-17.366,-159.354,209.085],"problem:hallucinated_action_from_vision_language_model_in_safety_critical_loop":[-286.792,-118.628,61.94],"problem:label_efficiency_for_3d_annotation":[240.049,-96.417,-131.255],"problem:label_noise_for_3d_object_categories":[-334.667,208.584,303.676],"problem:latency_budget_for_large_model_in_realtime_control":[-111.776,-40.869,266.511],"problem:l
ong_horizon_credit_assignment_in_driving":[-148.688,26.35,-191.831],"problem:long_horizon_reasoning_with_finite_context_window":[-308.224,27.186,261.945],"problem:long_tail_object_categories_in_open_world":[320.759,-1.796,-14.177],"problem:multi_agent_interaction_modeling_in_dense_traffic":[-29.766,47.084,-151.018],"problem:multi_modal_calibration_drift":[119.617,251.98,-175.05],"problem:occlusion_reasoning_without_dense_lidar":[292.038,65.552,-82.933],"problem:offline_metric_does_not_predict_closed_loop_performance":[-295.192,232.968,79.733],"problem:open_world_corner_case_synthesis_for_training":[-85.612,-219.871,-70.163],"problem:planning_horizon_vs_compute_budget_tradeoff":[67.231,27.089,-379.806],"problem:rare_event_evaluation_with_no_ground_truth":[-180.787,124.526,-46.476],"problem:rare_safety_critical_events_dominate_real_risk_but_are_under_represented":[-248.547,245.813,220.162],"problem:realistic_other_agent_behavior_in_simulator":[-41.854,178.373,266.166],"problem:rendering_speed_vs_quality_tradeoff":[523.234,-1.341,-91.029],"problem:reward_hacking_in_learned_objectives":[-287.039,64.241,-259.659],"problem:reward_specification_for_safe_polite_driving":[-237.861,107.571,-260.141],"problem:sensor_calibration_drift_over_vehicle_lifetime":[-351.225,269.413,143.91],"problem:sim_to_real_gap_in_camera_only_perception":[239.661,70.861,48.16],"problem:simulator_visual_gap_breaks_perception_models":[-69.322,208.898,253.321],"problem:temporal_consistency_in_bev_segmentation":[348.691,204.854,-244.693],"problem:unknown_geometry_in_distant_or_dark_regions":[312.089,191.403,-302.596],"problem:verification_of_neural_network_safety_properties_at_scale":[-159.403,81.974,487.015],"problem:zero_shot_generalization_to_unseen_driving_scenes":[-103.672,-92.435,265.396],"validation:trace_alpha_zero_self_play_with_mcts_guided_policy":[-107.513,31.588,-279.095],"validation:trace_bird_eye_view_transformer_with_temporal_aggregation":[147.276,133.05,43.461],"validation:trace_brain_i
nspired_spike_attention":[118.396,99.787,174.2],"validation:trace_clipped_policy_gradient_surrogate":[-81.993,83.068,-383.632],"validation:trace_counterfactual_vla_replanner":[-120.399,-74.178,-54.076],"validation:trace_dataset_aggregation_for_imitation":[-138.071,200.537,-241.844],"validation:trace_decision_transformer_offline_rl_via_sequence_modeling":[-23.874,109.817,-63.575],"validation:trace_deep_q_network_with_replay_and_target":[-8.558,93.225,-375.119],"validation:trace_diffusion_policy_as_score_based_action_sampler":[-100.094,157.558,-154.943],"validation:trace_few_shot_in_context_learning_at_scale":[-18.213,31.095,68.075],"validation:trace_image_transformer_via_patch_tokenization":[96.77,11.39,76.381],"validation:trace_knowledge_driven_reflective_agent":[-145.691,23.622,71.988],"validation:trace_llm_decision_agent_for_driving":[-160.999,0.245,114.987],"validation:trace_modular_perception_pipeline_with_bev_fusion":[147.086,179.273,34.556],"validation:trace_neural_field_for_dynamic_driving_scene":[-14.689,-26.697,-79.905],"validation:trace_object_level_planning_transformer":[15.407,162.718,49.248],"validation:trace_safe_rl_via_lagrangian_constrained_optimization":[-125.163,54.989,-408.966],"validation:trace_scalable_self_supervised_vision_backbone":[85.277,-56.852,9.837],"validation:trace_self_attention_replaces_recurrence":[136.598,79.593,123.452],"validation:trace_set_prediction_with_object_queries":[117.901,149.39,98.427],"validation:trace_unified_planning_oriented_e2e_driving":[76.916,186.664,33.406],"validation:trace_vision_language_action_dual_loop":[-58.36,-30.26,86.251],"validation:trace_vision_language_pretrained_dual_encoder":[81.685,-73.407,67.425],"validation:trace_world_model_in_latent_imagination":[-71.972,-18.747,-265.63]} \ No newline at end of file 
+{"channel:3blue1brown":[212.104,70.469,116.044],"channel:ez_encoder_academy":[-60.323,106.315,106.906],"channel:mu_li_bilibili":[84.992,53.232,86.823],"concept:actor_critic":[-111.533,70.635,-397.593],"concept:bellman_eq":[10.628,-20.033,-352.274],"concept:bev":[157.285,197.065,-15.386],"concept:cot":[-146.309,-26.145,73.795],"concept:counterfactual":[-190.526,-155.058,-36.962],"concept:covariate_shift":[-149.059,194.141,-284.643],"concept:detr_query":[191.25,197.268,23.11],"concept:dqn":[-37.204,141.441,-361.091],"concept:imitation_learning":[-31.664,184.064,-113.899],"concept:mdp":[-29.264,-5.482,-336.371],"concept:meta_action":[-210.133,-123.488,-43.254],"concept:policy_gradient":[-97.212,19.041,-347.07],"concept:ppo":[-116.973,113.593,-383.01],"concept:replay_buffer":[10.149,143.17,-386.587],"concept:rlhf":[-322.317,125.252,-203.784],"concept:scaling_vs_knowledge":[-40.121,16.86,12.51],"concept:self_attention":[176.4,104.617,127.603],"concept:spiking_nn":[130.768,137.23,303.836],"concept:ssl":[157.738,-103.096,15.677],"concept:td_learning":[65.481,23.669,-327.646],"concept:tool_use":[-207.21,-32.529,117.618],"concept:transformer":[95.241,103.934,78.105],"concept:value_iteration":[76.975,-85.078,-312.887],"concept:vla":[-116.64,-92.062,53.804],"concept:vlm":[-37.723,-79.663,90.286],"course:cs285":[-99.544,129.0,-262.514],"course:zhao_rl":[-16.807,43.663,-348.63],"essay:bitter_lesson":[-13.716,59.168,3.676],"insight:agent_loop_is_just_iterated_conditional_generation":[-303.468,35.82,182.443],"insight:alignment_is_constraint_satisfaction_over_generation":[-269.683,97.329,-131.175],"insight:attention_is_typed_entity_communication":[157.849,146.824,84.46],"insight:bev_is_planning_friendly_intermediate":[147.196,195.107,-81.863],"insight:bigger_model_plus_more_data_beats_clever_priors":[18.602,147.82,-288.498],"insight:closed_loop_evaluation_is_the_only_ground_truth_for_planners":[-239.886,171.806,75.536],"insight:contrastive_alignment_creates_zero_shot_transfer":[41
.007,-141.973,65.37],"insight:control_theory_and_rl_meet_in_optimal_control":[105.243,-45.274,-382.414],"insight:counterfactual_replanning_separates_intent_from_execution":[-152.999,-105.766,-119.177],"insight:data_engine_loop_is_more_valuable_than_static_dataset":[-276.095,248.345,145.125],"insight:differentiable_rendering_is_universal_inverse_solver":[375.11,-18.228,-160.97],"insight:diffusion_unifies_generation_and_decision":[-70.869,200.324,-89.782],"insight:dual_system_fast_slow_loop_marries_reactive_and_deliberative_control":[-297.4,-83.444,106.336],"insight:dual_system_handles_latency_quality_tradeoff":[-102.307,-75.837,159.825],"insight:emergent_planning_from_next_token_prediction_alone":[-214.571,65.163,168.326],"insight:end_to_end_differentiable_beats_handcraft_when_signal_strong":[4.633,200.2,50.206],"insight:event_driven_computation_matches_natural_sparsity_of_driving_scene":[252.521,121.274,322.088],"insight:event_sparse_compute_matches_energy_budget":[133.716,174.174,248.915],"insight:foundation_features_transfer_without_finetune":[183.153,-125.373,-35.849],"insight:foundation_model_decouples_perception_from_task_specific_training":[263.353,-69.731,107.712],"insight:foundation_pretraining_decouples_data_from_task":[100.239,-107.674,69.609],"insight:hardware_software_co_design_unlocks_orders_of_magnitude_efficiency":[242.355,84.688,350.931],"insight:human_demonstrations_compress_implicit_reward_function":[-197.192,245.54,-231.67],"insight:imitation_data_compresses_unspecified_reward":[-74.842,176.523,-177.461],"insight:imitation_learning_alone_cannot_recover_from_compounding_errors":[-136.36,331.366,-274.685],"insight:implicit_vs_explicit_is_a_continuum":[484.811,-16.21,-119.661],"insight:in_context_learning_emerges_at_scale":[-108.062,7.218,107.117],"insight:language_is_compressed_world_model_for_human_priors":[-75.358,0.504,192.338],"insight:long_tail_solved_by_synthesis_not_data_alone":[-97.066,-66.548,-76.36],"insight:masked_prediction_yields_self_s
upervised_signal":[134.379,-128.467,-20.278],"insight:multi_view_geometry_as_free_supervision":[331.108,-79.135,-129.375],"insight:occupancy_unifies_static_and_dynamic_scene":[344.176,110.692,-13.371],"insight:offline_metrics_co_evolve_with_methods_so_must_be_re_audited":[-280.271,220.977,-9.78],"insight:offline_rl_is_actually_constrained_dynamic_programming":[-74.16,280.746,-403.824],"insight:open_vocabulary_via_language_anchoring":[241.34,-106.317,75.375],"insight:open_weight_release_compounds_research_velocity":[-20.625,-48.051,334.387],"insight:residual_learning_unlocks_arbitrary_depth":[80.096,123.215,149.233],"insight:safety_constraints_via_lagrangian_dual":[-146.039,20.247,-398.269],"insight:safety_emerges_from_constraint_lagrangian_not_reward_shaping":[-250.025,18.976,-379.729],"insight:safety_emerges_from_layered_constraints_not_single_objective":[-209.293,64.06,475.322],"insight:scaling_data_unlocks_capabilities_not_present_in_smaller_models":[5.044,12.384,145.68],"insight:scaling_laws_predict_capability_emergence":[9.123,9.384,61.9],"insight:set_prediction_eliminates_postprocessing_heuristics":[142.115,217.806,99.736],"insight:simulator_realism_is_lower_bound_on_training_value":[-34.608,163.075,186.157],"insight:symbolic_intermediate_enables_interpretability_and_transfer":[-177.717,-41.192,87.004],"insight:temporal_aggregation_buys_what_depth_sensor_buys":[249.876,189.699,-189.681],"insight:test_time_compute_substitutes_train_time_via_search":[-156.904,40.285,-286.584],"insight:tokenization_collapses_modality_gap":[54.161,-32.305,86.963],"insight:tokenized_trajectories_let_planning_borrow_from_language_modeling":[56.311,84.818,-198.814],"insight:tool_use_extends_language_model_into_environment_grounded_actor":[-307.002,91.453,147.277],"insight:uncertainty_calibration_is_prerequisite_for_safe_delegation":[-326.591,86.428,431.043],"insight:world_model_as_inner_simulator_unlocks_long_horizon_planning":[-173.933,-71.904,-201.142],"insight:world_model_video_di
ffusion_is_implicit_physics_engine":[-6.513,-196.515,-98.551],"insight:world_models_let_planning_be_done_in_imagination":[-90.191,-36.156,-151.045],"lab:lab00":[515.411,138.343,175.977],"lab:lab01":[88.259,-24.153,-302.81],"lab:lab02":[-198.291,198.196,-161.143],"lab:lab03":[147.014,279.761,153.624],"lab:lab04":[-101.524,260.83,169.772],"lab:lab05":[268.725,-30.598,75.452],"lab:lab06":[206.871,194.961,273.892],"lab:lab07":[-327.734,-52.917,56.829],"lab:lab08":[-318.497,-58.578,177.694],"lab:lab09":[-235.984,-124.003,102.678],"lab:lab10":[-251.546,-190.994,-5.426],"move:add_auxiliary_perspective_supervision_to_bev":[375.588,186.67,-176.534],"move:add_entropy_bonus_to_encourage_exploration":[-87.272,132.36,-530.28],"move:add_explanation_head_to_promote_interpretability":[-376.75,7.867,415.949],"move:add_intrinsic_motivation_via_novelty_or_curiosity":[-342.925,206.124,-362.542],"move:add_lagrangian_safety_constraint_to_actor_critic":[-190.011,46.9,-451.183],"move:add_reflection_step_so_agent_critiques_its_own_output":[-342.319,-25.64,126.019],"move:add_shield_layer_that_rejects_unsafe_actions_at_inference":[-225.48,104.982,466.184],"move:apply_gae_to_smooth_advantage_estimation":[-6.261,94.238,-526.901],"move:apply_uncertainty_quantification_via_deep_ensemble_or_evidential_layer":[-317.733,126.518,388.642],"move:augment_dataset_via_offline_scenario_perturbation":[-142.631,217.856,207.014],"move:augment_supervised_training_with_counterfactual_or_synthetic_data":[-86.782,-180.758,-46.295],"move:augment_via_counterfactual_object_insertion":[294.99,-15.926,-148.169],"move:auto_label_with_offline_model_then_human_in_loop_validate":[-311.22,261.507,199.759],"move:bootstrap_target_network_to_stabilize_off_policy_learning":[-66.1,206.609,-493.447],"move:bridge_sim_and_real_via_neural_reconstruction":[328.014,-19.561,-63.499],"move:cache_KV_state_across_frames_to_amortize_attention_cost":[278.688,199.222,195.296],"move:cache_kv_state_to_amortize_long_context":[-75.096,36.577,31
8.369],"move:carry_object_query_across_time_as_recurrent_state":[281.523,236.74,-210.272],"move:cast_continuous_action_as_discretized_token_sequence":[64.354,126.038,-223.946],"move:cast_reasoning_as_search_over_thought_tree":[-386.727,17.548,-119.834],"move:clipped_surrogate_objective":[-152.779,94.269,-413.79],"move:co_design_silicon_with_algorithm_for_minimum_energy":[210.727,101.281,394.029],"move:co_finetune_language_model_with_action_data_jointly":[-90.779,-202.246,223.46],"move:condition_on_language_meta_action_then_emit_low_level_action":[-187.257,-149.9,48.071],"move:condition_video_generative_model_on_control_action_for_world_model":[-49.649,-183.506,-134.064],"move:contrast_corner_case_against_normal_case_in_training":[-86.012,-231.173,-40.042],"move:contrastive_alignment":[89.977,-149.551,34.496],"move:cotrain_dynamics_model_with_policy_to_share_representations":[-52.868,-217.509,-268.792],"move:counterfactual_replan":[-190.993,-83.829,-72.512],"move:cross_attention_query":[79.522,138.795,70.381],"move:dataset_aggregation":[-186.443,232.767,-310.831],"move:decompose_scene_into_static_and_dynamic_streams":[421.313,-75.303,-79.157],"move:design_closed_loop_metric_correlated_with_real_world_safety":[-286.607,220.003,37.162],"move:diffusion_denoise_sampling":[-58.474,128.039,-94.218],"move:discrete_latent_state_for_world_model":[-156.573,-191.437,-245.175],"move:distill_internet_data_into_small_specialist":[176.29,-13.934,-91.024],"move:distill_large_VLM_into_small_realtime_specialist":[191.69,4.937,250.018],"move:distill_large_model_into_specialist_for_deployment":[-147.217,-50.283,197.923],"move:distill_privileged_teacher_to_sensor_student":[6.608,267.846,-202.7],"move:double_q_to_reduce_overestimation":[-86.804,195.268,-546.906],"move:dual_system_fast_slow":[-129.012,-105.231,137.368],"move:embed_camera_geometry_into_positional_encoding":[222.893,329.562,-167.247],"move:emergent_segmentation_from_self_distillation":[277.313,-127.518,-10.41],"move:evaluate
_open_loop_then_close_loop_for_realism":[-187.213,72.717,228.018],"move:expectile_or_quantile_target_for_distributional_robustness":[-209.954,277.76,-445.032],"move:expert_iteration_self_distillation":[-289.769,1.639,-270.987],"move:fine_tune_with_instruction_data_then_align_with_preferences":[-267.47,128.876,-82.27],"move:formalize_safety_case_with_claim_evidence_assumption":[-335.311,-8.588,404.863],"move:freeze_giant_backbone_train_small_adapter":[124.451,-198.306,16.053],"move:freeze_visual_encoder_and_only_train_connector":[85.473,-201.611,166.506],"move:fuse_modalities_in_shared_intermediate_space":[147.879,177.512,-174.545],"move:guided_sampling_through_classifier_gradients_at_inference":[-165.335,251.386,-83.425],"move:hindsight_experience_relabeling":[-196.283,199.126,-542.358],"move:implement_spiking_neuron_with_surrogate_gradient_for_backprop":[175.395,173.253,318.846],"move:latent_imagination_rollout":[-99.365,-2.651,-155.807],"move:league_play_for_policy_diversity":[66.156,275.11,-456.692],"move:learn_motion_in_latent_space_then_decode":[93.168,-63.776,-125.29],"move:learn_world_model_then_plan_in_latent_imagination":[-142.77,-154.244,-206.464],"move:lift_2d_features_to_3d_via_learned_depth_distribution":[204.141,159.288,-141.741],"move:lift_2d_to_3d":[189.716,228.871,85.466],"move:long_horizon_via_hierarchical_subgoal":[-345.849,7.264,201.204],"move:make_camera_only_temporal_match_lidar":[268.49,209.309,-263.925],"move:make_pipeline_differentiable_via_shared_latent":[28.711,271.916,53.632],"move:masking_for_pretext":[129.341,-111.454,-54.214],"move:open_vocabulary_via_text_alignment":[307.443,-103.811,52.247],"move:patchify_tokenization":[163.653,26.543,109.934],"move:perform_neural_architecture_search_with_latency_constraint":[347.372,56.129,254.231],"move:plan_via_cross_entropy_method_on_dynamics_model":[-228.054,-145.959,-285.775],"move:plan_with_mcts_in_learned_model":[-231.886,-103.354,-208.247],"move:plug_in_modality_encoder_to_frozen_language_mo
del_via_projection":[29.173,-179.553,206.266],"move:pretrain_with_contrastive_alignment_between_modalities":[17.984,-231.192,127.669],"move:prompt_chain_with_explicit_persona_roles":[-387.706,56.337,67.189],"move:quantize_attention_to_int8_with_calibration":[233.756,0.106,229.267],"move:rasterize_differentiable_renderer_for_inverse_problem":[446.455,-7.295,-165.174],"move:replace_class_specific_box_with_class_agnostic_occupancy":[315.144,92.465,1.072],"move:replace_dense_attention_with_sparse_event_driven_attention":[244.847,134.452,276.179],"move:replace_explicit_action_head_with_tokenized_action_sequence":[-75.409,-140.825,248.988],"move:replace_explicit_critic_with_diffusion_score":[-202.84,272.932,-162.429],"move:replace_explicit_module_with_implicit_function":[437.376,-45.352,-206.327],"move:replace_handcrafted_sfm_with_feedforward_transformer":[150.314,50.678,-102.207],"move:replace_softmax_attention_with_linear_kernel_for_long_sequence":[322.288,115.573,190.628],"move:replace_value_function_with_implicit_max_via_expectile":[-218.597,310.986,-412.639],"move:replay_and_target_net":[83.764,115.257,-433.443],"move:replay_buffer_prioritize_safety_critical_transitions":[-243.328,215.743,287.621],"move:reproject_3d_query_to_2d_for_feature_sampling":[287.618,250.115,-105.295],"move:residual_connection":[158.79,147.159,182.921],"move:reward_model_from_pairwise_human_preferences":[-210.863,194.484,-263.872],"move:rewrite_continuous_video_as_token_sequence_for_transformer_world_model":[7.94,-188.107,-148.446],"move:run_active_learning_loop_to_query_hardest_unlabeled_frames":[-284.262,306.775,226.13],"move:run_continual_learning_with_rehearsal_buffer_against_forgetting":[-225.413,316.202,231.105],"move:run_replay_simulation_with_perturbed_initial_conditions_for_robustness":[-205.964,243.251,141.832],"move:safety_shield_filters_unsafe_actions":[-42.535,-109.464,-505.045],"move:scale_data_then_let_emergent_capabilities_appear":[-5.544,-33.429,30.365],"move:scale_pretrainin
g_then_fine_tune_with_minimal_labels":[151.633,-50.305,-57.945],"move:self_play_with_search":[-189.415,65.076,-318.594],"move:set_prediction_with_hungarian":[83.75,207.915,100.809],"move:share_LiDAR_camera_calibration_via_continuous_time_optimization":[-397.972,260.918,178.091],"move:share_queries_across_multiple_tasks":[98.955,281.917,-13.295],"move:specify_safety_constraint_as_signal_temporal_logic_then_verify":[-192.286,96.822,429.961],"move:speculative_decoding_with_draft_model":[-94.74,-5.395,374.052],"move:spike_event_compute":[90.18,183.861,244.173],"move:swap_implicit_for_explicit_primitives_when_compute_allows":[478.255,26.08,-88.304],"move:tile_attention_to_fit_SRAM_for_speedup":[325.497,167.117,206.429],"move:tokenize_continuous_signal_to_use_transformer":[275.84,-57.779,-13.969],"move:tokenize_modalities":[26.697,8.624,-14.286],"move:tokenize_pixel_frames_for_autoregressive_world_model":[-66.998,-138.686,-169.221],"move:tool_use_function_calling":[-250.083,-79.632,180.323],"move:track_metric_correlation_offline_vs_closed_loop_to_select_models":[-278.717,264.783,15.408],"move:treat_corner_case_as_OOD_detection_then_route_to_human":[-299.749,116.867,433.4],"move:treat_detection_as_set_prediction_with_learnable_queries":[170.389,225.878,-40.918],"move:treat_planning_as_autoregressive_trajectory_generation":[-107.314,3.301,228.151],"move:treat_planning_as_conditional_generation":[-134.394,217.034,-45.935],"move:trust_region_step_for_monotonic_improvement":[-159.874,77.53,-495.365],"move:turn_offline_dataset_into_supervised_sequence_prediction":[58.122,174.909,-246.744],"move:two_stage_coarse_to_fine_trajectory":[42.685,311.403,-71.379],"move:use_difficulty_aware_curriculum_to_accelerate_RL":[-233.959,179.764,180.855],"move:use_diffusion_head_for_continuous_action":[-255.834,-18.098,-2.66],"move:use_event_camera_microsecond_latency_for_emergency_braking":[176.869,35.378,422.688],"move:use_geometry_as_self_supervision":[341.021,-78.463,-188.241],"move:use_lang
uage_explanation_as_auxiliary_supervision":[-169.271,-214.059,159.617],"move:use_n_step_returns_to_trade_bias_for_variance":[-41.859,68.4,-542.338],"move:use_pretrained_language_model_as_action_prior":[-176.615,120.589,22.252],"move:use_prioritized_replay_buffer":[51.44,188.059,-452.729],"move:use_retrieval_augmented_memory_to_extend_context":[-332.354,63.494,193.347],"move:use_self_play_to_generate_unlimited_training_signal":[-362.17,66.513,-119.011],"move:use_visibility_mask_to_filter_supervision":[507.047,45.703,58.708],"move:use_world_model_rollout_as_critic_for_policy":[-130.601,-159.95,-113.415],"move:warm_start_rl_with_imitation_then_anneal":[-55.261,272.134,-282.003],"move:wrap_language_model_with_tool_calling_loop":[-345.558,107.145,188.876],"paper:2210.14222":[-53.654,163.278,70.648],"paper:2212.10156":[75.181,195.215,3.926],"paper:2307.01694":[151.822,108.155,227.092],"paper:2309.16292":[-204.437,24.135,90.333],"paper:2311.10813":[-220.55,44.084,121.693],"paper:2402.12289":[-94.51,-44.554,96.588],"paper:2508.10104":[125.629,-13.672,21.292],"paper:2512.24426":[-134.736,-121.362,-13.269],"paper:3dgs":[415.962,13.042,-104.841],"paper:a3c_a2c":[-25.829,130.638,-479.058],"paper:ad_benchmarks":[-134.417,163.944,76.066],"paper:alphastar":[18.305,224.633,-393.937],"paper:altman_constrained_mdp":[-188.624,-59.595,-496.565],"paper:apolloscape":[176.541,311.21,324.402],"paper:argoverse2":[-54.368,291.036,173.357],"paper:awq":[169.437,-59.51,197.035],"paper:bdd100k":[90.262,316.721,279.802],"paper:bear":[-89.782,326.373,-439.289],"paper:beit":[264.915,-36.74,27.237],"paper:bench2drive":[-148.663,241.101,46.17],"paper:bevdet":[249.897,114.845,-174.161],"paper:bevdet4d":[289.365,174.985,-241.957],"paper:bevformer_v2":[325.908,179.608,-133.327],"paper:bevfusion":[163.398,155.325,-121.864],"paper:blip2":[146.013,-170.247,50.676],"paper:bpref":[-286.506,149.688,-252.907],"paper:calql":[-58.61,304.74,-347.425],"paper:cambrian":[144.92,-188.325,137.423],"paper:carion2020":[
117.724,162.537,33.336],"paper:carla_lb2":[-113.202,213.062,117.404],"paper:chinchilla":[-12.509,52.807,156.101],"paper:cilqr":[146.757,42.604,-420.892],"paper:claude":[-286.864,113.633,27.316],"paper:clip":[188.151,-119.727,67.656],"paper:codetraj":[-113.37,84.758,-12.613],"paper:commonroad":[-149.865,134.358,349.255],"paper:constitutional_ai":[-316.509,116.512,-78.797],"paper:cosmos":[-48.148,-167.655,-84.854],"paper:cot_wei2022":[-315.664,44.084,59.029],"paper:cpo_safe_rl":[-190.202,7.339,-485.725],"paper:cql":[-97.208,302.459,-372.069],"paper:ddpm":[-38.774,178.54,-17.064],"paper:debate":[-429.971,78.13,-38.031],"paper:decision_transformer":[-21.266,152.84,-192.497],"paper:depth_anything":[242.321,24.182,-114.01],"paper:detr3d":[234.269,221.437,-63.677],"paper:diffuser":[-129.457,173.3,-96.23],"paper:diffusion_planner":[-118.568,274.984,-82.926],"paper:diffusion_policy_chi2023":[-133.252,246.495,-137.836],"paper:dinov1":[226.036,-94.345,16.216],"paper:dinov2":[181.845,-46.565,12.997],"paper:distill_vlm":[27.034,-46.878,169.004],"paper:dit":[70.736,-100.266,-45.166],"paper:dreamer_v2":[-99.485,-149.89,-253.794],"paper:dreamer_v3":[-130.534,-108.03,-239.334],"paper:drivedreamer":[-73.967,-101.472,-90.723],"paper:drivelm":[-103.679,-197.7,120.677],"paper:drivemlm":[-224.599,3.108,222.626],"paper:drivinggaussian":[351.962,-24.226,-106.055],"paper:dvs_event_camera":[187.479,69.951,337.14],"paper:emernerf":[375.708,-73.745,-110.626],"paper:emma":[-126.491,-116.672,206.356],"paper:flamingo":[39.093,-191.427,152.381],"paper:flashattention":[233.12,148.911,137.379],"paper:florence":[243.179,-167.172,139.078],"paper:gaia1":[-42.973,-94.202,-103.979],"paper:gemini":[-164.132,-23.095,304.72],"paper:gpt3":[-78.403,31.7,67.932],"paper:gpt4":[-117.805,32.571,162.446],"paper:gpt4v":[-168.23,-69.203,270.415],"paper:gpt_driver":[-90.874,-44.783,223.889],"paper:gptq":[100.64,-29.527,172.611],"paper:grai":[174.059,98.812,451.865],"paper:gs_for_ad":[-150.429,254.971,206.391],"paper:
he2015_resnet":[132.259,59.106,158.476],"paper:highway_env":[-200.665,184.914,329.423],"paper:ilqr_classic":[188.467,4.42,-437.393],"paper:impala":[31.402,131.643,-516.101],"paper:instructgpt":[-203.621,79.215,-16.697],"paper:interaction_dataset":[-206.679,292.744,42.626],"paper:interfuser":[-5.339,273.209,-130.984],"paper:internvl":[52.747,-116.916,267.428],"paper:iql":[-141.03,277.96,-377.922],"paper:iris_world_model":[-6.44,-50.267,-152.011],"paper:iso26262":[-280.248,2.594,434.419],"paper:lagrangian_safe_rl":[-209.082,-27.968,-455.073],"paper:li2022bevformer":[153.497,150.972,-22.608],"paper:lidar_cam_calib":[-452.569,248.57,151.919],"paper:lift_splat_shoot":[227.994,132.357,-120.6],"paper:linear_attention":[293.882,117.92,110.745],"paper:lingo2":[-158.418,-180.818,101.036],"paper:llama":[-21.217,19.839,282.62],"paper:llava":[32.453,-106.049,106.585],"paper:lmdrive":[-140.988,31.916,285.48],"paper:loihi2":[149.197,106.582,349.536],"paper:lora":[49.763,110.171,275.277],"paper:lqr_classic":[166.081,-54.422,-418.341],"paper:lyft_l5":[-46.111,382.908,141.291],"paper:mae":[205.152,-72.256,-31.5],"paper:mamba":[341.607,130.577,133.147],"paper:mbrl_pets":[-145.766,-137.707,-287.926],"paper:metadrive":[-184.98,181.047,261.924],"paper:mile_driving":[-73.167,-180.77,-219.001],"paper:mistral":[-30.852,17.79,372.039],"paper:mnih2015_dqn":[-32.941,150.187,-439.358],"paper:most_simagents":[17.703,61.159,-168.578],"paper:mpc_book":[148.135,-9.306,-392.501],"paper:mpo":[-175.495,127.698,-465.639],"paper:muzero":[-210.521,-53.246,-240.332],"paper:navsim":[-237.114,238.486,24.796],"paper:nerf":[410.843,-49.881,-158.307],"paper:nuplan":[-192.681,219.914,41.696],"paper:nuplan_baselines":[-107.168,136.001,-59.366],"paper:occ3d":[443.371,44.429,60.492],"paper:occupancy_networks_tesla":[214.79,150.736,6.898],"paper:octo":[-236.736,-153.687,138.607],"paper:openai_five":[14.726,116.83,-445.092],"paper:openocc_unic":[500.171,14.054,107.408],"paper:openvla":[-100.455,-160.042,194.358],"pa
per:palme":[-20.139,-117.467,221.872],"paper:pandaset":[31.597,336.706,257.194],"paper:pebble":[-287.173,178.694,-295.423],"paper:performer":[256.981,96.988,133.093],"paper:petr":[216.898,293.268,-127.095],"paper:petrv2":[173.233,333.048,-113.494],"paper:pid_lagrangian":[-277.15,-15.451,-421.081],"paper:prism1":[-119.412,-204.113,-20.755],"paper:qwen":[38.29,-34.396,367.359],"paper:rcpo":[-153.044,-45.295,-536.006],"paper:react":[-349.332,52.781,141.458],"paper:redq":[-133.432,211.215,-561.514],"paper:reflexion":[-402.21,17.683,188.388],"paper:rlhf_dpo":[-214.094,73.46,-138.414],"paper:roach":[-7.124,312.059,-196.453],"paper:ross2011_dagger":[-127.041,213.934,-210.011],"paper:rt1":[-51.332,-227.109,255.112],"paper:rt2":[-37.99,-147.59,171.679],"paper:rtx":[-164.147,-186.339,198.609],"paper:sac":[-105.94,157.531,-491.904],"paper:safe_rl_carla":[-101.4,-49.603,-510.067],"paper:sam":[220.956,-24.548,-2.613],"paper:schulman2015_trpo":[-158.424,62.813,-535.67],"paper:schulman2017_ppo":[-85.093,85.32,-466.789],"paper:self_consistency":[-432.456,44.124,15.802],"paper:senna":[-178.218,-140.597,95.71],"paper:shielded_rl":[-110.329,-89.27,-476.269],"paper:shift_dataset":[-112.833,268.87,252.889],"paper:silver2017_alphazero":[-228.16,17.488,-198.233],"paper:simclr_mocov3":[228.926,-169.576,-52.21],"paper:simplebev":[309.529,97.411,-282.105],"paper:smarts":[-211.606,151.624,296.479],"paper:sora":[26.766,-145.548,-78.733],"paper:sotif_21448":[-266.996,73.944,429.133],"paper:streampetr":[239.683,236.669,-153.888],"paper:surroundocc":[255.735,64.145,6.789],"paper:sutton_barto":[0.226,63.729,-284.826],"paper:svd":[45.875,-194.793,-71.625],"paper:swiftsage":[-391.616,-3.72,122.325],"paper:td3":[-35.659,177.33,-508.394],"paper:tesla_ai_day":[95.422,248.841,63.61],"paper:tesla_autolabel":[-166.79,269.086,153.345],"paper:thinktwice":[55.457,331.35,-157.979],"paper:tianjic":[162.446,129.557,388.305],"paper:toolformer":[-397.422,117.355,182.243],"paper:tot":[-401.445,25.171,-35.037],"pap
er:trajectory_transformer":[62.62,134.204,-154.611],"paper:trajeglish":[42.802,70.445,-113.136],"paper:transfuser":[-22.944,238.598,-28.278],"paper:truenorth":[108.298,116.073,402.418],"paper:ul4600":[-321.348,18.968,442.278],"paper:v2x_sim":[-36.129,242.195,267.147],"paper:vadv2":[-30.568,201.781,31.051],"paper:vaswani2017":[111.335,77.81,56.033],"paper:veo":[109.926,-221.597,-148.154],"paper:verifier":[-455.936,34.395,-73.576],"paper:vggt":[265.519,30.591,-64.566],"paper:vilt":[292.971,-203.874,-0.773],"paper:vit":[141.966,8.595,56.041],"paper:voyager":[-372.458,23.864,164.228],"paper:watkins_dayan_qlearning":[-30.271,208.01,-381.9],"paper:waymo_motion":[-148.228,315.555,75.699],"paper:waymo_scenario_mining":[-245.144,311.685,173.85],"paper:womd_pred":[-127.686,288.608,105.939],"paper:world_models":[-86.424,-73.411,-172.186],"paradigm:brain_inspired_event_sparse_compute":[113.016,147.522,268.67],"paradigm:brain_inspired_neuromorphic_co_design":[175.421,99.617,292.853],"paradigm:camera_first_autonomy":[201.138,178.972,-59.296],"paradigm:closed_loop_data_engine_centric_development":[-219.745,236.984,89.916],"paradigm:counterfactual_data_centric_safety":[-155.952,-117.311,-64.914],"paradigm:differentiable_end_to_end_imitation":[13.83,201.453,12.588],"paradigm:foundation_model_axis":[17.162,-8.178,108.673],"paradigm:foundation_model_zero_shot_driving_agent":[-127.323,-49.949,115.564],"paradigm:imitation_learning":[-73.029,225.384,-176.226],"paradigm:knowledge_driven_reflective_agent":[-190.116,-8.597,60.521],"paradigm:llm_agent_paradigm":[-257.553,21.662,132.162],"paradigm:model_based_rl":[-85.561,-63.619,-262.274],"paradigm:model_based_world_imagination_planning":[-74.595,-13.573,-124.827],"paradigm:model_free_rl":[-36.686,86.516,-417.631],"paradigm:modular_perception_to_planning_pipeline":[135.781,220.147,45.206],"paradigm:neural_scene_reconstruction_as_engine":[257.115,-43.242,-112.645],"paradigm:offline_rl":[-71.858,228.842,-298.819],"paradigm:optimal_control":[98
.455,-5.763,-412.807],"paradigm:safe_rl":[-134.604,-19.214,-478.125],"paradigm:safety_by_constraint_layered_architecture":[-275.603,45.266,465.485],"paradigm:scaling_data_with_self_supervision":[122.795,-77.226,22.259],"paradigm:sequence_modeling_for_decision":[-24.499,117.804,-143.12],"paradigm:simulator_first_synthetic_data_centric":[-85.741,168.149,164.19],"paradigm:vla_paradigm":[-86.983,-116.006,102.094],"paradigm:world_model_paradigm":[-22.56,-102.248,-143.837],"problem:annotation_inconsistency_across_datasets":[414.06,96.128,12.159],"problem:auditability_of_decisions_for_regulatory_compliance":[-359.541,60.705,406.692],"problem:behavior_cloning_compounds_errors_over_time":[-110.233,302.509,-208.305],"problem:catastrophic_failure_on_rare_weather":[137.432,-19.785,-184.442],"problem:catastrophic_forgetting_after_action_finetuning":[-26.154,-208.311,174.439],"problem:catastrophic_forgetting_under_continual_learning":[-304.471,312.939,171.087],"problem:closed_loop_simulation_fidelity_gap":[-61.383,48.846,-48.843],"problem:counterfactual_reasoning_about_other_agents_intent":[-166.807,-169.876,-91.36],"problem:depth_ambiguity_in_low_parallax":[269.092,153.022,-290.369],"problem:distributional_shift_between_offline_data_and_deployment":[-97.208,275.598,-327.793],"problem:energy_budget_too_small_for_full_transformer_at_30fps":[280.688,96.608,274.564],"problem:evaluation_gap_between_offline_benchmark_and_closed_loop":[-238.645,70.503,235.067],"problem:exploration_in_safety_critical_systems":[-166.551,-46.375,-438.048],"problem:fine_grained_spatial_understanding_in_vision_language_model":[254.487,-182.496,53.778],"problem:grounding_language_token_to_continuous_physical_world":[15.478,-135.873,223.332],"problem:hallucinated_action_from_vision_language_model_in_safety_critical_loop":[-279.559,-125.361,66.2],"problem:label_efficiency_for_3d_annotation":[239.376,-96.535,-122.951],"problem:label_noise_for_3d_object_categories":[-345.239,208.889,301.851],"problem:latency_bud
get_for_large_model_in_realtime_control":[-109.749,-42.402,269.626],"problem:long_horizon_credit_assignment_in_driving":[-159.946,21.53,-185.191],"problem:long_horizon_reasoning_with_finite_context_window":[-308.806,22.723,264.964],"problem:long_tail_object_categories_in_open_world":[324.417,-5.185,-9.362],"problem:multi_agent_interaction_modeling_in_dense_traffic":[-34.497,41.339,-152.761],"problem:multi_modal_calibration_drift":[140.672,212.8,-210.177],"problem:occlusion_reasoning_without_dense_lidar":[296.372,63.646,-73.317],"problem:offline_metric_does_not_predict_closed_loop_performance":[-304.358,228.702,78.634],"problem:open_world_corner_case_synthesis_for_training":[-109.351,-214.984,-82.237],"problem:planning_horizon_vs_compute_budget_tradeoff":[68.177,18.663,-376.2],"problem:rare_event_evaluation_with_no_ground_truth":[-187.926,116.62,-49.315],"problem:rare_safety_critical_events_dominate_real_risk_but_are_under_represented":[-260.804,244.525,216.304],"problem:realistic_other_agent_behavior_in_simulator":[-49.92,182.011,269.014],"problem:rendering_speed_vs_quality_tradeoff":[525.253,-7.233,-89.316],"problem:reward_hacking_in_learned_objectives":[-277.243,63.761,-267.53],"problem:reward_specification_for_safe_polite_driving":[-250.995,116.127,-279.119],"problem:sensor_calibration_drift_over_vehicle_lifetime":[-359.973,268.286,138.885],"problem:sim_to_real_gap_in_camera_only_perception":[231.092,83.162,52.733],"problem:simulator_visual_gap_breaks_perception_models":[-81.306,210.541,258.659],"problem:temporal_consistency_in_bev_segmentation":[327.099,227.109,-266.379],"problem:unknown_geometry_in_distant_or_dark_regions":[285.918,200.755,-318.93],"problem:verification_of_neural_network_safety_properties_at_scale":[-167.721,83.847,488.442],"problem:zero_shot_generalization_to_unseen_driving_scenes":[-92.571,-87.242,273.864],"validation:trace_alpha_zero_self_play_with_mcts_guided_policy":[-112.168,32.891,-269.404],"validation:trace_bird_eye_view_transformer_wit
h_temporal_aggregation":[155.805,134.036,46.214],"validation:trace_brain_inspired_spike_attention":[118.897,111.246,182.615],"validation:trace_clipped_policy_gradient_surrogate":[-73.983,76.28,-377.473],"validation:trace_counterfactual_vla_replanner":[-120.251,-75.729,-43.867],"validation:trace_dataset_aggregation_for_imitation":[-106.581,196.067,-240.494],"validation:trace_decision_transformer_offline_rl_via_sequence_modeling":[-9.433,108.924,-58.648],"validation:trace_deep_q_network_with_replay_and_target":[12.996,76.868,-373.955],"validation:trace_diffusion_policy_as_score_based_action_sampler":[-82.106,180.511,-134.92],"validation:trace_few_shot_in_context_learning_at_scale":[-17.457,34.879,76.295],"validation:trace_image_transformer_via_patch_tokenization":[104.343,16.268,88.606],"validation:trace_knowledge_driven_reflective_agent":[-142.682,20.872,76.175],"validation:trace_llm_decision_agent_for_driving":[-157.575,-1.533,119.648],"validation:trace_modular_perception_pipeline_with_bev_fusion":[158.919,176.12,35.283],"validation:trace_neural_field_for_dynamic_driving_scene":[-17.875,-26.037,-68.785],"validation:trace_object_level_planning_transformer":[21.153,162.968,44.434],"validation:trace_safe_rl_via_lagrangian_constrained_optimization":[-105.908,37.611,-412.935],"validation:trace_scalable_self_supervised_vision_backbone":[92.891,-49.283,16.41],"validation:trace_self_attention_replaces_recurrence":[123.218,99.574,126.592],"validation:trace_set_prediction_with_object_queries":[123.923,159.097,96.044],"validation:trace_unified_planning_oriented_e2e_driving":[91.412,198.722,33.578],"validation:trace_vision_language_action_dual_loop":[-51.284,-33.648,88.393],"validation:trace_vision_language_pretrained_dual_encoder":[79.853,-71.73,64.807],"validation:trace_world_model_in_latent_imagination":[-77.885,-18.27,-251.586]} \ No newline at end of file