Wave E batch 3: Dreamer-pixels and CQL-offline reproduction labs #17

Merged
ChatGPU merged 1 commit into main from claude/epic-ritchie-A7YtN on May 27, 2026
Conversation

ChatGPU (Owner) commented May 27, 2026

Two more long-program reproduction labs in the new paradigm-grouped layout.

labs/world_models/lab_dreamer_cartpole_pixels/ — Dreamer-style world model on CartPole-v1 from pixels:

  • CNN encoder + RSSM (deterministic h_t + stochastic Gaussian z_t) + transposed-conv decoder + reward head.
  • Latent-imagination actor-critic with GAE λ-returns.
  • README, paper.md (links paper_world_models / paper_dreamer_v2 / paper_dreamer_v3), notebook narrative, full src/ module split.
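The λ-returns used by the latent-imagination actor-critic can be sketched as below. This is a minimal illustration, assuming the Dreamer-style recursion G_t = r_t + γ((1 − λ) v_{t+1} + λ G_{t+1}) with the last value estimate as the bootstrap; the function name `lambda_returns` and its arguments are hypothetical and not taken from the lab's src/ modules.

```python
def lambda_returns(rewards, next_values, gamma=0.99, lam=0.95):
    """Compute lambda-returns over an imagined rollout of length H.

    rewards[t]     : predicted reward r_t for step t
    next_values[t] : critic estimate v_{t+1} of the state reached after step t
    The final entry of next_values serves as the bootstrap value.
    """
    assert len(rewards) == len(next_values), "need one next-state value per reward"
    returns = [0.0] * len(rewards)
    running = next_values[-1]  # G_H is bootstrapped from the last value estimate
    for t in reversed(range(len(rewards))):
        # Blend the one-step target with the longer lambda-return.
        running = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * running)
        returns[t] = running
    return returns


if __name__ == "__main__":
    # lam=1 with zero values reduces to an undiscounted sum of rewards.
    print(lambda_returns([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], gamma=1.0, lam=1.0))  # [3.0, 2.0, 1.0]
```

With `lam=0.0` the recursion collapses to one-step TD targets, and with `lam=1.0` it becomes a bootstrapped Monte Carlo return, which is why λ acts as a bias/variance knob for the critic target.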

labs/rl_decision/lab_cql_offline_minigrid/ — CQL vs BC vs DQN on an 8×8 sparse-reward gridworld:

  • Three trainers + unified eval pipeline + auto-tuned α ablation.
  • assets/: q_overestimation, q_overestimation_dqn_only, ood_action_density, action_histogram, eval_returns, ablation_alpha.
  • data/: bc.pt, dqn.pt, cql.pt, offline_dataset.pt.
  • paper.md links paper_cql, paper_bear, paper_iql.
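For readers unfamiliar with what separates the CQL trainer from the DQN baseline, the conservative penalty for discrete actions can be sketched as below. This is a hedged illustration of the standard CQL(H) regularizer (log-sum-exp of Q over all actions minus Q at the dataset action) added to a squared TD error. The names `cql_penalty` and `cql_loss` are hypothetical, and `alpha` is a fixed constant here, whereas the lab's ablation auto-tunes it.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) for a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def cql_penalty(q_values, data_actions):
    """Mean over the batch of logsumexp_a Q(s, a) - Q(s, a_data).

    q_values     : list of per-state lists, one Q-value per discrete action
    data_actions : index of the action actually taken in the offline dataset
    """
    gaps = [logsumexp(q_row) - q_row[a] for q_row, a in zip(q_values, data_actions)]
    return sum(gaps) / len(gaps)


def cql_loss(q_values, data_actions, td_targets, alpha=1.0):
    """Squared TD error on dataset actions plus alpha times the CQL penalty."""
    td_errors = [
        0.5 * (q_row[a] - target) ** 2
        for q_row, a, target in zip(q_values, data_actions, td_targets)
    ]
    td_loss = sum(td_errors) / len(td_errors)
    return td_loss + alpha * cql_penalty(q_values, data_actions)
```

The penalty is never negative, and it shrinks only when Q at the dataset action dominates the other actions, so it pushes down Q-values for out-of-distribution actions. Setting `alpha=0.0` recovers the plain TD loss of the DQN baseline.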

Both labs follow the per-lab directory contract documented in labs/RESTRUCTURE_PROPOSAL.md.

https://claude.ai/code/session_017Ez7KNKDCGRRLjEnJi9TW7


Generated by Claude Code

labs/world_models/lab_dreamer_cartpole_pixels/
  - Dreamer-style world model on CartPole-v1 from pixels.
  - CNN encoder + RSSM (deterministic h_t + stochastic Gaussian z_t) +
    transposed-conv decoder + reward head; latent-imagination actor-critic
    with GAE lambda-returns.
  - README, paper.md (linking paper_world_models / paper_dreamer_v2 /
    paper_dreamer_v3), notebook narrative, full src module split
    (env, world_model, trainer, policy, viz, seeds). Training/asset
    PNGs are deferred to a follow-up; the code is end-to-end runnable.
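The RSSM split between a deterministic h_t and a stochastic Gaussian z_t can be illustrated with one prior (imagination) step. This is a simplified sketch under stated assumptions, not the lab's world_model module: a single tanh layer stands in for the recurrent cell, and `init_params`, `rssm_prior_step`, the parameter names, and the 0.1 minimum standard deviation are all hypothetical choices.

```python
import numpy as np


def init_params(h_dim, z_dim, a_dim, rng):
    """Small random weights for the illustrative RSSM step (hypothetical layout)."""
    return {
        "W_h": 0.1 * rng.standard_normal((h_dim, h_dim + z_dim + a_dim)),
        "b_h": np.zeros(h_dim),
        "W_z": 0.1 * rng.standard_normal((2 * z_dim, h_dim)),
        "b_z": np.zeros(2 * z_dim),
    }


def rssm_prior_step(h_prev, z_prev, action, params, rng):
    """One imagination step: deterministic update, then sample the stochastic latent."""
    x = np.concatenate([h_prev, z_prev, action])
    # Deterministic path: a tanh layer used here as a stand-in for a recurrent cell.
    h = np.tanh(params["W_h"] @ x + params["b_h"])
    # Stochastic path: h parameterizes a diagonal Gaussian over z.
    mean, raw_std = np.split(params["W_z"] @ h + params["b_z"], 2)
    std = np.logaddexp(0.0, raw_std) + 0.1  # softplus plus an assumed minimum std
    z = mean + std * rng.standard_normal(mean.shape)  # reparameterized sample
    return h, z, mean, std


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(h_dim=8, z_dim=4, a_dim=2, rng=rng)
    h, z = np.zeros(8), np.zeros(4)
    for _ in range(3):  # roll the prior forward without observations
        h, z, mean, std = rssm_prior_step(h, z, np.array([1.0, 0.0]), params, rng)
    print(h.shape, z.shape)
```

During training a posterior would also condition z_t on the CNN encoder's embedding of the current frame; the prior step above is the part that imagination rollouts reuse.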

labs/rl_decision/lab_cql_offline_minigrid/
  - CQL on an 8x8 sparse-reward gridworld, with BC + DQN baselines and
    an alpha-tuning ablation.
  - Three trainers (BC, DQN, CQL) plus a unified eval pipeline.
  - assets/ contains the six story PNGs: q_overestimation,
    q_overestimation_dqn_only, ood_action_density, action_histogram,
    eval_returns, ablation_alpha.
  - data/ checkpoints: bc.pt, dqn.pt, cql.pt, offline_dataset.pt.
  - paper.md links paper_cql / paper_bear / paper_iql.

Both labs follow the per-lab directory contract documented in
labs/RESTRUCTURE_PROPOSAL.md.

https://claude.ai/code/session_017Ez7KNKDCGRRLjEnJi9TW7
@ChatGPU ChatGPU merged commit a1a15f4 into main May 27, 2026
1 check passed