
Adds codon-fm-native-te recipe #1531

Merged
jomitchellnv merged 1 commit into main from jm/codonfm-native-te on Mar 25, 2026

Conversation

jomitchellnv (Collaborator) commented Mar 21, 2026

Summary

  • Add bionemo-recipes/models/codonfm/ — a HuggingFace-compatible CodonFM model using TransformerEngine, following the ESM2 pattern
  • Add bionemo-recipes/recipes/codonfm_native_te/ — a self-contained FSDP2 training recipe for CodonFM with FP8/FP4 quantization
    support
  • Add golden value regression tests cross-validated against the codonfm_ptl_te non-exact (standard TETransformerLayer) implementation

models/codonfm/ (HF-compatible model)

The model code in models/codonfm/modeling_codonfm_te.py is the source of truth, auto-synced to the recipe via check_copied_files.py.

  • CodonFMConfig(PretrainedConfig) — HF-compatible config with 4 presets (200k, 80M, 600M, 1B)
  • CodonFMPreTrainedModel(PreTrainedModel) — base class with MAGNETO initialization (xavier_normal with scaled gain), meta device
    support
  • CodonFMForMaskedLM — returns MaskedLMOutput, supports output_hidden_states, per-layer FP8/FP4 precision via layer_precision config
  • CodonEmbedding — token + post-LayerNorm embedding
  • CodonFMEncoder — stack of transformer_engine.pytorch.TransformerLayer with RoPE, BSHD and THD attention formats
  • CodonFMLMHead — Dense + GELU + LayerNormLinear (quantization disabled for numerical stability)
  • CodonTokenizer — 3-mer codon tokenizer (69 tokens: 5 special + 64 codons)
  • dataset.py — BSHD/THD collators, synthetic and parquet dataset classes
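As a rough illustration of the vocabulary layout described above (5 special tokens + 64 codons = 69), a minimal 3-mer tokenizer might look like the sketch below. The special-token names are assumptions for illustration, not the real CodonTokenizer's:

```python
# Hypothetical sketch of a 3-mer codon vocabulary; special-token names are
# assumed, not taken from the actual CodonTokenizer implementation.
from itertools import product

SPECIAL_TOKENS = ["<pad>", "<mask>", "<cls>", "<eos>", "<unk>"]  # assumed names
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]  # 4^3 = 64 codons
VOCAB = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + CODONS)}


def encode(seq: str) -> list[int]:
    """Split a DNA sequence into non-overlapping 3-mers and map them to ids."""
    unk = VOCAB["<unk>"]
    return [VOCAB.get(seq[i : i + 3], unk) for i in range(0, len(seq) - 2, 3)]


print(len(VOCAB))        # 69 = 5 specials + 64 codons
print(encode("ATGGCC"))  # [19, 42] — ids for ATG and GCC under this layout
```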

73 tests (36 pass, 10 skip, 25 xfail, 2 xpass):

  • Forward/backward smoke tests (BSHD + THD)
  • FP8/FP4 quantization tests (DelayedScaling, Float8CurrentScaling, Float8BlockScaling, MXFP8BlockScaling, NVFP4BlockScaling)
  • Meta device and CUDA initialization tests
  • Golden value regression tests — weights generated from codonfm_ptl_te non-exact model, state dict mapped to native_te key format,
    cross-model logit equivalence verified
  • BSHD ↔ THD equivalence test — same weights, both formats, outputs compared
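The BSHD ↔ THD relationship these tests exercise can be sketched as a simple packing step, assuming padding only trails each sequence; this is illustrative, not the recipe's actual collator:

```python
# Minimal sketch of BSHD -> THD packing: drop padding, concatenate tokens,
# and record cumulative sequence lengths (cu_seq_lens) for the packed layout.
def bshd_to_thd(batch, pad_id=0):
    """Flatten a padded batch of shape (B, S) into packed tokens + cu_seq_lens."""
    packed, cu_seq_lens = [], [0]
    for seq in batch:
        real = [t for t in seq if t != pad_id]  # assumes pad tokens are trailing
        packed.extend(real)
        cu_seq_lens.append(cu_seq_lens[-1] + len(real))
    return packed, cu_seq_lens


tokens, cu = bshd_to_thd([[7, 8, 9, 0], [5, 6, 0, 0]])
print(tokens)  # [7, 8, 9, 5, 6]
print(cu)      # [0, 3, 5]
```

An equivalence test then runs the same weights over both layouts and compares the outputs at the real-token positions.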

recipes/codonfm_native_te/ (training recipe)

Self-contained FSDP2 training recipe with:

  • Hydra config (defaults.yaml, L0_sanity.yaml)
  • train_fsdp2.py — FSDP2 training loop with gradient clipping, LR scheduling
  • checkpoint.py — FSDP2 checkpoint save/load with save_pretrained support
  • perf_logger.py — WandB + stdout logging (loss, perplexity, tokens/sec, GPU memory)
  • quantization.py — FP8/FP4 recipe utilities
  • Sample train.parquet for testing

63 tests covering model, tokenizer, quantization, and end-to-end training.

CI integration

  • ci/scripts/check_copied_files.py — added entries to sync:
    • models/codonfm/modeling_codonfm_te.py → recipes/codonfm_native_te/modeling_codonfm_te.py
    • models/esm2/tests/common/ → models/codonfm/tests/common/
  • .gitignore — added negation rule for golden value safetensors test fixtures
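A toy version of the sync check described above, assuming it boils down to byte-for-byte comparison of source/destination pairs (the real check_copied_files.py may differ in detail; the paths in SOURCE are illustrative):

```python
# Toy sketch of a copied-files checker: report any source/destination pair
# whose contents differ. Paths here are illustrative, not the real SOURCE map.
import filecmp

SOURCE = {
    "models/codonfm/modeling_codonfm_te.py":
        "recipes/codonfm_native_te/modeling_codonfm_te.py",
}


def find_diffs(pairs):
    """Return (src, dst) pairs whose file contents are not identical."""
    return [(src, dst) for src, dst in pairs.items()
            if not filecmp.cmp(src, dst, shallow=False)]
```

CI fails if find_diffs returns a non-empty list, forcing the recipe copy to stay in sync with the model source of truth.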

What's left out / future work

  • No HF Hub checkpoint yet — the published TE checkpoints (nvidia/NV-CodonFM-Encodon-TE-80M-v1) use the "exact" EncodonTELayer with
    extra post-attention/post-MLP LayerNorms not present in standard TETransformerLayer. Golden values will be updated once a native_te
    checkpoint is trained and uploaded.
  • Conversion tests skipped — CodonFM is natively TE; there is no HF variant to convert to/from.
  • THD padding tests skipped — pad_to_multiple_of and cu_seq_lens_q_padded not yet implemented for CodonFM's tokenizer.
  • _do_not_quantize patterns — currently ("lm_head.dense", "lm_head.layer_norm_linear"). May need tuning as quantization recipes
    evolve.
  • Dockerfile — recipe does not yet include a Dockerfile for containerized training.
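One plausible way such _do_not_quantize patterns could be applied is fnmatch-style matching against module names; the helper below is hypothetical, not the recipe's actual code:

```python
# Hypothetical sketch: decide per-module whether quantization applies, by
# matching module names against the _do_not_quantize patterns named above.
from fnmatch import fnmatch

DO_NOT_QUANTIZE = ("lm_head.dense", "lm_head.layer_norm_linear")


def should_quantize(module_name: str) -> bool:
    """Return False for modules matching any do-not-quantize pattern."""
    return not any(fnmatch(module_name, pat) for pat in DO_NOT_QUANTIZE)


print(should_quantize("encoder.layers.0.self_attention"))  # True
print(should_quantize("lm_head.dense"))                    # False
```

fnmatch also accepts wildcards, so a future tuning pass could widen a pattern (e.g. "lm_head.*") without changing the matching logic.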

Test plan

  • cd bionemo-recipes/models/codonfm && pytest -v tests/ — 36 pass, 10 skip, 25 xfail, 2 xpass
  • cd bionemo-recipes/recipes/codonfm_native_te && pytest -v tests/ — 63 pass
  • python ci/scripts/check_copied_files.py — no diffs
  • pre-commit run --all-files — clean

Authorizing CI Runs

We use copy-pr-bot to manage authorization of CI
runs on NVIDIA's compute resources.

  • If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will
    automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123)
  • If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an
    /ok to test comment on the pull request to trigger CI. This will need to be done for each new commit.

Triggering Code Rabbit AI Review

To trigger a code review from CodeRabbit, comment on the pull request with one of the supported commands; see https://docs.coderabbit.ai/reference/review-commands for the full list.

Pre-submit Checklist

  • I have tested these changes locally
  • I have updated the documentation accordingly
  • I have added/updated tests as needed
  • All existing tests pass successfully

Summary by CodeRabbit

  • New Features

    • Added a CodonFM training recipe: model, tokenizer, datasets/collators, dataloaders, distributed training entrypoint, checkpointing, scheduler, perf logging, and configurable FP8/FP4 quantization and debug stats.
  • Tests

    • Comprehensive test suite and utilities including golden-value generation, conversion/golden regression tests, FP8/THD coverage, and sanity training runs.
  • Documentation

    • Recipe and shared test-library READMEs added.
  • Chores

    • .gitignore adjusted to allow tracking of golden_state_dict.safetensors fixtures.

coderabbitai bot (Contributor) commented Mar 21, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0a34aafb-5ec9-4701-b141-58c3a769e79f

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Adds a complete CodonFM feature set: codon tokenizer, datasets/collators, TransformerEngine-native model and training recipe (FSDP2), quantization/FP8 tooling and debug configs, checkpointing, perf logging, extensive tests and golden fixtures, CI sync rules, and dependency manifests.

Changes

Cohort / File(s) / Summary:

  • Git Ignore — .gitignore: keep ignoring *.safetensors, but add a negation to include bionemo-recipes/models/*/tests/golden_state_dict.safetensors.
  • Model (TE), library & recipe — bionemo-recipes/models/codonfm/modeling_codonfm_te.py, bionemo-recipes/recipes/codonfm_native_te/modeling_codonfm_te.py: new TransformerEngine-optimized CodonFM: CodonFMConfig, embeddings, TE encoder (BSHD/THD), LM head, TE-aware init/state handling, per-layer precision and quantized init, and MLM forward/loss.
  • Tokenizer — bionemo-recipes/models/codonfm/tokenizer.py, bionemo-recipes/recipes/codonfm_native_te/tokenizer.py: new CodonTokenizer for DNA/RNA 3-mer tokenization with encode/decode and persistence (vocab = 5 specials + 64 codons).
  • Datasets & collators — bionemo-recipes/models/codonfm/dataset.py, bionemo-recipes/recipes/codonfm_native_te/dataset.py: SyntheticCodonDataset, ParquetCodonDataset, CodonMLMCollator (BSHD), and CodonTHDCollator (THD/packed), plus dataloader factories using DistributedSampler and MLM masking logic.
  • Training recipe & entrypoint — bionemo-recipes/recipes/codonfm_native_te/train_fsdp2.py: Hydra-driven FSDP2 training entrypoint: distributed init, model/precision setup, optimizer/scheduler, dataloaders, training loop, checkpoint save/load, perf logging, and quant-stats integration.
  • Checkpointing & distributed config — bionemo-recipes/recipes/codonfm_native_te/checkpoint.py, .../distributed_config.py: FSDP2-style checkpoint save/load/prune utilities, AppState for distributed state, and a DistributedConfig dataclass for env-based rank/port handling.
  • Quantization & debug configs — .../quantization.py, .../fp4_debugging_stats.yaml, .../fp8_debugging_stats.yaml: layer-precision resolution, layer-regex generation, quant-stats YAML updater, WandB logger bridge, and nvdlfw_inspect-based stats initialization.
  • Perf logging & scheduler — .../perf_logger.py, .../scheduler.py: PerfLogger capturing per-step metrics (loss, grad norm, throughput, perplexity, GPU memory) with WandB/tqdm; LR scheduler factory with warmup + decay.
  • Requirements — bionemo-recipes/models/codonfm/requirements.txt, bionemo-recipes/recipes/codonfm_native_te/requirements.txt: new dependency manifests (torch, transformer_engine, transformers, hydra-core, pyarrow, safetensors, wandb, torchmetrics, nvdlfw_inspect, etc.).
  • Tests: common library & suites — bionemo-recipes/models/codonfm/tests/common/*, bionemo-recipes/models/codonfm/tests/*, bionemo-recipes/recipes/codonfm_native_te/tests/*: shared BaseModelTest, fixtures for FP8/THD/BSHD/backends, golden-value generation and fixtures, tokenizer/model/quantization/recipe tests, and training sanity runs.
  • Hydra configs & README — bionemo-recipes/recipes/codonfm_native_te/hydra_config/*, bionemo-recipes/recipes/codonfm_native_te/README.md: default and L0_sanity Hydra configs, quant-debug YAMLs, and a README documenting usage, presets, and examples.
  • CI sync & script updates — ci/scripts/check_copied_files.py: updated SOURCE map to sync model <-> recipe files and to copy new tests/common into model test locations.
  • Cross-repo test fixes — bionemo-recipes/models/*/tests/common/test_modeling_common.py: test utilities updated to tolerate missing _tied_weights_keys, assert hidden-state length equality, add class-level caching for converted TE checkpoints, and fix THD real-token indexing.
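The MLM masking logic the collator summaries mention can be sketched as follows. The mask id, the 15% default rate, and the always-replace-with-[MASK] simplification are assumptions; only the -100 ignore index mirrors the PyTorch cross-entropy convention used by the tests:

```python
# Illustrative MLM masking: labels are -100 (cross-entropy ignore index)
# everywhere except masked positions, where they hold the original token.
# MASK_ID and the single masking variant are simplifying assumptions.
import random

MASK_ID, IGNORE = 4, -100


def mlm_mask(input_ids, mask_prob=0.15, rng=None):
    """Return (masked_ids, labels) for a BERT-style MLM objective."""
    rng = rng or random.Random(0)
    masked, labels = [], []
    for tok in input_ids:
        if rng.random() < mask_prob:
            masked.append(MASK_ID)  # always replace with [MASK] in this sketch
            labels.append(tok)      # predict the original token here
        else:
            masked.append(tok)
            labels.append(IGNORE)   # loss skips unmasked positions
    return masked, labels


ids, labels = mlm_mask(list(range(10, 30)), mask_prob=0.3)
print(sum(l != IGNORE for l in labels))  # count of masked positions
```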

Sequence Diagram(s)

sequenceDiagram
    participant Trainer as Trainer (Hydra)
    participant Tokenizer as CodonTokenizer
    participant Dataloader as Dataloader (BSHD/THD)
    participant Model as CodonFM (TE on CUDA)
    participant Optimizer as Optimizer/Scheduler
    participant Checkpoint as Checkpoint (FSDP2)
    participant Perf as PerfLogger
    participant Debug as nvdlfw_inspect

    Trainer->>Tokenizer: load/instantiate tokenizer
    Trainer->>Dataloader: create dataloaders (DistributedSampler)
    Dataloader->>Tokenizer: tokenize & collate batch
    Trainer->>Model: forward(batch) on CUDA/TE
    Model->>Perf: emit loss/logits/metrics
    Trainer->>Optimizer: backward + step
    Trainer->>Checkpoint: conditional save/load
    Trainer->>Perf: log_step(metrics)
    Trainer->>Debug: init/collect quant-stats (optional)

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

"I hopped through codons, three by three,
seeds of gold for tests to see.
Engines hum and dataloaders sing,
masks and checkpoints, a joyful spring.
A rabbit cheers: new models, full of glee!"

🚥 Pre-merge checks: ✅ 3 passed

  • Title check — Passed: the title "Adds codon-fm-native-te recipe" clearly summarizes the main addition: a new CodonFM training recipe using native TransformerEngine.
  • Description check — Passed: the description is comprehensive and covers all major sections: summary of changes, model/recipe components, test results, CI integration, and known limitations.
  • Docstring coverage — Passed: docstring coverage is 82.18%, above the required threshold of 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Flagged snippet (from bionemo-recipes/models/codonfm/tests/common/fixtures.py):

        An available TCP port number.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))

Check warning

Code scanning / CodeQL

Binding a socket to all network interfaces Medium test

'' binds a socket to all interfaces.

Copilot Autofix (AI, 27 days ago)

In general, to avoid binding a socket to all interfaces, replace uses of '' or "0.0.0.0" in bind((host, port)) with a specific interface address (often 127.0.0.1 for local‑only services), or create separate sockets for each intended interface. For an ephemeral test port, binding only to the loopback interface is sufficient and safer.

In this file, the best minimal change is to adjust the unused_tcp_port fixture so the socket is bound to the loopback address instead of all interfaces. Replace s.bind(("", 0)) with s.bind(("127.0.0.1", 0)). This keeps the behavior of selecting an available ephemeral port while ensuring the temporary binding is only on localhost. No new imports or helper methods are required, and no other code paths in this file need changes.

Specifically:

  • In bionemo-recipes/models/codonfm/tests/common/fixtures.py, within the unused_tcp_port fixture (around line 41), change the bind call as described.
  • Keep SO_REUSEADDR and the rest of the function unchanged.
Suggested changeset: bionemo-recipes/models/codonfm/tests/common/fixtures.py

Run the following command in your local git repository to apply this patch:
cat << 'EOF' | git apply
diff --git a/bionemo-recipes/models/codonfm/tests/common/fixtures.py b/bionemo-recipes/models/codonfm/tests/common/fixtures.py
--- a/bionemo-recipes/models/codonfm/tests/common/fixtures.py
+++ b/bionemo-recipes/models/codonfm/tests/common/fixtures.py
@@ -38,7 +38,7 @@
         An available TCP port number.
     """
     with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
-        s.bind(("", 0))
+        s.bind(("127.0.0.1", 0))
         s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
         return s.getsockname()[1]
 
EOF
Flagged snippet (from bionemo-recipes/models/codonfm/tests/conftest.py):

def unused_tcp_port():
    """Find and return an unused TCP port for torchrun rendezvous."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))

Check warning

Code scanning / CodeQL

Binding a socket to all network interfaces Medium test

'' binds a socket to all interfaces.

Copilot Autofix (AI, 28 days ago)

In general, to fix this class of issue you should avoid binding sockets to all interfaces ('' or 0.0.0.0) unless absolutely necessary. Instead, bind to a specific interface (often 127.0.0.1 for local-only communication) or create multiple sockets if you genuinely need to listen on multiple interfaces.

For this specific fixture, the goal is to allocate an unused TCP port for use by torchrun or similar tooling, not to serve external clients. We can achieve the same behavior by binding to ("127.0.0.1", 0) instead of ("", 0). The OS will still choose an unused ephemeral port on the loopback interface, and returning that port number is sufficient for local rendezvous in tests. No additional imports are needed; only a single line change in bionemo-recipes/models/codonfm/tests/conftest.py at the s.bind call.

Suggested changeset: bionemo-recipes/models/codonfm/tests/conftest.py

Run the following command in your local git repository to apply this patch:
cat << 'EOF' | git apply
diff --git a/bionemo-recipes/models/codonfm/tests/conftest.py b/bionemo-recipes/models/codonfm/tests/conftest.py
--- a/bionemo-recipes/models/codonfm/tests/conftest.py
+++ b/bionemo-recipes/models/codonfm/tests/conftest.py
@@ -39,7 +39,7 @@
 def unused_tcp_port():
     """Find and return an unused TCP port for torchrun rendezvous."""
     with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
-        s.bind(("", 0))
+        s.bind(("127.0.0.1", 0))
         s.listen(1)
         port = s.getsockname()[1]
     return port
EOF
jomitchellnv (Collaborator, Author) commented:

@coderabbitai

jomitchellnv (Collaborator, Author) commented:

@coderabbitai review

coderabbitai bot (Contributor) commented Mar 21, 2026

Tip

For best results, initiate chat on the files or code changes.

@jomitchellnv It looks like your message may have been sent incomplete. Could you let me know how I can help? For example, if you'd like me to review the PR, you can comment:

  • @coderabbitai review — for a standard review
  • @coderabbitai full review — for a comprehensive review

Or feel free to ask any specific questions about the changes!

coderabbitai bot (Contributor) commented Mar 21, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot (Contributor) left a comment


Actionable comments posted: 18

🧹 Nitpick comments (8)
bionemo-recipes/recipes/codonfm_native_te/checkpoint.py (1)

193-197: Potential race: non-main ranks proceed while main rank prunes.

After save_checkpoint_fsdp2, the main process logs and potentially runs prune_checkpoints (which deletes directories), while other ranks continue immediately. If subsequent operations on non-main ranks try to access the pruned checkpoint before the deletion completes, there could be issues.

Consider adding a barrier after pruning if downstream code might read checkpoints:

if max_checkpoints is not None and dist_config.is_main_process():
    prune_checkpoints(ckpt_path, max_checkpoints)
torch.distributed.barrier(group=process_group)  # Ensure pruning completes before continuing

This may not be an issue if checkpoints are only read at the start of training, but worth considering for robustness.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/checkpoint.py` around lines 193 -
197, The current flow after save_checkpoint_fsdp2 lets non-main ranks continue
while the main rank may delete old checkpoints via prune_checkpoints(ckpt_path,
max_checkpoints), risking a race if other ranks access those files; modify the
shutdown so that after the conditional prune_checkpoints call (guarded by
dist_config.is_main_process()) you add a distributed synchronization (e.g.,
torch.distributed.barrier with the appropriate process_group) so all ranks wait
until pruning completes before proceeding; update calls around
save_checkpoint_fsdp2/prune_checkpoints and ensure the barrier uses the same
process group used for training to avoid deadlocks.
bionemo-recipes/recipes/codonfm_native_te/requirements.txt (1)

1-12: Consider pinning dependency versions for reproducibility.

All dependencies are unpinned, which can lead to reproducibility issues across environments and over time. Note that bionemo-recipes/models/codonfm/requirements.txt pins minimum versions (e.g., torch>=2.6.0). Consider aligning the approach:

  1. Pin minimum versions for critical dependencies (torch, transformer_engine)
  2. Move pytest to a separate dev requirements file or pyproject.toml test extras, as it's a test dependency rather than runtime
Example with version constraints
 hydra-core
 nvdlfw_inspect
 pandas
 pyarrow
-pytest
 pyyaml
 safetensors
-torch
+torch>=2.6.0
 torchmetrics
 tqdm
-transformer_engine[pytorch]
+transformer_engine[pytorch]>=1.14.0
 wandb
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/requirements.txt` around lines 1 -
12, Update the requirements handling to improve reproducibility: pin minimum (or
exact) versions for critical runtime packages such as torch and
transformer_engine (e.g., replace "torch" and "transformer_engine[pytorch]" with
constrained versions consistent with
bionemo-recipes/models/codonfm/requirements.txt), and move test-only
dependencies like pytest out of this runtime requirements list into a dev/test
requirements file or test extras in pyproject.toml; ensure
pandas/pyarrow/safetensors/torchmetrics/tqdm/wandb also have at least minimum
version constraints to avoid drift.
bionemo-recipes/recipes/codonfm_native_te/distributed_config.py (1)

36-40: Be aware: setdefault modifies the environment.

Using os.environ.setdefault() has the side effect of setting the environment variable if it doesn't exist. This is intentional per the docstring ("we set them to default values"), but callers should be aware that instantiating DistributedConfig can modify the process environment.

If read-only behavior is ever needed, consider using os.environ.get() instead:

rank: int = field(default_factory=lambda: int(os.environ.get("RANK", "0")))

This is informational—the current behavior appears intentional for ensuring consistent env vars across the training process.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/distributed_config.py` around lines
36 - 40, The DistributedConfig dataclass currently uses os.environ.setdefault in
the default_factory for fields rank, local_rank, world_size, _master_addr, and
_master_port which mutates the process environment; if you want read-only
behavior change each default_factory to use os.environ.get (e.g., replace
setdefault with get) so instantiating DistributedConfig does not modify
environment variables while preserving the same default values.
bionemo-recipes/models/codonfm/tests/golden_values.json (1)

1-1: Consider formatting JSON for improved maintainability.

This golden values file is a single line with ~87KB of data. While functionally correct, multi-line formatted JSON would make it easier to:

  • Review changes in diffs
  • Debug failing tests by inspecting specific values
  • Maintain over time

This is a low-priority suggestion since the file is auto-generated by generate_golden_values.py.

The test data structure looks correct:

  • seed: 42 for reproducibility
  • config matches a small test model configuration
  • labels correctly uses -100 for masked positions (PyTorch cross-entropy ignore index)
  • attention_mask properly indicates padding (0 at position 13 in sequence index 1)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/tests/golden_values.json` at line 1, Golden
JSON file is a single long line; update the generator to emit pretty-printed
JSON for easier diffs and review. Modify generate_golden_values.py (the code
that writes the golden values) to call json.dump/json.dumps with indent=2 (or
similar) and ensure separators/default serialization remain unchanged, then
rewrite the output file (golden_values.json) using that formatted output so
subsequent runs produce multi-line indented JSON.
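The suggested generator change amounts to a one-line switch to indented output. A minimal sketch, with an illustrative payload rather than the real fixture contents:

```python
# Pretty-print a golden-value payload so diffs and debugging are readable.
# The payload shape here is illustrative, not the actual golden fixture.
import json

golden = {"seed": 42, "logits": [[0.1, -0.2], [0.3, 0.4]]}
formatted = json.dumps(golden, indent=2, sort_keys=True)  # multi-line, stable key order
print(formatted.splitlines()[0])  # "{"
```

json.dump with the same indent/sort_keys arguments writes the identical multi-line form directly to the file handle.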
bionemo-recipes/recipes/codonfm_native_te/tests/conftest.py (1)

22-22: Replace the private API usage with defensive error handling to maintain forward compatibility.

The imports and usage of _mesh_resources (lines 22 and 54) rely on a module-private API that is not exported in torch.distributed.device_mesh.__all__ and may change across PyTorch versions without notice. Real-world integrations explicitly warn that these internals are subject to change.

Wrap the _mesh_resources.mesh_stack.clear() call with a try/except to handle potential API shifts:

Suggested fix
# NOTE: _mesh_resources is a private API that may change in future PyTorch versions
try:
    _mesh_resources.mesh_stack.clear()
except AttributeError:
    pass  # API changed, mesh cleanup may be incomplete
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/tests/conftest.py` at line 22, The
test setup currently uses the private symbol _mesh_resources.mesh_stack.clear()
which can break across PyTorch versions; modify the conftest cleanup to
defensively handle missing or changed internals by wrapping the call to
_mesh_resources.mesh_stack.clear() in a try/except (catch AttributeError) so
tests won’t crash if mesh_stack or _mesh_resources shape changes, and leave a
brief comment noting this is a private API; keep usage of init_device_mesh
untouched but ensure any cleanup uses the guarded call to avoid relying on
non-public APIs.
bionemo-recipes/recipes/codonfm_native_te/scheduler.py (1)

19-43: Add type hints and complete docstring.

The scheduler logic is correct. Consider adding type hints and completing the Google-style docstring with Args/Returns sections for better documentation.

📝 Suggested improvements
+from torch.optim import Optimizer
 from torch.optim.lr_scheduler import LambdaLR


 def get_linear_schedule_with_warmup(
-    optimizer,
-    num_warmup_steps=2_000,
-    num_training_steps=500_000,
-    last_epoch=-1,
-):
+    optimizer: Optimizer,
+    num_warmup_steps: int = 2_000,
+    num_training_steps: int = 500_000,
+    last_epoch: int = -1,
+) -> LambdaLR:
     """Linear warmup and decay scheduler for CodonFM pretraining.

     The learning rate is warmed up over the first warmup steps to a peak value, then linearly decayed to
     one tenth of its peak value over 90% of training duration.
+
+    Args:
+        optimizer: The optimizer to schedule.
+        num_warmup_steps: Number of warmup steps.
+        num_training_steps: Total number of training steps.
+        last_epoch: The index of last epoch.
+
+    Returns:
+        LambdaLR scheduler with the warmup and decay schedule.
     """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/scheduler.py` around lines 19 - 43,
Add type hints and a complete Google-style docstring to
get_linear_schedule_with_warmup: annotate parameters (optimizer:
torch.optim.Optimizer, num_warmup_steps: int, num_training_steps: int,
last_epoch: int) and the return type (torch.optim.lr_scheduler.LambdaLR), and
expand the existing docstring to include Args (describe each param), Returns
(describe the LambdaLR scheduler), and Raises (if any). Mention lr_lambda in the
docstring as the internal schedule function and that the returned object is an
instance of LambdaLR constructed with lr_lambda.
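A runnable sketch of an lr_lambda implementing the schedule described above (linear warmup, then linear decay to one tenth of peak over the first 90% of training); the recipe's scheduler.py remains the source of truth, and the default step counts below simply echo the signature shown in the diff:

```python
# Multiplicative LR factor: 0 -> 1 over warmup, then 1 -> 0.1 linearly by
# 90% of total steps, held at 0.1 afterwards. A sketch, not the recipe's code.
def lr_lambda(step, warmup=2_000, total=500_000):
    if step < warmup:
        return step / max(1, warmup)        # linear warmup to peak
    decay_end = int(0.9 * total)
    if step >= decay_end:
        return 0.1                          # floor at one tenth of peak
    frac = (step - warmup) / max(1, decay_end - warmup)
    return 1.0 - 0.9 * frac                 # linear decay from 1.0 to 0.1


print(lr_lambda(0))        # 0.0
print(lr_lambda(2_000))    # 1.0
print(lr_lambda(450_000))  # 0.1
```

Passing such a function to torch.optim.lr_scheduler.LambdaLR yields the scheduler object the review comment describes.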
bionemo-recipes/recipes/codonfm_native_te/fp8_debugging_stats.yaml (1)

7-23: Minor indentation inconsistency in YAML.

The LogFp8TensorStats section uses 12-space indentation (lines 8-18) while LogTensorStats uses 10-space indentation (lines 20-23). While YAML will parse this correctly, consistent indentation improves readability.

💅 Suggested fix for consistent indentation
         LogTensorStats:
-          enabled: True
-          stats: [max, min, mean, std, l1_norm]
-          tensors: [dgrad, wgrad]
-          freq: 1
+            enabled: True
+            stats: [max, min, mean, std, l1_norm]
+            tensors: [dgrad, wgrad]
+            freq: 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/fp8_debugging_stats.yaml` around
lines 7 - 23, The YAML has inconsistent indentation between the
LogFp8TensorStats block and the LogTensorStats block; update the indentation so
both top-level keys and their child mappings align consistently (e.g., make
LogTensorStats use the same indent width as LogFp8TensorStats), ensuring the
keys "LogFp8TensorStats", "tensors_struct", "LogTensorStats", "stats",
"tensors", and "freq" are vertically aligned for improved readability.
bionemo-recipes/models/codonfm/tests/common/test_modeling_common.py (1)

1081-1097: Make create_inference_params() concrete, or move it to an autoregressive-only mixin.

The docstring says only autoregressive subclasses must override this, but @abstractmethod means Python will refuse to instantiate any subclass until it implements the method. That forces masked-LM suites to carry a dummy override even though every call site is already gated by is_autoregressive. (docs.python.org)

♻️ Proposed fix
-    `@abstractmethod`
     def create_inference_params(self, config, batch_size=1, max_seq_len=256, num_beams=1) -> Any:
         """Create inference params for KV-cache generation tests.
@@
-        pass
+        raise NotImplementedError(
+            "Autoregressive subclasses must implement create_inference_params()"
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/tests/common/test_modeling_common.py` around
lines 1081 - 1097, The abstract create_inference_params method forces
non-autoregressive subclasses to implement a no-op even though callers guard by
is_autoregressive; make create_inference_params concrete by providing a default
implementation that raises a clear runtime error (or returns None) when called,
or move the method into an autoregressive-only mixin used by autoregressive
tests; update references to create_inference_params and is_autoregressive
accordingly so masked-LM test suites no longer need dummy overrides.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bionemo-recipes/models/codonfm/tests/common/test_modeling_common.py`:
- Around line 643-670: get_converted_te_model_checkpoint always rebuilds the
same converted checkpoint and causes repeated heavy HF load/convert/save cycles;
change it to cache the converted checkpoint once per test class (e.g., a class
attribute like _cached_converted_te_checkpoint or use functools.cached_property)
so subsequent calls simply return the cached Path if set and exists. Implement:
at start of get_converted_te_model_checkpoint (and similarly in
get_converted_te_model) check the cache attribute and return it if present,
otherwise run the existing conversion logic, set the class-level cache to
checkpoint_path before returning; ensure the cache is validated (exists on disk)
to recover if cleanup removed it.
- Around line 427-435: Add an explicit length check before the zip-based
hidden-state comparison: ensure te_outputs.hidden_states and
hf_outputs.hidden_states have the same length (e.g., compare
len(te_outputs.hidden_states) vs len(hf_outputs.hidden_states)) and raise/assert
with a clear message if they differ, then proceed to the existing loop that
iterates over them (the code using compare_hidden_states and the for i,
(te_hidden, hf_hidden) in enumerate(zip(...)) block). This ensures structural
mismatches are detected instead of silently ignored.
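The guarded comparison can be sketched as follows (types are simplified; the real suite compares tensors via its compare_hidden_states helper):

```python
def compare_hidden_state_tuples(te_hidden_states, hf_hidden_states, compare=lambda a, b: a == b):
    """Sketch: length-check before zip so structural mismatches fail loudly."""
    if len(te_hidden_states) != len(hf_hidden_states):
        raise AssertionError(
            f"hidden_states length mismatch: TE has {len(te_hidden_states)} entries, "
            f"HF has {len(hf_hidden_states)}"
        )
    # zip() would silently truncate to the shorter tuple without the check above.
    for i, (te_hidden, hf_hidden) in enumerate(zip(te_hidden_states, hf_hidden_states)):
        assert compare(te_hidden, hf_hidden), f"hidden state {i} differs"
```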
- Around line 300-307: The guard for tied-weight skipping assumes
model._tied_weights_keys is iterable and calls set(model._tied_weights_keys)
which can raise if it's None; change that to use a safe default like
set(getattr(model, "_tied_weights_keys", None) or []) when checking membership
in the if should_be_fp8 block so tied-weight handling mirrors the defensive
pattern used in test_modeling_codonfm_te.py; leave the rest (the
_do_not_quantize check using fnmatch and the final assertion asserting
module.weight is a QuantizedTensor) unchanged.
- Around line 787-798: cu_seq_lens_q[-1] is a 0-dim tensor and must be converted
to a Python int before calling torch.arange; change how num_real_tokens is
computed to use int(cu_seq_lens_q[-1]) and then simplify torch.arange(0,
num_real_tokens, device="cuda") to torch.arange(num_real_tokens, device="cuda")
when building real_idx and the other arange usage, so functions/variables like
num_real_tokens, cu_seq_lens_q, torch.arange, real_idx, input_data_thd and
input_data_thd_padded all operate with a Python integer end value; apply the
same change at both arange sites in the test.

In `@bionemo-recipes/models/codonfm/tests/conftest.py`:
- Around line 48-52: The fixture use_te_debug currently sets env vars with
monkeypatch.setenv and calls importlib.reload(transformer_engine.pytorch) but
never restores module state; convert it to a yield-style fixture that sets the
env vars (via monkeypatch.setenv as now), capture the original module object
(e.g., orig = sys.modules.get("transformer_engine.pytorch")) before calling
importlib.reload(transformer_engine.pytorch), yield to run the test, and after
yield restore the original module state by replacing
sys.modules["transformer_engine.pytorch"] with the saved orig (or reload it
again) so the importlib.reload side effect is undone while leaving monkeypatch
cleanup to handle the env vars.

In `@bionemo-recipes/models/codonfm/tests/generate_golden_values.py`:
- Around line 207-211: The code currently calls
native_model.load_state_dict(native_sd, strict=False) and only prints missing
keys, allowing unintended missing keys to silently pass; change this to
explicitly fail on any missing keys other than the expected rotary inv_freq key:
capture (missing, unexpected) from native_model.load_state_dict(native_sd,
strict=False), assert that unexpected is empty (as already done), then compute
unexpected_missing = set(missing) - {"rotary_emb.inv_freq"} (use the exact key
string shown in your codebase) and raise/assert if unexpected_missing is
non-empty so the test fails when any unexpected key is missing; keep the print
for the allowed missing key if you want but do not let other missing keys pass.
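The strict-but-allow-listed check can be sketched like this (the exact inv_freq key name is an assumption; use the key from the codebase):

```python
ALLOWED_MISSING = {"rotary_emb.inv_freq"}  # assumed key name, check the real state dict


def check_load_result(missing: list[str], unexpected: list[str]) -> None:
    """Sketch: fail on any missing key other than the expected rotary buffer."""
    assert not unexpected, f"Unexpected keys in state dict: {unexpected}"
    unexpected_missing = set(missing) - ALLOWED_MISSING
    assert not unexpected_missing, (
        f"Missing keys not in allow-list: {sorted(unexpected_missing)}"
    )
```

In the generator, `missing, unexpected = native_model.load_state_dict(native_sd, strict=False)` would feed straight into this check.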

In `@bionemo-recipes/models/codonfm/tokenizer.py`:
- Around line 19-125: The CodonTokenizer class is missing the HuggingFace
save/load contract; implement save_pretrained(self, save_directory) and
a from_pretrained(cls, load_directory) classmethod on CodonTokenizer so the
tokenizer can be serialized and reloaded with model artifacts: persist/restore
seq_type, SPECIAL_TOKENS and the full encoder/decoder (or the codons list) as a
JSON (or vocab) file; on save_pretrained write a small metadata JSON containing
seq_type and SPECIAL_TOKENS and the encoder (or codons) and on from_pretrained
read that file to reconstruct self.seq_type, self.codons, self.encoder and
self.decoder exactly as done in __init__; ensure from_pretrained returns an
instance equivalent to calling CodonTokenizer(seq_type=...) so HF-style
round-tripping works.

In `@bionemo-recipes/recipes/codonfm_native_te/checkpoint.py`:
- Line 1: Update the SPDX copyright header year from 2026 to 2025 in the
top-of-file SPDX comment (the SPDX-FileCopyrightText line) so the file uses the
correct copyright year.

In `@bionemo-recipes/recipes/codonfm_native_te/dataset.py`:
- Around line 62-68: The __init__ currently does eager loading with
pd.read_parquet(path); change it to keep only the path/metadata and create a
lazy reader so rows are loaded on-demand (e.g., store self.path or a
pyarrow.dataset.Dataset created via pyarrow.dataset.dataset(path) or a
pyarrow.parquet.ParquetFile instance) and remove self.df assignment;
implement/adjust __len__ and __getitem__ (or an iterator) to read specific rows
or batches from the underlying pyarrow reader (using row_group access or
Scanner.to_batches with a small batch_size) so memory is not multiplied per
rank; apply the same change to the mirrored implementation in
bionemo-recipes/models/codonfm/dataset.py and ensure places that previously used
self.df now call the on-demand reader methods.

In `@bionemo-recipes/recipes/codonfm_native_te/distributed_config.py`:
- Line 1: Update the SPDX copyright header year from 2026 to 2025 in the file's
top comment (the SPDX-FileCopyrightText line) so it matches the repository
standard; locate the SPDX header string and replace "2026" with "2025".

In `@bionemo-recipes/recipes/codonfm_native_te/perf_logger.py`:
- Around line 105-115: The min_loss update is gated by the logging_frequency
check so it never runs on step 0 (and stays inf if total steps <=
logging_frequency); move the update self.min_loss = torch.minimum(self.min_loss,
outputs.loss) out of the if block so it runs unconditionally each step (i.e.,
place it before the if that checks step % self.logging_frequency) while leaving
the timing/logging code (elapsed_time, previous_step_time, step_time, token
counts) inside the existing conditional; ensure references to min_loss,
outputs.loss, logging_frequency and step remain unchanged.
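The reordering can be sketched with plain floats (the real logger tracks a torch tensor via torch.minimum, and keeps timing/token bookkeeping inside the gated branch):

```python
import math


class PerfLoggerSketch:
    """Sketch: update min_loss every step; gate only the logging work."""

    def __init__(self, logging_frequency: int = 10):
        self.logging_frequency = logging_frequency
        self.min_loss = math.inf
        self.logged_steps: list[int] = []

    def log_step(self, step: int, loss: float) -> None:
        # Unconditional update, placed before the frequency gate.
        self.min_loss = min(self.min_loss, loss)
        if step % self.logging_frequency == 0:
            # elapsed_time / step_time / token-count logging stays in here.
            self.logged_steps.append(step)
```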

In `@bionemo-recipes/recipes/codonfm_native_te/quantization.py`:
- Around line 190-243: In resolve_layer_precision(), reject invalid layer
selections up-front: treat an explicit empty list (fp8_layers == [] or
fp4_layers == []) as an error and raise a ValueError explaining that an explicit
empty selection is not allowed; also validate that every entry in fp8_layers and
fp4_layers is an integer within 1..num_layers and raise a ValueError listing any
out-of-range or non-integer entries. Do these checks before the logic that fills
defaults or computes overlaps (i.e., before the fp8_enabled/fp4_enabled
conditional blocks and before computing claimed_by_fp4/claimed_by_fp8) so bad
configs are rejected rather than silently ignored.
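The up-front validation might look like this (the helper signature is illustrative, not the recipe's actual resolve_layer_precision API):

```python
def validate_layer_selection(name: str, layers, num_layers: int) -> None:
    """Sketch: reject bad fp8/fp4 layer selections before defaults are filled in."""
    if layers is None:
        return  # None means "use defaults"; only explicit lists are validated
    if layers == []:
        raise ValueError(
            f"{name} must not be an explicit empty list; omit it to use defaults"
        )
    bad = [x for x in layers if not isinstance(x, int) or not 1 <= x <= num_layers]
    if bad:
        raise ValueError(
            f"{name} entries must be integers in 1..{num_layers}; invalid entries: {bad}"
        )
```

Both fp8_layers and fp4_layers would pass through this check before any overlap computation.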

In `@bionemo-recipes/recipes/codonfm_native_te/tests/conftest.py`:
- Line 1: Update the SPDX copyright header year from 2026 to 2025 in the file's
top comment (the line starting with "# SPDX-FileCopyrightText"), matching the
other file (distributed_config.py); replace "2026 NVIDIA CORPORATION &
AFFILIATES" with "2025 NVIDIA CORPORATION & AFFILIATES".

In `@bionemo-recipes/recipes/codonfm_native_te/tests/test_model.py`:
- Around line 135-154: The test reads a stale loss and is flaky due to
randomness; in test_loss_decreases ensure you recompute the loss after the final
optimizer.step by doing one more forward pass (call model(...) again) to obtain
final_loss, and remove randomness by fixing the RNG (torch.manual_seed(...)) at
the start of the test and switching the model to evaluation mode (model.eval())
before that final forward; reference the test function test_loss_decreases, the
model variable (CodonFMForMaskedLM), and outputs.loss to locate the changes.
- Around line 83-154: The tests in TestCodonFMForMaskedLM assume CUDA and will
fail on CPU; add a skip guard to the whole class by decorating
TestCodonFMForMaskedLM with @pytest.mark.skipif(not torch.cuda.is_available(),
reason="CUDA required for these tests") (ensure pytest and torch are imported
where the class is defined) so the entire class is skipped when CUDA is not
available instead of trying to call .cuda() or allocate CUDA tensors.

In `@bionemo-recipes/recipes/codonfm_native_te/tests/test_tokenizer.py`:
- Around line 96-132: Move the declaration of ptl_te_path outside/above the try
so it always exists for the finally cleanup (or guard the finally removal with a
check like "if 'ptl_te_path' in locals() and str(ptl_te_path) in sys.path");
also replace the hardcoded special-token cutoff "i >= 5" with a computed value
using the tokenizer's special-tokens length (e.g., use
len(CodonTokenizer.SPECIAL_TOKENS) or derive the cutoff from
PTLTokenizer/CodonTokenizer special token IDs) so the comparison uses a stable
symbol (CodonTokenizer.SPECIAL_TOKENS or PTLTokenizer special-tokens) instead of
the magic number.

In `@bionemo-recipes/recipes/codonfm_native_te/train_fsdp2.py`:
- Around line 52-57: The distributed process group and logger initialization in
the DistributedConfig block (DistributedConfig(),
torch.distributed.init_process_group, torch.cuda.set_device) must be torn down
in a finally block: wrap the setup + training/IO code in try/finally and in
finally call torch.distributed.destroy_process_group() to ensure the group is
always cleaned up, and also call your logger teardown method (e.g.,
logger.finish() or logger.close() — use the actual method used by your logging
instance) so the logger is always finished even on exceptions; keep the
DistributedConfig and device setup at the top of the try so the finally can
unconditionally clean up.
- Around line 168-212: The training loop can hang if a rank receives zero
batches; modify the loop around the for batch in train_dataloader to detect and
fail fast: introduce a local counter (e.g., batches_in_epoch = 0) before the
for, increment it inside the loop, and after the for (before incrementing epoch
and calling sampler.set_epoch) check if batches_in_epoch == 0 and raise a
RuntimeError with a clear message referencing step, epoch and
args.num_train_steps; update references around step, train_dataloader,
sampler.set_epoch and the outer while to ensure the error breaks the run instead
of spinning forever.
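The fail-fast guard can be sketched with the training body reduced to a step counter (the real loop does forward/backward/optimizer work and calls sampler.set_epoch where noted):

```python
def run_epochs(train_dataloader, num_train_steps: int):
    """Sketch: detect a rank that receives zero batches instead of spinning forever."""
    step, epoch = 0, 0
    while step < num_train_steps:
        batches_in_epoch = 0
        for batch in train_dataloader:
            batches_in_epoch += 1
            step += 1  # stand-in for the forward/backward/optimizer step
            if step >= num_train_steps:
                break
        if batches_in_epoch == 0:
            raise RuntimeError(
                f"Rank received zero batches at step {step}, epoch {epoch}; "
                f"cannot reach num_train_steps={num_train_steps}"
            )
        epoch += 1  # sampler.set_epoch(epoch) would go here
    return step, epoch
```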

---

Nitpick comments:
In `@bionemo-recipes/models/codonfm/tests/golden_values.json`:
- Line 1: Golden JSON file is a single long line; update the generator to emit
pretty-printed JSON for easier diffs and review. Modify
generate_golden_values.py (the code that writes the golden values) to call
json.dump/json.dumps with indent=2 (or similar) and ensure separators/default
serialization remain unchanged, then rewrite the output file
(golden_values.json) using that formatted output so subsequent runs produce
multi-line indented JSON.

In `@bionemo-recipes/recipes/codonfm_native_te/checkpoint.py`:
- Around line 193-197: The current flow after save_checkpoint_fsdp2 lets
non-main ranks continue while the main rank may delete old checkpoints via
prune_checkpoints(ckpt_path, max_checkpoints), risking a race if other ranks
access those files; modify the shutdown so that after the conditional
prune_checkpoints call (guarded by dist_config.is_main_process()) you add a
distributed synchronization (e.g., torch.distributed.barrier with the
appropriate process_group) so all ranks wait until pruning completes before
proceeding; update calls around save_checkpoint_fsdp2/prune_checkpoints and
ensure the barrier uses the same process group used for training to avoid
deadlocks.

In `@bionemo-recipes/recipes/codonfm_native_te/distributed_config.py`:
- Around line 36-40: The DistributedConfig dataclass currently uses
os.environ.setdefault in the default_factory for fields rank, local_rank,
world_size, _master_addr, and _master_port which mutates the process
environment; if you want read-only behavior change each default_factory to use
os.environ.get (e.g., replace setdefault with get) so instantiating
DistributedConfig does not modify environment variables while preserving the
same default values.
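The difference can be sketched with two fields (field names mirror the comment; the int conversion is an assumption about how the real dataclass parses env vars):

```python
import os
from dataclasses import dataclass, field


@dataclass
class DistributedConfigSketch:
    """Sketch: read env vars without mutating the process environment."""

    # os.environ.get leaves os.environ untouched; setdefault would write the
    # default back into the environment as a side effect of instantiation.
    rank: int = field(default_factory=lambda: int(os.environ.get("RANK", "0")))
    world_size: int = field(default_factory=lambda: int(os.environ.get("WORLD_SIZE", "1")))
```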

In `@bionemo-recipes/recipes/codonfm_native_te/fp8_debugging_stats.yaml`:
- Around line 7-23: The YAML has inconsistent indentation between the
LogFp8TensorStats block and the LogTensorStats block; update the indentation so
both top-level keys and their child mappings align consistently (e.g., make
LogTensorStats use the same indent width as LogFp8TensorStats), ensuring the
keys "LogFp8TensorStats", "tensors_struct", "LogTensorStats", "stats",
"tensors", and "freq" are vertically aligned for improved readability.

In `@bionemo-recipes/recipes/codonfm_native_te/requirements.txt`:
- Around line 1-12: Update the requirements handling to improve reproducibility:
pin minimum (or exact) versions for critical runtime packages such as torch and
transformer_engine (e.g., replace "torch" and "transformer_engine[pytorch]" with
constrained versions consistent with
bionemo-recipes/models/codonfm/requirements.txt), and move test-only
dependencies like pytest out of this runtime requirements list into a dev/test
requirements file or test extras in pyproject.toml; ensure
pandas/pyarrow/safetensors/torchmetrics/tqdm/wandb also have at least minimum
version constraints to avoid drift.

In `@bionemo-recipes/recipes/codonfm_native_te/scheduler.py`:
- Around line 19-43: Add type hints and a complete Google-style docstring to
get_linear_schedule_with_warmup: annotate parameters (optimizer:
torch.optim.Optimizer, num_warmup_steps: int, num_training_steps: int,
last_epoch: int) and the return type (torch.optim.lr_scheduler.LambdaLR), and
expand the existing docstring to include Args (describe each param), Returns
(describe the LambdaLR scheduler), and Raises (if any). Mention lr_lambda in the
docstring as the internal schedule function and that the returned object is an
instance of LambdaLR constructed with lr_lambda.
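The schedule itself is pure arithmetic, so the lr_lambda can be sketched without torch (the real function wraps this in torch.optim.lr_scheduler.LambdaLR; the multiplier scales the optimizer's base LR):

```python
def linear_schedule_lambda(num_warmup_steps: int, num_training_steps: int):
    """Sketch of the lr_lambda: linear warmup to 1.0, then linear decay to 0.0."""

    def lr_lambda(current_step: int) -> float:
        if current_step < num_warmup_steps:
            return current_step / max(1, num_warmup_steps)
        return max(
            0.0,
            (num_training_steps - current_step) / max(1, num_training_steps - num_warmup_steps),
        )

    return lr_lambda
```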

In `@bionemo-recipes/recipes/codonfm_native_te/tests/conftest.py`:
- Line 22: The test setup currently uses the private symbol
_mesh_resources.mesh_stack.clear() which can break across PyTorch versions;
modify the conftest cleanup to defensively handle missing or changed internals
by wrapping the call to _mesh_resources.mesh_stack.clear() in a try/except
(catch AttributeError) so tests won’t crash if mesh_stack or _mesh_resources
shape changes, and leave a brief comment noting this is a private API; keep
usage of init_device_mesh untouched but ensure any cleanup uses the guarded call
to avoid relying on non-public APIs.
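The guarded cleanup can be sketched as a small helper (the return value is instrumentation for this sketch; the real conftest would just swallow the AttributeError):

```python
def clear_mesh_stack(mesh_resources) -> bool:
    """Sketch: defensively clear a private PyTorch internal; report whether it ran."""
    try:
        # _mesh_resources.mesh_stack is a private API and may change across versions.
        mesh_resources.mesh_stack.clear()
        return True
    except AttributeError:
        return False
```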

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 105ed43a-d143-4b91-acbc-4df8273fd6ad

📥 Commits

Reviewing files that changed from the base of the PR and between f437be0 and 81fabd9.

⛔ Files ignored due to path filters (1)
  • bionemo-recipes/recipes/codonfm_native_te/train.parquet is excluded by !**/*.parquet
📒 Files selected for processing (35)
  • .gitignore
  • bionemo-recipes/models/codonfm/dataset.py
  • bionemo-recipes/models/codonfm/modeling_codonfm_te.py
  • bionemo-recipes/models/codonfm/requirements.txt
  • bionemo-recipes/models/codonfm/tests/common/README.md
  • bionemo-recipes/models/codonfm/tests/common/__init__.py
  • bionemo-recipes/models/codonfm/tests/common/fixtures.py
  • bionemo-recipes/models/codonfm/tests/common/test_modeling_common.py
  • bionemo-recipes/models/codonfm/tests/conftest.py
  • bionemo-recipes/models/codonfm/tests/generate_golden_values.py
  • bionemo-recipes/models/codonfm/tests/golden_state_dict.safetensors
  • bionemo-recipes/models/codonfm/tests/golden_values.json
  • bionemo-recipes/models/codonfm/tests/test_modeling_codonfm_te.py
  • bionemo-recipes/models/codonfm/tokenizer.py
  • bionemo-recipes/recipes/codonfm_native_te/checkpoint.py
  • bionemo-recipes/recipes/codonfm_native_te/dataset.py
  • bionemo-recipes/recipes/codonfm_native_te/distributed_config.py
  • bionemo-recipes/recipes/codonfm_native_te/fp4_debugging_stats.yaml
  • bionemo-recipes/recipes/codonfm_native_te/fp8_debugging_stats.yaml
  • bionemo-recipes/recipes/codonfm_native_te/hydra_config/L0_sanity.yaml
  • bionemo-recipes/recipes/codonfm_native_te/hydra_config/defaults.yaml
  • bionemo-recipes/recipes/codonfm_native_te/modeling_codonfm_te.py
  • bionemo-recipes/recipes/codonfm_native_te/perf_logger.py
  • bionemo-recipes/recipes/codonfm_native_te/quantization.py
  • bionemo-recipes/recipes/codonfm_native_te/requirements.txt
  • bionemo-recipes/recipes/codonfm_native_te/scheduler.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/__init__.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/conftest.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/test_model.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/test_quantization.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/test_tokenizer.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/test_train.py
  • bionemo-recipes/recipes/codonfm_native_te/tokenizer.py
  • bionemo-recipes/recipes/codonfm_native_te/train_fsdp2.py
  • ci/scripts/check_copied_files.py

Comment thread bionemo-recipes/models/codonfm/tests/common/test_modeling_common.py
Comment thread bionemo-recipes/models/codonfm/tests/common/test_modeling_common.py
Comment thread bionemo-recipes/models/codonfm/tests/common/test_modeling_common.py
Comment thread bionemo-recipes/models/codonfm/tests/common/test_modeling_common.py
Comment thread bionemo-recipes/models/codonfm/tests/conftest.py
Comment thread bionemo-recipes/recipes/codonfm_native_te/tests/test_model.py
Comment thread bionemo-recipes/recipes/codonfm_native_te/tests/test_model.py
Comment thread bionemo-recipes/recipes/codonfm_native_te/tests/test_tokenizer.py
Comment thread bionemo-recipes/recipes/codonfm_native_te/train_fsdp2.py
Comment thread bionemo-recipes/recipes/codonfm_native_te/train_fsdp2.py Outdated
@jomitchellnv
Collaborator Author

@coderabbitai resolve

@jomitchellnv
Collaborator Author

@coderabbitai re-review

@coderabbitai
Contributor

coderabbitai bot commented Mar 23, 2026

✅ Actions performed

Comments resolved. Auto-approval is disabled; enable reviews.request_changes_workflow to approve automatically.

@coderabbitai
Contributor

coderabbitai bot commented Mar 23, 2026

@jomitchellnv Sure! I'll kick off a full re-review of the PR now.

✅ Actions performed

Full review triggered.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 13

♻️ Duplicate comments (4)
bionemo-recipes/recipes/codonfm_native_te/tests/test_model.py (2)

83-96: ⚠️ Potential issue | 🟡 Minor

Add CUDA skip guard to the test class.

Every test in TestCodonFMForMaskedLM calls .cuda() unconditionally. On a CPU-only runner, this will cause hard failures instead of clean skips.

🛡️ Suggested fix
+@pytest.mark.skipif(not torch.cuda.is_available(), reason="CUDA required")
 class TestCodonFMForMaskedLM:
     @pytest.mark.parametrize("preset_name", list(MODEL_PRESETS.keys()))
     def test_forward_all_presets(self, preset_name):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/tests/test_model.py` around lines
83 - 96, The tests in TestCodonFMForMaskedLM (notably test_forward_all_presets)
call .cuda() unconditionally which will fail on CPU-only runners; add a CUDA
skip guard by marking the class or the test with pytest.mark.skipif(not
torch.cuda.is_available(), reason="requires CUDA") or conditionally skip inside
test using torch.cuda.is_available() so the test suite cleanly skips when CUDA
is unavailable.

135-154: ⚠️ Potential issue | 🟠 Major

Fix the loss comparison to use a fresh forward pass.

Line 153 reads the stale loss from the final training iteration before optimizer.step() was applied. Additionally, the test doesn't fix the RNG or disable dropout, making the assertion flaky.

🔧 Suggested fix
     def test_loss_decreases(self, small_config):
+        torch.manual_seed(0)
+        torch.cuda.manual_seed_all(0)
+        small_config.hidden_dropout_prob = 0.0
+        small_config.attention_probs_dropout_prob = 0.0
         model = CodonFMForMaskedLM(small_config).cuda()
         optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

         batch_size, seq_len = 4, 32
         input_ids = torch.randint(5, small_config.vocab_size, (batch_size, seq_len), device="cuda")
         attention_mask = torch.ones(batch_size, seq_len, dtype=torch.long, device="cuda")
         labels = input_ids.clone()

         initial_loss = None
         for _ in range(20):
             outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
             if initial_loss is None:
                 initial_loss = outputs.loss.item()
             outputs.loss.backward()
             optimizer.step()
             optimizer.zero_grad()

-        final_loss = outputs.loss.item()
+        model.eval()
+        with torch.no_grad():
+            final_loss = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels).loss.item()
         assert final_loss < initial_loss, f"Loss did not decrease: {initial_loss} -> {final_loss}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/tests/test_model.py` around lines
135 - 154, The test reads a stale loss from the last training iteration
(outputs.loss) and is flaky due to nondeterminism/dropout; update
test_loss_decreases to set torch.manual_seed(...) for reproducibility, run
training loop on CodonFMForMaskedLM as-is, then switch the model to eval() and
perform a fresh forward pass (e.g., with torch.no_grad()) using the same
input_ids/attention_mask/labels to compute final_loss for comparison; ensure
initial_loss is captured before any backward/step and final_loss is computed
from the new forward pass (not from the cached outputs), and keep
optimizer.step()/zero_grad() unchanged inside the loop.
bionemo-recipes/models/codonfm/tests/common/fixtures.py (1)

40-43: ⚠️ Potential issue | 🟡 Minor

Bind the probe socket to loopback only.

This helper only needs a local rendezvous port. Binding ("", 0) exposes it on all interfaces unnecessarily on shared CI hosts; ("127.0.0.1", 0) is sufficient.

Suggested fix
-        s.bind(("", 0))
+        s.bind(("127.0.0.1", 0))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/tests/common/fixtures.py` around lines 40 -
43, Change the helper that binds a probe socket (the with socket.socket(...)
block that returns s.getsockname()[1]) to bind to loopback only by replacing
s.bind(("", 0)) with s.bind(("127.0.0.1", 0)) so the rendezvous port is only
exposed locally instead of on all interfaces.
bionemo-recipes/recipes/codonfm_native_te/train_fsdp2.py (1)

52-60: ⚠️ Potential issue | 🟠 Major

Start the try before initializing CUDA/distributed state.

init_process_group() and torch.cuda.set_device() still execute before the finally. If anything in that block throws after the process group is created, the job leaks a live NCCL group.

Suggested structure
-    dist_config = DistributedConfig()
-    logger.info("Initializing distributed training: %s", dist_config)
-    device = torch.device(f"cuda:{dist_config.local_rank}")
-    torch.distributed.init_process_group(backend="nccl", device_id=device)
-    torch.cuda.set_device(dist_config.local_rank)
-
-    perf_logger = None
-    try:
+    perf_logger = None
+    try:
+        dist_config = DistributedConfig()
+        logger.info("Initializing distributed training: %s", dist_config)
+        device = torch.device(f"cuda:{dist_config.local_rank}")
+        torch.distributed.init_process_group(backend="nccl", device_id=device)
+        torch.cuda.set_device(dist_config.local_rank)
         ...
     finally:
         if perf_logger is not None:
             perf_logger.finish()
-        torch.distributed.destroy_process_group()
+        if torch.distributed.is_initialized():
+            torch.distributed.destroy_process_group()

Also applies to: 234-237

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/train_fsdp2.py` around lines 52 -
60, The distributed/CUDA init must be wrapped in the try so the finally can
clean up a partially-created NCCL group; move the try: to start before creating
DistributedConfig and calling torch.distributed.init_process_group and
torch.cuda.set_device (i.e., ensure DistributedConfig(),
torch.distributed.init_process_group(backend="nccl", device_id=device),
torch.cuda.set_device(dist_config.local_rank) are all inside the try) and keep
the finally logic to destroy the process group; make the same change for the
second initialization block that also creates DistributedConfig and calls
init_process_group.
🧹 Nitpick comments (14)
bionemo-recipes/recipes/codonfm_native_te/tokenizer.py (1)

19-125: Avoid a second independent CodonTokenizer implementation.

This class encodes checkpoint-critical vocab and special-token ordering, but the model package already has a CodonTokenizer. If the two drift even slightly, the recipe can silently train or load against the wrong token IDs. Please reuse the model tokenizer here, or add a direct equivalence test that locks the two implementations together.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/tokenizer.py` around lines 19 -
125, This file introduces a second, divergent CodonTokenizer implementation
(class CodonTokenizer, methods tokenize/encode/decode, SPECIAL_TOKENS) which can
drift from the model's canonical tokenizer; either remove this duplicate and
import and use the model's tokenizer (e.g., replace the class with an import of
the model package's CodonTokenizer and update call sites to use that single
symbol), or if import is infeasible add a unit test that asserts exact
equivalence between the local implementation and the model tokenizer (compare
SPECIAL_TOKENS, vocab_size, encoder keys/values, decoder mappings and the
results of tokenize/encode/decode on representative sequences) so the two remain
locked together.
bionemo-recipes/recipes/codonfm_native_te/tests/conftest.py (1)

37-44: Make the NCCL/device-mesh setup opt-in.

With autouse=True, every test in this package initializes distributed CUDA state via init_process_group() and cuda.set_device(), even tests that never use it. Tests like test_tokenizer.py, test_model.py, test_quantization.py, and test_train.py don't reference the fixture, forcing them to depend on GPU and NCCL availability unnecessarily.

Remove autouse=True to make the fixture opt-in. Tests that need the mocked distributed functions (patched at lines 46-50) can explicitly request the fixture by adding device_mesh as a parameter.

♻️ Remove autouse to make fixture opt-in
-@pytest.fixture(scope="session", autouse=True)
+@pytest.fixture(scope="session")
 def device_mesh():
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/tests/conftest.py` around lines 37
- 44, The device_mesh fixture currently runs for every test due to autouse=True
which forces NCCL/CUDA initialization; remove autouse=True from the
pytest.fixture decorator on the device_mesh function and keep the existing setup
calls (DistributedConfig(), torch.distributed.init_process_group,
torch.cuda.set_device, init_device_mesh) intact so it only runs when tests
explicitly request the fixture by adding device_mesh as a parameter (update any
tests that need the distributed environment to include device_mesh in their
function signature).
bionemo-recipes/recipes/codonfm_native_te/modeling_codonfm_te.py (2)

317-318: THD format squeeze assumes batch dimension of 1.

The squeeze operation hidden_states.squeeze(0) only triggers when hidden_states.size(0) == 1. This is correct for THD format but the assumption should be documented or asserted.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/modeling_codonfm_te.py` around
lines 317 - 318, The code assumes THD attention input has a singleton batch dim
when doing hidden_states.squeeze(0) inside the block checking
self.config.attn_input_format == "thd"; add an explicit assertion or documented
guard before squeezing (e.g., assert hidden_states.size(0) == 1 or raise a clear
error) and/or a short comment explaining the THD layout expectation so the
squeeze on hidden_states in modeling_codonfm_te.py is safe and self-explanatory.

212-222: Inconsistent error handling for FP8 vs FP4 without layer_precision.

When layer_precision is None with only fp8_recipe, the code auto-populates it with ["fp8"] * num_layers and issues a warning. However, when only fp4_recipe is provided, it raises a RuntimeError. This asymmetry may be confusing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/modeling_codonfm_te.py` around
lines 212 - 222, The branch handling self.config.layer_precision is
inconsistent: when fp8_recipe is provided it warns and auto-sets
self.config.layer_precision = ["fp8"] * self.config.num_hidden_layers, but when
only fp4_recipe is provided it raises a RuntimeError. Make the behavior
consistent by changing the fp4_recipe branch to mirror the fp8 branch—emit a
UserWarning and auto-populate self.config.layer_precision = ["fp4"] *
self.config.num_hidden_layers (or alternatively make both branches raise);
update the code paths around fp8_recipe, fp4_recipe, and
self.config.layer_precision accordingly so the logic is symmetric.
bionemo-recipes/recipes/codonfm_native_te/checkpoint.py (2)

188-191: Return value mismatch between sync and async save.

dcp_save returns None while dcp_async_save returns a Future. Assigning the result unconditionally to _ckpt_futures["fsdp2"] means the sync path stores None, which works but is semantically unclear.

🔧 Proposed clarification
     ckpt_save_func = dcp_async_save if async_save else dcp_save
-    _ckpt_futures["fsdp2"] = ckpt_save_func(state_dict, checkpoint_id=checkpoint_path, process_group=process_group)
+    result = ckpt_save_func(state_dict, checkpoint_id=checkpoint_path, process_group=process_group)
+    if async_save:
+        _ckpt_futures["fsdp2"] = result
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/checkpoint.py` around lines 188 -
191, The sync and async save functions return different types but the code
unconditionally assigns their return to _ckpt_futures["fsdp2"]; update the logic
around ckpt_save_func(state_dict, checkpoint_id=checkpoint_path,
process_group=process_group) so that when using dcp_async_save you store the
returned Future in _ckpt_futures["fsdp2"], and when using dcp_save you either
set _ckpt_futures["fsdp2"] to None (or remove the key) to make the semantic
difference explicit; locate the assignment around state_dict, ckpt_save_func,
dcp_async_save, dcp_save and change it to a conditional that only assigns the
Future for the async path.

42-42: Global mutable state for checkpoint futures could cause issues.

The module-level _ckpt_futures dict works for single-checkpoint-type scenarios but may cause subtle bugs if extended to track multiple checkpoint types concurrently. Consider documenting this limitation or using a more structured approach.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/checkpoint.py` at line 42, The
module-level mutable _ckpt_futures dict can cause cross-type collisions; replace
it with a structured manager or namespaced mapping and update call sites to use
it: create a CheckpointManager (or change _ckpt_futures to a dict-of-dicts keyed
by checkpoint type) with methods like get_future(checkpoint_type, key) and
set_future(checkpoint_type, key, future), then refactor any functions that
reference _ckpt_futures to call these methods (or pass an explicit manager
instance) so futures are isolated per checkpoint type; alternatively, if you
keep the module-level variable, add clear documentation and a typed alias to
signal the limitation.
bionemo-recipes/recipes/codonfm_native_te/dataset.py (1)

131-138: MLM masking uses unseeded random.random(), causing non-deterministic results across workers and epochs.

The collator uses the global random module without seeding, so different DataLoader workers and different epochs will produce different masks for the same samples. This may be intentional for training diversity but makes debugging and reproducibility harder.

🔧 Proposed fix: Pass and use a generator
 class CodonMLMCollator:
     def __init__(
         self,
         tokenizer: CodonTokenizer,
         max_seq_length: int = 512,
         mlm_probability: float = 0.15,
+        seed: int | None = None,
     ):
         self.tokenizer = tokenizer
         self.max_seq_length = max_seq_length
         self.mlm_probability = mlm_probability
+        self.rng = random.Random(seed)

     def __call__(self, batch: list[dict[str, str]]) -> dict[str, torch.Tensor]:
         # ... in the masking loop:
-                if random.random() < self.mlm_probability:
+                if self.rng.random() < self.mlm_probability:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/dataset.py` around lines 131 - 138,
The MLM masking uses the global random module (random.random()/random.randint())
which causes non-determinism; modify the collator/dataset to accept and use a
passed RNG (e.g., an instance named rng or self.rng) instead of the global
random, and replace calls random.random() and random.randint(...) with
rng.random() and rng.randint(5, self.tokenizer.vocab_size - 1); ensure the RNG
is seeded per worker (using torch.utils.data.get_worker_info or by creating
per-worker rng seeds) so that mlm_probability, ids assignment,
tokenizer.mask_token_id and labels behavior are reproducible across workers and
epochs.
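The per-worker seeding this prompt describes can be sketched without torch; in the real collator, `worker_id` would come from `torch.utils.data.get_worker_info()` (returning 0 in the main process), and the helper name here is illustrative:

```python
import random


def per_worker_rng(base_seed: int, worker_id: int) -> random.Random:
    """Return a private RNG for MLM masking: deterministic for a given
    (seed, worker) pair, but different across workers, so masks are
    reproducible without every worker producing identical batches."""
    return random.Random(base_seed + worker_id)
```

The collator would then call `self.rng.random()` and `self.rng.randint(...)` instead of the module-level `random` functions.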
bionemo-recipes/recipes/codonfm_native_te/README.md (1)

49-66: Add language specifier to fenced code block.

The directory structure code block is missing a language identifier. Either `text` or `plaintext` would work; adding one improves markdown rendering consistency.

♻️ Suggested fix
-```
+```text
 codonfm_native_te/
 ├── modeling_codonfm_te.py    — HF-compatible CodonFM model with TE layers
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/README.md` around lines 49 - 66,
Edit the fenced code block in README.md that displays the codonfm_native_te/
directory tree (the block containing lines like "codonfm_native_te/" and
"modeling_codonfm_te.py") and add a language specifier (e.g., change ``` to
```text or ```plaintext) so the block is rendered as plain text.
bionemo-recipes/recipes/codonfm_native_te/scheduler.py (1)

19-24: Consider adding type hints for the function signature.

The function parameters lack type annotations, which would improve readability and enable better static analysis.

♻️ Suggested improvement
+from torch.optim import Optimizer
+
 def get_linear_schedule_with_warmup(
-    optimizer,
+    optimizer: Optimizer,
     num_warmup_steps: int = 2_000,
     num_training_steps: int = 500_000,
     last_epoch: int = -1,
-):
+) -> LambdaLR:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/scheduler.py` around lines 19 - 24,
Add type hints to get_linear_schedule_with_warmup by annotating parameters and
return type: change the signature to include optimizer: Optimizer,
num_warmup_steps: int = 2_000, num_training_steps: int = 500_000, last_epoch:
int = -1 and a return type of _LRScheduler (or LambdaLR) and import Optimizer
and _LRScheduler (or LambdaLR) from torch.optim / torch.optim.lr_scheduler;
update any internal references if needed to satisfy the new annotations for
get_linear_schedule_with_warmup.
bionemo-recipes/models/codonfm/modeling_codonfm_te.py (2)

206-216: Asymmetric handling of FP8 vs FP4 when no layer_precision is provided.

When layer_precision is None:

  • FP8 recipe alone → auto-populates layer_precision with ["fp8"] * num_layers (Line 211)
  • FP4 recipe alone → raises RuntimeError (Lines 213-216)

This asymmetry could be confusing. Consider either making FP4 behave consistently with FP8 (auto-populate), or documenting why FP4 requires explicit layer specification.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/modeling_codonfm_te.py` around lines 206 -
216, The code currently auto-populates self.config.layer_precision when
fp8_recipe is present but raises for fp4_recipe; change this to handle FP4
symmetrically by warning and setting self.config.layer_precision = ["fp4"] *
self.config.num_hidden_layers when fp4_recipe is provided and fp8_recipe is None
(use the same warnings.warn pattern as for fp8), or alternatively add a clear
comment explaining why FP4 requires explicit layer_precision; update the
conditional branch around fp8_recipe/fp4_recipe in the layer-precision
initialization (referencing self.config.layer_precision, fp8_recipe, fp4_recipe,
and self.config.num_hidden_layers) accordingly.
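The symmetric branch can be sketched with plain Python stand-ins for the config values and TE recipe objects (the function name and signature are illustrative; the real code reads these from `self.config`):

```python
import warnings


def resolve_layer_precision(layer_precision, fp8_recipe, fp4_recipe, num_hidden_layers):
    """Auto-populate layer_precision for FP8 *and* FP4 when it is unset,
    instead of raising only in the FP4 case."""
    if layer_precision is not None:
        return layer_precision
    if fp8_recipe is not None:
        warnings.warn("layer_precision not set; defaulting all layers to fp8.")
        return ["fp8"] * num_hidden_layers
    if fp4_recipe is not None:
        # Mirror the FP8 behavior rather than raising RuntimeError.
        warnings.warn("layer_precision not set; defaulting all layers to fp4.")
        return ["fp4"] * num_hidden_layers
    return None
```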

251-291: Complex nested context managers for layer-wise precision.

The get_autocast_context method handles multiple scenarios (init vs forward, outer vs per-layer, fp8 vs fp4 vs bf16). While functional, the logic is intricate:

  • outer=True wraps the entire encoder stack for FP8 only
  • init=True with use_quantized_model_init applies quantized_model_init
  • Per-layer forward uses te.autocast with the appropriate recipe

The multiple conditional branches could benefit from inline comments explaining the control flow for maintainability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/modeling_codonfm_te.py` around lines 251 -
291, Add concise inline comments inside get_autocast_context to document each
control path: note the early return when self.config.layer_precision is None;
explain the outer branch (outer True) only enables FP8 autocast and warns when
self._fp8_recipe is missing before calling transformer_engine.pytorch.autocast;
annotate the init branch (init True and self.config.use_quantized_model_init) to
state it uses transformer_engine.pytorch.quantized_model_init for FP8/FP4 and
raises if FP4 recipe (_fp4_recipe) is missing; and comment the per-layer forward
branches that choose between FP8/FP4 autocast using recipe variables
(_fp8_recipe/_fp4_recipe) or return a disabled autocast by default.
bionemo-recipes/models/codonfm/tests/test_modeling_codonfm_te.py (1)

149-170: Same masking randomness issue as in generate_golden_values.py.

torch.manual_seed(42) is called inside the loop at Line 156 for each sample, causing identical masking probabilities across samples at the same positions. This matches the generator script behavior, so the tests should pass, but it's worth noting the limitation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/tests/test_modeling_codonfm_te.py` around
lines 149 - 170, The test currently calls torch.manual_seed(42) inside the
per-sample loop, causing each sample to use the same RNG state and produce
identical mask positions; move the torch.manual_seed(42) call out of the loop
(place it once before iterating over encoded) so masking remains deterministic
across test runs but varies between samples (reference variables:
torch.manual_seed, encoded loop, ids/labels generation; related file
generate_golden_values.py).
bionemo-recipes/recipes/codonfm_native_te/quantization.py (1)

101-109: Temporary file is not cleaned up.

The tempfile.NamedTemporaryFile is created with delete=False, but the file is never cleaned up. This could lead to temporary file accumulation over multiple runs.

Consider returning a cleanup callback or documenting the caller's responsibility.

The caller (initialize_quant_stats_logging) passes this path to debug_api.initialize(), so the file needs to persist during the debug session. Document that cleanup is the caller's responsibility, or return a context manager/cleanup function.

Alternatively, if the temp file should persist for the duration of the training run, this is acceptable but should be documented.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/quantization.py` around lines 101 -
109, The temporary file created with tempfile.NamedTemporaryFile(delete=False)
(temp_file) in the quantization function is never removed; update the function
(the one that returns temp_file.name and is used by
initialize_quant_stats_logging -> debug_api.initialize) to either (A) return a
paired cleanup callback or context-manager that deletes the file when the caller
is done, or (B) document clearly that the caller is responsible for removing the
returned path; implement option (A) by returning a small function like cleanup()
that unlinks temp_file.name (or a contextmanager that yields the path and
unlinks on __exit__), and keep logger lines as-is to preserve debug output.
bionemo-recipes/models/codonfm/tests/generate_golden_values.py (1)

111-135: Masking randomness is not deterministic per-sample due to repeated seeding.

torch.manual_seed(SEED) is called inside the loop at Line 122 for each sample. This resets the RNG state before processing each sequence, meaning the masking pattern depends only on position, not on which sample is being processed. While this may be intentional for reproducibility, it creates identical masking probabilities for all samples at the same position indices.

If truly independent-but-reproducible masking per sample is desired:

     for enc in encoded:
         pad_len = max_len - len(enc)
         ids = enc + [tokenizer.pad_token_id] * pad_len
         mask = [1] * len(enc) + [0] * pad_len
         lbl = [-100] * max_len
-        torch.manual_seed(SEED)
+        torch.manual_seed(SEED + len(input_ids))  # per-sample seed
         for i in range(1, len(enc) - 1):

However, if this behavior is intentional for exact reproducibility with the test file, no change is needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/tests/generate_golden_values.py` around lines
111 - 135, The prepare_test_inputs function currently calls
torch.manual_seed(SEED) inside the per-sample loop which resets RNG each
iteration and yields identical position-wise masking across samples; move the
seeding so it produces the intended behavior: either seed once before the loop
(keep torch.manual_seed(SEED) above the for enc in encoded: line) to get
reproducible but varied masks across samples, or implement per-sample
deterministic seeds (e.g., torch.manual_seed(SEED + i) or use a per-sample
torch.Generator seeded from SEED) so each sample has independent yet
reproducible masking; update the code around torch.manual_seed(SEED) and the
masking loop in prepare_test_inputs accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bionemo-recipes/models/codonfm/dataset.py`:
- Around line 125-139: The MLM masking loop uses the global random module
(random.random(), random.randint()) which makes masks non-reproducible across
DataLoader workers; fix by switching to a per-instance RNG seeded from a
deterministic seed (and worker id) and use that RNG inside the loop. Concretely:
add a RNG attribute (e.g., self.rng = random.Random(base_seed) or create a
torch.Generator seeded from base_seed in the dataset/collator __init__), derive
a per-worker seed via torch.utils.data.get_worker_info() when the
collator/dataset is used in a worker, and replace calls to random.random() and
random.randint(...) in the MLM masking block (the code referencing
self.mlm_probability, tokenizer.mask_token_id, tokenizer.vocab_size, labels,
ids) with self.rng.random() and self.rng.randint(...) (or generator-based
equivalents) so masking is deterministic and reproducible across runs and
workers.

In `@bionemo-recipes/models/codonfm/modeling_codonfm_te.py`:
- Around line 432-470: In _magneto_init_weights, the name-based Q/K detection
("query" or "key" in name) misses fused TE QKV modules (e.g. self_attention.qkv)
so their weights get the scaled MAGNETO gain instead of 1.0; update the
detection used when computing is_qk in the loop over named_modules() (in the
_magneto_init_weights method) to also treat "qkv" module names as Q/K (for
example add a check for "qkv" in name) so that the fused QKV module and its
fused weight attributes are initialized with w_gain=1.0.
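The widened Q/K detection can be sketched as a small helper (name and signature are illustrative; the real method iterates `named_modules()` and applies `xavier_normal_` with the selected gain):

```python
def magneto_weight_gain(module_name: str, scaled_gain: float) -> float:
    """MAGNETO uses gain 1.0 for query/key projections and a scaled gain
    elsewhere; fused TE attention exposes Q, K and V as one
    "self_attention.qkv" module, so "qkv" must count as Q/K too."""
    is_qk = any(key in module_name for key in ("query", "key", "qkv"))
    return 1.0 if is_qk else scaled_gain
```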

In `@bionemo-recipes/models/codonfm/requirements.txt`:
- Around line 1-3: The requirements file pins a non-existent minimum for
transformer_engine (transformer_engine>=1.14.0); change this to a valid released
minimum such as transformer_engine>=2.1.0 (or >=1.13.0 if targeting 1.x) in the
requirements.txt entry for transformer_engine, and confirm the other entries
torch>=2.6.0 and transformers>=4.48.0 are correct for the codebase (especially
if your code uses FP8/FP4/QuantizedTensor features prefer >=2.1.0).

In `@bionemo-recipes/models/codonfm/tests/common/__init__.py`:
- Around line 33-41: Update the docstring example to reference CodonFM classes
instead of ESM2: rename the example class ESM2ModelTester to CodonFMModelTester
and change the returned class from NVEsmForMaskedLM to CodonFMForMaskedLM in the
get_model_class implementation so the sample reflects CodonFM test utilities.

In `@bionemo-recipes/models/codonfm/tests/common/fixtures.py`:
- Around line 28-30: The fixture currently sets NVTE_DEBUG after importing
transformer_engine.pytorch so the setting is ineffective; change the fixture to
set os.environ['NVTE_DEBUG'] (use monkeypatch.setenv in tests) before importing
transformer_engine modules and then reload the affected module(s) with
importlib.reload (e.g., reload transformer_engine.pytorch or recipe_module) so
the new env var takes effect; update the copied source in
bionemo-recipes/models/esm2/tests/common/fixtures.py to match this pattern used
in conftest.py (set env var via monkeypatch, then import/reload
transformer_engine modules).
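The import-time snapshot problem can be demonstrated without transformer_engine, using a throwaway module that reads the env var at import time (module name and layout are illustrative):

```python
import importlib
import os
import sys
import tempfile

# A stand-in for transformer_engine: it snapshots NVTE_DEBUG at import time,
# which is why setting the variable after the import has no effect.
os.environ.pop("NVTE_DEBUG", None)
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "fake_te.py"), "w") as f:
    f.write('import os\nDEBUG = os.environ.get("NVTE_DEBUG", "0")\n')
sys.path.insert(0, tmpdir)

import fake_te  # imported before the env var is set

before = fake_te.DEBUG          # "0": the snapshot is taken here
os.environ["NVTE_DEBUG"] = "1"
still_stale = fake_te.DEBUG     # still "0": mutating os.environ changed nothing
importlib.reload(fake_te)       # reload after setting the var...
after = fake_te.DEBUG           # ...and the new value is picked up
```

In the fixture, this corresponds to `monkeypatch.setenv("NVTE_DEBUG", ...)` followed by `importlib.reload(transformer_engine.pytorch)`.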

In `@bionemo-recipes/recipes/codonfm_native_te/checkpoint.py`:
- Around line 80-91: prune_checkpoints currently deletes older checkpoint dirs
and can race with other ranks that may still be writing; update the caller or
prune_checkpoints to synchronize before removing files: if a distributed backend
is initialized (e.g., torch.distributed.is_initialized()), call
torch.distributed.barrier() (or your project's sync_barrier wrapper) so all
ranks finish writing before running prune_checkpoints, then proceed to
list/sort/delete; alternatively, add a clear docstring to prune_checkpoints and
its call site noting it must only run after a global barrier. Ensure you
reference the prune_checkpoints function and the place where it is invoked so
reviewers can find and apply the barrier.
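The prune step itself can be sketched as follows (the `step_<N>` naming scheme and `keep_last` parameter are illustrative); in the recipe, a `torch.distributed.barrier()` guarded by `torch.distributed.is_initialized()` would run immediately before this call:

```python
import shutil
from pathlib import Path


def prune_checkpoints(ckpt_dir: Path, keep_last: int = 3) -> list[Path]:
    """Delete all but the newest keep_last checkpoint directories.

    Must only run after a global barrier, so no rank is still writing
    into the directories being removed.
    """
    checkpoints = sorted(
        (p for p in ckpt_dir.iterdir() if p.is_dir() and p.name.startswith("step_")),
        key=lambda p: int(p.name.split("_")[1]),
    )
    stale = checkpoints[:-keep_last] if keep_last else checkpoints
    for path in stale:
        shutil.rmtree(path)
    return stale
```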

In `@bionemo-recipes/recipes/codonfm_native_te/distributed_config.py`:
- Around line 36-40: The field default factories in DistributedConfig (rank,
local_rank, world_size, _master_addr, _master_port) currently call
os.environ.setdefault(...) which mutates the process environment; change each
default_factory to use os.getenv(...) (e.g., os.getenv("RANK", "0")) so the
defaults are read without writing to os.environ, preserving process/test
isolation and avoiding leaking defaults into child processes—do this for the
lambda bodies that initialize rank, local_rank, world_size, _master_addr, and
_master_port.
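The `os.getenv`-based defaults can be sketched as (field names follow the review comment; types and defaults are illustrative):

```python
import os
from dataclasses import dataclass, field


@dataclass
class DistributedConfig:
    """Read rank/world-size defaults without mutating os.environ.

    os.environ.setdefault("RANK", "0") would write "0" back into the process
    environment (leaking into child processes and other tests); os.getenv
    only reads.
    """

    rank: int = field(default_factory=lambda: int(os.getenv("RANK", "0")))
    local_rank: int = field(default_factory=lambda: int(os.getenv("LOCAL_RANK", "0")))
    world_size: int = field(default_factory=lambda: int(os.getenv("WORLD_SIZE", "1")))
```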

In `@bionemo-recipes/recipes/codonfm_native_te/hydra_config/defaults.yaml`:
- Around line 41-44: Replace the hard-coded scheduler horizon in
lr_scheduler_kwargs (num_training_steps: 500_000) so it derives from the
top-level training steps value (e.g., num_train_steps / args.num_train_steps)
instead of a fixed constant; update the defaults.yaml to reference the
global/train config variable for num_train_steps and ensure num_warmup_steps
remains relative to that horizon if needed so the scheduler matches the actual
training loop.

In `@bionemo-recipes/recipes/codonfm_native_te/requirements.txt`:
- Around line 1-12: requirements.txt is missing the transformers dependency
(imported in modeling_codonfm_te.py) and lacks version constraints; update
bionemo-recipes/recipes/codonfm_native_te/requirements.txt to include a
transformers entry and add >= style version bounds for key packages (e.g.,
transformers, torch, pandas, pyarrow, hydra-core, etc.) consistent with
models/codonfm/requirements.txt so installs are reproducible—ensure the package
name is "transformers" and use the same minimum versions used in models/codonfm
to keep compatibility.

In `@bionemo-recipes/recipes/codonfm_native_te/tests/conftest.py`:
- Around line 46-49: The existing mocks don't intercept the directly imported
init_device_mesh used in train_fsdp2 (imported as from
torch.distributed.device_mesh import init_device_mesh), so add a patch for the
recipe module's namespace
(mock.patch("codonfm_native_te.train_fsdp2.init_device_mesh",
return_value=device_mesh)) to the with(...) context alongside the existing
patches; update the mocked targets in conftest.py to include this new patch so
calls inside train_fsdp2 use the mocked device_mesh.

In `@bionemo-recipes/recipes/codonfm_native_te/tests/test_train.py`:
- Around line 32-110: The tests currently inherit a fixed MASTER_PORT from
DistributedConfig which can cause cross-test rendezvous collisions; add a
per-test fixture that uses pytest's unused_tcp_port (or similar) to pick an
unused port and sets MASTER_PORT (and optionally MASTER_ADDR) in the test
environment before calling main_fsdp2. Modify conftest.py to include a
function-scoped fixture (used by tests like test_sanity_convergence_fsdp2,
test_sanity_convergence_fsdp2_no_meta_device, test_sanity_convergence_fsdp2_thd,
test_sanity_convergence_fsdp2_fp8,
test_sanity_convergence_fsdp2_fp32_master_weights) that monkeypatches
os.environ["MASTER_PORT"]=str(unused_tcp_port) (and
os.environ["MASTER_ADDR"]="127.0.0.1") so each test gets a unique rendezvous
port instead of relying on the session-scoped init_process_group mock.
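A loopback-only port probe in the spirit of pytest's `unused_tcp_port` fixture might look like this (function name is illustrative); the fixture would then monkeypatch `os.environ["MASTER_PORT"] = str(find_unused_tcp_port())` before calling `main_fsdp2`:

```python
import socket


def find_unused_tcp_port() -> int:
    """Ask the OS for an ephemeral port by binding to port 0 on loopback.

    Binding to "127.0.0.1" rather than "" keeps the probe socket off
    non-local interfaces (see the fixtures.py comment below).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```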

In `@bionemo-recipes/recipes/codonfm_native_te/train_fsdp2.py`:
- Line 233: The function main() currently returns perf_logger.min_loss which is
a tensor; change the return to a Python float or None by converting the tensor
to a scalar (e.g. use perf_logger.min_loss.item() and wrap with float(...))
before returning so callers receive a CPU Python float consistent with the
declared return type float | None; update the return statement in main() that
references perf_logger.min_loss accordingly.
- Around line 82-92: The debug session started by debug_api.initialize() inside
initialize_quant_stats_logging() must be torn down even if an exception occurs
before PerfLogger is constructed; modify the code so debug_api.end_debug() is
called independently of PerfLogger.finish() — either move debug initialization
to occur after PerfLogger is created, or wrap the region between
initialize_quant_stats_logging(...) and perf_logger = PerfLogger(...) in a
try/finally (or register debug_api.end_debug with atexit) and call
debug_api.end_debug() in the finally/cleanup path; reference
initialize_quant_stats_logging, debug_api.initialize, debug_api.end_debug, and
PerfLogger.finish/PerfLogger to locate and implement the change.
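The try/finally ordering can be sketched with stand-in callables for `debug_api.initialize`, `debug_api.end_debug`, and the region that constructs `PerfLogger` and runs training:

```python
def run_with_debug_session(start_debug, end_debug, body):
    """Guarantee the debug session is torn down even if body() raises
    before PerfLogger exists; teardown no longer depends on reaching
    PerfLogger.finish()."""
    start_debug()
    try:
        return body()
    finally:
        # Runs on success and on any exception raised inside body().
        end_debug()
```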

---

Duplicate comments:
In `@bionemo-recipes/models/codonfm/tests/common/fixtures.py`:
- Around line 40-43: Change the helper that binds a probe socket (the with
socket.socket(...) block that returns s.getsockname()[1]) to bind to loopback
only by replacing s.bind(("", 0)) with s.bind(("127.0.0.1", 0)) so the
rendezvous port is only exposed locally instead of on all interfaces.

In `@bionemo-recipes/recipes/codonfm_native_te/tests/test_model.py`:
- Around line 83-96: The tests in TestCodonFMForMaskedLM (notably
test_forward_all_presets) call .cuda() unconditionally which will fail on
CPU-only runners; add a CUDA skip guard by marking the class or the test with
pytest.mark.skipif(not torch.cuda.is_available(), reason="requires CUDA") or
conditionally skip inside test using torch.cuda.is_available() so the test suite
cleanly skips when CUDA is unavailable.
- Around line 135-154: The test reads a stale loss from the last training
iteration (outputs.loss) and is flaky due to nondeterminism/dropout; update
test_loss_decreases to set torch.manual_seed(...) for reproducibility, run
training loop on CodonFMForMaskedLM as-is, then switch the model to eval() and
perform a fresh forward pass (e.g., with torch.no_grad()) using the same
input_ids/attention_mask/labels to compute final_loss for comparison; ensure
initial_loss is captured before any backward/step and final_loss is computed
from the new forward pass (not from the cached outputs), and keep
optimizer.step()/zero_grad() unchanged inside the loop.

In `@bionemo-recipes/recipes/codonfm_native_te/train_fsdp2.py`:
- Around line 52-60: The distributed/CUDA init must be wrapped in the try so the
finally can clean up a partially-created NCCL group; move the try: to start
before creating DistributedConfig and calling
torch.distributed.init_process_group and torch.cuda.set_device (i.e., ensure
DistributedConfig(), torch.distributed.init_process_group(backend="nccl",
device_id=device), torch.cuda.set_device(dist_config.local_rank) are all inside
the try) and keep the finally logic to destroy the process group; make the same
change for the second initialization block that also creates DistributedConfig
and calls init_process_group.

---

Nitpick comments:
In `@bionemo-recipes/models/codonfm/modeling_codonfm_te.py`:
- Around line 206-216: The code currently auto-populates
self.config.layer_precision when fp8_recipe is present but raises for
fp4_recipe; change this to handle FP4 symmetrically by warning and setting
self.config.layer_precision = ["fp4"] * self.config.num_hidden_layers when
fp4_recipe is provided and fp8_recipe is None (use the same warnings.warn
pattern as for fp8), or alternatively add a clear comment explaining why FP4
requires explicit layer_precision; update the conditional branch around
fp8_recipe/fp4_recipe in the layer-precision initialization (referencing
self.config.layer_precision, fp8_recipe, fp4_recipe, and
self.config.num_hidden_layers) accordingly.
- Around line 251-291: Add concise inline comments inside get_autocast_context
to document each control path: note the early return when
self.config.layer_precision is None; explain the outer branch (outer True) only
enables FP8 autocast and warns when self._fp8_recipe is missing before calling
transformer_engine.pytorch.autocast; annotate the init branch (init True and
self.config.use_quantized_model_init) to state it uses
transformer_engine.pytorch.quantized_model_init for FP8/FP4 and raises if FP4
recipe (_fp4_recipe) is missing; and comment the per-layer forward branches that
choose between FP8/FP4 autocast using recipe variables (_fp8_recipe/_fp4_recipe)
or return a disabled autocast by default.

In `@bionemo-recipes/models/codonfm/tests/generate_golden_values.py`:
- Around line 111-135: The prepare_test_inputs function currently calls
torch.manual_seed(SEED) inside the per-sample loop which resets RNG each
iteration and yields identical position-wise masking across samples; move the
seeding so it produces the intended behavior: either seed once before the loop
(keep torch.manual_seed(SEED) above the for enc in encoded: line) to get
reproducible but varied masks across samples, or implement per-sample
deterministic seeds (e.g., torch.manual_seed(SEED + i) or use a per-sample
torch.Generator seeded from SEED) so each sample has independent yet
reproducible masking; update the code around torch.manual_seed(SEED) and the
masking loop in prepare_test_inputs accordingly.

In `@bionemo-recipes/models/codonfm/tests/test_modeling_codonfm_te.py`:
- Around line 149-170: The test currently calls torch.manual_seed(42) inside the
per-sample loop, causing each sample to use the same RNG state and produce
identical mask positions; move the torch.manual_seed(42) call out of the loop
(place it once before iterating over encoded) so masking remains deterministic
across test runs but varies between samples (reference variables:
torch.manual_seed, encoded loop, ids/labels generation; related file
generate_golden_values.py).

In `@bionemo-recipes/recipes/codonfm_native_te/checkpoint.py`:
- Around line 188-191: The sync and async save functions return different types
but the code unconditionally assigns their return to _ckpt_futures["fsdp2"];
update the logic around ckpt_save_func(state_dict,
checkpoint_id=checkpoint_path, process_group=process_group) so that when using
dcp_async_save you store the returned Future in _ckpt_futures["fsdp2"], and when
using dcp_save you either set _ckpt_futures["fsdp2"] to None (or remove the key)
to make the semantic difference explicit; locate the assignment around
state_dict, ckpt_save_func, dcp_async_save, dcp_save and change it to a
conditional that only assigns the Future for the async path.
- Line 42: The module-level mutable _ckpt_futures dict can cause cross-type
collisions; replace it with a structured manager or namespaced mapping and
update call sites to use it: create a CheckpointManager (or change _ckpt_futures
to a dict-of-dicts keyed by checkpoint type) with methods like
get_future(checkpoint_type, key) and set_future(checkpoint_type, key, future),
then refactor any functions that reference _ckpt_futures to call these methods
(or pass an explicit manager instance) so futures are isolated per checkpoint
type; alternatively, if you keep the module-level variable, add clear
documentation and a typed alias to signal the limitation.

In `@bionemo-recipes/recipes/codonfm_native_te/dataset.py`:
- Around line 131-138: The MLM masking uses the global random module
(random.random()/random.randint()) which causes non-determinism; modify the
collator/dataset to accept and use a passed RNG (e.g., an instance named rng or
self.rng) instead of the global random, and replace calls random.random() and
random.randint(...) with rng.random() and rng.randint(5,
self.tokenizer.vocab_size - 1); ensure the RNG is seeded per worker (using
torch.utils.data.get_worker_info or by creating per-worker rng seeds) so that
mlm_probability, ids assignment, tokenizer.mask_token_id and labels behavior are
reproducible across workers and epochs.

In `@bionemo-recipes/recipes/codonfm_native_te/modeling_codonfm_te.py`:
- Around line 317-318: The code assumes THD attention input has a singleton
batch dim when doing hidden_states.squeeze(0) inside the block checking
self.config.attn_input_format == "thd"; add an explicit assertion or documented
guard before squeezing (e.g., assert hidden_states.size(0) == 1 or raise a clear
error) and/or a short comment explaining the THD layout expectation so the
squeeze on hidden_states in modeling_codonfm_te.py is safe and self-explanatory.
- Around line 212-222: The branch handling self.config.layer_precision is
inconsistent: when fp8_recipe is provided it warns and auto-sets
self.config.layer_precision = ["fp8"] * self.config.num_hidden_layers, but when
only fp4_recipe is provided it raises a RuntimeError. Make the behavior
consistent by changing the fp4_recipe branch to mirror the fp8 branch—emit a
UserWarning and auto-populate self.config.layer_precision = ["fp4"] *
self.config.num_hidden_layers (or alternatively make both branches raise);
update the code paths around fp8_recipe, fp4_recipe, and
self.config.layer_precision accordingly so the logic is symmetric.

In `@bionemo-recipes/recipes/codonfm_native_te/quantization.py`:
- Around line 101-109: The temporary file created with
tempfile.NamedTemporaryFile(delete=False) (temp_file) in the quantization
function is never removed; update the function (the one that returns
temp_file.name and is used by initialize_quant_stats_logging ->
debug_api.initialize) to either (A) return a paired cleanup callback or
context-manager that deletes the file when the caller is done, or (B) document
clearly that the caller is responsible for removing the returned path; implement
option (A) by returning a small function like cleanup() that unlinks
temp_file.name (or a contextmanager that yields the path and unlinks on
__exit__), and keep logger lines as-is to preserve debug output.

In `@bionemo-recipes/recipes/codonfm_native_te/README.md`:
- Around line 49-66: Edit the fenced code block in README.md that displays the
codonfm_native_te/ directory tree (the block containing lines like
"codonfm_native_te/" and "modeling_codonfm_te.py") and add a language specifier
(e.g., change ``` to ```text or ```plaintext) so the block is rendered as plain
text.

In `@bionemo-recipes/recipes/codonfm_native_te/scheduler.py`:
- Around line 19-24: Add type hints to get_linear_schedule_with_warmup by
annotating parameters and return type: change the signature to include
optimizer: Optimizer, num_warmup_steps: int = 2_000, num_training_steps: int =
500_000, last_epoch: int = -1 and a return type of _LRScheduler (or LambdaLR)
and import Optimizer and _LRScheduler (or LambdaLR) from torch.optim /
torch.optim.lr_scheduler; update any internal references if needed to satisfy
the new annotations for get_linear_schedule_with_warmup.

In `@bionemo-recipes/recipes/codonfm_native_te/tests/conftest.py`:
- Around line 37-44: The device_mesh fixture currently runs for every test due
to autouse=True which forces NCCL/CUDA initialization; remove autouse=True from
the pytest.fixture decorator on the device_mesh function and keep the existing
setup calls (DistributedConfig(), torch.distributed.init_process_group,
torch.cuda.set_device, init_device_mesh) intact so it only runs when tests
explicitly request the fixture by adding device_mesh as a parameter (update any
tests that need the distributed environment to include device_mesh in their
function signature).

In `@bionemo-recipes/recipes/codonfm_native_te/tokenizer.py`:
- Around line 19-125: This file introduces a second, divergent CodonTokenizer
implementation (class CodonTokenizer, methods tokenize/encode/decode,
SPECIAL_TOKENS) which can drift from the model's canonical tokenizer; either
remove this duplicate and import and use the model's tokenizer (e.g., replace
the class with an import of the model package's CodonTokenizer and update call
sites to use that single symbol), or if import is infeasible add a unit test
that asserts exact equivalence between the local implementation and the model
tokenizer (compare SPECIAL_TOKENS, vocab_size, encoder keys/values, decoder
mappings and the results of tokenize/encode/decode on representative sequences)
so the two remain locked together.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8b5818f7-f406-4483-a2d3-2205cf9c7c24

📥 Commits

Reviewing files that changed from the base of the PR and between f437be0 and 4cc7e70.

⛔ Files ignored due to path filters (1)
  • bionemo-recipes/recipes/codonfm_native_te/train.parquet is excluded by !**/*.parquet
📒 Files selected for processing (40)
  • .gitignore
  • bionemo-recipes/models/codonfm/dataset.py
  • bionemo-recipes/models/codonfm/modeling_codonfm_te.py
  • bionemo-recipes/models/codonfm/requirements.txt
  • bionemo-recipes/models/codonfm/tests/common/README.md
  • bionemo-recipes/models/codonfm/tests/common/__init__.py
  • bionemo-recipes/models/codonfm/tests/common/fixtures.py
  • bionemo-recipes/models/codonfm/tests/common/test_modeling_common.py
  • bionemo-recipes/models/codonfm/tests/conftest.py
  • bionemo-recipes/models/codonfm/tests/generate_golden_values.py
  • bionemo-recipes/models/codonfm/tests/golden_state_dict.safetensors
  • bionemo-recipes/models/codonfm/tests/golden_values.json
  • bionemo-recipes/models/codonfm/tests/test_modeling_codonfm_te.py
  • bionemo-recipes/models/codonfm/tokenizer.py
  • bionemo-recipes/models/esm2/tests/common/test_modeling_common.py
  • bionemo-recipes/models/llama3/tests/common/test_modeling_common.py
  • bionemo-recipes/models/mixtral/tests/common/test_modeling_common.py
  • bionemo-recipes/models/qwen/tests/common/test_modeling_common.py
  • bionemo-recipes/recipes/codonfm_native_te/README.md
  • bionemo-recipes/recipes/codonfm_native_te/checkpoint.py
  • bionemo-recipes/recipes/codonfm_native_te/dataset.py
  • bionemo-recipes/recipes/codonfm_native_te/distributed_config.py
  • bionemo-recipes/recipes/codonfm_native_te/fp4_debugging_stats.yaml
  • bionemo-recipes/recipes/codonfm_native_te/fp8_debugging_stats.yaml
  • bionemo-recipes/recipes/codonfm_native_te/hydra_config/L0_sanity.yaml
  • bionemo-recipes/recipes/codonfm_native_te/hydra_config/defaults.yaml
  • bionemo-recipes/recipes/codonfm_native_te/modeling_codonfm_te.py
  • bionemo-recipes/recipes/codonfm_native_te/perf_logger.py
  • bionemo-recipes/recipes/codonfm_native_te/quantization.py
  • bionemo-recipes/recipes/codonfm_native_te/requirements.txt
  • bionemo-recipes/recipes/codonfm_native_te/scheduler.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/__init__.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/conftest.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/test_model.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/test_quantization.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/test_tokenizer.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/test_train.py
  • bionemo-recipes/recipes/codonfm_native_te/tokenizer.py
  • bionemo-recipes/recipes/codonfm_native_te/train_fsdp2.py
  • ci/scripts/check_copied_files.py

Comment thread bionemo-recipes/models/codonfm/dataset.py
Comment thread bionemo-recipes/models/codonfm/modeling_codonfm_te.py
Comment thread bionemo-recipes/models/codonfm/requirements.txt Outdated
Comment thread bionemo-recipes/models/codonfm/tests/common/__init__.py
Comment thread bionemo-recipes/recipes/codonfm_native_te/checkpoint.py
Comment thread bionemo-recipes/recipes/codonfm_native_te/requirements.txt
Comment thread bionemo-recipes/recipes/codonfm_native_te/tests/conftest.py
Comment thread bionemo-recipes/recipes/codonfm_native_te/tests/test_train.py
Comment thread bionemo-recipes/recipes/codonfm_native_te/train_fsdp2.py
Comment thread bionemo-recipes/recipes/codonfm_native_te/train_fsdp2.py Outdated
@jomitchellnv
Collaborator Author

@coderabbitai re-review

@coderabbitai
Contributor

coderabbitai bot commented Mar 23, 2026

@jomitchellnv Sure! I'll kick off a full re-review of the PR now.

✅ Actions performed

Full review triggered.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

♻️ Duplicate comments (8)
bionemo-recipes/recipes/codonfm_native_te/requirements.txt (1)

1-13: ⚠️ Potential issue | 🟡 Minor

Keep version bounds on the recipe stack.

The missing transformers entry is fixed, but this manifest is still completely unbounded. For a training recipe that couples torch, transformer_engine[pytorch], and transformers, upstream releases can change numerics or break installs unexpectedly. Please align the minimum versions with bionemo-recipes/models/codonfm/requirements.txt (or pin this recipe separately) so the environment stays reproducible.

Run this read-only check to compare bounded vs. unbounded entries:

#!/bin/bash
set -euo pipefail

python - <<'PY'
from pathlib import Path

ops = ("==", ">=", "<=", "~=", "!=", ">", "<")
paths = [
    Path("bionemo-recipes/recipes/codonfm_native_te/requirements.txt"),
    Path("bionemo-recipes/models/codonfm/requirements.txt"),
]

for path in paths:
    print(f"\n== {path} ==")
    for i, raw in enumerate(path.read_text().splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        bounded = any(op in line for op in ops)
        print(f"{i:>3}: {line:<35} {'bounded' if bounded else 'UNBOUNDED'}")
PY
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/requirements.txt` around lines 1 -
13, The requirements.txt for the codonfm_native_te recipe contains unbounded
package entries (e.g., hydra-core, torch, transformers,
transformer_engine[pytorch]) which risks unreproducible breaks; update this
recipe's requirements.txt to include the same minimum version bounds (or exact
pins) used in the models/codonfm/requirements.txt so torch,
transformer_engine[pytorch], transformers, and related packages match the model
stack; specifically add version specifiers (>= or ==) for transformers, torch,
transformer_engine[pytorch], and any other unbounded names listed (hydra-core,
pandas, pyarrow, etc.) to mirror the model requirements and ensure consistent
installs.
bionemo-recipes/models/codonfm/requirements.txt (1)

1-3: ⚠️ Potential issue | 🔴 Critical

Fix invalid transformer_engine>=1.14.0 constraint.

The specified version 1.14.0 does not exist on PyPI (versions go from 1.13.0 directly to 2.1.0). Update to either >=1.13.0 or >=2.1.0 depending on the required features. Given the FP8/FP4 quantization features used in this codebase, >=2.1.0 is likely appropriate.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/requirements.txt` around lines 1 - 3, The
transformer_engine version constraint is invalid (1.14.0 doesn't exist); update
the requirement for the package named transformer_engine in requirements.txt to
a valid release that supports FP8/FP4 quantization (e.g., change the spec from
transformer_engine>=1.14.0 to transformer_engine>=2.1.0) so the installer can
resolve a real PyPI version while preserving needed features.
bionemo-recipes/recipes/codonfm_native_te/distributed_config.py (1)

36-40: ⚠️ Potential issue | 🟠 Major

Replace setdefault() with getenv() to prevent environment mutation.

Using os.environ.setdefault() in field default factories writes values to the global process environment, which can leak into child processes and break test isolation.

🛠️ Suggested fix
-    rank: int = field(default_factory=lambda: int(os.environ.setdefault("RANK", "0")))
-    local_rank: int = field(default_factory=lambda: int(os.environ.setdefault("LOCAL_RANK", "0")))
-    world_size: int = field(default_factory=lambda: int(os.environ.setdefault("WORLD_SIZE", "1")))
-    _master_addr: str = field(default_factory=lambda: os.environ.setdefault("MASTER_ADDR", "localhost"))
-    _master_port: str = field(default_factory=lambda: os.environ.setdefault("MASTER_PORT", "12355"))
+    rank: int = field(default_factory=lambda: int(os.getenv("RANK", "0")))
+    local_rank: int = field(default_factory=lambda: int(os.getenv("LOCAL_RANK", "0")))
+    world_size: int = field(default_factory=lambda: int(os.getenv("WORLD_SIZE", "1")))
+    _master_addr: str = field(default_factory=lambda: os.getenv("MASTER_ADDR", "localhost"))
+    _master_port: str = field(default_factory=lambda: os.getenv("MASTER_PORT", "12355"))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/distributed_config.py` around lines
36 - 40, The field default factories (rank, local_rank, world_size,
_master_addr, _master_port) currently call os.environ.setdefault(...) which
mutates the global environment; change each lambda to use os.getenv(...) (or
os.environ.get(...)) instead and keep the same conversions (int(...) for
rank/local_rank/world_size and str/identity for _master_addr/_master_port) so
defaults are read without writing to os.environ. Ensure you update the five
factories: rank, local_rank, world_size, _master_addr, and _master_port to use
getenv/get instead of setdefault.
bionemo-recipes/recipes/codonfm_native_te/hydra_config/defaults.yaml (1)

41-44: ⚠️ Potential issue | 🟠 Major

Scheduler horizon not tied to training steps.

The num_training_steps: 500_000 is hardcoded while num_train_steps at line 4 is a separate required field. This mismatch causes incorrect warmup/decay curves when only the top-level step count is overridden.

🔧 Suggested fix using Hydra interpolation
 lr_scheduler_kwargs:
   num_warmup_steps: 2_000
-  num_training_steps: 500_000
+  num_training_steps: ${num_train_steps}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/hydra_config/defaults.yaml` around
lines 41 - 44, The lr_scheduler_kwargs currently hardcodes num_training_steps to
500_000 while the top-level training step count is defined as num_train_steps;
change num_training_steps inside lr_scheduler_kwargs to reference the top-level
num_train_steps via Hydra interpolation (replace the literal 500_000 with an
expression that points to num_train_steps) so the scheduler horizon follows
overrides to num_train_steps; keep num_warmup_steps as-is and ensure the key
names lr_scheduler_kwargs, num_warmup_steps, num_training_steps and
num_train_steps are used exactly to locate and update the values.
bionemo-recipes/recipes/codonfm_native_te/tests/test_model.py (1)

136-155: ⚠️ Potential issue | 🟡 Minor

Stale loss value in assertion.

Line 154 reads outputs.loss.item() from the last forward pass before the final optimizer.step(), so final_loss reflects the state before the last weight update. Additionally, without fixing the RNG seed or disabling dropout within this test, results may be flaky.

Consider recomputing the loss after the training loop completes:

🧪 Suggested fix
+        torch.manual_seed(0)
+        torch.cuda.manual_seed_all(0)
         model = CodonFMForMaskedLM(small_config).cuda()
+        model.train()
         optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
@@
-        final_loss = outputs.loss.item()
+        model.eval()
+        with torch.no_grad():
+            final_loss = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels).loss.item()
         assert final_loss < initial_loss, f"Loss did not decrease: {initial_loss} -> {final_loss}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/tests/test_model.py` around lines
136 - 155, The test uses the last cached outputs (outputs.loss) from the final
forward pass before the final optimizer.step, so final_loss is stale and may be
flaky due to nondeterminism; fix test_loss_decreases by (1) after the training
loop perform a fresh forward pass to recompute the loss (call
model(input_ids=..., attention_mask=..., labels=...) again and use that loss for
final_loss), and (2) make the final evaluation deterministic by setting a seed
(e.g., torch.manual_seed(...)) and/or switching the model to eval mode
(model.eval()) for that final forward pass; reference the test function
test_loss_decreases and the model class CodonFMForMaskedLM and ensure you
compare the recomputed final_loss against initial_loss.
bionemo-recipes/recipes/codonfm_native_te/dataset.py (1)

85-103: ⚠️ Potential issue | 🟠 Major

Masking RNG still duplicates across workers.

Each worker receives a collator copy with the same random.Random(seed) state, and create_*_dataloader() never forwards the factory seed into either collator anyway. With num_workers > 1, masking patterns repeat across workers, and changing the dataloader seed does not actually change masking. Seed the collator from the factory input and derive a per-worker stream via torch.utils.data.get_worker_info(). Keep the mirror in bionemo-recipes/models/codonfm/dataset.py aligned.

Also applies to: 158-176, 263-267, 318-322

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/dataset.py` around lines 85 - 103,
The collator's RNG is created with a fixed seed in the dataset __init__
(self.rng = random.Random(seed)) and not re-seeded per worker, causing duplicate
masking across dataloader workers; modify the code to accept and store the
factory seed (pass seed through the create_*_dataloader() call into the
dataset/collator constructor) and in the collator/dataset masking code detect
worker via torch.utils.data.get_worker_info() and derive a per-worker RNG (e.g.,
combine the stored seed with worker_info.id to create a new Random instance)
instead of using the single shared self.rng; apply the same change to the
mirrored implementation in bionemo-recipes/models/codonfm/dataset.py so both
remain aligned.
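The per-worker seeding described above can be sketched as plain functions so the logic is testable outside a dataloader. In the real collator, `worker_id` would come from `torch.utils.data.get_worker_info()` (which returns `None` in the main process, so fall back to 0); the mixing constant is an arbitrary choice, not the recipe's actual scheme.

```python
import random


def per_worker_rng(base_seed: int, worker_id: int) -> random.Random:
    # Mix the factory seed with the dataloader worker id so each worker
    # draws an independent masking stream, while any given (seed, worker)
    # pair stays reproducible across runs.
    return random.Random(base_seed * 100_003 + worker_id)


def choose_mask_positions(rng: random.Random, seq_len: int, mask_prob: float = 0.15) -> list[int]:
    # Bernoulli masking over token positions, as in an MLM collator.
    return [i for i in range(seq_len) if rng.random() < mask_prob]
```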
bionemo-recipes/recipes/codonfm_native_te/quantization.py (1)

191-223: ⚠️ Potential issue | 🟠 Major

Reject [] for explicit layer selections.

fp8_layers=[] / fp4_layers=[] still bypass the validator and silently mean “enabled, but on zero layers”. That makes an invalid quantization config look successful while actually leaving those layers in BF16.

Suggested guard
     # Validate layer entries are integers within valid range.
     for name, layers in [("fp8_layers", fp8_layers), ("fp4_layers", fp4_layers)]:
-        if layers is not None and len(layers) > 0:
+        if layers == []:
+            raise ValueError(f"{name} cannot be an explicit empty list; use null to mean 'unspecified'.")
+        if layers is not None:
             invalid = [x for x in layers if not isinstance(x, int) or x < 1 or x > num_layers]
             if invalid:
                 raise ValueError(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/quantization.py` around lines 191 -
223, The code currently allows fp8_layers=[] or fp4_layers=[] which silently
means "enabled but zero layers"; update the validation for fp8_layers and
fp4_layers so empty lists are rejected when provided: in the loop that inspects
("fp8_layers", fp8_layers) and ("fp4_layers", fp4_layers), treat an explicit
empty list as invalid: if layers is not None, check both that len(layers) > 0
and that every entry is an int in 1..num_layers,
and raise a ValueError (mentioning fp8_layers/fp4_layers and num_layers) if the
list is empty or contains out-of-range/non-integer entries; keep the later logic
that fills defaults when layers is None unchanged (so explicit empty lists are
rejected but omitted lists still get auto-assigned).
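A standalone sketch of the tightened validator; the function name and error messages are illustrative, not the recipe's actual API.

```python
def validate_layer_lists(num_layers, fp8_layers=None, fp4_layers=None):
    """Reject explicit empty lists and out-of-range entries; None still means 'unspecified'."""
    for name, layers in (("fp8_layers", fp8_layers), ("fp4_layers", fp4_layers)):
        if layers == []:
            raise ValueError(f"{name} cannot be an explicit empty list; use null to mean 'unspecified'.")
        if layers is not None:
            invalid = [x for x in layers if not isinstance(x, int) or not 1 <= x <= num_layers]
            if invalid:
                raise ValueError(f"{name} has invalid entries {invalid}; expected ints in 1..{num_layers}.")
```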
bionemo-recipes/recipes/codonfm_native_te/perf_logger.py (1)

107-145: ⚠️ Potential issue | 🟠 Major

The logging window is still one step late.

step is zero-based here, so step % self.logging_frequency == 0 and step > 0 first fires after logging_frequency + 1 optimizer steps. On a 100-step run with frequency 10, that emits only 9 updates and closes the progress bar at 90/100. Switch the gate to (step + 1) % self.logging_frequency == 0, and flush the trailing partial window in finish().

Also applies to: 150-158

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/perf_logger.py` around lines 107 -
145, The logging gate is off-by-one: replace the condition "if step %
self.logging_frequency == 0 and step > 0" in PerfLogger's per-step logging block
(the code referencing self.logging_frequency, outputs, self.metrics, wandb.log,
and self._progress_bar) with "(step + 1) % self.logging_frequency == 0" (and
remove the step > 0 requirement), and make the identical change to the other
logging gate in the file (the second block that updates metrics and progress
bar). Additionally, implement flushing in finish(): when finishing training, if
there are any accumulated metrics in self.metrics (a partial window), compute
and convert them the same way you do in the per-step block, call wandb.log(...,
step=last_step), update the progress bar by the remaining count, set_postfix
with the latest loss, then reset metrics so the final partial window is emitted.
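The off-by-one and the trailing flush can be illustrated with a toy windowed logger (a sketch only; the real `PerfLogger` also reduces metrics, talks to wandb, and drives a progress bar):

```python
class WindowedLogger:
    """Flush every `logging_frequency` steps using (step + 1) gating,
    and flush the trailing partial window in finish()."""

    def __init__(self, logging_frequency: int):
        self.logging_frequency = logging_frequency
        self._window: list[float] = []
        self.flushed: list[float] = []  # mean loss per emitted window

    def log_step(self, step: int, loss: float) -> None:
        self._window.append(loss)
        # (step + 1) fires on steps 9, 19, ... for frequency 10, i.e. after
        # every full window, so a 100-step run emits exactly 10 updates.
        if (step + 1) % self.logging_frequency == 0:
            self._flush()

    def finish(self) -> None:
        # Emit whatever is left in the trailing partial window.
        if self._window:
            self._flush()

    def _flush(self) -> None:
        self.flushed.append(sum(self._window) / len(self._window))
        self._window.clear()
```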
🧹 Nitpick comments (7)
bionemo-recipes/recipes/codonfm_native_te/scheduler.py (2)

19-29: Use a Google-style docstring on this helper.

This new public function’s docstring is missing the standard Args/Returns sections used elsewhere in the repo.

Docstring shape
 def get_linear_schedule_with_warmup(
     optimizer,
     num_warmup_steps=2_000,
     num_training_steps=500_000,
     last_epoch=-1,
 ):
-    """Linear warmup and decay scheduler for CodonFM pretraining.
-
-    The learning rate is warmed up over the first warmup steps to a peak value, then linearly decayed to
-    one tenth of its peak value over 90% of training duration.
-    """
+    """Linear warmup and decay scheduler for CodonFM pretraining.
+
+    Args:
+        optimizer: Optimizer whose learning rate will be scheduled.
+        num_warmup_steps: Number of warmup steps.
+        num_training_steps: Total number of training steps.
+        last_epoch: Last completed epoch when resuming training.
+
+    Returns:
+        A `LambdaLR` implementing warmup, decay, and the final 0.1 floor.
+    """

As per coding guidelines, "Ensure all Python files follow Google-style docstrings (pydocstyle convention)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/scheduler.py` around lines 19 - 29,
The function get_linear_schedule_with_warmup has a free-form docstring; replace
it with a Google-style docstring that includes a short summary line, an Args
section documenting optimizer (type), num_warmup_steps, num_training_steps, and
last_epoch (with defaults and meanings), and a Returns section describing the
returned scheduler object (type and behavior: linear warmup to peak then linear
decay to one tenth over 90% of training). Also include any Raises if applicable
(e.g., invalid step counts) to match the repo's pydocstyle conventions.

30-41: Handle warmup values above the decay cutoff explicitly.

If num_warmup_steps falls between int(num_training_steps * 0.9) and num_training_steps, the linear-decay branch is never reached and the LR cliffs straight from warmup to 0.1. That makes a bad Hydra value silently change the training regime instead of failing fast.

Suggested guard
 def get_linear_schedule_with_warmup(
     optimizer,
     num_warmup_steps=2_000,
     num_training_steps=500_000,
     last_epoch=-1,
 ):
@@
+    if num_training_steps <= 0:
+        raise ValueError("num_training_steps must be positive")
+    if num_warmup_steps < 0:
+        raise ValueError("num_warmup_steps must be non-negative")
     decay_steps = int(num_training_steps * 0.9)
+    if decay_steps < num_warmup_steps < num_training_steps:
+        raise ValueError(
+            "num_warmup_steps cannot exceed the decay cutoff unless warmup spans the full run"
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/scheduler.py` around lines 30 - 41,
The lr scheduling logic in lr_lambda allows num_warmup_steps to fall into
(decay_steps, num_training_steps] which causes a sudden cliff from warmup to
0.1; add an explicit guard that validates num_warmup_steps <= decay_steps (where
decay_steps = int(num_training_steps * 0.9)) and fails fast by raising a clear
ValueError (mentioning num_warmup_steps, decay_steps and num_training_steps)
before defining or returning lr_lambda; alternatively, if you prefer runtime
checking, detect the case inside lr_lambda (using current_step and
num_warmup_steps vs decay_steps) and raise an error instead of silently
producing the cliff. Ensure references to lr_lambda, num_warmup_steps,
decay_steps and num_training_steps are used in the error message to aid
debugging.
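Putting the two scheduler comments together, the multiplier function might look like the sketch below. The exact decay shape (linear to one tenth of peak by the 90% mark, then a flat 0.1 floor) is inferred from the docstring and review comments, not taken from the actual implementation; the returned callable would be wrapped in `torch.optim.lr_scheduler.LambdaLR`.

```python
def make_lr_lambda(num_warmup_steps: int, num_training_steps: int):
    """Linear warmup, linear decay to 0.1 of peak by 90% of training, then a floor."""
    if num_training_steps <= 0:
        raise ValueError("num_training_steps must be positive")
    decay_steps = int(num_training_steps * 0.9)
    if not 0 <= num_warmup_steps <= decay_steps:
        # Fail fast instead of silently cliffing from warmup straight to 0.1.
        raise ValueError(
            f"num_warmup_steps={num_warmup_steps} must lie in [0, decay cutoff={decay_steps}]"
        )

    def lr_lambda(current_step: int) -> float:
        if current_step < num_warmup_steps:
            return current_step / max(1, num_warmup_steps)
        if current_step < decay_steps:
            progress = (current_step - num_warmup_steps) / max(1, decay_steps - num_warmup_steps)
            return 1.0 - 0.9 * progress
        return 0.1

    return lr_lambda
```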
bionemo-recipes/recipes/codonfm_native_te/README.md (1)

49-66: Add language specifier to fenced code block.

The repository structure code block should have a language specifier (e.g., text or plaintext) to satisfy markdown linting rules.

📝 Suggested fix
-```
+```text
 codonfm_native_te/
 ├── modeling_codonfm_te.py    — HF-compatible CodonFM model with TE layers
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/README.md` around lines 49 - 66,
The fenced repository-structure code block in README.md (the block starting with
the line containing "codonfm_native_te/") lacks a language specifier; update the
opening fence from ``` to ```text (or ```plaintext) so the block becomes a
plaintext code block and satisfies markdown linting rules while preserving the
same content and indentation.
bionemo-recipes/recipes/codonfm_native_te/tests/conftest.py (1)

54-57: Using private API _mesh_resources.mesh_stack.

_mesh_resources.mesh_stack.clear() accesses a private API that may change in future PyTorch versions. Consider adding a comment explaining why this is necessary, or wrapping it in a try/except to handle potential API changes.

💡 Defensive approach
     torch.distributed.destroy_process_group()
-    _mesh_resources.mesh_stack.clear()
+    # Clean up mesh resources (private API, may change in future PyTorch versions)
+    try:
+        _mesh_resources.mesh_stack.clear()
+    except AttributeError:
+        pass  # API changed, cleanup not critical
     torch.cuda.empty_cache()
     torch.cuda.synchronize()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/tests/conftest.py` around lines 54
- 57, The test teardown calls private API _mesh_resources.mesh_stack.clear(); to
make this robust, wrap that call in a try/except that falls back gracefully
(e.g., log/debug a warning) if the attribute or method does not exist, and add a
brief comment explaining why the private mesh stack must be cleared; keep the
surrounding cleanup calls (torch.distributed.destroy_process_group(),
torch.cuda.empty_cache(), torch.cuda.synchronize()) as-is so the overall cleanup
order is preserved and only protect or document the
_mesh_resources.mesh_stack.clear() usage.
bionemo-recipes/models/codonfm/tests/test_modeling_codonfm_te.py (2)

287-288: strict=False may silently ignore missing keys.

Using strict=False in load_state_dict could mask issues where the golden state dict doesn't match the model architecture. Consider logging or asserting on the returned missing/unexpected keys for debugging purposes.

💡 Example verification
-        model.load_state_dict(golden_sd, strict=False)
+        missing, unexpected = model.load_state_dict(golden_sd, strict=False)
+        if missing or unexpected:
+            import warnings
+            warnings.warn(f"State dict mismatch - missing: {missing}, unexpected: {unexpected}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/tests/test_modeling_codonfm_te.py` around
lines 287 - 288, The call model.load_state_dict(golden_sd, strict=False) can
silently ignore mismatches; change it to capture the returned result (e.g.,
result = model.load_state_dict(golden_sd, strict=False)) and then assert or log
result.missing_keys and result.unexpected_keys (or raise if either is non-empty)
before calling model.eval(), so any missing/unexpected parameters in golden_sd
are surfaced during tests; reference model.load_state_dict, golden_sd, and
model.eval when making this change.

156-160: RNG reset produces identical masking per sequence.

torch.manual_seed(42) is called inside the loop for each sequence, so every sequence gets the same masking pattern (same random positions). If you intend different masking per sequence, seed once before the loop. If identical masking is intentional for reproducibility, this is fine.

Also applies to: 179-183

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/tests/test_modeling_codonfm_te.py` around
lines 156 - 160, The fixed code should stop reseeding the global RNG inside each
sequence loop: move the torch.manual_seed(42) call out of the per-sequence loop
(e.g., place it once before iterating sequences) so each sequence gets a
different masking pattern, or remove/make the seed conditional if identical
masking per sequence is desired; update the occurrences around the loop that set
lbl, ids, and tokenizer.mask_token_id (and the similar block at lines 179-183)
accordingly.
bionemo-recipes/models/codonfm/tests/conftest.py (1)

38-45: Minor inconsistency with shared fixture.

This unused_tcp_port fixture differs slightly from the one in tests/common/fixtures.py (Lines 33-42) by adding s.listen(1), which is unnecessary for port discovery. Consider using the shared fixture from the plugin instead to avoid duplication.

Note: The static analysis warning about binding to '' is expected behavior for ephemeral port allocation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/tests/conftest.py` around lines 38 - 45, The
fixture unused_tcp_port duplicates the shared fixture in
tests/common/fixtures.py and adds an unnecessary s.listen(1); remove this
duplication by importing and using the shared unused_tcp_port fixture from
tests.common.fixtures (or delete this local definition), or if you must keep a
local version, drop the s.listen(1) call and keep the socket bind-to-0 logic so
it matches the shared implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bionemo-recipes/models/codonfm/dataset.py`:
- Around line 22-23: Replace the bare/top-level imports by using a local
relative tokenizer import and a structural type for the distributed config:
remove "from distributed_config import DistributedConfig" and "from tokenizer
import CodonTokenizer", import the tokenizer relatively (e.g., a local
CodonTokenizer from the package using a relative import), and define a small
Protocol (e.g., DistConfigProtocol with attributes/methods rank and world_size)
to use in type annotations and function/class signatures that previously
referenced DistributedConfig (replace references to DistributedConfig with
DistConfigProtocol and update call sites to rely only on rank/world_size).
Ensure all imports are relative to the package and update any code that
constructed or accessed DistributedConfig to accept the protocol-compatible
object instead.
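The structural-typing suggestion can be sketched like this; `DistConfigProtocol` and `shard_indices` are illustrative names, not the module's actual symbols.

```python
from typing import Protocol


class DistConfigProtocol(Protocol):
    """Anything exposing rank and world_size satisfies this, including the
    recipe's DistributedConfig, without dataset.py importing it directly."""

    rank: int
    world_size: int


def shard_indices(dist_config: DistConfigProtocol, num_samples: int) -> list[int]:
    # Each rank takes a strided slice of the sample indices.
    return list(range(dist_config.rank, num_samples, dist_config.world_size))
```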

In `@bionemo-recipes/models/codonfm/tokenizer.py`:
- Around line 114-130: The decode method currently assumes special token IDs are
0..N-1 by building special_ids = set(range(len(self.SPECIAL_TOKENS))) but
from_pretrained() never reloads SPECIAL_TOKENS or the exact special token ID
mapping saved by save_pretrained(), breaking round-trip decoding; fix by
rehydrating the full tokenizer state in from_pretrained() (restore
SPECIAL_TOKENS, special token -> id mapping, and any saved encoder/decoder
mappings) so decode() can compute special_ids from the actual saved special
token IDs (e.g., use the saved special_token_to_id map or stored
SPECIAL_TOKENS_IDS) instead of range(len(...)), update any uses in decode,
save_pretrained, and from_pretrained to read/write the same metadata keys to
ensure consistency between encoder/decoder and special token IDs.
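The round-trip fix can be shown in isolation: persist the actual special-token-to-id mapping and rebuild the special-id set from it, rather than assuming ids `0..N-1`. The file name and JSON key here are illustrative, not necessarily what `save_pretrained()` writes.

```python
import json
import os


def save_special_tokens(directory: str, special_token_to_id: dict[str, int]) -> None:
    # Persist the exact mapping that decode() depends on.
    with open(os.path.join(directory, "tokenizer_config.json"), "w") as f:
        json.dump({"special_token_to_id": special_token_to_id}, f)


def load_special_ids(directory: str) -> set[int]:
    # Rehydrate the saved ids instead of assuming range(len(SPECIAL_TOKENS)).
    with open(os.path.join(directory, "tokenizer_config.json")) as f:
        return set(json.load(f)["special_token_to_id"].values())
```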

In `@bionemo-recipes/recipes/codonfm_native_te/checkpoint.py`:
- Around line 182-191: The current async save may remain in-flight because we
only wait for the previous future before starting a new one; ensure the final
future is awaited before exit by checking _ckpt_futures["fsdp2"] after
scheduling (and in save_final_model_fsdp2) and calling .result() if it is not
None, and repeat the same pattern for the other async slots handled in the
200-218 region; update usages around dcp_async_save / dcp_save and
save_final_model_fsdp2 to join outstanding futures (e.g., call
_ckpt_futures["fsdp2"].result()) so the last async checkpoint completes before
process exit.
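The join-before-exit pattern reads roughly like this, with a thread pool standing in for `dcp_async_save`; the `_ckpt_futures["fsdp2"]` slot name follows the review comment, and the rest is a sketch rather than the recipe's code.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional

_ckpt_futures: dict[str, Optional[Future]] = {"fsdp2": None}


def schedule_async_save(executor: ThreadPoolExecutor, save_fn: Callable[[], None]) -> None:
    # Wait for the previous in-flight save before scheduling a new one,
    # so at most one checkpoint write is outstanding at a time.
    prev = _ckpt_futures["fsdp2"]
    if prev is not None:
        prev.result()
    _ckpt_futures["fsdp2"] = executor.submit(save_fn)


def join_outstanding_saves() -> None:
    # Call before process exit so the final async checkpoint completes.
    fut = _ckpt_futures["fsdp2"]
    if fut is not None:
        fut.result()
        _ckpt_futures["fsdp2"] = None
```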

In `@bionemo-recipes/recipes/codonfm_native_te/fp8_debugging_stats.yaml`:
- Around line 10-18: Update the LogFp8TensorStats entries in
fp8_debugging_stats.yaml to use explicit recipe-prefixed stat names to match
fp4_debugging_stats.yaml and the fp8_recipe in defaults.yaml; replace the
unprefixed stats under the tensor blocks (activation, gradient, weight) with
fp8_delayed_scaling_underflows%, fp8_delayed_scaling_scale_inv_min,
fp8_delayed_scaling_scale_inv_max, and fp8_delayed_scaling_mse (referencing the
fp8_recipe: transformer_engine.common.recipe.DelayedScaling and the
LogFp8TensorStats hook) so all stats are consistently prefixed.

In `@bionemo-recipes/recipes/codonfm_native_te/modeling_codonfm_te.py`:
- Around line 358-395: The LM head always uses GELU in forward but CodonFMConfig
supports config.hidden_act (e.g., "relu" or "silu"), causing encoder/head
mismatch; fix by storing the configured activation in __init__ (e.g.,
self.hidden_act = config.hidden_act or map it to a callable) and then use that
activation instead of torch.nn.functional.gelu in forward (while keeping the
transformer_engine.pytorch.autocast(enabled=False) context and existing
operations on self.dense and self.layer_norm_linear); ensure supported names
("gelu","relu","silu") map to the correct torch.nn.functional functions and
raise a clear error for unsupported values.
- Around line 322-325: The code builds RoPE using
self.rotary_embeddings(max_seq_len=self.config.max_position_embeddings) which
uses the config max instead of the current batch length; change it to compute
max_seq_len from the active input (e.g., derive from hidden_states.shape[1] or,
if using packed sequences, from cu_seq_lens_q[-1] like the PTL reference) and
pass that to self.rotary_embeddings so rotary positions match the current batch;
keep the subsequent .to(hidden_states.device, non_blocking=True) call.

In `@bionemo-recipes/recipes/codonfm_native_te/tests/test_quantization.py`:
- Around line 160-168: The positive assertions in test_matches_correct_sublayers
are too loose because they use suffixes like "_something" and will still pass if
generate_layer_regex overmatches prefixes; update the test (function
test_matches_correct_sublayers) to assert against the exact sublayer names
produced by generate_layer_regex (e.g., use
"encoder.layers.1.self_attention.layernorm_qkv",
"encoder.layers.1.self_attention.proj", "encoder.layers.1.layernorm_mlp.fc1",
"encoder.layers.1.layernorm_mlp.fc2") or include a strict boundary (dot or
end-of-string) after the sublayer name so only the intended sublayers match, and
keep the negative case "encoder.layers.1.self_attention.some_other_thing"
unchanged.
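The boundary tightening described above can be shown with plain `re`. The sublayer names follow the ones quoted in the comment, but `generate_layer_regex` is the recipe's own helper, so treat this as an illustration of the anchoring rather than its actual output:

```python
import re

# A pattern without a boundary overmatches prefixed names:
loose = re.compile(r"encoder\.layers\.1\.self_attention\.proj")
assert loose.search("encoder.layers.1.self_attention.proj_something")  # overmatch slips through

# Adding a (\.|$) boundary after the sublayer name rejects the prefix case
# while still matching the exact name and its child parameters:
strict = re.compile(r"encoder\.layers\.1\.self_attention\.proj(\.|$)")
assert strict.search("encoder.layers.1.self_attention.proj")
assert strict.search("encoder.layers.1.self_attention.proj.weight")
assert not strict.search("encoder.layers.1.self_attention.proj_something")
```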

In `@bionemo-recipes/recipes/codonfm_native_te/train_fsdp2.py`:
- Around line 224-231: The final_model export is missing tokenizer metadata
required by CodonTokenizer.from_pretrained; update save_final_model_fsdp2 to
also write the tokenizer artifacts (tokenizer_config.json and any vocab/merges
or special files) into the final_model directory. Specifically, ensure
save_final_model_fsdp2 (and its caller in train_fsdp2) either accepts a
tokenizer argument or locates the original tokenizer source from the
model/config and then calls tokenizer.save_pretrained/save files or copies
tokenizer_config.json alongside the existing model.safetensors and config.json
so CodonTokenizer.from_pretrained can load the tokenizer from the self-contained
final_model directory.

---

Duplicate comments:
In `@bionemo-recipes/models/codonfm/requirements.txt`:
- Around line 1-3: The transformer_engine version constraint is invalid (1.14.0
doesn't exist); update the requirement for the package named transformer_engine
in requirements.txt to a valid release that supports FP8/FP4 quantization (e.g.,
change the spec from transformer_engine>=1.14.0 to transformer_engine>=2.1.0) so
the installer can resolve a real PyPI version while preserving needed features.

In `@bionemo-recipes/recipes/codonfm_native_te/dataset.py`:
- Around line 85-103: The collator's RNG is created with a fixed seed in the
dataset __init__ (self.rng = random.Random(seed)) and not re-seeded per worker,
causing duplicate masking across dataloader workers; modify the code to accept
and store the factory seed (pass seed through the create_*_dataloader() call
into the dataset/collator constructor) and in the collator/dataset masking code
detect worker via torch.utils.data.get_worker_info() and derive a per-worker RNG
(e.g., combine the stored seed with worker_info.id to create a new Random
instance) instead of using the single shared self.rng; apply the same change to
the mirrored implementation in bionemo-recipes/models/codonfm/dataset.py so both
remain aligned.
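The per-worker seeding above boils down to deriving an independent RNG from (factory seed, worker id). In the real collator the worker id would come from `torch.utils.data.get_worker_info()`; this sketch supplies it directly so the derivation stays runnable, and the mixing constant is an illustrative choice:

```python
import random


def per_worker_rng(base_seed: int, worker_id: int) -> random.Random:
    """Derive an independent RNG per dataloader worker from one factory seed.

    In the collator, worker_id would come from
    torch.utils.data.get_worker_info().id (0 when num_workers == 0).
    """
    # Combine seed and worker id into a single int seed; the prime multiplier
    # just keeps adjacent worker ids from producing adjacent seeds.
    return random.Random((base_seed * 1_000_003 + worker_id) % (2**63))


# Workers sharing one base seed now draw different masking patterns...
assert per_worker_rng(42, 0).random() != per_worker_rng(42, 1).random()
# ...while each worker stays reproducible across runs.
assert per_worker_rng(42, 0).random() == per_worker_rng(42, 0).random()
```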

In `@bionemo-recipes/recipes/codonfm_native_te/distributed_config.py`:
- Around line 36-40: The field default factories (rank, local_rank, world_size,
_master_addr, _master_port) currently call os.environ.setdefault(...) which
mutates the global environment; change each lambda to use os.getenv(...) (or
os.environ.get(...)) instead and keep the same conversions (int(...) for
rank/local_rank/world_size and str/identity for _master_addr/_master_port) so
defaults are read without writing to os.environ. Ensure you update the five
factories: rank, local_rank, world_size, _master_addr, and _master_port to use
getenv/get instead of setdefault.

In `@bionemo-recipes/recipes/codonfm_native_te/hydra_config/defaults.yaml`:
- Around line 41-44: The lr_scheduler_kwargs currently hardcodes
num_training_steps to 500_000 while the top-level training step count is defined
as num_train_steps; change num_training_steps inside lr_scheduler_kwargs to
reference the top-level num_train_steps via Hydra interpolation (replace the
literal 500_000 with an expression that points to num_train_steps) so the
scheduler horizon follows overrides to num_train_steps; keep num_warmup_steps
as-is and ensure the key names lr_scheduler_kwargs, num_warmup_steps,
num_training_steps and num_train_steps are used exactly to locate and update the
values.

In `@bionemo-recipes/recipes/codonfm_native_te/perf_logger.py`:
- Around line 107-145: The logging gate is off-by-one: replace the condition "if
step % self.logging_frequency == 0 and step > 0" in PerfLogger's per-step
logging block (the code referencing self.logging_frequency, outputs,
self.metrics, wandb.log, and self._progress_bar) with "(step + 1) %
self.logging_frequency == 0" (and remove the step > 0 requirement), and make the
identical change to the other logging gate in the file (the second block that
updates metrics and progress bar). Additionally, implement flushing in finish():
when finishing training, if there are any accumulated metrics in self.metrics (a
partial window), compute and convert them the same way you do in the per-step
block, call wandb.log(..., step=last_step), update the progress bar by the
remaining count, set_postfix with the latest loss, then reset metrics so the
final partial window is emitted.
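The off-by-one can be seen by simulating which step indices fire under each gate; this models only the condition, not PerfLogger itself:

```python
def logged_steps(num_steps, freq, gate):
    """Return the 0-indexed steps at which the given gate would log."""
    return [step for step in range(num_steps) if gate(step, freq)]


old_gate = lambda step, freq: step % freq == 0 and step > 0
new_gate = lambda step, freq: (step + 1) % freq == 0

# With freq=10 over 30 steps, the old gate fires at steps 10 and 20 only,
# so the first window's metrics and the final window are never emitted on time;
# the new gate fires at steps 9, 19, 29: once per completed window of 10.
assert logged_steps(30, 10, old_gate) == [10, 20]
assert logged_steps(30, 10, new_gate) == [9, 19, 29]
```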

In `@bionemo-recipes/recipes/codonfm_native_te/quantization.py`:
- Around line 191-223: The code currently allows fp8_layers=[] or fp4_layers=[]
which silently means "enabled but zero layers"; update the validation for
fp8_layers and fp4_layers so empty lists are rejected when provided: in the loop
that inspects ("fp8_layers", fp8_layers) and ("fp4_layers", fp4_layers), treat a
list that is provided but empty as invalid: if layers is not
None then check both that len(layers) > 0 and that every entry is an int in
1..num_layers, and raise a ValueError (mentioning fp8_layers/fp4_layers and
num_layers) if the list is empty or contains out-of-range/non-integer entries;
keep the later logic that fills defaults when layers is None unchanged (so
explicit empty lists are rejected but omitted lists still get auto-assigned).
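The validation rule above can be isolated as a standalone check; `validate_precision_layers` and its signature are illustrative, not the recipe's actual API:

```python
def validate_precision_layers(fp8_layers, fp4_layers, num_layers):
    """Reject explicit empty lists and out-of-range entries; None means 'auto-assign later'."""
    for name, layers in (("fp8_layers", fp8_layers), ("fp4_layers", fp4_layers)):
        if layers is None:
            continue  # omitted: the later default-filling logic applies
        if len(layers) == 0:
            raise ValueError(
                f"{name} was given as an empty list; omit it entirely or list layers in 1..{num_layers}"
            )
        for entry in layers:
            if not isinstance(entry, int) or not (1 <= entry <= num_layers):
                raise ValueError(f"{name} entry {entry!r} is not an int in 1..{num_layers}")
```

With this shape, `fp8_layers=None` still falls through to auto-assignment while `fp8_layers=[]` fails loudly instead of silently meaning "enabled but zero layers".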

In `@bionemo-recipes/recipes/codonfm_native_te/requirements.txt`:
- Around line 1-13: The requirements.txt for the codonfm_native_te recipe
contains unbounded package entries (e.g., hydra-core, torch, transformers,
transformer_engine[pytorch]) which risks unreproducible breaks; update this
recipe's requirements.txt to include the same minimum version bounds (or exact
pins) used in the models/codonfm/requirements.txt so torch,
transformer_engine[pytorch], transformers, and related packages match the model
stack; specifically add version specifiers (>= or ==) for transformers, torch,
transformer_engine[pytorch], and any other unbounded names listed (hydra-core,
pandas, pyarrow, etc.) to mirror the model requirements and ensure consistent
installs.

In `@bionemo-recipes/recipes/codonfm_native_te/tests/test_model.py`:
- Around line 136-155: The test uses the last cached outputs (outputs.loss) from
the final forward pass before the final optimizer.step, so final_loss is stale
and may be flaky due to nondeterminism; fix test_loss_decreases by (1) after the
training loop perform a fresh forward pass to recompute the loss (call
model(input_ids=..., attention_mask=..., labels=...) again and use that loss for
final_loss), and (2) make the final evaluation deterministic by setting a seed
(e.g., torch.manual_seed(...)) and/or switching the model to eval mode
(model.eval()) for that final forward pass; reference the test function
test_loss_decreases and the model class CodonFMForMaskedLM and ensure you
compare the recomputed final_loss against initial_loss.

---

Nitpick comments:
In `@bionemo-recipes/models/codonfm/tests/conftest.py`:
- Around line 38-45: The fixture unused_tcp_port duplicates the shared fixture
in tests/common/fixtures.py and adds an unnecessary s.listen(1); remove this
duplication by importing and using the shared unused_tcp_port fixture from
tests.common.fixtures (or delete this local definition), or if you must keep a
local version, drop the s.listen(1) call and keep the socket bind-to-0 logic so
it matches the shared implementation.

In `@bionemo-recipes/models/codonfm/tests/test_modeling_codonfm_te.py`:
- Around line 287-288: The call model.load_state_dict(golden_sd, strict=False)
can silently ignore mismatches; change it to capture the returned result (e.g.,
result = model.load_state_dict(golden_sd, strict=False)) and then assert or log
result.missing_keys and result.unexpected_keys (or raise if either is non-empty)
before calling model.eval(), so any missing/unexpected parameters in golden_sd
are surfaced during tests; reference model.load_state_dict, golden_sd, and
model.eval when making this change.
- Around line 156-160: The fix should stop reseeding the global RNG
inside each sequence loop: move the torch.manual_seed(42) call out of the
per-sequence loop (e.g., place it once before iterating sequences) so each
sequence gets a different masking pattern, or remove/make the seed conditional
if identical masking per sequence is desired; update the occurrences around the
loop that set lbl, ids, and tokenizer.mask_token_id (and the similar block at
lines 179-183) accordingly.

In `@bionemo-recipes/recipes/codonfm_native_te/README.md`:
- Around line 49-66: The fenced repository-structure code block in README.md
(the block starting with the line containing "codonfm_native_te/") lacks a
language specifier; update the opening fence from ``` to ```text (or
```plaintext) so the block becomes a plaintext code block and satisfies markdown
linting rules while preserving the same content and indentation.

In `@bionemo-recipes/recipes/codonfm_native_te/scheduler.py`:
- Around line 19-29: The function get_linear_schedule_with_warmup has a
free-form docstring; replace it with a Google-style docstring that includes a
short summary line, an Args section documenting optimizer (type),
num_warmup_steps, num_training_steps, and last_epoch (with defaults and
meanings), and a Returns section describing the returned scheduler object (type
and behavior: linear warmup to peak then linear decay to one tenth over 90% of
training). Also include any Raises if applicable (e.g., invalid step counts) to
match the repo's pydocstyle conventions.
- Around line 30-41: The lr scheduling logic in lr_lambda allows
num_warmup_steps to fall into (decay_steps, num_training_steps] which causes a
sudden cliff from warmup to 0.1; add an explicit guard that validates
num_warmup_steps <= decay_steps (where decay_steps = int(num_training_steps *
0.9)) and fails fast by raising a clear ValueError (mentioning num_warmup_steps,
decay_steps and num_training_steps) before defining or returning lr_lambda;
alternatively, if you prefer runtime checking, detect the case inside lr_lambda
(using current_step and num_warmup_steps vs decay_steps) and raise an error
instead of silently producing the cliff. Ensure references to lr_lambda,
num_warmup_steps, decay_steps and num_training_steps are used in the error
message to aid debugging.
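Both scheduler comments can be sketched together: a Google-style docstring plus a fail-fast guard on the warmup/decay relationship. The 0.9 decay fraction and the warmup-then-decay-to-0.1 shape follow the description above; the multiplier arithmetic itself is an illustrative reconstruction, not the recipe's exact code:

```python
def make_lr_lambda(num_warmup_steps, num_training_steps):
    """Build a linear warmup / linear decay LR multiplier.

    Args:
        num_warmup_steps: Steps to ramp linearly from 0 to the peak LR.
        num_training_steps: Total training horizon; decay ends at 90% of it.

    Returns:
        A callable step -> float multiplier: linear warmup to 1.0, then linear
        decay to 0.1 by int(0.9 * num_training_steps), held at 0.1 afterwards.

    Raises:
        ValueError: If num_warmup_steps exceeds the decay horizon, which would
            otherwise produce a silent cliff from warmup straight to 0.1.
    """
    decay_steps = int(num_training_steps * 0.9)
    if num_warmup_steps > decay_steps:
        raise ValueError(
            f"num_warmup_steps={num_warmup_steps} must be <= decay_steps={decay_steps} "
            f"(90% of num_training_steps={num_training_steps})"
        )

    def lr_lambda(current_step):
        if current_step < num_warmup_steps:
            return current_step / max(1, num_warmup_steps)
        if current_step >= decay_steps:
            return 0.1
        progress = (current_step - num_warmup_steps) / max(1, decay_steps - num_warmup_steps)
        return 1.0 - 0.9 * progress  # 1.0 at end of warmup -> 0.1 at decay_steps

    return lr_lambda
```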

In `@bionemo-recipes/recipes/codonfm_native_te/tests/conftest.py`:
- Around line 54-57: The test teardown calls private API
_mesh_resources.mesh_stack.clear(); to make this robust, wrap that call in a
try/except that falls back gracefully (e.g., log/debug a warning) if the
attribute or method does not exist, and add a brief comment explaining why the
private mesh stack must be cleared; keep the surrounding cleanup calls
(torch.distributed.destroy_process_group(), torch.cuda.empty_cache(),
torch.cuda.synchronize()) as-is so the overall cleanup order is preserved and
only protect or document the _mesh_resources.mesh_stack.clear() usage.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: de4fdcbb-4a47-4730-9c0c-fcfdc219106f

📥 Commits

Reviewing files that changed from the base of the PR and between f437be0 and 5067cfe.

⛔ Files ignored due to path filters (1)
  • bionemo-recipes/recipes/codonfm_native_te/train.parquet is excluded by !**/*.parquet
📒 Files selected for processing (40)
  • .gitignore
  • bionemo-recipes/models/codonfm/dataset.py
  • bionemo-recipes/models/codonfm/modeling_codonfm_te.py
  • bionemo-recipes/models/codonfm/requirements.txt
  • bionemo-recipes/models/codonfm/tests/common/README.md
  • bionemo-recipes/models/codonfm/tests/common/__init__.py
  • bionemo-recipes/models/codonfm/tests/common/fixtures.py
  • bionemo-recipes/models/codonfm/tests/common/test_modeling_common.py
  • bionemo-recipes/models/codonfm/tests/conftest.py
  • bionemo-recipes/models/codonfm/tests/generate_golden_values.py
  • bionemo-recipes/models/codonfm/tests/golden_state_dict.safetensors
  • bionemo-recipes/models/codonfm/tests/golden_values.json
  • bionemo-recipes/models/codonfm/tests/test_modeling_codonfm_te.py
  • bionemo-recipes/models/codonfm/tokenizer.py
  • bionemo-recipes/models/esm2/tests/common/test_modeling_common.py
  • bionemo-recipes/models/llama3/tests/common/test_modeling_common.py
  • bionemo-recipes/models/mixtral/tests/common/test_modeling_common.py
  • bionemo-recipes/models/qwen/tests/common/test_modeling_common.py
  • bionemo-recipes/recipes/codonfm_native_te/README.md
  • bionemo-recipes/recipes/codonfm_native_te/checkpoint.py
  • bionemo-recipes/recipes/codonfm_native_te/dataset.py
  • bionemo-recipes/recipes/codonfm_native_te/distributed_config.py
  • bionemo-recipes/recipes/codonfm_native_te/fp4_debugging_stats.yaml
  • bionemo-recipes/recipes/codonfm_native_te/fp8_debugging_stats.yaml
  • bionemo-recipes/recipes/codonfm_native_te/hydra_config/L0_sanity.yaml
  • bionemo-recipes/recipes/codonfm_native_te/hydra_config/defaults.yaml
  • bionemo-recipes/recipes/codonfm_native_te/modeling_codonfm_te.py
  • bionemo-recipes/recipes/codonfm_native_te/perf_logger.py
  • bionemo-recipes/recipes/codonfm_native_te/quantization.py
  • bionemo-recipes/recipes/codonfm_native_te/requirements.txt
  • bionemo-recipes/recipes/codonfm_native_te/scheduler.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/__init__.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/conftest.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/test_model.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/test_quantization.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/test_tokenizer.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/test_train.py
  • bionemo-recipes/recipes/codonfm_native_te/tokenizer.py
  • bionemo-recipes/recipes/codonfm_native_te/train_fsdp2.py
  • ci/scripts/check_copied_files.py

Comment thread bionemo-recipes/models/codonfm/dataset.py
Comment thread bionemo-recipes/models/codonfm/tokenizer.py
Comment thread bionemo-recipes/recipes/codonfm_native_te/checkpoint.py
Comment thread bionemo-recipes/recipes/codonfm_native_te/fp8_debugging_stats.yaml
Comment thread bionemo-recipes/recipes/codonfm_native_te/modeling_codonfm_te.py
Comment thread bionemo-recipes/recipes/codonfm_native_te/modeling_codonfm_te.py
Comment thread bionemo-recipes/recipes/codonfm_native_te/train_fsdp2.py
@jomitchellnv
Collaborator Author

@coderabbitai re-review

@coderabbitai
Contributor

coderabbitai bot commented Mar 23, 2026

@jomitchellnv Sure! I'll kick off a full re-review of the PR now.

✅ Actions performed

Full review triggered.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

♻️ Duplicate comments (10)
bionemo-recipes/recipes/codonfm_native_te/distributed_config.py (1)

36-40: ⚠️ Potential issue | 🟠 Major

Replace setdefault() with getenv() to prevent environment mutation.

Using os.environ.setdefault() in field default factories writes to the global process environment on every DistributedConfig() instantiation, which can leak defaults into child processes and break test isolation.

🛠️ Proposed fix
-    rank: int = field(default_factory=lambda: int(os.environ.setdefault("RANK", "0")))
-    local_rank: int = field(default_factory=lambda: int(os.environ.setdefault("LOCAL_RANK", "0")))
-    world_size: int = field(default_factory=lambda: int(os.environ.setdefault("WORLD_SIZE", "1")))
-    _master_addr: str = field(default_factory=lambda: os.environ.setdefault("MASTER_ADDR", "localhost"))
-    _master_port: str = field(default_factory=lambda: os.environ.setdefault("MASTER_PORT", "12355"))
+    rank: int = field(default_factory=lambda: int(os.getenv("RANK", "0")))
+    local_rank: int = field(default_factory=lambda: int(os.getenv("LOCAL_RANK", "0")))
+    world_size: int = field(default_factory=lambda: int(os.getenv("WORLD_SIZE", "1")))
+    _master_addr: str = field(default_factory=lambda: os.getenv("MASTER_ADDR", "localhost"))
+    _master_port: str = field(default_factory=lambda: os.getenv("MASTER_PORT", "12355"))


🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/distributed_config.py` around lines
36 - 40, The field default factories in DistributedConfig (rank, local_rank,
world_size, _master_addr, _master_port) currently call
os.environ.setdefault(...) which mutates the global environment; change each
lambda to use os.getenv(...) (or os.environ.get(...)) instead so defaults are
read without writing to os.environ, e.g. replace the default_factory lambdas for
rank, local_rank, world_size, _master_addr and _master_port to call os.getenv/
.get with the same default strings.
bionemo-recipes/models/codonfm/requirements.txt (1)

1-3: ⚠️ Potential issue | 🔴 Critical

Fix invalid transformer_engine>=1.14.0 constraint.

Version 1.14.0 of transformer_engine does not exist on PyPI. The version progression goes from 1.13.0 directly to 2.x. Given the FP8/FP4 quantization features used in this codebase, update to >=2.1.0.

🛠️ Proposed fix
 torch>=2.6.0
-transformer_engine>=1.14.0
+transformer_engine>=2.1.0
 transformers>=4.48.0


🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/requirements.txt` around lines 1 - 3, Update
the invalid dependency constraint for transformer_engine in requirements.txt:
replace the non-existent "transformer_engine>=1.14.0" with
"transformer_engine>=2.1.0" so the project uses a PyPI-available release that
includes the FP8/FP4 quantization features referenced by the code; ensure the
line in requirements.txt containing transformer_engine is edited accordingly and
keep the existing torch and transformers lines unchanged.
bionemo-recipes/recipes/codonfm_native_te/checkpoint.py (1)

196-197: ⚠️ Potential issue | 🟡 Minor

Race condition risk when pruning checkpoints.

prune_checkpoints is called only on the main process, but other ranks may still be writing to checkpoint directories. The barrier at line 187 synchronizes before saving, but there's no barrier after saving completes. For async saves, this is especially risky since the save may not be complete when pruning runs.

Consider adding a barrier after the save completes (for sync saves) or documenting that async_save and max_checkpoints should not be used together.

,

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/checkpoint.py` around lines 196 -
197, Race condition: pruning (prune_checkpoints) runs only on the main rank
(dist_config.is_main_process()) while other ranks may still be saving; add a
synchronization point after all ranks finish saving or disallow async saves with
max_checkpoints. Specifically, after the checkpoint save completes (the code
path that calls save_checkpoint / async_save), insert a distributed barrier (use
the existing dist_config barrier primitive) so all ranks have finished writing
before the main rank calls prune_checkpoints(ckpt_path, max_checkpoints);
alternatively, if async_save is used, validate and raise or log an error when
max_checkpoints is set to prevent pruning during in-progress async saves and
document this constraint in the function that handles checkpoint saving and
pruning.
bionemo-recipes/models/codonfm/tests/conftest.py (2)

39-45: ⚠️ Potential issue | 🟡 Minor

Bind the probe socket to loopback only.

This helper only needs a local rendezvous port. s.bind(("", 0)) binds on every interface and is the reason CodeQL flagged this line; binding to 127.0.0.1 keeps the fixture local without changing its contract.

🔒 Minimal change
-        s.bind(("", 0))
+        s.bind(("127.0.0.1", 0))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/tests/conftest.py` around lines 39 - 45, The
probe socket in unused_tcp_port currently binds to all interfaces (s.bind(("",
0))) which is flagged by CodeQL; change the bind call in the unused_tcp_port
function to bind only to the loopback interface (e.g., s.bind(("127.0.0.1", 0)))
so the fixture remains local while keeping the same behavior of returning an
unused TCP port.

48-58: ⚠️ Potential issue | 🟡 Minor

The orig restore does not undo the reload.

importlib.reload() re-executes the existing module object in place. Here orig and transformer_engine.pytorch are the same object, so reassigning sys.modules["transformer_engine.pytorch"] = orig after yield leaves the debug-loaded module in place. Restore the env vars first and reload again in teardown.

♻️ Minimal cleanup
 @pytest.fixture(autouse=True)
-def use_te_debug(monkeypatch):
-    monkeypatch.setenv("NVTE_DEBUG", "1")
-    monkeypatch.setenv("NVTE_DEBUG_LEVEL", "2")
-    orig = sys.modules.get("transformer_engine.pytorch")
-    importlib.reload(transformer_engine.pytorch)
-    yield
-    if orig is not None:
-        sys.modules["transformer_engine.pytorch"] = orig
-    else:
-        importlib.reload(transformer_engine.pytorch)
+def use_te_debug():
+    prev_debug = os.environ.get("NVTE_DEBUG")
+    prev_level = os.environ.get("NVTE_DEBUG_LEVEL")
+    os.environ["NVTE_DEBUG"] = "1"
+    os.environ["NVTE_DEBUG_LEVEL"] = "2"
+    importlib.reload(transformer_engine.pytorch)
+    try:
+        yield
+    finally:
+        if prev_debug is None:
+            os.environ.pop("NVTE_DEBUG", None)
+        else:
+            os.environ["NVTE_DEBUG"] = prev_debug
+        if prev_level is None:
+            os.environ.pop("NVTE_DEBUG_LEVEL", None)
+        else:
+            os.environ["NVTE_DEBUG_LEVEL"] = prev_level
+        importlib.reload(transformer_engine.pytorch)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/tests/conftest.py` around lines 48 - 58, The
teardown must restore the original environment variables before reloading the
transformer_engine.pytorch module because orig is a reference to the same module
object and assigning it back doesn't undo the reload; change the fixture
use_te_debug to capture the original NVTE_DEBUG and NVTE_DEBUG_LEVEL (or their
absence) and the orig module reference, set the env vars and call
importlib.reload(transformer_engine.pytorch) to enable debug, then in the
teardown restore the env vars to their previous values (using monkeypatch.setenv
or monkeypatch.delenv as appropriate) and finally call
importlib.reload(transformer_engine.pytorch) (if orig was present) or delete
sys.modules["transformer_engine.pytorch"] (if orig was None) instead of
reassigning sys.modules to orig.
bionemo-recipes/recipes/codonfm_native_te/requirements.txt (1)

1-13: ⚠️ Potential issue | 🟠 Major

Add version floors for the core ML stack.

transformers is listed now, but the recipe's remaining dependencies are still completely unbounded. This code relies on newer Torch/TransformerEngine/Transformers APIs, so a fresh install can resolve to package combinations this recipe does not support. Please mirror the minimum versions from bionemo-recipes/models/codonfm/requirements.txt for at least torch, transformer_engine[pytorch], transformers, pandas, and pyarrow.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/requirements.txt` around lines 1 -
13, The requirements.txt has unpinned core ML packages; update it to include
minimum version floors for torch, transformer_engine[pytorch], transformers,
pandas, and pyarrow to match the floors used in
bionemo-recipes/models/codonfm/requirements.txt so the recipe uses compatible
APIs; locate the package entries (torch, transformer_engine[pytorch],
transformers, pandas, pyarrow) in
bionemo-recipes/recipes/codonfm_native_te/requirements.txt and replace them with
the same ">=<version>" specifiers found in
bionemo-recipes/models/codonfm/requirements.txt.
bionemo-recipes/recipes/codonfm_native_te/hydra_config/defaults.yaml (1)

42-44: ⚠️ Potential issue | 🟠 Major

Keep the scheduler horizon tied to num_train_steps.

The training loop exits on args.num_train_steps, so overriding only the top-level step count leaves the LR schedule on a 500k-step curve. Interpolate lr_scheduler_kwargs.num_training_steps from the top-level value instead of hard-coding it.

Suggested fix
 lr_scheduler_kwargs:
   num_warmup_steps: 2_000
-  num_training_steps: 500_000
+  num_training_steps: ${num_train_steps}

As per coding guidelines, **/bionemo-recipes/recipes/**/*.yaml: Use Hydra configs for recipe configuration in bionemo-recipes/recipes/.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/hydra_config/defaults.yaml` around
lines 42 - 44, The LR scheduler horizon is hard-coded to 500_000; change
lr_scheduler_kwargs.num_training_steps to interpolate from the top-level
training steps so it tracks the real exit condition (e.g., replace the literal
500_000 with a Hydra interpolation referencing the top-level num_train_steps),
keeping num_warmup_steps as-is; update the lr_scheduler_kwargs block (look for
lr_scheduler_kwargs, num_warmup_steps, num_training_steps) to use the
interpolated value instead of the hard-coded number.
bionemo-recipes/recipes/codonfm_native_te/modeling_codonfm_te.py (1)

322-325: ⚠️ Potential issue | 🟠 Major

Build RoPE from the active batch length, not the config max.

In BSHD this over-allocates every forward; in THD it becomes wrong once the packed token count exceeds max_position_embeddings, because later packed tokens never get matching rotary positions. Derive max_seq_len from the current input and land the fix in bionemo-recipes/models/codonfm/modeling_codonfm_te.py before syncing this copy.

Suggested fix
         with torch.autocast(device_type="cuda", enabled=False):
-            te_rope_emb = self.rotary_embeddings(max_seq_len=self.config.max_position_embeddings)
+            if self.config.attn_input_format == "bshd":
+                max_seq_len = hidden_states.shape[1]
+            else:
+                max_seq_len = int(kwargs["cu_seq_lens_q"][-1])
+            te_rope_emb = self.rotary_embeddings(max_seq_len=max_seq_len)
             te_rope_emb = te_rope_emb.to(hidden_states.device, non_blocking=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/modeling_codonfm_te.py` around
lines 322 - 325, The rotary positional embedding is being built using the config
max_position_embeddings which over-allocates and breaks packed sequences;
instead compute max_seq_len from the active input length (e.g., seq_len =
hidden_states.size(1) or hidden_states.shape[1]) and pass that into
self.rotary_embeddings (replace the existing
self.rotary_embeddings(max_seq_len=self.config.max_position_embeddings) call),
then keep the .to(hidden_states.device, non_blocking=True) call; update the
torch.autocast block around that code in the method where hidden_states is
available (the block containing self.rotary_embeddings and te_rope_emb) so RoPE
is sized to the current batch.
bionemo-recipes/recipes/codonfm_native_te/train_fsdp2.py (1)

224-231: ⚠️ Potential issue | 🟠 Major

Final model export is missing tokenizer metadata.

save_final_model_fsdp2() writes model.safetensors and config.json, but CodonTokenizer.from_pretrained() requires tokenizer_config.json. The resulting final_model directory won't be self-contained for inference.

🐛 Suggested fix to include tokenizer
         # Save final model
         if args.checkpoint.save_final_model and ckpt_path:
+            tokenizer = CodonTokenizer()
             save_final_model_fsdp2(
                 model=model,
                 config=config,
                 save_directory=ckpt_path / "final_model",
                 dist_config=dist_config,
             )
+            if dist_config.is_main_process():
+                tokenizer.save_pretrained(ckpt_path / "final_model")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/train_fsdp2.py` around lines 224 -
231, The final model export written by save_final_model_fsdp2() only emits
model.safetensors and config.json, but CodonTokenizer.from_pretrained() expects
tokenizer files (e.g., tokenizer_config.json and vocab/merges or
tokenizer.json); update save_final_model_fsdp2 to also load the tokenizer used
during training and write its artifacts into the save_directory (use the
tokenizer.save_pretrained or explicitly write tokenizer_config.json and any
vocab files) so the final_model directory is self-contained and can be loaded by
CodonTokenizer.from_pretrained; reference save_final_model_fsdp2,
CodonTokenizer.from_pretrained, and the save_directory/ckpt_path "final_model"
target when implementing this change.
bionemo-recipes/recipes/codonfm_native_te/dataset.py (1)

66-79: ⚠️ Potential issue | 🟠 Major

pq.read_table() is still eager here.

Switching away from pandas removes one copy, but pq.read_table(path, columns=["sequence"], memory_map=True) still materializes the whole sequence column in __init__. Every rank will preload the full dataset before training starts, so the original host-memory scaling problem is still present. A lazy ParquetFile / dataset scanner is needed to actually fix that.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/dataset.py` around lines 66 - 79,
The constructor currently eagerly materializes the entire column via
pq.read_table; change __init__ to open a pyarrow.parquet.ParquetFile (store as
self._pqfile) and precompute metadata like total rows (self._num_rows) and
row-group row counts so you can map an index to a row-group and local offset;
update __len__ to return self._num_rows; change __getitem__ to lazily read only
the needed row (use self._pqfile.read_row_group(row_group, columns=["sequence"])
or read with slice/batch for that row-group and extract the single sequence)
instead of using self._table/_sequences, so no full-column materialization
occurs and memory stays constant per process.
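The index mapping the lazy reader needs can be isolated as pure arithmetic over the per-row-group row counts that pyarrow.parquet.ParquetFile exposes through its metadata. This sketch shows only that mapping, with the counts supplied directly; the function names are illustrative:

```python
import bisect
import itertools


def build_row_group_index(row_group_sizes):
    """Cumulative row counts, so a global row index can be binary-searched."""
    return list(itertools.accumulate(row_group_sizes))


def locate_row(cumulative, index):
    """Map a global row index to (row_group, offset_within_group).

    With a pyarrow ParquetFile this pairs with
    pf.read_row_group(row_group, columns=["sequence"]) so __getitem__ reads
    one row group lazily instead of materializing the whole column.
    """
    if index < 0 or index >= cumulative[-1]:
        raise IndexError(index)
    group = bisect.bisect_right(cumulative, index)
    offset = index - (cumulative[group - 1] if group > 0 else 0)
    return group, offset
```

A small per-group LRU cache on top of read_row_group would keep sequential access fast while memory stays bounded by a few row groups per process.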
🧹 Nitpick comments (13)
bionemo-recipes/recipes/codonfm_native_te/checkpoint.py (1)

189-191: Potential None assignment when async_save=False.

dcp_save (synchronous) returns None, while dcp_async_save returns a future. Assigning the result unconditionally to _ckpt_futures["fsdp2"] means subsequent code checking for a pending future will find None (which is fine), but the pattern is confusing and could lead to bugs if the check logic changes.

♻️ Clearer approach
-    ckpt_save_func = dcp_async_save if async_save else dcp_save
-    _ckpt_futures["fsdp2"] = ckpt_save_func(state_dict, checkpoint_id=checkpoint_path, process_group=process_group)
+    if async_save:
+        _ckpt_futures["fsdp2"] = dcp_async_save(state_dict, checkpoint_id=checkpoint_path, process_group=process_group)
+    else:
+        dcp_save(state_dict, checkpoint_id=checkpoint_path, process_group=process_group)
+        _ckpt_futures["fsdp2"] = None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/checkpoint.py` around lines 189 -
191, The current assignment stores the return of dcp_save or dcp_async_save into
_ckpt_futures["fsdp2"], but dcp_save is synchronous and returns None which makes
the stored value ambiguous; change the logic so that when async_save is True you
assign the future returned by dcp_async_save to _ckpt_futures["fsdp2"], and when
async_save is False you either remove any existing _ckpt_futures["fsdp2"] key or
explicitly set it to a sentinel (e.g., False or not present) to avoid storing
None; update the block that calls dcp_save/dcp_async_save (refer to the
variables async_save, dcp_save, dcp_async_save, _ckpt_futures["fsdp2"], and
checkpoint_path) to use this explicit branch so later checks for a pending
future are unambiguous.
bionemo-recipes/recipes/codonfm_native_te/tokenizer.py (1)

86-91: Minor: Remove redundant length check in tokenize().

The if len(codon) == 3 check is redundant: the loop range(0, len(sequence) - 2, 3) stops at i <= len(sequence) - 3, so the slice sequence[i:i+3] always yields exactly 3 characters.

♻️ Proposed simplification
     def tokenize(self, sequence: str) -> list[str]:
         sequence = sequence.upper()
-        tokens = []
-        for i in range(0, len(sequence) - 2, 3):
-            codon = sequence[i : i + 3]
-            if len(codon) == 3:
-                tokens.append(codon)
-        return tokens
+        return [sequence[i : i + 3] for i in range(0, len(sequence) - 2, 3)]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/tokenizer.py` around lines 86 - 91,
The redundant length check in tokenize() should be removed: inside the loop that
iterates with range(0, len(sequence) - 2, 3) simply append the sliced codon to
tokens without the if len(codon) == 3 guard; update the body of tokenize() to
directly do tokens.append(codon) for each codon (references: function tokenize,
local variables tokens and codon, and the for loop using range(0, len(sequence)
- 2, 3)).
bionemo-recipes/recipes/codonfm_native_te/scheduler.py (1)

19-24: Add type hint for optimizer parameter.

The optimizer parameter is missing a type annotation, which would improve IDE support and type checking consistency.

♻️ Proposed fix
+from torch.optim import Optimizer
+
 def get_linear_schedule_with_warmup(
-    optimizer,
+    optimizer: Optimizer,
     num_warmup_steps=2_000,
     num_training_steps=500_000,
     last_epoch=-1,
-):
+) -> LambdaLR:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/scheduler.py` around lines 19 - 24,
The get_linear_schedule_with_warmup function is missing a type annotation for
the optimizer parameter; update its signature to annotate optimizer as
torch.optim.Optimizer (or Optimizer if you add "from torch.optim import
Optimizer" at the top) so IDEs and type checkers recognize it, e.g. def
get_linear_schedule_with_warmup(optimizer: torch.optim.Optimizer, ...), and add
the corresponding import if not already present.
bionemo-recipes/recipes/codonfm_native_te/perf_logger.py (1)

36-37: Hardcoded PAD_TOKEN_ID may diverge from tokenizer.

The constant PAD_TOKEN_ID = 3 is hardcoded here but should match CodonTokenizer.pad_token_id. If the tokenizer's special token ordering ever changes, this would silently break.

♻️ Import from tokenizer or document the coupling
+# PAD_TOKEN_ID must match CodonTokenizer().pad_token_id
 PAD_TOKEN_ID = 3

Or better, pass the pad token ID as a parameter to PerfLogger.__init__.
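A minimal sketch of that parameter-injection variant (names simplified; the real PerfLogger carries more state than shown here):

```python
class PerfLogger:
    """Stand-in sketch: accept the pad id instead of hardcoding it."""

    def __init__(self, pad_token_id: int):
        # The caller passes tokenizer.pad_token_id, so the logger can
        # never drift from the tokenizer's special-token ordering.
        self.pad_token_id = pad_token_id

    def count_padding(self, token_ids: list[int]) -> int:
        """Count pad tokens in a batch row, e.g. for padding-ratio metrics."""
        return sum(1 for t in token_ids if t == self.pad_token_id)


# Construction site would read: PerfLogger(pad_token_id=tokenizer.pad_token_id)
logger = PerfLogger(pad_token_id=3)
print(logger.count_padding([7, 3, 3, 12, 3]))  # 3
```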

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/perf_logger.py` around lines 36 -
37, PAD_TOKEN_ID is hardcoded (PAD_TOKEN_ID = 3) and can drift from the
tokenizer; update PerfLogger to accept the pad token id from the tokenizer
instead of a hardcoded constant: remove or stop using PAD_TOKEN_ID, add a
pad_token_id parameter to PerfLogger.__init__ (or read
CodonTokenizer.pad_token_id where PerfLogger is constructed) and propagate that
value into any methods that reference the pad id so the logger always uses
CodonTokenizer.pad_token_id rather than a literal 3.
bionemo-recipes/models/codonfm/tokenizer.py (2)

147-164: from_pretrained doesn't restore codons list.

The from_pretrained method restores encoder/decoder but leaves self.codons with the default generated codons. While this doesn't break functionality (since encoding/decoding use encoder/decoder), it could cause subtle bugs if code later iterates over self.codons.

♻️ Restore codons from encoder
     @classmethod
     def from_pretrained(cls, load_directory: str) -> "CodonTokenizer":
         with open(os.path.join(load_directory, TOKENIZER_CONFIG_FILE)) as f:
             config = json.load(f)

         tokenizer = cls(seq_type=config["seq_type"])
         # Restore the exact encoder/decoder from the saved config to guarantee round-trip fidelity.
         tokenizer.encoder = {tok: int(i) for tok, i in config["encoder"].items()}
         tokenizer.decoder = {i: tok for tok, i in tokenizer.encoder.items()}
+        # Restore codons list from encoder (tokens not in special tokens)
+        special_set = set(tokenizer.SPECIAL_TOKENS)
+        tokenizer.codons = [
+            tok for tok, _ in sorted(tokenizer.encoder.items(), key=lambda x: x[1])
+            if tok not in special_set
+        ]
         return tokenizer
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/tokenizer.py` around lines 147 - 164, The
from_pretrained method restores encoder/decoder but leaves self.codons as the
default; update from_pretrained (in class CodonTokenizer) to reconstruct
tokenizer.codons from the saved encoder mapping (config["encoder"]) by ordering
tokens by their integer id (i.e., create a list where index == token id), then
assign that list to tokenizer.codons before returning; keep the existing
restoration of tokenizer.encoder and tokenizer.decoder.

114-130: Decode uses class constant instead of instance state for special token IDs.

The decode method computes special_ids from self.SPECIAL_TOKENS (class constant) but lookups use self.encoder (instance state). If from_pretrained ever loads a tokenizer with different special tokens, this could cause inconsistencies. However, since from_pretrained currently only restores the encoder without modifying SPECIAL_TOKENS, this works correctly.

♻️ Consider storing special_tokens as instance attribute for future-proofing
 class CodonTokenizer:
     SPECIAL_TOKENS: list[str] = ["<CLS>", "<SEP>", "<UNK>", "<PAD>", "<MASK>"]

     def __init__(self, seq_type="dna"):
         ...
+        self.special_tokens = list(self.SPECIAL_TOKENS)
         all_tokens = self.SPECIAL_TOKENS + self.codons
         ...

     def decode(self, ids: list[int], skip_special_tokens: bool = True) -> str:
-        special_ids = {self.encoder[tok] for tok in self.SPECIAL_TOKENS if tok in self.encoder}
+        special_ids = {self.encoder[tok] for tok in self.special_tokens if tok in self.encoder}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/tokenizer.py` around lines 114 - 130, The
decode method computes special_ids from the class constant SPECIAL_TOKENS which
can diverge from instance state; change decode to use an instance attribute
(e.g., self.special_tokens) instead of self.SPECIAL_TOKENS, compute special_ids
from self.special_tokens and self.encoder, and ensure the tokenizer loader
(from_pretrained) restores/sets self.special_tokens when deserializing so
special token lookups remain consistent between encoder/decoder and special
token configuration.
bionemo-recipes/models/codonfm/dataset.py (1)

263-267: Collator seed is not passed from dataloader factory.

The seed parameter is passed to SyntheticCodonDataset and DistributedSampler but not to the collators. This means all collators use the default seed 42, which could cause identical masking patterns across different dataloader instances.

♻️ Pass seed to collators
     collator = CodonMLMCollator(
         tokenizer=tokenizer,
         max_seq_length=max_seq_length,
         mlm_probability=mlm_probability,
+        seed=seed,
     )

Also applies to: 318-322

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/dataset.py` around lines 263 - 267, The
collator is being instantiated without the dataset/dataloader seed, causing all
collators to use the default seed; update the CodonMLMCollator instantiations to
accept and use the same seed value passed to
SyntheticCodonDataset/DistributedSampler by adding the seed argument when
constructing collator (e.g., change CodonMLMCollator(tokenizer=...,
max_seq_length=..., mlm_probability=...) to include seed=seed), and make the
analogous update for the second collator creation later in the file so each
dataloader instance receives the intended seed.
bionemo-recipes/recipes/codonfm_native_te/quantization.py (2)

191-199: Validation allows empty list [] to silently disable quantization.

The validation at lines 192-199 only checks non-empty lists for invalid entries. An empty list [] passes validation but results in no layers being quantized. This is intentional per the test test_fp8_enabled_empty_list, but the docstring doesn't document this behavior clearly.

The past review suggested rejecting empty lists, but the current implementation treats [] as "explicitly no layers" which is a valid use case. Consider documenting this behavior.

📝 Document the empty list behavior in the docstring
     Args:
         num_layers: Total number of transformer layers in the model.
         fp8_enabled: Whether FP8 quantization is enabled.
         fp4_enabled: Whether FP4 quantization is enabled.
-        fp8_layers: 1-indexed list of layers for FP8, or None if not specified.
-        fp4_layers: 1-indexed list of layers for FP4, or None if not specified.
+        fp8_layers: 1-indexed list of layers for FP8. None means "use default"
+            (all layers when only FP8 is enabled, remaining layers otherwise).
+            An empty list ``[]`` explicitly means "no FP8 layers".
+        fp4_layers: 1-indexed list of layers for FP4. Same semantics as fp8_layers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/quantization.py` around lines 191 -
199, Update the docstring for the quantization configuration (near the
fp8_layers/fp4_layers validation in quantization.py) to explicitly state that
passing an empty list ([]) for fp8_layers or fp4_layers is treated as
"explicitly no layers" and is allowed (it disables quantization for that dtype),
while non-empty lists must contain integers in 1..num_layers; reference the
existing test test_fp8_enabled_empty_list for expected behavior. Ensure the
docstring mentions the accepted types (None, list of ints, or empty list) and
the semantic difference between None (use defaults) and [] (disable
quantization).

101-109: Temp file is not cleaned up after use.

tempfile.NamedTemporaryFile(delete=False) creates a file that persists until explicitly deleted. The persistence is intentional (the path is returned for later use), but cleanup responsibility should be documented or handled by the caller.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/quantization.py` around lines 101 -
109, The temp file created with tempfile.NamedTemporaryFile(delete=False)
(temp_file / temp_file.name returned by this function) is left on disk; either
change the function to not persist the file (write to a temporary file with
delete=True or return the YAML string instead of returning temp_file.name) or
explicitly document/implement cleanup responsibility: add a clear docstring note
that the caller must unlink temp_file.name (or add an optional parameter like
cleanup=False that when True deletes the file before returning or registers an
atexit.unlink for temp_file.name). Ensure references to temp_file,
temp_file.name, yaml.dump and the return value are updated accordingly.
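A stdlib-only sketch of the atexit option (plain key=value lines stand in for the recipe's yaml.dump so the example has no dependencies):

```python
import atexit
import os
import tempfile


def write_temp_config(config: dict) -> str:
    """Persist a config to a temp file and register cleanup at interpreter exit.

    Sketch only: the recipe writes YAML; key=value lines are used here
    so the example stays dependency-free.
    """
    with tempfile.NamedTemporaryFile(mode="w", suffix=".cfg", delete=False) as temp_file:
        for key, value in config.items():
            temp_file.write(f"{key}={value}\n")
        path = temp_file.name
    # delete=False keeps the file alive for later readers, but removal is
    # still guaranteed when the process exits.
    atexit.register(lambda: os.path.exists(path) and os.unlink(path))
    return path


path = write_temp_config({"fp8": True, "num_layers": 4})
print(os.path.exists(path))  # True: file persists until interpreter exit
```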
bionemo-recipes/models/codonfm/tests/generate_golden_values.py (1)

36-49: Consider documenting the path setup rationale.

The sys.path manipulation is necessary to import both PTL and native TE modules, but a brief comment explaining why this is needed would improve maintainability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/tests/generate_golden_values.py` around lines
36 - 49, Add a concise explanatory comment above the path setup block describing
why we modify sys.path and set CODON_FM_TE_IMPL: explain that we need to prepend
PTL_TE_DIR, PTL_TE_DIR/src, and NATIVE_TE_DIR to sys.path so tests can import
the PTL-based recipe model and the native codonfm model side-by-side, and note
that CODON_FM_TE_IMPL="nonexact" forces the non-exact TETransformerLayer
implementation for deterministic golden-value generation; place this comment
adjacent to the existing REPO_ROOT/PTL_TE_DIR/NATIVE_TE_DIR and os.environ lines
so future readers understand the intent.
bionemo-recipes/models/codonfm/tests/common/fixtures.py (2)

117-134: Environment variables not restored on fixture teardown in error cases.

If a test using te_attn_backend raises an exception before yielding, the cleanup code after yield won't run, leaving env vars and the backend flag in an inconsistent state.

♻️ Use try/finally for cleanup
 @pytest.fixture(params=["flash_attn", "fused_attn"])
 def te_attn_backend(request):
     """Fixture to parametrize the attention implementation."""
-    if request.param == "flash_attn":
-        os.environ["NVTE_FUSED_ATTN"] = "0"
-        os.environ["NVTE_FLASH_ATTN"] = "1"
-        _attention_backends["backend_selection_requires_update"] = True
-
-    else:
-        os.environ["NVTE_FUSED_ATTN"] = "1"
-        os.environ["NVTE_FLASH_ATTN"] = "0"
+    try:
+        if request.param == "flash_attn":
+            os.environ["NVTE_FUSED_ATTN"] = "0"
+            os.environ["NVTE_FLASH_ATTN"] = "1"
+        else:
+            os.environ["NVTE_FUSED_ATTN"] = "1"
+            os.environ["NVTE_FLASH_ATTN"] = "0"
         _attention_backends["backend_selection_requires_update"] = True
 
-    yield request.param
-
-    os.environ.pop("NVTE_FUSED_ATTN", None)
-    os.environ.pop("NVTE_FLASH_ATTN", None)
-    _attention_backends["backend_selection_requires_update"] = True
+        yield request.param
+    finally:
+        os.environ.pop("NVTE_FUSED_ATTN", None)
+        os.environ.pop("NVTE_FLASH_ATTN", None)
+        _attention_backends["backend_selection_requires_update"] = True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/tests/common/fixtures.py` around lines 117 -
134, The te_attn_backend fixture sets NVTE_FUSED_ATTN/NVTE_FLASH_ATTN and flips
_attention_backends["backend_selection_requires_update"] but if an exception
occurs before the yield the post-yield cleanup won't run; wrap the setup/yield
in a try/finally (or register a finalizer via request.addfinalizer) so you first
capture original values of os.environ.get("NVTE_FUSED_ATTN") and
os.environ.get("NVTE_FLASH_ATTN"), set the desired values based on request.param
inside the try, yield request.param, and in the finally restore the original env
values (pop if None) and reset
_attention_backends["backend_selection_requires_update"] = True to ensure
consistent state even on errors in te_attn_backend.

40-43: Socket binds to all interfaces.

The empty string "" binds the socket to all network interfaces. While this is acceptable for finding an ephemeral port in a test environment, CodeQL flagged it as a potential security concern.

♻️ Consider binding to localhost only
 def unused_tcp_port() -> int:
     with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
-        s.bind(("", 0))
+        s.bind(("127.0.0.1", 0))
         s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
         return s.getsockname()[1]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/tests/common/fixtures.py` around lines 40 -
43, The test helper currently binds the ephemeral socket to all interfaces via
s.bind(("", 0)); change the bind call to bind only to localhost (e.g.,
s.bind(("127.0.0.1", 0)) or ("localhost", 0)) while keeping the surrounding
socket.socket(...), s.setsockopt(...), and return s.getsockname()[1] logic
intact so the function still discovers an ephemeral port but only on the
loopback interface.
bionemo-recipes/models/codonfm/tests/common/test_modeling_common.py (1)

185-189: Broaden the input_data type contract for THD metadata.

This base class annotates get_test_input_data() and compare_outputs() as Dict[str, torch.Tensor], but the current THD paths already rely on non-tensor entries like max_length_q, max_length_k, and pad_between_seqs. Keeping the narrower type hides errors from Pyright and forces downstream workarounds. A union or TypedDict for the mixed tensor/int/bool payload would be more accurate.

🛠️ Proposed fix
-    ) -> Dict[str, torch.Tensor]:
+    ) -> Dict[str, torch.Tensor | int | bool]:
@@
-        input_data: Dict[str, torch.Tensor],
+        input_data: Dict[str, torch.Tensor | int | bool],

As per coding guidelines, **/*.py: Use Pyright for type checking.

Also applies to: 381-381

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/tests/common/test_modeling_common.py` around
lines 185 - 189, The type annotation for get_test_input_data and compare_outputs
is too narrow (Dict[str, torch.Tensor]) because THD paths include non-tensor
metadata (e.g., max_length_q, max_length_k, pad_between_seqs); update the type
contract to reflect a mixed payload by introducing a TypedDict or a Union type
(e.g., InputDataTypedDict with torch.Tensor fields and int/bool fields) and
replace occurrences of Dict[str, torch.Tensor] on get_test_input_data and
compare_outputs (and any other mentions like input_data in tests) with the new
type so Pyright correctly recognizes the int/bool metadata used by THD code
paths.
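A torch-free sketch of the TypedDict option (field names beyond max_length_q, max_length_k, and pad_between_seqs are illustrative; the real class would annotate tensor-valued fields with torch.Tensor):

```python
from typing import TypedDict

# Placeholder so the sketch imports nothing heavy; swap in torch.Tensor
# for the tensor-valued fields in the actual test module.
Tensor = object


class THDInputData(TypedDict, total=False):
    """Mixed payload: tensors plus THD metadata ints/bools."""

    input_ids: Tensor
    attention_mask: Tensor
    max_length_q: int
    max_length_k: int
    pad_between_seqs: bool


# At runtime a TypedDict value is a plain dict; Pyright checks field types.
data: THDInputData = {"max_length_q": 128, "max_length_k": 128, "pad_between_seqs": False}
print(sorted(data))  # ['max_length_k', 'max_length_q', 'pad_between_seqs']
```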
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bionemo-recipes/models/codonfm/modeling_codonfm_te.py`:
- Around line 109-125: Add upfront validation for self.attn_input_format in the
model initializer: ensure its value is one of the supported options (e.g.,
"bshd" or "thd") and raise a clear ValueError if not. Locate the validation
block near the existing checks (the same area that validates self.hidden_size,
self.hidden_act, and self.layer_precision) inside the ModelingCodonfmTe
initializer and add a check that inspects self.attn_input_format and raises a
descriptive error referencing the invalid value.
- Around line 447-466: The value parameter is being initialized with w_gain
derived from is_qk, which wrongly makes value share Q/K's gain; change the
initialization so query and key weights still use w_gain (as computed from
is_qk) but module.value_weight uses the scaled gain variable (gain) instead, and
ensure module.value_bias is zeroed when present; locate the block iterating
self.named_modules() (checking isinstance(module, (nn.Linear,
transformer_engine.pytorch.Linear))) and update the module.value_weight and
module.value_bias initialization to use gain rather than w_gain while leaving
query_weight and key_weight logic unchanged.
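For the attn_input_format item above, the upfront guard could look like this (standalone sketch; in the model it would sit in the initializer next to the existing hidden_size/hidden_act/layer_precision checks and read self.attn_input_format):

```python
SUPPORTED_ATTN_INPUT_FORMATS = ("bshd", "thd")


def validate_attn_input_format(attn_input_format: str) -> str:
    """Fail fast at init time instead of deep inside a TE layer call."""
    if attn_input_format not in SUPPORTED_ATTN_INPUT_FORMATS:
        raise ValueError(
            f"attn_input_format must be one of {SUPPORTED_ATTN_INPUT_FORMATS}, "
            f"got {attn_input_format!r}"
        )
    return attn_input_format


print(validate_attn_input_format("thd"))  # valid values pass through: thd
```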

In `@bionemo-recipes/models/codonfm/tests/common/fixtures.py`:
- Around line 28-30: The fixture sets NVTE_DEBUG after Transformer Engine
modules are imported (recipe_module, fp8, _attention_backends), so move the env
var setup to run before those imports: either set NVTE_DEBUG via
monkeypatch.setenv() and then import/reload transformer_engine modules inside
the fixture (use importlib.reload on recipe_module, fp8 module, and
transformer_engine.pytorch.attention.dot_product_attention) or delay the
original top-level imports by moving them into the fixture after the env is set;
apply the identical fix to the original copied test fixture so both files are in
sync.

In `@bionemo-recipes/models/codonfm/tests/common/test_modeling_common.py`:
- Around line 398-439: The guards using hasattr(te_outputs,
"...")/hasattr(hf_outputs, "...") are wrong because dataclass fields exist but
may be None; replace them with explicit None checks: e.g., for loss use checks
like compare_loss and getattr(te_outputs, "loss", None) is not None and
getattr(hf_outputs, "loss", None) is not None; for logits ensure
input_data["attention_mask"] handling still only indexes when te_outputs.logits
and hf_outputs.logits are not None; and for hidden states replace the hasattr
guards with te_outputs.hidden_states is not None and hf_outputs.hidden_states is
not None before calling len(...) or iterating in the loop to avoid len(None) and
assert_close(None, ...).

In `@bionemo-recipes/models/codonfm/tests/test_modeling_codonfm_te.py`:
- Line 287: The test is loading the golden checkpoint with
model.load_state_dict(golden_sd, strict=False) which hides missing/unexpected
keys; change each call (the one shown and the other two occurrences around the
file) to capture the return value like missing, unexpected =
model.load_state_dict(golden_sd, strict=False) and add assertions that missing
== [] and unexpected == [] (or use assert not missing and assert not unexpected)
so the regression test fails when the checkpoint doesn't fully match the model
architecture; apply this change to the call in test_modeling_codonfm_te (the
shown model.load_state_dict invocation) and the two other similar calls in the
file.

In `@bionemo-recipes/recipes/codonfm_native_te/checkpoint.py`:
- Line 27: The import in checkpoint.py should use a relative import to ensure
the package-local module is resolved; replace the bare "from distributed_config
import DistributedConfig" with a package-relative import (e.g. "from
.distributed_config import DistributedConfig") so the DistributedConfig symbol
is imported from the local module rather than relying on sys.path.

In `@bionemo-recipes/recipes/codonfm_native_te/dataset.py`:
- Around line 118-122: The truncation currently calls tokenizer.encode(...,
add_special_tokens=True) then slices ids which can drop the terminal SEP;
instead call tokenizer.encode(..., add_special_tokens=False) or otherwise obtain
the raw codon token ids, truncate that raw list to self.max_seq_length -
N_reserved (reserve slots for CLS/SEP), then append the special tokens (CLS and
SEP) explicitly so SEP is preserved; update the loop that builds ids in the
batch processing (the for sample in batch block using tokenizer.encode and ids =
ids[: self.max_seq_length] and seq_len = len(ids)) and make the identical change
in the second collator loop (the other block similar to lines 191-194) so both
collators truncate raw codon IDs first and then add CLS/SEP.
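The truncate-then-append order for that item can be sketched with hypothetical special-token ids (real values come from CodonTokenizer):

```python
# Hypothetical ids for illustration; the real values come from the
# tokenizer's encoder (e.g. tokenizer.encoder["<CLS>"] / ["<SEP>"]).
CLS_ID, SEP_ID = 0, 1


def truncate_then_add_specials(raw_ids: list[int], max_seq_length: int) -> list[int]:
    """Truncate the raw codon ids first, then append the specials, so the
    terminal SEP is never sliced off by the length cap."""
    n_reserved = 2  # one slot each for CLS and SEP
    body = raw_ids[: max_seq_length - n_reserved]
    return [CLS_ID] + body + [SEP_ID]


ids = truncate_then_add_specials(list(range(10, 30)), max_seq_length=8)
print(ids)  # [0, 10, 11, 12, 13, 14, 15, 1]
```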
- Around line 85-103: The collators use a hardcoded Random(42) and the
dataloader factories don't forward the seed, causing identical masking across
workers; update create_bshd_dataloader() and create_thd_dataloader() to accept
and pass the caller seed into CodonMLMCollator and CodonTHDCollator
constructors, remove reliance on a single class-level rng (e.g., the
Random(seed) stored in __init__), and inside CodonMLMCollator and
CodonTHDCollator derive a worker-local RNG for masking using
torch.utils.data.get_worker_info().seed (or torch.initial_seed() if
get_worker_info() is None) to seed a per-worker random.Random/torch.Generator so
each worker produces independent masking streams and worker respawns don't
replay the same masks.

In `@bionemo-recipes/recipes/codonfm_native_te/README.md`:
- Around line 49-66: The fenced repository-tree code block (the triple-backtick
block showing the codonfm_native_te directory tree) is missing a language tag
and triggers MD040; update the opening fence to use a language tag by changing
``` to ```text so the tree is treated as plain text in the README's
codonfm_native_te section.

In `@bionemo-recipes/recipes/codonfm_native_te/tests/conftest.py`:
- Around line 37-42: The device_mesh fixture currently constructs
DistributedConfig() which may default MASTER_PORT and collide; before calling
DistributedConfig() pick an ephemeral free loopback port (e.g., bind a temporary
socket to 127.0.0.1:0, get the assigned port, close the socket) and set
os.environ["MASTER_PORT"] to that port string, then instantiate
DistributedConfig(), create the torch.device and call
torch.distributed.init_process_group as before; update the device_mesh fixture
to perform this pre-seed so each test session uses a unique rendezvous port.
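The port pre-seed described above, as a stdlib sketch:

```python
import os
import socket


def reserve_master_port() -> str:
    """Ask the OS for a free loopback port and export it as MASTER_PORT.

    DistributedConfig() would then pick up MASTER_PORT instead of a fixed
    default. The port could in principle be reclaimed before the rendezvous
    binds it, which is the usual accepted race for test setups.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 = ephemeral, chosen by the kernel
        port = s.getsockname()[1]
    os.environ["MASTER_PORT"] = str(port)
    return os.environ["MASTER_PORT"]


port = reserve_master_port()
print(port)
```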

In `@bionemo-recipes/recipes/codonfm_native_te/tests/test_train.py`:
- Around line 49-56: The test composes the L0_sanity config but leaves
checkpoint.ckpt_dir null, risking reuse of stale checkpoints; update the
overrides passed to compose (the overrides list in the block that calls compose
for L0_sanity) to include "+checkpoint.ckpt_dir={tmp_path}" so the checkpoint
directory is explicitly set to the temporary path (consistent with other sanity
variants) and ensures test isolation when resume_from_checkpoint is true.
- Around line 81-94: The FP8 test test_sanity_convergence_fsdp2_fp8 currently
lacks a proper GPU capability guard; replace the imprecise major-version check
with the existing helper transformer_engine.pytorch.fp8.check_fp8_support() to
skip the test when FP8 isn't available (following the pattern used in
esm2_native_te/tests/test_train.py), i.e., import check_fp8_support and apply it
as a pytest skip/marker guard around test_sanity_convergence_fsdp2_fp8 so the
test runs only when check_fp8_support() returns True.

---

Duplicate comments:
In `@bionemo-recipes/models/codonfm/requirements.txt`:
- Around line 1-3: Update the invalid dependency constraint for
transformer_engine in requirements.txt: replace the non-existent
"transformer_engine>=1.14.0" with "transformer_engine>=2.1.0" so the project
uses a PyPI-available release that includes the FP8/FP4 quantization features
referenced by the code; ensure the line in requirements.txt containing
transformer_engine is edited accordingly and keep the existing torch and
transformers lines unchanged.

In `@bionemo-recipes/models/codonfm/tests/conftest.py`:
- Around line 39-45: The probe socket in unused_tcp_port currently binds to all
interfaces (s.bind(("", 0))) which is flagged by CodeQL; change the bind call in
the unused_tcp_port function to bind only to the loopback interface (e.g.,
s.bind(("127.0.0.1", 0))) so the fixture remains local while keeping the same
behavior of returning an unused TCP port.
- Around line 48-58: The teardown must restore the original environment
variables before reloading the transformer_engine.pytorch module because orig is
a reference to the same module object and assigning it back doesn't undo the
reload; change the fixture use_te_debug to capture the original NVTE_DEBUG and
NVTE_DEBUG_LEVEL (or their absence) and the orig module reference, set the env
vars and call importlib.reload(transformer_engine.pytorch) to enable debug, then
in the teardown restore the env vars to their previous values (using
monkeypatch.setenv or monkeypatch.delenv as appropriate) and finally call
importlib.reload(transformer_engine.pytorch) (if orig was present) or delete
sys.modules["transformer_engine.pytorch"] (if orig was None) instead of
reassigning sys.modules to orig.

In `@bionemo-recipes/recipes/codonfm_native_te/checkpoint.py`:
- Around line 196-197: Race condition: pruning (prune_checkpoints) runs only on
the main rank (dist_config.is_main_process()) while other ranks may still be
saving; add a synchronization point after all ranks finish saving or disallow
async saves with max_checkpoints. Specifically, after the checkpoint save
completes (the code path that calls save_checkpoint / async_save), insert a
distributed barrier (use the existing dist_config barrier primitive) so all
ranks have finished writing before the main rank calls
prune_checkpoints(ckpt_path, max_checkpoints); alternatively, if async_save is
used, validate and raise or log an error when max_checkpoints is set to prevent
pruning during in-progress async saves and document this constraint in the
function that handles checkpoint saving and pruning.

In `@bionemo-recipes/recipes/codonfm_native_te/dataset.py`:
- Around line 66-79: The constructor currently eagerly materializes the entire
column via pq.read_table; change __init__ to open a pyarrow.parquet.ParquetFile
(store as self._pqfile) and precompute metadata like total rows (self._num_rows)
and row-group row counts so you can map an index to a row-group and local
offset; update __len__ to return self._num_rows; change __getitem__ to lazily
read only the needed row (use self._pqfile.read_row_group(row_group,
columns=["sequence"]) or read with slice/batch for that row-group and extract
the single sequence) instead of using self._table/_sequences, so no full-column
materialization occurs and memory stays constant per process.
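The index-to-row-group mapping at the heart of that change is plain arithmetic over the per-row-group counts (which __init__ would read once from ParquetFile.metadata via metadata.row_group(i).num_rows):

```python
import itertools


def locate_row(index: int, row_group_counts: list[int]) -> tuple[int, int]:
    """Map a flat dataset index to (row_group, local_offset).

    __getitem__ would then call self._pqfile.read_row_group(row_group,
    columns=["sequence"]) and index local_offset, so only one row-group
    is materialized per access.
    """
    offsets = list(itertools.accumulate(row_group_counts))
    for row_group, end in enumerate(offsets):
        if index < end:
            start = end - row_group_counts[row_group]
            return row_group, index - start
    raise IndexError(f"index {index} out of range for {offsets[-1]} rows")


print(locate_row(0, [100, 100, 50]))    # (0, 0)
print(locate_row(149, [100, 100, 50]))  # (1, 49)
print(locate_row(249, [100, 100, 50]))  # (2, 49)
```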

In `@bionemo-recipes/recipes/codonfm_native_te/distributed_config.py`:
- Around line 36-40: The field default factories in DistributedConfig (rank,
local_rank, world_size, _master_addr, _master_port) currently call
os.environ.setdefault(...) which mutates the global environment; change each
lambda to use os.getenv(...) (or os.environ.get(...)) instead so defaults are
read without writing to os.environ, e.g. replace the default_factory lambdas for
rank, local_rank, world_size, _master_addr and _master_port to call os.getenv/
.get with the same default strings.
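The read-only default factories can be sketched as follows (RANK/WORLD_SIZE shown; the same pattern applies to the other fields):

```python
import os
from dataclasses import dataclass, field


@dataclass
class DistributedConfig:
    """Sketch: read env defaults without mutating os.environ.

    os.environ.setdefault writes the fallback back into the process
    environment; os.getenv only reads, leaving os.environ untouched.
    """

    rank: int = field(default_factory=lambda: int(os.getenv("RANK", "0")))
    world_size: int = field(default_factory=lambda: int(os.getenv("WORLD_SIZE", "1")))


cfg = DistributedConfig()
print(cfg.rank, cfg.world_size)  # 0 1 when the env vars are unset
```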

In `@bionemo-recipes/recipes/codonfm_native_te/hydra_config/defaults.yaml`:
- Around line 42-44: The LR scheduler horizon is hard-coded to 500_000; change
lr_scheduler_kwargs.num_training_steps to interpolate from the top-level
training steps so it tracks the real exit condition (e.g., replace the literal
500_000 with a Hydra interpolation referencing the top-level num_train_steps),
keeping num_warmup_steps as-is; update the lr_scheduler_kwargs block (look for
lr_scheduler_kwargs, num_warmup_steps, num_training_steps) to use the
interpolated value instead of the hard-coded number.

In `@bionemo-recipes/recipes/codonfm_native_te/modeling_codonfm_te.py`:
- Around line 322-325: The rotary positional embedding is being built using the
config max_position_embeddings which over-allocates and breaks packed sequences;
instead compute max_seq_len from the active input length (e.g., seq_len =
hidden_states.size(1) or hidden_states.shape[1]) and pass that into
self.rotary_embeddings (replace the existing
self.rotary_embeddings(max_seq_len=self.config.max_position_embeddings) call),
then keep the .to(hidden_states.device, non_blocking=True) call; update the
torch.autocast block around that code in the method where hidden_states is
available (the block containing self.rotary_embeddings and te_rope_emb) so RoPE
is sized to the current batch.

In `@bionemo-recipes/recipes/codonfm_native_te/requirements.txt`:
- Around line 1-13: The requirements.txt has unpinned core ML packages; update
it to include minimum version floors for torch, transformer_engine[pytorch],
transformers, pandas, and pyarrow to match the floors used in
bionemo-recipes/models/codonfm/requirements.txt so the recipe uses compatible
APIs; locate the package entries (torch, transformer_engine[pytorch],
transformers, pandas, pyarrow) in
bionemo-recipes/recipes/codonfm_native_te/requirements.txt and replace them with
the same ">=<version>" specifiers found in
bionemo-recipes/models/codonfm/requirements.txt.

In `@bionemo-recipes/recipes/codonfm_native_te/train_fsdp2.py`:
- Around line 224-231: The final model export written by
save_final_model_fsdp2() only emits model.safetensors and config.json, but
CodonTokenizer.from_pretrained() expects tokenizer files (e.g.,
tokenizer_config.json and vocab/merges or tokenizer.json); update
save_final_model_fsdp2 to also load the tokenizer used during training and write
its artifacts into the save_directory (use the tokenizer.save_pretrained or
explicitly write tokenizer_config.json and any vocab files) so the final_model
directory is self-contained and can be loaded by CodonTokenizer.from_pretrained;
reference save_final_model_fsdp2, CodonTokenizer.from_pretrained, and the
save_directory/ckpt_path "final_model" target when implementing this change.

---

Nitpick comments:
In `@bionemo-recipes/models/codonfm/dataset.py`:
- Around line 263-267: The collator is being instantiated without the
dataset/dataloader seed, causing all collators to use the default seed; update
the CodonMLMCollator instantiations to accept and use the same seed value passed
to SyntheticCodonDataset/DistributedSampler by adding the seed argument when
constructing the collator (e.g., change CodonMLMCollator(tokenizer=...,
max_seq_length=..., mlm_probability=...) to include seed=seed), and make the
analogous update for the second collator creation later in the file so each
dataloader instance receives the intended seed.
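
A sketch of the seed threading, with a hypothetical dataclass standing in for the recipe's CodonMLMCollator:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class CodonMLMCollator:
    """Hypothetical mirror of the recipe collator's signature."""
    tokenizer: Any
    max_seq_length: int = 512
    mlm_probability: float = 0.15
    seed: int = 42  # previously never passed, so every collator used this default

def build_collator(tokenizer, seed: int) -> CodonMLMCollator:
    # Thread the same seed given to SyntheticCodonDataset / DistributedSampler
    # through to the collator so masking is reproducible per dataloader.
    return CodonMLMCollator(
        tokenizer=tokenizer, max_seq_length=512, mlm_probability=0.15, seed=seed
    )
```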

In `@bionemo-recipes/models/codonfm/tests/common/fixtures.py`:
- Around line 117-134: The te_attn_backend fixture sets
NVTE_FUSED_ATTN/NVTE_FLASH_ATTN and flips
_attention_backends["backend_selection_requires_update"] but if an exception
occurs before the yield the post-yield cleanup won't run; wrap the setup/yield
in a try/finally (or register a finalizer via request.addfinalizer) so you first
capture original values of os.environ.get("NVTE_FUSED_ATTN") and
os.environ.get("NVTE_FLASH_ATTN"), set the desired values based on request.param
inside the try, yield request.param, and in the finally restore the original env
values (pop if None) and reset
_attention_backends["backend_selection_requires_update"] = True to ensure
consistent state even on errors in te_attn_backend.
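
The same pattern as a standalone context manager (the fixture wraps this in pytest's yield machinery; the `_attention_backends` flag reset is left as a comment since it is TE-internal):

```python
import os
from contextlib import contextmanager

@contextmanager
def te_attn_backend_env(backend: str):
    """Set NVTE_* backend flags, restoring originals even if the body raises."""
    keys = ("NVTE_FUSED_ATTN", "NVTE_FLASH_ATTN")
    saved = {k: os.environ.get(k) for k in keys}
    try:
        os.environ["NVTE_FUSED_ATTN"] = "1" if backend == "FusedAttention" else "0"
        os.environ["NVTE_FLASH_ATTN"] = "1" if backend == "FlashAttention" else "0"
        yield backend
    finally:
        for key, value in saved.items():
            if value is None:
                os.environ.pop(key, None)  # key was unset before; unset it again
            else:
                os.environ[key] = value
        # In the real fixture, also reset
        # _attention_backends["backend_selection_requires_update"] = True here.
```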
- Around line 40-43: The test helper currently binds the ephemeral socket to all
interfaces via s.bind(("", 0)); change the bind call to bind only to localhost
(e.g., s.bind(("127.0.0.1", 0)) or ("localhost", 0)) while keeping the
surrounding socket.socket(...), s.setsockopt(...), and return s.getsockname()[1]
logic intact so the function still discovers an ephemeral port but only on the
loopback interface.
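
The loopback-only helper, sketched in full:

```python
import socket

def find_free_port() -> int:
    """Discover an ephemeral port bound to loopback only, not all interfaces."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("127.0.0.1", 0))  # was ("", 0): avoid exposing the socket externally
        return s.getsockname()[1]
```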

In `@bionemo-recipes/models/codonfm/tests/common/test_modeling_common.py`:
- Around line 185-189: The type annotation for get_test_input_data and
compare_outputs is too narrow (Dict[str, torch.Tensor]) because THD paths
include non-tensor metadata (e.g., max_length_q, max_length_k,
pad_between_seqs); update the type contract to reflect a mixed payload by
introducing a TypedDict or a Union type (e.g., InputDataTypedDict with
torch.Tensor fields and int/bool fields) and replace occurrences of Dict[str,
torch.Tensor] on get_test_input_data and compare_outputs (and any other mentions
like input_data in tests) with the new type so Pyright correctly recognizes the
int/bool metadata used by THD code paths.
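
One way the mixed contract could look, with a hypothetical TypedDict name (torch is referenced only via string forward references so the sketch stays import-light):

```python
from typing import TypedDict

class InputData(TypedDict, total=False):
    """Mixed BSHD/THD test payload.

    Replaces the too-narrow Dict[str, torch.Tensor]: THD paths also carry
    plain int/bool metadata alongside the tensors.
    """
    input_ids: "torch.Tensor"        # noqa: F821 (torch imported in the real module)
    attention_mask: "torch.Tensor"   # noqa: F821
    max_length_q: int
    max_length_k: int
    pad_between_seqs: bool

def get_test_input_data() -> InputData:
    # Sketch: static checkers now accept the int/bool THD metadata.
    return {"max_length_q": 128, "max_length_k": 128, "pad_between_seqs": False}
```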

In `@bionemo-recipes/models/codonfm/tests/generate_golden_values.py`:
- Around line 36-49: Add a concise explanatory comment above the path setup
block describing why we modify sys.path and set CODON_FM_TE_IMPL: explain that
we need to prepend PTL_TE_DIR, PTL_TE_DIR/src, and NATIVE_TE_DIR to sys.path so
tests can import the PTL-based recipe model and the native codonfm model
side-by-side, and note that CODON_FM_TE_IMPL="nonexact" forces the non-exact
TETransformerLayer implementation for deterministic golden-value generation;
place this comment adjacent to the existing REPO_ROOT/PTL_TE_DIR/NATIVE_TE_DIR
and os.environ lines so future readers understand the intent.

In `@bionemo-recipes/models/codonfm/tokenizer.py`:
- Around line 147-164: The from_pretrained method restores encoder/decoder but
leaves self.codons as the default; update from_pretrained (in class
CodonTokenizer) to reconstruct tokenizer.codons from the saved encoder mapping
(config["encoder"]) by ordering tokens by their integer id (i.e., create a list
where index == token id), then assign that list to tokenizer.codons before
returning; keep the existing restoration of tokenizer.encoder and
tokenizer.decoder.
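
A sketch of the reconstruction, assuming `encoder` is the saved token-to-id dict:

```python
def codons_from_encoder(encoder: dict) -> list:
    """Rebuild the ordered token list so that index == token id.

    Keeps decode-by-index consistent with the restored encoder/decoder
    instead of leaving self.codons at its default.
    """
    return [token for token, _ in sorted(encoder.items(), key=lambda kv: kv[1])]
```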
- Around line 114-130: The decode method computes special_ids from the class
constant SPECIAL_TOKENS, which can diverge from instance state; change decode to
use an instance attribute (e.g., self.special_tokens) instead of
self.SPECIAL_TOKENS, compute special_ids from self.special_tokens and
self.encoder, and ensure the tokenizer loader (from_pretrained) restores/sets
self.special_tokens when deserializing so special token lookups remain
consistent between encoder/decoder and special token configuration.

In `@bionemo-recipes/recipes/codonfm_native_te/checkpoint.py`:
- Around line 189-191: The current assignment stores the return of dcp_save or
dcp_async_save into _ckpt_futures["fsdp2"], but dcp_save is synchronous and
returns None, which makes the stored value ambiguous; change the logic so that
when async_save is True you assign the future returned by dcp_async_save to
_ckpt_futures["fsdp2"], and when async_save is False you either remove any
existing _ckpt_futures["fsdp2"] key or explicitly set it to a sentinel (e.g.,
False or not present) to avoid storing None; update the block that calls
dcp_save/dcp_async_save (refer to the variables async_save, dcp_save,
dcp_async_save, _ckpt_futures["fsdp2"], and checkpoint_path) to use this
explicit branch so later checks for a pending future are unambiguous.

In `@bionemo-recipes/recipes/codonfm_native_te/perf_logger.py`:
- Around line 36-37: PAD_TOKEN_ID is hardcoded (PAD_TOKEN_ID = 3) and can drift
from the tokenizer; update PerfLogger to accept the pad token id from the
tokenizer instead of a hardcoded constant: remove or stop using PAD_TOKEN_ID,
add a pad_token_id parameter to PerfLogger.__init__ (or read
CodonTokenizer.pad_token_id where PerfLogger is constructed) and propagate that
value into any methods that reference the pad id so the logger always uses
CodonTokenizer.pad_token_id rather than a literal 3.
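
A sketch of the constructor change (method names below are illustrative, not the logger's real API):

```python
class PerfLogger:
    """Take the pad id from the tokenizer at construction, not a literal 3."""

    def __init__(self, pad_token_id: int):
        self.pad_token_id = pad_token_id  # e.g. CodonTokenizer.pad_token_id

    def count_real_tokens(self, token_ids: list) -> int:
        # Any method that previously compared against PAD_TOKEN_ID now
        # uses the injected value.
        return sum(1 for t in token_ids if t != self.pad_token_id)
```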

In `@bionemo-recipes/recipes/codonfm_native_te/quantization.py`:
- Around line 191-199: Update the docstring for the quantization configuration
(near the fp8_layers/fp4_layers validation in quantization.py) to explicitly
state that passing an empty list ([]) for fp8_layers or fp4_layers is treated as
"explicitly no layers" and is allowed (it disables quantization for that dtype),
while non-empty lists must contain integers in 1..num_layers; reference the
existing test test_fp8_enabled_empty_list for expected behavior. Ensure the
docstring mentions the accepted types (None, list of ints, or empty list) and
the semantic difference between None (use defaults) and [] (disable
quantization).
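
The None-vs-[] semantics the docstring should spell out, expressed as a hypothetical validator:

```python
def validate_quant_layers(layers, num_layers: int):
    """Validate an fp8_layers / fp4_layers setting.

    None  -> use defaults; []    -> explicitly disable quantization for
    that dtype (allowed); non-empty list -> every entry must be an int
    in 1..num_layers.
    """
    if layers is None or layers == []:
        return layers
    bad = [x for x in layers if not isinstance(x, int) or not 1 <= x <= num_layers]
    if bad:
        raise ValueError(f"layer indices must be ints in 1..{num_layers}, got {bad}")
    return layers
```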
- Around line 101-109: The temp file created with
tempfile.NamedTemporaryFile(delete=False) (temp_file / temp_file.name returned
by this function) is left on disk; either change the function to not persist the
file (write to a temporary file with delete=True or return the YAML string
instead of returning temp_file.name) or explicitly document/implement cleanup
responsibility: add a clear docstring note that the caller must unlink
temp_file.name (or add an optional parameter like cleanup=False that when True
deletes the file before returning or registers an atexit.unlink for
temp_file.name). Ensure references to temp_file, temp_file.name, yaml.dump and
the return value are updated accordingly.
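
One cleanup option sketched below, registering an atexit unlink; json stands in for the recipe's yaml.dump to keep the sketch stdlib-only:

```python
import atexit
import json
import os
import tempfile

def write_debug_stats_config(config: dict) -> str:
    """Persist a debug-stats config to a temp file, unlinked at interpreter exit.

    The atexit hook answers the "who deletes temp_file.name?" question;
    callers may still unlink earlier if they are done with the file.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as temp_file:
        json.dump(config, temp_file)
        path = temp_file.name
    atexit.register(lambda: os.path.exists(path) and os.unlink(path))
    return path
```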

In `@bionemo-recipes/recipes/codonfm_native_te/scheduler.py`:
- Around line 19-24: The get_linear_schedule_with_warmup function is missing a
type annotation for the optimizer parameter; update its signature to annotate
optimizer as torch.optim.Optimizer (or Optimizer if you add "from torch.optim
import Optimizer" at the top) so IDEs and type checkers recognize it, e.g. def
get_linear_schedule_with_warmup(optimizer: torch.optim.Optimizer, ...), and add
the corresponding import if not already present.

In `@bionemo-recipes/recipes/codonfm_native_te/tokenizer.py`:
- Around line 86-91: The redundant length check in tokenize() should be removed:
inside the loop that iterates with range(0, len(sequence) - 2, 3) simply append
the sliced codon to tokens without the if len(codon) == 3 guard; update the body
of tokenize() to directly do tokens.append(codon) for each codon (references:
function tokenize, local variables tokens and codon, and the for loop using
range(0, len(sequence) - 2, 3)).
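
The simplified loop; the step-3 range already guarantees each slice is a full codon, so the guard adds nothing:

```python
def tokenize(sequence: str) -> list:
    """Split a nucleotide string into 3-mer codons, dropping any trailing partial."""
    tokens = []
    for i in range(0, len(sequence) - 2, 3):
        tokens.append(sequence[i : i + 3])  # always exactly three characters
    return tokens
```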

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 36a2c98d-c4b7-4c5a-a41c-dd19dde3fdce

📥 Commits

Reviewing files that changed from the base of the PR and between f437be0 and 2c6df67.

⛔ Files ignored due to path filters (1)
  • bionemo-recipes/recipes/codonfm_native_te/train.parquet is excluded by !**/*.parquet
📒 Files selected for processing (40)
  • .gitignore
  • bionemo-recipes/models/codonfm/dataset.py
  • bionemo-recipes/models/codonfm/modeling_codonfm_te.py
  • bionemo-recipes/models/codonfm/requirements.txt
  • bionemo-recipes/models/codonfm/tests/common/README.md
  • bionemo-recipes/models/codonfm/tests/common/__init__.py
  • bionemo-recipes/models/codonfm/tests/common/fixtures.py
  • bionemo-recipes/models/codonfm/tests/common/test_modeling_common.py
  • bionemo-recipes/models/codonfm/tests/conftest.py
  • bionemo-recipes/models/codonfm/tests/generate_golden_values.py
  • bionemo-recipes/models/codonfm/tests/golden_state_dict.safetensors
  • bionemo-recipes/models/codonfm/tests/golden_values.json
  • bionemo-recipes/models/codonfm/tests/test_modeling_codonfm_te.py
  • bionemo-recipes/models/codonfm/tokenizer.py
  • bionemo-recipes/models/esm2/tests/common/test_modeling_common.py
  • bionemo-recipes/models/llama3/tests/common/test_modeling_common.py
  • bionemo-recipes/models/mixtral/tests/common/test_modeling_common.py
  • bionemo-recipes/models/qwen/tests/common/test_modeling_common.py
  • bionemo-recipes/recipes/codonfm_native_te/README.md
  • bionemo-recipes/recipes/codonfm_native_te/checkpoint.py
  • bionemo-recipes/recipes/codonfm_native_te/dataset.py
  • bionemo-recipes/recipes/codonfm_native_te/distributed_config.py
  • bionemo-recipes/recipes/codonfm_native_te/fp4_debugging_stats.yaml
  • bionemo-recipes/recipes/codonfm_native_te/fp8_debugging_stats.yaml
  • bionemo-recipes/recipes/codonfm_native_te/hydra_config/L0_sanity.yaml
  • bionemo-recipes/recipes/codonfm_native_te/hydra_config/defaults.yaml
  • bionemo-recipes/recipes/codonfm_native_te/modeling_codonfm_te.py
  • bionemo-recipes/recipes/codonfm_native_te/perf_logger.py
  • bionemo-recipes/recipes/codonfm_native_te/quantization.py
  • bionemo-recipes/recipes/codonfm_native_te/requirements.txt
  • bionemo-recipes/recipes/codonfm_native_te/scheduler.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/__init__.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/conftest.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/test_model.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/test_quantization.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/test_tokenizer.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/test_train.py
  • bionemo-recipes/recipes/codonfm_native_te/tokenizer.py
  • bionemo-recipes/recipes/codonfm_native_te/train_fsdp2.py
  • ci/scripts/check_copied_files.py

Comment thread bionemo-recipes/models/codonfm/modeling_codonfm_te.py
Comment thread bionemo-recipes/models/codonfm/modeling_codonfm_te.py
Comment thread bionemo-recipes/models/codonfm/tests/common/fixtures.py
Comment thread bionemo-recipes/models/codonfm/tests/common/test_modeling_common.py
Comment thread bionemo-recipes/models/codonfm/tests/test_modeling_codonfm_te.py
Comment thread bionemo-recipes/recipes/codonfm_native_te/dataset.py
Comment thread bionemo-recipes/recipes/codonfm_native_te/README.md
Comment thread bionemo-recipes/recipes/codonfm_native_te/tests/conftest.py
Comment thread bionemo-recipes/recipes/codonfm_native_te/tests/test_train.py
Comment thread bionemo-recipes/recipes/codonfm_native_te/tests/test_train.py
@jomitchellnv
Collaborator Author

@coderabbitai resolve

@jomitchellnv
Collaborator Author

@coderabbitai re-review

@coderabbitai
Contributor

coderabbitai bot commented Mar 23, 2026

@jomitchellnv Sure! I'll kick off a full re-review of the PR now.

✅ Actions performed

Full review triggered.

@coderabbitai
Contributor

coderabbitai bot commented Mar 23, 2026

✅ Actions performed

Comments resolved. Auto-approval is disabled; enable reviews.request_changes_workflow to approve automatically.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (21)
bionemo-recipes/models/codonfm/requirements.txt (1)

2-2: ⚠️ Potential issue | 🔴 Critical

Use a published Transformer Engine minimum.

transformer_engine>=1.14.0 is not satisfiable from PyPI: the published release history goes from 1.13.0 to 2.1.0, so there is no 1.14.0 release for pip to install. Pick a released floor on the API line you actually support (>=1.13.0 for the last 1.x line or >=2.1.0 for 2.x), otherwise clean installs fail before the CodonFM model or recipe tests even start. (pypi.org)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/requirements.txt` at line 2, The requirements
entry "transformer_engine>=1.14.0" in requirements.txt is invalid because PyPI
has no 1.14.0 release; update that line to a published minimum you support (for
example "transformer_engine>=1.13.0" if you want the last 1.x line or
"transformer_engine>=2.1.0" for 2.x) so installs succeed; locate the package
line "transformer_engine>=1.14.0" in requirements.txt and replace it with the
chosen published floor.
bionemo-recipes/recipes/codonfm_native_te/distributed_config.py (1)

26-40: ⚠️ Potential issue | 🟠 Major

Stop mutating os.environ during config construction.

These default factories write fallback values back into the process environment instead of just reading launcher-provided state. os.environ is a live environment mapping, and dict.setdefault() inserts the default when the key is missing, so constructing DistributedConfig() has global side effects here. Use os.getenv() for these defaults, and update the wording at Line 28 so the docstring says the class falls back to single-process defaults rather than setting them. (docs.python.org)

🛠️ Suggested change
 `@dataclass`(frozen=True)
 class DistributedConfig:
     """Class to track distributed ranks and handle basic distributed training setup.
 
-    If torch distributed environment variables are not set, we set them to default values for single-process training.
+    If torch distributed environment variables are not set, we fall back to single-process defaults.
 
     Attributes:
         rank: The rank of the process.
         local_rank: The local rank of the process.
         world_size: The total number of processes.
     """
 
-    rank: int = field(default_factory=lambda: int(os.environ.setdefault("RANK", "0")))
-    local_rank: int = field(default_factory=lambda: int(os.environ.setdefault("LOCAL_RANK", "0")))
-    world_size: int = field(default_factory=lambda: int(os.environ.setdefault("WORLD_SIZE", "1")))
-    _master_addr: str = field(default_factory=lambda: os.environ.setdefault("MASTER_ADDR", "localhost"))
-    _master_port: str = field(default_factory=lambda: os.environ.setdefault("MASTER_PORT", "12355"))
+    rank: int = field(default_factory=lambda: int(os.getenv("RANK", "0")))
+    local_rank: int = field(default_factory=lambda: int(os.getenv("LOCAL_RANK", "0")))
+    world_size: int = field(default_factory=lambda: int(os.getenv("WORLD_SIZE", "1")))
+    _master_addr: str = field(default_factory=lambda: os.getenv("MASTER_ADDR", "localhost"))
+    _master_port: str = field(default_factory=lambda: os.getenv("MASTER_PORT", "12355"))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/distributed_config.py` around lines
26 - 40, The dataclass currently mutates os.environ via dict.setdefault in the
default_factories for rank, local_rank, world_size, _master_addr, and
_master_port; change those default_factory lambdas to use os.getenv(...,
default) so they only read environment values and do not write them, and update
the class docstring sentence that says the class "sets" torch distributed env
vars to instead state it "falls back to" single-process defaults (refer to
DistributedConfig and the fields rank, local_rank, world_size, _master_addr,
_master_port).
bionemo-recipes/recipes/codonfm_native_te/tests/conftest.py (1)

37-44: ⚠️ Potential issue | 🟡 Minor

Pick a free MASTER_PORT to avoid test collisions.

DistributedConfig defaults to MASTER_PORT=12355. Concurrent pytest sessions on the same host can collide on the same NCCL rendezvous port, causing nondeterministic failures.

🛠️ Suggested fix using ephemeral port
+import os
+import socket
 import sys
 from pathlib import Path
...

 `@pytest.fixture`(scope="session", autouse=True)
 def device_mesh():
     """Create a re-usable device mesh for testing."""
+    if "MASTER_PORT" not in os.environ:
+        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
+            s.bind(("127.0.0.1", 0))
+            os.environ["MASTER_PORT"] = str(s.getsockname()[1])
     dist_config = DistributedConfig()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/tests/conftest.py` around lines 37
- 44, The fixture device_mesh can collide across concurrent test runs because
DistributedConfig defaults MASTER_PORT to 12355; reserve an ephemeral TCP port
and set MASTER_PORT before creating DistributedConfig / calling
torch.distributed.init_process_group so each test session uses a unique port.
Specifically, in the device_mesh fixture obtain a free port (bind to port 0 on
localhost to get the OS-assigned port), set os.environ["MASTER_PORT"] to that
port string before instantiating DistributedConfig or before calling
torch.distributed.init_process_group, then proceed with creating the device,
setting CUDA device, and calling init_device_mesh("cuda", mesh_shape=(1,),
mesh_dim_names=("dp",)); ensure the socket is closed after reading the assigned
port so the port is available for the process.
bionemo-recipes/models/codonfm/tests/common/__init__.py (1)

32-42: ⚠️ Potential issue | 🟡 Minor

Docstring example references ESM2 instead of CodonFM.

The example code mentions ESM2ModelTester and NVEsmForMaskedLM which are ESM2-specific. Since this is a copied file for CodonFM, consider updating the example to reference CodonFM classes for clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/tests/common/__init__.py` around lines 32 -
42, Update the docstring example to reference CodonFM-specific classes instead
of ESM2; replace the ESM2 identifiers (ESM2ModelTester, NVEsmForMaskedLM) with
the CodonFM equivalents used in this codebase (e.g., a CodonFMModelTester
subclass of BaseModelTest and the model class name used for CodonFM), ensuring
the example imports (from tests.common import BaseModelTest, TestTolerances) and
method get_model_class remain consistent with the CodonFM test helpers.
bionemo-recipes/recipes/codonfm_native_te/README.md (1)

49-66: ⚠️ Potential issue | 🟡 Minor

Add language tag to fenced code block.

The directory tree block uses bare triple backticks, which triggers MD040 (fenced-code-language) in pre-commit checks.

🔧 Add `text` language tag
-```
+```text
 codonfm_native_te/
 ├── modeling_codonfm_te.py    — HF-compatible CodonFM model with TE layers
 ...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/README.md` around lines 49 - 66,
The fenced code block containing the directory tree in the README (the block
that starts with the triple backticks showing codonfm_native_te/ and its files)
is missing a language tag and triggers MD040; update that opening fence from ```
to ```text so the block becomes a text-coded fenced block (i.e., add the "text"
language tag to the directory-tree code fence in README.md).
bionemo-recipes/recipes/codonfm_native_te/tests/test_train.py (2)

47-59: ⚠️ Potential issue | 🟡 Minor

Add checkpoint directory override for test isolation.

This test disables meta device but doesn't set checkpoint.ckpt_dir, unlike other tests. With resume_from_checkpoint: true in defaults, this could resume from a stale checkpoint instead of testing a fresh non-meta-device initialization.

🔧 Suggested fix
     with initialize_config_dir(config_dir=str(recipe_path / "hydra_config"), version_base="1.2"):
         sanity_config = compose(
             config_name="L0_sanity",
             overrides=[
                 f"+wandb_init_args.dir={tmp_path}",
+                f"checkpoint.ckpt_dir={tmp_path}",
                 "use_meta_device=false",
             ],
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/tests/test_train.py` around lines
47 - 59, The test test_sanity_convergence_fsdp2_no_meta_device can accidentally
resume from an existing checkpoint because it doesn't override
checkpoint.ckpt_dir while defaults include resume_from_checkpoint=true; update
the compose overrides in the test (the compose call inside
test_sanity_convergence_fsdp2_no_meta_device) to set a per-test checkpoint
directory (e.g. add an override that points checkpoint.ckpt_dir to tmp_path or a
tmp subdir) so the run is isolated and cannot pick up stale checkpoints.

81-94: ⚠️ Potential issue | 🟠 Major

Add GPU capability guard for FP8 test.

FP8 requires compute capability 8.9+ (Ada Lovelace and newer). Without a guard, this test will fail on unsupported hardware instead of being skipped.

🔧 Suggested fix using TransformerEngine's check_fp8_support
+from transformer_engine.pytorch.fp8 import check_fp8_support
+
+requires_fp8 = pytest.mark.skipif(
+    not torch.cuda.is_available() or not check_fp8_support()[0],
+    reason="FP8 requires compute capability 8.9+",
+)
+
+@requires_fp8
 def test_sanity_convergence_fsdp2_fp8(tmp_path, recipe_path):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/tests/test_train.py` around lines
81 - 94, Add a GPU capability guard to the test_sanity_convergence_fsdp2_fp8
function so it skips on hardware that doesn't support FP8: call
TransformerEngine.check_fp8_support() (or equivalent) at the start of
test_sanity_convergence_fsdp2_fp8 and if it returns False use pytest.skip(...)
(or apply pytest.mark.skipif with that predicate) to avoid running main_fsdp2
when FP8 is unsupported; ensure the skip message mentions FP8/compute capability
so test output is clear.
bionemo-recipes/recipes/codonfm_native_te/hydra_config/defaults.yaml (1)

41-44: ⚠️ Potential issue | 🟠 Major

Scheduler horizon should derive from num_train_steps.

The LR scheduler uses a hardcoded num_training_steps: 500_000, but the training loop stops at args.num_train_steps. If a config overrides only num_train_steps, the warmup/decay curve will be incorrect.

🔧 Suggested fix using Hydra interpolation
 lr_scheduler_kwargs:
   num_warmup_steps: 2_000
-  num_training_steps: 500_000
+  num_training_steps: ${num_train_steps}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/hydra_config/defaults.yaml` around
lines 41 - 44, The lr_scheduler_kwargs currently hardcodes num_training_steps to
500_000 which can diverge from the actual training loop parameter
(args.num_train_steps); update the config so num_training_steps is interpolated
from the main training steps value instead of a literal (use Hydra/Omega conf
interpolation to reference the training steps field, e.g. point
lr_scheduler_kwargs.num_training_steps to the top-level num_train_steps), and
ensure num_warmup_steps remains consistent (can be expressed as an absolute or
derived value via interpolation if desired) so the scheduler horizon follows
args.num_train_steps at runtime.
bionemo-recipes/recipes/codonfm_native_te/tests/test_model.py (1)

136-155: ⚠️ Potential issue | 🟡 Minor

Recompute loss after training loop for accurate comparison.

Line 154 reads outputs.loss.item() from the last forward pass before the final optimizer.step(). This checks pre-step loss, not post-step loss. Additionally, without fixed RNG seeds and with dropout enabled (default 0.1), this test can be flaky.

🔧 Suggested fix for deterministic testing
     def test_loss_decreases(self, small_config):
+        torch.manual_seed(42)
+        torch.cuda.manual_seed_all(42)
+        small_config.hidden_dropout_prob = 0.0
+        small_config.attention_probs_dropout_prob = 0.0
         model = CodonFMForMaskedLM(small_config).cuda()
         optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
         ...
-        final_loss = outputs.loss.item()
+        # Recompute loss after final optimizer step
+        model.eval()
+        with torch.no_grad():
+            final_outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
+        final_loss = final_outputs.loss.item()
         assert final_loss < initial_loss, f"Loss did not decrease: {initial_loss} -> {final_loss}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/tests/test_model.py` around lines
136 - 155, The test_loss_decreases test reads outputs.loss from the last forward
pass before the final optimizer.step, producing an incorrect pre-step final_loss
and making the check flaky; update the test to perform one additional forward
pass after the training loop (call model(...) again with
input_ids/attention_mask/labels) and read loss from that fresh outputs.loss for
final_loss, and make the test deterministic by fixing RNG seeds
(torch.manual_seed, torch.cuda.manual_seed_all) and disabling/dropout (set
model.eval() or set dropout to 0) before training so CodonFMForMaskedLM behavior
is stable.
bionemo-recipes/models/codonfm/modeling_codonfm_te.py (2)

109-117: ⚠️ Potential issue | 🟡 Minor

Validate attn_input_format in CodonFMConfig.

Any value other than "bshd" currently falls through to the THD branch later, so a typo turns into a missing cu_seq_lens_* runtime failure instead of a clear config error.

🛠️ Suggested fix
         if self.hidden_act not in ("gelu", "relu", "silu"):
             raise ValueError(f"hidden_act must be one of: gelu, relu, silu, got {self.hidden_act}")
+        if self.attn_input_format not in {"bshd", "thd"}:
+            raise ValueError(f'attn_input_format must be "bshd" or "thd", got {self.attn_input_format!r}')
         if self.layer_precision is not None:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/modeling_codonfm_te.py` around lines 109 -
117, The config field attn_input_format is not validated, so typos (anything
other than "bshd") silently fall into the THD branch and cause runtime failures;
add an explicit validation in the same validation block (near the existing
checks for hidden_size, hidden_act, layer_precision) that checks
self.attn_input_format (or CodonFMConfig.attn_input_format) and raises a
ValueError if it is not exactly "bshd" (with a clear message listing the allowed
value) so misconfigured values fail fast with a clear error.

447-466: ⚠️ Potential issue | 🟠 Major

Use the scaled MAGNETO gain for value_weight.

The docstring says only Q/K should use gain=1.0, but value_weight currently inherits w_gain. In the split-QKV path, that initializes V as if it were Q/K.

🛠️ Suggested fix
-                is_qk = "query" in name or "key" in name or "qkv" in name
-                w_gain = 1.0 if is_qk else gain
+                is_qk_module = "query" in name or "key" in name or "qkv" in name
+                w_gain = 1.0 if is_qk_module else gain
                 if getattr(module, "weight", None) is not None:
                     nn.init.xavier_normal_(module.weight, gain=w_gain)
                     if getattr(module, "bias", None) is not None:
                         module.bias.data.zero_()
                 if getattr(module, "query_weight", None) is not None:
                     nn.init.xavier_normal_(module.query_weight, gain=w_gain)
                     if getattr(module, "query_bias", None) is not None:
                         module.query_bias.data.zero_()
                 if getattr(module, "key_weight", None) is not None:
                     nn.init.xavier_normal_(module.key_weight, gain=w_gain)
                     if getattr(module, "key_bias", None) is not None:
                         module.key_bias.data.zero_()
                 if getattr(module, "value_weight", None) is not None:
-                    nn.init.xavier_normal_(module.value_weight, gain=w_gain)
+                    nn.init.xavier_normal_(module.value_weight, gain=gain)
                     if getattr(module, "value_bias", None) is not None:
                         module.value_bias.data.zero_()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/modeling_codonfm_te.py` around lines 447 -
466, The value_weight branch incorrectly uses w_gain (which is set to 1.0 for
Q/K) so V gets Q/K scaling; in the named_modules loop inside
modeling_codonfm_te.py (look for is_qk, w_gain, gain) change the initialization
of module.value_weight to use the unmodified gain variable (and leave value_bias
zeroing as-is) so that only query/key weights use gain=1.0 while value_weight
uses the scaled MAGNETO gain.
bionemo-recipes/models/codonfm/tests/test_modeling_codonfm_te.py (1)

286-287: ⚠️ Potential issue | 🟠 Major

Assert that the golden state dict loads cleanly.

All three regression paths ignore the missing_keys / unexpected_keys result from load_state_dict(..., strict=False). That can hide fixture drift and still let the test compare partially initialized models.

🛠️ Suggested fix
-        model.load_state_dict(golden_sd, strict=False)
+        missing_keys, unexpected_keys = model.load_state_dict(golden_sd, strict=False)
+        assert not missing_keys and not unexpected_keys

Apply the same change to the other two load_state_dict(golden_sd, strict=False) call sites in this file.

#!/bin/bash
python - <<'PY'
try:
    import torch
    from torch import nn

    result = nn.Linear(1, 1).load_state_dict({"weight": torch.zeros(1, 1)}, strict=False)
    print(type(result).__name__)
    print("missing_keys:", result.missing_keys)
    print("unexpected_keys:", result.unexpected_keys)
except Exception as exc:
    print(f"PyTorch inspection unavailable: {exc}")
PY

rg -n "load_state_dict\(golden_sd, strict=False\)" bionemo-recipes/models/codonfm/tests/test_modeling_codonfm_te.py

Also applies to: 352-353, 365-366

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/tests/test_modeling_codonfm_te.py` around
lines 286 - 287, The test currently calls load_state_dict(golden_sd,
strict=False) on CodonFMForMaskedLM instances and ignores the returned result,
which can hide missing/unexpected keys; update each load_state_dict(golden_sd,
strict=False) call (the ones used when instantiating CodonFMForMaskedLM and the
two other occurrences) to capture the return value (e.g., result =
model.load_state_dict(...)) and add assertions that result.missing_keys == []
and result.unexpected_keys == [] (or use assert not result.missing_keys and
assert not result.unexpected_keys) so the golden state dict must load cleanly.
Ensure the assertions are applied to all three call sites referenced in the
test.
bionemo-recipes/recipes/codonfm_native_te/checkpoint.py (1)

182-191: ⚠️ Potential issue | 🔴 Critical

Only store async futures in _ckpt_futures.

dcp_save() is the synchronous path, but its return value is still written into _ckpt_futures["fsdp2"]. save_final_model_fsdp2() later calls .result() on whatever is stored there, so a run that saved a synchronous checkpoint can fail during final export.

🛠️ Suggested fix
-    if async_save and "fsdp2" in _ckpt_futures and _ckpt_futures["fsdp2"] is not None:
-        _ckpt_futures["fsdp2"].result()
+    if "fsdp2" in _ckpt_futures and _ckpt_futures["fsdp2"] is not None:
+        _ckpt_futures["fsdp2"].result()
+        _ckpt_futures["fsdp2"] = None
@@
     state_dict = {"app": AppState(model=model, optimizer=optimizer, scheduler=scheduler, step=step, epoch=epoch)}
     ckpt_save_func = dcp_async_save if async_save else dcp_save
-    _ckpt_futures["fsdp2"] = ckpt_save_func(state_dict, checkpoint_id=checkpoint_path, process_group=process_group)
+    save_result = ckpt_save_func(state_dict, checkpoint_id=checkpoint_path, process_group=process_group)
+    _ckpt_futures["fsdp2"] = save_result if async_save else None
#!/bin/bash
python - <<'PY'
import inspect

try:
    from torch.distributed.checkpoint.state_dict_saver import save, async_save

    print("save signature:", inspect.signature(save))
    print("async_save signature:", inspect.signature(async_save))

    print("--- save source tail ---")
    for line in inspect.getsource(save).splitlines()[-8:]:
        print(line)

    print("--- async_save source tail ---")
    for line in inspect.getsource(async_save).splitlines()[-8:]:
        print(line)
except Exception as exc:
    print(f"PyTorch checkpoint API inspection unavailable: {exc}")
PY

rg -n "_ckpt_futures\\[\"fsdp2\"\\]|save_final_model_fsdp2|dcp_async_save|dcp_save" bionemo-recipes/recipes/codonfm_native_te/checkpoint.py

Also applies to: 207-210

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/checkpoint.py` around lines 182 -
191, The code is storing the return value of the synchronous saver into the
_ckpt_futures map, causing save_final_model_fsdp2 to later call .result() on a
non-future; change the assignment so that only the async path writes a future
into _ckpt_futures["fsdp2"]: call dcp_save(state_dict, ...) without assigning
its return when async_save is False, and only assign the result of
dcp_async_save(...) to _ckpt_futures["fsdp2"] when async_save is True; update
the block around ckpt_save_func/dcp_save/dcp_async_save and any other
occurrences (e.g., the similar 207-210 region) to follow the same pattern so
save_final_model_fsdp2 always sees either a Future or None in
_ckpt_futures["fsdp2"].
bionemo-recipes/models/codonfm/tokenizer.py (1)

124-145: ⚠️ Potential issue | 🟡 Minor

Reload the serialized special-token state in from_pretrained().

save_pretrained() persists special_tokens, but from_pretrained() ignores them and decode() still derives special IDs from the class constant. That makes the saved metadata ineffective and breaks exact round-tripping as soon as token ordering changes.

🛠️ Suggested fix
-        special_ids = {self.encoder[tok] for tok in self.SPECIAL_TOKENS if tok in self.encoder}
+        special_tokens = getattr(self, "special_tokens", self.SPECIAL_TOKENS)
+        special_ids = {self.encoder[tok] for tok in special_tokens if tok in self.encoder}
@@
         config = {
             "seq_type": self.seq_type,
-            "special_tokens": self.SPECIAL_TOKENS,
+            "special_tokens": getattr(self, "special_tokens", self.SPECIAL_TOKENS),
             "encoder": self.encoder,
         }
@@
         tokenizer = cls(seq_type=config["seq_type"])
+        tokenizer.special_tokens = list(config.get("special_tokens", tokenizer.SPECIAL_TOKENS))
         # Restore the exact encoder/decoder from the saved config to guarantee round-trip fidelity.
         tokenizer.encoder = {tok: int(i) for tok, i in config["encoder"].items()}
         tokenizer.decoder = {i: tok for tok, i in tokenizer.encoder.items()}
+        tokenizer.codons = [
+            tok
+            for tok, _ in sorted(tokenizer.encoder.items(), key=lambda item: item[1])
+            if tok not in tokenizer.special_tokens
+        ]
         return tokenizer

Also applies to: 157-163
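The round-trip property at stake, sketched on a toy tokenizer (class and method names are hypothetical; the real CodonTokenizer persists its config to JSON files rather than a string):

```python
import json


class ToyTokenizer:
    SPECIAL_TOKENS = ["<pad>", "<mask>"]

    def __init__(self, special_tokens=None, encoder=None):
        # Instance state wins over the class constant.
        self.special_tokens = list(special_tokens or self.SPECIAL_TOKENS)
        self.encoder = dict(encoder or {t: i for i, t in enumerate(self.special_tokens)})

    def special_ids(self):
        # decode() should derive special IDs from instance state, not SPECIAL_TOKENS.
        return {self.encoder[t] for t in self.special_tokens if t in self.encoder}

    def to_config(self):
        return json.dumps({"special_tokens": self.special_tokens, "encoder": self.encoder})

    @classmethod
    def from_config(cls, blob):
        cfg = json.loads(blob)
        return cls(special_tokens=cfg["special_tokens"], encoder=cfg["encoder"])
```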

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/tokenizer.py` around lines 124 - 145,
from_pretrained currently ignores the serialized "special_tokens" saved by
save_pretrained, so reload the saved special-token state in from_pretrained by
reading the "special_tokens" entry from the config JSON and using it to set the
instance's SPECIAL_TOKENS (and rebuild any dependent structures like
special_ids); ensure from_pretrained populates self.SPECIAL_TOKENS (or an
instance attribute used by decode) before creating encoder/decoder mappings so
decode() derives special IDs from the loaded config rather than the class
constant, and similarly handle the alternate location referenced in lines
~157-163.
bionemo-recipes/recipes/codonfm_native_te/tests/test_quantization.py (1)

160-168: ⚠️ Potential issue | 🟡 Minor

Make test_matches_correct_sublayers reject prefix overmatches.

These positive assertions use *_something names, so they would still pass even if generate_layer_regex() started matching prefixes too broadly. Assert the exact sublayer names and explicitly reject the suffixed variants.

🛠️ Suggested fix
 def test_matches_correct_sublayers():
     """Regex should only match layernorm_qkv, proj, fc1, fc2."""
     regex = generate_layer_regex([1])
-    assert re.search(regex, "encoder.layers.1.self_attention.layernorm_qkv_something")
-    assert re.search(regex, "encoder.layers.1.self_attention.proj_something")
-    assert re.search(regex, "encoder.layers.1.layernorm_mlp.fc1_something")
-    assert re.search(regex, "encoder.layers.1.layernorm_mlp.fc2_something")
+    assert re.search(regex, "encoder.layers.1.self_attention.layernorm_qkv")
+    assert re.search(regex, "encoder.layers.1.self_attention.proj")
+    assert re.search(regex, "encoder.layers.1.layernorm_mlp.fc1")
+    assert re.search(regex, "encoder.layers.1.layernorm_mlp.fc2")
+    assert not re.search(regex, "encoder.layers.1.self_attention.layernorm_qkv_something")
+    assert not re.search(regex, "encoder.layers.1.self_attention.proj_something")
+    assert not re.search(regex, "encoder.layers.1.layernorm_mlp.fc1_something")
+    assert not re.search(regex, "encoder.layers.1.layernorm_mlp.fc2_something")
     # Should not match unrelated sublayer names
     assert not re.search(regex, "encoder.layers.1.self_attention.some_other_thing")
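One way to build a regex that satisfies the suffix assertions is to anchor both the layer ids and the sublayer names at end-of-string (a sketch; generate_layer_regex's real implementation may differ):

```python
import re


def layer_regex(layer_ids):
    """Hypothetical sketch: match only the exact sublayer module names,
    anchored with $ so suffixed variants are rejected."""
    ids = "|".join(str(i) for i in layer_ids)
    sublayers = r"(?:self_attention\.(?:layernorm_qkv|proj)|layernorm_mlp\.(?:fc1|fc2))"
    return rf"encoder\.layers\.(?:{ids})\.{sublayers}$"
```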
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/tests/test_quantization.py` around
lines 160 - 168, The test currently accepts suffixes like "_something" which
hides over-broad prefix matches from generate_layer_regex; update
test_matches_correct_sublayers to assert matches against the exact sublayer
names ("encoder.layers.1.self_attention.layernorm_qkv",
"encoder.layers.1.self_attention.proj", "encoder.layers.1.layernorm_mlp.fc1",
"encoder.layers.1.layernorm_mlp.fc2") and add negative assertions that the regex
does NOT match the suffixed variants ("...layernorm_qkv_something",
"...proj_something", "...fc1_something", "...fc2_something") as well as the
unrelated sublayer ("encoder.layers.1.self_attention.some_other_thing") so the
test fails if generate_layer_regex only matches prefixes.
bionemo-recipes/recipes/codonfm_native_te/train_fsdp2.py (1)

224-231: ⚠️ Potential issue | 🟠 Major

Final export is still not self-contained.

This call only serializes weights and config.json through save_final_model_fsdp2() in checkpoint.py. CodonTokenizer.from_pretrained() expects tokenizer metadata too, so final_model/ will not round-trip until the tokenizer is saved alongside the model.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/train_fsdp2.py` around lines 224 -
231, The final export only writes weights and config via save_final_model_fsdp2
(checkpoint.py) but misses tokenizer metadata required by
CodonTokenizer.from_pretrained; update the checkpoint saving block (where
args.checkpoint.save_final_model and ckpt_path are checked) to also save the
tokenizer into the same save_directory (e.g., call the tokenizer's save method
such as tokenizer.save_pretrained or tokenizer.save to write tokenizer files
into ckpt_path / "final_model") so the final_model directory is self-contained
for from_pretrained loading.
bionemo-recipes/models/codonfm/tests/common/test_modeling_common.py (1)

398-439: ⚠️ Potential issue | 🟡 Minor

Use None checks for optional ModelOutput fields.

MaskedLMOutput attributes exist even when their value is None, so hasattr(...) still lets None into the comparison paths and the hidden-state branch can hit len(None) / assert_close(None, ...). Since this is a copied file, please fix the source and re-sync.

Suggested fix
-        if compare_loss and hasattr(te_outputs, "loss") and hasattr(hf_outputs, "loss"):
+        if (
+            compare_loss
+            and getattr(te_outputs, "loss", None) is not None
+            and getattr(hf_outputs, "loss", None) is not None
+        ):
             torch.testing.assert_close(
                 te_outputs.loss,
                 hf_outputs.loss,
@@
-        if compare_logits and hasattr(te_outputs, "logits") and hasattr(hf_outputs, "logits"):
+        if (
+            compare_logits
+            and getattr(te_outputs, "logits", None) is not None
+            and getattr(hf_outputs, "logits", None) is not None
+        ):
             # Only compare logits where attention mask is True
             if "attention_mask" in input_data:
@@
-        if compare_hidden_states and hasattr(te_outputs, "hidden_states") and hasattr(hf_outputs, "hidden_states"):
+        if (
+            compare_hidden_states
+            and getattr(te_outputs, "hidden_states", None) is not None
+            and getattr(hf_outputs, "hidden_states", None) is not None
+        ):
             assert len(te_outputs.hidden_states) == len(hf_outputs.hidden_states), (
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/tests/common/test_modeling_common.py` around
lines 398 - 439, The comparisons currently use hasattr(...) which returns True
even when ModelOutput fields are present but set to None; update the conditions
to also guard against None before comparing. Specifically, in the loss branch
(compare_loss) require te_outputs.loss is not None and hf_outputs.loss is not
None; in the logits branch (compare_logits) require te_outputs.logits is not
None and hf_outputs.logits is not None (and still apply the attention_mask
boolean indexing only after confirming logits are not None); and in the
hidden-states branch (compare_hidden_states) require te_outputs.hidden_states is
not None and hf_outputs.hidden_states is not None before calling len(...) or
iterating through hidden states. Use the existing symbols (compare_loss,
compare_logits, compare_hidden_states, te_outputs, hf_outputs, input_data,
tolerances) to locate and update the conditional checks.
bionemo-recipes/recipes/codonfm_native_te/quantization.py (1)

191-205: ⚠️ Potential issue | 🟠 Major

Reject explicit empty layer lists.

[] bypasses the current validation and is treated as an explicit assignment, so fp8_enabled=True, fp8_layers=[] returns an all-None precision plan and silently disables quantization. Fail fast here before the defaulting logic.

Suggested fix
     # Validate layer entries are integers within valid range.
     for name, layers in [("fp8_layers", fp8_layers), ("fp4_layers", fp4_layers)]:
-        if layers is not None and len(layers) > 0:
+        if layers == []:
+            raise ValueError(f"{name} cannot be an explicit empty list.")
+        if layers is not None:
             invalid = [x for x in layers if not isinstance(x, int) or x < 1 or x > num_layers]
             if invalid:
                 raise ValueError(

Also applies to: 207-233
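A sketch of the fail-fast validation (function name hypothetical):

```python
def validate_layer_lists(num_layers, fp8_layers=None, fp4_layers=None):
    """Reject explicit empty lists before any defaulting logic runs."""
    for name, layers in (("fp8_layers", fp8_layers), ("fp4_layers", fp4_layers)):
        if layers is None:
            continue
        if len(layers) == 0:
            raise ValueError(f"{name} must not be an explicit empty list")
        bad = [x for x in layers if not isinstance(x, int) or not 1 <= x <= num_layers]
        if bad:
            raise ValueError(f"{name} has invalid entries: {bad}")
```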

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/codonfm_native_te/quantization.py` around lines 191 -
205, The validator currently allows explicit empty lists for
fp8_layers/fp4_layers which silently disables quantization; update the
validation in the loop that inspects ("fp8_layers", fp8_layers) and
("fp4_layers", fp4_layers) to treat an empty list as invalid: if layers is not
None and len(layers) == 0 raise a ValueError stating the layer list must not be
empty, and otherwise keep the existing integer/range checks for non-empty lists;
ensure this change also addresses the logic around fp8_enabled and fp4_enabled
(the fp8_enabled and fp4_enabled check that follows should no longer need to
handle empty lists because they will be rejected) and apply the same empty-list
rejection in the similar validation block referenced later (lines ~207-233) for
fp8_layers/fp4_layers.
bionemo-recipes/models/codonfm/dataset.py (2)

22-23: ⚠️ Potential issue | 🟠 Major

These imports break the model package outside recipe-specific path hacks.

bionemo-recipes/models/codonfm/tokenizer.py lives in this package, so it should be imported relatively. The only DistributedConfig shown in this PR lives under recipes/codonfm_native_te/, so importing this module as part of the model package will fail unless callers mutate sys.path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/dataset.py` around lines 22 - 23, The
top-level imports in bionemo-recipes/models/codonfm/dataset.py break package
usage; change the tokenizer import to a relative import (use from .tokenizer
import CodonTokenizer) and stop importing DistributedConfig from a
recipe-specific location inside the model package—either remove the
DistributedConfig import from this module and pass a config object into the
functions/classes that need it, or import it from the recipe-level module that
actually defines it (e.g., move any DistributedConfig usage out of dataset.py
into the recipes/codonfm_native_te code and pass the config in), ensuring
references to CodonTokenizer and any config are located via .tokenizer and not
top-level names.

85-103: ⚠️ Potential issue | 🟠 Major

Masking RNG still replays across workers.

Both collators keep a single random.Random(seed) instance, and the dataloader factories never pass their seed argument into them. With num_workers > 0, workers start from identical RNG state and worker respawns repeat the same masking stream. Seed per worker from torch.utils.data.get_worker_info().seed/torch.initial_seed() and thread the caller seed through both factories.

Also applies to: 159-177, 266-270, 321-325
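A minimal sketch of the lazy per-worker seeding, using torch-free stand-ins (in the real collator, `worker_seed` would come from `torch.utils.data.get_worker_info().seed`, or `torch.initial_seed()` when `worker_info` is None, i.e. `num_workers == 0`):

```python
import random


class MaskingCollator:
    """Sketch: derive the masking RNG lazily, once per worker process."""

    def __init__(self, base_seed):
        self.base_seed = base_seed
        self._rng = None

    def _get_rng(self, worker_seed):
        # Combining the worker seed with the caller's base seed keeps runs
        # reproducible while giving each worker a distinct masking stream.
        if self._rng is None:
            self._rng = random.Random(worker_seed ^ self.base_seed)
        return self._rng
```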

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/dataset.py` around lines 85 - 103, The masking
RNG should not be a single shared random.Random(seed) stored on initialization
(self.rng in __init__); instead derive a per-worker RNG at mask-time using
torch.utils.data.get_worker_info().seed (or torch.initial_seed() when
worker_info is None) combined with the caller/base seed, e.g., create
random.Random(worker_seed ^ base_seed) inside the collator or Dataset.masking
method; also update the dataloader factory functions to accept and forward the
base seed into the collator/Dataset constructors so the per-worker RNG can be
deterministically derived from that base seed. Ensure the change is applied to
the Dataset.__init__ (remove/stop using self.rng), the collator classes that
currently hold a random.Random(seed), and both dataloader factory functions so
every worker obtains a unique RNG.
bionemo-recipes/recipes/codonfm_native_te/dataset.py (1)

85-103: ⚠️ Potential issue | 🟠 Major

Masking RNG still replays across workers.

Both collators keep a single random.Random(seed) instance, and the dataloader factories never pass their seed argument into them. With num_workers > 0, workers start from identical RNG state and worker respawns repeat the same masking stream. Seed per worker from torch.utils.data.get_worker_info().seed/torch.initial_seed() and thread the caller seed through both factories.

Also applies to: 159-177, 266-270, 321-325

🧹 Nitpick comments (2)
bionemo-recipes/models/codonfm/tests/conftest.py (1)

38-45: Minor inconsistency between local and shared unused_tcp_port fixtures.

This fixture differs from the one in tests/common/fixtures.py (lines 33-42): it adds s.listen(1) and omits SO_REUSEADDR. The listen() call is unnecessary for port discovery and slightly changes behavior. Consider aligning with the shared implementation for consistency.

Regarding the CodeQL warning about binding to '': this is standard practice for discovering an available port and is safe in this test context since the socket is immediately closed.

♻️ Align with shared fixture
 `@pytest.fixture`
 def unused_tcp_port():
     """Find and return an unused TCP port for torchrun rendezvous."""
     with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
+        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
         s.bind(("", 0))
-        s.listen(1)
-        port = s.getsockname()[1]
-    return port
+        return s.getsockname()[1]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/tests/conftest.py` around lines 38 - 45, The
unused_tcp_port fixture should match the shared implementation: remove the
unnecessary s.listen(1) call and set SO_REUSEADDR on the socket before binding
so behavior is consistent with tests/common/fixtures.py; update the fixture
named unused_tcp_port to create the socket, call s.setsockopt(socket.SOL_SOCKET,
socket.SO_REUSEADDR, 1), bind to ("", 0), get the port from s.getsockname()[1],
close the socket, and return the port.
bionemo-recipes/models/codonfm/tests/generate_golden_values.py (1)

111-135: Consider adding documentation about the masking strategy.

The function applies 15% random masking (line 124) but re-seeds the RNG per sequence (line 122), so every sequence receives the identical mask pattern. This is acceptable for golden value generation but worth documenting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/codonfm/tests/generate_golden_values.py` around lines
111 - 135, The prepare_test_inputs function applies 15% random masking but calls
torch.manual_seed(SEED) inside the per-sequence loop, which re-seeds the RNG
each iteration and causes the same mask pattern for every sequence; either move
torch.manual_seed(SEED) to just before the for enc in encoded: loop (or before
encoding) to get a reproducible but varied mask across sequences, or if you
intentionally want identical per-sequence masks, update the function docstring
to state the masking rate (0.15), that masking uses torch.rand with
torch.manual_seed(SEED) called per sequence, and mention tokenizer.mask_token_id
and labels use -100 for non-masked positions so reviewers understand the
behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bionemo-recipes/recipes/codonfm_native_te/perf_logger.py`:
- Around line 107-128: The metrics loss, grad_norm, learning_rate and perplexity
are only updated inside the logging gate; move their updates to run every step
so the MeanMetric/Perplexity instances accumulate over the window: for each step
(before the if step % self.logging_frequency == 0 block) call
self.metrics["train/loss"].update(outputs.loss),
self.metrics["train/learning_rate"].update(lr),
self.metrics["train/grad_norm"].update(grad_norm if isinstance(grad_norm,
torch.Tensor) else torch.tensor(grad_norm)), and update perplexity using logits
(apply the same logits.dim() < 3 -> logits = logits.unsqueeze(0) handling) with
self.metrics["train/perplexity"].update(logits, batch["labels"]); keep the
timing/tokens-per-second and the reset/report logic inside the existing
logging_frequency gate so only reporting/reset happens there.

In `@bionemo-recipes/recipes/codonfm_native_te/scheduler.py`:
- Around line 30-41: Check and enforce the warmup/decay boundary before
constructing lr_lambda: validate that num_warmup_steps < decay_steps (where
decay_steps = int(num_training_steps * 0.9)) and either raise a clear ValueError
or clamp decay_steps (e.g., set decay_steps = max(decay_steps, num_warmup_steps
+ 1)) so the linear-decay region exists; update the logic around decay_steps and
num_warmup_steps used in lr_lambda to rely on the validated/clamped value and
fail fast on invalid configs instead of silently producing a clipped schedule.
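A sketch of the validated schedule, assuming the warmup-then-linear-decay shape described above with the decay horizon at 90% of num_training_steps (names hypothetical):

```python
def make_lr_lambda(num_warmup_steps, num_training_steps):
    """Build an LR multiplier; fail fast if the decay region would be empty."""
    decay_steps = int(num_training_steps * 0.9)
    if num_warmup_steps >= decay_steps:
        raise ValueError(
            f"num_warmup_steps ({num_warmup_steps}) must be < decay_steps ({decay_steps})"
        )

    def lr_lambda(step):
        if step < num_warmup_steps:
            return step / max(1, num_warmup_steps)
        return max(0.0, (decay_steps - step) / (decay_steps - num_warmup_steps))

    return lr_lambda
```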

In `@bionemo-recipes/recipes/codonfm_native_te/train_fsdp2.py`:
- Around line 190-210: The logging and checkpointing use a 0-based step because
step is incremented after perf_logger.log_step and should_save_checkpoint,
causing off-by-one behavior; update the flow to use a 1-based global_step for
both perf_logger.log_step and should_save_checkpoint/save_checkpoint_fsdp2 by
deriving global_step = step + 1 (or increment step before these calls) and pass
global_step into perf_logger.log_step and
should_save_checkpoint/save_checkpoint_fsdp2 (and to any saved metadata like
step/epoch) so logs and periodic checkpoints occur at the intended 1-based
boundaries; keep the internal step variable consistent by either using the new
global_step only for logging/checkpointing or by making step itself 1-based.
- Around line 152-166: After restoring state with load_checkpoint_fsdp2(...) you
must call sampler.set_epoch(epoch) to align DistributedSampler with the restored
epoch; add sampler.set_epoch(epoch) immediately after the checkpoint branch that
sets start_step and epoch (i.e., right after load_checkpoint_fsdp2 returns) and
before entering the training loop, and ensure any existing mid-epoch resume
logic uses start_step to skip already-processed batches so indices are not
duplicated.

---

Duplicate comments:
In `@bionemo-recipes/models/codonfm/dataset.py`:
- Around line 22-23: The top-level imports in
bionemo-recipes/models/codonfm/dataset.py break package usage; change the
tokenizer import to a relative import (use from .tokenizer import
CodonTokenizer) and stop importing DistributedConfig from a recipe-specific
location inside the model package—either remove the DistributedConfig import
from this module and pass a config object into the functions/classes that need
it, or import it from the recipe-level module that actually defines it (e.g.,
move any DistributedConfig usage out of dataset.py into the
recipes/codonfm_native_te code and pass the config in), ensuring references to
CodonTokenizer and any config are located via .tokenizer and not top-level
names.
- Around line 85-103: The masking RNG should not be a single shared
random.Random(seed) stored on initialization (self.rng in __init__); instead
derive a per-worker RNG at mask-time using
torch.utils.data.get_worker_info().seed (or torch.initial_seed() when
worker_info is None) combined with the caller/base seed, e.g., create
random.Random(worker_seed ^ base_seed) inside the collator or Dataset.masking
method; also update the dataloader factory functions to accept and forward the
base seed into the collator/Dataset constructors so the per-worker RNG can be
deterministically derived from that base seed. Ensure the change is applied to
the Dataset.__init__ (remove/stop using self.rng), the collator classes that
currently hold a random.Random(seed), and both dataloader factory functions so
every worker obtains a unique RNG.

In `@bionemo-recipes/models/codonfm/modeling_codonfm_te.py`:
- Around line 109-117: The config field attn_input_format is not validated, so
typos (anything other than "bshd") silently fall into the THD branch and cause
runtime failures; add an explicit validation in the same validation block (near
the existing checks for hidden_size, hidden_act, layer_precision) that checks
self.attn_input_format (or CodonFMConfig.attn_input_format) and raises a
ValueError if it is not exactly "bshd" (with a clear message listing the allowed
value) so misconfigured values fail fast with a clear error.
- Around line 447-466: The value_weight branch incorrectly uses w_gain (which is
set to 1.0 for Q/K) so V gets Q/K scaling; in the named_modules loop inside
modeling_codonfm_te.py (look for is_qk, w_gain, gain) change the initialization
of module.value_weight to use the unmodified gain variable (and leave value_bias
zeroing as-is) so that only query/key weights use gain=1.0 while value_weight
uses the scaled MAGNETO gain.

In `@bionemo-recipes/models/codonfm/requirements.txt`:
- Line 2: The requirements entry "transformer_engine>=1.14.0" in
requirements.txt is invalid because PyPI has no 1.14.0 release; update that line
to a published minimum you support (for example "transformer_engine>=1.13.0" if
you want the last 1.x line or "transformer_engine>=2.1.0" for 2.x) so installs
succeed; locate the package line "transformer_engine>=1.14.0" in
requirements.txt and replace it with the chosen published floor.

In `@bionemo-recipes/models/codonfm/tests/common/__init__.py`:
- Around line 32-42: Update the docstring example to reference CodonFM-specific
classes instead of ESM2; replace the ESM2 identifiers (ESM2ModelTester,
NVEsmForMaskedLM) with the CodonFM equivalents used in this codebase (e.g., a
CodonFMModelTester subclass of BaseModelTest and the model class name used for
CodonFM), ensuring the example imports (from tests.common import BaseModelTest,
TestTolerances) and method get_model_class remain consistent with the CodonFM
test helpers.

In `@bionemo-recipes/models/codonfm/tests/common/test_modeling_common.py`:
- Around line 398-439: The comparisons currently use hasattr(...) which returns
True even when ModelOutput fields are present but set to None; update the
conditions to also guard against None before comparing. Specifically, in the
loss branch (compare_loss) require te_outputs.loss is not None and
hf_outputs.loss is not None; in the logits branch (compare_logits) require
te_outputs.logits is not None and hf_outputs.logits is not None (and still apply
the attention_mask boolean indexing only after confirming logits are not None);
and in the hidden-states branch (compare_hidden_states) require
te_outputs.hidden_states is not None and hf_outputs.hidden_states is not None
before calling len(...) or iterating through hidden states. Use the existing
symbols (compare_loss, compare_logits, compare_hidden_states, te_outputs,
hf_outputs, input_data, tolerances) to locate and update the conditional checks.

In `@bionemo-recipes/models/codonfm/tests/test_modeling_codonfm_te.py`:
- Around line 286-287: The test currently calls load_state_dict(golden_sd,
strict=False) on CodonFMForMaskedLM instances and ignores the returned result,
which can hide missing/unexpected keys; update each load_state_dict(golden_sd,
strict=False) call (the ones used when instantiating CodonFMForMaskedLM and the
two other occurrences) to capture the return value (e.g., result =
model.load_state_dict(...)) and add assertions that result.missing_keys == []
and result.unexpected_keys == [] (or use assert not result.missing_keys and
assert not result.unexpected_keys) so the golden state dict must load cleanly.
Ensure the assertions are applied to all three call sites referenced in the
test.

In `@bionemo-recipes/models/codonfm/tokenizer.py`:
- Around line 124-145: from_pretrained currently ignores the serialized
"special_tokens" saved by save_pretrained, so reload the saved special-token
state in from_pretrained by reading the "special_tokens" entry from the config
JSON and using it to set the instance's SPECIAL_TOKENS (and rebuild any
dependent structures like special_ids); ensure from_pretrained populates
self.SPECIAL_TOKENS (or an instance attribute used by decode) before creating
encoder/decoder mappings so decode() derives special IDs from the loaded config
rather than the class constant, and similarly handle the alternate location
referenced in lines ~157-163.

In `@bionemo-recipes/recipes/codonfm_native_te/checkpoint.py`:
- Around line 182-191: The code is storing the return value of the synchronous
saver into the _ckpt_futures map, causing save_final_model_fsdp2 to later call
.result() on a non-future; change the assignment so that only the async path
writes a future into _ckpt_futures["fsdp2"]: call dcp_save(state_dict, ...)
without assigning its return when async_save is False, and only assign the
result of dcp_async_save(...) to _ckpt_futures["fsdp2"] when async_save is True;
update the block around ckpt_save_func/dcp_save/dcp_async_save and any other
occurrences (e.g., the similar 207-210 region) to follow the same pattern so
save_final_model_fsdp2 always sees either a Future or None in
_ckpt_futures["fsdp2"].

In `@bionemo-recipes/recipes/codonfm_native_te/distributed_config.py`:
- Around line 26-40: The dataclass currently mutates os.environ via
dict.setdefault in the default_factories for rank, local_rank, world_size,
_master_addr, and _master_port; change those default_factory lambdas to use
os.getenv(..., default) so they only read environment values and do not write
them, and update the class docstring sentence that says the class "sets" torch
distributed env vars to instead state it "falls back to" single-process defaults
(refer to DistributedConfig and the fields rank, local_rank, world_size,
_master_addr, _master_port).

In `@bionemo-recipes/recipes/codonfm_native_te/hydra_config/defaults.yaml`:
- Around line 41-44: The lr_scheduler_kwargs currently hardcodes
num_training_steps to 500_000 which can diverge from the actual training loop
parameter (args.num_train_steps); update the config so num_training_steps is
interpolated from the main training steps value instead of a literal (use
Hydra/Omega conf interpolation to reference the training steps field, e.g. point
lr_scheduler_kwargs.num_training_steps to the top-level num_train_steps), and
ensure num_warmup_steps remains consistent (can be expressed as an absolute or
derived value via interpolation if desired) so the scheduler horizon follows
args.num_train_steps at runtime.

In `@bionemo-recipes/recipes/codonfm_native_te/quantization.py`:
- Around line 191-205: The validator currently allows explicit empty lists for
fp8_layers/fp4_layers which silently disables quantization; update the
validation in the loop that inspects ("fp8_layers", fp8_layers) and
("fp4_layers", fp4_layers) to treat an empty list as invalid: if layers is not
None and len(layers) == 0 raise a ValueError stating the layer list must not be
empty, and otherwise keep the existing integer/range checks for non-empty lists;
ensure this change also addresses the logic around fp8_enabled and fp4_enabled
(the fp8_enabled and fp4_enabled check that follows should no longer need to
handle empty lists because they will be rejected) and apply the same empty-list
rejection in the similar validation block referenced later (lines ~207-233) for
fp8_layers/fp4_layers.

In `@bionemo-recipes/recipes/codonfm_native_te/README.md`:
- Around line 49-66: The fenced code block containing the directory tree in the
README (the block that starts with the triple backticks showing
codonfm_native_te/ and its files) is missing a language tag and triggers MD040;
update that opening fence from ``` to ```text so the block becomes a text-coded
fenced block (i.e., add the "text" language tag to the directory-tree code fence
in README.md).

In `@bionemo-recipes/recipes/codonfm_native_te/tests/conftest.py`:
- Around line 37-44: The fixture device_mesh can collide across concurrent test
runs because DistributedConfig defaults MASTER_PORT to 12355; reserve an
ephemeral TCP port and set MASTER_PORT before creating DistributedConfig /
calling torch.distributed.init_process_group so each test session uses a unique
port. Specifically, in the device_mesh fixture obtain a free port (bind to port
0 on localhost to get the OS-assigned port), set os.environ["MASTER_PORT"] to
that port string before instantiating DistributedConfig or before calling
torch.distributed.init_process_group, then proceed with creating the device,
setting CUDA device, and calling init_device_mesh("cuda", mesh_shape=(1,),
mesh_dim_names=("dp",)); ensure the socket is closed after reading the assigned
port so the port is available for the process.
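The port-reservation step can be sketched like this (the `DistributedConfig` / `init_process_group` calls from the fixture are omitted; only the ephemeral-port trick is shown):

```python
import os
import socket


def reserve_master_port() -> int:
    """Bind to port 0 so the OS assigns a free port, then release it."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("localhost", 0))
    port = s.getsockname()[1]
    s.close()  # close before handing the port to init_process_group
    return port


# In the device_mesh fixture, set MASTER_PORT *before* DistributedConfig
# or torch.distributed.init_process_group reads it.
os.environ["MASTER_PORT"] = str(reserve_master_port())
print(os.environ["MASTER_PORT"])
```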

In `@bionemo-recipes/recipes/codonfm_native_te/tests/test_model.py`:
- Around lines 136-155: test_loss_decreases reads outputs.loss from the last
forward pass before the final optimizer.step, so final_loss reflects pre-step
weights and the check is flaky. Perform one additional forward pass after the
training loop (call model(...) again with input_ids/attention_mask/labels) and
read final_loss from that fresh outputs.loss. Also make the test
deterministic: fix the RNG seeds (torch.manual_seed,
torch.cuda.manual_seed_all) and disable dropout (model.eval() or dropout set
to 0) before training so CodonFMForMaskedLM behaves stably.

In `@bionemo-recipes/recipes/codonfm_native_te/tests/test_quantization.py`:
- Around line 160-168: The test currently accepts suffixes like "_something"
which hides over-broad prefix matches from generate_layer_regex; update
test_matches_correct_sublayers to assert matches against the exact sublayer
names ("encoder.layers.1.self_attention.layernorm_qkv",
"encoder.layers.1.self_attention.proj", "encoder.layers.1.layernorm_mlp.fc1",
"encoder.layers.1.layernorm_mlp.fc2") and add negative assertions that the regex
does NOT match the suffixed variants ("...layernorm_qkv_something",
"...proj_something", "...fc1_something", "...fc2_something") as well as the
unrelated sublayer ("encoder.layers.1.self_attention.some_other_thing") so the
test fails if generate_layer_regex only matches prefixes.
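The behavior the strengthened test demands can be illustrated with an anchored pattern. The helper below is a hypothetical stand-in for `generate_layer_regex` — the real function's output format may differ — but it shows why anchoring with `$` is what makes the suffixed variants fail to match:

```python
import re


def generate_layer_regex(layer_idx: int, sublayers: list[str]) -> str:
    """Anchored alternation over exact sublayer names (illustrative stand-in)."""
    alternatives = "|".join(re.escape(s) for s in sublayers)
    return rf"^encoder\.layers\.{layer_idx}\.(?:{alternatives})$"


pattern = re.compile(
    generate_layer_regex(
        1,
        [
            "self_attention.layernorm_qkv",
            "self_attention.proj",
            "layernorm_mlp.fc1",
            "layernorm_mlp.fc2",
        ],
    )
)

# Exact names match; suffixed variants and unrelated sublayers do not.
assert pattern.match("encoder.layers.1.layernorm_mlp.fc1")
assert not pattern.match("encoder.layers.1.layernorm_mlp.fc1_something")
assert not pattern.match("encoder.layers.1.self_attention.some_other_thing")
print("anchored regex rejects prefix-only matches")
```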

In `@bionemo-recipes/recipes/codonfm_native_te/tests/test_train.py`:
- Around line 47-59: The test test_sanity_convergence_fsdp2_no_meta_device can
accidentally resume from an existing checkpoint because it doesn't override
checkpoint.ckpt_dir while defaults include resume_from_checkpoint=true; update
the compose overrides in the test (the compose call inside
test_sanity_convergence_fsdp2_no_meta_device) to set a per-test checkpoint
directory (e.g. add an override that points checkpoint.ckpt_dir to tmp_path or a
tmp subdir) so the run is isolated and cannot pick up stale checkpoints.
- Around line 81-94: Add a GPU capability guard to the
test_sanity_convergence_fsdp2_fp8 function so it skips on hardware that doesn't
support FP8: call TransformerEngine.check_fp8_support() (or equivalent) at the
start of test_sanity_convergence_fsdp2_fp8 and if it returns False use
pytest.skip(...) (or apply pytest.mark.skipif with that predicate) to avoid
running main_fsdp2 when FP8 is unsupported; ensure the skip message mentions
FP8/compute capability so test output is clear.

In `@bionemo-recipes/recipes/codonfm_native_te/train_fsdp2.py`:
- Around line 224-231: The final export only writes weights and config via
save_final_model_fsdp2 (checkpoint.py) but misses tokenizer metadata required by
CodonTokenizer.from_pretrained; update the checkpoint saving block (where
args.checkpoint.save_final_model and ckpt_path are checked) to also save the
tokenizer into the same save_directory (e.g., call the tokenizer's save method
such as tokenizer.save_pretrained or tokenizer.save to write tokenizer files
into ckpt_path / "final_model") so the final_model directory is self-contained
for from_pretrained loading.

---

Nitpick comments:
In `@bionemo-recipes/models/codonfm/tests/conftest.py`:
- Around line 38-45: The unused_tcp_port fixture should match the shared
implementation: remove the unnecessary s.listen(1) call and set SO_REUSEADDR on
the socket before binding so behavior is consistent with
tests/common/fixtures.py; update the fixture named unused_tcp_port to create the
socket, call s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1), bind to
("", 0), get the port from s.getsockname()[1], close the socket, and return the
port.

In `@bionemo-recipes/models/codonfm/tests/generate_golden_values.py`:
- Around lines 111-135: prepare_test_inputs applies 15% random masking but
calls torch.manual_seed(SEED) inside the per-sequence loop, re-seeding the RNG
each iteration and producing the same mask pattern for every sequence. Either
move torch.manual_seed(SEED) to just before the `for enc in encoded:` loop (or
before encoding) to get a reproducible but varied mask across sequences, or,
if identical per-sequence masks are intentional, update the function docstring
to state the masking rate (0.15), that masking uses torch.rand with
torch.manual_seed(SEED) called per sequence, and that tokenizer.mask_token_id
is used with labels set to -100 at non-masked positions, so reviewers
understand the behavior.
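The pitfall is easy to reproduce in stdlib form (hedged: `generate_golden_values.py` uses `torch.manual_seed` / `torch.rand`, but the re-seeding behavior is identical with Python's `random`):

```python
import random

SEED = 42


def masks_reseeding_per_sequence(n_sequences: int, length: int):
    """Re-seeds inside the loop, mirroring the reviewed code's behavior."""
    masks = []
    for _ in range(n_sequences):
        random.seed(SEED)  # re-seeding here resets the RNG every iteration
        masks.append([random.random() < 0.15 for _ in range(length)])
    return masks


def masks_seeding_once(n_sequences: int, length: int):
    """Seeds once before the loop: reproducible run, but varied masks."""
    random.seed(SEED)
    return [[random.random() < 0.15 for _ in range(length)] for _ in range(n_sequences)]


per_seq = masks_reseeding_per_sequence(3, 64)
once = masks_seeding_once(3, 64)
print(per_seq[0] == per_seq[1] == per_seq[2])  # True: every sequence gets the same mask
print(once[0] == once[1])  # masks now vary across sequences
```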

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a8b9a7b4-2bf1-4951-80a8-52fcd8990444

📥 Commits

Reviewing files that changed from the base of the PR and between f437be0 and 57a8fa0.

⛔ Files ignored due to path filters (1)
  • bionemo-recipes/recipes/codonfm_native_te/train.parquet is excluded by !**/*.parquet
📒 Files selected for processing (40)
  • .gitignore
  • bionemo-recipes/models/codonfm/dataset.py
  • bionemo-recipes/models/codonfm/modeling_codonfm_te.py
  • bionemo-recipes/models/codonfm/requirements.txt
  • bionemo-recipes/models/codonfm/tests/common/README.md
  • bionemo-recipes/models/codonfm/tests/common/__init__.py
  • bionemo-recipes/models/codonfm/tests/common/fixtures.py
  • bionemo-recipes/models/codonfm/tests/common/test_modeling_common.py
  • bionemo-recipes/models/codonfm/tests/conftest.py
  • bionemo-recipes/models/codonfm/tests/generate_golden_values.py
  • bionemo-recipes/models/codonfm/tests/golden_state_dict.safetensors
  • bionemo-recipes/models/codonfm/tests/golden_values.json
  • bionemo-recipes/models/codonfm/tests/test_modeling_codonfm_te.py
  • bionemo-recipes/models/codonfm/tokenizer.py
  • bionemo-recipes/models/esm2/tests/common/test_modeling_common.py
  • bionemo-recipes/models/llama3/tests/common/test_modeling_common.py
  • bionemo-recipes/models/mixtral/tests/common/test_modeling_common.py
  • bionemo-recipes/models/qwen/tests/common/test_modeling_common.py
  • bionemo-recipes/recipes/codonfm_native_te/README.md
  • bionemo-recipes/recipes/codonfm_native_te/checkpoint.py
  • bionemo-recipes/recipes/codonfm_native_te/dataset.py
  • bionemo-recipes/recipes/codonfm_native_te/distributed_config.py
  • bionemo-recipes/recipes/codonfm_native_te/fp4_debugging_stats.yaml
  • bionemo-recipes/recipes/codonfm_native_te/fp8_debugging_stats.yaml
  • bionemo-recipes/recipes/codonfm_native_te/hydra_config/L0_sanity.yaml
  • bionemo-recipes/recipes/codonfm_native_te/hydra_config/defaults.yaml
  • bionemo-recipes/recipes/codonfm_native_te/modeling_codonfm_te.py
  • bionemo-recipes/recipes/codonfm_native_te/perf_logger.py
  • bionemo-recipes/recipes/codonfm_native_te/quantization.py
  • bionemo-recipes/recipes/codonfm_native_te/requirements.txt
  • bionemo-recipes/recipes/codonfm_native_te/scheduler.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/__init__.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/conftest.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/test_model.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/test_quantization.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/test_tokenizer.py
  • bionemo-recipes/recipes/codonfm_native_te/tests/test_train.py
  • bionemo-recipes/recipes/codonfm_native_te/tokenizer.py
  • bionemo-recipes/recipes/codonfm_native_te/train_fsdp2.py
  • ci/scripts/check_copied_files.py

@jomitchellnv

@coderabbitai resolve

@coderabbitai

coderabbitai bot commented Mar 23, 2026

✅ Actions performed

Comments resolved. Auto-approval is disabled; enable reviews.request_changes_workflow to approve automatically.

@jomitchellnv

/ok to test c05e3a2

@jomitchellnv

/ok to test 8dbcb98

@jomitchellnv jomitchellnv force-pushed the jm/codonfm-native-te branch 3 times, most recently from 02a1501 to 8c00221 Compare March 25, 2026 03:32
@copy-pr-bot

copy-pr-bot bot commented Mar 25, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


@jomitchellnv

/ok to test ef905b5

@jomitchellnv

@coderabbitai resolve

@jomitchellnv

/ok to test bc82beb

@coderabbitai

coderabbitai bot commented Mar 25, 2026

✅ Actions performed

Comments resolved. Approval is disabled; enable reviews.request_changes_workflow to allow explicit top-level @coderabbitai resolve or @coderabbitai approve commands.

@jomitchellnv jomitchellnv force-pushed the jm/codonfm-native-te branch from bc82beb to bdb5cf2 Compare March 25, 2026 22:27
Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>
@jomitchellnv jomitchellnv force-pushed the jm/codonfm-native-te branch from bdb5cf2 to 93cdb62 Compare March 25, 2026 22:27
@jomitchellnv jomitchellnv enabled auto-merge March 25, 2026 22:27
@jomitchellnv jomitchellnv added this pull request to the merge queue Mar 25, 2026
Merged via the queue into main with commit 1942554 Mar 25, 2026
23 checks passed
@jomitchellnv jomitchellnv deleted the jm/codonfm-native-te branch March 25, 2026 22:58
pstjohn added a commit to pstjohn/bionemo-framework that referenced this pull request Mar 31, 2026
The models/codonfm nightly CI job started failing after transformers
5.4.0 was released on March 27, 2026. The PR CI (NVIDIA#1531) passed on
March 25 when the latest version was 5.3.0. The 5.4.0 release
introduced significant breaking changes including dataclass-based
PretrainedConfig and weight tying behavior modifications.

Pin to <5.4.0 to restore the working configuration. A follow-up
should investigate compatibility with transformers>=5.4.0.

Signed-off-by: Peter St. John <pstjohn@nvidia.com>