Fixes JIT and ONNX export of image-only CNN policies by 0xadvait · Pull Request #6159 · isaac-sim/IsaacLab

0xadvait · 2026-06-11T23:53:41Z

Description

The CNNModel in isaaclab_rl.rsl_rl.models overrides get_latent to support image-only observations during training (e.g. the Isaac-Cartpole-RGB-* camera tasks, whose only observation group is the camera image). The export path was, however, left on the upstream wrappers: as_jit() / as_onnx() from rsl_rl always declare a 1D observation input. For image-only policies, the exported TorchScript module must be called with a zero-width placeholder tensor, and the exported ONNX graph declares a mandatory input obs with shape [1, 0]:

inputs: [('obs', [1, 0]), ('policy', [1, 3, 100, 100])]

This is awkward to deploy (every consumer has to construct and feed an empty tensor) and zero-width inputs are rejected by some downstream toolchains.

Fix: override as_jit() / as_onnx() in the IsaacLab CNNModel to return image-only export wrappers when no 1D observation groups are active. The exported models then only take the 2D observation groups as inputs:

inputs: [('policy', [1, 3, 100, 100])]

Models with mixed 1D + 2D observation groups keep the upstream export interface unchanged.

Verification (CPU, no Isaac Sim required):

New test module source/isaaclab_rl/test/test_rsl_rl_models.py covering: image-only forward, image-only JIT export (script -> save -> reload -> output parity vs the original model), image-only ONNX export (graph declares only the image input + onnxruntime output parity), and mixed-obs JIT/ONNX parity as regression guards. The two image-only export tests fail without the fix and pass with it; the mixed-obs tests pass before and after.
End-to-end exercise of the full OnPolicyRunner.export_policy_to_jit / export_policy_to_onnx call pattern (same torch.onnx.export arguments) across MLP, mixed CNN, image-only CNN and GRU policies with rsl-rl-lib==5.0.1 (the pinned version): all exports succeed with JIT parity exact and ONNX parity within 1.5e-7.

Related to #4592

Type of change

Bug fix (non-breaking change which fixes an issue)

Checklist

I have read and understood the contribution guidelines
I have run the pre-commit checks with ./isaaclab.sh --format
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
I have updated the changelog and the corresponding version in the extension's config/extension.toml file
I have added my name to the CONTRIBUTORS.md or my name already exists there

The CNNModel subclass in isaaclab_rl supports image-only observations during training by overriding get_latent, but the export wrappers inherited from rsl_rl always declare a 1D observation input. For image-only policies (e.g. the cartpole camera tasks), the exported TorchScript and ONNX graphs therefore require feeding a zero-width placeholder tensor, which is awkward to deploy and rejected by some downstream toolchains. Override as_jit and as_onnx to return image-only export wrappers when no 1D observation groups are active. The exported models now only take the 2D observation groups as inputs. Models with mixed 1D and 2D observations keep the upstream export interface. Verified on CPU by exporting and reloading both variants and checking output parity against the original model (torch.jit and onnxruntime). Related to isaac-sim#4592 Signed-off-by: Advait Jayant <advait@vannalabs.ai>

greptile-apps · 2026-06-11T23:57:10Z

Greptile Summary

This PR fixes JIT and ONNX export of image-only CNN policies by overriding as_jit() and as_onnx() in the Isaac Lab CNNModel subclass. When no 1D observation groups are present, the new wrappers (_TorchImageOnlyCNNModel and _OnnxImageOnlyCNNModel) expose only the 2D (image) inputs, eliminating the zero-width obs placeholder tensor that previously broke some downstream toolchains.

Adds as_jit/as_onnx overrides in CNNModel that branch on self.obs_groups; models with mixed 1D+2D observations fall through to the upstream implementation unchanged.
Introduces two private export wrapper classes that deep-copy the CNN encoders, MLP, and deterministic output head, then expose a clean image-only forward signature for TorchScript and ONNX tracing respectively.
New test file covers image-only forward, JIT round-trip, ONNX graph input verification, and mixed-obs regression parity.

Confidence Score: 4/5

Safe to merge — the change is well-scoped, mixed-obs models fall through to the unmodified upstream path, and the new export wrappers are tested for both output parity and graph structure.

The fix correctly targets the image-only code path and leaves the mixed-obs path untouched. The only findings are a dead verbose field and duplicated initialization code across the two wrapper classes — neither affects correctness.

source/isaaclab_rl/isaaclab_rl/rsl_rl/models.py — the two private wrapper classes share near-identical init/forward logic and carry a dead verbose attribute worth cleaning up before the code grows further.

Important Files Changed

Filename	Overview
source/isaaclab_rl/isaaclab_rl/rsl_rl/models.py	Adds `as_jit`/`as_onnx` overrides and two private export wrappers to remove zero-width `obs` inputs from image-only CNN policies; logic is correct but contains a dead `verbose` attribute and duplicated initialization code.
source/isaaclab_rl/test/test_rsl_rl_models.py	New test module covering image-only and mixed-obs JIT/ONNX export parity; guards skip gracefully when optional deps are absent.
source/isaaclab_rl/changelog.d/0xadvait-fix-cnn-image-only-export.rst	Changelog fragment accurately describes the fix.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["CNNModel.as_jit() / as_onnx()"] --> B{obs_groups empty?}
    B -- "Yes (image-only)" --> C["Return _TorchImageOnlyCNNModel / _OnnxImageOnlyCNNModel"]
    B -- "No (mixed / MLP-only)" --> D["Return upstream rsl_rl wrapper (unchanged)"]

    C --> E["forward(*obs_2d) — image tensors only"]
    E --> F["ModuleList CNNs"]
    F --> G["torch.cat latents"]
    G --> H["MLP"]
    H --> I["deterministic_output"]
    I --> J["actions tensor"]

    D --> K["forward(obs, obs_2d) — obs may be zero-width"]
    K --> L["Upstream rsl_rl CNNExportModel"]

_{Reviews (1): Last reviewed commit: "Fix JIT and ONNX export of image-only CN..." | Re-trigger Greptile}

greptile-apps · 2026-06-11T23:57:14Z

+
+    def __init__(self, model: CNNModel, verbose: bool):
+        super().__init__()
+        self.verbose = verbose


Dead verbose attribute

self.verbose is assigned in __init__ but is never read anywhere in _OnnxImageOnlyCNNModel. The upstream rsl_rl ONNX wrapper presumably prints tracing info when verbose=True, but this class omits that behavior without surfacing the omission. If verbosity is intentionally dropped here, the field should not be stored; if it should be forwarded (e.g. to torch.onnx.export), the call site in as_onnx has the parameter ready.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Kept deliberately for structural parity with the upstream wrappers: rsl_rl's _OnnxCNNModel stores self.verbose the same way without reading it in forward (the runner passes verbose to torch.onnx.export separately). Keeping the same shape means these classes stay directly diff-able against the upstream ones they specialize, which also makes them easy to upstream into rsl_rl later.

greptile-apps · 2026-06-11T23:57:15Z

+class _TorchImageOnlyCNNModel(nn.Module):
+    """Exportable image-only CNN model for JIT.
+
+    Unlike ``rsl_rl``'s exportable CNN model, the forward pass only takes the 2D observation
+    groups as input, without a placeholder for the (empty) 1D observations.
+    """
+
+    def __init__(self, model: CNNModel):
+        super().__init__()
+        # Convert ModuleDict to ModuleList for ordered iteration
+        self.cnns = nn.ModuleList([copy.deepcopy(model.cnns[g]) for g in model.obs_groups_2d])
+        self.mlp = copy.deepcopy(model.mlp)
+        if model.distribution is not None:
+            self.deterministic_output = model.distribution.as_deterministic_output_module()
+        else:
+            self.deterministic_output = nn.Identity()
+
+    def forward(self, obs_2d: list[torch.Tensor]) -> torch.Tensor:
+        """Run deterministic inference from the 2D observation groups."""
+        latent_cnn_list = []
+        for i, cnn in enumerate(self.cnns):  # We assume obs_2d list matches the order of obs_groups_2d
+            latent_cnn_list.append(cnn(obs_2d[i]))
+        latent = torch.cat(latent_cnn_list, dim=-1)
+        out = self.mlp(latent)
+        return self.deterministic_output(out)
+
+    @torch.jit.export
+    def reset(self) -> None:
+        """Reset recurrent export state (no-op for CNN exports)."""
+        pass
+
+
+class _OnnxImageOnlyCNNModel(nn.Module):
+    """Exportable image-only CNN model for ONNX.
+
+    Unlike ``rsl_rl``'s exportable CNN model, the forward pass only takes the 2D observation
+    groups as input, without a placeholder for the (empty) 1D observations.
+    """
+
+    def __init__(self, model: CNNModel, verbose: bool):
+        super().__init__()
+        self.verbose = verbose
+        # Convert ModuleDict to ModuleList for ordered iteration
+        self.cnns = nn.ModuleList([copy.deepcopy(model.cnns[g]) for g in model.obs_groups_2d])
+        self.mlp = copy.deepcopy(model.mlp)
+        if model.distribution is not None:
+            self.deterministic_output = model.distribution.as_deterministic_output_module()
+        else:
+            self.deterministic_output = nn.Identity()
+
+        self.obs_groups_2d = model.obs_groups_2d
+        self.obs_dims_2d = model.obs_dims_2d
+        self.obs_channels_2d = model.obs_channels_2d
+
+    def forward(self, *obs_2d: torch.Tensor) -> torch.Tensor:
+        """Run deterministic inference for ONNX export."""
+        latent_cnn_list = []
+        for i, cnn in enumerate(self.cnns):
+            latent_cnn_list.append(cnn(obs_2d[i]))
+        latent = torch.cat(latent_cnn_list, dim=-1)
+        out = self.mlp(latent)
+        return self.deterministic_output(out)
+
+    def get_dummy_inputs(self) -> tuple[torch.Tensor, ...]:
+        """Return representative dummy inputs for ONNX tracing."""
+        dummy_2d = []
+        for i in range(len(self.obs_groups_2d)):
+            h, w = self.obs_dims_2d[i]
+            c = self.obs_channels_2d[i]
+            dummy_2d.append(torch.zeros(1, c, h, w))
+        return tuple(dummy_2d)
+
+    @property
+    def input_names(self) -> list[str]:
+        """Return ONNX input tensor names."""
+        return list(self.obs_groups_2d)
+
+    @property
+    def output_names(self) -> list[str]:
+        """Return ONNX output tensor names."""
+        return ["actions"]


Duplicated initialization logic across wrapper classes

_TorchImageOnlyCNNModel and _OnnxImageOnlyCNNModel share identical __init__ logic: deep-copying cnns, mlp, and computing deterministic_output. The forward body is also nearly identical. A common base class (or a single wrapper that can serve both export paths after construction) would eliminate the redundancy and ensure future changes to one path are automatically reflected in the other.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Same rationale as the other thread: the duplication mirrors rsl_rl's own _TorchCNNModel / _OnnxCNNModel, which duplicate the identical init/forward logic between themselves. The two classes can't share a forward (TorchScript can't script *args, while the ONNX path wants positional tensor inputs for named graph inputs), and I'd rather keep these recognizable as the image-only counterparts of the upstream pair than introduce a base-class hierarchy upstream doesn't have. If maintainers prefer a shared mixin I'm happy to refactor.

That's a solid rationale — the *args vs list[Tensor] distinction is a real TorchScript constraint, and keeping the image-only wrappers structurally parallel to the upstream pair makes the intent immediately obvious to anyone familiar with rsl_rl. No objection to leaving it as-is. Thanks for the context!

Tip: You can customize Greptile's behavior for this repo with .greptile/rules.md and .greptile/config.json.

0xadvait requested review from ClemensSchwarke and Mayankm96 as code owners June 11, 2026 23:53

github-actions Bot added bug Something isn't working isaac-lab Related to Isaac Lab team labels Jun 11, 2026

0xadvait mentioned this pull request Jun 11, 2026

[Bug Report] ONNX Export Fails for ActorCriticCNN with Mixed 1D + CNN Observations #4592

Open

4 tasks

greptile-apps Bot reviewed Jun 11, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixes JIT and ONNX export of image-only CNN policies#6159

Fixes JIT and ONNX export of image-only CNN policies#6159
0xadvait wants to merge 1 commit into
isaac-sim:developfrom
0xadvait:0xadvait/fix-cnn-image-only-export

0xadvait commented Jun 11, 2026

Uh oh!

greptile-apps Bot commented Jun 11, 2026

Uh oh!

greptile-apps Bot Jun 11, 2026

Uh oh!

0xadvait Jun 11, 2026

Uh oh!

greptile-apps Bot Jun 11, 2026

Uh oh!

0xadvait Jun 11, 2026

Uh oh!

greptile-apps Bot Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

0xadvait commented Jun 11, 2026

Description

Type of change

Checklist

Uh oh!

greptile-apps Bot commented Jun 11, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

0xadvait Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

0xadvait Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant