Skip to content

Fixes JIT and ONNX export of image-only CNN policies#6159

Open
0xadvait wants to merge 1 commit into
isaac-sim:developfrom
0xadvait:0xadvait/fix-cnn-image-only-export
Open

Fixes JIT and ONNX export of image-only CNN policies#6159
0xadvait wants to merge 1 commit into
isaac-sim:developfrom
0xadvait:0xadvait/fix-cnn-image-only-export

Conversation

@0xadvait

Copy link
Copy Markdown

Description

The CNNModel in isaaclab_rl.rsl_rl.models overrides get_latent to support image-only observations during training (e.g. the Isaac-Cartpole-RGB-* camera tasks, whose only observation group is the camera image). The export path was, however, left on the upstream wrappers: as_jit() / as_onnx() from rsl_rl always declare a 1D observation input. For image-only policies, the exported TorchScript module must be called with a zero-width placeholder tensor, and the exported ONNX graph declares a mandatory input obs with shape [1, 0]:

inputs: [('obs', [1, 0]), ('policy', [1, 3, 100, 100])]

This is awkward to deploy (every consumer has to construct and feed an empty tensor) and zero-width inputs are rejected by some downstream toolchains.

Fix: override as_jit() / as_onnx() in the IsaacLab CNNModel to return image-only export wrappers when no 1D observation groups are active. The exported models then only take the 2D observation groups as inputs:

inputs: [('policy', [1, 3, 100, 100])]

Models with mixed 1D + 2D observation groups keep the upstream export interface unchanged.

Verification (CPU, no Isaac Sim required):

  • New test module source/isaaclab_rl/test/test_rsl_rl_models.py covering: image-only forward, image-only JIT export (script -> save -> reload -> output parity vs the original model), image-only ONNX export (graph declares only the image input + onnxruntime output parity), and mixed-obs JIT/ONNX parity as regression guards. The two image-only export tests fail without the fix and pass with it; the mixed-obs tests pass before and after.
  • End-to-end exercise of the full OnPolicyRunner.export_policy_to_jit / export_policy_to_onnx call pattern (same torch.onnx.export arguments) across MLP, mixed CNN, image-only CNN and GRU policies with rsl-rl-lib==5.0.1 (the pinned version): all exports succeed with JIT parity exact and ONNX parity within 1.5e-7.

Related to #4592

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

The CNNModel subclass in isaaclab_rl supports image-only observations
during training by overriding get_latent, but the export wrappers
inherited from rsl_rl always declare a 1D observation input. For
image-only policies (e.g. the cartpole camera tasks), the exported
TorchScript and ONNX graphs therefore require feeding a zero-width
placeholder tensor, which is awkward to deploy and rejected by some
downstream toolchains.

Override as_jit and as_onnx to return image-only export wrappers when
no 1D observation groups are active. The exported models now only
take the 2D observation groups as inputs. Models with mixed 1D and 2D
observations keep the upstream export interface.

Verified on CPU by exporting and reloading both variants and checking
output parity against the original model (torch.jit and onnxruntime).

Related to isaac-sim#4592

Signed-off-by: Advait Jayant <advait@vannalabs.ai>
@greptile-apps

greptile-apps Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes JIT and ONNX export of image-only CNN policies by overriding as_jit() and as_onnx() in the Isaac Lab CNNModel subclass. When no 1D observation groups are present, the new wrappers (_TorchImageOnlyCNNModel and _OnnxImageOnlyCNNModel) expose only the 2D (image) inputs, eliminating the zero-width obs placeholder tensor that previously broke some downstream toolchains.

  • Adds as_jit/as_onnx overrides in CNNModel that branch on self.obs_groups; models with mixed 1D+2D observations fall through to the upstream implementation unchanged.
  • Introduces two private export wrapper classes that deep-copy the CNN encoders, MLP, and deterministic output head, then expose a clean image-only forward signature for TorchScript and ONNX tracing respectively.
  • New test file covers image-only forward, JIT round-trip, ONNX graph input verification, and mixed-obs regression parity.

Confidence Score: 4/5

Safe to merge — the change is well-scoped, mixed-obs models fall through to the unmodified upstream path, and the new export wrappers are tested for both output parity and graph structure.

The fix correctly targets the image-only code path and leaves the mixed-obs path untouched. The only findings are a dead verbose field and duplicated initialization code across the two wrapper classes — neither affects correctness.

source/isaaclab_rl/isaaclab_rl/rsl_rl/models.py — the two private wrapper classes share near-identical init/forward logic and carry a dead verbose attribute worth cleaning up before the code grows further.

Important Files Changed

Filename Overview
source/isaaclab_rl/isaaclab_rl/rsl_rl/models.py Adds as_jit/as_onnx overrides and two private export wrappers to remove zero-width obs inputs from image-only CNN policies; logic is correct but contains a dead verbose attribute and duplicated initialization code.
source/isaaclab_rl/test/test_rsl_rl_models.py New test module covering image-only and mixed-obs JIT/ONNX export parity; guards skip gracefully when optional deps are absent.
source/isaaclab_rl/changelog.d/0xadvait-fix-cnn-image-only-export.rst Changelog fragment accurately describes the fix.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["CNNModel.as_jit() / as_onnx()"] --> B{obs_groups empty?}
    B -- "Yes (image-only)" --> C["Return _TorchImageOnlyCNNModel / _OnnxImageOnlyCNNModel"]
    B -- "No (mixed / MLP-only)" --> D["Return upstream rsl_rl wrapper (unchanged)"]

    C --> E["forward(*obs_2d) — image tensors only"]
    E --> F["ModuleList CNNs"]
    F --> G["torch.cat latents"]
    G --> H["MLP"]
    H --> I["deterministic_output"]
    I --> J["actions tensor"]

    D --> K["forward(obs, obs_2d) — obs may be zero-width"]
    K --> L["Upstream rsl_rl CNNExportModel"]
Loading

Reviews (1): Last reviewed commit: "Fix JIT and ONNX export of image-only CN..." | Re-trigger Greptile


def __init__(self, model: CNNModel, verbose: bool):
super().__init__()
self.verbose = verbose

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Dead verbose attribute

self.verbose is assigned in __init__ but is never read anywhere in _OnnxImageOnlyCNNModel. The upstream rsl_rl ONNX wrapper presumably prints tracing info when verbose=True, but this class omits that behavior without surfacing the omission. If verbosity is intentionally dropped here, the field should not be stored; if it should be forwarded (e.g. to torch.onnx.export), the call site in as_onnx has the parameter ready.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kept deliberately for structural parity with the upstream wrappers: rsl_rl's _OnnxCNNModel stores self.verbose the same way without reading it in forward (the runner passes verbose to torch.onnx.export separately). Keeping the same shape means these classes stay directly diff-able against the upstream ones they specialize, which also makes them easy to upstream into rsl_rl later.

Comment on lines +55 to +135
class _TorchImageOnlyCNNModel(nn.Module):
"""Exportable image-only CNN model for JIT.

Unlike ``rsl_rl``'s exportable CNN model, the forward pass only takes the 2D observation
groups as input, without a placeholder for the (empty) 1D observations.
"""

def __init__(self, model: CNNModel):
super().__init__()
# Convert ModuleDict to ModuleList for ordered iteration
self.cnns = nn.ModuleList([copy.deepcopy(model.cnns[g]) for g in model.obs_groups_2d])
self.mlp = copy.deepcopy(model.mlp)
if model.distribution is not None:
self.deterministic_output = model.distribution.as_deterministic_output_module()
else:
self.deterministic_output = nn.Identity()

def forward(self, obs_2d: list[torch.Tensor]) -> torch.Tensor:
"""Run deterministic inference from the 2D observation groups."""
latent_cnn_list = []
for i, cnn in enumerate(self.cnns): # We assume obs_2d list matches the order of obs_groups_2d
latent_cnn_list.append(cnn(obs_2d[i]))
latent = torch.cat(latent_cnn_list, dim=-1)
out = self.mlp(latent)
return self.deterministic_output(out)

@torch.jit.export
def reset(self) -> None:
"""Reset recurrent export state (no-op for CNN exports)."""
pass


class _OnnxImageOnlyCNNModel(nn.Module):
"""Exportable image-only CNN model for ONNX.

Unlike ``rsl_rl``'s exportable CNN model, the forward pass only takes the 2D observation
groups as input, without a placeholder for the (empty) 1D observations.
"""

def __init__(self, model: CNNModel, verbose: bool):
super().__init__()
self.verbose = verbose
# Convert ModuleDict to ModuleList for ordered iteration
self.cnns = nn.ModuleList([copy.deepcopy(model.cnns[g]) for g in model.obs_groups_2d])
self.mlp = copy.deepcopy(model.mlp)
if model.distribution is not None:
self.deterministic_output = model.distribution.as_deterministic_output_module()
else:
self.deterministic_output = nn.Identity()

self.obs_groups_2d = model.obs_groups_2d
self.obs_dims_2d = model.obs_dims_2d
self.obs_channels_2d = model.obs_channels_2d

def forward(self, *obs_2d: torch.Tensor) -> torch.Tensor:
"""Run deterministic inference for ONNX export."""
latent_cnn_list = []
for i, cnn in enumerate(self.cnns):
latent_cnn_list.append(cnn(obs_2d[i]))
latent = torch.cat(latent_cnn_list, dim=-1)
out = self.mlp(latent)
return self.deterministic_output(out)

def get_dummy_inputs(self) -> tuple[torch.Tensor, ...]:
"""Return representative dummy inputs for ONNX tracing."""
dummy_2d = []
for i in range(len(self.obs_groups_2d)):
h, w = self.obs_dims_2d[i]
c = self.obs_channels_2d[i]
dummy_2d.append(torch.zeros(1, c, h, w))
return tuple(dummy_2d)

@property
def input_names(self) -> list[str]:
"""Return ONNX input tensor names."""
return list(self.obs_groups_2d)

@property
def output_names(self) -> list[str]:
"""Return ONNX output tensor names."""
return ["actions"]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Duplicated initialization logic across wrapper classes

_TorchImageOnlyCNNModel and _OnnxImageOnlyCNNModel share identical __init__ logic: deep-copying cnns, mlp, and computing deterministic_output. The forward body is also nearly identical. A common base class (or a single wrapper that can serve both export paths after construction) would eliminate the redundancy and ensure future changes to one path are automatically reflected in the other.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same rationale as the other thread: the duplication mirrors rsl_rl's own _TorchCNNModel / _OnnxCNNModel, which duplicate the identical init/forward logic between themselves. The two classes can't share a forward (TorchScript can't script *args, while the ONNX path wants positional tensor inputs for named graph inputs), and I'd rather keep these recognizable as the image-only counterparts of the upstream pair than introduce a base-class hierarchy upstream doesn't have. If maintainers prefer a shared mixin I'm happy to refactor.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a solid rationale — the *args vs list[Tensor] distinction is a real TorchScript constraint, and keeping the image-only wrappers structurally parallel to the upstream pair makes the intent immediately obvious to anyone familiar with rsl_rl. No objection to leaving it as-is. Thanks for the context!

Tip: You can customize Greptile's behavior for this repo with .greptile/rules.md and .greptile/config.json.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant