OMPE-95818: Fix bimodal Dexsuite-Lift success rate by jmart-nv · Pull Request #6155 · isaac-sim/IsaacLab

jmart-nv · 2026-06-11T21:47:33Z

Description

Kuka-Allegro-Lift converged to either ~0.0 or ~0.8 success depending on seed. The reward landscape had gradient gaps: a narrow reach kernel and farmable constant grip reward that made holding the cube in place a plateau. Reshaped the reward so progress toward the goal is the dominant, unfarmable signal:

Add delivery_progress (weight 6): contact-gated reward for net progress of the object toward the commanded goal; unfarmable (clamped, latched).
Fix the approach desert on the fingers-to-object reach reward: widen its tanh kernel (std 0.4 to 0.8) and raise its pre-grasp reach credit via a new no_contact_scale param on object_ee_distance (default 0.1, behavior unchanged elsewhere; 0.3 here).
Widen the object-to-goal position-tracking kernel (std 0.2 to 0.5) to broaden the gradient near the goal.
Decay grip rewards after first grasp (floor 0.3 over 200 steps) so holding without delivering is no longer farmable.
Add pure-logic CPU unit tests for the decay and delivery_progress terms (latch, decay-to-floor, no-regrip-reset, contact gating, d0 capture/refresh).

Type of change

Bug fix (non-breaking change which fixes an issue)

Screenshots

Checklist

I have read and understood the contribution guidelines
I have run the pre-commit checks with ./isaaclab.sh --format
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
I have updated the changelog and the corresponding version in the extension's config/extension.toml file
I have added my name to the CONTRIBUTORS.md or my name already exists there

Kuka-Allegro-Lift converged to either ~0.0 or ~0.8 success depending on seed. The reward landscape had gradient gaps: a narrow reach kernel and farmable constant grip reward that made holding the cube in place a plateau. Reshaped the reward so progress toward the goal is the dominant, unfarmable signal: - Add delivery_progress (weight 6): contact-gated reward for net progress of the object toward the commanded goal; unfarmable (clamped, latched). - Fix the approach desert on the fingers-to-object reach reward: widen its tanh kernel (std 0.4→0.8) and raise its pre-grasp reach credit via a new no_contact_scale param on object_ee_distance (default 0.1, behaviour unchanged elsewhere; 0.3 here). - Widen the object-to-goal position-tracking kernel (std 0.2→0.5) to broaden the gradient near the goal. - Decay grip rewards after first grasp (floor 0.3 over 200 steps) so holding without delivering is no longer farmable. - Add pure-logic CPU unit tests for the decay and delivery_progress terms (latch, decay-to-floor, no-regrip-reset, contact gating, d0 capture/refresh).

greptile-apps · 2026-06-11T21:59:10Z

Greptile Summary

This PR reshapes the Dexsuite Kuka-Allegro-Lift reward to eliminate bimodal convergence (seeds collapsing to ~0 or ~0.8 success). The core insight is that constant grip rewards created an unfarmable plateau; the fix introduces a dominant contact-gated delivery-progress signal and time-since-grasp decay on all grip rewards.

Adds delivery_progress (weight 6): linear, clamped, contact-gated progress fraction (d0 − d) / d0; d0 is captured at episode start or goal resample and latched, so holding the object in place earns nothing. _GraspAgeDecay subclasses fade grip rewards from 1.0 → 0.3 over 200 steps after first grasp, preventing the hold-in-place plateau.
Widens reach and position-tracking tanh kernels (std 0.4→0.8 and 0.2→0.5) and raises the pre-grasp reach gate via a new no_contact_scale parameter on object_ee_distance.
Adds CPU-only unit tests verifying decay latch, floor clamp, no-regrip-reset, contact gating, and d0 capture/refresh behavior.

Confidence Score: 4/5

Safe to merge; the new reward terms are mathematically correct and well-tested, with changes confined to the Dexsuite Kuka-Allegro reward stack.

The decay latch, floor clamp, no-regrip-reset, and d0 capture/refresh logic are all implemented correctly and covered by unit tests. The one open question is that d0 is fixed at episode start before any contact, so if the object drifts away from the goal during approach the dominant delivery_progress reward stays dark until the object re-crosses the original threshold. Configuration wiring follows established patterns and the init.pyi stub is kept in sync.

rewards.py — specifically the d0 capture logic in delivery_progress.call and the double contacts() call in object_ee_distance_decay, both of which are correct but merit a second read.

Important Files Changed

Filename	Overview
source/isaaclab_tasks/isaaclab_tasks/core/dexsuite/mdp/rewards.py	Adds delivery_progress, _GraspAgeDecay, and three decay-subclass reward terms; decay latch/floor logic and d0 capture are sound, but d0 is fixed at episode start before first contact, which can create a zero-progress window if the object is displaced during approach.
source/isaaclab_tasks/isaaclab_tasks/core/dexsuite/dexsuite_env_cfg.py	Widens tanh kernels (std 0.4 to 0.8, 0.2 to 0.5), adds delivery_progress term with weight 6 to the base RewardsCfg; required params follow the existing deferred-params-via-post_init pattern and are correctly wired by the Kuka-Allegro subclass.
source/isaaclab_tasks/isaaclab_tasks/core/dexsuite/config/kuka_allegro/dexsuite_kuka_allegro_env_cfg.py	Switches grip rewards to decay variants; adds thumb/finger params for delivery_progress in post_init; contact_count_decay gains the required grasp-latch detection args.
source/isaaclab_tasks/test/core/test_dexsuite_lift_rewards.py	Comprehensive CPU-only unit tests for decay and progress logic; all use new to bypass ManagerTermBase init, missing coverage of actual initialization code paths.
source/isaaclab_tasks/isaaclab_tasks/core/dexsuite/mdp/init.pyi	Adds four new reward symbols to all and the from-import block; matches the implementations exactly.

Comments Outside Diff (1)

source/isaaclab_tasks/isaaclab_tasks/core/dexsuite/mdp/rewards.py, line 339-365 (link)

Double contacts() read with different thresholds in object_ee_distance_decay

object_ee_distance (called at line 353) already invokes contacts(env, contact_threshold=1.0, ...) internally to compute contact_bonus. Then contacts(env, grasp_threshold=0.1, ...) is called again directly for grasped. The double read is correct since the two calls serve distinct roles (bonus scale vs. decay latch), but it is non-obvious. A brief comment clarifying this would prevent future maintainers from treating it as redundant.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

_{Reviews (1): Last reviewed commit: "OMPE-95818: Fix bimodal Dexsuite-Lift su..." | Re-trigger Greptile}

greptile-apps · 2026-06-11T21:59:14Z

+        # (Re)capture d0 on reset or goal resample. Updated unconditionally (vectorised) to avoid a
+        # per-step host-device sync; unflagged envs keep their stored d0/goal via the masked where.
+        capture = self._need_capture | (torch.linalg.norm(des_pos_w - self._goal_w, dim=1) > 1e-4)
+        self._d0 = torch.where(capture, d.clamp(min=1e-3), self._d0)
+        self._goal_w = torch.where(capture.unsqueeze(-1), des_pos_w, self._goal_w)
+        self._need_capture &= ~capture
+        progress = ((self._d0 - d) / self._d0).clamp(0.0, 1.0)
+        return progress * contacts(env, contact_threshold, thumb_name, finger_names).float()


d0 is captured contact-free, so a pre-grasp object bump can permanently raise the progress baseline

d0 is (re)captured unconditionally on reset or goal resample before any contact occurs. If during the approach phase the robot accidentally nudges the object away from the goal, d exceeds d0, causing progress = clamp((d0-d)/d0, 0, 1) = 0 until the object is returned past the original d0 threshold. The dominant delivery_progress (weight 6) stays dark until then, potentially creating a reward dead zone. Consider capturing d0 only at first contact, or refreshing it each step while contacts == False. Was the decision to fix d0 at episode start (rather than at first-contact) intentional, knowing that pre-grasp object displacement could create a zero-reward zone before d < d0?

greptile-apps · 2026-06-11T21:59:16Z

+    term = rewards.good_finger_contact_decay.__new__(rewards.good_finger_contact_decay)
+    term._t_grasp = torch.full((num_envs,), -1, dtype=torch.long)
+    return term


__new__ bypass skips ManagerTermBase.__init__ and future subclass init logic

All decay-term tests construct instances via ClassName.__new__(ClassName) and manually populate _t_grasp / _d0 / etc., sidestepping ManagerTermBase.__init__. If initialization logic is ever added to _GraspAgeDecay.__init__ or delivery_progress.__init__, the tests will silently not exercise it. Adding a smoke-test that calls the real __init__ with a minimal stub config and env would make the suite more future-proof.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

github-actions Bot added the isaac-lab Related to Isaac Lab team label Jun 11, 2026

greptile-apps Bot reviewed Jun 11, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OMPE-95818: Fix bimodal Dexsuite-Lift success rate#6155

OMPE-95818: Fix bimodal Dexsuite-Lift success rate#6155
jmart-nv wants to merge 1 commit into
isaac-sim:developfrom
jmart-nv:jmart/dex-lift-bimodal

jmart-nv commented Jun 11, 2026

Uh oh!

greptile-apps Bot commented Jun 11, 2026 •

edited

Loading

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot Jun 11, 2026

Uh oh!

greptile-apps Bot Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jmart-nv commented Jun 11, 2026

Description

Type of change

Screenshots

Checklist

Uh oh!

greptile-apps Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

greptile-apps Bot commented Jun 11, 2026 •

edited

Loading