Skip to content

OMPE-95818: Fix bimodal Dexsuite-Lift success rate#6155

Open
jmart-nv wants to merge 1 commit into
isaac-sim:developfrom
jmart-nv:jmart/dex-lift-bimodal
Open

OMPE-95818: Fix bimodal Dexsuite-Lift success rate#6155
jmart-nv wants to merge 1 commit into
isaac-sim:developfrom
jmart-nv:jmart/dex-lift-bimodal

Conversation

@jmart-nv

Copy link
Copy Markdown

Description

Kuka-Allegro-Lift converged to either ~0.0 or ~0.8 success depending on seed. The reward landscape had gradient gaps: a narrow reach kernel and farmable constant grip reward that made holding the cube in place a plateau. Reshaped the reward so progress toward the goal is the dominant, unfarmable signal:

  • Add delivery_progress (weight 6): contact-gated reward for net progress of the object toward the commanded goal; unfarmable (clamped, latched).
  • Fix the approach desert on the fingers-to-object reach reward: widen its tanh kernel (std 0.4 to 0.8) and raise its pre-grasp reach credit via a new no_contact_scale param on object_ee_distance (default 0.1, behavior unchanged elsewhere; 0.3 here).
  • Widen the object-to-goal position-tracking kernel (std 0.2 to 0.5) to broaden the gradient near the goal.
  • Decay grip rewards after first grasp (floor 0.3 over 200 steps) so holding without delivering is no longer farmable.
  • Add pure-logic CPU unit tests for the decay and delivery_progress terms (latch, decay-to-floor, no-regrip-reset, contact gating, d0 capture/refresh).

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Screenshots

reward_landscape_before_after

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

Kuka-Allegro-Lift converged to either ~0.0 or ~0.8 success depending on seed. The reward landscape had gradient gaps: a narrow reach kernel and farmable constant grip reward that made holding the cube in place a plateau. Reshaped the reward so progress toward the goal is the dominant, unfarmable signal:

- Add delivery_progress (weight 6): contact-gated reward for net progress of the object toward the commanded goal; unfarmable (clamped, latched).
- Fix the approach desert on the fingers-to-object reach reward: widen its tanh kernel (std 0.4→0.8) and raise its pre-grasp reach credit via a new no_contact_scale param on object_ee_distance (default 0.1, behaviour unchanged elsewhere; 0.3 here).
- Widen the object-to-goal position-tracking kernel (std 0.2→0.5) to broaden the gradient near the goal.
- Decay grip rewards after first grasp (floor 0.3 over 200 steps) so holding without delivering is no longer farmable.
- Add pure-logic CPU unit tests for the decay and delivery_progress terms (latch, decay-to-floor, no-regrip-reset, contact gating, d0 capture/refresh).
@github-actions github-actions Bot added the isaac-lab Related to Isaac Lab team label Jun 11, 2026
@greptile-apps

greptile-apps Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR reshapes the Dexsuite Kuka-Allegro-Lift reward to eliminate bimodal convergence (seeds collapsing to ~0 or ~0.8 success). The core insight is that constant grip rewards created an unfarmable plateau; the fix introduces a dominant contact-gated delivery-progress signal and time-since-grasp decay on all grip rewards.

  • Adds delivery_progress (weight 6): linear, clamped, contact-gated progress fraction (d0 − d) / d0; d0 is captured at episode start or goal resample and latched, so holding the object in place earns nothing. _GraspAgeDecay subclasses fade grip rewards from 1.0 → 0.3 over 200 steps after first grasp, preventing the hold-in-place plateau.
  • Widens reach and position-tracking tanh kernels (std 0.4→0.8 and 0.2→0.5) and raises the pre-grasp reach gate via a new no_contact_scale parameter on object_ee_distance.
  • Adds CPU-only unit tests verifying decay latch, floor clamp, no-regrip-reset, contact gating, and d0 capture/refresh behavior.

Confidence Score: 4/5

Safe to merge; the new reward terms are mathematically correct and well-tested, with changes confined to the Dexsuite Kuka-Allegro reward stack.

The decay latch, floor clamp, no-regrip-reset, and d0 capture/refresh logic are all implemented correctly and covered by unit tests. The one open question is that d0 is fixed at episode start before any contact, so if the object drifts away from the goal during approach the dominant delivery_progress reward stays dark until the object re-crosses the original threshold. Configuration wiring follows established patterns and the init.pyi stub is kept in sync.

rewards.py — specifically the d0 capture logic in delivery_progress.call and the double contacts() call in object_ee_distance_decay, both of which are correct but merit a second read.

Important Files Changed

Filename Overview
source/isaaclab_tasks/isaaclab_tasks/core/dexsuite/mdp/rewards.py Adds delivery_progress, _GraspAgeDecay, and three decay-subclass reward terms; decay latch/floor logic and d0 capture are sound, but d0 is fixed at episode start before first contact, which can create a zero-progress window if the object is displaced during approach.
source/isaaclab_tasks/isaaclab_tasks/core/dexsuite/dexsuite_env_cfg.py Widens tanh kernels (std 0.4 to 0.8, 0.2 to 0.5), adds delivery_progress term with weight 6 to the base RewardsCfg; required params follow the existing deferred-params-via-post_init pattern and are correctly wired by the Kuka-Allegro subclass.
source/isaaclab_tasks/isaaclab_tasks/core/dexsuite/config/kuka_allegro/dexsuite_kuka_allegro_env_cfg.py Switches grip rewards to decay variants; adds thumb/finger params for delivery_progress in post_init; contact_count_decay gains the required grasp-latch detection args.
source/isaaclab_tasks/test/core/test_dexsuite_lift_rewards.py Comprehensive CPU-only unit tests for decay and progress logic; all use new to bypass ManagerTermBase init, missing coverage of actual initialization code paths.
source/isaaclab_tasks/isaaclab_tasks/core/dexsuite/mdp/init.pyi Adds four new reward symbols to all and the from-import block; matches the implementations exactly.

Comments Outside Diff (1)

  1. source/isaaclab_tasks/isaaclab_tasks/core/dexsuite/mdp/rewards.py, line 339-365 (link)

    P2 Double contacts() read with different thresholds in object_ee_distance_decay

    object_ee_distance (called at line 353) already invokes contacts(env, contact_threshold=1.0, ...) internally to compute contact_bonus. Then contacts(env, grasp_threshold=0.1, ...) is called again directly for grasped. The double read is correct since the two calls serve distinct roles (bonus scale vs. decay latch), but it is non-obvious. A brief comment clarifying this would prevent future maintainers from treating it as redundant.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Reviews (1): Last reviewed commit: "OMPE-95818: Fix bimodal Dexsuite-Lift su..." | Re-trigger Greptile

Comment on lines +265 to +272
# (Re)capture d0 on reset or goal resample. Updated unconditionally (vectorised) to avoid a
# per-step host-device sync; unflagged envs keep their stored d0/goal via the masked where.
capture = self._need_capture | (torch.linalg.norm(des_pos_w - self._goal_w, dim=1) > 1e-4)
self._d0 = torch.where(capture, d.clamp(min=1e-3), self._d0)
self._goal_w = torch.where(capture.unsqueeze(-1), des_pos_w, self._goal_w)
self._need_capture &= ~capture
progress = ((self._d0 - d) / self._d0).clamp(0.0, 1.0)
return progress * contacts(env, contact_threshold, thumb_name, finger_names).float()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 d0 is captured contact-free, so a pre-grasp object bump can permanently raise the progress baseline

d0 is (re)captured unconditionally on reset or goal resample before any contact occurs. If during the approach phase the robot accidentally nudges the object away from the goal, d exceeds d0, causing progress = clamp((d0-d)/d0, 0, 1) = 0 until the object is returned past the original d0 threshold. The dominant delivery_progress (weight 6) stays dark until then, potentially creating a reward dead zone. Consider capturing d0 only at first contact, or refreshing it each step while contacts == False. Was the decision to fix d0 at episode start (rather than at first-contact) intentional, knowing that pre-grasp object displacement could create a zero-reward zone before d < d0?

Comment on lines +34 to +36
term = rewards.good_finger_contact_decay.__new__(rewards.good_finger_contact_decay)
term._t_grasp = torch.full((num_envs,), -1, dtype=torch.long)
return term

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 __new__ bypass skips ManagerTermBase.__init__ and future subclass init logic

All decay-term tests construct instances via ClassName.__new__(ClassName) and manually populate _t_grasp / _d0 / etc., sidestepping ManagerTermBase.__init__. If initialization logic is ever added to _GraspAgeDecay.__init__ or delivery_progress.__init__, the tests will silently not exercise it. Adding a smoke-test that calls the real __init__ with a minimal stub config and env would make the suite more future-proof.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant