Skip to content

Expose MMFF forcefield API to python#119

Merged
scal444 merged 5 commits intoNVIDIA-Digital-Bio:mainfrom
scal444:new_pr2_chp
Mar 30, 2026
Merged

Expose MMFF forcefield API to python#119
scal444 merged 5 commits intoNVIDIA-Digital-Bio:mainfrom
scal444:new_pr2_chp

Conversation

@scal444
Copy link
Copy Markdown
Collaborator

@scal444 scal444 commented Mar 25, 2026

Add Python bindings for MMFF batched forcefield energy and gradient
computation via Boost.Python. Includes property bridge from RDKit
MMFFMolProperties to internal MMFFProperties transport.

- NativeMMFFBatchedForcefield C++ binding with compute_energy/compute_gradients
- MMFFBatchedForcefield Python wrapper with per-molecule properties, conf_id,
  nonBondedThreshold, and ignoreInterfragInteractions support
- _mmff_bridge module for RDKit MMFFMolProperties to native conversion
- MMFFComputeEnergies/MMFFComputeGradients convenience functions
- Tests for single-mol, batched, property variants, per-molecule properties,
  and conformer id selection vs RDKit reference

scal444 added 3 commits March 25, 2026 14:50
Add Python bindings for MMFF batched forcefield energy and gradient
computation via Boost.Python. Includes property bridge from RDKit
MMFFMolProperties to internal MMFFProperties transport.

- NativeMMFFBatchedForcefield C++ binding with compute_energy/compute_gradients
- MMFFBatchedForcefield Python wrapper with per-molecule properties, conf_id,
  nonBondedThreshold, and ignoreInterfragInteractions support
- _mmff_bridge module for RDKit MMFFMolProperties to native conversion
- MMFFComputeEnergies/MMFFComputeGradients convenience functions
- Tests for single-mol, batched, property variants, per-molecule properties,
  and conformer id selection vs RDKit reference

Made-with: Cursor
@scal444 scal444 requested a review from evasnow1992 March 25, 2026 20:15
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 25, 2026

Greptile Summary

This PR exposes nvMolKit's MMFF batched forcefield to Python via a new Boost.Python extension (_batchedForcefield), a bridge module (_mmff_bridge) that converts RDKit MMFFMolProperties into nvMolKit's internal transport, a Python wrapper (MMFFBatchedForcefield) that handles scalar/per-molecule normalisation of all parameters, and a thorough test suite validating against RDKit reference values.

Key highlights:

  • The previous id()-based memory-leak and phantom-reuse issue in the bridge has been addressed with weakref.WeakKeyDictionary.
  • The C++ layer uses an async device-vector copy pattern with correct cudaStreamSynchronize ordering and proper CUDA error propagation.
  • The extractMMFFPropertiesList fallback-to-defaults path in C++ is unreachable from the Python wrapper (which always sends exactly numMols entries), so it is purely defensive.
  • The PR description lists MMFFComputeEnergies/MMFFComputeGradients convenience functions as delivered, but these do not appear in the changed files — the description should be updated or the functions added.
  • The extract_mmff_settings error path does not identify the missing key by name, making debugging in exotic RDKit builds unnecessarily difficult.
  • num_molecules is exposed as a public attribute alongside __len__, creating minor API duplication.

Confidence Score: 5/5

Safe to merge; all remaining findings are minor style and documentation issues with no impact on correctness.

No P0 or P1 issues were found. The prior concerns about memory leaks and id()-reuse have been addressed with WeakKeyDictionary. The three remaining comments are all P2 (missing convenience functions from PR description, a non-specific error message, and a redundant public attribute), none of which affect runtime correctness.

No files require special attention.

Important Files Changed

Filename Overview
nvmolkit/batchedForcefield.cpp New C++ Boost.Python extension exposing NativeMMFFBatchedForcefield with compute_energy/compute_gradients; CUDA error checking, async device-vector copy pattern, and gradient splitting all look correct.
nvmolkit/batchedForcefield.py Python wrapper normalising scalar/per-molecule properties, conf_ids, thresholds, and lazy-building the native forcefield; minor issues: missing MMFFComputeEnergies/Gradients convenience functions and a redundant num_molecules attribute alongside len.
nvmolkit/_mmff_bridge.py New bridge module converting RDKit MMFFMolProperties to nvMolKit's internal transport; uses WeakKeyDictionary to avoid the prior id()-based memory leak and uses fallback getters for externally-created properties objects; error message when a getter is missing could be more specific.
nvmolkit/tests/test_batched_forcefield.py Comprehensive test suite covering single-mol, batched, property variants, per-molecule properties, and conformer id selection against RDKit reference values; tolerances and helper utilities look appropriate.
nvmolkit/CMakeLists.txt Adds _batchedForcefield MODULE target following the same pattern as existing targets; link libraries and include directories look consistent with the rest of the build file.

Reviews (2): Last reviewed commit: "Address review comment" | Re-trigger Greptile

Comment thread nvmolkit/_mmff_bridge.py Outdated
Comment thread nvmolkit/batchedForcefield.py
Copy link
Copy Markdown
Collaborator

@evasnow1992 evasnow1992 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for exposing batch energy and gradients computation APIs to python. Changes look good to me.

@scal444 scal444 merged commit d59b033 into NVIDIA-Digital-Bio:main Mar 30, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants