Fix Evoformer compilation #7760
|
Hi @sdvillal By the way,
|
Thanks a lot for the quick review and merge @tohtana! I have fixed formatting (sorry about it, one should read the contributing guidelines before contributing...). I have not personally experienced the mismatch. We have been running on:
I could try to run the test a few times in this context and see if it happens for me, could that info be useful? In any case, I feel the extension is showing its age and it might require some love to these GEMMs and generally to make it worthwhile to use on Hopper and newer. |
`EvoformerAttnBuilder` has some problems which preclude compiling the extension in several scenarios (e.g., [isolated conda environment with cuda toolchain](aqlaboratory/openfold-3#34), lack of hardware in the system) and break some standard DeepSpeed configuration of target capabilities.

*Changes*

- Fix evoformer CUTLASS detection:
  - Allow skipping it, useful when CUTLASS is already correctly set up (e.g., in a conda environment with CUTLASS and the CUDA toolchain)
  - Fix misleading use of the deprecated nvidia-cutlass pypi package by actually using the provided bindings, while discouraging this route as [these bindings are not maintained anymore](NVIDIA/cutlass#2119)
- Fix evoformer compilation when no GPU is present:
  - this is handled correctly and more generally by builder.compute_capability_args
  - allow for cross-compilation in systems without GPU
  - allow for compilation against all available virtual architectures and binary outputs
  - see e.g., #5308
- Make all these changes configurable and explicit through documented environment variables

Tested in all scenarios.

---------

Signed-off-by: Santi Villalba <sdvillal@gmail.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
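To make the cross-compilation point concrete, here is a minimal sketch of how a list of target capabilities can be turned into `nvcc` `-gencode` flags without querying any GPU. It assumes the list uses the same syntax as PyTorch's `TORCH_CUDA_ARCH_LIST` (entries such as `8.0` or `9.0+PTX`, separated by `;` or spaces). The helper name `gencode_flags` is hypothetical, and this is an illustration only, not the code in `builder.compute_capability_args`.

```python
def gencode_flags(arch_list: str) -> list[str]:
    """Translate an arch list such as "8.0;9.0+PTX" into nvcc -gencode flags.

    Hypothetical helper for illustration; it never touches the GPU, so it
    works on build machines that have no device installed.
    """
    flags = []
    for entry in arch_list.replace(" ", ";").split(";"):
        entry = entry.strip()
        if not entry:
            continue
        emit_ptx = entry.endswith("+PTX")
        version = entry[: -len("+PTX")] if emit_ptx else entry
        num = version.replace(".", "")  # "8.0" -> "80"
        # Binary output (SASS) for the real architecture.
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
        if emit_ptx:
            # Also embed PTX for the virtual architecture so that newer
            # GPUs can JIT-compile the kernel.
            flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags


example_flags = gencode_flags("8.0;9.0+PTX")
```

With this shape, the same build can target several binary architectures plus a PTX fallback, which is what compiling "against all available virtual architectures and binary outputs" amounts to.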
|
@sdvillal Yes, I encountered the issue with H100. The test doesn't throw an error with L40S on our CI. |
`EvoformerAttnBuilder` returns instances of `Path` from `include_paths`, which then cause failures in `OpBuilder.builder` when they are passed to `strip_empty_entries`. That function calls `len` on each entry, which isn't defined for `Path` instances:

> TypeError: object of type 'PosixPath' has no len()

Fixes deepspeedai#7760
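The failure described above can be reproduced in isolation. The sketch below assumes that `strip_empty_entries` filters its input with a `len` check, as the description implies; its body here is a stand-in, not a copy of the DeepSpeed source, and the include directories are made-up examples.

```python
from pathlib import Path


def strip_empty_entries(args):
    """Stand-in for the helper named above: drop empty entries via len()."""
    return [x for x in args if len(x) > 0]


# Hypothetical include directories, returned as Path objects.
include_paths = [Path("csrc/includes"), Path("/opt/cutlass/include")]

try:
    strip_empty_entries(include_paths)
    error_message = None
except TypeError as exc:
    # Path objects do not implement __len__, so the filter fails.
    error_message = str(exc)

# Converting to str before handing the paths over avoids the problem.
fixed = strip_empty_entries([str(p) for p in include_paths])
```

Returning plain strings from `include_paths` keeps the builder compatible with any downstream code that treats the entries as sequences.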
|
After this it doesn't work at all for me anymore: One issue without this PR is with e.g.
We want to use the same installation also for older GPUs and this failure is very unexpected as the order shouldn't matter. With this PR it doesn't compile anymore because the How did that work for you? I don't see how it could ever succeed. Proposed fix for that in #7862 |
|
Thanks for the report and the fix @Flamefire, and apologies that this broke compilation for manually defined paths - I believe testing this case likely slipped my attention as I was getting the PR ready for review and it is not being tested by the CIs? I will give a hand with the PR, I suggest we propose a test to avoid this happening again. Regarding the order issue, do you think it is also related to the changes in this PR? |
The test seems to not be run on CI and nothing on CI seems to build this kernel/op at all.
No, I noticed the issue since at least 0.14.5, way before this PR. But it persists on current master |
|
As a workaround, would pointing to the relevant include dirs (e.g., using CPATH) and removing deepspeed specific configuration (i.e., setting CUTLASS_PATH=DS_IGNORE_CUTLASS_DETECTION) work for you? |
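A minimal sketch of what that workaround could look like when driving the build from Python: it prepares an environment in which `CUTLASS_PATH` is set to the sentinel mentioned above and `CPATH` points the compiler at the include directories. The helper name `evoformer_build_env` and the example paths are hypothetical, and the command that actually triggers the build is deliberately left out.

```python
import os


def evoformer_build_env(include_dirs, base_env=None):
    """Return a copy of the environment prepared for the workaround.

    Hypothetical helper: skips DeepSpeed's CUTLASS detection and lets the
    compiler find headers through CPATH instead.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Sentinel value taken from the comment above.
    env["CUTLASS_PATH"] = "DS_IGNORE_CUTLASS_DETECTION"
    # Prepend our include dirs, keeping whatever CPATH already contained.
    existing = env.get("CPATH", "")
    parts = [str(d) for d in include_dirs]
    if existing:
        parts.append(existing)
    env["CPATH"] = os.pathsep.join(parts)
    return env


example_env = evoformer_build_env(
    ["/opt/cutlass/include"], base_env={"CPATH": "/usr/local/include"}
)
```

The resulting dictionary can be passed as `env=` to `subprocess.run` for whatever command performs the build.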
|
Workaround for what exactly? This Path issue? Doesn't help for the arch issues of course |
|
I would recommend opening a separate issue for the ordering problem, if there is no one already, as it is unrelated to the changes in this PR. |
|
Done: #7863 |
`EvoformerAttnBuilder` returns instances of `Path` from `include_paths`, which then cause failures in `OpBuilder.builder` when they are passed to `strip_empty_entries`. That function calls `len` on each entry, which isn't defined for `Path` instances:

> TypeError: object of type 'PosixPath' has no len()

Fixes regression introduced in #7760

cc @sdvillal

Signed-off-by: Alexander Grund <alexander.grund@tu-dresden.de>
* Add initial pixi environment
  all tests pass, predictions seem to be correct
  corresponds to a modernized conda environment following best practices
* Reorder dependencies for easier read
* Add openfold3 as an editable dependency
* Sync cuda-python pin between pypi package and the conda environment
* Comments
  Overcommenting issues
* Add explicitly a conda yml version of the pixi environment
* Improve some wordings
* Update pixi lockfile
* Vendoring pieces of deepspeed
  incomplete, we might not need the native sources
  from upstream commit df59f203f40c8a292dd019ae68c9e6c88f107026
* Swap ninja verification with pytorch's
* Vendoring pieces of deepspeed
  incomplete, we might not need the native sources
  from upstream commit df59f203f40c8a292dd019ae68c9e6c88f107026
* Use vendored deepspeed evoformer builder
  Use vendored deepspeed in the attention primitives
* Add symlink to vendored deepspeed as in upstream
* Vendor also op_builder.__init__ from deepspeed
* Import explicitly EvoformerAttnBuilder, avoiding broken introspection magic
* Add an ignore mechanism for cutlass detection in vendored deepspeed
* Apply cutlass detection workaround and remove all nvidia-cutlass tricks from pixi environment
* Remove nvidia-cutlass from openfold-3 dependencies (fix later)
* Remove pypi ninja dependency in pixi workspace
* No need for cutlass hacks
* Add pixi config to .gitattributes
* Remove deepspeed hacks for good
* Update pixi lockfile
* Update pixi conda environment
* Remove MKL from pypi dependencies, as it is unused
* Remove aria2 from pypi dependencies, unused and not so much of a convenience
* Update lockfile
* Re-enable pure PyPI install
* Disable hack when conda is active
* More comments on cutlass python API deprecation and pytorch
* Make pixi environments (CPU, CUDA12, CUDA13, for all major platforms)
* Increase LMDB map size to make test pass in osx-arm64
* Better comments of TODOs in pixi.toml
* Pin cuequivariance until test failure is investigated
* Move deepspeed to optional dependency also in pyproject
* Pyproject: extend python version support
* Pyproject: move dependencies table together with optional-dependencies
* Pyproject: document future decision on dependency-groups
* Pyproject: reformat to consolidate indent to 4 spaces
* Pyproject: reorder dependencies for easier read
* Pixi: add scipy
* Pixi: add comment on CUDA13
* Pixi: make cuequivariance CUDA generic for its conda packages
* Pixi: add reminder about devel install
* Pyproject: fix and improve readability, add URLs
* pixi.toml: make more readable by showing first envs, then base, then variants
* pixi.toml: pin deepspeed to 0.18.3, first one with ninja detection fixed
* pixi.toml: fully enable aarch64 and cuda13, revamp docs
* pixi.lock: update
* pixi.toml: add triton to cuequivariance dependencies for CUDA13
* pixi.lock: update
* pixi.toml: include pip to allow users to play
* pixi.toml: formatting for better readability
* pixi.toml: restrict cuequivariance-cu13 to linux-64 until we unpin to >=0.8
* pixi.toml: formatting for better readability
* pixi.toml: make pytorch-gpu an isolated environment feature
  in this way we can more easily express when a package is not ready yet in CF
* pixi.toml: add environments that combine mostly pypi-based deps with CUDA from conda
* pixi.toml: add openfold3-editable-full and account for lack of cuequivariance for python=3.14
* pixi.toml: brief documentation of the pypi-dominant environments
* pixi.toml: add also the dev optional dependency group to openfold3-full
* pyproject.toml: pin cuequivariance to <0.8 until we adapt tests
* pixi.toml: add kalign to required non-pypi dependencies
* pixi.toml: add more bioinformatics tools to non-pypi
* pixi.toml: make env setup be part of the deepspeed-build feature
* pixi.toml: simplify management of pypi features
* pixi.lock: update, all tests pass A100,B300 x CUDA12,CUDA13
* pixi.toml: add table of what works and what needs test
* pixi.toml: add tasks for exporting to regular conda environment yamls
* conda environments: delete outdated modernized conda env, use new tasks instead
* pixi.toml: bump min pixi version
* pixi.toml: remove unnecessary comments
* pixi.toml: remove unnecessary envvar definition for isolating extension builds
* pixi.toml: better definition of maintenance environment
* pixi.toml: add simple task to run test and save results to an environment-specific dir
* of3: enable pickling regardless of forking strategy and platform
* of3: enable multiple data loader workers in osx mps backed
* Vendor improved deepspeed builder from upstream PR
  See: deepspeedai/DeepSpeed#7760
* pixi.lock: update
* pixi.toml: remove some comment noise
* of3: fix multiprocessing configuration corner case in osx
* docker: move outdated example dockerfiles to docker/pixi-examples
* examples: add example runner for osx inference
* pixi.toml: ensure we get the right pytorch from pypi
  something similar should actually be supported in pyproject.toml
* pixi.lock: update, fixed torch cuda mismatch in pypi environments
* pixi.toml: fix lock export + make default environment be maintenance
* pixi.toml: use a more consistent name for environment arg
* pixi.lock: update
* pixi.toml: workaround for no-default-feature breaking the test task (pixi bug)
* pixi.toml: issue with pixi pypi resolution seems solved
* Revert "pixi.toml: issue with pixi pypi resolution seems solved"
  This reverts commit ded3482.
* pixi.toml: better document problem and workaround
* pixi.toml: make the test task present in all relevant environments
  this I feel makes less surprising its use, as opposed to passing the environment as an arg to a dependent task
* pixi.toml: let CUDA13 flow freely
* pixi.lock: update for initial pytorch 2.10, cuda 13.1 support
* pixi.toml: add safe cuda environments (no accelerators)
* of3: remove deepspeed hacks
  note that there are still some in __init__.py
* of3: unvendor deepspeed
* pixi.toml: simplify deepspeed dependency after our changes made it to CF/pypi
* pixi.toml: remove safe environments as we are not maintaining them
* pixi.toml: enable pytorch-coda in cuda 13 env after 2.10 release
* pyproject.toml: pin deepspeed to >0.18.5, improved evoformer compilation
* Add awscrt to dependencies, missing from recent PR
* pixi.toml: setup correctly path to PTXAS_BLACKWELL for triton >=3.6.0
* pixi.toml: add -safe environments, at the moment just without cuequivariance
  these are also conda-pure environments
* pixi.lock: update after consolidation (no vendor, pytorch 2.10 + CF cuda13)
* pixi.toml: update outdated comments
* updates with GB10 tests (#2)
  * updates with GB10 tests
  * cleanup
  * harmonize
  * linting data_module.py
  * speculative changes
* pixi.toml: remove safe environments
* pixi.lock: update after removal of safe environments
* Remove pixi docker examples, to rework
* Comment-out workaround for hard to reproduce ABI mismatch problem
* pixi.toml: bump pixi, improve conda export by including all env variables
* pixi.toml: unpin biotite
* pixi.toml: python has its own feature
* pixi.toml: bump deepspeed
* pyproject.toml: bump deepspeed to version without Evoformer build bug
* pixi.toml: detail on workaround
* pixi.lock: update
* pixi.toml: add example task to update safely the lockfile
* pixi.toml: remove kalign2
* tests: fix test depending on unspecified glob return order
* pixi.toml: better metadata
* docs: wip
* pixi.lock: update
* Allow to configure multiprocessing start and set safe defaults
  We would still need to document this for users
* Fix capitalization error
* Fix capitalization error
* Fix typo
* pixi.lock: update

---------

Co-authored-by: Tim Adler <tim.adler@bayer.com>
Co-authored-by: Jan Domański <jan.domanski@omsf.io>
* Modernize conda environment (#34)
* fix linter problems
* add pre-commit
* add pixi.excalidraw to docs
* remove blackwell build instructions (obsolete)
* update docs to recommend pixi
* better docs on pixi
* update pixi.lock
* docker build and tests for pixi
* set a sensible 2mb default
* more context manager plus dirty dataclass
* unit tests
* more linting
* missed a dep: regenerate pixi.lock
* remove duplicate projects
* review: comments from Jennifer
* update pixi.lock

---------

Co-authored-by: Santi Villalba <sdvillal@users.noreply.github.com>
Co-authored-by: Tim Adler <tim.adler@bayer.com>
Co-authored-by: Jennifer Wei <97625454+jnwei@users.noreply.github.com>
* Modernize conda environment (#34)
* fix linter problems
* add pre-commit
* add pixi.excalidraw to docs
* remove blackwell build instructions (obsolete)
* update docs to recommend pixi
* better docs on pixi
* update pixi.lock
* docker build and tests for pixi
* set a sensible 2mb default
* more context manager plus dirty dataclass
* unit tests
* more linting
* missed a dep: regenerate pixi.lock
* remove duplicate projects
* First draft for rocm env
* First working install
* Remove pytorch-lightning dep in pixi.toml
* test: add per-platform snapshots for triangular attention and multiplicative update
  Floating point arithmetic is not associative: different hardware parallelizes reductions (e.g. matrix multiplications, attention softmax) in different orders, accumulating rounding errors differently. CUDA and ROCm therefore produce results that diverge by up to ~2e-6 even on identical inputs. Snapshot comparisons are now routed to nvidia/ or rocm/ subdirectories based on torch.version.hip, so each platform validates consistency with itself across code changes.
* Regenerate pixi.lock
* Revert accidental formatting changes
* Update docs

---------

Co-authored-by: Santi Villalba <sdvillal@users.noreply.github.com>
Co-authored-by: Jan Domański <jan.domanski@omsf.io>
Co-authored-by: Ubuntu <ubuntu@ip-10-149-152-207.eu-central-1.compute.internal>
Co-authored-by: Gagan <gagandeep.singh@amd.com>
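The per-platform snapshot commit above rests on floating point addition not being associative. The sketch below shows the effect with plain Python floats and outlines how a snapshot directory could be picked per platform. The helper names `snapshot_subdir` and `matches_snapshot` and the tolerance value are hypothetical. The only assumption about PyTorch is that `torch.version.hip` is `None` on non-ROCm builds, and its value is passed in as an argument so the example runs without PyTorch installed.

```python
import math

# The same three numbers summed in two different orders.
a, b, c = 0.1, 0.2, 0.3
left_to_right = (a + b) + c
right_to_left = a + (b + c)
orders_disagree = left_to_right != right_to_left


def snapshot_subdir(hip_version):
    """Pick the snapshot directory for the current platform.

    hip_version stands for the value of torch.version.hip, which is
    None unless PyTorch was built for ROCm.
    """
    return "rocm" if hip_version is not None else "nvidia"


def matches_snapshot(value, reference, atol=1e-5):
    """Compare against a stored snapshot with an absolute tolerance.

    The tolerance is an illustrative choice, not the project's setting.
    """
    return math.isclose(value, reference, rel_tol=0.0, abs_tol=atol)
```

Routing each platform to its own reference values lets the tolerance stay tight, since a snapshot is only ever compared with results produced by the same reduction order.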
* Modernize conda environment (#34) * Add initial pixi environment all tests pass, predictions seem to be correct corresponds to a modernized conda environment following best practices * Reorder dependencies for easier read * Add openfold3 as an editable dependency * Sync cuda-python pin between pypi package and the conda environment * Comments Comments Overcommenting issues * Add explicitly a conda yml version of the pixi environment * Improve some wordings * Update pixi lockfile * Vendoring pieces of deepspeed incomplete, we might not need the native sources from upstream commit df59f203f40c8a292dd019ae68c9e6c88f107026 * Swap ninja verification with pytorch's * Vendoring pieces of deepspeed incomplete, we might not need the native sources from upstream commit df59f203f40c8a292dd019ae68c9e6c88f107026 * Use vendored deepspeed evoformer builder Use vendored deepspeed in the attention primitives * Add symlink to vendored deepspeed as in upstream * Vendor also op_builder.__init__ from deepspeed * Import explicitly EvoformerAttnBuilder, avoiding broken introspection magic * Add a ignore mechanism for cutlass detection in vendored deepspeed * Apply cutlass detection workaround and remove all nvidia-cutlass tricks from pixi environment * Remove nvidia-cutlass from openfold-3 dependencies (fix later) * Remove pypi ninja dependency in pixi workspace * No need for cutlass hacks * Add pixi config to .gitattributes * Remove deepspeed hacks for good * Update pixi lockfile * Update pixi conda environment * Remove MKL from pypi dependencies, as it is unused * Remove aria2 from pypi dependencies, unused and not so much of a convenience * Update lockfile Update lockfile * Re-enable pure PyPI install * Disable hack when conda is active * More comments on cutlass python API deprecation and pytorch * Make pixi environments (CPU, CUDA12, CUDA13, for all major platforms) * Increase LMDB map size to make test pass in osx-arm64 * Better comments of TODOs in pixi.toml Better comments of 
TODOs in pixi.toml Better comments of TODOs in pixi.toml * Pin cuequivariance until test failure is investigated * Move deepspeed to optional dependency also in pyproject * Pyproject: extend python version support * Pyproject: move dependencies table together with optional-dependencies * Pyproject: document future decision on dependency-groups * Pyproject: reformat to consolidate indent to 4 spaces * Pyproject: reorder dependencies for easier read * Pixi: add scipy * Pixi: add comment on CUDA13 * Pixi: make cuequivariance CUDA generic for its conda packages * Pixi: add reminder about devel install * Pyproject: fix and improve readability, add URLs * pixi.toml: make more readable by showing first envs, then base, then variants * pixi.toml: pin deepspeed to 0.18.3, first one with ninja detection fixed * pixi.toml: fully enable aarch64 and cuda13, revamp docs * pixi.lock: update * pixi.toml: add triton to cuequivariance dependencies for CUDA13 * pixi.lock: update * pixi.toml: include pip to allow users to play * pixi.toml: formatting for better readability * pixi.toml: restrict cuequivariance-cu13 to linux-64 until we unpin to >=0.8 * pixi.toml: formatting for better readability * pixi.toml: make pytorch-gpu an isolated environment feature in this way we can more easily express when a package is not ready yet in CF * pixi.toml: add environments that combine mostly pypi-based deps with CUDA from conda * pixi.toml: add openfold3-editable-full and account for lack of cuequivariance for python=3.14 * pixi.toml: brief documentation of the pypi-dominant environments * pixi.toml: add also the dev optional dependency group to openfold3-full * pyproject.toml: pin cuequivariance to <0.8 until we adapt tests * pixi.toml: add kalign to required non-pypi dependencies * pixi.toml: add more bioinformatics tools to non-pypi * pixi.toml: make env setup be part of the deepspeed-build feature * pixi.toml: simplify management of pypi features * pixi.lock: update, all tests pass A100,B300 
x CUDA12,CUDA13 * pixi.toml: add table of what works and what needs test * pixi.toml: add tasks for exporting to regular conda environment yamls * conda environments: delete outdated modernized conda env, use new tasks instead * pixi.toml: bump min pixi version * pixi.toml: remove unnecessary comments * pixi.toml: remove unnecessary envvar definition for isolating extension builds * pixi.toml: better definition of maintenance environment pixi.toml: better definition of maintenance environment pixi.toml: better definition of maintenance environment * pixi.toml: add simple task to run test and save results to an environment-specific dir * of3: enable pickling regardless of forking strategy and platform * of3: enable multiple data loader workers in osx mps backed * Vendor improved deepspeed builder from upstream PR See: deepspeedai/DeepSpeed#7760 * pixi.lock: update * pixi.toml: remove some comment noise * of3: fix multiprocessing configuration corner case in osx * docker: move outdated example dockerfiles to docker/pixi-examples * examples: add example runner for osx inference * pixi.toml: ensure we get the right pytorch from pypi something similar should actually be supported in pyproject.toml * pixi.lock: update, fixed torch cuda mismatch in pypi environments * pixi.toml: fix lock export + make default environment be maintenance * pixi.toml: use a more consistent name for environment arg * pixi.lock: update * pixi.toml: workaround for no-default-feature breaking the test task (pixi bug) * pixi.toml: issue with pixi pypi resolution seems solved * Revert "pixi.toml: issue with pixi pypi resolution seems solved" This reverts commit ded3482. 
* pixi.toml: better document problem and workaround * pixi.toml: make the test task present in all relevant environments this I feel makes less surprising its use, as opposed to passing the environment as an arg to a dependent task * pixi.toml: let CUDA13 flow freely * pixi.lock: update for initial pytorch 2.10, cuda 13.1 support * pixi.toml: add safe cuda environments (no accelerators) * of3: remove deepspeed hacks note that there are still some in __init__.py * of3: unvendor deepspeed * pixi.toml: simplify deepspeed dependency after our changes made it to CF/pypi * pixi.toml: remove safe environments as we are not maintaining them * pixi.toml: enable pytorch-coda in cuda 13 env after 2.10 release * pyproject.toml: pin deepspeed to >0.18.5, improved evoformer compilation * Add awscrt to dependencies, missing from recent PR * pixi.toml: setup correctly path to PTXAS_BLACKWELL for triton >=3.6.0 * pixi.toml: add -safe environments, at the moment just without cuequivariance these are also conda-pure environments * pixi.lock: update after consolidation (no vendor, pytorch 2.10 + CF cuda13) * pixi.toml: update outdated comments * updates with GB10 tests (#2) * updates with GB10 tests * cleanup * harmonize * linting data_module.py * speculative changes * pixi.toml: remove safe environments * pixi.lock: update after removal of safe environments * Remove pixi docker examples, to rework * Comment-out workaround for hard to reproduce ABI mismatch problem * pixi.toml: bump pixi, improve conda export by including all env variables * pixi.toml: unpin biotite * pixi.toml: python has its own feature * pixi.toml: bump deepspeed * pyproject.toml: bump deepspeed to version without Evoformer build bug * pixi.toml: detail on workaround * pixi.lock: update * pixi.toml: add example task to update safely the lockfile * pixi.toml: remove kalign2 * tests: fix test depending on unspecified glob return order * pixi.toml: better metadata * docs: wip * pixi.lock: update * Allow to configure 
multiprocessing start and set safe defaults We would still need to document this for users * Fix capitalization error * Fix capitalization error * Fix typo * pixi.lock: update --------- Co-authored-by: Tim Adler <tim.adler@bayer.com> Co-authored-by: Jan Domański <jan.domanski@omsf.io> * fix linter problems * add pre-commit * add pixi.excalidraw to docs * remove blackwell build instructions (obsolete) * update docs to recommend pixi * better docs on pixi * update pixi.lock * docker build and tests for pixi * set a sensible 2mb default * more context manager plus dirty dataclass * unit tests * more linting * missed a dep: regenerate pixi.lock * remove duplicate projects * review: comments from Jennifer * update pixi.lock * cuequivariance support in pixi * update documentation for cuequivariance for kernels * regenerate pixi.lock --------- Co-authored-by: Santi Villalba <sdvillal@users.noreply.github.com> Co-authored-by: Tim Adler <tim.adler@bayer.com> Co-authored-by: jnwei <jennifer.wei@omsf.io>
EvoformerAttnBuilder has some problems that preclude compiling the extension in several scenarios (e.g., isolated conda environment with cuda toolchain, lack of hardware in the system) and break some standard DeepSpeed configuration of target capabilities.
Changes
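- Fix evoformer CUTLASS detection:
  - Allow skipping it, useful when CUTLASS is already correctly set up (e.g., in a conda environment with CUTLASS and the CUDA toolchain)
  - Fix misleading use of the deprecated nvidia-cutlass pypi package by actually using the provided bindings, while discouraging this route as these bindings are not maintained anymore
- Fix evoformer compilation when no GPU is present:
  - this is handled correctly and more generally by builder.compute_capability_args
  - allow cross-compilation on systems without a GPU
  - allow compilation against all available virtual architectures and binary outputs
  - see e.g., #5308
- Make all these changes configurable and explicit through documented environment variables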
Tested in all scenarios.
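As a rough illustration of the environment-variable-driven build described above, the sketch below assembles an environment for building on a machine without a GPU. The helper `evoformer_build_env` is hypothetical. It uses only variables that predate this PR as far as I know (`TORCH_CUDA_ARCH_LIST`, `DS_BUILD_EVOFORMER_ATTN`, `CUTLASS_PATH`); the new variables introduced by the PR are not named in this description, so they are deliberately left out rather than guessed.

```python
import os


def evoformer_build_env(arch_list, cutlass_path=None, base_env=None):
    """Return a copy of the environment set up for a GPU-less Evoformer build.

    Hypothetical helper, for illustration only.
    """
    env = dict(os.environ if base_env is None else base_env)
    # With no GPU to query, the target architectures must be given explicitly;
    # "+PTX" additionally requests the virtual architecture for that target.
    env["TORCH_CUDA_ARCH_LIST"] = ";".join(arch_list)
    # Assumed: asks DeepSpeed to pre-build the Evoformer op at install time.
    env["DS_BUILD_EVOFORMER_ATTN"] = "1"
    # Assumed: only needed when CUTLASS headers are not already discoverable.
    if cutlass_path is not None:
        env["CUTLASS_PATH"] = cutlass_path
    return env


env = evoformer_build_env(["8.0", "9.0+PTX"], base_env={})
print(env["TORCH_CUDA_ARCH_LIST"])  # 8.0;9.0+PTX
```

The resulting dictionary could be passed as `env=` to `subprocess.run` when invoking the install command, which keeps the configuration explicit instead of relying on whatever the build machine happens to detect.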