Mxfp8 cast optimization #507

Open

alextmagro wants to merge 7 commits into dev from mxfp8_cast_optimization

Conversation

@alextmagro
Contributor

Description

Improves the performance of the HIP MXFP8 cast kernels; see the Confluence page for details.
Adds benchmarking scripts for reproducibility.

@alextmagro alextmagro requested a review from ipanfilo March 27, 2026 16:43
@alextmagro alextmagro requested a review from ipanfilo March 30, 2026 19:26
${TESTS_CPP_DIR}
)

if(DEFINED ENV{NVTE_ROCM_ARCH})
Collaborator

Consider using build_tools/rocm_utils.cmake instead

Contributor Author

I have refactored the CMake file to use rocm_utils.cmake. This also added HIP to the project and removed the offload-arch list.

-O3
-DNDEBUG
-DUSE_ROCM
--offload-arch=${GPU_TARGETS}
Collaborator

offload-arch does not support lists.
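A CMake list is a semicolon-separated string, so an unquoted `--offload-arch=${GPU_TARGETS}` expands to `--offload-arch=<first arch>` followed by the remaining architectures as bare arguments. A minimal sketch of emitting one flag per architecture, assuming `GPU_TARGETS` holds a CMake list; the example architectures and the `BENCH_HIP_FLAGS` variable name are hypothetical and not taken from this PR:

```cmake
# Runnable in script mode: cmake -P offload_arch_flags.cmake
# Hypothetical architecture list; in a real build it would be set at configure time.
set(GPU_TARGETS "gfx942;gfx950")

# Common flags from the review hunk above.
set(BENCH_HIP_FLAGS -O3 -DNDEBUG -DUSE_ROCM)

# Emit one --offload-arch=<arch> entry per architecture instead of
# handing the whole list to a single flag.
foreach(arch IN LISTS GPU_TARGETS)
  list(APPEND BENCH_HIP_FLAGS "--offload-arch=${arch}")
endforeach()

message(STATUS "HIP flags: ${BENCH_HIP_FLAGS}")
```

The resulting list could then be passed with `target_compile_options(<target> PRIVATE ${BENCH_HIP_FLAGS})`. On CMake 3.12 and later, `list(TRANSFORM GPU_TARGETS PREPEND "--offload-arch=" OUTPUT_VARIABLE ARCH_FLAGS)` does the same thing in one call.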

set(CMAKE_CXX_COMPILER hipcc)
endif()

project(transformer_engine_benchmarks LANGUAGES CXX)
Collaborator

HIP? It looks like the only CXX file is test_common.
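Declaring HIP as a project language lets CMake 3.21 or later compile the kernel sources through its native HIP support. This leaves only the plain C++ file to the CXX compiler, and `CMAKE_HIP_ARCHITECTURES` replaces a hand-built `--offload-arch` list. The following is a minimal sketch under that assumption. The target name, source file names, and architectures are hypothetical. It does not reproduce what build_tools/rocm_utils.cmake actually does:

```cmake
# HIP is a first-class CMake language starting with 3.21.
cmake_minimum_required(VERSION 3.21)

# Hypothetical target architectures; CMake emits one --offload-arch per entry.
set(CMAKE_HIP_ARCHITECTURES "gfx942;gfx950")

project(transformer_engine_benchmarks LANGUAGES CXX HIP)

# Hypothetical sources: the kernel benchmark is HIP, test_common stays plain C++.
add_executable(bench_cast_mxfp8
  bench_cast_mxfp8.cu
  test_common.cpp)

# CMake associates .cu with CUDA, so mark the kernel source as HIP explicitly.
set_source_files_properties(bench_cast_mxfp8.cu PROPERTIES LANGUAGE HIP)

# Same defines and optimization level as the review hunk.
target_compile_definitions(bench_cast_mxfp8 PRIVATE NDEBUG USE_ROCM)
target_compile_options(bench_cast_mxfp8 PRIVATE -O3)
```

With the HIP language enabled, there should be no need to override `CMAKE_CXX_COMPILER` with hipcc, because the C++ file keeps the regular CXX toolchain.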

@alextmagro alextmagro mentioned this pull request Apr 4, 2026
@alextmagro alextmagro force-pushed the mxfp8_cast_optimization branch from 1b3fdc7 to be54090 April 4, 2026 18:55
@alextmagro alextmagro requested a review from ipanfilo April 4, 2026 18:57

Labels

ci-level 3 (CI test level 3)

3 participants