Environment
GPU: AMD Radeon AI PRO R9700 (32GB GDDR6, gfx1201, RDNA4)
ROCm: 7.2.1
OS: Ubuntu 24.04 LTS
Frameworks affected: vLLM, SGLang, ROCm TransformerEngine
Problem
AMD's official product guide advertises "128 AI accelerators with FP8
support." Yet on gfx1201, ROCm 7.2.1 dequantizes all FP8 weights to FP32
with no warning, so the AI accelerators do zero FP8 work. Throughput is
~18-22 tok/s instead of the expected ~35-40 tok/s.
Root Cause
gfx1201 is missing from the _ARCH_TO_DEVICE mapping in
aiter/ops/triton/utils/arch_info.py, causing a silent FP32 fallback.
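To illustrate the failure mode, here is a minimal sketch of how a dict-based
arch dispatch produces a silent fallback. The mapping contents and function
names below are hypothetical reconstructions, not the verbatim AITER source;
only the `_ARCH_TO_DEVICE` name and the missing `gfx1201` key come from the
report.

```python
# Hypothetical reconstruction of the dispatch in
# aiter/ops/triton/utils/arch_info.py (entries illustrative).
_ARCH_TO_DEVICE = {
    "gfx942": "MI300X",
    "gfx950": "MI350X",
    # "gfx1201" (RDNA4) is absent, so the lookup below never matches it.
}

def get_device(arch: str) -> "str | None":
    # dict.get() returns None for unknown arches instead of raising,
    # which is exactly what makes the FP32 fallback silent: no error,
    # no warning, just a missed FP8 kernel path.
    return _ARCH_TO_DEVICE.get(arch)

def fp8_supported(arch: str) -> bool:
    return get_device(arch) is not None

print(fp8_supported("gfx942"))   # True
print(fp8_supported("gfx1201"))  # False -> weights dequantized to FP32
```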
Fix (community validated)
'gfx1201': 'MI350X'
RDNA4 implements the same FP8 E4M3FN format as MI350X, so the existing
Triton kernel path works correctly. The change is non-breaking for
existing CDNA deployments.
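The E4M3FN claim is checkable from the format definition alone: 1 sign bit,
4 exponent bits (bias 7), 3 mantissa bits, and no infinities ("FN" = finite
only), giving a maximum finite value of 448. A self-contained sketch of that
arithmetic, independent of any GPU or library:

```python
def e4m3fn_max() -> float:
    """Largest finite value representable in FP8 E4M3FN."""
    bias = 7
    # Exponent field 0b1111 with mantissa 0b111 encodes NaN in E4M3FN,
    # so the largest finite value is exponent 0b1111 (unbiased 8) with
    # mantissa 0b110: 2**8 * (1 + 6/8).
    return 2.0 ** (0b1111 - bias) * (1 + 6 / 8)

print(e4m3fn_max())  # 448.0
```

Since both RDNA4 and MI350X target this same numeric format, routing gfx1201
to the MI350X kernel configurations changes no arithmetic, only dispatch.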
Request: Official ETA for merging this two-line fix into AITER mainline.