Is your feature request related to a problem? Please describe.
The grouped MLP block in MoE can be roughly equated to GroupedLinear + Activation + GroupedLinear. Each GroupedLinear needs to go through group_quantize. For the cases where this is unfused, we should at least enable fusing the activation with the group_quantize that happens for the second GroupedLinear layer.
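For context, a rough sketch of the unfused path described above, assuming the `transformer_engine.pytorch.GroupedLinear` module with illustrative sizes and splits (exact constructor/forward arguments may differ across versions):

```python
import torch
import transformer_engine.pytorch as te

num_experts, hidden, ffn = 8, 1024, 4096  # illustrative sizes only
fc1 = te.GroupedLinear(num_experts, hidden, ffn, bias=False)
fc2 = te.GroupedLinear(num_experts, ffn, hidden, bias=False)

def grouped_mlp(x: torch.Tensor, m_splits: list[int]) -> torch.Tensor:
    h = fc1(x, m_splits)             # group_quantize of x happens inside fc1
    h = torch.nn.functional.gelu(h)  # standalone activation kernel
    return fc2(h, m_splits)          # group_quantize of h happens inside fc2,
                                     # separately from the activation above
```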
Describe the solution you'd like
- Expose the right tex functions in PyTorch that fuse activation and quantize. This is already supported for normal PyTorch tensors; the idea needs to be extended to GroupedTensors.
- Enable te.Sequential to handle the fusion for the grouped MLP block (see the sketch after this list).
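A minimal sketch of how this could look from the Python side. Note that `fused_act_group_quantize` and its arguments are hypothetical placeholders for whatever tex ultimately exposes, and the te.Sequential part assumes it learns to detect the GroupedLinear -> activation -> GroupedLinear pattern:

```python
import torch
import transformer_engine.pytorch as te
import transformer_engine.pytorch.cpp_extensions as tex

num_experts, hidden, ffn = 8, 1024, 4096  # illustrative sizes only

# Manual path: a hypothetical fused kernel replaces the separate activation
# and the group_quantize done for the second GroupedLinear.
# `fused_act_group_quantize` is a placeholder, not an existing tex function.
def grouped_mlp_fused(x, m_splits, fc1, fc2, quantizer):
    h = fc1(x, m_splits)
    h_q = tex.fused_act_group_quantize(h, m_splits, quantizer, activation="gelu")
    return fc2(h_q, m_splits)

# Declarative path (desired end state): te.Sequential recognises the pattern
# and performs the activation + group_quantize fusion automatically.
mlp = te.Sequential(
    te.GroupedLinear(num_experts, hidden, ffn, bias=False),
    torch.nn.GELU(),
    te.GroupedLinear(num_experts, ffn, hidden, bias=False),
)
```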