Is your feature request related to a problem? Please describe.
The grouped MLP block in MoE can be roughly equated to GroupedLinear + Activation + GroupedLinear. Each GroupedLinear needs to go through group_quantize. For the cases where this is unfused, we should at least enable fusing the activation with the group_quantize that happens for the second GroupedLinear layer.
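For context, a rough sketch of the unfused path described above, assuming the `transformer_engine.pytorch.GroupedLinear` module with illustrative sizes and splits (exact constructor/forward arguments may differ across versions):

```python
import torch
import transformer_engine.pytorch as te

num_experts, hidden, ffn = 8, 1024, 4096  # illustrative sizes only
fc1 = te.GroupedLinear(num_experts, hidden, ffn, bias=False)
fc2 = te.GroupedLinear(num_experts, ffn, hidden, bias=False)

def grouped_mlp(x: torch.Tensor, m_splits: list[int]) -> torch.Tensor:
    h = fc1(x, m_splits)             # group_quantize of x happens inside fc1
    h = torch.nn.functional.gelu(h)  # standalone activation kernel
    return fc2(h, m_splits)          # group_quantize of h happens inside fc2,
                                     # separately from the activation above
```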
Describe the solution you'd like
- Expose the right tex functions in PyTorch that fuse activation and quantize. This is already supported for normal PyTorch tensors; the idea needs to be extended to GroupedTensors.
- Enable te.Sequential to handle the fusion for the grouped MLP block (see the sketch after this list).
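A minimal sketch of how this could look from the Python side. Note that `fused_act_group_quantize` and its arguments are hypothetical placeholders for whatever tex ultimately exposes, and the te.Sequential part assumes it learns to detect the GroupedLinear -> activation -> GroupedLinear pattern:

```python
import torch
import transformer_engine.pytorch as te
import transformer_engine.pytorch.cpp_extensions as tex

num_experts, hidden, ffn = 8, 1024, 4096  # illustrative sizes only

# Manual path: a hypothetical fused kernel replaces the separate activation
# and the group_quantize done for the second GroupedLinear.
# `fused_act_group_quantize` is a placeholder, not an existing tex function.
def grouped_mlp_fused(x, m_splits, fc1, fc2, quantizer):
    h = fc1(x, m_splits)
    h_q = tex.fused_act_group_quantize(h, m_splits, quantizer, activation="gelu")
    return fc2(h_q, m_splits)

# Declarative path (desired end state): te.Sequential recognises the pattern
# and performs the activation + group_quantize fusion automatically.
mlp = te.Sequential(
    te.GroupedLinear(num_experts, hidden, ffn, bias=False),
    torch.nn.GELU(),
    te.GroupedLinear(num_experts, ffn, hidden, bias=False),
)
```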