
Activation + Group Quantize Fusion with te.Sequential #2988

@vthumbe1503

Description


Is your feature request related to a problem? Please describe.
The grouped MLP block in MoE can be roughly equated to GroupedLinear + Activation + GroupedLinear. Each GroupedLinear's input needs to go through group_quantize. For the cases where this is unfused, we should at least enable fusing the activation with the group_quantize that happens in the second GroupedLinear layer (see the sketch below).
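For reference, a minimal sketch of the unfused pattern described above, assuming te.GroupedLinear with its (num_gemms, in_features, out_features) constructor and an (input, m_splits) forward; exact signatures may differ across TE versions:

```python
# Minimal sketch of the unfused grouped MLP pattern (illustrative only;
# GroupedLinear's exact signature may differ across TE versions).
import torch
import transformer_engine.pytorch as te

num_gemms = 8            # number of experts / grouped GEMMs
hidden, ffn = 1024, 4096

fc1 = te.GroupedLinear(num_gemms, hidden, ffn, params_dtype=torch.bfloat16)
fc2 = te.GroupedLinear(num_gemms, ffn, hidden, params_dtype=torch.bfloat16)
act = torch.nn.GELU()

x = torch.randn(32, hidden, device="cuda", dtype=torch.bfloat16)
m_splits = [4] * num_gemms   # tokens routed to each expert

# Today: act(...) materializes in high precision, then fc2 re-quantizes it
# via group_quantize -- two separate passes over the activation tensor.
y = fc2(act(fc1(x, m_splits)), m_splits)
```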

Describe the solution you'd like

  • Expose the right tex functions in PyTorch that fuse activation and quantize. This is already supported for regular PyTorch tensors; the idea needs to be extended to GroupedTensors.

  • Enable te.Sequential to handle the fusion for the grouped MLP block (a rough sketch of the desired end state follows this list).
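A rough sketch of the desired end state, assuming the fusible-ops te.ops.Sequential API. The grouped op (te.ops.GroupedLinear) and the fused kernel described in the comments are hypothetical; defining the real tex bindings and fusion rules is what this issue asks for:

```python
# Hypothetical end state -- the grouped op name below is illustrative,
# not current TE API.
import transformer_engine.pytorch as te

num_gemms, hidden, ffn = 8, 1024, 4096

mlp = te.ops.Sequential(
    te.ops.GroupedLinear(num_gemms, hidden, ffn),  # hypothetical fusible grouped op
    te.ops.GELU(),
    te.ops.GroupedLinear(num_gemms, ffn, hidden),  # hypothetical fusible grouped op
)

# Desired fusion: Sequential detects the activation feeding a GroupedLinear
# and replaces the separate activation kernel + group_quantize with a single
# fused kernel, analogous to the existing fused activation+quantize path for
# regular tensors, extended to produce an already-quantized GroupedTensor.
```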
