Use tag::any for int8 matmul weight desc to create pd #155
Open
Xia-Weiwen wants to merge 1 commit into intel:ideep_dev from
Conversation
Previously we didn't change the weight at runtime. Now we have to do that? The PyTorch code has to be adapted too, right?
Contributor
Author
We are now checking the weight desc and reordering the weight if needed after preparing: https://github.com/pytorch/pytorch/blob/3726d232191088e8e7a9c1a2ab3244cdd9250bf2/aten/src/ATen/native/quantized/cpu/qlinear.cpp#L851
Summary
An issue was found where int8 matmul runs into the `ref:any` kernel, which is very slow. It was found with stock PyTorch + oneDNN 3.0.

It happens because dst scales have an impact on pd creation. When prepacking the weight, dst scales are not set when creating the pd (int8 and fp32 share the same `expected_weight_desc` function), so the pd we create gives a weight desc in layout A. But at runtime, dst scales are set and we specify weight layout A to create the pd. oneDNN may find that layout A is improper for that configuration, and it finally runs into the `ref`, not `jit`, kernel.

Now we use `tag::any` for the weight desc to create the pd at runtime, regardless of the layout of the prepacked weight. Then the pd can give a better layout for the weight, and the prepacked weight is reordered again on the first run.

Previously:
- Prepack: pd is created with weight desc `tag::any` and without info of src/dst scales/zero points. Weight is prepacked in layout A.
- Runtime: pd is created with the weight desc fixed to layout A and with info of src/dst scales/zero points. The `ref:any` kernel is used.

Now:
- Prepack: pd is created with weight desc `tag::any` and without info of src/dst scales/zero points. Weight is prepacked in layout A.
- Runtime: pd is created with weight desc `tag::any` and with info of src/dst scales/zero points. The weight is reordered on the first run only; later on, the weight is always in layout B, which is expected.
Test plan
@jgong5 @XiaobingSuper @yanbing-j @leslie-fang-intel Please review. Thanks!