Closed
Move 1-based to 0-based index lowering from the DSL (operations.jl) into a compiler rewrite pass (index_lower_pass!). The load/store/gather operations now pass through Julia's natural 1-based indices, and the pass inserts subi(elem, 1) for each index element in load_partition_view and store_partition_view calls during compilation.
Algebraic simplifications have turned out to be powerful enough that we don't need this.
This is an experiment.
The motivation here is to simplify the IR we emit (the `layernorm` example currently emits 4x as many SASS instructions as cuTile Python does). A large part of that IR cruft comes from the repetitive `+1`/`-1` we do as part of the 0-based to 1-based index conversion. For example, `bid(1)` returns a 1-based index (via `addi(blockId_x, 1)`), then each `load`/`store` call emits its own `subi(..., 1)` to undo it, resulting in 3 redundant constant + `subi` pairs:
On this branch, the indices passed around are kept 1-based, and a pass converts them late at the load/store boundary, resulting in significantly simpler IR:
Although this is nice, it doesn't improve performance, and it makes our IR harder to compare to cuTile Python's. So I'm not sure we want this.
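To make the mechanics concrete, here is a minimal toy model of the idea, written in Python for illustration only. It is not the Julia implementation: `Const`, `Var`, `Op`, `index_lower`, and `simplify` are hypothetical names invented for this sketch, and the real `index_lower_pass!` operates on the compiler's own IR. The sketch only shows how wrapping each index in `subi(elem, 1)` at the load/store boundary interacts with an algebraic simplification that cancels a matching `addi`/`subi` pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Op:
    name: str  # "addi" or "subi" in this toy model
    lhs: object
    rhs: object


# Calls whose indices get lowered; the names are taken from the PR text.
LOWERED_CALLS = ("load_partition_view", "store_partition_view")


def index_lower(call_name, indices):
    """Hypothetical stand-in for the late pass: 1-based -> 0-based indices."""
    if call_name in LOWERED_CALLS:
        return tuple(Op("subi", idx, Const(1)) for idx in indices)
    return tuple(indices)


def simplify(expr):
    """Toy algebraic simplification: subi(addi(x, c), c) folds to x."""
    if not isinstance(expr, Op):
        return expr
    lhs, rhs = simplify(expr.lhs), simplify(expr.rhs)
    if (
        expr.name == "subi"
        and isinstance(lhs, Op)
        and lhs.name == "addi"
        and isinstance(rhs, Const)
        and lhs.rhs == rhs
    ):
        return lhs.lhs
    return Op(expr.name, lhs, rhs)


# bid(1) modeled as addi(blockId_x, 1), i.e. a 1-based block index.
bid = Op("addi", Var("blockId_x"), Const(1))
lowered = index_lower("load_partition_view", (bid,))
simplified = tuple(simplify(e) for e in lowered)
```

In this model, `lowered` holds `subi(addi(blockId_x, 1), 1)` and `simplified` holds plain `blockId_x`, which is the kind of cancellation that makes a dedicated lowering pass unnecessary once the simplifier is strong enough.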