-
Notifications
You must be signed in to change notification settings - Fork 825
[HLK] Modify Wave match test logic to support modifications in different lanes and vector position #7991
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[HLK] Modify Wave match test logic to support modifications in different lanes and vector position #7991
Conversation
| uint64_t LowBits; | ||
| uint64_t HighBits; | ||
|
|
||
| Wave(CONST UINT NumWaves, uint64_t LB, uint64_t HB) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Wave(CONST UINT NumWaves, uint64_t LB, uint64_t HB) { | |
| Wave(UINT NumWaves, uint64_t LB, uint64_t HB) { |
| uint64_t LowBits; | ||
| uint64_t HighBits; | ||
|
|
||
| Wave(CONST UINT NumWaves, uint64_t LB, uint64_t HB) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you only ever pass LB = HB = ~0 or LB = HB = 0 to this. There may be a way to simplify to have a single value passed in?
| template <typename T> struct Op<OpType::WaveMatch, T, 1> : StrictValidation {}; | ||
|
|
||
| // Helper struct to build the expected result for WaveMatch tests. | ||
| struct Wave { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably needs a longer name. WaveMatchResultBuilder?
| uint64_t LowWaveMask; | ||
| uint64_t HighWaveMask; | ||
|
|
||
| uint64_t LowBits; | ||
| uint64_t HighBits; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should aim to make these private.
What if there was function that looked like this?
void SetExpected(std::vector<UINT>::iterator It)
That would write the values into the 4 elements of the vector?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure taking an iterator is a great suggestion. Maybe void SetExpected(UINT* Dest) might be a better starting point.
alsepkow
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made some comments. It looks like we need a few fixes on this iteration still.
| const UINT HighWaves = NumWaves - LowWaves; | ||
| LowWaveMask = (LowWaves < 64) ? (1ULL << LowWaves) - 1 : ~0ULL; | ||
| HighWaveMask = (HighWaves < 64) ? (1ULL << HighWaves) - 1 : ~0ULL; | ||
| LowBits &= LowWaveMask; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assignment to LowBits and HighBits seems redundant? They're initialized to zero so the result of these operations will still always just be 0?
| D3D12_FEATURE_D3D12_OPTIONS1, &WaveOpts, sizeof(WaveOpts))); | ||
|
|
||
| WaveSize = WaveOpts.WaveLaneCountMin; | ||
| WaveSize = 128; // WaveOpts.WaveLaneCountMin; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you intended to leave this commented out
| WaveMatchResultBuilder(UINT NumWaves) : LowBits(0), HighBits(0) { | ||
| const UINT LowWaves = std::min(64U, NumWaves); | ||
| const UINT HighWaves = NumWaves - LowWaves; | ||
| LowWaveMask = (LowWaves < 64) ? (1ULL << LowWaves) - 1 : ~0ULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could help improve readability of this code (not that its bad) by using inline helpers or static constexp helpers for some of the bit math instead.
LowWaveMask = lowNBitsSet(LowWaves);
'~0ULL' is probably fine. It's simple and common.
| private: | ||
| uint64_t LowWaveMask; | ||
| uint64_t HighWaveMask; | ||
| uint64_t LowBits; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason you opted for two uint64 values instead of 4 uints or one 128-bit uint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Standard C++ does not currently include a 128-bit uint. This requires a compiler extension, I was having trouble compiling using it, and I didn't want to modify the cmake definition to support it. So, I created my own. Using the 2 64 bit uint seemed the simplest implementation with built-in types.
|
|
||
| const UINT LowWaves = std::min(64U, WaveSize); | ||
| const UINT HighWaves = WaveSize - LowWaves; | ||
| const UINT MidBit = WaveSize / 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming on this should be 'MidLaneID' and 'LastLaneID' respectively.
| const uint64_t LowExpected = ~1ULL & LowWaveMask; | ||
| const uint64_t HighExpected = ~0ULL & HighWaveMask; | ||
| if (I == 0 || MidBit == I || LastBit == I) { | ||
| WaveMatchResultBuilder ChangedLanes(WaveSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What were you intending to do with ChangedLanes? It's declared as a local in this for loop and not used outside of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ChangedLanes is a helper that helps me compute the expected result for WaveMatch, once the value is computed, I use ChangedLanes.SetExpected, to save it into Expected array. I can rewrite the loop to move ChangedLanes declaration outside of it, like so:
WaveMatchResultBuilder ChangedLanes(WaveSize);
for (UINT LaneID = 0; LaneID < WaveSize; ++LaneID) {
const UINT Index = LaneID * 4;
if (LaneID == 0 || LaneID == MidLaneID || LaneID == LastLaneID) {
ChangedLanes.SetLane(LaneID);
ChangedLanes.SetExpected(&Expected[Index]);
ChangedLanes.ClearLane(LaneID);
continue;
}
UnchangedLanes.SetExpected(&Expected[Index]);
}Is that better @alsepkow?
| uint32_t GetWord(const std::bitset<128> &b, uint32_t wordPos) { | ||
| uint32_t v = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Need capitalized variable names
d884ec7 to
f0d2cfd
Compare
79cc228 to
2138c2a
Compare
|
|
||
| template <typename T> struct Op<OpType::WaveMatch, T, 1> : StrictValidation {}; | ||
|
|
||
| uint32_t GetWord(const std::bitset<128> &b, uint32_t wordPos) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to capitalize 'b'
| std::bitset<128> UnchangedLanes(~0ULL); | ||
| UnchangedLanes &= (1ULL << WaveSize) - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can simplify this to
std::bitset<128> UnchangedLanes((1ULL << WaveSize) - 1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this is bit shifting a uint64_t but we're trying to assign up to 128 bits.
You could do a for loop instead.
for (UINT I = 0; I < WaveSize; ++I) UnchangedLanes.set(I);
| (HighWaves < 64) ? (1ULL << HighWaves) - 1 : ~0ULL; | ||
| std::bitset<128> UnchangedLanes(~0ULL); | ||
| UnchangedLanes &= (1ULL << WaveSize) - 1; | ||
| UnchangedLanes.reset(0).reset(MidLaneID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think its cleaner to avoid the chaining and instead call reset on two subsequent lines.
|
|
||
| template <typename T> struct Op<OpType::WaveMatch, T, 1> : StrictValidation {}; | ||
|
|
||
| uint32_t GetWord(const std::bitset<128> &b, uint32_t WordPos) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to implement these helpers.
Bitset has them already. See: to_ulong
And to_ullong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried asking copilot if it could see any optimizations here:
Your GetWord builds a 32‑bit word by OR’ing each bit. bitset can slice cleanly:
lo64 = (bits & low64mask).to_ullong();
hi64 = ((bits >> 64) & low64mask).to_ullong();
Then split each 64 into two UINTs. (No UB: bitset::to_ullong() is defined up to 64 bits.)
| return Word; | ||
| } | ||
|
|
||
| void StoreWords(UINT *Dest, std::bitset<128> LanesState) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update LanesState to be a const reference to avoid the copy every time this is called.
| const UINT HighWaves = WaveSize - LowWaves; | ||
| const size_t VectorSize = Inputs[0].size(); | ||
|
|
||
| Expected.assign(WaveSize * 4, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already called on line 1643. I think you only meant to call it once?
| const uint64_t LowWaveMask = | ||
| (LowWaves < 64) ? (1ULL << LowWaves) - 1 : ~0ULL; | ||
| const UINT MidLaneID = WaveSize / 2; | ||
| const UINT LastLaneID = WaveSize - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could avoid the subsequent checks to see if LastLaneID is in range by instead doing std::min(WaveSize-1, Inputs[0].size())
And then update your shader code appropriately as well.
| AllMemoryBarrierWithGroupSync(); | ||
| uint4 result = WaveMatch(Vector); | ||
| uint index = WaveGetLaneIndex(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'result' and 'index' should be capitalized.
This patch modifies the Wave Match test to test modifications in different lanes and vector
indexes. This is achieved by forcing lanes
0,WAVE_SIZE/2andWAVE_SIZE -1, to modifythe vector at indexes
0,WAVE_SIZE/2orWAVE_SIZE -1, respectively.