Skip to content

Conversation

@joaosaffran
Copy link
Collaborator

This patch modifies the Wave Match test to test modifications in different lanes and vector
indexes. This is achieved by forcing lanes 0, WAVE_SIZE/2 and WAVE_SIZE -1, to modify
the vector at indexes 0, WAVE_SIZE/2 or WAVE_SIZE -1, respectively.

uint64_t LowBits;
uint64_t HighBits;

Wave(CONST UINT NumWaves, uint64_t LB, uint64_t HB) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Wave(CONST UINT NumWaves, uint64_t LB, uint64_t HB) {
Wave(UINT NumWaves, uint64_t LB, uint64_t HB) {

uint64_t LowBits;
uint64_t HighBits;

Wave(CONST UINT NumWaves, uint64_t LB, uint64_t HB) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you only ever pass LB = HB = ~0 or LB = HB = 0 to this. There may be a way to simplify to have a single value passed in?

template <typename T> struct Op<OpType::WaveMatch, T, 1> : StrictValidation {};

// Helper struct to build the expected result for WaveMatch tests.
struct Wave {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably needs a longer name. WaveMatchResultBuilder?

Comment on lines 1598 to 1602
uint64_t LowWaveMask;
uint64_t HighWaveMask;

uint64_t LowBits;
uint64_t HighBits;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should aim to make these private.

What if there was function that looked like this?

void SetExpected(std::vector<UINT>::iterator It)

That would write the values into the 4 elements of the vector?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure taking an iterator is a great suggestion. Maybe void SetExpected(UINT* Dest) might be a better starting point.

@joaosaffran joaosaffran requested a review from damyanp December 9, 2025 19:57
Copy link
Contributor

@alsepkow alsepkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made some comments. It looks like we need a few fixes on this iteration still.

const UINT HighWaves = NumWaves - LowWaves;
LowWaveMask = (LowWaves < 64) ? (1ULL << LowWaves) - 1 : ~0ULL;
HighWaveMask = (HighWaves < 64) ? (1ULL << HighWaves) - 1 : ~0ULL;
LowBits &= LowWaveMask;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assignment to LowBits and HighBits seems redundant? They're initialized to zero so the result of these operations will still always just be 0?

D3D12_FEATURE_D3D12_OPTIONS1, &WaveOpts, sizeof(WaveOpts)));

WaveSize = WaveOpts.WaveLaneCountMin;
WaveSize = 128; // WaveOpts.WaveLaneCountMin;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you intended to leave this commented out

WaveMatchResultBuilder(UINT NumWaves) : LowBits(0), HighBits(0) {
const UINT LowWaves = std::min(64U, NumWaves);
const UINT HighWaves = NumWaves - LowWaves;
LowWaveMask = (LowWaves < 64) ? (1ULL << LowWaves) - 1 : ~0ULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could help improve readability of this code (not that its bad) by using inline helpers or static constexp helpers for some of the bit math instead.

LowWaveMask = lowNBitsSet(LowWaves);

'~0ULL' is probably fine. It's simple and common.

private:
uint64_t LowWaveMask;
uint64_t HighWaveMask;
uint64_t LowBits;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you opted for two uint64 values instead of 4 uints or one 128-bit uint?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Standard C++ does not currently include a 128-bit uint. This requires a compiler extension, I was having trouble compiling using it, and I didn't want to modify the cmake definition to support it. So, I created my own. Using the 2 64 bit uint seemed the simplest implementation with built-in types.


const UINT LowWaves = std::min(64U, WaveSize);
const UINT HighWaves = WaveSize - LowWaves;
const UINT MidBit = WaveSize / 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming on this should be 'MidLaneID' and 'LastLaneID' respectively.

const uint64_t LowExpected = ~1ULL & LowWaveMask;
const uint64_t HighExpected = ~0ULL & HighWaveMask;
if (I == 0 || MidBit == I || LastBit == I) {
WaveMatchResultBuilder ChangedLanes(WaveSize);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What were you intending to do with ChangedLanes? It's declared as a local in this for loop and not used outside of it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ChangedLanes is a helper that helps me compute the expected result for WaveMatch, once the value is computed, I use ChangedLanes.SetExpected, to save it into Expected array. I can rewrite the loop to move ChangedLanes declaration outside of it, like so:

WaveMatchResultBuilder ChangedLanes(WaveSize);

for (UINT LaneID = 0; LaneID < WaveSize; ++LaneID) {
      const UINT Index = LaneID * 4;

      if (LaneID == 0 ||  LaneID == MidLaneID ||  LaneID == LastLaneID) {
        ChangedLanes.SetLane(LaneID);
        ChangedLanes.SetExpected(&Expected[Index]);
        ChangedLanes.ClearLane(LaneID);
        continue;
      }

      UnchangedLanes.SetExpected(&Expected[Index]);
    }

Is that better @alsepkow?

@github-project-automation github-project-automation bot moved this from New to In progress in HLSL Roadmap Dec 12, 2025
@joaosaffran joaosaffran requested a review from alsepkow January 5, 2026 23:35
Comment on lines 1617 to 1620
uint32_t GetWord(const std::bitset<128> &b, uint32_t wordPos) {
uint32_t v = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Need capitalized variable names

@joaosaffran joaosaffran force-pushed the hlk/improve-wave-match-test-coverage branch from d884ec7 to f0d2cfd Compare January 7, 2026 20:04
@joaosaffran joaosaffran force-pushed the hlk/improve-wave-match-test-coverage branch from 79cc228 to 2138c2a Compare January 7, 2026 20:05
@joaosaffran joaosaffran requested a review from alsepkow January 7, 2026 20:40

template <typename T> struct Op<OpType::WaveMatch, T, 1> : StrictValidation {};

uint32_t GetWord(const std::bitset<128> &b, uint32_t wordPos) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to capitalize 'b'

Comment on lines +1652 to +1653
std::bitset<128> UnchangedLanes(~0ULL);
UnchangedLanes &= (1ULL << WaveSize) - 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simplify this to
std::bitset<128> UnchangedLanes((1ULL << WaveSize) - 1)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this is bit shifting a uint64_t but we're trying to assign up to 128 bits.
You could do a for loop instead.

for (UINT I = 0; I < WaveSize; ++I) UnchangedLanes.set(I);

(HighWaves < 64) ? (1ULL << HighWaves) - 1 : ~0ULL;
std::bitset<128> UnchangedLanes(~0ULL);
UnchangedLanes &= (1ULL << WaveSize) - 1;
UnchangedLanes.reset(0).reset(MidLaneID);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think its cleaner to avoid the chaining and instead call reset on two subsequent lines.


template <typename T> struct Op<OpType::WaveMatch, T, 1> : StrictValidation {};

uint32_t GetWord(const std::bitset<128> &b, uint32_t WordPos) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to implement these helpers.
Bitset has them already. See: to_ulong
And to_ullong

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried asking copilot if it could see any optimizations here:

Your GetWord builds a 32‑bit word by OR’ing each bit. bitset can slice cleanly:

lo64 = (bits & low64mask).to_ullong();
hi64 = ((bits >> 64) & low64mask).to_ullong();
Then split each 64 into two UINTs. (No UB: bitset::to_ullong() is defined up to 64 bits.)

return Word;
}

void StoreWords(UINT *Dest, std::bitset<128> LanesState) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update LanesState to be a const reference to avoid the copy every time this is called.

const UINT HighWaves = WaveSize - LowWaves;
const size_t VectorSize = Inputs[0].size();

Expected.assign(WaveSize * 4, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already called on line 1643. I think you only meant to call it once?

const uint64_t LowWaveMask =
(LowWaves < 64) ? (1ULL << LowWaves) - 1 : ~0ULL;
const UINT MidLaneID = WaveSize / 2;
const UINT LastLaneID = WaveSize - 1;
Copy link
Contributor

@alsepkow alsepkow Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could avoid the subsequent checks to see if LastLaneID is in range by instead doing std::min(WaveSize-1, Inputs[0].size())

And then update your shader code appropriately as well.

AllMemoryBarrierWithGroupSync();
uint4 result = WaveMatch(Vector);
uint index = WaveGetLaneIndex();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'result' and 'index' should be capitalized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

4 participants