@nguyenvuduc
Hi Carol-He,

I think this is a brilliant idea, with one minor flaw.
The CUDA kernel skips the remainder when the number of elements is not evenly divisible by the factor (10 or 2) that is hardcoded in the kernel. I have fixed this, and I would like to contribute the fix to your original work.
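
To illustrate the kind of remainder handling described above, here is a minimal CPU-side C++ sketch that models the indexing of such a kernel. It is not the code from this pull request: the names `threads_needed`, `process_chunk`, and `double_all` are hypothetical, and the per-element operation (doubling) is only a placeholder for whatever the real kernel computes. The sketch assumes each thread handles a contiguous chunk of `factor` elements. The two points it shows are rounding the thread count up and clamping each chunk to the array length.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Ceiling division: the number of threads needed so that a partial
// final chunk still gets a thread. Plain n / factor would drop it.
constexpr std::size_t threads_needed(std::size_t n, std::size_t factor) {
    return (n + factor - 1) / factor;
}

// Models the body of one kernel thread (hypothetical). The thread owns
// elements [tid * factor, tid * factor + factor), clamped to n so the
// last thread stops at the end of the array instead of reading past it.
void process_chunk(std::size_t tid, std::size_t factor,
                   const std::vector<int>& in, std::vector<int>& out) {
    const std::size_t n = in.size();
    const std::size_t begin = tid * factor;
    const std::size_t end = std::min(begin + factor, n);
    // If begin >= n (an extra thread from grid rounding), the loop is empty.
    for (std::size_t i = begin; i < end; ++i) {
        out[i] = in[i] * 2;  // placeholder operation, not the real kernel's work
    }
}

// Models the kernel launch: runs every "thread" sequentially on the CPU.
std::vector<int> double_all(const std::vector<int>& in, std::size_t factor) {
    assert(factor > 0);
    std::vector<int> out(in.size(), 0);
    const std::size_t threads = threads_needed(in.size(), factor);
    for (std::size_t tid = 0; tid < threads; ++tid) {
        process_chunk(tid, factor, in, out);
    }
    return out;
}
```

With 23 elements and a factor of 10, integer division would launch only 2 threads and leave the last 3 elements untouched; the rounded-up count launches 3, and the clamp keeps the third thread within bounds.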

Thank you.
