### Describe the feature request
Quantization techniques are orthogonal to sparsity, so we should be able to stack the two and leverage the benefits of both.
### Describe the solution you'd like
We already have per-dtype kernel templates in CUDA, and we need to replicate them for the CPU and vector instruction sets like AVX. For example, the CUDA side has:
```cuda
template <>
__global__ void sparse_mlp_combined_cuda_kernel<float>(...)
```
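As a rough sketch of what the CPU counterpart could look like (assuming AVX2 + FMA; the kernel name `sparse_mlp_combined_cpu_kernel`, the CSR-style value/index layout, and the signature are all hypothetical, chosen only to mirror the CUDA specialization above): a generic scalar template with a `float` specialization that vectorizes the sparse dot product using AVX gathers.

```cpp
// Hypothetical sketch: generic scalar fallback + AVX2 float specialization.
#include <immintrin.h>  // AVX2/FMA intrinsics
#include <cstddef>

// Generic scalar fallback for any dtype.
template <typename scalar_t>
void sparse_mlp_combined_cpu_kernel(const scalar_t* values,
                                    const int* indices,
                                    const scalar_t* dense,
                                    scalar_t* out,
                                    std::size_t nnz) {
  scalar_t acc = scalar_t(0);
  for (std::size_t i = 0; i < nnz; ++i) {
    acc += values[i] * dense[indices[i]];  // gather from the dense operand
  }
  *out = acc;
}

// float specialization: 8 floats per AVX2 lane, gather loads, FMA accumulate.
template <>
void sparse_mlp_combined_cpu_kernel<float>(const float* values,
                                           const int* indices,
                                           const float* dense,
                                           float* out,
                                           std::size_t nnz) {
  __m256 acc = _mm256_setzero_ps();
  std::size_t i = 0;
  for (; i + 8 <= nnz; i += 8) {
    __m256 v = _mm256_loadu_ps(values + i);
    __m256i idx = _mm256_loadu_si256(
        reinterpret_cast<const __m256i*>(indices + i));
    __m256 d = _mm256_i32gather_ps(dense, idx, sizeof(float));
    acc = _mm256_fmadd_ps(v, d, acc);  // acc += v * d
  }
  // Horizontal sum of the 8 partial accumulators.
  alignas(32) float lanes[8];
  _mm256_store_ps(lanes, acc);
  float sum = lanes[0] + lanes[1] + lanes[2] + lanes[3] +
              lanes[4] + lanes[5] + lanes[6] + lanes[7];
  // Scalar tail for the remaining (< 8) nonzeros.
  for (; i < nnz; ++i) {
    sum += values[i] * dense[indices[i]];
  }
  *out = sum;
}
```

Other dtypes (e.g., fp16/bf16, or quantized int8) would then get their own specializations following the same pattern, which is exactly where quantization and sparsity could stack.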