### Describe the feature request
Quantization techniques are orthogonal to sparsity, so we should be able to stack the two and leverage the benefits of both.
### Describe the solution you'd like
We already have per-dtype kernel templates in CUDA, and we need to replicate them for the CPU and vector instruction sets like AVX. For example, the CUDA side has:
```cuda
template <>
__global__ void sparse_mlp_combined_cuda_kernel<float>(...)
```
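As a rough sketch of what the CPU counterpart could look like (assuming AVX2 + FMA; the kernel name `sparse_mlp_combined_cpu_kernel`, the CSR-style value/index layout, and the signature are all hypothetical, chosen only to mirror the CUDA specialization above): a generic scalar template with a `float` specialization that vectorizes the sparse dot product using AVX gathers.

```cpp
// Hypothetical sketch: generic scalar fallback + AVX2 float specialization.
#include <immintrin.h>  // AVX2/FMA intrinsics
#include <cstddef>

// Generic scalar fallback for any dtype.
template <typename scalar_t>
void sparse_mlp_combined_cpu_kernel(const scalar_t* values,
                                    const int* indices,
                                    const scalar_t* dense,
                                    scalar_t* out,
                                    std::size_t nnz) {
  scalar_t acc = scalar_t(0);
  for (std::size_t i = 0; i < nnz; ++i) {
    acc += values[i] * dense[indices[i]];  // gather from the dense operand
  }
  *out = acc;
}

// float specialization: 8 floats per AVX2 lane, gather loads, FMA accumulate.
template <>
void sparse_mlp_combined_cpu_kernel<float>(const float* values,
                                           const int* indices,
                                           const float* dense,
                                           float* out,
                                           std::size_t nnz) {
  __m256 acc = _mm256_setzero_ps();
  std::size_t i = 0;
  for (; i + 8 <= nnz; i += 8) {
    __m256 v = _mm256_loadu_ps(values + i);
    __m256i idx = _mm256_loadu_si256(
        reinterpret_cast<const __m256i*>(indices + i));
    __m256 d = _mm256_i32gather_ps(dense, idx, sizeof(float));
    acc = _mm256_fmadd_ps(v, d, acc);  // acc += v * d
  }
  // Horizontal sum of the 8 partial accumulators.
  alignas(32) float lanes[8];
  _mm256_store_ps(lanes, acc);
  float sum = lanes[0] + lanes[1] + lanes[2] + lanes[3] +
              lanes[4] + lanes[5] + lanes[6] + lanes[7];
  // Scalar tail for the remaining (< 8) nonzeros.
  for (; i < nnz; ++i) {
    sum += values[i] * dense[indices[i]];
  }
  *out = sum;
}
```

Other dtypes (e.g., fp16/bf16, or quantized int8) would then get their own specializations following the same pattern, which is exactly where quantization and sparsity could stack.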