Labels: documentation (Improvements or additions to documentation)
Description
📚 The doc issue
https://docs.vllm.ai/en/v0.8.5.post1/features/quantization/quantized_kvcache.html
Question:
In the FP8 KV Cache implementation, after computing attention scores and softmax at higher precision (FP16/BF16), is the resulting attention weight matrix:
1. Quantized to FP8 and multiplied directly with the FP8 V cache, or
2. Multiplied with the V cache after dequantizing V to higher precision?
The documentation mentions "no fused dequantization and attention operations yet" but doesn't specify the precision of this final multiplication. Clarifying this detail would help readers understand the accuracy-performance tradeoff.
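For concreteness, here is a minimal PyTorch sketch of the two interpretations being asked about. This is not vLLM's actual attention kernel: the shapes, per-tensor scales, and the `fp8_matmul` placeholder are illustrative assumptions, and FP8 tensors require PyTorch >= 2.1.

```python
import torch

# Illustrative only: single attention head, per-tensor scales,
# FP8 E4M3 storage for the V cache.
num_tokens, head_dim = 16, 64
probs_fp16 = torch.softmax(
    torch.randn(num_tokens, num_tokens, dtype=torch.float16), dim=-1
)
v_scale = torch.tensor(0.05, dtype=torch.float16)
v_fp8 = (torch.randn(num_tokens, head_dim, dtype=torch.float16) / v_scale).to(
    torch.float8_e4m3fn
)

# Interpretation 1: quantize the attention probabilities to FP8 and do the
# P @ V product in FP8 (would need an FP8 GEMM, e.g. torch._scaled_mm on
# supported hardware; left as pseudocode with a hypothetical helper).
p_scale = probs_fp16.abs().max().float() / torch.finfo(torch.float8_e4m3fn).max
probs_fp8 = (probs_fp16.float() / p_scale).to(torch.float8_e4m3fn)
# out_1 = fp8_matmul(probs_fp8, v_fp8) * (p_scale * v_scale)

# Interpretation 2: dequantize V back to FP16 and do the P @ V product at
# higher precision (FP16 matmul may require a GPU or a recent PyTorch build).
v_fp16 = v_fp8.to(torch.float16) * v_scale
out_2 = probs_fp16 @ v_fp16
```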
Thanks!
Suggest a potential alternative/fix
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.