Description
The calculate_footprint function already supports a batch_size argument to scale dynamic KV cache calculations, but there is no CLI flag to expose this.
We need to add a --batch-size argument to cli.py (default: 1) and pass it through analyze_model to calculate_footprint.
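A minimal sketch of the requested wiring, assuming cli.py uses argparse. The bodies of calculate_footprint and analyze_model below are hypothetical stand-ins, since their real signatures are not shown in this issue. The positive_int validator, the model positional argument, and the placeholder byte count are likewise assumptions made for illustration.

```python
import argparse


def positive_int(value: str) -> int:
    """argparse type that accepts only integers >= 1 (hypothetical helper)."""
    number = int(value)  # a ValueError here is reported by argparse as a usage error
    if number < 1:
        raise argparse.ArgumentTypeError(f"must be >= 1, got {value}")
    return number


def calculate_footprint(kv_cache_bytes_per_sequence: int, batch_size: int = 1) -> int:
    # Hypothetical stand-in: the real function's other parameters are unknown.
    # It only illustrates the dynamic KV cache scaling with batch_size.
    return kv_cache_bytes_per_sequence * batch_size


def analyze_model(model: str, batch_size: int = 1) -> int:
    # Hypothetical stand-in: forwards batch_size unchanged to calculate_footprint.
    kv_cache_bytes_per_sequence = 1024  # placeholder value, not a real measurement
    return calculate_footprint(kv_cache_bytes_per_sequence, batch_size=batch_size)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Estimate a model's VRAM footprint.")
    parser.add_argument("model", help="model identifier (assumed positional argument)")
    parser.add_argument(
        "--batch-size",
        type=positive_int,
        default=1,
        help="number of sequences processed together (default: 1)",
    )
    return parser


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    # argparse exposes --batch-size as args.batch_size
    return analyze_model(args.model, batch_size=args.batch_size)
```

Keeping the default at 1 means existing invocations behave as before, and validating the value in the parser rejects zero or negative batch sizes before they reach the footprint calculation.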
Use Case
Lets users calculate accurate VRAM footprints for batched inference (batch size greater than 1) without editing code.