[benchmark_inference.py] Specify `tp_size` to `StaticCache` by crcrpar · Pull Request #2784 · Lightning-AI/lightning-thunder

crcrpar · 2025-12-03T08:50:22Z

What does this PR do?

KV values seem to remain intact even when tensor parallelism is enabled, so this PR specifies `tp_size` in `StaticCache`.

Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>

Copilot

Pull request overview

This PR enhances tensor parallel support in the inference benchmark by specifying the tp_size parameter when initializing StaticCache for transformers >= 4.55. The changes also fix tensor parallel plan patterns to be more specific and add sanity checks to verify proper sharding.
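As a rough illustration of why `tp_size` matters for a pre-allocated cache, the sketch below computes a per-rank KV cache shape under the assumption that key/value heads are split evenly across tensor parallel ranks. The helper `per_rank_kv_cache_shape` and its parameter names are hypothetical and are not taken from transformers or from this PR; the real `StaticCache` may allocate its buffers differently.

```python
def per_rank_kv_cache_shape(
    batch_size: int,
    num_key_value_heads: int,
    max_cache_len: int,
    head_dim: int,
    tp_size: int = 1,
) -> tuple[int, int, int, int]:
    """Return an assumed per-rank KV cache shape (hypothetical helper).

    Assumption: with tensor parallelism, each rank holds
    num_key_value_heads // tp_size KV heads, so a static cache sized for
    the full head count would not match the sharded key/value states.
    """
    if tp_size < 1:
        raise ValueError("tp_size must be a positive integer")
    if num_key_value_heads % tp_size != 0:
        raise ValueError("num_key_value_heads must be divisible by tp_size")
    local_kv_heads = num_key_value_heads // tp_size
    return (batch_size, local_kv_heads, max_cache_len, head_dim)


# Without tensor parallelism the cache covers all KV heads.
full_shape = per_rank_kv_cache_shape(1, 8, 128, 64)
# With tp_size=2, each rank is assumed to hold half of the KV heads.
sharded_shape = per_rank_kv_cache_shape(1, 8, 128, 64, tp_size=2)
```

Under this assumption, a cache built without `tp_size` would expect 8 heads per rank while the sharded attention layer produces 4, which is the kind of mismatch that passing `tp_size` is meant to avoid.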

Key Changes

- Fixed tensor parallel plan patterns from `*.layers.*` to `model.layers.*` for more precise module matching
- Added `tp_size` parameter to `StaticCache` initialization to properly handle sharded KV heads in tensor parallel configurations
- Added `DTensor` verification assertions for attention projection weights to ensure proper sharding
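To see why the narrower pattern is more precise, here is a minimal sketch that treats the plan patterns as shell-style globs and matches them with Python's `fnmatch`. This is an assumption for illustration only: the actual matching logic used by the benchmark or by transformers may differ, and the module names below (in particular the `vision_tower` one) are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical fully qualified module names, for illustration only.
module_names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.down_proj",
    "vision_tower.encoder.layers.0.self_attn.q_proj",
]


def matching_modules(pattern: str, names: list[str]) -> list[str]:
    """Return the names that match a glob-style pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# The broad pattern matches any module path containing ".layers.".
broad = matching_modules("*.layers.*", module_names)
# The narrow pattern only matches modules under "model.layers.".
narrow = matching_modules("model.layers.*", module_names)
```

In this sketch the broad pattern also picks up the unrelated `vision_tower` module, while the narrow one is limited to the decoder layers under `model`.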

shino16

Thanks! I remember I faced the same issue at some older commits, and I was wondering how it could be reproduced.

fix sharding scheme

adf5377

Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>

crcrpar requested a review from Copilot December 3, 2025 08:50

crcrpar marked this pull request as ready for review December 3, 2025 08:50

crcrpar requested review from KaelanDt, lantiga and mruberry as code owners December 3, 2025 08:50

Copilot started reviewing on behalf of crcrpar December 3, 2025 08:50

Copilot finished reviewing on behalf of crcrpar December 3, 2025 08:52

Copilot AI reviewed Dec 3, 2025

shino16 approved these changes Dec 3, 2025

crcrpar requested a review from kshitij12345 December 8, 2025 12:51

KaelanDt and others added 2 commits January 5, 2026 16:03

Merge branch 'main' into mkozuki/inference_bench_tensor_parallel_sharding_fix

1069a34

Merge branch 'main' into mkozuki/inference_bench_tensor_parallel_sharding_fix

9a58218

crcrpar wants to merge 3 commits into main from mkozuki/inference_bench_tensor_parallel_sharding_fix

4 participants
