helm: add nodeSelector and tolerations to NIMCache templates #1645

Open
abhay1999 wants to merge 2 commits into NVIDIA:main from
abhay1999:1636-nimcache-nodeselector-tolerations


@abhay1999

Summary

Fixes #1636

NIMCache resources were missing nodeSelector and tolerations fields across all NIM Operator mode Helm templates. Without these fields, NIMCache pods cannot be scheduled on GPU nodes that have taints or require node selection (e.g. cloud provider GPU node pools with nvidia.com/gpu: NoSchedule taints).
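
As an illustration of the settings involved, a values.yaml override for one model might look like the sketch below. The `nimOperator.<model>` path comes from the templates in this PR; the `<model>` placeholder, the node label, and the toleration details are assumptions chosen for the example, not values taken from the chart.

```yaml
# Hypothetical values.yaml override; replace <model> with the chart's key for the NIM.
nimOperator:
  <model>:
    # Assumed node label; use whatever label identifies your GPU node pool.
    nodeSelector:
      nvidia.com/gpu.present: "true"
    # Tolerates a NoSchedule taint keyed nvidia.com/gpu, whatever its value.
    tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
```

Without matching fields in the NIMCache template, values like these would only reach the NIMService resource.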

The NIMService sections of the same templates already exposed these fields correctly — this PR makes NIMCache consistent with NIMService.

Changes

Added nodeSelector and tolerations to the NIMCache spec in all 8 affected templates:

  • helm/templates/llama-nemotron-embed-1b-v2.yaml
  • helm/templates/llama-nemotron-rerank-1b-v2.yaml
  • helm/templates/nemotron-graphic-elements-v1.yaml
  • helm/templates/nemotron-ocr-v1.yaml
  • helm/templates/nemotron-page-elements-v3.yaml
  • helm/templates/nemotron-table-structure-v1.yaml
  • helm/templates/nemotron-nano-12b-v2-vl.yaml (also adds missing nodeSelector to NIMService)
  • helm/templates/nemotron-parse.yaml (also adds missing nodeSelector to NIMService)

Before / After

Before — NIMCache would ignore nodeSelector/tolerations from values.yaml, causing pods to be unschedulable on tainted GPU nodes:

kind: NIMCache
spec:
  source: ...
  storage: ...
  # nodeSelector and tolerations missing

After — consistent with NIMService:

kind: NIMCache
spec:
  source: ...
  storage: ...
  nodeSelector:
    {{ toYaml .Values.nimOperator.<model>.nodeSelector | nindent 4 }}
  tolerations:
    {{ toYaml .Values.nimOperator.<model>.tolerations | nindent 4 }}
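
For a sense of the result, the After template would render roughly as follows if the model's values set one node label and one toleration. The label `nvidia.com/gpu.present` and the toleration shown are assumed example values, not chart defaults; key order reflects `toYaml` emitting map keys alphabetically, and incidental blank lines are omitted.

```yaml
# Approximate rendered NIMCache spec under the assumed example values.
kind: NIMCache
spec:
  source: ...
  storage: ...
  nodeSelector:
    nvidia.com/gpu.present: "true"
  tolerations:
    - effect: NoSchedule
      key: nvidia.com/gpu
      operator: Exists
```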

Testing

Verified template structure matches the existing NIMService pattern used consistently across all templates in the chart.

NIMCache resources were missing nodeSelector and tolerations fields
across all Operator-mode templates. Without these, NIMCache pods cannot
be scheduled on GPU nodes that have taints or require specific node
selection (e.g. cloud provider GPU node pools).

The NIMService sections of the same templates already expose these
fields correctly. This commit adds the equivalent fields to the NIMCache
sections of all eight affected templates, and also adds the missing
nodeSelector to the NIMService sections of nemotron-nano-12b-v2-vl and
nemotron-parse which only had tolerations.

Fixes NVIDIA#1636

Signed-off-by: abhay1999 <abhaychaurasiya19@gmail.com>
@abhay1999 abhay1999 requested a review from a team as a code owner March 18, 2026 01:53
@abhay1999 abhay1999 requested a review from charlesbluca March 18, 2026 01:53
@copy-pr-bot

copy-pr-bot bot commented Mar 18, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Remove extra trailing blank line to satisfy pre-commit end-of-file-fixer hook.

Signed-off-by: abhay1999 <abhaychaurasiya19@gmail.com>
@abhay1999
Author

Hi team! 👋 Following up on this PR — it's been 15 days since opening.

This adds nodeSelector and tolerations support to the NIMCache Helm templates, making it possible to schedule cache pods on specific nodes (GPU node pools, spot instances, etc.).

I noticed the copy-pr-bot flagged that this needs additional vetting before workflows can run on NVIDIA's runners. Happy to assist with anything needed from my side to unblock the validation. Thanks!


Development

Successfully merging this pull request may close these issues.

[BUG]: Helm chart NIMCache templates missing nodeSelector and tolerations fields
