Describe the bug
I have qtree-based volumes (ontap-nas-economy driver) containing a huge number of small files. When such a volume is deleted, NetApp ONTAP first has to delete all of these files before it can delete the qtree. This takes a long time.
Meanwhile, Trident's job status poller has MaxElapsedTime set to 2 minutes, which is too short for all the files, and then the qtree itself, to be deleted.
When Trident starts the delete operation, it first renames the qtree to add a "deleted" prefix. Only after this rename is the asynchronous qtree delete triggered.
When an error occurs, for example a job status poller timeout, the original name is restored.
This second rename causes the delete that is running on the ONTAP system to be cancelled.
Trident does retry the delete, but this is inefficient. On multiple occasions I noticed that Trident stopped retrying and the volume remained orphaned.
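
To make the sequence above easier to follow, here is a minimal simulation of it in Go. It is not Trident's code: `fakeQtree`, `deleteAttempt`, the `deleted_` prefix string, the fixed poll interval, and the deletion rate of 2000 files per second are all assumptions made up for illustration. Time is simulated with integer seconds, so nothing actually sleeps.

```go
package main

import (
	"errors"
	"fmt"
)

// deletedPrefix is a placeholder; the exact prefix is not given in this report.
const deletedPrefix = "deleted_"

var errPollTimeout = errors.New("job status poll exceeded max elapsed time")

// fakeQtree is a hypothetical stand-in for a qtree on the backend.
type fakeQtree struct {
	name        string
	filesLeft   int  // files the backend still has to remove
	deleting    bool // true while a backend delete job is running
	jobsStarted int  // how many delete jobs were started in total
}

func (q *fakeQtree) startDelete() {
	q.deleting = true
	q.jobsStarted++
}

// rename models the behavior described above: a rename cancels a running delete.
func (q *fakeQtree) rename(newName string) {
	q.name = newName
	q.deleting = false
}

// advance simulates the backend removing files for the given number of seconds.
func (q *fakeQtree) advance(seconds, filesPerSec int) {
	if !q.deleting {
		return
	}
	q.filesLeft -= seconds * filesPerSec
	if q.filesLeft < 0 {
		q.filesLeft = 0
	}
}

// deleteAttempt renames, starts the delete, polls until maxElapsed (simulated
// seconds), and restores the original name when the poll times out.
func deleteAttempt(q *fakeQtree, pollEvery, maxElapsed, filesPerSec int) error {
	original := q.name
	q.rename(deletedPrefix + original)
	q.startDelete()
	for elapsed := 0; elapsed < maxElapsed; elapsed += pollEvery {
		q.advance(pollEvery, filesPerSec)
		if q.filesLeft == 0 {
			return nil
		}
	}
	q.rename(original) // restoring the name cancels the running delete
	return errPollTimeout
}

func main() {
	q := &fakeQtree{name: "pvc_example", filesLeft: 1_000_000}
	for attempt := 1; ; attempt++ {
		err := deleteAttempt(q, 10, 120, 2000)
		fmt.Printf("attempt %d: err=%v files left=%d name=%s\n",
			attempt, err, q.filesLeft, q.name)
		if err == nil {
			break
		}
	}
	fmt.Println("delete jobs started:", q.jobsStarted)
}
```

With these made-up numbers, every attempt that times out cancels the backend job, and the next attempt has to start a new one.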
Environment
Provide accurate information about the environment to help us reproduce the issue.
- Trident version: 25.10
- Trident installation flags used: /
- Container runtime: containerd v2.1.5-k3s1
- Kubernetes version: v1.34.3
- Kubernetes orchestrator: Rancher RKE2 v1.34.3+rke2r3
- Kubernetes enabled feature gates: /
- OS: Debian 13 (Linux 6.12.69+deb13-amd64)
- NetApp backend types: ONTAP
To Reproduce
Create a PVC using the ontap-nas-economy driver.
Create a huge number of files in it (I tested with 1M files).
Delete the PVC.
I have not been able to reproduce the case where Trident stops retrying the delete.
Expected behavior
I expect the delete to run to completion on the first try.
Since it is running just fine, there is no need to cancel it and retry it.
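
One way this could look, as a rough simulation and not a proposed patch: when the poll times out, the "deleted" name is left in place and the next retry only resumes polling instead of starting over. All names here (`fakeQtree`, `resumeOrStartDelete`, `errStillDeleting`, the `deleted_` prefix string) and the numbers are hypothetical, and time is simulated with integer seconds.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// deletedPrefix is a placeholder; the exact prefix is not given in this report.
const deletedPrefix = "deleted_"

var errStillDeleting = errors.New("delete still running on the backend")

// fakeQtree is a hypothetical stand-in for a qtree on the backend.
type fakeQtree struct {
	name        string
	filesLeft   int  // files the backend still has to remove
	deleting    bool // true while a backend delete job is running
	jobsStarted int  // how many delete jobs were started in total
}

// resumeOrStartDelete starts the delete only if the qtree does not carry the
// prefix yet; otherwise it just resumes polling the job that is already running.
// On timeout it leaves the name alone, so the backend job is never cancelled.
func resumeOrStartDelete(q *fakeQtree, pollEvery, maxElapsed, filesPerSec int) error {
	if !strings.HasPrefix(q.name, deletedPrefix) {
		q.name = deletedPrefix + q.name
		q.deleting = true
		q.jobsStarted++
	}
	for elapsed := 0; elapsed < maxElapsed; elapsed += pollEvery {
		// Simplification: the backend only makes progress while we poll.
		if q.deleting {
			q.filesLeft -= pollEvery * filesPerSec
			if q.filesLeft < 0 {
				q.filesLeft = 0
			}
		}
		if q.filesLeft == 0 {
			q.deleting = false
			return nil
		}
	}
	return errStillDeleting
}

func main() {
	q := &fakeQtree{name: "pvc_example", filesLeft: 1_000_000}
	for call := 1; ; call++ {
		err := resumeOrStartDelete(q, 10, 120, 2000)
		fmt.Printf("call %d: err=%v files left=%d name=%s\n",
			call, err, q.filesLeft, q.name)
		if err == nil {
			break
		}
	}
	fmt.Println("delete jobs started:", q.jobsStarted)
}
```

In this sketch the caller may still poll several times, but only one delete job is ever started on the backend.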