HDDS-15362. [DiskBalancer] Handle zero-capacity volumes in status report.#10355
HDDS-15362. [DiskBalancer] Handle zero-capacity volumes in status report.#10355slfan1989 wants to merge 3 commits into
Conversation
|
@adoroszlai @Gargi-jais11 Could you please help review this PR? Thank you very much! |
| double utilization = v.getUsage().getCapacity() == 0 | ||
| ? 0.0 : v.getUtilization(); |
There was a problem hiding this comment.
I think getUtilization should return 1.0 (for 100%) for capacity=0, not null.
Alternatively, it could be NaN, but then it needs to be handled as special case just like null.
VolumeFixedUsage.utilization should be double, not Double in either case.
There was a problem hiding this comment.
That makes sense. Since utilization is exposed as a double value, using nullable Double makes callers handle an implicit special case and can lead to NPEs.
I will change VolumeFixedUsage.utilization to primitive double and use 1.0 for zero-capacity volumes. This keeps zero-capacity volumes from being treated as valid low-utilization targets, while avoiding NaN-specific handling in sorting, reporting, and threshold comparisons.
| this.utilization = usage.getCapacity() > 0 ? computeUtilization(usage, volume.getCommittedBytes(), delta) : null; | ||
| this.utilization = usage.getCapacity() > 0 | ||
| ? computeUtilization(usage, volume.getCommittedBytes(), delta) | ||
| : 1.0; |
There was a problem hiding this comment.
Thanks @slfan1989 , but let's use 0.0 instead of 1.0.
For zero-capacity volume, ideal usage is 0.0, if threshold is 10%, and utilization is 1.0, then this volume is over utilized.
I don't recommend to change the logic to make ideal usage 1.0 too for this case.
There was a problem hiding this comment.
Thanks for the clarification. Updated the zero-capacity volume utilization from 1.0 to 0.0, and adjusted the related tests accordingly.
| this.utilization = usage.getCapacity() > 0 ? computeUtilization(usage, volume.getCommittedBytes(), delta) : null; | ||
| this.utilization = usage.getCapacity() > 0 | ||
| ? computeUtilization(usage, volume.getCommittedBytes(), delta) | ||
| : 0.0; |
There was a problem hiding this comment.
if there is no Capacity, should utilization be 100%? Would 0 utilization be treated that volume is a candidate for moving data or creating containers?
There was a problem hiding this comment.
Good question. 0.0 is only used as a safe value for zero-capacity volumes to avoid reporting them as over-utilized. Such volumes should not become DiskBalancer candidates because DefaultContainerChoosingPolicy filters out volumes with capacity <= 0 before selecting source/destination volumes, and destination selection also requires positive usable space.
What changes were proposed in this pull request?
This pull request fixes a possible
NullPointerExceptionin the DiskBalancer status/report generation path when a datanode contains a zero-capacity volume.VolumeFixedUsagedoes not calculate utilization for a volume whose capacity is 0, leaving its utilization value asnull. However,DiskBalancerService.buildVolumeReportProto()previously attempted to retrieve utilization for every volume while constructing status report entries. As a result, a zero-capacity volume could cause DiskBalancer status reporting to fail withNullPointerException: utilization == null.This change keeps zero-capacity volumes in the generated report and reports their utilization as 0.0. Normal volumes continue to use their calculated utilization values. This allows status/report requests to complete successfully while preserving useful information about zero-capacity volumes.
A regression unit test is added to verify that a zero-capacity volume can be included in the volume report with utilization set to 0.0.
What is the link to the Apache JIRA
HDDS-15362. [DiskBalancer] Handle zero-capacity volumes in status report.
How was this patch tested?
Added a unit test.
CI: https://github.com/slfan1989/ozone/actions/runs/26357147502