🐛 OCPBUGS-78787: fix(operator-controller): clean up orphaned temp dirs in catalog cache#2574
Conversation
…in catalog cache
filesystemCache.writeFS creates a temp dir (.{catalog}-{random}) and renames
it into place atomically. If the process is interrupted before the rename, the
temp dir persists. Each restart adds another, eventually filling the disk.
Additionally, writeFS had no defer os.RemoveAll(tmpDir), so any error during
WalkMetasReader or the rename step also left the temp dir behind — no process
kill required.
Two fixes:
- Add defer os.RemoveAll(tmpDir) so errors during normal operation clean up.
- Add removeOrphanedTempDirs, called at the start of writeFS (under the write
mutex), to clean up dirs orphaned by a previous process run. This bounds
worst-case accumulation to one orphaned dir per catalog regardless of
restart rate.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Todd Short <tshort@redhat.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
✅ Deploy Preview for olmv1 ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
Related to #2537 which fixed catalogd issues. |
There was a problem hiding this comment.
Pull request overview
This pull request adds cleanup for orphaned temporary directories in the filesystem cache implementation. These orphaned directories can be left behind if a write operation is interrupted (e.g., pod eviction or crash) before the temporary staging directory is renamed to the final cache location. The changes improve reliability by automatically cleaning up these dangling directories when a new write operation begins.
Changes:
- Added
removeOrphanedTempDirs()method to scan and remove temporary directories with the catalog-specific prefix pattern that were left behind by interrupted writes - Integrated orphaned directory cleanup into the
writeFS()method to run before creating a new temporary directory - Added a defer statement to ensure temporary directories are cleaned up if the write operation fails
- Added comprehensive test coverage for the orphaned directory cleanup functionality
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| internal/operator-controller/catalogmetadata/cache/cache.go | Added orphaned temp directory cleanup with removeOrphanedTempDirs() method and integrated it into writeFS() flow |
| internal/operator-controller/catalogmetadata/cache/cache_test.go | Added TestFilesystemCachePutCleansOrphanedTempDirs() test to verify orphaned directories are cleaned up while preserving directories for other catalogs |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2574 +/- ##
==========================================
+ Coverage 63.42% 68.58% +5.16%
==========================================
Files 131 131
Lines 9333 9348 +15
==========================================
+ Hits 5919 6411 +492
+ Misses 2939 2442 -497
- Partials 475 495 +20
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
filesystemCache.writeFS creates a temp dir (.{catalog}-{random}) and renames it into place atomically. If the process is interrupted before the rename, the temp dir persists. Each restart adds another, eventually filling the disk.
Additionally, writeFS had no defer os.RemoveAll(tmpDir), so any error during WalkMetasReader or the rename step also left the temp dir behind — no process kill required.
Two fixes:
Description
Reviewer Checklist