Background
While reviewing #5447, Yicong-Huang flagged that the Files.walk(...) stream-leak pattern Copilot caught in my test cleanup helpers may exist elsewhere. A sweep of the codebase confirmed one production usage that leaks the underlying directory handle:
common/workflow-core/src/main/scala/org/apache/texera/amber/core/storage/util/dataset/GitVersionControlLocalFileStorage.java, lines 80-85:
public static void deleteRepo(Path directoryPath) throws IOException {
Files.walk(directoryPath)
.sorted(Comparator.reverseOrder())
.map(Path::toFile)
.forEach(File::delete);
}
Files.walk(...) returns a closeable java.util.stream.Stream backed by an open directory handle. Without an explicit close(), the handle stays open until GC — which can flake temp-dir deletion on Windows and leak file descriptors on long-lived JVMs (e.g. the dataset service that calls deleteRepo whenever a Git-backed dataset is removed).
Every other Files.walk / Files.list usage in the codebase already wraps the stream in try/finally and closes it (verified across PveManager.scala, HuggingFaceModelResource.scala, HuggingFaceModelResourceSpec.scala).
What needs to change
Convert deleteRepo to use try-with-resources so the stream is closed even if iteration throws:
public static void deleteRepo(Path directoryPath) throws IOException {
try (var stream = Files.walk(directoryPath)) {
stream
.sorted(Comparator.reverseOrder())
.map(Path::toFile)
.forEach(File::delete);
}
}
No behavior change for callers; just the stream-lifecycle fix.
Scope
- Single-method edit in
GitVersionControlLocalFileStorage.java.
- No new tests required (existing dataset deletion paths already exercise this code).
Background
While reviewing #5447, Yicong-Huang flagged that the
Files.walk(...)stream-leak pattern Copilot caught in my test cleanup helpers may exist elsewhere. A sweep of the codebase confirmed one production usage that leaks the underlying directory handle:common/workflow-core/src/main/scala/org/apache/texera/amber/core/storage/util/dataset/GitVersionControlLocalFileStorage.java, lines 80-85:Files.walk(...)returns a closeablejava.util.stream.Streambacked by an open directory handle. Without an explicitclose(), the handle stays open until GC — which can flake temp-dir deletion on Windows and leak file descriptors on long-lived JVMs (e.g. the dataset service that callsdeleteRepowhenever a Git-backed dataset is removed).Every other
Files.walk/Files.listusage in the codebase already wraps the stream intry/finallyand closes it (verified acrossPveManager.scala,HuggingFaceModelResource.scala,HuggingFaceModelResourceSpec.scala).What needs to change
Convert
deleteRepoto use try-with-resources so the stream is closed even if iteration throws:No behavior change for callers; just the stream-lifecycle fix.
Scope
GitVersionControlLocalFileStorage.java.