Core: Keep FileSystem reachable during Avro writes when Hadoop FS cache is disabled#16642
Open
wombatu-kun wants to merge 1 commit into
Open
Core: Keep FileSystem reachable during Avro writes when Hadoop FS cache is disabled#16642wombatu-kun wants to merge 1 commit into
wombatu-kun wants to merge 1 commit into
Conversation
…he is disabled Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Follow-up to #16641, which fixed this class of bug for the Parquet write path (reported in #16640). When the Hadoop FileSystem cache is disabled (for example
fs.abfs.impl.disable.cache=true), aFileSystemresolved for a write has no shared strong referrer and can be garbage-collected mid-write. On Azure,AzureBlobFileSystem.finalize()then shuts down the thread pool that the openAbfsOutputStreamdepends on, and the write fails withCould not submit task to executor ... ThreadPoolExecutor [Terminated].Root cause
AvroFileAppenderkeeps only the output stream, not theOutputFile. The data and delete writers that wrap it (DataWriter,PositionDeleteWriter,EqualityDeleteWriter) keep the appender and a location string, but not theOutputFileeither. So for an Avro data or delete file written with the cache disabled, nothing keeps the write'sFileSystemreachable, and it can be collected while the file is still being written.Manifests are not affected:
ManifestWriterandManifestListWriterretain theOutputFilethemselves, so theFileSystemstays reachable through them. ORC is also unaffected becauseOrcFileAppenderalready retains itsOutputFile.Change
Retain the
OutputFileonAvroFileAppenderso itsFileSystemstays reachable for the appender's lifetime, mirroringOrcFileAppender. The retained file is also used to include the file location in the write-error message.Tests
Added
TestAvroWriteFileSystemReachability, an end-to-end test that writes an Avro position-delete file throughPositionDeleteWriter(which drops theOutputFile) with the Hadoop FileSystem cache disabled, against a local FileSystem that mimicsAzureBlobFileSystem: itsfinalize()terminates a thread pool the open stream depends on, and the stream references the pool rather than the FileSystem. Without the production change the write FileSystem is collected mid-write andclose()fails withCould not submit task to executor: thread pool was terminated; with the change the FileSystem stays reachable and the write completes. The test fails without the fix and passes with it.Related to #16640 and #16641.