
fix: Clean up temporary PyPI artifact files after use #185

Open
ajma wants to merge 1 commit into GoogleCloudDataproc:main from ajma:fix/cleanup-temp-files
@ajma (Contributor) commented Apr 7, 2026

write_packages_config() creates temp JSON files that were never deleted, so they accumulated on disk over many addArtifacts() calls. This change removes the file in a finally block after addArtifact completes.
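
The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual session.py code: `write_packages_config_sketch`, `add_pypi_artifacts_sketch`, and the `upload` callback are hypothetical stand-ins for write_packages_config() and addArtifact, and the JSON layout is assumed.

```python
import json
import os
import tempfile


def write_packages_config_sketch(packages):
    """Hypothetical stand-in: write package names to a temp JSON file, return its path."""
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as f:
        json.dump({"packages": list(packages)}, f)  # assumed layout
    return path


def add_pypi_artifacts_sketch(packages, upload):
    """Pass the config file to `upload`, then delete it whether or not upload fails."""
    config_path = write_packages_config_sketch(packages)
    try:
        upload(config_path)  # stands in for the addArtifact call
    finally:
        try:
            os.remove(config_path)
        except OSError:
            pass  # best-effort cleanup; do not mask the original error
```

Because the removal sits in the finally block, the file is deleted even when the upload step raises.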

@ajma requested review from Deependra-Patel and medb on April 7, 2026, 22:55
@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request updates the addArtifacts method in session.py to ensure that temporary configuration files created for PyPI packages are deleted after use. The reviewer suggested also removing the session-specific parent directory to prevent the accumulation of empty directories in the temporary folder.

write_packages_config() creates temp JSON files that were never deleted,
accumulating on disk over many addArtifacts() calls. Now removes the
file in a finally block after addArtifact completes.
@ajma force-pushed the fix/cleanup-temp-files branch from 48d7bea to 888c52f on April 7, 2026, 22:59
finally:
    try:
        os.remove(config_path)
        os.rmdir(os.path.dirname(config_path))
Member commented:

Let's not delete the parent directory, even if it is empty. Suppose write_packages_config is changed in the future to create the file directly under the /tmp directory (instead of the current session-specific directory); in that case we could end up deleting the tmp directory itself (if it is empty).

I believe @tim-u was working on replacing this entire logic with a direct run command call (supported in the latest Spark release), in which case these temporary files won't be needed at all.
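
To make the concern concrete, here is a small sketch of cleanup that removes only the file and leaves the parent alone. The helper names are hypothetical and a throwaway directory stands in for a shared location such as /tmp; it is not the code in this PR.

```python
import os
import tempfile


def remove_config_file_only(config_path):
    """Delete the config file but never touch its parent directory."""
    try:
        os.remove(config_path)
    except FileNotFoundError:
        pass  # already gone; nothing to clean up


def demo_parent_survives():
    """Return (parent_is_shared_dir, file_gone, parent_still_exists)."""
    # shared_dir plays the role of a shared temp location in this sketch.
    with tempfile.TemporaryDirectory() as shared_dir:
        config_path = os.path.join(shared_dir, "packages.json")
        with open(config_path, "w") as f:
            f.write("{}")
        # If the file lives directly in the shared directory, dirname() is
        # that directory, which is what an os.rmdir(dirname) call would target.
        parent_is_shared = os.path.dirname(config_path) == shared_dir
        remove_config_file_only(config_path)
        return (
            parent_is_shared,
            not os.path.exists(config_path),
            os.path.isdir(shared_dir),
        )
```

With file-only removal, the result does not depend on where write_packages_config chooses to place the file.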

2 participants