fix: Clean up temporary PyPI artifact files after use #185
ajma wants to merge 1 commit into GoogleCloudDataproc:main
Code Review
This pull request updates the addArtifacts method in session.py to ensure that temporary configuration files created for PyPI packages are deleted after use. The reviewer suggested also removing the session-specific parent directory to prevent the accumulation of empty directories in the temporary folder.
write_packages_config() creates temporary JSON files that were never deleted, so they accumulated on disk over many addArtifacts() calls. This change removes the file in a finally block after addArtifact completes.
48d7bea to 888c52f
    finally:
        try:
            os.remove(config_path)
            os.rmdir(os.path.dirname(config_path))
Let's not delete the parent directory, even if it is empty. Suppose that in the future write_packages_config creates the file directly under the /tmp directory (instead of the current session-specific ones); in that case we could end up deleting the /tmp directory itself (if it is empty).
I believe @tim-u was working on replacing this entire logic with a direct run command call (supported in the latest Spark release); in that case these temporary files won't be needed at all.
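For illustration, here is a minimal sketch of the file-only cleanup suggested in this comment. It is not the code in session.py: `write_packages_config_stub`, `add_pypi_artifacts`, and the `upload` callback are hypothetical stand-ins, and the session-specific temp directory layout is an assumption.

```python
import json
import os
import tempfile


def write_packages_config_stub(packages):
    """Hypothetical stand-in for write_packages_config(): writes a JSON file
    into a fresh session-specific temp directory and returns its path."""
    session_dir = tempfile.mkdtemp(prefix="session-")
    config_path = os.path.join(session_dir, "packages.json")
    with open(config_path, "w") as f:
        json.dump({"packages": list(packages)}, f)
    return config_path


def add_pypi_artifacts(packages, upload):
    """Write the config, hand it to `upload`, then delete only the file."""
    config_path = write_packages_config_stub(packages)
    try:
        upload(config_path)  # stands in for the addArtifact call
    finally:
        try:
            # Remove the file only; the parent directory is left untouched,
            # so a future change of location (e.g. /tmp) stays safe.
            os.remove(config_path)
        except OSError:
            pass  # best-effort cleanup: do not mask an error from upload()
    return config_path
```

Because the cleanup never calls `os.rmdir`, it behaves the same whether the file lives in a session-specific directory or directly under a shared one.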