Fix: Update ZipConverter docstring and optimize memory handling for large entries#2079
Fix: Update ZipConverter docstring and optimize memory handling for large entries#2079KhudaBuxMagsi wants to merge 1 commit into
Conversation
|
@microsoft-github-policy-service agree |
|
Looks good to me. |
|
Good improvement. Switching from extracting all files at once to streaming them one at a time is the right call for memory efficiency, especially with large ZIPs. A few observations:
LGTM, nice cleanup. |
|
Good memory optimization for the ZipConverter. Processing files in-memory via A few observations:
Minor: there's a trailing whitespace on the empty line after the directory check (line with just spaces in the Otherwise, solid improvement. Approve. |
noezhiya-dot
left a comment
There was a problem hiding this comment.
Code Review: Fix: Update ZipConverter docstring and optimize memory handling for large entries
I have concerns about this PR.
Issue: The memory optimization claim is misleading
The PR title and comment say "optimize memory handling" and "prevent high memory usage (OOM) on large files", but the code still calls member_file.read() which reads the entire file contents into memory. Then io.BytesIO(member_file.read()) wraps it in another buffer. This does NOT reduce memory usage compared to the original zipObj.read(name) — both approaches load the full file into memory.
The actual changes are:
- Docstring update — accurate, the old docstring mentioned temp directories which no longer applies.
- Directory skip — the if name.endswith('/'): continue addition is a good fix.
- Using zipObj.open() instead of zipObj.read() — functionally equivalent for memory since .read() is called immediately.
Suggestions
- Remove the misleading "prevent high memory usage (OOM)" comment. The directory skip is the real fix.
- If the goal is truly to handle large files, the converter would need to process chunks or use temp files.
The directory skip fix is good. The docstring update is fine. But the PR should not claim memory optimization that is not implemented.
Docstring Correction: Updated the class docstring to accurately reflect that files are processed in-memory rather than extracted to a temporary directory on disk.
Memory Optimization: Safely structured the file reading process inside the zipObj loop using zipObj.open to ensure better stream handling, reducing the risk of unexpected memory usage spikes.