fix(deps): pin lxml >=6.1.0 — close PYSEC-2026-87 (XXE in DOCX/PPTX/EPUB/PDF parsing)#66
Open
nelsonduarte wants to merge 1 commit into
A pip-audit sweep flagged lxml 6.0.2 as vulnerable to PYSEC-2026-87. lxml is pulled in transitively by python-docx, python-pptx, ebooklib, and pikepdf, all four of which we use, so the vulnerable lxml ships inside the PDFApps .exe (confirmed via the PyInstaller xref).

The bug: lxml's two parsers, in their default configuration (resolve_entities=True), resolve external entities. A crafted DOCX, PPTX, EPUB, or a PDF with hostile XMP metadata can therefore embed references to /etc/passwd, ~/.ssh/id_rsa, etc., and the parser will inline those contents on parse: a classic XML External Entity (XXE) attack. Fixed upstream in 6.1.0.

PDFApps is exactly the kind of tool that opens untrusted files from users' machines, so the attack surface is real. Severity: medium (requires user to open a malicious file).

Fix: add an explicit `lxml>=6.1.0` line to requirements.txt. None of the four packages that pull lxml in pin it, so pip will pick whatever matches, usually the latest. The explicit pin guarantees the safe floor on every install, including the GitHub Actions release runner that produces the shipped MSIX / installer.

Other findings from the same sweep (all NOT shipping in the .exe and so no action needed):

- urllib3 2.6.3 / idna 3.11: only present as dev deps (no requests imports in our code; the "idna" entry in the PyInstaller xref is the stdlib encodings/idna module, not the third-party package).
- pip 26.0.1: build-time tool, never bundled.
- pytest 9.0.2: only used under tests/, never in runtime modules.

No version bump in this commit. The fix will ride along with the next natural release (currently v1.13.12 is in cert; tagging v1.13.13 just for this would churn the Store certification queue).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
`pip-audit` flagged lxml 6.0.2 as vulnerable to PYSEC-2026-87, an XXE (XML External Entity) bug. lxml is a transitive dependency of `python-docx`, `python-pptx`, `ebooklib`, and `pikepdf`. We use all four, so the vulnerable lxml ships inside the .exe (verified via the PyInstaller xref).
The bug
Pre-6.1.0 lxml parsers, with the default `resolve_entities=True`, resolve external entities. A crafted DOCX, PPTX, or EPUB, or a PDF with hostile XMP metadata, can embed references to `/etc/passwd`, `~/.ssh/id_rsa`, etc., and the parser will inline those contents on parse. Classic XXE.
PDFApps is a tool that opens untrusted files from users' machines, so the attack surface is real. Severity: medium (requires the user to open a malicious file).
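
For reviewers who want to see what such a payload looks like, here is a minimal, stdlib-only sketch that flags external entity declarations in the XML parts of a ZIP-based document (DOCX, PPTX, and EPUB are all ZIP containers). It is not part of this PR and is not a substitute for the lxml upgrade. The names `find_external_entities`, `build_sample`, and `XML_SUFFIXES` are hypothetical, and the byte-level regex is an assumption that only catches ASCII-compatible encodings.

```python
import io
import re
import zipfile

# Matches an external entity declaration such as
#   <!ENTITY xxe SYSTEM "file:///etc/passwd">
# Byte-level and ASCII-only: a UTF-16 encoded part would slip past it.
EXTERNAL_ENTITY = re.compile(rb"<!ENTITY\s+%?\s*[^\s>]+\s+(?:SYSTEM|PUBLIC)\b")

# Hypothetical list of member suffixes worth scanning; extend as needed.
XML_SUFFIXES = (".xml", ".rels", ".opf", ".ncx", ".xhtml")


def find_external_entities(container: bytes) -> list:
    """Return names of XML members that declare an external entity."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(container)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(XML_SUFFIXES):
                # Reads the whole member into memory; fine for a sketch,
                # but a real tool should cap the decompressed size.
                if EXTERNAL_ENTITY.search(archive.read(name)):
                    flagged.append(name)
    return flagged


def build_sample(document_xml: bytes) -> bytes:
    """Build a tiny in-memory ZIP that stands in for a DOCX."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document_xml)
    return buffer.getvalue()


# The shape of a classic XXE payload: the entity points at a local file
# and is then referenced from the document body.
HOSTILE = (
    b'<?xml version="1.0"?>\n'
    b'<!DOCTYPE doc [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>\n'
    b"<doc>&xxe;</doc>"
)
BENIGN = b'<?xml version="1.0"?>\n<doc>hello</doc>'

print(find_external_entities(build_sample(HOSTILE)))  # ['word/document.xml']
print(find_external_entities(build_sample(BENIGN)))   # []
```

A scan like this is only useful for triage, for example when checking whether a suspicious sample would have exercised the bug. The dependency floor remains the actual fix.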
Fix
Add `lxml>=6.1.0` to `requirements.txt`. None of the four upstream packages that pull lxml in pin it, so pip picks whatever version matches: usually the latest, but with no guarantee. The explicit `>=6.1.0` constraint guarantees a safe floor on every install, including the GitHub Actions release runner that produces the shipped MSIX / installer.
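
To confirm the floor actually took effect in a given environment (a dev machine or the release runner), a check along these lines could be run after install. This is a sketch, not something this PR adds: `LXML_FLOOR`, `release_tuple`, `meets_floor`, and `installed_lxml_is_safe` are hypothetical names, and the version parsing is deliberately naive (it ignores pre-release suffixes, so `packaging.version` would be the better choice where that package is available).

```python
import re
from importlib import metadata
from typing import Optional, Tuple

LXML_FLOOR = (6, 1, 0)  # first release this PR treats as fixed


def release_tuple(version: str) -> Tuple[int, ...]:
    """Leading numeric release segment of a version string, as ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_floor(version: str, floor: Tuple[int, ...] = LXML_FLOOR) -> bool:
    """True if `version` is at or above `floor` (pre-release tags ignored)."""
    parts = release_tuple(version)
    # Pad short versions so that "6.1" compares equal to (6, 1, 0).
    parts += (0,) * (len(floor) - len(parts))
    return parts >= floor


def installed_lxml_is_safe() -> Optional[bool]:
    """Check the installed lxml; None means lxml is not installed at all."""
    try:
        return meets_floor(metadata.version("lxml"))
    except metadata.PackageNotFoundError:
        return None


print(meets_floor("6.0.2"))  # False: the version pip-audit flagged
print(meets_floor("6.1.0"))  # True
```

Calling `installed_lxml_is_safe()` in a CI step and failing the build on `False` would catch a stale cached wheel before it reaches the packaged installer.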
Why no version bump
v1.13.12 is currently in Microsoft Store certification. Tagging v1.13.13 just for this would churn the cert queue. The fix rides along with the next natural release.
Other audit findings — no action needed
None of these ship in the .exe:
- urllib3 2.6.3 / idna 3.11: only present as dev deps (no `requests` imports in our code; the "idna" entry in the PyInstaller xref is the stdlib `encodings/idna` module, not the third-party package).
- pip 26.0.1: build-time tool, never bundled.
- pytest 9.0.2: only used under `tests/`, never in runtime modules.
Test plan
🤖 Generated with Claude Code