fix(deps): pin lxml >=6.1.0 — close PYSEC-2026-87 (XXE in DOCX/PPTX/EPUB/PDF parsing)#66

Open: nelsonduarte wants to merge 1 commit into main from fix/lxml-xxe-pysec-2026-87

@nelsonduarte (Owner)

Summary

`pip-audit` flagged lxml 6.0.2 as vulnerable to PYSEC-2026-87, an XXE (XML External Entity) bug. lxml is a transitive dependency of `python-docx`, `python-pptx`, `ebooklib`, and `pikepdf`. We use all four, so the vulnerable lxml ships inside the .exe (verified via the PyInstaller xref).

The bug

Before 6.1.0, lxml parsers left at their default `resolve_entities=True` resolve external entities. A crafted DOCX, PPTX, EPUB, or PDF XMP packet can embed entity references to `/etc/passwd`, `~/.ssh/id_rsa`, etc., and the parser will inline those files' contents into the parsed document. Classic XXE.
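
For illustration only, here is a minimal sketch of what such a payload looks like and how entity declarations can be detected with the standard library's expat bindings. This is not what this PR changes and not how lxml handles the issue internally; the helper `declares_entities` and the `HOSTILE` / `BENIGN` samples are hypothetical names made up for this example.

```python
import xml.parsers.expat


class EntityDeclared(ValueError):
    """Raised when a document declares any entity in its DTD."""


def declares_entities(xml_bytes: bytes) -> bool:
    """Return True if the XML declares an entity (internal or external).

    Malformed XML still raises xml.parsers.expat.ExpatError.
    """

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        # Abort on the first declaration, before any reference is expanded.
        raise EntityDeclared(name)

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    try:
        parser.Parse(xml_bytes, True)
    except EntityDeclared:
        return True
    return False


# Hypothetical payload of the shape described above: an external entity
# pointing at a local file, referenced from the document body.
HOSTILE = b"""<?xml version="1.0"?>
<!DOCTYPE doc [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<doc>&xxe;</doc>"""

BENIGN = b"<doc>hello</doc>"
```

A check like this rejects every entity declaration, which is stricter than the upstream fix and would also refuse documents that use harmless internal entities.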

PDFApps is a tool that opens untrusted files from users' machines, so the attack surface is real. Severity: medium (the user has to open a malicious file).

Fix

Add `lxml>=6.1.0` to `requirements.txt`. None of the four upstream packages that pull in lxml pin it, so pip installs whatever version satisfies their requirements: usually the latest, but with no guarantee. The explicit lower bound guarantees a safe floor on every install, including the GitHub Actions release runner.
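
As a rough sketch of what the floor means in practice, the check below compares a version string against 6.1.0. The names `SAFE_FLOOR`, `release_tuple`, and `meets_floor` are hypothetical and not part of PDFApps. It only looks at the leading numeric release segment, so a pre-release such as `6.1.0rc1` is treated like `6.1.0`, which pip itself would not do.

```python
import re

SAFE_FLOOR = (6, 1, 0)  # mirrors the `lxml>=6.1.0` requirement


def release_tuple(version: str) -> tuple[int, ...]:
    """Extract the leading numeric release, e.g. '6.0.2' -> (6, 0, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_floor(version: str, floor: tuple[int, ...] = SAFE_FLOOR) -> bool:
    """Return True if `version` is at or above `floor`."""
    parts = release_tuple(version)
    # Pad with zeros so that '6.1' compares equal to '6.1.0'.
    width = max(len(parts), len(floor))
    padded_parts = parts + (0,) * (width - len(parts))
    padded_floor = floor + (0,) * (width - len(floor))
    return padded_parts >= padded_floor
```

In a build environment, the installed version could be fed in with `importlib.metadata.version("lxml")`, for example `meets_floor(importlib.metadata.version("lxml"))`.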

Why no version bump

v1.13.12 is currently in Microsoft Store certification. Tagging v1.13.13 just for this would churn the cert queue. The fix rides along with the next natural release.

Other audit findings — no action needed

| Pkg | CVE | Why we don't ship vulnerable code |
| --- | --- | --- |
| urllib3 2.6.3 | PYSEC-2026-141/142 | We don't import `requests` or `urllib3` (verified: 0 hits in app/, pdfapps.py, installer.py, uninstaller.py). |
| idna 3.11 | CVE-2026-45409 | "idna.py" in the PyInstaller xref is stdlib `encodings/idna`, not the third-party `idna` package. |
| pip 26.0.1 | CVE-2026-3219, 6357 | Build-time tool, never bundled. |
| pytest 9.0.2 | CVE-2025-71176 | Only used under `tests/`, never imported at runtime. |
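
The "0 hits" claim for `requests` / `urllib3` could be re-checked with something like the sketch below. It is an assumption about how such a check might look, not the method actually used for this PR; `BANNED`, `banned_imports`, and `scan` are hypothetical names. It only sees static `import` statements, so imports done through `importlib.import_module` or `__import__` would be missed.

```python
import ast
from pathlib import Path

BANNED = {"requests", "urllib3"}


def banned_imports(source: str, banned: set[str] = BANNED) -> list[str]:
    """Return the banned modules that `source` imports statically."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Absolute `from x import y`; relative imports are skipped.
            names = [node.module]
        else:
            continue
        hits.extend(name for name in names if name.split(".")[0] in banned)
    return hits


def scan(paths) -> dict[str, list[str]]:
    """Map each file path to its banned imports, omitting clean files."""
    report = {}
    for path in paths:
        found = banned_imports(Path(path).read_text(encoding="utf-8"))
        if found:
            report[str(path)] = found
    return report
```

Run over the runtime sources, for example `scan(Path("app").rglob("*.py"))`, an empty dict would correspond to the 0 hits reported in the table.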

Test plan

  • CI green (CodeQL + GitGuardian)
  • After merge, the next release build picks up lxml 6.1.x automatically — verify by running `pip-audit` on the v1.13.13 venv when we tag.

🤖 Generated with Claude Code

A pip-audit sweep flagged lxml 6.0.2 as vulnerable to PYSEC-2026-87.
lxml is pulled in transitively by python-docx, python-pptx, ebooklib,
and pikepdf — all four of which we use, so the vulnerable lxml ships
inside the PDFApps .exe (confirmed via the PyInstaller xref).

The bug: before 6.1.0, lxml's parsers in their default configuration
(resolve_entities=True) resolve external entities. A crafted DOCX,
PPTX, EPUB, or a PDF with hostile XMP metadata can therefore embed
references to /etc/passwd, ~/.ssh/id_rsa, etc., and the parser will
inline those contents on parse: a classic XML External Entity (XXE)
attack. Fixed upstream in 6.1.0.

PDFApps is exactly the kind of tool that opens untrusted files from
users' machines, so the attack surface is real. Severity: medium
(requires user to open a malicious file).

Fix: add an explicit `lxml>=6.1.0` line to requirements.txt. None of
the four packages that pull lxml in pin it, so pip will pick whatever
matches — usually the latest. The explicit pin guarantees the safe
floor on every install, including the GitHub Actions release runner
that produces the shipped MSIX / installer.

Other findings from the same sweep (all NOT shipping in the .exe and
so no action needed):
- urllib3 2.6.3 / idna 3.11 — only present as dev deps (no requests
  imports in our code; "idna" entry in the PyInstaller xref is the
  stdlib encodings/idna module, not the third-party package).
- pip 26.0.1 — build-time tool, never bundled.
- pytest 9.0.2 — only used under tests/, never in runtime modules.

No version bump in this commit. The fix will ride along with the next
natural release (currently v1.13.12 is in cert; tagging v1.13.13 just
for this would churn the Store certification queue).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>