Block XML entity expansion in VRT and GDALMetadata parsers (#1579)#1584
Merged
Conversation
`xml.etree.ElementTree.fromstring` expands internal entities by default, so a crafted `.vrt` file or a hostile `GDALMetadata` tag (42112) in a TIFF could trigger billion-laughs DoS against any caller of `open_geotiff` or `read_vrt`. Route both parsers through a new `_safe_xml.safe_fromstring` helper that refuses DOCTYPE declarations (the only path to entity expansion in stdlib XML) and prefers `defusedxml` when installed. VRT and GDALMetadata payloads never carry DTDs in legitimate files, so this is loss-free for real inputs. The GDALMetadata branch falls through to an empty dict on rejection (matching the existing ParseError fallback for non-essential metadata); the VRT branch raises ValueError because the entire file is the payload. Test coverage: 9 new tests under TestVRTXMLEntityExpansion, TestGDALMetadataXMLEntityExpansion, and TestSafeXMLHelper exercise the DOCTYPE rejection path, the end-to-end `open_geotiff(.vrt)` route, and backwards-compat with well-formed inputs.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR mitigates XML entity-expansion (billion-laughs, CWE-776) denial-of-service risks in the GeoTIFF VRT and GDALMetadata parsing paths by introducing a shared “safe XML” helper and adding regression tests to ensure DOCTYPE-based payloads are rejected.
Changes:
- Add
xrspatial/geotiff/_safe_xml.safe_fromstring, rejecting<!DOCTYPE ...>and preferringdefusedxmlwhen available. - Route VRT parsing (
_vrt.parse_vrt) and GDALMetadata parsing (_geotags._parse_gdal_metadata) through the safe helper, with VRT raising and GDALMetadata dropping metadata on rejection. - Add security-focused tests covering VRT/GDALMetadata DOCTYPE rejection and the helper’s behavior.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
xrspatial/geotiff/_safe_xml.py |
Introduces centralized safe XML parsing (DOCTYPE rejection + optional defusedxml). |
xrspatial/geotiff/_vrt.py |
Switches VRT parsing to use the safe XML helper. |
xrspatial/geotiff/_geotags.py |
Switches GDALMetadata XML parsing to use the safe XML helper and drops metadata on rejection. |
xrspatial/geotiff/tests/test_security.py |
Adds regression tests ensuring DOCTYPE payloads are rejected and normal payloads still parse. |
.claude/sweep-security-state.csv |
Records the security sweep/audit status update for the geotiff module. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
8
to
+15
| import os | ||
| import xml.etree.ElementTree as ET | ||
| from dataclasses import dataclass, field | ||
|
|
||
| import numpy as np | ||
|
|
||
| from ._safe_xml import safe_fromstring | ||
|
|
- _safe_xml: decode bytes via BOM / leading-null detection before
scanning for DOCTYPE so UTF-16 / UTF-32 payloads can't bypass the
ASCII regex.
- _safe_xml: align module docstring with actual fallback
(ET.fromstring, not XMLParser).
- _vrt: drop unused `xml.etree.ElementTree as ET` import.
- test_security: tighten DOCTYPE-rejection assertion to `== {}` and
add 7 parametrized wide-encoding (UTF-16 / UTF-32 / UTF-8-BOM)
cases.
Contributor
Author
|
Thanks @copilot — all four comments were valid. Pushed 8d1a563:
Security suite: 46 passed (was 39). |
Copilot stopped work on behalf of
brendancol due to an error
May 11, 2026 14:37
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #1579.
Summary
xml.etree.ElementTree.fromstringin_vrt.parse_vrtand_geotags._parse_gdal_metadataexpanded internal XML entities by default. A crafted.vrtfile or a hostileGDALMetadatatag (42112) inside a TIFF could OOM the host via billion-laughs (CWE-776).xrspatial/geotiff/_safe_xml.safe_fromstringrefuses any input declaring a<!DOCTYPE ...>(the only path to entity expansion in stdlib XML) and prefersdefusedxmlwhen available.ParseErrorhandling). VRT raisesValueErrorbecause the entire input is the payload.Reproducer (pre-fix)
Test plan
pytest xrspatial/geotiff/tests/test_security.py -q-> 39 passed (30 baseline + 9 new).test_features.py::TestPalettefailures unrelated to this change (matplotlib deepcopy recursion on Python 3.14 / matplotlib path objects)..claude/sweep-security-state.csvupdated with the geotiff sweep row.