28 changes: 23 additions & 5 deletions Doc/library/xml.etree.elementtree.rst
@@ -711,14 +711,14 @@ Functions

 .. function:: tostring(element, encoding="us-ascii", method="xml", *, \
                        xml_declaration=None, default_namespace=None, \
-                       short_empty_elements=True)
+                       validate=False, short_empty_elements=True)

    Generates a string representation of an XML element, including all
    subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
    the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to
    generate a Unicode string (otherwise, a bytestring is generated). *method*
    is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
-   *xml_declaration*, *default_namespace* and *short_empty_elements* has the same
+   *xml_declaration*, *default_namespace*, *validate* and *short_empty_elements* have the same
    meaning as in :meth:`ElementTree.write`. Returns an (optionally) encoded string
    containing the XML data.

@@ -732,17 +732,20 @@ Functions
       The :func:`tostring` function now preserves the attribute order
       specified by the user.

+   .. versionchanged:: next
+      Added the *validate* parameter.
+

 .. function:: tostringlist(element, encoding="us-ascii", method="xml", *, \
                            xml_declaration=None, default_namespace=None, \
-                           short_empty_elements=True)
+                           validate=False, short_empty_elements=True)

    Generates a string representation of an XML element, including all
    subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
    the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to
    generate a Unicode string (otherwise, a bytestring is generated). *method*
    is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
-   *xml_declaration*, *default_namespace* and *short_empty_elements* has the same
+   *xml_declaration*, *default_namespace*, *validate* and *short_empty_elements* have the same
    meaning as in :meth:`ElementTree.write`. Returns a list of (optionally) encoded
    strings containing the XML data. It does not guarantee any specific sequence,
    except that ``b"".join(tostringlist(element)) == tostring(element)``.
@@ -759,6 +762,9 @@ Functions
       The :func:`tostringlist` function now preserves the attribute order
       specified by the user.

+   .. versionchanged:: next
+      Added the *validate* parameter.
+

 .. function:: XML(text, parser=None)

@@ -1186,7 +1192,7 @@ ElementTree Objects

    .. method:: write(file, encoding="us-ascii", xml_declaration=None, \
                      default_namespace=None, method="xml", *, \
-                     short_empty_elements=True)
+                     validate=False, short_empty_elements=True)

       Writes the element tree to a file, as XML. *file* is a file name, or a
       :term:`file object` opened for writing. *encoding* [1]_ is the output
@@ -1197,6 +1203,15 @@ ElementTree Objects
       *default_namespace* sets the default XML namespace (for "xmlns").
       *method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is
       ``"xml"``).
+
+      If *validate* is true, check that all characters are legal,
+      that element and attribute names are valid, and that the content
+      of comments, processing instructions and HTML elements
+      like ``<script>`` does not contain illegal sequences according
+      to the selected *method* (``"xml"`` or ``"html"``).
+      Raise :exc:`ValueError` if any check fails.
+      By default, or if *method* is ``"text"``, no validation is performed.
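A minimal sketch of the kind of problem the *validate* paragraph above is aimed at. It uses only calls that exist in released Python versions and deliberately does not pass `validate=True`, since that parameter is only proposed in this diff. The element names (`root`, the comment text `a--b`) are illustrative; `a--b` is taken from the tests in this pull request.

```python
import xml.etree.ElementTree as ET

# Build a tree whose comment text contains "--", which XML does not
# allow inside a comment.
root = ET.Element("root")
root.append(ET.Comment("a--b"))

# Without validation, serialization succeeds and the comment text is
# written out as given.
data = ET.tostring(root)

# The result is not well-formed XML, so parsing it back fails.
try:
    ET.fromstring(data)
except ET.ParseError as exc:
    round_trip_error = exc
else:
    round_trip_error = None

print(data)
print("round trip failed:", round_trip_error is not None)
```

According to the documentation and tests in this diff, passing `validate=True` is expected to raise `ValueError` at serialization time for such a tree, instead of silently producing output that cannot be parsed.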

       The keyword-only *short_empty_elements* parameter controls the formatting
       of elements that contain no content. If ``True`` (the default), they are
       emitted as a single self-closed tag, otherwise they are emitted as a pair
@@ -1216,6 +1231,9 @@ ElementTree Objects
          The :meth:`write` method now preserves the attribute order specified
          by the user.

+      .. versionchanged:: next
+         Added the *validate* parameter.
+

 This is the XML file that is going to be manipulated::

11 changes: 11 additions & 0 deletions Doc/whatsnew/3.15.rst
@@ -1830,6 +1830,17 @@ xml
   (Contributed by Serhiy Storchaka in :gh:`139489`.)


+xml.etree.ElementTree
+---------------------
Member

You should now move this entry to Doc/whatsnew/3.16.rst.


+* Add the *validate* option to the functions
+  :func:`~xml.etree.ElementTree.tostring` and
+  :func:`~xml.etree.ElementTree.tostringlist`, and to the
+  :meth:`ElementTree.write <xml.etree.ElementTree.ElementTree.write>` method,
+  which allows validating the element or element tree before serialization.
+  (Contributed by Serhiy Storchaka in :gh:`xxxxxx`.)
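A hedged sketch for code that has to run both on Python versions that have the new option and on versions that do not. `tostring_checked` and `HAS_VALIDATE` are hypothetical names introduced here for illustration; they are not part of `xml.etree.ElementTree` or of this pull request.

```python
import inspect
import xml.etree.ElementTree as ET

# True only on Python versions whose tostring() accepts *validate*.
HAS_VALIDATE = "validate" in inspect.signature(ET.tostring).parameters


def tostring_checked(element, **kwargs):
    """Serialize *element*, requesting validation where it is supported.

    Hypothetical helper for illustration; not part of ElementTree.
    """
    if HAS_VALIDATE:
        kwargs.setdefault("validate", True)
    return ET.tostring(element, **kwargs)


print(tostring_checked(ET.Element("item", {"id": "1"})))
```

Checking the signature, rather than catching `TypeError`, avoids confusing a missing parameter with an unrelated `TypeError` raised during serialization.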


 xml.parsers.expat
 -----------------

188 changes: 188 additions & 0 deletions Lib/test/test_xml_etree.py
@@ -1387,6 +1387,194 @@ def test_attlist_default(self):
                          {'{http://www.w3.org/XML/1998/namespace}lang': 'eng'})


+class XMLValidationTest(unittest.TestCase):
+
+    def check(self, elem, expected=None):
+        self.assertRaises(ValueError,
+                          ET.tostring, elem, validate=True)
+        ET.tostring(elem)  # no exception
+
+    def test_invalid_comment(self):
+        self.check(ET.Comment('a--b'))
+        self.check(ET.Comment(' B+, B, or B-'))
+
+    def test_invalid_processing_instruction(self):
+        self.check(ET.PI(''))
+        self.check(ET.PI('0'))
+        self.check(ET.PI('a/b'))
+        self.check(ET.PI('foo\xa0bar'))
+        self.check(ET.PI('xml'))
+        self.check(ET.PI('xml', 'encoding="UTF-8"'))
+        self.check(ET.PI('foo', 'a?>b'))
+        self.check(ET.PI('foo', '\x00'))
+        self.check(ET.PI('foo', '\ud8ff'))
+        self.check(ET.PI('foo', '\ufffe'))
+
+    def test_invalid_tag(self):
+        self.check(ET.Element(''))
+        self.check(ET.Element('0'))
+        self.check(ET.Element('a/b'))
+        self.check(ET.Element(ET.QName('')))
+        self.check(ET.Element(ET.QName('0')))
+        self.check(ET.Element(ET.QName('a/b')))
+
+    def test_invalid_attr_name(self):
+        self.check(ET.Element('tag', attrib={'': 'value'}))
+        self.check(ET.Element('tag', attrib={'0': 'value'}))
+        self.check(ET.Element('tag', attrib={'a/b': 'value'}))
+        self.check(ET.Element('tag', attrib={ET.QName(''): 'value'}))
+        self.check(ET.Element('tag', attrib={ET.QName('0'): 'value'}))
+        self.check(ET.Element('tag', attrib={ET.QName('a/b'): 'value'}))
+
+    def test_invalid_attr_value(self):
+        self.check(ET.Element('tag', attrib={'key': '\x00'}))
+        self.check(ET.Element('tag', attrib={'key': '\ud8ff'}))
+        self.check(ET.Element('tag', attrib={'key': '\ufffe'}))
+        self.check(ET.Element('tag', attrib={'key': ET.QName('\x00')}))
+        self.check(ET.Element('tag', attrib={'key': ET.QName('\ud8ff')}))
+        self.check(ET.Element('tag', attrib={'key': ET.QName('\ufffe')}))
Comment on lines +1429 to +1435
Member

This and several other methods could use subTests if you think it's an improvement, e.g.:

Suggested change
-    def test_invalid_attr_value(self):
-        self.check(ET.Element('tag', attrib={'key': '\x00'}))
-        self.check(ET.Element('tag', attrib={'key': '\ud8ff'}))
-        self.check(ET.Element('tag', attrib={'key': '\ufffe'}))
-        self.check(ET.Element('tag', attrib={'key': ET.QName('\x00')}))
-        self.check(ET.Element('tag', attrib={'key': ET.QName('\ud8ff')}))
-        self.check(ET.Element('tag', attrib={'key': ET.QName('\ufffe')}))
+    @support.subTests('value', ('\x00', '\ud8ff', '\ufffe'))
+    def test_invalid_attr_value(self, value):
+        self.check(ET.Element('tag', attrib={'key': value}))
+        self.check(ET.Element('tag', attrib={'key': ET.QName(value)}))

Member Author

I'll think about it.

The drawback is that this can make the tests less flexible: it becomes difficult to add different assertions to the same method. We also cannot use ET outside the method body, because it is only defined when the tests run.
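One possible middle ground, sketched here and not proposed by either participant in the thread: the standard `unittest.TestCase.subTest()` context manager keeps the loop inside the method body, so `ET` is only referenced at run time and further assertions can still be added to the same method. The class name `AttrValueValidationSketch` and the `HAS_VALIDATE` guard are assumptions made so that the sketch also runs on Python versions without the *validate* parameter.

```python
import inspect
import unittest
import xml.etree.ElementTree as ET

# Assumption for this sketch: only assert on validate=True where the
# parameter actually exists.
HAS_VALIDATE = "validate" in inspect.signature(ET.tostring).parameters


class AttrValueValidationSketch(unittest.TestCase):

    def check(self, elem):
        if HAS_VALIDATE:
            self.assertRaises(ValueError,
                              ET.tostring, elem, validate=True)
        ET.tostring(elem)  # no exception without validation

    def test_invalid_attr_value(self):
        for value in ('\x00', '\ud8ff', '\ufffe'):
            # Each value is reported separately if it fails.
            with self.subTest(value=value):
                self.check(ET.Element('tag', attrib={'key': value}))
                self.check(ET.Element('tag',
                                      attrib={'key': ET.QName(value)}))
```

The trade-off compared with the decorator form is one extra level of indentation in the test body.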


+    def test_invalid_text(self):
+        elem = ET.Element('tag')
+        elem.text = '\x00'
+        self.check(elem)
+        elem.text = '\ud8ff'
+        self.check(elem)
+        elem.text = '\ufffe'
+        self.check(elem)
+
+    def test_invalid_tail(self):
+        elem = ET.Element('tag')
+        elem.tail = '\x00'
+        self.check(elem)
+        elem.tail = '\ud8ff'
+        self.check(elem)
+        elem.tail = '\ufffe'
+        self.check(elem)
+
+    def test_invalid_text_without_tag(self):
+        elem = ET.Element(None)
+        elem.text = '\x00'
+        self.check(elem)
+        elem.text = '\ud8ff'
+        self.check(elem)
+        elem.text = '\ufffe'
+        self.check(elem)
+
+    def test_invalid_subelements(self):
+        elem = ET.Element('tag')
+        subelem = ET.SubElement(elem, 'subtag')
+        ET.SubElement(subelem, '\x00')
+        self.check(elem)
+        elem.tag = None
+        self.check(elem)
+
+    def test_invalid_namespace_uri(self):
+        self.check(ET.Element('{\x00}tag'))
+        self.check(ET.Element('{\ud8ff}tag'))
+        self.check(ET.Element('{\ufffe}tag'))
+        self.check(ET.Element(ET.QName('\x00', 'tag')))
+        self.check(ET.Element(ET.QName('\ud8ff', 'tag')))
+        self.check(ET.Element(ET.QName('\ufffe', 'tag')))
+
+
+class HTMLValidationTest(unittest.TestCase):

+    def check(self, elem, expected=None):
+        self.assertRaises(ValueError,
+                          ET.tostring, elem, method='html', validate=True)
+        ET.tostring(elem, method='html')  # no exception
+
+    def test_invalid_comment(self):
+        self.check(ET.Comment('>'))
+        self.check(ET.Comment('->'))
+        self.check(ET.Comment('a-->b'))
+        self.check(ET.Comment('a--!>b'))
+        self.check(ET.Comment('a\x00b'))
+
+    def test_invalid_processing_instruction(self):
+        self.check(ET.PI('a>b'))
+        self.check(ET.PI('a\x00b'))
+
+    def test_invalid_tag(self):
+        self.check(ET.Element(''))
+        self.check(ET.Element('?'))
+        self.check(ET.Element('!'))
+        self.check(ET.Element('0'))
+        self.check(ET.Element(' a'))
+        self.check(ET.Element('a b'))
+        self.check(ET.Element('a\nb'))
+        self.check(ET.Element('a/b'))
+        self.check(ET.Element('a>b'))
+        self.check(ET.Element('a\x00b'))
+        self.check(ET.Element(ET.QName('')))
+        self.check(ET.Element(ET.QName('0')))
+        self.check(ET.Element(ET.QName('a/b')))
+
+    def test_invalid_attr_name(self):
+        self.check(ET.Element('tag', attrib={'': 'value'}))
+        self.check(ET.Element('tag', attrib={'a/b': 'value'}))
+        self.check(ET.Element('tag', attrib={'a=b': 'value'}))
+        self.check(ET.Element('tag', attrib={ET.QName(''): 'value'}))
+        self.check(ET.Element('tag', attrib={ET.QName('a/b'): 'value'}))
+
+    def test_invalid_attr_value(self):
+        self.check(ET.Element('tag', attrib={'key': '\x00'}))
+        self.check(ET.Element('tag', attrib={'key': ET.QName('\x00')}))
+        self.check(ET.Element('tag', attrib={'key': ET.QName('a"b')}))
+        self.check(ET.Element('tag', attrib={'key': ET.QName('a&b')}))
+
+    def test_invalid_text(self):
+        elem = ET.Element('tag')
+        elem.text = '\x00'
+        self.check(elem)
+
+    def test_invalid_tail(self):
+        elem = ET.Element('tag')
+        elem.tail = '\x00'
+        self.check(elem)
+
+    def test_invalid_text_without_tag(self):
+        elem = ET.Element(None)
+        elem.text = '\x00'
+        self.check(elem)
+
+    def test_invalid_subelements(self):
+        elem = ET.Element('tag')
+        subelem = ET.SubElement(elem, 'subtag')
+        ET.SubElement(subelem, '\x00')
+        self.check(elem)
+        elem.tag = None
+        self.check(elem)
+
+    def test_invalid_namespace_uri(self):
+        self.check(ET.Element('{\x00}tag'))
+        self.check(ET.Element(ET.QName('\x00', 'tag')))
+
+    @support.subTests('tag', ("script", "style", "xmp", "iframe", "noembed", "noframes"))
+    def test_invalid_cdata_content(self, tag):
+        elem = ET.Element(tag.upper())
+        elem.text = 'a</%s>b' % tag.title()
+        self.check(elem)
+        elem.text = 'a</%s b' % tag.title()
+        self.check(elem)
+        elem.text = 'a</%s/b' % tag.title()
+        self.check(elem)
+        elem.text = 'a\x00b'
+        self.check(elem)
+
+    @support.subTests('tag', ("script", "style", "xmp", "iframe", "noembed", "noframes"))
+    def test_cdata_subelements(self, tag):
+        elem = ET.Element(tag)
+        ET.SubElement(elem, 'subtag')
+        self.check(elem)
+
+    def test_invalid_plaintext_content(self):
+        elem = ET.Element('plaintext')
+        elem.text = 'a\x00b'
+        self.check(elem)
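A small sketch of why the raw-text cases above matter, using only the existing non-validating serializer. As far as current CPython releases behave, the HTML serializer writes `<script>` text without escaping, so a closing-tag sequence in the text ends the element early in the output, while ordinary elements get their text escaped. The element names and text are illustrative.

```python
import xml.etree.ElementTree as ET

# Text inside <script> is written as-is by the HTML serializer.
script = ET.Element("script")
script.text = "a</script>b"
script_out = ET.tostring(script, method="html")

# Text inside an ordinary element is escaped instead.
para = ET.Element("p")
para.text = "a</p>b"
para_out = ET.tostring(para, method="html")

print(script_out)
print(para_out)
print("closing tags in script output:", script_out.count(b"</script>"))
print("closing tags in p output:", para_out.count(b"</p>"))
```

With `validate=True`, the tests in this diff expect `ValueError` for the `<script>` case rather than output that an HTML parser would read differently from the tree it came from.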


 class IterparseTest(unittest.TestCase):
     # Test iterparse interface.
