diff --git a/peps/pep-0819.rst b/peps/pep-0819.rst index 339d8cecbdb..c3f7be1a1a0 100644 --- a/peps/pep-0819.rst +++ b/peps/pep-0819.rst @@ -118,8 +118,8 @@ Specification JSON Format Core Metadata File ------------------------------ -A new optional but recommended file ``METADATA.json`` shall be introduced as a -metadata file for Python distribution packages. If generated, the ``METADATA.json`` file +A new required file ``METADATA.json`` shall be introduced as a +metadata file for Python distribution packages. The ``METADATA.json`` file MUST be placed in the same directory as the current email formatted ``METADATA`` or ``PKG-INFO`` file. @@ -200,8 +200,8 @@ encoded core metadata file MUST be served at JSON Format Wheel Metadata File ------------------------------- -A new optional but recommended file ``WHEEL.json`` shall be introduced as a -JSON encoded version of the ``WHEEL`` file. If generated, the ``WHEEL.json`` +A new required file ``WHEEL.json`` shall be introduced as a +JSON encoded version of the ``WHEEL`` file. The ``WHEEL.json`` file MUST be placed in the same directory as the current key-value formatted ``WHEEL`` file, i.e. the ``.dist-info`` directory. The semantic contents of the ``WHEEL`` and ``WHEEL.json`` files MUST be equivalent. The wheel file @@ -235,6 +235,20 @@ JSON schema for wheel metadata has been produced. This schema will be updated with each revision to the wheel metadata specification. The schema is available in :ref:`0819-wheel-json-schema`. +Handling of Duplicate Keys in JSON Package Metadata +--------------------------------------------------- + +JSON does not define semantics for duplicate keys in a JSON object, and +different parsers treat duplicate keys differently. Tools SHOULD NOT generate +duplicate keys in JSON package metadata. 
However, since duplicate keys are +still likely to appear in practice, tools consuming JSON package metadata +should handle them gracefully.
In the interest of compatibility and matching the +behavior of the Python :mod:`!json` module, if duplicate keys are encountered, +the value from the last occurrence of the key should be used. This matches the +behavior of many JSON parsers, such as those in Python, Rust, and Go, as well +as the behavior specified by the ECMAScript standard. Tools MAY warn about +duplicate keys in JSON package metadata. + Deprecation of the ``METADATA``, ``PKG-INFO``, and ``WHEEL`` Files ------------------------------------------------------------------ @@ -272,25 +286,20 @@ or ``WHEEL`` files. Security Implications ===================== -One attack vector with JSON encoded core metadata is if the JSON payload is -designed to consume excessive memory or CPU resources in a denial of service -(DoS) attack. While this attack is not likely to affect users whom can cancel -resource-intensive interactive operations, it may be an issue for package -indexes. - -There are several mitigations that can be made to prevent this: +Maliciously crafted JSON encoded metadata files have the potential to cause a +denial of service attack due to the quadratic parsing time complexity of +reading integer strings as reported in +`CVE-2020-10735 `__. No +package metadata fields are currently encoded as integers, so this risk can be +mitigated by decoding integer values as strings when parsing JSON package +metadata. -#. The length of the JSON payload can be restricted to a reasonable size. -#. The reader may use a :class:`~json.JSONDecoder` to omit parsing :class:`int` - and :class:`float` values to avoid quadratic number parsing time complexity - attacks. -#. I plan to contribute a change to :class:`~json.JSONDecoder` in Python - 3.15+ that will allow it to be configured to restrict the nesting of JSON - payloads to a reasonable depth. Core metadata currently has a maximum depth - of 2 to encode mapping and list fields.
+If using the Python :mod:`!json` module, integers can be parsed as strings +by passing :class:`str` as the ``parse_int`` keyword argument to +:func:`json.load` or :func:`json.loads`. -With these mitigations in place, concerns about denial of service attacks with -JSON encoded core metadata are minimal. +With this mitigation in place, concerns about denial of service attacks with +JSON encoded package metadata are considered minimal. Reference Implementation @@ -326,6 +335,15 @@ format, JSON has been chosen for a few reasons: #. JSON is fast to parse and emit. #. JSON schemas are JSON native and commonly used. +Make the JSON Package Metadata Files Optional +--------------------------------------------- + +A future major revision of the wheel format specification may make the +``METADATA.json`` and ``WHEEL.json`` files the default. Therefore, tools should +begin generating and consuming JSON package metadata files now, so that they +are prepared for the future transition to JSON package metadata files being +the default. + Open Issues ===========
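The duplicate key rule proposed in this change can be sketched with the ``object_pairs_hook`` parameter of the Python ``json`` module. This is a minimal, non-normative sketch: the helper names ``_last_wins_with_warning`` and ``load_metadata_last_key_wins`` are hypothetical, and emitting a ``warnings`` warning is only one possible way a tool might use the permission to warn about duplicate keys.

```python
import json
import warnings


def _last_wins_with_warning(pairs):
    """Hypothetical object_pairs_hook for JSON package metadata.

    ``pairs`` is the list of (key, value) tuples of one JSON object, in
    document order. Later values overwrite earlier ones, so the last
    occurrence of a duplicate key wins, matching plain json.loads.
    """
    result = {}
    for key, value in pairs:
        if key in result:
            # Warning is optional for tools; shown here as one possible choice.
            warnings.warn(f"duplicate key in JSON package metadata: {key!r}")
        result[key] = value
    return result


def load_metadata_last_key_wins(text):
    """Hypothetical helper: parse JSON package metadata, last duplicate wins."""
    return json.loads(text, object_pairs_hook=_last_wins_with_warning)
```

With this hook, ``{"version": "1.0", "version": "2.0"}`` decodes to ``{"version": "2.0"}`` and emits one warning; plain ``json.loads`` yields the same value without the warning.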
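The ``parse_int`` mitigation described in the Security Implications change can be illustrated as follows. This is a sketch assuming a consumer written in Python; ``load_metadata_ints_as_str`` is a hypothetical helper name. Only integer literals are affected; floating point literals are still decoded with the default ``float``.

```python
import json


def load_metadata_ints_as_str(text):
    """Hypothetical helper: parse JSON package metadata with integers as text.

    json.loads calls ``parse_int`` with the source text of every JSON integer.
    Passing ``str`` returns that text unchanged instead of calling ``int``, so
    no (potentially very long) digit string is ever converted to an integer.
    Floating point literals are still decoded with the default ``float``.
    """
    return json.loads(text, parse_int=str)
```

For example, ``load_metadata_ints_as_str('{"n": 123}')`` returns ``{"n": "123"}``, and a 10,000-digit integer literal is likewise returned as a string rather than being converted with ``int``.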