1616This module provides functions for encoding binary data to printable
1717ASCII characters and decoding such encodings back to binary data.
1818This includes the :ref:`encodings specified in <base64-rfc-4648>`
19- :rfc:`4648` (Base64, Base32 and Base16)
20- and the non-standard :ref:`Base85 encodings <base64-base-85>`.
19+ :rfc:`4648` (Base64, Base32 and Base16), the :ref:`Base85 encoding
20+ <base64-base-85>` specified in `PDF 2.0
21+ <https://pdfa.org/resource/iso-32000-2/>`_, and non-standard variants
22+ of Base85 used elsewhere.
2123
2224There are two interfaces provided by this module. The modern interface
2325supports encoding :term:`bytes-like objects <bytes-like object>` to ASCII
@@ -284,19 +286,28 @@ POST request.
284286Base85 Encodings
285287-----------------
286288
287- Base85 encoding is not formally specified but rather a de facto standard,
288- thus different systems perform the encoding differently.
289+ Base85 encoding is a family of algorithms which represent four bytes
290+ using five ASCII characters. Originally implemented in the Unix
291+ ``btoa(1)`` utility, a version of it was later adopted by Adobe in the
292+ PostScript language and is standardized in PDF 2.0 (ISO 32000-2).
293+ This version, in both its ``btoa`` and PDF variants, is implemented by
294+ :func:`a85encode`.
289295
290- The :func:`a85encode` and :func:`b85encode` functions in this module are two implementations of
291- the de facto standard. You should call the function with the Base85
292- implementation used by the software you intend to work with.
296+ A separate version, using a different output character set, was
297+ defined as an April Fool's joke in :rfc:`1924` but is now used by Git
298+ and other software. This version is implemented by :func:`b85encode`.
293299
294- The two functions present in this module differ in how they handle the following:
300+ Finally, a third version, using yet another output character set
301+ designed for safe inclusion in programming language strings, is
302+ defined by ZeroMQ and implemented here by :func:`z85encode`.
295303
296- * Whether to include enclosing ``<~`` and ``~>`` markers
297- * Whether to include newline characters
298- * The set of ASCII characters used for encoding
299- * Handling of null bytes
304+ The functions present in this module differ in how they handle the following:
305+
306+ * Whether to include and expect enclosing ``<~`` and ``~>`` markers.
307+ * Whether to fold the input into multiple lines.
308+ * The set of ASCII characters used for encoding.
309+ * Compact encodings of sequences of spaces and null bytes.
310+ * The encoding of zero-padding bytes applied to the input.
300311
301312Refer to the documentation of the individual functions for more information.
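
As a rough illustration of the compact encodings and character sets listed above, the following sketch encodes the same four-byte groups with :func:`a85encode` and :func:`b85encode`. The variable names are chosen for illustration only.

```python
import base64

zeros = b"\x00" * 4
spaces = b" " * 4

# Ascii85 has a one-character shorthand for a group of four null bytes ...
a85_zeros = base64.a85encode(zeros)
# ... and, only when foldspaces is set, one for a group of four spaces.
a85_spaces = base64.a85encode(spaces, foldspaces=True)
# The RFC 1924 variant has no shorthand: the group is written out in full,
# using its own character set (which starts with the digit "0").
b85_zeros = base64.b85encode(zeros)

print(a85_zeros, a85_spaces, b85_zeros)  # b'z' b'y' b'00000'
```

Output produced by one function cannot in general be decoded by the other, because the character sets differ.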
302313
@@ -307,18 +318,22 @@ Refer to the documentation of the individual functions for more information.
307318
308320 *foldspaces* is an optional flag that uses the special short sequence 'y'
309321 instead of 4 consecutive spaces (ASCII 0x20) as supported by 'btoa'. This
310- feature is not supported by the "standard" Ascii85 encoding.
321+ feature is not supported by the standard encoding used in PDF.
311322
312323 If *wrapcol* is non-zero, insert a newline (``b'\n'``) character
313324 after at most every *wrapcol* characters.
314325 If *wrapcol* is zero (default), do not insert any newlines.
315326
316- If *pad* is true, the input is padded with ``b'\0'`` so its length is a
317- multiple of 4 bytes before encoding.
318- Note that the ``btoa`` implementation always pads.
327+ *pad* controls whether zero-padding applied to the end of the input
328+ is fully retained in the output encoding, as done by ``btoa``,
329+ producing an exact multiple of 5 bytes of output. This is not part
330+ of the standard encoding used in PDF, as it does not preserve the
331+ length of the data.
319332
320- *adobe* controls whether the encoded byte sequence is framed with ``<~``
321- and ``~>``, which is used by the Adobe implementation.
333+ *adobe* controls whether the encoded byte sequence is framed with
334+ ``<~`` and ``~>``, as in a PostScript base-85 string literal. Note
335+ that while ASCII85Decode streams in PDF documents *must* be
336+ terminated with ``~>``, they *must not* use a leading ``<~``.
322337
323338 .. versionadded:: 3.4
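
A minimal sketch of how the *wrapcol*, *adobe* and *pad* options described above affect the output of :func:`a85encode`. The input data and variable names are made up for the example.

```python
import base64

data = bytes(range(1, 101))  # 100 arbitrary non-zero bytes

# Wrap the output so that no line is longer than 20 characters.
wrapped = base64.a85encode(data, wrapcol=20)
lines = wrapped.split(b"\n")

# Frame the output with <~ and ~>, as in a PostScript string literal.
framed = base64.a85encode(b"hello", adobe=True)

# Retain the zero-padding: 2 input bytes still yield a full
# 5-character group, so decoding returns 4 bytes instead of 2.
padded = base64.a85encode(b"ab", pad=True)
decoded_padded = base64.a85decode(padded)
```

Newlines are among the characters that :func:`a85decode` ignores by default, so the wrapped output decodes back to the original data without further processing.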
324339
@@ -330,10 +345,12 @@ Refer to the documentation of the individual functions for more information.
330345
331346 *foldspaces* is a flag that specifies whether the 'y' short sequence
332347 should be accepted as shorthand for 4 consecutive spaces (ASCII 0x20).
333- This feature is not supported by the "standard" Ascii85 encoding.
348+ This feature is not supported by the standard Ascii85 encoding used in
349+ PDF and PostScript.
334350
335- *adobe* controls whether the input sequence is in Adobe Ascii85 format
336- (i.e. is framed with <~ and ~>).
351+ *adobe* controls whether the ``<~`` and ``~>`` markers are
352+ present. While the leading ``<~`` is not required, the input must
353+ end with ``~>``, or a :exc:`ValueError` is raised.
337354
338355 *ignorechars* should be a :term:`bytes-like object` containing characters
339356 to ignore from the input.
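
The handling of the markers when *adobe* is true can be sketched as follows. The variable names are illustrative only.

```python
import base64

framed = base64.a85encode(b"hello", adobe=True)  # b"<~" + body + b"~>"
body_only = framed[2:]                            # leading "<~" dropped

# Both forms are accepted, since only the trailing "~>" is required.
decoded_framed = base64.a85decode(framed, adobe=True)
decoded_body = base64.a85decode(body_only, adobe=True)

# Without the trailing "~>", the input is rejected.
try:
    base64.a85decode(framed[:-2], adobe=True)
    missing_end_rejected = False
except ValueError:
    missing_end_rejected = True
```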
@@ -356,8 +373,11 @@ Refer to the documentation of the individual functions for more information.
356373 Encode the :term:`bytes-like object` *b* using base85 (as used in e.g.
357374 git-style binary diffs) and return the encoded :class:`bytes`.
358375
359- If *pad* is true, the input is padded with ``b'\0'`` so its length is a
360- multiple of 4 bytes before encoding.
376+ The input is padded with ``b'\0'`` so its length is a multiple of 4
377+ bytes before encoding. If *pad* is true, all the resulting
378+ characters are retained in the output, which will always be a
379+ multiple of 5 bytes, and thus the length of the data may not be
380+ preserved on decoding.
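
The effect of *pad* on length preservation can be seen in a short sketch, assuming a two-byte input chosen for illustration:

```python
import base64

data = b"ab"  # 2 bytes: not a multiple of 4

unpadded = base64.b85encode(data)          # partial output group is trimmed
padded = base64.b85encode(data, pad=True)  # full 5-character group is kept

roundtrip_unpadded = base64.b85decode(unpadded)
roundtrip_padded = base64.b85decode(padded)

print(len(unpadded), len(padded))            # 3 5
print(roundtrip_unpadded, roundtrip_padded)  # b'ab' b'ab\x00\x00'
```

With *pad* true the decoder cannot tell the padding from real trailing null bytes, so the caller has to record the original length separately if it matters.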
361381
362382 If *wrapcol* is non-zero, insert a newline (``b'\n'``) character
363383 after at most every *wrapcol* characters.
@@ -372,8 +392,7 @@ Refer to the documentation of the individual functions for more information.
372392.. function:: b85decode(b, *, ignorechars=b'', canonical=False)
373393
374394 Decode the base85-encoded :term:`bytes-like object` or ASCII string *b* and
375- return the decoded :class:`bytes`. Padding is implicitly removed, if
376- necessary.
395+ return the decoded :class:`bytes`.
377396
378397 *ignorechars* should be a :term:`bytes-like object` containing characters
379398 to ignore from the input.
@@ -392,11 +411,12 @@ Refer to the documentation of the individual functions for more information.
392411.. function:: z85encode(s, pad=False, *, wrapcol=0)
393412
394413 Encode the :term:`bytes-like object` *s* using Z85 (as used in ZeroMQ)
395- and return the encoded :class:`bytes`. See `Z85 specification
396- <https://rfc.zeromq.org/spec/32/>`_ for more information.
414+ and return the encoded :class:`bytes`.
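
As a hedged sketch, the eight-byte test vector given in the ZeroMQ RFC 32 text can be used to check the Z85 character set. The ``getattr`` guard is an assumption made for this example, since :func:`z85encode` is only present on newer Python versions.

```python
import base64

# Test vector from the ZeroMQ RFC 32 text: these 8 bytes encode to "HelloWorld".
frame = bytes([0x86, 0x4F, 0xD2, 0x6F, 0xB5, 0x59, 0xF7, 0x5B])

# z85encode() is missing on older Python versions (assumed here to be
# anything before 3.13), so fall back to None when it is not available.
z85encode = getattr(base64, "z85encode", None)
encoded = z85encode(frame) if z85encode is not None else None

print(encoded)
```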
397415
398- If *pad* is true, the input is padded with ``b'\0'`` so its length is a
399- multiple of 4 bytes before encoding.
416+ The input is padded with ``b'\0'`` so its length is a multiple of 4
417+ bytes before encoding. If *pad* is true, all the resulting
418+ characters are retained in the output, which will always be a
419+ multiple of 5 bytes, as required by the ZeroMQ standard.
400420
401421 If *wrapcol* is non-zero, insert a newline (``b'\n'``) character
402422 after at most every *wrapcol* characters.
@@ -414,8 +434,7 @@ Refer to the documentation of the individual functions for more information.
414434.. function:: z85decode(s, *, ignorechars=b'', canonical=False)
415435
416436 Decode the Z85-encoded :term:`bytes-like object` or ASCII string *s* and
417- return the decoded :class:`bytes`. See `Z85 specification
418- <https://rfc.zeromq.org/spec/32/>`_ for more information.
437+ return the decoded :class:`bytes`.
419438
420439 *ignorechars* should be a :term:`bytes-like object` containing characters
421440 to ignore from the input.
@@ -499,3 +518,11 @@ recommended to review the security section for any code deployed to production.
499518 Section 5.2, "Base64 Content-Transfer-Encoding," provides the definition of the
500519 base64 encoding.
501520
521+ `ISO 32000-2 Portable document format - Part 2: PDF 2.0 <https://pdfa.org/resource/iso-32000-2/>`_
522+ Section 7.4.3, "ASCII85Decode Filter," provides the definition
523+ of the Ascii85 encoding used in PDF and PostScript, including
524+ the output character set and the details of data length preservation
525+ using zero-padding and partial output groups.
526+
527+ `ZeroMQ RFC 32/Z85 <https://rfc.zeromq.org/spec/32/>`_
528+ The "Formal Specification" section provides the character set used in Z85.