pdf_rewrite_images() segfaults with shared image xrefs across many pages (buffer overflow) #4918

### Description of the bug

Calling `doc.rewrite_images()` on a PDF in which the same image xref is referenced from many pages causes a segmentation fault, apparently due to a buffer overflow in MuPDF's underlying C function `pdf_rewrite_images`.

The attached PDF has about 99 image references in total across 39 pages, with a single image xref shared by multiple pages. This appears to overflow an internal MuPDF buffer. The crash is deterministic: it occurs on every run.
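To check whether another file matches this description, the image references can be tallied per xref. The following is a minimal sketch and not part of the original report: `count_image_refs` and `report` are hypothetical helper names, and the tally relies only on `page.get_images(full=True)`, whose tuples start with the image xref.

```python
from collections import Counter


def count_image_refs(doc):
    """Return (total image references, Counter of xref -> pages using it)."""
    total = 0
    pages_per_xref = Counter()
    for page in doc:
        images = page.get_images(full=True)  # each tuple starts with the xref
        total += len(images)
        # Count an xref once per page, even if the page lists it twice.
        for xref in {img[0] for img in images}:
            pages_per_xref[xref] += 1
    return total, pages_per_xref


def report(pdf_path):
    """Print the totals for one file (hypothetical convenience wrapper)."""
    import pymupdf  # imported lazily so the counting logic has no hard dependency

    with pymupdf.open(pdf_path) as doc:
        total, pages_per_xref = count_image_refs(doc)
        print(f"{total} image references on {doc.page_count} pages")
        for xref, n_pages in pages_per_xref.most_common(5):
            print(f"xref {xref}: used on {n_pages} pages")
```

Calling `report("shared_image_xref.pdf")` should list the most widely shared xref first, since `most_common` sorts by page count.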

### How to reproduce the bug

```python
import pymupdf
# Open any PDF where the same image xref is shared across many pages
# (e.g., a logo or watermark repeated on every page)
# The test PDF has ~99 image references across 39 pages.
doc = pymupdf.open("shared_image_xref.pdf")
# This segfaults:
doc.rewrite_images(dpi_threshold=150, dpi_target=100, quality=50)
```

Expected behavior: Images are rewritten/compressed without crashing.
Actual behavior: Segmentation fault (SIGSEGV) / memory corruption.
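
Because the crash takes the whole interpreter down, it can help to run the reproduction in a child process and inspect its exit status. This is a sketch under assumptions: `run_isolated`, `describe_exit`, and `rewrite_in_child` are hypothetical helpers, and the negative return code convention applies to POSIX systems such as macOS and Linux.

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout=120):
    """Run Python source in a child interpreter and return its exit code."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        timeout=timeout,
    )
    return proc.returncode


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable string."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a child killed by a signal reports the negated signal number.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def rewrite_in_child(pdf_path):
    """Run the failing call from this report in a separate process."""
    code = (
        "import pymupdf\n"
        f"doc = pymupdf.open({pdf_path!r})\n"
        "doc.rewrite_images(dpi_threshold=150, dpi_target=100, quality=50)\n"
    )
    return describe_exit(run_isolated(code))
```

With the crash described here, `rewrite_in_child("shared_image_xref.pdf")` would be expected to return a "killed by ..." string instead of terminating the calling script.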

### Workaround

I currently bypass `doc.rewrite_images()` entirely and rewrite the images one xref at a time using lower-level PyMuPDF APIs, though this is probably not ideal.

```python
import sys
import math
import pymupdf

def safe_rewrite_images(doc, dpi_target=None, dpi_threshold=None, quality=None, set_to_gray=False):
    """Workaround for segfault in doc.rewrite_images() with shared image xrefs."""
    if not (dpi_target or quality is not None or set_to_gray):
        return

    # Collect unique image xrefs and their smask info
    xref_info = {}
    for page in doc:
        for img in page.get_images(full=True):
            xref, smask = img[0], img[1]
            if xref > 0:
                xref_info.setdefault(xref, {"smask": smask, "min_dpi": float("inf")})

    # Calculate effective DPI for each xref across all page usages
    for page in doc:
        for info in page.get_image_info(hashes=False, xrefs=True):
            xref = info.get("xref", 0)
            if xref not in xref_info:
                continue
            bbox = info.get("bbox")
            w, h = info.get("width", 0), info.get("height", 0)
            if bbox and w > 0 and h > 0:
                disp_w = abs(bbox[2] - bbox[0])
                disp_h = abs(bbox[3] - bbox[1])
                if disp_w > 0 and disp_h > 0:
                    dpi = min(w / disp_w * 72, h / disp_h * 72)
                    if dpi < xref_info[xref]["min_dpi"]:
                        xref_info[xref]["min_dpi"] = dpi

    effective_threshold = max(dpi_threshold or 0, (dpi_target or 0) + 10) if dpi_target else None

    # Rewrite each image xref individually
    for xref, meta in xref_info.items():
        min_dpi = meta["min_dpi"]
        smask_xref = meta["smask"]

        needs_downscale = bool(
            dpi_target and effective_threshold
            and min_dpi != float("inf")
            and min_dpi > effective_threshold
        )
        if not needs_downscale and quality is None and not set_to_gray:
            continue

        try:
            pix = pymupdf.Pixmap(doc, xref)

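            # JPEG output cannot carry an alpha channel, so drop it first.
            if pix.alpha:
                pix = pymupdf.Pixmap(pix, 0)
            if set_to_gray and pix.colorspace and pix.colorspace.n > 1:
                pix = pymupdf.Pixmap(pymupdf.csGRAY, pix)
            elif pix.colorspace and pix.colorspace.n > 3:
                # e.g. CMYK: convert so the /DeviceRGB entry written below is accurate
                pix = pymupdf.Pixmap(pymupdf.csRGB, pix)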

            if needs_downscale:
                ratio = min_dpi / dpi_target
                shrink_n = max(0, min(7, int(math.log2(ratio))))
                if shrink_n > 0:
                    pix.shrink(shrink_n)

            q = quality if quality is not None else 85
            jpeg_bytes = pix.tobytes("jpeg", jpg_quality=q)

            cs_name = "/DeviceGray" if pix.colorspace and pix.colorspace.n == 1 else "/DeviceRGB"
            smask_entry = f"/SMask {smask_xref} 0 R " if smask_xref else ""
            new_obj = (
                f"<</Type /XObject /Subtype /Image /BitsPerComponent 8"
                f" /ColorSpace {cs_name} /Filter /DCTDecode"
                f" /Height {pix.height} /Width {pix.width}"
                f" {smask_entry}>>"
            )
            doc.update_object(xref, new_obj)
            doc.update_stream(xref, jpeg_bytes, compress=0)
            pix = None

        except Exception as e:
            sys.stderr.write(f"[pymupdf] safe_rewrite_images xref {xref}: {e}\n")

```
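
The DPI arithmetic in the workaround can be checked in isolation. The sketch below restates it with hypothetical helper names (`effective_dpi`, `shrink_exponent`) and illustrative numbers that are not taken from the test PDF: a 1200 x 900 pixel image drawn into a 288 x 216 point rectangle.

```python
import math


def effective_dpi(width_px, height_px, bbox):
    """Lower of the horizontal and vertical DPI for one placement (72 pt = 1 inch)."""
    disp_w = abs(bbox[2] - bbox[0])
    disp_h = abs(bbox[3] - bbox[1])
    if disp_w <= 0 or disp_h <= 0:
        return math.inf  # degenerate placement: no usable DPI
    return min(72 * width_px / disp_w, 72 * height_px / disp_h)


def shrink_exponent(min_dpi, dpi_target):
    """Number of halvings (0..7) that keeps the image at or above dpi_target."""
    ratio = min_dpi / dpi_target
    if not math.isfinite(ratio) or ratio <= 1:
        return 0
    return max(0, min(7, int(math.log2(ratio))))


dpi = effective_dpi(1200, 900, (0, 0, 288, 216))
n = shrink_exponent(dpi, dpi_target=100)
print(dpi, n, dpi / 2**n)
```

Here the image is placed at 300 DPI, so `n` is 1 and one halving leaves it at 150 DPI. Because the exponent is rounded down, the result never drops below the target, but it can end up at almost twice the target.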

PDF used: 



### PyMuPDF version

1.27.1

### Operating system

macOS

### Python version

3.14
