fix(xlsx): tolerate legacy showZeroes sheet views by he-yufeng · Pull Request #2064 · microsoft/markitdown

he-yufeng · 2026-06-03T11:25:33Z

Fixes #2063.

Some XLSX files still contain the worksheet view attribute name showZeroes. openpyxl 3.1+ expects showZeros, so loading those files raises a TypeError before MarkItDown can read the workbook.

This keeps the normal pandas/openpyxl path unchanged. If that path fails with the known showZeroes TypeError, MarkItDown rewrites worksheet XML entries in memory from showZeroes to showZeros and retries the read. The fallback is scoped to worksheet XML files and only runs for that specific compatibility error.
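The retry flow described above can be sketched with the standard library alone. This is a minimal illustration, not the code in this PR: the names `is_show_zeroes_error`, `repair_show_zeroes`, and `read_with_fallback` are hypothetical, and the `reader` callable stands in for whatever pandas/openpyxl call the converter actually makes.

```python
import io
import zipfile
from typing import Callable, TypeVar

T = TypeVar("T")

LEGACY_ATTR = b"showZeroes"  # attribute name found in some older files
MODERN_ATTR = b"showZeros"   # attribute name openpyxl 3.1+ expects


def is_show_zeroes_error(exc: BaseException) -> bool:
    """True only for a TypeError whose message names the legacy attribute."""
    return isinstance(exc, TypeError) and "showZeroes" in str(exc)


def repair_show_zeroes(xlsx_bytes: bytes) -> bytes:
    """Return a copy of the archive with worksheet XML entries patched in memory."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as src, zipfile.ZipFile(
        out, "w", zipfile.ZIP_DEFLATED
    ) as dst:
        for info in src.infolist():
            payload = src.read(info.filename)
            name = info.filename
            # Only worksheet parts are rewritten; every other member is copied as-is.
            if name.startswith("xl/worksheets/") and name.endswith(".xml"):
                payload = payload.replace(LEGACY_ATTR, MODERN_ATTR)
            dst.writestr(info, payload)
    return out.getvalue()


def read_with_fallback(reader: Callable[[io.BytesIO], T], xlsx_bytes: bytes) -> T:
    """Try the normal read first; repair and retry only on the known error."""
    try:
        return reader(io.BytesIO(xlsx_bytes))
    except TypeError as exc:
        if not is_show_zeroes_error(exc):
            raise  # unrelated TypeErrors are not masked
        return reader(io.BytesIO(repair_show_zeroes(xlsx_bytes)))
```

Under these assumptions a caller might pass something like `lambda stream: pandas.read_excel(stream, sheet_name=None, engine="openpyxl")` as `reader`. Files that load cleanly never reach the repair step, so they pay no extra cost.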

Validation:

.venv\Scripts\python.exe -m pytest packages\markitdown\tests\test_module_misc.py -q -k "xlsx_legacy_show_zeroes"
.venv\Scripts\python.exe -m pytest packages\markitdown\tests\test_module_vectors.py::test_convert_local -q
.venv\Scripts\python.exe -m py_compile packages\markitdown\src\markitdown\converters\_xlsx_converter.py packages\markitdown\tests\test_module_misc.py
.venv\Scripts\python.exe -m ruff check packages\markitdown\src\markitdown\converters\_xlsx_converter.py packages\markitdown\tests\test_module_misc.py
git diff --check

noezhiya-dot

Clean fix for the showZeroes/showZeros compatibility issue in openpyxl 3.1+.

The approach is solid:

Normal pandas/openpyxl read path stays unchanged (no performance impact)
Fallback only triggers on the specific TypeError containing 'showZeroes'
XML repair is scoped to xl/worksheets/*.xml files (doesn't touch other parts of the ZIP)
In-memory BytesIO repair avoids writing temp files to disk
The byte-level replace is low-risk here because showZeroes is a sheet-view attribute name, and cell strings normally live in xl/sharedStrings.xml, outside the rewritten worksheet parts

The regression test is well-constructed: it builds a real XLSX, injects the legacy attribute, and verifies that the full conversion pipeline recovers correctly.
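The injection step of such a test can be sketched as follows. This is an assumption-laden illustration rather than the test in this PR: `build_minimal_archive` and `inject_legacy_show_zeroes` are hypothetical helpers, and the hand-written archive is only a stand-in for a real workbook (the actual test builds a real XLSX), which keeps the sketch runnable with the standard library alone.

```python
import io
import zipfile

# Hand-written worksheet part; a stand-in, not a complete valid workbook.
SHEET_XML = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    b'<sheetViews><sheetView workbookViewId="0"/></sheetViews>'
    b"<sheetData/></worksheet>"
)


def build_minimal_archive() -> bytes:
    """Build a tiny ZIP that mimics the layout of an XLSX file (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("xl/workbook.xml", b"<workbook/>")
        zf.writestr("xl/worksheets/sheet1.xml", SHEET_XML)
    return buf.getvalue()


def inject_legacy_show_zeroes(xlsx_bytes: bytes) -> bytes:
    """Add the legacy showZeroes attribute to the first sheetView of each worksheet."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as src, zipfile.ZipFile(
        out, "w", zipfile.ZIP_DEFLATED
    ) as dst:
        for info in src.infolist():
            payload = src.read(info.filename)
            name = info.filename
            if name.startswith("xl/worksheets/") and name.endswith(".xml"):
                # Replace only the first occurrence in each worksheet part.
                payload = payload.replace(
                    b"<sheetView ", b'<sheetView showZeroes="0" ', 1
                )
            dst.writestr(info, payload)
    return out.getvalue()
```

A test along these lines would then feed the injected bytes through the converter and assert that the sheet contents still appear in the output.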

One minor consideration: the string check is case-sensitive, which is correct since XML attribute names and Python identifiers are both case-sensitive. If openpyxl ever changes the error message format, the fallback would silently stop triggering, but that's an acceptable trade-off.

LGTM.

noezhiya-dot

Clean fix for the showZeroes/showZeros compatibility issue in openpyxl 3.1+. The normal read path stays unchanged. Fallback only triggers on the specific TypeError. XML repair is scoped to worksheet files only. In-memory BytesIO repair avoids temp files. Regression test is well-constructed. LGTM.

fix(xlsx): tolerate legacy showZeroes sheet views

d823f6c

noezhiya-dot approved these changes Jun 9, 2026

he-yufeng wants to merge 1 commit into microsoft:main from he-yufeng:fix/xlsx-sheetview-showzeroes