Skip to content

fix: PptxConverter tolerates None shape.text and notes text#2059

Open
alvinttang wants to merge 1 commit into
microsoft:mainfrom
alvinttang:fix/pptx-none-text-frame
Open

fix: PptxConverter tolerates None shape.text and notes text#2059
alvinttang wants to merge 1 commit into
microsoft:mainfrom
alvinttang:fix/pptx-none-text-frame

Conversation

@alvinttang

Copy link
Copy Markdown

Refs #1808.

PptxConverter.convert does unguarded + / .lstrip() on shape.text and notes_text_frame.text. python-pptx returns None for a text frame whose <a:r> run has no <a:t> child, and for certain third-party-generated decks. One malformed shape then fails the whole file with AttributeError: 'NoneType' object has no attribute 'lstrip' or TypeError: can only concatenate str (not "NoneType") to str.

Fix: treat None text as empty string at all three sites (title shape, body shape, notes frame).

RED

FAILED tests/test_pptx_none_text.py::test_pptx_converter_handles_none_shape_text
- PptxConverter threw AttributeError: 'NoneType' object has no attribute 'lstrip'
FAILED tests/test_pptx_none_text.py::test_pptx_converter_handles_none_notes_text
- PptxConverter threw TypeError: can only concatenate str (not "NoneType") to str

GREEN

tests/test_pptx_none_text.py::test_pptx_converter_handles_none_shape_text PASSED
tests/test_pptx_none_text.py::test_pptx_converter_handles_none_notes_text PASSED
2 passed in 0.65s

Full test_module_vectors.py (109 tests, covers real pptx conversion) still passes.

shape.text and notes_text_frame.text can return None for runs with no
<a:t> child or certain third-party decks. Treat None as empty string so
one malformed shape does not fail the whole file.

Refs microsoft#1808

@noezhiya-dot noezhiya-dot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid fix for a real-world crash. python-pptx can return None for shape.text when a run has no text child, and third-party PPTX generators produce these files in the wild.

The fix is clean: treat None as empty string at all three access points (title, body, notes). The RED→GREEN test approach is good — you demonstrated the exact failures before fixing.

The regression tests are well-crafted with monkeypatch to simulate the None cases without needing fixture files that are hard to create. LGTM.

@noezhiya-dot noezhiya-dot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid defensive fix. The root cause is clear — shape.text can return None when runs have no <a:t> child element, and the existing code assumed it was always a string.

The fix is minimal and correct:

  • text = shape.text or "" normalizes None to empty string in one place
  • Both title and body paths use the normalized text variable
  • Notes text frame gets the same treatment

Tests are well-structured:

  • Uses monkeypatch to simulate None returns, which is the right approach for unit testing
  • Covers both shape.text and notes_text_frame.text paths
  • The in-memory PPTX builder for the notes test is a nice touch

One small note: the test file name test_pptx_none_text.py follows a different convention than the existing test_pptx.py — consider whether the maintainers prefer separate test files or appending to the existing one. Either way, the tests themselves are good.

Approve.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants