Fix exiftool JSON decoding to use UTF-8 #2067

Open
Vedant43hh wants to merge 1 commit into microsoft:main from Vedant43hh:fix-exiftool-utf8-decoding

@Vedant43hh

Description

ExifTool always outputs JSON in UTF-8, but exiftool_metadata() decoded the output using locale.getpreferredencoding(False).

On systems where the locale encoding is not UTF-8 (for example Windows cp936/GBK), this can raise UnicodeDecodeError when metadata contains non-ASCII characters.

This PR decodes the JSON output explicitly as UTF-8.
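A minimal sketch of the decoding step, assuming the raw stdout of exiftool -json has already been captured as bytes. The helper name parse_exiftool_json is hypothetical and this is not the actual body of exiftool_metadata(); it only shows the shape of the change, decoding with a fixed "utf-8" codec instead of locale.getpreferredencoding(False).

```python
import json


def parse_exiftool_json(raw: bytes) -> list:
    """Parse the stdout of `exiftool -json` (hypothetical helper).

    ExifTool emits its JSON as UTF-8, so decode with a fixed codec rather
    than locale.getpreferredencoding(False), which can be cp936/GBK on
    Windows and would then misread or reject non-ASCII metadata.
    """
    return json.loads(raw.decode("utf-8"))


# Simulated ExifTool output with a non-ASCII (CJK) author name.
sample = '[{"SourceFile": "photo.jpg", "Author": "\u4e2d"}]'.encode("utf-8")
records = parse_exiftool_json(sample)
print(len(records), records[0]["SourceFile"])  # 1 photo.jpg
```

Since Python 3.6, json.loads also accepts bytes directly and detects UTF-8, UTF-16 or UTF-32 on its own, which would be an equivalent alternative; the explicit decode keeps the intent obvious to readers.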

Fixes #1972

@noezhiya-dot left a comment

Correct one-liner fix. ExifTool's JSON output is always UTF-8 per its spec, so decoding with the locale encoding is wrong on any system where the locale isn't UTF-8 (Windows cp936/GBK, etc.). This would cause UnicodeDecodeError on metadata containing non-ASCII characters like author names or descriptions with accented characters.

The fix is minimal and exactly right: just hardcode UTF-8 decoding, since that's what ExifTool guarantees. LGTM.
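The failure mode described above can be reproduced on any platform by decoding the same UTF-8 bytes with the cp936 codec, which is what locale.getpreferredencoding(False) returns on a Windows system using code page 936. This is a sketch; decode_with is a hypothetical helper. Note that a wrong codec does not always raise: depending on how the bytes line up, it can also return garbled text silently.

```python
def decode_with(codec: str, raw: bytes) -> str:
    """Decode raw with codec, returning an error marker instead of raising."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError as exc:
        return "UnicodeDecodeError: " + exc.reason


cjk = '{"Author": "\u4e2d"}'.encode("utf-8")       # CJK author name
accented = '{"Author": "\u00e9"}'.encode("utf-8")  # accented Latin letter

for label, raw in (("cjk", cjk), ("accented", accented)):
    # ascii() keeps the output printable on consoles that are not UTF-8.
    print(label, "utf-8:", ascii(decode_with("utf-8", raw)))
    print(label, "cp936:", ascii(decode_with("cp936", raw)))
```

With CPython's cp936 codec the CJK sample raises, because the decoder pairs the last UTF-8 byte of the character with the closing quote, which is not a valid GBK sequence. The accented sample decodes without error but to the wrong character. Decoding as UTF-8 handles both correctly.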

@noezhiya-dot left a comment

This change hardcodes UTF-8 instead of using the system locale encoding. While this is likely correct for most modern exiftool outputs (which are UTF-8), there's a tradeoff:

Pros:

  • Fixes the immediate issue where locale-dependent decoding fails on systems with non-UTF-8 locales
  • ExifTool's JSON output is typically UTF-8 regardless of system locale

Consideration:

  • If there are edge cases where exiftool outputs metadata in the system's native encoding (e.g., legacy files with non-UTF-8 EXIF tags), this could theoretically break. In practice, this is rare and the current behavior is already broken for those cases on non-UTF-8 locales.

A defensive approach might be to try UTF-8 first and fall back to the locale encoding, but that adds complexity for what's likely a non-issue. The straightforward UTF-8 approach is reasonable.
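For comparison, the defensive variant mentioned here (UTF-8 first, locale encoding as a fallback) might look like the sketch below. decode_exiftool_output and its fallback parameter are hypothetical; the parameter exists only so the fallback path can be exercised without changing the machine's locale.

```python
import locale
from typing import Optional


def decode_exiftool_output(raw: bytes, fallback: Optional[str] = None) -> str:
    """Decode ExifTool stdout as UTF-8, falling back to another codec.

    Hypothetical helper. If fallback is None, the locale's preferred
    encoding is used, mirroring the pre-PR behaviour.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Only reached when the bytes are not valid UTF-8.
        encoding = fallback or locale.getpreferredencoding(False)
        return raw.decode(encoding)


print(ascii(decode_exiftool_output("caf\u00e9".encode("utf-8"))))  # 'caf\xe9'
print(ascii(decode_exiftool_output(b"caf\xe9", fallback="latin-1")))  # 'caf\xe9'
```

The fallback can still raise if the bytes are invalid in the locale codec as well, and it can quietly accept output that is actually corrupt; that extra surface is part of the complexity weighed against the plain UTF-8 approach.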

Consider adding a brief comment explaining why UTF-8 is used instead of the locale encoding, so future maintainers don't revert this thinking it was an oversight.
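One way to act on that suggestion is to put the comment directly on the decode call. The sketch below works at the subprocess level; run_exiftool_json is a hypothetical wrapper, not markitdown's actual function, and it takes the full argument list (for example ["exiftool", "-json", "photo.jpg"]) so it can be tried without ExifTool installed.

```python
import json
import subprocess
from typing import Any, Dict, List, Sequence


def run_exiftool_json(cmd: Sequence[str]) -> List[Dict[str, Any]]:
    """Run an ExifTool-style command and parse its JSON stdout (hypothetical)."""
    result = subprocess.run(list(cmd), capture_output=True, check=True)
    # Decode as UTF-8, NOT locale.getpreferredencoding(False): ExifTool writes
    # its -json output in UTF-8 regardless of the system locale, so a
    # locale-based decode breaks on non-UTF-8 locales such as Windows
    # cp936/GBK (see issue #1972). Do not "fix" this back to the locale codec.
    return json.loads(result.stdout.decode("utf-8"))


# Example (requires ExifTool on PATH):
# metadata = run_exiftool_json(["exiftool", "-json", "photo.jpg"])
```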

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: exiftool JSON output decoded with locale encoding instead of UTF-8

3 participants