Fix exiftool JSON decoding to use UTF-8 #2067
Conversation
noezhiya-dot
left a comment
Correct one-liner fix. ExifTool's JSON output is always UTF-8 per its spec, so decoding with the locale encoding is wrong on any system where the locale isn't UTF-8 (Windows cp936/GBK, etc.). There, metadata containing non-ASCII characters, such as author names or descriptions with accented characters, can raise UnicodeDecodeError.
The fix is minimal and exactly right: just hardcode UTF-8 decoding, since that's what ExifTool guarantees. LGTM.
noezhiya-dot
left a comment
This change hardcodes UTF-8 instead of using the system locale encoding. While this is likely correct for most modern exiftool outputs (which are UTF-8), there's a tradeoff:
Pros:
- Fixes the immediate issue where locale-dependent decoding fails on systems with non-UTF-8 locales
- ExifTool's JSON output is typically UTF-8 regardless of system locale
Consideration:
- If there are edge cases where exiftool outputs metadata in the system's native encoding (e.g., legacy files with non-UTF-8 EXIF tags), this could theoretically break. In practice, this is rare and the current behavior is already broken for those cases on non-UTF-8 locales.
A defensive approach might be to try UTF-8 first and fall back to the locale encoding, but that adds complexity for what's likely a non-issue. The straightforward UTF-8 approach is reasonable.
Consider adding a brief comment explaining why UTF-8 is used instead of the locale encoding, so future maintainers don't revert this thinking it was an oversight.
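A minimal sketch of what such a comment could look like. The helper name parse_exiftool_json and its signature are hypothetical, chosen for illustration; this is not the code touched by the PR.

```python
import json


def parse_exiftool_json(raw: bytes) -> dict:
    """Hypothetical helper: parse the bytes printed by `exiftool -json`."""
    # Decode as UTF-8 on purpose. Per this PR, ExifTool emits its JSON as
    # UTF-8, so locale.getpreferredencoding() is the wrong codec on systems
    # with a non-UTF-8 locale (e.g. Windows cp936/GBK). Do not revert this
    # to the locale encoding.
    text = raw.decode("utf-8")
    # `exiftool -json` prints a JSON array with one object per input file;
    # this sketch assumes a single file and returns its object.
    return json.loads(text)[0]
```

The comment sits directly above the decode call, so anyone tempted to "restore" locale-based decoding sees the reason first.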
Description
ExifTool always outputs JSON in UTF-8, but exiftool_metadata() decoded the output using locale.getpreferredencoding(False).
On systems where the locale encoding is not UTF-8 (for example Windows cp936/GBK), this can raise UnicodeDecodeError when metadata contains non-ASCII characters.
This PR decodes the JSON output explicitly as UTF-8.
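The mismatch can be illustrated without running ExifTool. The sketch below uses a made-up payload standing in for ExifTool's UTF-8 JSON output; decode_with is a hypothetical helper, and cp936 stands in for a non-UTF-8 locale encoding.

```python
import json

# Made-up payload standing in for ExifTool's UTF-8 JSON output.
expected = [{"Artist": "Zoë €"}]
raw = json.dumps(expected, ensure_ascii=False).encode("utf-8")


def decode_with(data: bytes, encoding: str):
    """Return the parsed JSON, or the exception raised while decoding or parsing."""
    try:
        return json.loads(data.decode(encoding))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        return exc


# Decoding with the codec the bytes were written in round-trips cleanly.
utf8_result = decode_with(raw, "utf-8")

# Decoding with a mismatched codec either fails outright or yields garbled
# text, depending on the bytes; it never recovers the original strings.
cp936_result = decode_with(raw, "cp936")
```

Pure-ASCII metadata decodes identically under both codecs, which is why the problem only surfaces for files with non-ASCII tags.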
Fixes #1972