fix: Handle UTF-8 BOM encoding issue in configuration files #5376
Problem: On Windows, some text editors (such as Notepad) automatically add a UTF-8 BOM to JSON files when saving. This causes `json.decoder.JSONDecodeError: Unexpected UTF-8 BOM`, and AstrBot fails to start when cmd_config.json contains a BOM.

Solution: Add a defensive check that strips the UTF-8 BOM (\ufeff), if present, before parsing the JSON configuration file.

Impact:
- Improves robustness and cross-platform compatibility
- No breaking changes to existing functionality
- Fixes the startup failure when the configuration file has UTF-8 BOM encoding

Relates-to: Windows editor compatibility issues
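The failure and the proposed fix can be reproduced without touching a real configuration file. The following is a minimal sketch, not the project's code: `loads_stripping_bom` is a hypothetical helper name, and the byte string stands in for a BOM-prefixed file that was read with plain `encoding="utf-8"`.

```python
import json

BOM = "\ufeff"  # U+FEFF: what the UTF-8 BOM bytes become when decoded as plain "utf-8"


def loads_stripping_bom(text: str):
    """Hypothetical helper: drop one leading BOM character, then parse the JSON."""
    if text.startswith(BOM):
        text = text[len(BOM):]
    return json.loads(text)


# Simulate a file saved with a BOM and decoded with encoding="utf-8".
raw = b"\xef\xbb\xbf" + b'{"key": "value"}'
text = raw.decode("utf-8")

# json.loads rejects a str that starts with U+FEFF.
try:
    json.loads(text)
    failed = False
except json.JSONDecodeError:
    failed = True

# With the defensive check, the same text parses.
config = loads_stripping_bom(text)
```

Input without a BOM passes through the helper unchanged, so the check adds no risk for files saved by other editors.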
Summary of Changes (Gemini Code Assist)
This pull request addresses the UTF-8 BOM encoding problem that AstrBot encounters when reading configuration files on Windows. When the contents of the JSON configuration file are loaded, a UTF-8 BOM at the start of the file is automatically detected and removed, so that even a configuration file saved by certain editors can still be …
Hey - I've found 1 issue, and left some high level feedback:
- Since `open(..., encoding="utf-8-sig")` already strips a UTF-8 BOM, the explicit `startswith('\ufeff')` check is redundant; consider removing the manual stripping or the `utf-8-sig` encoding and rely on just one mechanism.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Since `open(..., encoding="utf-8-sig")` already strips a UTF-8 BOM, the explicit `startswith('\ufeff')` check is redundant; consider removing the manual stripping or the `utf-8-sig` encoding and rely on just one mechanism.
## Individual Comments
### Comment 1
<location> `astrbot/core/config/astrbot_config.py:53-58` </location>
<code_context>
 with open(config_path, encoding="utf-8-sig") as f:
     conf_str = f.read()
+    # Handle UTF-8 BOM if present
+    if conf_str.startswith('\ufeff'):
+        conf_str = conf_str[1:]
     conf = json.loads(conf_str)
</code_context>
<issue_to_address>
**suggestion:** The manual BOM stripping is redundant when using `encoding="utf-8-sig"`.
Using `encoding="utf-8-sig"` already strips any leading UTF-8 BOM, so this `startswith('\ufeff')` check and slice never do anything in normal usage. Unless you have a documented need to handle some nonstandard BOM placement, you can safely remove this logic to keep the code simpler.
```suggestion
with open(config_path, encoding="utf-8-sig") as f:
    conf_str = f.read()
    conf = json.loads(conf_str)
```
</issue_to_address>
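The reviewer's point can be checked directly against the standard codecs. This is a small sketch on an in-memory byte string (the sample payload is illustrative, not taken from the project): decoding with `utf-8-sig` removes a single leading BOM, so a following `startswith('\ufeff')` test finds nothing to strip.

```python
import codecs
import json

payload = b'{"key": "value"}'
with_bom = codecs.BOM_UTF8 + payload  # BOM_UTF8 is b"\xef\xbb\xbf"

# "utf-8-sig" drops the leading BOM; plain "utf-8" keeps it as U+FEFF.
decoded_sig = with_bom.decode("utf-8-sig")
decoded_plain = with_bom.decode("utf-8")

# After "utf-8-sig", the manual check from the diff has nothing to do.
manual_check_fires = decoded_sig.startswith("\ufeff")

# The result parses without any further handling.
config = json.loads(decoded_sig)
```

`utf-8-sig` is also harmless for files without a BOM: it decodes them exactly as `utf-8` would.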
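@sourcery-aibot Thanks for the suggestion! In theory, `utf-8-sig` should indeed handle the BOM automatically.
However, in actual testing I found:
- JSON files saved with Windows Notepad contain a BOM
- Using `encoding="utf-8-sig"` still raised `JSONDecodeError: Unexpected UTF-8 BOM`
- Adding the manual check resolved the problem
This may be an issue specific to certain Python versions or environments. Keeping this defensive code improves robustness, so I suggest keeping it.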
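The thread leaves open why `utf-8-sig` did not help in this test. One way to narrow it down is to inspect the raw bytes of the file. The sketch below uses a hypothetical helper, `count_leading_boms`, and assumes one conceivable cause that the thread does not confirm: a file carrying more than one BOM, in which case `utf-8-sig` removes only the first.

```python
import codecs
import json


def count_leading_boms(data: bytes) -> int:
    """Hypothetical diagnostic: count consecutive UTF-8 BOMs at the start of data."""
    bom = codecs.BOM_UTF8  # b"\xef\xbb\xbf"
    count = 0
    while data.startswith(bom, count * len(bom)):
        count += 1
    return count


# Assumed scenario, not confirmed by the thread: a doubled BOM.
double_bom = codecs.BOM_UTF8 * 2 + b'{"key": "value"}'

# "utf-8-sig" removes only the first BOM, so one U+FEFF survives decoding.
text = double_bom.decode("utf-8-sig")
survives = text.startswith("\ufeff")

# The manual check from the PR then removes the survivor.
if survives:
    text = text[1:]
config = json.loads(text)
```

In practice the bytes would come from reading the configuration file in binary mode; a count above 1 would explain the observed error, while a count of 0 or 1 would point to a different cause.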
Fix single quote to double quote to comply with project code style.
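Problem description
On Windows, some text editors (such as Notepad) automatically add a UTF-8 BOM (byte order mark) when saving JSON files. This causes a `json.decoder.JSONDecodeError: Unexpected UTF-8 BOM` error and prevents AstrBot from starting.
Solution
Before parsing the JSON configuration file, add a defensive check: if the content begins with a UTF-8 BOM (`\ufeff`), remove it.
Changes
- `astrbot/core/config/astrbot_config.py`
Testing
- … `cmd_config.json` test: AstrBot starts normally
Reproduction steps
- … `data/cmd_config.json` …
- … `JSONDecodeError: Unexpected UTF-8 BOM`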
Summary by Sourcery
Bug Fixes:
- … `JSONDecodeError`.