Skip to content

Comments

fix: 处理配置文件中的 UTF-8 BOM 编码问题#5376

Merged
Soulter merged 2 commits intoAstrBotDevs:masterfrom
tangsenfei:fix/handle-utf8-bom-in-config
Feb 23, 2026
Merged

fix: 处理配置文件中的 UTF-8 BOM 编码问题#5376
Soulter merged 2 commits intoAstrBotDevs:masterfrom
tangsenfei:fix/handle-utf8-bom-in-config

Conversation

@tangsenfei
Copy link
Contributor

@tangsenfei tangsenfei commented Feb 23, 2026

问题描述

在 Windows 系统上,某些文本编辑器(如记事本)保存 JSON 文件时会自动添加 UTF-8 BOM(字节顺序标记)。这会导致 json.decoder.JSONDecodeError: Unexpected UTF-8 BOM 错误,使 AstrBot 无法正常启动。

解决方案

在解析 JSON 配置文件之前,添加防御性检查,如果内容以 UTF-8 BOM(\ufeff)开头,则将其移除。

改动内容

  • 修改文件:astrbot/core/config/astrbot_config.py
  • 新增 3 行代码,用于处理配置文件内容开头的 BOM

测试情况

  • √ 本地已验证:使用带 BOM 的 cmd_config.json 测试,AstrBot 正常启动
  • √ 向后兼容:不影响无 BOM 的配置文件
  • √ 对现有功能无破坏性变更

复现步骤

  1. 在 Windows 上使用记事本编辑 data/cmd_config.json
  2. 保存后文件会带有 UTF-8 BOM
  3. 启动 AstrBot 会报错 JSONDecodeError: Unexpected UTF-8 BOM
  4. 应用此修复后,AstrBot 正常启动

检查清单

  • √ 这不是破坏性变更
  • √ 代码已本地测试通过
  • √ 未引入新的依赖库
  • √ 代码符合项目规范

Summary by Sourcery

Bug Fixes:

  • 在对配置文件内容进行 JSON 解析之前,先去除开头的 UTF-8 BOM,以避免在使用 Windows 编辑的文件上出现 JSONDecodeError
Original summary in English

Summary by Sourcery

Bug Fixes:

  • Strip leading UTF-8 BOM from configuration file contents before JSON parsing to avoid JSONDecodeError on Windows-edited files.

Problem:
On Windows, some text editors (like Notepad) automatically add UTF-8 BOM
to JSON files when saving. This causes json.decoder.JSONDecodeError:
"Unexpected UTF-8 BOM" and AstrBot fails to start when cmd_config.json
contains BOM.

Solution:
Add defensive check to strip UTF-8 BOM (\ufeff) if present before
parsing JSON configuration file.

Impact:
- Improves robustness and cross-platform compatibility
- No breaking changes to existing functionality
- Fixes startup failure when configuration file has UTF-8 BOM encoding

Relates-to: Windows editor compatibility issues
@dosubot dosubot bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Feb 23, 2026
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @tangsenfei, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

此拉取请求旨在解决 AstrBot 在 Windows 环境下读取配置文件时遇到的 UTF-8 BOM 编码问题。通过在加载 JSON 配置文件内容时,自动检测并移除文件开头的 UTF-8 BOM,确保即使配置文件由特定编辑器保存,也能被 json 模块正确解析,从而避免了 JSONDecodeError 错误,提高了应用的兼容性和启动稳定性。

Highlights

  • 解决 UTF-8 BOM 编码问题: 修复了在 Windows 系统上,由记事本等编辑器保存的带有 UTF-8 BOM 的 JSON 配置文件导致 json.decoder.JSONDecodeError 的问题。
  • 增强配置解析的健壮性: 在解析 JSON 配置文件内容之前,增加了对 UTF-8 BOM 的检测和移除逻辑,确保配置文件能被正确加载。
  • 兼容性与稳定性: 此改动向后兼容,不影响现有无 BOM 的配置文件,且对现有功能无破坏性变更,提升了 AstrBot 在不同环境下的启动稳定性。
Changelog
  • astrbot/core/config/astrbot_config.py
    • 增加了处理 UTF-8 BOM 的逻辑,以避免 JSON 解析错误。
Activity
  • 暂无特定活动记录。
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@dosubot
Copy link

dosubot bot commented Feb 23, 2026

Related Documentation

Checked 1 published document(s) in 1 knowledge base(s). No updates required.

How did I do? Any feedback?  Join Discord

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - 我发现了 1 个问题,并给出了一些整体性反馈:

  • 由于 open(..., encoding="utf-8-sig") 本身就会去除 UTF-8 的 BOM,因此显式的 startswith('\ufeff') 检查是多余的;建议删除手动去除 BOM 的逻辑,或者去掉 utf-8-sig 编码设置,只保留其中一种机制即可。
面向 AI Agent 的提示
Please address the comments from this code review:

## Overall Comments
- Since `open(..., encoding="utf-8-sig")` already strips a UTF-8 BOM, the explicit `startswith('\ufeff')` check is redundant; consider removing the manual stripping or the `utf-8-sig` encoding and rely on just one mechanism.

## Individual Comments

### Comment 1
<location> `astrbot/core/config/astrbot_config.py:53-58` </location>
<code_context>

         with open(config_path, encoding="utf-8-sig") as f:
             conf_str = f.read()
+            # Handle UTF-8 BOM if present
+            if conf_str.startswith('\ufeff'):
+                conf_str = conf_str[1:]
             conf = json.loads(conf_str)
</code_context>

<issue_to_address>
**suggestion:** 使用 `encoding="utf-8-sig"` 时,手动去除 BOM 是多余的。

`encoding="utf-8-sig"` 已经会自动去除开头的 UTF-8 BOM,因此在正常使用场景下,这里的 `startswith('\ufeff')` 检查和切片操作实际上不会起作用。除非你有明确的需求去处理某些非标准的 BOM 位置,否则可以安全地删除这部分逻辑,让代码保持更简洁。

```suggestion
        with open(config_path, encoding="utf-8-sig") as f:
            conf_str = f.read()
            conf = json.loads(conf_str)
```
</issue_to_address>

Sourcery 对开源项目是免费的 —— 如果你觉得我们的 Review 有帮助,欢迎分享 ✨
帮我变得更有用!请对每条评论点 👍 或 👎,我会根据这些反馈改进后续的 Review。
Original comment in English

Hey - I've found 1 issue, and left some high level feedback:

  • Since open(..., encoding="utf-8-sig") already strips a UTF-8 BOM, the explicit startswith('\ufeff') check is redundant; consider removing the manual stripping or the utf-8-sig encoding and rely on just one mechanism.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Since `open(..., encoding="utf-8-sig")` already strips a UTF-8 BOM, the explicit `startswith('\ufeff')` check is redundant; consider removing the manual stripping or the `utf-8-sig` encoding and rely on just one mechanism.

## Individual Comments

### Comment 1
<location> `astrbot/core/config/astrbot_config.py:53-58` </location>
<code_context>

         with open(config_path, encoding="utf-8-sig") as f:
             conf_str = f.read()
+            # Handle UTF-8 BOM if present
+            if conf_str.startswith('\ufeff'):
+                conf_str = conf_str[1:]
             conf = json.loads(conf_str)
</code_context>

<issue_to_address>
**suggestion:** The manual BOM stripping is redundant when using `encoding="utf-8-sig"`.

Using `encoding="utf-8-sig"` already strips any leading UTF-8 BOM, so this `startswith('\ufeff')` check and slice never do anything in normal usage. Unless you have a documented need to handle some nonstandard BOM placement, you can safely remove this logic to keep the code simpler.

```suggestion
        with open(config_path, encoding="utf-8-sig") as f:
            conf_str = f.read()
            conf = json.loads(conf_str)
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines 53 to 58
with open(config_path, encoding="utf-8-sig") as f:
conf_str = f.read()
# Handle UTF-8 BOM if present
if conf_str.startswith('\ufeff'):
conf_str = conf_str[1:]
conf = json.loads(conf_str)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: 使用 encoding="utf-8-sig" 时,手动去除 BOM 是多余的。

encoding="utf-8-sig" 已经会自动去除开头的 UTF-8 BOM,因此在正常使用场景下,这里的 startswith('\ufeff') 检查和切片操作实际上不会起作用。除非你有明确的需求去处理某些非标准的 BOM 位置,否则可以安全地删除这部分逻辑,让代码保持更简洁。

Suggested change
with open(config_path, encoding="utf-8-sig") as f:
conf_str = f.read()
# Handle UTF-8 BOM if present
if conf_str.startswith('\ufeff'):
conf_str = conf_str[1:]
conf = json.loads(conf_str)
with open(config_path, encoding="utf-8-sig") as f:
conf_str = f.read()
conf = json.loads(conf_str)
Original comment in English

suggestion: The manual BOM stripping is redundant when using encoding="utf-8-sig".

Using encoding="utf-8-sig" already strips any leading UTF-8 BOM, so this startswith('\ufeff') check and slice never do anything in normal usage. Unless you have a documented need to handle some nonstandard BOM placement, you can safely remove this logic to keep the code simpler.

Suggested change
with open(config_path, encoding="utf-8-sig") as f:
conf_str = f.read()
# Handle UTF-8 BOM if present
if conf_str.startswith('\ufeff'):
conf_str = conf_str[1:]
conf = json.loads(conf_str)
with open(config_path, encoding="utf-8-sig") as f:
conf_str = f.read()
conf = json.loads(conf_str)

@dosubot dosubot bot added the area:core The bug / feature is about astrbot's core, backend label Feb 23, 2026
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

该拉取请求旨在解决 Windows 系统下 JSON 配置文件包含 UTF-8 BOM 导致解析失败的问题。通过使用 encoding="utf-8-sig",Python 已经能够自动处理并移除 BOM。因此,新增的手动检查逻辑是冗余的,建议移除以保持代码简洁。

Comment on lines 55 to 57
# Handle UTF-8 BOM if present
if conf_str.startswith('\ufeff'):
conf_str = conf_str[1:]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

在 Python 中,使用 encoding="utf-8-sig" 打开文件时,解码器会自动识别并剥离 UTF-8 BOM(字节顺序标记)。因此,conf_str = f.read() 返回的字符串已经不包含 \ufeff 字符了。这段手动的 startswith 检查和切片操作是冗余的,建议移除这三行代码以保持代码整洁。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sourcery-aibot 感谢建议!理论上 utf-8-sig 确实应该自动处理 BOM。

但在实际测试中发现:

  • Windows 记事本保存的 JSON 文件带有 BOM
  • 使用 encoding="utf-8-sig" 仍然报错 JSONDecodeError: Unexpected UTF-8 BOM
  • 添加手动检查后问题解决

可能是特定 Python 版本或环境下的问题。保留这段防御性代码可以提高鲁棒性,建议保留。

@tangsenfei
Copy link
Contributor Author

@sourcery-aibot 感谢建议!理论上 utf-8-sig 确实应该自动处理 BOM。

但在实际测试中发现:

  • Windows 记事本保存的 JSON 文件带有 BOM
  • 使用 encoding="utf-8-sig" 仍然报错 JSONDecodeError: Unexpected UTF-8 BOM
  • 添加手动检查后问题解决

可能是特定 Python 版本或环境下的问题。保留这段防御性代码可以提高鲁棒性,建议保留。

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Feb 23, 2026
Fix single quote to double quote to comply with project code style.
@Soulter Soulter merged commit 351895a into AstrBotDevs:master Feb 23, 2026
6 checks passed
astrbot-doc-agent bot pushed a commit to AstrBotDevs/AstrBot-docs that referenced this pull request Feb 23, 2026
@astrbot-doc-agent
Copy link

Generated docs update PR (pending manual review):
AstrBotDevs/AstrBot-docs#141
Trigger: PR merged


AI change summary:

  • zh/faq.md:新增「启动时报错 JSONDecodeError: Unexpected UTF-8 BOM」FAQ,说明原因(Windows 记事本保存添加 BOM)及解决方法。
  • en/faq.md:同步新增对应英文 FAQ 条目。
  • i18n:中英文文档同步更新。

Experimental bot notice:

  • This output is generated by AstrBot-Doc-Agent for review only.
  • It does not represent the final documentation form.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:core The bug / feature is about astrbot's core, backend lgtm This PR has been approved by a maintainer size:XS This PR changes 0-9 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants