Skip to content

Comments

修复Telegram语音问题#5389

Open
realquantumcookie wants to merge 1 commit intoAstrBotDevs:masterfrom
realquantumcookie:patch-1
Open

修复Telegram语音问题#5389
realquantumcookie wants to merge 1 commit intoAstrBotDevs:masterfrom
realquantumcookie:patch-1

Conversation

@realquantumcookie
Copy link

@realquantumcookie realquantumcookie commented Feb 23, 2026

此PR解决Telegram的语音消息format是ogg而不是wav的问题,在调用openai api STT模型时会出现问题

Modifications / 改动点

在TG Adaptor增加了OGG下载和转换成wav逻辑

  • This is NOT a breaking change. / 这不是一个破坏性变更。

Screenshots or Test Results / 运行截图或测试结果

修改前:

image

修改后:
image


Checklist / 检查清单

  • 😊 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。/ If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
  • 👀 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”。/ My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
  • 🤓 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到了 requirements.txtpyproject.toml 文件相应位置。/ I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
  • 😮 我的更改没有引入恶意代码。/ My changes do not introduce malicious code.

Summary by Sourcery

Bug Fixes:

  • 修复由于 Telegram 语音消息以 OGG 格式而非 WAV 格式提供而导致的转录问题。
Original summary in English

Summary by Sourcery

Bug Fixes:

  • Fix transcription issues caused by Telegram voice messages being provided in OGG format instead of WAV.

@dosubot dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Feb 23, 2026
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @realquantumcookie, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

此PR旨在解决Telegram语音消息与OpenAI API语音转文本(STT)模型之间的兼容性问题。由于Telegram语音消息采用OGG格式,导致STT模型无法正确处理。本次更改在Telegram适配器中引入了新功能,能够下载OGG格式的语音消息并将其转换为WAV格式,从而确保与STT服务的顺畅集成。

Highlights

  • Telegram语音消息格式兼容性: 解决了Telegram语音消息为OGG格式导致OpenAI API语音转文本(STT)模型处理失败的问题。
  • OGG到WAV转换: 在Telegram适配器中增加了OGG语音消息的下载和转换为WAV格式的逻辑,以确保与STT服务的兼容性。
Changelog
  • astrbot/core/platform/sources/telegram/tg_adapter.py
    • 增加了处理OGG语音消息的逻辑,包括下载并将其转换为WAV格式,然后传递给STT模型。
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@dosubot dosubot bot added the area:platform The bug / feature is about IM platform adapter, such as QQ, Lark, Telegram, WebChat and so on. label Feb 23, 2026
@dosubot
Copy link

dosubot bot commented Feb 23, 2026

Related Documentation

Checked 1 published document(s) in 1 knowledge base(s). No updates required.

How did I do? Any feedback?  Join Discord

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - 我发现了两个问题,并给出了一些总体反馈:

  • 直接使用 file_basename 作为临时文件名,在多个语音消息同名的情况下可能会产生冲突;建议在临时路径和输出路径中添加一个唯一后缀(例如消息 ID 或 UUID),以避免并发请求中的竞争条件。
  • 新增的下载/转换逻辑没有处理失败场景(例如下载过程中网络错误或转换错误),目前会直接中断流程;建议用 try/except 包裹这部分逻辑,并在失败时回退到原始的 file.file_path 或返回一个清晰的错误响应。
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Using `file_basename` directly for the temp filename can cause collisions if multiple voice messages share the same name; consider adding a unique suffix (e.g., message ID or a UUID) to the temp and output paths to avoid race conditions in concurrent requests.
- The new download/convert logic does not handle failures (e.g., network error during download or conversion error) and will currently break the flow; consider wrapping this block in try/except and falling back to the original `file.file_path` or returning a clear error response.

## Individual Comments

### Comment 1
<location path="astrbot/core/platform/sources/telegram/tg_adapter.py" line_range="384-391" />
<code_context>
             file = await update.message.voice.get_file()
+
+            file_basename = os.path.basename(file.file_path)
+            temp_dir = get_astrbot_temp_path()
+            temp_path = os.path.join(temp_dir, file_basename)
+            temp_path = await download_image_by_url(file.file_path, path=temp_path)
+            path_wav = os.path.join(
+                temp_dir,
+                f"{file_basename}.wav",
+            )
+            path_wav = await convert_audio_to_wav(temp_path, path_wav)
+
             message.message = [
</code_context>
<issue_to_address>
**suggestion (performance):** 考虑一下这些临时音频文件在何时以及如何被清理,以避免磁盘空间被占满。

原始下载文件(`temp_path`)和转换后的文件(`path_wav`)都会保留在 `temp_dir` 中且没有清理逻辑。对于短时处理的场景,它们应该写入到一个会被定期清理的位置,或者在处理完成后显式删除;否则,对于长期运行的 bot,这些文件会逐渐占满磁盘。

推荐实现方式:

```python
import os

from astrbot.core.star.star_handler import star_handlers_registry
from astrbot.core.utils.io import download_file, download_image_by_url, file_to_base64

```

```python
        elif update.message.voice:
            file = await update.message.voice.get_file()

            file_basename = os.path.basename(file.file_path)
            temp_dir = get_astrbot_temp_path()
            temp_path = os.path.join(temp_dir, file_basename)
            path_wav = os.path.join(
                temp_dir,
                f"{file_basename}.wav",
            )

            try:
                temp_path = await download_image_by_url(file.file_path, path=temp_path)
                path_wav = await convert_audio_to_wav(temp_path, path_wav)

                message.message = [
                    Comp.Record(file=path_wav, url=path_wav),
                ]
            finally:
                # Best-effort cleanup of temporary audio files to avoid disk buildup
                for _path in (temp_path, path_wav):
                    if _path and os.path.exists(_path):
                        try:
                            os.remove(_path)
                        except OSError:
                            # Ignore cleanup errors; they shouldn't break message handling
                            pass

```

这里假设所有需要读取 WAV 文件的处理逻辑都会在同一个调用栈中同步完成(即在 `finally` 执行之前完成)。如果你的处理流水线会延迟读取 `Comp.Record.file`/`.url`(例如在另一个任务或 worker 中),就需要将清理逻辑移动到生命周期的更后阶段——在语音消息完全处理完之后——或者重构 `Comp.Record`,使其使用内存数据而不是文件路径。
</issue_to_address>

### Comment 2
<location path="astrbot/core/platform/sources/telegram/tg_adapter.py" line_range="383-385" />
<code_context>
         elif update.message.voice:
             file = await update.message.voice.get_file()
+
+            file_basename = os.path.basename(file.file_path)
+            temp_dir = get_astrbot_temp_path()
+            temp_path = os.path.join(temp_dir, file_basename)
+            temp_path = await download_image_by_url(file.file_path, path=temp_path)
+            path_wav = os.path.join(
</code_context>
<issue_to_address>
**issue (bug_risk):** 临时文件仅使用 basename 作为文件名,在并发或重复处理场景下可能发生冲突。

由于 `temp_path` 只是来自 `os.path.basename(file.file_path)`,不同的语音消息如果有相同的 basename(或者同一条消息被并发处理),就可能在 `temp_dir` 中互相覆盖,从而导致竞争条件或损坏的音频文件。请在临时文件名中增加一个唯一标识(例如 UUID、时间戳或 update/message ID)。
</issue_to_address>

Sourcery 对开源项目免费——如果你觉得我们的评审有帮助,欢迎分享 ✨
帮我变得更有用!请在每条评论上点 👍 或 👎,我会根据你的反馈改进评审质量。
Original comment in English

Hey - I've found 2 issues, and left some high level feedback:

  • Using file_basename directly for the temp filename can cause collisions if multiple voice messages share the same name; consider adding a unique suffix (e.g., message ID or a UUID) to the temp and output paths to avoid race conditions in concurrent requests.
  • The new download/convert logic does not handle failures (e.g., network error during download or conversion error) and will currently break the flow; consider wrapping this block in try/except and falling back to the original file.file_path or returning a clear error response.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Using `file_basename` directly for the temp filename can cause collisions if multiple voice messages share the same name; consider adding a unique suffix (e.g., message ID or a UUID) to the temp and output paths to avoid race conditions in concurrent requests.
- The new download/convert logic does not handle failures (e.g., network error during download or conversion error) and will currently break the flow; consider wrapping this block in try/except and falling back to the original `file.file_path` or returning a clear error response.

## Individual Comments

### Comment 1
<location path="astrbot/core/platform/sources/telegram/tg_adapter.py" line_range="384-391" />
<code_context>
             file = await update.message.voice.get_file()
+
+            file_basename = os.path.basename(file.file_path)
+            temp_dir = get_astrbot_temp_path()
+            temp_path = os.path.join(temp_dir, file_basename)
+            temp_path = await download_image_by_url(file.file_path, path=temp_path)
+            path_wav = os.path.join(
+                temp_dir,
+                f"{file_basename}.wav",
+            )
+            path_wav = await convert_audio_to_wav(temp_path, path_wav)
+
             message.message = [
</code_context>
<issue_to_address>
**suggestion (performance):** Consider how/when these temporary audio files are cleaned up to avoid disk buildup.

Both the original download (`temp_path`) and converted file (`path_wav`) remain in `temp_dir` with no cleanup. For short‑lived processing, they should either be written to a location that’s periodically cleaned or explicitly deleted once processing is complete; otherwise, long‑running bots will gradually fill the disk.

Suggested implementation:

```python
import os

from astrbot.core.star.star_handler import star_handlers_registry
from astrbot.core.utils.io import download_file, download_image_by_url, file_to_base64

```

```python
        elif update.message.voice:
            file = await update.message.voice.get_file()

            file_basename = os.path.basename(file.file_path)
            temp_dir = get_astrbot_temp_path()
            temp_path = os.path.join(temp_dir, file_basename)
            path_wav = os.path.join(
                temp_dir,
                f"{file_basename}.wav",
            )

            try:
                temp_path = await download_image_by_url(file.file_path, path=temp_path)
                path_wav = await convert_audio_to_wav(temp_path, path_wav)

                message.message = [
                    Comp.Record(file=path_wav, url=path_wav),
                ]
            finally:
                # Best-effort cleanup of temporary audio files to avoid disk buildup
                for _path in (temp_path, path_wav):
                    if _path and os.path.exists(_path):
                        try:
                            os.remove(_path)
                        except OSError:
                            # Ignore cleanup errors; they shouldn't break message handling
                            pass

```

This change assumes that any processing that needs to read the WAV file happens synchronously within the same call stack (i.e., before the `finally` block executes). If your pipeline defers reading `Comp.Record.file`/`.url` until later (e.g., in another task or worker), you'll need to move the cleanup logic to a later point in the lifecycle—after the voice message has been fully processed—or refactor `Comp.Record` to work with in-memory data instead of a file path.
</issue_to_address>

### Comment 2
<location path="astrbot/core/platform/sources/telegram/tg_adapter.py" line_range="383-385" />
<code_context>
         elif update.message.voice:
             file = await update.message.voice.get_file()
+
+            file_basename = os.path.basename(file.file_path)
+            temp_dir = get_astrbot_temp_path()
+            temp_path = os.path.join(temp_dir, file_basename)
+            temp_path = await download_image_by_url(file.file_path, path=temp_path)
+            path_wav = os.path.join(
</code_context>
<issue_to_address>
**issue (bug_risk):** Using only the basename for temp files can cause collisions under concurrent or repeated processing.

Because `temp_path` comes only from `os.path.basename(file.file_path)`, different voice messages with the same basename (or concurrent handling of the same message) can overwrite each other in `temp_dir`, leading to race conditions or corrupted audio. Please add a uniqueness component (e.g., UUID, timestamp, or update/message ID) to the temp filename.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +384 to +391
temp_dir = get_astrbot_temp_path()
temp_path = os.path.join(temp_dir, file_basename)
temp_path = await download_image_by_url(file.file_path, path=temp_path)
path_wav = os.path.join(
temp_dir,
f"{file_basename}.wav",
)
path_wav = await convert_audio_to_wav(temp_path, path_wav)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (performance): 考虑一下这些临时音频文件在何时以及如何被清理,以避免磁盘空间被占满。

原始下载文件(temp_path)和转换后的文件(path_wav)都会保留在 temp_dir 中且没有清理逻辑。对于短时处理的场景,它们应该写入到一个会被定期清理的位置,或者在处理完成后显式删除;否则,对于长期运行的 bot,这些文件会逐渐占满磁盘。

推荐实现方式:

import os

from astrbot.core.star.star_handler import star_handlers_registry
from astrbot.core.utils.io import download_file, download_image_by_url, file_to_base64
        elif update.message.voice:
            file = await update.message.voice.get_file()

            file_basename = os.path.basename(file.file_path)
            temp_dir = get_astrbot_temp_path()
            temp_path = os.path.join(temp_dir, file_basename)
            path_wav = os.path.join(
                temp_dir,
                f"{file_basename}.wav",
            )

            try:
                temp_path = await download_image_by_url(file.file_path, path=temp_path)
                path_wav = await convert_audio_to_wav(temp_path, path_wav)

                message.message = [
                    Comp.Record(file=path_wav, url=path_wav),
                ]
            finally:
                # Best-effort cleanup of temporary audio files to avoid disk buildup
                for _path in (temp_path, path_wav):
                    if _path and os.path.exists(_path):
                        try:
                            os.remove(_path)
                        except OSError:
                            # Ignore cleanup errors; they shouldn't break message handling
                            pass

这里假设所有需要读取 WAV 文件的处理逻辑都会在同一个调用栈中同步完成(即在 finally 执行之前完成)。如果你的处理流水线会延迟读取 Comp.Record.file/.url(例如在另一个任务或 worker 中),就需要将清理逻辑移动到生命周期的更后阶段——在语音消息完全处理完之后——或者重构 Comp.Record,使其使用内存数据而不是文件路径。

Original comment in English

suggestion (performance): Consider how/when these temporary audio files are cleaned up to avoid disk buildup.

Both the original download (temp_path) and converted file (path_wav) remain in temp_dir with no cleanup. For short‑lived processing, they should either be written to a location that’s periodically cleaned or explicitly deleted once processing is complete; otherwise, long‑running bots will gradually fill the disk.

Suggested implementation:

import os

from astrbot.core.star.star_handler import star_handlers_registry
from astrbot.core.utils.io import download_file, download_image_by_url, file_to_base64
        elif update.message.voice:
            file = await update.message.voice.get_file()

            file_basename = os.path.basename(file.file_path)
            temp_dir = get_astrbot_temp_path()
            temp_path = os.path.join(temp_dir, file_basename)
            path_wav = os.path.join(
                temp_dir,
                f"{file_basename}.wav",
            )

            try:
                temp_path = await download_image_by_url(file.file_path, path=temp_path)
                path_wav = await convert_audio_to_wav(temp_path, path_wav)

                message.message = [
                    Comp.Record(file=path_wav, url=path_wav),
                ]
            finally:
                # Best-effort cleanup of temporary audio files to avoid disk buildup
                for _path in (temp_path, path_wav):
                    if _path and os.path.exists(_path):
                        try:
                            os.remove(_path)
                        except OSError:
                            # Ignore cleanup errors; they shouldn't break message handling
                            pass

This change assumes that any processing that needs to read the WAV file happens synchronously within the same call stack (i.e., before the finally block executes). If your pipeline defers reading Comp.Record.file/.url until later (e.g., in another task or worker), you'll need to move the cleanup logic to a later point in the lifecycle—after the voice message has been fully processed—or refactor Comp.Record to work with in-memory data instead of a file path.

Comment on lines +383 to +385
file_basename = os.path.basename(file.file_path)
temp_dir = get_astrbot_temp_path()
temp_path = os.path.join(temp_dir, file_basename)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): 临时文件仅使用 basename 作为文件名,在并发或重复处理场景下可能发生冲突。

由于 temp_path 只是来自 os.path.basename(file.file_path),不同的语音消息如果有相同的 basename(或者同一条消息被并发处理),就可能在 temp_dir 中互相覆盖,从而导致竞争条件或损坏的音频文件。请在临时文件名中增加一个唯一标识(例如 UUID、时间戳或 update/message ID)。

Original comment in English

issue (bug_risk): Using only the basename for temp files can cause collisions under concurrent or repeated processing.

Because temp_path comes only from os.path.basename(file.file_path), different voice messages with the same basename (or concurrent handling of the same message) can overwrite each other in temp_dir, leading to race conditions or corrupted audio. Please add a uniqueness component (e.g., UUID, timestamp, or update/message ID) to the temp filename.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This PR addresses the issue of Telegram voice messages being in OGG format, which causes problems with the OpenAI API STT model. It implements OGG download and WAV conversion logic in the TG Adaptor to enable proper transcription. However, the current implementation introduces a high-severity Path Traversal vulnerability. User-controlled filenames are used without sanitization, allowing files to be written to arbitrary locations. Please refer to the specific comments for remediation, which involves using securely generated filenames.

Comment on lines +383 to +391
file_basename = os.path.basename(file.file_path)
temp_dir = get_astrbot_temp_path()
temp_path = os.path.join(temp_dir, file_basename)
temp_path = await download_image_by_url(file.file_path, path=temp_path)
path_wav = os.path.join(
temp_dir,
f"{file_basename}.wav",
)
path_wav = await convert_audio_to_wav(temp_path, path_wav)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

A high-severity Path Traversal vulnerability exists here. The filename from file.file_path is used to construct local file paths without proper sanitization, which could allow an attacker to write files to arbitrary locations. Additionally, the use of download_image_by_url is misleading and incorrect for OGG audio files, as it internally calls save_temp_img which saves files as .jpg. While the provided suggestion addresses the path traversal by generating a secure, random filename and explicitly setting the .ogg extension for the temporary file, ensure that download_image_by_url correctly saves the file with the specified extension, or consider using a more generic file download function like download_file if download_image_by_url forces a .jpg extension.

            # To prevent path traversal, generate a random filename instead of using the one from the URL.
            safe_basename = uuid.uuid4().hex
            temp_dir = get_astrbot_temp_path()
            # Assume the downloaded voice file is in ogg format as per the PR description.
            temp_path = os.path.join(temp_dir, f\"{safe_basename}.ogg\")
            await download_image_by_url(file.file_path, path=temp_path)
            path_wav = os.path.join(
                temp_dir, f\"{safe_basename}.wav\"
            )
            path_wav = await convert_audio_to_wav(temp_path, path_wav)

temp_dir,
f"{file_basename}.wav",
)
path_wav = await convert_audio_to_wav(temp_path, path_wav)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

下载的临时OGG文件 (temp_path) 在转换为WAV格式后并未被删除。这可能导致临时文件在文件系统中累积,占用不必要的磁盘空间。建议在文件转换完成后清理这些临时文件。

            path_wav = await convert_audio_to_wav(temp_path, path_wav)
            # 清理临时OGG文件
            if os.path.exists(temp_path):
                os.remove(temp_path)

from astrbot.core.star.filter.command_group import CommandGroupFilter
from astrbot.core.star.star import star_map
from astrbot.core.star.star_handler import star_handlers_registry
from astrbot.core.utils.io import download_file, download_image_by_url, file_to_base64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

导入了 file_to_base64 函数,但根据当前PR的改动,该函数似乎并未在 tg_adapter.py 文件中使用。建议移除未使用的导入,以保持代码的整洁性。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:platform The bug / feature is about IM platform adapter, such as QQ, Lark, Telegram, WebChat and so on. size:S This PR changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant