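fix: Add a vLLM Embedding provider. Newer vLLM versions reject the dimensions parameter when serving embedding models that cannot be truncated, such as bge-m3, while the existing OpenAI Embedding provider always sends the vector dimension parameter, so requests to vLLM embedding endpoints fail. #8236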

Open
Creeper3222 wants to merge 6 commits into AstrBotDevs:master from Creeper3222:fix/vllm-embedding-compatibility

@Creeper3222 Creeper3222 commented May 19, 2026

This PR adds a dedicated built-in vllm_embedding provider to AstrBot and exposes it as a first-class Embedding provider in the WebUI.

Before this change, users who wanted to use vLLM's OpenAI-compatible Embedding endpoint had to either reuse openai_embedding or rely on extra runtime patch/plugin logic. In practice, that caused several problems:

  • there was no dedicated vLLM Embedding provider card in the Add Provider dialog;
  • the OpenAI-style dimensions habit is not always compatible with vLLM embedding endpoints;
  • model naming often needs to be aligned with vLLM served-model-name, which is not obvious from the existing provider options;
  • the overall configuration and troubleshooting flow was not intuitive for users who only wanted to use vLLM as an embedding backend.
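
The model-naming problem can be illustrated with a small matching routine. This is only a sketch, not the PR's implementation: `align_model_name` is a hypothetical helper, and it assumes the served ids have already been read from the `data[].id` entries of an OpenAI-compatible `/models` response.

```python
from typing import Iterable, Optional


def align_model_name(configured: str, served_ids: Iterable[str]) -> Optional[str]:
    """Pick the served model id that best matches the configured name.

    Hypothetical helper for illustration; returns None when nothing matches.
    """
    served = [s for s in served_ids if s]
    wanted = configured.strip()
    if not wanted or not served:
        return None

    # 1. Exact match wins.
    if wanted in served:
        return wanted

    # 2. Case-insensitive match.
    lowered = {s.lower(): s for s in served}
    if wanted.lower() in lowered:
        return lowered[wanted.lower()]

    # 3. Basename match, e.g. "bge-m3" against "BAAI/bge-m3".
    def basename(name: str) -> str:
        return name.rstrip("/").rsplit("/", 1)[-1].lower()

    for s in served:
        if basename(s) == basename(wanted):
            return s
    return None
```

Returning `None` rather than guessing leaves the caller free to fall back to the configured name and log a warning.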

The concrete failure behind this PR: when the existing OpenAI Embedding provider is pointed at a local vLLM deployment of an embedding model that cannot be truncated to fewer dimensions, such as bge-m3, the provider always sends the dimensions parameter and the request fails.

Screenshots: 2026-05-19 140538, 2026-05-19 141834, 2026-05-19 140500, 2026-05-18 230052.

Modifications

  • Added a new built-in provider source: astrbot/core/provider/sources/vllm_embedding_source.py.

  • Added vllm_embedding import wiring in astrbot/core/provider/manager.py, so AstrBot can load the provider type through the normal core provider path.

  • Added a vLLM Embedding provider template in astrbot/core/config/default.py, so the provider appears under Embedding in the Add Provider dialog.

  • Normalized the default values of the new provider template: keep id = vllm_embedding and timeout = 20, while leaving other visible fields blank and defaulting enable to false.

  • Implemented the provider behavior specifically for vLLM embedding compatibility instead of treating vLLM as a disguised OpenAI embedding provider:

    • skip the dimensions request parameter when sending embedding requests;
    • normalize embedding_api_base to the expected /v1 style endpoint;
    • try to align configured model names with vLLM served-model-name via /models;
    • support local/private endpoint direct transport and explicit proxy configuration;
    • infer/cache embedding dimensions for downstream vector DB usage.
  • Included the supporting config/UI changes already present on this branch so provider hints are surfaced more clearly in the configuration form:

    • astrbot/dashboard/routes/config.py
    • dashboard/src/components/shared/AstrBotConfig.vue
    • dashboard/src/i18n/locales/zh-CN/features/config-metadata.json
  • This change makes vLLM Embedding a first-class built-in provider instead of requiring users to overload openai_embedding for vLLM.

  • This is NOT a breaking change.
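
Two of the behaviors listed above, base URL normalization and omitting dimensions, can be sketched as follows. The function names are hypothetical and the rules (defaulting to `http://` when no scheme is given, trimming a pasted `/embeddings` suffix) are assumptions for illustration; the PR's actual normalization may differ.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_api_base(api_base: str) -> str:
    """Return the base URL ending in /v1 (hypothetical helper, not the PR's code)."""
    raw = api_base.strip()
    if not raw:
        return ""
    # Assumption: a bare host:port means plain HTTP, as is typical for local vLLM.
    if "://" not in raw:
        raw = "http://" + raw
    parts = urlsplit(raw)
    path = parts.path.rstrip("/")
    # Tolerate a full endpoint URL pasted into the base field.
    if path.endswith("/embeddings"):
        path = path[: -len("/embeddings")]
    if not path.endswith("/v1"):
        path += "/v1"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def build_embedding_request(model: str, texts: list[str]) -> dict:
    """Build an OpenAI-style embeddings body that deliberately omits `dimensions`."""
    return {"model": model, "input": texts}
```

For example, `normalize_api_base("localhost:8000")` yields `http://localhost:8000/v1`, and the request body never carries a `dimensions` key regardless of what the provider config contains.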

Screenshots or Test Results

Verification steps executed on a mirrored AstrBot copy before pushing this branch:

  1. Static validation
  • py_compile succeeded for:
    • astrbot/core/provider/sources/vllm_embedding_source.py
    • astrbot/core/provider/manager.py
    • astrbot/core/config/default.py
  2. Provider registration validation
  • Direct import of astrbot.core.provider.sources.vllm_embedding_source succeeded.
  • Registration result:
    • registered=True
    • class_name=VLLMEmbeddingProvider
  3. WebUI validation
  • vLLM Embedding appears in Embedding -> Add Provider.
  • The new provider form opens successfully.
  • The normalized defaults were verified manually:
    • ID = vllm_embedding
    • Enable = false
    • API Key = empty
    • API Base URL = empty
    • Embedding Model = empty
    • Embedding Dimensions = empty
    • Timeout = 20
    • Proxy = empty
  4. Functional validation
  • A vllm_embedding provider was created successfully in the mirrored AstrBot instance.
  • Manual testing confirmed the provider works after creation.
  5. Additional notes
  • No new third-party dependencies were introduced by this PR. The implementation uses existing project dependencies (openai, httpx) that are already present in both requirements.txt and pyproject.toml.
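
The static validation step can be reproduced with the standard-library `py_compile` module. The sketch below is one possible way to run it; `check_sources` is a hypothetical wrapper, and the paths passed to it are expected to exist.

```python
import py_compile


def check_sources(paths):
    """Byte-compile each file and map its path to None (ok) or an error message."""
    results = {}
    for path in paths:
        try:
            # doraise=True turns syntax errors into PyCompileError instead of
            # printing them to stderr.
            py_compile.compile(str(path), doraise=True)
            results[str(path)] = None
        except py_compile.PyCompileError as exc:
            results[str(path)] = exc.msg
    return results
```

Called with the three files listed above from the repository root, every value in the returned dict should be `None` if the sources compile. Note that this only checks syntax; it does not import the modules or exercise provider registration.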

Additional functional validation result: a knowledge-base recall test succeeded.

Screenshots: 2026-05-19 140538, 2026-05-19 140519, 2026-05-19 140936.

Checklist

  • 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.

  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.

  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.

  • 😮 My changes do not introduce malicious code.

Summary by Sourcery

Add a dedicated vLLM Embedding provider and related config/UI updates to improve compatibility with vLLM-based embedding backends and enhance provider configuration capabilities.

New Features:

  • Introduce a built-in vLLM Embedding provider that skips unsupported dimensions parameters, normalizes API base URLs, and aligns model names with vLLM served-model-name.
  • Expose new NVIDIA Embedding and Ollama Embedding providers via default configuration templates.
  • Add MiniMax Token Plan as a chat completion provider in the default provider templates.

Enhancements:

  • Refine the OpenAI Embedding provider to automatically avoid incompatible dimensions usage with vLLM endpoints and infer embedding dimensions from common model names.
  • Extend provider manager dynamic import logic to support the new embedding and chat providers and improve log messages for provider loading and default selection.
  • Expand configuration metadata with new options for Firecrawl web search, sandbox CUA runtime support, fallback context window sizing, buffering of intermediate agent messages, and disabling anonymous metrics.
  • Improve default dashboard security settings by clearing the default password and adding fields for PBKDF2 password storage and password upgrade requirements.
  • Update default enabling of multiple message platform connectors and adjust various UI hints, labels, and descriptions for clearer configuration guidance.
  • Change embedding dimension detection behavior so the WebUI surfaces dimensions as hints without auto-writing them into provider configs, including a vLLM-adaptive fallback on specific errors.

@auto-assign auto-assign Bot requested review from Fridemn and LIghtJUNction May 19, 2026 06:58
@dosubot dosubot Bot added size:XL This PR changes 500-999 lines, ignoring generated files. area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. area:webui The bug / feature is about webui(dashboard) of astrbot. labels May 19, 2026
@Creeper3222 Creeper3222 changed the title from "Add a vLLM Embedding provider. Newer vLLM versions reject the dimensions parameter when serving embedding models that cannot be truncated, such as bge-m3, while the existing OpenAI Embedding provider always sends the vector dimension parameter, so requests to vLLM embedding endpoints fail." to the same title prefixed with "fix:" May 19, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces significant updates to the AstrBot configuration and provider management, including the addition of new embedding providers (NVIDIA, Ollama, vLLM) and support for MiniMax Token Plan. It also refactors the embedding dimension handling to improve compatibility with vLLM, which does not support the dimensions parameter. My review identified several areas for improvement: the manual dimension input requirement in the dashboard should be refined to support automatic filling for standard providers, redundant logic for dimension inference and error handling in embedding sources should be refactored into shared utilities, and a type mismatch in the vLLM embedding configuration template needs to be corrected to ensure successful validation.

Comment on lines +114 to +117
//[已禁用] 不再自动写入配置文件,仅显示提示
// providerConfig.embedding_dimensions = response.data.data.embedding_dimensions
useToast().success("获取成功: " + response.data.data.embedding_dimensions)
useToast().info(`检测到维度: ${response.data.data.embedding_dimensions}。如需保存,请手动填入后点保存。`)

medium

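Disabling the automatic fill of embedding_dimensions here means that, for every Embedding provider, users must type the value in manually after clicking "Get Dimensions", which hurts usability. Consider adding a type check: if the returned value is a number, keep auto-filling it; if it is a string (such as "vLLM-Adaptive"), only show the hint.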

Comment on lines +208 to +216
model_dims = {
"bge-m3": 1024,
"bge-large-en-v1.5": 1024,
"bge-large-zh-v1.5": 1024,
"text-embedding-3-small": 1536,
"text-embedding-3-large": 3072,
"text-embedding-ada-002": 1536,
}
for model_key, dim in model_dims.items():

medium

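The model_dims dictionary defined here, together with its dimension inference logic, is an exact duplicate of _COMMON_MODEL_DIMENSIONS in vllm_embedding_source.py. Following the general rule below, consider extracting this logic into a shared utility module to avoid duplication and improve maintainability.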

References
  1. When implementing similar functionality for different cases (e.g., direct vs. quoted attachments), refactor the logic into a shared helper function to avoid code duplication.

Comment on lines +90 to +156
except Exception as e:
# 如果包含"matryoshka"或"dimensions"相关的错误,说明vLLM不支持该参数
# 尝试不带dimensions重试
error_msg = str(e).lower()
if ("matryoshka" in error_msg or "dimensions" in error_msg) and kwargs.get("dimensions"):
logger.warning(
f"[OpenAI Embedding] Detected vLLM dimensions error, retrying without dimensions parameter: {e}"
)
kwargs_retry = {k: v for k, v in kwargs.items() if k != "dimensions"}
try:
embedding = await self.client.embeddings.create(
input=text,
model=self.model,
**kwargs_retry,
)
logger.info(
"[OpenAI Embedding] Successfully retrieved embedding without dimensions parameter, marking as vLLM"
)
# 标记为vLLM以便后续调用也跳过dimensions
self._mark_as_vllm()
return embedding.data[0].embedding
except Exception as retry_error:
logger.error(
f"[OpenAI Embedding] Retry without dimensions also failed: {retry_error}"
)
raise retry_error
else:
raise

async def get_embeddings(self, text: list[str]) -> list[list[float]]:
"""批量获取文本的嵌入"""
kwargs = self._embedding_kwargs()
embeddings = await self.client.embeddings.create(
input=text,
model=self.model,
**kwargs,
)
return [item.embedding for item in embeddings.data]
try:
embeddings = await self.client.embeddings.create(
input=text,
model=self.model,
**kwargs,
)
return [item.embedding for item in embeddings.data]
except Exception as e:
# 如果包含"matryoshka"或"dimensions"相关的错误,说明vLLM不支持该参数
# 尝试不带dimensions重试
error_msg = str(e).lower()
if ("matryoshka" in error_msg or "dimensions" in error_msg) and kwargs.get("dimensions"):
logger.warning(
f"[OpenAI Embedding] Detected vLLM dimensions error in batch mode, retrying without dimensions: {e}"
)
kwargs_retry = {k: v for k, v in kwargs.items() if k != "dimensions"}
try:
embeddings = await self.client.embeddings.create(
input=text,
model=self.model,
**kwargs_retry,
)
logger.info(
"[OpenAI Embedding] Successfully retrieved batch embeddings without dimensions parameter"
)
# 标记为vLLM以便后续调用也跳过dimensions
self._mark_as_vllm()
return [item.embedding for item in embeddings.data]
except Exception as retry_error:
logger.error(
f"[OpenAI Embedding] Batch retry without dimensions also failed: {retry_error}"
)
raise retry_error
else:
raise

medium

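The logic in get_embedding and get_embeddings that catches vLLM dimension errors, logs them, and retries is highly duplicated. Consider extracting it into a shared private helper (for example _request_with_vllm_retry) to reduce redundant code.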

References
  1. When implementing similar functionality for different cases (e.g., direct vs. quoted attachments), refactor the logic into a shared helper function to avoid code duplication.

Comment thread astrbot/core/config/default.py Outdated
"embedding_api_key": "",
"embedding_api_base": "",
"embedding_model": "",
"embedding_dimensions": "",

medium

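In the vLLM Embedding template, embedding_dimensions defaults to the empty string "", but the configuration metadata defines this field's type as int. As a result, saving the configuration in the WebUI makes the backend validate_config check fail with a type mismatch. Consider setting the default to 0.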
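
To make the mismatch concrete, here is a minimal sketch of a type check over template defaults. It is a simplified stand-in, not AstrBot's validate_config: `check_defaults` and its flat `schema` mapping of field names to Python types are assumptions made for illustration.

```python
def check_defaults(template: dict, schema: dict) -> list[str]:
    """Return one message per field whose default does not match its declared type.

    Hypothetical, simplified validator; the real metadata format is richer.
    """
    errors = []
    for key, expected in schema.items():
        if key not in template:
            continue
        value = template[key]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        wrong_bool = expected is int and isinstance(value, bool)
        if wrong_bool or not isinstance(value, expected):
            errors.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors
```

Under these assumptions, a template with `"embedding_dimensions": ""` checked against `{"embedding_dimensions": int}` reports one error, while the suggested default of `0` reports none.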


@sourcery-ai sourcery-ai Bot left a comment

Hey - I've found 4 issues, and left some high level feedback:

  • This PR mixes the new vLLM Embedding provider with a large number of unrelated default-config changes (dashboard password handling, enabling many channels by default, new CUA/sandbox and websearch options, wording tweaks, etc.); consider splitting the provider work into a focused PR so behavior changes to existing deployments are easier to review and roll back.
  • You now both add a dedicated vllm_embedding provider and add vLLM auto-detection/dimensions workarounds into openai_embedding_source.py (including the magic API key value "vllm"); it would be clearer and less surprising to keep vLLM-specific behavior confined to the new provider or explicitly document why OpenAI-style providers should also try to mutate behavior based on naming heuristics.
  • The new vLLM Embedding template hint is a hard-coded Chinese string instead of an i18n key like the nearby providers; aligning it with the existing localization system (and adding the translation entry) will keep the UI consistent across languages.
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="astrbot/core/provider/sources/openai_embedding_source.py" line_range="83-63" />
<code_context>
-            **kwargs,
-        )
-        return [item.embedding for item in embeddings.data]
+        try:
+            embeddings = await self.client.embeddings.create(
+                input=text,
+                model=self.model,
+                **kwargs,
+            )
+            return [item.embedding for item in embeddings.data]
+        except Exception as e:
+            # 如果包含"matryoshka"或"dimensions"相关的错误,说明vLLM不支持该参数
</code_context>
<issue_to_address>
**suggestion:** The vLLM `dimensions` retry logic is duplicated between single and batch embedding methods.

The vLLM-specific error handling (`matryoshka`/`dimensions` detection, stripping `dimensions` from kwargs, retry, and marking as vLLM) is duplicated in both `get_embedding` and `get_embeddings`. Please extract this into a shared helper (e.g., `_create_embeddings_with_optional_dimensions_retry(input, kwargs)`) that both methods call so vLLM compatibility logic is maintained in one place.

Suggested implementation:

```python
    async def _create_embeddings_with_optional_dimensions_retry(
        self,
        input: Any,
        **kwargs: Any,
    ):
        """
        调用 embeddings 接口,并在检测到 vLLM 对 dimensions/matryoshka 不兼容时,
        去掉 dimensions 参数重试一次。
        """
        try:
            return await self.client.embeddings.create(
                input=input,
                model=self.model,
                **kwargs,
            )
        except Exception as e:
            # 如果包含"matryoshka"或"dimensions"相关的错误,说明vLLM不支持该参数
            # 尝试不带dimensions重试
            error_msg = str(e).lower()
            if ("matryoshka" in error_msg or "dimensions" in error_msg) and kwargs.get("dimensions"):
                logger.warning(
                    f"[OpenAI Embedding] Detected vLLM dimensions error, retrying without dimensions parameter: {e}"
                )
                kwargs_retry = {k: v for k, v in kwargs.items() if k != "dimensions"}
                # 如果需要标记为 vLLM 后端,可以在这里设置状态位,例如:self._is_vllm = True
                return await self.client.embeddings.create(
                    input=input,
                    model=self.model,
                    **kwargs_retry,
                )
            # 不是 vLLM 相关错误则继续抛出
            raise

    async def get_embedding(self, text: str) -> list[float]:
        """获取文本的嵌入"""
        kwargs = self._embedding_kwargs()
        embeddings = await self._create_embeddings_with_optional_dimensions_retry(
            input=text,
            **kwargs,
        )
        return embeddings.data[0].embedding

```

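1. Make sure `Any` is imported at the top of the file if it is not already, for example:
   - `from typing import Any`, or `import typing as t` with the signature changed to `input: t.Any, **kwargs: t.Any`, to match the project's existing typing import style.
2. In the batch embedding method (usually named `async def get_embeddings(...)` or similar), delete the current `try/except` vLLM handling and instead:
   - build `kwargs = self._embedding_kwargs()`
   - call `embeddings = await self._create_embeddings_with_optional_dimensions_retry(input=texts, **kwargs)`
   - return `[item.embedding for item in embeddings.data]`
3. If the class already has logic that marks the backend as vLLM (for example `self._is_vllm` or `self._mark_vllm_backend()`), move that marking into the vLLM branch of `_create_embeddings_with_optional_dimensions_retry` so it is not set separately in the single-item and batch methods.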
</issue_to_address>

### Comment 2
<location path="astrbot/dashboard/routes/config.py" line_range="914" />
<code_context>
+            if "matryoshka" in err_msg or "dimensions" in err_msg:
+                logger.info("Detected vLLM specific error, bypassing...")
+                # 伪造一个成功的响应,告知前端进入"兼容模式"
+                return Response().ok({"embedding_dimensions": "vLLM-Adaptive"}).__dict__
             return Response().error(f"获取嵌入维度失败: {e!s}").__dict__

</code_context>
<issue_to_address>
**issue (bug_risk):** Embedding dimension API now returns a string sentinel, which may break consumers expecting an integer.

The new vLLM path returns `{"embedding_dimensions": "vLLM-Adaptive"}`, changing the type from number to string. While the Vue config UI handles this, other consumers of `/api/config/provider/get_embedding_dim` may still expect an integer (e.g. schema validation or `int(...)` casts) and break.

To avoid that, consider keeping `embedding_dimensions` numeric and conveying the adaptive mode via a separate field, for example:
- `{ embedding_dimensions: 0, vllm_adaptive: true }`, or
- `{ embedding_dimensions: 0, mode: "vllm_adaptive" }`.

This preserves the existing type contract while still signaling vLLM adaptive behavior.
</issue_to_address>
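
The separate-field suggestion in Comment 2 could look roughly like this on the Python side. Both function names are hypothetical, and `vllm_adaptive` is just the field name proposed in the comment above, not an existing AstrBot field.

```python
from typing import Any, Optional


def dimension_response(detected: Optional[int], vllm_adaptive: bool = False) -> dict[str, Any]:
    """Build a payload whose embedding_dimensions is always an int (0 = unknown)."""
    return {
        "embedding_dimensions": int(detected or 0),
        "vllm_adaptive": vllm_adaptive,
    }


def should_autofill(payload: dict[str, Any]) -> bool:
    """Decide whether a client may write the detected value into the config."""
    dim = payload.get("embedding_dimensions")
    is_real_int = isinstance(dim, int) and not isinstance(dim, bool)
    return is_real_int and dim > 0 and not payload.get("vllm_adaptive", False)
```

With this shape, consumers that cast `embedding_dimensions` to an integer keep working, and the adaptive case is signalled by the extra flag rather than by a string sentinel.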

### Comment 3
<location path="astrbot/core/provider/sources/openai_embedding_source.py" line_range="80" />
<code_context>
+        self._is_vllm_detected = True
+        logger.info("[OpenAI Embedding] Marked as vLLM (runtime detection via error)")

     async def get_embedding(self, text: str) -> list[float]:
         """获取文本的嵌入"""
         kwargs = self._embedding_kwargs()
</code_context>
<issue_to_address>
**issue (complexity):** Consider refactoring the embedding provider to centralize vLLM handling, retry logic, and dimension inference into shared helpers to avoid duplication and noisy logging.

You can keep all the new behavior but significantly reduce complexity with a small refactor inside this provider.

### 1. Deduplicate vLLM retry / error handling

`get_embedding` and `get_embeddings` are almost identical apart from the input/output shape. You can move the try/except + retry + `_mark_as_vllm` logic into a single helper:

```python
from typing import Any, Sequence

    async def _create_embeddings(self, input: Any) -> Sequence[Sequence[float]]:
        kwargs = self._embedding_kwargs()
        try:
            resp = await self.client.embeddings.create(
                input=input,
                model=self.model,
                **kwargs,
            )
            return [item.embedding for item in resp.data]
        except Exception as e:
            error_msg = str(e).lower()
            if ("matryoshka" in error_msg or "dimensions" in error_msg) and kwargs.get("dimensions"):
                logger.warning(
                    f"[OpenAI Embedding] Detected vLLM dimensions error, retrying without dimensions: {e}"
                )
                kwargs_retry = {k: v for k, v in kwargs.items() if k != "dimensions"}
                try:
                    resp = await self.client.embeddings.create(
                        input=input,
                        model=self.model,
                        **kwargs_retry,
                    )
                    self._mark_as_vllm()
                    logger.info(
                        "[OpenAI Embedding] Successfully retrieved embeddings without dimensions parameter, marking as vLLM"
                    )
                    return [item.embedding for item in resp.data]
                except Exception as retry_error:
                    logger.error(
                        f"[OpenAI Embedding] Retry without dimensions also failed: {retry_error}"
                    )
                    raise retry_error
            raise

    async def get_embedding(self, text: str) -> list[float]:
        return (await self._create_embeddings(text))[0]

    async def get_embeddings(self, text: list[str]) -> list[list[float]]:
        return list(await self._create_embeddings(text))
```

This keeps all existing behavior (including runtime vLLM detection) but centralizes it.

### 2. Centralize vLLM detection within `_embedding_kwargs`

Right now vLLM heuristics live in `_is_vllm`, `_mark_as_vllm`, `_embedding_kwargs`, and the error handler. You can at least confine detection/decision to `_is_vllm` and `_mark_as_vllm`, and keep `_embedding_kwargs` “dumb”:

```python
    def _embedding_kwargs(self) -> dict:
        kwargs: dict[str, Any] = {}
        embedding_dim_config = self.provider_config.get("embedding_dimensions", "")
        provider_id = self.provider_config.get("id", "unknown")

        if self._is_vllm():
            # vLLM never gets dimensions here
            logger.debug(
                f"[OpenAI Embedding] {provider_id}: vLLM detected, skipping dimensions (config='{embedding_dim_config}')"
            )
            return kwargs

        if embedding_dim_config:
            try:
                dim_value = int(embedding_dim_config)
                kwargs["dimensions"] = dim_value
                logger.debug(
                    f"[OpenAI Embedding] {provider_id}: Added dimensions parameter: {dim_value}"
                )
            except (ValueError, TypeError):
                logger.warning(
                    f"[OpenAI Embedding] {provider_id}: embedding_dimensions is not a valid integer: "
                    f"'{embedding_dim_config}', ignored."
                )
        return kwargs
```

All call sites (`get_embedding`, `get_embeddings`, `_create_embeddings`) simply call `_embedding_kwargs` without doing any extra vLLM-specific branching.

### 3. Reduce noisy logging in hot paths and share the model→dimension map

`_embedding_kwargs` and `get_dim` are on the critical path. Most of those `info` logs can be `debug`, and the model dimension map can be shared to keep behavior consistent with other providers:

```python
# module-level shared map
_MODEL_DIMS = {
    "bge-m3": 1024,
    "bge-large-en-v1.5": 1024,
    "bge-large-zh-v1.5": 1024,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}

    def get_dim(self) -> int:
        provider_id = self.provider_config.get("id", "unknown")
        embedding_dim_config = self.provider_config.get("embedding_dimensions", "")

        if embedding_dim_config:
            try:
                dim = int(embedding_dim_config)
                if dim > 0:
                    logger.debug(
                        f"[OpenAI Embedding] {provider_id}: Dimension from config: {dim}"
                    )
                    return dim
            except (ValueError, TypeError):
                logger.warning(
                    f"[OpenAI Embedding] {provider_id}: embedding_dimensions is not a valid integer: "
                    f"'{embedding_dim_config}', trying model inference"
                )

        model = self.provider_config.get("embedding_model", "").lower()
        for model_key, dim in _MODEL_DIMS.items():
            if model_key in model:
                logger.debug(
                    f"[OpenAI Embedding] {provider_id}: Inferred dimension {dim} from model: {model}"
                )
                return dim

        logger.warning(
            f"[OpenAI Embedding] {provider_id}: Could not determine dimension "
            f"(model: {model}, config: '{embedding_dim_config}')"
        )
        return 0
```

This keeps all current functionality (vLLM detection, auto-dim inference, retry behavior) but shrinks the mental surface area of the class and makes future changes (e.g., adjusting retry heuristics or model dims) safer and localized.
</issue_to_address>

### Comment 4
<location path="astrbot/core/provider/sources/vllm_embedding_source.py" line_range="33" />
<code_context>
+    provider_type=ProviderType.EMBEDDING,
+    provider_display_name="vLLM Embedding",
+)
+class VLLMEmbeddingProvider(EmbeddingProvider):
+    def __init__(self, provider_config: dict, provider_settings: dict) -> None:
+        super().__init__(provider_config, provider_settings)
</code_context>
<issue_to_address>
**issue (complexity):** Consider refactoring transport setup, model resolution, dimension inference, and logging into smaller shared utilities and linear flows to keep the provider’s behavior while making it easier to follow and maintain.

You can keep all current behavior while reducing complexity by factoring a few concerns out and simplifying some flows.

### 1. Simplify transport selection & runtime swap

You already compute `_force_direct_transport` in `__init__`, but `_ensure_runtime_ready` recomputes `_should_force_direct_transport()` and duplicates client‑construction logic.

You can make transport selection a one‑liner and centralize client creation, which removes cross‑method branching while preserving heuristics:

```python
def __init__(self, provider_config: dict, provider_settings: dict) -> None:
    super().__init__(provider_config, provider_settings)
    self.provider_config = provider_config
    self.provider_settings = provider_settings
    self.timeout = int(provider_config.get("timeout", 20) or 20)
    self.model = str(provider_config.get("embedding_model", "") or "").strip()
    self.set_model(self.model)

    self._force_direct_transport = self._should_force_direct_transport()
    self._direct_client_ready = self._force_direct_transport

    self._detected_dimension: int | None = None
    self._resolved_request_model: str | None = None

    self.client = self._build_openai_client(force_direct=self._force_direct_transport)

def _build_openai_client(self, force_direct: bool) -> AsyncOpenAI:
    return AsyncOpenAI(
        api_key=self.provider_config.get("embedding_api_key"),
        base_url=self._effective_api_base(),
        timeout=self.timeout,
        http_client=self._build_http_client(force_direct=force_direct),
    )

def _build_http_client(self, force_direct: bool) -> httpx.AsyncClient | None:
    proxy = str(self.provider_config.get("proxy", "") or "").strip()
    if proxy:
        logger.info("[vLLM Embedding] %s 使用显式代理: %s", self._provider_id(), proxy)
        return httpx.AsyncClient(proxy=proxy, timeout=self.timeout)
    if force_direct:
        return httpx.AsyncClient(timeout=self.timeout, trust_env=False)
    return None

async def _ensure_runtime_ready(self) -> None:
    if self._direct_client_ready or not self._force_direct_transport:
        return

    old_client = self.client
    self.client = self._build_openai_client(force_direct=True)
    self._direct_client_ready = True

    logger.info(
        "[vLLM Embedding] %s 检测到本地/内网端点,已切换为 trust_env=False 的直连 client。",
        self._provider_id(),
    )

    if old_client is not None and old_client is not self.client:
        try:
            await old_client.close()
        except Exception:
            logger.debug("[vLLM Embedding] %s 关闭旧 client 失败,已忽略。", self._provider_id())
```

This removes the second `_should_force_direct_transport()` call and keeps all behavior the same.

### 2. Share embedding dimension inference

`_COMMON_MODEL_DIMENSIONS` and `_infer_dimension_from_model` are likely identical to the OpenAI provider. You can move them to a shared utility to avoid duplication and keep the logic in one place:

```python
# embedding_dimensions.py (new module)
from typing import Any

_COMMON_MODEL_DIMENSIONS = {
    "bge-m3": 1024,
    "bge-large-en-v1.5": 1024,
    "bge-large-zh-v1.5": 1024,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}

def infer_dimension_from_model(model_name: Any) -> int | None:
    normalized_model = str(model_name or "").strip().lower()
    for model_key, dimension in _COMMON_MODEL_DIMENSIONS.items():
        if model_key in normalized_model:
            return dimension
    return None
```

Then in this provider:

```python
from .embedding_dimensions import infer_dimension_from_model

def get_dim(self) -> int:
    configured_dim = self._configured_dimension()
    if configured_dim:
        return configured_dim
    if self._detected_dimension:
        return self._detected_dimension
    inferred_dim = infer_dimension_from_model(self.model)
    return inferred_dim or 0
```

This preserves behavior while reducing duplication and maintenance cost.
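
To make the fallback order explicit (configured, then detected, then inferred, then `0`), here is a minimal standalone sketch. `resolve_dimension` is a hypothetical free-function version of `get_dim`, and the lookup table is trimmed to two entries purely for illustration:

```python
from typing import Any, Optional

# Trimmed copy of the shared table, for illustration only.
_COMMON_MODEL_DIMENSIONS = {"bge-m3": 1024, "text-embedding-3-small": 1536}


def infer_dimension_from_model(model_name: Any) -> Optional[int]:
    normalized = str(model_name or "").strip().lower()
    for key, dimension in _COMMON_MODEL_DIMENSIONS.items():
        # Substring match, so namespaced names such as "BAAI/bge-m3" also resolve.
        if key in normalized:
            return dimension
    return None


def resolve_dimension(configured: Optional[int], detected: Optional[int], model: Any) -> int:
    """Hypothetical equivalent of get_dim: configured > detected > inferred > 0."""
    return configured or detected or infer_dimension_from_model(model) or 0
```

Because the lookup uses substring matching, a future table with overlapping keys would make iteration order significant; the current keys do not overlap.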

### 3. Reduce perceived complexity in logging

The current `info` logs are quite verbose and fire on every request. You can keep the diagnostics but move most of them to `debug` so normal logs are cleaner:

```python
async def get_embedding(self, text: str) -> list[float]:
    await self._ensure_runtime_ready()
    request_model = await self._resolve_request_model()
    logger.debug(
        "[vLLM Embedding] %s single embedding request, model=%s, text_len=%s, skipping dimensions.",
        self._provider_id(),
        request_model,
        len(text),
    )
    ...
```

Keep `info`/`warning` only for misconfigurations or fallbacks (`/models` failure, basename fallback, client switch), which makes the class easier to reason about in normal operation.
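
The effect of that split can be checked with the standard `logging` module. In this sketch the logger name and the `ListHandler` class are made up for the demonstration; with the logger at `INFO`, the per-request `debug` line is dropped while the fallback `warning` still gets through:

```python
import logging


class ListHandler(logging.Handler):
    """Collects (level, message) pairs so the output can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append((record.levelname, record.getMessage()))


logger = logging.getLogger("vllm_embedding_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

# Per-request diagnostics: filtered out at INFO level.
logger.debug("embedding request, model=%s, text_len=%s", "bge-m3", 42)
# Fallbacks and misconfigurations: still visible.
logger.warning("no exact /models match for %s, falling back to %s", "BAAI/bge-m3", "bge-m3")
```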

### 4. Make model resolution flow more linear

You can keep the `/models` robustness but tighten the control flow in `_resolve_request_model` to a straight, small decision tree:

```python
async def _resolve_request_model(self) -> str:
    if self._resolved_request_model is not None:
        return self._resolved_request_model

    configured_model = self.model
    if not configured_model:
        self._resolved_request_model = ""
        return ""

    available_models = await self._list_vllm_models()
    resolved = self._match_served_model(configured_model, available_models)
    if resolved:
        self._resolved_request_model = resolved
        if resolved != configured_model:
            logger.info(
                "[vLLM Embedding] %s aligned model name %s to served-model-name %s",
                self._provider_id(),
                configured_model,
                resolved,
            )
        return resolved

    basename_model = configured_model.rsplit("/", 1)[-1].strip()
    if basename_model and basename_model != configured_model:
        logger.warning(
            "[vLLM Embedding] %s could not find an exact match for %s in /models; falling back to %s",
            self._provider_id(),
            configured_model,
            basename_model,
        )
        self._resolved_request_model = basename_model
        return basename_model

    self._resolved_request_model = configured_model
    return configured_model
```

This doesn’t change the behavior, but it reads as a single linear decision with clear ordering and fewer early returns, which makes the model‑name reconciliation easier to follow.
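
The snippet above relies on a `_match_served_model` helper that is not shown. One possible shape for it, written as a free function, is sketched below; the matching order (exact id, then case-insensitive, then basename) is an assumption and may differ from what the PR actually does:

```python
from typing import Optional, Sequence


def match_served_model(configured: str, available: Sequence[str]) -> Optional[str]:
    """Hypothetical matcher: exact id first, then case-insensitive, then basename."""
    if configured in available:
        return configured

    # Case-insensitive match against the ids reported by /models.
    lowered = {name.lower(): name for name in available}
    if configured.lower() in lowered:
        return lowered[configured.lower()]

    # Compare the last path segment, e.g. "BAAI/bge-m3" vs. "bge-m3".
    basename = configured.rsplit("/", 1)[-1].strip().lower()
    for name in available:
        if name.rsplit("/", 1)[-1].strip().lower() == basename:
            return name
    return None
```

Returning `None` when nothing matches lets the caller decide on the basename fallback and log it as a warning, which keeps the matcher free of side effects.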
</issue_to_address>

Comment thread astrbot/core/provider/sources/openai_embedding_source.py
if "matryoshka" in err_msg or "dimensions" in err_msg:
    logger.info("Detected vLLM specific error, bypassing...")
    # Fake a successful response to tell the frontend to enter "compatibility mode"
    return Response().ok({"embedding_dimensions": "vLLM-Adaptive"}).__dict__

issue (bug_risk): Embedding dimension API now returns a string sentinel, which may break consumers expecting an integer.

The new vLLM path returns `{"embedding_dimensions": "vLLM-Adaptive"}`, changing the type from number to string. While the Vue config UI handles this, other consumers of `/api/config/provider/get_embedding_dim` may still expect an integer (e.g. schema validation or `int(...)` casts) and break.

To avoid that, consider keeping `embedding_dimensions` numeric and conveying the adaptive mode via a separate field, for example:

  • `{ embedding_dimensions: 0, vllm_adaptive: true }`, or
  • `{ embedding_dimensions: 0, mode: "vllm_adaptive" }`.

This preserves the existing type contract while still signaling vLLM adaptive behavior.
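
As a minimal sketch of the first option, the payload could be built as below. `build_dim_payload` is a hypothetical helper, and the real handler's `Response().ok(...)` wrapping is left out:

```python
from typing import Any, Dict, Optional


def build_dim_payload(dimension: Optional[int], vllm_adaptive: bool = False) -> Dict[str, Any]:
    """Hypothetical payload builder: the dimension stays an int, the mode is a separate flag."""
    return {
        "embedding_dimensions": int(dimension or 0),
        "vllm_adaptive": vllm_adaptive,
    }


# A consumer that casts the field keeps working in adaptive mode.
payload = build_dim_payload(None, vllm_adaptive=True)
dimension = int(payload["embedding_dimensions"])
```

With the string sentinel, the same `int(...)` cast would raise `ValueError`.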

Labels

area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. area:webui The bug / feature is about webui(dashboard) of astrbot. size:XL This PR changes 500-999 lines, ignoring generated files.