LCORE-86: Prioritize BYOK content over built-in content#1208
are-ces wants to merge 7 commits into lightspeed-core:main
Walkthrough

This PR replaces the prior Solr-centric RAG approach with a two-mode RAG architecture (Inline RAG and Tool RAG), adds a RagConfiguration surface (inline + tool), implements RAGContext construction (build_rag_context), integrates BYOK and Solr/OKP retrieval with score multipliers, updates endpoints to consume inline RAG context, and updates docs, configs, constants, and tests accordingly.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client Request
    participant QueryEp as Query Endpoint
    participant RagBuilder as build_rag_context()
    participant ByokStore as BYOK Vector Store
    participant SolrOkp as Solr/OKP
    participant LLM as LLM
    participant Tools as file_search Tool

    Client->>QueryEp: send query_request

    rect rgba(100,150,200,0.5)
        Note over QueryEp,RagBuilder: Inline RAG (pre-query)
        QueryEp->>RagBuilder: build_rag_context(query_request)
        RagBuilder->>ByokStore: parallel BYOK searches (score_multiplier applied)
        RagBuilder->>SolrOkp: fetch Solr/OKP chunks
        ByokStore-->>RagBuilder: rag_chunks + referenced_documents
        SolrOkp-->>RagBuilder: rag_chunks + referenced_documents
        RagBuilder-->>QueryEp: RAGContext(context_text, rag_chunks, documents)
    end

    QueryEp->>LLM: forward query + inline_rag_context.context_text

    rect rgba(200,100,150,0.5)
        Note over LLM,Tools: Tool RAG (on-demand)
        alt LLM needs more info
            LLM->>Tools: invoke file_search
            Tools->>ByokStore: retrieve docs
            ByokStore-->>Tools: tool results
            Tools-->>LLM: return results
        end
    end

    LLM-->>QueryEp: response (may include used rag_chunks)
    QueryEp->>Client: merge inline + tool RAG chunks and return response
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 3 passed
Chunks from Solr did not include [...]. Printout of the Solr chunk structure returned:

```python
Chunk(
    chunk_id = '/documentation/en-us/openshift_container_platform/4.20/html-single/architecture/index_chunk_89',
    content = '# Chapter 9. Admission plugins\n\n\n\n\nAdmission plugins are used to help regulate how [..]',
    chunk_metadata = ChunkChunkMetadata(
        chunk_embedding_dimension = None,
        chunk_embedding_model = None,
        chunk_id = '/documentation/en-us/openshift_container_platform/4.20/html-single/architecture/index_chunk_89',
        chunk_tokenizer = None,
        chunk_window = None,
        content_token_count = None,
        created_timestamp = None,
        document_id = '/documentation/en-us/openshift_container_platform/4.20/html-single/architecture/index',
        metadata_token_count = None,
        source = None,
        updated_timestamp = None,
    ),
    embedding = [],
    metadata = {},
    embedding_model = 'sentence-transformers/ibm-granite/granite-embedding-30m-english',
    embedding_dimension = 384,
)
```
Force-pushed from 2e57151 to f593d68
- Add configurable RAG strategies: "always" RAG, performed at each query (OKP Solr + BYOK), and tool RAG; they can be used independently or together
- Add chunk prioritization with score multipliers per vector store for always RAG
- Add knobs in config to select the RAG strategy
- Tool RAG defaults to enabled=True for backward compatibility
- Update the lightspeed stack configuration enrichment script to build the solr section in llama stack and fix bugs in building the vector stores
- Update byok and rag documentation
- Update unit tests
Actionable comments posted: 5
🧹 Nitpick comments (3)
tests/unit/utils/test_responses.py (1)
2248-2275: Add an explicit regression test for `vector_store_ids=[]`.

Current cases cover `None` and non-empty lists, but not an explicit empty list. A dedicated test will prevent accidental "fetch all stores" behavior regressions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `tests/unit/utils/test_responses.py` around lines 2248-2275, add a regression test that verifies passing an explicit empty list to get_vector_store_ids returns an empty list and does not call the API: create an AsyncMock client (client_mock), call result = await get_vector_store_ids(client_mock, []), assert result == [], and assert client_mock.vector_stores.list.assert_not_called(); name the test clearly (e.g., test_returns_empty_for_explicit_empty_list_no_fetch) and place it alongside the existing tests in the same test module.
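A minimal sketch of such a regression test, using a hypothetical stand-in for `get_vector_store_ids` (the real implementation lives in src/utils/responses.py; only the empty-list behavior is illustrated):

```python
import asyncio
from unittest.mock import AsyncMock


async def get_vector_store_ids(client, vector_store_ids=None):
    # Hypothetical stand-in mirroring the suggested fix: an explicit list
    # (even an empty one) is returned as-is; only None triggers a fetch.
    if vector_store_ids is not None:
        return vector_store_ids
    stores = await client.vector_stores.list()
    return [store.id for store in stores]


def test_returns_empty_for_explicit_empty_list_no_fetch() -> None:
    client_mock = AsyncMock()
    result = asyncio.run(get_vector_store_ids(client_mock, []))
    assert result == []
    client_mock.vector_stores.list.assert_not_called()


test_returns_empty_for_explicit_empty_list_no_fetch()
```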
tests/unit/test_llama_stack_configuration.py (1)

362-451: Good test coverage for the `enrich_solr` function.

The tests comprehensively cover the main scenarios: disabled state, empty config, adding providers/stores/models, and preserving existing configuration. The duplicate detection tests for providers and vector stores are present.

Consider adding a test for duplicate embedding model detection to match the logic in `enrich_solr` (lines 470-473 in src/llama_stack_configuration.py), which checks `existing_model_ids` before adding.

Optional test for duplicate embedding model
```python
def test_enrich_solr_skips_duplicate_embedding_model() -> None:
    """Test enrich_solr does not add duplicate embedding model."""
    ls_config: dict[str, Any] = {
        "registered_resources": {
            "models": [
                {"model_id": "existing", "provider_model_id": "all-mpnet-base-v2"}
            ]
        }
    }
    enrich_solr(ls_config, {"enabled": True})
    model_ids = [
        m["provider_model_id"] for m in ls_config["registered_resources"]["models"]
    ]
    assert model_ids.count("all-mpnet-base-v2") == 1
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `tests/unit/test_llama_stack_configuration.py` around lines 362-451, add a unit test to cover duplicate embedding model detection in enrich_solr: create ls_config with "registered_resources": {"models": [{"model_id": "existing", "provider_model_id": "all-mpnet-base-v2"}]}, call enrich_solr(ls_config, {"enabled": True}), then assert that the list of provider_model_id values in ls_config["registered_resources"]["models"] contains "all-mpnet-base-v2" only once (count == 1); reference enrich_solr and the existing_model_ids/provider_model_id logic to locate where duplicates are checked.
src/models/config.py (1)

1730-1740: Minor: lambda wrappers in `default_factory` are unnecessary.

The `# pylint: disable=unnecessary-lambda` comments indicate these lambdas could be simplified. Since these are class constructors with no arguments, you can pass the class directly.

Optional simplification
```diff
- solr: SolrRagConfiguration = Field(
-     default_factory=lambda: SolrRagConfiguration(),  # pylint: disable=unnecessary-lambda
+ solr: SolrRagConfiguration = Field(
+     default_factory=SolrRagConfiguration,
      title="Solr RAG configuration",
      description="Configuration for Solr RAG (pre-query).",
  )
- byok: ByokRagConfiguration = Field(
-     default_factory=lambda: ByokRagConfiguration(),  # pylint: disable=unnecessary-lambda
+ byok: ByokRagConfiguration = Field(
+     default_factory=ByokRagConfiguration,
      title="BYOK RAG configuration",
      description="Configuration for BYOK RAG (pre-query).",
  )
```

The same applies to lines 1764-1773. However, line 1750 requires the lambda since it passes `enabled=True`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/models/config.py` around lines 1730 - 1740, Replace the unnecessary lambda wrappers used as default_factory for SolrRagConfiguration and ByokRagConfiguration by passing the classes themselves (use default_factory=SolrRagConfiguration and default_factory=ByokRagConfiguration) in the Field definitions for the solr and byok attributes; also apply the same simplification for any other fields that use no-arg constructors (e.g., the similar fields around the RAG config group), but keep the lambda where a constructor requires arguments (such as the field that calls a class with enabled=True).
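The equivalence the reviewers point at can be illustrated with stdlib dataclasses (class names borrowed from the diff above; pydantic's `Field(default_factory=...)` behaves the same way for no-argument constructors):

```python
from dataclasses import dataclass, field


@dataclass
class SolrRagConfiguration:
    enabled: bool = False


@dataclass
class InlineRagConfiguration:
    # Passing the class itself is equivalent to
    # default_factory=lambda: SolrRagConfiguration() for no-arg constructors,
    # and each instance still gets its own fresh default object.
    solr: SolrRagConfiguration = field(default_factory=SolrRagConfiguration)


a, b = InlineRagConfiguration(), InlineRagConfiguration()
assert a.solr.enabled is False
assert a.solr is not b.solr  # distinct default instances, no shared state
```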
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/rag_guide.md`:
- Around line 335-339: The fenced code block in docs/rag_guide.md is missing a
language specifier causing markdownlint MD040; update the block start to include
a language (e.g., bash) so the snippet begins with ```bash and leave the
contents unchanged—this addresses the lint warning for the curl POST example
used to call /v1/query.
In `@lightspeed-stack.yaml`:
- Around line 46-49: The YAML comment and value disagree for the BYOK flag: the
comment says "Default is true for backward compatibility" while the config sets
tool.byok.enabled: false; either change the comment to indicate the feature is
intentionally disabled (e.g., update the line near the tool/byok block to state
the default is false or that this base config disables BYOK) or change the value
to true to match the comment (set tool.byok.enabled: true); update only the
comment or only the boolean so they are consistent and leave the surrounding
YAML structure unchanged.
In `@src/app/endpoints/query.py`:
- Around line 163-167: The code mutates query_request.query by appending
rag_context.context_text; instead, avoid changing the model: after making the
deep copy (or even without copying), construct a separate retrieval string like
retrieval_query = query_request.query + rag_context.context_text and use
retrieval_query for retrieval/search/embedding calls only (where you currently
rely on the mutated value), leaving query_request.query intact for later
operations such as topic summary generation and storing the original user query;
update calls that currently read query_request.query for retrieval to use
retrieval_query instead and ensure persisted/stored values continue to use the
original query_request.query.
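The pattern that prompt describes, sketched with a simplified stand-in for the request model (class and helper names here are assumptions, not the real code):

```python
from dataclasses import dataclass


@dataclass
class QueryRequest:
    # simplified stand-in for the real request model
    query: str


def build_retrieval_query(query_request: QueryRequest, context_text: str) -> str:
    # Derive a separate string for retrieval instead of mutating the model,
    # so the original user query survives for topic summaries and storage.
    return query_request.query + context_text


req = QueryRequest(query="How do admission plugins work?")
retrieval_query = build_retrieval_query(req, "\n\nContext: ...")
assert req.query == "How do admission plugins work?"  # original left intact
assert retrieval_query.startswith(req.query)
```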
In `@src/utils/responses.py`:
- Around line 205-207: prepare_tools is calling vector_stores.list() via
get_vector_store_ids even when the feature toggle rag.tool.byok.enabled is
false, causing unnecessary failures; change the logic so that when
rag.tool.byok.enabled is False you skip any vector store listing and simply
return the passed-in vector_store_ids (or an empty list) instead of calling
vector_stores.list(); update either prepare_tools to short-circuit before
calling get_vector_store_ids or modify get_vector_store_ids to check the
rag.tool.byok.enabled flag and avoid calling vector_stores.list() when the flag
is disabled (referencing prepare_tools, get_vector_store_ids,
vector_stores.list, and rag.tool.byok.enabled to locate the code).
- Around line 125-126: The current truthiness check "if vector_store_ids:"
treats an explicit empty list as False and causes the code to fall through and
fetch all vector stores; change the condition to test for presence explicitly
(e.g., check for None) so that an empty list is preserved and returned. Locate
the check around the vector_store_ids variable in src/utils/responses.py and
replace the truthy conditional with an explicit None check (use
"vector_store_ids is not None" or equivalent) so the function returns the
provided vector_store_ids even when it's an empty list.
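The truthiness pitfall in a nutshell (hypothetical `resolve_*` helpers; the real check lives in src/utils/responses.py):

```python
def resolve_buggy(vector_store_ids):
    if vector_store_ids:  # [] is falsy, so an explicit empty list falls through
        return vector_store_ids
    return ["vs-all-1", "vs-all-2"]  # pretend "fetch all stores"


def resolve_fixed(vector_store_ids):
    if vector_store_ids is not None:  # presence check preserves []
        return vector_store_ids
    return ["vs-all-1", "vs-all-2"]


assert resolve_buggy([]) == ["vs-all-1", "vs-all-2"]  # unintended fetch-all
assert resolve_fixed([]) == []  # empty list respected
assert resolve_fixed(None) == ["vs-all-1", "vs-all-2"]  # None still fetches
```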
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (22)
- docs/byok_guide.md
- docs/config.md
- docs/rag_guide.md
- examples/lightspeed-stack-byok-rag.yaml
- lightspeed-stack.yaml
- run.yaml
- src/app/endpoints/query.py
- src/app/endpoints/streaming_query.py
- src/configuration.py
- src/constants.py
- src/llama_stack_configuration.py
- src/models/config.py
- src/utils/responses.py
- src/utils/types.py
- src/utils/vector_search.py
- tests/unit/app/endpoints/test_streaming_query.py
- tests/unit/models/config/test_byok_rag.py
- tests/unit/models/config/test_dump_configuration.py
- tests/unit/models/responses/test_rag_chunk.py
- tests/unit/test_configuration.py
- tests/unit/test_llama_stack_configuration.py
- tests/unit/utils/test_responses.py
tisnik
left a comment
Overall LGTM, nice work!
asimurka
left a comment
Great work, LGTM overall. I suggested some minor improvements and there are also things worth discussing.
src/models/config.py (outdated)

```python
always: AlwaysRagConfiguration = Field(
    default_factory=lambda: AlwaysRagConfiguration(),  # pylint: disable=unnecessary-lambda
```
I think this could be done like default_factory=AlwaysRagConfiguration without using lambda
+1 goes for the other instances too.
src/utils/responses.py (outdated)

```python
# Check if Tool RAG is enabled in configuration
if not (configuration and configuration.rag.tool.byok.enabled):
    return None
```
Worth a discussion: Returning an empty list here would make more sense. I see it does not affect current logic but possibly can in future (responses endpoint). By returning None you signal "overwrite by config" and by an empty list "do not use vector stores".
Agreed, updated PTAL
@are-ces @tisnik Ideally we should try this feature from the Ansible side before it is merged, just to detect early any incompatibilities or problems, so new changes can still be added as part of the PR. On the other hand, it may take a while, so if you have urgency on merging this, we are also fine with merging, but if any changes are necessary after testing, then you'll have to create new PRs and potentially new minor releases. I am busy on a top-priority customer issue, and @TamiTakamiya is also quite busy I think, but I can guarantee that at some point during next week I can give this new feature a try for our Ansible use-case. Let's keep in touch!
I am trying to set up a test environment on my CRC with this, #1179, and lightspeed-providers#52. Right now I am still unable to make authentication work (it should work with #1179, though)... Hopefully I should be able to find a solution soon.
src/app/endpoints/query.py (outdated)

```python
turn_summary.referenced_documents = deduplicate_referenced_documents(
    doc_ids_from_chunks + (turn_summary.referenced_documents or [])
)
# Merge RAG chunks (BYOK + Solr) with tool-based RAG chunks
```
Merge "always" RAG chunks with "tool" RAG chunks?
Yes, these are the RAGChunk objects returned in the query response and saved in the transcript, not chunks which are sent to an LLM
Will make the comment more clear about that
src/app/endpoints/streaming_query.py (outdated)

```python
_, _, doc_ids_from_chunks, pre_rag_chunks = await perform_vector_search(
    client, query_request, configuration
)
# Build RAG context from BYOK and Solr sources
```
Build RAG context from the configured "always" sources?
src/app/endpoints/streaming_query.py (outdated)

```python
Args:
    responses_params: The Responses API parameters
    context: The response generator context
    pre_rag_documents: Referenced documents from pre-query RAG (BYOK + Solr)
```
always_rag_documents: Referenced documents from "always" RAG?
Fixed ty, changed to inline_rag_documents
Anxhela21
left a comment
Left a few minor comments. Overall looks good, nice work!
/LGTM
src/models/config.py (outdated)

```python
always: AlwaysRagConfiguration = Field(
    default_factory=lambda: AlwaysRagConfiguration(),  # pylint: disable=unnecessary-lambda
```
+1 goes for the other instances too.
```python
    title="BYOK RAG configuration",
    description="BYOK RAG configuration. This configuration can be used to "
    "reconfigure Llama Stack through its run.yaml configuration file",
)
```
Since we are using the rag RagConfiguration object to define the configurations for all RAGs do we still need the byok_rag configuration above?
You have a valid point. My initial thought was to put all under "rag" and have a section for "dbs" or "indexes", however this is a breaking change for teams which use the current config with "byok_rag". That's why I kept the old "byok_rag" section for the configuration of vector stores.
@tisnik wdyt?
docs (outdated)

```
The configuration above automatically enables the RAG tools. The system will:
Both modes can be enabled simultaneously. Choose based on your latency and control preferences:
```
Would be nice to have some documentation on what the intended behavior is when both modes are enabled (always RAG and tool RAG) but that can be in a separate PR.
I feel this would be a spike, we would need to test the joint behavior e.g. in what cases the tool is being called.
Should I create a ticket for that @Anxhela21?
Yes, let's open a separate ticket for this. I wonder if we want some evaluation outcome included in this as well.
@are-ces I left a few comments here in case the approach in this PR lives on, but I took this PR, with help of my mechanical assistant, in a different direction, presented at https://github.com/syedriko/lightspeed-stack/tree/syedriko-lcore-86.
Hey @syedriko, thanks for the review. You raise valid points. However, when it comes to including OKP Solr vector stores in the RAG tool, IIRC @Anxhela21 tested this and it was not working properly, right? Anyway, I agree, OKP should be included among the vector stores available to the RAG tool. Finally, having two lists of vector stores defined for
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/utils/vector_search.py (1)
191-205: ⚠️ Potential issue | 🟠 Major — Preserve `reference_url` when `chunk_metadata` omits it.

At line 198 and line 204, `reference_url` is reassigned unconditionally. If `metadata.reference_url` was present and `chunk_metadata.reference_url` is missing, the URL is lost, which drops downstream document links.

Proposed fix
```diff
 if isinstance(chunk_meta, dict):
     doc_id = chunk_meta.get("doc_id") or chunk_meta.get("document_id")
     title = title or chunk_meta.get("title")
-    reference_url = chunk_meta.get("reference_url")
+    reference_url = reference_url or chunk_meta.get("reference_url")
 else:
     doc_id = getattr(chunk_meta, "doc_id", None) or getattr(
         chunk_meta, "document_id", None
     )
     title = title or getattr(chunk_meta, "title", None)
-    reference_url = getattr(chunk_meta, "reference_url", None)
+    reference_url = reference_url or getattr(
+        chunk_meta, "reference_url", None
+    )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` around lines 191 - 205, The code in src/utils/vector_search.py clobbers an existing reference_url when chunk.chunk_metadata lacks it; update the assignments inside the chunk_meta handling so reference_url is only overwritten when the chunk metadata actually provides a non-null value (e.g., use reference_url = chunk_meta.get("reference_url") or reference_url for the dict branch and reference_url = getattr(chunk_meta, "reference_url", None) or reference_url for the object branch), leaving doc_id and title handling as-is; locate the logic around chunk_meta, chunk_metadata, doc_id, title, and reference_url to apply this conditional-preserve change.
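The preserve-if-present rule reduces to a small helper (hypothetical name; the real branches live inline in the vector search code):

```python
def merged_reference_url(current_url, chunk_meta):
    # Only overwrite when the chunk metadata actually carries a value;
    # otherwise keep whatever we already had.
    if isinstance(chunk_meta, dict):
        return chunk_meta.get("reference_url") or current_url
    return getattr(chunk_meta, "reference_url", None) or current_url


# Existing URL survives metadata that omits reference_url.
assert merged_reference_url("https://docs.example/a", {}) == "https://docs.example/a"
# Metadata that provides a URL fills in a missing one.
assert merged_reference_url(None, {"reference_url": "https://docs.example/b"}) == "https://docs.example/b"
```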
♻️ Duplicate comments (1)
src/utils/responses.py (1)
169-170: ⚠️ Potential issue | 🟠 Major — Skip vector-store discovery when Tool RAG is disabled.

`prepare_tools()` resolves vector stores before checking whether Tool RAG is enabled. If `vector_stores.list()` fails, the request can fail even when `configuration.rag.tool.byok.enabled` is `false`.

Proposed fix
```diff
-    # Get vector stores for RAG tools - use specified ones or fetch all
-    vector_store_ids = await get_vector_store_ids(client, vector_store_ids)
-
-    # Add RAG tools if vector stores are available
-    rag_tools = get_rag_tools(vector_store_ids)
+    resolved_vector_store_ids = vector_store_ids or []
+    if configuration.rag.tool.byok.enabled:
+        # Get vector stores for RAG tools - use specified ones or fetch all
+        resolved_vector_store_ids = await get_vector_store_ids(client, vector_store_ids)
+
+    # Add RAG tools if vector stores are available
+    rag_tools = get_rag_tools(resolved_vector_store_ids)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/responses.py` around lines 169 - 170, The code calls get_vector_store_ids (which triggers vector_stores.list()) before checking whether Tool RAG is enabled; change prepare_tools() to skip vector-store discovery when configuration.rag.tool.byok.enabled is false by moving or guarding the get_vector_store_ids call: first check configuration.rag.tool.byok.enabled in prepare_tools(), and only invoke get_vector_store_ids(client, vector_store_ids) if the flag is true (or when vector_store_ids is explicitly provided), so failures from vector_stores.list() cannot occur when Tool RAG is disabled.
🧹 Nitpick comments (4)
src/app/endpoints/streaming_query.py (1)
747-756: Reuse the shared referenced-document dedup helper.

This block duplicates dedup logic that already exists in src/utils/responses.py (`deduplicate_referenced_documents`). Reusing it avoids drift and keeps URL normalization behavior consistent.

Proposed fix
```diff
 from utils.responses import (
     build_mcp_tool_call_from_arguments_done,
     build_tool_call_summary,
     build_tool_result_from_mcp_output_item_done,
+    deduplicate_referenced_documents,
     extract_token_usage,
     extract_vector_store_ids_from_tools,
     get_topic_summary,
     parse_referenced_documents,
     prepare_responses_params,
 )
@@
-    if turn_summary.inline_rag_documents:
-        all_documents = turn_summary.inline_rag_documents + tool_based_documents
-        seen = set()
-        deduplicated_documents = []
-        for doc in all_documents:
-            key = (doc.doc_url, doc.doc_title)
-            if key not in seen:
-                seen.add(key)
-                deduplicated_documents.append(doc)
-        turn_summary.referenced_documents = deduplicated_documents
+    if turn_summary.inline_rag_documents:
+        turn_summary.referenced_documents = deduplicate_referenced_documents(
+            turn_summary.inline_rag_documents + tool_based_documents
+        )
     else:
         turn_summary.referenced_documents = tool_based_documents
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `src/app/endpoints/streaming_query.py` around lines 747-756, replace the in-place dedup logic with the shared helper: import and call deduplicate_referenced_documents (from src.utils.responses) on the combined list (turn_summary.inline_rag_documents + tool_based_documents) and assign its result to turn_summary.referenced_documents; keep using the same symbols (turn_summary.inline_rag_documents, tool_based_documents, turn_summary.referenced_documents) so URL/title normalization and dedup behavior remain consistent across the codebase.
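The shared helper's behavior, sketched over plain dicts (the real `deduplicate_referenced_documents` operates on document objects with `doc_url`/`doc_title` attributes):

```python
def deduplicate_referenced_documents(docs):
    # Keep the first occurrence per (doc_url, doc_title) key, preserving order.
    seen = set()
    deduplicated = []
    for doc in docs:
        key = (doc["doc_url"], doc["doc_title"])
        if key not in seen:
            seen.add(key)
            deduplicated.append(doc)
    return deduplicated


docs = [
    {"doc_url": "https://docs.example/a", "doc_title": "A"},
    {"doc_url": "https://docs.example/a", "doc_title": "A"},  # duplicate dropped
    {"doc_url": "https://docs.example/b", "doc_title": "B"},
]
assert len(deduplicate_referenced_documents(docs)) == 2
```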
src/models/config.py (1)

1758-1774: Add compatibility mapping for renamed RAG keys.

With `extra="forbid"`, legacy configs using older key names will fail hard on upgrade. Consider a small pre-validation migration (`always` → `inline`, legacy `solr` block → `rag.inline.okp`) with deprecation warnings.

Proposed fix
```diff
 class RagConfiguration(ConfigurationBase):
@@
+    @model_validator(mode="before")
+    @classmethod
+    def migrate_legacy_inline_key(cls, data: Any) -> Any:
+        if isinstance(data, dict) and "always" in data and "inline" not in data:
+            migrated = dict(data)
+            migrated["inline"] = migrated.pop("always")
+            return migrated
+        return data
@@
 class Configuration(ConfigurationBase):
@@
+    @model_validator(mode="before")
+    @classmethod
+    def migrate_legacy_solr_key(cls, data: Any) -> Any:
+        if not isinstance(data, dict) or "solr" not in data:
+            return data
+
+        migrated = dict(data)
+        rag = dict(migrated.get("rag") or {})
+        inline = dict(rag.get("inline") or {})
+        inline.setdefault("okp", migrated.pop("solr"))
+        rag["inline"] = inline
+        migrated["rag"] = rag
+        return migrated
```

Also applies to: 1915-1919
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `src/models/config.py` around lines 1758-1774, RagConfiguration currently uses strict validation (extra="forbid") so legacy keys will error; add a pre-validation migration step (e.g., a root validator with pre=True on RagConfiguration or on the top-level ConfigurationBase) that detects and rewrites legacy keys: map "always" -> {"inline": ...} and move any legacy "solr" block into "rag" -> "inline" -> "okp" (or "rag.inline.okp"), emit a DeprecationWarning or logger.warn when a legacy key is found, then return the normalized dict so the existing fields inline: InlineRagConfiguration and tool: ToolRagConfiguration validate cleanly. Ensure the migration runs before pydantic's forbid check so old configs continue to work with warnings.
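The pre-validation migration boils down to a dict rewrite that runs before strict validation; a stand-alone sketch (key names taken from the review comment, function name hypothetical):

```python
from typing import Any


def migrate_legacy_rag_keys(data: Any) -> Any:
    # Rewrite legacy "always" to "inline" before strict (extra="forbid")
    # validation sees the dict; non-dict input passes through untouched.
    if isinstance(data, dict) and "always" in data and "inline" not in data:
        migrated = dict(data)
        migrated["inline"] = migrated.pop("always")
        return migrated
    return data


assert migrate_legacy_rag_keys({"always": {"enabled": True}}) == {"inline": {"enabled": True}}
assert migrate_legacy_rag_keys({"inline": {"enabled": False}}) == {"inline": {"enabled": False}}
```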
docs/openapi.json (1)

7378-7385: Constrain propagated header names and deduplicate entries.

Consider validating header names and enforcing uniqueness to prevent invalid config values from silently propagating.
🔧 Proposed hardening
```diff
 "headers": {
     "items": {
-        "type": "string"
+        "type": "string",
+        "pattern": "^[!#$%&'*+.^_`|~0-9A-Za-z-]+$"
     },
     "type": "array",
+    "uniqueItems": true,
     "title": "Propagated headers",
     "description": "List of HTTP header names to automatically forward from the incoming request to this MCP server. Headers listed here are extracted from the original client request and included when calling the MCP server. This is useful when infrastructure components (e.g. API gateways) inject headers that MCP servers need, such as x-rh-identity in HCC. Header matching is case-insensitive. These headers are additive with authorization_headers and MCP-HEADERS."
 },
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `docs/openapi.json` around lines 7378-7385, the "headers" array schema should enforce valid header-name syntax and uniqueness: update the "headers" property (the array and its "items") to add "uniqueItems": true and a regex "pattern" on each item to match valid HTTP field-names (token chars per RFC) so invalid names are rejected and duplicates are prevented; also adjust the "description" text if needed to note case-insensitivity and that values are deduplicated/normalized.
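The proposed pattern corresponds to the token characters allowed in HTTP field names; a quick check of what it accepts and rejects:

```python
import re

# Token characters allowed in HTTP field names (the pattern proposed above).
HEADER_NAME = re.compile(r"^[!#$%&'*+.^_`|~0-9A-Za-z-]+$")

assert HEADER_NAME.match("x-rh-identity")
assert HEADER_NAME.match("Authorization")
assert not HEADER_NAME.match("bad header")  # space is not a token character
assert not HEADER_NAME.match("naïve")       # non-ASCII is rejected
```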
src/utils/vector_search.py (1)

359-364: Use shared default score multiplier constant.

Line 363 uses a hardcoded `1.0`. Prefer the shared constant to keep behavior centralized and avoid drift.

Proposed fix
```diff
 _query_store_for_byok_rag(
     client,
     vector_store_id,
     query,
-    score_multiplier_mapping.get(vector_store_id, 1.0),
+    score_multiplier_mapping.get(
+        vector_store_id, constants.DEFAULT_SCORE_MULTIPLIER
+    ),
 )
```
constants.pyfor shared constants before defining new ones.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` around lines 359 - 364, Replace the hardcoded 1.0 score multiplier in the call to _query_store_for_byok_rag with the shared default constant from constants.py (e.g., constants.DEFAULT_SCORE_MULTIPLIER or the appropriate name found in constants.py); update the top of src/utils/vector_search.py to import that constant if not already imported, and change score_multiplier_mapping.get(vector_store_id, 1.0) to score_multiplier_mapping.get(vector_store_id, constants.DEFAULT_SCORE_MULTIPLIER) so the default behavior is centralized.
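The behavior being centralized is just `dict.get` with a shared fallback; a tiny sketch (the constant name comes from the proposed fix and is assumed to live in constants.py):

```python
DEFAULT_SCORE_MULTIPLIER = 1.0  # assumed shared constant in constants.py

score_multiplier_mapping = {"vs-byok-docs": 2.0}

# Stores with an explicit multiplier are boosted; unknown stores fall
# back to the shared default instead of a scattered hardcoded 1.0.
assert score_multiplier_mapping.get("vs-byok-docs", DEFAULT_SCORE_MULTIPLIER) == 2.0
assert score_multiplier_mapping.get("vs-unknown", DEFAULT_SCORE_MULTIPLIER) == 1.0
```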
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/byok_guide.md`:
- Around line 41-48: The docs use the term "Solr" in user-facing text and the
mermaid diagram label "Solr OKP", causing inconsistency with the public
abstraction `okp`; update all occurrences of "Solr" to "OKP" (e.g., change the
mermaid node label "Solr OKP" to "OKP" and any inline text/table entries that
say "Solr"), and ensure the docs reference the configuration/endpoint name `okp`
where shown (also apply same replacement for the occurrences noted around lines
314-315).
In `@docs/config.md`:
- Around line 528-532: Update the "Inline RAG" bullet to use OKP terminology
instead of "Solr": replace the phrase that currently reads "context is fetched
from Solr and/or BYOK vector stores" with wording like "context is fetched from
OKP (Solr-backed) and/or BYOK vector stores" so the docs align with the config
surface `okp`; ensure the term "OKP (Solr-backed)" appears exactly in the Inline
RAG description and leave the "Tool RAG" line unchanged.
In `@docs/openapi.json`:
- Around line 7993-8004: The OpenAPI schema exposes the internal field
"inline_rag_context" in the public request model for the /v1/query and
/v1/streaming_query endpoints; remove it from the public request model (or move
it to a separate internal-only schema) so clients cannot set it. Locate
occurrences of "inline_rag_context" in docs/openapi.json (specifically under the
request body schemas for /v1/query and /v1/streaming_query) and either delete
that property from the public request schema or mark it as internal-only (e.g.,
move to an internal schema reference or set readOnly/internal extension) and
update any $ref references accordingly.
- Around line 5507-5513: The schema for the "score_multiplier" property
currently uses "exclusiveMinimum": 0.0 which disallows 0; if the field is
intended to be non-negative change the schema to use "minimum": 0.0 instead of
"exclusiveMinimum": 0.0 so that zero is a valid value; update the
"score_multiplier" definition in the OpenAPI JSON (the property named
score_multiplier) to use minimum: 0.0 and keep the existing type, title,
description, and default.
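For reference, a non-negative `score_multiplier` property with `minimum` instead of `exclusiveMinimum` would look like the fragment below (title and default reproduced from the review; treat the exact description wording as illustrative):

```json
"score_multiplier": {
  "type": "number",
  "title": "Score Multiplier",
  "default": 1.0,
  "minimum": 0.0
}
```

With `minimum: 0.0`, a value of exactly `0` validates; `exclusiveMinimum: 0.0` rejects it.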
In `@src/app/endpoints/streaming_query.py`:
- Around line 186-187: Wrap the call to build_rag_context(client, query_request,
configuration) in a try/except that catches APIConnectionError and Exception and
re-raises fastapi.HTTPException with appropriate status codes/messages (e.g.,
502 for APIConnectionError, 500 for generic errors) so the endpoint returns
translated HTTP errors; ensure the caught exception is logged before raising the
HTTPException. Also remove the manual deduplication loop currently implemented
and call deduplicate_referenced_documents(referenced_documents) from
utils.responses instead to centralize logic and avoid duplication; update any
variable names to use the helper's return value where referenced in the
streaming_query handler.
In `@src/llama_stack_configuration.py`:
- Around line 142-147: Don't default missing BYOK identifiers to empty strings:
in the logic that builds backend_name and output (variables rag_id,
backend_name, and the output dict) validate that brag["rag_id"] (and
brag["db_path"] if required) are present and non-empty and raise/throw a clear
exception (or return an error) instead of using get(..., ""); ensure
backend_name uses a validated rag_id so you never produce names like
"byok__storage" or paths like ".llama/.db" and update any surrounding code that
expects a valid rag_id to handle the raised error.
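A minimal sketch of that validation; the `byok_<rag_id>_storage` name format is inferred from the review's `byok__storage` example and may not match the real code exactly:

```python
def byok_backend_name(brag: dict) -> tuple[str, str]:
    """Build a BYOK backend name and DB path, refusing empty identifiers."""
    rag_id = brag.get("rag_id")
    if not rag_id:
        raise ValueError("BYOK entry is missing a non-empty 'rag_id'")
    db_path = brag.get("db_path")
    if not db_path:
        raise ValueError(f"BYOK entry '{rag_id}' is missing a non-empty 'db_path'")
    return f"byok_{rag_id}_storage", db_path
```

Failing fast here surfaces a misconfigured BYOK entry at startup instead of producing a silently broken backend name.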
In `@src/models/requests.py`:
- Around line 110-115: The inline_rag_context Field is marked internal but
remains client-writable; add a Pydantic validation that rejects any incoming
payload containing this key. In the request model class that defines
inline_rag_context, implement a `@root_validator`(pre=True) (e.g., def
validate_no_inline_rag_context(cls, values): ...) that checks if
"inline_rag_context" is present in the incoming values and raises a ValueError
(or ValidationError) if so; keep the Field(...) for internal use but ensure
external inputs are refused during model validation.
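The check itself is small; in pydantic it would run inside a pre-mode validator on the request model, but the core logic can be sketched standalone (the `INTERNAL_FIELDS` set name is an assumption for illustration):

```python
INTERNAL_FIELDS = {"inline_rag_context"}


def reject_internal_fields(values: dict) -> dict:
    """Body of a pre-validation hook: refuse client-supplied internal keys."""
    present = INTERNAL_FIELDS & values.keys()
    if present:
        raise ValueError(
            f"fields {sorted(present)} are internal and may not be set by clients"
        )
    return values
```

Endpoints that populate the field internally would set it on the already-validated model, after this hook has run against the raw payload.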
In `@src/utils/vector_search.py`:
- Around line 592-599: The conversion currently assumes chunk.chunk_metadata
uses attribute access (chunk.chunk_metadata.document_id) and misses document_id
when chunk_metadata is a dict; update the block handling chunk.chunk_metadata to
accept both attribute and dict forms (e.g., try getattr(chunk.chunk_metadata,
"document_id", None) then fallback to chunk.chunk_metadata.get("document_id") if
it's a dict) and then set attributes["document_id"] accordingly; also ensure the
offline URL fallback sets attributes["doc_url"] using
urljoin(constants.MIMIR_DOC_URL, doc_id) when "doc_url" not present and doc_id
exists.
- Around line 511-514: The current logger.debug calls print the full RAG payload
via context_text (sensitive user query and chunk bodies); replace these debug
prints with non-sensitive metadata: log the number of retrieved chunks, total
size/char count of context_text, and optionally a short redacted snippet or a
hash (e.g., first 50 chars masked or sha256 of context_text) instead of the
entire content; update the logger.debug calls that reference context_text to
emit counts/lengths and a redacted/hash summary so no full payload is written to
logs.
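One way to build such a log-safe summary (function name is illustrative):

```python
import hashlib


def summarize_rag_context(context_text: str, chunk_count: int) -> str:
    """Build a log-safe summary of the RAG payload instead of dumping it verbatim."""
    digest = hashlib.sha256(context_text.encode("utf-8")).hexdigest()[:12]
    return f"rag_context chunks={chunk_count} chars={len(context_text)} sha256={digest}"
```

A call like `logger.debug("%s", summarize_rag_context(context_text, len(rag_chunks)))` then replaces the raw payload log, and the hash prefix still lets two log lines be compared for identical payloads.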
---
Outside diff comments:
In `@src/utils/vector_search.py`:
- Around line 191-205: The code in src/utils/vector_search.py clobbers an
existing reference_url when chunk.chunk_metadata lacks it; update the
assignments inside the chunk_meta handling so reference_url is only overwritten
when the chunk metadata actually provides a non-null value (e.g., use
reference_url = chunk_meta.get("reference_url") or reference_url for the dict
branch and reference_url = getattr(chunk_meta, "reference_url", None) or
reference_url for the object branch), leaving doc_id and title handling as-is;
locate the logic around chunk_meta, chunk_metadata, doc_id, title, and
reference_url to apply this conditional-preserve change.
---
Duplicate comments:
In `@src/utils/responses.py`:
- Around line 169-170: The code calls get_vector_store_ids (which triggers
vector_stores.list()) before checking whether Tool RAG is enabled; change
prepare_tools() to skip vector-store discovery when
configuration.rag.tool.byok.enabled is false by moving or guarding the
get_vector_store_ids call: first check configuration.rag.tool.byok.enabled in
prepare_tools(), and only invoke get_vector_store_ids(client, vector_store_ids)
if the flag is true (or when vector_store_ids is explicitly provided), so
failures from vector_stores.list() cannot occur when Tool RAG is disabled.
---
Nitpick comments:
In `@docs/openapi.json`:
- Around line 7378-7385: The "headers" array schema should enforce valid
header-name syntax and uniqueness: update the "headers" property (the array and
its "items") to add "uniqueItems": true and a regex "pattern" on each item to
match valid HTTP field-names (e.g. token chars per RFC, e.g.
/^[A-Za-z0-9!#$%&'*+\-.^_`|~]+$/) so invalid names are rejected and duplicates
are prevented; also adjust the "description" text if needed to note
case-insensitivity and that values are deduplicated/normalized.
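Server-side, the same constraint can be enforced with the RFC token character class; this is a standalone sketch, not the schema change itself:

```python
import re

# token characters permitted in HTTP field names (RFC 9110, section 5.1)
HEADER_NAME_RE = re.compile(r"^[A-Za-z0-9!#$%&'*+\-.^_`|~]+$")


def normalize_header_names(names: list[str]) -> list[str]:
    """Validate header names and drop case-insensitive duplicates, keeping order."""
    seen: set[str] = set()
    result: list[str] = []
    for name in names:
        if not HEADER_NAME_RE.match(name):
            raise ValueError(f"invalid HTTP header name: {name!r}")
        if name.lower() not in seen:
            seen.add(name.lower())
            result.append(name)
    return result
```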
In `@src/app/endpoints/streaming_query.py`:
- Around line 747-756: Replace the in-place dedup logic with the shared helper:
import and call deduplicate_referenced_documents (from src.utils.responses) on
the combined list (turn_summary.inline_rag_documents + tool_based_documents) and
assign its result to turn_summary.referenced_documents; keep using the same
symbols (turn_summary.inline_rag_documents, tool_based_documents,
turn_summary.referenced_documents) so URL/title normalization and dedup behavior
remain consistent across the codebase.
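A sketch of such a helper over plain dicts; the real `ReferencedDocument` is an attribute-style model, and the `doc_url`/`doc_title` key names are taken from the review:

```python
def deduplicate_referenced_documents(documents: list[dict]) -> list[dict]:
    """Order-preserving dedup keyed on (doc_url, doc_title)."""
    seen = set()
    unique = []
    for doc in documents:
        key = (doc.get("doc_url"), doc.get("doc_title"))
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique
```

Centralizing this in `utils/responses.py` means any later URL normalization applies to both the query and streaming endpoints at once.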
In `@src/models/config.py`:
- Around line 1758-1774: RagConfiguration currently uses strict validation
(extra="forbid") so legacy keys will error; add a pre-validation migration step
(e.g., a `@root_validator`(pre=True) on RagConfiguration or on the top-level
ConfigurationBase) that detects and rewrites legacy keys: map "always" ->
{"inline": ...} and move any legacy "solr" block into "rag" -> "inline" -> "okp"
(or "rag.inline.okp"), emit a DeprecationWarning or logger.warn when a legacy
key is found, then return the normalized dict so the existing fields inline:
InlineRagConfiguration and tool: ToolRagConfiguration validate cleanly. Ensure
the migration runs before pydantic's forbid check so old configs continue to
work with warnings.
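The migration step can be sketched as a plain dict rewrite run before validation; the legacy key names `always` and `solr` come from the review, and the exact nesting is an assumption:

```python
import warnings


def migrate_legacy_rag_keys(raw: dict) -> dict:
    """Pre-validation migration: rewrite legacy 'always'/'solr' keys into the new shape."""
    data = dict(raw)
    if "always" in data:
        warnings.warn("'always' is deprecated; use 'rag.inline'", DeprecationWarning)
        data.setdefault("inline", data.pop("always"))
    if "solr" in data:
        warnings.warn("'solr' is deprecated; use 'rag.inline.okp'", DeprecationWarning)
        inline = data.setdefault("inline", {})
        inline.setdefault("okp", data.pop("solr"))
    return data
```

Because the rewrite returns a normalized dict, pydantic's `extra="forbid"` check only ever sees the new key names.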
In `@src/utils/vector_search.py`:
- Around line 359-364: Replace the hardcoded 1.0 score multiplier in the call to
_query_store_for_byok_rag with the shared default constant from constants.py
(e.g., constants.DEFAULT_SCORE_MULTIPLIER or the appropriate name found in
constants.py); update the top of src/utils/vector_search.py to import that
constant if not already imported, and change
score_multiplier_mapping.get(vector_store_id, 1.0) to
score_multiplier_mapping.get(vector_store_id,
constants.DEFAULT_SCORE_MULTIPLIER) so the default behavior is centralized.
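The centralized-default pattern in miniature (constant and helper names assumed from the review):

```python
# assumed to live in constants.py
DEFAULT_SCORE_MULTIPLIER = 1.0


def resolve_score_multiplier(mapping: dict[str, float], vector_store_id: str) -> float:
    """Share one default across every call site instead of a hardcoded 1.0."""
    return mapping.get(vector_store_id, DEFAULT_SCORE_MULTIPLIER)
```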
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (18)
- docs/byok_guide.md
- docs/config.md
- docs/openapi.json
- docs/rag_guide.md
- examples/lightspeed-stack-byok-rag.yaml
- lightspeed-stack.yaml
- src/app/endpoints/query.py
- src/app/endpoints/streaming_query.py
- src/constants.py
- src/llama_stack_configuration.py
- src/models/config.py
- src/models/requests.py
- src/utils/query.py
- src/utils/responses.py
- src/utils/types.py
- src/utils/vector_search.py
- tests/unit/app/endpoints/test_query.py
- tests/unit/models/config/test_dump_configuration.py
🚧 Files skipped from review as they are similar to previous changes (3)
- src/utils/types.py
- examples/lightspeed-stack-byok-rag.yaml
- docs/rag_guide.md
docs/openapi.json
Outdated
| "inline_rag_context": { | ||
| "anyOf": [ | ||
| { | ||
| "type": "string" | ||
| }, | ||
| { | ||
| "type": "null" | ||
| } | ||
| ], | ||
| "title": "Inline Rag Context", | ||
| "description": "RAG context injected into the query before sending to the LLM. Populated internally; not part of the external API." | ||
| }, |
inline_rag_context is marked internal, but it is currently exposed as client input in the public request schema.
This makes an internal field externally settable via /v1/query and /v1/streaming_query. Keep it out of the public OpenAPI request model (or use a separate internal model).
🔧 Proposed fix (schema-level)
- "inline_rag_context": {
- "anyOf": [
- {
- "type": "string"
- },
- {
- "type": "null"
- }
- ],
- "title": "Inline Rag Context",
- "description": "RAG context injected into the query before sending to the LLM. Populated internally; not part of the external API."
- },
- inline_rag_context: RAG context injected into the query before sending to the LLM.
-     Populated internally; not part of the external API. Defaults to None.

Also applies to: 8216-8216
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/openapi.json` around lines 7993 - 8004, The OpenAPI schema exposes the
internal field "inline_rag_context" in the public request model for the
/v1/query and /v1/streaming_query endpoints; remove it from the public request
model (or move it to a separate internal-only schema) so clients cannot set it.
Locate occurrences of "inline_rag_context" in docs/openapi.json (specifically
under the request body schemas for /v1/query and /v1/streaming_query) and either
delete that property from the public request schema or mark it as internal-only
(e.g., move to an internal schema reference or set readOnly/internal extension)
and update any $ref references accordingly.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Grep occurrences of 'inline_rag_context' across repo =="
rg -n -S --no-heading -C2 'inline_rag_context' || true
echo
echo "== Grep occurrences of title string 'Inline Rag Context' =="
rg -n -S --no-heading -C1 '"Inline Rag Context"' || true
echo
echo "== Show JSON node for components.schemas.QueryRequest.properties.inline_rag_context (if present) =="
if command -v jq >/dev/null 2>&1; then
if [ -f docs/openapi.json ]; then
jq -r 'try .components.schemas.QueryRequest.properties.inline_rag_context // empty' docs/openapi.json || true
else
echo "docs/openapi.json not found"
fi
else
echo "jq not available"
fi
echo
echo "== Confirm Python model marks the field as internal (exclude=True) =="
rg -n -S --no-heading -C2 'inline_rag_context' src/models/requests.py || true
Length of output: 4673
@coderabbit-user Thanks — confirmed via repo scan that inline_rag_context is:
- Exposed in docs/openapi.json under components.schemas.QueryRequest.properties at Lines 7993–8003
- Mentioned in the QueryRequest description at Line 8216
- Set only internally by endpoints and consumed in utils/query.py
Here’s a minimal patch to keep the field out of the public OpenAPI while leaving internal behavior unchanged.
Diff 1/2: remove property from public schema
diff --git a/docs/openapi.json b/docs/openapi.json
@@ -7990,18 +7990,6 @@
]
},
- "inline_rag_context": {
- "anyOf": [
- {
- "type": "string"
- },
- {
- "type": "null"
- }
- ],
- "title": "Inline Rag Context",
- "description": "RAG context injected into the query before sending to the LLM. Populated internally; not part of the external API."
- },
"conversation_id": {
"anyOf": [
          {

Diff 2/2: remove mention from the description block
diff --git a/docs/openapi.json b/docs/openapi.json
@@ -8214,7 +8214,6 @@
- "description": "Model representing a request for the LLM (Language Model).\n\nAttributes:\n query: The query string.\n inline_rag_context: RAG context injected into the query before sending to the LLM.\n Populated internally; not part of the external API. Defaults to None.\n conversation_id: The optional conversation ID (UUID).\n provider: The optional provider.\n model: The optional model.\n system_prompt: The optional system prompt.\n attachments: The optional attachments.\n no_tools: Whether to bypass all tools and MCP servers (default: False).\n generate_topic_summary: Whether to generate topic summary for new conversations.\n media_type: The optional media type for response format (application/json or text/plain).\n vector_store_ids: The optional list of specific vector store IDs to query for RAG.\n shield_ids: The optional list of safety shield IDs to apply.\n\nExample:\n ```python\n query_request = QueryRequest(query=\"Tell me about Kubernetes\")\n ```",
+ "description": "Model representing a request for the LLM (Language Model).\n\nAttributes:\n query: The query string.\n conversation_id: The optional conversation ID (UUID).\n provider: The optional provider.\n model: The optional model.\n system_prompt: The optional system prompt.\n attachments: The optional attachments.\n no_tools: Whether to bypass all tools and MCP servers (default: False).\n generate_topic_summary: Whether to generate topic summary for new conversations.\n media_type: The optional media type for response format (application/json or text/plain).\n vector_store_ids: The optional list of specific vector store IDs to query for RAG.\n shield_ids: The optional list of safety shield IDs to apply.\n\nExample:\n ```python\n query_request = QueryRequest(query=\"Tell me about Kubernetes\")\n ```",Optional follow-up
- Add a lightweight CI guard to prevent regressions: fail if docs/openapi.json contains "inline_rag_context".
- Example: rg -q 'inline_rag_context' docs/openapi.json && { echo "inline_rag_context must not be exposed"; exit 1; } || true
Shout if you want me to open a quick follow-up PR or wire the CI check.
src/app/endpoints/streaming_query.py
Outdated
```python
# Build RAG context from Inline RAG sources
rag_context = await build_rag_context(client, query_request, configuration)
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find the definition and imports related to build_rag_context
echo "=== Looking for build_rag_context definition ==="
rg -n "def build_rag_context|from.*build_rag_context|import.*build_rag_context" src/ --type=py
echo ""
echo "=== Checking streaming_query.py around lines 186-187 ==="
sed -n '180,200p' src/app/endpoints/streaming_query.py
echo ""
echo "=== Checking dedup logic around lines 747-756 ==="
sed -n '740,760p' src/app/endpoints/streaming_query.py

Repository: lightspeed-core/lightspeed-stack
Length of output: 1967
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find where build_rag_context is defined
echo "=== Searching for build_rag_context in utils ==="
fd -e py -x rg -l "build_rag_context" {} \; src/utils/
echo ""
echo "=== Looking for dedup helper in responses.py ==="
cat -n src/utils/responses.py | grep -A5 -B5 "dedup\|duplicate"
Length of output: 487
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check error handling patterns in streaming_query.py context
echo "=== Error handling near build_rag_context call ==="
sed -n '175,210p' src/app/endpoints/streaming_query.py | cat -n
echo ""
echo "=== Check if build_rag_context handles API errors ==="
rg -A10 "async def build_rag_context" src/ --type=pyRepository: lightspeed-core/lightspeed-stack
Length of output: 2367
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Full build_rag_context implementation ==="
sed -n '479,550p' src/utils/vector_search.py
echo ""
echo "=== Check for APIConnectionError/APIStatusError handling in build_rag_context ==="
rg -A50 "async def build_rag_context" src/utils/vector_search.py | head -80Repository: lightspeed-core/lightspeed-stack
Length of output: 4270
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Search for dedup helper in responses.py ==="
cat src/utils/responses.py | grep -i "dedup\|duplicate" || echo "No dedup/duplicate found"
echo ""
echo "=== Check if responses.py has any document deduplication helper ==="
rg "def.*dedup|def.*duplicate" src/utils/ --type=pyRepository: lightspeed-core/lightspeed-stack
Length of output: 363
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Check the entire streaming endpoint function for error handling ==="
rg -B5 "async def streaming_query" src/app/endpoints/streaming_query.py | head -20
echo ""
echo "=== Look for try/except blocks in streaming_query function ==="
awk '/async def streaming_query/,/^async def|^def [a-z]/' src/app/endpoints/streaming_query.py | grep -n "try\|except\|raise HTTPException" | head -20
Length of output: 454
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== deduplicate_referenced_documents helper ==="
rg -A15 "def deduplicate_referenced_documents" src/utils/responses.py
echo ""
echo "=== Check _fetch_byok_rag and _fetch_solr_rag for error handling ==="
rg -B2 -A20 "async def _fetch_byok_rag|async def _fetch_solr_rag" src/utils/vector_search.py | head -80Repository: lightspeed-core/lightspeed-stack
Length of output: 2160
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Check imports and exception handling at top of vector_search.py ==="
head -50 src/utils/vector_search.py | grep -E "^import|^from|APIConnectionError|APIStatusError" || echo "Checking full imports..."
echo ""
echo "=== Check for APIConnectionError/APIStatusError handling in vector_search.py ==="
rg "APIConnectionError|APIStatusError|except.*Error" src/utils/vector_search.py
echo ""
echo "=== Look for the complete streaming_query_endpoint_handler function signature and error handling ==="
sed -n '130,280p' src/app/endpoints/streaming_query.py | grep -n "try\|except\|HTTPException\|async def" | head -20
Length of output: 671
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Check error handling structure in _fetch_byok_rag and _fetch_solr_rag implementations ==="
rg -A40 "async def _fetch_byok_rag" src/utils/vector_search.py | head -50
echo ""
echo "=== Check _fetch_solr_rag for try/except ==="
rg -A50 "async def _fetch_solr_rag" src/utils/vector_search.py | head -60Repository: lightspeed-core/lightspeed-stack
Length of output: 3615
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Verify deduplicate_referenced_documents is not used in streaming_query.py ==="
rg "deduplicate_referenced_documents" src/app/endpoints/streaming_query.py || echo "Not found - confirms duplication"
echo ""
echo "=== Show the manual dedup at lines 747-756 more clearly ==="
sed -n '745,760p' src/app/endpoints/streaming_query.py
Length of output: 848
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Check if streaming_query_endpoint_handler has any try/except or error handling ==="
awk '/async def streaming_query_endpoint_handler/,/^async def|^@/' src/app/endpoints/streaming_query.py | grep -c "try\|except" || echo "0"
echo ""
echo "=== Check the direct context around line 186-187 for wrapping ==="
sed -n '150,250p' src/app/endpoints/streaming_query.py | grep -B20 "build_rag_context" | tail -30
Length of output: 1110
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== See the except clause in _fetch_byok_rag ==="
rg -A60 "async def _fetch_byok_rag" src/utils/vector_search.py | grep -A5 "except"
echo ""
echo "=== See the except clause in _fetch_solr_rag ==="
rg -A100 "async def _fetch_solr_rag" src/utils/vector_search.py | grep -A5 "except"
echo ""
echo "=== Check what exceptions are caught ==="
rg "except.*:" src/utils/vector_search.py | head -10Repository: lightspeed-core/lightspeed-stack
Length of output: 124
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Full _fetch_byok_rag with exception handling ==="
rg -A80 "async def _fetch_byok_rag" src/utils/vector_search.py
echo ""
echo "=== Full _fetch_solr_rag with exception handling ==="
rg -A120 "async def _fetch_solr_rag" src/utils/vector_search.py | tail -80Repository: lightspeed-core/lightspeed-stack
Length of output: 6245
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Check _fetch_byok_rag for except clause (full function) ==="
awk '/async def _fetch_byok_rag/,/^async def|^def [a-z_]/' src/utils/vector_search.py | tail -50
Length of output: 166
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Get line numbers for _fetch_byok_rag and _fetch_solr_rag ==="
rg -n "async def _fetch_byok_rag|async def _fetch_solr_rag|async def build_rag_context" src/utils/vector_search.py
echo ""
echo "=== Get line count of vector_search.py ==="
wc -l src/utils/vector_search.py
Length of output: 310
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Get the lines for _fetch_byok_rag
echo "=== _fetch_byok_rag function ==="
sed -n '320,420p' src/utils/vector_search.py | tail -60
Length of output: 1290
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Check exception handling: is there an except in _fetch_byok_rag? ==="
awk '/^async def _fetch_byok_rag/,/^async def _fetch_solr_rag/' src/utils/vector_search.py | grep -n "except\|try" | head -10
Length of output: 232
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== _fetch_byok_rag except block ==="
sed -n '316,404p' src/utils/vector_search.py | tail -10
echo ""
echo "=== Check the full except clause in _fetch_byok_rag ==="
sed -n '395,404p' src/utils/vector_search.py
Length of output: 1029
Add proper error handling for RAG context fetching per guidelines.
The build_rag_context() call at line 186 is not wrapped with error translation. While the helper functions internally catch and log errors gracefully, they do not convert API exceptions to HTTPException as required by the endpoint error handling guidelines. Additionally, wrap the call in a try/except that converts caught exceptions (e.g., APIConnectionError, general Exception) to appropriate HTTPException with meaningful status codes.
Also, replace the manual deduplication logic at lines 747-756 with the existing deduplicate_referenced_documents() helper from utils/responses.py to avoid code duplication.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/app/endpoints/streaming_query.py` around lines 186 - 187, Wrap the call
to build_rag_context(client, query_request, configuration) in a try/except that
catches APIConnectionError and Exception and re-raises fastapi.HTTPException
with appropriate status codes/messages (e.g., 502 for APIConnectionError, 500
for generic errors) so the endpoint returns translated HTTP errors; ensure the
caught exception is logged before raising the HTTPException. Also remove the
manual deduplication loop currently implemented and call
deduplicate_referenced_documents(referenced_documents) from utils.responses
instead to centralize logic and avoid duplication; update any variable names to
use the helper's return value where referenced in the streaming_query handler.
asimurka
left a comment
LGTM, all my comments are resolved.
…fix query mutation
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/utils/vector_search.py (1)
198-199: ⚠️ Potential issue | 🟠 Major

Preserve previously extracted `reference_url` when chunk metadata lacks it.

Line 198 and Line 204 currently overwrite `reference_url` with `None` in some cases, which can drop a valid URL from `chunk.metadata`.

Proposed fix

```diff
-                reference_url = chunk_meta.get("reference_url")
+                reference_url = reference_url or chunk_meta.get("reference_url")
 ...
-                reference_url = getattr(chunk_meta, "reference_url", None)
+                reference_url = reference_url or getattr(
+                    chunk_meta, "reference_url", None
+                )
```

Also applies to: 204-204
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` around lines 198 - 199, The code is overwriting a previously extracted reference_url with None when chunk_meta lacks "reference_url"; update the assignment so it only replaces reference_url if chunk_meta contains a non-empty value (e.g., check if "reference_url" in chunk_meta or use chunk_meta.get("reference_url", reference_url)) so existing reference_url from chunk.metadata is preserved; apply this change to the places where reference_url is set from chunk_meta (the reference_url variable and the chunk_meta.get(...) usage) so you never intentionally drop a valid URL.
♻️ Duplicate comments (6)
docs/rag_guide.md (1)

335-339: ⚠️ Potential issue | 🟡 Minor

Add a language specifier to the fenced code block.

This still triggers markdownlint MD040.

🛠️ Proposed fix

````diff
-```
+```bash
 curl -sX POST http://localhost:8080/v1/query \
   -H "Content-Type: application/json" \
   -d '{"query" : "how do I secure a nodejs application with keycloak?"}' | jq .
````

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.
In `@docs/rag_guide.md` around lines 335 - 339, The fenced code block containing
the curl example is missing a language specifier which triggers markdownlint
MD040; add a language tag (e.g., "bash") to the opening fence of the curl
snippet so the block starts with a ```bash fence to resolve the lint warning and
improve syntax highlighting for the curl POST example.

docs/byok_guide.md (1)

310-315: ⚠️ Potential issue | 🟡 Minor

Clarify combined-mode behavior when Inline + Tool are both enabled.

Lines 310–315 should explicitly mention that tool invocation remains model-driven (non-deterministic) and that inline source merging is BYOK-first/OKP-after without cross-source reranking.

📝 Proposed wording

```diff
 Both modes can be enabled simultaneously. Choose based on your latency and control preferences:
+
+> [!NOTE]
+> When Inline and Tool are both enabled, `file_search` tool usage is model-driven and non-deterministic.
+> Inline retrieval merges sources by appending OKP chunks after BYOK chunks (no cross-source reranking).
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.
In `@docs/byok_guide.md` around lines 310 - 315, Update the combined-mode
documentation to explicitly state that when both Inline RAG and Tool RAG are
enabled the tool calls remain model-driven (non-deterministic) and that inline
context merging follows a BYOK-first then OKP-after order with no cross-source
reranking; modify the text around the "Inline RAG" and "Tool RAG" table entries
to add a short sentence clarifying these behaviors and noting that tool
invocation timing is decided by the LLM while inline sources are merged
deterministically BYOK-first/OKP-second without reordering across sources.

src/app/endpoints/streaming_query.py (1)

744-753: ⚠️ Potential issue | 🟡 Minor

Use the shared referenced-document dedup helper here.

Lines 746–753 reimplement deduplication locally using raw `(doc.doc_url, doc.doc_title)` keys. Reusing `deduplicate_referenced_documents()` from `src/utils/responses.py` keeps URL normalization behavior consistent across endpoints.

🛠️ Proposed fix

```diff
 from utils.responses import (
     build_mcp_tool_call_from_arguments_done,
     build_tool_call_summary,
     build_tool_result_from_mcp_output_item_done,
+    deduplicate_referenced_documents,
     extract_token_usage,
     extract_vector_store_ids_from_tools,
@@
-    if turn_summary.inline_rag_documents:
-        all_documents = turn_summary.inline_rag_documents + tool_based_documents
-        seen = set()
-        deduplicated_documents = []
-        for doc in all_documents:
-            key = (doc.doc_url, doc.doc_title)
-            if key not in seen:
-                seen.add(key)
-                deduplicated_documents.append(doc)
-        turn_summary.referenced_documents = deduplicated_documents
+    if turn_summary.inline_rag_documents:
+        all_documents = turn_summary.inline_rag_documents + tool_based_documents
+        turn_summary.referenced_documents = deduplicate_referenced_documents(all_documents)
     else:
         turn_summary.referenced_documents = tool_based_documents
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.
In `@src/app/endpoints/streaming_query.py` around lines 744 - 753, Replace the
manual deduplication loop with the shared helper
deduplicate_referenced_documents by calling
deduplicate_referenced_documents(all_documents) (where all_documents is
turn_summary.inline_rag_documents + tool_based_documents) and assigning its
result to turn_summary.referenced_documents; also ensure the
deduplicate_referenced_documents symbol from src/utils/responses.py is imported
into streaming_query.py if not already present so URL normalization and dedup
behavior remain consistent across endpoints.

src/utils/responses.py (1)

169-170: ⚠️ Potential issue | 🟠 Major

Do not list vector stores when Tool RAG is disabled.

At Line 170, `get_vector_store_ids()` is called unconditionally. If Tool RAG is off and `vector_stores.list()` fails, the whole request can fail for a disabled feature.

🛠️ Proposed fix

```diff
     toolgroups: list[InputTool] = []
-    # Get vector stores for RAG tools - use specified ones or fetch all
-    vector_store_ids = await get_vector_store_ids(client, vector_store_ids)
-
-    # Add RAG tools if vector stores are available
-    rag_tools = get_rag_tools(vector_store_ids)
-    if rag_tools:
-        toolgroups.extend(rag_tools)
+    # Add RAG tools only when Tool RAG is enabled
+    if configuration.rag.tool.byok.enabled:
+        resolved_vector_store_ids = await get_vector_store_ids(client, vector_store_ids)
+        rag_tools = get_rag_tools(resolved_vector_store_ids)
+        if rag_tools:
+            toolgroups.extend(rag_tools)
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.
In `@src/utils/responses.py` around lines 169 - 170, Currently
get_vector_store_ids(client, vector_store_ids) is invoked unconditionally which
can cause failures when Tool RAG is disabled; update the flow so that
get_vector_store_ids is only called when the Tool RAG feature flag is enabled
(check the existing flag/utility that indicates Tool RAG is active), otherwise
skip the call and ensure vector_store_ids remains empty or unchanged;
specifically modify the code around the vector_store_ids assignment (the call to
get_vector_store_ids) to guard with the Tool RAG check used elsewhere in the
codebase so list() is never invoked for a disabled feature.

src/utils/vector_search.py (2)

281-281: ⚠️ Potential issue | 🟠 Major

Avoid logging raw Solr payload objects.

Line 281 and Line 446 log full objects that can contain retrieved chunk content/metadata. Prefer IDs/counts to reduce data exposure in debug logs.

Proposed fix

```diff
-    logger.debug("Extract doc ids from chunk: %s", chunk)
+    logger.debug(
+        "Extract doc ids from chunk_id=%s (has_metadata=%s, has_chunk_metadata=%s)",
+        getattr(chunk, "chunk_id", None),
+        bool(getattr(chunk, "metadata", None)),
+        bool(getattr(chunk, "chunk_metadata", None)),
+    )
 ...
-    logger.debug("Solr query response: %s", query_response)
+    logger.debug(
+        "Solr query returned chunks=%d scores=%d",
+        len(getattr(query_response, "chunks", []) or []),
+        len(getattr(query_response, "scores", []) or []),
+    )
```

Also applies to: 446-446

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.
In `@src/utils/vector_search.py` at line 281, The debug logs currently output
full Solr payload objects (e.g., the logger.debug call that logs "Extract doc
ids from chunk: %s" and the similar debug at the other location), which can
expose chunk content/metadata; change these debug statements to log only safe
identifiers and summary counts (e.g., chunk id(s), document ids list length,
number of fields) instead of the full chunk payload. Locate the logger.debug
calls that include the literal text "Extract doc ids from chunk" and the other
debug at the later occurrence and replace their message/arguments to include
chunk.get('id') or explicit doc id lists and len(...) summaries rather than the
entire chunk object. Ensure both occurrences are updated.

593-599: ⚠️ Potential issue | 🟠 Major

Handle dict-shaped `chunk_metadata` in Solr chunk conversion.

Line 593-599 only supports attribute-style access. When `chunk_metadata` is a `dict`, `document_id` and offline URL fallback are skipped.

Proposed fix

```diff
             if hasattr(chunk, "chunk_metadata") and chunk.chunk_metadata:
-                if hasattr(chunk.chunk_metadata, "document_id"):
-                    doc_id = chunk.chunk_metadata.document_id
+                chunk_meta = chunk.chunk_metadata
+                if isinstance(chunk_meta, dict):
+                    doc_id = chunk_meta.get("document_id") or chunk_meta.get("doc_id")
+                else:
+                    doc_id = getattr(chunk_meta, "document_id", None) or getattr(
+                        chunk_meta, "doc_id", None
+                    )
+                if doc_id:
                     attributes["document_id"] = doc_id
                     # Build URL if not already set
                     if "doc_url" not in attributes and offline and doc_id:
                         attributes["doc_url"] = urljoin(constants.MIMIR_DOC_URL, doc_id)
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.
In `@src/utils/vector_search.py` around lines 593 - 599, The current conversion
only handles attribute-style chunk.chunk_metadata and skips dict-shaped
metadata; update the logic around chunk, chunk.chunk_metadata, and attributes to
handle both mapping and object styles: if chunk has chunk_metadata, check if
isinstance(chunk.chunk_metadata, dict) and read doc_id =
chunk.chunk_metadata.get("document_id"), else fall back to
getattr(chunk.chunk_metadata, "document_id", None); then set
attributes["document_id"] when doc_id is present and apply the same offline
doc_url fallback (using urljoin(constants.MIMIR_DOC_URL, doc_id)) when "doc_url"
not in attributes. Ensure you reference the same symbols (chunk, chunk_metadata,
document_id, attributes, offline, urljoin, constants.MIMIR_DOC_URL) so the
change fits into the existing solr chunk conversion code.

🧹 Nitpick comments (2)

lightspeed-stack.yaml (1)

37-41: Consider using OKP wording in the inline RAG comment for consistency.

The config key is `okp`, so updating the comment text avoids mixed terminology.

✏️ Suggested wording

```diff
-  # Inline RAG (inject context pre-query with RAG from Solr and BYOK vector stores)
+  # Inline RAG (inject context pre-query with RAG from OKP (Solr-backed) and BYOK vector stores)
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.
In `@lightspeed-stack.yaml` around lines 37 - 41, Update the inline RAG comment
to use "OKP" terminology to match the config key; specifically change the
comment above the inline block that references RAG from "OKP and BYOK" or
similar to explicitly use "OKP" (matching the okp key) so the comment aligns
with the inline.okp setting and avoids mixed terminology.

src/utils/responses.py (1)

343-355: Align `get_rag_tools` type/docstring with actual return behavior.

Lines 343–355 always return a list (`[]` or populated list), but the signature/docstring still describe `Optional`/`None`.

♻️ Proposed fix

```diff
-def get_rag_tools(vector_store_ids: list[str]) -> Optional[list[InputToolFileSearch]]:
+def get_rag_tools(vector_store_ids: list[str]) -> list[InputToolFileSearch]:
@@
-    Returns:
-        List containing file_search tool configuration, or None if RAG as tool is disabled
+    Returns:
+        List containing file_search tool configuration, or empty list if disabled/unavailable
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.
In `@src/utils/responses.py` around lines 343 - 355, The get_rag_tools function currently always returns a list (either [] or a populated list) but its signature and docstring claim Optional/None; update the function signature to return list[InputToolFileSearch] (remove Optional) and revise the docstring to state it always returns a list of file_search tool configurations (empty list when RAG-as-tool is disabled via configuration.rag.tool.byok.enabled), and ensure any type imports or annotations referencing InputToolFileSearch remain correct. ``` </details> </blockquote></details> </blockquote></details> <details> <summary>🤖 Prompt for all review comments with AI agents</summary>Verify each finding against the current code and only fix it if needed.
Inline comments:
In @docs/openapi.json:
- Around line 5527-5531: The OpenAPI schema is missing the backward-compatible
default for ToolRagConfiguration.byok (the runtime defaults
ToolRagConfiguration.byok.enabled to true), so update the schema to declare a
default for ToolRagConfiguration.byok that sets enabled: true; specifically add
a default object for the ToolRagConfiguration.byok property mirroring
ByokRagConfiguration (with "enabled": true) so the OpenAPI JSON reflects the
actual default behavior of ToolRagConfiguration.byok.enabled.
- Line 4450: The GET and POST operations for the /a2a path both use the same
operationId "handle_a2a_jsonrpc_a2a_post", which must be globally unique; update
the GET operation's operationId to "handle_a2a_jsonrpc_a2a_get" so the two
operations differ. Locate the /a2a GET operation in the OpenAPI spec where
operationId is "handle_a2a_jsonrpc_a2a_post" and change it to
"handle_a2a_jsonrpc_a2a_get", then run your OpenAPI validator and regenerate any
client code to ensure no lingering duplicate operationId references remain.
In @docs/rag_guide.md:
- Around line 29-33: Update the Inline RAG description to use OKP terminology
instead of "Solr and/or BYOK": edit the sentence under "Inline RAG" so it
references context fetched from the okp vector stores (and/or Solr if still
applicable) and aligns with the config/API naming (use the term `okp` where the
guide currently says "BYOK"); ensure the heading "Inline RAG" and the phrase
"context is fetched" remain intact while replacing the offending wording.
In @src/app/endpoints/query.py:
- Line 170: The moderation check is being applied to responses_params.input
after inline RAG content may have been injected via inline_rag_context, causing
false positives; change the flow so you preserve the original user query (e.g.,
store it in a separate variable like original_user_input) and run moderation
against that original text before any inline_rag_context is attached, then only
build responses_params.input by combining the already-moderated
original_user_input with inline_rag_context afterwards; locate usages of
inline_rag_context and responses_params.input in query.py (and the moderation
call/function) and update them so moderation consumes the unmodified user input
variable rather than the post-injection responses_params.input.
- Around line 157-158: The call to build_rag_context(client, query_request,
configuration) can raise APIConnectionError and API status exceptions but is not
wrapped; catch those exceptions around the inline_rag_context assignment and
handle them the same way retrieve_response(...) does: catch APIConnectionError
and return/raise a 503-style connectivity response, and catch API status errors
and propagate/marshal their HTTP status and message accordingly; reference the
build_rag_context call and mirror the exception handling logic used for
retrieve_response to ensure consistent HTTP error mapping.
In @src/utils/query.py:
- Around line 214-215: The code concatenates inline_rag_context directly onto
input_text (variables inline_rag_context and input_text), which can merge
prompts; update the concatenation in the block that checks inline_rag_context to
ensure a fixed separator is inserted—e.g., if inline_rag_context does not start
with whitespace or newline, prepend a separator like "\n\n" (or always append
"\n\n" before inline_rag_context) before adding to input_text so the injected
context is clearly separated.
In @src/utils/vector_search.py:
- Around line 126-129: The metadata label is incorrect: in
src/utils/vector_search.py the code appends f"document_id: {chunk.source}" to
metadata_parts even though chunk.source is a source identifier; update that
formatted string to use the correct label (e.g., f"source: {chunk.source}" or
f"source_id: {chunk.source}") so metadata_parts reflects the true meaning of
chunk.source (update the line that builds metadata_parts where chunk.source is
referenced).
In @tests/unit/utils/test_responses.py:
- Around line 2280-2282: Replace the improper construction of APIConnectionError
using `__new__` with a normally initialized instance so the exception's
initialization logic runs; specifically change the side_effect assignment on
client_mock.vector_stores.list to raise an APIConnectionError created via its
constructor (provide a descriptive message and a minimal/mock request object
consistent with other tests) instead of
APIConnectionError.__new__(APIConnectionError), preserving the intent of the
test that the call raises this error.
Outside diff comments:
In @src/utils/vector_search.py:
- Around line 198-199: The code is overwriting a previously extracted
reference_url with None when chunk_meta lacks "reference_url"; update the
assignment so it only replaces reference_url if chunk_meta contains a non-empty
value (e.g., check if "reference_url" in chunk_meta or use
chunk_meta.get("reference_url", reference_url)) so existing reference_url from
chunk.metadata is preserved; apply this change to the places where reference_url
is set from chunk_meta (the reference_url variable and the chunk_meta.get(...)
usage) so you never intentionally drop a valid URL.
Duplicate comments:
In @docs/byok_guide.md:
- Around line 310-315: Update the combined-mode documentation to explicitly
state that when both Inline RAG and Tool RAG are enabled the tool calls remain
model-driven (non-deterministic) and that inline context merging follows a
BYOK-first then OKP-after order with no cross-source reranking; modify the text
around the "Inline RAG" and "Tool RAG" table entries to add a short sentence
clarifying these behaviors and noting that tool invocation timing is decided by
the LLM while inline sources are merged deterministically BYOK-first/OKP-second
without reordering across sources.
In @docs/rag_guide.md:
- Around line 335-339: The fenced code block containing the curl example is
missing a language specifier which triggers markdownlint MD040; add a language
tag (e.g., "bash") to the opening fence of the curl snippet so the block starts
with ```bash to resolve the lint warning and improve syntax highlighting for the
curl POST example.
In @src/app/endpoints/streaming_query.py:
- Around line 744-753: Replace the manual deduplication loop with the shared
helper deduplicate_referenced_documents by calling
deduplicate_referenced_documents(all_documents) (where all_documents is
turn_summary.inline_rag_documents + tool_based_documents) and assigning its
result to turn_summary.referenced_documents; also ensure the
deduplicate_referenced_documents symbol from src/utils/responses.py is imported
into streaming_query.py if not already present so URL normalization and dedup
behavior remain consistent across endpoints.
In @src/utils/responses.py:
- Around line 169-170: Currently get_vector_store_ids(client, vector_store_ids)
is invoked unconditionally which can cause failures when Tool RAG is disabled;
update the flow so that get_vector_store_ids is only called when the Tool RAG
feature flag is enabled (check the existing flag/utility that indicates Tool RAG
is active), otherwise skip the call and ensure vector_store_ids remains empty or
unchanged; specifically modify the code around the vector_store_ids assignment
(the call to get_vector_store_ids) to guard with the Tool RAG check used
elsewhere in the codebase so list() is never invoked for a disabled feature.
In @src/utils/vector_search.py:
- Line 281: The debug logs currently output full Solr payload objects (e.g., the
logger.debug call that logs "Extract doc ids from chunk: %s" and the similar
debug at the other location), which can expose chunk content/metadata; change
these debug statements to log only safe identifiers and summary counts (e.g.,
chunk id(s), document ids list length, number of fields) instead of the full
chunk payload. Locate the logger.debug calls that include the literal text
"Extract doc ids from chunk" and the other debug at the later occurrence and
replace their message/arguments to include chunk.get('id') or explicit doc id
lists and len(...) summaries rather than the entire chunk object. Ensure both
occurrences are updated.
- Around line 593-599: The current conversion only handles attribute-style
chunk.chunk_metadata and skips dict-shaped metadata; update the logic around
chunk, chunk.chunk_metadata, and attributes to handle both mapping and object
styles: if chunk has chunk_metadata, check if isinstance(chunk.chunk_metadata,
dict) and read doc_id = chunk.chunk_metadata.get("document_id"), else fall back
to getattr(chunk.chunk_metadata, "document_id", None); then set
attributes["document_id"] when doc_id is present and apply the same offline
doc_url fallback (using urljoin(constants.MIMIR_DOC_URL, doc_id)) when "doc_url"
not in attributes. Ensure you reference the same symbols (chunk, chunk_metadata,
document_id, attributes, offline, urljoin, constants.MIMIR_DOC_URL) so the
change fits into the existing solr chunk conversion code.
Nitpick comments:
In @lightspeed-stack.yaml:
- Around line 37-41: Update the inline RAG comment to use "OKP" terminology to
match the config key; specifically change the comment above the inline block
that references RAG from "OKP and BYOK" or similar to explicitly use "OKP"
(matching the okp key) so the comment aligns with the inline.okp setting and
avoids mixed terminology.
In @src/utils/responses.py:
- Around line 343-355: The get_rag_tools function currently always returns a
list (either [] or a populated list) but its signature and docstring claim
Optional/None; update the function signature to return list[InputToolFileSearch]
(remove Optional) and revise the docstring to state it always returns a list of
file_search tool configurations (empty list when RAG-as-tool is disabled via
configuration.rag.tool.byok.enabled), and ensure any type imports or annotations
referencing InputToolFileSearch remain correct.

</details>

---

<details>
<summary>ℹ️ Review info</summary>

**Configuration used**: Path: .coderabbit.yaml
**Review profile**: CHILL
**Plan**: Pro

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 22fe72da916cb1e8723d45f28384876cd57eee4a and a4075c6d6083b7247ae0c0890c59e2c8c0c5a944.

</details>

<details>
<summary>📒 Files selected for processing (19)</summary>

* `docs/byok_guide.md`
* `docs/config.md`
* `docs/openapi.json`
* `docs/rag_guide.md`
* `examples/lightspeed-stack-byok-rag.yaml`
* `lightspeed-stack.yaml`
* `src/app/endpoints/query.py`
* `src/app/endpoints/streaming_query.py`
* `src/constants.py`
* `src/llama_stack_configuration.py`
* `src/models/config.py`
* `src/utils/query.py`
* `src/utils/responses.py`
* `src/utils/types.py`
* `src/utils/vector_search.py`
* `tests/unit/app/endpoints/test_query.py`
* `tests/unit/models/config/test_dump_configuration.py`
* `tests/unit/test_llama_stack_configuration.py`
* `tests/unit/utils/test_responses.py`

</details>

<details>
<summary>🚧 Files skipped from review as they are similar to previous changes (3)</summary>

* examples/lightspeed-stack-byok-rag.yaml
* tests/unit/app/endpoints/test_query.py
* src/llama_stack_configuration.py

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->
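Several of the prompts above lean on a shared `deduplicate_referenced_documents` helper with URL normalization. The sketch below shows one plausible shape for such dedup; the `ReferencedDocument` fields and the normalization rule are assumptions, not the project's actual code:

```python
# Sketch of URL-normalizing dedup across inline-RAG and tool-based documents.
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferencedDocument:
    doc_url: str
    doc_title: str


def deduplicate_referenced_documents(docs):
    seen, result = set(), []
    for doc in docs:
        # Normalize before comparing so trivially different URLs collapse.
        key = doc.doc_url.rstrip("/").lower()
        if key not in seen:
            seen.add(key)
            result.append(doc)
    return result


docs = [
    ReferencedDocument("https://a.example/Doc/", "A"),
    ReferencedDocument("https://a.example/doc", "A"),
    ReferencedDocument("https://b.example/doc", "B"),
]
print(len(deduplicate_referenced_documents(docs)))  # 2
```

Centralizing this in one helper, as the streaming_query prompt suggests, keeps the normalization rule identical across the query and streaming endpoints.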
```json
"summary": "Handle A2A Jsonrpc",
"description": "Handle A2A JSON-RPC requests following the A2A protocol specification.\n\nThis endpoint uses the DefaultRequestHandler from the A2A SDK to handle\nall JSON-RPC requests including message/send, message/stream, etc.\n\nThe A2A SDK application is created per-request to include authentication\ncontext while still leveraging FastAPI's authorization middleware.\n\nAutomatically detects streaming requests (message/stream JSON-RPC method)\nand returns a StreamingResponse to enable real-time chunk delivery.\n\nArgs:\n    request: FastAPI request object\n    auth: Authentication tuple\n    mcp_headers: MCP headers for context propagation\n\nReturns:\n    JSON-RPC response or streaming response",
"operationId": "handle_a2a_jsonrpc_a2a_get",
"operationId": "handle_a2a_jsonrpc_a2a_post",
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check the actual content around lines 4450 and 4468
echo "=== Content around line 4450 ==="
sed -n '4445,4455p' docs/openapi.json
echo ""
echo "=== Content around line 4468 ==="
sed -n '4463,4473p' docs/openapi.json

Repository: lightspeed-core/lightspeed-stack
Length of output: 2378
🏁 Script executed:
#!/bin/bash
# Run the duplicate operationId detection
python - <<'PY'
import json, collections, pathlib
spec = json.loads(pathlib.Path("docs/openapi.json").read_text())
ids = collections.defaultdict(list)
for path, methods in spec.get("paths", {}).items():
for method, op in methods.items():
if isinstance(op, dict):
oid = op.get("operationId")
if oid:
ids[oid].append(f"{method.upper()} {path}")
dups = {k: v for k, v in ids.items() if len(v) > 1}
if dups:
print("=== Duplicate operationIds found ===")
for oid, locations in sorted(dups.items()):
print(f"{oid}:")
for loc in sorted(locations):
print(f" - {loc}")
else:
print("No duplicate operationIds found")
PY

Repository: lightspeed-core/lightspeed-stack
Length of output: 168
🏁 Script executed:
#!/bin/bash
# Find the HTTP method declarations for lines 4450 and 4468
# Look backwards from each line to find the method keyword
echo "=== Context for line 4450 (looking back for method) ==="
sed -n '4420,4450p' docs/openapi.json | tail -40
echo ""
echo "=== Context for line 4468 (looking back for method) ==="
sed -n '4440,4468p' docs/openapi.json | tail -40

Repository: lightspeed-core/lightspeed-stack
Length of output: 4465
Duplicate operationId values on /a2a GET and POST endpoints.
Line 4450 (GET) and line 4468 (POST) both define operationId: "handle_a2a_jsonrpc_a2a_post". This violates the OpenAPI specification requirement that operation IDs be globally unique and will break API client code generation and method routing.
Change the GET operation ID at line 4450 to "handle_a2a_jsonrpc_a2a_get":
Proposed fix
```json
"/a2a": {
  "get": {
    "operationId": "handle_a2a_jsonrpc_a2a_get",
  },
  "post": {
    "operationId": "handle_a2a_jsonrpc_a2a_post",
  }
}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/openapi.json` at line 4450, The GET and POST operations for the /a2a
path both use the same operationId "handle_a2a_jsonrpc_a2a_post", which must be
globally unique; update the GET operation's operationId to
"handle_a2a_jsonrpc_a2a_get" so the two operations differ. Locate the /a2a GET
operation in the OpenAPI spec where operationId is "handle_a2a_jsonrpc_a2a_post"
and change it to "handle_a2a_jsonrpc_a2a_get", then run your OpenAPI validator
and regenerate any client code to ensure no lingering duplicate operationId
references remain.
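The rename can also be applied and re-verified programmatically. The mini-spec below is a hypothetical stand-in for the full docs/openapi.json, kept just large enough to reproduce the duplicate:

```python
# Hypothetical mini-spec reproducing the duplicate id; only the GET id is renamed.
spec = {
    "paths": {
        "/a2a": {
            "get": {"operationId": "handle_a2a_jsonrpc_a2a_post"},  # wrong id
            "post": {"operationId": "handle_a2a_jsonrpc_a2a_post"},
        }
    }
}

spec["paths"]["/a2a"]["get"]["operationId"] = "handle_a2a_jsonrpc_a2a_get"

# Re-check uniqueness the same way the review script does.
ids = [op["operationId"] for ops in spec["paths"].values() for op in ops.values()]
print(len(ids) == len(set(ids)))  # True
```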
```json
"enabled": {
  "type": "boolean",
  "title": "BYOK RAG enabled",
  "description": "When True, queries BYOK vector stores for RAG context.",
  "default": false
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python - <<'PY'
import json, pathlib
spec = json.loads(pathlib.Path("docs/openapi.json").read_text())
schemas = spec["components"]["schemas"]
print("=== Schema Defaults ===")
print("ByokRagConfiguration.enabled.default =",
schemas["ByokRagConfiguration"]["properties"]["enabled"].get("default"))
print("ToolRagConfiguration.byok.default =",
schemas["ToolRagConfiguration"]["properties"]["byok"].get("default"))
print("\n=== Full ByokRagConfiguration ===")
print(json.dumps(schemas["ByokRagConfiguration"], indent=2))
print("\n=== Full ToolRagConfiguration.byok property ===")
print(json.dumps(schemas["ToolRagConfiguration"]["properties"]["byok"], indent=2))
PY
echo
echo "=== Checking model defaults in source ==="
fd 'config\.py$' src --type f 2>/dev/null | while read cfg_file; do
echo "Found: $cfg_file"
echo "--- Content ---"
rg -n -A5 'class (ByokRagConfiguration|ToolRagConfiguration)' "$cfg_file" || true
done

Repository: lightspeed-core/lightspeed-stack
Length of output: 1303
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Get full ToolRagConfiguration definition
echo "=== Full ToolRagConfiguration definition ==="
sed -n '1743,1800p' src/models/config.py
echo
echo "=== Check for backward-compatibility context ==="
rg -i 'backward|compat|tool.*rag.*default' src --type py -B2 -A2 | head -50
echo
echo "=== Check git history for recent changes to RAG defaults ==="
git log --oneline --all -- src/models/config.py docs/openapi.json 2>/dev/null | head -20 || echo "Git history unavailable"

Repository: lightspeed-core/lightspeed-stack
Length of output: 5499
Add backward-compatible default to ToolRagConfiguration.byok in OpenAPI schema.
The source code (src/models/config.py:1748) explicitly defaults ToolRagConfiguration.byok.enabled to true for backward compatibility, but the OpenAPI schema does not capture this. The schema currently shows only ByokRagConfiguration.enabled defaulting to false, leaving ToolRagConfiguration.byok without a default, which contradicts the actual API behavior.
Proposed fix
```diff
 "ToolRagConfiguration": {
   "properties": {
     "byok": {
       "$ref": "#/components/schemas/ByokRagConfiguration",
+      "default": {
+        "enabled": true
+      },
       "title": "BYOK RAG configuration",
       "description": "Configuration for BYOK RAG as a tool."
     }
   },
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/openapi.json` around lines 5527 - 5531, The OpenAPI schema is missing
the backward-compatible default for ToolRagConfiguration.byok (the runtime
defaults ToolRagConfiguration.byok.enabled to true), so update the schema to
declare a default for ToolRagConfiguration.byok that sets enabled: true;
specifically add a default object for the ToolRagConfiguration.byok property
mirroring ByokRagConfiguration (with "enabled": true) so the OpenAPI JSON
reflects the actual default behavior of ToolRagConfiguration.byok.enabled.
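The backward-compatible runtime default the review describes can be sketched in isolation. The field names below mirror the config surface discussed above, but these dataclasses are illustrative stand-ins, not the project's actual Pydantic models:

```python
# Sketch of the backward-compatible default: old configs that omit `tool.byok`
# still get Tool RAG enabled, while a standalone BYOK block defaults to off.
from dataclasses import dataclass, field


@dataclass
class ByokRagConfiguration:
    enabled: bool = False  # matches the schema default shown in the excerpt


@dataclass
class ToolRagConfiguration:
    # Nested default keeps byok.enabled == True when the key is absent.
    byok: ByokRagConfiguration = field(
        default_factory=lambda: ByokRagConfiguration(enabled=True)
    )


print(ToolRagConfiguration().byok.enabled)  # True
print(ByokRagConfiguration().enabled)       # False
```

This is why the generated OpenAPI schema needs an explicit `"default": {"enabled": true}` on the `byok` property: the runtime default lives in the factory, not in the standalone `ByokRagConfiguration` schema.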
```markdown
Lightspeed Core Stack (LCS) supports two complementary RAG strategies:

The system operates a chain of command. The **Agent** is the orchestrator, using the LLM as its reasoning engine. When a plan requires external information, the Agent queries the **Vector Store**. This is your database of indexed knowledge, which you are responsible for creating before running the stack. The **Embedding Model** is used to convert the queries to vectors.
- **Inline RAG**: context is fetched from Solr and/or BYOK vector stores and injected into every query before the LLM responds. No tool calls are required.
- **Tool RAG**: the LLM can call the `file_search` tool during generation to retrieve context on demand from BYOK vector stores.
```
Use OKP terminology in the high-level Inline RAG definition.
The guide currently says “Solr and/or BYOK,” but the config/API surface uses okp; aligning wording avoids confusion.
✏️ Suggested wording
```diff
-- **Inline RAG**: context is fetched from Solr and/or BYOK vector stores and injected into every query before the LLM responds. No tool calls are required.
+- **Inline RAG**: context is fetched from OKP (Solr-backed) and/or BYOK vector stores and injected into every query before the LLM responds. No tool calls are required.
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 Lightspeed Core Stack (LCS) supports two complementary RAG strategies:
-The system operates a chain of command. The **Agent** is the orchestrator, using the LLM as its reasoning engine. When a plan requires external information, the Agent queries the **Vector Store**. This is your database of indexed knowledge, which you are responsible for creating before running the stack. The **Embedding Model** is used to convert the queries to vectors.
-- **Inline RAG**: context is fetched from Solr and/or BYOK vector stores and injected into every query before the LLM responds. No tool calls are required.
+- **Inline RAG**: context is fetched from OKP (Solr-backed) and/or BYOK vector stores and injected into every query before the LLM responds. No tool calls are required.
 - **Tool RAG**: the LLM can call the `file_search` tool during generation to retrieve context on demand from BYOK vector stores.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/rag_guide.md` around lines 29 - 33, Update the Inline RAG description to
use OKP terminology instead of "Solr and/or BYOK": edit the sentence under
"Inline RAG" so it references context fetched from the okp vector stores (and/or
Solr if still applicable) and aligns with the config/API naming (use the term
`okp` where the guide currently says "BYOK"); ensure the heading "Inline RAG"
and the phrase "context is fetched" remain intact while replacing the offending
wording.
src/app/endpoints/query.py (Outdated)
```python
# Build RAG context from Inline RAG sources
inline_rag_context = await build_rag_context(client, query_request, configuration)
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Examine the query.py file at lines 157-158 and surrounding context
cat -n src/app/endpoints/query.py | sed -n '150,170p'

Repository: lightspeed-core/lightspeed-stack
Length of output: 909
🏁 Script executed:
#!/bin/bash
# Check for error handling imports and patterns in query.py
head -50 src/app/endpoints/query.py | cat -n

Repository: lightspeed-core/lightspeed-stack
Length of output: 2032
🏁 Script executed:
#!/bin/bash
# Search for APIConnectionError handling and build_rag_context function
rg -n 'APIConnectionError|build_rag_context|ServiceUnavailableResponse|handle_known_apistatus_errors' \
  src/app/endpoints/query.py src/utils/vector_search.py 2>/dev/null || echo "Searching broader scope..."

Repository: lightspeed-core/lightspeed-stack
Length of output: 864
🏁 Script executed:
#!/bin/bash
# Find files with error handling utilities
fd -e py -t f | xargs grep -l 'ServiceUnavailableResponse\|handle_known_apistatus_errors' 2>/dev/null

Repository: lightspeed-core/lightspeed-stack
Length of output: 829
🏁 Script executed:
#!/bin/bash
# Check existing error handling pattern at lines 311-318
cat -n src/app/endpoints/query.py | sed -n '305,325p'

Repository: lightspeed-core/lightspeed-stack
Length of output: 1049
🏁 Script executed:
#!/bin/bash
# Check if build_rag_context handles these errors internally
cat -n src/utils/vector_search.py | sed -n '479,530p'

Repository: lightspeed-core/lightspeed-stack
Length of output: 2083
🏁 Script executed:
#!/bin/bash
# Check if there's any try-except wrapping the entire query operation
cat -n src/app/endpoints/query.py | sed -n '100,200p'

Repository: lightspeed-core/lightspeed-stack
Length of output: 4813
Add error handling for inline RAG context building.
build_rag_context(...) can raise APIConnectionError and API status errors without being caught, causing Llama Stack connectivity failures to propagate as generic 500 responses. This is inconsistent with the error handling applied to retrieve_response(...) at line 187, which properly maps these exceptions to appropriate HTTP errors (503 for connection failures, consistent status responses for API errors).
🛠️ Suggested fix
```diff
-    inline_rag_context = await build_rag_context(client, query_request, configuration)
+    try:
+        inline_rag_context = await build_rag_context(
+            client, query_request, configuration
+        )
+    except APIConnectionError as e:
+        error_response = ServiceUnavailableResponse(
+            backend_name="Llama Stack",
+            cause=str(e),
+        )
+        raise HTTPException(**error_response.model_dump()) from e
+    except (LLSApiStatusError, OpenAIAPIStatusError) as e:
+        error_response = handle_known_apistatus_errors(e, query_request.model or "unknown")
+        raise HTTPException(**error_response.model_dump()) from e
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/app/endpoints/query.py` around lines 157 - 158, The call to
build_rag_context(client, query_request, configuration) can raise
APIConnectionError and API status exceptions but is not wrapped; catch those
exceptions around the inline_rag_context assignment and handle them the same way
retrieve_response(...) does: catch APIConnectionError and return/raise a
503-style connectivity response, and catch API status errors and
propagate/marshal their HTTP status and message accordingly; reference the
build_rag_context call and mirror the exception handling logic used for
retrieve_response to ensure consistent HTTP error mapping.
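The error mapping this prompt asks for can be sketched self-contained. `APIConnectionError`, `build_rag_context`, and `HTTPError` below are illustrative stand-ins for the real llama-stack and FastAPI types, not the project's code:

```python
# Minimal sketch: map a backend connectivity failure to a 503-style error
# instead of letting it surface as a generic 500.
import asyncio


class APIConnectionError(Exception):
    """Stand-in for the client library's connection error."""


class HTTPError(Exception):
    """Stand-in for FastAPI's HTTPException."""

    def __init__(self, status_code: int, detail: str):
        super().__init__(detail)
        self.status_code = status_code


async def build_rag_context(client, query_request, configuration):
    raise APIConnectionError("backend unreachable")  # simulate an outage


async def build_context_safely(client, query_request, configuration):
    try:
        return await build_rag_context(client, query_request, configuration)
    except APIConnectionError as e:
        # Connectivity failures become an explicit 503, mirroring the
        # handling already applied to retrieve_response().
        raise HTTPError(503, f"Llama Stack unavailable: {e}") from e


try:
    asyncio.run(build_context_safely(None, None, None))
except HTTPError as e:
    print(e.status_code)  # 503
```

The same shape extends to API-status errors by adding further `except` clauses that forward the backend's status code rather than a fixed 503.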
```python
if inline_rag_context:
    input_text += inline_rag_context
```
Add a fixed separator before injected inline context.
At Line 215, inline_rag_context is concatenated directly to the query text. If the context string does not start with whitespace/newlines, prompt text can merge unexpectedly.
🛠️ Proposed fix
```diff
-        if inline_rag_context:
-            input_text += inline_rag_context
+        if inline_rag_context:
+            input_text += f"\n\n{inline_rag_context.lstrip()}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/query.py` around lines 214 - 215, The code concatenates
inline_rag_context directly onto input_text (variables inline_rag_context and
input_text), which can merge prompts; update the concatenation in the block that
checks inline_rag_context to ensure a fixed separator is inserted—e.g., if
inline_rag_context does not start with whitespace or newline, prepend a
separator like "\n\n" (or always append "\n\n" before inline_rag_context) before
adding to input_text so the injected context is clearly separated.
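The separator rule from this prompt fits in a small helper. The function name is illustrative, not the project's actual API:

```python
# Sketch: always isolate injected RAG context from the user text with a
# blank line, regardless of how the context string happens to start.
def append_inline_context(input_text: str, inline_rag_context: str) -> str:
    if not inline_rag_context:
        return input_text
    # lstrip() avoids stacking extra blank lines when the context already
    # begins with whitespace.
    return f"{input_text}\n\n{inline_rag_context.lstrip()}"


print(append_inline_context("What is RAG?", "Context: two retrieval modes."))
```

Without the separator, a context string beginning mid-sentence would fuse onto the user's question and change what the model sees as the query.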
```python
if chunk.source:
    metadata_parts.append(f"document_id: {chunk.source}")
if chunk.score is not None:
    metadata_parts.append(f"score: {chunk.score:.4f}")
```
Use correct metadata label in formatted context.
Line 127 labels chunk.source as document_id, but it’s a source identifier. This mislabels metadata passed to the model.
Proposed fix
```diff
-    if chunk.source:
-        metadata_parts.append(f"document_id: {chunk.source}")
+    if chunk.source:
+        metadata_parts.append(f"source: {chunk.source}")
+    if chunk.attributes and chunk.attributes.get("document_id"):
+        metadata_parts.append(f"document_id: {chunk.attributes['document_id']}")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/vector_search.py` around lines 126 - 129, The metadata label is
incorrect: in src/utils/vector_search.py the code appends f"document_id:
{chunk.source}" to metadata_parts even though chunk.source is a source
identifier; update that formatted string to use the correct label (e.g.,
f"source: {chunk.source}" or f"source_id: {chunk.source}") so metadata_parts
reflects the true meaning of chunk.source (update the line that builds
metadata_parts where chunk.source is referenced).
client_mock.vector_stores.list.side_effect = APIConnectionError.__new__(
    APIConnectionError
)
🧩 Analysis chain
🏁 Script executed:
# First, locate the test file and verify the code snippet
fd -t f test_responses.py

Repository: lightspeed-core/lightspeed-stack
Length of output: 153
🏁 Script executed:
# Search for APIConnectionError definition and imports
rg -n "class APIConnectionError" --type py

Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
# Check how APIConnectionError is imported in the test file
rg -n "APIConnectionError|from.*llama" tests/unit/utils/test_responses.py | head -20

Repository: lightspeed-core/lightspeed-stack
Length of output: 1007
🏁 Script executed:
# Look at the specific lines mentioned in the review (2280-2282)
rg -n -A5 -B5 "APIConnectionError.__new__" tests/unit/utils/test_responses.py

Repository: lightspeed-core/lightspeed-stack
Length of output: 586
🏁 Script executed:
# Search for other uses of APIConnectionError in tests to understand patterns
rg -n "APIConnectionError" tests/ --type py -A2 | head -40

Repository: lightspeed-core/lightspeed-stack
Length of output: 2731
Use proper initialization for APIConnectionError instead of __new__ to match test patterns.
This is the only place in the test suite where APIConnectionError is constructed via __new__. All other uses (lines 865, 951, 1090, 1115 in this file and across other test files) instantiate it normally with message and request parameters. This inconsistency bypasses initialization logic.
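Why this matters can be shown with a minimal stand-in class (the real `APIConnectionError` additionally requires `message` and `request` arguments): `__new__` allocates the instance but never runs `__init__`, so attributes normally set during initialization are simply missing.

```python
class DemoError(Exception):
    """Stand-in exception; illustrates why bypassing __init__ is risky."""

    def __init__(self, message: str) -> None:
        super().__init__(message)
        self.message = message

# __new__ allocates the object but skips __init__ entirely.
bare = DemoError.__new__(DemoError)
print(hasattr(bare, "message"))  # False

# Normal construction runs __init__ and sets all attributes.
proper = DemoError("Connection failed")
print(proper.message)  # Connection failed
```

Any test code that later touches `.message` (or similar initialized state) on the `__new__`-built instance would raise `AttributeError`, which is why normal construction is preferred.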
🛠️ Proposed fix
- client_mock.vector_stores.list.side_effect = APIConnectionError.__new__(
- APIConnectionError
- )
+ client_mock.vector_stores.list.side_effect = APIConnectionError(
+ message="Connection failed", request=mocker.Mock()
+ )

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
client_mock.vector_stores.list.side_effect = APIConnectionError(
    message="Connection failed", request=mocker.Mock()
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/utils/test_responses.py` around lines 2280 - 2282, Replace the
improper construction of APIConnectionError using __new__ with a normal
initialized instance so the exception's initialization logic runs; specifically
change the side_effect assignment on client_mock.vector_stores.list to raise an
APIConnectionError created via its constructor (provide a descriptive message
and a minimal/mock request object consistent with other tests) instead of
APIConnectionError.__new__(APIConnectionError), preserving the intent of the
test that the call raises this error.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/llama_stack_configuration.py (1)
1-6: ⚠️ Potential issue | 🟡 Minor

Fix Black formatting to pass CI.

The pipeline indicates Black would reformat this file. Run `black src/llama_stack_configuration.py` to fix formatting issues.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llama_stack_configuration.py` around lines 1 - 6, This file fails Black formatting; run Black on the module (e.g., black src/llama_stack_configuration.py) or reformat the file to match Black's style so CI passes; ensure the module docstring and any definitions such as generate_configuration are formatted according to Black (consistent line breaks/quotes/spacing) before committing.

src/utils/vector_search.py (1)
1-1: ⚠️ Potential issue | 🟡 Minor

Black check is failing for this file.

CI reports `src/utils/vector_search.py` would be reformatted. Please run Black on this file before merge.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` at line 1, CI reports src/utils/vector_search.py is not formatted with Black; run the Black formatter (e.g., black src/utils/vector_search.py) or apply the project's Black config to reformat the file so it matches CI; ensure the file-level docstring and all code in the vector_search.py module are reformatted accordingly and re-run linters/CI to confirm the change.

src/app/endpoints/query.py (1)
1-1: ⚠️ Potential issue | 🟡 Minor

Black check is failing for this file.

CI reports `src/app/endpoints/query.py` would be reformatted. Please run Black on this file before merge.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/query.py` at line 1, The file src/app/endpoints/query.py fails Black formatting; run Black (e.g., python -m black src/app/endpoints/query.py) to auto-reformat the file and commit the changes so CI passes; ensure the pylint disable header ("# pylint: disable=too-many-locals,too-many-branches,too-many-nested-blocks") remains intact and that any imports/functions within query.py (e.g., endpoint handlers or helper functions defined there) are updated to match Black's formatting before pushing the commit.
♻️ Duplicate comments (3)
src/utils/vector_search.py (2)
593-600: ⚠️ Potential issue | 🟠 Major

Support dict-based `chunk_metadata` when extracting Solr document IDs.

The current conversion only handles attribute access (`chunk.chunk_metadata.document_id`). If `chunk_metadata` is a dict, `document_id` (and the offline URL fallback) is dropped.

Proposed fix

- if hasattr(chunk, "chunk_metadata") and chunk.chunk_metadata:
-     if hasattr(chunk.chunk_metadata, "document_id"):
-         doc_id = chunk.chunk_metadata.document_id
-         attributes["document_id"] = doc_id
-         # Build URL if not already set
-         if "doc_url" not in attributes and offline and doc_id:
-             attributes["doc_url"] = urljoin(constants.MIMIR_DOC_URL, doc_id)
+ if hasattr(chunk, "chunk_metadata") and chunk.chunk_metadata:
+     chunk_meta = chunk.chunk_metadata
+     if isinstance(chunk_meta, dict):
+         doc_id = chunk_meta.get("document_id") or chunk_meta.get("doc_id")
+     else:
+         doc_id = getattr(chunk_meta, "document_id", None) or getattr(
+             chunk_meta, "doc_id", None
+         )
+     if doc_id:
+         attributes["document_id"] = doc_id
+         # Build URL if not already set
+         if "doc_url" not in attributes and offline:
+             attributes["doc_url"] = urljoin(constants.MIMIR_DOC_URL, doc_id)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` around lines 593 - 600, The code only reads chunk.chunk_metadata.document_id via attribute access and ignores dict-style metadata; update the extraction logic in the block that handles chunk and chunk_metadata (referencing chunk, chunk_metadata, attributes, doc_id) to handle both attribute and mapping access: if chunk.chunk_metadata is a dict, read document_id = chunk.chunk_metadata.get("document_id"), otherwise use getattr(chunk.chunk_metadata, "document_id", None); then set attributes["document_id"]=doc_id if present and keep the existing offline URL fallback using urljoin(constants.MIMIR_DOC_URL, doc_id) when "doc_url" not in attributes and offline is true.
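The dual-access pattern recommended above can be isolated into a small helper. This is a sketch under the review's assumptions (the `document_id`/`doc_id` fallback names come from the proposed diff, not verified against the repository):

```python
from types import SimpleNamespace
from typing import Any, Optional

def extract_document_id(chunk_metadata: Any) -> Optional[str]:
    # Handle both mapping-style and attribute-style metadata payloads,
    # with a "doc_id" fallback as in the proposed fix.
    if isinstance(chunk_metadata, dict):
        return chunk_metadata.get("document_id") or chunk_metadata.get("doc_id")
    return getattr(chunk_metadata, "document_id", None) or getattr(
        chunk_metadata, "doc_id", None
    )

print(extract_document_id({"document_id": "kb-123"}))         # kb-123
print(extract_document_id(SimpleNamespace(doc_id="kb-456")))  # kb-456
print(extract_document_id(SimpleNamespace()))                 # None
```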
125-127: ⚠️ Potential issue | 🟡 Minor

Use the correct metadata label for `chunk.source` in inline context.

`chunk.source` is a source identifier, not a document ID. This mislabels metadata sent to the model.

Proposed fix

- if chunk.source:
-     metadata_parts.append(f"document_id: {chunk.source}")
+ if chunk.source:
+     metadata_parts.append(f"source: {chunk.source}")
+ if chunk.attributes and chunk.attributes.get("document_id"):
+     metadata_parts.append(f"document_id: {chunk.attributes['document_id']}")

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` around lines 125 - 127, The metadata label for chunk.source is incorrect: in the metadata assembly (the metadata_parts list in src/utils/vector_search.py) replace the mislabel "document_id: {chunk.source}" with the correct inline label "source: {chunk.source}" so the model receives the proper source identifier; update the conditional that appends to metadata_parts (the block referencing chunk.source and metadata_parts) to use "source" rather than "document_id".

src/app/endpoints/query.py (1)
172-172: ⚠️ Potential issue | 🟠 Major

Moderate the original user query, not the inline RAG-augmented payload.

`responses_params.input` can include injected inline context, which can cause false moderation blocks unrelated to user intent. Keep moderation input separate from LLM-augmented input.

Proposed fix

- turn_summary = await retrieve_response(
+ turn_summary = await retrieve_response(
      client,
      responses_params,
      query_request.shield_ids,
      vector_store_ids,
      rag_id_mapping,
+     moderation_input=query_request.query,
  )

  async def retrieve_response(  # pylint: disable=too-many-locals
      client: AsyncLlamaStackClient,
      responses_params: ResponsesApiParams,
      shield_ids: Optional[list[str]] = None,
      vector_store_ids: Optional[list[str]] = None,
      rag_id_mapping: Optional[dict[str, str]] = None,
+     moderation_input: Optional[str] = None,
  ) -> TurnSummary:
  @@
- moderation_result = await run_shield_moderation(
-     client, cast(str, responses_params.input), shield_ids
- )
+ moderation_payload = (
+     moderation_input
+     if moderation_input is not None
+     else cast(str, responses_params.input)
+ )
+ moderation_result = await run_shield_moderation(
+     client, moderation_payload, shield_ids
+ )

Also applies to: 290-292
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/query.py` at line 172, The moderation call is currently using the LLM-augmented payload (e.g., inline_rag_context.context_text or responses_params.input) which may contain injected context; change the moderation input to the original user query (use the raw user input variable—e.g., responses_params.input_raw or the original_request_input variable that holds the user message before augmentation) so only the unmodified user query is moderated; update the two places noted (the inline_rag_context usage around inline_rag_context.context_text and the duplicate occurrence at the later block around lines 290-292) to pass the original user input variable to the moderation function instead of responses_params.input or inline_rag_context.context_text.
🧹 Nitpick comments (2)
src/llama_stack_configuration.py (2)
250-253: Consider extracting duplicate prefix-stripping logic into a helper function.

This same logic for stripping the `sentence-transformers/` prefix appears here and in `enrich_solr` (lines 476-478). A small helper would reduce duplication.

♻️ Suggested helper function

def strip_sentence_transformers_prefix(model_id: str) -> str:
    """Strip 'sentence-transformers/' prefix from model ID if present."""
    prefix = "sentence-transformers/"
    if model_id.startswith(prefix):
        return model_id[len(prefix):]
    return model_id

Then use:

- provider_model_id = embedding_model
- if provider_model_id.startswith("sentence-transformers/"):
-     provider_model_id = provider_model_id[len("sentence-transformers/") :]
+ provider_model_id = strip_sentence_transformers_prefix(embedding_model)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llama_stack_configuration.py` around lines 250 - 253, Extract the duplicated prefix-stripping logic into a small helper, e.g., strip_sentence_transformers_prefix(model_id: str), and replace the inline logic in the block using provider_model_id/embedding_model and the corresponding code in enrich_solr with calls to that helper; ensure the helper returns the original string when the "sentence-transformers/" prefix is not present and update references to use the helper to avoid duplication.
379-438: LGTM with optional suggestion to modularize.

The Solr provider enrichment logic is correct - it properly checks for existing providers before adding and uses environment variable placeholders with sensible defaults from constants.

For improved maintainability, consider breaking this ~120-line function into smaller helpers (e.g., `_add_solr_provider`, `_add_solr_vector_store`, `_add_solr_embedding_model`), but this is optional given the clear sectional comments.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llama_stack_configuration.py` around lines 379 - 438, The function enrich_solr is a bit large and would benefit from modularization; extract logical sections into small helper functions to improve readability and testability. Specifically, create helpers like _add_solr_provider(ls_config, solr_config) to encapsulate the provider construction and append logic (currently the block that builds solr_url_env, collection_env, vector_field_env, content_field_env, embedding_model_env, embedding_dim_env and the dict appended to ls_config["providers"]["vector_io"]), and optionally _ensure_vector_io_provider_container(ls_config) to initialize ls_config["providers"]["vector_io"] if missing; then update enrich_solr to call these helpers while keeping the same behavior and constants (constants.SOLR_PROVIDER_ID, SOLR_DEFAULT_*), ensuring no change to existing condition checks (e.g., if not solr_config or not solr_config.get("enabled") and the existing_providers check).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/app/endpoints/streaming_query.py`:
- Around line 202-203: The code currently runs moderation against
responses_params.input after inline RAG has been injected; change the flow so
moderation runs on the raw user query before any inline RAG injection: locate
where inline_rag_context is built and where moderation is invoked (references:
inline_rag_context and responses_params.input), capture the original user query
variable (the pre-injection input on responses_params) and call the moderation
routine on that raw value, then only after moderation passes attach
inline_rag_context=context_text or None and proceed; apply the same change at
the other occurrence mentioned (the block around lines 282-285) so both
streaming paths moderate the raw query prior to injection.
---
Outside diff comments:
In `@src/app/endpoints/query.py`:
- Line 1: The file src/app/endpoints/query.py fails Black formatting; run Black
(e.g., python -m black src/app/endpoints/query.py) to auto-reformat the file and
commit the changes so CI passes; ensure the pylint disable header ("# pylint:
disable=too-many-locals,too-many-branches,too-many-nested-blocks") remains
intact and that any imports/functions within query.py (e.g., endpoint handlers
or helper functions defined there) are updated to match Black's formatting
before pushing the commit.
In `@src/llama_stack_configuration.py`:
- Around line 1-6: This file fails Black formatting; run Black on the module
(e.g., black src/llama_stack_configuration.py) or reformat the file to match
Black's style so CI passes; ensure the module docstring and any definitions such
as generate_configuration are formatted according to Black (consistent line
breaks/quotes/spacing) before committing.
In `@src/utils/vector_search.py`:
- Line 1: CI reports src/utils/vector_search.py is not formatted with Black; run
the Black formatter (e.g., black src/utils/vector_search.py) or apply the
project's Black config to reformat the file so it matches CI; ensure the
file-level docstring and all code in the vector_search.py module are reformatted
accordingly and re-run linters/CI to confirm the change.
---
Duplicate comments:
In `@src/app/endpoints/query.py`:
- Line 172: The moderation call is currently using the LLM-augmented payload
(e.g., inline_rag_context.context_text or responses_params.input) which may
contain injected context; change the moderation input to the original user query
(use the raw user input variable—e.g., responses_params.input_raw or the
original_request_input variable that holds the user message before augmentation)
so only the unmodified user query is moderated; update the two places noted (the
inline_rag_context usage around inline_rag_context.context_text and the
duplicate occurrence at the later block around lines 290-292) to pass the
original user input variable to the moderation function instead of
responses_params.input or inline_rag_context.context_text.
In `@src/utils/vector_search.py`:
- Around line 593-600: The code only reads chunk.chunk_metadata.document_id via
attribute access and ignores dict-style metadata; update the extraction logic in
the block that handles chunk and chunk_metadata (referencing chunk,
chunk_metadata, attributes, doc_id) to handle both attribute and mapping access:
if chunk.chunk_metadata is a dict, read document_id =
chunk.chunk_metadata.get("document_id"), otherwise use
getattr(chunk.chunk_metadata, "document_id", None); then set
attributes["document_id"]=doc_id if present and keep the existing offline URL
fallback using urljoin(constants.MIMIR_DOC_URL, doc_id) when "doc_url" not in
attributes and offline is true.
- Around line 125-127: The metadata label for chunk.source is incorrect: in the
metadata assembly (the metadata_parts list in src/utils/vector_search.py)
replace the mislabel "document_id: {chunk.source}" with the correct inline label
"source: {chunk.source}" so the model receives the proper source identifier;
update the conditional that appends to metadata_parts (the block referencing
chunk.source and metadata_parts) to use "source" rather than "document_id".
---
Nitpick comments:
In `@src/llama_stack_configuration.py`:
- Around line 250-253: Extract the duplicated prefix-stripping logic into a
small helper, e.g., strip_sentence_transformers_prefix(model_id: str), and
replace the inline logic in the block using provider_model_id/embedding_model
and the corresponding code in enrich_solr with calls to that helper; ensure the
helper returns the original string when the "sentence-transformers/" prefix is
not present and update references to use the helper to avoid duplication.
- Around line 379-438: The function enrich_solr is a bit large and would benefit
from modularization; extract logical sections into small helper functions to
improve readability and testability. Specifically, create helpers like
_add_solr_provider(ls_config, solr_config) to encapsulate the provider
construction and append logic (currently the block that builds solr_url_env,
collection_env, vector_field_env, content_field_env, embedding_model_env,
embedding_dim_env and the dict appended to ls_config["providers"]["vector_io"]),
and optionally _ensure_vector_io_provider_container(ls_config) to initialize
ls_config["providers"]["vector_io"] if missing; then update enrich_solr to call
these helpers while keeping the same behavior and constants
(constants.SOLR_PROVIDER_ID, SOLR_DEFAULT_*), ensuring no change to existing
condition checks (e.g., if not solr_config or not solr_config.get("enabled") and
the existing_providers check).
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/app/endpoints/query.py
- src/app/endpoints/streaming_query.py
- src/llama_stack_configuration.py
- src/utils/vector_search.py
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/llama_stack_configuration.py (1)
310-327: ⚠️ Potential issue | 🟠 Major

Prevent duplicate `provider_id` entries in `providers.vector_io`.

This appends providers unconditionally. Duplicate `rag_id` values (or re-enrichment over existing config) can produce duplicate `provider_id`s and ambiguous provider resolution.

🛠️ Proposed fix

  def construct_vector_io_providers_section(
      ls_config: dict[str, Any], byok_rag: list[dict[str, Any]]
  ) -> list[dict[str, Any]]:
  @@
-     # append new vector_io entries
+     # append new vector_io entries
+     existing_provider_ids = {p.get("provider_id") for p in output}
      for brag in byok_rag:
          if not brag.get("rag_id"):
              raise ValueError(f"BYOK RAG entry is missing required 'rag_id': {brag}")
          rag_id = brag["rag_id"]
          backend_name = f"byok_{rag_id}_storage"
          provider_id = f"byok_{rag_id}"
+         if provider_id in existing_provider_ids:
+             logger.debug("Skipping duplicate vector_io provider_id: %s", provider_id)
+             continue
+         existing_provider_ids.add(provider_id)
          output.append(
              {
                  "provider_id": provider_id,

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llama_stack_configuration.py` around lines 310 - 327, The loop over byok_rag unconditionally appends provider entries and can create duplicate provider_id values (provider_id = f"byok_{rag_id}") causing ambiguous resolution; modify the loop in llama_stack_configuration.py (the byok_rag iteration where provider_id and backend_name are built) to first check existing provider_ids in the accumulating output (or providers.vector_io collection) and skip or update the entry if provider_id already exists (e.g., compute existing_ids = {p["provider_id"] for p in output} and only append when provider_id not in existing_ids), ensuring duplicates are not added.
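The dedup guard above reduces to a simple seen-set pattern. A minimal sketch, using the field names assumed by the review (`rag_id`, `provider_id` of the form `byok_<rag_id>`):

```python
from typing import Any

def append_unique_providers(
    output: list[dict[str, Any]], byok_rag: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    # Track provider_ids already present so re-running enrichment
    # over an existing config stays idempotent.
    existing = {p.get("provider_id") for p in output}
    for brag in byok_rag:
        provider_id = f"byok_{brag['rag_id']}"
        if provider_id in existing:
            continue  # skip duplicates instead of appending twice
        existing.add(provider_id)
        output.append({"provider_id": provider_id})
    return output

result = append_unique_providers(
    [{"provider_id": "byok_docs"}],
    [{"rag_id": "docs"}, {"rag_id": "kb"}, {"rag_id": "kb"}],
)
print([p["provider_id"] for p in result])  # ['byok_docs', 'byok_kb']
```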
♻️ Duplicate comments (3)
src/app/endpoints/query.py (1)
157-173: ⚠️ Potential issue | 🟠 Major

Run moderation against the original user query, not the inline-injected prompt.

Because moderation consumes `responses_params.input`, inline RAG content can cause false blocks unrelated to user intent.

🛠️ Suggested direction

- turn_summary = await retrieve_response(
+ turn_summary = await retrieve_response(
      client,
      responses_params,
      query_request.shield_ids,
      vector_store_ids,
      rag_id_mapping,
+     moderation_input=query_request.query,
  )

  async def retrieve_response(
  @@
      rag_id_mapping: Optional[dict[str, str]] = None,
+     moderation_input: Optional[str] = None,
  ) -> TurnSummary:
  @@
- moderation_result = await run_shield_moderation(
-     client, cast(str, responses_params.input), shield_ids
- )
+ moderation_payload = (
+     moderation_input
+     if moderation_input is not None
+     else cast(str, responses_params.input)
+ )
+ moderation_result = await run_shield_moderation(
+     client, moderation_payload, shield_ids
+ )

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/query.py` around lines 157 - 173, The moderation is being run on responses_params.input after you inject inline RAG content; instead run moderation against the original user query (query_request.query) before or separately from building/injecting inline_rag_context so inline_rag_context.context_text cannot affect moderation. Update the flow around build_rag_context and prepare_responses_params so that prepare_responses_params is given the original query (or a separate field like user_input) for moderation and only after moderation passes attach inline_rag_context.context_text to the request; adjust code paths that reference responses_params.input to use the original query value for moderation checks (referencing build_rag_context, prepare_responses_params, responses_params.input, and inline_rag_context.context_text).

src/utils/vector_search.py (2)
125-127: ⚠️ Potential issue | 🟡 Minor

Use the correct metadata label for `chunk.source`.

`chunk.source` is a source identifier, not a document ID. The current label can mislead the injected context.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` around lines 125 - 127, The metadata label for chunk.source is incorrect; locate the code that appends to metadata_parts (the conditional using chunk.source) and replace the string "document_id: {chunk.source}" with a clear source label such as "source: {chunk.source}" (or "source_id: {chunk.source}") so the injected metadata correctly reflects that chunk.source is a source identifier rather than a document ID.
588-596: ⚠️ Potential issue | 🟠 Major

Handle Solr `chunk_metadata` as dict/object and propagate title/url metadata into `RAGChunk.attributes`.

The conversion still relies on attribute-only `chunk_metadata.document_id`, and doesn't carry `title`/`reference_url` consistently into chunk attributes. This is likely why Solr chunk metadata remains sparse.

🛠️ Proposed fix

- # For Solr chunks, also extract from chunk_metadata
- if hasattr(chunk, "chunk_metadata") and chunk.chunk_metadata:
-     if hasattr(chunk.chunk_metadata, "document_id"):
-         doc_id = chunk.chunk_metadata.document_id
-         attributes["document_id"] = doc_id
-         # Build URL if not already set
-         if "doc_url" not in attributes and offline and doc_id:
-             attributes["doc_url"] = urljoin(constants.MIMIR_DOC_URL, doc_id)
+ # Extract metadata from both dict-style and object-style payloads
+ doc_id, title, reference_url = _extract_solr_document_metadata(chunk)
+ if doc_id:
+     attributes["document_id"] = doc_id
+ if title:
+     attributes["title"] = title
+ if reference_url and "doc_url" not in attributes and not offline:
+     attributes["doc_url"] = reference_url
+ if "doc_url" not in attributes and offline and doc_id:
+     attributes["doc_url"] = urljoin(constants.MIMIR_DOC_URL, doc_id)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` around lines 588 - 596, The Solr chunk handling only reads chunk.chunk_metadata.document_id via attribute access and never copies title/reference_url into RAGChunk.attributes; update the conversion logic around chunk, chunk_metadata and attributes to accept chunk.chunk_metadata as either a dict or object (check both dict keys and object attributes), extract document_id, title and reference_url (or equivalent keys) into attributes["document_id"], attributes["title"], and attributes["reference_url"], and only build attributes["doc_url"] when missing using offline and document_id or fallback to reference_url via urljoin(constants.MIMIR_DOC_URL, doc_id) or the reference_url value; ensure you reference the same symbols chunk, chunk_metadata, attributes, offline, urljoin and constants.MIMIR_DOC_URL in your changes.
🧹 Nitpick comments (1)
src/llama_stack_configuration.py (1)
381-387: Complete `enrich_solr` docstring to match repository docstring standard.

Please include explicit Returns/Raises sections (even if Returns is None) to keep new function docs consistent with the project convention.

As per coding guidelines "All functions must have complete docstrings with brief descriptions" and "Follow Google Python docstring conventions for modules, classes, and functions with Parameters, Returns, Raises, and Attributes sections".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llama_stack_configuration.py` around lines 381 - 387, Update the enrich_solr function docstring to follow the project's Google-style standard: after the existing short description and Args section for ls_config and solr_config, add a Returns section (explicitly state "None") and a Raises section documenting any exceptions the function may raise (e.g., ValueError or KeyError if you validate inputs, or RuntimeError for Solr connection errors) or state "None" if it does not raise; ensure parameter names match ls_config and solr_config and keep the docstring concise and consistent with other functions in the repo.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/utils/vector_search.py`:
- Around line 280-281: The logger.debug call that logs the full Solr chunk
object (logger.debug("Extract doc ids from chunk: %s", chunk)) risks leaking
sensitive content; update the logging in the code that handles chunk objects
(the place where the variable chunk is processed, e.g., the Extract doc ids
logic) to only emit a non-sensitive summary such as chunk id, score/rank, and
safe metadata keys (e.g., chunk.get("id"), chunk.get("score"), chunk.get("meta",
{}).get("source")) and explicitly omit or avoid fields like "text", "content",
"payload", or any raw document fields; replace the raw chunk log with a debug
message that formats only these allowed summary fields.
- Around line 347-351: The comprehension for vector_store_ids_to_query currently
includes every non-Solr store returned by get_vector_store_ids, which can pull
from unintended stores; update it to restrict to only the configured BYOK stores
by intersecting the returned IDs with the provided vector_store_ids parameter
and excluding constants.SOLR_DEFAULT_VECTOR_STORE_ID. Locate the code building
vector_store_ids_to_query (uses get_vector_store_ids and
constants.SOLR_DEFAULT_VECTOR_STORE_ID) and change the filter to: result_ids =
await get_vector_store_ids(client, vector_store_ids); then set
vector_store_ids_to_query to the intersection of result_ids and the original
vector_store_ids (minus SOLR_DEFAULT_VECTOR_STORE_ID) so only explicitly
configured BYOK stores are queried.
---
Outside diff comments:
In `@src/llama_stack_configuration.py`:
- Around line 310-327: The loop over byok_rag unconditionally appends provider
entries and can create duplicate provider_id values (provider_id =
f"byok_{rag_id}") causing ambiguous resolution; modify the loop in
llama_stack_configuration.py (the byok_rag iteration where provider_id and
backend_name are built) to first check existing provider_ids in the accumulating
output (or providers.vector_io collection) and skip or update the entry if
provider_id already exists (e.g., compute existing_ids = {p["provider_id"] for p
in output} and only append when provider_id not in existing_ids), ensuring
duplicates are not added.
---
Duplicate comments:
In `@src/app/endpoints/query.py`:
- Around line 157-173: The moderation is being run on responses_params.input
after you inject inline RAG content; instead run moderation against the original
user query (query_request.query) before or separately from building/injecting
inline_rag_context so inline_rag_context.context_text cannot affect moderation.
Update the flow around build_rag_context and prepare_responses_params so that
prepare_responses_params is given the original query (or a separate field like
user_input) for moderation and only after moderation passes attach
inline_rag_context.context_text to the request; adjust code paths that reference
responses_params.input to use the original query value for moderation checks
(referencing build_rag_context, prepare_responses_params,
responses_params.input, and inline_rag_context.context_text).
In `@src/utils/vector_search.py`:
- Around line 125-127: The metadata label for chunk.source is incorrect; locate
the code that appends to metadata_parts (the conditional using chunk.source) and
replace the string "document_id: {chunk.source}" with a clear source label such
as "source: {chunk.source}" (or "source_id: {chunk.source}") so the injected
metadata correctly reflects that chunk.source is a source identifier rather than
a document ID.
- Around lines 588-596: The Solr chunk handling only reads chunk.chunk_metadata.document_id via attribute access and never copies title or reference_url into RAGChunk.attributes. Update the conversion logic around chunk, chunk_metadata, and attributes to accept chunk.chunk_metadata as either a dict or an object (check both dict keys and object attributes). Extract document_id, title, and reference_url (or equivalent keys) into attributes["document_id"], attributes["title"], and attributes["reference_url"]. Build attributes["doc_url"] only when it is missing: use offline and document_id with urljoin(constants.MIMIR_DOC_URL, doc_id), or fall back to the reference_url value. Ensure you reference the same symbols chunk, chunk_metadata, attributes, offline, urljoin, and constants.MIMIR_DOC_URL in your changes.
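The dict-or-object handling can be sketched as below; the field names and the MIMIR_DOC_URL value are assumptions taken from this review thread, not the real src/utils/vector_search.py:

```python
from typing import Any
from urllib.parse import urljoin

# Stand-in for constants.MIMIR_DOC_URL; the real value lives in src/constants.py.
MIMIR_DOC_URL = "https://mimir.corp.redhat.com/"


def _meta_get(meta: Any, key: str) -> Any:
    # chunk.chunk_metadata may arrive as a plain dict or as an object,
    # so read the key via whichever access path applies.
    if isinstance(meta, dict):
        return meta.get(key)
    return getattr(meta, key, None)


def build_attributes(chunk_metadata: Any, offline: bool) -> dict[str, Any]:
    attributes: dict[str, Any] = {}
    for key in ("document_id", "title", "reference_url"):
        value = _meta_get(chunk_metadata, key)
        if value is not None:
            attributes[key] = value
    # Build doc_url only when missing: offline docs resolve against Mimir,
    # otherwise fall back to reference_url if present.
    if "doc_url" not in attributes:
        doc_id = attributes.get("document_id")
        if offline and doc_id:
            attributes["doc_url"] = urljoin(MIMIR_DOC_URL, doc_id)
        elif attributes.get("reference_url"):
            attributes["doc_url"] = attributes["reference_url"]
    return attributes
```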
---
Nitpick comments:
In `@src/llama_stack_configuration.py`:
- Around lines 381-387: Update the enrich_solr function docstring to follow the project's Google-style standard. After the existing short description and the Args section for ls_config and solr_config, add a Returns section (explicitly stating "None") and a Raises section documenting any exceptions the function may raise (e.g., ValueError or KeyError if you validate inputs, or RuntimeError for Solr connection errors), or state "None" if it does not raise. Ensure parameter names match ls_config and solr_config, and keep the docstring concise and consistent with other functions in the repo.
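A sketch of the requested docstring shape; the signature, body, and raised exception here are invented for illustration and do not reflect the real enrich_solr implementation:

```python
def enrich_solr(ls_config: dict, solr_config: dict) -> None:
    """Register the configured Solr provider in the Llama Stack run config.

    Args:
        ls_config: Parsed Llama Stack run configuration, mutated in place.
        solr_config: Solr connection settings from lightspeed-stack.yaml.

    Returns:
        None

    Raises:
        ValueError: If solr_config is missing required connection fields.
    """
    if "url" not in solr_config:
        raise ValueError("solr_config must define 'url'")
    ls_config.setdefault("providers", []).append(solr_config)
```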
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/app/endpoints/query.py
src/llama_stack_configuration.py
src/utils/vector_search.py
vector_store_ids_to_query = [
    vs_id
    for vs_id in await get_vector_store_ids(client, vector_store_ids)
    if vs_id != constants.SOLR_DEFAULT_VECTOR_STORE_ID
]
Limit inline BYOK retrieval to configured BYOK vector stores.
This currently queries every non-Solr store returned by Llama Stack, which can pull chunks from unintended stores and violate configured-source behavior.
🛠️ Proposed fix
- vector_store_ids_to_query = [
+ configured_byok_store_ids = set(configuration.rag_id_mapping.keys())
+ vector_store_ids_to_query = [
vs_id
for vs_id in await get_vector_store_ids(client, vector_store_ids)
- if vs_id != constants.SOLR_DEFAULT_VECTOR_STORE_ID
+ if vs_id != constants.SOLR_DEFAULT_VECTOR_STORE_ID
+ and vs_id in configured_byok_store_ids
     ]

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/vector_search.py` around lines 347-351: The comprehension for vector_store_ids_to_query currently includes every non-Solr store returned by get_vector_store_ids, which can pull from unintended stores. Restrict it to only the configured BYOK stores by intersecting the returned IDs with the provided vector_store_ids parameter and excluding constants.SOLR_DEFAULT_VECTOR_STORE_ID. Locate the code building vector_store_ids_to_query (which uses get_vector_store_ids and constants.SOLR_DEFAULT_VECTOR_STORE_ID) and change the filter to: result_ids = await get_vector_store_ids(client, vector_store_ids); then set vector_store_ids_to_query to the intersection of result_ids and the original vector_store_ids (minus SOLR_DEFAULT_VECTOR_STORE_ID), so only explicitly configured BYOK stores are queried.
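The intersection filter can be sketched as below; get_vector_store_ids here is a simplified synchronous stand-in that deliberately mimics the over-broad return described above, and the constant value is invented:

```python
# Stand-in for constants.SOLR_DEFAULT_VECTOR_STORE_ID.
SOLR_DEFAULT_VECTOR_STORE_ID = "solr-default"


def get_vector_store_ids(client_stores: list[str], requested: list[str]) -> list[str]:
    # Simplified stand-in mimicking the reported behavior: it may return
    # stores beyond the ones that were explicitly requested.
    return client_stores


def select_byok_stores(available: list[str], configured: list[str]) -> list[str]:
    result_ids = get_vector_store_ids(available, configured)
    configured_set = set(configured)
    # Intersect with the configured IDs and drop the Solr store, so only
    # explicitly configured BYOK stores are queried inline.
    return [
        vs_id
        for vs_id in result_ids
        if vs_id != SOLR_DEFAULT_VECTOR_STORE_ID and vs_id in configured_set
    ]


print(select_byok_stores(
    ["byok-a", "byok-b", "solr-default", "unrelated"],
    ["byok-a", "byok-b"],
))
# → ['byok-a', 'byok-b']
```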
TL;DR
Motivation
To prioritize BYOK content, a mechanism to tune chunk scoring per vector store is necessary. The alternative was to create a client-side tool to change the behavior of the RAG tool, but that is not optimal for two reasons:
Always RAG with LCORE-side chunk prioritization avoids both issues.
Description
Adds configurable RAG strategies and chunk prioritization. Control which documentation sources to search, how to search them, and which sources are more important.
How It Works
Always RAG retrieves chunks from configured sources (BYOK and/or Solr) and injects them into the query before sending to the LLM. The LLM always has documentation context without calling tools.
Tool RAG lets the AI call the `file_search` tool during generation. This is the original behavior, enabled by default for backward compatibility.

Chunk Prioritization applies a `score_multiplier` per BYOK source. All sources are queried in parallel, chunk scores are multiplied by their source's weight, then merged and sorted. Top N chunks are selected across all sources. Solr chunks are appended after BYOK chunks without cross-source ranking (TBD, needs discussion / spike).

Configuration
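The merge-and-sort step described under How It Works can be sketched as a small stand-alone function (illustrative only; the real implementation queries sources in parallel and uses the project's own chunk model):

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    # Fields mirror the rag_chunks entries in the example response.
    content: str
    source: str
    score: float


def merge_byok_chunks(
    results_by_source: dict[str, list[Chunk]],
    multipliers: dict[str, float],
    top_n: int,
) -> list[Chunk]:
    merged: list[Chunk] = []
    for source, chunks in results_by_source.items():
        # Sources without an explicit score_multiplier keep weight 1.0.
        weight = multipliers.get(source, 1.0)
        for chunk in chunks:
            merged.append(Chunk(chunk.content, source, chunk.score * weight))
    # Sort across all sources, then keep the top N overall.
    merged.sort(key=lambda c: c.score, reverse=True)
    return merged[:top_n]


top = merge_byok_chunks(
    {
        "openshift-docs-part1": [Chunk("a", "openshift-docs-part1", 0.8)],
        "openshift-docs-part2": [Chunk("b", "openshift-docs-part2", 0.9)],
    },
    multipliers={"openshift-docs-part1": 1.5},  # boost part1
    top_n=2,
)
print([(c.source, round(c.score, 2)) for c in top])
# → [('openshift-docs-part1', 1.2), ('openshift-docs-part2', 0.9)]
```

A multiplier above 1.0 lets a lower-raw-score BYOK chunk outrank a higher one from an unweighted source, which is the prioritization effect this PR is after; Solr chunks would simply be appended after this merged list.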
In `lightspeed-stack.yaml`:

Chunk limits in `src/constants.py`:

Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Prerequisites: BYOK vector stores (FAISS) created with rag-content tool, OKP Solr instance, Llama Stack 0.4.3+, Lightspeed Stack Providers installed, OpenAI API key.
Configure chunk limits in `src/constants.py`:

Configure Lightspeed Stack - add BYOK RAG sources and RAG strategy to `lightspeed-stack.yaml`:

Run enrichment script - reads the lightspeed-stack config and generates an enriched llama-stack `run.yaml` with BYOK vector stores and the Solr provider registered:

Install Lightspeed Stack Providers:

Start Llama Stack:

Start Lightspeed Stack:

Test query — send to `/v1/query` or the streaming endpoint:

Response:
{
  "conversation_id": "...",
  "response": "Admission plugins in Red Hat OpenShift Container Platform are [...]",
  "rag_chunks": [
    {
      "content": "...",
      "source": "openshift-docs-part1",
      "score": 1.0038,
      "attributes": {
        "doc_url": "https://www.redhat.com/data/architecture/admission-plug-ins.txt",
        "title": "Admission plugins",
        "document_id": "file-7a70ef22c4a646f2a6f657c66961ba2c"
      }
    },
    {
      "content": "...",
      "source": "openshift-docs-part2",
      "score": 0.926,
      "attributes": {
        "doc_url": "https://www.redhat.com/web_console/dynamic-plugin/overview-dynamic-plugin.txt",
        "title": "Overview of dynamic plugins",
        "document_id": "file-b266f575a95a4da19d7ba058fd980f00"
      }
    },
    {
      "content": "...",
      "source": "OKP Solr",
      "score": 63.996,
      "attributes": {
        "document_id": "/documentation/en-us/openshift_container_platform/4.19/html-single/architecture/index"
      }
    }
  ],
  "referenced_documents": [
    {
      "doc_title": "Admission plugins",
      "doc_url": "https://www.redhat.com/data/architecture/admission-plug-ins.txt",
      "source": "openshift-docs-part1"
    },
    {
      "doc_title": "Overview of dynamic plugins",
      "doc_url": "https://www.redhat.com/web_console/dynamic-plugin/overview-dynamic-plugin.txt",
      "source": "openshift-docs-part2"
    },
    {
      "doc_title": null,
      "doc_url": "https://mimir.corp.redhat.com/documentation/en-us/openshift_container_platform/4.19/html-single/architecture/index",
      "source": "OKP Solr"
    }
  ],
  "truncated": false,
  "input_tokens": 3736,
  "output_tokens": 448
}

Summary by CodeRabbit
New Features
Configuration
Documentation