Merged
111 changes: 94 additions & 17 deletions docs/byok_guide.md
@@ -16,7 +16,7 @@ The BYOK (Bring Your Own Knowledge) feature in Lightspeed Core enables users to
* [Step 2: Create Vector Database](#step-2-create-vector-database)
* [Step 3: Configure Embedding Model](#step-3-configure-embedding-model)
* [Step 4: Configure Llama Stack](#step-4-configure-llama-stack)
* [Step 5: Configure RAG Strategy](#step-5-configure-rag-strategy)
* [Supported Vector Database Types](#supported-vector-database-types)
* [Configuration Examples](#configuration-examples)
* [Conclusion](#conclusion)
@@ -34,27 +34,58 @@ BYOK (Bring Your Own Knowledge) is Lightspeed Core's implementation of Retrieval

## How BYOK Works

BYOK knowledge sources can be queried in two complementary modes, configured independently:

### Inline RAG

Context is fetched from your BYOK vector stores and/or OKP (Offline Knowledge Portal) and injected into the prompt before the LLM request. No tool calls are required.

```mermaid
graph TD
A[User Query] --> B[Fetch Context]
B --> C[BYOK Vector Stores]
B --> D[OKP Vector Stores]
C --> E[Retrieved Chunks]
D --> E
E --> F[Inject Context into Prompt]
F --> G[LLM Generates Response]
G --> H[Response to User]
```

### Tool RAG (on-demand retrieval)

The LLM can call the `file_search` tool during generation when it decides external knowledge is needed. Both BYOK vector stores and OKP are supported in Tool RAG mode.

```mermaid
graph TD
A[User Query] --> P{Inline RAG enabled?}
P -->|Yes| Q[Fetch Context]
Q --> R[BYOK / OKP Vector Stores]
R --> S[Inject Context into Prompt]
S --> B[LLM]
P -->|No| B
B --> C{Need External Knowledge?}
C -->|Yes| D[file_search Tool]
C -->|No| E[Generate Response]
D --> F[BYOK / OKP Vector Stores]
F --> G[Retrieve Relevant Context]
G --> B
E --> H[Response to User]
```

Both modes rely on:
- **Vector Database**: Your indexed knowledge sources stored as vector embeddings
- **Embedding Model**: Converts queries and documents into vector representations for similarity matching

Inline RAG additionally supports:
- **Score Multiplier**: Optional weight applied per BYOK vector store when mixing multiple sources. Allows custom prioritization of content.

> [!NOTE]
> OKP and BYOK scores are not directly comparable (different scoring systems), so
> `score_multiplier` does not apply to OKP results. To control the amount of retrieved
> context, set the `BYOK_RAG_MAX_CHUNKS` and `OKP_RAG_MAX_CHUNKS` constants in `src/constants.py`
> (defaults: 10 and 5 respectively). For Tool RAG, use `TOOL_RAG_MAX_CHUNKS` (default: 10).
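
To make the weighting concrete, here is a minimal sketch of how Inline RAG could combine results from several BYOK stores. This is an illustration only, not the actual Lightspeed implementation; `Chunk` and `merge_byok_chunks` are hypothetical names, and only `BYOK_RAG_MAX_CHUNKS` and `score_multiplier` come from the documentation above.

```python
from dataclasses import dataclass

BYOK_RAG_MAX_CHUNKS = 10  # default from src/constants.py


@dataclass
class Chunk:
    text: str
    score: float


def merge_byok_chunks(results: dict[str, list[Chunk]],
                      multipliers: dict[str, float]) -> list[Chunk]:
    """Weight each store's scores by its score_multiplier, then keep the top chunks."""
    weighted = [
        Chunk(c.text, c.score * multipliers.get(store, 1.0))
        for store, chunks in results.items()
        for c in chunks
    ]
    # Higher weighted score = higher priority in the injected context
    weighted.sort(key=lambda c: c.score, reverse=True)
    return weighted[:BYOK_RAG_MAX_CHUNKS]
```

With a multiplier of 2.0, a chunk scored 0.5 by one store (weighted to 1.0) outranks a 0.6 chunk from an unweighted store, which is the kind of custom prioritization `score_multiplier` enables.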

---

## Prerequisites
@@ -244,12 +275,58 @@ registered_resources:

**⚠️ Important**: The `vector_store_id` value must exactly match the ID you provided when creating the vector database using the rag-content tool. This identifier links your Llama Stack configuration to the specific vector database index you created.

> [!TIP]
> Instead of manually editing `run.yaml`, you can declare your knowledge sources in the `byok_rag`
> section of `lightspeed-stack.yaml`. The lightspeed-stack service automatically generates the required configuration
> at startup.
>
> ```yaml
> byok_rag:
> - rag_id: my-docs # Unique identifier for this knowledge source
> rag_type: inline::faiss
> embedding_model: sentence-transformers/all-mpnet-base-v2
> embedding_dimension: 768
> vector_db_id: your-index-id # Llama Stack vector store ID (from index generation)
> db_path: /path/to/vector_db/faiss_store.db
> score_multiplier: 1.0 # Optional: weight results when mixing multiple sources
> ```
>
> When multiple BYOK sources are configured, `score_multiplier` adjusts the relative importance of
> each store's results during Inline RAG retrieval. Values above 1.0 boost a store; below 1.0 reduce it.

### Step 5: Configure RAG Strategy

Add a `rag` section to your `lightspeed-stack.yaml` to choose how BYOK and OKP knowledge is used.
Each list entry is a `rag_id` from `byok_rag`, or the special value `okp` for OKP.

```yaml
rag:
# Inline RAG: inject context before the LLM request (no tool calls needed)
inline:
- my-docs # rag_id from byok_rag
- okp # include OKP context inline

# Tool RAG: the LLM can call file_search to retrieve context on demand
# Omit to use all registered BYOK stores (backward compatibility)
tool:
- my-docs # expose this BYOK store as the file_search tool
- okp # expose OKP as the file_search tool

# OKP provider settings (only relevant when okp is listed above)
okp:
offline: true # true = use parent_id for source URLs, false = use reference_url
```

Both modes can be enabled simultaneously. Choose based on your latency and control preferences:

| Mode | When context is fetched | Tool call needed | score_multiplier |
|------|-------------------------|------------------|------------------|
| Inline RAG | With every query | No | Yes (BYOK only) |
| Tool RAG | On LLM demand | Yes | No |

**Review thread:**

> **Contributor**: Would be nice to have some documentation on what the intended behavior is when both modes are enabled (always RAG and tool RAG), but that can be in a separate PR.
>
> **@are-ces** (author, Mar 2, 2026): I feel this would be a spike; we would need to test the joint behavior, e.g. in what cases the tool is being called. Should I create a ticket for that, @Anxhela21?
>
> **Contributor**: Yes, let's open a separate ticket for this. I wonder if we want some evaluation outcome included in this as well.

> [!TIP]
> A ready-to-use example combining BYOK and OKP is available at
> [`examples/lightspeed-stack-byok-okp-rag.yaml`](../examples/lightspeed-stack-byok-okp-rag.yaml).

---

74 changes: 66 additions & 8 deletions docs/config.md
@@ -110,15 +110,32 @@ Microsoft Entra ID authentication attributes for Azure.

BYOK (Bring Your Own Knowledge) RAG configuration.

Each entry registers a local vector store. The `rag_id` is the
identifier used in `rag.inline` and `rag.tool` to select which stores to use.

Example:

```yaml
byok_rag:
- rag_id: my-docs # referenced in rag.inline / rag.tool
rag_type: inline::faiss
embedding_model: sentence-transformers/all-MiniLM-L6-v2
embedding_dimension: 384
vector_db_id: vs_abc123
db_path: /path/to/faiss_store.db
score_multiplier: 1.0
```


| Field | Type | Description |
|-------|------|-------------|
| rag_id | string | Unique RAG ID |
| rag_type | string | Type of RAG database (e.g. `inline::faiss`). |
| embedding_model | string | Embedding model identifier. |
| embedding_dimension | integer | Dimensionality of embedding vectors. |
| vector_db_id | string | Vector database identifier. |
| db_path | string | Path to RAG database. |
| score_multiplier | number | Multiplier applied to relevance scores from this vector store when querying multiple sources. Values > 1 boost results; values < 1 reduce them. Default: 1.0. |


## CORSConfiguration
Expand Down Expand Up @@ -170,7 +187,7 @@ Global service configuration.
| azure_entra_id | | |
| splunk | | Splunk HEC configuration for sending telemetry events. |
| deployment_environment | string | Deployment environment name (e.g., 'development', 'staging', 'production'). Used in telemetry events. |
| rag | | RAG strategy configuration (OKP and BYOK). Controls pre-query (Inline RAG) and tool-based (Tool RAG) retrieval. |


## ConversationHistoryConfiguration
@@ -520,19 +537,60 @@ the service can handle requests concurrently.
| cors | | Cross-Origin Resource Sharing configuration for cross-domain requests |


## SolrConfiguration
## RagConfiguration


Top-level RAG strategy configuration. Controls two complementary retrieval modes:

- **Inline RAG**: context is fetched from the listed sources and injected before the
LLM request.
- **Tool RAG**: the LLM can call the `file_search` tool during generation to retrieve
context on demand from the listed vector stores. Supports both BYOK and OKP.

Each strategy is configured as a list of RAG IDs referencing entries in `byok_rag`.
The special ID `okp` activates the OKP provider (no `byok_rag` entry needed).

**Backward compatibility**: omitting `tool` uses all registered BYOK vector stores
(equivalent to the old `tool.byok.enabled = True`). Omitting `inline` means no
context is injected before the LLM request.

Example:

```yaml
rag:
inline:
- my-docs # inject context from my-docs before the LLM request
tool:
- okp # LLM can search OKP as a tool
- my-docs # LLM can also search my-docs as a tool

okp:
offline: true # use parent_id for OKP URL construction
```


| Field | Type | Description |
|-------|------|-------------|
| inline | list[string] | RAG IDs whose content is injected before the LLM request. Use `okp` for OKP. Empty by default (no inline RAG). |
| tool | list[string] or null | RAG IDs exposed as a `file_search` tool the LLM can invoke. Use `okp` to include OKP. When omitted, all registered BYOK vector stores are used (backward compatibility). |


## OkpConfiguration

OKP (Offline Knowledge Portal) provider settings. Only used when `okp` is listed in `rag.inline` or `rag.tool`.

Example:

```yaml
okp:
offline: true # use parent_id for OKP URL construction
chunk_filter_query: "is_chunk:true"
```

| Field | Type | Description |
|-------|------|-------------|
| offline | boolean | When `true` (default), use `parent_id` for OKP chunk source URLs. When `false`, use `reference_url`. |
| chunk_filter_query | string | OKP filter query (`fq`) applied to every OKP search request. Defaults to `"is_chunk:true"`. Extend with `AND` for extra constraints. |
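
For example, a configuration that switches OKP to online URL construction and narrows results with an extra filter clause might look like the following (the `product` constraint is illustrative, taken from the filter-query syntax described above):

```yaml
okp:
  offline: false                 # build source URLs from reference_url
  chunk_filter_query: "is_chunk:true AND product:*openshift*"
```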


## SplunkConfiguration
98 changes: 67 additions & 31 deletions docs/openapi.json
@@ -5503,6 +5503,13 @@
"format": "file-path",
"title": "DB path",
"description": "Path to RAG database."
},
"score_multiplier": {
"type": "number",
"exclusiveMinimum": 0.0,
"title": "Score multiplier",
"description": "Multiplier applied to relevance scores from this vector store. Used to weight results when querying multiple knowledge sources. Values > 1 boost this store's results; values < 1 reduce them.",
"default": 1.0
}
},
"additionalProperties": false,
@@ -5714,17 +5721,15 @@
"description": "Deployment environment name (e.g., 'development', 'staging', 'production'). Used in telemetry events.",
"default": "development"
},
"rag": {
"$ref": "#/components/schemas/RagConfiguration",
"title": "RAG configuration",
"description": "Configuration for all RAG strategies (inline and tool-based)."
},
"okp": {
"$ref": "#/components/schemas/OkpConfiguration",
"title": "OKP configuration",
"description": "OKP provider settings. Only used when 'okp' is listed in rag.inline or rag.tool."
}
},
"additionalProperties": false,
@@ -7575,6 +7580,26 @@
"title": "OAuthFlows",
"description": "Defines the configuration for the supported OAuth 2.0 flows."
},
"OkpConfiguration": {
"properties": {
"offline": {
"type": "boolean",
"title": "OKP offline mode",
"description": "When True, use parent_id for OKP chunk source URLs. When False, use reference_url for chunk source URLs.",
"default": true
},
"chunk_filter_query": {
"type": "string",
"title": "OKP chunk filter query",
"description": "OKP filter query applied to every OKP search request. Defaults to 'is_chunk:true' to restrict results to chunk documents. To add extra constraints, extend the expression using boolean syntax, e.g. 'is_chunk:true AND product:*openshift*'.",
"default": "is_chunk:true"
}
},
"additionalProperties": false,
"type": "object",
"title": "OkpConfiguration",
"description": "OKP (Offline Knowledge Portal) provider configuration.\n\nControls provider-specific behaviour for the OKP vector store.\nOnly relevant when ``\"okp\"`` is listed in ``rag.inline`` or ``rag.tool``."
},
"OpenIdConnectSecurityScheme": {
"properties": {
"description": {
@@ -8749,6 +8774,37 @@
"title": "RHIdentityConfiguration",
"description": "Red Hat Identity authentication configuration."
},
"RagConfiguration": {
"properties": {
"inline": {
"items": {
"type": "string"
},
"type": "array",
"title": "Inline RAG IDs",
"description": "RAG IDs whose sources are injected as context before the LLM call. Use 'okp' to enable OKP inline RAG. Empty by default (no inline RAG)."
},
"tool": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"title": "Tool RAG IDs",
"description": "RAG IDs made available to the LLM as a file_search tool. Use 'okp' to include the OKP vector store. When omitted, all registered BYOK vector stores are used (backward compatibility)."
}
},
"additionalProperties": false,
"type": "object",
"title": "RagConfiguration",
"description": "RAG strategy configuration.\n\nControls which RAG sources are used for inline and tool-based retrieval.\n\nEach strategy lists RAG IDs to include. The special ID ``\"okp\"``, defined in constants,\nactivates the OKP provider; all other IDs refer to entries in ``byok_rag``.\n\nBackward compatibility:\n - ``inline`` defaults to ``[]`` (no inline RAG).\n - ``tool`` defaults to ``None``, which means all registered vector stores\n are used (identical to the previous ``tool.byok.enabled = True`` default)."
},
"ReadinessResponse": {
"properties": {
"ready": {
@@ -9260,26 +9316,6 @@
}
]
},
"SplunkConfiguration": {
"properties": {
"enabled": {