Merged
111 changes: 94 additions & 17 deletions docs/byok_guide.md
@@ -16,7 +16,7 @@ The BYOK (Bring Your Own Knowledge) feature in Lightspeed Core enables users to
* [Step 2: Create Vector Database](#step-2-create-vector-database)
* [Step 3: Configure Embedding Model](#step-3-configure-embedding-model)
* [Step 4: Configure Llama Stack](#step-4-configure-llama-stack)
* [Step 5: Configure RAG Strategy](#step-5-configure-rag-strategy)
* [Supported Vector Database Types](#supported-vector-database-types)
* [Configuration Examples](#configuration-examples)
* [Conclusion](#conclusion)
@@ -34,27 +34,58 @@ BYOK (Bring Your Own Knowledge) is Lightspeed Core's implementation of Retrieval

## How BYOK Works

BYOK knowledge sources can be queried in two complementary modes, configured independently:

### Inline RAG

Context is fetched from your BYOK vector stores and/or OKP (Offline Knowledge Portal) and injected into the prompt before the LLM request. No tool calls are required.

```mermaid
graph TD
A[User Query] --> B[Fetch Context]
B --> C[BYOK Vector Stores]
B --> D[OKP Vector Stores]
C --> E[Retrieved Chunks]
D --> E
E --> F[Inject Context into Prompt]
F --> G[LLM Generates Response]
G --> H[Response to User]
```

### Tool RAG (on-demand retrieval)

The LLM can call the `file_search` tool during generation when it decides external knowledge is needed. Both BYOK vector stores and OKP are supported in Tool RAG mode.

```mermaid
graph TD
A[User Query] --> P{Inline RAG enabled?}
P -->|Yes| Q[Fetch Context]
Q --> R[BYOK / OKP Vector Stores]
R --> S[Inject Context into Prompt]
S --> B[LLM]
P -->|No| B
B --> C{Need External Knowledge?}
C -->|Yes| D[file_search Tool]
C -->|No| E[Generate Response]
D --> F[BYOK / OKP Vector Stores]
F --> G[Retrieve Relevant Context]
G --> B
E --> H[Response to User]
```

Both modes rely on:
- **Vector Database**: Your indexed knowledge sources stored as vector embeddings
- **Embedding Model**: Converts queries and documents into vector representations for similarity matching

Inline RAG additionally supports:
- **Score Multiplier**: Optional weight applied per BYOK vector store when mixing multiple sources. Allows custom prioritization of content.

> [!NOTE]
> OKP and BYOK scores are not directly comparable (different scoring systems), so
> `score_multiplier` does not apply to OKP results. To control the amount of retrieved
> context, set the `BYOK_RAG_MAX_CHUNKS` and `OKP_RAG_MAX_CHUNKS` constants in `src/constants.py`
> (defaults: 10 and 5 respectively). For Tool RAG, use `TOOL_RAG_MAX_CHUNKS` (default: 10).
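
To make the weighting concrete, here is a minimal sketch of how Inline RAG could combine results from several BYOK stores. This is an illustration only, not the actual Lightspeed implementation; `Chunk` and `merge_byok_chunks` are hypothetical names, and only `BYOK_RAG_MAX_CHUNKS` and `score_multiplier` come from the documentation above.

```python
from dataclasses import dataclass

BYOK_RAG_MAX_CHUNKS = 10  # default from src/constants.py


@dataclass
class Chunk:
    text: str
    score: float


def merge_byok_chunks(results: dict[str, list[Chunk]],
                      multipliers: dict[str, float]) -> list[Chunk]:
    """Weight each store's scores by its score_multiplier, then keep the top chunks."""
    weighted = [
        Chunk(c.text, c.score * multipliers.get(store, 1.0))
        for store, chunks in results.items()
        for c in chunks
    ]
    # Higher weighted score = higher priority in the injected context
    weighted.sort(key=lambda c: c.score, reverse=True)
    return weighted[:BYOK_RAG_MAX_CHUNKS]
```

With a multiplier of 2.0, a chunk scored 0.5 by one store (weighted to 1.0) outranks a 0.6 chunk from an unweighted store, which is the kind of custom prioritization `score_multiplier` enables.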

---

## Prerequisites
@@ -244,12 +275,58 @@ registered_resources:

**⚠️ Important**: The `vector_store_id` value must exactly match the ID you provided when creating the vector database using the rag-content tool. This identifier links your Llama Stack configuration to the specific vector database index you created.

> [!TIP]
> Instead of manually editing `run.yaml`, you can declare your knowledge sources in the `byok_rag`
> section of `lightspeed-stack.yaml`. The lightspeed-stack service automatically generates the required configuration
> at startup.
>
> ```yaml
> byok_rag:
> - rag_id: my-docs # Unique identifier for this knowledge source
> rag_type: inline::faiss
> embedding_model: sentence-transformers/all-mpnet-base-v2
> embedding_dimension: 768
> vector_db_id: your-index-id # Llama Stack vector store ID (from index generation)
> db_path: /path/to/vector_db/faiss_store.db
> score_multiplier: 1.0 # Optional: weight results when mixing multiple sources
> ```
>
> When multiple BYOK sources are configured, `score_multiplier` adjusts the relative importance of
> each store's results during Inline RAG retrieval. Values above 1.0 boost a store; below 1.0 reduce it.

### Step 5: Configure RAG Strategy

Add a `rag` section to your `lightspeed-stack.yaml` to choose how BYOK and OKP knowledge is used.
Each list entry is a `rag_id` from `byok_rag`, or the special value `okp` for OKP.

```yaml
rag:
# Inline RAG: inject context before the LLM request (no tool calls needed)
inline:
- my-docs # rag_id from byok_rag
- okp # include OKP context inline

# Tool RAG: the LLM can call file_search to retrieve context on demand
# Omit to use all registered BYOK stores (backward compatibility)
tool:
- my-docs # expose this BYOK store as the file_search tool
- okp # expose OKP as the file_search tool

# OKP provider settings (only relevant when okp is listed above)
okp:
offline: true # true = use parent_id for source URLs, false = use reference_url
```

Both modes can be enabled simultaneously. Choose based on your latency and control preferences:

| Mode | When context is fetched | Tool call needed | score_multiplier |
|------|-------------------------|------------------|------------------|
| Inline RAG | With every query | No | Yes (BYOK only) |
| Tool RAG | On LLM demand | Yes | No |

**Review thread:**

> **Contributor**: Would be nice to have some documentation on what the intended behavior is when both modes are enabled (always RAG and tool RAG), but that can be in a separate PR.
>
> **@are-ces** (author, Mar 2, 2026): I feel this would be a spike; we would need to test the joint behavior, e.g. in what cases the tool is being called. Should I create a ticket for that, @Anxhela21?
>
> **Contributor**: Yes, let's open a separate ticket for this. I wonder if we want some evaluation outcome included in this as well.

> [!TIP]
> A ready-to-use example combining BYOK and OKP is available at
> [`examples/lightspeed-stack-byok-okp-rag.yaml`](../examples/lightspeed-stack-byok-okp-rag.yaml).

---

74 changes: 66 additions & 8 deletions docs/config.md
@@ -110,15 +110,32 @@ Microsoft Entra ID authentication attributes for Azure.

BYOK (Bring Your Own Knowledge) RAG configuration.

Each entry registers a local vector store. The `rag_id` is the
identifier used in `rag.inline` and `rag.tool` to select which stores to use.

Example:

```yaml
byok_rag:
- rag_id: my-docs # referenced in rag.inline / rag.tool
rag_type: inline::faiss
embedding_model: sentence-transformers/all-MiniLM-L6-v2
embedding_dimension: 384
vector_db_id: vs_abc123
db_path: /path/to/faiss_store.db
score_multiplier: 1.0
```


| Field | Type | Description |
|-------|------|-------------|
| rag_id | string | Unique RAG ID |
| rag_type | string | Type of RAG database (e.g. `inline::faiss`). |
| embedding_model | string | Embedding model identifier. |
| embedding_dimension | integer | Dimensionality of embedding vectors. |
| vector_db_id | string | Vector database identifier. |
| db_path | string | Path to RAG database. |
| score_multiplier | number | Multiplier applied to relevance scores from this vector store when querying multiple sources. Values > 1 boost results; values < 1 reduce them. Default: 1.0. |


## CORSConfiguration
Expand Down Expand Up @@ -170,7 +187,7 @@ Global service configuration.
| azure_entra_id | | |
| splunk | | Splunk HEC configuration for sending telemetry events. |
| deployment_environment | string | Deployment environment name (e.g., 'development', 'staging', 'production'). Used in telemetry events. |
| rag | | RAG strategy configuration (OKP and BYOK). Controls pre-query (Inline RAG) and tool-based (Tool RAG) retrieval. |


## ConversationHistoryConfiguration
@@ -520,19 +537,60 @@ the service can handle requests concurrently.
| cors | | Cross-Origin Resource Sharing configuration for cross-domain requests |


## SolrConfiguration
## RagConfiguration


Top-level RAG strategy configuration. Controls two complementary retrieval modes:

- **Inline RAG**: context is fetched from the listed sources and injected before the
LLM request.
- **Tool RAG**: the LLM can call the `file_search` tool during generation to retrieve
context on demand from the listed vector stores. Supports both BYOK and OKP.

Each strategy is configured as a list of RAG IDs referencing entries in `byok_rag`.
The special ID `okp` activates the OKP provider (no `byok_rag` entry needed).

**Backward compatibility**: omitting `tool` uses all registered BYOK vector stores
(equivalent to the old `tool.byok.enabled = True`). Omitting `inline` means no
context is injected before the LLM request.

Example:

```yaml
rag:
inline:
- my-docs # inject context from my-docs before the LLM request
tool:
- okp # LLM can search OKP as a tool
- my-docs # LLM can also search my-docs as a tool

okp:
offline: true # use parent_id for OKP URL construction
```


| Field | Type | Description |
|-------|------|-------------|
| inline | list[string] | RAG IDs whose content is injected before the LLM request. Use `okp` for OKP. Empty by default (no inline RAG). |
| tool | list[string] or null | RAG IDs exposed as a `file_search` tool the LLM can invoke. Use `okp` to include OKP. When omitted, all registered BYOK vector stores are used (backward compatibility). |


## OkpConfiguration

OKP (Offline Knowledge Portal) provider settings. Only used when `okp` is listed in `rag.inline` or `rag.tool`.

Example:

```yaml
okp:
offline: true # use parent_id for OKP URL construction
chunk_filter_query: "is_chunk:true"
```

| Field | Type | Description |
|-------|------|-------------|
| offline | boolean | When `true` (default), use `parent_id` for OKP chunk source URLs. When `false`, use `reference_url`. |
| chunk_filter_query | string | OKP filter query (`fq`) applied to every OKP search request. Defaults to `"is_chunk:true"`. Extend with `AND` for extra constraints. |
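
For example, a configuration that switches OKP to online URL construction and narrows results with an extra filter clause might look like the following (the `product` constraint is illustrative, taken from the filter-query syntax described above):

```yaml
okp:
  offline: false                 # build source URLs from reference_url
  chunk_filter_query: "is_chunk:true AND product:*openshift*"
```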


## SplunkConfiguration
98 changes: 67 additions & 31 deletions docs/openapi.json
@@ -5503,6 +5503,13 @@
"format": "file-path",
"title": "DB path",
"description": "Path to RAG database."
},
"score_multiplier": {
"type": "number",
"exclusiveMinimum": 0.0,
"title": "Score multiplier",
"description": "Multiplier applied to relevance scores from this vector store. Used to weight results when querying multiple knowledge sources. Values > 1 boost this store's results; values < 1 reduce them.",
"default": 1.0
}
},
"additionalProperties": false,
@@ -5714,17 +5721,15 @@
"description": "Deployment environment name (e.g., 'development', 'staging', 'production'). Used in telemetry events.",
"default": "development"
},
"rag": {
"$ref": "#/components/schemas/RagConfiguration",
"title": "RAG configuration",
"description": "Configuration for all RAG strategies (inline and tool-based)."
},
"okp": {
"$ref": "#/components/schemas/OkpConfiguration",
"title": "OKP configuration",
"description": "OKP provider settings. Only used when 'okp' is listed in rag.inline or rag.tool."
}
},
"additionalProperties": false,
@@ -7575,6 +7580,26 @@
"title": "OAuthFlows",
"description": "Defines the configuration for the supported OAuth 2.0 flows."
},
"OkpConfiguration": {
"properties": {
"offline": {
"type": "boolean",
"title": "OKP offline mode",
"description": "When True, use parent_id for OKP chunk source URLs. When False, use reference_url for chunk source URLs.",
"default": true
},
"chunk_filter_query": {
"type": "string",
"title": "OKP chunk filter query",
"description": "OKP filter query applied to every OKP search request. Defaults to 'is_chunk:true' to restrict results to chunk documents. To add extra constraints, extend the expression using boolean syntax, e.g. 'is_chunk:true AND product:*openshift*'.",
"default": "is_chunk:true"
}
},
"additionalProperties": false,
"type": "object",
"title": "OkpConfiguration",
"description": "OKP (Offline Knowledge Portal) provider configuration.\n\nControls provider-specific behaviour for the OKP vector store.\nOnly relevant when ``\"okp\"`` is listed in ``rag.inline`` or ``rag.tool``."
},
"OpenIdConnectSecurityScheme": {
"properties": {
"description": {
@@ -8749,6 +8774,37 @@
"title": "RHIdentityConfiguration",
"description": "Red Hat Identity authentication configuration."
},
"RagConfiguration": {
"properties": {
"inline": {
"items": {
"type": "string"
},
"type": "array",
"title": "Inline RAG IDs",
"description": "RAG IDs whose sources are injected as context before the LLM call. Use 'okp' to enable OKP inline RAG. Empty by default (no inline RAG)."
},
"tool": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"title": "Tool RAG IDs",
"description": "RAG IDs made available to the LLM as a file_search tool. Use 'okp' to include the OKP vector store. When omitted, all registered BYOK vector stores are used (backward compatibility)."
}
},
"additionalProperties": false,
"type": "object",
"title": "RagConfiguration",
"description": "RAG strategy configuration.\n\nControls which RAG sources are used for inline and tool-based retrieval.\n\nEach strategy lists RAG IDs to include. The special ID ``\"okp\"``, defined in constants,\nactivates the OKP provider; all other IDs refer to entries in ``byok_rag``.\n\nBackward compatibility:\n - ``inline`` defaults to ``[]`` (no inline RAG).\n - ``tool`` defaults to ``None``, which means all registered vector stores\n are used (identical to the previous ``tool.byok.enabled = True`` default)."
},
"ReadinessResponse": {
"properties": {
"ready": {
@@ -9260,26 +9316,6 @@
}
]
},
"SplunkConfiguration": {
"properties": {
"enabled": {