23 commits
37bae1c
feat(llm-router): Initial LLMRouter extension
bsbodden Feb 16, 2026
f04dea3
test(llm-router): Add unit tests for schema validation
bsbodden Feb 16, 2026
500a3e1
fix(llm-router): fix Pydantic initialization and test assertions
bsbodden Feb 16, 2026
2705803
test(llm-router): simplify test assertions for semantic matching
bsbodden Feb 16, 2026
6a4dc2e
docs(llm-router): add comprehensive DESIGN.md
bsbodden Feb 16, 2026
b8a4d1f
feat(llm-router): add pretrained default config with pre-computed emb…
bsbodden Feb 16, 2026
c5da07e
feat(llm-router): add AsyncLLMRouter and update exports
bsbodden Feb 16, 2026
f0a1aa6
test(llm-router): add async and pretrained integration tests
bsbodden Feb 16, 2026
f596cc8
docs(llm-router): add user guide notebook and update DESIGN.md
bsbodden Feb 16, 2026
b91a938
fix(llm-router): prioritize redis_client over default redis_url in cl…
bsbodden Feb 16, 2026
edbc38b
fix(llm-router): add threshold validation and empty-tiers guard
bsbodden Feb 17, 2026
d8b4eb9
fix(llm-router): address PR review comments
bsbodden Feb 25, 2026
227ce57
refactor(llm-router): consolidate into SemanticRouter with backward c…
bsbodden Mar 2, 2026
49b9ed9
fix(llm-router): address Copilot review comments
bsbodden Mar 2, 2026
4b130c5
fix(router): clean up route_config keys in delete() method
bsbodden Mar 2, 2026
c4db563
fix(router): resolve mypy and Pydantic validation errors
bsbodden Mar 2, 2026
bd827a3
fix(nltk): add retry logic for NLTK download race condition
bsbodden Mar 2, 2026
4fe9e90
style: apply black formatting
bsbodden Mar 2, 2026
56e2013
fix(nltk): improve retry logic to handle corrupted downloads
bsbodden Mar 2, 2026
ec0698b
fix(docs): correct schema import paths in LLM router notebook
bsbodden Mar 2, 2026
00031cc
fix(review): address Copilot code quality issues
bsbodden Mar 3, 2026
c3a7d19
fix(types): filter None values in alternatives list comprehension
bsbodden Mar 3, 2026
8b54c53
chore: trigger Copilot review
bsbodden Mar 3, 2026
1,592 changes: 1,592 additions & 0 deletions docs/user_guide/13_llm_router.ipynb
Collaborator


This is still very much in the old model.

Large diffs are not rendered by default.

362 changes: 362 additions & 0 deletions redisvl/extensions/llm_router/DESIGN.md
Collaborator


file shouldn't be committed

@@ -0,0 +1,362 @@
# LLM Router Extension - Design Document

## Overview

The LLM Router is an extension to RedisVL that provides intelligent, cost-optimized LLM model selection using semantic routing. Instead of routing queries to topics (like SemanticRouter), it routes queries to **model tiers** - selecting the cheapest LLM capable of handling each task.

## Problem Statement

### The LLM Cost Problem
Modern applications often default to the most capable (and most expensive) LLM for every query, even when a simpler model would suffice:
- Typical default: "Hello, how are you?" -> Claude Opus 4.5 ($5/M tokens)
- Sufficient alternative: "Hello, how are you?" -> GPT-4.1 Nano ($0.10/M tokens)
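
To make the gap concrete, the sketch below computes what a batch of short greetings would cost at the two per-million-token prices quoted above. The workload (one million requests of about 20 tokens each) is an illustrative assumption, not a measurement, and `batch_cost` is a hypothetical helper.

```python
# Prices quoted above, in USD per million tokens.
PRICE_PER_M = {"claude-opus-4.5": 5.00, "gpt-4.1-nano": 0.10}


def batch_cost(price_per_million: float, requests: int, tokens_per_request: int) -> float:
    """Cost in USD of sending `requests` queries of `tokens_per_request` tokens each."""
    return price_per_million * requests * tokens_per_request / 1_000_000


# Assumed workload: one million greetings of roughly 20 tokens each.
costs = {
    name: batch_cost(price, requests=1_000_000, tokens_per_request=20)
    for name, price in PRICE_PER_M.items()
}
ratio = costs["claude-opus-4.5"] / costs["gpt-4.1-nano"]

print(costs)
print(f"cost ratio: {ratio:.0f}x")
```

Under these assumptions the price ratio alone determines the saving, so the same factor applies at any volume.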

### Existing Solutions and Their Limitations

**RouteLLM** (LMSys):
- Binary classification only (strong vs weak model)
- No support for >2 tiers
- Requires training data or preference matrices

**NVIDIA LLM Router Blueprint**:
- Complexity classification approach (simple/moderate/complex)
- Provides the taxonomy basis but no open-source Redis-native implementation

**RouterArena / Bloom's Taxonomy Approach**:
- Maps query complexity to Bloom's cognitive levels
- Informs our tier design but lacks production routing infrastructure

**OpenRouter Auto-Router**:
- Black box routing decisions
- Data flows through third-party servers
- No transparency into why a model was selected
- Can't self-host or customize

**NotDiamond**:
- Proprietary ML model for routing
- Requires API calls for every routing decision
- No local/offline capability

**FrugalGPT**:
- Sequential cascade approach (try cheap first, escalate)
- Higher latency due to serial model calls

## Solution: Semantic Model Tier Routing

Repurpose RedisVL's battle-tested SemanticRouter for model selection:

```
SemanticRouter      ->  LLMRouter
-----------------------------------------
Route               ->  ModelTier
route.name          ->  tier.name (simple/standard/expert)
route.references    ->  tier.references (task complexity examples)
route.metadata      ->  tier.metadata (cost, capabilities)
RouteMatch          ->  LLMRouteMatch (includes model string)
```

### Architecture

```
+---------------------------------------------------------------+
|                           LLMRouter                           |
+---------------------------------------------------------------+
|       +-------------+  +-------------+  +-------------+       |
|       |   Simple    |  |  Standard   |  |   Expert    |       |
|       |    Tier     |  |    Tier     |  |    Tier     |       |
|       +-------------+  +-------------+  +-------------+       |
|       | gpt-4.1-nano|  | sonnet 4.5  |  | opus 4.5    |       |
|       | $0.10/M     |  | $3/M        |  | $5/M        |       |
|       | threshold:  |  | threshold:  |  | threshold:  |       |
|       | 0.5         |  | 0.6         |  | 0.7         |       |
|       +-------------+  +-------------+  +-------------+       |
|              |                |                |              |
|              +----------------+----------------+              |
|                               v                               |
|                   +------------------------+                  |
|                   |   Redis Vector Index   |                  |
|                   |  (reference phrases)   |                  |
|                   +------------------------+                  |
+---------------------------------------------------------------+
                                |
                                v
                         +-------------+
                         |    Query    |
                         |  "analyze   |
                         |  this..."   |
                         +-------------+
                                |
                                v
                         +-------------+
                         |   LiteLLM   |
                         | (optional)  |
                         +-------------+
```

## Key Design Decisions

### 1. Model Tiers, Not Individual Models

Routes map to **tiers** (simple, standard, expert) rather than specific models. This provides:
- Abstraction from model churn (swap haiku -> gemini-flash without changing routes)
- Clear mental model for users
- Easy cost optimization within tiers

### 2. Bloom's Taxonomy-Grounded Tiers

The default pretrained config maps tiers to Bloom's Taxonomy cognitive levels:
- **Simple** (Remember/Understand): Factual recall, greetings, format conversion
- **Standard** (Apply/Analyze): Code explanation, summarization, moderate analysis
- **Expert** (Evaluate/Create): Research, architecture, formal reasoning

This is informed by RouterArena's finding that cognitive complexity correlates with model capability requirements.

### 3. LiteLLM-Compatible Model Strings

Tier model identifiers use LiteLLM format (`provider/model`):
```python
ModelTier(
    name="standard",
    model="anthropic/claude-sonnet-4-5",  # Works directly with LiteLLM
    # ... remaining fields (references, metadata, distance_threshold)
)
```
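
As a small illustration of the `provider/model` convention, the sketch below splits a tier's model string into its two parts. The helper `split_model_string` is hypothetical (it is not part of RedisVL or LiteLLM) and assumes only that the provider is everything before the first `/`.

```python
def split_model_string(model: str) -> tuple[str, str]:
    """Split 'provider/model' into (provider, model); hypothetical helper."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    return provider, name


provider, name = split_model_string("anthropic/claude-sonnet-4-5")
print(provider, name)
```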

### 4. Per-Tier Distance Thresholds

Each tier has its own `distance_threshold`, allowing fine-grained control:
```python
simple_tier = ModelTier(..., distance_threshold=0.5) # Strict match
expert_tier = ModelTier(..., distance_threshold=0.7) # Looser match
```
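
One way to picture the effect of per-tier thresholds is the sketch below. It is not the router's actual implementation: it assumes that a lower distance means a closer match, that a tier qualifies when its distance is at or below its own threshold, and that the closest qualifying tier wins. `pick_tier` and the sample distances are hypothetical.

```python
from typing import Optional

# Thresholds taken from the architecture diagram above.
THRESHOLDS = {"simple": 0.5, "standard": 0.6, "expert": 0.7}


def pick_tier(distances: dict[str, float], thresholds: dict[str, float]) -> Optional[str]:
    """Return the closest tier whose distance is within its own threshold, else None."""
    eligible = {tier: d for tier, d in distances.items() if d <= thresholds[tier]}
    if not eligible:
        return None
    return min(eligible, key=eligible.get)


# Assumed distances for a single query: "simple" is nearest but misses its
# strict 0.5 threshold, so the next-closest qualifying tier is chosen.
sample = {"simple": 0.55, "standard": 0.58, "expert": 0.65}
chosen = pick_tier(sample, THRESHOLDS)
print(chosen)
```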

### 5. Cost-Aware Routing

When `cost_optimization=True`, the router adds a cost penalty to distances:
```python
adjusted_distance = distance + (cost_per_1k * cost_weight)
```
This prefers cheaper tiers when semantic distances are close.

### 6. Pretrained Configs with Embedded Vectors

The built-in `default.json` provides a ready-to-use 3-tier configuration:
```python
# Instant setup - no embedding model needed at load time
router = LLMRouter.from_pretrained("default", redis_client=client)
```

The pretrained config includes pre-computed embeddings from
`sentence-transformers/all-mpnet-base-v2`, with 18 reference phrases per tier
covering the Bloom's Taxonomy spectrum.

Custom configs can also be exported and shared:
```python
# Export (one-time, with embedding model)
router.export_with_embeddings("my_router.json")

# Import (no embedding needed)
router = LLMRouter.from_pretrained("my_router.json", redis_client=client)
```
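
The sketch below illustrates why a config with embedded vectors needs no embedding model at load time. The JSON layout is hypothetical and is not the actual schema of `default.json`; the three-dimensional vectors are toy values standing in for real embeddings.

```python
import json
import os
import tempfile

# Hypothetical layout: each reference phrase is stored next to its vector.
exported = {
    "name": "demo-router",
    "tiers": [
        {
            "name": "simple",
            "model": "openai/gpt-4.1-nano",
            "references": [
                {"text": "hello", "vector": [0.1, 0.2, 0.3]},
                {"text": "thanks", "vector": [0.2, 0.1, 0.4]},
            ],
        }
    ],
}

# "Export": write the config, vectors included, to disk once.
path = os.path.join(tempfile.mkdtemp(), "my_router.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(exported, f)

# "Import": read the vectors straight from the file; nothing is re-embedded.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

rows = [
    (tier["name"], ref["text"], ref["vector"])
    for tier in loaded["tiers"]
    for ref in tier["references"]
]
print(len(rows), "reference vectors ready to index")
```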

### 7. Async Support

`AsyncLLMRouter` provides the same functionality using async I/O. Since
`__init__` cannot be async, it uses a `create()` classmethod factory:

```python
router = await AsyncLLMRouter.create(
    name="my-router",
    tiers=tiers,
    redis_client=async_client,
)
match = await router.route("hello")
```

Key async method mapping:

| Sync (`LLMRouter`) | Async (`AsyncLLMRouter`) |
|---------------------|--------------------------|
| `__init__()` | `await create()` |
| `from_existing()` | `await from_existing()` |
| `route()` | `await route()` |
| `route_many()` | `await route_many()` |
| `add_tier()` | `await add_tier()` |
| `remove_tier()` | `await remove_tier()` |
| `from_dict()` | `await from_dict()` |
| `from_pretrained()` | `await from_pretrained()` |
| `delete()` | `await delete()` |
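
The `create()` factory follows a common Python pattern for objects that need awaited setup. The sketch below shows the pattern in isolation; `DemoAsyncRouter` and its `_setup()` method are hypothetical stand-ins and do not reflect `AsyncLLMRouter` internals.

```python
import asyncio


class DemoAsyncRouter:
    """Hypothetical class showing the async classmethod-factory pattern."""

    def __init__(self, name: str) -> None:
        # __init__ stays synchronous: it only stores plain attributes.
        self.name = name
        self.ready = False

    async def _setup(self) -> None:
        # Stand-in for awaited I/O, such as creating an index.
        await asyncio.sleep(0)
        self.ready = True

    @classmethod
    async def create(cls, name: str) -> "DemoAsyncRouter":
        # Build the object synchronously, then await its setup before returning it.
        instance = cls(name)
        await instance._setup()
        return instance


async def main() -> DemoAsyncRouter:
    return await DemoAsyncRouter.create("my-router")


demo_router = asyncio.run(main())
print(demo_router.name, demo_router.ready)
```

Callers never see a half-initialized object, because the only public construction path awaits the setup step.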

## Module Structure

```
redisvl/extensions/llm_router/
+-- __init__.py          # Public exports (LLMRouter, AsyncLLMRouter, schemas)
+-- DESIGN.md            # This document
+-- schema.py            # Pydantic models
|   +-- ModelTier        # Tier definition
|   +-- LLMRouteMatch    # Routing result
|   +-- RoutingConfig    # Router configuration
|   +-- Pretrained*      # Export/import schemas
+-- router.py            # LLMRouter + AsyncLLMRouter implementations
+-- pretrained/
    +-- __init__.py      # Pretrained loader (get_pretrained_path)
    +-- default.json     # Standard 3-tier config (simple/standard/expert)
```

## API Examples

### Basic Usage

```python
from redisvl.extensions.llm_router import LLMRouter, ModelTier

tiers = [
    ModelTier(
        name="simple",
        model="openai/gpt-4.1-nano",
        references=[
            "hello", "hi there", "thanks", "goodbye",
            "what time is it?", "how are you?",
        ],
        metadata={"cost_per_1k_input": 0.0001},
        distance_threshold=0.5,
    ),
    ModelTier(
        name="standard",
        model="anthropic/claude-sonnet-4-5",
        references=[
            "analyze this code for bugs",
            "explain how neural networks learn",
            "compare and contrast these approaches",
        ],
        metadata={"cost_per_1k_input": 0.003},
        distance_threshold=0.6,
    ),
    ModelTier(
        name="expert",
        model="anthropic/claude-opus-4-5",
        references=[
            "prove this mathematical theorem",
            "architect a distributed system",
            "write a research paper analyzing",
        ],
        metadata={"cost_per_1k_input": 0.005},
        distance_threshold=0.7,
    ),
]

router = LLMRouter(
    name="my-llm-router",
    tiers=tiers,
    redis_url="redis://localhost:6379",
)

# Route a query
query = "hello, how's it going?"
match = router.route(query)
print(match.tier)   # "simple"
print(match.model)  # "openai/gpt-4.1-nano"

# Use with LiteLLM (optional integration)
from litellm import completion

response = completion(
    model=match.model,
    messages=[{"role": "user", "content": query}],
)
```

### Cost-Optimized Routing

```python
router = LLMRouter(
    name="cost-aware-router",
    tiers=tiers,
    cost_optimization=True,  # Prefer cheaper tiers when distances are close
    redis_url="redis://localhost:6379",
)
```

### Pretrained Router

```python
# Load without needing an embedding model for the references
router = LLMRouter.from_pretrained(
    "default",  # Built-in config, or path to JSON
    redis_client=client,
)
```

### Async Usage

```python
from redisvl.extensions.llm_router import AsyncLLMRouter

router = await AsyncLLMRouter.create(
    name="my-async-router",
    tiers=tiers,
    redis_url="redis://localhost:6379",
)

match = await router.route("explain how garbage collection works")
print(match.model) # "anthropic/claude-sonnet-4-5"

# Or load from pretrained
router = await AsyncLLMRouter.from_pretrained("default", redis_client=client)

await router.delete()
```

## Comparison with SemanticRouter

| Feature | SemanticRouter | LLMRouter |
|---------|---------------|-----------|
| Purpose | Topic classification | Model selection |
| Output | Route name | Model string + metadata |
| Cost awareness | No | Yes |
| Pretrained configs | No | Yes |
| Per-route thresholds | Yes | Yes |
| LiteLLM integration | No | Yes (model strings) |
| Async support | No | Yes (`AsyncLLMRouter`) |

## Testing

```bash
uv run pytest tests/unit/test_llm_router_schema.py -v
uv run pytest tests/integration/test_llm_router.py -v
uv run pytest tests/integration/test_async_llm_router.py -v
```

## Future Enhancements

### 1. `complete()` Method
Direct LiteLLM integration for one-liner usage:
```python
response = router.complete("analyze this code", messages=[...])
```

### 2. Capability Filtering
Filter tiers by capability before routing:
```python
match = router.route("describe what is in this image", capabilities=["vision"])
```

### 3. Budget Constraints
Enforce cost limits:
```python
router = LLMRouter(..., max_cost_per_1k=0.004)  # Excludes the expert tier ($0.005 per 1k)
```
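
A budget cap could plausibly be applied as a filter before routing. The sketch below is one possible shape for that step, not a committed design: `tiers_within_budget` is a hypothetical helper, and the costs are the `cost_per_1k_input` values from the Basic Usage example.

```python
# Per-1k input costs, taken from the Basic Usage tier metadata.
TIER_COST_PER_1K = {"simple": 0.0001, "standard": 0.003, "expert": 0.005}


def tiers_within_budget(costs: dict[str, float], max_cost_per_1k: float) -> list[str]:
    """Return the tier names whose cost does not exceed the cap (hypothetical helper)."""
    return [name for name, cost in costs.items() if cost <= max_cost_per_1k]


allowed = tiers_within_budget(TIER_COST_PER_1K, max_cost_per_1k=0.004)
print(allowed)
```

Only the surviving tiers would then take part in the distance comparison.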

### 4. Fallback Chains
Define fallback order when primary tier unavailable:
```python
tier = ModelTier(..., fallback=["standard", "simple"])
```
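
Fallback resolution could be as simple as walking the declared order until an available tier is found. The sketch below assumes availability is known as a set of tier names; `resolve_tier` is hypothetical and the feature itself is not implemented.

```python
from typing import Optional


def resolve_tier(
    primary: str,
    fallbacks: dict[str, list[str]],
    available: set[str],
) -> Optional[str]:
    """Return the primary tier if available, else the first available fallback."""
    for name in [primary, *fallbacks.get(primary, [])]:
        if name in available:
            return name
    return None


FALLBACKS = {"expert": ["standard", "simple"]}

# Assumed outage: neither expert nor standard is reachable.
resolved = resolve_tier("expert", FALLBACKS, available={"simple"})
print(resolved)
```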

## References

- [RedisVL SemanticRouter](https://docs.redisvl.com/en/latest/user_guide/semantic_router.html)
- [LiteLLM Model List](https://docs.litellm.ai/docs/providers)
- [RouteLLM](https://github.com/lm-sys/RouteLLM) - LMSys binary router framework
- [NVIDIA LLM Router Blueprint](https://build.nvidia.com/blueprints/llm-router) - Complexity-based routing
- [RouterArena / Bloom's Taxonomy](https://arxiv.org/abs/2412.06644) - Cognitive complexity for routing
- [FrugalGPT](https://arxiv.org/abs/2305.05176) - Cost-efficient LLM strategies
- [OpenRouter](https://openrouter.ai/) - Auto-routing concept
- [NotDiamond](https://notdiamond.ai/) - ML-based model routing
- [Unify.ai](https://unify.ai/) - Quality-cost tradeoff routing