RelayFreeLLM

One endpoint. More free AI than any single provider. Fewer rate-limit headaches.

Don't want to pay $ to $$$ per month to use AI models? RelayFreeLLM can help. It is an open-source gateway that combines free-tier model providers such as Gemini, Groq, Mistral, Cerebras, and Ollama behind a single OpenAI-compatible API, so you get more free inference in aggregate, with automatic failover between providers.

# Your existing code works. Just change the URL.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="fake")

No code changes. No retry logic. No 429 errors breaking your app.

See It In Action

Why You Need This

The Free Tier Problem

Free AI APIs are powerful—but using them directly is painful:

❌ Groq hits rate limit → Your app crashes
❌ Gemini quota exhausted → User sees error
❌ Switching providers → Rewrite your integration
❌ Testing 5 providers → 5 different SDKs to manage

The RelayFreeLLM Solution

✅ Gemini fails → Automatically tries Groq
✅ One provider down → Traffic routes to others
✅ Same API for everyone → OpenAI-compatible
✅ More providers = More throughput

You get a meta-model: a single endpoint that routes to the best available free provider, handles failures automatically, and keeps your app running.

What You Get

| Feature | Why It Matters |
|---|---|
| OpenAI-compatible | Drop-in replacement for your existing code: LangChain, LlamaIndex, or any OpenAI SDK. |
| Multiple free providers | Gemini, Groq, Mistral, Cerebras, Ollama, and others. |
| Automatic failover | If a provider is down or a model hits its limit, the next one is tried: round-robin, random, or by your preferences. Your app keeps running. |
| Circuit breakers | A misbehaving provider is quarantined automatically. |
| Rate limit management | Built-in quota tracking. No external dependencies. |
| Real-time streaming | SSE streaming from every provider. |
| Local models | Mix cloud free tiers with your local Ollama instance. |

Who It's For

| User | Use Case |
|---|---|
| Independent developers | Ship AI features without a $$$/month API bill |
| Students & hobbyists | GPT-level AI with no credit card or phone number needed |
| Self-hosters | Combine Ollama privacy with cloud capacity |
| Researchers | Batch queries across providers for higher throughput |

Quick Start

1. Install

git clone https://github.com/msmarkgu/RelayFreeLLM.git
cd RelayFreeLLM
pip install -r requirements.txt

2. Add free API keys

Create a .env file:

GEMINI_APIKEY=      # ai.google.dev
GROQ_APIKEY=        # console.groq.com
MISTRAL_APIKEY=     # console.mistral.ai
CEREBRAS_APIKEY=    # cloud.cerebras.ai
# any other providers you have
#ABC_APIKEY=...
#XYZ_APIKEY=...
#Best_APIKEY=...
# Ollama models you host locally
#OLLAMA_BASE_URL=http://localhost:11434  # optional

# Tell RelayFreeLLM how to choose among providers and among each provider's models.
# The default strategy is roundrobin for both.
PROVIDER_STRATEGY=roundrobin # pick providers in turn
MODEL_STRATEGY=random # pick a random model from the currently selected provider
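
The two strategy settings can be pictured with a minimal sketch. This is an illustration under assumptions, not RelayFreeLLM's actual selector: the `CATALOG` contents and the `make_picker` helper are hypothetical, and only the strategy names `roundrobin` and `random` come from the config above.

```python
import itertools
import random

# Hypothetical catalog: provider name -> models on its free tier.
# "example-small-model" is a made-up placeholder name.
CATALOG = {
    "groq": ["llama-3.3-70b-versatile", "example-small-model"],
    "gemini": ["gemini-2.5-flash"],
}


def make_picker(items, strategy, rng=None):
    """Return a zero-argument function that produces the next item."""
    items = list(items)
    if strategy == "roundrobin":
        cycle = itertools.cycle(items)
        return lambda: next(cycle)
    if strategy == "random":
        rng = rng or random.Random()
        return lambda: rng.choice(items)
    raise ValueError(f"unknown strategy: {strategy!r}")


# PROVIDER_STRATEGY=roundrobin: providers are visited in turn.
next_provider = make_picker(CATALOG, "roundrobin")
provider = next_provider()

# MODEL_STRATEGY=random: any model of the chosen provider may be picked.
model = make_picker(CATALOG[provider], "random")()
print(f"{provider}/{model}")
```

Round-robin spreads requests evenly across providers, which is usually what you want when each provider has its own quota; random model selection then varies which model of that provider answers.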

3. Verify connectivity (optional but recommended)

python -m tests.test_models_availability

4. Start

python -m src.server

5. Use it

Python SDK:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="relay-free"
)

# Automatic routing - picks the best available
response = client.chat.completions.create(
    model="meta-model",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Or route to a specific provider
response = client.chat.completions.create(
    model="groq/llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello!"}]
)

cURL:

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer relay-free" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-model", "messages": [{"role": "user", "content": "Hi"}]}'

LangChain:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8000/v1",
    api_key="relay-free",
    model="meta-model"
)

How Routing Works

Intent-Based Selection

Tell RelayFreeLLM what you need:

// "Any model from any provider; RelayFreeLLM chooses one"
{"model": "meta-model", "messages": [...]}

// "Give me a coding model from any provider"
{"model": "meta-model", "model_type": "coding", "messages": [...]}

// "I prefer small models that run fast and give simple responses"
{"model": "meta-model", "model_scale": "small", "messages": [...]}

// "I want large models for the most capable reasoning"
{"model": "meta-model", "model_scale": "large", "messages": [...]}

// "I want DeepSeek models if available"
{"model": "meta-model", "model_name": "deepseek", "messages": [...]}

// "A specific provider/model"
{"model": "Gemini/gemini-2.5-flash", "messages": [...]}
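
To make the hints above concrete, here is a minimal sketch of how such filters could narrow a model catalog. It is an assumption-laden illustration, not the project's selector: `ModelInfo`, `CATALOG`, and `candidates` are hypothetical names, the type and scale labels assigned to each model are made up, `deepseek-coder-demo` is a placeholder, and matching the provider prefix case-insensitively is a guess.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    provider: str
    name: str
    model_type: str   # "text", "coding", or "ocr"
    model_scale: str  # "large", "medium", or "small"


# Hypothetical catalog; the labels are illustrative, not real metadata.
CATALOG = [
    ModelInfo("groq", "llama-3.3-70b-versatile", "text", "large"),
    ModelInfo("gemini", "gemini-2.5-flash", "text", "medium"),
    ModelInfo("example", "deepseek-coder-demo", "coding", "small"),
]


def candidates(request, catalog=CATALOG):
    """Return the catalog entries that satisfy a request's routing hints."""
    model = request.get("model", "meta-model")
    if model != "meta-model":
        # "provider/model" pins one model (provider matched case-insensitively).
        provider, _, name = model.partition("/")
        return [m for m in catalog
                if m.provider == provider.lower() and m.name == name]
    result = list(catalog)
    if "model_type" in request:
        result = [m for m in result if m.model_type == request["model_type"]]
    if "model_scale" in request:
        result = [m for m in result if m.model_scale == request["model_scale"]]
    if "model_name" in request:
        needle = request["model_name"].lower()
        result = [m for m in result if needle in m.name.lower()]
    return result


print([m.name for m in candidates({"model": "meta-model", "model_name": "deepseek"})])
```

With the OpenAI Python SDK, fields such as `model_type` are not named parameters of `chat.completions.create`; they can usually be sent through the SDK's `extra_body` argument, for example `extra_body={"model_type": "coding"}`.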

Automatic Failover

When a provider hits a rate limit:

Request → Groq (rate limited)
       → Circuit breaker activates
       → Retry → Gemini
       → Retry → Mistral
       → Success ✓
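
The flow above can be sketched in a few lines. This is a simplified illustration, not the code in `model_dispatcher.py`: the `CircuitBreaker`, `RateLimited`, and `dispatch` names, the 60-second cooldown, and the stand-in provider functions are all assumptions.

```python
import time


class CircuitBreaker:
    """Skips a provider for `cooldown` seconds after it fails."""

    def __init__(self, cooldown=60.0, clock=time.monotonic):
        self.cooldown = cooldown
        self.clock = clock
        self.open_until = {}  # provider name -> time it may be tried again

    def is_open(self, provider):
        return self.clock() < self.open_until.get(provider, float("-inf"))

    def trip(self, provider):
        self.open_until[provider] = self.clock() + self.cooldown


class RateLimited(Exception):
    """Stand-in for an HTTP 429 response from a provider."""


def dispatch(prompt, providers, breaker):
    """Try providers in order, skipping any that are quarantined."""
    errors = []
    for name, call in providers:
        if breaker.is_open(name):
            continue
        try:
            return name, call(prompt)
        except RateLimited as exc:
            breaker.trip(name)  # quarantine, then fall through to the next one
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def limited(prompt):
    raise RateLimited("429 Too Many Requests")


def healthy(prompt):
    return f"echo: {prompt}"


breaker = CircuitBreaker(cooldown=60.0)
providers = [("groq", limited), ("gemini", limited), ("mistral", healthy)]
print(dispatch("Hello!", providers, breaker))
```

Because the breaker remembers failures, a second call within the cooldown goes straight to the healthy provider instead of spending a request on one that is known to be rate limited.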

API Reference

`POST /v1/chat/completions`

| Parameter | Type | Description |
|---|---|---|
| `model` | string | `"meta-model"` for auto-routing, or `"provider/model"` for direct routing |
| `messages` | array | Standard OpenAI message format |
| `stream` | bool | Enable SSE streaming (default: false) |
| `model_type` | string | Filter: `text`, `coding`, `ocr` |
| `model_scale` | string | Filter: `large`, `medium`, `small` |
| `model_name` | string | Filter: match a substring of the model name |
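
When `stream` is enabled, OpenAI-compatible servers conventionally send one `data: {...}` line per chunk and finish with `data: [DONE]`. The sketch below assumes RelayFreeLLM follows that convention; `iter_content` and the `sample` lines are hypothetical and stand in for a real HTTP response body.

```python
import json


def iter_content(sse_lines):
    """Yield text fragments from OpenAI-style chat completion SSE lines."""
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample of what a streamed reply might look like.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]
print("".join(iter_content(sample)))
```

In practice the OpenAI SDK does this parsing for you when you pass `stream=True`; the sketch is only meant to show what travels over the wire.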

`GET /v1/models`

List available models with status:

curl "http://localhost:8000/v1/models?type=coding&scale=large"
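
If you build this query from code, letting the standard library encode the parameters avoids quoting mistakes. A small sketch, assuming the `type` and `scale` query parameters shown above; the `models_url` helper is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000/v1"  # default address used in this README


def models_url(**filters):
    """Build the /v1/models URL, dropping filters that are None."""
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    return f"{BASE_URL}/models" + (f"?{query}" if query else "")


print(models_url(type="coding", scale="large"))
```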

`GET /v1/usage`

Track your aggregated usage:

curl http://localhost:8000/v1/usage

Architecture

┌─────────────────────────────────────────────────┐
│                 Your Application                │
│         (OpenAI SDK, LangChain, etc.)           │
└─────────────────────┬───────────────────────────┘
                      │ OpenAI-compatible API
┌─────────────────────▼───────────────────────────┐
│              RelayFreeLLM Gateway               │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────┐  │
│  │   Router    │→ │  Dispatcher │→ │ Selector│  │
│  │  /v1/chat   │  │  (retries)  │  │ (quota) │  │
│  └─────────────┘  └─────────────┘  └────┬────┘  │
└─────────────────────────────────────────┼───────┘
                                          │
        ┌──────────┬──────────┬───────────┼──────────┐
        ▼          ▼          ▼           ▼          ▼
   ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌───────┐
   │ Gemini │ │  Groq  │ │ Mistral│ │Cerebras│ │ Ollama│
   └────────┘ └────────┘ └────────┘ └────────┘ └───────┘

Project Structure

RelayFreeLLM/
├── src/
│   ├── server.py                 # Entry point
│   ├── router.py                 # API endpoints
│   ├── model_dispatcher.py       # Retry & circuit breaker logic
│   ├── model_selector.py         # Quota-aware routing
│   ├── provider_registry.py      # Auto-discovers providers
│   ├── models.py                 # Request/response models
│   └── api_clients/              # Provider implementations
│       ├── gemini_client.py
│       ├── groq_client.py
│       ├── mistral_client.py
│       └── ...
├── tests/                        # Unit & integration tests
└── provider_model_limits.json    # Rate limit configuration
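
The quota-aware routing mentioned for `model_selector.py` can be pictured as a sliding-window counter per model. The sketch below is an assumption: the format of `provider_model_limits.json` is not shown here, so the `QuotaTracker` class, its "requests per window" limits, and the limit of 2 are all hypothetical.

```python
import time
from collections import defaultdict, deque


class QuotaTracker:
    """Sliding-window request counter, one window per provider/model key."""

    def __init__(self, limits, window=60.0, clock=time.monotonic):
        self.limits = limits    # key -> max requests per window
        self.window = window    # window length in seconds
        self.clock = clock
        self.history = defaultdict(deque)  # key -> timestamps of recent requests

    def try_acquire(self, key):
        """Record a request and return True if `key` still has quota."""
        now = self.clock()
        stamps = self.history[key]
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()  # forget requests that left the window
        if len(stamps) >= self.limits.get(key, 0):
            return False  # over quota (or no limit configured for this key)
        stamps.append(now)
        return True


# Hypothetical limit: 2 requests per minute for one model.
key = "groq/llama-3.3-70b-versatile"
tracker = QuotaTracker({key: 2})
print([tracker.try_acquire(key) for _ in range(3)])
```

A selector built on this idea would ask `try_acquire` before sending a request and move on to another model when it returns `False`, so most rate limits are avoided rather than hit.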

Roadmap

Web dashboard for live provider status
Persistent rate limit state
Prompt caching layer
Embeddings & image generation routing
One-command Docker deploy

Contributing

Found a new free provider? Adding one takes about 50 lines:

# src/api_clients/my_provider_client.py
class MyProviderClient(ApiInterface):
    PROVIDER_NAME = "myprovider"
    
    async def call_model_api(self, request, stream):
        # Your API logic here
        pass
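
For readers who want to see the moving parts, here is a self-contained sketch of the pattern. The real `ApiInterface` and the discovery logic in `provider_registry.py` are not shown in this README, so everything below is a hypothetical stand-in: it registers subclasses through `__init_subclass__`, which may differ from how the project discovers providers, and `EchoClient` simply echoes the last message instead of calling an API.

```python
import asyncio
from abc import ABC, abstractmethod

REGISTRY = {}  # hypothetical: provider name -> client class


class ApiInterface(ABC):
    """Stand-in base class; the project's real interface may differ."""

    PROVIDER_NAME = ""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.PROVIDER_NAME:
            REGISTRY[cls.PROVIDER_NAME] = cls  # register when the class is defined

    @abstractmethod
    async def call_model_api(self, request, stream):
        """Send `request` to the provider and return its reply."""


class EchoClient(ApiInterface):
    PROVIDER_NAME = "echo"

    async def call_model_api(self, request, stream):
        # A real client would call the provider's HTTP API here.
        last = request["messages"][-1]["content"]
        return {"choices": [{"message": {"role": "assistant", "content": last}}]}


client = REGISTRY["echo"]()
reply = asyncio.run(client.call_model_api(
    {"messages": [{"role": "user", "content": "Hello!"}]}, stream=False))
print(reply["choices"][0]["message"]["content"])
```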

PRs welcome.

Acknowledgements

Built with FastAPI, Pydantic, and httpx.

Powered by the generous free tiers of Google Gemini, Groq, Mistral AI, Cerebras, and Ollama.

Built for developers who want great AI without the bill.
