Ask your database questions in plain English. This library powers natural language interfaces to SQL databases — no SQL knowledge required. Built on LlamaIndex and OpenAI, it transforms freeform questions into precise, executed SQL queries and returns human-readable answers.
SQLAgent is an LLM-powered agent that lets you query relational databases using natural language. Under the hood, it:
- Introspects your database schema and generates table summaries via LLM
- Indexes table rows into a Pinecone vector store for semantic row retrieval
- Routes your natural language question through a multi-stage query pipeline
- Synthesizes a final answer from query results
- Natural Language to SQL — Convert freeform questions into optimized SQL queries automatically
- Semantic Row Retrieval — Relevant table rows are retrieved via vector similarity to provide context for query generation
- Multi-Database Support — Works with any SQL database supported by SQLAlchemy: MySQL, PostgreSQL, SQLite, and more
- LLM-Powered Schema Summarization — Tables are automatically described in business terms to help the LLM understand your data model
- Query Pipeline Architecture — Modular, visualizable pipeline: table retrieval → schema context → text-to-SQL → execution → response synthesis
- Automatic Model Fallback — Gracefully falls back from
gpt-3.5-turbotogpt-4oon failure - Security-First Defaults — Sensitive tables (
admin,api_key,refresh_token, etc.) are ignored by default - Persisted Indexes — Table schema index and row indexes are cached locally / in Pinecone for fast re-initialization
| Layer | Technology |
|---|---|
| LLM Framework | LlamaIndex |
| Language Model | OpenAI gpt-3.5-turbo / gpt-4o |
| Embeddings | OpenAI text-embed-3-small |
| Vector Store | Pinecone |
| Database ORM | SQLAlchemy |
| Python | Python 3.10 – 3.12 |
pip install sqlagentOr via Poetry:
poetry add sqlagent# Required
OPENAI_API_KEY=sk-...
# Database connection string (or pass db_url directly)
DATABASE_URL=postgresql://user:pass@localhost:5432/mydb
# Optional — required only if using Pinecone row indexing
PINECONE_API_KEY=...
PINECONE_HOST=... # e.g., https://xxx-abcde.svc.xxx.pinecone.ioYou can also pass these values directly to the SQLAgent constructor.
from sqlagent.agent import SQLAgent
agent = SQLAgent(
db_url="postgresql://user:pass@localhost:5432/mydb",
api_key="sk-...",
)
result = await agent.arun("How many orders were placed last month?")
print(result)import asyncio
from sqlagent.agent import SQLAgent
async def main():
agent = SQLAgent(
db_url="mysql://user:pass@localhost:3306/sales_db",
api_key="sk-...",
pinecone_api_key="...",
pinecone_host="https://xxx.svc.xxx.pinecone.io",
)
result = await agent.arun("What is the average order value for Toyota Motor?")
print(result)
asyncio.run(main())| Parameter | Type | Default | Description |
|---|---|---|---|
db_url |
str |
DATABASE_URL env |
SQLAlchemy database URL |
api_key |
str |
OPENAI_API_KEY env |
OpenAI API key |
model |
str |
"gpt-3.5-turbo" |
Primary LLM model |
fallback_model |
str |
"gpt-4o" |
Fallback model on error |
embedding_model |
OpenAIEmbeddingModelType |
TEXT_EMBED_3_SMALL |
Embedding model |
pinecone_api_key |
str |
PINECONE_API_KEY env |
Pinecone API key |
pinecone_host |
str |
PINECONE_HOST env |
Pinecone host URL |
object_index_dir |
str |
"./object_index" |
Directory for persisted schema index |
max_db_row_index |
int | None |
None |
Max rows per table to index in Pinecone |
ignore_tables |
List[str] |
[admin, admin_block, api_key, ...] |
Tables to exclude from indexing |
- MySQL
- PostgreSQL
- SQLite
- Any database supported by SQLAlchemy's dialect ecosystem
sqlagent/
├── agent.py # Main SQLAgent class and query pipeline
├── __init__.py # Package entry point
├── components/
│ └── sql_parser_component.py # Parses LLM output to extract SQL query
└── ingestions/
├── load_and_persist_object_index.py # LLM-powered table schema summarization + ObjectIndex
└── load_and_persist_tables_row.py # Pinecone row-level vector indexing
User Query
│
▼
┌─────────────────────┐ ┌──────────────────────┐
│ Table Retriever │────▶│ Table Parser + Rows │
│ (ObjectIndex) │ │ (schema + top-k rows)│
└─────────────────────┘ └──────────────────────┘
│
▼
┌──────────────────┐
│ Text-to-SQL LLM │
│ (gpt-3.5-turbo) │
└──────────────────┘
│
▼
┌──────────────────┐
│ SQL Parser │
│ (extract query) │
└──────────────────┘
│
▼
┌──────────────────┐
│ SQL Retriever │
│ (SQLAlchemy) │
└──────────────────┘
│
▼
┌──────────────────┐
│ Response LLM │
│ (synthesize) │
└──────────────────┘
│
▼
Final Answer
MIT License. See LICENSE for details.