vikyw89/sql-agent

Repository files navigation

SQL Agent

Ask your database questions in plain English. This library powers natural language interfaces to SQL databases — no SQL knowledge required. Built on LlamaIndex and OpenAI, it transforms freeform questions into precise, executed SQL queries and returns human-readable answers.


Overview

SQLAgent is an LLM-powered agent that lets you query relational databases using natural language. Under the hood, it:

  1. Introspects your database schema and generates table summaries via LLM
  2. Indexes table rows into a Pinecone vector store for semantic row retrieval
  3. Routes your natural language question through a multi-stage query pipeline
  4. Synthesizes a final answer from query results

Features

  • Natural Language to SQL — Convert freeform questions into optimized SQL queries automatically
  • Semantic Row Retrieval — Relevant table rows are retrieved via vector similarity to provide context for query generation
  • Multi-Database Support — Works with any SQL database supported by SQLAlchemy: MySQL, PostgreSQL, SQLite, and more
  • LLM-Powered Schema Summarization — Tables are automatically described in business terms to help the LLM understand your data model
  • Query Pipeline Architecture — Modular, visualizable pipeline: table retrieval → schema context → text-to-SQL → execution → response synthesis
  • Automatic Model Fallback — Gracefully falls back from gpt-3.5-turbo to gpt-4o on failure
  • Security-First Defaults — Sensitive tables (admin, api_key, refresh_token, etc.) are ignored by default
  • Persisted Indexes — Table schema index and row indexes are cached locally / in Pinecone for fast re-initialization
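The automatic model fallback above boils down to a try/retry pattern. A minimal sketch (the `primary` and `fallback` callables stand in for the real gpt-3.5-turbo and gpt-4o completion calls; this is illustrative, not the library's actual code):

```python
def run_with_fallback(prompt, primary, fallback):
    """Try the primary model first; on any error, retry once with the fallback.

    `primary` and `fallback` are stand-ins for LLM completion calls
    (e.g. gpt-3.5-turbo and gpt-4o). Illustrative sketch only.
    """
    try:
        return primary(prompt)
    except Exception:
        return fallback(prompt)
```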

Tech Stack

Layer           Technology
LLM Framework   LlamaIndex
Language Model  OpenAI gpt-3.5-turbo / gpt-4o
Embeddings      OpenAI text-embedding-3-small
Vector Store    Pinecone
Database ORM    SQLAlchemy
Python          Python 3.10 – 3.12

Installation

pip install sqlagent

Or via Poetry:

poetry add sqlagent

Required Environment Variables

# Required
OPENAI_API_KEY=sk-...

# Database connection string (or pass db_url directly)
DATABASE_URL=postgresql://user:pass@localhost:5432/mydb

# Optional — required only if using Pinecone row indexing
PINECONE_API_KEY=...
PINECONE_HOST=...   # e.g., https://xxx-abcde.svc.xxx.pinecone.io

You can also pass these values directly to the SQLAgent constructor.
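That "constructor argument wins, environment variable is the fallback" behavior can be sketched as (illustrative helper, not library code):

```python
import os

def resolve(value, env_var):
    """Prefer an explicitly passed value; otherwise read the environment.

    Mirrors the documented precedence: constructor arguments override
    environment variables. Illustrative sketch only.
    """
    if value is not None:
        return value
    return os.environ.get(env_var)
```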

Usage

Quick Start

import asyncio

from sqlagent.agent import SQLAgent

agent = SQLAgent(
    db_url="postgresql://user:pass@localhost:5432/mydb",
    api_key="sk-...",
)

# arun is a coroutine, so it must run inside an event loop
result = asyncio.run(agent.arun("How many orders were placed last month?"))
print(result)

Async Usage

import asyncio
from sqlagent.agent import SQLAgent

async def main():
    agent = SQLAgent(
        db_url="mysql://user:pass@localhost:3306/sales_db",
        api_key="sk-...",
        pinecone_api_key="...",
        pinecone_host="https://xxx.svc.xxx.pinecone.io",
    )
    result = await agent.arun("What is the average order value for Toyota Motor?")
    print(result)

asyncio.run(main())

Configuration Options

Parameter         Type                       Default                             Description
db_url            str                        DATABASE_URL env                    SQLAlchemy database URL
api_key           str                        OPENAI_API_KEY env                  OpenAI API key
model             str                        "gpt-3.5-turbo"                     Primary LLM model
fallback_model    str                        "gpt-4o"                            Fallback model on error
embedding_model   OpenAIEmbeddingModelType   TEXT_EMBED_3_SMALL                  Embedding model
pinecone_api_key  str                        PINECONE_API_KEY env                Pinecone API key
pinecone_host     str                        PINECONE_HOST env                   Pinecone host URL
object_index_dir  str                        "./object_index"                    Directory for persisted schema index
max_db_row_index  int | None                 None                                Max rows per table to index in Pinecone
ignore_tables     List[str]                  [admin, admin_block, api_key, ...]  Tables to exclude from indexing
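Putting several options together, a fuller constructor call might look like this (parameter names are taken from the table above; all values, including the tables listed in ignore_tables, are illustrative):

```python
from sqlagent.agent import SQLAgent

agent = SQLAgent(
    db_url="postgresql://user:pass@localhost:5432/mydb",
    api_key="sk-...",
    model="gpt-3.5-turbo",
    fallback_model="gpt-4o",
    object_index_dir="./object_index",
    max_db_row_index=1000,
    ignore_tables=["admin", "api_key", "refresh_token"],
)
```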

Supported Databases

Any database with a SQLAlchemy dialect should work. PostgreSQL, MySQL, and SQLite are the typical choices (see Features above).
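Typical SQLAlchemy connection URLs for those engines look like the following (values are illustrative; the dialect is the scheme at the front of the URL):

```python
from urllib.parse import urlsplit

# Example SQLAlchemy-style connection URLs (illustrative credentials/paths)
urls = [
    "postgresql://user:pass@localhost:5432/mydb",
    "mysql://user:pass@localhost:3306/sales_db",
    "sqlite:///./local.db",
]

for url in urls:
    # The URL scheme names the database dialect, e.g. "postgresql"
    print(urlsplit(url).scheme)
```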

Project Structure

sqlagent/
├── agent.py                          # Main SQLAgent class and query pipeline
├── __init__.py                       # Package entry point
├── components/
│   └── sql_parser_component.py       # Parses LLM output to extract SQL query
└── ingestions/
    ├── load_and_persist_object_index.py  # LLM-powered table schema summarization + ObjectIndex
    └── load_and_persist_tables_row.py    # Pinecone row-level vector indexing
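The SQL parser component's job is to strip an LLM response down to the bare query. A minimal sketch of that idea, assuming the model wraps SQL in a fenced code block or a "SQLQuery:" prefix (the real component may differ):

```python
import re

def extract_sql(llm_output: str) -> str:
    """Pull a SQL statement out of an LLM response.

    Handles a fenced sql code block or a bare "SQLQuery:" prefix.
    Illustrative sketch, not the library's actual parser.
    """
    # Match a ```sql ... ``` fenced block (backticks written as `{3}
    # in the pattern)
    fence = re.search(r"`{3}sql\s*(.*?)`{3}", llm_output, re.DOTALL | re.IGNORECASE)
    if fence:
        return fence.group(1).strip()
    # Fall back to a "SQLQuery:" prefix, as some prompts instruct
    prefix = re.search(r"SQLQuery:\s*(.*)", llm_output, re.DOTALL)
    if prefix:
        return prefix.group(1).strip()
    # Otherwise assume the whole response is the query
    return llm_output.strip()
```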

How the Query Pipeline Works

User Query
    │
    ▼
┌─────────────────────┐     ┌───────────────────────┐
│  Table Retriever    │────▶│  Table Parser + Rows  │
│  (ObjectIndex)      │     │  (schema + top-k rows)│
└─────────────────────┘     └───────────────────────┘
                                     │
                                     ▼
                            ┌───────────────────┐
                            │  Text-to-SQL LLM  │
                            │  (gpt-3.5-turbo)  │
                            └───────────────────┘
                                     │
                                     ▼
                            ┌──────────────────┐
                            │  SQL Parser      │
                            │  (extract query) │
                            └──────────────────┘
                                     │
                                     ▼
                            ┌──────────────────┐
                            │  SQL Retriever   │
                            │  (SQLAlchemy)    │
                            └──────────────────┘
                                     │
                                     ▼
                            ┌──────────────────┐
                            │  Response LLM    │
                            │  (synthesize)    │
                            └──────────────────┘
                                     │
                                     ▼
                              Final Answer
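The stages in the diagram compose linearly: each box consumes the previous box's output. As a sketch, with each stage reduced to a plain callable (stubs only; the library builds this with LlamaIndex's query pipeline, not this helper):

```python
def run_pipeline(question, stages):
    """Thread a value through a list of single-argument stages in order.

    Each stage stands in for one box in the diagram (table retrieval,
    text-to-SQL, SQL execution, response synthesis). Illustrative only.
    """
    value = question
    for stage in stages:
        value = stage(value)
    return value
```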

License

MIT License. See LICENSE for details.
