MindsDB Query Engine

Semantic search over all your data — entirely in SQL.


Docs · Website · Discord · Contact


MindsDB Query Engine connects to 200+ data sources — databases, warehouses, applications, files — and lets you query them live in one SQL dialect, with no ETL. Index unstructured content into knowledge bases, then search it by meaning, by keyword, or both at once, with plain SQL filters on top. Everything is reachable from any MySQL- or PostgreSQL-compatible client.

Where this fits: MindsDB now builds MindsHub — a hub for open AI agents. The Query Engine remains a standalone open-source project, and it pairs well with MindsHub agents: connect it to give an agent live, SQL-queryable access to your data and semantic search. The full story: MindsHub vs MindsDB.

How it works

   MySQL clients · PostgreSQL clients · BI tools · ORMs · HTTP API
                                  │
                   ┌──────────────▼───────────────┐
                   │     MindsDB Query Engine     │
                   │     one SQL dialect over     │
                   │  a federated query planner   │
                   └──────────────┬───────────────┘
                                  │
            ┌─────────────────────┼─────────────────────┐
            │                     │                     │
  ┌─────────▼─────────┐ ┌─────────▼─────────┐ ┌─────────▼─────────┐
  │     Databases     │ │    Apps & files   │ │  Knowledge bases  │
  │ Postgres, MySQL,  │ │ Slack, web crawler│ │   embeddings +    │
  │ MongoDB, Snowflake│ │ docs, sheets,     │ │  vector store +   │
  │ BigQuery, S3, …   │ │ email, calendars… │ │    BM25 index     │
  └───────────────────┘ └───────────────────┘ └───────────────────┘
           queried live, in place — data is never copied
  • One server, three interfaces. The engine ships a built-in SQL editor on HTTP (:47334) and speaks the MySQL (:47335) and PostgreSQL (:47336) wire protocols — so mysql, psql, DBeaver, SQLAlchemy, or any BI tool connects directly.
  • Federated queries, no pipelines. CREATE DATABASE attaches a live data source through an integration handler. The planner translates each query, pushes work down to the source, and streams results back — your data stays where it is. Source-specific syntax is still available via native queries.
  • Knowledge bases are the semantic layer. A knowledge base combines an embedding model, an optional reranking model, and a vector store (e.g. pgvector). INSERT INTO it to chunk, embed, and index content; SELECT from it to retrieve by meaning, filtered by metadata columns like any other table.
  • Hybrid retrieval. Hybrid search runs vector similarity and BM25 keyword matching in parallel and merges the results — for queries that mix natural language with exact identifiers, codes, or acronyms.
  • Organize and automate. Projects namespace your work, views save cross-source transformations, and jobs schedule any SQL to run on an interval — e.g. to keep knowledge bases fresh.
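To make the "Hybrid retrieval" bullet concrete, here is a minimal sketch of one common way to merge a vector ranking with a keyword ranking: reciprocal rank fusion (RRF). The text above does not say which merge strategy the engine uses, so treat this as an illustration of the idea rather than the engine's algorithm. The function name, the constant `k=60`, and the ticket IDs are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default for RRF,
    not a MindsDB setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists for the same query.
vector_hits = ["t7", "t2", "t9"]   # ranked by embedding similarity
keyword_hits = ["t2", "t4", "t7"]  # ranked by BM25
merged = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(merged)
```

Documents that appear in both lists (`t2`, `t7`) rise to the top, which is the behavior you want when a query mixes natural language with an exact identifier.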

Quick start

Run with Docker:

docker run --name mindsdb_container \
  -e MINDSDB_APIS=http,mysql \
  -p 47334:47334 -p 47335:47335 \
  mindsdb/mindsdb

Or install from PyPI:

pip install mindsdb            # add extras as needed, e.g. mindsdb[pgvector,openai,postgres]
python -m mindsdb

Then open the editor at http://127.0.0.1:47334, or connect any MySQL client to port 47335. The quickstart walks through the rest.
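Besides the editor and the MySQL port, the HTTP interface can be scripted. The sketch below uses only the Python standard library and assumes the server exposes a REST endpoint at `/api/sql/query` that accepts a JSON body with a `query` field. That path and body shape are not stated in this README, so verify them against the API docs before relying on them. The helper names `build_sql_request` and `run_sql` are hypothetical.

```python
import json
import urllib.request


def build_sql_request(query, base_url="http://127.0.0.1:47334"):
    """Build (but do not send) a POST request carrying one SQL statement.

    Assumption: the engine accepts {"query": "..."} at /api/sql/query.
    """
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/sql/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(query, base_url="http://127.0.0.1:47334", timeout=30):
    """Send the statement to a running server and return the decoded JSON reply."""
    request = build_sql_request(query, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no running server; sending it does.
request = build_sql_request("SHOW DATABASES;")
print(request.get_method(), request.full_url)
```

With a server started as shown above, `run_sql("SHOW DATABASES;")` would send the statement and return the parsed response.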

From zero to semantic search

Six steps, start to finish. Full syntax for every statement is in the SQL reference.

1. Attach your data sources (docs) — they are queried live, nothing is imported:

CREATE DATABASE my_pg
WITH ENGINE = 'postgres',
PARAMETERS = {
  "host": "localhost", "port": 5432,
  "user": "user", "password": "pass",
  "database": "mydb"
};

CREATE DATABASE my_mongo
WITH ENGINE = 'mongodb',
PARAMETERS = {
  "host": "mongodb+srv://user:pass@cluster.example.net",
  "database": "support"
};
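The `PARAMETERS` clause in both statements is a JSON object, so connection settings kept in a config file or dictionary can be rendered into a statement mechanically. The sketch below shows one way to do that. The helper `create_database_sql` is hypothetical, and the identifier check is a deliberately strict assumption for the example, not the engine's actual naming rule.

```python
import json
import re

# Conservative pattern for names and engine identifiers (an assumption
# made for this sketch; the engine may accept more).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def create_database_sql(name, engine, parameters):
    """Render a CREATE DATABASE statement from a dict of connection parameters."""
    for label, value in (("name", name), ("engine", engine)):
        if not _IDENTIFIER.match(value):
            raise ValueError(f"unsafe {label}: {value!r}")
    # json.dumps handles quoting and escaping of keys and values.
    params = json.dumps(parameters, indent=2)
    return (
        f"CREATE DATABASE {name}\n"
        f"WITH ENGINE = '{engine}',\n"
        f"PARAMETERS = {params};"
    )


stmt = create_database_sql(
    "my_pg",
    "postgres",
    {"host": "localhost", "port": 5432, "user": "user",
     "password": "pass", "database": "mydb"},
)
print(stmt)
```

Serializing with `json.dumps` instead of string concatenation keeps numeric values such as the port unquoted and escapes any quotes inside passwords.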

2. Query across sources in one dialect (docs), including non-SQL stores like MongoDB, and save the result as a view:

CREATE VIEW open_tickets_by_product AS (
  SELECT p.name, COUNT(t.ticket_id) AS open_tickets
  FROM my_mongo.support_tickets AS t
  JOIN my_pg.products AS p
    ON t.product_id = p.id
  WHERE t.status = 'open'
  GROUP BY p.name
);

3. Create a knowledge base (docs) — an embedding model plus a vector store, addressable as a table:

CREATE KNOWLEDGE_BASE support_kb
USING
  embedding_model = {
    "provider":   "openai",
    "model_name": "text-embedding-3-large",
    "api_key":    "sk-..."
  },
  storage          = my_pgvector.support_kb_store,  -- my_pgvector: a pgvector connection attached beforehand (not created in step 1)
  content_columns  = ['subject', 'body'],
  metadata_columns = ['product_name', 'priority', 'created_at'],
  id_column        = 'ticket_id';

4. Index your content (docs) — rows are chunked, embedded, and upserted:

INSERT INTO support_kb
  SELECT ticket_id, subject, body, product_name, priority, created_at
  FROM my_mongo.support_tickets;
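As a mental model for "chunked, embedded, and upserted", the sketch below splits a row's text into overlapping chunks and replaces whatever was previously stored under the same ID. It is a toy model under stated assumptions: the chunk size, overlap, and the in-memory `index` dictionary are invented for the example, the embedding step is omitted, and the engine's real chunking strategy may differ.

```python
def chunk_text(text, size=200, overlap=20):
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
        start += size - overlap
    return chunks


def upsert(index, row_id, text, size=200, overlap=20):
    """Replace all stored chunks for row_id with chunks of the new text."""
    for key in [k for k in index if k[0] == row_id]:
        del index[key]
    for n, chunk in enumerate(chunk_text(text, size, overlap)):
        index[(row_id, n)] = chunk  # a real index would also store an embedding


index = {}
upsert(index, 1, "abcdefghij", size=6, overlap=2)
print(index)
upsert(index, 1, "abc", size=6, overlap=2)  # re-inserting the same id replaces it
print(index)
```

Keying chunks by the row's ID is what makes a repeated insert an update rather than a duplicate, which is why step 3 asks for an `id_column`.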

5. Search by meaning, filter by metadata (docs):

SELECT chunk_content, product_name, relevance
FROM support_kb
WHERE content = 'cannot connect after the latest update'
  AND priority <= 2
  AND relevance >= 0.5
LIMIT 10;

-- hybrid search: blend vector similarity with BM25 keyword matching
SELECT *
FROM support_kb
WHERE content = 'error ERR-4421'
  AND hybrid_search = true;

For a longer walkthrough of this feature, see the explainer "How to use semantic search with metadata filters".
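The first query in step 5 combines three things: a metadata filter (`priority <= 2`), a relevance threshold (`relevance >= 0.5`), and a `LIMIT`. The sketch below reproduces that logic over toy two-dimensional vectors, using cosine similarity as the relevance score. That scoring choice is an assumption made for illustration: with a reranking model configured, the engine's `relevance` values may be computed differently. All rows and vectors here are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(rows, query_vec, max_priority, min_relevance, limit):
    """Filter by metadata, score by similarity, apply a threshold, then limit."""
    hits = []
    for row in rows:
        if row["priority"] > max_priority:
            continue  # metadata filter: priority <= max_priority
        relevance = cosine(row["embedding"], query_vec)
        if relevance >= min_relevance:
            hits.append({"id": row["id"], "relevance": relevance})
    hits.sort(key=lambda hit: hit["relevance"], reverse=True)
    return hits[:limit]


rows = [
    {"id": 1, "priority": 1, "embedding": [1.0, 0.0]},
    {"id": 2, "priority": 3, "embedding": [1.0, 0.0]},  # fails the priority filter
    {"id": 3, "priority": 2, "embedding": [0.0, 1.0]},  # fails the relevance threshold
    {"id": 4, "priority": 2, "embedding": [1.0, 1.0]},
]
results = search(rows, [1.0, 0.0], max_priority=2, min_relevance=0.5, limit=10)
print([hit["id"] for hit in results])
```

Row 2 is a perfect semantic match but is excluded by the metadata filter, which is the point of combining the two: similarity ranks the candidates, and ordinary SQL predicates decide which candidates are eligible.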

6. Keep the index fresh with a job (docs):

CREATE JOB refresh_support_kb (
  INSERT INTO support_kb
    SELECT ticket_id, subject, body, product_name, priority, created_at
    FROM my_mongo.support_tickets
    WHERE created_at > LAST
)
EVERY hour;

Help and support

You need              Go to
Ask a question        Discord
Report a bug          GitHub Issues (please include reproduction steps)
Commercial support    Contact the team

Security note: if you find a vulnerability, please do not open a public issue — follow our security policy instead.

Contributing

Contributions are welcome — code, integrations, docs, and bug reports alike. We follow the fork-and-pull workflow: see the contribution guide to get set up, and browse the open issues for somewhere to start. Good first areas are new integration handlers, bug fixes, and documentation improvements.

Resources

License

MindsDB Core is licensed under the Elastic License 2.0; some directories carry their own license — see the LICENSE file for the full structure.
