MindsDB Query Engine

Semantic search over all your data — entirely in SQL.

Docs · Website · Discord · Contact

MindsDB Query Engine connects to 200+ data sources — databases, warehouses, applications, files — and lets you query them live in one SQL dialect, with no ETL. Index unstructured content into knowledge bases, then search it by meaning, by keyword, or both at once, with plain SQL filters on top. Everything is reachable from any MySQL- or PostgreSQL-compatible client.

Where this fits: MindsDB now builds MindsHub — a hub for open AI agents. The Query Engine remains a standalone open-source project, and it pairs well with MindsHub agents: connect it to give an agent live, SQL-queryable access to your data and semantic search. The full story: MindsHub vs MindsDB.

How it works

   MySQL clients · PostgreSQL clients · BI tools · ORMs · HTTP API
                                  │
                   ┌──────────────▼───────────────┐
                   │     MindsDB Query Engine     │
                   │     one SQL dialect over     │
                   │  a federated query planner   │
                   └──────────────┬───────────────┘
                                  │
            ┌─────────────────────┼─────────────────────┐
            │                     │                     │
  ┌─────────▼─────────┐ ┌─────────▼─────────┐ ┌─────────▼─────────┐
  │     Databases     │ │    Apps & files   │ │  Knowledge bases  │
  │ Postgres, MySQL,  │ │ Slack, web crawler│ │   embeddings +    │
  │ MongoDB, Snowflake│ │ docs, sheets,     │ │  vector store +   │
  │ BigQuery, S3, …   │ │ email, calendars… │ │    BM25 index     │
  └───────────────────┘ └───────────────────┘ └───────────────────┘
           queried live, in place — data is never copied

One server, three interfaces. The engine ships a built-in SQL editor on HTTP (:47334) and speaks the MySQL (:47335) and PostgreSQL (:47336) wire protocols — so mysql, psql, DBeaver, SQLAlchemy, or any BI tool connects directly.
Federated queries, no pipelines. CREATE DATABASE attaches a live data source through an integration handler. The planner translates each query, pushes work down to the source, and streams results back — your data stays where it is. Source-specific syntax is still available via native queries.
Knowledge bases are the semantic layer. A knowledge base combines an embedding model, an optional reranking model, and a vector store (e.g. pgvector). INSERT INTO it to chunk, embed, and index content; SELECT from it to retrieve by meaning, filtered by metadata columns like any other table.
Hybrid retrieval. Hybrid search runs vector similarity and BM25 keyword matching in parallel and merges the results — for queries that mix natural language with exact identifiers, codes, or acronyms.
Organize and automate. Projects namespace your work, views save cross-source transformations, and jobs schedule any SQL to run on an interval — e.g. to keep knowledge bases fresh.
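
To make the hybrid retrieval idea above concrete, here is a minimal Python sketch of one common way to merge two ranked result lists: reciprocal rank fusion (RRF). The README does not say which merge strategy the engine uses, so RRF, the constant `k=60`, the function name, and the ticket IDs are all assumptions for illustration only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. This is a generic technique and not necessarily
    the merge that MindsDB performs internally.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for a query such as 'error ERR-4421'.
vector_hits = ["t17", "t03", "t42"]   # ranked by embedding similarity
keyword_hits = ["t42", "t17", "t99"]  # ranked by BM25 keyword score

merged = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(merged)
```

Documents found by both retrievers (`t17`, `t42`) rise above documents found by only one, which is the behaviour hybrid search is meant to provide for queries that mix natural language with exact identifiers.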

Quick start

Run with Docker:

docker run --name mindsdb_container \
  -e MINDSDB_APIS=http,mysql \
  -p 47334:47334 -p 47335:47335 \
  mindsdb/mindsdb

Or install from PyPI:

pip install mindsdb            # add extras as needed, e.g. mindsdb[pgvector,openai,postgres]
python -m mindsdb

Then open the editor at http://127.0.0.1:47334, or connect any MySQL client to port 47335. The quickstart walks through the rest.
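
As a rough sketch of talking to the HTTP interface from Python, the snippet below builds (but does not send) a request carrying one SQL statement. The `/api/sql/query` path and the `{"query": ...}` payload shape are assumptions that should be checked against the docs for your version; the helper name `build_sql_request` is made up for this example.

```python
import json
import urllib.request

MINDSDB_HTTP = "http://127.0.0.1:47334"  # HTTP port from the quick start


def build_sql_request(sql, base_url=MINDSDB_HTTP):
    """Build a POST request carrying one SQL statement as JSON.

    Assumption: the server accepts {"query": "<sql>"} at /api/sql/query.
    """
    payload = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/sql/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sql_request("SHOW DATABASES;")
print(request.get_method(), request.full_url)

# With a server running, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Only the standard library is used here, so the sketch runs without a server; sending the request is left commented out.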

From zero to semantic search

Six steps, start to finish. Full syntax for every statement is in the SQL reference.

1. Attach your data sources (docs) — they are queried live, nothing is imported:

CREATE DATABASE my_pg
WITH ENGINE = 'postgres',
PARAMETERS = {
  "host": "localhost", "port": 5432,
  "user": "user", "password": "pass",
  "database": "mydb"
};

CREATE DATABASE my_mongo
WITH ENGINE = 'mongodb',
PARAMETERS = {
  "host": "mongodb+srv://user:pass@cluster.example.net",
  "database": "support"
};
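
If connections are created from scripts rather than typed by hand, the statement can be rendered from a dictionary. The following is a small sketch of that idea; `create_database_sql` is a hypothetical helper, not part of MindsDB, and it only reproduces the statement shape shown above.

```python
import json
import re


def create_database_sql(name, engine, parameters):
    """Render a CREATE DATABASE statement in the shape used above.

    `name` and `engine` are restricted to simple identifiers so they can be
    placed in the statement without quoting surprises.
    """
    for value in (name, engine):
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", value):
            raise ValueError(f"not a simple identifier: {value!r}")
    # PARAMETERS is written as a JSON object, so json.dumps handles escaping.
    params = json.dumps(parameters, indent=2)
    return (
        f"CREATE DATABASE {name}\n"
        f"WITH ENGINE = '{engine}',\n"
        f"PARAMETERS = {params};"
    )


statement = create_database_sql(
    "my_pg",
    "postgres",
    {
        "host": "localhost",
        "port": 5432,
        "user": "user",
        "password": "pass",  # placeholder; load real secrets from the environment
        "database": "mydb",
    },
)
print(statement)
```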

2. Query across sources in one dialect (docs), including non-SQL stores like MongoDB, and save the result as a view:

CREATE VIEW open_tickets_by_product AS (
  SELECT p.name, COUNT(t.ticket_id) AS open_tickets
  FROM my_mongo.support_tickets AS t
  JOIN my_pg.products AS p
    ON t.product_id = p.id
  WHERE t.status = 'open'
  GROUP BY p.name
);
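
Conceptually, the view above filters tickets at one source, joins them to products from another, and counts per product. The Python sketch below mimics that flow on made-up in-memory rows. It is an illustration of the query's semantics under the assumption that the `status` filter is applied at the source, not a description of how the planner is implemented.

```python
from collections import Counter

# Made-up rows standing in for the two sources.
MONGO_TICKETS = [
    {"ticket_id": 1, "product_id": 10, "status": "open"},
    {"ticket_id": 2, "product_id": 10, "status": "closed"},
    {"ticket_id": 3, "product_id": 11, "status": "open"},
    {"ticket_id": 4, "product_id": 10, "status": "open"},
]
PG_PRODUCTS = [
    {"id": 10, "name": "Router"},
    {"id": 11, "name": "Switch"},
]


def fetch_tickets(status):
    """Stand-in for the source-side scan: the filter runs 'at the source'."""
    return [row for row in MONGO_TICKETS if row["status"] == status]


def open_tickets_by_product():
    """Join the filtered tickets to products and count per product name."""
    names = {product["id"]: product["name"] for product in PG_PRODUCTS}
    counts = Counter()
    for ticket in fetch_tickets("open"):
        name = names.get(ticket["product_id"])
        if name is not None:  # inner join: drop tickets with no product
            counts[name] += 1
    return dict(counts)


result = open_tickets_by_product()
print(result)
```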

3. Create a knowledge base (docs) — an embedding model plus a vector store, addressable as a table:

CREATE KNOWLEDGE_BASE support_kb
USING
  embedding_model = {
    "provider":   "openai",
    "model_name": "text-embedding-3-large",
    "api_key":    "sk-..."
  },
  storage          = my_pgvector.support_kb_store,  -- a table in a pgvector data source, attached separately as my_pgvector
  content_columns  = ['subject', 'body'],
  metadata_columns = ['product_name', 'priority', 'created_at'],
  id_column        = 'ticket_id';

4. Index your content (docs) — rows are chunked, embedded, and upserted:

INSERT INTO support_kb
  SELECT ticket_id, subject, body, product_name, priority, created_at
  FROM my_mongo.support_tickets;
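
The README says rows are chunked, embedded, and upserted, but it does not describe the chunking strategy. The sketch below shows one simple possibility, fixed-size character windows with overlap, purely as an assumption; the function names, the default sizes, and the choice to join content columns with a newline are all hypothetical.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character windows that overlap by `overlap` chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def rows_to_chunks(rows, id_column, content_columns, **chunk_kwargs):
    """Yield (row id, chunk index, chunk text) for each row."""
    for row in rows:
        content = "\n".join(str(row[col]) for col in content_columns)
        for index, chunk in enumerate(chunk_text(content, **chunk_kwargs)):
            yield row[id_column], index, chunk


# One made-up ticket whose combined content is 34 characters long.
rows = [{"ticket_id": 7, "subject": "VPN", "body": "x" * 30}]
chunks = list(rows_to_chunks(rows, "ticket_id", ["subject", "body"],
                             chunk_size=20, overlap=5))
print(chunks)
```

Keeping the row ID next to each chunk is what lets a later insert for the same `id_column` value replace the earlier chunks instead of duplicating them.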

5. Search by meaning, filter by metadata (docs):

SELECT chunk_content, product_name, relevance
FROM support_kb
WHERE content = 'cannot connect after the latest update'
  AND priority <= 2
  AND relevance >= 0.5
LIMIT 10;

-- hybrid search: blend vector similarity with BM25 keyword matching
SELECT *
FROM support_kb
WHERE content = 'error ERR-4421'
  AND hybrid_search = true;
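
For intuition about the first query above, the following sketch applies the same three ideas to a toy in-memory index: a metadata filter, a similarity score, and a relevance threshold with a limit. Cosine similarity stands in for the `relevance` column here as an assumption; the engine's actual scoring (for example, when a reranking model is configured) may differ, and the vectors are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(chunks, query_vector, max_priority, min_relevance, limit):
    """Filter by metadata, score by similarity, threshold, sort, and cap."""
    hits = []
    for chunk in chunks:
        if chunk["priority"] > max_priority:   # mirrors: priority <= 2
            continue
        relevance = cosine_similarity(chunk["embedding"], query_vector)
        if relevance >= min_relevance:         # mirrors: relevance >= 0.5
            hits.append({"chunk_content": chunk["chunk_content"],
                         "relevance": relevance})
    hits.sort(key=lambda hit: hit["relevance"], reverse=True)
    return hits[:limit]                        # mirrors: LIMIT 10


# Toy 2-dimensional "embeddings"; real ones come from the embedding model.
index = [
    {"chunk_content": "login fails after update", "priority": 1, "embedding": [1.0, 0.0]},
    {"chunk_content": "invoice is missing", "priority": 1, "embedding": [0.0, 1.0]},
    {"chunk_content": "cannot connect to VPN", "priority": 3, "embedding": [1.0, 0.1]},
]
results = search(index, query_vector=[1.0, 0.0], max_priority=2,
                 min_relevance=0.5, limit=10)
print(results)
```

The third chunk is semantically close to the query but is excluded by the metadata filter, which is the point of combining both kinds of condition in one `WHERE` clause.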

▶ How to use semantic search with metadata filters — a good explainer of this feature.

6. Keep the index fresh with a job (docs):

CREATE JOB refresh_support_kb (
  INSERT INTO support_kb
    SELECT ticket_id, subject, body, product_name, priority, created_at
    FROM my_mongo.support_tickets
    WHERE created_at > LAST
)
EVERY hour;
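
The `created_at > LAST` condition in the job above expresses an incremental read: each run should pick up only rows newer than those already processed. The sketch below shows that watermark pattern in plain Python. The class name is hypothetical, and returning every row on the first run is a simplification of this sketch; the engine may treat the first run differently, for example by only recording a baseline.

```python
from datetime import datetime


class IncrementalReader:
    """Return only rows newer than the newest row seen on earlier runs."""

    def __init__(self, column):
        self.column = column
        self.last_seen = None  # no watermark before the first run

    def read(self, rows):
        if self.last_seen is None:
            fresh = list(rows)  # simplification: first run takes everything
        else:
            fresh = [r for r in rows if r[self.column] > self.last_seen]
        if fresh:
            # Advance the watermark to the newest value just processed.
            self.last_seen = max(r[self.column] for r in fresh)
        return fresh


# Made-up ticket rows.
tickets = [
    {"ticket_id": 1, "created_at": datetime(2024, 1, 1, 9, 0)},
    {"ticket_id": 2, "created_at": datetime(2024, 1, 1, 10, 0)},
]
reader = IncrementalReader("created_at")
first_run = reader.read(tickets)    # both existing rows
tickets.append({"ticket_id": 3, "created_at": datetime(2024, 1, 1, 11, 0)})
second_run = reader.read(tickets)   # only the newly added row
third_run = reader.read(tickets)    # nothing new since the last run
print(len(first_run), len(second_run), len(third_run))
```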

Help and support

You need             Go to
Ask a question       Discord
Report a bug         GitHub Issues (please include reproduction steps)
Commercial support   Contact the team

Security note: if you find a vulnerability, please do not open a public issue — follow our security policy instead.

Contributing

Contributions are welcome — code, integrations, docs, and bug reports alike. We follow the fork-and-pull workflow: see the contribution guide to get set up, and browse the open issues for somewhere to start. Good first areas are new integration handlers, bug fixes, and documentation improvements.

Resources

License

MindsDB Core is licensed under the Elastic License 2.0; some directories carry their own license — see the LICENSE file for the full structure.
