flyquery — API Reference

Overview
Authentication
Common headers
Error envelope
User-tier endpoints
Agent-tier endpoints
Scope catalog
SSE event reference

1. Overview

Base URL: https://api.flyservices.io (production) / http://localhost:8520 (dev)
Versioning: all endpoints live under /api/v1/
Format: JSON request and response bodies; Content-Type: application/json
Upload: multipart/form-data for file upload endpoints
Streaming: Server-Sent Events (SSE) on */stream endpoints

The full machine-readable spec is in openapi.json at the repository root. Use it as the source of truth for all field types, required/optional markers, and response schemas. This document covers representative examples and important behavioral notes.

2. Authentication

User-tier

JWT bearer token. Pass in the Authorization header:

Authorization: Bearer <jwt>

JWT must carry tenant_id and workspace_id claims, or the request must include X-Tenant-Id and X-Workspace-Id headers.

Agent-tier

Long-lived agent token. Pass in the X-Agent-Token header:

X-Agent-Token: agt_a1b2c3d4_<32hex>

Every write to an agent-tier endpoint must also include:

Idempotency-Key: <unique-string>

The server deduplicates by (tenant_id, workspace_id, key) for a default 24-hour window. Replaying the same key within the window returns the original response.
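In practice, a client should generate one Idempotency-Key per logical write and reuse it on every retry of that write. A retry the server already processed then returns the original response instead of writing twice.

The sketch below assumes a hypothetical send callable (not part of any flyquery SDK) that raises ConnectionError on transient failures.

```python
import uuid
from typing import Callable, Dict, Optional


def post_with_retries(
    send: Callable[[Dict[str, str]], dict],
    base_headers: Dict[str, str],
    attempts: int = 3,
) -> dict:
    """Retry one agent-tier write, reusing a single Idempotency-Key.

    `send` is a hypothetical transport that receives the full header dict and
    returns the parsed response body, or raises ConnectionError.
    """
    # Generate the key once per logical write, outside the retry loop.
    headers = {**base_headers, "Idempotency-Key": str(uuid.uuid4())}
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return send(headers)
        except ConnectionError as exc:
            last_error = exc  # transient failure: retry with the same key
    raise RuntimeError(f"write failed after {attempts} attempts") from last_error
```

Generating a fresh key per attempt would defeat deduplication, because the server would treat each retry as a new write.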

3. Common headers

Header	Required	Notes
`X-Tenant-Id`	Yes	Tenant slug, e.g. `acme`
`X-Workspace-Id`	Yes	Workspace slug, e.g. `analytics`
`X-Correlation-Id`	No	Trace identifier; autogenerated if absent, echoed in response
`Idempotency-Key`	Writes only	Required on all agent-tier writes
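The header table above can be folded into one helper. This is a minimal sketch, and build_headers is a hypothetical name.

The helper generates X-Correlation-Id on the client rather than letting the server autogenerate it. That is a choice, not a requirement: it lets you log the id before the response arrives.

```python
import uuid
from typing import Dict, Optional


def build_headers(
    tenant_id: str,
    workspace_id: str,
    *,
    jwt: Optional[str] = None,
    agent_token: Optional[str] = None,
    idempotency_key: Optional[str] = None,
    correlation_id: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble the common flyquery request headers (hypothetical helper)."""
    # Exactly one credential: a user-tier JWT or an agent-tier token.
    if (jwt is None) == (agent_token is None):
        raise ValueError("pass exactly one of jwt or agent_token")
    headers = {
        "X-Tenant-Id": tenant_id,
        "X-Workspace-Id": workspace_id,
        # Optional for the server; generated here so the caller can log it.
        "X-Correlation-Id": correlation_id or str(uuid.uuid4()),
    }
    if jwt is not None:
        headers["Authorization"] = f"Bearer {jwt}"
    else:
        headers["X-Agent-Token"] = agent_token
    if idempotency_key is not None:  # required on agent-tier writes
        headers["Idempotency-Key"] = idempotency_key
    return headers
```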

4. Error envelope

All errors use RFC 7807 Problem Details:

{
  "type": "https://firefly.dev/problems/resource_not_found",
  "title": "Resource not found",
  "status": 404,
  "code": "resource_not_found",
  "detail": "Dataset 'abc' does not exist in workspace 'analytics'.",
  "errors": [
    {"path": "dataset_id", "message": "No dataset with this ID"}
  ]
}

Branch on code (stable) not title (human-readable). Common codes:

HTTP	Code	Meaning
400	`validation_error`	Request body failed validation
401	`unauthorized`	Missing or expired JWT/agent token
403	`forbidden`	Token lacks required scope
404	`resource_not_found`	Entity does not exist
409	`conflict`	Idempotency replay collision or unique constraint
422	`unprocessable`	Structurally valid but semantically rejected (e.g., direct SQL disabled)
429	`rate_limit_exceeded`	Agent token RPM limit exhausted
500	`internal_error`	Unexpected server error
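Because code is the stable field, client error handling can be a lookup on it. The mapping from code to a client action below is a hypothetical policy, not behaviour defined by flyquery. field_errors simply flattens the errors array shown in the envelope example.

```python
import json
from typing import Dict

# Hypothetical client policy keyed on the stable `code` values listed above.
ACTIONS: Dict[str, str] = {
    "validation_error": "fix_request",
    "unauthorized": "refresh_credentials",
    "forbidden": "request_scope",
    "resource_not_found": "give_up",
    "conflict": "inspect_idempotency_key",
    "unprocessable": "fix_request",
    "rate_limit_exceeded": "retry_later",
    "internal_error": "retry_later",
}


def action_for(body: str) -> str:
    """Pick a client action from an RFC 7807 body, branching on `code` only."""
    problem = json.loads(body)
    return ACTIONS.get(problem.get("code", ""), "unknown")


def field_errors(body: str) -> Dict[str, str]:
    """Map each field path in `errors` to its message (empty if absent)."""
    problem = json.loads(body)
    return {e["path"]: e["message"] for e in problem.get("errors", [])}
```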

5. User-tier endpoints

5.1 Workspaces

`POST /api/v1/workspaces`

Create a workspace.

Scope: authenticated user.

Request

{
  "slug": "analytics",
  "name": "Analytics",
  "kms_key_uri": null,
  "retention_days": null,
  "allow_direct_sql": false,
  "default_locale": "en-US"
}

Response 201 Created

{
  "id": "01906f2a-...",
  "slug": "analytics",
  "name": "Analytics",
  "allow_direct_sql": false,
  "default_locale": "en-US",
  "storage_used_bytes": 0,
  "created_at": "2026-05-23T12:00:00Z"
}
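Clients without an SDK can make this call with the Python standard library. The sketch below builds the request without sending it. The header dict is assumed to come from your own auth setup, and BASE_URL is the dev URL from the Overview.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8520"  # dev base URL; production is https://api.flyservices.io


def create_workspace_request(headers: dict, slug: str, name: str) -> urllib.request.Request:
    """Build (but do not send) POST /api/v1/workspaces with default settings."""
    body = {
        "slug": slug,
        "name": name,
        "kms_key_uri": None,        # serialised as JSON null
        "retention_days": None,
        "allow_direct_sql": False,
        "default_locale": "en-US",
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/workspaces",
        data=json.dumps(body).encode("utf-8"),
        headers={**headers, "Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with urllib.request.urlopen(req) should yield status 201 and a body shaped like the response above.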

`GET /api/v1/workspaces`

List workspaces for the tenant.

Response 200 OK — array of workspace summaries.

`GET /api/v1/workspaces/{id}`

Fetch one workspace.

`PUT /api/v1/workspaces/{id}`

Update kms_key_uri, retention_days, allow_direct_sql, default_locale.

`DELETE /api/v1/workspaces/{id}:purge`

Schedule hard deletion. A 30-day soft tombstone is retained before bytes are removed. In-flight ingestion and queries are cooperatively cancelled first.

5.2 Datasets

`POST /api/v1/datasets`

Request

{
  "name": "sales-2026",
  "description": "Annual sales export",
  "drift_policy": "AUTO",
  "ingest_policy_json": {}
}

Response 201 Created — DatasetResponse

`GET /api/v1/datasets`

List datasets. Optional query params: status (ACTIVE|ARCHIVED), page, page_size.

`GET /api/v1/datasets/{id}`

`PUT /api/v1/datasets/{id}`

Update drift_policy, ingest_policy_json, default_locale, description.

`DELETE /api/v1/datasets/{id}`

Soft-archive. Does not remove Parquet blobs; those are removed on workspace purge or when retention_days expires.

5.3 Files and tables

`POST /api/v1/datasets/{id}/files` — upload a file

Multipart upload. The server detects the file format from its magic bytes and extension.

Request Content-Type: multipart/form-data

field file: the binary content
field original_filename: string (optional; defaults to part filename)
field table_extraction_rules_json: JSON string (optional)

Response 202 Accepted

{
  "file_id": "01906f2b-...",
  "ingest_job_id": "01906f2c-...",
  "tables": [
    {
      "table_id": "01906f2d-...",
      "qualified_name": "sales-2026.orders",
      "sheet_or_json_path": null,
      "n_rows_estimate": 15000
    }
  ]
}

An XLSX with 3 sheets returns 3 entries in tables. Poll or stream the ingest_job_id to track pipeline progress.
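The Python standard library has no multipart encoder, so a client without an HTTP library can assemble the form by hand. This sketch emits only the file and original_filename fields described above. The helper names and the application/octet-stream part type are assumptions.

```python
import uuid
from typing import Optional, Tuple


def _part(boundary: str, disposition: str, payload: bytes,
          content_type: Optional[str] = None) -> bytes:
    """One multipart section: boundary line, headers, blank line, payload."""
    head = f"--{boundary}\r\nContent-Disposition: form-data; {disposition}\r\n"
    if content_type:
        head += f"Content-Type: {content_type}\r\n"
    return head.encode("utf-8") + b"\r\n" + payload + b"\r\n"


def encode_upload(filename: str, content: bytes,
                  original_filename: Optional[str] = None) -> Tuple[str, bytes]:
    """Return (Content-Type header value, body) for the single-file upload."""
    # Random boundary, so a collision with the file bytes is vanishingly unlikely.
    boundary = uuid.uuid4().hex
    body = _part(boundary, f'name="file"; filename="{filename}"', content,
                 "application/octet-stream")
    if original_filename is not None:
        body += _part(boundary, 'name="original_filename"',
                      original_filename.encode("utf-8"))
    body += f"--{boundary}--\r\n".encode("utf-8")  # closing delimiter
    return f"multipart/form-data; boundary={boundary}", body
```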

`POST /api/v1/datasets/{id}/files:bulk` — upload many files in one request

Multi-file multipart variant of the single upload. Every files part goes through the same per-file pipeline (receive → parse → reconcile → sample → profile → describe → embed → publish), and the per-file calls run concurrently through asyncio.gather. Ingesting N files therefore takes roughly the wall-clock time of the slowest single file rather than the sum of all N.

Per-file failures DO NOT abort the bulk request. Each result carries either status="OK" with file_id and tables, or status="FAILED" with error.

Request Content-Type: multipart/form-data

N files parts (each a binary file). A single file part is also accepted for client compatibility.

Response 201 Created

{
  "results": [
    {"index": 0, "original_filename": "customers.csv", "status": "OK",
     "file_id": "...", "tables": [{"table_id":"...","name":"customers", ...}]},
    {"index": 1, "original_filename": "broken.csv", "status": "FAILED",
     "file_id": null, "tables": [], "error": "could not sniff dialect"}
  ],
  "total_files": 2,
  "succeeded": 1,
  "failed": 1
}

SDK helpers:

Python: FlyqueryClient.upload_bulk(dataset_id, paths) and FlyqueryClient.upload_directory(dataset_id, dir).
Java: WebClient against :bulk with a MultipartBodyBuilder — see sdks/java/examples/Demo.java.
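Because one bad file does not fail the request, callers should inspect every result. The sketch below performs that check on the parsed JSON body. The function name is hypothetical and is not part of the Python SDK.

```python
from typing import Any, Dict, List, Tuple


def split_bulk_results(resp: Dict[str, Any]) -> Tuple[List[str], Dict[int, str]]:
    """Return (file_ids that succeeded, {index: error} for failed files)."""
    ok_ids: List[str] = []
    failed: Dict[int, str] = {}
    for r in resp["results"]:
        if r["status"] == "OK":
            ok_ids.append(r["file_id"])
        else:  # "FAILED": file_id is null and `error` explains why
            failed[r["index"]] = r.get("error", "")
    # Cross-check the per-file rows against the aggregate counts.
    if (len(ok_ids), len(failed)) != (resp["succeeded"], resp["failed"]):
        raise ValueError("per-file results disagree with aggregate counts")
    return ok_ids, failed
```

Keying failures by index rather than by filename keeps two uploads with the same name distinct. The indexes identify which parts to resend.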

`POST /api/v1/datasets/{id}/files:async` — async upload

Like POST /datasets/{id}/files, but the server responds with 202 Accepted and a job id as soon as Stage 1 completes. Stage 1 (receive: caps check, hashing, format detection, byte storage, writing the flyquery_files row, and workspace storage accounting) runs synchronously before the response. As a result, file_id is final and the content hash provides natural per-file dedup. Stages 2-10 of the pipeline are queued as a PARSE_AND_INGEST ingest job that the IngestWorker consumes. The worker honours an already_received flag on the job, so it does NOT re-run Stage 1; see async-ingest.md and workers.md for the worker side.

Request Content-Type: multipart/form-data — one file part.

Response 202 Accepted with Location: /api/v1/ingest-jobs/{job_id}

{
  "job_id": "01906f2c-...",
  "file_id": "01906f2b-...",
  "dataset_id": "01906f2a-...",
  "status": "PENDING"
}

Use this endpoint instead of the synchronous one when a single file would exceed the deployment's request-timeout budget (typically anything above a few MB once cold-cache LLM describe calls are included).

Source: files_controller.py:249 (upload_file_async).
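A client of this endpoint can follow the Location header until the job settles, or subscribe to GET /api/v1/ingest-jobs/{id}/stream instead. The polling sketch below takes a hypothetical fetch callable that GETs a URL and returns the parsed job.

Only PENDING and CANCELLED appear in this reference. The other terminal status names are placeholders; replace them with the values in openapi.json.

```python
import time
from typing import Callable, Dict, FrozenSet

# ASSUMPTION: SUCCEEDED and FAILED are placeholder names; only PENDING and
# CANCELLED are documented here. Take the real enum from openapi.json.
TERMINAL: FrozenSet[str] = frozenset({"SUCCEEDED", "FAILED", "CANCELLED"})


def wait_for_job(fetch: Callable[[str], Dict], location: str,
                 interval_s: float = 2.0, timeout_s: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll the Location URL returned by files:async until the job is terminal."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch(location)  # hypothetical: GET <location>, parsed JSON
        if job["status"] in TERMINAL:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job at {location} still {job['status']}")
        sleep(interval_s)
```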

`GET /api/v1/datasets/{id}/files`

List files for a dataset.

`GET /api/v1/files/{id}`

Fetch file metadata (no binary download; original blob is retained internally).

`DELETE /api/v1/files/{id}`

Marks the file as DELETED. Does not purge the Parquet snapshot for tables already derived from this file.

`GET /api/v1/datasets/{id}/tables`

List tables. Optional: kind (UPLOADED|DERIVED), is_active.

`GET /api/v1/tables/{id}`

Response includes current_snapshot_id, n_rows_estimate, n_columns, description, kind, sheet_or_json_path.

`PUT /api/v1/tables/{id}`

Update name, description, locale_override.

`PUT /api/v1/datasets/{ds}/tables/{id}:upload` — re-upload

Re-upload a file into an existing table slot. Creates a new Snapshot with the same table_id. Annotations carry forward by matching on (qualified_name, column_name). Ambiguous column renames are flagged as RENAMED_CANDIDATE.

Request: same multipart as POST /datasets/{id}/files.

Response 202 Accepted — {ingest_job_id, snapshot_id}
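The carry-forward rule can be illustrated as a keyed lookup: an annotation survives a re-upload when its (qualified_name, column_name) key still exists in the new snapshot. This sketch shows the matching idea only. The data shapes are hypothetical, and how flyquery decides that a leftover column is a RENAMED_CANDIDATE is not described here.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (qualified_name, column_name)


def carry_forward(old_annotations: Dict[Key, dict], qualified_name: str,
                  new_columns: List[str]) -> Tuple[Dict[Key, dict], List[Key]]:
    """Split one table's old annotations into kept and orphaned entries.

    Kept: the key still exists in the new snapshot, so the annotation carries
    over unchanged. Orphaned: the column vanished, which is the kind of
    leftover a rename detector would then inspect.
    """
    new_keys = {(qualified_name, c) for c in new_columns}
    kept = {k: v for k, v in old_annotations.items() if k in new_keys}
    orphans = sorted(k for k in old_annotations
                     if k[0] == qualified_name and k not in new_keys)
    return kept, orphans
```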

`DELETE /api/v1/tables/{id}`

Soft-delete. Sets is_active=false.

`POST /api/v1/tables:derive`

Materialise a new derived table from a SQL SELECT statement.

Scope: flyquery.derived:write

Request

{
  "name": "orders_summary",
  "dataset_id": "01906f2a-...",
  "sql": "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
}

Response 202 Accepted — {table_id, ingest_job_id}

Derived tables accept INSERT/UPDATE/DELETE under flyquery.derived:write scope. Ingested (UPLOADED) tables are always read-only.

`GET /api/v1/tables/{id}/snapshots`

List all snapshots for a table, most recent first.

`GET /api/v1/tables/{id}/snapshots/{snap_id}`

Fetch one snapshot: status, column count, row count, Parquet byte size.

`GET /api/v1/tables/{id}/changes`

Paginated diff log (flyquery_schema_changes). Shows ADDED, REMOVED, TYPE_CHANGED, RENAMED, RENAMED_CANDIDATE events across re-uploads.

5.4 Ingest jobs

`POST /api/v1/ingest-jobs`

Start an ingestion job explicitly.

Request

{
  "job_kind": "DESCRIBE_PASS",
  "dataset_id": "01906f2a-...",
  "table_id": "01906f2d-..."
}

job_kind options:

PARSE_AND_INGEST — full pipeline; used by file upload
REPARSE — re-run parse from the existing blob
SAMPLE_REFRESH — re-collect samples for changed columns
DESCRIBE_PASS — run DescribeAgent on objects still lacking descriptions
RELATION_PASS — re-run heuristic + RelationProposerAgent across the dataset

Response 202 Accepted — {ingest_job_id}

`GET /api/v1/ingest-jobs`

List jobs. Query params: status, job_kind, dataset_id, table_id, page, page_size.

`GET /api/v1/ingest-jobs/{id}`

Fetch full job detail: status, attempts, result_json, cost_cents.

`GET /api/v1/ingest-jobs/{id}/stream`

SSE stream of pipeline-stage events. See §8 SSE event reference.

`GET /api/v1/ingest-jobs/{id}/events`

Paginated ledger of all flyquery_ingest_events rows for this job.

`POST /api/v1/ingest-jobs/{id}:cancel`

Cooperative cancel. Sets status=CANCELLED and signals the worker to stop after the current stage completes.

5.5 Schema annotation

`PUT /api/v1/schema-objects/{id}`

Edit metadata on a TABLE or COLUMN schema object.

Request

{
  "description": "Gross revenue before tax, in USD cents",
  "pii_tag": "NONE",
  "business_owner": "finance-team",
  "synonyms_json": ["revenue", "total_revenue"],
  "governance_json": {}
}

Setting pii_tag to a non-NONE value on a column with existing samples triggers automatic sample scrubbing in the same transaction.

`POST /api/v1/schema-objects/{id}:sample`

Refresh sample_values_json for this column.

`POST /api/v1/schema-objects/{id}:profile`

Refresh profile_json (null_fraction, distinct_estimate, top_values, min/max).

`POST /api/v1/schema-objects/{id}/type-hint`

Apply an LLM-suggested derived virtual type to a column. Marks description_source=AGENT. This is the approval step for type hints proposed during ingestion.

`GET /api/v1/schema-objects:search`

Hybrid BM25 + vector search over the schema knowledge base.

Query params: q (required), dataset_id, table_id, kind (TABLE|COLUMN), top_k (default 10).

Response — ranked list of schema objects with score and snippet.

`GET /api/v1/schema-objects/{id}/lineage`

Upstream lineage (source file) and downstream provenance (derived tables).

`POST /api/v1/schema-objects:rename-merge`

Transplant annotations from one schema object to another, used when a column rename is confirmed out-of-band.

Request

{
  "from_id": "01906f30-...",
  "to_id": "01906f31-..."
}

`POST /api/v1/schema-changes/{id}:confirm`

Flip a RENAMED_CANDIDATE schema-change row to RENAMED. Triggers annotation transplant from the old column object to the new one.

5.6 Relations

`GET /api/v1/datasets/{id}/relations`

List proposed + approved + rejected join relations for a dataset. Optional: status (PROPOSED|APPROVED|REJECTED), kind.

`POST /api/v1/datasets/{id}/relations`

Add a manual join relation.

Scope: flyquery.relations:write

Request

{
  "from_table_id": "01906f2d-...",
  "from_column_name": "customer_id",
  "to_table_id": "01906f2e-...",
  "to_column_name": "id",
  "condition": null,
  "kind": "MANUAL"
}

`POST /api/v1/datasets/{id}/relations/{rel_id}:approve`

Promotes PROPOSED → APPROVED, kind=MANUAL. This relation now enters the retrieval pool for query grounding.

`POST /api/v1/datasets/{id}/relations/{rel_id}:reject`

Sets status=REJECTED. Rejected relations are excluded from retrieval.

`DELETE /api/v1/datasets/{id}/relations/{rel_id}`

Permanently removes a relation row.

5.7 Semantic layer

`POST /api/v1/semantic/metrics`

Scope: flyquery.semantic:author

Request

{
  "name": "total_revenue",
  "label": "Total Revenue",
  "description": "Sum of order_amount across all completed orders",
  "metric_type": "SIMPLE",
  "dataset_id": "01906f2a-...",
  "definition_yaml": "metric:\n  name: total_revenue\n  type: simple\n  type_params:\n    measure:\n      name: order_amount\n      agg: sum\n  filter: \"order_status = 'COMPLETED'\"\n"
}

The definition_yaml is validated against the MetricFlow schema. On success, compiled_sql_template is populated deterministically. See semantic-layer.md for the YAML schema and compilation rules.

Response 201 Created — SemanticMetricResponse

`GET /api/v1/semantic/metrics`

List metrics. Optional: dataset_id, status (DRAFT|PUBLISHED|RETIRED).

`GET /api/v1/semantic/metrics/{id}`

`PUT /api/v1/semantic/metrics/{id}`

Update definition_yaml, label, description. Recompiles on change.

`POST /api/v1/semantic/metrics/{id}:publish`

Moves status from DRAFT → PUBLISHED. Published metrics enter the query-grounding retrieval pool.

`POST /api/v1/semantic/metrics/{id}:retire`

Moves status to RETIRED. Retired metrics are excluded from retrieval.

`GET /api/v1/semantic/metrics/{id}/history`

Immutable version history from flyquery_semantic_versions.

The same CRUD + publish/retire/history endpoints exist for dimensions: /api/v1/semantic/dimensions.

`POST /api/v1/glossary`

Scope: flyquery.semantic:author

Request

{
  "term": "ARR",
  "definition": "Annual Recurring Revenue",
  "synonyms": ["annual recurring revenue"]
}

Glossary terms are workspace-scoped (not per-dataset) and enter the grounding retrieval pool for all datasets in the workspace.

`GET /api/v1/glossary`

`PUT /api/v1/glossary/{id}`

`DELETE /api/v1/glossary/{id}`

5.8 Examples (few-shot)

`POST /api/v1/examples`

Add a user-curated (question, SQL) pair to the example store.

Scope: flyquery.examples:author

Request

{
  "question": "What is the total revenue by region for Q1 2026?",
  "generated_sql": "SELECT region, SUM(amount) FROM orders WHERE order_date >= '2026-01-01' AND order_date < '2026-04-01' GROUP BY region",
  "dataset_id": "01906f2a-...",
  "quality": "APPROVED"
}

An example with quality=APPROVED enters the retrieval pool immediately. An example with quality=PROPOSED (the default for agent-learned examples) waits for approval.

`GET /api/v1/examples`

List examples. Optional: source (USER_CURATED|AGENT_LEARNED), quality (PROPOSED|APPROVED|REJECTED), dataset_id.

`POST /api/v1/examples/{id}:approve`

Promotes PROPOSED → APPROVED. Example now enters the retrieval pool.

`POST /api/v1/examples/{id}:reject`

Sets quality=REJECTED. Rejected examples are excluded from the retrieval pool.

5.9 Query

`POST /api/v1/query`

Synchronous query. Drains the SSE stream internally; returns only after the full pipeline completes.

Scope: flyquery.query:read

Request

{
  "question": "Show top 10 customers by revenue last month",
  "dataset_id": "01906f2a-...",
  "conversation_id": null,
  "row_cap": 1000,
  "statement_timeout_ms": 30000
}

Response 200 OK

{
  "query_id": "01906f40-...",
  "executed_sql": "SELECT ...",
  "semantic_path_taken": "SYNTHESIS",
  "row_count": 10,
  "elapsed_ms": 2340,
  "clarification_emitted": false,
  "result_preview": [...],
  "result_parquet_url": "https://...",
  "result_url_expires_at": "2026-05-24T12:00:00Z",
  "explanation": "The query joins the orders and customers tables...",
  "chart_hint": "bar"
}

result_parquet_url is a presigned URL valid for FLYQUERY_RESULT_TTL_HOURS (default 24 h). Download the Parquet for the full result set.
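Clients that cache results can check result_url_expires_at before downloading. This is a minimal sketch, assuming the timestamps are ISO-8601 UTC as in the example above; the five-minute safety margin is an arbitrary choice. An expired URL can be refreshed through GET /api/v1/queries/{id}/result.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_utc(ts: str) -> datetime:
    """Parse the API's ISO-8601 UTC timestamps, e.g. 2026-05-24T12:00:00Z."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def url_still_valid(expires_at: str, now: Optional[datetime] = None,
                    margin: timedelta = timedelta(minutes=5)) -> bool:
    """True if the presigned Parquet URL outlives `now` by more than `margin`."""
    if now is None:
        now = datetime.now(timezone.utc)
    return parse_utc(expires_at) - now > margin
```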

`POST /api/v1/query/stream`

SSE stream. Same request body as POST /query. Events arrive as stages complete. See §8 for event names and payloads.

`POST /api/v1/query:explain`

Generate SQL and an EXPLAIN plan without executing.

Response — {generated_sql, explain_plan, grounded_context_summary}

`POST /api/v1/query:validate`

Parse and classify generated SQL, run the AST firewall, and return findings. Does not execute.

Request {"sql": "SELECT ..."}

Response — {classification, is_valid, findings: [{severity, message}]}

`POST /api/v1/query:batch` — N questions in one round-trip

Run many NL questions through the full pipeline in parallel on the server side. Per-question failures (a bad grounding, a DuckDB error) DO NOT abort the batch. Each item in the response carries status="OK", or status="FAILED" plus an error, and the response also includes aggregate counts.

Use this for:

- dashboards: one batch with N panel queries;
- comparison reports: the same question against M datasets, sent once per dataset_id;
- SDK callers that want to amortise auth and tenant context across many questions.

Request Content-Type: application/json

{
  "queries": [
    {"question": "Top 5 customers by revenue?", "dataset_id": "..."},
    {"question": "Refund rate this quarter?",   "dataset_id": "..."},
    {"question": "Total revenue?",              "dataset_id": "..."}
  ]
}

Response 200 OK

{
  "results": [
    {"index": 0, "status": "OK", "query_id": "...", "sql": "...",
     "execution_status": "OK", "preview": [...], "row_count": 5,
     "elapsed_ms": 4321, "chart_hint": "bar", "explanation": "..."},
    {"index": 1, "status": "FAILED", "error": "no dataset with id ..."},
    {"index": 2, "status": "OK", "query_id": "...", ...}
  ],
  "total_queries": 3,
  "succeeded": 2,
  "failed": 1
}

SDK helpers:

Python: FlyqueryClient.ask_batch(dataset_id, questions).
Java: QueryApi.batch(new BatchQueryRequest().queries(...)).
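The comparison-report pattern for POST /api/v1/query:batch can be sketched as follows. Each result carries an index into the submitted queries array, so it can be matched back to its dataset_id without relying on array order. The function names are hypothetical, and preview is read only from OK items.

```python
from typing import Any, Dict, List


def build_batch(dataset_ids: List[str], question: str) -> Dict[str, Any]:
    """One question fanned out across several datasets (comparison report)."""
    return {"queries": [{"question": question, "dataset_id": d} for d in dataset_ids]}


def rows_by_dataset(request: Dict[str, Any], response: Dict[str, Any]) -> Dict[str, Any]:
    """Map each dataset_id to its preview rows, or to the error string on failure.

    Matches results back to the request through `index`; assumes one query
    per dataset_id so the mapping keys do not collide.
    """
    out: Dict[str, Any] = {}
    for item in response["results"]:
        ds = request["queries"][item["index"]]["dataset_id"]
        out[ds] = item.get("preview") if item["status"] == "OK" else item["error"]
    return out
```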

5.10 Direct SQL

Available only when workspace.allow_direct_sql = true.

Scope: flyquery.sql:execute

`POST /api/v1/sql:execute`

Request

{
  "sql": "SELECT COUNT(*) FROM orders",
  "dataset_id": "01906f2a-...",
  "row_cap": 500
}

Response 200 OK — same shape as POST /query, without the NL explanation.

`POST /api/v1/sql:execute/stream`

SSE variant of POST /sql:execute. See §8 for event names and payloads.

5.11 Conversations

`POST /api/v1/conversations`

Create a conversation session.

Request {"dataset_id": "01906f2a-...", "title": "Revenue analysis"}

Response 201 Created — {conversation_id, created_at}

`GET /api/v1/conversations`

List conversations. Optional: dataset_id.

`GET /api/v1/conversations/{id}`

Fetch conversation with all turns.

`POST /api/v1/conversations/{id}/turn`

Add a drill-down turn. The prior turn's executed_sql, table_qnames, and snapshot_pins are automatically carried into the Grounding agent as context.

Request {"question": "Now break that down by region"}

Response 200 OK — same shape as POST /query, with turn_index added.

5.12 History and ops

`GET /api/v1/audit-events`

Paginated audit-event ledger. Filters: event_type, actor, resource_kind, date_from, date_to, limit, offset. Response is the standard Paginated[AuditEventRead] envelope. Newest first.

Event types currently written (more call sites will be added over time):

Dataset CRUD (dataset.created, dataset.updated, dataset.archived, dataset.purged)
Workspace CRUD (workspace.created, workspace.updated)
Agent-token mint / revoke (agent_token.minted, agent_token.revoked)

Future scope gate: flyquery.audit:read.

`GET /api/v1/cost-events`

Paginated per-call LLM cost ledger. Filters: actor, model, operation, date_from, date_to, limit, offset. Response is Paginated[CostEventRead]. One row per LLM-bearing operation.

The following pipelines currently write to this ledger:

- the query pipeline (grounding / generation / critic / explainer);
- the ingest pipeline (describe / column-name proposer / relation proposer / rename detector).

Future scope gate: flyquery.billing:read. GET /api/v1/billing, described below, aggregates this raw stream; see billing.md.

The read-side query history, billing rollup, and stats endpoints below are served on top of the flyquery_queries, flyquery_cost_events, and workspace inventory tables. See src/flyquery/web/controllers/queries_controller.py, billing_controller.py, stats_controller.py.

`GET /api/v1/queries`

Paginated query history scoped to the caller's workspace, newest first.

Query params — dataset_id, execution_status, semantic_path_taken, date_from (inclusive), date_to (exclusive), limit (clamped to [1, 200]), offset.

Each row is the compact QueryHistoryItem shape — heavy JSONB columns (candidates, clarification, PII findings) are omitted; use the detail endpoint below for the full payload. Response is the standard Paginated[QueryHistoryItem] envelope.

Source: queries_controller.py:52 (list_queries).
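The list URL can be built with the Python standard library. The sketch below mirrors the server's clamp of limit to [1, 200] on the client, so the request you log matches what the server applies.

Two choices in the sketch are assumptions: serialising dates with isoformat(), and the client-side default of 50. Check openapi.json for the accepted date format.

```python
from datetime import datetime
from typing import Dict, Optional
from urllib.parse import urlencode


def queries_url(base: str, *, limit: int = 50, offset: int = 0,
                date_from: Optional[datetime] = None,
                date_to: Optional[datetime] = None,
                **filters: str) -> str:
    """Build a GET /api/v1/queries URL (hypothetical client helper).

    `filters` carries dataset_id, execution_status, or semantic_path_taken.
    The default limit of 50 is an arbitrary client-side choice.
    """
    params: Dict[str, str] = {
        "limit": str(max(1, min(limit, 200))),  # same clamp the server applies
        "offset": str(offset),
    }
    if date_from is not None:
        params["date_from"] = date_from.isoformat()  # inclusive bound
    if date_to is not None:
        params["date_to"] = date_to.isoformat()  # exclusive bound
    params.update(filters)
    return f"{base}/api/v1/queries?{urlencode(params)}"
```

The half-open [date_from, date_to) window composes cleanly. For example, date_from=2026-05-01 with date_to=2026-06-01 covers exactly May, with no double-counting at month boundaries.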

`GET /api/v1/queries/{id}`

Full single-query payload including every candidate proposal, the AST classification, every model identifier used (model_grounding / model_generation / model_critic / model_explainer), PII findings, clarification frame, retries, and the final error envelope if any. Returns QueryDetailRead.

Cross-tenant probing returns 404 (the same response as a missing query), so the endpoint never leaks the existence of another tenant's row.

Source: queries_controller.py:100 (get_query).

`GET /api/v1/queries/{id}/result`

Re-download the preview + a fresh presigned Parquet URL for a previously executed query.

{
  "query_id": "01906f40-...",
  "preview_json": [...],
  "parquet_presigned_url": "https://...",
  "result_byte_size": 204800,
  "ttl_expires_at": "2026-05-24T12:00:00Z"
}

parquet_presigned_url is null once ttl_expires_at has elapsed (default 24 h, controlled by FLYQUERY_RESULT_TTL_HOURS); the consumer must rerun the query to materialise a fresh URL. A presign-time failure (object reclaimed, transient backend outage) also collapses the URL to null, while the inline preview is still returned.

Source: queries_controller.py:124 (get_query_result).

`GET /api/v1/billing`

Aggregates flyquery_cost_events into day / week / month buckets scoped to the caller's workspace. Consume GET /api/v1/cost-events for the raw per-call ledger; this endpoint just rolls it up.

Query params: period = day (default) | week | month, date_from (inclusive), date_to (exclusive).

Returns BillingRollup with the split ingest_cost_cents / query_cost_cents / other_cost_cents per bucket plus the total_cost_cents sum. Buckets with zero cost are omitted (no empty days). An invalid period returns 400 invalid_request.

{
  "period": "day",
  "date_from": "2026-05-01T00:00:00Z",
  "date_to": "2026-05-24T00:00:00Z",
  "total_cost_cents": 14230,
  "breakdown": [
    {
      "date": "2026-05-23T00:00:00Z",
      "ingest_cost_cents": 200,
      "query_cost_cents": 14030,
      "other_cost_cents": 0,
      "total_cost_cents": 14230
    }
  ]
}

See billing.md and cost-tracking.md for the cost model and which call sites currently write to the ledger. Scope: flyquery.billing:read.

Source: billing_controller.py:32 (rollup).
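The example response can be checked arithmetically. The single bucket's parts sum to its total (200 + 14030 + 0 = 14230 cents, i.e. $142.30), which also equals the top-level total_cost_cents.

The sketch below applies the same check to any rollup. It assumes the top-level total is the sum of the returned buckets, which is consistent with zero-cost buckets being omitted. The function names are hypothetical.

```python
from typing import Any, Dict


def check_rollup(rollup: Dict[str, Any]) -> int:
    """Validate a BillingRollup's arithmetic; return the total in cents."""
    grand = 0
    for b in rollup["breakdown"]:
        parts = b["ingest_cost_cents"] + b["query_cost_cents"] + b["other_cost_cents"]
        if parts != b["total_cost_cents"]:
            raise ValueError(f"bucket {b['date']}: parts sum to {parts}, not {b['total_cost_cents']}")
        grand += parts
    if grand != rollup["total_cost_cents"]:
        raise ValueError(f"buckets sum to {grand}, not {rollup['total_cost_cents']}")
    return grand


def format_cents(cents: int) -> str:
    """Non-negative integer cents to dollars without floating point."""
    return f"${cents // 100}.{cents % 100:02d}"
```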

`GET /api/v1/stats`

Compact workspace summary, returned as WorkspaceStats. Six fields:

{
  "storage_used_bytes": 524288000,
  "dataset_count": 5,
  "table_count": 23,
  "query_count_last_30d": 312,
  "token_count_last_30d": 412380,
  "ingest_job_count_pending": 0
}

Intended for operator traffic, not the hot path: each request runs a small set of COUNT(*) queries on demand, without caching. See stats.md for field semantics. Scope: flyquery.billing:read (reused).

Source: stats_controller.py:32 (workspace_summary).

`GET /api/v1/version`

Returns {version, git_sha, build_time}.

Health probes — `GET /actuator/health` and siblings

Pyfly's actuator is enabled and exposes Spring-Boot-style health probes. Use these instead of ad-hoc /healthz / /readyz paths:

Path	Purpose
`GET /actuator/health`	Aggregate of every registered `HealthIndicator`. Returns `{"status": "UP"}` or `{"status": "DOWN"}`.
`GET /actuator/health/liveness`	Kubernetes liveness probe (process alive?).
`GET /actuator/health/readiness`	Kubernetes readiness probe (deps reachable?).
`GET /actuator/info`	`{"app": {"name", "version", "description"}}` from `pyfly.yaml`.
`GET /actuator/metrics`	Prometheus-compatible counters.
`GET /actuator/env`	Runtime configuration view (gate behind auth in production).
`GET /admin/*`	Full admin UI surface for development.

These are not under /api/v1/* (deliberate — they're operator endpoints, not tenant-scoped business endpoints) and therefore don't require X-Tenant-Id / X-Workspace-Id.

5.13 Agent tokens

Operator-only. Creates long-lived bearer tokens for agent-tier access.

`POST /api/v1/agent-tokens`

Request

{
  "name": "etl-pipeline",
  "scopes": ["flyquery.files:upload", "flyquery.ingest:read", "flyquery.query:read"],
  "workspace_allowlist": ["analytics"],
  "dataset_allowlist": null,
  "expires_at": null,
  "rate_limit_rpm": 60
}

Response 201 Created

{
  "token_id": "01906f50-...",
  "token": "agt_a1b2c3d4_<32hex>",
  "prefix": "agt_a1b2c3d4",
  "name": "etl-pipeline",
  "scopes": ["flyquery.files:upload", "flyquery.ingest:read", "flyquery.query:read"],
  "created_at": "2026-05-23T12:00:00Z"
}

The full token is returned only once. The server stores the SHA-256 hash. The prefix is used for listing and revocation.
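Listing and revocation use the prefix, so tooling may want to derive it from a full token and log a digest rather than the secret. The sketch below works under two assumptions:

- the 8-character segment after agt_ is lowercase hex, as in the example;
- token_fingerprint hashes the whole token string, which may not match exactly what the server hashes.

```python
import hashlib
import re

# ASSUMPTION: the segment after "agt_" is 8 lowercase hex characters, as in the
# example; the reference only states that the secret part is 32 hex characters.
_TOKEN_RE = re.compile(r"(agt_[0-9a-f]{8})_([0-9a-f]{32})")


def token_prefix(token: str) -> str:
    """Return the listing/revocation prefix, e.g. "agt_a1b2c3d4"."""
    match = _TOKEN_RE.fullmatch(token)
    if match is None:
        raise ValueError("not an agent token")
    return match.group(1)


def token_fingerprint(token: str) -> str:
    """SHA-256 hex digest of the full token, safe to log instead of the token."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()
```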

`GET /api/v1/agent-tokens`

List tokens (prefix + metadata only; no hashes).

`DELETE /api/v1/agent-tokens/{id}`

Revoke immediately.

6. Agent-tier endpoints

All agent-tier endpoints mirror their user-tier counterparts under /api/v1/agent/. They accept X-Agent-Token instead of JWT. Writes require Idempotency-Key.

The token's effective grant is the intersection of:

Token scopes (scopes_json)
dataset_allowlist_json (if set)
workspace_allowlist_json (if set)
AST classification of generated SQL
Target table kind (UPLOADED rejects all writes; DERIVED accepts writes only under flyquery.derived:write)
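The intersection can be sketched as a chain of checks, any of which denies the request. This illustrates the rules listed above; it is not the server's implementation.

Three simplifications apply:

- the AgentToken type is hypothetical;
- the AST classification is reduced to an is_write flag;
- scope matching is exact, apart from the * operator wildcard.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class AgentToken:
    """Hypothetical view of an agent token's grant-relevant fields."""
    scopes: FrozenSet[str]
    workspace_allowlist: Optional[FrozenSet[str]] = None  # None means "not set"
    dataset_allowlist: Optional[FrozenSet[str]] = None


def _holds(token: AgentToken, scope: str) -> bool:
    return scope in token.scopes or "*" in token.scopes


def allowed(token: AgentToken, required_scope: str, workspace: str, dataset: str,
            *, table_kind: Optional[str] = None, is_write: bool = False) -> bool:
    """Every check must pass; the first failing one denies the request."""
    if not _holds(token, required_scope):
        return False
    if token.workspace_allowlist is not None and workspace not in token.workspace_allowlist:
        return False
    if token.dataset_allowlist is not None and dataset not in token.dataset_allowlist:
        return False
    if is_write:  # stands in for "the AST classifies the SQL as a write"
        # UPLOADED tables reject all writes; DERIVED ones need flyquery.derived:write.
        return table_kind == "DERIVED" and _holds(token, "flyquery.derived:write")
    return True
```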

Agent-tier endpoint index

Method	Path	Required scope
`POST`	`/agent/datasets/{id}/files`	`flyquery.files:upload`
`PUT`	`/agent/datasets/{ds}/tables/{id}:upload`	`flyquery.files:upload`
`POST`	`/agent/tables:derive`	`flyquery.derived:write`
`GET`	`/agent/tables/{id}`	`flyquery.schema:read`
`GET`	`/agent/tables/{id}/snapshots`	`flyquery.schema:read`
`GET`	`/agent/tables/{id}/changes`	`flyquery.schema:read`
`PUT`	`/agent/schema-objects/{id}`	`flyquery.schema:annotate`
`GET`	`/agent/schema-objects/{id}/lineage`	`flyquery.lineage:read`
`POST`	`/agent/ingest-jobs`	`flyquery.ingest:run`
`GET`	`/agent/ingest-jobs/{id}`	`flyquery.ingest:read`
`GET`	`/agent/ingest-jobs/{id}/stream`	`flyquery.ingest:read`
`POST`	`/agent/semantic/metrics`	`flyquery.semantic:author`
`POST`	`/agent/examples`	`flyquery.examples:author`
`GET`	`/agent/examples`	`flyquery.examples:read`
`POST`	`/agent/query`	`flyquery.query:read`
`POST`	`/agent/query/stream`	`flyquery.query:read`
`POST`	`/agent/query:explain`	`flyquery.query:read`
`POST`	`/agent/query:validate`	`flyquery.query:read`
`POST`	`/agent/sql:execute`	`flyquery.sql:execute` + `workspace.allow_direct_sql=true`
`POST`	`/agent/sql:execute/stream`	`flyquery.sql:execute`

All agent-tier endpoints also accept the same X-Tenant-Id, X-Workspace-Id, and X-Correlation-Id headers.

7. Scope catalog

23 scopes. A token carries a subset; the effective grant is the intersection of the token's scopes with its workspace/dataset allowlists.

Scope	Grants access to
`flyquery.datasets:read`	`GET /datasets`, `GET /datasets/{id}`
`flyquery.datasets:write`	`POST /datasets`, `PUT /datasets/{id}`, `DELETE /datasets/{id}`
`flyquery.files:upload`	`POST /datasets/{id}/files`, `PUT /datasets/{ds}/tables/{id}:upload`
`flyquery.files:read`	`GET /files/{id}`, `GET /datasets/{id}/files`
`flyquery.schema:read`	`GET /tables/*`, `GET /schema-objects:search`, samples + profiles
`flyquery.schema:annotate`	`PUT /schema-objects/{id}`, type-hint approval, rename-merge
`flyquery.relations:read`	`GET /datasets/{id}/relations`
`flyquery.relations:write`	`POST /datasets/{id}/relations`, approve/reject relation
`flyquery.semantic:read`	`GET /semantic/metrics`, `GET /semantic/dimensions`, `GET /glossary`
`flyquery.semantic:author`	Create/edit metrics, dimensions, glossary; publish/retire
`flyquery.examples:read`	`GET /examples` (APPROVED only)
`flyquery.examples:author`	`POST /examples`, approve/reject examples
`flyquery.query:read`	`POST /query`, `/query/stream`, `/query:explain`, `/query:validate`
`flyquery.derived:write`	`POST /tables:derive` + DML on DERIVED tables
`flyquery.sql:execute`	`POST /sql:execute` (requires `workspace.allow_direct_sql=true`)
`flyquery.conversations:*`	All conversation endpoints
`flyquery.ingest:read`	`GET /ingest-jobs`, `GET /ingest-jobs/{id}`, stream
`flyquery.ingest:run`	`POST /ingest-jobs`, `POST /ingest-jobs/{id}:cancel`
`flyquery.lineage:read`	`GET /schema-objects/{id}/lineage`
`flyquery.audit:read`	`GET /audit-events`
`flyquery.billing:read`	`GET /billing`, `GET /stats`
`flyquery.agent_tokens:write`	`POST /agent-tokens`, `DELETE /agent-tokens/{id}`
`*`	Operator wildcard; grants everything. Never set on externally-minted tokens
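The catalog contains two wildcard forms: the operator wildcard * and the resource wildcard flyquery.conversations:*. The sketch below matches a held scope against a required one. It assumes resource:* covers every action under that resource, which the catalog implies but does not spell out.

```python
from typing import Iterable


def scope_grants(held: str, required: str) -> bool:
    """True if one held scope covers the required scope."""
    if held == "*" or held == required:  # operator wildcard or exact match
        return True
    if held.endswith(":*"):
        # "flyquery.conversations:*" covers "flyquery.conversations:<anything>";
        # keeping the colon in the prefix stops it matching other resources.
        return required.startswith(held[:-1])
    return False


def has_scope(token_scopes: Iterable[str], required: str) -> bool:
    return any(scope_grants(s, required) for s in token_scopes)
```

For example, under this assumption has_scope(["flyquery.conversations:*"], "flyquery.conversations:read") is True. The action name read is hypothetical; the catalog does not list individual conversation actions.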

8. SSE event reference

Query stream (`POST /query/stream`, `POST /agent/query/stream`)

Event name	Payload fields
`schema_linked`	`grounded_context_summary`, `semantic_path`, `missing_info`, `candidate_table_count`
`clarification`	`questions: [str]`, `reasons: [str]` — emitted alongside the answer, non-blocking
`sql_generated`	`candidate_count`, `chosen_index`, `candidate_summaries`
`executed`	`row_count`, `elapsed_ms`, `retried_after_error`, `snapshot_pins: {table_id: snapshot_id}`
`explained`	`summary`, `chart_hint`
`final`	Full `AnswerResponse` JSON including presigned URL
`error`	RFC 7807 envelope
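Any SSE client can consume these streams. For illustration, the sketch below parses a text/event-stream body with the Python standard library, following the SSE framing rules:

- event: names the event;
- data: lines are joined with newlines;
- a blank line dispatches the event;
- lines starting with : are comments.

It then drains a query stream until final or error. It assumes every data payload is JSON, as the table suggests. collect_answer is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, payload) pairs from a text/event-stream body."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # blank line: dispatch the buffered event, if any
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
            continue
        if line.startswith(":"):  # comment / keep-alive
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        # `id:` and `retry:` fields are ignored in this sketch.


def collect_answer(lines: Iterable[str]) -> dict:
    """Drain a query stream; return the final payload plus any clarifications."""
    clarifications: List[str] = []
    for name, payload in parse_sse(lines):
        if name == "clarification":  # non-blocking: keep reading
            clarifications.extend(payload.get("questions", []))
        elif name == "error":
            raise RuntimeError(f"{payload.get('code')}: {payload.get('detail')}")
        elif name == "final":
            return {"answer": payload, "clarifications": clarifications}
    raise RuntimeError("stream ended before a final event")
```

With urllib, the lines can come from, for example, io.TextIOWrapper(response, encoding="utf-8") wrapped around the response to POST /api/v1/query/stream.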

Ingest stream (`GET /ingest-jobs/{id}/stream`)

Event name	Payload fields
`received`	`file_id`, `file_format`, `compression`, `size_bytes`
`parsed`	`tables_extracted: [{table_id, name, n_columns, n_rows_estimate}]`
`reconciled`	`table_id`, `n_added`, `n_removed`, `n_type_changed`, `n_renamed_candidate`
`sampled`	`table_id`, `columns_sampled`, `columns_skipped`
`profiled`	`table_id`, `tables_profiled`, `tables_skipped`
`relations_proposed`	`heuristic`, `agent_proposed`, `dataset_total_relations`
`described`	`table_id`, `objects_described`, `budget_remaining_cents`
`pii_tagged`	`table_id`, `tags_set`, `redactions`, `late_redactions`
`embedded`	`table_id`, `embeddings_written`
`snapshot_ready`	`table_id`, `snapshot_id`, `snapshot_hash`, `parquet_byte_size`
`final`	Full `IngestJobResult` JSON
`error`	RFC 7807 envelope

SQL execute stream (`POST /sql:execute/stream`)

Event name	Payload fields
`ast_classified`	`classification`, `single_statement`, `table_refs: []`
`executed`	`row_count`, `elapsed_ms`, `snapshot_pins`
`final`	Full `SqlExecuteResponse` JSON
`error`	RFC 7807 envelope


`DELETE /api/v1/tables/{id}`

`POST /api/v1/tables:derive`

`GET /api/v1/tables/{id}/snapshots`

`GET /api/v1/tables/{id}/snapshots/{snap_id}`

`GET /api/v1/tables/{id}/changes`

`POST /api/v1/ingest-jobs`

`GET /api/v1/ingest-jobs`

`GET /api/v1/ingest-jobs/{id}`

`GET /api/v1/ingest-jobs/{id}/stream`

`GET /api/v1/ingest-jobs/{id}/events`

`POST /api/v1/ingest-jobs/{id}:cancel`

`PUT /api/v1/schema-objects/{id}`

`POST /api/v1/schema-objects/{id}:sample`

`POST /api/v1/schema-objects/{id}:profile`

`POST /api/v1/schema-objects/{id}/type-hint`

`GET /api/v1/schema-objects:search`

`GET /api/v1/schema-objects/{id}/lineage`

`POST /api/v1/schema-objects:rename-merge`

`POST /api/v1/schema-changes/{id}:confirm`

`GET /api/v1/datasets/{id}/relations`

`POST /api/v1/datasets/{id}/relations`

`POST /api/v1/datasets/{id}/relations/{rel_id}:approve`

`POST /api/v1/datasets/{id}/relations/{rel_id}:reject`

`DELETE /api/v1/datasets/{id}/relations/{rel_id}`

`POST /api/v1/semantic/metrics`

`GET /api/v1/semantic/metrics`

`GET /api/v1/semantic/metrics/{id}`

`PUT /api/v1/semantic/metrics/{id}`

`POST /api/v1/semantic/metrics/{id}:publish`

`POST /api/v1/semantic/metrics/{id}:retire`

`GET /api/v1/semantic/metrics/{id}/history`

`POST /api/v1/glossary`

`GET /api/v1/glossary`

`PUT /api/v1/glossary/{id}`

`DELETE /api/v1/glossary/{id}`

`POST /api/v1/examples`

`GET /api/v1/examples`

`POST /api/v1/examples/{id}:approve`

`POST /api/v1/examples/{id}:reject`

`POST /api/v1/query`

`POST /api/v1/query/stream`

`POST /api/v1/query:explain`

`POST /api/v1/query:validate`

`POST /api/v1/query:batch` — N questions in one round-trip

`POST /api/v1/sql:execute`

`POST /api/v1/sql:execute/stream`

`POST /api/v1/conversations`

`GET /api/v1/conversations`

`GET /api/v1/conversations/{id}`

`POST /api/v1/conversations/{id}/turn`

`GET /api/v1/audit-events`

`GET /api/v1/cost-events`

`GET /api/v1/queries`

`GET /api/v1/queries/{id}`

`GET /api/v1/queries/{id}/result`

`GET /api/v1/billing`

`GET /api/v1/stats`

`GET /api/v1/version`

Health probes — `GET /actuator/health` and siblings

`POST /api/v1/agent-tokens`