flyquery — API Reference

Table of Contents

  1. Overview
  2. Authentication
  3. Common headers
  4. Error envelope
  5. User-tier endpoints
  6. Agent-tier endpoints
  7. Scope catalog
  8. SSE event reference

1. Overview

  • Base URL: https://api.flyservices.io (production) / http://localhost:8520 (dev)
  • Versioning: all endpoints live under /api/v1/
  • Format: JSON request and response bodies; Content-Type: application/json
  • Upload: multipart/form-data for file upload endpoints
  • Streaming: Server-Sent Events (SSE) on */stream endpoints

The full machine-readable spec is in openapi.json at the repository root. Use it as the source of truth for all field types, required/optional markers, and response schemas. This document covers representative examples and important behavioral notes.


2. Authentication

User-tier

JWT bearer token. Pass in the Authorization header:

Authorization: Bearer <jwt>

JWT must carry tenant_id and workspace_id claims, or the request must include X-Tenant-Id and X-Workspace-Id headers.

Agent-tier

Long-lived agent token. Pass in the X-Agent-Token header:

X-Agent-Token: agt_a1b2c3d4_<32hex>

Every write to an agent-tier endpoint must also include:

Idempotency-Key: <unique-string>

The server deduplicates by (tenant_id, workspace_id, key) for a default 24-hour window. Replaying the same key within the window returns the original response.
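As a minimal sketch of the header rules above, the helper below assembles the headers for an agent-tier write. The function name `agent_write_headers` is hypothetical and not part of any SDK; only the header names come from this document.

```python
import uuid
from typing import Optional


def agent_write_headers(
    agent_token: str,
    tenant: str,
    workspace: str,
    idempotency_key: Optional[str] = None,
) -> dict:
    """Build the headers for an agent-tier write request (hypothetical helper)."""
    return {
        "X-Agent-Token": agent_token,
        "X-Tenant-Id": tenant,
        "X-Workspace-Id": workspace,
        "Content-Type": "application/json",
        # Pass the same key again when retrying the same logical write, so the
        # server can replay the original response instead of repeating the work.
        "Idempotency-Key": idempotency_key or str(uuid.uuid4()),
    }
```

Generating the key once per logical operation (not once per HTTP attempt) is what makes retries safe inside the deduplication window.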


3. Common headers

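| Header | Required | Notes |
| --- | --- | --- |
| X-Tenant-Id | Yes | Tenant slug, e.g. acme. On user-tier requests a tenant_id JWT claim may be used instead (see §2) |
| X-Workspace-Id | Yes | Workspace slug, e.g. analytics. On user-tier requests a workspace_id JWT claim may be used instead (see §2) |
| X-Correlation-Id | No | Trace identifier; autogenerated if absent, echoed in response |
| Idempotency-Key | Writes only | Required on all agent-tier writes |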

4. Error envelope

All errors use RFC 7807 Problem Details:

{
  "type": "https://firefly.dev/problems/resource_not_found",
  "title": "Resource not found",
  "status": 404,
  "code": "resource_not_found",
  "detail": "Dataset 'abc' does not exist in workspace 'analytics'.",
  "errors": [
    {"path": "dataset_id", "message": "No dataset with this ID"}
  ]
}

Branch on code (stable) not title (human-readable). Common codes:

| HTTP | Code | Meaning |
| --- | --- | --- |
| 400 | validation_error | Request body failed validation |
| 401 | unauthorized | Missing or expired JWT/agent token |
| 403 | forbidden | Token lacks required scope |
| 404 | resource_not_found | Entity does not exist |
| 409 | conflict | Idempotency replay collision or unique constraint |
| 422 | unprocessable | Structurally valid but semantically rejected (e.g., direct SQL disabled) |
| 429 | rate_limit_exceeded | Agent token RPM limit exhausted |
| 500 | internal_error | Unexpected server error |
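To illustrate branching on code instead of title, here is a small sketch that wraps an error envelope in an exception. The class name `FlyqueryProblem` and the retry policy are assumptions made for illustration; the document does not say which codes are safe to retry.

```python
class FlyqueryProblem(Exception):
    """Hypothetical client-side wrapper around a Problem Details body."""

    # Assumption: only rate limiting and unexpected server errors are worth
    # retrying. This policy is illustrative, not documented behaviour.
    RETRYABLE_CODES = {"rate_limit_exceeded", "internal_error"}

    def __init__(self, body: dict):
        self.status = body.get("status")
        self.code = body.get("code", "internal_error")
        self.title = body.get("title", "")
        self.detail = body.get("detail", "")
        self.errors = body.get("errors", [])
        super().__init__(f"{self.status} {self.code}: {self.detail}")

    @property
    def retryable(self) -> bool:
        # Branch on the stable machine code, never on the human-readable title.
        return self.code in self.RETRYABLE_CODES

    def field_messages(self) -> dict:
        """Map each offending path to its message, e.g. for form validation."""
        return {e["path"]: e["message"] for e in self.errors}
```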

5. User-tier endpoints

5.1 Workspaces

POST /api/v1/workspaces

Create a workspace.

Scope: authenticated user.

Request

{
  "slug": "analytics",
  "name": "Analytics",
  "kms_key_uri": null,
  "retention_days": null,
  "allow_direct_sql": false,
  "default_locale": "en-US"
}

Response 201 Created

{
  "id": "01906f2a-...",
  "slug": "analytics",
  "name": "Analytics",
  "allow_direct_sql": false,
  "default_locale": "en-US",
  "storage_used_bytes": 0,
  "created_at": "2026-05-23T12:00:00Z"
}

GET /api/v1/workspaces

List workspaces for the tenant.

Response 200 OK — array of workspace summaries.


GET /api/v1/workspaces/{id}

Fetch one workspace.


PUT /api/v1/workspaces/{id}

Update kms_key_uri, retention_days, allow_direct_sql, default_locale.


DELETE /api/v1/workspaces/{id}:purge

Schedule hard deletion. A 30-day soft tombstone is retained before bytes are removed. In-flight ingestion and queries are cooperatively cancelled first.


5.2 Datasets

POST /api/v1/datasets

Request

{
  "name": "sales-2026",
  "description": "Annual sales export",
  "drift_policy": "AUTO",
  "ingest_policy_json": {}
}

Response 201 Created: DatasetResponse


GET /api/v1/datasets

List datasets. Optional query params: status (ACTIVE|ARCHIVED), page, page_size.


GET /api/v1/datasets/{id}


PUT /api/v1/datasets/{id}

Update drift_policy, ingest_policy_json, default_locale, description.


DELETE /api/v1/datasets/{id}

Soft-archive. Does not remove Parquet blobs; those are removed on workspace purge or when retention_days expires.


5.3 Files and tables

POST /api/v1/datasets/{id}/files — upload a file

Multipart upload. The server detects format from magic bytes + extension.

Request Content-Type: multipart/form-data

  • field file: the binary content
  • field original_filename: string (optional; defaults to part filename)
  • field table_extraction_rules_json: JSON string (optional)

Response 202 Accepted

{
  "file_id": "01906f2b-...",
  "ingest_job_id": "01906f2c-...",
  "tables": [
    {
      "table_id": "01906f2d-...",
      "qualified_name": "sales-2026.orders",
      "sheet_or_json_path": null,
      "n_rows_estimate": 15000
    }
  ]
}

An XLSX with 3 sheets returns 3 entries in tables. Poll or stream the ingest_job_id to track pipeline progress.


POST /api/v1/datasets/{id}/files:bulk — upload many files in one request

Multi-file multipart variant of the single upload. Every files part is processed through the same per-file pipeline (receive → parse → reconcile → sample → profile → describe → embed → publish) and the per-file calls run in parallel through asyncio.gather — ingesting N files takes roughly the wall-clock of one.

Per-file failures DO NOT abort the bulk. Each result carries status="OK" + file_id + tables, or status="FAILED" + error.

Request Content-Type: multipart/form-data

  • N files parts (each a binary file); a singleton file part is also accepted for client compatibility

Response 201 Created

{
  "results": [
    {"index": 0, "original_filename": "customers.csv", "status": "OK",
     "file_id": "...", "tables": [{"table_id":"...","name":"customers", ...}]},
    {"index": 1, "original_filename": "broken.csv", "status": "FAILED",
     "file_id": null, "tables": [], "error": "could not sniff dialect"}
  ],
  "total_files": 2,
  "succeeded": 1,
  "failed": 1
}
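Because per-file failures do not abort the bulk, a caller has to inspect every entry in results. The sketch below separates successes from failures and cross-checks the aggregate counts; the function names are hypothetical, and the field names are taken from the response example above.

```python
from typing import List, Tuple


def split_bulk_results(response: dict) -> Tuple[List[dict], List[dict]]:
    """Separate OK and FAILED entries of a bulk-upload response."""
    ok = [r for r in response["results"] if r["status"] == "OK"]
    failed = [r for r in response["results"] if r["status"] == "FAILED"]
    # Cross-check the aggregate counters against the per-file entries.
    if len(ok) != response["succeeded"] or len(failed) != response["failed"]:
        raise ValueError("aggregate counts do not match per-file results")
    return ok, failed


def failure_report(failed: List[dict]) -> List[str]:
    """One human-readable line per failed file."""
    return [f'{r["original_filename"]}: {r["error"]}' for r in failed]
```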

SDK helpers:

  • Python: FlyqueryClient.upload_bulk(dataset_id, paths) and FlyqueryClient.upload_directory(dataset_id, dir).
  • Java: WebClient against :bulk with a MultipartBodyBuilder — see sdks/java/examples/Demo.java.

POST /api/v1/datasets/{id}/files:async — async upload

Like POST /datasets/{id}/files, but the response is a 202 Accepted with a job id.

Stage 1 (receive: caps check, hash, format detect, store bytes, write flyquery_files row, track workspace storage) runs synchronously before responding, so file_id is final and the content hash provides natural per-file dedup.

Stages 2-10 of the pipeline are queued as a PARSE_AND_INGEST ingest job that the IngestWorker consumes. The worker honours an already_received flag on the job, so it does NOT re-run Stage 1; see async-ingest.md and workers.md for the worker side.

Request Content-Type: multipart/form-data — one file part.

Response 202 Accepted with Location: /api/v1/ingest-jobs/{job_id}

{
  "job_id": "01906f2c-...",
  "file_id": "01906f2b-...",
  "dataset_id": "01906f2a-...",
  "status": "PENDING"
}

Use in place of the synchronous endpoint when a single file exceeds the request-timeout budget of the deployment (typically anything above a few MB once cold-cache LLM describe calls land).

Source: files_controller.py:249 (upload_file_async).
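After an async upload, the caller typically polls GET /api/v1/ingest-jobs/{id} until the job settles. The loop below is a sketch under stated assumptions: this document only confirms the PENDING and CANCELLED status values, so the SUCCEEDED and FAILED names in TERMINAL_STATUSES are guesses, and fetch_job is a caller-supplied function (hypothetical) that performs the HTTP GET and returns the decoded job JSON.

```python
import time
from typing import Callable

# Assumption: only CANCELLED is confirmed by this document; the other two
# terminal status names are illustrative placeholders.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_job(
    fetch_job: Callable[[str], dict],
    job_id: str,
    poll_seconds: float = 2.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll a job until it reaches a terminal status or the timeout elapses."""
    deadline = clock() + timeout_seconds
    while True:
        job = fetch_job(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"ingest job {job_id} still {job['status']}")
        sleep(poll_seconds)
```

The sleep and clock parameters are injectable only so the loop can be exercised without real delays; the SSE stream endpoint avoids polling altogether.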


GET /api/v1/datasets/{id}/files

List files for a dataset.


GET /api/v1/files/{id}

Fetch file metadata (no binary download; original blob is retained internally).


DELETE /api/v1/files/{id}

Marks the file as DELETED. Does not purge the Parquet snapshot for tables already derived from this file.


GET /api/v1/datasets/{id}/tables

List tables. Optional: kind (UPLOADED|DERIVED), is_active.


GET /api/v1/tables/{id}

Response includes current_snapshot_id, n_rows_estimate, n_columns, description, kind, sheet_or_json_path.


PUT /api/v1/tables/{id}

Update name, description, locale_override.


PUT /api/v1/datasets/{ds}/tables/{id}:upload — re-upload

Re-upload a file into an existing table slot. Creates a new Snapshot with the same table_id. Annotations carry forward via (qualified_name, column_name). Ambiguous column renames are flagged as RENAMED_CANDIDATE.

Request: same multipart as POST /datasets/{id}/files.

Response 202 Accepted: {ingest_job_id, snapshot_id}


DELETE /api/v1/tables/{id}

Soft-delete. Sets is_active=false.


POST /api/v1/tables:derive

Materialise a new derived table from a SQL SELECT statement.

Scope: flyquery.derived:write

Request

{
  "name": "orders_summary",
  "dataset_id": "01906f2a-...",
  "sql": "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
}

Response 202 Accepted: {table_id, ingest_job_id}

Derived tables accept INSERT/UPDATE/DELETE under flyquery.derived:write scope. Ingested (UPLOADED) tables are always read-only.


GET /api/v1/tables/{id}/snapshots

List all snapshots for a table, most recent first.


GET /api/v1/tables/{id}/snapshots/{snap_id}

Fetch one snapshot: status, column count, row count, Parquet byte size.


GET /api/v1/tables/{id}/changes

Paginated diff log (flyquery_schema_changes). Shows ADDED, REMOVED, TYPE_CHANGED, RENAMED, RENAMED_CANDIDATE events across re-uploads.


5.4 Ingest jobs

POST /api/v1/ingest-jobs

Start an ingestion job explicitly.

Request

{
  "job_kind": "DESCRIBE_PASS",
  "dataset_id": "01906f2a-...",
  "table_id": "01906f2d-..."
}

job_kind options:

  • PARSE_AND_INGEST — full pipeline; used by file upload
  • REPARSE — re-run parse from the existing blob
  • SAMPLE_REFRESH — re-collect samples for changed columns
  • DESCRIBE_PASS — run DescribeAgent on objects still lacking descriptions
  • RELATION_PASS — re-run heuristic + RelationProposerAgent across the dataset

Response 202 Accepted: {ingest_job_id}


GET /api/v1/ingest-jobs

List jobs. Query params: status, job_kind, dataset_id, table_id, page, page_size.


GET /api/v1/ingest-jobs/{id}

Fetch full job detail: status, attempts, result_json, cost_cents.


GET /api/v1/ingest-jobs/{id}/stream

SSE stream of pipeline-stage events. See §8 SSE event reference.


GET /api/v1/ingest-jobs/{id}/events

Paginated ledger of all flyquery_ingest_events rows for this job.


POST /api/v1/ingest-jobs/{id}:cancel

Cooperative cancel. Sets status=CANCELLED and signals the worker to stop after the current stage completes.


5.5 Schema annotation

PUT /api/v1/schema-objects/{id}

Edit metadata on a TABLE or COLUMN schema object.

Request

{
  "description": "Gross revenue before tax, in USD cents",
  "pii_tag": "NONE",
  "business_owner": "finance-team",
  "synonyms_json": ["revenue", "total_revenue"],
  "governance_json": {}
}

Setting pii_tag to a non-NONE value on a column with existing samples triggers automatic sample scrubbing in the same transaction.


POST /api/v1/schema-objects/{id}:sample

Refresh sample_values_json for this column.


POST /api/v1/schema-objects/{id}:profile

Refresh profile_json (null_fraction, distinct_estimate, top_values, min/max).


POST /api/v1/schema-objects/{id}/type-hint

Apply an LLM-suggested derived virtual type to a column. Marks description_source=AGENT. This is the approval step for type hints proposed during ingestion.


GET /api/v1/schema-objects:search

Hybrid BM25 + vector search over the schema knowledge base.

Query params: q (required), dataset_id, table_id, kind (TABLE|COLUMN), top_k (default 10).

Response — ranked list of schema objects with score and snippet.


GET /api/v1/schema-objects/{id}/lineage

Upstream lineage (source file) and downstream provenance (derived tables).


POST /api/v1/schema-objects:rename-merge

Transplant annotations from one schema object to another, used when a column rename is confirmed out-of-band.

Request

{
  "from_id": "01906f30-...",
  "to_id": "01906f31-..."
}

POST /api/v1/schema-changes/{id}:confirm

Flip a RENAMED_CANDIDATE schema-change row to RENAMED. Triggers annotation transplant from the old column object to the new one.


5.6 Relations

GET /api/v1/datasets/{id}/relations

List proposed + approved + rejected join relations for a dataset. Optional: status (PROPOSED|APPROVED|REJECTED), kind.


POST /api/v1/datasets/{id}/relations

Add a manual join relation.

Scope: flyquery.relations:write

Request

{
  "from_table_id": "01906f2d-...",
  "from_column_name": "customer_id",
  "to_table_id": "01906f2e-...",
  "to_column_name": "id",
  "condition": null,
  "kind": "MANUAL"
}

POST /api/v1/datasets/{id}/relations/{rel_id}:approve

Promotes PROPOSED → APPROVED, kind=MANUAL. This relation now enters the retrieval pool for query grounding.


POST /api/v1/datasets/{id}/relations/{rel_id}:reject

Sets status=REJECTED. Rejected relations are excluded from retrieval.


DELETE /api/v1/datasets/{id}/relations/{rel_id}

Permanently removes a relation row.


5.7 Semantic layer

POST /api/v1/semantic/metrics

Scope: flyquery.semantic:author

Request

{
  "name": "total_revenue",
  "label": "Total Revenue",
  "description": "Sum of order_amount across all completed orders",
  "metric_type": "SIMPLE",
  "dataset_id": "01906f2a-...",
  "definition_yaml": "metric:\n  name: total_revenue\n  type: simple\n  type_params:\n    measure:\n      name: order_amount\n      agg: sum\n  filter: \"order_status = 'COMPLETED'\"\n"
}

The definition_yaml is validated against the MetricFlow schema. On success, compiled_sql_template is populated deterministically. See semantic-layer.md for the YAML schema and compilation rules.

Response 201 Created: SemanticMetricResponse


GET /api/v1/semantic/metrics

List metrics. Optional: dataset_id, status (DRAFT|PUBLISHED|RETIRED).


GET /api/v1/semantic/metrics/{id}


PUT /api/v1/semantic/metrics/{id}

Update definition_yaml, label, description. Recompiles on change.


POST /api/v1/semantic/metrics/{id}:publish

Moves status from DRAFT → PUBLISHED. Published metrics enter the query-grounding retrieval pool.


POST /api/v1/semantic/metrics/{id}:retire

Moves status to RETIRED. Retired metrics are excluded from retrieval.


GET /api/v1/semantic/metrics/{id}/history

Immutable version history from flyquery_semantic_versions.


The same CRUD + publish/retire/history endpoints exist for dimensions: /api/v1/semantic/dimensions.

POST /api/v1/glossary

Scope: flyquery.semantic:author

Request

{
  "term": "ARR",
  "definition": "Annual Recurring Revenue",
  "synonyms": ["annual recurring revenue"]
}

Glossary terms are workspace-scoped (not per-dataset) and enter the grounding retrieval pool for all datasets in the workspace.


GET /api/v1/glossary

PUT /api/v1/glossary/{id}

DELETE /api/v1/glossary/{id}


5.8 Examples (few-shot)

POST /api/v1/examples

Add a user-curated (question, SQL) pair to the example store.

Scope: flyquery.examples:author

Request

{
  "question": "What is the total revenue by region for Q1 2026?",
  "generated_sql": "SELECT region, SUM(amount) FROM orders WHERE order_date >= '2026-01-01' AND order_date < '2026-04-01' GROUP BY region",
  "dataset_id": "01906f2a-...",
  "quality": "APPROVED"
}

quality=APPROVED immediately enters the retrieval pool. quality=PROPOSED (default for agent-learned examples) waits for approval.


GET /api/v1/examples

List examples. Optional: source (USER_CURATED|AGENT_LEARNED), quality (PROPOSED|APPROVED|REJECTED), dataset_id.


POST /api/v1/examples/{id}:approve

Promotes PROPOSED → APPROVED. The example now enters the retrieval pool.


POST /api/v1/examples/{id}:reject


5.9 Query

POST /api/v1/query

Synchronous query. Drains the SSE stream internally; returns only after the full pipeline completes.

Scope: flyquery.query:read

Request

{
  "question": "Show top 10 customers by revenue last month",
  "dataset_id": "01906f2a-...",
  "conversation_id": null,
  "row_cap": 1000,
  "statement_timeout_ms": 30000
}

Response 200 OK

{
  "query_id": "01906f40-...",
  "executed_sql": "SELECT ...",
  "semantic_path_taken": "SYNTHESIS",
  "row_count": 10,
  "elapsed_ms": 2340,
  "clarification_emitted": false,
  "result_preview": [...],
  "result_parquet_url": "https://...",
  "result_url_expires_at": "2026-05-24T12:00:00Z",
  "explanation": "The query joins the orders and customers tables...",
  "chart_hint": "bar"
}

result_parquet_url is a presigned URL valid for FLYQUERY_RESULT_TTL_HOURS (default 24 h). Download the Parquet for the full result set.
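Since the presigned URL expires, a client may want to check it before starting a download. This is a minimal sketch; the helper name `result_url_is_usable` is hypothetical, and the two field names come from the response example above.

```python
from datetime import datetime, timezone
from typing import Optional


def result_url_is_usable(response: dict, now: Optional[datetime] = None) -> bool:
    """Return True if the response carries a presigned URL that has not expired."""
    url = response.get("result_parquet_url")
    expires_raw = response.get("result_url_expires_at")
    if not url or not expires_raw:
        return False
    # Older Python versions reject a trailing "Z" in fromisoformat, so the
    # UTC designator is rewritten as an explicit offset first.
    expires_at = datetime.fromisoformat(expires_raw.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return now < expires_at
```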


POST /api/v1/query/stream

SSE stream. Same request body as POST /query. Events arrive as stages complete. See §8 for event names and payloads.


POST /api/v1/query:explain

Generate SQL and an EXPLAIN plan without executing.

Response: {generated_sql, explain_plan, grounded_context_summary}


POST /api/v1/query:validate

Parse + classify generated SQL, run the AST firewall, return findings. Does not execute.


POST /api/v1/query:batch — N questions in one round-trip

Run many NL questions through the full pipeline in parallel on the server side. Per-question failures (one bad grounding, one DuckDB error) DO NOT abort the batch -- the response carries status="OK" or status="FAILED" + error per item, plus aggregate counts.

Use this for dashboards (one batch with N panel queries), comparison reports (same question against M datasets, by sending the same question with different dataset_id), or SDK callers that want to amortise auth + tenant context across many questions.

Request Content-Type: application/json

{
  "queries": [
    {"question": "Top 5 customers by revenue?", "dataset_id": "..."},
    {"question": "Refund rate this quarter?",   "dataset_id": "..."},
    {"question": "Total revenue?",              "dataset_id": "..."}
  ]
}

Response 200 OK

{
  "results": [
    {"index": 0, "status": "OK", "query_id": "...", "sql": "...",
     "execution_status": "OK", "preview": [...], "row_count": 5,
     "elapsed_ms": 4321, "chart_hint": "bar", "explanation": "..."},
    {"index": 1, "status": "FAILED", "error": "no dataset with id ..."},
    {"index": 2, "status": "OK", "query_id": "...", "sql": "...", ...}
  ],
  "total_queries": 3,
  "succeeded": 2,
  "failed": 1
}
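For the comparison-report use case (the same question against several datasets), the request and the per-item results can be tied together through index. The sketch below assumes that index is the position of the item in the request's queries array, which the examples suggest but do not state outright; both function names are hypothetical.

```python
from typing import Dict, List


def build_comparison_batch(question: str, dataset_ids: List[str]) -> dict:
    """Build a batch body that asks one question against several datasets."""
    return {
        "queries": [
            {"question": question, "dataset_id": dataset_id}
            for dataset_id in dataset_ids
        ]
    }


def results_by_dataset(request: dict, response: dict) -> Dict[str, dict]:
    """Key each result item by the dataset_id of the query it answers.

    Assumption: item["index"] refers to the position in request["queries"].
    """
    out = {}
    for item in response["results"]:
        dataset_id = request["queries"][item["index"]]["dataset_id"]
        out[dataset_id] = item
    return out
```

Failed items stay in the mapping with status="FAILED", so the caller can render a partial report instead of discarding the whole batch.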

SDK helpers:

  • Python: FlyqueryClient.ask_batch(dataset_id, questions).
  • Java: QueryApi.batch(new BatchQueryRequest().queries(...)).

For POST /api/v1/query:validate (described before the batch endpoint):

Request: {"sql": "SELECT ..."}

Response: {classification, is_valid, findings: [{severity, message}]}


5.10 Direct SQL

Available only when workspace.allow_direct_sql = true.

Scope: flyquery.sql:execute

POST /api/v1/sql:execute

Request

{
  "sql": "SELECT COUNT(*) FROM orders",
  "dataset_id": "01906f2a-...",
  "row_cap": 500
}

Response 200 OK — same shape as POST /query, without the NL explanation.


POST /api/v1/sql:execute/stream

SSE variant.


5.11 Conversations

POST /api/v1/conversations

Create a conversation session.

Request {"dataset_id": "01906f2a-...", "title": "Revenue analysis"}

Response 201 Created: {conversation_id, created_at}


GET /api/v1/conversations

List conversations. Optional: dataset_id.


GET /api/v1/conversations/{id}

Fetch conversation with all turns.


POST /api/v1/conversations/{id}/turn

Add a drill-down turn. The prior turn's executed_sql, table_qnames, and snapshot_pins are automatically carried into the Grounding agent as context.

Request {"question": "Now break that down by region"}

Response 200 OK — same shape as POST /query, with turn_index added.


5.12 History and ops

GET /api/v1/audit-events

Paginated audit-event ledger. Filters: event_type, actor, resource_kind, date_from, date_to, limit, offset. Response is the standard Paginated[AuditEventRead] envelope. Newest first.

Events are currently written by the following call sites (more will be added over time):

  • Dataset CRUD (dataset.created, dataset.updated, dataset.archived, dataset.purged)
  • Workspace CRUD (workspace.created, workspace.updated)
  • Agent-token mint / revoke (agent_token.minted, agent_token.revoked)

Future scope gate: flyquery.audit:read.


GET /api/v1/cost-events

Paginated per-call LLM cost ledger. Filters: actor, model, operation, date_from, date_to, limit, offset. Response is Paginated[CostEventRead]. One row per LLM-bearing operation.

Today's writers cover the query pipeline (grounding / generation / critic / explainer) and the ingest pipeline (describe / column-name proposer / relation proposer / rename detector).

Future scope gate: flyquery.billing:read. The GET /api/v1/billing rollup below is the aggregation on top of this raw stream — see billing.md.


The read-side query history, billing rollup, and stats endpoints below are served on top of the flyquery_queries, flyquery_cost_events, and workspace inventory tables. See src/flyquery/web/controllers/queries_controller.py, billing_controller.py, stats_controller.py.

GET /api/v1/queries

Paginated query history scoped to the caller's workspace, newest first.

Query params: dataset_id, execution_status, semantic_path_taken, date_from (inclusive), date_to (exclusive), limit (clamped to [1, 200]), offset.

Each row is the compact QueryHistoryItem shape — heavy JSONB columns (candidates, clarification, PII findings) are omitted; use the detail endpoint below for the full payload. Response is the standard Paginated[QueryHistoryItem] envelope.

Source: queries_controller.py:52 (list_queries).
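Walking the full history means paging with limit and offset. The generator below is a sketch only: the internal shape of the Paginated envelope is not shown in this document, so the "items" key is an assumption, and fetch_page is a caller-supplied function (hypothetical) that performs the GET and returns the decoded JSON.

```python
from typing import Callable, Iterator


def clamp_limit(limit: int) -> int:
    """Mirror the documented server-side clamp of limit to [1, 200]."""
    return max(1, min(200, limit))


def iter_query_history(
    fetch_page: Callable[..., dict], limit: int = 200
) -> Iterator[dict]:
    """Yield every history row, newest first, one page at a time."""
    limit = clamp_limit(limit)
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        items = page["items"]  # assumption: envelope field name
        yield from items
        # A short page means there is nothing left to fetch.
        if len(items) < limit:
            return
        offset += limit
```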


GET /api/v1/queries/{id}

Full single-query payload including every candidate proposal, the AST classification, every model identifier used (model_grounding / model_generation / model_critic / model_explainer), PII findings, clarification frame, retries, and the final error envelope if any. Returns QueryDetailRead.

Cross-tenant probing returns 404 (the same response as a missing query), so the endpoint never leaks the existence of another tenant's row.

Source: queries_controller.py:100 (get_query).


GET /api/v1/queries/{id}/result

Re-download the preview + a fresh presigned Parquet URL for a previously executed query.

{
  "query_id": "01906f40-...",
  "preview_json": [...],
  "parquet_presigned_url": "https://...",
  "result_byte_size": 204800,
  "ttl_expires_at": "2026-05-24T12:00:00Z"
}

parquet_presigned_url is null once ttl_expires_at has elapsed (default 24 h, controlled by FLYQUERY_RESULT_TTL_HOURS); the consumer must rerun the query to materialise a fresh URL. A presign-time failure (object reclaimed, transient backend outage) also collapses the URL to null while still returning the inline preview.

Source: queries_controller.py:124 (get_query_result).


GET /api/v1/billing

Aggregates flyquery_cost_events into day / week / month buckets scoped to the caller's workspace. Consume GET /api/v1/cost-events for the raw per-call ledger; this endpoint just rolls it up.

Query params: period = day (default) | week | month, date_from (inclusive), date_to (exclusive).

Returns BillingRollup with the split ingest_cost_cents / query_cost_cents / other_cost_cents per bucket plus the total_cost_cents sum. Buckets with zero cost are omitted (no empty days). An invalid period returns 400 invalid_request.

{
  "period": "day",
  "date_from": "2026-05-01T00:00:00Z",
  "date_to": "2026-05-24T00:00:00Z",
  "total_cost_cents": 14230,
  "breakdown": [
    {
      "date": "2026-05-23T00:00:00Z",
      "ingest_cost_cents": 200,
      "query_cost_cents": 14030,
      "other_cost_cents": 0,
      "total_cost_cents": 14230
    }
  ]
}
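A consumer can sanity-check a rollup before displaying it. The sketch below assumes that the three split fields of a bucket add up to that bucket's total_cost_cents, and that the top-level total_cost_cents is the sum over all returned buckets; both hold for the example above, but are otherwise assumptions. The function name `check_rollup` is hypothetical.

```python
def check_rollup(rollup: dict) -> int:
    """Validate a BillingRollup-shaped dict and return its total in cents."""
    bucket_sum = 0
    for bucket in rollup["breakdown"]:
        parts = (
            bucket["ingest_cost_cents"]
            + bucket["query_cost_cents"]
            + bucket["other_cost_cents"]
        )
        if parts != bucket["total_cost_cents"]:
            raise ValueError(f"bucket {bucket['date']} does not add up")
        bucket_sum += bucket["total_cost_cents"]
    # Zero-cost buckets are omitted by the server, so they cannot skew the sum.
    if bucket_sum != rollup["total_cost_cents"]:
        raise ValueError("bucket totals do not match total_cost_cents")
    return rollup["total_cost_cents"]
```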

See billing.md and cost-tracking.md for the cost model and which call sites currently write to the ledger. Scope: flyquery.billing:read.

Source: billing_controller.py:32 (rollup).


GET /api/v1/stats

Compact workspace summary, returned as WorkspaceStats. Six fields:

{
  "storage_used_bytes": 524288000,
  "dataset_count": 5,
  "table_count": 23,
  "query_count_last_30d": 312,
  "token_count_last_30d": 412380,
  "ingest_job_count_pending": 0
}

This endpoint serves operator traffic, not hot-path requests: it runs a small set of COUNT(*) queries on demand, without caching. See stats.md for field semantics. Scope: flyquery.billing:read (reused).

Source: stats_controller.py:32 (workspace_summary).


GET /api/v1/version

Returns {version, git_sha, build_time}.


Health probes — GET /actuator/health and siblings

Pyfly's actuator is enabled and exposes Spring-Boot-style health probes. Use these instead of ad-hoc /healthz / /readyz paths:

| Path | Purpose |
| --- | --- |
| GET /actuator/health | Aggregate of every registered HealthIndicator. Returns {"status": "UP"} or {"status": "DOWN"}. |
| GET /actuator/health/liveness | Kubernetes liveness probe (process alive?). |
| GET /actuator/health/readiness | Kubernetes readiness probe (deps reachable?). |
| GET /actuator/info | {"app": {"name", "version", "description"}} from pyfly.yaml. |
| GET /actuator/metrics | Prometheus-compatible counters. |
| GET /actuator/env | Runtime configuration view (gate behind auth in production). |
| GET /admin/* | Full admin UI surface for development. |

These are not under /api/v1/* (deliberate — they're operator endpoints, not tenant-scoped business endpoints) and therefore don't require X-Tenant-Id / X-Workspace-Id.


5.13 Agent tokens

Operator-only. Creates long-lived bearer tokens for agent-tier access.

POST /api/v1/agent-tokens

Request

{
  "name": "etl-pipeline",
  "scopes": ["flyquery.files:upload", "flyquery.ingest:read", "flyquery.query:read"],
  "workspace_allowlist": ["analytics"],
  "dataset_allowlist": null,
  "expires_at": null,
  "rate_limit_rpm": 60
}

Response 201 Created

{
  "token_id": "01906f50-...",
  "token": "agt_a1b2c3d4_<32hex>",
  "prefix": "agt_a1b2c3d4",
  "name": "etl-pipeline",
  "scopes": ["flyquery.files:upload", "flyquery.ingest:read", "flyquery.query:read"],
  "created_at": "2026-05-23T12:00:00Z"
}

The full token is returned only once. The server stores the SHA-256 hash. The prefix is used for listing and revocation.
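The sketch below shows how a token of the form agt_<prefix>_<32hex> can be split into its listing prefix and secret part, and how a SHA-256 digest is computed. Two things are assumptions: that the prefix is always 8 hex characters (inferred from the example), and that the digest covers the full token string; the document only says that the server stores the SHA-256 hash.

```python
import hashlib
import re
from typing import Tuple

# Assumption: "agt_" + 8 hex chars + "_" + 32 hex chars, as in the example.
TOKEN_PATTERN = re.compile(r"^(agt_[0-9a-f]{8})_([0-9a-f]{32})$")


def split_agent_token(token: str) -> Tuple[str, str]:
    """Return (prefix, secret) or raise ValueError for a malformed token."""
    match = TOKEN_PATTERN.match(token)
    if match is None:
        raise ValueError("not a well-formed agent token")
    return match.group(1), match.group(2)


def token_digest(token: str) -> str:
    """Hex SHA-256 of the full token string (assumed hashing input)."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()
```

Because only the digest is kept, a lost token cannot be recovered; it has to be revoked and minted again.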


GET /api/v1/agent-tokens

List tokens (prefix + metadata only; no hashes).


DELETE /api/v1/agent-tokens/{id}

Revoke immediately.


6. Agent-tier endpoints

All agent-tier endpoints mirror their user-tier counterparts under /api/v1/agent/. They accept X-Agent-Token instead of JWT. Writes require Idempotency-Key.

The token's effective grant is the intersection of:

  1. Token scopes (scopes_json)
  2. dataset_allowlist_json (if set)
  3. workspace_allowlist_json (if set)
  4. AST classification of generated SQL
  5. Target table kind (UPLOADED rejects all writes; DERIVED accepts writes only under flyquery.derived:write)
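The five factors above can be read as a chain of checks that must all pass. The following is a minimal sketch, not the server's implementation: the token dict keys, the classification labels in DML_CLASSES, and the function name are all assumptions.

```python
# Assumed AST classification labels for write statements.
DML_CLASSES = {"INSERT", "UPDATE", "DELETE"}


def is_allowed(
    token: dict,
    workspace: str,
    dataset_id: str,
    required_scope: str,
    statement_class: str,
    table_kind: str,
) -> bool:
    """Evaluate the intersection of scopes, allowlists, SQL class and table kind."""
    scopes = set(token.get("scopes", []))

    def has(scope: str) -> bool:
        # "*" is the operator wildcard and grants every scope.
        return "*" in scopes or scope in scopes

    if not has(required_scope):
        return False
    # An allowlist that is unset (None) does not restrict anything.
    for key, value in (
        ("workspace_allowlist", workspace),
        ("dataset_allowlist", dataset_id),
    ):
        allowlist = token.get(key)
        if allowlist is not None and value not in allowlist:
            return False
    if statement_class in DML_CLASSES:
        # UPLOADED tables reject all writes; DERIVED tables need derived:write.
        return table_kind == "DERIVED" and has("flyquery.derived:write")
    return True
```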

Agent-tier endpoint index

| Method | Path | Required scope |
| --- | --- | --- |
| POST | /agent/datasets/{id}/files | flyquery.files:upload |
| PUT | /agent/datasets/{ds}/tables/{id}:upload | flyquery.files:upload |
| POST | /agent/tables:derive | flyquery.derived:write |
| GET | /agent/tables/{id} | flyquery.schema:read |
| GET | /agent/tables/{id}/snapshots | flyquery.schema:read |
| GET | /agent/tables/{id}/changes | flyquery.schema:read |
| PUT | /agent/schema-objects/{id} | flyquery.schema:annotate |
| GET | /agent/schema-objects/{id}/lineage | flyquery.lineage:read |
| POST | /agent/ingest-jobs | flyquery.ingest:run |
| GET | /agent/ingest-jobs/{id} | flyquery.ingest:read |
| GET | /agent/ingest-jobs/{id}/stream | flyquery.ingest:read |
| POST | /agent/semantic/metrics | flyquery.semantic:author |
| POST | /agent/examples | flyquery.examples:author |
| GET | /agent/examples | flyquery.examples:read |
| POST | /agent/query | flyquery.query:read |
| POST | /agent/query/stream | flyquery.query:read |
| POST | /agent/query:explain | flyquery.query:read |
| POST | /agent/query:validate | flyquery.query:read |
| POST | /agent/sql:execute | flyquery.sql:execute + workspace.allow_direct_sql=true |
| POST | /agent/sql:execute/stream | flyquery.sql:execute |

All agent-tier endpoints also accept the same X-Tenant-Id, X-Workspace-Id, and X-Correlation-Id headers.


7. Scope catalog

23 scopes. A token carries a subset; the effective grant is the intersection of token scopes + workspace/dataset allowlists.

| Scope | Grants access to |
| --- | --- |
| flyquery.datasets:read | GET /datasets, GET /datasets/{id} |
| flyquery.datasets:write | POST /datasets, PUT /datasets/{id}, DELETE /datasets/{id} |
| flyquery.files:upload | POST /datasets/{id}/files, PUT /datasets/{ds}/tables/{id}:upload |
| flyquery.files:read | GET /files/{id}, GET /datasets/{id}/files |
| flyquery.schema:read | GET /tables/*, GET /schema-objects:search, samples + profiles |
| flyquery.schema:annotate | PUT /schema-objects/{id}, type-hint approval, rename-merge |
| flyquery.relations:read | GET /datasets/{id}/relations |
| flyquery.relations:write | POST /relations, approve/reject relation |
| flyquery.semantic:read | GET /semantic/metrics, GET /semantic/dimensions, GET /glossary |
| flyquery.semantic:author | Create/edit metrics, dimensions, glossary; publish/retire |
| flyquery.examples:read | GET /examples (APPROVED only) |
| flyquery.examples:author | POST /examples, approve/reject examples |
| flyquery.query:read | POST /query, /query/stream, /query:explain, /query:validate |
| flyquery.derived:write | POST /tables:derive + DML on DERIVED tables |
| flyquery.sql:execute | POST /sql:execute (requires workspace.allow_direct_sql=true) |
| flyquery.conversations:* | All conversation endpoints |
| flyquery.ingest:read | GET /ingest-jobs, GET /ingest-jobs/{id}, stream |
| flyquery.ingest:run | POST /ingest-jobs, POST /ingest-jobs/{id}:cancel |
| flyquery.lineage:read | GET /schema-objects/{id}/lineage |
| flyquery.audit:read | GET /audit-events |
| flyquery.billing:read | GET /billing, GET /stats |
| flyquery.agent_tokens:write | POST /agent-tokens, DELETE /agent-tokens/{id} |
| * | Operator wildcard; grants everything. Never set on externally-minted tokens |

8. SSE event reference

Query stream (POST /query/stream, POST /agent/query/stream)

| Event name | Payload fields |
| --- | --- |
| schema_linked | grounded_context_summary, semantic_path, missing_info, candidate_table_count |
| clarification | questions: [str], reasons: [str]; emitted alongside the answer, non-blocking |
| sql_generated | candidate_count, chosen_index, candidate_summaries |
| executed | row_count, elapsed_ms, retried_after_error, snapshot_pins: {table_id: snapshot_id} |
| explained | summary, chart_hint |
| final | Full AnswerResponse JSON including presigned URL |
| error | RFC 7807 envelope |

Ingest stream (GET /ingest-jobs/{id}/stream)

| Event name | Payload fields |
| --- | --- |
| received | file_id, file_format, compression, size_bytes |
| parsed | tables_extracted: [{table_id, name, n_columns, n_rows_estimate}] |
| reconciled | table_id, n_added, n_removed, n_type_changed, n_renamed_candidate |
| sampled | table_id, columns_sampled, columns_skipped |
| profiled | table_id, tables_profiled, tables_skipped |
| relations_proposed | heuristic, agent_proposed, dataset_total_relations |
| described | table_id, objects_described, budget_remaining_cents |
| pii_tagged | table_id, tags_set, redactions, late_redactions |
| embedded | table_id, embeddings_written |
| snapshot_ready | table_id, snapshot_id, snapshot_hash, parquet_byte_size |
| final | Full IngestJobResult JSON |
| error | RFC 7807 envelope |

SQL execute stream (POST /sql:execute/stream)

| Event name | Payload fields |
| --- | --- |
| ast_classified | classification, single_statement, table_refs: [] |
| executed | row_count, elapsed_ms, snapshot_pins |
| final | Full SqlExecuteResponse JSON |
| error | RFC 7807 envelope |
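The event tables above can be consumed with a small parser for the standard SSE wire format (event: and data: lines, with a blank line ending each event). This is a sketch under the assumption that every data payload is JSON, which the tables suggest for final and error but do not state for every event; the function names are hypothetical.

```python
import json
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, object]]:
    """Turn raw SSE lines into (event_name, decoded_payload) pairs."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one optional space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)


def final_payload(events: Iterable[Tuple[str, object]]) -> object:
    """Return the payload of the final event, or raise on an error event."""
    for name, payload in events:
        if name == "error":
            raise RuntimeError(f"stream failed: {payload}")
        if name == "final":
            return payload
    raise RuntimeError("stream ended without a final event")
```

Intermediate events such as executed or snapshot_ready can be handled in the same loop to drive a progress indicator.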