Merged
38 changes: 37 additions & 1 deletion CHANGELOG.md
@@ -5,6 +5,42 @@ follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and
the project uses [CalVer](https://calver.org/) (YY.MM.PP) per the
Firefly Framework convention (memory: `firefly_uses_calver`).

## [26.6.0] - 2026-06-01

### Fixed — Semantic layer brought up to its documented contract

- **The `SEMANTIC_LAYER` fast-path now actually executes.** A published metric's
compiled SQL is fetched (`SemanticRepository.get_by_name`, tenant+workspace
scoped), bound via `SemanticCompiler.bind`, and run straight through the AST
firewall + executor with **no GenerationAgent** — the metric name + version are
pinned in the query record. Previously the branch silently fell through to the
LLM because `semantic_repo` was never wired and `get_by_name` did not exist.
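The fast-path control flow described above can be sketched as follows. This is a minimal illustration, not the shipped code. The callables `get_by_name`, `bind`, `firewall`, and `execute` stand in for `SemanticRepository.get_by_name`, `SemanticCompiler.bind`, the AST firewall, and the executor. Their signatures here, along with the hypothetical `PublishedMetric` and `QueryRecord` shapes, are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class PublishedMetric:
    # Hypothetical shape of a published metric row.
    name: str
    version: int
    compiled_sql_template: str


@dataclass(frozen=True)
class QueryRecord:
    # Hypothetical query record: the metric name + version are pinned on it.
    sql: str
    metric_name: str
    metric_version: int


def semantic_fast_path(
    metric_name: str,
    tenant_id: str,
    workspace_id: str,
    get_by_name: Callable[[str, str, str], Optional[PublishedMetric]],
    bind: Callable[[str], str],
    firewall: Callable[[str], None],
    execute: Callable[[str], Any],
) -> Optional[tuple[QueryRecord, Any]]:
    """Run a published metric with no generation step; None means 'not handled here'."""
    # The lookup is scoped by tenant and workspace, never by name alone.
    metric = get_by_name(tenant_id, workspace_id, metric_name)
    if metric is None:
        return None  # the caller decides where to route instead
    sql = bind(metric.compiled_sql_template)  # fill the runtime slots
    firewall(sql)  # expected to raise on unsafe SQL before anything executes
    record = QueryRecord(sql=sql, metric_name=metric.name, metric_version=metric.version)
    return record, execute(sql)
```

Passing the collaborators in as callables keeps the sketch testable with in-memory fakes. The point it illustrates is that no LLM call appears anywhere on this path.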

### Added

- Nested MetricFlow YAML schema (`metric:` / `dimension:` roots) replacing the
flat schema; all four metric types (`SIMPLE`, `RATIO`, `DERIVED`, `CUMULATIVE`)
compile to DuckDB SQL templates with `{extra_filter_clause}` / `{group_by_append}`
runtime slots filled by `SemanticCompiler.bind`. `count_distinct` is supported.
- Publish-time **sqlglot firewall**: single-SELECT, no DDL/commands/multi-statement,
no subqueries, anonymous-function allowlist, identifier regex — invalid/unsafe
definitions return RFC 7807 `400 semantic_compile_error`.
- Real dimensions: own `categorical|time` validator + compiler (grain-aware
`DATE_TRUNC`); metric `group_by` resolves published dimension names.
- Agent-tier mirrors: `/api/v1/agent/semantic/metrics`, `/api/v1/agent/semantic/dimensions`,
`/api/v1/agent/glossary` (scopes `flyquery.semantic:author` / `:read`).
- `metadata_json` column on metrics + dimensions (migration `0014`); glossary
DTOs accept the documented `synonyms` / `related_metrics` keys; glossary
`related_metrics` surfaced to grounding for SEMANTIC_LAYER routing.
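The dimension compiler described above can be sketched as follows. A time dimension compiles to a grain-aware `DATE_TRUNC`, and a categorical dimension compiles to its expression unchanged. This is a minimal sketch, not the shipped validator. The `compile_dimension` helper and the set of allowed grains are hypothetical. The rule that a time dimension must name a grain is also an assumption.

```python
from __future__ import annotations

from typing import Optional

# Assumption: the real validator's grain list is not given here; these are
# date parts that DuckDB's date_trunc accepts.
ALLOWED_GRAINS = {"day", "week", "month", "quarter", "year"}


def compile_dimension(dimension_type: str, expr: str, grain: Optional[str] = None) -> str:
    """Return the SQL expression a dimension contributes to SELECT / GROUP BY."""
    if dimension_type == "categorical":
        # Categorical dimensions pass through untouched; grain only matters for time.
        return expr
    if dimension_type == "time":
        if grain not in ALLOWED_GRAINS:
            raise ValueError(f"unsupported time grain: {grain!r}")
        return f"DATE_TRUNC('{grain}', {expr})"
    # Mirrors the CHECK (dimension_type IN ('categorical','time')) constraint.
    raise ValueError(f"unknown dimension_type: {dimension_type!r}")
```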

### Changed

- Version rows persist `compiled_sql_template` (no longer NULL); `publish` records
it on the current version; `update` of a published metric recompiles + re-firewalls.
- Semantic repos/services are tenant+workspace scoped (defense-in-depth over RLS);
metrics/dimensions `list` gains a `status` filter; `SemanticVersionRead` exposes
doc-aligned `version_number` / `metric_id`.

## [26.5.14] - 2026-05-31

### Changed
@@ -909,7 +945,7 @@ Released by ancongui.
CRUD, agent-token mint/verify, ObjectStore port + LocalFs + S3
adapters, CI workflows.

## [Unreleased]
## [26.6.0] - 2026-06-01

### Added
- Foundation scaffold: pyproject + pyfly.yaml + Dockerfile + Taskfile
2 changes: 1 addition & 1 deletion README.md
@@ -25,7 +25,7 @@ JSONL, Parquet, Avro, ORC, Arrow and Feather, plus their `.gz`,
[![OpenAPI](https://img.shields.io/badge/api-openapi%203.1-green)](docs/api-reference.md)
[![DuckDB](https://img.shields.io/badge/query%20engine-duckdb-yellow)](docs/architecture.md)
[![pgvector](https://img.shields.io/badge/vector--store-pgvector-336791)](docs/architecture.md)
[![Version](https://img.shields.io/badge/version-26.5.14-green.svg)](#)
[![Version](https://img.shields.io/badge/version-26.6.0-green.svg)](#)
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)

</div>
4 changes: 2 additions & 2 deletions Taskfile.yml
@@ -115,7 +115,7 @@ tasks:
-g python
-o sdks/python
--package-name flyquery_sdk
--additional-properties=packageVersion=26.5.12,projectName=flyquery-sdk,library=asyncio
--additional-properties=packageVersion=26.6.0,projectName=flyquery-sdk,library=asyncio
--skip-validate-spec
- cd sdks/python && uv pip install --system --quiet . || true

@@ -140,5 +140,5 @@ tasks:
--invoker-package com.firefly.flyquery
--api-package com.firefly.flyquery.api
--model-package com.firefly.flyquery.model
--additional-properties=artifactId=flyquery-sdk,groupId=com.firefly,artifactVersion=26.5.12,library=webclient,dateLibrary=java8,java8=true,useJakartaEe=true
--additional-properties=artifactId=flyquery-sdk,groupId=com.firefly,artifactVersion=26.6.0,library=webclient,dateLibrary=java8,java8=true,useJakartaEe=true
--skip-validate-spec
48 changes: 27 additions & 21 deletions docs/semantic-layer.md
@@ -32,11 +32,13 @@ The semantic layer gives operators confidence that sensitive business
definitions — revenue, churn, ARR — produce consistent SQL regardless of
which phrasing a user employs.

**v0 ships**: `SIMPLE` metric type (aggregate + expr + filter + group_by).
`RATIO`, `DERIVED`, and `CUMULATIVE` types are accepted in YAML and stored,
but execution is deferred to v1+. Attempting to query a non-SIMPLE published
metric falls through to the `SYNTHESIS` path with the metric definition
passed as few-shot context.
All four metric types — `SIMPLE`, `RATIO`, `DERIVED`, and `CUMULATIVE` — are
validated, compiled to DuckDB SQL templates, and executable via the
`SEMANTIC_LAYER` path. Measure expressions are **qualified** (`table.column`);
the compiler derives the source table from that prefix. `group_by` entries may
name a published dimension, which the compiler resolves to that dimension's
compiled expression. Cross-dataset metrics with explicit join paths remain
planned but unshipped (single-dataset only).
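The `group_by` resolution described above might look like the sketch below. It is illustrative only. `resolve_group_by` and its inputs are hypothetical. It assumes published dimensions arrive as a name-to-compiled-expression mapping, and that any other entry must be a column of the metric's source table.

```python
from __future__ import annotations

from typing import AbstractSet, Mapping, Sequence


def resolve_group_by(
    group_by: Sequence[str],
    published_dimensions: Mapping[str, str],
    source_table: str,
    source_columns: AbstractSet[str],
) -> list[str]:
    """Map each group_by entry to the SQL expression it contributes."""
    resolved: list[str] = []
    for entry in group_by:
        if entry in published_dimensions:
            # A published dimension contributes its compiled expression.
            resolved.append(published_dimensions[entry])
        elif entry in source_columns:
            # Otherwise treat it as a plain column and qualify it with the source table.
            resolved.append(f"{source_table}.{entry}")
        else:
            raise ValueError(
                f"group_by entry {entry!r} is neither a published dimension "
                f"nor a column of {source_table}"
            )
    return resolved
```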

---

@@ -66,13 +68,13 @@ metric:
label: Total Revenue # human-readable display name
description: >
Sum of order_amount across completed orders, in USD cents.
type: simple # simple | ratio | derived | cumulative (v0: simple only)
type: simple # simple | ratio | derived | cumulative
type_params:
measure:
name: order_amount # column name in the source table
name: order_amount # measure name
agg: sum # sum | count | count_distinct | avg | min | max
expr: order_amount # optional: expression override
filter: "order_status = 'COMPLETED'" # optional: WHERE clause fragment
expr: orders.order_amount # qualified table.column the agg runs over
filter: "orders.order_status = 'COMPLETED'" # optional: WHERE fragment
group_by:
- region
- product_category
@@ -110,16 +112,17 @@ metric:
measure:
name: order_id
agg: count_distinct
filter: "order_status = 'COMPLETED'"
expr: orders.order_id
filter: "orders.order_status = 'COMPLETED'"
```

Compiled SQL template:
```sql
SELECT {group_by_cols}, COUNT(DISTINCT order_id) AS completed_order_count
FROM {table_ref}
WHERE order_status = 'COMPLETED'
{extra_filter}
GROUP BY {group_by_cols}
SELECT COUNT(DISTINCT orders.order_id) AS completed_order_count
FROM orders
WHERE orders.order_status = 'COMPLETED'
{extra_filter_clause}
GROUP BY {group_by_append}
```
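The two runtime slots in the updated template can be filled with plain string formatting, as sketched below. This is an illustration only; `bind_template` is a hypothetical stand-in for `SemanticCompiler.bind`. The sketch makes three assumptions:

- Extra filters arrive as SQL fragments that are `AND`-ed onto the metric's own `WHERE`.
- `{group_by_append}` receives a comma-separated column list.
- A dangling `GROUP BY` is dropped when no columns are requested.

The bound SQL would still pass through the firewall before execution.

```python
from __future__ import annotations

TEMPLATE = (
    "SELECT COUNT(DISTINCT orders.order_id) AS completed_order_count\n"
    "FROM orders\n"
    "WHERE orders.order_status = 'COMPLETED'\n"
    "{extra_filter_clause}\n"
    "GROUP BY {group_by_append}"
)


def bind_template(template: str, extra_filters: list[str], group_by: list[str]) -> str:
    """Fill {extra_filter_clause} and {group_by_append} in a compiled template."""
    # Parenthesise each filter so an OR inside it cannot escape the surrounding AND.
    extra = " ".join(f"AND ({f})" for f in extra_filters)
    sql = template.format(extra_filter_clause=extra, group_by_append=", ".join(group_by))
    # Drop an empty slot line and, when no columns were requested, the bare GROUP BY.
    kept = [line for line in sql.splitlines() if line.strip() not in ("", "GROUP BY")]
    return "\n".join(kept)
```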

### RATIO (v1+)
@@ -235,9 +238,8 @@ YAML changes.

### Compilation rules (SIMPLE type)

1. **Source table** — resolved to `{dataset}.{table_name}` where
`table_name` matches `measure.name`'s parent table in the dataset's
schema knowledge base.
1. **Source table** — taken from the qualified `measure.expr`
(`table.column`); the prefix before the dot is the `FROM` table.
2. **Aggregate expression** — `SUM(expr)`, `COUNT(DISTINCT expr)`, etc.
3. **Filter** — appended to the WHERE clause verbatim.
4. **Group-by columns** — validated to exist in the source table.
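Rules 1 to 3 can be sketched for the SIMPLE type as below. This is a sketch under stated assumptions, not the shipped compiler. The helper names are hypothetical, the identifier regex is a conservative stand-in (the real one is not shown), and rejecting anything but a single `table.column` is also an assumption. For the `completed_order_count` example, the output matches the SELECT/FROM/WHERE lines of the updated template.

```python
from __future__ import annotations

import re

# Assumption: stand-in for the firewall's identifier regex.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

# The aggregate keywords the YAML schema lists, mapped to SQL.
_AGG_SQL = {
    "sum": "SUM({})",
    "count": "COUNT({})",
    "count_distinct": "COUNT(DISTINCT {})",
    "avg": "AVG({})",
    "min": "MIN({})",
    "max": "MAX({})",
}


def source_table(measure_expr: str) -> str:
    """Rule 1: the FROM table is the prefix before the dot in table.column."""
    table, dot, column = measure_expr.partition(".")
    if not dot or not _IDENT.match(table) or not _IDENT.match(column):
        raise ValueError(f"measure.expr must be a qualified table.column, got {measure_expr!r}")
    return table


def compile_simple(metric_name: str, agg: str, measure_expr: str, filter_sql: str | None = None) -> str:
    if not _IDENT.match(metric_name):
        raise ValueError(f"invalid metric name: {metric_name!r}")
    if agg not in _AGG_SQL:
        raise ValueError(f"unsupported agg: {agg!r}")
    lines = [
        f"SELECT {_AGG_SQL[agg].format(measure_expr)} AS {metric_name}",  # rule 2
        f"FROM {source_table(measure_expr)}",  # rule 1
    ]
    if filter_sql:
        lines.append(f"WHERE {filter_sql}")  # rule 3: filter used verbatim
    return "\n".join(lines)
```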
@@ -353,17 +355,21 @@ context in the GenerationAgent prompt.
| `POST` | `/api/v1/semantic/metrics/{id}:retire` | PUBLISHED → RETIRED; removed from retrieval |
| `GET` | `/api/v1/semantic/metrics/{id}/history` | Immutable version log |
| `POST` | `/api/v1/semantic/dimensions` | Create dimension |
| `GET` | `/api/v1/semantic/dimensions` | List dimensions |
| `GET` | `/api/v1/semantic/dimensions` | List; optional `status`, `dataset_id` |
| `GET` | `/api/v1/semantic/dimensions/{id}` | Full detail |
| `PUT` | `/api/v1/semantic/dimensions/{id}` | Update |
| `POST` | `/api/v1/semantic/dimensions/{id}:publish` | DRAFT → PUBLISHED |
| `POST` | `/api/v1/semantic/dimensions/{id}:retire` | PUBLISHED → RETIRED |
| `GET` | `/api/v1/semantic/dimensions/{id}/history` | Version log |
| `POST` | `/api/v1/glossary` | Create glossary term |
| `GET` | `/api/v1/glossary` | List terms (workspace-scoped) |
| `PUT` | `/api/v1/glossary/{id}` | Update |
| `DELETE` | `/api/v1/glossary/{id}` | Remove |

Agent-tier mirrors exist under `/api/v1/agent/semantic/metrics` with scope
`flyquery.semantic:author`.
Agent-tier mirrors exist under `/api/v1/agent/semantic/metrics`,
`/api/v1/agent/semantic/dimensions`, and `/api/v1/agent/glossary`. Writes
require scope `flyquery.semantic:author`; reads require `flyquery.semantic:read`.
They require an `X-Agent-Token` header and delegate to the same services.
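The scope rule for the agent-tier mirrors can be sketched as a small guard. The sketch is illustrative only. It assumes safe HTTP methods count as reads and everything else as writes. It checks only that the `X-Agent-Token` header is present, with real token verification left out. Because the text does not say whether `:author` implies `:read`, the sketch treats the two scopes independently. `required_scope` and `authorize` are hypothetical names.

```python
from __future__ import annotations

from typing import AbstractSet, Mapping

AUTHOR_SCOPE = "flyquery.semantic:author"
READ_SCOPE = "flyquery.semantic:read"

# Assumption: safe HTTP methods are reads; POST/PUT/DELETE are writes.
_READ_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})


def required_scope(method: str) -> str:
    return READ_SCOPE if method.upper() in _READ_METHODS else AUTHOR_SCOPE


def authorize(method: str, headers: Mapping[str, str], granted_scopes: AbstractSet[str]) -> None:
    """Raise PermissionError unless an agent token is present and carries the needed scope."""
    # Headers are assumed already normalised to this exact casing.
    if not headers.get("X-Agent-Token"):
        raise PermissionError("missing X-Agent-Token header")
    scope = required_scope(method)
    if scope not in granted_scopes:
        raise PermissionError(f"token lacks scope {scope}")
```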

---

71 changes: 71 additions & 0 deletions migrations/versions/0014_semantic_metadata_and_dimension_type.py
@@ -0,0 +1,71 @@
# Copyright 2024-2026 Firefly Software Foundation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""semantic metadata_json + dimension_type

Adds the ``metadata_json`` column to semantic metrics + dimensions (stores
the metric/dimension ``meta`` block) and a first-class ``dimension_type``
(categorical|time) column to dimensions, so dimensions no longer borrow the
metrics-only ``metric_type`` enum. ``metric_type`` is left in place on the
dimensions table (with its DRAFT/SIMPLE defaults) for backward compatibility.

Revision ID: 0014_semantic_meta_dimtype
Revises: 0013_job_callbacks
"""
from __future__ import annotations

from alembic import op

revision = "0014_semantic_meta_dimtype"
down_revision = "0013_job_callbacks"


def upgrade() -> None:
    op.execute(
        "ALTER TABLE flyquery_semantic_metrics "
        "ADD COLUMN IF NOT EXISTS metadata_json JSONB NOT NULL DEFAULT '{}'::jsonb"
    )
    op.execute(
        "ALTER TABLE flyquery_semantic_dimensions "
        "ADD COLUMN IF NOT EXISTS metadata_json JSONB NOT NULL DEFAULT '{}'::jsonb"
    )
    op.execute(
        "ALTER TABLE flyquery_semantic_dimensions "
        "ADD COLUMN IF NOT EXISTS dimension_type TEXT NOT NULL DEFAULT 'categorical'"
    )
    op.execute(
        "ALTER TABLE flyquery_semantic_dimensions "
        "DROP CONSTRAINT IF EXISTS ck_flyquery_semantic_dimensions_dimtype"
    )
    op.execute(
        "ALTER TABLE flyquery_semantic_dimensions "
        "ADD CONSTRAINT ck_flyquery_semantic_dimensions_dimtype "
        "CHECK (dimension_type IN ('categorical','time'))"
    )


def downgrade() -> None:
    op.execute(
        "ALTER TABLE flyquery_semantic_dimensions "
        "DROP CONSTRAINT IF EXISTS ck_flyquery_semantic_dimensions_dimtype"
    )
    op.execute(
        "ALTER TABLE flyquery_semantic_dimensions DROP COLUMN IF EXISTS dimension_type"
    )
    op.execute(
        "ALTER TABLE flyquery_semantic_dimensions DROP COLUMN IF EXISTS metadata_json"
    )
    op.execute(
        "ALTER TABLE flyquery_semantic_metrics DROP COLUMN IF EXISTS metadata_json"
    )