feat(admin-api): support inter-table dependencies in derived dataset validation #1912
Open
LNSD
reviewed
Mar 5, 2026
48380af to ded30db
bcc4bbd to b32ca21
Add self-qualified table references (`self.table_name`) enabling tables within a derived dataset to reference sibling tables. Includes topological ordering, cycle detection, and self-reference rejection.

- Add `DepAliasOrSelfRef` type for parsing `self.`-qualified refs
- Implement topological sort with `CyclicDepError` in `datasets-derived`
- Register sibling schemas progressively via `SelfSchemaProvider`
- Add `CYCLIC_DEPENDENCY`, `SELF_REFERENCING_TABLE`, `CATALOG_QUALIFIED_TABLE`, `INVALID_TABLE_NAME` error codes
- Add runtime inter-table dependency support in `worker-datasets-derived`

Signed-off-by: Mitchell Spencer <mitchellhspencer@gmail.com>
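The ordering and cycle-detection behaviour this commit describes can be illustrated with a minimal sketch based on Kahn's algorithm. This is not the code in `datasets-derived`: the signature, the use of `String` for table names, and the `unresolved` field on `CyclicDepError` are assumptions made for illustration, and the sketch assumes every referenced sibling is itself a key of the input map.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Hypothetical error shape; the real `CyclicDepError` may carry different data.
#[derive(Debug, PartialEq)]
pub struct CyclicDepError {
    /// Tables that could not be ordered because they sit on or behind a cycle.
    pub unresolved: Vec<String>,
}

/// Orders tables so that each one comes after the siblings it references.
/// `deps` maps a table to the sibling tables it reads via `self.<name>`.
pub fn topological_sort(
    deps: &BTreeMap<String, BTreeSet<String>>,
) -> Result<Vec<String>, CyclicDepError> {
    // Number of dependencies per table that have not been emitted yet.
    let mut pending: BTreeMap<&str, usize> = deps
        .iter()
        .map(|(table, ds)| (table.as_str(), ds.len()))
        .collect();

    // Reverse edges: dependency -> tables that reference it.
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (table, ds) in deps {
        for d in ds {
            dependents.entry(d.as_str()).or_default().push(table.as_str());
        }
    }

    // Start with the tables that have no sibling dependencies.
    let mut ready: VecDeque<&str> = pending
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(t, _)| *t)
        .collect();

    let mut order = Vec::with_capacity(deps.len());
    while let Some(table) = ready.pop_front() {
        order.push(table.to_string());
        for &dependent in dependents.get(table).into_iter().flatten() {
            let n = pending
                .get_mut(dependent)
                .expect("dependents are always keys of `deps`");
            *n -= 1;
            if *n == 0 {
                ready.push_back(dependent);
            }
        }
    }

    if order.len() == deps.len() {
        Ok(order)
    } else {
        // Whatever still has pending dependencies is part of, or blocked by, a cycle.
        let unresolved = pending
            .into_iter()
            .filter(|(_, n)| *n > 0)
            .map(|(t, _)| t.to_string())
            .collect();
        Err(CyclicDepError { unresolved })
    }
}
```

In this sketch a table that lists itself as a dependency would also surface as unresolved; the PR reports that case separately as `SELF_REFERENCING_TABLE`, so a real implementation would presumably check for it before sorting.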
b32ca21 to 1b2d9e7
Deduplicate inter-table dep logic that was copy-pasted between manifest validation and schema inference handlers.

- Add `resolve_inter_table_order` shared function in `common.rs`
- Add `InterTableDepError` enum with `error_code()` to preserve API error codes
- Replace inline dep extraction in both `common.rs` and `schema.rs` with shared function
- Consolidate 6 duplicated error variants into 3 defined once

Signed-off-by: Mitchell Spencer <mitchellhspencer@gmail.com>
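A shared error enum with an `error_code()` method, as this commit describes, might look roughly like the sketch below. The commit only says that three variants are defined once; which three, and what fields they carry, is not stated, so the variant names and fields here are assumptions. The code strings are the ones listed in the PR description.

```rust
use std::fmt;

/// Hypothetical shape of the shared error; variant names and fields are assumed.
#[derive(Debug, PartialEq)]
pub enum InterTableDepError {
    /// A table references itself via `self.<own_name>`.
    SelfReferencingTable { table: String },
    /// A `self.`-qualified reference targets a sibling that does not exist.
    SelfRefTableNotFound { table: String, target: String },
    /// The sibling references form a cycle.
    CyclicDependency { tables: Vec<String> },
}

impl InterTableDepError {
    /// Stable API error code, so every handler reports the same string.
    pub fn error_code(&self) -> &'static str {
        match self {
            Self::SelfReferencingTable { .. } => "SELF_REFERENCING_TABLE",
            Self::SelfRefTableNotFound { .. } => "SELF_REF_TABLE_NOT_FOUND",
            Self::CyclicDependency { .. } => "CYCLIC_DEPENDENCY",
        }
    }
}

impl fmt::Display for InterTableDepError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::SelfReferencingTable { table } => {
                write!(f, "table `{table}` references itself")
            }
            Self::SelfRefTableNotFound { table, target } => {
                write!(f, "table `{table}` references unknown sibling `self.{target}`")
            }
            Self::CyclicDependency { tables } => {
                write!(f, "cyclic dependency among tables: {}", tables.join(", "))
            }
        }
    }
}
```

Keeping the code mapping on the enum means both handlers can convert the error into a response without repeating the string literals.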
Replace generic iterator parameter with concrete `BTreeMap` reference for clarity and caller ergonomics.

- Change `impl IntoIterator<Item = (&TableName, &[TableReference<...>])>` to `&BTreeMap<TableName, Vec<TableReference<...>>>`
- Simplify `schema.rs` call site from `.iter().map()` chain to direct `&parsed_refs`
- Build explicit `table_refs_only` map in `common.rs` where tuple destructuring is needed

Signed-off-by: Mitchell Spencer <mitchellhspencer@gmail.com>
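The ergonomic difference between the two parameter styles can be seen in a small sketch. The type aliases are stand-ins for the real `TableName` and `TableReference<...>`, the function bodies (which only count references) are invented for illustration, and the second tuple element in `table_refs_only` is a hypothetical placeholder for whatever else `common.rs` stores alongside the references.

```rust
use std::collections::BTreeMap;

// Stand-in types; the real `TableName` and `TableReference<...>` are richer.
pub type TableName = String;
pub type TableReference = String;

/// Before: generic iterator parameter. Callers must adapt their map by hand.
pub fn count_refs_generic<'a>(
    tables: impl IntoIterator<Item = (&'a TableName, &'a [TableReference])>,
) -> usize {
    tables.into_iter().map(|(_, refs)| refs.len()).sum()
}

/// After: concrete map reference. Callers pass `&parsed_refs` directly.
pub fn count_refs_map(tables: &BTreeMap<TableName, Vec<TableReference>>) -> usize {
    tables.values().map(Vec::len).sum()
}

/// Where the caller's map holds tuples, an explicit refs-only map is built first.
/// The `Vec<String>` second element is a hypothetical placeholder.
pub fn table_refs_only(
    parsed: &BTreeMap<TableName, (Vec<TableReference>, Vec<String>)>,
) -> BTreeMap<TableName, Vec<TableReference>> {
    parsed
        .iter()
        .map(|(name, (refs, _other))| (name.clone(), refs.clone()))
        .collect()
}
```

With the generic form the call site reads `count_refs_generic(parsed_refs.iter().map(|(n, r)| (n, r.as_slice())))`, while the concrete form is just `count_refs_map(&parsed_refs)`. The cost is the extra allocation in `table_refs_only` for callers whose data is not already in that shape.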
Update generated spec to reflect renamed `resolve_inter_table_order` function in schema handler panics documentation.
Summary
Adds full inter-table dependency support for derived datasets, covering both validation (admin API) and runtime (dump engine). Tables within a derived dataset can now reference sibling tables using `self.<table_name>` syntax (e.g., `SELECT * FROM self.blocks_base`), consistent with the existing `self.` UDF convention.

Part 1: Validation (admin API)
- `SelfSchemaProvider.add_table()` for progressive schema registration during topological processing
- `topological_sort()`, returning `CYCLIC_DEPENDENCY` 400 error
- `SELF_REF_TABLE_NOT_FOUND` error when `self.` references target non-existent sibling tables
- `/schema` and `/manifests` endpoints

Part 2: Runtime (dump engine)
- `physical_for_dump::create()` into `resolve_external_deps()` + `build_catalog()` so callers can inject self-ref entries alongside external deps
- `partition_table_refs()` to separate `self.` refs from external deps
- `dataset_start_block` fallback (validation now guarantees tables have at least one reference via `NO_TABLE_REFERENCES` check from fix(admin-api): reject derived tables with no source table references #1944)
- `PhysicalTable` map from orchestrator to each table task
- `intra_deps_test` E2E test

Error codes
New error codes added by this PR:
| Code | Description |
| --- | --- |
| `SELF_REFERENCING_TABLE` | `self.<own_name>` |
| `SELF_REF_TABLE_NOT_FOUND` | `self.`-qualified reference targets a non-existent sibling table |
| `CYCLIC_DEPENDENCY` | |
| `CATALOG_QUALIFIED_TABLE` | (renamed from `TABLE_REFERENCE_RESOLUTION`) |
| `INVALID_TABLE_NAME` | |

Breaking change: `TABLE_REFERENCE_RESOLUTION` error code renamed to `CATALOG_QUALIFIED_TABLE` for catalog-qualified table references. Acceptable given `status: unstable`.

Key design decisions
- `self.` convention: Aligns with UDF convention (`self.functionName()`). Parsed by DataFusion as `TableReference::Partial { schema: "self", table: "..." }`
- `for_dump.rs` has zero self-ref knowledge: it resolves external deps and builds catalogs from generic entries. Self-ref resolution lives in `table.rs` where it belongs
- `FailFastJoinSet` cancellation if a sibling fails

Files changed
| File | Change |
| --- | --- |
| `common/src/self_schema_provider.rs` | `add_table()` for progressive schema registration |
| `common/src/catalog/physical/for_dump.rs` | `create()` → `resolve_external_deps()` + `build_catalog()`, add `ResolvedTableEntry` |
| `worker-datasets-derived/src/job_impl.rs` | `materialize_table()` call; 1 unit test |
| `worker-datasets-derived/src/job_impl/table.rs` | `partition_table_refs()`, self-ref resolution in both phases, notification-driven polling; 5 unit tests |
| `datasets-derived/src/sorting.rs` | `topological_sort()` and `CyclicDepError` |
| `admin-api/src/handlers/schema.rs` | `/schema` |
| `admin-api/src/handlers/common.rs` | `/manifests` |
| `tests/src/tests/it_dependencies.rs` | `#[ignore]` from `intra_deps_test` |
| `docs/feat/data-inter-table-dependencies.md` | |

Related
- `docs/feat/data-inter-table-dependencies.md`

Test plan
- `intra_deps_test` passes (dump + query with inter-table deps)
- `SELF_REFERENCING_TABLE`, `CYCLIC_DEPENDENCY`, `NO_TABLE_REFERENCES` errors via curl
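For reviewers who want a concrete picture of the runtime split described under Part 2, the separation of `self.` refs from external deps can be sketched as follows. This is an illustration, not the code in `table.rs`: `TableRef` is a local stand-in that mirrors the bare, partial, and full shapes of DataFusion's `TableReference`, and the return type of `partition_table_refs` is an assumption.

```rust
/// Local stand-in mirroring the shapes of DataFusion's `TableReference`.
#[derive(Debug, Clone, PartialEq)]
pub enum TableRef {
    Bare { table: String },
    Partial { schema: String, table: String },
    Full { catalog: String, schema: String, table: String },
}

/// Splits references into (sibling table names, external references).
/// A reference counts as a sibling only when its schema is exactly `self`.
pub fn partition_table_refs(refs: &[TableRef]) -> (Vec<String>, Vec<TableRef>) {
    let mut siblings = Vec::new();
    let mut external = Vec::new();
    for r in refs {
        match r {
            TableRef::Partial { schema, table } if schema == "self" => {
                siblings.push(table.clone());
            }
            other => external.push(other.clone()),
        }
    }
    (siblings, external)
}
```

Under this sketch, `self.blocks_base` lands in the sibling list as `blocks_base`, while a reference such as `eth.blocks` stays in the external list and would be resolved through the existing external-dependency path.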