diff --git a/docs/research/semantic-web-summary.md b/docs/research/semantic-web-summary.md new file mode 100644 index 0000000..aba3559 --- /dev/null +++ b/docs/research/semantic-web-summary.md @@ -0,0 +1,139 @@ +# Semantic Web Integration: What We Learned + +A team summary of GDS + OWL/SHACL/SPARQL integration via `gds-owl`. + +## The Short Version + +We can export **85% of a GDS specification** to Turtle/RDF files and +import it back losslessly. The 15% we lose is Python callables (transition +functions, constraint predicates, distance functions). This is a +mathematical certainty, not a gap we can close. + +## What Gets Exported (R1 -- Fully Representable) + +Everything structural round-trips perfectly through Turtle: + +| GDS Concept | RDF Representation | Validated By | +|---|---|---| +| Block names, roles, interfaces | OWL classes + properties | SHACL shapes | +| Port names and type tokens | Literals on Port nodes | SHACL datatype | +| Wiring topology (who connects to whom) | Wire nodes with source/target | SHACL cardinality | +| Entity/StateVariable declarations | Entity + StateVariable nodes | SHACL | +| TypeDef (name, python_type, units) | TypeDef node + properties | SHACL | +| Space fields | SpaceField blank nodes | SHACL | +| Parameter schema (names, types, bounds) | ParameterDef nodes | SHACL | +| Mechanism update targets (what writes where) | UpdateMapEntry nodes | SHACL | +| Admissibility dependencies (what reads what) | AdmissibilityDep nodes | SHACL | +| Transition read dependencies | TransitionReadEntry nodes | SHACL | +| State metric variable declarations | MetricVariableEntry nodes | SHACL | +| Canonical decomposition (h = f . g) | CanonicalGDS node | SHACL | +| Verification findings | Finding nodes | SHACL | + +**13 SHACL shapes** enforce structural correctness on the RDF graph. +**7 SPARQL query templates** enable cross-node analysis (blocks by role, +dependency paths, entity update maps, parameter impact, verification summaries). + +## What Requires SPARQL (R2 -- Structurally Representable) + +Some properties can't be checked by SHACL alone (which validates individual +nodes) but CAN be checked by SPARQL queries over the full graph: + +| Property | SPARQL Feature | Why SHACL Can't | +|---|---|---| +| Acyclicity (G-006) | Transitive closure (`p+`) | No path traversal in SHACL-core | +| Completeness (SC-001) | `FILTER NOT EXISTS` | No "for all X, exists Y" | +| Determinism (SC-002) | `GROUP BY` + `HAVING` | No cross-node aggregation | +| Dangling wirings (G-004) | `FILTER NOT EXISTS` | Name existence, not class membership | + +These all terminate (SPARQL over finite graphs always does) and are decidable. + +## What Cannot Be Exported (R3 -- Not Representable) + +These are **fundamentally** non-exportable. Not a tooling gap -- a +mathematical impossibility (Rice's theorem for callables, computational +class separation for string processing): + +| GDS Concept | Why R3 | What Happens on Export | +|---|---|---| +| `TypeDef.constraint` (e.g. `lambda x: x >= 0`) | Arbitrary Python callable | Exported as boolean flag `hasConstraint`; imported as `None` | +| `f_behav` (transition functions) | Arbitrary computation | Not stored in GDSSpec -- user responsibility | +| `AdmissibleInputConstraint.constraint` | Arbitrary callable | Exported as boolean flag; imported as `None` | +| `StateMetric.distance` | Arbitrary callable | Exported as boolean flag; imported as `None` | +| Auto-wiring token computation | Multi-pass string processing | Results exported (WiringIR edges); process is not | +| Construction validation | Python `@model_validator` logic | Structural result preserved; validation logic is not | + +**Key insight:** The *results* of R3 computation are always R1. Auto-wiring +produces WiringIR edges (R1). Validation produces pass/fail (R1). Only the +*process* is lost. + +## The Boundary in One Sentence + +> **You can represent everything about a system except what its programs +> actually do.** The canonical decomposition `h = f . g` makes this +> boundary explicit: `g` (topology) and `f_struct` (update targets) are +> fully representable; `f_behav` (how state actually changes) is not. + +## Practical Implications + +### What You Can Do With the Turtle Export + +1. **Share specs between tools** -- any RDF-aware tool (Protege, GraphDB, + Neo4j via neosemantics) can import a GDS spec +2. **Validate specs without Python** -- SHACL processors (TopBraid, pySHACL) + can check structural correctness +3. **Query specs with SPARQL** -- find all mechanisms that update a given + entity, trace dependency paths, check acyclicity +4. **Version and diff specs** -- Turtle is text, diffs are meaningful +5. **Cross-ecosystem interop** -- other OWL ontologies can reference GDS + classes/properties + +### What You Cannot Do + +1. **Run simulations from Turtle** -- you need the Python callables back +2. **Verify behavioral properties** -- "does this mechanism converge?" requires + executing `f_behav` +3. **Reproduce auto-wiring** -- the token overlap computation can't run in SPARQL + +### Round-Trip Fidelity + +Tested with property-based testing (Hypothesis): 100 random GDSSpecs +generated, exported to Turtle, parsed back, reimported. All structural +fields survive. Known lossy fields: + +- `TypeDef.constraint` -> `None` +- `TypeDef.python_type` -> falls back to `str` for non-builtin types +- `AdmissibleInputConstraint.constraint` -> `None` +- `StateMetric.distance` -> `None` +- Port/wire ordering -> set-based (RDF is unordered) +- Blank node identity -> content-based comparison, not node ID + +## Numbers + +| Metric | Count | +|---|---| +| R1 concepts (fully representable) | 12 | +| R2 concepts (SPARQL-needed) | 3 | +| R3 concepts (not representable) | 6 | +| SHACL shapes | 13 | +| SPARQL templates | 7 | +| Verification checks expressible in SHACL | 6 of 15 | +| Verification checks expressible in SPARQL | 6 more | +| Checks requiring Python | 2 of 15 | +| Round-trip PBT tests | 26 | +| Random specs tested | ~2,600 | + +## Paper Alignment + +The structural/behavioral split is a **framework design choice**, not a +paper requirement. The GDS paper (Zargham & Shorish 2022) defines +`U: X -> P(U)` as a single map; we split it into `U_struct` (dependency +graph, R1) and `U_behav` (constraint predicate, R3) for ontological +engineering. Same for `StateMetric` and `TransitionSignature`. The +canonical decomposition `h = f . g` IS faithful to the paper. + +## Files + +- `packages/gds-owl/` -- the full export/import/SHACL/SPARQL implementation +- `docs/research/formal-representability.md` -- the 800-line formal analysis +- `docs/research/verification/r3-undecidability.md` -- proofs for the R3 boundary +- `docs/research/verification/representability-proof.md` -- R1/R2 decidability + partition independence diff --git a/mkdocs.yml b/mkdocs.yml index ce6d554..f1033ff 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -384,6 +384,7 @@ nav: - Paper Implementation Gap: research/paper-implementation-gap.md - View Stratification: guides/view-stratification.md - Ecosystem: framework/ecosystem.md + - Semantic Web Integration: research/semantic-web-summary.md - Formal Verification: - Verification Plan: research/verification-plan.md - R3 Non-Representability: research/verification/r3-undecidability.md