Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
139 changes: 139 additions & 0 deletions docs/research/semantic-web-summary.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,139 @@
# Semantic Web Integration: What We Learned

A team summary of GDS + OWL/SHACL/SPARQL integration via `gds-owl`.

## The Short Version

We can export **85% of a GDS specification** to Turtle/RDF files and
import it back losslessly. The 15% we lose is Python callables (transition
functions, constraint predicates, distance functions). This is a
mathematical certainty, not a gap we can close.

## What Gets Exported (R1 -- Fully Representable)

Everything structural round-trips perfectly through Turtle:

| GDS Concept | RDF Representation | Validated By |
|---|---|---|
| Block names, roles, interfaces | OWL classes + properties | SHACL shapes |
| Port names and type tokens | Literals on Port nodes | SHACL datatype |
| Wiring topology (who connects to whom) | Wire nodes with source/target | SHACL cardinality |
| Entity/StateVariable declarations | Entity + StateVariable nodes | SHACL |
| TypeDef (name, python_type, units) | TypeDef node + properties | SHACL |
| Space fields | SpaceField blank nodes | SHACL |
| Parameter schema (names, types, bounds) | ParameterDef nodes | SHACL |
| Mechanism update targets (what writes where) | UpdateMapEntry nodes | SHACL |
| Admissibility dependencies (what reads what) | AdmissibilityDep nodes | SHACL |
| Transition read dependencies | TransitionReadEntry nodes | SHACL |
| State metric variable declarations | MetricVariableEntry nodes | SHACL |
| Canonical decomposition (h = f . g) | CanonicalGDS node | SHACL |
| Verification findings | Finding nodes | SHACL |

**13 SHACL shapes** enforce structural correctness on the RDF graph.
**7 SPARQL query templates** enable cross-node analysis (blocks by role,
dependency paths, entity update maps, parameter impact, verification summaries).

## What Requires SPARQL (R2 -- Structurally Representable)

Some properties can't be checked by SHACL alone (which validates individual
nodes) but CAN be checked by SPARQL queries over the full graph:

| Property | SPARQL Feature | Why SHACL Can't |
|---|---|---|
| Acyclicity (G-006) | Transitive closure (`p+`) | No path traversal in SHACL-core |
| Completeness (SC-001) | `FILTER NOT EXISTS` | No "for all X, exists Y" |
| Determinism (SC-002) | `GROUP BY` + `HAVING` | No cross-node aggregation |
| Dangling wirings (G-004) | `FILTER NOT EXISTS` | Name existence, not class membership |

These all terminate (SPARQL over finite graphs always does) and are decidable.

## What Cannot Be Exported (R3 -- Not Representable)

These are **fundamentally** non-exportable. Not a tooling gap -- a
mathematical impossibility (Rice's theorem for callables, computational
class separation for string processing):

| GDS Concept | Why R3 | What Happens on Export |
|---|---|---|
| `TypeDef.constraint` (e.g. `lambda x: x >= 0`) | Arbitrary Python callable | Exported as boolean flag `hasConstraint`; imported as `None` |
| `f_behav` (transition functions) | Arbitrary computation | Not stored in GDSSpec -- user responsibility |
| `AdmissibleInputConstraint.constraint` | Arbitrary callable | Exported as boolean flag; imported as `None` |
| `StateMetric.distance` | Arbitrary callable | Exported as boolean flag; imported as `None` |
| Auto-wiring token computation | Multi-pass string processing | Results exported (WiringIR edges); process is not |
| Construction validation | Python `@model_validator` logic | Structural result preserved; validation logic is not |

**Key insight:** The *results* of R3 computation are always R1. Auto-wiring
produces WiringIR edges (R1). Validation produces pass/fail (R1). Only the
*process* is lost.

## The Boundary in One Sentence

> **You can represent everything about a system except what its programs
> actually do.** The canonical decomposition `h = f . g` makes this
> boundary explicit: `g` (topology) and `f_struct` (update targets) are
> fully representable; `f_behav` (how state actually changes) is not.

## Practical Implications

### What You Can Do With the Turtle Export

1. **Share specs between tools** -- any RDF-aware tool (Protege, GraphDB,
Neo4j via neosemantics) can import a GDS spec
2. **Validate specs without Python** -- SHACL processors (TopBraid, pySHACL)
can check structural correctness
3. **Query specs with SPARQL** -- find all mechanisms that update a given
entity, trace dependency paths, check acyclicity
4. **Version and diff specs** -- Turtle is text, diffs are meaningful
5. **Cross-ecosystem interop** -- other OWL ontologies can reference GDS
classes/properties

### What You Cannot Do

1. **Run simulations from Turtle** -- you need the Python callables back
2. **Verify behavioral properties** -- "does this mechanism converge?" requires
executing `f_behav`
3. **Reproduce auto-wiring** -- the token overlap computation can't run in SPARQL

### Round-Trip Fidelity

Tested with property-based testing (Hypothesis): 100 random GDSSpecs
generated, exported to Turtle, parsed back, reimported. All structural
fields survive. Known lossy fields:

- `TypeDef.constraint` -> `None`
- `TypeDef.python_type` -> falls back to `str` for non-builtin types
- `AdmissibleInputConstraint.constraint` -> `None`
- `StateMetric.distance` -> `None`
- Port/wire ordering -> set-based (RDF is unordered)
- Blank node identity -> content-based comparison, not node ID

## Numbers

| Metric | Count |
|---|---|
| R1 concepts (fully representable) | 12 |
| R2 concepts (SPARQL-needed) | 3 |
| R3 concepts (not representable) | 6 |
| SHACL shapes | 13 |
| SPARQL templates | 7 |
| Verification checks expressible in SHACL | 6 of 15 |
| Verification checks expressible in SPARQL | 6 more |
| Checks requiring Python | 2 of 15 |
| Round-trip PBT tests | 26 |
| Random specs tested | ~2,600 |

## Paper Alignment

The structural/behavioral split is a **framework design choice**, not a
paper requirement. The GDS paper (Zargham & Shorish 2022) defines
`U: X -> P(U)` as a single map; we split it into `U_struct` (dependency
graph, R1) and `U_behav` (constraint predicate, R3) for ontological
engineering. Same for `StateMetric` and `TransitionSignature`. The
canonical decomposition `h = f . g` IS faithful to the paper.

## Files

- `packages/gds-owl/` -- the full export/import/SHACL/SPARQL implementation
- `docs/research/formal-representability.md` -- the 800-line formal analysis
- `docs/research/verification/r3-undecidability.md` -- proofs for the R3 boundary
- `docs/research/verification/representability-proof.md` -- R1/R2 decidability + partition independence
1 change: 1 addition & 0 deletions mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -384,6 +384,7 @@ nav:
- Paper Implementation Gap: research/paper-implementation-gap.md
- View Stratification: guides/view-stratification.md
- Ecosystem: framework/ecosystem.md
- Semantic Web Integration: research/semantic-web-summary.md
- Formal Verification:
- Verification Plan: research/verification-plan.md
- R3 Non-Representability: research/verification/r3-undecidability.md
Expand Down
Loading