Architecture: knowledgecomplex

The 2×2 Responsibility Map

This is the central architectural constraint. Every rule in the system belongs to exactly one cell. The Python package exists to hide this table from the user.

| | OWL | SHACL |
|---|---|---|
| Topological | kc:Element base class; kc:Vertex, kc:Edge, kc:Face as subclasses. kc:Edge has exactly 2 boundedBy (Vertex); kc:Face has exactly 3 boundedBy (Edge). kc:Complex as collection of elements via kc:hasElement. | Boundary vertices are distinct; boundary edges of a face form a closed triangle; boundary-closure of a complex (all instance-level; require sh:sparql) |
| Ontological | Concrete subclasses and their allowed attributes; property domain/range declarations | Controlled vocabulary enforcement (e.g. status ∈ {passing, failing, pending}); attribute presence rules; co-occurrence constraints |

Why Both OWL and SHACL at Each Layer

Topological layer: OWL cardinality axioms enforce structural counts at the class level (reasoning over schema). SHACL is required for the closed-triangle constraint because OWL cannot express a constraint that references the co-values of three different property assertions on the same individual — this is a known expressivity boundary of OWL-DL. The sh:sparql constraint in kc_core_shapes.ttl is the explicit test of this boundary.

Ontological layer: OWL defines what attributes a concrete type has (property declarations, domain, range, subclass hierarchy). SHACL defines what values those attributes must have at the instance level (vocabulary constraints, cardinality on the concrete shape, required/optional). OWL cannot enforce controlled vocabulary on data properties at the instance level without enumerating individuals, which is inappropriate for string-valued attributes.
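The closed-triangle condition from the topological layer can be illustrated outside RDF. The following is a minimal plain-Python sketch, not the sh:sparql constraint shipped in kc_core_shapes.ttl; the function name `is_closed_triangle` and the dictionary layout are assumptions made for illustration. It shows why the check needs to compare the boundary values of three edges against each other, which is the co-reference that OWL-DL cannot express.

```python
from collections import Counter

def is_closed_triangle(face_edges, edge_vertices):
    """Return True if the boundary edges of a face close into a triangle.

    face_edges: iterable of edge ids bounding the face (hypothetical layout).
    edge_vertices: dict mapping edge id -> pair of vertex ids.
    """
    edges = list(face_edges)
    if len(set(edges)) != 3:
        return False  # a face needs three distinct boundary edges
    counts = Counter()
    for e in edges:
        u, v = edge_vertices[e]
        if u == v:
            return False  # boundary vertices of an edge must be distinct
        counts[u] += 1
        counts[v] += 1
    # Closed triangle: exactly three vertices, each shared by exactly two edges.
    return len(counts) == 3 and all(n == 2 for n in counts.values())

edge_vertices = {
    "e1": ("a", "b"),
    "e2": ("b", "c"),
    "e3": ("c", "a"),
    "e4": ("c", "d"),
}
print(is_closed_triangle(["e1", "e2", "e3"], edge_vertices))  # True
print(is_closed_triangle(["e1", "e2", "e4"], edge_vertices))  # False
```

The second call fails because e4 introduces a fourth vertex, so the three edges form an open path rather than a cycle.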


Component Layers

┌─────────────────────────────────────────────────────┐
│           Application / Demo  (user code)           │
│  build_my_instance()  |  domain-specific queries    │
│  Concrete elements: vertices, edges, faces           │
├─────────────────────────────────────────────────────┤
│          Domain Model  (user code)                  │
│  build_my_schema()    |  domain SPARQL templates    │
│  MyVertex, MyEdge, MyFace type definitions          │
├─────────────────────────────────────────────────────┤
│         knowledgecomplex Python Package             │
│  SchemaBuilder DSL     |  KnowledgeComplex I/O      │
│  (OWL + SHACL emit)   |  (rdflib graph + SPARQL)   │
├──────────────────────┬──────────────────────────────┤
│  kc_core.ttl         │  kc_core_shapes.ttl          │
│  (abstract OWL)      │  (abstract SHACL)            │
├──────────────────────┴──────────────────────────────┤
│         rdflib  |  pyshacl  |  owlrl                │
└─────────────────────────────────────────────────────┘

The domain model layer sits between the core framework and the application. It defines domain-specific types and queries using the core's SchemaBuilder DSL. The application layer then instantiates that model with concrete data.

The static resources (kc_core.ttl, kc_core_shapes.ttl) are loaded once at SchemaBuilder.__init__. Model schema and shapes are merged into the same rdflib Graph objects at runtime.


Abstraction Boundary: Core vs. Domain Models

The layers above are separated by a key abstraction boundary: the core framework (knowledgecomplex/) vs. domain models (user code). Everything inside the package boundary (core, static resources, libraries) is framework-owned and invariant. Everything outside (model definitions, instances) is user-authored.

Core Framework (knowledgecomplex/, prefixes kc: and kcs:)

  • Topological rule enforcement. The Element/Vertex/Edge/Face hierarchy, cardinality axioms, distinctness, closed-triangle, and boundary-closure constraints. Static OWL and SHACL shipped with the package. Users cannot modify them.
  • Superstructure attributes. kc:uri (optional, at-most-one) allows any element to reference a source file. Enforced by kcs:ElementShape.
  • Ontological rule authoring. SchemaBuilder provides the DSL for declaring types, attributes, and vocabularies. It generates OWL classes and SHACL shapes on behalf of the domain model but does not itself define any domain types.
  • Instance management. KnowledgeComplex loads the merged schema, manages the RDF graph, validates on every write, and executes named SPARQL queries.
  • Framework queries. Generic SPARQL templates (vertices, coboundary) that work for any domain model.

Domain Models (user code, prefix {namespace}:)

  • Ontological rule enforcement. The concrete OWL types and SHACL shapes generated by calling SchemaBuilder.add_*_type().
  • Concrete complex authoring. Instance data constructed via KnowledgeComplex.add_*() calls.
  • Domain queries. Model-specific SPARQL templates.

The Type Inheritance Chain Crosses the Boundary

kc:Element → kc:Vertex → aaa:spec → (instance "spec-001")
   core          core       model        application

The core owns Element → Vertex; the model owns Vertex → spec; the application owns the instance spec-001. The boundary is at the subclass declaration — add_vertex_type("spec") is the model calling the core's authoring API to extend the core's type hierarchy.
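As a rough analogy in plain Python classes (not how the package represents types, which is as OWL classes in an RDF graph), the chain can be pictured as follows. The `add_vertex_type` function here is a hypothetical stand-in that only mimics the idea of the model asking the core to create a subclass.

```python
class Element:
    """Stands in for kc:Element (core)."""

class Vertex(Element):
    """Stands in for kc:Vertex (core)."""

def add_vertex_type(name):
    """Hypothetical stand-in: the model asks the core to declare a Vertex subclass."""
    return type(name, (Vertex,), {})

Spec = add_vertex_type("spec")   # model layer: Vertex -> spec
spec_001 = Spec()                # application layer: the instance

# Walk the chain from the instance's type back to the core root (object omitted).
chain = [cls.__name__ for cls in type(spec_001).__mro__[:-1]]
print(chain)  # ['spec', 'Vertex', 'Element']
```

The subclass is created by core code but on the model's request, which mirrors where the boundary sits in the chain above.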

Layer Ownership of the 2×2 Map

OWL SHACL
Topological Core owns (static kc_core.ttl) Core owns (static kc_core_shapes.ttl)
Ontological Domain model authors via SchemaBuilder → core generates Domain model authors via vocab()/attributes → core generates

Both ontological cells are authored by the domain model but generated and managed by the core. The domain model never touches OWL or SHACL directly.


Key Design Decisions

DD1: Attributes over Subclasses (for Simple Domains)

For simple domain models, a single concrete type with a controlled-vocabulary attribute (e.g. verification with status ∈ {passing, failing, pending}) is preferred over two subclasses (PassingVerification, FailingVerification). The framework supports both patterns.

Rationale: The single-type-with-attribute pattern makes the data more inspectable before schema-level concerns are promoted. promote_to_attribute() supports the transition path from untyped patterns to typed attributes.
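The single-type-with-attribute pattern can be sketched in plain Python. This is an illustration of the design choice only; the `Verification` dataclass and `STATUS_VOCAB` constant are assumptions, and in the package the vocabulary check is a SHACL constraint rather than Python code.

```python
from dataclasses import dataclass

# Controlled vocabulary for the status attribute (values taken from the text above).
STATUS_VOCAB = frozenset({"passing", "failing", "pending"})

@dataclass
class Verification:
    """One concrete type; the status is data, not a subclass."""
    ident: str
    status: str

    def __post_init__(self):
        # Reject any value outside the controlled vocabulary.
        if self.status not in STATUS_VOCAB:
            raise ValueError(
                f"status {self.status!r} not in {sorted(STATUS_VOCAB)}"
            )

items = [Verification("v1", "passing"), Verification("v2", "failing")]
failing = [v.ident for v in items if v.status == "failing"]
print(failing)  # ['v2']
```

Because the status is an ordinary value, filtering and inspection need no knowledge of a class hierarchy, and adding a fourth status is a vocabulary change rather than a new type.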

DD2: SPARQL Templates, Not Free Queries

All SPARQL is encapsulated as named template files. KnowledgeComplex.query() accepts only registered template names.

Rationale: Maintains API opacity, prevents arbitrary SPARQL from bypassing validation invariants, and makes the query surface explicit and testable.
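A minimal sketch of the named-template idea follows. It is not the package's implementation: the `TemplateRegistry` class is hypothetical, the `UnknownQueryError` defined here is a local stand-in for the exception of the same name, and splicing parameters with `string.Template` is a simplification (a real implementation would more likely bind parameters than substitute text).

```python
from string import Template

class UnknownQueryError(Exception):
    """Local stand-in: raised when a query name is not a registered template."""

class TemplateRegistry:
    """Hypothetical registry that only runs queries it knows by name."""

    def __init__(self, templates):
        self._templates = dict(templates)

    def query(self, name, **params):
        # Only registered names are accepted; there is no raw-SPARQL path.
        try:
            template = self._templates[name]
        except KeyError:
            raise UnknownQueryError(name) from None
        return template.substitute(**params)

registry = TemplateRegistry({
    # $type_iri is an illustrative parameter, not a documented one.
    "vertices": Template("SELECT ?v WHERE { ?v a <$type_iri> }"),
})
print(registry.query("vertices", type_iri="https://example.org/aaa#spec"))
```

The query surface is then exactly the set of keys in the registry, which is what makes it enumerable and testable.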

DD3: Validation on Write

add_vertex(), add_edge(), and add_face() each trigger SHACL validation immediately. If validation fails, the triples added by that call are rolled back and ValidationError is raised.

Rationale: Fail fast; keep the graph in a valid state at all times. Verification is not a batch post-processing step — it is enforced at assertion time.
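The write-validate-rollback cycle can be sketched with a set of triples standing in for the RDF graph. This is an assumption-laden illustration: `TinyComplex`, its string predicates, and the single stand-in rule are hypothetical, whereas the package validates with SHACL shapes.

```python
class ValidationError(Exception):
    """Local stand-in for the package's ValidationError."""

class TinyComplex:
    """Hypothetical in-memory graph that validates on every write."""

    def __init__(self):
        self.triples = set()

    def _validate(self):
        # Stand-in rule: every edge has exactly two distinct boundary vertices.
        edges = {s for (s, p, o) in self.triples if p == "type" and o == "Edge"}
        for e in edges:
            boundary = {o for (s, p, o) in self.triples
                        if s == e and p == "boundedBy"}
            if len(boundary) != 2:
                raise ValidationError(
                    f"{e} has {len(boundary)} distinct boundary vertices"
                )

    def _write(self, new_triples):
        added = set(new_triples) - self.triples
        self.triples |= added
        try:
            self._validate()
        except ValidationError:
            self.triples -= added  # roll back only what this call added
            raise

    def add_vertex(self, v):
        self._write({(v, "type", "Vertex")})

    def add_edge(self, e, u, v):
        self._write({(e, "type", "Edge"),
                     (e, "boundedBy", u), (e, "boundedBy", v)})

kc = TinyComplex()
kc.add_vertex("a")
kc.add_vertex("b")
kc.add_edge("ab", "a", "b")
try:
    kc.add_edge("aa", "a", "a")  # degenerate edge: rejected and rolled back
except ValidationError as err:
    print("rejected:", err)
print(len(kc.triples))  # 5
```

Tracking only the newly added triples matters: a rollback that removed everything in the request could delete triples that were already present and valid before the call.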

DD4: Static Core Resources

kc_core.ttl and kc_core_shapes.ttl are static files shipped with the package, not generated at runtime.

Rationale: The topological rules are framework invariants, not user-configurable. Separating them from user schema makes the 2×2 boundary visible in the file system.

DD5: dump_owl() and dump_shacl() Merge Core and User Schema

Both dump methods return the full merged graph (core + user-defined), serialized as Turtle.

Rationale: The merged graph is what pyshacl and owlrl operate on. Showing the full graph makes the system inspectable and demonstrates that user types genuinely extend (not replace) the core ontology.

DD6: Shared-Domain Removal (_set_owl_domain)

When the same property name appears on multiple types, the OWL rdfs:domain assertion is removed (leaving no domain) rather than adding multiple domain values. SHACL shapes still enforce per-type constraints correctly via each type's NodeShape.

Rationale: Multiple rdfs:domain values trigger RDFS inference to classify any individual with that property as a member of all domain types — violating the type hierarchy. Removing domain resolves the conflict; SHACL handles the per-type enforcement.
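Both halves of this decision can be sketched in plain Python. The first function applies the RDFS domain rule to show the unwanted inference; the `PropertyDomains` class is a hypothetical stand-in for the bookkeeping that `_set_owl_domain` is described as doing, and its internals are an assumption rather than the package's code.

```python
def rdfs_domain_inference(domains, used_props):
    """RDFS domain rule: each declared domain of a used property
    becomes an inferred type of the subject."""
    return {cls for p in used_props for cls in domains.get(p, ())}

# With two domains on one property, a single use implies both types.
print(sorted(rdfs_domain_inference(
    {"status": {"spec", "verification"}}, ["status"])))
# ['spec', 'verification']

class PropertyDomains:
    """Hypothetical bookkeeping: keep a domain only while a property
    belongs to a single type."""

    def __init__(self):
        self._domain = {}     # property -> its single domain class
        self._shared = set()  # properties declared on more than one type

    def declare(self, prop, cls):
        if prop in self._shared:
            return  # already domain-free; stays that way
        current = self._domain.get(prop)
        if current is None:
            self._domain[prop] = cls
        elif current != cls:
            del self._domain[prop]  # second type seen: drop the domain
            self._shared.add(prop)

    def domain_of(self, prop):
        return self._domain.get(prop)

domains = PropertyDomains()
domains.declare("status", "verification")
domains.declare("owner", "spec")
domains.declare("status", "spec")
print(domains.domain_of("status"))  # None
print(domains.domain_of("owner"))   # spec
```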


Known OWL Expressivity Limits (Design Seams)

| Constraint | OWL can express? | Resolution |
|---|---|---|
| Edge has exactly 2 boundary vertices | Yes (cardinality on boundedBy) | OWL cardinality axiom |
| Face has exactly 3 boundary edges | Yes (cardinality on boundedBy) | OWL cardinality axiom |
| Boundary vertices are distinct individuals | No (OWL open-world; same-as/different-from is individual-level) | SHACL sh:sparql (COUNT DISTINCT) |
| Boundary edges of a face form a closed triangle | No (requires co-reference across 3 property values) | SHACL sh:sparql constraint |
| Boundary-closure of a complex | No (requires co-reference across hasElement and boundedBy on different individuals) | SHACL sh:sparql constraint |
| Controlled vocabulary on data property | No (without owl:oneOf on individuals, impractical for strings) | SHACL sh:in |
| At-most-one kc:uri per element | Not enforced practically (open-world) | SHACL sh:maxCount 1 in ElementShape |

These seams are documented as comments in the relevant .ttl files.
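To make the boundary-closure seam concrete, here is a minimal plain-Python sketch of the condition itself. It is not the sh:sparql constraint from the .ttl files; the function name and the dictionary layout are assumptions. The check has to relate membership (kc:hasElement) on the complex to boundaries (kc:boundedBy) on other individuals, which is the cross-individual co-reference listed in the table.

```python
def boundary_closure_violations(elements, bounded_by):
    """Return sorted (element, missing_boundary) pairs for a complex.

    elements: set of element ids in the complex (kc:hasElement).
    bounded_by: dict mapping element id -> set of boundary ids (kc:boundedBy).
    """
    return sorted(
        (el, b)
        for el in elements
        for b in bounded_by.get(el, ())
        if b not in elements
    )

bounded_by = {"ab": {"a", "b"}, "bc": {"b", "c"}}
elements = {"a", "b", "ab", "bc"}  # vertex "c" was left out of the complex
print(boundary_closure_violations(elements, bounded_by))  # [('bc', 'c')]
```

An empty result means the complex is closed under taking boundaries.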


Interoperability: Flexo MMS and OpenMBEE

Because knowledgecomplex stores all data as RDF and enforces constraints via standard W3C technologies (OWL, SHACL, SPARQL), it is natively compatible with Flexo MMS — the Model Management System developed by the OpenMBEE community.

Why the fit is natural

Flexo MMS is a version-controlled model repository that speaks RDF natively. A KC instance graph is already a valid RDF dataset, so the integration path is direct:

| KC concept | MMS equivalent | Notes |
|---|---|---|
| kc:Complex (instance graph) | MMS model/branch | A KC export is a self-contained RDF graph that can be committed as an MMS model revision |
| kc:boundedBy, kc:hasElement | MMS element relationships | Topological structure is expressed as standard RDF triples |
| SHACL shapes (kc_core_shapes.ttl + user shapes) | MMS validation profiles | Shapes can be registered in MMS to enforce KC constraints on committed models |
| kc:uri | MMS element cross-references | Provides traceability from KC elements to external artifacts (files, documents, URIs) |
| JSON-LD export (dump_graph(format="json-ld")) | MMS ingest format | JSON-LD is the primary API format for Flexo MMS |

Integration patterns

Push to MMS: Export a KC instance via kc.export() or dump_graph(format="json-ld"), then commit to a Flexo MMS repository via its REST API. The OWL ontology and SHACL shapes can be committed alongside the instance data, enabling MMS-side validation.

Pull from MMS: Retrieve a model revision as JSON-LD from Flexo MMS, then load it into a KC instance via load_graph(kc, "model.jsonld"). The KC's SHACL verification (kc.verify()) ensures the imported data satisfies all topological and ontological constraints.

Version control: MMS provides branching, diffing, and merge capabilities at the RDF triple level. KC's ComplexDiff and ComplexSequence classes complement this by providing simplicial-complex-aware diffing (element-level adds/removes rather than triple-level changes).
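The difference between triple-level and element-level diffing can be sketched with sets of tuples. This is an illustration of the distinction only, not the behavior of ComplexDiff; both function names and the rule "an element is any subject with an rdf:type triple" are assumptions.

```python
def triple_diff(old, new):
    """Triple-level view: which statements were added or removed."""
    return new - old, old - new

def element_diff(old, new):
    """Element-level view: which typed elements were added or removed."""
    def typed_subjects(graph):
        return {s for (s, p, o) in graph if p == "rdf:type"}
    before, after = typed_subjects(old), typed_subjects(new)
    return after - before, before - after

old = {
    ("a", "rdf:type", "kc:Vertex"),
    ("b", "rdf:type", "kc:Vertex"),
}
new = old | {
    ("ab", "rdf:type", "kc:Edge"),
    ("ab", "kc:boundedBy", "a"),
    ("ab", "kc:boundedBy", "b"),
}

added_triples, removed_triples = triple_diff(old, new)
added_elements, removed_elements = element_diff(old, new)
print(len(added_triples), len(removed_triples))  # 3 0
print(added_elements, removed_elements)          # {'ab'} set()
```

Three added triples collapse into one added edge, which is the unit a reader of the complex cares about.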

What KC adds beyond MMS

Flexo MMS manages RDF models generically — it stores, versions, and queries them but does not enforce simplicial complex structure. KC adds the topological layer: boundary-closure, closed-triangle constraints, typed simplicial hierarchy, and algebraic topology computations (Betti numbers, Hodge decomposition). Together, MMS provides the model management infrastructure and KC provides the mathematical structure.
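As an illustration of the kind of computation meant by Betti numbers, the sketch below computes them for a small complex using only the standard library. It is not the package's betti_numbers function: it works over GF(2) for simplicity (which can differ from rational Betti numbers when the complex has torsion), and the input layout is an assumption.

```python
def rank_gf2(rows):
    """Rank of a 0/1 matrix over GF(2); each row is an int bitmask."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row
        # Eliminate that bit from every remaining row.
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def betti_numbers_gf2(vertices, edges, faces):
    """edges: dict edge -> (u, v); faces: dict face -> (e1, e2, e3)."""
    v_index = {v: i for i, v in enumerate(vertices)}
    e_index = {e: i for i, e in enumerate(edges)}
    # One row per edge, bits over vertices (boundary of each edge).
    d1 = [(1 << v_index[u]) | (1 << v_index[v]) for (u, v) in edges.values()]
    # One row per face, bits over edges (boundary of each face).
    d2 = [sum(1 << e_index[e] for e in es) for es in faces.values()]
    r1, r2 = rank_gf2(d1), rank_gf2(d2)
    return (len(vertices) - r1, len(edges) - r1 - r2, len(faces) - r2)

vertices = ["a", "b", "c"]
edges = {"ab": ("a", "b"), "bc": ("b", "c"), "ca": ("c", "a")}
print(betti_numbers_gf2(vertices, edges, {}))                           # (1, 1, 0)
print(betti_numbers_gf2(vertices, edges, {"abc": ("ab", "bc", "ca")}))  # (1, 0, 0)
```

The hollow triangle has one connected component and one loop; adding the face fills the loop, which is the structural fact a generic RDF store has no notion of.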

Reference

OpenMBEE (Open Model-Based Engineering Environment) is an open-source community developing tools for model-based systems engineering. Flexo MMS is its core model management system. See openmbee.org and github.com/Open-MBEE.


Deployment Architecture

The internal design described above (2×2 map, component layers, static resources) is the library's foundation. In practice, a knowledge complex is deployed through a stack of five layers, each building on the one below:

┌─────────────────────────────────────────────────────────────┐
│  5. LLM Tool Integration                                    │
│     Register KC operations as callable tools for a language  │
│     model. The complex serves as a deterministic expert      │
│     system — the LLM navigates, queries, and analyzes via    │
│     tool calls; the KC guarantees topological correctness    │
│     and returns structured, verifiable results.              │
├─────────────────────────────────────────────────────────────┤
│  4. MCP Server                                               │
│     Model Context Protocol server exposing KC as tools for   │
│     AI assistants (Claude, etc.). Each KC operation becomes   │
│     a tool: add_vertex, boundary, betti_numbers, audit, etc. │
├─────────────────────────────────────────────────────────────┤
│  3. Microservice (REST API)                                  │
│     Python-hosted service exposing KC operations over HTTP.   │
│     CRUD for elements, SPARQL query execution, SHACL         │
│     verification, algebraic topology analysis, export/import.│
├─────────────────────────────────────────────────────────────┤
│  2. Concrete Knowledge Complex                               │
│     An instance using a specific ontology. Typed vertices,   │
│     edges, and faces with attributes. SHACL-verified on      │
│     every write. Serialized as RDF (Turtle, JSON-LD).        │
│     Versioned via Flexo MMS or git.                          │
├─────────────────────────────────────────────────────────────┤
│  1. KC-Compatible Ontology                                   │
│     OWL class hierarchy extending kc:Vertex/Edge/Face.       │
│     SHACL shapes for attribute constraints. Publicly hosted  │
│     at persistent URIs (w3id.org). Dereferenceable — tools   │
│     can fetch the ontology and understand the type system.    │
└─────────────────────────────────────────────────────────────┘

Layer 1: Ontology

A KC-compatible ontology is an OWL ontology whose classes extend kc:Vertex, kc:Edge, and kc:Face, paired with SHACL shapes for instance-level constraints. Ontologies are authored via SchemaBuilder and exported as standard .ttl files. For public use, the ontology should be hosted at a persistent URI (e.g. https://w3id.org/kc/) so that other systems can dereference the IRI and retrieve the OWL/SHACL definitions. The knowledgecomplex.ontologies package ships three reference ontologies (operations, brand, research) as starting points.

Layer 2: Concrete Complex

A concrete knowledge complex is an RDF instance graph conforming to a specific ontology. It contains typed elements (vertices, edges, faces) with attributes, linked by kc:boundedBy and collected by kc:hasElement. SHACL verification enforces topological and ontological constraints on every write. The complex is serializable to Turtle, JSON-LD, or N-Triples and can be versioned via Flexo MMS or committed to a git repository as .ttl files.

Layer 3: Microservice

A Python-hosted HTTP service wraps the KnowledgeComplex API in a REST interface. Typical endpoints: element CRUD, named SPARQL queries, topological operations (boundary, star, closure), algebraic topology analysis (Betti numbers, Hodge decomposition, edge PageRank), SHACL verification and audit, and schema introspection. The service loads a schema at startup and manages one or more complex instances.

Layer 4: MCP Server

A Model Context Protocol server exposes KC operations as tools that AI assistants can call. Each KC method becomes an MCP tool: add_vertex, boundary, find_cliques, betti_numbers, audit, etc. The MCP server is a thin adapter over the microservice or the library directly, translating between MCP tool calls and KC Python API calls.
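The "thin adapter" role can be sketched as a name-to-callable dispatch table. This is deliberately not written against any MCP SDK, and the JSON request shape used here is a made-up illustration rather than the MCP wire format; `ToolAdapter` and the toy `add_vertex` backend are hypothetical.

```python
import json

class ToolAdapter:
    """Hypothetical adapter: maps tool names to Python callables."""

    def __init__(self):
        self._tools = {}

    def register(self, name, func):
        self._tools[name] = func

    def call(self, request_json):
        # Illustrative request shape: {"tool": ..., "arguments": {...}}
        request = json.loads(request_json)
        name = request["tool"]
        args = request.get("arguments", {})
        if name not in self._tools:
            return json.dumps({"ok": False, "error": f"unknown tool: {name}"})
        try:
            result = self._tools[name](**args)
        except Exception as err:  # report failures as structured results
            return json.dumps({"ok": False, "error": str(err)})
        return json.dumps({"ok": True, "result": result})

# Toy backend standing in for a KnowledgeComplex instance.
vertices = set()

def add_vertex(ident):
    vertices.add(ident)
    return sorted(vertices)

adapter = ToolAdapter()
adapter.register("add_vertex", add_vertex)
print(adapter.call('{"tool": "add_vertex", "arguments": {"ident": "a"}}'))
print(adapter.call('{"tool": "drop_everything"}'))
```

The adapter holds no domain logic of its own; every guarantee still comes from the callable it forwards to.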

Layer 5: LLM Tool Integration

The knowledge complex is registered as a set of callable tools for a language model. The LLM uses the complex as a deterministic expert system — it navigates the simplicial structure, retrieves typed elements and their attributes, runs topological queries, and performs algebraic topology analysis via tool calls. The KC guarantees that every result is topologically valid and SHACL-verified. The LLM provides natural language understanding and reasoning; the KC provides structured, auditable, mathematically rigorous retrieval.

This separation is key: the LLM handles ambiguity, intent, and synthesis; the KC handles structure, correctness, and computation. Neither replaces the other.


Namespace Conventions

@prefix kc:   <https://w3id.org/kc#> .       # core framework
@prefix kcs:  <https://w3id.org/kc/shape#> . # core shapes
@prefix aaa:  <https://example.org/aaa#> .      # user namespace (example)
@prefix aaas: <https://example.org/aaa/shape#> .# user shapes (example)

User namespaces are set via SchemaBuilder(namespace="aaa"). The URI base https://example.org/ is a placeholder for local development; a real deployment would use a dereferenceable IRI.
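The prefix pattern above can be reproduced with a small helper. This is a sketch of the naming convention only; `prefixes_for` is a hypothetical function, and how SchemaBuilder derives its IRIs internally is an assumption.

```python
def prefixes_for(namespace, base="https://example.org/"):
    """Build the model and shape prefixes for a user namespace.

    Mirrors the aaa: / aaas: example; the base is the placeholder
    used for local development.
    """
    return {
        namespace: f"{base}{namespace}#",
        f"{namespace}s": f"{base}{namespace}/shape#",
    }

for prefix, iri in prefixes_for("aaa").items():
    print(f"@prefix {prefix}: <{iri}> .")
```

Swapping the `base` argument for a dereferenceable IRI is the only change needed to move from the placeholder to a deployed namespace in this sketch.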


File Inventory

| File | Layer | Purpose |
|---|---|---|
| knowledgecomplex/resources/kc_core.ttl | Abstract OWL | Topological backbone: classes, properties, cardinality axioms, kc:uri |
| knowledgecomplex/resources/kc_core_shapes.ttl | Abstract SHACL | Topological constraints: distinctness, closed-triangle, boundary-closure, kc:uri at-most-one |
| knowledgecomplex/schema.py | Python API: schema authoring | SchemaBuilder DSL: add_*_type, dump_owl, dump_shacl, export, load |
| knowledgecomplex/graph.py | Python API: instance I/O | KnowledgeComplex: add_vertex, add_edge, add_face, query, dump_graph, export, load |
| knowledgecomplex/exceptions.py | Public exceptions | ValidationError, SchemaError, UnknownQueryError |
| knowledgecomplex/io.py | Python API: serialization | save_graph, load_graph, dump_graph: multi-format file I/O (Turtle, JSON-LD, N-Triples) |
| knowledgecomplex/viz.py | Python API: visualization | Hasse diagrams (plot_hasse), geometric realization (plot_geometric), to_networkx, verify_networkx |
| knowledgecomplex/analysis.py | Python API: algebraic topology | betti_numbers, euler_characteristic, hodge_laplacian, edge_pagerank (optional: numpy, scipy) |
| knowledgecomplex/clique.py | Python API: clique inference | find_cliques, infer_faces, fill_cliques: flagification and typed face inference |
| knowledgecomplex/filtration.py | Python API: filtrations | Filtration: nested subcomplex sequences, birth tracking, from_function |
| knowledgecomplex/diff.py | Python API: diffs and sequences | ComplexDiff, ComplexSequence: time-varying complexes with SPARQL UPDATE export/import |
| knowledgecomplex/codecs/markdown.py | Codec: markdown files | MarkdownCodec: YAML frontmatter + section-based round-trip; verify_documents |
| knowledgecomplex/queries/*.sparql | Framework SPARQL | 7 templates: vertices, coboundary, boundary, star, closure, skeleton, degree |