
ModelCypher Glossary: A Shared Vocabulary

Purpose: This document defines the precise meaning of terms used in ModelCypher. It serves as a "Handshake Protocol" between Human Users and AI Agents to ensure we are talking about the same concepts.

Notes:

  • In this repo, run commands as poetry run mc ....
  • Global CLI options can appear anywhere on the command line (example: mc model info ./model --output text).

Core Concepts

Manifold

The high-dimensional geometric structure induced by a model’s representations under a given task/probe setup.

  • Analogy: A crumpled sheet of paper in a 3D room. The room is the Parameter Space (billions of dimensions), but the paper (the model's actual behavior) is a lower-dimensional surface.
  • Operation: We compare/stitch manifolds by aligning representations on shared probes.

Intrinsic Dimension

The minimum number of variables needed to describe a model's state.

  • Analogy: A car moves in 3D space (x, y, z), but its "Intrinsic Dimension" is 2 (steering wheel angle, gas pedal).
  • Constraint view: A d-dimensional manifold embedded in R^D is a D-vector with (D−d) constraints; 2D is 3D with one constraint, 3D is 4D with one constraint, and so on.
  • Relevance: We explore whether some refusal/safety behaviors exhibit lower intrinsic dimension under specific probes. This is an empirical question, not a universal rule.
  • Used in: Paper 0, Paper 5 (primary focus on dimensionality cliff and plateau)
  • CLI: mc analyze dimension-profile

Semantic Prime [CONJECTURAL]

A conceptual primitive (e.g., "I", "YOU", "GOOD", "BAD") from the Natural Semantic Metalanguage (NSM) tradition, proposed (and debated) as cross-linguistically universal.

  • ModelCypher usage: We use semantic primes as a candidate anchor inventory. Whether they are invariant across model families is a falsifiable hypothesis, not an assumption.
  • Used in: Paper 0, Paper 1 (includes full 65-item inventory in Appendix A)
  • CLI: mc analyze concept-volume

Co-Orbiting

When two models (a Base Model and a Sidecar Adapter) process the same input in parallel without merging their weights.

  • Analogy: A driving instructor (Sidecar) sitting next to a student (Base Model), grabbing the wheel only when necessary.

Geometric Composition (Inference) [PROVEN]

The transformer forward pass viewed as ordered composition of state transforms, not additive layer-wise information construction.

  • Equation: h_0 = Embed(prefix), h_{l+1} = T_l(h_l), h_L = (T_{L-1} ∘ ... ∘ T_0)(h_0)
  • Order property: Composition is non-commutative (in general, T_1(T_2(h)) != T_2(T_1(h))), so trajectory order carries signal; the sketch below illustrates this.
  • Human explanation: "The model keeps transforming the same evolving state through different geometric views; it is not stacking independent semantic bricks."
  • See also: GEOMETRY-GUIDE.md, MISSION.md
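
A minimal numpy sketch of the order property, using random linear maps as illustrative stand-ins for the (nonlinear) layer transforms:

```python
# Minimal sketch (assumes numpy): composition of state transforms is
# non-commutative, so the order of layers carries signal.
import numpy as np

rng = np.random.default_rng(0)
h0 = rng.normal(size=4)          # toy hidden state
T0 = rng.normal(size=(4, 4))     # stand-ins for layer transforms T_0, T_1
T1 = rng.normal(size=(4, 4))

h_01 = T1 @ (T0 @ h0)            # (T_1 ∘ T_0)(h_0)
h_10 = T0 @ (T1 @ h0)            # (T_0 ∘ T_1)(h_0)
print(np.allclose(h_01, h_10))   # False: order matters
```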

Object (Mechanism)

In ModelCypher terminology, the "object" is the internal geometric mechanism: the manifold trajectory of hidden states generated by composed transforms.

  • Operational meaning: This is what drives behavior within a forward pass (h_0 -> ... -> h_L).
  • Human explanation: "The object is the actual high-dimensional structure doing the work."
  • Contrast: The object is not the token-probability table itself.
  • See also: MISSION.md

Shadow (Readout)

In ModelCypher terminology, the "shadow" is the observable projection of terminal geometry into token likelihoods.

  • Equation: logits_t = W_out h_{L,t} + b, p(token_t | prefix) = softmax(logits_t)
  • Operational meaning: Useful for decoding and evaluation, but not the internal causal mechanism of state evolution.
  • Human explanation: "The shadow is what we can rank and sample from after geometry has already done its work."
  • See also: GEOMETRY-GUIDE.md, MISSION.md
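
A minimal numpy sketch of the readout equation, with toy stand-ins for W_out, b, and the terminal state h_L:

```python
# Minimal sketch (assumes numpy): projecting terminal geometry into token
# likelihoods. W_out, b, and h_L are toy stand-ins, not real model weights.
import numpy as np

rng = np.random.default_rng(0)
vocab, d = 8, 4
W_out = rng.normal(size=(vocab, d))
b = np.zeros(vocab)
h_L = rng.normal(size=d)              # terminal hidden state for one position

logits = W_out @ h_L + b
p = np.exp(logits - logits.max())     # numerically stable softmax
p /= p.sum()
print(p.round(3), p.sum())            # token distribution: the "shadow"
```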

Metrics

Geodesic Distance

Distance measured along the data manifold via shortest paths on a k-NN graph.

  • Key principle: When ModelCypher reports a distance between points/representations, it is usually geodesic (k-NN graph shortest path), not raw Euclidean.
  • Why it matters: Euclidean distance can become less informative in high dimensions. Geodesic distance follows the local manifold structure implied by the point cloud/probe setup.
  • How it works: Build a k-NN graph (edges weighted by Euclidean distance, the one Euclidean bootstrap step), then compute shortest paths through the graph, as in the sketch below.
  • Human explanation: "Distance measured by roads, not as the crow flies. In curved high-dimensional space, the straight line isn't the true path."

CKA (Centered Kernel Alignment)

A measure of similarity between two neural network layers that is robust to rotation.

  • Range: 0.0 (no alignment of centered Gram structure on the probe set) to 1.0 (identical centered Gram structure on the probe set).
  • Bias note: Finite sampling can bias CKA (inflate or deflate). Use debiased HSIC and feature-sampling correction when possible; the sketch below shows the plain biased estimator.
  • Used in: Paper 0, Paper 1, Paper 3, Paper 4, Paper 5
  • CLI: CKA is computed internally during training and merge verification (no standalone CLI command)
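
A minimal numpy sketch of the biased linear-CKA estimator (illustrative; the internal computation may differ in details such as debiasing):

```python
# Minimal sketch (assumes numpy): linear CKA between activation matrices
# X, Y of shape (n_probes, features). This is the biased estimator.
import numpy as np

def linear_cka(X, Y):
    X = X - X.mean(axis=0)                      # center features
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2  # unnormalized alignment
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))
print(linear_cka(X, X))                          # 1.0: identical Gram structure
print(linear_cka(X, rng.normal(size=(100, 32)))) # well below 1, but nonzero:
                                                 # the finite-sample bias noted above
```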

Spectral Signature (Graph Laplacian)

Raw spectral measurements of a geodesic k-NN graph: Laplacian eigenvalues, algebraic connectivity (λ₂), component count (zero-eigenvalue multiplicity), and heat trace $H(t)=\sum_i e^{-t\lambda_i}$.

  • Operational meaning: Encodes diffusion geometry of the discrete manifold without requiring shared coordinates.
  • Used in: mc analyze spectral-trajectory
  • References: von Luxburg, A Tutorial on Spectral Clustering (arXiv:0711.0189); Heat Kernel Goes Topological (arXiv:2507.12380)
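
A minimal numpy sketch of the raw measurements on a toy graph:

```python
# Minimal sketch (assumes numpy): Laplacian eigenvalues, algebraic
# connectivity, and heat trace for a small weighted adjacency matrix W.
import numpy as np

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 0]], dtype=float)   # a triangle plus an isolated node
L = np.diag(W.sum(axis=1)) - W              # unnormalized graph Laplacian
eigvals = np.linalg.eigvalsh(L)             # sorted ascending

n_components = np.sum(np.isclose(eigvals, 0.0))   # zero-eigenvalue multiplicity
lambda_2 = eigvals[1]                             # algebraic connectivity
t = 0.5
heat_trace = np.exp(-t * eigvals).sum()           # H(t) = sum_i e^{-t λ_i}
print(n_components, lambda_2, heat_trace)         # 2 components → λ₂ = 0 here
```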

Jaccard Similarity (Intersection)

A measure of overlap between the active dimensions of two models.

  • Formula: $|A \cap B| / |A \cup B|$
  • Use: Determines if two models "speak the same language" regarding a specific prompt corpus.
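
A minimal sketch with hypothetical active-dimension sets:

```python
# Minimal sketch: Jaccard similarity |A ∩ B| / |A ∪ B| over hypothetical
# sets of active dimension indices for two models on the same corpus.
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a | b) else 0.0

active_a = {3, 17, 42, 99}
active_b = {3, 42, 256}
print(jaccard(active_a, active_b))  # 2 / 5 = 0.4
```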

Refusal Vector Magnitude

The norm of the activation vector associated with a refusal response (e.g., "I cannot do that").

Flavor Token

Active but non-functional tokens (e.g., "Sure!", "Here is a...", "calibrating flux") that do not advance the reasoning trajectory but serve to "grease" the conversation.

  • Note: Excessive low-information “filler” can correlate with higher entropy and lower factuality in some settings, but this is not a reliable detector on its own.

Artifacts

Intersection Map

A data structure (JSON) recording overlap diagnostics between two models under a probe corpus (e.g., layer-wise correlation/CKA-style signals).

  • Analogy: A “Venn diagram” of overlap under the chosen probe setup (not a claim of identical knowledge).

Safety Polytope

A geometric safety framing in which “safe” behavior is defined as staying within a bounded region of representation space (often modeled as a convex polytope).

  • Note: This is an active research direction (e.g., SaP, arXiv:2505.24445); ModelCypher treats it as a conceptual target rather than a universal claim.

Sidecar

A specialized, lightweight adapter (LoRA) trained to enforce specific geometric constraints (e.g., Safety, Persona) without altering the base model's general capabilities.


Advanced Mathematical Concepts (AI-to-Human Analogies)

Bhattacharyya Coefficient

A measure of overlap between two probability distributions.

  • Analogy: Imagine two bells (Gaussians) placed on a number line. The Bhattacharyya Coefficient measures how much they overlap. 1.0 = exact overlap (same bell), 0.0 = no overlap (completely separate).
  • Human explanation: "These two models see this concept in almost the same way (high overlap)" or "These models have very different representations (low overlap)."

Gromov-Wasserstein Distance

A measure of structural similarity between two metric spaces (manifolds) that don't share a common coordinate system.

  • Analogy: Comparing the street layouts of two cities without knowing their GPS coordinates. You compare "how things connect to each other" rather than absolute positions. Low distance = similar structure.
  • Human explanation: "The internal organization of these two models is similar, even though they don't share the same coordinate frame."

Procrustes Alignment

A method to rotate/scale one set of points to minimize alignment error.

  • Analogy: Placing two photographs on top of each other and rotating/scaling one until the faces align as closely as possible.
  • Human explanation: "We're finding the rotation that minimizes alignment error so we can compare these representations in a shared frame."
  • Used in: Paper 0, Paper 3 (anchor-locked Procrustes for adapter transfer)
  • See also: docs/research/math/gromov_wasserstein.md
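
A minimal sketch using scipy's orthogonal_procrustes on hypothetical paired anchor embeddings:

```python
# Minimal sketch (assumes numpy, scipy): find the rotation R minimizing
# ||A @ R - B||_F for paired point sets A and B.
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 8))                  # source anchor embeddings
Q_true, _ = np.linalg.qr(rng.normal(size=(8, 8)))
B = A @ Q_true                                # target = rotated source

R, _ = orthogonal_procrustes(A, B)            # best rotation A @ R ≈ B
print(np.linalg.norm(A @ R - B))              # ≈ 0: alignment recovered
```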

Optimal Transport (Sinkhorn)

A mathematical framework for finding the cheapest way to transform one distribution into another.

  • Analogy: Moving piles of sand from one set of locations to another with minimum total effort. Each grain finds its "destination" and we measure the total cost.
  • Human explanation: "We're computing how much 'effort' it takes to morph one model's representation into another's."

Shannon Entropy

A measure of uncertainty or information content in a probability distribution.

  • Analogy: An unbiased coin has high entropy (maximum uncertainty). A rigged coin with 99% heads has low entropy (very predictable).
  • Human explanation: "The output distribution is spread across many tokens" (high entropy) or "The output distribution is concentrated on few tokens" (low entropy).
  • Used in: Paper 2 (ΔH safety signal, entropy reduction from modifiers)
  • CLI: mc analyze entropy-trajectory
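
A minimal numpy sketch reproducing the coin analogy:

```python
# Minimal sketch (assumes numpy): Shannon entropy of a distribution, in bits.
import numpy as np

def entropy(p, eps=1e-12):
    p = np.asarray(p)
    return float(-np.sum(p * np.log2(p + eps)))

print(entropy([0.5, 0.5]))        # 1.0 bit: fair coin, maximum uncertainty
print(entropy([0.99, 0.01]))      # ≈ 0.08 bits: rigged coin, very predictable
```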

KL Divergence (Kullback-Leibler)

A measure of how different one probability distribution is from another.

  • Analogy: Measuring how surprised you'd be if you expected distribution A but observed samples from distribution B. It is asymmetric: D_KL(A ‖ B) generally differs from D_KL(B ‖ A), as the sketch below shows.
  • Human explanation: "The adapter dramatically shifts the output distribution" (high KL) or "The adapter barely changes the output distribution" (low KL).

Hessian Eigenspectrum

The second-derivative matrix of the loss function, revealing the "curvature" of the optimization landscape.

  • Analogy: Standing on a hillside, the Hessian tells you not just the slope (gradient) but how the slope changes in each direction. Positive eigenvalues = bowl-shaped (stable). Negative = saddle point (unstable). Zero = flat direction.
  • Human explanation: "More positive eigenvalues means more locally bowl-shaped; mixed signs indicate saddle-like behavior."

Intrinsic Dimension (Two-NN)

The "true" number of independent directions in a dataset, estimated by looking at nearest-neighbor ratios.

  • Analogy: A piece of paper lives in 3D space but is actually 2-dimensional. Two-NN estimates how many dimensions the data "really" occupies.
  • Human explanation: "This model's representations live on a simpler surface than you'd expect from the raw dimension count."
  • Used in: Paper 5 (primary ID estimation method)
  • Reference: Facco et al. (2017). DOI:10.1038/s41598-017-11873-y
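
A minimal sketch of the estimator, using scikit-learn for neighbor search:

```python
# Minimal sketch (assumes numpy, scikit-learn): the Two-NN estimator of
# Facco et al. (2017) from first/second nearest-neighbor distance ratios.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X2 = rng.normal(size=(2000, 2)) @ rng.normal(size=(2, 50))  # a 2D sheet in R^50

dists, _ = NearestNeighbors(n_neighbors=3).fit(X2).kneighbors(X2)
mu = dists[:, 2] / dists[:, 1]            # r2 / r1 per point (column 0 is self)
d_hat = len(mu) / np.sum(np.log(mu))      # maximum-likelihood estimate
print(d_hat)                              # ≈ 2, despite 50 ambient dimensions
```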

Advanced Metrology (CABE / Synthesis)

Sectional Curvature ($K$)

A measure of the "ruggedness" of the activation manifold.

  • Analogy: If the latent space is a golf course, $K$ tells you if you are on a flat green (stable) or a steep bunker (chaotic).
  • ML Equivalent: A local measure of curvature/sensitivity under a specific probe setup. Interpret relative to baselines; avoid hard thresholds.

Ghost Anchor (Synthesis) [CONJECTURAL]

A relational coordinate in a Target Model synthesized from a Source Model's relational footprint.

  • Analogy: Placing a "Virtual Flag" in a new city by knowing its exact distance from all probes in an old city.
  • ML Equivalent: Zero-shot weight synthesis. We "print" a new feature footprint into a model that was never trained on that data.

Relational Stress

The error metric for Manifold Transfer. It measures how much the relative distances between anchors drifted during projection.

  • Analogy: Stretching a rubber map over a globe. "Stress" is where the rubber starts to tear because the shapes don't fit exactly.
  • Human explanation: "Higher stress means more distortion of relative distances during transfer; lower stress means distances were preserved more closely."

Concept Volume (Influence)

Modeling a concept as a probability distribution (volume) rather than a single point (centroid).

  • Analogy: Instead of a "Dot" on a map, think of a "Fog Cloud." The density of the fog tells you how strongly that concept influences a specific latent region.
  • ML Equivalent: A Mahalanobis-regularized covariance matrix representing a concept's "Area of Effect" in the latent space.
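
A minimal numpy sketch of the idea (names and values are illustrative, not the repo's API):

```python
# Minimal sketch (assumes numpy): a concept modeled as a mean plus a
# regularized covariance, with Mahalanobis distance as the influence falloff.
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(size=(200, 16))                     # activations for one concept
mu = samples.mean(axis=0)
cov = np.cov(samples, rowvar=False) + 1e-3 * np.eye(16)  # regularized covariance
cov_inv = np.linalg.inv(cov)

query = rng.normal(size=16)
delta = query - mu
mahalanobis = float(np.sqrt(delta @ cov_inv @ delta))
print(mahalanobis)   # small = inside the concept's volume of influence
```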

3D Spatial Metrology

Spatial Prime Atlas

A collection of 23 anchor prompts designed to probe a model's encoding of 3-dimensional spatial relationships.

  • Categories: Vertical (ceiling, floor, sky), Lateral (left, right, east, west), Depth (foreground, background), Mass (balloon, stone), Furniture (chair, table, lamp).
  • Coordinates: Each anchor has expected (X, Y, Z) coordinates: X=lateral (Left=-1, Right=+1), Y=vertical (Down=-1, Up=+1), Z=depth (Far=-1, Near=+1).
  • Human explanation: "We test if the model has a consistent '3D map' inside its representations by seeing where spatial concepts cluster."

Gravity Gradient [CONJECTURAL]

The degree to which a model's latent space encodes a "down" direction where heavy objects cluster.

  • Analogy: In physics, heavy objects fall toward Earth. In a model with a gravity gradient, representations of "anvil" and "stone" cluster near "floor" and "ground," while "balloon" and "feather" cluster near "ceiling" and "sky."
  • Mass Correlation: A score from -1 to +1 measuring how well perceived mass correlates with position on the vertical axis.
  • Human explanation: "The model treats 'heavy' and 'down' as geometrically related, not just semantically related."

Volumetric Density

A measure of representational "mass" based on activation magnitude and neighborhood crowding.

  • Analogy: In physics, density = mass/volume. In latent space, we measure how "concentrated" a concept's representation is by its activation norm and how many neighbors crowd around it.
  • Inverse-Square Compliance: Does the representational density of distant objects (horizon, background) attenuate like light intensity (1/r²)?
  • Human explanation: "Heavy objects have 'denser' representations, and distant objects have 'fainter' representations, just like in physics."

Z-Axis Occlusion

Whether the model represents objects in depth order where "near" objects can block "far" objects.

  • Parallax Test: Moving the "observer" should shift foreground objects more than background objects.
  • Occlusion Probe: "The cup is in front of the bookshelf" vs "The bookshelf is behind the cup" should encode the same depth relationship.
  • Human explanation: "The model understands that closer things block farther things, not just that 'front' and 'back' are different words."

World Model Score (Visual-Spatial Grounding Density) [CONJECTURAL]

A raw measurement (0.0 to 1.0) of how concentrated a model's probability mass is along human-perceptual 3D axes.

  • Working hypothesis: Models encode physical invariants as structure in representation space; this metric estimates concentration along human-perceptual 3D axes under a fixed probe setup.
  • VL Models: Often show more concentration along human-perceptual 3D axes.
  • Text Models: May encode similar relations, but with probability distributed differently (e.g., along linguistic/formula patterns rather than visual patterns).
  • What it measures: The score indicates concentration on human-perceptual 3D axes. Higher values = more concentrated; lower values = more diffuse or aligned with alternative axes (linguistic, formula-based).
  • Analogy: A blind physicist understands gravity geometrically through equations and tactile experience. A sighted physicist has the same geometric knowledge but with probability concentrated on visual axes. Neither is "abstract"—both are geometric, just with different probability densities.
  • Human explanation: "This score measures how much the model's spatial concepts align with human visual experience, not whether it understands physics."

Social Geometry

Social Prime Atlas

A collection of 23 anchor prompts designed to probe a model's encoding of social relationships and hierarchies.

  • Categories: Power Hierarchy, Formality, Kinship, Status Markers, Age.
  • Axes: Power (status), Kinship (social distance), Formality (linguistic register).
  • Human explanation: "We test if the model has consistent 'social intuition' by seeing where social concepts cluster."

Social Manifold

The geometric structure in representation space encoding social relationships.

  • Working hypothesis: Models trained on human text encode multiple social axes (e.g., power, kinship, formality) that can be probed and tested for orthogonality/consistency.
  • Human explanation: "We probe whether the model separately encodes 'who has power over whom', 'who is socially close', and 'what register to use'."

Power Axis

A geometric dimension encoding status hierarchy from low to high.

  • Anchors: slave → servant → citizen → noble → emperor
  • Human explanation: "The model learned that 'slave' is below 'servant' is below 'citizen' without explicit labels."

Kinship Axis

A geometric dimension encoding social distance from adversarial to intimate.

  • Anchors: enemy → stranger → acquaintance → friend → family
  • Interpretation: Represents the model's implicit understanding of social closeness.
  • Human explanation: "The model understands that 'friend' is socially closer than 'stranger'."

Formality Axis

A geometric dimension encoding linguistic register from casual to formal.

  • Anchors: hey → hi → hello → greetings → salutations
  • Application: Could enable "politeness transfer" between models with different default registers.
  • Human explanation: "The model knows that 'salutations' is more formal than 'hey'."

Social Manifold Score (SMS)

A raw measurement (0.0 to 1.0) of how well a model encodes social structure.

  • Components: Computed from orthogonality, gradient consistency, and power detection (see the tool’s JSON output for fields and weights).
  • What it measures: Higher values indicate stronger encoding of orthogonal power/kinship/formality axes. Lower values indicate weaker or more diffuse social geometry.
  • Human explanation: "This score measures how strongly the model encodes implicit social relationships."

Latent Sociologist Hypothesis [CONJECTURAL]

The hypothesis that language models encode social relationships through geometric structure in their representation space.

  • Status: Research hypothesis. Treat as falsifiable and dataset-dependent; prefer reporting the measured SMS components and baseline context rather than qualitative conclusions.

Architecture Terms (AI Legibility)

Hexagonal Architecture (Ports and Adapters)

An architectural pattern where the core domain logic is isolated from external concerns.

  • Analogy: The domain is the "brain" that only speaks through well-defined interfaces (ports). Adapters translate between the brain and the outside world (databases, APIs, UIs).
  • Human explanation: "Core math/logic is in domain/, it talks to the outside world through ports/, and concrete implementations live in adapters/."

Port

An abstract interface (Python Protocol) that defines what operations the domain needs.

  • Location in ModelCypher: src/modelcypher/ports/
  • Example: Backend protocol defines tensor operations; the macOS backend implements it for Apple Silicon.

Adapter

A concrete implementation of a port that connects to external systems.

  • Location in ModelCypher: src/modelcypher/adapters/
  • Example: local_training.py implements the TrainingEngine port.
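
A minimal sketch of the pattern with hypothetical method signatures (not ModelCypher's actual interfaces; see src/modelcypher/ports/ for the real ones):

```python
# Minimal sketch: the domain depends only on a port (a Protocol);
# an adapter supplies one concrete implementation.
from typing import Protocol

class TrainingEngine(Protocol):          # port: what the domain needs
    def train_step(self, batch: list[float]) -> float: ...

class LocalTrainingEngine:               # adapter: one concrete implementation
    def train_step(self, batch: list[float]) -> float:
        return sum(batch) / len(batch)   # stand-in for a real training step

def run_epoch(engine: TrainingEngine, batches: list[list[float]]) -> float:
    # domain code only sees the port, never the concrete adapter
    return sum(engine.train_step(b) for b in batches)

print(run_epoch(LocalTrainingEngine(), [[1.0, 2.0], [3.0, 5.0]]))  # 5.5
```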