Purpose: This document defines the precise meaning of terms used in ModelCypher. It serves as a "Handshake Protocol" between Human Users and AI Agents to ensure we are talking about the same concepts.
Notes:
- In this repo, run commands as `poetry run mc ...`.
- Global CLI options can appear anywhere on the command line (example: `mc model info ./model --output text`).
The high-dimensional geometric structure induced by a model’s representations under a given task/probe setup.
- Analogy: A crumpled sheet of paper in a 3D room. The room is the Parameter Space (billions of dimensions), but the paper (the model's actual behavior) is a lower-dimensional surface.
- Operation: We compare/stitch manifolds by aligning representations on shared probes.
The minimum number of variables needed to describe a model's state.
- Analogy: A car moves in 3D space (x, y, z), but its "Intrinsic Dimension" is 2 (steering wheel angle, gas pedal).
- Constraint view: A d-dimensional manifold embedded in R^D is a D-vector with (D−d) constraints; 2D is 3D with one constraint, 3D is 4D with one constraint, and so on.
- Relevance: We explore whether some refusal/safety behaviors exhibit lower intrinsic dimension under specific probes. This is an empirical question, not a universal rule.
- Used in: Paper 0, Paper 5 (primary focus on dimensionality cliff and plateau)
- CLI: `mc analyze dimension-profile`
A conceptual primitive (e.g., "I", "YOU", "GOOD", "BAD") from the Natural Semantic Metalanguage (NSM) tradition, proposed (and debated) as cross-linguistically universal.
- ModelCypher usage: We use semantic primes as a candidate anchor inventory. Whether they are invariant across model families is a falsifiable hypothesis, not an assumption.
- Used in: Paper 0, Paper 1 (includes full 65-item inventory in Appendix A)
- CLI: `mc analyze concept-volume`
When two models (a Base Model and a Sidecar Adapter) process the same input in parallel without merging their weights.
- Analogy: A driving instructor (Sidecar) sitting next to a student (Base Model), grabbing the wheel only when necessary.
The transformer forward pass viewed as ordered composition of state transforms, not additive layer-wise information construction.
- Equation: `h_0 = Embed(prefix)`; `h_{l+1} = T_l(h_l)`; `h_L = (T_{L-1} ∘ ... ∘ T_0)(h_0)`
- Order property: Composition is non-commutative (`T_1(T_2(h)) != T_2(T_1(h))`), so trajectory order carries signal.
- Human explanation: "The model keeps transforming the same evolving state through different geometric views; it is not stacking independent semantic bricks."
- See also: GEOMETRY-GUIDE.md, MISSION.md
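- Sketch: A toy NumPy illustration of the order property, using random nonlinear maps as stand-ins for transformer layers (not real model code):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Two toy "layer" transforms: each maps a state vector to a new state vector.
A = rng.normal(size=(d, d))
B = rng.normal(size=(d, d))

def T1(h):
    return np.tanh(A @ h)

def T2(h):
    return np.tanh(B @ h)

h0 = rng.normal(size=d)

# The forward pass is an ordered composition of state transforms.
h_forward = T2(T1(h0))
h_swapped = T1(T2(h0))

# Non-commutativity: swapping layer order changes the trajectory endpoint.
assert not np.allclose(h_forward, h_swapped)
```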
In ModelCypher terminology, the "object" is the internal geometric mechanism: the manifold trajectory of hidden states generated by composed transforms.
- Operational meaning: This is what drives behavior within a forward pass (`h_0 -> ... -> h_L`).
- Human explanation: "The object is the actual high-dimensional structure doing the work."
- Contrast: The object is not the token-probability table itself.
- See also: MISSION.md
In ModelCypher terminology, the "shadow" is the observable projection of terminal geometry into token likelihoods.
- Equation: `logits_t = W_out h_{L,t} + b`; `p(token_t | prefix) = softmax(logits_t)`
- Operational meaning: Useful for decoding and evaluation, but not the internal causal mechanism of state evolution.
- Human explanation: "The shadow is what we can rank and sample from after geometry has already done its work."
- See also: GEOMETRY-GUIDE.md, MISSION.md
Distance measured along the data manifold via shortest paths on a k-NN graph.
- Key principle: When ModelCypher reports a distance between points/representations, it is usually geodesic (k-NN graph shortest path), not raw Euclidean.
- Why it matters: Euclidean distance can become less informative in high dimensions. Geodesic distance follows the local manifold structure implied by the point cloud/probe setup.
- How it works: Build a k-NN graph (edges weighted by Euclidean distance—the one bootstrap step), then compute shortest paths through the graph.
- Human explanation: "Distance measured by roads, not as the crow flies. In curved high-dimensional space, the straight line isn't the true path."
A measure of similarity between two neural network layers that is robust to rotation.
- Range: 0.0 (low similarity on the probe set) to 1.0 (high similarity).
- Operational meaning: CKA = 1.0 means identical centered Gram structure on the probe set.
- Bias note: Finite sampling can bias CKA (inflate or deflate). Use debiased HSIC and feature-sampling correction when possible.
- Used in: Paper 0, Paper 1, Paper 3, Paper 4, Paper 5
- CLI: CKA is computed internally during training and merge verification (no standalone CLI command)
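- Sketch: A minimal linear-CKA computation on toy data, assuming NumPy (ModelCypher computes CKA internally and, per the bias note, may use a debiased variant). The assert demonstrates the rotation robustness:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices (n_samples x n_features)."""
    X = X - X.mean(axis=0)   # center each feature
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))

# Rotation invariance: an orthogonal transform of the features leaves CKA at 1.0.
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
assert np.isclose(linear_cka(X, X @ Q), 1.0)
```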
Raw spectral measurements of a geodesic k-NN graph: Laplacian eigenvalues, algebraic connectivity (λ₂), component count (zero-eigenvalue multiplicity), and heat trace.
- Operational meaning: Encodes diffusion geometry of the discrete manifold without requiring shared coordinates.
- Used in: `mc analyze spectral-trajectory`
- References: von Luxburg, A Tutorial on Spectral Clustering (arXiv:0711.0189); Heat Kernel Goes Topological (arXiv:2507.12380)
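- Sketch: The three measurements on a tiny hand-built graph, assuming NumPy (a toy adjacency matrix, not a real geodesic k-NN graph):

```python
import numpy as np

# Toy graph: two triangles joined by a single bridge edge.
A = np.zeros((6, 6))
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Unnormalized graph Laplacian L = D - A.
L = np.diag(A.sum(axis=1)) - A
eigvals = np.sort(np.linalg.eigvalsh(L))

n_components = int(np.sum(np.isclose(eigvals, 0.0)))  # zero-eigenvalue multiplicity
lambda_2 = eigvals[1]                                 # algebraic connectivity
t = 1.0
heat_trace = float(np.sum(np.exp(-t * eigvals)))      # heat trace at diffusion time t

assert n_components == 1   # the bridge keeps the graph connected
assert lambda_2 > 0        # positive algebraic connectivity <=> connected
```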
A measure of overlap between the active dimensions of two models.
- Formula: $|A \cap B| / |A \cup B|$
- Use: Determines if two models "speak the same language" regarding a specific prompt corpus.
The norm of the activation vector associated with a refusal response (e.g., "I cannot do that").
Active but non-functional tokens (e.g., "Sure!", "Here is a...", "calibrating flux") that do not advance the reasoning trajectory but serve to "grease" the conversation.
- Note: Excessive low-information “filler” can correlate with higher entropy and lower factuality in some settings, but this is not a reliable detector on its own.
A data structure (JSON) recording overlap diagnostics between two models under a probe corpus (e.g., layer-wise correlation/CKA-style signals).
- Analogy: A “Venn diagram” of overlap under the chosen probe setup (not a claim of identical knowledge).
A geometric safety framing in which “safe” behavior is defined as staying within a bounded region of representation space (often modeled as a convex polytope).
- Note: This is an active research direction (e.g., SaP, arXiv:2505.24445); ModelCypher treats it as a conceptual target rather than a universal claim.
A specialized, lightweight adapter (LoRA) trained to enforce specific geometric constraints (e.g., Safety, Persona) without altering the base model's general capabilities.
A measure of overlap between two probability distributions.
- Analogy: Imagine two bells (Gaussians) placed on a number line. The Bhattacharyya Coefficient measures how much they overlap. 1.0 = exact overlap (same bell), 0.0 = no overlap (completely separate).
- Human explanation: "These two models see this concept in almost the same way (high overlap)" or "These models have very different representations (low overlap)."
A measure of structural similarity between two metric spaces (manifolds) that don't share a common coordinate system.
- Analogy: Comparing the street layouts of two cities without knowing their GPS coordinates. You compare "how things connect to each other" rather than absolute positions. Low distance = similar structure.
- Human explanation: "The internal organization of these two models is similar, even though they don't share the same coordinate frame."
A method to rotate/scale one set of points to minimize alignment error.
- Analogy: Placing two photographs on top of each other and rotating/scaling one until the faces align as closely as possible.
- Human explanation: "We're finding the rotation that minimizes alignment error so we can compare these representations in a shared frame."
- Used in: Paper 0, Paper 3 (anchor-locked Procrustes for adapter transfer)
- See also: docs/research/math/gromov_wasserstein.md
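- Sketch: The classic orthogonal-Procrustes solution via SVD, in NumPy. This is the rotation-only textbook form; the anchor-locked variant used for adapter transfer may add scaling or anchor constraints:

```python
import numpy as np

def orthogonal_procrustes(X, Y):
    """Find the orthogonal R minimizing ||X @ R - Y||_F via SVD of X^T Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))

# Rotate X by a random orthogonal matrix, then recover that rotation.
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
Y = X @ Q
R = orthogonal_procrustes(X, Y)

assert np.allclose(X @ R, Y)   # alignment error is (numerically) zero
```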
A mathematical framework for finding the cheapest way to transform one distribution into another.
- Analogy: Moving piles of sand from one set of locations to another with minimum total effort. Each grain finds its "destination" and we measure the total cost.
- Human explanation: "We're computing how much 'effort' it takes to morph one model's representation into another's."
A measure of uncertainty or information content in a probability distribution.
- Analogy: An unbiased coin has high entropy (maximum uncertainty). A rigged coin with 99% heads has low entropy (very predictable).
- Human explanation: "The output distribution is spread across many tokens" (high entropy) or "The output distribution is concentrated on few tokens" (low entropy).
- Used in: Paper 2 (ΔH safety signal, entropy reduction from modifiers)
- CLI: `mc analyze entropy-trajectory`
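- Sketch: The coin analogy computed directly, assuming only the standard library (Shannon entropy in bits):

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair = [0.5, 0.5]       # unbiased coin: maximum uncertainty for two outcomes
rigged = [0.99, 0.01]   # rigged coin: very predictable

assert entropy(fair) == 1.0   # 1 bit: the maximum for two outcomes
assert entropy(rigged) < 0.1
```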
A measure of how different one probability distribution is from another.
- Analogy: Measuring how surprised you'd be if you expected distribution A but observed samples from distribution B. It's asymmetric: KL(P || Q) generally differs from KL(Q || P).
- Human explanation: "The adapter dramatically shifts the output distribution" (high KL) or "The adapter barely changes the output distribution" (low KL).
The second-derivative matrix of the loss function, revealing the "curvature" of the optimization landscape.
- Analogy: Standing on a hillside, the Hessian tells you not just the slope (gradient) but how the slope changes in each direction. Positive eigenvalues = bowl-shaped (stable). Negative = saddle point (unstable). Zero = flat direction.
- Human explanation: "More positive eigenvalues means more locally bowl-shaped; mixed signs indicate saddle-like behavior."
The "true" number of independent directions in a dataset, estimated by looking at nearest-neighbor ratios.
- Analogy: A piece of paper lives in 3D space but is actually 2-dimensional. Two-NN estimates how many dimensions the data "really" occupies.
- Human explanation: "This model's representations live on a simpler surface than you'd expect from the raw dimension count."
- Used in: Paper 5 (primary ID estimation method)
- Reference: Facco et al. (2017). DOI:10.1038/s41598-017-11873-y
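- Sketch: The Two-NN estimator in its simple MLE form, assuming NumPy and SciPy (a didactic version, not ModelCypher's implementation). The paper-sheet analogy is tested directly: a 2-D sheet embedded in 10-D should estimate near 2:

```python
import numpy as np
from scipy.spatial.distance import cdist

def two_nn_dimension(X):
    """Two-NN intrinsic dimension estimate (Facco et al., 2017), MLE form."""
    D = cdist(X, X)                  # pairwise Euclidean distances
    np.fill_diagonal(D, np.inf)      # ignore self-distances
    D.sort(axis=1)
    mu = D[:, 1] / D[:, 0]           # ratio of 2nd to 1st neighbor distance
    return len(X) / np.sum(np.log(mu))

rng = np.random.default_rng(0)
# A flat 2-D sheet linearly embedded in 10-D: ambient dim 10, intrinsic dim 2.
sheet = rng.uniform(size=(1000, 2)) @ rng.normal(size=(2, 10))
d_hat = two_nn_dimension(sheet)

assert 1.5 < d_hat < 2.5   # estimate tracks intrinsic (2), not ambient (10)
```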
A measure of the "ruggedness" of the activation manifold.
- Analogy: If the latent space is a golf course, $K$ tells you if you are on a flat green (stable) or a steep bunker (chaotic).
- ML Equivalent: A local measure of curvature/sensitivity under a specific probe setup. Interpret relative to baselines; avoid hard thresholds.
A relational coordinate in a Target Model synthesized from a Source Model's relational footprint.
- Analogy: Placing a "Virtual Flag" in a new city by knowing its exact distance from all probes in an old city.
- ML Equivalent: Zero-shot weight synthesis. We "print" a new feature footprint into a model that was never trained on that data.
The error metric for Manifold Transfer. It measures how much the relative distances between anchors drifted during projection.
- Analogy: Stretching a rubber map over a globe. "Stress" is where the rubber starts to tear because the shapes don't fit exactly.
- Human explanation: "Higher stress means more distortion of relative distances during transfer; lower stress means distances were preserved more closely."
Modeling a concept as a probability distribution (volume) rather than a single point (centroid).
- Analogy: Instead of a "Dot" on a map, think of a "Fog Cloud." The density of the fog tells you how strongly that concept influences a specific latent region.
- ML Equivalent: A Mahalanobis-regularized covariance matrix representing a concept's "Area of Effect" in the latent space.
A collection of 23 anchor prompts designed to probe a model's encoding of 3-dimensional spatial relationships.
- Categories: Vertical (ceiling, floor, sky), Lateral (left, right, east, west), Depth (foreground, background), Mass (balloon, stone), Furniture (chair, table, lamp).
- Coordinates: Each anchor has expected (X, Y, Z) coordinates: X=lateral (Left=-1, Right=+1), Y=vertical (Down=-1, Up=+1), Z=depth (Far=-1, Near=+1).
- Human explanation: "We test if the model has a consistent '3D map' inside its representations by seeing where spatial concepts cluster."
The degree to which a model's latent space encodes a "down" direction where heavy objects cluster.
- Analogy: In physics, heavy objects fall toward Earth. In a model with a gravity gradient, representations of "anvil" and "stone" cluster near "floor" and "ground," while "balloon" and "feather" cluster near "ceiling" and "sky."
- Mass Correlation: A score from -1 to +1 measuring how well perceived mass correlates with position on the vertical axis.
- Human explanation: "The model treats 'heavy' and 'down' as geometrically related, not just semantically related."
A measure of representational "mass" based on activation magnitude and neighborhood crowding.
- Analogy: In physics, density = mass/volume. In latent space, we measure how "concentrated" a concept's representation is by its activation norm and how many neighbors crowd around it.
- Inverse-Square Compliance: Does the representational density of distant objects (horizon, background) attenuate like light intensity (1/r²)?
- Human explanation: "Heavy objects have 'denser' representations, and distant objects have 'fainter' representations, just like in physics."
Whether the model represents objects in depth order where "near" objects can block "far" objects.
- Parallax Test: Moving the "observer" should shift foreground objects more than background objects.
- Occlusion Probe: "The cup is in front of the bookshelf" vs "The bookshelf is behind the cup" should encode the same depth relationship.
- Human explanation: "The model understands that closer things block farther things, not just that 'front' and 'back' are different words."
A raw measurement (0.0 to 1.0) of how concentrated a model's probability mass is along human-perceptual 3D axes.
- Working hypothesis: Models encode physical invariants as structure in representation space; this metric estimates concentration along human-perceptual 3D axes under a fixed probe setup.
- VL Models: Often show more concentration along human-perceptual 3D axes.
- Text Models: May encode similar relations, but with probability distributed differently (e.g., along linguistic/formula patterns rather than visual patterns).
- What it measures: The score indicates concentration on human-perceptual 3D axes. Higher values = more concentrated; lower values = more diffuse or aligned with alternative axes (linguistic, formula-based).
- Analogy: A blind physicist understands gravity geometrically through equations and tactile experience. A sighted physicist has the same geometric knowledge but with probability concentrated on visual axes. Neither is "abstract"—both are geometric, just with different probability densities.
- Human explanation: "This score measures how much the model's spatial concepts align with human visual experience, not whether it understands physics."
A collection of 23 anchor prompts designed to probe a model's encoding of social relationships and hierarchies.
- Categories: Power Hierarchy, Formality, Kinship, Status Markers, Age.
- Axes: Power (status), Kinship (social distance), Formality (linguistic register).
- Human explanation: "We test if the model has consistent 'social intuition' by seeing where social concepts cluster."
The geometric structure in representation space encoding social relationships.
- Working hypothesis: Models trained on human text encode multiple social axes (e.g., power, kinship, formality) that can be probed and tested for orthogonality/consistency.
- Human explanation: "We probe whether the model separately encodes 'who has power over whom', 'who is socially close', and 'what register to use'."
A geometric dimension encoding status hierarchy from low to high.
- Anchors: slave → servant → citizen → noble → emperor
- Human explanation: "The model learned that 'slave' is below 'servant' is below 'citizen' without explicit labels."
A geometric dimension encoding social distance from adversarial to intimate.
- Anchors: enemy → stranger → acquaintance → friend → family
- Interpretation: Represents the model's implicit understanding of social closeness.
- Human explanation: "The model understands that 'friend' is socially closer than 'stranger'."
A geometric dimension encoding linguistic register from casual to formal.
- Anchors: hey → hi → hello → greetings → salutations
- Application: Could enable "politeness transfer" between models with different default registers.
- Human explanation: "The model knows that 'salutations' is more formal than 'hey'."
A raw measurement (0.0 to 1.0) of how well a model encodes social structure.
- Components: Computed from orthogonality, gradient consistency, and power detection (see the tool’s JSON output for fields and weights).
- What it measures: Higher values indicate stronger encoding of orthogonal power/kinship/formality axes. Lower values indicate weaker or more diffuse social geometry.
- Human explanation: "This score measures how strongly the model encodes implicit social relationships."
The hypothesis that language models encode social relationships through geometric structure in their representation space.
- Status: Research hypothesis. Treat as falsifiable and dataset-dependent; prefer reporting the measured SMS components and baseline context rather than qualitative conclusions.
An architectural pattern where the core domain logic is isolated from external concerns.
- Analogy: The domain is the "brain" that only speaks through well-defined interfaces (ports). Adapters translate between the brain and the outside world (databases, APIs, UIs).
- Human explanation: "Core math/logic is in
domain/, it talks to the outside world throughports/, and concrete implementations live inadapters/."
An abstract interface (Python Protocol) that defines what operations the domain needs.
- Location in ModelCypher: `src/modelcypher/ports/`
- Example: The `Backend` protocol defines tensor operations; the macOS backend implements it for Apple Silicon.
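- Sketch: The port/adapter pattern as a Python Protocol, in the style described. The method names here are hypothetical illustrations, not the actual `Backend` interface in `src/modelcypher/ports/`:

```python
from typing import Protocol
import numpy as np

class Backend(Protocol):
    """Port: tensor operations the domain layer needs (illustrative subset)."""
    def matmul(self, a, b): ...
    def norm(self, a) -> float: ...

class NumpyBackend:
    """Adapter: one concrete implementation of the Backend port."""
    def matmul(self, a, b):
        return np.asarray(a) @ np.asarray(b)
    def norm(self, a) -> float:
        return float(np.linalg.norm(a))

def project(backend: Backend, v, basis):
    """Domain logic written against the port, not any concrete backend."""
    return backend.matmul(basis, v)

b = NumpyBackend()
assert b.norm([3.0, 4.0]) == 5.0   # structural typing: no inheritance needed
```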
A concrete implementation of a port that connects to external systems.
- Location in ModelCypher: `src/modelcypher/adapters/`
- Example: `local_training.py` implements the `TrainingEngine` port.