ContextCore

GPU-accelerated context memory for on-device AI agents on Apple Silicon.

Stop forgetting. Build context windows in under 5 ms.

The Problem

LLMs forget. As conversations grow, early turns drop out, irrelevant history burns tokens, and rebuilding context gets slower. ContextCore sits between your agent loop and the model, using Metal compute shaders to score, rank, compress, and pack context on-device.

Features

Feature	What it means for you
Sub-5 ms window builds	`buildWindow` runs at 4.89 ms p99 on M2, low enough that users should not notice the overhead.
Four memory tiers	Working, episodic, semantic, and procedural memory each have their own retention and retrieval rules.
Metal-accelerated scoring	GPU shaders score 63 million chunks/sec and run 2.45× faster than the CPU path at 50 K chunks.
Progressive compression	When the token budget gets tight, low-signal chunks are compressed automatically.
Background consolidation	Episodic memory is deduplicated, durable facts are promoted, and obvious noise gets evicted.
Attention-aware reranking	Chunks are reordered so the model’s attention lands on the most useful content first.
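
To make the progressive compression row more concrete, here is a minimal sketch of one way such a policy could work. It is not ContextCore's implementation: `SketchChunk`, `CompressionLevel`, `compressToFit`, and the word-based cost model are all assumptions made for illustration. The sketch escalates the lowest-scoring chunks first, level by level (light, then heavy, then drop), until the total fits the budget.

```swift
// Hypothetical types for illustration; not ContextCore's actual API.
enum CompressionLevel: Int {
    case none = 0, light, heavy, dropped
}

struct SketchChunk {
    let text: String
    let score: Double                      // relevance score, higher is more useful
    var level: CompressionLevel = .none

    // Assumed cost model: one token per space-separated word;
    // light keeps about half of the words, heavy about a quarter.
    var tokens: Int {
        let words = text.split(separator: " ").count
        switch level {
        case .none: return words
        case .light: return (words + 1) / 2
        case .heavy: return (words + 3) / 4
        case .dropped: return 0
        }
    }
}

/// Escalates compression on the lowest-scoring chunks until the total fits the budget.
func compressToFit(_ chunks: [SketchChunk], budget: Int) -> [SketchChunk] {
    var result = chunks
    // Chunk indices ordered from lowest score to highest.
    let order = chunks.indices.sorted { chunks[$0].score < chunks[$1].score }
    for level in [CompressionLevel.light, .heavy, .dropped] {
        for index in order {
            let total = result.reduce(0) { $0 + $1.tokens }
            if total <= budget { return result }
            result[index].level = level
        }
    }
    return result
}
```

With two four-word chunks and a budget of 6, only the lower-scoring chunk is compressed (to the light level), and the higher-scoring one is left untouched.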

Installation

Add ContextCore to your Package.swift:

dependencies: [
    .package(url: "https://github.com/christopherkarani/ContextCore.git", from: "1.0.0")
]

Or add it in Xcode: File → Add Package Dependencies… → paste the URL above.

Quick Start

import ContextCore

// 1. Create a context
let context = try AgentContext()

// 2. Start a session
try await context.beginSession(systemPrompt: "You are a senior Swift engineer.")

// 3. Append conversation turns
try await context.append(turn: Turn(role: .user, content: "How do I fix this actor leak?"))

// 4. Build a scored, ranked, and packed context window
let window = try await context.buildWindow(
    currentTask: "Debug actor isolation",
    maxTokens: 4096
)

// 5. Send to your model
let prompt = window.formatted(style: .chatML)

Persist knowledge across sessions

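import Foundation

// Remember something important
try await context.remember("User prefers async/await over completion handlers")

// Recall it when relevant
let facts = try await context.recall(query: "user preferences", k: 5)

// Save session state to disk.
// checkpointURL is a placeholder location for this example; use any writable file URL.
let checkpointURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("contextcore.checkpoint")
try await context.checkpoint(to: checkpointURL)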

// Restore later
let restored = try await AgentContext.load(from: checkpointURL)
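
For intuition about what a call like `recall(query:k:)` involves, the sketch below shows plain top-k retrieval by cosine similarity. It is an illustration under assumptions, not the library's code: `StoredFact`, `cosineSimilarity`, and `topK` are hypothetical names, and the three-dimensional vectors stand in for real embeddings.

```swift
// Hypothetical stand-in for a stored fact; not ContextCore's actual type.
struct StoredFact {
    let text: String
    let embedding: [Float]
}

/// Cosine similarity between two equal-length vectors; 0 if either vector is all zeros.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "vectors must have the same dimension")
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denominator = normA.squareRoot() * normB.squareRoot()
    return denominator == 0 ? 0 : dot / denominator
}

/// Returns up to k facts most similar to the query embedding, best match first.
func topK(_ facts: [StoredFact], query: [Float], k: Int) -> [StoredFact] {
    let ranked = facts
        .map { (fact: $0, score: cosineSimilarity($0.embedding, query)) }
        .sorted { $0.score > $1.score }
    return ranked.prefix(max(0, k)).map { $0.fact }
}
```

A CPU loop like this is the baseline that GPU scoring is meant to replace once the number of stored chunks grows large.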

How It Works

flowchart TB
    subgraph Client ["Your Application"]
        Input([User Input])
    end

    subgraph Core ["ContextCore Engine"]
        direction TB
        Orch[AgentContext]

        subgraph Metal ["Metal Acceleration"]
            Scoring[Scoring Kernel]
            Attn[Attention Kernel]
        end

        subgraph Mem ["Memory Tiers"]
            Episodic[(Episodic)]
            Semantic[(Semantic)]
            Procedural[(Procedural)]
        end

        Packer[Window Packer]
    end

    Input --> Orch
    Orch -->|Query| Mem
    Mem -->|Candidates| Scoring
    Scoring -->|Ranked Chunks| Attn
    Attn -->|Reranked| Packer
    Packer -->|Final Prompt| Model([LLM Inference])

    style Core fill:#fff,stroke:#000,stroke-width:2px,color:#000
    style Metal fill:#000,stroke:#fff,stroke-width:1px,color:#fff
    style Scoring fill:#000,stroke:#fff,stroke-width:1px,color:#fff
    style Attn fill:#000,stroke:#fff,stroke-width:1px,color:#fff
    style Client fill:#fff,stroke:#000,stroke-dasharray: 5 5
    style Model fill:#000,color:#fff

Every call to buildWindow runs this pipeline:

1. Embed the current task using on-device CoreML (MiniLM, 384-dim).
2. Score episodic and semantic memory in parallel on the GPU.
3. Rerank by attention centrality to prevent clustering.
4. Pack into the token budget, guaranteeing the N most-recent turns.
5. Compress overflow chunks progressively (light → heavy → drop).
6. Order chunks for optimal model attention.
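
The packing step can be sketched as a greedy selection. The code below is a simplified illustration, not ContextCore's packer: `Candidate` and `pack` are hypothetical names, the scores are assumed to be precomputed, and the final chronological ordering is an assumption (the library's ordering step may work differently).

```swift
// Hypothetical candidate type; not ContextCore's actual API.
struct Candidate {
    let index: Int      // position in the conversation, 0 = oldest
    let tokens: Int     // token cost of including this chunk
    let score: Double   // relevance to the current task
}

/// Greedy packer: always keeps the `recentCount` newest candidates, then fills
/// the remaining budget with the highest-scoring older ones.
/// Note: in this sketch the guaranteed recent turns are kept even if they
/// alone exceed `maxTokens`.
func pack(_ candidates: [Candidate], maxTokens: Int, recentCount: Int) -> [Candidate] {
    let chronological = candidates.sorted { $0.index < $1.index }
    let recent = Array(chronological.suffix(max(0, recentCount)))
    let older = chronological.dropLast(recent.count)

    var selected = recent
    var used = recent.reduce(0) { $0 + $1.tokens }

    // Consider older candidates from most to least relevant.
    for candidate in older.sorted(by: { $0.score > $1.score }) {
        if used + candidate.tokens <= maxTokens {
            selected.append(candidate)
            used += candidate.tokens
        }
    }
    // Assumed final ordering: chronological.
    return selected.sorted { $0.index < $1.index }
}
```

Reserving the recent turns before ranking the rest means a burst of highly relevant old history cannot push the latest exchange out of the window.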

Why ContextCore?

	Without ContextCore	With ContextCore
Recall	Forgets early turns as context fills.	Retrieves early turns via semantic search across days of history.
Speed	Slows down linearly with context growth.	Sub-5 ms window builds, GPU-accelerated.
Cost	Wastes tokens on irrelevant history.	Packs only high-value tokens; compresses the rest.
Coherence	Loses track of long-running tasks.	Procedural memory tracks tool usage and task patterns.

Performance

All numbers were measured on an M2 MacBook Pro. See BENCHMARKS.md for the full methodology.

xychart-beta
    title "Window Build Latency (p99) — Lower is Better"
    x-axis ["Target Limit", "ContextCore (M2)"]
    y-axis "Milliseconds (ms)" 0 --> 25
    bar [20.0, 6.54]

xychart-beta
    title "Consolidation Time (2000 chunks) — Lower is Better"
    x-axis ["Target Limit", "ContextCore (M2)"]
    y-axis "Milliseconds (ms)" 0 --> 500
    bar [500.0, 19.7]

Metric	Value
`buildWindow` p99	4.89 ms (500 turns, 4096 tokens)
Consolidation p99	15.61 ms (2000 chunks)
GPU scoring throughput	63.36 M chunks/sec
GPU math speedup	2.45× vs CPU at 50 K chunks
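
For readers unfamiliar with the p99 notation in the table, the sketch below shows one common way to compute a percentile from latency samples, the nearest-rank method. The `percentile` helper is hypothetical, and the method actually used for these benchmarks is described in BENCHMARKS.md and may differ.

```swift
/// Nearest-rank percentile: the smallest sample such that at least
/// `p` percent of all samples are less than or equal to it.
/// Returns nil for empty input or for p outside (0, 100].
func percentile(_ samples: [Double], _ p: Double) -> Double? {
    guard !samples.isEmpty, p > 0, p <= 100 else { return nil }
    let sorted = samples.sorted()
    // rank = ceil(p * n / 100), clamped to at least 1.
    let rank = Int((p * Double(sorted.count) / 100).rounded(.up))
    return sorted[max(rank, 1) - 1]
}
```

In practice the samples would come from timing many repeated calls; a p99 of 4.89 ms then means 99 percent of the measured calls finished in 4.89 ms or less.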

Bring Your Own Backends

ContextCore ships with sensible defaults, but every critical component is a protocol you can swap:

let config = ContextConfiguration(
    embeddingProvider: MyOpenAIEmbeddingProvider(),
    tokenCounter: TikTokenCounter(),
    compressionDelegate: MyLLMCompressor()
)

let context = try AgentContext(configuration: config)

Protocol	Default	You Could Use
`EmbeddingProvider`	CoreML MiniLM (384-dim)	OpenAI, Ollama, any vector model
`TokenCounter`	Word-count heuristic	tiktoken, SentencePiece
`CompressionDelegate`	Extractive (no LLM)	GPT-based summarization
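
As a rough idea of what swapping a backend looks like, here is a sketch of two interchangeable token counters. The README does not show the real `TokenCounter` requirements, so `SketchTokenCounter` and its single method are an assumed shape, and the conforming types are hypothetical; check the package source before adapting this.

```swift
// Assumed protocol shape for illustration only; the real `TokenCounter`
// requirements may differ.
protocol SketchTokenCounter {
    func countTokens(in text: String) -> Int
}

/// Word-count heuristic: one token per whitespace-separated word.
struct WordCountTokenCounter: SketchTokenCounter {
    func countTokens(in text: String) -> Int {
        text.split(whereSeparator: { $0.isWhitespace }).count
    }
}

/// Character-based estimate: roughly one token per four characters,
/// a common rule of thumb for English text.
struct CharacterEstimateTokenCounter: SketchTokenCounter {
    func countTokens(in text: String) -> Int {
        (text.count + 3) / 4
    }
}

/// Code written against the protocol works with either counter.
func fits(_ text: String, budget: Int, using counter: any SketchTokenCounter) -> Bool {
    counter.countTokens(in: text) <= budget
}
```

Because callers depend only on the protocol, a more accurate tokenizer can replace the heuristic without touching the packing logic.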

Requirements

Platform	Minimum
iOS	17.0+
macOS	14.0+
visionOS	1.0+
Swift	6.2
Hardware	Metal-capable Apple Silicon

Documentation

Full documentation lives in the docs site, including architecture notes, API reference, and an FAQ.

Contributing

Contributions are welcome. See GitHub Issues to report bugs or suggest features.

License

Released under the MIT License.
