feat(skills): add databricks-memory-prompts skill#564

Open
menonpg wants to merge 1 commit into databricks-solutions:main from menonpg:feature/memory-prompts-skill

@menonpg menonpg commented Jun 18, 2026

Summary

Add a skill for building memory-aware prompts — AI applications that learn from production feedback using RAG + RLM (Recursive Language Modeling) architecture.

Based on soul.py — open source RAG + RLM memory architecture
Contributed by ThinkCreate.AI


The Problem

You deploy a PII redaction prompt. Users report bugs:

  • "It missed phone extensions like 555-1234 x789"
  • "It redacted Boston General Hospital but that is not patient PII"

You fix the prompt. A month later, a colleague deploys a similar prompt — same bugs. The learning was in your head, not in the system.


The Solution: RAG + RLM Memory Architecture

This skill implements the memory pattern from soul.py.

How the Three Tables Connect

┌─────────────────────────────────────────────────────────────────┐
│                    THE COMPLETE FLOW                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  USER REPORTS BUG                                               │
│  "555-1234 x789 wasn't redacted"                             │
│              │                                                   │
│              ▼                                                   │
│  ┌─────────────────────────────────────┐                        │
│  │     feedback table (raw input)      │                        │
│  │  "User said extension not caught"   │                        │
│  │  "User said hospital name redacted" │                        │
│  │  "User said brother name missed"    │                        │
│  └─────────────────────────────────────┘                        │
│              │                                                   │
│              │  Weekly Lakeflow job runs ai_query():            │
│              │  "Summarize these 3 complaints into 1 pattern"   │
│              │                                                   │
│              ▼  ← THIS IS RLM (LLM distills feedback)           │
│  ┌─────────────────────────────────────┐                        │
│  │     patterns table (distilled)      │                        │
│  │  "Phone extensions need handling"   │  confidence: 0.9       │
│  │  "Facility names are not PII"       │  confidence: 0.85      │
│  └─────────────────────────────────────┘                        │
│              │                                                   │
│              │  You explicitly document high-confidence ones    │
│              ▼                                                   │
│  ┌─────────────────────────────────────┐                        │
│  │     decisions table (explicit)      │                        │
│  │  "Use [NAME] not [REDACTED]"        │  confidence: 1.0       │
│  └─────────────────────────────────────┘                        │
│              │                                                   │
│              │  At prompt-time: SELECT FROM patterns, decisions │
│              ▼  ← THIS IS RAG (retrieve before generate)        │
│  ┌─────────────────────────────────────┐                        │
│  │     ENHANCED PROMPT                 │                        │
│  │  "You are a PII system...           │                        │
│  │   Learned patterns:                 │                        │
│  │   - Phone extensions need handling" │                        │
│  └─────────────────────────────────────┘                        │
│              │                                                   │
│              ▼                                                   │
│  LLM generates better output                                    │
│  (which may produce new feedback → cycle continues)             │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

RAG (Retrieval-Augmented Generation)

At prompt-time:

  1. Query patterns and decisions tables
  2. Inject relevant context into the prompt
  3. LLM sees your accumulated knowledge
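
The three steps above can be sketched in plain Python, independent of Spark. This is a minimal illustration rather than the skill's implementation: the `Memory` dataclass and the `select_memories` and `inject_context` helpers are hypothetical names, the `date_parsing` row is made-up sample data, and in practice the rows would come from the patterns and decisions tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    """One retrieved row (hypothetical shape mirroring the quick-start tables)."""
    text: str
    task_scope: str
    confidence: float


def select_memories(rows, task_scope, k=5):
    """Keep rows for one task scope, highest confidence first, at most k."""
    scoped = [r for r in rows if r.task_scope == task_scope]
    return sorted(scoped, key=lambda r: r.confidence, reverse=True)[:k]


def inject_context(base_prompt, memories):
    """Append retrieved memories to the base prompt as a bulleted block."""
    if not memories:
        return base_prompt  # nothing learned yet, leave the prompt unchanged
    bullets = "\n".join(f"- {m.text}" for m in memories)
    return f"{base_prompt}\n\nLearned patterns:\n{bullets}\n"


rows = [
    Memory("Facility names are not PII", "pii_redaction", 0.85),
    Memory("Phone extensions need handling", "pii_redaction", 0.9),
    Memory("Prefer ISO dates", "date_parsing", 0.95),  # other scope, filtered out
]
prompt = inject_context(
    "You are a PII redaction system.",
    select_memories(rows, "pii_redaction"),
)
print(prompt)
```

Filtering by scope before ranking keeps a high-confidence memory from an unrelated task out of the prompt.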

RLM (Recursive Language Modeling)

Periodically:

  1. Collect raw feedback from feedback table
  2. Use ai_query() to distill into patterns
  3. Store in patterns table with confidence scores
  4. Recursive: These patterns inform future LLM calls, which produce better outputs, which generate less feedback
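
As a rough sketch of the distillation step, the snippet below groups raw corrections by task scope and builds one summarization prompt per scope. It only prepares the text that would be passed to `ai_query()` or another LLM call. The function name `build_distillation_prompts`, the `min_count` parameter, and the `date_parsing` sample row are assumptions for illustration. The threshold of 3 mirrors the one used in the weekly job in this PR.

```python
from collections import defaultdict


def build_distillation_prompts(feedback, min_count=3):
    """Return {task_scope: prompt} for scopes with at least min_count corrections."""
    by_scope = defaultdict(list)
    for task_scope, correction in feedback:
        by_scope[task_scope].append(correction)

    prompts = {}
    for scope, corrections in by_scope.items():
        if len(corrections) < min_count:
            continue  # too little evidence to call it a pattern
        numbered = "\n".join(
            f"{i}. {c}" for i, c in enumerate(corrections, start=1)
        )
        prompts[scope] = (
            f"Summarize these {len(corrections)} complaints into 1 pattern:\n"
            f"{numbered}"
        )
    return prompts


feedback = [
    ("pii_redaction", "User said extension not caught"),
    ("pii_redaction", "User said hospital name redacted"),
    ("pii_redaction", "User said brother name missed"),
    ("date_parsing", "User said year was dropped"),  # only one report, skipped
]
prompts = build_distillation_prompts(feedback)
print(prompts["pii_redaction"])
```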

How It Works

Step 1: Create tables (user must do this manually)

CREATE SCHEMA IF NOT EXISTS my_catalog.memory;

-- Databricks SQL has no TEXT type, so text columns use STRING.
-- Column DEFAULTs on Delta tables require the allowColumnDefaults table feature.

-- Raw user corrections go here
CREATE TABLE IF NOT EXISTS my_catalog.memory.feedback (
    id STRING DEFAULT uuid(),
    correction STRING,
    task_scope STRING,
    created_at TIMESTAMP DEFAULT current_timestamp()
) TBLPROPERTIES ('delta.feature.allowColumnDefaults' = 'supported');

-- Distilled patterns (RLM output)
CREATE TABLE IF NOT EXISTS my_catalog.memory.patterns (
    id STRING DEFAULT uuid(),
    pattern STRING NOT NULL,
    evidence STRING,
    task_scope STRING,
    confidence DOUBLE DEFAULT 0.8,
    created_at TIMESTAMP DEFAULT current_timestamp()
) TBLPROPERTIES ('delta.feature.allowColumnDefaults' = 'supported');

-- Explicit decisions you document
CREATE TABLE IF NOT EXISTS my_catalog.memory.decisions (
    id STRING DEFAULT uuid(),
    decision STRING NOT NULL,
    rationale STRING,
    task_scope STRING,
    created_at TIMESTAMP DEFAULT current_timestamp()
) TBLPROPERTIES ('delta.feature.allowColumnDefaults' = 'supported');

Step 2: Log feedback when users report issues

INSERT INTO my_catalog.memory.feedback (correction, task_scope)
VALUES ('Phone extension x789 was not redacted', 'pii_redaction');
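
When feedback arrives from application code rather than a SQL editor, the same insert can be issued from Python. The sketch below is one possible shape, not part of the skill: `log_feedback` and the module-level constants are hypothetical names, and it assumes a PySpark version (3.4 or later) where `spark.sql` accepts named parameters through `args`.

```python
FEEDBACK_TABLE = "my_catalog.memory.feedback"

INSERT_FEEDBACK_SQL = f"""
    INSERT INTO {FEEDBACK_TABLE} (correction, task_scope)
    VALUES (:correction, :task_scope)
"""


def log_feedback(spark, correction, task_scope):
    """Insert one user correction, passing values as named parameters."""
    correction = correction.strip()
    if not correction:
        raise ValueError("correction must not be empty")
    # args= binds :correction and :task_scope, so quotes in user text are
    # treated as data rather than SQL (assumes PySpark 3.4+ parameter markers).
    spark.sql(
        INSERT_FEEDBACK_SQL,
        args={"correction": correction, "task_scope": task_scope},
    )
```

A call such as `log_feedback(spark, "Phone extension x789 was not redacted", "pii_redaction")` would then write the same row as the SQL statement above. Binding parameters instead of formatting them into the string matters here because corrections are free-form user text.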

Step 3: RLM distillation (in 3-learning-pipeline.md)

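from pyspark.sql.functions import expr

# Weekly job: distill feedback into patterns using ai_query()
# Note: GROUP BY matches exact text, so only corrections repeated verbatim 3+ times qualify
feedback_df = spark.sql("""
    SELECT correction, COUNT(*) as freq
    FROM my_catalog.memory.feedback
    WHERE task_scope = 'pii_redaction'
    GROUP BY correction
    HAVING COUNT(*) >= 3
""")

# Use LLM to extract a pattern from similar feedback
patterns_df = feedback_df.withColumn("pattern", expr("""
    ai_query('databricks-meta-llama-3-3-70b-instruct',
        concat('Extract one pattern from this feedback: ', correction))
"""))

# Store distilled patterns with every column populated: task_scope is set so the
# Step 4 filter can find the rows, and confidence is cast to match the DOUBLE column
patterns_df.selectExpr(
    "uuid() AS id",
    "pattern",
    "correction AS evidence",
    "'pii_redaction' AS task_scope",
    "CAST(0.8 AS DOUBLE) AS confidence",
    "current_timestamp() AS created_at",
).write.mode("append").saveAsTable("my_catalog.memory.patterns")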

Step 4: RAG retrieval at prompt-time

# Query the database for patterns
patterns_df = spark.sql("""
    SELECT pattern FROM my_catalog.memory.patterns
    WHERE task_scope = 'pii_redaction'
    ORDER BY confidence DESC
    LIMIT 5
""")

# Build enhanced prompt with retrieved context
base_prompt = "You are a PII redaction system..."  # your existing system prompt
enhanced_prompt = base_prompt + "\n\nLearned patterns:\n"
for row in patterns_df.collect():
    enhanced_prompt += f"- {row.pattern}\n"

What the LLM sees:

You are a PII redaction system...

Learned patterns:
- Phone numbers with extensions (x1234) require explicit handling

Components

| File | Purpose |
| --- | --- |
| SKILL.md | Main skill: quick start, RAG + RLM architecture overview |
| 1-memory-schema.md | Production DDL with indexes, retention, confidence decay |
| 2-vector-search-setup.md | Semantic retrieval via Databricks Vector Search |
| 3-learning-pipeline.md | RLM distillation: auto-extract patterns from feedback using Lakeflow + ai_query() |
| 4-mlflow-integration.md | Track which memories shaped each prompt version |
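
The table mentions confidence decay without stating a rule here. One common way to express the idea is an exponential half-life, sketched below. The function name, the 90-day half-life, and the exponential form are all assumptions for illustration and may differ from what 1-memory-schema.md actually does.

```python
from datetime import datetime, timedelta, timezone


def decayed_confidence(confidence, last_seen, now, half_life_days=90.0):
    """Halve confidence for every half_life_days since the memory was last seen."""
    # Clamp at zero so a last_seen slightly in the future never boosts the score.
    age_days = max((now - last_seen).total_seconds() / 86400.0, 0.0)
    return confidence * 0.5 ** (age_days / half_life_days)


now = datetime(2026, 6, 18, tzinfo=timezone.utc)
fresh = decayed_confidence(0.9, now, now)
one_half_life = decayed_confidence(0.9, now - timedelta(days=90), now)
print(fresh, one_half_life)
```

With this rule a pattern that keeps being confirmed stays near its original score, while one that has not been seen for a long time gradually drops out of the top results used at prompt-time.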

Prerequisites

  • Unity Catalog with CREATE TABLE permission
  • Serverless SQL Warehouse or DBR 15.1+ cluster
  • Tables must be created manually (Step 1 above)

Why This Matters

"MLflow tracks what you deployed. Memory tracks what you learned."

The RAG + RLM pattern from soul.py enables:

  • Feedback accumulates → raw user corrections in feedback table
  • Patterns distill → LLM extracts patterns via ai_query() (RLM)
  • Context injects → patterns retrieved at prompt-time (RAG)
  • Learning compounds → better outputs produce less feedback

Without memory: Session 1 fixes a bug. Session 10 hits the same bug.

With memory: Session 1 logs the pattern. Session 10 inherits it automatically.


Related


Contribution from ThinkCreate.AI

Copilot AI review requested due to automatic review settings June 18, 2026 00:18

Copilot AI left a comment

Pull request overview

Adds a new Databricks “skill” (databricks-memory-prompts) documenting patterns for memory-aware prompt construction using Unity Catalog tables, Vector Search retrieval, a Lakeflow learning loop, and MLflow Prompt Registry/Tracing lineage.

Changes:

  • Introduces a new SKILL.md with end-to-end patterns (schema → retrieval → prompt enhancement → MLflow registration).
  • Adds reference guides for Unity Catalog DDL, Vector Search setup, a Lakeflow learning pipeline, and MLflow integration patterns.
  • Provides example snippets for tracing, experiments, and prompt evolution/lineage.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 17 comments.

| File | Description |
| --- | --- |
| databricks-skills/databricks-memory-prompts/SKILL.md | Main skill doc with quick start schema + MemoryPromptEnhancer example and usage patterns |
| databricks-skills/databricks-memory-prompts/1-memory-schema.md | Detailed Unity Catalog DDL for memory tables and supporting structures |
| databricks-skills/databricks-memory-prompts/2-vector-search-setup.md | Vector Search endpoint/index setup and query helper examples |
| databricks-skills/databricks-memory-prompts/3-learning-pipeline.md | Lakeflow pipeline example for extracting/updating/decaying memories |
| databricks-skills/databricks-memory-prompts/4-mlflow-integration.md | MLflow Prompt Registry + tracing + experiment tracking + evolution examples |

Comment on lines +33 to +41
CREATE TABLE IF NOT EXISTS catalog.memory.decisions (
    id STRING DEFAULT uuid(),
    decision TEXT NOT NULL,
    context TEXT,
    rationale TEXT,
    created_at TIMESTAMP DEFAULT current_timestamp(),
    confidence DOUBLE DEFAULT 1.0,
    tags ARRAY<STRING>
);
Comment on lines +44 to +52
CREATE TABLE IF NOT EXISTS catalog.memory.patterns (
    id STRING DEFAULT uuid(),
    pattern TEXT NOT NULL,
    evidence TEXT,
    frequency INT DEFAULT 1,
    last_seen TIMESTAMP DEFAULT current_timestamp(),
    confidence DOUBLE,
    tags ARRAY<STRING>
);
Comment on lines +55 to +63
CREATE TABLE IF NOT EXISTS catalog.memory.feedback (
    id STRING DEFAULT uuid(),
    run_id STRING,
    input_hash STRING,
    output TEXT,
    correction TEXT,
    feedback_type STRING, -- 'correction', 'preference', 'complaint'
    created_at TIMESTAMP DEFAULT current_timestamp()
);
Comment on lines +83 to +101
def retrieve_context(self, task_description: str, k: int = 5) -> dict:
    """Retrieve relevant memories for a task."""
    # Semantic search across memory
    results = self.vs_client.index(self.vector_index).similarity_search(
        query_text=task_description,
        columns=["memory_type", "content", "confidence"],
        num_results=k
    )

    # Group by type
    context = {"decisions": [], "patterns": [], "feedback": []}
    for row in results.get("result", {}).get("data_array", []):
        memory_type = row[0]
        if memory_type in context:
            context[memory_type].append({
                "content": row[1],
                "confidence": row[2]
            })
    return context
Comment on lines +103 to +124
def enhance_prompt(self, base_prompt: str, task: str) -> str:
    """Enhance a prompt with memory context."""
    context = self.retrieve_context(task)

    # Build context block
    context_lines = []

    if context["decisions"]:
        context_lines.append("## Relevant Decisions")
        for d in context["decisions"]:
            context_lines.append(f"- {d['content']} (confidence: {d['confidence']:.2f})")

    if context["patterns"]:
        context_lines.append("\n## Learned Patterns")
        for p in context["patterns"]:
            context_lines.append(f"- {p['content']}")

    if context["feedback"]:
        context_lines.append("\n## Past Feedback")
        for f in context["feedback"]:
            context_lines.append(f"- {f['content']}")

Comment on lines +353 to +362
mem = spark.sql(f"""
    SELECT * FROM catalog.memory.{link.memory_type}s
    WHERE id = '{link.memory_id}'
""").first()
if mem:
    memories.append({
        "type": link.memory_type,
        "content": mem.get("decision") or mem.get("pattern") or mem.get("correction"),
        "influence": link.influence_score
    })
Comment on lines +14 to +17
decision TEXT NOT NULL COMMENT 'The decision that was made',
context TEXT COMMENT 'What situation led to this decision',
rationale TEXT COMMENT 'Why this decision was made',
alternatives TEXT COMMENT 'Other options considered',
Comment on lines +45 to +46
pattern TEXT NOT NULL COMMENT 'The learned pattern',
evidence TEXT COMMENT 'Examples or proof of this pattern',
Comment on lines +78 to +81
input_text TEXT COMMENT 'The input that produced the output',
input_hash STRING COMMENT 'Hash of input for deduplication',
output_text TEXT COMMENT 'What the model produced',
correction TEXT COMMENT 'What the user said it should be',
Comment on lines +113 to +116
memory_type STRING NOT NULL COMMENT 'decision, pattern, or feedback',
source_id STRING NOT NULL COMMENT 'ID in source table',
content TEXT NOT NULL COMMENT 'Text to embed',
embedding ARRAY<FLOAT> COMMENT 'Vector embedding',
@menonpg menonpg force-pushed the feature/memory-prompts-skill branch 2 times, most recently from b536ad2 to 39ad106 Compare June 18, 2026 00:47
Add a skill for building AI applications with persistent memory using
RAG + RLM (Recursive Language Modeling) architecture from soul.py.

Components:
- SKILL.md: Main skill with quick start and architecture overview
- 1-memory-schema.md: Unity Catalog DDL for decisions, patterns, feedback
- 2-vector-search-setup.md: Vector Search index configuration
- 3-learning-pipeline.md: Lakeflow pipeline for pattern extraction
- 4-mlflow-integration.md: Prompt Registry + Tracing integration

The core pattern:
1. Store what you learn (patterns, decisions) in Unity Catalog
2. Retrieve relevant context when building prompts
3. Inject context into prompts before calling the LLM

Based on: github.com/menonpg/soul.py
Contributed by: ThinkCreate.AI (thinkcreate.ai)
@menonpg menonpg force-pushed the feature/memory-prompts-skill branch from 39ad106 to 490ca2e Compare June 18, 2026 00:58