Resource Pipeline

Note: For high-level documentation, see the Documentation Index. This document provides implementation details.

Unified ETL (Extract, Transform, Load) pipeline for managing all resource types in midex. Replaces the fragmented content-registry and project-discovery systems with a clean, plugin-based architecture.

Architecture

server/src/
├── index.ts                  # Public API
├── manager.ts                # ResourceManager (orchestrator)
├── pipeline.ts               # ETL pipeline runner
├── types.ts                  # Core interfaces
│
├── lib/                      # Pipeline components
│   ├── extractor.ts          # Extract: find & read from filesystem
│   ├── transformer.ts        # Transform: parse, validate, normalize
│   ├── loader.ts             # Load: persist to database
│   ├── conflict.ts           # Conflict resolution
│   ├── hash.ts               # SHA-256 hashing
│   ├── validator.ts          # Zod validation wrapper
│   └── project-association.ts # Project association manager
│
├── plugins/                  # Resource plugins
│   ├── content.ts            # Agents/rules/workflows
│   ├── projects.ts           # Project discovery/association
│   └── tool-configs/         # AI tool configurations
│       ├── README.md         # Comprehensive documentation
│       ├── plugin.ts         # Main orchestrator
│       ├── transformer.ts    # Config transformation + secret redaction
│       ├── utils.ts          # Shared utilities (tool detection, paths)
│       ├── types.ts          # Type definitions
│       ├── extractors/       # Tool-specific extractors
│       │   ├── claude-code.ts # Claude Code CLI
│       │   ├── cursor.ts     # Cursor IDE
│       │   ├── windsurf.ts   # Windsurf (Codeium)
│       │   ├── vscode.ts     # VS Code + Copilot
│       │   └── intellij.ts   # IntelliJ (Junie)
│       └── __tests__/
│           └── validation.test.ts # Comprehensive test suite
│
└── schemas/                  # Validation schemas
    ├── content-schemas.ts    # Content types
    └── tool-config-schemas.ts # Tool configurations

Core Concepts

ETL Pipeline

Every resource type follows the same three-stage pattern:

Extract: Find and read resources from filesystem or other sources
Transform: Parse, validate, and normalize into structured format
Load: Persist to database with conflict resolution

Plugin Architecture

Each resource type is a plugin that implements the ResourcePlugin interface:

interface ResourcePlugin<T> {
  readonly name: string;
  readonly resourceType: string;

  extract(options: ExtractOptions): Promise<RawResource[]>;
  transform(raw: RawResource, options?: TransformOptions): Promise<TransformedResource<T>>;
  load(transformed: TransformedResource<T>, options: LoadOptions): Promise<void>;
  sync?(context: PipelineContext): Promise<SyncResult>;
}

Benefits:

Self-contained: Each plugin encapsulates all logic for its resource type
Reusable components: Extractor, transformer, loader shared across plugins
Easy to extend: Add new plugins without modifying core system
Type-safe: Full TypeScript support with generics

Usage

Initialize ResourceManager

import { ResourceManager } from '@/src';
import { initDatabase } from '@/database';

const db = await initDatabase();
const manager = await ResourceManager.init({
  database: db.connection,
  basePath: 'server/content',
});

Sync All Resources

// Sync all plugins (content + projects + tool-configs)
const results = await manager.syncAll();

console.log(`Content: +${results.content.added}, !${results.content.updated}`);
console.log(`Projects: +${results.projects.added}`);
console.log(`Tool Configs: +${results['tool-configs'].added}, !${results['tool-configs'].updated}`);

Sync Specific Plugin

// Sync only content
const result = await manager.sync('content');

console.log(`Added: ${result.added}`);
console.log(`Errors: ${result.errors}`);

MCP Integration

The MCP server (server/mcp/) uses the shared database tables populated by this pipeline:

workflows: Read by MCP to list available workflows and retrieve phases
agents: Read by MCP to get agent personas for workflow steps

The MCP server exposes these resources via a resources-first architecture:

7 resources (READ operations)
2 tools (WRITE operations: workflow.start, workflow.next_step)

Workflow state is tracked in dedicated v2 tables (workflow_executions_v2, workflow_steps_v2) managed by the MCP server.

Query Resources

// Query workflows
const workflows = await manager.query('workflow', {
  tags: ['security'],
  limit: 10,
});

// Get single resource
const agent = await manager.get('agent', 'supervisor');

Register Custom Plugin

import { ResourcePlugin } from '@/src';

class CustomPlugin implements ResourcePlugin {
  readonly name = 'custom';
  readonly resourceType = 'custom';

  async extract(options) { /* ... */ }
  async transform(raw, options) { /* ... */ }
  async load(transformed, options) { /* ... */ }
}

manager.registerPlugin(new CustomPlugin());
await manager.sync('custom');

Built-in Plugins

ContentPlugin

Manages agents, rules, and workflows as unified content resources.

Extraction:

Scans server/content/agents/, server/content/rules/, server/content/workflows/
Reads markdown files with frontmatter
Computes SHA-256 hash for change detection

Transformation:

Parses frontmatter using gray-matter
Validates against Zod schemas (AgentFrontmatterSchema, etc.)
Normalizes data structure

Loading:

Persists to agents, rules, workflows tables
Upserts on conflict (by name)
Stores tags as JSON arrays

ProjectsPlugin

Discovers and tracks projects across sessions.

Extraction:

Scans parent directory for subdirectories
Detects git repositories (.git directory)
Reads package.json metadata when available

Transformation:

Structures project data (name, path, isGitRepo, metadata)

Loading:

Persists to project_associations table
Updates last_used_at on access
Upserts on conflict (by path)

ToolConfigPlugin

Discovers and manages AI coding tool configurations across editors and IDEs.

Supported Tools:

Claude Code (CLI) - MCP, hooks, slash commands, agent rules
Cursor - MCP, settings, multi-file rules
Windsurf (Codeium) - MCP, multi-file rules, ignore patterns
VS Code (Copilot) - MCP, Copilot instructions, settings
IntelliJ (Junie) - MCP (limited)

Key Features:

Official patterns only: All configs based on vendor documentation
Cross-platform: Windows, macOS, Linux support
Dual-level: Project and user-level config extraction
Security: Secret redaction with configurable patterns
Incremental sync: Hash-based change detection

Configuration Files Extracted:

Claude Code: .mcp.json, .claude/settings.json, .claude/commands/*.md, CLAUDE.md

Cursor: .cursor/mcp.json, .cursor/settings.json, .cursor/rules/*.mdc, .cursorrules

Windsurf: .windsurf/mcp.json, .windsurf/rules/*.md, .windsurfrules, .codeiumignore

VS Code: .github/copilot-instructions.md, *.instructions.md, .vscode/settings.json

IntelliJ: .junie/mcp/mcp.json

For detailed documentation, see: Tool Config Plugin README

Pipeline Components

FilesystemExtractor

Generic extractor for filesystem resources:

Walks directory tree
Filters by patterns (e.g., *.md)
Excludes files starting with _ or .
Computes content hash
Captures last modified time

MarkdownTransformer

Specialized transformer for markdown resources:

Parses frontmatter with gray-matter
Validates frontmatter against Zod schema
Builds final data structure via callback
Optional strict/lenient validation modes

DatabaseLoader

Generic loader for database persistence:

Supports insert or upsert operations
Dynamic column mapping from data
Converts arrays/objects to JSON strings
Uses prepared statements for performance

Conflict Resolver

Resolves conflicts between filesystem and database:

Strategies: keep-newest, keep-filesystem, keep-database, manual
Hash-based detection: Compares SHA-256 hashes to detect changes
Timestamp-based resolution: Uses last modified time for keep-newest

Schemas

All content types validated with strict Zod schemas:

Agent Schema

{
  name: string (1-100 chars)
  description: string (max 500 chars)
  tags: string[] (max 20 items × 50 chars, default: [])
  version?: string (max 20 chars, optional)
}

Rule Schema

{
  name: string (1-100 chars)
  description: string (max 500 chars)
  globs: string[] (max 50 items × 200 chars, default: [])
  alwaysApply: boolean (default: false)
  tags: string[] (max 20 items × 50 chars, default: [])
}

Workflow Schema

{
  name: string (1-200 chars)
  description: string (max 2000 chars)
  tags: string[] (max 20 items × 50 chars, default: [])
  keywords: string[] (max 50 items × 100 chars, default: [])
  complexityHint?: 'simple' | 'moderate' | 'high'
}

Benefits Over Legacy Systems

Unified: Single system for all resource types
Consistent: Same ETL pattern across all types
Easy to extend: Just add a new plugin
Loosely coupled: Plugin interface decouples types from core
Type-safe: Full TypeScript with Zod validation

Performance

Incremental sync: Hash-based change detection avoids unnecessary updates
Prepared statements: Database operations use cached prepared statements
Parallel extraction: Can extract multiple resource types concurrently
Streaming: Large files handled efficiently with streaming readers

Testing

# Run all tests
npm test

# Run specific test
npx vitest src/manager.test.ts

Future Enhancements

Watch mode: Filesystem watching for real-time sync
Remote sources: Extract from APIs, S3, etc.
Bidirectional sync: Full conflict resolution with manual intervention UI
Batch operations: Bulk inserts for performance

API Reference

ResourceManager

class ResourceManager {
  static async init(options: ResourceManagerOptions): Promise<ResourceManager>

  registerPlugin(plugin: ResourcePlugin): void
  getPlugin(name: string): ResourcePlugin | undefined

  async syncAll(): Promise<Record<string, SyncResult>>
  async sync(pluginName: string): Promise<SyncResult>

  async query<T>(resourceType: string, options?: QueryOptions): Promise<T[]>
  async get<T>(resourceType: string, name: string): Promise<T | null>

  close(): void
}

Pipeline

class Pipeline {
  async run<T>(plugin: ResourcePlugin<T>, context: PipelineContext): Promise<SyncResult>
  async sync(plugin: ResourcePlugin, context: PipelineContext): Promise<SyncResult>

  async extract(plugin: ResourcePlugin, options: ExtractOptions): Promise<RawResource[]>
  async transform<T>(plugin: ResourcePlugin<T>, raw: RawResource, options?: TransformOptions): Promise<TransformedResource<T>>
  async load<T>(plugin: ResourcePlugin<T>, transformed: TransformedResource<T>, options: LoadOptions): Promise<void>
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Resource Pipeline

Architecture

Core Concepts

ETL Pipeline

Plugin Architecture

Usage

Initialize ResourceManager

Sync All Resources

Sync Specific Plugin

MCP Integration

Query Resources

Register Custom Plugin

Built-in Plugins

ContentPlugin

ProjectsPlugin

ToolConfigPlugin

Pipeline Components

FilesystemExtractor

MarkdownTransformer

DatabaseLoader

Conflict Resolver

Schemas

Agent Schema

Rule Schema

Workflow Schema

Benefits Over Legacy Systems

Performance

Testing

Future Enhancements

API Reference

ResourceManager

Pipeline

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

Resource Pipeline

Architecture

Core Concepts

ETL Pipeline

Plugin Architecture

Usage

Initialize ResourceManager

Sync All Resources

Sync Specific Plugin

MCP Integration

Query Resources

Register Custom Plugin

Built-in Plugins

ContentPlugin

ProjectsPlugin

ToolConfigPlugin

Pipeline Components

FilesystemExtractor

MarkdownTransformer

DatabaseLoader

Conflict Resolver

Schemas

Agent Schema

Rule Schema

Workflow Schema

Benefits Over Legacy Systems

Performance

Testing

Future Enhancements

API Reference

ResourceManager

Pipeline