MCP Document Processor

An intelligent document processing system that uses the Model Context Protocol (MCP) to extract, analyze, and route business documents automatically.

Overview

The MCP Document Processor is a full-stack application (FastAPI backend + React frontend) that automates document processing workflows. The system classifies documents, extracts relevant information, and provides a user-friendly interface for document management.

How MCP Works

The Model Context Protocol (MCP) is the core framework that powers this document processor. Here's how it works in simple terms:

Document Context: When a document is uploaded, MCP creates a "context" object that acts as a container for:
- The original document text
- Document metadata (filename, upload time, etc.)
- Processing history log
- Extracted information with confidence scores
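
To make the container idea concrete, here is a minimal sketch of what such a context object could look like. Only the class name `MCPContext` comes from this README; the field names, the `add_extracted` helper, and the JSON serialization are illustrative assumptions, not the project's actual code.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class MCPContext:
    """Hypothetical container for one document's processing state."""

    document_id: str
    text: str  # the original document text
    metadata: dict[str, Any] = field(default_factory=dict)  # filename, upload time, etc.
    history: list[dict[str, Any]] = field(default_factory=list)  # processing history log
    extracted: dict[str, dict[str, Any]] = field(default_factory=dict)  # values + confidence

    def add_extracted(self, key: str, value: Any, confidence: float) -> None:
        """Store an extracted value together with its confidence score."""
        self.extracted[key] = {"value": value, "confidence": confidence}

    def to_json(self) -> str:
        """Serialize the whole context so it can be written to storage."""
        return json.dumps(asdict(self))


# Example values are made up for illustration.
ctx = MCPContext(
    document_id="doc-1",
    text="Invoice 1042 ...",
    metadata={"filename": "invoice.txt"},
)
ctx.add_extracted("invoice_number", "1042", confidence=0.9)
```

Keeping everything in one serializable object is what makes it straightforward to persist the context between API calls.
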

Memory System: These context objects are stored in a persistent memory system (file-based storage), allowing them to survive between API calls and server restarts.

Smart Routing: The system examines each document and routes it to the appropriate specialized processor:
```
Document → Router → Invoice/Contract/Email/Default Processor
```
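
The routing step can be sketched as a lookup from a detected document type to a processor function, with a default fallback. The keyword rules, function names, and return shapes below are assumptions made for illustration; the project's real router and classifier may work quite differently.

```python
from typing import Callable

# Hypothetical processors: each takes the document text and returns a result dict.
def process_invoice(text: str) -> dict:
    return {"type": "invoice"}

def process_contract(text: str) -> dict:
    return {"type": "contract"}

def process_email(text: str) -> dict:
    return {"type": "email"}

def process_default(text: str) -> dict:
    return {"type": "default"}


def classify(text: str) -> str:
    """Toy keyword classifier standing in for the real classification model."""
    lowered = text.lower()
    if "invoice" in lowered:
        return "invoice"
    if "contract" in lowered or "agreement" in lowered:
        return "contract"
    if "subject:" in lowered or lowered.startswith("from:"):
        return "email"
    return "unknown"


PROCESSORS: dict[str, Callable[[str], dict]] = {
    "invoice": process_invoice,
    "contract": process_contract,
    "email": process_email,
}


def route(text: str) -> dict:
    """Send the document to its specialized processor, or to the default one."""
    processor = PROCESSORS.get(classify(text), process_default)
    return processor(text)
```

A dictionary lookup with a fallback keeps the router open to new document types: supporting one more type means adding one entry, not another branch.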

Processing Pipeline: Each document follows this workflow:

Upload → Classification → Information Extraction → Storage → Viewing

History & Confidence Tracking: Every processing step is recorded with timestamps and confidence scores, providing full transparency into how information was extracted.
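
As a rough sketch of how the pipeline and its history log could fit together, the example below runs the backend stages in order and records each one with a timestamp and, where applicable, a confidence score. All function names, the regex, and the confidence values are placeholders chosen for illustration. Viewing happens in the frontend, so it is left out.

```python
import re
from datetime import datetime, timezone
from typing import Optional


def record_step(history: list, step: str, confidence: Optional[float] = None) -> None:
    """Append one history entry with a UTC timestamp and optional confidence."""
    entry = {"step": step, "timestamp": datetime.now(timezone.utc).isoformat()}
    if confidence is not None:
        entry["confidence"] = confidence
    history.append(entry)


def classify(text: str) -> tuple[str, float]:
    # Placeholder rule; the confidence values are made up.
    if "invoice" in text.lower():
        return "invoice", 0.9
    return "unknown", 0.5


def extract(text: str) -> tuple[dict, float]:
    # Placeholder extraction: pull a "Total: $..." amount if one is present.
    match = re.search(r"total:\s*\$?(\d+(?:\.\d+)?)", text, re.IGNORECASE)
    if match:
        return {"total": float(match.group(1))}, 0.8
    return {}, 0.0


def run_pipeline(doc_id: str, text: str, store: dict) -> dict:
    """Upload -> Classification -> Information Extraction -> Storage."""
    history: list = []
    record_step(history, "upload")
    doc_type, type_conf = classify(text)
    record_step(history, "classification", type_conf)
    fields, field_conf = extract(text)
    record_step(history, "extraction", field_conf)
    record = {"type": doc_type, "fields": fields, "history": history}
    store[doc_id] = record  # a plain dict stands in for the memory system
    record_step(history, "storage")
    return record


store: dict = {}
result = run_pipeline("doc-1", "Invoice 1042\nTotal: $250.00", store)
```

Because every stage writes to the same history list, the stored record shows which step produced each value and how confident that step was.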

Key Features

Document Classification: Automatically identifies document types (invoices, contracts, emails)
Information Extraction: Extracts key data using ML models (currently mock implementations) with confidence scoring
Persistent Storage: File-based document storage that survives server restarts
Processing Pipeline: Specialized processors for different document types
Modern UI: React-based dashboard for document upload, viewing, and management

Architecture

Backend

Context System: MCPContext objects maintain document state throughout processing
Memory System: Configurable storage (in-memory or file-based) with TTL management
Processor Router: Routes documents to specialized processors based on content
Document Processors: Type-specific processors with a default fallback processor
ML Models: Mock implementations of entity extraction, classification, and summarization
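
The file-based variant of the Memory System with TTL handling could look roughly like the sketch below. The class name `FileMemoryStore`, the one-JSON-file-per-document layout, and the `saved_at` field are assumptions for illustration; only the file-based storage and the TTL concept come from this README.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Optional


class FileMemoryStore:
    """Hypothetical store: one JSON file per document, expired lazily on read."""

    def __init__(self, directory: str, ttl_seconds: float = 24 * 60 * 60) -> None:
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.ttl_seconds = ttl_seconds

    def _path(self, document_id: str) -> Path:
        # Assumes document IDs are safe to use as file names.
        return self.directory / f"{document_id}.json"

    def save(self, document_id: str, context: dict) -> None:
        payload = {"saved_at": time.time(), "context": context}
        self._path(document_id).write_text(json.dumps(payload), encoding="utf-8")

    def load(self, document_id: str) -> Optional[dict]:
        path = self._path(document_id)
        if not path.exists():
            return None
        payload = json.loads(path.read_text(encoding="utf-8"))
        if time.time() - payload["saved_at"] > self.ttl_seconds:
            path.unlink()  # entry is older than the TTL: drop it
            return None
        return payload["context"]


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    store = FileMemoryStore(tmp)
    store.save("doc-1", {"text": "hello"})
    loaded = store.load("doc-1")
    # A negative TTL makes every entry count as expired.
    expired = FileMemoryStore(tmp, ttl_seconds=-1).load("doc-1")
```

Checking the TTL on read keeps the sketch short; a real deployment might also sweep expired files periodically so unread documents do not accumulate.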

Frontend

Dashboard: Document statistics and listing with filtering options
Document View: Detailed view of processed documents and extracted data
Upload Interface: Simple drag-and-drop document upload

Setup

Backend

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Start server
python app.py  # Runs on http://localhost:8000

Frontend

# Navigate to frontend directory
cd frontend

# Install dependencies
npm install

# Start development server
npm start  # Runs on http://localhost:3000

API Endpoints

POST /api/documents/upload: Upload a new document
POST /api/documents/{document_id}/process: Process a document
GET /api/documents/{document_id}: Get document details
GET /api/documents: List all documents
DELETE /api/documents/{document_id}: Delete a document
GET /api/memory-status: Check memory store status (debugging)
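
For scripting against these endpoints, a small client sketch using only the standard library might look like this. The helper names are hypothetical, and the base URL is the development address from the Setup section. The upload endpoint is left out because its request format (for example, the form field name) is not documented here.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # development server address from Setup


def build_request(method: str, path: str) -> urllib.request.Request:
    """Create a request object without sending it."""
    return urllib.request.Request(f"{BASE_URL}{path}", method=method)


def _doc_path(document_id: str) -> str:
    # Percent-encode the ID so unusual characters cannot alter the path.
    return f"/api/documents/{urllib.parse.quote(document_id, safe='')}"


def process_document(document_id: str) -> urllib.request.Request:
    return build_request("POST", f"{_doc_path(document_id)}/process")


def get_document(document_id: str) -> urllib.request.Request:
    return build_request("GET", _doc_path(document_id))


def list_documents() -> urllib.request.Request:
    return build_request("GET", "/api/documents")


def delete_document(document_id: str) -> urllib.request.Request:
    return build_request("DELETE", _doc_path(document_id))


req = process_document("doc-1")
# To actually send it while the backend is running:
#     with urllib.request.urlopen(req) as response:
#         body = response.read()
```

Building the request separately from sending it makes the URL and HTTP method easy to inspect before the backend is up.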

UI Features

Dashboard: Document statistics and listing with filtering options
Upload Interface: Simple drag-and-drop document upload
Document View: Detailed view of processed documents and extracted data

Technical Notes

Documents are stored in data/documents/ with a 24-hour TTL
The DefaultProcessor handles any document type that specialized processors cannot handle
CORS is enabled to allow frontend-backend communication
Proxy is configured in package.json to route API requests during development
