Stop sending your sensitive documents to third-party services. FLAMEHAVEN FileSearch is a production-grade RAG search engine — BM25+hybrid retrieval, 34 file formats, multi-LLM (Gemini, OpenAI, Claude, Ollama) — running self-hosted in minutes, not days.
```bash
# Gemini (cloud) — one command, three minutes
docker run -d -p 8000:8000 -e GEMINI_API_KEY="your_key" flamehaven-filesearch:1.6.1

# Ollama — fully local, zero API cost (Gemma, Llama, Mistral, Qwen, Phi …)
# Step 1: pull a model → ollama pull gemma4:27b
docker run -d -p 8000:8000 \
  -e LLM_PROVIDER=ollama \
  -e LOCAL_MODEL=gemma4:27b \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  flamehaven-filesearch:1.6.1
```

- Production deployment in 3 minutes
- 100% self-hosted
- Free tier: 1,500 queries/month
| Capability | Detail |
|---|---|
| Search Modes | Keyword, semantic, and hybrid (BM25+RRF) with automatic typo correction |
| 34 File Formats | PDF, DOCX/DOC, XLSX, PPTX, RTF, HTML, CSV, LaTeX, WebVTT, images + plain text — see Document Parsing |
| RAG Pipeline | Structure-aware chunking, KnowledgeAtom 2-level indexing, sliding-window context enrichment, mtime parse cache |
| Ultra-Fast Vectors | DSP v2.0 generates embeddings in <1ms — no ML frameworks required |
| Source Attribution | Every answer links back to the originating document and chunk |
| Framework SDKs | LangChain, LlamaIndex, Haystack, CrewAI adapters out of the box |
| Enterprise Auth | API key hashing (SHA256+salt), OAuth2/OIDC, fine-grained permissions |
| Admin Dashboard | Real-time metrics, quota management, batch processing (1–100 queries) |
| Flexible Storage | SQLite (default) · PostgreSQL + pgvector · Redis cache (optional) |
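Hybrid search fuses the BM25 keyword ranking with the semantic ranking via Reciprocal Rank Fusion (RRF), where each document's score is the sum of 1/(k + rank) over the lists it appears in. A minimal sketch of the fusion step (the function and sample IDs are illustrative, not FLAMEHAVEN's internal API):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc_a", "doc_b", "doc_c"]      # keyword ranking
semantic_hits = ["doc_b", "doc_d", "doc_a"]  # vector ranking
fused = rrf_fuse([bm25_hits, semantic_hits])
# doc_b ranks first: it sits near the top of both lists
```

The constant k (60 is the value from the original RRF paper) damps the influence of top ranks so one list cannot dominate the fusion.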
What changed in each release? See CHANGELOG.md for the full version history.
The fastest path to production:

```bash
docker run -d \
  -p 8000:8000 \
  -e GEMINI_API_KEY="your_gemini_api_key" \
  -e FLAMEHAVEN_ADMIN_KEY="secure_admin_password" \
  -v $(pwd)/data:/app/data \
  flamehaven-filesearch:1.6.1
```

✅ Server running at http://localhost:8000
Perfect for integrating into existing applications:

```python
from flamehaven_filesearch import FlamehavenFileSearch, FileSearchConfig

# Initialize
config = FileSearchConfig(google_api_key="your_gemini_key")
fs = FlamehavenFileSearch(config)

# Upload and search
fs.upload_file("company_handbook.pdf", store="docs")
result = fs.search("What is our remote work policy?", store="docs")
print(result['answer'])
# Output: "Employees can work remotely up to 3 days per week..."
```

For language-agnostic integration:
```bash
# 1. Generate API key
curl -X POST http://localhost:8000/api/admin/keys \
  -H "X-Admin-Key: your_admin_key" \
  -d '{"name":"production","permissions":["upload","search"]}'

# 2. Upload document
curl -X POST http://localhost:8000/api/upload/single \
  -H "Authorization: Bearer sk_live_abc123..." \
  -F "file=@document.pdf" \
  -F "store=my_docs"

# 3. Search
curl -X POST http://localhost:8000/api/search \
  -H "Authorization: Bearer sk_live_abc123..." \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the main findings?",
    "store": "my_docs",
    "search_mode": "hybrid"
  }'
```

```bash
# Core package (HTML, CSV, LaTeX, WebVTT, plain-text parsing included — zero extra deps)
pip install flamehaven-filesearch

# + Document parsers: PDF (pymupdf/pypdf), DOCX, XLSX, PPTX, RTF
pip install flamehaven-filesearch[parsers]

# + Image OCR (Pillow + pytesseract; requires the Tesseract system binary)
pip install flamehaven-filesearch[vision]

# + Google Gemini API
pip install flamehaven-filesearch[google]

# + REST API server (FastAPI + uvicorn)
pip install flamehaven-filesearch[api]

# + HNSW vector index
pip install flamehaven-filesearch[vector]

# + PostgreSQL backend
pip install flamehaven-filesearch[postgres]

# Everything
pip install flamehaven-filesearch[all]
```

```bash
# Build from source
git clone https://github.com/flamehaven01/Flamehaven-Filesearch.git
cd Flamehaven-Filesearch
docker build -t flamehaven-filesearch:1.6.1 .
```

Framework SDKs (LangChain, LlamaIndex, etc.) are imported lazily — install only what you need:
```python
# LangChain (pip install langchain-core)
from flamehaven_filesearch.integrations import FlamehavenLangChainLoader
docs = FlamehavenLangChainLoader("report.pdf", chunk=True).load()

# LlamaIndex (pip install llama-index-core)
from flamehaven_filesearch.integrations import FlamehavenLlamaIndexReader
nodes = FlamehavenLlamaIndexReader(chunk=True).load_data(["report.pdf", "slides.pptx"])

# Haystack (pip install haystack-ai)
from flamehaven_filesearch.integrations import FlamehavenHaystackConverter
result = FlamehavenHaystackConverter().run(sources=["report.pdf"])

# CrewAI (pip install crewai)
from flamehaven_filesearch.integrations import FlamehavenCrewAITool
tool = FlamehavenCrewAITool()  # pass to your agent's tools list
```

FLAMEHAVEN supports four LLM backends — switch with a single env var:
| `LLM_PROVIDER` | Required variables | Notes |
|---|---|---|
| `gemini` (default) | `GEMINI_API_KEY` | Google Gemini file-search API |
| `ollama` | `LOCAL_MODEL`, `OLLAMA_BASE_URL` | Local inference via Ollama — Gemma 4/3, Llama 3.2, Qwen 2.5, Mistral, Phi-4 … |
| `openai` | `OPENAI_API_KEY` | OpenAI or any OpenAI-compatible endpoint |
| `anthropic` | `ANTHROPIC_API_KEY` | Anthropic Claude |
| `openai_compatible` | `OPENAI_API_KEY`, `OPENAI_BASE_URL` | vLLM, LM Studio, Kimi, etc. |
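The required-variable rules in the table are easy to validate up front, so a deployment can fail fast before the server boots. The `REQUIRED` mapping and `missing_vars` helper below simply mirror the table and are illustrative, not part of the package:

```python
import os

# Required env vars per provider (mirrors the table above; illustrative only)
REQUIRED = {
    "gemini": ["GEMINI_API_KEY"],
    "ollama": ["LOCAL_MODEL", "OLLAMA_BASE_URL"],
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "openai_compatible": ["OPENAI_API_KEY", "OPENAI_BASE_URL"],
}

def missing_vars(provider, env=None):
    """Return the names of required variables that are unset for `provider`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED[provider] if not env.get(name)]
```

A startup script can call `missing_vars(os.environ.get("LLM_PROVIDER", "gemini"))` and exit with a clear message if the list is non-empty.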
```bash
# Gemini (default)
export GEMINI_API_KEY="your_google_gemini_api_key"

# Ollama (fully local)
export LLM_PROVIDER=ollama
export LOCAL_MODEL=gemma4:27b   # or gemma4:4b, qwen2.5:7b, llama3.2 …
export OLLAMA_BASE_URL=http://localhost:11434

# OpenAI
export LLM_PROVIDER=openai
export OPENAI_API_KEY="sk-..."
export DEFAULT_MODEL=gpt-4o-mini   # optional override

# Anthropic
export LLM_PROVIDER=anthropic
export ANTHROPIC_API_KEY="sk-ant-..."
```

```bash
export FLAMEHAVEN_ADMIN_KEY="your_secure_admin_password"
# Plus the provider credentials above (at least one provider)
```

```bash
export HOST="0.0.0.0"              # Bind address
export PORT="8000"                 # Server port
export REDIS_HOST="localhost"      # Distributed caching
export REDIS_PORT="6379"           # Redis port
export MAX_OUTPUT_TOKENS="1024"    # Max answer tokens
export TEMPERATURE="0.5"           # Model temperature (0.0–1.0)
export MAX_SOURCES="5"             # Max source documents per answer
```

Create a config.yaml for fine-tuned control:
```yaml
vector_store:
  quantization: int8
  compression: gravitas_pack

search:
  default_mode: hybrid
  typo_correction: true
  max_results: 10

security:
  rate_limit: 100           # requests per minute
  max_file_size: 52428800   # 50MB
```

| Metric | Value | Notes |
|---|---|---|
| Vector Generation | <1ms | DSP v2.0, zero ML dependencies |
| Memory Footprint | 75% reduced | Int8 quantization vs float32 |
| Metadata Size | 90% smaller | Gravitas-Pack compression |
| Test Suite | 476 tests | All passing (pytest) |
| Cold Start | 3 seconds | Docker container ready |
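The 75% memory figure follows directly from storage widths: float32 uses 4 bytes per dimension, int8 uses 1. A toy scale-quantization round trip in pure Python (illustrative only; the shipped quantizer may differ in details):

```python
def quantize_int8(vec):
    """Map floats into [-127, 127] ints: 1 byte/dim instead of 4 (75% smaller)."""
    scale = max(abs(x) for x in vec) / 127.0 or 1.0  # avoid div-by-zero on all-zero vectors
    return [round(x / scale) for x in vec], scale

def dequantize(q, scale):
    return [x * scale for x in q]

vec = [0.12, -0.98, 0.45, 0.0]
q, scale = quantize_int8(vec)
# per-dimension round-trip error is bounded by scale / 2
```

The trade-off is precision: reconstruction error per dimension is at most half the scale factor, which is usually negligible for cosine-similarity ranking.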
Environment: Docker on Apple M1 Mac, 16GB RAM
Document Set: 500 PDFs, ~2GB total

```
Health Check:          8ms
Search (cache hit):    9ms
Search (cache miss):   1,250ms   (includes Gemini API call)
Batch Search (10):     2,500ms   (parallel processing)
Upload (50MB file):    3,200ms   (with indexing)
```
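The gap between the cache-hit and cache-miss rows is the usual shape of response caching: the expensive step (the LLM round trip) runs once, and repeats are served from memory. A toy illustration with stdlib memoization (the `sleep` stands in for the remote call; none of this is FLAMEHAVEN code):

```python
import functools
import time

@functools.lru_cache(maxsize=1024)
def cached_search(query):
    time.sleep(0.05)  # stand-in for the slow LLM / API round trip
    return f"answer for {query!r}"

t0 = time.perf_counter()
cached_search("remote work policy")            # miss: pays the full cost
miss = time.perf_counter() - t0

t0 = time.perf_counter()
cached_search("remote work policy")            # hit: served from memory
hit = time.perf_counter() - t0
```

In the benchmark above the same effect shows up as 9ms (hit) versus 1,250ms (miss).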
```mermaid
flowchart TD
    Client(["Client\n(HTTP / SDK)"])
    subgraph API["REST API Layer (FastAPI)"]
        Upload["/api/upload"]
        Search["/api/search"]
        Admin["/api/admin"]
    end
    subgraph Engine["Engine Layer"]
        FP["FileParser\n+ BackendRegistry\n(34 formats)"]
        Cache["ParseCache\n(mtime-based)"]
        Chunker["TextChunker\n+ KnowledgeAtom\n(chunk atoms)"]
        DSP["DSP v2.0\nEmbedding Generator\n(<1ms, zero-ML)"]
        BM25["BM25 + RRF\nHybrid Search\n(v1.6.0)"]
        Scorer["SemanticScorer\n+ TypoCorrector"]
    end
    subgraph Storage["Storage Layer"]
        SQLite[("SQLite\nMetadata Store")]
        Vec[("Vector Store\n(local / pgvector)")]
        Redis[("Redis Cache\n(optional)")]
    end
    subgraph LLM["LLM Provider (env: LLM_PROVIDER)"]
        Gemini["Gemini\n(cloud)"]
        Ollama["Ollama\n(local)"]
        OAI["OpenAI /\nAnthropic /\nCompatible"]
    end
    Metrics["Metrics Logger"]

    Client --> Upload & Search & Admin
    Upload --> FP
    FP <-->|"cache hit/miss"| Cache
    FP --> Chunker
    Chunker --> DSP
    DSP --> Vec
    FP --> SQLite
    Search --> Scorer
    Scorer --> DSP
    Scorer -->|"gemini"| Gemini
    Scorer -->|"ollama"| Ollama
    Scorer -->|"openai/anthropic"| OAI
    LLM --> Client
    Admin --> Metrics
    Admin --> SQLite
    Storage <-->|"read / write"| Redis
```
Full layer detail: Architecture.md
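One engine component from the diagram worth unpacking is the sliding-window context enrichment: each stored chunk carries a slice of its neighboring chunks, so a retrieved chunk arrives with surrounding context for the LLM. A minimal sketch (names and dict shape are illustrative, not the shipped ContextExtractor):

```python
def enrich_chunks(chunks, window=1):
    """Attach up to `window` neighboring chunks on each side as context."""
    enriched = []
    for i, chunk in enumerate(chunks):
        before = chunks[max(0, i - window):i]   # preceding neighbors
        after = chunks[i + 1:i + 1 + window]    # following neighbors
        enriched.append({"text": chunk, "context": " ".join(before + after)})
    return enriched

out = enrich_chunks(["A", "B", "C"])
```

The window size trades prompt length against context: a larger window gives the LLM more surrounding text per retrieved chunk at the cost of bigger prompts.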
FLAMEHAVEN takes security seriously:
- ✅ API Key Hashing - SHA256 with salt
- ✅ Rate Limiting - Per-key quotas (default: 100/min)
- ✅ Permission System - Granular access control
- ✅ Audit Logging - Complete request history
- ✅ OWASP Headers - Security headers enabled by default
- ✅ Input Validation - Strict file type and size checks
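The key-hashing scheme above (SHA256 with salt) stores only a salt and digest, never the raw key, and compares digests in constant time to avoid timing attacks. A sketch under those assumptions (not the exact shipped implementation):

```python
import hashlib
import hmac
import os

def hash_key(api_key, salt=None):
    """Return (salt_hex, digest_hex); store these instead of the raw key."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.sha256(salt + api_key.encode()).hexdigest()
    return salt.hex(), digest

def verify_key(api_key, salt_hex, expected_digest):
    digest = hashlib.sha256(bytes.fromhex(salt_hex) + api_key.encode()).hexdigest()
    return hmac.compare_digest(digest, expected_digest)  # constant-time compare

salt_hex, digest = hash_key("sk_live_abc123")
```

A unique random salt per key means two identical keys never share a digest, which defeats precomputed lookup tables.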
```bash
# Use strong admin keys
export FLAMEHAVEN_ADMIN_KEY=$(openssl rand -base64 32)

# Enable HTTPS in production
# (use nginx/traefik as reverse proxy)

# Rotate API keys regularly
curl -X DELETE http://localhost:8000/api/admin/keys/old_key_id \
  -H "X-Admin-Key: $FLAMEHAVEN_ADMIN_KEY"
```

Full roadmap: ROADMAP.md
- Multimodal search (image + text)
- HNSW vector indexing for faster search
- OAuth2/OIDC integration
- PostgreSQL backend (metadata + pgvector)
- Usage-budget controls and reporting
- pgvector tuning and reliability hardening
- CI/CD — ruff replaces flake8; pipelines fully green
- Universal Document Parser — 34 formats, zero doc-AI dependency (v1.5.0)
- Internal text chunker — structure-aware + token-aware, zero ML deps (v1.5.0)
- Framework integrations — LangChain, LlamaIndex, Haystack, CrewAI (v1.5.0)
- Backend Plugin Architecture — `AbstractFormatBackend` + `BackendRegistry` (v1.5.2)
- Parse cache — mtime-based, `extract_text(use_cache=True)` (v1.5.2)
- ContextExtractor — sliding-window RAG chunk enrichment (v1.5.2)
- Multi-provider LLM support — OpenAI, Claude, Ollama, Gemini (v1.5.3)
- BM25 + RRF hybrid search — Korean+English tokenizer, lazy per-store index
- KnowledgeAtom 2-level indexing — chunk atoms with fragment URIs
- Stable URI scheme — `local://<store>/<quote(abs_path)>`, collision-free
- core.py mixin segmentation — 1258 → 221 lines, 3 focused modules
- Fix: `search_stream` double intent-refine bug
- CC reduction — `seek_vector_resonance` CC 8→2, `_get_admin_user` CC 10→1
- Dispatch table pattern — `_transform_dict` unifies GravitasPacker compress/decompress
- `_record_upload_failure` helper — eliminates 2× duplicated metrics blocks in api.py
- `/health` exposes `llm_provider` + `llm_model` — frontend can detect active backend
- `config.to_dict()` exposes `llm_provider`, `local_model`, `ollama_base_url`
- Frontend: provider-aware model selector (Gemini dropdown ↔ local model badge)
- Frontend: upload accept list expanded to all 34 supported formats
- Frontend: store datalist auto-populated from `/api/metrics`
- Frontend: version badge synced to `v1.6.1` across all 6 dashboard pages
- Ruff F401/F841 — 5 lint errors resolved, CI green
- Admin: Stores tab — create / list / delete stores (`POST|GET|DELETE /api/stores`)
- Admin: Ops tab — usage stats (`GET /api/admin/usage`) + vector ops (stats / reindex / vacuum)
- Landing: "Manage" deep-link to `admin.html#stores` with hash-based tab routing
- Multi-language support (15+ languages) — multilingual stopwords + jieba
- Kubernetes Helm charts
- Distributed indexing
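The mtime-based parse cache listed in the v1.5.2 changes follows a common pattern: key the cached text by the file's modification time and re-parse only when it changes. A minimal sketch (the class below is illustrative and only mimics the `extract_text(use_cache=True)` shape from the changelog, not the shipped code):

```python
import os

class ParseCache:
    """Cache parsed text keyed by (path, mtime); re-parse only on change."""

    def __init__(self, parse_fn):
        self.parse_fn = parse_fn
        self._cache = {}  # path -> (mtime, text)

    def extract_text(self, path, use_cache=True):
        mtime = os.path.getmtime(path)
        if use_cache and self._cache.get(path, (None, None))[0] == mtime:
            return self._cache[path][1]   # cache hit: skip parsing entirely
        text = self.parse_fn(path)        # cache miss: parse and remember
        self._cache[path] = (mtime, text)
        return text
```

Because the key includes the mtime, editing a document automatically invalidates its cached parse on the next upload or index pass.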
❌ 401 Unauthorized Error

Problem: API returns 401 when making requests.

Solutions:
- Verify the `FLAMEHAVEN_ADMIN_KEY` environment variable is set
- Check the `Authorization: Bearer sk_live_...` header format
- Ensure the API key hasn't expired (check the admin dashboard)

```bash
# Debug: Check if admin key is set
echo $FLAMEHAVEN_ADMIN_KEY

# Regenerate API key
curl -X POST http://localhost:8000/api/admin/keys \
  -H "X-Admin-Key: $FLAMEHAVEN_ADMIN_KEY" \
  -d '{"name":"debug","permissions":["search"]}'
```

🐌 Slow Search Performance
Problem: Searches taking >5 seconds.

Solutions:
- Check the cache hit rate: `FLAMEHAVEN_METRICS_ENABLED=1 curl http://localhost:8000/metrics`
- Enable Redis for distributed caching
- Verify Gemini API latency (should be <1.5s)

```bash
# Enable Redis caching
docker run -d --name redis redis:7-alpine
export REDIS_HOST=localhost
```

💾 High Memory Usage
Problem: Container using >2GB RAM.

Solutions:
- Enable Redis with an LRU eviction policy
- Reduce the max file size in config
- Monitor with the Prometheus endpoint

```bash
# Configure Redis memory limit
docker run -d \
  -p 6379:6379 \
  redis:7-alpine \
  --maxmemory 512mb \
  --maxmemory-policy allkeys-lru
```

More solutions in our Wiki Troubleshooting Guide.
Use the links below to jump to the most relevant guide.
| Topic | Description |
|---|---|
| Document Parsing | Supported formats, internal parsers, RAG chunking |
| Hybrid Search | BM25+RRF, KnowledgeAtom indexing, stable URI scheme (v1.6.0) |
| Framework Integrations | LangChain, LlamaIndex, Haystack, CrewAI adapters |
| API Reference | REST endpoints, payloads, rate limits |
| Architecture | How all layers fit together (v1.6.0) |
| Configuration Reference | Full list of environment variables and config fields |
| Production Deployment | Docker, systemd, reverse proxy, scaling tips |
| Troubleshooting | Step-by-step debugging playbook |
| Benchmarks | Performance measurements and methodology |
These Markdown files live inside the repository so they stay versioned alongside the code. Feel free to contribute improvements via pull requests.
- Interactive API Docs - OpenAPI/Swagger interface (when server is running)
- CHANGELOG - Version history and breaking changes
- CONTRIBUTING - How to contribute code
- Examples - Sample integrations and use cases
We love contributions! FLAMEHAVEN is better because of developers like you.
- 🟢 [Easy] Add dark mode to admin dashboard (1-2 hours)
- 🟡 [Medium] PostgreSQL backend for usage tracker (multi-instance deployments)
- 🔴 [Advanced] Kubernetes Helm charts for production deployment
See CONTRIBUTING.md for development setup and guidelines.
- 💬 Discussions: GitHub Discussions
- 🐛 Bug Reports: GitHub Issues
- 🔒 Security: security@flamehaven.space
- 📧 General: info@flamehaven.space
Distributed under the MIT License. See LICENSE for more information.
Built with amazing open source tools:
- FastAPI - Modern Python web framework
- Google Gemini - Semantic understanding and reasoning
- SQLite - Lightweight, embedded database
- Redis - In-memory caching (optional)
⭐ Star us on GitHub • 📖 Read the Docs • 🚀 Deploy Now
Built with 🔥 by the Flamehaven Core Team
Last updated: April 20, 2026 • Version 1.6.1
