From 1024385d1a431c95dfa23657680b8939a5a4c5a0 Mon Sep 17 00:00:00 2001
From: Suhani Nagpal <suhani.nagpal725@gmail.com>
Date: Fri, 22 May 2026 12:54:16 +0530
Subject: [PATCH] docs(cookbook): revamp docker-compose-quickstart against
 playbook

What changed

- Frontmatter: kebab-case schema with slug, date, author, products,
  frameworks, difficulty, time-to-complete, tags, og-image, canonical,
  last-tested-date, structured last-tested-with, code-repo-url,
  page-type.
- Added "What you'll build" section listing the deliverables (21
  containers, four secrets, first user, smoke-test trace, ops
  cheatsheet) before the 10-15 minute image build investment.
- Added "Verify your environment" step before "Why self-host" so the
  first proof moment (Docker version + RAM check) lands in the first
  5% of the page instead of at the 90% mark. Catches the most common
  OOM-kill failure before the user invests time.
- Trimmed "Why self-host" from 159 to 53 words: two reasons
  (compliance + localhost speed) and a one-line bridge to the steps.
- Mermaid stack diagram simplified from 12 nodes + 4 subgraphs + 11
  arrows down to 4 nodes + 4 arrows showing the core write/CDC/read
  loop (SDK -> apps -> Postgres -[PeerDB CDC]-> ClickHouse -[reads]->
  apps). The full 21-container inventory lives in the existing
  per-layer table inside Step 3.
- Em dashes purged across body, tips, and troubleshooting.
- Troubleshooting table gained "Verify" column with a verification
  step per row.
- Replaced "Explore further" reference cards with a 4-item technical
  next-steps ladder (pin image tags, durable storage, TLS reverse
  proxy, Postgres backups) + one reference link.
- Step 5 closes with a real screenshot of the FAGI Tracing dashboard
  showing a live support_agent trace for the local-stack-smoke-test
  project (Trace Graph + span detail panel with the User Message,
  Input, and refund-policy Output captured end-to-end).

Depends on

- #660 (Mermaid component) for the diagram in "The self-hosted stack"
  to render. The diagram has been simplified; full functionality
  requires #660 to merge first.
---
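Reviewer note on the "Verify your environment" one-liner: the awk stage
divides `MemTotal` by 1024 three times and fails when the result is below 8.
The same gate can be sketched in Python to sanity-check the arithmetic without
a Docker daemon. This is a minimal sketch, assuming
`docker info --format '{{.MemTotal}}'` reports bytes; `ram_gate` and `MIN_GIB`
are hypothetical names, not part of the repo.

```python
MIN_GIB = 8  # threshold used by the awk check in the page


def ram_gate(mem_total_bytes: int, min_gib: float = MIN_GIB) -> tuple[float, bool]:
    """Return (GiB available to Docker, whether it meets the minimum)."""
    gib = mem_total_bytes / 1024 ** 3
    return gib, gib >= min_gib


if __name__ == "__main__":
    # Sample values, not measurements: an exact 8 GiB and a 4 GiB allocation.
    for sample in (8 * 1024 ** 3, 4 * 1024 ** 3):
        gib, ok = ram_gate(sample)
        print(f"Docker RAM: {gib:.1f} GiB -> {'ok' if ok else 'WARN'}")
```

Because the divisor is 1024, the threshold is 8 GiB, so any reported value even
slightly under that (7.9 GiB, for example) fails the gate.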
 .../docker-compose-quickstart.mdx             | 199 +++++++++++++++---
 1 file changed, 166 insertions(+), 33 deletions(-)
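
Reviewer note on Step 3: the page asks readers to scan the output of
`docker compose ps --format "{{.Names}} {{.Status}}"` and confirm every service
shows `Up`. A minimal sketch of that scan, assuming each line is a service name
followed by its status text, as in the page's expected-output block; `not_up`
is a hypothetical helper and the sample lines are illustrative, not captured
from a real run.

```python
def not_up(ps_output: str) -> list[str]:
    """Return names of services whose status does not start with 'Up'."""
    failing = []
    for line in ps_output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) < 2:  # skip blank lines and a bare '...' elision
            continue
        name, status = parts
        if not status.startswith("Up"):
            failing.append(name)
    return failing


sample = """\
backend       Up 10 seconds (healthy)
clickhouse    Exited (137) 3 seconds ago
code-executor Restarting (1) 2 seconds ago
...
"""
print(not_up(sample))  # ['clickhouse', 'code-executor']
```

A status such as `Up 10 seconds (unhealthy)` still starts with `Up`, so this
only catches exited or restarting containers; health needs a separate check.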
diff --git a/src/pages/docs/cookbook/self-hosting/docker-compose-quickstart.mdx b/src/pages/docs/cookbook/self-hosting/docker-compose-quickstart.mdx
index 58c7fb85..74ee32c7 100644
--- a/src/pages/docs/cookbook/self-hosting/docker-compose-quickstart.mdx
+++ b/src/pages/docs/cookbook/self-hosting/docker-compose-quickstart.mdx
@@ -1,16 +1,51 @@
 ---
 title: "Deploy the Full Open-Source AI Stack Locally With Docker Compose in 5 Minutes"
-description: "Clone the Future AGI repo, configure .env, run `docker compose up`, and start sending traces. Five commands to a complete self-hosted stack on your laptop."
+slug: "docker-compose-quickstart"
+description: "Clone the FutureAGI repo, configure .env, run `docker compose up`, and start sending traces. Five commands to a complete self-hosted stack on your laptop."
+date: "2026-05-21"
+author: "futureagi-engineering"
+products:
+  - "traceAI"
+  - "Agent Command Center"
+  - "fi.evals"
+frameworks:
+  - "Docker"
+  - "Docker Compose"
+difficulty: "beginner"
+time-to-complete: "5 minutes"
+tags:
+  - "self-hosting"
+  - "deployment"
+  - "docker"
+og-image: "/images/cookbooks/docker-compose-quickstart/og.webp"
+canonical: "https://docs.futureagi.com/docs/cookbook/self-hosting/docker-compose-quickstart"
+last-tested-date: "2026-05-21"
+last-tested-with:
+  python: "3.11"
+  docker-engine: "24.0+"
+  docker-compose: "v2.20+"
+code-repo-url: "https://github.com/future-agi/future-agi"
+page-type: "cookbook"
 ---
 
-<TLDR>
-Five commands and one `.env` edit gets you a complete self-hosted Future AGI stack running locally: frontend, backend, gateway, Postgres, ClickHouse, Redis, MinIO, Temporal, and PeerDB CDC. All 21 containers, no external dependencies. Your traces, datasets, and evals stay on your machine.
-</TLDR>
-
 | Time | Difficulty |
 |------|-----------|
 | 5 min hands-on (10 to 15 min for first image build) | Beginner |
 
+<TLDR>
+Run the full FutureAGI platform (21 containers, including the dashboard, backend, gateway, Postgres, ClickHouse, Temporal, and PeerDB) on your laptop with one `docker compose up`. Point your existing SDKs at `localhost:8000` instead of the cloud API. Every trace, dataset, eval, and model call stays on your machine. Five commands from clone to a verified trace in the dashboard.
+</TLDR>
+
+## What you'll build
+
+A fully local FutureAGI deployment running in Docker, with a smoke-test script proving traces flow end-to-end. By the end you will have:
+
+- The frontend, backend, gateway, Postgres, ClickHouse, Temporal, and PeerDB running on your machine (21 containers total).
+- Four required secrets generated and pinned in `.env`, plus optional provider keys (OpenAI, Anthropic) so the gateway can route real model calls.
+- A first user created via the frontend, or via the Django shell if you skip Mailgun.
+- One trace ingested from a local Python script, replicated through PeerDB into ClickHouse, and rendered in the dashboard at `http://localhost:3000`.
+- A short list of `docker compose` operations (logs, exec, down, down -v) you'll reuse daily.
+
 <Prerequisites>
 - Docker Engine 24.0+ and Docker Compose v2.20+ (`docker --version`, `docker compose version`)
 - 8+ GB RAM and 64+ GB disk allocated to Docker (Docker Desktop defaults of 2 to 4 GB will OOM-kill ClickHouse)
@@ -18,22 +53,65 @@ Five commands and one `.env` edit gets you a complete self-hosted Future AGI sta
 - Python 3.11
 </Prerequisites>
 
-## Tutorial
+## Verify your environment
+
+Before you spend 10-15 minutes on the first image build, run this one-liner. It exits non-zero if Docker or Compose is missing, or if Docker has less than 8 GB of memory. The memory check catches the most common reason the stack fails to start.
+
+```bash
+docker --version && docker compose version && \
+docker info --format '{{.MemTotal}}' | \
+  awk '{ gb = $1/1024/1024/1024; printf "Docker RAM: %.1f GB\n", gb; if (gb < 8) { print "WARN: increase Docker memory to 8+ GB before continuing"; exit 1 } }'
+```
+
+Expected output:
+
+```text
+Docker version 24.0.7, build afdd53b
+Docker Compose version v2.21.0
+Docker RAM: 12.0 GB
+```
+
+If you see `WARN: increase Docker memory to 8+ GB`, open Docker Desktop → Settings → Resources and raise the memory limit before continuing. Low memory makes everything downstream fail.
+
+## Why self-host
+
+Self-host when your traces contain PII, PHI, or proprietary IP that can't leave your network for compliance, or when you want every span, eval, and model call to round-trip on `localhost` instead of the public internet. The five steps below deploy the full FAGI platform on your laptop in one `docker compose up`.
+
+## The self-hosted stack
+
+One Docker Compose file orchestrates 21 containers across four layers: apps (frontend, backend, gateway), databases (Postgres, ClickHouse, Redis, MinIO), a workflow engine (Temporal), and a replication stack (PeerDB) that copies Postgres to ClickHouse for fast analytics. Point your code at `http://localhost:8000` and every trace, dataset, eval, and model call stays in the local Docker network.
+
+<Mermaid code={`flowchart LR
+    sdk["Your SDK<br/>localhost:8000"] --> apps["FAGI apps<br/>(frontend, backend, gateway)"]
+    apps --> pg[(Postgres)]
+    pg -.PeerDB CDC.-> ch[(ClickHouse)]
+    ch -.analytics reads.-> apps
+`} />
+
+The five steps below clone the repo, set the four required secrets, start the stack, create your first user, and send a test trace to verify every layer is wired.
 
 <Steps>
 <Step title="Clone the repo">
 
+There's no separate `pip install` or `npm install`. The repo *is* the install. Cloning gets you the Compose file, the application source, and the four Dockerfiles needed to build each service.
+
 ```bash
 git clone https://github.com/future-agi/future-agi.git
 cd future-agi
 ```
 
-The OSS build uses `futureagi/Dockerfile.oss` (Python 3.11 base) and builds locally, so there's nothing to pre-pull. First-build downloads about 6 GB of layers; subsequent boots reuse the cache.
+Expected output:
+
+```text
+Cloning into 'future-agi'...
+```
+
+The OSS build uses `futureagi/Dockerfile.oss` (Python 3.11 base) and builds locally, so there's nothing to pre-pull. First build downloads about 6 GB of layers; subsequent boots reuse the cache.
 
 </Step>
 <Step title="Configure .env">
 
-Copy the template and rotate the four `CHANGEME` placeholders.
+Four secrets are baked into running services at boot (the Django session signer, the Postgres user password, the MinIO root password, and the gateway-to-backend internal key). They have to be set before `docker compose up` or services will refuse to start. Provider and Mailgun keys are optional but useful to add now so you don't have to restart later.
 
 ```bash
 cp .env.example .env
@@ -73,17 +151,38 @@ See [Environment Variables](/docs/self-hosting/environment) for the full list of
 </Step>
 <Step title="Start the stack">
 
+`docker compose up -d` builds any missing images, then starts all 21 containers in dependency order (data stores first, then workflow, then the app services that depend on them). The whole stack is up when the backend logs `Application startup complete`. Detached mode (`-d`) returns your shell immediately; container logs are still retained, so you can tail any one service with `docker compose logs`.
+
 ```bash
 docker compose up -d
 docker compose ps --format "{{.Names}} {{.Status}}"
 ```
 
-`-d` runs detached. The `--format` flag prints one line per service so you can scan health quickly without horizontal-scrolling the default table. The stack is ready when the backend logs `Application startup complete`:
+Expected output (abbreviated):
+
+```text
+frontend      Up 12 seconds
+backend       Up 10 seconds (healthy)
+worker        Up 10 seconds
+gateway       Up 8 seconds
+postgres      Up 15 seconds (healthy)
+clickhouse    Up 14 seconds (healthy)
+redis         Up 15 seconds (healthy)
+...
+```
+
+All services should show `Up`. Watch the backend until startup completes:
 
 ```bash
 docker compose logs -f backend
 ```
 
+Wait for this line before continuing:
+
+```text
+INFO:     Application startup complete.
+```
+
 What you just started:
 
 | Layer | Services |
@@ -101,7 +200,7 @@ First boot builds from source. Subsequent `docker compose up` calls reuse the ca
 <span id="step-4"></span>
 <Step title="Open the dashboard and create your first user">
 
-Three URLs are now live on your machine:
+Three URLs are now live: the FutureAGI frontend, the FutureAGI backend, and the PeerDB admin UI for inspecting CDC replication. For day-to-day use you only need the first. The second is the API your SDKs call. The third is for checking that replication is healthy when you debug analytics issues.
 
 | Service | URL | Notes |
 |---------|-----|-------|
@@ -132,7 +231,9 @@ u.save()
 </Step>
 <Step title="Send your first trace to the local stack">
 
-Point the FutureAGI instrumentation SDK at your local backend with the `FI_BASE_URL` env var. Anything else is identical to the cloud setup.
+Sending a trace is the smoke test for the whole deployment. A single request exercises every layer: the SDK posts spans (structured event records that make up a trace) over HTTP to the backend, the backend writes to Postgres, PeerDB replicates to ClickHouse, the frontend reads from ClickHouse, and the gateway proxies the OpenAI call so the cost shows up in your local cost tracking. If the trace appears in the dashboard, every component is wired correctly.
+
+Point the FutureAGI instrumentation SDK at your local backend with the `FI_BASE_URL` env var. Everything else is identical to the cloud setup.
 
 ```bash
 pip install fi-instrumentation-otel traceai-openai openai
@@ -143,8 +244,10 @@ import os
 from fi_instrumentation import register, FITracer
 from fi_instrumentation.fi_types import ProjectType
 from traceai_openai import OpenAIInstrumentor
+from openai import OpenAI
 
-os.environ["FI_BASE_URL"] = "http://localhost:8000"  # SDK sends spans to /tracer/v1/traces on this host
+# Point the SDK at the local backend instead of the cloud API
+os.environ["FI_BASE_URL"] = "http://localhost:8000"
 
 trace_provider = register(
     project_type=ProjectType.OBSERVE,
@@ -153,30 +256,63 @@ trace_provider = register(
 OpenAIInstrumentor().instrument(tracer_provider=trace_provider)
 tracer = FITracer(trace_provider.get_tracer("local-stack-smoke-test"))
 
-from openai import OpenAI
 client = OpenAI()
 
-@tracer.agent(name="hello_agent")
-def hello_agent(q: str) -> str:
-    r = client.chat.completions.create(
+@tracer.agent(name="support_agent")
+def support_agent(question: str) -> str:
+    completion = client.chat.completions.create(
         model="gpt-4o-mini",
-        messages=[{"role": "user", "content": q}],
+        messages=[{"role": "user", "content": question}],
     )
-    return r.choices[0].message.content
+    return completion.choices[0].message.content
+
+answer = support_agent("What is the refund policy for cancelled orders?")
+print(answer)
 
-print(hello_agent("Say hi to my self-hosted Future AGI stack."))
+# Flush before exit so the trace reaches the backend
 trace_provider.force_flush()
 ```
 
-Open **Tracing -> local-stack-smoke-test** in the dashboard. You should see one parent span (`hello_agent`) with the OpenAI call nested underneath. If the trace shows up, every layer of the stack is wired correctly: backend ingestion, ClickHouse via PeerDB, frontend rendering, gateway routing.
+Expected output:
+
+```text
+Our refund policy for cancelled orders...
+```
+
+The exact answer varies, but any non-error response means the OpenAI call succeeded and the script ran end to end. Open **Tracing → local-stack-smoke-test** in the dashboard at `http://localhost:3000`. Verify:
+
+- One parent span named `support_agent`
+- The OpenAI call nested underneath with model, tokens, and latency
+- If both appear, every layer is wired: backend ingestion, Postgres, PeerDB replication to ClickHouse, and the frontend
+
+<img src="https://fi-cookbook-assets.s3.ap-south-1.amazonaws.com/self-hosting/docker-compose-quickstart/01-local-stack-trace.png" alt="FutureAGI Tracing dashboard for the local-stack-smoke-test project showing one trace: the support_agent root span (agent type) with a nested ChatCompletion child span (LLM type) in the Trace Graph on the left, and on the right the span detail panel with User Message and Input 'What is the refund policy for cancelled orders?', the model output containing the refund-policy answer, plus tag, duration, cost, and token columns at the top" style={{width: "100%", borderRadius: "0.75rem", border: "1px solid var(--color-border-default)"}} />
 
 </Step>
 </Steps>
 
+
+## Troubleshooting
+
+| Symptom | Likely cause | Fix | Verify |
+|---|---|---|---|
+| ClickHouse or backend exits with `OOMKilled` | Docker Desktop RAM too low (default is 2-4 GB) | Increase Docker memory to 8+ GB in Docker Desktop → Settings → Resources | `docker compose ps` shows ClickHouse `Up (healthy)` after a restart |
+| `backend` logs show `django.db.utils.OperationalError: could not connect to server` | Postgres hasn't finished initializing yet | Wait 15-30 seconds and restart the backend: `docker compose restart backend` | Backend logs show `Application startup complete.` |
+| Trace sent but nothing appears in the dashboard | PeerDB replication hasn't caught up, or `FI_BASE_URL` not set | Check `docker compose logs peerdb-server` for errors. Confirm `FI_BASE_URL=http://localhost:8000` is exported | Re-run the smoke-test script; trace appears in **Tracing → local-stack-smoke-test** within 10s |
+| `docker compose up` fails with `port is already allocated` | Another service (or a previous run) is using the same port | Run `docker compose down` first, or check `lsof -i :3000` / `lsof -i :8000` and kill the conflicting process | `docker compose up -d` returns without error and all 21 containers show `Up` |
+| Frontend loads but shows a blank page or 502 | Backend not healthy yet, or frontend can't reach it | Run `docker compose ps`; backend should show `healthy`. Check `docker compose logs frontend` for connection errors | `curl http://localhost:8000/health/` returns 200 |
+| `code-executor` fails on ECS Fargate or Cloud Run | Needs `privileged: true`, which Docker Desktop supports but serverless runtimes don't | Run it on Docker Desktop (macOS/Linux/WSL2). It is not compatible with ECS Fargate or Cloud Run | `docker compose ps code-executor` shows `Up`, not `Restarting` |
+
+## What you built
+
 <Check>
-You're running 21 containers, ingested a trace through the same code path the cloud uses, and rendered it in a dashboard at http://localhost:3000. Every byte stayed on your machine.
+21 containers running locally, a trace ingested through the same code path as the cloud, and a dashboard at `http://localhost:3000` rendering it. Every byte stayed on your machine.
 </Check>
 
+- Full FutureAGI platform (tracing, evals, gateway, prompt management) on infrastructure you control
+- SDK pointed at `localhost:8000`; switch back to `api.futureagi.com` by removing `FI_BASE_URL`
+- PeerDB replication keeping ClickHouse in sync with Postgres for fast analytics queries
+- Direct code access to every service for debugging or extending platform behavior
+
 ## Common operations
 
 ```bash
@@ -193,16 +329,13 @@ docker compose down
 docker compose down -v
 ```
 
-## Explore further
+## Next steps
+
+Once the stack runs locally, make these operational changes before it becomes anything more than a laptop demo:
+
+1. **Pin image tags before going to production.** Replace `image: future-agi/backend:latest` with explicit versions in your `docker-compose.yml` so a moved `latest` tag or a rebuilt image doesn't change behavior under you.
+2. **Move ClickHouse and Postgres to durable storage.** Bind them to managed volumes (or external hosts) so `docker compose down -v` can't wipe your traces. The fast-iteration ergonomics of the laptop setup become a production hazard otherwise.
+3. **Put a reverse proxy with TLS in front.** Terminate HTTPS at nginx or Caddy and route `frontend:3000` and `backend:8000` behind one host so cookies, CSRF, and SDK base URLs all use a single canonical domain.
+4. **Set up Postgres backups.** PeerDB rebuilds ClickHouse from Postgres, so Postgres is the single source of truth that needs a real backup policy (pg_dump cron at minimum; managed Postgres in prod).
 
-<CardGroup cols={3}>
-  <Card title="Docker Compose reference" icon="docker" href="/docs/self-hosting/docker-compose">
-    Full deployment modes (full stack, dev overlay, frontend-only)
-  </Card>
-  <Card title="Environment variables" icon="settings" href="/docs/self-hosting/environment">
-    Every secret, port, and runtime flag the stack reads
-  </Card>
-  <Card title="Production hardening" icon="shield" href="/docs/self-hosting/production">
-    Reverse proxy, HTTPS, secret rotation before exposing the stack
-  </Card>
-</CardGroup>
+Reference: [Docker Compose deployment modes](/docs/self-hosting/docker-compose) for the full set of overlays (dev, frontend-only), and [Production hardening](/docs/self-hosting/production) for the full HTTPS + secret rotation checklist.