warpdotdev · rachaelrenk · Jun 8, 2026
diff --git a/.agents/skills/weekly-404-monitor/SKILL.md b/.agents/skills/weekly-404-monitor/SKILL.md
@@ -0,0 +1,139 @@
+---
+name: weekly-404-monitor
+description: Weekly recurring agent that surfaces broken docs.warp.dev URLs by querying the docs_404 Rudderstack track event, diffing against existing vercel.json redirects, and posting a summary to Slack. Use for the Monday 9am PT scheduled Oz agent that monitors 404 gaps and supports the ongoing redirect-fix workflow.
+---
+
+# Weekly 404 monitor
+
+Runs every Monday at 9am PT. Identifies new broken URL patterns on docs.warp.dev, surfaces the top uncovered paths, and posts a concise Slack summary so the docs team can prioritize redirect additions.
+
+## Prerequisites
+
+The following environment secrets must be set in the Oz cloud agent environment:
+
+- `METABASE_API_KEY` — Metabase API key for BigQuery queries. If unavailable, the run must fail fast with a clear error.
+- `SLACK_BOT_TOKEN` — Slack bot token for posting to the docs channel. If unavailable, write a no-post report to the run output instead.
+- `SLACK_CHANNEL_ID` — Slack channel ID for **`#growth-docs`**. Find it in Slack by right-clicking the channel → Copy link (the ID begins with `C`). There is no fallback — the run will skip Slack posting if this is unset.
+
+Do NOT print, log, or include secret values in reports, commits, or Slack messages.
+
+## Workflow
+
+### 1. Query docs_404 events
+
+Run `python3 .agents/skills/weekly-404-monitor/run_404_report.py` in the docs repo.
+
+The script:
+- Queries `warp-data-357114.prod.stg_website_events` via the Metabase API
+- Extracts `broken_url` from `event_properties` for all `event_name = 'docs_404'` events in the past 7 days
+- Groups by `broken_url`, sorted by hit count descending
+- Returns a ranked list of broken URLs and their hit counts for the current week
+- Computes the same for the prior week (days 8–14) for trend comparison
+- Computes the total 404 count for each week (current and prior) for the trend line
+
+### 2. Fetch current vercel.json redirect sources
+
+Fetch `vercel.json` from the docs repo (already checked out locally in the cloud environment, or via GitHub raw URL `https://raw.githubusercontent.com/warpdotdev/docs/main/vercel.json`).
+
+Extract all `source` values from the `redirects` array. Normalise: lowercase, strip query params and anchor fragments, then strip trailing slashes.
+
+### 3. Find uncovered URLs
+
+For each broken URL in the current week's data:
+- Normalise (lowercase, strip trailing slash, strip query params and fragments)
+- Check if it exists as a `source` in `vercel.json` redirects
+- If not covered, it is a **gap**
+
+### 4. Compute delta vs prior week
+
+Compare this week's gaps against last week's gaps (from the prior-week query in step 1).
+
+**New gaps** = uncovered this week AND not seen as uncovered last week.
+**Resolved** = uncovered last week AND now either covered (has redirect) or no longer generating 404s.
+
+### 5. Post Slack summary
+
+Post a Slack message using the Block Kit format defined in the "Slack message format" section below.
+
+If `SLACK_BOT_TOKEN` is unavailable, write the full Slack message body to the run output instead and note that Slack posting was skipped.
+
+### 6. Write CSV artifact
+
+Write `404-report-YYYY-MM-DD.csv` to `data/404-reports/` in the docs repo working directory. Format:
+
+```
+broken_url,hits_this_week,hits_last_week,is_covered_by_redirect,is_new_gap
+/old/path,42,0,false,true
+/another/path,18,22,false,false
+```
+
+Do NOT commit this file to the repo. It is an Oz run artifact only — readable from the Oz web app Runs page.
+
+## Slack message format
+
+Use Slack Block Kit. The message should be scannable in under 30 seconds.
+
+```
+📊 *docs.warp.dev 404 Report* — week of {YYYY-MM-DD}
+
+*Total 404s this week:* {N} ({+N / -N vs last week})
+*Uncovered broken URLs:* {M} ({+N new this week})
+
+*Top 10 uncovered URLs (by hits):*
+{hit_count}  `/path` {🆕 if new this week}
+...
+
+*{K} resolved since last week* (redirect added or traffic stopped)
+
+→ Add missing redirects: `vercel.json` › `redirects` array (PR against `main`)
+→ Full breakdown: {oz_run_url}
+```
+
+Rules:
+- Cap the list at 10 entries. If there are more, note "and N more — see full CSV in the run."
+- Mark new gaps with 🆕.
+- If total 404s this week is less than 50, add a brief positive note: "404 volume is low — good signal that redirect coverage is working."
+- Never include raw user data (e.g. query strings with user IDs, tokens) in the Slack message. Strip query params from broken_url before displaying.
+
+## Self-review before posting
+
+Before posting to Slack, verify:
+- The `docs_404` event exists in `stg_website_events` for the query window. If the table has no rows for `event_name = 'docs_404'`, it means PR #191 has not been live long enough to collect data. Post a clear "no data yet" message to Slack and end the run.
+- The Metabase query completed successfully (HTTP 200, no `error` field in the response body).
+- The `broken_url` field was present in the event properties for at least some rows. If it is consistently null, the `docs_404` tracking implementation has a bug — report it in the Slack message and tag the docs team.
+- The vercel.json redirect list was loaded successfully and contains more than 500 entries (sanity check that the file is not truncated).
+- The CSV artifact was written before posting to Slack.
+
+## No-data report
+
+If `stg_website_events` returns 0 rows for `event_name = 'docs_404'` in the past 7 days, post this Slack message:
+
+```
+⏳ *docs.warp.dev 404 Report* — week of {YYYY-MM-DD}
+
+No `docs_404` events recorded yet. This is expected if PR #191 (404 instrumentation) has been live for less than a week, or if the Rudderstack write key is not set in the Vercel environment.
+
+Check: Vercel project env vars include `PUBLIC_RUDDERSTACK_WRITE_KEY` and `PUBLIC_RUDDERSTACK_DATA_PLANE_URL`.
+```
+
+## Failure handling
+
+- If the Metabase query fails (non-200, timeout, or query error), post a brief failure notice to Slack, include the error message, and end the run with a non-zero exit code.
+- Do NOT silently swallow errors or post incomplete data as if it were complete.
+- Log all HTTP requests and responses to stdout for debugging via the Oz run log viewer.
+
+## Scheduling
+
+This skill is designed for an Oz scheduled agent with a weekly cron trigger on Mondays: `0 17 * * 1` in UTC, which is 9am PST. A fixed UTC cron does not follow daylight saving time, so the run fires at 10am PDT in summer.
+
+To deploy:
+1. Push this skill to `main` in the docs repo.
+2. Verify the **`buzz`** Oz environment (oz.warp.dev → Environments) has these secrets set:
+   - `METABASE_API_KEY` — Metabase API key for BigQuery
+   - `SLACK_BOT_TOKEN` — Slack bot token
+   - `SLACK_CHANNEL_ID` — ID for `#growth-docs` (right-click channel in Slack → Copy link; the ID starts with `C`)
+3. In the Oz web app (oz.warp.dev), create a new scheduled agent:
+   - **Skill**: `weekly-404-monitor` from `warpdotdev/docs`
+   - **Schedule**: `0 17 * * 1` (UTC) = Mondays at 9am PST (10am PDT during daylight saving time)
+   - **Environment**: `buzz` (already has `warpdotdev/docs` checked out)
+   - **Branch**: `main`
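A minimal sketch of how the agent might turn the script's JSON summary into the message text described under "Slack message format". The helper name `format_slack_summary`, the `run_url` argument, and the two constants are assumptions for illustration and are not part of this PR. It renders plain mrkdwn text only; wrapping it in a Block Kit payload and posting it are left out.

```python
LOW_VOLUME_THRESHOLD = 50  # from the "Rules" list in SKILL.md
MAX_LISTED = 10            # cap on listed URLs, also from the rules


def format_slack_summary(summary: dict, run_url: str) -> str:
    """Render the JSON summary printed by run_404_report.py as Slack mrkdwn.

    Hypothetical helper: the keys read here are the ones the script emits.
    """
    total = summary["total_404s_this_week"]
    delta = summary["trend_delta"]
    uncovered = summary["uncovered_count"]
    lines = [
        # \u2014 is the dash used in the SKILL.md template.
        f"📊 *docs.warp.dev 404 Report* \u2014 week of {summary['report_date']}",
        "",
        f"*Total 404s this week:* {total} ({delta:+d} vs last week)",
        f"*Uncovered broken URLs:* {uncovered} (+{summary['new_gaps_count']} new this week)",
        "",
        f"*Top {MAX_LISTED} uncovered URLs (by hits):*",
    ]
    for row in summary["top_10_uncovered"][:MAX_LISTED]:
        marker = " 🆕" if row["is_new_gap"] else ""
        lines.append(f"{row['hits_this_week']}  `{row['broken_url']}`{marker}")

    remaining = uncovered - MAX_LISTED
    if remaining > 0:
        lines.append(f"and {remaining} more \u2014 see full CSV in the run.")
    if total < LOW_VOLUME_THRESHOLD:
        lines.append(
            "404 volume is low \u2014 good signal that redirect coverage is working."
        )

    lines += [
        "",
        f"*{summary['resolved_count']} resolved since last week* "
        "(redirect added or traffic stopped)",
        "",
        "→ Add missing redirects: `vercel.json` › `redirects` array (PR against `main`)",
        f"→ Full breakdown: {run_url}",
    ]
    return "\n".join(lines)


# Example with made-up numbers; the URL is a placeholder.
example = {
    "report_date": "2026-06-08",
    "total_404s_this_week": 42,
    "trend_delta": -8,
    "uncovered_count": 12,
    "new_gaps_count": 1,
    "resolved_count": 3,
    "top_10_uncovered": [
        {"broken_url": "/old/path", "hits_this_week": 30, "is_new_gap": True},
        {"broken_url": "/another/path", "hits_this_week": 12, "is_new_gap": False},
    ],
}
print(format_slack_summary(example, "https://example.invalid/runs/1"))
```

The paths are already stripped of query strings by the script, so nothing here needs to re-sanitise them before display.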
diff --git a/.agents/skills/weekly-404-monitor/run_404_report.py b/.agents/skills/weekly-404-monitor/run_404_report.py
@@ -0,0 +1,251 @@
+#!/usr/bin/env python3
+"""
+Weekly 404 monitor — data collection script.
+
+Queries the docs_404 Rudderstack track event from stg_website_events via
+the Metabase API, diffs against vercel.json redirect sources, and writes:
+  - JSON report to stdout (for the agent to parse)
+  - CSV artifact to data/404-reports/404-report-YYYY-MM-DD.csv
+
+Usage (called by the weekly-404-monitor skill):
+    python3 .agents/skills/weekly-404-monitor/run_404_report.py
+
+Required env vars:
+    METABASE_API_KEY  — Metabase API key
+
+Optional env vars:
+    VERCEL_JSON_PATH  — Path to vercel.json (default: ./vercel.json)
+    REPORT_DIR        — Output directory for CSV artifacts (default: ./data/404-reports)
+"""
+
+import csv
+import json
+import os
+import re
+import sys
+import urllib.error
+import urllib.request
+from datetime import date, timedelta
+from pathlib import Path
+
+
+BASE = "https://warp.metabaseapp.com/api"
+DB_ID = 2  # BigQuery prod
+
+
+def metabase_headers():
+    key = os.environ.get("METABASE_API_KEY")
+    if not key:
+        print("ERROR: METABASE_API_KEY is not set.", file=sys.stderr)
+        sys.exit(1)
+    return {"X-API-Key": key, "Content-Type": "application/json"}
+
+
+def run_query(sql: str) -> list[dict]:
+    """Execute a BigQuery SQL query via the Metabase /dataset endpoint."""
+    headers = metabase_headers()
+    body = json.dumps({
+        "database": DB_ID,
+        "type": "native",
+        "native": {"query": sql},
+    }).encode()
+    req = urllib.request.Request(f"{BASE}/dataset", data=body, headers=headers)
+    try:
+        with urllib.request.urlopen(req, timeout=120) as resp:
+            result = json.loads(resp.read())
+    except urllib.error.HTTPError as e:
+        print(f"ERROR: Metabase query failed: HTTP {e.code}: {e.read().decode()[:500]}",
+              file=sys.stderr)
+        sys.exit(1)
+
+    if result.get("error"):
+        print(f"ERROR: Metabase query error: {result['error']}", file=sys.stderr)
+        sys.exit(1)
+
+    data = result.get("data", {})
+    cols = [c["name"] for c in data.get("cols", [])]
+    rows = data.get("rows", [])
+    return [dict(zip(cols, row)) for row in rows]
+
+
+def query_404_events(days_start: int, days_end: int) -> list[dict]:
+    """
+    Return broken_url counts for the window [days_start, days_end) days ago.
+    days_start=1, days_end=8  → past 7 days (current week)
+    days_start=8, days_end=15 → 8-14 days ago (prior week)
+    """
+    sql = f"""
+SELECT
+  REGEXP_REPLACE(
+    SPLIT(JSON_VALUE(event_properties, '$.broken_url'), '?')[OFFSET(0)],
+    r'#.*$', ''
+  ) AS broken_url,
+  COUNT(*) AS hits
+FROM `warp-data-357114.prod.stg_website_events`
+WHERE event_type = 'track'
+  AND event_name = 'docs_404'
+  AND JSON_VALUE(event_properties, '$.broken_url') IS NOT NULL
+  AND event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL {days_end - 1} DAY)
+  AND event_date < DATE_SUB(CURRENT_DATE(), INTERVAL {days_start - 1} DAY)
+GROUP BY 1
+HAVING broken_url IS NOT NULL AND broken_url != ''
+ORDER BY 2 DESC
+LIMIT 500
+"""
+    return run_query(sql)
+
+
+def total_404_count(days_start: int, days_end: int) -> int:
+    sql = f"""
+SELECT COUNT(*) AS total
+FROM `warp-data-357114.prod.stg_website_events`
+WHERE event_type = 'track'
+  AND event_name = 'docs_404'
+  AND event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL {days_end - 1} DAY)
+  AND event_date < DATE_SUB(CURRENT_DATE(), INTERVAL {days_start - 1} DAY)
+"""
+    rows = run_query(sql)
+    return int(rows[0]["total"]) if rows else 0
+
+
+def load_redirect_sources(vercel_json_path: Path) -> set[str]:
+    """Load all redirect source paths from vercel.json, normalised."""
+    if not vercel_json_path.exists():
+        # Try fetching from GitHub
+        url = "https://raw.githubusercontent.com/warpdotdev/docs/main/vercel.json"
+        try:
+            with urllib.request.urlopen(url, timeout=10) as resp:
+                data = json.loads(resp.read())
+        except Exception as e:
+            print(f"ERROR: Could not load vercel.json from disk or GitHub: {e}",
+                  file=sys.stderr)
+            sys.exit(1)
+    else:
+        with open(vercel_json_path) as f:
+            data = json.load(f)
+
+    redirects = data.get("redirects", [])
+    if len(redirects) < 500:
+        print(f"WARNING: vercel.json has only {len(redirects)} redirects — "
+              "sanity check failed (expected 500+). Data may be incomplete.",
+              file=sys.stderr)
+
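+    # Reuse normalise_url() so sources and broken URLs are compared in the same
+    # form: fragment and query are stripped before the trailing slash.
+    sources = {normalise_url(r.get("source", "")) for r in redirects}
+    sources.discard("")  # ignore entries with a missing or empty source
+    return sources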
+
+
+def normalise_url(url: str) -> str:
+    """Normalise a broken URL for comparison against vercel.json sources."""
+    if not url:
+        return ""
+    # Extract just the path (no scheme/host)
+    url = re.sub(r"^https?://[^/]+", "", url)
+    # Strip query params and fragments
+    url = url.split("?")[0].split("#")[0]
+    # Lowercase, strip trailing slash
+    url = url.lower().rstrip("/")
+    return url or "/"
+
+
+def main():
+    vercel_path = Path(os.environ.get("VERCEL_JSON_PATH", "vercel.json"))
+    report_dir = Path(os.environ.get("REPORT_DIR", "data/404-reports"))
+    today = date.today()
+
+    print(f"Running weekly 404 report for week ending {today}", file=sys.stderr)
+
+    # 1. Query current and prior week
+    print("Querying current week (past 7 days)...", file=sys.stderr)
+    current_week = query_404_events(1, 8)
+    print(f"  {len(current_week)} unique broken URLs found", file=sys.stderr)
+
+    print("Querying prior week (days 8-14)...", file=sys.stderr)
+    prior_week = query_404_events(8, 15)
+
+    total_current = total_404_count(1, 8)
+    total_prior = total_404_count(8, 15)
+
+    # 2. Load redirect sources
+    print("Loading vercel.json redirect sources...", file=sys.stderr)
+    redirect_sources = load_redirect_sources(vercel_path)
+
+    # 3. Build prior-week hit totals and gap set for delta calculation.
+    # Rows that normalise to the same path (e.g. "/a" and "/a/") are summed.
+    prior_hits: dict[str, int] = {}
+    for row in prior_week:
+        norm = normalise_url(row.get("broken_url") or "")
+        if norm:
+            hits = int(row.get("hits") or 0)
+            prior_hits[norm] = prior_hits.get(norm, 0) + hits
+    prior_gaps = {u for u in prior_hits if u not in redirect_sources}
+
+    # 4. Build current-week report
+    report_rows = []
+    for row in current_week:
+        raw_url = row.get("broken_url") or ""
+        norm = normalise_url(raw_url)
+        if not norm:
+            continue
+        hits_current = int(row.get("hits") or 0)
+        # Single dict lookup instead of rescanning prior_week for every row.
+        hits_prior = prior_hits.get(norm, 0)
+        is_covered = norm in redirect_sources
+        is_new_gap = (not is_covered) and (norm not in prior_gaps)
+
+        report_rows.append({
+            "broken_url": norm,
+            "hits_this_week": hits_current,
+            "hits_last_week": hits_prior,
+            "is_covered_by_redirect": is_covered,
+            "is_new_gap": is_new_gap,
+        })
+
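+    # 5. Compute resolved: seen last week, and now covered or no longer hit.
+    # prior_gaps already excludes anything in the current vercel.json, so
+    # "redirect added" has to be checked against every prior-week URL instead.
+    current_urls = {normalise_url(r.get("broken_url") or "") for r in current_week}
+    prior_urls = {normalise_url(r.get("broken_url") or "") for r in prior_week}
+    prior_urls.discard("")
+    # Assumption: a prior-week 404 that now has a redirect source was fixed.
+    newly_covered = prior_urls & redirect_sources
+    # Still uncovered, but produced no 404s this week.
+    traffic_stopped = prior_urls - redirect_sources - current_urls
+    resolved_count = len(newly_covered) + len(traffic_stopped)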
+
+    # 6. Write CSV artifact
+    report_dir.mkdir(parents=True, exist_ok=True)
+    csv_path = report_dir / f"404-report-{today.isoformat()}.csv"
+    with open(csv_path, "w", newline="") as f:
+        writer = csv.DictWriter(f, fieldnames=[
+            "broken_url", "hits_this_week", "hits_last_week",
+            "is_covered_by_redirect", "is_new_gap"
+        ])
+        writer.writeheader()
+        writer.writerows(report_rows)
+    print(f"Wrote CSV artifact: {csv_path}", file=sys.stderr)
+
+    # 7. Output JSON summary to stdout for the agent
+    uncovered = [r for r in report_rows if not r["is_covered_by_redirect"]]
+    new_gaps = [r for r in uncovered if r["is_new_gap"]]
+
+    summary = {
+        "report_date": today.isoformat(),
+        "total_404s_this_week": total_current,
+        "total_404s_last_week": total_prior,
+        "trend_delta": total_current - total_prior,
+        "uncovered_count": len(uncovered),
+        "new_gaps_count": len(new_gaps),
+        "resolved_count": resolved_count,
+        "top_10_uncovered": uncovered[:10],
+        "csv_path": str(csv_path),
+        "has_data": len(current_week) > 0,
+    }
+
+    print(json.dumps(summary, indent=2))
+
+
+if __name__ == "__main__":
+    main()
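A small worked example of the date windows that `query_404_events` and `total_404_count` select, in case the `days_start` / `days_end` arithmetic is hard to follow from the SQL alone. `window_bounds` is a hypothetical helper that mirrors the two `DATE_SUB` filters in pure Python; it is not part of the script, and it takes the run date as an argument where the SQL uses `CURRENT_DATE()`.

```python
from datetime import date, timedelta


def window_bounds(today: date, days_start: int, days_end: int) -> tuple[date, date]:
    """Return (first_day, last_day), both inclusive, matched by the SQL filter.

    Mirrors:
      event_date >= DATE_SUB(today, INTERVAL days_end - 1 DAY)
      event_date <  DATE_SUB(today, INTERVAL days_start - 1 DAY)
    """
    first_day = today - timedelta(days=days_end - 1)            # inclusive bound
    upper_exclusive = today - timedelta(days=days_start - 1)    # exclusive bound
    return first_day, upper_exclusive - timedelta(days=1)


run_day = date(2026, 6, 8)  # a Monday

current = window_bounds(run_day, 1, 8)
prior = window_bounds(run_day, 8, 15)

print("current week:", current[0], "to", current[1])  # 2026-06-01 to 2026-06-07
print("prior week:  ", prior[0], "to", prior[1])      # 2026-05-25 to 2026-05-31
```

Each window covers seven full days, the two windows are contiguous, and the run day itself is excluded from both.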