139 changes: 139 additions & 0 deletions .agents/skills/weekly-404-monitor/SKILL.md
@@ -0,0 +1,139 @@
---
name: weekly-404-monitor
description: Weekly recurring agent that surfaces broken docs.warp.dev URLs by querying the docs_404 Rudderstack track event, diffing against existing vercel.json redirects, and posting a summary to Slack. Use for the Monday 9am PT scheduled Oz agent that monitors 404 gaps and supports the ongoing redirect-fix workflow.
---

# Weekly 404 monitor

Runs every Monday at 9am PT. Identifies new broken URL patterns on docs.warp.dev, surfaces the top uncovered paths, and posts a concise Slack summary so the docs team can prioritize redirect additions.

## Prerequisites

The following environment secrets must be set in the Oz cloud agent environment:

- `METABASE_API_KEY` — Metabase API key for BigQuery queries. If unavailable, the run must fail fast with a clear error.
- `SLACK_BOT_TOKEN` — Slack bot token for posting to the docs channel. If unavailable, write a no-post report to the run output instead.
- `SLACK_CHANNEL_ID` — Slack channel ID for **`#growth-docs`**. Find it in Slack by right-clicking the channel → Copy link (the ID begins with `C`). There is no fallback — the run will skip Slack posting if this is unset.

Do NOT print, log, or include secret values in reports, commits, or Slack messages.

## Workflow

### 1. Query docs_404 events

Run `python3 .agents/skills/weekly-404-monitor/run_404_report.py` in the docs repo.

The script:
- Queries `warp-data-357114.prod.stg_website_events` via the Metabase API
- Extracts `broken_url` from `event_properties` for all `event_name = 'docs_404'` events in the past 7 days
- Groups by `broken_url`, sorted by hit count descending
- Returns a ranked list of broken URLs and their hit counts for the current week
- Computes the same for the prior week (days 8–14) for trend comparison
- Computes the total 404 count for the current week and for the prior week, which feeds the trend line

### 2. Fetch current vercel.json redirect sources

Fetch `vercel.json` from the docs repo (already checked out locally in the cloud environment, or via GitHub raw URL `https://raw.githubusercontent.com/warpdotdev/docs/main/vercel.json`).

Extract all `source` values from the `redirects` array. Normalise: lowercase, strip trailing slashes and anchor fragments.

### 3. Find uncovered URLs

For each broken URL in the current week's data:
- Normalise (lowercase, strip trailing slash, strip query params and fragments)
- Check if it exists as a `source` in `vercel.json` redirects
- If not covered, it is a **gap**
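
The gap check in this step can be sketched as follows. This is a minimal illustration of the normalisation rules listed above, not the script's exact code; the helper names `normalise` and `find_gaps` and the sample paths are hypothetical.

```python
import re


def normalise(url: str) -> str:
    """Lowercase a URL and drop host, query string, fragment, and trailing slash."""
    path = re.sub(r"^https?://[^/]+", "", url)
    path = path.split("?")[0].split("#")[0]
    return path.lower().rstrip("/") or "/"


def find_gaps(broken_urls: list[str], redirect_sources: set[str]) -> list[str]:
    """Return normalised broken URLs that have no exact-match redirect source."""
    gaps: list[str] = []
    for url in broken_urls:
        norm = normalise(url)
        if norm not in redirect_sources and norm not in gaps:
            gaps.append(norm)
    return gaps


# Hypothetical sample data: one covered path, one gap.
sources = {"/old/path"}
print(find_gaps(["/Old/Path/?utm=1", "/missing/page#intro"], sources))
# prints ['/missing/page']
```

Note that exact matching treats pattern-style sources (for example a source containing `:path*`) as literal strings, so a URL covered only by a pattern would still be reported as a gap in this sketch.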

### 4. Compute delta vs prior week

Compare this week's uncovered gaps against last week's uncovered gaps (from step 1 prior-week query).

- **New gaps**: uncovered this week AND not seen as uncovered last week.
- **Resolved**: uncovered last week AND now either covered (has redirect) or no longer generating 404s.
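
Expressed as set arithmetic, the two definitions reduce to set differences. The sketch below assumes `prior_gaps` was computed against the redirect list as it stood last week; the function name `compute_delta` and the sample paths are hypothetical.

```python
def compute_delta(current_gaps: set[str], prior_gaps: set[str]) -> dict[str, set[str]]:
    """Classify gap URLs by comparing this week's gaps with last week's."""
    return {
        # Uncovered this week and not uncovered last week.
        "new": current_gaps - prior_gaps,
        # Uncovered last week, but now covered or no longer generating 404s.
        "resolved": prior_gaps - current_gaps,
        # Uncovered in both weeks.
        "persisting": current_gaps & prior_gaps,
    }


delta = compute_delta({"/a", "/b"}, {"/b", "/c"})
print(sorted(delta["new"]), sorted(delta["resolved"]))
# prints ['/a'] ['/c']
```

A URL leaves `current_gaps` either because a redirect now covers it or because it produced no hits this week, which is why a single difference captures both "resolved" cases.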

### 5. Post Slack summary

Post a Slack message using the Block Kit format defined in the "Slack message format" section below.

If `SLACK_BOT_TOKEN` is unavailable, write the full Slack message body to the run output instead and note that Slack posting was skipped.
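
One way to implement the post-or-fallback behaviour is sketched below using only the standard library and Slack's `chat.postMessage` Web API method. The function names `build_slack_request` and `post_or_print` are hypothetical, and the single `section` block is a simplification of the full Block Kit layout.

```python
import json
import os
import urllib.request

SLACK_POST_URL = "https://slack.com/api/chat.postMessage"


def build_slack_request(token: str, channel: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a chat.postMessage request with one mrkdwn section."""
    payload = {
        "channel": channel,
        "text": text,  # plain-text fallback for notifications
        "blocks": [{"type": "section", "text": {"type": "mrkdwn", "text": text}}],
    }
    return urllib.request.Request(
        SLACK_POST_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


def post_or_print(text: str) -> bool:
    """Post to Slack if credentials are set; otherwise print the body and return False."""
    token = os.environ.get("SLACK_BOT_TOKEN")
    channel = os.environ.get("SLACK_CHANNEL_ID")
    if not token or not channel:
        print("Slack posting skipped (token or channel ID not set). Message body:")
        print(text)
        return False
    with urllib.request.urlopen(build_slack_request(token, channel, text), timeout=30) as resp:
        body = json.loads(resp.read())
    if not body.get("ok"):
        # Slack reports API-level failures in the JSON body, even on HTTP 200.
        raise RuntimeError(f"Slack API error: {body.get('error')}")
    return True
```

Keeping request construction separate from sending makes the payload easy to inspect in a dry run without exposing the token in logs.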

### 6. Write CSV artifact

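The step 1 script writes `404-report-YYYY-MM-DD.csv` to `data/404-reports/` in the docs repo working directory (set `REPORT_DIR` to change the location). Confirm the file exists before posting to Slack. Format: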

```
broken_url,hits_this_week,hits_last_week,is_covered_by_redirect,is_new_gap
/old/path,42,0,false,true
/another/path,18,22,false,false
```

Do NOT commit this file to the repo. It is an Oz run artifact only — readable from the Oz web app Runs page.

## Slack message format

Use Slack Block Kit. The message should be scannable in under 30 seconds.

```
📊 *docs.warp.dev 404 Report* — week of {YYYY-MM-DD}

*Total 404s this week:* {N} ({+N / -N vs last week})
*Uncovered broken URLs:* {M} ({+N new this week})

*Top 10 uncovered URLs (by hits):*
{hit_count} `/path` {🆕 if new this week}
...

*{K} resolved since last week* (redirect added or traffic stopped)

→ Add missing redirects: `vercel.json` › `redirects` array (PR against `main`)
→ Full breakdown: {oz_run_url}
```

Rules:
- Cap the list at 10 entries. If there are more, note "and N more — see full CSV in the run."
- Mark new gaps with 🆕.
- If total 404s this week is less than 50, add a brief positive note: "404 volume is low — good signal that redirect coverage is working."
- Never include raw user data (e.g. query strings with user IDs, tokens) in the Slack message. Strip query params from broken_url before displaying.
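
The rules above could be applied with a small formatter like the one below. It is a sketch only: the function name `format_uncovered_section` is hypothetical, and the row keys are assumed to match the CSV columns (`broken_url`, `hits_this_week`, `is_new_gap`).

```python
MAX_LISTED = 10
LOW_VOLUME_THRESHOLD = 50


def format_uncovered_section(uncovered: list[dict], total_404s: int) -> list[str]:
    """Build the 'top uncovered URLs' lines of the Slack summary."""
    lines = [f"*Top {MAX_LISTED} uncovered URLs (by hits):*"]
    for row in uncovered[:MAX_LISTED]:
        # Drop any query string so user IDs or tokens never reach Slack.
        path = row["broken_url"].split("?")[0]
        marker = " 🆕" if row["is_new_gap"] else ""
        lines.append(f"{row['hits_this_week']} `{path}`{marker}")
    remaining = len(uncovered) - MAX_LISTED
    if remaining > 0:
        # \u2014 is the dash used in the message template.
        lines.append(f"and {remaining} more \u2014 see full CSV in the run.")
    if total_404s < LOW_VOLUME_THRESHOLD:
        lines.append("404 volume is low \u2014 good signal that redirect coverage is working.")
    return lines
```

The input list is assumed to be sorted by hit count already, as the ranked query output is.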

## Self-review before posting

Before posting to Slack, verify:
- The `docs_404` event exists in `stg_website_events` for the query window. If the table has no rows for `event_name = 'docs_404'`, it means PR #191 has not been live long enough to collect data. Post a clear "no data yet" message to Slack and end the run.
- The Metabase query completed successfully (HTTP 200, no `error` field in the response body).
- The `broken_url` field was present in the event properties for at least some rows. If it is consistently null, the `docs_404` tracking implementation has a bug — report it in the Slack message and tag the docs team.
- The vercel.json redirect list was loaded successfully and contains more than 500 entries (sanity check that the file is not truncated).
- The CSV artifact was written before posting to Slack.

## No-data report

If `stg_website_events` returns 0 rows for `event_name = 'docs_404'` in the past 7 days, post this Slack message:

```
⏳ *docs.warp.dev 404 Report* — week of {YYYY-MM-DD}

No `docs_404` events recorded yet. This is expected if PR #191 (404 instrumentation) has been live for less than a week, or if the Rudderstack write key is not set in the Vercel environment.

Check: Vercel project env vars include `PUBLIC_RUDDERSTACK_WRITE_KEY` and `PUBLIC_RUDDERSTACK_DATA_PLANE_URL`.
```

## Failure handling

- If the Metabase query fails (non-200, timeout, or query error), post a brief failure notice to Slack, include the error message, and end the run with a non-zero exit code.
- Do NOT silently swallow errors or post incomplete data as if it were complete.
- Log all HTTP requests and responses to stdout for debugging via the Oz run log viewer.
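
Because request logging must not leak secrets, headers can be redacted before they are printed. The following is a minimal sketch; the helper names and the set of sensitive header names are assumptions, not part of the script.

```python
# Header names treated as secret in this sketch (compared case-insensitively).
SENSITIVE_HEADERS = {"authorization", "x-api-key", "cookie"}


def redact_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of the headers with secret values masked."""
    return {
        name: "[REDACTED]" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }


def log_request(method: str, url: str, headers: dict[str, str]) -> str:
    """Print one log line for an outgoing request and return it."""
    line = f"{method} {url} headers={redact_headers(headers)}"
    print(line)
    return line


log_request("POST", "https://example.com/api", {"X-API-Key": "secret", "Content-Type": "application/json"})
```

Response bodies may also echo sensitive values, so the same care applies before logging them.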

## Scheduling

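This skill is designed for an Oz scheduled agent with a weekly cron trigger: `0 17 * * 1` in UTC, which is every Monday at 9am Pacific Standard Time. Because the cron expression is fixed in UTC, the run fires at 10am local time while Pacific Daylight Time is in effect.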
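
To see how a fixed 17:00 UTC trigger maps to Pacific local time across the year, the conversion can be checked with the standard-library `zoneinfo` module. This sketch assumes the host has IANA time zone data available; the helper name `pacific_hour_for_utc` is hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")


def pacific_hour_for_utc(year: int, month: int, day: int, utc_hour: int = 17) -> int:
    """Return the Pacific local hour for a given UTC hour on a given date."""
    utc_time = datetime(year, month, day, utc_hour, tzinfo=timezone.utc)
    return utc_time.astimezone(PACIFIC).hour


print(pacific_hour_for_utc(2025, 1, 6))  # a Monday during standard time: prints 9
print(pacific_hour_for_utc(2025, 7, 7))  # a Monday during daylight time: prints 10
```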

To deploy:
1. Push this skill to `main` in the docs repo.
2. Verify the **`buzz`** Oz environment (oz.warp.dev → Environments) has these secrets set:
   - `METABASE_API_KEY`: Metabase API key for BigQuery
   - `SLACK_BOT_TOKEN`: Slack bot token
   - `SLACK_CHANNEL_ID`: ID for `#growth-docs` (right-click channel in Slack → Copy link; the ID starts with `C`)
3. In the Oz web app (oz.warp.dev), create a new scheduled agent:
   - **Skill**: `weekly-404-monitor` from `warpdotdev/docs`
   - **Schedule**: `0 17 * * 1` (UTC) = Mondays at 9am PST (10am during PDT)
   - **Environment**: `buzz` (already has `warpdotdev/docs` checked out)
   - **Branch**: `main`
251 changes: 251 additions & 0 deletions .agents/skills/weekly-404-monitor/run_404_report.py
@@ -0,0 +1,251 @@
#!/usr/bin/env python3
"""
Weekly 404 monitor — data collection script.

Queries the docs_404 Rudderstack track event from stg_website_events via
the Metabase API, diffs against vercel.json redirect sources, and writes:
- JSON report to stdout (for the agent to parse)
- CSV artifact to data/404-reports/404-report-YYYY-MM-DD.csv

Usage (called by the weekly-404-monitor skill):
python3 .agents/skills/weekly-404-monitor/run_404_report.py

Required env vars:
METABASE_API_KEY — Metabase API key

Optional env vars:
VERCEL_JSON_PATH — Path to vercel.json (default: ./vercel.json)
REPORT_DIR — Output directory for CSV artifacts (default: ./data/404-reports)
"""

import csv
import json
import os
import re
import sys
import urllib.error
import urllib.request
from datetime import date, timedelta
from pathlib import Path


BASE = "https://warp.metabaseapp.com/api"
DB_ID = 2 # BigQuery prod


def metabase_headers():
    key = os.environ.get("METABASE_API_KEY")
    if not key:
        print("ERROR: METABASE_API_KEY is not set.", file=sys.stderr)
        sys.exit(1)
    return {"X-API-Key": key, "Content-Type": "application/json"}


def run_query(sql: str) -> list[dict]:
    """Execute a BigQuery SQL query via the Metabase /dataset endpoint."""
    headers = metabase_headers()
    body = json.dumps({
        "database": DB_ID,
        "type": "native",
        "native": {"query": sql},
    }).encode()
    req = urllib.request.Request(f"{BASE}/dataset", data=body, headers=headers)
    try:
        with urllib.request.urlopen(req, timeout=120) as resp:
            result = json.loads(resp.read())
    except urllib.error.HTTPError as e:
        print(f"ERROR: Metabase query failed: HTTP {e.code}: {e.read().decode()[:500]}",
              file=sys.stderr)
        sys.exit(1)
    except OSError as e:
        # Covers URLError and socket timeouts, so network failures also fail the run.
        print(f"ERROR: Metabase request failed: {e}", file=sys.stderr)
        sys.exit(1)

    if result.get("error"):
        print(f"ERROR: Metabase query error: {result['error']}", file=sys.stderr)
        sys.exit(1)

    # Convert the column/row arrays into a list of dicts keyed by column name.
    data = result.get("data", {})
    cols = [c["name"] for c in data.get("cols", [])]
    rows = data.get("rows", [])
    return [dict(zip(cols, row)) for row in rows]


def query_404_events(days_start: int, days_end: int) -> list[dict]:
    """
    Return broken_url counts for the window [days_start, days_end) days ago.
        days_start=1, days_end=8  → past 7 days (current week)
        days_start=8, days_end=15 → 8-14 days ago (prior week)
    """
    sql = f"""
    SELECT
      REGEXP_REPLACE(
        SPLIT(JSON_VALUE(event_properties, '$.broken_url'), '?')[OFFSET(0)],
        r'#.*$', ''
      ) AS broken_url,
      COUNT(*) AS hits
    FROM `warp-data-357114.prod.stg_website_events`
    WHERE event_type = 'track'
      AND event_name = 'docs_404'
      AND JSON_VALUE(event_properties, '$.broken_url') IS NOT NULL
      AND event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL {days_end - 1} DAY)
      AND event_date < DATE_SUB(CURRENT_DATE(), INTERVAL {days_start - 1} DAY)
    GROUP BY 1
    HAVING broken_url IS NOT NULL AND broken_url != ''
    ORDER BY 2 DESC
    LIMIT 500
    """
    return run_query(sql)


def total_404_count(days_start: int, days_end: int) -> int:
    sql = f"""
    SELECT COUNT(*) AS total
    FROM `warp-data-357114.prod.stg_website_events`
    WHERE event_type = 'track'
      AND event_name = 'docs_404'
      AND event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL {days_end - 1} DAY)
      AND event_date < DATE_SUB(CURRENT_DATE(), INTERVAL {days_start - 1} DAY)
    """
    rows = run_query(sql)
    return int(rows[0]["total"]) if rows else 0


def load_redirect_sources(vercel_json_path: Path) -> set[str]:
    """Load all redirect source paths from vercel.json, normalised."""
    if not vercel_json_path.exists():
        # Try fetching from GitHub
        url = "https://raw.githubusercontent.com/warpdotdev/docs/main/vercel.json"
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                data = json.loads(resp.read())
        except Exception as e:
            print(f"ERROR: Could not load vercel.json from disk or GitHub: {e}",
                  file=sys.stderr)
            sys.exit(1)
    else:
        with open(vercel_json_path) as f:
            data = json.load(f)

    redirects = data.get("redirects", [])
    if len(redirects) < 500:
        print(f"WARNING: vercel.json has only {len(redirects)} redirects; "
              "sanity check failed (expected 500+). Data may be incomplete.",
              file=sys.stderr)

    sources = set()
    for r in redirects:
        src = r.get("source", "").lower().rstrip("/").split("#")[0].split("?")[0]
        sources.add(src)
    return sources


def normalise_url(url: str) -> str:
    """Normalise a broken URL for comparison against vercel.json sources."""
    if not url:
        return ""
    # Extract just the path (no scheme/host)
    url = re.sub(r"^https?://[^/]+", "", url)
    # Strip query params and fragments
    url = url.split("?")[0].split("#")[0]
    # Lowercase, strip trailing slash
    url = url.lower().rstrip("/")
    return url or "/"


def main():
    vercel_path = Path(os.environ.get("VERCEL_JSON_PATH", "vercel.json"))
    report_dir = Path(os.environ.get("REPORT_DIR", "data/404-reports"))
    today = date.today()

    print(f"Running weekly 404 report for week ending {today}", file=sys.stderr)

    # 1. Query current and prior week
    print("Querying current week (past 7 days)...", file=sys.stderr)
    current_week = query_404_events(1, 8)
    print(f"  {len(current_week)} unique broken URLs found", file=sys.stderr)

    print("Querying prior week (days 8-14)...", file=sys.stderr)
    prior_week = query_404_events(8, 15)

    total_current = total_404_count(1, 8)
    total_prior = total_404_count(8, 15)

    # 2. Load redirect sources
    print("Loading vercel.json redirect sources...", file=sys.stderr)
    redirect_sources = load_redirect_sources(vercel_path)

    # 3. Build prior-week gap set for delta calculation
    prior_gaps: set[str] = set()
    for row in prior_week:
        norm = normalise_url(row["broken_url"])
        if norm and norm not in redirect_sources:
            prior_gaps.add(norm)

    # 4. Build current-week report
    report_rows = []
    for row in current_week:
        raw_url = row.get("broken_url") or ""
        norm = normalise_url(raw_url)
        if not norm:
            continue
        hits_current = int(row.get("hits") or 0)
        hits_prior = next(
            (int(r["hits"]) for r in prior_week
             if normalise_url(r.get("broken_url") or "") == norm),
            0
        )
        is_covered = norm in redirect_sources
        is_new_gap = (not is_covered) and (norm not in prior_gaps)

        report_rows.append({
            "broken_url": norm,
            "hits_this_week": hits_current,
            "hits_last_week": hits_prior,
            "is_covered_by_redirect": is_covered,
            "is_new_gap": is_new_gap,
        })

    # 5. Compute resolved (was a gap last week, is no longer generating hits)
    current_urls = {normalise_url(r["broken_url"]) for r in current_week}
    # NOTE: prior_gaps is built against the *current* redirect_sources, so every
    # member is by construction not in redirect_sources and this set is always
    # empty. Detecting newly added redirects would need last week's vercel.json.
    newly_covered = {
        g for g in prior_gaps
        if g in redirect_sources  # redirect was added
    }
    traffic_stopped = {
        g for g in prior_gaps
        if g not in current_urls and g not in redirect_sources  # stopped naturally
    }
    resolved_count = len(newly_covered) + len(traffic_stopped)

    # 6. Write CSV artifact
    report_dir.mkdir(parents=True, exist_ok=True)
    csv_path = report_dir / f"404-report-{today.isoformat()}.csv"
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=[
            "broken_url", "hits_this_week", "hits_last_week",
            "is_covered_by_redirect", "is_new_gap"
        ])
        writer.writeheader()
        writer.writerows(report_rows)
    print(f"Wrote CSV artifact: {csv_path}", file=sys.stderr)

    # 7. Output JSON summary to stdout for the agent
    uncovered = [r for r in report_rows if not r["is_covered_by_redirect"]]
    new_gaps = [r for r in uncovered if r["is_new_gap"]]

    summary = {
        "report_date": today.isoformat(),
        "total_404s_this_week": total_current,
        "total_404s_last_week": total_prior,
        "trend_delta": total_current - total_prior,
        "uncovered_count": len(uncovered),
        "new_gaps_count": len(new_gaps),
        "resolved_count": resolved_count,
        "top_10_uncovered": uncovered[:10],
        "csv_path": str(csv_path),
        "has_data": len(current_week) > 0,
    }

    print(json.dumps(summary, indent=2))


if __name__ == "__main__":
    main()