From 05bf1b52b3734052454765b9a1d94b5fdb4a65a1 Mon Sep 17 00:00:00 2001
From: Brendan O'Leary <brendan@olearycrew.com>
Date: Thu, 5 Mar 2026 11:30:13 -0500
Subject: [PATCH] Add plan for gathering data for public profile

---
 plans/public-profile-cache.md | 239 ++++++++++++++++++++++++++++++++++
 1 file changed, 239 insertions(+)
 create mode 100644 plans/public-profile-cache.md

diff --git a/plans/public-profile-cache.md b/plans/public-profile-cache.md
new file mode 100644
index 000000000..40ed34fd2
--- /dev/null
+++ b/plans/public-profile-cache.md
@@ -0,0 +1,239 @@
+# Public Profile: Caching Strategy for `microdollar_usage`
+
+## Problem
+
+The public profile feature needs aggregated per-user, per-feature usage data (request counts, token totals, daily activity heatmaps, streaks). The source data lives in `microdollar_usage` joined with `microdollar_usage_metadata` — a massive, high-write-volume table that cannot be queried directly for every profile view or even on a periodic cron without creating load problems.
+
+This document evaluates four caching strategies and recommends one.
+
+---
+
+## Context: Existing Patterns
+
+The codebase already has one write-time pre-aggregation pattern: `organization_user_usage`. On every insert into `microdollar_usage`, `ingestOrganizationTokenUsage()` upserts a daily per-user per-org cost summary via `ON CONFLICT DO UPDATE SET microdollar_usage += cost`. This avoids ever needing to re-aggregate `microdollar_usage` for org balance checks.
+
+The `user_period_cache` table exists in the schema with shareability columns (`shared_url_token`, `shared_at`) and a JSONB `data` column, but has zero read/write application code — only the GDPR `softDeleteUser` deletes from it.
+
+All current usage-display endpoints (`/api/profile/usage`, `user.getAutocompleteMetrics`, org usage details) query `microdollar_usage` directly with `SUM()`/`COUNT()` aggregations. These will also degrade as the table grows, independent of the public profile feature.
+
+---
+
+## Options Evaluated
+
+### Option 1: Write-time upsert into a daily summary table
+
+On every insert into `microdollar_usage`, also upsert into a new `user_feature_daily_usage` table:
+
+```sql
+INSERT INTO user_feature_daily_usage (kilo_user_id, feature_id, usage_date, request_count, total_tokens)
+VALUES ($1, $2, CURRENT_DATE, 1, $3)
+ON CONFLICT (kilo_user_id, feature_id, usage_date)
+DO UPDATE SET
+  request_count = user_feature_daily_usage.request_count + 1,
+  total_tokens = user_feature_daily_usage.total_tokens + EXCLUDED.total_tokens
+```
+
+Added as a CTE in the existing `insertUsageAndMetadataWithBalanceUpdate()` in `processUsage.ts`.
+
+**Pros:**
+
+- Proven pattern — mirrors `organization_user_usage` in the same codebase, same write path
+- Zero query cost at read time — the summary table IS the cache
+- No cron needed for freshness — always up-to-date
+- Tiny table: users x features x days (a heavy user with all 11 features active every day for a year = ~4,000 rows)
+- Heatmap, streak, and active_days calculations become trivial queries on the small table
+
+**Cons:**
+
+- Adds ~1 upsert per LLM request to the hot write path (but `organization_user_usage` already does exactly this)
+- Requires one-time backfill from existing data
+- New table + migration
+
+---
+
+### Option 2: Cron-based aggregation into `user_period_cache` only
+
+The approach from the original plan: a scheduled job queries `microdollar_usage` + `microdollar_usage_metadata` per user and writes the aggregated JSON to `user_period_cache`.
+
+**Pros:**
+
+- No changes to the write path
+- Uses existing `user_period_cache` table as-is
+
+**Cons:**
+
+- Still queries `microdollar_usage` on every refresh cycle, just less often
+- Incremental merge for `active_days` / `COUNT(DISTINCT date)` / streaks requires storing intermediate state (full date sets) in the JSONB — gets large for heavy users
+- Cron batching N users x (aggregation query + heatmap query) creates periodic load spikes on Postgres
+- The join with `microdollar_usage_metadata` is expensive — `microdollar_usage_metadata` has no index on `(id, feature_id)`
+- Scales poorly: more users opting in = longer cron runs
+
+---
+
+### Option 3: Postgres materialized view
+
+```sql
+CREATE MATERIALIZED VIEW user_feature_daily_mv AS
+SELECT
+  mu.kilo_user_id,
+  mum.feature_id,
+  mum.created_at::date AS usage_date,
+  COUNT(*) AS request_count,
+  SUM(mu.input_tokens + mu.output_tokens) AS total_tokens
+FROM microdollar_usage mu
+JOIN microdollar_usage_metadata mum ON mum.id = mu.id
+GROUP BY mu.kilo_user_id, mum.feature_id, mum.created_at::date;
+```
+
+Refreshed via `pg_cron` or Vercel cron calling `REFRESH MATERIALIZED VIEW CONCURRENTLY`.
+
+**Pros:**
+
+- No changes to the write path
+- SQL-native, no application logic for aggregation
+
+**Cons:**
+
+- `REFRESH` re-reads the entire join — this IS the full table scan problem, just scheduled
+- No incremental refresh in Postgres — the entire view is recomputed each time
+- The materialized view itself becomes large (every user x feature x day)
+- `CONCURRENTLY` requires a unique index on the view
+- Tight coupling to Postgres internals — harder to reason about in application code
+- Doesn't compose with `user_period_cache` for sharing URLs
+
+---
+
+### Option 4: Hybrid — write-time daily summary + lazy `user_period_cache` (recommended)
+
+Combines Option 1's write-time daily summary with on-demand computation for `user_period_cache`:
+
+1. **Write path**: Upsert into `user_feature_daily_usage` on every insert (same as Option 1)
+2. **Read path**: When a public profile is requested, check `user_period_cache` for a fresh entry. If stale (>6h) or missing, recompute from `user_feature_daily_usage` (fast) and write to `user_period_cache`.
+3. **No cron**: Cache is lazily populated on first access and refreshed on subsequent accesses if stale.
+
+The lazy refresh is fast because it reads from the pre-aggregated daily summary, not from `microdollar_usage`. Computing streaks, active_days, and heatmap from a few hundred rows per user takes <10ms.
+
+**Pros:**
+
+- Everything from Option 1, plus:
+- No cron job to manage
+- Only active profiles get computed — zero wasted work for profiles nobody visits
+- Cache stays warm for popular profiles
+- Stale-while-revalidate possible: serve stale cache immediately, trigger async refresh
+- `user_period_cache` still serves its purpose for sharing URLs and pre-rendered JSON
+
+**Cons:**
+
+- Same write-path cost as Option 1 (1 extra upsert per request)
+- First access after staleness window has slightly higher latency to recompute from daily summary (~50-100ms)
+- Same backfill requirement as Option 1
+
+---
+
+## Recommendation: Option 4
+
+Option 4 is the strongest choice for this codebase:
+
+1. **Completely eliminates querying `microdollar_usage` for public profiles.** The daily summary absorbs aggregation cost incrementally at write time, amortized across millions of requests. Reading ~4K rows per user per year from the summary table is effectively free.
+
+2. **Follows the existing pattern.** `organization_user_usage` already proves this upsert-on-write approach works in this codebase, in this write path, with this traffic.
+
+3. **No cron means no batch load spikes.** The cron approach concentrates N expensive queries into a burst. Lazy computation spreads load naturally with actual demand.
+
+4. **Handles inactive users for free.** If nobody visits a profile, no work happens. No need for heuristics like "stop refreshing after N days of inactivity."
+
+5. **The daily summary table is independently useful.** Beyond public profiles, it can replace the full-table scans in `/api/profile/usage` (which currently aggregates ALL user history from `microdollar_usage` per request) and `user.getAutocompleteMetrics`. This is infrastructure, not a profile-specific hack.
+
+---
+
+## Implementation Sketch
+
+### New table: `user_feature_daily_usage`
+
+```
+user_feature_daily_usage
+  kilo_user_id   text      NOT NULL
+  feature_id     integer   NOT NULL  -- FK -> feature
+  usage_date     date      NOT NULL
+  request_count  integer   NOT NULL  DEFAULT 0
+  total_tokens   bigint    NOT NULL  DEFAULT 0
+
+  UNIQUE (kilo_user_id, feature_id, usage_date)
+  INDEX  (kilo_user_id, usage_date)
+```
+
+### Write path change
+
+Add one CTE to the existing `insertUsageAndMetadataWithBalanceUpdate()` in `processUsage.ts`:
+
+```sql
+, daily_usage_upsert AS (
+  INSERT INTO user_feature_daily_usage
+    (kilo_user_id, feature_id, usage_date, request_count, total_tokens)
+  SELECT
+    $kilo_user_id,
+    (SELECT feature_id FROM feature_cte),
+    CURRENT_DATE,
+    1,
+    $input_tokens + $output_tokens
+  ON CONFLICT (kilo_user_id, feature_id, usage_date)
+  DO UPDATE SET
+    request_count = user_feature_daily_usage.request_count + 1,
+    total_tokens  = user_feature_daily_usage.total_tokens + EXCLUDED.total_tokens
+)
+```
+
+The `feature_cte` already exists in the CTE chain. Rows where `feature_id` is null (untagged usage) are excluded — the `SELECT` returns no rows if the subquery is null, so the upsert is a no-op.
+
+### Read path
+
+A function `computePublicProfileData(userId: string)` that:
+
+1. Reads all `user_feature_daily_usage` rows for the user (365-day window or full history)
+2. Joins with the `feature` table to get feature names
+3. Computes in TypeScript: per-feature stats, heatmap, streaks, active_days, totals
+4. Writes the result to `user_period_cache` with `cache_type = 'public_profile'`
+
+The `publicProfile.get` tRPC procedure:
+
+1. Looks up `user_period_cache` by `shared_url_token`
+2. If fresh (computed_at within 6h), returns `data` directly
+3. If stale or missing, calls `computePublicProfileData()`, then returns the result
+
+### Backfill
+
+A one-time migration script that aggregates existing `microdollar_usage` + `microdollar_usage_metadata` into `user_feature_daily_usage`. Process in date-range batches (e.g., one month at a time) to avoid long-running transactions:
+
+```sql
+INSERT INTO user_feature_daily_usage (kilo_user_id, feature_id, usage_date, request_count, total_tokens)
+SELECT
+  mu.kilo_user_id,
+  mum.feature_id,
+  mum.created_at::date,
+  COUNT(*),
+  SUM(mu.input_tokens + mu.output_tokens)
+FROM microdollar_usage mu
+JOIN microdollar_usage_metadata mum ON mum.id = mu.id
+WHERE mum.feature_id IS NOT NULL
+  AND mum.created_at >= $batch_start
+  AND mum.created_at < $batch_end
+GROUP BY mu.kilo_user_id, mum.feature_id, mum.created_at::date
+ON CONFLICT (kilo_user_id, feature_id, usage_date)
+DO UPDATE SET
+  request_count = user_feature_daily_usage.request_count + EXCLUDED.request_count,
+  total_tokens  = user_feature_daily_usage.total_tokens + EXCLUDED.total_tokens;
+```
+
+### GDPR
+
+`user_feature_daily_usage` contains `kilo_user_id`, so `softDeleteUser` in `src/lib/user.ts` must delete from it:
+
+```typescript
+await tx.delete(user_feature_daily_usage).where(eq(user_feature_daily_usage.kilo_user_id, userId));
+```
+
+Add a corresponding test in `src/lib/user.test.ts`.
+
+### Nullable `feature_id`
+
+`microdollar_usage_metadata.feature_id` is nullable. Rows without a `feature_id` are excluded from the daily summary (the CTE `SELECT` returns no rows when the feature subquery is null). These are likely old records from before feature tracking was added and should not appear on public profiles.