Add climate news seed and ListClimateNews RPC #2532

Merged
koala73 merged 5 commits into main from feat/climate-add-climate-news-intelligence
Apr 2, 2026

Conversation

Collaborator

@FayezBast FayezBast commented Mar 30, 2026

Summary

Adds seed-climate-news.mjs to aggregate 9 authoritative climate/environment RSS feeds into climate:news-intelligence:v1, wires a 30-minute relay seed loop, and exposes the data through a new ListClimateNews climate proto RPC and server handler.

Not included in this PR:

  • MCP get_climate_data expansion to include climate:news-intelligence:v1
  • Bootstrap hydration registration for the new climate news key

Fixes #2469
Fixes #2560

Type of change

  • New feature
  • New data source / feed
  • Documentation

Affected areas

  • News panels / RSS feeds
  • API endpoints (/api/*)
  • Other: Climate proto/service seed pipeline

mintlify Bot commented Mar 30, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

Project Status Preview Updated (UTC)
WorldMonitor 🟢 Ready View Preview Mar 30, 2026, 1:36 AM

vercel Bot commented Mar 30, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
worldmonitor Ready Ready Preview, Comment Apr 2, 2026 4:54am

Contributor

greptile-apps Bot commented Mar 30, 2026

Greptile Summary

This PR wires a full climate news intelligence pipeline: 9 RSS feeds are aggregated every 30 minutes by a new seed-climate-news.mjs script (spawned from the ais-relay.cjs relay loop), stored in Redis under climate:news-intelligence:v1, and exposed via a new ListClimateNews gRPC-over-HTTP endpoint backed by proto definitions and a generated TypeScript server/client. The change follows the established seed → Redis → RPC handler pattern used throughout the codebase.

Key findings:

  • P1 — intervalMin mismatch in seed-health.js: The health monitoring entry records intervalMin: 45 while the relay loop fires every 30 minutes. This will cause the seed-health endpoint to report incorrect staleness status. health.js already uses the correct 3× 30-minute window (maxStaleMin: 90), making the two health files inconsistent.
  • P2 — Missing bootstrap hydration: Per AGENTS.md, new data sources must be wired into api/bootstrap.js. The PR author explicitly defers this, but the gap means the first cold-cache page load will not benefit from pre-fetched data, and the convention is left untracked.
  • P2 — Atom fallback scope: The <entry> fallback in parseRssItems triggers only when zero <item> elements were found across all 9 feeds combined, not per-feed. Future Atom-only feeds added to FEEDS would silently return no items.
  • P2 — CDATA regex edge case: extractTag uses optional CDATA markers, so a tag whose content includes a literal ]]> substring would be silently truncated. Rare in well-formed RSS, but worth hardening.

Confidence Score: 4/5

Safe to merge after fixing the intervalMin mismatch in seed-health.js; everything else is minor.

One P1 issue: seed-health.js records intervalMin: 45 while the relay fires every 30 minutes, causing incorrect staleness monitoring. All remaining findings are P2 and do not block correctness of the happy path.

api/seed-health.js: intervalMin should be 30 to match the relay loop.

Important Files Changed

Filename Overview
scripts/seed-climate-news.mjs New seed script fetching 9 RSS feeds into climate:news-intelligence:v1; Atom fallback is per-batch (not per-feed) and the CDATA regex has a truncation edge-case.
api/seed-health.js Registers climate:news-intelligence health entry with intervalMin: 45, but the relay loop fires every 30 minutes — creates incorrect staleness monitoring.
server/worldmonitor/climate/v1/list-climate-news.ts Clean read-through RPC handler using getCachedJson; correctly falls back to empty response on cache miss or error.
scripts/ais-relay.cjs Adds 30-minute climate news seed loop delegating to the standalone .mjs script via execFile; in-flight guard prevents concurrent runs.
api/health.js Adds climateNews to both STANDALONE_KEYS and SEED_META with maxStaleMin: 90 (3× the 30-min interval) — correct.
server/gateway.ts Registers the new route at cache tier slow (30 min) — consistent with the 30-minute seed interval.
proto/worldmonitor/climate/v1/climate_news_item.proto New proto message for ClimateNewsItem; int64 fields correctly annotated with INT64_ENCODING_NUMBER.
proto/worldmonitor/climate/v1/service.proto Adds ListClimateNews RPC to ClimateService with correct HTTP GET annotation.
src/generated/server/worldmonitor/climate/v1/service_server.ts Generated server stub routes GET /api/climate/v1/list-climate-news to handler.listClimateNews — consistent with other routes.

Sequence Diagram

sequenceDiagram
    participant Relay as ais-relay.cjs (Railway)
    participant Seed as seed-climate-news.mjs
    participant RSS as RSS Feeds (x9)
    participant Redis as Upstash Redis
    participant Edge as Vercel Edge Function
    participant Client as Browser Client

    loop Every 30 minutes
        Relay->>Seed: execFile(node, seed-climate-news.mjs)
        Seed->>RSS: fetch() x9 feeds (15s timeout each)
        RSS-->>Seed: XML responses
        Seed->>Seed: parseRssItems() / dedup / sort
        Seed->>Redis: SET climate:news-intelligence:v1 (TTL 1800s)
        Seed->>Redis: SET seed-meta:climate:news-intelligence
        Seed-->>Relay: exit 0
    end

    Client->>Edge: GET /api/climate/v1/list-climate-news
    Edge->>Redis: getCachedJson(climate:news-intelligence:v1)
    Redis-->>Edge: { items[], fetchedAt }
    Edge-->>Client: ListClimateNewsResponse (JSON)
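
The read path at the bottom of the diagram can be sketched as a read-through handler that falls back to an empty response on a cache miss or error. This is a minimal sketch, not the PR's handler: `getCachedJson` is injected here so the example runs without Redis, and the empty-response shape (`items: []`, `fetchedAt: 0`) is an assumption based on the `{ items[], fetchedAt }` payload shown in the diagram.

```javascript
// Key name taken from the PR; everything else in this block is illustrative.
const CLIMATE_NEWS_KEY = 'climate:news-intelligence:v1';

// Assumed shape of an empty response (hypothetical, not confirmed by the PR).
function emptyResponse() {
  return { items: [], fetchedAt: 0 };
}

// `getCachedJson` is passed in as a parameter (an assumption made for testability).
async function listClimateNews(getCachedJson) {
  try {
    const cached = await getCachedJson(CLIMATE_NEWS_KEY);
    // Cache miss or malformed payload: return the empty response.
    if (!cached || !Array.isArray(cached.items)) return emptyResponse();
    return { items: cached.items, fetchedAt: Number(cached.fetchedAt) || 0 };
  } catch {
    // Redis or parse failure: degrade to empty rather than surfacing an error.
    return emptyResponse();
  }
}
```

Because the handler never throws, the client always receives a well-formed response and can treat an empty `items` array as "no data yet".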

Comments Outside Diff (2)

  1. api/seed-health.js, line 29 (link)

    P1 intervalMin mismatch with actual relay loop

    The entry records intervalMin: 45, but ais-relay.cjs sets the climate news relay to fire every 30 minutes. The mismatch means the seed-health endpoint computes staleness on a 45-minute cadence while updates actually arrive on a 30-minute cadence — potentially producing incorrect "overdue" status in health monitoring.

    For reference, health.js correctly uses maxStaleMin: 90 (3× the 30-minute interval), making the two health files internally inconsistent with each other.

  2. api/bootstrap.js, line 9-80 (link)

    P2 Missing bootstrap hydration — violates AGENTS.md convention

    AGENTS.md documents an explicit rule:

    New data sources MUST have bootstrap hydration wired in api/bootstrap.js

    The climate:news-intelligence:v1 key is not registered in BOOTSTRAP_CACHE_KEYS. The PR description acknowledges this is intentionally deferred, but it means the first cold-cache page load will block on the RPC call rather than using pre-fetched data, and the convention is left untracked without a follow-up ticket reference.

    Context Used: AGENTS.md (source)

Reviews (1): Last reviewed commit: "Add climate news seed and ListClimateNew..."

Comment thread scripts/seed-climate-news.mjs Outdated
Comment on lines +92 to +110
if (items.length === 0) {
  const entryRe = /<entry\b[^>]*>([\s\S]*?)<\/entry>/gi;
  while ((match = entryRe.exec(bounded)) !== null) {
    const block = match[1];
    const title = decodeHtmlEntities(extractTag(block, 'title'));
    const url = extractLink(block);
    const publishedAt = parseDateMs(block);
    const rawSummary = extractTag(block, 'summary') || extractTag(block, 'content');
    if (!title || !url || !publishedAt) continue;
    items.push({
      id: `${stableHash(url)}-${publishedAt}`,
      title,
      url,
      sourceName,
      publishedAt,
      summary: cleanSummary(rawSummary),
    });
  }
}

P2 Atom fallback only fires when all RSS <item> blocks return zero items across the entire feed

The Atom <entry> fallback is gated on if (items.length === 0). This means it is only attempted when the entire feed contained no <item> elements. For feeds that publish a mix of both (unusual but valid), Atom entries would be silently dropped. More practically, all 9 feeds in FEEDS are standard RSS feeds, so this isn't an immediate bug — but any future feed addition that uses Atom will silently return 0 items unless this gate is removed.

Consider restructuring to always attempt both parsers and merge results (deduplicating on id), or at minimum add a comment explaining that the fallback is per-feed, not per-batch.
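
The "always attempt both parsers and merge" option suggested above might look like the following. It is a minimal sketch under stated assumptions: `collectBlocks` and `mergeById` are hypothetical helper names that do not appear in the PR, and the per-block field extraction is left to the existing code.

```javascript
// Collect the inner XML of every RSS <item> and every Atom <entry> in one feed.
// Hypothetical helper: the PR's parseRssItems handles the two cases separately.
function collectBlocks(xml) {
  const patterns = [
    /<item\b[^>]*>([\s\S]*?)<\/item>/gi,
    /<entry\b[^>]*>([\s\S]*?)<\/entry>/gi,
  ];
  const blocks = [];
  for (const re of patterns) {
    for (const match of xml.matchAll(re)) blocks.push(match[1]);
  }
  return blocks;
}

// Deduplicate parsed items on `id`, keeping the first occurrence of each.
function mergeById(items) {
  const seen = new Map();
  for (const item of items) {
    if (!seen.has(item.id)) seen.set(item.id, item);
  }
  return [...seen.values()];
}
```

With this shape there is no `items.length === 0` gate, so an Atom-only feed and a mixed feed go through the same path.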

Comment on lines +40 to +43
function extractTag(block, tagName) {
  const re = new RegExp(`<${tagName}[^>]*>(?:<!\\[CDATA\\[)?([\\s\\S]*?)(?:\\]\\]>)?<\\/${tagName}>`, 'i');
  return (block.match(re) || [])[1]?.trim() || '';
}

P2 extractTag regex mishandles CDATA sections with embedded ]]> content

The current regex makes the CDATA start/end markers optional via (?:..)?:

<${tagName}[^>]*>(?:<!\[CDATA\[)?([\s\S]*?)(?:\]\]>)?<\/${tagName}>

Because both the opening <![CDATA[ and closing ]]> markers are optional and independent, a tag such as <title><![CDATA[Breaking news ]]> & more]]></title> will capture only Breaking news (stopping at the first ]]>), silently truncating the title. The fix is to use two separate branches — one for CDATA content and one for plain text — rather than making both markers optional.
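
One way to express the two-branch idea is to try a CDATA-only pattern first and fall back to a plain-text pattern. The sketch below is an illustration of that suggestion, not the code that was merged; it keeps the `extractTag(block, tagName)` signature from the diff, and the allowance for whitespace around the CDATA section is an added assumption.

```javascript
function extractTag(block, tagName) {
  // Branch 1: the element body is a single CDATA section (whitespace around it is tolerated).
  const cdataRe = new RegExp(
    `<${tagName}\\b[^>]*>\\s*<!\\[CDATA\\[([\\s\\S]*?)\\]\\]>\\s*<\\/${tagName}>`,
    'i',
  );
  // Branch 2: plain text content, used only when branch 1 does not match.
  const plainRe = new RegExp(`<${tagName}\\b[^>]*>([\\s\\S]*?)<\\/${tagName}>`, 'i');

  const match = block.match(cdataRe) || block.match(plainRe);
  return match ? match[1].trim() : '';
}
```

In the CDATA branch the closing `]]>` must be followed directly by the closing tag, so an earlier `]]>` inside the content does not end the capture.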

Collaborator Author

FayezBast commented Mar 30, 2026

P1 overclaims the impact.
The configured intervalMin in seed-health.js is wrong,
but the effective stale threshold still matches health.js at 90 minutes (45 × 2).
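
The arithmetic behind this reply can be checked with a short sketch. It assumes, based only on the "(45*2)" remark, that seed-health.js derives its stale threshold as twice intervalMin; `STALE_MULTIPLIER`, `staleThresholdMin`, and `isStale` are hypothetical names used for illustration.

```javascript
// Assumption: threshold = 2 x intervalMin, inferred from the "(45*2)" remark above.
const STALE_MULTIPLIER = 2;

function staleThresholdMin(intervalMin) {
  return intervalMin * STALE_MULTIPLIER;
}

// A seed counts as stale once its age exceeds the derived threshold.
function isStale(ageMin, intervalMin) {
  return ageMin > staleThresholdMin(intervalMin);
}

console.log(staleThresholdMin(45)); // 90, the same value as maxStaleMin in health.js
console.log(staleThresholdMin(30)); // 60, the threshold once intervalMin is corrected
```

Under that assumption, correcting intervalMin to 30 would tighten the seed-health threshold to 60 minutes while health.js stays at 90.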

@SebastienMelki
Collaborator

@FayezBast — good pipeline addition. A few things before merging:

  1. seed-health.js intervalMin: Change from 45 to 30 to match the actual relay loop interval. The two health files should tell the same story.
  2. Bootstrap deferral: Please open a follow-up issue for wiring climate:news-intelligence:v1 into api/bootstrap.js and link it here.
  3. Atom fallback scope: The entry fallback triggers only when zero item elements are found across ALL feeds, not per-feed. Any future Atom-only feed would silently return no items. Not a blocker but worth noting.

Otherwise the pipeline design is solid.

koala73 previously requested changes Apr 2, 2026
Owner

@koala73 koala73 left a comment

Review — PR #2532 (climate news seed + ListClimateNews RPC)

Good structure overall: ais-relay subprocess delegation, proto/RPC plumbing, and health wiring all follow existing patterns. Two blocking issues before this can merge.

P1 — Blocks merge

#107 — CACHE_TTL = 30min = 1× interval (gold standard violation)
seed-climate-news.mjs sets CACHE_TTL = 1800, which equals exactly one relay interval. Any seed delay lets the key expire before the next run. Fix: CACHE_TTL = 5400 (90min = 3×).

#108 — Merge conflict with PR #2531
This PR was branched before #2531's fix commit bumped climate:anomalies:v1 to v2. GitHub shows CONFLICTING. After #2531 merges, rebase onto main and accept v2 in both api/bootstrap.js and server/_shared/cache-keys.ts.

P2 — Should fix

#109 — No retry timer on subprocess failure
seedClimateNews() catches errors and logs, but doesn't schedule a retry. Gold standard is retry in 20min on failure. With TTL = 30min (even after the fix to 90min), a failure mid-run still degrades freshness. Pattern: climateNewsRetryTimer = setTimeout(() => seedClimateNews(), CLIMATE_NEWS_RETRY_MS) in the catch block.

#110 — Hard-coded Redis key in list-climate-news.ts
const SEED_CACHE_KEY = 'climate:news-intelligence:v1' should import a named constant from cache-keys.ts (see get-co2-monitoring.ts from #2531 for the pattern).

#111 — climateNews in PENDING_CONSUMERS without a frontend consumer planned
climateNews is added to both BOOTSTRAP_CACHE_KEYS (50KB of headlines pushed on every cold start) and PENDING_CONSUMERS (test suppression). The PR body says bootstrap registration isn't included, but it is — these are contradictory. Either wire a frontend consumer in this PR, or move climateNews to SEED_ONLY_BOOTSTRAP_CACHE_KEYS and remove from bootstrap until the panel is built.
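
The TTL and retry points (#107 and #109) can be sketched together. This is a minimal sketch, not the relay's actual code: only the 30-minute interval, the 20-minute retry, the 5400-second TTL, and the names `seedClimateNews`, `climateNewsRetryTimer`, and `CLIMATE_NEWS_RETRY_MS` come from the review. `createClimateNewsSeeder`, `runSeed`, `schedule`, and `cancel` are hypothetical, introduced so the example runs without spawning the seed subprocess.

```javascript
const CLIMATE_NEWS_INTERVAL_MS = 30 * 60 * 1000;
const CLIMATE_NEWS_RETRY_MS = 20 * 60 * 1000;
// TTL in seconds: 3x the relay interval, so one or two missed runs do not expire the key.
const CACHE_TTL = (3 * CLIMATE_NEWS_INTERVAL_MS) / 1000; // 5400

// `runSeed` stands in for the subprocess call; `schedule`/`cancel` default to real timers.
function createClimateNewsSeeder(runSeed, schedule = setTimeout, cancel = clearTimeout) {
  let climateNewsRetryTimer = null;
  let inFlight = false;

  function clearRetry() {
    if (climateNewsRetryTimer !== null) {
      cancel(climateNewsRetryTimer);
      climateNewsRetryTimer = null;
    }
  }

  async function seedClimateNews() {
    if (inFlight) return; // in-flight guard: skip overlapping runs
    inFlight = true;
    try {
      await runSeed();
      clearRetry(); // success: a pending retry is no longer needed
    } catch (err) {
      console.warn(`[climate-news] seed failed: ${err.message}`);
      clearRetry(); // keep at most one retry pending
      climateNewsRetryTimer = schedule(() => {
        climateNewsRetryTimer = null;
        seedClimateNews();
      }, CLIMATE_NEWS_RETRY_MS);
    } finally {
      inFlight = false;
    }
  }

  return seedClimateNews;
}
```

In a relay loop the returned function would typically be passed to `setInterval` with `CLIMATE_NEWS_INTERVAL_MS`, so a failed run is retried 20 minutes later instead of waiting for the next 30-minute tick.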

Deployment order note

This PR depends on #2531 being merged and Railway crons set up first (zone-normals monthly + CO2 daily). See todo #101 in the project for the full Railway setup steps.

FayezBast and others added 4 commits April 2, 2026 08:32
… constant

- CACHE_TTL: 1800 to 5400 (90min = 3x 30-min relay interval, gold standard)
- ais-relay: add 20-min retry timer on subprocess failure; clear on success
- cache-keys.ts: export CLIMATE_NEWS_KEY named constant
- list-climate-news.ts: import CLIMATE_NEWS_KEY instead of hard-coding string
@koala73 koala73 dismissed their stale review April 2, 2026 04:52

I fixed it

@koala73 koala73 merged commit b2bae30 into main Apr 2, 2026
8 checks passed
@koala73 koala73 deleted the feat/climate-add-climate-news-intelligence branch April 2, 2026 04:55
koala73 added a commit that referenced this pull request Apr 2, 2026
…Dockerfile

Yahoo Finance and CoinPaprika fail from Railway datacenter IPs (rate
limiting). Added PROXY_URL fallback to fetchYahooChartDirect (used by
5 seeders) and relay chart proxy endpoint. Added shared
_fetchCoinPaprikaTickers with proxy fallback + 5min cache (3 crypto
seeders share one fetch). Added CoinPaprika fallback to CryptoSectors
(previously had none).

Isolated OREF_PROXY_AUTH exclusively for OREF alerts. OpenSky,
seed-military-flights, and _proxy-utils now fall back to PROXY_URL
instead of the expensive IL-exit proxy.

Added seed-climate-news.mjs + _seed-utils.mjs COPY to Dockerfile.relay
(missing since PR #2532). Added pizzint bootstrap hydration to
cache-keys.ts, bootstrap.js, and src/services/pizzint.ts.
koala73 added a commit that referenced this pull request Apr 2, 2026
* fix(relay): proxy fallback for Yahoo/Crypto, isolate OREF proxy, fix Dockerfile

Yahoo Finance and CoinPaprika fail from Railway datacenter IPs (rate
limiting). Added PROXY_URL fallback to fetchYahooChartDirect (used by
5 seeders) and relay chart proxy endpoint. Added shared
_fetchCoinPaprikaTickers with proxy fallback + 5min cache (3 crypto
seeders share one fetch). Added CoinPaprika fallback to CryptoSectors
(previously had none).

Isolated OREF_PROXY_AUTH exclusively for OREF alerts. OpenSky,
seed-military-flights, and _proxy-utils now fall back to PROXY_URL
instead of the expensive IL-exit proxy.

Added seed-climate-news.mjs + _seed-utils.mjs COPY to Dockerfile.relay
(missing since PR #2532). Added pizzint bootstrap hydration to
cache-keys.ts, bootstrap.js, and src/services/pizzint.ts.

* fix(relay): address review — remove unused reverseMap, guard double proxy

- Remove dead reverseMap identity map in CryptoSectors Paprika fallback
- Add _proxied flag to handleYahooChartRequest._tryProxy to prevent
  double proxy call on timeout→destroy→error sequence

Development

Successfully merging this pull request may close these issues.

  • Follow-up: wire climate:news-intelligence:v1 into api/bootstrap.js
  • feat(climate): add climate news intelligence seeder (9 RSS feeds)

3 participants