Add climate news seed and ListClimateNews RPC #2532
Greptile Summary
This PR wires a full climate news intelligence pipeline: 9 RSS feeds are aggregated every 30 minutes by a new …
Key findings:
Confidence Score: 4/5
Safe to merge after fixing the …
One P1 issue:
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Relay as ais-relay.cjs (Railway)
    participant Seed as seed-climate-news.mjs
    participant RSS as RSS Feeds (x9)
    participant Redis as Upstash Redis
    participant Edge as Vercel Edge Function
    participant Client as Browser Client
    loop Every 30 minutes
        Relay->>Seed: execFile(node, seed-climate-news.mjs)
        Seed->>RSS: fetch() x9 feeds (15s timeout each)
        RSS-->>Seed: XML responses
        Seed->>Seed: parseRssItems() / dedup / sort
        Seed->>Redis: SET climate:news-intelligence:v1 (TTL 1800s)
        Seed->>Redis: SET seed-meta:climate:news-intelligence
        Seed-->>Relay: exit 0
    end
    Client->>Edge: GET /api/climate/v1/list-climate-news
    Edge->>Redis: getCachedJson(climate:news-intelligence:v1)
    Redis-->>Edge: { items[], fetchedAt }
    Edge-->>Client: ListClimateNewsResponse (JSON)
```
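The "parseRssItems() / dedup / sort" step in the diagram can be sketched as follows. This is an illustration, not the seed script's actual code: `collectFeeds`, `aggregateFeedItems`, and `MAX_ITEMS` are hypothetical names, items are assumed to carry the `id` and epoch-ms `publishedAt` fields used by the seed's parser, and the `{ items, fetchedAt }` payload shape is taken from the diagram.

```javascript
// Sketch only: collectFeeds, aggregateFeedItems and MAX_ITEMS are hypothetical names.
// Items are assumed to look like { id, title, url, sourceName, publishedAt, summary }
// with publishedAt in epoch milliseconds.
const MAX_ITEMS = 200; // hypothetical cap on stored headlines

// Fetch and parse every feed; a failed or timed-out feed is skipped instead of
// failing the whole run. fetchAndParse is expected to enforce the per-feed timeout
// itself (15 s in the diagram), e.g. by passing AbortSignal.timeout(15000) to fetch().
async function collectFeeds(feeds, fetchAndParse) {
  // async arrow turns a synchronous throw into a rejection that allSettled can absorb
  const results = await Promise.allSettled(feeds.map(async (feed) => fetchAndParse(feed)));
  return results.filter((r) => r.status === 'fulfilled').map((r) => r.value);
}

// Merge per-feed item arrays: drop duplicate ids, sort newest first, cap the list.
function aggregateFeedItems(perFeedItems, now = Date.now()) {
  const byId = new Map();
  for (const feedItems of perFeedItems) {
    for (const item of feedItems) {
      if (!byId.has(item.id)) byId.set(item.id, item); // keep the first occurrence
    }
  }
  const items = [...byId.values()]
    .sort((a, b) => b.publishedAt - a.publishedAt)
    .slice(0, MAX_ITEMS);
  return { items, fetchedAt: now };
}
```

Using `Promise.allSettled` rather than `Promise.all` means one slow or broken feed costs only its own headlines, which matters when nine independent publishers are polled on the same schedule.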
```javascript
if (items.length === 0) {
  const entryRe = /<entry\b[^>]*>([\s\S]*?)<\/entry>/gi;
  while ((match = entryRe.exec(bounded)) !== null) {
    const block = match[1];
    const title = decodeHtmlEntities(extractTag(block, 'title'));
    const url = extractLink(block);
    const publishedAt = parseDateMs(block);
    const rawSummary = extractTag(block, 'summary') || extractTag(block, 'content');
    if (!title || !url || !publishedAt) continue;
    items.push({
      id: `${stableHash(url)}-${publishedAt}`,
      title,
      url,
      sourceName,
      publishedAt,
      summary: cleanSummary(rawSummary),
    });
  }
}
```
Atom fallback only fires when all RSS <item> blocks return zero items across the entire feed
The Atom <entry> fallback is gated on if (items.length === 0), so it only runs when the RSS <item> pass produced no items for the feed. For a feed that publishes a mix of both (unusual but valid), the Atom entries are silently dropped. All 9 feeds in FEEDS are standard RSS feeds, so this isn't an immediate bug, and a future pure-Atom feed would still parse: it contains no <item> elements, so it triggers the fallback. The silent loss is limited to mixed feeds.
Consider restructuring to always attempt both parsers and merge results (deduplicating on id), or at minimum add a comment explaining that the fallback is per-feed, not per-batch.
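A minimal sketch of the first option: always run both passes over the same feed body and merge on id, so a mixed feed keeps its Atom entries. The helpers `tag`, `link`, `parseBlocks`, and `parseFeedItems` are hypothetical, simplified stand-ins for the seed's extractTag, extractLink, parseDateMs, and stableHash (for example, the id here is `url-publishedAt` rather than a hash, and there is no entity decoding or summary cleanup).

```javascript
// Hypothetical, simplified helpers; the real seed uses extractTag, extractLink,
// parseDateMs, stableHash, decodeHtmlEntities and cleanSummary.
function tag(block, name) {
  const m = block.match(new RegExp(`<${name}\\b[^>]*>([\\s\\S]*?)<\\/${name}>`, 'i'));
  return m ? m[1].trim() : '';
}

function link(block) {
  // RSS: <link>https://…</link>; Atom: <link href="https://…"/>
  return tag(block, 'link') || (block.match(/<link\b[^>]*\bhref="([^"]+)"/i) || [])[1] || '';
}

// Parse every <blockName>…</blockName> element into a normalized item.
function parseBlocks(xml, blockName, sourceName) {
  const re = new RegExp(`<${blockName}\\b[^>]*>([\\s\\S]*?)<\\/${blockName}>`, 'gi');
  const out = [];
  for (const [, block] of xml.matchAll(re)) {
    const title = tag(block, 'title');
    const url = link(block);
    const publishedAt = Date.parse(tag(block, 'pubDate') || tag(block, 'published') || tag(block, 'updated'));
    if (!title || !url || !Number.isFinite(publishedAt)) continue;
    out.push({ id: `${url}-${publishedAt}`, title, url, sourceName, publishedAt });
  }
  return out;
}

// Run the RSS and Atom passes unconditionally and deduplicate on id.
function parseFeedItems(xml, sourceName) {
  const merged = new Map();
  for (const item of [...parseBlocks(xml, 'item', sourceName), ...parseBlocks(xml, 'entry', sourceName)]) {
    if (!merged.has(item.id)) merged.set(item.id, item);
  }
  return [...merged.values()];
}
```

The cost is one extra regex scan per feed, which is negligible next to the network fetch, and it removes the per-feed versus per-batch ambiguity entirely.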
```javascript
function extractTag(block, tagName) {
  const re = new RegExp(`<${tagName}[^>]*>(?:<!\\[CDATA\\[)?([\\s\\S]*?)(?:\\]\\]>)?<\\/${tagName}>`, 'i');
  return (block.match(re) || [])[1]?.trim() || '';
}
```
extractTag regex mishandles CDATA sections with embedded ]]> content
The current regex makes the CDATA start/end markers optional via (?:..)?:
<${tagName}[^>]*>(?:<!\[CDATA\[)?([\s\S]*?)(?:\]\]>)?<\/${tagName}>
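Because the opening <![CDATA[ and closing ]]> markers are optional and independent, the regex strips at most one wrapper at the edges of the element and leaves any other markers in the captured text. For <title><![CDATA[Breaking news ]]> & more]]></title>, the lazy group cannot stop at the first ]]> (it is not followed by </title>), so the result is Breaking news ]]> & more: a stray ]]> in the title rather than a truncated one. A title split across two CDATA sections, such as <![CDATA[Breaking ]]><![CDATA[news]]>, likewise comes back as Breaking ]]><![CDATA[news. The fix is to capture the raw element content and then unwrap every CDATA section in it, rather than making both markers optional.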
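One way to strip CDATA robustly is to match the raw element body first and then replace every <![CDATA[...]]> section with its contents. The sketch below keeps the (block, tagName) signature of the excerpt above but is a hypothetical variant, not the merged code; the \b after the tag name (so a tag name does not also match a longer name that starts with it) is an extra assumption, and entity decoding is left to decodeHtmlEntities as in the current parser.

```javascript
// Hypothetical extractTag variant: capture the raw element body, then unwrap
// every CDATA section so split or repeated CDATA never leaks markers.
function extractTag(block, tagName) {
  const re = new RegExp(`<${tagName}\\b[^>]*>([\\s\\S]*?)<\\/${tagName}>`, 'i');
  const raw = (block.match(re) || [])[1];
  if (raw === undefined) return '';
  // Replace each <![CDATA[...]]> with its contents; text outside CDATA is kept as-is.
  return raw.replace(/<!\[CDATA\[([\s\S]*?)\]\]>/g, '$1').trim();
}
```

One limitation is shared with the original: the lazy match still stops at the first closing tag, so a literal </title> inside a CDATA section would cut the value short.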
P1 overclaims impact.
@FayezBast, good pipeline addition. A few things before merging:
Otherwise the pipeline design is solid.
koala73 left a comment
Review — PR #2532 (climate news seed + ListClimateNews RPC)
Good structure overall: ais-relay subprocess delegation, proto/RPC plumbing, and health wiring all follow existing patterns. Two blocking issues before this can merge.
P1 — Blocks merge
#107 — CACHE_TTL = 30min = 1× interval (gold standard violation)
seed-climate-news.mjs sets CACHE_TTL = 1800, which equals exactly one relay interval. Any seed delay lets the key expire before the next run. Fix: CACHE_TTL = 5400 (90min = 3×).
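The TTL arithmetic can be derived in code so that a future interval change cannot quietly bring back TTL == interval. A minimal sketch, assuming the seed writes through Upstash's REST API by POSTing a JSON command array (the actual write helper is not shown in this discussion); `RELAY_INTERVAL_SEC`, `TTL_MULTIPLIER`, `buildSetCommand`, and `writeWithTtl` are hypothetical names.

```javascript
// Hypothetical constants: derive the TTL from the relay interval instead of hard-coding it.
const RELAY_INTERVAL_SEC = 30 * 60; // relay runs the seed every 30 min
const TTL_MULTIPLIER = 3;           // "gold standard": TTL = 3x the interval
const CACHE_TTL = RELAY_INTERVAL_SEC * TTL_MULTIPLIER; // 5400 s = 90 min

// Build an Upstash REST command array: SET key value EX ttl.
function buildSetCommand(key, value, ttlSec = CACHE_TTL) {
  return ['SET', key, JSON.stringify(value), 'EX', String(ttlSec)];
}

// POST the command to the Upstash REST endpoint (restUrl/token assumed to come from env).
async function writeWithTtl(restUrl, token, key, value) {
  const res = await fetch(restUrl, {
    method: 'POST',
    headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' },
    body: JSON.stringify(buildSetCommand(key, value)),
  });
  if (!res.ok) throw new Error(`Upstash SET failed: HTTP ${res.status}`);
  return res.json();
}
```

With 30 min × 3 = 90 min = 5400 s, a value written at t = 0 is still served if the runs at t = 30 and t = 60 min both fail; with TTL = 1800 s, even a slightly late run leaves a gap.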
#108 — Merge conflict with PR #2531
This PR was branched before #2531's fix commit bumped climate:anomalies:v1 → v2. GitHub shows CONFLICTING. After #2531 merges, rebase onto main and accept v2 in both api/bootstrap.js and server/_shared/cache-keys.ts.
P2 — Should fix
#109 — No retry timer on subprocess failure
seedClimateNews() catches errors and logs, but doesn't schedule a retry. Gold standard is retry in 20min on failure. With TTL = 30min a single failed run lets the key expire, and even after the fix to 90min a failure still degrades freshness until the next 30-min run. Pattern: climateNewsRetryTimer = setTimeout(() => seedClimateNews(), CLIMATE_NEWS_RETRY_MS) in the catch block.
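A minimal sketch of this retry pattern, written as standalone JavaScript. Assumptions: `runSeed` is an injected, hypothetical function that resolves when the seed subprocess exits 0 and rejects otherwise (in ais-relay.cjs it would wrap execFile), and clearing any pending retry at the start of every attempt is a design choice of this sketch, which also means a successful scheduled run cancels a pending retry.

```javascript
// Sketch of the #109 retry pattern; runSeed is a hypothetical injected runner.
const CLIMATE_NEWS_RETRY_MS = 20 * 60 * 1000; // retry 20 min after a failure

let climateNewsRetryTimer = null;

async function seedClimateNews(runSeed) {
  // Any new attempt (scheduled or retry) supersedes a pending retry.
  if (climateNewsRetryTimer) {
    clearTimeout(climateNewsRetryTimer);
    climateNewsRetryTimer = null;
  }
  try {
    await runSeed();
  } catch (err) {
    console.error('[climate-news] seed failed, retrying in 20 min:', err && err.message);
    climateNewsRetryTimer = setTimeout(() => seedClimateNews(runSeed), CLIMATE_NEWS_RETRY_MS);
  }
}
```

Clearing at the start rather than only on success also keeps a scheduled run and a pending retry from both firing close together.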
#110 — Hard-coded Redis key in list-climate-news.ts
const SEED_CACHE_KEY = 'climate:news-intelligence:v1' should import a named constant from cache-keys.ts (see get-co2-monitoring.ts from #2531 for the pattern).
#111 — climateNews in PENDING_CONSUMERS without a frontend consumer planned
climateNews is added to both BOOTSTRAP_CACHE_KEYS (50KB of headlines pushed on every cold start) and PENDING_CONSUMERS (test suppression). The PR body says bootstrap registration isn't included, but it is — these are contradictory. Either wire a frontend consumer in this PR, or move climateNews to SEED_ONLY_BOOTSTRAP_CACHE_KEYS and remove from bootstrap until the panel is built.
Deployment order note
This PR depends on #2531 being merged and Railway crons set up first (zone-normals monthly + CO2 daily). See todo #101 in the project for the full Railway setup steps.
… constant
- CACHE_TTL: 1800 to 5400 (90min = 3x 30-min relay interval, gold standard)
- ais-relay: add 20-min retry timer on subprocess failure; clear on success
- cache-keys.ts: export CLIMATE_NEWS_KEY named constant
- list-climate-news.ts: import CLIMATE_NEWS_KEY instead of hard-coding string
163411b to 50f6d82
* fix(relay): proxy fallback for Yahoo/Crypto, isolate OREF proxy, fix Dockerfile

  Yahoo Finance and CoinPaprika fail from Railway datacenter IPs (rate limiting). Added PROXY_URL fallback to fetchYahooChartDirect (used by 5 seeders) and relay chart proxy endpoint. Added shared _fetchCoinPaprikaTickers with proxy fallback + 5min cache (3 crypto seeders share one fetch). Added CoinPaprika fallback to CryptoSectors (previously had none). Isolated OREF_PROXY_AUTH exclusively for OREF alerts. OpenSky, seed-military-flights, and _proxy-utils now fall back to PROXY_URL instead of the expensive IL-exit proxy. Added seed-climate-news.mjs + _seed-utils.mjs COPY to Dockerfile.relay (missing since PR #2532). Added pizzint bootstrap hydration to cache-keys.ts, bootstrap.js, and src/services/pizzint.ts.

* fix(relay): address review, remove unused reverseMap, guard double proxy

  - Remove dead reverseMap identity map in CryptoSectors Paprika fallback
  - Add _proxied flag to handleYahooChartRequest._tryProxy to prevent double proxy call on timeout→destroy→error sequence
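The shared CoinPaprika fetch described in the first commit above (one _fetchCoinPaprikaTickers call reused by three crypto seeders, a 5-minute cache, and a proxy fallback) can be pictured as a memoized loader. This is an illustration under assumptions: `createSharedFetcher`, `fetchDirect`, and `fetchViaProxy` are hypothetical names, and how PROXY_URL is actually applied to the request is not shown in the commit message, so both fetchers are injected.

```javascript
// Hypothetical sketch: one cached, de-duplicated loader shared by several consumers.
// fetchDirect / fetchViaProxy are injected async functions returning parsed data.
function createSharedFetcher({ fetchDirect, fetchViaProxy, ttlMs = 5 * 60 * 1000, now = Date.now }) {
  let cached = null;   // { value, expiresAt } after a successful load
  let inFlight = null; // Promise shared by concurrent callers

  async function load() {
    try {
      return await fetchDirect();
    } catch (directErr) {
      // Datacenter IPs may be rate-limited: retry once through the proxy.
      if (!fetchViaProxy) throw directErr;
      return await fetchViaProxy();
    }
  }

  return function getShared() {
    if (cached && now() < cached.expiresAt) return Promise.resolve(cached.value);
    if (!inFlight) {
      inFlight = load()
        .then((value) => {
          cached = { value, expiresAt: now() + ttlMs };
          return value;
        })
        .finally(() => {
          inFlight = null; // failures are not cached, so the next call retries
        });
    }
    return inFlight;
  };
}
```

Sharing the in-flight promise is what lets three seeders that start at the same moment trigger a single upstream request instead of three.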
Summary
Adds seed-climate-news.mjs to aggregate 9 authoritative climate/environment RSS feeds into climate:news-intelligence:v1, wires a 30-minute relay seed loop, and exposes the data through a new ListClimateNews climate proto RPC and server handler.

Not included in this PR:
- get_climate_data expansion to include climate:news-intelligence:v1

Fixes #2469
Fixes #2560
Type of change
Affected areas
/api/*)