WG21 C++ paper tracker with ISO draft probing and Slack notifications.
A Python project that probes the isocpp.org paper system for unpublished D-paper drafts, monitors for new paper assignments at the frontier, and notifies a Slack channel when watched authors publish.
Docs: Developer onboarding (clone → DB → tests → run) · Maintainer handoff · Contributing · Changelog · Security · Code of conduct
If you only need to run tests or a local instance, start with onboarding before the Slack app sections below.
- Per-user watchlists -- each user manages their own list of authors and paper numbers via DM; the scout sends a personal DM when a match is found
- ISO draft probing -- Three-tier async HEAD requests to `isocpp.org/files/papers/` detect unpublished D-papers
- Frontier monitoring -- Automatically probes newly assigned paper numbers beyond the current highest
- 30-minute polling -- Fetches wg21.link/index.json every 30 minutes (configurable)
- Rate-limited posting -- All Slack messages are queued through a background thread that enforces 1 msg/sec per channel and respects HTTP 429 `Retry-After`
- PostgreSQL storage -- All state (probe history, index cache, watchlists) lives in Postgres; logs stay as rotating files
- Status command -- `status` shows papers loaded, last poll time, and probe stats
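The rate-limited posting above can be sketched roughly as follows. This is a simplified, hypothetical version of the `MessageQueue` mentioned in `scout.py`; the real implementation and the exact slack-sdk error shape may differ:

```python
import queue
import threading
import time

class MessageQueue:
    """Hypothetical sketch: serialize Slack posts at <= 1 msg/sec per channel."""

    def __init__(self, post_fn):
        self.post_fn = post_fn                     # e.g. WebClient.chat_postMessage
        self.q = queue.Queue()
        self.last_sent = {}                        # channel id -> monotonic timestamp
        threading.Thread(target=self._worker, daemon=True).start()

    def enqueue(self, channel, text):
        self.q.put((channel, text))

    def _worker(self):
        while True:
            channel, text = self.q.get()
            wait = 1.0 - (time.monotonic() - self.last_sent.get(channel, float("-inf")))
            if wait > 0:
                time.sleep(wait)                   # enforce 1 msg/sec per channel
            try:
                self.post_fn(channel=channel, text=text)
            except Exception as err:               # assumed 429 error shape; adjust for slack-sdk
                headers = getattr(getattr(err, "response", None), "headers", {})
                retry_after = headers.get("Retry-After")
                if retry_after:
                    time.sleep(int(retry_after))   # honor Retry-After, then requeue
                    self.q.put((channel, text))
            self.last_sent[channel] = time.monotonic()
```

Because delivery runs on a background daemon thread, callers never block on Slack's rate limit; they just enqueue and move on.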
- Go to https://api.slack.com/apps and click Create New App
- Choose From scratch
- Name it `paperscout` (or whatever you prefer), select your workspace, click Create App
Go to OAuth & Permissions in the left sidebar. Under Bot Token Scopes, add:
| Scope | Why |
|---|---|
| `chat:write` | Post messages to channels and send DMs |
| `chat:write.public` | Post to public channels the scout hasn't been invited to |
| `im:history` | Read messages in 1:1 DMs with the scout |
| `im:write` | Open 1:1 DM conversations to deliver watchlist alerts |
| `mpim:history` | Read messages in group DMs the scout has been invited to |
| `mpim:write` | Reply in group DMs |
| `channels:history` | Read messages in public channels |
| `groups:history` | Read messages in private channels the scout is invited to |
| `groups:write` | Reply in private channels |
| `app_mentions:read` | Respond when someone @mentions the scout |
Note on group DMs (`mpim`): When the scout is invited to a group DM, `watchlist` commands are rejected with a friendly error telling the user to use a 1:1 DM instead. `status` and `help` work normally. The `mpim:history` and `mpim:write` scopes are needed to receive and reply to those messages.
Go to Event Subscriptions in the left sidebar:
- Toggle Enable Events to On
- Under Subscribe to bot events, add:
  - `message.channels` (messages in public channels)
  - `message.groups` (messages in private channels)
  - `message.im` (1:1 direct messages)
  - `message.mpim` (group direct messages)
  - `app_mention` (when someone @mentions the scout)
- You will set the Request URL after the scout is running (step 7)
Go to App Home in the left sidebar:
- Under Show Tabs, make sure Messages Tab is enabled
- Check Allow users to send Slash commands and messages from the messages tab
- Go to OAuth & Permissions
- Click Install to Workspace at the top
- Authorize the app
- Copy the Bot User OAuth Token (starts with `xoxb-`)
- Go to Basic Information and copy the Signing Secret
```bash
cd paperscout-python
cp .env.example .env
```

Edit `.env` with your credentials and preferences:

```bash
SLACK_SIGNING_SECRET=<your signing secret from step 5>
SLACK_BOT_TOKEN=xoxb-<your bot token from step 5>
PORT=3000

# PostgreSQL connection string (required)
DATABASE_URL=postgresql://user:password@localhost:5432/paperscout

# Slack channel ID for general notifications (new frontier drafts, D→P transitions).
# To find it: open the channel in Slack, click the channel name
# at the top, scroll to the bottom of the popup -- the ID looks like C0123456789
NOTIFICATION_CHANNEL=C0123456789

# Explicit number ranges to always probe as hot (optional)
FRONTIER_EXPLICIT_RANGES=[{"min": 4033, "max": 4042}, {"min": 4049, "max": 4080}]
```
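For illustration, the JSON list in `FRONTIER_EXPLICIT_RANGES` can be parsed with nothing but the standard library. The real `config.py` uses pydantic-settings for this; `parse_explicit_ranges` below is a hypothetical helper:

```python
import json
import os

def parse_explicit_ranges(raw: str) -> list[tuple[int, int]]:
    """Parse a JSON list of {"min": ..., "max": ...} objects into (min, max) tuples."""
    return [(r["min"], r["max"]) for r in json.loads(raw or "[]")]

raw = os.environ.get(
    "FRONTIER_EXPLICIT_RANGES",
    '[{"min": 4033, "max": 4042}, {"min": 4049, "max": 4080}]',
)
ranges = parse_explicit_ranges(raw)   # → [(4033, 4042), (4049, 4080)]
```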
Install and run:
```bash
pip install -e .
python -m paperscout
```

Once the scout is running and reachable at a public URL:
- Go back to Event Subscriptions in the Slack app config
- Set Request URL to `https://your-server.com/slack/events`
- Slack will send a challenge request -- the scout responds automatically
- Click Save Changes
For local testing with ngrok:
```bash
ngrok http 3000
# Use the ngrok URL: https://abc123.ngrok.io/slack/events
```

- Public channel notifications: The scout posts to `NOTIFICATION_CHANNEL` automatically (via `chat:write.public`). No invite needed.
- Private channels: Type `/invite @paperscout` in the private channel for `@mention` support.
- Watchlist DMs (required): Each user must open a 1:1 DM with `paperscout` to manage their personal watchlist. The scout will also DM users proactively when their watchlist matches a new paper.
- Group DMs: The scout can be invited, but `watchlist` commands will be rejected with a message directing the user to use a 1:1 DM.
- DM the scout: `status` — should reply with papers loaded, last poll time, and probe stats
- DM the scout: `watchlist add Niebler` — should confirm the author was added (as an author entry)
- DM the scout: `watchlist add 2300` — should confirm the paper was added (as a paper number entry)
- DM the scout: `watchlist list` — should show both entries with their types
- DM the scout: `watchlist remove Niebler` — should confirm removal
- Type `@paperscout status` in a channel — should reply in-thread
- Check your notification channel after 30 minutes — frontier hits and D→P transitions appear there; personal watchlist matches arrive as DMs
The scout runs as a Docker container deployed via CD. A push to main deploys to production; a push to develop deploys to staging. Both paths run the same workflow and the same job — only the GitHub Environment changes.
Push to main → CI tests → SSH into prod → git pull --ff-only → docker compose up --build → Health check (retry)
Push to develop → CI tests → SSH into staging → git pull --ff-only → docker compose up --build → Health check (retry)
Create two environments under Settings → Environments: production and staging. Both use the same secret names (different values per environment) and a small set of per-environment Variables:
| Type | Name | Production | Staging |
|---|---|---|---|
| Secret | `SERVER_HOST` | prod host / IP | staging host / IP |
| Secret | `SERVER_USER` | deploy user | deploy user |
| Secret | `SERVER_SSH_KEY` | private key | private key |
| Secret | `SERVER_PORT` | optional (default 22) | optional (default 22) |
| Variable | `DEPLOY_PATH` | `/opt/paperscout` | `/opt/paperscout-staging` |
| Variable | `DEPLOY_BRANCH` | `main` | `develop` |
| Variable | `HEALTH_PORT` | `9101` | `9102` (or whatever staging maps) |
The workflow picks the environment from the branch (refs/heads/main → production, refs/heads/develop → staging), so values like DEPLOY_PATH and HEALTH_PORT are not hard-coded in the YAML.
Tip: enable Required reviewers on the `production` environment for a manual approval gate before prod deploys.
```bash
# On the production server (after Docker, PostgreSQL, and nginx are set up)
git clone https://github.com/cppalliance/paperscout-python.git /opt/paperscout
cd /opt/paperscout
cp .env.example .env   # edit with real credentials
docker compose up -d --build
curl -sf http://localhost:9101/health
```

On the staging server (separate host or separate path on the same host; must match the staging environment's `DEPLOY_PATH` and expose `/health` on `HEALTH_PORT`):

```bash
git clone -b develop https://github.com/cppalliance/paperscout-python.git /opt/paperscout-staging
cd /opt/paperscout-staging
cp .env.example .env   # use staging credentials / DB / Slack app as appropriate
docker compose up -d --build
curl -sf http://localhost:9102/health
```

See `deploy/SERVER_SETUP.md` for the full Ubuntu 22.04 provisioning guide, and `.github/workflows/cd.yml` for the CD pipeline.
Database backups run daily via .github/workflows/db-backup.yml, uploading pg_dump snapshots to Google Cloud Storage.
Watchlist commands only work in a 1:1 DM with the scout (each user has their own independent watchlist). status and help work everywhere — DMs, group DMs, and channels via @paperscout.
| Command | Where | Description |
|---|---|---|
| `watchlist` | DM only | Show your personal watchlist |
| `watchlist list` | DM only | Show your personal watchlist |
| `watchlist add <name-or-number>` | DM only | Add an author name substring or paper number — type is auto-detected |
| `watchlist remove <name-or-number>` | DM only | Remove an entry from your watchlist |
| `status` | Anywhere | Show papers loaded, last poll time, probe stats |
| `help` | Anywhere | Show command summary |
- Author entries (`watchlist add Niebler`) — match when the author field of a new index paper contains the substring (case-insensitive), or when the first ~1,000 words of a newly discovered draft mention the name.
- Paper number entries (`watchlist add 2300`) — match when a draft for that number is newly discovered, or when the paper appears in the wg21.link index.
When a match is found, all hits for that user are batched and sent as a single DM.
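The two match rules above can be sketched as a single predicate. The names and signature here are illustrative, not the actual `monitor.py` API:

```python
def entry_matches(entry: str, paper_number: int, authors: str, draft_head: str) -> bool:
    """entry: a watchlist entry; draft_head: first ~1,000 words of a new draft."""
    if entry.isdigit():                    # paper-number entry, e.g. "2300"
        return int(entry) == paper_number
    needle = entry.lower()                 # author entry: case-insensitive substring
    return needle in authors.lower() or needle in draft_head.lower()
```

An author entry matches either the index's author field or the opening text of a newly discovered draft; a numeric entry matches on the paper number alone.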
All parameters are configurable via environment variables or a .env file. See .env.example for the complete list.
| Variable | Description |
|---|---|
| `SLACK_SIGNING_SECRET` | Slack app signing secret |
| `SLACK_BOT_TOKEN` | Slack bot token (`xoxb-...`) |
| `DATABASE_URL` | PostgreSQL connection string (`postgresql://user:pass@host:5432/db`) |
| Variable | Default | Description |
|---|---|---|
| `POLL_INTERVAL_MINUTES` | `30` | Main polling cycle interval |
| `POLL_OVERRUN_COOLDOWN_SECONDS` | `300` | Minimum sleep after a poll cycle that overran the interval (avoids tight loops when work or errors stretch a cycle) |
| `ENABLE_BULK_WG21` | `true` | Fetch `wg21.link/index.json` each cycle |
| `ENABLE_BULK_OPENSTD` | `true` | Reserved for open-std.org scraping (not yet scheduled) |
| `ENABLE_ISO_PROBE` | `true` | Run isocpp.org HEAD probing each cycle |
| Variable | Default | Description |
|---|---|---|
| `PROBE_PREFIXES` | `["D","P"]` | Prefixes for gap/unknown numbers |
| `PROBE_EXTENSIONS` | `[".pdf",".html"]` | File extensions to check |
| Variable | Default | Description |
|---|---|---|
| `FRONTIER_WINDOW_ABOVE` | `60` | Numbers above effective frontier to probe every cycle |
| `FRONTIER_WINDOW_BELOW` | `30` | Numbers below effective frontier to probe every cycle |
| `FRONTIER_EXPLICIT_RANGES` | `[]` | Additional explicit ranges, e.g. `[{"min":4033,"max":4060}]` |
| `FRONTIER_GAP_THRESHOLD` | `50` | Max gap between consecutive P-numbers before treating a number as an outlier (prevents pre-assigned far-future numbers like P5000 from shifting the frontier) |
| Variable | Default | Description |
|---|---|---|
| `HOT_LOOKBACK_MONTHS` | `6` | Papers with a date within this window are probed every cycle |
| `HOT_REVISION_DEPTH` | `2` | Revisions ahead of known latest to probe for hot papers |
| Variable | Default | Description |
|---|---|---|
| `COLD_REVISION_DEPTH` | `1` | Revisions ahead of known latest for cold papers |
| `COLD_CYCLE_DIVISOR` | `48` | Cold pool is split into N slices; each cycle probes 1 slice (48 × 30 min = 24 h) |
| `GAP_MAX_REV` | `1` | For gap/unknown numbers, probe R0 through this revision |
| Variable | Default | Description |
|---|---|---|
| `ALERT_MODIFIED_HOURS` | `24` | Only notify for hits where the server's `Last-Modified` header is within this many hours of now. Falls back to alerting when the header is absent. |
| Variable | Default | Description |
|---|---|---|
| `HTTP_CONCURRENCY` | `20` | Maximum simultaneous probe requests |
| `HTTP_TIMEOUT_SECONDS` | `10` | Request timeout for HEAD probes |
| `HTTP_USE_HTTP2` | `true` | Enable HTTP/2 for all requests |
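`HTTP_CONCURRENCY` can be enforced with an `asyncio.Semaphore`, as in this sketch. The real prober uses httpx; here `probe` is a stand-in for an `AsyncClient.head()` call:

```python
import asyncio

async def probe_all(urls, concurrency: int = 20):
    """Issue probes for all URLs with at most `concurrency` in flight at once."""
    sem = asyncio.Semaphore(concurrency)

    async def probe(url):
        async with sem:                  # at most `concurrency` concurrent probes
            await asyncio.sleep(0.01)    # placeholder for client.head(url, timeout=...)
            return url, 200

    return await asyncio.gather(*(probe(u) for u in urls))
```

All probes for a cycle are scheduled up front; the semaphore, not the scheduler, throttles how many touch the network at once.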
| Variable | Default | Description |
|---|---|---|
| `NOTIFICATION_CHANNEL` | `""` | Slack channel ID for general alerts (frontier hits, D→P transitions); empty = disabled |
| `NOTIFY_ON_FRONTIER_HIT` | `true` | Notify on recently modified draft near the frontier |
| `NOTIFY_ON_ANY_DRAFT` | `true` | Notify on any other recently modified draft |
| `NOTIFY_ON_DP_TRANSITION` | `true` | Notify when a tracked D-paper appears in the index as its published P counterpart |

Personal watchlist matches (author or paper number) are always sent as a DM to the matching user — they are not posted to `NOTIFICATION_CHANNEL`.
| Variable | Default | Description |
|---|---|---|
| `DATABASE_URL` | `""` | PostgreSQL DSN — required |
| `DATA_DIR` | `./data` | Directory for log files |
| `CACHE_TTL_HOURS` | `1` | How long the wg21.link index cache is considered fresh |
```
paperscout-python/
  src/paperscout/
    __main__.py        Entry point; wires together all components
    config.py          All settings via pydantic-settings
    models.py          Paper dataclass, PaperPrefix/PaperType/FileExt enums
    sources.py         WG21Index (PaperCache-backed), ISOProber, open-std.org scraper
    monitor.py         Scheduler, diff engine, PerUserMatches, PollResult
    scout.py           Slack Bolt app, MessageQueue, notify_channel, notify_users
    storage.py         PaperCache, ProbeState, UserWatchlist (all PostgreSQL-backed)
    db.py              ThreadedConnectionPool init and schema DDL
    health.py          HTTP health-check endpoint (GET /health on port 8080)
  data/                Log files (gitignored); all other state lives in PostgreSQL
  deploy/
    paperscout.conf    Reference nginx site config (443 → 3000, /health → 8080)
    SERVER_SETUP.md    Full Ubuntu 22.04 server provisioning guide
  tests/
  Dockerfile           Multi-stage build (python:3.12-slim)
  docker-compose.yml   Single-service compose (builds locally, connects to host PostgreSQL)
  .github/workflows/
    ci.yml             Test matrix on push/PR to main
    cd.yml             SSH deploy (git pull + build) on push to main
    db-backup.yml      Daily pg_dump to Google Cloud Storage
```
| Table | Purpose |
|---|---|
| `paper_cache` | TTL-cached wg21.link index JSON blob |
| `discovered_urls` | All URLs seen by the ISO prober with timestamps |
| `probe_miss_counts` | Exponential backoff counters per paper number |
| `poll_state` | Last-poll timestamp (singleton row) |
| `user_watchlist` | Per-user author/paper entries with type discrimination |
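The backoff driven by `probe_miss_counts` can be sketched as below. The doubling-and-cap rule is illustrative; the actual thresholds live in the prober:

```python
def due_for_probe(miss_count: int, cycles_since_last: int) -> bool:
    """A number that keeps missing is probed less often: wait 2**misses cycles,
    capped at 48 cycles (one probe per daily sweep)."""
    return cycles_since_last >= min(2 ** miss_count, 48)
```

So a fresh number is probed every cycle, while a number that has 404ed three times waits eight cycles before the next attempt.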
Every P-number from 1 to the effective frontier is probed. Numbers are divided into a hot set (probed every 30 min) and a cold pool (probed once per day by distributing 1/48 of the pool each cycle).
| Frequency | What | Condition | Per-cycle URLs |
|---|---|---|---|
| Hot (every cycle) | Watchlist papers | union of all users' watched paper numbers | D-prefix, latest+1..+2, pdf+html |
| Hot (every cycle) | Frontier numbers | ±window around effective frontier | D+P, R0..R1 for unknowns; D, latest+1..+2 for known |
| Hot (every cycle) | Recently active papers | date within `HOT_LOOKBACK_MONTHS` | D-prefix, latest+1..+2, pdf+html |
| Cold (1/48 per cycle ≈ daily) | All other P-numbers | everything else | D-prefix, latest+1, pdf+html |
| Cold (1/48 per cycle) | Gap numbers (no index entry) | 1..frontier minus known | D+P, R0..R1, pdf+html |
Typical per-cycle request count: ~1,600–2,000 HEAD requests (~8–10 s at 20 concurrent, 100 ms latency). A full sweep of all ~4,000 P-numbers completes within ~24 h of continuous 30-min polling.
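One way to spread the cold pool across `COLD_CYCLE_DIVISOR` cycles is a simple modulus, sketched here with illustrative names: `number % divisor` selects this cycle's slice, so every paper is probed exactly once per full rotation (48 slices × 30 min = 24 h):

```python
def cold_slice(cold_numbers: list[int], cycle_index: int, divisor: int = 48) -> list[int]:
    """Return the subset of the cold pool assigned to this cycle."""
    return [n for n in cold_numbers if n % divisor == cycle_index % divisor]
```

Each number lands in exactly one of the 48 slices, and the slices are near-equal in size, which keeps the per-cycle request count stable.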
When a HEAD probe returns 200, the scout reads the Last-Modified response header. It only sends a Slack notification if the file was modified within ALERT_MODIFIED_HOURS (default 24 h). This means:
- A D-paper uploaded today → alert sent
- A D-paper uploaded 6 months ago that we hadn't tracked → silently added to discovered, no alert
- No `Last-Modified` header (unusual) → treated as recent, alert sent
The Last-Modified timestamp is shown in every notification message.
| Source | URL | What it covers |
|---|---|---|
| wg21.link | `https://wg21.link/index.json` | All published P/N papers with metadata |
| open-std.org | `https://www.open-std.org/jtc1/sc22/wg21/docs/papers/{year}/` | Yearly HTML tables (scraper defined, not yet scheduled) |
| isocpp.org | `https://isocpp.org/files/papers/{D\|P}{num}R{rev}.{pdf\|html}` | D-paper drafts (no index, requires probing) |
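The isocpp.org URL pattern expands to a small cross-product per paper number, sketched here with defaults mirroring `PROBE_PREFIXES` and `PROBE_EXTENSIONS` (the helper name is illustrative):

```python
from itertools import product

BASE = "https://isocpp.org/files/papers"

def probe_urls(num: int, revisions: range,
               prefixes=("D", "P"), extensions=(".pdf", ".html")) -> list[str]:
    """All candidate draft URLs for one paper number: prefix x revision x extension."""
    return [f"{BASE}/{p}{num}R{r}{ext}"
            for p, r, ext in product(prefixes, revisions, extensions)]

probe_urls(2300, range(0, 2))   # 2 prefixes x 2 revisions x 2 extensions = 8 URLs
```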
- `slack-bolt` — Slack app framework
- `httpx[http2]` — Async HTTP with HTTP/2 support
- `pydantic-settings` — Type-safe configuration
- `psycopg2-binary` — PostgreSQL adapter (sync, thread-safe)
```bash
git clone https://github.com/cppalliance/paperscout-python.git
cd paperscout-python
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -e ".[dev]"
```

Use `./run` (bash, works in Git Bash on Windows and on Linux/macOS). `make` is a thin wrapper around the same script and requires GNU Make.
```bash
./run test    # fast test run, no coverage
./run cov     # tests + coverage report + 90% gate
./run check   # alias for cov -- run this before every push
./run clean   # remove .coverage, coverage.xml, caches
./run help    # list all targets
```

Equivalent make targets (Linux / CI):

```bash
make test
make cov
make check
make clean
```

Override the Python interpreter if needed:

```bash
PYTHON=python3.12 ./run cov
```

`./run check` exits non-zero if any test fails or if coverage drops below 90%.
The .github/workflows/ci.yml workflow runs automatically on every push and pull request to main or develop:
- Matrix: Python 3.10, 3.11, and 3.12 on `ubuntu-latest`
- Steps: install → `pytest --cov` → coverage summary written to the job summary tab
- Gate: build fails if coverage drops below 90% (`--cov-fail-under=90`)
- Artefact: the `coverage.xml` report from the Python 3.12 run is uploaded and kept for 7 days
Coverage details are visible in the Summary tab of each workflow run (rendered as a Markdown table by `coverage report --format=markdown`).
The .github/workflows/cd.yml workflow runs on push to main or develop (and supports workflow_dispatch from either branch):
- Test — single Python 3.12 pytest run as a gate (re-uses the same coverage threshold as CI).
- Deploy — single environment-driven job:
  - Selects the GitHub Environment from the branch (`main` → `production`, `develop` → `staging`).
  - SSHes in using the environment-scoped secrets (`SERVER_HOST`, `SERVER_USER`, `SERVER_SSH_KEY`, optional `SERVER_PORT`).
  - Reads per-environment variables (`DEPLOY_PATH`, `DEPLOY_BRANCH`, `HEALTH_PORT`) so the same workflow targets prod or staging without code changes.
  - Runs `git fetch` + `git checkout` + `git pull --ff-only` against `DEPLOY_BRANCH` to keep deploys deterministic, then `docker compose up -d --build paperscout`.
- Health check — bounded retry loop (12 × 5 s) against `http://localhost:${HEALTH_PORT}/health`; the job fails if the endpoint never returns 200.
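The health gate is a shell loop in the workflow; the same logic, sketched in Python for clarity (function name and defaults are illustrative):

```python
import time
import urllib.request

def wait_healthy(url: str, attempts: int = 12, delay: float = 5.0) -> bool:
    """Poll /health up to `attempts` times, `delay` seconds apart; True on first 200."""
    for i in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass                          # server still starting; retry
        if i < attempts - 1:
            time.sleep(delay)
    return False                          # never came up: fail the deploy
```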
A concurrency group keyed by branch prevents overlapping deploys to the same environment. Production and staging targets stay independent because the secret values and variable values differ per environment.
The .github/workflows/db-backup.yml workflow runs daily at 3 AM UTC (and supports manual dispatch):
- SSHes into the server and runs `pg_dump` on the host's PostgreSQL
- Uploads the dump to Google Cloud Storage (`gs://paperscout-backups/`)
- Old backups are auto-pruned by a GCS lifecycle rule (30 days)
CD secrets and variables are configured per GitHub Environment (production and staging); see the table in Deployment. Other secrets (e.g. database backups) are documented in deploy/SERVER_SETUP.md.