WG21 C++ paper tracker with ISO draft probing and Slack notifications.
A Python project that probes the isocpp.org paper system for unpublished D-paper drafts, monitors for new paper assignments at the frontier, and notifies a Slack channel when watched authors publish.
Docs: Developer onboarding (clone → DB → tests → run) · Maintainer handoff · Contributing · Changelog · Security · Code of conduct
If you only need to run tests or a local instance, start with onboarding before the Slack app sections below.
- Per-user watchlists -- each user manages their own list of authors and paper numbers via DM; the scout sends a personal DM when a match is found
- ISO draft probing -- Three-tier async HEAD requests to `isocpp.org/files/papers/` detect unpublished D-papers
- Frontier monitoring -- Automatically probes newly assigned paper numbers beyond the current highest
- 30-minute polling -- Fetches wg21.link/index.json every 30 minutes (configurable)
- Rate-limited posting -- All Slack messages are queued through a background thread that enforces 1 msg/sec per channel and respects HTTP 429 `Retry-After`
- PostgreSQL storage -- All state (probe history, index cache, watchlists) lives in Postgres; logs stay as rotating files
- Status command -- `status` shows papers loaded, last poll time, and probe stats
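The rate-limited posting above can be sketched roughly as follows. This is a simplified, hypothetical version of the `MessageQueue` mentioned in `scout.py`; the real implementation and the exact slack-sdk error shape may differ:

```python
import queue
import threading
import time

class MessageQueue:
    """Hypothetical sketch: serialize Slack posts at <= 1 msg/sec per channel."""

    def __init__(self, post_fn):
        self.post_fn = post_fn                     # e.g. WebClient.chat_postMessage
        self.q = queue.Queue()
        self.last_sent = {}                        # channel id -> monotonic timestamp
        threading.Thread(target=self._worker, daemon=True).start()

    def enqueue(self, channel, text):
        self.q.put((channel, text))

    def _worker(self):
        while True:
            channel, text = self.q.get()
            wait = 1.0 - (time.monotonic() - self.last_sent.get(channel, float("-inf")))
            if wait > 0:
                time.sleep(wait)                   # enforce 1 msg/sec per channel
            try:
                self.post_fn(channel=channel, text=text)
            except Exception as err:               # assumed 429 error shape; adjust for slack-sdk
                headers = getattr(getattr(err, "response", None), "headers", {})
                retry_after = headers.get("Retry-After")
                if retry_after:
                    time.sleep(int(retry_after))   # honor Retry-After, then requeue
                    self.q.put((channel, text))
            self.last_sent[channel] = time.monotonic()
```

Because delivery runs on a background daemon thread, callers never block on Slack's rate limit; they just enqueue and move on.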
- Go to https://api.slack.com/apps and click Create New App
- Choose From scratch
- Name it `paperscout` (or whatever you prefer), select your workspace, click Create App
Go to OAuth & Permissions in the left sidebar. Under Bot Token Scopes, add:
| Scope | Why |
|---|---|
| `chat:write` | Post messages to channels and send DMs |
| `chat:write.public` | Post to public channels the scout hasn't been invited to |
| `im:history` | Read messages in 1:1 DMs with the scout |
| `im:write` | Open 1:1 DM conversations to deliver watchlist alerts |
| `mpim:history` | Read messages in group DMs the scout has been invited to |
| `mpim:write` | Reply in group DMs |
| `channels:history` | Read messages in public channels |
| `groups:history` | Read messages in private channels the scout is invited to |
| `groups:write` | Reply in private channels |
| `app_mentions:read` | Respond when someone @mentions the scout |
Note on group DMs (`mpim`): When the scout is invited to a group DM, `watchlist` commands are rejected with a friendly error telling the user to use a 1:1 DM instead. `status` and `help` work normally. The `mpim:history` and `mpim:write` scopes are needed to receive and reply to those messages.
Go to Event Subscriptions in the left sidebar:
- Toggle Enable Events to On
- Under Subscribe to bot events, add:
  - `message.channels` (messages in public channels)
  - `message.groups` (messages in private channels)
  - `message.im` (1:1 direct messages)
  - `message.mpim` (group direct messages)
  - `app_mention` (when someone @mentions the scout)
- You will set the Request URL after the scout is running (step 7)
Go to App Home in the left sidebar:
- Under Show Tabs, make sure Messages Tab is enabled
- Check Allow users to send Slash commands and messages from the messages tab
- Go to OAuth & Permissions
- Click Install to Workspace at the top
- Authorize the app
- Copy the Bot User OAuth Token (starts with `xoxb-`)
- Go to Basic Information and copy the Signing Secret
```bash
cd paperscout-python
cp .env.example .env
```

Edit `.env` with your credentials and preferences:

```bash
SLACK_SIGNING_SECRET=<your signing secret from step 5>
SLACK_BOT_TOKEN=xoxb-<your bot token from step 5>
PORT=3000

# PostgreSQL connection string (required)
DATABASE_URL=postgresql://user:password@localhost:5432/paperscout

# Slack channel ID for general notifications (new frontier drafts, D→P transitions).
# To find it: open the channel in Slack, click the channel name
# at the top, scroll to the bottom of the popup -- the ID looks like C0123456789
NOTIFICATION_CHANNEL=C0123456789

# Explicit number ranges to always probe as hot (optional)
FRONTIER_EXPLICIT_RANGES=[{"min": 4033, "max": 4042}, {"min": 4049, "max": 4080}]
```
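For illustration, the JSON list in `FRONTIER_EXPLICIT_RANGES` can be parsed with nothing but the standard library. The real `config.py` uses pydantic-settings for this; `parse_explicit_ranges` below is a hypothetical helper:

```python
import json
import os

def parse_explicit_ranges(raw: str) -> list[tuple[int, int]]:
    """Parse a JSON list of {"min": ..., "max": ...} objects into (min, max) tuples."""
    return [(r["min"], r["max"]) for r in json.loads(raw or "[]")]

raw = os.environ.get(
    "FRONTIER_EXPLICIT_RANGES",
    '[{"min": 4033, "max": 4042}, {"min": 4049, "max": 4080}]',
)
ranges = parse_explicit_ranges(raw)   # → [(4033, 4042), (4049, 4080)]
```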
Install and run:
```bash
pip install -e .
python -m paperscout
```

Once the scout is running and reachable at a public URL:
- Go back to Event Subscriptions in the Slack app config
- Set Request URL to `https://your-server.com/slack/events`
- Slack will send a challenge request -- the scout responds automatically
- Click Save Changes
For local testing with ngrok:
```bash
ngrok http 3000
# Use the ngrok URL: https://abc123.ngrok.io/slack/events
```

- Public channel notifications: The scout posts to `NOTIFICATION_CHANNEL` automatically (via `chat:write.public`). No invite needed.
- Private channels: Type `/invite @paperscout` in the private channel for `@mention` support.
- Watchlist DMs (required): Each user must open a 1:1 DM with `paperscout` to manage their personal watchlist. The scout will also DM users proactively when their watchlist matches a new paper.
- Group DMs: The scout can be invited, but `watchlist` commands will be rejected with a message directing the user to use a 1:1 DM.
- DM the scout: `status` — should reply with papers loaded, last poll time, and probe stats
- DM the scout: `watchlist add Niebler` — should confirm the author was added (as an author entry)
- DM the scout: `watchlist add 2300` — should confirm the paper was added (as a paper number entry)
- DM the scout: `watchlist list` — should show both entries with their types
- DM the scout: `watchlist remove Niebler` — should confirm removal
- Type `@paperscout status` in a channel — should reply in-thread
- Check your notification channel after 30 minutes — frontier hits and D→P transitions appear there; personal watchlist matches arrive as DMs
The scout runs as a Docker container deployed via CD. A push to main deploys to production; a push to develop deploys to staging. Both paths run the same workflow and the same job — only the GitHub Environment changes.
Push to main → CI tests → SSH into prod → git pull --ff-only → docker compose up --build → Health check (retry)
Push to develop → CI tests → SSH into staging → git pull --ff-only → docker compose up --build → Health check (retry)
Create two environments under Settings → Environments: production and staging. Both use the same secret names (different values per environment) and a small set of per-environment Variables:
| Type | Name | Production | Staging |
|---|---|---|---|
| Secret | `SERVER_HOST` | prod host / IP | staging host / IP |
| Secret | `SERVER_USER` | deploy user | deploy user |
| Secret | `SERVER_SSH_KEY` | private key | private key |
| Secret | `SERVER_PORT` | optional (default 22) | optional (default 22) |
| Variable | `DEPLOY_PATH` | `/opt/paperscout` | `/opt/paperscout-staging` |
| Variable | `DEPLOY_BRANCH` | `main` | `develop` |
| Variable | `HEALTH_PORT` | `9101` | `9102` (or whatever staging maps) |
The workflow picks the environment from the branch (refs/heads/main → production, refs/heads/develop → staging), so values like DEPLOY_PATH and HEALTH_PORT are not hard-coded in the YAML.
Tip: enable Required reviewers on the `production` environment for a manual approval gate before prod deploys.
```bash
# On the production server (after Docker, PostgreSQL, and nginx are set up)
git clone https://github.com/cppalliance/paperscout-python.git /opt/paperscout
cd /opt/paperscout
cp .env.example .env   # edit with real credentials
docker compose up -d --build
curl -sf http://localhost:9101/health
```

On the staging server (separate host or separate path on the same host; must match the staging environment's `DEPLOY_PATH` and expose `/health` on `HEALTH_PORT`):

```bash
git clone -b develop https://github.com/cppalliance/paperscout-python.git /opt/paperscout-staging
cd /opt/paperscout-staging
cp .env.example .env   # use staging credentials / DB / Slack app as appropriate
docker compose up -d --build
curl -sf http://localhost:9102/health
```

See `deploy/SERVER_SETUP.md` for the full Ubuntu 22.04 provisioning guide, and `.github/workflows/cd.yml` for the CD pipeline.
Database backups run daily via .github/workflows/db-backup.yml, uploading pg_dump snapshots to Google Cloud Storage.
Watchlist commands only work in a 1:1 DM with the scout (each user has their own independent watchlist). status and help work everywhere — DMs, group DMs, and channels via @paperscout.
| Command | Where | Description |
|---|---|---|
| `watchlist` | DM only | Show your personal watchlist |
| `watchlist list` | DM only | Show your personal watchlist |
| `watchlist add <name-or-number>` | DM only | Add an author name substring or paper number — type is auto-detected |
| `watchlist remove <name-or-number>` | DM only | Remove an entry from your watchlist |
| `status` | Anywhere | Show papers loaded, last poll time, probe stats |
| `help` | Anywhere | Show command summary |
- Author entries (`watchlist add Niebler`) — match when the author field of a new index paper contains the substring (case-insensitive), or when the first ~1,000 words of a newly discovered draft mention the name.
- Paper number entries (`watchlist add 2300`) — match when a draft for that number is newly discovered, or when the paper appears in the wg21.link index.
When a match is found, all hits for that user are batched and sent as a single DM.
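The two match rules above can be sketched as a single predicate. The names and signature here are illustrative, not the actual `monitor.py` API:

```python
def entry_matches(entry: str, paper_number: int, authors: str, draft_head: str) -> bool:
    """entry: a watchlist entry; draft_head: first ~1,000 words of a new draft."""
    if entry.isdigit():                    # paper-number entry, e.g. "2300"
        return int(entry) == paper_number
    needle = entry.lower()                 # author entry: case-insensitive substring
    return needle in authors.lower() or needle in draft_head.lower()
```

An author entry matches either the index's author field or the opening text of a newly discovered draft; a numeric entry matches on the paper number alone.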
All parameters are configurable via environment variables or a .env file. See .env.example for the complete list.
| Variable | Description |
|---|---|
| `SLACK_SIGNING_SECRET` | Slack app signing secret |
| `SLACK_BOT_TOKEN` | Slack bot token (`xoxb-...`) |
| `DATABASE_URL` | PostgreSQL connection string (`postgresql://user:pass@host:5432/db`) |
| Variable | Default | Description |
|---|---|---|
| `POLL_INTERVAL_MINUTES` | `30` | Main polling cycle interval |
| `POLL_OVERRUN_COOLDOWN_SECONDS` | `300` | Minimum sleep after a poll cycle that overran the interval (avoids tight loops when work or errors stretch a cycle) |
| `ENABLE_BULK_WG21` | `true` | Fetch `wg21.link/index.json` each cycle |
| `ENABLE_BULK_OPENSTD` | `true` | Reserved for open-std.org scraping (not yet scheduled) |
| `ENABLE_ISO_PROBE` | `true` | Run isocpp.org HEAD probing each cycle |
| Variable | Default | Description |
|---|---|---|
| `PROBE_PREFIXES` | `["D","P"]` | Prefixes for gap/unknown numbers |
| `PROBE_EXTENSIONS` | `[".pdf",".html"]` | File extensions to check |
| Variable | Default | Description |
|---|---|---|
| `FRONTIER_WINDOW_ABOVE` | `60` | Numbers above effective frontier to probe every cycle |
| `FRONTIER_WINDOW_BELOW` | `30` | Numbers below effective frontier to probe every cycle |
| `FRONTIER_EXPLICIT_RANGES` | `[]` | Additional explicit ranges, e.g. `[{"min":4033,"max":4060}]` |
| `FRONTIER_GAP_THRESHOLD` | `50` | Max gap between consecutive P-numbers before treating a number as an outlier (prevents pre-assigned far-future numbers like P5000 from shifting the frontier) |
| Variable | Default | Description |
|---|---|---|
| `HOT_LOOKBACK_MONTHS` | `6` | Papers with a date within this window are probed every cycle |
| `HOT_REVISION_DEPTH` | `2` | Revisions ahead of known latest to probe for hot papers |
| Variable | Default | Description |
|---|---|---|
| `COLD_REVISION_DEPTH` | `1` | Revisions ahead of known latest for cold papers |
| `COLD_CYCLE_DIVISOR` | `48` | Cold pool is split into N slices; each cycle probes 1 slice (48 × 30 min = 24 h) |
| `GAP_MAX_REV` | `1` | For gap/unknown numbers, probe R0 through this revision |
| Variable | Default | Description |
|---|---|---|
| `ALERT_MODIFIED_HOURS` | `24` | Only notify for hits where the server's `Last-Modified` header is within this many hours of now. Falls back to alerting when the header is absent. |
| Variable | Default | Description |
|---|---|---|
| `HTTP_CONCURRENCY` | `20` | Maximum simultaneous probe requests |
| `HTTP_TIMEOUT_SECONDS` | `10` | Request timeout for HEAD probes |
| `HTTP_USE_HTTP2` | `true` | Enable HTTP/2 for all requests |
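`HTTP_CONCURRENCY` can be enforced with an `asyncio.Semaphore`, as in this sketch. The real prober uses httpx; here `probe` is a stand-in for an `AsyncClient.head()` call:

```python
import asyncio

async def probe_all(urls, concurrency: int = 20):
    """Issue probes for all URLs with at most `concurrency` in flight at once."""
    sem = asyncio.Semaphore(concurrency)

    async def probe(url):
        async with sem:                  # at most `concurrency` concurrent probes
            await asyncio.sleep(0.01)    # placeholder for client.head(url, timeout=...)
            return url, 200

    return await asyncio.gather(*(probe(u) for u in urls))
```

All probes for a cycle are scheduled up front; the semaphore, not the scheduler, throttles how many touch the network at once.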
| Variable | Default | Description |
|---|---|---|
| `NOTIFICATION_CHANNEL` | `""` | Slack channel ID for general alerts (frontier hits, D→P transitions); empty = disabled |
| `NOTIFY_ON_FRONTIER_HIT` | `true` | Notify on recently modified draft near the frontier |
| `NOTIFY_ON_ANY_DRAFT` | `true` | Notify on any other recently modified draft |
| `NOTIFY_ON_DP_TRANSITION` | `true` | Notify when a tracked D-paper appears in the index as its published P counterpart |

Personal watchlist matches (author or paper number) are always sent as a DM to the matching user — they are not posted to `NOTIFICATION_CHANNEL`.
| Variable | Default | Description |
|---|---|---|
| `DATABASE_URL` | `""` | PostgreSQL DSN — required |
| `DATA_DIR` | `./data` | Directory for log files |
| `CACHE_TTL_HOURS` | `1` | How long the wg21.link index cache is considered fresh |
```
paperscout-python/
  src/paperscout/
    __main__.py        Entry point; wires together all components
    config.py          All settings via pydantic-settings
    models.py          Paper dataclass, PaperPrefix/PaperType/FileExt enums
    sources.py         WG21Index (PaperCache-backed), ISOProber, open-std.org scraper
    monitor.py         Scheduler, diff engine, PerUserMatches, PollResult
    scout.py           Slack Bolt app, MessageQueue, notify_channel, notify_users
    storage.py         PaperCache, ProbeState, UserWatchlist (all PostgreSQL-backed)
    db.py              ThreadedConnectionPool init and schema DDL
    health.py          HTTP health-check endpoint (GET /health on port 8080)
  data/                Log files (gitignored); all other state lives in PostgreSQL
  deploy/
    paperscout.conf    Reference nginx site config (443 → 3000, /health → 8080)
    SERVER_SETUP.md    Full Ubuntu 22.04 server provisioning guide
  tests/
  Dockerfile           Multi-stage build (python:3.12-slim)
  docker-compose.yml   Single-service compose (builds locally, connects to host PostgreSQL)
  .github/workflows/
    ci.yml             Test matrix on push/PR to main
    cd.yml             SSH deploy (git pull + build) on push to main
    db-backup.yml      Daily pg_dump to Google Cloud Storage
```
| Table | Purpose |
|---|---|
| `paper_cache` | TTL-cached wg21.link index JSON blob |
| `discovered_urls` | All URLs seen by the ISO prober with timestamps |
| `probe_miss_counts` | Exponential backoff counters per paper number |
| `poll_state` | Last-poll timestamp (singleton row) |
| `user_watchlist` | Per-user author/paper entries with type discrimination |
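The backoff driven by `probe_miss_counts` can be sketched as below. The doubling-and-cap rule is illustrative; the actual thresholds live in the prober:

```python
def due_for_probe(miss_count: int, cycles_since_last: int) -> bool:
    """A number that keeps missing is probed less often: wait 2**misses cycles,
    capped at 48 cycles (one probe per daily sweep)."""
    return cycles_since_last >= min(2 ** miss_count, 48)
```

So a fresh number is probed every cycle, while a number that has 404ed three times waits eight cycles before the next attempt.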
Every P-number from 1 to the effective frontier is probed. Numbers are divided into a hot set (probed every 30 min) and a cold pool (probed once per day by distributing 1/48 of the pool each cycle).
| Frequency | What | Condition | Per-cycle URLs |
|---|---|---|---|
| Hot (every cycle) | Watchlist papers | union of all users' watched paper numbers | D-prefix, latest+1..+2, pdf+html |
| Hot (every cycle) | Frontier numbers | ±window around effective frontier | D+P, R0..R1 for unknowns; D, latest+1..+2 for known |
| Hot (every cycle) | Recently active papers | date within `HOT_LOOKBACK_MONTHS` | D-prefix, latest+1..+2, pdf+html |
| Cold (1/48 per cycle ≈ daily) | All other P-numbers | everything else | D-prefix, latest+1, pdf+html |
| Cold (1/48 per cycle) | Gap numbers (no index entry) | 1..frontier minus known | D+P, R0..R1, pdf+html |
Typical per-cycle request count: ~1,600–2,000 HEAD requests (~8–10 s at 20 concurrent, 100 ms latency). A full sweep of all ~4,000 P-numbers completes within ~24 h of continuous 30-min polling.
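One way to spread the cold pool across `COLD_CYCLE_DIVISOR` cycles is a simple modulus, sketched here with illustrative names: `number % divisor` selects this cycle's slice, so every paper is probed exactly once per full rotation (48 slices × 30 min = 24 h):

```python
def cold_slice(cold_numbers: list[int], cycle_index: int, divisor: int = 48) -> list[int]:
    """Return the subset of the cold pool assigned to this cycle."""
    return [n for n in cold_numbers if n % divisor == cycle_index % divisor]
```

Each number lands in exactly one of the 48 slices, and the slices are near-equal in size, which keeps the per-cycle request count stable.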
When a HEAD probe returns 200, the scout reads the Last-Modified response header. It only sends a Slack notification if the file was modified within ALERT_MODIFIED_HOURS (default 24 h). This means:
- A D-paper uploaded today → alert sent
- A D-paper uploaded 6 months ago that we hadn't tracked → silently added to discovered, no alert
- No `Last-Modified` header (unusual) → treated as recent, alert sent
The Last-Modified timestamp is shown in every notification message.
| Source | URL | What it covers |
|---|---|---|
| wg21.link | `https://wg21.link/index.json` | All published P/N papers with metadata |
| open-std.org | `https://www.open-std.org/jtc1/sc22/wg21/docs/papers/{year}/` | Yearly HTML tables (scraper defined, not yet scheduled) |
| isocpp.org | `https://isocpp.org/files/papers/{D\|P}{num}R{rev}.{pdf\|html}` | D-paper drafts (no index, requires probing) |
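The isocpp.org URL pattern expands to a small cross-product per paper number, sketched here with defaults mirroring `PROBE_PREFIXES` and `PROBE_EXTENSIONS` (the helper name is illustrative):

```python
from itertools import product

BASE = "https://isocpp.org/files/papers"

def probe_urls(num: int, revisions: range,
               prefixes=("D", "P"), extensions=(".pdf", ".html")) -> list[str]:
    """All candidate draft URLs for one paper number: prefix x revision x extension."""
    return [f"{BASE}/{p}{num}R{r}{ext}"
            for p, r, ext in product(prefixes, revisions, extensions)]

probe_urls(2300, range(0, 2))   # 2 prefixes x 2 revisions x 2 extensions = 8 URLs
```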
- `slack-bolt` — Slack app framework
- `httpx[http2]` — Async HTTP with HTTP/2 support
- `pydantic-settings` — Type-safe configuration
- `psycopg2-binary` — PostgreSQL adapter (sync, thread-safe)
```bash
git clone https://github.com/cppalliance/paperscout-python.git
cd paperscout-python
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -e ".[dev]"
```

Use `./run` (bash, works in Git Bash on Windows and on Linux/macOS). `make` is a thin wrapper around the same script and requires GNU Make.
```bash
./run test    # fast test run, no coverage
./run cov     # tests + coverage report + 90% gate
./run check   # alias for cov -- run this before every push
./run clean   # remove .coverage, coverage.xml, caches
./run help    # list all targets
```

Equivalent make targets (Linux / CI):

```bash
make test
make cov
make check
make clean
```

Override the Python interpreter if needed:

```bash
PYTHON=python3.12 ./run cov
```

`./run check` exits non-zero if any test fails or if coverage drops below 90%.
The .github/workflows/ci.yml workflow runs automatically on every push and pull request to main or develop:
- Matrix: Python 3.10, 3.11, and 3.12 on `ubuntu-latest`
- Steps: install → `pytest --cov` → coverage summary written to the job summary tab
- Gate: build fails if coverage drops below 90% (`--cov-fail-under=90`)
- Artefact: the `coverage.xml` report from the Python 3.12 run is uploaded and kept for 7 days
Coverage details are visible in the Summary tab of each workflow run (rendered as a Markdown table by `coverage report --format=markdown`).
The .github/workflows/cd.yml workflow runs on push to main or develop (and supports workflow_dispatch from either branch):
- Test — single Python 3.12 pytest run as a gate (re-uses the same coverage threshold as CI).
- Deploy — single environment-driven job:
  - Selects the GitHub Environment from the branch (`main` → `production`, `develop` → `staging`).
  - SSHes in using the environment-scoped secrets (`SERVER_HOST`, `SERVER_USER`, `SERVER_SSH_KEY`, optional `SERVER_PORT`).
  - Reads per-environment variables (`DEPLOY_PATH`, `DEPLOY_BRANCH`, `HEALTH_PORT`) so the same workflow targets prod or staging without code changes.
  - Runs `git fetch` + `git checkout` + `git pull --ff-only` against `DEPLOY_BRANCH` to keep deploys deterministic, then `docker compose up -d --build paperscout`.
- Health check — bounded retry loop (12 × 5 s) against `http://localhost:${HEALTH_PORT}/health`; the job fails if the endpoint never returns 200.
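The health gate is a shell loop in the workflow; the same logic, sketched in Python for clarity (function name and defaults are illustrative):

```python
import time
import urllib.request

def wait_healthy(url: str, attempts: int = 12, delay: float = 5.0) -> bool:
    """Poll /health up to `attempts` times, `delay` seconds apart; True on first 200."""
    for i in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass                          # server still starting; retry
        if i < attempts - 1:
            time.sleep(delay)
    return False                          # never came up: fail the deploy
```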
A concurrency group keyed by branch prevents overlapping deploys to the same environment. Production and staging targets stay independent because the secret values and variable values differ per environment.
The .github/workflows/db-backup.yml workflow runs daily at 3 AM UTC (and supports manual dispatch):
- SSHes into the server and runs `pg_dump` on the host's PostgreSQL
- Uploads the dump to Google Cloud Storage (`gs://paperscout-backups/`)
- Old backups are auto-pruned by a GCS lifecycle rule (30 days)
CD secrets and variables are configured per GitHub Environment (production and staging); see the table in Deployment. Other secrets (e.g. database backups) are documented in deploy/SERVER_SETUP.md.