| 1 | +# Best-of-Breed Publisher Template Report |
| 2 | + |
| 3 | +Date: 2026-05-26 |
| 4 | + |
| 5 | +## Purpose |
| 6 | + |
| 7 | +This report records an initial research recommendation on which existing OSHConnect-Python publisher implementations should serve as exemplars for four upcoming data-source publishers. The goal is to identify the strongest current patterns for richness, completeness, and accuracy before designing new publisher work. |
| 8 | + |
| 9 | +## Executive Recommendation |
| 10 | + |
| 11 | +Use a small template family rather than a single universal publisher template. |
| 12 | + |
| 13 | +The strongest primary exemplar is `publishers/usgs_eq` for any new event-feed or feed-adapter source. It has the best combination of data-model rigor, authoritative metadata, clear CSAPI modeling, explicit runtime semantics, and enrichment planning. |
| 14 | + |
| 15 | +For station networks, imagery feeds, or strict server compatibility work, use complementary exemplars: |
| 16 | + |
| 17 | +| New source shape | Best existing example | Primary reason | |
| 18 | +| --- | --- | --- | |
| 19 | +| Event feed, alert feed, or one API stream | `publishers/usgs_eq` | Best Pattern C feed-adapter model, rich metadata, official source documentation, explicit event revision dedupe. | |
| 20 | +| Fixed station network or physical sensor fleet | `publishers/usgs_water` | Richest station-level model, sidecar station metadata, multiple datastreams per system, strong official provenance. | |
| 21 | +| Imagery, media, or camera feed | `publishers/usgs_nims` | Best media-feed pattern, image URL modeling, duplicate suppression, and companion datastream behavior. | |
| 22 | +| Strict CSAPI/SensorML compatibility | `publishers/aviation_wx` plus `publishers/bootstrap_helpers.py` | Best reference for strict parser constraints, GeoJSON stub separation, and SensorML PUT behavior. | |
| 23 | + |
| 24 | +## Evaluation Criteria |
| 25 | + |
| 26 | +The recommendation is based on these qualities: |
| 27 | + |
| 28 | +- Metadata richness: official documentation links, SensorML bodies, identifiers, classifiers, contacts, documents, deployments, and result schemas. |
| 29 | +- Completeness: bootstrap script, runtime publisher, config or sidecar data, clean/bootstrap modes, dry-run behavior, operational notes, and enrichment plan. |
| 30 | +- Accuracy: source semantics grounded in authoritative upstream documentation, explicit field meanings, correct observation model, and avoidance of misleading CSAPI resource modeling. |
| 31 | +- Runtime robustness: duplicate suppression, rate-limit handling, reconnect behavior, server compatibility workarounds, and stable datastream discovery. |
| 32 | +- Extensibility: clear boundaries between baseline runtime, optional enrichment, and future UI/client use. |
| 33 | + |
| 34 | +## Findings |
| 35 | + |
| 36 | +### 1. USGS Earthquake Is the Best Event-Feed Exemplar |
| 37 | + |
| 38 | +`publishers/usgs_eq` should be the default starting point for event feeds, alert feeds, and API streams where the source is not a fleet of physical stations. |
| 39 | + |
| 40 | +Key strengths: |
| 41 | + |
| 42 | +- Correct Pattern C model: one procedure, one feed-adapter system, one datastream, and deployment grouping. |
| 43 | +- Avoids the common modeling mistake of creating one CSAPI system per event. |
| 44 | +- Uses authoritative USGS earthquake source documentation and records optional enrichment surfaces. |
| 45 | +- Publishes one CSAPI observation per earthquake event. |
| 46 | +- Deduplicates by `(eventId, updatedTime)`, so revised events are republished while unchanged feed entries are skipped. |
| 47 | +- Includes a complete bootstrap/data-model enrichment pack that documents source verification, omitted upstream fields, and future detail/FDSN enrichment boundaries. |
| 48 | + |
| 49 | +Use this pattern when a new data source is conceptually a live feed rather than a set of deployed sensors. |
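
The revision-aware dedupe described above can be sketched as follows. This is a minimal illustration, not the code in `publishers/usgs_eq`: the function name `new_or_revised`, the dictionary shape, and the in-memory `seen` set are assumptions made for the example. Only the `(eventId, updatedTime)` key comes from this report.

```python
from typing import Iterable, Iterator


def new_or_revised(events: Iterable[dict], seen: set) -> Iterator[dict]:
    """Yield events whose (eventId, updatedTime) pair has not been seen yet.

    A revised event keeps its eventId but carries a newer updatedTime, so it
    produces a new key and is yielded again. An unchanged entry is skipped.
    """
    for event in events:
        key = (event["eventId"], event["updatedTime"])
        if key in seen:
            continue
        seen.add(key)
        yield event


# Hypothetical feed contents for two consecutive polls.
seen: set = set()
poll_1 = [
    {"eventId": "ev1", "updatedTime": 100},
    {"eventId": "ev2", "updatedTime": 100},
]
poll_2 = [
    {"eventId": "ev1", "updatedTime": 150},  # revised upstream
    {"eventId": "ev2", "updatedTime": 100},  # unchanged
]

first = list(new_or_revised(poll_1, seen))
second = list(new_or_revised(poll_2, seen))
```

In this sketch the second poll yields only the revised `ev1`. A long-running publisher would probably also need to bound or persist the `seen` set, which is left out here.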
| 50 | + |
| 51 | +### 2. USGS Water Is the Best Station-Network Exemplar |
| 52 | + |
| 53 | +`publishers/usgs_water` should be the primary model for fixed stations, physical assets, and parameterized sensor networks. |
| 54 | + |
| 55 | +Key strengths: |
| 56 | + |
| 57 | +- One CSAPI system per monitoring location. |
| 58 | +- Multiple datastreams per station, with explicit parameter semantics. |
| 59 | +- Rich station sidecar data in `stations.json`. |
| 60 | +- Strong official USGS Water Data OGC API references. |
| 61 | +- SensorML captures station identifiers, classifiers, contacts, documents, characteristics, capabilities, and position. |
| 62 | +- Runtime handles API keys, request delay, rate-limit backoff, station filtering, duplicate suppression, and datastream discovery quirks. |
| 63 | + |
| 64 | +Use this pattern when the new source has named locations, sites, platforms, gauges, monitors, or other physical assets that should appear as systems. |
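
The rate-limit backoff mentioned in the strengths above might look roughly like the sketch below. It is not the `publishers/usgs_water` implementation: `fetch_with_backoff`, its parameters, and the exponential doubling schedule are assumptions chosen for illustration. The `fetch` callable stands in for whatever HTTP client the publisher uses and returns a `(status, body)` pair.

```python
import time
from typing import Callable, Optional, Tuple


def fetch_with_backoff(
    fetch: Callable[[], Tuple[int, Optional[dict]]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Optional[dict]]:
    """Call fetch() and retry with doubling delays while it returns HTTP 429."""
    delay = base_delay
    status, body = 429, None
    for attempt in range(max_attempts):
        status, body = fetch()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay)  # cap the cooldown
    return status, body  # still throttled after the last attempt


# Fake upstream that throttles twice before succeeding (illustrative only).
responses = iter([(429, None), (429, None), (200, {"ok": True})])
delays = []  # records requested sleeps instead of actually sleeping
result = fetch_with_backoff(lambda: next(responses), sleep=delays.append)
```

Injecting `sleep` keeps the example testable without real waiting. A production version might also honor a `Retry-After` header if the upstream service sends one.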
| 65 | + |
| 66 | +### 3. USGS NIMS Is the Best Media/Imagery Exemplar |
| 67 | + |
| 68 | +`publishers/usgs_nims` should be the reference for image-producing sources, camera feeds, and companion media datastreams. |
| 69 | + |
| 70 | +Key strengths: |
| 71 | + |
| 72 | +- Models imagery as a companion datastream on existing USGS Water systems. |
| 73 | +- Captures image URL, thumbnail/full image concepts, media type, camera identity, and latest-file semantics. |
| 74 | +- Handles upstream rate limits, cooldown/backoff, and duplicate suppression by filename. |
| 75 | +- Uses a curated `cameras.json` sidecar. |
| 76 | + |
| 77 | +Use this pattern when the new source produces media artifacts rather than conventional scalar observations. |
| 78 | + |
| 79 | +### 4. Aviation WX Is the Best Strict-Compatibility Reference |
| 80 | + |
| 81 | +`publishers/aviation_wx` is not necessarily the richest domain model overall, but it is the most useful example for strict CSAPI and SensorML compatibility constraints. |
| 82 | + |
| 83 | +Key strengths: |
| 84 | + |
| 85 | +- Documents strict parser behavior directly in the bootstrap. |
| 86 | +- Separates small GeoJSON create stubs from rich SensorML update bodies. |
| 87 | +- Records csapi-go-v2 compatibility quirks. |
| 88 | +- Uses server-specific result normalization where required. |
| 89 | +- Demonstrates multi-station runtime behavior with duplicate suppression. |
| 90 | + |
| 91 | +Use this pattern as a guardrail for all new publishers, especially when targeting both OSH SensorHub and stricter CSAPI servers. |
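
The two-step bootstrap described above (a small GeoJSON create stub followed by a rich SensorML update) can be sketched as a request plan. This is a hedged illustration only: `plan_system_bootstrap` is a hypothetical helper, not a function from `publishers/bootstrap_helpers.py`, and the payload fields shown are a minimal guess that a strict server may require more of. The `/systems` paths and the `application/geo+json` media type are assumptions based on the usual CSAPI layout.

```python
import json


def plan_system_bootstrap(uid: str, name: str, sensorml: dict, system_id: str = "{id}") -> list:
    """Return the two requests a stub-then-PUT bootstrap would issue.

    Nothing is sent here. The plan only shows which body goes with which
    method and content type.
    """
    # Step 1: minimal GeoJSON stub, deliberately free of rich metadata.
    stub = {
        "type": "Feature",
        "geometry": None,
        "properties": {"uid": uid, "name": name},
    }
    return [
        {
            "method": "POST",
            "path": "/systems",
            "content_type": "application/geo+json",
            "body": json.dumps(stub),
        },
        # Step 2: replace the stub with the full SensorML description.
        {
            "method": "PUT",
            "path": f"/systems/{system_id}",
            "content_type": "application/sml+json",
            "body": json.dumps(sensorml),
        },
    ]


# Illustrative values only. The UID and SensorML content are made up.
steps = plan_system_bootstrap(
    uid="urn:example:station:001",
    name="Example Station",
    sensorml={
        "type": "PhysicalSystem",
        "uniqueId": "urn:example:station:001",
        "label": "Example Station",
        "description": "Rich metadata such as identifiers and contacts goes here.",
    },
)
```

Keeping the stub small means strict parsers have less to reject on create, while the richer body travels in the later PUT.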
| 92 | + |
| 93 | +## Baseline Standard for New Publishers |
| 94 | + |
| 95 | +Every new publisher should follow these conventions unless the data source clearly requires a different model: |
| 96 | + |
| 97 | +- Use `publishers/bootstrap_helpers.py` for idempotent create/update/delete behavior. |
| 98 | +- Create resources with minimal GeoJSON stubs, then PUT rich SensorML using `application/sml+json`. |
| 99 | +- Use stable UIDs; never depend on server-assigned IDs in source code or config. |
| 100 | +- Include authoritative source documentation links in procedure, system, datastream, and deployment metadata where appropriate. |
| 101 | +- Define an explicit result schema with units, field definitions, and omitted-field notes if the upstream source is richer than the baseline result body. |
| 102 | +- Add config or sidecar files for curated station/camera/source lists. |
| 103 | +- Implement duplicate suppression using source-native identifiers and update timestamps where possible. |
| 104 | +- Handle HTTP 429 or source throttling with cooldown/backoff behavior. |
| 105 | +- Support `--dry-run`, `--once`, and interval control for safe validation. |
| 106 | +- Keep baseline polling separate from optional enrichment or expensive detail fetches. |
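
As one possible shape for the validation flags in the list above, the sketch below wires `--dry-run` and `--once` into a polling loop. The `--interval` flag name, its default, and the `poll`/`publish` callables are assumptions for illustration. The report only states that interval control should exist.

```python
import argparse
import time
from typing import Callable, Iterable


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical publisher CLI")
    parser.add_argument("--dry-run", action="store_true",
                        help="fetch and log observations without publishing")
    parser.add_argument("--once", action="store_true",
                        help="run a single poll cycle and exit")
    parser.add_argument("--interval", type=float, default=60.0,
                        help="seconds between poll cycles (assumed flag name)")
    return parser


def run(args: argparse.Namespace,
        poll: Callable[[], Iterable[dict]],
        publish: Callable[[dict], None],
        sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll, then publish or log each observation, until --once ends the loop."""
    while True:
        for observation in poll():
            if args.dry_run:
                print(f"[dry-run] would publish: {observation}")
            else:
                publish(observation)
        if args.once:
            return
        sleep(args.interval)


# Example: a single dry-run cycle never calls publish().
published = []
dry_args = build_parser().parse_args(["--dry-run", "--once"])
run(dry_args, poll=lambda: [{"value": 1}], publish=published.append)
```

Running with `--dry-run --once` gives a cheap way to confirm that source parsing works before anything is written to a server.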
| 107 | + |
| 108 | +## Pattern Selection Rules |
| 109 | + |
| 110 | +Once the four candidate sources are known, classify each one before choosing a template: |
| 111 | + |
| 112 | +1. If it is a stream of events, alerts, tracks, reports, incidents, detections, or records from one API feed, start from `usgs_eq`. |
| 113 | +2. If it is a list of physical stations or monitoring locations, start from `usgs_water`. |
| 114 | +3. If it is a camera/image/media source, start from `usgs_nims`. |
| 115 | +4. If it is a moving-object feed with many transient assets, start from `usgs_eq` and compare against `opensky` for runtime-specific field handling. |
| 116 | +5. If it must run against csapi-go-v2 or another strict server, review `aviation_wx` and `bootstrap_helpers.py` before finalizing the bootstrap payloads. |
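
The rules above can be expressed as a small lookup, sketched here. The `SourceTraits` fields and the precedence between rules (media first, then stations, then moving objects, with the event-feed pattern as the fallback) are assumptions. The report does not say which rule wins when a source matches more than one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SourceTraits:
    """Hypothetical description of a candidate data source."""
    is_media: bool = False
    has_fixed_stations: bool = False
    is_moving_objects: bool = False
    strict_server: bool = False


def recommend_exemplars(traits: SourceTraits) -> List[str]:
    """Return the exemplars to start from, primary template first."""
    if traits.is_media:
        refs = ["usgs_nims"]
    elif traits.has_fixed_stations:
        refs = ["usgs_water"]
    elif traits.is_moving_objects:
        refs = ["usgs_eq", "opensky"]  # opensky is a comparison reference only
    else:
        refs = ["usgs_eq"]  # event, alert, or single-API feed
    if traits.strict_server:
        refs += ["aviation_wx", "bootstrap_helpers.py"]
    return refs


# Illustrative classifications.
alert_feed = recommend_exemplars(SourceTraits())
strict_stations = recommend_exemplars(
    SourceTraits(has_fixed_stations=True, strict_server=True)
)
```

Writing the rules down this way makes any ambiguity visible early, for example a camera network that is also a fixed station fleet.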
| 117 | + |
| 118 | +## Non-Preferred Starting Points |
| 119 | + |
| 120 | +The following publishers remain useful references but should not be the primary template for new work: |
| 121 | + |
| 122 | +- `publishers/iss`: useful for a simple moving-object demo, but too specialized and thin as a general template. |
| 123 | +- Earlier NWS/NDBC/CO-OPS patterns: operationally valuable, but the repository research notes show they were candidates for further metadata enrichment. |
| 124 | +- `publishers/opensky`: useful for moving-object feed semantics and bounding-box configuration, but less complete than `publishers/usgs_eq` as a general exemplar. |
| 125 | + |
| 126 | +## Recommended Next Step |
| 127 | + |
| 128 | +When the four new data sources are available, produce a per-source classification table with: |
| 129 | + |
| 130 | +- source type and recommended exemplar, |
| 131 | +- expected CSAPI model, |
| 132 | +- proposed procedures/systems/datastreams/deployments, |
| 133 | +- required sidecar/config files, |
| 134 | +- dedupe key and revision strategy, |
| 135 | +- rate-limit/backoff strategy, |
| 136 | +- authoritative source documentation links, |
| 137 | +- optional enrichment surfaces. |
| 138 | + |
| 139 | +Produce this table before implementation so that the four publishers share a coherent design language rather than diverging into one-off scripts. |