diff --git a/demos/USGS_WaterData_ContinuousData_Examples.ipynb b/demos/USGS_WaterData_ContinuousData_Examples.ipynb
new file mode 100644
index 00000000..735e5439
--- /dev/null
+++ b/demos/USGS_WaterData_ContinuousData_Examples.ipynb
@@ -0,0 +1,257 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "d664492b",
+ "metadata": {},
+ "source": [
+ "# Continuous Data\n",
+ "\n",
+ "Continuous data are collected by automated sensors, typically at a fixed\n",
+ "15-minute interval (you may also hear them called \"instantaneous values\" or\n",
+ "\"IV\"). They are described by parameter name and parameter code, and retrieved\n",
+ "with `get_continuous`.\n",
+ "\n",
+ "This notebook covers what matters when a continuous pull gets large:\n",
+ "`dataretrieval` **chunks big requests for you**, it can **resume** a pull that\n",
+ "was interrupted partway through, and there is one case you still handle\n",
+ "yourself: the service's 3-year-per-request time limit."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e7e06e81",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "from dataretrieval import waterdata\n",
+ "\n",
+ "site = \"USGS-0208458892\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b0136bd1",
+ "metadata": {},
+ "source": [
+ "## What continuous data are available?\n",
+ "\n",
+ "Filter the combined metadata to `data_type=\"Continuous values\"` to see which\n",
+ "time series a site offers and how far back each goes:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6f8a9d87",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "continuous_available, _ = waterdata.get_combined_metadata(\n",
+ "    monitoring_location_id=site,\n",
+ "    data_type=\"Continuous values\",\n",
+ ")\n",
+ "avail = continuous_available[[\"parameter_code\", \"parameter_name\", \"begin\", \"end\"]]\n",
+ "avail.sort_values(\"parameter_code\").reset_index(drop=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "fdaa8150",
+ "metadata": {},
+ "source": [
+ "## Large requests are chunked for you\n",
+ "\n",
+ "Any list-valued argument — a long list of monitoring locations, several parameter\n",
+ "codes, a complex CQL filter — can push a single request URL past the server's\n",
+ "~8 KB limit. `dataretrieval` handles this automatically: it splits the query into\n",
+ "URL-sized sub-requests, issues them, and recombines (and de-duplicates) the\n",
+ "results into one frame. **You never need to loop over sites yourself** — request\n",
+ "everything in one call.\n",
+ "\n",
+ "For example, asking for several parameter codes at once just returns one combined\n",
+ "long-format frame:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6bc05102",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "multi, _ = waterdata.get_continuous(\n",
+ "    monitoring_location_id=site,\n",
+ "    parameter_code=[\"00095\", \"00010\"],  # specific conductance + water temperature\n",
+ "    time=\"2024-07-01/2024-07-02\",\n",
+ ")\n",
+ "multi.groupby(\"parameter_code\")[\"value\"].agg([\"count\", \"min\", \"max\"])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "353ad4ec",
+ "metadata": {},
+ "source": [
+ "## Resilient pulls: resume after an interruption\n",
+ "\n",
+ "A large request becomes many sub-requests under the hood, so a long pull can be\n",
+ "interrupted partway through by a rate limit (HTTP 429) or a transient server\n",
+ "error (HTTP 5xx). Rather than discard the work already done, `dataretrieval`\n",
+ "raises a `ChunkInterrupted` that **preserves the completed sub-requests** and\n",
+ "lets you continue:\n",
+ "\n",
+ "- `QuotaExhausted` (429) and `ServiceInterrupted` (5xx) both subclass\n",
+ " `ChunkInterrupted`.\n",
+ "- `exc.partial_frame` holds whatever completed before the failure.\n",
+ "- `exc.retry_after` is the server's suggested wait (when provided).\n",
+ "- `exc.call.resume()` re-issues **only the still-pending** sub-requests and\n",
+ " returns the full `(data, metadata)`.\n",
+ "\n",
+ "The pattern below waits out the interruption and resumes until the pull\n",
+ "finishes. (In normal conditions the request completes on the first try and the\n",
+ "`except` block never runs.)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e2e9ddff",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import time\n",
+ "\n",
+ "from dataretrieval.waterdata.chunking import ChunkInterrupted\n",
+ "\n",
+ "try:\n",
+ "    sensor_data, _ = waterdata.get_continuous(\n",
+ "        monitoring_location_id=site,\n",
+ "        parameter_code=\"00095\",\n",
+ "        time=\"2024-07-01/2024-07-08\",\n",
+ "    )\n",
+ "except ChunkInterrupted as exc:\n",
+ "    print(\n",
+ "        f\"interrupted after {exc.completed_chunks}/{exc.total_chunks} chunks; resuming\"\n",
+ "    )\n",
+ "    while True:\n",
+ "        time.sleep(exc.retry_after or 5 * 60)  # honor Retry-After, else back off\n",
+ "        try:\n",
+ "            sensor_data, _ = exc.call.resume()\n",
+ "            break\n",
+ "        except ChunkInterrupted as again:\n",
+ "            exc = again\n",
+ "\n",
+ "print(f\"{len(sensor_data):,} rows\")\n",
+ "sensor_data[[\"time\", \"parameter_code\", \"value\", \"approval_status\"]].head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "397e87b5",
+ "metadata": {},
+ "source": [
+ "## The 3-year window: the one axis you split yourself\n",
+ "\n",
+ "There is one limit the library does **not** chunk for you: the continuous service\n",
+ "returns at most **3 years of data per request**, and a time window is not a\n",
+ "list-shaped axis it can fan out. (With no `time` argument the service returns the\n",
+ "latest year; continuous data also has no geometry column and ignores bounding-box\n",
+ "queries.)\n",
+ "\n",
+ "So a multi-year, single-site pull is the one place you still split by time. The\n",
+ "service is most efficient one calendar year at a time, so build a list of yearly\n",
+ "windows:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "bd26d199",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Split [start, end] into per-calendar-year (start, end) date strings.\n",
+ "def year_chunks(start, end):\n",
+ "    start, end = pd.Timestamp(start), pd.Timestamp(end)\n",
+ "    edges = pd.to_datetime([f\"{y}-01-01\" for y in range(start.year + 1, end.year + 1)])\n",
+ "    starts = [start, *edges]\n",
+ "    ends = [*(edges - pd.Timedelta(days=1)), end]\n",
+ "    return [\n",
+ "        (s.strftime(\"%Y-%m-%d\"), e.strftime(\"%Y-%m-%d\")) for s, e in zip(starts, ends)\n",
+ "    ]\n",
+ "\n",
+ "\n",
+ "# Covering a full multi-year record (no data downloaded here):\n",
+ "pd.DataFrame(year_chunks(\"2012-10-01\", \"2025-09-30\"), columns=[\"start\", \"end\"])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3bc4f40f",
+ "metadata": {},
+ "source": [
+ "Then request each window and concatenate. (We use a short two-window span here so\n",
+ "the notebook runs quickly; widen the dates for a full period of record.)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "01ebb4a0",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "chunks = year_chunks(\"2023-10-01\", \"2024-03-31\")\n",
+ "\n",
+ "frames = []\n",
+ "for start, end in chunks:\n",
+ "    part, _ = waterdata.get_continuous(\n",
+ "        monitoring_location_id=site,\n",
+ "        parameter_code=\"00095\",\n",
+ "        time=f\"{start}/{end}\",\n",
+ "    )\n",
+ "    frames.append(part)\n",
+ "\n",
+ "por = pd.concat(frames, ignore_index=True)\n",
+ "print(\n",
+ "    f\"{len(por):,} rows from {len(chunks)} windows, \"\n",
+ "    f\"{por['time'].min()} -> {por['time'].max()}\"\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e2487bf4",
+ "metadata": {},
+ "source": [
+ "Wrap each window's call in the resume pattern above for an unattended,\n",
+ "restart-safe pull. USGS also expects to offer a direct full-period-of-record\n",
+ "download before the legacy NWIS services are decommissioned, which may make\n",
+ "time-window splitting unnecessary — check the documentation for updates.\n",
+ "\n",
+ "## More help\n",
+ "\n",
+ "- Documentation: \n",
+ "- Chunking and resume internals: `dataretrieval.waterdata.chunking`\n",
+ "- Issues / questions: \n",
+ "- Equivalent R article: [Continuous Data](https://doi-usgs.github.io/dataRetrieval/articles/continuous_pr.html)"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/demos/USGS_WaterData_DailyStatistics_Examples.ipynb b/demos/USGS_WaterData_DailyStatistics_Examples.ipynb
new file mode 100644
index 00000000..ffe9647d
--- /dev/null
+++ b/demos/USGS_WaterData_DailyStatistics_Examples.ipynb
@@ -0,0 +1,437 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "fe73969b",
+ "metadata": {},
+ "source": [
+ "# Daily statistics: `get_stats_por` and `get_stats_date_range`\n",
+ "\n",
+ "`get_stats_por` and `get_stats_date_range` return pre-computed temporal\n",
+ "statistics from the [modernized statistics API](https://api.waterdata.usgs.gov/statistics/v0/docs),\n",
+ "which replaces the legacy NWIS statistics service. The two functions wrap\n",
+ "endpoints that look similar but answer different questions:\n",
+ "\n",
+ "| Function | API endpoint | Returns |\n",
+ "| --- | --- | --- |\n",
+ "| `get_stats_por` | `observationNormals` | day-of-year and month-of-year statistics across the period of record |\n",
+ "| `get_stats_date_range` | `observationIntervals` | monthly and annual statistics within a requested date range |\n",
+ "\n",
+ "A couple of usage notes:\n",
+ "\n",
+ "- Pass `computation_type=` to choose the statistic — `arithmetic_mean`,\n",
+ " `median`, `minimum`, `maximum`, or `percentile`.\n",
+ "- There is no dedicated argument to return only day-of-year vs. month-of-year\n",
+ " (or only calendar vs. water year), so filter the returned `time_of_year_type`\n",
+ " / `interval_type` column in pandas, as shown below."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d6ab1ce4",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import matplotlib.dates as mdates\n",
+ "import matplotlib.pyplot as plt\n",
+ "import pandas as pd\n",
+ "\n",
+ "from dataretrieval import waterdata\n",
+ "\n",
+ "%matplotlib inline\n",
+ "\n",
+ "site = \"USGS-02037500\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cf8868ae",
+ "metadata": {},
+ "source": [
+ "## Fetching day-of-year and month-of-year statistics\n",
+ "\n",
+ "Day-of-year and month-of-year statistics aggregate observations for the same\n",
+ "calendar day or month across many years to describe typical seasonal conditions\n",
+ "(all Januarys, or all January 1sts). Below we request day-of-year discharge\n",
+ "averages for January 1 and 2 — note `start_date`/`end_date` are in `MM-DD`\n",
+ "format:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f0ab13bb",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "jan_por_mean, _ = waterdata.get_stats_por(\n",
+ "    monitoring_location_id=site,\n",
+ "    parameter_code=\"00060\",\n",
+ "    computation_type=\"arithmetic_mean\",\n",
+ "    start_date=\"01-01\",\n",
+ "    end_date=\"01-02\",\n",
+ ")\n",
+ "jan_por_mean[[\"time_of_year\", \"time_of_year_type\", \"computation\", \"value\"]]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3dc2b04f",
+ "metadata": {},
+ "source": [
+ "The first two rows are the day-of-year averages. What's the third row? Its\n",
+ "`time_of_year_type` is `month_of_year` — it's the average across all *Januarys*.\n",
+ "This is a quirk of the statistics API: whenever the `start_date`–`end_date` range\n",
+ "includes the first day of a month (here `01-01`), you also get the month-of-year\n",
+ "summary for that month.\n",
+ "\n",
+ "To return only one type, filter the `time_of_year_type` column — here,\n",
+ "month-of-year only:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4d561aba",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "moy = jan_por_mean[jan_por_mean[\"time_of_year_type\"] == \"month_of_year\"]\n",
+ "moy[[\"time_of_year\", \"time_of_year_type\", \"computation\", \"value\"]]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "43fe1eef",
+ "metadata": {},
+ "source": [
+ "### Percentile band plot\n",
+ "\n",
+ "Now an example that shows the power of the statistics API: we pull *all*\n",
+ "day-of-year discharge percentiles for the site. Computing these without the API\n",
+ "would mean downloading the entire daily period of record and computing\n",
+ "percentiles by hand.\n",
+ "\n",
+ "By default `get_stats_por` sets `expand_percentiles=True`, returning one row per\n",
+ "percentile with the value in `value` and the threshold in `percentile`\n",
+ "(minimum is reported as percentile 0, maximum as 100)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "18bd842c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "full_por_percentiles, _ = waterdata.get_stats_por(\n",
+ "    monitoring_location_id=site,\n",
+ "    parameter_code=\"00060\",\n",
+ "    computation_type=[\"minimum\", \"maximum\", \"percentile\"],\n",
+ "    start_date=\"01-01\",\n",
+ "    end_date=\"12-31\",\n",
+ ")\n",
+ "# The January 1 day-of-year percentiles (used on the WDFN state pages):\n",
+ "jan1 = full_por_percentiles[\n",
+ "    (full_por_percentiles[\"time_of_year\"] == \"01-01\")\n",
+ "    & (full_por_percentiles[\"time_of_year_type\"] == \"day_of_year\")\n",
+ "]\n",
+ "jan1.sort_values(\"percentile\")[[\"time_of_year\", \"computation\", \"percentile\", \"value\"]]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c8fc4a28",
+ "metadata": {},
+ "source": [
+ "Pivoting the day-of-year rows so each percentile is a column lets us draw the\n",
+ "percentile \"ribbons\" — each band spans two adjacent percentiles (min–5th, 5th–10th, …):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "aaa72823",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "doy = full_por_percentiles[\n",
+ "    full_por_percentiles[\"time_of_year_type\"] == \"day_of_year\"\n",
+ "].copy()\n",
+ "doy[\"value\"] = pd.to_numeric(doy[\"value\"], errors=\"coerce\")  # API returns strings\n",
+ "bands = doy.pivot_table(index=\"time_of_year\", columns=\"percentile\", values=\"value\")\n",
+ "bands.columns = [int(c) for c in bands.columns]\n",
+ "bands = bands.sort_index()  # \"MM-DD\" strings sort chronologically within a year\n",
+ "\n",
+ "# x positions: map MM-DD onto a reference (leap) year so 02-29 is included\n",
+ "x = pd.to_datetime(\"2024-\" + bands.index, format=\"%Y-%m-%d\")\n",
+ "\n",
+ "# (lo, hi) percentile range, fill color, legend label\n",
+ "band_defs = [\n",
+ "    ((95, 100), \"#292f6b\", \"95th Percentile - Max\"),\n",
+ "    ((90, 95), \"#5699c0\", \"90th - 95th Percentile\"),\n",
+ "    ((75, 90), \"#aacee0\", \"75th - 90th Percentile\"),\n",
+ "    ((25, 75), \"#e9e9e9\", \"25th - 75th Percentile\"),\n",
+ "    ((10, 25), \"#ebd6ab\", \"10th - 25th Percentile\"),\n",
+ "    ((5, 10), \"#dcb668\", \"5th - 10th Percentile\"),\n",
+ "    ((0, 5), \"#8f4f1f\", \"Min - 5th Percentile\"),\n",
+ "]\n",
+ "\n",
+ "fig, ax = plt.subplots(figsize=(9, 5))\n",
+ "for (lo, hi), color, label in band_defs:\n",
+ "    ax.fill_between(x, bands[lo], bands[hi], facecolor=color, alpha=0.7, label=label)\n",
+ "ax.set_yscale(\"log\")\n",
+ "ax.xaxis.set_major_locator(mdates.MonthLocator())\n",
+ "ax.xaxis.set_major_formatter(mdates.DateFormatter(\"%b\"))\n",
+ "ax.set_xlabel(\"Month\")\n",
+ "ax.set_ylabel(\"Discharge, cubic feet per second\")\n",
+ "ax.set_title(\"Day-of-year percentile bands (USGS-02037500)\")\n",
+ "ax.legend(title=\"Historical percentiles\", fontsize=7, loc=\"upper right\")\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7b4075bd",
+ "metadata": {},
+ "source": [
+ "Finally, overlay the actual daily mean discharge so we can see where recent\n",
+ "conditions fall relative to the historical bands — exactly the view on the\n",
+ "[Water Data for the Nation (WDFN) statistical graphs](https://waterdata.usgs.gov/monitoring-location/USGS-02037500/statistical-graphs/).\n",
+ "We pull two calendar years (2024 and 2025) of daily means and join them to the bands by month-day."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "961eea3a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "daily, _ = waterdata.get_daily(\n",
+ "    monitoring_location_id=site,\n",
+ "    parameter_code=\"00060\",\n",
+ "    statistic_id=\"00003\",\n",
+ "    time=[\"2024-01-01\", \"2025-12-31\"],\n",
+ ")\n",
+ "daily = daily.sort_values(\"time\").reset_index(drop=True)\n",
+ "daily[\"md\"] = daily[\"time\"].dt.strftime(\"%m-%d\")\n",
+ "\n",
+ "# Repeat the day-of-year bands across each actual calendar date\n",
+ "b = bands.reindex(daily[\"md\"]).reset_index(drop=True)\n",
+ "\n",
+ "fig, ax = plt.subplots(figsize=(9, 5))\n",
+ "for (lo, hi), color, label in band_defs:\n",
+ "    ax.fill_between(\n",
+ "        daily[\"time\"], b[lo], b[hi], facecolor=color, alpha=0.7, label=label\n",
+ "    )\n",
+ "ax.plot(daily[\"time\"], daily[\"value\"], color=\"black\", lw=0.9, label=\"Daily mean\")\n",
+ "prov = daily[daily[\"approval_status\"] == \"Provisional\"]\n",
+ "ax.scatter(prov[\"time\"], prov[\"value\"], color=\"red\", s=5, zorder=3, label=\"Provisional\")\n",
+ "ax.set_yscale(\"log\")\n",
+ "ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))\n",
+ "ax.xaxis.set_major_formatter(mdates.DateFormatter(\"%b %Y\"))\n",
+ "ax.set_ylabel(\"Discharge, cubic feet per second\")\n",
+ "ax.set_title(\"Daily mean discharge vs. historical percentile bands\")\n",
+ "ax.legend(fontsize=7, ncol=2, loc=\"upper right\")\n",
+ "fig.autofmt_xdate()\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e31f1726",
+ "metadata": {},
+ "source": [
+ "## Fetching monthly and annual statistics within a date range\n",
+ "\n",
+ "Unlike the day-/month-of-year normals, `get_stats_date_range` summarizes specific\n",
+ "months and years inside a requested window. Here we ask for the average discharge\n",
+ "for January 2024 — note the `YYYY-MM-DD` date format:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0bc8cd83",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "jan_daterange_mean, _ = waterdata.get_stats_date_range(\n",
+ "    monitoring_location_id=site,\n",
+ "    parameter_code=\"00060\",\n",
+ "    computation_type=\"arithmetic_mean\",\n",
+ "    start_date=\"2024-01-01\",\n",
+ "    end_date=\"2024-01-31\",\n",
+ ")\n",
+ "jan_daterange_mean[[\"start_date\", \"end_date\", \"interval_type\", \"value\"]]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7d915aed",
+ "metadata": {},
+ "source": [
+ "Instead of `time_of_year`, the output has `start_date`, `end_date`, and\n",
+ "`interval_type`. The first row is the monthly average; the API also returns the\n",
+ "**calendar year** and **water year** averages for any year intersecting the\n",
+ "range. A 93-day window can therefore touch two calendar and two water years:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cfe28029",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "multiyear, _ = waterdata.get_stats_date_range(\n",
+ "    monitoring_location_id=site,\n",
+ "    parameter_code=\"00060\",\n",
+ "    computation_type=\"arithmetic_mean\",\n",
+ "    start_date=\"2023-09-30\",\n",
+ "    end_date=\"2024-01-01\",\n",
+ ")\n",
+ "multiyear[[\"start_date\", \"end_date\", \"interval_type\", \"value\"]]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9c30978f",
+ "metadata": {},
+ "source": [
+ "Filter the `interval_type` column (values `month`, `calendar_year`,\n",
+ "`water_year`) to keep only certain intervals — here, the annual rows:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7ff90e81",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "multiyear[multiyear[\"interval_type\"].isin([\"calendar_year\", \"water_year\"])][\n",
+ "    [\"start_date\", \"end_date\", \"interval_type\", \"value\"]\n",
+ "]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "061b9cbe",
+ "metadata": {},
+ "source": [
+ "### Monthly mean table\n",
+ "\n",
+ "We can reproduce something like a Water Year Summary monthly-mean table. We pull\n",
+ "the full period of record (no dates), keep the monthly intervals, and aggregate\n",
+ "by calendar month in water-year order. (Values may differ slightly from the\n",
+ "official summaries due to rounding.)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "1c705056",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "monthly_raw, _ = waterdata.get_stats_date_range(\n",
+ "    monitoring_location_id=site,\n",
+ "    parameter_code=\"00060\",\n",
+ "    computation_type=\"arithmetic_mean\",\n",
+ ")\n",
+ "m = monthly_raw[monthly_raw[\"interval_type\"] == \"month\"].copy()\n",
+ "m[\"start_date\"] = pd.to_datetime(m[\"start_date\"])\n",
+ "m[\"value\"] = pd.to_numeric(m[\"value\"], errors=\"coerce\")\n",
+ "m = m[(m[\"start_date\"] >= \"2004-10-01\") & (m[\"start_date\"] < \"2025-09-01\")]\n",
+ "m = m.dropna(subset=[\"value\"])\n",
+ "m[\"month\"] = m[\"start_date\"].dt.strftime(\"%b\")\n",
+ "m[\"water_year\"] = (m[\"start_date\"] + pd.DateOffset(months=3)).dt.year\n",
+ "\n",
+ "\n",
+ "def summarize(g):\n",
+ "    hi = g.loc[g[\"value\"].idxmax()]\n",
+ "    lo = g.loc[g[\"value\"].idxmin()]\n",
+ "    return pd.Series(\n",
+ "        {\n",
+ "            \"Mean\": round(g[\"value\"].mean()),\n",
+ "            \"Max (WY)\": f\"{round(hi['value'])} ({int(hi['water_year'])})\",\n",
+ "            \"Min (WY)\": f\"{round(lo['value'])} ({int(lo['water_year'])})\",\n",
+ "        }\n",
+ "    )\n",
+ "\n",
+ "\n",
+ "wy_order = [\n",
+ "    \"Oct\",\n",
+ "    \"Nov\",\n",
+ "    \"Dec\",\n",
+ "    \"Jan\",\n",
+ "    \"Feb\",\n",
+ "    \"Mar\",\n",
+ "    \"Apr\",\n",
+ "    \"May\",\n",
+ "    \"Jun\",\n",
+ "    \"Jul\",\n",
+ "    \"Aug\",\n",
+ "    \"Sep\",\n",
+ "]\n",
+ "table = m.groupby(\"month\")[[\"value\", \"water_year\"]].apply(summarize).reindex(wy_order)\n",
+ "table.T"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "31cd8b14",
+ "metadata": {},
+ "source": [
+ "## Statistics API tips\n",
+ "\n",
+ "The statistics API does **not** follow the OGC standards used by the\n",
+ "`api.waterdata.usgs.gov/ogcapi/v0/` endpoints. A few things to keep in mind:\n",
+ "\n",
+ "- **Higher rate limits.** At the time of writing the statistics API allows ~4000\n",
+ " requests/hour per IP (per token if a token is supplied).\n",
+ "- **All columns, always.** There is no `skip_geometry` or `properties` argument —\n",
+ " the API returns the full column set.\n",
+ "- **Month-of-year normals.** To get month-of-year statistics from\n",
+ " `get_stats_por`, make the `start_date`–`end_date` range overlap the first of\n",
+ " the month (e.g. `01-01`–`03-01` returns the January, February, and March\n",
+ " month-of-year stats in addition to each day-of-year).\n",
+ "- **Monthly/annual intervals.** `get_stats_date_range` returns a summary for\n",
+ " every calendar month, calendar year, and water year that intersects the range.\n",
+ "- **Median = the 50th percentile.** Requesting both `median` and `percentile`\n",
+ " duplicates the median; you rarely need both.\n",
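+ "- **Min/max are not part of `percentile`.** Request them explicitly with\n",
+ " `computation_type=[\"minimum\", \"maximum\", \"percentile\"]` for a complete set of\n",
+ " order statistics (as we did for the band plot); with `expand_percentiles=True`\n",
+ " they are reported as percentiles 0 and 100.\n",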
+ "- **Fixed percentiles.** `percentile` only ever returns the 5th, 10th, 25th,\n",
+ " 50th, 75th, 90th, and 95th. For other percentiles, pull the daily record with\n",
+ " `get_daily` and compute them yourself.\n",
+ "- **Watch `sample_count`.** It's the number of observations behind a statistic;\n",
+ " there is no minimum, so a monthly/annual value can rest on a single daily\n",
+ " observation.\n",
+ "\n",
+ "## More help\n",
+ "\n",
+ "- Documentation: \n",
+ "- Statistics documentation: \n",
+ "- Equivalent R article: [daily statistics](https://doi-usgs.github.io/dataRetrieval/articles/daily_data_statistics.html)\n",
+ "- Issues / questions: "
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/demos/USGS_WaterData_DiscreteSamples_Examples.ipynb b/demos/USGS_WaterData_DiscreteSamples_Examples.ipynb
new file mode 100644
index 00000000..ea8deac2
--- /dev/null
+++ b/demos/USGS_WaterData_DiscreteSamples_Examples.ipynb
@@ -0,0 +1,549 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "438bbb08",
+ "metadata": {},
+ "source": [
+ "# Discrete water-quality samples: `get_samples`\n",
+ "\n",
+ "As USGS retires the legacy NWIS discrete water-quality services, the new\n",
+ "*Water Data for the Nation* samples service takes their place. In Python it is\n",
+ "exposed through three functions in `dataretrieval.waterdata`:\n",
+ "\n",
+ "- `get_samples` — retrieve discrete water-quality results (or, with `service=`,\n",
+ " the matching locations, activities, projects, or organizations).\n",
+ "- `get_samples_summary` — summarize what data a single site has.\n",
+ "- `get_codes` — list the allowable values for the categorical query arguments.\n",
+ "\n",
+ "We'll cover retrieving data from a known site, using geographic filters, and\n",
+ "discovering what data are available. The interactive web UI is at\n",
+ " and the API docs are at\n",
+ ".\n",
+ "\n",
+ "> Column names: unlike the OGC `get_daily` / `get_monitoring_locations`\n",
+ "> functions, the samples service uses WQX3-style names such as\n",
+ "> `Location_Latitude`, `Activity_StartDateTime`, and `Result_Measure`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "257b6197",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import matplotlib.pyplot as plt\n",
+ "import pandas as pd\n",
+ "\n",
+ "from dataretrieval import waterdata\n",
+ "from dataretrieval.waterdata import PROFILE_LOOKUP\n",
+ "\n",
+ "%matplotlib inline\n",
+ "plt.rcParams[\"figure.figsize\"] = (7, 4)\n",
+ "\n",
+ "\n",
+ "def map_sites(df, title=\"\"):\n",
+ "    \"\"\"Static scatter plot of sample-site locations. Use folium for interactive.\"\"\"\n",
+ "    lon = pd.to_numeric(df[\"Location_Longitude\"], errors=\"coerce\")\n",
+ "    lat = pd.to_numeric(df[\"Location_Latitude\"], errors=\"coerce\")\n",
+ "    fig, ax = plt.subplots(figsize=(7, 5))\n",
+ "    ax.scatter(lon, lat, s=10, color=\"red\", alpha=0.7)\n",
+ "    ax.set_xlabel(\"Longitude\")\n",
+ "    ax.set_ylabel(\"Latitude\")\n",
+ "    ax.set_title(f\"{title} ({len(df)} sites)\")\n",
+ "    plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3c166c18",
+ "metadata": {},
+ "source": [
+ "## Retrieving data from a known site\n",
+ "\n",
+ "Given a USGS site, `get_samples_summary` reports what discrete-sample data are\n",
+ "available there — one row per (characteristic group, characteristic,\n",
+ "user-supplied characteristic) with result and activity counts."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "27e0d33a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "site = \"USGS-04183500\"\n",
+ "data_at_site, _ = waterdata.get_samples_summary(monitoringLocationIdentifier=site)\n",
+ "data_at_site.sort_values(\"resultCount\", ascending=False).head(10)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e388b2d5",
+ "metadata": {},
+ "source": [
+ "Note the `characteristicUserSupplied` column: asking for a bare characteristic\n",
+ "like *Phosphorus* would return both filtered and unfiltered values mixed\n",
+ "together. `characteristicUserSupplied` is a very specific descriptor (similar to\n",
+ "a long-form USGS parameter code) that lets you isolate exactly the constituent\n",
+ "you want. To pull the underlying data, use `get_samples`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "86bfc2b5",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "user_char = \"Phosphorus as phosphorus, water, unfiltered\"\n",
+ "phos_data, _ = waterdata.get_samples(\n",
+ "    monitoringLocationIdentifier=site,\n",
+ "    characteristicUserSupplied=user_char,\n",
+ ")\n",
+ "print(f\"default ('fullphyschem') profile -> {phos_data.shape[1]} columns\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "593529c6",
+ "metadata": {},
+ "source": [
+ "The default profile (`fullphyschem`, the \"Full physical chemical\" profile) is\n",
+ "comprehensive, hence the very wide table. For plotting we usually only need a few\n",
+ "columns, so ask for the `narrow` profile instead:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "682226d1",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "phos_narrow, _ = waterdata.get_samples(\n",
+ "    monitoringLocationIdentifier=site,\n",
+ "    characteristicUserSupplied=user_char,\n",
+ "    profile=\"narrow\",\n",
+ ")\n",
+ "print(f\"'narrow' profile -> {phos_narrow.shape[1]} columns\")\n",
+ "phos_narrow[[\"Activity_StartDateTime\", \"Result_Measure\", \"Result_MeasureUnit\"]].head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "697e0827",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "x = pd.to_datetime(phos_narrow[\"Activity_StartDateTime\"], errors=\"coerce\")\n",
+ "y = pd.to_numeric(phos_narrow[\"Result_Measure\"], errors=\"coerce\")\n",
+ "fig, ax = plt.subplots(figsize=(7, 4))\n",
+ "ax.scatter(x, y, s=10)\n",
+ "ax.set_xlabel(\"Date\")\n",
+ "ax.set_ylabel(user_char, wrap=True)\n",
+ "ax.set_title(phos_narrow[\"Location_Name\"].iloc[0])\n",
+ "fig.autofmt_xdate()\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6573353a",
+ "metadata": {},
+ "source": [
+ "## Return data types\n",
+ "\n",
+ "Two arguments control what comes back: `service` defines the *kind* of data and\n",
+ "`profile` defines which columns of that kind are returned. The valid combinations\n",
+ "are published in `PROFILE_LOOKUP`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "49ceacca",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "PROFILE_LOOKUP"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "380fb6d1",
+ "metadata": {},
+ "source": [
+ "## Geographic filters\n",
+ "\n",
+ "Often you don't know a site number but you do have an area of interest. Below we\n",
+ "keep the queries lightweight by setting `service=\"locations\"` and\n",
+ "`profile=\"site\"` (so we get *where* data exists, not the result values\n",
+ "themselves) and filter on our phosphorus characteristic.\n",
+ "\n",
+ "### Bounding box\n",
+ "\n",
+ "A bounding box is `[west, south, east, north]`, i.e. minimum longitude, minimum\n",
+ "latitude, maximum longitude, maximum latitude:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d2d582ff",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "bbox = [-90.8, 44.2, -89.9, 45.0]\n",
+ "bbox_sites, _ = waterdata.get_samples(\n",
+ "    boundingBox=bbox,\n",
+ "    characteristicUserSupplied=user_char,\n",
+ "    service=\"locations\",\n",
+ "    profile=\"site\",\n",
+ ")\n",
+ "map_sites(bbox_sites, \"Phosphorus sites in bounding box\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c05a5786",
+ "metadata": {},
+ "source": [
+ "### Hydrologic unit codes (HUCs)\n",
+ "\n",
+ "HUCs identify drainage areas; this filter accepts 2-, 4-, 6-, 8-, 10-, or\n",
+ "12-digit codes."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "fbbf7898",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "huc_sites, _ = waterdata.get_samples(\n",
+ "    hydrologicUnit=\"070700\",\n",
+ "    characteristicUserSupplied=user_char,\n",
+ "    service=\"locations\",\n",
+ "    profile=\"site\",\n",
+ ")\n",
+ "map_sites(huc_sites, \"Phosphorus sites in HUC 070700\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "151d88ba",
+ "metadata": {},
+ "source": [
+ "### Distance from a point\n",
+ "\n",
+ "Supply a latitude, longitude, and radius in miles:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9711e26c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "point_sites, _ = waterdata.get_samples(\n",
+ "    pointLocationLatitude=43.074680,\n",
+ "    pointLocationLongitude=-89.428054,\n",
+ "    pointLocationWithinMiles=20,\n",
+ "    characteristicUserSupplied=user_char,\n",
+ "    service=\"locations\",\n",
+ "    profile=\"site\",\n",
+ ")\n",
+ "map_sites(point_sites, \"Phosphorus sites within 20 mi of Madison, WI\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ec22beac",
+ "metadata": {},
+ "source": [
+ "### County FIPS\n",
+ "\n",
+ "County FIPS codes take the form `US:SS:CCC`. Wisconsin's state code is available\n",
+ "from `dataretrieval.codes`, and Dane County's full FIPS is `US:55:025`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f07b210b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from dataretrieval.codes import states\n",
+ "\n",
+ "wi = states.fips_codes[\"Wisconsin\"]  # \"55\"\n",
+ "dane_county = f\"US:{wi}:025\"\n",
+ "county_sites, _ = waterdata.get_samples(\n",
+ "    countyFips=dane_county,\n",
+ "    characteristicUserSupplied=user_char,\n",
+ "    service=\"locations\",\n",
+ "    profile=\"site\",\n",
+ ")\n",
+ "map_sites(county_sites, \"Phosphorus sites in Dane County, WI\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8e43b993",
+ "metadata": {},
+ "source": [
+ "### State FIPS\n",
+ "\n",
+ "State FIPS codes take the form `US:SS`. A whole-state query can return a lot of\n",
+ "sites, so here we also constrain the activity start date to October–November 2024\n",
+ "(see *Additional query parameters* below):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "83519737",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "state_fip = f\"US:{wi}\"  # \"US:55\"\n",
+ "state_sites_recent, _ = waterdata.get_samples(\n",
+ "    stateFips=state_fip,\n",
+ "    characteristicUserSupplied=user_char,\n",
+ "    service=\"locations\",\n",
+ "    activityStartDateLower=\"2024-10-01\",\n",
+ "    activityStartDateUpper=\"2024-11-30\",\n",
+ "    profile=\"site\",\n",
+ ")\n",
+ "map_sites(state_sites_recent, \"WI phosphorus sites, Oct-Nov 2024\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0aab190b",
+ "metadata": {},
+ "source": [
+ "## Additional query parameters\n",
+ "\n",
+ "Several parameters narrow the results further. The allowable values for the\n",
+ "categorical ones come from `get_codes`. Note that `get_codes` returns a plain\n",
+ "`DataFrame` (no metadata tuple).\n",
+ "\n",
+ "### `siteTypeCode` / `siteTypeName`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f21e23e7",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "site_type_info = waterdata.get_codes(code_service=\"sitetype\")\n",
+ "site_type_info[[\"typeCode\", \"typeLongName\"]].head(10)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "fcdf0025",
+ "metadata": {},
+ "source": [
+ "### `activityMediaName`\n",
+ "\n",
+ "The environmental medium that was sampled or analyzed:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "64369260",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "waterdata.get_codes(code_service=\"samplemedia\")[\"activityMedia\"].tolist()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "647de77a",
+ "metadata": {},
+ "source": [
+ "### `characteristicGroup`\n",
+ "\n",
+ "A broad category describing the measurement (generally following the Water\n",
+ "Quality Portal groups):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d1b139a9",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "waterdata.get_codes(code_service=\"characteristicgroup\")[\"characteristicGroup\"].tolist()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cfa4bfbf",
+ "metadata": {},
+ "source": [
+ "### `characteristic` and `usgsPCode`\n",
+ "\n",
+ "The `characteristics` table lists specific constituents along with their USGS\n",
+ "parameter codes:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "72c32873",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "characteristic_info = waterdata.get_codes(code_service=\"characteristics\")\n",
+ "print(\"unique characteristic names:\")\n",
+ "print(characteristic_info[\"characteristicName\"].drop_duplicates().head().tolist())\n",
+ "print(\"\\nexample USGS parameter codes:\")\n",
+ "print(characteristic_info[\"parameterCode\"].dropna().drop_duplicates().head().tolist())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c0872a69",
+ "metadata": {},
+ "source": [
+ "### `characteristicUserSupplied`\n",
+ "\n",
+ "The USGS \"observed property\" — the detailed descriptor that replaces the old\n",
+ "parameter name / pcode for discrete data, and the value we filtered on above:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "236c0f76",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "waterdata.get_codes(code_service=\"observedproperty\")[\"observedProperty\"].head().tolist()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3caa694d",
+ "metadata": {},
+ "source": [
+ "Other filters worth knowing about: `projectIdentifier` (you need to know the\n",
+ "project identifier beforehand), `recordIdentifierUserSupplied` (you need the\n",
+ "supplier's record id), and `activityStartDateLower` / `activityStartDateUpper` for date ranges (used above)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6dfb0d2f",
+ "metadata": {},
+ "source": [
+ "## Data discovery\n",
+ "\n",
+ "Combining a geographic filter with site-type and characteristic filters lets you\n",
+ "zero in on candidate sites. For example, lakes in Dane County, WI that measured\n",
+ "our phosphorus characteristic:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8af3af88",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "county_lake_sites, _ = waterdata.get_samples(\n",
+ "    countyFips=dane_county,\n",
+ "    characteristicUserSupplied=user_char,\n",
+ "    siteTypeName=\"Lake, Reservoir, Impoundment\",\n",
+ "    service=\"locations\",\n",
+ "    profile=\"site\",\n",
+ ")\n",
+ "print(f\"{len(county_lake_sites)} lake sites measuring phosphorus in Dane County, WI\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "87f31bda",
+ "metadata": {},
+ "source": [
+ "`get_samples_summary` accepts one site at a time, so we loop over the candidate\n",
+ "sites to tally how much phosphorus data each has — useful for deciding which\n",
+ "sites to actually pull results from."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "421b6982",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "rows = []\n",
+ "for loc_id in county_lake_sites[\"Location_Identifier\"]:\n",
+ "    avail, _ = waterdata.get_samples_summary(monitoringLocationIdentifier=loc_id)\n",
+ "    rows.append(avail[avail[\"characteristicUserSupplied\"] == user_char])\n",
+ "\n",
+ "all_data = pd.concat(rows, ignore_index=True)\n",
+ "all_data.sort_values(\"resultCount\", ascending=False)[\n",
+ "    [\n",
+ "        \"monitoringLocationIdentifier\",\n",
+ "        \"resultCount\",\n",
+ "        \"activityCount\",\n",
+ "        \"firstActivity\",\n",
+ "        \"mostRecentActivity\",\n",
+ "    ]\n",
+ "]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d2614654",
+ "metadata": {},
+ "source": [
+ "This summary helps narrow down which sites to request data from — whether you\n",
+ "need sites with recent data, lots of data, or just any measurement at all.\n",
+ "\n",
+ "## More help\n",
+ "\n",
+ "- Documentation: \n",
+ "- Samples API docs: \n",
+ "- Equivalent R article: [Introducing read_waterdata_samples](https://doi-usgs.github.io/dataRetrieval/articles/samples_data.html)\n",
+ "- Issues / questions: "
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.10"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/demos/USGS_WaterData_Introduction_Examples.ipynb b/demos/USGS_WaterData_Introduction_Examples.ipynb
new file mode 100644
index 00000000..4c8b9935
--- /dev/null
+++ b/demos/USGS_WaterData_Introduction_Examples.ipynb
@@ -0,0 +1,668 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "a8818c89",
+ "metadata": {},
+ "source": [
+ "# Introduction to the USGS Water Data APIs\n",
+ "\n",
+ "The [USGS Water Data APIs](https://api.waterdata.usgs.gov/ogcapi/v0/) are the\n",
+ "modern, OGC-based replacement for the legacy NWIS web services. In Python they are\n",
+ "exposed through the `dataretrieval.waterdata` module, which will gradually replace\n",
+ "the older `dataretrieval.nwis` functions.\n",
+ "\n",
+ "This notebook tours each new function. The NWIS shut-down timeline is still\n",
+ "uncertain, so we recommend migrating to the `waterdata` functions sooner rather\n",
+ "than later.\n",
+ "\n",
+ "If you are coming from the R `dataRetrieval` package, the functions map across as\n",
+ "follows:\n",
+ "\n",
+ "| R `dataRetrieval` | Python `dataretrieval.waterdata` |\n",
+ "| --- | --- |\n",
+ "| `read_waterdata_monitoring_location` | `get_monitoring_locations` |\n",
+ "| `read_waterdata_ts_meta` / `read_waterdata_combined_meta` | `get_time_series_metadata` / `get_combined_metadata` |\n",
+ "| `read_waterdata_parameter_codes` | `get_reference_table(collection=\"parameter-codes\")` |\n",
+ "| `read_waterdata_daily` | `get_daily` |\n",
+ "| `read_waterdata_continuous` | `get_continuous` |\n",
+ "| `read_waterdata_field_measurements` | `get_field_measurements` |\n",
+ "| `read_waterdata_channel` | `get_channel` |\n",
+ "| `read_waterdata_latest_continuous` / `read_waterdata_latest_daily` | `get_latest_continuous` / `get_latest_daily` |\n",
+ "| `read_waterdata` (CQL) | the `filter` / `filter_lang` arguments on any function |\n",
+ "| `read_waterdata_metadata` | `get_reference_table` |\n",
+ "| `read_waterdata_samples` | `get_samples` |\n",
+ "| `read_waterdata_stats_por` / `read_waterdata_stats_daterange` | `get_stats_por` / `get_stats_date_range` |"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "03b51493",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import matplotlib.pyplot as plt\n",
+ "import pandas as pd\n",
+ "\n",
+ "from dataretrieval import waterdata\n",
+ "\n",
+ "%matplotlib inline\n",
+ "plt.rcParams[\"figure.figsize\"] = (7, 4)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "27cea444",
+ "metadata": {},
+ "source": [
+ "> **Return values.** Nearly every `dataretrieval.waterdata` function returns a\n",
+ "> `(data, metadata)` tuple; `get_codes`, which returns a plain `DataFrame`, is an\n",
+ "> exception. `data` is a `pandas.DataFrame` (or a `geopandas.GeoDataFrame` when the\n",
+ "> service returns a geometry column) and `metadata` is a small object describing the\n",
+ "> request. Throughout this notebook we unpack the tuple as `df, md = waterdata.get_...(...)`."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1e38880f",
+ "metadata": {},
+ "source": [
+ "## New features\n",
+ "\n",
+ "The new API endpoints each deliver a different type of USGS water data, and they\n",
+ "all share features the legacy services lacked.\n",
+ "\n",
+ "### Flexible queries\n",
+ "\n",
+ "The new functions expose **all** of the query parameters the API supports, each\n",
+ "defaulting to `None`. You do **not** need to (and usually should not) specify\n",
+ "them all. Filters are combined with a Boolean *AND*: passing both a list of\n",
+ "monitoring locations and a list of parameter codes returns only records that\n",
+ "match one of those locations *and* one of those parameter codes. Because every\n",
+ "argument is named, your IDE can autocomplete the options.\n",
+ "\n",
+ "### Flexible columns returned\n",
+ "\n",
+ "Use the `properties` argument to choose which columns come back. The full set of\n",
+ "available properties for a collection is published in that collection's schema,\n",
+ "e.g. ."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d59a461b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Ask for just a few columns instead of the full ~40-column record.\n",
+ "sites_info, _ = waterdata.get_monitoring_locations(\n",
+ "    monitoring_location_id=\"USGS-01491000\",\n",
+ "    properties=[\n",
+ "        \"monitoring_location_id\",\n",
+ "        \"site_type\",\n",
+ "        \"drainage_area\",\n",
+ "        \"monitoring_location_name\",\n",
+ "    ],\n",
+ ")\n",
+ "sites_info.drop(columns=\"geometry\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "54ace335",
+ "metadata": {},
+ "source": [
+ "### API tokens\n",
+ "\n",
+ "USGS now rate-limits requests per IP address per hour. If you hit the limit you\n",
+ "can request a free API token at . Keep it\n",
+ "out of shared scripts and version control. (At the time of writing the Python\n",
+ "`dataretrieval` package does not yet wire a token into these calls; the rate\n",
+ "limits are generous for the queries below.)\n",
+ "\n",
+ "### Common Query Language (CQL2)\n",
+ "\n",
+ "The APIs accept CQL2 (the OGC Common Query Language) expressions for\n",
+ "complex queries through the `filter` / `filter_lang` arguments. See the\n",
+ "[General retrieval and CQL2](#general-retrieval-and-cql2) section below.\n",
+ "\n",
+ "### Simple features\n",
+ "\n",
+ "Spatial collections return a `geometry` column, so `get_*` calls give you a\n",
+ "`geopandas.GeoDataFrame` that drops straight into geospatial workflows. Pass `skip_geometry=True` to get a plain `DataFrame`."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "20c5dd03",
+ "metadata": {},
+ "source": [
+ "## Lessons learned\n",
+ "\n",
+ "### Request many sites in one call\n",
+ "\n",
+ "`dataretrieval` automatically splits a large request — many monitoring\n",
+ "locations, several parameter codes, or a complex filter — into URL-sized\n",
+ "sub-requests and recombines the results, and it can resume a long pull that hits\n",
+ "a rate limit or transient server error without refetching completed work. So\n",
+ "pass all your sites in one call rather than looping over them.\n",
+ "\n",
+ "The main exception is **continuous** data, which is capped at 3 years per\n",
+ "request. See the *Continuous Data* notebook for large continuous pulls."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e49f3ad0",
+ "metadata": {},
+ "source": [
+ "## New functions\n",
+ "\n",
+ "### Monitoring location\n",
+ "\n",
+ "`get_monitoring_locations` returns site metadata. To browse the service in a\n",
+ "web browser, visit\n",
+ ".\n",
+ "\n",
+ "A simple request for one known USGS site:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d42fc61a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "sites_info, _ = waterdata.get_monitoring_locations(\n",
+ "    monitoring_location_id=\"USGS-01491000\"\n",
+ ")\n",
+ "print(f\"{sites_info.shape[1]} columns returned\")\n",
+ "sites_info.drop(columns=\"geometry\").T"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "29d1be4d",
+ "metadata": {},
+ "source": [
+ "Any returned column can also be used as an input filter. For example, to find\n",
+ "every stream site in Wisconsin:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cf090884",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "sites_wi, _ = waterdata.get_monitoring_locations(\n",
+ "    state_name=\"Wisconsin\",\n",
+ "    site_type=\"Stream\",\n",
+ ")\n",
+ "print(f\"{len(sites_wi)} Wisconsin stream sites\")\n",
+ "sites_wi[[\"monitoring_location_id\", \"monitoring_location_name\", \"geometry\"]].head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c4096bf3",
+ "metadata": {},
+ "source": [
+ "Because the result is a `GeoDataFrame`, a single `.plot()` call draws the\n",
+ "locations. For an interactive map, `folium` works well with the same data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ce4c88a7",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "ax = sites_wi.plot(markersize=4, figsize=(7, 5))\n",
+ "ax.set_title(\"USGS stream monitoring locations in Wisconsin\")\n",
+ "ax.set_xlabel(\"Longitude\")\n",
+ "ax.set_ylabel(\"Latitude\")\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1e162fcf",
+ "metadata": {},
+ "source": [
+ "### Time series & combined metadata\n",
+ "\n",
+ "`get_combined_metadata` merges time-series metadata\n",
+ "(`get_time_series_metadata`) and field-measurement metadata by site, telling you\n",
+ "which time series a site offers and the span of each."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a593b5e8",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "ts_available, _ = waterdata.get_combined_metadata(\n",
+ "    monitoring_location_id=\"USGS-01491000\",\n",
+ "    parameter_code=[\"00060\", \"00010\"],\n",
+ ")\n",
+ "cols = [\"parameter_name\", \"statistic_id\", \"begin\", \"end\", \"last_modified\"]\n",
+ "ts_available[cols]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c294fee3",
+ "metadata": {},
+ "source": [
+ "### Parameter codes\n",
+ "\n",
+ "Parameter-code descriptions come from the `parameter-codes` reference table:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cc1601c0",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "pcode_info, _ = waterdata.get_reference_table(\n",
+ "    collection=\"parameter-codes\",\n",
+ "    query={\"id\": \"00660\"},\n",
+ ")\n",
+ "pcode_info.T"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "330064f2",
+ "metadata": {},
+ "source": [
+ "### Daily values\n",
+ "\n",
+ "`get_daily` returns daily values. Browse it at\n",
+ "."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d1fef3df",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "daily_data, _ = waterdata.get_daily(\n",
+ "    monitoring_location_id=\"USGS-01491000\",\n",
+ "    parameter_code=[\"00060\", \"00010\"],\n",
+ "    statistic_id=\"00003\",\n",
+ "    time=[\"2023-10-01\", \"2024-09-30\"],\n",
+ ")\n",
+ "daily_data[[\"time\", \"parameter_code\", \"value\", \"approval_status\"]].head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4817c8c9",
+ "metadata": {},
+ "source": [
+ "Notice the data come back in **long** format — one observation per row. Long\n",
+ "data are usually easier to work with; here we facet by `parameter_code`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f0578529",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "params = sorted(daily_data[\"parameter_code\"].unique())\n",
+ "fig, axes = plt.subplots(len(params), 1, figsize=(7, 5), sharex=True, squeeze=False)\n",
+ "for ax, pcode in zip(axes[:, 0], params):\n",
+ "    sub = daily_data[daily_data[\"parameter_code\"] == pcode]\n",
+ "    ax.scatter(sub[\"time\"], sub[\"value\"], s=4)\n",
+ "    ax.set_ylabel(pcode)\n",
+ "axes[0, 0].set_title(\"Daily values at USGS-01491000 (water year 2024)\")\n",
+ "axes[-1, 0].set_xlabel(\"time\")\n",
+ "fig.autofmt_xdate()\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8af565c7",
+ "metadata": {},
+ "source": [
+ "### Continuous\n",
+ "\n",
+ "`get_continuous` returns instantaneous (sensor) values. Browse it at\n",
+ ".\n",
+ "\n",
+ "This service currently allows at most **3 years** of data per request; with no\n",
+ "`time` argument it returns the latest year. Continuous data have no geometry\n",
+ "column and do not support bounding-box queries. For large pulls, see the\n",
+ "*Continuous Data* notebook."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2dbcdd47",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "sensor_data, _ = waterdata.get_continuous(\n",
+ "    monitoring_location_id=\"USGS-01491000\",\n",
+ "    parameter_code=\"00060\",\n",
+ "    time=\"2024-09-01/2024-09-03\",\n",
+ ")\n",
+ "sensor_data[[\"time\", \"parameter_code\", \"value\", \"approval_status\"]].head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6b4fa772",
+ "metadata": {},
+ "source": [
+ "### Field measurements\n",
+ "\n",
+ "`get_field_measurements` returns discrete field measurements, including\n",
+ "groundwater levels."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "12d4649a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "field_data, _ = waterdata.get_field_measurements(\n",
+ "    monitoring_location_id=[\n",
+ "        \"USGS-451605097071701\",\n",
+ "        \"USGS-263819081585801\",\n",
+ "    ],\n",
+ "    time=[\"2023-10-01\", \"2024-09-30\"],\n",
+ ")\n",
+ "field_data[[\"time\", \"monitoring_location_id\", \"parameter_code\", \"value\"]].head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6a6f9ba4",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "fig, ax = plt.subplots(figsize=(7, 4))\n",
+ "for site, sub in field_data.groupby(\"monitoring_location_id\"):\n",
+ "    ax.scatter(sub[\"time\"], sub[\"value\"], s=12, label=site)\n",
+ "ax.set_ylabel(\"value\")\n",
+ "ax.set_title(\"Field measurements\")\n",
+ "ax.legend(fontsize=7)\n",
+ "fig.autofmt_xdate()\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5a709774",
+ "metadata": {},
+ "source": [
+ "### Channel measurements\n",
+ "\n",
+ "`get_channel` returns the channel-geometry measurements (width, area, velocity)\n",
+ "that accompany the field measurements from `get_field_measurements`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "fb3105ff",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "channel, _ = waterdata.get_channel(monitoring_location_id=\"USGS-02238500\")\n",
+ "channel[[\"time\", \"channel_width\", \"channel_area\", \"channel_velocity\"]].head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0495e7ac",
+ "metadata": {},
+ "source": [
+ "### Latest continuous & latest daily\n",
+ "\n",
+ "`get_latest_continuous` and `get_latest_daily` have no NWIS equivalent — they\n",
+ "return the single most recent observation for each time series."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d82d74ba",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "latest_uv, _ = waterdata.get_latest_continuous(\n",
+ "    monitoring_location_id=\"USGS-01491000\",\n",
+ "    parameter_code=\"00060\",\n",
+ ")\n",
+ "cols = [\"time\", \"value\", \"approval_status\", \"parameter_code\", \"unit_of_measure\"]\n",
+ "latest_uv[cols].T"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a624271d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "latest_dv, _ = waterdata.get_latest_daily(\n",
+ "    monitoring_location_id=\"USGS-01491000\",\n",
+ "    parameter_code=\"00060\",\n",
+ ")\n",
+ "latest_dv[cols].T"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "65390398",
+ "metadata": {},
+ "source": [
+ "### General retrieval and CQL2\n",
+ "\n",
+ "The OGC `get_*` functions accept a CQL2 expression through the `filter` /\n",
+ "`filter_lang` arguments, so even complex queries run against these same\n",
+ "functions — there is no separate \"general retrieval\" call.\n",
+ "\n",
+ "CQL2 supports wildcard matching via `LIKE`, where `%` matches any run of zero or\n",
+ "more characters. This is handy for hydrologic unit codes, which may be stored as\n",
+ "`02070010` or as a longer code beginning with those digits. To get every site\n",
+ "whose HUC starts with `02070010`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "455de9d3",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "huc_sites, _ = waterdata.get_monitoring_locations(\n",
+ "    filter=\"hydrologic_unit_code LIKE '02070010%'\",\n",
+ "    filter_lang=\"cql-text\",\n",
+ ")\n",
+ "print(f\"{len(huc_sites)} sites in HUC 02070010\")\n",
+ "ax = huc_sites.plot(markersize=2, figsize=(7, 5))\n",
+ "ax.set_title(\"Sites within HUC 02070010\")\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3fb5d920",
+ "metadata": {},
+ "source": [
+ "> **Numeric filters.** Every queryable on the Water Data API is typed as a\n",
+ "> *string*, so an unquoted numeric comparison like `drainage_area > 1000` is\n",
+ "> rejected by the server (and quoting it gives a misleading lexicographic\n",
+ "> comparison). `dataretrieval` catches this and raises a `ValueError`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "82f8f1b5",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "try:\n",
+ "    waterdata.get_monitoring_locations(\n",
+ "        filter=\"drainage_area > 1000\",\n",
+ "        filter_lang=\"cql-text\",\n",
+ "    )\n",
+ "except ValueError as e:\n",
+ "    print(type(e).__name__, \"->\", str(e)[:120], \"...\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "dd8ae008",
+ "metadata": {},
+ "source": [
+ "The recommended pattern is to filter on the string-valued attributes the server\n",
+ "understands (state, site type, HUC, …) and then do the **numeric** reduction in\n",
+ "pandas. For example, large-drainage stream sites in Wisconsin and Minnesota:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a13e984e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "sites, _ = waterdata.get_monitoring_locations(\n",
+ "    state_name=[\"Wisconsin\", \"Minnesota\"],\n",
+ "    site_type=\"Stream\",\n",
+ "    properties=[\n",
+ "        \"monitoring_location_id\",\n",
+ "        \"monitoring_location_name\",\n",
+ "        \"state_name\",\n",
+ "        \"drainage_area\",\n",
+ "    ],\n",
+ ")\n",
+ "big = sites[pd.to_numeric(sites[\"drainage_area\"], errors=\"coerce\") > 1000]\n",
+ "print(f\"{len(big)} of {len(sites)} WI/MN stream sites drain > 1000 sq mi\")\n",
+ "big.drop(columns=\"geometry\").head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "10f0a74d",
+ "metadata": {},
+ "source": [
+ "### Reference tables\n",
+ "\n",
+ "`get_reference_table` exposes a variety of metadata tables. Any returned column\n",
+ "can be filtered on. See the\n",
+ "*USGS Reference Lists* notebook for the full list of collections.\n",
+ "\n",
+ "### Discrete samples\n",
+ "\n",
+ "Discrete USGS water-quality data are served from a separate (non-OGC) endpoint\n",
+ "via `get_samples`. See the *Discrete water-quality samples* notebook.\n",
+ "\n",
+ "### Daily data statistics\n",
+ "\n",
+ "Pre-computed temporal summary statistics are available through `get_stats_por`\n",
+ "(day-of-year / month-of-year) and `get_stats_date_range` (calendar month, calendar\n",
+ "year, water year). See the *Daily statistics* notebook."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "21d144d3",
+ "metadata": {},
+ "source": [
+ "## More notes\n",
+ "\n",
+ "### `limit` and paging\n",
+ "\n",
+ "The `limit` argument sets how many rows come back **per page**, not the overall\n",
+ "total — by default `dataretrieval` pages through everything. You rarely need to\n",
+ "touch it; lowering it can help on a spotty connection.\n",
+ "\n",
+ "### The `id` column\n",
+ "\n",
+ "Each endpoint natively returns an `id` column, and that value is used as an input\n",
+ "to *other* endpoints under a different name (the monitoring-locations `id` is the\n",
+ "`monitoring_location_id` everywhere else). `dataretrieval` renames `id`\n",
+ "accordingly, but you can request the raw `id` column via `properties`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "fa2f8528",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "site = \"USGS-02238500\"\n",
+ "renamed, _ = waterdata.get_monitoring_locations(\n",
+ "    monitoring_location_id=site,\n",
+ "    properties=[\"monitoring_location_id\", \"state_name\", \"country_name\"],\n",
+ ")\n",
+ "raw_id, _ = waterdata.get_monitoring_locations(\n",
+ "    monitoring_location_id=site,\n",
+ "    properties=[\"id\", \"state_name\", \"country_name\"],\n",
+ ")\n",
+ "print(\"renamed:\", [c for c in renamed.columns if c != \"geometry\"])\n",
+ "print(\"raw id :\", [c for c in raw_id.columns if c != \"geometry\"])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7dcc03a9",
+ "metadata": {},
+ "source": [
+ "## More help\n",
+ "\n",
+ "- Documentation: \n",
+ "- R package docs (source of these examples): \n",
+ "- Issues / questions: "
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.10"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/demos/USGS_WaterData_ReferenceLists_Examples.ipynb b/demos/USGS_WaterData_ReferenceLists_Examples.ipynb
new file mode 100644
index 00000000..9799ba16
--- /dev/null
+++ b/demos/USGS_WaterData_ReferenceLists_Examples.ipynb
@@ -0,0 +1,138 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "aef324b2",
+ "metadata": {},
+ "source": [
+ "# USGS Reference Lists\n",
+ "\n",
+ "`get_reference_table` returns the metadata \"reference\" tables for the USGS Water\n",
+ "Data API. These tables enumerate the allowable values for the filter arguments\n",
+ "used elsewhere in the `waterdata` module — for example, the `site-types` table\n",
+ "lists every valid `site_type_code`, and `parameter-codes` lists every valid\n",
+ "`parameter_code`.\n",
+ "\n",
+ "`get_reference_table` returns a `(data, metadata)` tuple."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6f365047",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from typing import get_args\n",
+ "\n",
+ "from IPython.display import Markdown, display\n",
+ "\n",
+ "from dataretrieval import waterdata\n",
+ "from dataretrieval.waterdata.types import METADATA_COLLECTIONS\n",
+ "\n",
+ "collections = list(get_args(METADATA_COLLECTIONS))\n",
+ "collections"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "af9731ff",
+ "metadata": {},
+ "source": [
+ "## A single reference table\n",
+ "\n",
+ "Fetch one table by name. The first column is the table's primary code (the\n",
+ "collection name, singularized, with hyphens turned into underscores — e.g.\n",
+ "`site-types` -> `site_type`):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9840b289",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "site_types, _ = waterdata.get_reference_table(collection=\"site-types\")\n",
+ "print(f\"{len(site_types)} rows\")\n",
+ "site_types.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ec3a00c8",
+ "metadata": {},
+ "source": [
+ "You can also pass a `query` to retrieve a subset — for instance specific\n",
+ "parameter codes by `id`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "09b5de2d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "some_pcodes, _ = waterdata.get_reference_table(\n",
+ "    collection=\"parameter-codes\",\n",
+ "    query={\"id\": \"00060,00065,00010\"},\n",
+ ")\n",
+ "some_pcodes[[\"parameter_code\", \"parameter_name\", \"unit_of_measure\"]]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b02202cf",
+ "metadata": {},
+ "source": [
+ "## All reference tables\n",
+ "\n",
+ "The full set of collections is enumerated in `METADATA_COLLECTIONS`. Below we\n",
+ "preview the first few rows of each. (Most are small lookup tables; a couple —\n",
+ "notably `parameter-codes` and `hydrologic-unit-codes` — are large, so we only\n",
+ "display the head.)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "514392c0",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "for collection in collections:\n",
+ "    df, _ = waterdata.get_reference_table(collection=collection)\n",
+ "    preview = df.drop(columns=\"geometry\") if \"geometry\" in df.columns else df\n",
+ "    display(Markdown(f\"### `{collection}` \\n{len(df):,} rows\"))\n",
+ "    display(preview.head(3))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9d820806",
+ "metadata": {},
+ "source": [
+ "## More help\n",
+ "\n",
+ "- Documentation: \n",
+ "- See the *Introduction to the USGS Water Data APIs* notebook for how these reference\n",
+ " values feed the `get_*` filter arguments.\n",
+ "- Equivalent R article: [USGS Reference Lists](https://doi-usgs.github.io/dataRetrieval/articles/Reference_Lists.html)\n",
+ "- Issues / questions: "
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/docs/source/examples/USGS_WaterData_ContinuousData_Examples.nblink b/docs/source/examples/USGS_WaterData_ContinuousData_Examples.nblink
new file mode 100644
index 00000000..b169abdf
--- /dev/null
+++ b/docs/source/examples/USGS_WaterData_ContinuousData_Examples.nblink
@@ -0,0 +1,3 @@
+{
+ "path": "../../../demos/USGS_WaterData_ContinuousData_Examples.ipynb"
+}
diff --git a/docs/source/examples/USGS_WaterData_DailyStatistics_Examples.nblink b/docs/source/examples/USGS_WaterData_DailyStatistics_Examples.nblink
new file mode 100644
index 00000000..c12f7840
--- /dev/null
+++ b/docs/source/examples/USGS_WaterData_DailyStatistics_Examples.nblink
@@ -0,0 +1,3 @@
+{
+ "path": "../../../demos/USGS_WaterData_DailyStatistics_Examples.ipynb"
+}
diff --git a/docs/source/examples/USGS_WaterData_DiscreteSamples_Examples.nblink b/docs/source/examples/USGS_WaterData_DiscreteSamples_Examples.nblink
new file mode 100644
index 00000000..4729fe36
--- /dev/null
+++ b/docs/source/examples/USGS_WaterData_DiscreteSamples_Examples.nblink
@@ -0,0 +1,3 @@
+{
+ "path": "../../../demos/USGS_WaterData_DiscreteSamples_Examples.ipynb"
+}
diff --git a/docs/source/examples/USGS_WaterData_Introduction_Examples.nblink b/docs/source/examples/USGS_WaterData_Introduction_Examples.nblink
new file mode 100644
index 00000000..9a442fe4
--- /dev/null
+++ b/docs/source/examples/USGS_WaterData_Introduction_Examples.nblink
@@ -0,0 +1,3 @@
+{
+ "path": "../../../demos/USGS_WaterData_Introduction_Examples.ipynb"
+}
diff --git a/docs/source/examples/USGS_WaterData_ReferenceLists_Examples.nblink b/docs/source/examples/USGS_WaterData_ReferenceLists_Examples.nblink
new file mode 100644
index 00000000..0600ecac
--- /dev/null
+++ b/docs/source/examples/USGS_WaterData_ReferenceLists_Examples.nblink
@@ -0,0 +1,3 @@
+{
+ "path": "../../../demos/USGS_WaterData_ReferenceLists_Examples.ipynb"
+}
diff --git a/docs/source/examples/index.rst b/docs/source/examples/index.rst
index 6011fc4b..de6f1b25 100644
--- a/docs/source/examples/index.rst
+++ b/docs/source/examples/index.rst
@@ -15,6 +15,23 @@ covers a basic introduction to module functions and usage.
WaterData_demo
+USGS Water Data API vignettes
+-----------------------------
+These notebooks are Python ports of the new USGS Water Data API vignettes from
+the R `dataRetrieval`_ package. Each introduces a family of ``waterdata``
+functions and is executed against the live USGS Water Data API.
+
+.. _dataRetrieval: https://doi-usgs.github.io/dataRetrieval/
+
+.. toctree::
+ :maxdepth: 1
+
+ USGS_WaterData_Introduction_Examples
+ USGS_WaterData_DiscreteSamples_Examples
+ USGS_WaterData_DailyStatistics_Examples
+ USGS_WaterData_ContinuousData_Examples
+ USGS_WaterData_ReferenceLists_Examples
+
Simple uses of the ``dataretrieval`` package
--------------------------------------------