Overview
When the OpenTelemetry collector is enabled (spec.instrumentation, OpenTelemetryMetrics feature gate on) against a PostgreSQL 17 cluster, the sqlquery metrics receiver fails to emit ccp_stat_checkpointer_sync_time and logs an "Error scraping metrics" error on every collection interval (every 5s by default).
Deeper analysis with Claude revealed that, in the bundled checkpointer query definitions, the two double precision time columns are tagged inconsistently: write_time is correctly declared value_type: double, but sync_time has no value_type. The sqlquery receiver's value_type defaults to int (receiver README), so sync_time is parsed with strconv.Atoi. Once cumulative sync_time (milliseconds) grows large enough that the driver renders it in scientific notation (e.g. 2.774625e+06), Atoi fails.
Both pg_stat_checkpointer.write_time and pg_stat_checkpointer.sync_time are double precision (milliseconds) per the PostgreSQL 17 docs, so both should be value_type: double.
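The failure mode can be reproduced outside the collector. The following is a minimal sketch, not the receiver's actual code: parseAsInt and parseAsDouble are hypothetical helpers that stand in for what an int-typed and a double-typed value column would do with the string from the log.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseAsInt is a hypothetical stand-in for a value column whose
// value_type is int: the raw string goes through strconv.Atoi.
func parseAsInt(raw string) (int, error) {
	return strconv.Atoi(raw)
}

// parseAsDouble is a hypothetical stand-in for a value column whose
// value_type is double: the raw string goes through strconv.ParseFloat.
func parseAsDouble(raw string) (float64, error) {
	return strconv.ParseFloat(raw, 64)
}

func main() {
	// Value copied from the collector log in this report.
	raw := "2.774625e+06"

	// Atoi rejects scientific notation, which matches the logged error.
	if _, err := parseAsInt(raw); err != nil {
		fmt.Println("int parse failed:", err)
	}

	// ParseFloat accepts the same string without error.
	if v, err := parseAsDouble(raw); err == nil {
		fmt.Printf("double parse ok: %.0f ms\n", v)
	}
}
```

The same input that breaks the int path parses cleanly as a float, which is consistent with write_time (tagged double) scraping fine while sync_time fails.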
Source (both PG-version variants affected):
internal/collector/gte_pg17_fast_metrics.yaml#L52-L53:
    - metric_name: ccp_stat_checkpointer_write_time
      value_column: write_time
      value_type: double   # tagged correctly
    ...
    - metric_name: ccp_stat_checkpointer_sync_time
      value_column: sync_time
      # value_type: double missing
internal/collector/lt_pg17_fast_metrics.yaml#L51-L52 has the same omission. The generated artifact reflects the defect as well (value_type present on write_time, absent on sync_time): internal/collector/generated/gte_pg17_fast_metrics.json.
Environment
Please provide the following details:
- Platform: Azure Kubernetes Service
- Platform Version: 1.32.10
- PGO Image Tag: ubi9-5.8.5-0 (defect confirmed present in source on v5.8.5, v5.8.8, v6.0.2, and main as of 2026-06-09; likely all 5.8.x / 6.0.x)
- Postgres Version: 17
- Storage: disk.csi.azure.com, StandardSSD_LRS SKU
- Collector: otelcol-contrib 0.139.0 (bundled in crunchydata/postgres-operator:ubi9-5.8.5-0)
- Feature gate: OpenTelemetryMetrics enabled
Steps to Reproduce
REPRO
- Enable the OpenTelemetryMetrics feature gate on the operator.
- Create a PostgreSQL 17 PostgresCluster with spec.instrumentation set so the collector sidecar scrapes the bundled metrics queries.
- Run the cluster long enough (or generate enough checkpoint activity) for the cumulative pg_stat_checkpointer.sync_time value to grow large enough that the database driver renders it in scientific notation (e.g. 2.774625e+06).
- Check the collector container logs.
EXPECTED
ccp_stat_checkpointer_sync_time is emitted as a double, like its sibling ccp_stat_checkpointer_write_time, with no scrape error.
ACTUAL
The sync_time metric fails to emit, and the collector logs an "Error scraping metrics" error at every collection interval (every 5s by default).
Logs
The following error repeats at every collection interval:
error scraperhelper@v0.139.0/obs_metrics.go:61 Error scraping metrics
{"otelcol.component.id": "sqlquery/5s", "otelcol.component.kind": "receiver", "otelcol.signal": "metrics",
"error": "row 0: rowToMetric: setDataPointValue: col \"sync_time\": error converting to integer:
strconv.Atoi: parsing \"2.774625e+06\": invalid syntax"}
Additional Information
As mentioned in the Overview, part of this analysis was done with Claude. As a workaround, I am currently removing the query with the following configuration:
    spec:
      instrumentation:
        metrics:
          customQueries:
            remove:
            - ccp_stat_checkpointer_sync_time