cert-checker: fix logging & push metrics#8763
Conversation
|
@lenaunderwood22, this PR appears to contain configuration and/or SQL schema changes. Please ensure that a corresponding deployment ticket has been filed with the new values. |
Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
There was a problem hiding this comment.
Pull request overview
This PR updates cert-checker to emit structured audit logs (instead of printing a large multi-line JSON blob to stdout) and to publish run metrics (good/bad cert counts, latency, last-run timestamp) to a Prometheus Pushgateway discovered via DNS SRV (intended to replace an external wrapper script per issue #8753).
Changes:
- Replace stdout JSON report output with per-certificate audit error logs plus a final audit summary.
- Add Prometheus metrics for cert-checker results and push them to a Pushgateway (incl. SRV discovery + shared
cmd.PushMetricshelper). - Update tests/config fixtures and vendor the Prometheus
pushsubpackage.
Reviewed changes
Copilot reviewed 5 out of 7 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
cmd/cert-checker/main.go |
Switch to audit logging, add metrics, add SRV-based Pushgateway discovery and push behavior. |
cmd/cert-checker/main_test.go |
Update tests to assert audit log emission rather than JSON report contents. |
cmd/shell.go |
Add PushMetrics helper using Prometheus push client. |
test/config/cert-checker.json |
Remove unexpiredOnly field from fixture config. |
test/config-next/cert-checker.json |
Remove unexpiredOnly field from fixture config. |
vendor/modules.txt |
Add vendored package path for prometheus/push. |
vendor/github.com/prometheus/client_golang/prometheus/push/push.go |
Vendor Prometheus Pushgateway client implementation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
aarongable
left a comment
There was a problem hiding this comment.
LGTM with some nits / test improvements.
Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
|
|
||
| func PushMetrics(jobname, pushgatewayURL string, gatherer prometheus.Gatherer, logger blog.Logger) error { |
There was a problem hiding this comment.
This exported function is missing a doc comment.
| func PushMetrics(jobname, pushgatewayURL string, gatherer prometheus.Gatherer, logger blog.Logger) error { | |
| // PushMetrics pushes the provided Prometheus metrics to the provided | |
| // Pushgateway URL with the provided job name. | |
| func PushMetrics(jobname, pushgatewayURL string, gatherer prometheus.Gatherer, logger blog.Logger) error { |
| hostname = "unknown" | ||
| } | ||
| return push.New(pushgatewayURL, jobname). | ||
| Gatherer(gatherer). |
There was a problem hiding this comment.
push.New() is initialized with a default http.Client with carries no timeout. We should call Client and pass in our own with a sensible timeout so that this call never hangs indefinitely.
| Gatherer(gatherer). | |
| Client(&http.Client{Timeout: 10 * time.Second}). | |
| Gatherer(gatherer). |
| Name: "cert_checker_bad_count", | ||
| Help: "Cert-checker count of bad certificates", | ||
| }) | ||
| return &certCheckerMetrics{checkerLatency, checkerTimestamp, checkerGoodCount, checkerBadCount} |
There was a problem hiding this comment.
The last three of these positional arguments in the struct literal are of the same type, it would be very easy to accidentally swap them without noticing. Instead use the field names as well in the composition of the struct literal:
| return &certCheckerMetrics{checkerLatency, checkerTimestamp, checkerGoodCount, checkerBadCount} | |
| return &certCheckerMetrics{ | |
| checkerLatency: checkerLatency, | |
| checkerTimestamp: checkerTimestamp, | |
| checkerGoodCount: checkerGoodCount, | |
| checkerBadCount: checkerBadCount, | |
| } |
Fixes #8753