Skip to content

cert-checker: fix logging & push metrics#8763

Open
lenaunderwood22 wants to merge 32 commits into
mainfrom
cert-checker-logs
Open

cert-checker: fix logging & push metrics#8763
lenaunderwood22 wants to merge 32 commits into
mainfrom
cert-checker-logs

Conversation

@lenaunderwood22

Copy link
Copy Markdown
Contributor

Fixes #8753

@lenaunderwood22 lenaunderwood22 requested a review from a team as a code owner May 26, 2026 21:25
@lenaunderwood22 lenaunderwood22 requested a review from ezekiel May 26, 2026 21:25
ezekiel
ezekiel previously approved these changes May 26, 2026
@ezekiel ezekiel requested review from a team and aarongable and removed request for a team May 26, 2026 21:56
@github-actions

Copy link
Copy Markdown
Contributor

@lenaunderwood22, this PR appears to contain configuration and/or SQL schema changes. Please ensure that a corresponding deployment ticket has been filed with the new values.

@ezekiel ezekiel requested review from a team and removed request for aarongable May 26, 2026 21:56
Comment thread cmd/cert-checker/main.go Outdated
Comment thread cmd/cert-checker/main.go Outdated
Comment thread cmd/cert-checker/main.go Outdated
Comment thread cmd/cert-checker/main.go Outdated
Comment thread cmd/cert-checker/main.go
Comment thread cmd/cert-checker/main.go
Comment thread cmd/cert-checker/main.go Outdated
Comment thread cmd/cert-checker/main.go Outdated
Comment thread cmd/cert-checker/main.go Outdated
Comment thread cmd/shell.go
Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
aarongable
aarongable previously approved these changes May 28, 2026
@aarongable aarongable requested a review from ezekiel May 28, 2026 23:33
ezekiel
ezekiel previously approved these changes May 28, 2026
@lenaunderwood22 lenaunderwood22 dismissed stale reviews from ezekiel and aarongable via 15ce586 May 29, 2026 14:50

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates cert-checker to emit structured audit logs (instead of printing a large multi-line JSON blob to stdout) and to publish run metrics (good/bad cert counts, latency, last-run timestamp) to a Prometheus Pushgateway discovered via DNS SRV (intended to replace an external wrapper script per issue #8753).

Changes:

  • Replace stdout JSON report output with per-certificate audit error logs plus a final audit summary.
  • Add Prometheus metrics for cert-checker results and push them to a Pushgateway (incl. SRV discovery + shared cmd.PushMetrics helper).
  • Update tests/config fixtures and vendor the Prometheus push subpackage.

Reviewed changes

Copilot reviewed 5 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
cmd/cert-checker/main.go Switch to audit logging, add metrics, add SRV-based Pushgateway discovery and push behavior.
cmd/cert-checker/main_test.go Update tests to assert audit log emission rather than JSON report contents.
cmd/shell.go Add PushMetrics helper using Prometheus push client.
test/config/cert-checker.json Remove unexpiredOnly field from fixture config.
test/config-next/cert-checker.json Remove unexpiredOnly field from fixture config.
vendor/modules.txt Add vendored package path for prometheus/push.
vendor/github.com/prometheus/client_golang/prometheus/push/push.go Vendor Prometheus Pushgateway client implementation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread cmd/cert-checker/main.go Outdated
Comment thread cmd/cert-checker/main.go Outdated
Comment thread cmd/cert-checker/main.go Outdated
Comment thread cmd/shell.go
Comment thread cmd/cert-checker/main.go Outdated
Comment thread cmd/cert-checker/main.go Outdated
Comment thread cmd/cert-checker/main.go Outdated
aarongable
aarongable previously approved these changes Jun 5, 2026

@aarongable aarongable left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some nits / test improvements.

Comment thread cmd/cert-checker/main.go Outdated
Comment thread cmd/cert-checker/main.go Outdated
Comment thread cmd/cert-checker/main_test.go Outdated
Comment thread cmd/cert-checker/main_test.go Outdated
Comment thread cmd/cert-checker/main_test.go Outdated
Comment thread cmd/cert-checker/main_test.go Outdated
Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
lenaunderwood22 and others added 9 commits June 9, 2026 08:46
Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
Co-authored-by: Aaron Gable <aaron@letsencrypt.org>

@beautifulentropy beautifulentropy left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all of the hard work on this, I have just a couple more comments and then I'm good to approve!

Comment thread cmd/shell.go
Comment on lines +579 to +580

func PushMetrics(jobname, pushgatewayURL string, gatherer prometheus.Gatherer, logger blog.Logger) error {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exported function is missing a doc comment.

Suggested change
func PushMetrics(jobname, pushgatewayURL string, gatherer prometheus.Gatherer, logger blog.Logger) error {
// PushMetrics pushes the provided Prometheus metrics to the provided
// Pushgateway URL with the provided job name.
func PushMetrics(jobname, pushgatewayURL string, gatherer prometheus.Gatherer, logger blog.Logger) error {

Comment thread cmd/shell.go
hostname = "unknown"
}
return push.New(pushgatewayURL, jobname).
Gatherer(gatherer).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

push.New() is initialized with a default http.Client with carries no timeout. We should call Client and pass in our own with a sensible timeout so that this call never hangs indefinitely.

Suggested change
Gatherer(gatherer).
Client(&http.Client{Timeout: 10 * time.Second}).
Gatherer(gatherer).

Comment thread cmd/cert-checker/main.go
Name: "cert_checker_bad_count",
Help: "Cert-checker count of bad certificates",
})
return &certCheckerMetrics{checkerLatency, checkerTimestamp, checkerGoodCount, checkerBadCount}

@beautifulentropy beautifulentropy Jun 12, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last three of these positional arguments in the struct literal are of the same type, it would be very easy to accidentally swap them without noticing. Instead use the field names as well in the composition of the struct literal:

Suggested change
return &certCheckerMetrics{checkerLatency, checkerTimestamp, checkerGoodCount, checkerBadCount}
return &certCheckerMetrics{
checkerLatency: checkerLatency,
checkerTimestamp: checkerTimestamp,
checkerGoodCount: checkerGoodCount,
checkerBadCount: checkerBadCount,
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve cert-checker's log and metric output

5 participants