-
Notifications
You must be signed in to change notification settings - Fork 20
feat(tracing): add OpenTelemetry tracing support #3946
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Add distributed tracing with OTLP HTTP exporter for observability. Components instrumented: - HTTP server (Echo middleware) - HTTP clients (outgoing requests) - GORM database queries - HashiCorp Vault client requests - External crypto storage client Features: - W3C Trace Context propagation (traceparent headers) - Logs enriched with trace_id/span_id for correlation - Logs forwarded to OTLP endpoint when tracing enabled - Audit logs included in trace context Configuration: - tracing.endpoint: OTLP collector endpoint (host:port) - tracing.insecure: use HTTP instead of HTTPS Known limitations: - gRPC connections not instrumented (v5 legacy functionality) - Azure Key Vault uses Azure SDK which requires separate instrumentation via azotel package
|
Coverage Impact ⬇️ Merging this pull request will decrease total coverage on Modified Files with Diff Coverage (16)
🤖 Increase coverage with AI coding...🚦 See full report on Qlty Cloud » 🛟 Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds comprehensive OpenTelemetry tracing support to the Nuts node, enabling distributed tracing across HTTP requests, database operations, and external service calls. The implementation uses OTLP HTTP exporters for both traces and logs, with W3C Trace Context propagation.
Key changes:
- Adds core tracing infrastructure with configurable OTLP endpoint and automatic instrumentation
- Instruments all HTTP clients and servers, database queries (GORM), and external storage backends
- Enriches logs with trace context (trace_id/span_id) and forwards logs to OTLP endpoint when enabled
Reviewed changes
Copilot reviewed 17 out of 18 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| core/tracing.go | Core tracing setup with OpenTelemetry SDK, trace/log exporters, and logrus hooks for trace correlation |
| core/tracing_test.go | Unit tests for tracing initialization and logrus hook functionality |
| core/engine.go | Integrates tracing setup early in system configuration and ensures proper shutdown |
| core/server_config.go | Adds tracing configuration structure with endpoint and insecure TLS options |
| core/http_client.go | Wraps internal HTTP client transport with OpenTelemetry instrumentation |
| http/engine.go | Adds Echo middleware for tracing inbound HTTP requests (skips health/metrics/status) |
| http/client/client.go | Wraps all HTTP client transports with OpenTelemetry instrumentation |
| http/client/client_test.go | Tests for tracing-enabled and disabled HTTP client behavior |
| storage/engine.go | Adds GORM tracing plugin when tracing is enabled |
| crypto/storage/vault/vault.go | Instruments Vault HTTP client with OpenTelemetry transport |
| crypto/storage/external/client.go | Instruments external crypto storage HTTP client |
| pki/validator.go | Instruments PKI validator HTTP client for certificate revocation checks |
| audit/audit.go | Registers tracing hooks with audit logger to include audit logs in traces |
| cmd/root.go | Updates comment to reflect tracing initialization in Configure() |
| docs/pages/deployment/monitoring.rst | Comprehensive documentation for tracing configuration and capabilities |
| docs/pages/deployment/server_options.rst | Documents new tracing configuration options |
| go.mod | Adds OpenTelemetry dependencies and updates test/protobuf versions |
| go.sum | Checksum updates for new and updated dependencies |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Pass request context to log calls so trace_id and span_id are included in log output when tracing is enabled.
When embedded in another app, nuts-node's spans were attributed to the host's service name because both shared the global TracerProvider. Add GetTracerProvider() that returns nuts-node's own provider, and use it in otelecho middleware. Only set global provider when not embedded.
❌ 7 blocking issues (9 total)
|
Pass tracing configuration (OTLP endpoint, insecure flag) from knooppunt to the embedded nuts-node via environment variables. Add otelhttp transport to the reverse proxy for proper trace context propagation across the proxy boundary. Depends on nuts-foundation/nuts-node#3946
* feat(nutsnode): propagate tracing config to embedded nuts-node Pass tracing configuration (OTLP endpoint, insecure flag) from knooppunt to the embedded nuts-node via environment variables. Add otelhttp transport to the reverse proxy for proper trace context propagation across the proxy boundary. Depends on nuts-foundation/nuts-node#3946 * refactor(nutsnode): use strconv.FormatBool for NUTS_TRACING_INSECURE * test(nutsnode): add TracingConfig env var delegation tests
reinkrul
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you alter an e2e test, e.g. e2e-tests/oauth-flow/rfc021, so that it runs with tracing enabled? Just to make sure that with tracing, most important things still work.
- Test verifies trace propagation across nodes by injecting traceparent - Configure unique service names (nodeA, nodeB) to distinguish nodes in traces - Verify trace contains spans from both services, proving distributed tracing works Extended tracing configuration: - Add tracing.servicename option (defaults to 'nuts-node')
Refactored OpenTelemetry tracing from core/tracing.go into a proper tracing/ engine package following the standard engine pattern. Key improvements: - Use official otellogrus bridge instead of custom implementation - Remove disableableHook pattern (OTEL shutdown makes Emit() a no-op) - Add trace context (trace_id, span_id) to audit logs for correlation - Fix enabled flag not being reset on setup failure - Create hooks once and register to both standard and audit loggers
Extend assertJaegerTrace to verify both services and span patterns, proving that gorm and http-client instrumentation work end-to-end. - Add span pattern verification (gorm.Query, http-client) - Add curl timeout to prevent hangs - Change service check from exact match to contains - Simplify argument parsing (space-separated)
When nuts-node is embedded in another application (e.g., nuts-knooppunt) that has already configured OpenTelemetry, nuts-node now correctly uses the parent's infrastructure instead of creating its own. Changes: - Detect embedded mode by checking for existing SDK TracerProvider - In embedded mode: reuse parent's TracerProvider, propagator, error handler, and logging - only set up HTTP transport wrapper and logrus hook for trace context - In standalone mode: full OTEL setup as before - Fix race condition on nutsTracerProvider using atomic.Pointer - Add timeout to shutdown (30s) and error recovery (5s) - Include shutdown errors via errors.Join instead of discarding - Check enabled flag in logrus hook to skip processing after shutdown - Extract setupHTTPTransport helper to reduce duplication
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 29 out of 30 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…t calls This ensures consistent tracing behavior in embedded mode where GetTracerProvider() returns the parent's provider instead of the global.
Move tracing.* options to their own **Tracing** section placed alphabetically between Storage and policy. Also fix tracing.servicename to have default value in defaults column rather than description.
The getTransport function already documents the tracing behavior.

Add distributed tracing with OTLP HTTP exporter for observability.
Components instrumented:
Features:
Configuration:
Known limitations: