Skip to content

Conversation

@JorisHeadease
Copy link
Contributor

Add distributed tracing with OTLP HTTP exporter for observability.

Components instrumented:

  • HTTP server (Echo middleware)
  • HTTP clients (outgoing requests)
  • GORM database queries
  • HashiCorp Vault client requests
  • External crypto storage client

Features:

  • W3C Trace Context propagation (traceparent headers)
  • Logs enriched with trace_id/span_id for correlation
  • Logs forwarded to OTLP endpoint when tracing enabled
  • Audit logs included in trace context

Configuration:

  • tracing.endpoint: OTLP collector endpoint (host:port)
  • tracing.insecure: use HTTP instead of HTTPS

Known limitations:

  • gRPC connections not instrumented (v5 legacy functionality)
  • Azure Key Vault uses Azure SDK which requires separate instrumentation via azotel package

Add distributed tracing with OTLP HTTP exporter for observability.

Components instrumented:
- HTTP server (Echo middleware)
- HTTP clients (outgoing requests)
- GORM database queries
- HashiCorp Vault client requests
- External crypto storage client

Features:
- W3C Trace Context propagation (traceparent headers)
- Logs enriched with trace_id/span_id for correlation
- Logs forwarded to OTLP endpoint when tracing enabled
- Audit logs included in trace context

Configuration:
- tracing.endpoint: OTLP collector endpoint (host:port)
- tracing.insecure: use HTTP instead of HTTPS

Known limitations:
- gRPC connections not instrumented (v5 legacy functionality)
- Azure Key Vault uses Azure SDK which requires separate
  instrumentation via azotel package
@qltysh
Copy link

qltysh bot commented Dec 3, 2025

Qlty

Coverage Impact

⬇️ Merging this pull request will decrease total coverage on master by 0.05%.

Modified Files with Diff Coverage (16)

RatingFile% DiffUncovered Line #s
Coverage rating: B Coverage rating: B
http/engine.go40.0%114-122
Coverage rating: B Coverage rating: B
core/engine.go100.0%
Coverage rating: A Coverage rating: A
cmd/root.go100.0%
Coverage rating: B Coverage rating: B
pki/validator.go63.6%97-100
Coverage rating: A Coverage rating: B
audit/audit.go75.0%71-72
Coverage rating: C Coverage rating: C
vdr/api/v2/api.go100.0%
Coverage rating: B Coverage rating: B
http/client/client.go81.3%51-53
Coverage rating: C Coverage rating: C
storage/engine.go25.0%332-334
Coverage rating: A Coverage rating: B
core/http_client.go75.0%109-110
Coverage rating: B Coverage rating: B
auth/api/iam/api.go50.0%339
Coverage rating: A Coverage rating: B
crypto/storage/external/client.go66.7%89-92
Coverage rating: B Coverage rating: B
vdr/didsubject/manager.go100.0%
Coverage rating: B Coverage rating: B
crypto/storage/vault/vault.go37.5%120-124
New file Coverage rating: F
tracing/cmd/cmd.go0.0%28-37
New file Coverage rating: B
tracing/engine.go87.1%100-101, 122-125...
New file Coverage rating: F
tracing/config.go0.0%23-25
Total77.7%
🤖 Increase coverage with AI coding...

In the `feat/otel-tracing` branch, add test coverage for this new code:

- `audit/audit.go` -- Line 71-72
- `auth/api/iam/api.go` -- Line 339
- `core/http_client.go` -- Line 109-110
- `crypto/storage/external/client.go` -- Line 89-92
- `crypto/storage/vault/vault.go` -- Line 120-124
- `http/client/client.go` -- Line 51-53
- `http/engine.go` -- Line 114-122
- `pki/validator.go` -- Line 97-100
- `storage/engine.go` -- Line 332-334
- `tracing/cmd/cmd.go` -- Line 28-37
- `tracing/config.go` -- Line 23-25
- `tracing/engine.go` -- Lines 100-101, 122-125, 177-179, 243-244, 260-261, 267-272, 276-277, 298-299, 310-311, 334-335, and 376-377

🚦 See full report on Qlty Cloud »

🛟 Help
  • Diff Coverage: Coverage for added or modified lines of code (excludes deleted files). Learn more.

  • Total Coverage: Coverage for the whole repository, calculated as the sum of all File Coverage. Learn more.

  • File Coverage: Covered Lines divided by Covered Lines plus Missed Lines. (Excludes non-executable lines including blank lines and comments.)

    • Indirect Changes: Changes to File Coverage for files that were not modified in this PR. Learn more.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds comprehensive OpenTelemetry tracing support to the Nuts node, enabling distributed tracing across HTTP requests, database operations, and external service calls. The implementation uses OTLP HTTP exporters for both traces and logs, with W3C Trace Context propagation.

Key changes:

  • Adds core tracing infrastructure with configurable OTLP endpoint and automatic instrumentation
  • Instruments all HTTP clients and servers, database queries (GORM), and external storage backends
  • Enriches logs with trace context (trace_id/span_id) and forwards logs to OTLP endpoint when enabled

Reviewed changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated no comments.

Show a summary per file
File Description
core/tracing.go Core tracing setup with OpenTelemetry SDK, trace/log exporters, and logrus hooks for trace correlation
core/tracing_test.go Unit tests for tracing initialization and logrus hook functionality
core/engine.go Integrates tracing setup early in system configuration and ensures proper shutdown
core/server_config.go Adds tracing configuration structure with endpoint and insecure TLS options
core/http_client.go Wraps internal HTTP client transport with OpenTelemetry instrumentation
http/engine.go Adds Echo middleware for tracing inbound HTTP requests (skips health/metrics/status)
http/client/client.go Wraps all HTTP client transports with OpenTelemetry instrumentation
http/client/client_test.go Tests for tracing-enabled and disabled HTTP client behavior
storage/engine.go Adds GORM tracing plugin when tracing is enabled
crypto/storage/vault/vault.go Instruments Vault HTTP client with OpenTelemetry transport
crypto/storage/external/client.go Instruments external crypto storage HTTP client
pki/validator.go Instruments PKI validator HTTP client for certificate revocation checks
audit/audit.go Registers tracing hooks with audit logger to include audit logs in traces
cmd/root.go Updates comment to reflect tracing initialization in Configure()
docs/pages/deployment/monitoring.rst Comprehensive documentation for tracing configuration and capabilities
docs/pages/deployment/server_options.rst Documents new tracing configuration options
go.mod Adds OpenTelemetry dependencies and updates test/protobuf versions
go.sum Checksum updates for new and updated dependencies

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Pass request context to log calls so trace_id and span_id are
included in log output when tracing is enabled.
@JorisHeadease JorisHeadease marked this pull request as ready for review December 3, 2025 15:59
When embedded in another app, nuts-node's spans were attributed to the
host's service name because both shared the global TracerProvider.

Add GetTracerProvider() that returns nuts-node's own provider, and use
it in otelecho middleware. Only set global provider when not embedded.
@qltysh
Copy link

qltysh bot commented Dec 4, 2025

❌ 7 blocking issues (9 total)

Tool Category Rule Count
shellcheck Lint Declare and assign separately to avoid masking return values. 5
shellcheck Lint attempt appears unused. Verify use (or export if used externally). 1
shellcheck Lint Double quote to prevent globbing and word splitting. 1
ripgrep Lint // TODO: Global static vars have caused testing issues before, requiring moves to instantiated types. 1
qlty Structure Function with many returns (count = 6): setupStandaloneTracing 1

JorisHeadease added a commit to nuts-foundation/nuts-knooppunt that referenced this pull request Dec 4, 2025
Pass tracing configuration (OTLP endpoint, insecure flag) from knooppunt
to the embedded nuts-node via environment variables. Add otelhttp transport
to the reverse proxy for proper trace context propagation across the proxy
boundary.

Depends on nuts-foundation/nuts-node#3946
JorisHeadease added a commit to nuts-foundation/nuts-knooppunt that referenced this pull request Dec 8, 2025
* feat(nutsnode): propagate tracing config to embedded nuts-node

Pass tracing configuration (OTLP endpoint, insecure flag) from knooppunt
to the embedded nuts-node via environment variables. Add otelhttp transport
to the reverse proxy for proper trace context propagation across the proxy
boundary.

Depends on nuts-foundation/nuts-node#3946

* refactor(nutsnode): use strconv.FormatBool for NUTS_TRACING_INSECURE

* test(nutsnode): add TracingConfig env var delegation tests
Copy link
Member

@reinkrul reinkrul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you alter an e2e test, e.g. e2e-tests/oauth-flow/rfc021, so that it runs with tracing enabled? Just to make sure that with tracing, most important things still work.

- Test verifies trace propagation across nodes by injecting traceparent
- Configure unique service names (nodeA, nodeB) to distinguish nodes in traces
- Verify trace contains spans from both services, proving distributed tracing works

Extended tracing configuration:
- Add tracing.servicename option (defaults to 'nuts-node')
Refactored OpenTelemetry tracing from core/tracing.go into a proper
tracing/ engine package following the standard engine pattern.

Key improvements:
- Use official otellogrus bridge instead of custom implementation
- Remove disableableHook pattern (OTEL shutdown makes Emit() a no-op)
- Add trace context (trace_id, span_id) to audit logs for correlation
- Fix enabled flag not being reset on setup failure
- Create hooks once and register to both standard and audit loggers
Extend assertJaegerTrace to verify both services and span patterns,
proving that gorm and http-client instrumentation work end-to-end.

- Add span pattern verification (gorm.Query, http-client)
- Add curl timeout to prevent hangs
- Change service check from exact match to contains
- Simplify argument parsing (space-separated)
When nuts-node is embedded in another application (e.g., nuts-knooppunt)
that has already configured OpenTelemetry, nuts-node now correctly uses
the parent's infrastructure instead of creating its own.

Changes:
- Detect embedded mode by checking for existing SDK TracerProvider
- In embedded mode: reuse parent's TracerProvider, propagator, error
  handler, and logging - only set up HTTP transport wrapper and logrus
  hook for trace context
- In standalone mode: full OTEL setup as before
- Fix race condition on nutsTracerProvider using atomic.Pointer
- Add timeout to shutdown (30s) and error recovery (5s)
- Include shutdown errors via errors.Join instead of discarding
- Check enabled flag in logrus hook to skip processing after shutdown
- Extract setupHTTPTransport helper to reduce duplication
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 29 out of 30 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…t calls

This ensures consistent tracing behavior in embedded mode where
GetTracerProvider() returns the parent's provider instead of the global.
Move tracing.* options to their own **Tracing** section placed
alphabetically between Storage and policy. Also fix tracing.servicename
to have default value in defaults column rather than description.
The getTransport function already documents the tracing behavior.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants