Part of the Observability — OpenTelemetry Tracing v1 initiative (master tracking: #108). Effort: S (2–3 engineer-days). Risk: medium (real SDK construction, exporter wiring, multiple branches in sampler/protocol). Depends on: Phase 0 (#101).
Goal
Construct a real SDK TracerProvider (OTLP exporter, batch processor, resource, sampler, egress-enforced transport, clean shutdown). Nothing wired into runtime yet — this PR builds the construction surface; Phase 2 invokes it.
Files
| File | Change |
| --- | --- |
| forge-core/observability/otel.go | New. TracingConfig struct, NewTracerProvider(ctx, cfg, transport) (*sdktrace.TracerProvider, error), sampler/resource construction |
| forge-core/observability/otel_test.go | New. Table-driven: protocol selection, sampler parsing, resource attrs, disabled→nil path |
TracingConfig (pure data; resolved upstream in cli)
type TracingConfig struct {
	Enabled        bool
	Endpoint       string            // resolved, non-empty when active
	Protocol       string            // "http/protobuf" (default) | "grpc"
	Sampler        string            // parentbased_always_on (default) | always_on | always_off | traceidratio | parentbased_traceidratio
	SamplerRatio   float64           // default 1.0
	Headers        map[string]string // OTLP headers (auth); prefer env for secrets
	Timeout        time.Duration     // default 10s
	ServiceName    string            // resolved: OTEL_SERVICE_NAME || agent_id
	ServiceVersion string            // agent version
	RuntimeVersion string            // forge build version
	ResourceAttrs  map[string]string // merged OTEL_RESOURCE_ATTRIBUTES
	Redact         bool              // default true (consumed by instrumentation, not here)
	CaptureContent bool              // default false; enterprise opt-in
}
NewTracerProvider implementation
- Exporter by protocol: otlptracehttp (default, port 4318 convention) or otlptracegrpc (4317). Pass endpoint, headers, timeout.
- Egress-enforced transport: injected into the HTTP exporter via otlptracehttp.WithHTTPClient(&http.Client{Transport: transport}), where transport is supplied by the caller (security.EgressTransportFromContext equivalent, resolved in cli). For gRPC, document that the endpoint must be allowlisted: the gRPC dial can't take the http transport, so note this limitation. HTTP is the recommended/default protocol.
- Resource: service.name, service.version, plus forge.runtime.version, merged with ResourceAttrs. Use resource.New with semconv.
- Sampler: map the Sampler name and SamplerRatio value to sdktrace.ParentBased(...) / AlwaysSample() / NeverSample() / TraceIDRatioBased(ratio).
- Span processor: sdktrace.NewBatchSpanProcessor(exporter) (production batching).
- Return the *sdktrace.TracerProvider. Caller owns Shutdown(ctx).
Verify
go build ./...
go test ./forge-core/observability/ -v
# Confirm sampler string parsing and protocol branch are covered;
# bad protocol returns a wrapped error.
Anti-patterns to avoid
- SimpleSpanProcessor in production path (must be BatchSpanProcessor).
- Constructing its own un-enforced http.DefaultClient: transport must be supplied by the caller.
- Reading env or forge.yaml here: config is resolved in cli (Phase 2) and passed in.
Invariants
- forge-core stays a pure library: it may import otel, but the call sites depend on the runtime seam from Phase 0, not on cli/build packages.
- Pure Go (no cgo / no OS deps) — vendors cleanly. No sidecar collector required (OTLP can target one, but it's optional).
- The OTLP exporter's traffic is outbound and must pass the egress enforcer — the endpoint host is registered in the build-time allowlist (Phase 6) and the exporter uses the egress-enforced transport.