Skip to content

refactor(telemetry)!: move backend tracing onto plugin/hook pattern#1181

Open
ajbozarth wants to merge 5 commits into
generative-computing:mainfrom
ajbozarth:feat/enhanced-tracing
Open

refactor(telemetry)!: move backend tracing onto plugin/hook pattern#1181
ajbozarth wants to merge 5 commits into
generative-computing:mainfrom
ajbozarth:feat/enhanced-tracing

Conversation

@ajbozarth
Copy link
Copy Markdown
Contributor

@ajbozarth ajbozarth commented Jun 1, 2026

Pull Request

Issue

Fixes #1045, fixes #1046, fixes #1047. Phase 1 of #444. Also closes out item 2 of #909.

Description

Phase 1 of the tracing epic. Migrates backend span emission from inline calls scattered across five backends onto a BackendTracingPlugin that subscribes to the existing generation_* hooks (chat path) and new generation_batch_* hooks (raw path). Spans stay live on the OTel context across the API call so nested instrumentation (httpx, langchain) parents under the backend span.

Reviewer call-outs:

  1. Breaking: env var rename. MELLEA_TRACE_*MELLEA_TRACES_* (plural), aligned with OTEL_EXPORTER_OTLP_TRACES_ENDPOINT and MELLEA_METRICS_*. Adds MELLEA_TRACES_ENABLED umbrella flag and opt-in MELLEA_TRACES_OTLP. No deprecation shim — tracing was still pre-1.0 surface.
  2. Breaking: gen_ai.system removed in favor of gen_ai.provider.name. Cleans up the dual emission introduced by feat(telemetry): close five OTel GenAI semantic convention emission gaps (#1035) #1036.
  3. Additive: new hook surface. generation_batch_pre_call / generation_batch_post_call / generation_batch_error for plugin authors. Raw path now emits one span per generate_from_raw call (per OTel GenAI semconv), not one per MOT.
  4. Architectural: spans now live on the OTel context during API calls. Chat and raw paths both keep the backend span active across the network call, so nested instrumentation parents under it instead of floating at the root.
  5. Rebase artifact: ModelOutputThunk.cancel_generation (added upstream by feat(stdlib): add stream_with_chunking() with per-chunk validation (#901) #942 in 9e8a9636) had its body rewritten to fire GENERATION_ERROR instead of calling the removed _telemetry_span API. Reviewers may notice as "extra surface", but it's rebase-driven, not refactor-driven.

BREAKING CHANGE: Public telemetry API surface narrowed:

  • MELLEA_TRACE_* env vars renamed to plural MELLEA_TRACES_* with no deprecation shim.
  • The deprecated gen_ai.system span attribute is removed; consumers should read gen_ai.provider.name instead.
  • The split helpers is_application_tracing_enabled() and is_backend_tracing_enabled() are replaced by a single is_tracing_enabled().
  • The application-tracing helpers add_span_event, start_backend_span, end_backend_span, and trace_backend are removed from mellea.telemetry. Backend spans are now emitted by the BackendTracingPlugin automatically; application spans use trace_application (unchanged).
  • The mellea.telemetry.backend_instrumentation module is deleted along with its exports (start_generate_span, instrument_generate_from_raw, record_token_usage, record_response_metadata, finalize_backend_span).

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

Attribution

  • AI coding assistants used

Adding a new component, requirement, sampling strategy, or tool?

If your PR adds or modifies one of the types below, check the matching box. A checklist of type-specific review items will be posted as a comment.

  • Component
  • Requirement
  • Sampling Strategy
  • Tool

NOTE: Please ensure you have an issue that has been acknowledged by a core contributor and routed you to open a pull request against this repository. Otherwise, please open an issue before continuing with this pull request.

Phase 1 of the tracing epic: migrates backend span emission from inline
calls scattered across five backends onto a BackendTracingPlugin that
subscribes to the existing generation_* hooks for chat and new
generation_batch_* hooks for raw. Spans stay live on the OTel context
across the API call so nested instrumentation parents under them.

- Renames MELLEA_TRACE_* env vars to plural MELLEA_TRACES_* and
  introduces MELLEA_TRACES_ENABLED umbrella flag, opt-in
  MELLEA_TRACES_OTLP, and signal-specific
  OTEL_EXPORTER_OTLP_TRACES_ENDPOINT. No deprecation shim.
- Drops deprecated gen_ai.system attribute in favor of
  gen_ai.provider.name.
- Prefixes application-span attributes with mellea. for consistency
  with backend spans and the existing logging/metrics pillars.
- Eagerly initialises the tracer provider and registers the
  BackendTracingPlugin at module import when MELLEA_TRACES_ENABLED is
  truthy. Tests reset module state and call _setup_tracing() to re-init
  after env-var changes, removing the need for importlib.reload.
- Adds generation_batch_pre_call/post_call/error hook types so raw-path
  emits one span per generate_from_raw call (matching OTel GenAI
  semconv) rather than one per MOT.
- Deletes mellea/telemetry/backend_instrumentation.py; backends no
  longer import from mellea.telemetry (except .context).
- Removes the _telemetry_span round-trip on mot._meta and the
  tracing-specific error block in core/base.py; error-path span
  closure lives in the plugin's generation_error hook.

Closes generative-computing#1045, generative-computing#1046, generative-computing#1047 (Phase 1 of generative-computing#444).

BREAKING CHANGE: Public telemetry API surface narrowed:

- MELLEA_TRACE_* env vars renamed to plural MELLEA_TRACES_* with no
  deprecation shim.
- The deprecated gen_ai.system span attribute is removed; consumers
  should read gen_ai.provider.name instead.
- The split helpers is_application_tracing_enabled() and
  is_backend_tracing_enabled() are replaced by a single
  is_tracing_enabled().
- The application-tracing helpers add_span_event, start_backend_span,
  end_backend_span, and trace_backend are removed from
  mellea.telemetry. Backend spans are now emitted by the
  BackendTracingPlugin automatically; application spans use
  trace_application (unchanged).
- The mellea.telemetry.backend_instrumentation module is deleted along
  with its exports (start_generate_span, instrument_generate_from_raw,
  record_token_usage, record_response_metadata, finalize_backend_span).

Assisted-by: Claude Code
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
@ajbozarth
Copy link
Copy Markdown
Contributor Author

Follow-up work spawned by this PR (not part of the tracing epic):

@ajbozarth
Copy link
Copy Markdown
Contributor Author

I also opened #1180 which fixes a bug in metrics that also applies here. (ie that needs to merge first)

Copy link
Copy Markdown
Contributor

@jakelorocco jakelorocco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good! I just have a few questions and a few places I think extra comments would be helpful.

Comment thread mellea/backends/huggingface.py
Comment thread mellea/core/base.py
Comment thread mellea/plugins/hooks/generation.py Outdated
Comment thread mellea/telemetry/tracing_plugins.py
Comment thread mellea/telemetry/tracing_plugins.py
- HF backend: detect stop-sequence termination as finish_reason="stop"
  (previously dropped silently when stopping on a configured stop_string).
- Add unit tests for HF post_processing finish_reasons derivation.
- Clarify generation_id docstrings/comments to distinguish the Mellea-side
  hook correlation ID from GenerationMetadata.response_id.
- Document why BackendTracingPlugin.on_pre_call runs SEQUENTIAL while the
  other tracing hooks run FIRE_AND_FORGET.

Assisted-by: Claude Code
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
@ajbozarth
Copy link
Copy Markdown
Contributor Author

@jakelorocco I believe 00831a4 addresses all your actionable comments

Copy link
Copy Markdown
Contributor

@planetf1 planetf1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is a well-structured refactor — moving backend tracing onto the plugin/hook pattern makes it composable and avoids the tight coupling the old backend_instrumentation module had. The OTel GenAI semconv attributes are correct on the happy path. A few issues need addressing, with one blocker around the pre-computed MOT path.

Two tooling findings can't be filed as inline comments since those files aren't in this diff:

tooling/docs-autogen/audit_coverage.py (lines 30–32)backend_instrumentation was deleted in this PR but is still listed in _CONFIRMED_INTERNAL_MODULES. The comment says this must stay in sync with generate-ast.py, so both need updating. Remove "backend_instrumentation" from the frozenset (leaving just {"json_util"}).

tooling/docs-autogen/generate-ast.py (lines 437–444) — Same deleted module, same stale entry. Remove the "mellea/telemetry/backend_instrumentation" entry and its associated comment block.

Comment thread mellea/core/backend.py
)
raise
# Save the ID for the post_call / error hooks.
mot._generation_id = generation_id
Copy link
Copy Markdown
Contributor

@planetf1 planetf1 Jun 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BLOCKER — When _generate_from_context returns an already-computed MOT (DummyBackend, any caching layer, any backend that pre-fills the thunk synchronously), avalue() short-circuits at base.py:583 and never calls astream(). Since generation_post_call only fires inside astream, the pre_call span is never closed, the OTel context token is never detached, and the entry in _in_flight_spans grows without bound.

The secondary effect is worse: the leaked context token means any asyncio task spawned after this point inherits a context whose current OTel span is the stale, already-exported span. Nested instrumented calls — HTTP clients, other OTel-aware libraries — will appear as children of a span that has already ended, corrupting the trace tree.

The fix goes here, right after mot._generation_id = generation_id. Check mot.is_computed() and if true, fire generation_post_call directly before returning:

if mot.is_computed() and has_plugins(HookType.GENERATION_POST_CALL):
    from ..plugins.hooks.generation import GenerationPostCallPayload
    await invoke_hook(
        HookType.GENERATION_POST_CALL,
        GenerationPostCallPayload(
            model_output=mot,
            generation_id=generation_id,
            latency_ms=0.0,  # pre-computed: no inference latency to report
        ),
        backend=self,
    )

This mirrors the path astream already takes for lazy MOTs and closes the span immediately. It also makes the code honour the contract GenerationPostCallPayload's docstring already promises for pre-computed thunks ("fires before generate_from_context returns") — today that sentence describes intended, not actual, behaviour.

return span


def _detach_token(token: Token[Context] | None) -> None:
Copy link
Copy Markdown
Contributor

@planetf1 planetf1 Jun 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The except ValueError here is dead code. opentelemetry.context.detach() in opentelemetry-api 1.41.1 wraps its body in except Exception, logs Failed to detach context via logger.exception (ERROR level, full traceback, under the opentelemetry.context logger), and returns without re-raising. The ValueError never escapes for this catch to see.

In practice: this emits an ERROR-level log with a Python traceback on every LLM call. Both generation_post_call and generation_error are FIRE_AND_FORGET, which spawns a new asyncio task. The OTel context token is bound to the pre_call task. Detaching it from a different task always hits this error path — not occasionally, unconditionally. Any production deployment with OTel diagnostics and error alerting will see a continuous flood, one traceback per generation. (Verified end-to-end against opentelemetry-api 1.41.1: the cross-task detach raises internally, OTel swallows and logs it, and detach returns.)

The root cause is structural. The simplest fix: do not store the token in _in_flight_spans at all. The token's only purpose is to restore the caller's prior context after the span ends, but FIRE_AND_FORGET callers are ephemeral tasks that do not need their context restored. Stash just the span:

_in_flight_spans: dict[str, Span] = {}

Skip _detach_token entirely. The span lifecycle (start/end) is unaffected, and the ERROR spam stops. Note this does not remove the stale-current-span issue within the caller task — the pre_call attach is never undone there either way — but the failing detach was not fixing that today, so this is strictly a noise reduction. Properly restoring the caller's context would require detaching in the same task that attached.


def is_application_tracing_enabled() -> bool:
"""Check if application tracing is enabled.
def is_content_tracing_enabled() -> bool:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_content_tracing_enabled() is defined, exported, and documented, but nothing in _tracing_setters.py or tracing_plugins.py calls it. Setting MELLEA_TRACES_CONTENT=true or OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true produces no content on any span.

The OTel GenAI semconv defines gen_ai.input.messages and gen_ai.output.messages for this case — the prompt is available in GenerationPreCallPayload.prompt and the response in mot.generation after post_processing. Whether you implement this now or defer it, the function should either have callers or a comment saying the capture path is not yet implemented, so nobody assumes it's already working.

Comment on lines +20 to +25
def _reset_tracing_state() -> None:
"""Reset module state and re-run setup so env-var changes take effect."""
tracing._tracer_provider = None
tracing._application_tracer = None
tracing._backend_tracer = None
tracing._setup_tracing()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_plugins_registered isn't being reset here. That flag gates whether _register_tracing_plugins() runs — once it flips to True it stays True for the rest of the process. Tests that reset module state via this helper don't get a clean slate: any subsequent _setup_tracing() call will skip re-registration entirely. Tests that need to control or verify plugin registration will see state that leaked from a previous test.

Suggested change
def _reset_tracing_state() -> None:
"""Reset module state and re-run setup so env-var changes take effect."""
tracing._tracer_provider = None
tracing._application_tracer = None
tracing._backend_tracer = None
tracing._setup_tracing()
def _reset_tracing_state() -> None:
"""Reset module state and re-run setup so env-var changes take effect."""
tracing._tracer_provider = None
tracing._application_tracer = None
tracing._backend_tracer = None
tracing._plugins_registered = False
tracing._setup_tracing()

Comment on lines +26 to +32
def _reset_tracing_state() -> None:
"""Reset module state and re-run setup so env-var changes take effect."""
tracing._tracer_provider = None
tracing._application_tracer = None
tracing._backend_tracer = None
tracing._in_flight_spans.clear()
tracing._setup_tracing()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as test_tracing.py_plugins_registered missing from the reset.

Suggested change
def _reset_tracing_state() -> None:
"""Reset module state and re-run setup so env-var changes take effect."""
tracing._tracer_provider = None
tracing._application_tracer = None
tracing._backend_tracer = None
tracing._in_flight_spans.clear()
tracing._setup_tracing()
def _reset_tracing_state() -> None:
"""Reset module state and re-run setup so env-var changes take effect."""
tracing._tracer_provider = None
tracing._application_tracer = None
tracing._backend_tracer = None
tracing._in_flight_spans.clear()
tracing._plugins_registered = False
tracing._setup_tracing()

if not payload.generation_id:
return
mot = payload.model_output
gen = mot.generation if mot is not None else None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The chat pre_call hook starts the span with model=None, provider=None by design — the chat path doesn't know the model until post_processing. But on the error path, if the error fires before post_processing() completes, mot.generation.model and mot.generation.provider are still None (and mot itself may be None, which this line already handles). set_request_attrs in _tracing_setters.py skips both attributes when falsy, so the error span ends with neither gen_ai.request.model nor gen_ai.provider.name set.

Error spans with no model or provider are hard to triage — you can't tell which backend threw the error at a glance. Worth noting as a known limitation here, or plumbing the model/provider from the backend config at pre_call time if it's accessible there.

Comment on lines 96 to 102
from .tracing import (
add_span_event,
end_backend_span,
is_application_tracing_enabled,
is_backend_tracing_enabled,
is_content_tracing_enabled,
is_tracing_enabled,
set_span_attribute,
set_span_error,
start_backend_span,
trace_application,
trace_backend,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set_span_status_error was added in this PR and is in tracing.py's __all__, but it's not in the package-level export. Users importing from mellea.telemetry won't find it there; they'd have to import from the submodule directly. Inconsistent with how the other span helpers are exposed.

Also needs adding to __all__ lower in the file (between "set_span_error" and "trace_application").

Suggested change
from .tracing import (
add_span_event,
end_backend_span,
is_application_tracing_enabled,
is_backend_tracing_enabled,
is_content_tracing_enabled,
is_tracing_enabled,
set_span_attribute,
set_span_error,
start_backend_span,
trace_application,
trace_backend,
)
from .tracing import (
is_content_tracing_enabled,
is_tracing_enabled,
set_span_attribute,
set_span_error,
set_span_status_error,
trace_application,
)

)


def finish_backend_span_success(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

finish_backend_span_error has an if not _OTEL_AVAILABLE: return guard at the top, but finish_backend_span_success doesn't. Functionally harmless since both pop from _in_flight_spans first, and that dict is only populated when a tracer is available. The asymmetry is just a bit surprising when reading the two functions side by side.

@@ -41,11 +46,16 @@ class GenerationPostCallPayload(MelleaBasePayload):
model_output: The fully-computed `ModelOutputThunk`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring body just above this (lines 38–42) contains a false statement: "For already-computed thunks (e.g. cached responses) it fires before generate_from_context returns." That code path doesn't exist — generate_from_context doesn't check mot.is_computed() and doesn't fire this hook early. This is the same gap as the blocker above, but the docstring is wrong regardless of when that gets fixed. The sentence starting "For already-computed thunks" should be removed.

Comment on lines +216 to +217
> **Tip:** In pytest, use `monkeypatch.setenv` to control tracing per-test —
> see `test/telemetry/test_tracing.py` for the canonical pattern.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pointing readers to an internal test file isn't great docs practice — external users won't have it, and contributors still have to go hunting. Easier to just inline the pattern.

Suggested change
> **Tip:** In pytest, use `monkeypatch.setenv` to control tracing per-test —
> see `test/telemetry/test_tracing.py` for the canonical pattern.
> **Tip:** In pytest, use `monkeypatch.setenv("MELLEA_TRACES_ENABLED", "true")` to enable tracing for a test. Call `mellea.telemetry.tracing._setup_tracing()` after setting the env var (and again on teardown) so the module picks up the change — env vars are read at initialisation time, not on every call.

@planetf1
Copy link
Copy Markdown
Contributor

planetf1 commented Jun 4, 2026

Overall take: The architectural direction here is right. Moving backend tracing onto the plugin/hook pattern decouples it cleanly from individual backends — new backends get instrumentation for free, and the separation of concerns is much better than the old backend_instrumentation module. The GenAI semconv attribute coverage is solid on the happy path.

Two structural issues need fixing before this merges:

  1. Span leak on pre-computed MOTs (blocker comment on backend.py:117) — any backend that returns a pre-filled thunk bypasses astream entirely, so the pre_call span is never closed and the OTel context token is never cleaned up. The leaked token also corrupts the trace context for any subsequent work in that task. The fix is small: check mot.is_computed() after _generate_from_context returns and fire generation_post_call directly if true.

  2. ERROR log on every LLM call (warning comment on tracing.py:289) — FIRE_AND_FORGET hooks run in a different asyncio task from the one that attached the OTel context token, so every single post_call and error hook triggers a Failed to detach context ERROR log with a full traceback. The fix is also small: stop storing the token in _in_flight_spans and drop _detach_token entirely — the token is only useful in the task that created it.

The remaining comments (semconv gap, test state isolation, missing export, stale tooling refs) are all quick fixes. None of them block the approach.

Copy link
Copy Markdown
Contributor

@planetf1 planetf1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two blocking bugs introduced in this PR, one note referencing a follow-up issue.

finish_backend_span_error,
finish_backend_span_success,
start_backend_span,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The top-level import on lines 17–21 creates a circular import that silences the entire plugin framework at runtime.

Sequence:

  1. MELLEA_TRACES_ENABLED=truetracing.py:181 calls _setup_tracing() at module load.
  2. _setup_tracing() calls _register_tracing_plugins(), which imports tracing_plugins.py.
  3. tracing_plugins.py:17 tries to import finish_backend_span_error, finish_backend_span_success, start_backend_span from tracing.py — which is still partially initialised at that point.
  4. The resulting ImportError is caught at tracing.py:132–139 and replaced with: "plugin framework not installed / pip install mellea[telemetry]".

cpex is installed. The warning is wrong. All backend spans and every metrics plugin (TokenMetricsPlugin, LatencyMetricsPlugin, ErrorMetricsPlugin) are silently dead — they never register.

Empirically confirmed: running the walkthrough with all OTLP env vars set produces start_session/session_context/aact application spans but zero backend chat spans and zero mellea.* metrics.

The TYPE_CHECKING guard is already used on lines 24–32 for the payload types — the same pattern applies here. Options: move the three function imports inside each hook method (lazy, cheap since called once per generation), or bind them after tracing.py is fully initialised.

if len(response_value) > 500:
response_value = response_value[:500] + "..."
set_span_attribute(span, "response", response_value)
set_span_attribute(span, "mellea.response", response_value)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mellea.response is set unconditionally on every aact span, regardless of whether the operator has opted in to content capture.

is_content_tracing_enabled() is defined in tracing.py but is never called anywhere in the codebase. The MELLEA_TRACES_CONTENT opt-in contract documented in the PR description is not enforced.

Up to 500 chars of model output (potentially including PII or confidential content) reaches every configured OTLP exporter on every call by default. An operator who has not set MELLEA_TRACES_CONTENT=true has no way to prevent this.

Fix: gate this call on is_content_tracing_enabled(). The function is already there; it just needs to be wired.

action = payload.action
fmt = payload.format
start_backend_span(
"chat",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Span name "chat" diverges from the OTel GenAI semantic conventions, which specify "{operation} {gen_ai.request.model}" — e.g. "chat granite4.1:8b". The model suffix enables model-level filtering directly in trace UIs and pipelines without requiring an attribute filter.

This is one of several semconv alignment items tracked in a follow-up issue (to be linked once filed). Noting it here since start_backend_span("chat", ...) is new in this PR and the fix is a one-liner at this call site — but whether to address it here or in the follow-up is the author's call.

finish_backend_span_error,
finish_backend_span_success,
start_backend_span,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The top-level import on lines 17–21 creates a circular import that silences the entire plugin framework at runtime.

Sequence:

  1. MELLEA_TRACES_ENABLED=truetracing.py:181 calls _setup_tracing() at module load.
  2. _setup_tracing() calls _register_tracing_plugins(), which imports tracing_plugins.py.
  3. tracing_plugins.py:17 tries to import finish_backend_span_error, finish_backend_span_success, start_backend_span from tracing.py — which is still partially initialised at that point.
  4. The resulting ImportError is caught at tracing.py:132–139 and replaced with: "plugin framework not installed / pip install mellea[telemetry]".

cpex is installed. The warning is wrong. All backend spans and every metrics plugin (TokenMetricsPlugin, LatencyMetricsPlugin, ErrorMetricsPlugin) are silently dead — they never register.

Empirically confirmed: running with all OTLP env vars set produces start_session/session_context/aact application spans but zero backend chat spans and zero mellea.* metrics.

The TYPE_CHECKING guard is already used on lines 24–32 for the payload types — the same pattern applies here. Options: move the three function imports inside each hook method (lazy, cheap since called once per generation), or bind them after tracing.py is fully initialised.

if len(response_value) > 500:
response_value = response_value[:500] + "..."
set_span_attribute(span, "response", response_value)
set_span_attribute(span, "mellea.response", response_value)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mellea.response is set unconditionally on every aact span, regardless of whether the operator has opted in to content capture.

is_content_tracing_enabled() is defined in tracing.py but is never called anywhere in the codebase. The MELLEA_TRACES_CONTENT opt-in contract documented in the PR description is not enforced.

Up to 500 chars of model output (potentially including PII or confidential content) reaches every configured OTLP exporter on every call by default. An operator who has not set MELLEA_TRACES_CONTENT=true has no way to prevent this.

Fix: gate this call on is_content_tracing_enabled(). The function is already there; it just needs to be wired.

action = payload.action
fmt = payload.format
start_backend_span(
"chat",
Copy link
Copy Markdown
Contributor

@planetf1 planetf1 Jun 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Span name "chat" diverges from the OTel GenAI semantic conventions, which specify "{operation} {gen_ai.request.model}" — e.g. "chat granite4.1:8b". The model suffix enables model-level filtering directly in trace UIs and pipelines without requiring an attribute filter.

This is one of several semconv alignment items tracked in #1209. Noting it here since start_backend_span("chat", ...) is new in this PR and the fix is a one-liner at this call site — but whether to address it here or in the follow-up is the author's call.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

3 participants