Skip to content

feat: OTel: Task Processor#197

Open
khvn26 wants to merge 2 commits intomainfrom
feat/otel-task-processor-trace-context
Open

feat: OTel: Task Processor#197
khvn26 wants to merge 2 commits intomainfrom
feat/otel-task-processor-trace-context

Conversation

@khvn26
Copy link
Copy Markdown
Member

@khvn26 khvn26 commented Apr 10, 2026

Closes #184.

  • Serialise W3C trace context (traceparent, baggage) into a nullable trace_context JSONField on tasks at enqueue time via propagate.inject() in TaskHandler.delay()
  • Extract and activate context as a parent span in _run_task(), linking task execution spans back to the originating HTTP request trace
  • Derive default service.name from entrypoint: flagsmith-task-processor when running the task processor, flagsmith-api otherwise (OTEL_SERVICE_NAME overrides both)
  • Add opentelemetry-api to task-processor extras, TraceContext type alias, and span attributes matching Prometheus metric labels for cross-signal correlation

How to review

Review complexity: 2/5

Start with the model layer (models.py, types.py, migration) — a new trace_context nullable JSONField. Then follow the write path in decorators.py. Next, the read path in processor.py. Finally, main.py has one conditional for the default service.name based on sys.argv.

Tests mirror the above order: model, decorator, processor, main. The processor tests use static traceparent strings with known hex IDs for parametrisation. span_exporter fixture was promoted from integration test to root conftest, and RunTasksFixture protocol was updated with a default num_tasks.

Test plan

  • Run Flagsmith API + task processor against a local collector (e.g. SigNoz)
  • Trigger a request that enqueues a task (e.g. flag update that triggers environment rebuild)
  • In the trace viewer, verify the task processor span appears as a child of the HTTP request span with matching trace ID
  • Confirm span attributes (task_identifier, task_type, result) are present
  • Confirm service.name shows flagsmith-task-processor for task processor spans and flagsmith-api for API spans
  • Verify structlog events emitted during task execution carry the correct trace_id / span_id

Serialize W3C trace context (traceparent, baggage) into a nullable
JSONField on tasks at enqueue time, then extract and activate it as
a parent span at execution time. This links task processor spans back
to the originating HTTP request trace.

- Add `trace_context` JSONField to AbstractBaseTask (migration 0014)
- Inject via `propagate.inject()` in `TaskHandler.delay()`
- Extract via `propagate.extract()` + span in `_run_task()`
- Derive default service.name from entrypoint (flagsmith-task-processor)
- Add `opentelemetry-api` to task-processor extras
- Promote `span_exporter` fixture to root conftest

Closes #184

beep boop
@khvn26 khvn26 requested a review from a team as a code owner April 10, 2026 15:35
@khvn26 khvn26 requested review from emyller and removed request for a team April 10, 2026 15:35
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.05%. Comparing base (1256e87) to head (57c2456).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #197      +/-   ##
==========================================
+ Coverage   96.95%   97.05%   +0.09%     
==========================================
  Files         101      102       +1     
  Lines        4008     4139     +131     
==========================================
+ Hits         3886     4017     +131     
  Misses        122      122              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@emyller emyller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving because it looks great, my comments are not blocking.

There is one coverage failure I'm suspicious about, but other tests add enough confidence.

Comment on lines +69 to +73
default_service_name = (
"flagsmith-task-processor"
if "task-processor" in sys.argv
else "flagsmith-api"
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: Should we just set OTEL_SERVICE_NAME in container settings? I'm tempted to the smaller effort this allows, but it got my eyebrow raised.

Comment on lines +173 to +203
def test_ensure_cli_env__task_processor__expected_otel_service_name(
monkeypatch: pytest.MonkeyPatch,
mocker: MockerFixture,
) -> None:
# Given
monkeypatch.setattr("sys.argv", ["flagsmith", "task-processor"])
monkeypatch.setenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://collector:4318")

mock_build_log = mocker.patch(
"common.core.otel.build_otel_log_provider",
return_value=mocker.MagicMock(spec=LoggerProvider),
)
mock_build_tracer = mocker.patch(
"common.core.otel.build_tracer_provider",
return_value=mocker.MagicMock(spec=TracerProvider),
)
mocker.patch("common.core.otel.setup_tracing")

# When
with ensure_cli_env():
pass

# Then
mock_build_log.assert_called_once_with(
endpoint="http://collector:4318/v1/logs",
service_name="flagsmith-task-processor",
)
mock_build_tracer.assert_called_once_with(
endpoint="http://collector:4318/v1/traces",
service_name="flagsmith-task-processor",
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how the "flagsmith-api" case isn't caught by tests coverage.

Please, either:

  1. Fix coverage;
  2. Consider deleting default_service_name, and this test — see here. IMO test_ensure_cli_env__env_service_name__expected_otel_service_name looks sufficient.

task_spans = [s for s in spans if s.name == dummy_task.task_identifier]
assert len(task_spans) == 1

task_span = task_spans[0]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
task_span = task_spans[0]
task_span = task_spans[0]
assert task_span.status.status_code == StatusCode.OK

assert len(task_spans) == 1

task_span = task_spans[0]
assert task_span.attributes is not None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
assert task_span.attributes is not None


task_span = task_spans[0]
assert task_span.status.status_code == StatusCode.ERROR
assert task_span.attributes is not None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
assert task_span.attributes is not None

| --------------------------------- | --------------------------------------------------------------------------------------------------------------------- | --------------- |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | Base OTLP endpoint (e.g. `http://collector:4318`). If unset, no OTel setup occurs. | _(disabled)_ |
| `OTEL_SERVICE_NAME` | The `service.name` resource attribute. | `flagsmith-api` |
| `OTEL_SERVICE_NAME` | The `service.name` resource attribute. Defaults to `flagsmith-task-processor` when running the task processor. | `flagsmith-api` |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OTel: Task Processor

3 participants