Refactor SLO workload to use environment variables by polRk · Pull Request #802 · ydb-platform/ydb-python-sdk

polRk · 2026-04-02T10:59:53Z

Migrate from CLI subcommands (table-run, topic-run) to environment-based configuration (WORKLOAD_NAME, YDB_ENDPOINT, etc.)
Simplify argument parsing: remove subcommand structure, add env var injection
Update metrics collection to always be enabled (remove otlp_endpoint checks)
Replace quantile-estimator with hdrhistogram
Update workflow matrix from include to sdk naming
Upgrade ydb-slo-action from commit hash to v2 tag
Add pyrightconfig.json for type checking
Update Dockerfile comments to reflect new env-var-based interface

Pull request type

Please check the type of change your PR introduces:

What is the current behavior?

Issue Number: N/A

What is the new behavior?

Other information

- Migrate from CLI subcommands (table-run, topic-run) to environment-based configuration (WORKLOAD_NAME, YDB_ENDPOINT, etc.) - Simplify argument parsing: remove subcommand structure, add env var injection - Update metrics collection to always be enabled (remove otlp_endpoint checks) - Replace quantile-estimator with hdrhistogram - Update workflow matrix from include to sdk naming - Upgrade ydb-slo-action from commit hash to v2 tag - Add pyrightconfig.json for type checking - Update Dockerfile comments to reflect new env-var-based interface

Copilot

Pull request overview

This PR refactors the Python SLO workload runner under tests/slo/ to use environment-variable-driven configuration (workload selection, YDB connection, duration, OTLP exporter settings) instead of CLI subcommands, and updates metrics and CI workflows accordingly.

Changes:

Replace subcommand-based orchestration with a single “run all” flow selected via WORKLOAD_NAME / env vars.
Rework metrics initialization to use standard OpenTelemetry env vars and switch latency quantiles to HDR histogram-based calculation.
Update SLO GitHub workflows and Docker image interface to match the new env-var-based runner.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
tests/slo/src/runners/topic_runner.py	Update metrics factory call to new signature.
tests/slo/src/runners/table_runner.py	Update metrics factory call to new signature.
tests/slo/src/root_runner.py	Replace subcommand runner with `run_all()` flow selected via `WORKLOAD_NAME`.
tests/slo/src/pyrightconfig.json	Add Pyright configuration for the SLO runner code.
tests/slo/src/options.py	Simplify CLI flags; inject connection/duration from environment variables.
tests/slo/src/jobs/table_jobs.py	Switch to `WORKLOAD_NAME` constant.
tests/slo/src/jobs/base.py	Always start metrics sender job (no CLI OTLP gating).
tests/slo/src/jobs/async_topic_jobs.py	Always start async metrics sender task (no CLI OTLP gating).
tests/slo/src/core/metrics.py	Move to standard OTel env vars and implement HDR-based latency quantiles.
tests/slo/src/main.py	Switch entrypoint to call `run_all()`.
tests/slo/requirements.txt	Replace `quantile-estimator` with `hdrhistogram`.
tests/slo/Dockerfile	Update documentation/comments and provide a default CMD for tuning flags.
pyproject.toml	Configure `tool.ty` extra paths for SLO sources.
.github/workflows/slo.yml	Refactor workload matrix and switch to `ydb-slo-action` v2 init/report.
.github/workflows/slo-report.yml	Switch report action reference to v2 and remove label-removal job.

Comments suppressed due to low confidence (1)

.github/workflows/slo.yml:53

concurrency.group references matrix.workload, but the matrix is now defined under matrix.sdk (with name/command). This will make the workflow fail to evaluate at runtime. Update the concurrency group expression to use the new matrix key (e.g., matrix.sdk.name).

    concurrency:
      group: slo-${{ github.ref }}-${{ matrix.workload }}
      cancel-in-progress: true

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-02T11:06:23Z

tests/slo/src/core/metrics.py

-        # Metrics job calls push() with the cadence of --report-period.
-        # force_flush() makes the exporter send immediately.
+        with self._hdr_lock:
+            snapshot = {k: v for k, v in self._hdr.items()}


In push(), the code copies the _hdr dict under the lock, but then reads percentiles from the same HdrHistogram instances after releasing the lock. Since stop() records into these histograms under the same lock, this creates a data race (HdrHistogram isn't thread-safe) and can lead to incorrect values or runtime errors. Keep the lock held while computing percentiles, or clone/snapshot each histogram under the lock before reading it.

Suggested change

snapshot = {k: v for k, v in self._hdr.items()}

# Clone histograms under the lock to avoid concurrent mutation while reading.

snapshot = {k: v.copy() for k, v in self._hdr.items()}

Copilot · 2026-04-02T11:06:24Z

tests/slo/src/root_runner.py

 import asyncio
-import ydb
-import ydb.aio
 import logging
-from typing import Dict

-from runners.topic_runner import TopicRunner
+from core.metrics import WORKLOAD_NAME
 from runners.table_runner import TableRunner
-from runners.base import BaseRunner
-
-logger = logging.getLogger(__name__)
-
-
-class SLORunner:
-    def __init__(self):
-        self.runners: Dict[str, type(BaseRunner)] = {}
-
-    def register_runner(self, prefix: str, runner_cls: type(BaseRunner)):
-        self.runners[prefix] = runner_cls
+from runners.topic_runner import TopicRunner

-    def run_command(self, args):
-        subcommand_parts = args.subcommand.split("-", 1)
-        if len(subcommand_parts) < 2:
-            raise ValueError(f"Invalid subcommand format: {args.subcommand}. Expected 'prefix-command'")
+import ydb
+import ydb.aio



asyncio and ydb.aio are imported but not used in this module. Flake8 will flag these as unused imports (F401). Remove them, or use the async driver path if it’s still intended here.

tests/slo/requirements.txt

Copilot · 2026-04-02T11:06:24Z

tests/slo/src/jobs/async_topic_jobs.py

    def _run_metric_job(self):
-        # Metrics are enabled only if an OTLP endpoint is provided (CLI: --otlp-endpoint).
-        if not getattr(self.args, "otlp_endpoint", None):
-            return []
-
        task = asyncio.create_task(
            self._async_metric_sender(self.args.time),
            name="slo_metrics_sender",


Now that _run_metric_job() always starts the metrics sender task (no OTLP-endpoint gating), the push loop will run on every topic workload run. In _async_metric_sender() the limiter math appears to treat report_period as if it were in microseconds (10**6 // report_period), even though the CLI help documents it as milliseconds, which can drive the loop far more frequently than intended (default 1000ms → ~1000 pushes/sec). Consider fixing the limiter to match the sync path (~1 push per report_period).

Add Ruff configuration to match Black's line length, then reformat code accordingly. Remove unnecessary blank line in __main__.py.

Replace matrix.workload with matrix.sdk.name to properly identify concurrent jobs by SDK variant.

Add type hint to _run_metric_job return type. Move environment variable fallback logic from post-parse injection into argument parser definition using nargs="?" and default values. Remove unused asyncio import. Pin hdrhistogram and improve argument parsing - Pin hdrhistogram to v0.10.3 for reproducibility - Move YDB and workload config to proper argparse arguments with environment variable fallbacks instead of post-parsing assignment - Add type hint to _run_metric_job return type - Remove unused asyncio import

github-actions · 2026-04-02T12:35:03Z

🌋 SLO Test Results

🟢 2 workload(s) tested — All thresholds passed

Commit: 2ce104b · View run

Workload	Thresholds	Duration	Report
sync-table	🟢 OK	10m 2s	📄 Report
sync-query	🟢 OK	10m 9s	📄 Report

Generated by ydb-slo-action

Copilot AI review requested due to automatic review settings April 2, 2026 10:59

Copilot started reviewing on behalf of polRk April 2, 2026 11:01 View session

polRk added the SLO label Apr 2, 2026

Copilot AI reviewed Apr 2, 2026

View reviewed changes

Configure Ruff and reformat code to 120 chars

5207479

Add Ruff configuration to match Black's line length, then reformat code accordingly. Remove unnecessary blank line in __main__.py.

github-actions bot removed the SLO label Apr 2, 2026

polRk added 2 commits April 2, 2026 14:14

Use SDK name in SLO workflow concurrency group

baacda3

Replace matrix.workload with matrix.sdk.name to properly identify concurrent jobs by SDK variant.

polRk added the SLO label Apr 2, 2026

github-actions bot removed the SLO label Apr 2, 2026

Update SLO workflow baseline image to use current version.

27bdeab

polRk added the SLO label Apr 2, 2026

github-actions bot removed the SLO label Apr 2, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor SLO workload to use environment variables#802

Refactor SLO workload to use environment variables#802
polRk wants to merge 5 commits intomainfrom
slo-v2

polRk commented Apr 2, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 2, 2026

Uh oh!

Copilot AI Apr 2, 2026

Uh oh!

Uh oh!

Copilot AI Apr 2, 2026

Uh oh!

github-actions bot commented Apr 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	snapshot = {k: v for k, v in self._hdr.items()}
	# Clone histograms under the lock to avoid concurrent mutation while reading.
	snapshot = {k: v.copy() for k, v in self._hdr.items()}

Conversation

polRk commented Apr 2, 2026

Pull request type

What is the current behavior?

What is the new behavior?

Other information

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Apr 2, 2026

🌋 SLO Test Results

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants