1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -3,6 +3,7 @@
## master / unreleased
* [CHANGE] Querier: Make query time range configurations per-tenant: `query_ingesters_within`, `query_store_after`, and `shuffle_sharding_ingesters_lookback_period`. Uses `model.Duration` instead of `time.Duration` to support serialization but has minimum unit of 1ms (nanoseconds/microseconds not supported). #7160
* [CHANGE] Cache: Setting `-blocks-storage.bucket-store.metadata-cache.bucket-index-content-ttl` to 0 will disable the bucket-index cache. #7446
* [FEATURE] Ingester: Add experimental active series tracker that counts active series by configurable label matchers (including regex) per tenant and exposes `cortex_ingester_active_series_per_tracker` metric. Configured via `active_series_trackers` in runtime config overrides. #7476
* [FEATURE] Ruler: Add per-tenant `ruler_alert_generator_url_template` runtime config option to customize alert generator URLs using Go templates. Supports Grafana Explore, Perses, and other UIs. #7302
* [FEATURE] Distributor: Add experimental `-distributor.enable-start-timestamp` flag for Prometheus Remote Write 2.0. When enabled, `StartTimestamp (ST)` is ingested. #7371
* [FEATURE] Memberlist: Add `-memberlist.cluster-label` and `-memberlist.cluster-label-verification-disabled` to prevent accidental cross-cluster gossip joins and support rolling label rollout. #7385
15 changes: 15 additions & 0 deletions docs/configuration/config-file-reference.md
@@ -4153,6 +4153,10 @@ The `limits_config` configures default and per-tenant limits imposed by Cortex s
# [max_series]
[limits_per_label_set: <list of LimitsPerLabelSet> | default = []]

# List of active series tracker configurations. Each tracker counts active
# series matching its matchers and exposes the count as a metric.
[active_series_trackers: <list of ActiveSeriesTrackerConfig> | default = []]

# [EXPERIMENTAL] True to enable native histogram.
# CLI flag: -blocks-storage.tsdb.enable-native-histograms
[enable_native_histograms: <boolean> | default = false]
@@ -6864,6 +6868,17 @@ limits:
[label_set: <map of string (labelName) to string (labelValue)> | default = []]
```

### `ActiveSeriesTrackerConfig`

```yaml
# Name of the tracker, used as a label value in the emitted metric.
[name: <string> | default = ""]

# PromQL series selector (e.g. {__name__=~"api_.*"}). All matchers must match
# for a series to be counted.
[matchers: <string> | default = ""]
```

### `PriorityDef`

```yaml
3 changes: 3 additions & 0 deletions docs/configuration/v1-guarantees.md
@@ -129,3 +129,6 @@ Currently experimental features are:
- `-validation.max-label-cardinality-for-unoptimized-regex` (int) - maximum label cardinality
- `-validation.max-total-label-value-length-for-unoptimized-regex` (int) - maximum total length of all label values in bytes
- HATracker: `-distributor.ha-tracker.enable-startup-sync` (bool) - If enabled, fetches all tracked keys on startup to populate the local cache.
- Ingester: Active Series Tracker
  - Per-tenant `active_series_trackers` configuration in runtime config overrides
  - Counts active series matching PromQL label matchers and exposes `cortex_ingester_active_series_per_tracker` metric
71 changes: 71 additions & 0 deletions docs/proposals/active-series-tracker.md
@@ -0,0 +1,71 @@
# Active Series Tracker

## Problem

AMP needs to monitor active series counts by configurable patterns (e.g., all series with `__name__=~"api_.*"`) for internal observability. The existing `LimitsPerLabelSet` feature is unsuitable because:

1. **No regex matching** — only supports exact `label=value` matching.
2. **Default partition side-effects** — adding labelset buckets reduces the default partition count.
3. **Coupled to limit enforcement** — designed for enforcing series limits, not pure monitoring.

## Requirements

- Track active series counts by configurable label matchers (including regex).
- Expose counts as Prometheus metrics on the ingester (internal only, not vended to customers).
- Configuration supports **per-tenant overrides** with a **default** fallback (same pattern as all other Limits fields).
- **Runtime hot-reloadable** via the existing runtime config file mechanism.
- **No limit enforcement** — purely observational.
- **No default partition** — unmatched series are simply not tracked.
- A series can match multiple tracker entries simultaneously.

## Design

### Configuration

Tracker config lives in the `Limits` struct, following the same per-tenant override pattern as `LimitsPerLabelSet`:

```yaml
# Default trackers (applied to all tenants without overrides)
limits:
  active_series_trackers:
    - name: api_metrics
      matchers: '{__name__=~"api_.*"}'

# Per-tenant overrides via runtime config
overrides:
  tenant-123:
    active_series_trackers:
      - name: api_metrics
        matchers: '{__name__=~"api_.*"}'
      - name: system_metrics
        matchers: '{__name__=~"node_.*|process_.*"}'
```

The `matchers` field uses standard PromQL matcher syntax parsed via `parser.ParseMetricSelector`.

### Runtime Reload

The tracker config is part of `Limits`, which the runtime config manager reloads every `-runtime-config.reload-period` (default 10s). Matchers are parsed and validated during YAML/JSON unmarshalling. An invalid config is rejected, and the previously loaded config stays active.
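
The reject-and-keep behaviour can be illustrated with a minimal sketch. It is not the Cortex implementation: `trackerConfig`, its `pattern` field, and `reload` are hypothetical names, and a plain regular expression on the metric name stands in for a full PromQL selector so that the example needs only the standard library. The point it shows is that compiling the pattern inside the unmarshaller makes an invalid entry fail the whole decode, so the caller can keep the previous value.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// trackerConfig is a hypothetical, simplified stand-in for one tracker entry.
// The pattern is a plain regular expression, not a PromQL selector.
type trackerConfig struct {
	Name    string `json:"name"`
	Pattern string `json:"pattern"`

	re *regexp.Regexp // compiled once, during unmarshalling
}

// UnmarshalJSON decodes the entry and compiles its pattern, so an invalid
// pattern fails the decode instead of surfacing later at match time.
func (c *trackerConfig) UnmarshalJSON(data []byte) error {
	type plain trackerConfig // same fields, no methods: avoids infinite recursion
	var p plain
	if err := json.Unmarshal(data, &p); err != nil {
		return err
	}
	re, err := regexp.Compile("^(?:" + p.Pattern + ")$")
	if err != nil {
		return fmt.Errorf("tracker %q: %w", p.Name, err)
	}
	p.re = re
	*c = trackerConfig(p)
	return nil
}

// reload returns the newly parsed trackers, or the current ones unchanged
// when the new document is invalid.
func reload(current []trackerConfig, doc []byte) ([]trackerConfig, error) {
	var next []trackerConfig
	if err := json.Unmarshal(doc, &next); err != nil {
		return current, err
	}
	return next, nil
}

func main() {
	cfg, err := reload(nil, []byte(`[{"name":"api_metrics","pattern":"api_.*"}]`))
	fmt.Println(len(cfg), err == nil)

	// "api_(" is not a valid regular expression, so the old config is kept.
	cfg, err = reload(cfg, []byte(`[{"name":"broken","pattern":"api_("}]`))
	fmt.Println(len(cfg), cfg[0].Name, err != nil)
}
```

Because the compiled pattern is stored on the parsed value, it is rebuilt only when a new document is decoded, which is one way to get the "recompiled only on config change" property.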

### Metrics

Each ingester emits a new gauge metric:

```
cortex_ingester_active_series_per_tracker{user="<tenant>", name="<tracker_name>"} <count>
```

### Matching Logic

On each active series metrics update tick (default 1min), for each tenant:

1. Read the tenant's tracker config via `i.limits.ActiveSeriesTrackers(userID)`.
2. For each tracker, count the active series whose labels satisfy all matchers.
3. Emit the gauge metric.

A series can match multiple trackers. Tenants without configured trackers emit no tracker metrics.
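
The counting step can be sketched as follows. This is an illustration under stated assumptions, not the ingester code: `matcher`, `tracker`, `newRegexMatcher`, and `countPerTracker` are hypothetical, series are modelled as plain maps, and only anchored regex matchers are supported. As with `lbs.Get` in the real `matchesAll`, a label that is absent from the series is looked up as the empty string.

```go
package main

import (
	"fmt"
	"regexp"
)

// matcher is a simplified, hypothetical stand-in for a PromQL label matcher.
// It supports only the anchored regex form (=~).
type matcher struct {
	name string
	re   *regexp.Regexp
}

func newRegexMatcher(name, pattern string) matcher {
	return matcher{name: name, re: regexp.MustCompile("^(?:" + pattern + ")$")}
}

// tracker pairs a tracker name with the matchers a series must satisfy.
type tracker struct {
	name     string
	matchers []matcher
}

// matchesAll reports whether the series satisfies every matcher. A label
// missing from the series is looked up as the empty string.
func matchesAll(series map[string]string, ms []matcher) bool {
	for _, m := range ms {
		if !m.re.MatchString(series[m.name]) {
			return false
		}
	}
	return true
}

// countPerTracker counts, for each tracker, the active series it matches.
// Trackers are evaluated independently, so one series may be counted by several.
func countPerTracker(active []map[string]string, trackers []tracker) map[string]int {
	counts := make(map[string]int, len(trackers))
	for _, t := range trackers {
		for _, s := range active {
			if matchesAll(s, t.matchers) {
				counts[t.name]++
			}
		}
	}
	return counts
}

func main() {
	active := []map[string]string{
		{"__name__": "api_requests_total", "job": "test"},
		{"__name__": "api_errors_total", "job": "test"},
		{"__name__": "node_cpu_seconds", "job": "test"},
		{"__name__": "process_memory_bytes", "job": "test"},
	}
	trackers := []tracker{
		{name: "api_metrics", matchers: []matcher{newRegexMatcher("__name__", "api_.*")}},
		{name: "node_metrics", matchers: []matcher{newRegexMatcher("__name__", "node_.*")}},
		{name: "job_test", matchers: []matcher{newRegexMatcher("job", "test")}},
	}
	counts := countPerTracker(active, trackers)
	fmt.Println(counts["api_metrics"], counts["node_metrics"], counts["job_test"]) // prints "2 1 4"
}
```

The cost of this loop is proportional to the number of trackers times the number of active series per tenant, which is why it is run on the update tick rather than on every ingested sample.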

### Performance Considerations

- Matching runs once per update period (default 1min), not on every sample ingestion.
- The number of trackers is expected to be small (< 10).
- Compiled matchers are cached in the parsed Limits and only recompiled on config change.
160 changes: 160 additions & 0 deletions integration/active_series_tracker_test.go
@@ -0,0 +1,160 @@
//go:build requires_docker

package integration

import (
	"fmt"
	"path/filepath"
	"testing"
	"time"

	"github.com/prometheus/prometheus/model/labels"
	"github.com/prometheus/prometheus/prompb"
	"github.com/stretchr/testify/require"
	"gopkg.in/yaml.v3"

	"github.com/cortexproject/cortex/integration/e2e"
	e2edb "github.com/cortexproject/cortex/integration/e2e/db"
	"github.com/cortexproject/cortex/integration/e2ecortex"
)

func TestActiveSeriesTrackerPerTenant(t *testing.T) {
	s, err := e2e.NewScenario(networkName)
	require.NoError(t, err)
	defer s.Close()

	// Write runtime config with per-tenant active series trackers.
	runtimeConfig := map[string]interface{}{
		"overrides": map[string]interface{}{
			"user-1": map[string]interface{}{
				"active_series_trackers": []map[string]string{
					{"name": "api_metrics", "matchers": `{__name__=~"api_.*"}`},
					{"name": "node_metrics", "matchers": `{__name__=~"node_.*"}`},
				},
			},
		},
	}
	runtimeCfgYAML, err := yaml.Marshal(runtimeConfig)
	require.NoError(t, err)
	require.NoError(t, writeFileToSharedDir(s, runtimeConfigFile, runtimeCfgYAML))

	flags := BlocksStorageFlags()
	flags["-distributor.shard-by-all-labels"] = "true"
	flags["-ingester.active-series-metrics-enabled"] = "true"
	flags["-ingester.active-series-metrics-update-period"] = "2s"
	flags["-ingester.active-series-metrics-idle-timeout"] = "5m"
	flags["-runtime-config.file"] = filepath.Join(e2e.ContainerSharedDir, runtimeConfigFile)
	flags["-runtime-config.reload-period"] = "1s"
	flags["-alertmanager.web.external-url"] = "http://localhost/alertmanager"
	flags["-alertmanager-storage.backend"] = "local"
	flags["-alertmanager-storage.local.path"] = filepath.Join(e2e.ContainerSharedDir, "alertmanager_configs")

	require.NoError(t, writeFileToSharedDir(s, "alertmanager_configs", []byte{}))

	consul := e2edb.NewConsul()
	minio := e2edb.NewMinio(9000, flags["-blocks-storage.s3.bucket-name"])
	require.NoError(t, s.StartAndWaitReady(consul, minio))

	flags["-ring.store"] = "consul"
	flags["-consul.hostname"] = consul.NetworkHTTPEndpoint()

	cortex := e2ecortex.NewSingleBinary("cortex-1", flags, "")
	require.NoError(t, s.StartAndWaitReady(cortex))

	// Wait until the ring is ready.
	require.NoError(t, cortex.WaitSumMetrics(e2e.Equals(float64(512)), "cortex_ring_tokens_total"))

	c, err := e2ecortex.NewClient(cortex.HTTPEndpoint(), cortex.HTTPEndpoint(), "", "", "user-1")
	require.NoError(t, err)

	now := time.Now()
	for _, name := range []string{"api_requests_total", "api_errors_total", "node_cpu_seconds", "process_memory_bytes"} {
		series, _ := generateSeries(name, now, prompb.Label{Name: "job", Value: "test"})
		res, err := c.Push(series)
		require.NoError(t, err)
		require.Equal(t, 200, res.StatusCode, fmt.Sprintf("push %s failed", name))
	}

	// user-1 has trackers: api_metrics (matches 2), node_metrics (matches 1).
	require.NoError(t, cortex.WaitSumMetricsWithOptions(
		e2e.Equals(2),
		[]string{"cortex_ingester_active_series_per_tracker"},
		e2e.WithLabelMatchers(
			labels.MustNewMatcher(labels.MatchEqual, "user", "user-1"),
			labels.MustNewMatcher(labels.MatchEqual, "name", "api_metrics"),
		),
		e2e.WaitMissingMetrics,
	))

	require.NoError(t, cortex.WaitSumMetricsWithOptions(
		e2e.Equals(1),
		[]string{"cortex_ingester_active_series_per_tracker"},
		e2e.WithLabelMatchers(
			labels.MustNewMatcher(labels.MatchEqual, "user", "user-1"),
			labels.MustNewMatcher(labels.MatchEqual, "name", "node_metrics"),
		),
		e2e.WaitMissingMetrics,
	))

	// user-2 has no trackers configured, so it should have no tracker metrics.
	c2, err := e2ecortex.NewClient(cortex.HTTPEndpoint(), cortex.HTTPEndpoint(), "", "", "user-2")
	require.NoError(t, err)

	series2, _ := generateSeries("api_requests_total", now, prompb.Label{Name: "job", Value: "test"})
	res, err := c2.Push(series2)
	require.NoError(t, err)
	require.Equal(t, 200, res.StatusCode)

	// Wait for user-2 active series to be counted.
	require.NoError(t, cortex.WaitSumMetricsWithOptions(
		e2e.Equals(1),
		[]string{"cortex_ingester_active_series"},
		e2e.WithLabelMatchers(labels.MustNewMatcher(labels.MatchEqual, "user", "user-2")),
		e2e.WaitMissingMetrics,
	))

	// user-2 should have no tracker metrics.
	sum, err := cortex.SumMetrics(
		[]string{"cortex_ingester_active_series_per_tracker"},
		e2e.WithLabelMatchers(labels.MustNewMatcher(labels.MatchEqual, "user", "user-2")),
		e2e.SkipMissingMetrics,
	)
	require.NoError(t, err)
	require.Equal(t, 0.0, sum[0])

	// Now update runtime config: remove node_metrics tracker for user-1.
	runtimeConfig2 := map[string]interface{}{
		"overrides": map[string]interface{}{
			"user-1": map[string]interface{}{
				"active_series_trackers": []map[string]string{
					{"name": "api_metrics", "matchers": `{__name__=~"api_.*"}`},
				},
			},
		},
	}
	runtimeCfgYAML2, err := yaml.Marshal(runtimeConfig2)
	require.NoError(t, err)
	require.NoError(t, writeFileToSharedDir(s, runtimeConfigFile, runtimeCfgYAML2))

	// Wait for the stale node_metrics tracker metric to be removed.
	require.NoError(t, cortex.WaitSumMetricsWithOptions(
		e2e.Equals(0),
		[]string{"cortex_ingester_active_series_per_tracker"},
		e2e.WithLabelMatchers(
			labels.MustNewMatcher(labels.MatchEqual, "user", "user-1"),
			labels.MustNewMatcher(labels.MatchEqual, "name", "node_metrics"),
		),
		e2e.SkipMissingMetrics,
	))

	// api_metrics tracker should still work.
	require.NoError(t, cortex.WaitSumMetricsWithOptions(
		e2e.Equals(2),
		[]string{"cortex_ingester_active_series_per_tracker"},
		e2e.WithLabelMatchers(
			labels.MustNewMatcher(labels.MatchEqual, "user", "user-1"),
			labels.MustNewMatcher(labels.MatchEqual, "name", "api_metrics"),
		),
		e2e.WaitMissingMetrics,
	))
}
10 changes: 10 additions & 0 deletions pkg/ingester/active_series.go
@@ -248,3 +248,13 @@ func (s *activeSeriesStripe) getActiveNativeHistogram() int {

	return s.activeNativeHistogram
}

// matchesAll returns true if the labels satisfy all given matchers.
func matchesAll(lbs labels.Labels, matchers []*labels.Matcher) bool {
	for _, m := range matchers {
		if !m.Matches(lbs.Get(m.Name)) {
			return false
		}
	}
	return true
}