OBSDOCS-3383: Document Vector metrics cardinality and monitoring impact#113290
johnwilkins wants to merge 5 commits into
Addresses the issue where Vector collector metrics exhibit high cardinality in multitenant environments, impacting Prometheus resource consumption and cluster stability. Creates comprehensive guidance covering design, diagnosis, and remediation of high metrics cardinality caused by complex ClusterLogForwarder configurations.

New modules:
- collector-metrics-cardinality-impact.adoc (CONCEPT)
  * Explains what metrics cardinality is
  * How ClusterLogForwarder configuration creates cardinality
  * Which Vector metrics are affected (component_id label)
  * Impact on Prometheus (memory, CPU, storage, queries)
  * When to be concerned about cardinality
- best-practices-multitenant-logging.adoc (REFERENCE)
  * Design patterns to minimize cardinality impact
  * Consolidate inputs using label selectors
  * Consolidate outputs to the same destination
  * Minimize pipeline count
  * Example multitenant architecture (40-60 vs 400-500 components)
  * Cardinality estimation formulas
- troubleshooting-collector-metrics-cardinality.adoc (PROCEDURE)
  * Diagnostic steps using promtool and PromQL
  * Identify problematic ClusterLogForwarder instances
  * Remediation options with examples
  * Verification steps

Content based on:
- Red Hat KB article 7137995 (diagnostic procedures)
- OBSDA-1341 RFE description (problem statement)
- Support case analysis (customer impact)

Added new section to configuring/cluster-logging-collector.adoc: "Collector metrics and monitoring impact"

This documentation addresses 3 jobs to be done (JTBDs):
1. Design multitenant logging with monitoring impact awareness
2. Diagnose whether logging causes Prometheus issues
3. Remediate high cardinality issues

Related Jira: https://redhat.atlassian.net/browse/OBSDOCS-3383
Related RFE: https://redhat.atlassian.net/browse/OBSDA-1341

Signed-off-by: John Wilkins <jowilkin@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
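The troubleshooting module's step of identifying which components drive the series count can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the module's procedure: it assumes you already saved the JSON body returned by the Prometheus HTTP API `/api/v1/series` endpoint (for example, queried with `match[]=vector_component_received_events_count_bucket`) and it simply tallies series per `component_id` label value. The function name `series_per_label` and the sample component IDs (`input_app`, `output_default`) are hypothetical.

```python
import json
from collections import Counter


def series_per_label(series_json: str, label: str = "component_id") -> list[tuple[str, int]]:
    """Count time series per value of `label`, largest first.

    `series_json` is assumed to be the JSON body returned by the Prometheus
    HTTP API endpoint /api/v1/series. A server-side PromQL equivalent is:
        sort_desc(count by (component_id) (vector_component_received_events_count_bucket))
    """
    body = json.loads(series_json)
    if body.get("status") != "success":
        raise ValueError(f"unexpected Prometheus status: {body.get('status')!r}")
    # Each entry in "data" is one series, represented as a dict of its labels.
    counts = Counter(series.get(label, "<missing>") for series in body["data"])
    return counts.most_common()


# Hypothetical sample: two series for one input component, one for one output.
sample = json.dumps({
    "status": "success",
    "data": [
        {"__name__": "vector_component_received_events_count_bucket", "component_id": "input_app", "le": "1"},
        {"__name__": "vector_component_received_events_count_bucket", "component_id": "input_app", "le": "+Inf"},
        {"__name__": "vector_component_received_events_count_bucket", "component_id": "output_default", "le": "1"},
    ],
})
print(series_per_label(sample))  # [('input_app', 2), ('output_default', 1)]
```

Components at the top of the list are the first candidates for the consolidation patterns described in the best-practices module. On a large cluster the PromQL form in the docstring avoids downloading every series.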
@johnwilkins: This pull request references OBSDOCS-3383 which is a valid jira issue.

🤖 Fri Jun 12 23:03:53 - Prow CI generated the docs preview:
Replaces numbered callouts (<1>, <2>, etc.) with DITA-compliant approaches:
- Inline explanatory text after code blocks
- Definition lists for key-value explanations
- Bulleted lists for multi-point explanations

Callouts are not supported in DITA and cause build warnings.

Changes:
- best-practices-multitenant-logging.adoc
  * Removed 10 callouts from YAML examples
  * Replaced with inline explanations and bulleted lists
- troubleshooting-collector-metrics-cardinality.adoc
  * Removed 4 callouts from terminal output examples
  * Replaced with inline explanatory sentences

Vale validation: 0 errors, 6 warnings (block titles only)

Signed-off-by: John Wilkins <jowilkin@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
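To confirm that no callouts remain after a change like this, a small scan can flag lines that look like AsciiDoc callouts: a `<1>`-style marker (or the auto-numbered `<.>` form) at the end of a code line, or at the start of a callout list item. This is a hypothetical helper (`find_callouts` is not part of any existing docs tooling) and only a heuristic: it does not track whether a line sits inside a listing block, so it can report false positives.

```python
import re

# A callout marker at the end of a code line, for example "inputs: [] # <1>",
# or the auto-numbered form "<.>".
END_OF_LINE_MARKER = re.compile(r"<(\d+|\.)>\s*$")
# A callout list item after a code block, for example "<1> Defines the inputs."
LIST_ITEM_MARKER = re.compile(r"^<(\d+|\.)>\s")


def find_callouts(adoc_text: str) -> list[int]:
    """Return the 1-based line numbers that look like AsciiDoc callouts."""
    hits = []
    for lineno, line in enumerate(adoc_text.splitlines(), start=1):
        if END_OF_LINE_MARKER.search(line) or LIST_ITEM_MARKER.match(line):
            hits.append(lineno)
    return hits


sample = "----\nspec:\n  inputs: [] # <1>\n----\n<1> Defines the inputs.\n"
print(find_callouts(sample))  # [3, 5]
```

Running it over each changed module and expecting an empty list is one way to check the result; the definition lists and inline sentences that replace the callouts do not match either pattern.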
Removed level 2 subheadings and inappropriate block titles from:
- modules/collector-metrics-cardinality-impact.adoc
- modules/best-practices-multitenant-logging.adoc

Changes:
- Converted level 2 subheadings (==) to inline content within the main module text
- Removed inappropriate block titles (.Anti-pattern, .Recommended, etc.)
- Replaced block titles with inline introductory text
- Wrapped PascalCase terms (LokiStack, LogQL) in backticks

DITA compliance verified with vale-check-assembly: 0 errors, 0 warnings on these modules.

Signed-off-by: John Wilkins <jowilkin@redhat.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…es module

Removed standalone '+' continuation markers that were causing visible plus signs in HTML output. These markers were left over from removed block titles and are not needed for standalone paragraphs after code blocks.

Signed-off-by: John Wilkins <jowilkin@redhat.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixed RedHat.Using warnings:
- modules/collector-metrics-cardinality-impact.adoc:101
- modules/troubleshooting-collector-metrics-cardinality.adoc:153

All three modules now have 0 errors, 0 warnings.

Signed-off-by: John Wilkins <jowilkin@redhat.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@johnwilkins: all tests passed!
Summary
Documents the issue where Vector collector metrics exhibit high cardinality in multitenant environments, impacting Prometheus resource consumption and cluster stability.
Provides comprehensive guidance covering design, diagnosis, and remediation of high metrics cardinality caused by complex ClusterLogForwarder configurations.
Jira
https://redhat.atlassian.net/browse/OBSDOCS-3383
Problem Addressed
In multitenant OpenShift clusters with many ClusterLogForwarder inputs, outputs, and pipelines, Vector collector metrics create high cardinality in Prometheus:
- vector_component_received_events_count_bucket: 700K+ time series

No existing documentation covered this issue.
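A rough model shows why the series count grows with ClusterLogForwarder complexity. A classic Prometheus histogram exposes one `_bucket` series per `le` boundary (including `+Inf`), plus one `_sum` and one `_count` series, for every label combination, and each collector pod is scraped as a separate target. The sketch below multiplies these factors. The bucket count (20) and collector pod count (70) are placeholder assumptions, not values taken from Vector or from the new modules, and `estimate_histogram_series` is a hypothetical helper; any additional label that varies per series would multiply the result further.

```python
def estimate_histogram_series(components: int, buckets: int, collector_pods: int) -> tuple[int, int]:
    """Return (bucket_series, total_series) for one histogram metric.

    Assumptions (hypothetical, for illustration only):
    - every component emits the histogram on every collector pod;
    - `buckets` counts every `le` boundary, including +Inf;
    - no other label multiplies the series count.
    """
    if min(components, buckets, collector_pods) < 0:
        raise ValueError("all inputs must be non-negative")
    bucket_series = components * buckets * collector_pods
    # Each (component, pod) pair also exposes one _sum and one _count series.
    total_series = bucket_series + components * 2 * collector_pods
    return bucket_series, total_series


# Placeholder values: 20 buckets and 70 collector pods are assumptions.
for components in (50, 450):
    buckets_only, total = estimate_histogram_series(components, buckets=20, collector_pods=70)
    print(f"{components:>3} components: {buckets_only:,} _bucket series, {total:,} total")
```

With these placeholder values, 50 components (the middle of the 40-60 range in the example architecture) give 70,000 `_bucket` series and 450 components (the middle of 400-500) give 630,000. The estimate is linear in the component count, which is why consolidating inputs, outputs, and pipelines reduces the series count proportionally.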
Content Created
New Modules (3)
modules/collector-metrics-cardinality-impact.adoc (CONCEPT - 143 lines)
modules/best-practices-multitenant-logging.adoc (REFERENCE - 312 lines)
modules/troubleshooting-collector-metrics-cardinality.adoc (PROCEDURE - 204 lines)
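Assembly Updated
- configuring/cluster-logging-collector.adoc: added the "Collector metrics and monitoring impact" section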
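Content Sources
- Red Hat KB article 7137995 (diagnostic procedures)
- OBSDA-1341 RFE description (problem statement)
- Support case analysis (customer impact)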
Validation
- Vale: 0 errors, 0 warnings on all three new modules
- Prow CI: all tests passed
Signed-off-by: John Wilkins <jowilkin@redhat.com>