diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure API Management.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure API Management.mdx index eb3728a96..b95ff91fd 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure API Management.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure API Management.mdx @@ -26,40 +26,79 @@ Module: azure_monitor ## Overview -Monitor API Management gateway performance including request throughput, response status codes, gateway and backend response times, failed request counts, capacity utilization, event hub events, websocket message counts, and network connection status. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. -The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure API Management with metrics covering: + +- **Requests** -- gateway request rate +- **Latency** -- request duration (overall and backend response time) +- **Compute** -- gateway CPU and memory utilization +- **Capacity** -- capacity utilization percentage +- **Events** -- EventHub events (successful/failed/dropped/rejected/throttled/timed out), EventHub bytes sent +- **WebSockets** -- WebSocket connection attempts, WebSocket messages +- **Network** -- network connectivity status + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. 
Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. +The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). 
+ +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +122,36 @@ UI configuration requires paid Netdata Cloud plan. 
#### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. **Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +160,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. 
+ +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +182,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
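As an illustration of the `combined` mode described above, a job might look like the following minimal sketch. It assumes the option layout from the options table (`profiles.mode_combined.names`); the job name is hypothetical, and `sql_database` is assumed to be a valid stock profile basename.

```yaml
jobs:
  - name: combined-example            # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    profiles:
      mode: combined                  # auto-discovered profiles plus the names below
      mode_combined:
        names:
          - sql_database              # assumed stock profile basename (no .yaml suffix)
    auth:
      mode: default
```

Because matching is case-insensitive, `SQL_Database` should resolve to the same profile, although lower-case basenames match the stock filenames more obviously.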
@@ -188,14 +320,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +350,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config
-
-```yaml
-jobs:
-  - name: prod
-    subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
-    auth:
-      mode: managed_identity
-
-```
-
+###### Managed identity with exact profiles
-###### Specific profiles only
-
-Monitor only specific Azure services instead of auto-discovering all resource types.
+Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +360,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +403,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +411,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -331,6 +467,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -419,31 +556,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
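A minimal sketch of raising the offset, assuming `query_offset` is set at the job level as listed in the options table. The job name and the value `240` are illustrative only.

```yaml
jobs:
  - name: prod                        # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    query_offset: 240                 # seconds; the default is 180
    auth:
      mode: default
```

If gaps persist at 240 seconds, 300 is the next value the guidance above suggests. Batches with a longer time grain already get a larger effective offset automatically.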
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure App Service.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure App Service.mdx index e822837cb..37b857b98 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure App Service.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure App Service.mdx @@ -26,40 +26,81 @@ Module: azure_monitor ## Overview -Monitor App Service web applications including HTTP request rates and response status codes, response times, CPU and memory usage, network throughput, file IO operations, .NET runtime statistics (threads, GC, assemblies), Azure Functions execution counts and units, and Flex Consumption plan metrics. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. 
-The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure App Service with metrics covering: + +- **Requests** -- HTTP request rate, response status codes (2xx/3xx/4xx/5xx), error detail (401/403/404/406) +- **Performance** -- response time, request queue depth +- **Compute** -- CPU utilization, CPU time consumed +- **Memory** -- memory usage (average working set, working set, private bytes) +- **Network** -- network traffic (received/sent), I/O throughput (read/write/other) +- **I/O** -- I/O operations (read/write/other), file handles +- **.NET runtime** -- threads, GC collections (gen0/gen1/gen2), loaded assemblies, app domains +- **Functions** -- function executions and execution units (MB-ms), always-ready and on-demand units +- **Health** -- health check status + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. 
+The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. 
The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +124,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. 
**Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +162,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. + +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +184,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
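
The examples in this document cover the `auto` and `exact` modes. As a minimal sketch of `combined`, the job below keeps auto-discovery on and also lists one profile by basename. The job name and subscription ID are placeholders, and `sql_database` is borrowed from the exact-profiles example, not from a verified list of stock profiles.

```yaml
jobs:
  - name: mixed                      # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    profiles:
      mode: combined                 # auto-discovered profiles plus the names below
      mode_combined:
        names:
          - SQL_Database             # basename matching is case-insensitive
    auth:
      mode: managed_identity
```

Per the mode table above, the listed basename is merged with whatever auto-discovery finds, so it stays selected even if no matching resource is discovered at bootstrap.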
@@ -188,14 +322,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +352,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config
-
-```yaml
-jobs:
-  - name: prod
-    subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
-    auth:
-      mode: managed_identity
-
-```
-
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config

@@ -231,35 +362,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty
 ```yaml
 jobs:
   - name: databases
-    subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
+    subscription_ids:
+      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
     profiles:
-      - sql_database
-      - postgres_flexible
-      - redis_cache
+      mode: exact
+      mode_exact:
+        names:
+          - sql_database
+          - postgres_flexible
     auth:
-      mode: service_principal
-      mode_service_principal:
-        tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
-        client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
-        client_secret: "your-client-secret"
+      mode: managed_identity

 ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +405,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +413,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -328,6 +466,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -432,31 +571,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
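
As a sketch of that last step, raising the offset is a single job-level setting. The job name, subscription ID, and the choice of `auth.mode: default` are placeholders; only `query_offset` matters here.

```yaml
jobs:
  - name: prod                       # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    query_offset: 300                # seconds; default is 180, acts as a minimum floor
    auth:
      mode: default
```

A larger offset means each data point is queried later relative to wall-clock time, so charts lag further behind in exchange for fewer incomplete points.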
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Application Gateway.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Application Gateway.mdx index 163e61827..261cf82a3 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Application Gateway.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Application Gateway.mdx @@ -26,40 +26,79 @@ Module: azure_monitor ## Overview -Monitor Application Gateway performance including throughput and traffic volume, request rates and response status codes, backend health and latency breakdown (connect, first byte, last byte), client latency, current and new connections, WebSocket sessions, capacity and compute units, CPU utilization, TLS connections, and WAF security events including rule matches, challenges, and penalty box activity. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. 
-The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure Application Gateway with metrics covering: + +- **Traffic** -- throughput, traffic volume (received/sent), request rates (total/failed) +- **Response** -- gateway and backend response status codes +- **Backend** -- backend health (healthy/unhealthy hosts), backend latency (connect, first byte, last byte) +- **Client** -- client latency (total time, client RTT) +- **Connections** -- current and new connections, TLS connections, WebSocket connections +- **Capacity** -- capacity units, compute units, billed/fixed billed, CPU utilization +- **WAF** -- WAF requests (total/blocked/matched), rule matches (managed/custom/bot), challenges, penalty box + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. 
+The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. 
The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +122,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. 
**Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +160,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. + +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +182,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
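
The `limits.*` options in the table above can be set per job. As a minimal sketch of the advice to split large deployments and lower concurrency when rate-limited, the configuration below uses one job per subscription. The job names and the values `2` and `25` are illustrative assumptions, not recommended settings.

```yaml
jobs:
  - name: sub-a                      # hypothetical job name
    subscription_ids:
      - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
    limits:
      max_concurrency: 2             # default 4; lower if Azure throttles requests
      max_batch_resources: 25        # default 50
    auth:
      mode: default

  - name: sub-b                      # hypothetical job name
    subscription_ids:
      - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb"
    limits:
      max_concurrency: 2
    auth:
      mode: default
```

Options left out of a job, such as `limits.max_metrics_per_query`, keep the defaults listed in the table.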
@@ -188,14 +320,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +350,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config
-
-```yaml
-jobs:
-  - name: prod
-    subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
-    auth:
-      mode: managed_identity
-
-```
-
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config

@@ -231,35 +360,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty
 ```yaml
 jobs:
   - name: databases
-    subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
+    subscription_ids:
+      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
     profiles:
-      - sql_database
-      - postgres_flexible
-      - redis_cache
+      mode: exact
+      mode_exact:
+        names:
+          - sql_database
+          - postgres_flexible
     auth:
-      mode: service_principal
-      mode_service_principal:
-        tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
-        client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
-        client_secret: "your-client-secret"
+      mode: managed_identity

 ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +403,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +411,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -327,6 +463,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -425,31 +562,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
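
A minimal sketch of raising the offset, assuming `query_offset` is set at the job level as listed in the options table. The job name, subscription ID placeholder, and auth mode are illustrative only:

```yaml
jobs:
  - name: prod                 # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    # Default is 180 seconds; raise it if charts still show gaps.
    query_offset: 240
    auth:
      mode: managed_identity   # any supported auth mode works here
```

A larger offset trades freshness for completeness: data points arrive later but are less likely to be fetched before Azure Monitor has finished reporting them.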
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Application Insights.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Application Insights.mdx index 82bbd7280..2ba41eabd 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Application Insights.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Application Insights.mdx @@ -26,40 +26,80 @@ Module: azure_monitor ## Overview -Monitor application performance through Application Insights including availability test results and duration, server request rates and response times, dependency call tracking and failures, exception rates by source, browser page load timing breakdown, process CPU and memory usage, IO rates, HTTP request queue depth, page views, and trace volume. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. 
-The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure Application Insights with metrics covering: + +- **Availability** -- availability test percentage, test duration +- **Requests** -- server request rate, HTTP request rate, HTTP request execution time, request queue depth +- **Responses** -- server response time, server requests (total/failed) +- **Dependencies** -- dependency calls (total/failed), dependency duration +- **Exceptions** -- exception rate, exceptions by source (total/browser/server) +- **Browser** -- page load time, browser timing breakdown (network/send/receive/processing), page views +- **Process** -- CPU utilization (process/processor), memory (available/private), I/O rate +- **Traces** -- trace volume + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. 
+The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. 
The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +123,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. 
**Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +161,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. + +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +183,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
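
For illustration, a job using `combined` mode might look like the sketch below. It is built only from the option names documented above; the job name and the choice of `sql_database` as the explicitly listed basename are assumptions:

```yaml
jobs:
  - name: combined-example     # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    profiles:
      mode: combined
      mode_combined:
        names:
          # Merged with whatever profiles auto-discovery enables.
          - sql_database
    auth:
      mode: default
```

Because runtime refresh does not activate new profiles, listing a basename this way may help when a resource type is expected to appear only after bootstrap.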
@@ -188,14 +321,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +351,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +361,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +404,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +412,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -334,6 +471,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -433,31 +571,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Cache for Redis.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Cache for Redis.mdx index 7e0ef2747..6f59a5062 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Cache for Redis.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Cache for Redis.mdx @@ -26,40 +26,81 @@ Module: azure_monitor ## Overview -Monitor Azure Cache for Redis including cache hit and miss rates, read and write throughput, server load and CPU utilization, memory usage, connected clients, operations per second, command processing rates, latency percentiles, key eviction and expiration, and geo-replication health and sync status. Provides per-shard breakdowns for clustered deployments. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. 
-The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure Cache for Redis with metrics covering: + +- **Performance** -- operations/second, command processing rates (get/set), cache hit/miss rates +- **Latency** -- average latency, P99 latency +- **Compute** -- CPU utilization, server load +- **Memory** -- memory usage (used/RSS), memory utilization +- **Connections** -- connected clients, connection rate (created/closed) +- **Keys** -- total keys, evicted keys, expired keys, miss rate +- **Throughput** -- read/write bytes per second +- **Geo-replication** -- replication health, connectivity lag, sync events, data sync offset +- **Per-shard** -- instance-level breakdowns for hit rate, clients, commands, server load, keys, operations, throughput + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. 
+The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. 
The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +124,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. 
**Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +162,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. + +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
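The override rule for profile files can be sketched as a merge keyed on the profile `id`. This is only an illustration under stated assumptions: `merge_profiles` and the `source` field are hypothetical names, not part of the collector, and the sketch assumes `id` values are compared exactly as written.

```python
def merge_profiles(stock, user):
    """Build a catalog keyed by profile id.

    Hypothetical helper: a user profile replaces the stock profile
    that has the same id; user profiles with new ids are added.
    """
    # Start from the stock catalog.
    catalog = {profile["id"]: profile for profile in stock}
    # User profiles win on id collisions and extend the catalog otherwise.
    for profile in user:
        catalog[profile["id"]] = profile
    return catalog
```

In this sketch the result always contains every stock id, which mirrors the statement that custom profiles extend the catalog rather than replace it.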
@@ -128,25 +184,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
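The three `profiles.mode` values and the case-insensitive basename rule can be pictured with a short sketch. It is a rough model, not the collector's code: `profile_basename` and `select_profiles` are hypothetical names, and the sketch assumes that listed names are accepted without checking that a matching profile file exists.

```python
from pathlib import PurePosixPath


def profile_basename(filename):
    """Drop a .yaml/.yml suffix and lowercase the name for comparison."""
    path = PurePosixPath(filename)
    stem = path.stem if path.suffix.lower() in {".yaml", ".yml"} else path.name
    return stem.lower()


def select_profiles(mode, discovered, names=()):
    """Return the sorted profile basenames to activate.

    discovered: profile names matched from discovered resource types.
    names: entries from profiles.mode_exact.names or
           profiles.mode_combined.names.
    """
    auto = {profile_basename(p) for p in discovered}
    listed = {profile_basename(n) for n in names}
    if mode == "auto":
        return sorted(auto)
    if mode == "exact":
        return sorted(listed)  # discovery results are ignored
    if mode == "combined":
        return sorted(auto | listed)
    raise ValueError(f"unknown profiles.mode: {mode!r}")
```

Under these assumptions, listing `SQL_Database` or `sql_database.yaml` selects the same profile, and `combined` never yields fewer profiles than `auto` alone.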
@@ -188,14 +322,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +352,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +362,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +405,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +413,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -331,6 +469,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -441,31 +580,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Cognitive Services.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Cognitive Services.mdx index bdea0c976..7980c2e72 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Cognitive Services.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Cognitive Services.mdx @@ -26,40 +26,84 @@ Module: azure_monitor ## Overview -Monitor Azure AI and Cognitive Services including API call volume, success and client error rates, response latency, token processing rates for language models, content safety filtering, fine-tuning operations, provisioned throughput utilization, rate-limiting events, active inference connections, and context token cache performance. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. 
-The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure AI and Cognitive Services with metrics covering: + +- **API calls** -- total calls (successful/blocked/token), model requests, OpenAI requests +- **Errors** -- total, client, and server errors, rate limiting events +- **Latency** -- service latency, model latency (time to response/first token/between tokens/last byte) +- **Tokens** -- model token usage (input/output/total), OpenAI token usage (prompt/generated), cache tokens (read/write) +- **Availability** -- service availability, model availability, OpenAI availability +- **Content safety** -- content moderation calls (text/image), safety system events, harmful/blocked requests +- **Speech** -- transcription, translation, synthesis, speaker recognition, voice training/hosting +- **Vision** -- computer vision and custom vision transactions, images stored, training time +- **Translator** -- text and document translation (standard/custom) +- **Provisioned** -- model provisioned utilization, OpenAI provisioned-managed utilization +- **Fine-tuning** -- training hours +- **Personalizer** -- events, rewards, actions, feature cardinality + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). 
This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. +The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). 
Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +127,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. 
**Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +165,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. + +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +187,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
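
As a minimal sketch of the `combined` mode described above, the job below keeps auto-discovery enabled and additionally forces one profile by basename. The job name and the `redis_cache` basename are illustrative assumptions; list the stock profile directory to see which basenames exist in your installation.

```yaml
jobs:
  - name: combined-example        # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    profiles:
      mode: combined
      mode_combined:
        names:
          - redis_cache           # assumed basename, merged with auto-discovered profiles
    auth:
      mode: default
```

Because basename matching is case-insensitive, `Redis_Cache` would select the same profile file.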
@@ -188,14 +325,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +355,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config
-
-```yaml
-jobs:
-  - name: prod
-    subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
-    auth:
-      mode: managed_identity
-
-```
-
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +365,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +408,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +416,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -337,6 +478,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -488,31 +630,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
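
The `query_offset` adjustment above might look like the following in `go.d/azure_monitor.conf`. This is a sketch under assumptions: the job name is hypothetical, and `query_offset` is placed at the job's top level because the options table lists it without a prefix.

```yaml
jobs:
  - name: prod                    # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    query_offset: 240             # seconds; the default is 180
    auth:
      mode: default
```

If gaps persist at 240 seconds, 300 is the next value the guidance above suggests.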
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Container Apps.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Container Apps.mdx index 1cea3aa24..7774bd374 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Container Apps.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Container Apps.mdx @@ -26,40 +26,79 @@ Module: azure_monitor ## Overview -Monitor Container Apps including CPU and memory usage, network traffic, replica counts, request processing rates, response times, restart frequency, and resource reservation utilization. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. 
-The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure Container Apps with metrics covering: + +- **Compute** -- CPU usage (nanocores and percentage), GPU utilization +- **Memory** -- memory working set, memory percentage, JVM memory (total/pool/buffer) +- **Requests** -- request rate, response time +- **Network** -- network traffic (received/sent), resiliency pending connections and timeouts +- **Replicas** -- replica count, restart count, reserved cores +- **JVM** -- thread count, GC collections and duration, buffer count +- **Resiliency** -- host ejections, request retries + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. 
+The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. 
The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +122,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. 
**Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +160,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. + +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +182,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
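
The options above note that `auth.mode_managed_identity.client_id` selects a user-assigned identity and can be left empty for a system-assigned one. A minimal sketch of the user-assigned case follows; the job name is hypothetical, and the nesting is assumed to mirror the `mode_service_principal` block used in the examples.

```yaml
jobs:
  - name: user-assigned-identity  # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    auth:
      mode: managed_identity
      mode_managed_identity:
        client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"  # client ID of the user-assigned identity
```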
@@ -188,14 +320,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +350,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config
-
-```yaml
-jobs:
-  - name: prod
-    subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
-    auth:
-      mode: managed_identity
-
-```
-
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +360,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +403,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +411,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -331,6 +467,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -431,31 +568,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Container Instances.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Container Instances.mdx index a690301a5..3d8945906 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Container Instances.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Container Instances.mdx @@ -26,40 +26,75 @@ Module: azure_monitor ## Overview -Monitor Container Instance groups including CPU and memory usage and network bytes transferred in and out. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. -The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure Container Instances with metrics covering: + +- **Compute** -- CPU usage (average/max) +- **Memory** -- memory usage (average/max) +- **Network** -- network traffic (received/sent) + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). 
This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. +The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). 
Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +118,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. 
**Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +156,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. + +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +178,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
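The `combined` mode described in the table above has no example of its own in the configuration samples. A minimal sketch follows, assuming the same job layout as the other examples; the job name is hypothetical, and `sql_database` is simply a profile basename borrowed from the `exact` example. It would keep auto-discovery on while always enabling one explicitly named profile.

```yaml
jobs:
  - name: combined-example              # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    profiles:
      mode: combined                    # auto-discovered profiles plus the names below
      mode_combined:
        names:
          - sql_database                # basename only, without the .yaml / .yml suffix
    auth:
      mode: default
```

Because basename matching is case-insensitive, `SQL_Database` should resolve to the same profile.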
@@ -188,14 +316,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +346,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +356,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
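###### User-assigned managed identity

The options table lists `auth.mode_managed_identity.client_id` for user-assigned identities, but the example above relies on the system-assigned identity. The sketch below assumes the option nests under `auth` in the same way `mode_service_principal` does in the service principal example; the job name and the client ID are placeholders.

```yaml
jobs:
  - name: databases-user-assigned       # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    auth:
      mode: managed_identity
      mode_managed_identity:
        # Client ID of the user-assigned identity (placeholder value).
        # Leave this out to use the system-assigned identity.
        client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
```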
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +399,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +407,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -324,6 +456,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -405,31 +538,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
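As an illustration of the last step, the sketch below raises `query_offset` to 240 seconds for a single job. It assumes `query_offset` sits at the job level, as the options table suggests; the job name and subscription ID are placeholders.

```yaml
jobs:
  - name: prod                          # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    query_offset: 240                   # seconds; the default is 180
    auth:
      mode: default
```

A larger offset trades freshness for completeness: charts lag further behind real time, but each query window is less likely to include data points Azure Monitor has not finished reporting.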
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Container Registry.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Container Registry.mdx index 575af2bfd..1ba7bcc3e 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Container Registry.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Container Registry.mdx @@ -26,40 +26,75 @@ Module: azure_monitor ## Overview -Monitor Container Registry including storage usage, successful and failed pull and push operation counts, and task run duration. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. -The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure Container Registry with metrics covering: + +- **Operations** -- image pulls (successful/total), image pushes (successful/total) +- **Storage** -- storage used +- **Tasks** -- task run duration, agent pool CPU time + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. 
Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. +The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. 
#### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +118,36 @@ UI configuration requires paid Netdata Cloud plan. 
#### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. **Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +156,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. 
+ +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +178,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
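A minimal sketch of `combined` mode, assuming the nested layout mirrors the `mode_exact.names` form used in the examples. The job name and subscription ID are placeholders, and `sql_database` stands in for whichever profile basename you want loaded regardless of what auto-discovery finds.

```yaml
jobs:
  - name: combined-example            # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    profiles:
      mode: combined
      mode_combined:
        names:
          - sql_database              # merged with the auto-discovered profiles
    auth:
      mode: default
```

Use `exact` rather than `combined` when the job should load only the listed profiles and nothing auto-discovered.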
@@ -188,14 +316,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +346,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +356,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
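For a user-assigned identity, the options table lists `auth.mode_managed_identity.client_id`. A minimal sketch, assuming the nested YAML layout mirrors `mode_service_principal`; the job name and IDs are placeholders.

```yaml
jobs:
  - name: databases-user-assigned     # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    profiles:
      mode: exact
      mode_exact:
        names:
          - sql_database
          - postgres_flexible
    auth:
      mode: managed_identity
      mode_managed_identity:
        # Client ID of the user-assigned identity; leave empty for system-assigned.
        client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
```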
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +399,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +407,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -322,6 +454,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -405,31 +538,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
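A minimal sketch of raising the offset, assuming `query_offset` sits at the job level next to `update_every`, as the options table suggests. The job name and subscription ID are placeholders.

```yaml
jobs:
  - name: prod
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    update_every: 60                  # minimum allowed collection interval
    query_offset: 240                 # raised from the 180s default
    auth:
      mode: default
```

Batches with a time grain longer than the configured value (for example `PT5M`) still use at least one full time grain as the effective offset, so this setting mostly affects batches with shorter time grains.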
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Cosmos DB Account.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Cosmos DB Account.mdx index ebf14f91f..cdb6e9fc6 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Cosmos DB Account.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Cosmos DB Account.mdx @@ -26,40 +26,78 @@ Module: azure_monitor ## Overview -Monitor Cosmos DB accounts including request unit consumption and throttling, document counts and storage, data and index sizes, replication latency, availability percentages, provisioned throughput utilization, and normalized RU consumption per partition. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. 
-The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure Cosmos DB with metrics covering: + +- **Request units** -- RU consumption, provisioned throughput (provisioned/autoscale), normalized RU per partition +- **Storage** -- data, index, and quota storage, physical partition size +- **Latency** -- server-side latency (direct/gateway), replication latency +- **Requests** -- total requests, API requests (Mongo/Cassandra/Gremlin), metadata requests, dedicated gateway requests +- **Availability** -- service availability percentage +- **Advanced** -- document count, partition count, dedicated gateway CPU/memory, integrated cache hit rate + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. 
+The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. 
The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +121,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. 
**Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +159,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. + +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +181,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
@@ -188,14 +319,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +349,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +359,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +402,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +410,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -330,6 +465,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -428,31 +564,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Data Explorer Cluster.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Data Explorer Cluster.mdx index fb3713abe..61527aed0 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Data Explorer Cluster.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Data Explorer Cluster.mdx @@ -26,40 +26,82 @@ Module: azure_monitor ## Overview -Monitor Azure Data Explorer (Kusto) clusters including ingestion latency, volume, and success rates, query performance and concurrency, cache utilization, CPU and memory usage, export operations, streaming ingest throughput, materialized view health, instance counts, and follower lag. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. 
-The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure Data Explorer (Kusto) with metrics covering: + +- **Ingestion** -- ingestion latency, volume, result (success/failure), queue length, batch processing +- **Queries** -- query count, query duration, concurrent queries, throttled queries/commands +- **Streaming ingest** -- data rate, duration, result, utilization +- **Cache** -- cache and ingestion utilization +- **Compute** -- CPU utilization +- **Export** -- continuous export records, result, lateness, pending jobs, export utilization +- **Materialized views** -- view health, age, data loss, records in delta, extents rebuild +- **Cluster** -- instance count (average/max/min), keep alive, total extents +- **Events** -- events received/processed/dropped, blobs received/processed/dropped +- **Advanced** -- follower latency, discovery latency, weak consistency latency, partitioning percentage + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. 
+The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. 
The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +125,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. 
**Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +163,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. + +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +185,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
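As an illustration of `combined` mode, the job below keeps auto-discovery enabled and additionally lists two profile basenames. This is a sketch under stated assumptions rather than a tested configuration: the job name and subscription ID are placeholders, and `sql_database` and `postgres_flexible` stand in for whichever basenames exist in your stock or user profile directories.

```yaml
jobs:
  - name: combined-example                        # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"    # placeholder subscription ID
    profiles:
      mode: combined
      mode_combined:
        names:                                    # merged with auto-discovered profiles
          - sql_database                          # example basename, matched case-insensitively
          - postgres_flexible                     # example basename
    auth:
      mode: default                               # Azure SDK default credential chain
```

Because basenames are matched without the `.yaml` / `.yml` suffix, list `sql_database` rather than `sql_database.yaml`.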
@@ -188,14 +323,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +353,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +363,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +406,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +414,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -341,6 +480,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -463,31 +603,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
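To apply the last suggestion, `query_offset` can be raised in the job definition. A minimal sketch, assuming a single subscription and the `default` credential chain; the job name and subscription ID are placeholders:

```yaml
jobs:
  - name: prod                                    # placeholder job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"    # placeholder subscription ID
    query_offset: 300                             # seconds; the documented default is 180
    auth:
      mode: default
```

The configured value is a minimum: batches whose time grain is longer than it still use a larger effective offset.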
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Data Factory.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Data Factory.mdx index 4eb64ba48..618003637 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Data Factory.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Data Factory.mdx @@ -26,40 +26,80 @@ Module: azure_monitor ## Overview -Monitor Data Factory including pipeline, activity, and trigger run success and failure counts, integration runtime CPU and memory utilization, available capacity and queue lengths, SSIS package execution rates, copy operations throughput, data flow processing metrics, and overall factory resource utilization. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. 
-The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure Data Factory with metrics covering: + +- **Pipeline runs** -- pipeline runs (succeeded/failed/cancelled), elapsed time runs +- **Activity runs** -- activity runs (succeeded/failed/cancelled) +- **Trigger runs** -- trigger runs (succeeded/failed/cancelled) +- **Integration runtime** -- IR CPU and memory utilization, available nodes, queue length, task pickup delay +- **SSIS** -- SSIS package executions (succeeded/failed/cancelled), IR start/stop operations +- **MVNet IR** -- pipeline and copy capacity/utilization, external capacity, queue lengths +- **Airflow IR** -- CPU and memory, DAG processing, task instances, scheduler activity, triggers, pool slots +- **Factory** -- entity count, factory size (current/max allowed) + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. 
+The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. 
The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +123,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. 
**Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +161,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. + +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +183,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
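As a hedged illustration of the `combined` mode described above, the fragment below merges one explicitly listed profile with whatever auto-discovery finds. The job name is hypothetical, and `sql_database` is used only because it appears as a basename elsewhere in this document; this is a sketch of the documented options, not a verified configuration.

```yaml
jobs:
  - name: combined-example            # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    profiles:
      mode: combined                  # auto-discovered profiles plus the names below
      mode_combined:
        names:
          - sql_database              # basename without the .yaml / .yml suffix
    auth:
      mode: managed_identity
```

Because basename matching is documented as case-insensitive, the capitalization of the listed name should not matter.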
@@ -188,14 +321,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +351,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +361,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +404,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +412,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -346,6 +483,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -472,31 +610,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
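A minimal sketch of the last step above, assuming `query_offset` is set at the job level as the options table suggests. The job name is hypothetical and the value 300 is simply the upper end of the range recommended here.

```yaml
jobs:
  - name: prod                        # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    query_offset: 300                 # seconds; default is 180, acts as a minimum floor
    auth:
      mode: default
```

A larger offset trades freshness for completeness: data points arrive later on the charts but are less likely to be fetched before Azure Monitor has finished reporting them.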
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Event Grid Topic.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Event Grid Topic.mdx index c2d774216..dcb821850 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Event Grid Topic.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Event Grid Topic.mdx @@ -26,40 +26,76 @@ Module: azure_monitor ## Overview -Monitor Event Grid topics including publish success and failure counts, publish latency, event delivery and routing rates, delivery success and failure counts, dead-lettered events, and matched event routing. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. -The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure Event Grid with metrics covering: + +- **Publishing** -- publish rate (success/failed), publish latency +- **Delivery** -- events delivered, failed, dropped, and dead-lettered +- **Routing** -- matched and unmatched event routing, destination processing duration +- **Filters** -- advanced filter evaluations + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. 
Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. +The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). 
+ +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +119,36 @@ UI configuration requires paid Netdata Cloud plan. 
#### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. **Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +157,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. 
+ +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +179,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
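The `limits.*` and `discovery.refresh_every` options from the table above can be combined in one job. The fragment below is a hedged sketch for a subscription that is hitting Azure API rate limits: the job name and the specific values are assumptions chosen for illustration, and only the option names come from this document.

```yaml
jobs:
  - name: large-subscription          # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    limits:
      max_concurrency: 2              # lowered from the default of 4 (illustrative value)
      max_batch_resources: 50         # default
      max_metrics_per_query: 20       # default
    discovery:
      refresh_every: 0                # 0 disables runtime re-discovery after bootstrap
    auth:
      mode: default
```

Disabling runtime re-discovery means resources created after bootstrap are not picked up until the collector restarts, so this setting suits static environments better than dynamic ones.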
@@ -188,14 +317,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +347,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +357,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +400,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +408,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -326,6 +459,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -410,31 +544,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Event Hubs Namespace.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Event Hubs Namespace.mdx index 64441c3f7..2b70e136a 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Event Hubs Namespace.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Event Hubs Namespace.mdx @@ -26,40 +26,79 @@ Module: azure_monitor ## Overview -Monitor Event Hubs namespaces including incoming and outgoing message rates, byte throughput, captured messages and bytes, throttled and quota-exceeded request counts, active connections, and total connection counts. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. 
-The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure Event Hubs with metrics covering: + +- **Messages** -- message flow (in/out), captured messages and bytes +- **Throughput** -- data throughput (in/out bytes per second) +- **Connections** -- active connections, connection events (opened/closed) +- **Requests** -- incoming and successful request rates +- **Errors** -- server errors, user errors, throttled requests, quota exceeded +- **Replication** -- replication lag (messages and duration) +- **Resources** -- namespace size, CPU and memory utilization + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. 
+The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. 
The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +122,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. 
**Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +160,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. + +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +182,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
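A `combined` configuration might look like the sketch below. It assumes `mode_combined` nests under `profiles` the same way `mode_exact` does, and it reuses the `sql_database` and `postgres_flexible` basenames from the exact-mode example; the job name and subscription ID are placeholders.

```yaml
jobs:
  - name: combined
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    profiles:
      mode: combined
      mode_combined:
        # Basenames only: no .yaml / .yml suffix; matching is case-insensitive.
        names:
          - sql_database
          - postgres_flexible
    auth:
      mode: managed_identity
```

With this mode the listed profiles are merged with the auto-discovered ones, so resource types found at bootstrap are still monitored alongside the explicitly named profiles.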
@@ -188,14 +320,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +350,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +360,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +403,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +411,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -330,6 +466,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -421,31 +558,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure ExpressRoute Circuit.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure ExpressRoute Circuit.mdx index b9a875248..bf1bcfbd6 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure ExpressRoute Circuit.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure ExpressRoute Circuit.mdx @@ -26,40 +26,77 @@ Module: azure_monitor ## Overview -Monitor ExpressRoute circuits including bits per second in and out, ARP and BGP availability percentages, packet drops, and QoS bit rate throughput. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. -The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure ExpressRoute Circuit with metrics covering: + +- **Throughput** -- circuit throughput (bits/s in/out), GlobalReach throughput +- **Availability** -- ARP availability, BGP availability +- **Bandwidth** -- bandwidth utilization (ingress/egress) +- **QoS** -- QoS dropped bits (in/out) +- **Routes** -- FastPath routes count + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. 
Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. +The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). 
+ +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +120,36 @@ UI configuration requires paid Netdata Cloud plan. 
#### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. **Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +158,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. 
+ +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +180,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
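The basename rule above can be sketched in a few lines. This is only an illustration of the documented behavior (strip the `.yaml` / `.yml` suffix, compare case-insensitively); the function names `profile_basename` and `matches` are hypothetical, and lowercasing as the comparison method is an assumption rather than the collector's actual implementation.

```python
from pathlib import PurePosixPath


def profile_basename(filename: str) -> str:
    """Return the lowercased file name without a .yaml/.yml suffix."""
    lower = PurePosixPath(filename).name.lower()
    for suffix in (".yaml", ".yml"):
        if lower.endswith(suffix):
            return lower[: -len(suffix)]
    return lower


def matches(configured_name: str, filename: str) -> bool:
    """Compare a configured profile name with a profile file (hypothetical helper)."""
    return configured_name.strip().lower() == profile_basename(filename)
```

Under these assumptions, a name such as `SQL_Database` listed in `profiles.mode_exact.names` would select a profile file called `sql_database.yaml`.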
@@ -188,14 +318,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +348,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +358,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +401,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +409,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -326,6 +460,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -411,31 +546,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
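To make the "minimum offset" behavior concrete, here is a minimal sketch that treats the effective offset as the larger of `query_offset` and the batch time grain. The helper names and the ISO 8601 duration parsing are assumptions made for illustration; the collector may apply a larger value than this lower bound.

```python
import re

# Matches simple ISO 8601 time durations such as PT1M, PT5M, PT1H.
_GRAIN_RE = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def grain_seconds(grain: str) -> int:
    """Convert a time grain like 'PT5M' to seconds (hypothetical helper)."""
    match = _GRAIN_RE.match(grain)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported time grain: {grain!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def effective_offset(query_offset: int, grain: str) -> int:
    """The configured offset is a floor; longer grains raise it to one full grain."""
    return max(query_offset, grain_seconds(grain))
```

With the default `query_offset` of 180, a `PT1M` batch would keep the 180-second offset, while a `PT5M` batch would use at least 300 seconds.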
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure ExpressRoute Gateway.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure ExpressRoute Gateway.mdx index a08efc70d..1f64870a7 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure ExpressRoute Gateway.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure ExpressRoute Gateway.mdx @@ -26,40 +26,77 @@ Module: azure_monitor ## Overview -Monitor ExpressRoute gateways including bits and packets per second for ingress and egress, connection counts, CPU utilization, active flow counts, and gateway scale unit counts. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. -The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure ExpressRoute Gateway with metrics covering: + +- **Throughput** -- gateway throughput, connection throughput (bits/s in/out), packets per second +- **Compute** -- CPU utilization +- **Flows** -- active flows, max flow creation rate +- **Routes** -- routes advertised to peer, routes learned from peer, route changes +- **Scale** -- VMs in VNet + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. 
Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. +The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). 
+ +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +120,36 @@ UI configuration requires paid Netdata Cloud plan. 
#### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. **Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +158,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. 
+ +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +180,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
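The three `profiles.mode` values in the table above can be pictured as simple set operations. The sketch below is an assumption-laden illustration, not the collector's code: `select_profiles` is a hypothetical function, `discovered` stands for the profile basenames matched during bootstrap discovery, and `names` stands for the configured `mode_exact.names` or `mode_combined.names` list.

```python
from typing import Iterable, List


def select_profiles(mode: str, discovered: Iterable[str], names: Iterable[str] = ()) -> List[str]:
    """Return the profile basenames to activate for a given profiles.mode."""
    found = {name.lower() for name in discovered}  # from resource discovery
    listed = {name.lower() for name in names}      # from the job configuration
    if mode == "auto":
        return sorted(found)
    if mode == "exact":
        return sorted(listed)
    if mode == "combined":
        return sorted(found | listed)
    raise ValueError(f"unknown profiles.mode: {mode!r}")
```

In this picture, `exact` ignores whatever discovery finds, while `combined` is useful when a profile should be active even if no matching resource exists at bootstrap.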
@@ -188,14 +318,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +348,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +358,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +401,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +409,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -325,6 +459,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -413,31 +548,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
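The offset rule in the bullets above can be sketched in a few lines. This is an illustrative Python sketch, not the collector's code: the function names and the window arithmetic (a window one time grain wide, ending at "now minus the effective offset") are assumptions. Only the floor behaviour comes from the text, where `query_offset` is a minimum and slower time grains use at least one full grain.

```python
from datetime import datetime, timedelta, timezone


def effective_offset(query_offset_s: int, time_grain_s: int) -> int:
    """query_offset acts as a floor; a slower time grain raises the offset
    to at least one full grain."""
    return max(query_offset_s, time_grain_s)


def query_window(now: datetime, query_offset_s: int, time_grain_s: int):
    """Hypothetical window: one time grain wide, ending at now minus the
    effective offset. The window width is an assumption for illustration."""
    offset = effective_offset(query_offset_s, time_grain_s)
    end = now - timedelta(seconds=offset)
    start = end - timedelta(seconds=time_grain_s)
    return start, end


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    # PT1M batch with the default 180 s offset
    print(query_window(now, 180, 60))
    # PT5M batch: the effective offset grows to one full grain (300 s)
    print(query_window(now, 180, 300))
```

Under these assumptions, raising `query_offset` to 240 or 300 only changes the result for batches whose time grain is shorter than the new value.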
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Firewall.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Firewall.mdx index a62e6bbe1..bac62d321 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Firewall.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Firewall.mdx @@ -26,40 +26,78 @@ Module: azure_monitor ## Overview -Monitor Azure Firewall including data processed, throughput, application and network rule hit counts, SNAT port utilization, health state percentage, and latency probes. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. -The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure Firewall with metrics covering: + +- **Traffic** -- data processed, throughput (bits/s) +- **Rules** -- application and network rule hit counts +- **SNAT** -- SNAT port utilization +- **Health** -- firewall health state percentage +- **Latency** -- latency probe +- **Capacity** -- observed capacity units + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. 
Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. +The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. 
#### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +121,36 @@ UI configuration requires paid Netdata Cloud plan. 
#### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. **Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +159,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. 
+ +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
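
As a rough illustration of the override rule above, the catalog can be pictured as a mapping keyed by profile `id`, where user entries replace stock entries with the same `id` and everything else is kept. This is a Python sketch under stated assumptions: the dictionary shape, the `source` field, and the function name are hypothetical, and only the "same `id` overrides" behaviour comes from the text.

```python
def merge_profiles(stock, user):
    """Build a catalog keyed by profile id.

    Stock profiles are loaded first; a user profile with the same id
    replaces the stock one, and user profiles with new ids extend the catalog.
    """
    catalog = {profile["id"]: profile for profile in stock}
    for profile in user:
        catalog[profile["id"]] = profile
    return catalog


if __name__ == "__main__":
    # Hypothetical profile records; "source" exists only for this example.
    stock = [
        {"id": "sql_database", "source": "stock"},
        {"id": "postgres_flexible", "source": "stock"},
    ]
    user = [
        {"id": "sql_database", "source": "user"},    # overrides the stock profile
        {"id": "my_custom_type", "source": "user"},  # extends the catalog
    ]
    for profile_id, profile in merge_profiles(stock, user).items():
        print(profile_id, profile["source"])
```
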
@@ -128,25 +181,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
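The basename rule can be sketched as follows. This is a minimal Python illustration, not the collector's implementation: the helper names are hypothetical, and treating the `.yaml` / `.yml` suffix itself as case-insensitive is an assumption. The text only states that basename matching is case-insensitive and that the basename is the filename without the suffix.

```python
from pathlib import PurePosixPath


def profile_basename(filename: str) -> str:
    """Return the lowercased filename with a .yaml/.yml suffix removed."""
    name = PurePosixPath(filename).name
    lowered = name.lower()
    for suffix in (".yaml", ".yml"):
        if lowered.endswith(suffix):
            # Assumption: the suffix check ignores case as well.
            return lowered[: -len(suffix)]
    return lowered


def select_profiles(requested_names, profile_files):
    """Return the profile files whose basename matches a requested name,
    comparing case-insensitively."""
    wanted = {name.lower() for name in requested_names}
    return [f for f in profile_files if profile_basename(f) in wanted]


if __name__ == "__main__":
    files = ["sql_database.yaml", "postgres_flexible.yml", "redis_cache.yaml"]
    # Names as they might appear under profiles.mode_exact.names
    print(select_profiles(["SQL_Database", "postgres_flexible"], files))
```
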
@@ -188,14 +319,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +349,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +359,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +402,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +410,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -323,6 +458,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -408,31 +544,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Front Door.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Front Door.mdx index 3ee6d6f89..819994db9 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Front Door.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Front Door.mdx @@ -26,40 +26,79 @@ Module: azure_monitor ## Overview -Monitor Azure Front Door including request counts and rates, response sizes, total latency, origin health probe percentages, origin request counts, origin latency, WAF request counts by action and rule, and WebSocket connection metrics. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. 
-The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure Front Door with metrics covering: + +- **Requests** -- client and origin request rates, origin shield requests (to shield/to origin/rate limited) +- **Latency** -- total latency, origin latency +- **Data transfer** -- request and response data transfer, origin shield data transfer, byte hit ratio +- **Errors** -- 4xx and 5xx error rates +- **Origin** -- origin health probe percentage +- **WAF** -- WAF requests, challenges (CAPTCHA/JS challenge) +- **WebSocket** -- WebSocket connections (requested/active), connection duration + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. 
+The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. 
The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +122,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. 
**Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +160,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. + +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +182,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
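The basename rule above can be illustrated with a minimal sketch. It assumes that a name listed under `profiles.mode_exact.names` is compared against each profile filename after stripping the `.yaml` / `.yml` suffix and lowercasing both sides. The helper names `profile_basename` and `select_profiles` are hypothetical and are not taken from the collector's source; treating the suffix itself as case-insensitive is also an assumption.

```python
from pathlib import PurePosixPath


def profile_basename(filename: str) -> str:
    """Return the lowercased profile basename without a .yaml/.yml suffix.

    Hypothetical helper: illustrates the documented rule, not the collector's code.
    """
    name = PurePosixPath(filename).name.lower()
    for suffix in (".yaml", ".yml"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def select_profiles(available_files: list[str], requested_names: list[str]) -> list[str]:
    """Pick the profile files whose basename matches a requested name, ignoring case."""
    by_basename = {profile_basename(f): f for f in available_files}
    selected = []
    for requested in requested_names:
        key = requested.lower()
        if key in by_basename:
            selected.append(by_basename[key])
    return selected


if __name__ == "__main__":
    files = ["sql_database.yaml", "postgres_flexible.yml", "redis_cache.yaml"]
    print(select_profiles(files, ["SQL_Database", "postgres_flexible"]))
```

Under these assumptions, `SQL_Database` and `sql_database` select the same profile file, which matches the case-insensitive behavior described in the option text.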
@@ -188,14 +320,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +350,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +360,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +403,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +411,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -327,6 +463,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -417,31 +554,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
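The relationship between `query_offset` and the batch time grain can be sketched as follows. This is a minimal illustration, assuming the effective offset is simply the larger of the configured `query_offset` and one full time grain, as the text describes ("at least one full time grain"). The collector may apply a larger value. The function names `grain_seconds` and `effective_offset` are hypothetical, and the parser only handles simple ISO 8601 time durations such as `PT1M`, `PT5M`, or `PT1H`.

```python
import re


def grain_seconds(grain: str) -> int:
    """Convert a simple ISO 8601 time duration (e.g. PT1M, PT5M, PT1H) to seconds."""
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", grain)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported time grain: {grain!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def effective_offset(query_offset: int, grain: str) -> int:
    """Assumed rule: the configured offset is a floor, raised to one full time grain."""
    return max(query_offset, grain_seconds(grain))


if __name__ == "__main__":
    for grain in ("PT1M", "PT5M", "PT1H"):
        print(grain, effective_offset(180, grain))
```

With the default `query_offset` of 180 seconds, a `PT1M` batch keeps the 180-second offset, while a `PT5M` batch would use at least 300 seconds under this assumption. Raising `query_offset` to 240 or 300 only changes the result for grains shorter than that value.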
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Functions.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Functions.mdx index 95adce1bc..e3405ebb2 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Functions.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Functions.mdx @@ -26,40 +26,79 @@ Module: azure_monitor ## Overview -Monitor Azure Functions execution including function invocation counts, execution units (MB-milliseconds), HTTP request rates and response codes, CPU and memory consumption, and Flex Consumption plan metrics for always-ready and on-demand instances. Uses the same underlying metrics as App Service since Azure Functions runs on the App Service platform. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. + +::: + +Monitor Azure Functions with metrics covering: + +- **Executions** -- function invocation counts, execution units (MB-milliseconds) +- **HTTP** -- request rates, response status codes +- **Compute** -- CPU utilization, CPU time consumed +- **Memory** -- memory usage (working set, private bytes) +- **Flex Consumption** -- always-ready and on-demand function executions and units + +Azure Functions runs on the App Service platform and shares the same underlying metrics. 
-The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. +The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. 
+- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. 
+ +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +122,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. **Reader** role for Azure Resource Graph queries (for resource discovery) +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | + +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ 
+ --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +160,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +**Profile file locations:** -User profile files with the same filename override stock profiles. +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | + +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +182,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
@@ -188,14 +320,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +350,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) +###### Managed identity with exact profiles -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
- -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +360,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +403,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +411,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -316,6 +452,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -420,31 +557,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
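Raising the offset as suggested above might look like the following sketch. The job name and subscription ID are placeholders; `query_offset` is set at the job level, as listed in the options table.

```yaml
jobs:
  - name: prod                     # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"   # placeholder subscription ID
    query_offset: 240              # seconds; the default is 180
    auth:
      mode: default
```

A larger offset trades freshness for completeness: charts lag further behind real time, but the queried window is more likely to contain finalized data points.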
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure IoT Hub.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure IoT Hub.mdx index 26b49ddd0..85d1153bb 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure IoT Hub.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure IoT Hub.mdx @@ -26,40 +26,81 @@ Module: azure_monitor ## Overview -Monitor IoT Hub including device telemetry message rates and quota usage, routing delivery and latency, device twin read and write operations, direct method invocations, cloud-to-device messaging and feedback, job completion rates, device connection and authentication events, and event grid publish status. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. 
-The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure IoT Hub with metrics covering: + +- **Telemetry** -- device telemetry messages (attempted/sent), throttling errors, daily message quota usage +- **Routing** -- message deliveries by endpoint (Event Hubs, Service Bus, storage), routing latency, delivery status +- **Device twins** -- backend and device twin reads/writes (successful/failed), query results +- **Direct methods** -- method invocations (successful/failed), request/response sizes +- **Cloud-to-device** -- C2D commands (completed/abandoned/rejected), expired messages +- **Jobs** -- job completions, cancellations, list calls, twin update and method job creations +- **Connections** -- successful connections, connected devices, total devices +- **Event Grid** -- Event Grid deliveries, Event Grid latency +- **Data** -- device data usage + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. 
+The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. 
The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +124,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. 
**Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +162,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. + +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +184,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
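For the `managed_identity` mode with a user-assigned identity, the options above suggest a job along these lines. This is a sketch under stated assumptions: the job name and both IDs are placeholders, and the `mode_managed_identity` block is assumed to nest under `auth` in the same way as `mode_service_principal` does in the examples.

```yaml
jobs:
  - name: prod-uami                # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"   # placeholder subscription ID
    auth:
      mode: managed_identity
      mode_managed_identity:
        # Client ID of the user-assigned identity (placeholder).
        # Leave empty to use the system-assigned identity.
        client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
```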
@@ -188,14 +322,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +352,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +362,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +405,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +413,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -337,6 +475,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -454,31 +593,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Key Vault.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Key Vault.mdx index 058475566..8f4848439 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Key Vault.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Key Vault.mdx @@ -26,40 +26,75 @@ Module: azure_monitor ## Overview -Monitor Key Vault including overall vault availability, API saturation approaching service limits, and service API hit and latency metrics. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. -The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure Key Vault with metrics covering: + +- **Availability** -- overall vault availability percentage +- **API** -- API activity (hits/results), API latency +- **Saturation** -- API saturation approaching service limits + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. 
Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. +The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. 
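The runtime phase described above can be turned off by setting `discovery.refresh_every` to `0`. A minimal sketch, with a hypothetical job name and a placeholder subscription ID:

```yaml
jobs:
  - name: static-inventory         # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"   # placeholder subscription ID
    discovery:
      refresh_every: 0             # 0 disables runtime re-discovery after bootstrap
    auth:
      mode: default
```

With the refresh disabled, the resource set found at bootstrap stays fixed until the collector is restarted, which may suit environments where resources rarely change.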
#### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +118,36 @@ UI configuration requires paid Netdata Cloud plan. 
#### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. **Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +156,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. 
+ +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +178,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
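A minimal sketch of `combined` mode, assuming the profile basename `sql_database` (borrowed from the examples on this page) exists among the stock profiles. The job name and subscription ID are placeholders, not values from a real deployment.

```yaml
jobs:
  - name: example                                  # placeholder job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"     # placeholder subscription ID
    profiles:
      mode: combined
      mode_combined:
        names:
          - sql_database     # merged with whatever profiles auto-discovery enables
    auth:
      mode: default
```

Compared with `exact`, this keeps auto-discovery active while still naming the listed profile explicitly.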
@@ -188,14 +316,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +346,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +356,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +399,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +407,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -323,6 +455,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -405,31 +538,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
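If gaps persist, the offset can be raised for the affected job. The fragment below is a sketch that assumes `query_offset` is set at the job level, like `cloud` in the examples on this page; the job name and subscription ID are placeholders.

```yaml
jobs:
  - name: example                                  # placeholder job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"     # placeholder subscription ID
    query_offset: 240        # seconds; the documented default is 180
    auth:
      mode: default
```

A larger offset trades freshness for completeness: data points arrive later, but they are less likely to be queried before Azure Monitor has finished reporting them.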
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Kubernetes Service Cluster.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Kubernetes Service Cluster.mdx index 26576064d..85454f378 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Kubernetes Service Cluster.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Kubernetes Service Cluster.mdx @@ -26,40 +26,76 @@ Module: azure_monitor ## Overview -Monitor AKS cluster health including API server and etcd resource usage, pod scheduling status and readiness, node capacity and conditions, cluster autoscaler behavior, and per-node CPU, memory, disk, and network utilization. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. 
-The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure Kubernetes Service (AKS) with metrics covering: + +- **Control plane** -- API server CPU/memory, etcd CPU/memory/database utilization, inflight requests +- **Nodes** -- allocatable CPU/memory, per-node CPU (millicores and %), memory (RSS, working set), disk usage, network traffic +- **Pods** -- pods by phase, pods in ready state, node conditions +- **Autoscaler** -- autoscaler health, unneeded nodes, unschedulable pods + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. 
+The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. 
The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +119,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. 
**Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +157,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. + +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +179,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
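A minimal sketch of `combined` mode, assuming the same job layout as the `exact` mode configuration on this page. The subscription ID is a placeholder, and `sql_database` is only an illustrative basename; in this mode it would be merged with whatever profiles auto-discovery enables.

```yaml
jobs:
  - name: mixed
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"  # placeholder subscription ID
    profiles:
      mode: combined
      mode_combined:
        names:
          - sql_database  # illustrative basename (profile filename without .yaml/.yml)
    auth:
      mode: managed_identity
```

Because basename matching is case-insensitive, the listed name does not need to match the casing of the profile filename.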
@@ -188,14 +317,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +347,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +357,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +400,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +408,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -332,6 +465,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -433,31 +567,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
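As a sketch of the last step above, a job that raises the offset might look like this. The job name and subscription ID are placeholders, and `auth.mode: default` is assumed only to keep the fragment short.

```yaml
jobs:
  - name: prod
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"  # placeholder subscription ID
    query_offset: 300  # seconds; the default is 180
    auth:
      mode: default
```

The configured value is only a floor, so batches with a time grain longer than 300 seconds would still use a larger effective offset.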
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Load Balancer.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Load Balancer.mdx index 2bf8f065d..05b233b59 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Load Balancer.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Load Balancer.mdx @@ -26,40 +26,76 @@ Module: azure_monitor ## Overview -Monitor Azure Load Balancer health and throughput including data path and health probe availability, SYN and SNAT connection counts, byte and packet throughput, allocated and used SNAT ports, and connection attempt rates. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. -The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure Load Balancer with metrics covering: + +- **Availability** -- data path availability, health probe status, global backend availability +- **Throughput** -- byte and packet throughput +- **Connections** -- SNAT connections, SYN packet count +- **SNAT ports** -- allocated and used SNAT ports + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. 
Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. +The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). 
+ +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +119,36 @@ UI configuration requires paid Netdata Cloud plan. 
#### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. **Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +157,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. 
+ +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +179,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
@@ -188,14 +317,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +347,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +357,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +400,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +408,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -324,6 +457,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -410,31 +544,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Log Analytics Workspace.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Log Analytics Workspace.mdx index d92a40be3..232c4b274 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Log Analytics Workspace.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Log Analytics Workspace.mdx @@ -26,40 +26,76 @@ Module: azure_monitor ## Overview -Monitor Log Analytics workspaces including ingestion volume and latency, query execution counts and volume, available storage capacity, and per-table breakdowns of ingestion rates and billing volume. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. 
-The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure Log Analytics Workspace with metrics covering: + +- **Ingestion** -- ingestion volume (records/s), ingestion latency (average/max/min) +- **Queries** -- query count (total/failed), query availability +- **Export** -- exported data (bytes/s), exported records +- **Legacy agent** -- CPU utilization (processor/privileged/user/idle), memory (available/used/free), disk (free space/utilization/I/O/queue/latency), network (traffic/packets/errors/throughput), swap, paging, system processes, uptime, events, heartbeats, users + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. 
+The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. 
The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +119,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. 
**Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +157,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. + +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +179,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
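A minimal sketch of `combined` mode, assuming `profiles.mode_combined.names` follows the same nested layout as `profiles.mode_exact.names`. The job name and subscription ID are placeholders; `sql_database` is used only as an example basename.

```yaml
jobs:
  - name: combined
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    profiles:
      # Auto-discovered profiles are merged with the names listed below.
      mode: combined
      mode_combined:
        names:
          # Basename only: no .yaml/.yml suffix, matched case-insensitively.
          - sql_database
    auth:
      mode: default
```

Listing a basename here should keep that profile enabled even when bootstrap discovery finds no matching resources yet, which may help with resource types created after startup.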
@@ -188,14 +317,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +347,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +357,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +400,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +408,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -334,6 +467,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -449,31 +583,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Logic Apps Workflow.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Logic Apps Workflow.mdx index 5c7e92c96..66fea68ab 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Logic Apps Workflow.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Logic Apps Workflow.mdx @@ -26,40 +26,78 @@ Module: azure_monitor ## Overview -Monitor Logic Apps workflow execution including run completions and failures, action execution counts, trigger firing rates, run and action latency, billable executions, and action-level success and failure breakdowns. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. 
-The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure Logic Apps with metrics covering: + +- **Runs** -- run lifecycle (started/completed/succeeded/failed/cancelled), run failure rate +- **Actions** -- action lifecycle (started/completed/succeeded/failed/skipped) +- **Triggers** -- trigger lifecycle (started/completed/succeeded/fired/failed/skipped) +- **Latency** -- run, action, and trigger latency +- **Billing** -- billable executions (total/actions/triggers), billing by type (native/connector/storage) +- **Throttling** -- run, action, and trigger throttling events + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. 
+The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. 
The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +121,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. 
**Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +159,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. + +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +181,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
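The `combined` mode from the table above can be sketched as a job definition. This is a minimal illustration that uses only option names documented on this page; the job name and the subscription ID are placeholders, and `sql_database` is assumed to be an available profile basename because it appears in this page's examples.

```yaml
jobs:
  - name: combined-example                        # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"    # placeholder subscription ID
    profiles:
      mode: combined
      mode_combined:
        names:
          - sql_database                          # merged with the auto-discovered profiles
    auth:
      mode: default
```

With this layout, the listed basename is merged with whatever profiles auto-discovery enables, rather than replacing them as `exact` mode would.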
@@ -188,14 +319,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +349,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +359,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +402,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +410,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -332,6 +467,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -423,31 +559,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
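The last step above might look like this in a job definition. This is a sketch only: the job name and subscription ID are placeholders, and `240` is simply the lower of the two values suggested above.

```yaml
jobs:
  - name: prod                                    # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"    # placeholder subscription ID
    query_offset: 240                             # seconds; raised from the 180s default
    auth:
      mode: default
```

Because `query_offset` is a minimum, batches with longer time grains may still use a larger effective offset than the configured value.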
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Machine Learning Workspace.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Machine Learning Workspace.mdx index 11130fd0e..62d3e11fe 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Machine Learning Workspace.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Machine Learning Workspace.mdx @@ -26,40 +26,81 @@ Module: azure_monitor ## Overview -Monitor Azure Machine Learning workspaces including active model deployments and registered models, pipeline run completions and failures, compute node utilization and preemptions, quota usage, managed endpoint request latency and rates, estimated GPU utilization, and storage utilization. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. 
-The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure Machine Learning with metrics covering: + +- **Compute** -- CPU utilization and millicores (used/capacity), CPU memory (used/capacity) +- **GPU** -- GPU utilization (cluster/node), GPU memory (used/capacity), GPU energy +- **Cluster** -- total cores and nodes, cluster cores and nodes by state (active/idle/leaving/preempted/unusable) +- **Runs** -- run completion (completed/failed/cancelled), run lifecycle, run issues (errors/warnings) +- **Models** -- model registrations (succeeded/failed), model deployments (started/succeeded/failed) +- **Quota** -- quota utilization +- **Storage** -- disk I/O (read/write), disk usage (used/available), storage API calls +- **Network** -- network traffic (in/out), InfiniBand traffic +- **AI agents** -- agent runs, messages, tokens, tool calls, events, indexed files + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. 
+The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. 
The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +124,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. 
**Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +162,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. + +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +184,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
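As an illustration of basename matching, the sketch below assumes a stock profile file named `sql_database.yaml`. The basename comes from this page's examples; the file extension and the mixed-case spelling are assumptions made for illustration.

```yaml
jobs:
  - name: databases                               # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"    # placeholder subscription ID
    profiles:
      mode: exact
      mode_exact:
        names:
          - SQL_Database                          # case-insensitive match, expected to select sql_database.yaml
    auth:
      mode: managed_identity
```

Note that the entry is the basename only, without the `.yaml` / `.yml` suffix.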
@@ -188,14 +322,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +352,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +362,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +405,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +413,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -335,6 +473,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -443,31 +582,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
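As a rough illustration of the last point, a job that raises `query_offset` to 240 seconds could look like the sketch below. The job name `prod` and the placeholder subscription ID are assumptions for illustration only. `query_offset` is written as a job-level option, matching how it is listed in the options table, and `auth.mode: default` is just one of the supported modes.

```yaml
jobs:
  - name: prod                     # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"   # placeholder subscription ID
    query_offset: 240              # minimum offset in seconds (default is 180)
    auth:
      mode: default
```

If gaps persist at 240 seconds, the same option can be raised to 300 seconds, as suggested above.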
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Monitor.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Monitor.mdx index 371843ed4..b3087a621 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Monitor.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Monitor.mdx @@ -26,42 +26,72 @@ Module: azure_monitor ## Overview -This collector monitors Azure resources through the Azure Monitor Metrics API. It automatically discovers -resources in your subscription and collects platform metrics based on configurable profiles, providing -visibility into the health and performance of over 35 Azure service types. +This collector provides real-time visibility into your Azure infrastructure by collecting platform metrics from the Azure Monitor Metrics API. +**Key capabilities:** -The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +- **Multi-subscription** -- monitor resources across one or more Azure subscriptions in a single job +- **Automatic service detection** -- discovers resources and enables matching metric profiles without manual configuration +- **38 built-in service profiles** -- covers databases, compute, networking, storage, AI, analytics, and more +- **Flexible discovery** -- use structured filters (resource groups, regions, tags) or a custom Azure Resource Graph KQL query + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. 
Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. +The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). 
+ +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -85,25 +115,36 @@ UI configuration requires paid Netdata Cloud plan. 
#### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. **Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -112,13 +153,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. 
+ +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -130,25 +175,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
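A minimal sketch of `combined` mode, assuming a job named `combined-profiles` and a placeholder subscription ID (both hypothetical). The `sql_database` basename is borrowed from the exact-profiles example in this page; whether it needs to be listed explicitly depends on what auto-discovery already finds in your subscriptions.

```yaml
jobs:
  - name: combined-profiles        # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"   # placeholder subscription ID
    profiles:
      mode: combined               # auto-discovered profiles plus the names below
      mode_combined:
        names:
          - sql_database           # profile file basename, without .yaml / .yml
    auth:
      mode: default
```

Because basename matching is case-insensitive, `SQL_Database` and `sql_database` would be expected to refer to the same profile file.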
@@ -190,14 +313,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -206,26 +343,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) +###### Managed identity with exact profiles -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config
-
-```yaml
-jobs:
-  - name: prod
-    subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
-    auth:
-      mode: managed_identity
-
-```
-
- -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -233,35 +353,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -270,7 +396,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -278,7 +404,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -299,11 +426,11 @@ There are no alerts configured by default for this integration. ## Metrics -Metrics depend on which Azure Monitor profiles are enabled. Each profile corresponds to an Azure -service type and defines the specific metrics collected. With the default `profiles: [auto]` setting, -profiles are automatically enabled for resource types found in your subscription. +The metrics collected depend on which Azure Monitor profiles are active. Each profile corresponds to an Azure service (e.g., SQL Database, Virtual Machines) and defines the specific charts and metrics for that service. -See the service-specific integrations below for detailed metrics lists. +With the default `profiles.mode: auto`, profiles are activated automatically based on the resource types found in your subscriptions. + +**See the service-specific integrations below for detailed metric lists per Azure service.** @@ -376,31 +503,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). 
+- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. 
-Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. ### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure MySQL Flexible Server.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure MySQL Flexible Server.mdx index 40d32abb7..48d78690c 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure MySQL Flexible Server.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure MySQL Flexible Server.mdx @@ -26,40 +26,81 @@ Module: azure_monitor ## Overview -Monitor MySQL Flexible Server including active connections, aborted connections, query rates, replication lag, storage utilization, CPU and memory usage, IO operations, InnoDB buffer pool efficiency, network throughput, and HA replication status. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. 
-The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure MySQL Flexible Server with metrics covering: + +- **Compute** -- CPU utilization, CPU credits (consumed/remaining) +- **Memory** -- memory utilization +- **Storage** -- storage used/limit, storage breakdown (data/ibdata1/binlog), backup storage, server log storage, I/O utilization +- **Connections** -- active connections, aborted connections, total connections, threads running +- **Queries** -- queries (total/slow), DML statements (select/insert/update/delete), DDL statements +- **InnoDB** -- buffer pool I/O (read requests/disk reads), buffer pool pages, data writes, row lock time/waits +- **Replication** -- replication lag (replica/HA), HA status (I/O/SQL), replica status +- **Network** -- network traffic (in/out) +- **Health** -- uptime, deadlocks, lock timeouts + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. 
+The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. 
The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +124,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. 
**Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +162,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. + +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +184,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
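The examples on this page show `auto` and `exact`, but not `combined`. A minimal sketch of `combined` mode follows, assuming it uses the same nesting as the `exact` example (`profiles.mode_combined.names`). The job name and subscription ID are placeholders, and `sql_database` is borrowed from the `exact` example rather than being a recommendation.

```yaml
jobs:
  - name: combined-example              # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    profiles:
      mode: combined                    # auto-discovered profiles plus the names below
      mode_combined:
        names:
          - sql_database                # profile basename, without .yaml / .yml
    auth:
      mode: default
```

Because basename matching is case-insensitive, `SQL_Database` should resolve to the same profile.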
@@ -188,14 +322,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +352,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +362,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +405,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +413,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -339,6 +477,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -450,31 +589,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
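If gaps persist with the default, raising the offset is a one-line change. A minimal sketch, assuming `query_offset` is set at the job level alongside options such as `cloud`, as the options table suggests. The job name and subscription ID are placeholders.

```yaml
jobs:
  - name: prod
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    query_offset: 300                   # seconds; the default is 180
    auth:
      mode: default
```

Since the offset is subtracted from the current time when query windows are built, a larger value would make charts trail real time by correspondingly more.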
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure NAT Gateway.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure NAT Gateway.mdx index c68b9f5af..136a2bd74 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure NAT Gateway.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure NAT Gateway.mdx @@ -26,40 +26,76 @@ Module: azure_monitor ## Overview -Monitor NAT Gateway including byte and packet counts, connection counts, dropped packets, total SNAT connection counts, and datapath availability. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. -The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure NAT Gateway with metrics covering: + +- **Throughput** -- byte and packet throughput +- **Connections** -- SNAT connections, total SNAT connections +- **Drops** -- dropped packets +- **Availability** -- datapath availability percentage + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. 
Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. +The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. 
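The runtime refresh interval described above can be sketched as follows. This assumes `refresh_every` nests under `discovery` in the same way `mode` does in the examples on this page. The job name, subscription ID, and the value `600` are illustrative only.

```yaml
jobs:
  - name: prod
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    discovery:
      refresh_every: 600                # seconds; default is 300, 0 disables runtime re-discovery
    auth:
      mode: managed_identity
```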
#### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +119,36 @@ UI configuration requires paid Netdata Cloud plan. 
#### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. **Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +157,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. 
+ +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
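The `limits.*` settings listed under Performance Impact are set in the same job definition. A minimal sketch for a job that is hitting Azure API rate limits, assuming the dotted option names map to a nested `limits` block in the way `discovery.*` and `profiles.*` do in the examples. The job name, subscription ID, and the reduced concurrency value are placeholders.

```yaml
jobs:
  - name: large-subscription            # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    limits:
      max_concurrency: 2                # reduced from the default of 4
      max_batch_resources: 50           # default
      max_metrics_per_query: 20         # default
    auth:
      mode: default
```

Lowering `max_concurrency` trades collection speed for fewer simultaneous requests, so it is probably worth changing first, before the batch sizes.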
@@ -128,25 +179,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
@@ -188,14 +317,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +347,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +357,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +400,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +408,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -324,6 +457,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -408,31 +542,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure PostgreSQL Flexible Server.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure PostgreSQL Flexible Server.mdx index 8a81e76c2..18bf425db 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure PostgreSQL Flexible Server.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure PostgreSQL Flexible Server.mdx @@ -26,40 +26,81 @@ Module: azure_monitor ## Overview -Monitor PostgreSQL Flexible Server including active connections, transaction rates, replication lag, storage and backup utilization, CPU and memory usage, IO throughput, autovacuum activity, PgBouncer connection pooling, database sessions, and burstable instance CPU credits. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. 
-The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure PostgreSQL Flexible Server with metrics covering: + +- **Compute** -- CPU utilization, burstable CPU credits (consumed/remaining) +- **Memory** -- memory utilization +- **Storage** -- storage used/free, backup storage, WAL storage, database size, disk queue depth and saturation +- **I/O** -- IOPS (read/write), disk throughput, temp bytes/files +- **Connections** -- active connections, connection rate, max connections, PgBouncer client/server/pooled connections +- **Database** -- transaction rate, commits/rollbacks, tuple reads/writes, replication lag (time/bytes), deadlocks +- **Maintenance** -- autovacuum operations, table coverage, bloat percentage, buffer cache hit rate +- **Sessions** -- sessions by state and wait event type, backend count +- **Availability** -- database alive state + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. 
+The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. 
The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +124,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. 
**Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +162,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. + +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +184,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
@@ -188,14 +322,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +352,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +362,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +405,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +413,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -339,6 +477,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -460,31 +599,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure SQL Database.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure SQL Database.mdx index 2a240c694..6ca7f6550 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure SQL Database.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure SQL Database.mdx @@ -26,40 +26,79 @@ Module: azure_monitor ## Overview -Monitor SQL Database performance including CPU and DTU utilization, storage consumption, active sessions and workers, deadlocks, IO rates, tempdb usage, in-memory OLTP storage, and serverless auto-pause and billing metrics. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. 
-The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure SQL Database with metrics covering: + +- **CPU and DTU** -- CPU utilization (average/max), instance CPU, DTU consumption, vCore usage +- **Memory** -- instance memory utilization +- **Storage** -- data and allocated storage, storage utilization, tempdb size, in-memory OLTP storage +- **I/O** -- data read and log write utilization, tempdb log utilization +- **Connections** -- successful, failed, and firewall-blocked connections, active sessions and workers +- **Availability** -- database availability percentage +- **Advanced** -- deadlocks, replication lag, serverless CPU/memory/billing, ledger digest, free tier usage + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. 
+The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. 
The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +122,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. 
**Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +160,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. + +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
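As a minimal sketch of the global options mentioned above: assuming the usual go.d layout, in which top-level keys in `go.d/azure_monitor.conf` act as defaults for every job, the file could look like this. The job name and the placeholder subscription ID are illustrative only.

```yaml
# go.d/azure_monitor.conf (sketch; values are illustrative)
update_every: 60          # global default; this collector requires at least 60
autodetection_retry: 0    # 0 disables autodetection retries

jobs:
  - name: prod            # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    auth:
      mode: managed_identity
```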
@@ -128,25 +182,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
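A minimal sketch of `combined` mode, assuming the profile basenames `sql_database` and `postgres_flexible` (borrowed from the exact-mode example in this page) exist in the stock profile directory. The job name is hypothetical.

```yaml
jobs:
  - name: combined-profiles     # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    profiles:
      mode: combined            # auto-discovered profiles plus the names below
      mode_combined:
        names:
          - sql_database
          - postgres_flexible
    auth:
      mode: managed_identity
```

Because basename matching is case-insensitive, writing `SQL_Database` here should select the same profile.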
@@ -188,14 +320,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +350,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +360,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +403,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +411,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -339,6 +475,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -439,31 +576,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
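If gaps persist with the default offset, the adjustment suggested above might look like the following sketch. The job name is hypothetical, and `300` is simply the upper end of the range mentioned in the text, not a measured recommendation.

```yaml
jobs:
  - name: prod                  # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    query_offset: 300           # seconds; the default is 180
    auth:
      mode: default
```

A larger offset trades freshness for completeness: with `300`, each metric query window ends at least five minutes before the current time.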
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure SQL Elastic Pool.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure SQL Elastic Pool.mdx index dbe4d83b2..cd26e0b25 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure SQL Elastic Pool.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure SQL Elastic Pool.mdx @@ -26,40 +26,80 @@ Module: azure_monitor ## Overview -Monitor SQL Elastic Pool resource consumption including eDTU and CPU utilization, storage usage, active sessions and workers, IO rates, tempdb usage, and in-memory OLTP storage across all databases in the pool. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. + +::: + +Monitor Azure SQL Elastic Pool with metrics covering: + +- **CPU and DTU** -- CPU utilization (average/max), instance CPU, DTU consumption, eDTU and vCore usage +- **Memory** -- instance memory utilization +- **Storage** -- data and allocated storage, storage utilization, tempdb size, in-memory OLTP storage +- **I/O** -- data read and log write utilization, tempdb log utilization +- **Sessions** -- active sessions and workers count, serverless CPU/memory utilization +- **Billing** -- serverless billing (vCore-seconds) + +Metrics are aggregated across all databases in the pool. 
-The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. +The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. 
+- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. 
+ +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +123,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. **Reader** role for Azure Resource Graph queries (for resource discovery) +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | + +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ 
+ --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +161,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +**Profile file locations:** -User profile files with the same filename override stock profiles. +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | + +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +183,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
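As a minimal sketch of `combined` mode, the job below keeps auto-discovery enabled and additionally lists one profile by basename. The job name and the `sql_database` basename are placeholders borrowed from the other examples on this page; substitute basenames that exist in your profile directories.

```yaml
jobs:
  - name: combined-example            # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    profiles:
      mode: combined                  # auto-discovered profiles plus the names below
      mode_combined:
        names:
          - sql_database              # placeholder basename; matching is case-insensitive
    auth:
      mode: default
```

Because matching is case-insensitive, `SQL_Database` would be expected to select the same profile as `sql_database`.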
@@ -188,14 +321,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +351,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) +###### Managed identity with exact profiles -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
- -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +361,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +404,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +412,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -334,6 +471,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -428,31 +566,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
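A sketch of raising the offset, assuming a job that otherwise uses default settings. The job name is hypothetical and `240` is one of the values suggested above, not a tuned recommendation.

```yaml
jobs:
  - name: prod                        # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    query_offset: 240                 # seconds; default is 180, acts as a minimum floor
    auth:
      mode: default
```

If gaps persist at 240, try 300 before changing other options.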
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure SQL Managed Instance.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure SQL Managed Instance.mdx index ed25761b2..37da41995 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure SQL Managed Instance.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure SQL Managed Instance.mdx @@ -26,40 +26,75 @@ Module: azure_monitor ## Overview -Monitor SQL Managed Instance performance including virtual core CPU utilization, storage consumption, IO throughput, and average request wait times. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. -The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure SQL Managed Instance with metrics covering: + +- **Compute** -- CPU utilization (average/max), virtual core count +- **Storage** -- reserved and used storage +- **I/O** -- read/write throughput, I/O request rate + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. 
Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. +The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. 
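The runtime phase can be tuned per job. The sketch below assumes a static environment where the resource set rarely changes and disables runtime re-discovery entirely; the job name is hypothetical.

```yaml
jobs:
  - name: static-env                  # hypothetical job name
    subscription_ids:
      - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    discovery:
      refresh_every: 0                # 0 disables runtime re-discovery after bootstrap (default: 300)
    profiles:
      mode: auto
    auth:
      mode: default
```

With this setting, resources created after bootstrap are not picked up until the collector restarts, even for profile types that are already active.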
#### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +118,36 @@ UI configuration requires paid Netdata Cloud plan. 
#### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. **Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +156,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. 
+ +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +178,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
@@ -188,14 +316,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +346,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +356,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +399,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +407,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -323,6 +455,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -406,31 +539,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Service Bus Namespace.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Service Bus Namespace.mdx index 8d8653eb5..77f7301cb 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Service Bus Namespace.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Service Bus Namespace.mdx @@ -26,40 +26,80 @@ Module: azure_monitor ## Overview -Monitor Service Bus namespaces including incoming and outgoing message rates, active connections, active and dead-lettered message counts, scheduled message counts, completed and abandoned requests, server errors, throttled requests, CPU and memory utilization, and pending checkpoint operations. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. 
-The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure Service Bus with metrics covering: + +- **Messages** -- message flow (in/out), active messages, dead-lettered messages, scheduled messages, queue depth +- **Throughput** -- data throughput (in/out bytes per second) +- **Operations** -- completed and abandoned message operations, send latency +- **Connections** -- active connections, connection events (opened/closed) +- **Requests** -- incoming and successful request rates +- **Errors** -- server errors, user errors, throttled requests +- **Replication** -- replication lag (messages and duration) +- **Resources** -- namespace size, CPU and memory utilization, pending checkpoint operations + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. 
+The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. 
The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +123,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. 
**Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +161,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. + +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
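One way to picture the override rule: stock and user profiles form a single catalog keyed by `id`, and a user entry with the same `id` replaces the stock entry. The sketch below assumes each profile is a plain dictionary with an `id` key; the function name `merge_profiles` and the metric names in the usage note are hypothetical and do not reflect the collector's real data structures.

```python
def merge_profiles(stock: list[dict], user: list[dict]) -> dict[str, dict]:
    """Build one catalog keyed by profile id.

    Assumption: ids are compared exactly as written; user profiles are
    applied last, so they win on an id collision and extend the catalog
    otherwise.
    """
    catalog = {profile["id"]: profile for profile in stock}
    for profile in user:
        catalog[profile["id"]] = profile
    return catalog
```

For example, merging a stock `sql_database` profile with a user profile that has the same `id` plus a second user profile `my_custom` would yield a catalog with the user's `sql_database` definition and the extra `my_custom` entry, alongside every untouched stock profile.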
@@ -128,25 +183,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
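To make the basename rule concrete, the sketch below strips a `.yaml` or `.yml` suffix and lowercases the result before comparing names. It is a minimal illustration, assuming that a case-insensitive comparison can be modeled with `str.lower()`; the helpers `profile_basename` and `select_profiles` are hypothetical and are not part of the collector.

```python
from pathlib import PurePosixPath


def profile_basename(filename: str) -> str:
    """Return the lowercase profile basename without a .yaml/.yml suffix."""
    path = PurePosixPath(filename)
    name = path.name
    if path.suffix.lower() in (".yaml", ".yml"):
        name = path.stem
    return name.lower()


def select_profiles(requested: list[str], available_files: list[str]) -> list[str]:
    """Map requested names (any case) to the matching profile files."""
    available = {profile_basename(f): f for f in available_files}
    return [
        available[profile_basename(name)]
        for name in requested
        if profile_basename(name) in available
    ]
```

With this reading, a name such as `SQL_Database` listed under `profiles.mode_exact.names` would match a file named `sql_database.yaml`, and a name with no matching file would simply be skipped in this sketch (how the collector reports such a mismatch is not covered here).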
@@ -188,14 +321,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +351,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles

-###### Specific profiles only
-
-Monitor only specific Azure services instead of auto-discovering all resource types.
+Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +361,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group
+###### Custom Azure Resource Graph KQL

-Only monitor resources in specific resource groups.
+Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +404,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +412,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -332,6 +469,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -426,31 +564,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
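To see why a larger offset closes gaps, it can help to compute the query window by hand. The sketch below subtracts the offset from the current time, as the documentation describes, and then makes two assumptions that are not confirmed by the source: the window end is aligned down to a time-grain boundary, and the window spans exactly one grain. The function name `query_window` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def query_window(now: datetime, offset_seconds: int, grain_seconds: int):
    """Return (start, end) of a one-grain window ending offset_seconds ago.

    Assumptions: end is aligned down to a grain boundary and the window
    covers a single time grain.
    """
    end = now - timedelta(seconds=offset_seconds)
    epoch = int(end.timestamp())
    aligned = epoch - epoch % grain_seconds
    end_aligned = datetime.fromtimestamp(aligned, tz=timezone.utc)
    start = end_aligned - timedelta(seconds=grain_seconds)
    return start, end_aligned
```

At 12:10:30 UTC with the default 180-second offset and a 60-second grain, this sketch would request 12:06:00 to 12:07:00, a minute that Azure Monitor has usually finished reporting. Raising the offset to 300 seconds moves the window further back, trading freshness for completeness.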
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Storage Account.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Storage Account.mdx index b391e714b..836c78418 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Storage Account.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Storage Account.mdx @@ -26,40 +26,77 @@ Module: azure_monitor ## Overview -Monitor Azure Storage Account operations including transaction counts, availability percentages, success and end-to-end latency, ingress and egress throughput, and used capacity. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. -The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure Storage Account with metrics covering: + +- **Transactions** -- transaction count +- **Latency** -- end-to-end latency and server latency (average/max) +- **Throughput** -- ingress and egress bytes per second +- **Availability** -- service availability percentage +- **Capacity** -- used storage capacity + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. 
Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. +The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). 
+ +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +120,36 @@ UI configuration requires paid Netdata Cloud plan. 
#### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. **Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +158,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. 
+ +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +180,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
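
The basename rule stated at the end of this hunk ("the profile filename without the `.yaml` / `.yml` suffix", matched case-insensitively) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the collector's actual Go implementation: the helper names `profile_basename` and `is_selected` are hypothetical, and treating the suffix itself as case-insensitive is an assumption the documentation does not confirm.

```python
from pathlib import PurePosixPath


def profile_basename(filename: str) -> str:
    """Return a normalized basename for a profile file (hypothetical helper).

    Strips a trailing .yaml/.yml suffix and lowercases the result so that
    comparisons are case-insensitive, as the option text describes.
    Assumption: the suffix check is also case-insensitive.
    """
    path = PurePosixPath(filename)
    name = path.name
    if path.suffix.lower() in (".yaml", ".yml"):
        name = name[: -len(path.suffix)]
    return name.lower()


def is_selected(filename: str, configured_names: list[str]) -> bool:
    """Check a profile file against names from profiles.mode_exact.names."""
    wanted = {n.lower() for n in configured_names}
    return profile_basename(filename) in wanted
```

With `names: [sql_database, postgres_flexible]`, a file named `SQL_Database.yaml` would be selected under this sketch, while `redis_cache.yaml` would not.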
@@ -188,14 +318,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +348,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +358,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +401,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +409,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -325,6 +459,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -409,31 +544,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Stream Analytics Job.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Stream Analytics Job.mdx index f65b6b351..ee89c6d88 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Stream Analytics Job.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Stream Analytics Job.mdx @@ -26,40 +26,78 @@ Module: azure_monitor ## Overview -Monitor Stream Analytics jobs including input and output event counts, streaming unit utilization, watermark delay, backlogged input events, runtime and data conversion errors, out-of-order events, and late input events. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. 
-The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure Stream Analytics with metrics covering: + +- **Events** -- event flow (in/out), backlogged input events +- **Errors** -- runtime errors, data conversion errors, deserialization errors +- **Timing** -- late, early, and out-of-order events, watermark delay +- **Resources** -- CPU and streaming unit memory utilization +- **Input** -- input data throughput, input sources received +- **Functions** -- function events and requests (total/failed) + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. 
+The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. 
The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +121,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. 
**Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +159,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. + +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +181,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
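
The `query_offset` description in this hunk says the configured value is a minimum floor and that longer time grains use at least one full time grain as the effective offset. The sketch below illustrates that arithmetic in Python. It is an assumption-based reading of the option text, not the collector's code: `effective_offset` and `query_window_end` are hypothetical names, and using exactly `max(query_offset, time_grain)` is the simplest value consistent with "at least one full time grain".

```python
from datetime import datetime, timedelta, timezone


def effective_offset(query_offset_s: int, time_grain_s: int) -> int:
    """Smallest offset (seconds) consistent with the documented behavior.

    Assumption: the configured query_offset is a floor, and a batch whose
    time grain is longer than it uses one full time grain instead.
    """
    return max(query_offset_s, time_grain_s)


def query_window_end(now: datetime, query_offset_s: int, time_grain_s: int) -> datetime:
    """Hypothetical end of the metric query window: now minus the effective offset."""
    return now - timedelta(seconds=effective_offset(query_offset_s, time_grain_s))


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    # PT1M batch with the default 180s offset
    print(query_window_end(now, 180, 60).isoformat())
    # PT5M batch: the time grain exceeds the configured offset
    print(query_window_end(now, 180, 300).isoformat())
```

Under these assumptions, raising `query_offset` to 240 or 300 only changes the result for batches whose time grain is shorter than the new value.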
@@ -188,14 +319,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +349,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +359,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +402,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +410,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -330,6 +465,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -418,31 +554,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Synapse Analytics Workspace.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Synapse Analytics Workspace.mdx index b859687bd..4f8392e44 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Synapse Analytics Workspace.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Synapse Analytics Workspace.mdx @@ -26,40 +26,77 @@ Module: azure_monitor ## Overview -Monitor Synapse Analytics workspaces including pipeline and activity run metrics, SQL request counts and data processing volumes, data flow activity execution, integration runtime CPU and memory utilization, and link table event processing. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. 
-The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure Synapse Analytics with metrics covering: + +- **Pipeline** -- pipeline runs, activity runs, trigger runs +- **SQL pool** -- built-in SQL pool requests, login attempts, data processed +- **Streaming** -- event flow (in/out), event timing (late/early/out-of-order/backlogged), watermark delay, resource utilization, errors +- **Streaming I/O** -- input data throughput, input sources received +- **Link** -- connection events, processed data volume, changed rows, processing latency, table events + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. 
+The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. 
The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +120,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. 
**Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +158,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. + +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
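The override rule for user profiles can be pictured as a merge keyed by profile `id`. The sketch below is only an illustration under stated assumptions: profiles are reduced to dictionaries with an `id` key and a hypothetical `source` marker, ids are compared exactly, and `merge_profiles` is not a function of the collector.

```python
def merge_profiles(stock: list[dict], user: list[dict]) -> dict[str, dict]:
    """Build a profile catalog keyed by id.

    Assumption: a user profile replaces a stock profile with the same id,
    and a user profile with a new id is added to the catalog.
    """
    catalog = {profile["id"]: profile for profile in stock}
    for profile in user:
        catalog[profile["id"]] = profile  # override or extend
    return catalog


if __name__ == "__main__":
    stock = [
        {"id": "sql_database", "source": "stock"},
        {"id": "redis_cache", "source": "stock"},
    ]
    user = [
        {"id": "sql_database", "source": "user"},  # overrides the stock profile
        {"id": "my_custom", "source": "user"},     # hypothetical custom profile
    ]
    for profile_id, profile in sorted(merge_profiles(stock, user).items()):
        print(profile_id, profile["source"])
```

The merge only changes which profiles are in the catalog; resource discovery still runs as configured under `discovery.*`.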
@@ -128,25 +180,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
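The basename rule above can be sketched as follows. This is a hedged illustration rather than the collector's code: `profile_basename` and `select_profiles` are hypothetical names, and the sketch assumes matching works by lowercasing both the requested name and the filename with its `.yaml` / `.yml` suffix removed.

```python
from pathlib import PurePosixPath


def profile_basename(filename: str) -> str:
    """Return the lowercased filename without a .yaml / .yml suffix."""
    name = PurePosixPath(filename).name.lower()
    for suffix in (".yaml", ".yml"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def select_profiles(requested: list[str], available_files: list[str]) -> list[str]:
    """Return the available files whose basename matches a requested name.

    Assumption: matching is case-insensitive and unknown names are skipped.
    """
    by_name = {profile_basename(path): path for path in available_files}
    return [by_name[name.lower()] for name in requested if name.lower() in by_name]


if __name__ == "__main__":
    files = ["sql_database.yaml", "postgres_flexible.yml", "redis_cache.yaml"]
    # Mixed-case names, as they might be written under profiles.mode_exact.names.
    print(select_profiles(["SQL_Database", "postgres_flexible"], files))
```

With this rule, `SQL_Database` and `sql_database` select the same profile file.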
@@ -188,14 +318,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +348,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +358,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +401,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +409,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -328,6 +462,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -424,31 +559,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure VPN Gateway.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure VPN Gateway.mdx index 647208737..757189e92 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure VPN Gateway.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure VPN Gateway.mdx @@ -26,40 +26,80 @@ Module: azure_monitor ## Overview -Monitor VPN Gateway including site-to-site bandwidth and BGP peer status, point-to-site connection counts and bandwidth, per-tunnel ingress and egress traffic with packet counts and drops, IPsec security association counts, route table sizes, NAT flow counts and packet translations, and gateway-level bandwidth utilization. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. 
-The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure VPN Gateway with metrics covering: + +- **Site-to-site** -- S2S bandwidth, tunnel bandwidth, tunnel bytes (ingress/egress), tunnel packets and drops +- **Point-to-site** -- P2S connection count, P2S bandwidth +- **BGP** -- BGP peer status, routes advertised and learned +- **ExpressRoute** -- ExpressRoute gateway bandwidth, CPU, packets, active flows, route changes, VMs in VNet +- **IPsec** -- MMSA and QMSA security association counts +- **NAT** -- NAT flows, NAT allocations, NATed bytes and packets, NAT packet drops +- **Routes** -- user VPN and VNet prefix route counts +- **Flows** -- gateway inbound/outbound flows, tunnel total flows, peak PPS, TS mismatch drops + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. 
+The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. 
The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +123,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. 
**Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +161,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. + +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +183,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
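The basename rule described above (case-insensitive matching, with the `.yaml` / `.yml` suffix dropped) can be illustrated with a minimal Python sketch. This is not the collector's code: `profile_basename` and `select_profiles` are hypothetical helpers, and the file names are made-up examples.

```python
from pathlib import PurePosixPath

# Suffixes stripped from a profile filename to obtain its basename.
PROFILE_SUFFIXES = (".yaml", ".yml")


def profile_basename(filename: str) -> str:
    """Return the lowercased filename without a .yaml/.yml suffix (hypothetical helper)."""
    lowered = PurePosixPath(filename).name.lower()
    for suffix in PROFILE_SUFFIXES:
        if lowered.endswith(suffix):
            return lowered[: -len(suffix)]
    return lowered


def select_profiles(available_files, requested_names):
    """Return the available files whose basename matches a requested name, ignoring case."""
    wanted = {name.lower() for name in requested_names}
    return [f for f in available_files if profile_basename(f) in wanted]
```

Under these assumptions, a name such as `SQL_Database` listed in `profiles.mode_exact.names` would select a file named `sql_database.yaml`.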
@@ -188,14 +321,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +351,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +361,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. This is useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +404,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +412,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -334,6 +471,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -451,31 +589,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode` is `auto` (the default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds.
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Virtual Machine Scale Set.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Virtual Machine Scale Set.mdx index 33655b743..fa1370329 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Virtual Machine Scale Set.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Virtual Machine Scale Set.mdx @@ -26,40 +26,82 @@ Module: azure_monitor ## Overview -Monitor Virtual Machine Scale Sets including CPU utilization, available memory percentage, disk IOPS and throughput for OS, data, temp, and premium cache disks, disk burst and VM-level burst credit balances, network traffic, and inbound/outbound flow creation rates across all instances in the scale set. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. + +::: + +Monitor Azure Virtual Machine Scale Sets with metrics covering: + +- **Compute** -- CPU utilization, CPU credits (consumed/remaining) +- **Memory** -- available memory (bytes and percentage) +- **Disk** -- IOPS, throughput, latency, and queue depth for OS, data, and temp disks +- **Disk burst** -- burst credits and capacity for OS and data disks, VM-level cached/uncached burst credits +- **Disk cache** -- premium OS and data disk cache hit/miss rates +- **Network** -- traffic in/out, network flows, flow creation rate +- **Availability** -- VMSS availability state +- **Throttling** -- cached and uncached I/O bandwidth/IOPS throttling + +Metrics are aggregated across all instances in the scale set. 
-The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. +The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. 
+- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. 
+ +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +125,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. **Reader** role for Azure Resource Graph queries (for resource discovery) +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | + +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ 
+ --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +163,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +**Profile file locations:** -User profile files with the same filename override stock profiles. +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | + +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +185,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
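The `query_offset` floor described in the options above can be sketched in Python. This is a minimal illustration under stated assumptions, not the collector's implementation: `effective_offset` and `query_window` are hypothetical names, and the window shape (one time grain ending at `now - effective offset`) is an assumption made only to show where the offset applies.

```python
from datetime import datetime, timedelta, timezone


def effective_offset(query_offset_s: int, time_grain_s: int) -> int:
    """Treat query_offset as a floor: use at least one full time grain."""
    return max(query_offset_s, time_grain_s)


def query_window(now: datetime, query_offset_s: int, time_grain_s: int):
    """Return (start, end) for one time grain ending at now minus the effective offset.

    The one-grain window length is an assumption for illustration only.
    """
    end = now - timedelta(seconds=effective_offset(query_offset_s, time_grain_s))
    start = end - timedelta(seconds=time_grain_s)
    return start, end


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    # PT1M batch with the default 180s offset: the configured floor applies.
    print(query_window(now, 180, 60))
    # PT5M batch: the time grain (300s) exceeds the configured 180s offset.
    print(query_window(now, 180, 300))
```

With the default of 180 seconds, a `PT1M` batch keeps the configured offset, while a `PT5M` batch is pushed back by a full 300-second grain.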
@@ -188,14 +323,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +353,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) +###### Managed identity with exact profiles -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
- -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +363,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. This is useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +406,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +414,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -346,6 +485,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -460,31 +600,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Collecting Metrics/Cloud and DevOps/Azure Virtual Machine.mdx b/docs/Collecting Metrics/Cloud and DevOps/Azure Virtual Machine.mdx index 843772613..78c6ae8b2 100644 --- a/docs/Collecting Metrics/Cloud and DevOps/Azure Virtual Machine.mdx +++ b/docs/Collecting Metrics/Cloud and DevOps/Azure Virtual Machine.mdx @@ -26,40 +26,80 @@ Module: azure_monitor ## Overview -Monitor Azure Virtual Machines including CPU utilization, available memory percentage, disk IOPS and throughput for OS, data, temp, and premium cache disks, disk burst and VM-level burst credit balances, network traffic, and inbound/outbound flow creation rates. +:::info +This is part of the [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) collector. No separate setup is needed -- a single Azure Monitor job discovers and monitors all supported resource types automatically. 
-The collector uses Azure SDK clients for: -- Authentication via Entra ID (service principal, managed identity, or default credentials) -- Resource discovery via Azure Resource Graph queries -- Metrics collection via Azure Monitor Metrics batch API, grouped by region and time grain +::: + +Monitor Azure Virtual Machines with metrics covering: + +- **Compute** -- CPU utilization, CPU credits (consumed/remaining) +- **Memory** -- available memory (bytes and percentage) +- **Disk** -- IOPS, throughput, latency, and queue depth for OS, data, and temp disks +- **Disk burst** -- burst credits and capacity for OS and data disks, VM-level cached/uncached burst credits +- **Disk cache** -- premium OS and data disk cache hit/miss rates +- **Network** -- traffic in/out, network flows, flow creation rate +- **Availability** -- VM availability state +- **Throttling** -- cached and uncached I/O bandwidth/IOPS throttling + + +It uses the [Azure Monitor Metrics batch API](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-batch-api) to collect metrics, grouping requests by subscription, region, and time grain. Resources are discovered via [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview) queries at startup and refreshed periodically. Authentication is handled through [Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity/) (service principal, managed identity, or default credentials). This collector is supported on all platforms. This collector supports collecting metrics from multiple instances of this integration, including remote instances. -The monitoring principal needs read access to Azure Resource Graph and Azure Monitor metrics for target resources. 
+The service principal or managed identity requires these Azure RBAC roles: + +| Role | Purpose | Scope | +|:-----|:--------|:------| +| **Monitoring Reader** | Read Azure Monitor metrics | Subscription or resource group | +| **Reader** | Query Azure Resource Graph for resource discovery | Subscription or resource group | ### Default Behavior #### Auto-Detection -When `profile_selection_mode` is `auto` (the default), the collector queries Azure Resource Graph -to discover which resource types exist in the subscription and enables matching built-in profiles automatically. +The collector has two discovery phases: + +**Bootstrap (first run)** + +- With the default `profiles.mode: auto`, the collector queries Azure Resource Graph within the configured `subscription_ids` to find candidate resources. +- It matches discovered resource types against built-in profiles and automatically enables the relevant ones. +- Discovery scope can be narrowed using `discovery.mode: filters` (resource groups, regions, tags) or replaced entirely with `discovery.mode: query` for a custom KQL query. +- A single job can monitor multiple subscriptions. + +**Runtime (periodic refresh)** + +- Periodically re-discovers resources for **already-active profile types only**. +- Controlled by `discovery.refresh_every` (default: 300 seconds, set to 0 to disable). + +> **Important:** Runtime refresh does not activate new profiles. If a new resource type appears after bootstrap, restart the collector to pick it up. #### Limits -Azure Monitor metrics granularity is typically 1 minute. -The collector enforces a minimum collection interval of 60 seconds. +- **Minimum collection interval:** 60 seconds (enforced). Azure Monitor metrics granularity is typically 1 minute. +- **Metrics reporting delay:** Azure Monitor metrics have a 1-3 minute reporting delay. 
The collector uses `query_offset` (default: 180s) as a minimum offset and automatically uses a larger effective offset for slower time-grain batches when needed. +- **API throttling:** Azure Monitor applies per-subscription rate limits. The collector uses bounded concurrency and batching to stay within limits, but monitoring many resources in a single subscription may require tuning `limits.*` options. #### Performance Impact -The collector uses bounded request concurrency and batches resources and metrics to minimize API calls. -Default limits: 4 concurrent queries, 50 resources per batch, 20 metrics per query. +The collector batches resources and metrics to minimize Azure API calls and uses bounded concurrency to avoid overwhelming the API. + +**Default concurrency and batching limits:** + +| Setting | Default | Description | +|:--------|:--------|:------------| +| `limits.max_concurrency` | 4 | Maximum concurrent batch queries | +| `limits.max_batch_resources` | 50 | Maximum resources per batch request | +| `limits.max_metrics_per_query` | 20 | Maximum metrics per batch request | + +For large deployments, consider splitting resources across multiple jobs. If you hit Azure API rate limits, reduce `max_concurrency`. ## Setup @@ -83,25 +123,36 @@ UI configuration requires paid Netdata Cloud plan. #### Create an Azure monitoring principal -Create a service principal or use a managed identity with the following permissions: +The collector requires a service principal or managed identity with two Azure RBAC roles: + +| Role | Purpose | +|:-----|:--------| +| **Monitoring Reader** | Access Azure Monitor metrics for target resources | +| **Reader** | Query Azure Resource Graph for resource discovery | -1. **Monitoring Reader** role on the target subscription or resource groups (for Azure Monitor metrics access) -2. 
**Reader** role for Azure Resource Graph queries (for resource discovery) +**Option A: Service principal** -For service principal authentication: ```bash -# Create the service principal +# Create service principal with Monitoring Reader role az ad sp create-for-rbac --name "netdata-monitor" --role "Monitoring Reader" \ --scopes /subscriptions/ +# Add the Reader role for resource discovery +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ + # Note the appId (client_id), password (client_secret), and tenant ``` -For managed identity (on Azure VMs, VMSS, or AKS): +**Option B: Managed identity** (Azure VMs, VMSS, or AKS) + ```bash -# Assign Monitoring Reader role to the VM's managed identity +# Assign both roles to the VM's managed identity az role assignment create --assignee \ --role "Monitoring Reader" --scope /subscriptions/ + +az role assignment create --assignee \ + --role "Reader" --scope /subscriptions/ ``` @@ -110,13 +161,17 @@ az role assignment create --assignee \ #### Options -The following options can be defined globally: update_every, autodetection_retry. +The following options can be defined globally: `update_every`, `autodetection_retry`. + +**Profile file locations:** -Profile files are loaded from: -- Stock: `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` -- User: `/etc/netdata/go.d/azure_monitor.profiles/` +| Type | Path | +|:-----|:-----| +| Stock profiles | `/usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` | +| User overrides | `/etc/netdata/go.d/azure_monitor.profiles/` | -User profile files with the same filename override stock profiles. +User profile files with the same `id` as a stock profile override it. +Custom profiles extend the collector's catalog -- they do not replace the discovery mechanism.
@@ -128,25 +183,103 @@ User profile files with the same filename override stock profiles. |:------|:-----|:------------|:--------|:---------:| | **Collection** | update_every | Data collection interval (seconds). Must be at least 60. | 60 | no | | | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no | -| **Target** | subscription_id | Azure subscription ID. | | yes | +| | subscription_ids | List of Azure subscription IDs to monitor. Used as the scope for resource discovery. | | yes | | | cloud | Azure cloud environment: `public`, `government`, or `china`. | public | no | -| **Collection** | discovery_every | Resource discovery interval in seconds. | 300 | no | -| | query_offset | Offset in seconds for metric query windows. Increase if metrics appear incomplete. | 180 | no | +| | [query_offset](#option-collection-query-offset) | Minimum offset (seconds) subtracted from metric query windows. Increase if metrics appear incomplete. | 180 | no | | | timeout | Timeout for Azure Resource Graph and Azure Monitor API requests, in seconds. | 30 | no | -| **Limits** | max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | -| | max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | -| | max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | -| **Profiles** | profile_selection_mode | Profile selection mode: `auto` discovers matching profiles via Azure Resource Graph, `exact` uses only listed profile ids, `combined` merges listed ids with auto-discovered profiles. | auto | no | -| | profile_selection_mode_exact.profiles | Profile ids to enable (used when `profile_selection_mode` is `exact`). | [] | no | -| | profile_selection_mode_combined.profiles | Profile ids to merge with auto-discovered profiles (used when `profile_selection_mode` is `combined`). 
| [] | no | -| **Filters** | resource_groups | Optional list of resource group names to restrict monitoring scope. | [] | no | -| **Authentication** | auth.mode | Authentication mode: `service_principal`, `managed_identity`, or `default`. | | yes | +| **Authentication** | [auth.mode](#option-authentication-auth-mode) | Authentication method: `service_principal`, `managed_identity`, or `default`. | | yes | | | auth.mode_service_principal.tenant_id | Entra ID tenant ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_id | Entra ID application (client) ID (required for `service_principal` mode). | | no | | | auth.mode_service_principal.client_secret | Entra ID client secret (required for `service_principal` mode). | | no | | | auth.mode_managed_identity.client_id | Client ID for user-assigned managed identity. Leave empty for system-assigned. | | no | +| **Discovery** | discovery.refresh_every | Interval (seconds) for refreshing discovered resources. Set `0` to disable runtime re-discovery after bootstrap. | 300 | no | +| | [discovery.mode](#option-discovery-discovery-mode) | Resource discovery method: `filters` (structured filters) or `query` (custom KQL). | filters | no | +| | discovery.mode_filters.resource_groups | Optional list of Azure resource groups to include in `filters` mode. | [] | no | +| | discovery.mode_filters.regions | Optional list of Azure regions to include in `filters` mode. | [] | no | +| | discovery.mode_filters.tags | Optional exact-match tag filters for `filters` mode. Keys are matched case-insensitively and values case-sensitively. | \{} | no | +| | [discovery.mode_query.kql](#option-discovery-discovery-mode-query-kql) | Custom Azure Resource Graph KQL for `query` mode. Must project `id`, `name`, `type`, `resourceGroup`, `location`. 
| | no | +| **Profiles** | [profiles.mode](#option-profiles-profiles-mode) | How profiles are selected: `auto` (discover from resources), `exact` (explicit list), or `combined` (both). | auto | no | +| | profiles.mode_exact.names | Explicit profile file basenames used by `exact` mode. Matching is case-insensitive. | [] | no | +| | profiles.mode_combined.names | Explicit profile file basenames merged with auto-discovered profiles in `combined` mode. Matching is case-insensitive. | [] | no | +| **Limits** | limits.max_concurrency | Maximum concurrent batch queries to Azure Monitor. | 4 | no | +| | limits.max_batch_resources | Maximum resources per Azure Monitor batch request. | 50 | no | +| | limits.max_metrics_per_query | Maximum metrics per Azure Monitor batch request. | 20 | no | | **Virtual Node** | vnode | Associates this data collection job with a [Virtual Node](https://learn.netdata.cloud/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts#virtual-nodes). | | no | + +##### query_offset + +Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector subtracts this offset from the current time when building metric query windows to avoid fetching incomplete data points. + +The configured `query_offset` acts as a minimum floor. For slower metric batches, the collector automatically uses a larger effective offset when the batch time grain is longer than the configured value. + +- **Default (180s)** works for most services. +- **Longer time grains** (for example `PT5M`) automatically use at least one full time grain as the effective offset. +- **Increase to 240-300s** if you still see gaps or missing data points. +- **Do not set below 60s** -- metrics will likely be incomplete. + + + +##### auth.mode + +Determines how the collector authenticates with Azure. 
+ +| Mode | When to use | Required options | +|:-----|:------------|:-----------------| +| `service_principal` | Running outside Azure, or when you need explicit credentials | `tenant_id`, `client_id`, `client_secret` | +| `managed_identity` | Running on Azure VMs, VMSS, or AKS with a managed identity | Optionally `client_id` for user-assigned identity | +| `default` | Uses the Azure SDK default credential chain (environment variables, managed identity, Azure CLI, etc.) | None | + + + +##### discovery.mode + +Controls how the collector finds candidate Azure resources. + +| Mode | Behavior | +|:-----|:---------| +| `filters` | Builds an Azure Resource Graph query from the structured `mode_filters.*` options (resource groups, regions, tags). This is the default. | +| `query` | Uses the raw KQL you provide in `discovery.mode_query.kql`. The query must project `id`, `name`, `type`, `resourceGroup`, and `location`. | + + + +##### discovery.mode_query.kql + +A raw Azure Resource Graph KQL query used when `discovery.mode` is `query`. + +The query **must** project these five columns: + +| Column | Description | +|:-------|:------------| +| `id` | Full Azure resource ID (ARM format) | +| `name` | Resource name | +| `type` | Resource type (e.g., `microsoft.sql/servers/databases`) | +| `resourceGroup` | Resource group name | +| `location` | Azure region | + +Example: + +``` +resources +| where tags.env =~ "prod" +| project id, name, type, resourceGroup, location +``` + + + +##### profiles.mode + +Controls how the collector decides which metric profiles to activate. + +| Mode | Behavior | +|:-----|:---------| +| `auto` | Discovers resource types in your subscriptions and enables matching built-in profiles automatically. This is the default. | +| `exact` | Uses only the profile basenames listed under `profiles.mode_exact.names`. No auto-discovery. | +| `combined` | Merges auto-discovered profiles with the basenames listed under `profiles.mode_combined.names`. 
| + +Profile basename matching is case-insensitive. A basename is the profile filename without the `.yaml` / `.yml` suffix. + +
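The basename rule described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior (drop the `.yaml` / `.yml` suffix, then compare names case-insensitively); the helpers `profile_basename` and `matches_profile` are hypothetical and are not part of the collector.

```python
from pathlib import PurePosixPath

# Suffixes named in the documentation above.
PROFILE_SUFFIXES = (".yaml", ".yml")


def profile_basename(filename: str) -> str:
    """Return the lowercased profile basename for a profile file path.

    Hypothetical helper: it strips any directory part and a trailing
    .yaml / .yml suffix, then lowercases the result so that comparisons
    are case-insensitive.
    """
    name = PurePosixPath(filename).name
    for suffix in PROFILE_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return name.lower()


def matches_profile(configured_name: str, filename: str) -> bool:
    """Check whether a configured name refers to the given profile file."""
    return configured_name.lower() == profile_basename(filename)
```

Under these assumptions, a configured name such as `SQL_Database` would match a file named `sql_database.yaml`, while `redis_cache` would not.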
@@ -188,14 +321,28 @@ sudo ./edit-config go.d/azure_monitor.conf ##### Examples -###### Service principal (auto-discover all resources) +###### Service principal with structured discovery -Authenticate with a service principal and auto-discover all supported Azure resource types in the subscription. +Authenticate with a service principal and auto-discover resources across two subscriptions, filtered to the `production-rg` resource group in `eastus` with the tag `env=prod`. ```yaml jobs: - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" + - "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb" + discovery: + mode: filters + mode_filters: + resource_groups: + - production-rg + regions: + - eastus + tags: + env: + - prod + profiles: + mode: auto auth: mode: service_principal mode_service_principal: @@ -204,26 +351,9 @@ jobs: client_secret: "your-client-secret" ``` -###### Managed identity (Azure VM/VMSS/AKS) - -Use the managed identity of the Azure VM, VMSS, or AKS node where Netdata is running. - -
-Config - -```yaml -jobs: - - name: prod - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - auth: - mode: managed_identity - -``` -
+###### Managed identity with exact profiles -###### Specific profiles only - -Monitor only specific Azure services instead of auto-discovering all resource types. +Use a managed identity (on an Azure VM, VMSS, or AKS) and monitor only SQL Database and PostgreSQL Flexible Server resources -- skip auto-discovery of other services.
Config @@ -231,35 +361,41 @@ Monitor only specific Azure services instead of auto-discovering all resource ty ```yaml jobs: - name: databases - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" profiles: - - sql_database - - postgres_flexible - - redis_cache + mode: exact + mode_exact: + names: + - sql_database + - postgres_flexible auth: - mode: service_principal - mode_service_principal: - tenant_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - client_secret: "your-client-secret" + mode: managed_identity ```
-###### Filter by resource group +###### Custom Azure Resource Graph KQL -Only monitor resources in specific resource groups. +Replace the built-in discovery filters with your own KQL query. Useful when you need joins, computed columns, or filtering logic that structured filters cannot express.
Config ```yaml jobs: - - name: prod-rg - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" - resource_groups: - - production-rg - - staging-rg + - name: prod-query + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + discovery: + mode: query + mode_query: + kql: | + resources + | where tags.env =~ "prod" + | project id, name, type, resourceGroup, location + profiles: + mode: auto auth: mode: default @@ -268,7 +404,7 @@ jobs: ###### Azure Government cloud -Connect to Azure Government cloud environment. +Connect to an Azure Government environment. Set `cloud: government` to use the correct authentication and API endpoints.
Config @@ -276,7 +412,8 @@ Connect to Azure Government cloud environment. ```yaml jobs: - name: gov - subscription_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + subscription_ids: + - "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cloud: government auth: mode: service_principal @@ -344,6 +481,7 @@ Labels: | region | The Azure region where the resource is deployed. | | resource_type | The Azure resource type identifier. | | profile | The Azure Monitor profile id. | +| subscription_id | The Azure subscription identifier. | | resource_uid | The unique Azure resource identifier. | Metrics: @@ -458,31 +596,46 @@ docker logs netdata 2>&1 | grep azure_monitor ### No metrics are collected -Verify the following: -1. The service principal or managed identity has **Monitoring Reader** role on the subscription or resource group. -2. The `subscription_id` in the configuration matches the subscription containing the target resources. -3. Target resources are running and producing metrics (check Azure Portal > Metrics for the resource). -4. Check the Netdata error log for authentication or API errors: `grep azure_monitor /var/log/netdata/error.log`. +Check the following: + +- **Permissions** -- The principal has both **Monitoring Reader** and **Reader** roles on the target subscription. +- **Subscription IDs** -- The `subscription_ids` list includes the correct subscription(s). +- **Resources are active** -- Verify in Azure Portal > Metrics that the resources are producing metrics. +- **Collector logs** -- Check for authentication or API errors: + ```bash + # systemd + journalctl -u netdata --namespace=netdata --grep azure_monitor --since "5 minutes ago" + # non-systemd + grep azure_monitor /var/log/netdata/collector.log + ``` ### Missing metrics for some resource types -Azure Monitor profiles are matched by resource type. If a resource type exists but no metrics appear: -1. Ensure `profiles: [auto]` (default) is set, or the specific profile id is listed. -2. 
Verify the resource type matches a built-in profile. Run `ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/` to see available profiles. -3. Some metrics require the resource to be actively processing data (e.g., IoT Hub telemetry metrics only appear when devices send messages). +Profiles are matched by Azure resource type. If a resource type exists but metrics are missing: + +- **Check profile mode** -- Ensure `profiles.mode: auto` (default), or explicitly list the profile basename under `profiles.mode_exact.names` or `profiles.mode_combined.names`. +- **Verify a built-in profile exists** -- List available profiles: + ```bash + ls /usr/lib/netdata/conf.d/go.d/azure_monitor.profiles/default/ + ``` +- **Check resource activity** -- Some metrics only appear when the resource is actively processing data (e.g., IoT Hub telemetry metrics require devices to be sending messages). +- **New resource types after startup** -- Runtime discovery does not activate new profiles. Restart the collector if new resource types were added after bootstrap. -### Metrics appear delayed +### Charts have gaps or incomplete data -Azure Monitor metrics have a built-in reporting delay of 1-3 minutes. The collector uses a `query_offset` (default: 180 seconds) to account for this. -If metrics are missing or incomplete, try increasing `query_offset` to 240 or 300 seconds. -Some metrics with longer time grains (e.g., PT5M) may take up to 5 minutes to appear. +Azure Monitor metrics have a built-in reporting delay of **1-3 minutes**. + +- The collector uses `query_offset` (default: **180 seconds**) as the minimum offset for metric query windows. +- Slower time-grain batches automatically use a larger effective offset when needed. +- If metrics are still missing or incomplete, increase `query_offset` to **240** or **300** seconds. 
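To make the relationship between `query_offset` and the time grain concrete, here is a minimal Python sketch. It assumes the effective offset is the larger of the configured `query_offset` and one full time grain, which is consistent with the description above but is not confirmed as the collector's exact formula. The function names `grain_seconds` and `effective_offset` are hypothetical.

```python
import re

# Matches simple ISO 8601 time-only durations such as PT1M, PT5M, PT1H.
# Day-based grains (for example P1D) are deliberately not handled here.
_GRAIN_RE = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def grain_seconds(grain: str) -> int:
    """Convert a time grain like 'PT5M' to seconds."""
    match = _GRAIN_RE.match(grain)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported time grain: {grain!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def effective_offset(query_offset: int, grain: str) -> int:
    """Assumed rule: use at least one full time grain as the offset."""
    return max(query_offset, grain_seconds(grain))
```

With the default `query_offset` of 180 seconds, this sketch yields 180 for a `PT1M` batch and 300 for a `PT5M` batch, so raising `query_offset` to 240 would only change the result for batches whose grain is shorter than 240 seconds.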
### Authentication errors in sovereign clouds For Azure Government or Azure China clouds, set the `cloud` parameter: + - Azure Government: `cloud: government` - Azure China (21Vianet): `cloud: china` diff --git a/docs/Netdata Agent/Configuration/Securing Agents/Web Server Reference.mdx b/docs/Netdata Agent/Configuration/Securing Agents/Web Server Reference.mdx index 2641e20c8..73d3ebf96 100644 --- a/docs/Netdata Agent/Configuration/Securing Agents/Web Server Reference.mdx +++ b/docs/Netdata Agent/Configuration/Securing Agents/Web Server Reference.mdx @@ -150,7 +150,7 @@ If you publish your Netdata web server to the internet, you may want to apply so
Disable the Web Server -Edit `netdata.conf` using the [`edit-config` script](/docs/netdata-agent/configuration#edit-configuration-files) +Edit the `[web]` section in `netdata.conf` (see [Configure Basic Settings](#configure-basic-settings) for edit-config usage): ```text [web] @@ -172,6 +172,21 @@ Control the number of threads and sockets with the following settings:
+
+Change the Default Port + +By default, the Netdata web server listens on port `19999`. To change the default port, edit the `[web]` section in `netdata.conf` (see [Configure Basic Settings](#configure-basic-settings) for edit-config usage): + +```text +[web] + default port = 8000 +``` + +If you have configured `[web].bind to` with explicit ports, you must also update those explicit port numbers, or remove the port portion so the entries fall back to using `default port`. +After modifying the configuration, restart the Netdata service to apply changes. See the [service control documentation](/docs/netdata-agent/maintenance/service-control) for instructions. + +
+
Binding Netdata to Multiple Ports diff --git a/docs/Welcome to Netdata/Monitor Anything.mdx b/docs/Welcome to Netdata/Monitor Anything.mdx index 6167604aa..2326ac52c 100644 --- a/docs/Welcome to Netdata/Monitor Anything.mdx +++ b/docs/Welcome to Netdata/Monitor Anything.mdx @@ -246,6 +246,7 @@ Need a dedicated integration? [Submit a feature request](https://github.com/netd | [vm.stats.vm.v_swappgs](/docs/collecting-metrics/operating-systems/vm.stats.vm.v_swappgs) | The metric swap amount of data read from and written to SWAP. | | [vm.swap_info](/docs/collecting-metrics/operating-systems/vm.swap_info) | Collect information about SWAP memory. | | [vm.vmtotal](/docs/collecting-metrics/operating-systems/vm.vmtotal) | Collect Virtual Memory information from host. | +| [Windows Services](/docs/collecting-metrics/operating-systems/windows-services) | This integration monitors resource utilization grouped by Windows Service. | | [Windows Services](/docs/collecting-metrics/operating-systems/windows-services) | This collector monitors Windows Services Status and States. | | [ZRAM](/docs/collecting-metrics/operating-systems/zram) | zRAM, or compressed RAM, is a block device that uses a portion of your system's RAM as a block device. | @@ -325,46 +326,46 @@ Need a dedicated integration? [Submit a feature request](https://github.com/netd |-------------|-------------| | [AWS EC2 Compute instances](/docs/collecting-metrics/cloud-and-devops/aws-ec2-compute-instances) | Track AWS EC2 instances key metrics for optimized performance and cost management. | | [AWS Quota](/docs/collecting-metrics/cloud-and-devops/aws-quota) | Monitor AWS service quotas for effective resource usage and cost management. 
| -| [Azure API Management](/docs/collecting-metrics/cloud-and-devops/azure-api-management) | Monitor API Management gateway performance including request throughput, response status codes, gateway and backend response times, failed request counts, capacity utilization, event hub events, websocket message counts, and network connection status. | -| [Azure App Service](/docs/collecting-metrics/cloud-and-devops/azure-app-service) | Monitor App Service web applications including HTTP request rates and response status codes, response times, CPU and memory usage, network throughput, file IO operations, .NET runtime statistics (threads, GC, assemblies), Azure Functions execution counts and units, and Flex Consumption plan metrics. | -| [Azure Application Gateway](/docs/collecting-metrics/cloud-and-devops/azure-application-gateway) | Monitor Application Gateway performance including throughput and traffic volume, request rates and response status codes, backend health and latency breakdown (connect, first byte, last byte), client latency, current and new connections, WebSocket sessions, capacity and compute units, CPU utilization, TLS connections, and WAF security events including rule matches, challenges, and penalty box activity. | -| [Azure Application Insights](/docs/collecting-metrics/cloud-and-devops/azure-application-insights) | Monitor application performance through Application Insights including availability test results and duration, server request rates and response times, dependency call tracking and failures, exception rates by source, browser page load timing breakdown, process CPU and memory usage, IO rates, HTTP request queue depth, page views, and trace volume. 
|
-| [Azure Cache for Redis](/docs/collecting-metrics/cloud-and-devops/azure-cache-for-redis) | Monitor Azure Cache for Redis including cache hit and miss rates, read and write throughput, server load and CPU utilization, memory usage, connected clients, operations per second, command processing rates, latency percentiles, key eviction and expiration, and geo-replication health and sync status. |
-| [Azure Cognitive Services](/docs/collecting-metrics/cloud-and-devops/azure-cognitive-services) | Monitor Azure AI and Cognitive Services including API call volume, success and client error rates, response latency, token processing rates for language models, content safety filtering, fine-tuning operations, provisioned throughput utilization, rate-limiting events, active inference connections, and context token cache performance. |
-| [Azure Container Apps](/docs/collecting-metrics/cloud-and-devops/azure-container-apps) | Monitor Container Apps including CPU and memory usage, network traffic, replica counts, request processing rates, response times, restart frequency, and resource reservation utilization. |
-| [Azure Container Instances](/docs/collecting-metrics/cloud-and-devops/azure-container-instances) | Monitor Container Instance groups including CPU and memory usage and network bytes transferred in and out. |
-| [Azure Container Registry](/docs/collecting-metrics/cloud-and-devops/azure-container-registry) | Monitor Container Registry including storage usage, successful and failed pull and push operation counts, and task run duration. |
-| [Azure Cosmos DB Account](/docs/collecting-metrics/cloud-and-devops/azure-cosmos-db-account) | Monitor Cosmos DB accounts including request unit consumption and throttling, document counts and storage, data and index sizes, replication latency, availability percentages, provisioned throughput utilization, and normalized RU consumption per partition. |
-| [Azure Data Explorer Cluster](/docs/collecting-metrics/cloud-and-devops/azure-data-explorer-cluster) | Monitor Azure Data Explorer (Kusto) clusters including ingestion latency, volume, and success rates, query performance and concurrency, cache utilization, CPU and memory usage, export operations, streaming ingest throughput, materialized view health, instance counts, and follower lag. |
-| [Azure Data Factory](/docs/collecting-metrics/cloud-and-devops/azure-data-factory) | Monitor Data Factory including pipeline, activity, and trigger run success and failure counts, integration runtime CPU and memory utilization, available capacity and queue lengths, SSIS package execution rates, copy operations throughput, data flow processing metrics, and overall factory resource utilization. |
-| [Azure Event Grid Topic](/docs/collecting-metrics/cloud-and-devops/azure-event-grid-topic) | Monitor Event Grid topics including publish success and failure counts, publish latency, event delivery and routing rates, delivery success and failure counts, dead-lettered events, and matched event routing. |
-| [Azure Event Hubs Namespace](/docs/collecting-metrics/cloud-and-devops/azure-event-hubs-namespace) | Monitor Event Hubs namespaces including incoming and outgoing message rates, byte throughput, captured messages and bytes, throttled and quota-exceeded request counts, active connections, and total connection counts. |
-| [Azure ExpressRoute Circuit](/docs/collecting-metrics/cloud-and-devops/azure-expressroute-circuit) | Monitor ExpressRoute circuits including bits per second in and out, ARP and BGP availability percentages, packet drops, and QoS bit rate throughput. |
-| [Azure ExpressRoute Gateway](/docs/collecting-metrics/cloud-and-devops/azure-expressroute-gateway) | Monitor ExpressRoute gateways including bits and packets per second for ingress and egress, connection counts, CPU utilization, active flow counts, and gateway scale unit counts. |
-| [Azure Firewall](/docs/collecting-metrics/cloud-and-devops/azure-firewall) | Monitor Azure Firewall including data processed, throughput, application and network rule hit counts, SNAT port utilization, health state percentage, and latency probes. |
-| [Azure Front Door](/docs/collecting-metrics/cloud-and-devops/azure-front-door) | Monitor Azure Front Door including request counts and rates, response sizes, total latency, origin health probe percentages, origin request counts, origin latency, WAF request counts by action and rule, and WebSocket connection metrics. |
-| [Azure Functions](/docs/collecting-metrics/cloud-and-devops/azure-functions) | Monitor Azure Functions execution including function invocation counts, execution units (MB-milliseconds), HTTP request rates and response codes, CPU and memory consumption, and Flex Consumption plan metrics for always-ready and on-demand instances. |
-| [Azure IoT Hub](/docs/collecting-metrics/cloud-and-devops/azure-iot-hub) | Monitor IoT Hub including device telemetry message rates and quota usage, routing delivery and latency, device twin read and write operations, direct method invocations, cloud-to-device messaging and feedback, job completion rates, device connection and authentication events, and event grid publish status. |
-| [Azure Key Vault](/docs/collecting-metrics/cloud-and-devops/azure-key-vault) | Monitor Key Vault including overall vault availability, API saturation approaching service limits, and service API hit and latency metrics. |
-| [Azure Kubernetes Service Cluster](/docs/collecting-metrics/cloud-and-devops/azure-kubernetes-service-cluster) | Monitor AKS cluster health including API server and etcd resource usage, pod scheduling status and readiness, node capacity and conditions, cluster autoscaler behavior, and per-node CPU, memory, disk, and network utilization. |
-| [Azure Load Balancer](/docs/collecting-metrics/cloud-and-devops/azure-load-balancer) | Monitor Azure Load Balancer health and throughput including data path and health probe availability, SYN and SNAT connection counts, byte and packet throughput, allocated and used SNAT ports, and connection attempt rates. |
-| [Azure Log Analytics Workspace](/docs/collecting-metrics/cloud-and-devops/azure-log-analytics-workspace) | Monitor Log Analytics workspaces including ingestion volume and latency, query execution counts and volume, available storage capacity, and per-table breakdowns of ingestion rates and billing volume. |
-| [Azure Logic Apps Workflow](/docs/collecting-metrics/cloud-and-devops/azure-logic-apps-workflow) | Monitor Logic Apps workflow execution including run completions and failures, action execution counts, trigger firing rates, run and action latency, billable executions, and action-level success and failure breakdowns. |
-| [Azure Machine Learning Workspace](/docs/collecting-metrics/cloud-and-devops/azure-machine-learning-workspace) | Monitor Azure Machine Learning workspaces including active model deployments and registered models, pipeline run completions and failures, compute node utilization and preemptions, quota usage, managed endpoint request latency and rates, estimated GPU utilization, and storage utilization. |
-| [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) | This collector monitors Azure resources through the Azure Monitor Metrics API. |
-| [Azure MySQL Flexible Server](/docs/collecting-metrics/cloud-and-devops/azure-mysql-flexible-server) | Monitor MySQL Flexible Server including active connections, aborted connections, query rates, replication lag, storage utilization, CPU and memory usage, IO operations, InnoDB buffer pool efficiency, network throughput, and HA replication status. |
-| [Azure NAT Gateway](/docs/collecting-metrics/cloud-and-devops/azure-nat-gateway) | Monitor NAT Gateway including byte and packet counts, connection counts, dropped packets, total SNAT connection counts, and datapath availability. |
-| [Azure PostgreSQL Flexible Server](/docs/collecting-metrics/cloud-and-devops/azure-postgresql-flexible-server) | Monitor PostgreSQL Flexible Server including active connections, transaction rates, replication lag, storage and backup utilization, CPU and memory usage, IO throughput, autovacuum activity, PgBouncer connection pooling, database sessions, and burstable instance CPU credits. |
-| [Azure Service Bus Namespace](/docs/collecting-metrics/cloud-and-devops/azure-service-bus-namespace) | Monitor Service Bus namespaces including incoming and outgoing message rates, active connections, active and dead-lettered message counts, scheduled message counts, completed and abandoned requests, server errors, throttled requests, CPU and memory utilization, and pending checkpoint operations. |
-| [Azure SQL Database](/docs/collecting-metrics/cloud-and-devops/azure-sql-database) | Monitor SQL Database performance including CPU and DTU utilization, storage consumption, active sessions and workers, deadlocks, IO rates, tempdb usage, in-memory OLTP storage, and serverless auto-pause and billing metrics. |
-| [Azure SQL Elastic Pool](/docs/collecting-metrics/cloud-and-devops/azure-sql-elastic-pool) | Monitor SQL Elastic Pool resource consumption including eDTU and CPU utilization, storage usage, active sessions and workers, IO rates, tempdb usage, and in-memory OLTP storage across all databases in the pool. |
-| [Azure SQL Managed Instance](/docs/collecting-metrics/cloud-and-devops/azure-sql-managed-instance) | Monitor SQL Managed Instance performance including virtual core CPU utilization, storage consumption, IO throughput, and average request wait times. |
-| [Azure Storage Account](/docs/collecting-metrics/cloud-and-devops/azure-storage-account) | Monitor Azure Storage Account operations including transaction counts, availability percentages, success and end-to-end latency, ingress and egress throughput, and used capacity. |
-| [Azure Stream Analytics Job](/docs/collecting-metrics/cloud-and-devops/azure-stream-analytics-job) | Monitor Stream Analytics jobs including input and output event counts, streaming unit utilization, watermark delay, backlogged input events, runtime and data conversion errors, out-of-order events, and late input events. |
-| [Azure Synapse Analytics Workspace](/docs/collecting-metrics/cloud-and-devops/azure-synapse-analytics-workspace) | Monitor Synapse Analytics workspaces including pipeline and activity run metrics, SQL request counts and data processing volumes, data flow activity execution, integration runtime CPU and memory utilization, and link table event processing. |
-| [Azure Virtual Machine](/docs/collecting-metrics/cloud-and-devops/azure-virtual-machine) | Monitor Azure Virtual Machines including CPU utilization, available memory percentage, disk IOPS and throughput for OS, data, temp, and premium cache disks, disk burst and VM-level burst credit balances, network traffic, and inbound/outbound flow creation rates. |
-| [Azure Virtual Machine Scale Set](/docs/collecting-metrics/cloud-and-devops/azure-virtual-machine-scale-set) | Monitor Virtual Machine Scale Sets including CPU utilization, available memory percentage, disk IOPS and throughput for OS, data, temp, and premium cache disks, disk burst and VM-level burst credit balances, network traffic, and inbound/outbound flow creation rates across all instances in the scale set. |
-| [Azure VPN Gateway](/docs/collecting-metrics/cloud-and-devops/azure-vpn-gateway) | Monitor VPN Gateway including site-to-site bandwidth and BGP peer status, point-to-site connection counts and bandwidth, per-tunnel ingress and egress traffic with packet counts and drops, IPsec security association counts, route table sizes, NAT flow counts and packet translations, and gateway-level bandwidth utilization. |
+| [Azure API Management](/docs/collecting-metrics/cloud-and-devops/azure-api-management) | :::info |
+| [Azure App Service](/docs/collecting-metrics/cloud-and-devops/azure-app-service) | :::info |
+| [Azure Application Gateway](/docs/collecting-metrics/cloud-and-devops/azure-application-gateway) | :::info |
+| [Azure Application Insights](/docs/collecting-metrics/cloud-and-devops/azure-application-insights) | :::info |
+| [Azure Cache for Redis](/docs/collecting-metrics/cloud-and-devops/azure-cache-for-redis) | :::info |
+| [Azure Cognitive Services](/docs/collecting-metrics/cloud-and-devops/azure-cognitive-services) | :::info |
+| [Azure Container Apps](/docs/collecting-metrics/cloud-and-devops/azure-container-apps) | :::info |
+| [Azure Container Instances](/docs/collecting-metrics/cloud-and-devops/azure-container-instances) | :::info |
+| [Azure Container Registry](/docs/collecting-metrics/cloud-and-devops/azure-container-registry) | :::info |
+| [Azure Cosmos DB Account](/docs/collecting-metrics/cloud-and-devops/azure-cosmos-db-account) | :::info |
+| [Azure Data Explorer Cluster](/docs/collecting-metrics/cloud-and-devops/azure-data-explorer-cluster) | :::info |
+| [Azure Data Factory](/docs/collecting-metrics/cloud-and-devops/azure-data-factory) | :::info |
+| [Azure Event Grid Topic](/docs/collecting-metrics/cloud-and-devops/azure-event-grid-topic) | :::info |
+| [Azure Event Hubs Namespace](/docs/collecting-metrics/cloud-and-devops/azure-event-hubs-namespace) | :::info |
+| [Azure ExpressRoute Circuit](/docs/collecting-metrics/cloud-and-devops/azure-expressroute-circuit) | :::info |
+| [Azure ExpressRoute Gateway](/docs/collecting-metrics/cloud-and-devops/azure-expressroute-gateway) | :::info |
+| [Azure Firewall](/docs/collecting-metrics/cloud-and-devops/azure-firewall) | :::info |
+| [Azure Front Door](/docs/collecting-metrics/cloud-and-devops/azure-front-door) | :::info |
+| [Azure Functions](/docs/collecting-metrics/cloud-and-devops/azure-functions) | :::info |
+| [Azure IoT Hub](/docs/collecting-metrics/cloud-and-devops/azure-iot-hub) | :::info |
+| [Azure Key Vault](/docs/collecting-metrics/cloud-and-devops/azure-key-vault) | :::info |
+| [Azure Kubernetes Service Cluster](/docs/collecting-metrics/cloud-and-devops/azure-kubernetes-service-cluster) | :::info |
+| [Azure Load Balancer](/docs/collecting-metrics/cloud-and-devops/azure-load-balancer) | :::info |
+| [Azure Log Analytics Workspace](/docs/collecting-metrics/cloud-and-devops/azure-log-analytics-workspace) | :::info |
+| [Azure Logic Apps Workflow](/docs/collecting-metrics/cloud-and-devops/azure-logic-apps-workflow) | :::info |
+| [Azure Machine Learning Workspace](/docs/collecting-metrics/cloud-and-devops/azure-machine-learning-workspace) | :::info |
+| [Azure Monitor](/docs/collecting-metrics/cloud-and-devops/azure-monitor) | This collector provides real-time visibility into your Azure infrastructure by collecting platform metrics from the Azure Monitor Metrics API. |
+| [Azure MySQL Flexible Server](/docs/collecting-metrics/cloud-and-devops/azure-mysql-flexible-server) | :::info |
+| [Azure NAT Gateway](/docs/collecting-metrics/cloud-and-devops/azure-nat-gateway) | :::info |
+| [Azure PostgreSQL Flexible Server](/docs/collecting-metrics/cloud-and-devops/azure-postgresql-flexible-server) | :::info |
+| [Azure Service Bus Namespace](/docs/collecting-metrics/cloud-and-devops/azure-service-bus-namespace) | :::info |
+| [Azure SQL Database](/docs/collecting-metrics/cloud-and-devops/azure-sql-database) | :::info |
+| [Azure SQL Elastic Pool](/docs/collecting-metrics/cloud-and-devops/azure-sql-elastic-pool) | :::info |
+| [Azure SQL Managed Instance](/docs/collecting-metrics/cloud-and-devops/azure-sql-managed-instance) | :::info |
+| [Azure Storage Account](/docs/collecting-metrics/cloud-and-devops/azure-storage-account) | :::info |
+| [Azure Stream Analytics Job](/docs/collecting-metrics/cloud-and-devops/azure-stream-analytics-job) | :::info |
+| [Azure Synapse Analytics Workspace](/docs/collecting-metrics/cloud-and-devops/azure-synapse-analytics-workspace) | :::info |
+| [Azure Virtual Machine](/docs/collecting-metrics/cloud-and-devops/azure-virtual-machine) | :::info |
+| [Azure Virtual Machine Scale Set](/docs/collecting-metrics/cloud-and-devops/azure-virtual-machine-scale-set) | :::info |
+| [Azure VPN Gateway](/docs/collecting-metrics/cloud-and-devops/azure-vpn-gateway) | :::info |
 | [BOSH](/docs/collecting-metrics/cloud-and-devops/bosh) | Keep an eye on BOSH deployment metrics for improved cloud orchestration and resource management. |
 | [Cloud Foundry](/docs/collecting-metrics/cloud-and-devops/cloud-foundry) | Track Cloud Foundry platform metrics for optimized application deployment and management. |
 | [Cloud Foundry Firehose](/docs/collecting-metrics/cloud-and-devops/cloud-foundry-firehose) | Monitor Cloud Foundry Firehose metrics for comprehensive platform diagnostics and management. |
diff --git a/ingest/generated_map.yaml b/ingest/generated_map.yaml
index 10c2a9106..9c8c44be9 100644
--- a/ingest/generated_map.yaml
+++ b/ingest/generated_map.yaml
@@ -4561,6 +4561,15 @@
   meta_yaml: https://github.com/netdata/netdata/edit/master/src/collectors/freebsd.plugin/metadata.yaml
   message: DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml
     FILE
+- custom_edit_url: https://github.com/netdata/netdata/edit/master/src/collectors/apps.plugin/integrations/windows_services.md
+  sidebar_label: Windows Services
+  learn_status: Published
+  learn_rel_path: Collecting Metrics/Operating Systems
+  keywords: '[''windows'', ''services'', ''processes'', ''os'', ''host monitoring'']'
+  description: .nan
+  meta_yaml: https://github.com/netdata/netdata/edit/master/src/collectors/apps.plugin/metadata.yaml
+  message: DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml
+    FILE
 - custom_edit_url: https://github.com/netdata/netdata/edit/master/src/collectors/windows.plugin/integrations/windows_services.md
   sidebar_label: Windows Services
   learn_status: Published
diff --git a/ingest/one_commit_back_file-dict.yaml b/ingest/one_commit_back_file-dict.yaml
index 7dd33bd3f..f97012bb1 100644
--- a/ingest/one_commit_back_file-dict.yaml
+++ b/ingest/one_commit_back_file-dict.yaml
@@ -1062,6 +1062,8 @@
   learn_path: /docs/collecting-metrics/applications/cups
 - custom_edit_url: https://github.com/netdata/netdata/edit/master/src/collectors/apps.plugin/integrations/user_groups.md
   learn_path: /docs/collecting-metrics/operating-systems/user-groups
+- custom_edit_url: https://github.com/netdata/netdata/edit/master/src/collectors/apps.plugin/integrations/windows_services.md
+  learn_path: /docs/collecting-metrics/operating-systems/windows-services
 - custom_edit_url: https://github.com/netdata/netdata/edit/master/src/collectors/apps.plugin/integrations/users.md
   learn_path: /docs/collecting-metrics/operating-systems/users
 - custom_edit_url: https://github.com/netdata/netdata/edit/master/src/collectors/apps.plugin/integrations/applications.md