2 changes: 1 addition & 1 deletion docs/all-features.md
@@ -72,4 +72,4 @@ keywords:
|Standard support|❌|✅|✅|
|Premium support (24/7, 1 hour), trainings|❌|✅|✅|

See also: [Development roadmap](/docs/dblab-roadmap).
See also: [Development roadmap](/docs/roadmap).
3 changes: 2 additions & 1 deletion docs/database-lab/index.md
@@ -14,6 +14,7 @@ import useBaseUrl from '@docusaurus/useBaseUrl';
- [DBLab tutorial for Amazon RDS Postgres](/docs/tutorials/database-lab-tutorial-amazon-rds)
- [Supported databases](/docs/database-lab/supported-databases)
- [DBLab UI](/docs/database-lab/user-interface)
- [Prometheus monitoring](/docs/database-lab/prometheus-monitoring)
- [Data masking](/docs/database-lab/masking)
- [DB Migration Checker](/docs/database-lab/db-migration-checker)
- [Telemetry](/docs/database-lab/telemetry)
@@ -69,7 +70,7 @@ Some problems that can be solved by using DBLab:
- Works well both on-premise and in clouds.
- Thin provisioning in seconds thanks to copy-on-write (CoW) provided by [ZFS](https://en.wikipedia.org/wiki/ZFS) and a special methodology for preparing PostgreSQL database snapshots. There is also an option to use [LVM](https://en.wikipedia.org/wiki/Logical_Volume_Manager_(Linux)) instead of ZFS.
- Unlimited size of databases (Postgres database size [is unlimited](https://www.postgresql.org/docs/current/limits.html), a ZFS volume can be up to 2^128 bytes, or [256 trillion yobibytes](https://en.wikipedia.org/wiki/ZFS)).
- Supports PostgreSQL from version 9.6 up to the most recently released version.
- Supports PostgreSQL from version 10 up to the most recently released version.
- Thin cloning takes only a few seconds, regardless of the database size.
- REST API.
- Client CLI included.
271 changes: 271 additions & 0 deletions docs/database-lab/prometheus-monitoring.md
@@ -0,0 +1,271 @@
---
title: Prometheus monitoring
sidebar_label: Prometheus monitoring
---

DBLab Engine exposes Prometheus metrics via the `/metrics` endpoint. These metrics can be used to monitor the health and performance of the DBLab instance.

:::note
Prometheus metrics support was added in DBLab Engine 4.1.
:::

## Endpoint

```
GET /metrics
```

The endpoint requires no authentication and returns metrics in the Prometheus text exposition format; consider restricting network-level access to it if the instance is reachable from untrusted networks.
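As an illustration of the text exposition format (the metric names come from the tables below; the values are made up), a scrape result can be parsed with a short script. This is a simplified sketch: it ignores escaping inside label values and optional timestamps.

```python
import re

# Illustrative sample of /metrics output; the values are made up.
sample = """\
# HELP dblab_instance_uptime_seconds Time since the DBLab instance started
# TYPE dblab_instance_uptime_seconds gauge
dblab_instance_uptime_seconds 86400
dblab_disk_free_bytes{pool="dblab_pool"} 53687091200
"""

# Simplified line pattern: metric name, optional {labels}, value.
line_re = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)

def parse_metrics(text):
    """Return {(metric_name, raw_labels): value} from Prometheus text format."""
    metrics = {}
    for line in text.splitlines():
        if not line or line.startswith('#'):
            continue  # skip blank lines and HELP/TYPE comments
        m = line_re.match(line)
        if m:
            metrics[(m.group('name'), m.group('labels') or '')] = float(m.group('value'))
    return metrics

print(parse_metrics(sample)[('dblab_instance_uptime_seconds', '')])  # 86400.0
```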

## Available metrics

### Instance metrics

| Metric Name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `dblab_instance_info` | Gauge | `instance_id`, `version`, `edition` | Information about the DBLab instance (always 1) |
| `dblab_instance_uptime_seconds` | Gauge | - | Time in seconds since the DBLab instance started |
| `dblab_instance_status_code` | Gauge | - | Status code of the DBLab instance (0=OK, 1=Warning, 2=Bad) |
| `dblab_retrieval_status` | Gauge | `mode`, `status` | Status of data retrieval (the series matching the current mode and status has value 1) |

### Disk/pool metrics

| Metric Name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `dblab_disk_total_bytes` | Gauge | `pool` | Total disk space in bytes |
| `dblab_disk_free_bytes` | Gauge | `pool` | Free disk space in bytes |
| `dblab_disk_used_bytes` | Gauge | `pool` | Used disk space in bytes |
| `dblab_disk_used_by_snapshots_bytes` | Gauge | `pool` | Disk space used by snapshots in bytes |
| `dblab_disk_used_by_clones_bytes` | Gauge | `pool` | Disk space used by clones in bytes |
| `dblab_disk_data_size_bytes` | Gauge | `pool` | Size of the data directory in bytes |
| `dblab_disk_compress_ratio` | Gauge | `pool` | Compression ratio of the filesystem (ZFS) |
| `dblab_pool_status` | Gauge | `pool`, `mode`, `status` | Status of the pool (the series matching the current mode and status has value 1) |
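Since these gauges share the `pool` label, they can be combined directly in PromQL; for example, the share of used space attributable to clones (a sketch based on the metrics above):

```promql
dblab_disk_used_by_clones_bytes / dblab_disk_used_bytes
```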

### Clone metrics (aggregate)

| Metric Name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `dblab_clones_total` | Gauge | - | Total number of clones |
| `dblab_clones_by_status` | Gauge | `status` | Number of clones by status |
| `dblab_clone_max_age_seconds` | Gauge | - | Maximum age of any clone in seconds |
| `dblab_clone_total_diff_size_bytes` | Gauge | - | Total extra disk space used by all clones (sum of diffs from snapshots) |
| `dblab_clone_total_logical_size_bytes` | Gauge | - | Total logical size of all clone data |
| `dblab_clone_total_cpu_usage_percent` | Gauge | - | Total CPU usage percentage across all clone containers |
| `dblab_clone_avg_cpu_usage_percent` | Gauge | - | Average CPU usage percentage across all clone containers with valid data |
| `dblab_clone_total_memory_usage_bytes` | Gauge | - | Total memory usage in bytes across all clone containers |
| `dblab_clone_total_memory_limit_bytes` | Gauge | - | Total memory limit in bytes across all clone containers |
| `dblab_clone_protected_count` | Gauge | - | Number of protected clones |
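These aggregates can be combined for per-clone averages; for example, average memory usage per clone (assuming at least one clone exists, otherwise the division yields no result):

```promql
dblab_clone_total_memory_usage_bytes / dblab_clones_total
```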

### Snapshot metrics (aggregate)

| Metric Name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `dblab_snapshots_total` | Gauge | - | Total number of snapshots |
| `dblab_snapshots_by_pool` | Gauge | `pool` | Number of snapshots by pool |
| `dblab_snapshot_max_age_seconds` | Gauge | - | Maximum age of any snapshot in seconds |
| `dblab_snapshot_total_physical_size_bytes` | Gauge | - | Total physical disk space used by all snapshots |
| `dblab_snapshot_total_logical_size_bytes` | Gauge | - | Total logical size of all snapshot data |
| `dblab_snapshot_max_data_lag_seconds` | Gauge | - | Maximum data lag of any snapshot in seconds |
| `dblab_snapshot_total_num_clones` | Gauge | - | Total number of clones across all snapshots |

### Branch metrics

| Metric Name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `dblab_branches_total` | Gauge | - | Total number of branches |

### Dataset metrics

| Metric Name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `dblab_datasets_total` | Gauge | `pool` | Total number of datasets (slots) in the pool |
| `dblab_datasets_available` | Gauge | `pool` | Number of available (non-busy) dataset slots for reuse |
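The number of busy dataset slots per pool follows directly from these two gauges:

```promql
dblab_datasets_total - dblab_datasets_available
```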

### Sync instance metrics (physical mode)

These metrics are only available when DBLab is running in physical mode with a sync instance enabled. They track the WAL replay status of the sync instance.

| Metric Name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `dblab_sync_status` | Gauge | `status` | Status of the sync instance (the series matching the current status has value 1) |
| `dblab_sync_wal_lag_seconds` | Gauge | - | WAL replay lag in seconds for the sync instance |
| `dblab_sync_uptime_seconds` | Gauge | - | Uptime of the sync instance in seconds |
| `dblab_sync_last_replayed_timestamp` | Gauge | - | Unix timestamp of the last replayed transaction |
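Because `dblab_sync_last_replayed_timestamp` is a Unix timestamp, staleness relative to the current time can be computed directly in PromQL:

```promql
time() - dblab_sync_last_replayed_timestamp
```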

### Observability metrics

These metrics help monitor the health of the metrics collection system itself.

| Metric Name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `dblab_scrape_success_timestamp` | Gauge | - | Unix timestamp of last successful metrics collection |
| `dblab_scrape_duration_seconds` | Gauge | - | Duration of last metrics collection in seconds |
| `dblab_scrape_errors_total` | Counter | - | Total number of errors during metrics collection |
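`dblab_scrape_errors_total` is a counter, so rate-style functions apply; for example, errors accumulated over the last hour:

```promql
increase(dblab_scrape_errors_total[1h])
```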

## Prometheus configuration

Add the following to your `prometheus.yml`:

```yaml
scrape_configs:
  - job_name: 'dblab'
    static_configs:
      - targets: ['<dblab-host>:<dblab-port>']
    metrics_path: /metrics
```

Replace `<dblab-host>` and `<dblab-port>` with your DBLab instance's host and API port (default: `2345`).
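If Prometheus itself runs in Docker, a minimal Compose sketch can mount this scrape config (this is a generic example, not part of the DBLab docs; file paths are assumptions):

```yaml
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      # assumes the prometheus.yml above sits next to this compose file
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
```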

## Example queries

### Free disk space percentage

```promql
100 * dblab_disk_free_bytes / dblab_disk_total_bytes
```

### Number of active clones

```promql
dblab_clones_total
```

### Maximum clone age in hours

```promql
dblab_clone_max_age_seconds / 3600
```

### Data freshness: snapshot data lag in minutes

```promql
dblab_snapshot_max_data_lag_seconds / 60
```

### WAL replay lag (physical mode)

```promql
dblab_sync_wal_lag_seconds
```

## Alerting examples

### Low disk space alert

```yaml
- alert: DBLabLowDiskSpace
  expr: (dblab_disk_free_bytes / dblab_disk_total_bytes) * 100 < 20
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "DBLab low disk space"
    description: "DBLab pool {{ $labels.pool }} has less than 20% free disk space"
```

### Stale snapshot alert

```yaml
- alert: DBLabStaleSnapshot
  expr: dblab_snapshot_max_data_lag_seconds > 86400
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "DBLab snapshot data is stale"
    description: "DBLab snapshot data is more than 24 hours old"
```

### High clone count alert

```yaml
- alert: DBLabHighCloneCount
  expr: dblab_clones_total > 50
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "DBLab has many clones"
    description: "DBLab has {{ $value }} clones running"
```

### High WAL replay lag alert (physical mode)

```yaml
- alert: DBLabHighWALLag
  expr: dblab_sync_wal_lag_seconds > 3600
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "DBLab sync instance has high WAL lag"
    description: "DBLab sync instance WAL replay is {{ $value | humanizeDuration }} behind"
```

### Metrics collection stale alert

```yaml
- alert: DBLabMetricsStale
  expr: time() - dblab_scrape_success_timestamp > 300
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "DBLab metrics collection is stale"
    description: "DBLab metrics have not been updated for more than 5 minutes"
```

### Sync instance down alert (physical mode)

```yaml
- alert: DBLabSyncDown
  expr: dblab_sync_status{status="down"} == 1 or dblab_sync_status{status="error"} == 1
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "DBLab sync instance is down"
    description: "DBLab sync instance is not healthy"
```

## OpenTelemetry integration

DBLab metrics can be exported to OpenTelemetry-compatible backends using the OpenTelemetry Collector. This allows you to send metrics to Grafana Cloud, Datadog, New Relic, and other observability platforms.

### Quick start

1. Pull the OpenTelemetry Collector (contrib) image:
```bash
docker pull otel/opentelemetry-collector-contrib:latest
```

2. Copy the example configuration from the DBLab Engine repository:
```bash
cp engine/configs/otel-collector.example.yml otel-collector.yml
```

3. Edit `otel-collector.yml` to configure your backend:
```yaml
exporters:
  otlp:
    endpoint: "your-otlp-endpoint:4317"
    headers:
      Authorization: "Bearer <your-token>"
```

4. Run the collector:
```bash
docker run -v $(pwd)/otel-collector.yml:/etc/otelcol-contrib/config.yaml \
-p 4317:4317 -p 8889:8889 \
otel/opentelemetry-collector-contrib:latest
```
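The example configuration referenced in step 2 presumably wires a Prometheus receiver to the exporter; a minimal sketch of such a pipeline (the endpoint, port, and scrape interval below are assumptions, adjust to your setup) looks like:

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'dblab'
          scrape_interval: 30s
          static_configs:
            - targets: ['<dblab-host>:2345']

exporters:
  otlp:
    endpoint: "your-otlp-endpoint:4317"

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [otlp]
```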

### Supported backends

The OTel Collector can export to:
- **Grafana Cloud** — use OTLP exporter with Grafana Cloud endpoint
- **Datadog** — use the datadog exporter
- **New Relic** — use OTLP exporter with New Relic endpoint
- **Prometheus Remote Write** — use prometheusremotewrite exporter
- **AWS CloudWatch** — use awsemf exporter
- **Any OTLP-compatible backend**
3 changes: 1 addition & 2 deletions docs/database-lab/supported-databases.md
@@ -4,7 +4,6 @@ title: PostgreSQL versions and extensions supported in DBLab Engine

## PostgreSQL versions
Currently, DBLab Engine fully supports the following [PostgreSQL major versions](https://www.postgresql.org/support/versioning/):
- 9.6 (released: 2016-09-29; EOL: 2021-11-11)
- 10 (released: 2017-10-05; EOL: 2022-11-10)
- 11 (released: 2018-10-18; EOL: 2023-11-09)
- 12 (released: 2019-10-03; EOL: 2024-11-14)
@@ -15,7 +14,7 @@ Currently, DBLab Engine fully supports the following [PostgreSQL major versions]
- 17 (released: 2024-09-26; EOL: 2029-11-08)
- 18 (released: 2025-09-25; EOL: 2030-11-13)

By default, version 17 is used: `postgresai/extended-postgres:17`.
By default, version 18 is used in the example configurations: `postgresai/extended-postgres:18`.

The images are published in [Docker Hub](https://hub.docker.com/r/postgresai/extended-postgres).

@@ -14,7 +14,7 @@ keywords:
## Configure masking for PostgreSQL log
When Database Lab's CI Observer is used for automated testing of database migrations, it stores the PostgreSQL log in the DBLab Platform's centralized storage. You can optionally configure masking rules for sensitive data in the PostgreSQL log; such rules are applied continuously before any PostgreSQL log entries are sent to the Platform's storage.

You can define masking rules in the form of regular expressions. To do so, open the DBLab Engine configuration file (usually, `~/.dblab/engine/configs/server.yml`; see config file examples [here](https://gitlab.com/postgres-ai/database-lab/-/tree/v4.0.3/engine/configs)) and define the `replacementRules` subsection in the `observer` section. A basic example:
You can define masking rules in the form of regular expressions. To do so, open the DBLab Engine configuration file (usually, `~/.dblab/engine/configs/server.yml`; see config file examples [here](https://gitlab.com/postgres-ai/database-lab/-/tree/v4.1.0/engine/configs)) and define the `replacementRules` subsection in the `observer` section. A basic example:
```yaml
observer:
  replacementRules:
4 changes: 2 additions & 2 deletions docs/dblab-howtos/administration/data/custom.md
@@ -16,7 +16,7 @@ To set it up, use the following jobs:
- [physicalSnapshot](/docs/reference-guides/database-lab-engine-configuration-reference#job-physicalsnapshot)

### Options
Copy the example configuration file [`config.example.physical_generic.yml`](https://gitlab.com/postgres-ai/database-lab/-/blob/v4.0.3/engine/configs/config.example.physical_generic.yml) from the Database Lab repository to `~/.dblab/engine/configs/server.yml`. For demo purposes, we use the `pg_basebackup` tool, but you can use any tool suitable for the task. Check and update the following options:
Copy the example configuration file [`config.example.physical_generic.yml`](https://gitlab.com/postgres-ai/database-lab/-/blob/v4.1.0/engine/configs/config.example.physical_generic.yml) from the Database Lab repository to `~/.dblab/engine/configs/server.yml`. For demo purposes, we use the `pg_basebackup` tool, but you can use any tool suitable for the task. Check and update the following options:
- Set a secure `server:verificationToken`; it will be used to authorize API requests to the Engine
- Set connection options in `physicalRestore:options:envs`, based on your tool
- Set PostgreSQL commands in `physicalRestore:options:customTool`:
@@ -43,7 +43,7 @@ sudo docker run \
--env DOCKER_API_VERSION=1.39 \
--detach \
--restart on-failure \
postgresai/dblab-server:4.0.3
postgresai/dblab-server:4.1.0
```

:::info
4 changes: 2 additions & 2 deletions docs/dblab-howtos/administration/data/dump.md
@@ -15,7 +15,7 @@ In order to set up DBLab Engine to automatically get the data from database using
- [logicalSnapshot](/docs/reference-guides/database-lab-engine-configuration-reference#job-logicalsnapshot)

### Options
Copy the contents of configuration example [`config.example.logical_generic.yml`](https://gitlab.com/postgres-ai/database-lab/-/blob/v4.0.3/engine/configs/config.example.logical_generic.yml) from the Database Lab repository to `~/.dblab/engine/configs/server.yml` and update the following options:
Copy the contents of configuration example [`config.example.logical_generic.yml`](https://gitlab.com/postgres-ai/database-lab/-/blob/v4.1.0/engine/configs/config.example.logical_generic.yml) from the Database Lab repository to `~/.dblab/engine/configs/server.yml` and update the following options:
- Set a secure `server:verificationToken`; it will be used to authorize API requests to the Engine
- Set connection options in `retrieval:spec:logicalDump:options:source:connection`:
- `dbname`: database name to connect to
@@ -44,7 +44,7 @@ sudo docker run \
--env DOCKER_API_VERSION=1.39 \
--detach \
--restart on-failure \
postgresai/dblab-server:4.0.3
postgresai/dblab-server:4.1.0
```

You can use the `PGPASSWORD` environment variable to set the password.
1 change: 1 addition & 0 deletions docs/dblab-howtos/administration/data/index.md
@@ -8,6 +8,7 @@ slug: /dblab-howtos/administration/data
### Logical
- [Dump](/docs/dblab-howtos/administration/data/dump)
- [RDS](/docs/dblab-howtos/administration/data/rds)
- [RDS/Aurora refresh](/docs/dblab-howtos/administration/data/rds-refresh) — refreshes from a temporary RDS clone instead of production
- [Full refresh](/docs/dblab-howtos/administration/logical-full-refresh)

### Physical
4 changes: 2 additions & 2 deletions docs/dblab-howtos/administration/data/pg_basebackup.md
@@ -14,7 +14,7 @@ In order to set up DBLab Engine to automatically get the data from database using
- [physicalSnapshot](/docs/reference-guides/database-lab-engine-configuration-reference#job-physicalsnapshot)

### Options
Copy the contents of configuration example [`config.example.physical_generic.yml`](https://gitlab.com/postgres-ai/database-lab/-/blob/v4.0.3/engine/configs/config.example.physical_generic.yml) from the Database Lab repository to `~/.dblab/engine/configs/server.yml` and update the following options:
Copy the contents of configuration example [`config.example.physical_generic.yml`](https://gitlab.com/postgres-ai/database-lab/-/blob/v4.1.0/engine/configs/config.example.physical_generic.yml) from the Database Lab repository to `~/.dblab/engine/configs/server.yml` and update the following options:
- Set a secure `server:verificationToken`; it will be used to authorize API requests to the Engine
- Set connection options in `physicalRestore:options:envs`:
- `PGUSER`: database user name
@@ -44,7 +44,7 @@ sudo docker run \
--env DOCKER_API_VERSION=1.39 \
--detach \
--restart on-failure \
postgresai/dblab-server:4.0.3
postgresai/dblab-server:4.1.0
```

:::info