8 changes: 7 additions & 1 deletion .env.sample
Original file line number Diff line number Diff line change
@@ -31,6 +31,12 @@ PROMETHEUS_PUSHGATEWAY_URL=
# pushgateway push interval in ms
PROMETHEUS_PUSHGATEWAY_INTERVAL=10000

# Grouper memory log controls
GROUPER_MEMORY_LOG_EVERY_TASKS=50
GROUPER_MEMORY_GROWTH_WINDOW_TASKS=200
GROUPER_MEMORY_GROWTH_WARN_MB=64
GROUPER_MEMORY_HANDLE_GROWTH_WARN_MB=16

# project token for error catching
HAWK_CATCHER_TOKEN=

@@ -40,4 +46,4 @@ HAWK_CATCHER_TOKEN=
IS_NOTIFIER_WORKER_ENABLED=false

## Url for telegram notifications about workspace blocks and unblocks
TELEGRAM_LIMITER_CHAT_URL=
TELEGRAM_LIMITER_CHAT_URL=
39 changes: 39 additions & 0 deletions lib/metrics.ts
@@ -0,0 +1,39 @@
import * as client from 'prom-client';
import os from 'os';
import { nanoid } from 'nanoid';

const register = new client.Registry();

client.collectDefaultMetrics({ register });

export { register, client };
Comment on lines +5 to +9 — Copilot AI (Feb 18, 2026):

The shared global registry could cause issues if multiple worker types are instantiated in the same process. All workers would register their metrics to the same registry, and when metrics are pushed to Pushgateway, metrics from all workers would be included under each worker's grouping labels. This could lead to incorrect or confusing metrics attribution.

Consider either using separate registries per worker type, or ensure metrics are properly labeled to distinguish between different workers when using a shared registry.
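The per-worker-registry idea from the comment above can be sketched roughly as follows. This is a hedged sketch, not part of the PR: `MetricsRegistry` is a minimal stand-in interface (prom-client's real `Registry` also exposes `setDefaultLabels`), and `makeRegistryPerWorker` is a hypothetical factory name.

```typescript
// Minimal stand-in for the part of prom-client's Registry this sketch needs.
interface MetricsRegistry {
  setDefaultLabels(labels: Record<string, string>): void;
}

/**
 * Hypothetical factory: one registry per worker type, memoized in a Map,
 * so metrics from different workers in the same process never mix.
 */
function makeRegistryPerWorker(
  createRegistry: () => MetricsRegistry
): (workerName: string) => MetricsRegistry {
  const registries = new Map<string, MetricsRegistry>();

  return (workerName: string): MetricsRegistry => {
    let registry = registries.get(workerName);

    if (!registry) {
      registry = createRegistry();
      // Label every metric in this registry with its worker type so pushes
      // to Pushgateway stay attributable to a single worker.
      registry.setDefaultLabels({ worker: workerName });
      registries.set(workerName, registry);
    }

    return registry;
  };
}
```

With prom-client, `createRegistry` would be `() => new client.Registry()`; the same approach also works for a shared registry if default labels alone are enough to distinguish workers.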

/**
* Start periodic push to pushgateway
*
* @param workerName - name of the worker for grouping
*/
export function startMetricsPushing(workerName: string): void {
const url = process.env.PROMETHEUS_PUSHGATEWAY_URL;
const interval = parseInt(process.env.PROMETHEUS_PUSHGATEWAY_INTERVAL || '10000');
Copilot AI (Feb 18, 2026):

The PROMETHEUS_PUSHGATEWAY_INTERVAL environment variable parsing lacks error handling. If the environment variable contains an invalid number, parseInt will return NaN, and this will be passed to setInterval on line 32, which will cause the callback to be invoked immediately and repeatedly in a tight loop, potentially causing performance issues.

Add validation to check if the parsed interval is a valid positive number, and either throw an error or use a sensible default if it's invalid.

Suggested change:

const interval = parseInt(process.env.PROMETHEUS_PUSHGATEWAY_INTERVAL || '10000');
const DEFAULT_INTERVAL = 10000;
const rawInterval = process.env.PROMETHEUS_PUSHGATEWAY_INTERVAL;
const parsedInterval = rawInterval !== undefined ? parseInt(rawInterval, 10) : DEFAULT_INTERVAL;
const interval =
  Number.isFinite(parsedInterval) && parsedInterval > 0
    ? parsedInterval
    : (() => {
      if (rawInterval !== undefined) {
        console.warn(
          `Invalid PROMETHEUS_PUSHGATEWAY_INTERVAL "${rawInterval}", falling back to default ${DEFAULT_INTERVAL}ms`,
        );
      }
      return DEFAULT_INTERVAL;
    })();
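The validation above can also be factored into a small pure helper, which is easier to unit-test than the inline IIFE. This is a sketch, not part of the PR; `resolvePushInterval` is a hypothetical name.

```typescript
const DEFAULT_PUSH_INTERVAL_MS = 10000;

/**
 * Hypothetical helper: parse the push interval from an env var value,
 * falling back to the default when the value is missing, NaN, or non-positive,
 * so setInterval never receives an invalid delay.
 */
function resolvePushInterval(raw: string | undefined): number {
  if (raw === undefined) {
    return DEFAULT_PUSH_INTERVAL_MS;
  }

  const parsed = parseInt(raw, 10);

  if (!Number.isFinite(parsed) || parsed <= 0) {
    console.warn(
      `Invalid PROMETHEUS_PUSHGATEWAY_INTERVAL "${raw}", using ${DEFAULT_PUSH_INTERVAL_MS}ms`
    );

    return DEFAULT_PUSH_INTERVAL_MS;
  }

  return parsed;
}
```

Usage would then be `const interval = resolvePushInterval(process.env.PROMETHEUS_PUSHGATEWAY_INTERVAL);`.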

if (!url) {
return;
}

const hostname = os.hostname();
const ID_SIZE = 5;
const id = nanoid(ID_SIZE);

const gateway = new client.Pushgateway(url, [], register);

console.log(`Start pushing metrics to ${url} every ${interval}ms (host: ${hostname}, id: ${id})`);
Copilot AI (Feb 18, 2026):

Using console.log for informational messages is inconsistent with the rest of the codebase, which uses a logger module throughout (see lib/logger.ts and usage in workers like grouper). This message won't benefit from structured logging, log levels, or any other logging infrastructure features.

Consider using a logger instance consistent with the rest of the codebase.

setInterval(() => {
gateway.pushAdd({ jobName: 'workers', groupings: { worker: workerName, host: hostname, id } }, (err) => {
if (err) {
console.error('Metrics push error:', err);
Copilot AI (Feb 18, 2026):

Using console.error for error logging is inconsistent with the rest of the codebase, which uses a logger module (see lib/logger.ts and usage in workers). The error message will not benefit from structured logging, log levels, or any other logging infrastructure features.

Consider using HawkCatcher or a logger instance consistent with the rest of the codebase for error reporting.
}
});
}, interval);
Comment on lines +32 to +38 — Copilot AI (Feb 18, 2026):

Multiple timers may be created and never stopped if startMetricsPushing is called multiple times. The function creates a new interval on line 32 without storing or clearing any previous intervals. If multiple workers are started (which happens in the runner on lines 89-91), this could create multiple timers that push metrics for each worker type, and there's no cleanup mechanism to stop these intervals.

Consider storing the interval ID and providing a cleanup mechanism, or ensure the function is only called once per worker type.
}
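The cleanup mechanism suggested in the comment above could look roughly like this: have the starter return a stop callback that the runner stores and invokes on shutdown. This is a hedged sketch under assumed names — `startMetricsPushingWithCleanup` and `pushOnce` are hypothetical stand-ins for `startMetricsPushing` and the real `gateway.pushAdd` call.

```typescript
type StopMetricsPushing = () => void;

/**
 * Hypothetical variant of startMetricsPushing that returns a stop callback,
 * so the runner can clear the timer when its workers are stopped.
 */
function startMetricsPushingWithCleanup(
  pushOnce: () => void,
  intervalMs: number
): StopMetricsPushing {
  // ReturnType<typeof setInterval> covers both Node (Timeout) and DOM (number).
  const timer: ReturnType<typeof setInterval> = setInterval(pushOnce, intervalMs);

  return () => clearInterval(timer);
}
```

The runner would then keep the returned callbacks (one per worker type) and call them from its stop path instead of clearing the now-unset `this.pushIntervalNumber`.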
89 changes: 18 additions & 71 deletions runner.ts
@@ -9,6 +9,7 @@ import * as utils from './lib/utils';
import { Worker } from './lib/worker';
import HawkCatcher from '@hawk.so/nodejs';
import * as dotenv from 'dotenv';
import { startMetricsPushing } from './lib/metrics';

dotenv.config();

@@ -57,19 +58,17 @@ class WorkerRunner {
.then((workerConstructors) => {
this.constructWorkers(workerConstructors);
})
// .then(() => {
// try {
// this.startMetrics();
// } catch (e) {
// HawkCatcher.send(e);
// console.error(`Metrics not started: ${e}`);
// }
//
// return Promise.resolve();
// })
.then(() => {
return this.startWorkers();
})
.then(() => {
try {
this.startMetrics();
} catch (e) {
HawkCatcher.send(e);
console.error(`Metrics not started: ${e}`);
}
})
.then(() => {
this.observeProcess();
})
@@ -82,67 +81,15 @@
/**
* Run metrics exporter
*/
// private startMetrics(): void {
// if (!process.env.PROMETHEUS_PUSHGATEWAY_URL) {
// return;
// }
//
// const PUSH_INTERVAL = parseInt(process.env.PROMETHEUS_PUSHGATEWAY_INTERVAL);
//
// if (isNaN(PUSH_INTERVAL)) {
// throw new Error('PROMETHEUS_PUSHGATEWAY_INTERVAL is invalid or not set');
// }
//
// const collectDefaultMetrics = promClient.collectDefaultMetrics;
// const Registry = promClient.Registry;
//
// const register = new Registry();
// const startGcStats = gcStats(register);
//
// const hostname = os.hostname();
//
// const ID_SIZE = 5;
// const id = nanoid(ID_SIZE);
//
// // eslint-disable-next-line node/no-deprecated-api
// const instance = url.parse(process.env.PROMETHEUS_PUSHGATEWAY_URL).host;
//
// // Initialize metrics for workers
// this.workers.forEach((worker) => {
// // worker.initMetrics();
// worker.getMetrics().forEach((metric: promClient.Counter<string>) => register.registerMetric(metric));
// });
//
// collectDefaultMetrics({ register });
// startGcStats();
//
// this.gateway = new promClient.Pushgateway(process.env.PROMETHEUS_PUSHGATEWAY_URL, null, register);
//
// console.log(`Start pushing metrics to ${process.env.PROMETHEUS_PUSHGATEWAY_URL}`);
//
// // Pushing metrics to the pushgateway every PUSH_INTERVAL
// this.pushIntervalNumber = setInterval(() => {
// this.workers.forEach((worker) => {
// if (!this.gateway || !instance) {
// return;
// }
// // Use pushAdd not to overwrite previous metrics
// this.gateway.pushAdd({
// jobName: 'workers',
// groupings: {
// worker: worker.type.replace('/', '_'),
// host: hostname,
// id,
// },
// }, (err?: Error) => {
// if (err) {
// HawkCatcher.send(err);
// console.log(`Error of pushing metrics to gateway: ${err}`);
// }
// });
// });
// }, PUSH_INTERVAL);
// }
private startMetrics(): void {
if (!process.env.PROMETHEUS_PUSHGATEWAY_URL) {
return;
}

this.workers.forEach((worker) => {
startMetricsPushing(worker.type.replace('/', '_'));
});
Comment on lines +89 to +91 — Copilot AI (Feb 18, 2026):

The metrics pushing intervals created by startMetricsPushing are never cleaned up when workers are stopped. The runner's stopWorker method on line 227 attempts to clear this.pushIntervalNumber, but this property is no longer set since the metrics pushing logic was moved to lib/metrics.ts. This will cause the intervals to continue running after workers are stopped, leading to potential memory leaks and failed push attempts.

Consider returning the interval ID from startMetricsPushing so it can be stored and cleared when workers are stopped, or implement a cleanup function in the metrics module.

Suggested change:

this.workers.forEach((worker) => {
  startMetricsPushing(worker.type.replace('/', '_'));
});
if (this.workers.length === 0) {
  return;
}
const workerTypeForMetrics = this.workers[0].type.replace('/', '_');
this.pushIntervalNumber = startMetricsPushing(workerTypeForMetrics);
}

/**
* Dynamically loads workers through the yarn workspaces