Describe the feature you'd like
Add a randomized delay (e.g., RandomizedDelaySec=12h) to the default systemd cloudflared-update.timer file generated by the cloudflared service installer.
What problem does it solve for you?
Currently, the cloudflared-update.timer is configured with OnCalendar=daily, which triggers at 00:00:00 local time (often 00:00:00 UTC on standard cloud servers). For fleets running multiple cloudflared instances for High Availability (HA) failover, this causes every active replica to pull the update and restart at essentially the same moment.
This fleet-wide restart defeats HA redundancy. All active tunnels drop their connections at once to apply the binary update, resulting in a brief but total outage for routed traffic across the entire fleet.
Describe alternatives you've considered
The current workaround is to use configuration management (Ansible, bash scripts, etc.) to inject a systemd drop-in override that sets RandomizedDelaySec, or to write custom scripts that assign each node a deterministically hashed offset.
In my experience, this introduces unnecessary operational friction and configuration drift. Since HA tunnels are a core Cloudflare use case, the default auto-update behavior should be safe for multi-node failover deployments out of the box.
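For illustration, the hashed-offset workaround can be sketched roughly as below. This is a minimal sketch, not something cloudflared ships: the function names, the 12h window, and the use of the hostname as the hash input are all assumptions made for the example. It prints the contents of a drop-in that would typically be placed under /etc/systemd/system/cloudflared-update.timer.d/ (the usual systemd drop-in location) and followed by a systemctl daemon-reload.

```python
import hashlib
import socket

# Assumption: spread nodes over the same 12h span suggested in this request.
# The window must not exceed 86400 seconds, since the offset is rendered as a time of day.
WINDOW_SECONDS = 12 * 60 * 60


def offset_seconds(node_name: str, window: int = WINDOW_SECONDS) -> int:
    """Map a node name to a stable offset in [0, window) seconds."""
    digest = hashlib.sha256(node_name.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % window


def render_dropin(offset: int) -> str:
    """Render a [Timer] drop-in that replaces OnCalendar=daily with a fixed time of day."""
    hours, rem = divmod(offset, 3600)
    minutes, seconds = divmod(rem, 60)
    return (
        "[Timer]\n"
        # An empty assignment resets the OnCalendar list inherited from the main unit.
        "OnCalendar=\n"
        f"OnCalendar=*-*-* {hours:02d}:{minutes:02d}:{seconds:02d}\n"
    )


if __name__ == "__main__":
    # Hypothetical choice: hash the local hostname so each node gets its own slot.
    print(render_dropin(offset_seconds(socket.gethostname())), end="")
```

Every node in the fleet has to carry and maintain a script like this, which is the friction a default RandomizedDelaySec would remove.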
Additional context
Here is an example modification for the default generated timer template that would resolve the issue:
[Unit]
Description=Update cloudflared
[Timer]
OnCalendar=daily
RandomizedDelaySec=12h
Persistent=true
[Install]
WantedBy=timers.target
This issue applies only to multi-node setups handling mission-critical, continuous traffic (like DoH or heavy ingress), where even a few seconds of simultaneous downtime across all replicas is noticeable to clients.
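As a rough back-of-the-envelope check on how much a 12h randomized delay would help, the sketch below estimates the chance that all replicas still restart within the same short span on a given update. It assumes each replica's delay is independent and uniformly distributed over the window (which is how the systemd documentation describes RandomizedDelaySec), and it uses a hypothetical 10-second restart span; neither number is measured from cloudflared. The formula is the standard distribution of the range of n uniform samples.

```python
def p_all_overlap(replicas: int, restart_s: float, window_s: float) -> float:
    """Probability that all replicas fire within restart_s seconds of each other.

    Assumes independent delays, each uniform on [0, window_s].
    Uses P(range <= r) = n * r**(n - 1) - (n - 1) * r**n for n uniform samples,
    where r is restart_s expressed as a fraction of the window.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    r = min(restart_s / window_s, 1.0)
    n = replicas
    return n * r ** (n - 1) - (n - 1) * r ** n


if __name__ == "__main__":
    window = 12 * 60 * 60  # RandomizedDelaySec=12h
    for n in (2, 3, 4):
        # Hypothetical 10-second restart span per replica.
        print(n, p_all_overlap(n, 10, window))
```

Under these assumptions the overlap probability for two replicas is well below 0.1% per update, and it shrinks further as replicas are added, compared with a near-certain overlap when every timer fires at midnight.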