feat(grpc): add exponential backoff for reconnection attempts#789
feat(grpc): add exponential backoff for reconnection attempts#789Molter73 wants to merge 6 commits into
Conversation
📝 WalkthroughWalkthroughCentralized duration parsing added; BackoffConfig introduced and wired into GrpcConfig and CLI; scan_interval parsing moved to Duration; gRPC reconnect loop uses exponential backoff; tests, changelog, and rand dependency updated. ChangesgRPC Exponential Backoff Configuration and Duration Parsing
Sequence Diagram(s)sequenceDiagram
participant GRPCRunLoop
participant ChannelCreator
participant Backoff
GRPCRunLoop->>ChannelCreator: create_channel()
ChannelCreator-->>GRPCRunLoop: Err
GRPCRunLoop->>Backoff: next()
Backoff-->>GRPCRunLoop: delay (Duration)
GRPCRunLoop->>GRPCRunLoop: sleep(delay)
GRPCRunLoop->>ChannelCreator: create_channel()
ChannelCreator-->>GRPCRunLoop: Ok(Channel)
GRPCRunLoop->>Backoff: reset()
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Assisted-by: claude-opus-4-6@default <noreply@opencode.ai>
…d env vars Add BackoffConfig to GrpcConfig with YAML (grpc.backoff.initial, grpc.backoff.max), CLI (--backoff-initial, --backoff-max), and env var (FACT_GRPC_BACKOFF_INITIAL, FACT_GRPC_BACKOFF_MAX) support. Extract yaml_to_duration_secs and parse_duration_secs helpers, reusing them for scan_interval parsing as well. Assisted-by: claude-opus-4-6@default <noreply@opencode.ai>
ddb3e74 to
fc69ffe
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #789 +/- ##
==========================================
+ Coverage 27.96% 31.93% +3.96%
==========================================
Files 21 21
Lines 2596 2762 +166
Branches 2596 2762 +166
==========================================
+ Hits 726 882 +156
- Misses 1867 1877 +10
Partials 3 3 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
vladbologa
left a comment
There was a problem hiding this comment.
Just some high level feedback, as I don't think I should be reviewing rust code. 😅
Why not use this existing crate?
And also, I checked the stackrox repo and there we use 10s initial, 5 minute max, and a 1.5 multiplier (and also jitter). So you could use these settings for consistency.
You are not the only one in the team thinking this, but I need to start getting people into it because I can't be the only one 🤣
With all the supply chain attacks going on these days I've become quite paranoid and would prefer we keep dependencies to a minimum. I've never heard of that crate, it looks interesting, but for a straightforward backoff implementation that is only used once (which is what I was going for), I didn't even consider looking for another dependency.
Sounds good, I'll modify the defaults and try to add jitter as well (probably make the multiplier and jitter configurable as well). |
Actually, as I went to change the deafults I realized for a component that does runtime security, we probably want to keep the reconnection as fast as possible, otherwise we will start to drop events. I will add jitter and change the multiplier though. |
Add configurable jitter to the exponential backoff using rand crate for uniform distribution over [0, delay]. Jitter is enabled by default and can be toggled via YAML (grpc.backoff.jitter), CLI (--backoff-jitter / --no-backoff-jitter), or env var (FACT_GRPC_NO_BACKOFF_JITTER). Assisted-by: claude-opus-4-6@default <noreply@opencode.ai>
Add multiplier field to BackoffConfig (default 1.5x). Configurable via YAML (grpc.backoff.multiplier), CLI (--backoff-multiplier), and env var (FACT_GRPC_BACKOFF_MULTIPLIER). Must be > 1.0. Drops Eq derive from config types in favor of PartialEq to support storing multiplier as f64 directly. Assisted-by: claude-opus-4-6@default <noreply@opencode.ai>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
fact/src/config/mod.rs (1)
425-433: 💤 Low valueConsider extracting multiplier validation into a shared helper.
The multiplier validation logic (finite, > 1.0) is duplicated between
parse_multiplier(line 42) and this YAML parsing. While functionally correct, extracting a shared validation function would reduce duplication and keep the validation rules in one place.♻️ Optional refactor to reduce duplication
+fn validate_multiplier(mult: f64) -> anyhow::Result<f64> { + if !mult.is_finite() || mult <= 1.0 { + bail!("multiplier must be > 1.0, got {mult}"); + } + Ok(mult) +} + fn parse_multiplier(s: &str) -> anyhow::Result<f64> { let mult = s.parse::<f64>()?; - if !mult.is_finite() || mult <= 1.0 { - bail!("multiplier must be > 1.0, got {mult}"); - } - Ok(mult) + validate_multiplier(mult) }Then simplify the YAML parsing:
"multiplier" => { - let multiplier = match v.as_f64().or_else(|| v.as_i64().map(|v| v as f64)) { - Some(m) if !m.is_finite() || m <= 1.0 => { - bail!("invalid grpc.backoff.multiplier: {v:?}") - } - None => { - bail!("invalid grpc.backoff.multiplier: {v:?}") - } - Some(m) => m, - }; + let Some(m) = v.as_f64().or_else(|| v.as_i64().map(|v| v as f64)) else { + bail!("invalid grpc.backoff.multiplier: {v:?}") + }; + let multiplier = validate_multiplier(m) + .with_context(|| format!("invalid grpc.backoff.multiplier: {v:?}"))?; backoff.multiplier = Some(multiplier); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@fact/src/config/mod.rs` around lines 425 - 433, The YAML parsing block duplicates multiplier validation logic; extract the shared validation into a helper (e.g., parse_multiplier or validate_multiplier) and call it from both places (the existing parse_multiplier caller and the YAML parsing code that currently binds `multiplier`). Implement a function that accepts a serde_json/serde_yaml Value or f64 and checks is_finite() and > 1.0, returns Result<f64, Error>, then replace the match in the YAML branch with a call to that helper to eliminate duplication and keep rules centralized.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@fact/src/config/mod.rs`:
- Around line 607-617: The env var and visibility are inverted for the backoff
jitter flags: move the env attribute from the negative flag to the positive flag
and swap visibility so the positive flag is the public env-controlled option.
Concretely, on the backoff_jitter field (symbol backoff_jitter) add env =
"FACT_GRPC_BACKOFF_JITTER" and remove hide = true (make it exposed), and on
no_backoff_jitter remove the env attribute and add hide = true (hide the
negative flag); keep the overrides_with relationships (backoff_jitter
overrides_with = "no_backoff_jitter" and vice versa) so behavior stays
consistent but follows the established positive-flag env pattern.
---
Nitpick comments:
In `@fact/src/config/mod.rs`:
- Around line 425-433: The YAML parsing block duplicates multiplier validation
logic; extract the shared validation into a helper (e.g., parse_multiplier or
validate_multiplier) and call it from both places (the existing parse_multiplier
caller and the YAML parsing code that currently binds `multiplier`). Implement a
function that accepts a serde_json/serde_yaml Value or f64 and checks
is_finite() and > 1.0, returns Result<f64, Error>, then replace the match in the
YAML branch with a call to that helper to eliminate duplication and keep rules
centralized.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Enterprise
Run ID: 5206ea5e-dd20-4dac-8c98-2e989a4b7c5b
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
Cargo.tomlfact/Cargo.tomlfact/src/config/mod.rsfact/src/config/tests.rsfact/src/output/grpc.rs
✅ Files skipped from review due to trivial changes (1)
- fact/Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (2)
- fact/src/output/grpc.rs
- fact/src/config/tests.rs
| /// Enable jitter for gRPC reconnection backoff | ||
| /// | ||
| /// Default value is true | ||
| #[arg(long, overrides_with = "no_backoff_jitter", hide = true)] | ||
| backoff_jitter: bool, | ||
| #[arg( | ||
| long, | ||
| overrides_with = "backoff_jitter", | ||
| env = "FACT_GRPC_NO_BACKOFF_JITTER" | ||
| )] | ||
| no_backoff_jitter: bool, |
There was a problem hiding this comment.
Environment variable should follow the established positive-flag pattern.
The FACT_GRPC_NO_BACKOFF_JITTER environment variable is on the negative flag (no_backoff_jitter), which breaks the pattern established by all other boolean configuration options in this file:
- All other boolean env vars use positive forms:
FACT_JSON(line 654),FACT_HOTRELOAD(line 678),FACT_ENDPOINT_EXPOSE_METRICS(line 627), etc. - Setting those env vars to
"true"enables the feature; setting to"false"disables it. - But
FACT_GRPC_NO_BACKOFF_JITTER=truedisables jitter, inverting the logic and creating confusion.
Additionally, the flag visibility is inconsistent:
- The positive flag
backoff_jitteris hidden (line 610). - The negative flag
no_backoff_jitteris exposed. - This is the opposite of every other boolean flag in the file (e.g.,
expose_metricsexposed at line 629,no_expose_metricshidden at line 630).
🔧 Recommended fix to match established patterns
Move the env var to the positive flag and swap visibility:
/// Enable jitter for gRPC reconnection backoff
///
/// Default value is true
- #[arg(long, overrides_with = "no_backoff_jitter", hide = true)]
+ #[arg(long, overrides_with = "no_backoff_jitter", env = "FACT_GRPC_BACKOFF_JITTER")]
backoff_jitter: bool,
#[arg(
long,
overrides_with = "backoff_jitter",
- env = "FACT_GRPC_NO_BACKOFF_JITTER"
+ hide(true)
)]
no_backoff_jitter: bool,This matches the pattern of all other boolean flags and makes the API clearer: FACT_GRPC_BACKOFF_JITTER=true enables jitter, FACT_GRPC_BACKOFF_JITTER=false disables it.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@fact/src/config/mod.rs` around lines 607 - 617, The env var and visibility
are inverted for the backoff jitter flags: move the env attribute from the
negative flag to the positive flag and swap visibility so the positive flag is
the public env-controlled option. Concretely, on the backoff_jitter field
(symbol backoff_jitter) add env = "FACT_GRPC_BACKOFF_JITTER" and remove hide =
true (make it exposed), and on no_backoff_jitter remove the env attribute and
add hide = true (hide the negative flag); keep the overrides_with relationships
(backoff_jitter overrides_with = "no_backoff_jitter" and vice versa) so behavior
stays consistent but follows the established positive-flag env pattern.
Description
If connection to a gRPC server fails, fact was into a loop trying to reconnect every second which is pretty noisy and not really needed. With this change we add an exponential backoff algorithm that will try to reconnect with some leeway.
In the interest of making it obvious
factis failing to connect on k8s, we probably want to crash the application if we reach the maximum backoff, but there is currently no easy way to do that, so I'll leave it to a follow up.Checklist
Automated testing
If any of these don't apply, please comment below.
Testing Performed
Without this change, reconnection attempts every second
With this change, exponential attempts
Summary by CodeRabbit
New Features
Improvements
Tests
Documentation