Skip to content

feat(logger): add rate limiter#5799

Merged
ShadowCurse merged 4 commits intofirecracker-microvm:mainfrom
kalyazin:log_rate_limiter
Apr 14, 2026
Merged

feat(logger): add rate limiter#5799
ShadowCurse merged 4 commits intofirecracker-microvm:mainfrom
kalyazin:log_rate_limiter

Conversation

@kalyazin
Copy link
Copy Markdown
Contributor

@kalyazin kalyazin commented Mar 27, 2026

Changes

Add per-callsite rate limiting for guest-triggered logging paths, following the Linux kernel printk_ratelimited pattern. The error_rate_limited! macro gives each callsite its own independent, preconfigured rate limiter set to 10 messages per 5-second window. When messages are suppressed, a summary is emitted once the callsite resumes logging. A new rate_limited_log_count metric tracks total suppressions.

I was not able to build an integration test that demonstrates that the rate limiting is effective against a real end-to-end scenario because it would've required a custom guest kernel, but I ran an ad hoc experiment by inserting an extra error_rate_limited! line into the balloon
inflate descriptor processing loop (hot path) and saw that it was rate-limited from 128 lines to 10 as expected.

Reason

Guest VMs can trigger repeated error!() calls through various virtio device paths (balloon, net, block, PCI, MMIO). Under sustained error conditions, this leads to excessive disk I/O and CPU consumption on the host from synchronous log writes.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkbuild --all to verify that the PR passes
    build checks on all supported architectures.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 27, 2026

Codecov Report

❌ Patch coverage is 54.23729% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.07%. Comparing base (8ddea25) to head (ecae71f).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
src/firecracker/src/main.rs 0.00% 10 Missing ⚠️
src/vmm/src/signal_handler.rs 0.00% 5 Missing ⚠️
src/firecracker/src/api_server_adapter.rs 0.00% 4 Missing ⚠️
src/firecracker/src/metrics.rs 0.00% 4 Missing ⚠️
src/firecracker/src/api_server/mod.rs 83.33% 1 Missing ⚠️
src/firecracker/src/api_server/parsed_request.rs 83.33% 1 Missing ⚠️
src/vmm/src/builder.rs 0.00% 1 Missing ⚠️
src/vmm/src/logger/mod.rs 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5799      +/-   ##
==========================================
+ Coverage   83.04%   83.07%   +0.03%     
==========================================
  Files         275      276       +1     
  Lines       29528    29541      +13     
==========================================
+ Hits        24521    24541      +20     
+ Misses       5007     5000       -7     
Flag Coverage Δ
5.10-m5n.metal 83.40% <57.14%> (+0.03%) ⬆️
5.10-m6a.metal 82.73% <57.14%> (+0.03%) ⬆️
5.10-m6g.metal 79.99% <54.23%> (+0.03%) ⬆️
5.10-m6i.metal 83.40% <57.14%> (+0.03%) ⬆️
5.10-m7a.metal-48xl 82.72% <57.14%> (+0.02%) ⬆️
5.10-m7g.metal 79.99% <54.23%> (+0.03%) ⬆️
5.10-m7i.metal-24xl 83.37% <57.14%> (+0.02%) ⬆️
5.10-m7i.metal-48xl 83.37% <57.14%> (+0.02%) ⬆️
5.10-m8g.metal-24xl 79.99% <54.23%> (+0.03%) ⬆️
5.10-m8g.metal-48xl 79.99% <54.23%> (+0.03%) ⬆️
5.10-m8i.metal-48xl 83.37% <57.14%> (+0.02%) ⬆️
5.10-m8i.metal-96xl 83.37% <57.14%> (+0.02%) ⬆️
6.1-m5n.metal 83.42% <57.14%> (+0.02%) ⬆️
6.1-m6a.metal 82.75% <57.14%> (+0.02%) ⬆️
6.1-m6g.metal 79.99% <54.23%> (+0.02%) ⬆️
6.1-m6i.metal 83.42% <57.14%> (+0.02%) ⬆️
6.1-m7a.metal-48xl 82.75% <57.14%> (+0.02%) ⬆️
6.1-m7g.metal 79.99% <54.23%> (+0.03%) ⬆️
6.1-m7i.metal-24xl 83.43% <57.14%> (+0.02%) ⬆️
6.1-m7i.metal-48xl 83.43% <57.14%> (+0.02%) ⬆️
6.1-m8g.metal-24xl 79.99% <54.23%> (+0.03%) ⬆️
6.1-m8g.metal-48xl 79.99% <54.23%> (+0.03%) ⬆️
6.1-m8i.metal-48xl 83.43% <57.14%> (+0.02%) ⬆️
6.1-m8i.metal-96xl 83.43% <57.14%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@kalyazin kalyazin force-pushed the log_rate_limiter branch 2 times, most recently from 0240225 to eb60521 Compare March 27, 2026 12:33
@kalyazin kalyazin marked this pull request as ready for review March 27, 2026 12:33
@kalyazin kalyazin requested review from Manciukic and pb8o as code owners March 27, 2026 12:33
@kalyazin kalyazin self-assigned this Mar 27, 2026
@kalyazin kalyazin added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label Mar 27, 2026
Comment thread src/vmm/src/devices/virtio/balloon/device.rs Outdated
Comment thread src/vmm/src/logger/rate_limited.rs Outdated
Comment thread src/vmm/src/logger/rate_limited.rs Outdated
Comment thread src/vmm/src/logger/rate_limited.rs
Comment thread src/vmm/src/logger/rate_limited.rs
@kalyazin kalyazin force-pushed the log_rate_limiter branch 3 times, most recently from 531998b to 80580f3 Compare March 30, 2026 14:40
Comment thread src/vmm/src/logger/mod.rs Outdated
Comment thread CHANGELOG.md Outdated
Comment thread src/vmm/src/logger/mod.rs
Comment thread src/vmm/src/logger/rate_limited.rs
Comment thread src/vmm/src/logger/rate_limited.rs
@kalyazin kalyazin force-pushed the log_rate_limiter branch 2 times, most recently from 0514643 to d5835aa Compare April 2, 2026 11:49
Comment thread src/vmm/src/arch/aarch64/vcpu.rs Outdated
Comment thread src/vmm/src/logger/mod.rs
Comment thread src/vmm/src/logger/rate_limited.rs Outdated
Comment thread src/vmm/src/logger/mod.rs
Comment thread src/vmm/src/arch/aarch64/vcpu.rs Outdated
@kalyazin kalyazin force-pushed the log_rate_limiter branch 5 times, most recently from 0e11369 to 18d9c30 Compare April 8, 2026 09:25
@kalyazin kalyazin force-pushed the log_rate_limiter branch 5 times, most recently from ade2a4d to 978f04d Compare April 13, 2026 13:08
Comment thread src/vmm/src/acpi/mod.rs Outdated
ShadowCurse
ShadowCurse previously approved these changes Apr 13, 2026
Copy link
Copy Markdown
Contributor

@Manciukic Manciukic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one more small issue but overall LGTM!

Comment thread src/vmm/src/signal_handler.rs
Comment thread src/vmm/src/logger/mod.rs
ShadowCurse
ShadowCurse previously approved these changes Apr 13, 2026
Manciukic
Manciukic previously approved these changes Apr 14, 2026
@ShadowCurse ShadowCurse enabled auto-merge (rebase) April 14, 2026 15:03
Add a per-callsite rate limiter for logging that wraps the
existing TokenBucket in OnceLock<Mutex<...>>. Each macro
invocation site gets its own independent LogRateLimiter via
a static, so flooding one callsite does not suppress
unrelated log messages.

Default configuration: 10 messages per 5-second refill
period, matching the Linux kernel printk_ratelimited
defaults.

Include unit tests for burst enforcement, callsite
independence, and token refill after the configured
period.

Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
Redefine the error, warn, and info macros re-exported
from crate::logger to include per-callsite rate limiting.
The original unrestricted log macros are available as
error_unrestricted, warn_unrestricted, and
info_unrestricted for callsites that must not be rate
limited.

Each macro checks log_enabled before touching the rate
limiter to avoid overhead for filtered-out log levels.
Per-callsite suppression counting via a static AtomicU64
reports the number of suppressed messages at warn level
when logging resumes.

Add rate_limited_log_count metric to LoggerSystemMetrics
and update fcmetrics.py accordingly.

Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
Add clippy.toml to the vmm crate with disallowed-macros
configuration that prevents direct use of log::error,
log::warn, log::info, and log::debug.

This ensures all log callsites go through the
crate::logger wrappers rather than calling log macros
directly.

The rate-limited and unrestricted macro implementations
use allow(clippy::disallowed_macros) internally since
they must call the underlying log macros.

Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
Document the new per-callsite rate-limited logging
feature in the changelog.

Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
@ShadowCurse ShadowCurse dismissed stale reviews from Manciukic and themself via ecae71f April 14, 2026 15:33
@ShadowCurse
Copy link
Copy Markdown
Contributor

Resolved 5 lines of conflicts, but GH shows the whole PR in the compare section for some reason.

@ShadowCurse ShadowCurse merged commit a481d51 into firecracker-microvm:main Apr 14, 2026
6 of 7 checks passed
zulinx86 added a commit to zulinx86/firecracker that referenced this pull request Apr 16, 2026
The per-callsite rate-limited logging feature [1] added a static
LogRateLimiter (OnceLock<Mutex<TokenBucket>>) and a static AtomicU64
at each error!/warn!/info! callsite. With ~339 callsites in the VMM
crate, this increases the RSS footprint of the Firecracker process.

Under the maximum configuration (32 vCPUs, PCI enabled, snapshot
creation), the memory monitor observed 6.07-6.13 MiB, exceeding the
previous 6.0 MiB threshold_snapshot and causing test_all_vcpus_online
to fail in the uvm_restored path.

Bump threshold_snapshot from 6 MiB to 7 MiB to accommodate the
additional static memory from rate-limited logging.

[1]: firecracker-microvm#5799
Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Status: Awaiting review Indicates that a pull request is ready to be reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants