feat(logger): add rate limiter #5799
Merged
ShadowCurse merged 4 commits into firecracker-microvm:main from Apr 14, 2026
2325c61 to 3ddd1f5
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #5799 +/- ##
==========================================
+ Coverage 83.04% 83.07% +0.03%
==========================================
Files 275 276 +1
Lines 29528 29541 +13
==========================================
+ Hits 24521 24541 +20
+ Misses 5007 5000 -7
0240225 to eb60521
ShadowCurse reviewed Mar 27, 2026
ShadowCurse reviewed Mar 27, 2026
Manciukic reviewed Mar 27, 2026
Manciukic reviewed Mar 27, 2026
531998b to 80580f3
ShadowCurse reviewed Mar 30, 2026
ShadowCurse reviewed Mar 30, 2026
80580f3 to b795a7b
Manciukic reviewed Mar 30, 2026
ilstam reviewed Apr 1, 2026
0514643 to d5835aa
Manciukic reviewed Apr 2, 2026
ShadowCurse reviewed Apr 2, 2026
0e11369 to 18d9c30
ade2a4d to 978f04d
ShadowCurse reviewed Apr 13, 2026
978f04d to 1a02189
ShadowCurse previously approved these changes Apr 13, 2026
Manciukic reviewed Apr 13, 2026
Manciukic left a comment
one more small issue but overall LGTM!
1a02189 to 1f0602b
ShadowCurse reviewed Apr 13, 2026
ShadowCurse previously approved these changes Apr 13, 2026
Manciukic previously approved these changes Apr 14, 2026
Add a per-callsite rate limiter for logging that wraps the existing TokenBucket in OnceLock<Mutex<...>>. Each macro invocation site gets its own independent LogRateLimiter via a static, so flooding one callsite does not suppress unrelated log messages.

Default configuration: 10 messages per 5-second refill period, matching the Linux kernel printk_ratelimited defaults.

Include unit tests for burst enforcement, callsite independence, and token refill after the configured period.

Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
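The commit message above describes a limiter that is lazily initialised inside a static. The following is a minimal sketch of that shape, not the code from this PR: `SimpleBucket` is a hypothetical stand-in for Firecracker's `TokenBucket` (whose real API and refill behaviour may differ), and `check_at` is a hypothetical helper added only so the clock can be controlled. The 10-messages-per-5-seconds default is taken from the commit message.

```rust
use std::sync::{Mutex, OnceLock};
use std::time::{Duration, Instant};

/// Hypothetical, simplified stand-in for Firecracker's `TokenBucket`.
/// It refills the whole budget once a full period has elapsed.
struct SimpleBucket {
    capacity: u64,
    tokens: u64,
    refill_period: Duration,
    last_refill: Instant,
}

impl SimpleBucket {
    fn new(capacity: u64, refill_period: Duration) -> Self {
        Self { capacity, tokens: capacity, refill_period, last_refill: Instant::now() }
    }

    /// Takes one token if one is available at time `now`.
    fn try_take(&mut self, now: Instant) -> bool {
        if now.duration_since(self.last_refill) >= self.refill_period {
            self.tokens = self.capacity;
            self.last_refill = now;
        }
        if self.tokens > 0 {
            self.tokens -= 1;
            true
        } else {
            false
        }
    }
}

/// Limiter that can be placed in a `static`; the bucket is built on first use.
pub struct LogRateLimiter {
    inner: OnceLock<Mutex<SimpleBucket>>,
}

impl LogRateLimiter {
    pub const fn new() -> Self {
        Self { inner: OnceLock::new() }
    }

    /// Returns true if the caller may log now.
    pub fn check(&self) -> bool {
        self.check_at(Instant::now())
    }

    /// Same as `check`, with an explicit timestamp (hypothetical test hook).
    pub fn check_at(&self, now: Instant) -> bool {
        let bucket = self
            .inner
            .get_or_init(|| Mutex::new(SimpleBucket::new(10, Duration::from_secs(5))));
        match bucket.lock() {
            Ok(mut b) => b.try_take(now),
            // Assumption: if the mutex is poisoned, let the message through.
            Err(_) => true,
        }
    }
}
```

Because `LogRateLimiter::new` is a `const fn`, a declaration such as `static LIMITER: LogRateLimiter = LogRateLimiter::new();` compiles, and each such static carries its own budget.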
Redefine the error, warn, and info macros re-exported from crate::logger to include per-callsite rate limiting. The original unrestricted log macros are available as error_unrestricted, warn_unrestricted, and info_unrestricted for callsites that must not be rate limited.

Each macro checks log_enabled before touching the rate limiter to avoid overhead for filtered-out log levels. Per-callsite suppression counting via a static AtomicU64 reports the number of suppressed messages at warn level when logging resumes.

Add rate_limited_log_count metric to LoggerSystemMetrics and update fcmetrics.py accordingly.

Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
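The control flow described in this commit message (level check first, then the limiter, then a suppression summary on resume) can be sketched as below. Everything here is illustrative rather than taken from the PR: `WindowBudget`, `log_rate_limited_sketch!`, and `flood` are hypothetical names, the `$enabled` argument stands in for the `log_enabled` check, the `$window` argument stands in for elapsed time, and a `Vec<String>` stands in for the real log sink.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// Hypothetical limiter: 10 tokens per "window", where the window index is
/// passed in by the caller instead of being derived from a clock.
pub struct WindowBudget(Mutex<(u64, u64)>); // (window index, tokens left)

impl WindowBudget {
    pub const fn new() -> Self {
        Self(Mutex::new((0, 10)))
    }

    pub fn check(&self, window: u64) -> bool {
        let mut state = self.0.lock().unwrap_or_else(|e| e.into_inner());
        if state.0 != window {
            // A new window started: restore the full budget.
            *state = (window, 10);
        }
        if state.1 > 0 {
            state.1 -= 1;
            true
        } else {
            false
        }
    }
}

macro_rules! log_rate_limited_sketch {
    ($enabled:expr, $window:expr, $sink:ident, $($arg:tt)+) => {{
        // These statics are created once per macro expansion, i.e. per callsite.
        static LIMITER: WindowBudget = WindowBudget::new();
        static SUPPRESSED: AtomicU64 = AtomicU64::new(0);
        // Check the level first so filtered-out messages never touch the limiter.
        if $enabled {
            if LIMITER.check($window) {
                // Logging resumes: report how many messages were dropped.
                let missed = SUPPRESSED.swap(0, Ordering::Relaxed);
                if missed > 0 {
                    $sink.push(format!("{} messages were suppressed", missed));
                }
                $sink.push(format!($($arg)+));
            } else {
                SUPPRESSED.fetch_add(1, Ordering::Relaxed);
            }
        }
    }};
}

/// Hits one callsite `n` times within the given window.
pub fn flood(n: usize, window: u64, out: &mut Vec<String>) {
    for i in 0..n {
        log_rate_limited_sketch!(true, window, out, "bad descriptor {}", i);
    }
}
```

Calling `flood(128, 0, &mut out)` leaves 10 entries in `out`; the next call in a later window first pushes a summary of the 118 suppressed messages and then the new message.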
Add clippy.toml to the vmm crate with disallowed-macros configuration that prevents direct use of log::error, log::warn, log::info, and log::debug. This ensures all log callsites go through the crate::logger wrappers rather than calling log macros directly.

The rate-limited and unrestricted macro implementations use allow(clippy::disallowed_macros) internally since they must call the underlying log macros.

Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
Document the new per-callsite rate-limited logging feature in the changelog.

Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
1f0602b to ecae71f
Resolved 5 lines of conflicts, but GH shows the whole PR in the |
ShadowCurse approved these changes Apr 14, 2026
Manciukic approved these changes Apr 14, 2026
zulinx86 added a commit to zulinx86/firecracker that referenced this pull request Apr 16, 2026
The per-callsite rate-limited logging feature [1] added a static LogRateLimiter (OnceLock<Mutex<TokenBucket>>) and a static AtomicU64 at each error!/warn!/info! callsite. With ~339 callsites in the VMM crate, this increases the RSS footprint of the Firecracker process.

Under the maximum configuration (32 vCPUs, PCI enabled, snapshot creation), the memory monitor observed 6.07-6.13 MiB, exceeding the previous 6.0 MiB threshold_snapshot and causing test_all_vcpus_online to fail in the uvm_restored path.

Bump threshold_snapshot from 6 MiB to 7 MiB to accommodate the additional static memory from rate-limited logging.

[1]: firecracker-microvm#5799

Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
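To get a rough feel for the static data involved, the per-callsite footprint can be estimated with `size_of`. This is only a back-of-the-envelope sketch: `BucketState` is a hypothetical placeholder whose fields are guesses, not the real `TokenBucket` layout, the result depends on the target platform, and it ignores the extra code each macro expansion generates. The callsite count of 339 is the approximate figure from the commit message.

```rust
use std::mem::size_of;
use std::sync::atomic::AtomicU64;
use std::sync::{Mutex, OnceLock};
use std::time::Instant;

/// Hypothetical placeholder for the token bucket state.
/// The real `TokenBucket` may have different fields and size.
#[allow(dead_code)]
pub struct BucketState {
    size: u64,
    budget: u64,
    refill_time_ms: u64,
    last_update: Instant,
}

/// Approximate number of callsites quoted in the commit message.
pub const CALLSITES: usize = 339;

/// Static bytes per callsite: one limiter plus one suppression counter.
pub fn per_callsite_bytes() -> usize {
    size_of::<OnceLock<Mutex<BucketState>>>() + size_of::<AtomicU64>()
}

/// Static bytes across all callsites under the assumptions above.
pub fn total_static_bytes() -> usize {
    CALLSITES * per_callsite_bytes()
}
```

Printing `total_static_bytes()` on the target of interest gives an estimate of the static data only; the RSS numbers quoted in the commit message come from the memory monitor, not from this calculation.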
Changes
Add per-callsite rate limiting for guest-triggered logging paths, following the Linux kernel printk_ratelimited pattern. The error_rate_limited! macro gives each callsite its own independent, preconfigured rate limiter set to 10 messages per 5-second window. When messages are suppressed, a summary is emitted once the callsite resumes logging. A new rate_limited_log_count metric tracks total suppressions.
I was not able to build an integration test that demonstrates that the rate limiting is effective in a real end-to-end scenario, because it would've required a custom guest kernel. Instead, I ran an ad hoc experiment by inserting an extra error_rate_limited! line into the balloon inflate descriptor processing loop (hot path) and saw that it was rate-limited from 128 lines to 10, as expected.
Reason
Guest VMs can trigger repeated error!() calls through various virtio device paths (balloon, net, block, PCI, MMIO). Under sustained error conditions, this leads to excessive disk I/O and CPU consumption on the host from synchronous log writes.
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.
PR Checklist
tools/devtool checkbuild --all to verify that the PR passes build checks on all supported architectures.
tools/devtool checkstyle to verify that the PR passes the automated style checks.
how they are solving the problem in a clear and encompassing way.
in the PR.
CHANGELOG.md.
Runbook for Firecracker API changes.
integration tests.
TODO.
rust-vmm.