Skip to content

perf(core): eliminate avoidable per-call allocation in hot logging/config paths#36150

Open
swicken wants to merge 3 commits into
mainfrom
issue-36145-hot-path-allocation
Open

perf(core): eliminate avoidable per-call allocation in hot logging/config paths#36150
swicken wants to merge 3 commits into
mainfrom
issue-36145-hot-path-allocation

Conversation

@swicken

@swicken swicken commented Jun 12, 2026

Copy link
Copy Markdown
Member

Proposed Changes

  • Logger lazy Supplier overloads are now actually lazy — the three debug(…, Supplier) and two info(…, Supplier) overloads called message.get() unconditionally; they now check the level first. ~900 call sites use suppliers specifically to skip message building, including ones that did reflectionToString on Contentlets via PermissionBitFactoryImpl debug logging on the reindex path.
  • Config int/long/float/boolean getters no longer throw per read — the DOT_* env-alias probe used commons-configuration's throwing 1-arg getter wrapped in vavr Try, so every read of a non-env-overridden key allocated and discarded a NoSuchElementException (with full stack trace). Applied the containsKey guard that getBooleanProperty(name, default) already had.
  • Logger.isVelocityMessage memoized via ClassValue — previously allocated a lowercased class name on every Class-keyed log call; ClassValue lookups are allocation-free, keyed per actual Class (no FQN conflation across OSGi classloaders), and entries are released with their class.
  • LeakyTokenBucketImpl no longer walks Config per request-cost increment — the four config reads (enabled, refill, max bucket, add-header-info) are TTL-memoized (30s). Runtime overrides via system table / config refresh still take effect within the TTL, preserved deliberately so rate limiting can be toggled during an incident; a test with an injectable TTL pins that contract.

Why

Allocation profiling of a running instance (Pyroscope TLAB profiles under mixed REST/GraphQL/page-render load) attributed the Throwable.fillInStackTrace, Method.copy/Field.copy, and StringLatin1.toLowerCase allocation hotspots to these four spots; Config.get* subtrees alone were ~19% of sampled allocation in a mixed window (e.g. one full Config walk — exception included — per expired session via ClickstreamListener, per 250ms reindex-pause iteration, per request-cost increment). Expected effect is reduced allocation → less GC pressure, not a single-endpoint latency win.

Checklist

  • Tests
  • Translations
  • Security Implications Contemplated (none — no input handling or auth paths touched; rate-limit values remain config-driven)

Additional Info

New/extended unit tests: LoggerTest (suppliers not evaluated when level disabled), LeakyTokenBucketImplTest (runtime config override reaches a running bucket; reads within TTL stay cached), ConfigTest (getLongProperty missing-key default; env-alias-present-but-unparseable falls back to default for int/long/float). The deprecated no-default Config getters still throw NoSuchElementException for missing base keys — that documented contract is unchanged (and pinned by existing tests).

Closes #36145

swicken added 3 commits June 12, 2026 13:57
…nfig paths

Allocation profiling (Pyroscope, TLAB profiles) attributed several hotspots
to four mechanical issues in core utility code; Config.get* subtrees alone
accounted for ~19% of sampled allocation in a mixed-load window.

- Logger: lazy Supplier overloads (debug/info) no longer evaluate the
  message supplier when the level is disabled. ~900 call sites use
  suppliers specifically to avoid message-building cost, and all of them
  evaluated eagerly (including reflectionToString on Contentlets via
  PermissionBitFactoryImpl debug logging on the reindex path).
- Config: getInt/getLong/getFloat/getBoolean env-alias probes are now
  guarded with containsKey instead of throwing and discarding a
  NoSuchElementException on every read of a non-env-overridden key
  (same guard getBooleanProperty(name, default) already had).
- Logger.isVelocityMessage: memoized per class name; previously allocated
  a lowercased class name on every Class-keyed log call.
- LeakyTokenBucketImpl: isEnabled/getMaximumBucketSize/getRefillPerSecond
  return the values resolved at construction instead of walking Config on
  every request-cost increment.

Fixes #36145
…ization

Returning constructor-captured values from the application-scoped
LeakyTokenBucketImpl meant RATE_LIMIT_ENABLED / RATE_LIMIT_REFILL_PER_SECOND /
RATE_LIMIT_MAX_BUCKET_SIZE changes (system table or config refresh) were
ignored until restart - enabling rate limiting during an incident would
silently do nothing.

Memoize the Config lookups with a 30s TTL instead: runtime overrides still
take effect (bounded staleness) while the per-request-cost-increment Config
walk is gone.

Refs #36145
- Pin the runtime-override contract with a test: a TTL-injectable
  constructor lets the new LeakyTokenBucketImplTest assert that a runtime
  Config change reaches a running bucket and that reads within the TTL
  serve the cached value (guards against a future cleanup reverting the
  suppliers to the constructor-captured fields).
- Cover Config.getLongProperty missing-key default and pin the
  env-alias-present-but-unparseable fallback for int/long/float.
- Memoize the RATE_LIMIT_ADD_HEADER_INFO read in getHeaderInfo(): it runs
  unconditionally per request via RequestCostFilter, same hot path.
- Make the allow() debug message lazy via the new Supplier overload.
- Replace the name-keyed Caffeine cache in Logger.isVelocityMessage with a
  ClassValue: allocation-free lookups (the capturing lambda allocated per
  call), per-Class keying (no FQN conflation across OSGi classloaders),
  and entries released with their class.

Refs #36145
@github-actions

github-actions Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

❌ Codex Review failed — openai.gpt-5.5

The review job failed before producing output. See the run for details.

Run: #27437728136

@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Claude finished @swicken's task in 1m 8s —— View job


Rollback Safety Analysis

  • Read rollback-unsafe categories reference
  • Get full PR diff
  • Analyze changes against unsafe categories
  • Apply appropriate label and post findings

Result: ✅ Safe To Rollback

The label AI: Safe To Rollback has been applied.


Analysis

All six changed files were reviewed against every category in the rollback-unsafe reference. The changes are purely behavioral / in-memory — no database schema, no Elasticsearch mapping, no data migration, no API contract, no OSGi interface, no VTL viewtool surface change.

File Change Category checked Verdict
Logger.java Lazy Supplier guard + ClassValue memoization for isVelocityMessage M-4 (OSGi), H-8 (VTL) Safe — no public method signatures changed; isVelocityMessage is private static. ClassValue behavior is semantically identical to the old toLowerCase path — same result, just cached.
Config.java containsKey guard before calling props.getInt/getLong/getFloat/getBoolean None applicable Safe — pure read-path optimization; no schema, no data, no API contract touched.
LeakyTokenBucketImpl.java TTL-memoized suppliers for four Config reads; new @VisibleForTesting constructor overload None applicable Safe — no public interface changed (implements LeakyTokenBucket); the rate-limit behavior is preserved; config values still refresh within 30 s. New constructor overload is @VisibleForTesting only.
Test files (LoggerTest, ConfigTest, LeakyTokenBucketImplTest) New / extended unit tests N/A Safe — test-only.

No matches against any rollback-unsafe category (C-1 through C-4, H-1 through H-8, M-1 through M-4).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI: Safe To Rollback Area : Backend PR changes Java/Maven backend code

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Eliminate avoidable per-call allocation in hot logging and config paths

1 participant