perf: use commix concurrent GC for Native CLI binary #881
Open
He-Pin wants to merge 1 commit into
Motivation: The Native CLI defaulted to immix, whose stop-the-world (STW) collections dominate wall-clock time on allocation-heavy configs and cause large latency variance. Profiling against jrsonnet showed that the remaining gap was GC/allocation cost, not interpreter dispatch.

Modification: Set nativeGC = "commix" (concurrent immix) on the Native release module. commix collects on background threads, overlapping collection with evaluation. Output is byte-identical and RSS is unchanged vs immix (commix still frees memory, so the heap stays bounded and safe on small machines).

Result: jrsonnet realworld suite (min ms, interleaved, cooled):
kube-prometheus 141.8 -> 122.1 (1.16x)
loki 43.9 -> 40.5
mimir 51.6 -> 47.6
tempo 45.4 -> 45.7 (neutral)
STW variance collapses (kube-prometheus ±55 ms -> ±5 ms). RSS 168 -> 169 MB. Native test suite 462/462 pass.
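The modification described above amounts to one setting on a Mill Scala Native module. A minimal sketch, assuming a build that uses Mill's scalanativelib; the module name nativeCli and both version strings are placeholders for illustration and are not taken from this PR:

```scala
// build.mill (sketch only; module name and versions are hypothetical)
package build

import mill._, scalalib._, scalanativelib._

object nativeCli extends ScalaNativeModule {
  def scalaVersion = "3.3.4"        // placeholder version
  def scalaNativeVersion = "0.5.6"  // placeholder version

  // Select the concurrent immix collector instead of the default immix.
  // Scala Native accepts "none", "boehm", "immix" and "commix" here.
  def nativeGC = "commix"
}
```

Because the collector is chosen at link time, switching back to immix for an A/B comparison only requires changing this string and relinking the binary.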
@stephenamar-db FYI @WojciechMazur cc
Summary
Switch the Native CLI release binary from the default immix GC to commix (concurrent immix). commix collects on background threads, overlapping collection with evaluation. This is a one-line build.mill change with byte-identical output and unchanged RSS; it still frees memory, so it stays bounded and safe on small machines.
Why commix
While profiling the Native binary against jrsonnet, the remaining wall-clock gap turned out to be GC/allocation cost, not interpreter dispatch. immix's stop-the-world collections dominate on allocation-heavy configs and cause large latency variance. I benchmarked all four Scala Native GCs (none, immix, commix, boehm) on jrsonnet's realworld config suite (min ms, interleaved + cooled, 20 runs each):
Ranking is consistent: none < commix < immix < boehm. boehm is the slowest of all four (eliminated). none is fastest (and on kube-prometheus essentially ties jrsonnet) but never frees → OOM risk on huge/adversarial inputs, so it is not a safe default. commix gives most of the safe win.
The residual gap to jrsonnet on the smaller configs (~1.5×, even with none) is interpreter + allocation throughput, which no GC choice addresses.
Trade-off
commix uses background GC threads (higher total CPU). On a genuinely single-core machine it could in theory regress; on any multi-core machine it is a clear win. The module keeps nativeMultithreading = None and commix still manages its own GC threads correctly.
Verification
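To re-check the timing numbers, the "min ms, interleaved + cooled" methodology could be reproduced with a small harness along these lines. This is a sketch under assumptions, not the script used for this PR: the object name InterleavedBench, its parameters, and the idea of passing each candidate as a thunk are all hypothetical.

```scala
object InterleavedBench {
  /** Runs `body` once and returns the elapsed wall-clock time in milliseconds. */
  def timeMs(body: () => Unit): Double = {
    val start = System.nanoTime()
    body()
    (System.nanoTime() - start) / 1e6
  }

  /**
   * Interleaves the candidates round by round (A, B, A, B, ...) so that
   * machine-wide drift affects all of them equally, sleeps `coolDownMs`
   * before every run, and keeps the minimum time observed per candidate.
   */
  def minMs(
      candidates: Seq[(String, () => Unit)],
      rounds: Int,
      coolDownMs: Long
  ): Map[String, Double] = {
    val samples = for {
      _ <- 0 until rounds
      (name, body) <- candidates
    } yield {
      Thread.sleep(coolDownMs) // cool-down between runs
      name -> timeMs(body)
    }
    // Reduce all samples of one candidate to their minimum.
    samples.groupMapReduce(_._1)(_._2)((a, b) => math.min(a, b))
  }
}
```

In practice each thunk would launch one build of the binary on the same config, for example through scala.sys.process, with rounds = 20 to match the run count quoted above. Taking the minimum filters out scheduling noise, while the variance figures would need the full sample list rather than only the minimum.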