From 250de9c42be87526a8c852de0dcc69c18bbc42cf Mon Sep 17 00:00:00 2001
From: He-Pin <kerr.hepin@gmail.com>
Date: Sat, 23 May 2026 17:15:35 +0800
Subject: [PATCH] perf: skip UTF-8 encode for clean-ASCII long strings in
 renderer
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Motivation:
async-profiler on the Scala Native kube-prometheus workload shows
HeapCharBuffer.wrap accounting for 40.3% of GC-allocation parents
(GC itself is ~25-30% of native runtime). The wrap comes from
String.getBytes(UTF_8), which BaseByteRenderer.visitLongString calls
once per long (>=128 char) JSON string. Each call also allocates an
output byte[]. In K8s manifest output the overwhelming majority of
these long values (descriptions, annotations, base64 blobs, paths)
are pure printable ASCII with no JSON-escape characters.

Modification:
At the top of visitLongString, probe the string with the existing
Platform.isAsciiJsonSafe SWAR scan (16 chars/Long word, no allocation).
On a positive probe, delegate to renderAsciiSafeString which uses
Platform.copyAsciiStringToBytes for a direct char->byte memcpy and
skips the CharsetEncoder, HeapCharBuffer, and intermediate byte[]
entirely. Strings that contain any escape-requiring char or any
non-ASCII codepoint fall through to the existing byte-SWAR path
unchanged. They pay one extra SWAR scan over the chars (at most
~len/16 Long reads, with len the char count), which is small next to
the UTF-8 encode cost they already incur.
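
The fast path above can be sketched as follows. This is a minimal scalar
illustration, not the code in Platform or BaseByteRenderer: the object
name and both method bodies are hypothetical stand-ins, and the choice
to treat DEL (0x7F) as unsafe is an assumption made to stay
conservative. It shows why the copy is valid: for a string that passes
the probe, narrowing each char to a byte yields exactly the UTF-8
encoding.

```scala
import java.nio.charset.StandardCharsets

object AsciiFastPathSketch {
  // Hypothetical scalar stand-in for the SWAR probe: true only when every
  // char is printable ASCII (0x20..0x7E) and needs no JSON escape.
  def isAsciiJsonSafe(s: String): Boolean = {
    val n = s.length
    var i = 0
    while (i < n) {
      val c = s.charAt(i)
      if (c < 0x20 || c > 0x7e || c == '"' || c == '\\') return false
      i += 1
    }
    true
  }

  // Hypothetical stand-in for the direct copy: narrows each char to one
  // byte and writes it into a caller-owned buffer. Only valid after the
  // probe above has returned true. Returns the new write offset.
  def copyAsciiToBytes(s: String, dst: Array[Byte], off: Int): Int = {
    val n = s.length
    var i = 0
    while (i < n) {
      dst(off + i) = s.charAt(i).toByte
      i += 1
    }
    off + n
  }

  // Reference encoding used only to compare against the fast path.
  def utf8(s: String): Array[Byte] = s.getBytes(StandardCharsets.UTF_8)
}
```

Writing into a caller-owned buffer mirrors the intent of the change: no
CharBuffer wrapper and no temporary byte[] are created per string.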

Result:
- ./mill 'sjsonnet.jvm[3.3.7]'.test : 444/444 pass
- Byte-identical output on kube-prometheus (1.5MB / 72k lines)
- hyperfine (Scala Native, kube-prom, 60 runs, warmup 8):
    before: 150.7 ms ± 8.3 ms
    after : 145.9 ms ± 6.2 ms
    => 1.03x faster (-4.8 ms mean, -3.2%)
- ./mill bench.runRegressions : completes successfully across all
  cpp/go/sjsonnet suites with no anomalies.
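
The hyperfine figures above can be re-derived with a few lines of
arithmetic. This is a rough sanity check, assuming the "±" values are
per-run standard deviations, that both sides used 60 independent runs,
and that a simple standard-error estimate is adequate; the object and
value names are illustrative only.

```scala
object HyperfineCheck {
  val runs = 60
  val beforeMean = 150.7 // ms
  val beforeSd = 8.3     // ms
  val afterMean = 145.9  // ms
  val afterSd = 6.2      // ms

  // Difference of means, speedup ratio, and relative change.
  val deltaMs: Double = afterMean - beforeMean           // about -4.8 ms
  val speedup: Double = beforeMean / afterMean           // about 1.03x
  val pctChange: Double = deltaMs / beforeMean * 100.0   // about -3.2 %

  // Standard error of the difference of two independent means.
  val seDiff: Double =
    math.sqrt(beforeSd * beforeSd / runs + afterSd * afterSd / runs)

  // How many standard errors the observed delta spans.
  val deltaInSe: Double = math.abs(deltaMs) / seDiff
}
```

Under those assumptions the delta is several standard errors wide, even
though it is smaller than the per-run spread on either side.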

Analysis:
Modest but real: visitLongString is called once per long output
string, so even on a 72k-line kube-prom output it runs on the order
of 10^4 times. Each call that takes the fast path avoids two heap
allocations (the HeapCharBuffer wrapper and the output byte[]) and a
CharsetEncoder dispatch. Larger gains require attacking the remaining
UTF-8 path itself (the next commits target the escape-needing branch
and the PlatformBase64 zero-copy).

References:
- async-profiler GC-parent analysis on /tmp/sjsonnet-yaml-fix
- Platform.isAsciiJsonSafe / CharSWAR.isAsciiJsonSafe (existing SWAR helper)
- renderAsciiSafeString / Platform.copyAsciiStringToBytes (existing fast path)
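
For readers unfamiliar with the SWAR idea behind the helper referenced
above, here is a small sketch of the lane tricks involved. It is not the
CharSWAR implementation: it packs four UTF-16 chars into one Long with
charAt (the real helper is described as covering 16 chars per step and
presumably reads memory more directly), all names are hypothetical, and
treating DEL (0x7F) as unsafe is an assumption.

```scala
object CharSwarSketch {
  private final val HighBits = 0xFF80FF80FF80FF80L // nonzero AND => some lane >= 0x80
  private final val Bit7 = 0x0080008000800080L     // bit 7 of each 16-bit lane

  // Replicate a 7-bit value into all four 16-bit lanes.
  private def broadcast(c: Int): Long = c.toLong * 0x0001000100010001L

  // Pack chars i..i+3 into one Long, one char per 16-bit lane.
  private def pack4(s: String, i: Int): Long =
    s.charAt(i).toLong |
      (s.charAt(i + 1).toLong << 16) |
      (s.charAt(i + 2).toLong << 32) |
      (s.charAt(i + 3).toLong << 48)

  // Precondition: every lane of w is < 0x80. After XOR, a lane is zero
  // exactly when it equals c; adding 0x7F sets bit 7 only in nonzero lanes.
  private def anyLaneEquals(w: Long, c: Int): Boolean = {
    val x = w ^ broadcast(c)
    ((x + broadcast(0x7f)) & Bit7) != Bit7
  }

  // True when all four lanes are printable ASCII with no JSON escape.
  def wordIsSafe(w: Long): Boolean =
    (w & HighBits) == 0L &&                      // all lanes < 0x80
      ((w + broadcast(0x60)) & Bit7) == Bit7 &&  // all lanes >= 0x20
      !anyLaneEquals(w, '"') &&
      !anyLaneEquals(w, '\\') &&
      !anyLaneEquals(w, 0x7f)

  def isAsciiJsonSafe(s: String): Boolean = {
    val n = s.length
    var i = 0
    while (i + 4 <= n) {
      if (!wordIsSafe(pack4(s, i))) return false
      i += 4
    }
    while (i < n) { // scalar tail for the last 0..3 chars
      val c = s.charAt(i)
      if (c < 0x20 || c > 0x7e || c == '"' || c == '\\') return false
      i += 1
    }
    true
  }
}
```

The additions never carry between lanes because each lane holds at most
0x7F when they run, which is what lets one Long test stand in for four
per-char comparisons.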
---
 sjsonnet/src/sjsonnet/BaseByteRenderer.scala | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/sjsonnet/src/sjsonnet/BaseByteRenderer.scala b/sjsonnet/src/sjsonnet/BaseByteRenderer.scala
index ce5f4907..219f8b12 100644
--- a/sjsonnet/src/sjsonnet/BaseByteRenderer.scala
+++ b/sjsonnet/src/sjsonnet/BaseByteRenderer.scala
@@ -305,8 +305,18 @@ class BaseByteRenderer[T <: java.io.OutputStream](
   /**
    * SWAR-accelerated path for long strings. Converts to UTF-8 bytes once, then bulk-copies clean
    * chunks and escapes only the bytes that require it.
+   *
+   * Probes the string with a SWAR ASCII-safe scan first. When the string is clean printable ASCII
+   * (no escape chars, no non-ASCII), the entire UTF-8 encode pass (HeapCharBuffer.wrap +
+   * CharsetEncoder.loop + output byte[] allocation) is skipped — bytes are written directly from
+   * the chars via Platform.copyAsciiStringToBytes. This is the dominant case for K8s/JSON output
+   * where long values (descriptions, paths, base64 blobs) are pure ASCII.
    */
   private def visitLongString(str: String): Unit = {
+    if (Platform.isAsciiJsonSafe(str)) {
+      renderAsciiSafeString(str)
+      return
+    }
     val bytes = str.getBytes(java.nio.charset.StandardCharsets.UTF_8)
     val bLen = bytes.length
     val firstEscape = CharSWAR.findFirstEscapeChar(bytes, 0, bLen)