fix: sort ASCII-safe strings by runtime kind by He-Pin · Pull Request #868 · databricks/sjsonnet

He-Pin · 2026-05-23T11:52:56Z

Motivation:
Val.AsciiSafeStr is a Val.Str subclass used by renderer/string optimizations. std.sort was using exact runtime class equality to validate and dispatch sortable values, so arrays containing AsciiSafeStr could fail even though Jsonnet semantics treat those values as strings.

Key Design Decision:
Classify values by Jsonnet runtime kind rather than exact JVM/Native class. This preserves the existing specialized string/number/array sort paths while making subtype-based string optimizations semantically transparent to the standard library.

Modification:

- Add a small integer sort-kind classification in SetModule.
- Replace exact getClass checks in default sorting and key-function sorting with sort-kind checks.
- Add a regression test in new_test_suite/set_sort_ascii_safe_strings.jsonnet covering all-AsciiSafeStr, mixed Str/AsciiSafeStr, and key-function-produced AsciiSafeStr sorting.
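
To illustrate the difference between an exact-class check and a sort-kind check, here is a minimal sketch. It does not reproduce the code in SetModule: the `Value`, `Str`, `AsciiSafeStr`, and `Num` classes and the `SortKinds` object are hypothetical stand-ins, and string ordering uses Scala's default `String` ordering as a simplification.

```scala
// Hypothetical stand-ins for runtime values; NOT sjsonnet's actual Val hierarchy.
sealed trait Value
class Str(val value: String) extends Value
final class AsciiSafeStr(s: String) extends Str(s) // optimized subclass, still a string
final class Num(val value: Double) extends Value

object SortKinds {
  // Small integer kinds (names are assumptions for this sketch).
  final val KindStr = 0
  final val KindNum = 1
  final val KindOther = 2

  // Classify by semantic kind; subclasses of Str map to the same kind as Str.
  def kindOf(v: Value): Int = v match {
    case _: Str => KindStr
    case _: Num => KindNum
    case _      => KindOther
  }

  // The brittle check: Str and AsciiSafeStr have different runtime classes.
  def exactClassUniform(vs: Seq[Value]): Boolean =
    vs.isEmpty || vs.forall(_.getClass == vs.head.getClass)

  // Kind-based validation with a plain loop instead of a closure.
  def kindUniform(vs: IndexedSeq[Value]): Boolean = {
    if (vs.isEmpty) true
    else {
      val kind = kindOf(vs(0))
      var i = 1
      var ok = true
      while (ok && i < vs.length) {
        ok = kindOf(vs(i)) == kind
        i += 1
      }
      ok
    }
  }

  // Dispatch to a per-kind sort path after validating that kinds agree.
  def sortValues(vs: IndexedSeq[Value]): IndexedSeq[Value] =
    if (vs.isEmpty) vs
    else {
      require(kindUniform(vs), "cannot sort values of different kinds")
      kindOf(vs(0)) match {
        case KindStr => vs.sortBy(_.asInstanceOf[Str].value)
        case KindNum =>
          vs.sortWith { (a, b) =>
            java.lang.Double.compare(a.asInstanceOf[Num].value, b.asInstanceOf[Num].value) < 0
          }
        case _ => throw new IllegalArgumentException("unsupported kind in this sketch")
      }
    }
}
```

With a mixed input such as `Vector(new Str("b"), new AsciiSafeStr("a"))`, `exactClassUniform` returns false while `kindUniform` returns true, which is the situation the regression test targets.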

Benchmark Results:
This is a correctness/enabler PR, not a claimed performance win. On the Scala Native 0.5.12 stacked profiling branch, kube-prometheus output stayed byte-identical; sort-kind-only A/B was neutral: forward 194.3 ± 14.6 ms clean vs 187.9 ± 12.3 ms candidate, reversed 170.7 ± 2.7 ms clean vs 172.0 ± 1.6 ms candidate.

Analysis:
The exact-class check was brittle because optimized runtime value subclasses must remain semantically indistinguishable from their base Jsonnet type. The replacement also avoids closure-based forall(_.getClass == keyType) checks in this path, but the important result is that future AsciiSafeStr propagation can be evaluated without breaking std.sort.

References:

Discovered while evaluating broader short-string AsciiSafeStr propagation for the Scala Native 0.5.12 performance work.

Result:

./mill --no-server --ticker false --color false __.reformat passed.
./mill --no-server --ticker false --color false -j 1 __.test passed 444/444.

Motivation:
AsciiSafeStr is a Val.Str subclass used by renderer optimizations, but std.sort compared exact runtime classes. Sorting arrays containing ASCII-safe strings could fail even though Jsonnet semantics treat them as strings.

Modification:
Classify sortable values by Jsonnet runtime kind instead of exact JVM/Native class, preserving existing fast paths for strings, numbers, arrays, booleans, and objects. Add a regression covering all-AsciiSafeStr, mixed Str/AsciiSafeStr, and key-function-produced AsciiSafeStr sorting.

Result:
std.sort now treats AsciiSafeStr values as strings without changing Jsonnet semantics. Full cross-platform validation will be run before opening the draft PR.

Motivation:
Scala Native kube-prometheus rendering still showed write/output overhead after the renderer and strict JSON import stack. `NativeOutputStream` already bypasses the JVM-compatible `PrintStream` path by writing through C `fwrite`, but stdout still used the platform default stdio buffering.

Key Design Decision:
Keep this optimization Native-only and local to stdout buffering. Instead of changing renderer flush thresholds or `ByteBuilder` behavior globally, configure the C stdio stream with full buffering before any `NativeOutputStream` writes occur. Passing a null buffer lets libc own the buffer lifetime, so the Scala object does not need to retain native memory.

Modification:

- Configure `NativeOutputStream` with `setvbuf(file, null, _IOFBF, 256 KiB)` during construction.
- Leave JVM, JS, YAML, expect-string, and file-output code paths unchanged.
- Preserve existing explicit `flush()` behavior for trailing newline and close handling.

Benchmark Results:
Workload: `jrsonnet/tests/realworld/entry-kube-prometheus.jsonnet -J vendor`
Candidate was benchmarked on the Scala Native 0.5.12 stacked exploration branch against clean `cf7b8af9`.

| Order | Clean | Candidate | Result |
| --- | ---: | ---: | ---: |
| Forward mean | 218.848 ms | 188.528 ms | -13.9% |
| Forward median | 215.517 ms | 187.368 ms | -13.1% |
| Reverse mean | 224.045 ms | 183.701 ms | -18.0% |
| Reverse median | 224.281 ms | 182.914 ms | -18.4% |

Output equality matched by `cmp`.

Validation:

- `./mill --no-server --ticker false --color false __.reformat`
- `./mill --no-server --ticker false --color false -j 1 __.test` (444 passed, 0 failed)
- `./mill --no-server --ticker false --color false bench.runRegressions`

Analysis:
This is a lower-risk write/flush optimization than increasing `ByteBuilder` thresholds: it does not alter rendering order, JSON escaping, object materialization, or JVM/JS behavior. It only changes the buffering policy of the Native stdout `FILE*`, and explicit flushes still happen at the same public boundaries.

References:

- Scala Native 0.5.12 migration PR: #867
- Related performance stack context: #863, #864, #865, #866, #868

Result:
Native stdout rendering writes fewer/smoother buffered chunks for large JSON output while preserving byte-identical output and the existing flush contract.

Motivation:
The Native stdout buffering follow-up showed that downstream buffering can materially reduce large-output write overhead. JSON `-o` output still sent `ByteRenderer` chunks directly to the file output stream, relying only on `ByteBuilder`'s internal flush threshold.

Key Design Decision:
Keep the change local to the JSON output-file fast path. Rather than changing `ByteBuilder` thresholds globally, wrap the file output stream in a `BufferedOutputStream` with the same 256 KiB output buffer size used for the Native stdout buffering follow-up. YAML, expect-string, stdout, and renderer semantics stay unchanged.

Modification:

- Add `OutputBufferSize = 256 * 1024` in `SjsonnetMainBase`.
- Wrap JSON output-file `ByteRenderer` targets in `BufferedOutputStream(out, OutputBufferSize)`.
- Flush the buffered stream at the same completion boundary before closing the underlying file output stream.

Benchmark Results:
Workload: `jrsonnet/tests/realworld/entry-kube-prometheus.jsonnet -J vendor -o /tmp/fileout-*.json`
Candidate was benchmarked on the Scala Native 0.5.12 stacked exploration branch after the Native stdout buffering commit.

| Order | Clean | Candidate | Result |
| --- | ---: | ---: | ---: |
| Forward mean | 217.372 ms | 205.062 ms | -5.7% |
| Forward median | 196.625 ms | 183.491 ms | -6.7% |
| Reverse mean | 210.517 ms | 177.174 ms | -15.8% |
| Reverse median | 193.394 ms | 175.878 ms | -9.1% |

Output equality matched by `cmp`.

Validation:

- `./mill --no-server --ticker false --color false __.reformat`
- `./mill --no-server --ticker false --color false -j 1 __.test` (444 passed, 0 failed)
- `./mill --no-server --ticker false --color false bench.runRegressions`

Analysis:
This preserves the existing rendering pipeline and only changes the buffering layer for file output. It avoids global `ByteBuilder` threshold changes, keeps stdout behavior separate, and does not affect YAML or expect-string paths.

References:

- Native stdout buffering PR: #869
- Scala Native 0.5.12 migration PR: #867
- Related performance stack context: #863, #864, #865, #866, #868

Result:
Large JSON file output writes are buffered more effectively while preserving byte-identical output and the existing flush/close contract.
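
The file-output buffering described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `SjsonnetMainBase`: the `BufferedOutputSketch` object, the `CountingStream` helper, and the `writeChunks` function are hypothetical, and renderer chunks are modelled as plain byte arrays. Only `java.io.BufferedOutputStream` and the 256 KiB size come from the description above.

```scala
import java.io.{BufferedOutputStream, ByteArrayOutputStream, OutputStream}

object BufferedOutputSketch {
  // Same value the commit message names; where it lives in the real code is not shown here.
  final val OutputBufferSize = 256 * 1024

  // Hypothetical helper: counts how many write calls reach the underlying sink.
  final class CountingStream(underlying: OutputStream) extends OutputStream {
    var writes = 0
    override def write(b: Int): Unit = { writes += 1; underlying.write(b) }
    override def write(b: Array[Byte], off: Int, len: Int): Unit = {
      writes += 1
      underlying.write(b, off, len)
    }
    override def flush(): Unit = underlying.flush()
    override def close(): Unit = underlying.close()
  }

  // Write renderer-style chunks through a buffer, flush once at the
  // completion boundary, then close the underlying stream.
  def writeChunks(out: OutputStream, chunks: Iterator[Array[Byte]]): Unit = {
    val buffered = new BufferedOutputStream(out, OutputBufferSize)
    try {
      chunks.foreach(c => buffered.write(c, 0, c.length))
      buffered.flush()
    } finally out.close()
  }
}
```

Writing one hundred 8 KiB chunks through `writeChunks` into a `CountingStream` reaches the sink in far fewer than one hundred write calls, while the bytes received are unchanged.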

He-Pin marked this pull request as ready for review May 23, 2026 13:33

This was referenced May 23, 2026

perf: buffer native stdout writes #869

Merged

perf: buffer json file output #870

Merged

stephenamar-db merged commit 438777c into databricks:master May 27, 2026
5 checks passed

stephenamar-db merged 1 commit into databricks:master from He-Pin:fix/sort-asciisafe-kind
