Kafka Connect: Precompute UUID-as-bytes flag in RecordConverter #16654

Open
wombatu-kun wants to merge 1 commit into apache:main from wombatu-kun:kafka-connect-precompute-uuid-write-format

@wombatu-kun
Contributor

RecordConverter.convertUUID recomputed FileFormat.PARQUET.name().toLowerCase(Locale.ROOT).equals(config.writeProps().get(DEFAULT_FILE_FORMAT)) for every UUID-typed value. The write file format is fixed for the converter's lifetime (writeProps is set once on the config), so this boolean is constant. Even so, each call allocated a fresh "parquet" String via name().toLowerCase(Locale.ROOT), then did a map lookup and an equals.

This resolves the flag once in the constructor (writeUuidAsBytes), reducing the format check in convertUUID to a field read. Behavior is unchanged: the same 16-byte representation is returned for Parquet and the same UUID otherwise.
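As a rough illustration of the shape of this change, here is a minimal, self-contained sketch. It is not the actual RecordConverter code: the class name UuidConverterSketch, the local FileFormat enum, the "write.format.default" key used for DEFAULT_FILE_FORMAT, and the big-endian 16-byte layout are all assumptions made so the example compiles on its own.

```java
import java.nio.ByteBuffer;
import java.util.Locale;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for the converter, reduced to the UUID path.
class UuidConverterSketch {
  // Assumed value of the DEFAULT_FILE_FORMAT property key.
  static final String DEFAULT_FILE_FORMAT = "write.format.default";

  // Local stand-in for the file format enum, so the sketch has no dependencies.
  enum FileFormat { PARQUET, ORC, AVRO }

  // Resolved once: the write properties do not change after construction.
  private final boolean writeUuidAsBytes;

  UuidConverterSketch(Map<String, String> writeProps) {
    this.writeUuidAsBytes =
        FileFormat.PARQUET.name().toLowerCase(Locale.ROOT)
            .equals(writeProps.get(DEFAULT_FILE_FORMAT));
  }

  Object convertUUID(Object value) {
    UUID uuid;
    if (value instanceof String) {
      uuid = UUID.fromString((String) value);
    } else if (value instanceof UUID) {
      uuid = (UUID) value;
    } else {
      throw new IllegalArgumentException(
          "Cannot convert to UUID: " + value.getClass().getName());
    }
    // The per-value format check is now a plain field read.
    if (writeUuidAsBytes) {
      // Assumed layout: most significant bits first, big-endian, 16 bytes total.
      return ByteBuffer.allocate(16)
          .putLong(uuid.getMostSignificantBits())
          .putLong(uuid.getLeastSignificantBits())
          .array();
    }
    return uuid;
  }
}
```

With an empty property map the flag resolves to false, so the sketch returns the UUID itself. That matches the described behavior for non-Parquet formats.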

A throwaway A/B microbench over the whole convertUUID method (2M iterations x 9 trials, median; baseline mirrors the current inline expression, optimized uses the precomputed boolean) showed the per-value cost drop:

| input | format | before | after | faster |
| --- | --- | --- | --- | --- |
| String | parquet | 53.6 ns | 32.5 ns | 39% |
| String | orc | 46.1 ns | 26.1 ns | 43% |
| UUID | parquet | 32.8 ns | 5.9 ns | 82% |
| UUID | orc | 22.3 ns | 2.4 ns | 89% |

That is roughly 20-27 ns saved per UUID value, about 40% of the method's cost on String inputs (the common Kafka Connect case). The numbers are indicative wall-clock timings from a microbenchmark, not JMH results.
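For readers who want to try a similar comparison, the following is a minimal sketch of a median-of-trials wall-clock harness. It is not the harness used for the numbers above, and it times only the flag expression rather than the whole convertUUID method. The class name UuidFlagBenchSketch, the local FileFormat enum, and the "write.format.default" key are assumptions for illustration. Like any non-JMH loop, it is exposed to JIT and warm-up effects, so treat its output as indicative only.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical A/B timing harness; names here are illustrative only.
class UuidFlagBenchSketch {
  enum FileFormat { PARQUET, ORC }

  // Assumed value of the DEFAULT_FILE_FORMAT property key.
  static final String DEFAULT_FILE_FORMAT = "write.format.default";

  // Accumulates results so the JIT cannot discard the timed work entirely.
  static int sink;

  // Baseline: the inline expression evaluated for every value.
  static boolean baseline(Map<String, String> props) {
    return FileFormat.PARQUET.name().toLowerCase(Locale.ROOT)
        .equals(props.get(DEFAULT_FILE_FORMAT));
  }

  // Runs `trials` timed loops and returns the median nanoseconds per call.
  static double medianNanosPerOp(BooleanSupplier op, int iterations, int trials) {
    double[] perOp = new double[trials];
    for (int t = 0; t < trials; t++) {
      int hits = 0;
      long start = System.nanoTime();
      for (int i = 0; i < iterations; i++) {
        if (op.getAsBoolean()) {
          hits++;
        }
      }
      long elapsed = System.nanoTime() - start;
      sink += hits;
      perOp[t] = (double) elapsed / iterations;
    }
    Arrays.sort(perOp);
    return perOp[trials / 2];
  }

  public static void main(String[] args) {
    Map<String, String> props = Map.of(DEFAULT_FILE_FORMAT, "parquet");
    boolean precomputed = baseline(props); // resolved once, as in the constructor

    double before = medianNanosPerOp(() -> baseline(props), 2_000_000, 9);
    double after = medianNanosPerOp(() -> precomputed, 2_000_000, 9);
    System.out.printf("inline: %.1f ns/op, precomputed: %.1f ns/op%n", before, after);
  }
}
```

The median is used instead of the mean so that a single trial disturbed by GC or compilation does not skew the result.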

Existing TestRecordConverter covers the conversion (including testUUIDConversionWithParquet); its mock now defaults writeProps() to an empty map to mirror production, where it is never null.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>