perf: cache offsetBufferAddress in CometPlainVector for variable-width vectors#4364
perf: cache offsetBufferAddress in CometPlainVector for variable-width vectors#43640lai0 wants to merge 5 commits into
Conversation
| return UTF8String.fromString(convertToUuid(result).toString()); | ||
| } | ||
| } else { | ||
| throw new RuntimeException("Unsupported UTF8 vector type: " + valueVector.getName()); |
There was a problem hiding this comment.
Is it possible to write a test for this error branch?
There was a problem hiding this comment.
Sure, Thanks for review. I'll add this in next commit.
There was a problem hiding this comment.
I looked into this and I don't think this branch is reachable through the real construction path.
LargeVarCharVector is rejected in the CometDecodedVector constructor via fromArrowType(LargeUtf8), before getUTF8String can be called.
Supported string vectors use VarCharVector, so they take the offsetBufferAddress != -1 branch. So I don't think we should add a test for this branch.
…or.java Co-authored-by: Andy Grove <agrove@apache.org>
|
I added a small focused microbenchmark for the On my local arm64 macOS machine with OpenJDK 17.0.18, using 1M rows, 3 warmup iterations, and 5 measured rounds:
This shows about a 10x improvement on this tight loop. The offset buffer address is stable for the lifetime of the vector, so caching it is safe. This is a simple timing benchmark rather than full JMH, but it directly measures the targeted path. |
mbutrovich
left a comment
There was a problem hiding this comment.
Minor nits, thanks for picking up this issue @0lai0!
| return UTF8String.fromString(convertToUuid(result).toString()); | ||
| } | ||
| } else { | ||
| throw new RuntimeException("Unsupported UTF8 vector type: " + valueVector.getName()); |
There was a problem hiding this comment.
Is this branch reachable in practice? NullVector is already filtered by isNullAt because valueBufferAddress == -1, so I am trying to think which ValueVector subclass would actually land here. If it is truly unreachable, IllegalStateException reads as more idiomatic than RuntimeException.
| public class CometPlainVector extends CometDecodedVector { | ||
| private final long valueBufferAddress; | ||
| private final long offsetBufferAddress; | ||
| private final boolean isBaseFixedWidthVector; |
There was a problem hiding this comment.
Consider a private final boolean isVariableWidthVector alongside isBaseFixedWidthVector instead of using offsetBufferAddress != -1 as the dispatch sentinel. The -1 pattern matches the existing valueBufferAddress precedent, so this is purely taste.
Alternatively, instead of an ever growing list of booleans, can we do an enum (maybe that's a bigger refactor and out of scope)?
Which issue does this PR close?
Closes #4280
Rationale for this change
CometPlainVectoralready caches the value buffer address, but variable-width reads still fetched the offset buffer address on everygetUTF8StringandgetBinarycall. Caching the offset buffer address avoids repeated Arrow buffer lookups on hot per-row paths.What changes are included in this PR?
offsetBufferAddressinCometPlainVectorfor variable-width vectorsgetUTF8StringandgetBinaryHow are these changes tested?