PECOBLR-1121 Tests to validate Arrow patch across JVM versions.#1144
PECOBLR-1121 Tests to validate Arrow patch across JVM versions.#1144tejassp-db wants to merge 33 commits intoPECOBLR-1121/arrow-patch/stack-0from
Conversation
Added tests to validate Arrow patch code paths. Added Maven profiles to validate the behaviour across JVM versions and with/without "--add-opens=java.base/java.nio=ALL-UNNAMED" JVM arguments. By default, JVM version 11 is assumed. To use other JVM versions, the toolchain needs to be setup to point to the correct Java versions on the local machine in .m2/toolchains.xml.
Add tests to handle all data types supported by Arrow.
Add tests for Boolean, Null, Fixed size list, UTF-8 view, Binary view, list view, large list view types.
Remove default profile of JDK 11. Do not fail on Github actions.
Test validates the patched allocator properly releases memory by repeatedly parsing Arrow streams (1000 iterations). Designed to run with constrained heap (-Xmx100m) to verify no resource leaks or OOM errors occur under GC pressure.
Fix memory leaks and parse lesser rows to be within the test heap space.
| * Test the patched allocator does not put the JVM into GC pressure and cause it to OOM | ||
| * (OutOfMemoryError). | ||
| * | ||
| * TODO - add a mvn profile to run this test with low max heap size -Xmx100m. |
There was a problem hiding this comment.
what would be current behavior withot using this mvn profile? Will this pick higher memory?
There was a problem hiding this comment.
Added a profile for this in a later PR stack-6.
| <artifactId>maven-surefire-plugin</artifactId> | ||
| <configuration> | ||
| <!-- JaCoCo args + module opens --> | ||
| <argLine> |
There was a problem hiding this comment.
Arrow behaves fine with JDK 11, so I have not added a profile for it. Also JDK 11 is the default profile.
Test Arrow patch with different allocator types.
Test only DatabricksBufferAllocator with Arrow NIO disabled.
| /** Test all Arrow supported data types can be read and written with the patched Arrow classes. */ | ||
| @Tag("Jvm17PlusAndArrowToNioReflectionDisabled") | ||
| @EnabledOnJre({JRE.JAVA_17, JRE.JAVA_21}) | ||
| public class DatabricksArrowPatchReaderWriterTest { |
There was a problem hiding this comment.
Can we write this class in a modular way for maintainability?
- Extract base class for DataTester implementations:
abstract class BaseDataTester implements DataTester {
protected void assertNullAtEvenIndices(ValueVector vector, int rowCount,
java.util.function.Function<Integer, Object> expectedValueFn) {
for (int i = 0; i < rowCount; i++) {
if (i % 2 == 0) {
assertTrue(vector.isNull(i));
} else {
assertEquals(expectedValueFn.apply(i), vector.getObject(i));
}
}
}
}
- Split into category-specific test files:
- ArrowNumericTypesTest.java - integers, floats, decimals
- ArrowTemporalTypesTest.java - timestamps, dates, intervals
- ArrowBinaryStringTypesTest.java - binary, UTF-8, views
- ArrowComplexTypesTest.java - list, struct, map, union
Also minor typo date → data appears 20+ times:
byte[] date = writeData(...); // should be "data"
Not blocking, but would improve long-term maintainability.
There was a problem hiding this comment.
I split it into smaller clasess . Also renamed date -> data.
| * manager type when the creation of {@code RootAllocator} is not possible in the current JVM. | ||
| * | ||
| * <p>This test is in a separate class to ensure it runs in a fresh JVM with the system property set | ||
| * before Arrow's static initialization. |
There was a problem hiding this comment.
are we sure this is true? I think maven surefire defaults to reusing the same jvm fork: https://maven.apache.org/surefire/maven-surefire-plugin/test-mojo.html#reuseForks
There was a problem hiding this comment.
Set it up in Github actions (PR - #1180) to run it with these args -DforkCount=1 -DreuseForks=false
There was a problem hiding this comment.
can we not add these to the pom?
There was a problem hiding this comment.
I didn't find a way to do it for a single test class.
| Throwable throwable = null; | ||
| try { | ||
| RootAllocator allocator = new RootAllocator(); | ||
| allocator.close(); // Unreachable code. |
There was a problem hiding this comment.
I think similar to the first PR we should try to write here?
There was a problem hiding this comment.
did we evaluate what is the overhead to generate these files when running the test itself? If it is too much overhead, can we add a readme on how these are generated.
There was a problem hiding this comment.
I have a class to generate these files. Should we check in the generator to the code base as well?
There was a problem hiding this comment.
I think we should. Even if do not generate on every test run, it would be useful to retain the generator.
Use Arrow chunk file with all supported Arrow types.
Description
Added tests to validate Arrow patch code paths. Added Maven profiles to validate the behaviour across JVM versions and with/without "--add-opens=java.base/java.nio=ALL-UNNAMED" JVM arguments.
By default, JVM version 11 is assumed. To use other JVM versions, the toolchain needs to be setup to point to the correct Java versions on the local machine in .m2/toolchains.xml.
Testing
Tests to validate Arrow patch behaviour.
Additional Notes to the Reviewer
Stacked PR.