Draft
81 commits
8fdc05d
contrib(delta): scaffold typed-proto dispatch + skeleton crate
schenksj May 19, 2026
c41e915
contrib(delta): port native implementation from PR2 (minus SPI)
schenksj May 19, 2026
381282d
contrib(delta): port reflection-only Scala files + dev scripts
schenksj May 19, 2026
91e1d5c
contrib(delta): port SPI-touching Scala files + Maven profile + refle…
schenksj May 19, 2026
93f0f54
fix(shuffle): get_string matches Spark's UTF8String no-validate seman…
schenksj May 19, 2026
53fb5c0
contrib(delta): wire CometExecRule serde dispatch
schenksj May 19, 2026
1ee31b2
contrib(delta): wire per-partition DeltaPlanDataInjector
schenksj May 19, 2026
53f6cb4
contrib(delta): scalastyle fixes for DeltaIntegration + operators
schenksj May 19, 2026
de9e0d3
fix(error): wrap native Parquet errors as FAILED_READ_FILE.NO_HINT wi…
schenksj May 19, 2026
effe5f7
fix(contrib-delta) P7h: decline native scan for unsupported FS schemes
schenksj May 19, 2026
9096957
fix(contrib-delta) P7i: cache DeltaEngine per (scheme,authority,config)
schenksj May 19, 2026
56c2b01
fix(serde): decline CreateArray with mismatched child data types
schenksj May 19, 2026
fea28d7
perf(contrib-delta) P7j: pre-parse session TZ and key injectors by op…
schenksj May 19, 2026
7e9249f
perf(contrib-delta) P7k: cache resolved Method handle in DeltaIntegra…
schenksj May 19, 2026
a805f81
perf(contrib-delta) P7l: hoist CometScanTypeChecker out of per-scan loop
schenksj May 19, 2026
e346776
perf(contrib-delta) P7m: O(1) partition-value lookup in build_delta_p…
schenksj May 19, 2026
ed0d8ac
fix(error): thread file paths to FAILED_READ_FILE.NO_HINT wrapping
schenksj May 19, 2026
43768c1
fix(contrib-delta) P7n: review findings -- InputFileBlockHolder hook …
schenksj May 19, 2026
3005d6e
chore: spotless:apply (no behavioral change)
schenksj May 19, 2026
6ba81b3
docs(serde): point CometCreateArray decline at upstream tracking issue
schenksj May 19, 2026
702ddd1
feat(contrib-delta) P7o: resolve S3A credential chain Scala-side for …
schenksj May 19, 2026
e0c0390
docs(contrib-delta): document why checkLatestSchemaOnRead guard is lo…
schenksj May 19, 2026
7ace165
feat(contrib-delta) P7p: wire Delta column-mapping `id` mode via parq…
schenksj May 19, 2026
018914f
docs(contrib-delta): document remaining fallback gates as verified-no…
schenksj May 20, 2026
2cb9188
feat(contrib-delta) P7q WIP: native exec for Delta synthetic columns …
schenksj May 20, 2026
c53e724
feat(contrib-delta) P7r: wire Scala side for native Delta synthetic c…
schenksj May 20, 2026
ee9f9e4
feat(contrib-delta) P7s: unblock general Parquet field-ID matching gate
schenksj May 20, 2026
8e21ff5
feat(contrib-delta) P7t: native row-tracking synthesis (row_id + row_…
schenksj May 20, 2026
d602225
feat(contrib-delta) P7u: unblock outputHasIsRowDeleted DV-fallback path
schenksj May 20, 2026
da1096d
feat(contrib-delta) P7v: unblock TahoeBatchFileIndex DV fallback
schenksj May 20, 2026
d33cb5a
feat(contrib-delta) P7w: unblock enableRowTracking=false row-id queries
schenksj May 20, 2026
0b275f9
feat(contrib-delta) P7x: unblock synthetic-column-not-suffix decline …
schenksj May 20, 2026
fe6b80e
feat(contrib-delta) P7y: drop checkLatestSchemaOnRead=false gate
schenksj May 20, 2026
2d13a14
fix(contrib-delta) P7z: address code-review findings on gate-unblock …
schenksj May 20, 2026
f52b783
docs(contrib-delta): comprehensive design documentation under contrib…
schenksj May 20, 2026
7fa3754
docs(contrib-delta): proofreading corrections against actual source
schenksj May 20, 2026
6d386a4
docs(contrib-delta): add Apache license headers to design docs
schenksj May 20, 2026
fcaf82b
docs(contrib-delta): wire cross-doc links and add prev/next/index nav
schenksj May 20, 2026
c41b18f
test(contrib-delta): port Phase-1 test harness + inline Rust unit tests
schenksj May 20, 2026
6333cd6
test(contrib-delta): inline tests for predicate.rs and planner.rs
schenksj May 20, 2026
15fb272
test(contrib-delta): crate-root end-to-end integration test
schenksj May 20, 2026
24ba7fe
test(contrib-delta): port Phase-1 Scala suites (Native + ColumnMapping)
schenksj May 20, 2026
57a4d8f
test(contrib-delta): in-progress wiring to get Scala suites running u…
schenksj May 20, 2026
b6a446d
fix(contrib-delta): DeltaIntegration reflection bridge was inert in p…
schenksj May 20, 2026
95f7c3c
fix(comet): contrib leaf scans recognized as input boundaries
schenksj May 20, 2026
e88eea0
test(contrib-delta): finalize CometDeltaTestBase wiring after debug pass
schenksj May 20, 2026
9ed147c
fix(contrib-delta): PlanDataInjector reflective lookup of contrib inj…
schenksj May 20, 2026
ae373b4
fix(comet): CometScanWithPlanData trait so contrib scans surface plan…
schenksj May 20, 2026
c992a8d
test(contrib-delta): add CometDeltaFeaturesSuite + assertNativePlanCo…
schenksj May 20, 2026
0d75619
test(contrib-delta): reinforce NativeSuite tests with explicit plan-s…
schenksj May 20, 2026
6304170
style(comet): spotless:apply for CometScanWithPlanData trait match
schenksj May 20, 2026
9c14e61
test(contrib-delta): drop spark.sql.adaptive.forceApply
schenksj May 20, 2026
e2ae4c6
fix(contrib-delta): accept Delta scans by relation.location when file…
schenksj May 20, 2026
680079e
fix(contrib-delta): findAndStripDeltaScanBelow accepts Delta batch Fi…
schenksj May 21, 2026
0cffcb0
test(contrib-delta): tighten synthetic-column test (require row track…
schenksj May 21, 2026
01b30d8
test(contrib-delta): cleanup CometDeltaTestBase comment after forceAp…
schenksj May 21, 2026
39078d8
fix(contrib-delta): propagate logicalLink from source FileSourceScanExec
schenksj May 21, 2026
009df93
fix(contrib-delta): metadata-col bailout now happens INSIDE transform…
schenksj May 21, 2026
7de663c
fix(contrib-delta): defensive projection-vector builder + tmp_metadat…
schenksj May 21, 2026
67b1279
feat(contrib-delta): row_index emit with `_tmp_metadata_row_index` alias
schenksj May 21, 2026
26e3bfb
feat(contrib-delta): native synthesis of Spark `_metadata.*` virtual …
schenksj May 21, 2026
6581344
fix(contrib-delta): emit row_index as Int64 + propagate alias in emit…
schenksj May 21, 2026
f98b085
style(contrib-delta): spotless fixes on rules
schenksj May 21, 2026
f9022fc
fix(contrib-delta): include scan.output _metadata.* cols in wrapped o…
schenksj May 21, 2026
9baf297
fix(contrib-delta): native synthesis of base_row_id and materialised …
schenksj May 21, 2026
4826bb1
fix(contrib-delta): chain DV filter on top of synthetic emission
schenksj May 21, 2026
d374275
fix(contrib-delta): force reorder when synthetic suffix not in canoni…
schenksj May 21, 2026
4dbf7c3
fix(contrib-delta): plumb per-partition file paths to InputFileBlockH…
schenksj May 21, 2026
80e4dfd
fix(contrib-delta): emit is_row_deleted as Int8 to match Delta's Byte…
schenksj May 21, 2026
0b26684
fix(contrib-delta): prefer live matchingFiles for PreparedDeltaFileIndex
schenksj May 21, 2026
830c979
fix(contrib-delta): snapshot refresh + adjust DV double-DELETE test t…
schenksj May 22, 2026
d693349
test(contrib-delta): comprehensive accelerator-coverage assertion suite
schenksj May 22, 2026
843db82
refactor(contrib-delta): relocate Delta-scan dispatcher to contrib + …
schenksj May 22, 2026
1d89796
test(contrib-delta): verification script for the build gate
schenksj May 22, 2026
25c991e
docs(contrib-delta): Spark 3.5 + Delta 3.3 support feasibility evalua…
schenksj May 22, 2026
6c13187
feat(contrib-delta): Spark 3.5 + Delta 3.3.2 support
schenksj May 22, 2026
879dd61
docs(contrib-delta): replace Spark 3.5 feasibility with shipped status
schenksj May 22, 2026
1d6cada
ci(contrib-delta): add Delta Lake Contrib Tests workflow
schenksj May 22, 2026
7ee6a70
fix: address code-review findings (UB, broad catches, cache leak, gat…
schenksj May 22, 2026
ca96f31
feat(contrib-delta): add Spark 4.0 + Delta 4.0.0 support (third Spark…
schenksj May 22, 2026
eb44fac
fix(contrib-delta): preserve oneTaskPerPartition across convertBlock(…
schenksj May 22, 2026
1 change: 1 addition & 0 deletions .claude/scheduled_tasks.lock
@@ -0,0 +1 @@
{"sessionId":"30c58e8c-25fc-4915-9010-bf68c560c7c1","pid":2718,"procStart":"Sun May 17 17:04:58 2026","acquiredAt":1779187628787}
213 changes: 213 additions & 0 deletions .github/workflows/delta_contrib_test.yml
@@ -0,0 +1,213 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# Runs the contrib-delta Scala test suite on all three supported (Spark, Delta)
# version pairs PLUS the build-gate verification script that proves default
# builds carry zero Delta surface. Modeled on iceberg_spark_test.yml.
#
# Three jobs:
#   1. build-native        -- builds libcomet.so once with --features
#                             contrib-delta, uploads as an artifact.
#   2. delta-contrib-scala -- matrix over (Spark 3.5 + Delta 3.3.2),
#                             (Spark 4.0 + Delta 4.0.0) and
#                             (Spark 4.1 + Delta 4.1.0), downloads the
#                             native lib, runs all four contrib Scala
#                             suites (49 tests total per matrix cell).
#   3. delta-build-gate    -- cheap independent job; runs
#                             dev/verify-contrib-delta-gate.sh which
#                             proves default cargo / mvn / dylib carry
#                             zero Delta surface. Runs in parallel.

name: Delta Lake Contrib Tests

concurrency:
  group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
  cancel-in-progress: true

on:
  push:
    branches:
      - main
    paths-ignore:
      - "benchmarks/**"
      - "doc/**"
      - "docs/**"
      - "**.md"
      - "dev/changelog/*.md"
      - "native/core/benches/**"
      - "native/spark-expr/benches/**"
      - "spark/src/main/scala/org/apache/comet/GenerateDocs.scala"
      - "spark-integration/**"
  pull_request:
    paths-ignore:
      - "benchmarks/**"
      - "doc/**"
      - "docs/**"
      - "**.md"
      - "dev/changelog/*.md"
      - "native/core/benches/**"
      - "native/spark-expr/benches/**"
      - "spark/src/main/scala/org/apache/comet/GenerateDocs.scala"
      - "spark-integration/**"
  workflow_dispatch:

permissions:
  contents: read

env:
  RUST_VERSION: stable
  RUST_BACKTRACE: 1
  # Force GNU ld on Linux to match the rest of Comet's CI (rust-lld can't
  # resolve -ljvm against the Zulu JDK layout installed by setup-java).
  RUSTFLAGS: "-Clink-arg=-fuse-ld=bfd"

jobs:
  # Build libcomet ONCE with the contrib-delta feature and share it with all
  # matrix cells via an artifact upload/download. Mirrors iceberg_spark_test.yml.
  build-native:
    name: Build Native Library (contrib-delta)
    runs-on: ubuntu-24.04
    container:
      image: amd64/rust
    steps:
      - uses: actions/checkout@v6

      - name: Setup Rust & Java toolchain
        uses: ./.github/actions/setup-builder
        with:
          rust-version: ${{ env.RUST_VERSION }}
          jdk-version: 17

      - name: Restore Cargo cache
        uses: actions/cache/restore@v5
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            native/target
          key: ${{ runner.os }}-cargo-ci-contrib-delta-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml', 'contrib/delta/native/**/Cargo.toml') }}-${{ hashFiles('native/**/*.rs', 'contrib/delta/native/**/*.rs') }}
          restore-keys: |
            ${{ runner.os }}-cargo-ci-contrib-delta-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml', 'contrib/delta/native/**/Cargo.toml') }}-

      - name: Build native library with contrib-delta
        run: |
          cd native && cargo build --profile ci --features contrib-delta
        env:
          RUSTFLAGS: "-Ctarget-cpu=x86-64-v3 -Clink-arg=-fuse-ld=bfd"

      - name: Save Cargo cache
        uses: actions/cache/save@v5
        if: github.ref == 'refs/heads/main'
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            native/target
          key: ${{ runner.os }}-cargo-ci-contrib-delta-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml', 'contrib/delta/native/**/Cargo.toml') }}-${{ hashFiles('native/**/*.rs', 'contrib/delta/native/**/*.rs') }}

      - name: Upload native library
        uses: actions/upload-artifact@v7
        with:
          name: native-lib-contrib-delta
          path: native/target/ci/libcomet.so
          retention-days: 1

  # Run all four contrib Scala suites across all three (Spark, Delta) version
  # pairs. The matrix asserts feature parity: the same 49 tests must pass on
  # Spark 3.5 + Delta 3.3.2, Spark 4.0 + Delta 4.0.0 AND Spark 4.1 + Delta 4.1.0.
  delta-contrib-scala:
    needs: build-native
    strategy:
      matrix:
        include:
          - spark-version: { short: '3.5', full: '3.5.8' }
            delta-version: '3.3.2'
            scala-version: '2.13'
            java-version: 17
          - spark-version: { short: '4.0', full: '4.0.1' }
            delta-version: '4.0.0'
            scala-version: '2.13'
            java-version: 17
          - spark-version: { short: '4.1', full: '4.1.1' }
            delta-version: '4.1.0'
            scala-version: '2.13'
            java-version: 17
      fail-fast: false
    name: delta-contrib/spark-${{ matrix.spark-version.full }}/delta-${{ matrix.delta-version }}
    runs-on: ubuntu-24.04
    container:
      image: amd64/rust
      env:
        SPARK_LOCAL_IP: localhost
    steps:
      - uses: actions/checkout@v6

      - name: Setup Rust & Java toolchain
        uses: ./.github/actions/setup-builder
        with:
          rust-version: ${{ env.RUST_VERSION }}
          jdk-version: ${{ matrix.java-version }}

      - name: Download native library
        uses: actions/download-artifact@v8
        with:
          name: native-lib-contrib-delta
          # Comet's test JVM looks under native/target/debug/ first then
          # /release/. The CI build profile lands under /ci/ so we place it
          # in /debug/ to satisfy the loader.
          path: native/target/debug/

      - name: Run contrib-delta Scala test suites
        run: |
          ./mvnw -Pspark-${{ matrix.spark-version.short }},contrib-delta \
            -pl spark -am test \
            -Dsuites='org.apache.comet.contrib.delta.CometDeltaFeaturesSuite,org.apache.comet.contrib.delta.CometDeltaNativeSuite,org.apache.comet.contrib.delta.CometDeltaColumnMappingSuite,org.apache.comet.contrib.delta.CometDeltaCoverageSuite' \
            -Djava.version=${{ matrix.java-version }} \
            -Dmaven.compiler.source=${{ matrix.java-version }} \
            -Dmaven.compiler.target=${{ matrix.java-version }} \
            -Dmaven.gitcommitid.skip

      - name: Upload surefire reports on failure
        if: failure()
        uses: actions/upload-artifact@v7
        with:
          name: surefire-reports-spark-${{ matrix.spark-version.short }}-delta-${{ matrix.delta-version }}
          path: spark/target/surefire-reports/
          retention-days: 5

  # Independent of build-native: a cheap proof that default builds carry
  # zero Delta surface. Runs the dev/verify-contrib-delta-gate.sh script
  # which does its own cargo + mvn invocations across both feature
  # combinations.
  delta-build-gate:
    name: Build-gate verification (default builds exclude Delta)
    runs-on: ubuntu-24.04
    container:
      image: amd64/rust
      env:
        SPARK_LOCAL_IP: localhost
    steps:
      - uses: actions/checkout@v6

      - name: Setup Rust & Java toolchain
        uses: ./.github/actions/setup-builder
        with:
          rust-version: ${{ env.RUST_VERSION }}
          jdk-version: 17

      - name: Run dev/verify-contrib-delta-gate.sh
        run: |
          dev/verify-contrib-delta-gate.sh
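
Review note on delta_contrib_test.yml: the build-native and delta-contrib-scala jobs together amount to "build libcomet.so with the ci profile, stage it under native/target/debug/ where the test JVM looks first, then run the four suites under one Spark profile". A minimal local sketch of a single matrix cell could look like the script below. The function names and the `DRY_RUN` switch are hypothetical (they are not part of this PR), the script assumes a Linux checkout with the current directory at the repository root, and it does not set the `RUSTFLAGS` values that CI uses.

```shell
#!/usr/bin/env bash
# Sketch only: reproduce one delta-contrib-scala matrix cell locally.
# Hypothetical helpers; assumes Linux and the repository root as cwd.
set -euo pipefail

# The four suites named in the workflow's -Dsuites argument.
SUITES='org.apache.comet.contrib.delta.CometDeltaFeaturesSuite,org.apache.comet.contrib.delta.CometDeltaNativeSuite,org.apache.comet.contrib.delta.CometDeltaColumnMappingSuite,org.apache.comet.contrib.delta.CometDeltaCoverageSuite'

# Print the commands for one cell, one per line.
#   $1: Spark short version from the matrix (3.5, 4.0 or 4.1)
#   $2: Java version (17 in every matrix cell)
contrib_delta_cell_commands() {
  local spark_short="$1" java_version="$2"
  # Same cargo invocation as the build-native job.
  echo "(cd native && cargo build --profile ci --features contrib-delta)"
  # CI downloads the artifact into native/target/debug/; copying the ci
  # profile output there is the local equivalent.
  echo "mkdir -p native/target/debug"
  echo "cp native/target/ci/libcomet.so native/target/debug/"
  # Same Maven invocation as the delta-contrib-scala job.
  echo "./mvnw -Pspark-${spark_short},contrib-delta -pl spark -am test -Dsuites='${SUITES}' -Djava.version=${java_version} -Dmaven.compiler.source=${java_version} -Dmaven.compiler.target=${java_version} -Dmaven.gitcommitid.skip"
}

# Dry run by default: only print the commands. DRY_RUN=0 executes them.
run_contrib_delta_cell() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    contrib_delta_cell_commands "$@"
  else
    bash -euo pipefail -c "$(contrib_delta_cell_commands "$@")"
  fi
}
```

For example, `run_contrib_delta_cell 4.1 17` prints the four commands for the Spark 4.1 cell, and `DRY_RUN=0 run_contrib_delta_cell 3.5 17` would execute the Spark 3.5 cell. Printing by default keeps the sketch safe to source, since a full native build plus a Maven test run is slow.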
1 change: 1 addition & 0 deletions .gitignore
@@ -27,3 +27,4 @@ output
docs/comet-*/
docs/build/
docs/temp/
pr-4366-body*.md