Skip to content

[spark] Fix column projection on log/upsert read path#3057

Open
fresh-borzoni wants to merge 1 commit intoapache:mainfrom
fresh-borzoni:fix/spark-projection-type-dependent
Open

[spark] Fix column projection on log/upsert read path#3057
fresh-borzoni wants to merge 1 commit intoapache:mainfrom
fresh-borzoni:fix/spark-projection-type-dependent

Conversation

@fresh-borzoni
Copy link
Copy Markdown
Contributor

FlussPartitionReader.rowType uses the full table schema, but scanners return projected rows with shifted ordinals. Type-dependent accessors in FlussAsSparkRow dispatch via rowType.getTypeAt(ordinal), so projected queries on TIMESTAMP/ARRAY/STRUCT columns get a ClassCastException. Override convertToSparkRow in both subclasses to use the projected row type, and stop double-projecting in SortMergeReader.

closes #3029

@fresh-borzoni fresh-borzoni force-pushed the fix/spark-projection-type-dependent branch from e3c79fb to 378beea Compare April 10, 2026 11:43
@fresh-borzoni
Copy link
Copy Markdown
Contributor Author

fresh-borzoni commented Apr 10, 2026

@luoyuxia @YannByron @Yohahaha PTAL 🙏

@fresh-borzoni fresh-borzoni force-pushed the fix/spark-projection-type-dependent branch from 378beea to 264a955 Compare April 10, 2026 11:57
@fresh-borzoni fresh-borzoni force-pushed the fix/spark-projection-type-dependent branch from 264a955 to 8ac8103 Compare April 10, 2026 13:47
Copy link
Copy Markdown
Contributor

@Yohahaha Yohahaha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall LGTM! left one comments, thank you!

Comment on lines +36 to +41
private lazy val projectedRowType = rowType.project(projection)

override protected def convertToSparkRow(flussRow: FlussInternalRow): InternalRow = {
DataConverter.toSparkInternalRow(flussRow, projectedRowType)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we move these common func to FlussPartitionReader?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[spark] Column projection broken on log/upsert read path for type-dependent accessors

2 participants