[FLINK-39392][table] Support conditional traits for PTFs by gustavodemorais · Pull Request #27886 · apache/flink

gustavodemorais · 2026-04-02T11:36:23Z

What is the purpose of the change

We'd like to make PTF traits configurable so we can have multiple versions depending on how they are configured by the user.

The first suggestion: introduce a declarative addTraitWhen API on StaticArgument that allows table argument traits to vary based on the SQL call context (e.g., whether PARTITION BY is provided or scalar argument values). This replaces the static

As the first use case, TO_CHANGELOG now supports optional PARTITION BY:
With PARTITION BY: set semantics (co-located parallel execution)

  ┌────────────────────┬─────────────────────────────────────┬─────────────────────────────────────┐
  │                    │   Row semantics (no PARTITION BY)   │  Set semantics (with PARTITION BY)  │
  ├────────────────────┼─────────────────────────────────────┼─────────────────────────────────────┤
  │ Distribution       │ Inherited from upstream             │ Hash by partition key               │
  ├────────────────────┼─────────────────────────────────────┼─────────────────────────────────────┤
  │ Key co-location    │ Preserved within operator chains    │ Guaranteed across operators         │
  ├────────────────────┼─────────────────────────────────────┼─────────────────────────────────────┤
  │ Insert-only source │ Direct chaining, no Exchange        │ Exchange(hash)                      │
  ├────────────────────┼─────────────────────────────────────┼─────────────────────────────────────┤
  │ Retract source     │ Direct chaining, no Exchange        │ Exchange(hash)                      │
  ├────────────────────┼─────────────────────────────────────┼─────────────────────────────────────┤
  │ Upsert source      │ ChangelogNormalize + Exchange(hash) │ ChangelogNormalize + Exchange(hash) │
  └────────────────────┴─────────────────────────────────────┴─────────────────────────────────────┘
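
For reference, the last three rows of the table can be written down as a small lookup. This is only a sketch of the expected plan shape, not planner code: `ToChangelogPlanSketch`, `SourceChangelog`, and `expectedNodes` are hypothetical names introduced here for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that mirrors the table above; not part of Flink.
final class ToChangelogPlanSketch {

    /** Changelog mode of the table passed to TO_CHANGELOG. */
    enum SourceChangelog { INSERT_ONLY, RETRACT, UPSERT }

    /** Returns the extra nodes the table lists in front of the PTF, in order. */
    static List<String> expectedNodes(SourceChangelog source, boolean hasPartitionBy) {
        List<String> nodes = new ArrayList<>();
        if (source == SourceChangelog.UPSERT) {
            // Upsert sources get the same shape with and without PARTITION BY.
            nodes.add("ChangelogNormalize");
            nodes.add("Exchange(hash)");
        } else if (hasPartitionBy) {
            // Set semantics: hash by partition key.
            nodes.add("Exchange(hash)");
        }
        // Otherwise: direct chaining, no Exchange (empty list).
        return nodes;
    }

    public static void main(String[] args) {
        for (SourceChangelog source : SourceChangelog.values()) {
            System.out.println(source + " without PARTITION BY: " + expectedNodes(source, false));
            System.out.println(source + " with PARTITION BY:    " + expectedNodes(source, true));
        }
    }
}
```

An empty list stands for "Direct chaining, no Exchange".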

Brief change log

Add conditional traits with TraitContext
Evaluate them where necessary
Use it for the first use case: TO_CHANGELOG with or without PARTITION BY
Adjust docs and comments

Verifying this change

Add semantic test

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): (no)
The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
The serializers: (no)
The runtime per-record code paths (performance sensitive): (no)
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
The S3 file system connector: (no)

Documentation

Does this pull request introduce a new feature? (yes)
If yes, how is the feature documented? (docs / JavaDocs)

flinkbot · 2026-04-02T11:45:34Z

CI report:

e8f776f Azure: FAILURE

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot run azure re-run the last Azure build

twalthr

Thank you for this PR @gustavodemorais. Overall I'm +1 for this change. However, we need to clearly define the boundaries: at which point static arguments are fully resolved and a trait condition no longer has any effect. Some locations currently look very hacky; we should take another look. We also need Table API support, which is not covered by this PR, at least not in tests.

twalthr · 2026-04-09T10:51:47Z

docs/content/docs/sql/reference/queries/changelog.md

 | Parameter    | Required | Description |
 |:-------------|:---------|:------------|
-| `input`      | Yes      | The input table. Must include `PARTITION BY` for parallel execution. Accepts insert-only, retract, and upsert tables. |
+| `input`      | Yes      | The input table. With `PARTITION BY`, rows with the same key are co-located for parallel execution. Without `PARTITION BY`, each row is processed independently. Accepts insert-only, retract, and upsert tables. For upsert tables, providing `PARTITION BY` is recommended for better performance. |


Suggested change

| `input` | Yes | The input table. With `PARTITION BY`, rows with the same key are co-located for parallel execution. Without `PARTITION BY`, each row is processed independently. Accepts insert-only, retract, and upsert tables. For upsert tables, providing `PARTITION BY` is recommended for better performance. |

| `input` | Yes | The input table. With `PARTITION BY`, rows with the same key are co-located for parallel execution (set semantics). Without `PARTITION BY`, each row is processed independently (row semantics). Accepts insert-only, retract, and upsert tables. |

twalthr · 2026-04-09T10:53:23Z

docs/content/docs/sql/reference/queries/changelog.md


+#### Without PARTITION BY
+
+```sql


Let's remove all PARTITION BY examples for now. A default TO_CHANGELOG example should always be without PARTITION BY. They do not provide any benefit but rather add exchange overhead.

twalthr · 2026-04-09T10:55:31Z

...-table-common/src/main/java/org/apache/flink/table/functions/BuiltInFunctionDefinitions.java

+                                            Row.class,
+                                            false,
+                                            EnumSet.of(
+                                                    StaticArgumentTrait.TABLE,


Just to make things explicit, I would add StaticArgumentTrait.ROW_SEMANTIC_TABLE here as well.

twalthr · 2026-04-09T10:57:42Z

.../flink-table-common/src/main/java/org/apache/flink/table/types/inference/StaticArgument.java

+     *         .addTraitWhen(not(hasPartitionBy()), ROW_SEMANTIC_TABLE);
+     * }</pre>
+     */
+    public StaticArgument addTraitWhen(


Suggested change

public StaticArgument addTraitWhen(

public StaticArgument withConditionalTrait(

I would swap the parameter order then: withConditionalTrait(trait, condition)
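
To make the proposed shape concrete, here is a minimal, self-contained sketch of `withConditionalTrait(trait, condition)`. It assumes heavily simplified stand-ins: `Trait`, `TraitContext`, `TraitCondition`, and `Argument` below are illustrative types written for this comment, not the classes in `flink-table-common`, and `resolveTraits` is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Illustrative stand-ins only; the real Flink types carry far more information.
final class ConditionalTraitSketch {

    /** Simplified trait enum; only three values are modeled. */
    enum Trait { TABLE, ROW_SEMANTIC_TABLE, SET_SEMANTIC_TABLE }

    /** Hypothetical call context; here it only knows whether PARTITION BY was given. */
    interface TraitContext {
        boolean hasPartitionBy();
    }

    /** A predicate over the call context. */
    interface TraitCondition {
        boolean test(TraitContext ctx);

        static TraitCondition hasPartitionBy() {
            return TraitContext::hasPartitionBy;
        }

        static TraitCondition not(TraitCondition inner) {
            return ctx -> !inner.test(ctx);
        }
    }

    /** Immutable argument declaration with unconditional and conditional traits. */
    static final class Argument {
        private final Set<Trait> baseTraits;
        private final List<ConditionalTrait> conditionalTraits;

        private Argument(Set<Trait> baseTraits, List<ConditionalTrait> conditionalTraits) {
            this.baseTraits = baseTraits;
            this.conditionalTraits = conditionalTraits;
        }

        static Argument table() {
            return new Argument(EnumSet.of(Trait.TABLE), new ArrayList<>());
        }

        /** Parameter order as suggested in the review: trait first, then condition. */
        Argument withConditionalTrait(Trait trait, TraitCondition condition) {
            List<ConditionalTrait> copy = new ArrayList<>(conditionalTraits);
            copy.add(new ConditionalTrait(trait, condition));
            return new Argument(baseTraits, copy);
        }

        /** Base traits plus every conditional trait whose condition holds for ctx. */
        Set<Trait> resolveTraits(TraitContext ctx) {
            Set<Trait> effective = EnumSet.copyOf(baseTraits);
            for (ConditionalTrait c : conditionalTraits) {
                if (c.condition.test(ctx)) {
                    effective.add(c.trait);
                }
            }
            return effective;
        }

        // Nested helper class kept at the bottom of the enclosing class.
        private static final class ConditionalTrait {
            private final Trait trait;
            private final TraitCondition condition;

            private ConditionalTrait(Trait trait, TraitCondition condition) {
                this.trait = trait;
                this.condition = condition;
            }
        }
    }

    public static void main(String[] args) {
        Argument input =
                Argument.table()
                        .withConditionalTrait(
                                Trait.SET_SEMANTIC_TABLE, TraitCondition.hasPartitionBy())
                        .withConditionalTrait(
                                Trait.ROW_SEMANTIC_TABLE,
                                TraitCondition.not(TraitCondition.hasPartitionBy()));
        System.out.println(input.resolveTraits(() -> true));
        System.out.println(input.resolveTraits(() -> false));
    }
}
```

With the trait first, the call reads as "with trait X, under condition Y", which matches the method name.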

twalthr · 2026-04-09T11:01:26Z

.../flink-table-common/src/main/java/org/apache/flink/table/types/inference/StaticArgument.java

+    private final List<ConditionalTrait> conditionalTraits;
+
+    /** A trait that is conditionally added based on a {@link TraitCondition}. */
+    private static final class ConditionalTrait implements Serializable {


nit: classes to the bottom of the file

Suggested change

private static final class ConditionalTrait implements Serializable {

private static final class ConditionalTrait {

twalthr · 2026-04-09T12:49:35Z

.../flink-table-common/src/main/java/org/apache/flink/table/types/inference/TraitCondition.java

+    }
+
+    /** True when the named boolean argument is provided and its value is {@code true}. */
+    static TraitCondition argIsTrue(final String name) {


Generalize the is-true and is-false checks to:

static <T> TraitCondition argIsEqualTo(T obj) {ctx.getScalarArgument(name, obj.getClass) == obj}
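
A slightly fuller sketch of that idea, under assumptions: `ScalarArgs`, `Condition`, and `MapArgs` are hypothetical stand-ins for the context and condition types, and the lookup is assumed to return an `Optional`. The argument names `flag` and `mode` are made up. Note that the comparison uses `equals` rather than `==`, since `==` on boxed or string values compares references.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins for illustration; not the Flink TraitContext/TraitCondition.
final class ArgEqualsSketch {

    /** Assumed context API: look up a literal scalar argument by name and type. */
    interface ScalarArgs {
        <T> Optional<T> getScalarArgument(String name, Class<T> clazz);
    }

    /** A condition over the scalar arguments of a call. */
    interface Condition {
        boolean test(ScalarArgs ctx);
    }

    /** True when the named argument is provided as a literal and equals the expected value. */
    static <T> Condition argIsEqualTo(String name, T expected) {
        return ctx -> {
            Optional<?> value = ctx.getScalarArgument(name, expected.getClass());
            return value.isPresent() && value.get().equals(expected);
        };
    }

    /** Map-backed context, used only to exercise the condition. */
    static final class MapArgs implements ScalarArgs {
        private final Map<String, Object> literals = new HashMap<>();

        MapArgs with(String name, Object value) {
            literals.put(name, value);
            return this;
        }

        @Override
        public <T> Optional<T> getScalarArgument(String name, Class<T> clazz) {
            Object value = literals.get(name);
            // Missing arguments and type mismatches both yield an empty result.
            return clazz.isInstance(value) ? Optional.of(clazz.cast(value)) : Optional.empty();
        }
    }

    public static void main(String[] args) {
        ScalarArgs ctx = new MapArgs().with("flag", true).with("mode", "upsert");
        System.out.println(argIsEqualTo("flag", true).test(ctx));
        System.out.println(argIsEqualTo("mode", "retract").test(ctx));
    }
}
```

With this in place, the is-true and is-false variants become `argIsEqualTo(name, true)` and `argIsEqualTo(name, false)`.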

twalthr · 2026-04-09T12:58:27Z

...va/org/apache/flink/table/planner/plan/nodes/exec/stream/StreamExecProcessTableFunction.java

        }

        final int timeColumn = inputTimeColumns.get(tableArgCall.getInputIndex());
+        final org.apache.flink.table.types.inference.TraitContext traitCtx =


pay attention to full imports, seems Claude loves to do this

Suggested change

final org.apache.flink.table.types.inference.TraitContext traitCtx =

final TraitContext traitCtx =

same comment as above. resolve the static arg as early as possible to not reconstruct TraitContext multiple times

twalthr · 2026-04-09T13:00:30Z

...pache/flink/table/planner/plan/nodes/physical/stream/StreamPhysicalProcessTableFunction.java

+                    .noneMatch(
+                            arg ->
+                                    arg.is(StaticArgumentTrait.SET_SEMANTIC_TABLE)
+                                            || arg.hasConditionalTrait(


we should do this on the actual resulting trait

twalthr · 2026-04-09T13:02:32Z

...pache/flink/table/planner/plan/nodes/physical/stream/StreamPhysicalProcessTableFunction.java

+                if (operand.getKind() == SqlKind.DEFAULT || !(operand instanceof RexLiteral)) {
+                    return Optional.empty();
+                }
+                return Optional.ofNullable(((RexLiteral) operand).getValueAs(clazz));


this is too simple, it should follow the same rules as CallContext does. Otherwise it won't be possible e.g. to get Instant.class or other literals.

twalthr · 2026-04-09T13:03:45Z

...e/flink/table/planner/plan/rules/physical/stream/StreamPhysicalProcessTableFunctionRule.java

+        final boolean hasPartitionBy = partitionKeys.length > 0;
+        final boolean reportedAsSet = tableCharacteristic.semantics == Semantics.SET;
+        final boolean setIsConditional =
+                staticArg.hasConditionalTrait(StaticArgumentTrait.SET_SEMANTIC_TABLE);


too fragile. determine the effective StaticArgument first and then execute this logic.
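
Roughly what I have in mind, as a sketch with simplified stand-in types: `EffectiveTraitsSketch`, `resolve`, `needsHashExchange`, and `exampleConditions` are hypothetical names, and conditions are reduced to predicates over the number of partition keys. Changelog normalization for upsert inputs is deliberately left out.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.IntPredicate;

// Simplified illustration of "resolve first, then decide"; not planner code.
final class EffectiveTraitsSketch {

    enum Trait { TABLE, ROW_SEMANTIC_TABLE, SET_SEMANTIC_TABLE }

    /** Example declaration: set semantics with PARTITION BY, row semantics without. */
    static Map<Trait, IntPredicate> exampleConditions() {
        Map<Trait, IntPredicate> conditional = new LinkedHashMap<>();
        conditional.put(Trait.SET_SEMANTIC_TABLE, keyCount -> keyCount > 0);
        conditional.put(Trait.ROW_SEMANTIC_TABLE, keyCount -> keyCount == 0);
        return conditional;
    }

    /** Step 1: fold conditional traits into a plain trait set, exactly once. */
    static Set<Trait> resolve(
            Set<Trait> declared, Map<Trait, IntPredicate> conditional, int partitionKeyCount) {
        Set<Trait> effective = EnumSet.noneOf(Trait.class);
        effective.addAll(declared);
        conditional.forEach(
                (trait, condition) -> {
                    if (condition.test(partitionKeyCount)) {
                        effective.add(trait);
                    }
                });
        return effective;
    }

    /** Step 2: downstream logic only inspects the resolved set, never the conditions. */
    static boolean needsHashExchange(Set<Trait> effective) {
        return effective.contains(Trait.SET_SEMANTIC_TABLE);
    }

    public static void main(String[] args) {
        Set<Trait> declared = EnumSet.of(Trait.TABLE);
        Set<Trait> withKeys = resolve(declared, exampleConditions(), 1);
        Set<Trait> withoutKeys = resolve(declared, exampleConditions(), 0);
        System.out.println(withKeys + " -> exchange: " + needsHashExchange(withKeys));
        System.out.println(withoutKeys + " -> exchange: " + needsHashExchange(withoutKeys));
    }
}
```

Once the effective set exists, checks like `hasConditionalTrait(...)` are no longer needed in the rule; a plain trait lookup is enough.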

gustavodemorais added 4 commits April 2, 2026 13:27

[FLINK-39392][table] Add declarative conditional traits for PTF table arguments

d62d1e4

[FLINK-39392][table] Evaluate conditional traits in planner and type inference

ba5b471

[FLINK-39392][table] Support optional PARTITION BY for TO_CHANGELOG

1a62621

[FLINK-39392][table] Add tests and documentation for TO_CHANGELOG row semantics

4acdc14

[FLINK-39392][table] Adjust row semantics distribution and improve documentation

e8f776f

gustavodemorais marked this pull request as ready for review April 8, 2026 14:46

twalthr reviewed Apr 9, 2026
