From f0d81ebf4b9036673cc424a0252bf652b4cb0354 Mon Sep 17 00:00:00 2001
From: Jim Dowling
Date: Thu, 21 May 2026 14:48:19 +0200
Subject: [PATCH 1/6] [HWORKS-2802] Document partitioned_by parameter on feature group creation
 https://hopsworks.atlassian.net/browse/HWORKS-2802
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add a section to docs/user_guides/fs/feature_group/create.md describing
the storage-engine-native partitioned_by parameter for Delta feature
groups. Covers:

- Usage example with create_feature_group / get_or_create_feature_group.
- The CREATE TABLE … USING DELTA … GENERATED ALWAYS AS … contract: the
  storage layer derives the partition columns; the user's dataframe
  never carries them.
- Validation rules: mutual exclusion with partition_key, requires
  event_time.
- Partition pruning table — Delta auto-derives partition predicates from
  the GENERATED expressions for hierarchical specs (year / year+month /
  year+month+day / year+month+day+hour), so
  `fg.read(start_time=..., end_time=...)` and
  `fg.filter(fg.event_time >= ...)` prune at the partition level.
  Non-hierarchical specs (e.g. ["month"], ["year","week"]) are valid but
  skip the auto-derivation — only direct predicates on the grain columns
  prune. Recommend hierarchical specs.
- Online feature store behavior: derived columns live offline-only by
  default; online_partition_columns=true opts into online
  materialization. Until the onlinefs consumer filter ships, the backend
  rejects partitioned_by + online_enabled=true with the default
  online_partition_columns=false. Document both workarounds.
- Hudi: partitioned_by + HUDI is rejected at creation; Hudi support is
  tracked under a separate follow-up ticket.

Signed-off-by: Jim Dowling
Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/user_guides/fs/feature_group/create.md | 54 +++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/docs/user_guides/fs/feature_group/create.md b/docs/user_guides/fs/feature_group/create.md
index c6db36f3ef..c7c6a91d0f 100644
--- a/docs/user_guides/fs/feature_group/create.md
+++ b/docs/user_guides/fs/feature_group/create.md
@@ -102,6 +102,60 @@ MaxDirectoryItemsExceededException - The directory item limit is exceeded: limit

 By using partitioning the system will write the feature data in different subdirectories, thus allowing you to write 10240 files per partition.

+##### Time-grain partitioning with `partitioned_by` (Delta only)
+
+When the partition columns are derived from the feature group's `event_time`, the Python client can hand the backend the desired time grains and let the storage engine generate the partition columns automatically.
+Pass `partitioned_by=[...]` with one or more grains drawn from `hour`, `day`, `week`, `month`, and `year`.
+
+```python
+fg = fs.get_or_create_feature_group(
+    name="transactions",
+    version=1,
+    primary_key=["tx_id"],
+    event_time="tx_ts",
+    partitioned_by=["year", "month", "day"],
+    time_travel_format="DELTA",
+)
+fg.insert(df)  # df does not need year/month/day — Delta derives them
+```
+
+The example above is equivalent to manually decomposing `tx_ts` into three columns and passing `partition_key=["year", "month", "day"]`.
+The backend creates the table via `CREATE TABLE … USING DELTA … GENERATED ALWAYS AS …`, so the derived columns live entirely inside the storage layer; the source dataframe never carries them.
+
+`partitioned_by` and `partition_key` are mutually exclusive.
+`partitioned_by` requires `event_time` to be set.
+
+###### Partition pruning
+
+Delta auto-derives partition predicates from the GENERATED expressions when the user filters on the source column.
+Filtering on `event_time` ranges therefore prunes partitions for free on hierarchical specs:
+
+| `partitioned_by` | Prunes on `event_time` range? | Prunes on `year` / `month` / `day` filter? |
+| --- | --- | --- |
+| `["year"]` | ✅ | ✅ |
+| `["year", "month"]` | ✅ | ✅ |
+| `["year", "month", "day"]` | ✅ | ✅ |
+| `["year", "month", "day", "hour"]` | ✅ | ✅ |
+| `["month"]` (no year) | ⚠️ no — month alone is ambiguous across years | ✅ filter on month works |
+| `["year", "week"]` | ⚠️ year only — week isn't directly derivable from a date range | ✅ both columns prune |
+| `["day"]` (no year/month) | ⚠️ no — day-of-month is ambiguous | ✅ filter on day works |
+
+Prefer hierarchical specs (`["year"]`, `["year", "month"]`, `["year", "month", "day"]`) — they line up with the typical batch-pipeline access pattern and prune naturally.
+
+###### Online feature store
+
+By default, the derived partition columns live only in the offline storage; the online feature store does not get them.
+Pass `online_partition_columns=True` to materialize them in the online row as well.
+
+While the online-store filter (the `onlinefs` consumer that drops `offline_only` columns from the RonDB write) is still pending, the backend rejects `partitioned_by` together with `online_enabled=true` and the default `online_partition_columns=false` to avoid writing the grain columns to RonDB by accident.
+The two workarounds: keep the feature group offline-only, or set `online_partition_columns=True` to materialize the grains online explicitly.
+
+###### Hudi
+
+`partitioned_by` on `time_travel_format="HUDI"` feature groups is not yet supported and the backend rejects it at creation.
+Hudi needs a different mechanism (a `CustomKeyGenerator` + server-side `Transformer`) and is tracked under a separate follow-up ticket.
+Until that lands, use `time_travel_format="DELTA"` to get time-grain partitioning, or partition Hudi groups explicitly via `partition_key=["year"]` with a `year` column the upstream pipeline computes.
+
 ##### Table format

 When you create a feature group, you can specify the table format you want to use to store the data in your feature group by setting the `time_travel_format` parameter.

From 6b0c36317e3d698c6f82eaee2b64952b6a4267ef Mon Sep 17 00:00:00 2001
From: Jim Dowling
Date: Sun, 31 May 2026 15:18:16 +0200
Subject: [PATCH 2/6] [HWORKS-2802] Update partitioned_by docs for the real-column design
 https://hopsworks.atlassian.net/browse/HWORKS-2802

The partitioned_by section described Delta GENERATED ALWAYS AS columns
and storage-engine-side derivation, which is no longer how it works.
Document the real design: the client derives the grain columns from
event_time and writes them as real partition columns, pruning works
natively on grain filters and via predicate translation on event_time
ranges. Correct the online-store note: online-enabled partitioned_by
feature groups are rejected entirely until HWORKS-2808, not only with
the default online_partition_columns.

Signed-off-by: Jim Dowling
Co-Authored-By: Claude Opus 4.8
---
 docs/user_guides/fs/feature_group/create.md | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/docs/user_guides/fs/feature_group/create.md b/docs/user_guides/fs/feature_group/create.md
index c7c6a91d0f..8197c9245f 100644
--- a/docs/user_guides/fs/feature_group/create.md
+++ b/docs/user_guides/fs/feature_group/create.md
@@ -104,8 +104,8 @@ By using partitioning the system will write the feature data in different subdir

 ##### Time-grain partitioning with `partitioned_by` (Delta only)

-When the partition columns are derived from the feature group's `event_time`, the Python client can hand the backend the desired time grains and let the storage engine generate the partition columns automatically.
-Pass `partitioned_by=[...]` with one or more grains drawn from `hour`, `day`, `week`, `month`, and `year`.
+When the partition columns are derived from the feature group's `event_time`, hand the backend the desired time grains with `partitioned_by=[...]` and the Python client derives the partition columns for you.
+Pass one or more grains drawn from `hour`, `day`, `week`, `month`, and `year`.

 ```python
 fg = fs.get_or_create_feature_group(
@@ -116,19 +116,20 @@ fg = fs.get_or_create_feature_group(
     partitioned_by=["year", "month", "day"],
     time_travel_format="DELTA",
 )
-fg.insert(df)  # df does not need year/month/day — Delta derives them
+fg.insert(df)  # df does not need year/month/day — the client derives them
 ```

 The example above is equivalent to manually decomposing `tx_ts` into three columns and passing `partition_key=["year", "month", "day"]`.
-The backend creates the table via `CREATE TABLE … USING DELTA … GENERATED ALWAYS AS …`, so the derived columns live entirely inside the storage layer; the source dataframe never carries them.
+The grain columns are ordinary materialized partition columns: the client computes them from `event_time` on each write and the backend registers them as partition columns through the normal table-creation path.
+The source dataframe does not need to carry them.

 `partitioned_by` and `partition_key` are mutually exclusive.
 `partitioned_by` requires `event_time` to be set.

 ###### Partition pruning

-Delta auto-derives partition predicates from the GENERATED expressions when the user filters on the source column.
-Filtering on `event_time` ranges therefore prunes partitions for free on hierarchical specs:
+The grain columns are real partition columns, so a filter on a grain column (for example `year == 2026`) prunes partitions natively.
+A filter on an `event_time` range is rewritten into equivalent grain-column predicates by the query layer, so it prunes too on hierarchical specs:

 | `partitioned_by` | Prunes on `event_time` range? | Prunes on `year` / `month` / `day` filter? |
 | --- | --- | --- |
@@ -144,11 +145,9 @@ Prefer hierarchical specs (`["year"]`, `["year", "month"]`, `["year", "month", "

 ###### Online feature store

-By default, the derived partition columns live only in the offline storage; the online feature store does not get them.
-Pass `online_partition_columns=True` to materialize them in the online row as well.
-
-While the online-store filter (the `onlinefs` consumer that drops `offline_only` columns from the RonDB write) is still pending, the backend rejects `partitioned_by` together with `online_enabled=true` and the default `online_partition_columns=false` to avoid writing the grain columns to RonDB by accident.
-The two workarounds: keep the feature group offline-only, or set `online_partition_columns=True` to materialize the grains online explicitly.
+Online-enabled feature groups do not yet support `partitioned_by`.
+The online ingestion path does not exclude the offline-only grain columns from the Kafka/Avro schema, nor materialize them for the online write, so the backend rejects `partitioned_by` together with `online_enabled=true` until that work lands (tracked under a separate follow-up ticket).
+Keep the feature group offline-only to use `partitioned_by`.

 ###### Hudi

From 00494373b894da3d99750817ca6eb8682ac7b171 Mon Sep 17 00:00:00 2001
From: Jim Dowling
Date: Wed, 10 Jun 2026 11:09:54 +0200
Subject: [PATCH 3/6] [HWORKS-2802] Drop key-generator detail from the Hudi partitioned_by note
 https://hopsworks.atlassian.net/browse/HWORKS-2802

The Hudi follow-up materializes the grain columns server-side and
partitions on them directly; the CustomKeyGenerator phrasing described
a mechanism the revised design no longer uses.

Co-Authored-By: Claude Fable 5
---
 docs/user_guides/fs/feature_group/create.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/user_guides/fs/feature_group/create.md b/docs/user_guides/fs/feature_group/create.md
index 8197c9245f..97fa30189e 100644
--- a/docs/user_guides/fs/feature_group/create.md
+++ b/docs/user_guides/fs/feature_group/create.md
@@ -152,7 +152,7 @@ Keep the feature group offline-only to use `partitioned_by`.
 ###### Hudi

 `partitioned_by` on `time_travel_format="HUDI"` feature groups is not yet supported and the backend rejects it at creation.
-Hudi needs a different mechanism (a `CustomKeyGenerator` + server-side `Transformer`) and is tracked under a separate follow-up ticket.
+Hudi materializes the grain columns server-side in the streaming materialization job, and that work is tracked under a separate follow-up ticket.
 Until that lands, use `time_travel_format="DELTA"` to get time-grain partitioning, or partition Hudi groups explicitly via `partition_key=["year"]` with a `year` column the upstream pipeline computes.

 ##### Table format

From 1dec9e01c46fdfb56935b56924f8c959c05646c7 Mon Sep 17 00:00:00 2001
From: Jim Dowling
Date: Thu, 11 Jun 2026 06:41:05 +0200
Subject: [PATCH 4/6] [HWORKS-2802] Expand partitioned_by feature group docs
 https://hopsworks.atlassian.net/browse/HWORKS-2802

Flesh out the partitioned_by section into reference for the shipped
feature: the parameter list (partitioned_by + online_partition_columns
with their constraints), cross-session persistence and the round-trip
through get_feature_group, the on-disk Hive layout, a read/partition-
pruning example with the hierarchical-vs-non-hierarchical matrix, a
clickstream-by-hour example, and the current online and Hudi
limitations (online rejected at create and on enable).

Co-Authored-By: Claude Fable 5
---
 docs/user_guides/fs/feature_group/create.md | 83 ++++++++++++++++-----
 1 file changed, 66 insertions(+), 17 deletions(-)

diff --git a/docs/user_guides/fs/feature_group/create.md b/docs/user_guides/fs/feature_group/create.md
index 97fa30189e..d07f7e3cd3 100644
--- a/docs/user_guides/fs/feature_group/create.md
+++ b/docs/user_guides/fs/feature_group/create.md
@@ -104,7 +104,8 @@ By using partitioning the system will write the feature data in different subdir

 ##### Time-grain partitioning with `partitioned_by` (Delta only)

-When the partition columns are derived from the feature group's `event_time`, hand the backend the desired time grains with `partitioned_by=[...]` and the Python client derives the partition columns for you.
+Most time-series feature groups want to partition by a time grain derived from `event_time`.
+Instead of decomposing the timestamp into `year` / `month` / `day` columns yourself and passing them as `partition_key`, declare the grains with `partitioned_by` and let Hopsworks derive the partition columns for you.
 Pass one or more grains drawn from `hour`, `day`, `week`, `month`, and `year`.

 ```python
@@ -116,20 +117,50 @@ fg = fs.get_or_create_feature_group(
     partitioned_by=["year", "month", "day"],
     time_travel_format="DELTA",
 )
-fg.insert(df)  # df does not need year/month/day — the client derives them
+fg.insert(df)  # df does not need year/month/day; they derive from tx_ts
 ```

-The example above is equivalent to manually decomposing `tx_ts` into three columns and passing `partition_key=["year", "month", "day"]`.
-The grain columns are ordinary materialized partition columns: the client computes them from `event_time` on each write and the backend registers them as partition columns through the normal table-creation path.
-The source dataframe does not need to carry them.
+The example above is equivalent to manually decomposing `tx_ts` into three columns and passing `partition_key=["year", "month", "day"]`, but you never write the grain columns yourself.
+The grain columns are ordinary materialized partition columns: the client computes them from `event_time` on each write, and the backend registers them as partition columns through the normal table-creation path (no Delta generated columns, no extra job).
+The source DataFrame must contain only your real features plus `event_time`; it must not carry the grain columns.

-`partitioned_by` and `partition_key` are mutually exclusive.
-`partitioned_by` requires `event_time` to be set.
+On disk the data lands in the standard Hive layout, one directory level per grain in the order you listed them:

-###### Partition pruning
+```text
+.../transactions_1/year=2026/month=06/day=11/
+```
+
+The grains become real features on the feature group, so they show up in the schema and in `fg.partition_key`, and you can filter on them directly.
+By default they are written only to the offline store (see [Online feature store](#online-feature-store) below).
+
+###### Parameters
+
+- `partitioned_by`: ordered, non-empty list of grains from `{"hour", "day", "week", "month", "year"}`, no duplicates.
+  Mutually exclusive with `partition_key`, and requires `event_time` to be set.
+  A grain must not collide with `event_time` or an existing feature name.
+- `online_partition_columns` (default `False`): when `True`, the derived grain columns are also written to the online store; when `False` they are offline-only.
+  Online serving with `partitioned_by` is not supported yet, so this is effectively always `False` today (see below).
+
+###### Persistence across sessions
+
+`partitioned_by` is stored on the feature group, so it round-trips without re-passing it:

-The grain columns are real partition columns, so a filter on a grain column (for example `year == 2026`) prunes partitions natively.
-A filter on an `event_time` range is rewritten into equivalent grain-column predicates by the query layer, so it prunes too on hierarchical specs:
+```python
+fg = fs.get_feature_group("transactions", version=1)
+fg.partitioned_by  # ["year", "month", "day"]
+fg.partition_key  # ["year", "month", "day"]
+```
+
+###### Reading and partition pruning
+
+Read the whole group, or a time slice; the grain columns appear as normal feature columns, populated from `event_time`:
+
+```python
+recent = fg.read(start_time="2026-06-01", end_time="2026-06-11")
+```
+
+The grain columns are real partition columns, so a filter on a grain column (for example `fg.filter(fg.year == 2026)`) prunes partitions natively.
+A filter on an `event_time` range is rewritten into equivalent grain-column predicates by the query layer, so `fg.read(start_time=..., end_time=...)` prunes too on hierarchical specs (and tightens to the finest grain the range allows, so a within-one-month window also bounds `day`):

 | `partitioned_by` | Prunes on `event_time` range? | Prunes on `year` / `month` / `day` filter? |
 | --- | --- | --- |
@@ -137,23 +168,41 @@ A filter on an `event_time` range is rewritten into equivalent grain-column pred
 | `["year", "month"]` | ✅ | ✅ |
 | `["year", "month", "day"]` | ✅ | ✅ |
 | `["year", "month", "day", "hour"]` | ✅ | ✅ |
-| `["month"]` (no year) | ⚠️ no — month alone is ambiguous across years | ✅ filter on month works |
-| `["year", "week"]` | ⚠️ year only — week isn't directly derivable from a date range | ✅ both columns prune |
-| `["day"]` (no year/month) | ⚠️ no — day-of-month is ambiguous | ✅ filter on day works |
+| `["month"]` (no year) | ⚠️ no, month alone is ambiguous across years | ✅ filter on month works |
+| `["year", "week"]` | ⚠️ year only, week is not directly derivable from a date range | ✅ both columns prune |
+| `["day"]` (no year/month) | ⚠️ no, day-of-month is ambiguous | ✅ filter on day works |
+
+Prefer hierarchical specs: `["year"]`, `["year", "month"]`, `["year", "month", "day"]`, `["year", "month", "day", "hour"]`.
+They line up with the typical batch-pipeline access pattern and prune naturally on both grain-column and `event_time`-range filters.
+Non-hierarchical specs are still valid; they just do not prune on an `event_time` range, only on a direct filter of the derived columns.

-Prefer hierarchical specs (`["year"]`, `["year", "month"]`, `["year", "month", "day"]`) — they line up with the typical batch-pipeline access pattern and prune naturally.
+###### Example: clickstream partitioned by the hour
+
+A high-volume event stream partitioned down to the hour, so a query for a few hours reads only those partitions:
+
+```python
+fg = fs.get_or_create_feature_group(
+    name="clickstream",
+    version=1,
+    primary_key=["event_id"],
+    event_time="event_time",
+    partitioned_by=["year", "month", "day", "hour"],
+    online_enabled=False,
+    time_travel_format="DELTA",
+)
+fg.insert(clickstream_df)  # only event_id / event_time / event fields
+```

 ###### Online feature store

 Online-enabled feature groups do not yet support `partitioned_by`.
-The online ingestion path does not exclude the offline-only grain columns from the Kafka/Avro schema, nor materialize them for the online write, so the backend rejects `partitioned_by` together with `online_enabled=true` until that work lands (tracked under a separate follow-up ticket).
+The online ingestion path does not exclude the offline-only grain columns from the Kafka/Avro schema, nor materialize them for the online write, so the backend rejects `partitioned_by` together with `online_enabled=True`, both at creation and when enabling online on an existing group.
 Keep the feature group offline-only to use `partitioned_by`.

 ###### Hudi

 `partitioned_by` on `time_travel_format="HUDI"` feature groups is not yet supported and the backend rejects it at creation.
-Hudi materializes the grain columns server-side in the streaming materialization job, and that work is tracked under a separate follow-up ticket.
-Until that lands, use `time_travel_format="DELTA"` to get time-grain partitioning, or partition Hudi groups explicitly via `partition_key=["year"]` with a `year` column the upstream pipeline computes.
+Until Hudi support lands, use `time_travel_format="DELTA"` to get time-grain partitioning, or partition Hudi groups explicitly via `partition_key=["year"]` with a `year` column the upstream pipeline computes.

 ##### Table format

From 49db202ff6afb31a1117aedd1b8993d333a70cb3 Mon Sep 17 00:00:00 2001
From: Jim Dowling
Date: Sat, 13 Jun 2026 00:17:48 +0200
Subject: [PATCH 5/6] [HWORKS-2807] Document partitioned_by support on Iceberg
 https://hopsworks.atlassian.net/browse/HWORKS-2807

partitioned_by now works on DELTA and ICEBERG; NONE is rejected
alongside Hudi. Update the section heading, supported-formats note, and
the Hudi fallback guidance.

Co-Authored-By: Claude Fable 5
---
 docs/user_guides/fs/feature_group/create.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/user_guides/fs/feature_group/create.md b/docs/user_guides/fs/feature_group/create.md
index 002ce4879f..367224968d 100644
--- a/docs/user_guides/fs/feature_group/create.md
+++ b/docs/user_guides/fs/feature_group/create.md
@@ -102,11 +102,12 @@ MaxDirectoryItemsExceededException - The directory item limit is exceeded: limit

 By using partitioning the system will write the feature data in different subdirectories, thus allowing you to write 10240 files per partition.

-##### Time-grain partitioning with `partitioned_by` (Delta only)
+##### Time-grain partitioning with `partitioned_by` (Delta and Iceberg)

 Most time-series feature groups want to partition by a time grain derived from `event_time`.
 Instead of decomposing the timestamp into `year` / `month` / `day` columns yourself and passing them as `partition_key`, declare the grains with `partitioned_by` and let Hopsworks derive the partition columns for you.
 Pass one or more grains drawn from `hour`, `day`, `week`, `month`, and `year`.
+Supported on `time_travel_format="DELTA"` and `time_travel_format="ICEBERG"`.

 ```python
 fg = fs.get_or_create_feature_group(
@@ -201,8 +202,8 @@ Keep the feature group offline-only to use `partitioned_by`.

 ###### Hudi

-`partitioned_by` on `time_travel_format="HUDI"` feature groups is not yet supported and the backend rejects it at creation.
-Until Hudi support lands, use `time_travel_format="DELTA"` to get time-grain partitioning, or partition Hudi groups explicitly via `partition_key=["year"]` with a `year` column the upstream pipeline computes.
+`partitioned_by` on `time_travel_format="HUDI"` feature groups is not yet supported and the backend rejects it at creation; so is `time_travel_format="NONE"` (plain Hive/parquet), which has no grain-materialization step.
+Until Hudi support lands, use `time_travel_format="DELTA"` or `"ICEBERG"` to get time-grain partitioning, or partition Hudi groups explicitly via `partition_key=["year"]` with a `year` column the upstream pipeline computes.

 ##### Table format

From c2e8830da78850e058dbfb63906af37d28cc40c4 Mon Sep 17 00:00:00 2001
From: Jim Dowling
Date: Sat, 13 Jun 2026 07:31:35 +0200
Subject: [PATCH 6/6] [HWORKS-2807] Document partitioned_by on Hudi + stream limitation
 https://hopsworks.atlassian.net/browse/HWORKS-2807

Non-stream Hudi feature groups now support partitioned_by (direct Spark
write); stream feature groups and NONE are rejected. Update the section
heading, supported-formats note, Hudi note, and add a stream note.

Co-Authored-By: Claude Fable 5
---
 docs/user_guides/fs/feature_group/create.md | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/docs/user_guides/fs/feature_group/create.md b/docs/user_guides/fs/feature_group/create.md
index 367224968d..f985c96472 100644
--- a/docs/user_guides/fs/feature_group/create.md
+++ b/docs/user_guides/fs/feature_group/create.md
@@ -102,12 +102,12 @@ MaxDirectoryItemsExceededException - The directory item limit is exceeded: limit

 By using partitioning the system will write the feature data in different subdirectories, thus allowing you to write 10240 files per partition.

-##### Time-grain partitioning with `partitioned_by` (Delta and Iceberg)
+##### Time-grain partitioning with `partitioned_by`

 Most time-series feature groups want to partition by a time grain derived from `event_time`.
 Instead of decomposing the timestamp into `year` / `month` / `day` columns yourself and passing them as `partition_key`, declare the grains with `partitioned_by` and let Hopsworks derive the partition columns for you.
 Pass one or more grains drawn from `hour`, `day`, `week`, `month`, and `year`.
-Supported on `time_travel_format="DELTA"` and `time_travel_format="ICEBERG"`.
+Supported on `time_travel_format="DELTA"`, `"ICEBERG"`, and `"HUDI"` for non-stream feature groups (see [Hudi](#hudi) and [Stream feature groups](#stream-feature-groups) below).

 ```python
 fg = fs.get_or_create_feature_group(
@@ -202,8 +202,15 @@ Keep the feature group offline-only to use `partitioned_by`.

 ###### Hudi

-`partitioned_by` on `time_travel_format="HUDI"` feature groups is not yet supported and the backend rejects it at creation; so is `time_travel_format="NONE"` (plain Hive/parquet), which has no grain-materialization step.
-Until Hudi support lands, use `time_travel_format="DELTA"` or `"ICEBERG"` to get time-grain partitioning, or partition Hudi groups explicitly via `partition_key=["year"]` with a `year` column the upstream pipeline computes.
+`partitioned_by` works on Hudi feature groups written directly by Spark (a non-stream feature group): the client materializes the grain columns and Hudi partitions on them.
+On the Python (non-Spark) engine a Hudi feature group is created as a stream feature group, which is not yet supported (see below); use `time_travel_format="DELTA"` or `"ICEBERG"` there.
+`time_travel_format="NONE"` (plain Hive/parquet) is rejected because it has no grain-materialization step.
+
+###### Stream feature groups
+
+`partitioned_by` is not yet supported on stream feature groups (`stream=True`).
+Stream feature groups materialize through the DeltaStreamer job, which does not derive the grain columns yet, so the backend rejects `partitioned_by` on them at creation.
+Create a non-stream feature group to use `partitioned_by`.

 ##### Table format