[FSTORE-1938] Support chaining of Transformation Functions using a DAG by manu-sj · Pull Request #580 · logicalclocks/logicalclocks.github.io

manu-sj · 2026-05-18T20:29:43Z

No description provided.

Copilot

Pull request overview

Adds documentation for chaining Transformation Functions into a dependency graph (DAG) in the Hopsworks Feature Store docs, including how execution order is resolved, how to visualize the DAG, and how parallel execution behaves for independent branches.

Changes:

- Documented chaining semantics for Transformation Functions (ODT + MDT), including cycle/duplicate-output rejection behavior.
- Added guidance on visualizing the transformation execution DAG from UI and SDK.
- Added performance/parallelism tuning details via n_processes, including defaults and serving-time pool pre-spawn.
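The cycle and duplicate-output rejection mentioned in the overview can be illustrated with a minimal sketch in plain Python. This is not Hopsworks code: `validate_chain` and the `(name, inputs, output)` tuple layout are hypothetical, and the standard-library `graphlib` module stands in for whatever resolver the SDK actually uses.

```python
from graphlib import CycleError, TopologicalSorter


def validate_chain(functions):
    """Return an execution order for (name, inputs, output) tuples.

    Hypothetical helper: raises ValueError when two functions write the same
    output column or when the dependencies form a cycle.
    """
    producer = {}
    for name, _inputs, output in functions:
        if output in producer:
            raise ValueError(
                f"duplicate output column {output!r}: "
                f"{producer[output]!r} and {name!r}"
            )
        producer[output] = name

    # A function depends on whichever function produces one of its inputs.
    graph = {
        name: {producer[col] for col in inputs if col in producer}
        for name, inputs, _output in functions
    }
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"cycle between transformations: {exc.args[1]}") from exc


order = validate_chain([
    ("add", ["x_plus_one", "y_plus_one"], "total"),
    ("add_one_x", ["x"], "x_plus_one"),
    ("add_one_y", ["y"], "y_plus_one"),
])
```

Declaration order does not matter here: `add` is listed first but still ends up last in `order`, because both of its inputs are produced by other functions.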

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| docs/user_guides/fs/transformation_functions.md | Introduces chained transformation DAG concept, DAG visualization, and performance tuning/parallelism behavior. |
| docs/user_guides/fs/feature_view/model-dependent-transformations.md | Adds a section describing chaining model-dependent transformations and links to performance tuning guidance. |
| docs/user_guides/fs/feature_group/on_demand_transformations.md | Adds a section describing chaining on-demand transformations and the cross-DAG path into feature views/MDTs. |

manu-sj · 2026-06-10T08:35:43Z

+A model-dependent transformation can consume another MDT's output as its input.
+The DAG is resolved automatically at execution time, so producers always run before consumers.
+
+!!! example "Chaining two normalizers and a sum"


Renamed to "Chaining two increments and a sum" to match the add_one/add code. Fixed in efcea35.
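For readers without the diff at hand, the "two increments and a sum" shape can be sketched in plain Python. Only the names `add_one` and `add` come from the comment above; the `chain` dictionary, the column names, and `run_chain` are assumptions made for illustration and do not use the Hopsworks SDK.

```python
from graphlib import TopologicalSorter


def add_one(value):
    return value + 1


def add(left, right):
    return left + right


# Hypothetical chain description: output column -> (function, input columns).
chain = {
    "total": (add, ["a_plus_one", "b_plus_one"]),
    "a_plus_one": (add_one, ["a"]),
    "b_plus_one": (add_one, ["b"]),
}


def run_chain(chain, row):
    """Apply every step to one row, producers before consumers."""
    # Only inputs produced by another step become edges of the DAG;
    # raw columns such as "a" and "b" are read straight from the row.
    graph = {
        output: {col for col in inputs if col in chain}
        for output, (_func, inputs) in chain.items()
    }
    values = dict(row)
    for output in TopologicalSorter(graph).static_order():
        func, inputs = chain[output]
        values[output] = func(*(values[col] for col in inputs))
    return values


result = run_chain(chain, {"a": 1, "b": 10})
```

With this input, both increments run first (in either order) and `total` is computed last from their outputs.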

manu-sj · 2026-06-10T08:35:44Z

+
+## Chaining Model-Dependent Transformations
+
+A model-dependent transformation can consume another MDT's output as its input.


Defined on first use: "A model-dependent transformation (MDT) can consume another MDT's output". Fixed in efcea35.

manu-sj · 2026-06-10T08:35:46Z

+Hopsworks resolves the execution order automatically using a topological sort of the resulting DAG, so dependencies always run before their consumers.
+Chaining works for both on-demand transformations attached to a feature group and model-dependent transformations attached to a feature view.
+
+!!! example "Chained MDTs on a feature view"


Spelled out: "Chained model-dependent transformations on a feature view". Fixed in efcea35.

manu-sj · 2026-06-10T08:35:47Z

+
+## Chaining On-Demand Transformations
+
+On-demand transformations attached to the same feature group can be chained: one ODT's output column can serve as another ODT's input.


Defined on first use: "On-demand transformations (ODTs) attached to the same feature group". Fixed in efcea35.

manu-sj · 2026-06-10T08:35:48Z

+An intermediate output consumed only by a downstream ODT can be dropped from the feature group; the full chain still executes during online serving, and the dropped column never becomes a stored feature.
+
+An ODT's output column becomes a regular feature in the feature group, which a downstream feature view can consume and pass into a model-dependent transformation.
+This is the implicit cross-DAG path between ODT and MDT chains: nothing extra to configure on either side.


Spelled out: "between on-demand and model-dependent transformation chains". Fixed in efcea35.
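The dropped-intermediate behavior described in this hunk can be pictured with a small plain-Python sketch. Everything in it is hypothetical (the column names, `run_and_store`, and the assumption that `steps` is already listed in dependency order); it only shows that the whole chain runs while the dropped columns never reach the stored result.

```python
def run_and_store(row, steps, dropped):
    """Run every step in order, then keep only columns that are not dropped."""
    values = dict(row)
    for output, func, inputs in steps:
        values[output] = func(*(values[name] for name in inputs))
    return {name: value for name, value in values.items() if name not in dropped}


# Hypothetical two-step chain; the second step consumes the first step's output.
steps = [
    ("amount_eur", lambda cents: cents / 100, ["amount_cents"]),
    ("is_large", lambda eur: eur > 500, ["amount_eur"]),
]

stored = run_and_store(
    {"amount_cents": 75000},
    steps,
    dropped={"amount_cents", "amount_eur"},  # raw input and intermediate
)
```

Here both the raw input and the intermediate are dropped, so `stored` contains a single column, `is_large`, even though `amount_eur` had to be computed to produce it.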

…xecution DAG

https://hopsworks.atlassian.net/browse/FSTORE-1938

Document chaining of transformation functions across the user guides: how the output of one function feeds another, how the execution DAG resolves the order, how cycles and duplicate output columns are rejected, and how the DAG is rendered from the UI and from the SDK with visualize_transformations().

A Transformation Functions Performance Tuning subsection in the transformation functions guide covers the node-parallel execution model: the n_processes argument and its defaults per input shape, pool pre-spawning through init_serving and init_batch_scoring, Arrow shared-memory staging, and the HSFS_TF_POOL_START_METHOD override.

The model-dependent transformations guide notes that statistics for chained functions are fit in dependency order on the data each function sees. The on-demand transformations guide covers chains whose intermediate output is dropped from the feature group. No migration entry is included since the changes are backwards compatible.

Signed-off-by: Manu Sathyarajan Joseph <manu.joseph@logicalclocks.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…xecution DAG

https://hopsworks.atlassian.net/browse/FSTORE-1938

Restructure the performance tuning section so it reads in order: what the n_processes argument is, how parallelism maps to the DAG, when it pays off, online serving specifics, implementation notes. The previous version stated the sequential default three times across the first three paragraphs and placed the practical guidance after the implementation internals.

Content changes: a call-shape distinction in the guidance (batch and offline calls benefit from worker processes, single feature vectors rarely do because the per-call dispatch cost usually exceeds the work), and a note that pre-spawning the pool removes the startup cost but not the per-call dispatch cost. Both reflect the measured behavior of the online batch chaining benchmark in the loadtest repository.

Signed-off-by: Manu Sathyarajan Joseph <manu.joseph@hopsworks.ai>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
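How parallelism can map onto a DAG is easiest to see by grouping nodes into "waves" whose members do not depend on each other. The sketch below is an illustration only, not the scheduler behind n_processes: `execution_waves` and the node names are hypothetical, and it uses the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter


def execution_waves(graph):
    """Group nodes into waves; nodes in one wave have no mutual dependencies.

    `graph` maps each node to the set of nodes it depends on.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # unblocks the consumers of this wave
    return waves


# Hypothetical DAG: two independent branches feed one consumer.
graph = {
    "scaled_a": set(),
    "scaled_b": set(),
    "ratio": {"scaled_a", "scaled_b"},
    "bucket": {"ratio"},
}
waves = execution_waves(graph)
```

Only the first wave contains more than one node, so only there could worker processes help; a purely linear chain yields single-node waves and gains nothing from extra workers, which is in line with the guidance that parallelism pays off for independent branches.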

…xecution DAG

https://hopsworks.atlassian.net/browse/FSTORE-1938

Rework the chaining documentation for reading order on all three pages.

The hub page now flows what chaining is, example, uniform offline and online behavior, statistics over chains with a link to the model-dependent page, cross-type chaining, and invalid configurations last instead of interleaved.

The model-dependent page gives the statistics-over-chains behavior its own subsection instead of a single dangling sentence after the example, and states that statistics are fit on the train split, each transformation executes once, and the fitted values are persisted for serving.

The on-demand page leads with the example like the other pages, and the example now demonstrates the dropped-column claims it previously only stated: both the raw input and the intermediate are dropped, leaving one stored output.

Signed-off-by: Manu Sathyarajan Joseph <manu.joseph@hopsworks.ai>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

manu-sj marked this pull request as draft May 21, 2026 13:06

manu-sj force-pushed the FSTORE-1938 branch 2 times, most recently from 5ed6dcb to b770050 Compare May 28, 2026 07:50

manu-sj force-pushed the FSTORE-1938 branch 2 times, most recently from 6eacba8 to cbf2ed3 Compare June 4, 2026 11:25

manu-sj marked this pull request as ready for review June 8, 2026 08:59

manu-sj force-pushed the FSTORE-1938 branch 2 times, most recently from ff87ced to 4db4444 Compare June 10, 2026 08:19

manu-sj requested a review from Copilot June 10, 2026 08:20

Copilot started reviewing on behalf of manu-sj June 10, 2026 08:21

Copilot AI reviewed Jun 10, 2026


manu-sj force-pushed the FSTORE-1938 branch from 4db4444 to efcea35 Compare June 10, 2026 08:35

manu-sj force-pushed the FSTORE-1938 branch from efcea35 to 83a8a2a Compare June 10, 2026 13:02

manu-sj and others added 3 commits June 12, 2026 08:05

Improving docs

6c984a9

bubriks approved these changes Jun 12, 2026


manu-sj wants to merge 4 commits into logicalclocks:main from manu-sj:FSTORE-1938

3 participants

