As a data scientist or ML engineer running OpenSTEF in managed container environments (Databricks, Microsoft Fabric),
I want to have clear guarantees, guidance, and/or mitigations around dependency compatibility between OpenSTEF and these runtimes,
So that I can deploy and use OpenSTEF reliably without dependency conflicts, runtime failures, or manual environment workarounds.
🌍 Background
Important
Changes should be made by forking the OpenSTEF 4.0.0 tracking branch (release/v4.0.0) and then opening a PR back onto the release/v4.0.0 branch.
Please read the contributing guide before you start.
Managed platforms such as Databricks and Microsoft Fabric ship with fixed or tightly controlled Python environments. Core libraries like pandas, numpy, and pyarrow are often pinned to platform-approved versions and cannot be freely downgraded or upgraded.
OpenSTEF, on the other hand, specifies tighter dependency constraints to ensure correctness, reproducibility, and model stability. While currently compatible, this setup is fragile and may break due to:
OpenSTEF dependency updates
Platform runtime upgrades
Transitive dependency changes
This creates adoption risk for enterprise users relying on managed data platforms.
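The fragility described above can be made concrete with a small sketch: a runtime-pinned library version that drifts past a package's declared upper bound stops satisfying the constraint. The version numbers below are illustrative placeholders, not OpenSTEF's or Databricks' actual pins.

```python
# Sketch: why a platform-pinned library version can conflict with a
# package's declared version range. All numbers here are illustrative,
# not actual OpenSTEF constraints or Databricks runtime pins.

def parse(v: str) -> tuple:
    """Turn a dotted version string into a comparable tuple of ints."""
    return tuple(int(p) for p in v.split("."))

# Hypothetical constraint declared by a library: >=2.0, <2.2
lower, upper = parse("2.0"), parse("2.2")

# Hypothetical version pinned by a managed runtime after an upgrade
platform_pin = parse("2.2.1")

# The pin falls outside the declared range, so installation fails or
# the resolver reports a conflict on the managed platform.
compatible = lower <= platform_pin < upper
print(f"pinned {platform_pin} within [{lower}, {upper}): {compatible}")
```

A platform runtime upgrade that bumps such a pin is exactly the kind of transitive change listed above that can break an otherwise working deployment.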
❗Priority (What if we don't do this?/Are there any deadlines? etc.)
Medium priority – preventative / risk mitigation
If this is not addressed:
Future releases may silently break on Databricks or Fabric
Users may face environment conflicts with limited recovery options
Support and maintenance costs may increase
There is no immediate deadline, but addressing this would help onboard more organizations
Definition of Done:
Compatibility and risk around managed container runtimes is explicitly addressed
Users have clear expectations and guidance when running OpenSTEF on Databricks or Fabric
No breaking changes are introduced for existing non-managed deployments
✅ Acceptance criteria
Identify OpenSTEF core dependencies that commonly conflict with managed runtimes (e.g. pandas, numpy)
Evaluate compatibility with at least one current Databricks runtime
Define a mitigation strategy (e.g. relaxed version constraints, optional extras, or documented workarounds)
Document supported, partially supported, and unsupported scenarios
Ensure existing OpenSTEF use cases remain unaffected
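As one possible starting point for the mitigation strategy above, a pre-flight check along these lines could surface dependency mismatches early on a managed runtime. The package minimums below are placeholder assumptions, not OpenSTEF's actual declared constraints.

```python
# Sketch: a pre-flight check comparing installed versions of commonly
# conflicting libraries against assumed minimums. The entries in EXPECTED
# are placeholders; substitute the ranges OpenSTEF actually declares.
from importlib.metadata import version, PackageNotFoundError

# Hypothetical (major, minor) minimums -- NOT OpenSTEF's real pins
EXPECTED = {
    "pandas": (2, 0),
    "numpy": (1, 24),
}

def preflight(expected=EXPECTED):
    """Return a list of human-readable problems; empty means compatible."""
    problems = []
    for name, minimum in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        # Compare only the leading numeric major.minor components
        parts = tuple(int(p) for p in installed.split(".")[:2] if p.isdigit())
        if parts < minimum:
            problems.append(f"{name}: {installed} < required {minimum}")
    return problems
```

On a Databricks or Fabric cluster, running `preflight()` in a notebook cell before importing OpenSTEF would flag version mismatches up front instead of failing deep inside a pipeline.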
📄 Documentation criteria:
Add or update documentation describing:
Running OpenSTEF on Databricks
Running OpenSTEF on Microsoft Fabric
Known limitations and constraints
Update relevant Confluence pages with dependency and platform guidance
🧪 Test criteria:
Unit tests written and passing
⌛ Dependencies:
N/A
🚀 Releasing:
N/A
Other information: