[OpenSTEF 4.0] Potential Dependency Conflicts with Managed Container Runtimes (Databricks, Microsoft Fabric) #854

@MaxPolakStedin

Description

As a data scientist or ML engineer running OpenSTEF in managed container environments (Databricks, Microsoft Fabric)
I want to have clear guarantees, guidance, and/or mitigations around dependency compatibility between OpenSTEF and these runtimes
So that I can deploy and use OpenSTEF reliably without dependency conflicts, runtime failures, or manual environment workarounds

🌍 Background

Important

Changes should be made by forking the OpenSTEF 4.0.0 tracking branch (release/v4.0.0) and then opening a PR back onto the release/v4.0.0 branch.
Please read the contributing guide before you start.

Managed platforms such as Databricks and Microsoft Fabric ship with fixed or tightly controlled Python environments. Core libraries like pandas, numpy, and pyarrow are often pinned to platform-approved versions and cannot be freely downgraded or upgraded.

OpenSTEF, on the other hand, specifies its own tight dependency constraints to ensure correctness, reproducibility, and model stability. While the two are currently compatible, this setup is fragile and may break due to:

  • OpenSTEF dependency updates
  • Platform runtime upgrades
  • Transitive dependency changes

This creates adoption risk for enterprise users relying on managed data platforms.
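
To make the constraint side of this concrete, the sketch below lists which requirements an installed distribution declares for the libraries that managed runtimes typically pin. It is only an illustration: the distribution name `openstef` and the `WATCHED` set are assumptions (OpenSTEF 4.0 may ship under different package names), and the helper names are hypothetical. It relies only on the standard library's `importlib.metadata`.

```python
import re
from importlib import metadata

# Libraries a managed runtime is likely to pin (taken from the issue text).
WATCHED = {"pandas", "numpy", "pyarrow"}


def requirement_name(requirement: str) -> str:
    """Return the normalized project name at the start of a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if not match:
        return ""
    # Normalize runs of '-', '_' and '.' to a single '-' and lowercase the name.
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def watched_requirements(dist_name, watched=WATCHED, get_requires=metadata.requires):
    """List the declared requirements of `dist_name` that touch a watched library.

    `get_requires` is injectable so the function can be exercised without
    the distribution being installed.
    """
    try:
        declared = get_requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return []
    return [req for req in declared if requirement_name(req) in watched]


if __name__ == "__main__":
    # "openstef" is an assumed distribution name, used for illustration only.
    for req in watched_requirements("openstef"):
        print(req)
```

Running this in a notebook cell on the target runtime gives a quick list of the constraints that would have to agree with the platform's pinned versions.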

❗Priority (What if we don't do this?/Are there any deadlines? etc.)
Medium priority – preventative / risk mitigation
If this is not addressed:

  • Future releases may silently break on Databricks or Fabric
  • Users may face environment conflicts with limited recovery options
  • Support and maintenance costs may increase

There is no immediate deadline, but addressing this would help onboard more organizations.

Definition of Done:

  • Compatibility and risk around managed container runtimes are explicitly addressed
  • Users have clear expectations and guidance when running OpenSTEF on Databricks or Fabric
  • No breaking changes are introduced for existing non-managed deployments

✅ Acceptance criteria

  • Identify OpenSTEF core dependencies that commonly conflict with managed runtimes (e.g. pandas, numpy)
  • Evaluate compatibility with at least one current Databricks runtime
  • Define a mitigation strategy (e.g. relaxed version constraints, optional extras, or documented workarounds)
  • Document supported, partially supported, and unsupported scenarios
  • Ensure existing OpenSTEF use cases remain unaffected
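
As a starting point for the first two criteria, a compatibility check could be sketched roughly as follows: read the versions installed in the runtime and compare them with the ranges OpenSTEF needs. The ranges in `HYPOTHETICAL_RANGES` are placeholders for illustration, not OpenSTEF's actual constraints, and the simplified version parsing ignores pre-release and local version tags; a real check would more likely use the `packaging` library.

```python
from importlib import metadata

# Placeholder ranges (inclusive lower bound, exclusive upper bound).
# These are NOT OpenSTEF's real constraints; replace them with the actual ones.
HYPOTHETICAL_RANGES = {
    "pandas": ((2, 0), (3, 0)),
    "numpy": ((1, 26), (3, 0)),
    "pyarrow": ((14, 0), (99, 0)),
}


def parse_release(version: str) -> tuple[int, ...]:
    """Extract the leading numeric release, e.g. '2.1.4.post1' -> (2, 1, 4)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad to three components so that '2' compares like '2.0.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def in_range(installed: str, lower: tuple, upper: tuple) -> bool:
    """True if lower <= installed release < upper."""
    return lower <= parse_release(installed) < upper


def audit(ranges, get_version=metadata.version):
    """Map each package to (status, installed_version).

    Status is 'ok', 'conflict', or 'missing'. `get_version` is injectable
    so the audit can be tested without touching the real environment.
    """
    report = {}
    for name, (lower, upper) in ranges.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            report[name] = ("missing", None)
            continue
        status = "ok" if in_range(installed, lower, upper) else "conflict"
        report[name] = (status, installed)
    return report


if __name__ == "__main__":
    for package, (status, installed) in audit(HYPOTHETICAL_RANGES).items():
        print(f"{package}: {status} (installed: {installed})")
```

Run once per Databricks or Fabric runtime version, the resulting report could feed directly into the supported / partially supported / unsupported matrix.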

📄 Documentation criteria:
Add or update documentation describing:

  • Running OpenSTEF on Databricks
  • Running OpenSTEF on Microsoft Fabric
  • Known limitations and constraints

Update relevant Confluence pages with dependency and platform guidance

🧪 Test criteria:

Unit tests written and passed

⌛ Dependencies:

N/A

🚀 Releasing:

N/A

Other information:
