Pass Python worker startup arguments by name instead of by position #5547

@kunwp1

Task Summary

The JVM launches each Python worker in PythonWorkflowWorker by building a long list of positional command-line arguments. texera_run_python_worker.py unpacks them by position, (…, a, b, c, …) = sys.argv, and then forwards them into StorageConfig.initialize(...). That list has grown to around 20 arguments.

Because the two sides agree only by index, adding, removing, or reordering one argument means editing both in lockstep. If they ever drift, arguments are silently misassigned (a value lands in the wrong field) instead of failing loudly.
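A minimal sketch of the failure mode, with made-up argument names and values (not the real Texera argument list): the sender swaps two arguments, the receiver still unpacks in the old order, and nothing raises.

```python
# Hypothetical names and values throughout; this only illustrates the drift.

def jvm_side_argv():
    # Sender after a change that swapped the last two arguments.
    return ["worker.py", "worker-1", "s3://bucket/base", "localhost:9000"]


def python_side_unpack(argv):
    # Receiver still expects the old order:
    # (script, worker_id, endpoint, base_uri)
    _script, worker_id, endpoint, base_uri = argv
    return {"worker_id": worker_id, "endpoint": endpoint, "base_uri": base_uri}


cfg = python_side_unpack(jvm_side_argv())
# No exception is raised: "endpoint" now silently holds the base URI.
print(cfg["endpoint"])
```

Tuple unpacking only raises (ValueError) when the number of arguments differs. A reorder that keeps the count the same goes through unnoticed.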

Surfaced in review of #5280, which added the 20th positional argument (the large-binary base URI). It follows the existing convention, so it is fine as-is — this is a maintainability/robustness follow-up, not a bug in that PR.

Root cause: worker startup config is passed by argv position, with no names.

Before:  JVM  Seq(a1, a2, …, a20)  ──by position──▶  py  (a1, …, a20) = sys.argv
         add/reorder one → silent misalignment if the two sides drift
After:   JVM  {"endpoint": …, "largeBinaryBaseUri": …}  ──by name──▶  py  cfg["largeBinaryBaseUri"]
         add a field → no positional coupling; a missing/renamed key fails clearly

Proposed: pass startup config by name — e.g. a single JSON object, or argparse --key value flags — so the two sides agree by key, and a missing field raises a clear error.
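One possible shape for the JSON variant on the Python side, as a sketch only. WorkerStartupConfig and parse_startup_config are hypothetical names, and the two fields are just the keys used in the diagram above, not the full argument set.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class WorkerStartupConfig:
    # Hypothetical subset of fields; key names mirror the JSON keys.
    endpoint: str
    largeBinaryBaseUri: str


def parse_startup_config(raw: str) -> WorkerStartupConfig:
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("worker startup config must be a JSON object")
    expected = {f.name for f in fields(WorkerStartupConfig)}
    missing = sorted(expected - set(data))
    unknown = sorted(set(data) - expected)
    if missing or unknown:
        # Fail loudly and name the offending keys instead of misassigning.
        raise ValueError(
            f"bad worker startup config: missing={missing}, unknown={unknown}"
        )
    return WorkerStartupConfig(**data)


cfg = parse_startup_config(
    '{"largeBinaryBaseUri": "s3://bucket/base", "endpoint": "localhost:9000"}'
)
print(cfg.endpoint, cfg.largeBinaryBaseUri)
```

Key order in the JSON no longer matters, and a renamed or dropped key surfaces as a ValueError listing the key, rather than a value landing in the wrong field.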

Affected:

  • amber/src/main/scala/org/apache/texera/amber/engine/architecture/pythonworker/PythonWorkflowWorker.scala (builds the arg list)
  • amber/src/main/python/texera_run_python_worker.py (unpacks sys.argv)
  • amber/src/main/python/core/storage/storage_config.py (StorageConfig.initialize positional params)
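For the argparse alternative, the unpacking in texera_run_python_worker.py could be replaced by something along these lines. The flag names are hypothetical and only two are shown; the real script would declare one flag per existing argument.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Python worker startup (sketch)")
    # Hypothetical flags; required=True makes a missing flag an explicit error.
    parser.add_argument("--endpoint", required=True)
    parser.add_argument("--large-binary-base-uri", required=True)
    return parser


# Flag order does not matter, since each value is bound to its name.
args = build_parser().parse_args(
    ["--large-binary-base-uri", "s3://bucket/base", "--endpoint", "localhost:9000"]
)
print(args.endpoint, args.large_binary_base_uri)
```

With required flags, argparse exits with a usage error naming the missing flag, so a drift between the JVM and Python sides is reported at startup. StorageConfig.initialize could then be called with keyword arguments taken from args.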

Task Type

  • Refactor / Cleanup
  • DevOps / Deployment / CI
  • Testing / QA
  • Documentation
  • Performance
  • Other
