Task Summary
The JVM launches each Python worker in PythonWorkflowWorker by building a long list of positional command-line arguments, and texera_run_python_worker.py unpacks them positionally ((…, a, b, c, …) = sys.argv, then forwards them into StorageConfig.initialize(...)). That list has grown to around 20 arguments.
Because the two sides agree only by index, adding, removing, or reordering one argument means editing both in lockstep. If they ever drift, arguments are silently misassigned (a value lands in the wrong field) instead of failing loudly.
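To make the failure mode concrete, here is a minimal Python sketch with three hypothetical fields (worker_id, endpoint, base_uri; these are not the real argument list). If the sender swaps two arguments but keeps the count the same, tuple unpacking still succeeds and the values land in the wrong fields.

```python
# Hypothetical illustration: field names are made up, not Texera's actual argv layout.

def parse_positional(argv):
    # Mirrors the current style: the receiver assumes a fixed argument order.
    _script, worker_id, endpoint, base_uri = argv
    return {"worker_id": worker_id, "endpoint": endpoint, "base_uri": base_uri}


# The sender swapped the last two arguments. The count is unchanged, so unpacking succeeds.
drifted_argv = ["worker.py", "w-1", "s3://bucket/large-binaries", "http://storage:9000"]
cfg = parse_positional(drifted_argv)
print(cfg["endpoint"])  # prints the base URI, silently stored in the wrong field
```

Only a change in the number of arguments makes unpacking raise a ValueError. A reorder, or a removal paired with an addition, goes unnoticed.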
Surfaced in review of #5280, which added the 20th positional argument (the large-binary base URI). That PR follows the existing convention, so it is fine as-is; this is a maintainability/robustness follow-up, not a bug in that PR.
Root cause: worker startup config is passed by argv position, with no names.
Before: JVM Seq(a1, a2, …, a20) ──by position──▶ py (a1, …, a20) = sys.argv
add/reorder one → silent misalignment if the two sides drift
After: JVM {"endpoint": …, "largeBinaryBaseUri": …} ──by name──▶ py cfg["largeBinaryBaseUri"]
add a field → no positional coupling; a missing/renamed key fails clearly
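On the Python side, the "After" flow could look roughly like the sketch below. It assumes the JVM passes the whole config as one JSON string in a single argv element. The key names and the REQUIRED_KEYS tuple are hypothetical.

```python
import json

# Hypothetical required keys; the real worker config has around 20 fields.
REQUIRED_KEYS = ("workerId", "endpoint", "largeBinaryBaseUri")


def parse_config(raw: str) -> dict:
    """Parse a JSON startup config and fail loudly if a required key is absent."""
    cfg = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in cfg]
    if missing:
        raise ValueError(f"worker startup config is missing keys: {missing}")
    return cfg


# In the worker this string would come from sys.argv[1] (an assumption about the launch protocol).
sample = (
    '{"workerId": "w-1", "endpoint": "http://storage:9000", '
    '"largeBinaryBaseUri": "s3://bucket/large-binaries"}'
)
cfg = parse_config(sample)
print(cfg["largeBinaryBaseUri"])

# A config that omits keys fails with a message naming them, instead of shifting values.
try:
    parse_config('{"workerId": "w-1"}')
except ValueError as err:
    error_message = str(err)
print(error_message)
```

Key order no longer matters. A key renamed on only one side shows up as missing rather than as a value in the wrong field.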
Proposed: pass startup config by name, e.g. as a single JSON object or as argparse --key value flags, so that the two sides agree by key and a missing field raises a clear error.
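The argparse alternative could be sketched as follows. The flag names are hypothetical. Marking each option required=True makes argparse reject a missing flag instead of defaulting it.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flag names; each setting becomes a named, required option.
    parser = argparse.ArgumentParser(description="Python worker startup (sketch)")
    parser.add_argument("--worker-id", required=True)
    parser.add_argument("--endpoint", required=True)
    parser.add_argument("--large-binary-base-uri", required=True)
    return parser


# Flags can arrive in any order; dashes become underscores in attribute names.
args = build_parser().parse_args([
    "--endpoint", "http://storage:9000",
    "--worker-id", "w-1",
    "--large-binary-base-uri", "s3://bucket/large-binaries",
])
print(args.large_binary_base_uri)
```

If a required flag is missing, argparse prints a usage message naming it and exits with status 2 (it raises SystemExit), so the worker fails at startup rather than running with misassigned values.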
Affected:
amber/src/main/scala/org/apache/texera/amber/engine/architecture/pythonworker/PythonWorkflowWorker.scala (builds the arg list)
amber/src/main/python/texera_run_python_worker.py (unpacks sys.argv)
amber/src/main/python/core/storage/storage_config.py (StorageConfig.initialize positional params)
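For StorageConfig.initialize itself, making the parameters keyword-only closes the same gap at the Python call site. The class below is a hypothetical stand-in with two illustrative fields, not the real storage_config.py.

```python
class StorageConfig:
    """Hypothetical sketch; the real class has many more fields."""

    endpoint = None
    large_binary_base_uri = None

    @classmethod
    def initialize(cls, *, endpoint: str, large_binary_base_uri: str) -> None:
        # The bare '*' makes every parameter keyword-only, so callers must name each one.
        cls.endpoint = endpoint
        cls.large_binary_base_uri = large_binary_base_uri


# Named arguments work in any order.
StorageConfig.initialize(
    large_binary_base_uri="s3://bucket/large-binaries",
    endpoint="http://storage:9000",
)

# A positional call is rejected with a TypeError instead of being silently misassigned.
try:
    StorageConfig.initialize("http://storage:9000", "s3://bucket/large-binaries")
except TypeError as err:
    positional_error = err
```

If the parsed config keys match the parameter names, a call like StorageConfig.initialize(**storage_cfg) also raises a TypeError for any missing or unexpected key.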
Task Type