Compute Worker - Fix submission files duplication #2285
ihsaan-ullah wants to merge 10 commits into develop
…e submission during both ingestion and scoring
3fa7c4d to e18c68f
@ihsaan-ullah Is it necessary to change the scoring program / ingestion program to get this working? When trying locally I get this failure (I haven't put back the commented-out function):

I think the failure will go away if you comment out the following code:

    # Check if scoring program failed
    try:
        program_results, _, _ = task_results
    except:
        program_results, _ = task_results
    # Gather returns either normal values or exception instances when return_exceptions=True
    had_async_exc = isinstance(
        program_results, BaseException
    ) and not isinstance(program_results, asyncio.CancelledError)
    program_rc = getattr(self, "program_exit_code", None)
    failed_rc = (program_rc is None) or (program_rc != 0)
    if had_async_exc or failed_rc:
        self._update_status(
            SubmissionStatus.FAILED,
            extra_information=f"program_rc={program_rc}, async={task_results}",
        )
        # Raise so upstream marks failed immediately
        raise SubmissionException("Child task failed or non-zero return code")

The bundle and submission I have provided should work. There is nothing special in the ingestion/scoring programs; they just print the contents of different directories. The important thing to notice is that the input directory of scoring should contain the predictions from ingestion and the submission files.
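For readers following the check quoted above, here is a minimal, self-contained sketch of how `asyncio.gather` behaves with `return_exceptions=True`. The coroutines `ok` and `boom` and the helper `had_async_exception` are hypothetical names used only for illustration; they are not taken from `compute_worker.py`.

```python
import asyncio


async def ok() -> str:
    # Stand-in for a task that finishes normally (hypothetical).
    return "done"


async def boom() -> None:
    # Stand-in for a program task that crashes (hypothetical).
    raise RuntimeError("program crashed")


async def main() -> list:
    # With return_exceptions=True, gather() does not raise; failures come
    # back as exception instances in the result list, in call order.
    return await asyncio.gather(boom(), ok(), return_exceptions=True)


def had_async_exception(result: object) -> bool:
    # Same shape as the quoted check: any exception instance other than
    # a cancellation is treated as a failure.
    return isinstance(result, BaseException) and not isinstance(
        result, asyncio.CancelledError
    )


results = asyncio.run(main())
program_result, other_result = results
print(had_async_exception(program_result), had_async_exception(other_result))
```

Note that the quoted code also treats a missing or non-zero `program_exit_code` as a failure, so a submission can be marked failed even when no exception instance is returned by `gather`.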

I get the same error when it is commented out. Compute worker logs:
compute_worker | 2026-03-26 13:45:23.289 | INFO | compute_worker:_get_bundle:715 - Getting bundle http://docker.for.mac.localhost:9000/private/dataset/2026-03-26-1774525773/3efef493ce29/scoring_program.zip?AWSAccessKeyId=testkey&Signature=Djg5C%2FACl%2F%2BSE%2BtTfuiHEfqTsvc%3D&Expires=1774964722 to unpack @ scoring_program
compute_worker | 2026-03-26 13:45:23.303 | INFO | compute_worker:rewrite_bundle_url_if_needed:278 - Rewriting bundle URL for worker: http://docker.for.mac.localhost:9000/private/dataset/2026-03-26-1774525773/3efef493ce29/scoring_program.zip?AWSAccessKeyId=testkey&Signature=Djg5C%2FACl%2F%2BSE%2BtTfuiHEfqTsvc%3D&Expires=1774964722 -> http://minio:9000/private/dataset/2026-03-26-1774525773/3efef493ce29/scoring_program.zip?AWSAccessKeyId=testkey&Signature=Djg5C%2FACl%2F%2BSE%2BtTfuiHEfqTsvc%3D&Expires=1774964722
compute_worker | 2026-03-26 13:45:23.334 | INFO | compute_worker:_get_bundle:715 - Getting bundle http://docker.for.mac.localhost:9000/private/dataset/2026-03-26-1774532706/9b6d0e636deb/dev-submission.zip?AWSAccessKeyId=testkey&Signature=nWAJSIXuyqc7o0n2o6V0Pakv5O8%3D&Expires=1774964722 to unpack @ submission
compute_worker | 2026-03-26 13:45:23.337 | INFO | compute_worker:rewrite_bundle_url_if_needed:278 - Rewriting bundle URL for worker: http://docker.for.mac.localhost:9000/private/dataset/2026-03-26-1774532706/9b6d0e636deb/dev-submission.zip?AWSAccessKeyId=testkey&Signature=nWAJSIXuyqc7o0n2o6V0Pakv5O8%3D&Expires=1774964722 -> http://minio:9000/private/dataset/2026-03-26-1774532706/9b6d0e636deb/dev-submission.zip?AWSAccessKeyId=testkey&Signature=nWAJSIXuyqc7o0n2o6V0Pakv5O8%3D&Expires=1774964722
compute_worker | 2026-03-26 13:45:23.367 | INFO | compute_worker:_get_bundle:715 - Getting bundle http://docker.for.mac.localhost:9000/private/dataset/2026-03-26-1774525774/101b225ebcef/reference_data.zip?AWSAccessKeyId=testkey&Signature=W1iCOUyCG7Q09Ae2OP6svrKTkEs%3D&Expires=1774964722 to unpack @ input/ref
compute_worker | 2026-03-26 13:45:23.399 | INFO | compute_worker:_get_bundle:715 - Getting bundle http://docker.for.mac.localhost:9000/private/prediction_result/2026-03-26-1774532716/0d8e3f2c618b/prediction_result.zip?AWSAccessKeyId=testkey&Signature=7fMro2rHx3QOveS8tgcYsIwuqSM%3D&Expires=1774964722 to unpack @ input/res
compute_worker | 2026-03-26 13:45:23.404 | INFO | compute_worker:rewrite_bundle_url_if_needed:278 - Rewriting bundle URL for worker: http://docker.for.mac.localhost:9000/private/prediction_result/2026-03-26-1774532716/0d8e3f2c618b/prediction_result.zip?AWSAccessKeyId=testkey&Signature=7fMro2rHx3QOveS8tgcYsIwuqSM%3D&Expires=1774964722 -> http://minio:9000/private/prediction_result/2026-03-26-1774532716/0d8e3f2c618b/prediction_result.zip?AWSAccessKeyId=testkey&Signature=7fMro2rHx3QOveS8tgcYsIwuqSM%3D&Expires=1774964722
compute_worker | 2026-03-26 13:45:23.413 | WARNING | compute_worker:_get_bundle:754 - Failed. Retrying in 20 seconds...
compute_worker | 2026-03-26 13:45:43.547 | WARNING | compute_worker:_get_bundle:754 - Failed. Retrying in 20 seconds...
compute_worker | 2026-03-26 13:46:03.722 | WARNING | compute_worker:_get_bundle:754 - Failed. Retrying in 20 seconds...
compute_worker | 2026-03-26 13:46:23.911 | WARNING | compute_worker:_get_bundle:754 - Failed. Retrying in 20 seconds...
compute_worker | 2026-03-26 13:46:43.987 | WARNING | compute_worker:_get_bundle:754 - Failed. Retrying in 20 seconds...
compute_worker | 2026-03-26 13:47:04.051 | WARNING | compute_worker:_get_bundle:754 - Failed. Retrying in 20 seconds...
compute_worker | 2026-03-26 13:47:24.147 | WARNING | compute_worker:_get_bundle:754 - Failed. Retrying in 20 seconds...
compute_worker | 2026-03-26 13:47:44.192 | WARNING | compute_worker:_get_bundle:754 - Failed. Retrying in 20 seconds...
compute_worker | 2026-03-26 13:48:04.299 | WARNING | compute_worker:_get_bundle:754 - Failed. Retrying in 20 seconds...
compute_worker | 2026-03-26 13:48:24.390 | INFO | compute_worker:_update_submission:602 - Updating submission @ http://django:8000/api/submissions/93/ with data = {'status': 'Failed', 'status_details': 'Submission failed: Bad or empty zip file. See logs for more details.', 'secret': 'd508ee2d-3495-45fd-9565-92a95f45a797'}
compute_worker | 2026-03-26 13:48:25.903 | INFO | compute_worker:_update_submission:606 - Submission updated successfully!
compute_worker | 2026-03-26 13:48:25.906 | WARNING | compute_worker:clean_up:1529 - CODALAB_IGNORE_CLEANUP_STEP mode enabled, ignoring clean up of: /codabench/uPK-1_sID-93__br4xlja2
compute_worker | 2026-03-26 13:48:26.060 | ERROR | celery.app.trace:_log_error:285 - Task compute_worker_run[a835a084-8264-4c0d-8da0-d65a37b06bc7] raised unexpected: SubmissionException('Bad or empty zip file')
compute_worker | Traceback (most recent call last):
compute_worker |
compute_worker | File "/app/compute_worker.py", line 746, in _get_bundle
compute_worker | with ZipFile(bundle_file, "r") as z:
compute_worker | │ └ '/codabench/uPK-1_sID-93__br4xlja2/bundles/tmp4lfxu7ek'
compute_worker | └ <class 'zipfile.ZipFile'>
compute_worker |
compute_worker | File "/root/.local/share/uv/python/cpython-3.13.11-linux-aarch64-gnu/lib/python3.13/zipfile/__init__.py", line 1401, in __init__
compute_worker | self._RealGetContents()
compute_worker | │ └ <function ZipFile._RealGetContents at 0xffff7f042980>
compute_worker | └ <zipfile.ZipFile [closed]>
compute_worker | File "/root/.local/share/uv/python/cpython-3.13.11-linux-aarch64-gnu/lib/python3.13/zipfile/__init__.py", line 1468, in _RealGetContents
compute_worker | raise BadZipFile("File is not a zip file")
compute_worker | └ <class 'zipfile.BadZipFile'>
compute_worker |
compute_worker | zipfile.BadZipFile: File is not a zip file
compute_worker |
compute_worker |
compute_worker | During handling of the above exception, another exception occurred:
compute_worker |
compute_worker |
compute_worker | Traceback (most recent call last):
compute_worker |
compute_worker | > File "/.venv/lib/python3.13/site-packages/celery/app/trace.py", line 479, in trace_task
compute_worker | R = retval = fun(*args, **kwargs)
compute_worker | File "/.venv/lib/python3.13/site-packages/celery/app/trace.py", line 779, in __protected_call__
compute_worker | return self.run(*args, **kwargs)
compute_worker |
compute_worker | File "/app/compute_worker.py", line 294, in run_wrapper
compute_worker | run.prepare()
compute_worker |
compute_worker | File "/app/compute_worker.py", line 1260, in prepare
compute_worker | zip_file = self._get_bundle(url, path, cache=cache_this_bundle)
compute_worker |
compute_worker | File "/app/compute_worker.py", line 752, in _get_bundle
compute_worker | raise SubmissionException("Bad or empty zip file")
compute_worker |
compute_worker | compute_worker.SubmissionException: Bad or empty zip file
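The traceback above ends in `zipfile.BadZipFile: File is not a zip file`, which is what `ZipFile` raises when the downloaded bytes are not a zip archive. The sketch below reproduces that condition in isolation using only the standard library. The helper names `looks_like_zip` and `make_zip` and the sample payloads are assumptions for illustration; they say nothing about what the storage backend actually returned in this run.

```python
import io
import zipfile


def looks_like_zip(payload: bytes) -> bool:
    # zipfile.is_zipfile accepts a path or a file-like object and looks
    # for the zip end-of-central-directory record.
    return zipfile.is_zipfile(io.BytesIO(payload))


def make_zip(name: str, content: str) -> bytes:
    # Build a small in-memory archive to compare against.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as z:
        z.writestr(name, content)
    return buffer.getvalue()


good = make_zip("predictions.txt", "0.5\n")
# Hypothetical non-zip body, for example an error document instead of a bundle.
bad = b"<Error>this response body is not a zip archive</Error>"

print(looks_like_zip(good), looks_like_zip(bad))

try:
    zipfile.ZipFile(io.BytesIO(bad), "r")
except zipfile.BadZipFile as exc:
    print(f"BadZipFile: {exc}")
```

Inspecting the temporary file named in the traceback (for example its size and first bytes) is one way to tell an empty download from a non-zip response.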

Seems like a different issue. Maybe rerun the submission. If this keeps happening, I will try to reproduce it on my side.

I am pretty sure the problem is related to the PR.
compute_worker | 2026-03-26 13:45:23.399 | INFO | compute_worker:_get_bundle:715 - Getting bundle http://docker.for.mac.localhost:9000/private/prediction_result/2026-03-26-1774532716/0d8e3f2c618b/prediction_result.zip?AWSAccessKeyId=testkey&Signature=7fMro2rHx3QOveS8tgcYsIwuqSM%3D&Expires=1774964722 to unpack @ input/res
To reproduce it, I am using:

I will check this. I am also going to split the main change and the other cleanup changes into separate, simple PRs.

I think it will be clearer this way. Can we delete this branch?

Description
This PR has the following updates:
- `.env_sample`
- `WORKER_BUNDLE_URL_REWRITE`
- `src/apps/competitions/tasks.py`
- `compute_worker.py`
- `SubmissionStatus` for submission statuses and related updates
- `ProgramKind` for program kind and related updates
- `Settings` that gathers all env variables. This class is now used all over the code for accessing settings. NOTE: This can be further improved to convert strings to booleans so that we don't use `== "true"` or `== "false"`.
- `program` to `scoring_program`
- `start` function to clarify which functions will run during ingestion and during scoring, and separated submission and scoring program clearly
- `app/ingested_program`
- `_run_program_directory` function to simplify code and logs
- `_run_program_directory` to a new function `_create_container` for reusability and clarity
- `watch_detailed_results` function to avoid looping forever
- `container_id` (undeclared) by `container.get("Id")` in `_run_container_engine_cmd`
- `start` function that was failing submission even if everything went well. Need to discuss this and update the code

Issues this PR resolves
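The NOTE in the description about converting environment strings to booleans, so that comparisons like `== "true"` are no longer needed, could look roughly like the sketch below. This is an illustration under stated assumptions, not the PR's actual `Settings` class: the helper `env_bool`, the attribute names, and the set of accepted truthy strings are all hypothetical. Only the variable names `CODALAB_IGNORE_CLEANUP_STEP` and `WORKER_BUNDLE_URL_REWRITE` come from this page.

```python
import os

# Assumed set of strings treated as "true"; the real project may differ.
_TRUE_VALUES = {"1", "true", "yes", "on"}


def env_bool(name: str, default: bool = False) -> bool:
    # Read an environment variable and normalise it to a real boolean.
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUE_VALUES


class Settings:
    """Hypothetical stand-in for a class that gathers env variables once."""

    def __init__(self) -> None:
        # Booleans are parsed here, so callers never compare against "true".
        self.ignore_cleanup_step = env_bool("CODALAB_IGNORE_CLEANUP_STEP")
        # Plain string setting; its expected format is not shown on this page.
        self.bundle_url_rewrite = os.environ.get("WORKER_BUNDLE_URL_REWRITE", "")


settings = Settings()
if settings.ignore_cleanup_step:
    print("Clean-up step will be skipped")
```

Parsing once at construction keeps the truthiness rule in a single place, which is the improvement the NOTE hints at.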
A checklist for hand testing
Checklist