Add Sandbox provider with SandboxExecutor, SandboxOperator and @task.sandbox #68847
Open
zozo123 wants to merge 2 commits into
Add the apache-airflow-providers-sandbox provider whose SandboxExecutor runs each task instance in an ephemeral cloud sandbox behind a pluggable backend layer (local subprocess reference backend plus Daytona, E2B, Modal, islo).

It implements the public BaseExecutor interface only, with no core changes (AIP-51), and follows the Airflow 3 Task SDK / EdgeExecutor topology: the in-sandbox supervisor heartbeats and ships logs to the api-server while the executor reconciles terminal exit state from a polling watcher that distinguishes a transient UNKNOWN from a confirmed GONE, so a single failed poll never kills a healthy task. remote_logging is enforced at start(); get_task_log is a documented best-effort fallback. Adoption uses deterministic, labelled sandbox names where the backend supports reattach.

Includes provider.yaml (executors registration), get_provider_info, unit tests for the backend contract and the watcher logic, docs, and the boring-cyborg label mapping.

Relates to: apache#68845
Generated-by: Claude (Anthropic)
Signed-off-by: zozo123 <yossi.eliaz@incredibuild.com>
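The UNKNOWN versus GONE distinction in the watcher could look roughly like the sketch below. This is not the provider's code: the `Probe` enum, the `Watcher` class, and the `max_unknown` threshold are hypothetical names chosen for illustration, and escalating a long streak of UNKNOWN polls to a failure is an assumption the commit message does not state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Probe(Enum):
    """Hypothetical result of one poll against the sandbox backend."""
    RUNNING = "running"
    EXITED = "exited"
    UNKNOWN = "unknown"  # the poll failed or was inconclusive (possibly transient)
    GONE = "gone"        # the backend positively reports the sandbox is absent


@dataclass
class Watcher:
    # Assumption: tolerate this many consecutive UNKNOWN polls before giving up.
    max_unknown: int = 3
    unknown_streak: int = 0

    def observe(self, probe: Probe, exit_code: Optional[int] = None) -> Optional[str]:
        """Return "success" or "failed" for a terminal state, else None to keep polling."""
        if probe is Probe.RUNNING:
            self.unknown_streak = 0  # a healthy poll clears earlier transient failures
            return None
        if probe is Probe.EXITED:
            return "success" if exit_code == 0 else "failed"
        if probe is Probe.GONE:
            return "failed"  # confirmed absence is terminal immediately
        # Probe.UNKNOWN: count it, but do not fail on a single bad poll.
        self.unknown_streak += 1
        if self.unknown_streak >= self.max_unknown:
            return "failed"
        return None
```

With the default threshold, one UNKNOWN followed by RUNNING leaves the task untouched and resets the streak, while a GONE result ends the task on the first poll.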
Add a SandboxOperator and a @task.sandbox TaskFlow decorator (mirroring @task.bash) that run a command in an ephemeral sandbox from a normal task. This is the executor-free way to adopt sandboxes, verified end-to-end on Airflow 3. Register the decorator via the provider's task-decorators.

Switch the e2b backend to the Sandbox.create() classmethod and rewrite the islo backend against the real islo SDK (sandboxes resource: create_sandbox/exec_in_sandbox/get_exec_result/delete_sandbox, name-keyed, pause/resume). Add operator and islo unit tests.

Generated-by: Claude (Anthropic)
Signed-off-by: zozo123 <yossi.eliaz@incredibuild.com>
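To make the "run a command in an ephemeral sandbox" lifecycle concrete, here is a minimal sketch of a pluggable backend with a local subprocess implementation. The `SandboxBackend` protocol, its `create`/`exec`/`delete` method names, and `run_in_sandbox` are assumptions made for illustration; they are not the provider's actual backend contract, and a temporary directory plus a subprocess offers no real isolation.

```python
import subprocess
import sys
import tempfile
import uuid
from typing import Dict, List, Protocol, Tuple


class SandboxBackend(Protocol):
    """Hypothetical backend contract: create, run a command, tear down."""
    def create(self) -> str: ...
    def exec(self, sandbox_id: str, argv: List[str]) -> Tuple[int, str]: ...
    def delete(self, sandbox_id: str) -> None: ...


class LocalSubprocessBackend:
    """Reference-style backend: each 'sandbox' is a throwaway working directory."""

    def __init__(self) -> None:
        self._dirs: Dict[str, tempfile.TemporaryDirectory] = {}

    def create(self) -> str:
        sandbox_id = f"sbx-{uuid.uuid4().hex[:8]}"
        self._dirs[sandbox_id] = tempfile.TemporaryDirectory(prefix=sandbox_id)
        return sandbox_id

    def exec(self, sandbox_id: str, argv: List[str]) -> Tuple[int, str]:
        # Run the command inside the sandbox directory and capture its stdout.
        proc = subprocess.run(
            argv, cwd=self._dirs[sandbox_id].name, capture_output=True, text=True
        )
        return proc.returncode, proc.stdout

    def delete(self, sandbox_id: str) -> None:
        self._dirs.pop(sandbox_id).cleanup()


def run_in_sandbox(backend: SandboxBackend, argv: List[str]) -> Tuple[int, str]:
    """Create a sandbox, run one command, and always delete the sandbox afterwards."""
    sandbox_id = backend.create()
    try:
        return backend.exec(sandbox_id, argv)
    finally:
        backend.delete(sandbox_id)


if __name__ == "__main__":
    code, out = run_in_sandbox(
        LocalSubprocessBackend(), [sys.executable, "-c", "print('hello from sandbox')"]
    )
    print(code, out.strip())
```

An operator or decorator built on such a contract would only need to call something like `run_in_sandbox` from its execute step, which is why no special executor is required for that entry point.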
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide
Follows up #68845.
What
Adds a new Sandbox provider (apache-airflow-providers-sandbox) for running Airflow tasks in ephemeral cloud sandboxes behind a pluggable backend layer (a local subprocess reference backend + Daytona, E2B, Modal, islo). Three entry points, smallest-blast-radius first:
- SandboxOperator and @task.sandbox (mirrors @task.bash): run one command in a sandbox from a normal task; no special executor required.
- SandboxExecutor: route every task instance through a sandbox (implements the public BaseExecutor interface only; no core changes, per AIP-51).
Why
There is no executor or operator today that targets ephemeral cloud sandboxes, which offer strong per-task isolation and a serverless execution model without standing up Celery/K8s. The motivating use case is running untrusted / LLM-generated code (agents): a fresh disposable environment per task, with model credentials injected into the sandbox only.
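As a rough illustration of "credentials injected into the sandbox only", the sketch below passes secrets through the child process environment without touching the parent's. The function name `run_with_credentials` and the `MODEL_API_KEY` variable are hypothetical, and a local subprocess stands in for a real sandbox; how the provider's backends actually deliver credentials is not described here.

```python
import os
import subprocess
import sys
from typing import Dict, List, Tuple


def run_with_credentials(argv: List[str], credentials: Dict[str, str]) -> Tuple[int, str]:
    """Run argv with credentials visible only in the child's environment."""
    # Start from a minimal environment rather than a copy of os.environ,
    # so unrelated secrets held by the parent process are not inherited.
    child_env = {
        key: os.environ[key] for key in ("PATH", "SYSTEMROOT") if key in os.environ
    }
    child_env.update(credentials)
    proc = subprocess.run(argv, env=child_env, capture_output=True, text=True)
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    code, out = run_with_credentials(
        [sys.executable, "-c", "import os; print(os.environ['MODEL_API_KEY'])"],
        {"MODEL_API_KEY": "dummy-value"},  # placeholder, not a real secret
    )
    print(code, out.strip())
    # The parent environment is left unchanged.
    print("MODEL_API_KEY" in os.environ)
```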
Verification
- SandboxOperator and @task.sandbox are verified end-to-end on Airflow 3.1 via airflow dags test on the local backend, including credential injection.
- daytona/e2b/modal/islo SDKs by scripts/verify_sdk_conformance.py (reference repo): 49/49.
Reference implementation, design notes and the runnable demo: https://github.com/zozo123/airflow-provider-sandbox
Process / asks
I'm aware new community providers go through ACCEPTING_PROVIDERS.rst (working impl + tests/docs + system-test plan + ≥2 stewards + a committer sponsor, via a dev@ [DISCUSS] lazy-consensus). I'm opening this to anchor the discussion against working code and am happy to take it to the devlist and/or ship it as a third-party package if that's preferred. Seeking steward/sponsor interest. Some breeze-generated artifacts (pyproject/get_provider_info/dependency sync) will need a regen pass by a committer.
- BaseExecutor interface only, no core/API/scheduler changes (AIP-51).
Was generative AI tooling used to co-author this PR?