AIR CLI Integration: air run end to end command #5710
Open
riddhibhagwat-db wants to merge 5 commits into
Contributor
Approval status: pending
air run end to end command
Collaborator
Integration test report
Commit: a5d851b
23 interesting tests: 13 SKIP, 8 RECOVERED, 1 FAIL, 1 flaky
Top 21 slowest tests (at least 2 minutes):
373988d to 226d41a
Flatten the validated runConfig schema into the derived values the launch path consumes (timeout seconds, retry default, docker image URL, requirements file vs inline dependencies, runtime version), replacing the Python CLI's _convert_to_run_config step. handle_run reads runConfig directly, so these are accessors rather than a separate internal config type. Co-authored-by: Isaac
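The accessor approach described in this commit could look roughly like the sketch below. It is only an illustration: the `runConfig` fields, the duration format, and the retry default are assumptions made for the example, not the PR's actual schema.

```go
package main

import (
	"fmt"
	"time"
)

// runConfig is a hypothetical, trimmed-down stand-in for the validated schema.
// Field names and types are assumptions for this sketch.
type runConfig struct {
	Timeout          string   // e.g. "90m"; empty means no timeout was set
	MaxRetries       *int     // nil means "use the default"
	RequirementsFile string   // path to a requirements file, if any
	Dependencies     []string // inline dependencies, if any
}

// defaultMaxRetries is an assumed default, not taken from the PR.
const defaultMaxRetries = 0

// timeoutSeconds converts the configured duration into whole seconds.
func (c runConfig) timeoutSeconds() (int64, error) {
	if c.Timeout == "" {
		return 0, nil
	}
	d, err := time.ParseDuration(c.Timeout)
	if err != nil {
		return 0, fmt.Errorf("timeout: %w", err)
	}
	return int64(d.Seconds()), nil
}

// maxRetries returns the configured retry count or the default when unset.
func (c runConfig) maxRetries() int {
	if c.MaxRetries == nil {
		return defaultMaxRetries
	}
	return *c.MaxRetries
}

// usesRequirementsFile reports whether dependencies come from a file
// rather than from the inline list.
func (c runConfig) usesRequirementsFile() bool {
	return c.RequirementsFile != ""
}

func main() {
	cfg := runConfig{Timeout: "90m", Dependencies: []string{"torch"}}
	secs, err := cfg.timeoutSeconds()
	if err != nil {
		panic(err)
	}
	fmt.Println(secs, cfg.maxRetries(), cfg.usesRequirementsFile())
}
```

Keeping these as methods on the config type means the launch path never sees a second, partially duplicated struct, which is the trade-off the commit message describes.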
Wire `air run`'s RunE to load and structurally validate the YAML config, and implement --dry-run (validate without submitting). The non-dry-run submission path returns "not implemented" until the submit phase lands; --override is rejected with a clear error since the override pipeline is not ported yet. Drop `run` from the not-implemented stub test now that it does real work. Co-authored-by: Isaac
Resolve the workspace context air run needs before uploading and submitting: the current user, the per-user workspace home (with env override), a unique cli_launch directory for a run's artifacts, the MLflow experiment path, and ensuring a custom experiment_directory exists (created if missing, matching the CLI's convention for its other artifact directories). Co-authored-by: Isaac
Assemble and upload the launch artifacts for a run into its cli_launch directory: the merged config (training_config.yaml, 1 MB cap), the inline command as command.sh, requirements.yaml (from a file or synthesized from inline dependencies), and hyperparameters.yaml. buildArtifacts is pure; the upload writes through a narrow fileWriter (a workspace filer in production). A TODO(DABs) marks the client-side upload path as a future candidate for reuse of DABs' file-staging (libs/sync / bundle deploy). Co-authored-by: Isaac
Wire `air run` end to end: ensure the experiment directory, upload launch artifacts, build the native ai_runtime_task payload, and submit it via a direct POST to /api/2.2/jobs/runs/submit. The ai_runtime_task routes straight to the training service with no genai-mapi forwarding — the MAPI path is deprecated. The proto is lean: env vars and secrets are staged as co-located env_vars.json / secret_env_vars.json workspace files rather than inline, and requirements / hyperparameters are derived server-side from the command directory. The non-dry-run path resolves the workspace context, uploads, submits, and prints the run id + dashboard URL. usage_policy_name, code_source snapshots, and --watch are rejected with clear errors until their phases land. environment.docker_image is accepted by the schema as scaffolding but not conveyed (the native path has no docker field). Co-authored-by: Isaac
c6edcc2 to a5d851b
Changes
Implements the `air run` happy path on top of the config schema (#5657), submitting a one-time training run through the Jobs API. Five commits, one per phase:
- [...] `--dry-run` validates offline (no workspace/auth) and returns; `--override` / `--watch` are rejected for now with clear errors (to be ported in a future PR).
- [...] `experiment_directory` exists.
- [...] `env_vars.json` / `secret_env_vars.json`, and hyperparameters.yaml into the launch dir via a workspace filer.
- [...] `ai_runtime_task` payload and `POST /api/2.2/jobs/runs/submit` directly, then print the run id + dashboard URL (or a JSON envelope).
Submission uses the native `ai_runtime_task` task (BYOT task type). It talks only to the Jobs API, which internally routes to the training service endpoint, and has no genai-mapi forwarding (the MAPI path is deprecated). The task isn't modeled by the typed Go SDK, so the payload is a custom struct posted to the raw endpoint. The proto is lean: env vars and secrets ship as co-located `env_vars.json` / `secret_env_vars.json` files rather than inline, and `requirements.yaml` / `hyperparameters.yaml` are derived server-side from the command directory.
Deferred, with explicit "not yet supported" errors (no silent drops):
- `code_source` snapshot packaging, `--watch` log streaming, and `usage_policy_name`.
- `environment.docker_image` is accepted by the schema as scaffolding but not conveyed in the payload (the native path has no docker field).
- `node_pool_id` / `pool_name` / `priority` remain dropped (the new AIR CLI does not support pool placement).
Why
`air run` is the core of the migration for AIR CLI. Splitting it into per-phase commits keeps each reviewable in isolation, and stacking on the schema PR keeps that PR focused. Regarding some specific decisions:
- `--dry-run` is decoupled from auth. It validates the config locally and returns before any workspace call, so config validation works fully offline (matching the Python CLI). Only actual submission requires an authenticated workspace client.
Tests
`go test ./experimental/air/...`, `go test ./acceptance -run TestAccept/experimental/air`, and `./task lint-q` all pass.
Manual verification tests (all pass):
- `--watch` and `--override` return an informative error message (since they are not supported yet, but are valid flags)
How to test locally for manual verification:
Checkout & build:
    git fetch origin
    git checkout air-integration-m2-3   # this PR (stacked on air-integration-m2-2)
    ./task build
Sample configs:
Automated tests
Dry run:
    ./cli experimental air run -f /tmp/min.yaml --dry-run   # note that this command will, in the final version, be databricks experimental air run
    ./cli experimental air run -f /tmp/full.yaml --dry-run
    ./cli experimental air run -f /tmp/min.yaml --dry-run -o json
Actual run submission:
Unsupported flags (asserting that error is thrown):
Validation errors with field-specific messages (exit 1, offline):