Skip to content

Fix run_duration calculation in airflow dags test by decoupling start_date#68837

Open
sejal-gupta-ksolves wants to merge 1 commit into
apache:mainfrom
sejal-gupta-ksolves:fix/task-sdk-dag-test-start-date
Open

Fix run_duration calculation in airflow dags test by decoupling start_date#68837
sejal-gupta-ksolves wants to merge 1 commit into
apache:mainfrom
sejal-gupta-ksolves:fix/task-sdk-dag-test-start-date

Conversation

@sejal-gupta-ksolves

Copy link
Copy Markdown
Contributor

closes: #66127

Description

Problem:
When running local execution validation using DAG.test(logical_date=...), the test harness erroneously anchors the DagRun.start_date to the historical scheduled execution boundary (logical_date or run_after). Because start_date is overridden with a past date instead of the actual wall-clock execution time, downstream calculations for run_duration (which relies on subtracting start_date from end_date) yield completely inaccurate and inflated durations. This breaks isolated performance tuning metrics and runtime analytics during local development testing.

Solution:
This fix explicitly decouples the physical execution clock from the logical scheduling parameters within the DAG.test() orchestration pathway inside the task-sdk.

  1. Decouple Initialization Dates: In task-sdk/src/airflow/sdk/definitions/dag.py, a dedicated run_start_date is introduced using timezone.utcnow(). This captures the exact time the test actually begins executing, regardless of how far in the past the targeted logical_date is set.
  2. Assign Correct Execution Parameters: The get_or_create_dagrun caller is updated to cleanly map the parameters:
    • logical_date retains the historical scheduling context requested by the user.
    • start_date uses the newly isolated run_start_date execution timestamp.
  3. Add Automated Regression Guard: A fully mock-isolated unit test (test_dag_test_runtime_start_date_decoupled_from_logical_date) is implemented in task-sdk/tests/task_sdk/definitions/test_dag.py. It freezes the execution environment using time_machine and asserts that start_date anchors to the frozen wall-clock matrix while the logical_date accurately tracks the historical context parameter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

airflow dags test sets DagRun.start_date to logical_date causing incorrect run_duration

1 participant