Fix CloudWatch remote logging for ephemeral lifecycle executor by jason810496 · Pull Request #68779 · apache/airflow

jason810496 · 2026-06-20T05:26:46Z

Why

While trying to set up CloudWatch remote logging in #68709 to persist logs in real time, I encountered the same errors as the issues listed above.

The root cause I found is the same one that #66475 (comment) pointed out: the configure_logging -> dictConfig -> _clearExistingHandlers call chain shuts down the watchtower handler.
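The effect described above can be reproduced with the standard library alone. The following is a minimal sketch, not the Airflow or watchtower code: `TrackingHandler` is a hypothetical stand-in for the watchtower handler, and the only behaviour relied on is that a non-incremental `logging.config.dictConfig` call closes every handler that has been instantiated so far, even one not attached to any logger.

```python
import logging
import logging.config


class TrackingHandler(logging.Handler):
    """Hypothetical stand-in for the watchtower handler; records whether close() ran."""

    def __init__(self):
        super().__init__()  # registers the handler in logging's internal handler list
        self.was_closed = False

    def emit(self, record):
        pass  # a real handler would ship the record to CloudWatch here

    def close(self):
        self.was_closed = True
        super().close()


handler = TrackingHandler()
closed_before = handler.was_closed

# A non-incremental dictConfig resets logging and closes all existing handlers.
logging.config.dictConfig({"version": 1, "disable_existing_loggers": False})
closed_after = handler.was_closed

print(closed_before, closed_after)
```

The handler is closed as a side effect of the configuration call, which is the situation the remote logger has to recover from.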

How

I took a different approach from #66633: instead of configuring the processors after the dictConfig call, we can make the CloudWatch remote logger itself self-healing by creating a fresh instance if the previous instance was shut down by the dictConfig call, while still preserving the .close semantics by guarding with the _closed state.

What

Fix the lifecycle issue of CloudWatch remote logging and verify it with the breeze k8s system test, using provider-only changes without touching the Task SDK.

jason810496 · 2026-06-20T06:10:49Z

cc @sarvesh371, @seanghaeli Could you verify this patch against your setup when you have a moment? #66633 likely won't make the 3.3 release (we're close to the dev freeze for 3.3), so we might release this provider-only patch first. Thanks.

The streaming CloudWatch handler is rebuilt whenever it reports shutting_down, so logs survive configure_logging() closing it. But shutting_down alone cannot tell a mid-task close apart from genuine teardown, so a record arriving after teardown would spin up an orphan handler and its background queue thread that nobody flushes or closes.

The supervisor lifecycle makes the two cases distinguishable in time:

1. configure_logging() builds the handler via remote.processors (processors does `_ = self.handler`), registering it in logging._handlerList.
2. The same call then runs dictConfig, whose non-incremental reset closes that handler -> watchtower sets shutting_down=True.
3. Child log records stream through proc -> self.handler, which sees shutting_down and rebuilds. This is the case we must keep working.
4. At the last possible moment _upload_logs() -> upload() -> close() flushes; nothing logs after this.

shutting_down is watchtower's flag set by dictConfig (step 2); the new _closed flag is ours, set only by close() (step 4). dictConfig never touches _closed, so the rebuild in step 3 still fires, while a late record after step 4 keeps the closed handler instead of orphaning a new one.

close() on the outer CloudwatchTaskHandler now closes the handler the IO is currently using rather than the reference captured in set_context(), which dictConfig may have closed and the IO since rebuilt.
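The interplay of the two flags can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code from this PR: `FakeStreamHandler` and `SelfHealingIO` are hypothetical names, and the fake handler only mimics the `shutting_down` attribute that the commit message attributes to watchtower.

```python
class FakeStreamHandler:
    """Hypothetical stand-in for the watchtower handler (assumption, not the real class)."""

    def __init__(self):
        self.shutting_down = False
        self.records = []

    def emit(self, message):
        self.records.append(message)

    def close(self):
        # Mirrors the behaviour described above: closing sets shutting_down.
        self.shutting_down = True


class SelfHealingIO:
    """Hypothetical remote-log IO that rebuilds its handler unless it was closed on purpose."""

    def __init__(self):
        self._handler = None
        self._closed = False  # our flag, set only by close()

    @property
    def handler(self):
        if self._handler is None:
            self._handler = FakeStreamHandler()
        elif self._handler.shutting_down and not self._closed:
            # Closed by someone else (e.g. dictConfig) mid-task: rebuild.
            self._handler = FakeStreamHandler()
        # After our own close(), keep returning the closed handler.
        return self._handler

    def close(self):
        self._closed = True
        if self._handler is not None:
            self._handler.close()


io = SelfHealingIO()
first = io.handler             # step 1: handler built eagerly
first.close()                  # step 2: simulates dictConfig closing it
second = io.handler            # step 3: rebuilt because _closed is still False
second.emit("task log line")
io.close()                     # step 4: genuine teardown
late = io.handler              # late record: no orphan handler is created
```

In this sketch `second` is a new object while `late` is the same object as `second`, which is the distinction between step 3 and a record arriving after step 4.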

ferruzzi

I haven't played with CloudWatch much, but I left some style nitpicks. Also, Claude loves to over-comment code; you may want to clean some of those up.

vincbeck · 2026-06-23T14:12:25Z

        # The handler MUST be initted here, before the processor is actually used to log anything.
        # Otherwise, logging that occurs during the creation of the handler can create infinite loops.
-        _handler = self.handler
+        _ = self.handler


Do we need to keep this line?

This forces the handler property to be created and cached at this point, which is often called vivification. Yes, it is needed, as per the comment.

A comment was even there to explain, my bad!
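The vivification discussed in this thread can be illustrated with a small sketch. It assumes the handler is exposed through `functools.cached_property`; the class `RemoteIO`, the `created` counter, and the shape of `proc` are hypothetical and only serve to show when the handler is built.

```python
from functools import cached_property


class RemoteIO:
    """Hypothetical IO object; only the eager-initialisation pattern matters here."""

    def __init__(self):
        self.created = 0

    @cached_property
    def handler(self):
        # Runs once; the result is then stored in the instance __dict__.
        self.created += 1
        return object()

    @property
    def processors(self):
        # Touch the property now so the handler exists before any record is processed.
        _ = self.handler

        def proc(event):
            # By the time this runs, self.handler is a plain cached lookup.
            return self.handler, event

        return (proc,)


io = RemoteIO()
built_before = "handler" in vars(io)
procs = io.processors
built_after = "handler" in vars(io)
handler_used, event = procs[0]({"event": "hello"})
```

Accessing `processors` is enough to build the handler, so anything logged during handler construction cannot re-enter a half-built processor.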

Fix Cloudwatch remote logging for ephemeral lifecycle executor

ee56865

jason810496 requested a review from o-nikolas as a code owner June 20, 2026 05:26

boring-cyborg bot added the area:logging, area:providers, and provider:amazon (AWS/Amazon - related issues) labels Jun 20, 2026

jason810496 self-assigned this Jun 20, 2026

jason810496 mentioned this pull request Jun 20, 2026

Fix: CloudWatch/Watchtower logs dropped in Task SDK due to handler lifetime bugs #66633

Open

jason810496 requested review from ashb and ferruzzi June 20, 2026 06:07

jason810496 force-pushed the fix/cloudwatch/remote-logging-k8s-executor branch from a92793f to f28925a Compare June 20, 2026 06:18

ferruzzi requested a review from vincbeck June 22, 2026 19:51

ferruzzi reviewed Jun 22, 2026

Promote CloudWatch handler lifecycle notes to docstrings

0331916

vincbeck reviewed Jun 23, 2026

jason810496 wants to merge 3 commits into apache:main from jason810496:fix/cloudwatch/remote-logging-k8s-executor

4 participants