feat(lambda-runtime): log non-2xx Lambda Runtime API responses with status and body by darklight3it · Pull Request #1105 · aws/aws-lambda-rust-runtime

darklight3it · 2026-02-23T23:23:37Z

Summary

When the Lambda Runtime API returns a non-2xx response (e.g. HTTP 410 with {"errorMessage":"Invoke timeout","errorType":"InvokeTimeout"}), the previous implementation silently discarded the response body. Customers had no visibility into why their handler response was rejected.

Solution

Read and log the response body for non-2xx responses from the Lambda Runtime API.

The main challenge is that RuntimeApiClientFuture is a pub enum without #[non_exhaustive], so adding a new variant would be a breaking change for any downstream code that exhaustively matches on it. To avoid this, we instead spawn a background task via tokio::task::spawn to read the body asynchronously without modifying the enum's public surface.

Changes

RuntimeApiClientService now captures a tokio::runtime::Handle at construction time via Handle::try_current(), with a fallback minimal current-thread runtime for non-tokio contexts (e.g. tests)
On non-2xx responses, a background task is spawned to collect and log the response body including the HTTP status code
Added a log_or_print! macro in lib.rs that dispatches to tracing::error! when a tracing dispatcher is set, and falls back to eprintln! otherwise — consistent with the existing codebase pattern
Added unit tests covering 2xx success, 410 InvokeTimeout, and inner service errors, using httpmock

Example log output

ERROR lambda_runtime::layers::api_client: Lambda Runtime API returned non-200 response status=410 body={"errorMessage":"Invoke timeout","errorType":"InvokeTimeout"}

jlizen · 2026-02-24T00:18:29Z

lambda-runtime/src/layers/api_client.rs

-    First(#[pin] F, Arc<Client>),
-    Second(#[pin] BoxFuture<'static, Result<http::Response<Incoming>, BoxError>>),
+#[pin_project(project = RuntimeApiClientFutureStateProj)]
+pub enum RuntimeApiClientFutureState<F> {


This will be a breaking change. It is an internal type, that probably nobody is using, but it is part of our public API.

If we did want to go this route, I'd suggest making a new one, and marking the old one as deprecated. And cutting over all usage.

Given that probably nobody is using it, the new type is fairly painless. I don't mind that option.

That would only require a minor version bump (the deprecated old enum).

It wouldn't be breaking if we had tagged it #[non_exhaustive], but we missed this one.

Definitely if we make a new type, we should tag it non exhaustive :)

sorry I was already fixing that. It's still RuntimeApiClientFutureState. The CR was still a draft.

we changed the internals ase we said in the other CR. Cannot make it pub (crate) from api_client but still it isn't exposed in mod.rs

I discussed this with you and rcoh@ offline -

Technically this is a breaking change, whether we change the innards of RuntimeApiClientFuture or substitute it for a different enum. The type is accessible as an associated type on the RuntimeApiClientService which is publicly visible.

None of our internal machinery relies on this specific type surface - we are just polling the service that the runtime api client exposes. I'm pretty sure whoever implemented this just didn't realize that pub enum exposed all of its variants and field contents.

It would generally be risky to try to deploy something in the runtime api client, that the lambda-runtime needs to be upgraded in lockstep to, else they will have compile errors. And then that's fragile to do except across a breaking version boundary.

BUT - we aren't in that situation, the only people that would be impacted would be people that built their own higher level runtime on top of our own, or otherwise are doing things with the innards of the runtime client.

I've confirmed via code search that the only cases of RuntimeApiClientFuture being used are forks of the api client.

So, with that in mind, I think this is probably a fairly low risk change to make.

And then, the lowest risk change available would be to just keep the current future name, since then anybody that is naming the future as part of a tower service stack, wouldn't be broken.

And with the future, it is safer to add a new field to the existing variant, rather than add a new variant. Since that won't break match statements.

Unfortunately, that doesn't get us out of this mess for next time, but it is the least risky.

That said - what field to add - well, we already have the contents we want inside the RuntimeApiClientFuture, the issue is that we can't poll it to completion from inside the runtime (specifically, the body field of the http::Response future stored inside Second)

For that, we really would prefer a third state transition which is riskier than a new field .

I guess, we could also just change the second variant's contents to an enum (which is clearly sillier than a third variant, but less breaking-prone maybe?)

Maybe instead we could do something hacky like, check the status code, and bubble up the inner future inside of the BoxError, and downcast the BoxError from inside our managing runtime to decide whether to finish polling the body?

And then, another option we discussed was just storing a client current thread runtime or something to drive the future, but that's pretty awful... but we don't currently require that this be run inside a tokio runtime, so i don't think we can rely on tokio::spawn.

I guess we could also do something with a sync thread and block_on, but that is pretty terrible :)

I went with the tokio::spawn idea, the bubbling up of the error required a changing in then RuntimeApiClientFuture signature that while respecting the current enum and variant structure would have changed the signature constituting a breaking change.

This seems to be the only way for this to work without any signature change. The integration tests are working and the test I added are working too.

…useful for the customer for timeouts

jlizen

I think we have something workable here. I'll ping my colleague to take a quick peek as well for landmine avoidance.

Some changes needed (assuming he doesn't suggest a different route):

we need to pass the tokio runtime handle through to the the future so that it doesn't panic on being awaited in non-tokio (one option is a static oncelock holding it)
probably we should reflect against the logged output in our test

jlizen · 2026-02-24T16:49:53Z

lambda-runtime/src/layers/api_client.rs

+            Err(_) => {
+                // Fallback: build a minimal current-thread runtime and keep it alive alongside
+                // the service. This should only happen outside of a tokio context (e.g. some tests).
+                let rt = Arc::new(


Is there a reason that the rt needs to be inside an arc? It is stored inside a struct that doesn't have Clone anyway

jlizen · 2026-02-24T16:54:10Z

lambda-runtime/src/layers/api_client.rs

+                        let status = resp.status();
+                        // Spawn a background task to read and log the error body without
+                        // blocking the current poll or requiring a new enum variant.
+                        Handle::current().spawn(async move {


This will panic if the future is awaited outside of a tokio runtime. It isn't actually using the handle we store on the client.

That said, there isn't a GREAT way to pass through the runtime handle to the future without changing the public signature, since we are stuck with an enum with wide open variants and inner fields. We would essentially need to add the handle to both the first and second, and have first plumb it through to second. IE add a field to both variants.

The main non breaking way I can think of would be to store the runtime in a static OnceLock (which you initialize inside RuntimeApiClientService::new(), and it stores the runtime and returns the handle via get_or_init. So nothing stored on the client itself anymore.

At that point, you won't need to carry any data in the future. And the runtime is something of a global resource, so it's not too strange..?

One thing we might need to do would be, add a check to see if a caller-provided tokio runtime is still alive, since it might shut down early and cause a panic? Not sure how much we need to guard against that... Will get a second opinion

is the only thing you need to do here log the message? Rather than doing it this way, can you send the resp over a channel to someone else to read it out and print it?

One "dumb" option is to use the response.extensions_mut() to get the channel (or runtime handle) in there.

That avoids the hazard of attempting to read it here from the thread local.

Yes we only need to log the message.

One concern is, the place we would want to store a runtime handle or channel receiver in order to write into response extensions, is the underlying runtime api client, which is also wide open as pub / non-exhaustive.

So it wouldn't necessarily sidestep the need for static state.

A nicer option might be, instead use a tokio::task_local! at the point where we are spawning the tokio tasks, to inject a mpsc::Sender. And then, we can listen on it in a tokio::select! loop alongside its work loop.

That reduces the hazard of of the thread local reads, right @rcoh ? It also has the side benefit of allowing us to enrich the logs with per-worker context, since we have a worker id span instrumented there.

And then, inside the RuntimeApiClientFuture, where we read the response from second, we can (perhaps behind cfg(feature="concurrency-tokio") check if LocalKey::try_get returns something, and if yes, write the response to the channel to let the worker-managed channel receive loop await the body and write out its log?

I guess we could also just use tokio::runtime::Handle::current() inside the lambda_runtime_api_client gated behind tokio_concurrency and drill that feature down? IE use .map_ok on the request future, get the handle, and shove it in response extensions, and then have the RuntimeApiClientFuture check extensions and spawn the task.

Feels more infectious though than just task_locals, and it still has some weird edge cases if anybody is instantiating the lambda_runtime_api_client::Client directly.

jlizen · 2026-02-24T16:54:52Z

lambda-runtime/src/layers/api_client.rs

+                                }
+                            }
+                        });
+                        break Ok(());


why are these breaks needed? i would have expected we can just have eg Ok(_) => Ok(())

jlizen · 2026-02-24T16:56:03Z

lambda-runtime/src/layers/api_client.rs

+                        break Ok(());
+                    }
+                    Ok(_) => break Ok(()),
+                    Err(err) => break Err(err),


just to confirm, we are already logging the connection related error higher up? It looked like the C++ impl was logging them all at the same site (though it's fine if we handle differently, with the same ultimate logs)

jlizen · 2026-02-24T16:56:49Z

lambda-runtime/src/lib.rs

 pub use tower::{self, service_fn, Service};

+/// Logs using tracing `error!` if a dispatcher is set, otherwise falls back to `eprintln!`.
+macro_rules! log_or_print {


Cute, can we cut over this usage as well?

jlizen · 2026-02-24T16:57:34Z

lambda-runtime/src/layers/api_client.rs

+    }
+
+    #[tokio::test]
+    async fn test_invoke_timeout_410_is_logged_and_succeeds() {


This isn't actually asserting that anything is logged, we probably want to use #[traced_test] to reflect against log output

I know. I can add this.

darklight3it force-pushed the dmelfi/timeout-fixup branch from b17ead3 to 0120ef0 Compare February 24, 2026 00:17

jlizen reviewed Feb 24, 2026

View reviewed changes

darklight3it force-pushed the dmelfi/timeout-fixup branch from 0120ef0 to ef4ded0 Compare February 24, 2026 12:15

darklight3it marked this pull request as ready for review February 24, 2026 12:29

darklight3it force-pushed the dmelfi/timeout-fixup branch from ef4ded0 to 0b88c4b Compare February 24, 2026 12:38

feat: ⚡ Improved Logging when Lambda Runtime Api fails, this will be …

8e6dc81

…useful for the customer for timeouts

darklight3it force-pushed the dmelfi/timeout-fixup branch from 0b88c4b to 8e6dc81 Compare February 24, 2026 12:51

darklight3it mentioned this pull request Feb 24, 2026

Log timeouts when DP responds with timedout #1091

Closed

4 tasks

darklight3it requested review from jlizen and maxday February 24, 2026 12:55

darklight3it self-assigned this Feb 24, 2026

jlizen requested changes Feb 24, 2026

View reviewed changes

Comments

Conversation

darklight3it commented Feb 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Solution

Changes

Example log output

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

darklight3it Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jlizen Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jlizen Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jlizen left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jlizen Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jlizen Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jlizen Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

darklight3it commented Feb 23, 2026 •

edited

Loading

darklight3it Feb 24, 2026 •

edited

Loading

jlizen Feb 24, 2026 •

edited

Loading

jlizen Feb 24, 2026 •

edited

Loading

jlizen Feb 24, 2026 •

edited

Loading

jlizen Feb 24, 2026 •

edited

Loading

jlizen Feb 24, 2026 •

edited

Loading