feat: add support for running on AWS Lambda managed instance types #2083
herin049 wants to merge 8 commits into open-telemetry:main
Hi @herin049, For the Python SDK related changes, as far as I understood from your explanations, Because otherwise (if
Hi @serkan-ozal, thanks for the review. Yes, your understanding is correct. To reiterate, here is what I assume happens internally for AWS Lambda managed instances:
I can't find any documentation on this except with the docs stating that managed lambda instances can serve many requests concurrently, and the only way to do this while achieving true parallelism with Python is to use multiple processes. Regardless, this change will ALWAYS be safe to make because even for standard lambda instances, the behavior will be identical. That is, running
Yes, I know that using a wrapper handler which delegates to the user handler is a very common approach. But while asking to verify this behavioral change with the However, even though In addition to the points above, for Python, instead of wrapper handler, one other approach would be using
Thanks @serkan-ozal, I see where you are coming from now. To limit the scope of these changes I've just focused on the collector level changes and adding support for Python for now. If you'd like, I can create an issue for verifying/making changes for all of the supported Lambda runtimes for Lambda managed instances. I am not as familiar with how auto instrumentation works for the other runtimes, but I can certainly look into this more and make any required changes based on my findings in order to support managed instances. In either case, the collector changes I have made in this PR will not change even if there are substantial changes to the auto instrumentation logic in some runtimes. With regard to your concern with instrumenting the original parent process, I don't think this is a concern. I have not observed any irregular spans being reported even when enabling all of the auto instrumentation libraries. From what I have found so far, it seems like the auto instrumentation wrapper command is not working properly for the worker Python processes because somehow The reason I made the changes to the wrapper script the way I did is that they are relatively minimal and backwards compatible, ensuring that the behavior is the same as the previous wrapper script.
@herin049 I think it is better to limit the scope of this PR to only changes related to the collector. For the SDK related changes, first, I would like to understand the behavior for the main process and spawned processes (I will be looking into it too) when AWS Lambda managed instance is used. So, my take is please create issue(s) for the SDK related changes and remove the Python related changes from this PR.
Sounds good to me @serkan-ozal. I have reverted the Python SDK related changes in this PR. I can work on a follow-up PR to update all of the SDKs where necessary to support managed instance types, and do some additional research myself.
Yes I agree!
2b22ef9 to 114e673
I am OK with the changes here. @herin049, can you please resolve the conflicts so that we can merge this one?
Awesome, will resolve the conflicts shortly.
114e673 to 904b61f
@wpessers and @serkan-ozal made a trivial cleanup change after reviewing the code again, should be good to go |
Pull request overview
This PR updates the Lambda layer/collector to support AWS Lambda managed instance types by branching behavior based on AWS_LAMBDA_INITIALIZATION_TYPE (notably: no Invoke subscription, no platform.runtimeDone wait path, and no decouple processor insertion), and adjusts telemetry parsing to tolerate differences in managed-instance telemetry events.
Changes:
- Add `collector/lambdalifecycle` module to parse `AWS_LAMBDA_INITIALIZATION_TYPE` into a typed `InitType`.
- Update lifecycle manager + extension client registration to subscribe only to `SHUTDOWN` (and skip Telemetry API listener) for managed instances.
- Update telemetry API receiver to better handle missing/partial report fields and request-id association, and adjust tests accordingly.
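As a rough illustration of the first bullet, parsing the environment variable into a typed value might look like the sketch below. Only `InitType`, `InitTypeFromEnv`, `LambdaManagedInstances`, and the `lambda-managed-instances` value appear in this PR; the other constant names, `ParseInitType`, and the `Unknown` fallback are assumptions made for the example, not the PR's actual code.

```go
package main

import (
	"fmt"
	"os"
)

// InitType is a typed view of AWS_LAMBDA_INITIALIZATION_TYPE.
// Constant names other than LambdaManagedInstances are hypothetical.
type InitType int

const (
	Unknown InitType = iota
	OnDemand
	ProvisionedConcurrency
	SnapStart
	LambdaManagedInstances
)

// InitTypeEnvVar is the environment variable Lambda sets for the init type.
const InitTypeEnvVar = "AWS_LAMBDA_INITIALIZATION_TYPE"

// names maps each known InitType to its environment variable value.
var names = map[InitType]string{
	OnDemand:               "on-demand",
	ProvisionedConcurrency: "provisioned-concurrency",
	SnapStart:              "snap-start",
	LambdaManagedInstances: "lambda-managed-instances",
}

// ParseInitType converts a raw value into an InitType, falling back to
// Unknown for empty or unrecognized input.
func ParseInitType(s string) InitType {
	for t, name := range names {
		if name == s {
			return t
		}
	}
	return Unknown
}

// InitTypeFromEnv reads the named environment variable and parses it.
func InitTypeFromEnv(envVar string) InitType {
	return ParseInitType(os.Getenv(envVar))
}

// String returns the environment variable value, or "unknown".
func (t InitType) String() string {
	if name, ok := names[t]; ok {
		return name
	}
	return "unknown"
}

func main() {
	fmt.Println(InitTypeFromEnv(InitTypeEnvVar))
}
```

Falling back to an explicit `Unknown` value rather than an error keeps callers on the standard code path when the variable is unset, which matches the PR's stated goal of leaving non-managed behavior unchanged.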
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| collector/receiver/telemetryapireceiver/receiver_test.go | Updates expected platform report log formatting behavior. |
| collector/receiver/telemetryapireceiver/receiver.go | Adds init-type awareness, improves report formatting tolerance, and adjusts request-id handling for managed instances. |
| collector/receiver/telemetryapireceiver/go.mod | Adds dependency/replace wiring for the new lambdalifecycle module. |
| collector/lambdalifecycle/types_test.go | Adds unit tests for InitType parsing/string/env behavior. |
| collector/lambdalifecycle/types.go | Defines InitType enum + parsing helpers. |
| collector/lambdalifecycle/go.sum | Adds sums for lambdalifecycle module dependencies. |
| collector/lambdalifecycle/go.mod | Declares the lambdalifecycle submodule and its test dependency. |
| collector/lambdalifecycle/constants.go | Defines AWS_LAMBDA_INITIALIZATION_TYPE env var constant. |
| collector/internal/lifecycle/manager_test.go | Updates tests to use new extension client constructor signature. |
| collector/internal/lifecycle/manager.go | Branches extension event subscriptions + Telemetry API listener startup based on init type. |
| collector/internal/lifecycle/constants.go | Centralizes AWS_LAMBDA_RUNTIME_API env var name. |
| collector/internal/extensionapi/client.go | Extends NewClient/Register to accept configurable subscribed event types. |
| collector/internal/confmap/converter/decoupleafterbatchconverter/converter_test.go | Adds test ensuring decouple isn’t appended for managed instances. |
| collector/internal/confmap/converter/decoupleafterbatchconverter/converter.go | Skips decouple insertion when init type is managed instances. |
Comments suppressed due to low confidence (1)
collector/receiver/telemetryapireceiver/receiver.go:384
- In createLogs(), the current request ID is updated twice for platform.start: first via updateCurrentRequestId(requestId) and then again via direct assignment to r.currentFaasInvocationID. The direct assignment bypasses the LambdaManagedInstances guard in updateCurrentRequestId and is redundant for other init types; remove the direct assignment and rely on updateCurrentRequestId (or route all writes through the helper) so managed-instance behavior stays consistent.
    if requestId != "" {
        logRecord.Attributes().PutStr(string(semconv.FaaSInvocationIDKey), requestId)
        // If this is the first event in the invocation with a request id (i.e. the "platform.start" event),
        // set the current invocation id to this request id.
        if el.Type == string(telemetryapi.PlatformStart) {
            r.currentFaasInvocationID = requestId
        }
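The suggestion to route every write through the helper can be sketched as follows. The `receiver` struct here is a stripped-down stand-in, the `managedInstances` flag is a hypothetical simplification of the init-type check, and the rationale in the comment (no single "current" request under concurrency) is an inference from the discussion rather than something the PR states.

```go
package main

import "fmt"

// receiver is a minimal stand-in for the Telemetry API receiver state.
type receiver struct {
	managedInstances        bool // hypothetical stand-in for the init-type check
	currentFaasInvocationID string
}

// updateCurrentRequestID is the single place that writes the current
// invocation id. On managed instances it does nothing, presumably because
// concurrent invocations mean there is no single "current" request.
func (r *receiver) updateCurrentRequestID(requestID string) {
	if r.managedInstances || requestID == "" {
		return
	}
	r.currentFaasInvocationID = requestID
}

// onPlatformStart shows a call site with no direct field assignment.
func (r *receiver) onPlatformStart(requestID string) {
	r.updateCurrentRequestID(requestID)
}

func main() {
	standard := &receiver{}
	standard.onPlatformStart("req-1")
	fmt.Printf("standard: %q\n", standard.currentFaasInvocationID)

	managed := &receiver{managedInstances: true}
	managed.onPlatformStart("req-1")
	fmt.Printf("managed: %q\n", managed.currentFaasInvocationID)
}
```

With a single writer, the guard cannot be bypassed by a later direct assignment, which is the inconsistency the suppressed comment points at.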
    if r.lastPlatformStartTime != "" && el.Time != "" {
        r.lastPlatformEndTime = el.Time
        r.logger.Info(fmt.Sprintf("Init end: %s", r.lastPlatformEndTime), zap.Any("event", el))
    }
The PlatformInitReport branch updates lastPlatformEndTime but never creates the init span or clears lastPlatformStartTime/lastPlatformEndTime. If managed instances emit platform.initReport instead of platform.initRuntimeDone, this will prevent coldstart/init spans from being produced and can leave stale init timestamps in the receiver state. Consider mirroring the PlatformInitRuntimeDone handling here (createPlatformInitSpan when start/end are present, then reset the timestamps).
    }
    }
    if len(r.lastPlatformStartTime) > 0 && len(r.lastPlatformEndTime) > 0 {
        if record, ok := el.Record.(map[string]any); ok {
            if td, err := r.createPlatformInitSpan(record, r.lastPlatformStartTime, r.lastPlatformEndTime); err == nil {
                err := r.nextTraces.ConsumeTraces(context.Background(), td)
                if err == nil {
                    r.lastPlatformEndTime = ""
                    r.lastPlatformStartTime = ""
                } else {
                    r.logger.Error("error receiving traces", zap.Error(err))
                }
            }
        }
    }
    listener           *telemetryapi.Listener
    wg                 sync.WaitGroup
    lifecycleListeners []lambdalifecycle.Listener
    initType           lambdalifecycle.InitType
    }
The new manager.initType field is assigned in NewManager but never read anywhere, so it adds state without affecting behavior. Either remove the field or use it in processEvents()/other methods so it serves a purpose (e.g., for logging, guarding listener usage, or future branching).
    var extensionEvents []extensionapi.EventType
    initType := lambdalifecycle.InitTypeFromEnv(lambdalifecycle.InitTypeEnvVar)
    if initType == lambdalifecycle.LambdaManagedInstances {
        extensionEvents = []extensionapi.EventType{extensionapi.Shutdown}
    } else {
        extensionEvents = []extensionapi.EventType{extensionapi.Invoke, extensionapi.Shutdown}
    }
The Lambda managed-instances branching in NewManager (event subscription selection + skipping Telemetry API listener startup) is new behavior but isn't covered by unit tests in this package. Add a test that sets AWS_LAMBDA_INITIALIZATION_TYPE to lambda-managed-instances and verifies the extension client is registered with only SHUTDOWN and that listener/Wait + FunctionInvoked/Finished paths are not used.
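One way to make that branching testable is to pull the event selection into a small pure function and check it against a recording fake, roughly as sketched below. `eventsFor` and `fakeClient` are hypothetical names introduced for the example; in the real package this would more likely be a `testing`-based test using `t.Setenv` against the actual extension client.

```go
package main

import (
	"fmt"
	"os"
)

// EventType mirrors the Lambda Extensions API event names.
type EventType string

const (
	Invoke   EventType = "INVOKE"
	Shutdown EventType = "SHUTDOWN"
)

// eventsFor returns the events to subscribe to for a given raw init type.
// Hypothetical helper: the PR does this inline in NewManager.
func eventsFor(initType string) []EventType {
	if initType == "lambda-managed-instances" {
		return []EventType{Shutdown}
	}
	return []EventType{Invoke, Shutdown}
}

// fakeClient records what it was asked to register, standing in for the
// real extension API client.
type fakeClient struct {
	registered []EventType
}

func (c *fakeClient) Register(events []EventType) {
	c.registered = events
}

func main() {
	const envVar = "AWS_LAMBDA_INITIALIZATION_TYPE"
	if err := os.Setenv(envVar, "lambda-managed-instances"); err != nil {
		panic(err)
	}

	client := &fakeClient{}
	client.Register(eventsFor(os.Getenv(envVar)))

	// On managed instances only SHUTDOWN should be registered.
	if len(client.registered) != 1 || client.registered[0] != Shutdown {
		panic(fmt.Sprintf("unexpected events: %v", client.registered))
	}
	fmt.Println("registered:", client.registered)
}
```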
Adds support for running on AWS Lambda managed instances.
Lambda managed instances differ from standard Lambda functions in several areas, but the differences most relevant for the OpenTelemetry collector layer are:
- … `Invoke` event type.
- … `platform.runtimeDone` events
- … `decouple` processor)

For more information see: https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances.html
In order to accommodate the changes above, the following changes have been made to the layer if the extension determines the initialization type is `lambda-managed-instances` using the `AWS_LAMBDA_INITIALIZATION_TYPE` environment variable:
- The extension no longer subscribes to the `Invoke` event type, and it no longer subscribes to the telemetry API to listen for the `platform.runtimeDone` event.
- The `decouple` processor is no longer added to any pipelines.
- The `FunctionInvoked()` and `FunctionFinished()` lifecycle methods are no longer invoked for lifecycle listeners.

I have added relevant unit tests and have manually verified the implementation using the following repo: https://github.com/herin049/aws-lambda-managed which configures the layer to export signals to Grafana Cloud.
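The `decouple` change can be pictured with a simplified model of the converter that works on a plain list of processor names. This is a sketch under assumptions: `withDecouple` is a hypothetical function, and the real `decoupleafterbatchconverter` operates on the collector's confmap rather than on string slices.

```go
package main

import (
	"fmt"
	"strings"
)

// withDecouple returns the processor list for one pipeline, appending
// "decouple" when a batch processor is present and no decouple processor
// follows it. On managed instances the list is returned unchanged.
func withDecouple(processors []string, managedInstances bool) []string {
	if managedInstances {
		return processors
	}
	lastBatch, lastDecouple := -1, -1
	for i, p := range processors {
		// Component IDs look like "type" or "type/name"; compare on the type.
		switch strings.SplitN(p, "/", 2)[0] {
		case "batch":
			lastBatch = i
		case "decouple":
			lastDecouple = i
		}
	}
	if lastBatch >= 0 && lastDecouple < lastBatch {
		// Copy first so the caller's slice is never modified.
		out := make([]string, 0, len(processors)+1)
		out = append(out, processors...)
		return append(out, "decouple")
	}
	return processors
}

func main() {
	pipeline := []string{"memory_limiter", "batch"}
	fmt.Println("standard:", withDecouple(pipeline, false))
	fmt.Println("managed: ", withDecouple(pipeline, true))
}
```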