Python: Defensively strip pickle markers on internal deserialize paths by BiswajeetRay7 · Pull Request #6507 · microsoft/agent-framework

BiswajeetRay7 · 2026-06-13T23:17:32Z

Summary

This is a defense-in-depth hardening change, not a fix for a known exploit.

strip_pickle_markers() (in _serialization.py) is the control that prevents untrusted pickled / type markers from reaching pickle.loads() via deserialize_value. It is applied at the HTTP entry points today (e.g. send_hitl_response), but the internal deserialize_value() calls in _app.py — reconstructing message_data and shared_state_snapshot values — do not strip.

I traced the current entry points and did not find an attacker-reachable path to these internal sinks, so this is not a reported vulnerability. The intent is purely to remove the design's reliance on every present and future entry point remembering to sanitize: a new untrusted entry point or a refactor could otherwise silently route external data into one of these unprotected deserialize_value() calls.

Description

Wrap the two internal deserialize_value() calls in _app.py with strip_pickle_markers() (already imported in this file). Negligible runtime cost.

Honest framing: no PoC, no known exploit — defense-in-depth only.

Signed-off-by: Biswajeet Ray raybiswajeet2@gmail.com

strip_pickle_markers() is applied at the HTTP entry points today, but the internal deserialize_value() calls in _app.py do not strip. Wrapping these with strip_pickle_markers() (already imported) removes reliance on every entry point remembering to sanitize. No known exploit; defense-in-depth hardening only.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates Azure Functions agent deserialization flow to remove pickle markers before reconstructing typed objects, likely to standardize payload handling and avoid pickle-coded data.

Changes:

Strip pickle markers before deserializing executor messages.
Strip pickle markers before deserializing shared state snapshots.

            # Reconstruct message - deserialize_value restores the original typed objects
            # from the encoded data (with type markers)
-            message = deserialize_value(message_data)
+            message = deserialize_value(strip_pickle_markers(message_data))


                # Deserialize shared state values to reconstruct dataclasses/Pydantic models
                deserialized_state: dict[str, Any] = {
-                    str(k): deserialize_value(v) for k, v in shared_state_snapshot.items()
+                    str(k): deserialize_value(strip_pickle_markers(v)) for k, v in shared_state_snapshot.items()


BiswajeetRay7 · 2026-06-13T23:19:47Z

Thanks for the review. Good points — I'll update the stale comment to reflect the marker-stripping, and refactor the duplicated deserialize_value(strip_pickle_markers(...)) into a small local helper to centralize the behavior.

Signed-off-by: Biswajeet Ray <raybiswajeet2@gmail.com>

BiswajeetRay7 · 2026-06-13T23:32:07Z

Follow-up on the helper-function suggestion: on reflection I kept the two call sites inline rather than adding a _deserialize_safely() helper. With only two call sites, a module-level helper added a bit of indirection for little gain, so I opted to keep it explicit. The stale comment has been updated in a1dd6ea. Happy to refactor into a helper if you'd prefer — just let me know.

ahmedmuhsin

Thanks for thinking about this defensively. I have one concern: I think these two internal deserialize sinks are the wrong layer to strip at, and stripping here would silently null legitimate typed payloads. Details inline.

ahmedmuhsin · 2026-06-13T23:41:28Z

-            message = deserialize_value(message_data)
+            # Reconstruct message: strip untrusted pickle/type markers first
+            # (defense-in-depth), then deserialize_value restores typed objects.
+            message = deserialize_value(strip_pickle_markers(message_data))


These two internal sinks (this line and the shared_state_snapshot deserialize on line 327) are not untrusted input. message_data and the shared-state values are produced by the orchestrator itself via serialize_value, and are expected to carry __pickled__ / __type__ markers, because that is how typed objects (dataclasses, Pydantic models, Message, AgentExecutorResponse) round-trip across the JSON activity hop.

strip_pickle_markers replaces any top-level marker dict with None. A typed message or typed shared-state value is a top-level marker dict after serialize_value, so:

deserialize_value(strip_pickle_markers(<typed object>)) -> deserialize_value(None) -> None

The net effect is that any workflow passing a typed object between executors, or storing one in shared state, would have it silently nulled. Primitives survive (no markers), which is likely why a quick run looks fine, but typed payloads break. A test with a dataclass or Pydantic message plus a typed shared-state value would show this.

On the threat model: the untrusted boundaries (the initial client input and the HITL response path) are already sanitized at the entry points in #6418, which is the right place, since that external JSON should never contain markers in the first place. Stripping on the internal deserialize conflates the trusted transport format with the untrusted boundary.

One heads up: #6418 extracts this inline activity body into a shared activity.py used by both the Azure Functions and standalone hosts, so these exact lines move. Worth coordinating so any change lands in the shared engine rather than only here.

BiswajeetRay7 · 2026-06-13T23:45:39Z

You're absolutely right, and thank you for the detailed explanation. I missed that message_data and the shared-state values are produced internally by serialize_value and are expected to carry pickled / type markers — so stripping here would null legitimate typed payloads, breaking workflows that pass typed objects between executors or store them in shared state. The untrusted boundaries are already correctly sanitized at the entry points in #6418, which is the right layer.

My change is incorrect, so I'll close this PR. Thanks again for catching it and for the pointer to the shared activity.py refactor.

Copilot AI review requested due to automatic review settings June 13, 2026 23:17

BiswajeetRay7 requested a review from a team as a code owner June 13, 2026 23:17

moonbox3 added the python label Jun 13, 2026

github-actions Bot changed the title ~~Defensively strip pickle markers on internal deserialize paths~~ Python: Defensively strip pickle markers on internal deserialize paths Jun 13, 2026

Copilot AI reviewed Jun 13, 2026

View reviewed changes

Update comment to reflect marker-stripping behavior

a1dd6ea

Signed-off-by: Biswajeet Ray <raybiswajeet2@gmail.com>

ahmedmuhsin requested changes Jun 13, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Python: Defensively strip pickle markers on internal deserialize paths#6507

Python: Defensively strip pickle markers on internal deserialize paths#6507
BiswajeetRay7 wants to merge 2 commits into
microsoft:mainfrom
BiswajeetRay7:patch-1

BiswajeetRay7 commented Jun 13, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

BiswajeetRay7 commented Jun 13, 2026

Uh oh!

BiswajeetRay7 commented Jun 13, 2026

Uh oh!

ahmedmuhsin left a comment

Uh oh!

ahmedmuhsin Jun 13, 2026

Uh oh!

BiswajeetRay7 commented Jun 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

BiswajeetRay7 commented Jun 13, 2026

Summary

Description

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

BiswajeetRay7 commented Jun 13, 2026

Uh oh!

BiswajeetRay7 commented Jun 13, 2026

Uh oh!

ahmedmuhsin left a comment

Choose a reason for hiding this comment

Uh oh!

ahmedmuhsin Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

BiswajeetRay7 commented Jun 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants