Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -298,9 +298,9 @@ def executor_activity(inputData: str) -> str:
if not executor:
raise ValueError(f"Unknown executor: {captured_executor_id}")

# Reconstruct message - deserialize_value restores the original typed objects
# from the encoded data (with type markers)
message = deserialize_value(message_data)
# Reconstruct message: strip untrusted pickle/type markers first
# (defense-in-depth), then deserialize_value restores typed objects.
message = deserialize_value(strip_pickle_markers(message_data))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two internal sinks (this line and the shared_state_snapshot deserialize on line 327) are not untrusted input. message_data and the shared-state values are produced by the orchestrator itself via serialize_value, and are expected to carry __pickled__ / __type__ markers, because that is how typed objects (dataclasses, Pydantic models, Message, AgentExecutorResponse) round-trip across the JSON activity hop.

strip_pickle_markers replaces any top-level marker dict with None. A typed message or typed shared-state value is a top-level marker dict after serialize_value, so:

deserialize_value(strip_pickle_markers(<typed object>)) -> deserialize_value(None) -> None

The net effect is that any workflow passing a typed object between executors, or storing one in shared state, would have it silently nulled. Primitives survive (no markers), which is likely why a quick run looks fine, but typed payloads break. A test with a dataclass or Pydantic message plus a typed shared-state value would show this.

On the threat model: the untrusted boundaries (the initial client input and the HITL response path) are already sanitized at the entry points in #6418, which is the right place, since that external JSON should never contain markers in the first place. Stripping on the internal deserialize conflates the trusted transport format with the untrusted boundary.

One heads up: #6418 extracts this inline activity body into a shared activity.py used by both the Azure Functions and standalone hosts, so these exact lines move. Worth coordinating so any change lands in the shared engine rather than only here.


# Check if this is a HITL response message by examining source_executor_ids
is_hitl_response = any(s.startswith(SOURCE_HITL_RESPONSE) for s in source_executor_ids)
Expand All @@ -324,7 +324,7 @@ def classify_yielded_output(executor_id: str) -> YieldOutputEventType | None:

# Deserialize shared state values to reconstruct dataclasses/Pydantic models
deserialized_state: dict[str, Any] = {
str(k): deserialize_value(v) for k, v in shared_state_snapshot.items()
str(k): deserialize_value(strip_pickle_markers(v)) for k, v in shared_state_snapshot.items()
}
original_snapshot = _create_state_snapshot(deserialized_state)
shared_state.import_state(deserialized_state)
Expand Down