Summary
Nested persona behavior strings are rendered as Jinja templates during evaluation prompt construction. The affected code paths did not consistently keep that nested rendering inside a SandboxedEnvironment.
Affected code
src/google/adk/evaluation/simulation/llm_backed_user_simulator_prompts.py
src/google/adk/evaluation/simulation/per_turn_user_simulator_quality_prompts.py
Problem
Both prompt builders support nested rendering of persona behavior fields through render_string_filter.
In llm_backed_user_simulator_prompts.py, the outer prompt used SandboxedEnvironment, but nested string rendering created a new template from the persona string instead of reusing the sandboxed environment.
In per_turn_user_simulator_quality_prompts.py, nested persona rendering followed the same pattern and the outer environment was not sandboxed.
As a result, nested persona strings could evaluate Jinja expressions outside the intended sandbox boundary.
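For illustration only (this is not the project's actual code, and the persona string is a hypothetical hostile input), the problematic pattern is constructing a fresh `jinja2.Template` for the nested pass, which uses the default, unsandboxed environment:

```python
from jinja2 import Template

# Hypothetical hostile nested persona string probing interpreter internals.
persona = "{{ ''.__class__ }}"

# A fresh Template() renders in Jinja's default (unsandboxed) environment,
# so the expression evaluates and leaks Python internals into the prompt.
print(Template(persona).render())  # <class 'str'>
```

The outer sandbox never sees this render, so its protections do not apply to the nested string.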
Expected behavior
Nested persona strings should render only through a sandboxed Jinja environment, so that supported placeholders such as {{ stop_signal }} still work while unsafe attribute traversal and similar sandbox escapes are blocked.
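A minimal sketch of that boundary, assuming stock jinja2 behavior (the `stop_signal` placeholder name is taken from this report; the payloads are illustrative):

```python
from jinja2.exceptions import SecurityError
from jinja2.sandbox import SandboxedEnvironment

env = SandboxedEnvironment()

# Supported placeholders still render inside the sandbox.
print(env.from_string("Say {{ stop_signal }} when finished.")
         .render(stop_signal="STOP"))  # Say STOP when finished.

# Unsafe attribute access is neutralized: the sandbox substitutes an
# undefined value, so nothing leaks into the rendered prompt.
print(env.from_string("{{ ''.__class__ }}").render())  # empty output

# Going further with the escape attempt raises SecurityError.
try:
    env.from_string("{{ ''.__class__() }}").render()
except SecurityError as err:
    print("blocked:", err)
```

Note that the sandbox raises `SecurityError` when the unsafe value is actually used (called or indexed); bare unsafe attribute access renders as empty output rather than raising.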
Proposed fix
- Reuse SandboxedEnvironment for nested persona rendering instead of creating a fresh unsandboxed template.
- Use SandboxedEnvironment in the per-turn evaluator prompt builder as well.
- Add regression tests that verify:
  - supported nested placeholders still render
  - unsafe nested template expressions are blocked
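A sketch of the first two bullets, assuming render_string_filter is registered as a Jinja filter (the real signature and wiring in the ADK prompt builders may differ):

```python
from jinja2.sandbox import SandboxedEnvironment


def build_env() -> SandboxedEnvironment:
    """Build one sandboxed environment shared by outer and nested rendering."""
    env = SandboxedEnvironment()

    def render_string_filter(value: str, **ctx) -> str:
        # Reuse the sandboxed environment for the nested pass instead of
        # instantiating a fresh (unsandboxed) jinja2.Template.
        return env.from_string(value).render(**ctx)

    env.filters["render_string"] = render_string_filter
    return env
```

With this wiring, a nested `{{ persona | render_string(...) }}` pass inherits the same sandbox as the outer prompt, so the regression tests can assert both that supported placeholders render and that escape attempts raise SecurityError.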
Validation
I have a PR prepared that:
- switches both prompt builders to sandboxed nested rendering
- adds regression tests for allowed interpolation and blocked unsafe access
- passes tests/unittests/evaluation/simulation in a clean Linux Docker container
- confirms in a live adk web eval run that:
  - malicious nested persona templates are blocked with SecurityError
  - safe nested placeholders proceed past prompt rendering into normal model execution