FEAT Beam Search for OpenAIResponseTarget #1346
riedgar-ms wants to merge 145 commits into Azure:main from
Conversation
```python
@pytest.fixture
def mock_non_true_false_scorer():
    """Create a mock scorer that is not a true/false type"""
    scorer = MagicMock(spec=Scorer)
    scorer.get_identifier.return_value = get_mock_scorer_identifier()
    return scorer
```
The mock_non_true_false_scorer fixture defined at line 56–61 is never referenced in any test method in this file. It should either be used in a test or removed to avoid dead code.
```python
    beam_reviewer=TopKBeamReviewer(k=2, drop_chars=1),
    attack_scoring_config=scoring_config,
    num_chars_per_step=0,
)
```
The test file only covers the initialization/validation logic of BeamSearchAttack, TopKBeamReviewer, and Beam.get_grammar. There are no tests for the core execution paths: _perform_async, _propagate_beam_async, _score_beam_async, _get_target_for_beam, and _determine_attack_outcome. Other single-turn attacks like SkeletonKeyAttack and PromptSendingAttack have comprehensive execution tests that mock the target and normalizer (see tests/unit/executor/attack/single_turn/test_skeleton_key.py and test_prompt_sending.py). The _propagate_beam_async and _score_beam_async can be tested by mocking self._prompt_normalizer and the scoring infrastructure, without needing a real OpenAIResponseTarget. Similarly, _get_target_for_beam can be tested by mocking fresh_instance.
Suggested change:

```python
        )

    def test_get_target_for_beam_uses_fresh_instance(self, mock_true_false_scorer, mock_float_scale_scorer):
        """Test that _get_target_for_beam obtains a fresh target instance via fresh_instance."""
        # Arrange: create an objective target whose fresh_instance method is mocked
        objective_target = MagicMock(spec=OpenAIResponseTarget)
        fresh_target_instance = MagicMock(spec=PromptTarget)
        objective_target.fresh_instance.return_value = fresh_target_instance
        scoring_config = AttackScoringConfig(
            objective_scorer=mock_true_false_scorer,
            auxiliary_scorers=[mock_float_scale_scorer],
        )
        attack = BeamSearchAttack(
            objective_target=objective_target,
            beam_reviewer=TopKBeamReviewer(k=2, drop_chars=1),
            attack_scoring_config=scoring_config,
        )
        # Use a simple Beam instance; the specific content is not important for this interaction
        beam = Beam(prompt="test", score=0.0)
        # Act: call the internal helper
        target_for_beam = attack._get_target_for_beam(beam)
        # Assert: fresh_instance was used and its return value was propagated
        objective_target.fresh_instance.assert_called_once()
        assert target_for_beam is fresh_target_instance
```
```python
        PREFIX: "{prefix}"
        CONTINUATION: /.{{0,{n_chars}}}/
        """
        prefix = self.text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
```
The get_grammar method escapes \\, ", and \n from the beam text before embedding it in the Lark grammar string. However, \r (carriage return, U+000D) is not escaped. Since '\r'.isprintable() returns False in Python, any trailing \r characters will be pruned by the trailing non-printable pruning loop. However, an embedded \r that appears before printable characters in the beam text will NOT be pruned and will be inserted unescaped into the grammar string literal, causing a Lark parse error at runtime.
If the model ever produces a response containing a Windows-style \r\n line ending, the \r would be embedded unescaped in the grammar. The escaping should be extended to handle \r (replacing it with \\r), and similarly any other non-printable characters that could appear in an intermediate position should be stripped or escaped before the trailing-prune step.
Suggested change:

```python
        escaped_parts = []
        for ch in self.text:
            if ch == "\\":
                escaped_parts.append("\\\\")
            elif ch == '"':
                escaped_parts.append('\\"')
            elif ch == "\n":
                escaped_parts.append("\\n")
            elif ch == "\r":
                escaped_parts.append("\\r")
            elif ch.isprintable():
                escaped_parts.append(ch)
        prefix = "".join(escaped_parts)
```
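The hazard can be demonstrated in isolation. Below is a minimal sketch (a standalone reimplementation for illustration, not PyRIT's actual helper) of the escaping, showing that `\r` is non-printable and therefore survives a naive printable-only filter only if it is escaped explicitly:

```python
def escape_for_grammar(text: str) -> str:
    """Escape a beam prefix for embedding in a double-quoted grammar string literal."""
    escaped = []
    for ch in text:
        if ch == "\\":
            escaped.append("\\\\")
        elif ch == '"':
            escaped.append('\\"')
        elif ch == "\n":
            escaped.append("\\n")
        elif ch == "\r":
            escaped.append("\\r")
        elif ch.isprintable():
            # Printable characters pass through unchanged.
            escaped.append(ch)
        # Any other non-printable character is stripped.
    return "".join(escaped)

# '\r' is not printable, so the old replace-based code would embed it raw.
assert not "\r".isprintable()
assert escape_for_grammar('line1\r\nsay "hi"') == 'line1\\r\\nsay \\"hi\\"'
```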
```python
# Store the prepended conversation configuration
self._prepended_conversation_config = prepended_conversation_config
self._start_context: Optional[SingleTurnAttackContext[Any]] = None
```
BeamSearchAttack does not override get_attack_scoring_config(), while the closely related PromptSendingAttack (which also stores _objective_scorer and _auxiliary_scorers) does override it. This means the attack's objective scorer and auxiliary scorers are not included in the attack's ComponentIdentifier, making it harder to distinguish between BeamSearchAttack instances configured with different scorers. For consistency with PromptSendingAttack (at pyrit/executor/attack/single_turn/prompt_sending.py:121-131), BeamSearchAttack should override get_attack_scoring_config() to return an AttackScoringConfig built from self._objective_scorer and self._auxiliary_scorers.
Suggested change:

```python
def get_attack_scoring_config(self) -> AttackScoringConfig:
    """
    Get the attack scoring configuration for this beam search attack.

    Returns:
        AttackScoringConfig: The scoring configuration with objective and auxiliary scorers.
    """
    return AttackScoringConfig(
        objective_scorer=self._objective_scorer,
        auxiliary_scorers=self._auxiliary_scorers,
    )
```
```python
if beam.objective_score and beam.objective_score.get_value():
    # We have a positive score, so it's a success
    return AttackOutcome.SUCCESS, "Objective achieved according to scorer"

# No response at all (all attempts filtered/failed)
return AttackOutcome.FAILURE, "All attempts were filtered or failed to get a response"
```
The comment at line 508 says "No response at all (all attempts filtered/failed)" but this outcome_reason applies to two distinct cases: (1) all attempts genuinely failed/were filtered (so beam.objective_score is None), and (2) the objective scorer returned a non-positive score (the beam got a response but didn't achieve the objective). The misleading outcome_reason ("All attempts were filtered or failed to get a response") is reported in both cases, even when a response was successfully received but failed the objective check. The outcome reason should reflect the actual cause: when beam.objective_score is None, use "No score was produced"; when it's present but non-positive, use "Objective not achieved according to scorer".
Suggested change:

```python
objective_value = None
if beam.objective_score is not None:
    objective_value = beam.objective_score.get_value()
if objective_value is not None and objective_value > 0:
    # We have a positive score, so it's a success
    return AttackOutcome.SUCCESS, "Objective achieved according to scorer"
if beam.objective_score is None:
    # No score was produced for this beam
    return AttackOutcome.FAILURE, "No score was produced"
# A response was scored but did not achieve the objective
return AttackOutcome.FAILURE, "Objective not achieved according to scorer"
```
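The three-way distinction can be checked in isolation. The sketch below uses minimal stand-ins for PyRIT's `Beam` and `Score` types (the class definitions here are illustrative, not the real ones):

```python
from enum import Enum
from typing import Optional, Tuple

class AttackOutcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"

class Score:
    """Stand-in for a scorer result exposing get_value()."""
    def __init__(self, value):
        self._value = value
    def get_value(self):
        return self._value

class Beam:
    """Stand-in for a beam that may or may not have been scored."""
    def __init__(self, objective_score: Optional[Score] = None):
        self.objective_score = objective_score

def determine_attack_outcome(beam: Beam) -> Tuple[AttackOutcome, str]:
    if beam.objective_score is None:
        # No score at all: every attempt was filtered or failed.
        return AttackOutcome.FAILURE, "No score was produced"
    if beam.objective_score.get_value():
        return AttackOutcome.SUCCESS, "Objective achieved according to scorer"
    # Scored, but the objective check came back negative.
    return AttackOutcome.FAILURE, "Objective not achieved according to scorer"

assert determine_attack_outcome(Beam())[1] == "No score was produced"
assert determine_attack_outcome(Beam(Score(True)))[0] is AttackOutcome.SUCCESS
assert determine_attack_outcome(Beam(Score(False)))[1] == "Objective not achieved according to scorer"
```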
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
```
Beam
BeamReviewer
BeamSearchAttack
TopKBeamReviewer
```
The four new entries (Beam, BeamReviewer, BeamSearchAttack, TopKBeamReviewer) are appended at the end of the alphabetically ordered list (after TreeOfAttacksWithPruningAttack), breaking the alphabetical ordering of the existing entries. They should be inserted at their alphabetically correct positions: Beam and BeamReviewer between AttackStrategy and ChunkedRequestAttack, and BeamSearchAttack and TopKBeamReviewer at their respective positions.
Looks like there's an error in the output?
It's a warning, but something went wrong on one of the API calls (most likely a grammar had an incorrectly escaped character, although I'm having a hard time running that down). Would you prefer something like that be logged as info rather than warning?
```python
# Log the attack configuration
self._logger.info(f"Starting {self.__class__.__name__} with objective: {context.objective}")

beams = [Beam(id=context.conversation_id, text="", score=0.0) for _ in range(self._num_beams)]
```
are we using the same conversation ID for all of them? Feels like we should create a fresh one per beam / per iteration.
They don't share (I think that would be a key violation on the database) - it's actually the opposite extreme, in that every time a beam updates, it gets a fresh conversation id (hence my concerns above about whether I'm using the database correctly). That said, how this happens (when _propagate_beam_async() re-calls _setup_async()) is perhaps too well hidden. I can add comments, but I would like to straighten out the higher-level question of how I'm spawning new conversations with abandon first.
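The invariant under discussion - every beam has its own conversation ID, and each update spawns a fresh one - can be sketched in a few lines (a standalone illustration, not PyRIT's actual `Beam` class):

```python
import uuid

class Beam:
    """Illustrative beam that owns a unique conversation ID."""
    def __init__(self, text: str = "", score: float = 0.0):
        self.id = str(uuid.uuid4())  # fresh conversation ID per beam
        self.text = text
        self.score = score

    def updated(self, new_text: str) -> "Beam":
        # Updating a beam starts a new conversation rather than
        # appending to the old one, so IDs are never reused.
        return Beam(text=new_text, score=self.score)

beams = [Beam() for _ in range(4)]
assert len({b.id for b in beams}) == 4  # no two beams share an ID
extended = beams[0].updated("hello")
assert extended.id != beams[0].id       # update gets a fresh ID
```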
```python
    "tool_choice": "required",
}

return self._objective_target.fresh_instance(extra_body_parameters=ebp, grammar_name=str(grammar_tool["name"]))
```
The fact that our target is static in its configuration and doesn't allow for custom grammars to be fed in per request is rather annoying here...
The problem is that even if we managed to allow per-request overrides the target identifier wouldn't have those unless we managed to take care of that as well (which is doable). Perhaps the fresh instances are the only way as it stands. Getting a whole new level of appreciation for why the openai client separates such parameters out into the send call rather than the constructor.
CC @rlundeen2
I'm somewhat frustrated that OpenAI made grammars a tool, rather than a response_format (like a JSON schema).
To the specific case here, there's certainly a reason for always keeping the same tools (not thrashing the KV cache), but it does complicate matters for this scenario
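For context, the extra body being passed to the fresh instance presumably carries a custom tool whose format is a Lark grammar. A rough sketch of what that payload might look like follows; the exact field names (`"type": "custom"`, `"format"`, `"syntax": "lark"`) are assumptions based on OpenAI's custom-tool grammar format and should be checked against the Responses API documentation:

```python
def build_grammar_tool(name: str, lark_definition: str) -> dict:
    """Hypothetical helper constructing a grammar-constrained custom tool."""
    return {
        "type": "custom",
        "name": name,
        "format": {
            "type": "grammar",
            "syntax": "lark",
            "definition": lark_definition,
        },
    }

# The per-beam grammar pins the prefix and bounds the continuation length.
grammar = 'start: PREFIX CONTINUATION\nPREFIX: "Hi"\nCONTINUATION: /.{0,8}/'
ebp = {
    "tools": [build_grammar_tool("beam_grammar", grammar)],
    "tool_choice": "required",
}
assert ebp["tools"][0]["format"]["syntax"] == "lark"
```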
Description

Use the Lark grammar feature of the OpenAIResponseTarget to create a beam search for PyRIT. This is a single-turn attack, where a collection of candidate responses (the beams) is maintained. On each iteration, the model's response is allowed to extend a little for each beam. The beams are then scored, with the worst-performing ones discarded and replaced with copies of higher-scoring beams.

Tests and Documentation

Basic unit tests of the added classes are included, but since this requires features currently only in the OpenAIResponseTarget, there didn't seem much point in mocking that. There is a notebook which runs everything end-to-end.
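The loop the description outlines can be sketched in miniature. Here `extend` and `score` are toy stand-ins for the grammar-constrained model call and the scorer (in the real attack, extension happens via a Lark grammar on the OpenAIResponseTarget), so this is an illustration of the search shape only:

```python
import random

def beam_search(extend, score, num_beams=4, keep=2, iterations=3):
    """Toy beam search: extend, rank, prune, and clone survivors."""
    beams = ["" for _ in range(num_beams)]
    for _ in range(iterations):
        # Let each beam grow a little.
        beams = [extend(b) for b in beams]
        # Rank the beams by score, best first.
        beams.sort(key=score, reverse=True)
        # Discard the worst and backfill with copies of the survivors.
        survivors = beams[:keep]
        beams = survivors + [survivors[i % keep] for i in range(num_beams - keep)]
    return beams[0]

# Toy objective: prefer beams containing more 'a' characters.
random.seed(0)
best = beam_search(
    extend=lambda b: b + random.choice("ab"),
    score=lambda b: b.count("a"),
)
assert len(best) == 3  # one character added per iteration
```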