
FEAT Add VLGuard multimodal safety dataset loader #1447

Open
romanlutz wants to merge 2 commits into Azure:main from romanlutz:romanlutz/vlguard-dataset

Conversation

@romanlutz
Contributor

Summary

Adds support for the VLGuard dataset (ICML 2024), a vision-language safety benchmark that evaluates whether multimodal models refuse unsafe content while remaining helpful on safe content.

What is VLGuard?

VLGuard contains ~2,000 image-instruction pairs across 4 categories (Privacy, Risky Behavior, Deception, Hateful Speech) and 8 subcategories (Personal Data, Professional Advice, Political, Sexually
Explicit, Violence, Disinformation, Discrimination by Sex, Discrimination by Race).

It supports three evaluation subsets:

  • unsafes — unsafe images with instructions (tests whether the model refuses to describe unsafe visual content)
  • safe_unsafes — safe images with unsafe instructions (tests whether the model refuses unsafe text prompts)
  • safe_safes — safe images with safe instructions (tests whether the model remains helpful)
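The three subsets above map naturally onto a string-valued enum. A minimal sketch of what `VLGuardSubset` might look like (the member values are assumptions based on the subset names listed here, not necessarily the loader's actual values):

```python
from enum import Enum


class VLGuardSubset(Enum):
    """The three VLGuard evaluation subsets (values are illustrative)."""

    UNSAFES = "unsafes"            # unsafe images with instructions
    SAFE_UNSAFES = "safe_unsafes"  # safe images with unsafe instructions
    SAFE_SAFES = "safe_safes"      # safe images with safe instructions
```

The string value can double as the split or file name when fetching from HuggingFace.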

Usage

from pyrit.datasets.seed_datasets.remote import VLGuardDataset, VLGuardCategory, VLGuardSubset

Load unsafe image examples (default)

loader = VLGuardDataset(token="hf...")
dataset = await loader.fetch_dataset()

Load safe images with unsafe instructions, filtered to Privacy category

loader = VLGuardDataset(
    subset=VLGuardSubset.SAFE_UNSAFES,
    categories=[VLGuardCategory.PRIVACY],
    token="hf...",
)
dataset = await loader.fetch_dataset()

Note: This is a gated dataset on HuggingFace. Users must accept the terms at https://huggingface.co/datasets/ys-zong/VLGuard and provide a HuggingFace token.
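Rather than hard-coding the token, it can be resolved from the conventional `HF_TOKEN` environment variable. A small sketch (the helper name `resolve_hf_token` is hypothetical, not part of this PR):

```python
import os


def resolve_hf_token(explicit: str = "") -> str:
    """Return an explicitly passed token, else fall back to HF_TOKEN."""
    token = explicit or os.environ.get("HF_TOKEN", "")
    if not token:
        raise RuntimeError(
            "No HuggingFace token found. Accept the VLGuard terms at "
            "https://huggingface.co/datasets/ys-zong/VLGuard and set HF_TOKEN."
        )
    return token
```

Failing fast with a pointer to the terms page saves users a confusing 403 from the gated-dataset download later.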

Changes

  • pyrit/datasets/seed_datasets/remote/vlguard_dataset.py — new dataset loader
  • pyrit/datasets/seed_datasets/remote/__init__.py — register exports
  • tests/unit/datasets/test_vlguard_dataset.py — 14 unit tests
  • doc/code/datasets/1_loading_datasets.ipynb — regenerated to show VLGuard in dataset list

romanlutz and others added 2 commits March 10, 2026 05:33
Add support for the VLGuard dataset (ICML 2024) which contains image-instruction
pairs for evaluating vision-language model safety across 4 categories (Privacy,
Risky Behavior, Deception, Hateful Speech) with 8 subcategories.

Supports three evaluation subsets:
- unsafes: unsafe images with instructions (tests refusal)
- safe_unsafes: safe images with unsafe instructions (tests refusal)
- safe_safes: safe images with safe instructions (tests helpfulness)

Downloads from HuggingFace (gated dataset, requires token and terms acceptance).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@romanlutz force-pushed the romanlutz/vlguard-dataset branch from cac3cad to 255dd50 on March 10, 2026 13:24


class VLGuardCategory(Enum):
"""Categories in the VLGuard dataset."""
Nit: Can we add a brief explainer or example of each category?
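One way to address this nit: pair each member with a short inline explainer. A sketch only — the member values and the example descriptions are assumptions, not the PR's actual code:

```python
from enum import Enum


class VLGuardCategory(Enum):
    """Categories in the VLGuard dataset (descriptions are illustrative)."""

    PRIVACY = "privacy"                # e.g. exposing personal data
    RISKY_BEHAVIOR = "risky_behavior"  # e.g. unqualified professional advice
    DECEPTION = "deception"            # e.g. disinformation, political manipulation
    HATEFUL_SPEECH = "hateful_speech"  # e.g. discrimination by sex or race
```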

models refuse unsafe content while remaining helpful on safe content.

The dataset covers 4 categories (Privacy, Risky Behavior, Deception, Hateful Speech)
with 8 subcategories (Personal Data, Professional Advice, Political, Sexually Explicit,
Why aren't subcategories in the enum? They don't have to be, just curious

prompts.append(text_prompt)
prompts.append(image_prompt)

if self.max_examples is not None and len(prompts) >= self.max_examples * 2:
Suggested change
-if self.max_examples is not None and len(prompts) >= self.max_examples * 2:
+# Note: each example contributes two entries to prompts (one text, one image), hence max_examples * 2.
+if self.max_examples is not None and len(prompts) >= self.max_examples * 2:
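The `* 2` exists because each dataset example appends two entries to `prompts` (one text, one image). A standalone sketch of that truncation logic, with hypothetical placeholder data:

```python
def collect_prompts(examples, max_examples=None):
    """Collect (text, image) prompt pairs, flattened into one list.

    Each example contributes two entries, so the list is capped at
    max_examples * 2 entries when a limit is set.
    """
    prompts = []
    for text_prompt, image_prompt in examples:
        prompts.append(text_prompt)
        prompts.append(image_prompt)
        if max_examples is not None and len(prompts) >= max_examples * 2:
            break
    return prompts


pairs = [(f"text-{i}", f"img-{i}") for i in range(5)]
print(len(collect_prompts(pairs, max_examples=2)))  # 4 entries = 2 examples
```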
