Re: Agentic AI – Threats and Mitigations v1.1 — Playbook 5 (Protecting HITL & Preventing Decision Fatigue Exploits) and T10 (Overwhelming Human-in-the-Loop).
Gap
Across Playbooks 3–6, the human-in-the-loop / autonomy gate keys on risk level, impact, domain, or privilege — never on the reversibility of the action. E.g.:
- P5, Step 1: "Automate low-risk approvals while requiring human oversight for high-impact tasks."
- P3, Step 2: "explicit user approval for AI tool executions involving financial, medical, or administrative functions."
- T10 mitigation: "low-risk decisions are automated, and human intervention is prioritized for high-risk anomalies."
Risk/impact/domain are estimates; reversibility is an intrinsic, deterministic property of the action.
Proposal
Add reversibility as the gating axis (primary home: P5 Step 1; cross-ref P3/P4/P6). Classify each agent-invocable action by whether it can be undone:
- Read-only (no state change) -> autonomous
- Reversible (engine can roll back / auto-expires) -> autonomous, logged
- External-reversible (recoverable via a separate process) -> human approval
- Irreversible (cannot be undone) -> human approval + evidence threshold
Worst-case governs: a multi-step task gates at the least-reversible action it can reach.
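A minimal sketch of how the four classes and the worst-case rule could be encoded, offered as an illustration rather than proposed wording for the doc. The action names, the registry, and the fail-closed handling of unclassified actions are assumptions made for the example; only the class-to-gate mapping comes from the list above.

```python
from enum import IntEnum


class Reversibility(IntEnum):
    """Ordered from most to least reversible; a higher value is harder to undo."""
    READ_ONLY = 0
    REVERSIBLE = 1
    EXTERNAL_REVERSIBLE = 2
    IRREVERSIBLE = 3


# Gate per class, taken from the four bullets above.
GATES = {
    Reversibility.READ_ONLY: "autonomous",
    Reversibility.REVERSIBLE: "autonomous, logged",
    Reversibility.EXTERNAL_REVERSIBLE: "human approval",
    Reversibility.IRREVERSIBLE: "human approval + evidence threshold",
}

# Hypothetical action registry; names and classifications are illustrative only.
ACTION_CLASS = {
    "search_tickets": Reversibility.READ_ONLY,
    "add_label": Reversibility.REVERSIBLE,
    "send_email": Reversibility.IRREVERSIBLE,
}


def gate_for_task(reachable_actions):
    """Return the gate of the least-reversible action the task can reach."""
    # Assumption: unclassified actions fail closed and are treated as irreversible.
    classes = [
        ACTION_CLASS.get(name, Reversibility.IRREVERSIBLE)
        for name in reachable_actions
    ]
    # Worst case governs; an empty task changes no state, so it is read-only.
    worst = max(classes, default=Reversibility.READ_ONLY)
    return GATES[worst]


print(gate_for_task(["search_tickets", "add_label"]))
print(gate_for_task(["search_tickets", "send_email"]))
```

Because the classes are ordered, the worst-case rule reduces to taking the maximum over everything the task can reach, so a single irreversible step pulls the whole task behind the human gate.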
Why it strengthens the doc
Directly serves Playbook 5's goal: reserving the human gate for irreversible actions (not all "high-risk") keeps the review queue small — the core defense against the decision fatigue T10 exploits. It also caps the blast radius of T2/T3/T6/T7/T13 regardless of attack path, since in all of those the agent acts with legitimate access.
Note on terminology
This "reversibility" (a property of the action) is distinct from the "decision reversal" detection in Playbooks 1 & 6 (flagging suspicious approval-flips). Both can coexist, but the distinction is worth naming explicitly to avoid conflation.
Standards anchors
NIST SP 800-53 AC-6 (least privilege at the action level); EU AI Act Article 14 (human oversight). Complementary to Playbook 4 (identity/authz): authz decides permission, reversibility decides the gate. Mirrors the reversibility axis proposed in OWASP AISVS (OWASP/AISVS#820, C9.2), so the two artifacts reinforce each other.
Happy to draft the PR for the Playbook 5 Step 1 bullet + cross-references.