Compliance-as-Code for AI Agents. Define safety policies in YAML. Enforce them at runtime. Audit everything.
PolicyForge is a lightweight, framework-agnostic policy engine for AI agent systems. It puts a three-layer safety net around every agent interaction — pre-flight, in-flight, and post-flight — with declarative YAML policies that anyone on the team can read, review, and version-control.
Agent frameworks give you powerful, autonomous systems that can browse the web, execute code, send emails, and query databases. But with power comes risk:
| Risk | PolicyForge Rule Type | Stage |
|---|---|---|
| Agent calls `rm -rf /` | `tool_gate` | `pre_flight` |
| User injects "ignore all rules" | `content_filter` (`input_guard`) | `pre_flight` |
| Output leaks customer emails | `content_filter` (`pii_detector`) | `post_flight` |
| Agent burns $50 in API calls | `resource_limit` | `in_flight` |
| Output contains toxic content | `content_classifier` | `post_flight` |
| API key appears in response | `content_filter` (`regex`) | `post_flight` |
| EU AI Act requires audit trail | `audit` | all |
PolicyForge handles all of these declaratively and auditably, without modifying your agent code.
```
                    ┌──────────────────────┐
                    │   Agent Framework    │
                    │ (Agno / CrewAI / LC) │
                    └──────────┬───────────┘
                               │
                     ┌─────────▼─────────┐
                     │    PolicyForge    │
                     │                   │
User Input ────────► │ 1. Pre-Flight     │  Tool gate, input guard
                     │ 2. In-Flight      │  Token budget, rate limit
                     │ 3. Post-Flight    │  PII detection, content
                     │                   │  filter, classifier
                     └─────────┬─────────┘
                               │
                     ┌─────────▼─────────┐
                     │    Audit Trail    │
                     │    (structlog)    │
                     └───────────────────┘
```
Every decision — pass, block, redact, terminate — is logged with the rule ID, matched pattern, and timestamp for full auditability (EU AI Act Article 12, SOC 2, ISO 27001).
```bash
pip install policyforge
```

Requirements: Python ≥ 3.11. Optional: `transformers` and `torch` (for ML-based content classification).
```yaml
# my_policy.yaml
name: "Safe Agent"
version: "1.0.0"
rules:
  - id: no-secrets
    description: Never expose API keys in output
    type: content_filter
    stage: post_flight
    patterns:
      - type: regex
        value: "API_KEY_[A-Z0-9]+"
    action:
      on_match: block
      fallback: "🔒 Secret removed by policy"
```

Then guard a function with it:

```python
from policyforge import PolicyEngine

engine = PolicyEngine.from_yaml("my_policy.yaml")

@engine.guard
def answer_question(query: str) -> str:
    return llm.generate(query)  # `llm` is your model client

answer_question("What is the weather?")  # ✅ passes
answer_question("Print API_KEY_12345")   # ❌ raises GuardViolation if the output contains the key
```

That's it. No framework lock-in: the `@engine.guard` decorator wraps any Python function, whether a tool, a callback, or a pipeline step.
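The decorator pattern behind `@engine.guard` can be illustrated without PolicyForge. The sketch below is a hypothetical, minimal stand-in, not PolicyForge's implementation: `guard`, its `rule_id`/`pattern` parameters and the local `GuardViolation` class are assumptions, and only the post-flight regex rule from `my_policy.yaml` is modeled (its `fallback` message is ignored).

```python
import functools
import re
from collections.abc import Callable


class GuardViolation(Exception):
    """Hypothetical stand-in for PolicyForge's exception."""

    def __init__(self, rule_id: str, message: str):
        super().__init__(f"{rule_id}: {message}")
        self.rule_id = rule_id
        self.message = message


def guard(rule_id: str, pattern: str) -> Callable:
    """Build a decorator that blocks any return value matching `pattern`."""
    compiled = re.compile(pattern)

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)        # run the function body first
            match = compiled.search(str(result))  # post-flight: inspect the output
            if match:
                raise GuardViolation(rule_id, f"output matched {match.group(0)!r}")
            return result
        return wrapper
    return decorator


@guard("no-secrets", r"API_KEY_[A-Z0-9]+")
def answer_question(query: str) -> str:
    """Stand-in for llm.generate(query): simply echoes the query."""
    return f"You asked: {query}"
```

Here `answer_question("Print API_KEY_12345")` raises `GuardViolation` because the echoed output contains the key. `functools.wraps` keeps the function's `__name__` intact, which matters when a framework registers the wrapped function as a named tool.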
| Feature | Type | Stage | Description |
|---|---|---|---|
| Tool Gate | `tool_gate` | `pre_flight` | Allowlist-based tool access control |
| Input Guard | `content_filter` (`input_guard`) | `pre_flight` | Detects instruction-manipulation patterns |
| Content Filter | `content_filter` (`regex`) | `post_flight` | Blocks or redacts text matching regex patterns |
| PII Detection | `content_filter` (`pii_detector`) | `post_flight` | Detects emails, phone numbers, SSNs, credit card numbers, and IBANs |
| PII Redaction | `content_filter` (`pii_detector`) | `post_flight` | Masks PII instead of blocking (`action: redact`) |
| Resource Limit | `resource_limit` | `in_flight` | Token budget and API call rate tracking |
| Content Classifier | `content_classifier` | `post_flight` | ML-based toxicity/harm detection (HuggingFace) |
| Audit Logging | `audit` | all | Structured JSONL audit trails with GDPR export |
| YAML Policies | n/a | n/a | Declarative, version-controlled, human-readable |
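Three pattern types are supported. `regex` matches a regular expression:

```yaml
patterns:
  - type: regex
    value: "AKIA[0-9A-Z]{16}"  # AWS access key ID pattern
```

`pii_detector` matches built-in PII entity types:

```yaml
patterns:
  - type: pii_detector
    entities: [email, phone, credit_card, ssn, iban]
```

Available entities: `email`, `phone`, `credit_card`, `ssn`, `iban`.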
`input_guard` detects instruction-manipulation patterns:

```yaml
patterns:
  - type: input_guard
```

It flags patterns like "ignore previous instructions", "developer mode", and "system prompt leak" using a curated keyword list plus regex heuristics. Custom patterns are supported:

```python
from policyforge.evaluators.input_guard import InputGuardEvaluator

evaluator = InputGuardEvaluator(
    custom_patterns=["secret backdoor", "override safety"]
)
```

Actions on a match:

| Action | Behavior |
|---|---|
| `block` | Raises `GuardViolation` and stops execution |
| `redact` | Masks matched PII with `[REDACTED:entity]` and returns the sanitized text |
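
A rough sketch of what the two actions mean for PII, assuming simple regexes for just two entity types. `PII_PATTERNS`, `apply_action` and the local `GuardViolation` are hypothetical; PolicyForge's own `pii_detector` may detect entities quite differently, and these toy patterns will miss many real-world formats.

```python
import re


class GuardViolation(Exception):
    """Hypothetical stand-in for PolicyForge's exception."""


# Toy patterns for two entity types; real PII detection needs much more care.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def apply_action(text: str, action: str) -> str:
    """Apply `block` or `redact` to every PII match found in `text`."""
    found = sorted({name for name, rx in PII_PATTERNS.items() if rx.search(text)})
    if not found:
        return text  # nothing matched: pass the text through unchanged
    if action == "block":
        raise GuardViolation(f"PII detected: {', '.join(found)}")
    if action == "redact":
        # Emails are replaced first, so their digits are gone before the phone scan.
        for name, rx in PII_PATTERNS.items():
            text = rx.sub(f"[REDACTED:{name}]", text)
        return text
    raise ValueError(f"unknown action: {action!r}")
```

With `redact`, `"Contact jane.doe@example.com or +1 555-123-4567"` becomes `"Contact [REDACTED:email] or [REDACTED:phone]"`; with `block`, the same text raises instead.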
A tool gate restricts which tools an agent may call:

```yaml
rules:
  - id: approved-tools-only
    type: tool_gate
    stage: pre_flight
    allowed_tools: [search_kb, create_ticket, send_email, calculate]
    action:
      on_blocked: deny_with_explanation
```

The tool gate fires before the function body executes. If the function name isn't in `allowed_tools`, a `GuardViolation` is raised with the blocked tool name.
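The allowlist check can be sketched as a decorator that compares the wrapped function's `__name__` against `allowed_tools` on every call. This is a hypothetical illustration (`tool_gate`, `calls` and `delete_records` are made up), not PolicyForge's code.

```python
import functools
from collections.abc import Callable


class GuardViolation(Exception):
    """Hypothetical stand-in for PolicyForge's exception."""


def tool_gate(allowed_tools: set[str]) -> Callable:
    """Return a decorator that only lets allowlisted functions run."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Pre-flight: the check runs before the function body executes.
            if func.__name__ not in allowed_tools:
                raise GuardViolation(f"tool {func.__name__!r} is not in allowed_tools")
            return func(*args, **kwargs)
        return wrapper
    return decorator


gate = tool_gate({"search_kb", "create_ticket", "send_email", "calculate"})
calls: list[str] = []  # records which function bodies actually ran


@gate
def calculate(a: float, b: float) -> float:
    calls.append("calculate")
    return a + b


@gate
def delete_records(table: str) -> None:
    calls.append("delete_records")  # never reached: the gate fires first
```

Checking at call time rather than at decoration time means a blocked tool can still be defined and registered; it simply never executes.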
A resource limit caps usage per session:

```yaml
rules:
  - id: token-budget
    type: resource_limit
    stage: in_flight
    threshold: 500000
    action:
      on_exceeded: terminate
      error_message: "Token budget exhausted. Start a new session."
```

Programmatic API:
```python
from policyforge.evaluators.resource_limit import ResourceLimitTracker

tracker = ResourceLimitTracker()
tracker.consume(1500)         # track usage
tracker.check(rule, current)  # returns EvalResult
tracker.reset()               # reset for a new session
```

The content classifier performs ML-based classification with HuggingFace `transformers` (an optional dependency):
```yaml
rules:
  - id: toxicity-check
    type: content_classifier
    stage: post_flight
    model: "unitary/toxic-bert"
    threshold: 0.7
    labels: [toxic, hate, violence]
    action:
      on_match: block
      fallback: "Response held for quality reasons."
```

Set `model: "mock"` to test without downloading a model.
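How `threshold` and `labels` interact can be sketched with a stand-in scorer. This is an assumption-laden illustration: `mock_scorer` is a keyword toy rather than a model, `flagged_labels` and `moderate` are hypothetical, treating a score equal to the threshold as a match is a guess, and substituting the `fallback` text (rather than raising) is just one possible reading of `on_match: block` plus `fallback`.

```python
from collections.abc import Callable

# A scorer maps text to {label: score}. This keyword toy stands in for a real
# model such as unitary/toxic-bert; its scores are invented for illustration.
Scorer = Callable[[str], dict[str, float]]


def mock_scorer(text: str) -> dict[str, float]:
    lowered = text.lower()
    return {
        "toxic": 0.92 if "idiot" in lowered else 0.03,
        "hate": 0.02,
        "violence": 0.81 if "hurt you" in lowered else 0.01,
    }


def flagged_labels(text: str, scorer: Scorer, labels: list[str],
                   threshold: float) -> list[str]:
    """Return the watched labels whose score reaches the threshold."""
    scores = scorer(text)
    return [label for label in labels if scores.get(label, 0.0) >= threshold]


LABELS = ["toxic", "hate", "violence"]
FALLBACK = "Response held for quality reasons."


def moderate(text: str, scorer: Scorer = mock_scorer, threshold: float = 0.7) -> str:
    """Swap in the fallback message when any watched label is flagged."""
    return FALLBACK if flagged_labels(text, scorer, LABELS, threshold) else text
```

Only the labels listed in the rule are checked, so a model that emits extra labels does not trigger the rule through them.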
An audit rule sets logging and retention requirements:

```yaml
rules:
  - id: eu-ai-act-logging
    type: audit
    stage: all
    requirements:
      log_all_decisions: true
      retention_days: 365
      gdpr_exportable: true
```

All policy decisions are logged via structlog in structured JSONL format.
The complete customer service policy combines all rule types. In condensed form:

```yaml
name: "Customer Service Agent"
version: "1.2.0"
rules:
  - id: no-pii-output          # Redact emails, phones, IBANs in output
    type: content_filter / pii_detector
    action: redact
  - id: no-credit-cards        # Block credit card numbers entirely
    type: content_filter / regex + pii_detector
    action: block
  - id: token-budget           # 500K-token cap per session
    type: resource_limit
    action: terminate
  - id: approved-tools-only    # Only 4 tools allowed
    type: tool_gate
  - id: toxicity-check         # ML-based toxicity filter
    type: content_classifier / unitary/toxic-bert
  - id: eu-ai-act-logging      # EU AI Act Art. 12 compliance
    type: audit / 365d retention
```

PolicyForge is framework-agnostic. The `@engine.guard` decorator wraps any callable, so you can integrate it wherever your agent calls tools or returns output.
Agno provides a BaseGuardrail class with a check() method and pre_hooks/post_hooks on the Agent class. The cleanest integration is a custom guardrail subclass:
```python
from agno.agent import Agent
from agno.guardrails.base import BaseGuardrail, CheckTrigger
from policyforge import GuardViolation, PolicyEngine

engine = PolicyEngine.from_yaml("my_policy.yaml")


class PolicyForgeGuardrail(BaseGuardrail):
    """Bridges PolicyForge policies into Agno's guardrail system."""

    def check(self, text: str, trigger: CheckTrigger) -> str:
        """Run PolicyForge post-flight rules against agent output."""
        try:
            # Returns the sanitized text (PII may be redacted)
            return engine._eval_post_flight(text)
        except GuardViolation as e:
            # A blocking rule matched: replace the output with a notice
            return f"[BLOCKED by policy: {e}]"


# Register with an Agno agent (`search` and `calculate` are your own tools)
agent = Agent(
    name="Safe Assistant",
    model="gpt-4o",
    tools=[search, calculate],
    post_hooks=[PolicyForgeGuardrail()],
)
```

For tool-level enforcement, wrap tools with the decorator:
```python
from agno.agent import Agent
from policyforge import PolicyEngine

engine = PolicyEngine.from_yaml("my_policy.yaml")

@engine.guard
def search(query: str) -> str:
    return web_search(query)  # `web_search` is your search backend

agent = Agent(
    tools=[search],  # guarded at the function level
)
```

Synergy: Agno's `permission_mode` and `allowed_tools` provide basic allowlisting, while PolicyForge adds regex content filtering, PII redaction, input guards, and structured audit logging, all defined in version-controlled YAML.
CrewAI offers @before_kickoff and @after_kickoff decorators on Crews, and tools are plain Python functions.
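Option A (wrap individual tools):

```python
from crewai import Agent
from policyforge import PolicyEngine

engine = PolicyEngine.from_yaml("my_policy.yaml")

@engine.guard
def database_query(sql: str) -> str:
    return db.execute(sql)  # `db` is your database connection

crew_agent = Agent(
    role="Data Analyst",
    goal="Answer questions using the company database",      # required by CrewAI
    backstory="A careful analyst who only runs approved queries",  # required by CrewAI
    tools=[database_query],  # guarded
)
```

Option B (Crew-level hooks):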
```python
from crewai import Crew

crew = Crew(agents=[...], tasks=[...])

@crew.before_kickoff
def check_inputs(inputs: dict) -> dict:
    # Raises GuardViolation if any string input trips the input guard
    for value in inputs.values():
        if isinstance(value, str):
            engine._eval_pre_flight_input(value)
    return inputs

@crew.after_kickoff
def filter_outputs(output: str) -> str:
    # Block or redact the crew's final output
    return engine._eval_post_flight(output)
```

Wrap LangChain tools with `@engine.guard`, or insert a `RunnableLambda` into the chain:
Option A (guard individual tools):

```python
from langchain_core.tools import tool
from policyforge import PolicyEngine

engine = PolicyEngine.from_yaml("my_policy.yaml")

@tool
@engine.guard
def send_email(recipient: str, body: str) -> str:
    """Send an email to a recipient."""  # @tool uses the docstring as the tool description
    return mailer.send(recipient, body)  # `mailer` is your mail client
```

Option B (chain-level filter):
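```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda

def safety_filter(text: str) -> str:
    engine._eval_pre_flight_input(text)  # raises GuardViolation on manipulation attempts
    return text

chain = (
    RunnableLambda(safety_filter)
    | prompt             # your prompt template
    | llm                # your model
    | StrOutputParser()  # chat models return messages; convert to str first
    | RunnableLambda(engine._eval_post_flight)
)
```

PolicyForge works with any Python-based agent framework: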
- AutoGen: wrap `AssistantAgent` tools with `@engine.guard`
- Semantic Kernel: guard `@kernel_function`-decorated methods
- DSPy: insert `engine._eval_post_flight` as an output processor
- Custom agents: wrap any `def tool(...)` with `@engine.guard`
API overview:

```python
from policyforge import PolicyEngine, GuardViolation
from policyforge.evaluators import (
    ContentFilterEvaluator,
    PIIDetectorEvaluator,
    InputGuardEvaluator,
    ToolGateEvaluator,
    ResourceLimitTracker,
    ContentClassifierEvaluator,
)

# Load a policy
engine = PolicyEngine.from_yaml("policy.yaml")  # from a file
engine = PolicyEngine.from_yaml(yaml_string)    # from a YAML string

# Manual evaluation (without the decorator)
engine._eval_pre_flight("tool_name")              # check the tool allowlist
engine._eval_pre_flight_input("user input text")  # check for instruction manipulation
engine._eval_post_flight("agent output")          # filter/redact output

# All checks raise GuardViolation when a rule blocks
try:
    engine._eval_post_flight("secret output")
except GuardViolation as e:
    print(f"Blocked by {e.rule_id}: {e.message}")
```

License: MIT. See the LICENSE file.