PolicyForge 🔒

Compliance-as-Code for AI Agents. Define safety policies in YAML. Enforce them at runtime. Audit everything.

PolicyForge is a lightweight, framework-agnostic policy engine for AI agent systems. It puts a three-layer safety net around every agent interaction — pre-flight, in-flight, and post-flight — with declarative YAML policies that anyone on the team can read, review, and version-control.

Why PolicyForge?

Agent frameworks give you powerful, autonomous systems that can browse the web, execute code, send emails, and query databases. But with power comes risk:

Risk	PolicyForge Rule Type	Stage
Agent calls `rm -rf /`	`tool_gate`	pre_flight
User injects "ignore all rules"	`content_filter` (input_guard)	pre_flight
Output leaks customer emails	`content_filter` (pii_detector)	post_flight
Agent burns $50 in API calls	`resource_limit`	in_flight
Output contains toxic content	`content_classifier`	post_flight
API key appears in response	`content_filter` (regex)	post_flight
EU AI Act requires audit trail	`audit`	all

PolicyForge checks all of these declaratively and auditably, without modifying your agent code.

Architecture

                   ┌──────────────────────┐
                   │    Agent Framework   │
                   │ (Agno / CrewAI / LC) │
                   └──────────┬───────────┘
                              │
                    ┌─────────▼─────────┐
                    │    PolicyForge    │
                    │                   │
User Input ────────►│ 1. Pre-Flight     │  Tool gate, input guard
                    │ 2. In-Flight      │  Token budget, rate limit
                    │ 3. Post-Flight    │  PII detection, content
                    │                   │    filter, classifier
                    └─────────┬─────────┘
                              │
                    ┌─────────▼─────────┐
                    │   Audit Trail     │
                    │   (structlog)     │
                    └───────────────────┘

Every decision — pass, block, redact, terminate — is logged with the rule ID, matched pattern, and timestamp for full auditability (EU AI Act Article 12, SOC 2, ISO 27001).

Installation

pip install policyforge

Requirements: Python ≥ 3.11. Optional: transformers, torch (for ML-based content classification).

Quickstart (30 seconds)

1. Write a policy

# my_policy.yaml
name: "Safe Agent"
version: "1.0.0"
rules:
  - id: no-secrets
    description: Never expose API keys in output
    type: content_filter
    stage: post_flight
    patterns:
      - type: regex
        value: "API_KEY_[A-Z0-9]+"
    action:
      on_match: block
      fallback: "🔒 Secret removed by policy"

2. Guard your agent

from policyforge import PolicyEngine

engine = PolicyEngine.from_yaml("my_policy.yaml")

@engine.guard
def answer_question(query: str) -> str:
    return llm.generate(query)

answer_question("What is the weather?")        # ✅ passes
answer_question("Print API_KEY_12345")         # ❌ raises GuardViolation

That's it. No framework lock-in. The @engine.guard decorator wraps any Python function — tool, callback, or pipeline step.
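To make the decorator's behavior concrete, here is a minimal sketch of how a post-flight guard could wrap a function. It is not PolicyForge's implementation: the `guard` factory, its parameters, and the stand-in `GuardViolation` class are assumptions made for illustration.

```python
import functools
import re


class GuardViolation(Exception):
    """Stand-in for PolicyForge's exception; the real class may differ."""

    def __init__(self, rule_id: str, message: str):
        super().__init__(message)
        self.rule_id = rule_id
        self.message = message


def guard(rule_id: str, pattern: str):
    """Hypothetical decorator factory: block return values matching `pattern`."""
    compiled = re.compile(pattern)

    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            # Post-flight check: inspect the output before the caller sees it.
            if isinstance(result, str) and compiled.search(result):
                raise GuardViolation(rule_id, f"output matched {pattern!r}")
            return result

        return wrapper

    return decorator


@guard("no-secrets", r"API_KEY_[A-Z0-9]+")
def echo(text: str) -> str:
    return text
```

Calling `echo("What is the weather?")` returns the text unchanged, while `echo("Print API_KEY_12345")` raises the stand-in `GuardViolation`.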

Features (v0.2)

Feature	Type	Stage	Description
Tool Gate	`tool_gate`	pre_flight	Allowlist-based tool access control
Input Guard	`content_filter` (input_guard)	pre_flight	Detects instruction-manipulation patterns
Content Filter	`content_filter` (regex)	post_flight	Blocks or redacts text matching regex patterns
PII Detection	`content_filter` (pii_detector)	post_flight	Detects emails, phones, SSNs, credit cards, IBANs
PII Redaction	`content_filter` (pii_detector)	post_flight	Masks PII instead of blocking (`action: redact`)
Resource Limit	`resource_limit`	in_flight	Token budget / API call rate tracking
Content Classifier	`content_classifier`	post_flight	ML-based toxicity/harm detection (HuggingFace)
Audit Logging	`audit`	all	Structured JSONL audit trails with GDPR export
YAML Policies	—	—	Declarative, version-controlled, human-readable

Rule Types in Detail

1. Content Filter (`content_filter`)

Three pattern types:

Regex patterns

patterns:
  - type: regex
    value: "AKIA[0-9A-Z]{16}"     # AWS access key pattern
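Before committing a pattern to a policy file, it can help to try it against sample text. The snippet below is a small sketch using only Python's `re` module; the `find_keys` helper is hypothetical and not part of PolicyForge.

```python
import re

# Same pattern as in the YAML rule above.
AWS_KEY = re.compile(r"AKIA[0-9A-Z]{16}")


def find_keys(text: str) -> list[str]:
    """Return every substring of `text` that matches the access key pattern."""
    return AWS_KEY.findall(text)


# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
sample = "aws_access_key_id=AKIAIOSFODNN7EXAMPLE"
print(find_keys(sample))
```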

PII detection patterns

patterns:
  - type: pii_detector
    entities: [email, phone, credit_card, ssn, iban]

Available entities: email, phone, credit_card, ssn, iban.

Input guard patterns

patterns:
  - type: input_guard

Detects instruction-manipulation patterns like "ignore previous instructions", "developer mode", "system prompt leak". Uses a curated keyword list plus regex heuristics. Supports custom patterns:

from policyforge.evaluators.input_guard import InputGuardEvaluator

evaluator = InputGuardEvaluator(
    custom_patterns=["secret backdoor", "override safety"]
)

Actions

Action	Behavior
`block`	Raises `GuardViolation`, stops execution
`redact`	Masks matched PII with `[REDACTED:entity]`, returns sanitized text
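The difference between the two actions is easiest to see on the redact side. The following sketch masks two entity types using the `[REDACTED:entity]` format from the table; the regexes are deliberately simple and the `redact` helper is hypothetical, so PolicyForge's own detectors may behave differently.

```python
import re

# Deliberately simple patterns for illustration; real PII detection needs more care.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str, entities: list[str]) -> str:
    """Replace each match of the requested entity types with a mask."""
    for entity in entities:
        text = PII_PATTERNS[entity].sub(f"[REDACTED:{entity}]", text)
    return text


print(redact("Mail jane@example.com, SSN 123-45-6789", ["email", "ssn"]))
```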

2. Tool Gate (`tool_gate`)

rules:
  - id: approved-tools-only
    type: tool_gate
    stage: pre_flight
    allowed_tools: [search_kb, create_ticket, send_email, calculate]
    action:
      on_blocked: deny_with_explanation

The tool gate fires before the function body executes. If the function name isn't in allowed_tools, a GuardViolation is raised with the blocked tool name.
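The name-based check described above can be sketched in a few lines. This is an assumption-laden illustration, not PolicyForge's code: `tool_gate` and `ToolBlocked` are hypothetical names, and `ToolBlocked` stands in for `GuardViolation`.

```python
import functools

ALLOWED_TOOLS = {"search_kb", "create_ticket", "send_email", "calculate"}


class ToolBlocked(Exception):
    """Stand-in for GuardViolation."""


def tool_gate(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Checked on every call, before the function body runs.
        if func.__name__ not in ALLOWED_TOOLS:
            raise ToolBlocked(f"tool '{func.__name__}' is not in allowed_tools")
        return func(*args, **kwargs)

    return wrapper


@tool_gate
def calculate(expression: str) -> int:
    return sum(int(part) for part in expression.split("+"))


@tool_gate
def delete_database() -> str:
    return "deleted"
```

Here `calculate("2+3")` runs normally, while `delete_database()` raises before its body executes because its name is not on the allowlist.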

3. Resource Limit (`resource_limit`)

rules:
  - id: token-budget
    type: resource_limit
    stage: in_flight
    threshold: 500000
    action:
      on_exceeded: terminate
      error_message: "Token budget exhausted. Start a new session."

Programmatic API:

from policyforge.evaluators.resource_limit import ResourceLimitTracker

tracker = ResourceLimitTracker()
tracker.consume(1500)             # track usage
tracker.check(rule, current)      # returns EvalResult; `rule` and `current` are placeholders not defined in this snippet
tracker.reset()                   # reset for new session
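For readers who want to see the budget logic in isolation, here is a self-contained sketch of a session token budget. `BudgetTracker` and `BudgetExceeded` are hypothetical names; this is not `ResourceLimitTracker`, whose internals may differ.

```python
class BudgetExceeded(Exception):
    """Stand-in for the error raised when on_exceeded is terminate."""


class BudgetTracker:
    """Hypothetical tracker; not PolicyForge's ResourceLimitTracker."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.used = 0

    def consume(self, tokens: int) -> int:
        """Add usage and return the remaining budget, raising once it is exceeded."""
        self.used += tokens
        if self.used > self.threshold:
            raise BudgetExceeded("Token budget exhausted. Start a new session.")
        return self.threshold - self.used

    def reset(self) -> None:
        """Start a new session with a full budget."""
        self.used = 0
```

This sketch treats usage equal to the threshold as still allowed; whether PolicyForge uses a strict or inclusive comparison is not stated here.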

4. Content Classifier (`content_classifier`)

ML-based content classification using HuggingFace transformers (optional dependency).

rules:
  - id: toxicity-check
    type: content_classifier
    stage: post_flight
    model: "unitary/toxic-bert"
    threshold: 0.7
    labels: [toxic, hate, violence]
    action:
      on_match: block
      fallback: "Response held for quality reasons."

Set model: "mock" for testing without downloading models.
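The interaction between `threshold` and `labels` can be illustrated without loading a model. The sketch below assumes the classifier yields one score per label and that a score at or above the threshold counts as a match; both are assumptions, and `flagged_labels` is a hypothetical helper.

```python
def flagged_labels(scores: dict[str, float], labels: list[str], threshold: float) -> list[str]:
    """Return the watched labels whose score meets or exceeds the threshold."""
    return [label for label in labels if scores.get(label, 0.0) >= threshold]


# Made-up scores standing in for model output.
mock_scores = {"toxic": 0.91, "hate": 0.12, "violence": 0.70}
print(flagged_labels(mock_scores, ["toxic", "hate", "violence"], 0.7))
```

With these made-up scores, `toxic` and `violence` are flagged and `hate` is not, so a rule with `on_match: block` would stop the response.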

5. Audit Logging (`audit`)

rules:
  - id: eu-ai-act-logging
    type: audit
    stage: all
    requirements:
      log_all_decisions: true
      retention_days: 365
      gdpr_exportable: true

All policy decisions are logged via structlog in structured JSONL format.

Real-World Example: Customer Service Agent

The customer service policy combines all rule types. The outline below is abbreviated shorthand rather than literal YAML:

name: "Customer Service Agent"
version: "1.2.0"
rules:
  - id: no-pii-output          # Redact emails, phones, IBANs in output
    type: content_filter / pii_detector
    action: redact

  - id: no-credit-cards         # Block credit card numbers entirely
    type: content_filter / regex + pii_detector
    action: block

  - id: token-budget            # 500K token cap per session
    type: resource_limit
    action: terminate

  - id: approved-tools-only     # Only 4 tools allowed
    type: tool_gate

  - id: toxicity-check          # ML-based toxicity filter
    type: content_classifier / unitary/toxic-bert

  - id: eu-ai-act-logging       # EU AI Act Art. 12 compliance
    type: audit / 365d retention

Integration with Agent Frameworks

PolicyForge is framework-agnostic. The @engine.guard decorator wraps any callable — integrate it wherever your agent calls tools or returns output.

Agno Integration

Agno provides a BaseGuardrail class with a check() method and pre_hooks/post_hooks on the Agent class. The cleanest integration is a custom guardrail subclass:

from agno.agent import Agent
from agno.guardrails.base import BaseGuardrail, CheckTrigger
from policyforge import PolicyEngine, GuardViolation

engine = PolicyEngine.from_yaml("my_policy.yaml")

class PolicyForgeGuardrail(BaseGuardrail):
    """Bridges PolicyForge policies into Agno's guardrail system."""

    def check(self, text: str, trigger: CheckTrigger) -> str:
        """Run PolicyForge rules against agent output."""
        try:
            # PolicyForge evaluates all post_flight rules
            return engine._eval_post_flight(text)  # sanitized text (may be redacted)
        except GuardViolation as e:
            # A blocking rule matched: replace the output with a notice.
            # Other exceptions propagate so real bugs are not masked as policy blocks.
            return f"[BLOCKED by policy: {e}]"

# Register with Agno agent
agent = Agent(
    name="Safe Assistant",
    model="gpt-4o",
    tools=[search, calculate],
    post_hooks=[PolicyForgeGuardrail()],
)

For tool-level enforcement, wrap tools with the decorator:

from agno.agent import Agent
from policyforge import PolicyEngine

engine = PolicyEngine.from_yaml("my_policy.yaml")

@engine.guard
def search(query: str) -> str:
    return web_search(query)

agent = Agent(
    tools=[search],  # guarded at the function level
)

Synergy: Agno's permission_mode + allowed_tools provides basic allowlisting, but PolicyForge adds regex content filtering, PII redaction, input guards, and structured audit logging — all defined in version-controlled YAML.

CrewAI Integration

CrewAI offers @before_kickoff and @after_kickoff decorators on Crews, and tools are plain Python functions.

Option A — Wrap individual tools:

from crewai import Agent, Task, Crew
from policyforge import PolicyEngine

engine = PolicyEngine.from_yaml("my_policy.yaml")

@engine.guard
def database_query(sql: str) -> str:
    return db.execute(sql)

crew_agent = Agent(
    role="Data Analyst",
    tools=[database_query],  # guarded
)

Option B — Crew-level hooks:

from crewai import Crew
from policyforge import PolicyEngine

engine = PolicyEngine.from_yaml("my_policy.yaml")

crew = Crew(agents=[...], tasks=[...])

@crew.before_kickoff
def check_inputs(inputs: dict) -> dict:
    for key, value in inputs.items():
        if isinstance(value, str):
            engine._eval_pre_flight_input(value)
    return inputs

@crew.after_kickoff
def filter_outputs(output: str) -> str:
    return engine._eval_post_flight(output)

LangChain Integration

Wrap LangChain tools with the @guard decorator, or insert a RunnableLambda into the chain:

Option A — Guard individual tools:

from langchain_core.tools import tool
from policyforge import PolicyEngine

engine = PolicyEngine.from_yaml("my_policy.yaml")

@tool
@engine.guard
def send_email(recipient: str, body: str) -> str:
    """Send an email to the given recipient."""  # @tool uses the docstring as the tool description
    return mailer.send(recipient, body)

Option B — Chain-level filter:

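from langchain_core.runnables import RunnableLambda
from policyforge import PolicyEngine

engine = PolicyEngine.from_yaml("my_policy.yaml")

def safety_filter(text: str) -> str:
    # Raises GuardViolation if the input trips a pre_flight rule
    engine._eval_pre_flight_input(text)
    return text

# `prompt` and `llm` are your existing prompt template and model
chain = (
    RunnableLambda(safety_filter)
    | prompt
    | llm
    | RunnableLambda(engine._eval_post_flight)
)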

Other Frameworks

PolicyForge works with any Python-based agent framework:

AutoGen: wrap AssistantAgent tools with @engine.guard
Semantic Kernel: guard @kernel.function decorated methods
DSPy: insert engine._eval_post_flight as an output processor
Custom agents: wrap any def tool(...) with @engine.guard

Programmatic API

from policyforge import PolicyEngine, GuardViolation
from policyforge.evaluators import (
    ContentFilterEvaluator,
    PIIDetectorEvaluator,
    InputGuardEvaluator,
    ToolGateEvaluator,
    ResourceLimitTracker,
    ContentClassifierEvaluator,
)

# Load policy
engine = PolicyEngine.from_yaml("policy.yaml")       # from file
engine = PolicyEngine.from_yaml(yaml_string)          # from string

# Manual evaluation (without decorator)
engine._eval_pre_flight("tool_name")                  # check tool allowlist
engine._eval_pre_flight_input("user input text")      # check for injection
engine._eval_post_flight("agent output")              # filter/redact output

# All checks raise GuardViolation on block
try:
    engine._eval_post_flight("secret output")
except GuardViolation as e:
    print(f"Blocked by {e.rule_id}: {e.message}")
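Because blocking checks raise, callers that would rather show a fallback message than propagate the error can wrap the check. The sketch below is one possible pattern under stated assumptions: `GuardViolation` here is a local stand-in, and `with_fallback` and `no_secrets` are hypothetical helpers, not PolicyForge API.

```python
import re
from typing import Callable


class GuardViolation(Exception):
    """Stand-in for policyforge.GuardViolation."""


def no_secrets(text: str) -> str:
    """Hypothetical check: raise if the text contains an API key marker."""
    if re.search(r"API_KEY_[A-Z0-9]+", text):
        raise GuardViolation("no-secrets")
    return text


def with_fallback(check: Callable[[str], str], text: str, fallback: str) -> str:
    """Return the checked text, or the fallback message if the check blocks it."""
    try:
        return check(text)
    except GuardViolation:
        return fallback


print(with_fallback(no_secrets, "Here is API_KEY_12345", "Secret removed by policy"))
```

In real code, `check` could be any callable that raises `GuardViolation` on a block, with the fallback string taken from the rule's `fallback` field.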

License

MIT. See LICENSE file.
