FBR65/PolicyForge

PolicyForge 🔒

Compliance-as-Code for AI Agents. Define safety policies in YAML. Enforce them at runtime. Audit everything.


PolicyForge is a lightweight, framework-agnostic policy engine for AI agent systems. It puts a three-layer safety net around every agent interaction — pre-flight, in-flight, and post-flight — with declarative YAML policies that anyone on the team can read, review, and version-control.

Why PolicyForge?

Agent frameworks give you powerful, autonomous systems that can browse the web, execute code, send emails, and query databases. But with power comes risk:

| Risk | PolicyForge Rule Type | Stage |
|------|-----------------------|-------|
| Agent calls rm -rf / | tool_gate | pre_flight |
| User injects "ignore all rules" | content_filter (input_guard) | pre_flight |
| Output leaks customer emails | content_filter (pii_detector) | post_flight |
| Agent burns $50 in API calls | resource_limit | in_flight |
| Output contains toxic content | content_classifier | post_flight |
| API key appears in response | content_filter (regex) | post_flight |
| EU AI Act requires audit trail | audit | all |

PolicyForge checks all of these declaratively and auditably, without changes to the body of your agent code.

Architecture

                   ┌──────────────────────┐
                   │    Agent Framework   │
                   │ (Agno / CrewAI / LC) │
                   └──────────┬───────────┘
                              │
                    ┌─────────▼─────────┐
                    │    PolicyForge    │
                    │                   │
User Input ────────►│ 1. Pre-Flight     │  Tool gate, input guard
                    │ 2. In-Flight      │  Token budget, rate limit
                    │ 3. Post-Flight    │  PII detection, content
                    │                   │    filter, classifier
                    └─────────┬─────────┘
                              │
                    ┌─────────▼─────────┐
                    │   Audit Trail     │
                    │   (structlog)     │
                    └───────────────────┘

Every decision (pass, block, redact, or terminate) is logged with the rule ID, matched pattern, and timestamp, which supports the record-keeping requirements of EU AI Act Article 12, SOC 2, and ISO 27001.
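The stage ordering in the diagram can be illustrated with a small standalone sketch. It does not use PolicyForge: `run_guarded`, `no_override`, and `no_secrets` are hypothetical names, the in-flight stage is left out for brevity, and the audit list only records checks that passed.

```python
from typing import Callable

Check = Callable[[str], None]  # a check raises an exception to block


def run_guarded(
    user_input: str,
    agent: Callable[[str], str],
    pre_flight: list[Check],
    post_flight: list[Check],
    audit: list[tuple[str, str]],
) -> str:
    """Run pre-flight checks, call the agent, then run post-flight checks."""
    for check in pre_flight:
        check(user_input)  # inspects the input before the agent runs
        audit.append(("pre_flight", check.__name__))
    output = agent(user_input)
    for check in post_flight:
        check(output)  # inspects the output before it is returned
        audit.append(("post_flight", check.__name__))
    return output


def no_override(text: str) -> None:
    if "ignore all rules" in text.lower():
        raise ValueError("input blocked")


def no_secrets(text: str) -> None:
    if "API_KEY_" in text:
        raise ValueError("output blocked")


trail: list[tuple[str, str]] = []
reply = run_guarded("hello", str.upper, [no_override], [no_secrets], trail)
```

Here `str.upper` stands in for the agent; a blocked input never reaches it because the pre-flight loop raises first.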

Installation

pip install policyforge

Requirements: Python ≥ 3.11. Optional: transformers, torch (for ML-based content classification).

Quickstart (30 seconds)

1. Write a policy

# my_policy.yaml
name: "Safe Agent"
version: "1.0.0"
rules:
  - id: no-secrets
    description: Never expose API keys in output
    type: content_filter
    stage: post_flight
    patterns:
      - type: regex
        value: "API_KEY_[A-Z0-9]+"
    action:
      on_match: block
      fallback: "🔒 Secret removed by policy"

2. Guard your agent

from policyforge import PolicyEngine

engine = PolicyEngine.from_yaml("my_policy.yaml")

@engine.guard
def answer_question(query: str) -> str:
    return llm.generate(query)  # llm: your own model client (placeholder)

answer_question("What is the weather?")        # ✅ passes
answer_question("Print API_KEY_12345")         # ❌ raises GuardViolation

That's it. No framework lock-in. The @engine.guard decorator wraps any Python function — tool, callback, or pipeline step.
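To make the decorator's behaviour concrete, here is a minimal standalone sketch of how a post-flight guard could wrap a function. It is not PolicyForge's implementation: `make_guard`, `SketchViolation`, and `fake_llm` are hypothetical names invented for illustration, and the real engine's internals may differ.

```python
import functools
import re


class SketchViolation(Exception):
    """Hypothetical stand-in for a policy violation error."""


def make_guard(pattern: str):
    """Build a decorator that blocks return values matching `pattern`."""
    compiled = re.compile(pattern)

    def guard(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            # Post-flight check: inspect the output after the call returns.
            if isinstance(result, str) and compiled.search(result):
                raise SketchViolation(f"output matched {pattern!r}")
            return result

        return wrapper

    return guard


@make_guard(r"API_KEY_[A-Z0-9]+")
def fake_llm(query: str) -> str:
    """Echo the query, standing in for a model call."""
    return f"You asked: {query}"
```

Because the check runs on the return value, the wrapped function still executes in full; only its output is withheld when the pattern matches.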

Features (v0.2)

| Feature | Type | Stage | Description |
|---------|------|-------|-------------|
| Tool Gate | tool_gate | pre_flight | Allowlist-based tool access control |
| Input Guard | content_filter (input_guard) | pre_flight | Detects instruction-manipulation patterns |
| Content Filter | content_filter (regex) | post_flight | Blocks or redacts text matching regex patterns |
| PII Detection | content_filter (pii_detector) | post_flight | Detects emails, phones, SSNs, credit cards, IBANs |
| PII Redaction | content_filter (pii_detector) | post_flight | Masks PII instead of blocking (action: redact) |
| Resource Limit | resource_limit | in_flight | Token budget / API call rate tracking |
| Content Classifier | content_classifier | post_flight | ML-based toxicity/harm detection (HuggingFace) |
| Audit Logging | audit | all | Structured JSONL audit trails with GDPR export |
| YAML Policies | | | Declarative, version-controlled, human-readable |

Rule Types in Detail

1. Content Filter (content_filter)

Three pattern types:

Regex patterns

patterns:
  - type: regex
    value: "AKIA[0-9A-Z]{16}"     # AWS access key pattern
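Before putting a regex into a policy, it can help to check what it matches using Python's `re` module directly. The sketch below tests the same expression outside PolicyForge; `contains_aws_key` is a hypothetical helper, and the sample value is the example key ID from AWS documentation, not a real credential.

```python
import re

# Same expression as in the policy snippet above.
AWS_KEY = re.compile(r"AKIA[0-9A-Z]{16}")


def contains_aws_key(text: str) -> bool:
    """Return True if the text contains something shaped like an AWS access key ID."""
    return AWS_KEY.search(text) is not None


# Example key ID taken from AWS documentation (not a real credential).
sample = "key=AKIAIOSFODNN7EXAMPLE"
```

The pattern is case-sensitive and needs exactly 16 characters after the `AKIA` prefix, so shorter or lowercase strings do not match.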

PII detection patterns

patterns:
  - type: pii_detector
    entities: [email, phone, credit_card, ssn, iban]

Available entities: email, phone, credit_card, ssn, iban.

Input guard patterns

patterns:
  - type: input_guard

Detects instruction-manipulation patterns like "ignore previous instructions", "developer mode", "system prompt leak". Uses a curated keyword list plus regex heuristics. Supports custom patterns:

from policyforge.evaluators.input_guard import InputGuardEvaluator

evaluator = InputGuardEvaluator(
    custom_patterns=["secret backdoor", "override safety"]
)
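The combination of a keyword list and regex heuristics can be sketched in plain Python. This is an illustration of the general approach only, not the library's detector: `looks_like_injection`, `KEYWORDS`, and `HEURISTICS` are hypothetical, and the phrases are a tiny sample rather than the curated list.

```python
import re

# Illustrative phrases only; the library's curated list is not reproduced here.
KEYWORDS = ["ignore previous instructions", "developer mode"]
HEURISTICS = [
    re.compile(r"\b(reveal|print|leak)\b.*\bsystem prompt\b", re.IGNORECASE),
]


def looks_like_injection(text: str, custom_patterns: tuple[str, ...] = ()) -> bool:
    """Return True if the text matches a keyword, custom phrase, or regex heuristic."""
    lowered = text.lower()
    phrases = [*KEYWORDS, *(p.lower() for p in custom_patterns)]
    if any(phrase in lowered for phrase in phrases):
        return True
    return any(rx.search(text) for rx in HEURISTICS)
```

Substring matching like this is cheap but easy to evade with paraphrasing, so it is best treated as one layer among several.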

Actions

| Action | Behavior |
|--------|----------|
| block | Raises GuardViolation, stops execution |
| redact | Masks matched PII with [REDACTED:entity], returns sanitized text |
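The redact action can be pictured as a substitution pass over the output. The sketch below is a standalone approximation with two deliberately simple patterns; `PII_PATTERNS` and `redact` are hypothetical names, and the library's actual detectors are likely stricter.

```python
import re

# Deliberately simple expressions for two entity types; real detectors need
# stricter validation (for example checksum tests for card numbers).
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace each detected entity with a [REDACTED:<entity>] marker."""
    for entity, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{entity}]", text)
    return text
```

Unlike block, this returns usable text, which suits cases where the rest of the response is still valuable to the user.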

2. Tool Gate (tool_gate)

rules:
  - id: approved-tools-only
    type: tool_gate
    stage: pre_flight
    allowed_tools: [search_kb, create_ticket, send_email, calculate]
    action:
      on_blocked: deny_with_explanation

The tool gate fires before the function body executes. If the function name isn't in allowed_tools, a GuardViolation is raised with the blocked tool name.
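A name-based allowlist check that runs before the function body can be sketched as a decorator. This is an assumption-laden illustration, not the library's code: `tool_gate`, `ToolBlocked`, and the two example tools are hypothetical.

```python
import functools


class ToolBlocked(Exception):
    """Hypothetical error raised when a tool is not on the allowlist."""


def tool_gate(allowed_tools: set[str]):
    """Decorator factory: permit a call only if the function name is allowlisted."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Pre-flight check: runs before the function body.
            if func.__name__ not in allowed_tools:
                raise ToolBlocked(f"tool {func.__name__!r} is not allowed")
            return func(*args, **kwargs)

        return wrapper

    return decorator


gate = tool_gate({"search_kb", "calculate"})
calls = []  # records which function bodies actually ran


@gate
def calculate(a: int, b: int) -> int:
    calls.append("calculate")
    return a + b


@gate
def delete_files(path: str) -> None:
    calls.append("delete_files")
```

The `calls` list shows the key property: the body of a blocked tool never runs, so it has no side effects.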

3. Resource Limit (resource_limit)

rules:
  - id: token-budget
    type: resource_limit
    stage: in_flight
    threshold: 500000
    action:
      on_exceeded: terminate
      error_message: "Token budget exhausted. Start a new session."

Programmatic API:

from policyforge.evaluators.resource_limit import ResourceLimitTracker

tracker = ResourceLimitTracker()
tracker.consume(1500)              # track usage
tracker.check(rule, current)      # returns EvalResult
tracker.reset()                   # reset for new session
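The consume, check, reset cycle can be illustrated with a small standalone counter. `BudgetTracker` below is a hypothetical stand-in, not `ResourceLimitTracker`; in particular, treating the budget as exceeded only once usage goes strictly past the threshold is an assumption.

```python
from dataclasses import dataclass


@dataclass
class BudgetTracker:
    """Hypothetical session-level token counter with a hard threshold."""

    threshold: int
    used: int = 0

    def consume(self, tokens: int) -> None:
        """Record tokens spent by one call."""
        self.used += tokens

    def exceeded(self) -> bool:
        """True once usage has gone past the threshold."""
        return self.used > self.threshold

    def reset(self) -> None:
        """Start a new session with a zeroed counter."""
        self.used = 0
```

A caller would typically invoke `consume` after each model call and test `exceeded` before starting the next one.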

4. Content Classifier (content_classifier)

ML-based content classification using HuggingFace transformers (optional dependency).

rules:
  - id: toxicity-check
    type: content_classifier
    stage: post_flight
    model: "unitary/toxic-bert"
    threshold: 0.7
    labels: [toxic, hate, violence]
    action:
      on_match: block
      fallback: "Response held for quality reasons."

Set model: "mock" for testing without downloading models.
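The interaction of `threshold` and `labels` can be sketched without any model. The function below assumes a rule blocks when any watched label scores at or above the threshold; whether the library compares with `>=` or `>` is not stated in this README, and `should_block` and `mock_scores` are hypothetical.

```python
def should_block(scores: dict[str, float], labels: list[str], threshold: float) -> bool:
    """Block if any watched label scores at or above the threshold."""
    return any(scores.get(label, 0.0) >= threshold for label in labels)


# Hand-written scores standing in for a classifier's output.
mock_scores = {"toxic": 0.82, "hate": 0.10, "violence": 0.05}
```

Labels missing from the scores are treated as 0.0 here, so an unfamiliar label name fails open rather than raising.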

5. Audit Logging (audit)

rules:
  - id: eu-ai-act-logging
    type: audit
    stage: all
    requirements:
      log_all_decisions: true
      retention_days: 365
      gdpr_exportable: true

All policy decisions are logged via structlog in structured JSONL format.

Real-World Example: Customer Service Agent

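The customer service policy combines all rule types. The listing below is a condensed overview, not literal policy syntax (compare the full rule schemas in the sections above):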

name: "Customer Service Agent"
version: "1.2.0"
rules:
  - id: no-pii-output          # Redact emails, phones, IBANs in output
    type: content_filter / pii_detector
    action: redact

  - id: no-credit-cards         # Block credit card numbers entirely
    type: content_filter / regex + pii_detector
    action: block

  - id: token-budget            # 500K token cap per session
    type: resource_limit
    action: terminate

  - id: approved-tools-only     # Only 4 tools allowed
    type: tool_gate

  - id: toxicity-check          # ML-based toxicity filter
    type: content_classifier / unitary/toxic-bert

  - id: eu-ai-act-logging       # EU AI Act Art. 12 compliance
    type: audit / 365d retention

Integration with Agent Frameworks

PolicyForge is framework-agnostic. The @engine.guard decorator wraps any callable — integrate it wherever your agent calls tools or returns output.

Agno Integration

Agno provides a BaseGuardrail class with a check() method and pre_hooks/post_hooks on the Agent class. The cleanest integration is a custom guardrail subclass:

from agno.agent import Agent
from agno.guardrails.base import BaseGuardrail, CheckTrigger
from policyforge import GuardViolation, PolicyEngine

engine = PolicyEngine.from_yaml("my_policy.yaml")

class PolicyForgeGuardrail(BaseGuardrail):
    """Bridges PolicyForge policies into Agno's guardrail system."""

    def check(self, text: str, trigger: CheckTrigger) -> str:
        """Run PolicyForge rules against agent output."""
        try:
            # PolicyForge evaluates all post_flight rules and returns
            # the sanitized text (may be redacted).
            return engine._eval_post_flight(text)
        except GuardViolation as e:
            # A blocking rule matched: replace the output with a notice.
            # Other exceptions propagate so real bugs are not hidden.
            return f"[BLOCKED by policy: {e}]"

# Register with Agno agent
agent = Agent(
    name="Safe Assistant",
    model="gpt-4o",
    tools=[search, calculate],
    post_hooks=[PolicyForgeGuardrail()],
)

For tool-level enforcement, wrap tools with the decorator:

from agno.agent import Agent
from policyforge import PolicyEngine

engine = PolicyEngine.from_yaml("my_policy.yaml")

@engine.guard
def search(query: str) -> str:
    return web_search(query)  # web_search: your own search helper (placeholder)

agent = Agent(
    tools=[search],  # guarded at the function level
)

Synergy: Agno's permission_mode and allowed_tools provide basic allowlisting. PolicyForge adds regex content filtering, PII redaction, input guards, and structured audit logging, all defined in version-controlled YAML.

CrewAI Integration

CrewAI offers @before_kickoff and @after_kickoff decorators on Crews, and tools are plain Python functions.

Option A — Wrap individual tools:

from crewai import Agent, Task, Crew
from policyforge import PolicyEngine

engine = PolicyEngine.from_yaml("my_policy.yaml")

@engine.guard
def database_query(sql: str) -> str:
    return db.execute(sql)

crew_agent = Agent(
    role="Data Analyst",
    tools=[database_query],  # guarded
)

Option B — Crew-level hooks:

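from crewai import Crew

# Reuses the `engine` created in Option A.
# The @before_kickoff / @after_kickoff decorators apply to methods of a
# @CrewBase class. For a Crew built directly, this sketch assumes a CrewAI
# version whose Crew accepts before_kickoff_callbacks and
# after_kickoff_callbacks; check the signature of your installed version.

def check_inputs(inputs: dict) -> dict:
    for value in inputs.values():
        if isinstance(value, str):
            engine._eval_pre_flight_input(value)  # raises GuardViolation on a match
    return inputs

def filter_outputs(output):
    # The after-kickoff hook receives a CrewOutput; its text is in `.raw`.
    output.raw = engine._eval_post_flight(output.raw)
    return output

crew = Crew(
    agents=[...],
    tasks=[...],
    before_kickoff_callbacks=[check_inputs],
    after_kickoff_callbacks=[filter_outputs],
)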

LangChain Integration

Wrap LangChain tools with the @guard decorator, or insert a RunnableLambda into the chain:

Option A — Guard individual tools:

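from langchain_core.tools import tool
from policyforge import PolicyEngine

engine = PolicyEngine.from_yaml("my_policy.yaml")

# Assumes engine.guard preserves the wrapped function's signature and
# docstring (for example via functools.wraps), which @tool relies on.
@tool
@engine.guard
def send_email(recipient: str, body: str) -> str:
    """Send an email to the recipient and return the mailer's status."""
    return mailer.send(recipient, body)  # mailer: your own mail client (placeholder)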

Option B — Chain-level filter:

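from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda

# `engine`, `prompt`, and `llm` are assumed to be defined elsewhere.

def safety_filter(text: str) -> str:
    engine._eval_pre_flight_input(text)  # raises GuardViolation on a match
    return text

chain = (
    RunnableLambda(safety_filter)
    | prompt
    | llm
    | StrOutputParser()  # chat models return a message object, not a str
    | RunnableLambda(engine._eval_post_flight)
)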

Other Frameworks

PolicyForge works with any Python-based agent framework:

  • AutoGen: wrap AssistantAgent tools with @engine.guard
  • Semantic Kernel: guard @kernel.function decorated methods
  • DSPy: insert engine._eval_post_flight as an output processor
  • Custom agents: wrap any def tool(...) with @engine.guard

Programmatic API

from policyforge import PolicyEngine, GuardViolation
from policyforge.evaluators import (
    ContentFilterEvaluator,
    PIIDetectorEvaluator,
    InputGuardEvaluator,
    ToolGateEvaluator,
    ResourceLimitTracker,
    ContentClassifierEvaluator,
)

# Load policy
engine = PolicyEngine.from_yaml("policy.yaml")       # from file
engine = PolicyEngine.from_yaml(yaml_string)          # from string

# Manual evaluation (without decorator)
engine._eval_pre_flight("tool_name")                  # check tool allowlist
engine._eval_pre_flight_input("user input text")      # check for injection
engine._eval_post_flight("agent output")              # filter/redact output

# All checks raise GuardViolation on block
try:
    engine._eval_post_flight("secret output")
except GuardViolation as e:
    print(f"Blocked by {e.rule_id}: {e.message}")

License

MIT. See LICENSE file.
