8 changes: 8 additions & 0 deletions .claude-plugin/marketplace.json
@@ -12,6 +12,14 @@
"version": "0.1.0",
"category": "ruby",
"keywords": ["ruby", "bundler", "gem", "dependencies"]
},
{
"name": "security",
"source": "./plugins/security",
"description": "White-box, dynamically-verified security audit. /security:audit recons a repo, hunts OWASP Top 10:2025 vulnerabilities, proves them with live PoCs in isolated worktrees, and writes a high-signal senior-engineer report.",
"version": "0.1.0",
"category": "security",
"keywords": ["security", "pentest", "vulnerability", "audit", "owasp", "appsec"]
}
]
}
1 change: 1 addition & 0 deletions README.md
@@ -25,6 +25,7 @@ If the plugin's commands don't show up in the `/` menu, run `/reload-plugins`.
| Plugin | Description |
| --- | --- |
| [gem](plugins/gem) | Ruby gem helpers. Includes `/gem:bump` for changelog-rich dependency bumps. |
| [security](plugins/security) | Dynamically-verified security audit. `/security:audit` proves vulnerabilities with live PoCs and writes a senior-engineer report. |

## Developing plugins

13 changes: 13 additions & 0 deletions plugins/security/.claude-plugin/plugin.json
@@ -0,0 +1,13 @@
{
"name": "security",
"version": "0.1.0",
"description": "White-box, dynamically-verified security audit. /security:audit recons a repo, hunts vulnerabilities across the OWASP Top 10:2025 classes, proves them with live PoCs in isolated worktrees, and writes a high-signal senior-engineer report.",
"author": {
"name": "84codes",
"url": "https://github.com/84codes"
},
"homepage": "https://github.com/84codes/claude-plugins/tree/main/plugins/security",
"repository": "https://github.com/84codes/claude-plugins",
"license": "MIT",
"keywords": ["security", "pentest", "vulnerability", "audit", "owasp", "appsec", "sast"]
}
197 changes: 197 additions & 0 deletions plugins/security/AGENTS.md
@@ -0,0 +1,197 @@
# vuln-audit — agent & design spec

A Claude Code **skill + workflow** that runs a white-box, dynamically-verified
security audit of a target repository: a
multi-phase pipeline (recon → triage → deep review → adversarial verify →
dynamic repro → report) that produces **proven, high-signal findings with
patches**, not speculative noise.

> Read this file before touching the workflow or prompts. It is the source of
> truth for the data contracts, taxonomy, severity model, and signal policy.

## Invocation

```
/security:audit /path/to/target-repo [--no-dynamic] [--classes injection,ssrf] [--ref <ref>] [--out <dir>]
```

The skill (`skills/audit/SKILL.md`) is the agent-facing entry point. It parses the
target, picks a writable `outDir`, preflights host capabilities, then calls the
workflow (`workflows/vuln-audit.js`) with everything assembled in `args` —
`toolRoot` = `${CLAUDE_PLUGIN_ROOT}` (read-only, holds the prompts), `outDir` =
where the bundle is written.

## Pipeline

| Phase | What | Primitive |
|-------|------|-----------|
| 1. Recon | Detect stack, map attack surface & trust boundaries, pick run strategy, select relevant finder classes | single agent (`prompts/recon.md`) |
| 2. Triage | One finder per vuln class scans its surface, emits candidate findings | `parallel()` finders |
| 3. Dedup | Collapse same-root-cause findings across call sites | plain JS in the workflow |
| 4. Deep review | Re-examine each candidate with surrounding context (callers, sanitizers, related files); confirm a reachable source→sink path | `pipeline()` stage |
| 5. Adversarial verify | Independent skeptics, each a distinct lens, try to **refute** the finding; majority-refute kills it | `parallel()` skeptic panel |
| 6. Dynamic repro | Survivors are built & run in an isolated git **worktree** of the target (docker-first); a real PoC is fired and impact observed | `agent()` that creates its own `git worktree` of the target |
| 7. Report | Synthesize the senior-engineer report (`prompts/report-template.md`) | single agent |

## Reference evaluation (why we adopt what we adopt)

- **Anthropic security-guidance** (`code.claude.com/docs/en/security-guidance`)
— **adopt methodology.** Validates our core moves: (a) review independence —
the reviewer is a *fresh-context* agent, never the author, "instructed only to
find problems"; (b) read callers/sanitizers/related files before reporting to
keep false positives low. Our tool is the deepest layer: in-session plugin →
`/security-review` (branch) → Code Review (PR) → **vuln-audit (on-demand,
dynamically verified PoCs)**. We honor its extension convention: if the target
has a `.claude/claude-security-guidance.md`, we load it as extra threat-model
context.
- **OWASP Top 10:2025** — **adopt as primary taxonomy.** Current edition; new
categories A03 Software Supply Chain Failures and A10 Mishandling of
Exceptional Conditions; SSRF folded into A01. Every finder maps to a 2025 ID.
- **OWASP ASVS v5.0** (17 chapters, ~350 reqs) — **reference only, not a walked
checklist.** Walking 350 requirements is exactly the low-signal sidetrack we
avoid. Used two ways: (a) coverage map so the finder taxonomy has no blind
spots; (b) cite a requirement/chapter ID in findings as a terse, authoritative
reference for senior readers.
- **OSSF Scorecard** — **partial adopt, code-exploitable checks only.** Scorecard
scores project *hygiene/posture* (Maintained, License, SBOM, Security-Policy,
Contributors) — out of scope for findings. But its CI/CD checks ARE real
exploitable issues and feed our `supply-chain` finder: Dangerous-Workflow
(`pull_request_target` + untrusted checkout, `${{ }}` script injection),
Token-Permissions (over-broad `GITHUB_TOKEN`), Pinned-Dependencies (unpinned
actions/deps), Vulnerabilities (known-vuln deps via OSV). Posture/process
checks are relegated to the Info appendix, never the high-priority body.

## Vuln-class taxonomy (finders)

Each maps to OWASP Top 10:2025 + CWE + an ASVS v5.0 chapter. One prompt file per
class under `prompts/finders/<key>.md`.

| key | title | OWASP 2025 | ASVS |
|-----|-------|-----------|------|
| access-control | Broken Access Control & IDOR | A01 | V8 |
| ssrf | Server-Side Request Forgery | A01 | V4 |
| injection | Injection (SQL/NoSQL/OS/LDAP) | A05 | V1/V2 |
| xss-ssti | XSS & Template Injection | A05 | V1/V3 |
| auth-session | Authentication & Session | A07 | V6/V7/V9/V10 |
| crypto | Cryptographic Failures | A04 | V11 |
| deserialization | Insecure Deserialization & Integrity | A08 | V2/V15 |
| path-file | Path Traversal & File Handling | A01 | V5 |
| secrets | Hardcoded Secrets & Credentials | A02 | V14 |
| misconfig | Security Misconfiguration | A02 | V13 |
| supply-chain | Software Supply Chain & CI/CD | A03 | V15 |
| logging-errors | Logging, Error & Exception Handling | A09/A10 | V16 |
| dos-redos | Denial of Service & ReDoS | A06 | V2 |
| csrf-cors | CSRF, CORS & Clickjacking | A01 | V3 |

Insecure Design (A06) as a whole is cross-cutting and is handled in
recon/synthesis rather than by a grep-able finder; `dos-redos` covers only its
DoS/ReDoS slice.

## Data contracts

### Finding (finders + deep review)
`id` · `title` · `vuln_class` · `owasp` (A0x:2025) · `cwe` · `asvs` ·
`severity` (critical|high|medium|low|info) · `status` (confirmed|likely|triage) ·
`confidence` (low|medium|high) · `file` · `line` · `end_line` ·
`code_excerpt` · `source` (untrusted origin) · `sink` (dangerous op) ·
`data_flow` (source→sink, sanitizers noted) · `sanitizers_checked` (mitigations
verified absent/ineffective — the FP guard) · `rationale` · `exploit_sketch` ·
`dynamic_poc_plan` · `proposed_fix` (high-level direction of the change, not a
patch — implementation is left to whoever takes the issue).

After the pipeline, each finding is also stamped with `fp` (stable fingerprint =
`djb2(vuln_class | file | sink)`, the cross-scan dedup key), `display_id`
(`<slug>-<CLASS>-<fp4>`, provisional until the courier swaps in the GitHub issue
number), `status`, `kept`, `reject_reason`, `verdicts`, and `repro`.
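
A minimal sketch of how the fingerprint and the provisional `display_id` could be derived. Only the inputs (`vuln_class`, `file`, `sink`) and the use of djb2 come from the contract above; the `|` joiner, the 32-bit hex encoding, the upper-casing of the class key, and the function names are assumptions for illustration, not the code in `vuln-audit.js`.

```javascript
// Classic djb2 string hash (hash * 33 + charCode), kept in unsigned 32-bit range.
function djb2(str) {
  let hash = 5381;
  for (let i = 0; i < str.length; i++) {
    hash = ((hash << 5) + hash + str.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Assumption: fields are joined with "|" and the hash is rendered as 8 hex chars.
function fingerprint(finding) {
  const key = [finding.vuln_class, finding.file, finding.sink].join('|');
  return djb2(key).toString(16).padStart(8, '0');
}

// Assumption: <fp4> is the first four hex chars and <CLASS> is the upper-cased key.
function provisionalDisplayId(slug, finding) {
  const fp4 = fingerprint(finding).slice(0, 4);
  return `${slug}-${finding.vuln_class.toUpperCase()}-${fp4}`;
}
```

Because `line` is not part of the key, a finding keeps the same `fp` when unrelated edits shift it within the file, which is what makes it usable as a cross-scan dedup key.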

### Verdict (adversarial verify)
`finding_id` · `lens` · `refuted` (bool) · `confidence` · `reasoning`.
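
As a rough illustration of the majority-refute rule applied to a panel of verdicts, the sketch below counts `refuted` votes. The helper name is hypothetical, and treating a tie as survival (so the finding can fall through to triage) is an assumption rather than documented behavior.

```javascript
// Hypothetical helper: returns true when the finding survives the skeptic panel.
// A finding is killed only when strictly more than half of the verdicts refute it.
function survivesPanel(verdicts) {
  const refutes = verdicts.filter((v) => v.refuted === true).length;
  return refutes * 2 <= verdicts.length;
}

const panel = [
  { finding_id: 'F1', lens: 'reachability', refuted: true },
  { finding_id: 'F1', lens: 'sanitizers', refuted: false },
  { finding_id: 'F1', lens: 'authz', refuted: false },
];
const kept = survivesPanel(panel); // one refutation out of three does not kill it
```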

### Repro (dynamic verify)
`finding_id` · `reproduced` (bool) · `method`
(live-exploit|unit-test|build-only|static-poc) · `environment` ·
`setup_commands` · `poc` · `observed` (evidence) · `impact` · `notes`.

## Severity model (exploitability × impact)

- **Critical** — remote, unauth → RCE / full data breach / auth bypass; reachable.
- **High** — low barrier (authenticated or realistic conditions); significant
impact (priv-esc, sensitive data, injection with a real sink).
- **Medium** — unusual conditions or limited impact, or partial mitigations.
- **Low** — minor info leak, defense-in-depth gap, hard to exploit.
- **Info** — hygiene/posture, no direct exploit path.

`status` is orthogonal and drives report placement: **confirmed** (dynamically
reproduced or statically proven + survived verify), **likely** (strong proof, no
live repro), **triage** (unverified / split verdicts). Only confirmed+likely go
in the report body; triage goes to an appendix.
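
The placement rule can be sketched as a simple partition on `status`. The function name and the severity-first sort inside each section are assumptions for illustration; only the confirmed/likely versus triage split comes from the text above.

```javascript
const SEVERITY_ORDER = ['critical', 'high', 'medium', 'low', 'info'];

// Hypothetical helper: splits findings into report body and triage appendix.
function placeFindings(findings) {
  const bySeverity = (a, b) =>
    SEVERITY_ORDER.indexOf(a.severity) - SEVERITY_ORDER.indexOf(b.severity);
  const body = findings
    .filter((f) => f.status === 'confirmed' || f.status === 'likely')
    .sort(bySeverity);
  const appendix = findings.filter((f) => f.status === 'triage').sort(bySeverity);
  return { body, appendix };
}
```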

## Signal discipline (the anti-noise contract)

The report is for senior engineers. Stay high-signal — enforced in deep review
and verify:

- Report only issues with a **reachable** path from untrusted input to a
dangerous sink. Check for sanitizers/validators/authz on the path first; if
present and effective, drop it.
- No style/lint nits. No generic "defense-in-depth" without a concrete sink. No
unreachable/dead code.
- Posture/process items (missing SECURITY.md, SBOM, license, maintainership) →
Info appendix only, never the body.
- Dedup: one finding per root cause, list N locations.
- Prefer few proven findings over many speculative ones. Every High+ finding
carries a PoC or an explicit source→sink trace.
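
The dedup rule (one finding per root cause, N locations) could look roughly like the sketch below. The grouping key (`vuln_class` plus `sink`) and the `locations` field are assumptions about what "same root cause" means here; the real workflow may key and shape this differently.

```javascript
// Hypothetical helper: collapses candidates that share a root cause into one
// finding that lists every call site.
function dedupByRootCause(candidates) {
  const groups = new Map();
  for (const f of candidates) {
    const key = `${f.vuln_class}|${f.sink}`; // assumed root-cause key
    const location = { file: f.file, line: f.line };
    const existing = groups.get(key);
    if (existing) {
      existing.locations.push(location); // same root cause, another call site
    } else {
      groups.set(key, { ...f, locations: [location] });
    }
  }
  return [...groups.values()];
}
```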

## Layout

```
.claude-plugin/plugin.json # plugin manifest (name: security)
skills/audit/SKILL.md # agent-facing orchestrator (/security:audit)
workflows/vuln-audit.js # the Workflow script (the engine)
prompts/recon.md # phase-1 recon prompt
prompts/finders/<key>.md # one finder prompt per vuln class
prompts/playbooks/<key>.md # per-ecosystem build/run/exploit playbook
prompts/report-template.md # the report format (phase 7)
docs/issue-tracking.md # output bundle → GitHub issues + naming rules
```
(The output bundle is written to a writable `outDir`, NOT into the plugin root,
which is read-only/ephemeral.)

Schemas live inline in the workflow (the JS sandbox has no filesystem access at
runtime); prose content lives in `prompts/` so it is editable without touching
the script, and is passed into the workflow via `args`.

## Output bundle (VM → courier handoff)

The scan runs on a VM and emits a self-contained **bundle** at
`<outDir>/<slug>/`; a separate "courier" agent SSHes in, fetches it, and files the
issues (the courier holds the only GitHub creds — the VM holds none). Bundle:

- `report.md` — the human report (findings referenced by `display_id`).
- `findings.json` — the structured findings array, verbatim; the machine
interface the courier reconciles against, **keyed by `fp`**.
- `manifest.json` — `{ tool, schema, repo (owner/repo), target_path, ref,
commit, slug, date, dynamic, classes_assessed, counts }`; `repo` tells the
courier where to file.
- `evidence/` — optional captured PoC output (repro evidence also lives inline
in `findings.json`).

**Issue tracking & the vulnerability ID/naming rules** (scan epic → finding
sub-issues, reconcile by `fp`, `display_id` = `<slug>-<CLASS>-<issue#>`, the
courier emitter, and what each host needs) live in
[`docs/issue-tracking.md`](docs/issue-tracking.md) — the portable source of truth
that travels with the repo.

## Runtime notes (gotchas)

- **`args` arrives as a JSON string.** The Workflow runtime delivers the `args`
payload to the script as a JSON *string*, not a parsed object (verified
empirically). `vuln-audit.js` normalizes it (`typeof args === 'string' ?
JSON.parse(args) : args`) before reading any input — do not remove this.
- **Invoke by `scriptPath`, not `name`, mid-session.** Named-workflow discovery
only registers files that existed at session start.
- **Subagents have full tools** (Read/Grep/Bash/Write/ast-grep, and web via
ToolSearch) and operate on the *target*; only the orchestration JS is
sandboxed. Dynamic repro creates its own `git worktree` of the target — the
`isolation:'worktree'` option is about the tool repo and is not used here.
- **Host adaptivity:** pass `hostNotes` so recon picks a runnable strategy
(docker vs native) the host can actually execute.
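
The `args` normalization from the first note can be shown in isolation. The expression itself is quoted from that note; the wrapper function name and the sample payload keys are illustrative only.

```javascript
// Accepts either the JSON string the runtime delivers or an already-parsed object.
function normalizeArgs(args) {
  return typeof args === 'string' ? JSON.parse(args) : args;
}

// Both call shapes yield the same plain object (payload keys are hypothetical).
const fromRuntime = normalizeArgs('{"outDir":"/tmp/out","dynamic":true}');
const alreadyParsed = normalizeArgs({ outDir: '/tmp/out', dynamic: true });
```

Reading a property straight off the raw string would silently give `undefined`, which is why the guard has to run before any input is read.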
85 changes: 85 additions & 0 deletions plugins/security/README.md
@@ -0,0 +1,85 @@
# security (vulnerability audit)

A white-box, **dynamically-verified** security-audit plugin for internal
pentests. `/security:audit` points at a repo you own, recons it, hunts
vulnerabilities across the OWASP Top 10:2025 classes, **proves them with live
PoCs in isolated git worktrees**, and writes a terse, senior-engineer report —
proven findings with a high-level proposed fix, not speculative noise.

## Install

```
/plugin marketplace add 84codes/claude-plugins
/plugin install security@84codes
```

Then run `/reload-plugins` if the command doesn't appear.

## Usage

```
/security:audit /abs/path/to/target-repo
/security:audit /abs/path/to/target-repo --no-dynamic
/security:audit /abs/path/to/target-repo --classes injection,ssrf,access-control --ref v1.2.0
/security:audit /abs/path/to/target-repo --out /abs/writable/dir
```

The first argument is the path to the target repo (required). The flags:

| Flag | Meaning |
|------|---------|
| `--no-dynamic` | Skip the build/run/PoC phase — static review + adversarial verify only. |
| `--classes` | Comma-separated vuln-class keys to restrict the audit to (e.g. `injection,ssrf,access-control`; see [`AGENTS.md`](AGENTS.md) for the full taxonomy). Default: classes picked by recon. |
| `--ref` | Git ref to audit. Default: `HEAD`. |
| `--out` | Writable directory for the output bundle. Default: `<cwd>/vuln-audit-reports`. |

The output **bundle** is written to `<out>/<slug>/`: `report.md` +
`findings.json` + `manifest.json`.

## How it works

```
recon → triage → consolidate → deep review → adversarial verify → dynamic PoC → report
```

| Phase | Purpose |
|-------|---------|
| Recon | Detect stack, map attack surface, pick relevant vuln classes + run strategy. |
| Triage | One finder agent per relevant class emits candidates. |
| Consolidate | Dedup by root cause, assign IDs, drop low-signal noise. |
| Deep review | Confirm a reachable source→sink path with no mitigation. |
| Adversarial verify | Independent skeptics try to refute each finding; a majority refutation kills it. |
| Dynamic PoC | Build + run the target in an isolated worktree; fire a real exploit. |
| Report | Senior-engineer report: severity-first, reference-backed, PoC-evidenced. |

## Requirements

- `git` (target must be a git repo for worktree isolation + the live-PoC phase).
- `docker` for dynamic verification (works via `sudo` if the daemon needs it).
Without it, repro falls back to unit-test/static PoCs; `--no-dynamic` skips the
phase entirely.
- No security scanners required — the tool is LLM-native and uses
`semgrep`/`gitleaks`/`trivy` only opportunistically if present.

## Output & issue tracking

Findings carry a stable fingerprint (`fp`) and a `display_id`
(`<slug>-<CLASS>-<n>`). The bundle is designed to be filed to GitHub issues by a
separate courier step (scan epic + per-finding sub-issues for Critical/High/
Medium, reconciled by `fp`). See [`docs/issue-tracking.md`](docs/issue-tracking.md).

## Design

Full pipeline spec, vuln-class taxonomy (OWASP 2025 + CWE + ASVS), data
contracts, and the signal-discipline policy are in
[`AGENTS.md`](AGENTS.md).

## Safety & scope

Authorized testing only — audit repositories you own or are explicitly cleared
to test. All PoC traffic is contained to local processes/containers; the tool
never fires exploits at external hosts, uses real credentials, or exfiltrates
data.

## License

MIT