diff --git a/plugins/codex-security/.codex-plugin/plugin.json b/plugins/codex-security/.codex-plugin/plugin.json index b1b1a13e..81f6fdee 100644 --- a/plugins/codex-security/.codex-plugin/plugin.json +++ b/plugins/codex-security/.codex-plugin/plugin.json @@ -1,6 +1,6 @@ { "name": "codex-security", - "version": "0.1.0", + "version": "0.1.1", "description": "Codex Security workflows for security scans, analysis, and investigation.", "author": { "name": "OpenAI" @@ -31,8 +31,8 @@ "privacyPolicyURL": "https://openai.com/policies/row-privacy-policy/", "termsOfServiceURL": "https://openai.com/policies/row-terms-of-use/", "defaultPrompt": [ - "Scan this entire codebase using subagents", - "Scan this diff, branch, or folder", + "/goal Run a security scan on this codebase using subagents. Do not stop until all in-scope files are covered and all required steps are complete.", + "/goal Scan this diff, branch, or folder using subagents. Do not stop until all in-scope files are covered and all required steps are complete.", "Create a threat model for this codebase" ], "brandColor": "#111111", diff --git a/plugins/codex-security/references/scan-artifacts.md b/plugins/codex-security/references/scan-artifacts.md index c3b7dca4..893111d0 100644 --- a/plugins/codex-security/references/scan-artifacts.md +++ b/plugins/codex-security/references/scan-artifacts.md @@ -19,21 +19,37 @@ Use these shared path conventions for Codex Security scan workflows unless the u ## Finding Discovery (Phase 2) Paths -- Runtime inventory: `/runtime_inventory.md` +### Coverage Planning + - Advisory seed research: `/seed_research.md` +- Scoped ranking input: `/rank_input.csv` if applicable +- Scoped ranking output: `/rank_output.csv` if applicable +- Scoped deep-review input: `/deep_review_input.csv` if applicable - Finding discovery report: `/finding_discovery_report.md` -- Repository-wide exhaustive file checklist: `/exhaustive-file-checklist.md` if applicable + +### Deep Review + +- Scoped work ledger: `/work_ledger.jsonl` if applicable +- Scoped raw candidates: `/raw_candidates.jsonl` if applicable + +### Candidate Reconciliation + +- Candidate findings directory: `/findings/` +- Per-finding directory: `/findings//` +- Per-finding candidate ledger: `/findings//candidate_ledger.jsonl` +- Scoped dedupe report: `/dedupe_report.md` if applicable +- Scoped deduped candidates: `/deduped_candidates.jsonl` if applicable - Repository-wide coverage ledger: `/repository_coverage_ledger.md` - This is a coverage artifact, not a findings list: it should include checked surfaces with not_applicable, suppressed, deferred, or reportable dispositions. ## Validation (Phase 3) Paths -- Validation report: `/validation_report.md` -- Validation artifacts: `/validation_artifacts/` +- Per-finding validation report: `/findings//validation_report.md` +- Per-finding validation artifacts: `/findings//validation_artifacts/` ## Attack-Path Analysis (Phase 4) Paths -- Attack-path analysis report: `/attack_path_analysis_report.md` +- Per-finding attack-path analysis report: `/findings//attack_path_analysis_report.md` ## Final Report Paths diff --git a/plugins/codex-security/skills/attack-path-analysis/SKILL.md b/plugins/codex-security/skills/attack-path-analysis/SKILL.md index 9e2e2c6d..f0889d47 100644 --- a/plugins/codex-security/skills/attack-path-analysis/SKILL.md +++ b/plugins/codex-security/skills/attack-path-analysis/SKILL.md @@ -34,6 +34,8 @@ Use the shared scan artifact path conventions in `../../references/scan-artifact 5. Calibrate impact and likelihood from the repository evidence. 6. Apply a separate final policy-adjustment pass mechanically using those facts and the calibrated severity. 7. Drop any candidate whose final policy decision is `ignore`. +8. Save that finding's visible attack-path report to its per-finding attack-path analysis report path from `../../references/scan-artifacts.md`. +9. Append one attack-path receipt per candidate id to that finding's candidate ledger path from `../../references/scan-artifacts.md`. The receipt must record the candidate id, attack-path reportability decision, attack-path facts or exact proof gap, and attack-path artifact/report reference for that candidate finding. ## Scope and Attack Path Checklist @@ -94,12 +96,13 @@ Render attack-path facts using `references/attack-path-facts.md`. - Prefer repository evidence first, but use network connectivity when it materially helps confirm deployment context, reachable surfaces, or other reportability-relevant facts. - Do not invent attack chains that the code does not support. +- Do not leave candidate coverage implicit. Every candidate finding that reaches attack-path analysis must leave an attack-path receipt in `findings//candidate_ledger.jsonl`, even when the final policy decision is `ignore` or the path remains deferred. - Do not drop exact affected locations while converting validated findings into attack paths. Repository-wide seeded/root-control rows that survive validation must keep their root-control file:line even when a wrapper, route, or transport is easier to explain. - Do not skip a reportable validation row because a neighboring same-family finding has a cleaner story. Either produce attack-path facts for that exact row or make an explicit final policy decision with repository counterevidence. - Missing public-ingress evidence is not by itself dispositive counterevidence. - Keep attack-path analysis, severity calibration, and final policy suppression as separate sub-stages. - Use the final policy-adjustment matrix mechanically rather than re-arguing severity from scratch after the facts are set. -- Save a final visible report using the attack-path analysis report path from `../../references/scan-artifacts.md`. +- Save a final visible report for each candidate finding using that finding's attack-path analysis report path from `../../references/scan-artifacts.md`. -- Considerations for attack path -- - A finding should count as a real security issue if a realistic attacker could use it from a reasonable attack surface relevant to the product, especially if it is something that is part of the thread model. diff --git a/plugins/codex-security/skills/finding-discovery/SKILL.md b/plugins/codex-security/skills/finding-discovery/SKILL.md index f35f1c50..665aa294 100644 --- a/plugins/codex-security/skills/finding-discovery/SKILL.md +++ b/plugins/codex-security/skills/finding-discovery/SKILL.md @@ -19,7 +19,15 @@ If a required input is still missing, stop and ask the user for it before contin Use the shared scan artifact path conventions in `../../references/scan-artifacts.md`. ### Code Diff Workflow -If the scan target is for a targeted code-diff, follow the procedure in `../security-scan/references/code-diff-scan.md`. +If the scan target is for a targeted code-diff: + +- Read `../security-scan/references/scan-artifacts-and-ledger.md`. +- Generate `rank_input.csv` deterministically from changed source-like files with `python3 /scripts/generate_rank_input.py make-diff-rank-input --repo --base --head --out /rank_input.csv`. +- Copy every diff row into `deep_review_input.csv` with `python3 /scripts/generate_rank_input.py copy-deep-review-input --rank-input /rank_input.csv --out /deep_review_input.csv`. Diff scans do not rank or drop changed files before deep review. +- Add directly supporting files required to understand the changed security behavior only when repository evidence shows they are needed. Do not use them to broaden into unrelated repository-wide enumeration. +- Deep-review every file in `deep_review_input.csv` using the shared scoped file-review rules. +- Stay anchored to the changed code and directly supporting files. Unchanged siblings are context or negative controls unless the diff newly reaches them, weakens their shared control, or changes a shared sink/helper they depend on. +- When the diff is too large to review credibly as one parent-agent pass, use approved file-review subagents and follow the shared scoped deep-review rules in `../security-scan/references/scan-artifacts-and-ledger.md#scoped-deep-review`. ### Repository-Wide Workflow @@ -33,10 +41,10 @@ Use this checklist to keep discovery specific without turning it into validation - Treat the commit message and title as potentially incomplete or misleading; trust the actual code path more than the narrative. - Follow the entire changed-code chain far enough to understand how the diff affects authorization, trust boundaries, dangerous sinks, or security controls. - Prefer multiple distinct finding families only when they come from different root causes; do not split one issue into cosmetic variants, but keep independently reachable instances as separate candidate entries. -- When the diff changes a shared helper, guard, route pattern, template pattern, or sink wrapper, fan out to sibling call sites that the changed code directly affects, and keep each vulnerable instance addressable. +- When the diff changes a shared helper, guard, route pattern, template pattern, or sink wrapper, expand to sibling call sites that the changed code directly affects, and keep each vulnerable instance addressable. - Look for attacker-controlled input, broken enforcement, or dangerous sinks introduced or made reachable by the change. - Stay anchored to the diff and the supporting files it depends on rather than drifting into unrelated repository scanning. -- For repository-wide scans, stay anchored to the runtime inventory and coverage ledger rather than drifting into arbitrary text search. +- For repository-wide scans, stay anchored to `rank_input.csv`, `deep_review_input.csv`, and the coverage ledger rather than drifting into arbitrary text search. - Do not group many vulnerable files under one candidate when the files have separate line-level source/sink/control evidence. - When a dangerous sink has multiple call sites, enumerate each call site with its own source and closest control. - When repeated templates, query builders, parser operations, auth/object endpoints, or shared-helper callers are independently reachable, keep each vulnerable file and sink/control line as its own candidate instance even if the final report later groups related prose. @@ -54,7 +62,7 @@ Use this checklist to keep discovery specific without turning it into validation - For shared deserialization, class-resolution, template, and auth controls, treat the resolver/filter/allowlist/denylist/guard line as a candidate location when downstream transports or callers prove reachability. Do not anchor only on the more dramatic transport if the broken control is reusable. - For deserialization and object-construction families, enumerate concrete codec, deserializer, converter, and container handlers registered by the parser or serialization config, including array, collection, map, bean, enum, throwable, and generic-object handlers. A top-level parser/config finding does not close a concrete codec row when that codec recursively invokes parsing, type resolution, conversion, or object construction on attacker-controlled data. - For file-format object models, enumerate primitive/container helper methods that convert or traverse attacker-controlled document structures, including `to*Array`, `get*`, `getObject`, numeric conversion, `parse*`, iterator, `size`, unchecked casts, and allocation loops. Treat these helpers as candidate root controls when malformed documents can trigger type confusion, exceptions, unbounded traversal, or memory/CPU exhaustion. -- If the runtime inventory names a central object-model package for an untrusted format, include that package's array, dictionary, node, collection, and primitive conversion helpers as discovery rows before closing the parser family. A parser, filter, or codec finding in a neighboring package does not close unchecked conversion helpers in the core object model. +- If the repository-wide worklist or coverage ledger identifies a central object-model package for an untrusted format, include that package's array, dictionary, node, collection, and primitive conversion helpers as discovery rows before closing the parser family. A parser, filter, or codec finding in a neighboring package does not close unchecked conversion helpers in the core object model. - Object-model helper sweeps create mandatory discovery rows first, not automatic reportable findings. Promote them only when malformed or adversarial input plausibly reaches the helper and the missing type, size, shape, recursion, numeric, or conversion guard can cause crash, denial of service, parser confusion, authorization bypass, or another concrete security impact. - Do not suppress deterministic parser/helper crashes as mere robustness when untrusted remote, protocol, document, archive, or package input can reach the missing guard and abort a service, request worker, parser pipeline, or security negotiation. Suppression needs exact containment evidence such as caller-side recovery, input prevalidation equivalent to the missing guard, or a non-security-only boundary. - For structured patch/edit/apply APIs such as JSON Patch, Graph Patch, document edits, or config mutations, enumerate concrete request-selected operations like add, remove, replace, move, copy, and test. Keep operation-specific path transforms, array append handling, wildcard selection, or object-binding lines candidate-visible when they feed a shared evaluator or binder. @@ -76,8 +84,6 @@ Use this checklist to keep discovery specific without turning it into validation - In protocol-heavy repositories, inspect low-level version, capability, feature, and negotiation utility classes even if the most obvious candidates are REST/upload/admin hotspots. Search for helper names such as `Version`, `VersionUtil`, `versionCompare`, `versionMatch`, `Capability`, `Feature`, `Negotiation`, `parseInt`, `split`, `matches`, and comparator methods, then close paired validator/parser rows explicitly. - For self-service update routes, include guard or predicate methods that compare requested objects against persisted objects. Treat missing checks on security-sensitive scalar fields and collection aliases as candidate locations when they can change identity, trust state, tenant membership, roles, groups, or account recovery properties. - When a template or config pattern appears repeatedly, enumerate each affected file/line and note any nearby safe control that should not be reported. -- In large repository scans, do not let one hotspot cluster consume the whole discovery pass. After promoting one or two obvious findings in the same route, upload, parser, or auth cluster, inspect a disjoint seeded row or low-salience utility/control shard such as protocol/version helpers, central object-model helpers, shared validators, or class-resolution controls before finalizing discovery. -- In large repositories, bias early coverage inside high-risk subsystems toward leaf runtime validators, comparators, object-model helpers, and operation branches that parse or select untrusted values. Top-level controllers, uploads, admin endpoints, and XML hotspots are entrypoints, not sufficient coverage of the subsystem. - For diff-scoped scans, include `relevant_lines` only when the bug overlaps the diff and those lines are genuinely relevant to the issue. - For recursive placeholder or template findings, include the helper/parser setup line that enables recursive expansion or expression evaluation along with the resolver/evaluation/render line. - Include CWE IDs when known; use an empty list when the class is unclear. @@ -135,6 +141,7 @@ If there are no plausible candidates, return a no-findings result. Otherwise, for each candidate include: +- candidate id - title - affected locations, with labels when more than one applies: `entrypoint/wrapper`, `root_control`, `sink`, and `concrete_implementation` - instance key in the form `::` for repository-wide scans @@ -150,6 +157,8 @@ Otherwise, for each candidate include: - taxonomy with CWE IDs when known - enough evidence that a later reviewer can understand why the candidate is technically plausible before validation +When candidates are emitted, create the per-finding directory from `../../references/scan-artifacts.md` and append one discovery receipt to that finding's candidate ledger. The ledger row should identify the candidate, scan scope, discovery status, affected locations, and the discovery artifact or evidence that produced it. + ## Hard Rules @@ -157,6 +166,7 @@ Otherwise, for each candidate include: - Focus on the actual changes, not the commit message. - Stay anchored to the diff and the files it relies on for diff-scoped scans. For repository-wide scans, treat the checked-out repository as in scope and ignore diff-overlap restrictions for affected locations. - Candidate discovery is about plausibility, not final severity. +- Do not emit an untracked candidate. Every candidate finding needs a stable candidate id and a discovery receipt in `findings//candidate_ledger.jsonl` so later validation and attack-path analysis can prove coverage for that exact finding. - Do not add `relevant_lines` when no bug exists. For diff-scoped scans, add `relevant_lines` only when the bug overlaps the diff and those lines are relevant to the bug. - Do not turn discovery into full validation or full severity calibration. - Continue reviewing until no additional distinct plausible candidates remain. diff --git a/plugins/codex-security/skills/security-scan/SKILL.md b/plugins/codex-security/skills/security-scan/SKILL.md index a412277b..26fbfed2 100644 --- a/plugins/codex-security/skills/security-scan/SKILL.md +++ b/plugins/codex-security/skills/security-scan/SKILL.md @@ -47,18 +47,22 @@ Follow this plan in order. Do not skip ahead to a later phase until the current - Treat the per-scan threat model path as the source of truth threat model for later phases. 3. Run `$finding-discovery` as the second step, against the resolved diff and using the per-scan threat model as context. - If discovery produces no technically plausible candidates in a diff-scoped scan, stop there, skip validation and attack-path analysis, and assemble the final markdown report immediately. - - In repository-wide scans, stop at discovery only when `runtime_inventory.md` exists and the coverage ledger has closed every applicable high-impact and seeded root-control row as `suppressed`, `not_applicable`, or `deferred` with exact reasons. Open, reportable, or unresolved seeded rows continue to validation even when they are not yet numbered as findings. + - In repository-wide scans, stop at discovery only when the ranked runtime-surface worklist exists and the coverage ledger has closed every applicable high-impact and seeded root-control row as `suppressed`, `not_applicable`, or `deferred` with exact reasons. Open, reportable, or unresolved seeded rows continue to validation even when they are not yet numbered as findings. 4. Run `$validation` as the third step, for each candidate that came out of discovery and, in repository-wide scans, each open, reportable, or deferred seeded/root-control ledger row that still needs closure. - Pass the resolved scan scope, discovery notes, and candidate inventory to validation. Validation should preserve or suppress the provided instances; it should not independently decide whether a standalone single-candidate request should become diff-scoped or repository-wide. - - For repository-wide scans, the exhaustive file checklist and discovery coverage ledger are part of the validation input; the ledger is a coverage artifact, not just a findings tracker. Validation should preserve checked surfaces with not_applicable, suppressed, deferred, and reportable dispositions, and continue the ledger's high-impact sibling checks when needed rather than narrowing to one representative finding. - - As repository-wide rows are validated, keep the saved validation report current enough that reportable, suppressed, not_applicable, and deferred closure rows survive interruption or later phase summarization, including exact root-control file:line and seed-anchor file:line when distinct. + - Each candidate finding's `findings//candidate_ledger.jsonl` is part of the validation input for every scan scope. Every candidate finding that came out of discovery must have a discovery receipt before validation starts and a validation receipt before the scan can proceed to final reporting. + - For repository-wide scans, `rank_input.csv`, `rank_output.csv` when ranking applies, `deep_review_input.csv`, `work_ledger.jsonl`, `raw_candidates.jsonl`, per-finding `findings//candidate_ledger.jsonl` files, `deduped_candidates.jsonl`, and the discovery coverage ledger are part of the validation input; the ledger is a coverage artifact, not just a findings tracker. Raw candidates should already include the discovering file-review subagent's or parent agent's candidate-local validation evidence and attack-path facts before dedupe, and each per-finding candidate ledger should prove that its raw candidate finding received both checks or has an explicit deferred reason. Validation should preserve checked surfaces with not_applicable, suppressed, deferred, and reportable dispositions, reconcile cross-file proof gaps, and continue the ledger's high-impact sibling checks when needed rather than narrowing to one representative finding. + - When multiple candidates or repository coverage-ledger rows need validation and subagents are available and approved, divide validation across validation subagents by candidate, deduped candidate, or ledger row. Each validation subagent must receive the candidate or row, discovery evidence, artifact paths, and candidate-ledger path it owns, then write or return the validation report update and validation receipt for that assignment. + - As repository-wide rows are validated, keep the saved per-finding validation reports current enough that reportable, suppressed, not_applicable, and deferred closure rows survive interruption or later phase summarization, including exact root-control file:line and seed-anchor file:line when distinct. 5. Run `$attack-path-analysis` as the fourth step, for findings and repository-wide validation closure rows that still need reportability, attack-path, and severity analysis after validation. -6. Assemble the final markdown report last using the final report path from `../../references/scan-artifacts.md` and the outputs of the earlier phases: finding discovery, validation, attack path analysis. + - Each candidate finding's `findings//candidate_ledger.jsonl` is part of the attack-path input for every scan scope. Every candidate finding that reaches attack-path analysis must have an attack-path receipt before final reporting, even when the final decision is `ignore`, suppressed, or deferred. + - When multiple validated candidates or validation closure rows need attack-path analysis and subagents are available and approved, divide attack-path work across attack-path subagents by candidate or row. Each attack-path subagent must receive the validation evidence, affected root-control and sink lines, artifact paths, and candidate-ledger path it owns, then write or return attack-path facts, severity/policy analysis, and the attack-path receipt for that assignment. +6. Assemble the final markdown report last using the final report path from `../../references/scan-artifacts.md` and the outputs of the earlier phases: finding discovery plus each candidate finding's validation and attack-path reports. ## Phase Scope - Phase 1 (threat model generation) is repository-scope by default, unless the user explicitly asks for narrower scope or provides an authoritative threat model or sufficiently repository-specific security scan guidance such as `AGENTS.md`. -- For PR, commit, branch, and local-patch scans, Phase 2 onward (finding discovery, validation, attack path analysis) are diff-focused and should follow the changed code and its supporting files. +- For PR, commit, branch, and local-patch scans, Phase 2 onward (finding discovery, validation, attack path analysis) are diff-focused and should follow the changed code and its supporting files. Diff scans use `deep_review_input.csv` plus the shared scoped file-review rules in `references/scan-artifacts-and-ledger.md`; when the diff is too large for one parent-agent pass, use approved file-review subagents and aggregation without broadening the scan beyond changed files and directly supporting files. - For repository-wide scans, Phase 2 onward remain repository-wide. Before the `$finding-discovery` phase, read `references/repository-wide-scan.md` and every required reference it lists, then use them for finding discovery, validation, and attack path analysis. Treat this asymmetry as intentional: @@ -86,7 +90,7 @@ For normal PR, commit, branch, and local-patch scans, stay diff-focused but pres Diff scans should: - start from the changed files and the supporting files needed to understand the changed behavior -- fan out from a changed route, handler, shared helper, guard, template pattern, query builder, serializer/deserializer, filesystem/network sink, config block, or wrapper to sibling instances that the diff also changes, newly reaches, or affects through the same modified shared dependency +- expand from a changed route, handler, shared helper, guard, template pattern, query builder, serializer/deserializer, filesystem/network sink, config block, or wrapper to sibling instances that the diff also changes, newly reaches, or affects through the same modified shared dependency - when the diff adds, removes, or reshapes a guard around an existing parser, deserializer, expression evaluator, filesystem/path helper, archive utility, or auth/authz helper, use the adjacent pre-existing sink/control as supporting context for the changed behavior; keep the candidate anchored to the changed guard or newly exposed path unless the user explicitly asks for wider instance expansion - when a changed wrapper, guard, or API delegates to a shared parser/deserializer/path/archive/auth helper, keep both the wrapper call site and the underlying shared sink/control line addressable; do not replace the root sink/control evidence with wrapper-only evidence - carry each vulnerable sibling instance through discovery and validation with its own affected location, source, closest control, sink, impact, and suppression evidence @@ -115,15 +119,18 @@ Assemble the final markdown report and Codex app review directives using `refere - Follow the execution plan in order. - Use the tools to inspect the repository before making decisions. - For repository-wide scans, do not equate broad sink counts with completed coverage. The coverage ledger must close each applicable high-impact shard row as `reportable`, `suppressed`, `not_applicable`, or `deferred`. +- For every scan scope, candidate-finding coverage is required. Do not finalize a candidate finding until `findings//candidate_ledger.jsonl` shows discovery, validation, and attack-path receipts for that exact candidate, or an explicit deferred reason for the missing proof. +- For diff-scoped scans, do not claim diff coverage until every `deep_review_input.csv` row has a completion receipt in `work_ledger.jsonl`. +- For repository-wide scans, subagent dispatch must have explicit ownership: ranking subagents own one `rank_input.csv` row and return only ranking JSON; file-review subagents own one assessed file or tiny shard and return full-file receipts plus pre-dedupe finding objects with candidate-local validation evidence and attack-path facts; validation subagents own one candidate or ledger row that needs validation closure; attack-path subagents own one validated candidate or validation closure row; the parent agent owns orchestration, ledger reconciliation, aggregation, cross-file dedupe, and final closure. +- For repository-wide scans, candidate-finding coverage is separate from file coverage. Do not dedupe or finalize a raw candidate finding until its `findings//candidate_ledger.jsonl` shows candidate-local validation and candidate-local attack-path receipts, or an explicit deferred reason for missing proof. - Candidate ids are optional links from coverage rows to findings; a not_applicable, suppressed, or deferred row is still required when the surface was in scope. -- For repository-wide scans, the runtime inventory must exist as an artifact before discovery is considered complete, and the coverage ledger must be materially broader than the promoted candidate list. +- For repository-wide scans, the ranked runtime-surface worklist must exist before discovery is considered complete, and the coverage ledger must be materially broader than the promoted candidate list. - For repository-wide scans with CVE, GHSA, advisory, issue, release, or package-version identifiers, `seed_research.md` must exist before discovery is considered complete. It should record authoritative sources searched, candidate files/functions/classes/hunks, and failed lookup attempts. Missing seed research means advisory-led discovery is incomplete unless the scan explicitly states that no network/local-history source was available. -- In large repository-wide scans, checkpoint the runtime inventory and initial coverage ledger to disk before deep sink review or validation. A run that is interrupted after frontier mapping should still leave auditable coverage artifacts. +- In large repository-wide scans, checkpoint the ranked runtime-surface worklist and initial coverage ledger to disk before deep sink review or validation. A run that is interrupted after frontier mapping should still leave auditable coverage artifacts. - In large monorepos, top product/runtime areas by file count or deployment significance must appear as ledger shards or be explicitly excluded with repository evidence; global sink counts and `no top candidate surfaced` do not close coverage. - User/advisory/tag-seeded packages, class families, or vulnerability families remain open until the exact seeded row is closed as `reportable`, `suppressed`, `not_applicable`, or `deferred`. A neighboring same-family finding does not close the seeded row. - For large repository-wide scans, make one reachability pass across every applicable high-impact shard before prolonged validation of any single shard. A row becomes a validation candidate only when it has a concrete entrypoint or privileged boundary, closest relevant control, sink or broken control, and plausible impact. - Discovery is incomplete when a shard has a promoted finding but still has unclosed sibling packages, concrete implementations, or reusable root-control rows that could be independently vulnerable. Finish those rows or mark them explicitly deferred before final reporting. -- In large repositories, discovery is incomplete when it only follows the first obvious hotspot cluster and never checks a disjoint seeded/advisory or low-salience utility/control shard such as protocol/version helpers, central object-model helpers, reusable validators, or shared deserializer controls. - Final assembly must start from reportable validation closure rows and surviving candidates. Do not drop a reportable seeded/root-control row because attack-path analysis or discovery spent more prose on a neighboring same-family finding. - Final reporting is incomplete when a promoted high-impact finding's affected lines omit the concrete root-control file/line discovered or seeded during discovery, such as a codec, converter, parser feature setup, class filter, resource-path control, protocol state transition, or self-service update guard. Add the root-control affected line or explicitly suppress/defer it with exact counterevidence before finalizing. - In repository-wide scans, preserve independently reachable sibling instances through final reporting. Repeated vulnerable templates, query builders, parser operations, auth/object endpoints, or shared-helper callers need separate finding entries, affected lines, and dispositions; put grouping in summary prose only after the individual instances are emitted. diff --git a/plugins/codex-security/skills/security-scan/agents/openai.yaml b/plugins/codex-security/skills/security-scan/agents/openai.yaml index a55ba610..f33b9082 100644 --- a/plugins/codex-security/skills/security-scan/agents/openai.yaml +++ b/plugins/codex-security/skills/security-scan/agents/openai.yaml @@ -1,4 +1,4 @@ interface: display_name: "Security Scan" short_description: "Run security scan" - default_prompt: "Use $security-scan to run a security scan of the target repository or code change and identify vulnerabilities." + default_prompt: "/goal Run a security scan on this codebase using subagents. Do not stop until all in-scope files are covered and all required steps are complete." diff --git a/plugins/codex-security/skills/security-scan/references/code-diff-scan.md b/plugins/codex-security/skills/security-scan/references/code-diff-scan.md deleted file mode 100644 index 433b782d..00000000 --- a/plugins/codex-security/skills/security-scan/references/code-diff-scan.md +++ /dev/null @@ -1,58 +0,0 @@ -# Code Diff Review Guidance - -Use this guidance when the security scan target is a specific code-diff or PR. - -## Workflow - -1. Load the per-scan threat model path from `../../../references/scan-artifacts.md` as the repo-specific threat-model source of truth. -2. Investigate the actual diff and any supporting files the patch relies on. You must enumerate ALL plausible security finding candidates. Continue reviewing until you have covered the diff and its immediate supporting code comprehensively enough that no additional distinct plausible candidates remain. -3. Follow the chain of files needed to understand the effects of the change. -4. Trace attacker-controlled inputs through the minimal surrounding code needed to understand propagation and controls. -5. Identify all security vulnerabilities that are valid given the threat model. Non-exaustive list of some common failures to look for (but you should read all the code to find other vulnerabilities): - - authentication - - authorization - - tenant isolation - - input handling into interpreters, queries, or filesystem paths - - unsafe network fetch or callback behavior - - privilege-bearing state transitions - - secret handling - - security-sensitive config or policy enforcement - - etc -6. Look for multiple distinct findings caused by different root causes. -7. For each candidate, prioritize technical plausibility rather than final reportability. -8. If there are no plausible candidates, stop and return a no-findings result immediately. - -### Diff-Scoped Sibling Expansion - -For diff scans, stay anchored to the changed code, but do not stop at one representative instance when the diff introduces, modifies, or makes reachable a repeated vulnerable pattern. - -Apply sibling expansion when a changed route, handler, template, query helper, deserializer, filesystem/network sink, authz guard, config block, or wrapper has nearby changed or directly supporting siblings with the same security-relevant shape. In that case: - -- enumerate each changed or newly reachable sibling instance with its own source, closest control, sink, and affected location -- search the immediate pattern family needed to determine whether the diff made additional siblings vulnerable, such as adjacent routes in the same router, templates rendered by the changed handler, call sites of the changed helper, or policy checks sharing the changed guard -- when the diff adds, removes, or reshapes a guard around an existing parser, deserializer, expression evaluator, filesystem/path helper, archive utility, or auth/authz helper, use the adjacent pre-existing sink/control as supporting context for the changed behavior; keep the candidate anchored to the changed guard or newly exposed path unless the user explicitly asks for wider instance expansion -- when a changed wrapper, guard, or API delegates to a shared parser/deserializer/path/archive/auth helper, keep both the wrapper call site and the underlying shared sink/control line addressable; do not replace the root sink/control evidence with wrapper-only evidence -- keep unchanged siblings as context or negative controls unless the diff newly routes attacker input to them, weakens their shared control, or changes a shared sink/helper they depend on -- suppress only the exact sibling instance that has counterevidence, and do not let one safe sibling suppress another vulnerable sibling -- stop expansion once the changed pattern family and its directly supporting call sites are exhausted; do not turn a diff scan into an unrelated repository-wide scan - -## Discovery Checklist - -Use this checklist to keep discovery specific without turning it into validation or attack-path analysis: - -- Use tools to inspect the changed files and the minimum supporting files they rely on before deciding anything. -- Treat the commit message and title as potentially incomplete or misleading; trust the actual code path more than the narrative. -- Follow the entire changed-code chain far enough to understand how the diff affects authorization, trust boundaries, dangerous sinks, or security controls. -- Prefer multiple distinct findings only when they come from different root causes; do not split one issue into cosmetic variants. -- When the diff changes a shared helper, guard, route pattern, template pattern, or sink wrapper, fan out to sibling call sites that the changed code directly affects, and keep each vulnerable instance addressable. -- Look for attacker-controlled input, broken enforcement, or dangerous sinks introduced or made reachable by the change. -- Stay anchored to the diff and the supporting files it depends on rather than drifting into unrelated repository scanning. -- Do not group many vulnerable files under one candidate when the files have separate line-level source/sink/control evidence. -- When a dangerous sink has multiple call sites, enumerate each call site with its own source and closest control. -- When source/sink evidence crosses a wrapper into a shared sink/control helper, include both locations in the candidate so validation can test reachability without losing the root vulnerable line. -- Do not collapse separate high-impact proof tuples into one candidate only because they share a route or helper. Split command execution, SSRF, path/file impact, XML/parser behavior, XSS/template execution, and authz/state-change impact when the sink, closest control, or impact differs. -- In XML/parser/deserializer surfaces, enumerate default parser factories, converters, validators, transformers, unmarshal/parse calls, and handler entrypoints independently. A safe sibling parser path is negative control for that sibling, not suppression evidence for a different default factory or converter. -- In auth/authz surfaces, enumerate public webhook, status, callback, and API endpoints that read protected objects, trigger builds/jobs, or mutate protected state independently from nearby credential or configuration bugs. -- When a template or config pattern appears repeatedly, enumerate each affected file/line and note any nearby safe control that should not be reported. -- Include `relevant_lines` only when the bug overlaps the diff and those lines are genuinely relevant to the issue. -- Include CWE IDs when known; use an empty list when the class is unclear. diff --git a/plugins/codex-security/skills/security-scan/references/final-report.md b/plugins/codex-security/skills/security-scan/references/final-report.md index 11b7c2ce..39f0e6d7 100644 --- a/plugins/codex-security/skills/security-scan/references/final-report.md +++ b/plugins/codex-security/skills/security-scan/references/final-report.md @@ -21,7 +21,7 @@ Set the finding category and CWE from the primary broken control. Do not add sec Examples that should normally become separate final findings include SQL API modes such as `execute`, `executemany`, and `executescript`; deserializer variants such as `pickle.load`, `pickle.loads`, `yaml.load`, and `yaml.load_all`; distinct path/file helper calls; SSRF modes with different destination controls; and missing-auth protected actions such as create, delete, reset, admin, and job-trigger endpoints. -Before writing the report, reconcile the final findings against the saved validation closure table and repository coverage ledger when those artifacts exist. Start from validated rows marked `reportable` or `survives: yes`, not only from the most polished candidate narrative. Every `reportable` seeded or root-control ledger row must become a finding with the same root-control file:line. Rows closed as `suppressed`, `not_applicable`, or `deferred` should appear in a short `Coverage Closure` section with the exact file:line when known and the reason or proof gap. Do not silently drop a seeded/root-control row because a same-family neighboring finding survived. If attack-path analysis omitted a reportable validation row, assemble a concise attack path from the validation evidence and threat model rather than dropping the row. +Before writing the report, reconcile each final finding against its `findings//candidate_ledger.jsonl`, the saved validation closure table, and the repository coverage ledger when those artifacts exist. Every final candidate finding must have discovery, validation, and attack-path receipts for the same candidate id, or an explicit deferred reason for the missing proof. Start from validated rows marked `reportable` or `survives: yes`, not only from the most polished candidate narrative. Every `reportable` seeded or root-control ledger row must become a finding with the same root-control file:line. Rows closed as `suppressed`, `not_applicable`, or `deferred` should appear in a short `Coverage Closure` section with the exact file:line when known and the reason or proof gap. Do not silently drop a seeded/root-control row because a same-family neighboring finding survived. If attack-path analysis omitted a reportable validation row, assemble a concise attack path from the validation evidence and threat model rather than dropping the row. Use a format close to: diff --git a/plugins/codex-security/skills/security-scan/references/repo-wide-artifacts-and-ledger.md b/plugins/codex-security/skills/security-scan/references/repo-wide-artifacts-and-ledger.md index 963c892b..cac8e08d 100644 --- a/plugins/codex-security/skills/security-scan/references/repo-wide-artifacts-and-ledger.md +++ b/plugins/codex-security/skills/security-scan/references/repo-wide-artifacts-and-ledger.md @@ -1,39 +1,50 @@ # Repository-Wide Artifacts And Ledger -Use this reference with `repository-wide-scan.md` for repository-wide security scans. - -## Artifact Requirements - -- Load the per-scan threat model path from `../../../references/scan-artifacts.md` as the repo-specific threat-model source of truth. -- Build and save a runtime inventory using the runtime inventory path from `../../../references/scan-artifacts.md`. - - Do not proceed past repository-wide discovery until `runtime_inventory.md` exists on disk and is referenced by the discovery report. Missing `runtime_inventory.md` means discovery is incomplete, even if candidates or a coverage ledger already exist. - - Save the runtime inventory immediately after the initial entrypoint/trust-boundary pass, before deep sink review. Large-repo scans should leave this checkpoint even if later validation or environment setup is interrupted. - - Derive product and privileged surfaces from router declarations, OpenAPI or RPC metadata, public or anonymous endpoints, applied specs, ingress/service config, job/worker definitions, package exports, and privileged local or agent/tool surfaces before free-text sink search. - - Include HTTP, GraphQL, RPC, CLI, job, webhook, file-processing, message, template, package API, and agent/tool entrypoints; authn/authz/session middleware and decorators; database/query builders, ORM raw-query escapes, serializers/deserializers, shell/process/eval/template engines, filesystem APIs, network clients/fetchers, upload/download paths; first-party security/protocol namespaces such as SSO, SAML, OAuth, OIDC/JWT, LDAP, Kerberos, XML security, remoting, config import/export, protocol codecs, parser/converter registries, and version/feature gates. - - Default-exclude tests, docs, examples, personal workspaces, lockfiles, vendored trees, generated caches, and one-off research tooling from the first pass unless repository evidence shows they are deployed, privilege-bearing, generated into shipped runtime code, or reachable from untrusted input. If excluded content is added back, record the reason in the ledger. -- CRITICALLY IMPORTANT, DO NOT SKIP: Create an exhaustive check-list artifact of files that are in scope. This must be a file named `exhaustive-file-checklist.md`. Your goal is to be the most correct, complete, comprensive vulnerability researcher. You are not afraid of large repos, you can easily handle 100s of files, and ideally will be able to run for hours to do a complete scan for the user. Do not shy away from repos just because they seem large. The user will only be happy if you actually perform detailed vulnerability research on every file in the entire scope. - - When building this checklist, only include in-scope files. This only includes main applicaiton code files, not docs or other build files. - - In-scope files are actual code files that are relevant to the application in the context what an attacker can reach via the attack-surface outlined in the threat model. - - DO NOT include test files: unit tests, regression tests, example code. - - In large multi-purpose mono-repos, treat clearly non-product directories/components as out of scope unless the threat model explicitly includes them. Examples are `personal/`, `test/`, `tests/`, `dev/`, `docs/`, `junk/`, `examples/`, experiments, benchmarks, local tooling, and other developer-only subprojects. - - IMPORTANT: Each line must start with `- [ ] ` IT CANNOT BE CHECKED UNTIL YOU HAVE ACTUALLY READ THAT FILE FULLY AND PERFORMED FULL VR ON IT. -- Before checking off any files from the exaustive list, perform initial recon for potential high severity vulnerabilities and annotate checklist entries where these searches had hits. - -## Seed Research - -- First capture user-provided scope hints such as CVE/GHSA/advisory identifiers, package versions, named vulnerability families, or release/security-test references. -- When the user request or scan context includes CVE, GHSA, advisory, issue, release, package-version, or explicit vulnerability-family identifiers, run an advisory seed pass before deep frontier scanning and save it to the advisory seed research path from `../../../references/scan-artifacts.md`. -- Use authoritative advisory text, project security notes, release notes, fix commits, pull requests, issue trackers, and security tests when network access or local history is available. Record the sources searched, candidate files/functions/classes/hunks, expected vulnerable behavior, and any failed lookup attempts. -- Treat those candidates as seed rows only: validate the vulnerable behavior against the checked-out repository before reporting. Do not let the seed lane replace the repository-wide frontier pass. -- When CVE/advisory context has a generic or unhelpful category, prioritize advisory, fix-commit, release-note, and security-test lookup before broad sink hotspot scanning. If external lookup is unavailable or inconclusive, run a local regression-seed pass over project-specific protocol, parser, validator, and utility names plus the CVE/advisory terms; do not assume obvious REST/upload/XML hotspots are the intended security regression. -- When the seed pass or local search opens a candidate file, class, package, or hunk, create an exact seed-target row for that area before opportunistic same-family scanning. Run a short seed-first triage over that file/package and its immediate shared helper or caller chain, then close the row as `reportable`, `suppressed`, `not_applicable`, or `deferred`. A more obvious neighboring issue can be reported too, but it does not replace the seed-target row. -- Keep every user/advisory/tag-seeded boundary package or class family open until that exact area is closed as `reportable`, `suppressed`, `not_applicable`, or `deferred`. A broader same-family finding in a neighboring parser, auth flow, deserializer, or template engine does not implicitly close the seeded row. -- In advisory-led scans, treat the advisory, fix hunk, release note, or security test as evidence for the intended root cause, not as an exclusivity filter and not as a bare finding. Keep the exact seed row open until checked-out repository evidence independently supports or disproves the same source, broken control, and impact tuple. +Use this reference with `repository-wide-scan.md` for repository-wide ranking and coverage-ledger rules. Read `scan-artifacts-and-ledger.md` first for shared artifact, seed, subagent, scoped file-review, candidate-ledger, and dedupe rules. + +## Repository-Wide Artifact Requirements + +- Use the artifact paths from `../../../references/scan-artifacts.md` for `rank_input.csv`, `rank_output.csv` when ranking applies, `deep_review_input.csv`, `work_ledger.jsonl`, `raw_candidates.jsonl`, `dedupe_report.md`, `deduped_candidates.jsonl`, and `repository_coverage_ledger.md`. + +## Repository-Wide Subagent Ownership + +- Ranking-subagent ownership: one ranking subagent owns exactly one `rank_input.csv` row and returns only ranking JSON. +- Parent-agent ownership: the parent agent owns `rank_input.csv` generation when an upstream orchestrator did not already provide it, ranking-subagent dispatch when ranking is needed, `deep_review_input.csv` selection when an upstream orchestrator did not already provide it, global frontier/coverage work, and final repository-wide closure. + +## Files In Scope + +- A parent orchestrator may provide authoritative repository-wide worklists at the standard `/rank_input.csv` and `/deep_review_input.csv` paths before this workflow begins. + - Treat the parent-provided worklists as authoritative only when the current scan instructions explicitly say they are authoritative and both files are present. A stale or partial artifact pair is not a valid scope contract. + - When authoritative parent-provided worklists are present, use them exactly as supplied. Do not regenerate `rank_input.csv`, rerun ranking, overwrite `deep_review_input.csv`, or reinterpret `top-percent` inside this scan. + - The parent orchestrator owns explaining whether its `deep_review_input.csv` is exhaustive or selected. This repo-wide workflow still owns full review receipts, candidate ledgers, coverage-ledger closure, and final repository-wide closure for the supplied worklist. +- Otherwise, create a deterministic repository-wide file worklist before subagent dispatch. Use `../scripts/generate_rank_input.py` from this skill directory to create `rank_input.csv`; do not ask the model to invent the file inventory. + - Command shape: `python3 /scripts/generate_rank_input.py make-repo-rank-input --repo --scope --out /rank_input.csv`. + - The generated CSV is the canonical candidate list for ranking subagents. It contains `path`, `area`, and `preview`. + - The script only includes source-like text files and default-excludes tests, docs, examples, personal/dev-only trees, vendored trees, generated caches, and build artifacts unless the threat model explicitly makes one of those areas runtime-reachable or privilege-bearing. + - If excluded content is added back manually, record the reason in the coverage ledger. + - The Python script does not make the security ranking decision. Ranking is performed by ranking subagents over `rank_input.csv`. +- When authoritative parent-provided worklists are not present, convert the candidate list into the deep-review worklist: + - Interpret `top-percent` as the percentage of ranked, included files that receive deep full-file audit. + - If `top-percent` is below 100, dispatch ranking subagents with `spawn_agents_on_csv` on `/rank_input.csv`. + - Ranking-subagent ownership: one ranking subagent owns exactly one `rank_input.csv` row. It decides only whether that file should enter deep review, and returns JSON with `path`, `include` true/false, `score` 1-10, and `reason`. + - Ranking subagents do not perform deep review, validation, attack-path analysis, dedupe, or ledger closure. + - Save ranking output to `/rank_output.csv`, then create `/deep_review_input.csv` with `select-deep-review-input`. + - If `top-percent` is 100 or higher, skip ranking and copy every `rank_input.csv` row directly into `/deep_review_input.csv`. + - Do not treat deterministic path order or broad grep hits as ranking evidence; the ranking-subagent output is the ranking source of truth. +- Deep-review every file selected into `deep_review_input.csv` using the shared scoped file-review rules in `scan-artifacts-and-ledger.md`. +- When `top-percent` is 100 or higher, or when an authoritative parent-provided worklist declares `deep_review_input.csv` exhaustive over `rank_input.csv`, do not stop until every `rank_input.csv` row has a completion receipt in the shared work ledger. + +## Ranking Requirements + +- Derive product and privileged surfaces from router declarations, OpenAPI or RPC metadata, public or anonymous endpoints, applied specs, ingress/service config, job/worker definitions, package exports, and privileged local or agent/tool surfaces before free-text sink search. +- Include HTTP, GraphQL, RPC, CLI, job, webhook, file-processing, message, template, package API, and agent/tool entrypoints; authn/authz/session middleware and decorators; database/query builders, ORM raw-query escapes, serializers/deserializers, shell/process/eval/template engines, filesystem APIs, network clients/fetchers, upload/download paths; first-party security/protocol namespaces such as SSO, SAML, OAuth, OIDC/JWT, LDAP, Kerberos, XML security, remoting, config import/export, protocol codecs, parser/converter registries, and version or feature gates. +- Rank files highly when they define, configure, or materially control those runtime/security surfaces; record the concrete surface in `reason`. +- Default-exclude tests, docs, examples, personal workspaces, lockfiles, vendored trees, generated caches, and one-off research tooling from the first pass unless repository evidence shows they are deployed, privilege-bearing, generated into shipped runtime code, or reachable from untrusted input. If excluded content is added back, record the reason in the ledger. ## Coverage Ledger - Create a high-impact coverage ledger first across the vulnerability families most likely to produce serious bugs: command/code injection and RCE, SQL/NoSQL/LDAP/XPath/template injection, SSTI, unsafe deserialization, SSRF/callback abuse, path traversal/arbitrary file read or write, unsafe file upload, header injection/open redirect with credential or callback impact, and authz/tenant/object isolation bypasses that cross a meaningful privilege or data boundary. -- Build and save the ledger from the saved runtime inventory with one row per applicable boundary and serious vulnerability family before deep validation begins. The ledger must include: ledger row id, seed or root-control file:line when one is known, boundary, shard or area, files checked, applicable family, source or privileged boundary checked, sink/control checked, candidate ids when any were produced, disposition, evidence summary, prune reason or add-back trigger when applicable, and any deferred reason. +- Build and save the ledger from the ranked runtime/security surfaces and deep-review evidence with one row per applicable boundary and serious vulnerability family before deep validation begins. The ledger must include: ledger row id, seed or root-control file:line when one is known, boundary, shard or area, files checked, applicable family, source or privileged boundary checked, sink/control checked, candidate ids when any were produced, disposition, evidence summary, prune reason or add-back trigger when applicable, and any deferred reason. - Rows with no candidate are still required, seeded rows must close the exact seeded package/class family, and dominant runtime/product areas must have explicit rows or explicit repository-evidenced exclusions. - For large repositories, partition the inventory into review shards by deployed or privileged area and vulnerability family before deep validation. A shard is a concrete boundary such as a service, router group, package API, parser family, job/worker family, deployment surface, CI/deploy path, or privileged local/agent tool surface. - Shard by product module, package namespace, or protocol/security subsystem as well as by bug family. Do not let one reportable finding in a broad family close sibling modules such as separate SSO/SAML/OAuth/JWT, parser, protocol, config-import, or deserialization-wrapper packages. @@ -42,17 +53,3 @@ Use this reference with `repository-wide-scan.md` for repository-wide security s - Dominant ambiguous trees must be split by runtime, deployment, package, or privilege evidence before they can be left deferred. A single blob row such as "project", "server", "core", or "plugins" is incomplete unless it cites the concrete entrypoint/control files checked and explains why further subdivision is not possible from repository evidence. - Treat broad sink searches as seed generation only. They do not count as coverage completion until the relevant files have been tied to an entrypoint or privileged boundary and the ledger row has a final disposition. - Promote a seed into a reportable finding candidate only after it has a concrete source or privileged boundary, closest relevant guard/control, sink or broken control, and impact. Public or anonymous routes, upload/parser entrypoints, webhooks, build/job triggers, package APIs, and privileged internal workers count as first-class boundaries. - -## File Pass And Subagents - -- For each file in the list, starting with the already annotated ones: - - IMPORTANT: Ideally only assign each agent a single file or a very small set of files!!! If you give them too many files, the results will get a LOT WORSE! If you have to, give a max of 5 files! This is not batching, this is a targeted scalpal ensuring maximum analysis of a single file! - - Task a sub-agent to perform targeted vulnerability research in it. Make sure to tell the agent to follow $finding-discovery skill but with the scope of only that single file. - - When ever you or a sub-agent has finished the exaustive reading and analyzing of the target file, check it off on the list. - - CRITICALLY IMPORTANT! You CANNOT check a file off UNTIL YOU OR A SUBAGENT HAS READ THE FULL FILE! Not just broad searching! Not just skim! READ EVERY SINGLE LINE! If dispatching a subpagent make sure to include this! - - You CAN NOT check off files just because they appear in searches. Only reading the file will allow you to check it off. - - You MUST process each file one at a time. Do not simple read all the files or search through them all. This is a process where each file must get your undivided attention! That is the only way to perform a exhaustive security audit! -- Do not skip any files in the check list. Do not stop or return to the user until every single file in the list has been checked off. Even if you have found and logged vulnerabilities, it is critical that you continue tasking sub-agents or performing vulnerability research on all files in the list. -- If you have subagents avaliable in your current tool set, you MUST use them. Before starting this phase, if the user has not mentioned allowing sub-agents, stop and explicity ask them to allow for the use of sub-agents, as not using them is something which will make this plugin not work. That is the only efficent and effective way to perform this scan. -- If sub-agents are not avaliable in your environment, iterate through all files in the list yourself, performing the same detailed vulnerability research on each one. -- When you hit a cap of active agents, perform additional analysis on the target until one of the agents has returned with its results. Once you confirm the results are recorded and checked off on the file list, spawn the next agent with the next file in the list. diff --git a/plugins/codex-security/skills/security-scan/references/repo-wide-high-impact-families.md b/plugins/codex-security/skills/security-scan/references/repo-wide-high-impact-families.md index e487cb24..bf566d3f 100644 --- a/plugins/codex-security/skills/security-scan/references/repo-wide-high-impact-families.md +++ b/plugins/codex-security/skills/security-scan/references/repo-wide-high-impact-families.md @@ -24,7 +24,7 @@ Use this reference with `repository-wide-scan.md` for family-specific repository - For each parser/helper, deserializer, auth/token/assertion, protocol/version, and polymorphic-operation shard, run targeted control-hazard searches inside the boundary package before finalizing the shard. Use repository-appropriate patterns; examples include subclass/override declarations, concrete codec/converter/input-handler/validator/filter/guard/resolver classes, `super.` calls, `split`, `parse`, `parseInt`, `matches`, `find`, `startsWith`, `equals`, `URL`, `URI`, `get(0)`, `cloneNode`, `registerConverter`, deny/allow lists, and validator/comparator method pairs. - Every matching root-control line that is in a reachable boundary package must be closed in the ledger as reportable, suppressed with exact counterevidence, or deferred. - For file-format object models such as PDF/COS, XML/DOM, YAML nodes, archive entries, protobuf/message fields, image/font tables, or binary protocol records, run a primitive-helper sweep over array, dictionary, node, token, numeric, and iterator helpers before closing the parser shard. Search method shapes such as `to*Array`, `get*`, `getObject`, `toFloat`, `toInt`, `parse*`, `iterator`, `size`, unchecked casts, and loops over attacker-controlled collection sizes; these helpers are root controls when they convert, cast, recurse, or allocate from untrusted document structure. -- When the runtime inventory identifies a central first-party object-model package for an untrusted file or message format, the coverage ledger must contain rows for that package's array, dictionary, node, collection, and primitive conversion helpers before the parser shard can close. +- When ranking or deep review identifies a central first-party object-model package for an untrusted file or message format, the coverage ledger must contain rows for that package's array, dictionary, node, collection, and primitive conversion helpers before the parser shard can close. - Object-model helper sweeps create mandatory coverage rows first, not automatic findings. Promote a helper row only when malformed or adversarial input can plausibly reach the helper and the missing type, size, shape, recursion, numeric, or conversion guard can cause crash, denial of service, parser confusion, authorization bypass, or another concrete security impact. - Deterministic malformed-input crashes are security-relevant when untrusted remote, protocol, document, archive, or package input reaches a missing parser/helper guard and can abort a service, request worker, parser pipeline, or security negotiation. Do not suppress these as generic robustness unless exact containment evidence shows caller-side recovery, equivalent prevalidation, or a non-security-only boundary for that instance. - For advisory, release-note, or security-test seeded parser/file-format DoS rows, a bespoke runtime harness is not always required when checked-out code plus existing tests or deterministic code reasoning make the untrusted source, broken helper control, and impact directly evident. Record a missing harness as a confidence limit; otherwise keep the row deferred rather than substituting seed text for local proof. diff --git a/plugins/codex-security/skills/security-scan/references/repo-wide-instance-expansion.md b/plugins/codex-security/skills/security-scan/references/repo-wide-instance-expansion.md index aad0ad9c..bd3fc48c 100644 --- a/plugins/codex-security/skills/security-scan/references/repo-wide-instance-expansion.md +++ b/plugins/codex-security/skills/security-scan/references/repo-wide-instance-expansion.md @@ -7,7 +7,7 @@ Use this reference with `repository-wide-scan.md` to avoid representative-only r Within the existing scan workflow, keep repository-wide scans instance-aware: - Discovery should create one candidate per independently vulnerable source/sink/control instance. -- Validation should preserve or suppress each candidate instance independently. +- The file-review subagent or parent agent that discovers a candidate should validate and attack-path that candidate instance before it enters cross-file dedupe, then later validation should preserve or suppress each deduped instance independently. - The final markdown report may add grouped summaries for readability, but only after each surviving instance has its own finding entry with affected location, source, broken control, sink, impact, and counterevidence. - Include suppressed candidates in the phase artifacts with the exact file/line and counterevidence so false-positive controls remain auditable. @@ -15,7 +15,7 @@ This mode improves recall while preserving precision: breadth comes from systema ## Child Instance Expansion -- When a broad ledger or candidate row names a whole operation family such as "all SQL trigger variants", "all deserialization variants", "all path traversal helpers", "all SSRF modes", "all generated framework adapters", or "all unauthenticated mutation endpoints", split it into child instances keyed by concrete exported function, route branch, sink statement, API mode, parser/deserializer variant, or protected action before validation and final reporting. +- When a broad ledger or candidate row names a whole operation family such as "all SQL trigger variants", "all deserialization variants", "all path traversal helpers", "all SSRF modes", "all generated framework adapters", or "all unauthenticated mutation endpoints", split it into child instances keyed by concrete exported function, route branch, sink statement, API mode, parser/deserializer variant, or protected action before cross-file dedupe, validation closure, and final reporting. - If one root cause creates multiple vulnerable templates, routes, query builders, parser/deserializer variants, path/file helpers, auth/object endpoints, protected actions, shared-helper callers, or config entries, carry each affected file/line through the phases as its own instance unless the runtime path truly cannot be separated. - If one route or helper contains multiple same-family sink/control lines, such as `execute`/`executemany`/`executescript`, `pickle.load`/`pickle.loads`/`yaml.load`/`yaml.load_all`, distinct file/path helper calls, insert/select/delete/update query builders, or unauthenticated create/delete/reset/admin/job actions, preserve each operation as a separate instance when attackers can trigger it independently. - For repeated vulnerable patterns, keep each independently vulnerable file and sink/control line as its own finding entry through discovery, validation, and final reporting. Do not rely on one representative finding with many extra files when those files can be attacked independently. diff --git a/plugins/codex-security/skills/security-scan/references/repo-wide-validation-closure.md b/plugins/codex-security/skills/security-scan/references/repo-wide-validation-closure.md index b92a2019..f3276a22 100644 --- a/plugins/codex-security/skills/security-scan/references/repo-wide-validation-closure.md +++ b/plugins/codex-security/skills/security-scan/references/repo-wide-validation-closure.md @@ -1,6 +1,6 @@ # Repository-Wide Validation Closure -Use this reference with `repository-wide-scan.md` to preserve repo-wide coverage through validation, attack-path analysis, and final reporting. +Use this reference with `repository-wide-scan.md` to preserve repo-wide coverage through candidate-local validation, candidate-local attack-path analysis, post-dedupe reconciliation, and final reporting. ## Closure Dispositions @@ -10,11 +10,15 @@ Use this reference with `repository-wide-scan.md` to preserve repo-wide coverage - Each in-scope row must be recorded even when no candidate is found. Candidate ids are optional links from coverage rows to findings, not the reason the row exists. - `open_seed` is an interim disposition only; before final reporting, every row must become `reportable`, `suppressed`, `not_applicable`, or `deferred` with exact file:line evidence or a concrete proof-gap reason. +## Pre-Dedupe Candidate Requirements + +- Raw repository-wide candidates should already carry the discovering file-review subagent's or parent agent's candidate-local validation evidence and attack-path facts before dedupe. +- Each raw candidate finding's `findings//candidate_ledger.jsonl` is its candidate-finding coverage artifact. Every raw candidate finding must have a stable candidate id plus candidate-ledger receipts for candidate-local validation and candidate-local attack-path analysis, or an explicit deferred reason for missing proof, before dedupe or final reporting. +- Post-dedupe validation is for reconciliation, unresolved proof gaps, cross-file confirmation, and final closure; do not let dedupe erase the earlier proof tuple or its candidate-ledger traceability. + ## Validation Budget And Coverage - Before finalizing a high-impact finding in a large repository, check whether the same inventory shard has unreviewed sibling packages, wrappers, or root controls that could be the actual broken control. If so, finish that shard's sibling checks or mark the remaining rows `deferred` explicitly. -- After promoting one or two obvious hotspot findings in a large repository, run a short coverage-diversity check before finalization: inspect at least one disjoint seeded/advisory row or low-salience utility shard outside the current hotspot cluster, such as protocol/version utilities, central parser/object-model helpers, reusable auth/token validators, or shared deserializer controls. Record the checked disjoint shard in the ledger as `reportable`, `suppressed`, `not_applicable`, or `deferred`. -- For large repositories, do not equate top-level hotspot coverage with subsystem coverage. Within high-risk subsystems, sample leaf runtime validators, comparators, object-model helpers, and special-case operation branches because these are often where the security control is actually skipped. - For query injection candidates, parser-control evidence is enough to keep the instance alive when attacker input crosses into query syntax or selector operators. Record later checks as constraints, but do not suppress the root injection until the exact query API, parser behavior, and post-query guard prove attacker input cannot change semantics or create a meaningful read/write/error side effect. - For template and placeholder injection candidates, preserve the helper line that creates the placeholder/expression engine and the line that resolves model values through it. Recursive expansion or re-parsing of resolved values remains in scope when the resolved values are request, client, tenant, stored configuration, or error data. Suppress only with exact evidence that resolved values are constant/trusted or escaped before any second parse/evaluation. diff --git a/plugins/codex-security/skills/security-scan/references/repository-wide-scan.md b/plugins/codex-security/skills/security-scan/references/repository-wide-scan.md index de0b78a8..abc34ee3 100644 --- a/plugins/codex-security/skills/security-scan/references/repository-wide-scan.md +++ b/plugins/codex-security/skills/security-scan/references/repository-wide-scan.md @@ -6,10 +6,11 @@ Use this guidance when the security scan target is the entire checked-out reposi Before repository-wide discovery or validation, read this file and all of these same-directory references in order. They are mandatory extensions of this workflow, not optional background: -1. `repo-wide-artifacts-and-ledger.md` for runtime inventory, seed research, exhaustive file checklist, coverage ledger, subagent/file-pass, and closure artifact rules. -2. `repo-wide-high-impact-families.md` for high-impact vulnerability family heuristics and exact suppression boundaries. -3. `repo-wide-instance-expansion.md` for child-instance splitting, wrapper/root-control preservation, and per-operation reporting. -4. `repo-wide-validation-closure.md` for validation/report closure, deferred rows, secondary issue ordering, and false-positive controls. +1. `scan-artifacts-and-ledger.md` for shared artifact, seed, subagent, scoped file-review, candidate-ledger, and dedupe rules. +2. `repo-wide-artifacts-and-ledger.md` for rank input, subagent ranking, deep-review selection, and repository coverage-ledger rules. +3. `repo-wide-high-impact-families.md` for high-impact vulnerability family heuristics and exact suppression boundaries. +4. `repo-wide-instance-expansion.md` for child-instance splitting, wrapper/root-control preservation, and per-operation reporting. +5. `repo-wide-validation-closure.md` for validation/report closure, deferred rows, secondary issue ordering, and false-positive controls. Do not treat `repository-wide-scan.md` alone as the complete repo-wide scan procedure. @@ -20,8 +21,8 @@ Use an exhaustive instance-finding workflow rather than the diff-scan workflow's Repository-wide scans must: - Load the per-scan threat model path from `../../../references/scan-artifacts.md` as the repo-specific threat-model source of truth. -- Build an entrypoint and trust-boundary inventory before validation: routes, handlers, templates, serializers, deserializers, query builders, shell/process calls, file/path APIs, network fetches/callbacks, auth/authz middleware, session/cookie config, secret/config sources, IaC or policy resources, and agent/tool boundaries. -- Create `runtime_inventory.md`, `seed_research.md` when seed hints exist, `exhaustive-file-checklist.md`, and `repository_coverage_ledger.md` using the artifact paths from `../../../references/scan-artifacts.md`. +- Build or consume an authoritative parent-provided `rank_input.csv` before validation so the repository-wide candidate file inventory covers routes, handlers, templates, serializers, deserializers, query builders, shell/process calls, file/path APIs, network fetches/callbacks, auth/authz middleware, session/cookie config, secret/config sources, IaC or policy resources, and agent/tool boundaries. +- Create `seed_research.md` when seed hints exist, `rank_input.csv`, `rank_output.csv` when ranking applies, `deep_review_input.csv`, `work_ledger.jsonl`, `raw_candidates.jsonl`, per-finding `findings//candidate_ledger.jsonl`, `dedupe_report.md`, `deduped_candidates.jsonl`, and `repository_coverage_ledger.md` using the artifact paths from `../../../references/scan-artifacts.md`. - Create a high-impact coverage ledger before deep validation. The ledger is a coverage artifact, not a list of potential findings, and must include rows without candidates as well as reportable candidates. - Keep every applicable high-impact, user-seeded, advisory-seeded, or tag-seeded row open until that exact area is closed as `reportable`, `suppressed`, `not_applicable`, or `deferred` with exact evidence or proof-gap reasons. - Enumerate every technically distinct high-impact vulnerable instance discovered under those families, not just one representative example per class. @@ -35,15 +36,23 @@ During finding discovery, apply this repository-wide workflow instead of the dif Run this broader but still bounded workflow: 1. Read the required references listed above. -2. Build and save `runtime_inventory.md` from entrypoints, trust boundaries, security-sensitive configuration, and product/runtime areas. -3. Build `exhaustive-file-checklist.md` before fan-out. Each line must start with `- [ ] ` and cannot be checked until the file has been fully read and reviewed. +2. Resolve `rank_input.csv` before subagent dispatch: + - if an upstream parent orchestrator explicitly provided authoritative repo-wide worklists and both `/rank_input.csv` and `/deep_review_input.csv` already exist, consume that `rank_input.csv` as supplied + - otherwise generate `rank_input.csv` using `scripts/generate_rank_input.py make-repo-rank-input`; this is the deterministic candidate file inventory +3. Resolve `deep_review_input.csv`: + - if an upstream parent orchestrator explicitly provided authoritative repo-wide worklists and both standard worklist files already exist, consume that `deep_review_input.csv` as supplied without reranking or overwrite + - otherwise apply the `top-percent` flow from `repo-wide-artifacts-and-ledger.md`: for `top-percent` below 100, run subagent ranking over `rank_input.csv` using the runtime-surface scoring guidance and select `deep_review_input.csv`; for `top-percent` 100 or higher, copy every candidate row directly into `deep_review_input.csv` 4. Run advisory/seed research when the user or scan context includes CVE, GHSA, advisory, issue, release, package-version, or vulnerability-family identifiers. Save `seed_research.md` and create exact seed-target ledger rows. 5. Build and save `repository_coverage_ledger.md` with one row per applicable boundary and serious vulnerability family before deep validation begins. 6. Run one frontier pass across every applicable high-impact shard before prolonged validation or build/debug work on any single shard. 7. Run targeted control-hazard searches for parser/helper, deserializer, auth/token/assertion, protocol/version, and polymorphic-operation shards using `repo-wide-high-impact-families.md`. -8. Run high-impact fan-out passes before any secondary review. When one vulnerable pattern is found, search sibling files, routes, templates, handlers, models, and config variants before moving on. +8. Run high-impact sibling-expansion passes before any secondary review. When one vulnerable pattern is found, the file-review subagent or parent agent that owns that candidate must search sibling files, routes, templates, handlers, models, and config variants before moving on. 9. When a high-impact instance flows through a wrapper into a shared parser, deserializer, path/archive helper, expression evaluator, or auth/authz control, record both the reachable wrapper and the underlying shared sink/control. -10. Iterate over every file in the in-scope checklist, using subagents when available as described in `repo-wide-artifacts-and-ledger.md`. -11. Split broad families and repeated same-family operations into child instances using `repo-wide-instance-expansion.md`. -12. Treat data exposure, hardcoded secrets, weak session/cookie/security config, CSRF, rate limits, and plaintext storage as secondary. Include them only after the high-impact ledger and file list are exhausted or when they directly enable code execution, injection, privilege escalation, meaningful auth bypass, or sensitive cross-boundary impact. -13. Preserve each validated or suppressed instance through validation, attack-path analysis, and final reporting using `repo-wide-validation-closure.md`. +10. Dispatch file-review subagents over `deep_review_input.csv` using the shared ownership rules in `scan-artifacts-and-ledger.md`. Each file-review subagent owns its assigned file or tiny shard, performs full-file review, and returns pre-dedupe finding objects with candidate-local validation evidence and attack-path facts for findings it discovered. +11. Aggregate file-review-subagent outputs into `raw_candidates.jsonl` and append one candidate-ledger row per raw candidate finding. +12. Do not continue until each raw candidate finding's `findings//candidate_ledger.jsonl` shows validation and attack-path coverage, or an explicit deferred reason for any missing proof. +13. Split broad families and repeated same-family operations into child instances using `repo-wide-instance-expansion.md` before cross-file dedupe whenever the child instances are already visible in subagent output. +14. Run cross-file dedupe into `dedupe_report.md` and `deduped_candidates.jsonl` without dropping independently reachable sibling instances, and preserve the raw candidate ids absorbed into each deduped candidate. +15. Use post-dedupe validation and attack-path work for repository-wide reconciliation, unresolved proof gaps, and final closure, not as the first review pass for raw findings. When multiple deduped candidates or coverage-ledger rows remain open and subagents are available and approved, divide validation and attack-path work across candidate/row-scoped subagents using `scan-artifacts-and-ledger.md`. +16. Treat data exposure, hardcoded secrets, weak session/cookie/security config, CSRF, rate limits, and plaintext storage as secondary. Include them only after the high-impact ledger and file list are exhausted or when they directly enable code execution, injection, privilege escalation, meaningful auth bypass, or sensitive cross-boundary impact. +17. Preserve each validated or suppressed instance through validation, attack-path analysis, and final reporting using `repo-wide-validation-closure.md`. diff --git a/plugins/codex-security/skills/security-scan/references/scan-artifacts-and-ledger.md b/plugins/codex-security/skills/security-scan/references/scan-artifacts-and-ledger.md new file mode 100644 index 00000000..6db08bcc --- /dev/null +++ b/plugins/codex-security/skills/security-scan/references/scan-artifacts-and-ledger.md @@ -0,0 +1,83 @@ +# Scan Artifacts And Ledger + +Use this reference whenever the scan needs auditable candidate coverage or a scoped file-review worklist. + +## Artifact Requirements + +- Load the per-scan threat model path from `../../../references/scan-artifacts.md` as the repo-specific threat-model source of truth. +- Use the artifact paths from `../../../references/scan-artifacts.md` for `seed_research.md`, `deep_review_input.csv` when a scoped file-review worklist is needed, `work_ledger.jsonl` when a scoped file-review worklist is needed, `raw_candidates.jsonl` when multiple file-review results are aggregated, `dedupe_report.md` and `deduped_candidates.jsonl` when cross-file dedupe is needed, and per-finding `findings//candidate_ledger.jsonl`. + +## Seed Research + +- First capture user-provided scope hints such as CVE/GHSA/advisory identifiers, package versions, named vulnerability families, or release/security-test references. +- When the user request or scan context includes CVE, GHSA, advisory, issue, release, package-version, or explicit vulnerability-family identifiers, run an advisory seed pass before deep frontier scanning and save it to the advisory seed research path from `../../../references/scan-artifacts.md`. +- Use authoritative advisory text, project security notes, release notes, fix commits, pull requests, issue trackers, and security tests when network access or local history is available. Record the sources searched, candidate files/functions/classes/hunks, expected vulnerable behavior, and any failed lookup attempts. +- Treat those candidates as seed rows only: validate the vulnerable behavior against the checked-out repository before reporting. Do not let the seed lane replace the scan's primary scope. +- When CVE/advisory context has a generic or unhelpful category, prioritize advisory, fix-commit, release-note, and security-test lookup before broad sink hotspot scanning. If external lookup is unavailable or inconclusive, run a local regression-seed pass over project-specific protocol, parser, validator, and utility names plus the CVE/advisory terms; do not assume obvious REST/upload/XML hotspots are the intended security regression. +- When the seed pass or local search opens a candidate file, class, package, or hunk, create an exact seed-target row for that area before opportunistic same-family scanning. Run a short seed-first triage over that file/package and its immediate shared helper or caller chain, then close the row as `reportable`, `suppressed`, `not_applicable`, or `deferred`. A more obvious neighboring issue can be reported too, but it does not replace the seed-target row. +- Keep every user/advisory/tag-seeded boundary package or class family open until that exact area is closed as `reportable`, `suppressed`, `not_applicable`, or `deferred`. A broader same-family finding in a neighboring parser, auth flow, deserializer, or template engine does not implicitly close the seeded row. +- In advisory-led scans, treat the advisory, fix hunk, release note, or security test as evidence for the intended root cause, not as an exclusivity filter and not as a bare finding. Keep the exact seed row open until checked-out repository evidence independently supports or disproves the same source, broken control, and impact tuple. + +## Subagent Requirements + +- When a scan uses subagent-dispatch phases and subagents are available in the current tool set, use subagents for those phases. +- If subagents are available and the user has not explicitly allowed subagents, stop before starting subagent dispatch and ask for that approval. Do not silently fall back to parent-agent-only review when subagents are available. +- File-review-subagent ownership: one file-review subagent owns one `deep_review_input.csv` row or one very small tightly coupled shard, max 5 files, and returns full-file receipts plus pre-dedupe finding objects for that assignment. +- File-review subagents are read-only with respect to the target code under review, but they are allowed and expected to write scan artifacts under ``, including `work_ledger.jsonl`, raw candidate snippets, and per-candidate ledger receipts when those artifact paths are provided in the prompt. +- Validation-subagent ownership: one validation subagent owns one candidate finding, one deduped candidate, or one repository coverage-ledger row that needs validation closure. It writes or returns validation artifacts, the visible validation report update, and the validation candidate-ledger receipt for that assignment. +- Attack-path-subagent ownership: one attack-path subagent owns one validated candidate finding or one reportable/deferred validation closure row. It writes or returns attack-path facts, severity/policy analysis, the visible attack-path report update, and the attack-path candidate-ledger receipt for that assignment. +- Parent-agent ownership: the parent agent owns `deep_review_input.csv` generation, subagent dispatch, work-ledger and candidate-ledger reconciliation, aggregation of subagent outputs, cross-file dedupe when needed, and final scan closure. +- Subagent prompts must carry the exact current scan instructions they are expected to follow. Do not rely on the subagent implicitly inheriting this skill, another phase skill, previous parent context, or a summarized reference name. + +### File-Review Subagent Handoff + +The parent agent should give each file-review subagent enough concrete context to execute its assigned row without relying on implicit parent context. Keep the prompt concise, but include: + +- the assigned `deep_review_input.csv` row or tiny shard +- the scan target, scan mode, `repo_name`, `scan_id`, `artifacts_dir`, and per-scan threat model path or summary +- the writable artifact paths for `work_ledger.jsonl`, `raw_candidates.jsonl`, and per-candidate ledger receipts +- any user-provided comparison or seed artifact that should affect coverage, such as an HTML/markdown report, prior security-review output, advisory text, CVE/GHSA, release note, issue, or separate audit directory +- the expectation to read assigned files in full, read only the supporting files needed for concrete findings, and return or write full-file receipts, raw candidates, suppressions/deferred rows, and ledger receipts + +The parent agent must reject or re-prompt any file-review subagent result that lacks full-file receipts, omits source/control/sink/impact for candidates, omits candidate-local validation or attack-path facts for reported candidates, or returns only "no bugs found" without closing the assigned row with evidence. + +### Validation And Attack-Path Subagent Handoff + +After discovery and dedupe, divide validation and attack-path work across subagents when multiple candidates or coverage-ledger rows need closure and subagents are available and approved. + +For validation subagents, include the candidate or ledger row, discovery evidence, relevant raw/deduped candidate ids, affected files, threat-model context, validation artifact/report paths, and the candidate-ledger path that needs the validation receipt. The validation subagent should preserve or suppress the assigned instance only; it does not own final report assembly or unrelated candidates. + +For attack-path subagents, include the validated candidate or validation closure row, validation report path or summary, affected root-control and sink lines, threat-model context, attack-path report path, and the candidate-ledger path that needs the attack-path receipt. The attack-path subagent should produce reachability, counterevidence, severity/policy analysis, and final reportability facts for the assigned instance only. + +The parent agent must reconcile validation and attack-path subagent outputs before final reporting. Do not finalize while any reportable, suppressed, not_applicable, or deferred candidate/ledger row lacks the required receipt or explicit proof-gap reason. + +## Scoped Deep Review + +- Use `deep_review_input.csv` as the canonical scoped deep-review worklist for every diff-scoped and repository-wide scan. +- For diff-scoped scans, generate `rank_input.csv` deterministically from changed source-like files with `python3 /scripts/generate_rank_input.py make-diff-rank-input --repo --base --head --out /rank_input.csv`, then copy every row into `deep_review_input.csv` with `python3 /scripts/generate_rank_input.py copy-deep-review-input --rank-input /rank_input.csv --out /deep_review_input.csv`. +- Diff-scoped scans do not rank or drop changed files before deep review. Every row in diff `rank_input.csv` must be copied into `deep_review_input.csv` and receive a full-file review receipt. +- Add directly supporting files required to understand the changed security behavior only when repository evidence shows they are needed; record the add-back reason in the work ledger or per-file result. +- For repository-wide scans, `deep_review_input.csv` is selected from the repository-wide ranked inventory. +- Deep-review every file selected into `deep_review_input.csv`. + - Use `/work_ledger.jsonl` as the append-only record of claims and completions, and reconcile it against `deep_review_input.csv` so rows are not skipped or double-counted. + - Use subagents when available and approved. + - A file-review subagent must read every assigned file in full, update the ledger receipt for those files, and return the raw finding results for that assignment. + - When a file-review subagent finds a plausible finding in its assigned file or shard, that same subagent should carry that finding through candidate-local validation and candidate-local attack-path analysis before handing it back. The raw result should include the source or privileged boundary, closest relevant control, sink or broken control, impact, validation method/evidence or exact proof gap, attack-path facts, and disposition. + - A file-review subagent may read the minimum supporting files needed to validate or explain a finding it discovered, but it does not own unrelated rows or final scan closure. + - If subagents are not available, iterate through `deep_review_input.csv` yourself with the same full-file standard. + - A file is not covered because it appeared in searches. It is covered only when the responsible subagent or parent agent returns a receipt showing the file was read in full. + - Record file-level completion, disposition, and a concise evidence note in `/work_ledger.jsonl`; do not create a separate per-file findings directory. + - Append normalized, pre-dedupe candidate objects to `/raw_candidates.jsonl` when multiple file-review results are being aggregated or cross-file dedupe is needed. + - Do not stop until every `deep_review_input.csv` row has a completion receipt. + +## Candidate Finding Coverage + +- Track candidate-finding coverage separately from file coverage. + - Use `/findings//candidate_ledger.jsonl` as the append-only record for each candidate finding. + - Every candidate finding must have a stable candidate id plus candidate-ledger receipts for discovery, validation, and attack-path analysis before final reporting. + - When the finding is emitted during a scoped file-review pass, candidate-local validation and candidate-local attack-path analysis should be recorded before the finding is eligible for cross-file dedupe. + - Validation coverage must record the validation method, evidence or exact proof gap, and disposition for that candidate finding. + - Attack-path coverage must record the source or privileged boundary, closest relevant control, sink or broken control, impact path, and severity-relevant facts for that candidate finding. + - A candidate finding is not covered because it appears in a report or `raw_candidates.jsonl`. It is covered only when its candidate ledger shows the required receipts, or an explicit deferred reason for the missing proof. +- When multiple raw candidate streams need cross-file dedupe, dedupe only after the relevant candidate ledgers prove the required coverage. Write `/dedupe_report.md` and `/deduped_candidates.jsonl`, preserving independently reachable sibling instances. +- Dedupe must preserve candidate-ledger traceability: every deduped candidate must list the raw candidate ids and per-finding candidate ledgers it absorbed, and independently reachable sibling instances must remain separate. diff --git a/plugins/codex-security/skills/security-scan/scripts/generate_rank_input.py b/plugins/codex-security/skills/security-scan/scripts/generate_rank_input.py new file mode 100644 index 00000000..9353c926 --- /dev/null +++ b/plugins/codex-security/skills/security-scan/scripts/generate_rank_input.py @@ -0,0 +1,433 @@ +#!/usr/bin/env python3 +"""Generate and post-process security scan worklists. + +This script stays deliberately model-free: + +- `make-repo-rank-input` creates the deterministic repository-wide candidate CSV + that ranking subagents consume. +- `make-diff-rank-input` creates the deterministic diff-scoped candidate CSV + from Git changed paths that ranking subagents consume. +- `copy-deep-review-input` copies every candidate row into the deep-review input for + exhaustive mode. +- `select-deep-review-input` parses worker-produced ranking output and selects + the rows for deep review. +""" + +from __future__ import annotations + +import argparse +import csv +import json +import subprocess +from pathlib import Path + +EXCLUDED_DIRS = { + ".cache", + ".circleci", + ".devcontainer", + ".git", + ".github", + ".idea", + ".mypy_cache", + ".pytest_cache", + ".ruff_cache", + ".tox", + ".venv", + ".vscode", + "__pycache__", + "bench", + "benchmark", + "bintest", + "build", + "build_config", + "build_configs", + "build-tools", + "build_tools", + "ci", + "coverage", + "deps", + "dev", + "dist", + "doc", + "docs", + "example", + "examples", + "external", + "extern", + "fixture", + "fixtures", + "generated", + "node_modules", + "sample", + "samples", + "target", + "test", + "tests", + "testing", + "third-party", + "third_party", + "tmp", + "vendor", +} + +EXCLUDED_FILENAMES = { + ".DS_Store", + "CHANGELOG", + "CHANGELOG.md", + "CONTRIBUTING.md", + "Dockerfile", + "Gemfile", + "Gemfile.lock", + "LICENSE", + "LICENSE.md", + "Makefile", + "NEWS", + "NEWS.md", + "NOTICE", + "README", + "README.md", + "README.rst", + "Rakefile", + "SECURITY.md", + "TODO", + "TODO.md", + "docker-compose.yml", + "package-lock.json", + "pnpm-lock.yaml", + "yarn.lock", +} + +TEXT_CODE_EXTENSIONS = { + ".c", + ".cc", + ".cfg", + ".clj", + ".cpp", + ".cs", + ".css", + ".cue", + ".cxx", + ".dart", + ".ex", + ".exs", + ".go", + ".graphql", + ".h", + ".hpp", + ".hs", + ".html", + ".java", + ".js", + ".json", + ".jsx", + ".kt", + ".kts", + ".lua", + ".mjs", + ".mm", + ".php", + ".proto", + ".py", + ".rb", + ".rs", + ".scala", + ".sh", + ".sql", + ".swift", + ".toml", + ".ts", + ".tsx", + ".vue", + ".xml", + ".yaml", + ".yml", +} + + +def parse_args() -> argparse.Namespace: + parser = argparse.ArgumentParser(description="Codex Security scan worklist helper.") + subparsers = parser.add_subparsers(dest="command", required=True) + + make = subparsers.add_parser( + "make-repo-rank-input", + help="Create rank_input.csv for subagent-based file ranking.", + ) + make.add_argument("--repo", required=True, help="Repository root.") + make.add_argument("--scope", default=".", help="Subdirectory to scan.") + make.add_argument("--out", required=True, help="Output rank_input.csv path.") + make.add_argument("--area", default="", help="Area label. Defaults to scope.") + make.add_argument( + "--preview-bytes", + type=int, + default=200, + help="Number of bytes to include in preview column.", + ) + + diff = subparsers.add_parser( + "make-diff-rank-input", + help="Create rank_input.csv from Git changed source-like files.", + ) + diff.add_argument("--repo", required=True, help="Repository root.") + diff.add_argument("--base", required=True, help="Git diff base revision.") + diff.add_argument("--head", default="HEAD", help="Git diff head revision.") + diff.add_argument("--out", required=True, help="Output rank_input.csv path.") + diff.add_argument("--area", default="diff", help="Area label for assess rows.") + diff.add_argument( + "--preview-bytes", + type=int, + default=200, + help="Number of bytes to include in preview column.", + ) + + select = subparsers.add_parser( + "copy-deep-review-input", + help="Create deep_review_input.csv directly from rank_input.csv for exhaustive mode.", + ) + select.add_argument("--rank-input", required=True, help="Deterministic rank input CSV.") + select.add_argument("--out", required=True, help="Output deep_review_input.csv path.") + + select = subparsers.add_parser( + "select-deep-review-input", + help="Create deep_review_input.csv from worker-produced rank_output.csv.", + ) + select.add_argument("--rank-output", required=True, help="Worker ranking CSV.") + select.add_argument("--out", required=True, help="Output deep_review_input.csv path.") + select.add_argument( + "--top-percent", + type=int, + default=20, + help="Percent of included files to keep for deep review.", + ) + return parser.parse_args() + + +def is_binary_sample(data: bytes) -> bool: + return b"\0" in data + + +def preview_for(path: Path, preview_bytes: int) -> tuple[str, bool]: + try: + data = path.read_bytes() + except OSError: + return "", True + sample = data[:4096] + if is_binary_sample(sample): + return "", True + preview = ( + data[:preview_bytes].decode("utf-8", errors="ignore").replace("\n", " ").replace("\r", " ") + ) + return preview, False + + +def path_is_excluded(path: Path) -> bool: + if any(part in EXCLUDED_DIRS for part in path.parts): + return True + if path.name in EXCLUDED_FILENAMES: + return True + if path.name.endswith((".min.js", ".map")): + return True + return False + + +def resolve_scope(repo: Path, scope: str) -> Path: + scope_path = Path(scope).expanduser() + if not scope_path.is_absolute(): + scope_path = repo / scope_path + scope_path = scope_path.resolve() + repo_resolved = repo.resolve() + try: + scope_path.relative_to(repo_resolved) + except ValueError as exc: + raise SystemExit(f"Scope must be inside repo: {scope_path}") from exc + if not scope_path.is_dir(): + raise SystemExit(f"Scope path not found: {scope_path}") + return scope_path + + +def write_rows(output: Path, rows: list[tuple[str, str, str]], headers: list[str]) -> None: + output.parent.mkdir(parents=True, exist_ok=True) + with output.open("w", newline="") as f: + writer = csv.writer(f) + writer.writerow(headers) + writer.writerows(rows) + + +def make_repo_rank_input(args: argparse.Namespace) -> None: + repo = Path(args.repo).expanduser().resolve() + if not repo.is_dir(): + raise SystemExit(f"Repo path not found: {repo}") + scope_abs = resolve_scope(repo, args.scope) + scope_rel = scope_abs.relative_to(repo).as_posix() + area = args.area or scope_rel + + rows: list[tuple[str, str, str]] = [] + for path in scope_abs.rglob("*"): + try: + if not path.is_file(): + continue + except OSError: + continue + rel = path.relative_to(repo) + if path_is_excluded(rel): + continue + if path.suffix.lower() not in TEXT_CODE_EXTENSIONS: + continue + + preview, is_binary = preview_for(path, args.preview_bytes) + if is_binary: + continue + rows.append((rel.as_posix(), area, preview)) + + rows.sort(key=lambda row: row[0]) + output = Path(args.out).expanduser() + write_rows(output, rows, ["path", "area", "preview"]) + + print(f"Wrote {len(rows)} rows to {output}") + + +def git_changed_paths(repo: Path, base: str, head: str) -> list[Path]: + result = subprocess.run( + [ + "git", + "-C", + str(repo), + "diff", + "--name-only", + "--diff-filter=ACMR", + f"{base}..{head}", + ], + check=True, + capture_output=True, + text=True, + ) + paths: list[Path] = [] + for line in result.stdout.splitlines(): + if not line: + continue + path = repo / line + if not path.exists() or not path.is_file(): + continue + paths.append(path) + return paths + + +def make_diff_rank_input(args: argparse.Namespace) -> None: + repo = Path(args.repo).expanduser().resolve() + if not repo.is_dir(): + raise SystemExit(f"Repo path not found: {repo}") + + rows: list[tuple[str, str, str]] = [] + for path in git_changed_paths(repo, args.base, args.head): + rel = path.relative_to(repo) + if path_is_excluded(rel): + continue + if path.suffix.lower() not in TEXT_CODE_EXTENSIONS: + continue + + preview, is_binary = preview_for(path, args.preview_bytes) + if is_binary: + continue + rows.append((rel.as_posix(), args.area, preview)) + + rows.sort(key=lambda row: row[0]) + output = Path(args.out).expanduser() + write_rows(output, rows, ["path", "area", "preview"]) + print(f"Wrote {len(rows)} rows to {output}") + + +def parse_bool(value: object) -> bool | None: + if isinstance(value, bool): + return value + if value is None: + return None + normalized = str(value).strip().lower() + if normalized in {"true", "1", "yes", "y"}: + return True + if normalized in {"false", "0", "no", "n"}: + return False + return None + + +def select_deep_review_input(args: argparse.Namespace) -> None: + rank_output = Path(args.rank_output).expanduser() + if not rank_output.exists(): + raise SystemExit(f"Rank output missing: {rank_output}") + + rows: list[dict[str, object]] = [] + with rank_output.open(newline="") as f: + reader = csv.DictReader(f) + for row in reader: + payload: dict[str, object] = {} + raw_payload = row.get("result_json") or "" + if raw_payload: + try: + parsed = json.loads(raw_payload) + if isinstance(parsed, dict): + payload = parsed + except json.JSONDecodeError: + payload = {} + + score_raw: object = row.get("score") or payload.get("score") or 0 + try: + row["score"] = int(score_raw) + except (TypeError, ValueError): + row["score"] = 0 + + include = parse_bool(row.get("include")) + if include is None: + include = parse_bool(payload.get("include")) + row["include"] = True if include is None else include + rows.append(row) + + included = [row for row in rows if row.get("include")] + base_rows = included if included else rows + base_rows.sort(key=lambda row: int(row["score"]), reverse=True) + keep = max(1, int(len(base_rows) * (args.top_percent / 100.0))) if base_rows else 0 + selected = base_rows[:keep] + + output = Path(args.out).expanduser() + output.parent.mkdir(parents=True, exist_ok=True) + with output.open("w", newline="") as f: + writer = csv.writer(f) + writer.writerow(["path", "area"]) + for row in selected: + writer.writerow([row.get("path", ""), row.get("area", "")]) + + print(f"Selected {len(selected)} of {len(base_rows)} rows into {output}") + + +def copy_deep_review_input(args: argparse.Namespace) -> None: + rank_input = Path(args.rank_input).expanduser() + if not rank_input.exists(): + raise SystemExit(f"Rank input missing: {rank_input}") + + count = 0 + output = Path(args.out).expanduser() + output.parent.mkdir(parents=True, exist_ok=True) + with rank_input.open(newline="") as f, output.open("w", newline="") as out: + reader = csv.DictReader(f) + writer = csv.writer(out) + writer.writerow(["path", "area"]) + for row in reader: + writer.writerow([row.get("path", ""), row.get("area", "")]) + count += 1 + + print(f"Copied {count} rows into {output}") + + +def main() -> None: + args = parse_args() + if args.command == "make-repo-rank-input": + make_repo_rank_input(args) + elif args.command == "make-diff-rank-input": + make_diff_rank_input(args) + elif args.command == "copy-deep-review-input": + copy_deep_review_input(args) + elif args.command == "select-deep-review-input": + select_deep_review_input(args) + else: + raise SystemExit(f"Unknown command: {args.command}") + + +if __name__ == "__main__": + main() diff --git a/plugins/codex-security/skills/validation/SKILL.md b/plugins/codex-security/skills/validation/SKILL.md index 3b67a66c..6de0c1a7 100644 --- a/plugins/codex-security/skills/validation/SKILL.md +++ b/plugins/codex-security/skills/validation/SKILL.md @@ -32,9 +32,11 @@ Use the shared scan artifact path conventions in `../../references/scan-artifact - large internal repository mode: for repository-wide scans where runtime reproduction requires unavailable internal services, secrets, cloud accounts, service meshes, or local production data, use static trace plus existing tests and deploy/config evidence once the candidate has a complete source/control/sink/impact tuple. Missing internal runtime setup is not suppression evidence. 4. For non-compiled stacks, attempt to generate PoCs or targeted commands that exercise the vulnerable path and trigger the vulnerability. 5. For compiled stacks, prefer dynamic validation when it is feasible with bounded setup: build a debug variant or targeted test harness when available, reproduce the vulnerable behavior with a small PoC, then use valgrind, ASan, or a non-interactive debugger trace when those tools materially improve confidence. -6. Save any PoC files, inputs, or logs under the validation artifacts path from `../../references/scan-artifacts.md`. +6. Save any PoC files, inputs, or logs under that finding's validation artifacts path from `../../references/scan-artifacts.md`. 7. If validation is not feasible, document what was tried, what remains uncertain, and the exact proof gap. 8. Return a clear validation assessment per finding grounded in the evidence, proof gaps, and remaining uncertainty. +9. Save that finding's visible validation report to its per-finding validation report path from `../../references/scan-artifacts.md`. +10. Append one validation receipt per candidate id to that finding's candidate ledger path from `../../references/scan-artifacts.md`. The receipt must record the validation method, evidence or exact proof gap, disposition, and validation artifact/report reference for that candidate finding. ## Usage Guidance @@ -81,15 +83,16 @@ For repository-wide scans, also include a validation closure table with columns: ## Hard Rules - Do not imply validation happened when it did not. +- Do not leave candidate coverage implicit. Every candidate finding that enters validation must leave a validation receipt in `findings//candidate_ledger.jsonl`, even when the result is suppressed, uncertain, or deferred. - Prefer realistic local reproduction paths over contrived setups. - If a finding depends on missing product assumptions, state the question clearly instead of fabricating the answer. - Keep commands short, bounded, and non-interactive. - Use stronger validation methods such as crashing PoCs, valgrind, ASan, debugger traces, focused tests, or realistic interface reproduction before falling back to code understanding when the stack and scan scope make that feasible. - Calibrate confidence from the validation method and evidence, not from how dangerous the bug class sounds. -- Keep validation artifacts and the final visible report in the validation paths from `../../references/scan-artifacts.md` so the full scan bundle lives together. +- Keep validation artifacts and the final visible report in that finding's validation paths from `../../references/scan-artifacts.md` so the full scan bundle lives together. - Make a serious, bounded effort to get runtime validation working when it would materially change reportability, confidence, or severity. Consult repository guidance such as `AGENTS.md`, `README.md`, setup docs, test docs, build files, and package-manager metadata to identify the required dependencies, generated files, services, and setup steps. -- For scans that should not modify the target tree, use a disposable copy or generated-artifact directory under the validation artifacts path for builds, generated clients, patched test harnesses, and PoC files. A no-edit target rule does not forbid output-only build copies when they are needed to validate the original code. -- For repository-wide scans, update the validation report and closure table as each reportable, suppressed, not_applicable, or deferred row is decided. Do not leave validated rows only in transient notes, terminal logs, or validation artifacts; later phases must be able to reconstruct surviving findings from the saved validation report if the scan is interrupted. +- For scans that should not modify the target tree, use a disposable copy or generated-artifact directory under that finding's validation artifacts path for builds, generated clients, patched test harnesses, and PoC files. A no-edit target rule does not forbid output-only build copies when they are needed to validate the original code. +- For repository-wide scans, update each affected finding's validation report and closure table as each reportable, suppressed, not_applicable, or deferred row is decided. Do not leave validated rows only in transient notes, terminal logs, or validation artifacts; later phases must be able to reconstruct surviving findings from the saved per-finding validation reports if the scan is interrupted. - For large repository-wide scans, keep setup/build/debug effort proportionate to the candidate and the remaining high-impact coverage ledger. Do not spend the review budget trying to fully reproduce one internal service when static trace, existing tests, and deploy/config evidence are enough to validate or suppress the candidate. - In repository-wide validation, once one candidate in a repeated high-impact pattern has a strong proof tuple, switch to sibling candidates from the coverage ledger and validate each by checking the same source, closest control, sink, and impact. Only continue deeper runtime work when it would materially change reportability, severity, or confidence. - If a repository-wide shard has a promoted same-family finding plus unresolved seeded or root-control rows, close those sibling rows next as reportable, suppressed, or deferred before replacing the review with a more dramatic neighboring finding. Representative proof improves confidence, but it does not close sibling root controls without exact counterevidence. diff --git a/plugins/codex-security/skills/validation/references/validation-guidance.md b/plugins/codex-security/skills/validation/references/validation-guidance.md index f2317e84..8cdc3093 100644 --- a/plugins/codex-security/skills/validation/references/validation-guidance.md +++ b/plugins/codex-security/skills/validation/references/validation-guidance.md @@ -96,7 +96,7 @@ Use this checklist to keep validation close to the prompt contract: - If the code exposes a realistic interface, attempt validation through that interface before concluding when feasible. - Keep commands short, non-interactive, and scoped to the touched files or the minimum referenced paths. - If validation fails, record what was attempted, why it was inconclusive, and what proof gap remains. -- Save any PoCs, logs, or crafted inputs under the validation artifacts path from `../../../references/scan-artifacts.md`. +- Save any PoCs, logs, or crafted inputs under that finding's validation artifacts path from `../../../references/scan-artifacts.md`. - When multiple instances are provided, keep each candidate individually marked as survived, suppressed, or uncertain; do not silently omit candidates from the final validation report. - For a single standalone validation request, do not infer repository-wide or sibling scope unless the user explicitly asks for expansion or provides a multi-instance candidate list. - For a top-level repository-wide security scan, do not narrow validation to one representative finding when discovery supplied a coverage ledger or repeated pattern family.