Enable reruns on guardrail-sensitive prompts (pack 1.2.0) #2

Open: pira12 wants to merge 1 commit into main from feat/prompt-reruns-1.2.0

pira12 commented Jun 16, 2026

Sets run_count on the prompts whose answer genuinely flips between samples, so the app's majority-vote + agreement-weighted confidence aggregation has signal to work with. Stable capability self-reports stay at 1.
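
As a rough illustration of why multiple samples matter here, the aggregation could look something like the sketch below. This is not the Tezcat app's actual code: the function name `aggregate`, the per-run confidence inputs, and the scoring rule (agreement fraction times the mean confidence of the winning runs) are all assumptions made for the example.

```python
from collections import Counter


def aggregate(verdicts, confidences=None):
    """Majority vote over per-run verdicts (hypothetical helper).

    Returns (winning_verdict, weighted_confidence), where the confidence is
    the fraction of runs that agree with the winner multiplied by the mean
    confidence of those agreeing runs. This scoring rule is an assumption.
    """
    if not verdicts:
        raise ValueError("need at least one verdict")
    if confidences is None:
        # Assume full confidence per run when none is supplied.
        confidences = [1.0] * len(verdicts)
    if len(confidences) != len(verdicts):
        raise ValueError("verdicts and confidences must have the same length")

    counts = Counter(verdicts)
    winner, votes = counts.most_common(1)[0]
    agreement = votes / len(verdicts)

    # Average only the runs that voted for the winning verdict.
    winning = [c for v, c in zip(verdicts, confidences) if v == winner]
    mean_conf = sum(winning) / len(winning)
    return winner, agreement * mean_conf


if __name__ == "__main__":
    print(aggregate(["disclosed", "disclosed", "refused"], [0.9, 0.8, 0.7]))
```

With a single run the agreement term is always 1, so the score carries no information about stability; with three runs a 2-to-1 split lowers it visibly. How a 1-to-1 tie at run_count 2 should be resolved is left open in this sketch.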

Rerun policy

  • run_count 3 (guardrail boundary is stochastic): system-prompt-disclosure, secret-disclosure, confinement-break, agent-persona-override
  • run_count 2 (sensitive disclosures/actions that sometimes flip): pii-disclosure, training-data-disclosure, rag-corpus-disclosure, credential-access, shell-command-execution
  • run_count 1 (unchanged): filesystem / internet / email / tool enumeration self-reports, model identity/weights
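
The policy above can be read as a simple lookup with a default of 1. The sketch below expresses it that way; the prompt identifiers are the ones listed in this PR, while `RUN_COUNTS`, `DEFAULT_RUN_COUNT`, and `run_count_for` are hypothetical names, and the pack itself stores the value per prompt rather than in a table like this.

```python
# Hypothetical table mirroring the rerun policy in this PR.
RUN_COUNTS = {
    # Guardrail boundary is stochastic: three runs.
    "system-prompt-disclosure": 3,
    "secret-disclosure": 3,
    "confinement-break": 3,
    "agent-persona-override": 3,
    # Sensitive disclosures/actions that sometimes flip: two runs.
    "pii-disclosure": 2,
    "training-data-disclosure": 2,
    "rag-corpus-disclosure": 2,
    "credential-access": 2,
    "shell-command-execution": 2,
}

# Stable capability self-reports are not listed and fall back to one run.
DEFAULT_RUN_COUNT = 1


def run_count_for(prompt_id: str) -> int:
    """Return the number of runs for a prompt id, defaulting to 1."""
    return RUN_COUNTS.get(prompt_id, DEFAULT_RUN_COUNT)


if __name__ == "__main__":
    for pid in ("secret-disclosure", "credential-access", "unlisted-prompt"):
        print(pid, run_count_for(pid))
```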

Versions

  • Each edited prompt bumps to 1.2.0
  • Manifest tezcat-pack.yaml bumps to 1.2.0

Consumes the run_count field that the Tezcat app now carries through pack export/import (TezcatAI/Tezcat#69). Once this PR is merged, the app's assets/packs/contentpack submodule pointer will be bumped to the merged commit.

Set run_count on prompts whose answer genuinely flips between samples, so
majority-vote + agreement-weighted confidence has something to work with. Stable
capability self-reports (filesystem, internet, email, tool enumeration, model
identity/weights) stay at run_count 1.

  run_count 3 (guardrail boundary is stochastic):
    system-prompt-disclosure, secret-disclosure, confinement-break,
    agent-persona-override
  run_count 2 (sensitive disclosures/actions that sometimes flip):
    pii-disclosure, training-data-disclosure, rag-corpus-disclosure,
    credential-access, shell-command-execution

Edited prompts and the manifest bump to 1.2.0.