
feat(evaluators): add built-in budget evaluator for per-agent cost tracking #130

@amabito

Summary

Built-in budget evaluator for per-agent cumulative cost tracking.

Motivation

Retries and recursive tool chains pile up fast: a 3-layer retry loop with 4 attempts per layer is 4 × 4 × 4 = 64 API calls from one user request. Current evaluators are stateless, so there's no way to express "deny after $X total spend" without maintaining a separate counter service outside the control plane.
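To show where the 64 comes from, here is a quick sketch. It assumes every call fails and that each of the 3 layers makes up to 4 attempts (1 try plus 3 retries) of everything beneath it. The function name is illustrative only.

```python
def nested_retry_calls(layers: int, attempts_per_layer: int) -> int:
    """Count API calls when every attempt fails and each layer retries the one below it."""
    if layers == 0:
        return 1  # the innermost, actual API call
    # Each attempt at this layer re-runs the entire stack beneath it.
    return attempts_per_layer * nested_retry_calls(layers - 1, attempts_per_layer)


print(nested_retry_calls(layers=3, attempts_per_layer=4))  # 64
```

The count is attempts_per_layer ** layers, so it grows exponentially with nesting depth.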

Current behavior

Controls evaluate step content (regex, list, JSON, SQL) but can't track cumulative state across evaluations. Cost enforcement requires a custom evaluator with external state management.

Expected behavior

A built-in budget evaluator that:

  1. Tracks cumulative cost per agent (in-memory, or via PostgreSQL for persistence)
  2. Config: max_cost_usd, cost_per_1k_input_tokens, cost_per_1k_output_tokens
  3. On post-stage evaluation, reads token counts from step.output or step.context and adds the resulting cost to the agent's running total
  4. Returns matched=True once the ceiling is reached
  5. Pairs with existing actions: deny for a hard stop, or steer with steering_context: {fallback_model: "..."} for degradation
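Below is a minimal in-memory sketch of the behavior listed above. Only the three config field names, step.output/step.context, and matched come from this issue. Step, BudgetResult, BudgetEvaluator, the agent_id field, and the input_tokens/output_tokens keys are hypothetical stand-ins, not the control plane's actual evaluator interface.

```python
import threading
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class BudgetConfig:
    max_cost_usd: float
    cost_per_1k_input_tokens: float
    cost_per_1k_output_tokens: float


@dataclass
class Step:
    # Hypothetical stand-in for the control plane's step object.
    agent_id: str
    output: Optional[dict[str, Any]] = None
    context: Optional[dict[str, Any]] = None


@dataclass
class BudgetResult:
    # Hypothetical result shape; only `matched` comes from the issue text.
    matched: bool
    spent_usd: float


class BudgetEvaluator:
    """Hypothetical post-stage evaluator keeping cumulative spend per agent in memory."""

    def __init__(self, config: BudgetConfig) -> None:
        self.config = config
        self._spent: dict[str, float] = {}
        self._lock = threading.Lock()  # evaluations may run concurrently

    @staticmethod
    def _token_counts(step: Step) -> tuple[int, int]:
        # Assumed key names; check step.output first, then step.context.
        for source in (step.output, step.context):
            if source and "input_tokens" in source and "output_tokens" in source:
                return int(source["input_tokens"]), int(source["output_tokens"])
        return 0, 0  # no usage data: treat the step as free

    def evaluate(self, step: Step) -> BudgetResult:
        input_tokens, output_tokens = self._token_counts(step)
        cfg = self.config
        cost = (
            (input_tokens / 1000) * cfg.cost_per_1k_input_tokens
            + (output_tokens / 1000) * cfg.cost_per_1k_output_tokens
        )
        with self._lock:  # the read-modify-write on the running total must be atomic
            spent = self._spent.get(step.agent_id, 0.0) + cost
            self._spent[step.agent_id] = spent
        return BudgetResult(matched=spent >= cfg.max_cost_usd, spent_usd=spent)
```

Treating a step with no token counts as zero cost is a design choice in this sketch. It can under-count, so a stricter variant might return matched=True or raise instead.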

Proposed solution

It should work as a regular evaluator. confidence could simply be spent / limit, i.e. a 0-1 utilization ratio.
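Here is a sketch of that ratio. It assumes spend is only checked after a step completes, so the total can overshoot the limit, and clamping keeps confidence inside 0-1. The function name is hypothetical.

```python
def budget_confidence(spent_usd: float, max_cost_usd: float) -> float:
    """Utilization ratio spent / limit, clamped to [0, 1] for use as `confidence`."""
    if max_cost_usd <= 0:
        raise ValueError("max_cost_usd must be positive")
    # A post-stage check can land past the ceiling (e.g. $1.30 spent on a $1.00 limit),
    # so cap the ratio at 1.0; a negative input floors at 0.0.
    return min(max(spent_usd / max_cost_usd, 0.0), 1.0)
```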

The stateful part is new (current evaluators are stateless), but the SQL evaluator already caches query analysis across calls, so there is precedent for evaluator-level state.
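For the persistent variant, the accumulation can be pushed into the database as a single upsert, so concurrent evaluator instances don't lose increments. The sketch below uses Python's built-in sqlite3 as a stand-in so it runs without a server. The table and class names are hypothetical. The INSERT ... ON CONFLICT ... DO UPDATE statement is also valid in PostgreSQL (9.5+). There, a driver such as psycopg uses %s placeholders instead of ?, and appending RETURNING spent_usd can replace the follow-up SELECT.

```python
import sqlite3


class SqlBudgetLedger:
    """Hypothetical persistent ledger: one row of cumulative spend per agent."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS budget_spend ("
            " agent_id TEXT PRIMARY KEY,"
            " spent_usd REAL NOT NULL)"
        )

    def add(self, agent_id: str, cost_usd: float) -> float:
        """Add cost_usd to the agent's total and return the new total."""
        # Single-statement upsert: the increment happens inside the database,
        # so concurrent writers don't lose updates to a read-modify-write race.
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO budget_spend (agent_id, spent_usd) VALUES (?, ?) "
                "ON CONFLICT (agent_id) DO UPDATE "
                "SET spent_usd = budget_spend.spent_usd + excluded.spent_usd",
                (agent_id, cost_usd),
            )
            row = self.conn.execute(
                "SELECT spent_usd FROM budget_spend WHERE agent_id = ?", (agent_id,)
            ).fetchone()
        return row[0]
```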

Additional context

LMK if a PR for this makes sense.
