Anytime Verified Agents: Adaptive Compute Allocation for Reliable LLM Reasoning under Budget Constraints
Published in Transactions on Machine Learning Research (TMLR), 2026.
AVA (Anytime Verified Agents) is a framework that dynamically allocates compute across search, sampling, and verification within a user-specified budget. It combines calibrated uncertainty estimation, value-of-information (VoI)-guided search expansion, and selective verification cascades with early exits; a budget-aware controller directs compute toward the steps with the highest uncertainty and the largest estimated marginal reliability gains.

AVA is evaluated on GSM8K, MATH, HotpotQA, and HumanEval with GPT-5 and GPT-4o.
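To give a flavor of the selective-verification idea: a cascade runs cheap checks first and pays for expensive ones only when the cheap signal is ambiguous. Below is a minimal, illustrative sketch — the thresholds, verifier interface, and stage definitions are assumptions for this example, not the actual API of `ava/verification/cascade.py`:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class VerifierStage:
    """One level of the cascade: a check plus its (relative) cost."""
    check: Callable[[str], float]  # returns a confidence in [0, 1]
    cost: float

def cascade_verify(answer: str, stages: List[VerifierStage],
                   accept: float = 0.9, reject: float = 0.1) -> Tuple[bool, float]:
    """Run verifiers cheapest-first; exit early once confidence is decisive.

    Returns (accepted, total_cost_spent). Illustrative only: the real
    cascade may use different thresholds and verifier interfaces.
    """
    spent = 0.0
    confidence = 0.5
    for stage in sorted(stages, key=lambda s: s.cost):
        spent += stage.cost
        confidence = stage.check(answer)
        if confidence >= accept:       # confident enough: stop paying
            return True, spent
        if confidence <= reject:       # confidently wrong: stop paying
            return False, spent
    return confidence >= 0.5, spent    # no early exit: majority-style fallback

# Toy stages: a cheap format check, then a pricier exact-answer check
stages = [
    VerifierStage(check=lambda a: 0.95 if a.strip().isdigit() else 0.05, cost=1.0),
    VerifierStage(check=lambda a: 0.99 if a.strip() == "391" else 0.0, cost=10.0),
]
ok, cost = cascade_verify("391", stages)  # cheap check already accepts, cost 1.0
```

The early exits are what make the cascade budget-friendly: in the toy run above, the expensive second stage is never invoked.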
```shell
git clone https://github.com/llmsresearch/AVA.git
cd AVA
pip install -r requirements.txt
```

AVA supports two model backends:
Ollama (local inference): install Ollama and pull a model:

```shell
ollama pull qwen2.5:7b
```

Azure OpenAI: create a `.env` file:

```
AZURE_MODEL_ENDPOINT=https://your-endpoint.openai.azure.com/
AZURE_MODEL_KEY=your-key
AZURE_MODEL_DEPLOYMENT=gpt-4o
AZURE_MODEL_API_VERSION=2024-12-01-preview
```
```python
from ava.agents.ava_agent import AVAAgent
from ava.core.interfaces import Budget
from ava.models.ollama_model import OllamaModel

model = OllamaModel(model_name="qwen2.5:7b")
agent = AVAAgent(model, target_reliability=0.9)

budget = Budget(token_limit=800, tool_calls_limit=20, verify_calls_limit=20)
result = agent.solve("Question: What is 17 * 23?\nAnswer:", budget)

print(result.text)
print(f"Tokens used: {budget.tokens_used}/{budget.token_limit}")
```

See examples/quick_start.py for a full working example comparing AVA against the self-consistency baseline.
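For comparison, the self-consistency baseline simply draws k samples and majority-votes over the extracted answers. Its core can be sketched in a few lines — `sample_fn` here is a hypothetical stand-in for one sampled model completion, not the actual interface of `ava/baselines/self_consistency.py`:

```python
from collections import Counter
from typing import Callable, List

def self_consistency(sample_fn: Callable[[], str], k: int = 5) -> str:
    """Draw k independent answers and return the most common one.

    The real baseline additionally parses final answers out of full
    reasoning traces before voting.
    """
    answers: List[str] = [sample_fn() for _ in range(k)]
    winner, _count = Counter(answers).most_common(1)[0]
    return winner

# Toy "model": answers "391" most of the time, occasionally "392"
fake_samples = iter(["391", "392", "391", "391", "392"])
result = self_consistency(lambda: next(fake_samples), k=5)  # → "391"
```

Unlike AVA, this baseline spends the same compute on every question regardless of difficulty, which is exactly the inefficiency the adaptive controller targets.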
```
ava/
├── agents/ava_agent.py              # AVA agent (bootstrap, adaptive sampling, verify-and-vote)
├── controllers/ava_controller.py    # Budget-aware controller (Algorithm 1 in the paper)
├── core/interfaces.py               # Protocols: ModelProvider, Budget, SearchStrategy
├── search/adaptive.py               # VoI-guided adaptive tree search
├── uncertainty/
│   ├── estimators.py                # Token entropy, consistency, trajectory estimators
│   └── calibration.py               # Isotonic regression calibrator
├── verification/
│   ├── base.py                      # Verifier protocol, heuristic verifier
│   ├── cascade.py                   # Multi-level verification cascade with early exits
│   └── llm_verifier.py              # Re-solve-and-vote verifier
├── baselines/
│   ├── self_consistency.py          # Majority voting over k samples
│   ├── fixed_depth_search.py        # Tree-of-Thoughts-style BFS
│   ├── confidence_early_exit.py     # Confidence-threshold early stopping
│   └── difficulty_bin.py            # Fixed allocation by difficulty tier
├── benchmarks/
│   ├── gsm8k.py                     # GSM8K loader
│   ├── hotpotqa.py                  # HotpotQA loader
│   ├── humaneval.py                 # HumanEval loader
│   ├── math.py                      # MATH dataset loader
│   └── math_grading.py              # SymPy-based MATH answer equivalence checker
├── models/
│   ├── azure_openai.py              # Azure OpenAI provider
│   ├── ollama_model.py              # Local Ollama provider
│   └── rate_limiter.py              # API rate limiter
└── utils/
    ├── metrics.py                   # Reliability metrics, ECE, Brier score
    └── logging.py                   # Result table formatting

experiments/
├── run_full_evaluation.py           # Main evaluation runner
├── sensitivity_analysis.py          # Controller threshold sensitivity
└── calibration_transfer.py          # Cross-dataset calibration transfer
```
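The metrics module above reports calibration quality via expected calibration error (ECE) and Brier score. As a reference for what ECE measures, here is a minimal binned version — a sketch of the standard definition, not AVA's actual implementation:

```python
from typing import List

def expected_calibration_error(confidences: List[float],
                               correct: List[bool],
                               n_bins: int = 10) -> float:
    """ECE: |accuracy - mean confidence| per equal-width bin,
    weighted by the fraction of predictions falling in that bin."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are (lo, hi]; the first bin also includes exactly 0.0
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == lo)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        acc = sum(correct[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(acc - avg_conf)
    return ece

# Toy example: two confident correct answers, one overconfident wrong one
ece = expected_calibration_error([0.95, 0.95, 0.35], [True, True, False])
```

An isotonic-regression calibrator (as in `ava/uncertainty/calibration.py`) remaps raw confidences so that this gap shrinks on held-out data.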
```shell
# Evaluate AVA and baselines on GSM8K with Ollama
python experiments/run_full_evaluation.py \
    --model ollama --model-name qwen2.5:7b \
    --benchmarks gsm8k \
    --methods ava,self_consistency,fixed_depth,always_verify \
    --budgets 400,600,800,1000

# Evaluate on multiple benchmarks with Azure OpenAI
python experiments/run_full_evaluation.py \
    --model azure \
    --benchmarks gsm8k,hotpotqa,humaneval \
    --methods ava,self_consistency \
    --budgets 400,600,800,1000 \
    --output results/my_run
```

Benchmark datasets should be placed in data/:
```
data/
├── gsm8k/gsm8k_test.jsonl
├── hotpotqa/hotpotqa_dev.json
├── humaneval/HumanEval.jsonl
└── math/test.json
```
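For reference on the GSM8K file format: each line of the jsonl holds a question and a gold solution whose final line is marked `#### <answer>`. A minimal extraction helper might look like the following — this is an illustrative sketch, not the code in `ava/benchmarks/gsm8k.py`:

```python
import re

def extract_gsm8k_answer(solution: str) -> str:
    """Pull the final answer from a GSM8K gold solution string.

    GSM8K solutions end with a line like '#### 72'.
    """
    match = re.search(r"####\s*(.+)", solution)
    if match is None:
        raise ValueError("no '####' answer marker found")
    # Strip thousands separators so '1,234' compares equal to '1234'
    return match.group(1).strip().replace(",", "")

answer = extract_gsm8k_answer("She sells 16 - 9 = 7 eggs.\n#### 7")  # → "7"
```

Normalizing the extracted string (whitespace, commas) matters because reliability metrics compare model answers against these gold values by string equality for GSM8K, whereas MATH uses the SymPy-based equivalence checker.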
```bibtex
@article{patel2026ava,
  title={Anytime Verified Agents: Adaptive Compute Allocation for Reliable LLM Reasoning under Budget Constraints},
  author={Patel, Dipkumar},
  journal={Transactions on Machine Learning Research},
  year={2026},
  url={https://openreview.net/forum?id=JMDCMf7mlF}
}
```

MIT License. See LICENSE for details.