Open
44 commits
bb06dfe
initial commit
KarthikAvinashFI Mar 11, 2026
b083238
add initial use case based cookbooks
KarthikAvinashFI Mar 11, 2026
090882b
added media instructions as warnings
KarthikAvinashFI Mar 12, 2026
bb77ef3
add github and colab badges
KarthikAvinashFI Mar 12, 2026
e2306a8
fix import error
KarthikAvinashFI Mar 12, 2026
6b2e9cf
update instructions
KarthikAvinashFI Mar 12, 2026
ecdc697
update titles and improvements
KarthikAvinashFI Mar 13, 2026
c2afbaf
Merge branch 'astro' into feature/th-3418-use-case-based-cookbooks
KarthikAvinashFI Mar 18, 2026
999af8e
fix use-case cookbooks: QA pass, remove streaming-safety, fix annotat…
KarthikAvinashFI Mar 18, 2026
b6e2a92
fix: commit v2 before label assign, fix optimizer eval metric
KarthikAvinashFI Mar 18, 2026
4e729aa
fix: completeness context param, factual_accuracy input key, v2 commi…
KarthikAvinashFI Mar 18, 2026
f5b0a70
fix: sync nav title for secure-ai-evals-guardrails
KarthikAvinashFI Mar 19, 2026
c17251d
add scores-may-vary notes, replace explore-further with use-case cros…
KarthikAvinashFI Mar 19, 2026
12fac9f
add dataset upload + batch eval with KB step
KarthikAvinashFI Mar 20, 2026
18c2d46
add custom eval creation MEDIA TODO
KarthikAvinashFI Mar 20, 2026
58a5642
replace deprecated factual_accuracy with context_adherence, soften ha…
KarthikAvinashFI Mar 20, 2026
acd2fa9
fix: use security+content_moderation for input rules, keep data_priva…
KarthikAvinashFI Mar 20, 2026
a86eaab
replace MEDIA TODOs with S3 video/image tags for 3 cookbooks
KarthikAvinashFI Mar 20, 2026
19ff507
add simulation videos, fix translation media, fix chat sim SDK flow, …
KarthikAvinashFI Mar 20, 2026
0c332de
add annotation MEDIA TODOs, fix label config details, add Start Annot…
KarthikAvinashFI Mar 20, 2026
0ba0cfa
add optimization + v2 video, complete all simulation media
KarthikAvinashFI Mar 20, 2026
0b76585
add annotation media: labels, queue, annotate+export videos and scree…
KarthikAvinashFI Mar 20, 2026
a06e937
add tracing, alerts, agent compass media for production-quality-monit…
KarthikAvinashFI Mar 20, 2026
98ff7b1
add media for end-to-end-agent-testing, all 13 cookbooks complete
KarthikAvinashFI Mar 21, 2026
74e6318
Merge branch 'astro' into feature/th-3418-use-case-based-cookbooks
KarthikAvinashFI Mar 21, 2026
3f88883
spread simulation videos to their relevant sections
KarthikAvinashFI Mar 21, 2026
fd25bd4
fix: replace remaining factual accuracy references with context adher…
KarthikAvinashFI Mar 21, 2026
5cac353
improve use-case cookbook narratives: elaborate intros, step transiti…
KarthikAvinashFI Mar 23, 2026
14888b0
fix: update text-to-sql-eval with real notebook outputs
KarthikAvinashFI Mar 23, 2026
4f4eae0
fix: rewrite auto-eval-pipeline with real outputs and working scanners
KarthikAvinashFI Mar 23, 2026
36f1e08
fix: update secure-ai-evals-guardrails with new Protect metric names
KarthikAvinashFI Mar 24, 2026
d64cadf
fix: add sample outputs to feedback-loop-eval MDX
KarthikAvinashFI Mar 24, 2026
a2ec94e
fix: update coding-agent-eval MDX narrative with observed results
KarthikAvinashFI Mar 24, 2026
bdd87a5
fix: quality review for coding-agent-eval and translation-eval MDX
KarthikAvinashFI Mar 24, 2026
ef51cf3
fix: hedge AI-judge results in narrative prose
KarthikAvinashFI Mar 24, 2026
37f1427
fix: quality review fixes for coding-agent-eval and translation-eval
KarthikAvinashFI Mar 24, 2026
8bc76e9
fix: update red-teaming, compliance, production-quality MDX
KarthikAvinashFI Mar 24, 2026
d487922
fix: update red-teaming MDX narrative to match notebook
KarthikAvinashFI Mar 24, 2026
e6d4631
update production-quality-monitoring with real Agent Compass results …
KarthikAvinashFI Apr 1, 2026
d1ccdab
add close-the-loop narrative for Agent Compass diagnosis
KarthikAvinashFI Apr 1, 2026
8fe4a10
update full-prompt-lifecycle with real optimization results (0.700 to…
KarthikAvinashFI Apr 1, 2026
bbb0c66
update use-case cookbooks: real results, sync fixes, drop simulation-…
KarthikAvinashFI Apr 2, 2026
23718e3
scrub fake API key from coding-agent-eval test data
KarthikAvinashFI Apr 2, 2026
5c1cc65
merge astro into feature branch, resolve DocsLayout conflict
KarthikAvinashFI Apr 2, 2026
4,961 changes: 4,961 additions & 0 deletions pnpm-lock.yaml

Large diffs are not rendered by default.

41 changes: 41 additions & 0 deletions src/components/TableOfContents.astro
@@ -91,6 +91,47 @@ const feedbackUrl = `https://github.com/${GITHUB_REPO}/issues/new?title=${encode
 <script is:inline>
   (function() {
     function setupToc() {
+      var tocNav = document.querySelector('[data-toc-link]');
+      if (tocNav) tocNav = tocNav.parentElement;
+
+      // Discover step headings from the DOM and inject into TOC
+      if (tocNav) {
+        var stepHeadings = document.querySelectorAll('[data-step-heading]');
+        stepHeadings.forEach(function(stepH) {
+          if (!stepH.id) return;
+          // Check if already in TOC
+          var existing = tocNav.querySelector('[data-heading="' + stepH.id + '"]');
+          if (existing) return;
+
+          // Find the correct position: insert after the nearest preceding h2 TOC link
+          var allHeadings = Array.from(document.querySelectorAll('h2[id], h3[id]'));
+          var stepIndex = -1;
+          for (var i = 0; i < allHeadings.length; i++) {
+            if (allHeadings[i] === stepH) { stepIndex = i; break; }
+          }
+
+          // Find the TOC link to insert after
+          var insertAfter = null;
+          for (var j = stepIndex - 1; j >= 0; j--) {
+            var prevLink = tocNav.querySelector('[data-heading="' + allHeadings[j].id + '"]');
+            if (prevLink) { insertAfter = prevLink; break; }
+          }
+
+          var link = document.createElement('a');
+          link.href = '#' + stepH.id;
+          link.className = 'block text-[13px] leading-relaxed transition-colors duration-200 hover:text-[var(--color-text-secondary)] text-[var(--color-text-muted)] pl-3';
+          link.setAttribute('data-toc-link', '');
+          link.setAttribute('data-heading', stepH.id);
+          link.textContent = stepH.textContent;
+
+          if (insertAfter && insertAfter.nextSibling) {
+            tocNav.insertBefore(link, insertAfter.nextSibling);
+          } else {
+            tocNav.appendChild(link);
+          }
+        });
+      }
+
       var tocLinks = document.querySelectorAll('[data-toc-link]');
       var headings = document.querySelectorAll('h2[id], h3[id]');
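
Review note: the position lookup in this hunk can be reasoned about without a DOM. The sketch below is a hypothetical, DOM-free restatement of the two loops (the function name, parameter names, and sample ids are illustrative and not part of the PR). It assumes heading ids are listed in document order and that the TOC is represented as a set of ids that already have a link.

```typescript
// Hypothetical restatement of the "find the TOC link to insert after" logic.
// headingIds: ids of all h2/h3 headings, in document order.
// tocIds: ids that already have a link in the TOC.
function findInsertAfter(
  headingIds: string[],
  stepId: string,
  tocIds: Set<string>
): string | null {
  const stepIndex = headingIds.indexOf(stepId);
  // Walk backwards from the step heading to the nearest heading that has a TOC link.
  for (let j = stepIndex - 1; j >= 0; j--) {
    if (tocIds.has(headingIds[j])) return headingIds[j];
  }
  // No preceding heading is in the TOC; the script in the hunk then appends at the end.
  return null;
}

const headingIds = ['overview', 'install', 'configure', 'summary'];
const tocIds = new Set(['overview', 'summary']);

const first = findInsertAfter(headingIds, 'install', tocIds); // 'overview'
tocIds.add('install'); // the injected link carries data-heading, so later steps can find it
const second = findInsertAfter(headingIds, 'configure', tocIds); // 'install'
```

Because each injected link is itself findable on the next iteration, consecutive steps keep their document order. When no preceding heading has a link, the lookup yields `null` and the new link lands at the end of the TOC, which may be worth confirming for pages whose first heading is a step.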

8 changes: 7 additions & 1 deletion src/components/docs/Step.astro
@@ -4,10 +4,16 @@ interface Props {
 }

 const { title } = Astro.props;
+
+// Generate a URL-safe slug from the title
+const slug = title
+  .toLowerCase()
+  .replace(/[^a-z0-9]+/g, '-')
+  .replace(/^-|-$/g, '');
 ---

 <div class="step-item">
-  <h4 class="font-semibold text-[var(--color-text-primary)] mb-2">{title}</h4>
+  <h3 id={slug} class="font-semibold text-[var(--color-text-primary)] mb-2 text-base" data-step-heading>{title}</h3>
   <div class="text-sm text-[var(--color-text-secondary)] [&>p]:mb-2 [&>p:last-child]:mb-0">
     <slot />
   </div>
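
Review note: a quick way to sanity-check the slug expression in this hunk is to run it on a few titles. The sketch below wraps the same three string operations in a hypothetical `slugify` helper; the helper name and the sample titles are illustrative and do not appear in the PR.

```typescript
// Hypothetical helper that mirrors the slug expression added to Step.astro.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse every run of non-alphanumerics into one hyphen
    .replace(/^-|-$/g, '');      // trim a leading or trailing hyphen
}

const samples = ['Install the SDK', 'Step 2: Run evals!', 'Install the SDK'];
const slugs = samples.map(slugify);
// slugs is ['install-the-sdk', 'step-2-run-evals', 'install-the-sdk']

// Titles without any ASCII letters or digits reduce to an empty slug.
const empty = slugify('日本語');
// empty is ''
```

Two things follow from this. Steps that share a title on one page would share an `id`, so their anchors would collide. A title with no ASCII alphanumerics yields an empty `id`, and the TOC script in `TableOfContents.astro` would skip such a heading because of its `if (!stepH.id) return;` guard.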
33 changes: 33 additions & 0 deletions src/layouts/DocsLayout.astro
@@ -154,5 +154,38 @@ if (breadcrumbs.length > 0) {
     </div>
   </main>

+  <script is:inline>
+    (function() {
+      function addCopyButtons() {
+        document.querySelectorAll('pre').forEach(function(pre) {
+          if (pre.querySelector('.code-copy-btn')) return;
+          var wrapper = pre.parentElement;
+          if (wrapper && wrapper.classList.contains('code-wrapper')) return;
+
+          pre.style.position = 'relative';
+          var btn = document.createElement('button');
+          btn.className = 'code-copy-btn';
+          btn.setAttribute('aria-label', 'Copy code');
+          btn.innerHTML = '<svg width="16" height="16" fill="none" stroke="currentColor" viewBox="0 0 24 24"><path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M8 16H6a2 2 0 01-2-2V6a2 2 0 012-2h8a2 2 0 012 2v2m-6 12h8a2 2 0 002-2v-8a2 2 0 00-2-2h-8a2 2 0 00-2 2v8a2 2 0 002 2z"/></svg>';
+          btn.addEventListener('click', function() {
+            var code = pre.querySelector('code');
+            var text = code ? code.textContent : pre.textContent;
+            navigator.clipboard.writeText(text || '').then(function() {
+              btn.innerHTML = '<svg width="16" height="16" fill="none" stroke="currentColor" viewBox="0 0 24 24"><path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M5 13l4 4L19 7"/></svg>';
+              btn.style.color = 'var(--color-success)';
+              setTimeout(function() {
+                btn.innerHTML = '<svg width="16" height="16" fill="none" stroke="currentColor" viewBox="0 0 24 24"><path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M8 16H6a2 2 0 01-2-2V6a2 2 0 012-2h8a2 2 0 012 2v2m-6 12h8a2 2 0 002-2v-8a2 2 0 00-2-2h-8a2 2 0 00-2 2v8a2 2 0 002 2z"/></svg>';
+                btn.style.color = '';
+              }, 2000);
+            });
+          });
+          pre.appendChild(btn);
+        });
+      }
+      addCopyButtons();
+      document.addEventListener('astro:page-load', addCopyButtons);
+    })();
+  </script>
+
   <FastNav />
 </BaseLayout>
19 changes: 18 additions & 1 deletion src/lib/navigation.ts
@@ -643,7 +643,6 @@ export const tabNavigation: NavTab[] = [
       { title: 'Evaluate Customer Agent Conversations', href: '/docs/cookbook/quickstart/conversation-eval' },
       { title: 'Dataset SDK: Upload, Evaluate, and Download Results', href: '/docs/cookbook/quickstart/batch-eval' },
       { title: 'Async Evaluations for Large-Scale Testing', href: '/docs/cookbook/quickstart/async-batch-eval' },
-      { title: 'Text-to-SQL Evaluation', href: '/docs/cookbook/quickstart/text-to-sql-eval' },
     ]
   },
   {
@@ -718,6 +717,24 @@ export const tabNavigation: NavTab[] = [
       },
     ]
   },
+  {
+    title: 'Use Cases',
+    icon: 'briefcase',
+    items: [
+      { title: 'Test and Fix Your Chat Agent with Simulated Conversations', href: '/docs/cookbook/use-cases/end-to-end-agent-testing' },
+      { title: 'Detect Domain-Specific Hallucinations in Your Chatbot', href: '/docs/cookbook/use-cases/domain-hallucination-detection' },
+      { title: 'A/B Test Prompt Versions and Ship the Winner', href: '/docs/cookbook/use-cases/full-prompt-lifecycle' },
+      { title: 'Stop Your Financial Chatbot From Leaking PII', href: '/docs/cookbook/use-cases/secure-ai-evals-guardrails' },
+      { title: 'Screen Your AI App for HIPAA and GDPR Violations', href: '/docs/cookbook/use-cases/compliance-hipaa-gdpr' },
+      { title: 'Evaluate LLM Translation for Accuracy and Fluency', href: '/docs/cookbook/use-cases/translation-eval' },
+      { title: 'Set Up Quality Gates for Your Support Bot in 20 Minutes', href: '/docs/cookbook/use-cases/auto-eval-pipeline' },
+      { title: 'Monitor LLM Quality in Production and Catch Regressions', href: '/docs/cookbook/use-cases/production-quality-monitoring' },
+      { title: 'Evaluate Your Code Generation Agent\'s Output Quality', href: '/docs/cookbook/use-cases/coding-agent-eval' },
+      { title: 'Red-Team Your LLM Application Before Attackers Do', href: '/docs/cookbook/use-cases/red-teaming-llm' },
+      { title: 'Improve Your LLM Judge with Human Feedback', href: '/docs/cookbook/use-cases/feedback-loop-eval' },
+      { title: 'Text-to-SQL Eval: Catch Logic Errors Before Production', href: '/docs/cookbook/use-cases/text-to-sql-eval' },
+    ]
+  },
   {
     title: 'Getting Started',
     icon: 'zap',
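
Review note: these two hunks move the Text-to-SQL entry from the quickstart list to the new "Use Cases" group under a different `href`. One check a reviewer might run is that no `href` ends up listed twice across groups. The sketch below is a minimal version of that check; the `NavLink` shape and the `findDuplicateHrefs` name are assumptions for illustration, since the real `NavTab` item types are not shown in this diff.

```typescript
// Hypothetical minimal item shape; the actual types in navigation.ts are not visible here.
interface NavLink {
  title: string;
  href: string;
}

// Return every href that appears more than once in the given items.
function findDuplicateHrefs(items: NavLink[]): string[] {
  const seen = new Set<string>();
  const dupes = new Set<string>();
  for (const item of items) {
    if (seen.has(item.href)) dupes.add(item.href);
    seen.add(item.href);
  }
  return Array.from(dupes);
}

// Entries copied from the hunks above: the retained quickstart items plus the moved one.
const navSample: NavLink[] = [
  { title: 'Evaluate Customer Agent Conversations', href: '/docs/cookbook/quickstart/conversation-eval' },
  { title: 'Async Evaluations for Large-Scale Testing', href: '/docs/cookbook/quickstart/async-batch-eval' },
  { title: 'Text-to-SQL Eval: Catch Logic Errors Before Production', href: '/docs/cookbook/use-cases/text-to-sql-eval' },
];

const duplicates = findDuplicateHrefs(navSample);
// duplicates is [] for this sample
```

This only covers duplicates inside the navigation data. Whether the old `/docs/cookbook/quickstart/text-to-sql-eval` URL still resolves depends on files outside this diff.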
4 changes: 2 additions & 2 deletions src/pages/docs/cookbook/quickstart/conversation-eval.mdx
@@ -296,7 +296,7 @@ You can run all conversational agent metrics at once from the dashboard using th
 1. Go to [app.futureagi.com](https://app.futureagi.com) → **Dataset**
 2. Open a dataset that has a `conversation` column (JSON array of `role`/`content` messages) and a `system_prompt` column containing the agent's system prompt
 3. Click **Evaluate** → **Add Evaluations**
-4. Under **Groups**, select **Conversational agent evaluation** — this adds all 13 metrics in one click
+4. Under **Groups**, select **Conversational agent evaluation** — this adds all 10 metrics in one click
 5. Map the `conversation` column to the conversation input, and the `system_prompt` column to the system prompt input — this is needed for `customer_agent_prompt_conformance`, which checks whether the agent followed its instructions
 6. Click **Add & Run**

@@ -341,7 +341,7 @@ You can now evaluate multi-turn customer support conversations across quality, f
 - Diagnosed specific failure modes: context loss, poor query handling, repetitive loops, and missed escalation
 - Checked whether the agent followed its system prompt with `customer_agent_prompt_conformance`
 - Ran a full scorecard comparing a good conversation against a bad one across 7 metrics
-- Used the Conversational agent evaluation group to run all 13 metrics on a dataset from the dashboard
+- Used the Conversational agent evaluation group to run all 10 metrics on a dataset from the dashboard

 ## Next steps
