diff --git a/src/pages/docs/cookbook/mcp/debug-traces-from-ide.mdx b/src/pages/docs/cookbook/mcp/debug-traces-from-ide.mdx index 3e07374e..cb10b8a5 100644 --- a/src/pages/docs/cookbook/mcp/debug-traces-from-ide.mdx +++ b/src/pages/docs/cookbook/mcp/debug-traces-from-ide.mdx @@ -1,28 +1,84 @@ --- title: "Debug LLM Traces From Your IDE Using Natural Language MCP Queries" -description: "Connect Future AGI's MCP server to Cursor, Claude Code, or VS Code, then debug failing traces, run evals, and annotate spans without leaving your editor." +slug: "debug-traces-from-ide" +description: "Connect FutureAGI's MCP server to Cursor, Claude Code, or VS Code, then debug failing traces, run evals, and annotate spans without leaving your editor." +date: "2026-05-21" +author: "futureagi-engineering" +products: + - "traceAI" + - "fi.evals" +frameworks: + - "MCP" + - "Cursor" + - "Claude Code" + - "VS Code" +difficulty: "beginner" +time-to-complete: "10 minutes" +tags: + - "mcp" + - "ide-integration" + - "trace-debugging" +og-image: "/images/cookbooks/debug-traces-from-ide/og.webp" +canonical: "https://docs.futureagi.com/docs/cookbook/mcp/debug-traces-from-ide" +last-tested-date: "2026-05-21" +last-tested-with: + claude-code: "1.0+" + cursor: "0.45+" + vscode-mcp-extension: "0.5+" +code-repo-url: "https://docs.futureagi.com/docs/quickstart/setup-mcp-server" +page-type: "cookbook" --- - -Add the Future AGI MCP server to your IDE with one config line, sign in via OAuth, and ask your AI assistant questions like *"what went wrong with the last failing trace in my support-bot project?"* It pulls span data, runs error analysis, and proposes fixes, all in the same chat where you're writing code. - - | Time | Difficulty | |------|-----------| | 10 min | Beginner | + +Connect FutureAGI's MCP server to Cursor, Claude Code, or VS Code with one config line. Ask natural-language questions about your traces (find failing spans, run error analysis, annotate issues) without leaving your editor. 
The full debug loop (detect failure, diagnose root cause, patch prompt, verify fix) happens in one chat thread next to your code. + + +## What you'll build + +A working MCP connection from your IDE to your FutureAGI workspace, exercised against a real failing trace. By the end you will have: + +- One config line registered in your IDE (Claude Code, Cursor, VS Code, Claude Desktop, or Windsurf) pointing at `https://api.futureagi.com/mcp`. +- An OAuth token cached after a one-time consent screen, scoped to the trace + eval + annotation tool groups you approved. +- Five working natural-language queries against your real traces (search, span tree, error analysis, project-wide grouping, tagging + annotating). +- A diagnose-and-patch loop in one chat thread, with the assistant reading both the failing trace (via MCP) and your source files (via the editor). +- A `Settings → MCP` revoke path you can use to disable the connection if a teammate leaves or a token leaks. + - FutureAGI account → [app.futureagi.com](https://app.futureagi.com) - A traced project with at least a few traces. If you don't have one, follow [Manual Tracing](/docs/cookbook/quickstart/manual-tracing) to instrument an agent first. - An MCP-capable IDE: Cursor, Claude Code, VS Code (with the MCP extension), Claude Desktop, or Windsurf -## Tutorial +## Why this matters + +Most trace-debugging time isn't spent thinking. It's spent switching tabs. The current loop for a failing agent looks like: open the dashboard, find the trace, drill into spans, copy the offending input, alt-tab to the editor, locate the system prompt, patch it, save, redeploy, alt-tab back to the dashboard, send a new request, wait, find the new trace, and compare. Five context switches per fix attempt is a conservative count; the practical number is higher because each tab switch tends to drop a piece of context the next step needed. + +Standard editor tooling does not close this loop. 
Source-of-truth for the failure (the trace) and source-of-truth for the fix (the code) live in two different applications. MCP lets your IDE's assistant call the same trace-search, error-analysis, and annotation endpoints the dashboard uses, but from inside the chat where you're already writing code. The diagnosis, the patch, and the verify step all happen in one thread. The metric that proves the fix is the cluster you tagged disappearing from `list_error_clusters` on the next run, asked in the same chat that drafted the patch. + +## How MCP works + +MCP (Model Context Protocol) is an open standard for letting AI assistants call external tools. FutureAGI's MCP server at `https://api.futureagi.com/mcp` exposes ~50 trace-debugging tools (search, error analysis, span trees, error clusters, tags, annotations). Connect it to your IDE once, sign in via OAuth, and your assistant can answer questions like *"what went wrong with the last failing trace?"* by calling those tools directly. + +'what's failing?'"] --> F["FAGI MCP returns<br/>traces + error analysis"] + F --> P["IDE drafts a patch<br/>using trace + your source"] + P --> V["Re-ask in chat<br/>'did the cluster drop?'"] + V -->|"no"| F + V -->|"yes"| Done["Fix verified"]:::done + + classDef done fill:#064e3b,color:#fff,stroke:#10b981 +`} /> + +The four steps below register the server, complete the OAuth handshake, run debugging queries against your real traces, then iterate on the fix in the same chat thread. - + -The MCP server lives at `https://api.futureagi.com/mcp` and uses OAuth. No API keys to copy around. +The connect step is one config line per IDE. The URL (`https://api.futureagi.com/mcp`) is the same everywhere; only the file path and JSON shape differ. Once registered, your IDE knows where to route MCP tool calls but doesn't have permission yet. The OAuth handshake in step 2 is what unlocks access to your workspace. @@ -32,10 +88,24 @@ The MCP server lives at `https://api.futureagi.com/mcp` and uses OAuth. No API k claude mcp add futureagi --transport http https://api.futureagi.com/mcp ``` -After running, `claude mcp list` should show `futureagi` with `! Needs authentication`. That's expected; the OAuth handshake happens in the next step. +Verify it registered: + +```bash +claude mcp list +``` + +Expected output: + +```text +futureagi: https://api.futureagi.com/mcp ! Needs authentication +``` + +The `! Needs authentication` status is expected; the OAuth handshake happens in the next step. Terminal showing claude mcp add futureagi succeeding and claude mcp list confirming the server with Needs authentication status +Verify: the list shows `futureagi` with the correct URL and `Needs authentication` status. If it shows `Not found`, re-run the `add` command. + @@ -112,32 +182,40 @@ Restart your IDE after editing the config. -The first MCP tool call opens a browser to the consent screen. Review the 14 permission groups, click **Authorize**. Token cached, done. +OAuth instead of API keys means the connection is tied to your user account, scoped to the permission groups you approve, and revocable from the dashboard at any time. 
No shared keys to rotate, no `.env` to manage. The first MCP tool call your assistant attempts triggers the consent screen automatically. -Future AGI MCP OAuth consent screen showing Claude Code requesting access with all 14 permission groups (Context & Navigation, Evaluations, Datasets, Annotations, Prompt Optimization, Observability / Traces, Error Feed, etc.) and an Authorize button +Review the permission groups, then click **Authorize**. The token is cached after that. + +FutureAGI MCP OAuth consent screen showing Claude Code requesting access with all 14 permission groups (Context & Navigation, Evaluations, Datasets, Annotations, Prompt Optimization, Observability / Traces, Error Feed, etc.) and an Authorize button + +Verify: the consent screen lists your IDE name and the permission groups (Traces, Evaluations, Datasets, Annotations, etc.). After you click **Authorize**, the token is cached, so you won't see this screen again unless you revoke access in **Settings → MCP**. -If the browser doesn't open, ask your assistant *"list my Future AGI projects"* to trigger the handshake. You can revoke access anytime in **Settings → MCP** in the dashboard. +If the browser doesn't open automatically, ask your assistant *"list my FutureAGI projects"*; that triggers the OAuth handshake. -Open your IDE's chat panel and ask. The MCP server exposes ~50 trace-related tools (search, error analysis, span trees, error clusters), so phrase questions naturally. Your assistant picks the right tools. +You don't need to memorize tool names. The MCP server publishes ~50 trace-debugging tools (search, error analysis, span trees, error clusters, tags, annotations), each with a description. Your assistant reads your question against those descriptions and picks the right tool and arguments, chaining follow-ups when needed. The five example questions below are each mapped to the tool they call, so you can see the pattern. **Find failing traces:** -> List the most recent traces in my project that have errors. 
+> List the most recent traces in my projects that have errors. Calls `search_traces` with `has_error=True`. If no traces have raw error flags, your assistant pivots to `list_error_clusters` and surfaces the AI-detected error categories across your projects. A richer signal than HTTP errors alone. -Claude Code terminal showing the futureagi MCP response: a table of recent error clusters across projects with Last Seen, Project, Error category, and Impact columns +Claude Code terminal returning the futureagi MCP response for 'review recent error traces across projects': a table with When, Project, Error Category, and Events columns covering litellm_app, adk-weather-agent, traceai-vdb-smoke-all, falcon-ai-end-to-end, and Nexa Fintech Support across 6 projects over 2026-05, with a closing note that all clusters are Medium impact + +The response should show a table with columns like When, Project, Error Category, and Events. If it returns empty results, your project may not have any errored traces yet; send a few requests that trigger tool failures or hallucinations first. **Inspect a specific trace:** > Show me the span tree for the second trace from the previous list. -Calls `get_span_tree`. Returns the parent span plus nested LLM/tool calls with timing and inputs. +Calls `get_span_tree`. Returns the parent span plus nested LLM/tool calls with timing and inputs. The follow-up *"second trace from the previous list"* works because the assistant carries chat context across turns. + +Terminal showing the futureagi MCP response for the second trace in the previous list: a span tree for trace 0d6da6be-622c-4f97-bb44-dcd0e8c5f0b6 (adk-weather-agent) with three nested spans (invocation 491ms, agent_run 489ms, call_llm 422ms using gemini-2.5-flash), each marked with ERROR, plus the assistant's note that all three spans errored and the failure originates in the call_llm span **Diagnose what went wrong:** @@ -145,26 +223,32 @@ Calls `get_span_tree`. 
Returns the parent span plus nested LLM/tool calls with t Calls `get_trace_error_analysis`. Returns categorized findings (hallucination, wrong intent, tool misuse) with severity and a quality scorecard. +Terminal showing the futureagi MCP tool call analyze_project_traces kicking off in the background, with a JSON-shaped result confirming 'Trace Analysis Triggered' for the failing trace, plus an analysis_id and a follow-up status note saying the analysis kicked off and the assistant will pull the error cluster detail once it's done + **Look across the project:** > Analyze all traces in my project from the last hour and group failures by category. -Calls `analyze_project_traces` and `list_error_clusters`. Returns a histogram with the dominant error types. +Calls `analyze_project_traces` and `list_error_clusters`. Returns a histogram with the dominant error types so you can prioritize which one to fix first. + +Terminal showing the futureagi MCP response: 23 traces queued for analysis across 6 projects with a per-project breakdown (customer_support_delivery, instructor_app, mistralai_support_triage, adk-weather-agent, litellm_app, agno-agent) and a note that the longest batch (adk-weather-agent, 9 traces) will take about 5 to 9 minutes; the assistant offers to ping the user once analysis completes and then pull list_error_clusters **Score or annotate from chat:** > Add the tag `needs-policy-grounding` to the failing traces, and annotate them with "fabricated specifics, needs RAG over policy docs." -Calls `add_trace_tags` + `create_trace_annotation` per matching trace. The annotations show up in the dashboard immediately. +Calls `add_trace_tags` + `create_trace_annotation` per matching trace. The annotations show up in the dashboard immediately so the rest of your team sees what you flagged. 
+ +Terminal showing the futureagi MCP submit_annotation tool call completing: confirmation that the trace was tagged needs-policy-grounding and submitted as an Output Quality / MEDIUM finding (code E001, analysis 4f098578...) in the Nexa Fintech Support project, with the description, recommendation 'Implement RAG over the policy doc corpus...', and the Spanish policy-claim line quoted as evidence, plus a note that the annotation is visible in the dashboard sidebar -The same chat that read the trace can now read your code. Ask: +This step is the payoff. Diagnosing a trace in the dashboard then coming back to your editor to patch the prompt is two context switches. With MCP, your assistant has both the trace findings (from the MCP server) and your source files (from your editor) in one thread, so it can write the fix grounded in the actual failure. -> Based on the error analysis, draft a system-prompt patch that refuses to answer policy questions when no grounding tool is available. Show it as a diff against [agent.py](agent.py). +> Based on the error analysis, draft a system-prompt patch that refuses to answer policy questions when no grounding tool is available. Show it as a diff against `agent.py`. -Your assistant has both the trace findings (from MCP) and the file (from your editor). It produces a paste-ready diff. Apply it, re-run a few queries through the agent, and ask the next turn: +Apply the diff, re-run a few queries through the agent, then verify in the same chat: > Re-check the latest traces in my project and confirm the fabrication category dropped. @@ -173,20 +257,34 @@ That's the full loop. Failure detection, diagnosis, fix, verification, all drive +## Troubleshooting + +| Symptom | Likely cause | Fix | Verify | +|---|---|---|---| +| `claude mcp list` doesn't show `futureagi` | Config not saved or IDE not restarted | Re-run the `add` command (Claude Code) or re-check the JSON path (Cursor/VS Code). 
Restart the IDE | `claude mcp list` now shows `futureagi: https://api.futureagi.com/mcp` | +| OAuth browser window never opens | IDE can't launch the default browser, or a firewall blocks the callback URL | Ask the assistant *"list my FutureAGI projects"* to manually trigger the handshake. Check browser pop-up settings | Consent screen renders; after Authorize, the next MCP query returns real data | +| Assistant says "I don't have access to FutureAGI tools" | OAuth expired or was revoked | Re-authorize: remove and re-add the server (`claude mcp remove futureagi && claude mcp add ...`), then trigger OAuth again | A test query (*"list my projects"*) returns project names | +| Queries return empty results | Project has no traces, or the project name doesn't match | Send a few test requests through a traced agent first. Confirm the project name matches what you see in the dashboard | A `search_traces` query returns at least 1 row matching your filter | +| Assistant picks the wrong tool or returns unrelated data | Ambiguous question | Be more specific: include the project name, time range, or error type. Example: *"errors in support-bot from the last hour"* instead of *"what's failing?"* | Re-asked query returns the tool call you expected (visible in the assistant's tool-use log) | + +## What you built + -You connected Future AGI's MCP server to your IDE, asked natural-language questions about your trace data, and ran an end-to-end debug loop without copying trace IDs or switching to the dashboard. +FutureAGI's MCP server connected to your IDE, natural-language trace debugging from your editor chat, and a full detect-diagnose-fix-verify loop without leaving the IDE. 
-## Explore further - - - - Full setup reference, OAuth scopes, and supported tool groups - - - The same workflow inside the FutureAGI dashboard sidebar - - - Custom spans, metadata tagging, and prompt template tracking - - +- Trace search, error analysis, span trees, and annotations accessible via natural language in your editor +- No trace IDs to copy; the assistant carries context across turns (*"the second trace from the previous list"* works) +- Diagnosis and fix in one thread: the assistant sees both trace data (via MCP) and your source files (via the editor) +- Per-user OAuth instead of shared API keys, scoped and revocable from the dashboard + +## Next steps + +Once the MCP loop works, these moves make it part of your real debug routine: + +1. **Save the five queries above as IDE snippets or slash commands.** Most editors let you bind a stored prompt to a shortcut. Wire *"run error analysis on the last failing trace in ``"* to one keystroke so it's the first thing you reach for when an alert fires. +2. **Add a `triaged-by-mcp` annotation label** in **Settings → Annotation Labels** and have the assistant tag every trace it diagnoses with that label. You'll have an audit trail of which failures the MCP loop has touched versus which ones still need a human. +3. **Revoke tokens when a teammate leaves.** OAuth tokens are per-user, and the dashboard lets you scope which permission groups each consent grants. Set a quarterly review of **Settings → MCP** so tokens for departed teammates don't linger. +4. **Use Falcon AI for the same flow inside the dashboard.** When you can't reach an editor (mobile, browser-only environments), [End-to-End with Falcon AI](/docs/cookbook/falcon-ai/end-to-end) runs the same detect-diagnose-fix loop from the Falcon sidebar instead. + +Reference: see [Setup MCP Server](/docs/quickstart/setup-mcp-server) for the full list of OAuth scopes, supported tool groups, and one-click install links per IDE.
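
The Claude Code verify step and the first troubleshooting row both come down to reading one line of `claude mcp list` output. A minimal sketch of that check as a shell helper is below. It assumes the listing uses the `name: url ! Needs authentication` shape shown in this cookbook, which may differ between CLI versions. The function name `mcp_status` and the three labels it prints (`missing`, `needs-auth`, `registered`) are hypothetical and not part of any FutureAGI or Claude Code tooling.

```bash
#!/usr/bin/env bash
# Sketch only: classify one server entry in `claude mcp list` output.
# Assumed line shape (copied from this cookbook, may vary by CLI version):
#   futureagi: https://api.futureagi.com/mcp ! Needs authentication
# `mcp_status` and its labels are hypothetical names for illustration.
mcp_status() {
  local name="$1" line
  # Read the listing from stdin and keep the first line starting with "<name>:".
  line="$(grep -m1 "^${name}:" || true)"
  if [ -z "$line" ]; then
    # No entry for this server: the add command likely needs re-running.
    echo "missing"
  elif printf '%s\n' "$line" | grep -q "Needs authentication"; then
    # Entry exists but the OAuth handshake has not completed yet.
    echo "needs-auth"
  else
    # Entry exists and no longer reports the authentication warning.
    echo "registered"
  fi
}
```

One way to use it is `claude mcp list | mcp_status futureagi`: `needs-auth` is the expected result right after step 1, and `missing` corresponds to the first troubleshooting row.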