diff --git a/README.md b/README.md index 1b5f3de..6f325ea 100644 --- a/README.md +++ b/README.md @@ -13,6 +13,7 @@ ## Table of Contents +- [What's new (2026-06-19) — Agent Observability (GenAI OpenTelemetry Spans)](#whats-new-2026-06-19--agent-observability-genai-opentelemetry-spans) - [What's new (2026-06-19) — Compliance Control Report (SOC2 / ISO 27001)](#whats-new-2026-06-19--compliance-control-report-soc2--iso-27001) - [What's new (2026-06-19) — Agent Trajectory Evaluation](#whats-new-2026-06-19--agent-trajectory-evaluation) - [What's new (2026-06-19) — Approval Testing (Golden-Master Baselines)](#whats-new-2026-06-19--approval-testing-golden-master-baselines) @@ -90,6 +91,12 @@ --- +## What's new (2026-06-19) — Agent Observability (GenAI OpenTelemetry Spans) + +OTel GenAI-convention spans for LLM runs. Full reference: [`docs/source/Eng/doc/new_features/v38_features_doc.rst`](docs/source/Eng/doc/new_features/v38_features_doc.rst). + +- **`AgentTrace`** (`AC_trace_record` / `AC_trace_summary` / `AC_trace_export` / `AC_trace_reset`, `ac_*`): records spans whose attributes follow the OpenTelemetry **GenAI semantic conventions** (`gen_ai.operation.name`, `gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.input_tokens`/`output_tokens`, `gen_ai.tool.name`) and the `"{operation} {model}"` span name. `to_otel()` returns OTLP-friendly span dicts for an exporter; `summary()` rolls up token usage and latency; an `operation()` context manager times live blocks and marks errors. Pure-stdlib (no `opentelemetry` dep), injectable clock; pairs with trajectory evaluation (record here, score there). + ## What's new (2026-06-19) — Compliance Control Report (SOC2 / ISO 27001) Map governance evidence to named controls. Full reference: [`docs/source/Eng/doc/new_features/v37_features_doc.rst`](docs/source/Eng/doc/new_features/v37_features_doc.rst). 
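For readers new to the GenAI conventions, here is a minimal standalone sketch of the mapping the README bullet describes. `genai_span` is a hypothetical helper, not part of `je_auto_control`: it builds the `"{operation} {model}"` span name (falling back to the bare operation when no model is given) and sets only the `gen_ai.*` attributes that were actually supplied, which matches what the `_genai_attributes` helper in this diff does.

```python
from typing import Any, Dict, Optional, Tuple


def genai_span(operation: str, model: Optional[str] = None,
               system: Optional[str] = None,
               input_tokens: Optional[int] = None,
               output_tokens: Optional[int] = None,
               tool_name: Optional[str] = None) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: return (span_name, gen_ai.* attributes)."""
    optional = {
        "gen_ai.system": system,
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        "gen_ai.tool.name": tool_name,
    }
    # The operation name is always present; the rest only when supplied.
    attributes: Dict[str, Any] = {"gen_ai.operation.name": operation}
    attributes.update({k: v for k, v in optional.items() if v is not None})
    # Span name is "{operation} {model}", or just the operation without a model.
    name = f"{operation} {model}" if model else operation
    return name, attributes


name, attrs = genai_span("chat", model="claude-opus-4-8", system="anthropic",
                         input_tokens=1200, output_tokens=180)
print(name)  # chat claude-opus-4-8
print(attrs["gen_ai.usage.input_tokens"])  # 1200
```

Skipping unset keys keeps exported spans free of `None` attribute values, which OTel attribute types do not allow.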
diff --git a/README/README_zh-CN.md b/README/README_zh-CN.md index f393753..df1578b 100644 --- a/README/README_zh-CN.md +++ b/README/README_zh-CN.md @@ -12,6 +12,7 @@ ## 目录 +- [本次更新 (2026-06-19) — Agent 可观测性(GenAI OpenTelemetry Spans)](#本次更新-2026-06-19--agent-可观测性genai-opentelemetry-spans) - [本次更新 (2026-06-19) — 合规控制报告(SOC2 / ISO 27001)](#本次更新-2026-06-19--合规控制报告soc2--iso-27001) - [本次更新 (2026-06-19) — Agent 轨迹评估](#本次更新-2026-06-19--agent-轨迹评估) - [本次更新 (2026-06-19) — 核准式测试(Golden-Master 基准)](#本次更新-2026-06-19--核准式测试golden-master-基准) @@ -89,6 +90,12 @@ --- +## 本次更新 (2026-06-19) — Agent 可观测性(GenAI OpenTelemetry Spans) + +LLM 运行的 OTel GenAI 惯例 spans。完整参考:[`docs/source/Zh/doc/new_features/v38_features_doc.rst`](../docs/source/Zh/doc/new_features/v38_features_doc.rst)。 + +- **`AgentTrace`**(`AC_trace_record` / `AC_trace_summary` / `AC_trace_export` / `AC_trace_reset`、`ac_*`):记录的 span 其属性遵循 OpenTelemetry **GenAI 语意惯例**(`gen_ai.operation.name`、`gen_ai.system`、`gen_ai.request.model`、`gen_ai.usage.input_tokens`/`output_tokens`、`gen_ai.tool.name`)与 `"{operation} {model}"` span 名称。`to_otel()` 可送入 OTLP exporter;`summary()` 汇整 token 成本与延迟;`operation()` 上下文管理器为实时区块计时并标记错误。纯标准库(无 `opentelemetry` 依赖)、可注入时钟;与轨迹评估互补(在此记录、在那里评分)。 + ## 本次更新 (2026-06-19) — 合规控制报告(SOC2 / ISO 27001) 将治理证据映射到具名控制项。完整参考:[`docs/source/Zh/doc/new_features/v37_features_doc.rst`](../docs/source/Zh/doc/new_features/v37_features_doc.rst)。 diff --git a/README/README_zh-TW.md b/README/README_zh-TW.md index 6c05c39..b2dd44e 100644 --- a/README/README_zh-TW.md +++ b/README/README_zh-TW.md @@ -12,6 +12,7 @@ ## 目錄 +- [本次更新 (2026-06-19) — Agent 可觀測性(GenAI OpenTelemetry Spans)](#本次更新-2026-06-19--agent-可觀測性genai-opentelemetry-spans) - [本次更新 (2026-06-19) — 合規控制報告(SOC2 / ISO 27001)](#本次更新-2026-06-19--合規控制報告soc2--iso-27001) - [本次更新 (2026-06-19) — Agent 軌跡評估](#本次更新-2026-06-19--agent-軌跡評估) - [本次更新 (2026-06-19) — 核准式測試(Golden-Master 基準)](#本次更新-2026-06-19--核准式測試golden-master-基準) @@ -89,6 +90,12 @@ --- +## 本次更新 (2026-06-19) — Agent 
可觀測性(GenAI OpenTelemetry Spans) + +LLM 執行的 OTel GenAI 慣例 spans。完整參考:[`docs/source/Zh/doc/new_features/v38_features_doc.rst`](../docs/source/Zh/doc/new_features/v38_features_doc.rst)。 + +- **`AgentTrace`**(`AC_trace_record` / `AC_trace_summary` / `AC_trace_export` / `AC_trace_reset`、`ac_*`):記錄的 span 其屬性遵循 OpenTelemetry **GenAI 語意慣例**(`gen_ai.operation.name`、`gen_ai.system`、`gen_ai.request.model`、`gen_ai.usage.input_tokens`/`output_tokens`、`gen_ai.tool.name`)與 `"{operation} {model}"` span 名稱。`to_otel()` 可送入 OTLP exporter;`summary()` 彙整 token 成本與延遲;`operation()` 情境管理器為即時區塊計時並標記錯誤。純標準函式庫(無 `opentelemetry` 相依)、可注入時鐘;與軌跡評估互補(在此記錄、在那裡評分)。 + ## 本次更新 (2026-06-19) — 合規控制報告(SOC2 / ISO 27001) 將治理證據對應到具名控制項。完整參考:[`docs/source/Zh/doc/new_features/v37_features_doc.rst`](../docs/source/Zh/doc/new_features/v37_features_doc.rst)。 diff --git a/docs/source/Eng/doc/new_features/v38_features_doc.rst b/docs/source/Eng/doc/new_features/v38_features_doc.rst new file mode 100644 index 0000000..139cbf0 --- /dev/null +++ b/docs/source/Eng/doc/new_features/v38_features_doc.rst @@ -0,0 +1,60 @@ +Agent Observability (GenAI OpenTelemetry Spans) +=============================================== + +When automation drives an LLM agent, you want the same observability an +OpenTelemetry backend gives a service: per-operation spans carrying token usage, +model, and status. ``AgentTrace`` records spans whose attributes follow the +OpenTelemetry **GenAI semantic conventions** — ``gen_ai.operation.name``, +``gen_ai.system``, ``gen_ai.request.model``, ``gen_ai.usage.input_tokens`` / +``gen_ai.usage.output_tokens``, ``gen_ai.tool.name`` — and the convention span +name ``"{operation} {model}"``. :meth:`AgentTrace.to_otel` returns OTLP-friendly +span dicts to hand to an exporter, while :meth:`AgentTrace.summary` rolls up token +usage and latency for a run. + +It pairs with :doc:`trajectory evaluation ` — record the run +here, score it there. 
Pure standard library (no ``opentelemetry`` dependency); +the clock is injectable so durations are deterministically testable. Imports no +``PySide6``. + +Headless API +------------ + +.. code-block:: python + + from je_auto_control import AgentTrace + + trace = AgentTrace() + # one-shot record of a completed call: + trace.record("chat", model="claude-opus-4-8", system="anthropic", + input_tokens=1200, output_tokens=180, duration_s=0.9) + + # or time a live block; set token counts on the yielded dict: + with trace.operation("tool", tool_name="search") as fields: + result = run_tool() # placeholder for your own tool call + fields["output_tokens"] = 42 # an exception in this block marks the span error + + print(trace.summary()) # {span_count, error_count, input_tokens, ...} + spans = trace.to_otel() # OTLP-friendly span dicts for your exporter + +``summary`` aggregates ``span_count``, ``error_count``, ``input_tokens``, +``output_tokens``, and total ``duration_s``. ``to_otel`` returns each span as +``{name, kind, attributes, duration_s, status:{code}}`` with ``kind`` set to ``CLIENT`` and an OTel status code of ``OK`` or ``ERROR``. + +Executor commands +----------------- + +A module-level default trace backs the executor/MCP surfaces so a flow can build +a trace across steps: + +================================ =================================================== +Command Effect +================================ =================================================== +``AC_trace_record`` Record a GenAI span (operation/model/tokens/…). +``AC_trace_summary`` Roll up the default trace. +``AC_trace_export`` Export the default trace as OTLP spans. +``AC_trace_reset`` Clear the default trace. +================================ =================================================== + +The same operations are exposed as MCP tools (``ac_trace_record`` / +``ac_trace_summary`` / ``ac_trace_export`` / ``ac_trace_reset``) and as Script +Builder commands under **Agent**. 
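`to_otel()` stops short of the OTLP wire format. Its dicts carry `duration_s` but no trace or span IDs and no timestamps. The following is a minimal sketch of wrapping them in an OTLP/JSON trace export request, under two stated assumptions: the spans ran back to back from a caller-supplied start time, and they share one trace ID. `to_otlp_json`, the `service.name` value, and the `agent_trace` scope name are illustrative choices, not part of this PR. The enum integers (`SPAN_KIND_CLIENT = 3`, `STATUS_CODE_OK = 1`, `STATUS_CODE_ERROR = 2`), hex-encoded IDs, and string-encoded 64-bit integers follow the OTLP protobuf JSON encoding.

```python
import json
import secrets
from typing import Any, Dict, List

# Enum values from the OTLP protobuf definitions (JSON encodes enums as ints).
_SPAN_KIND = {"INTERNAL": 1, "SERVER": 2, "CLIENT": 3}
_STATUS_CODE = {"UNSET": 0, "OK": 1, "ERROR": 2}


def _any_value(value: Any) -> Dict[str, Any]:
    """Wrap a Python scalar as an OTLP AnyValue."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"boolValue": value}
    if isinstance(value, int):
        return {"intValue": str(value)}  # int64 -> decimal string in JSON
    if isinstance(value, float):
        return {"doubleValue": value}
    return {"stringValue": str(value)}


def to_otlp_json(spans: List[Dict[str, Any]], start_unix_nano: int,
                 service_name: str = "autocontrol") -> Dict[str, Any]:
    """Hypothetical: wrap to_otel()-shaped dicts in an OTLP/JSON request.

    Assumes the spans ran back to back from ``start_unix_nano`` and share a
    single trace id, because the dicts only carry ``duration_s``.
    """
    trace_id = secrets.token_hex(16)  # 16 bytes -> 32 hex chars
    cursor = start_unix_nano
    otlp_spans = []
    for span in spans:
        end = cursor + int(span["duration_s"] * 1e9)
        otlp_spans.append({
            "traceId": trace_id,
            "spanId": secrets.token_hex(8),  # 8 bytes -> 16 hex chars
            "name": span["name"],
            "kind": _SPAN_KIND.get(span["kind"], 0),  # 0 = UNSPECIFIED
            "startTimeUnixNano": str(cursor),
            "endTimeUnixNano": str(end),
            "attributes": [{"key": k, "value": _any_value(v)}
                           for k, v in span["attributes"].items()],
            "status": {"code": _STATUS_CODE[span["status"]["code"]]},
        })
        cursor = end
    return {"resourceSpans": [{
        "resource": {"attributes": [
            {"key": "service.name", "value": {"stringValue": service_name}}]},
        "scopeSpans": [{"scope": {"name": "agent_trace"},
                        "spans": otlp_spans}],
    }]}


sample = [{"name": "chat m", "kind": "CLIENT",
           "attributes": {"gen_ai.operation.name": "chat",
                          "gen_ai.usage.input_tokens": 5},
           "duration_s": 0.5, "status": {"code": "OK"}}]
request = to_otlp_json(sample, start_unix_nano=1_700_000_000_000_000_000)
print(json.dumps(request, indent=2))
```

The result has the body shape an OTLP/HTTP collector endpoint (conventionally `/v1/traces`) accepts as `application/json`. Check it against your collector before relying on it, because real start times would make a more faithful trace than the back-to-back assumption.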
diff --git a/docs/source/Eng/eng_index.rst b/docs/source/Eng/eng_index.rst index 45ce6bb..34cf42d 100644 --- a/docs/source/Eng/eng_index.rst +++ b/docs/source/Eng/eng_index.rst @@ -60,6 +60,7 @@ Comprehensive guides for all AutoControl features. doc/new_features/v35_features_doc doc/new_features/v36_features_doc doc/new_features/v37_features_doc + doc/new_features/v38_features_doc doc/ocr_backends/ocr_backends_doc doc/observability/observability_doc doc/operations_layer/operations_layer_doc diff --git a/docs/source/Zh/doc/new_features/v38_features_doc.rst b/docs/source/Zh/doc/new_features/v38_features_doc.rst new file mode 100644 index 0000000..04d7125 --- /dev/null +++ b/docs/source/Zh/doc/new_features/v38_features_doc.rst @@ -0,0 +1,56 @@ +Agent 可觀測性(GenAI OpenTelemetry Spans) +========================================== + +當自動化驅動 LLM agent 時,你會想要 OpenTelemetry 後端給服務的那種可觀測性:每個操作 +一個 span,帶有 token 用量、模型與狀態。``AgentTrace`` 記錄的 span 其屬性遵循 +OpenTelemetry **GenAI 語意慣例** —— ``gen_ai.operation.name``、``gen_ai.system``、 +``gen_ai.request.model``、``gen_ai.usage.input_tokens`` / +``gen_ai.usage.output_tokens``、``gen_ai.tool.name`` —— 以及慣例 span 名稱 +``"{operation} {model}"``。:meth:`AgentTrace.to_otel` 的輸出可直接送入 OTLP +exporter,而 :meth:`AgentTrace.summary` 則彙整一次執行的成本與延遲。 + +它與 :doc:`軌跡評估 ` 互補 —— 在此記錄執行,在那裡評分。純標準函式 +庫(無 ``opentelemetry`` 相依);時鐘可注入,因此持續時間可被確定性地測試。不匯入 +``PySide6``。 + +無頭 API +-------- + +.. 
code-block:: python + + from je_auto_control import AgentTrace + + trace = AgentTrace() + # 一次性記錄已完成的呼叫: + trace.record("chat", model="claude-opus-4-8", system="anthropic", + input_tokens=1200, output_tokens=180, duration_s=0.9) + + # 或為即時區塊計時;在 yield 出的 dict 上設定 token 數: + with trace.operation("tool", tool_name="search") as fields: + result = run_tool() + fields["output_tokens"] = 42 # 區塊內若拋出例外則標記為 error + + print(trace.summary()) # {span_count, error_count, input_tokens, ...} + exporter.export(trace.to_otel()) # OTLP 友善的 span dict + +``summary`` 彙整 ``span_count``、``error_count``、``input_tokens``、 +``output_tokens`` 與總 ``duration_s``。``to_otel`` 將每個 span 回傳為 +``{name, kind, attributes, duration_s, status:{code}}``,帶有 OTel 狀態碼。 + +執行器指令 +---------- + +模組層級的預設 trace 支撐 executor/MCP 介面,讓流程可跨步驟建立一條 trace: + +================================ =================================================== +指令 效果 +================================ =================================================== +``AC_trace_record`` 記錄一個 GenAI span(operation/model/tokens/…)。 +``AC_trace_summary`` 彙整預設 trace。 +``AC_trace_export`` 將預設 trace 匯出為 OTLP spans。 +``AC_trace_reset`` 清除預設 trace。 +================================ =================================================== + +相同操作亦提供為 MCP 工具(``ac_trace_record`` / ``ac_trace_summary`` / +``ac_trace_export`` / ``ac_trace_reset``),以及 Script Builder 中 **Agent** 分類下的 +指令。 diff --git a/docs/source/Zh/zh_index.rst b/docs/source/Zh/zh_index.rst index 3f269e7..9e8854c 100644 --- a/docs/source/Zh/zh_index.rst +++ b/docs/source/Zh/zh_index.rst @@ -60,6 +60,7 @@ AutoControl 所有功能的完整使用指南。 doc/new_features/v35_features_doc doc/new_features/v36_features_doc doc/new_features/v37_features_doc + doc/new_features/v38_features_doc doc/ocr_backends/ocr_backends_doc doc/observability/observability_doc doc/operations_layer/operations_layer_doc diff --git a/je_auto_control/__init__.py b/je_auto_control/__init__.py index 5725b8d..92b543a 100644 --- 
a/je_auto_control/__init__.py +++ b/je_auto_control/__init__.py @@ -222,6 +222,10 @@ from je_auto_control.utils.compliance import ( build_compliance_report, render_compliance_html, write_compliance_report, ) +# Agent observability: OpenTelemetry GenAI-convention spans +from je_auto_control.utils.agent_trace import ( + AgentTrace, default_trace, reset_trace, +) # Background popup/interrupt watchdog (unattended automation) from je_auto_control.utils.watchdog import ( PopupWatchdog, WatchdogRule, default_popup_watchdog, @@ -665,6 +669,7 @@ def start_autocontrol_gui(*args, **kwargs): "evaluate_trajectory", "build_compliance_report", "render_compliance_html", "write_compliance_report", + "AgentTrace", "default_trace", "reset_trace", # MCP server "AuditLogger", "HttpMCPServer", "MCPContent", "MCPPrompt", "MCPPromptArgument", "MCPResource", "MCPServer", "MCPTool", diff --git a/je_auto_control/gui/script_builder/command_schema.py b/je_auto_control/gui/script_builder/command_schema.py index cae1221..fd2eef9 100644 --- a/je_auto_control/gui/script_builder/command_schema.py +++ b/je_auto_control/gui/script_builder/command_schema.py @@ -824,6 +824,37 @@ def _add_misc_specs(specs: List[CommandSpec]) -> None: ), description="Map governance evidence to SOC2/ISO 27001 controls.", )) + specs.append(CommandSpec( + "AC_trace_record", "Agent", "Trace: Record Span", + fields=( + FieldSpec("operation", FieldType.STRING, placeholder="chat"), + FieldSpec("model", FieldType.STRING, optional=True), + FieldSpec("system", FieldType.STRING, optional=True), + FieldSpec("input_tokens", FieldType.INT, optional=True), + FieldSpec("output_tokens", FieldType.INT, optional=True), + FieldSpec("tool_name", FieldType.STRING, optional=True), + FieldSpec("duration_s", FieldType.FLOAT, optional=True, + default=0.0), + FieldSpec("status", FieldType.ENUM, optional=True, default="ok", + choices=("ok", "error")), + ), + description="Record a GenAI-convention span on the default trace.", + )) + 
specs.append(CommandSpec( + "AC_trace_summary", "Agent", "Trace: Summary", + fields=(), + description="Roll up the default agent trace (count/tokens/duration).", + )) + specs.append(CommandSpec( + "AC_trace_export", "Agent", "Trace: Export (OTLP)", + fields=(), + description="Export the default agent trace as OTLP-friendly spans.", + )) + specs.append(CommandSpec( + "AC_trace_reset", "Agent", "Trace: Reset", + fields=(), + description="Clear the default agent trace.", + )) specs.append(CommandSpec( "AC_generate_sop", "Report", "Generate SOP Document", fields=( diff --git a/je_auto_control/utils/agent_trace/__init__.py b/je_auto_control/utils/agent_trace/__init__.py new file mode 100644 index 0000000..904182b --- /dev/null +++ b/je_auto_control/utils/agent_trace/__init__.py @@ -0,0 +1,6 @@ +"""Agent observability: OpenTelemetry GenAI-convention spans for LLM runs.""" +from je_auto_control.utils.agent_trace.agent_trace import ( + AgentTrace, default_trace, reset_trace, +) + +__all__ = ["AgentTrace", "default_trace", "reset_trace"] diff --git a/je_auto_control/utils/agent_trace/agent_trace.py b/je_auto_control/utils/agent_trace/agent_trace.py new file mode 100644 index 0000000..181cca0 --- /dev/null +++ b/je_auto_control/utils/agent_trace/agent_trace.py @@ -0,0 +1,123 @@ +"""Record agent/LLM activity as OpenTelemetry GenAI-convention spans. + +When automation drives an LLM agent, you want the same observability an +OpenTelemetry backend gives a service: per-operation spans carrying token usage, +model, and status. ``AgentTrace`` records spans whose attributes follow the +OTel **GenAI semantic conventions** (``gen_ai.operation.name``, +``gen_ai.system``, ``gen_ai.request.model``, ``gen_ai.usage.input_tokens`` / +``output_tokens``, ``gen_ai.tool.name``) and the convention span name +``"{operation} {model}"`` — so :meth:`AgentTrace.to_otel` output drops straight +into an OTLP exporter, while :meth:`summary` rolls up cost/latency for a run. 
+ +It pairs with trajectory evaluation: record the run here, score it there. Pure +standard library (no ``opentelemetry`` dependency); the clock is injectable so +durations are deterministically testable. Imports no ``PySide6``. +""" +import time +from contextlib import contextmanager +from typing import Any, Callable, Dict, Iterator, List, Optional + +STATUS_OK = "ok" +STATUS_ERROR = "error" + + +def _genai_attributes(operation: str, model: Optional[str], + system: Optional[str], input_tokens: Optional[int], + output_tokens: Optional[int], tool_name: Optional[str], + extra: Dict[str, Any]) -> Dict[str, Any]: + attributes: Dict[str, Any] = {"gen_ai.operation.name": operation} + if system is not None: + attributes["gen_ai.system"] = system + if model is not None: + attributes["gen_ai.request.model"] = model + if input_tokens is not None: + attributes["gen_ai.usage.input_tokens"] = int(input_tokens) + if output_tokens is not None: + attributes["gen_ai.usage.output_tokens"] = int(output_tokens) + if tool_name is not None: + attributes["gen_ai.tool.name"] = tool_name + attributes.update(extra) + return attributes + + +class AgentTrace: + """Collects GenAI-convention spans for one agent run.""" + + def __init__(self, clock: Callable[[], float] = time.monotonic) -> None: + """``clock`` returns a monotonic time; injectable for tests.""" + self._clock = clock + self._spans: List[Dict[str, Any]] = [] + + def record(self, operation: str, *, model: Optional[str] = None, + system: Optional[str] = None, + input_tokens: Optional[int] = None, + output_tokens: Optional[int] = None, + tool_name: Optional[str] = None, duration_s: float = 0.0, + status: str = STATUS_OK, + attributes: Optional[Dict[str, Any]] = None) -> Dict[str, Any]: + """Record a completed span and return it.""" + attrs = _genai_attributes(operation, model, system, input_tokens, + output_tokens, tool_name, attributes or {}) + name = f"{operation} {model}" if model else operation + span = {"name": name, 
"attributes": attrs, + "duration_s": float(duration_s), "status": status} + self._spans.append(span) + return span + + @contextmanager + def operation(self, operation: str, **kwargs: Any + ) -> Iterator[Dict[str, Any]]: + """Time a block as a span; yields a mutable ``fields`` dict. + + Set token counts etc. on the yielded dict (e.g. + ``fields['output_tokens'] = 42``); a raised exception marks the span + ``error`` and re-raises. + """ + fields: Dict[str, Any] = {} + start = self._clock() + try: + yield fields + except Exception: + self.record(operation, duration_s=self._clock() - start, + status=STATUS_ERROR, **{**kwargs, **fields}) + raise + self.record(operation, duration_s=self._clock() - start, + status=STATUS_OK, **{**kwargs, **fields}) + + def spans(self) -> List[Dict[str, Any]]: + """Return a copy of the recorded spans.""" + return [dict(span) for span in self._spans] + + def summary(self) -> Dict[str, Any]: + """Roll up span count, errors, token usage, and total duration.""" + def _tokens(key: str) -> int: + return sum(int(s["attributes"].get(key, 0)) for s in self._spans) + return { + "span_count": len(self._spans), + "error_count": sum(1 for s in self._spans + if s["status"] == STATUS_ERROR), + "input_tokens": _tokens("gen_ai.usage.input_tokens"), + "output_tokens": _tokens("gen_ai.usage.output_tokens"), + "duration_s": sum(s["duration_s"] for s in self._spans), + } + + def to_otel(self) -> List[Dict[str, Any]]: + """Export spans in an OTLP-friendly shape with an OTel status code.""" + return [{ + "name": s["name"], "kind": "CLIENT", "attributes": s["attributes"], + "duration_s": s["duration_s"], + "status": {"code": "ERROR" if s["status"] == STATUS_ERROR + else "OK"}, + } for s in self._spans] + + def reset(self) -> None: + """Drop all recorded spans.""" + self._spans.clear() + + +default_trace = AgentTrace() + + +def reset_trace() -> None: + """Clear the module-level :data:`default_trace`.""" + default_trace.reset() diff --git 
a/je_auto_control/utils/executor/action_executor.py b/je_auto_control/utils/executor/action_executor.py index 77bf303..fe1cc59 100644 --- a/je_auto_control/utils/executor/action_executor.py +++ b/je_auto_control/utils/executor/action_executor.py @@ -3023,6 +3023,39 @@ def _compliance_report(evidence: Any, frameworks: Any = None, return report +def _trace_record(operation: str, model: Optional[str] = None, + system: Optional[str] = None, + input_tokens: Optional[int] = None, + output_tokens: Optional[int] = None, + tool_name: Optional[str] = None, duration_s: float = 0.0, + status: str = "ok") -> Dict[str, Any]: + """Adapter: record a GenAI-convention span on the default agent trace.""" + from je_auto_control.utils.agent_trace import default_trace + return default_trace.record( + operation, model=model, system=system, input_tokens=input_tokens, + output_tokens=output_tokens, tool_name=tool_name, + duration_s=duration_s, status=status) + + +def _trace_summary() -> Dict[str, Any]: + """Adapter: roll up the default agent trace (count/tokens/duration).""" + from je_auto_control.utils.agent_trace import default_trace + return default_trace.summary() + + +def _trace_export() -> Dict[str, Any]: + """Adapter: export the default agent trace in OTLP-friendly shape.""" + from je_auto_control.utils.agent_trace import default_trace + return {"spans": default_trace.to_otel()} + + +def _trace_reset() -> Dict[str, Any]: + """Adapter: clear the default agent trace.""" + from je_auto_control.utils.agent_trace import reset_trace + reset_trace() + return {"reset": True} + + class Executor: """ Executor @@ -3270,6 +3303,10 @@ def __init__(self): "AC_pending_artifacts": _pending_artifacts, "AC_evaluate_trajectory": _evaluate_trajectory, "AC_compliance_report": _compliance_report, + "AC_trace_record": _trace_record, + "AC_trace_summary": _trace_summary, + "AC_trace_export": _trace_export, + "AC_trace_reset": _trace_reset, "AC_a11y_record_start": _a11y_record_start, "AC_a11y_record_stop": 
_a11y_record_stop, "AC_a11y_record_events": _a11y_record_events, diff --git a/je_auto_control/utils/mcp_server/tools/_factories.py b/je_auto_control/utils/mcp_server/tools/_factories.py index 8d20156..451cfe1 100644 --- a/je_auto_control/utils/mcp_server/tools/_factories.py +++ b/je_auto_control/utils/mcp_server/tools/_factories.py @@ -2775,6 +2775,53 @@ def compliance_tools() -> List[MCPTool]: ] +def agent_trace_tools() -> List[MCPTool]: + return [ + MCPTool( + name="ac_trace_record", + description=("Record a GenAI-convention span on the default agent " + "trace: 'operation' (e.g. chat/tool), optional model, " + "system, input_tokens, output_tokens, tool_name, " + "duration_s, status (ok/error). Returns the span."), + input_schema=schema( + {"operation": {"type": "string"}, + "model": {"type": "string"}, "system": {"type": "string"}, + "input_tokens": {"type": "integer"}, + "output_tokens": {"type": "integer"}, + "tool_name": {"type": "string"}, + "duration_s": {"type": "number"}, + "status": {"type": "string", "enum": ["ok", "error"]}}, + ["operation"]), + handler=h.trace_record, + annotations=SIDE_EFFECT_ONLY, + ), + MCPTool( + name="ac_trace_summary", + description=("Roll up the default agent trace: span_count, " + "error_count, input_tokens, output_tokens, " + "duration_s."), + input_schema=schema({}), + handler=h.trace_summary, + annotations=READ_ONLY, + ), + MCPTool( + name="ac_trace_export", + description=("Export the default agent trace as OTLP-friendly " + "spans (gen_ai.* attributes). Returns {spans}."), + input_schema=schema({}), + handler=h.trace_export, + annotations=READ_ONLY, + ), + MCPTool( + name="ac_trace_reset", + description="Clear the default agent trace. 
Returns {reset}.", + input_schema=schema({}), + handler=h.trace_reset, + annotations=SIDE_EFFECT_ONLY, + ), + ] + + def unattended_tools() -> List[MCPTool]: return [ MCPTool( @@ -3834,7 +3881,7 @@ def media_assert_tools() -> List[MCPTool]: ci_annotation_tools, clipboard_history_tools, audit_analysis_tools, process_doc_tools, tween_drag_tools, plugin_sdk_tools, governance_tools, credential_lease_tools, egress_tools, approval_testing_tools, - trajectory_eval_tools, compliance_tools, + trajectory_eval_tools, compliance_tools, agent_trace_tools, screen_record_tools, process_and_shell_tools, remote_desktop_tools, gamepad_tools, usb_passthrough_tools, assertion_tools, data_source_tools, diff --git a/je_auto_control/utils/mcp_server/tools/_handlers.py b/je_auto_control/utils/mcp_server/tools/_handlers.py index b95cfaf..472cfdf 100644 --- a/je_auto_control/utils/mcp_server/tools/_handlers.py +++ b/je_auto_control/utils/mcp_server/tools/_handlers.py @@ -1340,6 +1340,32 @@ def compliance_report(evidence, frameworks=None, path=None, fmt="json"): return report +def trace_record(operation, model=None, system=None, input_tokens=None, + output_tokens=None, tool_name=None, duration_s=0.0, + status="ok"): + from je_auto_control.utils.agent_trace import default_trace + return default_trace.record( + operation, model=model, system=system, input_tokens=input_tokens, + output_tokens=output_tokens, tool_name=tool_name, + duration_s=duration_s, status=status) + + +def trace_summary(): + from je_auto_control.utils.agent_trace import default_trace + return default_trace.summary() + + +def trace_export(): + from je_auto_control.utils.agent_trace import default_trace + return {"spans": default_trace.to_otel()} + + +def trace_reset(): + from je_auto_control.utils.agent_trace import reset_trace + reset_trace() + return {"reset": True} + + def vlm_locate(description: str, screen_region: Optional[List[int]] = None, model: Optional[str] = None) -> Optional[List[int]]: diff --git 
a/test/unit_test/headless/test_agent_trace_batch.py b/test/unit_test/headless/test_agent_trace_batch.py new file mode 100644 index 0000000..d851122 --- /dev/null +++ b/test/unit_test/headless/test_agent_trace_batch.py @@ -0,0 +1,114 @@ +"""Headless tests for GenAI-convention agent tracing. The clock is injected, +so durations are deterministic. Pure stdlib, no Qt imports.""" +import pytest + +import je_auto_control as ac +from je_auto_control.utils.agent_trace import AgentTrace + + +class _Clock: + def __init__(self): + self.now = 0.0 + + def __call__(self): + return self.now + + +def test_record_uses_genai_conventions(): + trace = AgentTrace() + span = trace.record("chat", model="claude-opus-4-8", system="anthropic", + input_tokens=100, output_tokens=20) + assert span["name"] == "chat claude-opus-4-8" + attrs = span["attributes"] + assert attrs["gen_ai.operation.name"] == "chat" + assert attrs["gen_ai.system"] == "anthropic" + assert attrs["gen_ai.request.model"] == "claude-opus-4-8" + assert attrs["gen_ai.usage.input_tokens"] == 100 + assert attrs["gen_ai.usage.output_tokens"] == 20 + + +def test_summary_aggregates_tokens_and_errors(): + trace = AgentTrace() + trace.record("chat", model="m", input_tokens=10, output_tokens=5) + trace.record("chat", model="m", input_tokens=7, output_tokens=3) + trace.record("tool", tool_name="search", status="error") + summary = trace.summary() + assert summary["span_count"] == 3 + assert summary["error_count"] == 1 + assert summary["input_tokens"] == 17 + assert summary["output_tokens"] == 8 + + +def test_operation_context_times_and_records(): + clock = _Clock() + trace = AgentTrace(clock=clock) + with trace.operation("chat", model="m") as fields: + clock.now += 2.5 + fields["output_tokens"] = 42 + span = trace.spans()[0] + assert span["duration_s"] == pytest.approx(2.5) + assert span["status"] == "ok" + assert span["attributes"]["gen_ai.usage.output_tokens"] == 42 + + +def test_operation_marks_error_on_exception(): + trace = 
AgentTrace(clock=_Clock()) + with pytest.raises(ValueError): + with trace.operation("tool", tool_name="x"): + raise ValueError("boom") + span = trace.spans()[0] + assert span["status"] == "error" + assert span["attributes"]["gen_ai.tool.name"] == "x" + + +def test_to_otel_shape(): + trace = AgentTrace() + trace.record("chat", model="m", status="error") + otel = trace.to_otel() + assert otel[0]["kind"] == "CLIENT" + assert otel[0]["status"]["code"] == "ERROR" + assert otel[0]["attributes"]["gen_ai.operation.name"] == "chat" + + +def test_reset_clears(): + trace = AgentTrace() + trace.record("chat") + trace.reset() + assert trace.spans() == [] + + +# --- wiring --------------------------------------------------------------- + +def test_executor_round_trip(): + ac.execute_action([["AC_trace_reset", {}]]) + ac.execute_action([[ + "AC_trace_record", + {"operation": "chat", "model": "m", "input_tokens": 5, + "output_tokens": 2}, + ]]) + rec = ac.execute_action([["AC_trace_summary", {}]]) + summary = next(v for v in rec.values() if isinstance(v, dict)) + assert summary["span_count"] == 1 + assert summary["input_tokens"] == 5 + ac.execute_action([["AC_trace_reset", {}]]) # leave global state clean + + +def test_wiring(): + known = ac.executor.known_commands() + assert {"AC_trace_record", "AC_trace_summary", "AC_trace_export", + "AC_trace_reset"} <= known + from je_auto_control.utils.mcp_server.tools import ( + build_default_tool_registry) + names = {t.name for t in build_default_tool_registry()} + assert {"ac_trace_record", "ac_trace_summary", "ac_trace_export", + "ac_trace_reset"} <= names + from je_auto_control.gui.script_builder.command_schema import _build_specs + cmds = {s.command for s in _build_specs()} + assert {"AC_trace_record", "AC_trace_summary", "AC_trace_export", + "AC_trace_reset"} <= cmds + + +def test_facade_exports(): + for attr in ("AgentTrace", "default_trace", "reset_trace"): + assert hasattr(ac, attr) + assert attr in ac.__all__
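The tests above rely on an injected clock and on `operation()` recording a span on both the success and error paths. Below is a minimal, hypothetical sketch of that pattern. `FakeClock` and `MiniTrace` are illustrative classes, not the PR's. The sketch merges the keyword arguments and the block-set fields into one dict before calling `record`, because passing two dicts as separate `**` expansions (`f(**a, **b)`) raises `TypeError` when they share a key.

```python
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, List


class FakeClock:
    """Manually advanced clock so durations are deterministic in tests."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


class MiniTrace:
    """Hypothetical stripped-down tracer illustrating the timing pattern."""

    def __init__(self, clock: Callable[[], float]) -> None:
        self._clock = clock
        self.spans: List[Dict[str, Any]] = []

    def record(self, name: str, *, duration_s: float, status: str,
               **attrs: Any) -> None:
        self.spans.append({"name": name, "duration_s": duration_s,
                           "status": status, "attributes": attrs})

    @contextmanager
    def operation(self, name: str, **kwargs: Any) -> Iterator[Dict[str, Any]]:
        fields: Dict[str, Any] = {}
        start = self._clock()
        status = "ok"
        try:
            yield fields
        except BaseException:
            status = "error"  # mark the span; the exception still propagates
            raise
        finally:
            # Merge first so a key set in both kwargs and fields cannot
            # collide at the call site; values set inside the block win.
            self.record(name, duration_s=self._clock() - start,
                        status=status, **{**kwargs, **fields})


clock = FakeClock()
trace = MiniTrace(clock)
with trace.operation("chat", model="m") as fields:
    clock.now += 2.5
    fields["model"] = "m-2"  # same key as the keyword argument above
try:
    with trace.operation("tool", tool_name="x"):
        clock.now += 1.0
        raise ValueError("boom")
except ValueError:
    pass

print([(s["name"], s["status"], s["duration_s"]) for s in trace.spans])
# [('chat', 'ok', 2.5), ('tool', 'error', 1.0)]
```

Recording in `finally` keeps a single call site for both outcomes. Catching `BaseException` rather than `Exception` means an interrupt inside the block is also recorded as an error instead of being silently reported as `ok`.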