Integration-Automation · JE-Chen · Jun 19, 2026 · Jun 19, 2026
diff --git a/README.md b/README.md
@@ -13,6 +13,7 @@
 
 ## Table of Contents
 
+- [What's new (2026-06-19) — Agent Observability (GenAI OpenTelemetry Spans)](#whats-new-2026-06-19--agent-observability-genai-opentelemetry-spans)
 - [What's new (2026-06-19) — Compliance Control Report (SOC2 / ISO 27001)](#whats-new-2026-06-19--compliance-control-report-soc2--iso-27001)
 - [What's new (2026-06-19) — Agent Trajectory Evaluation](#whats-new-2026-06-19--agent-trajectory-evaluation)
 - [What's new (2026-06-19) — Approval Testing (Golden-Master Baselines)](#whats-new-2026-06-19--approval-testing-golden-master-baselines)
@@ -90,6 +91,12 @@
 
 ---
 
+## What's new (2026-06-19) — Agent Observability (GenAI OpenTelemetry Spans)
+
+OTel GenAI-convention spans for LLM runs. Full reference: [`docs/source/Eng/doc/new_features/v38_features_doc.rst`](docs/source/Eng/doc/new_features/v38_features_doc.rst).
+
+- **`AgentTrace`** (`AC_trace_record` / `AC_trace_summary` / `AC_trace_export` / `AC_trace_reset`, `ac_*`): records spans whose attributes follow the OpenTelemetry **GenAI semantic conventions** (`gen_ai.operation.name`, `gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.input_tokens`/`output_tokens`, `gen_ai.tool.name`) and are named `"{operation} {model}"`. `to_otel()` returns OTel-shaped span dicts to feed an OTLP exporter; `summary()` rolls up token usage and latency; an `operation()` context manager times live blocks and marks the span as an error if the block raises. Pure stdlib (no `opentelemetry` dependency) with an injectable clock; pairs with trajectory evaluation (record here, score there).
+
 ## What's new (2026-06-19) — Compliance Control Report (SOC2 / ISO 27001)
 
 Map governance evidence to named controls. Full reference: [`docs/source/Eng/doc/new_features/v37_features_doc.rst`](docs/source/Eng/doc/new_features/v37_features_doc.rst).

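A minimal sketch of the attribute mapping and span naming described in the `AgentTrace` bullet above, assuming only the `gen_ai.*` keys it lists. `genai_span` is a hypothetical helper for illustration, not the `je_auto_control` API.

```python
from typing import Any, Dict, Optional


def genai_span(operation: str, model: Optional[str] = None,
               system: Optional[str] = None,
               input_tokens: Optional[int] = None,
               output_tokens: Optional[int] = None,
               tool_name: Optional[str] = None) -> Dict[str, Any]:
    """Build a span dict named "{operation} {model}" with gen_ai.* attributes."""
    candidates = {
        "gen_ai.operation.name": operation,
        "gen_ai.system": system,
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        "gen_ai.tool.name": tool_name,
    }
    # Omit attributes that were not supplied instead of emitting None values.
    attributes = {k: v for k, v in candidates.items() if v is not None}
    # Fall back to the bare operation name when no model is known.
    name = f"{operation} {model}" if model else operation
    return {"name": name, "attributes": attributes}


span = genai_span("chat", model="claude-opus-4-8", system="anthropic",
                  input_tokens=1200, output_tokens=180)
print(span["name"])  # chat claude-opus-4-8
```

Dropping unset keys keeps the attribute set sparse, which mirrors how the bullet lists the keys as conditional rather than always present.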
diff --git a/README/README_zh-CN.md b/README/README_zh-CN.md
@@ -12,6 +12,7 @@
 
 ## 目录
 
+- [本次更新 (2026-06-19) — Agent 可观测性(GenAI OpenTelemetry Spans)](#本次更新-2026-06-19--agent-可观测性genai-opentelemetry-spans)
 - [本次更新 (2026-06-19) — 合规控制报告(SOC2 / ISO 27001)](#本次更新-2026-06-19--合规控制报告soc2--iso-27001)
 - [本次更新 (2026-06-19) — Agent 轨迹评估](#本次更新-2026-06-19--agent-轨迹评估)
 - [本次更新 (2026-06-19) — 核准式测试(Golden-Master 基准)](#本次更新-2026-06-19--核准式测试golden-master-基准)
@@ -89,6 +90,12 @@
 
 ---
 
+## 本次更新 (2026-06-19) — Agent 可观测性(GenAI OpenTelemetry Spans)
+
+LLM 运行的 OTel GenAI 惯例 spans。完整参考:[`docs/source/Zh/doc/new_features/v38_features_doc.rst`](../docs/source/Zh/doc/new_features/v38_features_doc.rst)。
+
+- **`AgentTrace`**(`AC_trace_record` / `AC_trace_summary` / `AC_trace_export` / `AC_trace_reset`、`ac_*`):记录的 span 其属性遵循 OpenTelemetry **GenAI 语意惯例**(`gen_ai.operation.name`、`gen_ai.system`、`gen_ai.request.model`、`gen_ai.usage.input_tokens`/`output_tokens`、`gen_ai.tool.name`)与 `"{operation} {model}"` span 名称。`to_otel()` 可送入 OTLP exporter;`summary()` 汇整 token 成本与延迟;`operation()` 上下文管理器为实时区块计时并标记错误。纯标准库(无 `opentelemetry` 依赖)、可注入时钟;与轨迹评估互补(在此记录、在那里评分)。
+
 ## 本次更新 (2026-06-19) — 合规控制报告(SOC2 / ISO 27001)
 
 将治理证据映射到具名控制项。完整参考:[`docs/source/Zh/doc/new_features/v37_features_doc.rst`](../docs/source/Zh/doc/new_features/v37_features_doc.rst)。

diff --git a/README/README_zh-TW.md b/README/README_zh-TW.md
@@ -12,6 +12,7 @@
 
 ## 目錄
 
+- [本次更新 (2026-06-19) — Agent 可觀測性(GenAI OpenTelemetry Spans)](#本次更新-2026-06-19--agent-可觀測性genai-opentelemetry-spans)
 - [本次更新 (2026-06-19) — 合規控制報告(SOC2 / ISO 27001)](#本次更新-2026-06-19--合規控制報告soc2--iso-27001)
 - [本次更新 (2026-06-19) — Agent 軌跡評估](#本次更新-2026-06-19--agent-軌跡評估)
 - [本次更新 (2026-06-19) — 核准式測試(Golden-Master 基準)](#本次更新-2026-06-19--核准式測試golden-master-基準)
@@ -89,6 +90,12 @@
 
 ---
 
+## 本次更新 (2026-06-19) — Agent 可觀測性(GenAI OpenTelemetry Spans)
+
+LLM 執行的 OTel GenAI 慣例 spans。完整參考:[`docs/source/Zh/doc/new_features/v38_features_doc.rst`](../docs/source/Zh/doc/new_features/v38_features_doc.rst)。
+
+- **`AgentTrace`**(`AC_trace_record` / `AC_trace_summary` / `AC_trace_export` / `AC_trace_reset`、`ac_*`):記錄的 span 其屬性遵循 OpenTelemetry **GenAI 語意慣例**(`gen_ai.operation.name`、`gen_ai.system`、`gen_ai.request.model`、`gen_ai.usage.input_tokens`/`output_tokens`、`gen_ai.tool.name`)與 `"{operation} {model}"` span 名稱。`to_otel()` 可送入 OTLP exporter;`summary()` 彙整 token 成本與延遲;`operation()` 情境管理器為即時區塊計時並標記錯誤。純標準函式庫(無 `opentelemetry` 相依)、可注入時鐘;與軌跡評估互補(在此記錄、在那裡評分)。
+
 ## 本次更新 (2026-06-19) — 合規控制報告(SOC2 / ISO 27001)
 
 將治理證據對應到具名控制項。完整參考:[`docs/source/Zh/doc/new_features/v37_features_doc.rst`](../docs/source/Zh/doc/new_features/v37_features_doc.rst)。

diff --git a/docs/source/Eng/doc/new_features/v38_features_doc.rst b/docs/source/Eng/doc/new_features/v38_features_doc.rst
@@ -0,0 +1,60 @@
+Agent Observability (GenAI OpenTelemetry Spans)
+===============================================
+
+When automation drives an LLM agent, you want the same observability an
+OpenTelemetry backend gives a service: per-operation spans carrying token usage,
+model, and status. ``AgentTrace`` records spans whose attributes follow the
+OpenTelemetry **GenAI semantic conventions** — ``gen_ai.operation.name``,
+``gen_ai.system``, ``gen_ai.request.model``, ``gen_ai.usage.input_tokens`` /
+``gen_ai.usage.output_tokens``, ``gen_ai.tool.name`` — and the convention span
+name ``"{operation} {model}"``. :meth:`AgentTrace.to_otel` returns OTel-shaped
+span dicts ready to map onto an OTLP exporter, while :meth:`AgentTrace.summary`
+rolls up token usage and latency for a run.
+
+It pairs with :doc:`trajectory evaluation <v36_features_doc>` — record the run
+here, score it there. Pure standard library (no ``opentelemetry`` dependency);
+the clock is injectable so durations are deterministically testable. Imports no
+``PySide6``.
+
+Headless API
+------------
+
+.. code-block:: python
+
+    from je_auto_control import AgentTrace
+
+    trace = AgentTrace()
+    # one-shot record of a completed call:
+    trace.record("chat", model="claude-opus-4-8", system="anthropic",
+                 input_tokens=1200, output_tokens=180, duration_s=0.9)
+
+    # or time a live block; an exception raised inside marks the span "error":
+    with trace.operation("tool", tool_name="search") as fields:
+        sum(range(10**6))             # stand-in for the real tool call
+        fields["output_tokens"] = 42  # set token counts on the yielded dict
+
+    print(trace.summary())   # {'span_count': 2, 'error_count': 0, ...}
+    spans = trace.to_otel()  # list of OTLP-shaped span dicts
+
+``summary`` aggregates ``span_count``, ``error_count``, ``input_tokens``,
+``output_tokens``, and total ``duration_s``. ``to_otel`` returns each span as
+``{name, kind, attributes, duration_s, status: {code}}``, with ``kind`` always ``"CLIENT"`` and ``code`` either ``"OK"`` or ``"ERROR"``.
+
+Executor commands
+-----------------
+
+A module-level ``default_trace`` (cleared by ``reset_trace()``) backs the
+executor/MCP surfaces, so a flow can build one trace across several steps:
+
+================================ ===================================================
+Command                          Effect
+================================ ===================================================
+``AC_trace_record``              Record a GenAI span (operation/model/tokens/…).
+``AC_trace_summary``             Roll up the default trace.
+``AC_trace_export``              Export the default trace as OTLP spans.
+``AC_trace_reset``               Clear the default trace.
+================================ ===================================================
+
+The same operations are exposed as MCP tools (``ac_trace_record`` /
+``ac_trace_summary`` / ``ac_trace_export`` / ``ac_trace_reset``) and as Script
+Builder commands under **Agent**.
diff --git a/docs/source/Eng/eng_index.rst b/docs/source/Eng/eng_index.rst
@@ -60,6 +60,7 @@ Comprehensive guides for all AutoControl features.
    doc/new_features/v35_features_doc
    doc/new_features/v36_features_doc
    doc/new_features/v37_features_doc
+   doc/new_features/v38_features_doc
    doc/ocr_backends/ocr_backends_doc
    doc/observability/observability_doc
    doc/operations_layer/operations_layer_doc

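The dicts returned by `to_otel()` carry a name, kind, attributes, duration, and status, but no trace/span IDs or wall-clock timestamps, so sending them to a collector needs a small adapter. A sketch of one, assuming you target the OTLP/JSON encoding and that the spans ran back to back; `to_otlp_json`, the `service` default, and the scope name are hypothetical choices for illustration, not part of `je_auto_control`.

```python
import json
import secrets
import time
from typing import Any, Dict, List, Optional

# Enum values from opentelemetry-proto (SpanKind, Status.StatusCode).
SPAN_KIND = {"INTERNAL": 1, "SERVER": 2, "CLIENT": 3}
STATUS_CODE = {"UNSET": 0, "OK": 1, "ERROR": 2}


def _any_value(value: Any) -> Dict[str, Any]:
    """Wrap a scalar as an OTLP AnyValue; int64 travels as a JSON string."""
    if isinstance(value, bool):
        return {"boolValue": value}
    if isinstance(value, int):
        return {"intValue": str(value)}
    if isinstance(value, float):
        return {"doubleValue": value}
    return {"stringValue": str(value)}


def to_otlp_json(spans: List[Dict[str, Any]], service: str = "autocontrol",
                 end_ns: Optional[int] = None) -> Dict[str, Any]:
    """Wrap to_otel()-shaped dicts in an OTLP/JSON trace export request.

    Assumes the spans ran back to back and the last one ended at ``end_ns``.
    """
    trace_id = secrets.token_hex(16)  # 16 random bytes, hex-encoded
    durations = [int(s["duration_s"] * 1e9) for s in spans]
    cursor = (end_ns if end_ns is not None else time.time_ns()) - sum(durations)
    otlp_spans = []
    for s, dur in zip(spans, durations):
        otlp_spans.append({
            "traceId": trace_id,
            "spanId": secrets.token_hex(8),  # 8 random bytes, hex-encoded
            "name": s["name"],
            "kind": SPAN_KIND[s["kind"]],
            "startTimeUnixNano": str(cursor),
            "endTimeUnixNano": str(cursor + dur),
            "attributes": [{"key": k, "value": _any_value(v)}
                           for k, v in s["attributes"].items()],
            "status": {"code": STATUS_CODE[s["status"]["code"]]},
        })
        cursor += dur
    return {"resourceSpans": [{
        "resource": {"attributes": [
            {"key": "service.name", "value": {"stringValue": service}}]},
        "scopeSpans": [{"scope": {"name": "je_auto_control.agent_trace"},
                        "spans": otlp_spans}],
    }]}


example = [{"name": "chat claude-opus-4-8", "kind": "CLIENT",
            "attributes": {"gen_ai.operation.name": "chat",
                           "gen_ai.usage.input_tokens": 1200},
            "duration_s": 0.9, "status": {"code": "OK"}}]
payload = to_otlp_json(example, end_ns=1_000_000_000_000)
body = json.dumps(payload)
```

`body` could then be POSTed with `Content-Type: application/json` to a collector's OTLP/HTTP traces endpoint (conventionally port 4318, path `/v1/traces`). If spans actually overlapped, you would need to capture real start times instead of laying durations end to end.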
diff --git a/docs/source/Zh/doc/new_features/v38_features_doc.rst b/docs/source/Zh/doc/new_features/v38_features_doc.rst
@@ -0,0 +1,56 @@
+Agent 可觀測性(GenAI OpenTelemetry Spans)
+==========================================
+
+當自動化驅動 LLM agent 時,你會想要 OpenTelemetry 後端給服務的那種可觀測性:每個操作
+一個 span,帶有 token 用量、模型與狀態。``AgentTrace`` 記錄的 span 其屬性遵循
+OpenTelemetry **GenAI 語意慣例** —— ``gen_ai.operation.name``、``gen_ai.system``、
+``gen_ai.request.model``、``gen_ai.usage.input_tokens`` /
+``gen_ai.usage.output_tokens``、``gen_ai.tool.name`` —— 以及慣例 span 名稱
+``"{operation} {model}"``。:meth:`AgentTrace.to_otel` 的輸出可直接送入 OTLP
+exporter,而 :meth:`AgentTrace.summary` 則彙整一次執行的成本與延遲。
+
+它與 :doc:`軌跡評估 <v36_features_doc>` 互補 —— 在此記錄執行,在那裡評分。純標準函式
+庫(無 ``opentelemetry`` 相依);時鐘可注入,因此持續時間可被確定性地測試。不匯入
+``PySide6``。
+
+無頭 API
+--------
+
+.. code-block:: python
+
+    from je_auto_control import AgentTrace
+
+    trace = AgentTrace()
+    # 一次性記錄已完成的呼叫:
+    trace.record("chat", model="claude-opus-4-8", system="anthropic",
+                 input_tokens=1200, output_tokens=180, duration_s=0.9)
+
+    # 或為即時區塊計時;在 yield 出的 dict 上設定 token 數:
+    with trace.operation("tool", tool_name="search") as fields:
+        result = run_tool()
+        fields["output_tokens"] = 42        # 區塊內若拋出例外則標記為 error
+
+    print(trace.summary())   # {span_count, error_count, input_tokens, ...}
+    spans = trace.to_otel()                # OTLP 友善的 span dict
+
+``summary`` 彙整 ``span_count``、``error_count``、``input_tokens``、
+``output_tokens`` 與總 ``duration_s``。``to_otel`` 將每個 span 回傳為
+``{name, kind, attributes, duration_s, status:{code}}``,帶有 OTel 狀態碼。
+
+執行器指令
+----------
+
+模組層級的預設 trace 支撐 executor/MCP 介面,讓流程可跨步驟建立一條 trace:
+
+================================ ===================================================
+指令                             效果
+================================ ===================================================
+``AC_trace_record``              記錄一個 GenAI span(operation/model/tokens/…)。
+``AC_trace_summary``             彙整預設 trace。
+``AC_trace_export``              將預設 trace 匯出為 OTLP spans。
+``AC_trace_reset``               清除預設 trace。
+================================ ===================================================
+
+相同操作亦提供為 MCP 工具(``ac_trace_record`` / ``ac_trace_summary`` /
+``ac_trace_export`` / ``ac_trace_reset``),以及 Script Builder 中 **Agent** 分類下的
+指令。
diff --git a/docs/source/Zh/zh_index.rst b/docs/source/Zh/zh_index.rst
@@ -60,6 +60,7 @@ AutoControl 所有功能的完整使用指南。
    doc/new_features/v35_features_doc
    doc/new_features/v36_features_doc
    doc/new_features/v37_features_doc
+   doc/new_features/v38_features_doc
    doc/ocr_backends/ocr_backends_doc
    doc/observability/observability_doc
    doc/operations_layer/operations_layer_doc

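The docs above credit the injectable clock with making durations deterministically testable. A self-contained sketch of that idea, assuming a zero-argument `clock` callable like the one `AgentTrace(clock=...)` accepts; `FakeClock` and `timed` are hypothetical names used only here, and `timed` is a simplified stand-in for `operation()`.

```python
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, List


class FakeClock:
    """A manually advanced clock, callable like time.monotonic()."""

    def __init__(self, start: float = 0.0) -> None:
        self.now = start

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


@contextmanager
def timed(spans: List[Dict[str, Any]], name: str,
          clock: Callable[[], float]) -> Iterator[Dict[str, Any]]:
    """Append a span with the block's duration; mark it 'error' on exception."""
    fields: Dict[str, Any] = {}
    start = clock()
    try:
        yield fields
    except Exception:
        spans.append({"name": name, "duration_s": clock() - start,
                      "status": "error", **fields})
        raise  # recording the span must not swallow the failure
    spans.append({"name": name, "duration_s": clock() - start,
                  "status": "ok", **fields})


clock = FakeClock()
spans: List[Dict[str, Any]] = []
with timed(spans, "tool search", clock) as fields:
    clock.advance(1.5)             # simulate 1.5 s of work
    fields["output_tokens"] = 42
try:
    with timed(spans, "chat", clock):
        clock.advance(0.25)
        raise RuntimeError("model timeout")
except RuntimeError:
    pass                           # the span was still recorded, as an error
```

Because the test drives time explicitly, assertions such as "this span took exactly 1.5 s" hold on any machine, which a real monotonic clock cannot guarantee.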
diff --git a/je_auto_control/__init__.py b/je_auto_control/__init__.py
@@ -222,6 +222,10 @@
 from je_auto_control.utils.compliance import (
     build_compliance_report, render_compliance_html, write_compliance_report,
 )
+# Agent observability: OpenTelemetry GenAI-convention spans
+from je_auto_control.utils.agent_trace import (
+    AgentTrace, default_trace, reset_trace,
+)
 # Background popup/interrupt watchdog (unattended automation)
 from je_auto_control.utils.watchdog import (
     PopupWatchdog, WatchdogRule, default_popup_watchdog,
@@ -665,6 +669,7 @@ def start_autocontrol_gui(*args, **kwargs):
     "evaluate_trajectory",
     "build_compliance_report", "render_compliance_html",
     "write_compliance_report",
+    "AgentTrace", "default_trace", "reset_trace",
     # MCP server
     "AuditLogger", "HttpMCPServer", "MCPContent", "MCPPrompt",
     "MCPPromptArgument", "MCPResource", "MCPServer", "MCPTool",

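The package now exports a module-level `default_trace` plus `reset_trace()`, which is how separate executor steps can add spans to one shared trace. The pattern in miniature, using only the standard library; `MiniTrace`, `step_plan`, and `step_act` are hypothetical stand-ins, not the real module.

```python
from typing import Any, Dict, List


class MiniTrace:
    """Stand-in span collector with record / summary / reset."""

    def __init__(self) -> None:
        self._spans: List[Dict[str, Any]] = []

    def record(self, operation: str, **attrs: Any) -> None:
        self._spans.append({"operation": operation, **attrs})

    def summary(self) -> Dict[str, int]:
        return {"span_count": len(self._spans)}

    def reset(self) -> None:
        self._spans.clear()  # mutate in place; never rebind the instance


# One shared, module-level instance, in the role of ``default_trace``.
default_trace = MiniTrace()


def reset_trace() -> None:
    """Clear the shared trace without replacing the object importers hold."""
    default_trace.reset()


def step_plan() -> None:
    """Hypothetical flow step that calls a model."""
    default_trace.record("chat", model="some-model")


def step_act() -> None:
    """Hypothetical later step that runs a tool."""
    default_trace.record("tool", tool_name="search")


trace_ref = default_trace      # what ``from pkg import default_trace`` binds
step_plan()
step_act()
counts_before = default_trace.summary()
print(counts_before)           # {'span_count': 2}
reset_trace()
```

Clearing in place matters: `from je_auto_control import default_trace` binds the object itself, so a reset that reassigned the global would leave such importers holding a stale trace.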
diff --git a/je_auto_control/gui/script_builder/command_schema.py b/je_auto_control/gui/script_builder/command_schema.py
@@ -824,6 +824,37 @@ def _add_misc_specs(specs: List[CommandSpec]) -> None:
         ),
         description="Map governance evidence to SOC2/ISO 27001 controls.",
     ))
+    specs.append(CommandSpec(
+        "AC_trace_record", "Agent", "Trace: Record Span",
+        fields=(
+            FieldSpec("operation", FieldType.STRING, placeholder="chat"),
+            FieldSpec("model", FieldType.STRING, optional=True),
+            FieldSpec("system", FieldType.STRING, optional=True),
+            FieldSpec("input_tokens", FieldType.INT, optional=True),
+            FieldSpec("output_tokens", FieldType.INT, optional=True),
+            FieldSpec("tool_name", FieldType.STRING, optional=True),
+            FieldSpec("duration_s", FieldType.FLOAT, optional=True,
+                      default=0.0),
+            FieldSpec("status", FieldType.ENUM, optional=True, default="ok",
+                      choices=("ok", "error")),
+        ),
+        description="Record a GenAI-convention span on the default trace.",
+    ))
+    specs.append(CommandSpec(
+        "AC_trace_summary", "Agent", "Trace: Summary",
+        fields=(),
+        description="Roll up the default agent trace (count/tokens/duration).",
+    ))
+    specs.append(CommandSpec(
+        "AC_trace_export", "Agent", "Trace: Export (OTLP)",
+        fields=(),
+        description="Export the default agent trace as OTLP-friendly spans.",
+    ))
+    specs.append(CommandSpec(
+        "AC_trace_reset", "Agent", "Trace: Reset",
+        fields=(),
+        description="Clear the default agent trace.",
+    ))
     specs.append(CommandSpec(
         "AC_generate_sop", "Report", "Generate SOP Document",
         fields=(

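The `AC_trace_record` spec marks most fields optional and gives `duration_s` a `0.0` default and `status` an `"ok"` default restricted to `("ok", "error")`. A sketch of how such form values could be normalized before reaching `record()`, assuming blank optional fields arrive as `None` or `""`; `TRACE_RECORD_DEFAULTS` and `normalize_record_args` are hypothetical, not the Script Builder's actual code.

```python
from typing import Any, Dict, Optional

# Defaults and choices copied from the AC_trace_record CommandSpec above.
TRACE_RECORD_DEFAULTS = {"duration_s": 0.0, "status": "ok"}
STATUS_CHOICES = ("ok", "error")
INT_FIELDS = ("input_tokens", "output_tokens")


def normalize_record_args(form: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    """Drop blank optional fields, apply defaults, and coerce types."""
    if not form.get("operation"):
        raise ValueError("operation is required")
    # Keep only fields the user actually filled in (0 is a real value).
    args = {k: v for k, v in form.items() if v not in (None, "")}
    for key, default in TRACE_RECORD_DEFAULTS.items():
        args.setdefault(key, default)
    if args["status"] not in STATUS_CHOICES:
        raise ValueError(f"status must be one of {STATUS_CHOICES}")
    for key in INT_FIELDS:
        if key in args:
            args[key] = int(args[key])
    args["duration_s"] = float(args["duration_s"])
    return args


result = normalize_record_args({"operation": "chat", "model": "",
                                "input_tokens": "1200", "status": None})
print(result)
# {'operation': 'chat', 'input_tokens': 1200, 'duration_s': 0.0, 'status': 'ok'}
```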
diff --git a/je_auto_control/utils/agent_trace/__init__.py b/je_auto_control/utils/agent_trace/__init__.py
@@ -0,0 +1,6 @@
+"""Agent observability: OpenTelemetry GenAI-convention spans for LLM runs."""
+from je_auto_control.utils.agent_trace.agent_trace import (
+    AgentTrace, default_trace, reset_trace,
+)
+
+__all__ = ["AgentTrace", "default_trace", "reset_trace"]
diff --git a/je_auto_control/utils/agent_trace/agent_trace.py b/je_auto_control/utils/agent_trace/agent_trace.py
@@ -0,0 +1,123 @@
+"""Record agent/LLM activity as OpenTelemetry GenAI-convention spans.
+
+When automation drives an LLM agent, you want the same observability an
+OpenTelemetry backend gives a service: per-operation spans carrying token usage,
+model, and status. ``AgentTrace`` records spans whose attributes follow the
+OTel **GenAI semantic conventions** (``gen_ai.operation.name``,
+``gen_ai.system``, ``gen_ai.request.model``, ``gen_ai.usage.input_tokens`` /
+``output_tokens``, ``gen_ai.tool.name``) and the convention span name
+``"{operation} {model}"``, so :meth:`AgentTrace.to_otel` yields OTel-shaped span
+dicts for an OTLP exporter, while :meth:`summary` rolls up tokens and latency.
+
+It pairs with trajectory evaluation: record the run here, score it there. Pure
+standard library (no ``opentelemetry`` dependency); the clock is injectable so
+durations are deterministically testable. Imports no ``PySide6``.
+"""
+import time
+from contextlib import contextmanager
+from typing import Any, Callable, Dict, Iterator, List, Optional
+
+STATUS_OK = "ok"
+STATUS_ERROR = "error"
+
+
+def _genai_attributes(operation: str, model: Optional[str],
+                      system: Optional[str], input_tokens: Optional[int],
+                      output_tokens: Optional[int], tool_name: Optional[str],
+                      extra: Dict[str, Any]) -> Dict[str, Any]:
+    attributes: Dict[str, Any] = {"gen_ai.operation.name": operation}
+    if system is not None:
+        attributes["gen_ai.system"] = system
+    if model is not None:
+        attributes["gen_ai.request.model"] = model
+    if input_tokens is not None:
+        attributes["gen_ai.usage.input_tokens"] = int(input_tokens)
+    if output_tokens is not None:
+        attributes["gen_ai.usage.output_tokens"] = int(output_tokens)
+    if tool_name is not None:
+        attributes["gen_ai.tool.name"] = tool_name
+    attributes.update(extra)
+    return attributes
+
+
+class AgentTrace:
+    """Collects GenAI-convention spans for one agent run."""
+
+    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
+        """``clock`` returns a monotonic time; injectable for tests."""
+        self._clock = clock
+        self._spans: List[Dict[str, Any]] = []
+
+    def record(self, operation: str, *, model: Optional[str] = None,
+               system: Optional[str] = None,
+               input_tokens: Optional[int] = None,
+               output_tokens: Optional[int] = None,
+               tool_name: Optional[str] = None, duration_s: float = 0.0,
+               status: str = STATUS_OK,
+               attributes: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
+        """Record a completed span and return it."""
+        attrs = _genai_attributes(operation, model, system, input_tokens,
+                                  output_tokens, tool_name, attributes or {})
+        name = f"{operation} {model}" if model else operation
+        span = {"name": name, "attributes": attrs,
+                "duration_s": float(duration_s), "status": status}
+        self._spans.append(span)
+        return span
+
+    @contextmanager
+    def operation(self, operation: str, **kwargs: Any
+                  ) -> Iterator[Dict[str, Any]]:
+        """Time a block as a span; yields a mutable ``fields`` dict.
+
+        Set token counts etc. on the yielded dict (e.g.
+        ``fields['output_tokens'] = 42``); they override same-named keyword
+        arguments. A raised exception marks the span ``error`` and re-raises.
+        """
+        fields: Dict[str, Any] = {}
+        start = self._clock()
+        try:
+            yield fields
+        except Exception:
+            self.record(operation, duration_s=self._clock() - start,
+                        status=STATUS_ERROR, **{**kwargs, **fields})
+            raise
+        self.record(operation, duration_s=self._clock() - start,
+                    status=STATUS_OK, **{**kwargs, **fields})
+
+    def spans(self) -> List[Dict[str, Any]]:
+        """Return a copy of the recorded spans."""
+        return [dict(s, attributes=dict(s["attributes"])) for s in self._spans]
+
+    def summary(self) -> Dict[str, Any]:
+        """Roll up span count, errors, token usage, and total duration."""
+        def _tokens(key: str) -> int:
+            return sum(int(s["attributes"].get(key, 0)) for s in self._spans)
+        return {
+            "span_count": len(self._spans),
+            "error_count": sum(1 for s in self._spans
+                               if s["status"] == STATUS_ERROR),
+            "input_tokens": _tokens("gen_ai.usage.input_tokens"),
+            "output_tokens": _tokens("gen_ai.usage.output_tokens"),
+            "duration_s": sum(s["duration_s"] for s in self._spans),
+        }
+
+    def to_otel(self) -> List[Dict[str, Any]]:
+        """Export spans in an OTLP-friendly shape with an OTel status code."""
+        return [{
+            "name": s["name"], "kind": "CLIENT", "attributes": s["attributes"],
+            "duration_s": s["duration_s"],
+            "status": {"code": "ERROR" if s["status"] == STATUS_ERROR
+                       else "OK"},
+        } for s in self._spans]
+
+    def reset(self) -> None:
+        """Drop all recorded spans."""
+        self._spans.clear()
+
+
+default_trace = AgentTrace()
+
+
+def reset_trace() -> None:
+    """Clear the module-level :data:`default_trace`."""
+    default_trace.reset()
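`operation()` forwards both the call-site keyword arguments and the keys set on the yielded `fields` dict into `record()`. Passing them as two separate `**` unpackings raises `TypeError` whenever the same key appears in both, while merging them into one dict first lets the in-block value win. A standalone demonstration with a hypothetical stand-in `record` function:

```python
from typing import Any, Dict, Optional


def record(operation: str, *, model: Optional[str] = None,
           output_tokens: Optional[int] = None) -> Dict[str, Any]:
    """Stand-in for a keyword-only span recorder."""
    return {"operation": operation, "model": model,
            "output_tokens": output_tokens}


kwargs = {"model": "draft-model", "output_tokens": 0}  # given at call site
fields = {"output_tokens": 42}                         # set inside the block

try:
    record("chat", **kwargs, **fields)  # duplicate key across two unpackings
    failed = False
except TypeError as exc:
    failed = True
    print("separate unpacking fails:", exc)

merged = record("chat", **{**kwargs, **fields})  # the later dict wins
print(merged["output_tokens"])  # 42
```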