7 changes: 7 additions & 0 deletions README.md
@@ -13,6 +13,7 @@

## Table of Contents

- [What's new (2026-06-19) — Agent Observability (GenAI OpenTelemetry Spans)](#whats-new-2026-06-19--agent-observability-genai-opentelemetry-spans)
- [What's new (2026-06-19) — Compliance Control Report (SOC2 / ISO 27001)](#whats-new-2026-06-19--compliance-control-report-soc2--iso-27001)
- [What's new (2026-06-19) — Agent Trajectory Evaluation](#whats-new-2026-06-19--agent-trajectory-evaluation)
- [What's new (2026-06-19) — Approval Testing (Golden-Master Baselines)](#whats-new-2026-06-19--approval-testing-golden-master-baselines)
@@ -90,6 +91,12 @@

---

## What's new (2026-06-19) — Agent Observability (GenAI OpenTelemetry Spans)

OTel GenAI-convention spans for LLM runs. Full reference: [`docs/source/Eng/doc/new_features/v38_features_doc.rst`](docs/source/Eng/doc/new_features/v38_features_doc.rst).

- **`AgentTrace`** (`AC_trace_record` / `AC_trace_summary` / `AC_trace_export` / `AC_trace_reset`, `ac_*`): records spans whose attributes follow the OpenTelemetry **GenAI semantic conventions** (`gen_ai.operation.name`, `gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.input_tokens`/`gen_ai.usage.output_tokens`, `gen_ai.tool.name`) and the `"{operation} {model}"` span name. `to_otel()` returns OTLP-friendly span dicts for an exporter; `summary()` rolls up span count, errors, token usage, and total duration; an `operation()` context manager times live blocks and marks errors. Pure standard library (no `opentelemetry` dependency), with an injectable clock; pairs with trajectory evaluation (record here, score there).
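
`summary()` reports token totals, not money. If a cost figure is needed, it can be layered on top, roughly as in the sketch below. The `PRICES_PER_MTOK` table and the `estimate_cost` helper are hypothetical, the prices are placeholders rather than real rates, and the sketch assumes the whole run used a single model.

```python
from typing import Dict, Tuple

# Hypothetical (input, output) prices in USD per million tokens; not real rates.
PRICES_PER_MTOK: Dict[str, Tuple[float, float]] = {
    "example-model": (3.0, 15.0),
}


def estimate_cost(summary: Dict[str, float], model: str) -> float:
    """Turn the token totals of a summary dict into an estimated cost."""
    price_in, price_out = PRICES_PER_MTOK[model]
    return (summary["input_tokens"] * price_in
            + summary["output_tokens"] * price_out) / 1_000_000


# Same keys that summary() is documented to return, written out by hand here.
summary = {"span_count": 2, "error_count": 0,
           "input_tokens": 1200, "output_tokens": 222, "duration_s": 1.4}
cost = estimate_cost(summary, "example-model")
print(f"estimated cost: ${cost:.5f}")
```

For runs that mix models, the same arithmetic would have to be applied per span using each span's `gen_ai.request.model` attribute instead of the run-level totals.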

## What's new (2026-06-19) — Compliance Control Report (SOC2 / ISO 27001)

Map governance evidence to named controls. Full reference: [`docs/source/Eng/doc/new_features/v37_features_doc.rst`](docs/source/Eng/doc/new_features/v37_features_doc.rst).
7 changes: 7 additions & 0 deletions README/README_zh-CN.md
@@ -12,6 +12,7 @@

## Table of Contents

- [What's new (2026-06-19) — Agent Observability (GenAI OpenTelemetry Spans)](#本次更新-2026-06-19--agent-可观测性genai-opentelemetry-spans)
- [What's new (2026-06-19) — Compliance Control Report (SOC2 / ISO 27001)](#本次更新-2026-06-19--合规控制报告soc2--iso-27001)
- [What's new (2026-06-19) — Agent Trajectory Evaluation](#本次更新-2026-06-19--agent-轨迹评估)
- [What's new (2026-06-19) — Approval Testing (Golden-Master Baselines)](#本次更新-2026-06-19--核准式测试golden-master-基准)
@@ -89,6 +90,12 @@

---

## What's new (2026-06-19) — Agent Observability (GenAI OpenTelemetry Spans)

OTel GenAI-convention spans for LLM runs. Full reference: [`docs/source/Zh/doc/new_features/v38_features_doc.rst`](../docs/source/Zh/doc/new_features/v38_features_doc.rst).

- **`AgentTrace`** (`AC_trace_record` / `AC_trace_summary` / `AC_trace_export` / `AC_trace_reset`, `ac_*`): records spans whose attributes follow the OpenTelemetry **GenAI semantic conventions** (`gen_ai.operation.name`, `gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.input_tokens`/`output_tokens`, `gen_ai.tool.name`) and the `"{operation} {model}"` span name. `to_otel()` can be fed into an OTLP exporter; `summary()` rolls up token cost and latency; the `operation()` context manager times live blocks and marks errors. Pure standard library (no `opentelemetry` dependency), injectable clock; complements trajectory evaluation (record here, score there).

## What's new (2026-06-19) — Compliance Control Report (SOC2 / ISO 27001)

Map governance evidence to named controls. Full reference: [`docs/source/Zh/doc/new_features/v37_features_doc.rst`](../docs/source/Zh/doc/new_features/v37_features_doc.rst).
7 changes: 7 additions & 0 deletions README/README_zh-TW.md
@@ -12,6 +12,7 @@

## Table of Contents

- [What's new (2026-06-19) — Agent Observability (GenAI OpenTelemetry Spans)](#本次更新-2026-06-19--agent-可觀測性genai-opentelemetry-spans)
- [What's new (2026-06-19) — Compliance Control Report (SOC2 / ISO 27001)](#本次更新-2026-06-19--合規控制報告soc2--iso-27001)
- [What's new (2026-06-19) — Agent Trajectory Evaluation](#本次更新-2026-06-19--agent-軌跡評估)
- [What's new (2026-06-19) — Approval Testing (Golden-Master Baselines)](#本次更新-2026-06-19--核准式測試golden-master-基準)
@@ -89,6 +90,12 @@

---

## What's new (2026-06-19) — Agent Observability (GenAI OpenTelemetry Spans)

OTel GenAI-convention spans for LLM runs. Full reference: [`docs/source/Zh/doc/new_features/v38_features_doc.rst`](../docs/source/Zh/doc/new_features/v38_features_doc.rst).

- **`AgentTrace`** (`AC_trace_record` / `AC_trace_summary` / `AC_trace_export` / `AC_trace_reset`, `ac_*`): records spans whose attributes follow the OpenTelemetry **GenAI semantic conventions** (`gen_ai.operation.name`, `gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.input_tokens`/`output_tokens`, `gen_ai.tool.name`) and the `"{operation} {model}"` span name. `to_otel()` can be fed into an OTLP exporter; `summary()` rolls up token cost and latency; the `operation()` context manager times live blocks and marks errors. Pure standard library (no `opentelemetry` dependency), injectable clock; complements trajectory evaluation (record here, score there).

## What's new (2026-06-19) — Compliance Control Report (SOC2 / ISO 27001)

Map governance evidence to named controls. Full reference: [`docs/source/Zh/doc/new_features/v37_features_doc.rst`](../docs/source/Zh/doc/new_features/v37_features_doc.rst).
60 changes: 60 additions & 0 deletions docs/source/Eng/doc/new_features/v38_features_doc.rst
@@ -0,0 +1,60 @@
Agent Observability (GenAI OpenTelemetry Spans)
===============================================

When automation drives an LLM agent, you want the same observability an
OpenTelemetry backend gives a service: per-operation spans carrying token usage,
model, and status. ``AgentTrace`` records spans whose attributes follow the
OpenTelemetry **GenAI semantic conventions** — ``gen_ai.operation.name``,
``gen_ai.system``, ``gen_ai.request.model``, ``gen_ai.usage.input_tokens`` /
``gen_ai.usage.output_tokens``, ``gen_ai.tool.name`` — and the convention span
name ``"{operation} {model}"`` (or just the operation when no model is given).
:meth:`AgentTrace.to_otel` returns OTLP-friendly span dicts for an exporter to
consume, while :meth:`AgentTrace.summary` rolls up token usage and latency for a
run.

It pairs with :doc:`trajectory evaluation <v36_features_doc>` — record the run
here, score it there. Pure standard library (no ``opentelemetry`` dependency);
the clock is injectable, so durations can be tested deterministically. The module
does not import ``PySide6``.
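
The injectable-clock idea can be illustrated in isolation. The sketch below is not the library's code: ``FakeClock`` and ``TimedRecorder`` are hypothetical names, and the recorder is a deliberately tiny stand-in that only shows why passing the clock in as a constructor argument makes durations exact in tests.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator, List


class FakeClock:
    """Deterministic stand-in for time.monotonic: advances only when told to."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


class TimedRecorder:
    """Hypothetical miniature of the pattern: the clock is a constructor argument."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self.durations: List[float] = []

    @contextmanager
    def timed(self) -> Iterator[None]:
        start = self._clock()
        try:
            yield
        finally:
            # Runs on success and on error, so every block gets a duration.
            self.durations.append(self._clock() - start)


clock = FakeClock()
recorder = TimedRecorder(clock=clock)
with recorder.timed():
    clock.advance(0.25)  # simulate 250 ms of work without sleeping

print(recorder.durations)
```

With the real class, the equivalent test would construct ``AgentTrace(clock=clock)`` and advance the fake clock inside ``trace.operation(...)``.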

Headless API
------------

.. code-block:: python

    from je_auto_control import AgentTrace

    trace = AgentTrace()
    # One-shot record of a completed call:
    trace.record("chat", model="claude-opus-4-8", system="anthropic",
                 input_tokens=1200, output_tokens=180, duration_s=0.9)

    # Or time a live block; set token counts on the yielded dict.
    # run_tool() stands in for your own tool call.
    with trace.operation("tool", tool_name="search") as fields:
        result = run_tool()
        fields["output_tokens"] = 42  # an exception raised in the block marks the span "error"

    print(trace.summary())            # {span_count, error_count, input_tokens, ...}
    # ``exporter`` is a placeholder for whatever sink consumes the span dicts.
    exporter.export(trace.to_otel())  # OTLP-friendly span dicts

``summary`` aggregates ``span_count``, ``error_count``, ``input_tokens``,
``output_tokens``, and total ``duration_s``. ``to_otel`` returns each span as
``{name, kind, attributes, duration_s, status:{code}}`` with an OTel status code.
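
The exported dicts carry a duration and string codes but no trace ID, span ID, or timestamps, so a collector that expects OTLP/JSON needs a small adapter. The sketch below is one possible shape, not part of the library: ``to_otlp_json_span`` and ``any_value`` are hypothetical helpers, the IDs and start time must be supplied by the caller, and the field names and enum numbers reflect my understanding of the OTLP/JSON encoding, which should be checked against the specification before use.

```python
from typing import Any, Dict

# Enum numbers as defined in the OTLP trace protobuf (SpanKind, StatusCode).
SPAN_KIND = {"INTERNAL": 1, "SERVER": 2, "CLIENT": 3, "PRODUCER": 4, "CONSUMER": 5}
STATUS_CODE = {"UNSET": 0, "OK": 1, "ERROR": 2}


def any_value(value: Any) -> Dict[str, Any]:
    """Encode a Python scalar as an OTLP AnyValue in its JSON form."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"boolValue": value}
    if isinstance(value, int):
        return {"intValue": str(value)}  # 64-bit integers travel as strings
    if isinstance(value, float):
        return {"doubleValue": value}
    return {"stringValue": str(value)}


def to_otlp_json_span(span: Dict[str, Any], trace_id: str, span_id: str,
                      start_unix_nano: int) -> Dict[str, Any]:
    """Hypothetical adapter from one to_otel()-shaped dict to an OTLP/JSON span."""
    end_unix_nano = start_unix_nano + int(round(span["duration_s"] * 1e9))
    return {
        "traceId": trace_id,   # 32 hex characters
        "spanId": span_id,     # 16 hex characters
        "name": span["name"],
        "kind": SPAN_KIND[span["kind"]],
        "startTimeUnixNano": str(start_unix_nano),
        "endTimeUnixNano": str(end_unix_nano),
        "attributes": [{"key": key, "value": any_value(val)}
                       for key, val in span["attributes"].items()],
        "status": {"code": STATUS_CODE[span["status"]["code"]]},
    }


# A dict in the shape documented for to_otel(), written out by hand.
exported = {
    "name": "chat claude-opus-4-8", "kind": "CLIENT",
    "attributes": {"gen_ai.operation.name": "chat",
                   "gen_ai.request.model": "claude-opus-4-8",
                   "gen_ai.usage.input_tokens": 1200},
    "duration_s": 0.9, "status": {"code": "OK"},
}
otlp_span = to_otlp_json_span(exported,
                              trace_id="0af7651916cd43dd8448eb211c80319c",
                              span_id="b7ad6b7169203331",
                              start_unix_nano=1_700_000_000_000_000_000)
```

Because ``AgentTrace`` stores only durations, the wall-clock start time has to come from the caller; capturing ``time.time_ns()`` when the operation begins is one option.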

Executor commands
-----------------

A module-level default trace backs the executor/MCP surfaces so a flow can build
a trace across steps:

================================ ===================================================
Command                          Effect
================================ ===================================================
``AC_trace_record``              Record a GenAI span (operation/model/tokens/…).
``AC_trace_summary``             Roll up the default trace.
``AC_trace_export``              Export the default trace as OTLP spans.
``AC_trace_reset``               Clear the default trace.
================================ ===================================================

The same operations are exposed as MCP tools (``ac_trace_record`` /
``ac_trace_summary`` / ``ac_trace_export`` / ``ac_trace_reset``) and as Script
Builder commands under **Agent**.
1 change: 1 addition & 0 deletions docs/source/Eng/eng_index.rst
@@ -60,6 +60,7 @@ Comprehensive guides for all AutoControl features.
doc/new_features/v35_features_doc
doc/new_features/v36_features_doc
doc/new_features/v37_features_doc
doc/new_features/v38_features_doc
doc/ocr_backends/ocr_backends_doc
doc/observability/observability_doc
doc/operations_layer/operations_layer_doc
56 changes: 56 additions & 0 deletions docs/source/Zh/doc/new_features/v38_features_doc.rst
@@ -0,0 +1,56 @@
Agent Observability (GenAI OpenTelemetry Spans)
===============================================

When automation drives an LLM agent, you want the kind of observability an
OpenTelemetry backend gives a service: one span per operation, carrying token
usage, model, and status. ``AgentTrace`` records spans whose attributes follow the
OpenTelemetry **GenAI semantic conventions** (``gen_ai.operation.name``,
``gen_ai.system``, ``gen_ai.request.model``, ``gen_ai.usage.input_tokens`` /
``gen_ai.usage.output_tokens``, ``gen_ai.tool.name``) and the convention span name
``"{operation} {model}"``. The output of :meth:`AgentTrace.to_otel` can be fed
directly into an OTLP exporter, while :meth:`AgentTrace.summary` rolls up the cost
and latency of a run.

It complements :doc:`trajectory evaluation <v36_features_doc>`: record the run
here, score it there. Pure standard library (no ``opentelemetry`` dependency); the
clock is injectable, so durations can be tested deterministically. It does not
import ``PySide6``.

Headless API
------------

.. code-block:: python

    from je_auto_control import AgentTrace

    trace = AgentTrace()
    # one-shot record of a completed call:
    trace.record("chat", model="claude-opus-4-8", system="anthropic",
                 input_tokens=1200, output_tokens=180, duration_s=0.9)

    # or time a live block; set token counts on the yielded dict:
    with trace.operation("tool", tool_name="search") as fields:
        result = run_tool()
        fields["output_tokens"] = 42  # an exception raised in the block marks the span as error

    print(trace.summary())            # {span_count, error_count, input_tokens, ...}
    exporter.export(trace.to_otel())  # OTLP-friendly span dicts

``summary`` aggregates ``span_count``, ``error_count``, ``input_tokens``,
``output_tokens``, and total ``duration_s``. ``to_otel`` returns each span as
``{name, kind, attributes, duration_s, status:{code}}`` with an OTel status code.

Executor commands
-----------------

A module-level default trace backs the executor/MCP surfaces so a flow can build
a single trace across steps:

================================ ===================================================
Command                          Effect
================================ ===================================================
``AC_trace_record``              Record a GenAI span (operation/model/tokens/…).
``AC_trace_summary``             Roll up the default trace.
``AC_trace_export``              Export the default trace as OTLP spans.
``AC_trace_reset``               Clear the default trace.
================================ ===================================================

The same operations are also available as MCP tools (``ac_trace_record`` /
``ac_trace_summary`` / ``ac_trace_export`` / ``ac_trace_reset``) and as commands
under the **Agent** category in Script Builder.
1 change: 1 addition & 0 deletions docs/source/Zh/zh_index.rst
@@ -60,6 +60,7 @@ Complete usage guide for all AutoControl features.
doc/new_features/v35_features_doc
doc/new_features/v36_features_doc
doc/new_features/v37_features_doc
doc/new_features/v38_features_doc
doc/ocr_backends/ocr_backends_doc
doc/observability/observability_doc
doc/operations_layer/operations_layer_doc
5 changes: 5 additions & 0 deletions je_auto_control/__init__.py
@@ -222,6 +222,10 @@
from je_auto_control.utils.compliance import (
    build_compliance_report, render_compliance_html, write_compliance_report,
)
# Agent observability: OpenTelemetry GenAI-convention spans
from je_auto_control.utils.agent_trace import (
    AgentTrace, default_trace, reset_trace,
)
# Background popup/interrupt watchdog (unattended automation)
from je_auto_control.utils.watchdog import (
    PopupWatchdog, WatchdogRule, default_popup_watchdog,
@@ -665,6 +669,7 @@ def start_autocontrol_gui(*args, **kwargs):
    "evaluate_trajectory",
    "build_compliance_report", "render_compliance_html",
    "write_compliance_report",
    "AgentTrace", "default_trace", "reset_trace",
    # MCP server
    "AuditLogger", "HttpMCPServer", "MCPContent", "MCPPrompt",
    "MCPPromptArgument", "MCPResource", "MCPServer", "MCPTool",
31 changes: 31 additions & 0 deletions je_auto_control/gui/script_builder/command_schema.py
@@ -824,6 +824,37 @@ def _add_misc_specs(specs: List[CommandSpec]) -> None:
        ),
        description="Map governance evidence to SOC2/ISO 27001 controls.",
    ))
    specs.append(CommandSpec(
        "AC_trace_record", "Agent", "Trace: Record Span",
        fields=(
            FieldSpec("operation", FieldType.STRING, placeholder="chat"),
            FieldSpec("model", FieldType.STRING, optional=True),
            FieldSpec("system", FieldType.STRING, optional=True),
            FieldSpec("input_tokens", FieldType.INT, optional=True),
            FieldSpec("output_tokens", FieldType.INT, optional=True),
            FieldSpec("tool_name", FieldType.STRING, optional=True),
            FieldSpec("duration_s", FieldType.FLOAT, optional=True,
                      default=0.0),
            FieldSpec("status", FieldType.ENUM, optional=True, default="ok",
                      choices=("ok", "error")),
        ),
        description="Record a GenAI-convention span on the default trace.",
    ))
    specs.append(CommandSpec(
        "AC_trace_summary", "Agent", "Trace: Summary",
        fields=(),
        description="Roll up the default agent trace (count/tokens/duration).",
    ))
    specs.append(CommandSpec(
        "AC_trace_export", "Agent", "Trace: Export (OTLP)",
        fields=(),
        description="Export the default agent trace as OTLP-friendly spans.",
    ))
    specs.append(CommandSpec(
        "AC_trace_reset", "Agent", "Trace: Reset",
        fields=(),
        description="Clear the default agent trace.",
    ))
    specs.append(CommandSpec(
        "AC_generate_sop", "Report", "Generate SOP Document",
        fields=(
6 changes: 6 additions & 0 deletions je_auto_control/utils/agent_trace/__init__.py
@@ -0,0 +1,6 @@
"""Agent observability: OpenTelemetry GenAI-convention spans for LLM runs."""
from je_auto_control.utils.agent_trace.agent_trace import (
    AgentTrace, default_trace, reset_trace,
)

__all__ = ["AgentTrace", "default_trace", "reset_trace"]
123 changes: 123 additions & 0 deletions je_auto_control/utils/agent_trace/agent_trace.py
@@ -0,0 +1,123 @@
"""Record agent/LLM activity as OpenTelemetry GenAI-convention spans.

When automation drives an LLM agent, you want the same observability an
OpenTelemetry backend gives a service: per-operation spans carrying token usage,
model, and status. ``AgentTrace`` records spans whose attributes follow the
OTel **GenAI semantic conventions** (``gen_ai.operation.name``,
``gen_ai.system``, ``gen_ai.request.model``, ``gen_ai.usage.input_tokens`` /
``output_tokens``, ``gen_ai.tool.name``) and the convention span name
``"{operation} {model}"`` — so :meth:`AgentTrace.to_otel` output drops straight
into an OTLP exporter, while :meth:`summary` rolls up cost/latency for a run.

It pairs with trajectory evaluation: record the run here, score it there. Pure
standard library (no ``opentelemetry`` dependency); the clock is injectable so
durations are deterministically testable. Imports no ``PySide6``.
"""
import time
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, List, Optional

STATUS_OK = "ok"
STATUS_ERROR = "error"


def _genai_attributes(operation: str, model: Optional[str],
                      system: Optional[str], input_tokens: Optional[int],
                      output_tokens: Optional[int], tool_name: Optional[str],
                      extra: Dict[str, Any]) -> Dict[str, Any]:
    attributes: Dict[str, Any] = {"gen_ai.operation.name": operation}
    if system is not None:
        attributes["gen_ai.system"] = system
    if model is not None:
        attributes["gen_ai.request.model"] = model
    if input_tokens is not None:
        attributes["gen_ai.usage.input_tokens"] = int(input_tokens)
    if output_tokens is not None:
        attributes["gen_ai.usage.output_tokens"] = int(output_tokens)
    if tool_name is not None:
        attributes["gen_ai.tool.name"] = tool_name
    attributes.update(extra)
    return attributes


class AgentTrace:
    """Collects GenAI-convention spans for one agent run."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        """``clock`` returns a monotonic time; injectable for tests."""
        self._clock = clock
        self._spans: List[Dict[str, Any]] = []

    def record(self, operation: str, *, model: Optional[str] = None,
               system: Optional[str] = None,
               input_tokens: Optional[int] = None,
               output_tokens: Optional[int] = None,
               tool_name: Optional[str] = None, duration_s: float = 0.0,
               status: str = STATUS_OK,
               attributes: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        """Record a completed span and return it."""
        attrs = _genai_attributes(operation, model, system, input_tokens,
                                  output_tokens, tool_name, attributes or {})
        name = f"{operation} {model}" if model else operation
        span = {"name": name, "attributes": attrs,
                "duration_s": float(duration_s), "status": status}
        self._spans.append(span)
        return span

    @contextmanager
    def operation(self, operation: str, **kwargs: Any
                  ) -> Iterator[Dict[str, Any]]:
        """Time a block as a span; yields a mutable ``fields`` dict.

        Set token counts etc. on the yielded dict (e.g.
        ``fields['output_tokens'] = 42``); a raised exception marks the span
        ``error`` and re-raises. Values set on ``fields`` override ``kwargs``.
        """
        fields: Dict[str, Any] = {}
        start = self._clock()
        try:
            yield fields
        except Exception:
            # Merge first: passing **kwargs and **fields separately raises
            # TypeError when the same key appears in both.
            self.record(operation, duration_s=self._clock() - start,
                        status=STATUS_ERROR, **{**kwargs, **fields})
            raise
        self.record(operation, duration_s=self._clock() - start,
                    status=STATUS_OK, **{**kwargs, **fields})

    def spans(self) -> List[Dict[str, Any]]:
        """Return a copy of the recorded spans."""
        return [dict(span) for span in self._spans]

    def summary(self) -> Dict[str, Any]:
        """Roll up span count, errors, token usage, and total duration."""
        def _tokens(key: str) -> int:
            return sum(int(s["attributes"].get(key, 0)) for s in self._spans)
        return {
            "span_count": len(self._spans),
            "error_count": sum(1 for s in self._spans
                               if s["status"] == STATUS_ERROR),
            "input_tokens": _tokens("gen_ai.usage.input_tokens"),
            "output_tokens": _tokens("gen_ai.usage.output_tokens"),
            "duration_s": sum(s["duration_s"] for s in self._spans),
        }

    def to_otel(self) -> List[Dict[str, Any]]:
        """Export spans in an OTLP-friendly shape with an OTel status code."""
        return [{
            "name": s["name"], "kind": "CLIENT", "attributes": s["attributes"],
            "duration_s": s["duration_s"],
            "status": {"code": "ERROR" if s["status"] == STATUS_ERROR
                       else "OK"},
        } for s in self._spans]

    def reset(self) -> None:
        """Drop all recorded spans."""
        self._spans.clear()


default_trace = AgentTrace()


def reset_trace() -> None:
    """Clear the module-level :data:`default_trace`."""
    default_trace.reset()