Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@

## Table of Contents

- [What's new (2026-06-20) — Fuzzy String Matching & Dedupe](#whats-new-2026-06-20--fuzzy-string-matching--dedupe)
- [What's new (2026-06-19) — Video Step-Overlay Report](#whats-new-2026-06-19--video-step-overlay-report)
- [What's new (2026-06-19) — Agent Observability (GenAI OpenTelemetry Spans)](#whats-new-2026-06-19--agent-observability-genai-opentelemetry-spans)
- [What's new (2026-06-19) — Compliance Control Report (SOC2 / ISO 27001)](#whats-new-2026-06-19--compliance-control-report-soc2--iso-27001)
Expand Down Expand Up @@ -92,6 +93,12 @@

---

## What's new (2026-06-20) — Fuzzy String Matching & Dedupe

Match noisy OCR/UI text robustly. Full reference: [`docs/source/Eng/doc/new_features/v40_features_doc.rst`](docs/source/Eng/doc/new_features/v40_features_doc.rst).

- **`fuzzy_ratio` / `fuzzy_best_match` / `fuzzy_matches` / `fuzzy_dedupe`** (`AC_fuzzy_ratio` / `AC_fuzzy_best_match` / `AC_fuzzy_dedupe`, `ac_*`): score similarity (0..1), pick the closest candidate from a list, or collapse near-duplicates — so a flow can act on "the button that *looks like* Submit" rather than an exact label. The default backend is stdlib `difflib` (**zero extra deps**); the optional `[fuzzy]` extra adds `rapidfuzz` for speed, with scores normalised either way. `ignore_case` and `score_cutoff` supported.

## What's new (2026-06-19) — Video Step-Overlay Report

Caption screenshots into a walkthrough video. Full reference: [`docs/source/Eng/doc/new_features/v39_features_doc.rst`](docs/source/Eng/doc/new_features/v39_features_doc.rst).
Expand Down
7 changes: 7 additions & 0 deletions README/README_zh-CN.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@

## 目录

- [本次更新 (2026-06-20) — 模糊字符串匹配与去重](#本次更新-2026-06-20--模糊字符串匹配与去重)
- [本次更新 (2026-06-19) — 视频步骤叠加报告](#本次更新-2026-06-19--视频步骤叠加报告)
- [本次更新 (2026-06-19) — Agent 可观测性(GenAI OpenTelemetry Spans)](#本次更新-2026-06-19--agent-可观测性genai-opentelemetry-spans)
- [本次更新 (2026-06-19) — 合规控制报告(SOC2 / ISO 27001)](#本次更新-2026-06-19--合规控制报告soc2--iso-27001)
Expand Down Expand Up @@ -91,6 +92,12 @@

---

## 本次更新 (2026-06-20) — 模糊字符串匹配与去重

稳健匹配含噪声的 OCR/UI 文本。完整参考:[`docs/source/Zh/doc/new_features/v40_features_doc.rst`](../docs/source/Zh/doc/new_features/v40_features_doc.rst)。

- **`fuzzy_ratio` / `fuzzy_best_match` / `fuzzy_matches` / `fuzzy_dedupe`**(`AC_fuzzy_ratio` / `AC_fuzzy_best_match` / `AC_fuzzy_dedupe`、`ac_*`):为相似度评分(0..1)、从列表挑最接近的候选,或收合近似重复 —— 让流程可针对「*看起来像* Submit 的按钮」动作,而非精确标签。默认后端为标准库 `difflib`(**无额外依赖**);可选的 `[fuzzy]` extra 加入 `rapidfuzz` 以加速,两者分数皆归一化。支持 `ignore_case` 与 `score_cutoff`。

## 本次更新 (2026-06-19) — 视频步骤叠加报告

将屏幕截图加上字幕制成走查视频。完整参考:[`docs/source/Zh/doc/new_features/v39_features_doc.rst`](../docs/source/Zh/doc/new_features/v39_features_doc.rst)。
Expand Down
7 changes: 7 additions & 0 deletions README/README_zh-TW.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@

## 目錄

- [本次更新 (2026-06-20) — 模糊字串比對與去重](#本次更新-2026-06-20--模糊字串比對與去重)
- [本次更新 (2026-06-19) — 影片步驟疊加報告](#本次更新-2026-06-19--影片步驟疊加報告)
- [本次更新 (2026-06-19) — Agent 可觀測性(GenAI OpenTelemetry Spans)](#本次更新-2026-06-19--agent-可觀測性genai-opentelemetry-spans)
- [本次更新 (2026-06-19) — 合規控制報告(SOC2 / ISO 27001)](#本次更新-2026-06-19--合規控制報告soc2--iso-27001)
Expand Down Expand Up @@ -91,6 +92,12 @@

---

## 本次更新 (2026-06-20) — 模糊字串比對與去重

穩健比對含雜訊的 OCR/UI 文字。完整參考:[`docs/source/Zh/doc/new_features/v40_features_doc.rst`](../docs/source/Zh/doc/new_features/v40_features_doc.rst)。

- **`fuzzy_ratio` / `fuzzy_best_match` / `fuzzy_matches` / `fuzzy_dedupe`**(`AC_fuzzy_ratio` / `AC_fuzzy_best_match` / `AC_fuzzy_dedupe`、`ac_*`):為相似度評分(0..1)、從清單挑最接近的候選,或收合近似重複 —— 讓流程可針對「*看起來像* Submit 的按鈕」動作,而非精確標籤。預設後端為標準函式庫 `difflib`(**無額外相依**);選用的 `[fuzzy]` extra 加入 `rapidfuzz` 以加速,兩者分數皆正規化。支援 `ignore_case` 與 `score_cutoff`。

## 本次更新 (2026-06-19) — 影片步驟疊加報告

將螢幕截圖加上字幕製成走查影片。完整參考:[`docs/source/Zh/doc/new_features/v39_features_doc.rst`](../docs/source/Zh/doc/new_features/v39_features_doc.rst)。
Expand Down
51 changes: 51 additions & 0 deletions docs/source/Eng/doc/new_features/v40_features_doc.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
Fuzzy String Matching & Dedupe
==============================

Exact string comparison is brittle when text comes from OCR or shifting UI copy.
These helpers score similarity, pick the best candidate from a list, and collapse
near-duplicates — so a flow can act on "the button that *looks like* Submit"
rather than an exact label.

The default backend is the standard library :mod:`difflib`, so the feature works
with **zero extra dependencies**. If the optional ``rapidfuzz`` package is
installed (``pip install je_auto_control[fuzzy]``) it is used instead for speed;
scores are normalised to ``0.0..1.0`` either way, so callers never depend on
which backend ran. ``BACKEND`` names the active one. Imports no ``PySide6``.

Headless API
------------

.. code-block:: python

from je_auto_control import (
fuzzy_ratio, fuzzy_best_match, fuzzy_matches, fuzzy_dedupe)

fuzzy_ratio("Sumbit", "Submit") # ~0.83 (case-insensitive default)

fuzzy_best_match("Sve", ["Cancel", "Save", "Submit"])
# -> ("Save", 0.86, 1) (choice, score, index) — or None below score_cutoff

fuzzy_matches("login", ["login", "logon", "logout"], limit=2)
# -> [("login", 1.0, 0), ("logon", 0.8, 1)] sorted best-first

fuzzy_dedupe(["Invoice", "invoice ", "Receipt"], threshold=0.85)
# -> ["Invoice", "Receipt"] near-duplicates collapse, first kept

All functions take ``ignore_case`` (default ``True``); ``fuzzy_best_match`` /
``fuzzy_matches`` take ``score_cutoff`` to drop weak candidates.

Executor commands
-----------------

================================ ===================================================
Command Effect
================================ ===================================================
``AC_fuzzy_ratio`` ``{score}`` similarity between two strings.
``AC_fuzzy_best_match`` ``{match, score, index}`` (or null) from choices.
``AC_fuzzy_dedupe`` ``{unique}`` with near-duplicates collapsed.
================================ ===================================================

``choices`` / ``items`` accept a list or a JSON-string list (so the visual
builder works). The same operations are exposed as MCP tools (``ac_fuzzy_ratio``
/ ``ac_fuzzy_best_match`` / ``ac_fuzzy_dedupe``) and as Script Builder commands
under **Data**.
1 change: 1 addition & 0 deletions docs/source/Eng/eng_index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,7 @@ Comprehensive guides for all AutoControl features.
doc/new_features/v37_features_doc
doc/new_features/v38_features_doc
doc/new_features/v39_features_doc
doc/new_features/v40_features_doc
doc/ocr_backends/ocr_backends_doc
doc/observability/observability_doc
doc/operations_layer/operations_layer_doc
Expand Down
48 changes: 48 additions & 0 deletions docs/source/Zh/doc/new_features/v40_features_doc.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
模糊字串比對與去重
==================

當文字來自 OCR 或時常變動的 UI 文案時,精確字串比對很脆弱。這些輔助函式為相似度評分、
從清單中挑出最佳候選,並收合近似重複項 —— 讓流程可以針對「*看起來像* Submit 的按鈕」
動作,而非精確標籤。

預設後端為標準函式庫 :mod:`difflib`,因此本功能**無需任何額外相依**即可運作。若安裝了
選用的 ``rapidfuzz`` 套件(``pip install je_auto_control[fuzzy]``)則改用其以加速;無論
何者,分數皆正規化為 ``0.0..1.0``,故呼叫端永不依賴實際執行的後端。``BACKEND`` 標示目
前作用中的後端。不匯入 ``PySide6``。

無頭 API
--------

.. code-block:: python

from je_auto_control import (
fuzzy_ratio, fuzzy_best_match, fuzzy_matches, fuzzy_dedupe)

fuzzy_ratio("Sumbit", "Submit") # ~0.83(預設不分大小寫)

fuzzy_best_match("Sve", ["Cancel", "Save", "Submit"])
# -> ("Save", 0.86, 1) (choice, score, index) —— 低於 score_cutoff 則為 None

fuzzy_matches("login", ["login", "logon", "logout"], limit=2)
# -> [("login", 1.0, 0), ("logon", 0.8, 1)] 由高分至低分排序

fuzzy_dedupe(["Invoice", "invoice ", "Receipt"], threshold=0.85)
# -> ["Invoice", "Receipt"] 近似重複收合,保留第一個

所有函式皆接受 ``ignore_case``(預設 ``True``);``fuzzy_best_match`` /
``fuzzy_matches`` 接受 ``score_cutoff`` 以濾除弱候選。

執行器指令
----------

================================ ===================================================
指令 效果
================================ ===================================================
``AC_fuzzy_ratio`` 兩字串相似度的 ``{score}``。
``AC_fuzzy_best_match`` 從候選中取 ``{match, score, index}``(或 null)。
``AC_fuzzy_dedupe`` 收合近似重複後的 ``{unique}``。
================================ ===================================================

``choices`` / ``items`` 接受清單或 JSON 字串清單(因此視覺化建構器可用)。相同操作亦提供
為 MCP 工具(``ac_fuzzy_ratio`` / ``ac_fuzzy_best_match`` / ``ac_fuzzy_dedupe``),以及
Script Builder 中 **Data** 分類下的指令。
1 change: 1 addition & 0 deletions docs/source/Zh/zh_index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,7 @@ AutoControl 所有功能的完整使用指南。
doc/new_features/v37_features_doc
doc/new_features/v38_features_doc
doc/new_features/v39_features_doc
doc/new_features/v40_features_doc
doc/ocr_backends/ocr_backends_doc
doc/observability/observability_doc
doc/operations_layer/operations_layer_doc
Expand Down
5 changes: 5 additions & 0 deletions je_auto_control/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -230,6 +230,10 @@
from je_auto_control.utils.video_report import (
VideoStep, build_overlay_plan, render_overlay_frame, write_step_video,
)
# Fuzzy string matching / dedupe (difflib default, optional rapidfuzz)
from je_auto_control.utils.fuzzy import (
fuzzy_best_match, fuzzy_dedupe, fuzzy_matches, fuzzy_ratio,
)
# Background popup/interrupt watchdog (unattended automation)
from je_auto_control.utils.watchdog import (
PopupWatchdog, WatchdogRule, default_popup_watchdog,
Expand Down Expand Up @@ -676,6 +680,7 @@ def start_autocontrol_gui(*args, **kwargs):
"AgentTrace", "default_trace", "reset_trace",
"VideoStep", "build_overlay_plan", "render_overlay_frame",
"write_step_video",
"fuzzy_best_match", "fuzzy_dedupe", "fuzzy_matches", "fuzzy_ratio",
# MCP server
"AuditLogger", "HttpMCPServer", "MCPContent", "MCPPrompt",
"MCPPromptArgument", "MCPResource", "MCPServer", "MCPTool",
Expand Down
35 changes: 35 additions & 0 deletions je_auto_control/gui/script_builder/command_schema.py
Original file line number Diff line number Diff line change
Expand Up @@ -867,6 +867,41 @@ def _add_misc_specs(specs: List[CommandSpec]) -> None:
),
description="Render captioned screenshots into a walkthrough video.",
))
specs.append(CommandSpec(
"AC_fuzzy_ratio", "Data", "Fuzzy: Similarity Ratio",
fields=(
FieldSpec("left", FieldType.STRING),
FieldSpec("right", FieldType.STRING),
FieldSpec("ignore_case", FieldType.BOOL, optional=True,
default=True),
),
description="Similarity score (0..1) between two strings.",
))
specs.append(CommandSpec(
"AC_fuzzy_best_match", "Data", "Fuzzy: Best Match",
fields=(
FieldSpec("query", FieldType.STRING),
FieldSpec("choices", FieldType.STRING,
placeholder='["Save", "Cancel", "Submit"]'),
FieldSpec("score_cutoff", FieldType.FLOAT, optional=True,
default=0.0),
FieldSpec("ignore_case", FieldType.BOOL, optional=True,
default=True),
),
description="Best fuzzy match of query within choices (JSON list).",
))
specs.append(CommandSpec(
"AC_fuzzy_dedupe", "Data", "Fuzzy: Dedupe",
fields=(
FieldSpec("items", FieldType.STRING,
placeholder='["foo", "foo ", "bar"]'),
FieldSpec("threshold", FieldType.FLOAT, optional=True,
default=0.9),
FieldSpec("ignore_case", FieldType.BOOL, optional=True,
default=True),
),
description="Collapse near-duplicate strings (JSON list).",
))
specs.append(CommandSpec(
"AC_generate_sop", "Report", "Generate SOP Document",
fields=(
Expand Down
34 changes: 34 additions & 0 deletions je_auto_control/utils/executor/action_executor.py
Original file line number Diff line number Diff line change
Expand Up @@ -3067,6 +3067,37 @@ def _write_step_video(steps: Any, output: str, fps: int = 10,
seconds_per_step=seconds_per_step)


def _coerce_list(value: Any) -> List[Any]:
import json
return json.loads(value) if isinstance(value, str) else list(value)


def _fuzzy_ratio(left: Any, right: Any,
ignore_case: bool = True) -> Dict[str, Any]:
"""Adapter: similarity score (0..1) between two values."""
from je_auto_control.utils.fuzzy import fuzzy_ratio
return {"score": fuzzy_ratio(left, right, ignore_case=ignore_case)}


def _fuzzy_best_match(query: Any, choices: Any, score_cutoff: float = 0.0,
ignore_case: bool = True) -> Dict[str, Any]:
"""Adapter: best fuzzy match from choices, or a null match."""
from je_auto_control.utils.fuzzy import fuzzy_best_match
best = fuzzy_best_match(query, _coerce_list(choices),
score_cutoff=score_cutoff, ignore_case=ignore_case)
if best is None:
return {"match": None, "score": 0.0, "index": -1}
return {"match": best[0], "score": best[1], "index": best[2]}


def _fuzzy_dedupe(items: Any, threshold: float = 0.9,
ignore_case: bool = True) -> Dict[str, Any]:
"""Adapter: drop near-duplicate items, keeping the first of each cluster."""
from je_auto_control.utils.fuzzy import fuzzy_dedupe
return {"unique": fuzzy_dedupe(_coerce_list(items), threshold=threshold,
ignore_case=ignore_case)}


class Executor:
"""
Executor
Expand Down Expand Up @@ -3319,6 +3350,9 @@ def __init__(self):
"AC_trace_export": _trace_export,
"AC_trace_reset": _trace_reset,
"AC_write_step_video": _write_step_video,
"AC_fuzzy_ratio": _fuzzy_ratio,
"AC_fuzzy_best_match": _fuzzy_best_match,
"AC_fuzzy_dedupe": _fuzzy_dedupe,
"AC_a11y_record_start": _a11y_record_start,
"AC_a11y_record_stop": _a11y_record_stop,
"AC_a11y_record_events": _a11y_record_events,
Expand Down
9 changes: 9 additions & 0 deletions je_auto_control/utils/fuzzy/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
"""Fuzzy string matching and dedupe (difflib by default, rapidfuzz if present)."""
from je_auto_control.utils.fuzzy.fuzzy_match import (
BACKEND, fuzzy_best_match, fuzzy_dedupe, fuzzy_matches, fuzzy_ratio,
)

__all__ = [
"BACKEND", "fuzzy_best_match", "fuzzy_dedupe", "fuzzy_matches",
"fuzzy_ratio",
]
85 changes: 85 additions & 0 deletions je_auto_control/utils/fuzzy/fuzzy_match.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,85 @@
"""Fuzzy string matching for noisy automation text (OCR labels, table cells).

Exact string comparison is brittle when text comes from OCR or shifting UI
copy. These helpers score similarity, pick the best candidate from a list, and
collapse near-duplicates. The default backend is the standard library
``difflib`` (so the feature works with **zero** extra dependencies); if the
optional ``rapidfuzz`` package is installed it is used instead for speed — the
scores are normalised to ``0.0..1.0`` either way, so callers don't care which
backend ran. :data:`BACKEND` names the active one.

Pure Python; imports no ``PySide6``.
"""
from typing import Any, List, Optional, Sequence, Tuple

try: # optional acceleration; the difflib fallback is always correct
from rapidfuzz import fuzz as _rf

BACKEND = "rapidfuzz"

def _similarity(left: str, right: str) -> float:
return _rf.ratio(left, right) / 100.0
except ImportError: # pragma: no cover - exercised wherever rapidfuzz is absent
from difflib import SequenceMatcher

BACKEND = "difflib"

def _similarity(left: str, right: str) -> float:
return SequenceMatcher(None, left, right).ratio()


def _prepare(value: Any, ignore_case: bool) -> str:
text = str(value)
return text.lower() if ignore_case else text


def fuzzy_ratio(left: Any, right: Any, *, ignore_case: bool = True) -> float:
"""Return a similarity score in ``0.0..1.0`` for two values."""
return _similarity(_prepare(left, ignore_case),
_prepare(right, ignore_case))


def fuzzy_matches(query: Any, choices: Sequence[Any], *, limit: int = 5,
score_cutoff: float = 0.0, ignore_case: bool = True
) -> List[Tuple[Any, float, int]]:
"""Return up to ``limit`` ``(choice, score, index)`` tuples, best first.

Only choices scoring at least ``score_cutoff`` are returned.
"""
prepared_query = _prepare(query, ignore_case)
scored = [
(choice, _similarity(prepared_query, _prepare(choice, ignore_case)),
index)
for index, choice in enumerate(choices)
]
scored = [item for item in scored if item[1] >= score_cutoff]
scored.sort(key=lambda item: item[1], reverse=True)
return scored[:limit] if limit >= 0 else scored


def fuzzy_best_match(query: Any, choices: Sequence[Any], *,
score_cutoff: float = 0.0, ignore_case: bool = True
) -> Optional[Tuple[Any, float, int]]:
"""Return the single best ``(choice, score, index)`` or ``None``."""
ranked = fuzzy_matches(query, choices, limit=1, score_cutoff=score_cutoff,
ignore_case=ignore_case)
return ranked[0] if ranked else None


def fuzzy_dedupe(items: Sequence[Any], *, threshold: float = 0.9,
ignore_case: bool = True) -> List[Any]:
"""Collapse near-duplicate items, keeping the first of each cluster.

An item is dropped when it scores at least ``threshold`` against an item
already kept.
"""
kept: List[Any] = []
kept_prepared: List[str] = []
for item in items:
prepared = _prepare(item, ignore_case)
if any(_similarity(prepared, seen) >= threshold
for seen in kept_prepared):
continue
kept.append(item)
kept_prepared.append(prepared)
return kept
Loading
Loading