Integration-Automation · JE-Chen · Jun 22, 2026 · Jun 22, 2026
diff --git a/README.md b/README.md
@@ -13,6 +13,7 @@
 
 ## Table of Contents
 
+- [What's new (2026-06-22) — Confusable / Homoglyph Detection](#whats-new-2026-06-22--confusable--homoglyph-detection)
 - [What's new (2026-06-22) — Locale-Aware String Collation](#whats-new-2026-06-22--locale-aware-string-collation)
 - [What's new (2026-06-22) — Transactional Outbox](#whats-new-2026-06-22--transactional-outbox)
 - [What's new (2026-06-22) — Optimistic-Concurrency Versioned Store](#whats-new-2026-06-22--optimistic-concurrency-versioned-store)
@@ -161,6 +162,12 @@
 
 ---
 
+## What's new (2026-06-22) — Confusable / Homoglyph Detection
+
+Catch Unicode visual spoofing (IDN-homograph phishing, lookalike labels). Full reference: [`docs/source/Eng/doc/new_features/v109_features_doc.rst`](docs/source/Eng/doc/new_features/v109_features_doc.rst).
+
+- **`confusable_skeleton` / `is_confusable` / `detect_homoglyphs` / `is_mixed_script` / `scripts_of`** (`AC_confusable_scan`, `AC_confusable_compare`): a Cyrillic `"а"` is pixel-for-pixel a Latin `"a"`, so `"pаypal"` reads as `"paypal"` yet compares unequal. Following Unicode TR39, this folds confusables to a prototype skeleton (strings match when skeletons match) and flags mixed-script tokens. Pure-stdlib (`unicodedata`), deterministic.
+
 ## What's new (2026-06-22) — Locale-Aware String Collation
 
 Sort strings the way a reader of the language expects. Full reference: [`docs/source/Eng/doc/new_features/v108_features_doc.rst`](docs/source/Eng/doc/new_features/v108_features_doc.rst).

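The skeleton comparison described in the README bullet above can be sketched with the standard library alone. This is a minimal illustration, not the package's code: `LOOKALIKES` is a hypothetical five-entry table standing in for the patch's larger `_CONFUSABLES` map, and the two helpers only mirror the names of the public API.

```python
import unicodedata

# Hypothetical five-entry table standing in for the patch's larger
# _CONFUSABLES map (Cyrillic a, e, o, p and Greek omicron).
LOOKALIKES = {"\u0430": "a", "\u0435": "e", "\u043e": "o",
              "\u0440": "p", "\u03bf": "o"}


def skeleton(text: str) -> str:
    """Fold compatibility forms with NFKC, then map known lookalikes."""
    normalised = unicodedata.normalize("NFKC", text)
    return "".join(LOOKALIKES.get(char, char) for char in normalised)


def is_confusable(first: str, second: str) -> bool:
    """Distinct strings that share a skeleton look alike."""
    return first != second and skeleton(first) == skeleton(second)


spoof = "p\u0430ypal"  # renders as 'paypal' but holds Cyrillic U+0430
print(spoof == "paypal")                  # False
print(skeleton(spoof))                    # paypal
print(is_confusable(spoof, "paypal"))     # True
print(is_confusable("paypal", "paypal"))  # False: identical, not a spoof
```

Requiring the inputs to differ keeps an exact match from being reported as a spoof; only a lookalike that is not byte-identical counts.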
diff --git a/README/README_zh-CN.md b/README/README_zh-CN.md
@@ -12,6 +12,7 @@
 
 ## 目录
 
+- [本次更新 (2026-06-22) — 易混淆字符 / 同形异义字检测](#本次更新-2026-06-22--易混淆字符--同形异义字检测)
 - [本次更新 (2026-06-22) — 区域感知字符串排序](#本次更新-2026-06-22--区域感知字符串排序)
 - [本次更新 (2026-06-22) — 事务型 Outbox](#本次更新-2026-06-22--事务型-outbox)
 - [本次更新 (2026-06-22) — 乐观并发版本存储](#本次更新-2026-06-22--乐观并发版本存储)
@@ -164,6 +165,12 @@
 
 平滑噪声值序列。完整参考:[`docs/source/Zh/doc/new_features/v102_features_doc.rst`](../docs/source/Zh/doc/new_features/v102_features_doc.rst)。
 
+## 本次更新 (2026-06-22) — 易混淆字符 / 同形异义字检测
+
+抓出 Unicode 视觉仿冒(IDN 同形异义字钓鱼、仿冒标签)。完整参考:[`docs/source/Zh/doc/new_features/v109_features_doc.rst`](../docs/source/Zh/doc/new_features/v109_features_doc.rst)。
+
+- **`confusable_skeleton` / `is_confusable` / `detect_homoglyphs` / `is_mixed_script` / `scripts_of`**(`AC_confusable_scan`、`AC_confusable_compare`):西里尔字母 `"а"` 与拉丁字母 `"a"` 在像素上相同,因此 `"pаypal"` 读来是 `"paypal"` 却比较不相等。参照 Unicode TR39,本功能将易混淆字折叠为原型骨架(骨架相同即相符),并标记混用文字系统的令牌。纯标准库(`unicodedata`)、具确定性。
+
 ## 本次更新 (2026-06-22) — 区域感知字符串排序
 
 依某语言读者的期望排序字符串。完整参考:[`docs/source/Zh/doc/new_features/v108_features_doc.rst`](../docs/source/Zh/doc/new_features/v108_features_doc.rst)。

diff --git a/README/README_zh-TW.md b/README/README_zh-TW.md
@@ -12,6 +12,7 @@
 
 ## 目錄
 
+- [本次更新 (2026-06-22) — 易混淆字元 / 同形異義字偵測](#本次更新-2026-06-22--易混淆字元--同形異義字偵測)
 - [本次更新 (2026-06-22) — 地區感知字串排序](#本次更新-2026-06-22--地區感知字串排序)
 - [本次更新 (2026-06-22) — 交易型 Outbox](#本次更新-2026-06-22--交易型-outbox)
 - [本次更新 (2026-06-22) — 樂觀並行版本儲存](#本次更新-2026-06-22--樂觀並行版本儲存)
@@ -164,6 +165,12 @@
 
 平滑雜訊值序列。完整參考:[`docs/source/Zh/doc/new_features/v102_features_doc.rst`](../docs/source/Zh/doc/new_features/v102_features_doc.rst)。
 
+## 本次更新 (2026-06-22) — 易混淆字元 / 同形異義字偵測
+
+抓出 Unicode 視覺仿冒(IDN 同形異義字釣魚、仿冒標籤)。完整參考:[`docs/source/Zh/doc/new_features/v109_features_doc.rst`](../docs/source/Zh/doc/new_features/v109_features_doc.rst)。
+
+- **`confusable_skeleton` / `is_confusable` / `detect_homoglyphs` / `is_mixed_script` / `scripts_of`**(`AC_confusable_scan`、`AC_confusable_compare`):西里爾字母 `"а"` 與拉丁字母 `"a"` 在像素上相同,因此 `"pаypal"` 讀來是 `"paypal"` 卻比較不相等。參照 Unicode TR39,本功能將易混淆字折疊為原型骨架(骨架相同即相符),並標記混用文字系統的權杖。純標準函式庫(`unicodedata`)、具決定性。
+
 ## 本次更新 (2026-06-22) — 地區感知字串排序
 
 依某語言讀者的期望排序字串。完整參考:[`docs/source/Zh/doc/new_features/v108_features_doc.rst`](../docs/source/Zh/doc/new_features/v108_features_doc.rst)。

diff --git a/docs/source/Eng/doc/new_features/v109_features_doc.rst b/docs/source/Eng/doc/new_features/v109_features_doc.rst
@@ -0,0 +1,45 @@
+Confusable / Homoglyph Detection
+================================
+
+``secrets_scan`` finds secret-shaped tokens and ``guardrail`` screens text for
+prompt injection, but nothing catches *visual* spoofing: a Cyrillic ``"а"``
+(U+0430) is pixel-for-pixel a Latin ``"a"`` (U+0061), so ``"pаypal"`` (with a
+Cyrillic ``а``) reads as ``"paypal"`` to a human yet compares unequal — the basis
+of IDN-homograph phishing and lookalike UI labels.
+
+Following the idea of Unicode TR39, this folds confusable characters to a
+prototype *skeleton* (two strings are confusable when their skeletons match) and
+flags strings that mix scripts. Pure standard library (``unicodedata``); imports
+no ``PySide6``. Every function is pure, so it is fully deterministic in CI.
+
+Headless API
+------------
+
+.. code-block:: python
+
+    from je_auto_control import (
+        confusable_skeleton, is_confusable, detect_homoglyphs,
+        is_mixed_script, scripts_of,
+    )
+
+    confusable_skeleton("pаypal")          # 'paypal'  (Cyrillic а -> a)
+    is_confusable("pаypal", "paypal")      # True
+    detect_homoglyphs("pаypal")            # [{'index': 1, 'char': 'а', 'prototype': 'a'}]
+    is_mixed_script("pаypal")              # True  (Latin + Cyrillic)
+    scripts_of("pаypal")                   # {'LATIN', 'CYRILLIC'}
+
+``confusable_skeleton`` NFKC-normalises (folding fullwidth forms, ligatures and
+math alphanumerics), then maps each Cyrillic or Greek lookalike in its built-in
+table to its Latin prototype. ``is_confusable`` is true only for *distinct*
+strings with equal skeletons. ``detect_homoglyphs`` returns the offending
+characters with their position (an index into the NFKC-normalised text) and
+prototype. ``scripts_of`` / ``is_mixed_script`` classify characters by Unicode
+block (ignoring digits, punctuation and spaces), so a single mixed-script token can be flagged on its own.
+
+Executor commands
+-----------------
+
+``AC_confusable_scan`` returns ``{skeleton, homoglyphs, mixed_script, scripts}``
+for one string; ``AC_confusable_compare`` returns ``{confusable}`` for a pair.
+Both are exposed as MCP tools (``ac_confusable_scan`` / ``ac_confusable_compare``)
+and as Script Builder commands under **Data**.
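To see what the NFKC pass contributes before any lookalike mapping, the stdlib-only snippet below folds fullwidth letters, Mathematical Bold letters and the `ﬁ` ligature. It also illustrates a consequence of the patch's `detect_homoglyphs`, which enumerates the *normalised* text: a reported `index` can sit further right than the character's position in the raw input. The sample strings are illustrative assumptions, not test fixtures from the project.

```python
import unicodedata

# Illustrative inputs; each folds to plain ASCII under NFKC.
samples = {
    "fullwidth": "\uff50\uff41\uff59",              # 'pay' in fullwidth forms
    "math bold": "\U0001d429\U0001d41a\U0001d432",  # MATHEMATICAL BOLD SMALL p, a, y
    "ligature": "\ufb01le",                          # U+FB01 'fi' ligature + 'le'
}
for label, raw in samples.items():
    # !a prints escaped code points so the output stays ASCII-safe.
    print(f"{label:10} {raw!a} -> {unicodedata.normalize('NFKC', raw)!a}")

# NFKC can change string length, so positions shift: the ligature expands from
# one code point to two, pushing the Cyrillic U+0430 one place to the right.
text = "\ufb01\u0430"
folded = unicodedata.normalize("NFKC", text)
print(text.index("\u0430"), folded.index("\u0430"))  # 1 2
```

A caller that needs offsets into the original string would have to map positions back through the normalisation step itself; the sketch only shows that the two can differ.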
diff --git a/docs/source/Eng/eng_index.rst b/docs/source/Eng/eng_index.rst
@@ -131,6 +131,7 @@ Comprehensive guides for all AutoControl features.
    doc/new_features/v106_features_doc
    doc/new_features/v107_features_doc
    doc/new_features/v108_features_doc
+   doc/new_features/v109_features_doc
    doc/ocr_backends/ocr_backends_doc
    doc/observability/observability_doc
    doc/operations_layer/operations_layer_doc

diff --git a/docs/source/Zh/doc/new_features/v109_features_doc.rst b/docs/source/Zh/doc/new_features/v109_features_doc.rst
@@ -0,0 +1,38 @@
+易混淆字元 / 同形異義字偵測
+============================
+
+``secrets_scan`` 找出疑似機密的權杖、``guardrail`` 篩檢提示注入,但沒有任何功能能抓出\ *視覺*\ 仿冒:西里爾字母
+``"а"``\ (U+0430)與拉丁字母 ``"a"``\ (U+0061)在像素上完全相同,因此 ``"pаypal"``\ (其中 ``а`` 為西里爾字母)
+對人類讀來就是 ``"paypal"``,但比較卻不相等——這正是 IDN 同形異義字釣魚與仿冒 UI 標籤的根源。
+
+本功能參照 Unicode TR39 的概念,將易混淆字元折疊為原型\ *骨架*\ (skeleton)(兩字串骨架相同即為易混淆),並標記
+混用多種文字系統的字串。純標準函式庫(``unicodedata``);不匯入 ``PySide6``。每個函式皆為純函式,因此在 CI 中
+完全具決定性。
+
+無頭 API
+--------
+
+.. code-block:: python
+
+    from je_auto_control import (
+        confusable_skeleton, is_confusable, detect_homoglyphs,
+        is_mixed_script, scripts_of,
+    )
+
+    confusable_skeleton("pаypal")          # 'paypal'  (西里爾 а -> a)
+    is_confusable("pаypal", "paypal")      # True
+    detect_homoglyphs("pаypal")            # [{'index': 1, 'char': 'а', 'prototype': 'a'}]
+    is_mixed_script("pаypal")              # True  (拉丁 + 西里爾)
+    scripts_of("pаypal")                   # {'LATIN', 'CYRILLIC'}
+
+``confusable_skeleton`` 先以 NFKC 正規化(折疊全形、連字與數學英數字),再將每個剩餘的跨文字系統仿冒字對映到
+其拉丁原型。``is_confusable`` 僅在兩個\ *不同*\ 字串骨架相同時為真。``detect_homoglyphs`` 回傳有問題的字元連同其
+位置與原型。``scripts_of`` / ``is_mixed_script`` 依 Unicode 區塊將字元分類(忽略數字、標點與空白),因此可單獨
+標記一個混用文字系統的權杖。
+
+執行器命令
+----------
+
+``AC_confusable_scan`` 對單一字串回傳 ``{skeleton, homoglyphs, mixed_script, scripts}``;``AC_confusable_compare``
+對一組字串回傳 ``{confusable}``。兩者皆以 MCP 工具(``ac_confusable_scan`` / ``ac_confusable_compare``)以及
+Script Builder 中 **Data** 分類下的命令提供。
diff --git a/docs/source/Zh/zh_index.rst b/docs/source/Zh/zh_index.rst
@@ -131,6 +131,7 @@ AutoControl 所有功能的完整使用指南。
    doc/new_features/v106_features_doc
    doc/new_features/v107_features_doc
    doc/new_features/v108_features_doc
+   doc/new_features/v109_features_doc
    doc/ocr_backends/ocr_backends_doc
    doc/observability/observability_doc
    doc/operations_layer/operations_layer_doc

diff --git a/je_auto_control/__init__.py b/je_auto_control/__init__.py
@@ -218,6 +218,11 @@
     collation_key, sort_strings,
 )
 from je_auto_control.utils.locale_collation import compare as collation_compare
+# Confusable / homoglyph detection (Unicode-spoofing skeletons)
+from je_auto_control.utils.confusables import (
+    detect_homoglyphs, is_confusable, is_mixed_script, scripts_of,
+)
+from je_auto_control.utils.confusables import skeleton as confusable_skeleton
 # CI workflow annotations (GitHub Actions)
 from je_auto_control.utils.ci_annotations import (
     emit_annotations, format_annotation,
@@ -951,6 +956,11 @@ def start_autocontrol_gui(*args, **kwargs):
     "collation_key",
     "collation_compare",
     "sort_strings",
+    "confusable_skeleton",
+    "detect_homoglyphs",
+    "is_confusable",
+    "is_mixed_script",
+    "scripts_of",
     "emit_annotations", "format_annotation",
     "ClipboardHistory", "default_clipboard_history",
     "analyze_heal_log", "heal_stats", "scan_secrets",

diff --git a/je_auto_control/gui/script_builder/command_schema.py b/je_auto_control/gui/script_builder/command_schema.py
@@ -2090,6 +2090,21 @@ def _add_resilience_specs(specs: List[CommandSpec]) -> None:
         ),
         description="Locale-aware compare; returns order -1/0/1.",
     ))
+    specs.append(CommandSpec(
+        "AC_confusable_scan", "Data", "Text: Confusable Scan",
+        fields=(
+            FieldSpec("text", FieldType.STRING, placeholder="pаypal.com"),
+        ),
+        description="Homoglyph / mixed-script spoofing report for a string.",
+    ))
+    specs.append(CommandSpec(
+        "AC_confusable_compare", "Data", "Text: Confusable Compare",
+        fields=(
+            FieldSpec("first", FieldType.STRING, placeholder="paypal"),
+            FieldSpec("second", FieldType.STRING, placeholder="pаypal"),
+        ),
+        description="Whether two strings share the same confusable skeleton.",
+    ))
     specs.append(CommandSpec(
         "AC_diff_rows", "Data", "Dataset Diff: Rows by Key",
         fields=(

diff --git a/je_auto_control/utils/confusables/__init__.py b/je_auto_control/utils/confusables/__init__.py
@@ -0,0 +1,9 @@
+"""Confusable / homoglyph detection (Unicode-spoofing skeletons)."""
+from je_auto_control.utils.confusables.confusables import (
+    detect_homoglyphs, is_confusable, is_mixed_script, scripts_of, skeleton,
+)
+
+__all__ = [
+    "detect_homoglyphs", "is_confusable", "is_mixed_script", "scripts_of",
+    "skeleton",
+]
diff --git a/je_auto_control/utils/confusables/confusables.py b/je_auto_control/utils/confusables/confusables.py
@@ -0,0 +1,103 @@
+"""Confusable / homoglyph detection (Unicode-spoofing skeletons + mixed script).
+
+``secrets_scan`` finds secret-shaped tokens and ``guardrail`` screens text for
+prompt injection, but nothing catches *visual* spoofing: a Cyrillic ``"а"``
+(U+0430) is pixel-for-pixel a Latin ``"a"`` (U+0061), so ``"pаypal"`` (with a
+Cyrillic ``а``) reads as ``"paypal"`` to a human yet compares unequal — the basis
+of IDN-homograph phishing and lookalike UI labels.
+
+Following the idea of Unicode TR39, this folds confusable characters to a
+prototype *skeleton* (two strings are confusable when their skeletons match) and
+flags strings that mix scripts (Latin + Cyrillic). Pure standard library
+(``unicodedata``); imports no ``PySide6``. Every function is pure, so it is fully
+deterministic in CI.
+"""
+import unicodedata
+from typing import Dict, List, Set, Tuple
+
+# Cross-script homoglyphs that NFKC does not fold. Maps each lookalike to its
+# Latin/ASCII prototype. (Fullwidth, math-alphanumerics, etc. are handled by the
+# NFKC pass in ``skeleton`` and need no entry here.)
+_CONFUSABLES: Dict[str, str] = {
+    # Cyrillic lowercase
+    "а": "a", "е": "e", "о": "o", "р": "p", "с": "c", "у": "y", "х": "x",
+    "і": "i", "ј": "j", "ѕ": "s", "ԁ": "d", "һ": "h", "ѵ": "v", "ԛ": "q",
+    "ԝ": "w", "ё": "e", "г": "r", "п": "n",
+    # Cyrillic uppercase
+    "А": "A", "В": "B", "Е": "E", "К": "K", "М": "M", "Н": "H", "О": "O",
+    "Р": "P", "С": "C", "Т": "T", "Х": "X", "І": "I", "Ј": "J", "Ѕ": "S",
+    "У": "Y", "Ԛ": "Q", "Ԝ": "W", "Г": "r",
+    # Greek lowercase
+    "ο": "o", "α": "a", "ν": "v", "ρ": "p", "ε": "e", "ι": "i", "κ": "k",
+    "μ": "u", "τ": "t", "υ": "u", "χ": "x", "γ": "y",
+    # Greek uppercase
+    "Α": "A", "Β": "B", "Ε": "E", "Ζ": "Z", "Η": "H", "Ι": "I", "Κ": "K",
+    "Μ": "M", "Ν": "N", "Ο": "O", "Ρ": "P", "Τ": "T", "Υ": "Y", "Χ": "X",
+}
+
+# Script blocks for mixed-script detection. Ranges are inclusive; characters
+# outside every range (digits, punctuation, spaces, symbols) count as COMMON and
+# are ignored when deciding whether scripts are mixed.
+_SCRIPT_RANGES: Tuple[Tuple[int, int, str], ...] = (
+    (0x0041, 0x005A, "LATIN"), (0x0061, 0x007A, "LATIN"),
+    (0x00C0, 0x024F, "LATIN"), (0x1E00, 0x1EFF, "LATIN"),
+    (0x0370, 0x03FF, "GREEK"), (0x1F00, 0x1FFF, "GREEK"),
+    (0x0400, 0x052F, "CYRILLIC"),
+    (0x0530, 0x058F, "ARMENIAN"),
+    (0x0590, 0x05FF, "HEBREW"),
+    (0x0600, 0x06FF, "ARABIC"),
+    (0x3040, 0x309F, "HIRAGANA"), (0x30A0, 0x30FF, "KATAKANA"),
+    (0x3400, 0x9FFF, "HAN"), (0xAC00, 0xD7AF, "HANGUL"),
+)
+
+
+def _script_of(char: str) -> str:
+    """Return the script name of a character (``COMMON`` if not a letter)."""
+    code, letter = ord(char), unicodedata.category(char).startswith("L")
+    for start, end, name in _SCRIPT_RANGES:
+        if letter and start <= code <= end:
+            return name
+    return "COMMON"
+
+
+def skeleton(text: str) -> str:
+    """Return the confusable skeleton of ``text`` (TR39-style).
+
+    NFKC-normalises (folding fullwidth, ligatures, math alphanumerics), then maps
+    each homoglyph in ``_CONFUSABLES`` to its Latin prototype. Two distinct
+    strings are treated as confusable when their skeletons are equal.
+    """
+    normalised = unicodedata.normalize("NFKC", text or "")
+    return "".join(_CONFUSABLES.get(char, char) for char in normalised)
+
+
+def is_confusable(first: str, second: str) -> bool:
+    """Whether two *distinct* strings render to the same skeleton."""
+    return first != second and skeleton(first) == skeleton(second)
+
+
+def detect_homoglyphs(text: str) -> List[Dict[str, object]]:
+    """List the confusable characters in ``text``.
+
+    Each entry is ``{index, char, prototype}`` for a cross-script lookalike;
+    ``index`` is a position in the NFKC-normalised text, not in raw ``text``.
+    """
+    findings: List[Dict[str, object]] = []
+    for index, char in enumerate(unicodedata.normalize("NFKC", text or "")):
+        prototype = _CONFUSABLES.get(char)
+        if prototype is not None:
+            findings.append({"index": index, "char": char,
+                             "prototype": prototype})
+    return findings
+
+
+def scripts_of(text: str) -> Set[str]:
+    """Return the set of (non-common) scripts present in ``text``."""
+    scripts = {_script_of(char) for char in text or ""}
+    scripts.discard("COMMON")
+    return scripts
+
+
+def is_mixed_script(text: str) -> bool:
+    """Whether ``text`` mixes more than one script (a spoofing red flag)."""
+    return len(scripts_of(text)) > 1
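A sketch of the same inclusive-range lookup used by `_script_of`, with a letter test applied first so that symbols which happen to sit inside a Latin range (U+00D7 `×` and U+00F7 `÷` both fall within 0x00C0-0x024F) are treated as COMMON, as the `_script_of` docstring intends. `RANGES`, `script_of`, `scripts_of` and `mixed_script_tokens` are hypothetical names and the range table is a trimmed subset; the per-token helper shows how one mixed-script token can be singled out of a longer sentence.

```python
import unicodedata
from typing import List, Set, Tuple

# Hypothetical trimmed copy of the inclusive ranges used in the patch.
RANGES: Tuple[Tuple[int, int, str], ...] = (
    (0x0041, 0x005A, "LATIN"), (0x0061, 0x007A, "LATIN"),
    (0x00C0, 0x024F, "LATIN"), (0x0370, 0x03FF, "GREEK"),
    (0x0400, 0x052F, "CYRILLIC"),
)


def script_of(char: str) -> str:
    """Script name for a letter; every non-letter is COMMON."""
    # Category check first: U+00D7 and U+00F7 lie inside the Latin-1 range
    # above but are math symbols (category Sm), not letters.
    if not unicodedata.category(char).startswith("L"):
        return "COMMON"
    code = ord(char)
    for start, end, name in RANGES:
        if start <= code <= end:
            return name
    return "COMMON"


def scripts_of(text: str) -> Set[str]:
    """Scripts present in text, ignoring COMMON characters."""
    return {script_of(char) for char in text} - {"COMMON"}


def mixed_script_tokens(text: str) -> List[str]:
    """Whitespace-separated tokens that mix two or more scripts."""
    return [token for token in text.split() if len(scripts_of(token)) > 1]


print(scripts_of("2\u00d73"))  # set()
print(ascii(mixed_script_tokens("log in at p\u0430ypal.com today")))
```

Scanning per token rather than per string keeps an otherwise legitimate bilingual sentence from being flagged while still catching the single spoofed word.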
diff --git a/je_auto_control/utils/executor/action_executor.py b/je_auto_control/utils/executor/action_executor.py
@@ -2976,6 +2976,23 @@ def _collation_compare(first: str, second: str, strength: str = "tertiary",
                              tailoring=tailoring or None)}
 
 
+def _confusable_scan(text: str) -> Dict[str, Any]:
+    """Adapter: homoglyph / mixed-script spoofing report for a string."""
+    from je_auto_control.utils.confusables import (
+        detect_homoglyphs, is_mixed_script, scripts_of, skeleton,
+    )
+    return {"skeleton": skeleton(text),
+            "homoglyphs": detect_homoglyphs(text),
+            "mixed_script": is_mixed_script(text),
+            "scripts": sorted(scripts_of(text))}
+
+
+def _confusable_compare(first: str, second: str) -> Dict[str, Any]:
+    """Adapter: whether two strings render to the same skeleton."""
+    from je_auto_control.utils.confusables import is_confusable
+    return {"confusable": is_confusable(first, second)}
+
+
 def _cas_put(name: str, key: str, value: Any,
              expected_version: Any = None) -> Dict[str, Any]:
     """Adapter: optimistic put into a named versioned store."""
@@ -4660,6 +4677,8 @@ def __init__(self):
             "AC_outbox_pending": _outbox_pending,
             "AC_collation_sort": _collation_sort,
             "AC_collation_compare": _collation_compare,
+            "AC_confusable_scan": _confusable_scan,
+            "AC_confusable_compare": _confusable_compare,
             "AC_detect_drift": _detect_drift,
             "AC_categorical_drift": _categorical_drift,
             "AC_diff_rows": _diff_rows,

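`_confusable_scan` returns `sorted(scripts_of(text))` rather than the raw set. A plausible reason, sketched below under that assumption: `json` cannot encode a `set`, and the iteration order of a set of strings can change between interpreter runs because string hashing is randomised, so a sorted list keeps executor and MCP output both serialisable and stable. `scan_payload` is a hypothetical helper that only mimics the adapter's dict shape.

```python
import json
from typing import Any, Dict, List, Set


def scan_payload(skeleton: str, homoglyphs: List[Dict[str, Any]],
                 scripts: Set[str]) -> str:
    """Serialise a report shaped like the AC_confusable_scan adapter's dict."""
    report = {
        "skeleton": skeleton,
        "homoglyphs": homoglyphs,
        "mixed_script": len(scripts) > 1,
        # A set is not JSON-serialisable and its string order can vary between
        # runs, so emit a sorted list for stable output.
        "scripts": sorted(scripts),
    }
    return json.dumps(report, ensure_ascii=False, sort_keys=True)


finding = [{"index": 1, "char": "\u0430", "prototype": "a"}]
first = scan_payload("paypal", finding, {"LATIN", "CYRILLIC"})
second = scan_payload("paypal", finding, {"CYRILLIC", "LATIN"})
print(first == second)  # True

try:
    json.dumps({"scripts": {"LATIN", "CYRILLIC"}})
except TypeError as exc:  # the raw set cannot be encoded
    print("raw set rejected:", exc)
```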
diff --git a/je_auto_control/utils/mcp_server/tools/_factories.py b/je_auto_control/utils/mcp_server/tools/_factories.py
@@ -3630,6 +3630,29 @@ def locale_collation_tools() -> List[MCPTool]:
     ]
 
 
+def confusables_tools() -> List[MCPTool]:
+    return [
+        MCPTool(
+            name="ac_confusable_scan",
+            description=("Homoglyph / mixed-script spoofing report for 'text'. "
+                         "Returns {skeleton, homoglyphs, mixed_script, scripts}."),
+            input_schema=schema({"text": {"type": "string"}}, ["text"]),
+            handler=h.confusable_scan,
+            annotations=READ_ONLY,
+        ),
+        MCPTool(
+            name="ac_confusable_compare",
+            description=("Whether 'first' and 'second' render to the same "
+                         "confusable skeleton. Returns {confusable}."),
+            input_schema=schema(
+                {"first": {"type": "string"}, "second": {"type": "string"}},
+                ["first", "second"]),
+            handler=h.confusable_compare,
+            annotations=READ_ONLY,
+        ),
+    ]
+
+
 def sequence_gap_tools() -> List[MCPTool]:
     return [
         MCPTool(
@@ -5670,7 +5693,7 @@ def media_assert_tools() -> List[MCPTool]:
     sse_client_tools, layered_config_tools, data_drift_tools, schema_compat_tools,
     timeseries_tools, anomaly_tools, smoothing_tools, idempotency_tools,
     dedup_window_tools, sequence_gap_tools, optimistic_tools, outbox_tools,
-    locale_collation_tools,
+    locale_collation_tools, confusables_tools,
     dataset_diff_tools, referential_tools, link_header_tools, multipart_tools,
     http_content_tools, cookie_jar_tools, http_conditional_tools,
     saga_tools, decision_table_tools, locator_repair_tools,