Integration-Automation · JE-Chen · Jun 21, 2026 · Jun 21, 2026
diff --git a/README.md b/README.md
@@ -13,6 +13,7 @@
 
 ## Table of Contents
 
+- [What's new (2026-06-22) — Unicode Text Normalisation & Slugify](#whats-new-2026-06-22--unicode-text-normalisation--slugify)
 - [What's new (2026-06-22) — JSON-Schema Compatibility Checking](#whats-new-2026-06-22--json-schema-compatibility-checking)
 - [What's new (2026-06-22) — Typed Configuration Schema](#whats-new-2026-06-22--typed-configuration-schema)
 - [What's new (2026-06-22) — OTLP/JSON Span Export](#whats-new-2026-06-22--otlpjson-span-export)
@@ -149,6 +150,12 @@
 
 ---
 
+## What's new (2026-06-22) — Unicode Text Normalisation & Slugify
+
+Canonicalize text before fuzzy/search/OCR matching. Full reference: [`docs/source/Eng/doc/new_features/v97_features_doc.rst`](docs/source/Eng/doc/new_features/v97_features_doc.rst).
+
+- **`normalize_text` / `deaccent` / `slugify` / `normalize_quotes` / `fold_whitespace`** (`AC_normalize_text`, `AC_slugify`): `fuzzy` and `search_index.tokenize` only lowercase and OCR matching only `.lower()`+substring, so `"Café"` (NFC) vs `"Café"` (NFD) vs `"cafe"` compare unequal. This adds the missing canonicalization layer (NFKC + casefold + whitespace fold, accent stripping, smart-quote mapping, ASCII slugs). Pure-stdlib (`unicodedata`), deterministic.
+
 ## What's new (2026-06-22) — JSON-Schema Compatibility Checking
 
 Classify schema changes as backward/forward/full. Full reference: [`docs/source/Eng/doc/new_features/v96_features_doc.rst`](docs/source/Eng/doc/new_features/v96_features_doc.rst).

diff --git a/README/README_zh-CN.md b/README/README_zh-CN.md
@@ -12,6 +12,7 @@
 
 ## 目录
 
+- [本次更新 (2026-06-22) — Unicode 文本规范化与 Slug](#本次更新-2026-06-22--unicode-文本规范化与-slug)
 - [本次更新 (2026-06-22) — JSON-Schema 兼容性检查](#本次更新-2026-06-22--json-schema-兼容性检查)
 - [本次更新 (2026-06-22) — 具类型的配置结构](#本次更新-2026-06-22--具类型的配置结构)
 - [本次更新 (2026-06-22) — OTLP/JSON Span 导出](#本次更新-2026-06-22--otlpjson-span-导出)
@@ -148,6 +149,12 @@
 
 ---
 
+## 本次更新 (2026-06-22) — Unicode 文本规范化与 Slug
+
+在 fuzzy/search/OCR 匹配前规范化文本。完整参考:[`docs/source/Zh/doc/new_features/v97_features_doc.rst`](../docs/source/Zh/doc/new_features/v97_features_doc.rst)。
+
+- **`normalize_text` / `deaccent` / `slugify` / `normalize_quotes` / `fold_whitespace`**(`AC_normalize_text`、`AC_slugify`):`fuzzy` 与 `search_index.tokenize` 只做小写,OCR 匹配只做 `.lower()`+子串,因此 `"Café"`(NFC)、`"Café"`(NFD)、`"cafe"` 会匹配不相等。本功能补上缺少的规范化层(NFKC + casefold + 空白折叠、去重音、智能引号映射、ASCII slug)。纯标准库(`unicodedata`)、确定。
+
 ## 本次更新 (2026-06-22) — JSON-Schema 兼容性检查
 
 把结构变更分类为 backward/forward/full。完整参考:[`docs/source/Zh/doc/new_features/v96_features_doc.rst`](../docs/source/Zh/doc/new_features/v96_features_doc.rst)。

diff --git a/README/README_zh-TW.md b/README/README_zh-TW.md
@@ -12,6 +12,7 @@
 
 ## 目錄
 
+- [本次更新 (2026-06-22) — Unicode 文字正規化與 Slug](#本次更新-2026-06-22--unicode-文字正規化與-slug)
 - [本次更新 (2026-06-22) — JSON-Schema 相容性檢查](#本次更新-2026-06-22--json-schema-相容性檢查)
 - [本次更新 (2026-06-22) — 具型別的設定結構](#本次更新-2026-06-22--具型別的設定結構)
 - [本次更新 (2026-06-22) — OTLP/JSON Span 匯出](#本次更新-2026-06-22--otlpjson-span-匯出)
@@ -148,6 +149,12 @@
 
 ---
 
+## 本次更新 (2026-06-22) — Unicode 文字正規化與 Slug
+
+在 fuzzy/search/OCR 比對前正規化文字。完整參考:[`docs/source/Zh/doc/new_features/v97_features_doc.rst`](../docs/source/Zh/doc/new_features/v97_features_doc.rst)。
+
+- **`normalize_text` / `deaccent` / `slugify` / `normalize_quotes` / `fold_whitespace`**(`AC_normalize_text`、`AC_slugify`):`fuzzy` 與 `search_index.tokenize` 只做小寫,OCR 比對只做 `.lower()`+子字串,因此 `"Café"`(NFC)、`"Café"`(NFD)、`"cafe"` 會比對不相等。本功能補上缺少的正規化層(NFKC + casefold + 空白折疊、去重音、智慧引號對應、ASCII slug)。純標準函式庫(`unicodedata`)、具決定性。
+
 ## 本次更新 (2026-06-22) — JSON-Schema 相容性檢查
 
 把結構變更分類為 backward/forward/full。完整參考:[`docs/source/Zh/doc/new_features/v96_features_doc.rst`](../docs/source/Zh/doc/new_features/v96_features_doc.rst)。

diff --git a/docs/source/Eng/doc/new_features/v97_features_doc.rst b/docs/source/Eng/doc/new_features/v97_features_doc.rst
@@ -0,0 +1,40 @@
+Unicode Text Normalisation & Slugify
+====================================
+
+``fuzzy`` and ``search_index.tokenize`` only lowercase, and OCR
+``find_text_matches`` only ``.lower()`` + substring — so ``"Café"`` (NFC) versus
+``"Café"`` (NFD) versus OCR ``"cafe"`` compare unequal. This adds the
+canonicalisation layer they should run before matching.
+
+Pure standard library (``unicodedata`` / ``re``); imports no ``PySide6``. Every
+function is pure (text in, text out), so it is fully deterministic in CI.
+
+Headless API
+------------
+
+.. code-block:: python
+
+    from je_auto_control import (
+        normalize_text, deaccent, slugify, normalize_quotes, fold_whitespace,
+    )
+
+    normalize_text("CAFÉ  Menu")          # "café menu"  (NFKC + casefold + ws)
+    deaccent("résumé")                     # "resume"
+    slugify("Café Menu! 2026")             # "cafe-menu-2026"
+    normalize_quotes("“Hi” — it’s…")       # '"Hi" - it\'s...'
+
+``normalize_text`` applies a Unicode ``form`` (default ``NFKC``), optional
+casefolding, and whitespace folding, so the same text in different code-point
+forms compares equal. ``deaccent`` strips combining marks; ``fold_whitespace``
+collapses runs to single spaces; ``normalize_quotes`` maps smart quotes, dashes,
+ellipsis and NBSP to ASCII; ``slugify`` produces an ASCII slug (de-accent,
+lowercase, join alphanumeric runs with a separator). Run ``normalize_text`` before
+matching for case- and form-insensitivity; add ``deaccent`` to also ignore accents.
+
+Executor commands
+-----------------
+
+``AC_normalize_text`` returns ``{text}`` (with optional ``form`` / ``casefold``
+/ ``collapse_ws``); ``AC_slugify`` returns ``{slug}``. Both are exposed as MCP
+tools (``ac_normalize_text`` / ``ac_slugify``) and as Script Builder commands
+under **Data**.
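
Review note: a minimal sketch of how the steps described in this page could make the three "café" spellings compare equal. It uses only the standard library so it runs without the package installed. The helper name `match_key` is hypothetical and not part of this change; it chains the steps the docs attribute to `normalize_text` (NFKC, casefold, whitespace fold) and then removes combining marks as `deaccent` is described to do.

```python
import unicodedata


def match_key(text: str) -> str:
    """Hypothetical helper: form-, case- and accent-insensitive comparison key."""
    # Steps the docs attribute to normalize_text: NFKC, casefold, whitespace fold.
    folded = " ".join(unicodedata.normalize("NFKC", text).casefold().split())
    # Same idea as deaccent: decompose, then drop combining marks.
    decomposed = unicodedata.normalize("NFD", folded)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


nfc = "Caf\u00e9"    # precomposed e-acute
nfd = "Cafe\u0301"   # "e" followed by a combining acute accent
ocr = "cafe"         # accent lost, as OCR output often is

print(nfc == nfd)                                           # False
print(match_key(nfc) == match_key(nfd) == match_key(ocr))   # True
```

Normalisation alone makes the NFC and NFD spellings equal; only the accent-stripping step brings the unaccented OCR string into line with them.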
diff --git a/docs/source/Eng/eng_index.rst b/docs/source/Eng/eng_index.rst
@@ -119,6 +119,7 @@ Comprehensive guides for all AutoControl features.
    doc/new_features/v94_features_doc
    doc/new_features/v95_features_doc
    doc/new_features/v96_features_doc
+   doc/new_features/v97_features_doc
    doc/ocr_backends/ocr_backends_doc
    doc/observability/observability_doc
    doc/operations_layer/operations_layer_doc

diff --git a/docs/source/Zh/doc/new_features/v97_features_doc.rst b/docs/source/Zh/doc/new_features/v97_features_doc.rst
@@ -0,0 +1,35 @@
+Unicode 文字正規化與 Slug
+========================
+
+``fuzzy`` 與 ``search_index.tokenize`` 只做小寫化,OCR ``find_text_matches`` 只做 ``.lower()`` + 子字串
+比對 —— 因此 ``"Café"``(NFC)、``"Café"``(NFD)與 OCR 的 ``"cafe"`` 會比對不相等。本功能補上它們在比對
+前應執行的正規化層。
+
+純標準函式庫(``unicodedata`` / ``re``);不匯入 ``PySide6``。每個函式皆為純函式(輸入文字、輸出文字),
+因此在 CI 中完全具決定性。
+
+無頭 API
+--------
+
+.. code-block:: python
+
+    from je_auto_control import (
+        normalize_text, deaccent, slugify, normalize_quotes, fold_whitespace,
+    )
+
+    normalize_text("CAFÉ  Menu")          # "café menu"(NFKC + casefold + 空白)
+    deaccent("résumé")                     # "resume"
+    slugify("Café Menu! 2026")             # "cafe-menu-2026"
+    normalize_quotes("“Hi” — it’s…")       # '"Hi" - it\'s...'
+
+``normalize_text`` 套用 Unicode ``form``(預設 ``NFKC``)、選用 casefold 與空白折疊,讓不同碼點形式的相同
+文字比對相等。``deaccent`` 去除組合附加符號;``fold_whitespace`` 把連續空白收成單一空格;``normalize_quotes``
+把智慧引號、破折號、省略號與 NBSP 對應成 ASCII;``slugify`` 產生 ASCII slug(去重音、小寫、以分隔符連接
+英數段)。在 fuzzy/search/OCR 比對前先執行 ``normalize_text`` 可讓比對不受大小寫與碼點形式影響;若也要忽略重音,再套用 ``deaccent``。
+
+執行器命令
+----------
+
+``AC_normalize_text`` 回傳 ``{text}``(可選 ``form`` / ``casefold`` / ``collapse_ws``);``AC_slugify`` 回傳
+``{slug}``。兩者皆以 MCP 工具(``ac_normalize_text`` / ``ac_slugify``)以及 Script Builder 中 **Data** 分類下
+的命令提供。
diff --git a/docs/source/Zh/zh_index.rst b/docs/source/Zh/zh_index.rst
@@ -119,6 +119,7 @@ AutoControl 所有功能的完整使用指南。
    doc/new_features/v94_features_doc
    doc/new_features/v95_features_doc
    doc/new_features/v96_features_doc
+   doc/new_features/v97_features_doc
    doc/ocr_backends/ocr_backends_doc
    doc/observability/observability_doc
    doc/operations_layer/operations_layer_doc

diff --git a/je_auto_control/__init__.py b/je_auto_control/__init__.py
@@ -253,6 +253,10 @@
 from je_auto_control.utils.fuzzy import (
     fuzzy_best_match, fuzzy_dedupe, fuzzy_matches, fuzzy_ratio,
 )
+# Unicode text normalisation + slugify (canonicalise before matching)
+from je_auto_control.utils.text_normalize import (
+    deaccent, fold_whitespace, normalize_quotes, normalize_text, slugify,
+)
 # S3-compatible artifact store (optional boto3, injectable client)
 from je_auto_control.utils.artifact_store import (
     S3ArtifactStore, configure_default_store, get_default_store,
@@ -917,6 +921,8 @@ def start_autocontrol_gui(*args, **kwargs):
     "VideoStep", "build_overlay_plan", "render_overlay_frame",
     "write_step_video",
     "fuzzy_best_match", "fuzzy_dedupe", "fuzzy_matches", "fuzzy_ratio",
+    "deaccent", "fold_whitespace", "normalize_quotes", "normalize_text",
+    "slugify",
     "S3ArtifactStore", "configure_default_store", "get_default_store",
     "set_default_store",
     "average_hash", "dedupe_images", "dhash", "hamming_distance",

diff --git a/je_auto_control/gui/script_builder/command_schema.py b/je_auto_control/gui/script_builder/command_schema.py
@@ -1648,6 +1648,26 @@ def _add_resilience_specs(specs: List[CommandSpec]) -> None:
         ),
         description="Build a canonical wide-event log line (rendered as JSON).",
     ))
+    specs.append(CommandSpec(
+        "AC_normalize_text", "Data", "Text: Normalize (Unicode)",
+        fields=(
+            FieldSpec("text", FieldType.STRING, placeholder="Café  Menu"),
+            FieldSpec("form", FieldType.STRING, optional=True,
+                      placeholder="NFKC"),
+            FieldSpec("casefold", FieldType.BOOL, optional=True, default=True),
+            FieldSpec("collapse_ws", FieldType.BOOL, optional=True,
+                      default=True),
+        ),
+        description="Unicode-normalise text (form + casefold + ws fold).",
+    ))
+    specs.append(CommandSpec(
+        "AC_slugify", "Data", "Text: Slugify",
+        fields=(
+            FieldSpec("text", FieldType.STRING, placeholder="Café Menu!"),
+            FieldSpec("sep", FieldType.STRING, optional=True, placeholder="-"),
+        ),
+        description="Produce an ASCII slug (de-accent, lowercase, join).",
+    ))
     specs.append(CommandSpec(
         "AC_spans_to_otlp", "Report", "OTLP: Export Spans",
         fields=(

diff --git a/je_auto_control/utils/executor/action_executor.py b/je_auto_control/utils/executor/action_executor.py
@@ -3138,6 +3138,20 @@ def _baggage_parse(header: str) -> Dict[str, Any]:
     return {"items": parse_baggage(header).to_dict()}
 
 
+def _normalize_text(text: str, form: str = "NFKC", casefold: Any = True,
+                    collapse_ws: Any = True) -> Dict[str, Any]:
+    """Adapter: Unicode-normalise text into {text}."""
+    from je_auto_control.utils.text_normalize import normalize_text
+    return {"text": normalize_text(text, form=form, casefold=bool(casefold),
+                                   collapse_ws=bool(collapse_ws))}
+
+
+def _slugify(text: str, sep: str = "-") -> Dict[str, Any]:
+    """Adapter: produce an ASCII slug from text."""
+    from je_auto_control.utils.text_normalize import slugify
+    return {"slug": slugify(text, sep=sep)}
+
+
 def _canonical_log(fields: Any) -> Dict[str, Any]:
     """Adapter: build a canonical log line from a fields dict."""
     import json
@@ -4424,6 +4438,8 @@ def __init__(self):
             "AC_baggage_format": _baggage_format,
             "AC_canonical_log": _canonical_log,
             "AC_spans_to_otlp": _spans_to_otlp,
+            "AC_normalize_text": _normalize_text,
+            "AC_slugify": _slugify,
             "AC_validate_config": _validate_config,
             "AC_resolve_ref": _resolve_ref,
             "AC_resolve_refs": _resolve_refs,
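
Review note: `bool(casefold)` and `bool(collapse_ws)` in the adapter treat any non-empty string as true, so a caller passing the string `"false"` would still get casefolding. Whether string flags can actually reach the executor is an assumption here, since the diff does not show the call sites. If they can, a small coercion helper is one option. The name `as_flag` and its accepted spellings are hypothetical, not part of this change.

```python
def as_flag(value, default=True):
    """Hypothetical helper: coerce JSON-ish flag values to a real bool."""
    if value is None:
        return default
    if isinstance(value, str):
        # Treat the usual "off" spellings as False; anything else is True.
        return value.strip().lower() not in {"", "0", "false", "no", "off"}
    return bool(value)


print(bool("false"))                              # True
print(as_flag("false"))                           # False
print(as_flag(True), as_flag(0), as_flag(None))   # True False True
```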

diff --git a/je_auto_control/utils/mcp_server/tools/_factories.py b/je_auto_control/utils/mcp_server/tools/_factories.py
@@ -3782,6 +3782,35 @@ def otlp_export_tools() -> List[MCPTool]:
     ]
 
 
+def text_normalize_tools() -> List[MCPTool]:
+    return [
+        MCPTool(
+            name="ac_normalize_text",
+            description=("Unicode-normalise 'text' (form NFKC/NFC/..., casefold, "
+                         "collapse whitespace) for robust matching. Returns "
+                         "{text}."),
+            input_schema=schema(
+                {"text": {"type": "string"}, "form": {"type": "string"},
+                 "casefold": {"type": "boolean"},
+                 "collapse_ws": {"type": "boolean"}},
+                ["text"]),
+            handler=h.normalize_text,
+            annotations=READ_ONLY,
+        ),
+        MCPTool(
+            name="ac_slugify",
+            description=("Produce an ASCII slug from 'text' (de-accent, "
+                         "lowercase, join alnum runs with 'sep'). Returns "
+                         "{slug}."),
+            input_schema=schema(
+                {"text": {"type": "string"}, "sep": {"type": "string"}},
+                ["text"]),
+            handler=h.slugify,
+            annotations=READ_ONLY,
+        ),
+    ]
+
+
 def canonical_log_tools() -> List[MCPTool]:
     return [
         MCPTool(
@@ -5378,6 +5407,7 @@ def media_assert_tools() -> List[MCPTool]:
     feature_flag_tools, provenance_tools, json_contract_tools, chaos_tools,
     slo_tools, percentiles_tools, bulkhead_tools, http_cassette_tools,
     trace_context_tools, baggage_tools, canonical_log_tools, otlp_export_tools,
+    text_normalize_tools,
     secret_ref_tools, config_schema_tools, config_redaction_tools,
     data_profile_tools, http_problem_tools, dotenv_tools,
     sse_client_tools, layered_config_tools, data_drift_tools, schema_compat_tools,
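
Review note: the `form` property is declared as a free string, while `unicodedata.normalize` accepts only the four standard form names and raises `ValueError` for anything else. A sketch of two ways to tighten this follows: a JSON Schema `enum` on the property, and an explicit check with a clearer message. `VALID_FORMS`, `FORM_PROPERTY` and `check_form` are hypothetical names, and whether the repository's `schema(...)` helper passes `enum` through unchanged is an assumption.

```python
import unicodedata

VALID_FORMS = ("NFC", "NFD", "NFKC", "NFKD")

# Plain JSON Schema fragment; "enum" is a standard JSON Schema keyword.
FORM_PROPERTY = {"type": "string", "enum": list(VALID_FORMS)}


def check_form(form: str) -> str:
    """Hypothetical guard: reject unknown forms with a descriptive error."""
    if form not in VALID_FORMS:
        raise ValueError(f"form must be one of {VALID_FORMS}, got {form!r}")
    return form


try:
    unicodedata.normalize("nfkc", "x")   # form names are case-sensitive
except ValueError as exc:
    print("stdlib rejected the form:", exc)

print(check_form("NFKC"))
```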

diff --git a/je_auto_control/utils/mcp_server/tools/_handlers.py b/je_auto_control/utils/mcp_server/tools/_handlers.py
@@ -1736,6 +1736,16 @@ def baggage_format(items):
     return _baggage_format(items)
 
 
+def normalize_text(text, form="NFKC", casefold=True, collapse_ws=True):
+    from je_auto_control.utils.executor.action_executor import _normalize_text
+    return _normalize_text(text, form, casefold, collapse_ws)
+
+
+def slugify(text, sep="-"):
+    from je_auto_control.utils.executor.action_executor import _slugify
+    return _slugify(text, sep)
+
+
 def canonical_log(fields):
     from je_auto_control.utils.executor.action_executor import _canonical_log
     return _canonical_log(fields)

diff --git a/je_auto_control/utils/text_normalize/__init__.py b/je_auto_control/utils/text_normalize/__init__.py
@@ -0,0 +1,9 @@
+"""Unicode text normalisation and slug generation for AutoControl."""
+from je_auto_control.utils.text_normalize.text_normalize import (
+    deaccent, fold_whitespace, normalize_quotes, normalize_text, slugify,
+)
+
+__all__ = [
+    "deaccent", "fold_whitespace", "normalize_quotes", "normalize_text",
+    "slugify",
+]
diff --git a/je_auto_control/utils/text_normalize/text_normalize.py b/je_auto_control/utils/text_normalize/text_normalize.py
@@ -0,0 +1,54 @@
+"""Unicode text normalisation and slug generation for robust text matching.
+
+``fuzzy`` and ``search_index.tokenize`` only lowercase, and OCR
+``find_text_matches`` only ``.lower()`` + substring — so ``"Café"`` (NFC) versus
+``"Café"`` (NFD) versus OCR ``"cafe"`` compare unequal. This is the
+canonicalisation layer they should run before matching.
+
+Pure standard library (``unicodedata`` / ``re``); imports no ``PySide6``. Every
+function is pure (text in, text out), so it is fully deterministic in CI.
+"""
+import re
+import unicodedata
+
+_QUOTE_MAP = {
+    "‘": "'", "’": "'", "‚": "'", "‛": "'",
+    "“": '"', "”": '"', "„": '"', "‟": '"',
+    "–": "-", "—": "-", "−": "-", "…": "...",
+    "\u00a0": " ",  # NBSP, escaped so the key stays visible in editors and diffs
+}
+_QUOTE_TABLE = str.maketrans(_QUOTE_MAP)
+
+
+def fold_whitespace(text: str) -> str:
+    """Collapse runs of whitespace to single spaces and strip the ends."""
+    return " ".join((text or "").split())
+
+
+def deaccent(text: str) -> str:
+    """Strip combining diacritical marks (``café`` -> ``cafe``)."""
+    decomposed = unicodedata.normalize("NFD", text or "")
+    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
+
+
+def normalize_quotes(text: str) -> str:
+    """Replace smart quotes, dashes, ellipsis and NBSP with ASCII equivalents."""
+    return (text or "").translate(_QUOTE_TABLE)
+
+
+def normalize_text(text: str, *, form: str = "NFKC", casefold: bool = True,
+                   collapse_ws: bool = True) -> str:
+    """Canonicalise ``text``: Unicode ``form``, optional casefold + ws fold."""
+    result = unicodedata.normalize(form, text or "")
+    if casefold:
+        result = result.casefold()
+    if collapse_ws:
+        result = fold_whitespace(result)
+    return result
+
+
+def slugify(text: str, *, sep: str = "-") -> str:
+    """Produce an ASCII slug: de-accent, lowercase, join alnum runs with ``sep``."""
+    base = deaccent(unicodedata.normalize("NFKC", text or "")).lower()
+    # join() treats sep literally (no regex template, no strip char-set).
+    return sep.join(re.findall(r"[a-z0-9]+", base))
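
Review note on separator handling in `slugify`: Python's `str.strip` takes a set of characters rather than a substring, and `re.sub` parses its replacement string as a template, so unusual `sep` values are worth covering in tests. The sketch below is a stand-alone stand-in for the function (the name `slugify_sketch` is hypothetical) showing a join-based formulation that never interprets `sep`.

```python
import re
import unicodedata


def slugify_sketch(text: str, sep: str = "-") -> str:
    """Stand-in for slugify: de-accent, lowercase, join ASCII alnum runs."""
    decomposed = unicodedata.normalize("NFD", unicodedata.normalize("NFKC", text))
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()
    # findall + join never interprets sep, so any separator string is safe.
    return sep.join(re.findall(r"[a-z0-9]+", base))


print(slugify_sketch("Caf\u00e9 Menu! 2026"))       # cafe-menu-2026
print(slugify_sketch("  --Hello, World--  "))       # hello-world
print(slugify_sketch("banana split", sep="ab"))     # bananaabsplit
print("banana".strip("ab"))                         # nan (strip takes a character set)
```

With the default `"-"` separator both formulations agree; the difference only shows up for multi-character separators or ones containing backslashes.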