7 changes: 7 additions & 0 deletions README.md
@@ -13,6 +13,7 @@

## Table of Contents

- [What's new (2026-06-22) — Readability Scoring](#whats-new-2026-06-22--readability-scoring)
- [What's new (2026-06-22) — Confusable / Homoglyph Detection](#whats-new-2026-06-22--confusable--homoglyph-detection)
- [What's new (2026-06-22) — Locale-Aware String Collation](#whats-new-2026-06-22--locale-aware-string-collation)
- [What's new (2026-06-22) — Transactional Outbox](#whats-new-2026-06-22--transactional-outbox)
@@ -162,6 +163,12 @@

---

## What's new (2026-06-22) — Readability Scoring

Score how hard text is to read; gate generated copy on a reading grade. Full reference: [`docs/source/Eng/doc/new_features/v110_features_doc.rst`](docs/source/Eng/doc/new_features/v110_features_doc.rst).

- **`flesch_reading_ease` / `flesch_kincaid_grade` / `gunning_fog` / `smog_index` / `automated_readability_index` / `readability_report` / `readability_stats` / `count_syllables`** (`AC_readability_report`): the text utilities canonicalise, match and rank text but never scored *difficulty*. This adds the classic English readability formulae over a deterministic tokeniser and syllable heuristic, so a test can assert an on-screen message or label stays within a target reading grade. Pure-stdlib (`re`/`math`), deterministic.

## What's new (2026-06-22) — Confusable / Homoglyph Detection

Catch Unicode visual spoofing (IDN-homograph phishing, lookalike labels). Full reference: [`docs/source/Eng/doc/new_features/v109_features_doc.rst`](docs/source/Eng/doc/new_features/v109_features_doc.rst).
7 changes: 7 additions & 0 deletions README/README_zh-CN.md
@@ -12,6 +12,7 @@

## 目录

- [本次更新 (2026-06-22) — 可读性评分](#本次更新-2026-06-22--可读性评分)
- [本次更新 (2026-06-22) — 易混淆字符 / 同形异义字检测](#本次更新-2026-06-22--易混淆字符--同形异义字检测)
- [本次更新 (2026-06-22) — 区域感知字符串排序](#本次更新-2026-06-22--区域感知字符串排序)
- [本次更新 (2026-06-22) — 事务型 Outbox](#本次更新-2026-06-22--事务型-outbox)
@@ -165,6 +166,12 @@

平滑噪声值序列。完整参考:[`docs/source/Zh/doc/new_features/v102_features_doc.rst`](../docs/source/Zh/doc/new_features/v102_features_doc.rst)。

## 本次更新 (2026-06-22) — 可读性评分

评估文字有多难读;以阅读年级把关生成的文案。完整参考:[`docs/source/Zh/doc/new_features/v110_features_doc.rst`](../docs/source/Zh/doc/new_features/v110_features_doc.rst)。

- **`flesch_reading_ease` / `flesch_kincaid_grade` / `gunning_fog` / `smog_index` / `automated_readability_index` / `readability_report` / `readability_stats` / `count_syllables`**(`AC_readability_report`):文字工具能正规化、比对与排名文字,却从未评估*难度*。本功能在确定性分词器与音节启发式之上加入经典英文可读性公式,让测试能断言画面消息或标签落在目标阅读年级内。纯标准库(`re`/`math`)、确定。

## 本次更新 (2026-06-22) — 易混淆字符 / 同形异义字检测

抓出 Unicode 视觉仿冒(IDN 同形异义字钓鱼、仿冒标签)。完整参考:[`docs/source/Zh/doc/new_features/v109_features_doc.rst`](../docs/source/Zh/doc/new_features/v109_features_doc.rst)。
7 changes: 7 additions & 0 deletions README/README_zh-TW.md
@@ -12,6 +12,7 @@

## 目錄

- [本次更新 (2026-06-22) — 可讀性評分](#本次更新-2026-06-22--可讀性評分)
- [本次更新 (2026-06-22) — 易混淆字元 / 同形異義字偵測](#本次更新-2026-06-22--易混淆字元--同形異義字偵測)
- [本次更新 (2026-06-22) — 地區感知字串排序](#本次更新-2026-06-22--地區感知字串排序)
- [本次更新 (2026-06-22) — 交易型 Outbox](#本次更新-2026-06-22--交易型-outbox)
@@ -165,6 +166,12 @@

平滑雜訊值序列。完整參考:[`docs/source/Zh/doc/new_features/v102_features_doc.rst`](../docs/source/Zh/doc/new_features/v102_features_doc.rst)。

## 本次更新 (2026-06-22) — 可讀性評分

評估文字有多難讀;以閱讀年級把關產生的文案。完整參考:[`docs/source/Zh/doc/new_features/v110_features_doc.rst`](../docs/source/Zh/doc/new_features/v110_features_doc.rst)。

- **`flesch_reading_ease` / `flesch_kincaid_grade` / `gunning_fog` / `smog_index` / `automated_readability_index` / `readability_report` / `readability_stats` / `count_syllables`**(`AC_readability_report`):文字工具能正規化、比對與排名文字,卻從未評估*難度*。本功能在決定性斷詞器與音節啟發式之上加入經典英文可讀性公式,讓測試能斷言畫面訊息或標籤落在目標閱讀年級內。純標準函式庫(`re`/`math`)、具決定性。

## 本次更新 (2026-06-22) — 易混淆字元 / 同形異義字偵測

抓出 Unicode 視覺仿冒(IDN 同形異義字釣魚、仿冒標籤)。完整參考:[`docs/source/Zh/doc/new_features/v109_features_doc.rst`](../docs/source/Zh/doc/new_features/v109_features_doc.rst)。
45 changes: 45 additions & 0 deletions docs/source/Eng/doc/new_features/v110_features_doc.rst
@@ -0,0 +1,45 @@
Readability Scoring
===================

The text utilities canonicalise (``text_normalize``), match (``text_similarity``,
``fuzzy``) and rank (``search_index``) text, but nothing scores *how hard it is
to read*. There was no way to assert that an on-screen message, a generated label
or a doc string stays within a target reading grade. This adds the classic
English readability formulae over a deterministic tokeniser and syllable
heuristic.
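
For orientation, the five formulae are usually published in the form below. This
is a reference sketch of the textbook definitions, not a statement of the exact
coefficients or counting rules this module uses. The symbols are assumptions
introduced here: :math:`W` words, :math:`S` sentences, :math:`Y` syllables,
:math:`C` characters, :math:`W_c` complex words (commonly those with three or
more syllables) and :math:`P` polysyllabic words.

```latex
\begin{aligned}
\text{Flesch reading ease} &= 206.835 - 1.015\,\frac{W}{S} - 84.6\,\frac{Y}{W} \\
\text{Flesch-Kincaid grade} &= 0.39\,\frac{W}{S} + 11.8\,\frac{Y}{W} - 15.59 \\
\text{Gunning Fog} &= 0.4\left(\frac{W}{S} + 100\,\frac{W_c}{W}\right) \\
\text{SMOG} &= 1.0430\,\sqrt{P \cdot \frac{30}{S}} + 3.1291 \\
\text{ARI} &= 4.71\,\frac{C}{W} + 0.5\,\frac{W}{S} - 21.43
\end{aligned}
```

Only the first formula is higher-is-easier; the other four grow with difficulty.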

Pure standard library (``re`` / ``math``); imports no ``PySide6``. Every function
is pure (text in, number/report out), so it is fully deterministic in CI.

Headless API
------------

.. code-block:: python

    from je_auto_control import (
        flesch_reading_ease, flesch_kincaid_grade, gunning_fog, smog_index,
        automated_readability_index, readability_report, readability_stats,
        count_syllables,
    )

    flesch_reading_ease("The cat sat on the mat.")   # ~116 (very easy)
    flesch_kincaid_grade(marketing_copy)             # US grade level
    readability_report(text)                         # every metric + counts

    # gate generated UI copy on a reading grade
    assert flesch_kincaid_grade(label) <= 8

``readability_stats`` returns the raw counts (``words``, ``sentences``,
``syllables``, ``characters``, ``complex_words``) shared by every formula.
``flesch_reading_ease`` is higher-is-easier (~0-100 for normal prose, although very
short, simple sentences can score above 100); the others (Flesch-Kincaid, Gunning
Fog, SMOG, ARI) return a US grade level. ``count_syllables``
is the heuristic vowel-group counter (with silent-``e`` and consonant-``le``
handling) the formulae build on. ``readability_report`` bundles all five metrics
plus the stats into one dict.
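
To make the moving parts concrete, here is a minimal, self-contained sketch of a
vowel-group syllable counter feeding the two Flesch formulae. It is not the
module's implementation: every ``sketch_*`` name, the regular expressions and the
silent-``e`` rule are assumptions chosen for illustration, and the real heuristic
may count some words differently.

```python
import re
from typing import Dict

# Hypothetical helpers for illustration only; NOT the je_auto_control code.
_WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")
_SENTENCE_END_RE = re.compile(r"[.!?]+")
_VOWEL_GROUP_RE = re.compile(r"[aeiouy]+")
_VOWELS = "aeiouy"


def _ends_with_consonant_le(word: str) -> bool:
    """True for endings such as 'ble' or 'ple', where the final 'e' is voiced."""
    return len(word) >= 3 and word.endswith("le") and word[-3] not in _VOWELS


def sketch_count_syllables(word: str) -> int:
    """Count vowel groups, dropping a silent trailing 'e'; never below 1."""
    word = word.lower()
    groups = len(_VOWEL_GROUP_RE.findall(word))
    if word.endswith("e") and not _ends_with_consonant_le(word):
        groups -= 1
    return max(groups, 1)


def sketch_stats(text: str) -> Dict[str, int]:
    """Raw counts shared by both formulae below."""
    words = _WORD_RE.findall(text)
    sentences = max(len(_SENTENCE_END_RE.findall(text)), 1)
    syllables = sum(sketch_count_syllables(w) for w in words)
    return {"words": len(words), "sentences": sentences, "syllables": syllables}


def sketch_flesch_reading_ease(text: str) -> float:
    """Textbook Flesch reading ease; returns 0.0 for text with no words."""
    s = sketch_stats(text)
    if s["words"] == 0:
        return 0.0
    return (206.835
            - 1.015 * (s["words"] / s["sentences"])
            - 84.6 * (s["syllables"] / s["words"]))


def sketch_flesch_kincaid_grade(text: str) -> float:
    """Textbook Flesch-Kincaid grade level; returns 0.0 for text with no words."""
    s = sketch_stats(text)
    if s["words"] == 0:
        return 0.0
    return (0.39 * (s["words"] / s["sentences"])
            + 11.8 * (s["syllables"] / s["words"])
            - 15.59)


sample = "The cat sat on the mat."
print(sketch_stats(sample))
print(round(sketch_flesch_reading_ease(sample), 3))
print(round(sketch_flesch_kincaid_grade(sample), 2))
```

With six one-syllable words in one sentence, the sketch reproduces the ~116
reading-ease figure quoted in the example above, which is why a short sentence
can land outside the usual 0-100 band.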

Executor commands
-----------------

``AC_readability_report`` returns the full report (all five metrics plus counts)
for a string. It is exposed as the MCP tool ``ac_readability_report`` and as a
Script Builder command under **Data**.
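
The command is wired up as a name-to-adapter lookup, so the same report function
backs the executor, the MCP tool and the Script Builder. The sketch below shows
that pattern in isolation. ``fake_readability_report``, ``COMMANDS`` and
``run_command`` are hypothetical stand-ins, not the project's executor API, and
the stand-in report only counts words.

```python
from typing import Any, Callable, Dict


def fake_readability_report(text: str) -> Dict[str, Any]:
    """Stand-in for the real report function; returns only a word count."""
    return {"words": len(text.split())}


# Command name -> adapter, mirroring the shape of an executor command table.
COMMANDS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "AC_readability_report": fake_readability_report,
}


def run_command(name: str, **kwargs: Any) -> Dict[str, Any]:
    """Look up a command by name and call its adapter with keyword arguments."""
    try:
        adapter = COMMANDS[name]
    except KeyError:
        raise ValueError(f"unknown command: {name}") from None
    return adapter(**kwargs)


result = run_command("AC_readability_report", text="The cat sat on the mat.")
print(result)
```

Keeping the adapter a thin wrapper means any front end that can pass a ``text``
string gets the same deterministic result.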
1 change: 1 addition & 0 deletions docs/source/Eng/eng_index.rst
@@ -132,6 +132,7 @@ Comprehensive guides for all AutoControl features.
doc/new_features/v107_features_doc
doc/new_features/v108_features_doc
doc/new_features/v109_features_doc
doc/new_features/v110_features_doc
doc/ocr_backends/ocr_backends_doc
doc/observability/observability_doc
doc/operations_layer/operations_layer_doc
38 changes: 38 additions & 0 deletions docs/source/Zh/doc/new_features/v110_features_doc.rst
@@ -0,0 +1,38 @@
可讀性評分
==========

文字工具能正規化(``text_normalize``)、比對(``text_similarity``、``fuzzy``)與排名(``search_index``)文字,
但沒有任何功能能評估文字*有多難讀*。先前無法斷言畫面訊息、產生的標籤或文件字串是否落在目標閱讀年級內。
本功能在決定性的斷詞器與音節啟發式之上,加入經典的英文可讀性公式。

純標準函式庫(``re`` / ``math``);不匯入 ``PySide6``。每個函式皆為純函式(文字進、數字/報告出),因此在 CI 中
完全具決定性。

無頭 API
--------

.. code-block:: python

    from je_auto_control import (
        flesch_reading_ease, flesch_kincaid_grade, gunning_fog, smog_index,
        automated_readability_index, readability_report, readability_stats,
        count_syllables,
    )

    flesch_reading_ease("The cat sat on the mat.")   # ~116(非常易讀)
    flesch_kincaid_grade(marketing_copy)             # 美國年級
    readability_report(text)                         # 所有指標 + 計數

    # 以閱讀年級把關產生的 UI 文案
    assert flesch_kincaid_grade(label) <= 8

``readability_stats`` 回傳每個公式共用的原始計數(``words``、``sentences``、``syllables``、``characters``、
``complex_words``)。``flesch_reading_ease`` 為愈高愈易讀(一般文章約 0-100);其餘(Flesch-Kincaid、
Gunning Fog、SMOG、ARI)回傳美國年級。``count_syllables`` 是公式所依據的啟發式母音群計數器(含無聲 ``e`` 與
子音 + ``le`` 處理)。``readability_report`` 將五個指標連同計數打包成一個 dict。

執行器命令
----------

``AC_readability_report`` 對單一字串回傳完整報告(五個指標加計數)。它以 MCP 工具 ``ac_readability_report``
以及 Script Builder 中 **Data** 分類下的命令提供。
1 change: 1 addition & 0 deletions docs/source/Zh/zh_index.rst
@@ -132,6 +132,7 @@ AutoControl 所有功能的完整使用指南。
doc/new_features/v107_features_doc
doc/new_features/v108_features_doc
doc/new_features/v109_features_doc
doc/new_features/v110_features_doc
doc/ocr_backends/ocr_backends_doc
doc/observability/observability_doc
doc/operations_layer/operations_layer_doc
14 changes: 14 additions & 0 deletions je_auto_control/__init__.py
@@ -223,6 +223,12 @@
detect_homoglyphs, is_confusable, is_mixed_script, scripts_of,
)
from je_auto_control.utils.confusables import skeleton as confusable_skeleton
# Readability scoring (Flesch / Flesch-Kincaid / Gunning Fog / SMOG / ARI)
from je_auto_control.utils.readability import (
automated_readability_index, count_syllables, flesch_kincaid_grade,
flesch_reading_ease, gunning_fog, readability_report, readability_stats,
smog_index,
)
# CI workflow annotations (GitHub Actions)
from je_auto_control.utils.ci_annotations import (
emit_annotations, format_annotation,
@@ -961,6 +967,14 @@ def start_autocontrol_gui(*args, **kwargs):
"is_confusable",
"is_mixed_script",
"scripts_of",
"automated_readability_index",
"count_syllables",
"flesch_kincaid_grade",
"flesch_reading_ease",
"gunning_fog",
"readability_report",
"readability_stats",
"smog_index",
"emit_annotations", "format_annotation",
"ClipboardHistory", "default_clipboard_history",
"analyze_heal_log", "heal_stats", "scan_secrets",
8 changes: 8 additions & 0 deletions je_auto_control/gui/script_builder/command_schema.py
@@ -2105,6 +2105,14 @@ def _add_resilience_specs(specs: List[CommandSpec]) -> None:
),
description="Whether two strings share the same confusable skeleton.",
))
specs.append(CommandSpec(
"AC_readability_report", "Data", "Text: Readability Report",
fields=(
FieldSpec("text", FieldType.STRING,
placeholder="The cat sat on the mat."),
),
description="Flesch / Flesch-Kincaid / Fog / SMOG / ARI scores + counts.",
))
specs.append(CommandSpec(
"AC_diff_rows", "Data", "Dataset Diff: Rows by Key",
fields=(
7 changes: 7 additions & 0 deletions je_auto_control/utils/executor/action_executor.py
@@ -2993,6 +2993,12 @@ def _confusable_compare(first: str, second: str) -> Dict[str, Any]:
return {"confusable": is_confusable(first, second)}


def _readability_report(text: str) -> Dict[str, Any]:
    """Adapter: full readability report (all metrics + counts) for a string."""
    from je_auto_control.utils.readability import readability_report
    return readability_report(text)


def _cas_put(name: str, key: str, value: Any,
expected_version: Any = None) -> Dict[str, Any]:
"""Adapter: optimistic put into a named versioned store."""
@@ -4679,6 +4685,7 @@ def __init__(self):
"AC_collation_compare": _collation_compare,
"AC_confusable_scan": _confusable_scan,
"AC_confusable_compare": _confusable_compare,
"AC_readability_report": _readability_report,
"AC_detect_drift": _detect_drift,
"AC_categorical_drift": _categorical_drift,
"AC_diff_rows": _diff_rows,
15 changes: 14 additions & 1 deletion je_auto_control/utils/mcp_server/tools/_factories.py
@@ -3653,6 +3653,19 @@ def confusables_tools() -> List[MCPTool]:
]


def readability_tools() -> List[MCPTool]:
    return [
        MCPTool(
            name="ac_readability_report",
            description=("Readability report for 'text': Flesch reading ease, "
                         "Flesch-Kincaid grade, Gunning Fog, SMOG, ARI + counts."),
            input_schema=schema({"text": {"type": "string"}}, ["text"]),
            handler=h.readability_report,
            annotations=READ_ONLY,
        ),
    ]


def sequence_gap_tools() -> List[MCPTool]:
return [
MCPTool(
@@ -5693,7 +5706,7 @@ def media_assert_tools() -> List[MCPTool]:
sse_client_tools, layered_config_tools, data_drift_tools, schema_compat_tools,
timeseries_tools, anomaly_tools, smoothing_tools, idempotency_tools,
dedup_window_tools, sequence_gap_tools, optimistic_tools, outbox_tools,
locale_collation_tools, confusables_tools,
locale_collation_tools, confusables_tools, readability_tools,
dataset_diff_tools, referential_tools, link_header_tools, multipart_tools,
http_content_tools, cookie_jar_tools, http_conditional_tools,
saga_tools, decision_table_tools, locator_repair_tools,
5 changes: 5 additions & 0 deletions je_auto_control/utils/mcp_server/tools/_handlers.py
@@ -1982,6 +1982,11 @@ def confusable_compare(first, second):
return _confusable_compare(first, second)


def readability_report(text):
    from je_auto_control.utils.executor.action_executor import _readability_report
    return _readability_report(text)


def detect_drift(reference, current, threshold=0.25, bins=10):
from je_auto_control.utils.executor.action_executor import _detect_drift
return _detect_drift(reference, current, threshold, bins)
12 changes: 12 additions & 0 deletions je_auto_control/utils/readability/__init__.py
@@ -0,0 +1,12 @@
"""Readability scoring (Flesch, Flesch-Kincaid, Gunning Fog, SMOG, ARI)."""
from je_auto_control.utils.readability.readability import (
automated_readability_index, count_syllables, flesch_kincaid_grade,
flesch_reading_ease, gunning_fog, readability_report, readability_stats,
smog_index,
)

__all__ = [
"automated_readability_index", "count_syllables", "flesch_kincaid_grade",
"flesch_reading_ease", "gunning_fog", "readability_report",
"readability_stats", "smog_index",
]