diff --git a/README.md b/README.md
index 5c6b4926..ae0c5c49 100644
--- a/README.md
+++ b/README.md
@@ -13,6 +13,7 @@
 
 ## Table of Contents
 
+- [What's new (2026-06-22) — Locale-Aware List Formatting](#whats-new-2026-06-22--locale-aware-list-formatting)
 - [What's new (2026-06-22) — Bidirectional-Text QA (Trojan-Source Scan)](#whats-new-2026-06-22--bidirectional-text-qa-trojan-source-scan)
 - [What's new (2026-06-22) — Readability Scoring](#whats-new-2026-06-22--readability-scoring)
 - [What's new (2026-06-22) — Confusable / Homoglyph Detection](#whats-new-2026-06-22--confusable--homoglyph-detection)
@@ -164,6 +165,12 @@
 
 ---
 
+## What's new (2026-06-22) — Locale-Aware List Formatting
+
+Join items the way a language expects ("A, B, and C"). Full reference: [`docs/source/Eng/doc/new_features/v112_features_doc.rst`](docs/source/Eng/doc/new_features/v112_features_doc.rst).
+
+- **`format_list`** (`AC_format_list`): a naive `", ".join` gives "A, B, C" with no "and"/"or" and no localisation. This implements the CLDR list-pattern composition with conjunction / disjunction / unit styles and per-locale conjunction words + serial-comma rule (`en`/`es`/`fr`/`de`/`pt`) — `format_list(["a","b","c"])` → "a, b, and c", `locale="es"` → "a, b y c". Pure-stdlib, deterministic.
+
 ## What's new (2026-06-22) — Bidirectional-Text QA (Trojan-Source Scan)
 
 Catch invisible Unicode directional formatting (RTL QA + Trojan-source). Full reference: [`docs/source/Eng/doc/new_features/v111_features_doc.rst`](docs/source/Eng/doc/new_features/v111_features_doc.rst).
diff --git a/README/README_zh-CN.md b/README/README_zh-CN.md
index e1ce571f..c0de127e 100644
--- a/README/README_zh-CN.md
+++ b/README/README_zh-CN.md
@@ -12,6 +12,7 @@
 
 ## 目录
 
+- [本次更新 (2026-06-22) — 区域感知列表格式化](#本次更新-2026-06-22--区域感知列表格式化)
 - [本次更新 (2026-06-22) — 双向文字 QA(Trojan-Source 扫描)](#本次更新-2026-06-22--双向文字-qatrojan-source-扫描)
 - [本次更新 (2026-06-22) — 可读性评分](#本次更新-2026-06-22--可读性评分)
 - [本次更新 (2026-06-22) — 易混淆字符 / 同形异义字检测](#本次更新-2026-06-22--易混淆字符--同形异义字检测)
@@ -167,6 +168,12 @@
 
 平滑噪声值序列。完整参考:[`docs/source/Zh/doc/new_features/v102_features_doc.rst`](../docs/source/Zh/doc/new_features/v102_features_doc.rst)。
 
+## 本次更新 (2026-06-22) — 区域感知列表格式化
+
+依某语言的期望串接项目(「A、B and C」)。完整参考:[`docs/source/Zh/doc/new_features/v112_features_doc.rst`](../docs/source/Zh/doc/new_features/v112_features_doc.rst)。
+
+- **`format_list`**(`AC_format_list`):直接 `", ".join` 只会得到「A, B, C」,没有「and/or」也没有在地化。本功能实作 CLDR 列表样式组合,支援连接(and)/选择(or)/单位(unit)样式,并依区域提供连接词与序列逗号规则(`en`/`es`/`fr`/`de`/`pt`)——`format_list(["a","b","c"])` → 「a, b, and c」,`locale="es"` → 「a, b y c」。纯标准库、确定。
+
 ## 本次更新 (2026-06-22) — 双向文字 QA(Trojan-Source 扫描)
 
 抓出隐形的 Unicode 方向格式控制(RTL QA + Trojan-source)。完整参考:[`docs/source/Zh/doc/new_features/v111_features_doc.rst`](../docs/source/Zh/doc/new_features/v111_features_doc.rst)。
diff --git a/README/README_zh-TW.md b/README/README_zh-TW.md
index a45fbe28..b8f79163 100644
--- a/README/README_zh-TW.md
+++ b/README/README_zh-TW.md
@@ -12,6 +12,7 @@
 
 ## 目錄
 
+- [本次更新 (2026-06-22) — 地區感知清單格式化](#本次更新-2026-06-22--地區感知清單格式化)
 - [本次更新 (2026-06-22) — 雙向文字 QA(Trojan-Source 掃描)](#本次更新-2026-06-22--雙向文字-qatrojan-source-掃描)
 - [本次更新 (2026-06-22) — 可讀性評分](#本次更新-2026-06-22--可讀性評分)
 - [本次更新 (2026-06-22) — 易混淆字元 / 同形異義字偵測](#本次更新-2026-06-22--易混淆字元--同形異義字偵測)
@@ -167,6 +168,12 @@
 
 平滑雜訊值序列。完整參考:[`docs/source/Zh/doc/new_features/v102_features_doc.rst`](../docs/source/Zh/doc/new_features/v102_features_doc.rst)。
 
+## 本次更新 (2026-06-22) — 地區感知清單格式化
+
+依某語言的期望串接項目(「A、B and C」)。完整參考:[`docs/source/Zh/doc/new_features/v112_features_doc.rst`](../docs/source/Zh/doc/new_features/v112_features_doc.rst)。
+
+- **`format_list`**(`AC_format_list`):直接 `", ".join` 只會得到「A, B, C」,沒有「and/or」也沒有在地化。本功能實作 CLDR 清單樣式組合,支援連接(and)/選擇(or)/單位(unit)樣式,並依地區提供連接詞與序列逗號規則(`en`/`es`/`fr`/`de`/`pt`)——`format_list(["a","b","c"])` → 「a, b, and c」,`locale="es"` → 「a, b y c」。純標準函式庫、具決定性。
+
 ## 本次更新 (2026-06-22) — 雙向文字 QA(Trojan-Source 掃描)
 
 抓出隱形的 Unicode 方向格式控制(RTL QA + Trojan-source)。完整參考:[`docs/source/Zh/doc/new_features/v111_features_doc.rst`](../docs/source/Zh/doc/new_features/v111_features_doc.rst)。
diff --git a/docs/source/Eng/doc/new_features/v112_features_doc.rst b/docs/source/Eng/doc/new_features/v112_features_doc.rst
new file mode 100644
index 00000000..49fcd385
--- /dev/null
+++ b/docs/source/Eng/doc/new_features/v112_features_doc.rst
@@ -0,0 +1,39 @@
+Locale-Aware List Formatting
+============================
+
+``locale_parse`` formats numbers and dates, but joining a list of items the way a
+language expects — the conjunction word, whether there is a serial/Oxford comma,
+the two-item special case — is its own small problem. A naive ``", ".join`` gives
+``"A, B, C"`` with no "and"/"or" and no localisation.
+
+This implements the CLDR list-pattern composition (start/middle/end plus a
+two-item pattern) for a handful of locales and the conjunction / disjunction /
+unit styles. Pure standard library; imports no ``PySide6``. Every function is
+pure, so it is fully deterministic in CI.
+
+Headless API
+------------
+
+.. code-block:: python
+
+    from je_auto_control import format_list
+
+    format_list(["apple", "pear", "grape"])  # 'apple, pear, and grape'
+    format_list(["apple", "pear", "grape"], style="or")  # 'apple, pear, or grape'
+    format_list(["apple", "pear"], style="unit")  # 'apple, pear'
+    format_list(["manzana", "pera", "uva"], locale="es")  # 'manzana, pera y uva'
+    format_list(["A", "B", "C", "D"], locale="fr")  # 'A, B, C et D'
+
+``style`` is ``"and"`` (conjunction), ``"or"`` (disjunction) or ``"unit"``
+(comma-separated, no conjunction). ``locale`` selects the conjunction word and
+the serial-comma rule (``en`` / ``es`` / ``fr`` / ``de`` / ``pt``; English uses
+the Oxford comma, the others do not; an unknown locale falls back to English).
+One and two element lists, and the empty list, are handled as special cases.
+``ValueError`` is raised for an unknown ``style``.
+
+Executor commands
+-----------------
+
+``AC_format_list`` takes a JSON array and returns ``{text}``, accepting ``style``
+and ``locale``. It is exposed as the MCP tool ``ac_format_list`` and as a Script
+Builder command under **Data**.
diff --git a/docs/source/Eng/eng_index.rst b/docs/source/Eng/eng_index.rst
index 9adac878..6d30a4ec 100644
--- a/docs/source/Eng/eng_index.rst
+++ b/docs/source/Eng/eng_index.rst
@@ -134,6 +134,7 @@ Comprehensive guides for all AutoControl features.
    doc/new_features/v109_features_doc
    doc/new_features/v110_features_doc
    doc/new_features/v111_features_doc
+   doc/new_features/v112_features_doc
    doc/ocr_backends/ocr_backends_doc
    doc/observability/observability_doc
    doc/operations_layer/operations_layer_doc
diff --git a/docs/source/Zh/doc/new_features/v112_features_doc.rst b/docs/source/Zh/doc/new_features/v112_features_doc.rst
new file mode 100644
index 00000000..4b55bb07
--- /dev/null
+++ b/docs/source/Zh/doc/new_features/v112_features_doc.rst
@@ -0,0 +1,31 @@
+地區感知清單格式化
+==================
+
+``locale_parse`` 能格式化數字與日期,但依某語言的期望把一串項目串接起來——連接詞、是否有序列(牛津)逗號、
+兩項的特例——本身是個獨立的小問題。直接 ``", ".join`` 只會得到 ``"A, B, C"``,沒有「and/or」也沒有在地化。
+
+本功能為少數幾個地區實作 CLDR 的清單樣式組合(start/middle/end 加上兩項樣式),並支援連接(and)/選擇(or)/
+單位(unit)樣式。純標準函式庫;不匯入 ``PySide6``。每個函式皆為純函式,因此在 CI 中完全具決定性。
+
+無頭 API
+--------
+
+.. code-block:: python
+
+    from je_auto_control import format_list
+
+    format_list(["apple", "pear", "grape"])  # 'apple, pear, and grape'
+    format_list(["apple", "pear", "grape"], style="or")  # 'apple, pear, or grape'
+    format_list(["apple", "pear"], style="unit")  # 'apple, pear'
+    format_list(["manzana", "pera", "uva"], locale="es")  # 'manzana, pera y uva'
+    format_list(["A", "B", "C", "D"], locale="fr")  # 'A, B, C et D'
+
+``style`` 為 ``"and"``(連接)、``"or"``(選擇)或 ``"unit"``(僅以逗號分隔、無連接詞)。``locale`` 選擇連接詞與
+序列逗號規則(``en`` / ``es`` / ``fr`` / ``de`` / ``pt``;英文使用牛津逗號,其餘不使用;未知地區回退為英文)。
+一項、兩項與空清單皆以特例處理。未知的 ``style`` 會拋出 ``ValueError``。
+
+執行器命令
+----------
+
+``AC_format_list`` 接受 JSON 陣列並回傳 ``{text}``,可帶 ``style`` 與 ``locale``。它以 MCP 工具
+``ac_format_list`` 以及 Script Builder 中 **Data** 分類下的命令提供。
diff --git a/docs/source/Zh/zh_index.rst b/docs/source/Zh/zh_index.rst
index f604f80e..ae08cf99 100644
--- a/docs/source/Zh/zh_index.rst
+++ b/docs/source/Zh/zh_index.rst
@@ -134,6 +134,7 @@ AutoControl 所有功能的完整使用指南。
    doc/new_features/v109_features_doc
    doc/new_features/v110_features_doc
    doc/new_features/v111_features_doc
+   doc/new_features/v112_features_doc
    doc/ocr_backends/ocr_backends_doc
    doc/observability/observability_doc
    doc/operations_layer/operations_layer_doc
diff --git a/je_auto_control/__init__.py b/je_auto_control/__init__.py
index b627d1a4..b759c06f 100644
--- a/je_auto_control/__init__.py
+++ b/je_auto_control/__init__.py
@@ -235,6 +235,8 @@
     is_trojan_source, strip_bidi_controls,
 )
 from je_auto_control.utils.bidi_check import is_balanced as is_bidi_balanced
+# Locale-aware list formatting ("A, B, and C") in the style of CLDR
+from je_auto_control.utils.list_format import format_list
 # CI workflow annotations (GitHub Actions)
 from je_auto_control.utils.ci_annotations import (
     emit_annotations, format_annotation,
@@ -988,6 +990,7 @@ def start_autocontrol_gui(*args, **kwargs):
     "is_bidi_balanced",
     "is_trojan_source", "strip_bidi_controls",
+    "format_list",
     "emit_annotations", "format_annotation",
     "ClipboardHistory", "default_clipboard_history",
     "analyze_heal_log", "heal_stats", "scan_secrets",
diff --git a/je_auto_control/gui/script_builder/command_schema.py b/je_auto_control/gui/script_builder/command_schema.py
index caf33d5b..be454adf 100644
--- a/je_auto_control/gui/script_builder/command_schema.py
+++ b/je_auto_control/gui/script_builder/command_schema.py
@@ -2127,6 +2127,18 @@ def _add_resilience_specs(specs: List[CommandSpec]) -> None:
         ),
         description="Remove all bidirectional control characters from a string.",
     ))
+    specs.append(CommandSpec(
+        "AC_format_list", "Data", "Text: Format List",
+        fields=(
+            FieldSpec("items", FieldType.STRING,
+                      placeholder='["apple", "pear", "grape"]'),
+            FieldSpec("style", FieldType.STRING, optional=True,
+                      placeholder="and | or | unit"),
+            FieldSpec("locale", FieldType.STRING, optional=True,
+                      placeholder="en | es | fr | de | pt"),
+        ),
+        description="Join items into a localised list ('A, B, and C').",
+    ))
     specs.append(CommandSpec(
         "AC_diff_rows", "Data", "Dataset Diff: Rows by Key",
         fields=(
diff --git a/je_auto_control/utils/executor/action_executor.py b/je_auto_control/utils/executor/action_executor.py
index 57ab0c70..b5b6b47d 100644
--- a/je_auto_control/utils/executor/action_executor.py
+++ b/je_auto_control/utils/executor/action_executor.py
@@ -3011,6 +3011,16 @@ def _bidi_strip(text: str) -> Dict[str, Any]:
     return {"text": strip_bidi_controls(text)}
 
 
+def _format_list(items: Any, style: str = "and",
+                 locale: str = "en") -> Dict[str, Any]:
+    """Adapter: join items into a localised list string."""
+    import json
+    from je_auto_control.utils.list_format import format_list
+    if isinstance(items, str):
+        items = json.loads(items)
+    return {"text": format_list(list(items), style=style, locale=locale)}
+
+
 def _cas_put(name: str, key: str, value: Any,
              expected_version: Any = None) -> Dict[str, Any]:
     """Adapter: optimistic put into a named versioned store."""
@@ -4700,6 +4710,7 @@ def __init__(self):
             "AC_readability_report": _readability_report,
             "AC_bidi_check": _bidi_check,
             "AC_bidi_strip": _bidi_strip,
+            "AC_format_list": _format_list,
             "AC_detect_drift": _detect_drift,
             "AC_categorical_drift": _categorical_drift,
             "AC_diff_rows": _diff_rows,
diff --git a/je_auto_control/utils/list_format/__init__.py b/je_auto_control/utils/list_format/__init__.py
new file mode 100644
index 00000000..0984246c
--- /dev/null
+++ b/je_auto_control/utils/list_format/__init__.py
@@ -0,0 +1,4 @@
+"""Locale-aware list formatting ("A, B, and C") in the style of CLDR."""
+from je_auto_control.utils.list_format.list_format import format_list
+
+__all__ = ["format_list"]
diff --git a/je_auto_control/utils/list_format/list_format.py b/je_auto_control/utils/list_format/list_format.py
new file mode 100644
index 00000000..782cadf3
--- /dev/null
+++ b/je_auto_control/utils/list_format/list_format.py
@@ -0,0 +1,68 @@
+"""Locale-aware list formatting ("A, B, and C") in the style of CLDR.
+
+``locale_parse`` formats numbers/dates and ``message_format`` (planned) renders
+plural/select messages, but joining a list of items the way a language expects —
+the conjunction word, whether there is a serial/Oxford comma, the two-item
+special case — is its own small problem. Naive ``", ".join`` gives "A, B, C"
+with no "and"/"or" and no localisation.
+
+This implements the CLDR list-pattern composition (start/middle/end + a two-item
+pattern) for a handful of locales and the conjunction / disjunction / unit
+styles. Pure standard library; imports no ``PySide6``. Every function is pure, so
+it is fully deterministic in CI.
+"""
+from typing import Dict, List, Sequence
+
+# Conjunction word per (locale, style). Unit style uses no word at all.
+_CONJUNCTIONS: Dict[str, Dict[str, str]] = {
+    "en": {"and": "and", "or": "or"},
+    "es": {"and": "y", "or": "o"},
+    "fr": {"and": "et", "or": "ou"},
+    "de": {"and": "und", "or": "oder"},
+    "pt": {"and": "e", "or": "ou"},
+}
+# Locales that place a serial (Oxford) comma before the final conjunction.
+_SERIAL_COMMA = {"en"}
+_VALID_STYLES = ("and", "or", "unit")
+
+
+def _patterns(locale: str, style: str) -> Dict[str, str]:
+    """Return the ``two``/``start``/``middle``/``end`` patterns for a locale."""
+    pair = "{0}, {1}"
+    if style == "unit":
+        return {"two": pair, "start": pair, "middle": pair, "end": pair}
+    if locale not in _CONJUNCTIONS:  # unknown locale -> behave as English
+        locale = "en"
+    word = _CONJUNCTIONS[locale][style]
+    separator = ", " if locale in _SERIAL_COMMA else " "
+    return {
+        "two": f"{{0}} {word} {{1}}",
+        "start": pair,
+        "middle": pair,
+        "end": f"{{0}}{separator}{word} {{1}}",
+    }
+
+
+def format_list(items: Sequence[object], *, style: str = "and",
+                locale: str = "en") -> str:
+    """Join ``items`` into a localised list string.
+
+    ``style`` is ``"and"`` (conjunction), ``"or"`` (disjunction) or ``"unit"``
+    (comma-separated, no conjunction). ``locale`` selects the conjunction word
+    and serial-comma rule (``en``/``es``/``fr``/``de``/``pt``; unknown falls back
+    to English). Raises ``ValueError`` on an unknown ``style``.
+    """
+    if style not in _VALID_STYLES:
+        raise ValueError(f"unknown style: {style!r}")
+    values: List[str] = [str(item) for item in items]
+    if not values:
+        return ""
+    if len(values) == 1:
+        return values[0]
+    patterns = _patterns(locale, style)
+    if len(values) == 2:
+        return patterns["two"].format(values[0], values[1])
+    result = patterns["end"].format(values[-2], values[-1])
+    for index in range(len(values) - 3, 0, -1):
+        result = patterns["middle"].format(values[index], result)
+    return patterns["start"].format(values[0], result)
diff --git a/je_auto_control/utils/mcp_server/tools/_factories.py b/je_auto_control/utils/mcp_server/tools/_factories.py
index da5426df..30775162 100644
--- a/je_auto_control/utils/mcp_server/tools/_factories.py
+++ b/je_auto_control/utils/mcp_server/tools/_factories.py
@@ -3666,6 +3666,22 @@ def readability_tools() -> List[MCPTool]:
     ]
 
 
+def list_format_tools() -> List[MCPTool]:
+    return [
+        MCPTool(
+            name="ac_format_list",
+            description=("Join 'items' into a localised list string. 'style' "
+                         "and|or|unit; 'locale' en|es|fr|de|pt. Returns {text}."),
+            input_schema=schema(
+                {"items": {"type": "array", "items": {"type": "string"}},
+                 "style": {"type": "string"}, "locale": {"type": "string"}},
+                ["items"]),
+            handler=h.format_list,
+            annotations=READ_ONLY,
+        ),
+    ]
+
+
 def bidi_check_tools() -> List[MCPTool]:
     return [
         MCPTool(
@@ -5728,7 +5744,7 @@ def media_assert_tools() -> List[MCPTool]:
     timeseries_tools, anomaly_tools, smoothing_tools, idempotency_tools,
     dedup_window_tools, sequence_gap_tools, optimistic_tools, outbox_tools,
     locale_collation_tools, confusables_tools, readability_tools,
-    bidi_check_tools,
+    bidi_check_tools, list_format_tools,
     dataset_diff_tools, referential_tools, link_header_tools, multipart_tools,
     http_content_tools, cookie_jar_tools, http_conditional_tools, saga_tools,
     decision_table_tools, locator_repair_tools,
diff --git a/je_auto_control/utils/mcp_server/tools/_handlers.py b/je_auto_control/utils/mcp_server/tools/_handlers.py
index 4be2e362..fc330fe4 100644
--- a/je_auto_control/utils/mcp_server/tools/_handlers.py
+++ b/je_auto_control/utils/mcp_server/tools/_handlers.py
@@ -1997,6 +1997,11 @@ def bidi_strip(text):
     return _bidi_strip(text)
 
 
+def format_list(items, style="and", locale="en"):
+    from je_auto_control.utils.executor.action_executor import _format_list
+    return _format_list(items, style, locale)
+
+
 def detect_drift(reference, current, threshold=0.25, bins=10):
     from je_auto_control.utils.executor.action_executor import _detect_drift
     return _detect_drift(reference, current, threshold, bins)
diff --git a/test/unit_test/headless/test_list_format_batch.py b/test/unit_test/headless/test_list_format_batch.py
new file mode 100644
index 00000000..077e6364
--- /dev/null
+++ b/test/unit_test/headless/test_list_format_batch.py
@@ -0,0 +1,68 @@
+"""Headless tests for locale-aware list formatting. No Qt."""
+import json
+
+import pytest
+
+import je_auto_control as ac
+from je_auto_control.utils.list_format import format_list
+
+
+def test_english_oxford_comma():
+    assert format_list(["A", "B", "C"]) == "A, B, and C"
+    assert format_list(["A", "B", "C"], style="or") == "A, B, or C"
+
+
+def test_two_and_one_and_empty():
+    assert format_list(["A", "B"]) == "A and B"
+    assert format_list(["only"]) == "only"
+    assert format_list([]) == ""
+
+
+def test_unit_style_has_no_conjunction():
+    assert format_list(["A", "B", "C"], style="unit") == "A, B, C"
+    assert format_list(["A", "B"], style="unit") == "A, B"
+
+
+def test_other_locales_have_no_serial_comma():
+    assert format_list(["manzana", "pera", "uva"], locale="es") == \
+        "manzana, pera y uva"
+    assert format_list(["A", "B", "C", "D"], locale="fr") == "A, B, C et D"
+    assert format_list(["A", "B", "C"], locale="de", style="or") == "A, B oder C"
+
+
+def test_unknown_locale_falls_back_to_english():
+    assert format_list(["A", "B", "C"], locale="zz") == "A, B, and C"
+
+
+def test_unknown_style_raises():
+    with pytest.raises(ValueError):
+        format_list(["A", "B"], style="nope")
+
+
+def test_non_string_items_coerced():
+    assert format_list([1, 2, 3]) == "1, 2, and 3"
+
+
+# --- wiring ---------------------------------------------------------------
+
+def test_executor_round_trip():
+    rec = ac.execute_action([[
+        "AC_format_list",
+        {"items": json.dumps(["A", "B", "C"]), "style": "or"}]])
+    out = next(v for v in rec.values() if isinstance(v, dict))
+    assert out["text"] == "A, B, or C"
+
+
+def test_wiring():
+    known = ac.executor.known_commands()
+    assert "AC_format_list" in set(known)
+    from je_auto_control.utils.mcp_server.tools import build_default_tool_registry
+    names = {t.name for t in build_default_tool_registry()}
+    assert "ac_format_list" in names
+    from je_auto_control.gui.script_builder.command_schema import _build_specs
+    specs = {s.command for s in _build_specs()}
+    assert "AC_format_list" in specs
+
+
+def test_facade_exports():
+    assert hasattr(ac, "format_list") and "format_list" in ac.__all__
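
Reviewer note: the start/middle/end composition that `format_list` performs can be sketched on its own, without importing the package. The block below is a minimal illustration under stated assumptions, not the module's code: `compose`, `EN_AND`, and `ES_AND` are hypothetical names introduced here, and the pattern strings are assumptions chosen to match the outputs asserted in the tests in this patch.

```python
from typing import Dict, Sequence

# Hypothetical pattern tables for illustration only (not the module's data).
EN_AND: Dict[str, str] = {
    "two": "{0} and {1}",
    "start": "{0}, {1}",
    "middle": "{0}, {1}",
    "end": "{0}, and {1}",  # English keeps the serial (Oxford) comma
}
ES_AND: Dict[str, str] = {
    "two": "{0} y {1}",
    "start": "{0}, {1}",
    "middle": "{0}, {1}",
    "end": "{0} y {1}",  # no serial comma before the conjunction
}


def compose(values: Sequence[str], patterns: Dict[str, str]) -> str:
    """Fold values right-to-left through the end, middle, and start patterns."""
    if not values:
        return ""
    if len(values) == 1:
        return values[0]
    if len(values) == 2:
        return patterns["two"].format(values[0], values[1])
    # Seed with the last pair, then wrap each earlier item around the result.
    result = patterns["end"].format(values[-2], values[-1])
    for value in reversed(values[1:-2]):
        result = patterns["middle"].format(value, result)
    return patterns["start"].format(values[0], result)


print(compose(["A", "B", "C", "D"], EN_AND))  # A, B, C, and D
print(compose(["A", "B", "C"], ES_AND))       # A, B y C
```

Building from the right means only the `end` pattern ever carries the conjunction, so a locale differs from English purely by its pattern table. That is presumably why the patch keeps per-locale data to a conjunction word plus a serial-comma flag.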