7 changes: 7 additions & 0 deletions README.md
@@ -13,6 +13,7 @@

## Table of Contents

- [What's new (2026-06-22) — Locale-Aware List Formatting](#whats-new-2026-06-22--locale-aware-list-formatting)
- [What's new (2026-06-22) — Bidirectional-Text QA (Trojan-Source Scan)](#whats-new-2026-06-22--bidirectional-text-qa-trojan-source-scan)
- [What's new (2026-06-22) — Readability Scoring](#whats-new-2026-06-22--readability-scoring)
- [What's new (2026-06-22) — Confusable / Homoglyph Detection](#whats-new-2026-06-22--confusable--homoglyph-detection)
@@ -164,6 +165,12 @@

---

## What's new (2026-06-22) — Locale-Aware List Formatting

Join items the way a language expects ("A, B, and C"). Full reference: [`docs/source/Eng/doc/new_features/v112_features_doc.rst`](docs/source/Eng/doc/new_features/v112_features_doc.rst).

- **`format_list`** (`AC_format_list`): a naive `", ".join` gives "A, B, C" with no "and"/"or" and no localisation. This implements the CLDR list-pattern composition with conjunction / disjunction / unit styles and per-locale conjunction words + serial-comma rule (`en`/`es`/`fr`/`de`/`pt`) — `format_list(["a","b","c"])` → "a, b, and c", `locale="es"` → "a, b y c". Pure-stdlib, deterministic.

## What's new (2026-06-22) — Bidirectional-Text QA (Trojan-Source Scan)

Catch invisible Unicode directional formatting (RTL QA + Trojan-source). Full reference: [`docs/source/Eng/doc/new_features/v111_features_doc.rst`](docs/source/Eng/doc/new_features/v111_features_doc.rst).
7 changes: 7 additions & 0 deletions README/README_zh-CN.md
@@ -12,6 +12,7 @@

## 目录

- [本次更新 (2026-06-22) — 区域感知列表格式化](#本次更新-2026-06-22--区域感知列表格式化)
- [本次更新 (2026-06-22) — 双向文字 QA(Trojan-Source 扫描)](#本次更新-2026-06-22--双向文字-qatrojan-source-扫描)
- [本次更新 (2026-06-22) — 可读性评分](#本次更新-2026-06-22--可读性评分)
- [本次更新 (2026-06-22) — 易混淆字符 / 同形异义字检测](#本次更新-2026-06-22--易混淆字符--同形异义字检测)
@@ -167,6 +168,12 @@

平滑噪声值序列。完整参考:[`docs/source/Zh/doc/new_features/v102_features_doc.rst`](../docs/source/Zh/doc/new_features/v102_features_doc.rst)。

## 本次更新 (2026-06-22) — 区域感知列表格式化

依某语言的期望串接项目(「A、B and C」)。完整参考:[`docs/source/Zh/doc/new_features/v112_features_doc.rst`](../docs/source/Zh/doc/new_features/v112_features_doc.rst)。

- **`format_list`**(`AC_format_list`):直接 `", ".join` 只会得到「A, B, C」,没有「and/or」也没有在地化。本功能实作 CLDR 列表样式组合,支援连接(and)/选择(or)/单位(unit)样式,并依区域提供连接词与序列逗号规则(`en`/`es`/`fr`/`de`/`pt`)——`format_list(["a","b","c"])` → 「a, b, and c」,`locale="es"` → 「a, b y c」。纯标准库、确定。

## 本次更新 (2026-06-22) — 双向文字 QA(Trojan-Source 扫描)

抓出隐形的 Unicode 方向格式控制(RTL QA + Trojan-source)。完整参考:[`docs/source/Zh/doc/new_features/v111_features_doc.rst`](../docs/source/Zh/doc/new_features/v111_features_doc.rst)。
7 changes: 7 additions & 0 deletions README/README_zh-TW.md
@@ -12,6 +12,7 @@

## 目錄

- [本次更新 (2026-06-22) — 地區感知清單格式化](#本次更新-2026-06-22--地區感知清單格式化)
- [本次更新 (2026-06-22) — 雙向文字 QA(Trojan-Source 掃描)](#本次更新-2026-06-22--雙向文字-qatrojan-source-掃描)
- [本次更新 (2026-06-22) — 可讀性評分](#本次更新-2026-06-22--可讀性評分)
- [本次更新 (2026-06-22) — 易混淆字元 / 同形異義字偵測](#本次更新-2026-06-22--易混淆字元--同形異義字偵測)
@@ -167,6 +168,12 @@

平滑雜訊值序列。完整參考:[`docs/source/Zh/doc/new_features/v102_features_doc.rst`](../docs/source/Zh/doc/new_features/v102_features_doc.rst)。

## 本次更新 (2026-06-22) — 地區感知清單格式化

依某語言的期望串接項目(「A、B and C」)。完整參考:[`docs/source/Zh/doc/new_features/v112_features_doc.rst`](../docs/source/Zh/doc/new_features/v112_features_doc.rst)。

- **`format_list`**(`AC_format_list`):直接 `", ".join` 只會得到「A, B, C」,沒有「and/or」也沒有在地化。本功能實作 CLDR 清單樣式組合,支援連接(and)/選擇(or)/單位(unit)樣式,並依地區提供連接詞與序列逗號規則(`en`/`es`/`fr`/`de`/`pt`)——`format_list(["a","b","c"])` → 「a, b, and c」,`locale="es"` → 「a, b y c」。純標準函式庫、具決定性。

## 本次更新 (2026-06-22) — 雙向文字 QA(Trojan-Source 掃描)

抓出隱形的 Unicode 方向格式控制(RTL QA + Trojan-source)。完整參考:[`docs/source/Zh/doc/new_features/v111_features_doc.rst`](../docs/source/Zh/doc/new_features/v111_features_doc.rst)。
39 changes: 39 additions & 0 deletions docs/source/Eng/doc/new_features/v112_features_doc.rst
@@ -0,0 +1,39 @@
Locale-Aware List Formatting
============================

``locale_parse`` formats numbers and dates, but joining a list of items the way a
language expects — the conjunction word, whether there is a serial/Oxford comma,
the two-item special case — is its own small problem. A naive ``", ".join`` gives
``"A, B, C"`` with no "and"/"or" and no localisation.

This implements the CLDR list-pattern composition (start/middle/end plus a
two-item pattern) for a handful of locales and the conjunction / disjunction /
unit styles. Pure standard library; imports no ``PySide6``. Every function is
pure, so it is fully deterministic in CI.
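
The start/middle/end composition described above can be sketched in isolation. This is a minimal illustration, not the module's code: ``compose`` and the two hand-written pattern tables are names assumed here for the example, and the sketch only covers lists of three or more items.

```python
from typing import Dict, Sequence


def compose(values: Sequence[str], patterns: Dict[str, str]) -> str:
    """Fold three or more values right-to-left through end/middle/start."""
    if len(values) < 3:
        raise ValueError("compose() expects at least three values")
    # The 'end' pattern joins the last two items and carries the conjunction.
    result = patterns["end"].format(values[-2], values[-1])
    # Each 'middle' step prepends one more item to the text built so far.
    for index in range(len(values) - 3, 0, -1):
        result = patterns["middle"].format(values[index], result)
    # The 'start' pattern attaches the first item.
    return patterns["start"].format(values[0], result)


# Pattern tables written out by hand for illustration (assumed, not CLDR data files).
ENGLISH_AND = {"start": "{0}, {1}", "middle": "{0}, {1}", "end": "{0}, and {1}"}
SPANISH_AND = {"start": "{0}, {1}", "middle": "{0}, {1}", "end": "{0} y {1}"}

print(compose(["A", "B", "C", "D"], ENGLISH_AND))        # A, B, C, and D
print(compose(["manzana", "pera", "uva"], SPANISH_AND))  # manzana, pera y uva
```

Only the ``end`` pattern differs between the two tables, which is why a per-locale conjunction word plus a serial-comma flag is enough to describe the supported locales.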

Headless API
------------

.. code-block:: python

    from je_auto_control import format_list

    format_list(["apple", "pear", "grape"])                # 'apple, pear, and grape'
    format_list(["apple", "pear", "grape"], style="or")    # 'apple, pear, or grape'
    format_list(["apple", "pear"], style="unit")           # 'apple, pear'
    format_list(["manzana", "pera", "uva"], locale="es")   # 'manzana, pera y uva'
    format_list(["A", "B", "C", "D"], locale="fr")         # 'A, B, C et D'

``style`` is ``"and"`` (conjunction), ``"or"`` (disjunction) or ``"unit"``
(comma-separated, no conjunction). ``locale`` selects the conjunction word and
the serial-comma rule (``en`` / ``es`` / ``fr`` / ``de`` / ``pt``; English uses
the Oxford comma, the others do not; an unknown locale falls back to English).
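The empty list returns ``""``, a single item is returned unchanged, and two
items use the two-item pattern (``A and B``; ``A, B`` for ``unit``). Items are
converted with ``str()``. ``ValueError`` is raised for an unknown ``style``.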

Executor commands
-----------------

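``AC_format_list`` takes ``items`` as a JSON array, or as a string holding a
JSON-encoded array, and returns ``{text}``. ``style`` and ``locale`` are optional
and default to ``"and"`` and ``"en"``. It is exposed as the MCP tool
``ac_format_list`` and as a Script Builder command under **Data**.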
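
As a rough sketch of what an action for this command looks like, the snippet below builds the ``[command, kwargs]`` pair used in this PR's executor test. ``build_format_list_action`` is a hypothetical helper introduced only for illustration; it is not part of the package.

```python
import json
from typing import Any, Dict, Iterable, List


def build_format_list_action(items: Iterable[Any], style: str = "and",
                             locale: str = "en") -> List[Any]:
    """Hypothetical helper: build one [command, kwargs] action."""
    kwargs: Dict[str, str] = {
        # The Script Builder field is a string, so the array is JSON-encoded.
        "items": json.dumps(list(items)),
        "style": style,
        "locale": locale,
    }
    return ["AC_format_list", kwargs]


actions = [build_format_list_action(["A", "B", "C"], style="or")]
print(json.dumps(actions, indent=2))

# With the package installed, the PR's test passes a list of this shape to
# je_auto_control.execute_action(actions) and reads "text" from the result.
```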
1 change: 1 addition & 0 deletions docs/source/Eng/eng_index.rst
@@ -134,6 +134,7 @@ Comprehensive guides for all AutoControl features.
doc/new_features/v109_features_doc
doc/new_features/v110_features_doc
doc/new_features/v111_features_doc
doc/new_features/v112_features_doc
doc/ocr_backends/ocr_backends_doc
doc/observability/observability_doc
doc/operations_layer/operations_layer_doc
31 changes: 31 additions & 0 deletions docs/source/Zh/doc/new_features/v112_features_doc.rst
@@ -0,0 +1,31 @@
地區感知清單格式化
==================

``locale_parse`` 能格式化數字與日期,但依某語言的期望把一串項目串接起來——連接詞、是否有序列(牛津)逗號、
兩項的特例——本身是個獨立的小問題。直接 ``", ".join`` 只會得到 ``"A, B, C"``,沒有「and/or」也沒有在地化。

本功能為少數幾個地區實作 CLDR 的清單樣式組合(start/middle/end 加上兩項樣式),並支援連接(and)/選擇(or)/
單位(unit)樣式。純標準函式庫;不匯入 ``PySide6``。每個函式皆為純函式,因此在 CI 中完全具決定性。

無頭 API
--------

.. code-block:: python

    from je_auto_control import format_list

    format_list(["apple", "pear", "grape"])                # 'apple, pear, and grape'
    format_list(["apple", "pear", "grape"], style="or")    # 'apple, pear, or grape'
    format_list(["apple", "pear"], style="unit")           # 'apple, pear'
    format_list(["manzana", "pera", "uva"], locale="es")   # 'manzana, pera y uva'
    format_list(["A", "B", "C", "D"], locale="fr")         # 'A, B, C et D'

``style`` 為 ``"and"``(連接)、``"or"``(選擇)或 ``"unit"``(僅以逗號分隔、無連接詞)。``locale`` 選擇連接詞與
序列逗號規則(``en`` / ``es`` / ``fr`` / ``de`` / ``pt``;英文使用牛津逗號,其餘不使用;未知地區回退為英文)。
一項、兩項與空清單皆以特例處理。未知的 ``style`` 會拋出 ``ValueError``。

執行器命令
----------

``AC_format_list`` 接受 JSON 陣列並回傳 ``{text}``,可帶 ``style`` 與 ``locale``。它以 MCP 工具
``ac_format_list`` 以及 Script Builder 中 **Data** 分類下的命令提供。
1 change: 1 addition & 0 deletions docs/source/Zh/zh_index.rst
@@ -134,6 +134,7 @@ AutoControl 所有功能的完整使用指南。
doc/new_features/v109_features_doc
doc/new_features/v110_features_doc
doc/new_features/v111_features_doc
doc/new_features/v112_features_doc
doc/ocr_backends/ocr_backends_doc
doc/observability/observability_doc
doc/operations_layer/operations_layer_doc
3 changes: 3 additions & 0 deletions je_auto_control/__init__.py
@@ -235,6 +235,8 @@
    is_trojan_source, strip_bidi_controls,
)
from je_auto_control.utils.bidi_check import is_balanced as is_bidi_balanced
# Locale-aware list formatting ("A, B, and C") in the style of CLDR
from je_auto_control.utils.list_format import format_list
# CI workflow annotations (GitHub Actions)
from je_auto_control.utils.ci_annotations import (
    emit_annotations, format_annotation,
@@ -988,6 +990,7 @@ def start_autocontrol_gui(*args, **kwargs):
"is_bidi_balanced",
"is_trojan_source",
"strip_bidi_controls",
"format_list",
"emit_annotations", "format_annotation",
"ClipboardHistory", "default_clipboard_history",
"analyze_heal_log", "heal_stats", "scan_secrets",
12 changes: 12 additions & 0 deletions je_auto_control/gui/script_builder/command_schema.py
@@ -2127,6 +2127,18 @@ def _add_resilience_specs(specs: List[CommandSpec]) -> None:
        ),
        description="Remove all bidirectional control characters from a string.",
    ))
    specs.append(CommandSpec(
        "AC_format_list", "Data", "Text: Format List",
        fields=(
            FieldSpec("items", FieldType.STRING,
                      placeholder='["apple", "pear", "grape"]'),
            FieldSpec("style", FieldType.STRING, optional=True,
                      placeholder="and | or | unit"),
            FieldSpec("locale", FieldType.STRING, optional=True,
                      placeholder="en | es | fr | de | pt"),
        ),
        description="Join items into a localised list ('A, B, and C').",
    ))
    specs.append(CommandSpec(
        "AC_diff_rows", "Data", "Dataset Diff: Rows by Key",
        fields=(
11 changes: 11 additions & 0 deletions je_auto_control/utils/executor/action_executor.py
@@ -3011,6 +3011,16 @@ def _bidi_strip(text: str) -> Dict[str, Any]:
    return {"text": strip_bidi_controls(text)}


def _format_list(items: Any, style: str = "and",
                 locale: str = "en") -> Dict[str, Any]:
    """Adapter: join items into a localised list string."""
    import json
    from je_auto_control.utils.list_format import format_list
    if isinstance(items, str):
        items = json.loads(items)
    return {"text": format_list(list(items), style=style, locale=locale)}
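
Review note: the adapter above accepts any JSON value, so a JSON object such as `{"a": 1}` would be formatted from its keys and a bare JSON string from its characters. One possible tightening, sketched below under the assumption that only arrays should be accepted, is a small normalising helper. `coerce_items` is a hypothetical name and is not part of this PR.

```python
import json
from typing import Any, List


def coerce_items(items: Any) -> List[Any]:
    """Hypothetical helper: normalise 'items' to a list, rejecting non-arrays."""
    if isinstance(items, str):
        # A string is treated as JSON text; invalid JSON raises
        # json.JSONDecodeError, which is a subclass of ValueError.
        items = json.loads(items)
    # After decoding, strings, bytes, mappings and scalars are not arrays.
    if isinstance(items, (str, bytes, dict)) or not hasattr(items, "__iter__"):
        raise TypeError(
            f"items must be a JSON array, got {type(items).__name__}")
    return list(items)


print(coerce_items('["A", "B", "C"]'))  # ['A', 'B', 'C']
print(coerce_items(("A", "B")))         # ['A', 'B']
```

Whether stricter input handling is wanted here is a design choice for the author; the existing behaviour is consistent with the tests in this PR.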


def _cas_put(name: str, key: str, value: Any,
             expected_version: Any = None) -> Dict[str, Any]:
    """Adapter: optimistic put into a named versioned store."""
@@ -4700,6 +4710,7 @@ def __init__(self):
"AC_readability_report": _readability_report,
"AC_bidi_check": _bidi_check,
"AC_bidi_strip": _bidi_strip,
"AC_format_list": _format_list,
"AC_detect_drift": _detect_drift,
"AC_categorical_drift": _categorical_drift,
"AC_diff_rows": _diff_rows,
4 changes: 4 additions & 0 deletions je_auto_control/utils/list_format/__init__.py
@@ -0,0 +1,4 @@
"""Locale-aware list formatting ("A, B, and C") in the style of CLDR."""
from je_auto_control.utils.list_format.list_format import format_list

__all__ = ["format_list"]
68 changes: 68 additions & 0 deletions je_auto_control/utils/list_format/list_format.py
@@ -0,0 +1,68 @@
"""Locale-aware list formatting ("A, B, and C") in the style of CLDR.

``locale_parse`` formats numbers/dates and ``message_format`` (planned) renders
plural/select messages, but joining a list of items the way a language expects —
the conjunction word, whether there is a serial/Oxford comma, the two-item
special case — is its own small problem. Naive ``", ".join`` gives "A, B, C"
with no "and"/"or" and no localisation.

This implements the CLDR list-pattern composition (start/middle/end + a two-item
pattern) for a handful of locales and the conjunction / disjunction / unit
styles. Pure standard library; imports no ``PySide6``. Every function is pure, so
it is fully deterministic in CI.
"""
from typing import Dict, List, Sequence

# Conjunction word per (locale, style). Unit style uses no word at all.
_CONJUNCTIONS: Dict[str, Dict[str, str]] = {
    "en": {"and": "and", "or": "or"},
    "es": {"and": "y", "or": "o"},
    "fr": {"and": "et", "or": "ou"},
    "de": {"and": "und", "or": "oder"},
    "pt": {"and": "e", "or": "ou"},
}
# Locales that place a serial (Oxford) comma before the final conjunction.
_SERIAL_COMMA = {"en"}
_VALID_STYLES = ("and", "or", "unit")


def _patterns(locale: str, style: str) -> Dict[str, str]:
    """Return the ``two``/``start``/``middle``/``end`` patterns for a locale."""
    pair = "{0}, {1}"
    if style == "unit":
        return {"two": pair, "start": pair, "middle": pair, "end": pair}
    if locale not in _CONJUNCTIONS:  # unknown locale -> behave as English
        locale = "en"
    word = _CONJUNCTIONS[locale][style]
    separator = ", " if locale in _SERIAL_COMMA else " "
    return {
        "two": f"{{0}} {word} {{1}}",
        "start": pair,
        "middle": pair,
        "end": f"{{0}}{separator}{word} {{1}}",
    }


def format_list(items: Sequence[object], *, style: str = "and",
                locale: str = "en") -> str:
    """Join ``items`` into a localised list string.

    ``style`` is ``"and"`` (conjunction), ``"or"`` (disjunction) or ``"unit"``
    (comma-separated, no conjunction). ``locale`` selects the conjunction word
    and serial-comma rule (``en``/``es``/``fr``/``de``/``pt``; unknown falls back
    to English). Raises ``ValueError`` on an unknown ``style``.
    """
    if style not in _VALID_STYLES:
        raise ValueError(f"unknown style: {style!r}")
    values: List[str] = [str(item) for item in items]
    if not values:
        return ""
    if len(values) == 1:
        return values[0]
    patterns = _patterns(locale, style)
    if len(values) == 2:
        return patterns["two"].format(values[0], values[1])
    result = patterns["end"].format(values[-2], values[-1])
    for index in range(len(values) - 3, 0, -1):
        result = patterns["middle"].format(values[index], result)
    return patterns["start"].format(values[0], result)
18 changes: 17 additions & 1 deletion je_auto_control/utils/mcp_server/tools/_factories.py
@@ -3666,6 +3666,22 @@ def readability_tools() -> List[MCPTool]:
    ]


def list_format_tools() -> List[MCPTool]:
    return [
        MCPTool(
            name="ac_format_list",
            description=("Join 'items' into a localised list string. 'style' "
                         "and|or|unit; 'locale' en|es|fr|de|pt. Returns {text}."),
            input_schema=schema(
                {"items": {"type": "array", "items": {"type": "string"}},
                 "style": {"type": "string"}, "locale": {"type": "string"}},
                ["items"]),
            handler=h.format_list,
            annotations=READ_ONLY,
        ),
    ]


def bidi_check_tools() -> List[MCPTool]:
    return [
        MCPTool(
Expand Down Expand Up @@ -5728,7 +5744,7 @@ def media_assert_tools() -> List[MCPTool]:
timeseries_tools, anomaly_tools, smoothing_tools, idempotency_tools,
dedup_window_tools, sequence_gap_tools, optimistic_tools, outbox_tools,
locale_collation_tools, confusables_tools, readability_tools,
-bidi_check_tools,
+bidi_check_tools, list_format_tools,
dataset_diff_tools, referential_tools, link_header_tools, multipart_tools,
http_content_tools, cookie_jar_tools, http_conditional_tools,
saga_tools, decision_table_tools, locator_repair_tools,
5 changes: 5 additions & 0 deletions je_auto_control/utils/mcp_server/tools/_handlers.py
@@ -1997,6 +1997,11 @@ def bidi_strip(text):
    return _bidi_strip(text)


def format_list(items, style="and", locale="en"):
    from je_auto_control.utils.executor.action_executor import _format_list
    return _format_list(items, style, locale)


def detect_drift(reference, current, threshold=0.25, bins=10):
    from je_auto_control.utils.executor.action_executor import _detect_drift
    return _detect_drift(reference, current, threshold, bins)
68 changes: 68 additions & 0 deletions test/unit_test/headless/test_list_format_batch.py
@@ -0,0 +1,68 @@
"""Headless tests for locale-aware list formatting. No Qt."""
import json

import pytest

import je_auto_control as ac
from je_auto_control.utils.list_format import format_list


def test_english_oxford_comma():
    assert format_list(["A", "B", "C"]) == "A, B, and C"
    assert format_list(["A", "B", "C"], style="or") == "A, B, or C"


def test_two_and_one_and_empty():
    assert format_list(["A", "B"]) == "A and B"
    assert format_list(["only"]) == "only"
    assert format_list([]) == ""


def test_unit_style_has_no_conjunction():
    assert format_list(["A", "B", "C"], style="unit") == "A, B, C"
    assert format_list(["A", "B"], style="unit") == "A, B"


def test_other_locales_have_no_serial_comma():
    assert format_list(["manzana", "pera", "uva"], locale="es") == \
        "manzana, pera y uva"
    assert format_list(["A", "B", "C", "D"], locale="fr") == "A, B, C et D"
    assert format_list(["A", "B", "C"], locale="de", style="or") == "A, B oder C"


def test_unknown_locale_falls_back_to_english():
    assert format_list(["A", "B", "C"], locale="zz") == "A, B, and C"


def test_unknown_style_raises():
    with pytest.raises(ValueError):
        format_list(["A", "B"], style="nope")


def test_non_string_items_coerced():
    assert format_list([1, 2, 3]) == "1, 2, and 3"


# --- wiring ---------------------------------------------------------------

def test_executor_round_trip():
    rec = ac.execute_action([[
        "AC_format_list",
        {"items": json.dumps(["A", "B", "C"]), "style": "or"}]])
    out = next(v for v in rec.values() if isinstance(v, dict))
    assert out["text"] == "A, B, or C"


def test_wiring():
    known = ac.executor.known_commands()
    assert "AC_format_list" in set(known)
    from je_auto_control.utils.mcp_server.tools import build_default_tool_registry
    names = {t.name for t in build_default_tool_registry()}
    assert "ac_format_list" in names
    from je_auto_control.gui.script_builder.command_schema import _build_specs
    specs = {s.command for s in _build_specs()}
    assert "AC_format_list" in specs


def test_facade_exports():
    assert hasattr(ac, "format_list") and "format_list" in ac.__all__