diff --git a/README.md b/README.md
index d0f3f09e..f5e44b84 100644
--- a/README.md
+++ b/README.md
@@ -13,6 +13,7 @@

 ## Table of Contents

+- [What's new (2026-06-22) — GNU gettext Catalog I/O (.po / .mo)](#whats-new-2026-06-22--gnu-gettext-catalog-io-po--mo)
 - [What's new (2026-06-22) — ICU-lite MessageFormat (Plural / Select)](#whats-new-2026-06-22--icu-lite-messageformat-plural--select)
 - [What's new (2026-06-22) — Locale-Aware List Formatting](#whats-new-2026-06-22--locale-aware-list-formatting)
 - [What's new (2026-06-22) — Bidirectional-Text QA (Trojan-Source Scan)](#whats-new-2026-06-22--bidirectional-text-qa-trojan-source-scan)
@@ -166,6 +167,12 @@

 ---

+## What's new (2026-06-22) — GNU gettext Catalog I/O (.po / .mo)
+
+Read/compile the de-facto translation format. Full reference: [`docs/source/Eng/doc/new_features/v114_features_doc.rst`](docs/source/Eng/doc/new_features/v114_features_doc.rst).
+
+- **`parse_po` / `read_mo` / `GettextCatalog` / `parse_po_file` / `read_mo_file`** (`AC_gettext_translate`, `AC_gettext_ngettext`): the repo pseudo-localises and renders ICU messages but couldn't read GNU gettext `.po`/`.mo`. This parses `.po` (contexts, plurals, the `Plural-Forms` header via `gettext.c2py`), compiles a standards-compliant `.mo` that Python's own `gettext.GNUTranslations` loads, and exposes `gettext`/`ngettext`/`pgettext`. Pure-stdlib, deterministic.
+
 ## What's new (2026-06-22) — ICU-lite MessageFormat (Plural / Select)

 Render count-aware localised messages. Full reference: [`docs/source/Eng/doc/new_features/v113_features_doc.rst`](docs/source/Eng/doc/new_features/v113_features_doc.rst).
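Reviewer note on the `Plural-Forms` handling mentioned in the README bullet above: the header's `plural=` clause is a C expression whose value is the `msgstr[i]` index to use for a count `n`. The snippet below is a minimal, hypothetical helper (`plural_index` and `POLISH` are illustrative names, not part of this PR) that mirrors the idea behind `_apply_plural_forms` plus `ngettext`. It assumes `gettext.c2py` is available; that helper ships with CPython but is not part of the documented `gettext` API.

```python
import re
from gettext import c2py  # stdlib helper that compiles a C plural expression


def plural_index(plural_forms: str, n: int) -> int:
    """Return the msgstr[i] index that a Plural-Forms header selects for n."""
    match = re.search(r"plural\s*=\s*([^;]+)", plural_forms)
    # Fall back to the English rule when the header has no plural= clause.
    expr = match.group(1).strip() if match else "n != 1"
    return c2py(expr)(n)


# Three-form rule (one / few / many) as used for Polish in GNU gettext.
POLISH = ("nplurals=3; plural=(n==1 ? 0 : n%10>=2 && n%10<=4 && "
          "(n%100<10 || n%100>=20) ? 1 : 2);")

print([plural_index(POLISH, n) for n in (1, 2, 5, 22, 112)])  # [0, 1, 2, 1, 2]
```

The `nplurals` regex never matches `plural\s*=` because `nplurals` is followed by `s`, not `=`, so only the real `plural=` clause is captured.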
diff --git a/README/README_zh-CN.md b/README/README_zh-CN.md
index fa7c5706..906cce47 100644
--- a/README/README_zh-CN.md
+++ b/README/README_zh-CN.md
@@ -12,6 +12,7 @@

 ## 目录

+- [本次更新 (2026-06-22) — GNU gettext 目录 I/O(.po / .mo)](#本次更新-2026-06-22--gnu-gettext-目录-iopo--mo)
 - [本次更新 (2026-06-22) — ICU-lite MessageFormat(复数 / 选择)](#本次更新-2026-06-22--icu-lite-messageformat复数--选择)
 - [本次更新 (2026-06-22) — 区域感知列表格式化](#本次更新-2026-06-22--区域感知列表格式化)
 - [本次更新 (2026-06-22) — 双向文字 QA(Trojan-Source 扫描)](#本次更新-2026-06-22--双向文字-qatrojan-source-扫描)
@@ -169,6 +170,12 @@

 平滑噪声值序列。完整参考:[`docs/source/Zh/doc/new_features/v102_features_doc.rst`](../docs/source/Zh/doc/new_features/v102_features_doc.rst)。

+## 本次更新 (2026-06-22) — GNU gettext 目录 I/O(.po / .mo)
+
+读取/编译事实标准翻译格式。完整参考:[`docs/source/Zh/doc/new_features/v114_features_doc.rst`](../docs/source/Zh/doc/new_features/v114_features_doc.rst)。
+
+- **`parse_po` / `read_mo` / `GettextCatalog` / `parse_po_file` / `read_mo_file`**(`AC_gettext_translate`、`AC_gettext_ngettext`):本项目能伪在地化并渲染 ICU 消息,却无法读取 GNU gettext `.po`/`.mo`。本功能解析 `.po`(上下文、复数、以 `gettext.c2py` 处理 `Plural-Forms` 标头)、编译可被 Python 内建 `gettext.GNUTranslations` 载入的标准 `.mo`,并提供 `gettext`/`ngettext`/`pgettext`。纯标准库、具确定性。
+
 ## 本次更新 (2026-06-22) — ICU-lite MessageFormat(复数 / 选择)

 渲染依数量变化的在地化消息。完整参考:[`docs/source/Zh/doc/new_features/v113_features_doc.rst`](../docs/source/Zh/doc/new_features/v113_features_doc.rst)。
diff --git a/README/README_zh-TW.md b/README/README_zh-TW.md
index 37ecf28a..81cb4b9f 100644
--- a/README/README_zh-TW.md
+++ b/README/README_zh-TW.md
@@ -12,6 +12,7 @@

 ## 目錄

+- [本次更新 (2026-06-22) — GNU gettext 目錄 I/O(.po / .mo)](#本次更新-2026-06-22--gnu-gettext-目錄-iopo--mo)
 - [本次更新 (2026-06-22) — ICU-lite MessageFormat(複數 / 選擇)](#本次更新-2026-06-22--icu-lite-messageformat複數--選擇)
 - [本次更新 (2026-06-22) — 地區感知清單格式化](#本次更新-2026-06-22--地區感知清單格式化)
 - [本次更新 (2026-06-22) — 雙向文字 QA(Trojan-Source 掃描)](#本次更新-2026-06-22--雙向文字-qatrojan-source-掃描)
@@ -169,6 +170,12 @@

 平滑雜訊值序列。完整參考:[`docs/source/Zh/doc/new_features/v102_features_doc.rst`](../docs/source/Zh/doc/new_features/v102_features_doc.rst)。

+## 本次更新 (2026-06-22) — GNU gettext 目錄 I/O(.po / .mo)
+
+讀取/編譯事實標準翻譯格式。完整參考:[`docs/source/Zh/doc/new_features/v114_features_doc.rst`](../docs/source/Zh/doc/new_features/v114_features_doc.rst)。
+
+- **`parse_po` / `read_mo` / `GettextCatalog` / `parse_po_file` / `read_mo_file`**(`AC_gettext_translate`、`AC_gettext_ngettext`):本專案能偽在地化並渲染 ICU 訊息,卻無法讀取 GNU gettext `.po`/`.mo`。本功能解析 `.po`(上下文、複數、以 `gettext.c2py` 處理 `Plural-Forms` 標頭)、編譯可被 Python 內建 `gettext.GNUTranslations` 載入的標準 `.mo`,並提供 `gettext`/`ngettext`/`pgettext`。純標準函式庫、具決定性。
+
 ## 本次更新 (2026-06-22) — ICU-lite MessageFormat(複數 / 選擇)

 渲染依數量變化的在地化訊息。完整參考:[`docs/source/Zh/doc/new_features/v113_features_doc.rst`](../docs/source/Zh/doc/new_features/v113_features_doc.rst)。
diff --git a/docs/source/Eng/doc/new_features/v114_features_doc.rst b/docs/source/Eng/doc/new_features/v114_features_doc.rst
new file mode 100644
index 00000000..b5ba22d2
--- /dev/null
+++ b/docs/source/Eng/doc/new_features/v114_features_doc.rst
@@ -0,0 +1,48 @@
+GNU gettext Catalog I/O (.po / .mo)
+===================================
+
+The repo has ``i18n_test`` (pseudo-localisation, catalog placeholder checks) and
+``message_format`` (ICU rendering) but no reader for the *de-facto* translation
+format — GNU gettext ``.po`` / ``.mo``. This parses ``.po`` text (contexts,
+plurals, multi-line strings, escapes, the ``Plural-Forms`` header), compiles the
+binary ``.mo`` (the same little-endian format Python's own ``gettext`` reads) and
+exposes ``gettext`` / ``ngettext`` / ``pgettext`` lookups.
+
+Pure standard library (``re`` / ``struct`` / ``gettext.c2py`` for the plural
+expression); imports no ``PySide6``. Parsing and compilation are pure data
+in / data out, so they are fully deterministic in CI.
+
+Headless API
+------------
+
+.. code-block:: python
+
+    from je_auto_control import parse_po, read_mo, GettextCatalog
+
+    catalog = parse_po(po_text)
+    catalog.gettext("Hello")                # 'Hola'
+    catalog.ngettext("file", "files", 3)    # 'archivos'
+    catalog.pgettext("menu", "Open")        # 'Abrir'
+
+    mo_bytes = catalog.to_mo_bytes()        # compile_mo(path) writes these bytes, returns the path
+    same = read_mo(mo_bytes)                # round-trips, incl. plural rules
+
+    # build one by hand
+    cat = GettextCatalog()
+    cat.add("apple", ["pomme", "pommes"], plural_id="apples")
+
+``gettext`` returns the translation (or the source ``msgid`` when untranslated);
+``ngettext`` evaluates the catalog's ``Plural-Forms`` expression (via
+``gettext.c2py``) to pick the right form for ``n``; ``pgettext`` adds a
+disambiguation context. ``to_mo_bytes`` returns (and ``compile_mo`` writes to
+a path) a standards-compliant ``.mo`` that Python's own ``gettext.GNUTranslations``
+can load; ``read_mo`` / ``read_mo_file`` parse one back (little- or big-endian).
+
+Executor commands
+-----------------
+
+``AC_gettext_translate`` parses an inline ``.po`` string and returns ``{text}``
+for a ``msgid`` (optional ``context``); ``AC_gettext_ngettext`` returns the
+plural-correct ``{text}`` for a count ``n``. Both are exposed as MCP tools
+(``ac_gettext_translate`` / ``ac_gettext_ngettext``) and as Script Builder
+commands under **Data**.
diff --git a/docs/source/Eng/eng_index.rst b/docs/source/Eng/eng_index.rst
index f22f4ae4..77ff21a5 100644
--- a/docs/source/Eng/eng_index.rst
+++ b/docs/source/Eng/eng_index.rst
@@ -136,6 +136,7 @@ Comprehensive guides for all AutoControl features.
     doc/new_features/v111_features_doc
     doc/new_features/v112_features_doc
     doc/new_features/v113_features_doc
+    doc/new_features/v114_features_doc
     doc/ocr_backends/ocr_backends_doc
     doc/observability/observability_doc
     doc/operations_layer/operations_layer_doc
diff --git a/docs/source/Zh/doc/new_features/v114_features_doc.rst b/docs/source/Zh/doc/new_features/v114_features_doc.rst
new file mode 100644
index 00000000..d7ddffb1
--- /dev/null
+++ b/docs/source/Zh/doc/new_features/v114_features_doc.rst
@@ -0,0 +1,41 @@
+GNU gettext 目錄 I/O(.po / .mo)
+===============================
+
+本專案已有 ``i18n_test``(偽在地化、目錄佔位符檢查)與 ``message_format``(ICU 渲染),但沒有讀取*事實標準*
+翻譯格式 GNU gettext ``.po`` / ``.mo`` 的工具。本功能解析 ``.po`` 文字(上下文、複數、多行字串、跳脫、
+``Plural-Forms`` 標頭)、編譯二進位 ``.mo``(與 Python 內建 ``gettext`` 讀取的小端格式相同),並提供
+``gettext`` / ``ngettext`` / ``pgettext`` 查詢。
+
+純標準函式庫(``re`` / ``struct`` / 以 ``gettext.c2py`` 處理複數運算式);不匯入 ``PySide6``。解析與編譯皆為
+純資料進/資料出,因此在 CI 中完全具決定性。
+
+無頭 API
+--------
+
+.. code-block:: python
+
+    from je_auto_control import parse_po, read_mo, GettextCatalog
+
+    catalog = parse_po(po_text)
+    catalog.gettext("Hello")                # 'Hola'
+    catalog.ngettext("file", "files", 3)    # 'archivos'
+    catalog.pgettext("menu", "Open")        # 'Abrir'
+
+    mo_bytes = catalog.to_mo_bytes()        # compile_mo(path) 會寫入相同位元組並回傳路徑
+    same = read_mo(mo_bytes)                # 可往返,含複數規則
+
+    # 以程式手動建立
+    cat = GettextCatalog()
+    cat.add("apple", ["pomme", "pommes"], plural_id="apples")
+
+``gettext`` 回傳翻譯(未翻譯時回傳原始 ``msgid``);``ngettext`` 評估目錄的 ``Plural-Forms`` 運算式
+(透過 ``gettext.c2py``)以為 ``n`` 選擇正確形式;``pgettext`` 加入消歧上下文。``to_mo_bytes`` / ``compile_mo``
+產生符合標準、可被 Python 內建 ``gettext.GNUTranslations`` 載入的 ``.mo``,而 ``read_mo`` / ``read_mo_file``
+可反向解析(小端或大端)。
+
+執行器命令
+----------
+
+``AC_gettext_translate`` 解析內嵌 ``.po`` 字串並回傳某 ``msgid``(可帶 ``context``)的 ``{text}``;
+``AC_gettext_ngettext`` 回傳計數 ``n`` 的複數正確 ``{text}``。兩者皆以 MCP 工具(``ac_gettext_translate`` /
+``ac_gettext_ngettext``)以及 Script Builder 中 **Data** 分類下的命令提供。
diff --git a/docs/source/Zh/zh_index.rst b/docs/source/Zh/zh_index.rst
index 1cf876c2..6f85b45d 100644
--- a/docs/source/Zh/zh_index.rst
+++ b/docs/source/Zh/zh_index.rst
@@ -136,6 +136,7 @@ AutoControl 所有功能的完整使用指南。
     doc/new_features/v111_features_doc
     doc/new_features/v112_features_doc
     doc/new_features/v113_features_doc
+    doc/new_features/v114_features_doc
     doc/ocr_backends/ocr_backends_doc
     doc/observability/observability_doc
     doc/operations_layer/operations_layer_doc
diff --git a/je_auto_control/__init__.py b/je_auto_control/__init__.py
index 96584da8..61aff45b 100644
--- a/je_auto_control/__init__.py
+++ b/je_auto_control/__init__.py
@@ -241,6 +241,10 @@
 from je_auto_control.utils.message_format import (
     format_message, ordinal_category, plural_category,
 )
+# GNU gettext catalog I/O (parse .po, compile/read .mo, message lookup)
+from je_auto_control.utils.gettext_catalog import (
+    GettextCatalog, parse_po, parse_po_file, read_mo, read_mo_file,
+)
 # CI workflow annotations (GitHub Actions)
 from je_auto_control.utils.ci_annotations import (
     emit_annotations, format_annotation,
@@ -998,6 +1002,11 @@ def start_autocontrol_gui(*args, **kwargs):
     "format_message",
     "ordinal_category",
     "plural_category",
+    "GettextCatalog",
+    "parse_po",
+    "parse_po_file",
+    "read_mo",
+    "read_mo_file",
     "emit_annotations", "format_annotation",
     "ClipboardHistory", "default_clipboard_history",
     "analyze_heal_log", "heal_stats", "scan_secrets",
diff --git a/je_auto_control/gui/script_builder/command_schema.py b/je_auto_control/gui/script_builder/command_schema.py
index 28f2be76..e2a66b3d 100644
--- a/je_auto_control/gui/script_builder/command_schema.py
+++ b/je_auto_control/gui/script_builder/command_schema.py
@@ -2150,6 +2150,26 @@ def _add_resilience_specs(specs: List[CommandSpec]) -> None:
         ),
         description="Render ICU plural/select/selectordinal message.",
     ))
+    specs.append(CommandSpec(
+        "AC_gettext_translate", "Data", "Text: gettext Translate (.po)",
+        fields=(
+            FieldSpec("po", FieldType.STRING,
+                      placeholder='msgid "Hello"\\nmsgstr "Hola"'),
+            FieldSpec("msgid", FieldType.STRING, placeholder="Hello"),
+            FieldSpec("context", FieldType.STRING, optional=True),
+        ),
+        description="Look up a singular translation in a gettext .po catalog.",
+    ))
+    specs.append(CommandSpec(
+        "AC_gettext_ngettext", "Data", "Text: gettext Plural (.po)",
+        fields=(
+            FieldSpec("po", FieldType.STRING, placeholder="(.po source)"),
+            FieldSpec("msgid", FieldType.STRING, placeholder="file"),
+            FieldSpec("msgid_plural", FieldType.STRING, placeholder="files"),
+            FieldSpec("n", FieldType.INT, placeholder="3"),
+        ),
+        description="Pick the plural-correct translation for count n.",
+    ))
     specs.append(CommandSpec(
         "AC_diff_rows", "Data", "Dataset Diff: Rows by Key",
         fields=(
diff --git a/je_auto_control/utils/executor/action_executor.py b/je_auto_control/utils/executor/action_executor.py
index 7679ae47..65539211 100644
--- a/je_auto_control/utils/executor/action_executor.py
+++ b/je_auto_control/utils/executor/action_executor.py
@@ -3031,6 +3031,22 @@ def _format_message(pattern: str, args: Any = None,
     return {"text": format_message(pattern, args or {}, locale=locale)}


+def _gettext_translate(po: str, msgid: str,
+                       context: Any = None) -> Dict[str, Any]:
+    """Adapter: parse a .po string and look up a singular translation."""
+    from je_auto_control.utils.gettext_catalog import parse_po
+    catalog = parse_po(po)
+    return {"text": catalog.gettext(msgid, context=context or None)}
+
+
+def _gettext_ngettext(po: str, msgid: str, msgid_plural: str,
+                      n: Any) -> Dict[str, Any]:
+    """Adapter: parse a .po string and look up a plural translation."""
+    from je_auto_control.utils.gettext_catalog import parse_po
+    catalog = parse_po(po)
+    return {"text": catalog.ngettext(msgid, msgid_plural, int(n))}
+
+
 def _cas_put(name: str, key: str, value: Any,
              expected_version: Any = None) -> Dict[str, Any]:
     """Adapter: optimistic put into a named versioned store."""
@@ -4722,6 +4738,8 @@ def __init__(self):
             "AC_bidi_strip": _bidi_strip,
             "AC_format_list": _format_list,
             "AC_format_message": _format_message,
+            "AC_gettext_translate": _gettext_translate,
+            "AC_gettext_ngettext": _gettext_ngettext,
             "AC_detect_drift": _detect_drift,
             "AC_categorical_drift": _categorical_drift,
             "AC_diff_rows": _diff_rows,
diff --git a/je_auto_control/utils/gettext_catalog/__init__.py b/je_auto_control/utils/gettext_catalog/__init__.py
new file mode 100644
index 00000000..78e25091
--- /dev/null
+++ b/je_auto_control/utils/gettext_catalog/__init__.py
@@ -0,0 +1,8 @@
+"""GNU gettext catalog I/O (parse .po, compile/read .mo, message lookup)."""
+from je_auto_control.utils.gettext_catalog.gettext_catalog import (
+    GettextCatalog, parse_po, parse_po_file, read_mo, read_mo_file,
+)
+
+__all__ = [
+    "GettextCatalog", "parse_po", "parse_po_file", "read_mo", "read_mo_file",
+]
diff --git a/je_auto_control/utils/gettext_catalog/gettext_catalog.py b/je_auto_control/utils/gettext_catalog/gettext_catalog.py
new file mode 100644
index 00000000..93d22818
--- /dev/null
+++ b/je_auto_control/utils/gettext_catalog/gettext_catalog.py
@@ -0,0 +1,288 @@
+"""GNU gettext catalog I/O: parse ``.po``, compile/read ``.mo``, look up messages.
+
+The repo has ``i18n_test`` (pseudo-localisation, catalog placeholder checks) and
+``message_format`` (ICU rendering) but no reader for the *de-facto* translation
+format — GNU gettext ``.po`` / ``.mo``. This parses ``.po`` text (contexts,
+plurals, multi-line strings, escapes, the ``Plural-Forms`` header), compiles the
+binary ``.mo`` (little-endian, the format Python's own ``gettext`` reads) and
+exposes ``gettext`` / ``ngettext`` / ``pgettext`` lookups.
+
+Pure standard library (``re`` / ``struct`` / ``gettext.c2py`` for the plural
+expression); imports no ``PySide6``. Parsing and compilation are pure data
+in / data out, so they are fully deterministic in CI.
+"""
+import re
+import struct
+from gettext import c2py
+from pathlib import Path
+from typing import Dict, List, Optional, Tuple
+
+_MO_MAGIC_LE = 0x950412DE
+_MO_MAGIC_BE = 0xDE120495
+_CONTEXT_SEP = "\x04"
+_PLURAL_SEP = "\x00"
+_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", '"': '"', "\\": "\\"}
+
+Key = Tuple[Optional[str], str]
+
+
+def _decode_escapes(text: str) -> str:
+    """Decode the backslash escapes used inside ``.po`` quoted strings."""
+    out: List[str] = []
+    index = 0
+    while index < len(text):
+        char = text[index]
+        if char == "\\" and index + 1 < len(text):
+            out.append(_ESCAPES.get(text[index + 1], text[index + 1]))
+            index += 2
+        else:
+            out.append(char)
+            index += 1
+    return "".join(out)
+
+
+def _unquote(line: str) -> str:
+    """Strip the surrounding quotes of a ``.po`` string line and unescape it."""
+    stripped = line.strip()
+    if len(stripped) >= 2 and stripped[0] == '"' and stripped[-1] == '"':
+        stripped = stripped[1:-1]
+    return _decode_escapes(stripped)
+
+
+class GettextCatalog:
+    """An in-memory gettext catalog with ``.po`` / ``.mo`` round-tripping."""
+
+    def __init__(self) -> None:
+        self._messages: Dict[Key, List[str]] = {}
+        self._plural_ids: Dict[Key, str] = {}
+        self.metadata: Dict[str, str] = {}
+        self.nplurals = 2
+        self._plural_func = c2py("n != 1")
+
+    # -- building ----------------------------------------------------------
+
+    def add(self, msgid: str, message: object = "", *,
+            context: Optional[str] = None,
+            plural_id: Optional[str] = None) -> None:
+        """Add a message; ``message`` is a string, or a list of plural forms
+        (pair with ``plural_id``)."""
+        key = (context, msgid)
+        self._messages[key] = (list(message) if isinstance(message, list)
+                               else [str(message)])
+        if plural_id is not None:
+            self._plural_ids[key] = plural_id
+
+    def finalize(self) -> None:
+        """Parse the header entry for ``Plural-Forms`` / metadata."""
+        header = self._messages.get((None, ""))
+        if not header:
+            return
+        for raw in header[0].split("\n"):
+            if ":" in raw:
+                name, value = raw.split(":", 1)
+                self.metadata[name.strip()] = value.strip()
+        self._apply_plural_forms(self.metadata.get("Plural-Forms", ""))
+
+    def _apply_plural_forms(self, spec: str) -> None:
+        count = re.search(r"nplurals\s*=\s*(\d+)", spec)
+        if count:
+            self.nplurals = int(count.group(1))
+        expr = re.search(r"plural\s*=\s*([^;]+)", spec)
+        if expr:
+            try:
+                self._plural_func = c2py(expr.group(1).strip())
+            except ValueError:
+                pass
+
+    # -- lookup ------------------------------------------------------------
+
+    def gettext(self, msgid: str, *, context: Optional[str] = None) -> str:
+        """Return the translation of ``msgid`` (or ``msgid`` if untranslated)."""
+        forms = self._messages.get((context, msgid))
+        if forms and forms[0]:
+            return forms[0]
+        return msgid
+
+    def ngettext(self, msgid: str, msgid_plural: str, n: int, *,
+                 context: Optional[str] = None) -> str:
+        """Return the plural-correct translation for count ``n``."""
+        forms = self._messages.get((context, msgid))
+        if forms:
+            index = self._plural_func(n)
+            if 0 <= index < len(forms) and forms[index]:
+                return forms[index]
+        return msgid if n == 1 else msgid_plural
+
+    def pgettext(self, context: str, msgid: str) -> str:
+        """Context-qualified :meth:`gettext`."""
+        return self.gettext(msgid, context=context)
+
+    # -- .mo output --------------------------------------------------------
+
+    def _mo_pairs(self) -> List[Tuple[bytes, bytes]]:
+        pairs: List[Tuple[bytes, bytes]] = []
+        for (context, msgid), forms in self._messages.items():
+            original = msgid
+            plural_id = self._plural_ids.get((context, msgid))
+            if plural_id is not None:
+                original = msgid + _PLURAL_SEP + plural_id
+            if context is not None:
+                original = context + _CONTEXT_SEP + original
+            pairs.append((original.encode("utf-8"),
+                          _PLURAL_SEP.join(forms).encode("utf-8")))
+        pairs.sort(key=lambda pair: pair[0])
+        return pairs
+
+    def to_mo_bytes(self) -> bytes:
+        """Serialise the catalog to GNU ``.mo`` binary bytes (little-endian)."""
+        return _pack_mo(self._mo_pairs())
+
+    def compile_mo(self, path: str) -> str:
+        """Write the catalog as a ``.mo`` file; return the path."""
+        out = Path(path)
+        out.parent.mkdir(parents=True, exist_ok=True)
+        out.write_bytes(self.to_mo_bytes())
+        return str(out)
+
+
+def _pack_mo(pairs: List[Tuple[bytes, bytes]]) -> bytes:
+    """Pack sorted (original, translation) byte pairs into ``.mo`` format."""
+    count = len(pairs)
+    originals_table = 7 * 4
+    translations_table = originals_table + 8 * count
+    offset = translations_table + 8 * count
+    originals: List[Tuple[int, int]] = []
+    translations: List[Tuple[int, int]] = []
+    blob = bytearray()
+    for original, _ in pairs:
+        originals.append((len(original), offset))
+        blob += original + b"\x00"
+        offset += len(original) + 1
+    for _, translation in pairs:
+        translations.append((len(translation), offset))
+        blob += translation + b"\x00"
+        offset += len(translation) + 1
+    header = struct.pack("<7I", _MO_MAGIC_LE, 0, count,
+                         originals_table, translations_table, 0, 0)
+    table = b"".join(struct.pack("<2I", length, off)
+                     for length, off in originals + translations)
+    return header + table + bytes(blob)
+
+
+def _store_mo_entry(catalog: GettextCatalog, original: str,
+                    translation: str) -> None:
+    context: Optional[str] = None
+    if _CONTEXT_SEP in original:
+        context, original = original.split(_CONTEXT_SEP, 1)
+    ids = original.split(_PLURAL_SEP)
+    forms = translation.split(_PLURAL_SEP)
+    if len(ids) > 1:
+        catalog.add(ids[0], forms, context=context, plural_id=ids[1])
+    else:
+        catalog.add(ids[0], forms[0], context=context)
+
+
+def read_mo(data: bytes) -> GettextCatalog:
+    """Parse GNU ``.mo`` binary ``data`` into a catalog."""
+    magic = struct.unpack("<I", data[:4])[0] if len(data) >= 20 else 0
+    if magic not in (_MO_MAGIC_LE, _MO_MAGIC_BE):
+        raise ValueError("not a GNU .mo file (bad magic number)")
+    endian = "<" if magic == _MO_MAGIC_LE else ">"
+    count, orig_off, trans_off = struct.unpack(endian + "III", data[8:20])
+    catalog = GettextCatalog()
+    for index in range(count):
+        olen, ostart = struct.unpack(
+            endian + "II", data[orig_off + 8 * index:orig_off + 8 * index + 8])
+        tlen, tstart = struct.unpack(
+            endian + "II", data[trans_off + 8 * index:trans_off + 8 * index + 8])
+        original = data[ostart:ostart + olen].decode("utf-8")
+        translation = data[tstart:tstart + tlen].decode("utf-8")
+        _store_mo_entry(catalog, original, translation)
+    catalog.finalize()
+    return catalog
+
+
+def read_mo_file(path: str) -> GettextCatalog:
+    """Read a ``.mo`` file into a catalog."""
+    return read_mo(Path(path).read_bytes())
+
+
+# --- .po parsing ----------------------------------------------------------
+
+def _append(entry: Dict[str, object], field: Optional[object],
+            value: str) -> None:
+    """Append a continuation string to the field last seen in a block."""
+    if field == "msgctxt":
+        entry["msgctxt"] = str(entry.get("msgctxt", "")) + value
+    elif field == "msgid":
+        entry["msgid"] = str(entry.get("msgid", "")) + value
+    elif field == "msgid_plural":
+        entry["msgid_plural"] = str(entry.get("msgid_plural", "")) + value
+    elif isinstance(field, int):
+        plurals: Dict[int, str] = entry.setdefault("plurals", {})  # type: ignore[assignment]
+        plurals[field] = plurals.get(field, "") + value
+
+
+def _parse_block(block: str) -> Optional[Dict[str, object]]:
+    """Parse one blank-line-delimited ``.po`` block into a field dict."""
+    entry: Dict[str, object] = {}
+    field: Optional[object] = None
+    for line in block.splitlines():
+        stripped = line.strip()
+        if not stripped or stripped.startswith("#"):
+            continue
+        field = _consume_line(stripped, entry, field)
+    return entry if "msgid" in entry else None
+
+
+def _consume_line(stripped: str, entry: Dict[str, object],
+                  field: Optional[object]) -> Optional[object]:
+    """Process one ``.po`` line, returning the field it continues."""
+    plural = re.match(r"msgstr\[(\d+)\]\s*(.*)", stripped)
+    if plural:
+        field = int(plural.group(1))
+        _append(entry, field, _unquote(plural.group(2)))
+    elif stripped.startswith("msgctxt "):
+        field = "msgctxt"
+        _append(entry, field, _unquote(stripped[8:]))
+    elif stripped.startswith("msgid_plural "):
+        field = "msgid_plural"
+        _append(entry, field, _unquote(stripped[13:]))
+    elif stripped.startswith("msgid "):
+        field = "msgid"
+        _append(entry, field, _unquote(stripped[6:]))
+    elif stripped.startswith("msgstr "):
+        field = 0
+        _append(entry, field, _unquote(stripped[7:]))
+    elif stripped.startswith('"'):
+        _append(entry, field, _unquote(stripped))
+    return field
+
+
+def _store_block(catalog: GettextCatalog, entry: Dict[str, object]) -> None:
+    context = entry.get("msgctxt")
+    ctx = None if context is None else str(context)
+    msgid = str(entry["msgid"])
+    plurals: Dict[int, str] = entry.get("plurals", {})  # type: ignore[assignment]
+    if "msgid_plural" in entry:
+        forms = [plurals[i] for i in sorted(plurals)] if plurals else [""]
+        catalog.add(msgid, forms, context=ctx,
+                    plural_id=str(entry["msgid_plural"]))
+    else:
+        catalog.add(msgid, plurals.get(0, ""), context=ctx)
+
+
+def parse_po(text: str) -> GettextCatalog:
+    """Parse ``.po`` source ``text`` into a :class:`GettextCatalog`."""
+    catalog = GettextCatalog()
+    for block in re.split(r"\n[ \t]*\n", (text or "").replace("\r\n", "\n")):
+        entry = _parse_block(block)
+        if entry is not None:
+            _store_block(catalog, entry)
+    catalog.finalize()
+    return catalog
+
+
+def parse_po_file(path: str) -> GettextCatalog:
+    """Read and parse a ``.po`` file into a catalog."""
+    return parse_po(Path(path).read_text(encoding="utf-8"))
diff --git a/je_auto_control/utils/mcp_server/tools/_factories.py b/je_auto_control/utils/mcp_server/tools/_factories.py
index 9c62f224..be267fbc 100644
--- a/je_auto_control/utils/mcp_server/tools/_factories.py
+++ b/je_auto_control/utils/mcp_server/tools/_factories.py
@@ -3682,6 +3682,33 @@ def list_format_tools() -> List[MCPTool]:
     ]


+def gettext_catalog_tools() -> List[MCPTool]:
+    return [
+        MCPTool(
+            name="ac_gettext_translate",
+            description=("Parse a gettext '.po' string 'po' and translate "
+                         "'msgid' (optional 'context'). Returns {text}."),
+            input_schema=schema(
+                {"po": {"type": "string"}, "msgid": {"type": "string"},
+                 "context": {"type": "string"}},
+                ["po", "msgid"]),
+            handler=h.gettext_translate,
+            annotations=READ_ONLY,
+        ),
+        MCPTool(
+            name="ac_gettext_ngettext",
+            description=("Parse a '.po' string 'po' and pick the plural-correct "
+                         "translation of 'msgid'/'msgid_plural' for count 'n'."),
+            input_schema=schema(
+                {"po": {"type": "string"}, "msgid": {"type": "string"},
+                 "msgid_plural": {"type": "string"}, "n": {"type": "integer"}},
+                ["po", "msgid", "msgid_plural", "n"]),
+            handler=h.gettext_ngettext,
+            annotations=READ_ONLY,
+        ),
+    ]
+
+
 def message_format_tools() -> List[MCPTool]:
     return [
         MCPTool(
@@ -5762,6 +5789,7 @@ def media_assert_tools() -> List[MCPTool]:
     dedup_window_tools, sequence_gap_tools, optimistic_tools, outbox_tools,
     locale_collation_tools, confusables_tools, readability_tools,
     bidi_check_tools, list_format_tools, message_format_tools,
+    gettext_catalog_tools,
     dataset_diff_tools, referential_tools, link_header_tools,
     multipart_tools, http_content_tools, cookie_jar_tools,
     http_conditional_tools, saga_tools, decision_table_tools, locator_repair_tools,
diff --git a/je_auto_control/utils/mcp_server/tools/_handlers.py b/je_auto_control/utils/mcp_server/tools/_handlers.py
index 6c351095..882db17b 100644
--- a/je_auto_control/utils/mcp_server/tools/_handlers.py
+++ b/je_auto_control/utils/mcp_server/tools/_handlers.py
@@ -2007,6 +2007,16 @@ def format_message(pattern, args=None, locale="en"):
     return _format_message(pattern, args, locale)


+def gettext_translate(po, msgid, context=None):
+    from je_auto_control.utils.executor.action_executor import _gettext_translate
+    return _gettext_translate(po, msgid, context)
+
+
+def gettext_ngettext(po, msgid, msgid_plural, n):
+    from je_auto_control.utils.executor.action_executor import _gettext_ngettext
+    return _gettext_ngettext(po, msgid, msgid_plural, n)
+
+
 def detect_drift(reference, current, threshold=0.25, bins=10):
     from je_auto_control.utils.executor.action_executor import _detect_drift
     return _detect_drift(reference, current, threshold, bins)
diff --git a/test/unit_test/headless/test_gettext_catalog_batch.py b/test/unit_test/headless/test_gettext_catalog_batch.py
new file mode 100644
index 00000000..84e06e4e
--- /dev/null
+++ b/test/unit_test/headless/test_gettext_catalog_batch.py
@@ -0,0 +1,114 @@
+"""Headless tests for GNU gettext catalog I/O. No Qt."""
+import io
+from gettext import GNUTranslations
+
+import pytest
+
+import je_auto_control as ac
+from je_auto_control.utils.gettext_catalog import (
+    GettextCatalog, parse_po, read_mo,
+)
+
+# Build a .po source. Continuation header lines carry a literal "\n" escape.
+_PO = "\n".join([
+    'msgid ""',
+    'msgstr ""',
+    '"Content-Type: text/plain; charset=UTF-8\\n"',
+    '"Plural-Forms: nplurals=2; plural=(n != 1);\\n"',
+    '',
+    'msgid "Hello"',
+    'msgstr "Hola"',
+    '',
+    'msgid "file"',
+    'msgid_plural "files"',
+    'msgstr[0] "archivo"',
+    'msgstr[1] "archivos"',
+    '',
+    'msgctxt "menu"',
+    'msgid "Open"',
+    'msgstr "Abrir"',
+    '',
+])
+
+
+def test_parse_singular_and_fallback():
+    catalog = parse_po(_PO)
+    assert catalog.gettext("Hello") == "Hola"
+    assert catalog.gettext("Missing") == "Missing"  # untranslated -> id
+
+
+def test_parse_plural():
+    catalog = parse_po(_PO)
+    assert catalog.ngettext("file", "files", 1) == "archivo"
+    assert catalog.ngettext("file", "files", 4) == "archivos"
+    assert catalog.ngettext("dog", "dogs", 2) == "dogs"  # missing -> english
+
+
+def test_context():
+    catalog = parse_po(_PO)
+    assert catalog.pgettext("menu", "Open") == "Abrir"
+    assert catalog.gettext("Open") == "Open"  # no ctx -> untranslated
+
+
+def test_header_metadata_and_plural_forms():
+    catalog = parse_po(_PO)
+    assert catalog.metadata["Content-Type"] == "text/plain; charset=UTF-8"
+    assert catalog.nplurals == 2
+
+
+def test_mo_round_trip():
+    catalog = parse_po(_PO)
+    restored = read_mo(catalog.to_mo_bytes())
+    assert restored.gettext("Hello") == "Hola"
+    assert restored.ngettext("file", "files", 2) == "archivos"
+    assert restored.pgettext("menu", "Open") == "Abrir"
+
+
+def test_mo_is_readable_by_stdlib_gettext():
+    catalog = parse_po(_PO)
+    translations = GNUTranslations(io.BytesIO(catalog.to_mo_bytes()))
+    assert translations.gettext("Hello") == "Hola"
+    assert translations.ngettext("file", "files", 5) == "archivos"
+
+
+def test_read_mo_rejects_bad_magic():
+    with pytest.raises(ValueError):
+        read_mo(b"not a mo file at all")
+
+
+def test_build_programmatically():
+    catalog = GettextCatalog()
+    catalog.add("Yes", "Oui")
+    catalog.add("apple", ["pomme", "pommes"], plural_id="apples")
+    assert catalog.gettext("Yes") == "Oui"
+    assert catalog.ngettext("apple", "apples", 2) == "pommes"
+
+
+# --- wiring ---------------------------------------------------------------
+
+def test_executor_round_trip():
+    rec = ac.execute_action([[
+        "AC_gettext_translate", {"po": _PO, "msgid": "Hello"}]])
+    out = next(v for v in rec.values() if isinstance(v, dict))
+    assert out["text"] == "Hola"
+    rec2 = ac.execute_action([[
+        "AC_gettext_ngettext",
+        {"po": _PO, "msgid": "file", "msgid_plural": "files", "n": 3}]])
+    assert next(v for v in rec2.values() if isinstance(v, dict))["text"] == "archivos"
+
+
+def test_wiring():
+    known = ac.executor.known_commands()
+    assert {"AC_gettext_translate", "AC_gettext_ngettext"} <= set(known)
+    from je_auto_control.utils.mcp_server.tools import build_default_tool_registry
+    names = {t.name for t in build_default_tool_registry()}
+    assert {"ac_gettext_translate", "ac_gettext_ngettext"} <= names
+    from je_auto_control.gui.script_builder.command_schema import _build_specs
+    specs = {s.command for s in _build_specs()}
+    assert {"AC_gettext_translate", "AC_gettext_ngettext"} <= specs
+
+
+def test_facade_exports():
+    for attr in ("GettextCatalog", "parse_po", "parse_po_file", "read_mo",
+                 "read_mo_file"):
+        assert hasattr(ac, attr) and attr in ac.__all__
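Reviewer note on `_pack_mo` / `test_mo_is_readable_by_stdlib_gettext`: a GNU `.mo` file is a seven-word `uint32` header (magic, revision, count, originals-table offset, translations-table offset, hash size, hash offset), two tables of `(length, offset)` pairs, then NUL-terminated strings. The standalone sketch below rebuilds that layout with only the standard library and checks it against `gettext.GNUTranslations`. It is an illustration under the assumption of UTF-8 text with no hash table (hash size 0); `pack_mo` and `MO_MAGIC` are hypothetical names, not this PR's code.

```python
import io
import struct
from gettext import GNUTranslations

MO_MAGIC = 0x950412DE  # GNU .mo magic number, written little-endian here


def pack_mo(messages: dict) -> bytes:
    """Pack {original: translation} strings into a minimal little-endian .mo."""
    # Originals must be sorted so GNU libintl can binary-search them.
    pairs = sorted((k.encode("utf-8"), v.encode("utf-8"))
                   for k, v in messages.items())
    count = len(pairs)
    orig_table = 7 * 4                    # header is seven uint32 words
    trans_table = orig_table + 8 * count  # each table entry: (length, offset)
    offset = trans_table + 8 * count      # string data follows both tables
    entries, blob = [], bytearray()
    for column in (0, 1):                 # all originals first, then translations
        for pair in pairs:
            data = pair[column]
            entries.append(struct.pack("<2I", len(data), offset))
            blob += data + b"\x00"        # strings are NUL-terminated
            offset += len(data) + 1
    header = struct.pack("<7I", MO_MAGIC, 0, count, orig_table, trans_table, 0, 0)
    return header + b"".join(entries) + bytes(blob)


mo = pack_mo({
    # The empty msgid carries the header; charset and Plural-Forms live here.
    "": "Content-Type: text/plain; charset=UTF-8\n"
        "Plural-Forms: nplurals=2; plural=(n != 1);\n",
    "Hello": "Hola",
    "file\x00files": "archivo\x00archivos",  # plural ids/forms joined by NUL
})
t = GNUTranslations(io.BytesIO(mo))
print(t.gettext("Hello"), t.ngettext("file", "files", 3))  # Hola archivos
```

Declaring `charset=UTF-8` in the header entry matters for the stdlib reader: without it, `GNUTranslations` decodes strings as ASCII, so a catalog with non-ASCII text and no header would fail to load.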