Add Unicode text normalisation and slugify by JE-Chen · Pull Request #305 · Integration-Automation/AutoControlGUI

JE-Chen · 2026-06-21T20:31:49Z

What

fuzzy and search_index.tokenize only lowercase their input, and OCR find_text_matches only applies .lower() plus a substring check, so "Café" (NFC), "Café" (NFD), and "cafe" compare unequal. This PR adds the canonicalisation layer to run before matching.
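
The mismatch described above can be reproduced with the standard library alone. This is a minimal sketch, not code from the PR; the variable names are illustrative, and the strings are written with escapes so the two forms stay distinct.

```python
import unicodedata

# "Café" in composed form (NFC): é is the single code point U+00E9.
nfc = "Caf\u00e9"
# "Café" in decomposed form (NFD): "e" followed by combining acute U+0301.
nfd = "Cafe\u0301"

# Lowercasing alone keeps the two encodings different.
lower_equal = nfc.lower() == nfd.lower()

# Normalising to one form and casefolding makes them compare equal.
canon_equal = (
    unicodedata.normalize("NFKC", nfc).casefold()
    == unicodedata.normalize("NFKC", nfd).casefold()
)

print(len(nfc), len(nfd))  # 4 5
print(lower_equal)         # False
print(canon_equal)         # True
```

Note that plain "cafe" still differs from both after this step; matching it as well requires stripping the accent.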

normalize_text(text, *, form="NFKC", casefold=True, collapse_ws=True).
deaccent, normalize_quotes (smart quotes/dashes/ellipsis/NBSP → ASCII), fold_whitespace, slugify(text, sep="-").
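
For readers without the diff at hand, here is one way the five helpers could be written with only unicodedata and re. It is a sketch under assumptions: the function names and the normalize_text / slugify signatures come from the description above, but the bodies, the _QUOTE_MAP table, and the exact slug rules are hypothetical and may differ from the merged code.

```python
import re
import unicodedata

# Hypothetical replacement table; the PR's actual mapping is not shown here.
_QUOTE_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # single smart quotes
    "\u201c": '"', "\u201d": '"',   # double smart quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
    "\u2026": "...",                # ellipsis
    "\u00a0": " ",                  # no-break space
})


def normalize_quotes(text: str) -> str:
    """Map smart quotes, dashes, ellipsis, and NBSP to ASCII."""
    return text.translate(_QUOTE_MAP)


def fold_whitespace(text: str) -> str:
    """Collapse whitespace runs to one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def deaccent(text: str) -> str:
    """Drop combining marks after decomposing, then recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", kept)


def normalize_text(text: str, *, form: str = "NFKC",
                   casefold: bool = True, collapse_ws: bool = True) -> str:
    """Unicode-normalise, then optionally casefold and fold whitespace."""
    result = unicodedata.normalize(form, text)
    if casefold:
        result = result.casefold()
    if collapse_ws:
        result = fold_whitespace(result)
    return result


def slugify(text: str, sep: str = "-") -> str:
    """Assumed rule: keep ASCII letters and digits, join runs with sep."""
    base = deaccent(normalize_text(normalize_quotes(text)))
    slug = re.sub(r"[^a-z0-9]+", lambda _match: sep, base)
    return slug.strip(sep)


print(normalize_text("Caf\u00e9") == normalize_text("Cafe\u0301"))  # True
print(deaccent(normalize_text("Cafe\u0301")))                       # cafe
print(slugify("Caf\u00e9  D\u00e9j\u00e0 Vu!"))                     # cafe-deja-vu
```

In this sketch deaccent is a separate step rather than part of normalize_text, so callers can choose whether "Café" and "cafe" should match.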

Round-10 research pick (text-normalization lane); unicodedata was not imported anywhere in the text modules.

Layers

Headless core: utils/text_normalize/ (pure stdlib unicodedata/re, zero PySide6).
Facade: 5 symbols + __all__.
Executor: AC_normalize_text, AC_slugify.
MCP: ac_normalize_text, ac_slugify (read-only).
Script Builder: both under Data.
Tests: test/unit_test/headless/test_text_normalize_batch.py (10 tests, incl. NFC/NFD match).
Docs: v97_features_doc.rst (EN + Zh) + toctrees + 3 README What's-new sections.

Verification

pytest test/unit_test/headless/test_text_normalize_batch.py → 10 passed.
ruff check je_auto_control/ clean; pylint 10.00/10; bandit clean; radon CC clean.
Package stays Qt-free.

fuzzy and search_index.tokenize only lowercase and OCR find_text_matches only .lower()+substring, so the same text in different Unicode forms (NFC/NFD), accents, or smart quotes compares unequal. Add normalize_text (NFKC + casefold + whitespace fold), deaccent, normalize_quotes, fold_whitespace, and slugify — the canonicalisation layer to run before matching. Wired through facade, executor (AC_normalize_text / AC_slugify), MCP, and the Script Builder with a headless test batch and EN/Zh docs.

codacy-production · 2026-06-21T20:33:38Z

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

🟢 Metrics 33 complexity · 0 duplication

Metric Results

Complexity 33

Duplication 0

sonarqubecloud · 2026-06-21T20:39:13Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

JE-Chen merged commit 331a6c2 into dev Jun 21, 2026
16 checks passed

JE-Chen deleted the feat/text-normalize-batch branch June 21, 2026 20:37

JE-Chen merged 1 commit into dev from feat/text-normalize-batch
