fix(codegen-js): emit \uXXXX/\u{X} for non-ASCII string literals (closes #460) #463
Merged
OCaml's `String.escaped` emits non-ASCII bytes as `\NNN` *decimal*
sequences. JavaScript parses `\NNN` as *octal* escapes, which strict-mode
ESM rejects outright (`SyntaxError: Octal escape sequences are not
allowed in strict mode`) and which would decode to the wrong characters
even outside strict mode.
Adds `Js_codegen.js_string_lit` that walks the UTF-8 byte sequence,
decodes code points, and emits `\uXXXX` (BMP) or `\u{XXXXX}` (non-BMP)
Unicode escapes. ASCII printable bytes pass through unchanged; `\\` `\"`
`\n` `\r` `\t` use conventional escapes; ASCII control bytes use
`\xHH`. Wired into both `js_codegen.ml` (Node target) and
`codegen_deno.ml` (Deno-ESM target) LitString/LitChar emit sites.
Regression fixture `tests/codegen-deno/non_ascii.affine` + harness
exercise BMP emoji (❌ ✓), CJK (你好), Latin accents (café résumé),
non-BMP code points (😭 = U+1F62D), mixed strings, and the
existing-escape regression path (\\ and \"). Pre-fix: harness
`import` itself fails with SyntaxError. Post-fix: 8/8 assertions pass.
Verified: full `tools/run_codegen_deno_tests.sh` (13/13 harnesses
green); full `dune test` suite (352/352 green).
Closes #460
Refs hyperpolymath/standards#284 (the seam-analyst PR that surfaced
this)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🔍 Hypatia Security Scan
Findings: 83 issues detected
View findings
[
{
"reason": "Action perpolymath/standards/.github/workflows/governance-reusable.yml@main\n needs attention",
"type": "unpinned_action",
"file": "governance.yml",
"action": "pin_sha",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Action ons/checkout@v6\n needs attention",
"type": "unpinned_action",
"file": "publish-jsr.yml",
"action": "pin_sha",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Action land/setup-deno@v2\n needs attention",
"type": "unpinned_action",
"file": "publish-jsr.yml",
"action": "pin_sha",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Issue in affine-vscode-publish.yml",
"type": "unknown",
"file": "affine-vscode-publish.yml",
"action": "flag",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Issue in casket-pages.yml",
"type": "unknown",
"file": "casket-pages.yml",
"action": "flag",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Issue in casket-pages.yml",
"type": "unknown",
"file": "casket-pages.yml",
"action": "flag",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Issue in ci.yml",
"type": "unknown",
"file": "ci.yml",
"action": "flag",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Issue in ci.yml",
"type": "unknown",
"file": "ci.yml",
"action": "flag",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Issue in ci.yml",
"type": "unknown",
"file": "ci.yml",
"action": "flag",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Issue in ci.yml",
"type": "unknown",
"file": "ci.yml",
"action": "flag",
"rule_module": "workflow_audit",
"severity": "medium"
}
]
Powered by Hypatia Neurosymbolic CI/CD Intelligence
hyperpolymath added a commit that referenced this pull request on May 30, 2026
…464)

## Summary

Closes #458: `String < String` (and `>` / `<=` / `>=`) now type-check, lowering to JS's native lexicographic string comparison. Pre-fix: `TypeMismatch (String, Int)`.

## Implementation

Single addition to the existing comparison dispatch in `Typecheck.synth_expr` for `ExprBinary`:

```ocaml
match repr lhs_ty with
| TCon "Float" -> ...
| TCon "String" ->
  let* () = check ctx rhs ty_string in
  Ok ty_bool
| _ -> ... (* legacy Int monomorphism *)
```

Pattern mirrors the existing Float dispatch a few lines up. No codegen changes needed: JavaScript's `<` / `>` / `<=` / `>=` on strings is lex compare natively, and the JS-family backends already emit those operators verbatim.

## Test plan

New regression fixture `tests/codegen-deno/string_lex_cmp.affine` + harness with **22 assertions**:

- All four ops via functional form (`lt(a, b)`, etc.); covers each operator's positive/negative direction
- All four ops via literal form (`first_lt()`, etc.)
- Equal-string corner cases: `x <= x` true, `x >= x` true, `x < x` false
- Empty strings: `"" < "a"`, `"" <= ""`
- Prefix relations: `"abc" < "abcd"`

- [x] Local `./tools/run_codegen_deno_tests.sh`: **14/14** harnesses green
- [x] Local `dune test`: **352/352** green
- [x] Smoke compile: `return a < b;` emits as `return (a < b);` (JS native)

## Out of scope

- **Non-ASCII string comparison** in the fixture: this branch forked from `main` before #463 (the companion Unicode-escape codegen fix for #460) lands, so non-ASCII source literals would still emit OCaml-style `\NNN` octal escapes that strict-mode ESM rejects. The relational typecheck change is orthogonal to literal encoding; non-ASCII lex compare works naturally once both PRs merge. A non-ASCII assertion can be added in a follow-up commit after #463 merges, or auto-rebased here if they land in either order.
- **Other backends** (rescript, wasm, lua, c, rust): out of scope; #458 specifically called out the JS-family ergonomic gap. If `String <` lowering for other backends becomes load-bearing, file separately.

## Refs

- Closes #458
- Refs hyperpolymath/standards#284 (the seam-analyst PR with the `str_lt` workaround)
- Companion: #463 (#460 Unicode-escape codegen, lands together to unblock non-ASCII relational comparisons)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
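One caveat for the non-ASCII follow-up mentioned in that commit: JavaScript's relational operators order strings by UTF-16 code unit, not by code point. The sketch below is illustrative TypeScript and not code from either PR; `lt` is a hypothetical stand-in for the fixture's functional form.

```typescript
// Hypothetical stand-in for the fixture's `lt(a, b)`: the JS-family
// backends emit the native operator, so this is all the lowering does.
function lt(a: string, b: string): boolean {
  return a < b;
}

// Cases named in the test plan.
const prefixCase = lt("abc", "abcd"); // true: a proper prefix sorts first
const emptyCase = lt("", "a");        // true
const selfCase = lt("x", "x");        // false

// Non-ASCII: U+1F62D is stored as the surrogate pair D83D DE2D, so it
// sorts below U+FF5E even though its code point is numerically larger.
const byCodeUnit = lt("\u{1F62D}", "\uFF5E"); // true
const byCodePoint = 0x1f62d < 0xff5e;         // false
```

For strings confined to the BMP the two orderings agree, so this only matters once a fixture mixes non-BMP characters with code points in the U+E000 to U+FFFF range.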
Summary
Closes #460 — non-ASCII string literals in AffineScript source no longer break strict-mode ESM in the Deno/Node JS backends.
Root cause
OCaml's `String.escaped` emits non-ASCII bytes as `\NNN` decimal sequences. JavaScript parses `\NNN` as octal escapes, which strict-mode ESM rejects:
```
SyntaxError: Octal escape sequences are not allowed in strict mode.
```
(And even outside strict mode the bytes would decode to the wrong characters — `\226` octal = 0x96, not the 0xE2 lead-byte of ❌.)
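The mismatch can be reproduced without the compiler. The sketch below is illustrative TypeScript, not code from this PR: it hands the text `"\226"` (what `String.escaped` yields for the byte 0xE2) to the `Function` constructor, whose bodies are sloppy-mode unless they begin with a `"use strict"` directive. The helper name `parsesInStrictMode` is hypothetical.

```typescript
// JS source text for a string literal holding the decimal escape that
// OCaml's String.escaped produces for the byte 0xE2 (226 decimal).
const ocamlEscaped = '"\\226"';

// Sloppy mode: \226 is read as octal 226 = 150 = 0x96, not 0xE2.
const sloppyValue: string = new Function("return " + ocamlEscaped + ";")();
const sloppyCodeUnit = sloppyValue.charCodeAt(0);

// Strict mode (which every ES module uses): the same literal is an early
// SyntaxError, so the Function constructor throws before any code runs.
function parsesInStrictMode(literal: string): boolean {
  try {
    new Function('"use strict"; return ' + literal + ";");
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(sloppyCodeUnit.toString(16));      // 96
console.log(parsesInStrictMode(ocamlEscaped)); // false
console.log(parsesInStrictMode('"\\u274C"'));  // true
```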
Fix
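New helper `Js_codegen.js_string_lit` walks the UTF-8 byte sequence, decodes code points, and emits:
- `\uXXXX` for BMP code points and `\u{XXXXX}` for non-BMP code points
- ASCII printable bytes unchanged
- the conventional escapes `\\`, `\"`, `\n`, `\r`, `\t`
- `\xHH` for the remaining ASCII control bytes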
Wired into both `js_codegen.ml` (Node target) and `codegen_deno.ml` (Deno-ESM target) at the `LitString`/`LitChar` emit sites.
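The escaping rules this PR describes (conventional escapes, `\xHH` for control bytes, `\uXXXX` for BMP, `\u{X}` beyond it) can be sketched as follows. This is a hypothetical TypeScript port for illustration only: the real helper is OCaml and its source is not shown here. The names `jsStringLit` and `escapeCodePoint`, the uppercase hex, the treatment of DEL (0x7F) as a control byte, and the assumption of well-formed UTF-8 input are all choices made for this sketch.

```typescript
// Escape one decoded code point according to the rules listed in the PR.
function escapeCodePoint(cp: number): string {
  switch (cp) {
    case 0x5c: return "\\\\"; // backslash
    case 0x22: return '\\"';  // double quote
    case 0x0a: return "\\n";
    case 0x0d: return "\\r";
    case 0x09: return "\\t";
  }
  const hex = cp.toString(16).toUpperCase();
  // Remaining ASCII control bytes (DEL included, by assumption).
  if (cp < 0x20 || cp === 0x7f) return "\\x" + ("00" + hex).slice(-2);
  // Printable ASCII passes through unchanged.
  if (cp < 0x7f) return String.fromCharCode(cp);
  // BMP gets four hex digits; anything above uses the braced form.
  if (cp <= 0xffff) return "\\u" + ("0000" + hex).slice(-4);
  return "\\u{" + hex + "}";
}

// Walk UTF-8 bytes, decode each code point, and build a quoted literal.
// Assumes the input is well-formed UTF-8; no validation is attempted.
function jsStringLit(utf8: Uint8Array): string {
  let out = '"';
  let i = 0;
  while (i < utf8.length) {
    const lead = utf8[i];
    let len = 1;
    let cp = lead;
    if (lead >= 0xf0) { len = 4; cp = lead & 0x07; }
    else if (lead >= 0xe0) { len = 3; cp = lead & 0x0f; }
    else if (lead >= 0xc0) { len = 2; cp = lead & 0x1f; }
    // Each continuation byte contributes its low six bits.
    for (let k = 1; k < len; k++) {
      cp = (cp << 6) | (utf8[i + k] & 0x3f);
    }
    out += escapeCodePoint(cp);
    i += len;
  }
  return out + '"';
}

const cross = jsStringLit(new Uint8Array([0xe2, 0x9d, 0x8c]));     // bytes of ❌
const sob = jsStringLit(new Uint8Array([0xf0, 0x9f, 0x98, 0xad])); // bytes of 😭
console.log(cross); // "\u274C"
console.log(sob);   // "\u{1F62D}"
```

Because the output is pure ASCII, the emitted module parses the same way regardless of how the `.js` file's encoding is later interpreted.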
Test plan
New `tests/codegen-deno/non_ascii.affine` fixture + harness:
```affine
pub fn emoji_cross() -> String { return "❌"; } // BMP U+274C
pub fn non_bmp_sob() -> String { return "😭"; } // non-BMP U+1F62D
pub fn cjk_hello() -> String { return "你好"; }
pub fn latin_accent() -> String { return "café résumé"; }
pub fn mixed() -> String { return "[OK] café 你好 ❌"; }
pub fn ascii_only() -> String { return "plain ASCII"; }
pub fn quotes_and_backslash() -> String { return "\"escaped\" and \\back"; }
```
The `import` itself is the strictest test: if the emitted `.deno.js` contains octal escapes, the module fails to parse and the harness import throws SyntaxError before any assertion runs.
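The assertion side of such a harness might look like the sketch below. The actual harness is not shown in this PR text, so everything here is an assumption: the `FixtureModule` shape (zero-argument exports returning strings), the `checkFixture` helper, and the `stub` object that stands in for the module a real harness would obtain by importing the emitted `.deno.js` file. Expected values are spelled with explicit escapes so they do not depend on the harness file's own encoding.

```typescript
// Assumed shape of the compiled fixture: zero-arg functions returning strings.
type FixtureModule = Record<string, () => string>;

// Expected values, written as code points rather than raw characters.
const expected: Record<string, string> = {
  emoji_cross: "\u274C",
  non_bmp_sob: "\u{1F62D}",
  cjk_hello: "\u4F60\u597D",
  latin_accent: "caf\u00E9 r\u00E9sum\u00E9",
  mixed: "[OK] caf\u00E9 \u4F60\u597D \u274C",
  ascii_only: "plain ASCII",
  quotes_and_backslash: '"escaped" and \\back',
};

// Returns the names of exports that are missing or return the wrong string.
function checkFixture(mod: FixtureModule): string[] {
  const failures: string[] = [];
  for (const name of Object.keys(expected)) {
    const fn = mod[name];
    const got = typeof fn === "function" ? fn() : undefined;
    if (got !== expected[name]) failures.push(name);
  }
  return failures;
}

// Stand-in for the imported module; a real harness would import it instead.
const stub: FixtureModule = {
  emoji_cross: () => "❌",
  non_bmp_sob: () => "😭",
  cjk_hello: () => "你好",
  latin_accent: () => "café résumé",
  mixed: () => "[OK] café 你好 ❌",
  ascii_only: () => "plain ASCII",
  quotes_and_backslash: () => '"escaped" and \\back',
};
console.log(checkFixture(stub).length); // 0
```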
Out of scope
Refs
- Closes #460
- Refs hyperpolymath/standards#284 (the seam-analyst PR that surfaced this)
🤖 Generated with Claude Code