From 97d775f6133eb2affd734d6ac3b6ba88478fab91 Mon Sep 17 00:00:00 2001 From: Ralf Anton Beier Date: Mon, 25 May 2026 12:04:07 +0200 Subject: [PATCH] =?UTF-8?q?docs(v0.8.0):=20defer=20F64<->I64=20conversions?= =?UTF-8?q?=20=E2=80=94=20ARMv7-M=20FPv5=20has=20no=2064-bit=20VCVT?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Architectural finding for the v0.7.0 "Deferred to follow-up" items F64ConvertI64S/U and I64TruncF64S/U: these cannot be implemented as single-instruction VCVT encodings on Cortex-M7DP. The 64-bit integer to/from floating-point VCVT encodings are ARMv8-A FEAT_FP only — not present in any M-profile VFP variant including FPv5-D16 (used by M7DP, M33-DP, M55, M85). See ARMv7-M ARM DDI 0403E.e A7.5/A8.8 vs. ARMv8-A ARM DDI 0487 C7.2. The right closing PR involves either a ~30-instruction software sequence per op or a soft-float helper, plus WASM-trap integration for the i64-trunc family (NaN + out-of-range trap per WASM spec §4.3.2), plus matching Rocq proofs — out of scope for v0.8.0's i64-T1-promotion theme. Deferred to a later floating-point milestone. This PR ships: - docs/analysis/ARMV7M_F64_I64_VCVT_FINDING.md — full finding with ISA citation, scope analysis, and closing-PR shape. - CHANGELOG.md [Unreleased] entry pointing at the finding. No code changes. The selector continues to emit the existing typed Error::Synthesis for these ops — no behavior change vs. v0.7.0. Refs #147 (v0.8.0 umbrella, optional additive item 6). Co-Authored-By: Claude Opus 4.7 --- CHANGELOG.md | 15 ++ docs/analysis/ARMV7M_F64_I64_VCVT_FINDING.md | 206 +++++++++++++++++++ 2 files changed, 221 insertions(+) create mode 100644 docs/analysis/ARMV7M_F64_I64_VCVT_FINDING.md diff --git a/CHANGELOG.md b/CHANGELOG.md index 20aa65d..e30166f 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,21 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +### Documentation +- **Architectural finding: ARMv7-M FPv5 has no F64↔I64 VCVT.** The four + WASM ops `F64ConvertI64S/U` and `I64TruncF64S/U` listed under v0.7.0's + "Deferred to follow-up" cannot be implemented as single-instruction + VCVTs on Cortex-M7DP — the 64-bit integer ↔ floating-point VCVT + encodings are ARMv8-A FEAT_FP only, not present in M-profile FPv5-D16 + (Cortex-M7DP, M33-DP, M55, M85). Closing these properly requires a + multi-instruction software sequence (or a soft-float helper) plus + WASM-trap integration for the trunc family, plus matching Rocq + proofs — out of scope for v0.8.0's i64-T1-promotion theme. Deferred + to a later floating-point milestone. Full analysis in + [`docs/analysis/ARMV7M_F64_I64_VCVT_FINDING.md`](docs/analysis/ARMV7M_F64_I64_VCVT_FINDING.md). + The selector continues to surface a typed `Error::Synthesis` for + these ops; no behavior change vs. v0.7.0. + ## [0.7.0] - 2026-05-25 Floating-point breadth + verifiable signing CI — synth now compiles diff --git a/docs/analysis/ARMV7M_F64_I64_VCVT_FINDING.md b/docs/analysis/ARMV7M_F64_I64_VCVT_FINDING.md new file mode 100644 index 0000000..c528f9d --- /dev/null +++ b/docs/analysis/ARMV7M_F64_I64_VCVT_FINDING.md @@ -0,0 +1,206 @@ +# ARMv7-M FPv5 has no F64 ↔ I64 VCVT — v0.8.0 deferral finding + +**Date:** 2026-05-25 +**Branch:** `feat/v0.8.0-f64-i64-conversions` +**Status:** Deferred to v0.9.0 (or later) +**Related issues:** [#147](https://github.com/pulseengine/synth/issues/147) +(v0.8.0 umbrella), v0.7.0 CHANGELOG "Deferred to follow-up" + +--- + +## TL;DR + +The four WASM ops `F64ConvertI64S`, `F64ConvertI64U`, `I64TruncF64S`, +`I64TruncF64U` cannot be implemented as single-instruction VCVT +encodings on Cortex-M7DP, because the **ARMv7-M FPv5-D16 +floating-point architecture does not include 64-bit integer ↔ +floating-point VCVT instructions**. These encodings were first added +in **ARMv8-A FEAT_FP**, which is an A-profile feature; no M-profile +core (including Cortex-M55, M85) implements them. + +Implementing these ops therefore requires either (a) a multi-instruction +software sequence in the codegen path, or (b) a software-helper call +into a libgcc-style soft-float routine. Both are non-trivial — the +WASM specification mandates **trapping** on NaN inputs and on +out-of-range conversions for the signed/unsigned trunc family +([WASM spec §4.3.2](https://webassembly.github.io/spec/core/exec/numerics.html#xref-exec-numerics-op-trunc-s-mathrm-trunc-s-m-n)), +which interacts with synth's existing trap-emission infrastructure. + +Per the v0.8.0 task brief: **"If M7DP doesn't have them as single +instructions: ... defer this PR cleanly with a written architectural +finding."** This document is that finding. + +--- + +## The hardware reality + +### What ARMv7-M FPv5 (Cortex-M7DP) supports + +From the ARMv7-M Architecture Reference Manual +([DDI 0403E.e](https://developer.arm.com/documentation/ddi0403/latest)) +§A7.5 "Floating-point data-processing instructions" and §A8.8 "VCVT, +VCVTR (between floating-point and integer)": + +| Variant | Operands | Supported on M7DP | +|---------|----------|-------------------| +| `VCVT.F32.S32` / `VCVT.F32.U32` | Sd, Sm | yes | +| `VCVT.S32.F32` / `VCVT.U32.F32` | Sd, Sm | yes | +| `VCVT.F64.S32` / `VCVT.F64.U32` | Dd, Sm | yes | +| `VCVT.S32.F64` / `VCVT.U32.F64` | Sd, Dm | yes | +| `VCVT.F32.F64` (demote) | Sd, Dm | yes | +| `VCVT.F64.F32` (promote) | Dd, Sm | yes | +| **`VCVT.F32.S64` / `VCVT.F32.U64`** | **Sd, (Rlo,Rhi)** | **no** | +| **`VCVT.F64.S64` / `VCVT.F64.U64`** | **Dd, (Rlo,Rhi)** | **no** | +| **`VCVT.S64.F32` / `VCVT.U64.F32`** | **(Rlo,Rhi), Sm** | **no** | +| **`VCVT.S64.F64` / `VCVT.U64.F64`** | **(Rlo,Rhi), Dm** | **no** | + +The "no" rows are the missing instructions. They are listed in +**A-profile** references (ARMv8 ARM, DDI 0487, C7.2.337 onward) and +are gated on **FEAT_FP** — explicitly an ARMv8-A floating-point +feature. They are not part of: + +- VFPv2 (ARMv6) +- VFPv3 / VFPv3-D16 +- VFPv4 / VFPv4-D16 +- **FPv5 / FPv5-D16** (the M-profile flavor used by Cortex-M7DP, M33-DP, M55, M85) + +The pattern is consistent with the existing situation in synth's +codebase for `F32ConvertI64S/U` and `F32DemoteF64`-via-S64-route, which +are already deferred with the same root cause — see +`crates/synth-synthesis/src/instruction_selector.rs` lines 1956–1960. + +### Why this is not just an encoding-table gap + +This is a real architectural absence, not a missing encoding in our +encoder. The ARM Architecture Reference Manuals for ARMv6-M, ARMv7-M, +and ARMv8-M all omit these opcodes. The encoding space they would +occupy under VFP's `op2` field is reserved/undefined on M-profile. +Emitting an A-profile-only 64-bit VCVT bit pattern on a Cortex-M7DP +would either decode to a different instruction (worst case, a silent +miscompile) or fault with `UNDEFINSTR` — exactly the kind of bluff +the task brief and PulseEngine's "rigor and honesty" principle +forbid. + +--- + +## Why not just implement a multi-instruction software sequence? + +This is the obvious follow-up. The reason we don't do it inline in +this PR is **scope and correctness obligations**: + +1. **WASM trap semantics for `I64TruncF64S/U`.** The WASM spec + ([§4.3.2 trunc_s/trunc_u](https://webassembly.github.io/spec/core/exec/numerics.html#xref-exec-numerics-op-trunc-s-mathrm-trunc-s-m-n)) + requires *trapping* if the input is NaN or out of i64 range. This + means the sequence must include: + - NaN detection (VCMP.F64 + VMRS APSR_nzcv) + - Range comparison against `±2^63` (signed) or `[0, 2^64)` (unsigned) + - A trap-emission branch using synth's existing trap infrastructure + (see `synth-synthesis::trap_emitter` and the established WASM-trap + conventions in i32-div-trap-zero / i32-trunc-out-of-range) + +2. **Algorithmic implementation of F64↔I64.** A correct, fast + software conversion is ~30 instructions per op (split-by-sign, + exponent extraction, mantissa shift, sign-bit reassembly, round-mode + handling for F64ConvertI64*). The signed and unsigned variants share + most of the work but diverge at the sign-handling boundary. This + is the level of complexity normally pushed into a libgcc-style + helper (e.g., `__aeabi_l2d`, `__aeabi_d2lz`). + +3. **Verifiability obligation.** synth has a Rocq proof suite + (`coq/Synth/`) that establishes T1 result-correspondence for ARM + sequences against WASM semantics. Adding a 30-instruction software + conversion without a matching mechanized correctness argument would + regress the verification posture v0.8.0 is *raising*. The v0.8.0 + umbrella issue #147 promotes i64 *arithmetic* from T2 to T1 — adding + an unproven i64-conversion sequence on top of that would be + directionally wrong. + +4. **Issue #147 explicitly marks this PR "optional additive".** It is + not required for v0.8.0's stated theme (i64 T1 result-correspondence + parity). The right answer when an optional additive turns out + harder-than-budgeted is to record the finding and move it to a + later milestone. + +5. **No `gale` / libgcc-equivalent yet.** Synth does not currently link + against a soft-float runtime. The `meld` platform crate has + established conventions for cross-component dispatch but no + floating-point helper layer; introducing one is a separate + infrastructure decision. + +--- + +## What gets deferred + +The four WASM ops below remain in the v0.7.0 "Deferred to follow-up" +state. Synth's selector continues to emit the existing typed error +(`Error::Synthesis`) with a message naming "i64 register pairs on +32-bit ARM" — surfaced cleanly to the user, never miscompiled. + +- `F64ConvertI64S` (signed i64 → f64) +- `F64ConvertI64U` (unsigned i64 → f64) +- `I64TruncF64S` (f64 → signed i64, trapping on NaN/range) +- `I64TruncF64U` (f64 → unsigned i64, trapping on NaN/range) + +The existing F32 counterparts remain deferred for the same reason +(see lines 1956–1960 of `instruction_selector.rs`): + +- `F32ConvertI64S`, `F32ConvertI64U`, `I64TruncF32S`, `I64TruncF32U` +- `F32DemoteF64` (separate but related — needs a VCVT.F32.F64 + encoder, *that one is actually a single-instruction op* on M7DP + and is tracked as the other "optional additive" PR in #147 item 7) + +--- + +## What this PR ships + +Nothing executable. This PR documents the finding and updates: + +- `CHANGELOG.md` — adds a note under `[Unreleased]` pointing at this + finding and explaining why the four v0.7.0 "Deferred to follow-up" + ops remain deferred. The v0.7.0 history section is frozen and + unchanged. +- `docs/analysis/ARMV7M_F64_I64_VCVT_FINDING.md` — this document. + +No code changes. No new encoder variants (those would be bogus). No +new tests (would test bogus encodings). + +--- + +## Path to closing this in a future milestone + +The right shape of the eventual closing PR: + +1. **Decide the helper-vs-inline question.** Either: + - Inline software sequence emitted by the selector (4 ops × ~30 + instructions, ~120 lines of ArmOp emission) + - **Or** add a `synth-runtime` / `gale-softfloat`-style helper crate + and emit a BL into it (cleaner separation, matches the gale + pattern for primitives synth doesn't compile itself) +2. **Add WASM-trap integration.** Reuse the existing `trap_emitter` + conventions used by `I32DivS`-zero and `I32TruncF32S`-out-of-range. +3. **Add Rocq proofs.** At minimum T2 existence-only with a path to + T1 result-correspondence; the WASM reference operator semantics + are already encoded in `coq/Synth/WASM/WasmSemantics.v`. +4. **Add WAST conformance tests.** The `tests/spec-testsuite` harness + already runs the upstream WASM conversion tests — they're currently + skipped on M7DP for these four ops; un-skip on closure. + +Out-of-scope for this v0.8.0 deferral PR but tracked for the next +floating-point milestone. + +--- + +## References + +- ARMv7-M Architecture Reference Manual, DDI 0403E.e, §A7.5 and §A8.8 + (VCVT variants). +- ARMv8-A Architecture Reference Manual, DDI 0487, §C7.2 (VCVT + encodings with 64-bit operands gated on FEAT_FP). +- WebAssembly Core Specification, [§4.3.2 Numerics — `itrunc_sat` / + `itrunc`](https://webassembly.github.io/spec/core/exec/numerics.html#xref-exec-numerics-op-trunc-s-mathrm-trunc-s-m-n). +- synth v0.7.0 CHANGELOG entry "Deferred to follow-up" (lines 62–67). +- synth umbrella issue + [#147](https://github.com/pulseengine/synth/issues/147), + "Optional additive PRs" item 6. +- synth `crates/synth-synthesis/src/instruction_selector.rs` lines + 1956–1960 (existing F32 deferral with identical root cause).