Add version 0.1.

d5201cf9 · Guillaume Melquiond · e3a07b01 · d5201cf9 · d5201cf9 · d5201cf9
Commit d5201cf9 authored 5 years ago by Guillaume Melquiond
--- a/.gitignore
+++ b/.gitignore
+generated-src/aclocal.m4
+generated-src/ChangeLog
+generated-src/compile
+generated-src/configure
+generated-src/COPYING
+generated-src/depcomp
+generated-src/INSTALL
+generated-src/install-sh
+generated-src/Makefile.in
+generated-src/missing
--- a/.gitmodules
+++ b/.gitmodules
+[submodule "why3"]
+	path = why3
+	url = https://gitlab.inria.fr/why3/why3.git
+	branch = wmpz
--- a/README.md
+++ b/README.md
 # WhyMP

-WhyMP  is an arbitrary-precision integer library for C that is inspired by GMP, implemented in WhyML, and formally verified using Why3.
+WhyMP is an arbitrary-precision integer library for C that is inspired by [GMP](https://gmplib.org/), implemented in WhyML, and formally verified using [Why3](http://why3.lri.fr/).

-The algorithms are heavily inspired from [the mpn layer of the GMP library](https://gmplib.org/manual/Low_002dlevel-Functions.html). The implemented algorithms are addition, subtraction, multiplication (schoolbook and Toom-2), division, square root and modular exponentiation.
+The algorithms are heavily inspired from [the mpn layer](https://gmplib.org/manual/Low_002dlevel-Functions.html) of the GMP library. The implemented algorithms are addition, subtraction, multiplication (schoolbook and Toom-2), division, square root, and modular exponentiation.

 The library is compatible with GMP, and its performance is comparable to its standalone version, mini-gmp. For numbers under about 1000 bits (100,000 for multiplication), it is about 10% slower than the generic, pure-C version of GMP, and about twice as slow as GMP with hand-coded assembly.

 ## Generating the sources

-You may regenerate the C sources of the library using the most recent version of Why3.
-To do so, clone [the master branch of Why3](https://gitlab.inria.fr/why3/why3), and then use the following commands:
+To regenerate the C sources of the library using Why3, go into the `why3` submodule, and then use the following commands:

    ./autogen.sh
    ./configure --enable-local && make
    cd examples/multiprecision
    PATH=../../bin:$PATH make dist

+This produces a `.tar.gz` file containing a standalone C library.
+
 ![Overview of the project as a graph](img/over-1.png "Overview of the project as a graph")

 ## Project overview

-We have specified and reimplemented GMP's algorithms in WhyML. To do so, we have developed a simple memory model of the C language in WhyML. This memory model allows us to write WhyML programs in a fragment of the language that is practically a shallow embedding of C.
+We have specified and reimplemented GMP's algorithms in WhyML. To do so, we have developed a simple [memory model](https://gitlab.inria.fr/why3/why3/raw/master/stdlib/mach/c.mlw) of the C language in WhyML. This memory model allows us to write WhyML programs in a fragment of the language that is practically a shallow embedding of C.

 We have added to Why3 an extraction mechanism from Why3 to C. It only supports WhyML programs that are in the C-like fragment of the language, but the compilation is transparent and does not add inefficiencies that the compiler cannot optimize away. Thus, the resulting C program is easily predictable to the WhyML programmer.

@@ -56,4 +57,4 @@ int32_t wmpn_cmp(uint64_t * x, uint64_t * y, int32_t sz) {
  return 0;
 }
 ```
-The WhyML proofs are about 20,000 lines long in total (for about 5000 lines of extracted C). The split is about 1/3 program code to 2/3 specifications and assertions.
\ No newline at end of file
+The WhyML proofs are about 20,000 lines long in total (for about 5000 lines of extracted C). The split is about 1/3 program code to 2/3 specifications and assertions.
--- a/generated-src/AUTHORS
+++ b/generated-src/AUTHORS
+Raphaël Rieu-Helft <raphael.rieu-helft@trust-in-soft.com>
+Guillaume Melquiond <guillaume.melquiond@inria.fr>
--- a/generated-src/Makefile.am
+++ b/generated-src/Makefile.am
+AUTOMAKE_OPTIONS = subdir-objects
+
+lib_LIBRARIES = libwmp.a
+
+libwmp_a_SOURCES = \
+  lib/int.h lib/int32.h lib/uint64.h lib/uint64gmp.h lib/fxp.h \
+  lib/euclideandivision.h lib/computerdivision.h lib/valuation.h \
+  lib/array.h lib/map.h lib/c.h lib/power.h lib/types.h lib/realinfix.h lib/minmax.h \
+  lib/util.c lib/util.h \
+  lib/logical.c lib/logical.h \
+  lib/compare.c lib/compare.h \
+  lib/add.c lib/add.h lib/sub.c lib/sub.h \
+  lib/mul.c lib/mul.h lib/toom.c lib/toom.h \
+  lib/div.c lib/div.h \
+  lib/sqrt.c lib/sqrt.h lib/sqrt1.c lib/sqrt1.h lib/sqrtinit.h
+
+include_HEADERS = wmp.h
--- a/generated-src/NEWS
+++ b/generated-src/NEWS
+Version 0.1
+-----------
+
+- initial release
--- a/generated-src/README
+++ b/generated-src/README
+WMP
+===
+
+wmp is a formally verified arbitrary-precision library.  It is
+compatible with GMP's mpn layer, and implements a subset of the
+functions found at
+https://gmplib.org/manual/Low_002dlevel-Functions.html (which also
+serves as documentation). The function names are prefixed by the
+letter w, e.g. wmpn_add. wmp uses 64-bit limbs and 32-bit sizes.
+
+All declarations needed to use wmp are collected in the include file
+wmp.h. It is designed to work with both C and C++ compilers.
+
+    #include <wmp.h>
+
+All programs using wmp must link against the libwmp library. On a
+typical Unix-like system this can be done with `-lwmp`, for example
+
+    gcc myprogram.c -lwmp
+
+The proofs assume that the source operands (of type wmpn_srcptr) are
+entirely separated from the destination operands. This is a departure
+from GMP, which allows them to be either equal or separated. When they
+are equal, you should use the *_in_place versions instead. In
+practice, using the baseline versions should work even in place, but
+this fact is not formally verified.
+
+The C files are generated using the Why3 platform. The algorithms were
+reimplemented and verified in WhyML (see
+https://gitlab.inria.fr/why3/why3/tree/master/examples/multiprecision),
+and extracted to C.
+
+The implemented algorithms are addition, subtraction, multiplication
+(basecase and Toom-2), division, logical shifts and square root. A
+complete list of exported functions can be found in the header
+wmp.h.
+
+For operands under about 1000 bits, the performances are comparable to
+GMP's standalone version, mini-gmp, and about twice as slow as GMP
+with hand-coded assembly. For multiplication, this comparison holds
+until about 100000 bits. Above those thresholds, GMP uses
+asymptotically faster algorithms that we have not implemented, so the
+gap widens.
+
+
--- a/generated-src/configure.ac
+++ b/generated-src/configure.ac
+AC_INIT([libwmp], [0.1], [Raphaël Rieu-Helft <raphael.rieu-helft@trust-in-soft.com>])
+AM_INIT_AUTOMAKE
+
+AC_PROG_CC
+AC_PROG_INSTALL
+AC_PROG_RANLIB
+
+AC_CONFIG_FILES([Makefile])
+AC_OUTPUT
--- a/generated-src/lib/add.c
+++ b/generated-src/lib/add.c
+#include <stdlib.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <assert.h>
+#include <alloca.h>
+#include "int.h"
+
+#include "int32.h"
+
+#include "uint64gmp.h"
+
+#include "power.h"
+
+#include "c.h"
+
+#include "array.h"
+
+#include "map.h"
+
+#include "types.h"
+
+#include "euclideandivision.h"
+
+uint64_t wmpn_add_1(uint64_t * r, uint64_t * x, int32_t sz, uint64_t y)
+{
+  uint64_t lx;
+  uint64_t res;
+  int32_t i;
+  uint64_t c;
+  uint64_t res1;
+  lx = *x;
+  res = lx + y;
+  i = 1;
+  c = UINT64_C(0);
+  *r = res;
+  if (res < lx) {
+    c = UINT64_C(1);
+    while (i < sz) {
+      lx = x[i];
+      res1 = lx + UINT64_C(1);
+      r[i] = res1;
+      i = i + 1;
+      if (!(res1 == UINT64_C(0))) {
+        c = UINT64_C(0);
+        break;
+      }
+    }
+  }
+  while (i < sz) {
+    lx = x[i];
+    r[i] = lx;
+    i = i + 1;
+  }
+  return c;
+}
+
+uint64_t wmpn_add_n(uint64_t * r, uint64_t * x, uint64_t * y, int32_t sz)
+{
+  uint64_t lx, ly, c;
+  int32_t i;
+  uint64_t res, carry;
+  struct __add64_with_carry_result struct_res;
+  lx = UINT64_C(0);
+  ly = UINT64_C(0);
+  c = UINT64_C(0);
+  i = 0;
+  while (i < sz) {
+    lx = x[i];
+    ly = y[i];
+    struct_res = add64_with_carry(lx, ly, c);
+    res = struct_res.__field_0;
+    carry = struct_res.__field_1;
+    r[i] = res;
+    c = carry;
+    i = i + 1;
+  }
+  return c;
+}
+
+uint64_t wmpn_add(uint64_t * r, uint64_t * x, int32_t sx, uint64_t * y,
+                  int32_t sy)
+{
+  uint64_t lx, ly, c;
+  int32_t i;
+  uint64_t res, carry, res1;
+  struct __add64_with_carry_result struct_res;
+  lx = UINT64_C(0);
+  ly = UINT64_C(0);
+  c = UINT64_C(0);
+  i = 0;
+  while (i < sy) {
+    lx = x[i];
+    ly = y[i];
+    struct_res = add64_with_carry(lx, ly, c);
+    res = struct_res.__field_0;
+    carry = struct_res.__field_1;
+    r[i] = res;
+    c = carry;
+    i = i + 1;
+  }
+  if (!(c == UINT64_C(0))) {
+    while (i < sx) {
+      lx = x[i];
+      res1 = lx + UINT64_C(1);
+      r[i] = res1;
+      i = i + 1;
+      if (!(res1 == UINT64_C(0))) {
+        c = UINT64_C(0);
+        break;
+      }
+    }
+  }
+  while (i < sx) {
+    lx = x[i];
+    r[i] = lx;
+    i = i + 1;
+  }
+  return c;
+}
+
+uint64_t wmpn_add_in_place(uint64_t * x, int32_t sx, uint64_t * y, int32_t sy)
+{
+  uint64_t lx, ly, c;
+  int32_t i;
+  uint64_t res, carry, res1;
+  struct __add64_with_carry_result struct_res;
+  lx = UINT64_C(0);
+  ly = UINT64_C(0);
+  c = UINT64_C(0);
+  i = 0;
+  while (i < sy) {
+    lx = x[i];
+    ly = y[i];
+    struct_res = add64_with_carry(lx, ly, c);
+    res = struct_res.__field_0;
+    carry = struct_res.__field_1;
+    x[i] = res;
+    c = carry;
+    i = i + 1;
+  }
+  if (!(c == UINT64_C(0))) {
+    while (i < sx) {
+      lx = x[i];
+      res1 = lx + UINT64_C(1);
+      x[i] = res1;
+      i = i + 1;
+      if (!(res1 == UINT64_C(0))) {
+        c = UINT64_C(0);
+        break;
+      }
+    }
+  }
+  return c;
+}
+
+void wmpn_incr(uint64_t * x, uint64_t y)
+{
+  uint64_t c, lx;
+  uint64_t * xp;
+  uint64_t res, res1;
+  c = UINT64_C(0);
+  lx = *x;
+  xp = x + 1;
+  res = lx + y;
+  *x = res;
+  if (res < lx) {
+    c = UINT64_C(1);
+    while (!(c == UINT64_C(0))) {
+      lx = *xp;
+      res1 = lx + UINT64_C(1);
+      *xp = res1;
+      xp = xp + 1;
+      if (!(res1 == UINT64_C(0))) {
+        c = UINT64_C(0);
+        break;
+      }
+    }
+  } else {
+    return;
+  }
+}
+
+void wmpn_incr_1(uint64_t * x)
+{
+  uint64_t r, lx;
+  uint64_t * xp;
+  uint64_t res;
+  r = UINT64_C(0);
+  lx = UINT64_C(0);
+  xp = x + 0;
+  while (r == UINT64_C(0)) {
+    lx = *xp;
+    res = lx + UINT64_C(1);
+    r = res;
+    *xp = res;
+    xp = xp + 1;
+  }
+}
+
+uint64_t wmpn_add_1_in_place(uint64_t * x, int32_t sz, uint64_t y)
+{
+  uint64_t c, lx;
+  int32_t i;
+  uint64_t res, res1;
+  c = UINT64_C(0);
+  lx = *x;
+  i = 1;
+  res = lx + y;
+  *x = res;
+  if (res < lx) {
+    c = UINT64_C(1);
+    while (i < sz) {
+      lx = x[i];
+      res1 = lx + UINT64_C(1);
+      x[i] = res1;
+      i = i + 1;
+      if (!(res1 == UINT64_C(0))) {
+        c = UINT64_C(0);
+        break;
+      }
+    }
+  }
+  return c;
+}
+
--- a/generated-src/lib/add.h
+++ b/generated-src/lib/add.h
+#include <stdlib.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <assert.h>
+#include <alloca.h>
+uint64_t wmpn_add_1(uint64_t * r, uint64_t * x, int32_t sz, uint64_t y);
+
+uint64_t wmpn_add_n(uint64_t * r, uint64_t * x, uint64_t * y, int32_t sz);
+
+uint64_t wmpn_add(uint64_t * r, uint64_t * x, int32_t sx, uint64_t * y,
+                  int32_t sy);
+
+uint64_t wmpn_add_in_place(uint64_t * x, int32_t sx, uint64_t * y, int32_t sy);
+
+void wmpn_incr(uint64_t * x, uint64_t y);
+
+void wmpn_incr_1(uint64_t * x);
+
+uint64_t wmpn_add_1_in_place(uint64_t * x, int32_t sz, uint64_t y);
+
--- a/generated-src/lib/array.h
+++ b/generated-src/lib/array.h
+#include <stdlib.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <assert.h>
+#include <alloca.h>
+
+
+
+
+
--- a/generated-src/lib/c.h
+++ b/generated-src/lib/c.h
+#include <stdlib.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <assert.h>
+#include <alloca.h>
+#define IGNORE2(x,y) do { (void)(x); (void)(y); } while (0)
--- a/generated-src/lib/compare.c
+++ b/generated-src/lib/compare.c
+#include <stdlib.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <assert.h>
+#include <alloca.h>
+#include "int.h"
+
+#include "int32.h"
+
+#include "uint64gmp.h"
+
+#include "power.h"
+
+#include "c.h"
+
+#include "map.h"
+
+#include "types.h"
+
+int32_t wmpn_cmp(uint64_t * x, uint64_t * y, int32_t sz)
+{
+  int32_t i;
+  uint64_t lx, ly;
+  i = sz;
+  while (i >= 1) {
+    i = i - 1;
+    lx = x[i];
+    ly = y[i];
+    if (!(lx == ly)) {
+      if (lx > ly) {
+        return 1;
+      } else {
+        return -1;
+      }
+    }
+  }
+  return 0;
+}
+
--- a/generated-src/lib/compare.h
+++ b/generated-src/lib/compare.h
+#include <stdlib.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <assert.h>
+#include <alloca.h>
+int32_t wmpn_cmp(uint64_t * x, uint64_t * y, int32_t sz);
+
--- a/generated-src/lib/computerdivision.h
+++ b/generated-src/lib/computerdivision.h
+#include <stdlib.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <assert.h>
+#include <alloca.h>
--- a/generated-src/lib/div.c
+++ b/generated-src/lib/div.c
+#include <stdlib.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <assert.h>
+#include <alloca.h>
+#include "int.h"
+
+#include "int32.h"
+
+#include "uint64gmp.h"
+
+#include "power.h"
+
+#include "c.h"
+
+#include "array.h"
+
+#include "map.h"
+
+#include "types.h"
+
+#include "compare.h"
+
+#include "util.h"
+
+#include "add.h"
+
+#include "sub.h"
+
+#include "logical.h"
+
+#include "euclideandivision.h"
+
+#include "minmax.h"
+
+uint64_t invert_limb(uint64_t d)
+{ return div64_2by1((0xffffffffffffffffUL), (0xffffffffffffffffUL) - d, d);
+}
+
+struct __div2by1_inv_result
+{ uint64_t __field_0;
+  uint64_t __field_1;
+};
+
+struct __div2by1_inv_result div2by1_inv(uint64_t uh, uint64_t ul, uint64_t d,
+                                        uint64_t v)
+{
+  uint64_t l, h, sl, c;
+  struct __mul64_double_result struct_res;
+  struct __add64_with_carry_result struct_res1, struct_res2;
+  uint64_t sh, p;
+  uint64_t qh, ql, r;
+  struct __div2by1_inv_result result;
+  struct_res = mul64_double(v, uh);
+  l = struct_res.__field_0;
+  h = struct_res.__field_1;
+  struct_res1 = add64_with_carry(l, ul, UINT64_C(0));
+  sl = struct_res1.__field_0;
+  c = struct_res1.__field_1;
+  struct_res2 = add64_with_carry(uh, h, c);
+  sh = struct_res2.__field_0;
+  qh = sh;
+  ql = sl;
+  qh = qh + UINT64_C(1);
+  p = qh * d;
+  r = ul - p;
+  if (r > ql) {
+    qh = qh - UINT64_C(1);
+    r = r + d;
+  }
+  if (r >= d) {
+    qh = qh + UINT64_C(1);
+    r = r - d;
+  }
+  result.__field_0 = qh;
+  result.__field_1 = r;
+  return result;
+}
+
+uint64_t wmpn_divrem_1(uint64_t * q, uint64_t * x, int32_t sz, uint64_t y)
+{
+  int32_t msb;
+  uint64_t lx, r;
+  int32_t i;
+  int32_t clz;
+  uint64_t ry, v, l, h, qu, rem;
+  struct __lsld_ext_result struct_res;
+  struct __div2by1_inv_result struct_res1, struct_res2;
+  uint64_t v1, qu1, rem1;
+  msb = sz - 1;
+  lx = UINT64_C(0);
+  i = msb;
+  r = UINT64_C(0);
+  clz = __builtin_clzll(y);
+  if (clz > 0) {
+    ry = y << (uint64_t)clz;
+    v = invert_limb(ry);
+    while (i >= 0) {
+      lx = x[i];
+      struct_res = lsld_ext(lx, (uint64_t)clz);
+      l = struct_res.__field_0;
+      h = struct_res.__field_1;
+      struct_res1 = div2by1_inv(r + h, l, ry, v);
+      qu = struct_res1.__field_0;
+      rem = struct_res1.__field_1;
+      r = rem;
+      q[i] = qu;
+      i = i - 1;
+    }
+    return r >> (uint64_t)clz;
+  } else {
+    v1 = invert_limb(y);
+    while (i >= 0) {
+      lx = x[i];
+      struct_res2 = div2by1_inv(r, lx, y, v1);
+      qu1 = struct_res2.__field_0;
+      rem1 = struct_res2.__field_1;
+      r = rem1;
+      q[i] = qu1;
+      i = i - 1;
+    }
+    return r;
+  }
+}
+
+struct __div3by2_inv_result
+{ uint64_t __field_0;
+  uint64_t __field_1;
+  uint64_t __field_2;
+};
+
+struct __div3by2_inv_result div3by2_inv(uint64_t uh, uint64_t um,
+                                        uint64_t ul, uint64_t dh,
+                                        uint64_t dl, uint64_t v)
+{
+  uint64_t q1, r0, r1;
+  uint64_t l, h, sl, c;
+  struct __mul64_double_result struct_res;
+  struct __add64_with_carry_result struct_res1, struct_res2;
+  uint64_t sh, p, tl, th, il, b;
+  struct __mul64_double_result struct_res3;
+  struct __sub64_with_borrow_result struct_res4, struct_res5, struct_res6,
+                                  struct_res7;
+  uint64_t ih, bl, b2, bh, rl, c1, rh, bl1, b1;
+  struct __add64_with_carry_result struct_res8, struct_res9;
+  struct __sub64_with_borrow_result struct_res10, struct_res11;
+  uint64_t bh1;
+  struct __div3by2_inv_result result;
+  q1 = UINT64_C(0);
+  r0 = UINT64_C(0);
+  r1 = UINT64_C(0);
+  struct_res = mul64_double(v, uh);
+  l = struct_res.__field_0;
+  h = struct_res.__field_1;
+  struct_res1 = add64_with_carry(um, l, UINT64_C(0));
+  sl = struct_res1.__field_0;
+  c = struct_res1.__field_1;
+  struct_res2 = add64_with_carry(uh, h, c);
+  sh = struct_res2.__field_0;
+  q1 = sh;
+  q1 = q1 + UINT64_C(1);
+  p = dh * sh;
+  r1 = um - p;
+  struct_res3 = mul64_double(sh, dl);
+  tl = struct_res3.__field_0;
+  th = struct_res3.__field_1;
+  struct_res4 = sub64_with_borrow(ul, tl, UINT64_C(0));
+  il = struct_res4.__field_0;
+  b = struct_res4.__field_1;
+  struct_res5 = sub64_with_borrow(r1, th, b);
+  ih = struct_res5.__field_0;
+  struct_res6 = sub64_with_borrow(il, dl, UINT64_C(0));
+  bl = struct_res6.__field_0;
+  b2 = struct_res6.__field_1;
+  struct_res7 = sub64_with_borrow(ih, dh, b2);
+  bh = struct_res7.__field_0;
+  r1 = bh;
+  r0 = bl;
+  if (r1 >= sl) {
+    q1 = q1 - UINT64_C(1);
+    struct_res8 = add64_with_carry(r0, dl, UINT64_C(0));
+    rl = struct_res8.__field_0;
+    c1 = struct_res8.__field_1;
+    struct_res9 = add64_with_carry(r1, dh, c1);
+    rh = struct_res9.__field_0;
+    r1 = rh;
+    r0 = rl;
+  }
+  if (__builtin_expect(r1 > dh || (r1 == dh && r0 >= dl),0)) {
+    struct_res10 = sub64_with_borrow(r0, dl, UINT64_C(0));
+    bl1 = struct_res10.__field_0;
+    b1 = struct_res10.__field_1;
+    struct_res11 = sub64_with_borrow(r1, dh, b1);
+    bh1 = struct_res11.__field_0;
+    q1 = q1 + UINT64_C(1);
+    r1 = bh1;
+    r0 = bl1;
+  }
+  result.__field_0 = q1;
+  result.__field_1 = r0;
+  result.__field_2 = r1;
+  return result;
+}
+
+uint64_t reciprocal_word_3by2(uint64_t dh, uint64_t dl)
+{
+  uint64_t v, p;
+  uint64_t tl, th;
+  struct __mul64_double_result struct_res;
+  v = invert_limb(dh);
+  p = dh * v;
+  p = p + dl;
+  if (p < dl) {
+    if (p >= dh) {
+      v = v - UINT64_C(1);
+      p = p - dh;
+    }
+    v = v - UINT64_C(1);
+    p = p - dh;
+  }
+  struct_res = mul64_double(v, dl);
+  tl = struct_res.__field_0;
+  th = struct_res.__field_1;
+  p = p + th;
+  if (p < th) {
+    if (p > dh || (p == dh && tl >= dl)) {
+      v = v - UINT64_C(1);
+    }
+    v = v - UINT64_C(1);
+  }
+  return v;
+}
+
+struct __sub3_result
+{ uint64_t __field_0;
+  uint64_t __field_1;
+};
+
+struct __sub3_result sub3(uint64_t x, uint64_t y, uint64_t z)
+{
+  uint64_t u1, b1, u2, b2;
+  struct __sub64_with_borrow_result struct_res, struct_res1;
+  struct __sub3_result result;
+  struct_res = sub64_with_borrow(x, y, UINT64_C(0));
+  u1 = struct_res.__field_0;
+  b1 = struct_res.__field_1;
+  struct_res1 = sub64_with_borrow(u1, z, UINT64_C(0));
+  u2 = struct_res1.__field_0;
+  b2 = struct_res1.__field_1;
+  result.__field_0 = u2;
+  result.__field_1 = b1 + b2;
+  return result;
+}
+
+uint64_t wmpn_submul_1(uint64_t * r, uint64_t * x, int32_t sz, uint64_t y)
+{
+  uint64_t lx, lr, b;
+  int32_t i;
+  uint64_t rl, rh, res, borrow;
+  struct __mul64_double_result struct_res;
+  struct __sub3_result struct_res1;
+  lx = UINT64_C(0);
+  lr = UINT64_C(0);
+  b = UINT64_C(0);
+  i = 0;
+  while (i < sz) {
+    lx = x[i];
+    lr = r[i];
+    struct_res = mul64_double(lx, y);
+    rl = struct_res.__field_0;
+    rh = struct_res.__field_1;
+    struct_res1 = sub3(lr, rl, b);
+    res = struct_res1.__field_0;
+    borrow = struct_res1.__field_1;
+    r[i] = res;
+    b = rh + borrow;
+    i = i + 1;
+  }
+  return b;
+}
+
+uint64_t div_sb_qr(uint64_t * q, uint64_t * x, int32_t sx, uint64_t * y,
+                   int32_t sy)
+{
+  uint64_t * xp;
+  uint64_t * qp;
+  uint64_t dh, dl, v;
+  int32_t i;
+  int32_t mdn;
+  uint64_t ql, x1, x0;
+  uint64_t * xd;
+  int32_t r;
+  uint64_t qh, nx0, xp0, xp1, qu, rl, rh;
+  uint64_t * xd1;
+  struct __div3by2_inv_result struct_res;
+  uint64_t cy, cy1, cy2, c;
+  xp = x + (sx - 2);
+  qp = q + (sx - sy);
+  dh = y[sy - 1];
+  dl = y[sy - 2];
+  v = reciprocal_word_3by2(dh, dl);
+  i = sx - sy;
+  mdn = 2 - sy;
+  ql = UINT64_C(0);
+  xd = xp + mdn;
+  x1 = UINT64_C(0);
+  x0 = UINT64_C(0);
+  r = wmpn_cmp(xd, y, sy);
+  if (r >= 0) {
+    qh = UINT64_C(1);
+  } else {
+    qh = UINT64_C(0);
+  }
+  if (!(qh == UINT64_C(0))) {
+    wmpn_sub_in_place(xd, sy, y, sy);
+  }
+  x1 = xp[1];
+  while (i > 0) {
+    i = i - 1;
+    xp = xp + -1;
+    xd1 = xp + mdn;
+    nx0 = xp[1];
+    if (__builtin_expect(x1 == dh && nx0 == dl,0)) {
+      ql = (0xffffffffffffffffUL);
+      wmpn_submul_1(xd1, y, sy, ql);
+      x1 = xp[1];
+      qp = qp + -1;
+      *qp = ql;
+    } else {
+      xp0 = *xp;
+      xp1 = xp[1];
+      struct_res = div3by2_inv(x1, xp1, xp0, dh, dl, v);
+      qu = struct_res.__field_0;
+      rl = struct_res.__field_1;
+      rh = struct_res.__field_2;
+      ql = qu;
+      x1 = rh;
+      x0 = rl;
+      cy = wmpn_submul_1(xd1, y, sy - 2, ql);
+      if (x0 < cy) {
+        cy1 = UINT64_C(1);
+      } else {
+        cy1 = UINT64_C(0);
+      }
+      x0 = x0 - cy;
+      if (x1 < cy1) {
+        cy2 = UINT64_C(1);
+      } else {
+        cy2 = UINT64_C(0);
+      }
+      x1 = x1 - cy1;
+      *xp = x0;
+      if (__builtin_expect(!(cy2 == UINT64_C(0)),0)) {
+        c = wmpn_add_in_place(xd1, sy - 1, y, sy - 1);
+        x1 = x1 + (dh + c);
+        ql = ql - UINT64_C(1);
+        qp = qp + -1;
+        *qp = ql;
+      } else {
+        qp = qp + -1;
+        *qp = ql;
+      }
+    }
+  }
+  xp[1] = x1;
+  return qh;
+}
+
+uint64_t wmpn_divrem_2(uint64_t * q, uint64_t * x, uint64_t * y, int32_t sx)
+{
+  uint64_t * xp;
+  uint64_t dh, dl;
+  uint64_t rh, rl, qh, lx;
+  int32_t i;
+  uint64_t dinv, r0, b, r1, qu, r01, r11;
+  struct __sub64_with_borrow_result struct_res, struct_res1;
+  struct __div3by2_inv_result struct_res2;
+  xp = x + (sx - 2);
+  dh = y[1];
+  dl = *y;
+  rh = xp[1];
+  rl = *xp;
+  qh = UINT64_C(0);
+  lx = UINT64_C(0);
+  i = sx - 2;
+  dinv = reciprocal_word_3by2(dh, dl);
+  if (rh >= dh && (rh > dh || rl >= dl)) {
+    struct_res = sub64_with_borrow(rl, dl, UINT64_C(0));
+    r0 = struct_res.__field_0;
+    b = struct_res.__field_1;
+    struct_res1 = sub64_with_borrow(rh, dh, b);
+    r1 = struct_res1.__field_0;
+    rh = r1;
+    rl = r0;
+    qh = UINT64_C(1);
+  }
+  while (i > 0) {
+    xp = xp + -1;
+    lx = *xp;
+    struct_res2 = div3by2_inv(rh, rl, lx, dh, dl, dinv);
+    qu = struct_res2.__field_0;
+    r01 = struct_res2.__field_1;
+    r11 = struct_res2.__field_2;
+    rh = r11;
+    rl = r01;
+    i = i - 1;
+    q[i] = qu;
+  }
+  x[1] = rh;
+  *x = rl;
+  return qh;
+}
+
+void div_qr(uint64_t * q, uint64_t * r, uint64_t * x, uint64_t * y,
+            uint64_t * nx, uint64_t * ny, int32_t sx, int32_t sy)
+{
+  uint64_t lr;
+  int32_t clz;
+  int32_t i;
+  uint64_t * xp;
+  uint64_t * rp;
+  int32_t i1;
+  uint64_t * xp1;
+  uint64_t * rp1;
+  uint64_t h;
+  int32_t adjust, clz1;
+  int32_t i2;
+  uint64_t * xp2;
+  uint64_t * rp2;
+  int32_t i3;
+  uint64_t * xp3;
+  uint64_t * rp3;
+  uint64_t h1;
+  if (sy == 1) {
+    lr = wmpn_divrem_1(q, x, sx, *y);
+    *r = lr;
+    return;
+  } else {
+    if (sy == 2) {
+      clz = __builtin_clzll(y[sy - 1]);
+      if (clz == 0) {
+        i = 0;
+        xp = x + 0;
+        rp = nx + 0;
+        while (i < sx) {
+          *rp = *xp;
+          rp = rp + 1;
+          xp = xp + 1;
+          i = i + 1;
+        }
+        nx[sx] = UINT64_C(0);
+        wmpn_divrem_2(q, nx, y, sx + 1);
+        i1 = 0;
+        xp1 = nx + 0;
+        rp1 = r + 0;
+        while (i1 < sy) {
+          *rp1 = *xp1;
+          rp1 = rp1 + 1;
+          xp1 = xp1 + 1;
+          i1 = i1 + 1;
+        }
+      } else {
+        wmpn_lshift(ny, y, sy, (uint64_t)clz);
+        h = wmpn_lshift(nx, x, sx, (uint64_t)clz);
+        nx[sx] = h;
+        wmpn_divrem_2(q, nx, ny, sx + 1);
+        wmpn_rshift(r, nx, sy, (uint64_t)clz);
+        return;
+      }
+    } else {
+      if (x[sx - 1] >= y[sy - 1]) {
+        adjust = 1;
+      } else {
+        adjust = 0;
+      }
+      clz1 = __builtin_clzll(y[sy - 1]);
+      if (clz1 == 0) {
+        i2 = 0;
+        xp2 = x + 0;
+        rp2 = nx + 0;
+        while (i2 < sx) {
+          *rp2 = *xp2;
+          rp2 = rp2 + 1;
+          xp2 = xp2 + 1;
+          i2 = i2 + 1;
+        }
+        nx[sx] = UINT64_C(0);
+        div_sb_qr(q, nx, sx + adjust, y, sy);
+        i3 = 0;
+        xp3 = nx + 0;
+        rp3 = r + 0;
+        while (i3 < sy) {
+          *rp3 = *xp3;
+          rp3 = rp3 + 1;
+          xp3 = xp3 + 1;
+          i3 = i3 + 1;
+        }
+        if (adjust == 0) {
+          q[sx - sy] = UINT64_C(0);
+          return;
+        } else {
+          return;
+        }
+      } else {
+        wmpn_lshift(ny, y, sy, (uint64_t)clz1);
+        h1 = wmpn_lshift(nx, x, sx, (uint64_t)clz1);
+        nx[sx] = h1;
+        div_sb_qr(q, nx, sx + adjust, ny, sy);
+        wmpn_rshift(r, nx, sy, (uint64_t)clz1);
+        if (adjust == 0) {
+          q[sx - sy] = UINT64_C(0);
+          return;
+        } else {
+          return;
+        }
+      }
+    }
+  }
+}
+
+void wmpn_tdiv_qr(uint64_t * q, uint64_t * r, uint64_t * x, int32_t sx,
+                  uint64_t * y, int32_t sy)
+{
+  uint64_t * nx;
+  uint64_t * ny;
+  nx = malloc(((uint32_t)sx + 1U) * sizeof(uint64_t));
+  assert (nx);
+  ny = malloc((uint32_t)sy * sizeof(uint64_t));
+  assert (ny);
+  div_qr(q, r, x, y, nx, ny, sx, sy);
+  free(nx);
+  free(ny);
+  return;
+}
+
+void div_qr_in_place(uint64_t * q, uint64_t * x, uint64_t * y, uint64_t * nx,
+                     uint64_t * ny, int32_t sx, int32_t sy)
+{
+  uint64_t lr;
+  int32_t clz;
+  int32_t i;
+  uint64_t * xp;
+  uint64_t * rp;
+  int32_t i1;
+  uint64_t * xp1;
+  uint64_t * rp1;
+  uint64_t h;
+  int32_t adjust, clz1;
+  int32_t i2;
+  uint64_t * xp2;
+  uint64_t * rp2;
+  int32_t i3;
+  uint64_t * xp3;
+  uint64_t * rp3;
+  uint64_t h1;
+  if (sy == 1) {
+    lr = wmpn_divrem_1(q, x, sx, *y);
+    *x = lr;
+    return;
+  } else {
+    if (sy == 2) {
+      clz = __builtin_clzll(y[sy - 1]);
+      if (clz == 0) {
+        i = 0;
+        xp = x + 0;
+        rp = nx + 0;
+        while (i < sx) {
+          *rp = *xp;
+          rp = rp + 1;
+          xp = xp + 1;
+          i = i + 1;
+        }
+        nx[sx] = UINT64_C(0);
+        wmpn_divrem_2(q, nx, y, sx + 1);
+        i1 = 0;
+        xp1 = nx + 0;
+        rp1 = x + 0;
+        while (i1 < sy) {
+          *rp1 = *xp1;
+          rp1 = rp1 + 1;
+          xp1 = xp1 + 1;
+          i1 = i1 + 1;
+        }
+      } else {
+        wmpn_lshift(ny, y, sy, (uint64_t)clz);
+        h = wmpn_lshift(nx, x, sx, (uint64_t)clz);
+        nx[sx] = h;
+        wmpn_divrem_2(q, nx, ny, sx + 1);
+        wmpn_rshift(x, nx, sy, (uint64_t)clz);
+        return;
+      }
+    } else {
+      if (x[sx - 1] >= y[sy - 1]) {
+        adjust = 1;
+      } else {
+        adjust = 0;
+      }
+      clz1 = __builtin_clzll(y[sy - 1]);
+      if (clz1 == 0) {
+        i2 = 0;
+        xp2 = x + 0;
+        rp2 = nx + 0;
+        while (i2 < sx) {
+          *rp2 = *xp2;
+          rp2 = rp2 + 1;
+          xp2 = xp2 + 1;
+          i2 = i2 + 1;
+        }
+        nx[sx] = UINT64_C(0);
+        div_sb_qr(q, nx, sx + adjust, y, sy);
+        i3 = 0;
+        xp3 = nx + 0;
+        rp3 = x + 0;
+        while (i3 < sy) {
+          *rp3 = *xp3;
+          rp3 = rp3 + 1;
+          xp3 = xp3 + 1;
+          i3 = i3 + 1;
+        }
+        if (adjust == 0) {
+          q[sx - sy] = UINT64_C(0);
+          return;
+        } else {
+          return;
+        }
+      } else {
+        wmpn_lshift(ny, y, sy, (uint64_t)clz1);
+        h1 = wmpn_lshift(nx, x, sx, (uint64_t)clz1);
+        nx[sx] = h1;
+        div_sb_qr(q, nx, sx + adjust, ny, sy);
+        wmpn_rshift(x, nx, sy, (uint64_t)clz1);
+        if (adjust == 0) {
+          q[sx - sy] = UINT64_C(0);
+          return;
+        } else {
+          return;
+        }
+      }
+    }
+  }
+}
+
+void wmpn_tdiv_qr_in_place(uint64_t * q, uint64_t * x, int32_t sx,
+                           uint64_t * y, int32_t sy)
+{
+  uint64_t * nx;
+  uint64_t * ny;
+  nx = malloc(((uint32_t)sx + 1U) * sizeof(uint64_t));
+  assert (nx);
+  ny = malloc((uint32_t)sy * sizeof(uint64_t));
+  assert (ny);
+  div_qr_in_place(q, x, y, nx, ny, sx, sy);
+  free(nx);
+  free(ny);
+  return;
+}
+
--- a/generated-src/lib/div.h
+++ b/generated-src/lib/div.h
+#include <stdlib.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <assert.h>
+#include <alloca.h>
+uint64_t invert_limb(uint64_t d);
+
+struct __div2by1_inv_result
+{ uint64_t __field_0;
+  uint64_t __field_1;
+};
+
+struct __div2by1_inv_result div2by1_inv(uint64_t uh, uint64_t ul, uint64_t d,
+                                        uint64_t v);
+
+uint64_t wmpn_divrem_1(uint64_t * q, uint64_t * x, int32_t sz, uint64_t y);
+
+struct __div3by2_inv_result
+{ uint64_t __field_0;
+  uint64_t __field_1;
+  uint64_t __field_2;
+};
+
+struct __div3by2_inv_result div3by2_inv(uint64_t uh, uint64_t um,
+                                        uint64_t ul, uint64_t dh,
+                                        uint64_t dl, uint64_t v);
+
+uint64_t reciprocal_word_3by2(uint64_t dh, uint64_t dl);
+
+struct __sub3_result
+{ uint64_t __field_0;
+  uint64_t __field_1;
+};
+
+struct __sub3_result sub3(uint64_t x, uint64_t y, uint64_t z);
+
+uint64_t wmpn_submul_1(uint64_t * r, uint64_t * x, int32_t sz, uint64_t y);
+
+uint64_t div_sb_qr(uint64_t * q, uint64_t * x, int32_t sx, uint64_t * y,
+                   int32_t sy);
+
+uint64_t wmpn_divrem_2(uint64_t * q, uint64_t * x, uint64_t * y, int32_t sx);
+
+void div_qr(uint64_t * q, uint64_t * r, uint64_t * x, uint64_t * y,
+            uint64_t * nx, uint64_t * ny, int32_t sx, int32_t sy);
+
+void wmpn_tdiv_qr(uint64_t * q, uint64_t * r, uint64_t * x, int32_t sx,
+                  uint64_t * y, int32_t sy);
+
+void div_qr_in_place(uint64_t * q, uint64_t * x, uint64_t * y, uint64_t * nx,
+                     uint64_t * ny, int32_t sx, int32_t sy);
+
+void wmpn_tdiv_qr_in_place(uint64_t * q, uint64_t * x, int32_t sx,
+                           uint64_t * y, int32_t sy);
+
--- a/generated-src/lib/euclideandivision.h
+++ b/generated-src/lib/euclideandivision.h
+#include <stdlib.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <assert.h>
+#include <alloca.h>
--- a/generated-src/lib/fxp.h
+++ b/generated-src/lib/fxp.h
+#include <stdlib.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <assert.h>
+#include <alloca.h>
+uint64_t fxp_init(uint64_t x);
+
+uint64_t fxp_id(uint64_t x);
+
--- a/generated-src/lib/int.h
+++ b/generated-src/lib/int.h
+#include <stdlib.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <assert.h>
+#include <alloca.h>
+
+
+
+
+
+