Commit 15509bd7 authored by Raphael Rieu-Helft's avatar Raphael Rieu-Helft

C extraction: print appropriate suffixes on integer constants

parent ef6e98c8
......@@ -443,6 +443,15 @@ module C = struct
| Bassign -> true
| _ -> false
(** Integer type bounds *)
open BigInt
let min32 = minus (pow_int_pos 2 31)
let max32 = sub (pow_int_pos 2 31) one
let maxu32 = sub (pow_int_pos 2 32) one
let min64 = minus (pow_int_pos 2 63)
let max64 = sub (pow_int_pos 2 63) one
let maxu64 = sub (pow_int_pos 2 64) one
end
type info = {
......@@ -924,7 +933,21 @@ module MLToC = struct
| ILitHex -> Format.asprintf "0x%a" (print_in_base 16 None) n
| ILitOct -> Format.asprintf "0%a" (print_in_base 8 None) n
| _ -> BigInt.to_string n in
let e = C.(Econst (Cint n)) in
let suf =
match e.e_ity with
| I i ->
begin match (ty_of_ity i).ty_node with
| Tyapp ({ts_def = Range { ir_lower = l; ir_upper = h }},_) ->
let open BigInt in
let unsigned = eq l zero in
if (le min32 l) && (le h max32) then ""
else if unsigned && (le h maxu32) then "u"
else if (le min64 l) && (le h max64) then "l"
else if unsigned && (le h maxu64) then "ul"
else raise (Unsupported "unknown number format")
| _ -> raise (Unsupported "non-range integer constant") end
| _ -> assert false in
let e = C.(Econst (Cint (n^suf))) in
([], expr_or_return env e)
| Eapp (rs, el)
when is_struct_constructor info rs
......
  • Inferring the type suffix from the value of the constant and the range of the type seems a bit dubious to me. For example, if the user writes 0x80000000:int64, the extracted literal will be 0x80000000u, i.e., unsigned instead of signed. I would instead have expected a system based either on the types actually written in the driver or on syntax literal entries in the driver or both. As a matter of fact, there are already syntax literal entries in the C driver.

  • I cannot reproduce your counterexample (I get 0x80000000l), though I also think it would be cleaner to use the driver for this. I did not find a good way to get the right information from the syntax literal entries. I guess I could just match the syntax type entry in the driver against hardcoded strings.

  • The code in the SMT printer looks as follows:

    | Tconst c ->
      let ts = match t.t_ty with
        | Some { ty_node = Tyapp (ts, []) } -> ts
        | _ -> assert false (* impossible *) in
      (* look for syntax literal ts in driver *)
      begin match query_syntax info.info_rliteral ts.ts_name, c with
      | Some st, Number.ConstInt c ->
        syntax_range_literal st fmt c

    Wouldn't the same kind of code work for programs too?

    Edited by Guillaume Melquiond
  • Huh, I guess it would. I did not know what to make of the format string, but syntax_range_literal is what I need.

  • I gave it a try. One issue is that the extracted code becomes quite ugly:

      [...]
      i = 0x00000001;
      c = 0x0000000000000000UL;
      *r = res;
      if (res < lx) {
        c = 0x0000000000000001UL;
        while (i < sz) {
          lx = x[i];
          res1 = lx + 0x0000000000000001UL;
      [...]

    It also made me find a bug in the memory model. The following function f

    let f () =
      let x = 0x80000000:int64 in
      let p = malloc 2 in
      let p1 = incr p 1 in
      C.set_ofs p1 (-1) x;
      return x

    gets extracted as:

    int64_t f()
    {
      int64_t x;
      int64_t * p;
      int64_t * p1;
      x = 0x0000000080000000L;
      p = malloc(0x0000000000000002U * sizeof(int64_t));
      p1 = p + 0x00000001;
      p1[0xFFFFFFFF] = x;
      return x;
    }

    This segfaults on my 64-bit machine. I guess I should use 64-bit integers in pointer offsets.

    Edited by Raphaël Rieu-Helft
  • Also, I would have liked to print some numbers without specifying the number of digits (using base 10 with no specific number of digits seems sane for unsigned types), but using just %d as format string outputs an empty string for 0.

  • One issue is that the extracted code becomes quite ugly

    I agree. My suggestion would be to modify syntax literal so that it accepts a richer format, so as to be able to select a syntax depending on the kind of literal, e.g., %dUL%|0x%xUL%|0%oUL with %| delimiting the choices of the alternative.

    This segfaults on my 64-bit machine. I guess I should use 64-bit integers in pointer offsets.

    I would rather say that the bug is that -1 should never be translated to 0xFFFFFFFF. Changing the size of pointer offsets would only hide the bug.

    using just %d as format string outputs an empty string for 0

    Seems like a misfeature. I can't think of a case where it would be useful.

  • I would rather say that the bug is that -1 should never be translated to 0xFFFFFFFF.

    How else would we print negative literals in bases other than 10 though? Should they just be rejected? For the memory model, the only way out I can see is to print signed types only in base 10, so that a minus sign is printed.

  • Why would they be rejected? syntax_range_literal is perfectly happy with displaying negative numbers in %d mode. So, there is at least one branch of the alternative that succeeds. (And when there are several ones, the one matching the literal kind should be chosen.)

  • I thought you meant that -1 should never be translated to 0xFFFFFFFFin general. I agree with your suggestion. Unless you object though, I will merge this and make the changes in a separate branch.

  • Yes, I meant that -1 should never be translated to a nonnegative literal for the C language.

    Moreover, I consider that the driver should not be using 0x%8x but just 0x%x (for readability), which effectively forbids a negative literal to pass through. As a corollary, this means that there should be a fallback. The multi-format system I proposed offers such a fallback, but we can imagine other ways of doing it.

    That said, I am fine with the merge of the fxp_sqrt2 branch in the meantime.

  • Perhaps a simpler system than the multi-format one would be to just add a new %c format (as in "custom") and pass an optional callback to syntax_range_literal. That way, the C driver would use "%cUL" and the C printer would pass the first part of its current code as a callback.

    By the way, a better string might be "UINT64_C(%c)". It is a bit ugly, but at least it no longer depends on the implicit knowledge that the target C compiler defines uint64_t as unsigned long.

Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment