C extraction: print appropriate suffixes on integer constants
-
Owner
Inferring the type suffix from the value of the constant and the range of the type seems a bit dubious to me. For example, if the user writes
0x80000000:int64
, the extracted literal will be0x80000000u
, i.e., unsigned instead of signed. I would instead have expected a system based either on the types actually written in the driver or onsyntax literal
entries in the driver or both. As a matter of fact, there are alreadysyntax literal
entries in the C driver. -
Owner
I cannot reproduce your counterexample (I get
0x80000000l
), though I also think it would be cleaner to use the driver for this. I did not find a good way to get the right information from thesyntax literal
entries. I guess I could just match thesyntax type
entry in the driver against hardcoded strings. -
Owner
The code in the SMT printer looks as follows:
| Tconst c -> let ts = match t.t_ty with | Some { ty_node = Tyapp (ts, []) } -> ts | _ -> assert false (* impossible *) in (* look for syntax literal ts in driver *) begin match query_syntax info.info_rliteral ts.ts_name, c with | Some st, Number.ConstInt c -> syntax_range_literal st fmt c
Wouldn't the same kind of code work for programs too?
-
Owner
Huh, I guess it would. I did not know what to make of the format string, but
syntax_range_literal
is what I need. -
Owner
I gave it a try. One issue is that the extracted code becomes quite ugly:
[...] i = 0x00000001; c = 0x0000000000000000UL; *r = res; if (res < lx) { c = 0x0000000000000001UL; while (i < sz) { lx = x[i]; res1 = lx + 0x0000000000000001UL; [...]
It also made me find a bug in the memory model. The following function
f
let f () = let x = 0x80000000:int64 in let p = malloc 2 in let p1 = incr p 1 in C.set_ofs p1 (-1) x; return x
gets extracted as:
int64_t f() { int64_t x; int64_t * p; int64_t * p1; x = 0x0000000080000000L; p = malloc(0x0000000000000002U * sizeof(int64_t)); p1 = p + 0x00000001; p1[0xFFFFFFFF] = x; return x; }
This segfaults on my 64-bit machine. I guess I should use 64-bit integers in pointer offsets.
-
Owner
Also, I would have liked to print some numbers without specifying the number of digits (using base 10 with no specific number of digits seems sane for unsigned types), but using just
%d
as format string outputs an empty string for0
. -
Owner
One issue is that the extracted code becomes quite ugly
I agree. My suggestion would be to modify
syntax literal
so that it accepts a richer format, so as to be able to select a syntax depending on the kind of literal, e.g.,%dUL%|0x%xUL%|0%oUL
with%|
delimiting the choices of the alternative.This segfaults on my 64-bit machine. I guess I should use 64-bit integers in pointer offsets.
I would rather say that the bug is that
-1
should never be translated to0xFFFFFFFF
. Changing the size of pointer offsets would only hide the bug.using just
%d
as format string outputs an empty string for0
Seems like a misfeature. I can't think of a case where it would be useful.
-
Owner
I would rather say that the bug is that
-1
should never be translated to0xFFFFFFFF
.How else would we print negative literals in bases other than 10 though? Should they just be rejected? For the memory model, the only way out I can see is to print signed types only in base 10, so that a minus sign is printed.
-
Owner
Why would they be rejected?
syntax_range_literal
is perfectly happy with displaying negative numbers in%d
mode. So, there is at least one branch of the alternative that succeeds. (And when there are several ones, the one matching the literal kind should be chosen.) -
Owner
I thought you meant that
-1
should never be translated to0xFFFFFFFF
in general. I agree with your suggestion. Unless you object though, I will merge this and make the changes in a separate branch. -
Owner
Yes, I meant that
-1
should never be translated to a nonnegative literal for the C language.Moreover, I consider that the driver should not be using
0x%8x
but just0x%x
(for readability), which effectively forbids a negative literal to pass through. As a corollary, this means that there should be a fallback. The multi-format system I proposed offers such a fallback, but we can imagine other ways of doing it.That said, I am fine with the merge of the
fxp_sqrt2
branch in the meantime. -
Owner
Perhaps a simpler system than the multi-format one would be to just add a new
%c
format (as in "custom") and pass an optional callback tosyntax_range_literal
. That way, the C driver would use"%cUL"
and the C printer would pass the first part of its current code as a callback.By the way, a better string might be
"UINT64_C(%c)"
. It is a bit ugly, but at least it no longer depends on the implicit knowledge that the target C compiler definesuint64_t
asunsigned long
.