Issues with error locations
Prompted by !760, I have tried to understand better how tools are supposed to report error locations.
Here are Emacs' expectations:
- The first character of a line is at position 1 (even if Emacs itself counts column starting from zero). A buffer setting makes it so that they are at position 0.
- Tabulation characters count for as many positions as needed to reach the next multiple of eight. A buffer setting makes it so that they count for one.
- Unicode characters count for one position.
Here are Vim's expectations:
- The first character of a line is at position 1.
- Tabulation characters count either for one position, or for as many positions as needed to reach the next multiple of eight, depending on the error format.
- Unicode characters count for one position.
Here are Gedit's expectations (+l:c
on command line):
- The first character of a line is at position 1.
- Tabulations characters count for one position.
- Unicode characters count for one position.
Here are Why3 IDE's current expectations:
- The first character of a line is at position 0. (Trivial to change.)
- Tabulations characters count for one. (Moderately easy to change.)
- Unicode characters count for one position. (Almost impossible to change.)
Here is a small example that exhibits the three cases:
let f ()
ensures { result = 0 }
(* next is a tab *) ensures { result = 0 }
(* éééé *) ensures { [@ ééé ] result = 0 }
= 0
Just select the goal in the IDE and see whether the postconditions are correctly highlighted. If you want to experiment with error messages instead, just change result
into risult
.
As it stands (!763 (merged), !766 (merged)), Why3 accommodates these expectations as follows:
- Not satisfied for Vim (and Gedit, but that is less relevant).
- Satisfied.
- Almost satisfied, e.g., in the absence of combining marks.
For completeness, here is GCC's behavior:
- The first character of a line is at position 1.
- Tabulation characters count for as many positions as needed to reach the next multiple of eight.
- Unicode characters count for one.
Here is Clang's behavior:
- The first character of a line is at position 1.
- Tabulation characters count for one position for column. They count for multiple positions for caret.
- Unicode characters count for their number of bytes for column. They count for one position for caret.
Here is OCaml's behavior:
- The first character of a line is at position 0.
- Tabulation characters count for one. (Hence a broken caret.)
- Unicode characters count for their number of bytes. (Hence a broken caret.)
Here is the LSP protocol's specification:
- The first character of a line is at position 0.
- Tabulation characters count for one.
- Unicode characters count for the number of words needed to represent them in UTF-16. (Since May 2022, client and server can negotiate to count in bytes or in unicode.)