Surprising behavior of offset keywords ($startofs/$endofs) for a lexer that doesn't update positions
Since https://github.com/ocaml/ocaml/pull/1585 (4.07), ocamllex does not update lexer positions (the lex_curr_p and lex_start_p fields) if a buffer has been initialized with dummy_pos.
In this mode only the offsets should be used (lex_start_pos and lex_curr_pos), as the position will always be dummy_pos.
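
As a minimal sketch of what "using only the offsets" can look like, the helper below computes the absolute start and end offsets of the current lexeme from the integer fields of the lexbuf. The name lexeme_offsets is hypothetical; adding lex_abs_pos mirrors how the standard lexing engine computes pos_cnum when positions are tracked.

```ocaml
(* Hypothetical helper: absolute offsets of the current lexeme, read from
   the integer fields only, without touching lex_start_p / lex_curr_p. *)
let lexeme_offsets (lexbuf : Lexing.lexbuf) : int * int =
  let abs = lexbuf.Lexing.lex_abs_pos in
  (abs + lexbuf.Lexing.lex_start_pos, abs + lexbuf.Lexing.lex_curr_pos)
```

These values stay meaningful whether or not the buffer was initialized with dummy_pos.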
However, the Menhir keywords $startofs / $endofs have a slightly different meaning: the full positions are still used and the offset is projected from the position.
Thus, even when using the offset keywords, the values will be wrong: they are read from dummy_pos rather than from the lexer's actual offsets.
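
To make the symptom concrete, here is a small model of the projection. It assumes the offset is simply the pos_cnum field of the full position; offset_of_position is a hypothetical name and this is not Menhir's generated code.

```ocaml
(* Assumed model of the projection: the offset is the pos_cnum field
   of the full position. *)
let offset_of_position (p : Lexing.position) : int = p.Lexing.pos_cnum

let () =
  (* A position-less lexbuf hands dummy_pos to the parser for every token. *)
  Printf.printf "%d\n" (offset_of_position Lexing.dummy_pos)
  (* prints -1 *)
```

Under that assumption, every token would report the same offset, -1, wherever it occurs in the input.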
I am not sure how this situation should be handled. Possible mitigations in the short term:
- document the incompatibility,
- extend lexer_lexbuf_to_supplier to at least update the offsets if the lexer position is always dummy_pos.
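
The second mitigation could look roughly like the sketch below. It is not MenhirLib code: offset_supplier is a hypothetical stand-in with the same shape as lexer_lexbuf_to_supplier, and it assumes that a position-less buffer can be detected by physical equality with dummy_pos. In the synthesized positions only pos_cnum is meaningful; the other fields are placeholders taken from dummy_pos.

```ocaml
(* Hypothetical adapter with the same shape as lexer_lexbuf_to_supplier. *)
let offset_supplier
    (lexer : Lexing.lexbuf -> 'token)
    (lexbuf : Lexing.lexbuf)
  : unit -> 'token * Lexing.position * Lexing.position =
  (* Build a position whose only meaningful field is pos_cnum. *)
  let of_offset ofs = { Lexing.dummy_pos with Lexing.pos_cnum = ofs } in
  fun () ->
    let token = lexer lexbuf in
    let abs = lexbuf.Lexing.lex_abs_pos in
    if lexbuf.Lexing.lex_curr_p == Lexing.dummy_pos then
      (* Positions are not tracked: synthesize them from the raw offsets. *)
      (token,
       of_offset (abs + lexbuf.Lexing.lex_start_pos),
       of_offset (abs + lexbuf.Lexing.lex_curr_pos))
    else
      (* Positions are tracked: pass them through unchanged. *)
      (token, lexbuf.Lexing.lex_start_p, lexbuf.Lexing.lex_curr_p)
```

With such a supplier, $startofs and $endofs would again reflect the lexer's offsets, while $startpos and $endpos would still carry no usable line information.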
I have started to experiment with generalized Engine.env / Engine.stack types that are parameterized by a location type only accessed in an abstract manner by Menhir, but the design space is quite large so I don't think it will be available very soon.