Regression (in 2021-12-30?) concerning parse errors
Consider the following LBNF grammar, which can be translated to a Menhir grammar using `bnfc --ocaml-menhir` (https://github.com/BNFC/bnfc):
```
EInt.  Exp1 ::= Integer;  -- entrypoint
EPlus. Exp  ::= Exp "+" Exp1;
```
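The failure is expected because `Exp` has no base case. As a minimal sketch, the standard productivity fixpoint can be run over these two rules. The grammar encoding (`symbol`, `grammar`) and the helper `productive` are hypothetical, written by hand for illustration and not produced by bnfc. It finds `Exp1` productive and `Exp` not, so no input can ever be reduced by `EPlus`:

```ocaml
(* Symbols of a context-free grammar: terminals and nonterminals. *)
type symbol = T of string | N of string

(* Hypothetical hand translation of the two LBNF rules;
   each entry is (left-hand side, right-hand side). *)
let grammar = [
  ("Exp1", [ T "Integer" ]);              (* EInt.  Exp1 ::= Integer      *)
  ("Exp",  [ N "Exp"; T "+"; N "Exp1" ]); (* EPlus. Exp  ::= Exp "+" Exp1 *)
]

(* Least fixpoint: a nonterminal becomes productive as soon as one of its
   productions consists only of terminals and already-productive
   nonterminals. Iterate until no new nonterminal is added. *)
let productive g =
  let ok known = function T _ -> true | N n -> List.mem n known in
  let rec loop known =
    let known' =
      List.fold_left
        (fun acc (lhs, rhs) ->
          if (not (List.mem lhs acc)) && List.for_all (ok acc) rhs
          then lhs :: acc
          else acc)
        known g
    in
    if List.length known' = List.length known then known else loop known'
  in
  loop []

let () =
  let p = productive grammar in
  Printf.printf "Exp1 productive: %b\n" (List.mem "Exp1" p);
  Printf.printf "Exp productive: %b\n" (List.mem "Exp" p)
```

Running it prints `Exp1 productive: true` and `Exp productive: false`.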
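The rule `EPlus` can never fire (the only production for `Exp` is left-recursive, so `Exp` derives no finite string), hence running the parser on `1+2` will fail. However, the error reporting of menhir-2021-12-30 is worse than that of, e.g., menhir-2021-04-19 or ocamlyacc: instead of a parse error with a source location, the test program aborts with an uncaught exception: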
```
$ ocaml/TestTest <<< "1+2"
Parse error at 1.2-1.3
$ menhir-2021-04-19/TestTest <<< "1+2"
Parse error at 1.2-1.3
$ menhir-2021-12-30/TestTest <<< "1+2"
Fatal error: exception ParTest.MenhirBasics.Error
```
The `.mly` file generated by `bnfc --ocaml-menhir`, somewhat simplified, is:
```
%{
open AbsTest
open Lexing
%}
%token SYMB1 /* + */
%token TOK_EOF
%token <int> TOK_Integer
%start pExp1 pExp
%type <AbsTest.exp> pExp1
%type <AbsTest.exp> pExp
%type <AbsTest.exp> exp1
%type <AbsTest.exp> exp
%type <int> int
%%
pExp1 : exp1 TOK_EOF { $1 }
  | error { raise (BNFC_Util.Parse_error ($symbolstartpos, $endpos)) };
pExp : exp TOK_EOF { $1 }
  | error { raise (BNFC_Util.Parse_error ($symbolstartpos, $endpos)) };
exp1 : int { EInt $1 };
exp : exp SYMB1 exp1 { EPlus ($1, $3) };
int : TOK_Integer { $1 };
```
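Until the cause is understood, a caller could translate the escaping exception back into the one the `error` productions are meant to raise. The following is a hypothetical workaround sketch, not part of bnfc or Menhir: `parse_with_positions` is an invented helper, and `BNFC_Util` and `ParTest` are minimal stand-ins so the example compiles on its own. It relies on Menhir-generated parsers exporting an `Error` exception, and it reports the positions of the last token read from the lexbuf:

```ocaml
(* Stand-in for BNFC's runtime module (assumption: only the exception
   used by the generated error productions is reproduced here). *)
module BNFC_Util = struct
  exception Parse_error of Lexing.position * Lexing.position
end

(* Stand-in for the Menhir-generated parser module. The real module exports
   an [Error] exception; this fake entry point simply raises it, mimicking
   what the menhir-2021-12-30 build does on "1+2". *)
module ParTest = struct
  exception Error
  let pExp1 (_lexer : Lexing.lexbuf -> 'tok) (_lexbuf : Lexing.lexbuf) : int =
    raise Error
end

(* Hypothetical helper: run a parser entry point and turn the parser's own
   [Error] into [BNFC_Util.Parse_error], using the lexbuf positions of the
   last token read (the token at which the parser gave up). *)
let parse_with_positions parse lexer lexbuf =
  try parse lexer lexbuf
  with ParTest.Error ->
    raise
      (BNFC_Util.Parse_error
         (Lexing.lexeme_start_p lexbuf, Lexing.lexeme_end_p lexbuf))

let () =
  let lexbuf = Lexing.from_string "1+2" in
  match parse_with_positions ParTest.pExp1 (fun _ -> ()) lexbuf with
  | e -> Printf.printf "parsed: %d\n" e
  | exception BNFC_Util.Parse_error _ -> print_endline "Parse error"
```

This only restores a located error report in the driver; it does not explain why the `error` productions no longer take effect.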
According to the changelog, it seems that this use of the `error` token falls within the backward-compatible subset:
> For grammars that use the `error` token in the limited way permitted by the simplified strategy, this makes no difference either. The simplified strategy makes the following requirement: the `error` token should always appear at the end of a production, whose semantic action should abort the parser by raising an exception.
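The positional half of that requirement can be checked mechanically. A minimal sketch, assuming a hypothetical hand transcription of the productions of the simplified `.mly` (the `production` type and `error_only_at_end` are invented for illustration): it confirms that `error` only ever occurs as the last symbol. Whether each such action aborts by raising is not checked here; in the `.mly` above, both actions do raise `BNFC_Util.Parse_error`.

```ocaml
(* A production: left-hand side and right-hand-side symbols, by name. *)
type production = { lhs : string; rhs : string list }

(* True iff "error" never occurs anywhere except as the last symbol. *)
let error_only_at_end p =
  let rec ok = function
    | [] -> true
    | [ _ ] -> true            (* last symbol: "error" is allowed here *)
    | "error" :: _ -> false    (* "error" followed by further symbols *)
    | _ :: rest -> ok rest
  in
  ok p.rhs

(* Productions of the simplified .mly, transcribed by hand. *)
let productions = [
  { lhs = "pExp1"; rhs = [ "exp1"; "TOK_EOF" ] };
  { lhs = "pExp1"; rhs = [ "error" ] };
  { lhs = "pExp";  rhs = [ "exp"; "TOK_EOF" ] };
  { lhs = "pExp";  rhs = [ "error" ] };
  { lhs = "exp1";  rhs = [ "int" ] };
  { lhs = "exp";   rhs = [ "exp"; "SYMB1"; "exp1" ] };
  { lhs = "int";   rhs = [ "TOK_Integer" ] };
]

let () =
  List.iter
    (fun p ->
      if not (error_only_at_end p) then
        Printf.printf "error token not at end in a %s production\n" p.lhs)
    productions
```

For this grammar nothing is printed, consistent with the reading that the generated grammar stays within the simplified strategy.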
All the generated code is in the attached tar file: regression-2021-12-30.tgz