Commit 7fe180b8 authored by POTTIER Francois

Removed --error-recovery mode.
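
For readers unfamiliar with the removed mode, the behavior described in the deleted documentation (discard tokens after an error until one is acceptable, with a risk of looping forever on a lexer that keeps returning EOF) can be sketched as a toy model. This is not Menhir's code: the `token` type, `lexer_of_list`, `resynchronize`, and the `fuel` bound are all hypothetical names introduced only for illustration.

```ocaml
(* Toy model of re-synchronization. Every name here is an assumption made
   for illustration; none of it is taken from Menhir's sources. *)
type token = INT of int | PLUS | SEMI | EOF

(* A lexer in the ocamllex style: once the input is exhausted, it keeps
   returning EOF forever, so the token stream is infinite. *)
let lexer_of_list (tokens : token list) : unit -> token =
  let remaining = ref tokens in
  fun () ->
    match !remaining with
    | [] -> EOF
    | tok :: rest ->
        remaining := rest;
        tok

(* Starting from the current token [tok], discard tokens until one
   satisfies [acceptable]. Without the [fuel] bound, this loop would not
   terminate when EOF is reached and EOF is not acceptable, which is the
   situation the removed warning was about. *)
let rec resynchronize ~fuel ~acceptable next tok =
  if acceptable tok then Some tok
  else if fuel = 0 then None
  else resynchronize ~fuel:(fuel - 1) ~acceptable next (next ())
```

In this model, a state that only accepts `SEMI` resynchronizes on `INT 1; PLUS; SEMI` by dropping the first two tokens, whereas on an input that never contains `SEMI` it gives up only because of the artificial `fuel` bound.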

parent b565c874
2014/12/02: 2014/12/02:
Removed support for the $previouserror keyword. Removed support for the $previouserror keyword.
Removed support for --error-recovery mode.
2014/02/18: 2014/02/18:
In the Coq backend, use ' instead of _ as separator in identifiers. In the Coq backend, use ' instead of _ as separator in identifiers.
......
@@ -55,14 +55,17 @@ first~\cite{aho-86,appel-tiger-98,hopcroft-motwani-ullman-00}. They are also
invited to have a look at the \distrib{demos} directory in \menhir's invited to have a look at the \distrib{demos} directory in \menhir's
distribution. distribution.
At this stage, potential users should be warned about two facts. First, Potential users of Menhir should be warned that \menhir's feature set is not
\menhir's feature set is not stable. There is a tension between preserving a completely stable. There is a tension between preserving a measure of
measure of compatibility with \ocamlyacc, on the one hand, and introducing new compatibility with \ocamlyacc, on the one hand, and introducing new ideas, on
ideas, on the other hand. Some aspects of the tool, such as the error handling the other hand. Some aspects of the tool, such as the error handling
and recovery mechanism, are still potentially subject to incompatible mechanism, are still potentially subject to incompatible changes: for
changes. Second, the present release is \emph{beta}-quality. There is much instance, in the future, the current error handling mechanism (which is based
room for improvement in the tool and in this reference manual. Bug reports and on the \error token, see \sref{sec:errors}) could be removed and replaced with
suggestions are welcome! an entirely different mechanism.
There is room for improvement in the tool and in this reference manual. Bug
reports and suggestions are welcome!
% --------------------------------------------------------------------------------------------------------------------- % ---------------------------------------------------------------------------------------------------------------------
@@ -111,13 +114,6 @@ switch.
\docswitch{\odump} This switch causes a description of the automaton \docswitch{\odump} This switch causes a description of the automaton
to be written to the file \nt{basename}\automaton. to be written to the file \nt{basename}\automaton.
\docswitch{\oerrorrecovery} This switch causes error recovery code to be
generated. Error recovery, also known as re-synchronization, consists in
dropping tokens off the input stream, after an error has been detected,
until a token that can be shifted in the current state is found. This
behavior is made optional because it is seldom exploited and requires
extra code in the parser. See also \sref{sec:errors}.
\docswitch{\oexplain} This switch causes conflict explanations to be \docswitch{\oexplain} This switch causes conflict explanations to be
written to the file \nt{basename}\conflicts. See also \sref{sec:conflicts}. written to the file \nt{basename}\conflicts. See also \sref{sec:conflicts}.
@@ -1553,12 +1549,12 @@ dummy positions.
% --------------------------------------------------------------------------------------------------------------------- % ---------------------------------------------------------------------------------------------------------------------
\section{Error handling and recovery} \section{Error handling}
\label{sec:errors} \label{sec:errors}
\paragraph{Error handling} \paragraph{Error handling}
\menhir's error handling and recovery is inspired by that of \yacc and \menhir's error handling mechanism is inspired by that of \yacc and
\ocamlyacc, but is not identical. A special \error token is made available \ocamlyacc, but is not identical. A special \error token is made available
for use within productions. The LR automaton is constructed exactly as if for use within productions. The LR automaton is constructed exactly as if
\error was a regular terminal symbol. However, \error is never produced \error was a regular terminal symbol. However, \error is never produced
@@ -1577,9 +1573,7 @@ Since the lookahead token is still \error, the automaton remains in error
handling mode. handling mode.
When a state that can shift on \error is found, the \error token is shifted. When a state that can shift on \error is found, the \error token is shifted.
At this point, the parser either enters \emph{error recovery} mode, if the At this point, the parser returns to normal mode.
\oerrorrecovery switch was enabled at compile time, or returns to normal
mode.
When no state that can act on \error is found on the automaton's stack, the When no state that can act on \error is found on the automaton's stack, the
parser stops and raises the exception \texttt{Error}. This exception carries parser stops and raises the exception \texttt{Error}. This exception carries
@@ -1588,24 +1582,10 @@ lexical analyzer's environment record.
\paragraph{Error recovery} \paragraph{Error recovery}
Error recovery mode is entered immediately after an \error token was \ocamlyacc offers an error recovery mode, which is entered immediately after
successfully shifted, and only if \menhir's \oerrorrecovery switch was enabled an \error token was successfully shifted. In this mode, tokens are repeatedly
when the parser was produced. In error recovery mode, tokens are repeatedly
taken off the input stream and discarded until an acceptable token is found. taken off the input stream and discarded until an acceptable token is found.
A token is acceptable if the current state has an action on that token. When This feature is no longer offered by \menhir.
an acceptable token is found, the parser returns to normal mode and the action
takes place. Error recovery is also known as \emph{re-synchronization}.
Error recovery mode is peculiar, in that it can cause non-termination if the
token stream is infinite. In practice, token streams often \emph{are}
infinite, due to an \ocamllex peculiarity: every \ocamllex-generated analyzer
that maps the \kw{eof} pattern to an \basic{EOF} token will produce an
infinite stream of \basic{EOF} tokens, even if the underlying text that is
being scanned is finite. In order to address this issue, \menhir attributes
special meaning to the token named \basic{EOF}, if there is one in the grammar
specification, when \oerrorrecovery is enabled. It checks that every automaton
state that can be reached when in error recovery mode accepts this token, and
issues a warning otherwise. This ensures that the parser always terminates.
\paragraph{Error-related keywords} \paragraph{Error-related keywords}
@@ -1829,10 +1809,10 @@ The list is roughly sorted by decreasing order of importance.
semantic actions are deprecated. The function \verb+parse_error+ is semantic actions are deprecated. The function \verb+parse_error+ is
deprecated. They are replaced with keywords (\sref{sec:errors}). deprecated. They are replaced with keywords (\sref{sec:errors}).
\item \menhir's error handling and error recovery mechanisms (\sref{sec:errors}) are inspired \item \menhir's error handling mechanism (\sref{sec:errors}) is inspired
by \ocamlyacc's, but are not guaranteed to be fully by \ocamlyacc's, but are not guaranteed to be fully
compatible. Error recovery, also known as re-synchronization, is now compatible. Error recovery, also known as re-synchronization, is not
optional. supported by \menhir.
\item The way in which severe conflicts (\sref{sec:conflicts}) are resolved \item The way in which severe conflicts (\sref{sec:conflicts}) are resolved
is not guaranteed to be fully compatible with \ocamlyacc. is not guaranteed to be fully compatible with \ocamlyacc.
......
@@ -60,7 +60,7 @@ stage1:
# Stage 2. # Stage 2.
# Build Menhir using Menhir (from stage 1). # Build Menhir using Menhir (from stage 1).
FLAGS := -v -lg 1 -la 1 -lc 1 --comment --infer --error-recovery --stdlib . --strict --fixed-exception FLAGS := -v -lg 1 -la 1 -lc 1 --comment --infer --stdlib . --strict --fixed-exception
stage2: stage2:
@$(OCAMLBUILD) -build-dir _stage2 -tag fancy_parser \ @$(OCAMLBUILD) -build-dir _stage2 -tag fancy_parser \
......
@@ -115,7 +115,7 @@ open Interface
that branch with a simple [assert false]. TEMPORARY do it *) that branch with a simple [assert false]. TEMPORARY do it *)
(* ------------------------------------------------------------------------ *) (* ------------------------------------------------------------------------ *)
(* Here is a description of our error recovery mechanism. (* Here is a description of our error handling mechanism.
With every state [s], we associate an [error] function. With every state [s], we associate an [error] function.
@@ -131,19 +131,9 @@ open Interface
cells do not physically hold a state, this description is somewhat cells do not physically hold a state, this description is somewhat
simpler than the truth, but that's the idea.) simpler than the truth, but that's the idea.)
When an error is detected in state [s], one of two things happens When an error is detected in state [s], then (see [initiate]) the
(see [initiate]). [error] function associated with [s] is invoked. Immediately
before invoking the [error] function, the
a. If [s] can do error recovery and if no token was successfully
shifted since the last [error] token was shifted, then the
current token is discarded and the current state remains
unchanged, that is, the [action] function associated with [s]
is re-entered.
b. Otherwise, the [error] function associated with [s] is
invoked.
In case (b), immediately before invoking the [error] function, the
counter [env.shifted] is reset to -1. By convention, this means counter [env.shifted] is reset to -1. By convention, this means
that the current token is discarded and replaced with an [error] that the current token is discarded and replaced with an [error]
token. The [error] token transparently inherits the positions token. The [error] token transparently inherits the positions
@@ -176,25 +166,7 @@ open Interface
reduction is unable to handle errors. reduction is unable to handle errors.
I note that a state that can handle [error] and has a default I note that a state that can handle [error] and has a default
reduction must in fact have a reduction action on [error]. reduction must in fact have a reduction action on [error]. *)
A state that can perform error recovery (that is, a state whose
incoming symbol is [error]) never performs a default reduction. The
reason why this is so is given in [Invariant]. A consequence of
this decision is that reduction is not performed until error
recovery is successful. This behavior could be surprising if it
were the default behavior; however, recall that error recovery is
disabled unless [--error-recovery] was specified.
I note that error recovery, case (a) above, can cause the parser to
enter an infinite loop. Indeed, the token stream is in principle
infinite -- for instance, many lexers will return an EOF token
forever after some finite supply of tokens has been exhausted. If
we hit EOF while in error recovery mode, and if EOF is not accepted
at the current state, we will keep discarding EOF and asking for a
new token. The way out of this situation is to design the grammar
in such a way that it cannot happen. We provide a warning to help
with this task. *)
(* The type of environments. *) (* The type of environments. *)
@@ -909,39 +881,10 @@ let call_error_via_errorcase magic s = (* TEMPORARY document *)
let call_assertfalse = let call_assertfalse =
EApp (EVar assertfalse, [ EVar "()" ]) EApp (EVar assertfalse, [ EVar "()" ])
(* ------------------------------------------------------------------------ *)
(* Emit a warning when a state can do error recovery but does not
accept EOF. This can lead to non-termination if the end of file
is reached while attempting to recover from an error. *)
let check_recoverer covered s =
match Terminal.eof with
| None ->
(* We do not know which token represents the end of file,
so we say nothing. *)
()
| Some eof ->
if not (TerminalSet.mem eof covered) then
(* This state has no (shift or reduce) action at EOF. *)
Error.warning []
(Printf.sprintf
"state %d can perform error recovery, but does not accept EOF.\n\
** Hitting the end of file during error recovery will cause non-termination."
(Lr1.number s))
(* ------------------------------------------------------------------------ *) (* ------------------------------------------------------------------------ *)
(* Code production for the automaton functions. *) (* Code production for the automaton functions. *)
(* Count how many states actually perform error recovery. This figure (* Count how many states actually can peek at an error token. This
is, in general, inferior or equal to the number of states at which
[Invariant.recoverer] is true. Indeed, some of these states have a
default reduction, while some will accept every token; in either
case, error recovery is not performed. *)
let recoverers =
ref 0
(* Count how many states actually can peek at an error recovery. This
figure is, in general, inferior or equal to the number of states at figure is, in general, inferior or equal to the number of states at
which [Invariant.errorpeeker] is true, because some of these states which [Invariant.errorpeeker] is true, because some of these states
have a default reduction and will not consult the lookahead have a default reduction and will not consult the lookahead
@@ -1146,15 +1089,9 @@ let errorbookkeeping e =
handle the error token, by a series of reductions followed by a handle the error token, by a series of reductions followed by a
shift. shift.
In the simplest case, the state [s] cannot do error recovery. In We initiate error handling by first performing the standard
that case, we initiate error handling, which is done by first bookkeeping described above, then transferring control to the
performing the standard bookkeeping described above, then [error] function associated with [s].
transferring control to the [error] function associated with [s].
If, on the other hand, [s] can do error recovery, then we check
whether any tokens at all were shifted since the last error
occurred. If none were, then we discard the current token and
transfer control back to the [action] function associated with [s].
The token is discarded via a call to [discard], followed by The token is discarded via a call to [discard], followed by
resetting [env.shifted] to zero, to counter-act the effect of resetting [env.shifted] to zero, to counter-act the effect of
@@ -1164,30 +1101,7 @@ let initiate covered s =
blet ( blet (
[ assertshifted ], [ assertshifted ],
errorbookkeeping (call_error_via_errorcase magic s)
if Invariant.recoverer s then begin
incr recoverers;
check_recoverer covered s;
EIfThenElse (
EApp (EVar "Pervasives.(=)", [ ERecordAccess (EVar env, fshifted); EIntConst 0 ]),
blet (
trace "Discarding last token read (%s)"
[ EApp (EVar print_token, [ ERecordAccess (EVar env, ftoken) ]) ] @
[
PVar token, EApp (EVar discard, [ EVar env ]);
PUnit, ERecordWrite (EVar env, fshifted, EIntConst 0)
],
call_action s
),
errorbookkeeping (call_error_via_errorcase magic s)
)
end
else
errorbookkeeping (call_error_via_errorcase magic s)
) )
(* This produces the definitions of the [run] and [action] functions (* This produces the definitions of the [run] and [action] functions
@@ -1196,11 +1110,9 @@ let initiate covered s =
The [action] function implements the internal case analysis. It The [action] function implements the internal case analysis. It
receives the lookahead token as a parameter. It does not affect the receives the lookahead token as a parameter. It does not affect the
input stream. It does not set up exception handlers for dealing input stream. It does not set up exception handlers for dealing
with errors. The existence of this internal function is made with errors. *)
necessary by the error recovery mechanism (which discards tokens
when attempting to resynchronize after an error). In many states, (* TEMPORARY I believe [action] could now be inlined into [run] *)
recovery can in fact not be performed, so no self-call to [action]
will be generated and [action] will be inlined into [run]. *)
let rec runactiondef s : valdef list = let rec runactiondef s : valdef list =
@@ -1825,10 +1737,8 @@ let program = {
let () = let () =
Error.logC 1 (fun f -> Error.logC 1 (fun f ->
Printf.fprintf f Printf.fprintf f
"%d out of %d states can peek at an error.\n\ "%d out of %d states can peek at an error.\n"
%d out of %d states can do error recovery.\n" !errorpeekers Lr1.n)
!errorpeekers Lr1.n
!recoverers Lr1.n)
let () = let () =
if not !can_die then if not !can_die then
......
@@ -220,14 +220,7 @@ module Make (T : TABLE) = struct
and initiate env : void = and initiate env : void =
assert (env.shifted >= 0); assert (env.shifted >= 0);
if T.recovery && env.shifted = 0 then begin errorbookkeeping env
Log.discarding_last_token (T.token2terminal env.token);
discard env;
env.shifted <- 0;
action env
end
else
errorbookkeeping env
and errorbookkeeping env = and errorbookkeeping env =
Log.initiating_error_handling(); Log.initiating_error_handling();
......
@@ -230,13 +230,6 @@ module type TABLE = sig
val semantic_action: production -> semantic_action val semantic_action: production -> semantic_action
(* The LR engine can attempt error recovery. This consists in discarding
tokens, just after an error has been successfully handled, until a token
that can be successfully handled is found. This mechanism is optional.
The following flag enables it. *)
val recovery: bool
(* The LR engine requires a number of hooks, which are used for logging. *) (* The LR engine requires a number of hooks, which are used for logging. *)
(* The comments below indicate the conventional messages that correspond (* The comments below indicate the conventional messages that correspond
@@ -276,10 +269,6 @@ module type TABLE = sig
val handling_error: state -> unit val handling_error: state -> unit
(* Discarding last token read (<terminal>) *)
val discarding_last_token: terminal -> unit
end end
end end
......
@@ -236,15 +236,7 @@ module Terminal = struct
Misc.mapi (n-1) f Misc.mapi (n-1) f
(* If a token named [EOF] exists, then it is assumed to represent (* If a token named [EOF] exists, then it is assumed to represent
ocamllex's [eof] pattern, which means that the lexer may ocamllex's [eof] pattern. *)
eventually produce an infinite stream of [EOF] tokens. This,
combined with our error recovery mechanism, may lead to
non-termination. We provide a warning against this somewhat
obscure situation.
Relying on the token's name is somewhat fragile, but this saves
introducing an extra keyword for declaring which token represents
[eof], and should not introduce much confusion. *)
let eof = let eof =
try try
......
@@ -123,8 +123,8 @@ module Terminal : sig
(* This is the programmer-defined [EOF] token, if there is one. It (* This is the programmer-defined [EOF] token, if there is one. It
is recognized based solely on its name, which is fragile, but is recognized based solely on its name, which is fragile, but
this behavior is documented. This token is assumed to represent this behavior is documented. This token is assumed to represent
[ocamllex]'s [eof] pattern. It is used only in emitting warnings [ocamllex]'s [eof] pattern. It is used only by the reference
in [--error-recovery] mode. *) interpreter, and in a rather non-essential way. *)
val eof: t option val eof: t option
......
@@ -688,49 +688,6 @@ let universal symbol =
universal && (if represented s then SymbolMap.mem symbol (Lr1.transitions s) else true) universal && (if represented s then SymbolMap.mem symbol (Lr1.transitions s) else true)
) true ) true
(* ------------------------------------------------------------------------ *)
(* Discover which states potentially can do error recovery.
They are the states whose incoming symbol is [error]. At these
states, [env.shifted] is zero, that is, no tokens have been
successfully shifted since the last error token was shifted.
We do not include in this definition the states where [env.shifted]
*may be* zero. That would involve adding in all states reachable
from the above states via reductions. However, error recovery will
never be performed in these states. Indeed, imagine we shift an
error token and enter a state that can do error recovery, according
to the above definition. If, at this point, we consult the
lookahead token [tok] and perform a reduction, then the new state
that we reach is, by construction, able to act upon [tok], so no
error recovery will be performed at that state, even though
[env.shifted] is still zero. However, we must not perform default
reductions at states that can do error recovery, otherwise we break
this reasoning.
If the option [--error-recovery] was not provided on the command
line, then no states will perform error recovery. This makes things
simpler (and saves some code) in the common case where people are
not interested in error recovery. This also disables the warning
about states that can do error recovery but do not accept the EOF
token. *)
let recoverers =
if Settings.recovery then
Lr1.fold (fun recoverers node ->
match Lr1.incoming_symbol node with
| Some (Symbol.T tok)
when Terminal.equal tok Terminal.error ->
Lr1.NodeSet.add node recoverers
| _ ->
recoverers
) Lr1.NodeSet.empty
else
Lr1.NodeSet.empty
let recoverer node =
Lr1.NodeSet.mem node recoverers
(* ------------------------------------------------------------------------ *) (* ------------------------------------------------------------------------ *)
(* Discover which states can peek at an error. These are the states (* Discover which states can peek at an error. These are the states
where [env.shifted] may be -1, that is, where an error token may be where [env.shifted] may be -1, that is, where an error token may be
@@ -782,15 +739,6 @@ let errorpeeker node =
the lookahead token. This saves code, but can alter the parser's the lookahead token. This saves code, but can alter the parser's
behavior in the presence of errors. behavior in the presence of errors.
A state that can perform error recovery (that is, a state whose
incoming symbol is [error]) never performs a default
reduction. This is explained above. Actually, we allow one
exception: if the state has a single (reduction) action on "#", as
explained in the next paragraph, then we perform this default
reduction and do not allow error recovery to take place. Error
recovery would not make much sense, since we believe we are at the
end of file.
The check for default actions subsumes the check for the case where The check for default actions subsumes the check for the case where
[s] admits a reduce action with lookahead symbol "#". In that case, [s] admits a reduce action with lookahead symbol "#". In that case,
it must be the only possible action -- see it must be the only possible action -- see
@@ -836,12 +784,9 @@ let (has_default_reduction : Lr1.node -> (Production.index * TerminalSet.t) opti
| Some (_, toks) as reduction | Some (_, toks) as reduction
when SymbolMap.purelynonterminal (Lr1.transitions s) -> when SymbolMap.purelynonterminal (Lr1.transitions s) ->
if TerminalSet.mem Terminal.sharp toks then if TerminalSet.mem Terminal.sharp toks then
(* Perform default reduction on "#". *) (* Perform default reduction on "#". *)
reduction reduction
else if recoverer s then
(* Do not perform default reduction. Allow error recovery. *)
None
else begin else begin
(* Perform default reduction, unless [--canonical] has been specified. *) (* Perform default reduction, unless [--canonical] has been specified. *)
match Settings.construction_mode with match Settings.construction_mode with
......
@@ -9,9 +9,8 @@
need to physically exist on the stack at runtime) and which symbols need to physically exist on the stack at runtime) and which symbols
need to keep track of (start or end) positions. need to keep track of (start or end) positions.
It also determines which automaton states could potentially perform It also determines which automaton states could have to deal with an
error recovery, and which states could have to deal with an [error] [error] token. *)
token. *)
open Grammar open Grammar
@@ -90,11 +89,6 @@ val endp: Symbol.t -> bool
(* ------------------------------------------------------------------------- *) (* ------------------------------------------------------------------------- *)
(* Information about error handling. *) (* Information about error handling. *)
(* [recoverer s] tells whether state [s] can potentially do error
recovery. *)
val recoverer: Lr1.node -> bool
(* [errorpeeker s] tells whether state [s] can potentially peek at an (* [errorpeeker s] tells whether state [s] can potentially peek at an
error. This is the case if, in state [s], [env.shifted] may be -1, error. This is the case if, in state [s], [env.shifted] may be -1,
that is, if an error token may be on the stream. *) that is, if an error token may be on the stream. *)
......
@@ -164,12 +164,6 @@ module T = struct
next = stack next = stack
} }
(* The reference interpreter performs error recovery if and only if this
is requested via [--recovery]. *)
let recovery =
Settings.recovery
module Log = struct module Log = struct
open Printf open Printf
@@ -227,11 +221,6 @@ module T = struct
fprintf stderr "Handling error in state %d" (Lr1.number s) fprintf stderr "Handling error in state %d" (Lr1.number s)
) )
let discarding_last_token tok =
maybe (fun () ->
fprintf stderr "Discarding last token read (%s)" (Terminal.print tok)
)
end end
end end
......
@@ -163,7 +163,7 @@ let options = Arg.align [
"--coq-no-actions", Arg.Set coq_no_actions, " (undocumented)"; "--coq-no-actions", Arg.Set coq_no_actions, " (undocumented)";