Commit 4b46e509 authored by POTTIER Francois's avatar POTTIER Francois
Browse files

Added Jacques-Henri's documentation for the Coq back-end.

Not yet proof-read.
parent ab4fb4d4
......@@ -130,6 +130,9 @@
% Adding mathstruts to ensure a common baseline.
......@@ -94,6 +94,8 @@ the \obase switch \emph{must} be used.
\docswitch{\ocomment} This switch causes a few comments to be inserted into the
\ocaml code that is written to the \ml file.
\docswitch{\ocoq} This switch causes \menhir to produce Coq code. See \sref{sec:coq}.
\docswitch{\odepend} This switch causes \menhir to generate dependency information
for use in conjunction with \make. When invoked in this mode, \menhir does not
generate a parser. Instead, it examines the grammar specification and prints a
......@@ -1909,6 +1911,101 @@ in the near future.
% ---------------------------------------------------------------------------------------------------------------------
\section{Coq back-end}
Using the Coq proof assistant, \menhir is able to generate formally
verified parsers~\cite{jourdan-leroy-pottier-12}: when used with the
\ocoq option, menhir shall be given a \texttt{.vy}, and will generate
a \texttt{.v} file. This file contains both a description of the
grammar and semantic actions in the Coq language. \texttt{.vy} files
use the same syntax as \texttt{.mly} files. There are however, several
additional restrictions:
\item Error handling is not supported: thus, the use of
\verb+$syntaxerror+ or the \error token is not supported.
\item Location information is not propagated: the input file cannot
contain any \verb+$start*+ or \verb+$end*+ special keywords.
\item \dparameter is not supported.
\item \dinline is not supported.
\item The standard library is not supported.
\item As the type inference algorithm of Coq is rather unpredictable,
every nonterminal symbol has to be annotated with either the \dtype
or the \dstart keyword.
\item If the proof of completeness has not been deactivated via
\ocoqnocomplete, the grammar cannot contains conflicts, even
benign. The reason is that there is no simple formal specification
of how conflict resolution should work.
The generated file contains several modules:
\item Module \verb+Gram+ contains the types of terminals and
non-terminals, the grammar description, and semantic actions.
\item Module \verb+Aut+ contains description of the automaton
generated by \menhir, with a certificate that will be checked in
order to prove the soundness and completeness of the parser.
The type of terminals is a big inductive type, with a constructor for
each terminal, but it does not contains the semantic values: we also
define the type \verb+token+, which is a dependant pair of a terminal
and a semantic value of the corresponding type. Thus, we model the
lexer by an object of type \verb+Stream.Stream token+.
The proof of termination of an $LR(1)$ parser in the case of invalid
input is far from obvious, and we did not find one in the
literature. This question is generally not considered important, and
we decided to leave it not formally proved. This is why we use the
``fuel'' technique: there is an additionnal parameter of type
\verb+nat+ that indicate the maximum number of steps the parser is
allowed to perform. In practice, one can use the usuall trick of
passing the OCaml value defined by \verb+let rec inf = S inf+, after
The parsing can have three different outcomes, represented by the type
Inductive parse_result :=
| Fail_pr: parse_result
| Timeout_pr: parse_result
| Parsed_pr: symbol_semantic_type (NT (start_nt init)) -> Stream token -> parse_result.
\verb+Fail_pr+ means parsing has failed because of a parsing error
(that is, if the completeness has been proved, the input is not
valid). \verb+Timeout_pr+ means the fuel parameter has been
exhausted. This cannot happen if given an infinite
fuel. \verb+Parsed_pr+ corresponds to the success case: it contains
the parsed semantic value and the remaining stream of tokens.
For each entry point \verb+entry_point+ of the grammar of type
\verb+typ+, menhir generates a function \verb+entry_point+ of type
\verb+nat -> Stream token -> parse_result typ+: this is the parsing
function. Two theorems are provided, \verb+entry_point_correct+ and
Parsers generated with the Coq back-end need to be compiled with
libraries: those can be found in the CompCert tree, under the
\verb+cparser/validator+ subdirectory. Additionnally, CompCert can be
used as an example if one wants to use menhir to generate formally
verified parsers in another project.
Option \ocoqnocomplete can be passed to \menhir in order to avoid
generated proofs of completeness: they are incompatible with conflicts
(even benign ones), and, for complex grammars, completeness can use
heavy certificates and its validation can take time.
\ocoqnoactions can be used to test the Coq back-end with a standard
\menhir grammar, written for \ocaml: it just ignores all the semantic
actions, replacing them by the unique inhabitant of the \verb+unit+
% ---------------------------------------------------------------------------------------------------------------------
\section{Comparison with \ocamlyacc}
Here is an incomplete list of the differences between \ocamlyacc and \menhir.
......@@ -2090,8 +2187,13 @@ has GADTs, but, as the saying goes, ``if it ain't broken, don't fix it''.
\menhir's interpreter (\ointerpret) and table-based back-end (\otable) were
implemented by Guillaume Bau, Raja Boujbel, and François Pottier. The project
was generously funded by Jane Street Capital, LLC through the ``OCaml Summer
Project'' initiative. Frédéric Bour provided motivation and an initial
implementation for the incremental API.
Project'' initiative.
Frédéric Bour provided motivation and an initial implementation for the
incremental API.
Jacques-Henri Jourdan designed and implemented the Coq back-end and did the
Coq proofs that come with it.
% ---------------------------------------------------------------------------------------------------------------------
% Bibliography.
......@@ -158,9 +158,9 @@ let options = Arg.align [
"--base", Arg.Set_string base, "<basename> Specifies a base name for the output file(s)";
"--canonical", Arg.Unit (fun () -> construction_mode := ModeCanonical), " Construct a canonical Knuth LR(1) automaton";
"--comment", Arg.Set comment, " Include comments in the generated code";
"--coq", Arg.Set coq, " (undocumented)";
"--coq-no-complete", Arg.Set coq_no_complete, " (undocumented)";
"--coq-no-actions", Arg.Set coq_no_actions, " (undocumented)";
"--coq", Arg.Set coq, " Generate a formally verified parser, in Coq";
"--coq-no-complete", Arg.Set coq_no_complete, " Do not generate a proof of completeness";
"--coq-no-actions", Arg.Set coq_no_actions, " Ignore semantic actions in the Coq output";
"--depend", Arg.Unit (fun () -> depend := OMPostprocess), " Invoke ocamldep and display dependencies";
"--dump", Arg.Set dump, " Describe the automaton in <basename>.automaton";
"--error-recovery", Arg.Set recovery, " (no longer supported)";
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment