manual.tex 204 KB
 fpottier committed Mar 01, 2013 1 2 \def\true{true} \let\fpacm\true  POTTIER Francois committed Jan 05, 2017 3 \documentclass[onecolumn,11pt,nocopyrightspace,preprint]{sigplanconf}  fpottier committed Mar 01, 2013 4 \usepackage{amstext}  fpottier committed Mar 02, 2013 5 \usepackage[T1]{fontenc}  POTTIER Francois committed Jan 06, 2017 6 \usepackage[utf8]{inputenc}  POTTIER Francois committed Dec 29, 2014 7 \usepackage{moreverb}  fpottier committed Mar 02, 2013 8 \usepackage{tikz}  fpottier committed Mar 01, 2013 9 10 \usepackage{xspace} \usepackage{mymacros}  fpottier committed Mar 02, 2013 11 \def\fppdf{true}  fpottier committed Mar 01, 2013 12 13 14 \usepackage{fppdf} \input{macros} \input{version}  POTTIER Francois committed Jan 05, 2017 15 16 17 18 % Let Menhir's version number appear at the bottom right of every page. \makeatletter \def\@formatyear{\menhirversion} \makeatother  fpottier committed Mar 01, 2013 19   Gabriel Scherer committed May 19, 2018 20 21 22 23 24 25 26 27 28 29 30 % Hevea-specific logic; see % http://hevea.inria.fr/doc/manual007.html % http://hevea.inria.fr/doc/manual008.html \usepackage{hevea} \newenvironment{heveapicture}{ \begin{toimage} }{ \end{toimage} \imageflush{} }  POTTIER Francois committed Apr 06, 2017 31 32 % ------------------------------------------------------------------------------  fpottier committed Mar 01, 2013 33 34 35 36 37 38 % Headings. \title{\menhir Reference Manual\\\normalsize (version \menhirversion)} \begin{document}  POTTIER Francois committed Jan 06, 2017 39 \authorinfo{François Pottier\and Yann Régis-Gianas}  fpottier committed Mar 01, 2013 40 41 42 43 44  {INRIA} {\{Francois.Pottier, Yann.Regis-Gianas\}@inria.fr} \maketitle  POTTIER Francois committed Apr 06, 2017 45 % ------------------------------------------------------------------------------  fpottier committed Mar 01, 2013 46 47 48 49 50  \clearpage \tableofcontents \clearpage  POTTIER Francois committed Apr 06, 2017 51 % ------------------------------------------------------------------------------  fpottier committed Mar 01, 2013 52 53 54 55 56  \section{Foreword} \menhir is a parser generator. It turns high-level grammar specifications, decorated with semantic actions expressed in the \ocaml programming  POTTIER Francois committed Jul 12, 2017 57 language~\cite{ocaml}, into parsers, again expressed in \ocaml. It is  fpottier committed Mar 01, 2013 58 59 based on Knuth's LR(1) parser construction technique~\cite{knuth-lr-65}. It is strongly inspired by its precursors: \yacc~\cite{johnson-yacc-79},  POTTIER Francois committed Jul 12, 2017 60 \texttt{ML-Yacc}~\cite{tarditi-appel-00}, and \ocamlyacc~\cite{ocaml},  fpottier committed Mar 01, 2013 61 62 63 64 65 66 67 68 69 70 but offers a large number of minor and major improvements that make it a more modern tool. This brief reference manual explains how to use \menhir. It does not attempt to explain context-free grammars, parsing, or the LR technique. Readers who have never used a parser generator are encouraged to read about these ideas first~\cite{aho-86,appel-tiger-98,hopcroft-motwani-ullman-00}. They are also invited to have a look at the \distrib{demos} directory in \menhir's distribution.  POTTIER Francois committed Oct 27, 2015 71 Potential users of \menhir should be warned that \menhir's feature set is not  POTTIER Francois committed Dec 02, 2014 72 73 74 75 76 77 78 79 80 81 completely stable. There is a tension between preserving a measure of compatibility with \ocamlyacc, on the one hand, and introducing new ideas, on the other hand. Some aspects of the tool, such as the error handling mechanism, are still potentially subject to incompatible changes: for instance, in the future, the current error handling mechanism (which is based on the \error token, see \sref{sec:errors}) could be removed and replaced with an entirely different mechanism. There is room for improvement in the tool and in this reference manual. Bug reports and suggestions are welcome!  fpottier committed Mar 01, 2013 82   POTTIER Francois committed Apr 06, 2017 83 % ------------------------------------------------------------------------------  fpottier committed Mar 01, 2013 84 85 86 87 88 89 90  \section{Usage} \menhir is invoked as follows: \begin{quote} \cmenhir \nt{option} \ldots \nt{option} \nt{filename} \ldots \nt{filename} \end{quote}  POTTIER Francois committed Apr 06, 2017 91 92 Each of the file names must end with \mly (unless \ocoq is used, in which case it must end with \vy) and denotes a partial  fpottier committed Mar 01, 2013 93 94 grammar specification. These partial grammar specifications are joined (\sref{sec:split}) to form a single, self-contained grammar specification,  POTTIER Francois committed Apr 06, 2017 95 which is then processed. The following optional command line switches allow  fpottier committed Mar 01, 2013 96 97 98 99 100 101 102 103 controlling many aspects of the process. \docswitch{\obase \nt{basename}} This switch controls the base name of the \ml and \mli files that are produced. That is, the tool will produce files named \nt{basename}\texttt{.ml} and \nt{basename}\texttt{.mli}. Note that \nt{basename} can contain occurrences of the \texttt{/} character, so it really specifies a path and a base name. When only one \nt{filename} is provided on the command line, the default \nt{basename} is obtained by  POTTIER Francois committed Apr 06, 2017 104 depriving \nt{filename} of its final \mly suffix. When multiple file  fpottier committed Mar 01, 2013 105 106 107 names are provided on the command line, no default base name exists, so that the \obase switch \emph{must} be used.  POTTIER Francois committed May 22, 2018 108 \docswitch{\ocmly} This switch causes \menhir to produce a \cmly file in  POTTIER Francois committed Apr 06, 2017 109 110 111 addition to its normal operation. This file contains a (binary-form) representation of the grammar and automaton (see \sref{sec:sdk}).  fpottier committed Mar 01, 2013 112 113 114 \docswitch{\ocomment} This switch causes a few comments to be inserted into the \ocaml code that is written to the \ml file.  POTTIER Francois committed Oct 27, 2015 115 116 117 118 119 120 \docswitch{\ocompareerrors \nt{filename1} \ocompareerrors \nt{filename2}} Two such switches must always be used in conjunction so as to specify the names of two \messages files, \nt{filename1} and \nt{filename2}. Each file is read and internally translated to a mapping of states to messages. \menhir then checks that the left-hand mapping is a subset of the right-hand mapping. This feature is typically used in conjunction with \olisterrors to check that \nt{filename2}  POTTIER Francois committed Oct 27, 2015 121 is complete (that is, covers all states where an error can occur).  POTTIER Francois committed Oct 27, 2015 122 123 124 125 For more information, see \sref{sec:errors:new}. \docswitch{\ocompileerrors \nt{filename}} This switch causes \menhir to read the file \nt{filename}, which must obey the \messages file format, and to compile  POTTIER Francois committed May 22, 2018 126 it to an \ocaml function that maps a state number to a message. The \ocaml code  POTTIER Francois committed Oct 27, 2015 127 128 129 130 is sent to the standard output channel. At the same time, \menhir checks that the collection of input sentences in the file \nt{filename} is correct and irredundant. For more information, see \sref{sec:errors:new}.  POTTIER Francois committed Dec 18, 2014 131 132 \docswitch{\ocoq} This switch causes \menhir to produce Coq code. See \sref{sec:coq}.  POTTIER Francois committed May 30, 2018 133 134 135 136 137 138 139 140 141 \docswitch{\ocoqlibpath \nt{path}} This switch allows specifying under what name (or path) the Coq support library MenhirLib is known to Coq. When \menhir runs in \ocoq mode, the generated parser contains references to several modules in this library. This path is used to qualify these references. Its default value is \texttt{MenhirLib}. \docswitch{\ocoqlibnopath} This switch indicates that references to the Coq library MenhirLib should \emph{not} be qualified. This was the default behavior of \menhir prior to 2018/05/30. This switch is provided for compatibility, but normally should not be used.  POTTIER Francois committed Dec 22, 2014 142 \docswitch{\ocoqnoactions} (Used in conjunction with \ocoq.) This switch  POTTIER Francois committed Apr 06, 2017 143 causes the semantic actions present in the \vy file to be ignored and  POTTIER Francois committed Dec 22, 2014 144 replaced with \verb+tt+, the unique inhabitant of Coq's \verb+unit+ type. This  POTTIER Francois committed Oct 27, 2015 145 feature can be used to test the Coq back-end with a standard grammar, that is, a  POTTIER Francois committed Dec 22, 2014 146 grammar that contains \ocaml semantic actions. Just rename the file from  POTTIER Francois committed Apr 06, 2017 147 \mly to \vy and set this switch.  POTTIER Francois committed Dec 22, 2014 148 149 150 151 152 153 154 155 156  \docswitch{\ocoqnocomplete} (Used in conjunction with \ocoq.) This switch disables the generation of the proof of completeness of the parser (\sref{sec:coq}). This can be necessary because the proof of completeness is possible only if the grammar has no conflict (not even a benign one, in the sense of \sref{sec:conflicts:benign}). This can be desirable also because, for a complex grammar, completeness may require a heavy certificate and its validation by Coq may take time.  POTTIER Francois committed May 22, 2018 157 \docswitch{\odepend} See \sref{sec:build}.  fpottier committed Mar 01, 2013 158 159 160 161  \docswitch{\odump} This switch causes a description of the automaton to be written to the file \nt{basename}\automaton.  POTTIER Francois committed Jun 05, 2016 162 \docswitch{\oechoerrors \nt{filename}} This switch causes \menhir to  POTTIER Francois committed Oct 27, 2015 163 164 165 166 read the \messages file \nt{filename} and to produce on the standard output channel just the input sentences. (That is, all messages, blank lines, and comments are filtered out.) For more information, see \sref{sec:errors:new}.  fpottier committed Mar 01, 2013 167 168 169 170 171 172 173 174 175 176 177 \docswitch{\oexplain} This switch causes conflict explanations to be written to the file \nt{basename}\conflicts. See also \sref{sec:conflicts}. \docswitch{\oexternaltokens \nt{T}} This switch causes the definition of the \token type to be omitted in \nt{basename}\texttt{.ml} and \nt{basename}\texttt{.mli}. Instead, the generated parser relies on the type $T$\texttt{.}\token, where $T$ is an \ocaml module name. It is up to the user to define module $T$ and to make sure that it exports a suitable \token type. Module $T$ can be hand-written. It can also be automatically generated out of a grammar specification using the \oonlytokens switch.  fpottier committed Mar 02, 2013 178 \docswitch{\ofixedexc} This switch causes the exception \texttt{Error} to be  fpottier committed Mar 02, 2013 179 180 internally defined as a synonym for \texttt{Parsing.Parse\_error}. This means that an exception handler that catches \texttt{Parsing.Parse\_error} will also  POTTIER Francois committed Oct 27, 2015 181 catch the generated parser's \texttt{Error}. This helps increase \menhir's  fpottier committed Mar 02, 2013 182 compatibility with \ocamlyacc. There is otherwise no reason to use this switch.  fpottier committed Mar 02, 2013 183   fpottier committed Mar 01, 2013 184 185 186 187 188 189 \docswitch{\ograph} This switch causes a description of the grammar's dependency graph to be written to the file \nt{basename}\dott. The graph's vertices are the grammar's nonterminal symbols. There is a directed edge from vertex $A$ to vertex $B$ if the definition of $A$ refers to $B$. The file is in a format that is suitable for processing by the \emph{graphviz} toolkit.  POTTIER Francois committed May 22, 2018 190 \docswitch{\oinfer, \oinferwrite, \oinferread} See \sref{sec:build}.  fpottier committed Mar 01, 2013 191   POTTIER Francois committed Aug 27, 2015 192 193 194 195 196 197 \docswitch{\oinspection} This switch requires \otable. It causes \menhir to generate not only the monolithic and incremental APIs (\sref{sec:monolithic}, \sref{sec:incremental}), but also the inspection API (\sref{sec:inspection}). Activating this switch causes a few more tables to be produced, resulting in somewhat larger code size.  fpottier committed Mar 01, 2013 198 199 200 \docswitch{\ointerpret} This switch causes \menhir to act as an interpreter, rather than as a compiler. No \ocaml code is generated. Instead, \menhir reads sentences off the standard input channel, parses them, and displays  POTTIER Francois committed Oct 27, 2015 201 202 203 204 205 206 207 outcomes. This switch can be usefully combined with \otrace. For more information, see \sref{sec:interpret}. \docswitch{\ointerpreterror} This switch is analogous to \ointerpret, except \menhir expects every sentence to cause an error on its last token, and displays information about the state in which the error is detected, in the \messages file format. For more information, see \sref{sec:errors:new}.  fpottier committed Mar 01, 2013 208 209 210 211 212  \docswitch{\ointerpretshowcst} This switch, used in conjunction with \ointerpret, causes \menhir to display a concrete syntax tree when a sentence is successfully parsed. For more information, see \sref{sec:interpret}.  POTTIER Francois committed Oct 27, 2015 213 214 215 216 \docswitch{\olisterrors} This switch causes \menhir to produce (on the standard output channel) a complete list of input sentences that cause an error, in the \messages file format. For more information, see \sref{sec:errors:new}.  fpottier committed Mar 01, 2013 217 218 219 220 221 222 223 224 225 226 227 228 229 \docswitch{\ologautomaton \nt{level}} When \nt{level} is nonzero, this switch causes some information about the automaton to be logged to the standard error channel. \docswitch{\ologcode \nt{level}} When \nt{level} is nonzero, this switch causes some information about the generated \ocaml code to be logged to the standard error channel. \docswitch{\ologgrammar \nt{level}} When \nt{level} is nonzero, this switch causes some information about the grammar to be logged to the standard error channel. When \nt{level} is 2, the \emph{nullable}, \emph{FIRST}, and \emph{FOLLOW} tables are displayed.  POTTIER Francois committed Oct 24, 2018 230 231 232 \docswitch{\onodollars} This switch disallows the use of positional keywords of the form \kw{\$i}.  fpottier committed Mar 01, 2013 233 234 235 236 \docswitch{\onoinline} This switch causes all \dinline keywords in the grammar specification to be ignored. This is especially useful in order to understand whether these keywords help solve any conflicts.  POTTIER Francois committed May 22, 2018 237 \docswitch{\onostdlib} This switch instructs \menhir to \emph{not} use  POTTIER Francois committed Jan 01, 2017 238 its standard library (\sref{sec:library}).  fpottier committed Mar 01, 2013 239   POTTIER Francois committed May 22, 2018 240 \docswitch{\oocamlc \nt{command}} See \sref{sec:build}.  fpottier committed Mar 01, 2013 241   POTTIER Francois committed May 22, 2018 242 \docswitch{\oocamldep \nt{command}} See \sref{sec:build}.  fpottier committed Mar 01, 2013 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259  \docswitch{\oonlypreprocess} This switch causes the grammar specifications to be transformed up to the point where the automaton's construction can begin. The grammar specifications whose names are provided on the command line are joined (\sref{sec:split}); all parameterized nonterminal symbols are expanded away (\sref{sec:templates}); type inference is performed, if \oinfer is enabled; all nonterminal symbols marked \dinline are expanded away (\sref{sec:inline}). This yields a single, monolithic grammar specification, which is printed on the standard output channel. \docswitch{\oonlytokens} This switch causes the \dtoken declarations in the grammar specification to be translated into a definition of the \token type, which is written to the files \nt{basename}\texttt{.ml} and \nt{basename}\texttt{.mli}. No code is generated. This is useful when a single set of tokens is to be shared between several parsers. The directory \distrib{demos/calc-two} contains a demo that illustrates the use of this switch.  POTTIER Francois committed May 22, 2018 260 \docswitch{\orawdepend} See \sref{sec:build}.  fpottier committed Mar 01, 2013 261   Nicolás Ojeda Bär committed Sep 24, 2019 262 \docswitch{\ostdlib \nt{directory}} This switch exists only for  POTTIER Francois committed Sep 24, 2019 263 backwards compatibility and is ignored. It may be removed in the future.  POTTIER Francois committed Jan 01, 2017 264   fpottier committed Mar 01, 2013 265 266 267 268 269 270 271 \docswitch{\ostrict} This switch causes several warnings about the grammar and about the automaton to be considered errors. This includes warnings about useless precedence declarations, non-terminal symbols that produce the empty language, unreachable non-terminal symbols, productions that are never reduced, conflicts that are not resolved by precedence declarations, and end-of-stream conflicts.  POTTIER Francois committed May 22, 2018 272 \docswitch{\oo{suggest-*}} See \sref{sec:build}.  POTTIER Francois committed Dec 30, 2015 273   fpottier committed Mar 01, 2013 274 275 \docswitch{\otable} This switch causes \menhir to use its table-based back-end, as opposed to its (default) code-based back-end. When \otable is  POTTIER Francois committed Dec 13, 2014 276 277 used, \menhir produces significantly more compact and somewhat slower parsers. See \sref{sec:qa} for a speed comparison.  fpottier committed Mar 01, 2013 278 279 280 281 282 283 284 285 286 287 288 289  The table-based back-end produces rather compact tables, which are analogous to those produced by \yacc, \bison, or \ocamlyacc. These tables are not quite stand-alone: they are exploited by an interpreter, which is shipped as part of the support library \menhirlib. For this reason, when \otable is used, \menhirlib must be made visible to the \ocaml compilers, and must be linked into your executable program. The \texttt{--suggest-*} switches, described above, help do this. The code-based back-end compiles the LR automaton directly into a nest of mutually recursive \ocaml functions. In that case, \menhirlib is not required.  POTTIER Francois committed Aug 27, 2015 290 291 292 The incremental API (\sref{sec:incremental}) and the inspection API (\sref{sec:inspection}) are made available only by the table-based back-end.  fpottier committed Mar 01, 2013 293 294 295 296 297 298 299 300 301 \docswitch{\otimings} This switch causes internal timing information to be sent to the standard error channel. \docswitch{\otrace} This switch causes tracing code to be inserted into the generated parser, so that, when the parser is run, its actions are logged to the standard error channel. This is analogous to \texttt{ocamlrun}'s \texttt{p=1} parameter, except this switch must be enabled at compile time: one cannot selectively enable or disable tracing at runtime.  POTTIER Francois committed Dec 22, 2017 302 303 304 305 306 307 308 309 310 311 \docswitch{\oignoreprec} This switch suppresses all warnings about useless \dleft, \dright, \dnonassoc and \dprec declarations. \docswitch{\oignoreone \nt{symbol}} This switch suppresses the warning that is normally emitted when \menhir finds that the terminal symbol \nt{symbol} is unused. \docswitch{\oignoreall} This switch suppresses all of the warnings that are normally emitted when \menhir finds that some terminal symbols are unused.  POTTIER Francois committed Jun 05, 2016 312 \docswitch{\oupdateerrors \nt{filename}} This switch causes \menhir to  POTTIER Francois committed Oct 27, 2015 313 314 315 316 317 read the \messages file \nt{filename} and to produce on the standard output channel a new \messages file that is identical, except the auto-generated comments have been re-generated. For more information, see \sref{sec:errors:new}.  fpottier committed Mar 01, 2013 318 319 320 \docswitch{\oversion} This switch causes \menhir to print its own version number and exit.  POTTIER Francois committed Apr 06, 2017 321 % ------------------------------------------------------------------------------  fpottier committed Mar 01, 2013 322 323 324  \section{Lexical conventions}  POTTIER Francois committed Nov 12, 2018 325 326 327 328 329 330 331 332 333 334 335 336 337 338 A semicolon character (\kw{;}) \emph{may} appear after a declaration (\sref{sec:decls}). An old-style rule (\sref{sec:old:rules}) \emph{may} be terminated with a semicolon. Also, within an old-style rule, each producer (\sref{sec:producers}) \emph{may} be terminated with a semicolon. A new-style rule (\sref{sec:new:rules}) \emph{must not} be terminated with a semicolon. Within such a rule, the elements of a sequence \emph{must} be separated with semicolons. Semicolons are not allowed to appear anywhere except in the places mentioned above. This is in contrast with \ocamlyacc, which views semicolons as insignificant, just like whitespace.  fpottier committed Mar 01, 2013 339 340 341 342 343 344  Identifiers (\nt{id}) coincide with \ocaml identifiers, except they are not allowed to contain the quote (\kw{'}) character. Following \ocaml, identifiers that begin with a lowercase letter (\nt{lid}) or with an uppercase letter (\nt{uid}) are distinguished.  POTTIER Francois committed Oct 26, 2018 345 346 347 348 A quoted identifier \nt{qid} is a string enclosed in double quotes. Such a string cannot contain a double quote or a backslash. Quoted identifiers are used as token aliases (\sref{sec:tokens}).  fpottier committed Mar 01, 2013 349 350 351 352 353 354 355 356 357 Comments are C-style (surrounded with \kw{/*} and \kw{*/}, cannot be nested), C++-style (announced by \kw{/$\!$/} and extending until the end of the line), or \ocaml-style (surrounded with \kw{(*} and \kw{*)}, can be nested). Of course, inside \ocaml code, only \ocaml-style comments are allowed. \ocaml type expressions are surrounded with \kangle{and}. Within such expressions, all references to type constructors (other than the built-in \textit{list}, \textit{option}, etc.) must be fully qualified.  POTTIER Francois committed Apr 06, 2017 358 % ------------------------------------------------------------------------------  fpottier committed Mar 01, 2013 359 360 361  \section{Syntax of grammar specifications}  POTTIER Francois committed Nov 12, 2018 362 \newcommand{\modifier}{(\,\dquestion \metachoice \dplus \metachoice \dstar\hspace{-.3mm})}  POTTIER Francois committed Nov 12, 2018 363   fpottier committed Mar 01, 2013 364 365 366 367 368 369 370 371 \begin{figure} \begin{center} \begin{tabular}{r@{}c@{}l} \nt{specification} \is \sepspacelist{\nt{declaration}} \percentpercent \sepspacelist{\nt{rule}}  POTTIER Francois committed May 22, 2018 372  \optional{\percentpercent \textit{\ocaml code}} \\  fpottier committed Mar 01, 2013 373 374  \nt{declaration} \is  POTTIER Francois committed May 22, 2018 375  \dheader{\textit{\ocaml code}} \\  fpottier committed Mar 01, 2013 376 && \dparameter \ocamlparam \\  POTTIER Francois committed Oct 26, 2018 377 && \dtoken \optional{\ocamltype} \sepspacelist{\nt{uid} \optional{\nt{qid}}} \\  fpottier committed Mar 01, 2013 378 379 380 381 382 && \dnonassoc \sepspacelist{\nt{uid}} \\ && \dleft \sepspacelist{\nt{uid}} \\ && \dright \sepspacelist{\nt{uid}} \\ && \dtype \ocamltype \sepspacelist{\nt{lid}} \\ && \dstart \optional{\ocamltype} \sepspacelist{\nt{lid}} \\  POTTIER Francois committed Oct 25, 2018 383 384 && \dattribute \sepspacelist{\nt{actual}} \sepspacelist{\nt{attribute}} \\ && \kw{\%} \nt{attribute} \\ % a grammar-wide attribute  POTTIER Francois committed Oct 27, 2015 385 && \donerrorreduce \sepspacelist{\nt{lid}} \\  POTTIER Francois committed Nov 12, 2018 386 387 388 \nt{attribute} \is \kw{[@} \nt{name} \nt{payload} \kw{]} \\[4mm]  fpottier committed Mar 01, 2013 389   POTTIER Francois committed Nov 12, 2018 390 \emph{old syntax} ---  fpottier committed Mar 01, 2013 391 392 393 \nt{rule} \is \optional{\dpublic} \optional{\dinline} \nt{lid}  POTTIER Francois committed Nov 12, 2018 394  \oparams{\nt{id}}  fpottier committed Mar 01, 2013 395  \deuxpoints  POTTIER Francois committed Nov 12, 2018 396  \precseplist\barre{\nt{group}} \\  fpottier committed Mar 01, 2013 397 398 399 400 401 402 403 404 405 406 407 408 409  \nt{group} \is \seplist{\ \barre}{\nt{production}} \daction \optional {\dprec \nt{id}} \\ \nt{production} \is \sepspacelist{\nt{producer}} \optional {\dprec \nt{id}} \\ \nt{producer} \is \optional{\nt{lid} \dequal} \nt{actual} \\ \nt{actual} \is  POTTIER Francois committed Nov 12, 2018 410 411  \nt{id} \oparams{\nt{actual}} \\ && \nt{actual} \modifier \\  POTTIER Francois committed Feb 15, 2015 412 && \seplist{\ \barre}{\nt{group}} % not really allowed everywhere  POTTIER Francois committed Nov 12, 2018 413 \\[4mm]  fpottier committed Mar 01, 2013 414   POTTIER Francois committed Nov 12, 2018 415 416 417 418 419 420 \emph{new syntax} --- \nt{rule} \is \optional{\dpublic} \dlet \nt{lid} \oparams{\nt{id}}  POTTIER Francois committed Nov 12, 2018 421  (\,\dcolonequal \metachoice \dequalequal\hspace{-.2mm})  POTTIER Francois committed Nov 12, 2018 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458  \expression \\ \expression \is % a choice between sequence expressions: \precseplist\barre\expression \\&& % a sequence expression: \optional{\pattern \dequal{}} \expression \dsemi \expression \\&& % a symbol expression: \nt{id} \oparams{\expression} \\&& % a symbol expression: \expression \modifier \\&& % an action expression: \daction \optional {\dprec \nt{id}} % %prec is in fact allowed to appear before the semantic action, % but this is not documented. \\&& % an action expression: \dpfaction{\nt{\ocaml id}} \optional {\dprec \nt{id}} \\ \pattern \is \nt{lid} \,\metachoice\, \dunderscore \,\metachoice\, \dtilde \,\metachoice\, \tuple\pattern % The places where attributes can be attached are not shown in this % figure. This is intentional; let's avoid pollution. Attributes are % described separately.  fpottier committed Mar 01, 2013 459 460 461 462 463 464 \end{tabular} \end{center} \caption{Syntax of grammar specifications} \label{fig:syntax} \end{figure}  POTTIER Francois committed Nov 12, 2018 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 The syntax of grammar specifications appears in \fref{fig:syntax}. The places where attributes can be attached are not shown; they are documented separately (\sref{sec:attributes}). % % (For compatibility with \ocamlyacc, some specifications that do not fully % adhere to this syntax are also accepted.) % A grammar specification begins with a sequence of declarations (\sref{sec:decls}), ended by a mandatory \percentpercent keyword. % Following this keyword, a sequence of rules is expected. Each rule defines a nonterminal symbol~\nt{lid}, whose name must begin with a lowercase letter. % % In reality, in the old syntax, this is enforced only for start symbols. % In the new syntax, this is enforced for all symbols. % % A rule can also *extend* a symbol, but let's not mention that here. % A rule is expressed either in the old syntax'' (\sref{sec:old:rules}) or in the new syntax'' (\sref{sec:new:rules}), which is slightly more elegant and powerful.  fpottier committed Mar 01, 2013 487 488  \subsection{Declarations}  POTTIER Francois committed Apr 06, 2017 489 \label{sec:decls}  fpottier committed Mar 01, 2013 490 491  \subsubsection{Headers}  POTTIER Francois committed Nov 12, 2017 492 \label{sec:decls:headers}  fpottier committed Mar 01, 2013 493 494 495 496 497 498 499 500 501 502  A header is a piece of \ocaml code, surrounded with \dheader{and}. It is copied verbatim at the beginning of the \ml file. It typically contains \ocaml \kw{open} directives and function definitions for use by the semantic actions. If a single grammar specification file contains multiple headers, their order is preserved. However, when two headers originate in distinct grammar specification files, the order in which they are copied to the \ml file is unspecified. \subsubsection{Parameters}  POTTIER Francois committed Dec 19, 2014 503 \label{sec:parameter}  fpottier committed Mar 01, 2013 504 505 506 507 508 509  A declaration of the form: \begin{quote} \dparameter \ocamlparam \end{quote} causes the entire parser to become parameterized over the \ocaml module  POTTIER Francois committed Jan 14, 2015 510 511 512 513 \nt{uid}, that is, to become an \ocaml functor. The directory \distrib{demos/calc-param} contains a demo that illustrates the use of this switch. If a single specification file  fpottier committed Mar 01, 2013 514 515 516 517 518 519 520 521 522 523 524 525 526 contains multiple \dparameter declarations, their order is preserved, so that the module name \nt{uid} introduced by one declaration is effectively in scope in the declarations that follow. When two \dparameter declarations originate in distinct grammar specification files, the order in which they are processed is unspecified. Last, \dparameter declarations take effect before \dheader{$\ldots$}, \dtoken, \dtype, or \dstart declarations are considered, so that the module name \nt{uid} introduced by a \dparameter declaration is effectively in scope in \emph{all} \dheader{$\ldots$}, \dtoken, \dtype, or \dstart declarations, regardless of whether they precede or follow the \dparameter declaration. This means, in particular, that the side effects of an \ocaml header are observed only when the functor is applied, not when it is defined. \subsubsection{Tokens}  POTTIER Francois committed Oct 26, 2018 527 \label{sec:tokens}  fpottier committed Mar 01, 2013 528 529 530  A declaration of the form: \begin{quote}  POTTIER Francois committed Oct 26, 2018 531 532 533 534 \dtoken \optional{\ocamltype}$\nt{uid}_1$\optional{$\nt{qid}_1$}$\;\ldots\;\nt{uid}_n$\optional{$\nt{qid}_n$}  fpottier committed Mar 01, 2013 535 536 537 \end{quote} defines the identifiers$\nt{uid}_1, \ldots, \nt{uid}_n$as tokens, that is, as terminal symbols in the grammar specification and as data constructors in  POTTIER Francois committed Oct 26, 2018 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 the \textit{token} type. If an \ocaml type$t$is present, then these tokens are considered to carry a semantic value of type$t$, otherwise they are considered to carry no semantic value. If a quoted identifier$\nt{qid}_i$is present, then it is considered an alias for the terminal symbol$\nt{uid}_i$. (This feature, known as token aliases'', is borrowed from Bison.) % https://www.gnu.org/software/bison/manual/html_node/Token-Decl.html#Token-Decl Throughout the grammar, the quoted identifier$\nt{qid}_i$is then synonymous with the identifier$\nt{uid}_i$. % For example, if one declares: \begin{verbatim} %token PLUS "+" \end{verbatim} then the quoted identifier \texttt{"+"} stands for the terminal symbol \texttt{PLUS} throughout the grammar. An example of the use of token aliases appears in the directory \distrib{demos/calc-alias}. % Token aliases can be used to improve the readability of a grammar. One must keep in mind, however, that they are just syntactic sugar: they are not interpreted in any way by Menhir or conveyed to tools like \ocamllex. % They could be considered confusing by a reader who mistakenly believes that they are interpreted as string literals.  fpottier committed Mar 01, 2013 565 566 567 568 569 570 571 572 573 574  \subsubsection{Priority and associativity} \label{sec:assoc} A declaration of one of the following forms: \begin{quote} \dnonassoc$\nt{uid}_1 \ldots \nt{uid}_n$\\ \dleft$\nt{uid}_1 \ldots \nt{uid}_n$\\ \dright$\nt{uid}_1 \ldots \nt{uid}_n$\end{quote}  POTTIER Francois committed Apr 06, 2017 575 assigns both a \emph{priority level} and an \emph{associativity status} to  fpottier committed Mar 01, 2013 576 577 578 579 580 581 582 583 584 585 586 the symbols$\nt{uid}_1, \ldots, \nt{uid}_n$. The priority level assigned to$\nt{uid}_1, \ldots, \nt{uid}_n$is not defined explicitly: instead, it is defined to be higher than the priority level assigned by the previous \dnonassoc, \dleft, or \dright declaration, and lower than that assigned by the next \dnonassoc, \dleft, or \dright declaration. The symbols$\nt{uid}_1, \ldots, \nt{uid}_n$can be tokens (defined elsewhere by a \dtoken declaration) or dummies (not defined anywhere). Both can be referred to as part of \dprec annotations. Associativity status and priority levels allow shift/reduce conflicts to be silently resolved (\sref{sec:conflicts}). \subsubsection{Types}  POTTIER Francois committed Dec 19, 2014 587 \label{sec:type}  fpottier committed Mar 01, 2013 588 589 590 591 592 593 594 595 596  A declaration of the form: \begin{quote} \dtype \ocamltype$\nt{lid}_1 \ldots \nt{lid}_n$\end{quote} assigns an \ocaml type to each of the nonterminal symbols$\nt{lid}_1, \ldots, \nt{lid}_n$. For start symbols, providing an \ocaml type is mandatory, but is usually done as part of the \dstart declaration. For other symbols, it is optional. Providing type information can improve the quality of \ocaml's type error messages.  POTTIER Francois committed Oct 27, 2015 597 598 599 600 601  A \dtype declaration may concern not only a nonterminal symbol, such as, say, \texttt{expression}, but also a fully applied parameterized nonterminal symbol, such as \texttt{list(expression)} or \texttt{separated\_list(COMMA, option(expression))}.  fpottier committed Mar 01, 2013 602   POTTIER Francois committed Nov 12, 2017 603 604 605 606 607 608 609 610 The types provided as part of \dtype declarations are copied verbatim to the \ml and \mli files. In contrast, headers (\sref{sec:decls:headers}) are copied to the \ml file only. For this reason, the types provided as part of \dtype declarations must make sense both in the presence and in the absence of these headers. They should typically be fully qualified types. % TEMPORARY type information can be mandatory in --coq mode; document?  fpottier committed Mar 01, 2013 611 \subsubsection{Start symbols}  POTTIER Francois committed Dec 19, 2014 612 \label{sec:start}  fpottier committed Mar 01, 2013 613 614 615 616 617 618 619 620 621 622 623 624  A declaration of the form: \begin{quote} \dstart \optional{\ocamltype}$\nt{lid}_1 \ldots \nt{lid}_n$\end{quote} declares the nonterminal symbols$\nt{lid}_1, \ldots, \nt{lid}_n$to be start symbols. Each such symbol must be assigned an \ocaml type either as part of the \dstart declaration or via separate \dtype declarations. Each of$\nt{lid}_1, \ldots, \nt{lid}_n$becomes the name of a function whose signature is published in the \mli file and that can be used to invoke the parser.  POTTIER Francois committed Oct 25, 2018 625 626 627 628 629 630 631 632 633 634 \subsubsection{Attribute declarations} Attribute declarations of the form \dattribute \sepspacelist{\nt{actual}} \sepspacelist{\nt{attribute}} and \kw{\%} \nt{attribute} are explained in \sref{sec:attributes}.  POTTIER Francois committed Oct 27, 2015 635 636 637 638 639 640 641 642 643 \subsubsection{Extra reductions on error} \label{sec:onerrorreduce} A declaration of the form: \begin{quote} \donerrorreduce$\nt{lid}_1 \ldots \nt{lid}_n$\end{quote} marks the nonterminal symbols$\nt{lid}_1, \ldots, \nt{lid}_n$as potentially eligible for reduction when an invalid token is found.  POTTIER Francois committed Aug 08, 2016 644 645 This may cause one or more extra reduction steps to be performed before the error is detected.  POTTIER Francois committed Oct 27, 2015 646 647 648 649 650  More precisely, this declaration affects the automaton as follows. Let us say that a production$\nt{lid} \rightarrow \ldots$is reducible on error'' if its left-hand symbol~\nt{lid} appears in a \donerrorreduce declaration. After the automaton has been constructed and after any conflicts have been resolved,  POTTIER Francois committed Aug 08, 2016 651 652 653 654 655 656 657 658 659 in every state~$s$, the following algorithm is applied: \begin{enumerate} \item Construct the set of all productions that are ready to be reduced in state~$s$and are reducible on error; \item Test if one of them, say$p$, has higher on-error-reduce-priority'' than every other production in this set; \item If so, in state~$s$, replace every error action with a reduction of the production~$p$. (In other words, for every terminal symbol~$t$, if the action table  POTTIER Francois committed Oct 27, 2015 660 says: in state~$s$, when the next input symbol is~$t$, fail'', then this  POTTIER Francois committed Aug 08, 2016 661 662 663 entry is replaced with: in state~$s$, when the next input symbol is~$t$, reduce production~$p$''.) \end{enumerate}  POTTIER Francois committed Oct 27, 2015 664   POTTIER Francois committed Aug 08, 2016 665 666 If step 3 above is executed in state~$s$, then an error can never be detected in state~$s$, since all error actions in state~$s$are replaced with reduce  POTTIER Francois committed Oct 27, 2015 667 668 669 670 actions. Error detection is deferred: at least one reduction takes place before the error is detected. It is a spurious'' reduction: in a canonical LR(1) automaton, it would not take place.  POTTIER Francois committed Oct 28, 2015 671 672 673 674 675 676 An \donerrorreduce declaration does not affect the language that is accepted by the automaton. It does not affect the location where an error is detected. It is used to control in which state an error is detected. If used wisely, it can make errors easier to report, because they are detected in a state for which it is easier to write an accurate diagnostic message (\sref{sec:errors:diagnostics}).  POTTIER Francois committed Oct 27, 2015 677 678 679 680 681 682 683 684 685  % This may make the tables bigger (but I have no statistics). % This makes LRijkstra significantly slower. Like a \dtype declaration, an \donerrorreduce declaration may concern not only a nonterminal symbol, such as, say, \texttt{expression}, but also a fully applied parameterized nonterminal symbol, such as \texttt{list(expression)} or \texttt{separated\_list(COMMA, option(expression))}.  POTTIER Francois committed Aug 08, 2016 686 687 688 689 690 691 692 693 694 The on-error-reduce-priority'' of a production is that of its left-hand symbol. The on-error-reduce-priority'' of a nonterminal symbol is determined implicitly by the order of \donerrorreduce declarations. In the declaration$\donerrorreduce\;\nt{lid}_1 \ldots \nt{lid}_n$, the symbols$\nt{lid}_1, \ldots, \nt{lid}_n$have the same on-error-reduce-priority''. They have higher on-error-reduce-priority'' than the symbols listed in previous \donerrorreduce declarations, and lower on-error-reduce-priority'' than those listed in later \donerrorreduce declarations.  POTTIER Francois committed Nov 12, 2018 695 696 \subsection{Rules---old syntax} \label{sec:old:rules}  fpottier committed Mar 01, 2013 697   POTTIER Francois committed Sep 11, 2015 698 In its simplest  POTTIER Francois committed Nov 12, 2018 699 form, a rule begins with the nonterminal symbol \nt{lid},  POTTIER Francois committed Sep 11, 2015 700 followed by a colon character (\deuxpoints),  fpottier committed Mar 01, 2013 701 702 703 704 705 706 and continues with a sequence of production groups (\sref{sec:productiongroups}). Each production group is preceded with a vertical bar character (\barre); the very first bar is optional. The meaning of the bar is choice: the nonterminal symbol \nt{id} develops to either of the production groups. We defer explanations of the keyword \dpublic (\sref{sec:split}), of the keyword \dinline (\sref{sec:inline}), and of the  POTTIER Francois committed Nov 12, 2018 707 optional formal parameters$\tuple{\nt{id}}$ fpottier committed Mar 01, 2013 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 (\sref{sec:templates}). \subsubsection{Production groups} \label{sec:productiongroups} In its simplest form, a production group consists of a single production (\sref{sec:productions}), followed by an \ocaml semantic action (\sref{sec:actions}) and an optional \dprec annotation (\sref{sec:prec}). A production specifies a sequence of terminal and nonterminal symbols that should be recognized, and optionally binds identifiers to their semantic values. \paragraph{Semantic actions} \label{sec:actions} A semantic action is a piece of \ocaml code that is executed in order to assign a semantic value to the nonterminal symbol with which this production group is associated. A semantic action can refer to the (already computed) semantic values of the terminal or nonterminal symbols that appear in the  726 727 728 729 production via the semantic value identifiers bound by the production. For compatibility with \ocamlyacc, semantic actions can also refer to unnamed semantic values via positional keywords of the form  POTTIER Francois committed Oct 24, 2018 730 731 732 \kw{\$1}, \kw{\$2}, etc.\ This style is discouraged. (It is in fact forbidden if \onodollars is turned on.) Furthermore, as  Yann Régis-Gianas committed Jul 06, 2015 733 a positional keyword of the form \kw{\$i} is internally rewritten as  POTTIER Francois committed Sep 20, 2015 734 \nt{\_i}, the user should not use identifiers of the form \nt{\_i}.  fpottier committed Mar 01, 2013 735 736 737 738  \paragraph{\dprec annotations} \label{sec:prec}  POTTIER Francois committed Sep 11, 2015 739 740 An annotation of the form \dprec \nt{id} indicates that the precedence level of the production group is the level assigned to the symbol \nt{id} via a  fpottier committed Mar 01, 2013 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 previous \dnonassoc, \dleft, or \dright declaration (\sref{sec:assoc}). In the absence of a \dprec annotation, the precedence level assigned to each production is the level assigned to the rightmost terminal symbol that appears in it. It is undefined if the rightmost terminal symbol has an undefined precedence level or if the production mentions no terminal symbols at all. The precedence level assigned to a production is used when resolving shift/reduce conflicts (\sref{sec:conflicts}). \paragraph{Multiple productions in a group} If multiple productions are present in a single group, then the semantic action and precedence annotation are shared between them. This short-hand effectively allows several productions to share a semantic action and precedence annotation without requiring textual duplication. It is legal only when every production binds exactly the same set of semantic value identifiers and when no positional semantic value keywords (\kw{\$1}, etc.) are used. \subsubsection{Productions} \label{sec:productions} A production is a sequence of producers (\sref{sec:producers}), optionally followed by a \dprec annotation (\sref{sec:prec}). If a precedence annotation is present, it applies to this production alone, not to other productions in the production group. It is illegal for a production and its production group to both carry \dprec annotations. \subsubsection{Producers} \label{sec:producers} A producer is an actual (\sref{sec:actual}), optionally preceded with a binding of a semantic value identifier, of the form \nt{lid} \dequal. The actual specifies which construction should be recognized and how a semantic value should be computed for that construction. The identifier \nt{lid}, if present, becomes bound to that semantic value in the semantic action that follows. Otherwise, the semantic value can be referred to via a positional keyword (\kw{\$1}, etc.). \subsubsection{Actuals} \label{sec:actual}  POTTIER Francois committed Feb 15, 2015 782 783 784 In its simplest form, an actual is just a terminal or nonterminal symbol $\nt{id}$. If it is a parameterized non-terminal symbol (see \sref{sec:templates}), then it should be applied:  POTTIER Francois committed Nov 12, 2018 785 $\nt{id}\tuple{\nt{actual}}$.  POTTIER Francois committed Feb 15, 2015 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815  An actual may be followed with a modifier (\dquestion, \dplus, or \dstar). This is explained further on (see \sref{sec:templates} and \fref{fig:sugar}). An actual may also be an anonymous rule''. In that case, one writes just the rule's right-hand side, which takes the form $\seplist{\ \barre\ }{\nt{group}}$. (This form is allowed only as an argument in an application.) This form is expanded on the fly to a definition of a fresh non-terminal symbol, which is declared \dinline. For instance, providing an anonymous rule as an argument to \nt{list}: \begin{quote} \begin{tabular}{l} \nt{list} \dlpar \basic{e} = \nt{expression}; \basic{SEMICOLON} \dpaction{\basic{e}} \drpar \end{tabular} \end{quote} is equivalent to writing this: \begin{quote} \begin{tabular}{l} \nt{list} \dlpar \nt{expression\_SEMICOLON} \drpar \end{tabular} \end{quote} where the non-terminal symbol \nt{expression\_SEMICOLON} is chosen fresh and is defined as follows: \begin{quote} \begin{tabular}{l} \dinline \nt{expression\_SEMICOLON}: \newprod \basic{e} = \nt{expression}; \basic{SEMICOLON} \dpaction{\basic{e}} \end{tabular} \end{quote}  fpottier committed Mar 01, 2013 816   POTTIER Francois committed Nov 12, 2018 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 \subsection{Rules---new syntax} \label{sec:new:rules} Please be warned that \textbf{the new syntax is considered experimental} and is subject to change in the future. % TEMPORARY à supprimer un jour... In its simplest form, a rule takes the form \dlet \nt{lid} \dcolonequal \expression. % Its left-hand side \nt{lid} is a nonterminal symbol; its right-hand side is an expression. % Such a rule defines an ordinary nonterminal symbol, while the alternate form \dlet \nt{lid} \dequalequal \expression defines an \dinline nonterminal symbol (\sref{sec:inline}), that is, a macro. % A rule can be preceded with the keyword \dpublic (\sref{sec:split}) and can be parameterized with a tuple of formal parameters $\tuple{\nt{id}}$ (\sref{sec:templates}). % The various forms of expressions, listed in \fref{fig:syntax}, are: % \begin{itemize} \item A \textbf{choice} between several expressions, % \precseplist\barre\expression. \optional{\barre} \expressionsub{1} \barre ${}\ldots{}$ \barre\expressionsub{n}. The leading bar is optional. \item A \textbf{sequence} of two expressions, \pattern \dequal \expressionsub{1} \dsemi \expressionsub{2}. The semantic value produced by \expressionsub{1} is decomposed according to the pattern \pattern. The \ocaml variables introduced by \pattern may appear in a semantic action that ends the sequence \expressionsub{2}. \item A sequence \dtilde \dequal \nt{id}${}_1$ \dsemi \expressionsub{2}, which is sugar for \nt{id}${}_1$ \dequal \nt{id}${}_1$ \dsemi \expressionsub{2}. This is a \textbf{pun}. % This is a special case of the previous form, % yet it receives special treatment; this is the % only case where ~ represents a deterministically-chosen name. \item A sequence \expressionsub{1} \dsemi \expressionsub{2}, which is sugar for \dunderscore \dequal \expressionsub{1} \dsemi \expressionsub{2}. \item A \textbf{symbol} \nt{id}, possibly applied to a tuple of expressions \dlpar \expressionsub{1},\ ${}\ldots{}$,\ \expressionsub{n} \drpar. It is worth noting that such an expression \emph{can} form the end of a sequence: \nt{id} at the end of a sequence stands for \nt{x} \dequal \nt{id} \dsemi \dpaction{\nt{x}} for some fresh variable \nt{x}. Thus, a sequence need not end with a semantic action. \item An expression followed with \dquestion, \dplus, or \dstar. This is sugar for the previous form: see \sref{sec:templates} and \fref{fig:sugar}. \item A \textbf{semantic action} \daction, possibly followed with a precedence annotation \dprec \nt{id}. This \ocaml code can refer to the variables that have been bound earlier in the sequence that this semantic action ends. These include all variables named by the user as well as all variables introduced by a $\dtilde$ pattern as part of a pun. % (but not variables introduced by deep ~ patterns) The notation $\kw{\$}i$, where$i$is an integer, is forbidden. \item A \textbf{point-free semantic action} \dpfaction{\nt{\ocaml id}}, possibly followed with a precedence annotation \dprec~\nt{id}. The \ocaml identifier \nt{id} must denote a function or a data constructor. It is applied to a tuple of the variables that have been bound earlier in the sequence that this semantic action ends. Thus,$\dpfaction{\,\nt{id}\,}$is sugar for$\dpaction{\,\nt{id}\;\,(x_1, \ldots, x_n)\,}$, where$x_1, \ldots, x_n$are the variables bound earlier. These include all variables named by the user as well as all variables introduced by a$\dtilde$pattern. \item An identity semantic action \dpfidentityaction. This is sugar for \dpfaction{\nt{identity}}, where \nt{identity} is \ocaml's identity function. Therefore, it is sugar for$\dpaction{\,(x_1, \ldots, x_n)\,}$, where$x_1, \ldots, x_n$are the variables bound earlier. \end{itemize} \begin{comment} % fp: not sure if this paragraph is helpful. To some degree, an expression is analogous to an \ocaml expression: it returns an \ocaml value, and as a side effect, recognizes and consumes a fragment of the input. In particular, a sequence % \pattern \dequal \expressionsub{1} \dsemi \expressionsub{2} % is roughly analogous to an \ocaml sequence \verb+let p = e1 in e2+. % terminal symbol = recognize and consume one input symbol % nonterminal symbol = procedure call or macro invocation (if %inline) % semantic action = OCaml code insertion \end{comment} The syntax of expressions, as presented in \fref{fig:syntax}, seems more permissive than it really is. In reality, a choice cannot be nested inside a sequence; % (either on the left or on the right) a sequence cannot be nested in the left-hand side of a sequence; a semantic action cannot appear in the left-hand side of a sequence. (Thus, there is a stratification in three levels: choice expressions, sequence expressions, and atomic expressions, which corresponds roughly to the stratification of rules, productions, and producers in the old syntax.) % Furthermore, an expression between parentheses$\dlpar \expression \drpar$is \emph{not} a valid expression. To surround an expression with parentheses, one must write either$\nt{midrule}\, \dlpar \expression \drpar$or$\nt{endrule}\, \dlpar \expression \drpar$; see \sref{sec:library} and \fref{fig:standard}. When a complex expression (e.g., a choice or a sequence) is placed in parentheses, as in \nt{id}\,\dlpar\expression\drpar, this is equivalent to using$\nt{id}\,\dlpar\nt{s}\drpar$, where the fresh symbol~\nt{s} is declared as a synonym for this expression, via the declaration \dlet \nt{s} \dequalequal \expression. This idiom is also known as an anonymous rule (\sref{sec:actual}). \paragraph{Examples} As an example of a rule in the new syntax, the parameterized nonterminal symbol \nt{option}, which is part of Menhir's standard library (\sref{sec:library}), can be defined as follows: % \begin{quote} \begin{tabular}{l} \dlet \nt{option}(\nt{x}) \dcolonequal \\ \quad \barre \phantom{\nt{x} \dequal \nt{x} \dsemi{}} \dpaction{\nt{None}} \\ \quad \barre \nt{x} \dequal \nt{x} \dsemi{} \dpaction{\nt{Some x}} \end{tabular} \end{quote} % Using a pun, it can also be written as follows: % \begin{quote} \begin{tabular}{l} \dlet \nt{option}(\nt{x}) \dcolonequal \\ \quad \barre \phantom{\dtilde \dequal \nt{x} \dsemi{}} \dpaction{\nt{None}} \\ \quad \barre \dtilde \dequal \nt{x} \dsemi{} \dpaction{\nt{Some x}} \end{tabular} \end{quote} % Using a pun and a point-free semantic action, it can also be expressed as follows: % \begin{quote} \begin{tabular}{l} \dlet \nt{option}(\nt{x}) \dcolonequal \\ \quad \barre \phantom{\dtilde \dequal \nt{x} \dsemi{}} \dpaction{\nt{None}} \\ \quad \barre \dtilde \dequal \nt{x} \dsemi{} \dpfaction{\nt{Some}} \end{tabular} \end{quote} % As another example, the parameterized symbol$\nt{delimited}\$, also part of Menhir's standard library (\sref{sec:library}), can be defined in the new syntax as follows: % \begin{quote} \begin{tabular}{l} \dlet \nt{delimited}(\nt{opening}, \nt{x}, \nt{closing}) \dequalequal \\