main.tex 107 KB
Newer Older
1 2
\documentclass[11pt,a4paper,twoside]{article}
\usepackage[left=25mm,right=25mm,top=25mm,bottom=25mm,marginparwidth=50pt]{geometry}
POTTIER Francois's avatar
POTTIER Francois committed
3
\setlength\abovecaptionskip{0pt} % Reduce space above figure captions.
4 5
\usepackage{lmodern} % This gives us a bold monospace font.
\renewcommand{\rmdefault}{ptm} % Times.
POTTIER Francois's avatar
POTTIER Francois committed
6 7
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
8
\usepackage{enumitem}
POTTIER Francois's avatar
POTTIER Francois committed
9 10
\usepackage[bookmarks=true,bookmarksopen=true,colorlinks=true,%
            linkcolor=blue,citecolor=blue,urlcolor=blue]{hyperref}
11
\usepackage{marginnote}
POTTIER Francois's avatar
POTTIER Francois committed
12 13 14
\usepackage{listings}
\input{listings-ocaml}
\lstset{language=ocaml}
POTTIER Francois's avatar
POTTIER Francois committed
15
\usepackage{moreverb}
POTTIER Francois's avatar
POTTIER Francois committed
16
\usepackage{tabularx}
POTTIER Francois's avatar
POTTIER Francois committed
17 18
\usepackage{xcolor}
\usepackage{mdframed}
POTTIER Francois's avatar
POTTIER Francois committed
19
\usepackage{xspace}
POTTIER Francois's avatar
POTTIER Francois committed
20
\input{macros}
POTTIER Francois's avatar
POTTIER Francois committed
21 22
% Style.
\renewcommand{\emph}[1]{\textbf{#1}}
23
\input{version}
POTTIER Francois's avatar
POTTIER Francois committed
24

POTTIER Francois's avatar
POTTIER Francois committed
25 26 27
% ------------------------------------------------------------------------------
% Headings.

28
\title{Visitors\\\normalsize version \visitorsversion}
29
\date{}
POTTIER Francois's avatar
POTTIER Francois committed
30
\begin{document}
31
\author{François Pottier\\ Inria Paris\\ \email{francois.pottier@inria.fr}}
POTTIER Francois's avatar
POTTIER Francois committed
32 33 34 35 36
\maketitle

% ------------------------------------------------------------------------------

\clearpage
37 38
\tableofcontents
\clearpage
POTTIER Francois's avatar
POTTIER Francois committed
39 40 41

% ------------------------------------------------------------------------------

42 43
\begin{flushright}
  Les visites font toujours plaisir, si ce n'est en arrivant, du moins en
44
  partant. \\ --- \textit{Jean de La Bruyère}
45 46
\end{flushright}

47
\vspace{8mm}
48

49 50 51
% ------------------------------------------------------------------------------
% ------------------------------------------------------------------------------

POTTIER Francois's avatar
POTTIER Francois committed
52
\section{Introduction}
POTTIER Francois's avatar
POTTIER Francois committed
53
\label{sec:intro}
POTTIER Francois's avatar
POTTIER Francois committed
54 55 56

% ------------------------------------------------------------------------------

57 58
\subsection{What is a visitor?}

59 60
A visitor class for a data structure is a class whose methods implement a
traversal of this data structure. By default, the visitor methods do not have
61 62 63 64 65 66
interesting behavior: they just cause control to go down into the data
structure and come back up, without performing any actual computation.
Nevertheless, by defining subclasses where one method or a few methods are
overridden, nontrivial behavior can be obtained. Therefore, visitors allow
many operations on this data structure to be defined with little effort.

67
Visitor classes come in several varieties. An \iter visitor traverses a data
68 69 70 71
structure and returns no result. It can nevertheless have side effects,
including updating a piece of mutable state, raising an exception, and
performing input/output. A \map visitor traverses a data structure and returns
another data structure: typically, a copy of its argument that has been
72 73 74
transformed in some way. An \mapendo visitor is a variant of a \map visitor
that preserves physical sharing when possible. A \reduce visitor traverses a
data structure and returns a value that somehow summarizes it: computing the
75 76 77
size of a data structure is a typical example. A \mapreduce visitor performs
the tasks of a \map visitor and a \reduce visitor at the same time,
possibly allowing symbiosis between them. All of these can be viewed as
78 79 80 81 82 83 84 85 86 87
special cases of the \fold visitor, which performs a bottom-up computation
over a data structure. The class \fold is equipped with virtual methods
% (the ``\tyconascendingmethod{}'' methods)
that can be overridden (in a subclass) so as to specify what computation is
desired.

Visitors also come in several arities. The visitors mentioned above have arity
one: they traverse one data structure. However, it is sometimes necessary to
simultaneously traverse two data structures of identical shape. For this
purpose, there are visitors of arity two: here, they are known as \itertwo,
88
\maptwo, \reducetwo, \mapreducetwo, and \foldtwo visitors.
89 90

% \mapendotwo does not exist.
POTTIER Francois's avatar
POTTIER Francois committed
91

92 93
% ------------------------------------------------------------------------------

94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119
\subsection{What does this package offer?}

Visitors have extremely regular structure. As a result, whereas implementing
them by hand is boring and error-prone, generating them automatically is often
possible. The \visitors package extends the syntax of OCaml%
%
\footnote{Technically, \visitors is a plugin for \ppxderiving, which itself is
  a preprocessor extension for the OCaml compiler.}
%
so as to make it easy for the programmer to request the automatic generation
of visitor classes. Visitor classes for many forms of user-defined data types
can be generated and, if necessary, combined (via multiple inheritance) with
hand-written visitor classes, making the framework quite powerful.

% ------------------------------------------------------------------------------
% ------------------------------------------------------------------------------

\section{Walkthrough}

% ------------------------------------------------------------------------------

\subsection{Setup}
\label{sec:intro:setup}

In order to install the \visitors package, an \opam user should issue the
following commands:
POTTIER Francois's avatar
POTTIER Francois committed
120
\begin{lstlisting}[keywords={}]
121 122
  opam update
  opam install visitors
POTTIER Francois's avatar
POTTIER Francois committed
123
\end{lstlisting}
124 125
To use the package, an \ocamlbuild user should add the
following line in her project's \texttt{\_tags} file:
POTTIER Francois's avatar
POTTIER Francois committed
126
\begin{lstlisting}[keywords={}]
127 128 129 130
  true: package(visitors.ppx), package(visitors.runtime)
\end{lstlisting}
Finally, a user of \merlin should add the following lines in her project's
\texttt{.merlin} file:
POTTIER Francois's avatar
POTTIER Francois committed
131
\begin{lstlisting}[keywords={}]
132 133 134 135 136 137
  PKG visitors.ppx
  PKG visitors.runtime
\end{lstlisting}

% ------------------------------------------------------------------------------

POTTIER Francois's avatar
POTTIER Francois committed
138
\begin{figure}[t]
139
% In an OCaml source file, a type definition can be annotated with
140
% \derivingvisitors:
POTTIER Francois's avatar
POTTIER Francois committed
141
\orig{expr00}
142 143
% This causes the following code to be (invisibly) generated:
\vspace{-\baselineskip}
144
\processed{expr00}
POTTIER Francois's avatar
POTTIER Francois committed
145
\caption{A visitor of the \iter variety}
POTTIER Francois's avatar
POTTIER Francois committed
146 147 148
\label{fig:expr00}
\end{figure}

149
\subsection{Defining an \iter visitor}
POTTIER Francois's avatar
POTTIER Francois committed
150
\label{sec:intro:iter:def}
POTTIER Francois's avatar
POTTIER Francois committed
151

152 153 154 155 156
Suppose we need to manipulate arithmetic expressions built out of integer
literals and binary additions. The abstract syntax of these expressions can be
described by an algebraic data type \oc|expr|, shown in the first part of
\fref{fig:expr00}.
%
POTTIER Francois's avatar
POTTIER Francois committed
157 158 159 160
By annotating this type definition with \derivingvisitors, we request the
automated generation of a visitor for expressions. The annotation
\derivingvisitors must carry at least one parameter, \variety, which indicates
what variety of visitor is desired.
POTTIER Francois's avatar
POTTIER Francois committed
161

162 163 164 165
The code of the visitor class, which is automatically generated and in normal
use remains invisible, is shown in the second part of \fref{fig:expr00}. The
name of this class is by default the value of the \variety parameter. It can
be changed, if desired, by explicitly supplying a \name parameter.
166

POTTIER Francois's avatar
POTTIER Francois committed
167 168
A visitor takes the form of an OCaml class, whose methods are named after the
types and data constructors that appear in the type definition. In
POTTIER Francois's avatar
POTTIER Francois committed
169 170 171 172 173
\fref{fig:expr00}, for instance, the method \tyconvisitor{expr} is named after
the type \oc|expr|, while the methods \dataconvisitor{EConst} and
\dataconvisitor{EAdd} are named after the data constructors \oc|EConst| and
\oc|EAdd|.

174 175 176 177 178 179 180 181 182 183 184
Different varieties of visitors differ in the computation that is performed
``on the way up'', after the recursive calls have finished, therefore differ
in the return types of the visitor methods. \iter is the simplest variety. An
\iter visitor performs no computation on the way up, so its methods have
return type \oc|unit|.

In an \iter visitor, the generated visitor methods do nothing. In
\fref{fig:expr00}, for instance, the method \tyconvisitor{expr} inspects its
argument \oc|this| and recursively invokes either \dataconvisitor{EConst} or
\dataconvisitor{EAdd}, as appropriate. The method \dataconvisitor{EConst} does
nothing.\footnote{More precisely, it calls the method \tyconvisitor{int},
POTTIER Francois's avatar
POTTIER Francois committed
185
  which is inherited from the class \runtime{iter}, and does
POTTIER Francois's avatar
POTTIER Francois committed
186 187 188 189
  nothing. This call to \tyconvisitor{int} can be avoided, if desired, by using
  \oc|(int[@opaque])| instead of \oc|int|; see \sref{sec:opaque}.} The method
\dataconvisitor{EAdd} performs two recursive calls to \tyconvisitor{expr},
which does nothing, so \dataconvisitor{EAdd} itself does nothing.
POTTIER Francois's avatar
POTTIER Francois committed
190 191

Every method is parameterized with an environment \oc|env|, which is carried
192 193 194 195 196 197 198
down into every recursive call and is otherwise unused. The type of this
environment is undetermined: it is up to the (user-defined) subclasses of the
visitor class to decide what the type of \oc|env| should be and (possibly)
where and how this environment should be enriched.

% One could note that the visitor class is parameterized over 'self,
% but it is perhaps a bit early for such a remark.
POTTIER Francois's avatar
Doc.  
POTTIER Francois committed
199

POTTIER Francois's avatar
POTTIER Francois committed
200 201 202 203 204
The fields of a data constructor or record are traversed left to right, in the
order they are declared. In a list-like data structure, the field that holds a
pointer to the list tail should be declared last, so as to ensure that the
traversal requires constant stack space.

POTTIER Francois's avatar
POTTIER Francois committed
205 206
% ------------------------------------------------------------------------------

207
\begin{figure}[t]
POTTIER Francois's avatar
POTTIER Francois committed
208
\codefollowup{expr00}
POTTIER Francois's avatar
POTTIER Francois committed
209
\origfirstline{expr04}{3}
210 211 212 213
\caption{Counting the number of addition nodes in an expression}
\label{fig:expr04}
\end{figure}

214
\subsection{Using an \iter visitor}
POTTIER Francois's avatar
POTTIER Francois committed
215
\label{sec:intro:iter:usage}
POTTIER Francois's avatar
POTTIER Francois committed
216

217 218
Naturally, traversing a data structure without actually computing anything is
not a very sensible thing to do. Things become interesting when at least one
219 220 221 222 223
visitor method is overridden so as to give rise to nontrivial behavior.
Suppose, for instance, that we wish to count the number of addition nodes in
an expression. This can be done as shown in \fref{fig:expr04}. We create an
object~\oc|v| that is both a counter (that is, an object equipped with a
mutable field~\oc|count|) and a visitor, and we override its method
224 225 226 227 228
\dataconvisitor{EAdd} so that the counter is incremented every time this
method is invoked. There remains to run the visitor, by invoking its
\tyconvisitor{expr} method, and to return the final value of the counter. The
environment, in this example, is unused; we let it have type \unit.

POTTIER Francois's avatar
POTTIER Francois committed
229 230 231 232 233 234 235
This may seem a rather complicated way of counting the addition nodes in an
expression. Of course, one could give a direct recursive definition of the
function \oc|count|, in a few lines of code, without using a visitor at all.
The point of employing a visitor, as done in Figures~\ref{fig:expr00}
and~\ref{fig:expr04}, is that no changes to the code are required when the
type of expressions is extended with new cases.

POTTIER Francois's avatar
Doc.  
POTTIER Francois committed
236 237
% ------------------------------------------------------------------------------

238 239 240 241 242 243 244 245
\subsection{What is the type of a visitor?}
\label{sec:intro:type}

In this document, most of the time, we prefer to show the code of a visitor
class and omit its type. There are two reasons for this. First, this type is
often verbose, as the class has many methods, and complex, as several type
variables are often involved. Second, although we can explain the type of a
generated visitor on a case-by-case basis, we cannot in the general case
246 247
predict the type of a generated visitor.
The reason is, the type of a generated visitor depends upon the
248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283
types of the classes that are inherited via the \ancestors parameter
(\sref{sec:ancestors}). Because a \texttt{ppx} syntax extension transforms
untyped syntax trees to untyped syntax trees, the \visitors syntax extension
does not have access to this information.

For this reason, the \visitors syntax extension cannot generate any type
declarations. Thus, the annotation \derivingvisitors can be used only in an
\texttt{\%.ml} file, not in an \texttt{\%.mli} file. When it is used in an
\texttt{\%.ml} file, the corresponding \texttt{\%.mli} file should either be
omitted altogether or be written by hand.

\input{types}

Nevertheless, the type of the generated code can be inspected, if desired, by
requesting the OCaml compiler to infer and display it. For this purpose, one
can use a command of the following form:
\begin{quote}
\verb|ocamlbuild -use-ocamlfind <other-flags> %.inferred.mli|
\end{quote}

\fref{fig:inferred} shows the type that is inferred via such a command for the
\iter visitor of \fref{fig:expr00}. This type is rather verbose, for two
reasons. First, an explicit type equation, introduced by the \oc|constraint|
keyword, relates the type parameter \oc|'self| with an OCaml object type that
lists the public methods with their types. Second, the class \iter has many more
methods than one might think. This is because it inherits a large number of
private methods from the class \runtime{iter}. In the present case, all of
these methods except \tyconvisitor{int} are in fact unused.

Fortunately, this complicated type can be manually simplified. This is done in
\fref{fig:simplified}. Two main simplifications have been performed. First, we
have omitted all private methods. Indeed, the most important property of
private methods in OCaml is precisely the fact that it is permitted to hide
their existence. Second, we have omitted the type constraint that bears on
the type variable \oc|'self|, as it is in fact redundant: it is implicit in OCaml
that the type of ``self'' must be an object type that lists the public methods.
284
The bizarre-looking ``\oc|'monomorphic.|'' annotations indicate that the methods have
285 286 287 288 289 290
monomorphic types. (This notational trick is explained in \sref{sec:oo:monomorphic}.)
This means that the type variable~\oc|'env| is not quantified at the level of each
method\footnote{That would be undesirable, as it would force each method to
treat the environment as an argument of unknown type!}, but at the level of the class.
This means that the three methods must agree on the type of the environment, and
that this type is presently undetermined, but can be determined in a subclass.
291 292 293 294 295 296 297 298

The class type shown in \fref{fig:simplified} cannot be further simplified.
The methods \tyconvisitor{EConst} and \tyconvisitor{EAdd} cannot be hidden, as
they are public. That said, if one wished to hide them, one could add the parameter %
\oc|public = ["visit_expr"]| to the annotation \derivingvisitors in \fref{fig:expr00}.
These two methods would then be declared private in the generated code,
so it would be permitted to hide their existence.

299 300 301 302
Although we have claimed earlier that one cannot in the general case predict
the type of a generated visitor method, or even predict whether a generated
method will be well-typed, it is possible to define a convention which in many
cases can be adhered to. This convention is presented later on
303
(\refconvention).
304

305 306
% ------------------------------------------------------------------------------

POTTIER Francois's avatar
POTTIER Francois committed
307
\begin{figure}[t]
POTTIER Francois's avatar
Doc.  
POTTIER Francois committed
308 309 310
\orig{expr01}
\vspace{-\baselineskip}
\processed{expr01}
POTTIER Francois's avatar
POTTIER Francois committed
311
\caption{A visitor of the \map variety}
POTTIER Francois's avatar
Doc.  
POTTIER Francois committed
312 313 314
\label{fig:expr01}
\end{figure}

POTTIER Francois's avatar
POTTIER Francois committed
315
\subsection{\map visitors}
POTTIER Francois's avatar
POTTIER Francois committed
316 317
\label{sec:intro:map}

318 319 320 321 322
An \iter visitor returns no result. Although, as illustrated previously
(\sref{sec:intro:iter:usage}), it can use private mutable state to accumulate
information, there are applications for which such a visitor is not suitable.
One class of such applications is tree transformations. To transform an
expression into an expression, one should use a visitor of another variety,
POTTIER Francois's avatar
POTTIER Francois committed
323 324
namely \map.

325 326 327 328 329
A \map visitor is shown in \fref{fig:expr01}. In comparison with the \iter
visitor of \fref{fig:expr00}, the generated code is identical, except that,
instead of returning the unit value \oc|()|, the method
\dataconvisitor{EConst} reconstructs an \oc|EConst| expression, while the
method \dataconvisitor{EAdd} reconstructs an \oc|EAdd| expression.
POTTIER Francois's avatar
POTTIER Francois committed
330

POTTIER Francois's avatar
POTTIER Francois committed
331 332 333
A \map visitor behaves (by default) as an identity function: it constructs a
copy of the data structure that it visits. If the data structure is immutable,
this is rather pointless: in order to obtain nontrivial behavior, at least one
Danny Willems's avatar
Typo  
Danny Willems committed
334
method should be overridden. If the data structure is mutable, though, even
POTTIER Francois's avatar
POTTIER Francois committed
335 336
the default behavior is potentially of interest: it constructs a deep copy of
its argument.
337

POTTIER Francois's avatar
Doc.  
POTTIER Francois committed
338 339
% ------------------------------------------------------------------------------

POTTIER Francois's avatar
POTTIER Francois committed
340
\begin{figure}[p]
POTTIER Francois's avatar
POTTIER Francois committed
341 342 343 344 345 346 347
\orig{expr00endo}
\vspace{-\baselineskip}
\processed{expr00endo}
\caption{A visitor of the \mapendo variety}
\label{fig:expr00endo}
\end{figure}

348 349 350 351 352 353 354 355
\subsection{\mapendo visitors}
\label{sec:intro:endo}

\mapendo visitors are a slight variation of \map visitors. Whereas a \map
visitor systematically allocates a copy of the memory block that it receives
as an argument, an \mapendo visitor (\fref{fig:expr00endo}) first tests if the
newly allocated block would have exactly the same contents as the original
block, and if so, re-uses the original block instead.
POTTIER Francois's avatar
POTTIER Francois committed
356 357 358 359
% Didier attributes this idea to Gérard Huet,
% at least in the case of a dictionary insertion operation,
% where Gérard would raise an exception so as to go back to
% the toplevel and avoid re-allocating a path.
360
%
361 362 363 364 365
This trick allows saving memory: for instance, when a performing a
substitution operation on a term, the subterms that are unaffected
by the substitution are not copied.

One potential disadvantage of \mapendo visitors, in comparison with \map
366
visitors, is that these runtime tests have a runtime cost. A more serious
367 368
disadvantage is that \mapendo visitors have less general types: in an \mapendo
visitor, the argument type and return type of every method must coincide,
369
whence the name ``\mapendo''.
370
%
371
(An endomorphism is a function of a set into itself.)
372 373 374 375
%
\map visitors are not subject to this restriction: for an illustration, see
\sref{sec:advanced:hashconsed} and \fref{fig:expr14}.

POTTIER Francois's avatar
POTTIER Francois committed
376
In principle, \mapendo visitors should be created only for immutable data
377
structures. Although the tool can produce an \mapendo visitor for a mutable
POTTIER Francois's avatar
POTTIER Francois committed
378 379
data structure, this is discouraged, as it may lead to unexpected behavior.

POTTIER Francois's avatar
POTTIER Francois committed
380 381
% ------------------------------------------------------------------------------

POTTIER Francois's avatar
POTTIER Francois committed
382 383 384 385 386 387 388 389
\begin{figure}[p]
\orig{expr15}
\vspace{-\baselineskip}
\processed{expr15}
\caption{A visitor of the \reduce variety}
\label{fig:expr15}
\end{figure}

POTTIER Francois's avatar
POTTIER Francois committed
390
\begin{figure}[t]
391
\codefollowup{expr15}
POTTIER Francois's avatar
POTTIER Francois committed
392 393 394 395 396 397 398 399
\origfirstline{expr15b}{3}
\caption{Computing the size of an expression using a \reduce visitor}
\label{fig:reduce}
\end{figure}

\subsection{\reduce visitors}
\label{sec:intro:reduce}

POTTIER Francois's avatar
POTTIER Francois committed
400 401 402 403 404 405 406 407 408 409 410 411 412
Whereas an \iter visitor returns no result and a \map visitor returns a data
structure, a \reduce visitor returns a ``summary'' of a data structure, so to
speak. The summary of a term is computed by combining the summaries of its
subterms. This requires summaries to inhabit a monoid, that is, a type
equipped with a binary operation \oc|plus| and its neutral element \oc|zero|.

\fref{fig:expr15} shows a \reduce visitor for arithmetic expressions. In
\dataconvisitor{EAdd}, the summaries produced by the two recursive calls are
combined using a call to \oc|self#plus|. In \dataconvisitor{EConst}, there are
no recursive calls, so there is nothing to combine: the result is
\oc|self#zero|.

The virtual methods \oc|zero| and \oc|plus| are declared in the class
POTTIER Francois's avatar
POTTIER Francois committed
413
\runtime{reduce}, which is automatically inherited. The type of the
POTTIER Francois's avatar
POTTIER Francois committed
414 415 416 417 418 419
monoid elements, at this point, is undetermined: it is up to the
(user-defined) subclasses of the class \reduce to decide what this type should
be and what the monoid operations should be.

As an example application, \fref{fig:reduce} shows how to compute the size of
an expression using a \reduce visitor. We inherit the class \reduce. We also
POTTIER Francois's avatar
POTTIER Francois committed
420
inherit the class \runtime{addition_monoid}, which defines the
POTTIER Francois's avatar
POTTIER Francois committed
421 422 423 424
methods \oc|zero| and \oc|plus| as the integer \oc|0| and integer addition,
respectively. There remains to override the method \tyconvisitor{expr} so as
to indicate that every node contributes 1 to the size of an expression.

POTTIER Francois's avatar
POTTIER Francois committed
425 426 427 428 429 430 431 432 433 434
Incidentally, this code is written in such a manner that a single visitor
object is created initially and serves in every call to the function
\oc|size|. This style can be used when the visitor object is immutable and
when the function that one wishes to define is monomorphic. When it cannot be
used, one begins with \oc|let size (e : expr) : int = ...| and ends with %
\oc|v # visit_expr () e|. That style causes a new visitor object to be created
every time \oc|size| is called.
% On pourrait ré-utiliser le même objet à chaque fois en remettant son champ à
% zéro... Mais ce serait sale et pas ré-entrant.

POTTIER Francois's avatar
POTTIER Francois committed
435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450
The size of an expression can also be computed using an \iter visitor equipped
with mutable state, in the style of \fref{fig:expr04}. It is mostly a matter
of style whether such a computation should be performed using \iter or
\reduce.
% TEMPORARY comparer les perfs, pour voir

An \iter visitor is in fact a special case of a \reduce visitor, instantiated
with the \oc|unit| monoid. Thus, in principle, one could forget \iter and
always work with \reduce. Nevertheless, it is preferable to work with \iter
when it is applicable, for reasons of clarity and efficiency.
% e.g. an \iter visitor can traverse a list-like data structure in constant
% stack space; a \reduce visitor cannot, because it does not have an accu.

% One might wonder whether a reduce visitor should have an accumulator...
% ...but then it would be left-to-right-from-the-start, instead of bottom-up.
% I am not sure what we want. Maybe we do not need reduce visitors at all!
POTTIER Francois's avatar
POTTIER Francois committed
451 452 453

% ------------------------------------------------------------------------------

454 455
\begin{figure}[p]
\orig{expr_info_mapreduce}
456
\vspace{-\baselineskip}
457
\processed{expr_info_mapreduce}
458 459 460 461
\caption{A visitor of the \mapreduce variety}
\label{fig:mapreduce}
\end{figure}

462 463 464 465 466 467 468
\begin{figure}[p]
\codefollowup{mapreduce}
\origfirstline{expr_info_mapreduce_use}{3}
\caption{Decorating every subexpression with its size}
\label{fig:expr_info_mapreduce_use}
\end{figure}

469 470 471 472 473 474 475 476 477
\subsection{\mapreduce visitors}
\label{sec:intro:mapreduce}

A \mapreduce visitor performs the tasks of a \map visitor and a \reduce
visitor at once. An example appears in \fref{fig:mapreduce}. Every visitor
method returns a pair of a transformed term and a summary. In other words,
it returns a pair of the results that would be returned by a \map visitor
method and by a \reduce visitor method.

478 479 480 481 482 483 484 485 486
Like a \reduce visitor, a \mapreduce visitor performs a bottom-up computation.
Like a \map visitor, it constructs a new tree. By default, these two tasks are
independent of one another. However, by overriding one or more methods, it is
easy to establish a connection between them: typically, one wishes to exploit
the information that was computed about a subtree in the construction of the
corresponding new subtree. As an example, \fref{fig:expr_info_mapreduce_use}
shows how to transform an arithmetic expression into an arithmetic expression
where every subexpression is annotated with its size. (This example uses a
parameterized type of decorated expressions, which is explained in
487
\sref{sec:intro:parameterized:mono}. We suggest reading the explanations there
488 489 490 491 492 493 494 495 496
first.) The transformation is carried out in one pass and in linear time. As
in \fref{fig:reduce}, we use the addition monoid to compute integer sizes.
This time, however, the visitor methods return not just a size, but a pair of
a new expression and a size. The method \tyconvisitor{expr} is overridden so
as to store the size of the subexpression, \oc|size|, in the \oc|info| field
of the new expression. Because the overridden method \tyconvisitor{expr} does
not call \tyconvisitor{'info}, the latter method is never called: we provide a
dummy definition of it.

497 498 499 500 501 502 503
Other example applications of \mapreduce visitors include:
\begin{itemize}[nosep]
\item decorating every subterm of a $\lambda$-term with the set of its free
  variables;
\item decorating every internal node of abstract syntax tree with a region in
  the source code (assuming that every leaf carries such a region already).
\end{itemize}
504 505 506

% ------------------------------------------------------------------------------

POTTIER Francois's avatar
POTTIER Francois committed
507
\begin{figure}[p]
POTTIER Francois's avatar
POTTIER Francois committed
508 509 510 511 512 513 514
\orig{expr00fold}
\vspace{-\baselineskip}
\processed{expr00fold}
\caption{A visitor of the \fold variety}
\label{fig:expr00fold}
\end{figure}

POTTIER Francois's avatar
POTTIER Francois committed
515
\begin{figure}[p]
POTTIER Francois's avatar
POTTIER Francois committed
516 517 518 519 520
\orig{fold}
\caption{Converting towards unrelated types using a \fold visitor}
\label{fig:fold}
\end{figure}

POTTIER Francois's avatar
POTTIER Francois committed
521 522 523
\subsection{\fold visitors}
\label{sec:intro:fold}

POTTIER Francois's avatar
POTTIER Francois committed
524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571
The varities of visitors presented up to this point differ in the computation
that they perform on the way up, after the recursive calls. As we have seen,
\iter visitors perform no computation at all; \map and \mapendo visitors
reconstruct a term; \reduce visitors perform a series of monoid operations.
Each variety follows a baked-in pattern, which has been programmed ahead of
time, and cannot be changed. What if a different form of computation, which
has not been envisioned by the author of the \visitors syntax extension, is
needed?

This is where \fold visitors come in. A \fold visitor declares virtual methods
that are called on the way up and can be overridden by the user (in a
subclass) so as to implement the desired computation. \fref{fig:expr00fold}
shows a \fold visitor. Two virtual methods, \dataconascendingmethod{EConst}
and \dataconascendingmethod{EAdd}, are declared. They are invoked by
\tyconvisitor{EConst} and \tyconvisitor{EAdd}, respectively.

In a \fold visitor, the return type of the visitor methods is not fixed ahead
of time. It is up to the (user-defined) subclasses of the visitor class to
decide what this type should be.

As an example application, \fref{fig:fold} shows how a \fold visitor can be
used to convert the visited data structure to an entirely different format. In
this example, the type \oc|person| is a record type, whose fields are
\oc|firstname| and \oc|surname|. The type \oc|crowd| is isomorphic to a list
of persons, but, for some reason, it is declared as an algebraic data type
equipped with its own data constructors, \oc|Nobody| and \oc|Someone|. Suppose
we wish to convert a \oc|crowd| to a list of pairs of strings. We can do so by
creating a visitor object that inherits the class \fold and provides concrete
implementations of the methods \tyconascendingmethod{person},
\dataconascendingmethod{Nobody}, and \dataconascendingmethod{Someone}.
Our implementation of \tyconascendingmethod{person} simply allocates
a pair, while our implementations of
\dataconascendingmethod{Nobody} and \dataconascendingmethod{Someone}
respectively build an empty list and a nonempty list.
Thus, the return type of the methods \tyconascendingmethod{person}
and \tyconvisitor{person} is \oc|string * string|,
while the return type of the methods \dataconascendingmethod{Nobody},
\dataconascendingmethod{Someone}, and \dataconvisitor{crowd} is
\oc|(string * string) list|. In a \fold visitor, not all methods
need have the same return type!

If we had chosen to return \oc|f ^ s| instead of \oc|(f, s)| in
\tyconascendingmethod{person}, then a crowd would be converted to
a \oc|string list|. A \fold visitor offers great flexibility.

All previous varieties of visitors are special cases of \fold visitors. The
specialized varieties of visitors are more convenient to use, when they can be
used, because they do not require the user to provide \tyconascendingmethod{}
572 573 574 575
methods. Yet, \fold visitors are in principle more versatile.%
%
\footnote{This would be true in an untyped setting, but is not quite true in OCaml,
due to restrictions imposed by OCaml's type discipline (\sref{sec:map_from_fold}).}
POTTIER Francois's avatar
POTTIER Francois committed
576 577 578 579 580 581 582 583 584

% ppx_tools/genlifter is analogous, with a few differences:
% - it is a command line tool, not as ppx extension;
% - it has only one return type 'res for all methods;
% - the visitor methods receive (as extra arguments)
%   the names of the types, data constructors, and record fields
%   that are being visited.
% - and there are fewer visitor methods, basically one per type,
%   plus one primitive type, plus this#record, this#constr.
585

POTTIER Francois's avatar
POTTIER Francois committed
586 587 588
% TEMPORARY show how to do a fold in the presence of primitive types, e.g., list
% writing ancestors = ["VisitorsRuntime.map"] may be necessary

POTTIER Francois's avatar
POTTIER Francois committed
589 590
% ------------------------------------------------------------------------------

POTTIER Francois's avatar
POTTIER Francois committed
591
\begin{figure}[p]
POTTIER Francois's avatar
Doc.  
POTTIER Francois committed
592 593 594
\orig{expr02}
\vspace{-\baselineskip}
\processed{expr02}
POTTIER Francois's avatar
POTTIER Francois committed
595
\caption{A visitor of the \itertwo variety}
POTTIER Francois's avatar
Doc.  
POTTIER Francois committed
596 597 598
\label{fig:expr02}
\end{figure}

POTTIER Francois's avatar
POTTIER Francois committed
599
\begin{comment}
POTTIER Francois's avatar
POTTIER Francois committed
600
\begin{figure}[p]
POTTIER Francois's avatar
Doc.  
POTTIER Francois committed
601 602 603
\orig{expr03}
\vspace{-\baselineskip}
\processed{expr03}
POTTIER Francois's avatar
POTTIER Francois committed
604
\caption{A visitor of the \maptwo variety}
POTTIER Francois's avatar
Doc.  
POTTIER Francois committed
605 606
\label{fig:expr03}
\end{figure}
POTTIER Francois's avatar
POTTIER Francois committed
607
\end{comment}
POTTIER Francois's avatar
Doc.  
POTTIER Francois committed
608

POTTIER Francois's avatar
POTTIER Francois committed
609
\subsection{Visitors of arity two}
POTTIER Francois's avatar
POTTIER Francois committed
610 611
\label{sec:intro:aritytwo}

POTTIER Francois's avatar
POTTIER Francois committed
612 613 614 615
The visitors introduced so far traverse one tree at a time. There are
situations where one wishes to simultaneously traverse two trees, which one
expects have the same structure. For this purpose, one can use a visitor of
arity 2. Every variety except \mapendo is available at arity 2.
POTTIER Francois's avatar
POTTIER Francois committed
616

POTTIER Francois's avatar
POTTIER Francois committed
617 618 619 620 621
As an illustration, \fref{fig:expr02} shows an \itertwo visitor. There, the
method \tyconvisitor{expr} expects an environment and two expressions. These
expressions must have identical structure: indeed, if \tyconvisitor{expr}
finds that they exhibit different tags at the root, say \oc|EConst| versus
\oc|EAdd|, then it invokes the method \tyconfail{expr}, whose default
POTTIER Francois's avatar
POTTIER Francois committed
622 623
implementation calls the function \runtime{fail}. This function
throws the exception \runtime{StructuralMismatch}.
POTTIER Francois's avatar
POTTIER Francois committed
624

POTTIER Francois's avatar
POTTIER Francois committed
625 626 627 628 629 630 631 632 633
In \fref{fig:expr02}, we have added the optional parameter
%
\oc|concrete = true|
%
to indicate that the generated class should not be virtual. (By default,
every generated class is declared \oc|virtual|.) We do this because, in
the illustration that follows, we wish to instantiate this class.

\begin{figure}[p]
POTTIER Francois's avatar
POTTIER Francois committed
634
\codefollowup{expr02}
POTTIER Francois's avatar
POTTIER Francois committed
635
\origfirstline{expr05}{3}
POTTIER Francois's avatar
POTTIER Francois committed
636 637 638 639
\caption{Determining whether two expressions are syntactically equal}
\label{fig:expr05}
\end{figure}

640 641 642 643 644 645 646
\begin{figure}[p]
\codefollowup{expr02}
\origfirstline{expr05lexico}{3}
\caption{Determining which way two expressions are ordered}
\label{fig:expr05lexico}
\end{figure}

POTTIER Francois's avatar
POTTIER Francois committed
647 648 649 650
As an illustration, in \fref{fig:expr05}, we use an \itertwo visitor to write
a function that tests whether two expressions are syntactically equal. This is
just a matter of performing a synchronous traversal of the two expressions and
detecting a \oc|StructuralMismatch| exception: if this exception is raised,
POTTIER Francois's avatar
POTTIER Francois committed
651 652
one must return \oc|false|, otherwise one must return \oc|true|. We rely on
the fact that the method \tyconvisitor{int}, which is inherited from the class
POTTIER Francois's avatar
Typo.  
POTTIER Francois committed
653
\runtime{iter2}, fails when its two integer arguments are
POTTIER Francois's avatar
POTTIER Francois committed
654 655
unequal.

POTTIER Francois's avatar
POTTIER Francois committed
656 657
The convenience functions \runtime{wrap} and
\runtime{wrap2} run a user-supplied function (of arity 1 or 2,
POTTIER Francois's avatar
POTTIER Francois committed
658 659
respectively) within an exception handler and return a Boolean result, which
is \oc|true| if no exception was raised. Here, we run the function
POTTIER Francois's avatar
POTTIER Francois committed
660 661
%
\oc|new iter2 # visit_expr ()|, whose type is \oc|expr -> expr -> unit|,
POTTIER Francois's avatar
POTTIER Francois committed
662
in the scope of such a handler.
POTTIER Francois's avatar
POTTIER Francois committed
663 664

Naturally, to test whether two expressions are syntactically equal, one could
POTTIER Francois's avatar
POTTIER Francois committed
665
also use the primitive equality operator \oc|=|. Alternatively, one could
POTTIER Francois's avatar
POTTIER Francois committed
666 667
exploit \ppxderiving and annotate the type definition with
%
POTTIER Francois's avatar
POTTIER Francois committed
668
\oc|[@@deriving eq]|. Visitors offer greater flexibility: for instance, if our
POTTIER Francois's avatar
POTTIER Francois committed
669 670
arithmetic expressions contained variables and binders, we could easily define
an operation that tests whether two expressions are $\alpha$-equivalent.
POTTIER Francois's avatar
POTTIER Francois committed
671

672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694
As a second illustration, in \fref{fig:expr05lexico}, we use an \itertwo
visitor to write a lexicographic ordering function for expressions. Again,
this involves a synchronous traversal of the two expressions. When a mismatch
is detected, however, one must not raise a \oc|StructuralMismatch| exception:
instead, one must raise an exception that carries one bit of information,
namely, which of the two expressions is strictly ``smaller'' in the ordering.
The exception \oc|Different| is introduced for this purpose; it carries an
integer return code, which by convention is either -1 or +1. Two visitor methods
must be overridden, namely \tyconvisitor{int}, which is in charge of detecting
a mismatch between two integer constants, and \tyconfail{expr}, which is invoked
when a mismatch between (the head constructors of) two expressions is detected.
The auxiliary function \oc|tag| returns an integer code for the head constructor
of an expression. It must be hand-written. One could in principle write such a
function once and for all by using the undocumented operations in OCaml's
\href{https://caml.inria.fr/pub/docs/manual-ocaml/libref/Obj.html}{\oc|Obj|}
module, but that is discouraged.

% Il faut au minimum distinguer les constructeurs constants des autres,
% en utilisant is_int. Il y a peut-être d'autres pièges. De plus, on ne
% peut pas (je crois?) garantir que l'ordre des constructeurs sera bien
% celui de la déclaration de types, parce que les constructeurs constants
% et les autres sont codés indépendamment.

POTTIER Francois's avatar
Doc.  
POTTIER Francois committed
695 696
% ------------------------------------------------------------------------------

697
\begin{figure}[t]
POTTIER Francois's avatar
POTTIER Francois committed
698 699 700 701 702 703
\orig{expr06}
\caption{Visitors for a family of types}
\label{fig:expr06}
\end{figure}

\subsection{Visitors for a family of types}
POTTIER Francois's avatar
POTTIER Francois committed
704
\label{sec:intro:family}
POTTIER Francois's avatar
POTTIER Francois committed
705 706 707 708 709 710 711 712 713 714 715

Visitors can be generated not just for one type definition, but for a family
of type definitions. In \fref{fig:expr06}, we propose a definition of
arithmetic expressions that involves three algebraic data types, namely
\oc|unop|, \oc|binop|, and \oc|expr|. We request the generation of two
visitors, namely an \iter visitor and a \map visitor. This causes the
generation of just two classes, named \iter and \map, respectively. Each of
these classes has visitor methods for every type (namely \tyconvisitor{unop},
\tyconvisitor{binop}, \tyconvisitor{expr}) and for every data constructor
(namely \dataconvisitor{UnaryMinus}, \dataconvisitor{BinaryMinus}, and so on).

716 717 718 719 720
Note that, for the \derivingvisitors annotation to apply to the entire family,
as opposed to just the type \oc|expr|, the types \oc|unop|, \oc|binop|, and
\oc|expr| in \fref{fig:expr06} are declared simultaneously: that is, their
declarations are separated with the keyword \oc|and|.

POTTIER Francois's avatar
POTTIER Francois committed
721 722
% ------------------------------------------------------------------------------

723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764
\subsection{Visitors for parameterized types}
\label{sec:intro:parameterized}

% ------------------------------------------------------------------------------

Visitors can be generated for parameterized types, too. However, there are two
ways in which this can be done. Here is why.

To visit a data type where some type variable~\oc|'a| occurs, one must know
how to visit a value of type~\oc|'a|.
%
% That is not quite true, in reality: perhaps there is no component of
% type~\oc|'a|, perhaps because \oc|'a| is a phantom type parameter or a GADT index,
% or perhaps because \oc|'a| occurs only under a type constructor that performs
% special treatment and does not recursively descend its own components.
%
There are two ways in which this information can be provided. One way is to
assume that there is a \emph{virtual visitor method} \oc|visit_'a| in charge
of visiting a value of type~\oc|'a|. Another way is to pass a \emph{visitor
  function} \oc|visit_'a| as an argument to every visitor method.

% J'allais écrire que dans un cadre non typé, ces deux approches ont la
% même expressivité. Mais c'est faux: seule la seconde approche permet
% de gérer les types de données non réguliers (qui existent aussi dans
% un cadre non typé, même s'ils ne sont pas explicitement décrits par
% une déclaration).

These two approaches differ in their expressive power. The
virtual-visitor-method approach implies that the visitor methods must have
monomorphic types: roughly speaking, the type variable~\oc|'a|
% (or variants of it)
appears free in the type of every visitor method.
The visitor-function approach implies that the visitor methods can have
polymorphic types: roughly speaking, each method independently
can be polymorphic in \oc|'a|.
For this reason, we refer to these two approaches as the \emph{monomorphic}
approach and the \emph{polymorphic} approach, respectively.

The monomorphic approach offers the advantage that the type of every method is
inferred by OCaml. Indeed, in this mode, the generated code need not contain
any type annotations. This allows correct, most general (monomorphic) types to
be obtained even in the case where certain hand-written visitor methods
765
(provided via the \ancestors parameter) have unconventional types.
766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784
%
% The downside of this approach is that it does not allow taking several
% distinct instances of a parameterized type. In particular, it is restricted
% to regular algebraic data types (\sref{sec:regularity}).

The polymorphic approach offers the advantage that visitor methods can receive
polymorphic types. If the type \oc|container| is parameterized with a type
variable~\oc|'a|, then the method \tyconvisitor{container} can be assigned a
type that is universally quantified in~\oc|'a|, of the following form:
%
\begin{mdframed}[backgroundcolor=green!10]
\begin{lstlisting}
method visit_container:
  'env 'a .
  ('env -> 'a -> ...) ->
  'env -> 'a container -> ...
\end{lstlisting}
\end{mdframed}
%
785 786
The types of the \tyconvisitor{list} methods, shown later on
(\sref{sec:intro:nonlocal}, \fref{fig:convention}), follow this pattern.
787 788 789 790 791 792 793
%
Because \tyconvisitor{container} is polymorphic, taking multiple instances of
the type \oc|container|, such as \oc|apple| \oc|container| and \oc|orange|
\oc|container|, and attempting to generate visitor methods for these types,
poses no difficulty. This works even if the definition of \oc|'a container|
mentions other instances of this type, such as \oc|('a * 'a)| \oc|container|.
In other words, in the polymorphic approach, irregular algebraic data types
794
(\sref{sec:regularity}) are supported.
795 796 797

One downside of the polymorphic approach is that, because polymorphic types
cannot be inferred by OCaml, the \visitors syntax extension must generate
798 799
polymorphic type annotations. Therefore, it must be able to predict
the type of every visitor method.
800 801 802 803
%
% Actually, not the full type, but at least the polymorphic skeleton.
%
This requires that any visitor methods inherited via \ancestors adhere to a
804
certain convention (\refconvention).
805 806 807 808

In summary, both the monomorphic approach and the polymorphic approach are
currently supported (\sref{sec:intro:parameterized:mono},
\sref{sec:intro:parameterized:poly}). The parameter \polymorphic allows
809 810 811 812 813 814 815 816 817 818 819 820
choosing between them. As a rule of thumb, we suggest setting
%
\oc|polymorphic = true|, as this produces visitors that compose better.

% TEMPORARY give an explanation why monomorphic mode can be useful?

% In my ICFP paper, I believe monomorphic mode is necessary to deal with ['bn]
% and ['env]. Indeed, we cannot expect [visit_abs] to be polymorphic in ['bn]
% and ['env], as it needs to be told how to [extend] an environment with a
% bound name.

% Other examples?
821 822 823

% ------------------------------------------------------------------------------

POTTIER Francois's avatar
POTTIER Francois committed
824
\begin{figure}[p]
825
\orig{expr_info}
POTTIER Francois's avatar
POTTIER Francois committed
826
\vspace{-\baselineskip}
827
\processed{expr_info}
828
\caption{A ``monomorphic-method'' visitor for a parameterized type of decorated expressions}
829
\label{fig:expr_info}
POTTIER Francois's avatar
POTTIER Francois committed
830 831
\end{figure}

POTTIER Francois's avatar
POTTIER Francois committed
832
\begin{figure}[p]
833 834
\codefollowup{expr_info}
\origfirstline{expr_info_use}{3}
POTTIER Francois's avatar
POTTIER Francois committed
835
\caption{Working with different types of decorations}
836
\label{fig:expr_info_use}
POTTIER Francois's avatar
POTTIER Francois committed
837 838
\end{figure}

839
\subsubsection{Monomorphic visitor methods for parameterized types}
840
\label{sec:intro:parameterized:mono}
POTTIER Francois's avatar
POTTIER Francois committed
841

842 843 844 845 846
We begin with an example of the \emph{monomorphic} mode. This mode can be
explicitly requested by writing \oc|polymorphic = false| as part of the
\derivingvisitors annotation. It is the default mode.

In \fref{fig:expr_info}, we define a variant of arithmetic
POTTIER Francois's avatar
POTTIER Francois committed
847 848
expressions where every tree node is decorated with a value of type
\oc|'info|. We request the generation of a \map visitor, whose code is shown
849
in the second part of \fref{fig:expr_info}. The generated code has exactly the
POTTIER Francois's avatar
POTTIER Francois committed
850
same structure as in the previous sections. The only new feature is that the
POTTIER Francois's avatar
POTTIER Francois committed
851 852
class \map now has a virtual method, \tyconvisitor{'info}. The general rule
is, for each type parameter, there is one virtual method, named after it.
853

POTTIER Francois's avatar
POTTIER Francois committed
854 855
The visitor methods are \emph{not} declared polymorphic in a type
variable~\oc|'info|, or in two type variables \oc|'info1|~and~\oc|'info2|, as
POTTIER Francois's avatar
POTTIER Francois committed
856 857 858 859 860
one might perhaps expect. In fact, they must not be declared polymorphic:
indeed, the user who implements \tyconvisitor{'info} in a subclass of \map may
wish to provide an implementation that expects and/or produces specific types
of information.

861
As a result, a visitor \emph{object} is monomorphic: its method
POTTIER Francois's avatar
POTTIER Francois committed
862 863 864 865 866 867 868
\tyconvisitor{'info} must have type \oc|info1 ->| \oc|info2| for certain
specific types \oc|info1| and \oc|info2|. Fortunately, because it is
parameterized over \oc|'self|,\footnote{We explain in \sref{sec:oo:self} why
  all visitor classes are parameterized over \oc|'self|.} the visitor
\emph{class} is polymorphic: two distinct visitor objects can have distinct
types.

POTTIER Francois's avatar
POTTIER Francois committed
869
Although every \emph{automatically generated} method
POTTIER Francois's avatar
POTTIER Francois committed
870 871
is monomorphic, a visitor class can nevertheless \emph{inherit} polymorphic
methods from a parent class, whose name is specified via the \ancestors parameter
872
(\sref{sec:ancestors}). For instance, the \tyconvisitor{list} methods provided by
POTTIER Francois's avatar
POTTIER Francois committed
873
the classes \runtime{iter}, \runtime{map}, and so on, are
POTTIER Francois's avatar
POTTIER Francois committed
874 875 876
polymorphic in the types of the list elements. (See \sref{sec:intro:nonlocal}
for more information on the treatment of preexisting types.)

877 878
\fref{fig:expr_info_use} presents two example uses of the class \map defined in
\fref{fig:expr_info}. In the first example, we define a function \oc|strip|, of
POTTIER Francois's avatar
POTTIER Francois committed
879 880 881 882
type \oc|'info expr -> unit expr|, which strips off the decorations in an
arithmetic expression, replacing them with unit values. In the second example,
we define a function \oc|number|, of type \oc|'info expr -> int expr|, which
decorates each node in an arithmetic expression with a unique integer number.%
POTTIER Francois's avatar
POTTIER Francois committed
883 884 885
\footnote{Because the \oc|info| field appears before the \oc|node| field in
  the definition of the type \oc|expr|, and because fields are visited
  left-to-right, we get a prefix numbering scheme. By exchanging these fields,
POTTIER Francois's avatar
POTTIER Francois committed
886
  we would get postfix numbering.} %
POTTIER Francois's avatar
POTTIER Francois committed
887 888 889

% ------------------------------------------------------------------------------

890
\subsubsection{Polymorphic visitor methods for parameterized types}
891 892
\label{sec:intro:parameterized:poly}

893 894 895 896 897 898 899 900 901 902 903 904 905 906 907
\begin{figure}[p]
\orig{expr_info_polymorphic}
\vspace{-\baselineskip}
\processed{expr_info_polymorphic}
\caption{A ``polymorphic-method'' visitor for a parameterized type of decorated expressions}
\label{fig:expr_info_polymorphic}
\end{figure}

\begin{figure}[p]
\codefollowup{expr_info_polymorphic}
\origfirstline{expr_info_polymorphic_use}{3}
\caption{Working with different types of decorations}
\label{fig:expr_info_polymorphic_use}
\end{figure}

908 909
We continue with an example of the \emph{polymorphic} mode. This mode can be
explicitly requested by writing \oc|polymorphic = true| as part of the
910 911 912 913
\derivingvisitors annotation. It is available for all varieties of visitors
except \fold and \foldtwo. The reason why it seems difficult to come up with
a satisfactory polymorphic type for \fold visitor methods is explained later
on (\sref{sec:map_from_fold}).
914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978

In \fref{fig:expr_info_polymorphic}, we again have arithmetic expressions
where every tree node is decorated with a value of type \oc|'info|. We again
request a \map visitor but, this time, we specify \oc|polymorphic = true|.
%
In order to make the generated code shorter, we specify \oc|data = false|
(\sref{sec:params}), which causes the methods \dataconvisitor{EConst} and
\dataconvisitor{EAdd} to disappear. (They are inlined at their call sites.)

The generated code is the same as in the previous section,
% except the methods \dataconvisitor{EConst} and \dataconvisitor{EAdd} have
% disappeared, as explained above, and
except that \tyconvisitor{'info} is not a method any more. Instead, it is
a function, which is passed as an argument to every visitor method.

% Because \tyconvisitor{'info} is a function, as opposed to a virtual method,
The class \map does not have any virtual methods. It can thus be declared as
concrete class: to this end, we specify
%
\oc|concrete = true| (\sref{sec:params}). Without this indication, it would be
declared \oc|virtual|.

Because \tyconvisitor{'info} is an argument of every visitor method,
every visitor method can be declared polymorphic in the type variables~\oc|'env|,
\oc|'info_0| and \oc|'info_1|,
where the function \tyconvisitor{'info} has type \oc|'env -> 'info_0 -> 'info_1|.
%
The fact that we would like every method to have a polymorphic type cannot
be inferred by OCaml. For this reason, the generated code contains explicit
polymorphic type annotations.

\fref{fig:expr_info_polymorphic_use} presents two example uses of the class
\map defined in \fref{fig:expr_info_polymorphic}. As in
\fref{fig:expr_info_use}, we define two functions \oc|strip| and \oc|number|.
A striking feature of this code is that a single visitor object~\oc|v| is now
allocated, once and for all, and is used in every subsequent call to \oc|strip|
and \oc|number|. The two \tyconvisitor{'info} functions are also defined once
and for all.%
%
\footnote{Although we have nested these functions inside the definitions of
  \oc|strip| and \oc|number|, they are closed, so they could be hoisted out to
  the top level, if desired.}

In the definition of \oc|number|, we choose to store the current count in a
reference, \oc|count|, and to let \oc|count| play the role of the
``environment''. Thus, we initially pass \oc|count| to \tyconvisitor{expr},
and \tyconvisitor{'info} receives \oc|count| as its ``environment'' argument.

%%

The polymorphic type annotations that are automatically generated
(\fref{fig:expr_info_polymorphic}) follow a certain fixed convention. If the
user throws in hand-written visitor methods via the \ancestors parameter, then
those hand-written methods must adhere to the same convention (or the
generated code would be ill-typed). This convention is illustrated in the next
section (\sref{sec:intro:nonlocal}) with the example of the
\tyconvisitor{list} methods.

A type variable is not allowed to occur under an \oc|[@opaque]| annotation
(\sref{sec:opaque}). Indeed, annotating a type variable~\oc|'a| with
\oc|[@opaque]| would cause special-purpose visit code to be generated, whose
type is not as polymorphic as required by the above convention.

%%

979 980
% We might wish to document that the type variable names \oc|'s| and \oc|'env|
% are reserved.
981

982 983
% ------------------------------------------------------------------------------

984 985
% At the moment, the following feature is considered experimental and
% undocumented:
986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006

\begin{comment}

\subsubsection{Mixing the monomorphic and polymorphic modes}
\label{sec:intro:parameterized:fine}

It is possible to mix the monomorphic and polymorphic modes
(\sref{sec:intro:parameterized:mono}, \sref{sec:intro:parameterized:poly}) in
a single generated visitor class. Suppose we wish to generate a visitor class
for the type \oc|('a, 'b) dictionary|. Suppose, for some reason, that we would
like \tyconvisitor{'a} to be a virtual visitor method and \tyconvisitor{'b} to
be a visitor function, which is passed as an argument to
\tyconvisitor{dictionary}. This can be achieved by declaring
%
\oc|polymorphic = ["'b"]| as part of the \derivingvisitors annotation.

% The user can control whether the visitor methods are polymorphic in \oc|'env|.
% This is done by including (or not including) \oc|'env| in the list.
% (The type variable \oc|'env| is reserved.)

\end{comment}
1007 1008 1009

% ------------------------------------------------------------------------------

POTTIER Francois's avatar
POTTIER Francois committed
1010 1011 1012 1013
\begin{figure}[t]
\orig{expr11}
\vspace{-\baselineskip}
\processed{expr11}
1014
\caption{Using preexisting (parameterized) types, such as \oc|int| and \oc|list|}
POTTIER Francois's avatar
POTTIER Francois committed
1015 1016 1017
\label{fig:expr11}
\end{figure}

1018
\input{convention}
1019

1020
\subsection{Dealing with references to preexisting types}
POTTIER Francois's avatar
POTTIER Francois committed
1021 1022 1023 1024 1025 1026 1027
\label{sec:intro:nonlocal}

A type definition can contain references to the types that are being defined,
also known as \emph{local} types. For instance, in \fref{fig:expr00}, the
definition of \oc|EAdd| contains two references to a local type, namely
\oc|expr|.

1028
A type definition can also contain references to preexisting types, also
POTTIER Francois's avatar
POTTIER Francois committed
1029 1030 1031 1032 1033 1034 1035
known as \emph{nonlocal} types. For instance, in \fref{fig:expr00}, the
definition of \oc|EConst| contains a reference to a nonlocal type, namely
\oc|int|, which happens to be one of OCaml's primitive types. In
\fref{fig:expr11}, the definition of \oc|EAdd| contains a reference to a
parameterized nonlocal type, namely \oc|list|, which happens to be defined in
OCaml's standard library.

1036 1037 1038 1039
The treatment of local types has been illustrated in the previous sections. In
short, for every local type, a visitor method is called, and is defined:
for instance, for the local type \oc|expr|, we define the (concrete) method
\tyconvisitor{expr}.
POTTIER Francois's avatar
POTTIER Francois committed
1040

1041
The treatment of nonlocal types is the same, except the visitor method is not
1042 1043 1044
defined, nor declared. That is, it is called, but is neither defined (as a
concrete method) or declared (as a virtual method). Therefore, its definition
must be provided by an ancestor class.
POTTIER Francois's avatar
POTTIER Francois committed
1045

1046 1047
For most of OCaml primitive or built-in types, support is provided by the
module \oc|VisitorsRuntime|. This module contains several classes named
POTTIER Francois's avatar
POTTIER Francois committed
1048
\oc|iter|, \oc|map|, and so on; each of them supplies methods named
1049 1050
\tyconvisitor{int}, \tyconvisitor{list}, and so on.%
%
POTTIER Francois's avatar
POTTIER Francois committed
1051 1052
\footnote{As an exception to this rule, the classes \runtime{fold}
  and \runtime{fold2} do not supply any methods, because we do not
1053 1054 1055 1056
  wish to prematurely fix the types of the visitor methods. Please consult
  \srcFile{VisitorsRuntime.ml} to check which methods exist and what they
  do.} %
%
POTTIER Francois's avatar
Fix.  
POTTIER Francois committed
1057
As is evident in Figures~\ref{fig:expr00}, \ref{fig:expr01}, and so on, the
1058 1059
generated visitor automatically inherits from the appropriate class in
\oc|VisitorsRuntime|, so it receives default implementations of the methods
1060
\tyconvisitor{int}, \tyconvisitor{list}, and so on.
1061

1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087
The visitor methods for parameterized data types (\oc|array|, \oc|Lazy.t|,
\oc|list|, \oc|option|, \oc|ref|, \oc|result|) are polymorphic, so it is not a
problem if both lists of apples and lists of oranges need to be traversed.
The types of these methods follow a strict \emph{convention},
illustrated in \fref{fig:convention} with the example of \tyconvisitor{list}.

The \tyconvisitor{list} methods in the classes \runtime{iter}, \runtime{map},
and so on, have been hand-written, but could equally well have been generated,
with \oc|polymorphic = true|: their types would be the same.
%
% The code would not be the same. At the moment, we are using List.fold_left
% instead of a natural fold.
%
This illustrates the fact that, with \oc|polymorphic = true|,
\emph{visitors are compositional}.
%
% With \oc|polymorphic = false|, visitors are compositional, too,
% but only for non-parameterized types.
%
That is, if the definition of the type \oc|'b bar| refers to the type \oc|'a foo|,
then one can \emph{separately} generate a visitor class for \oc|'a foo|
and generate a visitor class for \oc|'b bar|, which inherits the previous class.
% There is no need to generate a single visitor class for \oc|'a foo| and
% \oc|'b bar| at once.

At a primitive type, it is advisable to carefully consider what
1088 1089 1090 1091 1092 1093 1094 1095
behavior is desired. On the one hand, perhaps the inherited method
\tyconvisitor{int} need not be invoked in the first place; this behavior can
be obtained by using \oc|(int[@opaque])| instead of \oc|int|. (See
\sref{sec:opaque} for details.) This is done, for instance, in
\fref{fig:expr15}, where one can check that no call to \tyconvisitor{int} is
generated. On the other hand, when one decides to use an inherited method, one
should make sure that one understands its behavior. The methods
\tyconvisitor{array} and \tyconvisitor{ref} in the class
POTTIER Francois's avatar
POTTIER Francois committed
1096
\runtime{map}, for instance, perform a copy of a mutable memory
1097 1098 1099 1100 1101
block: one should be aware that such a copy is taking place. If this behavior
is undesirable, it can be overridden.

It is possible to inherit as many classes as one wishes, beyond those defined
in \oc|VisitorsRuntime|. This is done via the \ancestors parameter
1102 1103
(\sref{sec:ancestors}). It is also possible to \emph{not} inherit any methods
from \oc|VisitorsRuntime|. This is done via the \nude parameter
1104
(\sref{sec:ancestors}).
POTTIER Francois's avatar
POTTIER Francois committed
1105

POTTIER Francois's avatar
POTTIER Francois committed
1106
% ------------------------------------------------------------------------------
1107 1108

\subsection{Generating visitors for preexisting types}
POTTIER Francois's avatar
POTTIER Francois committed
1109
\label{sec:import}
1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147

Because the \derivingvisitors annotation must be attached to the type
definition, it may seem as if it is impossible to generate a visitor for a
type whose definition is out of reach. Suppose, for instance, that the type
\oc|expr| of arithmetic expressions is defined in a module \oc|Expr|, which,
for some reason, we cannot modify. Can we generate a visitor for this type?

Fortunately, the answer is positive. The basic trick, documented in the
\href{https://github.com/whitequark/ppx_deriving/#working-with-existing-types}
{\texttt{ppx\_deriving} \textsc{readme}},
consists in defining a new type \oc|expr| of arithmetic expressions and to
explicitly declare that it is equal to the preexisting type \oc|Expr.expr|,
as follows.

\orig{expr_redef}

As can be seen above, the new definition of \oc|expr| can be annotated with
\derivingvisitors, yielding a visitor for the new type \oc|expr|, which by
definition, is equal to the preexisting type \oc|Expr.expr|. Thus, this
visitor class be used to traverse expressions of type \oc|Expr.expr|.

This approach works, but requires repeating the definition of the type \oc|expr|.
This duplication can be eliminated thanks to the \ppximport syntax extension,
as follows:

\orig{expr_import}

This expands to the code shown previously. (To use this syntax extension,
assuming you are using \ocamlbuild and \ocamlfind, just add the line
\texttt{true: package(ppx\_import)} to your \texttt{\_tags} file.) As icing on
the cake, \ppximport allows decorating the type definition, on the fly, with
new attributes. In the following examples, we replace all occurrences of
\oc|int| with \oc|int[@opaque]| (\sref{sec:opaque}), so as to ensure that the
generated visitor does not invoke the method \tyconvisitor{int}:

\orig{expr_import_opaque}

% ------------------------------------------------------------------------------
POTTIER Francois's avatar
POTTIER Francois committed
1148 1149
% ------------------------------------------------------------------------------

POTTIER Francois's avatar
POTTIER Francois committed
1150
\section{Advanced examples}
POTTIER Francois's avatar
POTTIER Francois committed
1151 1152 1153 1154
\label{sec:advanced}

% ------------------------------------------------------------------------------

POTTIER Francois's avatar
POTTIER Francois committed
1155
\begin{figure}[p]
POTTIER Francois's avatar
POTTIER Francois committed
1156
\orig{expr12}
1157 1158 1159
\vspace{-\baselineskip}
\processed{expr12}
\caption{An open type of arithmetic expressions} % and a visitor for it
POTTIER Francois's avatar
POTTIER Francois committed
1160 1161 1162
\label{fig:expr12}
\end{figure}

POTTIER Francois's avatar
POTTIER Francois committed
1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176
\begin{figure}[p]
\codefollowup{expr12}
\origfirstline{expr13}{3}
\caption{A closed type of arithmetic expressions}
\label{fig:expr13}
\end{figure}

\begin{figure}[p]
\codefollowup{expr12}
\origfirstline{expr08}{3}
\caption{A closed type of hash-consed arithmetic expressions}
\label{fig:expr08}
\end{figure}

POTTIER Francois's avatar
POTTIER Francois committed
1177
\subsection{Visitors for open and closed data types}
1178
\label{sec:advanced:openclosed}
POTTIER Francois's avatar
POTTIER Francois committed
1179 1180 1181 1182 1183 1184 1185 1186

The algebraic data types of arithmetic expressions shown in the previous
section (\sref{sec:intro}) are \emph{closed}. That is, the type \oc|expr|
is recursive: an expression of type \oc|expr| has subexpressions of type
\oc|expr|.

It is often desirable, for greater flexibility, to first define an \emph{open}
type of arithmetic expressions. Such a type, say \oc|oexpr|, is parameterized
POTTIER Francois's avatar
POTTIER Francois committed
1187
over a type variable~\oc|'expr|. It is not recursive: an expression of type
POTTIER Francois's avatar
POTTIER Francois committed
1188 1189 1190
\oc|'expr oexpr| has subexpressions of type \oc|'expr|. It is shown in
\fref{fig:expr12}. Naturally, we may request the generation of visitors for
the type \oc|oexpr|. In \fref{fig:expr12}, we generate a class of \map
POTTIER Francois's avatar
POTTIER Francois committed
1191
visitors, which we name \oc|omap|. (This is an example of using an explicit
1192
\name parameter.) As explained earlier (\sref{sec:intro:parameterized:mono}),
POTTIER Francois's avatar
POTTIER Francois committed
1193 1194
because the type \oc|oexpr| is parameterized over the type variable
\oc|'expr|, the visitor class has a virtual method, \tyconvisitor{'expr}.
1195 1196 1197 1198
%
In this example, we use the monomorphic mode (\sref{sec:intro:parameterized:mono}),
but could just as well use the polymorphic mode (\sref{sec:intro:parameterized:poly}):
this is left as an exercise for the reader.
POTTIER Francois's avatar
POTTIER Francois committed
1199

POTTIER Francois's avatar
POTTIER Francois committed
1200
A closed (recursive) type of expressions, \oc|expr|, can then be defined in
POTTIER Francois's avatar
POTTIER Francois committed
1201 1202
terms of \oc|expr|. This is done in \fref{fig:expr13}. In type-theoretical
terms, one would like to define \oc|expr| as the fixed point of the functor
1203 1204
\oc|oexpr|~\cite{milewski-13}.
%
POTTIER Francois's avatar
POTTIER Francois committed
1205 1206 1207 1208 1209
That is, roughly speaking, one would like to define \oc|type expr = expr oexpr|.
This is not accepted by OCaml, though;%
%
\footnote{It would be accepted by OCaml with the command line switch
  \texttt{-rectypes}, which instructs the typechecker to tolerate
POTTIER Francois's avatar
POTTIER Francois committed
1210
  equirecursive types. However, this is not a good idea, as it causes
POTTIER Francois's avatar
POTTIER Francois committed
1211 1212 1213 1214 1215