# A feeling of déjà vu
There are several ways of compiling
a [regular expression](https://en.wikipedia.org/wiki/Regular_expression) (RE)
down to a
[deterministic finite-state automaton](https://en.wikipedia.org/wiki/Deterministic_finite_automaton) (DFA).
One such way is based on
[Brzozowski derivatives](https://en.wikipedia.org/wiki/Brzozowski_derivative)
of regular expressions.
In this post,
I describe a concise OCaml implementation of this transformation.
This is an opportunity to illustrate the use of
[Fix](https://gitlab.inria.fr/fpottier/fix/),
a library that offers facilities for
constructing (recursive) memoized functions
and for performing least fixed point computations.
## From REs to DFAs, via Brzozowski derivatives
Suppose `e` denotes a set of words. Then, its **derivative** `delta a e` is
the set of words obtained by keeping only the words that begin with `a` and by
crossing out, in each such word, the initial letter `a`. For instance, the
derivative of the set `{ ace, amid, bar }` with respect to `a` is the set `{
ce, mid }`.
A regular expression is a syntactic description of a set of words. If the set
`e` is described by a regular expression, then its derivative `delta a e` is
also described by a regular expression, which can be effectively computed.
Now, suppose that I am a machine and I am scanning a text, searching for a
certain pattern. At each point in time, my current **state** of mind is
described by a regular expression `e`: this expression represents the set of
words that I am hoping to read, and that I am willing to accept. After I read
one character, say `a`, my current state **changes** to `delta a e`, because I
have restricted my attention to the words of `e` that begin with `a`, and I am
now hoping to recognize the remainder of such a word.
Thus, the idea, in a nutshell, is to **build a deterministic automaton whose
states are regular expressions and whose transition function is `delta`**.
The main nontrivial aspect of this apparently simple-minded approach is the
fact that **only a finite number of states arise** when one starts with a
regular expression `e` and explores its descendants through `delta`. In other
words, a regular expression `e` only has a finite number of iterated
derivatives, up to a certain equational theory. Thanks to this property, which
I won't prove here, the construction terminates, and yields a **finite-state**
automaton.
For more details, please consult the paper
[Regular-expression derivatives re-examined](https://www.cs.kent.ac.uk/people/staff/sao/documents/jfp09.pdf)
by Scott Owens, John Reppy and Aaron Turon.