Nous avons procédé ce jeudi matin 08 avril 2021 à une MAJ de sécurité urgente. Nous sommes passé de la version 13.9.3 à la version 13.9.5 les releases notes correspondantes sont ici: 2.66 KB
Newer Older
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
# A feeling of déjà vu

There are several ways of compiling
a [regular expression]( (RE)
down to a
[deterministic finite-state automaton]( (DFA).
One such way is based on
[Brzozowski derivatives](
of regular expressions.
In this post,
I describe a concise OCaml implementation of this transformation.
This is an opportunity to illustrate the use of
a library that offers facilities for
constructing (recursive) memoized functions
and for performing least fixed point computations.

## From REs to DFAs, via Brzozowski derivatives

Suppose `e` denotes a set of words. Then, its **derivative** `delta a e` is
the set of words obtained by keeping only the words that begin with `a` and by
crossing out, in each such word, the initial letter `a`. For instance, the
derivative of the set `{ ace, amid, bar }` with respect to `a` is the set `{
ce, mid }`.

A regular expression is a syntactic description of a set of words. If the set
`e` is described by a regular expression, then its derivative `delta a e` is
also described by a regular expression, which can be effectively computed.

Now, suppose that I am a machine and I am scanning a text, searching for a
certain pattern. At each point in time, my current **state** of mind is
described by a regular expression `e`: this expression represents the set of
words that I am hoping to read, and that I am willing to accept. After I read
one character, say `a`, my current state **changes** to `delta a e`, because I
have restricted my attention to the words of `e` that begin with `a`, and I am
now hoping to recognize the remainder of such a word.

Thus, the idea, in a nutshell, is to **build a deterministic automaton whose
states are regular expressions and whose transition function is `delta`**.

The main nontrivial aspect of this apparently simple-minded approach is the
fact that **only a finite number of states arise** when one starts with a
regular expression `e` and explores its descendants through `delta`. In other
words, a regular expression `e` only has a finite number of iterated
derivatives, up to a certain equational theory. Thanks to this property, which
I won't prove here, the construction terminates, and yields a **finite-state**
<!-- Cuius rei demonstrationem mirabilem sane detexi hanc marginis exiguitas -->
<!-- non caperet. -->

For more details, please consult the paper
[Regular-expression derivatives re-examined](
by Scott Owens, John Reppy and Aaron Turon.