 ### New introductory section in the blog post.

parent 195c19d7
 # Been there, done that: from REs to DFAs There are several ways of compiling a [regular expression](https://en.wikipedia.org/wiki/Regular_expression) (RE) down to a [deterministic finite-state automaton](https://en.wikipedia.org/wiki/Deterministic_finite_automaton) (DFA). One such way is based on [Brzozowski derivatives](https://en.wikipedia.org/wiki/Brzozowski_derivative) of regular expressions. In this post, I describe a concise OCaml implementation of this transformation. This is an opportunity to illustrate the use of [Fix](https://gitlab.inria.fr/fpottier/fix/), a library that offers facilities for constructing (recursive) memoized functions and for performing least fixed point computations. The transformation of REs to DFAs is based on the description given by Scott Owens, John Reppy and Aaron Turon in the paper [Regular-expression derivatives re-examined](https://www.cs.kent.ac.uk/people/staff/sao/documents/jfp09.pdf). ## Preliminaries In order to read the following, a tiny bit of vocabulary is required. ... ...
misc/post.md 0 → 100644
 # A feeling of déjà vu There are several ways of compiling a [regular expression](https://en.wikipedia.org/wiki/Regular_expression) (RE) down to a [deterministic finite-state automaton](https://en.wikipedia.org/wiki/Deterministic_finite_automaton) (DFA). One such way is based on [Brzozowski derivatives](https://en.wikipedia.org/wiki/Brzozowski_derivative) of regular expressions. In this post, I describe a concise OCaml implementation of this transformation. This is an opportunity to illustrate the use of [Fix](https://gitlab.inria.fr/fpottier/fix/), a library that offers facilities for constructing (recursive) memoized functions and for performing least fixed point computations. ## From REs to DFAs, via Brzozowski derivatives Suppose `e` denotes a set of words. Then, its **derivative** `delta a e` is the set of words obtained by keeping only the words that begin with `a` and by crossing out, in each such word, the initial letter `a`. For instance, the derivative of the set `{ ace, amid, bar }` with respect to `a` is the set `{ ce, mid }`. A regular expression is a syntactic description of a set of words. If the set `e` is described by a regular expression, then its derivative `delta a e` is also described by a regular expression, which can be effectively computed. Now, suppose that I am a machine and I am scanning a text, searching for a certain pattern. At each point in time, my current **state** of mind is described by a regular expression `e`: this expression represents the set of words that I am hoping to read, and that I am willing to accept. After I read one character, say `a`, my current state **changes** to `delta a e`, because I have restricted my attention to the words of `e` that begin with `a`, and I am now hoping to recognize the remainder of such a word. Thus, the idea, in a nutshell, is to **build a deterministic automaton whose states are regular expressions and whose transition function is `delta`**. The main nontrivial aspect of this apparently simple-minded approach is the fact that **only a finite number of states arise** when one starts with a regular expression `e` and explores its descendants through `delta`. In other words, a regular expression `e` only has a finite number of iterated derivatives, up to a certain equational theory. Thanks to this property, which I won't prove here, the construction terminates, and yields a **finite-state** automaton. For more details, please consult the paper [Regular-expression derivatives re-examined](https://www.cs.kent.ac.uk/people/staff/sao/documents/jfp09.pdf) by Scott Owens, John Reppy and Aaron Turon.
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!