From 423e6ac23ae99c16ab684c59420ecaa220f30e8d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Franc=CC=A7ois=20Pottier?=
Date: Wed, 28 Nov 2018 08:18:55 +0100
Subject: [PATCH] New introductory section in the blog post.

---
 demos/brz/post.md => misc/attic.md | 22 -------------
 misc/post.md                       | 53 ++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+), 22 deletions(-)
 rename demos/brz/post.md => misc/attic.md (79%)
 create mode 100644 misc/post.md

diff --git a/demos/brz/post.md b/misc/attic.md
similarity index 79%
rename from demos/brz/post.md
rename to misc/attic.md
index 1309606..c2e5451 100644
--- a/demos/brz/post.md
+++ b/misc/attic.md
@@ -1,25 +1,3 @@
-# Been there, done that: from REs to DFAs
-
-
-There are several ways of compiling
-a [regular expression](https://en.wikipedia.org/wiki/Regular_expression) (RE)
-down to a
-[deterministic finite-state automaton](https://en.wikipedia.org/wiki/Deterministic_finite_automaton) (DFA).
-One such way is based on
-[Brzozowski derivatives](https://en.wikipedia.org/wiki/Brzozowski_derivative)
-of regular expressions.
-In this post,
-I describe a concise OCaml implementation of this transformation.
-This is an opportunity to illustrate the use of
-[Fix](https://gitlab.inria.fr/fpottier/fix/),
-a library that offers facilities for
-constructing (recursive) memoized functions
-and for performing least fixed point computations.
-
-The transformation of REs to DFAs is based on the description
-given by Scott Owens, John Reppy and Aaron Turon in the paper
-[Regular-expression derivatives re-examined](https://www.cs.kent.ac.uk/people/staff/sao/documents/jfp09.pdf).
-
 ## Preliminaries
 
 In order to read the following, a tiny bit of vocabulary is required.
diff --git a/misc/post.md b/misc/post.md
new file mode 100644
index 0000000..b1c6459
--- /dev/null
+++ b/misc/post.md
@@ -0,0 +1,53 @@
+# A feeling of déjà vu
+
+There are several ways of compiling
+a [regular expression](https://en.wikipedia.org/wiki/Regular_expression) (RE)
+down to a
+[deterministic finite-state automaton](https://en.wikipedia.org/wiki/Deterministic_finite_automaton) (DFA).
+One such way is based on
+[Brzozowski derivatives](https://en.wikipedia.org/wiki/Brzozowski_derivative)
+of regular expressions.
+In this post,
+I describe a concise OCaml implementation of this transformation.
+This is an opportunity to illustrate the use of
+[Fix](https://gitlab.inria.fr/fpottier/fix/),
+a library that offers facilities for
+constructing (recursive) memoized functions
+and for performing least fixed point computations.
+
+## From REs to DFAs, via Brzozowski derivatives
+
+Suppose `e` denotes a set of words. Then, its **derivative** `delta a e` is
+the set of words obtained by keeping only the words that begin with `a` and by
+crossing out, in each such word, the initial letter `a`. For instance, the
+derivative of the set `{ ace, amid, bar }` with respect to `a` is the set `{
+ce, mid }`.
+
+A regular expression is a syntactic description of a set of words. If the set
+`e` is described by a regular expression, then its derivative `delta a e` is
+also described by a regular expression, which can be effectively computed.
+
+Now, suppose that I am a machine and I am scanning a text, searching for a
+certain pattern. At each point in time, my current **state** of mind is
+described by a regular expression `e`: this expression represents the set of
+words that I am hoping to read, and that I am willing to accept. After I read
+one character, say `a`, my current state **changes** to `delta a e`, because I
+have restricted my attention to the words of `e` that begin with `a`, and I am
+now hoping to recognize the remainder of such a word.
+
+Thus, the idea, in a nutshell, is to **build a deterministic automaton whose
+states are regular expressions and whose transition function is `delta`**.
+
+The main nontrivial aspect of this apparently simple-minded approach is the
+fact that **only a finite number of states arise** when one starts with a
+regular expression `e` and explores its descendants through `delta`. In other
+words, a regular expression `e` only has a finite number of iterated
+derivatives, up to a certain equational theory. Thanks to this property, which
+I won't prove here, the construction terminates, and yields a **finite-state**
+automaton.
+
+
+
+For more details, please consult the paper
+[Regular-expression derivatives re-examined](https://www.cs.kent.ac.uk/people/staff/sao/documents/jfp09.pdf)
+by Scott Owens, John Reppy and Aaron Turon.
-- 
GitLab
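
Note on the patch above (not part of it): the new section describes `delta` only in prose, so a small sketch may help readers of the post. The following is a minimal illustration, assuming a bare-bones `regexp` type; the names `delta_words`, `regexp`, `nullable` and `matches` are hypothetical and are not taken from the post or from the Fix library. It shows the derivative of a finite set of words, the derivative of a regular expression, and a matcher that uses regular expressions as states and `delta` as the transition function.

```ocaml
(* Hypothetical sketch: none of these definitions come from the post or from Fix. *)

(* Derivative of a finite set of words, represented as a list of strings:
   keep the words that begin with [a] and cross out that initial letter. *)
let delta_words (a : char) (words : string list) : string list =
  List.filter_map
    (fun w ->
      if String.length w > 0 && w.[0] = a
      then Some (String.sub w 1 (String.length w - 1))
      else None)
    words

(* A minimal syntax of regular expressions (an assumption of this sketch). *)
type regexp =
  | Empty                   (* the empty set of words *)
  | Epsilon                 (* the set that contains only the empty word *)
  | Char of char            (* a single letter *)
  | Cat of regexp * regexp  (* concatenation *)
  | Alt of regexp * regexp  (* alternation *)
  | Star of regexp          (* iteration *)

(* [nullable e] holds if the empty word belongs to [e]. *)
let rec nullable (e : regexp) : bool =
  match e with
  | Empty | Char _ -> false
  | Epsilon | Star _ -> true
  | Cat (e1, e2) -> nullable e1 && nullable e2
  | Alt (e1, e2) -> nullable e1 || nullable e2

(* [delta a e] is the derivative of [e] with respect to the letter [a]. *)
let rec delta (a : char) (e : regexp) : regexp =
  match e with
  | Empty | Epsilon -> Empty
  | Char b -> if a = b then Epsilon else Empty
  | Alt (e1, e2) -> Alt (delta a e1, delta a e2)
  | Cat (e1, e2) ->
      let left = Cat (delta a e1, e2) in
      if nullable e1 then Alt (left, delta a e2) else left
  | Star e1 -> Cat (delta a e1, e)

(* Run the automaton on the fly: the current state is a regular expression,
   each letter read replaces it with its derivative, and the word is
   accepted if the final state is nullable. *)
let matches (e : regexp) (w : string) : bool =
  nullable (Seq.fold_left (fun state a -> delta a state) e (String.to_seq w))
```

For instance, `delta_words 'a' ["ace"; "amid"; "bar"]` yields `["ce"; "mid"]`, which is the example given in the post. This sketch performs no simplification of the expressions it builds, so it does not by itself ensure that only finitely many distinct states arise; that depends on identifying expressions up to the equational theory mentioned in the post.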