From 78a947be65d1e554a44cab2adc29b7a67eef3117 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Franc=CC=A7ois=20Pottier?=
Date: Thu, 29 Nov 2018 22:28:13 +0100
Subject: [PATCH] Progress up to [nullable].
---
misc/post.md | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 83 insertions(+), 2 deletions(-)
diff --git a/misc/post.md b/misc/post.md
index 8dba383..c5319db 100644
--- a/misc/post.md
+++ b/misc/post.md
@@ -15,6 +15,8 @@ a library that offers facilities for
constructing (recursive) memoized functions
and for performing least fixed point computations.
+
+
## From REs to DFAs, via Brzozowski derivatives
Suppose `e` denotes a set of words. Then, its **derivative** `delta a e` is
@@ -65,6 +67,8 @@ The empty regular expression can be viewed as an empty disjunction.
[The complete code](http://gitlab.inria.fr/fpottier/fix/demos/brz/Brzozowski.ml)
for this demo is also available.
+
+
## An alphabet
Throughout, I assume that the alphabet is given by a module `Char` whose
@@ -86,6 +90,8 @@ the function `Char.foreach`, which enumerates all characters.
As an exercise for the reader, this can be used to define an auxiliary
function `exists_char` of type `(Char.t -> bool) -> bool`.
+
+
## Regular expressions, hash-consed
The syntax of regular expressions (expressions, for short) is naturally
@@ -151,10 +157,85 @@ let skeleton : regexp -> skeleton =
HashCons.data
```
-The function `make` is where the magic takes place: whenever we wish to
-construct a new expression, we construct just a skeleton and pass it to
+The function `make` is where the magic takes place: whenever one wishes to
+construct a new expression, one constructs just a skeleton and passes it to
`make`, which takes care of determining whether this skeleton is already
known (in which case an existing integer identity is re-used) or is new
(in which case a fresh integer identity is allocated).
The function `skeleton`, which converts between an expression and a
skeleton in the reverse direction, is just a pair projection.
+
+Here are two basic examples of the use of `make`:
+
+```
+let epsilon : regexp =
+ make EEpsilon
+let zero : regexp =
+ make (EDisj [])
+```
+
+Still using `make`, one can define several *smart constructors* that build
+expressions: concatenation `@@`, disjunction, conjunction, iteration,
+negation. These smart constructors reduce expressions to a normal form with
+respect to the equational theory EQTH. In particular, `disjunction` flattens
+nested disjunctions, sorts the disjuncts, and removes duplicate disjuncts, so
+that two disjunctions that are equal according to EQTH are indeed recognized
+as equal.
+
+```
+let (@@) : regexp -> regexp -> regexp = ...
+let star : regexp -> regexp = ...
+let disjunction : regexp list -> regexp = ...
+let conjunction : regexp list -> regexp = ...
+let neg : regexp -> regexp = ...
+```
+
+
+
+## Nullability
+
+An expression is nullable if and only if it accepts the empty word.
+Determining whether an expression is nullable is a simple matter of
+writing a recursive function `nullable` of type `regexp -> bool`.
+
+I memoize this function, so the nullability of an expression is computed at
+most once and can be retrieved immediately if requested again. (This is not
+mandatory, but it is convenient to be able to call `nullable` without worrying
+about its cost. Another approach, by the way, would be to store nullability
+information inside each expression.)
+
+The module
+[Memoize](https://gitlab.inria.fr/fpottier/fix/blob/master/src/Memoize.mli),
+which is part of
+[Fix](https://gitlab.inria.fr/fpottier/fix/),
+provides facilities for memoization.
+To use it, I apply the functor `Memoize.ForHashedType`
+to a little module `R` (not shown) which equips the type `regexp` with
+equality, comparison, and hashing functions. (Because expressions
+carry unique integer identifiers, the definition of `R` is trivial.)
+This functor application yields a module `M` which offers
+a memoizing fixed-point combinator `M.fix`.
+
+Then, instead of defining `nullable` directly as a recursive function,
+I define it as an application of `M.fix`, as follows.
+
+```
+let nullable : regexp -> bool =
+ let module M = Memoize.ForHashedType(R) in
+ M.fix (fun nullable e ->
+ match skeleton e with
+ | EChar _ ->
+ false
+ | EEpsilon
+ | EStar _ ->
+ true
+ | ECat (e1, e2) ->
+ nullable e1 && nullable e2
+ | EDisj es ->
+ exists nullable es
+ | EConj es ->
+ forall nullable es
+ | ENeg e ->
+ not (nullable e)
+ )
+```
--
GitLab