diff --git a/misc/post.md b/misc/post.md index 8dba3836a2f401426f7311d636076c7f77b7e9b1..c5319db4c9b52ef8e9e9c787d7195d3a5184c2e8 100644 --- a/misc/post.md +++ b/misc/post.md @@ -15,6 +15,8 @@ a library that offers facilities for constructing (recursive) memoized functions and for performing least fixed point computations. + + ## From REs to DFAs, via Brzozowski derivatives Suppose `e` denotes a set of words. Then, its **derivative** `delta a e` is @@ -65,6 +67,8 @@ The empty regular expression can be viewed as an empty disjunction. [The complete code](http://gitlab.inria.fr/fpottier/fix/demos/brz/Brzozowski.ml) for this demo is also available. + + ## An alphabet Throughout, I assume that the alphabet is given by a module `Char` whose @@ -86,6 +90,8 @@ the function `Char.foreach`, which enumerates all characters. As an exercise for the reader, this can be used to define an auxiliary function `exists_char` of type `(Char.t -> bool) -> bool`. + + ## Regular expressions, hash-consed The syntax of regular expressions (expressions, for short) is naturally @@ -151,10 +157,85 @@ let skeleton : regexp -> skeleton = HashCons.data ``` -The function `make` is where the magic takes place: whenever we wish to -construct a new expression, we construct just a skeleton and pass it to +The function `make` is where the magic takes place: whenever one wishes to +construct a new expression, one constructs just a skeleton and passes it to `make`, which takes care of determining whether this skeleton is already known (in which case an existing integer identity is re-used) or is new (in which case a fresh integer identity is allocated). The function `skeleton`, which converts between an expression and a skeleton in the reverse direction, is just a pair projection. + +Here are two basic examples of the use of `make`: + +``` +let epsilon : regexp = + make EEpsilon +let zero : regexp = + make (EDisj []) +``` + +Still using `make`, one can define several *smart constructors* that build +expressions: concatenation `@@`, disjunction, conjunction, iteration, +negation. These smart constructors reduce expressions to a normal form with +respect to the equational theory EQTH. In particular, `disjunction` flattens +nested disjunctions, sorts the disjuncts, and removes duplicate disjuncts, so +that two disjunctions that are equal according to EQTH are indeed recognized +as equal. + +``` +let (@@) : regexp -> regexp -> regexp = ... +let star : regexp -> regexp = ... +let disjunction : regexp list -> regexp = ... +let conjunction : regexp list -> regexp = ... +let neg : regexp -> regexp = ... +``` + + + +## Nullability + +An expression is nullable if and only if it accepts the empty word. +Determining whether an expression is nullable is a simple matter of +writing a recursive function `nullable` of type `regexp -> bool`. + +I memoize this function, so the nullability of an expression is computed at +most once and can be retrieved immediately if requested again. (This is not +mandatory, but it is convenient to be able to call `nullable` without worrying +about its cost. Another approach, by the way, would be to store nullability +information inside each expression.) + +The module +[Memoize](https://gitlab.inria.fr/fpottier/fix/blob/master/src/Memoize.mli), +which is part of +[Fix](https://gitlab.inria.fr/fpottier/fix/), +provides facilities for memoization. +To use it, I apply the functor `Memoize.ForHashedType` +to a little module `R` (not shown) which equips the type `regexp` with +equality, comparison, and hashing functions. (Because expressions +carry unique integer identifiers, the definition of `R` is trivial.) +This functor application yields a module `M` which offers +a memoizing fixed-point combinator `M.fix`. + +Then, instead of defining `nullable` directly as a recursive function, +I define it as an application of `M.fix`, as follows. + +``` +let nullable : regexp -> bool = + let module M = Memoize.ForHashedType(R) in + M.fix (fun nullable e -> + match skeleton e with + | EChar _ -> + false + | EEpsilon + | EStar _ -> + true + | ECat (e1, e2) -> + nullable e1 && nullable e2 + | EDisj es -> + exists nullable es + | EConj es -> + forall nullable es + | ENeg e -> + not (nullable e) + ) +```