 ### Progress up to [nullable].

parent ecbab0ba
 ... ... @@ -15,6 +15,8 @@ a library that offers facilities for constructing (recursive) memoized functions and for performing least fixed point computations. ## From REs to DFAs, via Brzozowski derivatives Suppose `e` denotes a set of words. Then, its **derivative** `delta a e` is ... ... @@ -65,6 +67,8 @@ The empty regular expression can be viewed as an empty disjunction. [The complete code](http://gitlab.inria.fr/fpottier/fix/demos/brz/Brzozowski.ml) for this demo is also available. ## An alphabet Throughout, I assume that the alphabet is given by a module `Char` whose ... ... @@ -86,6 +90,8 @@ the function `Char.foreach`, which enumerates all characters. As an exercise for the reader, this can be used to define an auxiliary function `exists_char` of type `(Char.t -> bool) -> bool`. ## Regular expressions, hash-consed The syntax of regular expressions (expressions, for short) is naturally ... ... @@ -151,10 +157,85 @@ let skeleton : regexp -> skeleton = HashCons.data ``` The function `make` is where the magic takes place: whenever we wish to construct a new expression, we construct just a skeleton and pass it to The function `make` is where the magic takes place: whenever one wishes to construct a new expression, one constructs just a skeleton and passes it to `make`, which takes care of determining whether this skeleton is already known (in which case an existing integer identity is re-used) or is new (in which case a fresh integer identity is allocated). The function `skeleton`, which converts between an expression and a skeleton in the reverse direction, is just a pair projection. Here are two basic examples of the use of `make`: ``` let epsilon : regexp = make EEpsilon let zero : regexp = make (EDisj []) ``` Still using `make`, one can define several *smart constructors* that build expressions: concatenation `@@`, disjunction, conjunction, iteration, negation. These smart constructors reduce expressions to a normal form with respect to the equational theory EQTH. In particular, `disjunction` flattens nested disjunctions, sorts the disjuncts, and removes duplicate disjuncts, so that two disjunctions that are equal according to EQTH are indeed recognized as equal. ``` let (@@) : regexp -> regexp -> regexp = ... let star : regexp -> regexp = ... let disjunction : regexp list -> regexp = ... let conjunction : regexp list -> regexp = ... let neg : regexp -> regexp = ... ``` ## Nullability An expression is nullable if and only if it accepts the empty word. Determining whether an expression is nullable is a simple matter of writing a recursive function `nullable` of type `regexp -> bool`. I memoize this function, so the nullability of an expression is computed at most once and can be retrieved immediately if requested again. (This is not mandatory, but it is convenient to be able to call `nullable` without worrying about its cost. Another approach, by the way, would be to store nullability information inside each expression.) The module [Memoize](https://gitlab.inria.fr/fpottier/fix/blob/master/src/Memoize.mli), which is part of [Fix](https://gitlab.inria.fr/fpottier/fix/), provides facilities for memoization. To use it, I apply the functor `Memoize.ForHashedType` to a little module `R` (not shown) which equips the type `regexp` with equality, comparison, and hashing functions. (Because expressions carry unique integer identifiers, the definition of `R` is trivial.) This functor application yields a module `M` which offers a memoizing fixed-point combinator `M.fix`. Then, instead of defining `nullable` directly as a recursive function, I define it as an application of `M.fix`, as follows. ``` let nullable : regexp -> bool = let module M = Memoize.ForHashedType(R) in M.fix (fun nullable e -> match skeleton e with | EChar _ -> false | EEpsilon | EStar _ -> true | ECat (e1, e2) -> nullable e1 && nullable e2 | EDisj es -> exists nullable es | EConj es -> forall nullable es | ENeg e -> not (nullable e) ) ```
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!