Commit 33ec6429 authored by POTTIER Francois's avatar POTTIER Francois

Progress up to `nonempty`.

parent c0390942
......@@ -10,7 +10,7 @@ of regular expressions.
In this post,
I describe a concise OCaml implementation of this transformation.
This is an opportunity to illustrate the use of
[Fix](https://gitlab.inria.fr/fpottier/fix/),
[fix](https://gitlab.inria.fr/fpottier/fix/),
a library that offers facilities for
constructing (recursive) memoized functions
and for performing least fixed point computations.
......@@ -136,7 +136,7 @@ and `nonempty` are three examples, and there are more.
The module
[HashCons](https://gitlab.inria.fr/fpottier/fix/blob/master/src/HashCons.mli),
which is part of
[Fix](https://gitlab.inria.fr/fpottier/fix/),
[fix](https://gitlab.inria.fr/fpottier/fix/),
provides facilities for hash-consing.
(The reader is encouraged to take a look at
[its implementation:](https://gitlab.inria.fr/fpottier/fix/blob/master/src/HashCons.ml)
......@@ -207,7 +207,7 @@ information inside each expression.)
The module
[Memoize](https://gitlab.inria.fr/fpottier/fix/blob/master/src/Memoize.mli),
which is part of
[Fix](https://gitlab.inria.fr/fpottier/fix/),
[fix](https://gitlab.inria.fr/fpottier/fix/),
provides facilities for memoization.
To use it, I apply the functor `Memoize.ForHashedType`
to a little module `R` (not shown) which equips the type `regexp` with
......@@ -284,3 +284,73 @@ let delta : Char.t -> regexp -> regexp =
)
)
```
<!------------------------------------------------------------------------------>
## (Non)Emptiness
During the construction of the DFA (which comes up next), it is convenient to
be able to (efficiently) decide whether an expression is nonempty (that is,
whether it accepts at least one word).
Following
[Owens et al.](https://www.cs.kent.ac.uk/people/staff/sao/documents/jfp09.pdf),
I have included all three Boolean operators (disjunction, conjunction,
negation) in the syntax of expressions. The presence of conjunction makes it
apparently nontrivial to determine whether an expression is nonempty.
Fortunately, by exploiting `nullable`, `delta`, and with the help of
[fix](https://gitlab.inria.fr/fpottier/fix/), I am able to define
an efficient nonemptiness test in four lines of code:
```
let nonempty : regexp -> bool =
let module F = Fix.ForHashedType(R)(Prop.Boolean) in
F.lfp (fun e nonempty ->
nullable e || exists_char (fun a -> nonempty (delta a e))
)
```
This test is based on the following fact: `e` is nonempty if and only if
either `e` is nullable or there exists a character `a` such that `delta a e`
is nonempty.
This statement, however, cannot be used directly as a recursive definition of
`nonempty`, because such a definition would not be well-founded. An attempt to
define `nonempty` using `let rec` or `M.fix` would construct a possibly
nonterminating function.
Instead, the above statement must be viewed as **an inductive
characterization** of nonemptiness. That is, `nonempty` is the **least fixed
point** in the Boolean lattice (where `false` is less than `true`) of the
equation:
```
nonempty e = nullable e || exists_char (fun a -> nonempty (delta a e))
```
The main module of the
[fix](https://gitlab.inria.fr/fpottier/fix/)
library, named
[Fix](https://gitlab.inria.fr/fpottier/fix/src/Core.mli),
is an algorithm for computing a least fixed point of type
`variable -> property`,
where the type `property` forms a lattice.
The functor application `Fix.ForHashedType(R)(Prop.Boolean)`
instantiates this algorithm so that the type `variable` is `regexp`
and the type `property` is `bool`.
Then, the fixed point combinator `F.lfp` allows us to give a
straightforward definition of `nonempty`.
The function `nonempty` thus defined is incremental. Nothing is computed when
the function is constructed; computation takes place only when the function is
invoked. It is also memoized: the nonemptiness of an expression, once
computed, is recorded and is never recomputed. Thus, one can invoke this
function without worrying about its cost.
The reader might note that nonemptiness could also be computed via a graph
traversal: an expression is nonempty if only there is exists a path from this
expression to a nullable expression in the graph whose vertices are
expressions and whose edges are determined by `delta`. What I have just done
is exploit the fact that co-accessibility is easily expressed as a least fixed
point.
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment