Commit fa61ac2b authored by POTTIER Francois

Progress on the blog post.

parent e22b9f2e
@@ -398,24 +398,65 @@ state.
Now, how does one construct a DFA for an expression `e`?
The answer is simple, really.
Consider the infinite graph whose vertices are the
nonempty expressions and whose edges are determined by `delta`.
The fragment of this graph that is reachable from `e`
is guaranteed to be finite,
and is exactly the desired automaton.
<!-- TEMPORARY can we point to a proof of finiteness? -->
There are several ways of approaching the construction of this finite graph
fragment. I choose to first perform a forward graph traversal during which I
discover the vertices that are reachable from `e` and number them from `0` to
`n-1`. Once this is done, completing the construction of a data structure of
type `dfa` is easy.
```
let dfa (e : regexp) : dfa =
  let module G = struct
    type t = regexp
    let foreach_successor e f =
      Char.foreach (fun a ->
        let e' = delta a e in
        if nonempty e' then
          f e'
      )
    let foreach_root f =
      if nonempty e then
        f e
  end in
  let module N = Number.ForHashedType(R)(G) in
  let n, decode = N.n, N.decode in
  let encode e : state option =
    if nonempty e then Some (N.encode e) else None
  in
  let init = encode e in
  let transition q a = encode (delta a (decode q)) in
  { n; init; decode; transition }
```
In the above code, the module `G` is a description of the graph that I wish to
traverse.
The functor application `Number.ForHashedType(R)(G)` performs a
traversal of this graph and constructs a numbering `N` of its vertices.
(The module
[Number](https://gitlab.inria.fr/fpottier/fix/blob/master/src/Number.mli)
is part of
[fix](https://gitlab.inria.fr/fpottier/fix/).)
The module `N` thus obtained contains the number `n` of vertices that have
been discovered as well as two functions `encode: regexp -> int` and `decode:
int -> regexp` which record the correspondence between vertices and numbers.
In other words, these functions convert, both ways,
between regular expressions and state numbers. Without any effort,
I know, for each automaton state, which regular expression it stands for.
Neat!
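
To make the numbering step less mysterious, here is a minimal sketch of what such a functor could look like. It is not the code of `Number` in fix: the names `GRAPH`, `Numbering` and `Evens` are hypothetical, and the sketch assumes that vertices can be compared and hashed structurally by OCaml's standard `Hashtbl`, whereas the real functor takes an explicit hashed type. The idea is a plain forward traversal: each vertex receives the next free number when it is first discovered, and its successors are explored later.

```ocaml
(* A hypothetical graph description, mirroring the module G above. *)
module type GRAPH = sig
  type t
  val foreach_root : (t -> unit) -> unit
  val foreach_successor : t -> (t -> unit) -> unit
end

(* A hypothetical stand-in for the numbering functor. It assumes that
   structural equality and hashing are adequate for G.t. *)
module Numbering (G : GRAPH) : sig
  val n : int
  val encode : G.t -> int   (* raises Not_found on an unreachable vertex *)
  val decode : int -> G.t
end = struct
  let table : (G.t, int) Hashtbl.t = Hashtbl.create 128
  let vertices : G.t list ref = ref []   (* discovered vertices, newest first *)
  let count = ref 0
  let pending : G.t Queue.t = Queue.create ()

  (* Number a vertex the first time it is seen and schedule it. *)
  let discover v =
    if not (Hashtbl.mem table v) then begin
      Hashtbl.add table v !count;
      incr count;
      vertices := v :: !vertices;
      Queue.add v pending
    end

  (* The traversal itself: runs once, when the functor is applied. *)
  let () =
    G.foreach_root discover;
    while not (Queue.is_empty pending) do
      G.foreach_successor (Queue.pop pending) discover
    done

  let n = !count
  let inverse = Array.of_list (List.rev !vertices)
  let encode v = Hashtbl.find table v
  let decode i = inverse.(i)
end

(* A toy graph over the integers: the root is 0 and the only successor
   of v is (v + 2) mod 10. The integers form an infinite set of vertices,
   but only 0, 2, 4, 6, 8 are reachable. *)
module Evens = struct
  type t = int
  let foreach_root f = f 0
  let foreach_successor v f = f ((v + 2) mod 10)
end

module N = Numbering (Evens)
```

In this toy instance, `N.n` is 5 and `N.decode (N.encode 6)` is `6`. The traversal terminates only because the reachable fragment is finite, which is the same property that the construction of the automaton relies on.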
<!-- TEMPORARY actually, if we are interested only in running the automaton up to
the first match, then a smaller graph suffices: a final state need have no successors.
If we are interested in finding all matches, then this graph is fine. -->
<!-- TEMPORARY -->
<!-- show an example of searching for one word, KMP -->
<!-- and an example of searching for multiple words, Aho-Corasick -->