 ### Progress on the blog post.

parent e22b9f2e
@@ -398,24 +398,65 @@

state.

Now, how does one construct a DFA for an expression `e`? The answer is simple, really. Consider the infinite graph whose vertices are the nonempty expressions and whose edges are determined by `delta`. The fragment of this graph that is reachable from `e` is guaranteed to be finite, and is exactly the desired automaton.

There are several ways of approaching the construction of this finite graph fragment. I choose to first perform a forward graph traversal during which I discover the vertices that are reachable from `e` and number them from `0` to `n-1`. Once this is done, completing the construction of a data structure of type `dfa` is easy.

```
let dfa (e : regexp) : dfa =
  let module G = struct
    type t = regexp
    let foreach_successor e f =
      Char.foreach (fun a ->
        let e' = delta a e in
        if nonempty e' then f e'
      )
    let foreach_root f =
      if nonempty e then f e
  end in
  let module N = Number.ForHashedType(R)(G) in
  let n, decode = N.n, N.decode in
  let encode e : state option =
    if nonempty e then Some (N.encode e) else None in
  let init = encode e in
  let transition q a = encode (delta a (decode q)) in
  { n; init; decode; transition }
```

In the above code, the module `G` is a description of the graph that I wish to traverse. The functor application `Number.ForHashedType(R)(G)` performs a traversal of this graph and constructs a numbering `N` of its vertices. (The module [Number](https://gitlab.inria.fr/fpottier/fix/blob/master/src/Number.mli) is part of [fix](https://gitlab.inria.fr/fpottier/fix/).)

The module `N` thus obtained contains the number `n` of vertices that have been discovered as well as two functions `encode: regexp -> int` and `decode: int -> regexp` which record the correspondence between vertices and numbers. In other words, these functions convert, both ways, between regular expressions and state numbers. Without any effort, I know, for each automaton state, which regular expression it stands for. Neat!
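
To make the role of such a numbering concrete, here is a minimal hand-written sketch of a forward traversal that numbers the reachable vertices of a graph. It is not the code of `fix`, and it is not necessarily how `Number` works internally. The function `number`, its reliance on `Hashtbl` with polymorphic hashing and structural equality, and the toy graph over integers are all assumptions made for illustration. The graph is described in the same style as the module `G`, by a pair of iteration functions.

```ocaml
(* Illustrative sketch only: a hand-rolled numbering, not the one in fix.
   [number foreach_successor foreach_root] traverses the graph forward
   from its roots and returns [(n, encode, decode)], where [n] is the
   number of vertices discovered.
   Assumption: vertices can be hashed and compared by the polymorphic
   functions used by [Hashtbl], which suits this toy example only. *)
let number (type v)
    (foreach_successor : v -> (v -> unit) -> unit)
    (foreach_root : (v -> unit) -> unit)
  : int * (v -> int) * (int -> v) =
  let table : (v, int) Hashtbl.t = Hashtbl.create 97 in
  let discovered = ref [] in (* vertices, most recently discovered first *)
  let n = ref 0 in
  let rec visit v =
    if not (Hashtbl.mem table v) then begin
      (* A new vertex receives the next available number. *)
      Hashtbl.add table v !n;
      discovered := v :: !discovered;
      incr n;
      foreach_successor v visit
    end
  in
  foreach_root visit;
  (* Reverse the list so that index [i] holds the vertex numbered [i]. *)
  let inverse = Array.of_list (List.rev !discovered) in
  let encode v = Hashtbl.find table v (* raises Not_found if unreachable *)
  and decode i = inverse.(i) in
  (!n, encode, decode)

(* A hypothetical toy graph: vertices are integers, the only root is 1,
   and each vertex x has a single successor, (x * 2) mod 7. *)
let n, encode, decode =
  number
    (fun x yield -> yield ((x * 2) mod 7))
    (fun yield -> yield 1)
```

In this toy graph, the vertices reachable from `1` are `1`, `2`, and `4`, so `n` is `3`, and `decode (encode v)` is `v` for each of them. The graph over all integers is infinite, but only its reachable fragment is ever visited, which is the same reason the traversal from `e` terminates.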