 ### Progress up to the type [dfa].

parent 691737d0
# A feeling of déjà vu

There are several ways of compiling a [regular expression](https://en.wikipedia.org/wiki/Regular_expression) (RE)
...
@@ -244,9 +245,9 @@ let nullable : regexp -> bool =

## Derivation

-We now reach a key operation: computing the Brzozowski derivative of an expression. If `a` is a character and `e` is an expression, then `delta a e` is the derivative of `e` with respect to `a`.
+It is now time to define a key operation: computing the Brzozowski derivative of an expression. If `a` is a character and `e` is an expression, then `delta a e` is the derivative of `e` with respect to `a`.

Implementing `delta` is a textbook exercise. A key remark, though, is that this function **must** be memoized in order to ensure good complexity. A
...
@@ -354,3 +355,68 @@

expression to a nullable expression in the graph whose vertices are expressions and whose edges are determined by `delta`. What I have just done is exploit the fact that co-accessibility is easily expressed as a least fixed point.

## Constructing a DFA

The tools are now at hand to convert an expression to a deterministic finite-state automaton.

I must first settle on a representation of such an automaton as a data structure in memory. I choose to represent a state as an integer in the range of `0` to `n-1`, where `n` is the number of states. An automaton can then be described as follows:

```
type state =
  int

type dfa = {
  n: int;
  init: state option;
  decode: state -> regexp;
  transition: state -> Char.t -> state option;
}
```

`init` is the initial state. If it is absent, then the automaton rejects every input.

The function `decode` maps every state to the expression that this state accepts. This expression is guaranteed to be nonempty. This state is a final state if and only if this expression is nullable.

The function `transition` maps every state and character to an optional target state.

Now, how does one construct a DFA for an expression `e`? The answer is simple, really.
Consider the infinite graph whose vertices are nonempty expressions and whose edges are determined by `delta`. The fragment of this graph that is reachable from `e` is guaranteed to be finite, and is exactly the desired automaton. There are several ways of approaching the construction of this finite graph fragment. I choose to first perform a forward graph traversal in which I discover the vertices of this graph, number them from `0` to `n-1`, and record the bijective correspondence between vertices (that is, expressions) and state numbers. Once this is done, completing the construction of a data structure of type `dfa` is easy.
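
The forward traversal described above can be sketched as follows. This is only an illustration under stated assumptions, not the code of the commit: the toy `regexp` type, its smart constructors `cat` and `alt`, the structural `nonempty` test, the explicit `alphabet` parameter, and the names `construct` and `accepts` are all hypothetical stand-ins. The post's actual expressions are memoized and its `nonempty` is computed as a least fixed point. The sketch also assumes that `alphabet` contains every character that occurs in `e`, and that the simplifications performed by `cat` and `alt` are enough to keep the reachable fragment finite, which holds for small examples but is not guaranteed in general.

```ocaml
(* Toy expressions; a hypothetical stand-in for the post's [regexp] type. *)
type regexp =
  | Empty                      (* accepts nothing *)
  | Eps                        (* accepts only the empty string *)
  | Chr of char
  | Cat of regexp * regexp
  | Alt of regexp * regexp
  | Star of regexp

(* Smart constructors that perform a few basic simplifications. *)
let cat e1 e2 = match e1, e2 with
  | Empty, _ | _, Empty -> Empty
  | Eps, e | e, Eps -> e
  | _ -> Cat (e1, e2)

let alt e1 e2 = match e1, e2 with
  | Empty, e | e, Empty -> e
  | _ -> if e1 = e2 then e1 else Alt (e1, e2)

let rec nullable = function
  | Empty | Chr _ -> false
  | Eps | Star _ -> true
  | Cat (e1, e2) -> nullable e1 && nullable e2
  | Alt (e1, e2) -> nullable e1 || nullable e2

(* Does the expression accept at least one string? *)
let rec nonempty = function
  | Empty -> false
  | Eps | Chr _ | Star _ -> true
  | Cat (e1, e2) -> nonempty e1 && nonempty e2
  | Alt (e1, e2) -> nonempty e1 || nonempty e2

(* Brzozowski derivative, not memoized in this sketch. *)
let rec delta a = function
  | Empty | Eps -> Empty
  | Chr b -> if a = b then Eps else Empty
  | Cat (e1, e2) ->
      let d = cat (delta a e1) e2 in
      if nullable e1 then alt d (delta a e2) else d
  | Alt (e1, e2) -> alt (delta a e1) (delta a e2)
  | Star e as s -> cat (delta a e) s

type state = int

type dfa = {
  n: int;
  init: state option;
  decode: state -> regexp;
  transition: state -> Char.t -> state option;
}

let construct (alphabet : char list) (e : regexp) : dfa =
  (* Maps each discovered nonempty expression to its state number. *)
  let number : (regexp, state) Hashtbl.t = Hashtbl.create 16 in
  let discovered = ref [] in   (* expressions, most recently numbered first *)
  let n = ref 0 in
  (* Depth-first forward traversal along the edges given by [delta]. *)
  let rec visit e =
    if nonempty e && not (Hashtbl.mem number e) then begin
      Hashtbl.add number e !n;
      incr n;
      discovered := e :: !discovered;
      List.iter (fun a -> visit (delta a e)) alphabet
    end
  in
  visit e;
  (* Inverse mapping: index [s] holds the expression numbered [s]. *)
  let table = Array.of_list (List.rev !discovered) in
  (* Only nonempty reachable expressions are in [number], so an empty
     derivative yields [None], that is, no transition. *)
  let encode e = Hashtbl.find_opt number e in
  { n = !n;
    init = encode e;
    decode = (fun s -> table.(s));
    transition = (fun s a -> encode (delta a table.(s))) }

(* Runs the automaton on a string; the final state must be nullable. *)
let accepts (d : dfa) (s : string) : bool =
  let step q c = Option.bind q (fun q -> d.transition q c) in
  match Seq.fold_left step d.init (String.to_seq s) with
  | Some q -> nullable (d.decode q)
  | None -> false
```

For instance, `construct ['a'; 'b'] (Star (Cat (Chr 'a', Chr 'b')))` discovers two states: state `0` for `(ab)*` itself and state `1` for its derivative with respect to `a`. In this sketch transitions are recomputed on demand through `delta`; tabulating them in an array once the numbering is known would be an equally reasonable choice.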