Commit e1d65d4b authored by POTTIER Francois

Progress up to the type [dfa].

parent 691737d0
# A feeling of déjà vu
<!-- TEMPORARY update title -->
There are several ways of compiling
a [regular expression](https://en.wikipedia.org/wiki/Regular_expression) (RE)
......@@ -244,9 +245,9 @@ let nullable : regexp -> bool =
## Derivation
We now reach a key operation: computing the Brzozowski derivative of
an expression. If `a` is a character and `e` is an expression, then
`delta a e` is the derivative of `e` with respect to `a`.
It is now time to define a key operation: computing the Brzozowski derivative
of an expression. If `a` is a character and `e` is an expression, then `delta
a e` is the derivative of `e` with respect to `a`.
Implementing `delta` is a textbook exercise. A key remark, though, is that
this function **must** be memoized in order to ensure good complexity. A
......@@ -354,3 +355,68 @@ expression to a nullable expression in the graph whose vertices are
expressions and whose edges are determined by `delta`. What I have just done
is exploit the fact that co-accessibility is easily expressed as a least fixed
point.
<!-- TEMPORARY
Accessibility, too, can be expressed as a least fixed point.
However, to do so, one must have access to the predecessors
of each vertex.
-->
<!------------------------------------------------------------------------------>
## Constructing a DFA
The tools are now at hand to convert an expression
to a deterministic finite-state automaton.
I must first settle on a representation of such an automaton as a data
structure in memory. I choose to represent a state as an integer in the range
of `0` to `n-1`, where `n` is the number of states. An automaton can then
be described as follows:
```
type state =
  int

type dfa = {
  n: int;
  init: state option;
  decode: state -> regexp;
  transition: state -> Char.t -> state option;
}
```
`init` is the initial state. If it is absent, then the automaton rejects every
input.
The function `decode` maps every state to the expression that this state
accepts. This expression is guaranteed to be nonempty, that is, to accept at
least one input. A state is final if and only if its expression is nullable.
The function `transition` maps every state and character to an optional target
state.
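
To make the role of each field concrete, here is a minimal sketch of how such an automaton might be run on an input string. The function `accepts` and the hand-built automaton `example` are hypothetical names introduced for illustration. The small `regexp` type and its `nullable` function are simplified stand-ins for the definitions used in this post, included only so that the sketch compiles on its own.

```ocaml
(* Stand-ins (assumptions): a tiny [regexp] type and [nullable] function,
   replacing the real definitions so that this sketch is self-contained. *)
type regexp =
  | Epsilon
  | Chr of char
  | Cat of regexp * regexp
  | Star of regexp

let rec nullable : regexp -> bool = function
  | Epsilon | Star _ -> true
  | Chr _ -> false
  | Cat (e1, e2) -> nullable e1 && nullable e2

type state =
  int

type dfa = {
  n: int;
  init: state option;
  decode: state -> regexp;
  transition: state -> Char.t -> state option;
}

(* [accepts dfa input] (hypothetical helper) runs [dfa] on [input]. *)
let accepts (dfa : dfa) (input : string) : bool =
  let rec run (s : state) (i : int) : bool =
    if i = String.length input then
      (* End of input: [s] is final iff its expression is nullable. *)
      nullable (dfa.decode s)
    else
      match dfa.transition s input.[i] with
      | None -> false              (* no target state: reject *)
      | Some s' -> run s' (i + 1)
  in
  match dfa.init with
  | None -> false                  (* no initial state: reject everything *)
  | Some s -> run s 0

(* A hand-built two-state automaton for the expression a*b. *)
let example : dfa = {
  n = 2;
  init = Some 0;
  decode = (function
    | 0 -> Cat (Star (Chr 'a'), Chr 'b')
    | _ -> Epsilon);
  transition = (fun s c ->
    match s, c with
    | 0, 'a' -> Some 0
    | 0, 'b' -> Some 1
    | _, _ -> None);
}
```

In this sketch, `accepts example "aab"` holds, whereas `accepts example "aba"` does not, because state `1` has no outgoing transition.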
Now, how does one construct a DFA for an expression `e`?
The answer is simple, really.
Consider the infinite graph whose vertices are
nonempty expressions and whose edges are determined by `delta`.
The fragment of this graph that is reachable from `e`
is guaranteed to be finite,
and is exactly the desired automaton.
<!-- TEMPORARY can we point to a proof of finiteness? -->
There are several ways of approaching the construction of this finite graph
fragment. I choose to first perform a forward graph traversal in which I
discover the vertices of this graph, number them from `0` to `n-1`, and record
the bijective correspondence between vertices (that is, expressions) and state
numbers. Once this is done, completing the construction of a data structure of
type `dfa` is easy.
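
The discovery phase described above can be sketched as follows. This is only an illustration under several assumptions: the function name `number` and the labels `alphabet` and `nonempty` are hypothetical, vertices are compared and hashed structurally via the standard `Hashtbl` module, and the alphabet is passed as an explicit list of characters. The sketch is polymorphic in the type of vertices, so it can be tried on a toy notion of "expression": here, `Some w` stands for the language containing the single word `w`, and `None` stands for the empty language.

```ocaml
(* [number ~alphabet ~nonempty ~delta e] performs a breadth-first traversal
   of the graph reachable from [e], skipping empty vertices. It returns the
   number of states, a table mapping vertices to state numbers, and an
   array mapping state numbers back to vertices. *)
let number
    ~(alphabet : char list)
    ~(nonempty : 'e -> bool)
    ~(delta : char -> 'e -> 'e)
    (e : 'e)
  : int * ('e, int) Hashtbl.t * 'e array =
  let encode = Hashtbl.create 16 in
  let discovered = ref [] in       (* vertices in reverse discovery order *)
  let n = ref 0 in
  let queue = Queue.create () in
  let discover e =
    if nonempty e && not (Hashtbl.mem encode e) then begin
      Hashtbl.add encode e !n;     (* assign the next free state number *)
      discovered := e :: !discovered;
      incr n;
      Queue.add e queue
    end
  in
  discover e;
  while not (Queue.is_empty queue) do
    let e = Queue.pop queue in
    (* Follow one edge per character of the alphabet. *)
    List.iter (fun a -> discover (delta a e)) alphabet
  done;
  (!n, encode, Array.of_list (List.rev !discovered))

(* Toy derivative (assumption): [Some w] denotes the language {w}. *)
let toy_delta (a : char) (e : string option) : string option =
  match e with
  | Some w when String.length w > 0 && w.[0] = a ->
      Some (String.sub w 1 (String.length w - 1))
  | _ -> None

let (n, encode, decode) =
  number ~alphabet:['a'; 'b'] ~nonempty:Option.is_some ~delta:toy_delta
    (Some "ab")
```

On this toy input, the traversal discovers three vertices, `Some "ab"`, `Some "b"`, and `Some ""`, and numbers them `0`, `1`, and `2` in that order. With real expressions, termination relies on the finiteness of the reachable fragment.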
<!-- TEMPORARY tabulating the transition function:
we could choose not to tabulate it,
but `delta` would then be invoked at automaton runtime,
every time a transition is taken.
By tabulating this function,
taking a transition at runtime becomes a simple matter
of doing two table lookups. -->