Commit fa61ac2b in fix, authored Dec 01, 2018 by POTTIER Francois

Progress on the blog post.

parent e22b9f2e

Showing 1 changed file with 50 additions and 9 deletions

misc/post.md (+50 -9)
@@ -398,24 +398,65 @@ state.
Now, how does one construct a DFA for an expression `e`?
The answer is simple, really.
-Consider the infinite graph whose vertices are
+Consider the infinite graph whose vertices are the
nonempty expressions and whose edges are determined by `delta`.
The fragment of this graph that is reachable from `e` is guaranteed to be finite,
and is exactly the desired automaton.
<!-- actually, if we are interested only in running the automaton up to
the first match, then a smaller graph suffices: a final state need have no successors.
If we are interested in finding all matches, then this graph is fine. -->
<!-- TEMPORARY can we point to a proof of finiteness? -->
There are several ways of approaching the construction of this finite graph
-fragment. I choose to first perform a forward graph traversal in which I
-discover the vertices of this graph, number them from `0` to `n-1`, and record
-the bijective correspondence between vertices (that is, expressions) and state
-numbers. Once this is done, completing the construction of a data structure of
+fragment. I choose to first perform a forward graph traversal during which I
+discover the vertices that are reachable from `e` and number them from `0` to
+`n-1`. Once this is done, completing the construction of a data structure of
type `dfa` is easy.
```
let dfa (e : regexp) : dfa =
  let module G = struct
    type t = regexp
    let foreach_successor e f =
      Char.foreach (fun a ->
        let e' = delta a e in
        if nonempty e' then
          f e'
      )
    let foreach_root f =
      if nonempty e then
        f e
  end in
  let module N = Number.ForHashedType(R)(G) in
  let n, decode = N.n, N.decode in
  let encode e : state option =
    if nonempty e then Some (N.encode e) else None
  in
  let init = encode e in
  let transition q a = encode (delta a (decode q)) in
  { n; init; decode; transition }
```
In the above code, the module `G` is a description of the graph that I wish to
traverse. The functor application `Number.ForHashedType(R)(G)` performs a
traversal of this graph and constructs a numbering `N` of its vertices.
(The module [Number](https://gitlab.inria.fr/fpottier/fix/blob/master/src/Number.mli)
is part of [fix](https://gitlab.inria.fr/fpottier/fix/).)
The module `N` thus obtained contains the number `n` of vertices that have
been discovered as well as two functions `encode: regexp -> int` and
`decode: int -> regexp` which record the correspondence between vertices and numbers.
In other words, these functions convert, both ways,
between regular expressions and state numbers. Without any effort,
I know, for each automaton state, which regular expression it stands for.
Neat!
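
To make the idea of a numbering concrete, here is a minimal sketch of how such a
traversal could be written. It is not the code of `Number` in fix: the names
`GRAPH`, `NumberReachable`, and `Toy` are invented for this illustration, and the
sketch assumes that vertices can be hashed and compared structurally, whereas
the code above passes an explicit hashed type `R`.

```ocaml
(* Hypothetical signature: a graph described by its roots and successors. *)
module type GRAPH = sig
  type t
  val foreach_root : (t -> unit) -> unit
  val foreach_successor : t -> (t -> unit) -> unit
end

(* Hypothetical functor: discovers the vertices reachable from the roots
   by a breadth-first traversal and numbers them from 0 to n-1. *)
module NumberReachable (G : GRAPH) : sig
  val n : int
  val encode : G.t -> int   (* raises Not_found on an undiscovered vertex *)
  val decode : int -> G.t
end = struct
  (* Assumption: structural hashing and equality are adequate for G.t. *)
  let table : (G.t, int) Hashtbl.t = Hashtbl.create 16
  let vertices : G.t list ref = ref []   (* discovered vertices, newest first *)
  let count = ref 0
  let queue : G.t Queue.t = Queue.create ()

  (* Assign the next free number to a vertex seen for the first time. *)
  let discover v =
    if not (Hashtbl.mem table v) then begin
      Hashtbl.add table v !count;
      incr count;
      vertices := v :: !vertices;
      Queue.add v queue
    end

  (* The traversal runs once, when the functor is applied. *)
  let () =
    G.foreach_root discover;
    while not (Queue.is_empty queue) do
      G.foreach_successor (Queue.pop queue) discover
    done

  let n = !count
  let inverse = Array.of_list (List.rev !vertices)
  let encode v = Hashtbl.find table v
  let decode i = inverse.(i)
end

(* A toy graph on integers: the root is 1, and the successors of k are
   2k mod 7 and 3k mod 7. The vertex 0 is never reached. *)
module Toy = struct
  type t = int
  let foreach_root yield = yield 1
  let foreach_successor k yield =
    yield (2 * k mod 7);
    yield (3 * k mod 7)
end

module N = NumberReachable (Toy)
```

In this toy example, `N.n` is the number of vertices reachable from `1`, and
`N.encode (N.decode i) = i` holds for every `i` between `0` and `N.n - 1`, which
is the two-way correspondence described above.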
<!-- TEMPORARY actually, if we are interested only in running the automaton up to
the first match, then a smaller graph suffices: a final state need have no successors.
If we are interested in finding all matches, then this graph is fine. -->
<!-- TEMPORARY -->
<!-- show an example of searching for one word, KMP -->
<!-- and an example of searching for multiple words, Aho-Corasick -->