POTTIER Francois / fix: commit 423e6ac2, authored Nov 28, 2018 by POTTIER Francois
New introductory section in the blog post.
Parent: 195c19d7
Showing 2 changed files with 53 additions and 22 deletions:
misc/attic.md (+0, -22)
misc/post.md (+53, -0)
demos/brz/post.md → misc/attic.md
# Been there, done that: from REs to DFAs
<!-- TEMPORARY title -->
There are several ways of compiling
a [regular expression](https://en.wikipedia.org/wiki/Regular_expression) (RE)
down to a
[deterministic finite-state automaton](https://en.wikipedia.org/wiki/Deterministic_finite_automaton) (DFA).
One such way is based on
[Brzozowski derivatives](https://en.wikipedia.org/wiki/Brzozowski_derivative)
of regular expressions.
In this post,
I describe a concise OCaml implementation of this transformation.
This is an opportunity to illustrate the use of
[Fix](https://gitlab.inria.fr/fpottier/fix/),
a library that offers facilities for
constructing (recursive) memoized functions
and for performing least fixed point computations.

The transformation of REs to DFAs is based on the description
given by Scott Owens, John Reppy and Aaron Turon in the paper
[Regular-expression derivatives re-examined](https://www.cs.kent.ac.uk/people/staff/sao/documents/jfp09.pdf).

## Preliminaries
In order to read the following, a tiny bit of vocabulary is required.
...
...
misc/post.md 0 → 100644
# A feeling of déjà vu

There are several ways of compiling
a [regular expression](https://en.wikipedia.org/wiki/Regular_expression) (RE)
down to a
[deterministic finite-state automaton](https://en.wikipedia.org/wiki/Deterministic_finite_automaton) (DFA).
One such way is based on
[Brzozowski derivatives](https://en.wikipedia.org/wiki/Brzozowski_derivative)
of regular expressions.
In this post,
I describe a concise OCaml implementation of this transformation.
This is an opportunity to illustrate the use of
[Fix](https://gitlab.inria.fr/fpottier/fix/),
a library that offers facilities for
constructing (recursive) memoized functions
and for performing least fixed point computations.

## From REs to DFAs, via Brzozowski derivatives

Suppose `e` denotes a set of words. Then, its **derivative** `delta a e` is
the set of words obtained by keeping only the words that begin with `a` and by
crossing out, in each such word, the initial letter `a`. For instance, the
derivative of the set `{ ace, amid, bar }` with respect to `a` is the set
`{ ce, mid }`.

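
The definition above can be tried out directly on a finite set of words. The following is a minimal sketch, assuming that a set of words is represented as a `string list`; the name `delta_words` is hypothetical and does not come from the post or from Fix.

```ocaml
(* Derivative of a FINITE set of words, represented as a list of strings.
   This representation is an assumption made for illustration only;
   the sets described by regular expressions may be infinite. *)
let delta_words (a : char) (words : string list) : string list =
  List.filter_map
    (fun w ->
      (* Keep only the words that begin with [a], minus that first letter. *)
      if String.length w > 0 && w.[0] = a
      then Some (String.sub w 1 (String.length w - 1))
      else None)
    words

let () =
  delta_words 'a' ["ace"; "amid"; "bar"]
  |> String.concat ", "
  |> Printf.printf "{ %s }\n"
```

On the example from the text, this prints `{ ce, mid }`.
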
A regular expression is a syntactic description of a set of words. If the set
`e` is described by a regular expression, then its derivative `delta a e` is
also described by a regular expression, which can be effectively computed.

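
To make "effectively computed" concrete, here is a sketch of the textbook definition of the derivative on a small regular expression datatype. The type `regexp`, its constructor names, and the auxiliary function `nullable` (which tests whether the empty word is accepted) are assumptions chosen for illustration; they are not necessarily the definitions used later in the post.

```ocaml
(* A small regular expression datatype (illustrative names). *)
type regexp =
  | Empty                    (* the empty set of words *)
  | Epsilon                  (* the set containing only the empty word *)
  | Char of char             (* a single letter *)
  | Cat of regexp * regexp   (* concatenation *)
  | Alt of regexp * regexp   (* union *)
  | Star of regexp           (* iteration *)

(* [nullable e] tells whether the empty word belongs to [e]. *)
let rec nullable = function
  | Empty | Char _ -> false
  | Epsilon | Star _ -> true
  | Cat (e1, e2) -> nullable e1 && nullable e2
  | Alt (e1, e2) -> nullable e1 || nullable e2

(* [delta a e] is a regular expression for the derivative of [e]
   with respect to the letter [a]. No simplification is performed. *)
let rec delta a = function
  | Empty | Epsilon -> Empty
  | Char b -> if a = b then Epsilon else Empty
  | Alt (e1, e2) -> Alt (delta a e1, delta a e2)
  | Cat (e1, e2) ->
      let d = Cat (delta a e1, e2) in
      (* If [e1] accepts the empty word, [a] may also be consumed by [e2]. *)
      if nullable e1 then Alt (d, delta a e2) else d
  | Star e as s -> Cat (delta a e, s)
```

For instance, `delta 'a' (Cat (Char 'a', Char 'b'))` evaluates to `Cat (Epsilon, Char 'b')`, which describes the single word `b`. Because this version does not simplify its results, repeated derivatives can grow without bound.
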
Now, suppose that I am a machine and I am scanning a text, searching for a
certain pattern. At each point in time, my current **state** of mind is
described by a regular expression `e`: this expression represents the set of
words that I am hoping to read, and that I am willing to accept. After I read
one character, say `a`, my current state **changes** to `delta a e`, because I
have restricted my attention to the words of `e` that begin with `a`, and I am
now hoping to recognize the remainder of such a word.

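
This "machine" reading can be sketched as a loop that threads a state through the input. The function `run` below is hypothetical: it is parameterized over a transition function `delta` and an acceptance test `nullable`, and it decides whether a whole string is accepted (it does not search inside a longer text). To keep the example self-contained, states are finite lists of words, which is an assumption made for illustration.

```ocaml
(* [run ~delta ~nullable e s] starts in state [e], moves to [delta c e]
   for each character [c] of [s], and finally asks whether the state
   reached accepts the empty word. *)
let run ~delta ~nullable e s =
  nullable (Seq.fold_left (fun e c -> delta c e) e (String.to_seq s))

(* Illustrative instance: a state is a finite list of words. *)
let delta_words a words =
  List.filter_map
    (fun w ->
      if String.length w > 0 && w.[0] = a
      then Some (String.sub w 1 (String.length w - 1))
      else None)
    words

(* A state accepts the empty word if it contains it. *)
let accepts_empty words = List.mem "" words

let () =
  let e = ["ace"; "amid"; "bar"] in
  let accepts = run ~delta:delta_words ~nullable:accepts_empty e in
  Printf.printf "%b %b\n" (accepts "amid") (accepts "am")
```

This prints `true false`: after reading `amid` the state is `{ "" }`, whereas after reading `am` it is `{ id }`, which does not contain the empty word.
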
Thus, the idea, in a nutshell, is to **build a deterministic automaton whose
states are regular expressions and whose transition function is `delta`**.

The main nontrivial aspect of this apparently simple-minded approach is the
fact that **only a finite number of states arise** when one starts with a
regular expression `e` and explores its descendants through `delta`. In other
words, a regular expression `e` only has a finite number of iterated
derivatives, up to a certain equational theory. Thanks to this property, which
I won't prove here, the construction terminates, and yields a **finite-state**
automaton.

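
The exploration of descendants can be sketched as a depth-first traversal that assigns a number to each new expression it meets. The sketch below is not the implementation described in this post and does not use Fix: it relies on a plain `Hashtbl`, and all names (`alt`, `cat`, `dfa`, the shape of `regexp`) are assumptions. The "equational theory" is approximated by smart constructors that keep unions flattened, sorted and duplicate-free and that simplify concatenations with `Empty` and `Epsilon`; expressions are assumed to be built with these constructors.

```ocaml
type regexp =
  | Empty | Epsilon
  | Char of char
  | Cat of regexp * regexp
  | Alt of regexp list   (* flattened, sorted, duplicate-free, length >= 2 *)
  | Star of regexp

(* Union, up to associativity, commutativity and idempotence. *)
let alt e1 e2 =
  let flatten = function Alt es -> es | Empty -> [] | e -> [e] in
  match List.sort_uniq compare (flatten e1 @ flatten e2) with
  | [] -> Empty
  | [e] -> e
  | es -> Alt es

(* Concatenation, simplifying the trivial cases. *)
let cat e1 e2 =
  match e1, e2 with
  | Empty, _ | _, Empty -> Empty
  | Epsilon, e | e, Epsilon -> e
  | _ -> Cat (e1, e2)

let rec nullable = function
  | Empty | Char _ -> false
  | Epsilon | Star _ -> true
  | Cat (e1, e2) -> nullable e1 && nullable e2
  | Alt es -> List.exists nullable es

let rec delta a = function
  | Empty | Epsilon -> Empty
  | Char b -> if a = b then Epsilon else Empty
  | Alt es -> List.fold_left (fun acc e -> alt acc (delta a e)) Empty es
  | Cat (e1, e2) ->
      let d = cat (delta a e1) e2 in
      if nullable e1 then alt d (delta a e2) else d
  | Star e as s -> cat (delta a e) s

(* Explore the states reachable from [e]. Returns the list of states
   (position = state number, state 0 is [e]) and the transitions as an
   association list from (state, letter) to target state. *)
let dfa (alphabet : char list) (e : regexp) =
  let numbers = Hashtbl.create 16 in   (* regexp -> state number *)
  let states = ref [] and transitions = ref [] in
  let rec visit e =
    match Hashtbl.find_opt numbers e with
    | Some n -> n                      (* already discovered *)
    | None ->
        let n = Hashtbl.length numbers in
        Hashtbl.add numbers e n;
        states := e :: !states;
        List.iter
          (fun a ->
            let target = visit (delta a e) in
            transitions := ((n, a), target) :: !transitions)
          alphabet;
        n
  in
  ignore (visit e);
  (List.rev !states, List.rev !transitions)

let () =
  (* The expression (ab)* over the alphabet { a, b }. *)
  let e = Star (cat (Char 'a') (Char 'b')) in
  let states, _ = dfa ['a'; 'b'] e in
  Printf.printf "%d states\n" (List.length states)
```

On `(ab)*`, this sketch finds 3 states: the initial expression (the only accepting state), the expression `b(ab)*`, and the sink `Empty`. A state is accepting when `nullable` holds of its expression.
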
<!-- Cuius rei demonstrationem mirabilem sane detexi hanc marginis exiguitas -->
<!-- non caperet. -->

For more details, please consult the paper
[Regular-expression derivatives re-examined](https://www.cs.kent.ac.uk/people/staff/sao/documents/jfp09.pdf)
by Scott Owens, John Reppy and Aaron Turon.