Skip to content
GitLab
Projects
Groups
Snippets
Help
Loading...
Help
Help
Support
Community forum
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
F
fix
Project overview
Project overview
Details
Activity
Releases
Repository
Repository
Files
Commits
Branches
Tags
Contributors
Graph
Compare
Issues
0
Issues
0
List
Boards
Labels
Service Desk
Milestones
Merge Requests
0
Merge Requests
0
Operations
Operations
Incidents
Analytics
Analytics
Repository
Value Stream
Wiki
Wiki
Snippets
Snippets
Members
Members
Collapse sidebar
Close sidebar
Activity
Graph
Create a new issue
Commits
Issue Boards
Open sidebar
POTTIER Francois
fix
Commits
bcac1c43
Commit
bcac1c43
authored
Nov 28, 2018
by
POTTIER Francois
Browse files
Options
Browse Files
Download
Email Patches
Plain Diff
Some progress on the blog post.
parent
08ee3403
Changes
1
Hide whitespace changes
Inline
Side-by-side
Showing
1 changed file
with
62 additions
and
0 deletions
+62
-0
misc/post.md
misc/post.md
+62
-0
No files found.
misc/post.md
View file @
bcac1c43
...
...
@@ -51,3 +51,65 @@ automaton.
For more details, please consult the paper
[
Regular-expression derivatives re-examined
](
https://www.cs.kent.ac.uk/people/staff/sao/documents/jfp09.pdf
)
by Scott Owens, John Reppy and Aaron Turon.
In particular, Definition 4.1 in that paper
gives a number of equations that must be exploited
when deciding whether two regular expressions are equal.
In the following,
I refer to these equations collectively as
**EQTH**
,
for
*equational theory*
.
[
The complete code
](
http://gitlab.inria.fr/fpottier/fix/demos/brz/Brzozowski.ml
)
for this demo is also available.
## An alphabet
Throughout, I assume that the alphabet is given by a module
`Char`
whose
signature is as follows:
```
Char : sig
type t
val equal: t -> t -> bool
val hash: t -> int
val foreach: (t -> unit) -> unit
val print: t -> string
end
```
The fact that this alphabet is finite is witnessed by the existence of
the function
`Char.foreach`
, which enumerates all characters.
As an exercise for the reader, this can be used to define an auxiliary
function
`exists_char`
of type
`(Char.t -> bool) -> bool`
.
## Regular expressions, hash-consed
In OCaml, the syntax of regular expressions is naturally described by an
algebraic data type
`regexp`
.
A slight twist is that expressions are
[
**hash-consed**
](
https://en.wikipedia.org/wiki/Hash_consing
)
. That is, every
expression is decorated with an integer identifier, and these identifiers are
unique: two expressions are equal if and only if they carry the same
identifier. (This notion of equality takes
**EQTH**
into account.) This allows
efficiently testing whether two expressions are equal. This also allows
building efficient dictionaries whose keys are expressions, or, in other
words, efficient memoized functions of type
`regexp -> ...`
. This is heavily
exploited in the code that follows: the functions
`nullable`
,
`delta`
, and
`nonempty`
are three examples, and there are more.
<!-- `N.encode`, inside `dfa`, is another example. -->
The syntax of regular expressions is the same as in
[
Owens et al.'s paper
](
https://www.cs.kent.ac.uk/people/staff/sao/documents/jfp09.pdf
)
,
except I use n-ary disjunctions and conjunctions. This is dictated by the need
to normalize terms with respect to the equations in Definition 4.1. The
equations state that disjunction and conjunction are associative, commutative,
idempotent, and have a unit (ACIU). In other words, a disjunction must be
viewed as a set of disjuncts, and a conjunction must be viewed as a set of
conjuncts. For this reason, the data constructors
`EDisj`
and
`EConj`
carry a
list of subexpressions. This list is normalized in such a way that, if two
lists are equal as sets, then they are equal as lists, too.
The list carried by
`EDisj`
and
`EConj`
is never a singleton list. It can be
an empty list:
`EDisj []`
is the empty regular expression
`zero`
, while
`EConj
[]`
is the universal regular expression
`one`
.
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
.
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment