:Stackless query
Depth-register automaton are automata designed to parse tree-structured data by keeping records of the structure with a very limited memory model: simple register remembering some the depth. A query on a tree-structured data executed without requiring a stack is a stackless query. See our paper for detailed presentation on the subject.
Implementation in C
lang of a given depth-register automaton can be performed in many way.
The simplest possible implementation is by performing a switch table encoding the
transition table, and encoding the content of the registers within fixed variables.
Objectives
The goal of the repository is to compare various implementation of a simple query across a fixed dataset representing various use case. Among query implementation, various stack implementation are compared against a stackless variant.
The choice of the benchmark is a sample of the web obtained through the Common Crawl project. A script downloading and cleaning a segment of the web for the purpose of this benchmark is proposed in the folder ccrawl. The cleaning part remove unvalid markup document (that is, HTML-page with unbalanced parenthesis).
Further dataset will eventually be add in the future.
The query
All queries are not doable in a stackless fashion. The simplest possible
query which the query selecting a node a
who has a b
descendant.
This query can be meaningfully interpret in an HTML page by instantiating
a
and b
with HTML tags.
In the Common Crawl dataset, we will execute the query selecting ... To complete
Models of computations
Depth register automaton has been ... To complete