Mentions légales du service

Skip to content
Snippets Groups Projects
Charles Paperman's avatar
PAPERMAN Charles authored
7b99c710
History

:Stackless query

Depth-register automaton are automata designed to parse tree-structured data by keeping records of the structure with a very limited memory model: simple register remembering some the depth. A query on a tree-structured data executed without requiring a stack is a stackless query. See our paper for detailed presentation on the subject.

Implementation in C lang of a given depth-register automaton can be performed in many way. The simplest possible implementation is by performing a switch table encoding the transition table, and encoding the content of the registers within fixed variables.

Objectives

The goal of the repository is to compare various implementation of a simple query across a fixed dataset representing various use case. Among query implementation, various stack implementation are compared against a stackless variant.

The choice of the benchmark is a sample of the web obtained through the Common Crawl project. A script downloading and cleaning a segment of the web for the purpose of this benchmark is proposed in the folder ccrawl. The cleaning part remove unvalid markup document (that is, HTML-page with unbalanced parenthesis).

Further dataset will eventually be add in the future.

The query

All queries are not doable in a stackless fashion. The simplest possible query which the query selecting a node a who has a b descendant. This query can be meaningfully interpret in an HTML page by instantiating a and b with HTML tags.

In the Common Crawl dataset, we will execute the query selecting ... To complete

Models of computations

Depth register automaton has been ... To complete