pyBᴇᴀᴛLᴇx

This is a Python implementation of the BᴇᴀᴛLᴇx time series summarization algorithm[¹]. It mostly follows the associated paper, with a couple of hints taken from the authors' MATLAB code. The most computationally intensive part of the algorithm, the repeated evaluation of a modified Dynamic Time Warping (DTW) distance, is implemented as a C extension module.

Installation

Compilation of this package has been tested with Python 3.5, 3.6 and 3.7. Run python setup.py build to compile it, then either add the resulting ./build/lib.*/ directory to your PYTHONPATH or run python setup.py install [--user] to install it.

Usage

The algorithm is implemented in the beatlex.BeatLex class. The following arguments can be passed upon instantiation:

s_max            [int]: Maximum term size to consider adding to the vocabulary
k_max            [int]: Maximum vocabulary size to learn
dtw_w            [int]: Width of the Sakoe-Chiba band for DTW computation
new_vocab_thresh [float]: New vocabulary term threshold. Lower = more words; higher = segments are matched to
                        existing words more leniently
progbar          [bool]: Display progress bars (requires the tqdm package)
verbose          [bool]: Print progress updates to the standard output
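
To give a feel for what dtw_w controls, here is a minimal pure-Python sketch of textbook DTW restricted to a Sakoe-Chiba band. It is not the modified DTW implemented by the C extension; the function name banded_dtw and the absolute-difference cost are assumptions made purely for illustration.

```python
import math


def banded_dtw(a, b, w):
    """Textbook DTW distance between two 1-D sequences, restricted to a
    Sakoe-Chiba band of half-width w around the diagonal.

    Illustrative only: this is NOT the modified DTW used by beatlex.mdtw.
    """
    n, m = len(a), len(b)
    # The band must span at least |n - m|, otherwise the final cell is unreachable.
    w = max(w, abs(n - m))
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only visit cells that lie within w steps of the diagonal.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


a = [0, 0, 1, 2]
b = [0, 1, 2, 2]
print(banded_dtw(a, b, 0))  # 2.0 (diagonal only, no warping allowed)
print(banded_dtw(a, b, 1))  # 0.0 (one step of warping aligns the two shapes)
```

A wider band can only lower (or keep) the distance, at the cost of visiting more cells, which is the trade-off a band width parameter exposes.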

You may then pass your data to the fit_transform method. It must be possible to safely cast the dtype of your data to the internal dtype of the beatlex.mdtw submodule; for best performance, avoid the cast altogether by using that dtype for your data in the first place. The internal dtype is set at compilation time and is currently numpy.float64.

For multivariate data, the first dimension must index the different variables and the second one must be the "time" axis.
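
The dtype and layout constraints above can be checked before calling the package. The sketch below assumes the data starts out as a table with one row per time step, and that the internal dtype is numpy.float64; the helper name to_variables_first is hypothetical and not part of the beatlex package, and requesting a C-contiguous copy is a cautious choice rather than a documented requirement.

```python
import numpy as np

# Assumption: the extension was compiled with numpy.float64 as its internal dtype.
INTERNAL_DTYPE = np.float64


def to_variables_first(samples):
    """Convert a (n_timesteps, n_variables) table into a
    (n_variables, n_timesteps) array of INTERNAL_DTYPE.

    Hypothetical helper for illustration; not part of the beatlex package.
    """
    arr = np.asarray(samples)
    if arr.ndim != 2:
        raise ValueError("expected a 2-D (n_timesteps, n_variables) table")
    # Refuse lossy conversions (e.g. complex -> float) instead of truncating silently.
    if not np.can_cast(arr.dtype, INTERNAL_DTYPE, casting="safe"):
        raise TypeError(f"cannot safely cast {arr.dtype} to {np.dtype(INTERNAL_DTYPE)}")
    # Put the variables on the first axis, then cast and copy once, up front.
    return np.ascontiguousarray(arr.T, dtype=INTERNAL_DTYPE)


# 5 time steps of 2 variables, stored one row per time step.
table = np.array([[0, 10], [1, 11], [2, 12], [3, 13], [4, 14]], dtype=np.int32)
data = to_variables_first(table)
print(data.shape, data.dtype)  # (2, 5) float64
```

Doing the conversion once, before fitting, means later calls on the same array should not need a further cast.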

The method returns a list of named tuples, one per segment of the input data, each holding the id of the assigned vocabulary term, the beginning index and the length of the segment. The constructed vocabulary can be retrieved from the BeatLex object via the .vocab property.
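
As an illustration of how such a segment list might be consumed, the sketch below groups the sub-sequences of a univariate series by vocabulary term. The Segment tuple and its field names (term_id, start, length) are hypothetical stand-ins; the actual field names of the tuples returned by fit_transform may differ.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the named tuples returned by fit_transform;
# the real field names may differ.
Segment = namedtuple("Segment", ["term_id", "start", "length"])


def group_by_term(series, segments):
    """Map each vocabulary term id to the list of sub-sequences assigned to it."""
    groups = defaultdict(list)
    for seg in segments:
        # A segment covers indices start .. start + length - 1 of the series.
        groups[seg.term_id].append(series[seg.start:seg.start + seg.length])
    return dict(groups)


series = [0, 1, 0, 1, 5, 5, 5, 0, 1]
segments = [Segment(0, 0, 2), Segment(0, 2, 2), Segment(1, 4, 3), Segment(0, 7, 2)]
print(group_by_term(series, segments))
# {0: [[0, 1], [0, 1], [0, 1]], 1: [[5, 5, 5]]}
```

For multivariate data laid out as described above, the same slice would be taken along the second (time) axis.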

If you wish to build a vocabulary on one dataset and segment one or more others with it, call fit_transform as described above on the first dataset, then call fit_transform(..., build_vocab=False) with the same BeatLex object on the following one(s). This runs the algorithm without adding any new word to the vocabulary.
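
The two-step workflow just described could be wrapped as follows. This is a sketch: the helper segment_with_shared_vocab is hypothetical, and the commented usage assumes the constructor accepts the arguments listed earlier as keywords, with arbitrary values chosen for the example.

```python
def segment_with_shared_vocab(model, reference, others):
    """Learn a vocabulary on `reference`, then segment every array in
    `others` with that vocabulary left unchanged.

    `model` is expected to be a beatlex.BeatLex instance. This helper is a
    hypothetical convenience wrapper, not part of the package.
    """
    # First pass: the vocabulary is built while segmenting the reference data.
    reference_segments = model.fit_transform(reference)
    # Later passes: reuse the same object and forbid new vocabulary terms.
    other_segments = [model.fit_transform(x, build_vocab=False) for x in others]
    return reference_segments, other_segments


# Intended use, assuming the package is installed (parameter values are arbitrary):
#
#   import beatlex
#   model = beatlex.BeatLex(s_max=200, k_max=8)
#   ref_segs, other_segs = segment_with_shared_vocab(model, train, [test_a, test_b])
#   vocab = model.vocab
```

Reusing the same object matters here, since the vocabulary lives on the BeatLex instance.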

Acknowledgements

This work was carried out under the supervision of Amedeo Napoli and Chedy Raissi at INRIA Nancy Grand-Est, France.


[¹]: BEATLEX: Summarizing and Forecasting Time Series with Patterns