pyBᴇᴀᴛLᴇx
This is a Python implementation of the BᴇᴀᴛLᴇx time-series summarization algorithm[¹]. It mostly follows the associated paper, with a few hints taken from the authors' MATLAB code. The most computationally intensive part of the algorithm, the frequent use of a modified Dynamic Time Warping (DTW) routine, is implemented as a C extension module.
Installation
Compilation of this package has been tested with Python 3.5, 3.6 and 3.7. Use python setup.py build to compile, then either add the resulting ./build/lib.*/ directory to your PYTHONPATH or run python setup.py install [--user] to install it.
Usage
The algorithm is implemented in the beatlex.BeatLex class. The following arguments can be passed at instantiation:
s_max [int]: Maximum term size to consider adding to the vocabulary
k_max [int]: Maximum vocabulary size to learn
dtw_w [int]: Width of the Sakoe-Chiba band for DTW computation
new_vocab_thresh [float]: New vocabulary term threshold. A lower value makes the algorithm create more new terms; a higher value makes it match segments to existing terms more leniently
progbar [bool]: Display progress bars (requires the tqdm package)
verbose [bool]: Print progress update to the standard output
You may then pass your data to the fit_transform method, with the constraint that its dtype must be safely castable to the internal dtype of the beatlex.mdtw submodule. For best performance, avoid the cast by using that dtype for your data in the first place; the internal dtype is set at compilation time and is currently numpy.float64.
For multivariate data, the first dimension must index the variables and the second must be the time axis.
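The data preparation described above can be sketched as follows. The numpy part is concrete; the BeatLex calls are shown as comments because they assume the package has been built and installed, and the argument values are purely illustrative, not tuned:

```python
import numpy as np

# Multivariate input: first axis = variables, second axis = time.
rng = np.random.default_rng(0)
data = rng.standard_normal((2, 1000))

# Cast up front to the internal dtype (numpy.float64 at the time of writing)
# so fit_transform does not have to convert the array itself.
data = np.ascontiguousarray(data, dtype=np.float64)

# Hypothetical usage, assuming the package is importable:
# from beatlex import BeatLex
# model = BeatLex(s_max=300, k_max=10, dtw_w=50, new_vocab_thresh=0.2)
# segments = model.fit_transform(data)

print(data.dtype, data.shape)
```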
The method returns a list of named tuples, one per segment of the input data, each carrying the id of the assigned vocabulary term, the beginning index of the segment and its length. The constructed vocabulary can be retrieved from the BeatLex object via the .vocab property.
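One common way to consume such a segmentation is to expand it into one label per time step. The Segment type and field names below are a hypothetical stand-in for the returned named tuples (the real field names may differ), but each real segment carries the same three pieces of information:

```python
from collections import namedtuple

# Hypothetical stand-in for the named tuples returned by fit_transform:
# assigned vocabulary term id, beginning index, segment length.
Segment = namedtuple("Segment", ["vocab_id", "begin", "length"])

segments = [Segment(0, 0, 120), Segment(1, 120, 80), Segment(0, 200, 110)]

# The segments partition the time axis: each starts where the previous ended.
for prev, cur in zip(segments, segments[1:]):
    assert cur.begin == prev.begin + prev.length

# Expand the segmentation into one vocabulary label per time step,
# e.g. to plot the labelling alongside the raw series.
labels = [seg.vocab_id for seg in segments for _ in range(seg.length)]
print(len(labels))  # → 310, the total covered length
```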
If you wish to build a vocabulary over one dataset and segment one or more other datasets with the resulting vocabulary, use it as described on the first dataset, then call fit_transform(..., build_vocab=False) with the same BeatLex object on the next one(s); this runs the algorithm without adding any new word to the vocabulary.
Acknowledgements
This work was realized under the supervision of Amedeo Napoli and Chedy Raissi at INRIA Nancy Grand-Est, France.
[¹]: BEATLEX: Summarizing and Forecasting Time Series with Patterns