Artificial Intelligence

Symbolic representations of time series

Authors: Sylvain Combettes

The objectives of this thesis are to define novel symbolic representations and distance measures suited to time series that may be multivariate and non-stationary. In addition, these representations and distances should preserve the time information, be interpretable, and be fast to compute. We review symbolic representations of time series (which transform a real-valued series into a shorter discrete-valued series), as well as distance measures on time series, strings, and symbolic sequences (the sequences that result from a symbolization process).

We propose two contributions: ASTRIDE for a data set of univariate time series, and d_{symb} for a data set of multivariate time series. We also developed the d_{symb} playground, an online interactive tool that lets users apply d_{symb} to their own uploaded data. ASTRIDE and d_{symb} are data-driven: they use change-point detection for the segmentation step, then either quantiles or a K-means clustering algorithm for the quantization step. Finally, they apply the general edit distance with custom costs to the resulting symbolic sequences.

We evaluate ASTRIDE against four other symbolic representations on reconstruction and, where applicable, classification tasks. For d_{symb}, experiments show how interpretable the symbolization is. Moreover, compared with nine elastic distances on a clustering task, d_{symb} achieves competitive performance while being several orders of magnitude faster.
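The three steps named above (segmentation by change-point detection, quantization, then a general edit distance) can be illustrated with a minimal, standard-library-only sketch for univariate series. This is not the ASTRIDE or d_{symb} implementation: the mean-shift dynamic-programming segmentation, the fixed number of segments, the quantile bins computed on segment means, and the cost choices (substitution cost |x - y|, insertion/deletion cost 1) are all assumptions made for illustration, and the function names are hypothetical.

```python
from bisect import bisect_right
from statistics import mean, quantiles


def segment(series, n_segments):
    """Return segment end indices minimizing within-segment squared error.

    Assumed change-point method: exact dynamic programming, mean-shift model.
    """
    n = len(series)
    # Prefix sums give the squared error of any slice in O(1).
    s1, s2 = [0.0], [0.0]
    for x in series:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(i, j):  # squared error of series[i:j] around its mean
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j]: minimal cost of splitting series[:j] into k segments.
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    prev = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1][i] + cost(i, j)
                if c < best[k][j]:
                    best[k][j], prev[k][j] = c, i
    # Backtrack to recover the segment end indices.
    bounds, j = [], n
    for k in range(n_segments, 0, -1):
        bounds.append(j)
        j = prev[k][j]
    return sorted(bounds)


def symbolize(series, bounds, bin_edges):
    """Map each segment to the index of the quantile bin of its mean."""
    symbols, start = [], 0
    for end in bounds:
        symbols.append(bisect_right(bin_edges, mean(series[start:end])))
        start = end
    return symbols


def edit_distance(a, b, indel=1.0):
    """General edit distance; substituting x by y costs |x - y| (assumed)."""
    d = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        d[i][0] = i * indel
    for j in range(1, len(b) + 1):
        d[0][j] = j * indel
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(
                d[i - 1][j] + indel,  # delete a[i-1]
                d[i][j - 1] + indel,  # insert b[j-1]
                d[i - 1][j - 1] + abs(a[i - 1] - b[j - 1]),  # substitute
            )
    return d[-1][-1]


# Toy data set: two rising step signals and one falling step signal.
dataset = [
    [0.0] * 5 + [5.0] * 5 + [10.0] * 5,
    [0.1] * 4 + [5.2] * 6 + [9.8] * 5,
    [10.0] * 5 + [5.0] * 5 + [0.0] * 5,
]
n_segments, alphabet_size = 3, 3

all_bounds = [segment(s, n_segments) for s in dataset]
# Quantile bin edges are learned from all segment means in the data set.
all_means = [
    mean(s[i:j])
    for s, b in zip(dataset, all_bounds)
    for i, j in zip([0] + b[:-1], b)
]
bin_edges = quantiles(all_means, n=alphabet_size)
words = [symbolize(s, b, bin_edges) for s, b in zip(dataset, all_bounds)]

print(words)                              # [[0, 1, 2], [0, 1, 2], [2, 1, 0]]
print(edit_distance(words[0], words[1]))  # 0.0
print(edit_distance(words[0], words[2]))  # 4.0
```

In this toy example the two rising signals receive the same symbolic word even though their change points differ, because this simplified sketch discards segment lengths. Retaining them is one way a representation could preserve the time information that the abstract asks for.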