Artificial Intelligence
Symbolic representations of time series
The objectives of this thesis are to define novel symbolic representations and distance measures suited to time series that can be multivariate and non-stationary. In addition, they should preserve the time information, be interpretable, and be fast to compute. We review symbolic representations of time series (which transform a real-valued series into a shorter discrete-valued series), as well as distance measures on time series, strings, and symbolic sequences (which result from a symbolization process).

We propose two contributions: ASTRIDE for a data set of univariate time series, and d_{symb} for a data set of multivariate time series. We also developed the d_{symb} playground, an online interactive tool that allows users to apply d_{symb} to their uploaded data. ASTRIDE and d_{symb} are data-driven: they use change-point detection for the segmentation step, then either quantiles or a K-means clustering algorithm for the quantization step. Finally, they apply the general edit distance with custom costs between the resulting symbolic sequences.

We show the performance of ASTRIDE compared to 4 other symbolic representations on reconstruction and, when applicable, on classification tasks. For d_{symb}, experiments show how interpretable the symbolization is. Moreover, compared to 9 elastic distances on a clustering task, d_{symb} achieves a competitive performance while being several orders of magnitude faster.
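
To make the segmentation, quantization, and edit-distance pipeline concrete, here is a minimal sketch for univariate series. It is an illustration under stated assumptions, not the ASTRIDE or d_{symb} implementation. Uniform segmentation stands in for change-point detection, only the quantile variant of the quantization step is shown, and the substitution and insertion/deletion costs are hypothetical choices, since the abstract does not specify the custom costs. All function names are made up for this example.

```python
from bisect import bisect_right
from statistics import mean, quantiles


def segment_means(series, n_segments):
    """Split a series into near-equal segments and return each segment's mean.

    Assumption: uniform segmentation replaces change-point detection here.
    Requires len(series) >= n_segments so that no segment is empty.
    """
    n = len(series)
    bounds = [round(i * n / n_segments) for i in range(n_segments + 1)]
    return [mean(series[a:b]) for a, b in zip(bounds, bounds[1:])]


def quantile_cuts(values, n_symbols):
    """Return the n_symbols - 1 cut points that define the quantile bins."""
    return quantiles(values, n=n_symbols)


def symbolize(series, n_segments, cuts):
    """Map each segment mean to the index of the quantile bin it falls in."""
    return [bisect_right(cuts, m) for m in segment_means(series, n_segments)]


def edit_distance(a, b, sub_cost, indel_cost=1.0):
    """General edit distance with caller-supplied costs (dynamic programming)."""
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost,          # delete x
                curr[j - 1] + indel_cost,      # insert y
                prev[j - 1] + sub_cost(x, y),  # substitute x with y
            ))
        prev = curr
    return prev[-1]


# Toy data set of two univariate series (made-up values).
x = [0, 0, 1, 1, 5, 5, 9, 9]
y = [0, 0, 5, 5, 5, 5, 9, 9]
n_segments, n_symbols = 4, 3

# The bins are learned from the whole data set, so both series share one alphabet.
pooled = segment_means(x, n_segments) + segment_means(y, n_segments)
cuts = quantile_cuts(pooled, n_symbols)
sx = symbolize(x, n_segments, cuts)
sy = symbolize(y, n_segments, cuts)


def cost(p, q):
    """Hypothetical substitution cost: grows with the gap between symbols."""
    return abs(p - q) / (n_symbols - 1)


distance = edit_distance(sx, sy, cost)
```

Learning the cut points from the pooled segment means is what makes the quantization data-driven in this sketch. Swapping `segment_means` for a change-point detector would yield segments of varying length, in which case the costs would presumably need to account for segment duration as well.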