Statistics
Rank processes and statistical applications in high dimension
This research project aims to develop mathematical and algorithmic tools for studying and evaluating the level of similarity between two complex, high-dimensional datasets: vectors, multivariate signals, trajectories, signals on graphs. It addresses fundamental questions related to quantification in experimental science, particularly in life sciences, neurosciences, and clinical applications.

We propose a generalization of linear rank statistics using methods developed in machine learning. Building on bipartite ranking approaches, we carry out an in-depth, nonparametric study of these statistics, based on two statistical samples, within the framework of statistical learning theory. More precisely, ranking methods circumvent the lack of a natural order relation in high-dimensional spaces by learning a scoring function. This function, defined on the ambient space and taking values in the real line, is chosen to induce an order on the multivariate observations by maximizing the generalized rank statistic.

We propose a first application to statistical hypothesis testing, combining the decision (acceptance or rejection) on the null hypothesis with learning a model that describes the data. More specifically, we study two-sample homogeneity tests. Then, two applications in data analysis are introduced and developed using rank statistics as a performance criterion. These are applied to bipartite ranking and anomaly detection problems, and we specify their relation to state-of-the-art formulations. Finally, motivated by the need for tools adapted to experimental sciences, and in the context of biomedical data studies, we introduce an interpretable method for the statistical comparison of two clinical populations and a stochastic generative model of specific longitudinal data.
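The idea of learning a scoring function and then testing homogeneity with a rank statistic can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the method developed in this work: the function names are hypothetical, a Fisher-type linear discriminant stands in for a scoring function that would maximize the generalized rank statistic, the rank statistic is the classical Wilcoxon rank-sum, and the p-value is calibrated by permutation on held-out scores.

```python
import numpy as np


def fit_linear_scorer(x, y, ridge=1e-3):
    """Learn a linear scoring function s(z) = z . w.

    Assumption: a ridge-regularized Fisher discriminant is used as a simple
    stand-in for a scorer obtained by maximizing a rank statistic.
    """
    mean_gap = x.mean(axis=0) - y.mean(axis=0)
    centered = np.vstack([x - x.mean(axis=0), y - y.mean(axis=0)])
    pooled_cov = np.atleast_2d(np.cov(centered, rowvar=False))
    pooled_cov = pooled_cov + ridge * np.eye(x.shape[1])
    w = np.linalg.solve(pooled_cov, mean_gap)
    return lambda z: z @ w


def rank_sum(scores_x, scores_y):
    """Wilcoxon rank-sum: sum of the ranks of the x-scores in the pooled sample.

    Ties are not averaged here; continuous scores are assumed.
    """
    pooled = np.concatenate([scores_x, scores_y])
    ranks = np.empty(len(pooled))
    ranks[np.argsort(pooled, kind="stable")] = np.arange(1, len(pooled) + 1)
    return ranks[: len(scores_x)].sum()


def two_stage_rank_test(x, y, n_perm=999, seed=0):
    """Two-sample homogeneity test in two stages (illustrative sketch).

    Stage 1 learns the scorer on the first half of each sample.
    Stage 2 scores the held-out halves and calibrates the rank-sum
    statistic with a one-sided permutation test.
    """
    rng = np.random.default_rng(seed)
    half_x, half_y = len(x) // 2, len(y) // 2
    scorer = fit_linear_scorer(x[:half_x], y[:half_y])

    scores_x, scores_y = scorer(x[half_x:]), scorer(y[half_y:])
    observed = rank_sum(scores_x, scores_y)

    pooled = np.concatenate([scores_x, scores_y])
    n_x = len(scores_x)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if rank_sum(perm[:n_x], perm[n_x:]) >= observed:
            exceed += 1
    p_value = (1 + exceed) / (1 + n_perm)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    sample_x = rng.normal(0.5, 1.0, size=(200, 10))  # shifted population
    sample_y = rng.normal(0.0, 1.0, size=(200, 10))  # reference population
    statistic, p_value = two_stage_rank_test(sample_x, sample_y)
    print(f"rank-sum statistic = {statistic:.0f}, permutation p-value = {p_value:.3f}")
```

Splitting each sample in two keeps the held-out scores exchangeable under the null hypothesis, which is what justifies the permutation calibration in this sketch; a scorer learned and evaluated on the same points would not have that property.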