

Place: Université Paris Cité

Theses and HDR

PhD defense of Mira AIT SAADA

Title: Unsupervised Learning from Textual Data with Neural Text Representations
Supervision: Mohamed Nadif
Defended on 18/04/2023





The digital era generates enormous amounts of unstructured data such as images and documents, which require specific processing methods to extract value from them. Textual data presents an additional challenge because it does not contain numerical values. Word embeddings are techniques that transform text into numerical data, enabling machine learning algorithms to process it. Unsupervised tasks are a major challenge in industry, as they allow value to be created from large amounts of data without costly manual labeling. In this thesis, we explore the use of Transformer models for unsupervised tasks such as clustering, anomaly detection, and data visualization. We also propose methodologies to better exploit multi-layer Transformer models in an unsupervised context, improving the quality and robustness of document clustering while avoiding the need to choose which layer to use and the number of classes. Additionally, we investigate Transformer language models and their application to clustering in more depth, examining in particular transfer learning methods that fine-tune pre-trained models on a different task to improve their quality for future tasks. We demonstrate through an empirical study that post-processing methods based on dimensionality reduction are more advantageous than the fine-tuning strategies proposed in the literature. Finally, we propose a framework for detecting text anomalies in French, adapted to two cases: one where the data concerns a specific topic, and one where the data has multiple sub-topics. In both cases, we obtain results superior to the state of the art with significantly lower computation time.



  • Mr. Quafafou Mohamed, Full Professor, Laboratoire d'Informatique et Systèmes, Université d'Aix-Marseille
  • Mr. Lamirel Jean-Charles, Associate Professor, Laboratoire Lorrain de Recherche en Informatique et ses Applications
  • Ms. Rosset Sophie, Research Director, Laboratoire Interdisciplinaire des Sciences du Numérique
  • Ms. Cariou Véronique, Full Professor, Statistique, Sensométrie et Chimiométrie, ONIRIS
  • Mr. Fall Abdou Aziz, Industry member, Caisse des Dépôts et Consignations