Document and Text Processing

Étude approfondie des représentations de données textuelles dans l'apprentissage non supervisé (An In-Depth Study of Text Data Representations in Unsupervised Learning)

Published in: 23rd French-speaking Conference on Knowledge Extraction and Management (Conférence francophone sur l'Extraction et la Gestion des Connaissances, EGC'2023)

Authors: Mira Ait-Saada, Mohamed Nadif

Dense text representations are attracting considerable interest in many supervised tasks, but much less is known about how well they suit unlabeled datasets. In this paper, we investigate their use in two unsupervised tasks: document clustering and visualization. To this end, we propose a tandem approach based on UMAP, in which dimensionality reduction is followed by clustering, and show that it can outperform the fine-tuning approaches usually proposed in the literature.
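The abstract does not detail the pipeline. A tandem approach, in the usual sense, chains dimensionality reduction and clustering: the embeddings are first projected to a low-dimensional space, then clustered there. The following is a minimal sketch under that assumption, not the authors' implementation. It uses `umap.UMAP` from the umap-learn package followed by scikit-learn's `KMeans`. The helper `tandem_clustering`, the parameter values (10 components, 15 neighbors) and the synthetic 768-dimensional blobs standing in for document embeddings are all hypothetical. If umap-learn is not installed, the sketch falls back to PCA so that it still runs. That fallback is a swapped-in technique, not UMAP.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import normalized_mutual_info_score

try:
    import umap  # provided by the umap-learn package
    HAVE_UMAP = True
except ImportError:
    HAVE_UMAP = False


def tandem_clustering(embeddings, n_clusters, n_components=10, seed=0):
    """Hypothetical helper: reduce dense embeddings, then cluster the reduced space."""
    if HAVE_UMAP:
        # Step 1: non-linear reduction with UMAP (parameter values are assumptions).
        reducer = umap.UMAP(n_components=n_components, n_neighbors=15,
                            random_state=seed)
    else:
        # Fallback only so the sketch runs without umap-learn; this is PCA, not UMAP.
        from sklearn.decomposition import PCA
        reducer = PCA(n_components=n_components, random_state=seed)
    reduced = reducer.fit_transform(embeddings)

    # Step 2: cluster in the reduced space rather than in the original one.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(reduced)
    return reduced, labels


# Synthetic stand-in for 768-dimensional document embeddings (hypothetical data).
X, y_true = make_blobs(n_samples=600, n_features=768, centers=4,
                       cluster_std=4.0, random_state=0)
reduced, labels = tandem_clustering(X, n_clusters=4)
print(reduced.shape)  # (600, 10)
print(round(normalized_mutual_info_score(y_true, labels), 3))
```

For the visualization task, the same reducer can be fit with `n_components=2` and the resulting coordinates plotted directly, colored by cluster label.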