Cluster Insight: A Weighted Clustering Tool for Large Textual Data Exploration

Published in: The 18th ACM International Conference on Web Search and Data Mining (WSDM)

Authors: Amine Ferdjaoui, Séverine Affeldt, Mohamed Nadif

Exploring large volumes of textual data is a topic of significant interest in unsupervised learning. In this article, we present Cluster Insight, a compact, easy-to-use application for exploring large textual corpora with clustering and generative models. We show how to adapt the Lasso weighted k-means algorithm to textual data. We also describe in detail a user-friendly package that shows how to use large language models (LLMs) effectively to describe the resulting document classes.
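The abstract does not spell out the weighting scheme. As a rough illustration of what L1 ("Lasso"-type) feature weighting inside k-means looks like, the sketch below implements the sparse k-means of Witten and Tibshirani. This is a related but distinct algorithm, not the Lasso weighted k-means the authors adapt, and it does not reproduce their text-specific adaptation. The function names (`weighted_kmeans`, `per_feature_bcss`, `update_weights`, `sparse_kmeans`) and the sparsity budget `s` are illustrative assumptions. The method alternates two steps: clustering under fixed feature weights, then recomputing the weights from each feature's between-cluster sum of squares (BCSS) with soft-thresholding. For documents, `X` would typically be a TF-IDF document-term matrix.

```python
import numpy as np


def weighted_kmeans(X, w, k, n_iter=100):
    """Lloyd's k-means under the weighted distance sum_j w_j (x_j - c_j)^2."""
    Z = X * np.sqrt(w)  # weighted distance == Euclidean distance on rescaled data
    centers = [Z[0]]
    for _ in range(1, k):  # deterministic farthest-first initialization
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers)
    labels = None
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            if np.any(labels == c):  # keep the old center if a cluster empties
                centers[c] = Z[labels == c].mean(axis=0)
    return labels


def per_feature_bcss(X, labels, k):
    """Between-cluster sum of squares per feature: total SS minus within-cluster SS."""
    tss = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    wcss = np.zeros(X.shape[1])
    for c in range(k):
        members = X[labels == c]
        if len(members):
            wcss += ((members - members.mean(axis=0)) ** 2).sum(axis=0)
    return tss - wcss


def update_weights(a, s):
    """Maximize w.a subject to ||w||_2 <= 1, ||w||_1 <= s, w >= 0 (needs s >= 1)."""
    a = np.maximum(a, 0.0)  # only the positive part of each BCSS matters
    if a.max() <= 0:  # degenerate case: no feature separates the clusters
        return np.full(a.shape, 1 / np.sqrt(a.size))
    w = a / np.linalg.norm(a)
    if w.sum() <= s:
        return w  # L1 constraint already satisfied: no thresholding needed
    lo, hi = 0.0, float(a.max())
    for _ in range(50):  # bisection on the soft-threshold level delta
        mid = (lo + hi) / 2
        st = np.maximum(a - mid, 0.0)  # soft-thresholding of a nonnegative vector
        if st.sum() / np.linalg.norm(st) > s:
            lo = mid
        else:
            hi = mid
    st = np.maximum(a - (lo + hi) / 2, 0.0)
    return st / np.linalg.norm(st)


def sparse_kmeans(X, k, s, n_outer=10):
    """Alternate weighted clustering and L1-constrained feature reweighting."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    w = np.full(p, 1 / np.sqrt(p))  # start from uniform unit-norm weights
    labels = None
    for _ in range(n_outer):
        labels = weighted_kmeans(X, w, k)
        w_new = update_weights(per_feature_bcss(X, labels, k), s)
        converged = np.abs(w_new - w).sum() / np.abs(w).sum() < 1e-4
        w = w_new
        if converged:
            break
    return labels, w
```

A smaller `s` (between 1 and the square root of the number of features) zeroes out more weights, so on a TF-IDF matrix the surviving nonzero entries of `w` point to the terms that drive the partition, which is the kind of signal an LLM-based cluster description could draw on.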

CCS Concepts

• Computing methodologies → Cluster analysis; Probabilistic reasoning; Natural language processing; • Mathematics of computing → Probabilistic algorithms.