Machine Learning

Contributions to scalable clustering of networks and graphs

Publié le

Auteurs : Chakib Fettal

Graphs are an important data structure used in many fields because they provide a powerful tool for modeling and analyzing complex systems. They are used to represent relationships between entities, such as individuals in a social network or nodes in a computer network. Graphs have been used in various applications across different fields, such as social network analysis, bioinformatics, computer science, transportation, epidemiology and many more. In social network analysis, for example, graphs can be used to study patterns of interactions between individuals in a social network and identify groups of individuals with similar interests or behaviors. This can be useful for targeted marketing or recommendations. In bioinformatics, graphs can be used to identify functional modules in protein-protein interaction networks. Graph clustering, also known as community detection, is an important technique in the analysis of graph data. Clustering allows for the identification of groups of similar nodes within the graph. This can reveal underlying patterns and structures in the graph that may not be immediately apparent. For example, in a social network, clustering can reveal groups of individuals with similar interests or behaviors, and in bioinformatics, clustering can reveal functional modules in protein-protein interaction networks. The thesis tries to address scalability issues of the state-of-the-art graph clustering models and presents approaches for clustering and representation learning different types of graphs, including classical graphs, bipartite graphs, attributed graphs, bipartite attributed graphs, and multi-view attributed graphs. To this end we leverage techniques such as: linear projections, Laplacian smoothing, optimal transport, etc. The proposed approaches all share three key characteristics: simplicity, cost-effectiveness, and having few hyper-parameters. Thanks to their simple yet effective nature, the proposed methods are competitive with the state of the art while also generally being more computationally efficient. We showcase the efficacy and efficiency of our models against state-of-the-art methods through extensive experimentation and significance testing.