Gonzalo Iñaki QUINTANA
Leveraging domain adaptation methods for federated learning applied to 2D mammography image classification
Abstract
Breast cancer is the most commonly diagnosed cancer among women, and early detection significantly improves survival rates. X-ray imaging techniques are crucial for detecting and diagnosing breast cancer and for monitoring its treatment. Deep Learning (DL)-based Computer-Aided Detection & Diagnosis (CAD) systems are developed to assist radiologists in detecting and analyzing radiological findings. However, developing these models requires access to large, annotated datasets that capture variability across different populations, acquisition systems, and image post-processing algorithms. Collecting data from different clinical sites faces practical and legal challenges related to health data privacy laws, which hinders the development of DL-based CAD systems for mammography.
Federated Learning (FL) eliminates the need to collect data in a centralized location, and has the potential to overcome one of the main barriers to exploiting data from more diverse clinical sites. Although FL has shown promising results when data on the different clients or sites are homogeneous, the training of FL models is negatively affected when the data distributions differ across sites (commonly referred to as the heterogeneous setting). This is especially problematic for mammography images, because population differences and the large number of mammography systems, vendors, and post-processing algorithms lead to highly heterogeneous distributions among sites.
In this thesis, we investigate the development and training of DL models for mammography using Federated Learning. In particular, we study the issues caused by heterogeneous data distributions due to image style differences, and propose ways to alleviate them.
We first introduce BN-SCAFFOLD, an FL algorithm that extends the state-of-the-art SCAFFOLD by using control variates to correct client drift in Batch Normalization layers, thus increasing classification performance. We develop a generic theoretical framework that makes it possible to calculate convergence rates for different FL algorithms, and we provide convergence guarantees for BN-SCAFFOLD. BN-SCAFFOLD is shown to outperform other state-of-the-art FL algorithms when heterogeneity is strong. Then, we propose a Contrastive Learning (CL)-based Domain Adaptation methodology that yields domain-invariant models in a standard centralized learning setting, increasing classification performance. We theoretically show that minimizing the Contrastive losses reduces the Class-wise Mean Maximum Discrepancy (CMMD), a dissimilarity measure commonly used for achieving Domain Adaptation, and thereby performs Domain Adaptation. In addition, we show that decreasing the Contrastive losses increases class separability in the feature space. Finally, we extend this CL methodology to Federated Learning, ensuring cross-client feature alignment without requiring the clients to transmit the locally extracted features, which reduces the impact on data privacy.
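To make the control-variate idea concrete, the following is a minimal sketch of the original SCAFFOLD update (not BN-SCAFFOLD, whose handling of Batch Normalization layers is not detailed in this abstract). It assumes full client participation and uses a hypothetical toy problem with two quadratic client objectives of different curvature; the names `make_quadratic_grad`, `scaffold_client_update`, and `scaffold_round` are illustrative and do not come from the thesis.

```python
import numpy as np


def make_quadratic_grad(a, t):
    """Gradient of the toy client objective f(w) = 0.5 * a * ||w - t||^2 (assumed)."""
    return lambda w: a * (w - t)


def scaffold_client_update(x, c, c_i, grad_fn, lr=0.1, local_steps=10):
    """Run local steps from the server model x, correcting each gradient
    with the drift estimate (c - c_i)."""
    y = x.copy()
    for _ in range(local_steps):
        y = y - lr * (grad_fn(y) - c_i + c)
    # Control variate refresh based on the net local displacement.
    c_i_new = c_i - c + (x - y) / (local_steps * lr)
    return y - x, c_i_new


def scaffold_round(x, c, client_cs, grad_fns, lr=0.1, local_steps=10, server_lr=1.0):
    """One communication round with all clients participating."""
    deltas_y, deltas_c, new_cs = [], [], []
    for c_i, grad_fn in zip(client_cs, grad_fns):
        dy, c_i_new = scaffold_client_update(x, c, c_i, grad_fn, lr, local_steps)
        deltas_y.append(dy)
        deltas_c.append(c_i_new - c_i)
        new_cs.append(c_i_new)
    x_new = x + server_lr * np.mean(deltas_y, axis=0)
    c_new = c + np.mean(deltas_c, axis=0)  # full participation: |S| / N = 1
    return x_new, c_new, new_cs


# Toy heterogeneous setting: different curvatures and different optima per client.
grad_fns = [
    make_quadratic_grad(1.0, np.array([1.0, 0.0])),
    make_quadratic_grad(3.0, np.array([-1.0, 2.0])),
]
x, c = np.zeros(2), np.zeros(2)
client_cs = [np.zeros(2), np.zeros(2)]
for _ in range(50):
    x, c, client_cs = scaffold_round(x, c, client_cs, grad_fns)
print(x)  # should approach the global minimizer (-0.5, 1.5) of the averaged objective
```

In this toy setting the minimizer of the averaged objective is the curvature-weighted mean of the client optima, and each client's control variate settles at that client's gradient at the minimizer, so the corrected local steps no longer pull the model toward the client's own optimum.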
Keywords
federated learning, deep learning, domain adaptation, contrastive learning, computer aided detection, medical imaging
Supervision
Jury
- Joseph SALMON, Professor, Université de Montpellier
- Pietro GORI, Associate Professor, Télécom Paris
- Marco LORENZI, Research Scientist, Université Côte d'Azur, INRIA
- Erwan LE PENNEC, Professor, École polytechnique
- Isabelle BLOCH, Professor, Sorbonne Université, LIP6