Place: Espace Gilbert Simondon, 1B36, ENS Paris-Saclay

PhD defense of Valéry DEWIL

Title: Practical video denoising and demosaicing with neural networks
Supervision: Gabriele Facciolo
Date of defense: 05/12/2022

Abstract

Image denoising is a fundamental task in image processing that aims to recover the underlying signal from noisy data. There are two categories of denoising methods: traditional model-based approaches and data-driven methods (based on training neural networks). Model-based methods require a comprehensive model of the noise. Conversely, learning-based approaches can, in principle, be trained for any noise type and achieve high performance.

This thesis studies data-driven denoising methods. The standard approach is to use convolutional neural networks (CNNs) trained with supervision. Recently, self-supervised techniques have been proposed (mainly for image denoising). Although these methods achieve slightly worse results than their supervised counterparts, they have proved competitive on synthetic data and do not require clean data for supervision.
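Training without clean targets can work for a standard reason, the Noise2Noise argument. It is given here as background and is not taken from the thesis. Suppose the target is the clean signal plus zero-mean noise that is independent of the prediction. Then the expected squared error against the noisy target equals the squared error against the clean signal plus the noise variance, so both losses have the same minimizer. The Monte Carlo sketch below checks this numerically for a single pixel. The function `noisy_target_loss` and the values used are hypothetical.

```python
import random


def noisy_target_loss(prediction: float, clean: float, sigma: float,
                      n_samples: int, rng: random.Random) -> float:
    """Average squared error against noisy targets clean + N(0, sigma^2)."""
    total = 0.0
    for _ in range(n_samples):
        # Each target gets fresh, independent zero-mean Gaussian noise.
        noisy_target = clean + rng.gauss(0.0, sigma)
        total += (prediction - noisy_target) ** 2
    return total / n_samples


rng = random.Random(0)
prediction, clean, sigma = 0.7, 0.5, 0.1  # hypothetical values

# Supervised loss: requires access to the clean signal.
supervised_loss = (prediction - clean) ** 2
# Self-supervised loss: only noisy targets are used.
self_supervised_loss = noisy_target_loss(prediction, clean, sigma, 200_000, rng)

# In expectation, self_supervised_loss = supervised_loss + sigma**2.
# The offset does not depend on the prediction.
print(supervised_loss + sigma ** 2, self_supervised_loss)
```

Because the offset `sigma**2` is constant, gradient-based training on noisy targets pushes the prediction toward the same optimum as supervised training, though with noisier gradients.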

An alternative is to train on synthetic datasets. However, neural networks generalize poorly to data from a different distribution. Moreover, simulating realistic datasets requires both a comprehensive model of real noise and a way to generate clean raw data, and both remain unsolved research problems.
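The sketch below illustrates a noise model often used to simulate raw data. It is a heteroscedastic Gaussian approximation of Poisson-Gaussian noise, in which the variance grows linearly with intensity: variance(x) = a*x + b. The function name and the parameters `a` and `b` are hypothetical. Real sensor noise also has components that this model ignores, such as fixed-pattern noise, quantization and spatial correlation. That gap is exactly the limitation of synthetic training described above.

```python
import math
import random
from typing import List

Image = List[List[float]]


def add_heteroscedastic_noise(raw: Image, a: float, b: float,
                              rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise with variance a*x + b to each raw value x."""
    noisy = []
    for row in raw:
        noisy_row = []
        for x in row:
            std = math.sqrt(max(a * x + b, 0.0))  # guard against negative variance
            noisy_row.append(x + rng.gauss(0.0, std))
        noisy.append(noisy_row)
    return noisy


# Usage: a flat gray raw patch, so the empirical variance can be checked.
rng = random.Random(0)
a, b, level = 0.01, 1e-4, 0.5          # hypothetical sensor parameters
clean = [[level] * 200 for _ in range(200)]
noisy = add_heteroscedastic_noise(clean, a, b, rng)

residuals = [v - level for row in noisy for v in row]
mean_res = sum(residuals) / len(residuals)
empirical_var = sum((r - mean_res) ** 2 for r in residuals) / (len(residuals) - 1)
expected_var = a * level + b           # variance predicted by the model
print(expected_var, empirical_var)
```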

Self-supervised approaches are suitable candidates for video denoising. We propose the first self-supervised method for training multi-frame video denoising networks. This framework, called MF2F, can be used to adapt any denoising neural network to a large family of noise types. MF2F fine-tunes a pre-trained denoising network. For several synthetic noise types, a network fine-tuned with the proposed approach is competitive with a noise-specific network trained with supervision. On real noisy videos, it gave very promising results, setting the state of the art at the time of publication.
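The abstract does not specify the MF2F loss. The sketch below illustrates the general frame-to-frame idea that self-supervised video fine-tuning builds on, and it is not the exact MF2F procedure. The denoised current frame is compared with a neighboring noisy frame that has been aligned by optical flow. Pixels whose match falls outside the image, or that are flagged as occluded, are ignored. Since the noise in the neighboring frame is independent of the noise in the current frame, the neighbor can serve as a noisy target.

The following are hypothetical simplifications:
- integer per-pixel flow, where `flow[i][j]` is the displacement (dy, dx) from pixel (i, j) of the current frame to its match in the neighbor;
- nearest-neighbor warping;
- the function names `warp_nearest` and `masked_frame_to_frame_loss`.

A real pipeline would estimate sub-pixel flow and interpolate.

```python
from typing import List, Tuple

Frame = List[List[float]]
Flow = List[List[Tuple[int, int]]]   # integer (dy, dx) per pixel (hypothetical)
Mask = List[List[bool]]


def warp_nearest(frame: Frame, flow: Flow) -> Tuple[Frame, Mask]:
    """Resample `frame` at (i + dy, j + dx); mark out-of-bounds pixels invalid."""
    h, w = len(frame), len(frame[0])
    warped = [[0.0] * w for _ in range(h)]
    valid = [[False] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            dy, dx = flow[i][j]
            si, sj = i + dy, j + dx
            if 0 <= si < h and 0 <= sj < w:
                warped[i][j] = frame[si][sj]
                valid[i][j] = True
    return warped, valid


def masked_frame_to_frame_loss(denoised: Frame, neighbor_noisy: Frame,
                               flow: Flow, occluded: Mask) -> float:
    """Mean squared error between the denoised frame and the warped noisy
    neighbor, restricted to pixels that are in bounds and not occluded."""
    warped, valid = warp_nearest(neighbor_noisy, flow)
    total, count = 0.0, 0
    for i, row in enumerate(denoised):
        for j, value in enumerate(row):
            if valid[i][j] and not occluded[i][j]:
                total += (value - warped[i][j]) ** 2
                count += 1
    return total / count if count else 0.0


# Usage: pixel (i, j) of the current frame matches (i, j + 1) in the neighbor.
denoised = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
neighbor = [[0.0, 1.0, 2.0], [0.0, 6.0, 5.0], [0.0, 7.0, 8.0]]  # one noisy pixel
flow = [[(0, 1)] * 3 for _ in range(3)]
no_occlusion = [[False] * 3 for _ in range(3)]

loss = masked_frame_to_frame_loss(denoised, neighbor, flow, no_occlusion)
# The last column has no match; only pixel (1, 0) differs: (4 - 6)^2 over 6 pixels.
```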

At this point, two observations can be made. First, CNNs can be trained with supervised or self-supervised learning, and the latter is outperformed by the former. Second, the self-supervised technique MF2F achieves promising results on real data. Which approach, then, should be used to train a denoising network? In the second part of the thesis, we answer this question and describe the study we conducted to compare the two approaches.

Besides denoising, demosaicing is another key step in the acquisition of an RGB image. Traditionally, denoising and demosaicing are applied separately, but it is preferable to perform both operations jointly. In the third part of the thesis, we examine different architectures. This leads to the first method, at the time, that performs joint denoising and demosaicing of videos.
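Joint denoising and demosaicing inverts a simple forward model. A sensor with a color filter array records only one color per pixel, and noise is then added to this single-channel mosaic. The sketch below builds such a mosaic using an RGGB Bayer pattern, which is assumed here for illustration. It also packs the mosaic into four half-resolution planes, one common way to feed raw data to a CNN. The function names and the choice of pattern are hypothetical and are not taken from the thesis.

```python
from typing import List, Tuple

RGBImage = List[List[Tuple[float, float, float]]]
Plane = List[List[float]]


def bayer_mosaic_rggb(rgb: RGBImage) -> Plane:
    """Keep one color per pixel following an RGGB Bayer pattern."""
    h, w = len(rgb), len(rgb[0])
    raw = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            r, g, b = rgb[i][j]
            if i % 2 == 0 and j % 2 == 0:
                raw[i][j] = r          # red on even rows, even columns
            elif i % 2 == 1 and j % 2 == 1:
                raw[i][j] = b          # blue on odd rows, odd columns
            else:
                raw[i][j] = g          # green on the two remaining sites
    return raw


def pack_rggb(raw: Plane) -> List[Plane]:
    """Split an RGGB mosaic (even height and width) into four half-resolution planes."""
    def plane(di: int, dj: int) -> Plane:
        return [row[dj::2] for row in raw[di::2]]
    return [plane(0, 0), plane(0, 1), plane(1, 0), plane(1, 1)]  # R, G1, G2, B


rgb = [[(1, 2, 3), (4, 5, 6)],
       [(7, 8, 9), (10, 11, 12)]]
raw = bayer_mosaic_rggb(rgb)   # [[1, 5], [8, 12]]
packed = pack_rggb(raw)        # [[[1]], [[5]], [[8]], [[12]]]
```

A joint method receives a noisy version of `raw` (or of `packed`) and must recover the full three-channel image. This is why treating the two tasks together can avoid the error propagation of a sequential pipeline.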
