General Mathematics
Practical video denoising and demosaicing with neural networks
Image denoising is a fundamental task in image processing: the goal is to recover the underlying clean signal from a noisy observation. Many methods have been proposed, and they can be classified into two families: model-based approaches and data-driven methods. Model-based methods require a model of both the signal and the noise. They can in principle be adapted to handle other noise types, but this has proven to be a tedious task, since it may require redesigning the original algorithms. Learning-based approaches, on the other hand, have several advantages. First, even though a network is trained for a specific noise, the same architecture can be trained to handle any noise type. Second, they have significantly advanced the state of the art. For these reasons, we focus on data-driven denoising methods.

The standard approach is to train a CNN under supervision, which requires a dataset of clean/noisy pairs. However, acquiring real images or videos together with clean ground truth is difficult. For images, this can be circumvented by generating a pseudo ground truth, for instance by aggregating many noisy frames or, equivalently, by increasing the exposure time. These tricks are no longer applicable to videos. Self-supervised techniques, which do not rely on clean data for supervision, have been proposed for image denoising. Although they achieve worse results than their supervised counterparts, they have proven competitive, which makes self-supervised approaches suitable candidates for video denoising. In this thesis, we propose the first self-supervised method for training multi-frame video denoising networks. This framework, called MF2F, can be used to adapt any denoising neural network to a large family of noise types, effectively resulting in a blind denoising method. MF2F relies on a self-supervised fine-tuning of a pre-trained denoising network (a sketch of such a fine-tuning step is given below). For several synthetic noise types, a network fine-tuned with the proposed approach competes with a noise-specific network trained under supervision. On real noisy videos, it gives very promising results, setting the state of the art at the time of publication.

Two observations follow: (1) CNNs can be trained either with supervised or with self-supervised learning, the latter being dominated by the former, at least on synthetic data; and (2) the self-supervised technique MF2F achieves promising results on real data. The natural question is then which of the two should be used to train a denoising network when dealing with real data. The second part of the thesis answers this question: we describe the study we conducted to compare both approaches, as well as the obtained results.

Beyond denoising, demosaicing is another essential step in the acquisition of an RGB image. Traditionally, denoising is performed before demosaicing, but several papers argue that it is beneficial to demosaic first, or better yet, to perform both operations jointly. Joint denoising and demosaicing (JDD) methods have been proposed for images, but the video case has received far less attention. While the first two parts of the thesis focus on training networks for practical use cases, the third part is devoted to the architecture.
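To make the self-supervised fine-tuning mentioned above more concrete, here is a minimal sketch of a frame-to-frame style fine-tuning step in PyTorch. The names (`denoiser`, `warp`, `flow`, `occlusion_mask`) and the choice of a warped noisy neighbouring frame as target are assumptions made for illustration, not the exact MF2F procedure.

```python
import torch
import torch.nn.functional as F

def warp(frame, flow):
    """Backward-warp a (N, C, H, W) frame with a dense optical flow of shape (N, 2, H, W)."""
    _, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(frame.device)   # (2, H, W) pixel grid
    coords = base.unsqueeze(0) + flow                              # target sampling positions
    grid_x = 2.0 * coords[:, 0] / (w - 1) - 1.0                    # normalise to [-1, 1]
    grid_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)                   # (N, H, W, 2)
    return F.grid_sample(frame, grid, align_corners=True)

def fine_tune_step(denoiser, optimizer, noisy_stack, noisy_neighbour, flow, occlusion_mask):
    """One self-supervised step: compare the denoised reference frame to a *noisy*
    neighbouring frame warped onto it, ignoring occluded pixels."""
    optimizer.zero_grad()
    denoised = denoiser(noisy_stack)          # multi-frame input -> denoised reference frame
    target = warp(noisy_neighbour, flow)      # warped noisy neighbour acts as the target
    loss = (occlusion_mask * (denoised - target).abs()).mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a full pipeline, the flow and occlusion mask between the reference frame and its neighbour would be estimated with an off-the-shelf optical flow method; the sketch treats them as given inputs.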
Building on the promising results obtained on real raw videos with the MF2F framework, we examine different architectures for video JDD, evaluating the impact of design choices such as motion compensation and recurrent versus non-recurrent processing. The best results are obtained by a simple recurrent CNN with a multi-scale architecture, which we hope will serve as a baseline for future research on the subject. Neither the multi-scale design nor the recurrent CNN is novel, yet at the time of writing this is the only method performing joint denoising and demosaicing for videos.
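Purely as an illustration of the idea of a recurrent, multi-scale network for video JDD, a toy cell could be sketched as follows. The layer widths, the two-scale structure and the use of the previous RGB estimate as recurrent state are assumptions of this sketch, not the architecture evaluated in the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentJDDCell(nn.Module):
    """Toy recurrent cell: packed Bayer frame + previous RGB estimate -> full-resolution RGB."""
    def __init__(self, feat=32):
        super().__init__()
        # input: 4-channel packed Bayer (half resolution) + 3-channel previous RGB estimate
        self.enc1 = nn.Sequential(nn.Conv2d(4 + 3, feat, 3, padding=1), nn.ReLU(True))
        self.enc2 = nn.Sequential(nn.Conv2d(feat, 2 * feat, 3, stride=2, padding=1), nn.ReLU(True))
        self.dec = nn.Sequential(nn.Conv2d(2 * feat, feat, 3, padding=1), nn.ReLU(True))
        self.out = nn.Conv2d(2 * feat, 12, 3, padding=1)   # 12 = 3 RGB channels x 2x2 pixel shuffle
        self.up = nn.PixelShuffle(2)                       # back to full resolution

    def forward(self, bayer, prev_rgb):
        # prev_rgb: previous output downscaled to the packed-Bayer grid (the recurrent state)
        x = torch.cat([bayer, prev_rgb], dim=1)
        f1 = self.enc1(x)                                  # fine scale
        f2 = self.enc2(f1)                                 # coarse scale
        f2 = F.interpolate(self.dec(f2), scale_factor=2, mode="bilinear", align_corners=False)
        return self.up(self.out(torch.cat([f1, f2], dim=1)))

# Processing a short raw sequence frame by frame, feeding each estimate back in:
cell = RecurrentJDDCell()
prev = torch.zeros(1, 3, 32, 32)                 # zero-initialised recurrent state (half resolution)
for bayer in torch.rand(5, 1, 4, 32, 32):        # five packed Bayer frames of a 64x64 video
    rgb = cell(bayer, prev)                      # (1, 3, 64, 64)
    prev = F.avg_pool2d(rgb, 2)                  # downscale the new estimate for the next step
```

The recurrence here simply feeds the previous output back as an extra input; motion compensation, if used, would warp this state to the current frame before concatenation.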