Affine invariant image comparison
Image comparison, which consists in deciding whether or not several images represent common or similar objects, is recognized as a difficult problem, especially because of viewpoint changes between images. The apparent deformations of objects caused by changes of camera position can be locally approximated by affine maps. This has motivated the quest for affine invariant local descriptors over the last 15 years. Unfortunately, existing descriptors cannot handle viewpoint angle differences larger than 45 degrees, and fail completely beyond 60 degrees. In this thesis, we investigate several strategies to overcome this limitation, and we show in the end that they complement each other.

Three main routes to affine invariance are actively being investigated by the scientific community:
- affine simulations, followed by (less invariant) matching of the many simulated image pairs;
- a description that is itself independent of the viewpoint;
- local affine patch normalization.

In this thesis we explore all three approaches. We start by presenting a distance between affine maps that measures viewpoint deformation. This distance is used to generate optimal (minimal) sets of affine transformations for Image Matching by Affine Simulation (IMAS) methods; the goal is to reduce the number of affine simulations while keeping the same performance level in the matching process. We use these optimal sets of affine maps, together with other computational improvements, to boost the well-established ASIFT method. We also propose a new method, Optimal ARootSIFT, whose performance and speed significantly improve on those of ASIFT. As a side contribution and a direct application of the IMAS methodology, we propose two descriptors suited to tracking repeated objects, based on the Number of False Alarms (NFA); we test their viewpoint tolerance and generate suitable sets of affine simulations accordingly. In this way we obtain two IMAS methods able to handle repetitive structures under strong viewpoint differences.

Our search for improvement then focuses on local descriptors, which were once hand-crafted but are now learned from data, with the promise of better performance. This motivates our proposal of an affine invariant descriptor (called AID), based on a convolutional neural network trained on optically simulated affine data. Even though it is trained for neither occlusion nor noise, the performance of AID on real images is surprisingly good. This performance confirms that a direct, viewpoint-independent description of a scene may be attainable.

Finally, recent advances in affine patch normalization (e.g. Affnet) help circumvent the lack of affine invariance of state-of-the-art descriptors. As usual with affine normalization, patches are normalized to a single representation and then described. We propose instead not to rely on the precision, or even the existence, of a single affine normalizing map: we present an Adaptive IMAS method that computes a small set of possible normalizing representations. This method aggregates the Affnet information to reach a good compromise between speed and performance. Ultimately, our inquiries lead to a method that fuses normalization and simulation ideas into an even faster and more complete affine invariant image matcher.

All in all, affine invariance is a way to remove viewpoint information from patches and to focus on what the scene actually contains.
However, clues about how the geometry is transformed can be useful when matching two images, e.g. for recovering the global transformation or for proposing new tentative matches. With that in mind, we propose a LOCal Affine Transform Estimator (LOCATE), which proves valuable for affine guided matching and homography estimation. These two applications of LOCATE provide complementary tools that further improve the affine invariant image matchers presented above.
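To make the geometry summarized above concrete, the sketch below computes the absolute tilt of an affine map as the ratio of the singular values of its linear part, and the transition tilt between two affine maps, the classical viewpoint-discrepancy measure from the ASIFT literature. The exact distance used in the thesis to derive optimal simulation sets is not reproduced here; this is a minimal illustration.

```python
import numpy as np

def tilt(A):
    """Absolute tilt of an affine map: the ratio of the singular
    values of its 2x2 linear part (always >= 1)."""
    s = np.linalg.svd(np.asarray(A)[:2, :2], compute_uv=False)
    return s[0] / s[1]

def transition_tilt(A, B):
    """Tilt of the map taking the view simulated by B to the view
    simulated by A; it quantifies their viewpoint discrepancy."""
    return tilt(np.asarray(A)[:2, :2] @ np.linalg.inv(np.asarray(B)[:2, :2]))

# Two moderate tilts along orthogonal directions compose into a much
# larger transition tilt, which is why a simulation set must cover
# tilt directions and not only tilt magnitudes.
A = np.diag([2.0, 1.0])   # tilt 2 along x
B = np.diag([1.0, 2.0])   # tilt 2 along y
print(tilt(A), tilt(B), transition_tilt(A, B))  # 2.0 2.0 4.0
```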
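The affine simulations themselves can be generated as in ASIFT: rotate the image, then compress one axis by the tilt factor after an anti-aliasing blur. A minimal sketch with OpenCV follows; the Gaussian constant 0.8 matches the ASIFT recommendation, but the border handling and interpolation choices here are illustrative assumptions, not those fixed in the thesis.

```python
import cv2
import numpy as np

def simulate_affine(img, t, phi_deg):
    """Simulate the view diag(1, 1/t) . R(phi) of `img`:
    rotate, anti-alias along y, then subsample y by the tilt t."""
    h, w = img.shape[:2]
    # Rotate around the image center on an enlarged canvas.
    R = cv2.getRotationMatrix2D((w / 2, h / 2), phi_deg, 1.0)
    diag = int(np.ceil(np.hypot(h, w)))
    R[0, 2] += (diag - w) / 2
    R[1, 2] += (diag - h) / 2
    rot = cv2.warpAffine(img, R, (diag, diag))
    # Anti-aliasing blur before compressing the y axis by t.
    sigma = 0.8 * np.sqrt(t * t - 1.0)
    rot = cv2.GaussianBlur(rot, (0, 0), sigmaX=0.01, sigmaY=max(sigma, 0.01))
    return cv2.resize(rot, (diag, int(np.ceil(diag / t))),
                      interpolation=cv2.INTER_LINEAR)

# One simulated view per (tilt, angle) pair of the chosen optimal set:
# views = [simulate_affine(img, t, phi) for (t, phi) in simulation_set]
```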
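For the NFA-based descriptors that handle repeated structures, the a-contrario framework accepts a match only when its Number of False Alarms, the expected count of accidental occurrences under a background model, falls below a threshold. The sketch below uses a generic binomial background model; the actual statistic tested by the thesis descriptors is not reproduced here.

```python
from scipy.stats import binom

def nfa(n_tests, k, n, p):
    """Number of False Alarms: expected number of tests, out of
    n_tests, in which at least k of n elementary agreements occur
    by chance, each with background probability p. A detection is
    accepted when NFA <= epsilon (typically 1)."""
    return n_tests * binom.sf(k - 1, n, p)  # tail P(X >= k)

# Example: 38 of 128 descriptor cells agreeing, each at chance level
# 0.1, remains meaningful even among a million candidate pairs:
# the NFA stays well below 1.
print(nfa(1e6, 38, 128, 0.1))
```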
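The link LOCATE exploits between local affine maps and the global transformation can be illustrated in the reverse direction: around any point, a homography acts, to first order, like the affine map given by its Jacobian. The helper below is a hypothetical illustration of that fact, not the LOCATE estimator itself, which goes the other way, from estimated local affinities to guided matches and homography constraints.

```python
import numpy as np

def local_affine_of_homography(H, x, y):
    """First-order affine approximation of the homography H at
    (x, y): returns (J, t) such that a nearby point q maps
    approximately to J @ q + t."""
    u, v, w = np.asarray(H) @ np.array([x, y, 1.0])
    # Jacobian of ((h1.p)/(h3.p), (h2.p)/(h3.p)) w.r.t. (x, y).
    J = np.array([
        [H[0][0] * w - H[2][0] * u, H[0][1] * w - H[2][1] * u],
        [H[1][0] * w - H[2][0] * v, H[1][1] * w - H[2][1] * v],
    ]) / w**2
    t = np.array([u, v]) / w - J @ np.array([x, y])
    return J, t

# Guided matching idea: a match plus its local affine map predicts
# where neighbouring keypoints should land; conversely, estimated
# local affinities constrain the global homography.
```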