DATA seminar 2023

Abstract

Over the past decade, some progress has been made on understanding the strengths and limitations of convolutional neural networks (CNNs) for computer vision. In particular, their stability properties with respect to small transformations (translations, rotations, scaling, deformations) remain only partially understood. In this talk, we study the combined effect of convolution and max pooling layers in generating quasi-invariant representations. This property is essential for classification, since two translated versions of the same image are expected to be classified in the same way. When trained on datasets such as ImageNet, CNNs tend to learn first-layer parameters that closely resemble oriented band-pass filters. By leveraging the properties of discrete Gabor-like convolutions, we establish conditions under which the feature maps computed by the subsequent max pooling operator approximate the modulus of complex Gabor-like coefficients, in which case they are stable with respect to small input shifts. We then compute a probabilistic measure of shift invariance for max pooling feature maps. More specifically, we show that some filters, depending on their frequency and orientation, are more likely than others to produce stable image representations. We experimentally validate our theory by considering a deterministic feature extractor based on the dual-tree complex wavelet packet transform, a particular case of discrete Gabor-like decomposition. We demonstrate a strong correlation between shift invariance on the one hand and similarity with the complex modulus on the other.
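The phenomenon described above can be illustrated with a minimal 1D NumPy sketch (the filter frequency, envelope width, and pooling size below are illustrative choices, not parameters from the paper): a real Gabor-like convolution oscillates and is sensitive to a one-sample shift, whereas both the complex modulus and the max-pooled real response vary much less.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete Gabor-like filter: Gaussian envelope times a complex exponential.
# Frequency and width are illustrative, not taken from the paper.
n = np.arange(-12, 13)
freq = np.pi / 4                      # one oscillation every 8 samples
gabor = np.exp(-n**2 / 50) * np.exp(1j * freq * n)

x = rng.standard_normal(512)
x_shift = np.roll(x, 1)               # small input translation

def real_conv(sig):
    # First CNN stage: convolution with the real part of the filter.
    return np.convolve(sig, gabor.real, mode="valid")

def modulus(sig):
    # Reference representation: modulus of the complex Gabor coefficients.
    return np.abs(np.convolve(sig, gabor, mode="valid"))

def maxpool(y, size=8):
    # Non-overlapping max pooling; window matches one oscillation period.
    m = len(y) - len(y) % size
    return y[:m].reshape(-1, size).max(axis=1)

# Mean absolute change of each representation under a one-sample shift.
d_real = np.mean(np.abs(real_conv(x) - real_conv(x_shift)))
d_mod = np.mean(np.abs(modulus(x) - modulus(x_shift)))
d_pool = np.mean(np.abs(maxpool(real_conv(x)) - maxpool(real_conv(x_shift))))

print(f"shift sensitivity - raw conv: {d_real:.3f}, "
      f"modulus: {d_mod:.3f}, max pool: {d_pool:.3f}")
```

Because the real part of a complex number never exceeds its modulus, the max-pooled real response is bounded above by the max-pooled modulus; when the pooling window covers a full oscillation period, the two become close, which is one way to read the approximation result stated in the abstract.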

Date
Nov 23, 2023 2:00 PM
Location
Grenoble
1 Place de la République, Nancy, Lorraine 54000

Link to the corresponding article.