From CNNs to Shift-Invariant Twin Wavelet Models

Abstract

We propose a novel antialiasing method to increase shift invariance in convolutional neural networks (CNNs). More precisely, we replace the conventional combination “real-valued convolutions + max pooling” (RMax) with “complex-valued convolutions + modulus” (CMod), which produces stable feature representations for band-pass filters with well-defined orientations. In our recent work [21], we proved that, for such filters, the two operators yield similar outputs; CMod can therefore be viewed as a stable alternative to RMax. To separate band-pass filters from other freely-trained kernels, in this paper we design a “twin” architecture based on the dual-tree complex wavelet packet transform, which generates outputs similar to those of standard CNNs with fewer trainable parameters. Beyond improving stability to small shifts, our experiments on AlexNet and ResNet show increased prediction accuracy on natural image datasets such as ImageNet and CIFAR-10. Furthermore, our approach outperforms recent antialiasing methods based on low-pass filtering by preserving high-frequency information, while reducing memory usage.
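To make the RMax/CMod contrast concrete, the following is a minimal 1D NumPy sketch of the two operators, not the paper's actual architecture: the Gabor-like band-pass filter, toy signal, and pooling size are illustrative assumptions.

```python
import numpy as np

def rmax(x, w_real, pool=2):
    # RMax: real-valued convolution followed by max pooling
    y = np.convolve(x, w_real, mode="same")
    n = len(y) // pool * pool
    return y[:n].reshape(-1, pool).max(axis=1)

def cmod(x, w_complex, pool=2):
    # CMod: complex-valued convolution followed by modulus,
    # subsampled to match the RMax output size
    y = np.convolve(x, w_complex, mode="same")
    return np.abs(y)[::pool][: len(x) // pool]

# Gabor-like band-pass filter with a well-defined center frequency
# (hypothetical parameters chosen for illustration only)
t = np.arange(-8, 9)
w = np.exp(-t**2 / 16.0) * np.exp(1j * 0.8 * t)

x = np.random.default_rng(0).standard_normal(64)
a = cmod(x, w)               # modulus output: a smooth, nonnegative envelope
b = cmod(np.roll(x, 1), w)   # same signal shifted by one sample
r = rmax(x, w.real)          # conventional real-conv + max-pool baseline
```

Because the modulus discards the filter's oscillating phase, the CMod output tracks the signal's envelope, which is what makes it more stable under small input shifts than max pooling on the real part.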

Related