paper-reading-group

Convolutional Networks with Adaptive Inference Graphs

Authors: Andreas Veit, Serge Belongie

Year: 2018

https://openaccess.thecvf.com/content_ECCV_2018/papers/Andreas_Veit_Convolutional_Networks_with_ECCV_2018_paper.pdf


Convolutional%20Networks%20with%20Adaptive%20Inference%20Gra%205c731fbfed5b4f93b6b49bc30f331d18/Untitled.png

Majority of features of a CNN are these high level features that are not relevant to the image.

Theme-

Why does convolution layers have a fixed feedforward structure?

Instead of Image being passed through hundreds of layers why couldn’t image be directed to first high level features (initial layers) then fine grained differences that is specifically designed for this class.

Convolutional%20Networks%20with%20Adaptive%20Inference%20Gra%205c731fbfed5b4f93b6b49bc30f331d18/Untitled%201.png

This paper proposes the idea of - the more a network already knows about an image, the better it should be at deciding which layer to compute next, so the architecture ConvNet-AIG decides for each input image on the fly which layers are needed. What if, after identifying that an image contains a bird, a ConvNet could move directly to a layer that can distinguish different bird species, without executing intermediate layers that specialize in unrelated aspects?

Solution proposed to this idea - A gated mechanism should be attached to each residual layer which makes a discrete decision on whether the image should be passed through it or not. This idea was supported by a study which showed even after removal of few residual block in a fully trained model, the performance of the model didn’t drop much.

Convolutional%20Networks%20with%20Adaptive%20Inference%20Gra%205c731fbfed5b4f93b6b49bc30f331d18/Untitled%202.png

Gated Inference

The architecture follows the basic structure of a ResNet with the key difference that instead of executing all layers, the network determines for each input image which subset of layers to execute. In particular, with layers focusing on different subgroups of categories, it can select only those layers necessary for the specific input.

Convolutional%20Networks%20with%20Adaptive%20Inference%20Gra%205c731fbfed5b4f93b6b49bc30f331d18/Untitled%203.png

Gate can be split into 2 parts

Convolutional%20Networks%20with%20Adaptive%20Inference%20Gra%205c731fbfed5b4f93b6b49bc30f331d18/Untitled%204.png

Problems with this Gates -

Convolutional%20Networks%20with%20Adaptive%20Inference%20Gra%205c731fbfed5b4f93b6b49bc30f331d18/Untitled%205.png

*for math and gumbel samples have a look at the paper

Loss Function-

They also introduce a target rate loss, which pushes the network to execute t percent of all layers. This is to prevent the scenario where the model uses all the layer or very few of them.

Convolutional%20Networks%20with%20Adaptive%20Inference%20Gra%205c731fbfed5b4f93b6b49bc30f331d18/Untitled%206.png

Convolutional%20Networks%20with%20Adaptive%20Inference%20Gra%205c731fbfed5b4f93b6b49bc30f331d18/Untitled%207.png

Let z’l denote the fraction of images within a mini-batch that layer l is executed.

Experiments/Results-

ConvNet-AIG 110 clearly outperforms ResNet 110 while only using a subset of 82% of the layers.

Convolutional%20Networks%20with%20Adaptive%20Inference%20Gra%205c731fbfed5b4f93b6b49bc30f331d18/Untitled%208.png

Convolutional%20Networks%20with%20Adaptive%20Inference%20Gra%205c731fbfed5b4f93b6b49bc30f331d18/Untitled%209.png

This image signifies that even of the same category if the classification of that object is difficult it would use much more layers than that of an easy image of the same category.

Convolutional%20Networks%20with%20Adaptive%20Inference%20Gra%205c731fbfed5b4f93b6b49bc30f331d18/Untitled%2010.png

Convolutional%20Networks%20with%20Adaptive%20Inference%20Gra%205c731fbfed5b4f93b6b49bc30f331d18/Untitled%2011.png

It can be seen that for Birds (in general living animal) one less residual block is needed to compute than that of the man made objects. Which implies lesser computational time.

Conclusion -