Lei Mao

Machine Learning, Artificial Intelligence, Computer Science.

Poolings in Deep Learning

Introduction

I recently looked back at some of the pooling strategies in deep learning, including max pooling, spatial pyramid pooling (SPP pooling), and region of interest pooling (ROI pooling), and found their history a little bit interesting. So I am going to write a short blog post on it.

Max Pooling

Max pooling is one of the most frequently used pooling strategies in convolutional neural networks, mainly because it reduces the spatial dimensions of the feature maps. For 2D max pooling, given a 2D matrix of arbitrary size and a pooling size along each dimension, you get the pooled matrix.

For example, we have the following 2D matrix, and we want to do max pooling with a pooling size of [3, 4], using non-overlapping windows (a stride equal to the pooling size).

$\begin{bmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ 11 & 12 & 13 & 14 & 15 & 16 & 17 & 18 & 19 & 20 \\ 21 & 22 & 23 & 24 & 25 & 26 & 27 & 28 & 29 & 30 \\ 31 & 32 & 33 & 34 & 35 & 36 & 37 & 38 & 39 & 40 \\ 41 & 42 & 43 & 44 & 45 & 46 & 47 & 48 & 49 & 50 \\ \end{bmatrix}$

You will get the following matrix as output.

$\begin{bmatrix} 24 & 28 & 30 \\ 44 & 48 & 50 \\ \end{bmatrix}$

Note that when the remaining rows or columns at the edge are fewer than the pooling size, we pool over whatever is left there.
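A minimal plain-Python sketch of the max pooling described above, assuming non-overlapping windows (a stride equal to the pooling size) and the edge rule just stated. The function name `max_pool_2d` is hypothetical and not taken from any library; the sketch reproduces the example output.

```python
def max_pool_2d(matrix, pool_size):
    """Non-overlapping 2D max pooling (stride equal to pool_size).

    Edge windows smaller than pool_size are pooled over whatever
    rows/columns remain, as described in the text.
    """
    pool_h, pool_w = pool_size
    num_rows, num_cols = len(matrix), len(matrix[0])
    output = []
    for r in range(0, num_rows, pool_h):
        output_row = []
        for c in range(0, num_cols, pool_w):
            # Slicing stops at the matrix boundary, so the last window
            # simply covers the leftover rows/columns.
            window = [value
                      for row in matrix[r:r + pool_h]
                      for value in row[c:c + pool_w]]
            output_row.append(max(window))
        output.append(output_row)
    return output


# The 5 x 10 example matrix with entries 1..50 in row-major order.
matrix = [[row * 10 + col + 1 for col in range(10)] for row in range(5)]
print(max_pool_2d(matrix, [3, 4]))  # [[24, 28, 30], [44, 48, 50]]
```

Because the window size is fixed, a larger input simply produces more windows, and therefore a larger output.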

When doing max pooling, it feels like you are collecting information from fixed-size windows without knowing globally how large the matrix is. The output shape changes whenever the input shape changes.
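Under the same assumptions as the example above (a stride equal to the pooling size, with leftover edge rows and columns pooled as they are), the output shape can be worked out directly. For an $H \times W$ input and a pooling size $[k_h, k_w]$ (notation introduced here for illustration):

$$\left\lceil \frac{H}{k_h} \right\rceil \times \left\lceil \frac{W}{k_w} \right\rceil = \left\lceil \frac{5}{3} \right\rceil \times \left\lceil \frac{10}{4} \right\rceil = 2 \times 3$$

Growing the input to $8 \times 16$, for example, would give a $3 \times 4$ output, which is why the output shape of max pooling tracks the input shape.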

SPP Pooling

Spatial pyramid pooling (SPP pooling) was first introduced in SPPNet in 2014. I have also described it in one of my blog posts, “Image Pyramids and Its Applications in Deep Learning”. For 2D SPP pooling, given a 2D matrix of arbitrary size and a set of output shapes, you figure out the pooling strategy and get the pooled matrices.

For example, we have the following 2D matrix, and we want to do pooling with output sizes [2, 3] and [1, 1].

$\begin{bmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ 11 & 12 & 13 & 14 & 15 & 16 & 17 & 18 & 19 & 20 \\ 21 & 22 & 23 & 24 & 25 & 26 & 27 & 28 & 29 & 30 \\ 31 & 32 & 33 & 34 & 35 & 36 & 37 & 38 & 39 & 40 \\ 41 & 42 & 43 & 44 & 45 & 46 & 47 & 48 & 49 & 50 \\ \end{bmatrix}$

If we fix the pooling strategy to divide the matrix as evenly as possible and do max pooling within each division, we will get the following outputs.

$\begin{bmatrix} 13 & 16 & 20 \\ 43 & 46 & 50 \\ \end{bmatrix}$

and

$\begin{bmatrix} 50 \\ \end{bmatrix}$

Then we flatten the outputs and concatenate them. The final output is as follows.

$\begin{bmatrix} 13 & 16 & 20 & 43 & 46 & 50 & 50\\ \end{bmatrix}$

Note that 5 // 2 = 2 and 10 // 3 = 3, so when we divide the matrix “evenly”, the row divisions are row numbers {1, 2} and {3, 4, 5}, and the column divisions are column numbers {1, 2, 3}, {4, 5, 6}, and {7, 8, 9, 10}, with the last division absorbing the remainder. The divisions are not perfectly even in this case.
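A minimal plain-Python sketch of SPP pooling with the floor-based division rule described above. The names `even_divisions`, `adaptive_max_pool_2d`, and `spp_pool_2d` are hypothetical, and other implementations may compute the bin boundaries differently. It assumes the input has at least as many rows and columns as each requested output size, so that no bin is empty.

```python
def even_divisions(length, num_bins):
    """Split range(length) into num_bins contiguous (start, end) ranges.

    Each bin gets length // num_bins indices and the last bin also
    absorbs the remainder, e.g. 10 columns into 3 bins gives
    (0, 3), (3, 6), (6, 10). Assumes length >= num_bins.
    """
    step = length // num_bins
    bounds = [i * step for i in range(num_bins)] + [length]
    return [(bounds[i], bounds[i + 1]) for i in range(num_bins)]


def adaptive_max_pool_2d(matrix, output_size):
    """Max pooling whose windows are derived from the requested output size."""
    out_h, out_w = output_size
    row_bins = even_divisions(len(matrix), out_h)
    col_bins = even_divisions(len(matrix[0]), out_w)
    return [[max(value
                 for row in matrix[r0:r1]
                 for value in row[c0:c1])
             for (c0, c1) in col_bins]
            for (r0, r1) in row_bins]


def spp_pool_2d(matrix, output_sizes):
    """SPP pooling: pool at every pyramid level, flatten, then concatenate."""
    features = []
    for output_size in output_sizes:
        pooled = adaptive_max_pool_2d(matrix, output_size)
        features.extend(value for row in pooled for value in row)
    return features


# The 5 x 10 example matrix with entries 1..50 in row-major order.
matrix = [[row * 10 + col + 1 for col in range(10)] for row in range(5)]
print(spp_pool_2d(matrix, [[2, 3], [1, 1]]))  # [13, 16, 20, 43, 46, 50, 50]
```

Unlike `max_pool_2d`-style pooling with a fixed window, the windows here are recomputed from the input shape, so the number of output values depends only on the requested output sizes.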

When doing SPP pooling, it feels like you need to figure out the information collecting strategy from the input shape and the output shape first. The output shape stays the same even if the input shape changes.
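This can be written down directly. If the pyramid levels have output sizes $[h_\ell, w_\ell]$ (notation introduced here for illustration), the length of the flattened and concatenated output depends only on those sizes, not on the input shape. For the example above:

$$L = \sum_{\ell} h_\ell \, w_\ell = 2 \times 3 + 1 \times 1 = 7$$

With the division rule above, any input with at least 2 rows and 3 columns therefore yields an output vector of length 7.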

Although SPP pooling can be used to produce outputs of a consistent shape from inputs of different shapes, the initial purpose of SPP pooling in SPPNet was to collect information from different spatial scales, and the inputs to the SPP pooling layer in the SPPNet model all have the same shape.

ROI Pooling

Region of interest pooling (ROI pooling) was introduced in Fast R-CNN in 2015. It has a fancy name, “ROI pooling”, but it is merely a special case of SPP pooling with only one output size and without flattening and concatenation. For 2D ROI pooling, given a 2D matrix of arbitrary size and an output shape, you figure out the pooling strategy and get the pooled matrix.

For example, we have the following 2D matrix, and we want to do pooling with an output size of [2, 3].

$\begin{bmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ 11 & 12 & 13 & 14 & 15 & 16 & 17 & 18 & 19 & 20 \\ 21 & 22 & 23 & 24 & 25 & 26 & 27 & 28 & 29 & 30 \\ 31 & 32 & 33 & 34 & 35 & 36 & 37 & 38 & 39 & 40 \\ 41 & 42 & 43 & 44 & 45 & 46 & 47 & 48 & 49 & 50 \\ \end{bmatrix}$

If we fix the pooling strategy to divide the matrix as evenly as possible and do max pooling within each division, we will get the following output.

$\begin{bmatrix} 13 & 16 & 20 \\ 43 & 46 & 50 \\ \end{bmatrix}$

It is almost the same as SPP pooling. However, ROI pooling, for the first time, took inputs of different shapes and made the output shapes consistent. This is because the inputs to the ROI pooling layer are the regions of interest proposed by some region proposal method (in Fast R-CNN it is “Selective Search”, which runs outside the neural network, while in Faster R-CNN it is the “Region Proposal Network”), and these regions have different locations and shapes.

To classify the object in each region of interest, the features extracted from every region need to have the same shape before they are sent to the object classification network. ROI pooling plays an important role here: it turns inputs of different shapes into extracted features of the same shape.
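Since ROI pooling amounts to SPP pooling with a single output size applied to a cropped region, a minimal plain-Python sketch only needs to crop each region and apply the even-division rule from the SPP example. The ROI tuple convention `(row_start, col_start, row_end, col_end)` (end-exclusive), the function names, and the example regions are hypothetical choices for this illustration. A real detector would also need to project each proposal from image coordinates onto the feature map, which this sketch skips.

```python
def even_divisions(length, num_bins):
    """Split range(length) into num_bins contiguous (start, end) ranges;
    the last bin absorbs the remainder. Assumes length >= num_bins."""
    step = length // num_bins
    bounds = [i * step for i in range(num_bins)] + [length]
    return [(bounds[i], bounds[i + 1]) for i in range(num_bins)]


def roi_max_pool_2d(feature_map, roi, output_size):
    """Max-pool one region of interest into a fixed output_size.

    roi is a hypothetical (row_start, col_start, row_end, col_end) tuple,
    end-exclusive, given directly in feature map coordinates.
    """
    r_start, c_start, r_end, c_end = roi
    # Crop the region of interest out of the feature map.
    region = [row[c_start:c_end] for row in feature_map[r_start:r_end]]
    out_h, out_w = output_size
    row_bins = even_divisions(len(region), out_h)
    col_bins = even_divisions(len(region[0]), out_w)
    return [[max(value
                 for row in region[r0:r1]
                 for value in row[c0:c1])
             for (c0, c1) in col_bins]
            for (r0, r1) in row_bins]


# The 5 x 10 example matrix, treated as a single-channel feature map.
feature_map = [[row * 10 + col + 1 for col in range(10)] for row in range(5)]

# Regions of different shapes (5 x 10, 4 x 7, 3 x 6) all become 2 x 3.
rois = [(0, 0, 5, 10), (1, 2, 5, 9), (0, 4, 3, 10)]
for roi in rois:
    print(roi, roi_max_pool_2d(feature_map, roi, [2, 3]))
# (0, 0, 5, 10) [[13, 16, 20], [43, 46, 50]]
# (1, 2, 5, 9) [[24, 26, 29], [44, 46, 49]]
# (0, 4, 3, 10) [[6, 8, 10], [26, 28, 30]]
```

The first region covers the whole matrix and reproduces the 2 x 3 result above; the other two regions have different shapes but still produce 2 x 3 outputs, which is exactly what the downstream classifier needs.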

Final Remarks

It is interesting that the authors of SPPNet and Fast R-CNN, Kaiming He and Ross Girshick, were both at Microsoft Research, although Kaiming He was at its Beijing lab.