Lei Mao


I once asked myself: “How do we do backpropagation through a max-pooling layer?” The short answer is: “There is no gradient with respect to non-maximum values.”


Max-pooling is defined as

$$
y = \max(x_1, x_2, \cdots, x_n)
$$

where $y$ is the output and $x_i$ is the value of the $i$-th input neuron.

Alternatively, we can view the max-pooling layer as an affine layer without bias terms. The weight matrix of this layer is not trainable, though: its entries are fixed by which input holds the maximum value.

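Concretely, for the output $y$ after max-pooling, we have

$$
y = \sum_{i=1}^{n} w_i x_i
$$

where

$$
w_i =
\begin{cases}
1 & \text{if } x_i = \max(x_1, x_2, \cdots, x_n) \\
0 & \text{otherwise}
\end{cases}
$$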

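The affine view can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `max_pool_as_affine` is hypothetical, the pooling window is a flat list, and on ties the weight goes to the first maximum, a convention the text above does not specify.

```python
def max_pool_as_affine(x):
    """Return (y, w), where w is a one-hot weight vector and y = sum(w_i * x_i)."""
    # Index of the maximum value (assumption: the first one wins on ties).
    k = max(range(len(x)), key=lambda i: x[i])
    # Non-trainable weights: 1 for the maximum, 0 for everything else.
    w = [1.0 if i == k else 0.0 for i in range(len(x))]
    # Affine layer without bias: a weighted sum of the inputs.
    y = sum(w_i * x_i for w_i, x_i in zip(w, x))
    return y, w


x = [0.5, 2.0, -1.0, 1.5]
y, w = max_pool_as_affine(x)
print(y, w)  # 2.0 [0.0, 1.0, 0.0, 0.0]
```

The weighted sum reproduces `max(x)`, and the weights depend only on the position of the maximum.
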
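The gradient with respect to each neuron is

$$
\frac{\partial y}{\partial x_i} =
\begin{cases}
1 & \text{if } x_i = \max(x_1, x_2, \cdots, x_n) \\
0 & \text{otherwise}
\end{cases}
$$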
This simply means that the gradient is 1 for the neuron with the maximum value and 0 for all the other neurons. By the chain rule, the neurons whose gradients are 0 pass no gradient back to the earlier neurons.
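As a minimal sketch of this rule, the forward and backward passes for a single pooling window might look like the following in plain Python. The function names, the flat-list window, the first-maximum tie-breaking, and the upstream gradient value `3.0` are all assumptions made for illustration; `numerical_grad` is a simple finite-difference check, not part of any library.

```python
def max_pool_forward(x):
    """Return the maximum of x and the index where it occurs (first one on ties)."""
    k = max(range(len(x)), key=lambda i: x[i])
    return x[k], k


def max_pool_backward(grad_y, k, n):
    """Route the upstream gradient grad_y to input k; all other inputs get 0."""
    grad_x = [0.0] * n
    grad_x[k] = grad_y
    return grad_x


def numerical_grad(x, eps=1e-6):
    """Forward-difference estimate of dy/dx_i for y = max(x)."""
    base = max(x)
    grads = []
    for i in range(len(x)):
        bumped = list(x)
        bumped[i] += eps
        grads.append((max(bumped) - base) / eps)
    return grads


x = [0.5, 2.0, -1.0, 1.5]
y, k = max_pool_forward(x)
# The upstream gradient dL/dy = 3.0 is an arbitrary value for illustration.
grad_x = max_pool_backward(3.0, k, len(x))
print(y, k, grad_x)  # 2.0 1 [0.0, 3.0, 0.0, 0.0]
print(numerical_grad(x))  # approximately [0.0, 1.0, 0.0, 0.0]
```

Only the position of the maximum receives the upstream gradient, and the finite-difference estimate agrees that perturbing any other input leaves the output unchanged.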


There is no gradient with respect to non-maximum values.