Parameter Importance Approximation Via Taylor Expansion In Neural Network Pruning
Introduction
In neural network pruning, we usually evaluate the importance of the parameters in a neural network using some criterion and remove the parameters with the smallest importance. Instead of simply taking the absolute value of a parameter as its importance, we can also measure importance as the change in the loss function when the parameter is removed, which is intuitively more meaningful.
However, such an evaluation is usually computationally expensive. In this blog post, I would like to discuss how to approximate the importance of the parameters in a neural network using Taylor expansion in order to accelerate the parameter importance evaluation process.
Parameter Importance Evaluation
The importance of a parameter in a neural network can be quantified by the change in the loss function when the parameter is removed.
The importance of a parameter $w_m$ can be defined as the squared change in the loss when the parameter is set to zero:

$$
\mathcal{I}_m = \left( \mathcal{L}(\mathcal{D}, \mathbf{w}) - \mathcal{L}\left(\mathcal{D}, \mathbf{w}_{\mid w_m = 0}\right) \right)^2
$$

where $\mathcal{L}$ is the loss function, $\mathcal{D}$ is the dataset, $\mathbf{w} \in \mathbb{R}^{n}$ is the vector of all the parameters in the neural network, and $\mathbf{w}_{\mid w_m = 0}$ is the parameter vector whose $m$-th entry is set to zero.
More generally, the parameters can be grouped, and the importance of a group of parameters can be quantified by the change in the loss function when the whole group is set to zero:

$$
\mathcal{I}_{\mathcal{S}} = \left( \mathcal{L}(\mathcal{D}, \mathbf{w}) - \mathcal{L}\left(\mathcal{D}, \mathbf{w}_{\mid \mathbf{w}_{\mathcal{S}} = 0}\right) \right)^2
$$

where $\mathcal{S}$ is the set of indices of the parameters in the group, $\mathbf{w}_{\mathcal{S}}$ is the group of parameters, and $\mathbf{w}_{\mid \mathbf{w}_{\mathcal{S}} = 0}$ is the parameter vector with every parameter in the group set to zero.
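As a concrete illustration, the following is a minimal PyTorch sketch of this non-approximated evaluation; the model, loss function, and data are hypothetical placeholders, and the parameter group is one output channel (one weight row) of a linear layer. Note that every group requires its own extra forward pass.

```python
import torch
import torch.nn as nn

# A hypothetical toy setup; any model, loss function, and data would do.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
loss_fn = nn.MSELoss()
data, target = torch.randn(32, 8), torch.randn(32, 4)


def exact_group_importance(param, row):
    """Squared change in the loss when one row of `param`
    (one output channel of a linear layer) is zeroed out."""
    with torch.no_grad():
        base_loss = loss_fn(model(data), target)
        backup = param[row].clone()
        param[row] = 0.0  # remove the group of parameters
        pruned_loss = loss_fn(model(data), target)
        param[row] = backup  # restore the original parameters
    return (base_loss - pruned_loss).item() ** 2


# Exact importance of output channel 0 of the first linear layer.
print(exact_group_importance(model[0].weight, 0))
```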
Quantifying the importance of all the parameters in a neural network this way can be computationally expensive, because the formula above has to be applied to each parameter or each group of parameters, each application requires a full evaluation of the loss, and the number of parameters or parameter groups in a neural network can be very large.
We need a way to approximate the importance of the parameters in a neural network so that we can reduce the computational cost.
Parameter Importance Approximation Via Taylor Expansion
We define the difference between the loss function of the neural network with all parameters and the loss function of the neural network with the group of parameters $\mathbf{w}_{\mathcal{S}}$ removed as

$$
\Delta \mathcal{L} = \mathcal{L}(\mathcal{D}, \mathbf{w}) - \mathcal{L}\left(\mathcal{D}, \mathbf{w}_{\mid \mathbf{w}_{\mathcal{S}} = 0}\right)
$$

Note that $\mathbf{w}_{\mid \mathbf{w}_{\mathcal{S}} = 0} = \mathbf{w} - \mathbf{w}^{(\mathcal{S})}$, where $\mathbf{w}^{(\mathcal{S})} \in \mathbb{R}^{n}$ is the vector that agrees with $\mathbf{w}$ on the indices in $\mathcal{S}$ and is zero elsewhere.
We can approximate the function

$$
g(t) = \mathcal{L}\left(\mathcal{D}, \mathbf{w} - t \, \mathbf{w}^{(\mathcal{S})}\right)
$$

using its second-order Taylor expansion at $t = 0$:

$$
g(t) \approx g(0) + g^{\prime}(0) \, t + \frac{1}{2} g^{\prime\prime}(0) \, t^{2}
$$
We notice that

$$
g(0) = \mathcal{L}(\mathcal{D}, \mathbf{w})
$$

This function evaluated at $t = 1$ is exactly the loss of the network with the group of parameters removed:

$$
g(1) = \mathcal{L}\left(\mathcal{D}, \mathbf{w} - \mathbf{w}^{(\mathcal{S})}\right) = \mathcal{L}\left(\mathcal{D}, \mathbf{w}_{\mid \mathbf{w}_{\mathcal{S}} = 0}\right)
$$

so $\Delta \mathcal{L} = g(0) - g(1)$.
The first order derivative of $g(t)$ is, by the chain rule,

$$
g^{\prime}(t) = -\nabla_{\mathbf{w}} \mathcal{L}\left(\mathcal{D}, \mathbf{w} - t \, \mathbf{w}^{(\mathcal{S})}\right)^{\top} \mathbf{w}^{(\mathcal{S})}
$$

The first order derivative evaluated at $t = 0$ is therefore

$$
g^{\prime}(0) = -\nabla_{\mathbf{w}} \mathcal{L}(\mathcal{D}, \mathbf{w})^{\top} \mathbf{w}^{(\mathcal{S})} = -\sum_{m \in \mathcal{S}} \frac{\partial \mathcal{L}(\mathcal{D}, \mathbf{w})}{\partial w_m} \, w_m
$$
Similarly, the second order derivative of $g(t)$ is

$$
g^{\prime\prime}(t) = \mathbf{w}^{(\mathcal{S}) \top} \, \mathbf{H}\left(\mathcal{D}, \mathbf{w} - t \, \mathbf{w}^{(\mathcal{S})}\right) \, \mathbf{w}^{(\mathcal{S})}
$$

where $\mathbf{H}$ is the Hessian of the loss function with respect to the parameters. The second order derivative evaluated at $t = 0$ is

$$
g^{\prime\prime}(0) = \mathbf{w}^{(\mathcal{S}) \top} \, \mathbf{H}(\mathcal{D}, \mathbf{w}) \, \mathbf{w}^{(\mathcal{S})}
$$

Because the Hessian is expensive to compute and store for large networks, in practice we usually keep only the first-order term.
Thus, keeping only the first-order term, we have

$$
\Delta \mathcal{L} = g(0) - g(1) \approx -g^{\prime}(0) = \sum_{m \in \mathcal{S}} \frac{\partial \mathcal{L}(\mathcal{D}, \mathbf{w})}{\partial w_m} \, w_m
$$

and the importance of the group of parameters $\mathbf{w}_{\mathcal{S}}$ can be approximated as

$$
\mathcal{I}_{\mathcal{S}} \approx \left( \sum_{m \in \mathcal{S}} \frac{\partial \mathcal{L}(\mathcal{D}, \mathbf{w})}{\partial w_m} \, w_m \right)^{2}
$$
Note that this is just the square of a dot product between the vector of gradients with respect to the group of parameters $\mathbf{w}_{\mathcal{S}}$ and the vector $\mathbf{w}_{\mathcal{S}}$ itself.
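The following minimal PyTorch sketch, reusing the hypothetical model, loss function, and data from the earlier example, computes this first-order approximation for the same parameter group:

```python
def taylor_group_importance(param, row):
    """First-order Taylor approximation of the group importance:
    the squared dot product between the gradient and the parameters."""
    model.zero_grad()
    loss_fn(model(data), target).backward()
    with torch.no_grad():
        # (gradient . parameters)^2, restricted to the group.
        return (param.grad[row] * param[row]).sum().item() ** 2


print(taylor_group_importance(model[0].weight, 0))
```

The discrepancy between this value and the exact importance computed earlier comes from the higher-order Taylor terms that were discarded.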
An alternative way to derive these formulas is to perform Taylor expansion on the loss function directly, around the current parameters $\mathbf{w}$:

$$
\mathcal{L}(\mathcal{D}, \mathbf{w} + \Delta \mathbf{w}) \approx \mathcal{L}(\mathcal{D}, \mathbf{w}) + \nabla_{\mathbf{w}} \mathcal{L}(\mathcal{D}, \mathbf{w})^{\top} \Delta \mathbf{w} + \frac{1}{2} \Delta \mathbf{w}^{\top} \mathbf{H}(\mathcal{D}, \mathbf{w}) \, \Delta \mathbf{w}
$$

When the group of parameters $\mathbf{w}_{\mathcal{S}}$ is removed, the perturbation is $\Delta \mathbf{w} = -\mathbf{w}^{(\mathcal{S})}$, so that

$$
\mathcal{L}\left(\mathcal{D}, \mathbf{w}_{\mid \mathbf{w}_{\mathcal{S}} = 0}\right) \approx \mathcal{L}(\mathcal{D}, \mathbf{w}) - \nabla_{\mathbf{w}} \mathcal{L}(\mathcal{D}, \mathbf{w})^{\top} \mathbf{w}^{(\mathcal{S})} + \frac{1}{2} \mathbf{w}^{(\mathcal{S}) \top} \mathbf{H}(\mathcal{D}, \mathbf{w}) \, \mathbf{w}^{(\mathcal{S})}
$$

Thus, to first order, the change in the loss function is

$$
\Delta \mathcal{L} = \mathcal{L}(\mathcal{D}, \mathbf{w}) - \mathcal{L}\left(\mathcal{D}, \mathbf{w}_{\mid \mathbf{w}_{\mathcal{S}} = 0}\right) \approx \nabla_{\mathbf{w}} \mathcal{L}(\mathcal{D}, \mathbf{w})^{\top} \mathbf{w}^{(\mathcal{S})} = \sum_{m \in \mathcal{S}} \frac{\partial \mathcal{L}(\mathcal{D}, \mathbf{w})}{\partial w_m} \, w_m
$$

which recovers the same first-order importance approximation as before.
Therefore, computing the approximated importance of all the groups of parameters in the neural network requires only a single forward and backward pass to obtain the gradient $\nabla_{\mathbf{w}} \mathcal{L}(\mathcal{D}, \mathbf{w})$; the importance of each group is then just an elementwise product, a sum, and a square, instead of one extra forward pass of the entire network per group.
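Concretely, after one backward pass the importance of every output channel of a layer can be read off at once. Continuing the hypothetical example:

```python
def all_channel_importances(param):
    """First-order importance of every output channel of `param`
    (one weight row per channel), from a single backward pass."""
    model.zero_grad()
    loss_fn(model(data), target).backward()
    with torch.no_grad():
        # Per-row dot product between gradient and parameters, then square.
        return ((param.grad * param).sum(dim=1)) ** 2


print(all_channel_importances(model[0].weight))  # one importance per channel
```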
Conclusions
Structured neural network pruning usually requires grouping the parameters in the neural network and pruning the groups with the smallest importance. The importance of a group of parameters can be approximated using Taylor expansion, which can be much faster than computing it with the original non-approximated formula.