Transposed Convolution as Convolution

Introduction

In my previous blog post “Convolution and Transposed Convolution as Matrix Multiplication”, I discussed how to understand convolution and transposed convolution as matrix multiplications.

It turns out that transposed convolutions could also be implemented as slower convolutions. In this blog post, I would like to discuss about how to understand transposed convolution as convolution.

Transposed Convolution as Convolution

Consider an extremely simple transposed convolution whose kernel $W$ is of shape $(2, 2)$ and stride is $1$. No input padding or output padding. We apply this transposed convolution onto an input tensor $X$ whose shape is $(2, 2)$. This results in an output tensor $Y$ whose shape is $(3, 3)$. Concretely,

$$\begin{gather} X = \begin{bmatrix} x_{1,1} & x_{1,2} \\ x_{2,1} & x_{2,2} \\ \end{bmatrix} \\ W = \begin{bmatrix} w_{1,1} & w_{1,2} \\ w_{2,1} & w_{2,2} \\ \end{bmatrix} \\ Y = W \star X = \begin{bmatrix} x_{1,1}w_{1,1} & x_{1,1}w_{1,2} + x_{1,2}w_{1,1} & x_{1,2}w_{1,2} \\ x_{1,1}w_{2,1} + x_{2,1}w_{1,1} & \begin{split} & x_{1,1}w_{2,2} + x_{1,2}w_{2,1} \\ & + x_{2,1}w_{1,2} + x_{2,2}w_{1,1} \\ \end{split} & x_{1,2}w_{2,2} + x_{2,2}w_{1,2}\\ x_{2,1}w_{2,1} & x_{2,1}w_{2,2} + x_{2,2}w_{2,1} & x_{2,2}w_{2,2} \\ \end{bmatrix} \\ \end{gather}$$

This is equivalent as applying a convolution, whose kernel $W^{\prime}$ is just the flip of the kernel from the transposed convolution, stride is $1$, no input padding, onto an input tensor $X^{\prime}$, which is just a zero-padded $X$.

$$\begin{gather} X^{\prime} = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & x_{1,1} & x_{1,2} & 0 \\ 0 & x_{2,1} & x_{2,2} & 0 \\ 0 & 0 & 0 & 0 \\ \end{bmatrix} \\ W^{\prime} = \begin{bmatrix} w_{2,2} & w_{2,1} \\ w_{1,2} & w_{1,1} \\ \end{bmatrix} \\ Y^{\prime} = W^{\prime} \ast X^{\prime} = \begin{bmatrix} x_{1,1}w_{1,1} & x_{1,1}w_{1,2} + x_{1,2}w_{1,1} & x_{1,2}w_{1,2} \\ x_{1,1}w_{2,1} + x_{2,1}w_{1,1} & \begin{split} & x_{1,1}w_{2,2} + x_{1,2}w_{2,1} \\ & + x_{2,1}w_{1,2} + x_{2,2}w_{1,1} \\ \end{split} & x_{1,2}w_{2,2} + x_{2,2}w_{1,2}\\ x_{2,1}w_{2,1} & x_{2,1}w_{2,2} + x_{2,2}w_{2,1} & x_{2,2}w_{2,2} \\ \end{bmatrix} \\ \end{gather}$$

The equivalency also holds when the transposed convolution becomes more complicated. In the white paper “A Guide to Convolution Arithmetic for Deep Learning”, the authors have summarized the relationship between square transposed convolution and square convolution in Relationship 14. Concretely,

The application of a transposed convolution, whose kernel size is $k$, stride is $s$ and output padding is $p$, on an input tensor of size $i^{\prime}$ and an output tensor of size $i$ has an associated application of a convolution, whose kernel size is $k^{\prime} = k$, stride is $s^{\prime}=1$ and padding is $p^{\prime} = k - p - 1$, on an input tensor which is stretched from the transposed convolution input by adding $s-1$ zeros between each input unit, and $a = (i + 2p - k) \mod s$ row/column or zeros added to the bottom and right edges of the original input.

Implementing Transposed Convolution as Convolution

In this section, we have not only implemented the square transposed convolution as square convolution but also the non-square transposed convolution as non-square convolution. The implementation was verified using random unit tests.

We could see that the all random unit tests have passed, suggesting that the implementation of transposed convolution using convolution is correct.

Lei Mao

11-22-2021

11-22-2021