Cross-Correlation VS Convolution
Introduction
Some of my friends with a background in electrical engineering have told me that computer science people stole the idea of convolution from electrical engineering, applied it to deep learning, and claimed they invented it. They further told me that computer science people did not use the idea correctly: the convolution used in deep learning is not the original convolution used in electrical engineering. In fact, it is cross-correlation instead of convolution.
I have no idea whether computer science people stole the idea of convolution from electrical engineering or not. In my opinion, however, cross-correlation and convolution are mathematically equivalent in a neural network. In this blog post, I would like to go over the definitions and some of the properties of cross-correlation and convolution, and discuss their application in deep learning mathematically.
Cross-Correlation
For continuous complex-valued functions $f$ and $g$, the cross-correlation is defined as
$$
\begin{gather}
(f \ast g)(t) = \int_{-\infty}^{\infty} \overline{f(\tau)} g(t + \tau) d \tau
\end{gather}
$$
Similarly, for discrete sequences, the cross-correlation is defined as
$$
\begin{gather}
(f \ast g)[n] = \sum_{m = -\infty}^{\infty} \overline{f[m]} g[n + m]
\end{gather}
$$
Here we denote the cross-correlation operation using $\ast$.
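To make the discrete definition concrete, here is a minimal sketch (the sequences are made up for illustration) that evaluates the sum directly for finite real-valued sequences, treating values outside the support as zero, and checks the result against scipy.signal.correlate:

```python
import numpy as np
from scipy import signal

# Toy real-valued sequences, taken as zero outside their support.
f = np.array([1.0, 2.0, 3.0])
g = np.array([0.0, 1.0, 0.5, 2.0])

# Evaluate (f * g)[n] = sum_m conj(f[m]) * g[n + m] at every lag n for
# which the summand can be non-zero.
direct = np.array([
    sum(np.conj(f[m]) * g[n + m]
        for m in range(len(f)) if 0 <= n + m < len(g))
    for n in range(-(len(f) - 1), len(g))
])

# scipy.signal.correlate orders the same values by increasing lag.
assert np.allclose(direct, signal.correlate(g, f, mode="full"))
print(direct)
```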
Convolution
For continuous complex-valued functions $f$ and $g$, the convolution is defined as
$$
\begin{gather}
(f \star g)(t) = \int_{-\infty}^{\infty} f(\tau) g(t - \tau) d \tau
\end{gather}
$$
Similarly, for discrete sequences, the convolution is defined as
$$
\begin{gather}
(f \star g)[n] = \sum_{m = -\infty}^{\infty} f[m] g[n - m]
\end{gather}
$$
Here we denote the convolution operation using $\star$.
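A matching sketch for the discrete convolution, reusing the toy sequences from the cross-correlation example above:

```python
import numpy as np
from scipy import signal

# The same toy sequences as in the cross-correlation example above.
f = np.array([1.0, 2.0, 3.0])
g = np.array([0.0, 1.0, 0.5, 2.0])

# Evaluate (f star g)[n] = sum_m f[m] * g[n - m] at every n for which
# the summand can be non-zero.
direct = np.array([
    sum(f[m] * g[n - m] for m in range(len(f)) if 0 <= n - m < len(g))
    for n in range(len(f) + len(g) - 1)
])

assert np.allclose(direct, signal.convolve(f, g, mode="full"))
print(direct)
```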
Cross-Correlation VS Convolution
Cross-correlation and convolution can be converted to each other. Concretely,
$$
\begin{gather}
(f(t) \ast g(t))(t) = (\overline{f(-t)} \star g(t))(t) \\
(f(t) \star g(t))(t) = (\overline{f(-t)} \ast g(t))(t) \\
\end{gather}
$$
Let’s show a quick proof for the continuous case as an example.
Proof
$$
\begin{align}
(\overline{f(-t)} \star g(t))(t) &= \int_{-\infty}^{\infty} \overline{f(-\tau)} g(t - \tau) d \tau \\
&= \int_{-\infty}^{\infty} - \overline{f(-\tau)} g(t - \tau) d (-\tau) \\
&= \int_{\infty}^{-\infty} - \overline{f(\tau^{\prime})} g(t + \tau^{\prime}) d (\tau^{\prime}) \\
&= \int_{-\infty}^{\infty} \overline{f(\tau^{\prime})} g(t + \tau^{\prime}) d (\tau^{\prime}) \\
&= (f(t) \ast g(t))(t) \\
\end{align}
$$
This concludes the proof.
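For completeness, the discrete case follows from the same substitution; as a sketch, with $m^{\prime} = -m$:
$$
\begin{align}
(\overline{f[-n]} \star g[n])[n] &= \sum_{m = -\infty}^{\infty} \overline{f[-m]} g[n - m] \\
&= \sum_{m^{\prime} = -\infty}^{\infty} \overline{f[m^{\prime}]} g[n + m^{\prime}] \\
&= (f[n] \ast g[n])[n] \\
\end{align}
$$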
We can also verify the equivalence numerically using SciPy for the 1D and 2D scenarios.
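A minimal sketch of such a verification (the array sizes and the random seed are arbitrary):

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)

# 1D, complex-valued: cross-correlating with f is the same as convolving
# with the flipped and conjugated f. Note that scipy.signal.correlate
# conjugates its second argument, so correlate(g, f) matches the
# definition of (f * g) above.
f = rng.standard_normal(5) + 1j * rng.standard_normal(5)
g = rng.standard_normal(16) + 1j * rng.standard_normal(16)
assert np.allclose(signal.correlate(g, f, mode="full"),
                   signal.convolve(g, np.conj(f[::-1]), mode="full"))

# 2D, real-valued: the same identity, with the filter flipped along
# both axes.
f2 = rng.standard_normal((3, 3))
g2 = rng.standard_normal((8, 8))
assert np.allclose(signal.correlate2d(g2, f2, mode="full"),
                   signal.convolve2d(g2, f2[::-1, ::-1].copy(), mode="full"))
```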
Convolution in Deep Learning
From the definitions, we can see that the convolutions in deep learning, where the input $g$ and the filter $f$ are real-valued, are in fact cross-correlations.
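This is easy to check empirically. Assuming PyTorch and SciPy are available, the following sketch (the shapes and seed are arbitrary) shows that torch.nn.functional.conv2d agrees with SciPy's cross-correlation, and agrees with SciPy's convolution only after the filter has been flipped:

```python
import numpy as np
import torch
import torch.nn.functional as F
from scipy import signal

torch.manual_seed(0)

# A single-channel input and a single 3 x 3 filter.
x = torch.randn(1, 1, 8, 8)  # (batch, in_channels, height, width)
w = torch.randn(1, 1, 3, 3)  # (out_channels, in_channels, kH, kW)

# PyTorch's "convolution" with stride 1 and no padding.
out_torch = F.conv2d(x, w).squeeze().numpy()

x_np = x.squeeze().numpy()
w_np = w.squeeze().numpy()

# It matches SciPy's cross-correlation ...
assert np.allclose(out_torch,
                   signal.correlate2d(x_np, w_np, mode="valid"),
                   atol=1e-5)

# ... and matches SciPy's convolution only after flipping the filter.
assert np.allclose(out_torch,
                   signal.convolve2d(x_np, w_np[::-1, ::-1].copy(),
                                     mode="valid"),
                   atol=1e-5)
```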
Why, then, is it still somewhat valid to call the cross-correlation in deep learning a convolution? It is because the filter $f$ is learned rather than defined by the user.
Suppose we apply a true convolution $(f \star g)$ and a cross-correlation $(h \ast g)$ in two neural networks with the same training configuration. Based on the transformation between cross-correlation and convolution derived above, we have
$$
\begin{align}
(f[n] \star g[n])[n] &= (\overline{f[-n]} \ast g[n])[n] \\
&= (f[-n] \ast g[n])[n] \\
\end{align}
$$
Here the second equality holds because the filter $f$ is real-valued. So after the two neural networks are trained using the same training configuration, the learned filters for the convolution and the cross-correlation are just flips of each other, i.e., $h[n] = f[-n]$. Because $(f \star g) = (h \ast g)$ for the trained neural networks, it does not matter whether we call the operation convolution or cross-correlation.
In practice, we compute cross-correlation instead of convolution in the implementation, because the mathematical expression for cross-correlation looks more intuitive than that for convolution: no filter flipping is required.