Lei Mao bio photo

Lei Mao

Machine Learning, Artificial Intelligence, Computer Science.

Twitter Facebook LinkedIn GitHub   G. Scholar E-Mail RSS

Introduction

Some of my friends who have some background in electrical engineering sometimes told me that computer science people stole the idea of convolution from electrical engineering and applied it to deep learning and claimed it they invented it. They further told me that computer science people did not use the idea correctly and the convolution used in the deep learning is not the original convolution used in electrical engineering. In fact, it is cross-correlation instead of convolution.


I have no idea whether computer science people stole the convolution idea from electrical engineering or not. But in my opinion, cross-correlation and convolution are mathematically equivalent in a neural network. In this blog post, I would like to go over the definitions and some of the properties of cross-correlation and convolution, and discuss their applications in deep learning mathematically.

Cross-Correlation

For continuous complex-valued functions $f$ and $g$, the cross-correlation is defined as

\[\begin{gather} (f \ast g)(t) = \int_{-\infty}^{\infty} \overline{f(\tau)} g(t + \tau) d \tau \end{gather}\]

Similarly, for discrete sequences, the cross-correlation is defined as

\[\begin{gather} (f \ast g)[n] = \sum_{m = -\infty}^{\infty} \overline{f[m]} g[n + m] \end{gather}\]

Here we denote the cross-correlation operation using $\ast$.

Convolution

For continuous complex-valued functions $f$ and $g$, the convolution is defined as

\[\begin{gather} (f \star g)(t) = \int_{-\infty}^{\infty} f(\tau) g(t - \tau) d \tau \end{gather}\]

Similarly, for discrete sequences, the convolution is defined as

\[\begin{gather} (f \star g)[n] = \sum_{m = -\infty}^{\infty} f[m] g[n - m] \end{gather}\]

Here we denote the convolution operation using $\star$.

Cross-Correlation VS Convolution

Cross-correlation and convolution can be converted to each other. Concretely,

\[\begin{gather} (f(t) \ast g(t))(t) = (\overline{f(-t)} \star g(t))(t) \\ (f(t) \star g(t))(t) = (\overline{f(-t)} \ast g(t))(t) \\ \end{gather}\]

Let’s show a quick proof for the continuous functions as an example.


Proof

\[\begin{align} (\overline{f(-t)} \star g(t))(t) &= \int_{-\infty}^{\infty} \overline{f(-\tau)} g(t - \tau) d \tau \\ &= \int_{-\infty}^{\infty} - \overline{f(-\tau)} g(t - \tau) d (-\tau) \\ &= \int_{\infty}^{-\infty} - \overline{f(\tau^{\prime})} g(t + \tau^{\prime}) d (\tau^{\prime}) \\ &= \int_{-\infty}^{\infty} \overline{f(\tau^{\prime})} g(t + \tau^{\prime}) d (\tau^{\prime}) \\ &= (f(t) \ast g(t))(t) \\ \end{align}\]

This concludes the proof.


We could also verify the equivalence using Scipy for the 1D and 2D scenarios.

from typing import Tuple, Union
import numpy as np
import scipy.signal


def random_complex_array(size: Union[int, Tuple[int, int]]) -> np.ndarray:

    return np.random.random(size) + np.random.random(size) * 1j


def correlation_vs_convolution_1d(size: int = 128) -> None:

    signal = random_complex_array(size)
    window = random_complex_array(size)

    # Assert some properties
    # [f(t) * g(t)](t) == [g^{\dag}(t) * f^{\dag}(t)](-t)
    assert np.allclose(
        scipy.signal.correlate(in1=signal, in2=window),
        np.flip(
            scipy.signal.correlate(in1=np.conjugate(window),
                                   in2=np.conjugate(signal))),
        rtol=1e-10,
        atol=1e-10,
    )
    # [f(t) * g(t)](t) == [f^{\dag}(-t) # g(t)](t)
    assert np.allclose(
        scipy.signal.correlate(in1=signal, in2=window),
        scipy.signal.convolve(in1=signal, in2=np.flip(np.conjugate(window))),
        rtol=1e-10,
        atol=1e-10,
    )


def correlation_vs_convolution_2d(size: Tuple[int, int] = (32, 32)) -> None:

    signal = random_complex_array(size)
    window = random_complex_array(size)

    # Assert some properties
    # [f(t) * g(t)](t) == [g^{\dag}(t) * f^{\dag}(t)](-t)
    assert np.allclose(
        scipy.signal.correlate2d(in1=signal, in2=window),
        np.flip(
            scipy.signal.correlate2d(in1=np.conjugate(window),
                                     in2=np.conjugate(signal))),
        rtol=1e-10,
        atol=1e-10,
    )
    # [f(t) * g(t)](t) == [f^{\dag}(-t) # g(t)](t)
    assert np.allclose(
        scipy.signal.correlate2d(in1=signal, in2=window),
        scipy.signal.convolve2d(in1=signal, in2=np.flip(np.conjugate(window))),
        rtol=1e-10,
        atol=1e-10,
    )


def main() -> None:

    correlation_vs_convolution_1d()
    correlation_vs_convolution_2d()


if __name__ == "__main__":

    main()

Convolution in Deep Learning

From the definitions, we could see that actually the convolutions in deep learning, where the input $g$ and filter $f$ are real-valued, are in fact cross-correlations.


Then why calling cross-correlation as convolution in deep learning is still somewhat valid? This is because the filter $f$ is learned instead of defined by the user.


We could apply real convolutions $(f \star g)$ and cross-correlations $(h \ast g)$ in two neural networks with the same training configuration. Based on the equivalent transformation between cross-correlation and convolution we have derived above, we have

\[\begin{align} (f[n] \star g[n])[n] &= (\overline{f[-n]} \ast g[n])[n] \\ &= (f[-n] \ast g[n])[n] \\ \end{align}\]

So obviously, after the neural networks are trained using the same training configurations, the trained filters for the convolution and the cross-correlation are just the flip of each other, i.e., $h[n] = f[-n]$. Because $(f \star g) = (h \ast g)$ in trained neural networks with the same training configuration, so it does matter whether we call it convolution or cross-correlation.


In practice, we do cross-correlation instead of convolution in the implementation, because the mathematical expression for cross-correlation just look more intuitive than that for convolution.

References