Covariance VS Pearson Correlation Coefficient
Introduction
Covariance and the Pearson correlation coefficient appear frequently in multivariate statistical analysis. In this blog post, I would like to quickly introduce their mathematical definitions, the intuition behind them, and some of their properties.
Covariance
For two jointly distributed real-valued random variables $X$ and $Y$ with finite second moments, the covariance is defined as the expected value (or mean) of the product of their deviations from their individual expected values.
$$
\begin{align}
\sigma_{XY}
&\equiv \text{cov}(X,Y) \\
&= \mathbb{E}\Big[ \big(X - \mathbb{E}[X]\big) \big(Y - \mathbb{E}[Y]\big) \Big]
\end{align}
$$
Covariance can be positive, zero, or negative, and its sign and magnitude are straightforward to interpret. The more often $X$ and $Y$ deviate from their respective means in the same direction, and the larger those deviations are, the more positive the covariance is. The more often they deviate in opposite directions, and the larger those deviations are, the more negative the covariance is. When neither tendency dominates, the covariance is close to zero.
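To make the three cases concrete, here is a minimal sketch that applies the definition to a handful of numbers, treating the listed values as an equally weighted empirical distribution. The helper name `covariance` and the toy data are assumptions made for illustration, not part of the original post.

```python
import numpy as np


def covariance(x, y):
    """Mean of the product of deviations from the means (divides by N)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((x - x.mean()) * (y - y.mean())))


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# y moves with x: deviations share a sign, so the covariance is positive.
cov_same = covariance(x, 2.0 * x)
# y moves against x: deviations have opposite signs, so it is negative.
cov_opposite = covariance(x, -2.0 * x)
# No linear trend: positive and negative products cancel out.
cov_none = covariance(x, np.array([1.0, -1.0, 0.0, -1.0, 1.0]))

print(cov_same, cov_opposite, cov_none)  # 4.0 -4.0 0.0
```

The helper divides by $N$, matching the expectation in the definition. `np.cov` divides by $N - 1$ by default, so pass `bias=True` to reproduce the same numbers.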
Pearson Correlation Coefficient
It seems that covariance could serve as an indicator of correlation, i.e., the linear relationship, between two random variables. However, covariance is unbounded and scales with the units of $X$ and $Y$, so its magnitude alone does not tell us how strong the relationship is, and the covariances of different pairs of variables cannot be compared directly.
The Pearson correlation coefficient is an indicator of correlation in the range $[-1, 1]$, obtained by “normalizing” covariance with the standard deviations, which are assumed to be nonzero. Concretely,
$$
\begin{align}
\rho_{XY}
&\equiv \text{corr}(X,Y) \\
&= \frac{\text{cov}(X,Y)}{\sigma_{X} \sigma_{Y}} \\
&= \frac{\mathbb{E}\Big[ \big(X - \mathbb{E}[X]\big) \big(Y - \mathbb{E}[Y]\big) \Big]}{\sqrt{\mathbb{E}\Big[ \big(X - \mathbb{E}[X]\big)^2 \Big]} \sqrt{\mathbb{E}\Big[ \big(Y - \mathbb{E}[Y]\big)^2 \Big]}} \\
\end{align}
$$
The correlation coefficient is +1 in the case of a perfect direct (increasing) linear relationship (correlation), -1 in the case of a perfect inverse (decreasing) linear relationship (anti-correlation), and some value in the open interval $(-1, 1)$ in all other cases, indicating the degree of linear dependence between the variables.
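The following sketch illustrates these properties numerically. It is an illustration under stated assumptions rather than a reference implementation: the helper `pearson`, the random seed, and the synthetic data are all made up for this example.

```python
import numpy as np


def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.mean(dx * dy) / np.sqrt(np.mean(dx**2) * np.mean(dy**2)))


rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)  # noisy linear relationship

rho = pearson(x, y)
# Shifting and positively rescaling either variable changes the covariance
# but leaves the correlation coefficient unchanged (up to rounding).
rho_rescaled = pearson(1000.0 * x + 3.0, 0.01 * y - 7.0)
# Perfect increasing and decreasing linear relationships.
rho_perfect_up = pearson(x, 2.0 * x + 1.0)
rho_perfect_down = pearson(x, -2.0 * x + 1.0)

print(rho, rho_rescaled, rho_perfect_up, rho_perfect_down)
```

Up to floating-point rounding, the first two values agree and the last two are $+1$ and $-1$. `np.corrcoef(x, y)[0, 1]` should return the same value as `pearson(x, y)`.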
That the Pearson correlation coefficient lies in the range $[-1, 1]$ can be proven using the Cauchy-Schwarz inequality.
Proof
Given real random variables $X$ and $Y$, we define
$$
\langle X, Y \rangle \equiv \mathbb{E}[XY]
$$
Using the definition of $\langle X, Y \rangle$, it is not hard to verify that it satisfies the basic properties of an inner product: symmetry, linearity in each argument, and positive-definiteness (the last one holds up to almost-sure equality, since $\mathbb{E}[X^2] = 0$ only implies $X = 0$ almost surely). Therefore, $\mathbb{E}[XY]$ is an inner product on the real vector space of random variables with finite second moments.
Then we can apply the Cauchy-Schwarz inequality, which states that for all vectors $\mathbf{u}$ and $\mathbf{v}$ of an inner product space,
$$
\lvert \langle \mathbf{u}, \mathbf{v} \rangle \rvert^2 \leq \langle \mathbf{u}, \mathbf{u} \rangle \cdot \langle \mathbf{v}, \mathbf{v} \rangle
$$
In terms of two real random variables $X$ and $Y$,
$$
\begin{gather}
\lvert \langle X, Y \rangle \rvert^2 \leq \langle X, X \rangle \cdot \langle Y, Y \rangle \\
\end{gather}
$$
Applying this to the centered real random variables $X - \mathbb{E}[X]$ and $Y - \mathbb{E}[Y]$,
$$
\begin{gather}
\lvert \langle X - \mathbb{E}[X], Y - \mathbb{E}[Y] \rangle \rvert^2 \leq \langle X - \mathbb{E}[X], X - \mathbb{E}[X] \rangle \cdot \langle Y - \mathbb{E}[Y], Y - \mathbb{E}[Y] \rangle \\
\end{gather}
$$
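By applying the definition of the inner product and then, assuming both variances are nonzero, dividing both sides by the right-hand side, we have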
$$
\begin{gather}
\mathbb{E}\Big[ \big(X - \mathbb{E}[X]\big) \big(Y - \mathbb{E}[Y]\big) \Big] ^2 \leq \mathbb{E}\Big[ \big(X - \mathbb{E}[X]\big)^2 \Big] \mathbb{E}\Big[ \big(Y - \mathbb{E}[Y]\big)^2 \Big] \\
\frac{\mathbb{E}\Big[ \big(X - \mathbb{E}[X]\big) \big(Y - \mathbb{E}[Y]\big) \Big] ^2}{\mathbb{E}\Big[ \big(X - \mathbb{E}[X]\big)^2 \Big] \mathbb{E}\Big[ \big(Y - \mathbb{E}[Y]\big)^2 \Big]} \leq 1 \\
\end{gather}
$$
The left-hand side is just the square of the Pearson correlation coefficient, $\rho_{XY}^2$. Therefore
$$
\begin{gather}
\rho_{XY}^2 \leq 1 \\
\end{gather}
$$
and
$$
\begin{gather}
-1 \leq \rho_{XY} \leq 1 \\
\end{gather}
$$
This concludes the proof.
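As a sanity check on the key inequality, the sketch below replaces the expectation with an average over a finite sample, which is itself an inner product on the sample vectors, and tests the Cauchy-Schwarz bound on many random datasets. The function name `inner`, the distributions, and the tolerance are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)


def inner(u, v):
    """Empirical stand-in for <U, V> = E[UV]: mean of the elementwise product."""
    return float(np.mean(u * v))


violations = 0
for _ in range(1000):
    x = rng.exponential(size=50)
    y = rng.normal(size=50) + rng.uniform(-2.0, 2.0) * x
    dx, dy = x - x.mean(), y - y.mean()  # centered variables
    lhs = inner(dx, dy) ** 2
    rhs = inner(dx, dx) * inner(dy, dy)
    # Allow a tiny relative tolerance for floating-point rounding.
    if lhs > rhs * (1.0 + 1e-12):
        violations += 1

print(violations)  # 0
```

A numerical check like this is no substitute for the proof, but it is a quick way to catch a sign or normalization mistake when implementing the formulas.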
The Pearson correlation coefficient can also be written in the following form, which looks simpler and follows from the linearity of expectation.
$$
\begin{align}
\rho_{XY}
&= \frac{\mathbb{E}[XY] - \mathbb{E}[X] \mathbb{E}[Y] }{\sqrt{\mathbb{E}[X^2] - \mathbb{E}[X]^2} \sqrt{\mathbb{E}[Y^2] - \mathbb{E}[Y]^2}} \\
\end{align}
$$
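For completeness, the numerator of this form follows from expanding the product inside the covariance and using the linearity of expectation, with $\mathbb{E}[X]$ and $\mathbb{E}[Y]$ treated as constants:

$$
\begin{align}
\mathbb{E}\Big[ \big(X - \mathbb{E}[X]\big) \big(Y - \mathbb{E}[Y]\big) \Big]
&= \mathbb{E}\Big[ XY - X \mathbb{E}[Y] - \mathbb{E}[X] Y + \mathbb{E}[X] \mathbb{E}[Y] \Big] \\
&= \mathbb{E}[XY] - \mathbb{E}[X] \mathbb{E}[Y] - \mathbb{E}[X] \mathbb{E}[Y] + \mathbb{E}[X] \mathbb{E}[Y] \\
&= \mathbb{E}[XY] - \mathbb{E}[X] \mathbb{E}[Y]
\end{align}
$$

Setting $Y = X$ gives $\mathbb{E}\big[ (X - \mathbb{E}[X])^2 \big] = \mathbb{E}[X^2] - \mathbb{E}[X]^2$, which yields each factor in the denominator.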
References
Covariance VS Pearson Correlation Coefficient
https://leimao.github.io/blog/Covariance-VS-Pearson-Correlation-Coefficient/