Cosine Similarity VS Pearson Correlation Coefficient
Introduction
I have seen people get confused about the difference between the cosine similarity and the Pearson correlation coefficient, because their mathematical definitions look somewhat similar.
In this blog post, I would like to quickly discuss the definitions of the cosine similarity and the Pearson correlation coefficient and the difference between them.
Cosine Similarity
The cosine similarity computes the similarity between two samples. The two samples can be drawn from the same distribution or from different distributions, but they must have the same number of features.
Given two sample feature vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$, $\mathbf{x} = (x_1, x_2, \cdots, x_n)$, $\mathbf{y} = (y_1, y_2, \cdots, y_n)$, the cosine similarity is defined as
$$
\begin{align}
\cos(\theta) &= \frac{\mathbf{x} \cdot \mathbf{y}}{\left\Vert \mathbf{x} \right\Vert \left\Vert \mathbf{y} \right\Vert} \\
&= \frac{ \sum_{i=1}^{n} x_i y_i }{\sqrt{\sum_{i=1}^{n} x_i^2 } \sqrt{\sum_{i=1}^{n} y_i^2 }} \\
\end{align}
$$
The cosine similarity ranges from $-1$ to $1$. $1$ means the two samples are the most similar and $-1$ means the two samples are the least similar. If in addition we know that $\mathbf{x}$ and $\mathbf{y}$ are unit vectors, or that $\left\Vert \mathbf{x} \right\Vert = \left\Vert \mathbf{y} \right\Vert$, then $1$ means the two samples are identical and $-1$ means the two samples are exact opposites.
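As a concrete illustration, below is a minimal NumPy sketch of the cosine similarity between two feature vectors. The function name `cosine_similarity` and the random test vectors are illustrative assumptions, not part of any particular library.

```python
import numpy as np


def cosine_similarity(x: np.ndarray, y: np.ndarray) -> float:
    # cos(theta) = (x . y) / (||x|| * ||y||)
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))


rng = np.random.default_rng(seed=0)
x = rng.standard_normal(8)
y = rng.standard_normal(8)
print(cosine_similarity(x, y))
# A vector is maximally similar to itself and maximally dissimilar to its negation.
assert np.isclose(cosine_similarity(x, x), 1.0)
assert np.isclose(cosine_similarity(x, -x), -1.0)
```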
Pearson Correlation
The Pearson correlation coefficient computes the correlation between two jointly distributed random variables. Suppose we draw $n$ samples from the joint distribution of a bivariate random variable $(X, Y)$.
Given $n$ samples consisting of two features, $\{ (x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n) \}$, the Pearson correlation coefficient is defined as
$$
\begin{align}
\rho_{X, Y} &= \frac{\text{cov}(X, Y)}{ \sigma_{X} \sigma_{Y} } \\
&= \frac{\mathbb{E}[(X - \mu_X)(Y - \mu_Y)] }{ \sigma_{X} \sigma_{Y} } \\
&= \frac{\mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]}{ \sqrt{\mathbb{E}[X^2] - (\mathbb{E}[X])^2} \sqrt{\mathbb{E}[Y^2] - (\mathbb{E}[Y])^2} } \\
&= \frac{ \big(\sum_{i=1}^{n} x_i y_i\big) - \big(n \bar{x}\bar{y} \big) }{ \sqrt{ \sum_{i=1}^{n} x_i^2 - n \bar{x}^2} \sqrt{\sum_{i=1}^{n} y_i^2 - n \bar{y}^2} } \\
\end{align}
$$
Note that the last line above is the sample estimate of $\rho_{X, Y}$, obtained by replacing the expectations with sample means computed from the $n$ samples. The Pearson correlation coefficient ranges from $-1$ to $1$. $1$ means the two random variables are perfectly positively correlated, $-1$ means the two random variables are perfectly negatively correlated, and $0$ means the two random variables are linearly uncorrelated.
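As a sanity check, here is a minimal sketch that evaluates the last expression above directly and compares it against NumPy's built-in estimator `np.corrcoef`; the function name `pearson_correlation` and the synthetic data are illustrative assumptions.

```python
import numpy as np


def pearson_correlation(x: np.ndarray, y: np.ndarray) -> float:
    # Sample estimate: (sum(x_i y_i) - n x_bar y_bar) /
    # (sqrt(sum(x_i^2) - n x_bar^2) * sqrt(sum(y_i^2) - n y_bar^2))
    n = len(x)
    numerator = np.sum(x * y) - n * x.mean() * y.mean()
    denominator = np.sqrt(np.sum(x**2) - n * x.mean() ** 2) * np.sqrt(
        np.sum(y**2) - n * y.mean() ** 2
    )
    return float(numerator / denominator)


rng = np.random.default_rng(seed=0)
x = rng.standard_normal(100)
y = 0.5 * x + 0.1 * rng.standard_normal(100)
# The hand-rolled sample estimate should match NumPy's built-in estimator.
assert np.isclose(pearson_correlation(x, y), np.corrcoef(x, y)[0, 1])
```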
Cosine Similarity VS Pearson Correlation
Someone might try to compare the cosine similarity with the Pearson correlation coefficient and ask what the difference between them is. If it happens that $\mathbb{E}[X] = \mathbb{E}[Y] = 0$ and $\bar{x} = \bar{y} = 0$, the Pearson correlation coefficient becomes
$$
\begin{align}
\rho_{X, Y}
&= \frac{ \sum_{i=1}^{n} x_i y_i }{ \sqrt{ \sum_{i=1}^{n} x_i^2} \sqrt{\sum_{i=1}^{n} y_i^2 } } \\
\end{align}
$$
It seems that the Pearson correlation coefficient has "reduced" to the cosine similarity. Someone might even claim that the cosine similarity is a special case of the Pearson correlation coefficient.
This is incorrect. The reason is simple: the two quantities measure conceptually different things. The cosine similarity computes the similarity between two fixed samples, whereas the Pearson correlation coefficient computes the correlation between two jointly distributed random variables, which the $n$ samples merely estimate.
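A quick numerical check makes the algebraic coincidence concrete. The sketch below repeats the two illustrative functions from above (an assumption, not a library API): once the sample means are subtracted from $\mathbf{x}$ and $\mathbf{y}$, the two formulas produce the same number, yet one value is read as the similarity between two fixed vectors and the other as an estimate of the correlation between two random variables.

```python
import numpy as np


def cosine_similarity(x: np.ndarray, y: np.ndarray) -> float:
    # Same sketch as above: cosine of the angle between the two vectors.
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))


def pearson_correlation(x: np.ndarray, y: np.ndarray) -> float:
    # Same sketch as above: sample Pearson correlation coefficient.
    n = len(x)
    numerator = np.sum(x * y) - n * x.mean() * y.mean()
    denominator = np.sqrt(np.sum(x**2) - n * x.mean() ** 2) * np.sqrt(
        np.sum(y**2) - n * y.mean() ** 2
    )
    return float(numerator / denominator)


rng = np.random.default_rng(seed=0)
x = rng.standard_normal(1000)
y = 0.8 * x + 0.2 * rng.standard_normal(1000)

# Subtracting the sample means forces x_bar = y_bar = 0, so the two
# formulas evaluate to the same number, even though they answer
# different questions about different objects.
x_centered = x - x.mean()
y_centered = y - y.mean()
assert np.isclose(
    cosine_similarity(x_centered, y_centered), pearson_correlation(x, y)
)
print(cosine_similarity(x_centered, y_centered), pearson_correlation(x, y))
```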
References
Cosine Similarity VS Pearson Correlation Coefficient
https://leimao.github.io/blog/Cosine-Similarity-VS-Pearson-Correlation-Coefficient/