Lei Mao

Introduction

I have seen people confuse the cosine similarity with the Pearson correlation coefficient, because their mathematical definitions look somewhat similar.


In this blog post, I would like to quickly discuss the definitions of the cosine similarity and the Pearson correlation coefficient, and the difference between them.

Cosine Similarity

The cosine similarity computes the similarity between two samples. The two samples can be obtained from the same distribution or different distributions. The two samples should have the same number of features.


Given two sample feature vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$, $\mathbf{x} = \{x_1, x_2, \cdots, x_n\}$, $\mathbf{y} = \{y_1, y_2, \cdots, y_n\}$, the cosine similarity is defined as

\[\begin{align} \cos(\theta) &= \frac{\mathbf{x} \cdot \mathbf{y}}{\left\Vert \mathbf{x} \right\Vert \left\Vert \mathbf{y} \right\Vert} \\ &= \frac{ \sum_{i=1}^{n} x_i y_i }{\sqrt{\sum_{i=1}^{n} x_i^2 } \sqrt{\sum_{i=1}^{n} y_i^2 }} \\ \end{align}\]

The cosine similarity ranges from $-1$ to $1$. $1$ means the two samples point in the same direction and are the most similar, and $-1$ means the two samples point in opposite directions and are the least similar. If we additionally know that $\mathbf{x}$ and $\mathbf{y}$ are unit vectors, or more generally that $\left\Vert \mathbf{x} \right\Vert = \left\Vert \mathbf{y} \right\Vert$, then $1$ means the two samples are identical and $-1$ means the two samples are exact opposites, $\mathbf{x} = -\mathbf{y}$.
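The definition above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library; the function name `cosine_similarity` and the example vectors are my own choices, and the sketch assumes both vectors are nonzero.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length, nonzero feature vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of features")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


x = [1.0, 2.0, 3.0]

# A rescaled copy of x points in the same direction as x.
same_direction = cosine_similarity(x, [2.0 * a for a in x])

# The negated vector points in the opposite direction.
opposite_direction = cosine_similarity(x, [-a for a in x])

# Two perpendicular vectors.
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Note that `same_direction` is $1$ (up to floating point error) even though the two vectors are not identical, which is why the equal-norm condition is needed before reading $1$ as "identical".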

Pearson Correlation

The Pearson correlation coefficient measures the linear correlation between two jointly distributed random variables. Suppose we draw $n$ samples from the joint distribution of a bivariate random variable $(X, Y)$.


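Given $n$ samples consisting of two features, $\{ (x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n) \}$, with sample means $\bar{x}$ and $\bar{y}$, the Pearson correlation coefficient is defined as follows, where the last line is the estimate computed from the samples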

\[\begin{align} \rho_{X, Y} &= \frac{\text{cov}(X, Y)}{ \sigma_{X} \sigma_{Y} } \\ &= \frac{\mathbb{E}[(X - \mu_X)(Y - \mu_Y)] }{ \sigma_{X} \sigma_{Y} } \\ &= \frac{\mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]}{ \sqrt{\mathbb{E}[X^2] - (\mathbb{E}[X])^2} \sqrt{\mathbb{E}[Y^2] - (\mathbb{E}[Y])^2} } \\ &= \frac{ \big(\sum_{i=1}^{n} x_i y_i\big) - \big(n \bar{x}\bar{y} \big) }{ \sqrt{ \sum_{i=1}^{n} x_i^2 - n \bar{x}^2} \sqrt{\sum_{i=1}^{n} y_i^2 - n \bar{y}^2} } \\ \end{align}\]

The Pearson correlation coefficient ranges from $-1$ to $1$. $1$ means the two random variables are perfectly positively correlated, $-1$ means the two random variables are perfectly negatively correlated, and $0$ means the two random variables are not linearly correlated.
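As a rough sketch, the sample form of the Pearson correlation coefficient (the last line of the derivation) can be computed as follows. The function name `pearson_correlation` and the toy data are my own assumptions, and the sketch assumes neither feature is constant, so the denominator is nonzero.

```python
import math


def pearson_correlation(xs, ys):
    """Sample Pearson correlation of n paired observations (x_i, y_i)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of samples")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum(x * y for x, y in zip(xs, ys)) - n * mean_x * mean_y
    denom_x = math.sqrt(sum(x * x for x in xs) - n * mean_x ** 2)
    denom_y = math.sqrt(sum(y * y for y in ys) - n * mean_y ** 2)
    return numerator / (denom_x * denom_y)


xs = [1.0, 2.0, 3.0, 4.0]

# y is an increasing linear function of x.
perfect_positive = pearson_correlation(xs, [2.0 * x + 1.0 for x in xs])

# y is a decreasing linear function of x.
perfect_negative = pearson_correlation(xs, [-3.0 * x + 5.0 for x in xs])

# y = x^2 on a symmetric range: y depends on x, but not linearly.
no_linear_correlation = pearson_correlation([-1.0, 0.0, 1.0], [1.0, 0.0, 1.0])
```

The last example is a reminder that a coefficient of $0$ only rules out a linear relationship, not dependence in general.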

Cosine Similarity VS Pearson Correlation

Someone might try to compare the cosine similarity and the Pearson correlation coefficient and ask what the difference between them is. If somehow $\mathbb{E}[X] = \mathbb{E}[Y] = 0$ and $\bar{x} = \bar{y} = 0$, the Pearson correlation coefficient becomes

\[\begin{align} \rho_{X, Y} &= \frac{ \sum_{i=1}^{n} x_i y_i }{ \sqrt{ \sum_{i=1}^{n} x_i^2} \sqrt{\sum_{i=1}^{n} y_i^2 } } \\ \end{align}\]
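This algebraic coincidence can be checked numerically. The sketch below, with helper names and data of my own choosing, treats the $n$ observations of each feature as a length-$n$ vector, subtracts the sample mean from each, and compares the resulting cosine formula with the Pearson formula.

```python
import math


def cosine_similarity(x, y):
    """Cosine formula applied to two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson_correlation(xs, ys):
    """Sample Pearson correlation of n paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum(x * y for x, y in zip(xs, ys)) - n * mean_x * mean_y
    denom_x = math.sqrt(sum(x * x for x in xs) - n * mean_x ** 2)
    denom_y = math.sqrt(sum(y * y for y in ys) - n * mean_y ** 2)
    return numerator / (denom_x * denom_y)


def center(values):
    """Subtract the sample mean so that the result has zero mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


xs = [1.0, 2.0, 4.0, 7.0]
ys = [3.0, 1.0, 5.0, 6.0]

r = pearson_correlation(xs, ys)
cos_raw = cosine_similarity(xs, ys)                      # means are not zero
cos_centered = cosine_similarity(center(xs), center(ys))  # means are zero
```

With this data, `cos_centered` agrees with `r` up to floating point error, while `cos_raw` does not, so the two formulas only coincide under the zero-mean condition.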

It seems that the Pearson correlation coefficient has "reduced" to the cosine similarity. Someone might even claim that the cosine similarity is a special case of the Pearson correlation coefficient.


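This is incorrect, and the reason is simple. The two quantities measure two different things. The cosine similarity computes the similarity between two samples, so its sums run over the $n$ features of the two samples. The Pearson correlation coefficient computes the correlation between two jointly distributed random variables, so its sums run over the $n$ samples of the two features. The formulas may coincide, but the index $i$ means something different in each.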
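One way to picture the distinction is a small data matrix whose rows are samples and whose columns are features. In the sketch below, which uses made-up data and helper names of my own, the cosine similarity compares two rows while the Pearson correlation coefficient compares two columns.

```python
import math


def cosine_similarity(x, y):
    """Similarity between two samples (two rows of the data matrix)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson_correlation(xs, ys):
    """Correlation between two features (two columns of the data matrix)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum(x * y for x, y in zip(xs, ys)) - n * mean_x * mean_y
    denom_x = math.sqrt(sum(x * x for x in xs) - n * mean_x ** 2)
    denom_y = math.sqrt(sum(y * y for y in ys) - n * mean_y ** 2)
    return numerator / (denom_x * denom_y)


# Hypothetical data: 4 samples (rows), each with 3 features (columns).
data = [
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 0.4],
    [3.0, 5.9, 0.9],
    [4.0, 8.2, 0.1],
]

# Cosine similarity: two samples, each a vector over the 3 features.
sample_a, sample_b = data[0], data[1]
similarity = cosine_similarity(sample_a, sample_b)

# Pearson correlation: two features, each observed across the 4 samples.
feature_0 = [row[0] for row in data]
feature_1 = [row[1] for row in data]
correlation = pearson_correlation(feature_0, feature_1)
```

Here the cosine similarity sums over 3 features and the Pearson correlation coefficient sums over 4 samples, so the two numbers answer different questions about the same data.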

References