Lei Mao

Machine Learning, Artificial Intelligence, Computer Science.

Multivariate Jensen's Inequality

Introduction

Jensen’s inequality could be used for proving a lot of useful mathematical properties. Jensen’s inequality for the univariate case is very common and is relatively simple to prove. In addition, there is also a more generalized multivariate Jensen’s inequality, and I was not able to find any proof from the Internet.

In this blog post, I would like to quickly derive the proof to the univariate and multivariate Jensen’s inequality.

Jensen’s Inequality

If $X \in \mathbb{R}$ is a random variable, $f: \mathbb{R} \rightarrow \mathbb{R}$ is a convex function, and $X$ is defined on $f$ then

$f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]$

Similarly, if $f$ is a concave function, then

$f(\mathbb{E}[X]) \geq \mathbb{E}[f(X)]$

Proof to Jensen’s Inequality

When $f$ is a convex function, according to the definition, it must be differentiable and its derivative $f^{\prime}$ must be defined.

For any $x$ and $y$ that are define for $f$, we must have

$f(x) \geq f(y) + f^{\prime}(y) (x - y)$

Let $x = X$ and $y = \mathbb{E}(X)$, we have

$f(X) \geq f(\mathbb{E}(X)) + f^{\prime}(\mathbb{E}(X)) (X - \mathbb{E}(X))$

We take the expected value on both side of the equation, we have

\begin{align} \mathbb{E}[ f(X) ] &\geq \mathbb{E}[ f(\mathbb{E}(X)) + f^{\prime}(\mathbb{E}(X)) (X - \mathbb{E}(X)) ] \\ &= \mathbb{E}[ f(\mathbb{E}(X))] + \mathbb{E}[f^{\prime}(\mathbb{E}(X)) (X - \mathbb{E}(X)) ] \\ &= f(\mathbb{E}(X)) + f^{\prime}(\mathbb{E}(X)) \mathbb{E}[ X - \mathbb{E}(X) ] \\ &= f(\mathbb{E}(X)) + f^{\prime}(\mathbb{E}(X)) ( \mathbb{E}[ X ] - \mathbb{E}[ \mathbb{E}(X) ] ) \\ &= f(\mathbb{E}(X)) + f^{\prime}(\mathbb{E}(X)) ( \mathbb{E}[ X ] - \mathbb{E}(X) ) \\ &= f(\mathbb{E}(X)) \\ \end{align}

Therefore, when $f$ is a convex function, we have

$f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]$

Similarly, we could also prove that when $f$ is a concave function, we have

$f(\mathbb{E}[X]) \geq \mathbb{E}[f(X)]$

This concludes the proof.

Multivariate Jensen’s Inequality

If $\mathbf{X} \in \mathbb{R}^n$ is random multivariate variable and $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is a convex function, then

$f(\mathbb{E}[\mathbf{X}]) \leq \mathbb{E}[f(\mathbf{X})]$

Similarly, if $f$ is a concave function, then

$f(\mathbb{E}[\mathbf{X}]) \geq \mathbb{E}[f(\mathbf{X})]$

Proof to Multivariate Jensen’s Inequality

The proof is very similar to the proof to the univariate Jensen’s inequality.

For any $\mathbf{x}$ and $\mathbf{y}$ that are define for $f$, we must have

$f(\mathbf{x}) \geq f(\mathbf{y}) + \nabla(\mathbf{y}) (\mathbf{x} - \mathbf{y})$

Let $\mathbf{x} = \mathbf{X}$ and $\mathbf{y} = \mathbb{E}(\mathbf{X})$, we have

$f(\mathbf{X}) \geq f(\mathbb{E}(\mathbf{X})) + \nabla(\mathbb{E}(\mathbf{X})) (\mathbf{X} - \mathbb{E}(\mathbf{X}))$

We take the expected value on both side of the equation, we have

\begin{align} \mathbb{E}[ f(\mathbf{X}) ] &\geq \mathbb{E}[ f(\mathbb{E}(\mathbf{X})) + \nabla(\mathbb{E}(\mathbf{X})) (\mathbf{X} - \mathbb{E}(\mathbf{X})) ] \\ &= \mathbb{E}[ f(\mathbb{E}(\mathbf{X})) ] + \mathbb{E}[ \nabla(\mathbb{E}(\mathbf{X})) (\mathbf{X} - \mathbb{E}(\mathbf{X})) ] \\ &= f(\mathbb{E}(\mathbf{X})) + \nabla(\mathbb{E}(\mathbf{X})) \mathbb{E}[ (\mathbf{X} - \mathbb{E}(\mathbf{X})) ] \\ &= f(\mathbb{E}(\mathbf{X})) + \nabla(\mathbb{E}(\mathbf{X})) (\mathbb{E}[\mathbf{X}] - \mathbb{E}[\mathbb{E}(\mathbf{X}])) \\ &= f(\mathbb{E}(\mathbf{X})) + \nabla(\mathbb{E}(\mathbf{X})) (\mathbb{E}[\mathbf{X}] - \mathbb{E}[\mathbf{X}]) \\ &= f(\mathbb{E}(\mathbf{X})) \\ \end{align}

Therefore, when $f$ is a convex function, we have

$f(\mathbb{E}[\mathbf{X}]) \leq \mathbb{E}[f(\mathbf{X})]$

Similarly, we could also prove that when $f$ is a concave function, we have

$f(\mathbb{E}[\mathbf{X}]) \geq \mathbb{E}[f(\mathbf{X})]$

This concludes the proof.