Lei Mao

Introduction

Jensen’s inequality can be used to prove many useful mathematical properties. Jensen’s inequality for the univariate case is well known and relatively simple to prove. There is also a more general multivariate Jensen’s inequality, for which I was not able to find a proof on the Internet.

In this blog post, I would like to quickly derive proofs of the univariate and multivariate Jensen’s inequalities.

Jensen’s Inequality

If $X \in \mathbb{R}$ is a random variable, $f: \mathbb{R} \rightarrow \mathbb{R}$ is a convex function, and $X$ takes values in the domain of $f$, then

\[f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]\]

Similarly, if $f$ is a concave function, then

\[f(\mathbb{E}[X]) \geq \mathbb{E}[f(X)]\]
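Before going through the proof, the convex case can be sanity-checked numerically. The sketch below draws Gaussian samples and compares $f(\mathbb{E}[X])$ against $\mathbb{E}[f(X)]$ for the convex function $f(x) = e^x$; the choice of distribution and sample size are arbitrary, illustrative assumptions.

```python
import math
import random

random.seed(0)

# Draw samples of X ~ N(0, 1); any distribution would do here.
samples = [random.gauss(0.0, 1.0) for _ in range(100000)]

f = math.exp  # exp is convex on all of R

# Monte Carlo estimates of E[X] and E[f(X)].
mean_x = sum(samples) / len(samples)
mean_fx = sum(f(x) for x in samples) / len(samples)

# Jensen's inequality for a convex f: f(E[X]) <= E[f(X)].
print(f(mean_x) <= mean_fx)  # True
```

For $X \sim N(0, 1)$, $f(\mathbb{E}[X]) = e^0 = 1$ while $\mathbb{E}[e^X] = e^{1/2} \approx 1.65$, so the gap is clearly visible in the estimates.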

Proof to Jensen’s Inequality

Assume for simplicity that $f$ is differentiable, so that its derivative $f^{\prime}$ is defined everywhere on the domain of $f$. (A convex function is not necessarily differentiable; in the general case, the same argument goes through with a subgradient in place of $f^{\prime}$.)


For any $x$ and $y$ in the domain of $f$, convexity gives the first-order condition

\[f(x) \geq f(y) + f^{\prime}(y) (x - y)\]
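This first-order condition says that a convex function lies above each of its tangent lines. As an illustrative sketch, the check below verifies it on a grid for the convex function $f(x) = x^2$, whose derivative is $f^{\prime}(y) = 2y$; both the function and the grid are assumptions made for the example.

```python
def f(x):
    return x * x  # a convex function


def f_prime(y):
    return 2.0 * y  # its derivative


# Grid of points in [-5, 5] with step 0.1.
grid = [i / 10.0 for i in range(-50, 51)]

# Tangent-line inequality: f(x) >= f(y) + f'(y) * (x - y) for all x, y.
ok = all(f(x) >= f(y) + f_prime(y) * (x - y) for x in grid for y in grid)
print(ok)  # True
```

For this particular $f$, the inequality reduces to $x^2 - y^2 - 2y(x - y) = (x - y)^2 \geq 0$, so it holds with equality exactly when $x = y$.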

Letting $x = X$ and $y = \mathbb{E}(X)$, we have

\[f(X) \geq f(\mathbb{E}(X)) + f^{\prime}(\mathbb{E}(X)) (X - \mathbb{E}(X))\]

Taking the expected value on both sides of the inequality, we have

\[\begin{align} \mathbb{E}[ f(X) ] &\geq \mathbb{E}[ f(\mathbb{E}(X)) + f^{\prime}(\mathbb{E}(X)) (X - \mathbb{E}(X)) ] \\ &= \mathbb{E}[ f(\mathbb{E}(X))] + \mathbb{E}[f^{\prime}(\mathbb{E}(X)) (X - \mathbb{E}(X)) ] \\ &= f(\mathbb{E}(X)) + f^{\prime}(\mathbb{E}(X)) \mathbb{E}[ X - \mathbb{E}(X) ] \\ &= f(\mathbb{E}(X)) + f^{\prime}(\mathbb{E}(X)) ( \mathbb{E}[ X ] - \mathbb{E}[ \mathbb{E}(X) ] ) \\ &= f(\mathbb{E}(X)) + f^{\prime}(\mathbb{E}(X)) ( \mathbb{E}[ X ] - \mathbb{E}(X) ) \\ &= f(\mathbb{E}(X)) \\ \end{align}\]

Therefore, when $f$ is a convex function, we have

\[f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]\]

Similarly, we can prove that when $f$ is a concave function,

\[f(\mathbb{E}[X]) \geq \mathbb{E}[f(X)]\]
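The concave case can be sanity-checked numerically in the same spirit as the convex case. The sketch below uses the concave function $f(x) = \log x$ on positive samples; the shifted exponential distribution is an arbitrary choice made only to keep the samples strictly positive.

```python
import math
import random

random.seed(0)

# Strictly positive samples: shifted exponential draws.
samples = [random.expovariate(1.0) + 0.1 for _ in range(100000)]

# Monte Carlo estimates of E[X] and E[log(X)].
mean_x = sum(samples) / len(samples)
mean_fx = sum(math.log(x) for x in samples) / len(samples)

# Jensen's inequality for a concave f: f(E[X]) >= E[f(X)].
print(math.log(mean_x) >= mean_fx)  # True
```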

This concludes the proof.

Multivariate Jensen’s Inequality

If $\mathbf{X} \in \mathbb{R}^n$ is a random vector and $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is a convex function, then

\[f(\mathbb{E}[\mathbf{X}]) \leq \mathbb{E}[f(\mathbf{X})]\]

Similarly, if $f$ is a concave function, then

\[f(\mathbb{E}[\mathbf{X}]) \geq \mathbb{E}[f(\mathbf{X})]\]
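The multivariate statement can also be sanity-checked numerically. The sketch below uses the convex function $f(\mathbf{x}) = \lVert \mathbf{x} \rVert^2$ on random Gaussian vectors in $\mathbb{R}^3$; the dimension, distribution, and sample size are illustrative assumptions.

```python
import random

random.seed(0)

n, m = 3, 100000  # dimension of X and number of samples

# Samples of a random vector X in R^n with i.i.d. N(0, 1) components.
samples = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]


def f(v):
    return sum(c * c for c in v)  # squared Euclidean norm, a convex function


# Component-wise Monte Carlo estimate of E[X], and estimate of E[f(X)].
mean_vec = [sum(s[j] for s in samples) / m for j in range(n)]
mean_f = sum(f(s) for s in samples) / m

# Multivariate Jensen's inequality for convex f: f(E[X]) <= E[f(X)].
print(f(mean_vec) <= mean_f)  # True
```

Here $f(\mathbb{E}[\mathbf{X}]) \approx 0$ while $\mathbb{E}[f(\mathbf{X})] = \mathbb{E}[\lVert \mathbf{X} \rVert^2] = n = 3$, so the inequality holds with a large margin.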

Proof to Multivariate Jensen’s Inequality

The proof is very similar to that of the univariate Jensen’s inequality. Again, assume for simplicity that $f$ is differentiable; otherwise, a subgradient can be used in place of the gradient.


For any $\mathbf{x}$ and $\mathbf{y}$ in the domain of $f$, convexity gives the first-order condition

\[f(\mathbf{x}) \geq f(\mathbf{y}) + \nabla f(\mathbf{y})^{\top} (\mathbf{x} - \mathbf{y})\]

Letting $\mathbf{x} = \mathbf{X}$ and $\mathbf{y} = \mathbb{E}(\mathbf{X})$, we have

\[f(\mathbf{X}) \geq f(\mathbb{E}(\mathbf{X})) + \nabla f(\mathbb{E}(\mathbf{X}))^{\top} (\mathbf{X} - \mathbb{E}(\mathbf{X}))\]

Taking the expected value on both sides of the inequality, we have

\[\begin{align} \mathbb{E}[ f(\mathbf{X}) ] &\geq \mathbb{E}[ f(\mathbb{E}(\mathbf{X})) + \nabla f(\mathbb{E}(\mathbf{X}))^{\top} (\mathbf{X} - \mathbb{E}(\mathbf{X})) ] \\ &= \mathbb{E}[ f(\mathbb{E}(\mathbf{X})) ] + \mathbb{E}[ \nabla f(\mathbb{E}(\mathbf{X}))^{\top} (\mathbf{X} - \mathbb{E}(\mathbf{X})) ] \\ &= f(\mathbb{E}(\mathbf{X})) + \nabla f(\mathbb{E}(\mathbf{X}))^{\top} \mathbb{E}[ \mathbf{X} - \mathbb{E}(\mathbf{X}) ] \\ &= f(\mathbb{E}(\mathbf{X})) + \nabla f(\mathbb{E}(\mathbf{X}))^{\top} (\mathbb{E}[\mathbf{X}] - \mathbb{E}[\mathbb{E}(\mathbf{X})]) \\ &= f(\mathbb{E}(\mathbf{X})) + \nabla f(\mathbb{E}(\mathbf{X}))^{\top} (\mathbb{E}[\mathbf{X}] - \mathbb{E}(\mathbf{X})) \\ &= f(\mathbb{E}(\mathbf{X})) \\ \end{align}\]

Therefore, when $f$ is a convex function, we have

\[f(\mathbb{E}[\mathbf{X}]) \leq \mathbb{E}[f(\mathbf{X})]\]

Similarly, we can prove that when $f$ is a concave function,

\[f(\mathbb{E}[\mathbf{X}]) \geq \mathbb{E}[f(\mathbf{X})]\]

This concludes the proof.
