
Introduction

Jensen’s inequality can be used to prove many useful mathematical properties. The univariate Jensen’s inequality is well known and relatively simple to prove. There is also a more general multivariate Jensen’s inequality, but I was not able to find a proof of it on the Internet.


In this blog post, I would like to derive proofs of both the univariate and the multivariate Jensen’s inequality.

Jensen’s Inequality

If $X \in \mathbb{R}$ is a random variable and $f: \mathbb{R} \rightarrow \mathbb{R}$ is a convex function, then

\[f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]\]

Similarly, if $f$ is a concave function, then

\[f(\mathbb{E}[X]) \geq \mathbb{E}[f(X)]\]
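For example, taking the convex function $f(x) = x^2$ in the first inequality recovers the familiar fact that the variance of a random variable is non-negative:

\[\mathbb{E}[X]^2 \leq \mathbb{E}[X^2], \qquad \text{i.e.,} \qquad \text{Var}(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2 \geq 0\]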

Proof of Jensen’s Inequality

Suppose for simplicity that the convex function $f$ is differentiable, so that its derivative $f^{\prime}$ is defined everywhere. Note that convexity alone does not guarantee differentiability; for a general convex function, the same argument goes through with a subgradient in place of $f^{\prime}$.


For any $x$ and $y$ in the domain of $f$, the first-order condition for convexity gives

\[f(x) \geq f(y) + f^{\prime}(y) (x - y)\]
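As a quick check, for the convex function $f(x) = x^2$ this inequality reads $x^2 \geq y^2 + 2y(x - y)$, which always holds because

\[x^2 - \big( y^2 + 2y(x - y) \big) = x^2 - 2xy + y^2 = (x - y)^2 \geq 0\]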

Letting $x = X$ and $y = \mathbb{E}[X]$, we have

\[f(X) \geq f(\mathbb{E}[X]) + f^{\prime}(\mathbb{E}[X]) (X - \mathbb{E}[X])\]

Taking the expectation on both sides of the inequality, we have

\[\begin{align}
\mathbb{E}[ f(X) ]
&\geq \mathbb{E}[ f(\mathbb{E}[X]) + f^{\prime}(\mathbb{E}[X]) (X - \mathbb{E}[X]) ] \\
&= \mathbb{E}[ f(\mathbb{E}[X]) ] + \mathbb{E}[ f^{\prime}(\mathbb{E}[X]) (X - \mathbb{E}[X]) ] \\
&= f(\mathbb{E}[X]) + f^{\prime}(\mathbb{E}[X]) \mathbb{E}[ X - \mathbb{E}[X] ] \\
&= f(\mathbb{E}[X]) + f^{\prime}(\mathbb{E}[X]) ( \mathbb{E}[ X ] - \mathbb{E}[ \mathbb{E}[X] ] ) \\
&= f(\mathbb{E}[X]) + f^{\prime}(\mathbb{E}[X]) ( \mathbb{E}[ X ] - \mathbb{E}[ X ] ) \\
&= f(\mathbb{E}[X]) \\
\end{align}\]

Therefore, when $f$ is a convex function, we have

\[f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]\]

Similarly, we can prove that when $f$ is a concave function, we have

\[f(\mathbb{E}[X]) \geq \mathbb{E}[f(X)]\]

This concludes the proof.
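Beyond the algebraic proof, the inequality is easy to observe numerically. The following is a minimal Monte Carlo sketch; the convex function $f(x) = e^x$ and the standard normal distribution for $X$ are illustrative choices, not anything mandated by the proof.

```python
import numpy as np

# Monte Carlo sanity check of Jensen's inequality for a convex function.
# The choices f(x) = exp(x) and X ~ N(0, 1) are arbitrary illustrations.
rng = np.random.default_rng(seed=0)
x = rng.standard_normal(1_000_000)

lhs = np.exp(x.mean())   # f(E[X])
rhs = np.exp(x).mean()   # E[f(X)]

# For X ~ N(0, 1), E[exp(X)] = exp(1/2) ~ 1.6487 while exp(E[X]) ~ 1.
print(f"f(E[X]) = {lhs:.4f}, E[f(X)] = {rhs:.4f}")
assert lhs <= rhs
```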

Multivariate Jensen’s Inequality

If $\mathbf{X} \in \mathbb{R}^n$ is a random vector and $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is a convex function, then

\[f(\mathbb{E}[\mathbf{X}]) \leq \mathbb{E}[f(\mathbf{X})]\]

Similarly, if $f$ is a concave function, then

\[f(\mathbb{E}[\mathbf{X}]) \geq \mathbb{E}[f(\mathbf{X})]\]

Proof of Multivariate Jensen’s Inequality

The proof is very similar to that of the univariate Jensen’s inequality. As before, we assume for simplicity that $f$ is differentiable, so that its gradient $\nabla f$ is defined everywhere.


For any $\mathbf{x}$ and $\mathbf{y}$ in the domain of $f$, we must have

\[f(\mathbf{x}) \geq f(\mathbf{y}) + \nabla f(\mathbf{y})^{\top} (\mathbf{x} - \mathbf{y})\]

where $\mathbf{x}$ and $\mathbf{y}$ are assumed to be column vectors, and

\[\begin{align}
\nabla f(\mathbf{y})^{\top}
&= \nabla f(y_1, y_2, \cdots, y_n)^{\top} \\
&= \Big[ \frac{\partial f(\mathbf{y})}{ \partial y_1 }, \frac{\partial f(\mathbf{y})}{ \partial y_2 }, \cdots, \frac{\partial f(\mathbf{y})}{ \partial y_n } \Big] \\
\end{align}\]

Essentially, $f(\mathbf{y}) + \nabla f(\mathbf{y})^{\top} (\mathbf{x} - \mathbf{y})$ is the first-order Taylor expansion of $f$ around $\mathbf{y}$, evaluated at $\mathbf{x}$, and the above inequality is the first-order condition for convexity. The first-order condition states that this first-order Taylor expansion is always less than or equal to $f(\mathbf{x})$; geometrically, the tangent plane at any point lies below the graph of a convex function.
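For a concrete instance, take the convex function $f(\mathbf{x}) = \mathbf{x}^{\top} \mathbf{x}$, whose gradient is $\nabla f(\mathbf{y}) = 2\mathbf{y}$. The first-order condition then holds because

\[f(\mathbf{x}) - \big( f(\mathbf{y}) + \nabla f(\mathbf{y})^{\top} (\mathbf{x} - \mathbf{y}) \big) = \mathbf{x}^{\top}\mathbf{x} - 2\mathbf{y}^{\top}\mathbf{x} + \mathbf{y}^{\top}\mathbf{y} = \lVert \mathbf{x} - \mathbf{y} \rVert^2 \geq 0\]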


Letting $\mathbf{x} = \mathbf{X}$ and $\mathbf{y} = \mathbb{E}[\mathbf{X}]$, we have

\[f(\mathbf{X}) \geq f(\mathbb{E}[\mathbf{X}]) + \nabla f(\mathbb{E}[\mathbf{X}])^{\top} (\mathbf{X} - \mathbb{E}[\mathbf{X}])\]

Taking the expectation on both sides of the inequality, we have

\[\begin{align}
\mathbb{E}[ f(\mathbf{X}) ]
&\geq \mathbb{E}[ f(\mathbb{E}[\mathbf{X}]) + \nabla f(\mathbb{E}[\mathbf{X}])^{\top} (\mathbf{X} - \mathbb{E}[\mathbf{X}]) ] \\
&= \mathbb{E}[ f(\mathbb{E}[\mathbf{X}]) ] + \mathbb{E}[ \nabla f(\mathbb{E}[\mathbf{X}])^{\top} (\mathbf{X} - \mathbb{E}[\mathbf{X}]) ] \\
&= f(\mathbb{E}[\mathbf{X}]) + \nabla f(\mathbb{E}[\mathbf{X}])^{\top} \mathbb{E}[ \mathbf{X} - \mathbb{E}[\mathbf{X}] ] \\
&= f(\mathbb{E}[\mathbf{X}]) + \nabla f(\mathbb{E}[\mathbf{X}])^{\top} ( \mathbb{E}[\mathbf{X}] - \mathbb{E}[ \mathbb{E}[\mathbf{X}] ] ) \\
&= f(\mathbb{E}[\mathbf{X}]) + \nabla f(\mathbb{E}[\mathbf{X}])^{\top} ( \mathbb{E}[\mathbf{X}] - \mathbb{E}[\mathbf{X}] ) \\
&= f(\mathbb{E}[\mathbf{X}]) \\
\end{align}\]

Therefore, when $f$ is a convex function, we have

\[f(\mathbb{E}[\mathbf{X}]) \leq \mathbb{E}[f(\mathbf{X})]\]

Similarly, we can prove that when $f$ is a concave function, we have

\[f(\mathbb{E}[\mathbf{X}]) \geq \mathbb{E}[f(\mathbf{X})]\]

This concludes the proof.
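As in the univariate case, the multivariate inequality can be sanity-checked numerically. The sketch below uses the log-sum-exp function, which is convex on $\mathbb{R}^n$, together with a Gaussian random vector; both are illustrative choices rather than part of the proof.

```python
import numpy as np

def logsumexp(v):
    # Numerically stable log(sum(exp(v))) along the last axis.
    m = v.max(axis=-1, keepdims=True)
    return np.squeeze(m + np.log(np.exp(v - m).sum(axis=-1, keepdims=True)), axis=-1)

# Monte Carlo sanity check of the multivariate Jensen's inequality.
# f = log-sum-exp (convex on R^n); X ~ N(mu, I) is an arbitrary illustration.
rng = np.random.default_rng(seed=0)
X = rng.multivariate_normal(mean=[0.0, 1.0, -1.0], cov=np.eye(3), size=500_000)

lhs = float(logsumexp(X.mean(axis=0)))  # f(E[X])
rhs = float(logsumexp(X).mean())        # E[f(X)]

print(f"f(E[X]) = {lhs:.4f}, E[f(X)] = {rhs:.4f}")
assert lhs <= rhs
```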
