Multivariate Jensen's Inequality

Introduction

Jensen’s inequality can be used to prove many useful mathematical properties. The univariate version is well known and relatively simple to prove. There is also a more general multivariate Jensen’s inequality, but I was not able to find a proof of it on the Internet.

In this blog post, I would like to derive proofs of both the univariate and the multivariate Jensen’s inequality.

Jensen’s Inequality

If $X \in \mathbb{R}$ is a random variable and $f: \mathbb{R} \rightarrow \mathbb{R}$ is a convex function whose domain contains the values of $X$, then

$$
f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]
$$

Similarly, if $f$ is a concave function, then

$$
f(\mathbb{E}[X]) \geq \mathbb{E}[f(X)]
$$
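As a quick numerical sanity check (an illustration, not part of the proof), we can compare $f(\mathbb{E}[X])$ and $\mathbb{E}[f(X)]$ for the convex function $f(x) = x^2$ and a small discrete distribution chosen for this example:

```python
# Numerical sanity check of Jensen's inequality for the convex f(x) = x^2.
# X takes values [-1, 0, 2] with probabilities [0.3, 0.4, 0.3].
values = [-1.0, 0.0, 2.0]
probs = [0.3, 0.4, 0.3]

f = lambda x: x * x

e_x = sum(p * x for p, x in zip(probs, values))      # E[X]  = 0.3
e_fx = sum(p * f(x) for p, x in zip(probs, values))  # E[f(X)] = 1.5

# f(E[X]) = 0.09 <= E[f(X)] = 1.5
assert f(e_x) <= e_fx
```

For $f(x) = x^2$, this is just the familiar fact that $\mathbb{E}[X^2] - (\mathbb{E}[X])^2 = \mathrm{Var}(X) \geq 0$.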

Proof of Jensen’s Inequality

Assume for simplicity that the convex function $f$ is differentiable, so that its derivative $f^{\prime}$ is defined everywhere. (A convex function need not be differentiable; in the general case the tangent line below is replaced by a supporting line, and the argument is otherwise identical.)

For any $x$ and $y$ in the domain of $f$, convexity implies

$$
f(x) \geq f(y) + f^{\prime}(y) (x - y)
$$
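This tangent-line inequality can be checked numerically for a concrete convex function; as an illustrative (not exhaustive) check, take $f(x) = e^x$, whose derivative is also $e^x$:

```python
import math

# Check the first-order condition f(x) >= f(y) + f'(y) * (x - y)
# for the convex function f(x) = exp(x), whose derivative is exp(x).
f = math.exp
f_prime = math.exp

for y in [-2.0, 0.0, 1.5]:
    for x in [-3.0, -1.0, 0.0, 2.0]:
        tangent = f(y) + f_prime(y) * (x - y)
        assert f(x) >= tangent - 1e-12  # graph lies above every tangent line
```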

Setting $x = X$ and $y = \mathbb{E}[X]$, we have

$$
f(X) \geq f(\mathbb{E}[X]) + f^{\prime}(\mathbb{E}[X]) (X - \mathbb{E}[X])
$$

Taking the expected value on both sides of the inequality, we have

$$
\begin{align}
\mathbb{E}[ f(X) ] &\geq \mathbb{E}[ f(\mathbb{E}[X]) + f^{\prime}(\mathbb{E}[X]) (X - \mathbb{E}[X]) ] \\
&= \mathbb{E}[ f(\mathbb{E}[X])] + \mathbb{E}[f^{\prime}(\mathbb{E}[X]) (X - \mathbb{E}[X]) ] \\
&= f(\mathbb{E}[X]) + f^{\prime}(\mathbb{E}[X]) \mathbb{E}[ X - \mathbb{E}[X] ] \\
&= f(\mathbb{E}[X]) + f^{\prime}(\mathbb{E}[X]) ( \mathbb{E}[ X ] - \mathbb{E}[ \mathbb{E}[X] ] ) \\
&= f(\mathbb{E}[X]) + f^{\prime}(\mathbb{E}[X]) ( \mathbb{E}[ X ] - \mathbb{E}[ X ] ) \\
&= f(\mathbb{E}[X]) \\
\end{align}
$$

Therefore, when $f$ is a convex function, we have

$$
f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]
$$

Similarly, we can prove that when $f$ is a concave function,

$$
f(\mathbb{E}[X]) \geq \mathbb{E}[f(X)]
$$
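The concave direction can be illustrated with $f(x) = \log x$, which is concave on $(0, \infty)$; with an example distribution chosen here, the mean of the logs never exceeds the log of the mean:

```python
import math

# Concave case: f(x) = log(x) on positive values, so
# f(E[X]) >= E[f(X)], i.e., log E[X] >= E[log X].
values = [1.0, 2.0, 8.0]
probs = [0.5, 0.25, 0.25]

e_x = sum(p * x for p, x in zip(probs, values))           # E[X] = 3.0
e_fx = sum(p * math.log(x) for p, x in zip(probs, values))  # E[log X]

assert math.log(e_x) >= e_fx
```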

This concludes the proof.

Multivariate Jensen’s Inequality

If $\mathbf{X} \in \mathbb{R}^n$ is a random vector and $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is a convex function whose domain contains the values of $\mathbf{X}$, then

$$
f(\mathbb{E}[\mathbf{X}]) \leq \mathbb{E}[f(\mathbf{X})]
$$

Similarly, if $f$ is a concave function, then

$$
f(\mathbb{E}[\mathbf{X}]) \geq \mathbb{E}[f(\mathbf{X})]
$$
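As a multivariate sanity check (assuming nothing beyond the statement above), we can use the convex log-sum-exp function $f(\mathbf{x}) = \log \sum_i e^{x_i}$ and a small empirical distribution over vectors in $\mathbb{R}^2$ with equal weights:

```python
import math

# Convex multivariate function: log-sum-exp.
def logsumexp(x):
    m = max(x)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(xi - m) for xi in x))

# A small empirical distribution over vectors in R^2 (equal weights).
samples = [(0.0, 1.0), (2.0, -1.0), (-1.0, 3.0)]
n = len(samples)

mean = tuple(sum(s[i] for s in samples) / n for i in range(2))  # E[X]
f_mean = logsumexp(mean)                                        # f(E[X])
mean_f = sum(logsumexp(s) for s in samples) / n                 # E[f(X)]

assert f_mean <= mean_f  # Jensen's inequality for the convex log-sum-exp
```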

Proof of Multivariate Jensen’s Inequality

The proof is very similar to that of the univariate Jensen’s inequality. As before, assume for simplicity that $f$ is differentiable.

For any $\mathbf{x}$ and $\mathbf{y}$ in the domain of $f$, convexity implies

$$
f(\mathbf{x}) \geq f(\mathbf{y}) + \nabla f(\mathbf{y})^{\top} (\mathbf{x} - \mathbf{y})
$$

where $\mathbf{x}$ and $\mathbf{y}$ are assumed to be column vectors, and

$$
\begin{align}
\nabla f(\mathbf{y})^{\top} &= \nabla f(y_1, y_2, \cdots, y_n)^{\top} \\
&= \Big[ \frac{\partial f(\mathbf{y})}{ \partial y_1 }, \frac{\partial f(\mathbf{y})}{ \partial y_2 }, \cdots, \frac{\partial f(\mathbf{y})}{ \partial y_n } \Big] \\
\end{align}
$$

Essentially, $f(\mathbf{y}) + \nabla f(\mathbf{y})^{\top} (\mathbf{x} - \mathbf{y})$ is just the first-order Taylor expansion of $f$ around $\mathbf{y}$, and the inequality above is the first-order condition for convexity: the first-order Taylor expansion of a convex function is always less than or equal to the function itself.
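For a concrete convex example (again only an illustration), take $f(\mathbf{x}) = \lVert \mathbf{x} \rVert^2$ with $\nabla f(\mathbf{x}) = 2\mathbf{x}$, and verify the first-order condition at a few points:

```python
# First-order condition check for the convex f(x) = ||x||^2, grad f(x) = 2x.
def f(v):
    return sum(vi * vi for vi in v)

def grad_f(v):
    return [2.0 * vi for vi in v]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

points = [[0.0, 0.0], [1.0, -2.0], [-3.0, 0.5]]
for y in points:
    for x in points:
        diff = [xi - yi for xi, yi in zip(x, y)]
        tangent = f(y) + dot(grad_f(y), diff)
        assert f(x) >= tangent - 1e-12  # graph lies above every tangent plane
```

For this particular $f$, the inequality reduces to $\lVert \mathbf{x} - \mathbf{y} \rVert^2 \geq 0$, which is why it holds exactly.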

Setting $\mathbf{x} = \mathbf{X}$ and $\mathbf{y} = \mathbb{E}[\mathbf{X}]$, we have

$$
f(\mathbf{X}) \geq f(\mathbb{E}[\mathbf{X}]) + \nabla f(\mathbb{E}[\mathbf{X}])^{\top} (\mathbf{X} - \mathbb{E}[\mathbf{X}])
$$

Taking the expected value on both sides of the inequality, we have

$$
\begin{align}
\mathbb{E}[ f(\mathbf{X}) ] &\geq \mathbb{E}[ f(\mathbb{E}[\mathbf{X}]) + \nabla f(\mathbb{E}[\mathbf{X}])^{\top} (\mathbf{X} - \mathbb{E}[\mathbf{X}]) ] \\
&= \mathbb{E}[ f(\mathbb{E}[\mathbf{X}]) ] + \mathbb{E}[ \nabla f(\mathbb{E}[\mathbf{X}])^{\top} (\mathbf{X} - \mathbb{E}[\mathbf{X}]) ] \\
&= f(\mathbb{E}[\mathbf{X}]) + \nabla f(\mathbb{E}[\mathbf{X}])^{\top} \mathbb{E}[ (\mathbf{X} - \mathbb{E}[\mathbf{X}]) ] \\
&= f(\mathbb{E}[\mathbf{X}]) + \nabla f(\mathbb{E}[\mathbf{X}])^{\top} (\mathbb{E}[\mathbf{X}] - \mathbb{E}[\mathbb{E}[\mathbf{X}]]) \\
&= f(\mathbb{E}[\mathbf{X}]) + \nabla f(\mathbb{E}[\mathbf{X}])^{\top} (\mathbb{E}[\mathbf{X}] - \mathbb{E}[\mathbf{X}]) \\
&= f(\mathbb{E}[\mathbf{X}]) \\
\end{align}
$$

Therefore, when $f$ is a convex function, we have

$$
f(\mathbb{E}[\mathbf{X}]) \leq \mathbb{E}[f(\mathbf{X})]
$$

Similarly, we can prove that when $f$ is a concave function,

$$
f(\mathbb{E}[\mathbf{X}]) \geq \mathbb{E}[f(\mathbf{X})]
$$

This concludes the proof.

Author

Lei Mao

Posted on

10-09-2020

Updated on

01-26-2021
