Multivariate Jensen's Inequality

10-09-2020 (updated 01-26-2021) · blog · 6 minutes read (About 840 words)

Introduction

Jensen’s inequality is useful for proving many mathematical properties. The univariate case is very common and relatively simple to prove. There is also a more general multivariate Jensen’s inequality, for which I was not able to find a proof on the Internet.

In this blog post, I would like to quickly prove both the univariate and the multivariate Jensen’s inequality.

Jensen’s Inequality

If $X \in \mathbb{R}$ is a random variable whose expectation $\mathbb{E}[X]$ exists, $f: \mathbb{R} \rightarrow \mathbb{R}$ is a convex function, and $X$ takes values in the domain of $f$, then

$$
f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]
$$

Similarly, if $f$ is a concave function, then

$$
f(\mathbb{E}[X]) \geq \mathbb{E}[f(X)]
$$
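As a quick sanity check, both inequalities can be verified numerically on a small discrete distribution, where the expectations are exact finite sums. The sketch below is only illustrative: the values, the probabilities, and the helper name `expectation` are assumptions made for this example and are not part of the derivation.

```python
import math

# Hypothetical discrete random variable: values with their probabilities.
values = [1.0, 2.0, 4.0, 8.0]
probs = [0.1, 0.2, 0.3, 0.4]


def expectation(g, values, probs):
    """Return E[g(X)] for a discrete random variable X."""
    return sum(p * g(x) for x, p in zip(values, probs))


mean = expectation(lambda x: x, values, probs)

# Convex case: f(x) = x^2, so f(E[X]) <= E[f(X)].
convex_lhs = mean ** 2
convex_rhs = expectation(lambda x: x ** 2, values, probs)

# Concave case: f(x) = log(x), so f(E[X]) >= E[f(X)].
concave_lhs = math.log(mean)
concave_rhs = expectation(math.log, values, probs)

print(convex_lhs, "<=", convex_rhs)
print(concave_lhs, ">=", concave_rhs)
```

A numerical check like this does not prove anything, but it is a cheap way to catch a flipped inequality sign.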

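Proof of Jensen’s Inequality

When $f$ is a convex function, we additionally assume that it is differentiable, so that its derivative $f^{\prime}$ is defined. Convexity alone does not imply differentiability (for example, $f(x) = |x|$ is convex but not differentiable at $0$); in that case, the same argument works with a subgradient in place of $f^{\prime}$.

For any $x$ and $y$ in the domain of $f$, the first-order condition for convexity gives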

$$
f(x) \geq f(y) + f^{\prime}(y) (x - y)
$$
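Geometrically, this inequality says that the graph of a convex function lies on or above each of its tangent lines. A minimal sketch that checks this on a grid, assuming $f(x) = e^x$ as the convex function (the helper name `tangent_line` and the grid are hypothetical choices for this example):

```python
import math


def tangent_line(f, df, y):
    """Return the tangent line of f at y as a function of x."""
    return lambda x: f(y) + df(y) * (x - y)


# f(x) = exp(x) is convex and differentiable, and its derivative is exp(x).
f = math.exp
df = math.exp

grid = [i / 10.0 for i in range(-30, 31)]  # points in [-3, 3]

# Smallest gap f(x) - tangent(x) over all pairs (x, y) on the grid.
min_gap = min(f(x) - tangent_line(f, df, y)(x) for x in grid for y in grid)

# A small tolerance guards against floating-point rounding.
print(min_gap >= -1e-12)
```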

Setting $x = X$ and $y = \mathbb{E}[X]$, we have

$$
f(X) \geq f(\mathbb{E}[X]) + f^{\prime}(\mathbb{E}[X]) (X - \mathbb{E}[X])
$$

Taking the expected value of both sides of the inequality and using the linearity of expectation, we have

$$
\begin{align}
\mathbb{E}[ f(X) ] &\geq \mathbb{E}[ f(\mathbb{E}[X]) + f^{\prime}(\mathbb{E}[X]) (X - \mathbb{E}[X]) ] \\
&= \mathbb{E}[ f(\mathbb{E}[X])] + \mathbb{E}[f^{\prime}(\mathbb{E}[X]) (X - \mathbb{E}[X]) ] \\
&= f(\mathbb{E}[X]) + f^{\prime}(\mathbb{E}[X]) \mathbb{E}[ X - \mathbb{E}[X] ] \\
&= f(\mathbb{E}[X]) + f^{\prime}(\mathbb{E}[X]) ( \mathbb{E}[ X ] - \mathbb{E}[ \mathbb{E}[X] ] ) \\
&= f(\mathbb{E}[X]) + f^{\prime}(\mathbb{E}[X]) ( \mathbb{E}[ X ] - \mathbb{E}[ X ] ) \\
&= f(\mathbb{E}[X]) \\
\end{align}
$$

Therefore, when $f$ is a convex function, we have

$$
f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]
$$

Similarly, when $f$ is a concave function, $-f$ is convex, and applying the result above to $-f$ gives

$$
f(\mathbb{E}[X]) \geq \mathbb{E}[f(X)]
$$

This concludes the proof.
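As a worked example (added for illustration, assuming $X$ has finite variance), take the convex function $f(x) = x^2$. The gap between the two sides of Jensen’s inequality is then exactly the variance of $X$, which is nonnegative:

$$
\begin{align}
\mathbb{E}[f(X)] - f(\mathbb{E}[X]) &= \mathbb{E}[X^2] - (\mathbb{E}[X])^2 \\
&= \mathbb{E}[ (X - \mathbb{E}[X])^2 ] \\
&= \mathrm{Var}(X) \\
&\geq 0
\end{align}
$$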

Multivariate Jensen’s Inequality

If $\mathbf{X} \in \mathbb{R}^n$ is a random vector whose expectation $\mathbb{E}[\mathbf{X}]$ exists and $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is a convex function, then

$$
f(\mathbb{E}[\mathbf{X}]) \leq \mathbb{E}[f(\mathbf{X})]
$$

Similarly, if $f$ is a concave function, then

$$
f(\mathbb{E}[\mathbf{X}]) \geq \mathbb{E}[f(\mathbf{X})]
$$
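The multivariate statement can be checked numerically in the same spirit. The sketch below assumes a hypothetical discrete random vector in $\mathbb{R}^2$ and uses the log-sum-exp function, which is a standard example of a convex function on $\mathbb{R}^n$; neither choice comes from the derivation itself.

```python
import math

# Hypothetical discrete random vector in R^2: outcomes with probabilities.
outcomes = [(0.0, 1.0), (2.0, -1.0), (-1.0, 3.0)]
probs = [0.5, 0.3, 0.2]


def log_sum_exp(x):
    """Convex function f(x) = log(sum_i exp(x_i)), computed stably."""
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))


# E[X], computed coordinate by coordinate.
mean = tuple(
    sum(p * x[i] for x, p in zip(outcomes, probs)) for i in range(2)
)

lhs = log_sum_exp(mean)  # f(E[X])
rhs = sum(p * log_sum_exp(x) for x, p in zip(outcomes, probs))  # E[f(X)]
print(lhs, "<=", rhs)
```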

Proof of Multivariate Jensen’s Inequality

The proof is very similar to the proof of the univariate Jensen’s inequality. We again assume that $f$ is differentiable.

For any $\mathbf{x}$ and $\mathbf{y}$ in the domain of $f$, we must have

$$
f(\mathbf{x}) \geq f(\mathbf{y}) + \nabla f(\mathbf{y})^{\top} (\mathbf{x} - \mathbf{y})
$$

where $\mathbf{x}$ and $\mathbf{y}$ are assumed to be column vectors, and

$$
\begin{align}
\nabla f(\mathbf{y})^{\top} &= \nabla f(y_1, y_2, \cdots, y_n)^{\top} \\
&= \Big[ \frac{\partial f(\mathbf{y})}{ \partial y_1 }, \frac{\partial f(\mathbf{y})}{ \partial y_2 }, \cdots, \frac{\partial f(\mathbf{y})}{ \partial y_n } \Big] \\
\end{align}
$$

Essentially, $f(\mathbf{y}) + \nabla f(\mathbf{y})^{\top} (\mathbf{x} - \mathbf{y})$ is just the first-order Taylor expansion of $f$ around $\mathbf{y}$, evaluated at $\mathbf{x}$, and the above inequality is the first-order condition for convexity. It states that the first-order Taylor expansion of a convex function is always less than or equal to the function itself.

Setting $\mathbf{x} = \mathbf{X}$ and $\mathbf{y} = \mathbb{E}[\mathbf{X}]$, we have
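The first-order condition can be spot-checked on random points. The following sketch assumes the convex function $f(\mathbf{x}) = \lVert \mathbf{x} \rVert^2$ with gradient $2\mathbf{x}$; the function names and the sampling range are hypothetical choices for this example.

```python
import random


def f(x):
    """Convex function f(x) = ||x||^2."""
    return sum(xi * xi for xi in x)


def grad_f(x):
    """Gradient of f, which is 2x."""
    return [2.0 * xi for xi in x]


def linearization(x, y):
    """First-order Taylor expansion of f around y, evaluated at x."""
    return f(y) + sum(g * (xi - yi) for g, xi, yi in zip(grad_f(y), x, y))


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 3
gaps = []
for _ in range(1000):
    x = [rng.uniform(-5.0, 5.0) for _ in range(n)]
    y = [rng.uniform(-5.0, 5.0) for _ in range(n)]
    gaps.append(f(x) - linearization(x, y))

# For this f the gap equals ||x - y||^2, so it is never negative
# (up to floating-point rounding, hence the small tolerance).
print(min(gaps) >= -1e-9)
```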

$$
f(\mathbf{X}) \geq f(\mathbb{E}[\mathbf{X}]) + \nabla f(\mathbb{E}[\mathbf{X}])^{\top} (\mathbf{X} - \mathbb{E}[\mathbf{X}])
$$

Taking the expected value of both sides of the inequality and using the linearity of expectation, we have

$$
\begin{align}
\mathbb{E}[ f(\mathbf{X}) ] &\geq \mathbb{E}[ f(\mathbb{E}[\mathbf{X}]) + \nabla f(\mathbb{E}[\mathbf{X}])^{\top} (\mathbf{X} - \mathbb{E}[\mathbf{X}]) ] \\
&= \mathbb{E}[ f(\mathbb{E}[\mathbf{X}]) ] + \mathbb{E}[ \nabla f(\mathbb{E}[\mathbf{X}])^{\top} (\mathbf{X} - \mathbb{E}[\mathbf{X}]) ] \\
&= f(\mathbb{E}[\mathbf{X}]) + \nabla f(\mathbb{E}[\mathbf{X}])^{\top} \mathbb{E}[ (\mathbf{X} - \mathbb{E}[\mathbf{X}]) ] \\
&= f(\mathbb{E}[\mathbf{X}]) + \nabla f(\mathbb{E}[\mathbf{X}])^{\top} (\mathbb{E}[\mathbf{X}] - \mathbb{E}[\mathbb{E}[\mathbf{X}]]) \\
&= f(\mathbb{E}[\mathbf{X}]) + \nabla f(\mathbb{E}[\mathbf{X}])^{\top} (\mathbb{E}[\mathbf{X}] - \mathbb{E}[\mathbf{X}]) \\
&= f(\mathbb{E}[\mathbf{X}]) \\
\end{align}
$$

Therefore, when $f$ is a convex function, we have

$$
f(\mathbb{E}[\mathbf{X}]) \leq \mathbb{E}[f(\mathbf{X})]
$$

Similarly, when $f$ is a concave function, $-f$ is convex, and applying the result above to $-f$ gives

$$
f(\mathbb{E}[\mathbf{X}]) \geq \mathbb{E}[f(\mathbf{X})]
$$

This concludes the proof.
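As a worked example (added for illustration, assuming each component of $\mathbf{X}$ has finite variance), take the convex function $f(\mathbf{x}) = \lVert \mathbf{x} \rVert^2 = \mathbf{x}^{\top} \mathbf{x}$. The gap in the multivariate Jensen’s inequality is then the sum of the component variances, that is, the trace of the covariance matrix of $\mathbf{X}$:

$$
\begin{align}
\mathbb{E}[f(\mathbf{X})] - f(\mathbb{E}[\mathbf{X}]) &= \mathbb{E}[ \lVert \mathbf{X} \rVert^2 ] - \lVert \mathbb{E}[\mathbf{X}] \rVert^2 \\
&= \mathbb{E}[ \lVert \mathbf{X} - \mathbb{E}[\mathbf{X}] \rVert^2 ] \\
&= \mathrm{tr}( \mathrm{Cov}(\mathbf{X}) ) \\
&\geq 0
\end{align}
$$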

References

Multivariate Jensen's Inequality

https://leimao.github.io/blog/Multivariate-Jensen-Inequality/

Author

Lei Mao

Posted on

10-09-2020

Updated on

01-26-2021
