Multivariate Jensen's Inequality
Introduction
Jensen’s inequality could be used for proving a lot of useful mathematical properties. Jensen’s inequality for the univariate case is very common and is relatively simple to prove. In addition, there is also a more generalized multivariate Jensen’s inequality, and I was not able to find any proof from the Internet.
In this blog post, I would like to quickly derive the proof to the univariate and multivariate Jensen’s inequality.
Jensen’s Inequality
If $X \in \mathbb{R}$ is a random variable, $f: \mathbb{R} \rightarrow \mathbb{R}$ is a convex function, and $X$ is defined on $f$ then
$$
f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]
$$
Similarly, if $f$ is a concave function, then
$$
f(\mathbb{E}[X]) \geq \mathbb{E}[f(X)]
$$
Proof to Jensen’s Inequality
When $f$ is a convex function, according to the definition, it must be differentiable and its derivative $f^{\prime}$ must be defined.
For any $x$ and $y$ that are defined for $f$, we must have
$$
f(x) \geq f(y) + f^{\prime}(y) (x - y)
$$
Let $x = X$ and $y = \mathbb{E}[X]$, we have
$$
f(X) \geq f(\mathbb{E}[X]) + f^{\prime}(\mathbb{E}[X]) (X - \mathbb{E}[X])
$$
We take the expected value on both side of the equation, we have
$$
\begin{align}
\mathbb{E}[ f(X) ] &\geq \mathbb{E}[ f(\mathbb{E}[X]) + f^{\prime}(\mathbb{E}[X]) (X - \mathbb{E}[X]) ] \\
&= \mathbb{E}[ f(\mathbb{E}[X])] + \mathbb{E}[f^{\prime}(\mathbb{E}[X]) (X - \mathbb{E}[X]) ] \\
&= f(\mathbb{E}[X]) + f^{\prime}(\mathbb{E}[X]) \mathbb{E}[ X - \mathbb{E}[X] ] \\
&= f(\mathbb{E}[X]) + f^{\prime}(\mathbb{E}[X]) ( \mathbb{E}[ X ] - \mathbb{E}[ \mathbb{E}[X] ] ) \\
&= f(\mathbb{E}[X]) + f^{\prime}(\mathbb{E}[X]) ( \mathbb{E}[ X ] - \mathbb{E}[ X ] ) \\
&= f(\mathbb{E}[X]) \\
\end{align}
$$
Therefore, when $f$ is a convex function, we have
$$
f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]
$$
Similarly, we could also prove that when $f$ is a concave function, we have
$$
f(\mathbb{E}[X]) \geq \mathbb{E}[f(X)]
$$
This concludes the proof.
Multivariate Jensen’s Inequality
If $\mathbf{X} \in \mathbb{R}^n$ is random multivariate variable and $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is a convex function, then
$$
f(\mathbb{E}[\mathbf{X}]) \leq \mathbb{E}[f(\mathbf{X})]
$$
Similarly, if $f$ is a concave function, then
$$
f(\mathbb{E}[\mathbf{X}]) \geq \mathbb{E}[f(\mathbf{X})]
$$
Proof to Multivariate Jensen’s Inequality
The proof is very similar to the proof to the univariate Jensen’s inequality.
For any $\mathbf{x}$ and $\mathbf{y}$ that are defined for $f$, we must have
$$
f(\mathbf{x}) \geq f(\mathbf{y}) + \nabla f(\mathbf{y})^{\top} (\mathbf{x} - \mathbf{y})
$$
where $\mathbf{x}$ and $\mathbf{y}$ are assumed to be column vectors, and
$$
\begin{align}
\nabla f(\mathbf{y})^{\top} &= \nabla f(y_1, y_2, \cdots, y_n)^{\top} \\
&= \Big[ \frac{\partial f(\mathbf{y})}{ \partial y_1 }, \frac{\partial f(\mathbf{y})}{ \partial y_2 }, \cdots, \frac{\partial f(\mathbf{y})}{ \partial y_n } \Big] \\
\end{align}
$$
Essentially, $f(\mathbf{y}) + \nabla f(\mathbf{y})^{\top} (\mathbf{x} - \mathbf{y})$ is just the first order Taylor series expansion for $f(\mathbf{x})$, and the above inequality is the first order condition for convexity. The first order condition for convexity states that the first order Taylor series expansion for $f(\mathbf{x})$ is always less or equal to $f(\mathbf{x})$.
Let $\mathbf{x} = \mathbf{X}$ and $\mathbf{y} = \mathbb{E}[\mathbf{X}]$, we have
$$
f(\mathbf{X}) \geq f(\mathbb{E}[\mathbf{X}]) + \nabla f(\mathbb{E}[\mathbf{X}])^{\top} (\mathbf{X} - \mathbb{E}[\mathbf{X}])
$$
We take the expected value on both side of the equation, we have
$$
\begin{align}
\mathbb{E}[ f(\mathbf{X}) ] &\geq \mathbb{E}[ f(\mathbb{E}[\mathbf{X}]) + \nabla f(\mathbb{E}[\mathbf{X}])^{\top} (\mathbf{X} - \mathbb{E}[\mathbf{X}]) ] \\
&= \mathbb{E}[ f(\mathbb{E}[\mathbf{X}]) ] + \mathbb{E}[ \nabla f(\mathbb{E}[\mathbf{X}])^{\top} (\mathbf{X} - \mathbb{E}[\mathbf{X}]) ] \\
&= f(\mathbb{E}[\mathbf{X}]) + \nabla f(\mathbb{E}[\mathbf{X}])^{\top} \mathbb{E}[ (\mathbf{X} - \mathbb{E}[\mathbf{X}]) ] \\
&= f(\mathbb{E}[\mathbf{X}]) + \nabla f(\mathbb{E}[\mathbf{X}])^{\top} (\mathbb{E}[\mathbf{X}] - \mathbb{E}[\mathbb{E}[\mathbf{X}])) \\
&= f(\mathbb{E}[\mathbf{X}]) + \nabla f(\mathbb{E}[\mathbf{X}])^{\top} (\mathbb{E}[\mathbf{X}] - \mathbb{E}[\mathbf{X}]) \\
&= f(\mathbb{E}[\mathbf{X}]) \\
\end{align}
$$
Therefore, when $f$ is a convex function, we have
$$
f(\mathbb{E}[\mathbf{X}]) \leq \mathbb{E}[f(\mathbf{X})]
$$
Similarly, we could also prove that when $f$ is a concave function, we have
$$
f(\mathbb{E}[\mathbf{X}]) \geq \mathbb{E}[f(\mathbf{X})]
$$
This concludes the proof.
References
Multivariate Jensen's Inequality
https://leimao.github.io/blog/Multivariate-Jensen-Inequality/