Lei Mao bio photo

Lei Mao

Machine Learning, Artificial Intelligence, Computer Science.

Twitter Facebook LinkedIn GitHub   G. Scholar E-Mail RSS

Introduction

Taylor theorem is widely used for the approximation of a $k$-times differentiable function around a given point by a polynomial of degree $k$, called the $k$th-order Taylor polynomial. In addition, it is also useful for proving some of the convex function properties.


In this blog post, I would like to discuss and prove the univariate Taylor theorem followed by touching some of the basic expressions for the commonly used quadratic multivariate Taylor theorem.

Prerequisites

The following theorem and lemmas are prerequisite knowledge for proving the univariate Taylor theorem.

Rolle’s Theorem

Rolle’s theorem is a special case of mean value theorem.


If a real-valued function $f$ is continuous on a proper closed interval $[a, b]$, differentiable on the open interval $(a, b)$, and $f(a) = f(b)$, then there exists at least one $c$ in the open interval $(a, b)$ such that $f(c) = 0$.

Lemma 1

Let $f: \mathbb{R} \rightarrow \mathbb{R}$ be $2$ times differentiable on the open interval with $f^{\prime}$ and $f^{\prime\prime}$ continuous on the closed interval $I$ between $a$ and $x_0$ (a function of class $C^2$ on $I$). If $f^{\prime}(a) = 0$, and $x_0$ is a point such that $f(x_0) = f(a)$ and $x_0 \neq a$, then there exists $t \in (0,1)$ such that $f^{\prime\prime}(a + t(x_0 - a)) = 0$.


Proof


Because $f(x_0) = f(a)$ and $x_0 \neq a$, according to Rolle’s Theorem, there exists $t_1 \in (0,1)$ such that $f^{\prime}(a + t_1(x_0 - a)) = 0$. Now, because $f^{\prime}(a) = f^{\prime}(a + t_1(x_0 - a))$ and $a + t_1(x_0 - a) \neq a$, again, according to Rolle’s Theorem, there exists $t_2 \in (0,1)$ such that $f^{\prime\prime}(a + t_2(a + t_1(x_0 - a) - a)) = 0$.

\[\begin{align} f^{\prime\prime}(a + t_2(a + t_1(x_0 - a) - a)) &= f^{\prime\prime}(a + t_2t_1(x_0 - a)) \\ &= f^{\prime\prime}(a + t(x_0 - a)) \\ &= 0 \\ \end{align}\]

where $t = t_1t_2$.


This concludes the proof.

Lemma 2

Let $f$ of class $C^k$ on the closed interval $I$ between $a$ and $x_0$. If $f^{\prime}(a) = f^{\prime\prime}(a) = \cdots = f^{(k-1)}(a) = 0$, and $x_0$ is a point such that $f(x_0) = f(a)$ and $x_0 \neq a$, then there exists $t \in (0,1)$ such that $f^{(k)}(a + t(x_0 - a)) = 0$.


Proof


Because $f(x_0) = f(a)$, $x_0 \neq a$, and $f^{\prime}(a) = 0$, according to Lemma 1, there exists $t_1 \in (0,1)$ such that $f^{\prime\prime}(a + t_1(x_0 - a)) = 0$. Because $f^{\prime\prime}(a + t_1(x_0 - a)) = f^{\prime\prime}(a)$, $a + t_1(x_0 - a) \neq a$, and $f^{\prime\prime}(a) = 0$, according to Lemma 1, there exists $t_2^{\prime} \in (0,1)$ such that $f^{(3)}(a + t_2^{\prime}(a + t_1(x_0 - a) - a)) = 0$. We rearrange the terms, so there must exists $t_2 \in (0,1)$ such that $f^{(3)}(a + t_2(x_0 - a)) = 0$, where $t_2 = t_1 t_2^{\prime}$.


Iterating this process, there exists $t_{k-1} \in (0,1)$ such that $f^{(k)}(a + t_{k-1}(x_0 - a)) = 0$.


This concludes the proof.

Univariate Taylor Theorem

General Univariate Taylor Theorem

Let $k \geq 1$ be an integer and let the function $f: \mathbb{R} \rightarrow \mathbb{R}$ be $k$ times differentiable at the point $a \in \mathbb{R}$. Then there exists a function $h_k: \mathbb{R} \rightarrow \mathbb{R}$ such that

\[f(x) = P_k(x) + R_k(x)\]

where $P_k(x)$ is the $k$-th order Taylor polynomial,

\[P_k(x) = f(a) + f^{\prime}(a)(x-a) + \frac{f^{\prime\prime}(a)}{2!}(x-a)^2 + \cdots + \frac{f^{(k)}(a)}{k!}(x-a)^k\]

and $R_k(x)$ is the remainder term.


The remainder term could have many different forms. The commonly seen form is the Peano form of the remainder, where the formula of the remainder is not explicitly defined.

\[R_k(x) = h_k(x)(x-a)^k, \quad \lim_{x \rightarrow a} h_k(x) = 0\]

There are explicit formula for the remainder, with some additional assumptions. Among the mean-value forms of the remainder, the Lagrange form of the remainder is widely used.


Let $f: \mathbb{R} \rightarrow \mathbb{R}$ be $k+1$ times differentiable on the open interval with $f^{(k)}$ continuous on the closed interval between $a$ and $x$, then the remainder term could be expressed as

\[\begin{align} R_k(x) &= \frac{f^{(k+1)}(\xi_L)}{(k+1)!}(x-a)^{k+1} \\ &= \frac{f^{(k+1)}(a + t(x-a))}{(k+1)!}(x-a)^{k+1} \\ \end{align}\]

where $\xi_L$ is some real value between $a$ and $x$, and $0 \leq t \leq 1$.


It is easy to show that Taylor theorem with the Lagrange form remainder implies Taylor theorem with the Peano form remainder.

\[\begin{align} f(x) &= P_k(x) + R_k(x) \\ &= P_{k-1}(x) + R_{k-1}(x) \\ \end{align}\]

We will use Peano form for $R_k(x)$ and Lagrange form for $R_{k-1}(x)$, we then have

\[\begin{align} \frac{f^{(k)}(a)}{k!}(x-a)^k + h_k(x)(x-a)^k &= \frac{f^{(k)}(a + t(x-a))}{k!}(x-a)^{k} \\ \frac{f^{(k)}(a)}{k!} + h_k(x) &= \frac{f^{(k)}(a + t(x-a))}{k!} \\ \end{align}\] \[h_k(x) = \frac{f^{(k)}(a + t(x-a)) - f^{(k)}(a)}{k}\]

Apparently,

\[\lim_{x \rightarrow a} h_k(x) = 0\]

This concludes the proof that Taylor theorem with the Lagrange form remainder implies Taylor theorem with the Peano form remainder.


Let’s check Taylor’s theorem quickly. When $k = 1$, using the Peano form of the remainder, we have

\[\begin{align} f(x) &= P_1(x) + R_1(x) \\ &= f(a) + f^{\prime}(a)(x - a) + h_1(x)(x-a) \\ \end{align}\]

We have the expression for $h(x)$

\[\begin{align} h_1(x) &= \frac{f(x) - f(a) - f^{\prime}(a)(x - a)}{x - a} \\ &= \frac{f(x) - f(a)}{x - a} - f^{\prime}(a)\\ \end{align}\]

Because of the definition of derivative,

\[f^{\prime}(a) = \lim_{x \rightarrow a} \frac{f(x) - f(a)}{x - a}\]

we have

\[\begin{align} \lim_{x \rightarrow a} h_1(x) &= \lim_{x \rightarrow a} \bigg( \frac{f(x) - f(a)}{x - a} - f^{\prime}(a) \bigg) \\ &= \lim_{x \rightarrow a} \bigg( \frac{f(x) - f(a)}{x - a} \bigg) - f^{\prime}(a) \\ &= f^{\prime}(a) - f^{\prime}(a) \\ &= 0 \\ \end{align}\]

Let’s try to prove Taylor’s theorem more formally using the Lagrange form now. (To make the proof somewhat easier, it is not using the more general Peano form remainder.)


Proof


We define new functions $g_1$ and $g_2$.

\[\begin{align} g_1(x) &= f(x) - \bigg( f(a) + f^{\prime}(a) (x-a) + \cdots + \frac{f^{(k-1)}(a)}{(k-1)!} (x-a)^{k-1} \bigg) \end{align}\] \[\begin{align} g_2(x) &= g_1(x) - \Big(\frac{x - a}{x_0 - a}\Big)^k g_1(x_0) \end{align}\]

where $a$ and $x_0$ are some arbitrary real values and $a \neq x_0$.


The derivatives of $g_1(x)$ are as follows.

\[\begin{gather} g_1^{(1)}(x) = f^{(1)}(x) - \bigg( f^{(1)}(a) + f^{(2)}(a) (x-a) + \frac{f^{(3)}(a)}{2!} (x-a)^2 + \cdots + \frac{f^{(k-1)}(a)}{(k-2)!} (x-a)^{k-2} \bigg) \\ g_1^{(2)}(x) = f^{(2)}(x) - \bigg( f^{(2)}(a) + f^{(3)}(a) (x-a) + \cdots + \frac{f^{(k-1)}(a)}{(k-3)!} (x-a)^{k-3} \bigg) \\ \vdots \\ g_1^{(i)}(x) = f^{(i)}(x) - \bigg( f^{(i)}(a) + f^{(i+1)}(a) (x-a) + \cdots + \frac{f^{(k-1)}(a)}{(k-i-1)!} (x-a)^{k-i-1} \bigg) \\ \vdots \\ g_1^{(k-1)}(x) = f^{(k-1)}(x) - f^{(k-1)}(a) \\ g_1^{(k)}(x) = f^{(k)}(x) \\ \end{gather}\]


It is easy to verify that $g_1^{(i)}(a) = 0$ for $i \in [1, k-1]$.


The derivatives of $g_2(x)$ are as follows.

\[\begin{gather} g_2^{(1)}(x) = g_1^{(1)}(x) - k \frac{(x - a)^{k-1}}{(x_0 - a)^k} g_1(x_0) \\ g_2^{(2)}(x) = g_1^{(2)}(x) - k(k-1) \frac{(x - a)^{k-2}}{(x_0 - a)^k} g_1(x_0) \\ g_2^{(i)}(x) = g_1^{(i)}(x) - \frac{k!}{(k-i)!} \frac{(x - a)^{k-i}}{(x_0 - a)^k} g_1(x_0) \\ g_2^{(k-1)}(x) = g_1^{(k-1)}(x) - k! \frac{x - a}{(x_0 - a)^k} g_1(x_0) \\ g_2^{(k)}(x) = g_1^{(k)}(x) - k! \frac{1}{(x_0 - a)^k} g_1(x_0) \\ \end{gather}\]

Given $g_1^{(i)}(a) = 0$ for $i \in [1, k-1]$, it is also easy to verify that $g_2^{(i)}(a) = 0$ for $i \in [1, k-1]$.


Further more,

\[\begin{align} g_1(a) = f(a) - f(a) = 0 \end{align}\] \[\begin{align} g_2(a) = g_1(a) = 0 \end{align}\] \[\begin{align} g_2(x_0) = g_1(x_0) - g_1(x_0) = 0 \end{align}\]

So we have $g_2(x_0) = g_2(a)$ and $a \neq x_0$.


So we have $g_2(x_0) = g_2(a)$ and $a \neq x_0$, and $g_2^{\prime}(a) = g_2^{\prime\prime}(a) = \cdots = g_2^{(k-1)}(a) = 0$. According to Lemma 2, there exists $t \in (0,1)$ such that $g_2^{(k)}(a + t(x_0 - a)) = 0$.


Therefore,

\[\begin{align} g_2^{(k)}(a + t(x_0 - a)) &= g_1^{(k)}(a + t(x_0 - a)) - k! \frac{1}{(x_0 - a)^k} g_1(x_0) \\ &= f^{(k)}(a + t(x_0 - a)) - k! \frac{1}{(x_0 - a)^k} \Bigg( f(x_0) - \bigg( f(a) + f^{\prime}(a) (x_0-a) + \cdots + \frac{f^{(k-1)}(a)}{(k-1)!} (x_0-a)^{k-1} \bigg) \Bigg) \\ &= 0 \\ \end{align}\] \[\begin{align} f(x_0) &= f(a) + f^{\prime}(a) (x_0-a) + \cdots + \frac{f^{(k-1)}(a)}{(k-1)!} (x_0-a)^{k-1} + \frac{f^{(k)}(a + t(x_0 - a))}{k!}(x_0 - a)^k \end{align}\]

Note that we have assumed $a$ and $x_0$ to be arbitrary as long as $a \neq x_0$. So we basically have,

\[\begin{align} f(x) &= f(a) + f^{\prime}(a) (x-a) + \cdots + \frac{f^{(k-1)}(a)}{(k-1)!} (x-a)^{k-1} + \frac{f^{(k)}(a + t(x - a))}{k!}(x - a)^k \end{align}\]

where $x \neq a$.


However, it is easy to see that when $x = a$, the above equation still holds. So ultimately, we have

\[\begin{align} f(x) &= f(a) + f^{\prime}(a) (x-a) + \cdots + \frac{f^{(k-1)}(a)}{(k-1)!} (x-a)^{k-1} + \frac{f^{(k)}(a + t(x - a))}{k!}(x - a)^k \end{align}\]

The above verifies Taylor theorem with the Lagrange form remainder.


This concludes the proof.

Multivariate Taylor Theorem

The multivariate Taylor theorem is a little bit complicated. We will only focus on the quadratic form here without discussing the proofs.

Multi-Index Notation

There are some notations in order to express the multivariate Taylor theorem conveniently.


Suppose $\alpha \in \mathbb{N}^n$, $\alpha = \{\alpha_1, \alpha_2, \cdots, \alpha_n\}$, and $x \in \mathbb{R}^n$, $x = \{x_1, x_2, \cdots, x_n\}$, we have the following notations.

\[\lvert \alpha \rvert = \alpha_1 + \alpha_2 + \cdots + \alpha_n\] \[\alpha ! = \alpha_1! \alpha_2! \cdots \alpha_n!\] \[{n \choose \alpha} = \frac{n!}{\alpha !} = \frac{n!}{\alpha_1! \alpha_2! \cdots \alpha_n!}\] \[x^{\alpha} = x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n}\]

Suppose $\alpha \in \mathbb{N}^n$ and a constant natural number $k$, how many different $\alpha$ are there such that $\lvert \alpha \rvert = k$?


Let’s look at an example. Assume $n = 3$ and $k = 2$, the $\alpha$ such that $\lvert \alpha \rvert = k$ are

\[\begin{gather} \alpha = (2, 0, 0) \\ \alpha = (0, 2, 0) \\ \alpha = (0, 0, 2) \\ \alpha = (1, 1, 0) \\ \alpha = (1, 0, 1) \\ \alpha = (0, 1, 1) \\ \end{gather}\]

Therefore, we have $6$ different $\alpha$ for $n = 3$ and $k = 2$.


The formula for the general case is actually not hard to derive. This combination problem is actually equivalent to the combination problem that we have $k + n$ identical balls, and $n$ different boxes, how many unique ways to put these balls into the boxes such that each box has at least one ball. To solve it, we would need to put all the $k+n$ balls as a sequence, and insert $n-1$ barriers between the balls, such that each two adjacent balls could have at most one barrier. The number of combinations for this problem is, obviously, $k+n-1 \choose n-1$.


Let’s verify if the formula we derived is valid for the example we have seen above.

\[{k+n-1 \choose n-1} = {2+3-1 \choose 3-1} = {4 \choose 2} = 6\]

$k$-th Order Partial Derivatives

Suppose a function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is $k$ times differentiable, the $k$-th order partial derivatives of $f$ are

\[D^{\alpha} f = \frac{\partial^{\lvert \alpha \rvert}f}{\partial x_1^{\alpha_1} \partial x_2^{\alpha_2} \cdots \partial x_n^{\alpha_n}}\]

for all $\alpha$ such that $\lvert \alpha \rvert = k$.


Suppose a function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is $k$ times differentiable at point $x \in \mathbb{R}^n$, the $k$-th order partial derivatives of $f$ evaluated at point $x$ are

\[D^{\alpha} f (x) = \frac{\partial^{\lvert \alpha \rvert}f}{\partial x_1^{\alpha_1} \partial x_2^{\alpha_2} \cdots \partial x_n^{\alpha_n}} (x)\]

for all $\alpha$ such that $\lvert \alpha \rvert = k$.


Let’s look at an example. Assume $n = 3$ and $k = 2$,

\[f(x) = 5 x_1^3 + 7 x_2^2 x_3^4\]

From the example in the Multi-Index Notation section, we know that all the $\alpha$ are

\[\begin{gather} \alpha = (2, 0, 0) \\ \alpha = (0, 2, 0) \\ \alpha = (0, 0, 2) \\ \alpha = (1, 1, 0) \\ \alpha = (1, 0, 1) \\ \alpha = (0, 1, 1) \\ \end{gather}\]

So the $2$-th order partial derivatives of $f$ evaluated at point $x$ are

\[\begin{gather} D^{(2, 0, 0)} f(x) = \frac{\partial^{2}f}{\partial x_1^{2} \partial x_2^{0} \partial x_3^{0}} = 30 x_1 + 7 x_2^2 x_3^4\\ D^{(0, 2, 0)} f(x) = \frac{\partial^{2}f}{\partial x_1^{0} \partial x_2^{2} \partial x_3^{0}} = 5 x_1^3 + 14 x_3^4 \\ D^{(0, 0, 2)} f(x) = \frac{\partial^{2}f}{\partial x_1^{0} \partial x_2^{0} \partial x_3^{2}} = 5 x_1^3 + 84 x_2^2 x_3^2 \\ D^{(1, 1, 0)} f(x) = \frac{\partial^{2}f}{\partial x_1^{1} \partial x_2^{1} \partial x_3^{0}} = 15 x_1^2 + 14 x_2 x_3^4 \\ D^{(1, 0, 1)} f(x) = \frac{\partial^{2}f}{\partial x_1^{1} \partial x_2^{0} \partial x_3^{1}} = 15 x_1^2 + 28 x_2^2 x_3^3 \\ D^{(0, 1, 1)} f(x) = \frac{\partial^{2}f}{\partial x_1^{0} \partial x_2^{1} \partial x_3^{1}} = 5 x_1^3 + 56 x_2 x_3^3 \\ \end{gather}\]

Quadratic Multivariate Taylor Theorem

Let $f: \mathbb{R}^n \rightarrow \mathbb{R}$ be a $2$-times differentiable function at the point $a \in \mathbb{R}^n$. Using the Lagrange form of the remainder, there exists $t \in (0, 1)$ such that

\[\begin{align} f(x) &= f(a) + \nabla f(a)^{\top} (x-a) + \frac{1}{2} (x-a)^{\top} H(a + t(x - a)) (x-a) \\ \end{align}\]

where $\nabla f(a)^{\top}$ is a row vector and $x - a$ is a column vector, $H$ is the Hessian matrix and $H(a + t(x - a))$ is the Hessian matrix evaluated at $a + t(x - a)$. Concretely,

\[\nabla f(a)^{\top} = \bigg[ \frac{\partial f}{\partial x_1}(a), \frac{\partial f}{\partial x_2}(a), \cdots, \frac{\partial f}{\partial x_n}(a) \bigg]\]

and

\[x - a = [x_1 - a_1, x_2 - a_2, \cdots, x_n - a_n]^{\top}\]

General Multivariate Taylor Theorem

Let $f: \mathbb{R}^n \rightarrow \mathbb{R}$ be a $k$-times differentiable function at the point $a \in \mathbb{R}^n$, we have

\[\begin{align} f(x) &= P_k(x) + R_k(x) \end{align}\]

where the $k$-th order Taylor polynomial

\[\begin{align} P_k(x) &= \sum_{j = 0}^{k} \frac{1}{j!} \sum_{\lvert \alpha \rvert = j}^{} {\lvert \alpha \rvert \choose \alpha} D^{\alpha}f(a) (x-a)^{\alpha} \\ &= \sum_{j = 0}^{k} \sum_{\lvert \alpha \rvert = j}^{} \frac{1}{j!} {j \choose \alpha} D^{\alpha}f(a) (x-a)^{\alpha} \\ &= \sum_{j = 0}^{k} \sum_{\lvert \alpha \rvert = j}^{} \frac{D^{\alpha}f(a)}{\alpha !} (x-a)^{\alpha} \\ &= \sum_{0 \leq \lvert \alpha \rvert \leq k}^{} \frac{D^{\alpha}f(a)}{\alpha !} (x-a)^{\alpha} \\ \end{align}\]

and the Peano form of the remainder

\[R_k(x) = \sum_{\lvert \alpha \rvert = k}^{} h_{\alpha}(x) (x-a)^{\alpha}, \quad \lim_{x \rightarrow a} h_{\alpha}(x) = 0\]

where $h_{\alpha}: \mathbb{R}^n \rightarrow \mathbb{R}$. So instead of having one single $h$ in the univariate case, here we have multiple different $h$, each for one $\alpha$ whose $\lvert \alpha \rvert = k$.


For example, assume $n = 3$ and $k = 2$, for the $2$-th order Taylor polynomial,

\[\begin{align} P_2(x) &= \sum_{0 \leq \lvert \alpha \rvert \leq 2}^{} \frac{D^{\alpha}f(a)}{\alpha !} (x-a)^{\alpha} \\ &= \Bigg( \sum_{\lvert \alpha \rvert = 0}^{} \frac{D^{\alpha}f(a)}{\alpha !} (x-a)^{\alpha} \Bigg) + \Bigg( \sum_{\lvert \alpha \rvert = 1}^{} \frac{D^{\alpha}f(a)}{\alpha !} (x-a)^{\alpha} \Bigg) \\ &\quad + \Bigg( \sum_{\lvert \alpha \rvert = 2}^{} \frac{D^{\alpha}f(a)}{\alpha !} (x-a)^{\alpha} \Bigg) \\ &= f(a) + \frac{D^{(1,0,0)}f(a)}{(1,0,0) !} (x-a)^{(1,0,0)} + \frac{D^{(0,1,0)}f(a)}{(0,1,0) !} (x-a)^{(0,1,0)} \\ &\quad + \frac{D^{(0,0,1)}f(a)}{(0,0,1) !} (x-a)^{(0,0,1)} + \frac{D^{(2,0,0)}f(a)}{(2,0,0) !} (x-a)^{(2,0,0)} \\ &\quad + \frac{D^{(2,0,0)}f(a)}{(2,0,0) !} (x-a)^{(2,0,0)} + \frac{D^{(0,0,2)}f(a)}{(0,0,2) !} (x-a)^{(0,0,2)} \\ &\quad + \frac{D^{(1,1,0)}f(a)}{(1,1,0) !} (x-a)^{(1,1,0)} + \frac{D^{(1,0,1)}f(a)}{(1,0,1) !} (x-a)^{(1,0,1)} \\ &\quad + \frac{D^{(0,1,1)}f(a)}{(0,1,1) !} (x-a)^{(0,1,1)} \\ \end{align}\]

Each of the term could be expanded using the multi-index notation and the $k$-th order partial derivatives.


For the Peano form of the remainder,

\[\begin{align} R_2(x) &= \sum_{\lvert \alpha \rvert = 2}^{} h_{\alpha}(x) (x-a)^{\alpha} \\ &= h_{(2,0,0)}(x) (x-a)^{(2,0,0)} + h_{(0,2,0)}(x) (x-a)^{(0,2,0)} + h_{(0,0,2)}(x) (x-a)^{(0,0,2)} \\ &\quad + h_{(1,1,0)}(x) (x-a)^{(1,1,0)} + h_{(1,0,1)}(x) (x-a)^{(1,0,1)} + h_{(0,1,1)}(x) (x-a)^{(0,1,1)} \\ \end{align}\]

References