 ### Lei Mao


# Higher-Order Differentials

### Introduction

Higher-order differentials are an essential topic in calculus. When it comes to multivariate higher-order differentials, people often feel confused about their definitions and mathematical expressions.

In this blog post, I would like to discuss and derive the univariate and multivariate higher-order differentials.

### Prerequisites

#### Derivative Definition

The derivative of a function $y = f(x)$ measures the rate of change of $y$ with respect to $x$. The derivative of the function $y = f(x)$ at the point $x$ is defined as the limit of the ratio $\frac{\Delta y}{\Delta x}$ as $\Delta x \rightarrow 0$.

\begin{align} y^{\prime} &= f^{\prime}(x) \\ &= \frac{dy}{dx} \\ &= \frac{d f(x)}{dx} \\ &= \lim_{\Delta x \rightarrow 0} \frac{\Delta y}{\Delta x} \\ &= \lim_{\Delta x \rightarrow 0} \frac{f(x + \Delta x) - f(x)}{\Delta x} \\ \end{align}

#### Multi-Index Notation

We first introduce some notations that express the multivariate Taylor theorem and multivariate higher-order differentials conveniently.

Suppose $\alpha \in \mathbb{N}^n$, $\alpha = (\alpha_1, \alpha_2, \cdots, \alpha_n)$, and $x \in \mathbb{R}^n$, $x = (x_1, x_2, \cdots, x_n)$. We have the following notations.

$\begin{gather} \lvert \alpha \rvert = \alpha_1 + \alpha_2 + \cdots + \alpha_n \\ \alpha ! = \alpha_1! \alpha_2! \cdots \alpha_n! \\ {\lvert \alpha \rvert \choose \alpha} = \frac{\lvert \alpha \rvert !}{\alpha !} = \frac{\lvert \alpha \rvert !}{\alpha_1! \alpha_2! \cdots \alpha_n!} \\ x^{\alpha} = x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n} \\ \end{gather}$

Given a constant natural number $k$, how many different $\alpha$ are there such that $\lvert \alpha \rvert = k$?

Let’s look at an example. Assume $n = 3$ and $k = 2$, the $\alpha$ such that $\lvert \alpha \rvert = k$ are

$\begin{gather} \alpha = (2, 0, 0) \\ \alpha = (0, 2, 0) \\ \alpha = (0, 0, 2) \\ \alpha = (1, 1, 0) \\ \alpha = (1, 0, 1) \\ \alpha = (0, 1, 1) \\ \end{gather}$

Therefore, we have $6$ different $\alpha$ for $n = 3$ and $k = 2$.

The formula for the general case is not hard to derive. This counting problem is equivalent to the following one: given $k + n$ identical balls and $n$ distinct boxes, how many ways are there to put the balls into the boxes such that each box contains at least one ball? (Putting $\alpha_i + 1$ balls into box $i$ corresponds to the multi-index $\alpha$.) To solve it, arrange all the $k + n$ balls in a row and insert $n - 1$ barriers into the $k + n - 1$ gaps between adjacent balls, with at most one barrier per gap. The number of ways to do so is ${k+n-1 \choose n-1}$.

Let’s verify if the formula we derived is valid for the example we have seen above.

${k+n-1 \choose n-1} = {2+3-1 \choose 3-1} = {4 \choose 2} = 6$
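This stars-and-bars count is easy to cross-check by brute force. The sketch below (Python; the helper name `count_multi_indices` is my own) enumerates all $\alpha$ with $\lvert \alpha \rvert = k$ and compares the count against ${k+n-1 \choose n-1}$:

```python
from itertools import product
from math import comb

# Count multi-indices alpha in N^n with |alpha| = k by brute force,
# and compare against the stars-and-bars formula C(k + n - 1, n - 1).
def count_multi_indices(n, k):
    return sum(
        1
        for alpha in product(range(k + 1), repeat=n)
        if sum(alpha) == k
    )

for n, k in [(3, 2), (2, 5), (4, 3)]:
    brute = count_multi_indices(n, k)
    formula = comb(k + n - 1, n - 1)
    assert brute == formula
```

For $n = 3$ and $k = 2$, the brute-force count is $6$, matching the example above.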

### Univariate Higher-Order Differentials

#### First-Order Differential

Suppose $f: \mathbb{R} \rightarrow \mathbb{R}$ is a univariate function differentiable on an interval $I = (a, b)$ and $y = f(x)$. The first-order differential of the function at a point $x \in I$ is defined as

$dy = df(x) = f^{\prime}(x) dx$

Note that the definition of the first-order differential is consistent with the definition of the derivative.

There are many useful properties of the first-order differential, which can be proved from the definition of the derivative given in the prerequisites section. The properties that will be used for deriving higher-order differentials are the linearity rule and the product rule.

\begin{align} d(af(x) + bg(x)) &= a df(x) + b dg(x) \\ &= a f^{\prime}(x) dx + b g^{\prime}(x) dx \\ \end{align} \begin{align} d(f(x) g(x)) &= g(x)df(x) + f(x) dg(x) \\ &= g(x)f^{\prime}(x) dx + f(x)g^{\prime}(x) dx \\ \end{align}

We will skip the proofs of these properties since they are straightforward.
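Although we skip the formal proofs, the linearity and product rules are easy to sanity-check symbolically. The sketch below uses SymPy, with $\sin$ and $\exp$ as arbitrarily chosen test functions:

```python
import sympy as sp

# Symbolically check the linearity and product rules for first-order
# differentials, using d h(x) = h'(x) dx for each factor.
x, a, b = sp.symbols("x a b")
f = sp.sin(x)      # arbitrary smooth test functions
g = sp.exp(x)

# Linearity: d(a f + b g) = (a f' + b g') dx
lhs_linear = sp.diff(a * f + b * g, x)
rhs_linear = a * sp.diff(f, x) + b * sp.diff(g, x)
assert sp.simplify(lhs_linear - rhs_linear) == 0

# Product rule: d(f g) = (g f' + f g') dx
lhs_prod = sp.diff(f * g, x)
rhs_prod = g * sp.diff(f, x) + f * sp.diff(g, x)
assert sp.simplify(lhs_prod - rhs_prod) == 0
```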

#### Higher-Order Differential

Because $x$ is an independent variable, $dx$ is treated as a constant, and only the dependent variable is differentiated again. The second-order differential of $y$, denoted $d^2 y$, is therefore

\begin{align} d^2y &= d (dy) \\ &= d \big( f^{\prime}(x) dx \big) \\ &= d \big( f^{\prime}(x) \big) dx \\ &= \big( f^{\prime\prime}(x) dx \big) dx \\ &= f^{\prime\prime}(x) (dx)^2 \\ &= f^{\prime\prime}(x) dx^2 \\ \end{align}

Note that $(dx)^2$ is commonly denoted as $dx^2$.

\begin{align} d^2y &= f^{\prime\prime}(x) dx^2 \\ \end{align}

In general, the $n$-th order differential $d^n y$ is

$d^n y = f^{(n)}(x) dx^n$

This also leads to the $n$-th order derivative for a univariate function.

$f^{(n)}(x) = \frac{d^n y}{dx^n}$

Sometimes, it is written as

$f^{(n)}(x) = \frac{d^n f (x)}{dx^n}$
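The meaning of $d^2 y = f^{\prime\prime}(x) dx^2$ can be checked numerically: for an independent variable $x$ with a constant increment $dx$, the second difference $f(x + 2dx) - 2f(x + dx) + f(x)$ equals $f^{\prime\prime}(x) dx^2$ up to an $O(dx^3)$ error. A minimal sketch with $f = \exp$ (an arbitrary choice, convenient because $f^{\prime\prime} = f$):

```python
import math

# Numerically check d^2 y = f''(x) dx^2 for f(x) = exp(x): the
# second forward difference of f approximates f''(x) dx^2.
f = math.exp            # f = f' = f'' = exp
x, dx = 0.5, 1e-3

second_difference = f(x + 2 * dx) - 2 * f(x + dx) + f(x)
d2y = f(x) * dx**2      # f''(x) = exp(x) = f(x)

# The discrepancy is O(dx^3), far smaller than d2y itself.
assert abs(second_difference - d2y) < 1e-8
```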

#### Higher-Order Differential of a Composite Function

Suppose $y = f(x)$ and $x$ is itself dependent on an auxiliary or latent variable $u$, i.e., $x = g(u)$,

$y = f(x) = f(g(u))$

By the invariance of the first-order differential form, $dy = f^{\prime}(x) dx$ still holds. Applying the first-order differential properties, we have

\begin{align} d^2y &= d (dy) \\ &= d \big( f^{\prime}(x) dx \big) \\ \end{align}

Note that now $x$ is not an independent variable and $dx$ is no longer a constant, so we have

\begin{align} d^2y &= d \big( f^{\prime}(x) dx \big) \\ &= \Big( d\big( f^{\prime}(x) \big) dx + f^{\prime}(x) d (dx) \Big) \\ &= f^{\prime\prime}(x) (dx)^2 + f^{\prime}(x) d^2 x \\ &= f^{\prime\prime}(x) dx^2 + f^{\prime}(x) d^2 x \\ \end{align}

Using the same approach, $d^3y$, $d^4y$, $\cdots$, $d^ny$, $\cdots$, can be computed in a relatively straightforward manner. For example,

\begin{align} d^3y &= d (d^2y) \\ &= d \big( f^{\prime\prime}(x) dx^2 + f^{\prime}(x) d^2 x \big) \\ &= d \big( f^{\prime\prime}(x) dx^2 \big) + d \big( f^{\prime}(x) d^2 x \big) \\ &= d \big( f^{\prime\prime}(x) \big) dx^2 + f^{\prime\prime}(x) d ( dx^2 ) + d \big( f^{\prime}(x) \big) d^2 x + f^{\prime}(x) d( d^2 x )\\ &= f^{\prime\prime\prime}(x) dx dx^2 + f^{\prime\prime}(x) d \big( (dx)(dx) \big) + f^{\prime\prime}(x) dx d^2 x + f^{\prime}(x) d^3 x\\ &= f^{\prime\prime\prime}(x) dx^3 + f^{\prime\prime}(x) \big(d ( dx) dx + dx d(dx) \big) + f^{\prime\prime}(x) dx d^2 x + f^{\prime}(x) d^3 x\\ &= f^{\prime\prime\prime}(x) dx^3 + f^{\prime\prime}(x) (d^2x dx + dx d^2x) + f^{\prime\prime}(x) dx d^2 x + f^{\prime}(x) d^3 x\\ &= f^{\prime\prime\prime}(x) dx^3 + 3 f^{\prime\prime}(x) dx d^2 x + f^{\prime}(x) d^3 x \\ \end{align}
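This order-3 formula can be verified symbolically: with $x = g(u)$ and $u$ independent, we have $dx = g^{\prime}(u) du$, $d^2 x = g^{\prime\prime}(u) du^2$, and $d^3 x = g^{\prime\prime\prime}(u) du^3$, so the coefficient of $du^3$ in $d^3 y$ must equal $\frac{d^3}{du^3} f(g(u))$. A SymPy sketch with arbitrarily chosen test functions $f = \sin$ and $g(u) = u^3 + u$:

```python
import sympy as sp

# Symbolic check of d^3 y = f'''(x) dx^3 + 3 f''(x) dx d^2x + f'(x) d^3x
# for the composite y = f(g(u)), where dx = g'(u) du, d^2 x = g''(u) du^2,
# and d^3 x = g'''(u) du^3.
u, x = sp.symbols("u x")
f = lambda t: sp.sin(t)   # arbitrary outer test function
g = u**3 + u              # arbitrary inner test function

direct = sp.diff(f(g), u, 3)   # coefficient of du^3 in d^3 y

fp = [sp.diff(f(x), x, k).subs(x, g) for k in range(4)]   # f^{(k)}(g(u))
gp = [sp.diff(g, u, k) for k in range(4)]                 # g^{(k)}(u)

formula = fp[3] * gp[1]**3 + 3 * fp[2] * gp[1] * gp[2] + fp[1] * gp[3]
assert sp.simplify(direct - formula) == 0
```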

### Multivariate Higher-Order Differentials

#### First-Order Differential

Suppose $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is a multivariate function defined in a region, $y = f(x) = f(x_1, x_2, \cdots, x_n)$. The first-order partial differential of the function at the point $x = (x_1, x_2, \cdots, x_n)$ with respect to the variable $x_i$ is defined as

$\frac{\partial y}{\partial x_i} dx_i$

The first-order total differential of the function evaluated at $x$ is defined as

\begin{align} dy &= \frac{\partial y}{\partial x_1} dx_1 + \frac{\partial y}{\partial x_2} dx_2 + \cdots + \frac{\partial y}{\partial x_n} dx_n \\ &= \sum_{i = 1}^{n} \frac{\partial y}{\partial x_i} dx_i \\ &= \sum_{i = 1}^{n} \frac{\partial f(x)}{\partial x_i} dx_i \\ \end{align}
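The first-order total differential approximates the increment of $f$ for small $dx_i$: $f(x + dx) - f(x) = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i} dx_i + O(\lVert dx \rVert^2)$. A minimal numerical sketch with a hand-picked test function $f(x_1, x_2) = x_1^2 \sin x_2$ and hand-computed partial derivatives:

```python
import math

# Numerically check dy = sum_i (df/dx_i) dx_i for f(x1, x2) = x1^2 sin(x2):
# the increment f(x + dx) - f(x) is approximated by the total differential.
def f(x1, x2):
    return x1**2 * math.sin(x2)

x1, x2 = 1.2, 0.7
dx1, dx2 = 1e-5, -2e-5

# Hand-computed partial derivatives of f
df_dx1 = 2 * x1 * math.sin(x2)
df_dx2 = x1**2 * math.cos(x2)

increment = f(x1 + dx1, x2 + dx2) - f(x1, x2)
total_differential = df_dx1 * dx1 + df_dx2 * dx2

# The discrepancy is second order in the increments.
assert abs(increment - total_differential) < 1e-8
```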

#### Higher-Order Differential

Similar to the higher-order differential for univariate functions, we could derive the higher-order differential for multivariate functions based on definitions.

Since all the $x_i$ are independent variables, each $dx_i$ is treated as a constant. The second-order total differential of the function evaluated at $x$ is

\begin{align} d^2 y &= d (dy) \\ &= d \bigg( \sum_{i = 1}^{n} \frac{\partial y}{\partial x_i} dx_i \bigg) \\ &= \sum_{i = 1}^{n} d \bigg( \frac{\partial y}{\partial x_i} dx_i \bigg) \\ &= \sum_{i = 1}^{n} d \bigg( \frac{\partial y}{\partial x_i} \bigg) dx_i \\ &= \sum_{i = 1}^{n} \bigg( \sum_{j=1}^{n} \frac{\partial}{\partial x_j} \bigg( \frac{\partial y}{\partial x_i} \bigg) dx_j \bigg) dx_i \\ &= \sum_{i = 1}^{n} \bigg( \sum_{j=1}^{n} \frac{\partial^2 y}{\partial x_i \partial x_j} dx_j \bigg) dx_i \\ &= \sum_{i = 1}^{n} \sum_{j=1}^{n} \frac{\partial^2 y}{\partial x_i \partial x_j} dx_i dx_j \\ &= \sum_{\lvert \alpha \rvert = 2} {\lvert \alpha \rvert \choose \alpha} \bigg( \frac{\partial^{\lvert \alpha \rvert} y}{\partial x_1^{\alpha_1} \partial x_2^{\alpha_2} \cdots \partial x_n^{\alpha_n}} dx_1^{\alpha_1} dx_2^{\alpha_2} \cdots dx_n^{\alpha_n} \bigg) \\ &= \sum_{\lvert \alpha \rvert = 2} {\lvert \alpha \rvert \choose \alpha} \bigg( \frac{\partial^{\lvert \alpha \rvert} f(x)}{\partial x_1^{\alpha_1} \partial x_2^{\alpha_2} \cdots \partial x_n^{\alpha_n}} dx_1^{\alpha_1} dx_2^{\alpha_2} \cdots dx_n^{\alpha_n} \bigg) \\ \end{align}

where $\alpha \in \mathbb{N}^n$ and ${\lvert \alpha \rvert \choose \alpha}$ is the multinomial coefficient represented using the multi-index notation. Note that grouping the terms by $\alpha$ assumes the mixed partial derivatives are equal, which holds when $f$ is sufficiently smooth.

In general, the $k$-th order total differential of the function evaluated at $x$ is

\begin{align} d^k y &= d^k f(x) \\ &= \sum_{\lvert \alpha \rvert = k}^{} {k \choose \alpha} \bigg( \frac{\partial^{\lvert \alpha \rvert} f(x)}{\partial x_1^{\alpha_1} \partial x_2^{\alpha_2} \cdots \partial x_n^{\alpha_n}} dx_1^{\alpha_1} dx_2^{\alpha_2} \cdots dx_n^{\alpha_n} \bigg) \\ \end{align}

If we denote

$dx = \{d x_1, d x_2, \cdots, d x_n\}$

and the $k$-th order partial derivative

$D^{\alpha}f = \frac{\partial^{\lvert \alpha \rvert} f}{\partial x_1^{\alpha_1} \partial x_2^{\alpha_2} \cdots \partial x_n^{\alpha_n}}$

We could simplify the $k$-th order total differential of the function evaluated at $x$,

\begin{align} d^k y &= d^k f(x) \\ &= \sum_{\lvert \alpha \rvert = k}^{} {\lvert \alpha \rvert \choose \alpha} D^{\alpha}f(x) (dx)^{\alpha}\\ \end{align}

As has been discussed in the prerequisite section, the number of terms in the summation is $k+n-1 \choose n-1$.
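Both the multi-index formula and the term count can be verified symbolically. Since the $dx_i$ are constants, $d^k y = \big( \sum_{i=1}^{n} dx_i \frac{\partial}{\partial x_i} \big)^k f$, which equals $\frac{d^k}{dt^k} f(x + t \, dx) \big\rvert_{t=0}$. The SymPy sketch below checks this against the multi-index sum for an arbitrarily chosen $f$ with $n = 2$ and $k = 3$:

```python
import sympy as sp
from itertools import product
from math import comb, factorial

# Check d^k y = sum_{|alpha|=k} C(k, alpha) D^alpha f dx^alpha against the
# directional form d^k/dt^k f(x + t*dx) |_{t=0}, for a test function of
# n = 2 variables and k = 3.
x1, x2, t = sp.symbols("x1 x2 t")
h1, h2 = sp.symbols("h1 h2")      # the constant increments dx1, dx2
f = x1**3 * sp.exp(x2)            # arbitrary test function
n, k = 2, 3

# Directional form: substitute x_i -> x_i + t h_i, differentiate k times in t.
shifted = f.subs([(x1, x1 + t * h1), (x2, x2 + t * h2)], simultaneous=True)
direct = sp.diff(shifted, t, k).subs(t, 0)

# Multi-index form: sum over all alpha with |alpha| = k.
multi = sp.S(0)
terms = 0
for alpha in product(range(k + 1), repeat=n):
    if sum(alpha) != k:
        continue
    terms += 1
    coeff = factorial(k) // (factorial(alpha[0]) * factorial(alpha[1]))
    partial = sp.diff(f, x1, alpha[0], x2, alpha[1])
    multi += coeff * partial * h1**alpha[0] * h2**alpha[1]

assert sp.simplify(direct - multi) == 0
assert terms == comb(k + n - 1, n - 1)
```

The final assertion also confirms the ${k+n-1 \choose n-1}$ term count from the prerequisites section.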

#### Higher-Order Differential of a Composite Function

The higher-order differentials for multivariate functions that involve composite functions are much more complicated and should be discussed case by case. We will skip that discussion here.