Derivatives
Introduction
In mathematics, the derivative of a function measures how the function's output changes as its input changes. It is a fundamental concept in calculus. There are several types of derivatives with slightly different forms, including the ordinary derivative, the partial derivative, the directional derivative, and the total derivative.
In this blog post, I would like to discuss the definitions, derivations, and relationships of these derivatives.
Definition of Derivatives
The derivative of a differentiable function $f: X \to Y$ at a point $x$ can be defined using limits as follows
$$
\begin{align}
f’(x) &= \frac{df}{dx}(x) \\
&= \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \\
\end{align}
$$
if the limit exists.
Note that the domain $X$ and the codomain $Y$ are not limited to the real numbers. The definition of the derivative can be extended to functions of several variables and to vector-valued functions.
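To make the limit concrete, the following minimal Python sketch evaluates the difference quotient for shrinking $h$. The test function $\sin$, the point $x = 1$, and the helper name `difference_quotient` are illustrative assumptions, not part of the definition above.

```python
import math


def difference_quotient(f, x, h):
    """Return (f(x + h) - f(x)) / h, the quotient whose limit defines f'(x)."""
    return (f(x + h) - f(x)) / h


# Assumed example: f(x) = sin(x), whose exact derivative is cos(x).
x = 1.0
exact = math.cos(x)

errors = []
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    approx = difference_quotient(math.sin, x, h)
    errors.append(abs(approx - exact))
    print(f"h = {h:.0e}  quotient = {approx:.8f}  error = {errors[-1]:.2e}")
```

The error shrinks roughly in proportion to $h$, which is consistent with the quotient converging to the derivative.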
Let $o(h)$ be the difference between $f(x + h) - f(x)$ and $h f’(x)$, i.e.,
$$
\begin{align}
o(h) &= f(x + h) - f(x) - h f’(x) \\
&= \frac{f(x + h) - f(x) - h f’(x)}{h} h \\
&= \left(\frac{f(x + h) - f(x)}{h} - f’(x) \right) h \\
\end{align}
$$
Then the limit of $o(h)$ as $h$ approaches $0$ is $0$:
$$
\begin{align}
\lim_{h \to 0} o(h) &= \lim_{h \to 0} \left(\frac{f(x + h) - f(x)}{h} - f’(x) \right) h \\
&= \left(\lim_{h \to 0} \frac{f(x + h) - f(x)}{h} - f’(x) \right) \left(\lim_{h \to 0} h \right) \\
&= \left(f’(x) - f’(x) \right) \cdot 0 \\
&= 0 \\
\end{align}
$$
Therefore, we can write $f(x + h)$ as
$$
\begin{align}
f(x + h) &= f(x) + h f’(x) + o(h) \\
\end{align}
$$
where $\lim_{h \to 0} o(h) = 0$.
This is the first-order Taylor expansion of $f(x + h)$ at $x$.
The remainder also vanishes faster than $h$, i.e., $\lim_{h \to 0} \frac{o(h)}{h} = 0$. To see this, substitute the expansion into the definition of the derivative:
$$
\begin{align}
f’(x) &= \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \\
&= \lim_{h \to 0} \frac{f(x) + h f’(x) + o(h) - f(x)}{h} \\
&= \lim_{h \to 0} \frac{h f’(x) + o(h)}{h} \\
&= \lim_{h \to 0} \left( f’(x) + \frac{o(h)}{h} \right) \\
&= f’(x) + \lim_{h \to 0} \frac{o(h)}{h} \\
\end{align}
$$
Therefore, we have $\lim_{h \to 0} \frac{o(h)}{h} = 0$.
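Both limits can be observed numerically. The sketch below is a minimal illustration that assumes $f(x) = e^{x}$ at $x = 0.5$; the helper name `remainder` is hypothetical and simply evaluates $o(h)$ as defined above.

```python
import math


def remainder(f, f_prime, x, h):
    """Return o(h) = f(x + h) - f(x) - h * f'(x)."""
    return f(x + h) - f(x) - h * f_prime(x)


# Assumed example: f(x) = exp(x), so f'(x) = exp(x) as well.
x = 0.5
steps = [1e-1, 1e-2, 1e-3, 1e-4]
remainders = [remainder(math.exp, math.exp, x, h) for h in steps]
ratios = [r / h for r, h in zip(remainders, steps)]

for h, r, q in zip(steps, remainders, ratios):
    print(f"h = {h:.0e}  o(h) = {r:.3e}  o(h)/h = {q:.3e}")
```

For this choice of $f$, both $o(h)$ and $o(h)/h$ shrink as $h$ shrinks, with $o(h)$ shrinking faster.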
Vector-Valued Functions
If $\mathbf{f} = [f_{1}, f_{2}, \ldots, f_{m}]^{\top}$ is a vector-valued function $\mathbf{f}: \mathbb{R} \to \mathbb{R}^m$, where $f_{i}: \mathbb{R} \to \mathbb{R}$ for $i \in \{1, 2, \ldots, m\}$ and $\mathbf{f}(x) = [f_{1}(x), f_{2}(x), \ldots, f_{m}(x)]^{\top}$, the derivative of $\mathbf{f}$ at $x$ is defined by the same limit and equals the vector of the derivatives of its components
$$
\begin{align}
\mathbf{f}’(x) &= \lim_{h \to 0} \frac{\mathbf{f}(x + h) - \mathbf{f}(x)}{h} \\
&= \lim_{h \to 0} \frac{[f_{1}(x + h), f_{2}(x + h), \ldots, f_{m}(x + h)]^{\top} - [f_{1}(x), f_{2}(x), \ldots, f_{m}(x)]^{\top}}{h} \\
&= \lim_{h \to 0} \left[ \frac{f_{1}(x + h) - f_{1}(x)}{h}, \frac{f_{2}(x + h) - f_{2}(x)}{h}, \ldots, \frac{f_{m}(x + h) - f_{m}(x)}{h} \right]^{\top} \\
&= \left[ \lim_{h \to 0} \frac{f_{1}(x + h) - f_{1}(x)}{h}, \lim_{h \to 0} \frac{f_{2}(x + h) - f_{2}(x)}{h}, \ldots, \lim_{h \to 0} \frac{f_{m}(x + h) - f_{m}(x)}{h} \right]^{\top} \\
&= \left[ f_{1}’(x), f_{2}’(x), \ldots, f_{m}’(x) \right]^{\top} \\
\end{align}
$$
if all the limits exist.
Clearly, the derivative $\mathbf{f}’(x)$ is a vector of the same dimension as $\mathbf{f}(x)$, i.e., $\mathbf{f}’: \mathbb{R} \to \mathbb{R}^m$.
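The componentwise result can be checked with a short Python sketch. The curve $\mathbf{f}(x) = [\cos x, \sin x, x^{2}]^{\top}$, the evaluation point, and the helper name `vector_difference_quotient` are assumptions made for illustration only.

```python
import math


def f(x):
    """Assumed example curve f: R -> R^3."""
    return [math.cos(x), math.sin(x), x * x]


def f_prime(x):
    """Exact derivative of the example curve, taken component by component."""
    return [-math.sin(x), math.cos(x), 2.0 * x]


def vector_difference_quotient(func, x, h):
    """Return (func(x + h) - func(x)) / h, computed per component."""
    moved, base = func(x + h), func(x)
    return [(a - b) / h for a, b in zip(moved, base)]


x, h = 0.7, 1e-6
approx = vector_difference_quotient(f, x, h)
exact = f_prime(x)
max_error = max(abs(a - e) for a, e in zip(approx, exact))
print(approx, exact, max_error)
```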
Partial Derivatives
If $f$ is a function of several variables $f: \mathbb{R}^n \to \mathbb{R}$, the derivative of $f$ with respect to $\mathbf{x} = [x_{1}, x_{2}, \ldots, x_{n}]^{\top}$ is defined as the vector of its partial derivatives
$$
\begin{align}
f’(\mathbf{x})
&= \nabla f(\mathbf{x}) \\
&= \left[ \frac{\partial f}{\partial x_{1}}(\mathbf{x}), \frac{\partial f}{\partial x_{2}}(\mathbf{x}), \ldots, \frac{\partial f}{\partial x_{n}}(\mathbf{x}) \right]^{\top} \\
\end{align}
$$
where $\nabla f(\mathbf{x})$ is the gradient of $f$ at $\mathbf{x}$, and $\frac{\partial f}{\partial x_{i}}(\mathbf{x})$ is the partial derivative of $f$ with respect to $x_{i}$ at $\mathbf{x}$. More specifically,
$$
\begin{align}
\frac{\partial f}{\partial x_{i}}(\mathbf{x}) &= \lim_{h \to 0} \frac{f(\mathbf{x} + h \mathbf{e}_{i}) - f(\mathbf{x})}{h} \\
&= \lim_{h \to 0} \frac{f(x_{1}, x_{2}, \ldots, x_{i} + h, \ldots, x_{n}) - f(x_{1}, x_{2}, \ldots, x_{i}, \ldots, x_{n})}{h} \\
\end{align}
$$
where $\mathbf{e}_{i}$ is the $i$-th unit vector in $\mathbb{R}^n$, i.e., the vector whose $i$-th component is $1$ and whose other components are $0$.
Similarly, we could write $f(\mathbf{x} + h \mathbf{e}_{i})$ as
$$
\begin{align}
f(\mathbf{x} + h \mathbf{e}_{i}) &= f(\mathbf{x}) + h \frac{\partial f}{\partial x_{i}}(\mathbf{x}) + o(h) \\
\end{align}
$$
where $\lim_{h \to 0} o(h) = 0$ and $\lim_{h \to 0} \frac{o(h)}{h} = 0$.
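As a rough numerical illustration, each partial derivative can be approximated by perturbing one coordinate at a time. The function $f(\mathbf{x}) = x_{1}^{2} x_{2} + \sin x_{3}$ and the helper names `partial_derivative` and `numerical_gradient` are assumptions for this sketch; indices in the code are 0-based.

```python
import math


def f(x):
    """Assumed example f: R^3 -> R."""
    return x[0] ** 2 * x[1] + math.sin(x[2])


def partial_derivative(func, x, i, h=1e-6):
    """Approximate the i-th partial derivative by moving x along e_i by h."""
    moved = list(x)
    moved[i] += h
    return (func(moved) - func(x)) / h


def numerical_gradient(func, x, h=1e-6):
    """Stack the approximate partial derivatives into a gradient vector."""
    return [partial_derivative(func, x, i, h) for i in range(len(x))]


x = [1.0, 2.0, 0.5]
approx_gradient = numerical_gradient(f, x)
# Exact gradient of the example: [2 * x1 * x2, x1 ** 2, cos(x3)].
exact_gradient = [2.0 * x[0] * x[1], x[0] ** 2, math.cos(x[2])]
print(approx_gradient, exact_gradient)
```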
Directional Derivatives
If $f$ is a function of several variables $f: \mathbb{R}^n \to \mathbb{R}$, the partial derivatives of $f$ at $\mathbf{x} = [x_{1}, x_{2}, \ldots, x_{n}]^{\top}$ measure the variation in $f$ along the coordinate directions $\mathbf{e}_{1}, \mathbf{e}_{2}, \ldots, \mathbf{e}_{n}$, where $\mathbf{e}_{i}$ is the $i$-th unit vector in $\mathbb{R}^n$. To measure the variation in $f$ in an arbitrary direction $\mathbf{v} = [v_{1}, v_{2}, \ldots, v_{n}]$, the directional derivative of $f$ at $\mathbf{x}$ in the direction of $\mathbf{v}$ is defined as
$$
\begin{align}
f’_{\mathbf{v}}(\mathbf{x})
&= \nabla_{\mathbf{v}} f(\mathbf{x}) \\
&= \lim_{h \to 0} \frac{f(\mathbf{x} + h \mathbf{v}) - f(\mathbf{x})}{h} \\
\end{align}
$$
Note that $\mathbf{v}$ may or may not be a unit vector.
Intuitively, the definition takes this form because $h f’_{\mathbf{v}}(\mathbf{x})$ should be the best linear approximation of the change in $f$ after moving by $h \mathbf{v}$. Let $h$ be the step size along $\mathbf{v}$. The approximation error must then vanish faster than $h$, i.e.,
$$
\begin{align}
0 &= \lim_{h \to 0} \frac{f(\mathbf{x} + h \mathbf{v}) - f(\mathbf{x}) - hf’_{\mathbf{v}}(\mathbf{x})}{h} \\
&= \lim_{h \to 0} \left(\frac{f(\mathbf{x} + h \mathbf{v}) - f(\mathbf{x})}{h} - f’_{\mathbf{v}}(\mathbf{x}) \right) \\
&= \lim_{h \to 0} \frac{f(\mathbf{x} + h \mathbf{v}) - f(\mathbf{x})}{h} - \lim_{h \to 0} f’_{\mathbf{v}}(\mathbf{x}) \\
&= \lim_{h \to 0} \frac{f(\mathbf{x} + h \mathbf{v}) - f(\mathbf{x})}{h} - f’_{\mathbf{v}}(\mathbf{x}) \\
\end{align}
$$
Therefore, we have
$$
\begin{align}
f’_{\mathbf{v}}(\mathbf{x}) &= \lim_{h \to 0} \frac{f(\mathbf{x} + h \mathbf{v}) - f(\mathbf{x})}{h} \\
\end{align}
$$
Intuitively, when $f$ is differentiable at $\mathbf{x}$, the directional derivative of $f$ at $\mathbf{x}$ in the direction of $\mathbf{v}$ should be the weighted sum of the partial derivatives of $f$ at $\mathbf{x}$ along the coordinate directions $\mathbf{e}_{1}, \mathbf{e}_{2}, \ldots, \mathbf{e}_{n}$, where the weights are the components of $\mathbf{v}$.
$$
\begin{align}
f’_{\mathbf{v}}(\mathbf{x})
&= \nabla_{\mathbf{v}} f(\mathbf{x}) \\
&= \sum_{i=1}^{n} v_{i} \frac{\partial f}{\partial x_{i}}(\mathbf{x}) \\
&= \mathbf{v} \cdot \left[ \frac{\partial f}{\partial x_{1}}(\mathbf{x}), \frac{\partial f}{\partial x_{2}}(\mathbf{x}), \ldots, \frac{\partial f}{\partial x_{n}}(\mathbf{x}) \right]^{\top} \\
&= \mathbf{v} \cdot f’(\mathbf{x}) \\
&= \mathbf{v} \cdot \nabla f(\mathbf{x}) \\
\end{align}
$$
We could mathematically prove this.
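Before the proof, a small worked example may help make the claim concrete. The function, point, and direction below are chosen purely for illustration. Take $f(x_{1}, x_{2}) = x_{1}^{2} + x_{1} x_{2}$, $\mathbf{x} = [1, 2]^{\top}$, and $\mathbf{v} = [3, 4]$. Directly from the limit definition,
$$
\begin{align}
\nabla_{\mathbf{v}} f(\mathbf{x})
&= \lim_{h \to 0} \frac{(1 + 3h)^{2} + (1 + 3h)(2 + 4h) - 3}{h} \\
&= \lim_{h \to 0} \frac{16 h + 21 h^{2}}{h} \\
&= 16 \\
\end{align}
$$
while the weighted sum of the partial derivatives gives the same value,
$$
\begin{align}
\sum_{i=1}^{2} v_{i} \frac{\partial f}{\partial x_{i}}(\mathbf{x})
&= 3 (2 x_{1} + x_{2}) + 4 x_{1} \\
&= 3 \cdot 4 + 4 \cdot 1 \\
&= 16 \\
\end{align}
$$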
Proof
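Given an arbitrary direction $\mathbf{v} = [v_{1}, v_{2}, \ldots, v_{n}]$, using the definition of partial derivatives, for each component with $v_{i} \neq 0$ (the case $v_{i} = 0$ is trivial, since both sides are zero), we have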
$$
\begin{align}
\lim_{h \to 0} \frac{f(\mathbf{x} + h v_{i} \mathbf{e}_{i}) - f(\mathbf{x})}{h}
&= \lim_{h \to 0} \frac{f(\mathbf{x} + h v_{i} \mathbf{e}_{i}) - f(\mathbf{x})}{h v_{i}} v_{i} \\
&= \left( \lim_{h \to 0} \frac{f(\mathbf{x} + h v_{i} \mathbf{e}_{i}) - f(\mathbf{x})}{h v_{i}} \right) v_{i} \\
&= \left( \lim_{h v_{i} \to 0} \frac{f(\mathbf{x} + h v_{i} \mathbf{e}_{i}) - f(\mathbf{x})}{h v_{i}} \right) v_{i} \\
&= v_{i} \frac{\partial f}{\partial x_{i}}(\mathbf{x}) \\
\end{align}
$$
Thus,
$$
\begin{align}
f(\mathbf{x} + h v_{i} \mathbf{e}_{i}) = f(\mathbf{x}) + h v_{i} \frac{\partial f}{\partial x_{i}}(\mathbf{x}) + o(h)
\end{align}
$$
where $\lim_{h \to 0} o(h) = 0$ and $\lim_{h \to 0} \frac{o(h)}{h} = 0$.
$f(\mathbf{x} + h \mathbf{v}) - f(\mathbf{x})$ could be expanded as
$$
\begin{align}
f(\mathbf{x} + h \mathbf{v}) - f(\mathbf{x})
&= f(\mathbf{x} + h v_{1} \mathbf{e}_{1} + h v_{2} \mathbf{e}_{2} + \cdots + h v_{n} \mathbf{e}_{n}) - f(\mathbf{x})\\
&= f\left(\mathbf{x} + \sum_{i=1}^{n} h v_{i} \mathbf{e}_{i}\right) - f(\mathbf{x})\\
&= \left( f\left(\mathbf{x} + \sum_{i=1}^{n} h v_{i} \mathbf{e}_{i}\right) - f\left(\mathbf{x} + \sum_{i=2}^{n} h v_{i} \mathbf{e}_{i}\right) \right) + f\left(\mathbf{x} + \sum_{i=2}^{n} h v_{i} \mathbf{e}_{i}\right) - f(\mathbf{x})\\
&= \left( f\left(\mathbf{x} + \sum_{i=1}^{n} h v_{i} \mathbf{e}_{i}\right) - f\left(\mathbf{x} + \sum_{i=2}^{n} h v_{i} \mathbf{e}_{i}\right) \right) + \left( f\left(\mathbf{x} + \sum_{i=2}^{n} h v_{i} \mathbf{e}_{i}\right) - f\left(\mathbf{x} + \sum_{i=3}^{n} h v_{i} \mathbf{e}_{i}\right) \right) + f\left(\mathbf{x} + \sum_{i=3}^{n} h v_{i} \mathbf{e}_{i}\right) - f(\mathbf{x})\\
&= \left( f\left(\mathbf{x} + \sum_{i=1}^{n} h v_{i} \mathbf{e}_{i}\right) - f\left(\mathbf{x} + \sum_{i=2}^{n} h v_{i} \mathbf{e}_{i}\right) \right) + \left( f\left(\mathbf{x} + \sum_{i=2}^{n} h v_{i} \mathbf{e}_{i}\right) - f\left(\mathbf{x} + \sum_{i=3}^{n} h v_{i} \mathbf{e}_{i}\right) \right) + \cdots + f(\mathbf{x}) - f(\mathbf{x})\\
&= \sum_{j=1}^{n} \left( f\left(\mathbf{x} + \sum_{i=j}^{n} h v_{i} \mathbf{e}_{i}\right) - f\left(\mathbf{x} + \sum_{i=j+1}^{n} h v_{i} \mathbf{e}_{i}\right) \right)\\
\end{align}
$$
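The telescoping decomposition above can be checked numerically. The sketch below is a minimal Python illustration; the function `f`, the point, the direction, and the helper names `shifted` and `telescoping_terms` are assumptions chosen for the example, and indices are 0-based in the code.

```python
import math


def f(x):
    """Assumed example f: R^3 -> R."""
    return x[0] ** 2 * x[1] + math.sin(x[2])


def shifted(x, v, h, start):
    """Return x plus h * v[i] * e_i for every index i >= start."""
    return [xi + (h * vi if i >= start else 0.0)
            for i, (xi, vi) in enumerate(zip(x, v))]


def telescoping_terms(func, x, v, h):
    """Return the n differences whose sum telescopes to f(x + h v) - f(x)."""
    n = len(x)
    return [func(shifted(x, v, h, j)) - func(shifted(x, v, h, j + 1))
            for j in range(n)]


x = [1.0, 2.0, 0.5]
v = [3.0, -1.0, 2.0]
h = 0.1

terms = telescoping_terms(f, x, v, h)
total = f(shifted(x, v, h, 0)) - f(x)  # f(x + h v) - f(x)
print(sum(terms), total)
```

Each term changes only one coordinate, which is what allows the partial derivative expansion to be applied term by term.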
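Assuming the partial derivatives of $f$ are continuous in a neighborhood of $\mathbf{x}$, each term of this sum satisfies the following limit.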
$$
\begin{align}
\lim_{h \to 0} \frac{ f\left(\mathbf{x} + \sum_{i=j}^{n} h v_{i} \mathbf{e}_{i}\right) - f\left(\mathbf{x} + \sum_{i=j+1}^{n} h v_{i} \mathbf{e}_{i}\right) }{h}
&= \lim_{h \to 0} \frac{ f\left(\mathbf{x} + h v_{j} \mathbf{e}_{j} + \sum_{k=j+1}^{n} h v_{k} \mathbf{e}_{k}\right) - f\left(\mathbf{x} + \sum_{k=j+1}^{n} h v_{k} \mathbf{e}_{k}\right) }{h} \\
&= \lim_{h \to 0} \frac{ f\left(\mathbf{x} + \sum_{k=j+1}^{n} h v_{k} \mathbf{e}_{k}\right) + h v_{j} \frac{\partial f}{\partial x_{j}}\left(\mathbf{x} + \sum_{k=j+1}^{n} h v_{k} \mathbf{e}_{k}\right) + o(h) - f\left(\mathbf{x} + \sum_{k=j+1}^{n} h v_{k} \mathbf{e}_{k}\right) }{h} \\
&= \lim_{h \to 0} \frac{ h v_{j} \frac{\partial f}{\partial x_{j}}\left(\mathbf{x} + \sum_{k=j+1}^{n} h v_{k} \mathbf{e}_{k}\right) + o(h) }{h} \\
&= \lim_{h \to 0} \left( v_{j} \frac{\partial f}{\partial x_{j}}\left(\mathbf{x} + \sum_{k=j+1}^{n} h v_{k} \mathbf{e}_{k}\right) + \frac{o(h)}{h} \right) \\
&= \lim_{h \to 0} \left( v_{j} \frac{\partial f}{\partial x_{j}}\left(\mathbf{x} + \sum_{k=j+1}^{n} h v_{k} \mathbf{e}_{k}\right) \right) + \lim_{h \to 0} \frac{o(h)}{h} \\
&= v_{j} \frac{\partial f}{\partial x_{j}}(\mathbf{x}) + 0 \\
&= v_{j} \frac{\partial f}{\partial x_{j}}(\mathbf{x}) \\
\end{align}
$$
To derive the directional derivative, we have
$$
\begin{align}
f’_{\mathbf{v}}(\mathbf{x}) &= \lim_{h \to 0} \frac{f(\mathbf{x} + h \mathbf{v}) - f(\mathbf{x})}{h} \\
&= \lim_{h \to 0} \frac{\sum_{j=1}^{n} \left( f\left(\mathbf{x} + \sum_{i=j}^{n} h v_{i} \mathbf{e}_{i}\right) - f\left(\mathbf{x} + \sum_{i=j+1}^{n} h v_{i} \mathbf{e}_{i}\right) \right)}{h} \\
&= \lim_{h \to 0} \sum_{j=1}^{n} \frac{ f\left(\mathbf{x} + \sum_{i=j}^{n} h v_{i} \mathbf{e}_{i}\right) - f\left(\mathbf{x} + \sum_{i=j+1}^{n} h v_{i} \mathbf{e}_{i}\right) }{h} \\
&= \sum_{j=1}^{n} \left( \lim_{h \to 0} \frac{ f\left(\mathbf{x} + \sum_{i=j}^{n} h v_{i} \mathbf{e}_{i}\right) - f\left(\mathbf{x} + \sum_{i=j+1}^{n} h v_{i} \mathbf{e}_{i}\right) }{h} \right) \\
&= \sum_{j=1}^{n} v_{j} \frac{\partial f}{\partial x_{j}}(\mathbf{x}) \\
&= \mathbf{v} \cdot f’(\mathbf{x}) \\
\end{align}
$$
This concludes the proof. $\square$
Similarly, we could write $f(\mathbf{x} + h \mathbf{v})$ as
$$
\begin{align}
f(\mathbf{x} + h \mathbf{v}) &= f(\mathbf{x}) + h f’_{\mathbf{v}}(\mathbf{x}) + o(h) \\
\end{align}
$$
where $\lim_{h \to 0} o(h) = 0$ and $\lim_{h \to 0} \frac{o(h)}{h} = 0$.
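The identity between the limit definition and the weighted sum of partial derivatives can also be checked numerically. This is only a sketch under stated assumptions: the function `f`, its hand-derived `gradient`, the point, the non-unit direction, and the helper name `directional_quotient` are all chosen for illustration.

```python
import math


def f(x):
    """Assumed example f: R^3 -> R."""
    return x[0] ** 2 * x[1] + math.sin(x[2])


def gradient(x):
    """Exact gradient of the example function, derived by hand."""
    return [2.0 * x[0] * x[1], x[0] ** 2, math.cos(x[2])]


def directional_quotient(func, x, v, h):
    """Return (f(x + h v) - f(x)) / h, whose limit is the directional derivative."""
    moved = [xi + h * vi for xi, vi in zip(x, v)]
    return (func(moved) - func(x)) / h


x = [1.0, 2.0, 0.5]
v = [3.0, -1.0, 2.0]  # deliberately not a unit vector

weighted_sum = sum(vi * gi for vi, gi in zip(v, gradient(x)))
limit_estimate = directional_quotient(f, x, v, 1e-6)
print(weighted_sum, limit_estimate)
```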
Total Derivatives
If $\mathbf{f} = [f_{1}, f_{2}, \ldots, f_{m}]^{\top}$ is a vector-valued function $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^m$, where $f_{i}: \mathbb{R}^n \to \mathbb{R}$ for $i \in \{1, 2, \ldots, m\}$ and $\mathbf{f}(\mathbf{x}) = [f_{1}(\mathbf{x}), f_{2}(\mathbf{x}), \ldots, f_{m}(\mathbf{x})]^{\top}$, the total derivative of $\mathbf{f}$ at $\mathbf{x}$, sometimes referred to as the Jacobian matrix, is defined as the $m \times n$ matrix whose rows are the transposed gradients of its components
$$
\begin{align}
\mathbf{f}’(\mathbf{x}) &= \mathbf{J}_{\mathbf{f}}(\mathbf{x}) \\
&= \nabla \mathbf{f}(\mathbf{x}) \\
&= \left[ \nabla f_{1}(\mathbf{x}), \nabla f_{2}(\mathbf{x}), \ldots, \nabla f_{m}(\mathbf{x}) \right]^{\top} \\
&= \begin{bmatrix} \nabla f_{1}(\mathbf{x})^{\top} \\ \nabla f_{2}(\mathbf{x})^{\top} \\ \vdots \\ \nabla f_{m}(\mathbf{x})^{\top} \end{bmatrix} \\
&= \begin{bmatrix}
\frac{\partial f_{1}}{\partial x_{1}}(\mathbf{x}) & \frac{\partial f_{1}}{\partial x_{2}}(\mathbf{x}) & \cdots & \frac{\partial f_{1}}{\partial x_{n}}(\mathbf{x}) \\
\frac{\partial f_{2}}{\partial x_{1}}(\mathbf{x}) & \frac{\partial f_{2}}{\partial x_{2}}(\mathbf{x}) & \cdots & \frac{\partial f_{2}}{\partial x_{n}}(\mathbf{x}) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_{m}}{\partial x_{1}}(\mathbf{x}) & \frac{\partial f_{m}}{\partial x_{2}}(\mathbf{x}) & \cdots & \frac{\partial f_{m}}{\partial x_{n}}(\mathbf{x}) \\
\end{bmatrix} \\
\end{align}
$$
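A finite-difference sketch of the Jacobian matrix follows the same column-by-column structure: perturbing $x_{j}$ fills column $j$. The map $\mathbf{f}: \mathbb{R}^{2} \to \mathbb{R}^{3}$ below and the helper name `numerical_jacobian` are assumptions for illustration.

```python
import math


def f(x):
    """Assumed example f: R^2 -> R^3."""
    return [x[0] * x[1], x[0] + x[1] ** 2, math.sin(x[0])]


def numerical_jacobian(func, x, h=1e-6):
    """Approximate the m x n Jacobian; entry [i][j] estimates d f_i / d x_j."""
    base = func(x)
    m, n = len(base), len(x)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        moved = list(x)
        moved[j] += h  # step along e_j
        moved_value = func(moved)
        for i in range(m):
            jac[i][j] = (moved_value[i] - base[i]) / h
    return jac


x = [1.0, 2.0]
approx_jacobian = numerical_jacobian(f, x)
# Exact Jacobian of the example, derived by hand.
exact_jacobian = [[x[1], x[0]], [1.0, 2.0 * x[1]], [math.cos(x[0]), 0.0]]
for row in approx_jacobian:
    print(row)
```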
Similarly, given an arbitrary direction $\mathbf{v} = [v_{1}, v_{2}, \ldots, v_{n}]$, we could write $\mathbf{f}(\mathbf{x} + h \mathbf{v})$ as
$$
\begin{align}
\mathbf{f}(\mathbf{x} + h \mathbf{v}) &= \mathbf{f}(\mathbf{x}) + h \mathbf{f}’_{\mathbf{v}}(\mathbf{x}) + \mathbf{o}(h) \\
\end{align}
$$
where the remainder $\mathbf{o}(h)$ is now vector-valued, $\lim_{h \to 0} \mathbf{o}(h) = \mathbf{0}$, $\lim_{h \to 0} \frac{\mathbf{o}(h)}{h} = \mathbf{0}$, and the directional derivative is
$$
\begin{align}
\mathbf{f}’_{\mathbf{v}}(\mathbf{x}) &= \left[ f’_{1, \mathbf{v}}(\mathbf{x}), f’_{2, \mathbf{v}}(\mathbf{x}), \ldots, f’_{m, \mathbf{v}}(\mathbf{x}) \right]^{\top} \\
&= \left[ \mathbf{v} \cdot f’_{1}(\mathbf{x}), \mathbf{v} \cdot f’_{2}(\mathbf{x}), \ldots, \mathbf{v} \cdot f’_{m}(\mathbf{x}) \right]^{\top} \\
&= \mathbf{f}’(\mathbf{x}) \mathbf{v}^{\top}
\end{align}
$$
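The relationship between the Jacobian matrix and the directional derivative of a vector-valued function can be sketched in Python as a matrix-vector product. The map `f`, its hand-derived `jacobian`, and the helper name `matvec` are assumptions for illustration; the direction is stored as a plain list, so the row versus column distinction does not appear in the code.

```python
import math


def f(x):
    """Assumed example f: R^2 -> R^3."""
    return [x[0] * x[1], x[0] + x[1] ** 2, math.sin(x[0])]


def jacobian(x):
    """Exact Jacobian of the example, derived by hand."""
    return [[x[1], x[0]], [1.0, 2.0 * x[1]], [math.cos(x[0]), 0.0]]


def matvec(matrix, vector):
    """Multiply an m x n matrix by a length-n vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


x = [1.0, 2.0]
v = [3.0, -1.0]
h = 1e-6

# Difference quotient (f(x + h v) - f(x)) / h, computed per component.
moved = [xi + h * vi for xi, vi in zip(x, v)]
quotient = [(a - b) / h for a, b in zip(f(moved), f(x))]

# Jacobian applied to the direction.
jvp = matvec(jacobian(x), v)
max_gap = max(abs(q - j) for q, j in zip(quotient, jvp))
print(quotient, jvp, max_gap)
```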