Integration By Substitution
Introduction
Integration by substitution is an extremely useful method for evaluating antiderivatives and integrals. However, I realized that its proof is not widely known.
In this article, I would like to provide proofs of the substitution rule for univariate and multivariate definite integrals.
Prerequisites
Fundamental Theorem of Calculus
The fundamental theorem of calculus consists of two parts. The first part deals with the derivative of an antiderivative, while the second part deals with the relationship between antiderivatives and definite integrals.
Proofs of both parts of the fundamental theorem of calculus can be found on Wikipedia and are straightforward to follow.
Part One
Let $f$ be a continuous real-valued function defined on a closed interval $[a, b]$. Let $F$ be the function defined, for all $x$ in $[a, b]$, by
$$
\begin{align}
F(x) = \int_{a}^{x} f(t)  dt
\end{align}
$$
Then, $F$ is uniformly continuous on $[a, b]$, differentiable on the open interval $(a, b)$, and
$$
\begin{align}
F'(x) = f(x)
\end{align}
$$
for all $x$ in $(a, b)$ where $F'(x)$ is defined as
$$
\begin{align}
F'(x) = \lim_{\Delta x \to 0} \frac{F(x + \Delta x) - F(x)}{\Delta x}
\end{align}
$$
So $F$ is called an antiderivative of $f$ on $[a, b]$.
Corollary
The corollary of the first part of the fundamental theorem of calculus states that if $f$ is a real-valued continuous function defined on a closed interval $[a, b]$, and $F$ is an antiderivative of $f$ on $[a, b]$, then
$$
\begin{align}
\int_{a}^{b} f(x)  dx = F(b) - F(a)
\end{align}
$$
This corollary is often used to compute definite integrals.
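As a quick numerical sanity check of the corollary, the following minimal sketch compares a midpoint Riemann sum with $F(b) - F(a)$. The choice $f(x) = \cos x$, $F(x) = \sin x$ and the helper name `midpoint_integral` are my own illustrative assumptions, not part of the theorem.

```python
import math


def midpoint_integral(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with an n-panel midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Illustrative choice: f(x) = cos(x), whose antiderivative is F(x) = sin(x).
a, b = 0.0, math.pi / 2
numeric = midpoint_integral(math.cos, a, b)
exact = math.sin(b) - math.sin(a)  # F(b) - F(a)

print(numeric, exact)
```

The two printed values should agree up to the discretization error of the midpoint rule.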
Also note that the corollary assumes that $f$ is continuous on $[a, b]$. This assumption is relaxed in part two of the fundamental theorem of calculus.
Part Two
Let $f$ be a real-valued function defined on a closed interval $[a, b]$ and $F$ be a continuous function on $[a, b]$ which is an antiderivative of $f$ in $(a, b)$:
$$
\begin{align}
F'(x) = f(x)
\end{align}
$$
Then, if $f$ is Riemann integrable on $[a, b]$, the following equation holds:
$$
\begin{align}
\int_{a}^{b} f(x)  dx = F(b) - F(a)
\end{align}
$$
Part two of the fundamental theorem of calculus is stronger than the corollary of the first part, because it does not require $f$ to be continuous on $[a, b]$.
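To see that the extra generality matters, here is a standard example (my own choice of illustration) in which $f$ is Riemann integrable but not continuous. On $[0, 1]$, let

$$
\begin{align}
F(x) =
\begin{cases}
x^2 \sin \left( \frac{1}{x} \right) & x \neq 0 \\
0 & x = 0
\end{cases}
\qquad
f(x) =
\begin{cases}
2x \sin \left( \frac{1}{x} \right) - \cos \left( \frac{1}{x} \right) & x \neq 0 \\
0 & x = 0
\end{cases}
\end{align}
$$

Here $F$ is continuous on $[0, 1]$ and $F'(x) = f(x)$ on $(0, 1)$. The function $f$ is bounded and discontinuous only at $x = 0$, so it is Riemann integrable but not continuous on $[0, 1]$. The corollary of part one does not apply, but part two gives

$$
\begin{align}
\int_{0}^{1} f(x)  dx = F(1) - F(0) = \sin(1)
\end{align}
$$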
Determinant and Hyper Volume
Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ be $n$ vectors in $\mathbb{R}^n$. Let $P$ be the parallelepiped spanned by these vectors, and let $A$ be the matrix with rows $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$. Then the absolute value of the determinant of $A$ is the $n$-dimensional volume (hyper volume) of $P$.
$$
\begin{align}
\left| \det(A) \right| = \text{vol}(P)
\end{align}
$$
The concrete proof of the equivalence between the absolute value of the determinant and the hyper volume can be found in my previous article “Determinant and Hyper Volume”.
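In two dimensions the statement can be checked by hand or with a few lines of code. The sketch below compares $\left| \det(A) \right|$ with the area of the parallelogram computed independently by the shoelace formula. The vectors and the helper names `det2` and `shoelace_area` are illustrative assumptions of mine.

```python
def det2(v1, v2):
    """Determinant of the 2x2 matrix whose rows are v1 and v2."""
    return v1[0] * v2[1] - v1[1] * v2[0]


def shoelace_area(vertices):
    """Area of a simple polygon from its vertices listed in order."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


# Illustrative vectors spanning a parallelogram in the plane.
v1, v2 = (3.0, 1.0), (1.0, 2.0)
parallelogram = [(0.0, 0.0), v1, (v1[0] + v2[0], v1[1] + v2[1]), v2]

area_from_det = abs(det2(v1, v2))
area_from_polygon = shoelace_area(parallelogram)

print(area_from_det, area_from_polygon)  # both are 5.0
```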
Integration By Substitution
Substitution for Univariate Definite Integral
Let $g: [a, b] \to I$ be a differentiable function with a continuous derivative, where $I \subseteq \mathbb{R}$ is an interval. Suppose $f: I \to \mathbb{R}$ is a continuous function. Then the following identity holds:
$$
\begin{align}
\int_{g(a)}^{g(b)} f(u)  du = \int_{a}^{b} f(g(x)) \cdot g'(x)  dx
\end{align}
$$
where $u = g(x)$ and $du = g'(x) dx$ or $\frac{du}{dx} = g'(x)$ in Leibniz notation.
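Before proving the identity, it can be checked numerically on a concrete case. The following minimal sketch assumes $g(x) = x^2$, $f(u) = \cos u$ and $[a, b] = [0, 2]$, which are my own illustrative choices, and evaluates both sides with a midpoint rule.

```python
import math


def midpoint_integral(func, lo, hi, n=200_000):
    """Approximate the integral of func over [lo, hi] with an n-panel midpoint rule."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))


def g(x):
    # Illustrative substitution u = g(x) = x^2.
    return x * x


def g_prime(x):
    # Derivative of g, continuous on [a, b].
    return 2.0 * x


f = math.cos  # illustrative continuous integrand
a, b = 0.0, 2.0

lhs = midpoint_integral(f, g(a), g(b))                          # integral of f(u) du
rhs = midpoint_integral(lambda x: f(g(x)) * g_prime(x), a, b)   # integral of f(g(x)) g'(x) dx
exact = math.sin(g(b)) - math.sin(g(a))                         # F(g(b)) - F(g(a)) with F = sin

print(lhs, rhs, exact)
```

All three printed values should agree up to the discretization error of the midpoint rule.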
Proof
Let $f$ and $g$ be two functions satisfying the above hypotheses, namely that $f$ is continuous on $I$ and $g'$ is continuous on the closed interval $[a, b]$. Then the function $f(g(x)) \cdot g'(x)$ is continuous, and therefore integrable, on $[a, b]$.
Hence the two integrals
$$
\begin{align}
\int_{a}^{b} f(g(x)) \cdot g'(x)  dx
\end{align}
$$
and
$$
\begin{align}
\int_{g(a)}^{g(b)} f(u)  du
\end{align}
$$
in fact exist.
Since $f$ is continuous, it has an antiderivative $F$ on $I$ by part one of the fundamental theorem of calculus. Because $g$ maps $[a, b]$ into $I$, the composite function $F \circ g$ is well defined. Since $g$ is differentiable, according to the chain rule and the definition of an antiderivative,
$$
\begin{align}
(F \circ g)'(x) = F'(g(x)) \cdot g'(x) = f(g(x)) \cdot g'(x)
\end{align}
$$
Applying the fundamental theorem of calculus twice, first to $F \circ g$ on $[a, b]$ and then to $F$ between $g(a)$ and $g(b)$,
$$
\begin{align}
\int_{a}^{b} f(g(x)) \cdot g'(x)  dx
&= \int_{a}^{b} (F \circ g)'(x)  dx \\
&= F(g(b)) - F(g(a)) \\
&= \int_{g(a)}^{g(b)} f(u)  du
\end{align}
$$
This concludes the proof. $\square$
In practice, applying integration by substitution from left to right or from right to left is straightforward, as long as the integral limits are adjusted accordingly.
From left to right, we have
$$
\begin{align}
\int_{g(a)}^{g(b)} f(u) du
&= \int_{g(a)}^{g(b)} f(g(x)) d\left(g(x)\right) \\
&= \int_{a}^{b} f(g(x)) \cdot g'(x)  dx
\end{align}
$$
From right to left, we have
$$
\begin{align}
\int_{a}^{b} f(g(x)) \cdot g'(x)  dx
&= \int_{g(a)}^{g(b)} f(g(x)) d\left(g(x)\right) \\
&= \int_{g(a)}^{g(b)} f(u)  du
\end{align}
$$
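Two small worked examples, chosen by me for illustration, show both directions. From left to right, with $u = g(x) = \sin x$ on $[0, \frac{\pi}{2}]$, so that $g(0) = 0$ and $g(\frac{\pi}{2}) = 1$,

$$
\begin{align}
\int_{0}^{1} \sqrt{1 - u^2}  du
&= \int_{0}^{\frac{\pi}{2}} \sqrt{1 - \sin^2 x} \cdot \cos x  dx \\
&= \int_{0}^{\frac{\pi}{2}} \cos^2 x  dx \\
&= \frac{\pi}{4}
\end{align}
$$

From right to left, with $u = g(x) = x^2$ on $[0, 1]$, so that $g(0) = 0$ and $g(1) = 1$,

$$
\begin{align}
\int_{0}^{1} e^{x^2} \cdot 2x  dx
&= \int_{0}^{1} e^{u}  du \\
&= e - 1
\end{align}
$$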
Substitution for Multivariate Definite Integral
Let $U$ be an open set in $\mathbb{R}^n$ and $\varphi: U \to \mathbb{R}^n$ be an injective differentiable function with continuous partial derivatives, whose Jacobian determinant is non-zero for every $\mathbf{u} \in U$. Then for any real-valued, compactly supported, continuous function $f$ defined on $\varphi(U)$, the following substitution holds:
$$
\begin{align}
\int_{\varphi(U)} f(\mathbf{v})  d\mathbf{v} = \int_{U} f(\varphi(\mathbf{u})) \cdot \left| \det \left( \mathbf{J}\varphi \right) (\mathbf{u})  \right|  d\mathbf{u}
\end{align}
$$
where $\mathbf{v} = \varphi(\mathbf{u})$, $d\mathbf{u}$ and $d\mathbf{v}$ are the volume elements in $\mathbb{R}^n$, $\mathbf{J}\varphi$ is the Jacobian matrix of partial derivatives of $\varphi$, and $\left| \det \left( \mathbf{J}\varphi \right) (\mathbf{u}) \right|$ is the absolute value of its determinant at the point $\mathbf{u}$.
Concretely, given
$$
\begin{align}
\mathbf{u} = \begin{bmatrix}
u_1 \\
u_2 \\
\vdots \\
u_n
\end{bmatrix}
\end{align}
$$
$$
\begin{align}
\mathbf{v} = \begin{bmatrix}
v_1 \\
v_2 \\
\vdots \\
v_n
\end{bmatrix}
\end{align}
$$
and
$$
\begin{align}
\mathbf{v}
=
\begin{bmatrix}
v_1 \\
v_2 \\
\vdots \\
v_n
\end{bmatrix}
=
\varphi(\mathbf{u})
=
\varphi
\left(
\begin{bmatrix}
u_1 \\
u_2 \\
\vdots \\
u_n
\end{bmatrix}
\right)
\end{align}
$$
The Jacobian matrix of partial derivatives of $\varphi$, $\mathbf{J}\varphi$, is defined as
$$
\begin{align}
\mathbf{J}\varphi
&= \frac{\partial \mathbf{v}}{\partial \mathbf{u}} \\
&= \begin{bmatrix}
\frac{\partial v_1}{\partial u_1} & \frac{\partial v_1}{\partial u_2} & \cdots & \frac{\partial v_1}{\partial u_n} \\
\frac{\partial v_2}{\partial u_1} & \frac{\partial v_2}{\partial u_2} & \cdots & \frac{\partial v_2}{\partial u_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial v_n}{\partial u_1} & \frac{\partial v_n}{\partial u_2} & \cdots & \frac{\partial v_n}{\partial u_n}
\end{bmatrix}
\end{align}
$$
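As a familiar instance, chosen here for illustration, take polar coordinates in $\mathbb{R}^2$ with $\mathbf{u} = (r, \theta)$, $U = (0, \infty) \times (0, 2\pi)$ and $\varphi(r, \theta) = (r \cos \theta, r \sin \theta)$. The map $\varphi$ is injective on $U$ and its Jacobian matrix is

$$
\begin{align}
\mathbf{J}\varphi
&= \begin{bmatrix}
\frac{\partial v_1}{\partial r} & \frac{\partial v_1}{\partial \theta} \\
\frac{\partial v_2}{\partial r} & \frac{\partial v_2}{\partial \theta}
\end{bmatrix}
= \begin{bmatrix}
\cos \theta & -r \sin \theta \\
\sin \theta & r \cos \theta
\end{bmatrix}
\end{align}
$$

so that

$$
\begin{align}
\left| \det \left( \mathbf{J}\varphi \right) (r, \theta) \right| = \left| r \cos^2 \theta + r \sin^2 \theta \right| = r
\end{align}
$$

which is non-zero everywhere on $U$.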
Proof
Surprisingly, I was not able to find a proof of the substitution for multivariate definite integrals on the Internet. Therefore, I will make an attempt to prove it here, at least informally.
Before substituting $\mathbf{v}$ with $\mathbf{u}$, the infinitesimal hyper volume is spanned by the vectors
$$
\begin{align}
\begin{bmatrix}
d v_1 \\
0 \\
\vdots \\
0
\end{bmatrix}
\begin{bmatrix}
0 \\
d v_2 \\
\vdots \\
0
\end{bmatrix}
\cdots
\begin{bmatrix}
0 \\
0 \\
\vdots \\
d v_n
\end{bmatrix}
\end{align}
$$
These vectors constitute the following matrix:
$$
\begin{align}
\begin{bmatrix}
d v_1 & 0 & \cdots & 0 \\
0 & d v_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & d v_n
\end{bmatrix}
\end{align}
$$
According to the relationship between the determinant and the hyper volume, the infinitesimal hyper volume $d\mathbf{v}$ can be expressed as the absolute value of the determinant of the matrix composed of the spanned vectors:
$$
\begin{align}
d\mathbf{v}
&=
\left| \det \left(
\begin{bmatrix}
d v_1 & 0 & \cdots & 0 \\
0 & d v_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & d v_n
\end{bmatrix}
\right) \right| \\
&=
\left| d v_1 \cdot d v_2 \cdot \cdots \cdot d v_n \right| \\
&=
d v_1 \cdot d v_2 \cdot \cdots \cdot d v_n
\end{align}
$$
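After substituting $\mathbf{v}$ with $\mathbf{u}$, an infinitesimal box with side lengths $du_1, du_2, \ldots, du_n$ at the point $\mathbf{u}$ is mapped by $\varphi$, to first order, to the parallelepiped spanned by the vectors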
$$
\begin{align}
\begin{bmatrix}
\frac{\partial v_1}{\partial u_1} d u_1 \\
\frac{\partial v_2}{\partial u_1} d u_1 \\
\vdots \\
\frac{\partial v_n}{\partial u_1} d u_1
\end{bmatrix}
\begin{bmatrix}
\frac{\partial v_1}{\partial u_2} d u_2 \\
\frac{\partial v_2}{\partial u_2} d u_2 \\
\vdots \\
\frac{\partial v_n}{\partial u_2} d u_2
\end{bmatrix}
\cdots
\begin{bmatrix}
\frac{\partial v_1}{\partial u_n} d u_n \\
\frac{\partial v_2}{\partial u_n} d u_n \\
\vdots \\
\frac{\partial v_n}{\partial u_n} d u_n
\end{bmatrix}
\end{align}
$$
These vectors constitute the following matrix:
$$
\begin{align}
\begin{bmatrix}
\frac{\partial v_1}{\partial u_1} d u_1 & \frac{\partial v_1}{\partial u_2} d u_2 & \cdots & \frac{\partial v_1}{\partial u_n} d u_n \\
\frac{\partial v_2}{\partial u_1} d u_1 & \frac{\partial v_2}{\partial u_2} d u_2 & \cdots & \frac{\partial v_2}{\partial u_n} d u_n \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial v_n}{\partial u_1} d u_1 & \frac{\partial v_n}{\partial u_2} d u_2 & \cdots & \frac{\partial v_n}{\partial u_n} d u_n
\end{bmatrix}
\end{align}
$$
According to the relationship between the determinant and the hyper volume, the infinitesimal hyper volume $d\mathbf{v}$ can be expressed as the absolute value of the determinant of the matrix composed of the spanned vectors:
$$
\begin{align}
d\mathbf{v}
&=
\left| \det \left(
\begin{bmatrix}
\frac{\partial v_1}{\partial u_1} d u_1 & \frac{\partial v_1}{\partial u_2} d u_2 & \cdots & \frac{\partial v_1}{\partial u_n} d u_n \\
\frac{\partial v_2}{\partial u_1} d u_1 & \frac{\partial v_2}{\partial u_2} d u_2 & \cdots & \frac{\partial v_2}{\partial u_n} d u_n \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial v_n}{\partial u_1} d u_1 & \frac{\partial v_n}{\partial u_2} d u_2 & \cdots & \frac{\partial v_n}{\partial u_n} d u_n
\end{bmatrix}
\right) \right| \\
&=
\left| \det \left(
\begin{bmatrix}
\frac{\partial v_1}{\partial u_1} & \frac{\partial v_1}{\partial u_2} & \cdots & \frac{\partial v_1}{\partial u_n} \\
\frac{\partial v_2}{\partial u_1} & \frac{\partial v_2}{\partial u_2} & \cdots & \frac{\partial v_2}{\partial u_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial v_n}{\partial u_1} & \frac{\partial v_n}{\partial u_2} & \cdots & \frac{\partial v_n}{\partial u_n}
\end{bmatrix}
\right) du_1 \cdot du_2 \cdot \cdots \cdot du_n \right| \\
&=
\left| \det \left(
\begin{bmatrix}
\frac{\partial v_1}{\partial u_1} & \frac{\partial v_1}{\partial u_2} & \cdots & \frac{\partial v_1}{\partial u_n} \\
\frac{\partial v_2}{\partial u_1} & \frac{\partial v_2}{\partial u_2} & \cdots & \frac{\partial v_2}{\partial u_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial v_n}{\partial u_1} & \frac{\partial v_n}{\partial u_2} & \cdots & \frac{\partial v_n}{\partial u_n}
\end{bmatrix}
\right) \right|
du_1 \cdot du_2 \cdot \cdots \cdot du_n \\
&=
\left| \det \left(
\begin{bmatrix}
\frac{\partial v_1}{\partial u_1} & \frac{\partial v_1}{\partial u_2} & \cdots & \frac{\partial v_1}{\partial u_n} \\
\frac{\partial v_2}{\partial u_1} & \frac{\partial v_2}{\partial u_2} & \cdots & \frac{\partial v_2}{\partial u_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial v_n}{\partial u_1} & \frac{\partial v_n}{\partial u_2} & \cdots & \frac{\partial v_n}{\partial u_n}
\end{bmatrix}
\right) \right| d\mathbf{u} \\
\end{align}
$$
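Note that we used the multilinearity of the determinant in the above derivation: scaling the $j$-th column of a matrix by $du_j$ scales its determinant by $du_j$, so the product $du_1 \cdot du_2 \cdot \cdots \cdot du_n$ can be moved outside of the determinant. Since each $du_j$ is taken to be positive, the product can also be moved outside of the absolute value.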
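For a $2 \times 2$ matrix, the step of moving the column factors outside of the determinant can be verified directly. The entries $a, b, c, d$ below are generic placeholders standing in for the partial derivatives:

$$
\begin{align}
\det \left(
\begin{bmatrix}
a \, du_1 & b \, du_2 \\
c \, du_1 & d \, du_2
\end{bmatrix}
\right)
&= a d \, du_1 du_2 - b c \, du_1 du_2 \\
&= \det \left(
\begin{bmatrix}
a & b \\
c & d
\end{bmatrix}
\right) du_1 \cdot du_2
\end{align}
$$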
The matrix inside the determinant is the Jacobian matrix of partial derivatives of $\varphi$, $\mathbf{J}\varphi$, which is defined as
$$
\begin{align}
\mathbf{J}\varphi = \begin{bmatrix}
\frac{\partial v_1}{\partial u_1} & \frac{\partial v_1}{\partial u_2} & \cdots & \frac{\partial v_1}{\partial u_n} \\
\frac{\partial v_2}{\partial u_1} & \frac{\partial v_2}{\partial u_2} & \cdots & \frac{\partial v_2}{\partial u_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial v_n}{\partial u_1} & \frac{\partial v_n}{\partial u_2} & \cdots & \frac{\partial v_n}{\partial u_n}
\end{bmatrix}
\end{align}
$$
Therefore, the infinitesimal hyper volume $d\mathbf{v}$ can be expressed as
$$
\begin{align}
d\mathbf{v} = \left| \det \left( \mathbf{J}\varphi \right) (\mathbf{u})  \right| d\mathbf{u}
\end{align}
$$
The multivariate definite integral can thus be expressed as the sum of the infinitesimal hyper volumes with the integral limits substituted accordingly:
$$
\begin{align}
\int_{\varphi(U)} f(\mathbf{v})  d\mathbf{v}
&= \int_{U} f(\varphi(\mathbf{u})) \left| \det \left( \mathbf{J}\varphi \right) (\mathbf{u})  \right| d\mathbf{u}
\end{align}
$$
This concludes the proof. $\square$
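Since the argument above is informal, a numerical check is reassuring. The following minimal sketch uses polar coordinates with $\left| \det \left( \mathbf{J}\varphi \right) \right| = r$ and an integrand that vanishes outside the unit disk, then evaluates both sides with a two-dimensional midpoint rule. The integrand, the grid sizes, and the helper names (`phi`, `abs_det_jacobian`, `midpoint_2d`) are illustrative assumptions of mine, and the exact value $\frac{\pi}{3}$ comes from $2\pi \int_{0}^{1} (1 - r^2)^2 r  dr$.

```python
import math


def f(x, y):
    # Illustrative integrand: continuous on R^2 and zero outside the unit disk.
    return max(0.0, 1.0 - x * x - y * y) ** 2


def phi(r, theta):
    # Polar-coordinate map v = phi(u) with u = (r, theta).
    return r * math.cos(theta), r * math.sin(theta)


def abs_det_jacobian(r, theta):
    # |det J_phi| = r for the polar-coordinate map.
    return abs(r)


def midpoint_2d(func, x0, x1, y0, y1, n):
    """Approximate the integral of func over [x0, x1] x [y0, y1] on an n x n midpoint grid."""
    hx = (x1 - x0) / n
    hy = (y1 - y0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * hx
        for j in range(n):
            y = y0 + (j + 0.5) * hy
            total += func(x, y)
    return total * hx * hy


# Left-hand side: integrate f directly in Cartesian coordinates over a box
# containing the unit disk (f is zero outside the disk).
lhs = midpoint_2d(f, -1.0, 1.0, -1.0, 1.0, 400)

# Right-hand side: integrate f(phi(u)) * |det J_phi(u)| over (r, theta).
rhs = midpoint_2d(
    lambda r, theta: f(*phi(r, theta)) * abs_det_jacobian(r, theta),
    0.0, 1.0, 0.0, 2.0 * math.pi, 200,
)

exact = math.pi / 3.0
print(lhs, rhs, exact)
```

Strictly speaking, the image $\varphi(U)$ here is the open disk with a slit removed, so this $f$ is not compactly supported inside it. The removed set has zero area, so the comparison is still meaningful as a sanity check.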