Matrix Block Diagonalization Theorem

Introduction

The matrix block diagonalization theorem combines the matrix diagonalization theorem and the matrix rotation-scaling theorem. It allows us to find a real block diagonal matrix $B$ that is similar to a real matrix $A$ that has complex eigenvalues and eigenvectors. This provides an intuitive geometric interpretation of the matrix $A$.

In this blog post, I would like to quickly discuss the matrix block diagonalization theorem.

Matrices with Complex Eigenvalues

Let $A$ be a matrix with real entries. If $\lambda$ is a complex eigenvalue with eigenvector $v$, then $\overline{\lambda}$ is a complex eigenvalue with eigenvector $\overline{v}$.

Proof

If $Av = \lambda v$, then because $A$ has real entries, $\overline{A} = A$, and we have

$$
\begin{align}
A \overline{v}
&= \overline{A} \overline{v} \\
&= \overline{Av} \\
&= \overline{\lambda v} \\
&= \overline{\lambda} \overline{v} \\
\end{align}
$$

This concludes the proof. $\square$
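
To see this concretely, here is a minimal NumPy sketch (the random matrix is made up for illustration) that checks numerically that conjugating an eigenpair of a real matrix gives another eigenpair.

```python
import numpy as np

# A minimal check of the lemma: for a real matrix, if (lam, v) is an
# eigenpair, then (conj(lam), conj(v)) is also an eigenpair.
# The matrix below is randomly generated for illustration only.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

eigvals, eigvecs = np.linalg.eig(A)
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ np.conj(v), np.conj(lam) * np.conj(v))
```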

Block Diagonalization

For example, consider a real $3 \times 3$ matrix $A$ with a non-real complex eigenvalue $\lambda_1$. Then $\overline{\lambda}_1$ is also an eigenvalue, and because non-real eigenvalues of a real matrix come in conjugate pairs, the remaining eigenvalue $\lambda_2$ must be real. Since there are three distinct eigenvalues, each has algebraic and geometric multiplicity one.

Let $v_1$ be a non-real complex eigenvector with eigenvalue $\lambda_1$, and let $v_2$ be a real eigenvector with eigenvalue $\lambda_2$. Then we have the block diagonalization $A = CBC^{-1}$ where

$$
\begin{align}
C &=
\begin{bmatrix}
\mid & \mid & \mid \\
\text{Re}(v_1) & \text{Im}(v_1) & v_2\\
\mid & \mid & \mid \\
\end{bmatrix}
\end{align}
$$

and

$$
\begin{align}
B &=
\begin{bmatrix}
\text{Re}(\lambda_1) & \text{Im}(\lambda_1) & 0 \\
-\text{Im}(\lambda_1) & \text{Re}(\lambda_1) & 0 \\
0 & 0 & \lambda_2 \\
\end{bmatrix}
\end{align}
$$
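
To make the example concrete, here is a minimal NumPy sketch, assuming a specific $3 \times 3$ matrix with eigenvalues $1 \pm i$ and $2$ (the matrix and the index-picking logic are made up for illustration), that builds $C$ and $B$ from a computed eigenpair and verifies $A = CBC^{-1}$.

```python
import numpy as np

# A concrete 3x3 example: eigenvalues 1 +/- i (from the top-left
# rotation-scaling block) and 2. Matrix is made up for illustration.
A = np.array([[1., -1., 0.],
              [1.,  1., 0.],
              [0.,  0., 2.]])

eigvals, eigvecs = np.linalg.eig(A)
i1 = np.argmax(eigvals.imag)            # a non-real eigenvalue (Im > 0)
i2 = np.argmin(np.abs(eigvals.imag))    # the real eigenvalue
lam1, v1 = eigvals[i1], eigvecs[:, i1]
lam2, v2 = eigvals[i2].real, eigvecs[:, i2].real

# C = [Re(v1) Im(v1) v2] and B holds the 2x2 rotation-scaling block
# for lam1 plus a 1x1 block for lam2.
C = np.column_stack([v1.real, v1.imag, v2])
B = np.array([[lam1.real,  lam1.imag, 0.],
              [-lam1.imag, lam1.real, 0.],
              [0.,         0.,        lam2]])

assert np.allclose(A, C @ B @ np.linalg.inv(C))
```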

Block Diagonalization Theorem

Let $A$ be a real $n \times n$ matrix. Suppose that for each (real or complex) eigenvalue, the algebraic multiplicity equals the geometric multiplicity. Then $A = CBC^{-1}$, where $B$ and $C$ are as follows:

The matrix $B$ is block diagonal, where the blocks are either $1 \times 1$ blocks containing the real eigenvalues (with their multiplicities), or $2 \times 2$ blocks of the form

$$
\begin{align}
\begin{bmatrix}
\text{Re}(\lambda) & \text{Im}(\lambda) \\
-\text{Im}(\lambda) & \text{Re}(\lambda) \\
\end{bmatrix}
\end{align}
$$

for each non-real complex eigenvalue $\lambda$ (with multiplicity), using only one eigenvalue from each conjugate pair.

The columns of $C$ form bases for the eigenspaces of the real eigenvalues, or come in pairs $\text{Re}(v)$ and $\text{Im}(v)$ of eigenvectors for the non-real eigenvalues.
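
The theorem statement translates directly into a construction. Below is a minimal NumPy sketch of that construction; the function name real_block_diagonalize, the tolerance handling, and the example matrix are made up for illustration, and the sketch assumes, as the theorem does, that the algebraic multiplicity equals the geometric multiplicity for every eigenvalue.

```python
import numpy as np

def real_block_diagonalize(A, tol=1e-10):
    """Construct C and B such that A = C B C^{-1}, with B real block diagonal.

    A minimal sketch, assuming A is real and diagonalizable over the complex
    numbers (algebraic multiplicity equals geometric multiplicity).
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)

    C = np.zeros((n, n))
    B = np.zeros((n, n))
    col = 0
    for lam, v in zip(eigvals, eigvecs.T):
        if abs(lam.imag) < tol:
            # Real eigenvalue: a 1x1 block and one real eigenvector column.
            C[:, col] = v.real
            B[col, col] = lam.real
            col += 1
        elif lam.imag > tol:
            # Non-real eigenvalue: keep one of each conjugate pair, with the
            # column pair Re(v), Im(v) and a 2x2 rotation-scaling block.
            C[:, col] = v.real
            C[:, col + 1] = v.imag
            B[col:col + 2, col:col + 2] = [[lam.real,  lam.imag],
                                           [-lam.imag, lam.real]]
            col += 2
    return C, B


# Example usage on a matrix with eigenvalues +/- i*sqrt(2), 3, and -1.
A = np.array([[0., -2., 0.,  0.],
              [1.,  0., 0.,  0.],
              [0.,  0., 3.,  0.],
              [0.,  0., 0., -1.]])
C, B = real_block_diagonalize(A)
assert np.allclose(A, C @ B @ np.linalg.inv(C))
```

Only eigenvalues with positive imaginary part contribute a $2 \times 2$ block, which is how one eigenvalue of each conjugate pair is kept.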

Proof

The proof is somewhat similar to the proofs of the diagonalization theorem and the rotation-scaling theorem.

First, we prove that $C$ is invertible by showing that its column vectors are linearly independent.

Suppose the distinct non-real conjugate eigenvalues of $A$ are $\lambda_{c,1}, \lambda_{c,2}, \cdots, \lambda_{c,h}, \overline{\lambda}_{c,1}, \overline{\lambda}_{c,2}, \cdots, \overline{\lambda}_{c,h}$, and the distinct real eigenvalues of $A$ are $\lambda_{r,1}, \lambda_{r,2}, \cdots, \lambda_{r,l}$. Let $v_{c,1}, v_{c,2}, \cdots, v_{c,h}, \overline{v}_{c,1}, \overline{v}_{c,2}, \cdots, \overline{v}_{c,h}$ be eigenvectors corresponding to the distinct non-real eigenvalues of $A$. These eigenvectors cannot be real, which can be proved by contradiction: if such an eigenvector $v$ were real and nonzero, then $Av$ would be real while $\lambda v$ would be non-real. Let $v_{r,1}, v_{r,2}, \cdots, v_{r,l}$ be real eigenvectors corresponding to the distinct real eigenvalues of $A$.

Consider the following linear equation.

$$
\begin{align}
\left( \sum_{i=1}^{h} x_{c, i} \text{Re}(v_{c,i}) \right) + \left( \sum_{i=1}^{h} y_{c, i} \text{Im}(v_{c,i}) \right) + \left( \sum_{i=1}^{l} x_{r, i} v_{r,i} \right)
&= 0 \\
\end{align}
$$

We must have

$$
\begin{align}
\left( \sum_{i=1}^{h} x_{c, i} \text{Re}(v_{c,i}) \right) + \left( \sum_{i=1}^{h} y_{c, i} \text{Im}(v_{c,i}) \right) + \left( \sum_{i=1}^{l} x_{r, i} v_{r,i} \right)
&= \left( \sum_{i=1}^{h} x_{c, i} \frac{v_{c,i} + \overline{v}_{c,i}}{2} \right) + \left( \sum_{i=1}^{h} y_{c, i} \frac{v_{c,i} - \overline{v}_{c,i}}{2i} \right) + \left( \sum_{i=1}^{l} x_{r, i} v_{r,i} \right) \\
&= \left( \sum_{i=1}^{h} \frac{x_{c, i} - i y_{c, i}}{2} v_{c, i} \right) + \left( \sum_{i=1}^{h} \frac{x_{c, i} + i y_{c, i}}{2} \overline{v}_{c, i} \right) + \left( \sum_{i=1}^{l} x_{r, i} v_{r,i} \right) \\
&= 0 \\
\end{align}
$$
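
As a sanity check, the coefficient identity used above can be verified numerically with a minimal NumPy sketch; the vector and coefficients below are made up for illustration.

```python
import numpy as np

# Sketch: verify that for real coefficients x, y and a complex vector v,
# x Re(v) + y Im(v) = ((x - i y) / 2) v + ((x + i y) / 2) conj(v).
v = np.array([1.0 + 2.0j, -0.5 + 0.3j, 2.0 - 1.0j])
x, y = 0.7, -1.3

lhs = x * v.real + y * v.imag
rhs = (x - 1j * y) / 2 * v + (x + 1j * y) / 2 * np.conj(v)
assert np.allclose(lhs, rhs)
```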

Because $\lambda_{c,1}, \lambda_{c,2}, \cdots, \lambda_{c,h}, \overline{\lambda}_{c,1}, \overline{\lambda}_{c,2}, \cdots, \overline{\lambda}_{c,h}, \lambda_{r,1}, \lambda_{r,2}, \cdots, \lambda_{r,l}$ are distinct eigenvalues, and eigenvectors with distinct eigenvalues are linearly independent, the eigenvectors $v_{c,1}, v_{c,2}, \cdots, v_{c,h}, \overline{v}_{c,1}, \overline{v}_{c,2}, \cdots, \overline{v}_{c,h}, v_{r,1}, v_{r,2}, \cdots, v_{r,l}$ are linearly independent. Therefore, the only solution to the linear equation above is

$$
\begin{gather}
\frac{x_{c, i} - i y_{c, i}}{2} = 0 \\
\frac{x_{c, i} + i y_{c, i}}{2} = 0 \\
x_{r, j} = 0 \\
\end{gather}
$$

for all $i \in \{1, 2, \cdots, h\}$ and all $j \in \{1, 2, \cdots, l\}$.

Consequently,

$$
\begin{gather}
x_{c, i} = 0 \\
y_{c, i} = 0 \\
x_{r, j} = 0 \\
\end{gather}
$$

for all $i \in \{1, 2, \cdots, h\}$ and all $j \in \{1, 2, \cdots, l\}$.

Thus, the linear equation

$$
\begin{align}
\left( \sum_{i=1}^{h} x_{c, i} \text{Re}(v_{c,i}) \right) + \left( \sum_{i=1}^{h} y_{c, i} \text{Im}(v_{c,i}) \right) + \left( \sum_{i=1}^{l} x_{r, i} v_{r,i} \right)
&= 0 \\
\end{align}
$$

only has the zero solution.

This proves that $\text{Re}(v_{c,1}), \text{Re}(v_{c,2}), \cdots, \text{Re}(v_{c,h}), \text{Im}(v_{c,1}), \text{Im}(v_{c,2}), \cdots, \text{Im}(v_{c,h}), v_{r,1}, v_{r,2}, \cdots, v_{r,l}$ are linearly independent.

Note that because an eigenvalue can have algebraic, and hence geometric, multiplicity greater than one, there can be multiple linearly independent eigenvectors in $C$ corresponding to the same real eigenvalue. We can similarly prove that they are linearly independent of the eigenvectors corresponding to the other distinct real eigenvalues and of the real and imaginary components of the eigenvectors corresponding to the non-real eigenvalues.

Thus all the columns in $C$ are linearly independent and $C$ is invertible.
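
The linear independence claim can also be checked numerically. Below is a minimal NumPy sketch (the random matrix and the tolerance are made up for illustration) that assembles the columns of $C$ from the computed eigenvectors and checks that $C$ has full rank.

```python
import numpy as np

# Sketch: assemble the columns Re(v), Im(v) (one pair per conjugate pair of
# non-real eigenvalues) and the real eigenvectors, then check that the
# resulting C has full rank, i.e. its columns are linearly independent.
rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
eigvals, eigvecs = np.linalg.eig(A)

cols = []
for lam, v in zip(eigvals, eigvecs.T):
    if abs(lam.imag) < 1e-10:       # real eigenvalue: one real column
        cols.append(v.real)
    elif lam.imag > 1e-10:          # keep one of each conjugate pair
        cols.append(v.real)
        cols.append(v.imag)
C = np.column_stack(cols)

assert C.shape == A.shape
assert np.linalg.matrix_rank(C) == A.shape[0]
```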

Suppose the eigenvalue $\lambda$ is non-real and the column indices of the real and imaginary components of its eigenvector $v$ in the matrix $C$ are $k$ and $k+1$, respectively.

$$
\begin{align}
\lambda v
&= \left( \text{Re}(\lambda) + i\text{Im}(\lambda) \right) \left( \text{Re}(v) + i\text{Im}(v) \right)\\
&= \left( \text{Re}(\lambda) \text{Re}(v) - \text{Im}(\lambda) \text{Im}(v) \right) + i\left( \text{Im}(\lambda) \text{Re}(v) + \text{Re}(\lambda) \text{Im}(v) \right) \\
&= CB w_{k} \\
\end{align}
$$

where $w_{k}$ is a column vector of size $n$ whose $k$th entry is $1$, whose $(k+1)$th entry is $i$, and whose other entries are $0$. The last equality holds because $Bw_{k} = \left( \text{Re}(\lambda) + i\text{Im}(\lambda) \right) C^{-1}C_{:,k} + \left( -\text{Im}(\lambda) + i\text{Re}(\lambda) \right) C^{-1}C_{:,k+1}$, and $C_{:,k} = \text{Re}(v)$, $C_{:,k+1} = \text{Im}(v)$.

It’s also not difficult to find that

$$
\begin{align}
v
&= C w_{k} \\
\end{align}
$$

Because $C$ is invertible,

$$
w_{k} = C^{-1}v
$$

Thus,

$$
\begin{align}
Av
&= \lambda v \\
&= C B w_{k} \\
&= C B C^{-1}v \\
\end{align}
$$
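
In other words, the pair of columns $\text{Re}(v)$ and $\text{Im}(v)$ of $C$ is mapped by $A$ exactly as prescribed by the corresponding $2 \times 2$ rotation-scaling block of $B$. Here is a minimal NumPy sketch of this check; the random matrix is made up for illustration.

```python
import numpy as np

# Sketch: check that for a non-real eigenpair (lam, v) of a real matrix A,
# A [Re(v) Im(v)] = [Re(v) Im(v)] @ [[Re(lam), Im(lam)], [-Im(lam), Re(lam)]].
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(np.abs(eigvals.imag))     # pick the "most non-real" eigenvalue
lam, v = eigvals[k], eigvecs[:, k]

P = np.column_stack([v.real, v.imag])   # the column pair of C for this pair
R = np.array([[lam.real,  lam.imag],
              [-lam.imag, lam.real]])
assert np.allclose(A @ P, P @ R)
```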

Similarly, suppose the eigenvalue $\lambda$ is real and the column index of its eigenvector $v$ in the matrix $C$ is $k$.

$$
\begin{align}
\lambda v
&= CB e_{k} \\
\end{align}
$$

where $e_{k}$ is a column vector of size $n$ whose $k$th entry is $1$ and whose other entries are $0$.

It’s also not difficult to find that

$$
\begin{align}
v
&= C e_{k} \\
\end{align}
$$

Thus,

$$
\begin{align}
Av
&= \lambda v \\
&= C B e_{k} \\
&= C B C^{-1}v \\
\end{align}
$$

Because $A$, $B$, and $C$ are real matrices, taking the real and imaginary parts of $Av = CBC^{-1}v$ gives

$$
\begin{align}
A\text{Re}(v_{c, i}) &= CBC^{-1}\text{Re}(v_{c, i}) \\
A\text{Im}(v_{c, i}) &= CBC^{-1}\text{Im}(v_{c, i}) \\
\end{align}
$$

for all $i \in \{1, 2, \cdots, h\}$.

That is to say, for every column $C_{:,i}$ of $C$, we have

$$
AC_{:,i} = CBC^{-1} C_{:,i}
$$

Because we have shown that the columns in $C$ are linearly independent, any vector $w \in \mathbb{R}^n$ can be written as a linear combination of the columns, i.e.,

$$
w = \sum_{i=1}^{n} x_{i} C_{:,i}
$$

Thus,

$$
\begin{align}
Aw
&=
A\sum_{i=1}^{n} x_{i} C_{:,i} \\
&=
\sum_{i=1}^{n} x_{i} A C_{:,i} \\
&=
\sum_{i=1}^{n} x_{i} CBC^{-1} C_{:,i} \\
&=
CBC^{-1} \sum_{i=1}^{n} x_{i} C_{:,i} \\
&=
CBC^{-1} w \\
\end{align}
$$

Because $w$ can be any vector in $\mathbb{R}^n$, we must have

$$
A = CBC^{-1}
$$

This concludes the proof. $\square$
