Matrix Diagonalization Theorem

Introduction

A diagonal matrix is one of the simplest matrices. Its eigenvalues are just its diagonal entries, and its eigenvectors are just the standard basis vectors $\{e_1, e_2, \cdots, e_n\}$.

A diagonalizable matrix, i.e., a matrix that is similar to a diagonal matrix, also benefits from the simplicity of a diagonal matrix.

In this blog post, I would like to quickly discuss matrix diagonalization.

Definition of Diagonalizable Matrix

An $n \times n$ matrix $A$ is diagonalizable if it is similar to a diagonal matrix: that is, if there exists an invertible $n \times n$ matrix $C$ and a diagonal matrix $D$ such that $A = CDC^{-1}$.
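
As a quick concrete example (one of my own choosing, easy to verify by hand), the matrix

$$
A = \begin{bmatrix} 3 & 1 \\ 0 & 2 \end{bmatrix}
= \underbrace{\begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix}}_{C}
\underbrace{\begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}}_{D}
\underbrace{\begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix}}_{C^{-1}}
$$

is diagonalizable. The columns of $C$ are eigenvectors of $A$ with eigenvalues $3$ and $2$, which is exactly the structure formalized by the theorem below.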

Diagonalization Theorem

The diagonalization theorem follows naturally from the definition of a diagonalizable matrix.

An $n \times n$ matrix $A$ is diagonalizable if and only if $A$ has $n$ linearly independent eigenvectors.

Proof

Suppose $A$ is diagonalizable with $A = CDC^{-1}$. Then $AC = CD$. Because $D$ is a diagonal matrix, comparing the columns of $AC$ and $CD$ gives $A v_i = \lambda_i v_i$, so each column vector $v_i$ of $C$ must be an eigenvector of $A$, and the corresponding diagonal entry $\lambda_i$ of $D$ is its eigenvalue.

Because the matrix $C$ is invertible, its $n$ column vectors must be linearly independent, so $A$ has $n$ linearly independent eigenvectors.

Conversely, suppose $A$ has $n$ linearly independent eigenvectors $\{v_1, v_2, \cdots, v_n\}$. They form the columns of an invertible $n \times n$ matrix $C$, and their corresponding eigenvalues $\{\lambda_1, \lambda_2, \cdots, \lambda_n\}$ form a diagonal matrix $D$. By the definition of eigenvalues and eigenvectors, $AC = CD$.

Because the matrix $C$ is invertible, $A = CDC^{-1}$ and $A$ is diagonalizable.

This concludes the proof. $\square$
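
The construction in the proof is easy to check numerically. Below is a minimal NumPy sketch (reusing the example matrix from above, with the eigenvectors hard-coded rather than computed) that stacks eigenvectors into $C$, puts the corresponding eigenvalues on the diagonal of $D$, and verifies both $AC = CD$ and $A = CDC^{-1}$.

```python
import numpy as np

# Example matrix with eigenvalues 3 and 2.
A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

# Hand-picked eigenvectors of A form the columns of C;
# the corresponding eigenvalues form the diagonal of D.
C = np.array([[1.0,  1.0],
              [0.0, -1.0]])
D = np.diag([3.0, 2.0])

# Each column v_i of C satisfies A v_i = lambda_i v_i, i.e., AC = CD.
assert np.allclose(A @ C, C @ D)

# Because C is invertible, A = C D C^{-1}, so A is diagonalizable.
assert np.allclose(A, C @ D @ np.linalg.inv(C))
```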

Algebraic and Geometric Multiplicity

Let $A$ be an $n \times n$ matrix, and let $\lambda$ be an eigenvalue of $A$.

The algebraic multiplicity of $\lambda$ is its multiplicity as a root of the characteristic polynomial of $A$.

The geometric multiplicity of $\lambda$ is the dimension of the $\lambda$-eigenspace.

Let $A$ be a square matrix and let $\lambda$ be an eigenvalue of $A$. Then

$$
1 \leq \text{the geometric multiplicity of } \lambda \leq \text{the algebraic multiplicity of } \lambda
$$

The complete proof of this inequality is skipped in this article.
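
To make the two notions concrete, here is a minimal NumPy sketch (the shear-like matrix is my own example) for a matrix whose only eigenvalue $\lambda = 2$ has algebraic multiplicity $2$ but geometric multiplicity $1$, so the inequality above is strict and, as the theorem below implies, the matrix is not diagonalizable.

```python
import numpy as np

# A shear-like matrix whose characteristic polynomial is (2 - t)^2,
# so its only eigenvalue is 2 with algebraic multiplicity 2.
A = np.array([[2.0, 1.0],
              [0.0, 2.0]])
n = A.shape[0]
lam = 2.0
algebraic_multiplicity = 2

# Geometric multiplicity: the dimension of the lam-eigenspace,
# i.e., the nullity of (A - lam * I).
geometric_multiplicity = int(n - np.linalg.matrix_rank(A - lam * np.eye(n)))
print(geometric_multiplicity)  # 1

assert 1 <= geometric_multiplicity <= algebraic_multiplicity
```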

Diagonalization Theorem Variant

There is a variant of the diagonalization theorem stated in the language of multiplicities.

Let $A$ be an $n \times n$ matrix. The following are equivalent:

  1. $A$ is diagonalizable.
  2. The sum of the geometric multiplicities of the eigenvalues of $A$ is equal to $n$.
  3. The sum of the algebraic multiplicities of the eigenvalues of $A$ is equal to $n$, and for each eigenvalue, the geometric multiplicity equals the algebraic multiplicity.

Proof

The proof shows $1 \rightarrow 2 \rightarrow 3 \rightarrow 1$.

$1 \rightarrow 2$

If $A$ is diagonalizable, $A$ has $n$ linearly independent eigenvectors. Grouping these eigenvectors by their eigenvalues, the vectors in each group lie in the corresponding eigenspace and remain linearly independent, so the sum of the geometric multiplicities of the eigenvalues of $A$ is at least $n$.

Because the geometric multiplicity of an eigenvalue is not larger than its algebraic multiplicity, the sum of the geometric multiplicities of the eigenvalues of $A$ is not larger than the sum of the algebraic multiplicities of the eigenvalues of $A$.

Because the sum of the algebraic multiplicities of the eigenvalues of an $n \times n$ matrix $A$ is at most $n$ (the characteristic polynomial has degree $n$), the sum of the geometric multiplicities of the eigenvalues of $A$ is not larger than $n$.

Therefore, if $A$ is diagonalizable, the sum of the geometric multiplicities of the eigenvalues of $A$ must be equal to $n$.

$2 \rightarrow 3$

Because the sum of the algebraic multiplicities of the eigenvalues of $A$ is at least the sum of the geometric multiplicities and at most $n$, if the sum of the geometric multiplicities is equal to $n$, then the sum of the algebraic multiplicities must also be $n$.

Suppose there is at least one eigenvalue $\lambda$ whose geometric multiplicity does not equal its algebraic multiplicity. Then the geometric multiplicity of $\lambda$ is strictly less than its algebraic multiplicity. Because the sum of the algebraic multiplicities of the eigenvalues of $A$ is at least the sum of the geometric multiplicities, which is $n$, the sum of the algebraic multiplicities must then be at least $n + 1$. However, the sum of the algebraic multiplicities of the eigenvalues of $A$ cannot be larger than $n$. This is a contradiction. Thus, for each eigenvalue, the geometric multiplicity equals the algebraic multiplicity.

Therefore, if the sum of the geometric multiplicities of the eigenvalues of $A$ is equal to $n$, then the sum of the algebraic multiplicities of the eigenvalues of $A$ is equal to $n$, and for each eigenvalue, the geometric multiplicity equals the algebraic multiplicity.

$3 \rightarrow 1$

Because the sum of the algebraic multiplicities of the eigenvalues of $A$ is equal to $n$, and for each eigenvalue the geometric multiplicity equals the algebraic multiplicity, the sum of the geometric multiplicities is also $n$.

Suppose the distinct eigenvalues of $A$ are $\{\lambda_1, \lambda_2, \cdots, \lambda_k\}$ and their geometric multiplicities are $\{m_1, m_2, \cdots, m_k\}$ with $\sum_{i=1}^{k} m_i = n$. For each $\lambda_{i}$, let $\mathcal{B}_{i} = \{v_{i,1}, v_{i,2}, \cdots, v_{i,m_{i}}\}$ be a basis for the $\lambda_{i}$-eigenspace, consisting of $m_i$ linearly independent eigenvectors, and collect these basis vectors into a matrix $V_i = [v_{i,1}, v_{i,2}, \cdots, v_{i,m_{i}}]$. We will show that all the column vectors from the matrices $V_1, V_2, \cdots, V_k$ together are linearly independent, i.e., the equation

$$
\sum_{i=1}^{k} \sum_{j=1}^{m_{i}} c_{i,j} v_{i,j} = 0
$$

has only the zero solution, i.e., $c_{i,j} = 0$ for all $i$ and $j$.

Suppose instead that the column vectors from the matrices $V_1, V_2, \cdots, V_k$ are linearly dependent, so that the equation above has a non-zero solution with at least one coefficient $c_{h,l} \neq 0$. Because $\{v_{h,1}, v_{h,2}, \cdots, v_{h,m_{h}}\}$ are linearly independent, $\sum_{j=1}^{m_{h}} c_{h,j} v_{h,j} \neq 0$. Moving the remaining terms to the right-hand side and absorbing the resulting minus signs into the coefficients $c_{i,j}$ for $i \neq h$ (which changes neither which coefficients are zero nor the fact that the solution is non-zero), we have

$$
\sum_{j=1}^{m_{h}} c_{h,j} v_{h,j} = \left( \sum_{i=1}^{h-1} \sum_{j=1}^{m_{i}} c_{i,j} v_{i,j} \right) + \left( \sum_{i=h+1}^{k} \sum_{j=1}^{m_{i}} c_{i,j} v_{i,j} \right)
$$

Multiplying the above equation by $\lambda_{h}$ on both sides, we have

$$
\begin{align}
\lambda_{h} \sum_{j=1}^{m_{h}} c_{h,j} v_{h,j}
&= \lambda_{h} \left( \sum_{i=1}^{h-1} \sum_{j=1}^{m_{i}} c_{i,j} v_{i,j} \right) + \lambda_{h} \left( \sum_{i=h+1}^{k} \sum_{j=1}^{m_{i}} c_{i,j} v_{i,j} \right) \\
&= \left( \sum_{i=1}^{h-1} \lambda_{h} \sum_{j=1}^{m_{i}} c_{i,j} v_{i,j} \right) + \left( \sum_{i=h+1}^{k} \lambda_{h} \sum_{j=1}^{m_{i}} c_{i,j} v_{i,j} \right) \\
\end{align}
$$

Using the definition of eigenvalues and eigenvectors, we have

$$
\begin{align}
A \sum_{j=1}^{m_{h}} c_{h,j} v_{h,j}
&= \sum_{j=1}^{m_{h}} c_{h,j} A v_{h,j} \\
&= \sum_{j=1}^{m_{h}} c_{h,j} \lambda_{h} v_{h,j} \\
&= \lambda_{h} \sum_{j=1}^{m_{h}} c_{h,j} v_{h,j} \\
&= A \left( \sum_{i=1}^{h-1} \sum_{j=1}^{m_{i}} c_{i,j} v_{i,j} \right) + A \left( \sum_{i=h+1}^{k} \sum_{j=1}^{m_{i}} c_{i,j} v_{i,j} \right) \\
&= \left( \sum_{i=1}^{h-1} \sum_{j=1}^{m_{i}} c_{i,j} A v_{i,j} \right) + \left( \sum_{i=h+1}^{k} \sum_{j=1}^{m_{i}} c_{i,j} A v_{i,j} \right) \\
&= \left( \sum_{i=1}^{h-1} \sum_{j=1}^{m_{i}} c_{i,j} \lambda_{i} v_{i,j} \right) + \left( \sum_{i=h+1}^{k} \sum_{j=1}^{m_{i}} c_{i,j} \lambda_{i} v_{i,j} \right) \\
&= \left( \sum_{i=1}^{h-1} \lambda_{i} \sum_{j=1}^{m_{i}} c_{i,j} v_{i,j} \right) + \left( \sum_{i=h+1}^{k} \lambda_{i} \sum_{j=1}^{m_{i}} c_{i,j} v_{i,j} \right) \\
\end{align}
$$

Thus, we have

$$
\begin{align}
\lambda_{h} \sum_{j=1}^{m_{h}} c_{h,j} v_{h,j} - A \sum_{j=1}^{m_{h}} c_{h,j} v_{h,j}
&=
\lambda_{h} \sum_{j=1}^{m_{h}} c_{h,j} v_{h,j} - \lambda_{h} \sum_{j=1}^{m_{h}} c_{h,j} v_{h,j} \\
&=
\left( \left( \sum_{i=1}^{h-1} \lambda_{h} \sum_{j=1}^{m_{i}} c_{i,j} v_{i,j} \right) + \left( \sum_{i=h+1}^{k} \lambda_{h} \sum_{j=1}^{m_{i}} c_{i,j} v_{i,j} \right) \right) - \left( \left( \sum_{i=1}^{h-1} \lambda_{i} \sum_{j=1}^{m_{i}} c_{i,j} v_{i,j} \right) + \left( \sum_{i=h+1}^{k} \lambda_{i} \sum_{j=1}^{m_{i}} c_{i,j} v_{i,j} \right) \right) \\
&=
\left( \sum_{i=1}^{h-1} \left(\lambda_{h} - \lambda_{i}\right) \sum_{j=1}^{m_{i}} c_{i,j} v_{i,j} \right) + \left( \sum_{i=h+1}^{k} \left(\lambda_{h} - \lambda_{i} \right) \sum_{j=1}^{m_{i}} c_{i,j} v_{i,j} \right) \\
&= 0 \\
\end{align}
$$

Because $\lambda_{h} - \lambda_{i} \neq 0$ for every $i \neq h$, setting $d_i = \lambda_{h} - \lambda_{i}$ gives a non-zero solution to the following linear equation in the unknowns $d_i$.

$$
\begin{align}
\left( \sum_{i=1}^{h-1} d_i \sum_{j=1}^{m_{i}} c_{i,j} v_{i,j} \right) + \left( \sum_{i=h+1}^{k} d_i \sum_{j=1}^{m_{i}} c_{i,j} v_{i,j} \right)
&= 0 \\
\end{align}
$$

Because $\sum_{j=1}^{m_{h}} c_{h,j} v_{h,j} \neq 0$ and it equals the sum of the remaining group sums, the group sums $\{ \sum_{j=1}^{m_{1}} c_{1,j} v_{1,j}, \sum_{j=1}^{m_{2}} c_{2,j} v_{2,j}, \cdots, \sum_{j=1}^{m_{h-1}} c_{h-1,j} v_{h-1,j}, \sum_{j=1}^{m_{h+1}} c_{h+1,j} v_{h+1,j}, \cdots, \sum_{j=1}^{m_{k}} c_{k,j} v_{k,j} \}$ cannot all be zero.

Restricting to the subset $L$ of indices $i \neq h$ for which $\sum_{j=1}^{m_{i}} c_{i,j} v_{i,j} \neq 0$, and dropping the zero terms, the following linear equation still has a non-zero solution.

$$
\begin{align}
\sum_{i \in L}^{} d_i \sum_{j=1}^{m_{i}} c_{i,j} v_{i,j}
&= 0 \\
\end{align}
$$

Notice that $\sum_{j=1}^{m_{i}} c_{i,j} v_{i,j}$, if not zero, is an eigenvector of $A$ with eigenvalue $\lambda_{i}$. Because eigenvectors corresponding to distinct eigenvalues are linearly independent, the zero solution is the only solution to the above linear equation.

This contradicts our previous finding that the above linear equation has non-zero solutions.

Therefore, all the column vectors from the matrices $V_1, V_2, \cdots, V_k$ are linearly independent, $A$ has $n$ linearly independent eigenvectors, and $A$ is diagonalizable.

This concludes the proof. $\square$
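
The equivalence is easy to check numerically for a small example. The symmetric matrix in the sketch below (my own example) has eigenvalue $2$ with geometric multiplicity $2$ and eigenvalue $4$ with geometric multiplicity $1$; the geometric multiplicities sum to $n = 3$, and, consistent with the theorem, the matrix is diagonalizable.

```python
import numpy as np

# A symmetric 3 x 3 matrix with eigenvalues 2, 2, and 4.
A = np.array([[2.0, 0.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 1.0, 3.0]])
n = A.shape[0]

# Geometric multiplicity of each distinct eigenvalue:
# the nullity of (A - lam * I).
geometric = {lam: int(n - np.linalg.matrix_rank(A - lam * np.eye(n)))
             for lam in (2.0, 4.0)}
print(geometric)                     # {2.0: 2, 4.0: 1}
assert sum(geometric.values()) == n  # condition 2 of the theorem

# Consistent with condition 1, A is diagonalizable: A = C D C^{-1}.
eigenvalues, C = np.linalg.eig(A)
D = np.diag(eigenvalues)
assert np.allclose(A, C @ D @ np.linalg.inv(C))
```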

Vector Coordinate Transformation Using Diagonalizable Matrix

As we have discussed in the previous article “Matrix Similarity”, transforming a vector's coordinates with a diagonalizable matrix $A = CDC^{-1}$ is the same as scaling coordinates in the $\mathcal{B}$-coordinate system, where $\mathcal{B} = \{ v_1, v_2, \cdots, v_n \}$ consists of the columns of the matrix $C$, i.e., the eigenvectors of $A$, and the scaling factor for each eigenvector $v_i$ is its corresponding eigenvalue $\lambda_i$.
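
Here is a minimal NumPy sketch of this interpretation (the matrix and vector are my own example): to compute $Ax$, first express $x$ in the eigenvector basis via $C^{-1}x$, scale each coordinate by the corresponding eigenvalue via $D$, and map the result back to standard coordinates via $C$.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])
x = np.array([1.0, 2.0])

# Diagonalization A = C D C^{-1}: the columns of C are eigenvectors of A.
eigenvalues, C = np.linalg.eig(A)
D = np.diag(eigenvalues)

x_B = np.linalg.inv(C) @ x  # coordinates of x in the eigenvector basis B
scaled_B = D @ x_B          # scale each B-coordinate by its eigenvalue
Ax = C @ scaled_B           # map the scaled coordinates back

assert np.allclose(Ax, A @ x)
```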

Matrix with Distinct Complex Eigenvalue

If the eigenvalues of an $n \times n$ matrix $A$ are distinct complex numbers, then $A$ has $n$ linearly independent (complex) eigenvectors. Thus, $A$ is diagonalizable. In this case, the diagonal matrix $D$ that the matrix $A$ is similar to has the distinct complex eigenvalues of $A$ on its diagonal. The coordinate scaling factors, however, are not real but complex. This complex scaling sometimes does not have a good geometric interpretation. Therefore, it is not very common to diagonalize a matrix with complex eigenvalues.
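
For instance (a minimal NumPy sketch with a rotation matrix of my own choosing), the $90^{\circ}$ rotation matrix has eigenvalues $\pm i$. NumPy diagonalizes it over the complex numbers, and $A = CDC^{-1}$ still holds, but the scaling factors $\pm i$ are complex and harder to interpret geometrically.

```python
import numpy as np

# 90-degree rotation matrix: no real eigenvalues, its eigenvalues are +i and -i.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])

eigenvalues, C = np.linalg.eig(A)
print(eigenvalues)  # approximately [0.+1.j, 0.-1.j]

# A is diagonalizable, but only over the complex numbers.
D = np.diag(eigenvalues)
assert np.allclose(A, C @ D @ np.linalg.inv(C))
```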

As we will see in my later articles, the matrices with complex eigenvalues are similar to real-valued rotation-scaling matrices or real-valued block-diagonal matrices. These matrices are more commonly used in practice and have much better geometric interpretations.

Matrix Diagonalization

To diagonalize a diagonalizable matrix $A = CDC^{-1}$, we simply find the eigenvectors and eigenvalues of $A$: the eigenvectors form the columns of the invertible matrix $C$, and the corresponding eigenvalues form the diagonal entries of the diagonal matrix $D$.
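
For an exact, symbolic version of this procedure, assuming SymPy is available, its `Matrix.diagonalize` method returns such a pair $C$ and $D$ directly (a minimal sketch with the same example matrix as above); numerically, `np.linalg.eig` plays the same role, as in the earlier sketches.

```python
import sympy as sp

A = sp.Matrix([[3, 1],
               [0, 2]])

# C has eigenvectors of A as its columns; D has the eigenvalues on the diagonal.
# diagonalize() raises an error if A is not diagonalizable.
C, D = A.diagonalize()

assert A == C * D * C.inv()
print(D)  # a diagonal matrix with the eigenvalues 2 and 3 on the diagonal
```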

Matrix Eigendecomposition

Matrix eigendecomposition and diagonalization are the same.
