# Matrix Similarity

## Introduction

Similar matrices represent the same linear map under two (possibly) different bases. Matrix similarity is fundamental to matrix diagonalization.

In this blog post, I would like to quickly discuss how to understand similar matrices.

## Definition of Similar Matrices

Two $n \times n$ matrices $A$ and $B$ are similar if there exists an invertible matrix $C$ such that $A = CBC^{-1}$.
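As a concrete illustration of the definition, the following minimal Python sketch builds a matrix $A$ that is similar to $B$ by construction. The matrices `C` and `B` and the helper `matmul` are hypothetical choices made for this example only; `C` is chosen with determinant 1 so that its inverse has integer entries.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


# Illustrative matrices (assumptions for this sketch).
C = [[1, 1], [1, 2]]        # det(C) = 1, so C is invertible
C_inv = [[2, -1], [-1, 1]]  # inverse of C, written out by hand
B = [[2, 1], [0, 3]]

# Sanity check that C_inv really is the inverse of C.
assert matmul(C, C_inv) == [[1, 0], [0, 1]]

# A is similar to B by construction: A = C B C^{-1}.
A = matmul(matmul(C, B), C_inv)

# Similarity is symmetric: B = C^{-1} A C.
B_recovered = matmul(matmul(C_inv, A), C)
```

With these values, `A` evaluates to `[[0, 2], [-3, 5]]` and `B_recovered` equals `B`.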

## Vector Coordinate Transformation Using Similar Matrices

The matrix $C$ is invertible if and only if the columns of $C$, $\{ v_1, v_2, \cdots, v_n \}$, are linearly independent, i.e., they constitute a basis for $\mathbb{R}^{n}$. Let $\mathcal{B} = \{ v_1, v_2, \cdots, v_n \}$ be such a basis for $\mathbb{R}^{n}$, which may be different from the standard basis $\{ e_1, e_2, \cdots, e_n \}$. Let $x$ be a vector in $\mathbb{R}^{n}$ whose coordinates are given in the standard basis for simplicity.

By the definition of similar matrices, we have the vector coordinate transformation $Ax = CBC^{-1}x = C\left(B\left( C^{-1} x\right)\right)$.

Recall, because $\mathcal{B} = \{ v_1, v_2, \cdots, v_n \}$ is a basis for $\mathbb{R}^{n}$, $x$ can be written as a linear combination of the $\mathcal{B}$ basis,

\begin{align} x &= c_1 v_1 + c_2 v_2 + \cdots + c_n v_n \\ &= C [x]_{\mathcal{B}} \\ \end{align}

where

$$C = [v_1, v_2, \cdots, v_n]$$

and

$$[x]_{\mathcal{B}} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \\ \end{bmatrix}$$

are the new coordinates for $x$ using the $\mathcal{B}$ basis.

Because $C$ is invertible, we have

$$[x]_{\mathcal{B}} = C^{-1} x$$

Thus, to transform the vector coordinates from the standard basis coordinate system to the new $\mathcal{B}$ basis coordinate system, we do

$$[x]_{\mathcal{B}} = C^{-1} x$$

To transform the vector coordinates from the new $\mathcal{B}$ basis coordinate system back to the standard basis coordinate system, we do

$$x = C [x]_{\mathcal{B}}$$
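The two coordinate conversions above can be sketched in a few lines of Python. The basis, the vector `x`, and the helper `matvec` are assumptions chosen for illustration.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


# Illustrative basis (an assumption for this sketch): the columns of C are
# v_1 = (1, 1) and v_2 = (1, 2).
C = [[1, 1], [1, 2]]
C_inv = [[2, -1], [-1, 1]]  # inverse of C, written out by hand (det(C) = 1)

x = [3, 5]                  # coordinates in the standard basis

x_B = matvec(C_inv, x)      # standard basis -> B basis
x_back = matvec(C, x_B)     # B basis -> standard basis
```

Here `x_B` is `[1, 2]`, which says $x = 1 \cdot v_1 + 2 \cdot v_2$, and `x_back` recovers the original `x`.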

The vector coordinate transformation using similar matrices $Ax = C\left(B\left( C^{-1} x\right)\right)$ implies that applying the transformation $A$ in the standard basis coordinate system is equivalent to transforming the vector coordinates from the standard basis coordinate system to the new $\mathcal{B}$ basis coordinate system, performing the transformation $B$ in the new $\mathcal{B}$ basis coordinate system, and transforming the vector coordinates from the new $\mathcal{B}$ basis coordinate system back to the standard basis coordinate system.

The transformations $A$ and $B$ are similar in the sense that the coordinates of the vector in the $\mathcal{B}$ basis coordinate system, $[x]_{\mathcal{B}}$, after the transformation $B$ and the conversion back to the standard basis coordinate system, are exactly the same as the coordinates of the same vector $x$ in the standard basis coordinate system after the transformation $A$. That is to say,

$$Ax = C\left(B [x]_{\mathcal{B}} \right)$$

Alternatively, we could also say that the transformations $A$ and $B$ are similar in the sense that the coordinates of the vector in the $\mathcal{B}$ basis coordinate system, $[x]_{\mathcal{B}}$, after the transformation $B$, are exactly the same as the coordinates of the same vector $x$ in the standard basis coordinate system after the transformation $A$ and the conversion to the $\mathcal{B}$ basis coordinate system. That is to say,

$$C^{-1} \left(Ax\right) = B [x]_{\mathcal{B}}$$
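Both identities, $Ax = C\left(B [x]_{\mathcal{B}}\right)$ and $C^{-1}\left(Ax\right) = B [x]_{\mathcal{B}}$, can be checked numerically. The sketch below uses hypothetical example matrices and helper names that are not part of the derivation.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


# Illustrative matrices (assumptions for this sketch).
C = [[1, 1], [1, 2]]
C_inv = [[2, -1], [-1, 1]]           # inverse of C (det(C) = 1)
B = [[2, 1], [0, 3]]
A = matmul(matmul(C, B), C_inv)      # A = C B C^{-1}

x = [3, 5]                           # standard basis coordinates
x_B = matvec(C_inv, x)               # B basis coordinates

# Route 1: apply A directly in the standard basis.
Ax = matvec(A, x)

# Route 2: apply B in the B basis, then convert back with C.
Bx_B = matvec(B, x_B)
via_B = matvec(C, Bx_B)
```

Both routes give `[10, 16]` in the standard basis, and converting `Ax` with `C_inv` gives `Bx_B`, which is `[4, 6]`.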

In fact, similar matrices $A$ and $B$ represent the same linear map under two (possibly) different bases. To see this, given the $n$ linearly independent column vectors from the matrix $C$ represented using standard basis coordinates, $\{ v_1, v_2, \cdots, v_n \}$, their $\mathcal{B}$ basis coordinates are $\{ C^{-1}v_1, C^{-1}v_2, \cdots, C^{-1}v_n \} = \{ e_1, e_2, \cdots, e_n \}$.

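Thus the vector coordinates transformation using the standard basis is just the “same” as the vector coordinates transformation using the $\mathcal{B}$ basis followed by converting the basis back to the standard basis from the $\mathcal{B}$ basis. For example, applying $A$ to $v_i$ in the standard basis gives the same result as applying $B$ to $e_i$ in the $\mathcal{B}$ basis and converting the result back to the standard basis,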

\begin{align} A v_i &= CBC^{-1}v_i \\ &= C\left(B\left( C^{-1} v_i\right)\right) \\ &= C\left(B e_i\right) \\ \end{align}
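The relation $A v_i = C\left(B e_i\right)$ can be checked for every basis vector at once. This is a small sketch with hypothetical example matrices; the names `columns`, `coords`, `lhs`, and `rhs` are introduced only for illustration.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


# Illustrative matrices (assumptions for this sketch).
C = [[1, 1], [1, 2]]
C_inv = [[2, -1], [-1, 1]]           # inverse of C (det(C) = 1)
B = [[2, 1], [0, 3]]
A = matmul(matmul(C, B), C_inv)      # A = C B C^{-1}

# The basis vectors v_1, v_2 are the columns of C.
columns = [list(col) for col in zip(*C)]

# Their B basis coordinates C^{-1} v_i are the standard basis vectors e_i.
coords = [matvec(C_inv, v) for v in columns]

# A v_i in the standard basis versus C (B e_i).
lhs = [matvec(A, v) for v in columns]
rhs = [matvec(C, matvec(B, e)) for e in coords]
```

In this example `coords` is `[[1, 0], [0, 1]]`, and `lhs` and `rhs` agree entry by entry.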

Finally, in practice, $x$ does not have to be represented using the standard basis. That is to say, neither of the two coordinate systems has to be the standard basis coordinate system.

## Eigenvalues of Similar Matrices

Similar matrices have some interesting properties related to eigenvalues and eigenvectors.

### Similar Matrices Have the Same Eigenvalues

Similar matrices have the same eigenvalues.

Proof

Suppose $n \times n$ matrices $A$ and $B$ are similar and $A = CBC^{-1}$. The characteristic polynomial for computing the eigenvalues of $A$ becomes

\begin{align} \lvert A - \lambda I_n \rvert &= \lvert CBC^{-1} - \lambda CC^{-1} \rvert \\ &= \lvert CBC^{-1} - \lambda C I_nC^{-1} \rvert \\ &= \lvert CBC^{-1} - C \left( \lambda I_n \right) C^{-1} \rvert \\ &= \lvert C \left( B - \lambda I_n \right) C^{-1} \rvert \\ &= \lvert C \rvert \lvert \left( B - \lambda I_n \right) \rvert \lvert C^{-1} \rvert \\ &= \lvert C \rvert \lvert B - \lambda I_n \rvert \frac{1}{\lvert C \rvert} \\ &= \lvert B - \lambda I_n \rvert \end{align}

Because the characteristic polynomials for computing the eigenvalues of $A$ and $B$ are exactly the same, similar matrices $A$ and $B$ have the same eigenvalues.

This concludes the proof. $\square$
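For $2 \times 2$ matrices the characteristic polynomial is $\lvert M - \lambda I_2 \rvert = \lambda^2 - \text{tr}(M) \lambda + \lvert M \rvert$, so the result can be checked by comparing coefficients. The sketch below does this for a hypothetical pair of similar matrices; `char_poly_2x2` is a helper written for this example, not a library function.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def char_poly_2x2(M):
    """Coefficients of lambda^2 - trace(M) * lambda + det(M), highest degree first."""
    (a, b), (c, d) = M
    return (1, -(a + d), a * d - b * c)


# Illustrative matrices (assumptions for this sketch).
C = [[1, 1], [1, 2]]
C_inv = [[2, -1], [-1, 1]]           # inverse of C (det(C) = 1)
B = [[2, 1], [0, 3]]
A = matmul(matmul(C, B), C_inv)      # A = C B C^{-1}

poly_A = char_poly_2x2(A)
poly_B = char_poly_2x2(B)
```

Both polynomials are $\lambda^2 - 5\lambda + 6 = (\lambda - 2)(\lambda - 3)$, so in this example $A$ and $B$ share the eigenvalues 2 and 3.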

### Eigenvectors of Similar Matrices

Suppose $n \times n$ matrices $A$ and $B$ are similar, $A = CBC^{-1}$. Because similar matrices have the same eigenvalues, we further suppose $\lambda$ is an eigenvalue of $A$ and $B$, and $Av = \lambda v$ where $v$ is an eigenvector for $A$ corresponding to the eigenvalue $\lambda$.

We have the following relationship,

\begin{align} BC^{-1} v &= \left( C^{-1}C \right) BC^{-1} v \\ &= C^{-1} \left( C BC^{-1} \right) v \\ &= C^{-1} A v \\ &= C^{-1} \lambda v \\ &= \lambda C^{-1} v \\ \end{align}

Therefore, because $C^{-1} v \neq 0$ ($C^{-1}$ is invertible and $v \neq 0$), an eigenvector for $B$ corresponding to the eigenvalue $\lambda$ is $C^{-1} v$.

Similarly, suppose $Bv = \lambda v$ where $v$ is an eigenvector for $B$ corresponding to the eigenvalue $\lambda$. We could also derive that an eigenvector for $A$ corresponding to the eigenvalue $\lambda$ is $Cv$.
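The mapping of eigenvectors in both directions can be illustrated with a small Python sketch. The matrices and eigenvectors below are hypothetical examples picked so that all entries are integers.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def scale(k, x):
    """Multiply a vector by a scalar."""
    return [k * c for c in x]


# Illustrative matrices (assumptions for this sketch).
C = [[1, 1], [1, 2]]
C_inv = [[2, -1], [-1, 1]]           # inverse of C (det(C) = 1)
B = [[2, 1], [0, 3]]
A = matmul(matmul(C, B), C_inv)      # A = C B C^{-1}

# v is an eigenvector of A with eigenvalue 3.
v = [2, 3]
assert matvec(A, v) == scale(3, v)

# C^{-1} v is then an eigenvector of B with the same eigenvalue.
w = matvec(C_inv, v)
assert matvec(B, w) == scale(3, w)

# In the other direction, u is an eigenvector of B with eigenvalue 2,
# and C u is an eigenvector of A with the same eigenvalue.
u = [1, 0]
Cu = matvec(C, u)
assert matvec(B, u) == scale(2, u)
assert matvec(A, Cu) == scale(2, Cu)
```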

### Eigenspace of Similar Matrices

Suppose $n \times n$ matrices $A$ and $B$ are similar, $A = CBC^{-1}$, and $\lambda$ is an eigenvalue of $A$ and $B$. The $\lambda$-eigenspace of $A$ is the solution set of $(A - \lambda I_n) v = 0$, i.e., the nullspace of matrix $A - \lambda I_n$, $\text{Nul}(A - \lambda I_n)$.

For any eigenvector $v$ in the $\lambda$-eigenspace of $A$, $C^{-1} v$ is an eigenvector in the $\lambda$-eigenspace of $B$. We could say, $C^{-1}$ takes the $\lambda$-eigenspace of $A$ to the $\lambda$-eigenspace of $B$. Similarly, $C$ takes the $\lambda$-eigenspace of $B$ to the $\lambda$-eigenspace of $A$.

### Eigenvectors with Distinct Eigenvalues

Eigenvectors with distinct eigenvalues are linearly independent.

Let $v_{1}, v_{2}, \cdots, v_{n}$ be eigenvectors of a matrix $A$, and suppose that the corresponding eigenvalues $\lambda_{1}, \lambda_{2}, \cdots, \lambda_{n}$ are distinct. Then $\{ v_{1}, v_{2}, \cdots, v_{n} \}$ is linearly independent.

Proof

We will prove by contradiction.

Suppose $\{ v_{1}, v_{2}, \cdots, v_{n} \}$ were linearly dependent.

This means that, after reordering the vectors if necessary, there is some $j$ such that $\{ v_{1}, v_{2}, \cdots, v_{j-1} \}$ is linearly independent and $v_{j}$ is a linear combination of these vectors.

$$v_{j} = \sum_{i=1}^{j-1} c_{i} v_{i}$$

Multiplying both sides of the equation by $A$,

\begin{align} A v_{j} &= \lambda_{j} v_{j} \\ &= A \sum_{i=1}^{j-1} c_{i} v_{i} \\ &= \sum_{i=1}^{j-1} c_{i} A v_{i} \\ &= \sum_{i=1}^{j-1} c_{i} \lambda_{i} v_{i} \\ \end{align}

Multiplying both sides of the equation $v_{j} = \sum_{i=1}^{j-1} c_{i} v_{i}$ by $\lambda_{j}$,

$$\lambda_{j} v_{j} = \sum_{i=1}^{j-1} c_{i} \lambda_{j} v_{i}$$

Thus,

$$\sum_{i=1}^{j-1} c_{i} \lambda_{i} v_{i} = \sum_{i=1}^{j-1} c_{i} \lambda_{j} v_{i}$$

We further have

$$\sum_{i=1}^{j-1} c_{i} \left( \lambda_{i} - \lambda_{j} \right) v_{i} = 0$$

Because $v_{j}$ is an eigenvector, $v_{j} \neq 0$, so at least one $c_{i}$ is non-zero. Because the eigenvalues are distinct, $\lambda_{i} \neq \lambda_{j}$ for all $i < j$. Therefore, the linear equation

$$\sum_{i=1}^{j-1} d_{i} v_{i} = 0$$

has the non-zero solution $d_{i} = c_{i} \left( \lambda_{i} - \lambda_{j} \right)$.

But $\{ v_{1}, v_{2}, \cdots, v_{j-1} \}$ is linearly independent, so the above linear equation only has the zero solution, which raises a contradiction.

Therefore, $\{ v_{1}, v_{2}, \cdots, v_{n} \}$ is linearly independent.

This concludes the proof. $\square$

Notice that this theorem does not assume whether the eigenvectors and eigenvalues are real or complex.
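As a quick numerical illustration of the theorem, two eigenvectors of a $2 \times 2$ matrix with distinct eigenvalues should form a matrix with a non-zero determinant. The matrix `A` and its eigenpairs below are hypothetical values chosen for this sketch.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


# Illustrative matrix with distinct eigenvalues 2 and 3 (an assumption).
A = [[0, 2], [-3, 5]]
eigenpairs = [(2, [1, 1]), (3, [2, 3])]

# Confirm that each pair really satisfies A v = lambda v.
for lam, v in eigenpairs:
    assert matvec(A, v) == [lam * c for c in v]

# Two vectors in R^2 are linearly independent exactly when the matrix
# having them as columns has a non-zero determinant.
v1, v2 = eigenpairs[0][1], eigenpairs[1][1]
det = v1[0] * v2[1] - v2[0] * v1[1]
```

The determinant `det` is 1 here, so the two eigenvectors are linearly independent, as the theorem predicts.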

## Similar to Diagonal Matrix

Suppose $n \times n$ matrices $A$ and $D$ are similar, $A = CDC^{-1}$, $C = [v_1, v_2, \cdots, v_n]$, and $D$ is a diagonal matrix.

It’s easy to see and verify that the standard basis $\{e_1, e_2, \cdots, e_n\}$ is a set of $n$ linearly-independent eigenvectors for any diagonal matrix $D$.

Suppose the eigenvalues corresponding to these eigenvectors are $\{\lambda_1, \lambda_2, \cdots, \lambda_n\}$, i.e., the diagonal entries of $D$. If the eigenvalues are distinct, the $\lambda_1$-eigenspace of $D$ is $k e_1$, the $\lambda_2$-eigenspace of $D$ is $k e_2$, etc.

We have derived that, when $A = CBC^{-1}$, $C$ takes the $\lambda$-eigenspace of $B$ to the $\lambda$-eigenspace of $A$. In this case, $C$ takes the $\lambda$-eigenspace of $D$ to the $\lambda$-eigenspace of $A$. So a set of $n$ linearly independent eigenvectors of $A$ is $\{Ce_1, Ce_2, \cdots, Ce_n\} = \{v_1, v_2, \cdots, v_n\}$, and, when the eigenvalues are distinct, the $\lambda_1$-eigenspace of $A$ is $k v_1$, the $\lambda_2$-eigenspace of $A$ is $k v_2$, etc.

Because of the following relationship,

\begin{align} Av_i &= \lambda_i v_i \\ &= CDC^{-1} v_i \\ &= CD\left( C^{-1} v_i \right) \\ &= CD e_i \\ &= C \left( D e_i \right) \\ &= C \left( \lambda_i e_i \right) \\ \end{align}

we could see that the diagonal matrix $D$ essentially scales the $e_i$ in the $\mathcal{B}$ basis coordinate system by the corresponding eigenvalue $\lambda_i$. In the standard basis coordinate system, correspondingly, the matrix $A$ scales $v_i$ by the corresponding eigenvalue $\lambda_i$. Both of the operations, even though in different coordinate systems, are the same scaling operation using eigenvalue $\lambda_i$.
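The scaling interpretation can be checked on a small diagonalizable matrix. In this sketch the matrices `C` and `D` are hypothetical examples; the code verifies that $D$ scales each $e_i$ by $\lambda_i$ and that $A = CDC^{-1}$ scales each column $v_i$ of $C$ by the same $\lambda_i$.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


# Illustrative matrices (assumptions for this sketch).
C = [[1, 1], [1, 2]]                 # columns v_1 = (1, 1), v_2 = (1, 2)
C_inv = [[2, -1], [-1, 1]]           # inverse of C (det(C) = 1)
D = [[2, 0], [0, 3]]                 # diagonal matrix of eigenvalues
A = matmul(matmul(C, D), C_inv)      # A = C D C^{-1}

standard_basis = [[1, 0], [0, 1]]
scaled = []
for i, e in enumerate(standard_basis):
    lam = D[i][i]
    v = matvec(C, e)                 # v_i = C e_i, the i-th column of C
    # D scales e_i by lambda_i in the B basis coordinate system.
    assert matvec(D, e) == [lam * c for c in e]
    # A scales v_i by the same lambda_i in the standard basis.
    assert matvec(A, v) == [lam * c for c in v]
    scaled.append(matvec(A, v))
```

With these values, `A` is `[[1, 1], [-2, 4]]`, and `scaled` holds $2 v_1$ and $3 v_2$.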

Lei Mao

08-20-2023

10-10-2023