Matrix Similarity
Introduction
Similar matrices represent the same linear map under two (possibly) different bases. Matrix similarity is fundamental to matrix diagonalization.
In this blog post, I would like to quickly discuss how to understand similar matrices.
Definition of Similar Matrices
Two $n \times n$ matrices $A$ and $B$ are similar if there exists an invertible matrix $C$ such that $A = CBC^{-1}$.
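As a small worked example of the definition (the matrices below are chosen only for illustration), take

$$
C =
\begin{bmatrix}
2 & 1 \\
1 & 1 \\
\end{bmatrix},
\quad
C^{-1} =
\begin{bmatrix}
1 & -1 \\
-1 & 2 \\
\end{bmatrix},
\quad
B =
\begin{bmatrix}
1 & 2 \\
3 & 4 \\
\end{bmatrix}
$$

Then

$$
\begin{align}
A
&= CBC^{-1} \\
&=
\begin{bmatrix}
5 & 8 \\
4 & 6 \\
\end{bmatrix}
\begin{bmatrix}
1 & -1 \\
-1 & 2 \\
\end{bmatrix} \\
&=
\begin{bmatrix}
-3 & 11 \\
-2 & 8 \\
\end{bmatrix}
\end{align}
$$

so this $A$ and $B$ are similar by construction.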
Vector Coordinate Transformation Using Similar Matrices
The matrix $C$ is invertible if and only if the columns of $C$, $\{ v_1, v_2, \cdots, v_n \}$, are linearly independent, i.e., they constitute a basis for $\mathbb{R}^{n}$. Let $\mathcal{B} = \{ v_1, v_2, \cdots, v_n \}$ be such a basis for $\mathbb{R}^{n}$, which is in general different from the standard basis $\{ e_1, e_2, \cdots, e_n \}$. Let $x$ be a vector in $\mathbb{R}^{n}$ whose coordinates are given in the standard basis for simplicity.
By the definition of similar matrices, applying $A$ to $x$ can be decomposed as $Ax = CBC^{-1}x = C\left(B\left( C^{-1} x\right)\right)$.
Recall, because $\mathcal{B} = \{ v_1, v_2, \cdots, v_n \}$ is a basis for $\mathbb{R}^{n}$, $x$ can be written as a linear combination of the $\mathcal{B}$ basis,
$$
\begin{align}
x &= c_1 v_1 + c_2 v_2 + \cdots + c_n v_n \\
&= C [x]_{\mathcal{B}} \\
\end{align}
$$
where
$$
C = [v_1, v_2, \cdots, v_n]
$$
and
$$
[x]_{\mathcal{B}} =
\begin{bmatrix}
c_1 \\
c_2 \\
\vdots \\
c_n \\
\end{bmatrix}
$$
are the new coordinates for $x$ using the $\mathcal{B}$ basis.
Because $C$ is invertible, we have
$$
[x]_{\mathcal{B}} = C^{-1} x
$$
Thus, to transform the vector coordinates from the standard basis coordinate system to the new $\mathcal{B}$ basis coordinate system, we do
$$
[x]_{\mathcal{B}} = C^{-1} x
$$
Conversely, to transform the vector coordinates from the new $\mathcal{B}$ basis coordinate system back to the standard basis coordinate system, we do
$$
x = C [x]_{\mathcal{B}}
$$
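As a concrete illustration of the two conversions (the numbers are an assumed example, not taken from any application), let the basis vectors be $v_1 = (2, 1)^{\top}$ and $v_2 = (1, 1)^{\top}$, and let $x = (3, -1)^{\top}$ in the standard basis. Then

$$
\begin{align}
[x]_{\mathcal{B}}
&= C^{-1} x \\
&=
\begin{bmatrix}
1 & -1 \\
-1 & 2 \\
\end{bmatrix}
\begin{bmatrix}
3 \\
-1 \\
\end{bmatrix} \\
&=
\begin{bmatrix}
4 \\
-5 \\
\end{bmatrix}
\end{align}
$$

and converting back recovers the original vector,

$$
\begin{align}
C [x]_{\mathcal{B}}
&= 4 v_1 - 5 v_2 \\
&=
\begin{bmatrix}
3 \\
-1 \\
\end{bmatrix}
\end{align}
$$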
The identity $Ax = C\left(B\left( C^{-1} x\right)\right)$ says that applying $A$ in the standard basis coordinate system is equivalent to three steps: transforming the vector coordinates from the standard basis coordinate system to the new $\mathcal{B}$ basis coordinate system, applying the transformation $B$ in the $\mathcal{B}$ basis coordinate system, and transforming the resulting coordinates from the $\mathcal{B}$ basis coordinate system back to the standard basis coordinate system.
The transformations $A$ and $B$ are similar in the sense that applying $B$ to the coordinates $[x]_{\mathcal{B}}$ in the $\mathcal{B}$ basis coordinate system and then transforming back to the standard basis coordinate system gives exactly the same coordinates as applying $A$ to the same vector $x$ in the standard basis coordinate system. That is to say,
$$
Ax = C\left(B [x]_{\mathcal{B}} \right)
$$
Alternatively, we could say that the transformations $A$ and $B$ are similar in the sense that applying $B$ to the coordinates $[x]_{\mathcal{B}}$ in the $\mathcal{B}$ basis coordinate system gives exactly the same coordinates as applying $A$ to the same vector $x$ in the standard basis coordinate system and then transforming the result to the $\mathcal{B}$ basis coordinate system. That is to say,
$$
C^{-1} \left(Ax\right) = B [x]_{\mathcal{B}}
$$
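The three-step reading of $Ax = C\left(B\left( C^{-1} x\right)\right)$ can be checked numerically. The following is a minimal sketch using NumPy; the matrices `C` and `B_mat` and the vector `x` are hypothetical example values, and `B_mat` is named that way only to keep it distinct from the basis $\mathcal{B}$.

```python
import numpy as np

# Hypothetical example values chosen for illustration.
# The columns of C are the basis vectors v_1 and v_2.
C = np.array([[2.0, 1.0],
              [1.0, 1.0]])
# B_mat is the transformation expressed in the new-basis coordinates.
B_mat = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
C_inv = np.linalg.inv(C)

# A is the same transformation expressed in standard-basis coordinates.
A = C @ B_mat @ C_inv

x = np.array([3.0, -1.0])  # coordinates in the standard basis

x_B = C_inv @ x    # step 1: standard basis -> new basis
y_B = B_mat @ x_B  # step 2: apply B in the new basis
y = C @ y_B        # step 3: new basis -> standard basis

# Both routes give the same result.
assert np.allclose(A @ x, y)
assert np.allclose(C_inv @ (A @ x), y_B)
print("A x =", A @ x)
print("C (B [x]_B) =", y)
```

The two assertions correspond to the two equivalent statements above: one compares the results in standard-basis coordinates, the other compares them in the new-basis coordinates.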
In fact, similar matrices $A$ and $B$ represent the same linear map under two (possibly) different bases. To see this, given the $n$ linearly independent column vectors from the matrix $C$ represented using standard basis coordinates, $\{ v_1, v_2, \cdots, v_n \}$, their $\mathcal{B}$ basis coordinates are $\{ C^{-1}v_1, C^{-1}v_2, \cdots, C^{-1}v_n \} = \{ e_1, e_2, \cdots, e_n \}$.
Thus applying $A$ in the standard basis is just the “same” as applying $B$ in the $\mathcal{B}$ basis and then converting the result from the $\mathcal{B}$ basis back to the standard basis. For example, for each basis vector $v_i$, whose $\mathcal{B}$ basis coordinates are $e_i$, the image $Av_i$ is obtained by applying $B$ to $e_i$ and converting back,
$$
\begin{align}
A v_i
&= CBC^{-1}v_i \\
&= C\left(B\left( C^{-1} v_i\right)\right) \\
&= C\left(B e_i\right) \\
\end{align}
$$
Finally, in practice, $x$ does not have to be represented using the standard basis. That is to say, neither of the two coordinate systems has to be the standard basis coordinate system.
Eigenvalues of Similar Matrices
Similar matrices have some interesting properties related to eigenvalues and eigenvectors.
Similar Matrices Have the Same Eigenvalues
Similar matrices have the same eigenvalues.
Proof
Suppose $n \times n$ matrices $A$ and $B$ are similar with $A = CBC^{-1}$. The characteristic polynomial for computing the eigenvalues of $A$ becomes
$$
\begin{align}
\lvert A - \lambda I_n \rvert
&= \lvert CBC^{-1} - \lambda CC^{-1} \rvert \\
&= \lvert CBC^{-1} - \lambda C I_nC^{-1} \rvert \\
&= \lvert CBC^{-1} - C \left( \lambda I_n \right) C^{-1} \rvert \\
&= \lvert C \left( B - \lambda I_n \right) C^{-1} \rvert \\
&= \lvert C \rvert \lvert \left( B - \lambda I_n \right) \rvert \lvert C^{-1} \rvert \\
&= \lvert C \rvert \lvert B - \lambda I_n \rvert \frac{1}{\lvert C \rvert} \\
&= \lvert B - \lambda I_n \rvert
\end{align}
$$
Because the characteristic polynomials for computing the eigenvalues of $A$ and $B$ are exactly the same, similar matrices $A$ and $B$ have the same eigenvalues.
This concludes the proof. $\square$
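A quick numerical sanity check of this result, as a sketch with assumed example matrices (`C` and `B_mat` are hypothetical values). It uses `np.poly`, which for a square matrix returns the coefficients of $\lvert \lambda I_n - M \rvert$; this differs from $\lvert M - \lambda I_n \rvert$ only by the factor $(-1)^n$, so the roots are the same.

```python
import numpy as np

# Hypothetical example values chosen for illustration.
C = np.array([[2.0, 1.0],
              [1.0, 1.0]])
B_mat = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
A = C @ B_mat @ np.linalg.inv(C)

# Characteristic polynomial coefficients, highest degree first.
char_A = np.poly(A)
char_B = np.poly(B_mat)

# Eigenvalues, sorted so that the two arrays can be compared entrywise.
eig_A = np.sort(np.linalg.eigvals(A))
eig_B = np.sort(np.linalg.eigvals(B_mat))

assert np.allclose(char_A, char_B)
assert np.allclose(eig_A, eig_B)
print("characteristic polynomial of A:", char_A)
print("characteristic polynomial of B:", char_B)
```

Floating-point results are compared with `np.allclose` rather than exact equality, because `A` is computed through a numerical inverse.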
Eigenvectors of Similar Matrices
Suppose $n \times n$ matrices $A$ and $B$ are similar, $A = CBC^{-1}$. Because similar matrices have the same eigenvalues, we further suppose $\lambda$ is an eigenvalue of both $A$ and $B$, and $Av = \lambda v$ where $v$ is an eigenvector for $A$ corresponding to the eigenvalue $\lambda$.
We have the following relationship,
$$
\begin{align}
BC^{-1} v
&= \left( C^{-1}C \right) BC^{-1} v \\
&= C^{-1} \left( C BC^{-1} \right) v \\
&= C^{-1} A v \\
&= C^{-1} \lambda v \\
&= \lambda C^{-1} v \\
\end{align}
$$
Because $C^{-1}$ is invertible and $v \neq 0$, we also have $C^{-1} v \neq 0$. Therefore, $C^{-1} v$ is an eigenvector for $B$ corresponding to the eigenvalue $\lambda$.
Similarly, suppose $Bv = \lambda v$ where $v$ is an eigenvector for $B$ corresponding to the eigenvalue $\lambda$. Then $A \left( Cv \right) = CBC^{-1}Cv = CBv = \lambda Cv$, so $Cv$ is an eigenvector for $A$ corresponding to the eigenvalue $\lambda$.
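The eigenvector correspondence can be checked in both directions with a short NumPy sketch. The matrices are assumed example values, and the eigenvectors come from `np.linalg.eig`, which returns them as the columns of its second output.

```python
import numpy as np

# Hypothetical example values chosen for illustration.
C = np.array([[2.0, 1.0],
              [1.0, 1.0]])
C_inv = np.linalg.inv(C)
B_mat = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
A = C @ B_mat @ C_inv

# Direction 1: if A v = lambda v, then B (C^{-1} v) = lambda (C^{-1} v).
eigvals_A, eigvecs_A = np.linalg.eig(A)
a_to_b_ok = []
for lam, v in zip(eigvals_A, eigvecs_A.T):
    w = C_inv @ v
    a_to_b_ok.append(bool(np.allclose(B_mat @ w, lam * w)))

# Direction 2: if B u = lambda u, then A (C u) = lambda (C u).
eigvals_B, eigvecs_B = np.linalg.eig(B_mat)
b_to_a_ok = []
for lam, u in zip(eigvals_B, eigvecs_B.T):
    w = C @ u
    b_to_a_ok.append(bool(np.allclose(A @ w, lam * w)))

print("A -> B checks:", a_to_b_ok)
print("B -> A checks:", b_to_a_ok)
```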
Eigenspace of Similar Matrices
Suppose $n \times n$ matrices $A$ and $B$ are similar, $A = CBC^{-1}$, and $\lambda$ is an eigenvalue of $A$ and $B$. The $\lambda$-eigenspace of $A$ is the solution set of $(A - \lambda I_n) v = 0$, i.e., the nullspace of the matrix $A - \lambda I_n$, $\text{Nul}(A - \lambda I_n)$.
For any eigenvector $v$ in the $\lambda$-eigenspace of $A$, $C^{-1} v$ is an eigenvector in the $\lambda$-eigenspace of $B$. In other words, $C^{-1}$ takes the $\lambda$-eigenspace of $A$ to the $\lambda$-eigenspace of $B$. Similarly, $C$ takes the $\lambda$-eigenspace of $B$ to the $\lambda$-eigenspace of $A$.
Eigenvectors with Distinct Eigenvalues
Eigenvectors with distinct eigenvalues are linearly independent.
Let $v_{1}, v_{2}, \cdots, v_{n}$ be eigenvectors of a matrix $A$, and suppose that the corresponding eigenvalues $\lambda_{1}, \lambda_{2}, \cdots, \lambda_{n}$ are distinct. Then $\{ v_{1}, v_{2}, \cdots, v_{n} \}$ is linearly independent.
Proof
We will prove by contradiction.
Suppose $\{ v_{1}, v_{2}, \cdots, v_{n} \}$ were linearly dependent.
This means that, after rearranging the order of $\{ v_{1}, v_{2}, \cdots, v_{n} \}$ if necessary, there is some $j$ such that $\{ v_{1}, v_{2}, \cdots, v_{j-1} \}$ is linearly independent and $v_{j}$ is a linear combination of $v_{1}, v_{2}, \cdots, v_{j-1}$,
$$
v_{j} = \sum_{i=1}^{j-1} c_{i} v_{i}
$$
Multiplying both sides of the equation by $A$,
$$
\begin{align}
A v_{j}
&= \lambda_{j} v_{j} \\
&= A \sum_{i=1}^{j-1} c_{i} v_{i} \\
&= \sum_{i=1}^{j-1} c_{i} A v_{i} \\
&= \sum_{i=1}^{j-1} c_{i} \lambda_{i} v_{i} \\
\end{align}
$$
Multiplying both sides of the equation $v_{j} = \sum_{i=1}^{j-1} c_{i} v_{i}$ by $\lambda_{j}$ instead,
$$
\lambda_{j} v_{j} = \sum_{i=1}^{j-1} c_{i} \lambda_{j} v_{i}
$$
Thus,
$$
\sum_{i=1}^{j-1} c_{i} \lambda_{i} v_{i} = \sum_{i=1}^{j-1} c_{i} \lambda_{j} v_{i}
$$
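We further have
$$
\sum_{i=1}^{j-1} c_{i} \left( \lambda_{i} - \lambda_{j} \right) v_{i} = 0
$$
Because $v_{j}$ is an eigenvector, $v_{j} \neq 0$, so at least one of the coefficients $c_{i}$ is non-zero. Because the eigenvalues are distinct, $\lambda_{i} \neq \lambda_{j}$ for all $i < j$. Therefore the coefficients $d_{i} = c_{i} \left( \lambda_{i} - \lambda_{j} \right)$ are not all zero, and the linear equation
$$
\sum_{i=1}^{j-1} d_{i} v_{i} = 0
$$
has a non-zero solution.
But $\{ v_{1}, v_{2}, \cdots, v_{j-1} \}$ is linearly independent, so the above linear equation only has the zero solution, which is a contradiction.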
Therefore, $\{ v_{1}, v_{2}, \cdots, v_{n} \}$ is linearly independent.
This concludes the proof. $\square$
Notice that this theorem holds regardless of whether the eigenvectors and eigenvalues are real or complex.
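As an illustration of the theorem, the sketch below builds an assumed $3 \times 3$ upper triangular matrix, whose eigenvalues are its diagonal entries $2$, $3$ and $5$ and are therefore distinct, and checks that the eigenvectors returned by `np.linalg.eig` are linearly independent. The matrix is a hypothetical example.

```python
import numpy as np

# Hypothetical example: an upper triangular matrix, so its eigenvalues
# are the diagonal entries 2, 3 and 5, which are distinct.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 5.0]])

# The columns of V are eigenvectors of A.
eigvals, V = np.linalg.eig(A)

# The eigenvectors are linearly independent exactly when V has full rank,
# or equivalently when its determinant is non-zero.
rank = np.linalg.matrix_rank(V)
det_V = np.linalg.det(V)

print("eigenvalues:", eigvals)
print("rank of eigenvector matrix:", rank)
```

A numerical rank is only evidence rather than a proof, since it depends on a floating-point tolerance, but for a well-conditioned example like this one it matches the theorem.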
Similar to Diagonal Matrix
Suppose $n \times n$ matrices $A$ and $D$ are similar, $A = CDC^{-1}$, $C = [v_1, v_2, \cdots, v_n]$, and $D$ is a diagonal matrix.
It’s easy to see and verify that the standard basis $\{e_1, e_2, \cdots, e_n\}$ is a set of $n$ linearly-independent eigenvectors for any diagonal matrix $D$.
Suppose the eigenvalue corresponding to the eigenvector $e_i$ is $\lambda_i$, i.e., the $i$-th diagonal entry of $D$. When the eigenvalues $\{\lambda_1, \lambda_2, \cdots, \lambda_n\}$ are distinct, the $\lambda_1$-eigenspace of $D$ is the line $\{ k e_1 \}$, the $\lambda_2$-eigenspace of $D$ is the line $\{ k e_2 \}$, etc.
We have derived that $C$ takes the $\lambda$-eigenspace of $B$ to the $\lambda$-eigenspace of $A$. In this case, $C$ takes the $\lambda$-eigenspace of $D$ to the $\lambda$-eigenspace of $A$. Thus, a set of $n$ linearly independent eigenvectors of $A$ is $\{Ce_1, Ce_2, \cdots, Ce_n\} = \{v_1, v_2, \cdots, v_n\}$, and the $\lambda_1$-eigenspace of $A$ is $\{ k v_1 \}$, the $\lambda_2$-eigenspace of $A$ is $\{ k v_2 \}$, etc.
Because of the following relationship,
$$
\begin{align}
Av_i
&= \lambda_i v_i \\
&= CDC^{-1} v_i \\
&= CD\left( C^{-1} v_i \right) \\
&= CD e_i \\
&= C \left( D e_i \right) \\
&= C \left( \lambda_i e_i \right) \\
\end{align}
$$
we can see that the diagonal matrix $D$ scales $e_i$ in the $\mathcal{B}$ basis coordinate system by the corresponding eigenvalue $\lambda_i$. Correspondingly, in the standard basis coordinate system, the matrix $A$ scales $v_i$ by the same eigenvalue $\lambda_i$. The two operations, even though they are expressed in different coordinate systems, are the same scaling operation using the eigenvalue $\lambda_i$.
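The relationship $A = CDC^{-1}$ can be reproduced numerically. The following is a minimal sketch, assuming a hypothetical symmetric example matrix `A` with eigenvalues $2$ and $3$; `np.linalg.eig` supplies the eigenvalues and a matrix `C` whose columns are the corresponding eigenvectors.

```python
import numpy as np

# Hypothetical example matrix with eigenvalues 2 and 3.
A = np.array([[2.5, -0.5],
              [-0.5, 2.5]])

# The columns of C are eigenvectors v_i; eigvals[i] is the matching lambda_i.
eigvals, C = np.linalg.eig(A)
D = np.diag(eigvals)
C_inv = np.linalg.inv(C)

# A is similar to the diagonal matrix D.
reconstructed = C @ D @ C_inv
assert np.allclose(reconstructed, A)

identity = np.eye(2)
for i in range(2):
    v_i, e_i, lam_i = C[:, i], identity[:, i], eigvals[i]
    # The coordinates of v_i in the eigenvector basis are e_i.
    assert np.allclose(C_inv @ v_i, e_i)
    # D scales e_i by lambda_i, and A scales v_i by the same lambda_i.
    assert np.allclose(D @ e_i, lam_i * e_i)
    assert np.allclose(A @ v_i, lam_i * v_i)

print("eigenvalues:", eigvals)
```

The order in which `np.linalg.eig` returns the eigenvalues is not guaranteed, so the sketch pairs each column of `C` with the eigenvalue at the same index instead of assuming a particular order.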