Matrix Rotation-Scaling Theorem

Introduction

The rotation-scaling theorem is a special case of the matrix block diagonalization theorem where the matrix $A$ is a $2 \times 2$ real matrix with a non-real complex eigenvalue $\lambda$. Even though such a matrix can still be diagonalized to a complex-valued matrix using the matrix diagonalization theorem, the rotation-scaling theorem allows us to find a real-valued rotation-scaling matrix $B$ that is similar to the matrix $A$, which provides a more intuitive geometric interpretation of the matrix $A$.

In this blog post, I would like to quickly discuss the rotation-scaling theorem.

Definition of Rotation-Scaling Matrix

A rotation-scaling matrix is a $2 \times 2$ matrix of the form

$$
\begin{align}
\begin{bmatrix}
a & -b \\
b & a \\
\end{bmatrix}
\\
\end{align}
$$

where $a$ and $b$ are real numbers, not both equal to zero.

A rotation matrix is a $2 \times 2$ matrix of the form

$$
\begin{align}
\begin{bmatrix}
\cos \theta & - \sin \theta \\
\sin \theta & \cos \theta \\
\end{bmatrix}
\end{align}
$$

and a 2D scaling matrix is a $2 \times 2$ matrix of the form

$$
\begin{align}
\begin{bmatrix}
r & 0 \\
0 & r \\
\end{bmatrix}
\end{align}
$$

Thus, a rotation-scaling matrix $A$ can be written as the product of a rotation matrix and a scaling matrix:

$$
\begin{align}
A
&=
\begin{bmatrix}
a & -b \\
b & a \\
\end{bmatrix}
\\
&=
\begin{bmatrix}
\cos \theta & - \sin \theta \\
\sin \theta & \cos \theta \\
\end{bmatrix}
\begin{bmatrix}
r & 0 \\
0 & r \\
\end{bmatrix}
\\
\end{align}
$$

where $a = r\cos\theta$, $b = r\sin\theta$, and $r = \sqrt{\lvert A \rvert} = \sqrt{a^2 + b^2}$.
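To make the decomposition concrete, here is a minimal NumPy sketch (the values of $a$ and $b$ are hypothetical examples of mine, not from any particular application) that recovers $r$ and $\theta$ from a rotation-scaling matrix and verifies the factorization.

```python
import numpy as np

# Hypothetical example values of a and b (not both zero), chosen for illustration.
a, b = 1.5, -0.8
A = np.array([[a, -b],
              [b, a]])

# Scaling factor and rotation angle recovered from a and b.
r = np.sqrt(np.linalg.det(A))  # r = sqrt(|A|) = sqrt(a^2 + b^2)
theta = np.arctan2(b, a)       # a = r cos(theta), b = r sin(theta)

rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
scaling = r * np.eye(2)

# The rotation-scaling matrix factors into a rotation times a uniform scaling.
assert np.allclose(A, rotation @ scaling)
```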

To compute the eigenvalues of the matrix $A$, we set the characteristic polynomial of the matrix $A$ to zero,

$$
\begin{align}
f(\lambda)
&= \lvert A - \lambda I \rvert \\
&= (a - \lambda)^2 + b^2 \\
&= 0 \\
\end{align}
$$

The eigenvalues of the matrix $A$ are a pair of complex conjugates determined by $a$ and $b$,

$$
\lambda = a \pm bi
$$

To compute the eigenvector for each of the eigenvalues, we will have to find the nonzero vectors in $\text{Nul}(A - \lambda I)$.

In this case, we have two scenarios.

  1. $\lambda \in \mathbb{R}$, i.e., $b = 0$ and $\theta = k\pi$. This is actually the case where only scaling happens.
  2. $\lambda \notin \mathbb{R}$, i.e., $b \neq 0$ and $\theta \neq k\pi$. This is actually the case where both rotation and scaling happen.

If $\lambda \in \mathbb{R}$, i.e., $b = 0$, the eigenvalue is $\lambda = a$. It has an algebraic multiplicity of $2$ and a geometric multiplicity of $2$. The two linearly independent eigenvectors for the eigenvalue $\lambda = a$ can be $e_1 = [1, 0]^{\top}$ and $e_2 = [0, 1]^{\top}$.

If $\lambda \notin \mathbb{R}$, i.e., $b \neq 0$, the two eigenvalues are $\lambda = a \pm bi$ where $b \neq 0$. Each eigenvalue has an algebraic multiplicity of $1$ and a geometric multiplicity of $1$. The eigenvectors for the eigenvalues $\lambda = a + bi$ and $\lambda = a - bi$ can be $[i, 1]^{\top}$ and $[-i, 1]^{\top}$, respectively.
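As a quick numerical check (reusing the same hypothetical $a$ and $b$ values as above), NumPy recovers the conjugate pair of eigenvalues $a \pm bi$ when $b \neq 0$; the returned eigenvectors are scalar multiples of $[i, 1]^{\top}$ and $[-i, 1]^{\top}$, since eigenvectors are only determined up to a nonzero scale.

```python
import numpy as np

a, b = 1.5, -0.8  # hypothetical example values with b != 0
A = np.array([[a, -b],
              [b, a]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)  # approximately [a + bi, a - bi] (order may vary)

# Each column of `eigenvectors` is an eigenvector for the corresponding eigenvalue.
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)
```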

Rotation-Scaling Theorem


Let $A$ be a $2 \times 2$ real matrix with a non-real complex eigenvalue $\lambda$, and let $v$ be a corresponding eigenvector. Then $A = CBC^{-1}$, where

$$
\begin{align}
C &=
\begin{bmatrix}
\mid & \mid \\
\text{Re}(v) & \text{Im}(v) \\
\mid & \mid \\
\end{bmatrix}
\end{align}
$$

and

$$
\begin{align}
B &=
\begin{bmatrix}
\text{Re}(\lambda) & \text{Im}(\lambda) \\
-\text{Im}(\lambda) & \text{Re}(\lambda) \\
\end{bmatrix}
\end{align}
$$

Notice that the matrices $A$, $B$, and $C$ are all real matrices. In particular, $B$ is a rotation-scaling matrix whose scaling factor $r = \sqrt{\lvert B \rvert} = \lvert \lambda \rvert$, and $A$ is similar to the rotation-scaling matrix $B$.
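Here is a minimal sketch of the theorem in NumPy (the matrix $A$ below is a hypothetical example of mine, chosen only because its eigenvalues are non-real): build $C$ from $\text{Re}(v)$ and $\text{Im}(v)$, build $B$ from $\text{Re}(\lambda)$ and $\text{Im}(\lambda)$, and verify that $A = CBC^{-1}$ and that the scaling factor equals $\lvert \lambda \rvert$.

```python
import numpy as np

# Hypothetical 2 x 2 real matrix whose eigenvalues are 2 + i and 2 - i.
A = np.array([[1.0, -2.0],
              [1.0,  3.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
lam = eigenvalues[0]
v = eigenvectors[:, 0]
assert abs(lam.imag) > 1e-12  # the theorem requires a non-real eigenvalue

# C has Re(v) and Im(v) as columns; B is built from Re(lambda) and Im(lambda).
C = np.column_stack([v.real, v.imag])
B = np.array([[ lam.real, lam.imag],
              [-lam.imag, lam.real]])

assert np.allclose(A, C @ B @ np.linalg.inv(C))         # A = C B C^{-1}
assert np.isclose(np.sqrt(np.linalg.det(B)), abs(lam))  # r = sqrt(|B|) = |lambda|
```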

Proof

First, we prove that $C$ is invertible by showing that its column vectors, $\text{Re}(v)$ and $\text{Im}(v)$, are linearly independent. Suppose, for contradiction, that $\text{Re}(v)$ and $\text{Im}(v)$ are linearly dependent. Then there exist real numbers $x$ and $y$, not both zero, such that

$$
x \text{Re}(v) + y \text{Im}(v) = 0
$$

Then,

$$
\begin{align}
(y + ix) v
&= (y + ix) (\text{Re}(v) + i \text{Im}(v)) \\
&= y \text{Re}(v) + i \left(x \text{Re}(v) + y \text{Im}(v) \right) - x \text{Im}(v) \\
&= y \text{Re}(v) - x \text{Im}(v) \\
\end{align}
$$

Notice that because $y + ix \neq 0$ and $v \neq 0$, the vector $(y + ix) v$ is a nonzero real vector, and it is still an eigenvector for the non-real eigenvalue $\lambda$ since it is a nonzero scalar multiple of $v$. But $A$ is a real matrix, so applying $A$ to a nonzero real eigenvector produces a real vector, which forces $\lambda$ to be real. This raises a contradiction. Thus, the column vectors of $C$ must be linearly independent and $C$ must be invertible.

Next, we have

$$
\begin{align}
CB
&=
\begin{bmatrix}
\mid & \mid \\
\text{Re}(\lambda) \text{Re}(v) - \text{Im}(\lambda) \text{Im}(v) & \text{Im}(\lambda) \text{Re}(v) + \text{Re}(\lambda) \text{Im}(v)\\
\mid & \mid \\
\end{bmatrix}
\end{align}
$$

and

$$
\begin{align}
\lambda v
&= (\text{Re}(\lambda) + i \text{Im}(\lambda) ) (\text{Re}(v) + i \text{Im}(v) )
\\
&=
\text{Re}(\lambda) \text{Re}(v) + i ( \text{Re}(\lambda) \text{Im}(v) + \text{Im}(\lambda) \text{Re}(v)) - \text{Im}(\lambda) \text{Im}(v)
\\
\end{align}
$$

Comparing the two expressions, we find that

$$
\begin{align}
\lambda v
&=
CB
\begin{bmatrix}
1 \\
i \\
\end{bmatrix}
\end{align}
$$

In addition, because

$$
\begin{align}
v
&=
\text{Re}(v) + i \text{Im}(v) \\
&=
C
\begin{bmatrix}
1 \\
i \\
\end{bmatrix}
\end{align}
$$

and $C$ is invertible,

$$
\begin{align}
\begin{bmatrix}
1 \\
i \\
\end{bmatrix}
&=
C^{-1}v
\end{align}
$$

Thus,

$$
\begin{align}
Av
&=
\lambda v \\
&=
CB
\begin{bmatrix}
1 \\
i \\
\end{bmatrix}
\\
&=
CBC^{-1}v
\end{align}
$$

Because $A$, $B$, and $C$ are real matrices, taking the real and imaginary parts of both sides of $Av = CBC^{-1}v$ gives

$$
\begin{align}
A\text{Re}(v) &= CBC^{-1}\text{Re}(v) \\
A\text{Im}(v) &= CBC^{-1}\text{Im}(v) \\
\end{align}
$$

Because we have shown that $\text{Re}(v)$ and $\text{Im}(v)$ are linearly independent, any vector $w \in \mathbb{R}^2$ can be written as a linear combination of $\text{Re}(v)$ and $\text{Im}(v)$, i.e.,

$$
w = c_1 \text{Re}(v) + c_2 \text{Im}(v)
$$

Thus,

$$
\begin{align}
Aw
&=
A(c_1 \text{Re}(v) + c_2 \text{Im}(v)) \\
&=
c_1 A\text{Re}(v) + c_2 A\text{Im}(v) \\
&=
c_1 CBC^{-1}\text{Re}(v) + c_2 CBC^{-1}\text{Im}(v) \\
&=
CBC^{-1} (c_1 \text{Re}(v) + c_2 \text{Im}(v)) \\
&=
CBC^{-1} w \\
\end{align}
$$

Because $w$ can be any vector in $\mathbb{R}^2$, we must have

$$
A = CBC^{-1}
$$

This concludes the proof. $\square$

Relationship with Diagonalization Theorem

The rotation-scaling theorem cannot be applied if the eigenvalues of $A$ are real. Instead, if the sum of the geometric multiplicities of the eigenvalues of $A$ equals $2$, the diagonalization theorem can be applied.

Concretely, let $A$ be a $2 \times 2$ real matrix with distinct real eigenvalues $\lambda_1 \neq \lambda_2$, and let $v_1$ and $v_2$ be the corresponding eigenvectors. Then $A = CDC^{-1}$, where

$$
\begin{align}
C &=
\begin{bmatrix}
\mid & \mid \\
v_1 & v_2 \\
\mid & \mid \\
\end{bmatrix}
\end{align}
$$

and

$$
\begin{align}
D &=
\begin{bmatrix}
\lambda_1 & 0 \\
0 & \lambda_2 \\
\end{bmatrix}
\end{align}
$$
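A companion sketch for the real-eigenvalue case (again with a hypothetical example matrix of mine): when $A$ has two distinct real eigenvalues, the columns of $C$ are the eigenvectors and $D$ is the diagonal matrix of eigenvalues.

```python
import numpy as np

# Hypothetical 2 x 2 real matrix with real eigenvalues 5 and 2.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
C = eigenvectors            # columns are the eigenvectors v_1 and v_2
D = np.diag(eigenvalues)    # diagonal matrix of lambda_1 and lambda_2

assert np.allclose(A, C @ D @ np.linalg.inv(C))  # A = C D C^{-1}
```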

It’s also possible that $A$ has only one real eigenvalue $\lambda$ with a geometric multiplicity of $2$. This implies that $A = \lambda I$, in which case we can take

$$
\begin{align}
C &=
\begin{bmatrix}
\mid & \mid \\
e_1 & e_2 \\
\mid & \mid \\
\end{bmatrix}
\end{align}
$$

and

$$
\begin{align}
D &=
\begin{bmatrix}
\lambda & 0 \\
0 & \lambda \\
\end{bmatrix}
\end{align}
$$

Rotation-Scaling Consequences

Let $A$ be a $2 \times 2$ real matrix with a non-real complex eigenvalue $\lambda$, and let $v$ be an eigenvector. We have proved that $A$ is similar to the rotation-scaling matrix $B$ whose scaling factor $r = \sqrt{\lvert B \rvert} = \lvert \lambda \rvert$,

$$
A = CBC^{-1}
$$

If the transformation $A$ is applied to a vector $n$ times,

$$
A^n = CB^nC^{-1}
$$

Notice that $B^n$ is still a rotation-scaling matrix,

$$
\begin{align}
\lvert B^n \rvert
&=
\lvert B \rvert^n \\
&= \lvert \lambda \rvert^{2n} \\
\end{align}
$$

Therefore, $A^n$ is similar to the rotation-scaling matrix $B^n$, whose scaling factor is $r = \sqrt{\lvert B^n \rvert} = \lvert \lambda \rvert^n$.

If $\lvert \lambda \rvert < 1$, as the transformation $A$ is repeatedly applied to a vector, the vector will spiral in and its norm will approach $0$.

On the contrary, if $\lvert \lambda \rvert > 1$, as the transformation $A$ is repeatedly applied to a vector, the vector will spiral out and its norm will grow without bound.

Finally, if $\lvert \lambda \rvert = 1$, as the transformation $A$ is repeatedly applied to a vector, the vector will rotate around an ellipse and its norm will remain bounded.
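The three behaviors can be illustrated with a small numerical experiment. This is a sketch under my own assumed setup (a pure rotation-scaling matrix with rotation angle $\theta = 0.3$ and scaling factors $r \in \{0.9, 1.1, 1.0\}$): repeatedly applying the matrix shrinks, grows, or preserves the norm of the iterates depending on whether $\lvert \lambda \rvert$ is less than, greater than, or equal to $1$.

```python
import numpy as np

def iterate_norms(A, x0, n):
    """Return the norms of x0, A x0, A^2 x0, ..., A^n x0."""
    norms = []
    x = np.array(x0, dtype=float)
    for _ in range(n + 1):
        norms.append(np.linalg.norm(x))
        x = A @ x
    return norms

theta = 0.3  # assumed rotation angle
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

for r in (0.9, 1.1, 1.0):  # |lambda| < 1, |lambda| > 1, |lambda| = 1
    A = r * rotation       # rotation-scaling matrix with |lambda| = r
    norms = iterate_norms(A, [1.0, 0.0], n=50)
    print(f"r = {r}: |x_0| = {norms[0]:.3f}, |x_50| = {norms[-1]:.3f}")
```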
