# Matrix Rotation-Scaling Theorem

## Introduction

The rotation-scaling theorem is a special case of the matrix block diagonalization theorem in which the matrix $A$ is a $2 \times 2$ real matrix with a non-real complex eigenvalue $\lambda$. Although such a matrix can still be diagonalized to a complex-valued matrix using the matrix diagonalization theorem, the rotation-scaling theorem allows us to find a real-valued rotation-scaling matrix $B$ that is similar to $A$, which provides a more intuitive geometric interpretation of $A$.

In this blog post, I would like to quickly discuss the rotation-scaling theorem.

## Definition of Rotation-Scaling Matrix

A rotation-scaling matrix is a $2 \times 2$ matrix of the form

\begin{align} \begin{bmatrix} a & -b \\ b & a \\ \end{bmatrix} \\ \end{align}

where $a$ and $b$ are real numbers, not both equal to zero.

Recall that a rotation matrix is a $2 \times 2$ matrix of the form

\begin{align} \begin{bmatrix} \cos \theta & - \sin \theta \\ \sin \theta & \cos \theta \\ \end{bmatrix} \end{align}

and a 2D scaling matrix is a $2 \times 2$ matrix of the form

\begin{align} \begin{bmatrix} r & 0 \\ 0 & r \\ \end{bmatrix} \end{align}

Thus, a rotation-scaling matrix $A$ can be written as the product of a rotation matrix and a scaling matrix.

\begin{align} A &= \begin{bmatrix} a & -b \\ b & a \\ \end{bmatrix} \\ &= \begin{bmatrix} \cos \theta & - \sin \theta \\ \sin \theta & \cos \theta \\ \end{bmatrix} \begin{bmatrix} r & 0 \\ 0 & r \\ \end{bmatrix} \\ \end{align}

where $a = r\cos\theta$, $b = r\sin\theta$, $r = \sqrt{\lvert A \rvert} = \sqrt{a^2 + b^2}$.
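This decomposition can be checked numerically. The sketch below uses hypothetical values of $a$ and $b$, recovers $r$ and $\theta$, and verifies that the rotation-scaling matrix equals the product of the rotation matrix and the scaling matrix:

```python
import numpy as np

# Hypothetical values of a and b, not both zero.
a, b = 3.0, 4.0
A = np.array([[a, -b], [b, a]])

# Scaling factor r = sqrt(det(A)) = sqrt(a^2 + b^2).
r = np.sqrt(np.linalg.det(A))
assert np.isclose(r, np.hypot(a, b))

# Rotation angle theta such that a = r cos(theta), b = r sin(theta).
theta = np.arctan2(b, a)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
scaling = np.array([[r, 0.0], [0.0, r]])

# A equals the product of the rotation matrix and the scaling matrix.
assert np.allclose(A, rotation @ scaling)
```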

To compute the eigenvalues of the matrix $A$, we use the characteristic polynomial of $A$,

\begin{align} f(\lambda) &= \lvert A - \lambda I \rvert \\ &= (a - \lambda)^2 + b^2 \\ &= 0 \\ \end{align}

The eigenvalues of the matrix $A$ are a pair of complex conjugates determined by $a$ and $b$,

$$\lambda = a \pm bi$$

To compute the eigenvector for each of the eigenvalues, we will have to find the nonzero vectors in $\text{Nul}(A - \lambda I)$.

In this case, we have two scenarios.

1. $\lambda \in \mathbb{R}$, i.e., $b = 0$ and $\theta = k\pi$. This is the case where only scaling happens.
2. $\lambda \notin \mathbb{R}$, i.e., $b \neq 0$ and $\theta \neq k\pi$. This is the case where both rotation and scaling happen.

If $\lambda \in \mathbb{R}$, i.e., $b = 0$, the eigenvalue is $\lambda = a$. It has an algebraic multiplicity of $2$ and a geometric multiplicity of $2$. Two linearly independent eigenvectors for the eigenvalue $\lambda = a$ can be $e_1 = [1, 0]^{\top}$ and $e_2 = [0, 1]^{\top}$.

If $\lambda \notin \mathbb{R}$, i.e., $b \neq 0$, the two eigenvalues are $\lambda = a \pm bi$ where $b \neq 0$. Each eigenvalue has an algebraic multiplicity of $1$ and a geometric multiplicity of $1$. The eigenvectors for the eigenvalues $\lambda = a + bi$ and $\lambda = a - bi$ can be $[i, 1]^{\top}$ and $[-i, 1]^{\top}$, respectively.
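The non-real case can be verified numerically. The sketch below uses hypothetical values of $a$ and $b$ with $b \neq 0$ and checks both the conjugate eigenvalue pair and the eigenvectors $[\pm i, 1]^{\top}$:

```python
import numpy as np

# Hypothetical a, b with b != 0, so the eigenvalues are non-real.
a, b = 3.0, 4.0
A = np.array([[a, -b], [b, a]])

# The eigenvalues are the conjugate pair a +- bi.
eigenvalues, _ = np.linalg.eig(A)
assert np.allclose(sorted(eigenvalues, key=lambda z: z.imag),
                   [a - b * 1j, a + b * 1j])

# [i, 1]^T is an eigenvector for a + bi: A v = (a + bi) v.
v = np.array([1j, 1.0])
assert np.allclose(A @ v, (a + b * 1j) * v)

# [-i, 1]^T is an eigenvector for a - bi: A w = (a - bi) w.
w = np.array([-1j, 1.0])
assert np.allclose(A @ w, (a - b * 1j) * w)
```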

## Rotation-Scaling Theorem

### Rotation-Scaling Theorem

Let $A$ be a $2 \times 2$ real matrix with a non-real complex eigenvalue $\lambda$, and let $v$ be an eigenvector. Then $A = CBC^{-1}$, where

\begin{align} C &= \begin{bmatrix} \mid & \mid \\ \text{Re}(v) & \text{Im}(v) \\ \mid & \mid \\ \end{bmatrix} \end{align}

and

\begin{align} B &= \begin{bmatrix} \text{Re}(\lambda) & \text{Im}(\lambda) \\ -\text{Im}(\lambda) & \text{Re}(\lambda) \\ \end{bmatrix} \end{align}

Notice that the matrices $A$, $B$, and $C$ are all real matrices. In particular, $B$ is a rotation-scaling matrix whose scaling factor $r = \sqrt{\lvert B \rvert} = \lvert \lambda \rvert$, and $A$ is similar to the rotation-scaling matrix $B$.

**Proof**

First of all, we will have to prove that $C$ is invertible by showing that its column vectors, $\text{Re}(v)$ and $\text{Im}(v)$, are linearly independent. Suppose, for contradiction, that the column vectors of $C$, $\text{Re}(v)$ and $\text{Im}(v)$, are linearly dependent. Then there must be real numbers $x$ and $y$, not both zero, such that

$$x \text{Re}(v) + y \text{Im}(v) = 0$$

Then,

\begin{align} (y + ix) v &= (y + ix) (\text{Re}(v) + i \text{Im}(v)) \\ &= y \text{Re}(v) + i \left(x \text{Re}(v) + y \text{Im}(v) \right) - x \text{Im}(v) \\ &= y \text{Re}(v) - x \text{Im}(v) \\ \end{align}

Notice that $(y + ix) v = y \text{Re}(v) - x \text{Im}(v)$ is a real nonzero vector, and it is still an eigenvector of $A$ for the eigenvalue $\lambda$, because $y + ix \neq 0$ and a nonzero scalar multiple of an eigenvector is an eigenvector. Because $A$ is a real matrix and $(y + ix) v$ is a real eigenvector, $A (y + ix) v = \lambda (y + ix) v$ must be real, so $\lambda$ must be real. This contradicts the assumption that $\lambda$ is non-real. Thus, the column vectors of $C$ must be linearly independent and $C$ must be invertible.

Next, we have

\begin{align} CB &= \begin{bmatrix} \mid & \mid \\ \text{Re}(\lambda) \text{Re}(v) - \text{Im}(\lambda) \text{Im}(v) & \text{Im}(\lambda) \text{Re}(v) + \text{Re}(\lambda) \text{Im}(v)\\ \mid & \mid \\ \end{bmatrix} \end{align}

and

\begin{align} \lambda v &= (\text{Re}(\lambda) + i \text{Im}(\lambda) ) (\text{Re}(v) + i \text{Im}(v) ) \\ &= \text{Re}(\lambda) \text{Re}(v) + i ( \text{Re}(\lambda) \text{Im}(v) + \text{Im}(\lambda) \text{Re}(v)) - \text{Im}(\lambda) \text{Im}(v) \\ \end{align}

Comparing the two expressions, we realize that

\begin{align} \lambda v &= CB \begin{bmatrix} 1 \\ i \\ \end{bmatrix} \end{align}

\begin{align} v &= \text{Re}(v) + i \text{Im}(v) \\ &= C \begin{bmatrix} 1 \\ i \\ \end{bmatrix} \end{align}

Because $C$ is invertible,

\begin{align} \begin{bmatrix} 1 \\ i \\ \end{bmatrix} &= C^{-1}v \end{align}

Thus,

\begin{align} Av &= \lambda v \\ &= CB \begin{bmatrix} 1 \\ i \\ \end{bmatrix} \\ &= CBC^{-1}v \end{align}

Because $A$, $B$, and $C$ are real matrices, taking the real and imaginary parts of both sides implies that

\begin{align} A\text{Re}(v) &= CBC^{-1}\text{Re}(v) \\ A\text{Im}(v) &= CBC^{-1}\text{Im}(v) \\ \end{align}

Because we have shown that $\text{Re}(v)$ and $\text{Im}(v)$ are linearly independent, any vector $w \in \mathbb{R}^2$ can be written as a linear combination of $\text{Re}(v)$ and $\text{Im}(v)$, i.e.,

$$w = c_1 \text{Re}(v) + c_2 \text{Im}(v)$$

where $c_1$ and $c_2$ are real numbers. Thus,

\begin{align} Aw &= A(c_1 \text{Re}(v) + c_2 \text{Im}(v)) \\ &= c_1 A\text{Re}(v) + c_2 A\text{Im}(v) \\ &= c_1 CBC^{-1}\text{Re}(v) + c_2 CBC^{-1}\text{Im}(v) \\ &= CBC^{-1} (c_1 \text{Re}(v) + c_2 \text{Im}(v)) \\ &= CBC^{-1} w \\ \end{align}

Because $w$ can be any vector in $\mathbb{R}^2$, we must have

$$A = CBC^{-1}$$

This concludes the proof. $\square$
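The theorem can also be checked numerically. The sketch below uses a hypothetical $2 \times 2$ real matrix with non-real eigenvalues, builds $C$ and $B$ from an eigenvalue-eigenvector pair computed by NumPy, and verifies $A = CBC^{-1}$ and $r = \sqrt{\lvert B \rvert} = \lvert \lambda \rvert$:

```python
import numpy as np

# A hypothetical 2x2 real matrix with non-real eigenvalues 2 +- i.
A = np.array([[1.0, -2.0], [1.0, 3.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
lam = eigenvalues[0]
v = eigenvectors[:, 0]
assert abs(lam.imag) > 0  # the eigenvalue is non-real

# C has Re(v) and Im(v) as columns.
C = np.column_stack([v.real, v.imag])
# B is the rotation-scaling matrix built from Re(lambda) and Im(lambda).
B = np.array([[lam.real, lam.imag], [-lam.imag, lam.real]])

# Verify A = C B C^{-1} and the scaling factor r = sqrt(det(B)) = |lambda|.
assert np.allclose(A, C @ B @ np.linalg.inv(C))
assert np.isclose(np.sqrt(np.linalg.det(B)), abs(lam))
```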

### Relationship with Diagonalization Theorem

The rotation-scaling theorem cannot be applied if the eigenvalues $\lambda$ are real. Instead, if the sum of the geometric multiplicities of the eigenvalues of $A$ is equal to $2$, the diagonalization theorem can be applied.

Concretely, let $A$ be a $2 \times 2$ real matrix with distinct real eigenvalues $\lambda_1 \neq \lambda_2$, and let $v_1$ and $v_2$ be the corresponding eigenvectors. Then $A = CDC^{-1}$, where

\begin{align} C &= \begin{bmatrix} \mid & \mid \\ v_1 & v_2 \\ \mid & \mid \\ \end{bmatrix} \end{align}

and

\begin{align} D &= \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \\ \end{bmatrix} \end{align}

It’s also possible that $A$ has only one real eigenvalue $\lambda$ with a geometric multiplicity of $2$. This implies that $A = \lambda I$, in which case

\begin{align} C &= \begin{bmatrix} \mid & \mid \\ e_1 & e_2 \\ \mid & \mid \\ \end{bmatrix} \end{align}

and

\begin{align} D &= \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \\ \end{bmatrix} \end{align}
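As a numerical sketch of the distinct-eigenvalue case, the snippet below uses a hypothetical matrix with real eigenvalues $2$ and $5$ and verifies $A = CDC^{-1}$; the particular matrix is an assumption for illustration:

```python
import numpy as np

# A hypothetical 2x2 real matrix with distinct real eigenvalues 2 and 5.
A = np.array([[4.0, 1.0], [2.0, 3.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
assert np.allclose(eigenvalues.imag, 0.0)  # the eigenvalues are real

# C has the eigenvectors as columns; D has the eigenvalues on the diagonal.
C = eigenvectors
D = np.diag(eigenvalues)

# Verify A = C D C^{-1}.
assert np.allclose(A, C @ D @ np.linalg.inv(C))
```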

## Rotation-Scaling Consequences

Let $A$ be a $2 \times 2$ real matrix with a non-real complex eigenvalue $\lambda$, and let $v$ be an eigenvector. We have proved that $A$ is similar to the rotation-scaling matrix $B$ whose scaling factor $r = \sqrt{\lvert B \rvert} = \lvert \lambda \rvert$,

$$A = CBC^{-1}$$

If the transformation $A$ is applied to a vector $n$ times,

$$A^n = CB^nC^{-1}$$

Notice that $B^n$ is still a rotation-scaling matrix,

\begin{align} \lvert B^n \rvert &= \lvert B \rvert^n \\ &= \lvert \lambda \rvert^{2n} \\ \end{align}

Therefore, $A^n$ is similar to the rotation-scaling matrix $B^n$ whose scaling factor is $r = \sqrt{\lvert B^n \rvert} = \lvert \lambda \rvert^n$.

If $\lvert \lambda \rvert < 1$, as the transformation $A$ is repeatedly applied to a vector, the vector will spiral in and its norm will approach $0$.

On the contrary, if $\lvert \lambda \rvert > 1$, as the transformation $A$ is repeatedly applied to a vector, the vector will spiral out and its norm will grow without bound.

Finally, if $\lvert \lambda \rvert = 1$, as the transformation $A$ is repeatedly applied to a vector, the vector will rotate around an ellipse and its norm will remain bounded.
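The three behaviors can be sketched numerically. The snippet below repeatedly applies a rotation-scaling matrix $B$, with a hypothetical rotation angle per step, to a vector, and checks how the norm evolves when $\lvert \lambda \rvert$ is less than, greater than, and equal to $1$:

```python
import numpy as np

def norm_after_n_steps(lam_abs, theta, w, n):
    """Apply a rotation-scaling matrix with scaling factor lam_abs n times."""
    B = lam_abs * np.array([[np.cos(theta), -np.sin(theta)],
                            [np.sin(theta), np.cos(theta)]])
    for _ in range(n):
        w = B @ w
    return np.linalg.norm(w)

w0 = np.array([1.0, 0.0])
theta = 0.3  # hypothetical rotation angle per step

# |lambda| < 1: the vector spirals in toward the origin.
assert norm_after_n_steps(0.9, theta, w0, 50) < 1e-2
# |lambda| > 1: the vector spirals out without bound.
assert norm_after_n_steps(1.1, theta, w0, 50) > 1e2
# |lambda| = 1: the norm stays constant under pure rotation.
assert np.isclose(norm_after_n_steps(1.0, theta, w0, 50), 1.0)
```

Because $B$ itself is a pure rotation-scaling matrix, the norm changes by exactly $\lvert \lambda \rvert$ per step; for a general $A = CBC^{-1}$, the trajectory is an ellipse-shaped spiral but the limiting behavior is the same.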

Lei Mao

10-23-2023