 ### Lei Mao

Machine Learning, Artificial Intelligence, Computer Science.

# Shor's Algorithm

### Introduction

Shor’s algorithm is a quantum computer integer factorization algorithm that runs in asymptotically polynomial time. Concretely, given an integer $N$, it finds it prime factors relative efficiently. Shor’s algorithm consists of two parts, a classical part and a quantum part. The classical part is self-contained and has been extensively discussed in my previous article “Composite Number Factorization Using Modular Exponentiation Period”.

In this blog post, I would like to discuss the Shor’s algorithm, and mainly focus on the quantum part.

### Prerequisite

Most of the important properties of Hadamard operator have been derived in the prerequisites section of my previous blog post on Deutsch-Jozsa algorithm. Unlike Deutsch-Jozsa algorithm, Shor’s algorithm is only going to use a small fraction of the Hadamard operator properties that Deutsch-Jozsa algorithm has used. I would just copy the properties useful for Shor’s algorithm algorithm. For the derivation, proof, and other properties of Hadamard operator, the reader should refer to my previous blog post.

To extract an arbitrary column $j$ from $H^{\otimes {n}}$, we prepared a one-hot quantum system basic state vector $| \mathbf{y} \rangle = [y_0, y_1, \cdots, y_{2^n-1}]^{\top}$, where $y_j = 1$ and $y_k = 0$ for $k \neq j$.

\begin{align} H^{\otimes {n}}_{:,j} &= H^{\otimes {n}} | \mathbf{y} \rangle \\ &= H^{\otimes n}[\mathbf{0}, \mathbf{j}] | \mathbf{x}_0 \rangle + H^{\otimes n}[\mathbf{1}, \mathbf{j}] | \mathbf{x}_1 \rangle + \cdots + H^{\otimes n}[\mathbf{2^n-1}, \mathbf{j}] | \mathbf{x}_{2^{n}-1} \rangle \\ &= \frac{1}{\sqrt{2^n}} (-1)^{\langle \mathbf{0}, \mathbf{j} \rangle} | \mathbf{x}_0 \rangle + \frac{1}{\sqrt{2^n}} (-1)^{\langle \mathbf{1}, \mathbf{j} \rangle} | \mathbf{x}_1 \rangle + \cdots + \frac{1}{\sqrt{2^n}} (-1)^{\langle \mathbf{2^n-1}, \mathbf{j} \rangle} | \mathbf{x}_{2^{n}-1} \rangle \\ &= \frac{1}{\sqrt{2^n}} \sum_{i=0}^{2^n-1} (-1)^{\langle \mathbf{i}, \mathbf{j} \rangle} | \mathbf{x}_i \rangle\\ &= \frac{1}{\sqrt{2^n}} \sum_{ \mathbf{x} \in \{0,1\}^n } (-1)^{\langle \mathbf{x}, \mathbf{j} \rangle} | \mathbf{x} \rangle\\ &= \frac{1}{\sqrt{2^n}} \sum_{ \mathbf{x} \in \{0,1\}^n } (-1)^{\langle \mathbf{x}, \mathbf{y} \rangle} | \mathbf{x} \rangle\\ \end{align}

where $| \mathbf{x}_i \rangle$ is a quantum system one-hot basic state vector, $|\mathbf{x}_i\rangle = [x_0, x_1, \cdots, x_{2^{n}-1}]^{\top}$, where $x_i = 1$ and $x_k = 0$ for $k \neq i$.

Specifically, if $j = 0$, $| \mathbf{y} \rangle = [\underbrace{1, 0, 0, \cdots, 0}_{2^n} ]^{\top} = | \mathbf{0} \rangle$,

\begin{align} H^{\otimes {n}} | \mathbf{0} \rangle &= \frac{1}{\sqrt{2^n}} \sum_{i=0}^{2^n-1} (-1)^{\langle \mathbf{i}, \mathbf{0} \rangle} | \mathbf{x}_i \rangle \\ &= \frac{1}{\sqrt{2^n}} \sum_{i=0}^{2^n-1} (-1)^{0} | \mathbf{x}_i \rangle \\ &= \frac{1}{\sqrt{2^n}} \sum_{i=0}^{2^n-1} | \mathbf{x}_i \rangle \\ &= \frac{1}{\sqrt{2^n}} \sum_{ \mathbf{x} \in \{0,1\}^n } | \mathbf{x} \rangle \\ \end{align}

#### Discrete Fourier Transform

In the “Discrete Fourier Transform”, we have discussed how to perform discrete Fourier transform and inverse discrete Fourier transform using unitary matrix multiplication.

We define the inverse discrete Fourier transform Vandermonde matrix $\mathbf{DFT}^{\ast\dagger}$ as

\begin{align} \mathbf{DFT}^{\ast\dagger} &= \begin{bmatrix} \omega_N^{0 \cdot 0} & \omega_N^{0 \cdot 1} & \cdots & \omega_N^{0 \cdot (N-1)} \\ \omega_N^{1 \cdot 0} & \omega_N^{1 \cdot 1} & \cdots & \omega_N^{1 \cdot (N-1)} \\ \vdots & \vdots & \ddots & \vdots \\ \omega_N^{(N-1) \cdot 0} & \omega_N^{(N-1) \cdot 1} & \cdots & \omega_N^{(N-1) \cdot (N-1)} \\ \end{bmatrix} \end{align}

and of course the discrete Fourier transform Vandermonde matrix $\mathbf{DFT}^{\ast}$ is

\begin{align} \mathbf{DFT}^{\ast} &= \begin{bmatrix} \omega_N^{-0 \cdot 0} & \omega_N^{-0 \cdot 1} & \cdots & \omega_N^{-0 \cdot (N-1)} \\ \omega_N^{-1 \cdot 0} & \omega_N^{-1 \cdot 1} & \cdots & \omega_N^{-1 \cdot (N-1)} \\ \vdots & \vdots & \ddots & \vdots \\ \omega_N^{-(N-1) \cdot 0} & \omega_N^{-(N-1) \cdot 1} & \cdots & \omega_N^{-(N-1) \cdot (N-1)} \\ \end{bmatrix} \end{align}

where $\omega_N^{i \cdot j} = (\omega_N^{i})^j = \omega_N^{ij}$, and $\omega_N^{0}, \omega_N^{1}, \cdots, \omega_N^{N-1}$ are the $N$th root of unity, i.e., $\omega_N^{k} = e^{i\frac{2\pi k}{N}}$.

The unitary inverse discrete Fourier transform matrix and unitary discrete Fourier transform matrix are

$\begin{gather} U_{\text{iDFT}} = \frac{1}{\sqrt{N}} \mathbf{DFT}^{\ast\dagger} \\ U_{\text{DFT}} =\frac{1}{\sqrt{N}} \mathbf{DFT}^{\ast} \\ \end{gather}$

Using some math tricks, we could also rewrite the unitary inverse discrete Fourier transform matrix and unitary discrete Fourier transform matrix as

$\begin{gather} U_{\text{iDFT}} = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} \omega_{N}^{x \cdot y} | y \rangle \langle x | \\ U_{\text{DFT}} = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} \omega_{N}^{- x \cdot y} | y \rangle \langle x | \\ \end{gather}$

where $|x \rangle$ and $|y \rangle$ are one-hot encoded state vector for scalar $x$ and $y$.

Note that $| y \rangle \langle x |$ is an outer product resulting in a $N \times N$ matrix.

#### Quantum Fourier Transform

Quantum Fourier transform and the inverse quantum Fourier transform are just the discrete Fourier transform and inverse discrete Fourier transform analogues in quantum systems. In stead of applying the discrete Fourier transform Vandermonde matrix and the inverse discrete Fourier transform Vandermonde matrix to a sequence of scalars, quantum Fourier transform and the inverse quantum Fourier transform applies the same Vandermonde matrices to the superposition, i.e., the linear combination, of a sequence of quantum states. We will see this more concretely in the Shor’s algorithm.

The unitary inverse quantum Fourier transform matrix and unitary quantum Fourier transform matrix are the same to the ones used for inverse discrete Fourier transform and discrete Fourier transform.

$\begin{gather} U_{\text{iQFT}} = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} \omega_{N}^{x \cdot y} | y \rangle \langle x | \\ U_{\text{QFT}} = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} \omega_{N}^{- x \cdot y} | y \rangle \langle x | \\ \end{gather}$

where $|x \rangle$ and $|y \rangle$ are quantum state vector for basic states, which are also represented as one-hot encoded vectors.

### Shor’s Algorithm

#### Classical Part

From the “Composite Number Factorization Using Modular Exponentiation Period”, we learned that given a composite integer $N$, and integers, such as an $a$, that are relatively prime to $N$, if somehow we could compute the period $r$ of the modular exponentiation remainder sequence $f_{a,N}$, we will be able to factorize the composite integer $N$ efficiently using classical algorithms.

We have also shown in the “Composite Number Factorization Using Modular Exponentiation Period” that the minimal period $r$ must be equal or smaller than $N$. When $N$ is small, we could the generate the modular exponentiation remainder sequence $f_{a,N}$ for a sequence length of at least $2N$ using modular algorithmic I described, and find out the period. However, when $N$ is extremely large, such as a number that is hundreds or even thousands of digits long, generating the modular exponentiation remainder sequence $f_{a,N}$ of length $2N$ using modular algorithmic takes exponential asymptotic complexity $O(10^n)$ which is intractable, where $n$ is the number of digits in the number $N$ using base of $10$.

To find out the period of the modular exponentiation remainder sequence $f_{a,N}$ when $N$ is extremely large, it turns out that quantum computer with quantum Fourier transform could do it in polynomial asymptotic complexity which is tractable. Let’s look at how to find out the period using quantum circus and quantum algorithms.

#### Quantum Part

Similar to all the quantum oracle circuits used for other quantum algorithm, we have a quantum oracle circuit specific for the function $f_{a,N}(x)$ for Shor’s algorithm.

We explicitly define $f_{a,N}(x)$ as the $x$th value in the $0$-indexed $f_{a,N}$ sequence. Mathematically,

\begin{align} f_{a,N}(x) = a^{x} \bmod {N} \end{align}

Note that in the quantum oracle circuit, $\mathbf{x} \in \{0, 1\}^m$ and $\mathbf{y} \in \{0, 1\}^n$ are just the binary representations for $x$ and $y$ and $\oplus$ is (bit-wise) $\text{XOR}$ (binary addition modulo 2).

With the quantum oracle circuit $U_{f_{a,N}}$, we could further design the quantum algorithm to find out the period of $f_{a,N}$.

To compute the quantum states $|\varphi_0\rangle$, $|\varphi_1\rangle$, and $|\varphi_2\rangle$, we have

\begin{align} |\varphi_0\rangle &= |\mathbf{0}\rangle \otimes |\mathbf{0}\rangle\\ &= |\mathbf{0}, \mathbf{0}\rangle \\ \end{align} \begin{align} |\varphi_1\rangle &= (H^{\otimes m} \otimes I) |\varphi_0\rangle \\ &= (H^{\otimes m} \otimes I) (|\mathbf{0}\rangle \otimes |\mathbf{0}\rangle) \\ &= H^{\otimes m}|\mathbf{0}\rangle \otimes I | \mathbf{0} \rangle \\ &= \bigg( \frac{1}{\sqrt{2^m}} \sum_{ \mathbf{x} \in \{0,1\}^m } | \mathbf{x} \rangle \bigg) \otimes | \mathbf{0} \rangle \\ &= \frac{1}{\sqrt{2^m}} \sum_{ \mathbf{x} \in \{0,1\}^m } | \mathbf{x} \rangle \otimes | \mathbf{0} \rangle \\ &= \frac{1}{\sqrt{2^m}} \sum_{ \mathbf{x} \in \{0,1\}^m } | \mathbf{x}, \mathbf{0} \rangle \\ \end{align} \begin{align} |\varphi_2\rangle &= U_{f_{a,N}} |\varphi_1\rangle \\ &= U_{f_{a,N}} \bigg( \frac{1}{\sqrt{2^m}} \sum_{ \mathbf{x} \in \{0,1\}^m } | \mathbf{x}, \mathbf{0} \rangle \bigg) \\ &= \frac{1}{\sqrt{2^m}} \sum_{ \mathbf{x} \in \{0,1\}^m } U_{f_{a,N}} | \mathbf{x}, \mathbf{0} \rangle \\ &= \frac{1}{\sqrt{2^m}} \sum_{ \mathbf{x} \in \{0,1\}^m } | \mathbf{x}, \mathbf{0} \oplus f_{a,N}(\mathbf{x}) \rangle \\ &= \frac{1}{\sqrt{2^m}} \sum_{ \mathbf{x} \in \{0,1\}^m } | \mathbf{x}, f_{a,N}(\mathbf{x}) \rangle \\ &= \frac{1}{\sqrt{2^m}} \sum_{ \mathbf{x} \in \{0,1\}^m } | \mathbf{x}, a^{\mathbf{x}} \bmod {N} \rangle \\ \end{align}

We could see that the periodic $f_{a,N}$ sequence of length $2^m$ from $f_{a,N}(0)$ to $f_{a,N}(2^{m}-1)$ has been encoded in the $|\varphi_2\rangle$ qubits. The bottom qubits store the remainder information. Because the remainder is less than $N$, $n = \log_2 N$ qubits are just sufficient. Regarding the value of $m$, although personally I think $m = \log_2 (2N) = n + 1$ qubits should be sufficient to find out the period in theory, in practice, we used $m = \log_2 N^2 = 2 \log_2 N = 2n$ qubits.

After measuring the bottom qubits, the state of the bottom qubits becomes determined. Suppose the bottom qubits we found after measurement is

$a^{\mathbf{\overline{x}}} \bmod {N}$

Because $f_{a,N}$ is periodic with a period of $r$, for $s \in \mathbb{Z}$ and $\mathbf{\overline{x}} + sr \geq 0$, we have

$a^{\mathbf{\overline{x}}} \equiv a^{\mathbf{\overline{x}} + sr} \pmod {N}$

Because of the quantum entanglement, many possibilities for the top qubits have been eliminated. The remaining possibilities are the $\mathbf{x}$ whose $a^{\mathbf{x}} \equiv a^{\mathbf{\overline{x}}} \pmod {N}$, and there are $\lfloor \frac{2^m}{r} \rfloor$ different possibilities. So we have

\begin{align} |\varphi_3\rangle &= \frac{1}{\lfloor \frac{2^m}{r} \rfloor} \sum_{a^{\mathbf{x}} \equiv a^{\mathbf{\overline{x}}} \pmod {N}}^{} | \mathbf{x}, a^{\mathbf{\overline{x}}} \bmod N \rangle \\ \end{align}

We could further rewrite $|\varphi_3\rangle$ as

\begin{align} |\varphi_3\rangle &= \frac{1}{\lfloor \frac{2^m}{r} \rfloor} \sum_{j = 0}^{\lfloor \frac{2^m}{r} \rfloor - 1} | t_0 + j r, a^{\mathbf{\overline{x}}} \bmod N \rangle \\ \end{align}

where $t_0$ is the smallest $\mathbf{x}$ that $a^{\mathbf{x}} \equiv a^{\mathbf{\overline{x}}} \pmod {N}$, and is referred as the offset.

We expand $|\varphi_3\rangle$ as

\begin{align} |\varphi_3\rangle &= \frac{1}{\lfloor \frac{2^m}{r} \rfloor} \sum_{j = 0}^{\lfloor \frac{2^m}{r} \rfloor - 1} | t_0 + j r, a^{\mathbf{\overline{x}}} \bmod N \rangle \\ &= \frac{1}{\lfloor \frac{2^m}{r} \rfloor} \sum_{j = 0}^{\lfloor \frac{2^m}{r} \rfloor - 1} | t_0 + j r \rangle \otimes | a^{\mathbf{\overline{x}}} \bmod N \rangle \\ &= \bigg( \frac{1}{\lfloor \frac{2^m}{r} \rfloor} \sum_{j = 0}^{\lfloor \frac{2^m}{r} \rfloor - 1} | t_0 + j r \rangle \bigg) \otimes | a^{\mathbf{\overline{x}}} \bmod N \rangle \\ \end{align}

We apply the inverse quantum Fourier transform unitary matrix $U_{\text{iQFT}}$ for the top qubits.

\begin{align} U_{\text{iQFT}} \bigg( \frac{1}{\lfloor \frac{2^m}{r} \rfloor} \sum_{j = 0}^{\lfloor \frac{2^m}{r} \rfloor - 1} | t_0 + j r \rangle \bigg) &= \bigg( \frac{1}{\sqrt{2^m}} \sum_{x=0}^{2^m-1} \sum_{y=0}^{2^m-1} \omega_{2^m}^{x \cdot y} | y \rangle \langle x | \bigg) \bigg( \frac{1}{\lfloor \frac{2^m}{r} \rfloor} \sum_{j = 0}^{\lfloor \frac{2^m}{r} \rfloor - 1} | t_0 + j r \rangle \bigg) \\ &= \frac{1}{\sqrt{2^m} \lfloor \frac{2^m}{r} \rfloor} \bigg( \sum_{x=0}^{2^m-1} \sum_{y=0}^{2^m-1} \omega_{2^m}^{x \cdot y} | y \rangle \langle x | \bigg) \bigg( \sum_{j = 0}^{\lfloor \frac{2^m}{r} \rfloor - 1} | t_0 + j r \rangle \bigg) \\ &= \frac{1}{\sqrt{2^m} \lfloor \frac{2^m}{r} \rfloor} \bigg( \sum_{j = 0}^{\lfloor \frac{2^m}{r} \rfloor - 1} \sum_{y=0}^{2^m-1} \omega_{2^m}^{(t_0 + j r) \cdot y} | y \rangle \bigg) \\ &= \frac{1}{\sqrt{2^m} \lfloor \frac{2^m}{r} \rfloor} \bigg( \sum_{y=0}^{2^m-1} \sum_{j = 0}^{\lfloor \frac{2^m}{r} \rfloor - 1} \omega_{2^m}^{(t_0 + j r) \cdot y} | y \rangle \bigg) \\ &= \frac{1}{\sqrt{2^m} \lfloor \frac{2^m}{r} \rfloor} \bigg( \sum_{y=0}^{2^m-1} \Big( \sum_{k = 0}^{2^m-1} \omega_{2^m}^{k \cdot y} F[k] \Big) | y \rangle \bigg) \\ &= \frac{1}{\sqrt{2^m} \lfloor \frac{2^m}{r} \rfloor} \bigg( \sum_{y=0}^{2^m-1} \Big( \sum_{k = 0}^{2^m-1} e^{i\frac{2\pi ky}{2^m}} F[k] \Big) | y \rangle \bigg) \\ &= \frac{1}{\sqrt{2^m} \lfloor \frac{2^m}{r} \rfloor} \bigg( \sum_{y=0}^{2^m-1} \Big( \sum_{k = 0}^{2^m-1} F[k] e^{i\frac{2\pi ky}{2^m}} \Big) | y \rangle \bigg) \\ &= \frac{1}{\sqrt{2^m} \lfloor \frac{2^m}{r} \rfloor} \bigg( \sum_{y=0}^{2^m-1} f[y] | y \rangle \bigg) \\ \end{align}

where

\begin{align} F[k] &= \begin{cases} 1 & \text{if k = t_0 + j r}\\ 0 & \text{else}\\ \end{cases} \\ \end{align} $f[y] = \sum_{k = 0}^{2^m-1} F[k] e^{i\frac{2\pi ky}{2^m}}$

Note that in the above derivation, we used the fact that for one-hot encoded state vector $| x \rangle$ and $| y \rangle$,

\begin{align} \langle x, y \rangle &= \begin{cases} 1 & \text{if x = y}\\ 0 & \text{else}\\ \end{cases} \\ \end{align}

This is exactly the inverse discrete Fourier transform for $0/1$ periodic sequence. In “Discrete Fourier Transform for 0/1 Periodic Sequences”, we have shown and proved that the discrete Fourier transform and the inverse discrete Fourier transform for $0/1$ periodic sequence will change the sequence period from $r$ to $\frac{N}{r}$. More importantly, the leading zeros, i.e., the offset, will be eliminated after the transform. In our case, the original sequence period is $r$ for sequence $F$ and the sequence period for $f$ after inverse discrete Fourier transformation is $\frac{2^m}{r}$, and $f \neq 0$.

In “Discrete Fourier Transform for 0/1 Periodic Sequences”, we have shown and proved that if $r$ divides $2^m$, the sequence after inverse discrete Fourier transform will be a perfect $0/v$ periodic sequence with a period of exactly $\frac{2^m}{r}$. However, if $r$ does not divide $2^m$, the sequence after inverse discrete Fourier transform will not be a perfect $0/v$ periodic sequence.

Let’s discuss what Shor’s algorithm will do in these two scenarios.

When $r$ divides $2^m$, after inverse quantum Fourier transform, the quantum states for the top qubits are the superposition of the basic states $\{|y_0\rangle, |y_1\rangle, |y_2\rangle, \cdots, |y_k\rangle, \cdots \}$ with equal probabilities where $y_k = k\frac{2^m}{r}$. So when we measure the top qubits, the value we observed must be $x = k\frac{2^m}{r}$, multiples of $\frac{2^m}{r}$. Because $2^m$ is known, so we could know the value of $\frac{k}{r}$.

$\frac{k}{r} = \frac{x}{2^m}$

Does knowing the value of $\frac{k}{r}$ tells us the exact value of $r$? Not necessarily. For example, if $r = 2^6$ and $k = 3$, $\frac{k}{r} = \frac{3}{2^5}$, the denominator is irreducible, so there is no ambiguity about the value of $r$. However, if $r = 2^6$ and $k = 2\times3 = 6$, $\frac{k}{r} = \frac{3}{2^5}$, the denominator is reduced so the value of $r$ cannot be unambiguously determined. So in general, if $k$ is odd, we could immediately determine the value of $r$ without any ambiguity. In practice, we got $50\%$ chance that $k$ is odd. So we can run the quantum circuits $s$ times, and get the largest denominator as the value for $r$. The probability of sampling at least one odd $k$ in $s$ runs is $1 - \frac{1}{2^s}$. When $s = 20$, the probability is already greater than $0.999$. So this approach to determine the value of $r$ is feasible in practice.

When $r$ does not divide $2^m$, which is more common in practice, after inverse quantum Fourier transform, the value we observed from the top qubits, in addition to $x = k\lfloor\frac{2^m}{r}\rfloor$ and $x = k\lceil\frac{2^m}{r}\rceil$ with relatively high probabilities, might also be $x = k\lfloor\frac{2^m}{r}\rfloor - 1$ and $x = k\lceil\frac{2^m}{r}\rceil + 1$ with relatively low probabilities. In this scenario, we would like to again run the quantum circuits $s$ times, and try to analyze the observations to extract the most likely value for $r$. Since the random observation generation process has been explicitly modeled in my description, it is just a statistical inference for the random variable $r$ given the observation from running the quantum circuits $s$ times. I have written several articles on statistical inference and I am not going to elaborate it here for inferring the most likely value for $r$.

Once we got the period $r$, we used the classical part of the Shor’s algorithm to factorize the integer $N$.

### Final Remarks

I have been writing the article on Shor’s algorithm intermittently during the past year and I found it really difficult to make it mathematically friendly to the readers. However, if we don’t know the math, we will not understand why Shor’s algorithm will work.