# Hamming Code

## Introduction

Data storage and transmission, regardless of the medium, are not error-free. There is always some non-zero probability that the data will be changed while it is being stored or transmitted.

For example, DNA is an excellent storage medium for information because DNA is highly stable, much more stable than any of the storage media used in modern electronics, whether it is the genetic material *in vivo* or the macromolecule *in vitro*. However, because of natural radiation, DNA can always be damaged with some small probability, and the information it stores can be altered or lost. *In vivo*, there are DNA repair mechanisms that fix the errors and damage caused by radiation, usually based on the intact complementary information stored in the complementary DNA strand. DNA can also be altered while being transmitted, whether it is being replicated by DNA polymerase *in vivo* or being synthesized from chemicals *in vitro*. *In vivo*, there are also error-correction mechanisms that fix incorrect nucleotides added by the DNA polymerase during DNA synthesis, usually again based on the intact complementary information stored in the complementary DNA strand.

An artificial data storage or transmission system will probably never be as sophisticated as the *in vivo* DNA genetic systems. Without an error-correction mechanism, the information in an artificial data storage or transmission system could be lost forever. Early mathematicians and computer scientists, such as Richard Hamming, invented codes that can automatically correct the errors introduced during data storage and transmission.

Hamming codes, invented by Richard Hamming, are a family of linear error-correcting codes, and they have been widely used in applications such as error correction code memory (ECC memory). In this article, I would like to pretend to be Richard Hamming and walk through how Hamming codes could have been invented mathematically.

## Error Detection VS Error Correction

It should be noted that error detection and error correction are different. Error detection is usually much simpler and more efficient, whereas error correction can be more complicated and less efficient.

In addition, error detection and error correction do not always work. There is always some small probability that they run into situations they were not designed to deal with.

### Parity

Parity is one of the simplest error detection codes. It adds a single bit that indicates whether the number of ones (bit positions with values of one) in the preceding data is even or odd. If an odd number of bits is changed in transmission, the message will change parity and the error can be detected at this point; however, the bit that changed may have been the parity bit itself. The most common convention is that a parity value of one indicates that there is an odd number of ones in the data, and a parity value of zero indicates that there is an even number of ones. If the number of bits changed is even, the check bit will be valid and the error will not be detected.

The following is an example of attaching 1 parity bit to 7 bits of data, making the bit string always have an even number of 1s. Therefore, the XOR of the bits in the 8-bit data is always 0.

Data (7 Bits) | Count of 1-Bits | Parity Bit | Data Including Parity (8 Bits) | XOR |
---|---|---|---|---|
0000000 | 0 | 0 | 00000000 | 0 |
1010001 | 3 | 1 | 10100011 | 0 |
1101001 | 4 | 0 | 11010010 | 0 |
1111111 | 7 | 1 | 11111111 | 0 |

If the bit string has a single error, either in the data or at the parity bit position, the number of 1s in the bit string will not be even, and the XOR will not be 0. However, if there is an even number of errors in the bit string, the errors can never be detected, as the XOR remains 0.
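The parity scheme above can be sketched in a few lines of Python (a minimal sketch; the function names are my own):

```python
# A minimal sketch of even-parity encoding and checking.
def add_parity(bits: str) -> str:
    """Append one parity bit so the total number of 1s is even."""
    parity = bits.count("1") % 2
    return bits + str(parity)

def parity_ok(bits: str) -> bool:
    """True if the bit string (data + parity) has an even number of 1s."""
    return bits.count("1") % 2 == 0

codeword = add_parity("1010001")   # -> "10100011", as in the table above
assert parity_ok(codeword)

# A single flipped bit is detected ...
corrupted = "00100011"             # first bit flipped
assert not parity_ok(corrupted)

# ... but an even number of flipped bits is not.
corrupted2 = "01100011"            # first two bits flipped
assert parity_ok(corrupted2)
```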

It is obvious that parity cannot correct an error when it detects one, because it has no idea where the error happened. Whenever an error is detected, the receiver usually has to request that the sender transmit the data again, which can be inconvenient and time-consuming.

### Repetitions

Repetition is one of the simplest error correction codes. It repeats every data bit multiple times in order to ensure that it is received correctly.

The following is an example of repeating every bit from 3 bits of data 3 times.

Data (3 Bits) | Number of Repetitions | Data Including Repetitions |
---|---|---|
000 | 3 | 000000000 |
010 | 3 | 000111000 |
101 | 3 | 111000111 |
111 | 3 | 111111111 |

If the bit string has a single error, there will be an inconsistency within one 3-bit repetition, and the single inconsistent bit is exactly the error. The error can be fixed by taking the value of the other 2 bits in the repetition. However, if there are two errors in one 3-bit repetition, the correction will be wrong. For example, if the three-fold repetition 000 is incorrectly transmitted as 101, after correction it becomes 111: the original data bit was 0, but it is incorrectly corrected to 1 after transmission and error correction.

The incorrect error correction can be mitigated by using a larger number of repetitions. The more repeated bits there are to vote, the more robust the error correction code is to the error rate. The drawback is that more repetitions reduce the efficiency of the code, which is unfavorable in a latency-demanding data transmission system.
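The repetition code and its majority-vote decoding can be sketched as follows (a minimal sketch; the function names are my own):

```python
# A sketch of an n-fold repetition code with majority-vote decoding.
def encode_repetition(bits: str, n: int = 3) -> str:
    """Repeat each data bit n times."""
    return "".join(b * n for b in bits)

def decode_repetition(bits: str, n: int = 3) -> str:
    """Decode by majority vote within each group of n repeated bits."""
    groups = [bits[i:i + n] for i in range(0, len(bits), n)]
    return "".join("1" if g.count("1") > n // 2 else "0" for g in groups)

assert encode_repetition("101") == "111000111"
# A single error per group is corrected ...
assert decode_repetition("110000111") == "101"
# ... but two errors in one group ("000" received as "101") decode incorrectly.
assert decode_repetition("101") == "1"   # original data bit was 0
```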

## Hamming Codes

In mathematical terms, Hamming codes are a class of binary linear code. For each integer $r \geq 2$, where $r$ is the number of error-correction bits, there is a code with block length $n = 2^r - 1$ and message length $k = 2^r - r - 1$.

For example, for $r = 3$, the Hamming code has $n = 7$ and $k = 4$. Of the 7 bits in the Hamming code, 4 bits are the message we want to transmit, and the remaining 3 bits are error-correction bits which protect the message.

The rate of a block code, defined as the ratio between its message length and its block length, for Hamming codes is therefore

$$

\begin{align}

R &= \frac{k}{n} \\

&= 1 - \frac{r}{2^r - 1} \\

\end{align}

$$

As $r \rightarrow \infty$, $R \rightarrow 1$. When $r$ is very large, almost all the bits in the Hamming code are message bits.
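As a quick illustration, the following sketch tabulates $n$, $k$, and the rate $R$ for a few values of $r$, using nothing beyond the formulas above:

```python
# Block length n, message length k, and rate R = k / n for small r.
for r in range(2, 8):
    n = 2**r - 1
    k = 2**r - r - 1
    print(f"r={r}: Hamming({n},{k}), rate={k / n:.3f}")
```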

The Hamming code algorithm would be “created” and derived in the following sections.

## Hamming(7,4) Code

Let’s first take a look at the core principles of the Hamming(7,4) code, that is, the Hamming code with $r = 3$. We will then try to come up with ways of extending the Hamming(7,4) logic and principles to the generic case.

Hamming(7,4) code consists of 4 data bits, $d_1$, $d_2$, $d_3$, $d_4$, and 3 parity bits, $p_1$, $p_2$, $p_3$. As is shown in the following graphical depiction, the parity bit $p_1$ applies to the data bits $d_1$, $d_2$, and $d_4$; the parity bit $p_2$ applies to the data bits $d_1$, $d_3$, and $d_4$; and the parity bit $p_3$ applies to the data bits $d_2$, $d_3$, and $d_4$.

When there is no error in the bits, none of the parities will break.

When there is a single error in the data bits, more than one parity will break. For example, if $d_1$ gets an error, the parities where $p_1$ and $p_2$ apply will break and the parity where $p_3$ applies remains. We can then unambiguously determine that $d_1$ has an error. To correct the error, simply flip the value of $d_1$, since the bit is binary.

When there is a single error in the parity bits, only one parity will break. For example, if $p_3$ gets an error, only the parity where $p_3$ applies will break, and the parities where $p_1$ and $p_2$ apply remain.

We could see from Hamming(7,4) code that Hamming code is a natural extension from the parity used for error detection.

## Create Generic Hamming Code From Scratch

Based on the Hamming(7,4) code intuitions and principles, let’s pretend we were the genius Richard Hamming in the 1950s and try to come up with a generic Hamming code for different numbers of error-correction bits.

Suppose the number of parity bits is $r$, and each parity bit is used in exactly one parity. Then there are $r$ parities, and the number of different parity states the Hamming code can represent is $2^r$. One parity state has to represent the state that the code has no error. Each of the remaining parity states has to represent the state that one unique bit has an error. The number of remaining parity states is $2^r - 1$; therefore, $r$ parity bits can be used for error-correcting codes of up to $2^r - 1$ bits. That’s why a Hamming code with $r$ error-correction bits has block length $n = 2^r - 1$ and message length $k = 2^r - r - 1$.

The next question is whether, and how, it is possible to arrange the $n = 2^r - 1$ bits into $r$ parities so that each single error is represented by one unique parity state. The graphical depiction of the Hamming(7,4) code tells us exactly how to arrange the Hamming code for $r = 3$.

| | $d_1$ | $d_2$ | $d_3$ | $d_4$ | $p_1$ | $p_2$ | $p_3$ |
|---|---|---|---|---|---|---|---|
| $p_1$ | Yes | Yes | No | Yes | Yes | No | No |
| $p_2$ | Yes | No | Yes | Yes | No | Yes | No |
| $p_3$ | No | Yes | Yes | Yes | No | No | Yes |

But what’s the rationale behind this? How to arrange Hamming code for any $r$ that could potentially go very large?

The core principle is that no two of the $n = 2^r - 1$ bits are covered by the same set of parity bits. The following mathematics is the guideline for arranging the parity bits.

$$

\begin{align}

n &= 2^r - 1 \\

&= { {r}\choose{0} } + { {r}\choose{1} } + { {r}\choose{2} } + \cdots + { {r}\choose{r} } - 1 \\

&= { {r}\choose{1} } + { {r}\choose{2} } + \cdots + { {r}\choose{r} } \\

\end{align}

$$

Among the $n = 2^r - 1$ bits, ${r}\choose{1}$ bits are covered by exactly one parity bit, ${r}\choose{2}$ bits are covered by two parity bits, ${r}\choose{3}$ bits are covered by three parity bits, etc. More specifically, the ${r}\choose{1}$ bits have to be the parity bits themselves, based on the principle that each parity bit is used in exactly one parity.

Theoretically, we could randomly group the $k = n - {r \choose 1} = 2^r - r - 1$ bits into the $r \choose 2$, $r \choose 3$, $\cdots$, $r \choose r$ groups. As long as the Hamming code decoder knows how the bits were grouped, it can still decode the Hamming code into messages and correct them based on the $r$ parities.

Of course, random grouping does not seem optimal. So the next question, an optimization question, is whether we can group the bits in a way that is friendlier than random to Hamming code encoding and decoding, so that we can design efficient encoding and decoding algorithms. More formally, given a sequence of $n = 2^r - 1$ bits, how do we best assign the $n = 2^r - 1$ bit addresses to the $r$ parity bits and the $k = 2^r - r - 1$ data bits, and how do we best assign the $n = 2^r - 1$ bit addresses to the $r$ parity groups?

A more motivating question to ask is: can we make those assignments such that, by doing some algebra on the addresses of the parity bits whose parities have been broken, we can find the address of the incorrect bit?

Maybe peeking into the Hamming(7,4) code bit assignments could give us some ideas. (I know we are cheating here since we are not really the genius Richard Hamming…)

| | $p_1$ | $p_2$ | $d_1$ | $p_3$ | $d_2$ | $d_3$ | $d_4$ |
|---|---|---|---|---|---|---|---|
| Bit Address | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| Bit Address (Binary) | 001 | 010 | 011 | 100 | 101 | 110 | 111 |

We notice that, using one-based binary indexing, the addresses of the parity bits $p_1$, $p_2$, and $p_3$ are 001, 010, and 100, all of which are powers of 2. What’s more interesting is that the address of the error bit can be computed from the addresses of the parity bits whose parities have been broken, using XOR. Going back to the examples in the Hamming(7,4) code, suppose $d_1$ gets an error; the parities where $p_1$ and $p_2$ apply will break. The addresses of $d_1$, $p_1$, and $p_2$ are 011, 001, and 010, respectively. So, $011 = 001 \ \text{XOR} \ 010$.
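This address arithmetic is literally just a XOR; a tiny sketch (variable names are my own):

```python
# The error address is the XOR of the addresses of the broken parities.
p1_addr, p2_addr = 0b001, 0b010
error_addr = p1_addr ^ p2_addr
assert error_addr == 0b011   # the address of d_1
```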

Knowing the address assignments of the parity bits and data bits, it becomes straightforward to assign the bits to parity groups. To fill the following blanks, we iterate through all the combinations of $p_1$, $p_2$, and $p_3$, excluding the null combination.

| | $p_1$ | $p_2$ | $d_1$ | $p_3$ | $d_2$ | $d_3$ | $d_4$ |
|---|---|---|---|---|---|---|---|
| Bit Address | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| Bit Address (Binary) | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
| $p_1$ | | | | | | | |
| $p_2$ | | | | | | | |
| $p_3$ | | | | | | | |

All the combinations of $p_1$, $p_2$, and $p_3$, excluding the null combination, are ($p_1$), ($p_2$), ($p_3$), ($p_1$, $p_2$), ($p_1$, $p_3$), ($p_2$, $p_3$), and ($p_1$, $p_2$, $p_3$). The index-XORs (address-XORs) for these combinations are 001, 010, 100, 011, 101, 110, and 111, respectively. This tells us that bit 001 is guarded by the parity where $p_1$ applies, bit 010 is guarded by the parity where $p_2$ applies, …, bit 011 is guarded by the parities where $p_1$ and $p_2$ apply, …, and bit 111 is guarded by the parities where $p_1$, $p_2$, and $p_3$ apply.

So the filled table will look like this.

| | $p_1$ | $p_2$ | $d_1$ | $p_3$ | $d_2$ | $d_3$ | $d_4$ |
|---|---|---|---|---|---|---|---|
| Bit Address | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| Bit Address (Binary) | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
| $p_1$ | Yes | No | Yes | No | Yes | No | Yes |
| $p_2$ | No | Yes | Yes | No | No | Yes | Yes |
| $p_3$ | No | No | No | Yes | Yes | Yes | Yes |

Closer examination reveals the following rules for the parity bit assignments.

- Parity bit $p_1$ covers all bit addresses which have the **least** significant bit set: 00**1**, 01**1**, 10**1**, 11**1**.
- Parity bit $p_2$ covers all bit addresses which have the **second** least significant bit set: 0**1**0, 0**1**1, 1**1**0, 1**1**1.
- Parity bit $p_3$ covers all bit addresses which have the **third** least significant bit set: **1**00, **1**01, **1**10, **1**11.

Strictly speaking, we would need a mathematical proof to derive this rule, but we will just skip it here.

The address assignment rules for parity bits and data bits and the parity bit assignment rules could be naturally extended to Hamming code for any $r \geq 2$.

In addition, knowing the exact parity group assignment is necessary for Hamming code encoding, but is not needed for decoding, as decoding only needs to check the parity bits, compute the address of the error bit using the parity bit addresses, and correct the error bit by flipping it if necessary.

## Generic Hamming Code Algorithm

Based on our creation of Hamming code from scratch, we could now state the generic Hamming code algorithm. Given a bit string,

- Use one-based binary indexing for all the bits, starting from 1, such as 00001, 00010, 00011, etc.
- All bit positions that are powers of two (have a single 1 bit in the binary form of their position) are parity bits, such as 00001, 00010, 00100, 01000, etc.
- All other bit positions, with two or more 1 bits in the binary form of their position, are data bits.
- Each data bit is included in a unique set of 2 or more parity bits, as determined by the binary form of its bit position.
    - Parity bit $p_1$ covers all bit addresses which have the **least** significant bit set: 0000**1**, 0001**1**, 0010**1**, 0011**1**, etc.
    - Parity bit $p_2$ covers all bit addresses which have the **second** least significant bit set: 000**1**0, 000**1**1, 001**1**0, 001**1**1, etc.
    - Parity bit $p_3$ covers all bit addresses which have the **third** least significant bit set: 00**1**00, 00**1**01, 00**1**10, 00**1**11, etc.
    - …
    - Parity bit $p_i$ covers all bit addresses which have the **i-th** least significant bit set.

To encode the message, copy the message onto the data bit positions, and fill the parity bits by computing the parities based on the parity assignments.

To decode the message, check each of the parities. If any parity is broken, get the addresses of the parity bits whose parity coverage is broken, and compute the error bit address by XORing those addresses. Once the error bit address is computed, flip the value of the error bit.
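The generic algorithm above can be sketched in Python as follows. This is a minimal sketch under the stated addressing scheme; the function names and the dict-based bit layout are my own. The decoder uses the fact that, for a valid codeword, the XOR of the addresses of all 1-bits is zero, so for a single error that XOR is exactly the error address.

```python
# A sketch of the generic Hamming code, using one-based bit addresses.

def parity_positions(n: int):
    """Addresses that are powers of two are parity bits."""
    return [p for p in range(1, n + 1) if p & (p - 1) == 0]

def encode(data_bits, r):
    """Place k = 2^r - r - 1 data bits, then fill the r parity bits."""
    n = 2**r - 1
    parities = parity_positions(n)
    code = {}
    it = iter(data_bits)
    for addr in range(1, n + 1):
        code[addr] = 0 if addr in parities else next(it)
    # Parity bit at address p = 2^(i-1) covers every address with that bit set.
    for p in parities:
        covered = [a for a in range(1, n + 1) if a & p and a != p]
        code[p] = sum(code[a] for a in covered) % 2
    return [code[a] for a in range(1, n + 1)]

def correct(code):
    """Return the corrected codeword; the syndrome is the error address."""
    code = list(code)
    syndrome = 0
    for addr, bit in enumerate(code, start=1):
        if bit:
            syndrome ^= addr
    if syndrome:  # non-zero syndrome = address of the single error bit
        code[syndrome - 1] ^= 1
    return code

# Hamming(7,4): encode 4 data bits, flip one bit, and recover.
codeword = encode([1, 0, 1, 1], r=3)   # -> [0, 1, 1, 0, 0, 1, 1]
received = list(codeword)
received[4] ^= 1                       # corrupt the bit at address 5
assert correct(received) == codeword
```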

## Hamming Code Dealing With Multiple Bit Errors

Let’s check one instance of Hamming(7,4) code, 0110011, where $p_1 = 0$, $p_2 = 1$, $d_1 = 1$, $p_3 = 0$, $d_2 = 0$, $d_3 = 1$, $d_4 = 1$. Because two bit errors were introduced, $p_3 = 1$ and $d_2 = 1$, the Hamming(7,4) code becomes 0111111.

The parity where $p_1$ applies is broken, suggesting that there is an error in the code. However, after fixing the code, the code becomes 1111111, which is incorrect.

In general, not just for the Hamming(7,4) code, a Hamming code can detect and correct a single error, but it cannot distinguish a double bit error of some codeword from a single bit error of a different codeword. Thus, some double-bit errors will be incorrectly decoded as if they were single bit errors and therefore go undetected, unless no correction is attempted.
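We can reproduce the double-bit error example with a short sketch, using the syndrome-as-XOR-of-addresses trick derived earlier (variable names are my own):

```python
# Double-bit errors fool single-error correction: the syndrome (XOR of the
# addresses of all 1-bits) points at bit 1, so "correction" goes wrong.
codeword = [0, 1, 1, 0, 0, 1, 1]          # valid Hamming(7,4) codeword 0110011
received = [0, 1, 1, 1, 1, 1, 1]          # bits at addresses 4 and 5 flipped

syndrome = 0
for addr, bit in enumerate(received, start=1):
    if bit:
        syndrome ^= addr
assert syndrome == 1                      # points at p_1, not the real errors
received[syndrome - 1] ^= 1
assert received == [1, 1, 1, 1, 1, 1, 1]  # incorrectly "corrected" to 1111111
```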

### Hamming Code With Additional Parity

To remedy the shortcoming of not being able to distinguish single bit error and double bit errors, Hamming codes can be extended by an extra parity bit. In this way, the decoder can detect and correct a single error and at the same time detect (but not correct) a double error.

This extra parity bit computes the parity over all the original Hamming code bits. When a single bit error happens, the parity that the extra parity bit applies to will break, in addition to the parity breaks from the original Hamming code. When a double bit error happens, the parity that the extra parity bit applies to remains intact, whereas the parity breaks from the original Hamming code remain.

The following is an example of Hamming(7,4) code with an extra parity bit, making it Hamming(8,4) code.

This extended Hamming code is popular in systems such as computer memory, as it will not make a correction mistake when double bit errors happen, which is already very rare.
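A minimal sketch of the extended decoding logic, assuming the extra parity bit is appended after the 7 Hamming bits (the layout and the function name `classify` are my own):

```python
# Single-error-correct / double-error-detect (SECDED) classification.
def classify(received8):
    hamming = received8[:7]
    syndrome = 0
    for addr, bit in enumerate(hamming, start=1):
        if bit:
            syndrome ^= addr
    total_parity = sum(received8) % 2     # overall parity over all 8 bits
    if syndrome == 0 and total_parity == 0:
        return "no error"
    if total_parity == 1:
        return "single error"             # correctable via the syndrome
    return "double error"                 # detected, but not correctable

codeword = [0, 1, 1, 0, 0, 1, 1, 0]       # 0110011 plus its overall parity bit
assert classify(codeword) == "no error"

single = list(codeword); single[4] ^= 1
assert classify(single) == "single error"

double = list(codeword); double[3] ^= 1; double[4] ^= 1
assert classify(double) == "double error"
```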

## Hamming Code (Binary) Linear Algebra

Hamming code encoding, error detection, error correction, and decoding processes could be represented using (binary) linear algebra.

Consider the Hamming(7,4) code; the data bits can be represented using a column vector $\mathbf{d} \in \mathbb{R}^{4}$, where

$$

\begin{align}

\mathbf{d} &=

\begin{bmatrix}

d_1 \\

d_2 \\

d_3 \\

d_4 \\

\end{bmatrix}

\end{align}

$$

### Encoding

Encoding the data bits $\mathbf{d}$ is actually a (binary) linear transformation using the code generator matrix $\mathbf{G} \in \mathbb{R}^{4 \times 7}$, where

$$

\begin{align}

\mathbf{G}^{\top} &=

\begin{bmatrix}

1 & 1 & 0 & 1 \\

1 & 0 & 1 & 1 \\

1 & 0 & 0 & 0 \\

0 & 1 & 1 & 1 \\

0 & 1 & 0 & 0 \\

0 & 0 & 1 & 0 \\

0 & 0 & 0 & 1 \\

\end{bmatrix}

\end{align}

$$

The Hamming(7,4) encoded bits $\mathbf{x}$ would be

$$

\begin{align}

\mathbf{x} &= \mathbf{G}^{\top} \mathbf{d} \\

&=

\begin{bmatrix}

1 & 1 & 0 & 1 \\

1 & 0 & 1 & 1 \\

1 & 0 & 0 & 0 \\

0 & 1 & 1 & 1 \\

0 & 1 & 0 & 0 \\

0 & 0 & 1 & 0 \\

0 & 0 & 0 & 1 \\

\end{bmatrix}

\begin{bmatrix}

d_1 \\

d_2 \\

d_3 \\

d_4 \\

\end{bmatrix} \\

&=

\begin{bmatrix}

d_1 + d_2 + d_4 \\

d_1 + d_3 + d_4 \\

d_1 \\

d_2 + d_3 + d_4 \\

d_2 \\

d_3 \\

d_4 \\

\end{bmatrix} \\

&=

\begin{bmatrix}

d_1 + d_2 + d_4 \\

d_1 + d_3 + d_4 \\

d_1 \\

d_2 + d_3 + d_4 \\

d_2 \\

d_3 \\

d_4 \\

\end{bmatrix}

\mod 2 \\

&=

\begin{bmatrix}

(d_1 + d_2 + d_4) \mod 2 \\

(d_1 + d_3 + d_4) \mod 2 \\

d_1 \\

(d_2 + d_3 + d_4) \mod 2 \\

d_2 \\

d_3 \\

d_4 \\

\end{bmatrix} \\

&=

\begin{bmatrix}

p_1 \\

p_2 \\

d_1 \\

p_3 \\

d_2 \\

d_3 \\

d_4 \\

\end{bmatrix} \\

\end{align}

$$

The modulo is to compute the parity bits for even parity.

Also notice that the code generator matrix $\mathbf{G}$ could be easily derived from the bit sequence and parity table.
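As a sanity check of $\mathbf{x} = \mathbf{G}^{\top} \mathbf{d} \mod 2$, here is a minimal pure-Python sketch (the helper name `matvec_mod2` is my own):

```python
# Encoding x = G^T d (mod 2).
def matvec_mod2(M, v):
    """Matrix-vector product over GF(2)."""
    return [sum(m * x for m, x in zip(row, v)) % 2 for row in M]

G_T = [[1, 1, 0, 1],
       [1, 0, 1, 1],
       [1, 0, 0, 0],
       [0, 1, 1, 1],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]

d = [1, 0, 1, 1]                    # data bits d_1..d_4
x = matvec_mod2(G_T, d)             # codeword [p1, p2, d1, p3, d2, d3, d4]
assert x == [0, 1, 1, 0, 0, 1, 1]
```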

### Error Detection

Suppose the actual Hamming(7,4) encoded bits received are $\mathbf{x}^{\prime}$, where $\mathbf{x}^{\prime}$ may or may not equal $\mathbf{x}$.

The error detection is just parity checking, done by applying the parity checking matrix $\mathbf{H} \in \mathbb{R}^{3 \times 7}$ to $\mathbf{x}^{\prime}$, where

$$

\begin{align}

\mathbf{H} &=

\begin{bmatrix}

1 & 0 & 1 & 0 & 1 & 0 & 1 \\

0 & 1 & 1 & 0 & 0 & 1 & 1 \\

0 & 0 & 0 & 1 & 1 & 1 & 1 \\

\end{bmatrix}

\end{align}

$$

$$

\begin{align}

\mathbf{z} = \mathbf{H} \mathbf{x}^{\prime}

\end{align}

$$

If $\mathbf{x} = \mathbf{x}^{\prime}$, we must have $\mathbf{z} = \mathbf{0}$.

$$

\begin{align}

\mathbf{z} &= \mathbf{H} \mathbf{x}^{\prime} \\

&= \mathbf{H} \mathbf{x} \\

&=

\begin{bmatrix}

1 & 0 & 1 & 0 & 1 & 0 & 1 \\

0 & 1 & 1 & 0 & 0 & 1 & 1 \\

0 & 0 & 0 & 1 & 1 & 1 & 1 \\

\end{bmatrix}

\begin{bmatrix}

p_1 \\

p_2 \\

d_1 \\

p_3 \\

d_2 \\

d_3 \\

d_4 \\

\end{bmatrix} \\

&=

\begin{bmatrix}

p_1 + d_1 + d_2 + d_4 \\

p_2 + d_1 + d_3 + d_4 \\

p_3 + d_2 + d_3 + d_4 \\

\end{bmatrix} \\

&=

\begin{bmatrix}

p_1 + d_1 + d_2 + d_4 \\

p_2 + d_1 + d_3 + d_4 \\

p_3 + d_2 + d_3 + d_4 \\

\end{bmatrix} \mod 2 \\

&=

\begin{bmatrix}

0 \\

0 \\

0 \\

\end{bmatrix} \\

&= \mathbf{0} \\

\end{align}

$$

Also notice that the parity checking matrix $\mathbf{H}$ is just the bit sequence and parity table.

A more interesting property of $\mathbf{H}$ is that each column of $\mathbf{H}$ is actually the binary representation of its one-based column index: 1, 2, 3, etc.

$$

\begin{align}

\mathbf{H} &=

\begin{bmatrix}

1 & 0 & 1 & 0 & 1 & 0 & 1 \\

0 & 1 & 1 & 0 & 0 & 1 & 1 \\

0 & 0 & 0 & 1 & 1 & 1 & 1 \\

\end{bmatrix} \\

&=

\begin{bmatrix}

1 & 2 & 3 & 4 & 5 & 6 & 7 \\

\end{bmatrix}_{10}

\end{align}

$$

This property will be very useful for error correction.

### Error Correction

Suppose there is a single bit error at bit position $i$ (one-based), the actual Hamming encoded code received would be

$$

\mathbf{x}^{\prime} = \mathbf{x} + \mathbf{e}_{i}

$$

Parity checking would just yield the bit error position in binary.

$$

\begin{align}

\mathbf{z} &= \mathbf{H} \mathbf{x}^{\prime} \\

&= \mathbf{H} \left( \mathbf{x} + \mathbf{e}_{i} \right) \\

&= \mathbf{H} \mathbf{x} + \mathbf{H} \mathbf{e}_{i} \\

&= \mathbf{0} + \mathbf{H} \mathbf{e}_{i} \\

&= \mathbf{H} \mathbf{e}_{i} \\

&=

\begin{bmatrix}

1 & 2 & 3 & 4 & 5 & 6 & 7 \\

\end{bmatrix}_{10}

\mathbf{e}_{i} \\

&= (i)_{2} \\

\end{align}

$$

The resulting value would be non-zero if there is a bit error.

To fix the bit error, simply flip the bit value at address $\mathbf{z}$.
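The whole error-correction procedure can be sketched as follows (pure Python; the helper name `matvec_mod2` is my own):

```python
# Syndrome computation z = H x' (mod 2) and single-error correction.
def matvec_mod2(M, v):
    """Matrix-vector product over GF(2)."""
    return [sum(m * x for m, x in zip(row, v)) % 2 for row in M]

H = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]

x = [0, 1, 1, 0, 0, 1, 1]           # a valid codeword [p1, p2, d1, p3, d2, d3, d4]
assert matvec_mod2(H, x) == [0, 0, 0]

x_err = list(x)
x_err[4] ^= 1                       # flip the bit at one-based address 5
z = matvec_mod2(H, x_err)           # syndrome, least significant bit first
addr = z[0] + 2 * z[1] + 4 * z[2]
assert addr == 5
x_err[addr - 1] ^= 1                # flip the error bit back
assert x_err == x
```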

### Decoding

Decoding is simple. Suppose the corrected Hamming code is $\mathbf{x}_{r}$; the decoded data bits are $\mathbf{d}_{r} = \mathbf{R} \mathbf{x}_{r}$, where the decoding matrix $\mathbf{R} \in \mathbb{R}^{4 \times 7}$ is

$$

\begin{align}

\mathbf{R} &=

\begin{bmatrix}

0 & 0 & 1 & 0 & 0 & 0 & 0 \\

0 & 0 & 0 & 0 & 1 & 0 & 0 \\

0 & 0 & 0 & 0 & 0 & 1 & 0 \\

0 & 0 & 0 & 0 & 0 & 0 & 1 \\

\end{bmatrix}

\end{align}

$$

Also notice that the decoding matrix $\mathbf{R}$ can be easily derived from the bit sequence and parity table.
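A minimal sketch of applying the decoding matrix $\mathbf{R}$ to a corrected codeword (pure Python; the helper name is my own):

```python
# Decoding just picks out the data bit positions 3, 5, 6, 7.
def matvec_mod2(M, v):
    """Matrix-vector product over GF(2)."""
    return [sum(m * x for m, x in zip(row, v)) % 2 for row in M]

R = [[0, 0, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0, 1, 0],
     [0, 0, 0, 0, 0, 0, 1]]

x_r = [0, 1, 1, 0, 0, 1, 1]         # corrected codeword [p1, p2, d1, p3, d2, d3, d4]
assert matvec_mod2(R, x_r) == [1, 0, 1, 1]   # recovers d_1..d_4
```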

## Perfect Code

### Minimum Distance

The distance or minimum (Hamming) distance $d$ of a block code is the minimum number of positions in which any two distinct codewords differ.

Because each message bit is covered by at least two parities, changing one message bit results in changing at least three bits in the codeword. Changing two message bits also results in changing at least three bits in the codeword, because any parity covering both message bits is unaffected while at least one parity covering exactly one of them changes. So the minimum distance of the Hamming code is $d = 3$.

In fact, a code with distance $d$ allows the receiver to detect up to $d-1$ transmission errors, since changing $d-1$ positions of a codeword can never accidentally yield another codeword. Furthermore, if no more than $\lfloor (d-1)/2 \rfloor$ transmission errors occur, the receiver can uniquely decode the received word to a codeword. This is because every received word has at most one codeword within distance $\lfloor (d-1)/2 \rfloor$. If more than $\lfloor (d-1)/2 \rfloor$ transmission errors occur, the receiver cannot uniquely decode the received word in general, as there might be several possible codewords.

To understand this more intuitively, suppose we have a certain number of people standing on a playground, where distances can only be integers and the minimum distance between any two people is $d$. Given a circle of radius $\lfloor (d-1)/2 \rfloor$ around each person, the closest person to every point in the circle is the person at the center of the circle. If the radius of the circles becomes $d - 1$, there is some chance that the closest person to a point in the circle is not the person at the center, but the choices of the closest person are very limited. If the radius of the circles becomes even larger, it becomes increasingly unlikely that the closest person is the one at the center of the circle.

### Hamming Bound

With the understanding of minimum distance from Hamming code, we could further derive Hamming bound, an important limitation on the efficiency with which any error-correcting code can utilize the space in which its code words are embedded. A code that attains the Hamming bound is said to be a perfect code.

An alphabet set $\mathcal{A}_{q}$ is a set of symbols with $q$ elements and $\lvert \mathcal{A}_{q} \rvert = q$. The set of strings of length $n$ on the alphabet set $\mathcal{A}_{q}$ are denoted $\mathcal{A}_{q}^{n}$ and $\lvert \mathcal{A}_{q}^{n} \rvert = q^n$. A $q$-ary block code of length $n$ is a subset of the strings of $\mathcal{A}_{q}^{n}$.

Let $A_{q}(n,d)$ denote the maximum possible size of a $q$-ary block code $C$ of length $n$ and minimum Hamming distance $d$ between elements of the block code (necessarily positive for $q^{n}>1$).

Then, the Hamming bound is:

$$

A_{q}(n,d) \leq \frac{q^n}{\sum_{k=0}^{t} {n \choose k} (q-1)^k }

$$

where

$$

t = \left\lfloor \frac{d-1}{2} \right\rfloor

$$

*Proof*

Similar to what we discussed in the minimum distance section, for each codeword $c$ in a $q$-ary block code $C$, consider a ball of fixed radius $t = \lfloor (d-1)/2 \rfloor$ around $c$. The number of words $m$ that deviate in at most $t$ components from the center of the ball, which is a codeword, can be computed using combinations.

$$

m = \sum_{k=0}^{t} {n \choose k} (q-1)^k

$$

Because the balls of radius $t$ around distinct codewords are disjoint, and $A_{q}(n,d)$ is the maximum number of codewords in $C$, we must have

$$

A_{q}(n,d) \times m \leq \lvert \mathcal{A}_{q}^{n} \rvert

$$

Therefore,

$$

A_{q}(n,d) \leq \frac{q^n}{\sum_{k=0}^{t} {n \choose k} (q-1)^k }

$$

This concludes the proof. $\square$
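The bound is easy to evaluate numerically; a minimal sketch (the function name is my own):

```python
# Evaluating the Hamming bound A_q(n, d) <= q^n / sum_{k=0}^{t} C(n, k) (q-1)^k.
from math import comb, floor

def hamming_bound(q: int, n: int, d: int) -> float:
    t = floor((d - 1) / 2)
    return q**n / sum(comb(n, k) * (q - 1)**k for k in range(t + 1))

# For binary codes of length 7 and distance 3: 2^7 / (1 + 7) = 16 codewords,
# which is exactly the 2^4 codewords of Hamming(7,4).
assert hamming_bound(2, 7, 3) == 16.0
```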

### Hamming Codes

A code that attains the Hamming bound is said to be a perfect code. A perfect code achieves the highest possible rate for codes with its block length and minimum distance.

Hamming codes are perfect codes. Let’s verify it.

*Proof*

For Hamming codes, as we derived previously, $q = 2$ because Hamming codes are binary codes, the block length is $n = 2^r - 1$, the message length is $k = 2^r - r - 1$, and the minimum distance is $d = 3$, so $t = \lfloor (d - 1) / 2 \rfloor = 1$.

$$

\begin{align}

A_{q}(n,d) &= q^{k} \\

&= 2^{2^r - r - 1} \\

\end{align}

$$

$$

\begin{align}

\frac{q^n}{\sum_{i=0}^{t} {n \choose i} (q-1)^i }

&= \frac{2^n}{\sum_{i=0}^{1} {n \choose i} 1^i } \\

&= \frac{2^n}{ {n \choose 0} + {n \choose 1} } \\

&= \frac{2^n}{ 1 + n } \\

&= \frac{2^{2^r - 1}}{ 1 + 2^r - 1 } \\

&= \frac{2^{2^r - 1}}{ 2^r } \\

&= 2^{2^r - r - 1} \\

\end{align}

$$

Therefore, for Hamming codes,

$$

A_{q}(n,d) = \frac{q^n}{\sum_{k=0}^{t} {n \choose k} (q-1)^k }

$$

Hamming codes are perfect codes.

This concludes the proof. $\square$
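We can also verify this equality numerically for a range of $r$ (a minimal sketch):

```python
# Numerically verifying that Hamming codes attain the Hamming bound.
from math import comb

for r in range(2, 11):
    n = 2**r - 1                               # block length
    k = 2**r - r - 1                           # message length
    bound = 2**n // (comb(n, 0) + comb(n, 1))  # Hamming bound with q = 2, t = 1
    assert 2**k == bound                       # the 2^k codewords attain the bound
```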

## Final Remarks

As I mentioned, random parity grouping would also work for error correction in a Hamming code; it would just be less efficient. The way Richard Hamming assigned the bits to parity groups was the only part where we “cheated” in this article. There must be some mathematical motivation behind it, which I did not get time to research and explore.

That’s why Richard Hamming was a famous mathematician but I am not :)
