Heap Building Algorithm
Introduction
Given a binary heap, either min-heap or max-heap, insertion takes $O(\log N)$ time in the worst case. To build a heap from an array of $N$ elements, if we insert each element into the heap one by one, it would take $O(N \log N)$ time. However, there is actually a heap building algorithm that can build a heap from an array of $N$ elements in $O(N)$ time.
In this blog post, I would like to discuss the $O(N)$ heap building algorithm and its asymptotic complexity tight bound.
Heap Building Algorithm
The binary heap requires no special data structure to store the elements. It can just be stored in an array. Suppose the binary heap tree node is indexed from top to bottom, left to right, then the array representation of the binary heap is a complete binary tree. The root element is at index 0. The left child of a node at index $i$ is at index $2i + 1$ of the array, and the right child of a node at index $i$ is at index $2i + 2$ of the array. The parent of a node at index $i$ is at index $\frac{i - 1}{2}$ of the array.
The heap building algorithm is as follows:
- Start from the last non-leaf node of the binary heap, i.e., the parent of the last node at index $N - 1$, which is at index $\frac{\left( N - 1 \right) - 1}{2} = \frac{N}{2} - 1$ of the array.
- For each non-leaf node, heapify the subtree rooted at the node. The heapify operation is to compare the node with its left and right children, and swap the node with the child with the maximum value if the node is a max-heap, or the child with the minimum value if the node is a min-heap. After the swap, recursively heapify the subtree rooted at the child node.
- Continue the heapify operation for each non-leaf node from the last non-leaf node to the root node.
For example, suppose we have an array of values $[10, 5, 3, 4, 1]$, and we want to build a min-heap from the array. The initialized array is as follows:
1 | [10, 5, 3, 4, 1] |
The tree representation before building the heap is as follows:
1 | 10 |
Starting from the last non-leaf node at index $\frac{5}{2} - 1 = 1$, we heapify the subtree rooted at the node at index 1. This can be done by comparing the node at index 1 with its left and right children, and swapping the node with the minimum value of the three nodes. After the swap, the tree representation becomes:
1 | 10 |
Then we continue the heapify operation for the node at index 0. Because the heap height is 2, the swap operation can be performed at most twice. In the first step, the node at index 0 is compared with its left and right children, and swapped with the minimum value of the three nodes. After the swap, the tree representation becomes:
1 | 1 |
In the second step, the node at index 1 is compared with its left and right children, and swapped with the minimum value of the three nodes. After the swap, the tree representation becomes:
1 | 1 |
The array representation of the binary heap is:
1 | [1, 4, 3, 10, 5] |
Heap Building Algorithm Analysis
A heap of size $N$ has at most $\lceil \frac{N}{2^{h + 1}} \rceil$ nodes at height $h$. The height of the heap is $\log N$. The heap building algorithm performs the heapify operation for each non-leaf node from the last non-leaf node to the root node.
The number of operations to build the heap can be described as
$$
\begin{align}
T(N) &= \sum_{h = 0}^{\log N} \left\lceil \frac{N}{2^{h + 1}} \right\rceil \cdot O(h) \\
&= O\left( \sum_{h = 0}^{\log N} \left\lceil \frac{N}{2^{h + 1}} \right\rceil \cdot h \right) \\
&= O\left( \sum_{h = 0}^{\log N} \frac{N}{2^{h + 1}} \cdot h \right) \\
&= O\left( N \sum_{h = 0}^{\log N} \frac{h}{2^{h + 1}} \right) \\
&= O\left( \frac{1}{2} N \sum_{h = 0}^{\log N} \frac{h}{2^{h}} \right) \\
&= O\left( N \sum_{h = 0}^{\log N} \frac{h}{2^{h}} \right) \\
&= O\left( N \sum_{h = 0}^{\infty} \frac{h}{2^{h}} \right) \\
&= O\left( N \sum_{h = 0}^{\infty} h \left( \frac{1}{2} \right)^{h} \right) \\
\end{align}
$$
Given the series $\sum_{n = 0}^{\infty} x^n$, assuming $|x| < 1$, the series can be computed as
$$
\begin{align}
\sum_{n = 0}^{\infty} x^n &= \frac{1\times \left(1 - x^\infty\right)}{1 - x} \\
&= \frac{1}{1 - x}
\end{align}
$$
If we differentiate the series $\sum_{n = 0}^{\infty} x^n$, we will have
$$
\begin{align}
\sum_{n = 0}^{\infty} nx^{n - 1} &= \frac{1}{(1 - x)^2} \\
\end{align}
$$
Finally, we multiply the above series by $x$ and we will have
$$
\begin{align}
\sum_{n = 0}^{\infty} nx^{n} &= \frac{x}{(1 - x)^2} \\
\end{align}
$$
In our case, $x = \frac{1}{2}$, and we have
$$
\begin{align}
\sum_{h = 0}^{\infty} h \left( \frac{1}{2} \right)^{h} &= \frac{\frac{1}{2}}{\left(1 - \frac{1}{2}\right)^2} \\
&= 2
\end{align}
$$
$$
\begin{align}
T(N) &= O\left( N \sum_{h = 0}^{\infty} \frac{h}{2^{h}} \right) \\
&= O\left( 2N \right) \\
&= O(N)
\end{align}
$$
Therefore, the heap building algorithm has a time complexity of $O(N)$.
Heap Building Algorithm Implementation
The implementation of the heap building algorithm in C++ is as follows:
1 |
|
1 | $ g++ build_heap.cpp -o build_heap -std=c++14 |
References
Heap Building Algorithm
https://leimao.github.io/blog/Heap-Building-Asymptotic-Algorithm/