Half-Open Range for Algorithm Implementation

Introduction

Consider implementing a binary search algorithm for a sorted $0$-indexed array of size $n$, we would use index to retrieve the value from the array and compare with the target value $v$. We will start from the two-ends of the array with two indices $i$ and $j$. There are usually two choices for the array representation.

  1. Half-open range. An array is represented using $[i, j)$, i.e., elements with index in $[i, j-1]$ belong to the array.
  2. None-open range. An array is represented using $[i, j]$, i.e., elements with index in $[i, j]$ belong to the array.

These two array representations could also be extended to the implementation of other algorithms. Further more, the index notion could be somewhat translated more generally to C/C++ pointers and iterators used in various kind of containers.

In this blog post, I would like to discuss the difference between the half-open range representation and the none-open range representation using the binary search algorithm implementations as an example and which representation we should use in practice.

Binary Search Algorithm

Implementations

Let’s see how we will implement the binary search algorithm using the half-open range representation and the none-open range representation.

binary_search.cpp
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
#include <cassert>
#include <iostream>
#include <vector>

std::vector<int> create_ascending_vector(size_t size)
{
std::vector<int> vec(size);
for (size_t i = 0; i < size; i++)
{
vec[i] = i;
}
return vec;
}

template <typename T>
size_t binary_search_ascending_half_open(std::vector<T> const& vec,
T const& target)
{
size_t i{0};
size_t j{vec.size()};
size_t k{0};
while (i != j)
{
k = (i + j) / 2;
if (vec[k] == target)
{
return k;
}
else if (vec[k] < target)
{
i = k + 1;
}
else
{
j = k;
}
}
// Now i == j and the it is an empty array now.
// Return index vec.size() to indicate that the target was not found.
return vec.size();
}

template <typename T>
size_t binary_search_ascending_none_open(std::vector<T> const& vec,
T const& target)
{
size_t i{0};
size_t j{vec.size() - 1};
size_t k{0};
while (i != j)
{
k = (i + j) / 2;
if (vec[k] == target)
{
return k;
}
else if (vec[k] < target)
{
i = k + 1;
}
else
{
j = k - 1;
}
}
// Now i == j and there is only one element left in the array
// which has not been checked against the target.
if (vec[i] == target)
{
return i;
}
// Return index vec.size() to indicate that the target was not found.
return vec.size();
}

int main()
{
size_t const size{100};
std::vector<int> const vec{create_ascending_vector(size)};
for (int i = 0; i < size; i++)
{
assert(binary_search_ascending_half_open(vec, i) == i);
}
for (int i = 0; i < size; i++)
{
assert(binary_search_ascending_none_open(vec, i) == i);
}
}

Half-Open Range VS None-Open Range

It turns out that the none-open range implementation is logically more complicated than the half-open range implementation. When I was creating the implementations, I completed half-open range implementation pretty quickly. However, I spent much more time on the none-open range implementation and I failed the simple unit tests a couple of times before I finally implemented it correctly.

Fundamentally, that the none-open range implementation is more complicated is because there is no way for none-open range to represent an empty array. For the half-open range array representation, $[i, j)$ is a valid representation for an array for $0 \leq i \leq j \leq n$. In particular, when $i = j$, $[i, j)$ represents an empty array. However, for the none-open range array representation, $[i, j]$ is a valid representation for an array for $0 \leq i \leq j \leq n - 1$. But there is no way to represent an empty array mathematically. When $i = j$, $[i, j]$ always represents an array containing one element.

Conclusions

Always use the half-open range representation for all different kind of containers and their related algorithms. In fact, the notion of half-open range is fundamental to C++ STL and other programming languages.

Half-Open Range for Algorithm Implementation

https://leimao.github.io/blog/Half-Open-Range/

Author

Lei Mao

Posted on

10-30-2022

Updated on

10-30-2022

Licensed under


Comments