C++ STL

Introduction

The Standard Template Library (STL) is a software library originally designed by Alexander Stepanov for the C++ programming language that influenced many parts of the C++ Standard Library. It provides four components called containers, algorithms, iterators, and functions. Later, the STL has been expanded routinely but the conventions followed by STL remain the same.

In this blog post, I would like to discuss the C++ STL.

STL and Conventions

The STL is held by conventions.

  • Containers represent sequences of values, even though the values do not necessarily have to be consecutive on memory.
  • Containers provide iterators that know how to traverse the sequences they represent.
  • Algorithms use iterators to define the sequences of values on which they should operate.
  • Algorithms consume functions to become more versatile for different use cases.

These conventions cover all the four fundamental components of STL, containers, algorithms, iterators, and functions.

So what is STL? Jon Kalb’s informal definition is

Anything in the standard C++ library supporting iterators.

Because containers provide iterators and algorithms use iterators in the C++ standard library.

Containers

A container is a holder object that stores a collection of other objects (its elements). They are implemented as class templates, which allows a great flexibility in the types supported as elements.

Iterators

In the STL, pointer-like objects are known as iterators.

Iterators can be thought of as restricted generalized pointers. They are restricted, because they do not necessarily support all pointer
operations. For example, sometimes, it does not have to support both == and !=, and sometimes, it does not have to support --. They are generalized, because they do not have to be real pointers. Often the time, they are implemented as nested class inside the container class.

Algorithms

Algorithms use iterators to specify half-open ranges defining the sequences of values on which the algorithms should operate.

Algorithms could be general for most or any of the STL containers but are less optimized. For example, std::find is a general search algorithm for finding a value from all the STL containers. Its complexity is $O(N)$.

Algorithms could also be specialized for certain STL containers and are highly optimized. For example, std::unordered_map<Key,T,Hash,KeyEqual,Allocator>::find is a specialized search algorithm for finding a value from the STL container std::unordered_map<Key,T,Hash,KeyEqual,Allocator>. Its complexity is $O(1)$ on average.

Because containers are often template classes, so algorithms are also often template functions.

Functions

In practice, instead of using conventional functions (pointers to functions), we use function object and lambda expression for STL algorithm, mainly because their performances and programming experiences are much better.

Why Function Object?

Function object, when passing to a function, can be optimized as inline by the compiler, because the compiler knows explicitly the definition of the object at compile time and therefore applies code optimizations accordingly.

For example, we have the following two implementations for find_if. The former implementation takes a function pointer as predicator and the latter implementation takes a template class object as predicator.

find_if.hpp
1
2
3
4
5
6
template <class It, class T>
It find_if(It begin, It end, bool pred(T const&))
{
while (begin != end and not pred(*begin)) ++begin;
return begin;
}
find_if.hpp
1
2
3
4
5
6
template <class It, class Pred>
It find_if(It begin, It end, Pred p)
{
while (begin != end and not p(*begin)) ++begin;
return begin;
}

Why function object over pointer to function in template function?

  1. It is more general. The former one restricts that the return type must be bool, whereas the latter one allows the return type to be anything that is convertible to bool. The input type must be T const& and not anything that is convertible to T const&. The former one can only take pointer to function, but the latter one can take almost anything, including pointer to function, function object, and closure.

  2. Function object calls can be inline optimized in template function, because the compiler has to know explicitly the definitions of the object and the template function at compile time and therefore generates specialized code with optimizations accordingly. Say, we have 3 different types of function objects used in find_if, the compiler will generate 3 different specialized code blocks for the 3 different types of function objects. Pointer to function, however, cannot be inline optimized in template function, because the compiler will only in principle check the type of the pointer matches and rarely any optimization has to be applied. In this case, the compiler will generate one code block only. The performance of the one with function object is extremely likely to be faster than the one with pointer to function, because of the inline optimization.

  3. Function objects with parameters avoid code bloat. Suppose we have to create 10 thousands of different predicator for find_if to see if there is any value in the container matches certain value. Without using function objects with parameters, we will have to create 10 thousands of different functions, resulting in code bloat. Even with template function, the compiler will generate 10 thousands of different functions, still resulting in code bloat.

Manually implementing functions,

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
bool equal_1(int val)
{
return val == 1;
}

bool equal_2(int val)
{
return val == 2;
}

bool equal_3(int val)
{
return val == 3;
}

...

Even with non-type value template functions,

1
2
3
4
5
template <int target>
bool equal(int val)
{
return val == target;
}

With function objects with parameters, only one copy of the class member functions will be generated, no matter how many objects are created.

1
2
3
4
5
6
7
8
9
10
11
12
13
class Equal
{
public:
Equal(int targe) : m_targe{target}
{
}
bool operator()(int val) const
{
return val == m_target;
}
private:
int m_target;
};

Why Lambda Expression?

Similar to function object, lambda expression also supports type deduction and it could be inline optimized in template function as well. Essentially, lambda expression have all the advantages that the function object has. Besides, it is self expressive.

Summary

  • There are four fundamental components in the STL, containers, iterators, algorithms, and functions (function objects and lambda expressions).
  • Containers are class templates.
    • Data was managed by the container.
  • Iterators are pointer-like objects.
    • Pairs are often used to specify half-open ranges on which algorithms should work.
    • The “one past the end” iterator, i.e., the iterator returned by std::end, is often used to signal failure.
  • Algorithms are function templates.
    • They often encapsulate loops, e.g., find and find_if.
  • Function objects are function-like objects.
    • They are generated by classes that overload operator().
  • The STL is held together by conventions.
    • This allows for flexibility and extensibility.
Author

Lei Mao

Posted on

11-22-2022

Updated on

11-22-2022

Licensed under


Comments