If a range of N elements is already sorted, constructing a std::set, std::multiset, std::map, or std::multimap from that range has time complexity O(N); construction from an unsorted range has time complexity O(N*log(N)).

Detailed description Link to heading

We’ll discuss std::set; the same reasoning applies to std::map, std::multiset, and std::multimap.

When terms like “sorted” or “maximum” are used, they are meant with respect to the container’s comparator.

Let’s check cppreference for the std::set constructor that takes two iterators:

template<class InputIt>
set(InputIt first, InputIt last, const Compare& comp = Compare(), const Allocator& alloc = Allocator());

Complexity: N*log(N) where N is std::distance(first, last) in general, linear in N if [first, last) is already sorted by value_comp().

A regular insertion into std::set has O(log(N)) complexity. How do implementations actually get the linear case?

A std::set is typically implemented as a red-black tree, which has O(log(N)) insertion complexity (in general). This complexity is a combination of two parts, each of which has worst-case O(log(N)) complexity:

  • finding the place where the element should be inserted;
  • tree fixup: restoring properties of the red-black tree after inserting the element.

Is there a way to eliminate the logarithmic tree search for the insertion position? Yes. An RB-tree is a search tree, so if we insert a new maximum, it must become the right child of the current maximum element (in the pre-fixup state). Therefore, we can remember the address of the just-inserted node and use it to skip the full-tree search when inserting the next element. Generalizing this idea: the O(log(N)) search can be skipped entirely whenever the caller can supply the correct insertion position up front.

These suggested insertion positions are called hints. A hint is an iterator to the position before which (as close as possible) the new element will be inserted. For example, a std::set::end() hint can be used to insert a new maximum into the set, and a std::set::begin() hint a new minimum. Hints can be passed to std::set::insert and std::set::emplace_hint.

A hint is just a suggestion: implementations typically check whether the key can be inserted at (or near) the hinted position. If not, they fall back to the regular O(log(N)) insertion. If the insertion took place right before the hint, the Standard promises O(1) amortized time complexity for these functions.
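As a small, implementation-agnostic illustration, here is a loop in which the end() hint is always correct (the keys arrive in increasing order), so every insertion falls under the amortized O(1) guarantee:

#include <set>

int main() {
  std::set<int> s;
  for (int x = 0; x < 5; ++x) {
    // The keys arrive in increasing order, so s.end() is always a correct
    // hint: the new element is inserted right before it in amortized O(1).
    s.emplace_hint(s.end(), x);
  }
  // s now contains 0, 1, 2, 3, 4.
}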

libstdc++ used to “look around” the hint even when it was incorrect; nowadays an incorrect hint is simply discarded.

How is a hint checked? In tree-based implementations, the usual strategy is to check whether the new key fits between the hint and its in-order neighbors (typically prev(hint)/hint or hint/next(hint)). If it fits, the insertion position is known and no full-tree search is needed; otherwise the implementation falls back to the normal O(log(N)) search.

These checks use only a constant number of comparator invocations, but moving an iterator to its predecessor or successor (--it / ++it) is worst-case O(log(N)) in an RB-tree, because the in-order neighbor may be far from the hint in the tree. It is therefore non-trivial to guarantee amortized O(1) complexity for all hints and usage patterns, and in theory we might get O(log(N)) insertion complexity even with a correct hint in existing implementations. For example, in GCC’s libstdc++ and Clang’s libc++, inserting N elements with a correct hint that happens to be the current tree root leads to O(N*log(N)) total complexity, because each hint takes O(log(N)) time to verify. The Standard’s amortized O(1) guarantee seems to be intended mainly for typical, non-adversarial usage patterns.
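For intuition, here is a simplified, user-level sketch of that validity condition. It assumes unique int keys and a hypothetical helper name; real implementations perform the equivalent check on tree nodes using the comparator:

#include <iterator>
#include <set>

// Returns true if inserting `key` right before `hint` would keep the set ordered.
bool hint_is_valid(const std::set<int>& s, std::set<int>::const_iterator hint, int key) {
  // The element preceding the hint (if any) must be smaller than the key.
  // Note that this std::prev is exactly the step that can cost O(log(N)).
  if (hint != s.begin() && !(*std::prev(hint) < key)) {
    return false;
  }
  // The element at the hint (if the hint is not end()) must be greater.
  if (hint != s.end() && !(key < *hint)) {
    return false;
  }
  return true;
}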

Implementations may optimize for commonly used hints such as std::set::begin() and std::set::end(). GCC’s libstdc++ maintains cached pointers to both the minimum and maximum nodes, so validating a std::set::end() hint is cheap. libc++ caches the minimum but not the maximum, so an O(log(N)) descent from the root is needed to find the maximum node and validate a std::set::end() hint.

However, it is possible in theory to make hint validation O(1) in the worst case: have the tree nodes additionally form an in-order doubly linked list, so that both neighbors of any hint are one pointer dereference away. This costs extra memory (two pointers per node) and extra bookkeeping on every insertion and erasure, so in practice it is generally not considered worth it.
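A hypothetical node layout for such a “threaded” tree could look like this (an illustration of the idea only, not code from libstdc++ or libc++):

// Hypothetical red-black tree node that also participates in an in-order
// doubly linked list. With prev/next, both in-order neighbors of any hint are
// one pointer away, so hint validation never needs to walk the tree; the cost
// is two extra pointers per node plus bookkeeping on every insert and erase.
struct ThreadedNode {
  int key;
  bool is_red = true;
  ThreadedNode* parent = nullptr;
  ThreadedNode* left = nullptr;
  ThreadedNode* right = nullptr;
  ThreadedNode* prev = nullptr;  // in-order predecessor
  ThreadedNode* next = nullptr;  // in-order successor
};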

The remaining part is tree rebalancing (fixup). The rebalancing work is O(1) amortized per insertion over a sequence of insertions into an empty red-black tree. The proof for this can be found, for example, here (theorem 9.2).

With this in mind, we can think of the following range construction algorithm. Insert the elements one by one, keeping track of the current maximum value in the container. If the value to insert is greater than the current maximum, the insertion position is already known: the new node becomes the right child of the maximum node, followed by tree fixup. Otherwise, do a normal insertion. For sorted ranges, the insertion position of every element is known up front, which gives O(N) total construction complexity; for unsorted ranges, performance degrades to O(N*log(N)). This is exactly what Clang’s libc++ does. GCC’s libstdc++ does something slightly different: it always passes std::set::end() as a hint. That works well for them because their implementation always maintains a pointer to the maximum node, so verifying the std::set::end() hint is cheap.
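Expressed with the public API, the branching logic looks roughly like the sketch below. This is only an illustration: a real implementation works directly on tree nodes and keeps the maximum node at hand instead of re-reading it through rbegin():

#include <set>
#include <vector>

std::set<int> construct_from_range(const std::vector<int>& values) {
  std::set<int> s;
  for (int x : values) {
    if (s.empty() || *s.rbegin() < x) {
      // x is a new maximum: end() is a correct hint, so the insertion
      // position is known and the Standard guarantees amortized O(1).
      s.insert(s.end(), x);
    } else {
      // Not a new maximum: fall back to a regular O(log(N)) insertion.
      s.insert(x);
    }
  }
  return s;
}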

Note that the range overload of std::set::insert doesn’t provide a linear complexity guarantee for sorted ranges; the Standard only requires O(N*log(size() + N)), where N is the number of inserted elements.

Example Link to heading

The complexities written in comments are the complexities guaranteed by the Standard:

#include <iterator>
#include <ranges>
#include <set>

int main() {
  const auto v = std::views::iota(0, 10000);  // 0 ... 9999

  // O(N)
  std::set<int> s1{std::cbegin(v), std::cend(v)};

  // O(N*log(N))
  std::set<int> s2;
  for (int x : v) {
    s2.insert(x);
  }

  // O(N)
  std::set<int> s3;
  for (int x : v) {
    s3.insert(std::cend(s3), x);
  }
}

Benchmarking Link to heading

Code Link to heading

The online benchmark results and code are here.

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <set>
#include <vector>

#include <benchmark/benchmark.h>

constexpr int N = 1 << 12;

std::vector<int> make_sorted(std::size_t n) {
  std::vector<int> v(n);
  std::iota(v.begin(), v.end(), 0);
  return v;
}

// Generates the same unsorted vector for different STL implementations
// for a fair comparison.
std::vector<int> make_unsorted(std::size_t n) {
  std::vector<int> v(n);
  std::mt19937_64 mt{};
  for (int& x : v) {
    x = static_cast<int>(mt());
  }
  return v;
}

static void constructor_sorted(benchmark::State& state) {
  const std::size_t n = static_cast<std::size_t>(state.range(0));
  const std::vector<int> v = make_sorted(n);
  for (auto _ : state) {
    std::set<int> s(v.cbegin(), v.cend());
    benchmark::DoNotOptimize(s.size());
  }
}
BENCHMARK(constructor_sorted)->Arg(N);

static void insert_sorted_begin_hint(benchmark::State& state) {
  const std::size_t n = static_cast<std::size_t>(state.range(0));
  std::vector<int> v = make_sorted(n);
  std::reverse(std::begin(v), std::end(v));
  for (auto _ : state) {
    std::set<int> s;
    for (int x : v) {
      s.insert(s.begin(), x);
    }
    benchmark::DoNotOptimize(s.size());
  }
}
BENCHMARK(insert_sorted_begin_hint)->Arg(N);

static void insert_sorted_end_hint(benchmark::State& state) {
  const std::size_t n = static_cast<std::size_t>(state.range(0));
  const std::vector<int> v = make_sorted(n);
  for (auto _ : state) {
    std::set<int> s;
    for (int x : v) {
      s.insert(s.end(), x);
    }
    benchmark::DoNotOptimize(s.size());
  }
}
BENCHMARK(insert_sorted_end_hint)->Arg(N);

static void insert_sorted_range(benchmark::State& state) {
  const std::size_t n = static_cast<std::size_t>(state.range(0));
  const std::vector<int> v = make_sorted(n);
  for (auto _ : state) {
    std::set<int> s;
    s.insert(std::cbegin(v), std::cend(v));
    benchmark::DoNotOptimize(s.size());
  }
}
BENCHMARK(insert_sorted_range)->Arg(N);

static void insert_sorted_no_hint(benchmark::State& state) {
  const std::size_t n = static_cast<std::size_t>(state.range(0));
  const std::vector<int> v = make_sorted(n);
  for (auto _ : state) {
    std::set<int> s;
    for (int x : v) {
      s.insert(x);
    }
    benchmark::DoNotOptimize(s.size());
  }
}
BENCHMARK(insert_sorted_no_hint)->Arg(N);

static void constructor_unsorted(benchmark::State& state) {
  const std::size_t n = static_cast<std::size_t>(state.range(0));
  const std::vector<int> v = make_unsorted(n);
  for (auto _ : state) {
    std::set<int> s(v.begin(), v.end());
    benchmark::DoNotOptimize(s.size());
  }
}
BENCHMARK(constructor_unsorted)->Arg(N);

static void insert_unsorted_begin_hint(benchmark::State& state) {
  const std::size_t n = static_cast<std::size_t>(state.range(0));
  const std::vector<int> v = make_unsorted(n);
  for (auto _ : state) {
    std::set<int> s;
    for (int x : v) {
      s.insert(s.begin(), x);
    }
    benchmark::DoNotOptimize(s.size());
  }
}
BENCHMARK(insert_unsorted_begin_hint)->Arg(N);

static void insert_unsorted_end_hint(benchmark::State& state) {
  const std::size_t n = static_cast<std::size_t>(state.range(0));
  const std::vector<int> v = make_unsorted(n);
  for (auto _ : state) {
    std::set<int> s;
    for (int x : v) {
      s.insert(s.end(), x);
    }
    benchmark::DoNotOptimize(s.size());
  }
}
BENCHMARK(insert_unsorted_end_hint)->Arg(N);

static void insert_unsorted_range(benchmark::State& state) {
  const std::size_t n = static_cast<std::size_t>(state.range(0));
  const std::vector<int> v = make_unsorted(n);
  for (auto _ : state) {
    std::set<int> s;
    s.insert(std::cbegin(v), std::cend(v));
    benchmark::DoNotOptimize(s.size());
  }
}
BENCHMARK(insert_unsorted_range)->Arg(N);

static void insert_unsorted_no_hint(benchmark::State& state) {
  const std::size_t n = static_cast<std::size_t>(state.range(0));
  const std::vector<int> v = make_unsorted(n);
  for (auto _ : state) {
    std::set<int> s;
    for (int x : v) {
      s.insert(x);
    }
    benchmark::DoNotOptimize(s.size());
  }
}
BENCHMARK(insert_unsorted_no_hint)->Arg(N);

The charts below are for Clang 17:

  • Chart: std::set, 4096 elements, libc++
  • Chart: std::set, 4096 elements, libstdc++
  • Chart: std::set, 65536 elements, libc++
  • Chart: std::set, 65536 elements, libstdc++

My local benchmarks with:

  • Clang 18.1.3 with flags -O3 -std=c++23 -DNDEBUG
  • libc++ 18.1.3
  • libstdc++ 13.3.0

have shown the same patterns. These patterns do not change if we use a different number of inserted elements and/or a PMR set with a monotonic buffer (which eliminates possible memory allocation effects). GCC with libstdc++ gives the same numbers as Clang with libstdc++.

Analysis Link to heading

The benchmarking results are interesting. With GCC’s libstdc++, the range constructor and sequential insertion with std::set::begin()/std::set::end() hints perform similarly in both the sorted and the unsorted case.

With Clang’s libc++ and unsorted input, sequential insertion with std::set::begin() as a hint doesn’t differ much from the range constructor. For sorted input, the range constructor performs similarly to sequential insertion with a std::set::end() hint, while sequential insertion with a std::set::begin() hint is faster than both. So, for libc++, inserting a sorted range sequentially with a std::set::begin() hint appears to be faster than the range constructor.

In both libc++ and libstdc++, std::set, std::multiset, std::map, and std::multimap reuse the same underlying tree implementation, so their performance characteristics should be similar as well. I ran a couple of benchmarks for the other three containers, and in each case sequential insertion with a begin() hint was the fastest approach in libc++.

I don’t currently know why the range constructor is slower than insertion with a std::set::begin() hint in libc++. The reason insertion with correct std::set::begin() hints beats insertion with correct std::set::end() hints in libc++ is, as mentioned above, that libc++ caches only the minimum node and has to descend from the root to find the maximum. GCC’s libstdc++, in contrast, maintains pointers to both the minimum and maximum nodes, which is why the gap between begin- and end-hinted insertion is much smaller there.

If this libc++ observation also holds for other data patterns, a straightforward way to speed up the libc++ implementation would be to iterate over the range in reverse order (when bidirectional iterators are available) and insert each value with a std::set::begin() hint instead of the current std::set::end()-like strategy; a user-level sketch of this idea is shown below. Of course, more thorough testing on different kinds of input data and on different platforms would be needed. If this is a known behavior or there is an existing bug report, please let me know.
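Here is that sketch. It assumes the input is an already sorted std::vector<int> and uses only the public API; it is an illustration of the idea, not a patch to libc++:

#include <ranges>
#include <set>
#include <vector>

std::set<int> build_from_sorted(const std::vector<int>& sorted) {
  std::set<int> s;
  // Insert from largest to smallest: every element is a new minimum, so
  // begin() is always a correct hint, and libc++ can validate it in O(1)
  // thanks to its cached pointer to the leftmost node.
  for (int x : sorted | std::views::reverse) {
    s.insert(s.begin(), x);
  }
  return s;
}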