Commit 99cd4060 authored by GILLES Sebastien

Containers: insist on reserve/resize difference.

parent 5e1bf353
%% Cell type:markdown id: tags:
# [Getting started in C++](/) - [Useful concepts and STL](/notebooks/5-UsefulConceptsAndSTL/0-main.ipynb) - [Containers](/notebooks/5-UsefulConceptsAndSTL/3-Containers.ipynb)
%% Cell type:markdown id: tags:
<h1>Table of contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1">Introduction</a></span></li><li><span><a href="#std::vector" data-toc-modified-id="std::vector-2"><code>std::vector</code></a></span><ul class="toc-item"><li><span><a href="#Allocator-template-parameter" data-toc-modified-id="Allocator-template-parameter-2.1">Allocator template parameter</a></span></li><li><span><a href="#Most-used-constructors" data-toc-modified-id="Most-used-constructors-2.2">Most used constructors</a></span></li><li><span><a href="#Size" data-toc-modified-id="Size-2.3">Size</a></span></li><li><span><a href="#Adding-new-elements" data-toc-modified-id="Adding-new-elements-2.4">Adding new elements</a></span></li><li><span><a href="#Direct-access:-operator[]-and-at()" data-toc-modified-id="Direct-access:-operator[]-and-at()-2.5">Direct access: <code>operator[]</code> and <code>at()</code></a></span></li><li><span><a href="#Under-the-hood:-storage-and-capacity" data-toc-modified-id="Under-the-hood:-storage-and-capacity-2.6">Under the hood: storage and capacity</a></span></li><li><span><a href="#reserve()-and-resize()" data-toc-modified-id="reserve()-and-resize()-2.7"><code>reserve()</code> and <code>resize()</code></a></span></li><li><span><a href="#std::vector-as-a-C-array" data-toc-modified-id="std::vector-as-a-C-array-2.8"><code>std::vector</code> as a C array</a></span></li><li><span><a href="#Iterators" data-toc-modified-id="Iterators-2.9">Iterators</a></span></li></ul></li><li><span><a href="#Incrementing-/-decrementing-iterators" data-toc-modified-id="Incrementing-/-decrementing-iterators-3">Incrementing / decrementing iterators</a></span></li><li><span><a href="#Other-containers" data-toc-modified-id="Other-containers-4">Other containers</a></span></li></ul></div>
%% Cell type:markdown id: tags:
## Introduction
Containers are the standard answer to a very common problem: how to store a collection of homogeneous data, while ensuring the kind of safety RAII provides.
In this chapter, I won't deal with **associative containers** - which will be handled in the [very next chapter](/notebooks/5-UsefulConceptsAndSTL/4-AssociativeContainers.ipynb).
## `std::vector`
The container of choice, which I haven't resisted using a little in previous examples so far...
### Allocator template parameter
Its full prototype is:
````
template
<
class T,
class Allocator = std::allocator<T>
> class vector;
````
where the second template argument specifies how the memory is allocated. Most of the time the default value is fine, so in practice you usually provide only the type of the stored elements, e.g. `std::vector<double>`.
### Most used constructors
%% Cell type:markdown id: tags:
* Default constructor: no element inside.
%% Cell type:code id: tags:
``` C++17
#include <vector>
{
std::vector<double> bar;
}
```
%% Cell type:markdown id: tags:
* Constructor with a given number of elements. The elements are value-initialized in this case (so 0. for `double`).
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <iostream>
{
std::vector<double> bar(3);
for (auto item : bar)
std::cout << item << std::endl;
}
```
%% Cell type:markdown id: tags:
* Constructor with a given number of elements and the value to assign to each of them.
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <iostream>
{
std::vector<double> bar(3, 4.3);
for (auto item : bar)
std::cout << item << std::endl;
}
```
%% Cell type:markdown id: tags:
* Since C++11, constructor with the initial content (given as an initializer list).
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <iostream>
{
std::vector<int> foo { 3, 5, 6 };
for (auto item : foo)
std::cout << item << std::endl;
}
```
%% Cell type:markdown id: tags:
* And of course copy and move constructors.
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <iostream>
{
std::vector<int> foo { 3, 5, 6 };
std::vector<int> bar { foo };
for (auto item : bar)
std::cout << item << std::endl;
}
```
%% Cell type:markdown id: tags:
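The cell above only shows copy construction. As a complement, here is a minimal sketch of move construction; the helper name `StealContent` is hypothetical and only serves the illustration. The new vector takes over the storage of the source instead of copying each element, and in practice the source is left empty.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper (not part of the STL): move-constructs a new vector from `source`.
std::vector<int> StealContent(std::vector<int>& source)
{
    // std::move() casts `source` to an rvalue, so the move constructor is selected:
    // the underlying dynamic array is handed over and no element is copied.
    std::vector<int> target { std::move(source) };
    return target;
}
```

After a call such as `auto bar = StealContent(foo);`, `bar` holds the former content of `foo`, and `foo` should only be reassigned or destroyed.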
### Size
A useful perk is that, in true object paradigm, `std::vector` knows its size at every moment (in C with dynamic arrays you needed to keep track of the size independently: the array was actually a pointer which indicates where the array started, but absolutely not where it ended).
The method to know it is `size()`:
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <iostream>
{
std::vector<int> foo { 3, 5, 6 };
std::cout << "Size = " << foo.size() << std::endl;
}
```
%% Cell type:markdown id: tags:
### Adding new elements
`std::vector` provides an easy and (most of the time) cheap way to add an element **at the end of the array**. The method to add a new element is `push_back`:
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <iostream>
{
std::vector<int> foo { 3, 5, 6 };
std::cout << "Size = " << foo.size() << std::endl;
foo.push_back(7);
std::cout << "Size = " << foo.size() << std::endl;
}
```
%% Cell type:markdown id: tags:
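There is also an `insert()` method to add an element anywhere, but it is not very efficient: all the elements located after the insertion point must be shifted, and a reallocation may occur as well (see capacity below).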
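A minimal sketch of `insert()` in use; the wrapper `InsertAt` and its index-based interface are assumptions made for the illustration (the actual method expects an iterator giving the position, a notion covered in the Iterators section).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: inserts `value` so that it ends up at position `index` (0-based).
// `index` is assumed to be at most vec.size().
void InsertAt(std::vector<int>& vec, std::size_t index, int value)
{
    using difference_type = std::vector<int>::difference_type;

    // insert() takes an iterator to the position *before which* the value is added.
    // Every element from `index` onward is shifted by one slot, hence the linear cost.
    vec.insert(vec.begin() + static_cast<difference_type>(index), value);
}
```

With `foo` containing 3, 5, 6, the call `InsertAt(foo, 1, 4)` shifts 5 and 6 to make room for 4 at index 1.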
%% Cell type:markdown id: tags:
### Direct access: `operator[]` and `at()`
`std::vector` provides a direct access to an element through an index (that is not true for all containers) with the `operator[]`:
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <iostream>
{
std::vector<int> foo { 3, 5, 6 };
std::cout << "foo[1] = " << foo[1] << std::endl;
}
```
%% Cell type:markdown id: tags:
Direct access is not checked: if you go beyond the size of the vector you enter undefined behaviour territory:
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <iostream>
{
std::vector<int> foo { 3, 5, 6 };
std::cout << "foo[4] = " << foo[4] << std::endl; // undefined territory
}
```
%% Cell type:markdown id: tags:
A specific method `at()` exists that performs the adequate check and throws a `std::out_of_range` exception if needed:
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <iostream>
{
std::vector<int> foo { 3, 5, 6 };
std::cout << "foo.at(4) = " << foo.at(4) << std::endl; // exception thrown
}
```
%% Cell type:markdown id: tags:
I do not necessarily recommend it: I would rather check the index is correct with an `assert`, which provides the runtime check in debug mode only and doesn't slow down the code in release mode.
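A minimal sketch of the `assert`-based alternative; the helper name `GetItem` is hypothetical. The check is compiled only when `NDEBUG` is not defined, which is typically the case in debug builds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: access with an index check that exists in debug mode only.
int GetItem(const std::vector<int>& vec, std::size_t index)
{
    // The macro expands to nothing when NDEBUG is defined (usual release setting),
    // so the access below then costs exactly the same as a raw operator[].
    assert(index < vec.size() && "Index out of bounds!");
    return vec[index];
}
```

The trade-off is that an invalid index in release mode is back to undefined behaviour, so this approach assumes the debug build is actually exercised by tests.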
%% Cell type:markdown id: tags:
### Under the hood: storage and capacity
In practice, `std::vector` is a dynamic array allocated with safety through the use of RAII.
To make `push_back` an O(1) operation most of the time (its cost is said to be *amortized* constant), slightly more memory than what you want to use is allocated.
The `capacity()` must not be mistaken for the `size()`:
* `size()` is the number of elements in the array and might be of use for the end-user.
* `capacity()` is more internal: it is the number of elements that fit in the memory currently allocated by the container, which is usually a bit larger than the size to make room for a few new elements.
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <iostream>
{
std::vector<std::size_t> foo;
for (auto i = 0ul; i < 10ul; ++i)
{
std::cout << "Vector: size = " << foo.size() << " and capacity = " << foo.capacity() << std::endl;
foo.push_back(i);
}
}
```
%% Cell type:markdown id: tags:
The pattern for capacity is clear here but is not dictated by the standard: it is up to the STL vendor to choose the way it deals with it.
So what happens when the capacity is reached and a new element is added?
* A new dynamic array with the new capacity is created.
* Each element of the former dynamic array is **copied** (or possibly **moved**) into the new one.
* The former dynamic array is destroyed.
The least we can say is we're far from O(1) here! (and here we deal with a POD type, for which copy is cheap; this is not the case for certain types of objects...) So obviously it is better to avoid this operation as much as possible!
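To get a feeling for how often the steps above occur, here is a rough sketch that counts the capacity changes while elements are appended one by one. The helper name `CountReallocations` is hypothetical, and the value returned depends on the growth strategy of your STL implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: number of times the capacity changes while `Nelt` elements
// are appended one by one to an initially empty vector.
std::size_t CountReallocations(std::size_t Nelt)
{
    std::vector<std::size_t> vec;
    std::size_t count = 0;
    auto capacity = vec.capacity();

    for (std::size_t i = 0; i < Nelt; ++i)
    {
        vec.push_back(i);

        if (vec.capacity() != capacity)
        {
            // A larger array was allocated and the former content transferred into it.
            ++count;
            capacity = vec.capacity();
        }
    }

    return count;
}
```

Since the capacity typically grows geometrically, the count should stay far below the number of elements, which is why `push_back` remains cheap on average.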
### `reserve()` and `resize()`
`reserve()` is the method to set the capacity manually. When you have a clue of the expected number of elements, it is better to provide it: even if your guess is flawed, it limits the number of reallocations:
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <iostream>
{
std::vector<std::size_t> foo;
foo.reserve(5); // 10 would have been better of course!
for (auto i = 0ul; i < 10ul; ++i)
{
std::cout << "Vector: size = " << foo.size() << " and capacity = " << foo.capacity() << std::endl;
foo.push_back(i);
}
}
```
%% Cell type:markdown id: tags:
`reserve()` must not be confused with `resize()`, which changes the size of the meaningful content of the dynamic array.
%% Cell type:code id: tags:
``` C++17
#include <cassert> // for the assert used below
#include <iostream>
#include <string>
// Utility to print the content of a non-associative container.
// Don't bother with it now: it uses iterators, which we'll see a bit below.
template
<
class VectorT
>
void PrintVector(const VectorT& vector,
std::string separator = ", ", std::string opener = "[", std::string closer = "]\n")
{
auto size = vector.size();
std::cout << "Size = " << size << " Capacity = " << vector.capacity() << " Content = ";
std::cout << opener;
auto it = vector.cbegin();
auto end = vector.cend();
static_cast<void>(end); // to avoid compilation warning in release mode
for (decltype(size) i = 0u; i + 1u < size; ++it, ++i)
{
assert(it != end);
std::cout << *it << separator;
}
if (size > 0u)
std::cout << *it;
std::cout << closer;
}
```
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <iostream>
{
std::vector<std::size_t> foo { 3, 5};
PrintVector(foo);
foo.resize(8, 10); // Second optional argument gives the values to add.
PrintVector(foo);
foo.resize(3, 15);
PrintVector(foo);
}
```
%% Cell type:markdown id: tags:
As you see, `resize()` may increase or decrease the size of the `std::vector`; if it decreases it, the values at the end are lost.
You may see as well that the capacity is not adjusted accordingly; you may use the `shrink_to_fit()` method to ask for a reduction of the capacity, but the request is not binding and the library implementation may ignore it (it is honoured here):
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <iostream>
{
std::vector<std::size_t> foo { 3, 5};
PrintVector(foo);
foo.resize(8, 10); // Second optional argument gives the values to add.
PrintVector(foo);
foo.resize(3, 10);
PrintVector(foo);
foo.shrink_to_fit();
PrintVector(foo);
}
```
%% Cell type:markdown id: tags:
As a rule:
* When you use `reserve()`, it often means you intend to add new content with `push_back()`, which increases the size by 1 (the capacity remains unchanged provided the argument given to `reserve()` was estimated well).
* When you use `resize()`, you intend to modify the values in the container in place, for instance with `operator[]`, a loop or iterators.
A common mistake is to mix up both:
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <iostream>
{
std::vector<int> five_pi_digits;
five_pi_digits.resize(5);
five_pi_digits.push_back(3);
five_pi_digits.push_back(1);
five_pi_digits.push_back(4);
five_pi_digits.push_back(1);
five_pi_digits.push_back(5);
for (auto item : five_pi_digits)
std::cout << "Digit = " << item << std::endl; // Not what we intended!
}
```
%% Cell type:markdown id: tags:
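The vector above ends up with 10 elements: the 5 zeros created by `resize()` followed by the 5 digits added by `push_back()`. Here is a sketch of the two consistent ways to get the intended content; the function names are hypothetical and chosen only for the illustration.

```cpp
#include <vector>

// Option 1: reserve() only sets the capacity; push_back() creates the elements.
std::vector<int> FivePiDigitsWithReserve()
{
    std::vector<int> digits;
    digits.reserve(5); // size is still 0 here
    digits.push_back(3);
    digits.push_back(1);
    digits.push_back(4);
    digits.push_back(1);
    digits.push_back(5);
    return digits;
}

// Option 2: resize() creates 5 elements equal to 0, which are then overwritten in place.
std::vector<int> FivePiDigitsWithResize()
{
    std::vector<int> digits;
    digits.resize(5); // size is now 5
    digits[0] = 3;
    digits[1] = 1;
    digits[2] = 4;
    digits[3] = 1;
    digits[4] = 5;
    return digits;
}
```

Both functions return a vector of size 5 holding 3, 1, 4, 1, 5.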
### `std::vector` as a C array
In your code, you might at some point use a C library which deals with dynamic arrays. If the function doesn't mess with the structure of the dynamic array (by reallocating the content for instance), you may without any issue pass it a `std::vector` through its `data()` method.
**NOTE:** Xeus-cling does not support C `printf` yet; so you may try this [@Coliru](https://coliru.stacked-crooked.com/a/f55f1e11833594aa).
%% Cell type:code id: tags:
``` C++17
#include <cstdio>
// A C function
void C_PrintArray(double* array, std::size_t Nelt)
{
if (Nelt > 0ul)
{
printf("[");
for (auto i = 0ul; i < Nelt - 1; ++i)
printf("%lf, ", array[i]);
printf("%lf]", array[Nelt - 1]);
}
else
printf("[]");
}
```
%% Cell type:code id: tags:
``` C++17
#include <vector>
{
std::vector<double> cpp_vector { 3., 8., 9., -12.3, -32.35 };
C_PrintArray(cpp_vector.data(), cpp_vector.size());
}
```
%% Cell type:markdown id: tags:
`data()` was introduced in C++11; previously you could do the same with an equivalent but much less appealing call that takes the address of the first element (which, unlike `data()`, must not be used on an empty vector):
%% Cell type:code id: tags:
``` C++17
#include <vector>
{
std::vector<double> cpp_vector { 3., 8., 9., -12.3, -32.35 };
C_PrintArray(&cpp_vector[0], cpp_vector.size());
}
```
%% Cell type:markdown id: tags:
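For a non-constant vector, the pointer returned by `data()` is not constant either, so a C-style function may also modify the values in place as long as it leaves the size alone. A minimal sketch, in which both `C_ScaleArray` and `Scale` are hypothetical names:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical C-style function: multiplies each value of the array by `factor`.
void C_ScaleArray(double* array, std::size_t Nelt, double factor)
{
    for (std::size_t i = 0; i < Nelt; ++i)
        array[i] *= factor;
}

// Hypothetical C++ wrapper: the vector keeps ownership of the memory,
// the C function only reads and writes the existing elements.
void Scale(std::vector<double>& vec, double factor)
{
    C_ScaleArray(vec.data(), vec.size(), factor);
}
```

The pointer is only valid until the next operation that may reallocate the vector, so it should not be stored by the C side.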
### Iterators
**Iterators** are a useful feature that is less prominent since C++11 (albeit still very useful if you use STL algorithms) but needs to be at least acknowledged, as under the hood they are still used by the more appealing range-based [`for`](/notebooks/1-ProceduralProgramming/2-Conditions-and-loops.ipynb#New-for-loop) loops now available.
The idea of an iterator is to provide an object to navigate over all (or part of the) items of a container efficiently.
Let's forget for a while our syntactic sugar `for (auto item : container)` and see what our options are:
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <iostream>
{
std::vector<double> cpp_vector { 3., 8., 9., -12.3, -32.35 };
const auto size = cpp_vector.size();
for (auto i = 0ul; i < size; ++i)
std::cout << cpp_vector[i] << " ";
}
```
%% Cell type:markdown id: tags:
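It works, but it relies on access by index, which not all containers provide. It is also not necessarily the most efficient: at each call to `operator[]`, the program must figure out the location of the element without using the fact it has just fetched the element in the previous memory location (for `std::vector` a modern compiler usually makes the difference negligible). Iterators provide this sequential access: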
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <iostream>
{
std::vector<double> cpp_vector { 3., 8., 9., -12.3, -32.35 };
std::vector<double>::const_iterator end = cpp_vector.cend();
for (std::vector<double>::const_iterator it = cpp_vector.cbegin(); it != end; ++it)
std::cout << it - cpp_vector.cbegin() << "\t" << *it << " ";
}
```
%% Cell type:markdown id: tags:
It is potentially more efficient but quite verbose; prior to C++11 you had to use this nonetheless (`auto` alone greatly simplifies the syntax here, but it is also a C++11 addition...)
Iterators are *not* pointers, even if they behave really similarly, e.g. they may use the same `*` and `->` syntax (they might be implemented as pointers, but think of it as private inheritance in this case...)
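As an illustration of the two points above, here is a sketch of the same kind of loop written with `auto`, followed by an STL algorithm fed with iterators. The helper names `Sum` and `IndexOf` are hypothetical; `std::find` and `std::distance` are the standard functions from `<algorithm>` and `<iterator>`.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Iterator loop in which `auto` is deduced as std::vector<double>::const_iterator.
double Sum(const std::vector<double>& vec)
{
    double ret = 0.;

    for (auto it = vec.cbegin(), end = vec.cend(); it != end; ++it)
        ret += *it; // same dereferencing syntax as a pointer

    return ret;
}

// Hypothetical helper: position of the first occurrence of `value`,
// or vec.size() if it is not present.
std::size_t IndexOf(const std::vector<double>& vec, double value)
{
    // std::find returns an iterator to the match, or the end iterator if there is none.
    auto it = std::find(vec.cbegin(), vec.cend(), value);
    return static_cast<std::size_t>(std::distance(vec.cbegin(), it));
}
```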
There are several flavors:
* Constant iterators, used here, with which you can only read the value under the iterator.
* Iterators, with which you can also modify it.
* Reverse iterators, to iterate the container from the last to the first.
Each container provides methods that return iterators at noteworthy positions:
* `begin()` points to the very first element of the container.
* `end()` is **after** the last element of the container.
* `cbegin()` does the same job as `begin()` but always returns a constant iterator; prior to C++11 a constant iterator was obtained, rather confusingly, by calling `begin()` on a constant container.
* `cend()`: you might probably figure it out...
* `rbegin()` points to the very last element of the container.
* `rend()` is **before** the first element of the container.
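A short sketch of the non-constant and reverse flavors, assuming a `std::vector<int>`; the function names `DoubleValues` and `ReversedCopy` are hypothetical.

```cpp
#include <vector>

// Non-constant iterators: the value under the iterator may be modified.
void DoubleValues(std::vector<int>& vec)
{
    for (auto it = vec.begin(); it != vec.end(); ++it)
        *it *= 2;
}

// Reverse iterators: ++ moves from the last element toward the first one.
std::vector<int> ReversedCopy(const std::vector<int>& vec)
{
    std::vector<int> ret;
    ret.reserve(vec.size()); // the final size is known, so no reallocation occurs

    for (auto it = vec.crbegin(); it != vec.crend(); ++it)
        ret.push_back(*it);

    return ret;
}
```

`crbegin()` and `crend()` are the constant counterparts of `rbegin()` and `rend()`, following the same naming scheme as `cbegin()` and `cend()`.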
What is tricky with them is that they may become invalid if some operations are performed on the container in the meantime. For instance, if a `std::vector` is extended and a reallocation occurs, all its iterators become invalid. Therefore, code like:
%% Cell type:code id: tags:
``` C++17
#include <vector>
{
std::vector<int> vec { 2, 3, 4, 5, 7, 18 };
for (auto item : vec)
{
if (item % 2 == 0)
vec.push_back(item + 2); // don't do that!
}
PrintVector(vec);
}
```
%% Cell type:markdown id: tags:
is undefined behaviour: it might work (it seems to in this notebook) but is not robust. Here it "works", but you may see the iteration is done over the initial vector only; the additional values aren't iterated over (fortunately: otherwise we would end up with an infinite loop in this case).
So, the bottom line is you should really separate actions that modify the structure of a container and iteration over it.
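One possible way to apply this separation to the example above is sketched below: the new values are first collected in a temporary vector, then appended once the iteration is over. The function name `AppendEvenPlusTwo` is hypothetical.

```cpp
#include <vector>

// Hypothetical helper: for each even item of `vec`, appends (item + 2) at the end of `vec`.
void AppendEvenPlusTwo(std::vector<int>& vec)
{
    std::vector<int> to_add;

    // Step 1: iterate without touching the structure of `vec`.
    for (auto item : vec)
    {
        if (item % 2 == 0)
            to_add.push_back(item + 2);
    }

    // Step 2: the iteration is over, so extending `vec` is now safe.
    vec.insert(vec.end(), to_add.cbegin(), to_add.cend());
}
```

With the vector 2, 3, 4, 5, 7, 18 used above, the values 4, 6 and 20 are appended at the end.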
## Incrementing / decrementing iterators
As for POD types, both pre- and post-increment are available:
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <iostream>
{
std::vector<double> cpp_vector { 3., 8., 9., -12.3, -32.35 };
std::vector<double>::const_iterator end = cpp_vector.cend();
for (std::vector<double>::const_iterator it = cpp_vector.cbegin(); it != end; ++it) // pre-increment
std::cout << *it << " ";
std::cout << std::endl;
for (std::vector<double>::const_iterator it = cpp_vector.cbegin(); it != end; it++) // post-increment
std::cout << *it << " ";
}
```
%% Cell type:markdown id: tags:
Unsurprisingly the result is the same... but the efficiency may not be: post-increment must make a copy of the iterator to return its former value, whereas pre-increment just modifies the current one. For simple iterators the compiler often optimizes the copy away, but this is not guaranteed; so if you do not care about pre- or post-increment (as in the case above), stick with pre-increment.
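To see where the copy comes from, here is a sketch of how both operators are conventionally written for a user-defined type. The class `Counter` is hypothetical and is not an actual STL iterator; it only mimics the usual pattern.

```cpp
// Hypothetical minimal class used only to illustrate both increment operators.
class Counter
{
public:
    explicit Counter(int value) : value_(value) { }

    // Pre-increment: modifies the object in place and returns a reference to it.
    Counter& operator++()
    {
        ++value_;
        return *this;
    }

    // Post-increment: the unused int parameter is the language convention
    // that distinguishes it from pre-increment.
    Counter operator++(int)
    {
        Counter copy(*this); // the former state must be kept to be returned
        ++value_;
        return copy;
    }

    int Value() const { return value_; }

private:
    int value_;
};
```

For an object as small as this one the copy is negligible, but the same pattern applied to a heavier iterator is what makes post-increment potentially more costly.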
%% Cell type:markdown id: tags:
## Other containers
`std::vector` is not the only possible choice; I will present very briefly the other possibilities here:
* `std::list`: A doubly-linked list: the idea is that each element knows the address of the element before and the element after. It might be considered if you often need to add elements at specific locations in the list: adding a new element just amounts to changing 2 pointers and setting 2 new ones, for instance. You can't directly access an element by its index with a `std::list`.
* `std::forward_list`: A singly-linked list (introduced in C++11): similar to a `std::list` except only the pointer to the next element is kept.
* `std::deque`: For "double ended queue"; this container may be helpful if you need to store a really huge amount of data that might not fit within a `std::vector`. It might also be of use if you often add elements at the front of the container: there is a `push_front()` method as well as a `push_back()` one. Item 18 of \cite{Meyers2001} recommends using `std::deque` with `bool`: `std::vector<bool>` was an experiment to provide a specific implementation to spare memory that went wrong...
* `std::array`: You should use this one if the number of elements is known at compile time and doesn't change at all.
* `std::string`: Yes, it is actually a container! I will not tell much more about it; just add that, along with `std::vector` and `std::array`, it is one of the few containers that guarantee contiguous storage.
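A brief sketch of two of the operations mentioned in the list above: adding at the front of a `std::deque` and inserting in the middle of a `std::list`. The function names `BuildDeque` and `BuildList` are hypothetical.

```cpp
#include <deque>
#include <iterator>
#include <list>

// std::deque: elements may be added cheaply at both ends.
std::deque<int> BuildDeque()
{
    std::deque<int> ret { 5 };
    ret.push_back(6);
    ret.push_front(3);
    return ret; // content is 3, 5, 6
}

// std::list: insertion at a known position does not shift the other elements.
std::list<int> BuildList()
{
    std::list<int> ret { 3, 6 };
    auto it = std::next(ret.begin()); // iterator to the second element (6)
    ret.insert(it, 5);                // 5 is inserted before 6
    return ret; // content is 3, 5, 6
}
```

Note that reaching the insertion position in a `std::list` still requires walking through the list, as there is no access by index.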
%% Cell type:markdown id: tags:
# References
(<a id="cit-Meyers2001" href="#call-Meyers2001">Meyers, 2001</a>) Scott Meyers, _Effective STL: 50 Specific Ways to Improve Your Use of the Standard Template Library_, 2001.
%% Cell type:markdown id: tags:
© _CNRS 2016_ - _Inria 2018-2019_
_This notebook is an adaptation of a lecture prepared by David Chamont (CNRS) under the terms of the licence [Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](http://creativecommons.org/licenses/by-nc-sa/4.0/)_
_The present version has been written by Sébastien Gilles and Vincent Rouvreau (Inria)_