Commit 7ce4d84e authored by ROUVREAU Vincent

Merge branch 'sebastien_note_2024_day4' into 'master'

Fixes from my notes of the last day of March 2024 session

See merge request !120
parents 766c27ea 964aee53
%% Cell type:markdown id: tags:
# [Getting started in C++](./) - [Useful concepts and STL](./0-main.ipynb) - [Containers](./3-Containers.ipynb)
%% Cell type:markdown id: tags:
## Introduction
Containers are the standard answer to a very common problem: how to store a collection of homogeneous data, while ensuring the kind of safety RAII provides.
In this chapter, I won't deal with **associative containers** - which will be handled in the [very next chapter](/notebooks/5-UsefulConceptsAndSTL/4-AssociativeContainers.ipynb).
## `std::vector`
The container of choice, which I haven't resisted using a little in previous examples so far...
### Allocator template parameter
Its full prototype is:
```c++
template
<
class T,
class Allocator = std::allocator<T>
> class vector;
```
where the second template argument specifies how the memory is allocated. Most of the time the default value is fine, so in practice you usually specify only the stored type, e.g. `std::vector<double>`.
### Most used constructors
%% Cell type:markdown id: tags:
* Empty constructors: no element inside.
%% Cell type:code id: tags:
``` c++
#include <vector>
{
std::vector<double> bar;
}
```
%% Cell type:markdown id: tags:
* Constructor with a given number of elements. The elements are value-initialized in this case (so `0.` for a `double`).
%% Cell type:code id: tags:
``` c++
#include <vector>
#include <iostream>
{
std::vector<double> bar(3);
for (auto item : bar)
std::cout << item << std::endl;
}
```
%% Cell type:markdown id: tags:
* Constructor with a given number of elements and the value to give to each of them.
%% Cell type:code id: tags:
``` c++
#include <vector>
#include <iostream>
{
std::vector<double> bar(3, 4.3);
for (auto item : bar)
std::cout << item << std::endl;
}
```
%% Cell type:markdown id: tags:
* Since C++ 11, constructor with the initial content (prior to C++ 11 you had to use an empty constructor and then add the elements one by one with something like `push_back` (see below), or use a third-party library such as Boost.Assign).
%% Cell type:code id: tags:
``` c++
#include <vector>
#include <iostream>
{
std::vector<int> foo { 3, 5, 6 };
for (auto item : foo)
std::cout << item << std::endl;
}
```
%% Cell type:markdown id: tags:
* And of course copy (and move - that we will present soon...) constructions
%% Cell type:code id: tags:
``` c++
#include <vector>
#include <iostream>
{
std::vector<int> foo { 3, 5, 6 };
std::vector<int> bar { foo };
for (auto item : bar)
std::cout << item << std::endl;
}
```
%% Cell type:markdown id: tags:
### Size
A useful perk is that, in true object-oriented fashion, `std::vector` knows its size at all times (in C with dynamic arrays you needed to keep track of the size independently: the array was actually a pointer which indicates where the array starts, but absolutely not where it ends).
The method to know it is `size()`:
%% Cell type:code id: tags:
``` c++
#include <vector>
#include <iostream>
{
std::vector<int> foo { 3, 5, 6 };
std::cout << "Size = " << foo.size() << std::endl;
}
```
%% Cell type:markdown id: tags:
### Adding new elements
`std::vector` provides an easy and (most of the time) cheap way to add an element **at the end of the array**. The method to add a new element is `push_back`:
%% Cell type:code id: tags:
``` c++
#include <vector>
#include <iostream>
{
std::vector<int> foo { 3, 5, 6 };
std::cout << "Size = " << foo.size() << std::endl;
foo.push_back(7);
std::cout << "Size = " << foo.size() << std::endl;
}
```
%% Cell type:markdown id: tags:
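There is also an `insert()` method to add an element anywhere, but it is not very efficient: all the elements located after the insertion point must be shifted, on top of a possible reallocation (see capacity below).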
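Here is a minimal sketch of `insert()`, assuming we want to address the position by an index; the helper name `InsertAt` is hypothetical and not part of the STL. `insert()` takes an iterator and places the new element *before* it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the STL): inserts `value` before the element located at `index`.
// The vector is taken by value so that the caller's container is left untouched.
std::vector<int> InsertAt(std::vector<int> values, std::size_t index, int value)
{
    assert(index <= values.size() && "index must be in [0, size]");

    // All elements from `index` onwards are shifted by one slot: that is the O(N) cost.
    values.insert(values.begin() + static_cast<std::ptrdiff_t>(index), value);
    return values;
}
```

Calling `InsertAt({ 3, 5, 6 }, 1, 4)` would give a vector holding 3, 4, 5 and 6.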
%% Cell type:markdown id: tags:
### Direct access: `operator[]` and `at()`
`std::vector` provides a direct access to an element through an index (that is not true for all containers) with the `operator[]`:
%% Cell type:code id: tags:
``` c++
#include <vector>
#include <iostream>
{
std::vector<int> foo { 3, 5, 6 };
std::cout << "foo[1] = " << foo[1] << std::endl; // Remember: indexing starts at 0 in C and C++
}
```
%% Cell type:markdown id: tags:
Direct access is not checked: if you go beyond the size of the vector you enter undefined behaviour territory:
%% Cell type:code id: tags:
``` c++
#include <vector>
#include <iostream>
{
std::vector<int> foo { 3, 5, 6 };
std::cout << "foo[4] = " << foo[4] << std::endl; // undefined territory
}
```
%% Cell type:markdown id: tags:
A specific method `at()` exists that performs the adequate check and throws an exception (`std::out_of_range`) if needed:
%% Cell type:code id: tags:
``` c++
#include <vector>
#include <iostream>
{
std::vector<int> foo { 3, 5, 6 };
std::cout << "foo[4] = " << foo.at(4) << std::endl; // exception thrown
}
```
%% Cell type:markdown id: tags:
I do not necessarily recommend it: I would rather check the index is correct with an `assert`, which provides the runtime check in debug mode only and doesn't slow down the code in release mode.
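The `assert` approach could look like the sketch below; the accessor `GetElement` is a hypothetical name chosen for the illustration. The check vanishes when the code is compiled with the `NDEBUG` macro defined, which is typically the case in release mode.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical accessor: the index is checked in debug mode only.
int GetElement(const std::vector<int>& values, std::size_t index)
{
    // Compiled out when NDEBUG is defined, so no cost in release mode.
    assert(index < values.size() && "Index out of bounds!");
    return values[index];
}
```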
%% Cell type:markdown id: tags:
### Under the hood: storage and capacity
In practice, `std::vector` is a dynamic array allocated with safety through the use of RAII.
To make `push_back` an O(1) operation most of the time, slightly more memory than what you actually use is allocated.
The `capacity()` must not be mistaken for the `size()`:
* `size()` is the number of elements in the array and is what the end-user usually cares about.
* `capacity()` is more internal: it is the number of elements that fit within the memory area currently allocated by the container, which is usually a bit larger to make room for a few new elements.
%% Cell type:code id: tags:
``` c++
#include <vector>
#include <iostream>
{
std::vector<std::size_t> foo;
for (auto i = 0ul; i < 10ul; ++i)
{
std::cout << "Vector: size = " << foo.size() << " and capacity = " << foo.capacity() << std::endl;
foo.push_back(i);
}
}
```
%% Cell type:markdown id: tags:
The pattern for capacity is clear here but is not dictated by the standard: it is up to the STL vendor to choose the way it deals with it.
So what happens when the capacity is reached and a new element is added?
* A new dynamic array with the new capacity is created.
* Each element of the former dynamic array is **copied** (or possibly **moved**) into the new one.
* The former dynamic array is destroyed.
The least we can say is that we're far from O(1) here! (and here we're dealing with a POD type, for which copy is cheap, which is not the case for certain types of objects...) So obviously it is better to avoid this operation as much as possible!
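To get a feeling of how often this happens, the sketch below counts the capacity changes while elements are appended one by one. `CountReallocations` is a hypothetical helper written for this illustration, and the exact count depends on your STL implementation, which is why no expected value is given.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts how many times the capacity changes while `Nelt` elements
// are appended with `push_back`. Each change means a reallocation took place.
std::size_t CountReallocations(std::size_t Nelt)
{
    std::vector<std::size_t> values;
    std::size_t count = 0ul;
    auto capacity = values.capacity();

    for (std::size_t i = 0ul; i < Nelt; ++i)
    {
        values.push_back(i);

        if (values.capacity() != capacity)
        {
            ++count;
            capacity = values.capacity();
        }
    }

    assert(values.size() == Nelt);
    return count;
}
```

With the usual geometric growth strategies, the count remains much smaller than the number of elements, which is what makes `push_back` cheap on average.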
%% Cell type:markdown id: tags:
### `reserve()` and `resize()`
`reserve()` is the method to set manually the value of the capacity. When you have a clue of the expected number of elements, it is better to provide it: even if your guess was flawed, it limits the number of reallocations:
%% Cell type:code id: tags:
``` c++
#include <vector>
#include <iostream>
{
std::vector<std::size_t> foo;
foo.reserve(5); // 10 would have been better of course!
for (auto i = 0ul; i < 10ul; ++i)
{
std::cout << "Vector: size = " << foo.size() << " and capacity = " << foo.capacity() << std::endl;
foo.push_back(i);
}
}
```
%% Cell type:markdown id: tags:
It must not be confused with `resize()`, which changes the size of the meaningful content of the dynamic array.
%% Cell type:code id: tags:
``` c++
#include <iostream>
#include <string>
// Helper function to avoid typing endlessly the same lines...
template<class VectorT>
void PrintVector(const VectorT& vector)
{
auto size = vector.size();
std::cout << "Size = " << size << " Capacity = " << vector.capacity() << " Content = [ ";
for (auto item : vector)
std::cout << item << ' ';
std::cout << ']' << std::endl;
}
```
%% Cell type:code id: tags:
``` c++
#include <vector>
#include <iostream>
{
std::vector<std::size_t> foo { 3, 5};
PrintVector(foo);
foo.resize(8, 10); // Second optional argument gives the values to add.
PrintVector(foo);
foo.resize(12); // If not specified, a default value is used - here 0 for a POD
// The default value is the same as the one that would be used when constructing
// an element with empty braces - here `std::size_t myvariable {}`;
PrintVector(foo);
foo.resize(3, 15);
PrintVector(foo);
}
```
%% Cell type:markdown id: tags:
As you can see, `resize()` may increase or decrease the size of the `std::vector`; if it decreases it, some values are lost.
You may see as well that the capacity is not adapted accordingly; you may use the `shrink_to_fit()` method to ask for the capacity to be reduced, but the request is not binding and the STL implementation may ignore it (it does reduce it here):
%% Cell type:code id: tags:
``` c++
#include <vector>
#include <iostream>
{
std::vector<std::size_t> foo { 3, 5};
PrintVector(foo);
foo.resize(8, 10); // Second optional argument gives the values to add.
PrintVector(foo);
foo.resize(3, 10);
PrintVector(foo);
foo.shrink_to_fit();
PrintVector(foo);
}
```
%% Cell type:markdown id: tags:
As a rule:
* When you use `reserve()`, it usually means you intend to add new content with `push_back()`, which increases the size by 1 (and leaves the capacity unchanged provided the argument given to `reserve()` was well estimated).
* When you use `resize()`, you intend to modify on the spot the values in the container, with for instance `operator[]`, a loop or iterators.
A common mistake is to mix the two up:
%% Cell type:code id: tags:
``` c++
#include <vector>
#include <iostream>
{
std::vector<int> five_pi_digits;
five_pi_digits.resize(5);
five_pi_digits.push_back(3);
five_pi_digits.push_back(1);
five_pi_digits.push_back(4);
five_pi_digits.push_back(1);
five_pi_digits.push_back(5);
PrintVector(five_pi_digits); // not what we intended!
}
```
%% Cell type:markdown id: tags:
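For the digits of pi example above, the two consistent options could be sketched as follows (the function names are hypothetical and only there for the illustration): either `reserve()` followed by `push_back()`, or `resize()` followed by `operator[]`.

```cpp
#include <cassert>
#include <vector>

// Option 1: `reserve()` only allocates memory; the size grows from 0 to 5 through `push_back()`.
std::vector<int> FivePiDigitsWithReserve()
{
    std::vector<int> digits;
    digits.reserve(5);

    for (int digit : { 3, 1, 4, 1, 5 })
        digits.push_back(digit);

    return digits;
}

// Option 2: `resize()` creates 5 elements right away; they are then overwritten in place.
std::vector<int> FivePiDigitsWithResize()
{
    std::vector<int> digits;
    digits.resize(5);
    assert(digits.size() == 5ul);

    digits[0] = 3;
    digits[1] = 1;
    digits[2] = 4;
    digits[3] = 1;
    digits[4] = 5;

    return digits;
}
```

Both functions return a vector of size 5 with the same content.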
### `std::vector` as a C array
In your code, you might at some point use a C library which deals with dynamic arrays. If the function doesn't mess with the structure of the dynamic array (by reallocating the content for instance), you may pass it a `std::vector` without any issue through its `data()` method:
%% Cell type:code id: tags:
``` c++
#include <cstdio>
// A C function
void C_PrintArray(double* array, size_t Nelt)
{
if (Nelt > 0ul)
{
printf("[");
for (size_t i = 0ul; i < Nelt - 1; ++i)
printf("%lf, ", array[i]);
printf("%lf]", array[Nelt - 1]);
}
else
printf("[]");
}
```
%% Cell type:code id: tags:
``` c++
#include <vector>
{
std::vector<double> cpp_vector { 3., 8., 9., -12.3, -32.35 };
C_PrintArray(cpp_vector.data(), cpp_vector.size());
}
```
%% Cell type:markdown id: tags:
`data()` was introduced in C++ 11; previously you could do the same with an equivalent but much less appealing call to the address of the first element:
%% Cell type:code id: tags:
``` c++
#include <vector>
{
std::vector<double> cpp_vector { 3., 8., 9., -12.3, -32.35 };
C_PrintArray(&cpp_vector[0], cpp_vector.size());
}
```
%% Cell type:markdown id: tags:
### Iterators
**Iterators** are a useful feature that is less prominent since C++ 11 (albeit still very useful if you use STL algorithms) but needs to be at least acknowledged, as under the hood they are still used in the more sexy [`for`](/notebooks/1-ProceduralProgramming/2-Conditions-and-loops.ipynb#New-for-loop) loops now available.
The idea of an iterator is to provide an object to navigate over all (or part of the) items of a container efficiently.
Let's forget for a while our syntactic sugar `for (auto item : container)` and see what our options are:
%% Cell type:code id: tags:
``` c++
#include <vector>
#include <iostream>
{
std::vector<double> cpp_vector { 3., 8., 9., -12.3, -32.35 };
const auto size = cpp_vector.size();
for (auto i = 0ul; i < size; ++i)
std::cout << cpp_vector[i] << " ";
}
```
%% Cell type:markdown id: tags:
It may not be extremely efficient: at each call to `operator[]`, the program must figure out which element to fetch without using the fact that it has just fetched the element in the previous memory location (in practice compilers are now rather smart and figure this out...)
Iterators provide another way to access the same data (which used to be more efficient, but compilers are now cleverer and both are equivalent):
%% Cell type:code id: tags:
``` c++
#include <vector>
#include <iostream>
{
std::vector<double> cpp_vector { 3., 8., 9., -12.3, -32.35 };
std::vector<double>::const_iterator end = cpp_vector.cend();
for (std::vector<double>::const_iterator it = cpp_vector.cbegin(); it != end; ++it)
std::cout << *it << " ";
}
```
%% Cell type:markdown id: tags:
It is quite verbose; prior to C++ 11 you had to use this nonetheless (`auto` alone would greatly simplify the syntax here, but it is also a C++ 11 addition...)
Iterators are *not* pointers, even if they behave really similarly, e.g. they may use the same `*` and `->` syntax (they might be implemented as pointers, but think of it as [private inheritance](../2-ObjectProgramming/6-inheritance.ipynb#IS-IMPLEMENTED-IN-TERMS-OF-relationship-of-private-inheritance) in this case...)
There are several flavors:
* Constant iterators, used here, with which you can only read the value under the iterator.
* Iterators, with which you can also modify it.
* Reverse iterators, to iterate over the container from the last element to the first (avoid them if you can: they are a bit messy to use and may not be used in every algorithm in which standard iterators are ok...)
Each container provides methods that return specific iterators:
* `begin()` points to the very first element of the container.
* `end()` is **after** the last element of the container.
* `cbegin()` returns the `const_iterator` that does the same job as `begin()`; prior to C++ 11 you had to rely on the constant overload of `begin()`, confusingly named the same.
* `cend()`: you can probably figure it out...
* `rbegin()` points to the very last element of the container.
* `rend()` is **before** the first element of the container.
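The flavors not used so far could be sketched as follows; both function names are hypothetical and chosen for the illustration. The first uses non-constant iterators to modify the values in place, the second uses constant reverse iterators (`crbegin()` and `crend()`) to read the container backwards.

```cpp
#include <cassert>
#include <vector>

// Non-constant iterators: the value under the iterator may be modified.
void DoubleInPlace(std::vector<int>& values)
{
    for (auto it = values.begin(); it != values.end(); ++it)
        *it *= 2;
}

// Reverse iterators: `++it` moves from the last element towards the first.
std::vector<int> Reversed(const std::vector<int>& values)
{
    std::vector<int> ret;
    ret.reserve(values.size());

    for (auto it = values.crbegin(); it != values.crend(); ++it)
        ret.push_back(*it);

    assert(ret.size() == values.size());
    return ret;
}
```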
What is tricky with them is that they may become invalid if some operations are performed on the container in the meantime. For instance, if a `std::vector` is extended and a reallocation occurs, all its iterators become invalid. Therefore, code like:
%% Cell type:code id: tags:
``` c++
#include <vector>
{
std::vector<int> vec { 2, 3, 4, 5, 7, 18 };
for (auto item : vec)
{
if (item % 2 == 0)
vec.push_back(item + 2); // don't do that!
}
PrintVector(vec);
}
```
%% Cell type:markdown id: tags:
is undefined behaviour: it might work (it did up to 2022; it prints gibberish on first try in 2024) but is not robust. Even when it seemingly "works", you may see the iteration is done over the initial vector only; the additional values aren't iterated over (we would end up with an infinite loop otherwise).
So, the bottom line is you should really separate the actions that modify the structure of a container from those that iterate over it.
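One possible way to apply this advice to the faulty loop is sketched below, assuming a temporary vector is acceptable: the new values are first collected during a read-only iteration, and the container is modified only once the iteration is over. The name `AppendEvenPlusTwo` is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: for each even item of the original content, appends `item + 2`.
std::vector<int> AppendEvenPlusTwo(std::vector<int> values)
{
    std::vector<int> to_add;

    // Step 1: read-only iteration; the structure of `values` is not modified here.
    for (auto item : values)
    {
        if (item % 2 == 0)
            to_add.push_back(item + 2);
    }

    const auto expected_size = values.size() + to_add.size();

    // Step 2: modify the container once no iteration over it is in progress.
    values.insert(values.end(), to_add.cbegin(), to_add.cend());

    assert(values.size() == expected_size);
    return values;
}
```

With the vector `{ 2, 3, 4, 5, 7, 18 }` used above, the values 4, 6 and 20 are appended at the end.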
## Incrementing / decrementing iterators
As with POD types, there are both a pre- and post-increment available:
%% Cell type:code id: tags:
``` c++
#include <vector>
#include <iostream>
{
std::vector<double> cpp_vector { 3., 8., 9., -12.3, -32.35 };
std::vector<double>::const_iterator end = cpp_vector.cend();
for (std::vector<double>::const_iterator it = cpp_vector.cbegin(); it != end; ++it) // pre-increment
std::cout << *it << " ";
std::cout << std::endl;
for (std::vector<double>::const_iterator it = cpp_vector.cbegin(); it != end; it++) // post-increment
std::cout << *it << " ";
}
```
%% Cell type:markdown id: tags:
Without any surprise the result is the same... but the efficiency may not be: post-increment has to make a copy of the iterator to return its former value, whereas pre-increment just modifies the current one. So if you do not care about pre- or post-increment (as in the case above), stick with pre-increment.
%% Cell type:markdown id: tags:
## Access a container element - Python like syntax
%% Cell type:code id: tags:
``` c++
#include <iostream>
#include <vector>
std::vector<int> v {1, // can be accessed with begin()[0] or end()[-4]
2, // can be accessed with begin()[1] or end()[-3]
3, // can be accessed with begin()[2] or end()[-2]
4 // can be accessed with begin()[3] or end()[-1]
};
std::cout << v.end()[-2] << " - " << v.begin()[1] << std::endl;
// Displays '3 - 2'
```
%% Cell type:markdown id: tags:
## Other containers
`std::vector` is not the only possible choice; I will present very briefly the other possibilities here:
* `std::list`: A doubly-linked list: the idea is that each element knows the addresses of the element before and the element after. It might be considered if you often need to add elements at specific locations in the list: adding a new element is just changing 2 pointers and setting 2 new ones. You can't access directly an element by its index with a `std::list`.
* `std::forward_list`: A singly-linked list: similar to a `std::list` except only the pointer to the next element is kept. This is a C++ 11 addition.
* `std::deque`: For "double ended queue"; this container may be helpful if you need to store a really huge amount of data that might not fit within a `std::vector` (it doesn't require a single contiguous block of memory). It might also be of use if you often need to add elements at the front of the container: there is a `push_front()` method as well as a `push_back()` one. Item 18 of [Effective STL](http://localhost:8888/lab/tree/bibliography.ipynb#Effective-STL) recommends using `std::deque<bool>` to store booleans: `std::vector<bool>` was an experiment to provide a specific implementation to spare memory when storing booleans that went wrong and should therefore be avoided...
* `std::array`: You should use this one if the number of elements is known at compile time and doesn't change at all, as the compiler may provide even more optimizations than for `std::vector` (and your end user can't modify its size by mistake). This is a C++ 11 addition.
* `std::string`: Yes, it is actually a container! I will not tell much more about it; just add that it is the sole container besides `std::vector` and `std::array` that ensures contiguous storage.
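As a quick illustration of two of these containers, here is a minimal sketch; the function names `Sum` and `BuildDeque` are hypothetical. The size of a `std::array` is part of its type, and a `std::deque` may grow from both ends.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <deque>

// The size N is a template parameter: it is known at compile time and can't change.
template<std::size_t N>
double Sum(const std::array<double, N>& values)
{
    double ret = 0.;

    for (auto item : values)
        ret += item;

    return ret;
}

// `std::deque` provides `push_front()` on top of `push_back()`.
std::deque<int> BuildDeque()
{
    std::deque<int> ret { 2, 3 };
    ret.push_front(1);
    ret.push_back(4);
    assert(ret.size() == 4ul);
    return ret;
}
```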
%% Cell type:markdown id: tags:
[© Copyright](../COPYRIGHT.md)
......
%% Cell type:markdown id: tags:
# [Getting started in C++](./) - [Useful concepts and STL](./0-main.ipynb) - [Associative containers](./4-AssociativeContainers.ipynb)
%% Cell type:markdown id: tags:
## Introduction
A `std::vector` can be seen as an association between two types:
* A `std::size_t` index, whose value is in the interval [0, size[, that acts as a key.
* The value actually stored.
The `operator[]` might be used to access one of them:
%% Cell type:code id: tags:
``` c++
#include <vector>
#include <iostream>
{
std::vector<int> prime { 2, 3, 5, 7, 11, 13, 17, 19 };
auto index = 3ul;
std::cout << "Element which key is " << index << " is " << prime[index] << std::endl;
}
```
%% Cell type:markdown id: tags:
An associative container is an extension: what if we could loosen the constraint upon the key and use something else?
%% Cell type:markdown id: tags:
## `std::map`
### Construction
`std::map` is a list of key/value pairs that is ordered through a relationship imposed on the keys.
%% Cell type:code id: tags:
``` c++
#include <iostream>
#include <map>
#include <string>
{
std::map<std::string, unsigned int> age_list
{
{ "Alice", 25 },
{ "Charlie", 31 },
{ "Bob", 22 },
};
auto index = "Charlie";
std::cout << "Element which key is " << index << " is " << age_list[index] << std::endl;
}
```
%% Cell type:markdown id: tags:
### Iteration
In this example, we set three people with their age. We may iterate through it; the actual storage of an item is here a `std::pair<std::string, unsigned int>`. We haven't seen `std::pair` so far, but think of it as a `std::tuple` with 2 elements (it existed prior to `std::tuple` in fact).
There are two handy attributes to access the respective first and second element: `first` and `second`.
%% Cell type:code id: tags:
``` c++
#include <map>
#include <iostream>
std::map<std::string, unsigned int> age_list
{
{ "Alice", 25 },
{ "Charlie", 31 },
{ "Bob", 22 },
};
for (const auto& pair : age_list)
std::cout << pair.first << " : " << pair.second << std::endl;
```
%% Cell type:markdown id: tags:
#### C++ 17: Structured bindings
C++ 17 introduced a new alternate syntax I like a lot, called **structured bindings**:
%% Cell type:code id: tags:
``` c++
for (const auto& [person, age] : age_list)
std::cout << person << " : " << age << std::endl;
```
%% Cell type:markdown id: tags:
As you see, the syntax declares on the fly variables (here references) for the first and second element of the pair, making the code much more expressive.
You may read more on them [here](https://www.fluentcpp.com/2018/06/19/3-simple-c17-features-that-will-make-your-code-simpler/); we will use them again in this notebook.
%% Cell type:markdown id: tags:
### Provide another ordering rule
The output order is not an accident: as I said it is an **ordered** associative container, and the key must provide a relationship. The default one is `std::less` but you might specify another in template arguments:
%% Cell type:code id: tags:
``` c++
#include <functional> // for std::greater
#include <iostream>
#include <map>
#include <string>
{
std::map<std::string, unsigned int, std::greater<std::string>> age_list
{
{ "Alice", 25 },
{ "Charlie", 31 },
{ "Bob", 22 },
};
for (const auto& [name, age] : age_list) // structured binding!
std::cout << name << " : " << age << std::endl;
}
```
%% Cell type:markdown id: tags:
### `insert()`
You may insert another element later with `insert()`:
%% Cell type:code id: tags:
``` c++
#include <map>
#include <iostream>
{
std::map<std::string, unsigned int> age_list
{
{ "Alice", 25 },
{ "Charlie", 31 },
{ "Bob", 22 },
};
age_list.insert({"Dave", 44});
age_list.insert({"Alice", 32});
for (const auto& [name, age] : age_list)
std::cout << name << " : " << age << std::endl;
}
```
%% Cell type:markdown id: tags:
See here that Dave was correctly inserted... but Alice was unchanged!
In fact `insert` returns a pair:
* First is an iterator to the newly inserted element, or to the position of the one that made the insertion fail.
* Second is a boolean that returns `true` if the insertion worked.
%% Cell type:code id: tags:
``` c++
#include <map>
#include <iostream>
std::map<std::string, unsigned int> age_list
{
{ "Alice", 25 },
{ "Charlie", 31 },
{ "Bob", 22 },
};
{
auto result = age_list.insert({"Dave", 44});
if (!result.second)
std::cerr << "Insertion of Dave failed" << std::endl;
}
{
auto result = age_list.insert({"Alice", 32});
if (!result.second)
std::cerr << "Insertion of Alice failed" << std::endl;
}
for (const auto& [name, age] : age_list)
std::cout << name << " : " << age << std::endl;
```
%% Cell type:markdown id: tags:
Or even better with structured bindings:
%% Cell type:code id: tags:
``` c++
%%cppmagics cppyy/cppdef
const auto& [iterator, was_properly_inserted] = age_list.insert({"Alice", 32});
```
%% Cell type:code id: tags:
``` c++
if (!was_properly_inserted)
std::cerr << "Insertion of Alice failed" << std::endl;
```
%% Cell type:markdown id: tags:
That's something I dislike in this very useful class: error handling is not to my taste, as you have to remember to check explicitly that all went right... (this is the discussion we had previously about error codes all over again...)
### Access to one element: don't use `operator[]`!
And this is not the sole example: let's look for an element in a map:
%% Cell type:code id: tags:
``` c++
#include <map>
#include <iostream>
{
std::map<std::string, unsigned int> age_list
{
{ "Alice", 25 },
{ "Charlie", 31 },
{ "Bob", 22 },
};
std::cout << "Alice : " << age_list["Alice"] << std::endl;
std::cout << "Erin : " << age_list["Erin"] << std::endl;
std::cout << "========" << std::endl;
for (const auto& [person, age] : age_list)
std::cout << person << " : " << age << std::endl;
}
```
%% Cell type:markdown id: tags:
So if you provide a wrong key, it doesn't yell and instead creates a new entry on the spot, in which the associated value is value-initialized (here `0`)...
To do it properly (but more verbosely!), use the `find()` method (if you're intrigued by the use of iterators there, we will present them in more detail in the notebook about [algorithms](./7-Algorithms.ipynb)):
%% Cell type:code id: tags:
``` c++
#include <map>
#include <iostream>
{
std::map<std::string, unsigned int> age_list
{
{ "Alice", 25 },
{ "Charlie", 31 },
{ "Bob", 22 },
};
auto it = age_list.find("Alice");
if (it == age_list.cend())
std::cerr << "No Alice found in the listing!" << std::endl;
else
std::cout << "Alice's age is " << it->second << std::endl;
it = age_list.find("Erin");
if (it == age_list.cend())
std::cerr << "No Erin found in the listing!" << std::endl;
else
std::cout << "Erin's age is " << it->second << std::endl;
for (const auto& [name, age] : age_list)
std::cout << name << " : " << age << std::endl;
}
```
%% Cell type:markdown id: tags:
A side note which will be useful later to explain `std::unordered_map`: the search is performed by dichotomy over the ordered keys (~O(log N)).
%% Cell type:markdown id: tags:
### Unicity of key
`std::map` is built on the fact a key must be unique.
If you need to allow keys to be repeated, you should look at `std::multimap`, which provides this possibility with a slightly different interface (rather obviously `operator[]` is not available, and `find()` is complemented by methods such as `equal_range()` that return a range of iterators).
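A minimal sketch of the `std::multimap` interface, assuming we want all the values stored under a given key; the helper `AgesFor` is a hypothetical name. `equal_range()` returns a pair of iterators that delimits the entries sharing the key.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: returns all the ages stored under `name` (possibly none).
std::vector<unsigned int> AgesFor(const std::multimap<std::string, unsigned int>& age_list,
                                  const std::string& name)
{
    std::vector<unsigned int> ret;

    // [first, last[ is the range of entries whose key is `name`.
    const auto [first, last] = age_list.equal_range(name);

    for (auto it = first; it != last; ++it)
        ret.push_back(it->second);

    assert(ret.size() == age_list.count(name));
    return ret;
}
```

If the key is not present, both iterators are equal and the returned vector is empty.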
%% Cell type:markdown id: tags:
### Using objects as keys
You may use your own objects as keys, provided that:
* Either you define `operator<` for it. It is really important to grasp that `operator==` **doesn't matter**: even in `find` it is really `operator<` that is used!
* Or provide as template parameter the ordering relationship you intend to use.
**WARNING:** If you're using pointers as keys, make sure to provide an adequate ordering relationship, typically one that relies on the ordering of the pointed objects. Otherwise you might end up with different results from one run to another, as the addresses probably won't be in the same order...
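Both options could be sketched as follows. The `Person` class and the `LessOnPointee` comparator are hypothetical names introduced for the illustration, and ordering people by name only is an arbitrary choice.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical class used as a key.
struct Person
{
    std::string name;
    unsigned int birth_year;
};

// Option 1: define `operator<` for the key type (here only the name matters).
bool operator<(const Person& lhs, const Person& rhs)
{
    return lhs.name < rhs.name;
}

// Option 2: provide the ordering as a template parameter. Here it is used for pointer keys:
// the pointed objects are compared rather than the addresses.
struct LessOnPointee
{
    bool operator()(const Person* lhs, const Person* rhs) const
    {
        assert(lhs != nullptr && rhs != nullptr);
        return *lhs < *rhs;
    }
};
```

A `std::map<Person, int>` then uses `operator<`, whereas a `std::map<const Person*, int, LessOnPointee>` yields the same order from one run to another regardless of the addresses.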
%% Cell type:markdown id: tags:
## `std::set`
`std::set` is a special case in which you do not associate a value to the key. The interface is roughly the same.
It might be used for instance if you want to keep a list of stuff you have encountered at least once: you don't care about how many times, but you want to know if it was encountered at least once. A `std::vector` would be inappropriate: you would have to look up its whole content before each insertion. With a `std::set` it is already built into the class.
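A minimal sketch of this use case; `HasDuplicate` is a hypothetical helper. As with `std::map`, `insert()` returns a pair whose second element tells whether the insertion actually took place.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical helper: tells whether at least one value appears more than once.
bool HasDuplicate(const std::vector<int>& values)
{
    std::set<int> encountered;

    for (auto item : values)
    {
        // `second` is false if the item was already in the set.
        if (!encountered.insert(item).second)
            return true;
    }

    assert(encountered.size() == values.size());
    return false;
}
```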
## `std::unordered_map`
This is another associative container introduced in C++ 11, with a different trade-off (and closer to a `dict` in Python for instance):
* Access is much more efficient (~O(1) on average, i.e. independent of the number of elements!).
* Memory footprint is bigger.
* Adding new elements is more expensive.
* The result is not ordered, and there are no rules whatsoever: two runs on the same computer might not yield the list in the same order.
The constraint on the key is different too: the key must be **hashable**, meaning that there must be a specialization of `std::hash` for the type used for key. It must also define `operator==`.
STL provides such **hashing functions** for fundamental types (and a few others like `std::string`); it is not trivial (but still possible - see for instance [The C++ Standard Library: A Tutorial and Reference](../bibliography.ipynb#The-C++-Standard-Library:-A-Tutorial-and-Reference) for a discussion on this topic) to add new ones.
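To give an idea of what adding a new one looks like, here is a sketch for a hypothetical `GridPoint` type. The way the two hashes are combined is deliberately naive and only there for illustration; a good combination is precisely the non-trivial part.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical key type.
struct GridPoint
{
    int x;
    int y;
};

// `operator==` is required on top of the hash.
bool operator==(const GridPoint& lhs, const GridPoint& rhs)
{
    return lhs.x == rhs.x && lhs.y == rhs.y;
}

namespace std
{
    // Specialization of `std::hash` for our own type.
    template<>
    struct hash<GridPoint>
    {
        std::size_t operator()(const GridPoint& point) const noexcept
        {
            const auto hash_x = std::hash<int>{}(point.x);
            const auto hash_y = std::hash<int>{}(point.y);
            return hash_x ^ (hash_y << 1); // naive combination, for illustration only
        }
    };
}

// Hypothetical helper: label of a point, or "unknown" if the point is not a key.
std::string LabelOf(const std::unordered_map<GridPoint, std::string>& labels, const GridPoint& point)
{
    const auto it = labels.find(point);
    return it == labels.cend() ? "unknown" : it->second;
}
```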
So, in a nutshell, if your key type is already handled by the STL and you spend more time reading data than inserting new ones, you should really use this type.
Just an additional note: [The C++ Standard Library: A Tutorial and Reference](../bibliography.ipynb#The-C++-Standard-Library:-A-Tutorial-and-Reference) recommends changing the default internal setting of the class for efficiency: there is an internal float value named `max_load_factor` which has a default value of 1; the API of the class provides a mutator to modify it. The author says 0.7f or 0.8f is more efficient; I haven't benchmarked it, trusted him on this and am using it in my library.
%% Cell type:code id: tags:
``` c++
#include <unordered_map>
{
std::unordered_map<int, double> list;
list.max_load_factor(0.7f);
}
```
%% Cell type:markdown id: tags:
[© Copyright](../COPYRIGHT.md)
......
%% Cell type:markdown id: tags:
# [Getting started in C++](./) - [Useful concepts and STL](./0-main.ipynb) - [Smart pointers](./6-SmartPointers.ipynb)
%% Cell type:markdown id: tags:
## Introduction
In short, **smart pointers** are the application of [RAII](./2-RAII.ipynb) to pointers: objects which handle more nicely the acquisition and release of dynamic allocation.
There are many ways to define the behaviour of a smart pointer (the dedicated chapter in [Modern C++ design](../bibliography.ipynb#Modern-C++-Design) is a very interesting read for this, especially as it heavily uses the template [policies](../4-Templates/5-MoreAdvanced.ipynb#Policies) to implement them):
* How the pointer might be copied (or not).
* When the memory is freed.
* Whether the `if (ptr)` syntax is accepted.
* ...
The STL made the choice of providing two (and a half in fact...) kinds of smart pointers (introduced in C++ 11):
* **unique pointers**
* **shared pointers** (and the **weak** ones that go along with them).
One should also mention for legacy the first attempt: **auto pointers**, which were removed in C++ 17: you might encounter them in some libraries, but by all means don't use them yourself (look for *sink effect* on the Web if you want to know why).
By design all smart pointers keep the usual pointer syntax:
* `*` to dereference the (now smart) pointer.
* `->` to access an attribute of the underlying object.
Smart pointers are clearly a very good way to handle the ownership of a given object.
This does not mean they entirely supersede ordinary (often called **raw** or more infrequently **dumb**) pointers: raw pointers might be a good choice to pass an object as a function parameter (see the discussion of the third question in this [blog post by Herb Sutter](https://herbsutter.com/2013/06/05/gotw-91-solution-smart-pointer-parameters/)). The raw pointer behind a smart pointer may be accessed through the `get()` method.
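A minimal sketch of this use of `get()`, with hypothetical function names: the function that merely reads the object takes a raw pointer and knows nothing about ownership, which remains with the smart pointer.

```cpp
#include <cassert>
#include <memory>

// A function that only uses the object: a raw pointer is enough, no ownership is involved.
int Twice(const int* value)
{
    assert(value != nullptr);
    return 2 * (*value);
}

// Hypothetical caller: the smart pointer owns the memory and lends the raw pointer.
int TwiceOfOwnedValue(int value)
{
    std::unique_ptr<int> owner(new int(value));
    return Twice(owner.get()); // `owner` is still in charge of releasing the memory
}
```

Never call `delete` on the pointer returned by `get()`: the smart pointer would then release the same memory a second time.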
Both smart pointers presented below may be constructed directly from a raw pointer; in this case they take the responsibility of destroying the pointed object:
%% Cell type:code id: tags:
``` c++
#include <memory>
#include <iostream>
struct Foo
{
~Foo()
{
std::cout << "Destroy foo"<< std::endl;
}
};
{
Foo* raw = new Foo;
std::unique_ptr<Foo> unique(raw); // Now unique_ptr is responsible for pointer ownership: don't call delete
// on `raw`! Destructor of unique_ptr will call the `Foo` destructor.
}
```
%% Cell type:markdown id: tags:
## `unique_ptr`
This should be your first choice for a smart pointer.
The idea behind this smart pointer is that it can't be copied: there is exactly one instance of the smart pointer, and when this instance goes out of scope the resources are properly released.
In C++ 11 you had to use the classic `new` syntax to create one, but C++ 14 introduced a dedicated function, `std::make_unique`:
%% Cell type:code id: tags:
``` c++
#include <memory>
{
auto ptr = std::make_unique<int>(5);
}
```
%% Cell type:markdown id: tags:
The parentheses take the constructor arguments.
The smart pointer can't be copied, but it can be moved:
%% Cell type:code id: tags:
``` c++
#include <memory>
{
auto ptr = std::make_unique<int>(5);
auto copy = ptr; // COMPILATION ERROR: can't be copied!
}
```
%% Cell type:code id: tags:
``` c++
%%cppmagics clang
#include <cstdlib>
#include <iostream>
#include <memory>
int main([[maybe_unused]] int argc, [[maybe_unused]] char** argv)
{
auto ptr = std::make_unique<int>(5);
auto moved_ptr = std::move(ptr);
std::cout << "Beware as ptr no longer owns anything and must not be dereferenced: " << *ptr << std::endl; // EXPECTED RUNTIME ISSUE!
return EXIT_SUCCESS;
}
```
%% Cell type:markdown id: tags:
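As usual with move semantics, beware in this second case: after the `move` occurred, `ptr` no longer owns anything (a moved-from `unique_ptr` holds `nullptr`), so dereferencing it is undefined behaviour... hence the segmentation fault you might have got.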
### Usage to store data in a class
`std::unique_ptr` is a really good choice to store objects in a class, especially ones that do not have a default constructor.
You may always define an object directly as a data attribute without pointer indirection, but in this case you have to explicitly call the constructor of the data attribute in the member initializer list (the `:` syntax before the body of the constructor); that's exactly what we did when we introduced composition [back in the inheritance notebook](../2-ObjectProgramming/6-inheritance.ipynb#CONTAINS-A-relationship-of-composition). By using a (smart) pointer, you loosen this constraint and may define the data attribute whenever you wish, not only at construction.
The underlying object may be accessed through reference or raw pointer; usually your class may look like:
%% Cell type:code id: tags:
``` c++
#include <string>
// Class which will be stored in another one through a `unique_ptr`
class Content
{
public:
Content(std::string&& text); // notice: no default constructor!
const std::string& GetValue() const;
private:
std::string text_ {};
};
```
%% Cell type:code id: tags:
``` c++
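#include <utility>
Content::Content(std::string&& text)
: text_(std::move(text)) // `text` is an lvalue inside the constructor: move it explicitly to avoid a copy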
{ }
```
%% Cell type:code id: tags:
``` c++
const std::string& Content::GetValue() const
{
return text_;
}
```
%% Cell type:code id: tags:
``` c++
#include <memory>
class WithUniquePtr
{
public:
WithUniquePtr() = default;
void Init(std::string&& text); // rather artificial here, but we want to point out it can be done anywhere and not just in constructor!
const Content& GetContent() const;
private:
//! Store `Content` object through a smart pointer.
std::unique_ptr<Content> content_ { nullptr };
};
```
%% Cell type:code id: tags:
``` c++
void WithUniquePtr::Init(std::string&& text)
{
content_ = std::make_unique<Content>(std::move(text));
}
```
%% Cell type:code id: tags:
``` c++
%%cppmagics cppyy/cppdef
#include <cassert>
const Content& WithUniquePtr::GetContent() const
{
assert(content_ != nullptr && "Make sure Init() has been properly called beforehand!");
return *content_;
}
```
%% Cell type:markdown id: tags:
Doing so:
* `Content` is stored by a `unique_ptr`, which will manage the destruction of the object in due time (when the `WithUniquePtr` object is destroyed).
* The `Content` object might be manipulated through its reference; end users don't even need to know the resource was stored through a (smart) pointer:
%% Cell type:code id: tags:
``` c++
#include <iostream>
void PrintContent(const Content& content)
{
std::cout << content.GetValue() << std::endl;
}
```
%% Cell type:code id: tags:
``` c++
{
auto obj = WithUniquePtr(); // auto-to-stick syntax, to avoid most vexing parse.
obj.Init("My priceless text here!");
decltype(auto) content = obj.GetContent();
PrintContent(content);
}
```
%% Cell type:markdown id: tags:
(if you need a refresher about most vexing parse and auto-to-stick syntax, it's [here](../2-ObjectProgramming/3-constructors-destructor.ipynb#[WARNING]-How-to-call-a-constructor-without-argument)).
%% Cell type:markdown id: tags:
### Releasing a `unique_ptr`
To manually free the content of a `unique_ptr`, assign `nullptr` to the pointer:
%% Cell type:code id: tags:
``` c++
#include <iostream>
struct Class
{
explicit Class(int a)
: a_ { a }
{ }
~Class()
{
std::cout << "Release object with value " << a_ << '\n';
}
private:
int a_ {};
};
```
%% Cell type:code id: tags:
``` c++
#include <memory>
{
auto ptr = std::make_unique<Class>(5);
ptr = nullptr;
}
```
%% Cell type:markdown id: tags:
#### Beware: `release()` doesn't do what you might think it does!
%% Cell type:markdown id: tags:
Smart pointer classes provide a `release()` method, but what they actually release is **ownership**, not memory.
%% Cell type:code id: tags:
``` c++
{
auto ptr = std::make_unique<Class>(5);
Class* raw_ptr = ptr.release(); // Beware: `.` and not `->` as it is a method of the smart pointer class, not of the
// underlying class!
}
```
%% Cell type:markdown id: tags:
As you can see, there is no call to the destructor: the role of `release()` is to hand over the ownership of the allocated memory to `raw_ptr`, which now has the **responsibility** of freeing the memory.
What we ought to do to properly clean up memory is therefore to call `delete` on it (see [here](../1-ProceduralProgramming/5-DynamicAllocation.ipynb#Heap-and-free-store) if you need a refresher on memory allocation).
%% Cell type:code id: tags:
``` c++
{
auto ptr = std::make_unique<Class>(5);
Class* raw_ptr = ptr.release(); // Beware: `.` and not `->` as it is a method of the smart pointer class, not of the
// underlying class!
delete raw_ptr;
}
```
%% Cell type:markdown id: tags:
## `shared_ptr`
The philosophy of `shared_ptr` is different: this kind of smart pointer is fully copyable, and each time a copy is issued an internal counter is incremented (and decremented each time a copy is destroyed). When this counter reaches 0, the underlying object is properly destroyed.
As for `unique_ptr`, there is a dedicated function to build them (aptly named `make_shared`...); it was introduced earlier (C++ 11) and is not just cosmetic: with `make_shared` the implementation can typically allocate the counter and the object in a single block of memory, which is not possible when you go through `new` (so make it so!).
%% Cell type:code id: tags:
``` c++
#include <iostream>
#include <memory>
{
std::shared_ptr<double> ptr = std::make_shared<double>(5.);
std::cout << "Nptr = " << ptr.use_count() << std::endl;
auto ptr2 = ptr;
std::cout << "Nptr = " << ptr.use_count() << std::endl;
//< Notice the `.`: we access a method from std::shared_ptr, not from the type encapsulated
// by the pointer!
}
```
%% Cell type:markdown id: tags:
`shared_ptr` are clearly useful, but you should always wonder first if you really need them: for most uses a `unique_ptr`, possibly complemented by raw pointers extracted with `get()`, is enough.
There is also a risk of not properly releasing the memory if there is a circular dependency between two `shared_ptr`. A variation of this pointer named `weak_ptr` makes it possible to circumvent this issue, but is a bit tedious to put into practice. I have written in [appendix](../7-Appendix/WeakPtr.ipynb) a notebook to describe how to do so.
%% Cell type:markdown id: tags:
## Efficient storage with vectors of smart pointers
* `std::vector` is cool, but the copy when capacity is exceeded might be very costly for some objects. Moreover, it forces you to provide copy (or at least move) behaviour for the classes you intend to store in a `std::vector`, which is not a good idea if you do not want them to be copied.
* An idea could be to use pointers: copy is cheap, and there is no need to copy the underlying objects when the capacity is exceeded. Another good point is that the same object might be stored in two different containers, and modifications made through one of them are immediately "seen" by the other (as the underlying object is the same).
However, when this `std::vector` of pointers is destroyed the objects inside aren't properly deleted, provoking memory leaks.
The way to combine advantages without retaining the flaws is to use a vector of smart pointers:
%% Cell type:code id: tags:
``` c++
#include <array>
class NotCopyable
{
public:
NotCopyable(double value);
~NotCopyable();
NotCopyable(const NotCopyable& ) = delete;
NotCopyable& operator=(const NotCopyable& ) = delete;
NotCopyable(NotCopyable&& ) = delete;
NotCopyable& operator=(NotCopyable&& ) = delete;
private:
std::array<double, 1000> data_;
};
```
%% Cell type:code id: tags:
``` c++
NotCopyable::NotCopyable(double value)
{
data_.fill(value);
}
```
%% Cell type:code id: tags:
``` c++
#include <iostream>
NotCopyable::~NotCopyable()
{
std::cout << "Call to NotCopyable destructor!" << std::endl;
}
```
%% Cell type:code id: tags:
``` c++
#include <iostream>
#include <memory>
#include <vector>
{
std::vector<std::unique_ptr<NotCopyable>> list;
for (double x = 0.; x < 8.; x += 1.1)
{
std::cout << "Capacity = " << list.capacity() << std::endl;
list.emplace_back(std::make_unique<NotCopyable>(x)); // emplace_back builds the new element in place from its arguments; here the temporary unique_ptr is moved in
}
}
```
%% Cell type:markdown id: tags:
Doing so:
- The `NotCopyable` are properly stored in a container.
- No costly copy occurred: there were just a few moves of `unique_ptr` when the capacity was exceeded.
- The memory is properly freed when the `list` goes out of scope.
- And as we saw in previous section, the underlying data remains accessible through reference or raw pointer if needed.
%% Cell type:markdown id: tags:
#### Using a trait as syntactic sugar
I like to create aliases in my classes to provide more readable code:
%% Cell type:code id: tags:
``` c++
#include <array>
#include <memory>
#include <vector>
class NotCopyable2
{
public:
// Trait to alias the vector of smart pointers.
using vector_unique_ptr = std::vector<std::unique_ptr<NotCopyable2>>;
NotCopyable2(double value);
NotCopyable2(const NotCopyable2& ) = delete;
NotCopyable2& operator=(const NotCopyable2& ) = delete;
NotCopyable2(NotCopyable2&& ) = delete;
NotCopyable2& operator=(NotCopyable2&& ) = delete;
private:
std::array<double, 1000> data_; // not copying it too much would be nice!
};
```
%% Cell type:code id: tags:
``` c++
NotCopyable2::NotCopyable2(double value)
{
data_.fill(value);
}
```
%% Cell type:code id: tags:
``` c++
#include <iostream>
#include <memory>
#include <type_traits>
#include <vector>
{
// Use the alias
NotCopyable2::vector_unique_ptr list;
// or not: it amounts to the same!
std::vector<std::unique_ptr<NotCopyable2>> list2;
// std::boolalpha is just a stream manipulator to write 'true' or 'false' for a boolean
std::cout << std::boolalpha << std::is_same<NotCopyable2::vector_unique_ptr, std::vector<std::unique_ptr<NotCopyable2>>>() << std::endl;
}
```
%% Cell type:markdown id: tags:
This simplifies the reading, especially if templates are also involved...
%% Cell type:markdown id: tags:
[© Copyright](../COPYRIGHT.md)