Commit 41a90447 authored by GILLES Sebastien

Add link to loop unrolling Wikipedia article.

parent b4ba3bd7
%% Cell type:markdown id: tags:
# [Getting started in C++](./) - [Useful concepts and STL](./0-main.ipynb) - [Algorithms](./7-Algorithms.ipynb)
%% Cell type:markdown id: tags:
<h1>Table of contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1">Introduction</a></span></li><li><span><a href="#Example:-std::sort" data-toc-modified-id="Example:-std::sort-2">Example: <code>std::sort</code></a></span></li><li><span><a href="#std::find" data-toc-modified-id="std::find-3"><code>std::find</code></a></span></li><li><span><a href="#Output-iterators-and-std::back_inserter" data-toc-modified-id="Output-iterators-and-std::back_inserter-4">Output iterators and <code>std::back_inserter</code></a></span></li><li><span><a href="#The-different-kinds-of-operators" data-toc-modified-id="The-different-kinds-of-operators-5">The different kinds of operators</a></span></li><li><span><a href="#Algorithm:-read-the-documentation-first!" data-toc-modified-id="Algorithm:-read-the-documentation-first!-6">Algorithm: read the documentation first!</a></span><ul class="toc-item"><li><span><a href="#std::unique" data-toc-modified-id="std::unique-6.1">std::unique</a></span></li><li><span><a href="#std::remove" data-toc-modified-id="std::remove-6.2">std::remove</a></span></li></ul></li><li><span><a href="#Conclusion" data-toc-modified-id="Conclusion-7">Conclusion</a></span></li></ul></div>
%% Cell type:markdown id: tags:
## Introduction
%% Cell type:markdown id: tags:
Even if C++ can't be described as a _batteries included_ language like Python (until C++ 17 there was no proper filesystem management, and support for this feature was still shaky at best in several STL implementations one year ago...), plenty of algorithms are already provided within the STL.
We obviously won't list them all here - the mighty \cite{Josuttis2012}, which is more than 1000 pages long, doesn't do it either! - but will show a few examples of how to use them. For instance, many STL algorithms rely upon iterators: this way the same algorithm may be used on `std::vector`, `std::list`, and so on...
A side note: if a STL class provides a method which has a namesake algorithm, use the method. For instance there is a `std::sort` algorithm, but `std::list` provides a `sort()` method which takes advantage of the underlying structure of the object (in this specific case the algorithm is not even an option: `std::sort` requires random-access iterators, which `std::list` does not provide).
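As a minimal sketch of this advice (the helper name `SortedCopy` is made up for the illustration), here is how the `sort()` method of `std::list` may be used where the namesake algorithm is not suitable:

```cpp
#include <list>

// Hypothetical helper, for illustration only: returns a sorted copy of a list.
std::list<int> SortedCopy(std::list<int> values)
{
    // std::sort(values.begin(), values.end()) would not compile here:
    // std::list iterators are bidirectional, not random-access.
    values.sort(); // stable sort provided by the container itself
    return values;
}
```

Calling `SortedCopy({3, -1, 2})` yields a list holding -1, 2 and 3 in that order.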
## Example: `std::sort`
%% Cell type:code id: tags:
``` C++17
#include <algorithm>
#include <vector>
#include <iostream>
std::vector<int> int_vec { -9, 87, 11, 0, -21, 100 };
std::sort(int_vec.begin(), int_vec.end());
for (auto item : int_vec)
std::cout << item << " ";
```
%% Cell type:code id: tags:
``` C++17
#include <algorithm>
#include <deque>
#include <functional> // for std::greater
#include <iostream>
std::deque<double> double_deque { -9., 87., 11., 0., -21., 100. };
std::sort(double_deque.begin(), double_deque.end(), std::greater<double>()); // optional third parameter is used
for (auto item : double_deque)
std::cout << item << " ";
```
%% Cell type:markdown id: tags:
As you can see, the same algorithm works upon two different types of objects. It requires non constant iterators (the elements are moved around); an optional third argument to `std::sort` lets you provide your own comparison criterion.
Lambda functions may be used as well to provide the comparison to use:
%% Cell type:code id: tags:
``` C++17
#include <algorithm>
#include <vector>
#include <iostream>
{
std::vector<int> int_vec { -9, 87, 11, 0, -21, 100 };
std::sort(int_vec.begin(), int_vec.end(),
[](auto lhs, auto rhs)
{
const bool is_lhs_even = (lhs % 2 == 0);
const bool is_rhs_even = (rhs % 2 == 0);
// Even must be ordered first, then odds
// Granted, this is really an oddball choice..
if (is_lhs_even && !is_rhs_even)
return true;
if (is_rhs_even && !is_lhs_even)
return false;
return lhs < rhs;
});
for (auto item : int_vec)
std::cout << item << " ";
}
```
%% Cell type:markdown id: tags:
Of course, we may use this on something other than `begin()` and `end()`; we just have to make sure iterators are valid:
%% Cell type:code id: tags:
``` C++17
#include <algorithm>
#include <vector>
#include <iostream>
#include <cassert>
{
std::vector<int> int_vec { -9, 87, 11, 0, -21, 100 };
auto it = int_vec.begin() + 4;
assert(it < int_vec.end()); // Important condition to check iterator means something!
std::sort(int_vec.begin(), it); // Only first four elements are sorted.
for (auto item : int_vec)
std::cout << item << " ";
}
```
%% Cell type:markdown id: tags:
## `std::find`
I will also show examples of `std::find` as it illustrates another common practice: it returns an iterator, and a specific value (the end iterator of the range) signals that the algorithm failed to find anything.
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <algorithm>
#include <iostream>
{
std::vector<int> int_vec { -9, 87, 11, 0, -21, 100, -21 };
const auto it = std::find(int_vec.cbegin(), int_vec.cend(), -21);
if (it != int_vec.cend())
std::cout << "Found at position " << it - int_vec.cbegin() << std::endl;
else
std::cout << "Not found." << std::endl;
}
```
%% Cell type:markdown id: tags:
As you can see, `std::find` returns the first instance in the iterator range (and you can also do arithmetic over the iterators). You can find out how many instances there are with `std::count`:
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <algorithm>
#include <iostream>
{
std::vector<int> int_vec { -9, 87, 11, 0, -21, 100, -21, 17, -21 };
const auto count = std::count(int_vec.cbegin(), int_vec.cend(), -21);
std::cout << "There are " << count << " instances of -21." << std::endl;
}
```
%% Cell type:markdown id: tags:
If you want to use a condition rather than a value, there are dedicated versions of the algorithms to do so:
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <algorithm>
#include <iostream>
{
std::vector<int> int_vec { -9, 87, 11, 0, -21, 100, -21, 17, -21 };
const auto count = std::count_if(int_vec.cbegin(), int_vec.cend(),
[](int value)
{
return value % 2 == 0;
});
std::cout << "There are " << count << " even values in the list." << std::endl;
}
```
%% Cell type:markdown id: tags:
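Coming back to the condition-based versions mentioned above: the same pattern applies to searching, with `std::find_if` as the predicate-based counterpart of `std::find`. Below is a small sketch; the function name `FirstEvenPosition` and the choice of returning -1 when nothing is found are assumptions made for the illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: position of the first even value, or -1 if there is none.
long FirstEvenPosition(const std::vector<int>& values)
{
    const auto it = std::find_if(values.cbegin(), values.cend(),
                                 [](int value)
                                 {
                                     return value % 2 == 0;
                                 });

    if (it == values.cend())
        return -1; // same convention as std::find: the end iterator means "not found"

    return static_cast<long>(it - values.cbegin());
}
```

With the values `{ -9, 87, 11, 0, -21 }` the function returns 3, the index of `0`.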
## Output iterators and `std::back_inserter`
Some algorithms require output iterators: they don't only work upon existing content but need to shove new data somewhere. You must in this case provide the adequate memory beforehand:
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <algorithm>
#include <iostream>
{
std::vector<int> int_vec { -9, 87, 11, 0, -21, 100, -21, 17, -21 };
std::vector<int> odd_only;
std::copy_if(int_vec.cbegin(), int_vec.cend(),
odd_only.begin(),
[](int value)
{
return value % 2 != 0;
}
); // SHOULD MAKE YOUR KERNEL CRASH!
}
```
%% Cell type:markdown id: tags:
The issue is that the memory is not allocated first: the algorithm doesn't provide the memory at destination! (the reason is that an algorithm is as generic as possible; here `std::copy_if` is expected to work as well with `std::set`... and `std::vector` and `std::set` don't use the same API to allocate the memory).
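One way to provide the memory beforehand, sketched here under the assumption that the destination is a `std::vector`, is to count the matching elements first and size the destination accordingly. The function name `CopyOdd` is made up for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies the odd values into a vector sized beforehand.
std::vector<int> CopyOdd(const std::vector<int>& values)
{
    const auto is_odd = [](int value) { return value % 2 != 0; };

    // First pass: how many elements will be copied?
    const auto count = std::count_if(values.cbegin(), values.cend(), is_odd);

    // The destination now really holds `count` elements that may be overwritten.
    std::vector<int> odd_only(static_cast<std::size_t>(count));

    // Second pass: writing through odd_only.begin() is now valid.
    std::copy_if(values.cbegin(), values.cend(), odd_only.begin(), is_odd);

    return odd_only;
}
```

The price to pay is that the range is traversed twice and the predicate evaluated twice for each element.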
Of course, in some cases it is tricky to know in advance what you need, and here computing it beforehand with `std::count_if` adds an additional operation. There is actually a way to tell the program to insert the values with `push_back`: `std::back_inserter`. It might be a good idea to reserve enough memory beforehand so that no reallocation occurs along the way:
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <algorithm>
#include <iostream>
{
std::vector<int> int_vec { -9, 87, 11, 0, -21, 100, -21, 17, -21 };
std::vector<int> odd_only;
odd_only.reserve(int_vec.size()); // at most all elements of int_vec will be there
std::copy_if(int_vec.cbegin(), int_vec.cend(),
std::back_inserter(odd_only),
[](int value)
{
return value % 2 != 0;
}
);
// And if you're afraid to have used too much memory with your `reserve()` call,
// you may call shrink_to_fit() method here.
std::cout << "The odd values are: ";
for (auto item : odd_only)
std::cout << item << " ";
}
```
%% Cell type:markdown id: tags:
## The different kinds of operators
`std::back_inserter` works only with containers that provide a `push_back()` method. This may be generalized: the fact that algorithms rely upon iterators to make them as generic as possible doesn't mean each algorithm will work on any container.
There are actually several kinds of iterators:
* **Forward iterators**, with which you may only iterate forward. For instance `std::forward_list` or `std::unordered_map` provide such iterators.
* **Bidirectional iterators**, with which you may also iterate backward. For instance `std::list` or `std::map` provide such iterators.
* **Random-access iterators**, which are bidirectional iterators with the additional ability to provide random access (through an index). Think of `std::vector` or `std::string`.
When you go on [cppreference](https://en.cppreference.com/w/) (or in \cite{Josuttis2012}) the name of the template parameter explicitly describes which kind of iterator is actually used.
Besides this classification, algorithms also distinguish between **input iterators** (which are read-only) and **output iterators**, which assume you will write new content there.
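To see these categories in practice, the sketch below queries `std::iterator_traits`, the standard facility that exposes the category tag of an iterator type. The helper name `CategoryName` and the strings it returns are assumptions made for this example.

```cpp
#include <forward_list> // the three containers are only needed for the usage shown after the block
#include <iterator>
#include <list>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical helper: human-readable name of the category of an iterator type.
template<class IteratorT>
std::string CategoryName()
{
    using category = typename std::iterator_traits<IteratorT>::iterator_category;

    // Check from the most to the least capable: the tags derive from one another.
    if (std::is_base_of<std::random_access_iterator_tag, category>::value)
        return "random-access";
    if (std::is_base_of<std::bidirectional_iterator_tag, category>::value)
        return "bidirectional";
    if (std::is_base_of<std::forward_iterator_tag, category>::value)
        return "forward";
    return "input or output";
}
```

For instance `CategoryName<std::vector<int>::iterator>()` is expected to return `"random-access"`, `CategoryName<std::list<int>::iterator>()` `"bidirectional"` and `CategoryName<std::forward_list<int>::iterator>()` `"forward"`.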
%% Cell type:markdown id: tags:
## Algorithm: read the documentation first!
You should really **carefully read the documentation** before using an algorithm: it might not behave as you believe...
I will provide two examples:
### std::unique
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <algorithm>
#include <iostream>
{
std::vector<int> int_vec { -9, 87, 11, 0, -21, 100, -21, 17, -21 };
std::unique(int_vec.begin(), int_vec.end());
std::cout << "The unique values are (or not...): ";
for (auto item : int_vec)
std::cout << item << " ";
std::cout << std::endl;
}
```
%% Cell type:markdown id: tags:
So what happened? If you look at [cplusplus.com](http://www.cplusplus.com/reference/algorithm/unique/) you may see the headline is _Remove **consecutive** duplicates in range_.
So to make it work you need to sort it first (or use a home-made algorithm if you need to preserve the original ordering):
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <algorithm>
#include <iostream>
{
std::vector<int> int_vec { -9, 87, 11, 0, -21, 100, -21, 17, -21 };
std::sort(int_vec.begin(), int_vec.end());
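// std::unique only returns the new logical end; erase() drops the leftover tail.
int_vec.erase(std::unique(int_vec.begin(), int_vec.end()), int_vec.end());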
std::cout << "The unique values are (really this time): ";
for (auto item : int_vec)
std::cout << item << " ";
std::cout << std::endl;
}
```
%% Cell type:markdown id: tags:
Personally I have in my Utilities library a function `EliminateDuplicate()` which calls both in a row.
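The actual `EliminateDuplicate()` is not shown in this notebook; the following is only a guess at what such a helper might look like for a `std::vector`, with a signature chosen for the illustration. Note the final `erase()`: `std::unique` only returns the new logical end of the range and leaves the size of the container unchanged.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical sketch, not the author's actual implementation.
template<class T>
void EliminateDuplicate(std::vector<T>& values)
{
    // std::unique only removes *consecutive* duplicates, hence the sort first.
    std::sort(values.begin(), values.end());

    // std::unique returns the logical end; erase() drops the leftover tail.
    values.erase(std::unique(values.begin(), values.end()), values.end());
}
```

Applied to the vector used above, it leaves the seven values -21, -9, 0, 11, 17, 87 and 100.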
%% Cell type:markdown id: tags:
### std::remove
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <algorithm>
#include <iostream>
{
std::vector<int> int_vec { -9, 87, 11, 0, -21, 100, -21, 17, -21 };
std::remove_if(int_vec.begin(), int_vec.end(),
[](int value)
{
return value % 2 != 0;
});
std::cout << "The even values are (or not...): ";
for (auto item : int_vec)
std::cout << item << " ";
std::cout << std::endl;
}
```
%% Cell type:markdown id: tags:
So what happens this time? [cplusplus.com](http://www.cplusplus.com/reference/algorithm/remove/?kw=remove) states that it _transforms the range \[first,last) into a range with all the elements that compare equal to val removed, and returns an iterator to the new end of that range_.
In other words, `std::remove`:
* Places at the beginning of the vector the values to be kept.
* Returns an iterator to the **logical end** of the expected series...
* But does not deallocate the memory! (and leaves the container's `size()` unchanged - see below)
So to print the relevant values only, you should do:
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <algorithm>
#include <iostream>
{
std::vector<int> int_vec { -9, 87, 11, 0, -21, 100, -21, 17, -21 };
auto logical_end = std::remove_if(int_vec.begin(), int_vec.end(),
[](int value)
{
return value % 2 != 0;
});
std::cout << "The even values are: ";
for (auto it = int_vec.cbegin(); it != logical_end; ++it)
std::cout << *it << " ";
std::cout << std::endl;
std::cout << "But the size of the vector is still " << int_vec.size() << std::endl;
}
```
%% Cell type:markdown id: tags:
And if you want to reduce this size, you should use the `std::vector::erase()` method:
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <algorithm>
#include <iostream>
{
std::vector<int> int_vec { -9, 87, 11, 0, -21, 100, -21, 17, -21 };
auto logical_end = std::remove_if(int_vec.begin(), int_vec.end(),
[](int value)
{
return value % 2 != 0;
});
int_vec.erase(logical_end, int_vec.end());
std::cout << "The even values are: ";
for (auto item : int_vec)
std::cout << item << " ";
std::cout << std::endl;
std::cout << "And the size of the vector is correctly " << int_vec.size() << std::endl;
}
```
%% Cell type:markdown id: tags:
## Conclusion
My point was absolutely not to tell you not to use the STL algorithms; on the contrary it is better not to reinvent the wheel, especially considering you would likely end up with a less efficient version of the algorithm!
You need however to be very careful: sometimes the names are unfortunately misleading, and you should always check that a function does the job you have in mind. Algorithms were written to be as generic as possible, and can't perform some operations such as allocating or deallocating memory, as it would break this genericity.
I have barely scratched the surface; many algorithms are extremely useful. So whenever you want to proceed with a transformation that is likely common (check a range is sorted, partition a list in a specific way, find the minimum and maximum, etc...) it is highly likely the STL has something in store for you.
Reading \cite{Cukic2018} should provide more incentive to use them.
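As a taste of the common transformations mentioned above, here is a small sketch using `std::is_sorted`, `std::minmax_element` and `std::stable_partition`. The `Summary` structure and the `Summarize` function are hypothetical names introduced only for this example.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical structure gathering the results, for illustration only.
struct Summary
{
    bool was_sorted;
    int min;
    int max;
    std::vector<int> evens_first;
};

// Precondition (assumed): values is not empty.
Summary Summarize(std::vector<int> values)
{
    Summary ret;
    ret.was_sorted = std::is_sorted(values.cbegin(), values.cend());

    const auto min_max = std::minmax_element(values.cbegin(), values.cend());
    ret.min = *min_max.first;
    ret.max = *min_max.second;

    // Even values first; stable_partition keeps the relative order within each group.
    std::stable_partition(values.begin(), values.end(),
                          [](int value) { return value % 2 == 0; });
    ret.evens_first = std::move(values);

    return ret;
}
```

For the input `{ 3, 8, -1, 4 }` the range is reported as not sorted, the minimum is -1, the maximum is 8 and the partitioned content is 8, 4, 3, -1.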
It is also important to highlight that while the STL algorithms may provide you efficiency (this library is written by highly skilled engineers after all), this is not their main draw: the algorithms are written to be as generic as possible. The primary reason to use them is to allow you to think at a higher level of abstraction, not to get the fastest possible implementation. So if your ~~intuition~~ benchmarking has shown that the standard library is causing a critical slowdown, you are free to explore classic alternatives such as [loop unrolling](https://en.wikipedia.org/wiki/Loop_unrolling) - that's one of the strengths of the language (and the STL itself opens up this possibility directly for some of its constructs - you may for instance use your own memory allocator when defining a container). For most purposes however that will not be necessary.
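To make the idea of loop unrolling concrete, here is a hedged sketch comparing `std::accumulate` with a manually unrolled loop. The function names are made up, the unrolling factor of 4 is arbitrary, and nothing here claims the manual version is faster: compilers may well unroll such loops on their own, which is precisely why benchmarking should come first.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Straightforward version relying on the STL.
long SumWithAlgorithm(const std::vector<int>& values)
{
    return std::accumulate(values.cbegin(), values.cend(), 0L);
}

// Hypothetical hand-written alternative: each iteration handles 4 elements.
long SumUnrolled(const std::vector<int>& values)
{
    const std::size_t size = values.size();
    const std::size_t unrolled_end = size - size % 4;
    long sum = 0L;

    for (std::size_t i = 0; i < unrolled_end; i += 4)
        sum += static_cast<long>(values[i]) + values[i + 1] + values[i + 2] + values[i + 3];

    // Remaining 0 to 3 elements that did not fit in a group of 4.
    for (std::size_t i = unrolled_end; i < size; ++i)
        sum += values[i];

    return sum;
}
```

Both functions return the same result; the unrolled one is longer, harder to read and tied to `std::vector`, which illustrates what is given up when leaving the generic algorithm.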
FYI, C++ 20 introduces a completely new way to deal with algorithms, which does not rely on direct use of iterators but instead on the ranges library. This leads to a syntax which is more akin to what is done in other languages - see for instance this example lifted from this [blog post](https://www.modernescpp.com/index.php/c-20-the-ranges-library):
%% Cell type:code id: tags:
``` C++17
// C++ 20: does not run in Xeus-cling!
#include <cstdlib> // for EXIT_SUCCESS
#include <iostream>
#include <ranges>
#include <vector>
int main(int argc, char** argv)
{
std::vector<int> numbers = {1, 2, 3, 4, 5, 6};
auto results = numbers | std::views::filter([](int n){ return n % 2 == 0; })
| std::views::transform([](int n){ return n * 2; });
for (auto v: results) std::cout << v << " "; // 4 8 12
return EXIT_SUCCESS;
}
```
%% Cell type:markdown id: tags:
Having no first-hand experience of it, I really can't say more about it, but don't be astonished if you meet such a syntax in a C++ program.
%% Cell type:markdown id: tags:
# References
[<a id="cit-Josuttis2012" href="#call-Josuttis2012">Josuttis2012</a>] Nicolai M. Josuttis, "_The C++ Standard Library: A Tutorial and Reference_", 2012.
[<a id="cit-Cukic2018" href="#call-Cukic2018">Cukic2018</a>] Ivan Čukić, "_Functional Programming in C++_", 01 2018.
%% Cell type:markdown id: tags:
© _CNRS 2016_ - _Inria 2018-2021_
_This notebook is an adaptation of a lecture prepared by David Chamont (CNRS) under the terms of the licence [Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](http://creativecommons.org/licenses/by-nc-sa/4.0/)_
_The present version has been written by Sébastien Gilles and Vincent Rouvreau (Inria)_