Add in appendix a notebook explaining briefly std::string_view

Closed GILLES Sebastien requested to merge sgilles/gettingstartedwithmoderncpp:stringview into master
%% Cell type:markdown id: tags:
# [Getting started in C++](./) - [Procedural programming](./0-main.ipynb) - [Predefined types](./3-Types.ipynb)
%% Cell type:markdown id: tags:
<h1>Table of contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Boolean" data-toc-modified-id="Boolean-1">Boolean</a></span></li><li><span><a href="#Enumerations" data-toc-modified-id="Enumerations-2">Enumerations</a></span><ul class="toc-item"><li><span><a href="#Historical-enumerations" data-toc-modified-id="Historical-enumerations-2.1">Historical enumerations</a></span></li><li><span><a href="#New-enumerations" data-toc-modified-id="New-enumerations-2.2">New enumerations</a></span></li></ul></li><li><span><a href="#Numerical-types" data-toc-modified-id="Numerical-types-3">Numerical types</a></span><ul class="toc-item"><li><ul class="toc-item"><li><span><a href="#List-of-numerical-types" data-toc-modified-id="List-of-numerical-types-3.0.1">List of numerical types</a></span></li><li><span><a href="#Numeric-limits" data-toc-modified-id="Numeric-limits-3.0.2">Numeric limits</a></span></li><li><span><a href="#Conversions-between-digital-types" data-toc-modified-id="Conversions-between-digital-types-3.0.3">Conversions between digital types</a></span></li></ul></li><li><span><a href="#Explicit-conversions-inherited-from-C" data-toc-modified-id="Explicit-conversions-inherited-from-C-3.1">Explicit conversions inherited from C</a></span></li><li><span><a href="#Explicit-conversions-by-static_cast" data-toc-modified-id="Explicit-conversions-by-static_cast-3.2">Explicit conversions by static_cast</a></span></li><li><span><a href="#Other-explicit-conversions" data-toc-modified-id="Other-explicit-conversions-3.3">Other explicit conversions</a></span></li></ul></li><li><span><a href="#Characters-and-strings" data-toc-modified-id="Characters-and-strings-4">Characters and strings</a></span><ul class="toc-item"><li><span><a href="#Historical-strings" data-toc-modified-id="Historical-strings-4.1">Historical strings</a></span></li><li><span><a href="#std::string" data-toc-modified-id="std::string-4.2">std::string</a></span></li></ul></li><li><span><a href="#Renaming-types" 
data-toc-modified-id="Renaming-types-5">Renaming types</a></span></li><li><span><a href="#decltype-and-auto" data-toc-modified-id="decltype-and-auto-6"><code>decltype</code> and <code>auto</code></a></span></li></ul></div>
%% Cell type:markdown id: tags:
## Boolean
Variables with type `bool` may be set to true or false.
It should be noted that this type did not originally exist, and that C++ instructions with conditions do not necessarily expect boolean values, but rather integers.
There is a form of equivalence between booleans and integers: the integer 0 is equivalent to `false`, and any other value is equivalent to `true`.
%% Cell type:code id: tags:
``` C++17
#include <iostream>
bool undefined; // UNDEFINED !!
if (undefined)
std::cout << "This text might appear or not - it's truly undefined and may vary from "
"one run/compiler/architecture/etc... to another!" << std::endl;
```
%% Cell type:code id: tags:
``` C++17
bool defined { true };
if (defined)
std::cout << "Defined!" << std::endl;
```
%% Cell type:code id: tags:
``` C++17
int n = -5;
if (n)
std::cout << "Boolean value of " << n << " is true." << std::endl;
```
%% Cell type:code id: tags:
``` C++17
int n = 0;
if (!n) // ! is the not operator: the condition is true if n is false.
std::cout << "Boolean value of " << n << " is false." << std::endl;
```
%% Cell type:markdown id: tags:
## Enumerations
### Historical enumerations
The historical `enum` enumerations of C++ make it possible to define constants that are treated as integers, and that can be initialized from integers. By default the first value is 0 and each following value is incremented by one, but it is possible to bypass these default values and provide the desired numerical value yourself.
%% Cell type:code id: tags:
``` C++17
#include <iostream>
{
enum color { red, green, blue } ;
std::cout << red << " " << green << " " << blue << " (expected: 0, 1, 2)" << std::endl;
enum shape { circle=10, square, triangle=20 };
std::cout << circle << " " << square << " " << triangle << " (expected: 10, 11, 20)"<< std::endl; // 10 11 20
}
```
%% Cell type:markdown id: tags:
These `enum` are placeholders for integers and might be used as such:
%% Cell type:code id: tags:
``` C++17
#include <iostream>
{
enum color { red, green, blue } ;
int a { 5 };
color c = green;
int b = a + c;
std::cout << "b = " << b << " (expected: 6)" << std::endl;
enum shape { circle=10, square, triangle=20 };
shape s = triangle;
int d = s + c;
std::cout << "d = " << d << " (expected: 21... but we've just added a shape to a color without ado!)" << std::endl;
}
```
%% Cell type:markdown id: tags:
A shortcoming of historical `enum` is that the same word can't be used in two different `enum` declared in the same scope:
%% Cell type:code id: tags:
``` C++17
{
enum is_positive { yes, no };
enum is_colored { yes, no }; // COMPILATION ERROR!
}
```
%% Cell type:markdown id: tags:
### New enumerations
To overcome the two limitations we have just mentioned, C++11 makes it possible to declare new `enum class` enumerations, each constituting a separate type, not implicitly convertible into an integer. This type protects against previous errors at the cost of a little more writing work.
%% Cell type:code id: tags:
``` C++17
enum class is_positive { yes, no };
enum class is_colored { yes, no }; // OK
```
%% Cell type:code id: tags:
``` C++17
yes; // COMPILATION ERROR: an `enum class` value must be prefixed! (see below)
```
%% Cell type:code id: tags:
``` C++17
is_positive p = is_positive::yes; // OK
```
%% Cell type:code id: tags:
``` C++17
int a = is_positive::no; // COMPILATION ERROR: not implicitly convertible into an integer
```
%% Cell type:code id: tags:
``` C++17
is_positive::yes + is_colored::no; // COMPILATION ERROR: addition of two unrelated types
```
%% Cell type:code id: tags:
``` C++17
{
enum class color { red, green, blue } ;
color c = color::green;
bool is_more_than_red = (c > color::red); // Both belong to the same type and therefore might be compared
}
```
%% Cell type:markdown id: tags:
These enum types are really handy to make code more expressive, especially in function calls:
````
f(print::yes, perform_checks::no);
````
is much more expressive (and less error-prone) than:
````
f(true, false);
````
for which you will probably need to go check the prototype to figure out what each argument stands for.
As we shall see [shortly](#Explicit-conversions-by-static_cast), you may perform arithmetic with the underlying integer through _explicit cast_ of the enum into an integer.
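A minimal sketch of such an explicit cast, assuming a hypothetical `color` enumeration and hypothetical helpers `to_integer` and `next_index` (none of them belongs to the lecture code); `std::underlying_type_t` (C++14) names the integer type on which the enumeration is built:

```cpp
#include <type_traits> // for std::underlying_type_t

// Hypothetical enumeration, used only for this illustration.
enum class color { red, green, blue };

// Hypothetical helper: explicit conversion of a color into its underlying integer.
constexpr std::underlying_type_t<color> to_integer(color c)
{
    return static_cast<std::underlying_type_t<color>>(c);
}

// Arithmetic becomes possible once the conversion has been spelled out.
constexpr int next_index(color c)
{
    return to_integer(c) + 1;
}
```

With these definitions, `to_integer(color::green)` yields 1, whereas `color::green + 1` would still be rejected by the compiler.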
%% Cell type:markdown id: tags:
## Numerical types
#### List of numerical types
The FORTRAN correspondences below are given as examples. The
size of the C++ numerical types can vary depending on the processor used. The
C++ standard only imposes `short <= int <= long` and `float <= double <= long double` in terms of size. This makes these predefined types unportable. Like many things
in C, and therefore in C++, performance is given priority over any other consideration.
The default integer and real types, `int` and `double`, are assumed
to match the size of the processor registers and be the fastest (for more details see [the article on cppreference](http://en.cppreference.com/w/cpp/language/types)).
| C++ | Fortran | Observations | 0 notation |
|:------------- |:---------:|:-------------------:|:----------:|
| `short` | INTEGER*2 | At least on 16 bits | None |
| `int` | INTEGER*4 | At least on 16 bits | 0 |
| `long` | INTEGER*8 | At least on 32 bits | 0l |
| `long long` | INTEGER*16| At least on 64 bits | 0ll |
| `float` | REAL*4 | - | 0.f |
| `double` | REAL*8 | - | 0. |
| `long double` | REAL*16 | - | 0.l |
All integer types (`short`, `int`, `long` and `long long`) also have an unsigned variant, for example
`unsigned int`, which only takes non-negative values.
It should also be noted that the type `char` is the equivalent of one byte,
and depending on the context will be interpreted as a number or as a
character.
If you need an integer type of a defined size, regardless of the type of processor or platform used, you should use those defined in `<cstdint>` since C++11 (for more details click [here](http://en.cppreference.com/w/cpp/types/integer)).
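As a hedged illustration, the sketch below checks the width of two fixed-width types and uses them in a hypothetical helper `wide_product` (a name chosen for this example only):

```cpp
#include <climits> // for CHAR_BIT
#include <cstdint> // for std::int32_t, std::int64_t, std::uint64_t

// Fixed-width types have exactly the advertised number of bits wherever they are provided.
static_assert(sizeof(std::int32_t) * CHAR_BIT == 32, "std::int32_t is 32 bits wide");
static_assert(sizeof(std::uint64_t) * CHAR_BIT == 64, "std::uint64_t is 64 bits wide");

// Hypothetical helper: the product of two 32-bit integers is computed in 64 bits,
// so it cannot overflow whatever the platform.
constexpr std::int64_t wide_product(std::int32_t a, std::int32_t b)
{
    return static_cast<std::int64_t>(a) * static_cast<std::int64_t>(b);
}
```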
The _0 notation_ column gives the way to state the type explicitly in a literal; of course any value might be used instead of 0. A `u` might be added to signal the unsigned status for integer types; for instance `3ul` means 3 as an _unsigned long_. The `auto` notation below will illustrate a case in which such a notation is useful.
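The suffixes may be checked at compile time; the sketch below only relies on `decltype` and `std::is_same`, both introduced later in this notebook:

```cpp
#include <type_traits> // for std::is_same

// Each literal gets the type announced by its suffix.
static_assert(std::is_same<decltype(0), int>::value, "no suffix: int");
static_assert(std::is_same<decltype(0l), long>::value, "l suffix: long");
static_assert(std::is_same<decltype(0ll), long long>::value, "ll suffix: long long");
static_assert(std::is_same<decltype(3ul), unsigned long>::value, "ul suffix: unsigned long");
static_assert(std::is_same<decltype(0.f), float>::value, "f suffix: float");
static_assert(std::is_same<decltype(0.), double>::value, "no suffix: double");
static_assert(std::is_same<decltype(0.l), long double>::value, "l suffix: long double");
```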
#### Numeric limits
Always keep in mind that the types of the computer don't match the abstract concepts you may use in mathematics... In particular, the values that may be stored don't range from minus infinity to infinity:
%% Cell type:code id: tags:
``` C++17
#include <iostream>
#include <limits> // for std::numeric_limits
{
std::cout << "int [min, max] = [" << std::numeric_limits<int>::lowest() << ", "
<< std::numeric_limits<int>::max() << "]" << std::endl;
std::cout << "unsigned int [min, max] = [" << std::numeric_limits<unsigned int>::lowest() << ", "
<< std::numeric_limits<unsigned int>::max() << "]" << std::endl;
std::cout << "short [min, max] = [" << std::numeric_limits<short>::lowest() << ", "
<< std::numeric_limits<short>::max() << "]" << std::endl;
std::cout << "long [min, max] = [" << std::numeric_limits<long>::lowest() << ", "
<< std::numeric_limits<long>::max() << "]" << std::endl;
std::cout << "float [min, max] = [" << std::numeric_limits<float>::lowest() << ", "
<< std::numeric_limits<float>::max() << "]" << std::endl;
std::cout << "double [min, max] = [" << std::numeric_limits<double>::lowest() << ", "
<< std::numeric_limits<double>::max() << "]" << std::endl;
std::cout << "long double [min, max] = [" << std::numeric_limits<long double>::lowest() << ", "
<< std::numeric_limits<long double>::max() << "]" << std::endl;
}
```
%% Cell type:markdown id: tags:
If an initial value is not in the range, the compiler will yell:
%% Cell type:code id: tags:
``` C++17
#include <iostream>
{
short s = -33010; // triggers a warning: outside the range
std::cout << s << std::endl;
}
```
%% Cell type:markdown id: tags:
However, if you go beyond the numeric limit during a computation you're on your own:
%% Cell type:code id: tags:
``` C++17
#include <iostream>
#include <limits> // for std::numeric_limits
{
unsigned int max = std::numeric_limits<unsigned int>::max();
std::cout << "Max = " << max << std::endl;
std::cout << "Max + 1 = " << max + 1 << "!" << std::endl;
}
```
%% Cell type:markdown id: tags:
When you go past the upper bound of an unsigned type, a modulo is applied to put the result back into the range! (For signed types it is worse: overflow is undefined behaviour.)
Don't worry, for most computations you shouldn't run into this kind of trouble, but if you are dealing with large values it is important to keep this kind of issue in mind.
The most obvious way to avoid this is to choose appropriate types: if your integer might be huge a `long` is more appropriate than an `int`.
Other languages such as Python have an underlying integer model that is resilient to this kind of issue, but it comes at a cost; types such as those used in C++ are tailored to favor optimization on your hardware.
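If you want to detect the issue rather than suffer it, the test has to be performed before the operation. A minimal sketch for `unsigned int`, with hypothetical helper names `addition_wraps` and `wrapped_sum`:

```cpp
#include <limits> // for std::numeric_limits

// Hypothetical helper: tells whether a + b would go past the maximum of unsigned int.
// The test is written with a subtraction so that it never wraps itself.
constexpr bool addition_wraps(unsigned int a, unsigned int b)
{
    return a > std::numeric_limits<unsigned int>::max() - b;
}

// Plain unsigned addition: the result is reduced modulo 2^N, N being the number of bits.
constexpr unsigned int wrapped_sum(unsigned int a, unsigned int b)
{
    return a + b;
}
```

For instance `wrapped_sum(std::numeric_limits<unsigned int>::max(), 1u)` is 0, and `addition_wraps` returns `true` for the same arguments.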
%% Cell type:markdown id: tags:
#### Conversions between digital types
[Earlier](/notebooks/1-ProceduralProgramming/1-Variables.ipynb#Initialisation) I indicated there were small differences between the three initialization methods, that could be ignored most of the time.
The difference is related to implicit conversion: both historical initialization methods are ok with implicit conversion __with accuracy loss__:
%% Cell type:code id: tags:
``` C++17
{
float f = 1.12345678901234567890;
double d = 2.12345678901234567890;
float f_d(d);
float f_dd = d;
}
```
%% Cell type:markdown id: tags:
whereas the initialization with braces introduced in C++11 isn't:
%% Cell type:code id: tags:
``` C++17
{
double d = 2.12345678901234567890;
float f_d{d}; // COMPILATION ERROR
}
```
%% Cell type:markdown id: tags:
This is really related to **accuracy loss**: initialization with braces is ok if there is none:
%% Cell type:code id: tags:
``` C++17
{
float f = 1.12345678901234567890;
double d = 2.12345678901234567890;
double d_f { f }; // OK
}
```
%% Cell type:markdown id: tags:
Accuracy losses are detected during conversion:
* from a floating point type (`long double`, `double` and `float`) into an integer type.
* from a `long double` into a `double` or a `float`, unless the source is constant and its value fits into the type of the destination.
* from a `double` into a `float`, unless the source is constant and its value fits in the type of the destination.
* from an integer type or an unscoped enumeration type into a floating point type, unless the source is constant and its value fits into the type of the destination.
* from an integer type or an unscoped enumeration type into another integer type that cannot represent all the values of the original type, unless the source is constant and its value fits into the type of the destination.
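The "constant whose value fits" exception of these rules can be sketched as follows; the variable names and the helper `to_float` are hypothetical and only serve the illustration:

```cpp
// The sources below are constants whose values fit in the destination type:
// initialization with braces accepts them.
constexpr float half_value { 0.5 };     // double constant into float
constexpr float three { 3 };            // int constant into float
constexpr short answer { 42 };          // int constant into short
constexpr unsigned char max_byte { 255 }; // int constant into unsigned char

// Hypothetical helper: the source is no longer a constant, so the narrowing
// must be requested explicitly.
inline float to_float(double value)
{
    // float narrowed { value }; // COMPILATION ERROR: narrowing conversion
    return static_cast<float>(value);
}
```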
%% Cell type:markdown id: tags:
### Explicit conversions inherited from C
In the case of an explicit conversion, the programmer explicitly says which conversion to use.
C++ inherits the type forcing mechanism (cast) of C:
%% Cell type:code id: tags:
``` C++17
{
unsigned short i = 42000 ;
short j = short(i) ;
unsigned short k = (unsigned short)(j) ;
}
```
%% Cell type:markdown id: tags:
It is **not recommended** to use this type of conversion: even if it is clearly faster to type, it is less precise about the intent and does not stand out clearly when reading code; it is preferable to use the other conversion modes mentioned below.
%% Cell type:markdown id: tags:
### Explicit conversions by static_cast
C++ has also defined its own family of type conversions,
more verbose but more precise. The most common type of explicit conversion is the `static_cast`:
%% Cell type:code id: tags:
``` C++17
{
unsigned short i = 42000;
short j = static_cast<short>(i);
unsigned short k = static_cast<unsigned short>(j);
}
```
%% Cell type:markdown id: tags:
Another advantage of this more verbose syntax is that you may find it more easily in your code with the search functionality of your editor.
%% Cell type:markdown id: tags:
### Other explicit conversions
There are 3 other types of C++ conversions:
* `const_cast`, to add or remove constness to a reference or a pointer (obviously to be used with great caution!)
* `dynamic_cast`, which will be introduced when we'll deal with polymorphism.
* `reinterpret_cast`, which is a very brutal cast which changes the type into any other type, regardless of the compatibility of the two types considered. It is a dangerous one that should be considered only in very last resort (usually when interacting with a C library).
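To give a rough idea of two of them, here is a sketch built on hypothetical functions (`legacy_length` plays the role of a C library function whose signature forgot the `const`):

```cpp
#include <cstddef> // for std::size_t
#include <cstdint> // for std::uint32_t

// Hypothetical legacy function: it only reads the buffer, but takes a non-const pointer.
inline std::size_t legacy_length(char* text)
{
    std::size_t n = 0;
    while (text[n] != '\0')
        ++n;
    return n;
}

// const_cast is tolerable here only because legacy_length() never writes into the buffer.
inline std::size_t length_of(const char* text)
{
    return legacy_length(const_cast<char*>(text));
}

// reinterpret_cast used to read the first byte of an object in memory;
// inspecting an object through an unsigned char pointer is allowed.
inline unsigned char first_byte(const std::uint32_t& value)
{
    return *reinterpret_cast<const unsigned char*>(&value);
}
```

Which byte `first_byte` returns for an arbitrary value depends on the endianness of the platform, which is one more reason to keep such casts exceptional.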
%% Cell type:markdown id: tags:
## Characters and strings
### Historical strings
In C, a character string is literally an array of `char` variables, the last character of which is by convention the symbol `\0`.
The `strlen` function returns the length of a string, which is the number of characters between the very first character and the first occurrence of `\0`.
The `strcpy` function copies a character string to a new memory location; care must be taken to ensure that the destination is large enough to avoid any undefined behavior.
The `strncpy` function allows you to copy only the first <b>n</b> characters, where <b>n</b> is the third parameter of the function. The same remark applies about the need to provide a large enough destination.
%% Cell type:code id: tags:
``` C++17
#include <iostream>
#include <cstring> // For strlen, strcpy, strncpy
char hello[] = {'h','e','l','l','o', '\0'};
char copy[6] = {}; // = {'\0','\0','\0','\0','\0','\0' };
strcpy(copy, hello);
std::cout << "String '" << copy << "' is " << strlen(copy) << " characters long." << std::endl;
```
%% Cell type:code id: tags:
``` C++17
const char* hi = "hi"; // Not putting the const here triggers a warning.
strncpy(copy, hi, strlen(hi));
copy[strlen(hi)] = '\0'; // Don't forget to terminate the string!
std::cout << "String '" << copy << "' is " << strlen(copy) << " characters long." << std::endl;
```
%% Cell type:markdown id: tags:
There are several other functions related to historical strings; for more information, do not hesitate to consult [this reference page](http://www.cplusplus.com/reference/cstring/).
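Since `strncpy` does not write the terminating `\0` when the source is longer than the requested count, a common precaution is to wrap it. A minimal sketch, with a hypothetical helper named `bounded_copy`:

```cpp
#include <cstddef> // for std::size_t
#include <cstring> // for std::strncpy

// Hypothetical helper: copies at most capacity - 1 characters of source into
// destination, and always terminates the destination.
inline void bounded_copy(char* destination, std::size_t capacity, const char* source)
{
    if (capacity == 0)
        return; // no room at all, not even for the terminator
    std::strncpy(destination, source, capacity - 1);
    destination[capacity - 1] = '\0'; // strncpy doesn't add it if source is too long
}
```

With a destination of 4 characters, copying `"hello"` this way yields `"hel"` instead of overflowing the buffer.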
%% Cell type:markdown id: tags:
### std::string
In modern C++, rather than bothering with character arrays
which come from the C language, it's easier to use the type `std::string`, provided
by the standard library, which offers a much simpler syntax:
%% Cell type:code id: tags:
``` C++17
#include <iostream>
#include <cstring> // For strlen
#include <string> // For std::string
const char* hello_str = "hello";
std::string hello = hello_str;
std::string hi("hi");
std::string copy {};
```
%% Cell type:code id: tags:
``` C++17
copy = hello; // please notice assignment is much more straightforward
std::cout << "String '" << copy << "' is " << copy.length() << " characters long." << std::endl;
```
%% Cell type:code id: tags:
``` C++17
const char* copy_str = copy.data(); // Returns a classic C-string (from C++11 onward)
std::cout << "String '" << copy_str << "' is " << strlen(copy_str) << " characters long." << std::endl;
```
%% Cell type:code id: tags:
``` C++17
const char* old_copy_str = &copy[0]; // Same before C++11...
std::cout << "String '" << old_copy_str << "' is " << strlen(old_copy_str) << " characters long." << std::endl;
```
%% Cell type:code id: tags:
``` C++17
std::string dynamic {"dynamic std::string"};
std::cout << "String '" << dynamic << "' is " << dynamic.length() << " characters long." << std::endl;
```
%% Cell type:code id: tags:
``` C++17
dynamic = "std::string is dynamical and flexible";
std::cout << "String '" << dynamic << "' is " << dynamic.length() << " characters long." << std::endl;
```
%% Cell type:markdown id: tags:
If needed (for instance to interact with a C library) you may access the underlying array with `c_str()` or `data()` (both are interchangeable):
%% Cell type:code id: tags:
``` C++17
#include <string>
{
std::string cplusplus_string("C++ string!");
const char* c_string = cplusplus_string.c_str();
const char* c_string_2 = cplusplus_string.data();
}
```
%% Cell type:markdown id: tags:
The `const` here is important: you may access the content but should not modify it; this functionality is provided for read-only access.
%% Cell type:markdown id: tags:
FYI, C++14 introduced a suffix to facilitate the declaration of a `std::string` from a string literal... but it requires adding a specific `using namespace` directive first (we will see those in a [much later notebook](../6-InRealEnvironment/5-Namespace.ipynb)).
%% Cell type:code id: tags:
``` C++17
#include <string>
using namespace std::string_literals;
auto hello_str = "Hello world"; // declares a const char*
auto hello = "Hello world"s; // declares a std::string - requires first the using namespace directive
std::string hello_string("Hello world"); // the 'classic' way to define a std::string
```
%% Cell type:markdown id: tags:
Not sure it is entirely worth it (maybe when you define loads of `std::string` in the same file?) but you may see that in an existing program.
%% Cell type:markdown id: tags:
FYI as well, C++17 introduced [std::string_view](https://en.cppreference.com/w/cpp/header/string_view) which is more efficient than `std::string` for some operations (it is presented [in appendix](../7-Appendix/StringView.ipynb) but if it's your first reading it's a bit early to tackle it now).
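Just to give the flavor, here is a small sketch with a hypothetical helper `count_char`: taking a `std::string_view` lets the same function accept a string literal, a `std::string` or a part of either, without copying any character.

```cpp
#include <cstddef>     // for std::size_t
#include <string>      // for std::string
#include <string_view> // for std::string_view (C++17)

// Hypothetical helper: counts the occurrences of c in text.
// A std::string_view is cheap to copy, so it is passed by value.
inline std::size_t count_char(std::string_view text, char c)
{
    std::size_t n = 0;
    for (char current : text)
        if (current == c)
            ++n;
    return n;
}
```

The view does not own the characters: the string it refers to must outlive it.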
%% Cell type:markdown id: tags:
## Renaming types
Sometimes it may be handy to rename a type, for instance if you want to be able to change easily throughout the code the numeric precision to use. Historical syntax (up to C++ 11 and still valid) was `typedef`:
%% Cell type:code id: tags:
``` C++17
#include <iostream>
#include <iomanip> // For std::setprecision
{
typedef double real; // notice the ordering: new typename comes after its value
real radius {1.};
real area = 3.1415926535897932385 * radius * radius;
std::cout <<"Area = " << std::setprecision(15) << area << std::endl;
}
```
%% Cell type:markdown id: tags:
In more modern C++ (C++11 and above), another syntax relying on `using` keyword was introduced; it is advised to use it as this syntax is more powerful in some contexts (see later with templates...):
%% Cell type:code id: tags:
``` C++17
#include <iostream>
#include <iomanip> // For std::setprecision
{
using real = float; // notice the ordering: more in line with what we're accustomed to when
// initialising variables.
real radius {1.};
real area = 3.1415926535897932385 * radius * radius;
std::cout <<"Area = " << std::setprecision(15) << area << std::endl;
}
```
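%% Cell type:markdown id: tags:
One of the contexts in which `using` is more powerful is the alias that is itself a template, which `typedef` can't express directly. A minimal sketch, in which `matrix` and `count_entries` are hypothetical names chosen for the illustration:

```cpp
#include <cstddef> // for std::size_t
#include <vector>

// Alias template: matrix<T> is just another name for std::vector<std::vector<T>>.
template<class T>
using matrix = std::vector<std::vector<T>>;

// Hypothetical helper: total number of entries over all the rows.
template<class T>
std::size_t count_entries(const matrix<T>& m)
{
    std::size_t n = 0;
    for (const auto& row : m)
        n += row.size();
    return n;
}
```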
%% Cell type:markdown id: tags:
## `decltype` and `auto`
C++ 11 introduced new keywords that are very handy to deal with types:
* `decltype` which is able to determine **at compile time** the underlying type of a variable.
* `auto` which determines automatically **at compile time** the type of an expression.
%% Cell type:code id: tags:
``` C++17
#include <vector>
{
auto i = 5; // i is here an int.
auto j = 5u; // j is an unsigned int
decltype(j) k; // decltype(j) is interpreted by the compiler as an unsigned int.
}
```
%% Cell type:markdown id: tags:
On such trivial examples it might not seem much, but in practice it might prove incredibly useful. Consider for instance the following C++03 code (the details don't matter: we'll deal with `std::vector` in a [later notebook](../5-UsefulConceptsAndSTL/3-Containers.ipynb)):
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <iostream>
{
// C++ 03 initialization of a std::vector
std::vector<unsigned int> primes;
primes.push_back(2);
primes.push_back(3);
primes.push_back(5);
primes.push_back(7);
primes.push_back(11);
primes.push_back(13);
primes.push_back(17);
primes.push_back(19);
for (std::vector<unsigned int>::const_iterator it = primes.begin();
it != primes.end(); // cbegin() and cend() only exist since C++11
++it)
{
std::cout << *it << " is prime." << std::endl;
}
}
```
%% Cell type:markdown id: tags:
It's very verbose; we could of course use an alias:
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <iostream>
{
std::vector<unsigned int> primes { 2, 3, 5, 7, 11, 13, 17, 19 }; // I'm cheating this time with C++ 11 notation...
using iterator = std::vector<unsigned int>::const_iterator;
for (iterator it = primes.cbegin();
it != primes.cend();
++it)
{
std::cout << *it << " is prime." << std::endl;
}
}
```
%% Cell type:markdown id: tags:
But with `decltype` we may write instead:
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <iostream>
{
std::vector<unsigned int> primes { 2, 3, 5, 7, 11, 13, 17, 19 }; // I'm cheating: it's C++ 11 notation...
for (decltype(primes.cbegin()) it = primes.cbegin();
it != primes.cend();
++it)
{
std::cout << *it << " is prime." << std::endl;
}
}
```
%% Cell type:markdown id: tags:
or even better:
%% Cell type:code id: tags:
``` C++17
#include <vector>
#include <iostream>
{
std::vector<unsigned int> primes { 2, 3, 5, 7, 11, 13, 17, 19 }; // I'm cheating: it's C++ 11 notation...
for (auto it = primes.cbegin();
it != primes.cend();
++it)
{
std::cout << *it << " is prime." << std::endl;
}
}
```
%% Cell type:markdown id: tags:
That is not to say `decltype` is always inferior to `auto`: there are some cases in which decltype is invaluable (especially in metaprogramming, but it's mostly out of the scope of this lecture - we'll skim briefly over it in a later [notebook](../4-Templates/4-Metaprogramming.ipynb)).
C++14 introduced a new (poorly named) one called `decltype(auto)`, whose usefulness will be explained below:
%% Cell type:code id: tags:
``` C++17
#include <type_traits> // for std::is_same
#include <iostream>
int i = 5;
int& j = i;
auto k = j;
if (std::is_same<decltype(j), decltype(k)>())
std::cout << "j and k are of the same type." << std::endl;
else
std::cout << "j and k are of different type." << std::endl;
```
%% Cell type:code id: tags:
``` C++17
if (std::is_same<decltype(i), decltype(k)>())
std::cout << "i and k are of the same type." << std::endl;
else
std::cout << "i and k are of different type." << std::endl;
```
%% Cell type:markdown id: tags:
Despite the `auto k = j`, j and k don't share the same type! The reason for this is that `auto` drops the reference and the top-level constness in the process...
A way to circumvent this is `auto& k = j`.
`decltype(auto)` was introduced to fill this hole: contrary to `auto` it retains all this information:
%% Cell type:code id: tags:
``` C++17
#include <type_traits> // for std::is_same
#include <iostream>
{
int i = 5;
int& j = i;
decltype(auto) k = j;
if (std::is_same<decltype(j), decltype(k)>())
std::cout << "j and k are of the same type." << std::endl;
else
std::cout << "j and k are of different type." << std::endl;
}
```
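%% Cell type:markdown id: tags:
The same difference shows up with the return type of a function, which is probably where `decltype(auto)` is the most useful. A minimal sketch, in which `copy_of_element` and `element` are hypothetical function names:

```cpp
#include <cstddef>     // for std::size_t
#include <type_traits> // for std::is_same
#include <utility>     // for std::declval
#include <vector>

// With auto, the reference returned by operator[] is dropped: a copy is returned.
template<class ContainerT>
auto copy_of_element(ContainerT& container, std::size_t index)
{
    return container[index];
}

// With decltype(auto), the function returns exactly what operator[] returns.
template<class ContainerT>
decltype(auto) element(ContainerT& container, std::size_t index)
{
    return container[index];
}

static_assert(std::is_same<decltype(copy_of_element(std::declval<std::vector<int>&>(), 0)),
                           int>::value, "auto: returned by value");
static_assert(std::is_same<decltype(element(std::declval<std::vector<int>&>(), 0)),
                           int&>::value, "decltype(auto): returned by reference");
```

As a consequence `element(v, 1) = 42;` modifies the vector `v`, whereas the same assignment with `copy_of_element` does not compile.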
%% Cell type:markdown id: tags:
© _CNRS 2016_ - _Inria 2018-2021_
_This notebook is an adaptation of a lecture prepared by David Chamont (CNRS) under the terms of the licence [Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](http://creativecommons.org/licenses/by-nc-sa/4.0/)_
_The present version has been written by Sébastien Gilles and Vincent Rouvreau (Inria)_