"In this tutorial, we will present some of the basic concept of what is a neuron, neural network and how to train it for some task like classification or regression.\n",
"\n",
"Let's dive in.\n",
"\n",
"This tutorial is (almost completely) inspired from this [online book](http://neuralnetworksanddeeplearning.com) which I found is a gold mine if you want to understand how neural networks work and how they are implemented.\n",
"\n",
"## Perceptrons and Sigmoid Neuron\n",
"\n",
"The perceptron (or single neuron) is the common ancestor of all deep learning.\n",
"\n",
"A perceptron takes several binary inputs, x1,x2,…, and produces a single binary output:\n",
"The idea of the perceptron is that it will activate itself based on a composition of its inputs and its internal weights.\n",
"In fact, each input $x_{j}$ has a corresponding weight $w_{j}$ that will control how relevant the input is for the decision process. \n",
"\n",
"The most basic way to do this is to use a sum and threshold:\n",
"\n",
"$$\n",
"\\left\\{\n",
"\\begin{array}{l}\n",
"0 \\quad if \\sum_{j}w_{j}x_{j} \\leq threshold\\\\\n",
"1 \\quad if \\sum_{j}w_{j}x_{j} > threshold\n",
"\\end{array}\n",
"\\right.\n",
"$$\n",
"\n",
"By changing the value of each weight, you can change how the decision process will be done.\n",
"In the context of neural networks, the step function, $if \\leq threshold$, is the called an activation function.\n",
"\n",
"Now, we will have to do some modification to this perceptron in order for it to be useful.\n",
"Indeed, the secret magic sauce of training algorithm of deep learning architecture is the ability to construct models that are fully differentiable in respect to its ouput.\n",
"However, you can see here that the use of the step function, $if \\leq threshold$ is totally not differentiable.\n",
"\n",
"First of all, let's simplify the use of a threshold by using a bias and recenter the steps function around 0:\n",
"\n",
"$$\n",
"\\left\\{\n",
"\\begin{array}{l}\n",
"\\sigma(x) = \\sum_{j}w_{j}x_{j} + b\\\\\n",
"1 \\quad if \\sum_{j}w_{j}x_{j} + b > 0\n",
"\\end{array}\n",
"\\right.\n",
"$$\n",
"\n",
"Next, let's change the activation function to the sigmoid function $\\sigma(x) = {1 \\over {1 + e^{-x}}}$ in order for it to be differentiable:\n",
"\n",
"$$\\sigma(\\sum_{j}w_{j}x_{j} + b)$$\n",
"\n",
"This now is not called a perceptron anymore, but a *sigmoid neuron* or *logistic neuron*.\n",
"\n",
"But enough theory, let's do some practice by implementing this perceptron in tensorflow!"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/usr/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.\n",
" from ._conv import register_converters as _register_converters\n"
]
}
],
"source": [
"# first let's import all that we will need\n",
"import tensorflow as tf\n",
"from itertools import product"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Oversimplified, tensorflow is at its core a tensor library with automatic differentiation.\n",
"A *tensor* is a sort of vector, in the mathematical sense, where the important part is to define its size (or shape) and type.\n",
"One interesting aspect of a tensor is that the data that will flow through this tensor can be fed after the tensor is created.\n",
"\n",
"With tensors, you can then do any kind of mathematical operations you want.\n",
"Every mathematical operations will then produce a new tensor, where it will store its results, that can be then reused to compose with more operations."
"Well, it's not exactly the output we wanted, but that's what you get for manually tuning your weights.\n",
"\n",
"Next, we'll get something out of the way: the batch.\n",
"For now, we can only feed one example at a time to our neurone.\n",
"Keep in mind that transfering data to the gpu is very time consuming, and that you can take advantage of the special architecture of the gpu in order to parallelize the computation of multiple examples.\n",
"So by processing examples in batch, we will be faster than computing each example one at a time."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"# Here, we'll modify our neuron so that it can take a batch of example as input\n",
"# so now we can process all examples in one sess.run call\n",
"sess.run(neuron_output, feed_dict={x: [[x1, x2] for x1, x2 in product([0., 1.], repeat=2)]})"
]
},
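{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result is now a single vector of sigmoid outputs, one per row of the batch (four values here, one per input pair)."
]
},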
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Summary\n",
"\n",
"A neuron of $k$ inputs is a small decision unit constituted of:\n",
"\n",
"* a vector of shape $[k]$ of inputs\n",
"* a vector of shape $[k]$ of weights\n",
"* a bias\n",
"* an activation function\n",
" * if it is the step function, it's a *Perceptron*, it's not trainable\n",
" * if it is the sigmoid function, it's a *sigmoid* or a *logistic* neuron, and it's *trainable*\n",
"* processessing examples in batch is faster"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Multi-Layered Perceptron (MLP)\n",
"\n",
"To have a more complex model that can do more usefull things, we can stack multiple neurons in parallel in order to make a *layer* of neurons.\n",
"Then, when to connect multiple layers one after the other, you obtain what we call a Multi-Layered Perceptrons or MLP.\n",
"WARNING! Keep in mind that, even though we use Perceptron here, if you want to train this network, you will have to use a Sigmoid neuron or any kind of trainable neuron.\n",