Commit bd791bcf authored by CHOI Kwon-Young

convert notebook to python script

# coding: utf-8
# # Introduction to Deep Learning
#
# In this tutorial, we will present some of the basic concepts of what a neuron and a neural network are, and how to train them for tasks such as classification or regression.
#
# Let's dive in.
#
# This tutorial is (almost completely) inspired by this [online book](http://neuralnetworksanddeeplearning.com), which I found to be a gold mine if you want to understand how neural networks work and how they are implemented.
#
# ## Perceptrons and Sigmoid Neuron
#
# The perceptron (or single neuron) is the common ancestor of all deep learning.
#
# A perceptron takes several binary inputs, x1,x2,…, and produces a single binary output:
#
# ![a perceptron](http://neuralnetworksanddeeplearning.com/images/tikz0.png)
#
# The idea of the perceptron is that it will activate itself based on a composition of its inputs and its internal weights.
# In fact, each input $x_{j}$ has a corresponding weight $w_{j}$ that will control how relevant the input is for the decision process.
#
# The most basic way to do this is to use a sum and threshold:
#
# $$
# output =
# \left\{
# \begin{array}{ll}
# 0 & \text{if } \sum_{j}w_{j}x_{j} \leq threshold\\
# 1 & \text{if } \sum_{j}w_{j}x_{j} > threshold
# \end{array}
# \right.
# $$
#
# By changing the value of each weight, you can change how the decision is made.
# In the context of neural networks, this step function (comparing the weighted sum to a threshold) is called an activation function.
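#
# To make the rule above concrete, here is a minimal plain-Python sketch (the helper name `step_perceptron` and the exact weight values are just illustrative) of a perceptron that computes the NAND function with the step activation:
import numpy as np


def step_perceptron(inputs, weights, threshold):
    # fire (output 1) only if the weighted sum exceeds the threshold
    return 1 if np.dot(inputs, weights) > threshold else 0


# weights of -2 and -2 with a threshold of -3 (i.e. a bias of +3, as we will use below) reproduce the NAND truth table
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, step_perceptron([x1, x2], [-2, -2], -3))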
#
# Now, we will have to make some modifications to this perceptron in order for it to be useful.
# Indeed, the secret magic sauce of the training algorithms of deep learning architectures is the ability to construct models that are fully differentiable with respect to their outputs.
# However, you can see here that the step function used above is not differentiable.
#
# First of all, let's simplify the use of a threshold by introducing a bias and recentering the step function around 0:
#
# $$
# output =
# \left\{
# \begin{array}{ll}
# 0 & \text{if } \sum_{j}w_{j}x_{j} + b \leq 0\\
# 1 & \text{if } \sum_{j}w_{j}x_{j} + b > 0
# \end{array}
# \right.
# $$
#
# Next, let's change the activation function to the sigmoid function $\sigma(x) = {1 \over {1 + e^{-x}}}$ in order for it to be differentiable:
#
# $$\sigma(\sum_{j}w_{j}x_{j} + b)$$
#
# This now is not called a perceptron anymore, but a *sigmoid neuron* or *logistic neuron*.
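#
# As a quick numerical check, here is a small sketch of the sigmoid itself: it is close to 0 for large negative inputs, close to 1 for large positive inputs, and smooth (hence differentiable) in between, unlike the hard step.
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


for z in [-4.0, -1.0, 0.0, 1.0, 4.0]:
    print(z, sigmoid(z))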
#
# But enough theory, let's do some practice by implementing this sigmoid neuron in TensorFlow!
# In[1]:
# first let's import all that we will need
import tensorflow as tf
from itertools import product
# Oversimplified, tensorflow is at its core a tensor library with automatic differentiation.
# A *tensor* is a sort of vector, in the mathematical sense, where the important part is to define its size (or shape) and type.
# One interesting aspect of a tensor is that the data that will flow through it can be fed after the tensor is created.
#
# With tensors, you can then do any kind of mathematical operation you want.
# Every mathematical operation produces a new tensor, in which it stores its result and which can then be reused to compose more operations.
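#
# For instance (a minimal sketch with made-up tensor names), creating operations only describes the computation; no data flows until we evaluate them in a session later:
a_in = tf.placeholder(tf.float32, [3])  # a tensor of shape [3]; its data will be fed later
b_sum = tf.reduce_sum(a_in * 2.0)       # an operation that produces a new (scalar) tensor
# printing shows the tensors' shapes and dtypes, not values, because nothing has been computed yet
print(a_in, b_sum)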
# In[2]:
# Let's code a neuron that will compute the NAND logical function
# This neuron will have two inputs, so a tensor x of size 2
def neuron(input_size, weights, bias):
    # tf.placeholder is how we define an input tensor. Later, we will be able to feed our data into this tensor
    x = tf.placeholder(tf.float32, [input_size])
    # It will also have two weights
    w = tf.Variable(weights, dtype=tf.float32)
    # And a bias
    b = tf.Variable(bias, dtype=tf.float32)
    # we use tf.Variable to tell tensorflow that these tensors will be learned during training
    # Now let's implement the weighted sum + bias
    weighted_sum = tf.reduce_sum(x * w) + b
    # Next, we apply the sigmoid
    neuron_output = tf.sigmoid(weighted_sum)
    return neuron_output, [x, w, b, weighted_sum]


neuron_output, [x, w, b, weighted_sum] = neuron(2, [-2, -2], 3)
print("x:", x)
print("w:", w)
print("b:", b)
print("weighted_sum:", weighted_sum)
print("neuron_output:", neuron_output)
# In[28]:
# Now let's do some computation on the GPU!
# By default, tensorflow allocates all the memory of the GPU.
# For the purposes of this tutorial, we will only allocate what we need so that everybody can use the GPU.
# Keep in mind that in real use, it is better to let tensorflow allocate all the memory of a GPU.
config = tf.ConfigProto()
config.gpu_options.allow_growth=True
# a tf.Session object will allow us to run computations on the GPU by specifying both a tensor to evaluate and the data to evaluate it with
sess = tf.Session(config=config)
# The next two lines will initialize all variables, like weights and biases
init = tf.global_variables_initializer()
sess.run(init)
# In[5]:
# We can run computations with sess.run(tensor_to_compute, feed_dict={input_tensor: input_data})
def NAND(x1, x2, neuron_output):
    return sess.run(neuron_output, feed_dict={x: [x1, x2]})


for x1, x2 in product([0., 1.], repeat=2):
    print(x1, x2, NAND(x1, x2, neuron_output))
# Well, it's not exactly the binary output we wanted: the sigmoid only approximates the step function, so we get values close to 0 or 1 (about 0.95, 0.73, 0.73 and 0.27) rather than exact ones. But that's what you get for manually tuning your weights.
#
# Next, we'll get something out of the way: the batch.
# For now, we can only feed one example at a time to our neuron.
# Keep in mind that transferring data to the GPU is very time consuming, and that you can take advantage of the special architecture of the GPU to parallelize the computation of multiple examples.
# So by processing examples in batches, we will be faster than computing each example one at a time.
# In[34]:
# Here, we'll modify our neuron so that it can take a batch of examples as input
def neuron_batch(batch_size, input_size, weights, bias):
    # tf.placeholder is how we define an input tensor. Later, we will be able to feed our data into this tensor
    x = tf.placeholder(tf.float32, [batch_size, input_size])
    # It will also have two weights
    w = tf.Variable(weights, dtype=tf.float32)
    # And a bias
    b = tf.Variable(bias, dtype=tf.float32)
    # we use tf.Variable to tell tensorflow that these tensors will be learned during training
    # Now let's implement the weighted sum + bias, summing over the input dimension only
    weighted_sum = tf.reduce_sum(x * w, axis=1) + b
    # Next, we apply the sigmoid
    neuron_output = tf.sigmoid(weighted_sum)
    return neuron_output, [x, w, b, weighted_sum]


# here we use a fixed batch size of 4
neuron_output, [x, w, b, weighted_sum] = neuron_batch(4, 2, [-2, -2], 3)
# you can also pass None in order to have a variable batch size
neuron_output, [x, w, b, weighted_sum] = neuron_batch(None, 2, [-2, -2], 3)
init = tf.global_variables_initializer()
sess.run(init)
# In[35]:
# so now we can process all examples in one sess.run call
print(sess.run(neuron_output, feed_dict={x: [[x1, x2] for x1, x2 in product([0., 1.], repeat=2)]}))
# ### Summary
#
# A neuron with $k$ inputs is a small decision unit made up of:
#
# * a vector of shape $[k]$ of inputs
# * a vector of shape $[k]$ of weights
# * a bias
# * an activation function
#   * if it is the step function, it's a *Perceptron*, and it's not trainable
#   * if it is the sigmoid function, it's a *sigmoid* or a *logistic* neuron, and it's *trainable*
#
# Processing examples in batches is faster.
# ## Multi-Layered Perceptron (MLP)
#
# To build a more complex model that can do more useful things, we can stack multiple neurons in parallel in order to make a *layer* of neurons.
# Then, when you connect multiple layers one after the other, you obtain what we call a Multi-Layered Perceptron, or MLP.
# WARNING! Keep in mind that, even though the word Perceptron appears in the name, if you want to train this network, you will have to use sigmoid neurons or any other kind of trainable neuron.
#
# ![a 3 layer MLP](http://neuralnetworksanddeeplearning.com/images/tikz1.png)
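#
# As a rough sketch (with names of our choosing), a fully-connected layer is just the batched neuron generalized to several neurons at once: the weights become a matrix and the weighted sum becomes a matrix multiplication. Stacking such layers gives an MLP.


def dense_layer(inputs, input_size, output_size):
    # one column of weights (and one bias) per neuron in the layer
    w_layer = tf.Variable(tf.random_normal([input_size, output_size]))
    b_layer = tf.Variable(tf.zeros([output_size]))
    # [batch, input_size] x [input_size, output_size] -> [batch, output_size]
    return tf.sigmoid(tf.matmul(inputs, w_layer) + b_layer)


# e.g. 2 inputs -> a hidden layer of 3 neurons -> 1 output neuron
mlp_input = tf.placeholder(tf.float32, [None, 2])
hidden = dense_layer(mlp_input, 2, 3)
mlp_output = dense_layer(hidden, 3, 1)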
#
#
#
# Next, let's train our neuron so it will automatically adjust its weights and bias to compute the NAND function.
# The first and mother of all deep learning training algorithms is the gradient descent algorithm.
#
# First, we will need a cost function (also called a loss function or objective function).
# A cost function is a function that tells us how well or how badly we are doing.
# You can see the cost function as a distance computation between the output of our neuron and the real value we wanted our network to output.
# Traditionally, the quadratic cost function is used in gradient descent tutorials because it's super easy to differentiate:
#
# $$
# C(w, b) = {1 \over 2} \|real\_output(x) - neuron\_output(x)\|^{2}
# $$
# In[6]:
# a placeholder for the target values we want the neuron to output
real_output = tf.placeholder(tf.float32)
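# With this placeholder, the quadratic cost above can be written directly as a tensor (a minimal sketch; `quadratic_cost` is our own name), and tensorflow can then differentiate it with respect to w and b for us:
quadratic_cost = tf.reduce_mean(0.5 * tf.square(real_output - neuron_output))
print("quadratic cost tensor:", quadratic_cost)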
# In[30]:
# a quick check, in NumPy, that multiplying a batch of inputs by a weight vector broadcasts as expected
import numpy as np
a = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
b = np.array([-2, -2])
print(a * b)
# In[26]:
# the same broadcasting check with tensorflow tensors
x = tf.placeholder(tf.float32, [4, 2])
w = tf.Variable([-2, -2], dtype=tf.float32)
a = x * w
print(x, w, a)
# In[29]:
# the new variable w must be initialized before we can evaluate a
sess.run(w.initializer)
print(sess.run(a, feed_dict={x: [[0, 0], [0, 1], [1, 0], [1, 1]]}))
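#
# Finally, to close the loop on the gradient descent idea introduced above, here is a minimal, self-contained training sketch (the `nand_*` names, the learning rate and the number of steps are illustrative choices): we build a fresh batched neuron with neutral initial weights and let tf.train.GradientDescentOptimizer adjust its weights and bias to fit the NAND truth table.
nand_output, [nand_x, nand_w, nand_b, _] = neuron_batch(None, 2, [0., 0.], 0.)
nand_target = tf.placeholder(tf.float32, [None])
# quadratic cost between what the neuron outputs and what it should output
nand_cost = tf.reduce_mean(0.5 * tf.square(nand_target - nand_output))
# one training step: compute the gradients of the cost with respect to the variables and move against them
train_step = tf.train.GradientDescentOptimizer(learning_rate=1.0).minimize(nand_cost)

nand_inputs = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]
nand_targets = [1., 1., 1., 0.]

# (re)initialize all variables, then repeatedly apply gradient descent on the whole batch
sess.run(tf.global_variables_initializer())
for step in range(2000):
    _, cost_value = sess.run([train_step, nand_cost],
                             feed_dict={nand_x: nand_inputs, nand_target: nand_targets})

print("final cost:", cost_value)
print("learned weights:", sess.run(nand_w), "learned bias:", sess.run(nand_b))
print("outputs after training:", sess.run(nand_output, feed_dict={nand_x: nand_inputs}))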