Commit bd791bcf authored by CHOI Kwon-Young

convert notebook to python script

# coding: utf-8
# # Introduction to Deep Learning
#
# In this tutorial, we will present some of the basic concepts of what a neuron and a neural network are, and how to train them for tasks such as classification or regression.
#
# Let's dive in.
#
# This tutorial is (almost completely) inspired by this [online book](http://neuralnetworksanddeeplearning.com), which I found to be a gold mine if you want to understand how neural networks work and how they are implemented.
#
# ## Perceptrons and Sigmoid Neuron
#
# The perceptron (or single neuron) is the common ancestor of all deep learning.
#
# A perceptron takes several binary inputs, x1,x2,…, and produces a single binary output:
#
# ![a perceptron](http://neuralnetworksanddeeplearning.com/images/tikz0.png)
#
# The idea of the perceptron is that it will activate itself based on a composition of its inputs and its internal weights.
# In fact, each input $x_{j}$ has a corresponding weight $w_{j}$ that will control how relevant the input is for the decision process.
#
# The most basic way to do this is to use a sum and threshold:
#
# $$
# output =
# \left\{
# \begin{array}{ll}
# 0 & \text{if } \sum_{j}w_{j}x_{j} \leq threshold\\
# 1 & \text{if } \sum_{j}w_{j}x_{j} > threshold
# \end{array}
# \right.
# $$
#
# By changing the value of each weight, you can change how the decision is made.
# In the context of neural networks, this step function (comparing the weighted sum to a threshold) is called an activation function.
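#
# To make the rule above concrete, here is a minimal plain-Python sketch (the helper name `step_perceptron` and the exact weight values are just illustrative) of a perceptron that computes the NAND function with the step activation:
import numpy as np


def step_perceptron(inputs, weights, threshold):
    # fire (output 1) only if the weighted sum exceeds the threshold
    return 1 if np.dot(inputs, weights) > threshold else 0


# weights of -2 and -2 with a threshold of -3 (i.e. a bias of +3, as we will use below) reproduce the NAND truth table
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, step_perceptron([x1, x2], [-2, -2], -3))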
#
# Now, we will have to make some modifications to this perceptron in order for it to be useful.
# Indeed, the secret magic sauce of the training algorithms of deep learning architectures is the ability to construct models that are fully differentiable with respect to their outputs.
# However, you can see here that the step function used above is not differentiable.
#
# First of all, let's simplify the use of a threshold by introducing a bias and recentering the step function around 0:
#
# $$
# output =
# \left\{
# \begin{array}{ll}
# 0 & \text{if } \sum_{j}w_{j}x_{j} + b \leq 0\\
# 1 & \text{if } \sum_{j}w_{j}x_{j} + b > 0
# \end{array}
# \right.
# $$
#
# Next, let's change the activation function to the sigmoid function $\sigma(x) = {1 \over {1 + e^{-x}}}$ in order for it to be differentiable:
#
# $$\sigma(\sum_{j}w_{j}x_{j} + b)$$
#
# This now is not called a perceptron anymore, but a *sigmoid neuron* or *logistic neuron*.
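#
# As a quick numerical check, here is a small sketch of the sigmoid itself: it is close to 0 for large negative inputs, close to 1 for large positive inputs, and smooth (hence differentiable) in between, unlike the hard step.
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


for z in [-4.0, -1.0, 0.0, 1.0, 4.0]:
    print(z, sigmoid(z))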
#
# But enough theory, let's do some practice by implementing this sigmoid neuron in TensorFlow!
# In[1]:
# first let's import all that we will need
import tensorflow as tf
from itertools import product
# Oversimplified, tensorflow is at its core a tensor library with automatic differentiation.
# A *tensor* is a sort of vector, in the mathematical sense, where the important part is to define its size (or shape) and type.
# One interesting aspect of a tensor is that the data that will flow through it can be fed after the tensor is created.
#
# With tensors, you can then do any kind of mathematical operation you want.
# Every mathematical operation produces a new tensor, in which it stores its result and which can then be reused to compose more operations.
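#
# For instance (a minimal sketch with made-up tensor names), creating operations only describes the computation; no data flows until we evaluate them in a session later:
a_in = tf.placeholder(tf.float32, [3])  # a tensor of shape [3]; its data will be fed later
b_sum = tf.reduce_sum(a_in * 2.0)       # an operation that produces a new (scalar) tensor
# printing shows the tensors' shapes and dtypes, not values, because nothing has been computed yet
print(a_in, b_sum)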
# In[2]:
# Let's code a neuron that will compute the NAND logical function
# This neuron will have two inputs, so a tensor x of size 2
def neuron(input_size, weights, bias):
    # tf.placeholder is how we define an input tensor. Later, we will be able to feed our data into this tensor
    x = tf.placeholder(tf.float32, [input_size])
    # It will also have two weights
    w = tf.Variable(weights, dtype=tf.float32)
    # And a bias
    b = tf.Variable(bias, dtype=tf.float32)
    # we use tf.Variable to tell tensorflow that these tensors will be learned during training
    # Now let's implement the weighted sum + bias
    weighted_sum = tf.reduce_sum(x * w) + b
    # Next, we apply the sigmoid
    neuron_output = tf.sigmoid(weighted_sum)
    return neuron_output, [x, w, b, weighted_sum]


neuron_output, [x, w, b, weighted_sum] = neuron(2, [-2, -2], 3)
print("x:", x)
print("w:", w)
print("b:", b)
print("weighted_sum:", weighted_sum)
print("neuron_output:", neuron_output)
# In[28]:
# Now let's do some computation on the GPU!
# By default, tensorflow allocates all the memory of the GPU.
# For the purposes of this tutorial, we will only allocate what we need so that everybody can use the GPU.
# Keep in mind that in real use, it is better to let tensorflow allocate all the memory of a GPU.
config = tf.ConfigProto()
config.gpu_options.allow_growth=True
# a tf.Session object will allow us to run computations on the GPU by specifying both a tensor to evaluate and the data to evaluate it with
sess = tf.Session(config=config)
# The next two lines will initialize all variables, like weights and biases
init = tf.global_variables_initializer()
sess.run(init)
# In[5]:
# We can run computations with sess.run(tensor_to_compute, feed_dict={input_tensor: input_data})
def NAND(x1, x2, neuron_output):
    return sess.run(neuron_output, feed_dict={x: [x1, x2]})


for x1, x2 in product([0., 1.], repeat=2):
    print(x1, x2, NAND(x1, x2, neuron_output))
# Well, it's not exactly the binary output we wanted: the sigmoid only approximates the step function, so we get values close to 0 or 1 (about 0.95, 0.73, 0.73 and 0.27) rather than exact ones. But that's what you get for manually tuning your weights.
#
# Next, we'll get something out of the way: the batch.
# For now, we can only feed one example at a time to our neuron.
# Keep in mind that transferring data to the GPU is very time consuming, and that you can take advantage of the special architecture of the GPU to parallelize the computation of multiple examples.
# So by processing examples in batches, we will be faster than computing each example one at a time.
# In[34]:
# Here, we'll modify our neuron so that it can take a batch of examples as input
def neuron_batch(batch_size, input_size, weights, bias):
    # tf.placeholder is how we define an input tensor. Later, we will be able to feed our data into this tensor
    x = tf.placeholder(tf.float32, [batch_size, input_size])
    # It will also have two weights
    w = tf.Variable(weights, dtype=tf.float32)
    # And a bias
    b = tf.Variable(bias, dtype=tf.float32)
    # we use tf.Variable to tell tensorflow that these tensors will be learned during training
    # Now let's implement the weighted sum + bias, summing over the input dimension only
    weighted_sum = tf.reduce_sum(x * w, axis=1) + b
    # Next, we apply the sigmoid
    neuron_output = tf.sigmoid(weighted_sum)
    return neuron_output, [x, w, b, weighted_sum]


# here we use a fixed batch size of 4
neuron_output, [x, w, b, weighted_sum] = neuron_batch(4, 2, [-2, -2], 3)
# you can also pass None in order to have a variable batch size
neuron_output, [x, w, b, weighted_sum] = neuron_batch(None, 2, [-2, -2], 3)
init = tf.global_variables_initializer()
sess.run(init)
# In[35]:
# so now we can process all examples in one sess.run call
print(sess.run(neuron_output, feed_dict={x: [[x1, x2] for x1, x2 in product([0., 1.], repeat=2)]}))
# ### Summary
#
# A neuron with $k$ inputs is a small decision unit made up of:
#
# * a vector of shape $[k]$ of inputs
# * a vector of shape $[k]$ of weights
# * a bias
# * an activation function
#   * if it is the step function, it's a *Perceptron*, and it's not trainable
#   * if it is the sigmoid function, it's a *sigmoid* or a *logistic* neuron, and it's *trainable*
#
# Processing examples in batches is faster.
# ## Multi-Layered Perceptron (MLP)
#
# To build a more complex model that can do more useful things, we can stack multiple neurons in parallel in order to make a *layer* of neurons.
# Then, when you connect multiple layers one after the other, you obtain what we call a Multi-Layered Perceptron, or MLP.
# WARNING! Keep in mind that, even though the word Perceptron appears in the name, if you want to train this network, you will have to use sigmoid neurons or any other kind of trainable neuron.
#
# ![a 3 layer MLP](http://neuralnetworksanddeeplearning.com/images/tikz1.png)
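#
# As a rough sketch (with names of our choosing), a fully-connected layer is just the batched neuron generalized to several neurons at once: the weights become a matrix and the weighted sum becomes a matrix multiplication. Stacking such layers gives an MLP.


def dense_layer(inputs, input_size, output_size):
    # one column of weights (and one bias) per neuron in the layer
    w_layer = tf.Variable(tf.random_normal([input_size, output_size]))
    b_layer = tf.Variable(tf.zeros([output_size]))
    # [batch, input_size] x [input_size, output_size] -> [batch, output_size]
    return tf.sigmoid(tf.matmul(inputs, w_layer) + b_layer)


# e.g. 2 inputs -> a hidden layer of 3 neurons -> 1 output neuron
mlp_input = tf.placeholder(tf.float32, [None, 2])
hidden = dense_layer(mlp_input, 2, 3)
mlp_output = dense_layer(hidden, 3, 1)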
#
#
#
# Next, let's train our neuron so it will automatically adjust its weights and bias to compute the NAND function.
# The first and mother of all deep learning training algorithms is the gradient descent algorithm.
#
# First, we will need a cost function (also called a loss function or objective function).
# A cost function is a function that tells us how well or how badly we are doing.
# You can see the cost function as a distance computation between the output of our neuron and the real value we wanted our network to output.
# Traditionally, the quadratic cost function is used in gradient descent tutorials because it's super easy to differentiate:
#
# $$
# C(w, b) = {1 \over 2} \|real\_output(x) - neuron\_output(x)\|^{2}
# $$
# In[6]:
# a placeholder for the target values we want the neuron to output
real_output = tf.placeholder(tf.float32)
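# With this placeholder, the quadratic cost above can be written directly as a tensor (a minimal sketch; `quadratic_cost` is our own name), and tensorflow can then differentiate it with respect to w and b for us:
quadratic_cost = tf.reduce_mean(0.5 * tf.square(real_output - neuron_output))
print("quadratic cost tensor:", quadratic_cost)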
# In[30]:
# a quick check, in NumPy, that multiplying a batch of inputs by a weight vector broadcasts as expected
import numpy as np
a = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
b = np.array([-2, -2])
print(a * b)
# In[26]:
# the same broadcasting check with tensorflow tensors
x = tf.placeholder(tf.float32, [4, 2])
w = tf.Variable([-2, -2], dtype=tf.float32)
a = x * w
print(x, w, a)
# In[29]:
# the new variable w must be initialized before we can evaluate a
sess.run(w.initializer)
print(sess.run(a, feed_dict={x: [[0, 0], [0, 1], [1, 0], [1, 1]]}))
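#
# Finally, to close the loop on the gradient descent idea introduced above, here is a minimal, self-contained training sketch (the `nand_*` names, the learning rate and the number of steps are illustrative choices): we build a fresh batched neuron with neutral initial weights and let tf.train.GradientDescentOptimizer adjust its weights and bias to fit the NAND truth table.
nand_output, [nand_x, nand_w, nand_b, _] = neuron_batch(None, 2, [0., 0.], 0.)
nand_target = tf.placeholder(tf.float32, [None])
# quadratic cost between what the neuron outputs and what it should output
nand_cost = tf.reduce_mean(0.5 * tf.square(nand_target - nand_output))
# one training step: compute the gradients of the cost with respect to the variables and move against them
train_step = tf.train.GradientDescentOptimizer(learning_rate=1.0).minimize(nand_cost)

nand_inputs = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]
nand_targets = [1., 1., 1., 0.]

# (re)initialize all variables, then repeatedly apply gradient descent on the whole batch
sess.run(tf.global_variables_initializer())
for step in range(2000):
    _, cost_value = sess.run([train_step, nand_cost],
                             feed_dict={nand_x: nand_inputs, nand_target: nand_targets})

print("final cost:", cost_value)
print("learned weights:", sess.run(nand_w), "learned bias:", sess.run(nand_b))
print("outputs after training:", sess.run(nand_output, feed_dict={nand_x: nand_inputs}))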