Mentions légales du service

Skip to content
Snippets Groups Projects
Commit 8a6d658d authored by hhakim's avatar hhakim
Browse files

Update notebook/livescript (use_csr parameter replaced by factor_format).

parent 7ebdc735
Branches
Tags
No related merge requests found
%% Cell type:markdown id:b563225b tags: %% Cell type:markdown id: tags:
# Using the PALM4MSA-MHTP Algorithm # Using the PALM4MSA-MHTP Algorithm
In this notebook we shall see how to use the PALM4MSA-MHTP algorithm. A notebook has already been written on the Hierarchical PALM4MSA algorithm and its wrappers and is a prerequisite to the reading of this notebook. In this notebook we shall see how to use the PALM4MSA-MHTP algorithm. A notebook has already been written on the Hierarchical PALM4MSA algorithm and its wrappers and is a prerequisite to the reading of this notebook.
The PALM4MSA-MHTP is a variant of PALM4MSA in which intervenes the Multilinear Hard Tresholdhing Pursuit algorithm (MHTP). The PALM4MSA-MHTP is a variant of PALM4MSA in which intervenes the Multilinear Hard Tresholdhing Pursuit algorithm (MHTP).
The interest of this variant is to avoid the situation in which PALM4MSA tends to stuck on some matrix supports without any way out. MHTP allows to explore the support more freely and hopefully find a more accurate factorization at the cost of just a few more dozens iterations of the gradient descent algorithm. The interest of this variant is to avoid the situation in which PALM4MSA tends to stuck on some matrix supports without any way out. MHTP allows to explore the support more freely and hopefully find a more accurate factorization at the cost of just a few more dozens iterations of the gradient descent algorithm.
For more information on the theory, you can read the following paper in which is treated the particular case of the BHTP (Bilinear HTP, that is running the MHTP on only two factors). For more information on the theory, you can read the following paper in which is treated the particular case of the BHTP (Bilinear HTP, that is running the MHTP on only two factors).
<a name="[1]">[1]</a> Quoc-Tung Le, Rémi Gribonval. Structured Support Exploration For Multilayer Sparse Matrix Fac- torization. ICASSP 2021 - IEEE International Conference on Acoustics, Speech and Signal Processing, Jun 2021, Toronto, Ontario, Canada. pp.1-5. [hal-03132013](https://hal.inria.fr/hal-03132013/document). <a name="[1]">[1]</a> Quoc-Tung Le, Rémi Gribonval. Structured Support Exploration For Multilayer Sparse Matrix Fac- torization. ICASSP 2021 - IEEE International Conference on Acoustics, Speech and Signal Processing, Jun 2021, Toronto, Ontario, Canada. pp.1-5. [hal-03132013](https://hal.inria.fr/hal-03132013/document).
## Configuring and Running PALM4MSA-MHTP ## Configuring and Running PALM4MSA-MHTP
This variant works very similarly to a classic run of PALM4MSA, that is with at least the same set of parameters. The main difference is that periodically (in term of PALM4MSA iterations) the MHTP algorithm is launched to renew each layer of the Faust being refined. This variant works very similarly to a classic run of PALM4MSA, that is with at least the same set of parameters. The main difference is that periodically (in term of PALM4MSA iterations) the MHTP algorithm is launched to renew each layer of the Faust being refined.
Hence running the PALM4MSA-MHTP needs two sets of parameters: [ParamsPalm4MSA](https://faustgrp.gitlabpages.inria.fr/faust/last-doc/html/classpyfaust_1_1factparams_1_1ParamsPalm4MSA.html) and [MHTPParams](https://faustgrp.gitlabpages.inria.fr/faust/last-doc/html/classpyfaust_1_1factparams_1_1MHTPParams.html) objects. The former should not be really new if you are used to PALM4MSA, the latter is dedicated to the configuartion of the MHTP part of PALM4MSA-MHTP. Hence running the PALM4MSA-MHTP needs two sets of parameters: [ParamsPalm4MSA](https://faustgrp.gitlabpages.inria.fr/faust/last-doc/html/classpyfaust_1_1factparams_1_1ParamsPalm4MSA.html) and [MHTPParams](https://faustgrp.gitlabpages.inria.fr/faust/last-doc/html/classpyfaust_1_1factparams_1_1MHTPParams.html) objects. The former should not be really new if you are used to PALM4MSA, the latter is dedicated to the configuartion of the MHTP part of PALM4MSA-MHTP.
The arguments to configure ``MHTPParams`` are basically: The arguments to configure ``MHTPParams`` are basically:
- ``num_its``: the number of iterations MHTP runs on each layer of the Faust. Remember that this number of iterations is for each factor. If you have two factors the overall number of iterations is ``2 x num_its`` (exactly as it is for PALM4MSA). - ``num_its``: the number of iterations MHTP runs on each layer of the Faust. Remember that this number of iterations is for each factor. If you have two factors the overall number of iterations is ``2 x num_its`` (exactly as it is for PALM4MSA).
- ``constant_step_size`` and ``step_size``: that determines if the MHTP gradient descent will be ran according to a constant step size, and in that case how long is the step size. By default, the step size is not constant and recomputed dynamically with the Lipschitz coefficient as in PALM4MSA. In most case, it is recommended to not use a constant step size to achieve a better loss function. - ``constant_step_size`` and ``step_size``: that determines if the MHTP gradient descent will be ran according to a constant step size, and in that case how long is the step size. By default, the step size is not constant and recomputed dynamically with the Lipschitz coefficient as in PALM4MSA. In most case, it is recommended to not use a constant step size to achieve a better loss function.
- ``palm4msa_period``: which governs how many times to evenly run the MHTP algorithm inside PALM4MSA itself. By default, the value is 50. It means that for example if PALM4MSA is running for 200 iterations, MHTP will run 4 times: at iterations 0, 49, 99, 149 and 199 of PALM4MSA. Every time it runs MHTP will run for ``num_its`` iterations. - ``palm4msa_period``: which governs how many times to evenly run the MHTP algorithm inside PALM4MSA itself. By default, the value is 50. It means that for example if PALM4MSA is running for 200 iterations, MHTP will run 4 times: at iterations 0, 49, 99, 149 and 199 of PALM4MSA. Every time it runs MHTP will run for ``num_its`` iterations.
- ``updating_lambda``: this boolean when set to ``True`` allows to update the scale factor of the Faust (the same one that is used in PALM4MSA) in the end of each iteration of MHTP. - ``updating_lambda``: this boolean when set to ``True`` allows to update the scale factor of the Faust (the same one that is used in PALM4MSA) in the end of each iteration of MHTP.
So let's run PALM4MSA-MHTP on a small example: we propose to factorize a 500x32 matrix into two factors. So let's run PALM4MSA-MHTP on a small example: we propose to factorize a 500x32 matrix into two factors.
**First** we configure PALM4MSA as usual: **First** we configure PALM4MSA as usual:
- The number of iterations of PALM4SA with the StoppingCriterion (here 200 iterations). - The number of iterations of PALM4SA with the StoppingCriterion (here 200 iterations).
- Then we define the constraints / projectors to use, here the SPLIN projector for the first factor of size 500x32 into which we want to count 5 nonzeros per row and the NORMCOL projector for the second factor in which each column must be normalized. - Then we define the constraints / projectors to use, here the SPLIN projector for the first factor of size 500x32 into which we want to count 5 nonzeros per row and the NORMCOL projector for the second factor in which each column must be normalized.
%% Cell type:code id:4c5b5fc5 tags: %% Cell type:code id: tags:
``` python ``` python
from pyfaust.factparams import ParamsPalm4MSA, StoppingCriterion from pyfaust.factparams import ParamsPalm4MSA, StoppingCriterion
from pyfaust.proj import splin, normcol from pyfaust.proj import splin, normcol
projs = [ splin((500,32), 5), normcol((32,32), 1.0)] projs = [ splin((500,32), 5), normcol((32,32), 1.0)]
stop_crit = StoppingCriterion(num_its=200) stop_crit = StoppingCriterion(num_its=200)
param = ParamsPalm4MSA(projs, stop_crit) param = ParamsPalm4MSA(projs, stop_crit)
``` ```
%% Cell type:markdown id:bee4a03e tags: %% Cell type:markdown id: tags:
**Second** we define the ``MHTPParams`` structure to configure the MHTP pass of PALM4MSA-MHTP **Second** we define the ``MHTPParams`` structure to configure the MHTP pass of PALM4MSA-MHTP
Hence we define the arguments described above: ``num_its``, etc. We let all of them to their default values so there is not much to do. Hence we define the arguments described above: ``num_its``, etc. We let all of them to their default values so there is not much to do.
%% Cell type:code id:a15af4fd tags: %% Cell type:code id: tags:
``` python ``` python
from pyfaust.factparams import MHTPParams from pyfaust.factparams import MHTPParams
mhtp_param = MHTPParams() mhtp_param = MHTPParams()
print(mhtp_param) print(mhtp_param)
``` ```
%% Cell type:markdown id:82a12c8a tags: %% Cell type:markdown id: tags:
It's now time to run the PALM4MSA-MHTP algorithm passing the two structures of parameters. Before we generate a random matrix ``M`` to factorize. It's now time to run the PALM4MSA-MHTP algorithm passing the two structures of parameters. Before we generate a random matrix ``M`` to factorize.
%% Cell type:code id:f86a3391 tags: %% Cell type:code id: tags:
``` python ``` python
import numpy as np import numpy as np
from pyfaust.fact import palm4msa_mhtp from pyfaust.fact import palm4msa_mhtp
M = np.random.rand(500, 32) M = np.random.rand(500, 32)
F = palm4msa_mhtp(M, param, mhtp_param) F = palm4msa_mhtp(M, param, mhtp_param)
F F
``` ```
%% Cell type:markdown id:cf9f9c8e tags: %% Cell type:markdown id: tags:
As you see it's pretty similar to running PALM4MSA, which we could have done with the following code. As you see it's pretty similar to running PALM4MSA, which we could have done with the following code.
%% Cell type:code id:11020a57 tags: %% Cell type:code id: tags:
``` python ``` python
from pyfaust.fact import palm4msa from pyfaust.fact import palm4msa
G = palm4msa(M, param) G = palm4msa(M, param)
G G
``` ```
%% Cell type:markdown id:2cc02249 tags: %% Cell type:markdown id: tags:
We can verify that the results are however not the same: We can verify that the results are however not the same:
%% Cell type:code id:db18bb81 tags: %% Cell type:code id: tags:
``` python ``` python
print("PALM4MSA-MHTP error:", (F-M).norm()/np.linalg.norm(M)) print("PALM4MSA-MHTP error:", (F-M).norm()/np.linalg.norm(M))
print("PALM4MSA error", (G-M).norm()/np.linalg.norm(M)) print("PALM4MSA error", (G-M).norm()/np.linalg.norm(M))
``` ```
%% Cell type:markdown id:0c0b1785 tags: %% Cell type:markdown id: tags:
They are very close though! In the next part of this notebook we'll demonstrate how PALM4MSA-MHTP can really enhance the accuracy of the Faust approximate and will do that on the MEG matrix (this matrix is also discussed and factorized in <a href="#[1]">[1]</a>). They are very close though! In the next part of this notebook we'll demonstrate how PALM4MSA-MHTP can really enhance the accuracy of the Faust approximate and will do that on the MEG matrix (this matrix is also discussed and factorized in <a href="#[1]">[1]</a>).
%% Cell type:markdown id:5c9d7a9d tags: %% Cell type:markdown id: tags:
## Factorizing the MEG matrix using the PALM4MSA-MHTP algorithm ## Factorizing the MEG matrix using the PALM4MSA-MHTP algorithm
%% Cell type:markdown id:36f5e20f tags: %% Cell type:markdown id: tags:
The MEG (for magnetoencephalography) matrix is also used in <a href="#[1]">[1]</a> to compare PALM4MSA and PALM4MSA-MHTP performance. The MEG (for magnetoencephalography) matrix is also used in <a href="#[1]">[1]</a> to compare PALM4MSA and PALM4MSA-MHTP performance.
The goal is to factorize the MEG matrix as $M_{MEG} \approx A \times B$ with $M_{MEG} \in \mathbb{R}^{8293 \times 204}, A \in \mathbb{R}^{8193 \times 204}$ and $B \in \mathbb{R}^{204 \times 204}$. A and B are subject to sparsity constraints. Here we'll test only one sparsity configuration of the two factors ($k_0$ = 100 and $k_1 = 25$ being respectively the per-row number of nonzeros of A and B). The goal is to factorize the MEG matrix as $M_{MEG} \approx A \times B$ with $M_{MEG} \in \mathbb{R}^{8293 \times 204}, A \in \mathbb{R}^{8193 \times 204}$ and $B \in \mathbb{R}^{204 \times 204}$. A and B are subject to sparsity constraints. Here we'll test only one sparsity configuration of the two factors ($k_0$ = 100 and $k_1 = 25$ being respectively the per-row number of nonzeros of A and B).
Let's load the MEG matrix which is embedded in FAµST data package (which should be downloaded automatically). Let's load the MEG matrix which is embedded in FAµST data package (which should be downloaded automatically).
%% Cell type:code id:6b08779b tags: %% Cell type:code id: tags:
``` python ``` python
from pyfaust.demo import get_data_dirpath from pyfaust.demo import get_data_dirpath
from scipy.io import loadmat from scipy.io import loadmat
MEG = loadmat(get_data_dirpath()+"/matrix_MEG.mat")['matrix'] MEG = loadmat(get_data_dirpath()+"/matrix_MEG.mat")['matrix']
MEG.shape MEG.shape
``` ```
%% Cell type:markdown id:fb7144a5 tags: %% Cell type:markdown id: tags:
Going ahead we set the PALM4MSA parameters: Going ahead we set the PALM4MSA parameters:
%% Cell type:code id:58b58ce7 tags: %% Cell type:code id: tags:
``` python ``` python
from pyfaust.proj import * from pyfaust.proj import *
from pyfaust.factparams import ParamsPalm4MSA from pyfaust.factparams import ParamsPalm4MSA
k0 = 100 k0 = 100
k1 = 25 k1 = 25
# the relevant projectors for our sparsity constraints # the relevant projectors for our sparsity constraints
projs = [splin((8193, 204), k0), splin((204, 204), k1)] projs = [splin((8193, 204), k0), splin((204, 204), k1)]
# the number of iterations of PALM4MSA # the number of iterations of PALM4MSA
stop_crit = StoppingCriterion(num_its=2000) stop_crit = StoppingCriterion(num_its=2000)
param = ParamsPalm4MSA(projs, stop_crit, is_verbose=True) param = ParamsPalm4MSA(projs, stop_crit, is_verbose=True)
param.use_csr = False # for not using sparse matrices during the algorithm param.factor_format = 'dense' # working with dense matrices
``` ```
%% Cell type:markdown id:2cd9365d tags: %% Cell type:markdown id: tags:
It remains the ``MHTPParams`` configuration (it's easy, we use the default parameters) : It remains the ``MHTPParams`` configuration (it's easy, we use the default parameters) :
%% Cell type:code id:d4540d02 tags: %% Cell type:code id: tags:
``` python ``` python
from pyfaust.factparams import MHTPParams from pyfaust.factparams import MHTPParams
mhtp_p = MHTPParams() mhtp_p = MHTPParams()
``` ```
%% Cell type:markdown id:183b7428 tags: %% Cell type:markdown id: tags:
Now we are able to launch PALM4MSA and PALM4MSA-MHTP and compare the errors: the computation takes some time, it can last about 30 minutes. Now we are able to launch PALM4MSA and PALM4MSA-MHTP and compare the errors: the computation takes some time, it can last about 30 minutes.
%% Cell type:code id:f6bec34c tags: %% Cell type:code id: tags:
``` python ``` python
from pyfaust.fact import palm4msa, palm4msa_mhtp from pyfaust.fact import palm4msa, palm4msa_mhtp
F_palm4msa = palm4msa(MEG, param) F_palm4msa = palm4msa(MEG, param)
F_mhtp = palm4msa_mhtp(MEG, param, mhtp_p) F_mhtp = palm4msa_mhtp(MEG, param, mhtp_p)
err_palm4msa = (F_palm4msa-MEG).norm()/np.linalg.norm(MEG) err_palm4msa = (F_palm4msa-MEG).norm()/np.linalg.norm(MEG)
err_palm4msa_mhtp = (F_mhtp-MEG).norm()/np.linalg.norm(MEG) err_palm4msa_mhtp = (F_mhtp-MEG).norm()/np.linalg.norm(MEG)
print("PALM4MSA error:", err_palm4msa) print("PALM4MSA error:", err_palm4msa)
print("PALM4MSA-MHTP error:", err_palm4msa_mhtp) print("PALM4MSA-MHTP error:", err_palm4msa_mhtp)
``` ```
%% Cell type:markdown id:7957bf65 tags: %% Cell type:markdown id: tags:
As you see the MHTP variant is twice accurate than PALM4MSA on this configuration. As you see the MHTP variant is twice accurate than PALM4MSA on this configuration.
## Using the Hierarchical PALM4MSA-MHTP algorithm ## Using the Hierarchical PALM4MSA-MHTP algorithm
Exactly the same way you can use the hierarchical factorization with PALM4MSA, it is possible to use the function ``pyfaust.fact.hierachical_mhtp`` to run a hierarchical factorization based on PALM4MSA-MHTP instead of simply PALM4MSA. Exactly the same way you can use the hierarchical factorization with PALM4MSA, it is possible to use the function ``pyfaust.fact.hierachical_mhtp`` to run a hierarchical factorization based on PALM4MSA-MHTP instead of simply PALM4MSA.
The launch of the algorithm function is very similar, you just need to add a ``MHTPParams`` instance to the argument list. The launch of the algorithm function is very similar, you just need to add a ``MHTPParams`` instance to the argument list.
Let's show how with this simple example: Let's show how with this simple example:
%% Cell type:code id:e9f81f54 tags: %% Cell type:code id: tags:
``` python ``` python
from pyfaust.fact import hierarchical_mhtp from pyfaust.fact import hierarchical_mhtp
from pyfaust.factparams import ParamsHierarchical, StoppingCriterion from pyfaust.factparams import ParamsHierarchical, StoppingCriterion
from pyfaust.factparams import MHTPParams from pyfaust.factparams import MHTPParams
from pyfaust.proj import sp, normcol, splin from pyfaust.proj import sp, normcol, splin
import numpy as np import numpy as np
M = np.random.rand(500, 32) M = np.random.rand(500, 32)
fact_cons = [splin((500, 32), 5), sp((32,32), 96), sp((32,32), 96)] fact_cons = [splin((500, 32), 5), sp((32,32), 96), sp((32,32), 96)]
res_cons = [normcol((32,32), 1), sp((32,32), 666), sp((32,32), 333)] res_cons = [normcol((32,32), 1), sp((32,32), 666), sp((32,32), 333)]
stop_crit1 = StoppingCriterion(num_its=200) stop_crit1 = StoppingCriterion(num_its=200)
stop_crit2 = StoppingCriterion(num_its=200) stop_crit2 = StoppingCriterion(num_its=200)
# 50 iterations of MHTP will run every 100 iterations of PALM4MSA (each time PALM4MSA is called by the hierarchical algorithm) # 50 iterations of MHTP will run every 100 iterations of PALM4MSA (each time PALM4MSA is called by the hierarchical algorithm)
mhtp_param = MHTPParams(num_its=50, palm4msa_period=100) mhtp_param = MHTPParams(num_its=50, palm4msa_period=100)
param = ParamsHierarchical(fact_cons, res_cons, stop_crit1, stop_crit2) param = ParamsHierarchical(fact_cons, res_cons, stop_crit1, stop_crit2)
param.is_verbose = True param.is_verbose = True
F = hierarchical_mhtp(M, param, mhtp_param) F = hierarchical_mhtp(M, param, mhtp_param)
F F
``` ```
%% Cell type:markdown id:6d7d3188 tags: %% Cell type:markdown id: tags:
This notebook is ending here. Please note that although the article <a href="#[1]">[1]</a> tackles the optimization problem of approximately factorizing a matrix in two sparse factors with the Bilinear Hard Tresholding Pursuit (BHTP) algorithm, the MHTP is a generalization to N factors that needs further experiences to be mature. Hence the function palm4msa_mhtp and moreover the function hierarchical_mhtp should be considered as experimental code and might evolve significantly in the future. This notebook is ending here. Please note that although the article <a href="#[1]">[1]</a> tackles the optimization problem of approximately factorizing a matrix in two sparse factors with the Bilinear Hard Tresholding Pursuit (BHTP) algorithm, the MHTP is a generalization to N factors that needs further experiences to be mature. Hence the function palm4msa_mhtp and moreover the function hierarchical_mhtp should be considered as experimental code and might evolve significantly in the future.
%% Cell type:markdown id:002b4460 tags: %% Cell type:markdown id: tags:
Thanks for reading this notebook! Many other are available at [faust.inria.fr](https://faust.inria.fr). Thanks for reading this notebook! Many other are available at [faust.inria.fr](https://faust.inria.fr).
**Note:** this notebook was executed using the following pyfaust version: **Note:** this notebook was executed using the following pyfaust version:
%% Cell type:code id:8afe1eff tags: %% Cell type:code id: tags:
``` python ``` python
import pyfaust import pyfaust
pyfaust.version() pyfaust.version()
``` ```
......
No preview for this file type
...@@ -361,7 +361,7 @@ First please copy the following function in the appropriate filename ``factorize ...@@ -361,7 +361,7 @@ First please copy the following function in the appropriate filename ``factorize
s = 8; s = 8;
tic tic
p = ParamsHierarchicalRectMat.createParams(MEG, {'rectmat', num_facts, k, s}); p = ParamsHierarchicalRectMat.createParams(MEG, {'rectmat', num_facts, k, s});
p.use_csr = false; p.factor_format = 'dense';
MEG16 = hierarchical(MEG, p, 'backend', 2020, 'gpu', strcmp('dev', 'gpu')); MEG16 = hierarchical(MEG, p, 'backend', 2020, 'gpu', strcmp('dev', 'gpu'));
total_time = toc; total_time = toc;
err = norm(MEG16-MEG, 'fro')/norm(MEG, 'fro'); err = norm(MEG16-MEG, 'fro')/norm(MEG, 'fro');
......
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
# Using The GPU FAµST API # Using The GPU FAµST API
In this notebook we'll see quickly how to leverage the GPU computing power with pyfaust. In this notebook we'll see quickly how to leverage the GPU computing power with pyfaust.
Since pyfaust 2.9.0 the API has been modified to make the GPU available directly from the python wrapper. Since pyfaust 2.9.0 the API has been modified to make the GPU available directly from the python wrapper.
Indeed, an independent GPU module (aka ``gpu_mod``) has been developed for this purpose. Indeed, an independent GPU module (aka ``gpu_mod``) has been developed for this purpose.
The first question you might ask is: does it work on my computer? Here is the answer: the loading of this module is quite transparent, if an NVIDIA GPU is available and CUDA is properly installed on your system, you have normally nothing to do except installing pyfaust to get the GPU implementations at your fingertips. We'll see at the end of this notebook how to load manually the module and how to get further information in case of an error. The first question you might ask is: does it work on my computer? Here is the answer: the loading of this module is quite transparent, if an NVIDIA GPU is available and CUDA is properly installed on your system, you have normally nothing to do except installing pyfaust to get the GPU implementations at your fingertips. We'll see at the end of this notebook how to load manually the module and how to get further information in case of an error.
It is worthy to note two drawbacks about the pyfaust GPU support: It is worthy to note two drawbacks about the pyfaust GPU support:
- Mac OS X is not supported because NVIDIA has stopped to support this OS. - Mac OS X is not supported because NVIDIA has stopped to support this OS.
- On Windows and Linux, the pyfaust GPU support is currently limited to CUDA 9.2 version. - On Windows and Linux, the pyfaust GPU support is currently limited to CUDA 9.2 version.
In addition to these drawbacks, please notice that the GPU module support is still considered in beta status as the code is relatively young and still evolving. However the API shouldn't evolve that much in a near future. In addition to these drawbacks, please notice that the GPU module support is still considered in beta status as the code is relatively young and still evolving. However the API shouldn't evolve that much in a near future.
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
### Creating a GPU Faust object ### Creating a GPU Faust object
Let's start with some basic Faust creations on the GPU. Almost all the ways of creating a Faust object in CPU memory are also available to create a GPU Faust. Let's start with some basic Faust creations on the GPU. Almost all the ways of creating a Faust object in CPU memory are also available to create a GPU Faust.
First of all, creating a Faust using the constructor works seamlessly on GPU, the only need is to specify the ``dev`` keyword argument, as follows: First of all, creating a Faust using the constructor works seamlessly on GPU, the only need is to specify the ``dev`` keyword argument, as follows:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
from pyfaust import Faust from pyfaust import Faust
from numpy.random import rand from numpy.random import rand
M, N = rand(10,10), rand(10,15) M, N = rand(10,10), rand(10,15)
gpuF = Faust([M, N], dev='gpu') gpuF = Faust([M, N], dev='gpu')
gpuF gpuF
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
It's clearly indicated in the output that the Faust object is instantiated in GPU memory (the N and M numpy arrays are copied from the CPU to the GPU memory). However it's also possible to check this programmatically: It's clearly indicated in the output that the Faust object is instantiated in GPU memory (the N and M numpy arrays are copied from the CPU to the GPU memory). However it's also possible to check this programmatically:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
gpuF.device gpuF.device
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
While for a CPU Faust you'll get: While for a CPU Faust you'll get:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
Faust([M, N], dev='cpu').device Faust([M, N], dev='cpu').device
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
In ``gpuF`` the factors are dense matrices but it's totally possible to instantiate sparse matrices on the GPU as you can do on CPU side. In ``gpuF`` the factors are dense matrices but it's totally possible to instantiate sparse matrices on the GPU as you can do on CPU side.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
from pyfaust import Faust from pyfaust import Faust
from scipy.sparse import random, csr_matrix from scipy.sparse import random, csr_matrix
S, T = csr_matrix(random(10, 15, density=0.25)), csr_matrix(random(15, 10, density=0.05)) S, T = csr_matrix(random(10, 15, density=0.25)), csr_matrix(random(15, 10, density=0.05))
sparse_gpuF = Faust([S, T], dev='gpu') sparse_gpuF = Faust([S, T], dev='gpu')
sparse_gpuF sparse_gpuF
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
You can also create a GPU Faust by explicitly copying a CPU Faust to the GPU memory. Actually, at anytime you can copy a CPU Faust to GPU and conversely. The ``clone()`` member function is here precisely for this purpose. Below we copy ``gpuF`` to CPU and back again to GPU in the new Faust ``gpuF2``. You can also create a GPU Faust by explicitly copying a CPU Faust to the GPU memory. Actually, at anytime you can copy a CPU Faust to GPU and conversely. The ``clone()`` member function is here precisely for this purpose. Below we copy ``gpuF`` to CPU and back again to GPU in the new Faust ``gpuF2``.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
cpuF = gpuF.clone('cpu') cpuF = gpuF.clone('cpu')
gpuF2 = cpuF.clone('gpu') gpuF2 = cpuF.clone('gpu')
gpuF2 gpuF2
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
## Generating a GPU Faust ## Generating a GPU Faust
Many of the functions for generating a Faust object on CPU are available on GPU too. It is always the same, you precise the ``dev`` argument by assigning the ``'gpu'`` value and you'll get a GPU Faust instead of a CPU Faust. Many of the functions for generating a Faust object on CPU are available on GPU too. It is always the same, you precise the ``dev`` argument by assigning the ``'gpu'`` value and you'll get a GPU Faust instead of a CPU Faust.
For example, the code below will successively create a random GPU Faust, a Hadamard transform GPU Faust, identity GPU Faust and finally a DFT GPU Faust. For example, the code below will successively create a random GPU Faust, a Hadamard transform GPU Faust, identity GPU Faust and finally a DFT GPU Faust.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
from pyfaust import rand as frand, eye as feye, wht, dft from pyfaust import rand as frand, eye as feye, wht, dft
print("Random GPU Faust:", frand(10,10, num_factors=11, dev='gpu')) print("Random GPU Faust:", frand(10,10, num_factors=11, dev='gpu'))
print("Hadamard GPU Faust:", wht(32, dev='gpu')) print("Hadamard GPU Faust:", wht(32, dev='gpu'))
print("Identity GPU Faust:", feye(16, dev='gpu')) print("Identity GPU Faust:", feye(16, dev='gpu'))
print("DFT GPU Faust:", dft(32, dev='gpu')) print("DFT GPU Faust:", dft(32, dev='gpu'))
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
### Manipulating GPU Fausts and CPU interoperability ### Manipulating GPU Fausts and CPU interoperability
Once you've created GPU Faust objects, you can perform operations on them staying in GPU world (that is, with no array transfer to CPU memory). That's of course not always possible. Once you've created GPU Faust objects, you can perform operations on them staying in GPU world (that is, with no array transfer to CPU memory). That's of course not always possible.
For example, let's consider Faust-scalar multiplication and Faust-matrix product. In the first case the scalar is copied to the GPU memory and likewise in the second case the matrix is copied from CPU to GPU in order to proceed to the computation. However in both cases the Faust factors stay into GPU memory and don't move during the computation. For example, let's consider Faust-scalar multiplication and Faust-matrix product. In the first case the scalar is copied to the GPU memory and likewise in the second case the matrix is copied from CPU to GPU in order to proceed to the computation. However in both cases the Faust factors stay into GPU memory and don't move during the computation.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
# Faust-scalar multiplication # Faust-scalar multiplication
2*gpuF 2*gpuF
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
As you see the first factor's address has changed in the result compared to what it was in ``gpuF``. Indeed, when you make a scalar multiplication only one factor is multiplied, the others don't change, they are shared between the Faust being multiplied and the resulting Faust. This is an optimization and to go further in this direction the factor chosen to be multiplied is the smallest in memory (not necessarily the first one). As you see the first factor's address has changed in the result compared to what it was in ``gpuF``. Indeed, when you make a scalar multiplication only one factor is multiplied, the others don't change, they are shared between the Faust being multiplied and the resulting Faust. This is an optimization and to go further in this direction the factor chosen to be multiplied is the smallest in memory (not necessarily the first one).
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
# Faust-matrix product (the matrix is copied to GPU # Faust-matrix product (the matrix is copied to GPU
# then the multiplication is performed on GPU) # then the multiplication is performed on GPU)
gpuF@rand(gpuF.shape[1],15) gpuF@rand(gpuF.shape[1],15)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
On the contrary, and that matters for optimization, there is no CPU-GPU transfer at all when you create another GPU Faust named for example ``gpuF2`` on the GPU and decide to multiply the two of them like this: On the contrary, and that matters for optimization, there is no CPU-GPU transfer at all when you create another GPU Faust named for example ``gpuF2`` on the GPU and decide to multiply the two of them like this:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
from pyfaust import rand as frand from pyfaust import rand as frand
gpuF2 = frand(gpuF.shape[1],18, dev='gpu') gpuF2 = frand(gpuF.shape[1],18, dev='gpu')
gpuF3 = gpuF@gpuF2 gpuF3 = gpuF@gpuF2
gpuF3 gpuF3
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Besides, it's important to note that ``gpuF3`` factors are not duplicated in memory because they already exist for ``gpuF`` and ``gpuF2``, that's an extra optimization: ``gpuF3`` is just a memory view of the factors of ``gpuF`` and ``gpuF2`` (the same GPU arrays are shared between ``Faust`` objects). That works pretty well the same for CPU ``Faust`` objects. Besides, it's important to note that ``gpuF3`` factors are not duplicated in memory because they already exist for ``gpuF`` and ``gpuF2``, that's an extra optimization: ``gpuF3`` is just a memory view of the factors of ``gpuF`` and ``gpuF2`` (the same GPU arrays are shared between ``Faust`` objects). That works pretty well the same for CPU ``Faust`` objects.
Finally, please notice that CPU Faust objects are not directly interoperable with GPU Fausts objects. You can try, it'll end up with an error. Finally, please notice that CPU Faust objects are not directly interoperable with GPU Fausts objects. You can try, it'll end up with an error.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
cpuF = frand(5,5,5, dev='cpu') cpuF = frand(5,5,5, dev='cpu')
gpuF = frand(5,5,6, dev='gpu') gpuF = frand(5,5,6, dev='gpu')
try: try:
print("A first try to multiply a CPU Faust with a GPU one...") print("A first try to multiply a CPU Faust with a GPU one...")
cpuF@gpuF cpuF@gpuF
except: except:
print("it doesn't work, you must either convert cpuF to a GPU Faust or gpuF to a CPU Faust before multiplying.") print("it doesn't work, you must either convert cpuF to a GPU Faust or gpuF to a CPU Faust before multiplying.")
print("A second try using conversion as needed...") print("A second try using conversion as needed...")
print(cpuF.clone('gpu')@gpuF) # this is what you should do print(cpuF.clone('gpu')@gpuF) # this is what you should do
print("Now it works!") print("Now it works!")
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
### Benchmarking your GPU with pyfaust! ### Benchmarking your GPU with pyfaust!
Of course when we run some code on GPU rather than on CPU, it is clearly to enhance the performances. So let's try your GPU and find out if it is worth it or not compared to your CPU. Of course when we run some code on GPU rather than on CPU, it is clearly to enhance the performances. So let's try your GPU and find out if it is worth it or not compared to your CPU.
First, measure how much time it takes on CPU to compute a Faust norm and the dense array corresponding to the product of its factors: First, measure how much time it takes on CPU to compute a Faust norm and the dense array corresponding to the product of its factors:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
from pyfaust import rand as frand from pyfaust import rand as frand
cpuF = frand(1024, 1024, num_factors=10, fac_type='dense') cpuF = frand(1024, 1024, num_factors=10, fac_type='dense')
%timeit cpuF.norm(2) %timeit cpuF.norm(2)
%timeit cpuF.toarray() %timeit cpuF.toarray()
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Now let's make some GPU heat with norms and matrix products! Now let's make some GPU heat with norms and matrix products!
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
gpuF = cpuF.clone(dev='gpu') gpuF = cpuF.clone(dev='gpu')
%timeit gpuF.norm(2) %timeit gpuF.norm(2)
%timeit gpuF.toarray() %timeit gpuF.toarray()
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Of course not all GPUs are equal, below are the results I got using a Tesla V100: Of course not all GPUs are equal, below are the results I got using a Tesla V100:
``` ```
6.85 ms ± 9.06 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 6.85 ms ± 9.06 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.82 ms ± 90.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 6.82 ms ± 90.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
``` ```
Likewise let's compare the performance obtained for a sparse Faust: Likewise let's compare the performance obtained for a sparse Faust:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
from pyfaust import rand as frand from pyfaust import rand as frand
cpuF2 = frand(1024, 1024, num_factors=10, fac_type='sparse', density=.2) cpuF2 = frand(1024, 1024, num_factors=10, fac_type='sparse', density=.2)
gpuF2 = cpuF2.clone(dev='gpu') gpuF2 = cpuF2.clone(dev='gpu')
print("CPU times:") print("CPU times:")
%timeit cpuF2.norm(2) %timeit cpuF2.norm(2)
%timeit cpuF2.toarray() %timeit cpuF2.toarray()
print("GPU times:") print("GPU times:")
%timeit gpuF2.norm(2) %timeit gpuF2.norm(2)
%timeit gpuF2.toarray() %timeit gpuF2.toarray()
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
On a Tesla V100 it gives these results: On a Tesla V100 it gives these results:
``` ```
9.86 ms ± 3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 9.86 ms ± 3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
13.8 ms ± 39.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 13.8 ms ± 39.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
### Running some FAµST algorithms on GPU ### Running some FAµST algorithms on GPU
Some of the FAµST algorithms implemented in the C++ core are now also available in pure GPU mode. Some of the FAµST algorithms implemented in the C++ core are now also available in pure GPU mode.
For example, let's compare the factorization times taken by the hierarchical factorization when launched on CPU and GPU. For example, let's compare the factorization times taken by the hierarchical factorization when launched on CPU and GPU.
When running on GPU, the matrix to factorize is copied in GPU memory and almost all operations executed during the algorithm don't imply the CPU in any manner (the only exception at this stage of development is the proximal operators that only run on CPU). When running on GPU, the matrix to factorize is copied in GPU memory and almost all operations executed during the algorithm don't imply the CPU in any manner (the only exception at this stage of development is the proximal operators that only run on CPU).
**Warning: THE COMPUTATION CAN LAST THIRTY MINUTES OR SO ON CPU** **Warning: THE COMPUTATION CAN LAST THIRTY MINUTES OR SO ON CPU**
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
from scipy.io import loadmat from scipy.io import loadmat
from pyfaust.demo import get_data_dirpath from pyfaust.demo import get_data_dirpath
d = loadmat(get_data_dirpath()+'/matrix_MEG.mat') d = loadmat(get_data_dirpath()+'/matrix_MEG.mat')
def factorize_MEG(dev='cpu'): def factorize_MEG(dev='cpu'):
from pyfaust.fact import hierarchical from pyfaust.fact import hierarchical
from pyfaust.factparams import ParamsHierarchicalRectMat from pyfaust.factparams import ParamsHierarchicalRectMat
from time import time from time import time
from numpy.linalg import norm from numpy.linalg import norm
MEG = d['matrix'].T MEG = d['matrix'].T
num_facts = 9 num_facts = 9
k = 10 k = 10
s = 8 s = 8
t_start = time() t_start = time()
p = ParamsHierarchicalRectMat.createParams(MEG, ['rectmat', num_facts, k, s]) p = ParamsHierarchicalRectMat.createParams(MEG, ['rectmat', num_facts, k, s])
p.use_csr = False p.factor_format = 'dense'
MEG16 = hierarchical(MEG, p, backend=2020, on_gpu=dev=='gpu') MEG16 = hierarchical(MEG, p, backend=2020, on_gpu=dev=='gpu')
total_time = time()-t_start total_time = time()-t_start
err = norm(MEG16.toarray()-MEG)/norm(MEG) err = norm(MEG16.toarray()-MEG)/norm(MEG)
return MEG16, total_time, err return MEG16, total_time, err
``` ```
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
gpuMEG16, gpu_time, gpu_err = factorize_MEG(dev='gpu') gpuMEG16, gpu_time, gpu_err = factorize_MEG(dev='gpu')
print("GPU time, error:", gpu_time, gpu_err) print("GPU time, error:", gpu_time, gpu_err)
``` ```
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
cpuMEG16, cpu_time, cpu_err = factorize_MEG(dev='cpu') cpuMEG16, cpu_time, cpu_err = factorize_MEG(dev='cpu')
print("CPU time, error:", cpu_time, cpu_err) print("CPU time, error:", cpu_time, cpu_err)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Depending on you GPU card and CPU the results may vary, so below are shown some results obtained on specific hardware. Depending on you GPU card and CPU the results may vary, so below are shown some results obtained on specific hardware.
<table align="left"> <table align="left">
<tr align="center"> <tr align="center">
<th>Implementation</th> <th>Implementation</th>
<th> Hardware </th> <th> Hardware </th>
<th> Time (s) </th> <th> Time (s) </th>
<th>Error Faust vs MEG matrix </th> <th>Error Faust vs MEG matrix </th>
</tr> </tr>
<tr> <tr>
<td>CPU</td> <td>CPU</td>
<td>Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz</td> <td>Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz</td>
<td>1241.00</td> <td>1241.00</td>
<td>.129</td> <td>.129</td>
</tr> </tr>
<tr> <tr>
<td>GPU</td> <td>GPU</td>
<td>NVIDIA GTX980</td> <td>NVIDIA GTX980</td>
<td>166.72</td> <td>166.72</td>
<td>.129</td> <td>.129</td>
</tr> </tr>
<tr> <tr>
<td>GPU</td> <td>GPU</td>
<td>NVIDIA Tesla V100</td> <td>NVIDIA Tesla V100</td>
<td>49.49</td> <td>49.49</td>
<td>.129</td> <td>.129</td>
</tr> </tr>
</table> </table>
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
### Manually loading the pyfaust GPU module ### Manually loading the pyfaust GPU module
If something goes wrong when trying to use the GPU pyfaust extension, here is how to manually load the module and obtain more information. If something goes wrong when trying to use the GPU pyfaust extension, here is how to manually load the module and obtain more information.
The key is the function [enable_gpu_mod](https://faustgrp.gitlabpages.inria.fr/faust/last-doc/html/namespacepyfaust.html#aea03fff2525fc834f2a56e63fd30a54f). This function allows to give another try to ``gpu_mod`` loading with the verbose mode enabled. The key is the function [enable_gpu_mod](https://faustgrp.gitlabpages.inria.fr/faust/last-doc/html/namespacepyfaust.html#aea03fff2525fc834f2a56e63fd30a54f). This function allows to give another try to ``gpu_mod`` loading with the verbose mode enabled.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
import pyfaust import pyfaust
pyfaust.enable_gpu_mod(silent=False, fatal=True) pyfaust.enable_gpu_mod(silent=False, fatal=True)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Afterward you can call ``pyfaust.is_gpu_mod_enabled()`` to verify if it works in your script. Afterward you can call ``pyfaust.is_gpu_mod_enabled()`` to verify if it works in your script.
Below I copy outputs that show what it should look like when it doesn't work: Below I copy outputs that show what it should look like when it doesn't work:
1) If you asked a fatal error using ``enable_gpu_mod(silent=False, fatal=True)`` an exception will be raised and your code won't be able to continue after this call: 1) If you asked a fatal error using ``enable_gpu_mod(silent=False, fatal=True)`` an exception will be raised and your code won't be able to continue after this call:
``` ```
python -c "import pyfaust; pyfaust.enable_gpu_mod(silent=False, fatal=True)" python -c "import pyfaust; pyfaust.enable_gpu_mod(silent=False, fatal=True)"
WARNING: you must call enable_gpu_mod() before using GPUModHandler singleton. WARNING: you must call enable_gpu_mod() before using GPUModHandler singleton.
loading libgm loading libgm
libcublas.so.9.2: cannot open shared object file: No such file or directory libcublas.so.9.2: cannot open shared object file: No such file or directory
[...] [...]
Exception: Can't load gpu_mod library, maybe the path (/home/test/venv_pyfaust-2.10.14/lib/python3.7/site-packages/pyfaust/lib/libgm.so) is not correct or the backend (cuda) is not installed or configured properly so the libraries are not found. Exception: Can't load gpu_mod library, maybe the path (/home/test/venv_pyfaust-2.10.14/lib/python3.7/site-packages/pyfaust/lib/libgm.so) is not correct or the backend (cuda) is not installed or configured properly so the libraries are not found.
``` ```
2) If you just want a warning, you must use ``enable_gpu_mod(silent=False)``, the code will continue after with no gpu_mod enabled but you'll get some information about what is going wrong (here the CUDA toolkit version 9.2 is not installed) : 2) If you just want a warning, you must use ``enable_gpu_mod(silent=False)``, the code will continue after with no gpu_mod enabled but you'll get some information about what is going wrong (here the CUDA toolkit version 9.2 is not installed) :
``` ```
python -c "import pyfaust; pyfaust.enable_gpu_mod(silent=False)" python -c "import pyfaust; pyfaust.enable_gpu_mod(silent=False)"
WARNING: you must call enable_gpu_mod() before using GPUModHandler singleton. WARNING: you must call enable_gpu_mod() before using GPUModHandler singleton.
loading libgm loading libgm
libcublas.so.9.2: cannot open shared object file: No such file or directory libcublas.so.9.2: cannot open shared object file: No such file or directory
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
------------------------------------------------------------ ------------------------------------------------------------
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Thanks for reading this notebook! Many other are available at [faust.inria.fr](https://faust.inria.fr). Thanks for reading this notebook! Many other are available at [faust.inria.fr](https://faust.inria.fr).
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
**Note**: this notebook was executed using the following pyfaust version: **Note**: this notebook was executed using the following pyfaust version:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
import pyfaust import pyfaust
pyfaust.version() pyfaust.version()
``` ```
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please register or to comment