Commit bf6ed23f authored by Nathalie Furmento

website: tutorials/2014-05-PATC

git-svn-id: svn+ssh://scm.gforge.inria.fr/svn/starpu/website@12906 176f6dd6-97d6-42f4-bd05-d3db9ad07c7a
parent e6d4d95d
@@ -6,6 +6,10 @@ LDFLAGS += $(shell pkg-config --libs starpu-1.1)
vector_scal_task_insert: vector_scal_task_insert.o vector_scal_cpu.o vector_scal_cuda.o vector_scal_opencl.o
mult: mult.c
gemm/sgemm: gemm/sgemm.o gemm/common/blas.o
gemm/dgemm: gemm/dgemm.o gemm/common/blas.o
clean:
	rm -f vector_scal_task_insert mult *.o
	rm -f gemm/sgemm gemm/dgemm gemm/*.o gemm/common/*.o
#how many nodes and cores
#PBS -W x=NACCESSPOLICY:SINGLEJOB -q mirage -l nodes=1:ppn=12
make gemm/sgemm
STARPU_WORKER_STATS=1 gemm/sgemm
@@ -22,6 +22,9 @@
<div class="section">
<h2>Setup</h2>
<div class="section">
<h3>Connection to the Platform</h3>
<p>
The lab work will be done on
the <a href="http://plafrim.bordeaux.inria.fr/">Plafrim</a> platform.
@@ -43,6 +46,10 @@ module load mpi/intel
module load runtime/starpu/1.1.0
</pre></tt>
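<p>
To check that the environment is correctly set up, you can ask
<tt>pkg-config</tt> for the StarPU flags (a quick sanity check, not
part of the original instructions):
</p>
<tt><pre>
pkg-config --cflags starpu-1.1
pkg-config --libs starpu-1.1
</pre></tt>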
</div>
<div class="section">
<h3>Job Submission</h3>
<p>
Jobs can be submitted to the platform to reserve a set of nodes and to
execute an application on those nodes. We advise not to reserve nodes
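<p>
For instance, the <a href="files/vector_scal.pbs">qsub script</a> used
later in this tutorial would be submitted as follows (a sketch; the
exact submission command may depend on the platform configuration):
</p>
<tt><pre>
qsub vector_scal.pbs
</pre></tt>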
@@ -102,6 +109,15 @@ Also add this to your <tt>.bashrc</tt> for further connections. Of course, on
a heterogeneous cluster, the cluster launcher script should set various
hostnames for the different node classes, as appropriate.
</p>
</div>
<div class="section">
<h3>Tutorial Material</h3>
<p>
Make a zip with all the files, to be copied onto plafrim.
</p>
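<p>
A possible way to get the material onto the platform (a sketch with a
hypothetical archive name; adapt the login and hostname to the ones
given for the connection):
</p>
<tt><pre>
scp material.zip mylogin@plafrim:
ssh mylogin@plafrim
unzip material.zip
</pre></tt>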
</div>
</div>
@@ -114,8 +130,8 @@ hostnames for the different node classes, as appropriate.
<h4>Making it and Running it</h4>
<p>
A typical <a href="files/Makefile"><tt>Makefile</tt></a> for
applications using StarPU is then the following:
</p>
<tt><pre>
@@ -128,8 +144,7 @@ vector_scal_task_insert: vector_scal_task_insert.o vector_scal_cpu.o vector_scal
</pre></tt>
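<p>
Note how the flags are obtained from <tt>pkg-config</tt> rather than
hard-coded. Spelled out for StarPU 1.1 (the <tt>--libs</tt> line
appears in the Makefile above; the <tt>--cflags</tt> line is its usual
companion):
</p>
<tt><pre>
CFLAGS += $(shell pkg-config --cflags starpu-1.1)
LDFLAGS += $(shell pkg-config --libs starpu-1.1)
</pre></tt>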
<p>
Here are the source files for the application:
<ul>
<li><a href="files/vector_scal_task_insert.c">The main application</a></li>
<li><a href="files/vector_scal_cpu.c">The CPU implementation of the codelet</a></li>
@@ -144,6 +159,14 @@ scheduler using the <a href="files/vector_scal.pbs">given qsub script</a>. It sh
given factor.
</p>
<tt><pre>
#how many nodes and cores
#PBS -W x=NACCESSPOLICY:SINGLEJOB -q mirage -l nodes=1:ppn=12
make vector_scal_task_insert
vector_scal_task_insert
</pre></tt>
<h4>Computation Kernels</h4>
<p>
Examine the source code, starting from <tt>vector_scal_cpu.c</tt>: this is
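<p>
As a reference while reading, the CPU kernel of this example typically
looks like the following sketch (close to StarPU's canonical
vector-scaling example; names and details may differ slightly from the
provided file):
</p>
<tt><pre>
#include &lt;starpu.h&gt;

/* CPU implementation of the codelet: scale each element of the vector */
void scal_cpu_func(void *buffers[], void *cl_arg)
{
    /* buffers[0] is the vector handle registered by the main code */
    struct starpu_vector_interface *vector = buffers[0];
    unsigned n = STARPU_VECTOR_GET_NX(vector);
    float *val = (float *)STARPU_VECTOR_GET_PTR(vector);

    /* the scaling factor is passed through cl_arg */
    float factor = *(float *)cl_arg;

    for (unsigned i = 0; i < n; i++)
        val[i] *= factor;
}
</pre></tt>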
@@ -183,12 +206,12 @@ STARPU_NCPUS=0 STARPU_NCUDA=0 vector_scal_task_insert
<h4>Main Code</h4>
<p>
Now examine <tt>vector_scal_task_insert.c</tt>: the <tt>cl</tt>
(codelet) structure simply gathers pointers to the functions
mentioned above.
</p>
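<p>
As a sketch (field values assumed from the canonical vector-scaling
example, with the StarPU 1.1 codelet API; the kernel names are
assumptions):
</p>
<tt><pre>
static struct starpu_codelet cl =
{
    /* pointers to the per-architecture kernel implementations */
    .cpu_funcs = { scal_cpu_func, NULL },
    .cuda_funcs = { scal_cuda_func, NULL },
    .opencl_funcs = { scal_opencl_func, NULL },
    /* the codelet accesses one data handle, in read-write mode */
    .nbuffers = 1,
    .modes = { STARPU_RW },
};
</pre></tt>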
<p>
The <tt>main</tt> function
<ul>
<li>Allocates a <tt>vector</tt> application buffer and fills it.</li>
<li>Registers it to StarPU, and gets back a DSM handle. From now on, the
@@ -271,30 +294,40 @@ Figures show how the computations were distributed on the various processing
units.
</p>
<p>
<a href="files/gemm/xgemm.c"><tt>xgemm.c</tt></a> is a very similar
matrix-matrix product example, but which makes use of BLAS kernels for
much better performance. The <tt>mult_kernel_common</tt> functions
show how we call <tt>DGEMM</tt> (CPUs) or <tt>cublasDgemm</tt> (GPUs)
on the DSM interface.
</p>
<p>
Let's execute it on a node with one GPU:
</p>
<tt><pre>
#how many nodes and cores
#PBS -W x=NACCESSPOLICY:SINGLEJOB -q mirage -l nodes=1:ppn=12
make gemm/sgemm
STARPU_WORKER_STATS=1 gemm/sgemm
</pre></tt>
<p>
(It takes some time for StarPU to make an off-line bus performance
calibration, but this is done only once.)
</p>
<p>
We can notice that StarPU gave many more tasks to the GPU. You can also try
to set <tt>num_gpu=2</tt> to run on the machine which has two GPUs (there is
only one such machine, so you may have to wait a long time; submit this in
the background in a separate terminal). The interesting thing here is that
with <b>no</b> application modification beyond making it use a task-based
programming model, we get multi-GPU support for free!
</p>
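<p>
To compare the processing units, it can also be instructive to
restrict the workers with the environment variables used earlier in
this tutorial (a suggestion, not part of the original lab script):
</p>
<tt><pre>
# CPU-only run, then GPU-only run, with per-worker statistics
STARPU_NCUDA=0 STARPU_WORKER_STATS=1 gemm/sgemm
STARPU_NCPUS=0 STARPU_WORKER_STATS=1 gemm/sgemm
</pre></tt>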
</div>