Commit 39a4860e authored by Nathalie Furmento

tutorials/2015-06-PATC: have <pre> and </pre> anchors alone on their line

git-svn-id: svn+ssh://scm.gforge.inria.fr/svn/starpu/website@15603 176f6dd6-97d6-42f4-bd05-d3db9ad07c7a
parent b0b098e9
@@ -51,7 +51,8 @@ Once you are connected, we advise you to add the following lines at
the end of your file <tt>.bash_profile</tt>.
</p>
-<tt><pre>
+<tt>
+<pre>
module purge
module load compiler/intel
module load hardware/hwloc
@@ -121,20 +122,24 @@ sets of machines:
For the rest of the tutorial, we will use the queue <tt>formation_gpu</tt>.
</p>
-<tt><pre>
+<tt>
+<pre>
#how many nodes and cores
#PBS -W x=NACCESSPOLICY:SINGLEJOB -l nodes=1:ppn=12 -q formation_gpu
starpu_machine_display
-</pre></tt>
+</pre>
+</tt>
<P>
To submit the script, simply call:
</p>
-<tt><Pre>
+<tt>
+<pre>
qsub starpu_machine_display.pbs
-</pre></tt>
+</pre>
+</tt>
<p>
The state of the job can be queried by calling the command <tt>qstat | grep $USER</tt>.
@@ -159,9 +164,11 @@ cluster may be different. Let's force it to use the same machine ID
for the whole cluster:
</p>
-<tt><pre>
+<tt>
+<pre>
$ export STARPU_HOSTNAME=mirage
-</pre></tt>
+</pre>
+</tt>
<p>
Also add this to your <tt>.bash_profile</tt> for further connections. Of course, on
@@ -196,14 +203,16 @@ A typical <a href="files/Makefile"><tt>Makefile</tt></a> for
applications using StarPU is the following:
</p>
-<tt><pre>
+<tt>
+<pre>
CFLAGS += $(shell pkg-config --cflags starpu-1.1)
LDFLAGS += $(shell pkg-config --libs starpu-1.1)
%.o: %.cu
	nvcc $(CFLAGS) $< -c -o $@
vector_scal_task_insert: vector_scal_task_insert.o vector_scal_cpu.o vector_scal_cuda.o vector_scal_opencl.o
-</pre></tt>
+</pre>
+</tt>
<p>
Here are the source files for the application:
@@ -221,7 +230,8 @@ scheduler using the <a href="files/vector_scal.pbs">given qsub script vector_sca
given factor.
</p>
-<tt><pre>
+<tt>
+<pre>
#how many nodes and cores
#PBS -W x=NACCESSPOLICY:SINGLEJOB -l nodes=1:ppn=12 -q formation_gpu
@@ -230,7 +240,8 @@ cd $PBS_O_WORKDIR
make vector_scal_task_insert
./vector_scal_task_insert
-</pre></tt>
+</pre>
+</tt>
<h4>Computation Kernels</h4>
<p>
@@ -262,13 +273,15 @@ of one of the implementations simply by disabling a type of device when
running your application, e.g.:
</p>
-<tt><pre>
+<tt>
+<pre>
# to force the implementation on a GPU device; by default, it will enable CUDA
STARPU_NCPUS=0 vector_scal_task_insert
# to force the implementation on an OpenCL device
STARPU_NCPUS=0 STARPU_NCUDA=0 vector_scal_task_insert
-</pre></tt>
+</pre>
+</tt>
<p>
You can set the environment variable STARPU_WORKER_STATS to 1 when
@@ -277,9 +290,11 @@ device. You can see the whole list of environment
variables <a href="http://runtime.bordeaux.inria.fr/StarPU/doc/html/ExecutionConfigurationThroughEnvironmentVariables.html">here</a>.
</p>
-<tt><pre>
+<tt>
+<pre>
STARPU_WORKER_STATS=1 vector_scal_task_insert
-</pre></tt>
+</pre>
+</tt>
<h4>Main Code</h4>
<p>
@@ -359,7 +374,8 @@ whole C result matrix.
Run the application with the <a href="files/mult.pbs">batch scheduler</a>, enabling some statistics:
</p>
-<tt><pre>
+<tt>
+<pre>
#how many nodes and cores
#PBS -W x=NACCESSPOLICY:SINGLEJOB -l nodes=1:ppn=12 -q formation_gpu
@@ -368,7 +384,8 @@ cd $PBS_O_WORKDIR
make mult
STARPU_WORKER_STATS=1 ./mult
-</pre></tt>
+</pre>
+</tt>
<p>
Figures show how the computation was distributed on the various processing
@@ -391,7 +408,8 @@ on the DSM interface.
Let's execute it.
</p>
-<tt><pre>
+<tt>
+<pre>
#how many nodes and cores
#PBS -W x=NACCESSPOLICY:SINGLEJOB -l nodes=1:ppn=12 -q formation_gpu
@@ -400,7 +418,8 @@ cd $PBS_O_WORKDIR
make gemm/sgemm
STARPU_WORKER_STATS=1 ./gemm/sgemm
-</pre></tt>
+</pre>
+</tt>
<!--
<p>
@@ -514,17 +533,21 @@ For instance, compare the <tt>eager</tt> (default) and <tt>dmda</tt> scheduling
policies:
</p>
-<tt><pre>
+<tt>
+<pre>
STARPU_BUS_STATS=1 STARPU_WORKER_STATS=1 gemm/sgemm -x 1024 -y 1024 -z 1024
-</pre></tt>
+</pre>
+</tt>
<p>
with:
</p>
-<tt><pre>
+<tt>
+<pre>
STARPU_BUS_STATS=1 STARPU_WORKER_STATS=1 STARPU_SCHED=dmda gemm/sgemm -x 1024 -y 1024 -z 1024
-</pre></tt>
+</pre>
+</tt>
<p>
You can see that most (if not all) of the computation has been done on GPUs,
@@ -560,7 +583,8 @@ result is saved to a file in <tt>$STARPU_HOME</tt> for later re-use. The
performance model.
</p>
-<tt><pre>
+<tt>
+<pre>
$ starpu_perfmodel_display -l
file: &lt;starpu_sgemm_gemm.mirage&gt;
$ starpu_perfmodel_display -s starpu_sgemm_gemm
@@ -571,7 +595,8 @@ performance model for cuda_0_impl_0
# hash      size      flops          mean (us)      stddev (us)    n
8bd4e11d    2359296   0.000000e+00   4.918095e+02   9.404866e+00   66
...
-</pre></tt>
+</pre>
+</tt>
<p>
This shows that for the sgemm kernel with a 2.5M matrix slice, the average
@@ -631,7 +656,8 @@ sequential-looking loop, and eventually waiting for all the tasks to
complete.
</p>
-<tt><pre>
+<tt>
+<pre>
#how many nodes and cores
#PBS -W x=NACCESSPOLICY:SINGLEJOB -l nodes=1:ppn=12 -q formation_gpu
@@ -640,7 +666,8 @@ cd $PBS_O_WORKDIR
make ring_async_implicit
mpirun -np 2 $PWD/ring_async_implicit
-</pre></tt>
+</pre>
+</tt>
</div>
<div class="section">
@@ -663,10 +690,8 @@ new distribution.
<h2>Contact</h2>
<p>
For any questions regarding StarPU, please contact the StarPU developers mailing list.
<pre>
<a href="mailto:starpu-devel@lists.gforge.inria.fr?subject=StarPU">starpu-devel@lists.gforge.inria.fr</a>
</pre>
-</pre>
</p>
</div>
<div class="section">
@@ -692,11 +717,13 @@ To use the version of StarPU compiled with FxT support, you need to reload the
StarPU module after loading the FxT module.
</p>
-<tt><pre>
+<tt>
+<pre>
module unload runtime/starpu/1.1.4
module load trace/fxt/0.2.13
module load runtime/starpu/1.1.4
-</pre></tt>
+</pre>
+</tt>
<p>
The trace file is stored in <tt>/tmp</tt> by default. Since execution will
@@ -705,9 +732,11 @@ we need to tell StarPU to store output traces in the home directory, by
setting:
</p>
-<tt><pre>
+<tt>
+<pre>
$ export STARPU_FXT_PREFIX=$HOME/
-</pre></tt>
+</pre>
+</tt>
<p>
do not forget to add the line in your file <tt>.bash_profile</tt>.
@@ -719,9 +748,11 @@ trace file will be generated in your home directory. This can be converted to
several formats by using:
</p>
-<tt><pre>
+<tt>
+<pre>
$ starpu_fxt_tool -i ~/prof_file_*
-</pre></tt>
+</pre>
+</tt>
<p>
This will create