<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<HEAD>
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<TITLE>StarPU</TITLE>
<link rel="stylesheet" type="text/css" href="style.css" />
<link rel="Shortcut icon" href="http://www.inria.fr/extension/site_inria/design/site_inria/images/favicon.ico" type="image/x-icon" />
</HEAD>

<body>

<div class="title">
<h1><a href="./">StarPU</a></h1>
<h2>A Unified Runtime System for Heterogeneous Multicore Architectures</h2>
</div>

<div class="menu">
<a href="http://runtime.bordeaux.inria.fr/">RUNTIME TEAM</a> |
&nbsp; &nbsp; &nbsp;
|
<a href="#overview">Overview</a> |
<a href="#news">News</a> |
<a href="#contact">Contact</a> |
<a href="#features">Features</a> |
<a href="#software">Software</a> |
<a href="#tryit">Try it!</a> |
<a href="#publications">Publications</a> |
<a href="internships/">Jobs/Interns</a> |
<a href="files/">Download</a> |
<a href="tutorials">Tutorials</a> |
<a href="https://wiki.bordeaux.inria.fr/runtime/doku.php?id=starpu">Intranet</a>
</div>

<div class="section" id="overview">
<h3>Overview</h3>
  <p>
<span class="important">StarPU is a task programming library for hybrid architectures</span>
<ol>
<li><b>The application provides algorithms and constraints</b>
    <ul>
    <li>CPU/GPU implementations of tasks</li>
    <li>A graph of tasks, using either StarPU's high-level <b>GCC plugin</b> pragmas or StarPU's rich <b>C API</b></li>
    </ul>
<br>
</li>
<li><b>StarPU handles run-time concerns</b>
    <ul>
    <li>Task dependencies</li>
    <li>Optimized heterogeneous scheduling</li>
    <li>Optimized data transfers and replication between main memory and discrete memories</li>
    <li>Optimized cluster communications</li>
    </ul>
</li>
</ol>
</p>
<p>
<span class="important">Rather than handling low-level issues, <b>programmers can concentrate on algorithmic concerns!</b></span>
</p>

<p>
<span class="note">The StarPU documentation is available in <a href="./doc/starpu.pdf">PDF</a> and in <a href="./doc/html/">HTML</a>.</span> Please note that these documents are up-to-date with the latest release of StarPU.
</p>
</div>

<div class="section emphasize newslist" id="news">
<h3>News</h3>
<p>
August 2016 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      v1.1.6 release of StarPU is now available!</b></a> This release notably brings the concept of
      scheduling contexts, which make it possible to separate computation
      resources.
</p>
<p>
August 2016 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      v1.2.0 release of StarPU is now available!</b></a>
      This release notably brings out-of-core support, MIC Xeon
      Phi support, OpenMP runtime support, and a new internal
      communication system for MPI.
</p>
<p>
August 2016 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      sixth (and really hopefully last) release candidate of the v1.2.0 release of StarPU is now
      available!</b></a>
      This release notably brings out-of-core support, MIC Xeon
      Phi support, OpenMP runtime support, and a new internal
      communication system for MPI.
</p>
<p>
March 2016 <b>&raquo;&nbsp;</b> <b>Engineer job offer</b> at Inria: more
details on the job and on how to apply are available <a href="internships/hibox.html">here</a>
</p>
<p>
December 2015 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      fifth (and hopefully last) release candidate of the v1.2.0 release of StarPU is now
      available!</b></a>
      This release notably brings out-of-core support, MIC Xeon
      Phi support, OpenMP runtime support, and a new internal
      communication system for MPI.
</p>
<p>
September 2015 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      v1.1.5 release of StarPU is now available!</b></a> This release notably brings the concept of
      scheduling contexts, which make it possible to separate computation
      resources.
</p>
<p>
August 2015 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      fourth release candidate of the v1.2.0 release of StarPU is now
      available!</b></a>
      This release notably brings out-of-core support, MIC Xeon
      Phi support, OpenMP runtime support, and a new internal
      communication system for MPI.
</p>
<p>
July 2015 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      third release candidate of the v1.2.0 release of StarPU is now
      available!</b></a>
      This release notably brings out-of-core support, MIC Xeon
      Phi support, OpenMP runtime support, and a new internal
      communication system for MPI.
</p>
<p>
May 2015 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      second release candidate of the v1.2.0 release of StarPU is now
      available!</b></a>
      This release notably brings out-of-core support, MIC Xeon
      Phi support, OpenMP runtime support, and a new internal
      communication system for MPI.
</p>
<p>
April 2015 <b>&raquo;&nbsp;</b>A <a href="https://events.prace-ri.eu/event/339/">tutorial</a> on runtime systems including
StarPU will be given at INRIA Bordeaux in June 2015.
</p>
<p>
March 2015 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      first release candidate of the v1.2.0 release of StarPU is now
      available!</b></a>
      This release notably brings out-of-core support, MIC Xeon
      Phi support, OpenMP runtime support, and a new internal
      communication system for MPI.
</p>
<p>
March 2015 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      v1.1.4 release of StarPU is now available!</b></a> This release notably brings the concept of
      scheduling contexts, which make it possible to separate computation
      resources.
</p>
</div>

<div class="section emphasizebot" style="text-align: right; font-style: italic;">
Get the latest StarPU news by subscribing to the <a href="http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/starpu-announce">starpu-announce mailing list</a>.
See also the full <a href="news/">news</a>.
</div>

<div class="section" id="video">
<h3>Video Conference</h3>
<p>
A video recording (26') of a <a href="http://www.x.org/wiki/Events/XDC2014/XDC2014ThibaultStarPU/">presentation at the XDC2014 conference</a> gives an overview of StarPU
(<a href="http://www.x.org/wiki/Events/XDC2014/XDC2014ThibaultStarPU/xdc_starpu.pdf">slides</a>):
</p>
<center>
<iframe width="420" height="315" src="https://www.youtube.com/embed/frsWSqb8UJU" frameborder="0" allowfullscreen></iframe>
</center>
</div>

<div class="section" id="contact">
<h3>Contact</h3>
<p>For any questions regarding StarPU, please contact the StarPU developers mailing list.</p>
<pre>
<a href="mailto:starpu-devel@lists.gforge.inria.fr?subject=StarPU">starpu-devel@lists.gforge.inria.fr</a>
</pre>
</div>

<div class="section" id="features">
<h3>Features</h3>

<h4>Portability</h4>
  <p>
Portability is obtained by means of a unified abstraction of the machine.
StarPU offers a unified offloadable task abstraction named <em>codelet</em>. Rather
than rewriting the entire code, programmers can encapsulate existing functions
within codelets. If a codelet can run on heterogeneous architectures, <b>it
is possible to specify one function for each architecture</b> (e.g. one function
for CUDA and one function for CPUs). StarPU takes care of scheduling and
executing those codelets as efficiently as possible over the entire machine, including
multiple GPUs.
One can even specify <b>several functions for each architecture</b> (new in
v1.0) as well as
<b>parallel implementations</b> (e.g. in OpenMP), and StarPU will
automatically determine which version is best for each input size (new in v0.9).
  </p>

<h4>Data transfers</h4>
  <p>
To relieve programmers from the burden of explicit data transfers, a high-level
data management library enforces memory coherency over the machine: before a
codelet starts (e.g. on an accelerator), all its <b>data are automatically made
available on the compute resource</b>. Data are also kept on e.g. GPUs as long as
they are needed for further tasks. When a device runs out of memory, StarPU uses
an LRU strategy to <b>evict unused data</b>. StarPU also takes care of <b>automatically
prefetching</b> data, which makes it possible to <b>overlap data transfers with computations</b>
(including <b>GPU-GPU direct transfers</b>) to get the most out of the architecture.
  </p>
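<p>
The LRU eviction mentioned above can be illustrated with a minimal sketch. This is not StarPU's actual implementation: the <tt>cached_block</tt> structure and the <tt>pick_lru_victim</tt> function are hypothetical names introduced here, and the sketch assumes each cached block carries a logical timestamp of its last use and a flag telling whether a task still needs it.
</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one data block cached on a device
 * (illustration only, not a StarPU data structure). */
struct cached_block {
    int in_use;              /* nonzero while a task still needs the block */
    unsigned long last_use;  /* logical timestamp of the most recent access */
};

/* Return the index of the least recently used block that no task is
 * currently using, or -1 if every block is still in use. */
int pick_lru_victim(const struct cached_block *blocks, size_t n)
{
    int victim = -1;

    assert(blocks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (blocks[i].in_use)
            continue;  /* never evict data a task still needs */
        if (victim < 0 || blocks[i].last_use < blocks[victim].last_use)
            victim = (int)i;
    }
    return victim;
}
```

<p>
In this sketch, a block still needed by a task is never selected, which mirrors the statement that data are kept on a device as long as they are needed for further tasks.
</p>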

<h4>Dependencies</h4>
  <p>
Dependencies between tasks can be expressed in several ways, to give the
programmer as much flexibility as possible:
  <ul>
    <li><b>explicitly</b> between pairs of tasks,</li>
    <li>explicitly through <b>tags</b> which act as rendez-vous points between
    tasks (thus including tasks which have not been created yet),</li>
    <li><b>implicitly</b> from RAW, WAW, and WAR data dependencies.</li>
  </ul>
  </p>
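<p>
As a rough illustration of the implicit case, the sketch below classifies the dependency between two tasks that access the same piece of data in submission order. The <tt>access_mode</tt> and <tt>hazard</tt> enumerations and the <tt>classify_hazard</tt> function are hypothetical names chosen for this example; they are not part of the StarPU API.
</p>

```c
/* Hypothetical access modes for this sketch (not StarPU's own enum). */
enum access_mode { ACCESS_READ, ACCESS_WRITE };

/* Kinds of data dependency between two accesses to the same data. */
enum hazard { HAZARD_NONE, HAZARD_RAW, HAZARD_WAR, HAZARD_WAW };

/* Classify the dependency between two tasks submitted in order
 * (earlier, then later) that access the same piece of data. */
enum hazard classify_hazard(enum access_mode earlier, enum access_mode later)
{
    if (earlier == ACCESS_WRITE && later == ACCESS_READ)
        return HAZARD_RAW;   /* read after write */
    if (earlier == ACCESS_READ && later == ACCESS_WRITE)
        return HAZARD_WAR;   /* write after read */
    if (earlier == ACCESS_WRITE && later == ACCESS_WRITE)
        return HAZARD_WAW;   /* write after write */
    return HAZARD_NONE;      /* two reads impose no ordering */
}
```

<p>
Any result other than <tt>HAZARD_NONE</tt> means the later task has to wait for the earlier one; two readers can run concurrently.
</p>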
  <p>
  These dependencies are computed in a completely decentralized way.
  </p>
  <p>
StarPU also supports an OpenMP-like <a href="doc/html/DataManagement.html#DataReduction">reduction</a> access mode (new in v0.9).
  </p>
  <p>
It also supports a <a href="doc/html/DataManagement.html#DataCommute">commute</a> access mode to allow data access commutativity (new in v1.2).
  </p>

<h4>Heterogeneous Scheduling</h4>
  <p>
StarPU obtains
portable performance by efficiently (and easily) using all computing resources
at the same time. StarPU also takes advantage of the <b>heterogeneous</b> nature of a
machine, for instance by using scheduling strategies based on auto-tuned
performance models. These determine the relative performance achieved
by the different processing units for the various kinds of tasks, and thus
make it possible to <b>automatically let processing units execute the tasks they are best suited for</b>.
Various strategies and variants are available. Some of them are centralized, but
most of them are <b>completely distributed</b>. Examples include dmda (a data-locality-aware MCT strategy,
thus similar to heft, except that it starts executing tasks before the whole task graph is
submitted, thus allowing dynamic task submission and a decentralized scheduler),
eager (a simple centralized queue), and decentralized locality-aware work-stealing.
The overhead per task is typically on the order of
a microsecond. Tasks should thus be a few orders of magnitude
bigger, such as 100 microseconds or 1 millisecond, to make the overhead
negligible.
  </p>
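<p>
The idea behind a data-locality-aware minimum-completion-time choice, and the overhead argument above, can be sketched as follows. This is only an illustration under simple assumptions, not the code of StarPU's schedulers: the <tt>worker_estimate</tt> structure and both functions are hypothetical, and the per-worker estimates are assumed to come from performance models.
</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-worker estimates for one ready task, in microseconds. */
struct worker_estimate {
    double ready_at;  /* time at which the worker becomes free */
    double transfer;  /* predicted time to bring the task's data to the worker */
    double compute;   /* predicted execution time from the performance model */
};

/* Return the index of the worker with the smallest predicted completion
 * time, or -1 if there are no workers. */
int pick_mct_worker(const struct worker_estimate *w, size_t n)
{
    int best = -1;
    double best_end = 0.0;

    assert(w != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* Counting the transfer time is what makes the choice locality-aware. */
        double end = w[i].ready_at + w[i].transfer + w[i].compute;
        if (best < 0 || end < best_end) {
            best = (int)i;
            best_end = end;
        }
    }
    return best;
}

/* Fraction of the total time spent in runtime overhead for one task. */
double overhead_fraction(double overhead_us, double task_us)
{
    return overhead_us / (overhead_us + task_us);
}
```

<p>
With a 1 microsecond overhead, <tt>overhead_fraction(1.0, 99.0)</tt> is 1%, and the fraction drops to roughly 0.1% for 1 millisecond tasks, which is why tasks a few orders of magnitude longer than the overhead are recommended.
</p>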

<h4>Clusters</h4>
  <p>
To deal with clusters, StarPU can nicely integrate with <a href="doc/html/MPISupport.html">MPI</a> through
explicit network communications, which will then be <b>automatically combined and
overlapped</b> with the intra-node data transfers and computation. The application
can also just provide the whole task graph and a data distribution over MPI nodes, and StarPU
will automatically determine which MPI node should execute which task, and
<b>generate all required MPI communications</b> accordingly (new in v0.9). We
have obtained excellent scaling on a 144-node cluster with GPUs, but have not yet
had the opportunity to test on a larger cluster. We have however measured
that with naive task submission, it should scale to a thousand nodes, and with
pruning-tuned task submission, it should scale to about a <b>million nodes</b>.
  </p>

<h4>Out of core</h4>
  <p>
When memory is not big enough for the working set, one may have to resort to
using disks. StarPU makes this seamless thanks to its <a href="doc/html/OutOfCore.html">out of core support</a> (new in 1.2).
StarPU will <b>automatically evict</b> data from the main memory in advance, and
<b>prefetch back</b> required data before it is needed for tasks.
  </p>

<h4>Extensions to the C Language</h4>
<p>
  StarPU comes with a GCC plug-in
  that <a href="doc/html/cExtensions.html">extends the C programming
  language</a> with pragmas and attributes that make it easy
  to <b>annotate a sequential C program to turn it into a parallel
  StarPU program</b> (new in v1.0).
</p>

<h4>OpenMP 4 -compatible interface</h4>
<p>
  <a href="http://kstar.gforge.inria.fr/">K'Star</a> provides an OpenMP
  4 -compatible interface on top of StarPU. This makes it possible to simply rebuild OpenMP
  applications with the K'Star source-to-source compiler, then build them with the
  usual compiler; the result will use the StarPU runtime.
</p>
<p>
  K'Star also provides some extensions to the OpenMP 4 standard, to let the
  StarPU runtime perform online optimizations.
</p>

<h4>OpenCL-compatible interface</h4>
<p>
  StarPU provides an <a href="doc/html/SOCLOpenclExtensions.html">OpenCL-compatible interface, SOCL</a>,
  which makes it possible to simply run OpenCL applications on top of StarPU (new in v1.0).
</p>

<h4>Simulation support</h4>
<p>
  StarPU can very accurately simulate an application execution
  and measure the resulting performance using the
  <a href="http://simgrid.gforge.inria.fr">SimGrid simulator</a> (new in v1.1).  This makes it possible
  to quickly experiment with various scheduling heuristics, various application
  algorithms, and even various platforms (available GPUs and CPUs, available
  bandwidth)!
</p>

<h4>All in all</h4>
  <p>
All this means that, with the help
of <a href="doc/html/cExtensions.html">StarPU's extensions to the C
language</a>, the following sequential source code of a tiled version of
the classical Cholesky factorization algorithm using BLAS is also valid
StarPU code, possibly running on all the CPUs and GPUs. Given a data
distribution over MPI nodes, it is even a distributed version!
  </p>

  <tt><pre>
for (k = 0; k < tiles; k++) {
  potrf(A[k,k])
  for (m = k+1; m < tiles; m++)
    trsm(A[k,k], A[m,k])
  for (m = k+1; m < tiles; m++)
    syrk(A[m,k], A[m, m])
  for (m = k+1; m < tiles; m++)
    for (n = k+1; n < m; n++)
      gemm(A[m,k], A[n,k], A[m,n])
}</pre></tt>
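<p>
To get a feel for how many tasks this loop nest generates, the following plain C sketch replays the same loops and counts the tasks of each kind instead of submitting them. The <tt>task_counts</tt> structure and the <tt>count_cholesky_tasks</tt> function are hypothetical helpers written for this page; they do not use any StarPU API.
</p>

```c
#include <assert.h>

/* Number of tasks of each kind generated by the tiled Cholesky loop nest. */
struct task_counts { long potrf, trsm, syrk, gemm; };

/* Replay the loop nest shown above and count tasks instead of running them. */
struct task_counts count_cholesky_tasks(int tiles)
{
    struct task_counts c = {0, 0, 0, 0};

    assert(tiles >= 0);
    for (int k = 0; k < tiles; k++) {
        c.potrf++;                              /* potrf(A[k,k]) */
        for (int m = k + 1; m < tiles; m++)
            c.trsm++;                           /* trsm(A[k,k], A[m,k]) */
        for (int m = k + 1; m < tiles; m++)
            c.syrk++;                           /* syrk(A[m,k], A[m,m]) */
        for (int m = k + 1; m < tiles; m++)
            for (int n = k + 1; n < m; n++)
                c.gemm++;                       /* gemm(A[m,k], A[n,k], A[m,n]) */
    }
    return c;
}
```

<p>
For 40 tiles this gives 40 potrf, 780 trsm, 780 syrk and 9880 gemm tasks, so the gemm kernel dominates the task graph as the number of tiles grows.
</p>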

<h4>Supported Architectures</h4>
<ul>
<li>SMP/Multicore Processors (x86, PPC, ...) </li>
<li>NVIDIA GPUs (e.g. heterogeneous multi-GPU), with pipelined and concurrent kernel execution support (new in v1.2) and GPU-GPU direct transfers (new in 1.1)</li>
<li>OpenCL devices</li>
<li>Cell Processors (experimental)</li>
</ul>
and soon (in v1.2)
<ul>
<li>Intel SCC</li>
<li>Intel MIC / Xeon Phi</li>
</ul>

<h4>Supported Operating Systems</h4>
<ul>
<li>GNU/Linux</li>
<li>Mac OS X</li>
<li>Windows</li>
</ul>

<h4>Performance analysis tools</h4>
  <p>
In order to understand the performance obtained by StarPU, it is helpful to
visualize the actual behaviour of the applications running on complex
heterogeneous multicore architectures.  StarPU therefore makes it possible to
generate Pajé traces that can be visualized thanks to the <a
href="http://vite.gforge.inria.fr/"><b>ViTE</b> (Visual Trace Explorer) open
source tool.</a>
  </p>

<p>
<b>Example:</b> LU decomposition on 3 CPU cores and a GPU using a very simple
greedy scheduling strategy. The green (resp. red) sections indicate when the
corresponding processing unit is busy (resp. idle). The number of ready tasks
is displayed in the curve on top: it appears that with this scheduling policy,
the algorithm suffers from a certain lack of parallelism. <b>Measured speed: 175.32
GFlop/s</b>
<center><a href="./images/greedy-lu-16k-fx5800.png"> <img src="./images/greedy-lu-16k-fx5800.png" alt="LU decomposition (greedy)" width="75%"></a></center>
</p>

<p>
This second trace depicts the behaviour of the same application using a
scheduling strategy trying to minimize load imbalance thanks to auto-tuned
performance models and to keep data locality as high as possible. In this
example, the Pajé trace clearly shows that this scheduling strategy outperforms
the previous one in terms of processor usage. <b>Measured speed: 239.60
GFlop/s</b>
<center><a href="./images/dmda-lu-16k-fx5800.png"><img src="./images/dmda-lu-16k-fx5800.png" alt="LU decomposition (dmda)" width="75%"></a></center>
</p>

<p>
<a href="http://www.hlrs.de/temanejo">Temanejo</a> can be used to debug the task
graph, as shown below (new in v1.1).
</p>

<center>
<a href="images/temanejo.png"><img src="images/temanejo.png" width="50%"/></a>
</center>

</div>

<div class="section" id="software">
<h3>Software using StarPU</h3>

<p>
Some software is known to be able to use StarPU to tackle heterogeneous
architectures. Here is a non-exhaustive list (feel free to ask to be in the
list!):
</p>

<ul>
	<li><a href="http://icl.cs.utk.edu/magma/">MAGMA</a>, dense linear algebra library, starting from version 1.1</li>
	<li><a href="https://project.inria.fr/chameleon/">Chameleon</a>, dense linear algebra library</li>
	<li><a href="http://www.ida.liu.se/~chrke/skepu/">SkePU</a>, a skeleton programming framework.</li>
	<li><a href="http://pastix.gforge.inria.fr/">PaStiX</a>, sparse linear algebra library, starting from version 5.2.1</li>
	<li><a href="http://buttari.perso.enseeiht.fr/qr_mumps/">qr_mumps</a>, sparse linear algebra library</li>
	<li><a href="http://scalfmm-public.gforge.inria.fr/doc/">ScalFMM</a>, N-body interaction simulation using the Fast Multipole Method. </li>
	<li><a href="https://project.inria.fr/maphys/fr/">MaPHyS</a>, Massively Parallel Hybrid Solver</li>
	<li><a href="https://hal.archives-ouvertes.fr/hal-01086246">SignalPU</a>, a Dataflow-Graph-specific programming model. </li>
	<li><a href="https://tel.archives-ouvertes.fr/tel-01410049/">SCHNAPS</a>, Solver for Conservative Hyperbolic Non-linear systems Applied to PlasmaS. </li>
</ul>

<p>
You can find below the list of publications related to applications
using StarPU.
</p>

</div>

<div class="section" id="tryit">
<h3>Give it a try!</h3>
<p>
You can easily try the performance on the Cholesky factorization for
instance. Make sure to have the pkg-config and
<a href="http://www.open-mpi.org/projects/hwloc/">hwloc</a>
software installed for
proper CPU control, as well as BLAS kernels for your computation units, configured in
your environment (e.g. MKL for CPUs and CUBLAS for GPUs).
</p>

<tt><pre>
$ wget http://starpu.gforge.inria.fr/files/starpu-someversion.tar.gz
$ tar xf starpu-someversion.tar.gz
$ cd starpu-someversion
$ ./configure
$ make -j 12
$ STARPU_SCHED=dmdas ./examples/cholesky/cholesky_implicit -size $((960*40)) -nblocks 40
$ STARPU_SCHED=dmdas mpirun -np 4 -machinefile mymachines ./mpi/examples/matrix_decomposition/mpi_cholesky_distributed -size $((960*40*4)) -nblocks $((40*4))</pre></tt>

<p>Note that the dmdas scheduler uses performance models, and thus needs
calibration runs before exhibiting optimized performance (until the "model
something is not calibrated enough" messages go away).</p>

<p>To get a glimpse of what happened, you can get an execution trace by
installing
<a href="http://savannah.nongnu.org/projects/fkt">FxT</a>
and <a href="http://vite.gforge.inria.fr/">ViTE</a>, and enabling traces:
</p>

<tt><pre>
$ ./configure --with-fxt
$ make -j 12
$ STARPU_SCHED=dmdas ./examples/cholesky/cholesky_implicit -size $((960*40)) -nblocks 40
$ ./tools/starpu_fxt_tool -i /tmp/prof_file_${USER}_0
$ vite paje.trace
</pre></tt>

<p>
Starting with StarPU 1.1, it is also possible to reproduce the performance that
we show in our articles on our machines, by installing simgrid, and then using
the simulation mode of StarPU using the performance models of our machines:
</p>
  <tt><pre>
$ ./configure --enable-simgrid
$ make -j 12
$ STARPU_PERF_MODEL_DIR=$PWD/tools/perfmodels/sampling STARPU_HOSTNAME=mirage STARPU_SCHED=dmdas ./examples/cholesky/cholesky_implicit -size $((960*40)) -nblocks 40
# size	ms	GFlops
38400	10216	1847.6</pre></tt>
<p>(MPI simulation is not supported yet)</p>
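<p>
The figures printed above can be cross-checked by hand. The sketch below assumes the usual leading-order operation count of n<sup>3</sup>/3 for a Cholesky factorization of an n x n matrix; whether the example program uses exactly this formula is an assumption, and both function names are hypothetical.
</p>

```c
#include <assert.h>

/* Approximate floating-point operation count of an n x n Cholesky
 * factorization (leading term n^3/3, an assumption of this sketch). */
double cholesky_flops(double n)
{
    assert(n >= 0.0);
    return n * n * n / 3.0;
}

/* Convert an operation count and an elapsed time in milliseconds
 * into GFlop/s. */
double gflops(double flops, double elapsed_ms)
{
    assert(elapsed_ms > 0.0);
    return flops / (elapsed_ms / 1000.0) / 1e9;
}
```

<p>
With a size of 960*40 = 38400 and 10216 ms, <tt>gflops(cholesky_flops(38400.0), 10216.0)</tt> comes out at about 1847.5 GFlop/s, which is consistent with the 1847.6 reported above given that the time is rounded to the millisecond.
</p>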

<div class="section" id="publications">
<h3>Publications</h3>
<p>
All StarPU related publications are also
listed <a href="./publications">here</a>
with the corresponding Bibtex entries.
</p>

<p>A good overview is available in
the following <a href="http://hal.archives-ouvertes.fr/inria-00467677">Research Report</a>.
</p>

<h4>General Presentations</h4> 
<ol>
<li>
<a name="agullo:hal-01283949"></a>Emmanuel Agullo, Olivier Aumage, Mathieu Faverge, Nathalie Furmento, Florent Pruvost, Marc Sergent,  and Samuel Thibault<br/>
<strong>Harnessing clusters of hybrid nodes with a sequential task-based programming model</strong><br/>
In <em>8th International Workshop on Parallel Matrix Algorithms and Applications</em>, July 2014<br/>
[<a href="https://hal.inria.fr/hal-01283949">WWW</a>]
[<a href="https://hal.inria.fr/hal-01283949/file/pmaa14.pdf">PDF</a>]
</li>
<li>
<a name="Aug11Thesis"></a>Cédric Augonnet<br/>
<strong>Scheduling Tasks over Multicore machines enhanced with Accelerators: a Runtime System's Perspective</strong><br/>
PhD thesis, Université Bordeaux 1, 351 cours de la Libération --- 33405 TALENCE cedex, December 2011<br/>
[<a href="http://tel.archives-ouvertes.fr/tel-00777154">WWW</a>]
</li>
<li>
<a name="AugThiNamWac11CCPE"></a>Cédric Augonnet, Samuel Thibault, Raymond Namyst,  and Pierre-André Wacrenier<br/>
<strong>StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures</strong><br/>
<em>Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par 2009</em>, 23:187-198, February 2011<br/>
[<a href="http://hal.inria.fr/inria-00550877">WWW</a>]
[doi:<a href="http://dx.doi.org/10.1002/cpe.1631">10.1002/cpe.1631</a>]
</li>
<li>
<a name="AugThiNamWac10RR7240"></a>Cédric Augonnet, Samuel Thibault,  and Raymond Namyst<br/>
<strong>StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines</strong><br/>
Technical Report 7240, INRIA, March 2010<br/>
[<a href="http://hal.inria.fr/inria-00467677">WWW</a>]
</li>
<li>
<a name="Aug09Renpar19"></a>Cédric Augonnet<br/>
<strong>StarPU: un support exécutif unifié pour les architectures multicoeurs hétérogènes</strong><br/>
In <em>19èmes Rencontres Francophones du Parallélisme</em>, Toulouse / France, September 2009<br/>
Note: Best Paper Award<br/>
[<a href="http://hal.inria.fr/inria-00411581">WWW</a>]
</li>
<li>
<a name="AugThiNamWac09Europar"></a>Cédric Augonnet, Samuel Thibault, Raymond Namyst,  and Pierre-André Wacrenier<br/>
<strong>StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures</strong><br/>
In <em>Proceedings of the 15th International Euro-Par Conference</em>, volume 5704 of <em>Lecture Notes in Computer Science</em>, Delft, The Netherlands, pages 863-874, August 2009<br/>
Springer<br/>
[<a href="http://hal.inria.fr/inria-00384363">WWW</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-642-03869-3_80">10.1007/978-3-642-03869-3_80</a>]
</li>
<li>
<a name="AugNam08HPPC"></a>Cédric Augonnet and Raymond Namyst<br/>
<strong>A unified runtime system for heterogeneous multicore architectures</strong><br/>
In <em>Proceedings of the International Euro-Par Workshops 2008, HPPC'08</em>, volume 5415 of <em>Lecture Notes in Computer Science</em>, Las Palmas de Gran Canaria, Spain, pages 174-183, August 2008<br/>
Springer<br/>
<strong>ISBN:</strong> 978-3-642-00954-9<br/>
[<a href="http://hal.inria.fr/inria-00326917">WWW</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-642-00955-6_22">10.1007/978-3-642-00955-6_22</a>]
</li>
<li>
<a name="Aug08Master"></a>Cédric Augonnet<br/>
<strong>Vers des supports d'exécution capables d'exploiter les machines multicoeurs hétérogènes</strong><br/>
Mémoire de DEA, Université Bordeaux 1, June 2008<br/>
[<a href="http://hal.inria.fr/inria-00289361">WWW</a>]
</li>
</ol>
<h4>On Composability</h4> 
<ol>
<li>
<a name="AH13Renpar"></a>Andra Hugo<br/>
<strong>Le problème de la composition parallèle : une approche supervisée</strong><br/>
In <em>21èmes Rencontres Francophones du Parallélisme (RenPar'21)</em>, Grenoble, France, January 2013<br/>
[<a href="http://hal.inria.fr/hal-00773610">WWW</a>]
</li>
<li>
<a name="hugo:hal-00824514"></a>Andra Hugo, Abdou Guermouche, Raymond Namyst,  and Pierre-André Wacrenier<br/>
<strong>Composing multiple StarPU applications over heterogeneous machines: a supervised approach</strong><br/>
In <em>Third International Workshop on Accelerators and Hybrid Exascale Systems</em>, Boston, USA, May 2013<br/>
[<a href="http://hal.inria.fr/hal-00824514">WWW</a>]
</li>
<li>
<a name="AH11Master"></a>Andra Hugo<br/>
<strong>Composabilité de codes parallèles sur architectures hétérogènes</strong><br/>
Mémoire de Master, Université Bordeaux 1, June 2011<br/>
[<a href="http://hal.inria.fr/inria-00619654/en/">WWW</a>]
</li>
</ol>
<h4>On Scheduling</h4> 
<ol>
<li>
<a name="agullo:hal-01223573"></a>Emmanuel Agullo, Olivier Beaumont, Lionel Eyraud-Dubois,  and Suraj Kumar<br/>
<strong>Are Static Schedules so Bad ? A Case Study on Cholesky Factorization</strong><br/>
In <em>IPDPS'16</em>, Proceedings of the 30th IEEE International Parallel &amp; Distributed Processing Symposium, IPDPS'16, Chicago, IL, United States, May 2016<br/>
IEEE<br/>
[<a href="https://hal.inria.fr/hal-01223573">WWW</a>]
[<a href="https://hal.inria.fr/hal-01223573/file/heteroprioCameraReady-ieeeCompatiable.pdf">PDF</a>]
</li>
<li>
<a name="beaumont:hal-01361992"></a>Olivier Beaumont, Terry Cojean, Lionel Eyraud-Dubois, Abdou Guermouche,  and Suraj Kumar<br/>
<strong>Scheduling of Linear Algebra Kernels on Multiple Heterogeneous Resources</strong><br/>
In <em>International Conference on High Performance Computing, Data, and Analytics (HiPC)</em>, Hyderabad, India, December 2016<br/>
[<a href="https://hal.inria.fr/hal-01361992">WWW</a>]
[<a href="https://hal.inria.fr/hal-01361992v2/document">PDF</a>]
</li>
<li>
<a name="garciapinto:hal-01353962"></a>Vinicius Garcia Pinto, Luka Stanisic, Arnaud Legrand, Lucas Mello Schnorr, Samuel Thibault,  and Vincent Danjean<br/>
<strong>Analyzing Dynamic Task-Based Applications on Hybrid Platforms: An Agile Scripting Approach</strong><br/>
In <em>3rd Workshop on Visual Performance Analysis (VPA)</em>, Salt Lake City, United States, November 2016<br/>
Note: Held in conjunction with SC16<br/>
[<a href="https://hal.inria.fr/hal-01353962">WWW</a>]
[<a href="https://hal.inria.fr/hal-01353962/file/VPA_2016_paper_3.pdf">PDF</a>]
</li>
<li>
<a name="JaBlHU2016a"></a>Johan Janzén, David Black-Schaffer,  and Andra Hugo<br/>
<strong>Partitioning GPUs for Improved Scalability</strong><br/>
In <em>IEEE 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)</em>, October 2016<br/>
[doi:<a href="http://dx.doi.org/10.1109/SBAC-PAD.2016.14">10.1109/SBAC-PAD.2016.14</a>]
</li>
<li>
<a name="beaumont:hal-01386174"></a>Olivier Beaumont, Lionel Eyraud-Dubois,  and Suraj Kumar<br/>
<strong>Approximation Proofs of a Fast and Efficient List Scheduling Algorithm for Task-Based Runtime Systems on Multicores and GPUs</strong><br/>
Note: Working paper or preprint, October 2016<br/>
[<a href="https://hal.inria.fr/hal-01386174">WWW</a>]
[<a href="https://hal.inria.fr/hal-01386174/file/heteroPrioApproxProofsRR.pdf">PDF</a>]
</li>
<li>
<a name="agullo:hal-01120507"></a>Emmanuel Agullo, Olivier Beaumont, Lionel Eyraud-Dubois, Julien Herrmann, Suraj Kumar, Loris Marchal,  and Samuel Thibault<br/>
<strong>Bridging the Gap between Performance and Bounds of Cholesky Factorization on Heterogeneous Platforms</strong><br/>
In <em>Heterogeneity in Computing Workshop 2015</em>, Hyderabad, India, May 2015<br/>
[<a href="https://hal.inria.fr/hal-01120507">WWW</a>]
</li>
<li>
<a name="sergent:hal-00978364"></a>Marc Sergent and Simon Archipoff<br/>
<strong>Modulariser les ordonnanceurs de tâches : une approche structurelle</strong><br/>
In <em>Compas'2014</em>, Neuchâtel, Switzerland, April 2014<br/>
[<a href="http://hal.inria.fr/hal-00978364">WWW</a>]
[<a href="http://hal.inria.fr/hal-00978364/PDF/ordonnanceurs_modulaires.pdf">PDF</a>]
</li>
</ol>
<h4>On The C Extensions</h4>
<ol>
<li>
<a name="LC13Report"></a>Ludovic Courtès<br/>
<strong>C Language Extensions for Hybrid CPU/GPU Programming with StarPU</strong><br/>
Research Report RR-8278, INRIA, April 2013<br/>
[<a href="http://hal.inria.fr/hal-00807033">WWW</a>]
[<a href="http://hal.inria.fr/hal-00807033/PDF/RR-8278.pdf">PDF</a>]
</li>
</ol>
<h4>On OpenMP Support on top of StarPU</h4>
<ol>
<li>
<a name="agullo:hal-01372022"></a>Emmanuel Agullo, Olivier Aumage, Berenger Bramas, Olivier Coulaud,  and Samuel Pitoiset<br/>
<strong>Bridging the gap between OpenMP 4.0 and native runtime systems for the fast multipole method</strong><br/>
Research Report RR-8953, Inria, March 2016<br/>
[<a href="https://hal.inria.fr/hal-01372022">WWW</a>]
[<a href="https://hal.inria.fr/hal-01372022/file/RR-8953.pdf">PDF</a>]
</li>
<li>
<a name="virouleau:hal-01081974"></a>Philippe Virouleau, Pierrick BRUNET, François Broquedis, Nathalie Furmento, Samuel Thibault, Olivier Aumage,  and Thierry Gautier<br/>
<strong>Evaluation of OpenMP Dependent Tasks with the KASTORS Benchmark Suite</strong><br/>
In <em>10th International Workshop on OpenMP, IWOMP2014</em>, Salvador, Brazil, pages 16-29, September 2014<br/>
Springer<br/>
[<a href="https://hal.inria.fr/hal-01081974">WWW</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-319-11454-5_2">10.1007/978-3-319-11454-5_2</a>]
</li>
</ol>
<h4>On MPI Support</h4>
<ol>
<li>
<a name="agullo:hal-01332774"></a>Emmanuel Agullo, Olivier Aumage, Mathieu Faverge, Nathalie Furmento, Florent Pruvost, Marc Sergent,  and Samuel Thibault<br/>
<strong>Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model</strong><br/>
Research Report RR-8927, Inria Bordeaux Sud-Ouest ; Bordeaux INP ; CNRS ; Université de Bordeaux ; CEA, June 2016<br/>
[<a href="https://hal.inria.fr/hal-01332774">WWW</a>]
[<a href="https://hal.inria.fr/hal-01332774/file/RR-8927.pdf">PDF</a>]
</li>
<li>
<a name="augonnet:hal-00992208"></a>Cédric Augonnet, Olivier Aumage, Nathalie Furmento, Samuel Thibault,  and Raymond Namyst<br/>
<strong>StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators</strong><br/>
Research Report RR-8538, INRIA, May 2014<br/>
[<a href="http://hal.inria.fr/hal-00992208">WWW</a>]
[<a href="http://hal.inria.fr/hal-00992208/PDF/RR-8538.pdf">PDF</a>]
</li>
<li>
<a name="AugAumFurNamThi2012EuroMPI"></a>Cédric Augonnet, Olivier Aumage, Nathalie Furmento, Raymond Namyst,  and Samuel Thibault<br/>
<strong>StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators</strong><br/>
In Siegfried Benkner, Jesper Larsson Träff,  and Jack Dongarra, editors, <em>EuroMPI 2012</em>, volume 7490 of <em>LNCS</em>, September 2012<br/>
Springer<br/>
Note: Poster Session<br/>
[<a href="http://hal.inria.fr/hal-00725477">WWW</a>]
</li>
</ol>
<h4>On Memory Control</h4>
<ol>
<li>
<a name="sergent:hal-01284004"></a>Marc Sergent, David Goudin, Samuel Thibault,  and Olivier Aumage<br/>
<strong>Controlling the Memory Subscription of Distributed Applications with a Task-Based Runtime System</strong><br/>
In <em>21st International Workshop on High-Level Parallel Programming Models and Supportive Environments</em>, Chicago, United States, May 2016<br/>
[<a href="https://hal.inria.fr/hal-01284004">WWW</a>]
[<a href="https://hal.inria.fr/hal-01284004/file/PID4127657.pdf">PDF</a>]
</li>
<li>
<a name="agullo:hal-01332774"></a>Emmanuel Agullo, Olivier Aumage, Mathieu Faverge, Nathalie Furmento, Florent Pruvost, Marc Sergent,  and Samuel Thibault<br/>
<strong>Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model</strong><br/>
Research Report RR-8927, Inria Bordeaux Sud-Ouest ; Bordeaux INP ; CNRS ; Université de Bordeaux ; CEA, June 2016<br/>
[<a href="https://hal.inria.fr/hal-01332774">WWW</a>]
[<a href="https://hal.inria.fr/hal-01332774/file/RR-8927.pdf">PDF</a>]
</li>
</ol>
<h4>On Data Transfer Management</h4>
<ol>
<li>
<a name="AugCleThiNam10ICPADS"></a>Cédric Augonnet, Jérôme Clet-Ortega, Samuel Thibault,  and Raymond Namyst<br/>
<strong>Data-Aware Task Scheduling on Multi-Accelerator based Platforms</strong><br/>
In <em>The 16th International Conference on Parallel and Distributed Systems (ICPADS)</em>, Shanghai, China, December 2010<br/>
[<a href="http://hal.inria.fr/inria-00523937">WWW</a>]
[doi:<a href="http://dx.doi.org/10.1109/ICPADS.2010.129">10.1109/ICPADS.2010.129</a>]
</li>
</ol>
<h4>On Performance Model Tuning</h4>
<ol>
<li>
<a name="AugThiNam09HPPC"></a>Cédric Augonnet, Samuel Thibault,  and Raymond Namyst<br/>
<strong>Automatic Calibration of Performance Models on Heterogeneous Multicore Architectures</strong><br/>
In <em>Proceedings of the International Euro-Par Workshops 2009, HPPC'09</em>, volume 6043 of <em>Lecture Notes in Computer Science</em>, Delft, The Netherlands, pages 56-65, August 2009<br/>
Springer<br/>
[<a href="http://hal.inria.fr/inria-00421333">WWW</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-642-14122-5_9">10.1007/978-3-642-14122-5_9</a>]
</li>
</ol>
<h4>On The Simulation Support through SimGrid</h4>
<ol>
<li>
<a name="stanisic:hal-01147997"></a>Luka Stanisic, Samuel Thibault, Arnaud Legrand, Brice Videau,  and Jean-François Méhaut<br/>
<strong>Faithful Performance Prediction of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures</strong><br/>
<em>Concurrency and Computation: Practice and Experience</em>, pp 16, May 2015<br/>
[<a href="https://hal.inria.fr/hal-01147997">WWW</a>]
[<a href="https://hal.inria.fr/hal-01147997/file/CCPE14_article.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1002/cpe">10.1002/cpe</a>]
</li>
<li>
<a name="stanisic:hal-01011633"></a>Luka Stanisic, Samuel Thibault, Arnaud Legrand, Brice Videau,  and Jean-François Méhaut<br/>
<strong>Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures</strong><br/>
In <em>Euro-par - 20th International Conference on Parallel Processing</em>, Porto, Portugal, August 2014<br/>
Springer-Verlag<br/>
[<a href="http://hal.inria.fr/hal-01011633">WWW</a>]
[<a href="http://hal.inria.fr/hal-01011633/PDF/StarPUSG_article.pdf">PDF</a>]
</li>
</ol>
<h4>On The Cell Support</h4>
<ol>
<li>
<a name="AugThiNamNij09Samos"></a>Cédric Augonnet, Samuel Thibault, Raymond Namyst,  and Maik Nijhuis<br/>
<strong>Exploiting the Cell/BE architecture with the StarPU unified runtime system</strong><br/>
In <em>SAMOS Workshop - International Workshop on Systems, Architectures, Modeling, and Simulation</em>, volume 5657 of <em>Lecture Notes in Computer Science</em>, Samos, Greece, July 2009<br/>
[<a href="http://hal.inria.fr/inria-00378705">WWW</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-642-03138-0_36">10.1007/978-3-642-03138-0_36</a>]
</li>
</ol>
<h4>On Applications</h4>
<ol>
<li>
<a name="agullo:hal-01387482"></a>Emmanuel Agullo, Bérenger Bramas, Olivier Coulaud, Martin Khannouz,  and Luka Stanisic<br/>
<strong>Task-based fast multipole method for clusters of multicore processors</strong><br/>
Research Report RR-8970, Inria Bordeaux Sud-Ouest, October 2016<br/>
[<a href="https://hal.inria.fr/hal-01387482">WWW</a>]
[<a href="https://hal.inria.fr/hal-01387482/file/report-8970.pdf">PDF</a>]
</li>
<li>
<a name="agullo:hal-01316982"></a>E Agullo, L Giraud, A Guermouche, S Nakov,  and Jean Roman<br/>
<strong>Task-based Conjugate Gradient: from multi-GPU towards heterogeneous architectures</strong><br/>
Research Report 8912, Inria Bordeaux Sud-Ouest, May 2016<br/>
[<a href="https://hal.inria.fr/hal-01316982">WWW</a>]
[<a href="https://hal.inria.fr/hal-01316982/file/RR-8912.pdf">PDF</a>]
</li>
<li>
<a name="rossignon:tel-01230876"></a>Corentin Rossignon<br/>
<strong>A fine grain model programming for parallelization of sparse linear solver</strong><br/>
PhD thesis, Université de Bordeaux, July 2015<br/>
[<a href="https://tel.archives-ouvertes.fr/tel-01230876">WWW</a>]
[<a href="https://tel.archives-ouvertes.fr/tel-01230876/file/ROSSIGNON_CORENTIN_2015.pdf">PDF</a>]
</li>
<li>
<a name="MaMiDuAuThiAoNa15"></a>Víctor Martínez, David Michéa, Fabrice Dupros, Olivier Aumage, Samuel Thibault, Hideo Aochi,  and Philippe Olivier Alexandre Navaux<br/>
<strong>Towards seismic wave modeling on heterogeneous many-core architectures using task-based runtime system</strong><br/>
In <em>27th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)</em>, Florianopolis, Brazil, October 2015<br/>
[<a href="https://hal.inria.fr/hal-01182746">WWW</a>]
[<a href="https://hal.inria.fr/hal-01182746/file/sbac2015_soumission.pdf">PDF</a>]
</li>
<li>
<a name="sylvain:hal-01005765"></a>Sylvain Henry, Alexandre Denis, Denis Barthou, Marie-Christine Counilh,  and Raymond Namyst<br/>
<strong>Toward OpenCL Automatic Multi-Device Support</strong><br/>
In Fernando Silva, Ines Dutra,  and Vitor Santos Costa, editors, <em>Euro-Par 2014</em>, Porto, Portugal, August 2014<br/>
Springer<br/>
[<a href="http://hal.inria.fr/hal-01005765">WWW</a>]
[<a href="http://hal.inria.fr/hal-01005765/PDF/final.pdf">PDF</a>]
</li>
<li>
<a name="lacoste:hal-00987094"></a>Xavier Lacoste, Mathieu Faverge, Pierre Ramet, Samuel Thibault,  and George Bosilca<br/>
<strong>Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes</strong><br/>
In <em>HCW'2014 workshop of IPDPS</em>, Phoenix, United States, May 2014<br/>
IEEE<br/>
Note: RR-8446<br/>
[<a href="http://hal.inria.fr/hal-00987094">WWW</a>]
[<a href="http://hal.inria.fr/hal-00987094/PDF/sparsegpus.pdf">PDF</a>]
</li>
<li>
<a name="lacoste:hal-00925017"></a>Xavier Lacoste, Mathieu Faverge, Pierre Ramet, Samuel Thibault,  and George Bosilca<br/>
<strong>Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes</strong><br/>
Research Report RR-8446, INRIA, January 2014<br/>
[<a href="http://hal.inria.fr/hal-00925017">WWW</a>]
[<a href="http://hal.inria.fr/hal-00925017/PDF/RR-8446.pdf">PDF</a>]
</li>
<li>
<a name="sergent:hal-00978602"></a>Emmanuel Agullo, Olivier Aumage, Mathieu Faverge, Nathalie Furmento, Florent Pruvost, Marc Sergent,  and Samuel Thibault<br/>
<strong>Overview of Distributed Linear Algebra on Hybrid Nodes over the StarPU Runtime</strong><br/>
SIAM Conference on Parallel Processing for Scientific Computing, February 2014<br/>
[<a href="http://hal.inria.fr/hal-00978602">WWW</a>]
[<a href="http://hal.inria.fr/hal-00978602/PDF/siampp14.pdf">PDF</a>]
</li>
<li>
<a name="Bor13Thesis"></a>Cyril Bordage<br/>
<strong>Ordonnancement dynamique, adapté aux architectures hétérogènes, de la méthode multipôle pour les équations de Maxwell, en électromagnétisme</strong><br/>
PhD thesis, Université Bordeaux 1, 351 cours de la Libération, 33405 Talence Cedex, December 2013<br/>
</li>
<li>
<a name="Hen13Thesis"></a>Sylvain Henry<br/>
<strong>Modèles de programmation et supports exécutifs pour architectures hétérogènes</strong><br/>
PhD thesis, Université Bordeaux 1, 351 cours de la Libération, 33405 Talence Cedex, November 2013<br/>
[<a href="http://tel.archives-ouvertes.fr/tel-00948309">WWW</a>]
</li>
<li>
<a name="hen13fhpc"></a>Sylvain Henry<br/>
<strong>ViperVM: a Runtime System for Parallel Functional High-Performance Computing on Heterogeneous Architectures</strong><br/>
In <em>2nd Workshop on Functional High-Performance Computing (FHPC'13)</em>, Boston, United States, September 2013<br/>
[<a href="http://hal.inria.fr/hal-00851122">WWW</a>]
[<a href="http://hal.inria.fr/hal-00851122/PDF/fhpc13.pdf">PDF</a>]
</li>
<li>
<a name="odajima:hal-00920915"></a>Tetsuya Odajima, Taisuke Boku, Mitsuhisa Sato, Toshihiro Hanawa, Yuetsu Kodama, Raymond Namyst, Samuel Thibault,  and Olivier Aumage<br/>
<strong>Adaptive Task Size Control on High Level Programming for GPU/CPU Work Sharing</strong><br/>
In <em>The 2013 International Symposium on Advances of Distributed and Parallel Computing (ADPC 2013)</em>, Vietri sul Mare, Italy, December 2013<br/>
[<a href="http://hal.inria.fr/hal-00920915">WWW</a>]
[<a href="http://hal.inria.fr/hal-00920915/PDF/ADPC2013-117.pdf">PDF</a>]
</li>
<li>
<a name="ohshima:hal-00926144"></a>Satoshi Ohshima, Satoshi Katagiri, Kengo Nakajima, Samuel Thibault,  and Raymond Namyst<br/>
<strong>Implementation of FEM Application on GPU with StarPU</strong><br/>
In <em>SIAM CSE13 - SIAM Conference on Computational Science and Engineering 2013</em>, Boston, United States, February 2013<br/>
SIAM<br/>
[<a href="http://hal.inria.fr/hal-00926144">WWW</a>]
</li>
<li>
<a name="Ros13Renpar"></a>Corentin Rossignon<br/>
<strong>Optimisation du produit matrice-vecteur creux sur architecture GPU pour un simulateur de reservoir</strong><br/>
In <em>21èmes Rencontres Francophones du Parallélisme (RenPar'21)</em>, Grenoble, France, January 2013<br/>
[<a href="http://hal.inria.fr/hal-00773571">WWW</a>]
</li>
<li>
<a name="rossignon:hal-00858350"></a>Corentin Rossignon, Pascal Hénon, Olivier Aumage,  and Samuel Thibault<br/>
<strong>A NUMA-aware fine grain parallelization framework for multi-core architecture</strong><br/>
In <em>PDSEC - 14th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing - 2013</em>, Boston, United States, May 2013<br/>
[<a href="http://hal.inria.fr/hal-00858350">WWW</a>]
[<a href="http://hal.inria.fr/hal-00858350/PDF/taggre_pdsec_2013.pdf">PDF</a>]
</li>
<li>
<a name="HenDenBar2012TSI"></a>Sylvain Henry, Alexandre Denis,  and Denis Barthou<br/>
<strong>Programmation unifiée multi-accélérateur OpenCL</strong><br/>
<em>Technique et Science Informatiques</em>, (8-9-10):1233-1249, 2012<br/>
[<a href="http://hal.inria.fr/hal-00772742">WWW</a>]
</li>
<li>
<a name="MahManAugThi12TSI"></a>Sidi Ahmed Mahmoudi, Pierre Manneback, Cédric Augonnet,  and Samuel Thibault<br/>
<strong>Traitements d'Images sur Architectures Parallèles et Hétérogènes</strong><br/>
<em>Technique et Science Informatiques</em>, 2012<br/>
[<a href="http://hal.inria.fr/hal-00714858/">WWW</a>]
</li>
<li>
<a name="BenkBajMarSanNamThiEuroPar2012"></a>Siegfried Benkner, Enes Bajrovic, Erich Marth, Martin Sandrieser, Raymond Namyst,  and Samuel Thibault<br/>
<strong>High-Level Support for Pipeline Parallelism on Many-Core Architectures</strong><br/>
In <em>Europar - International European Conference on Parallel and Distributed Computing - 2012</em>, Rhodes Island, Greece, August 2012<br/>
[<a href="http://hal.inria.fr/hal-00697020">WWW</a>]
[<a href="http://hal.inria.fr/hal-00697020/PDF/europar2012-submitted.pdf">PDF</a>]
</li>
<li>
<a name="kessler:hal-00776610"></a>Christoph Kessler, Usman Dastgeer, Samuel Thibault, Raymond Namyst, Andrew Richards, Uwe Dolinsky, Siegfried Benkner, Jesper Larsson Träff,  and Sabri Pllana<br/>
<strong>Programmability and Performance Portability Aspects of Heterogeneous Multi-/Manycore Systems</strong><br/>
In <em>Design, Automation and Test in Europe (DATE)</em>, Dresden, Germany, March 2012<br/>
[<a href="http://hal.inria.fr/hal-00776610">WWW</a>]
[<a href="http://hal.inria.fr/hal-00776610/PDF/date12-paper.pdf">PDF</a>]
</li>
<li>
<a name="BenPllTraTsiDolAugBacKesMolOsi11IEEEMicro"></a>Siegfried Benkner, Sabri Pllana, Jesper Larsson Träff, Philippas Tsigas, Uwe Dolinsky, Cédric Augonnet, Beverly Bachmayer, Christoph Kessler, David Moloney,  and Vitaly Osipov<br/>
<strong>PEPPHER: Efficient and Productive Usage of Hybrid Computing Systems</strong><br/>
<em>IEEE Micro</em>, 31(5):28-41, September 2011<br/>
<strong>ISSN:</strong> 0272-1732<br/>
[<a href="http://hal.inria.fr/hal-00648480">WWW</a>]
[doi:<a href="http://dx.doi.org/10.1109/MM.2011.67">10.1109/MM.2011.67</a>]
</li>
<li>
<a name="AguAugDonFavLanLtaTomAICCSA11"></a>Emmanuel Agullo, Cédric Augonnet, Jack Dongarra, Mathieu Faverge, Julien Langou, Hatem Ltaief,  and Stanimire Tomov<br/>
<strong>LU factorization for accelerator-based systems</strong><br/>
In <em>9th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA 11)</em>, Sharm El-Sheikh, Egypt, June 2011<br/>
[<a href="http://hal.inria.fr/hal-00654193">WWW</a>]
</li>
<li>
<a name="AguAugDonFavLtaThiTom11IPDPS"></a>Emmanuel Agullo, Cédric Augonnet, Jack Dongarra, Mathieu Faverge, Hatem Ltaief, Samuel Thibault,  and Stanimire Tomov<br/>
<strong>QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators</strong><br/>
In <em>25th IEEE International Parallel &amp; Distributed Processing Symposium (IEEE IPDPS 2011)</em>, Anchorage, Alaska, USA, May 2011<br/>
[<a href="http://hal.inria.fr/inria-00547614">WWW</a>]
[doi:<a href="http://dx.doi.org/10.1109/IPDPS.2011.90">10.1109/IPDPS.2011.90</a>]
</li>
<li>
<a name="DasKesThi11ParCo"></a>Usman Dastgeer, Christoph Kessler,  and Samuel Thibault<br/>
<strong>Flexible runtime support for efficient skeleton programming on hybrid systems</strong><br/>
In <em>Proceedings of the International Conference on Parallel Computing (ParCo), Applications, Tools and Techniques on the Road to Exascale Computing</em>, volume 22 of <em>Advances in Parallel Computing</em>, Gent, Belgium, pages 159-166, August 2011<br/>
[<a href="http://hal.inria.fr/inria-00606200/">WWW</a>]
</li>
<li>
<a name="Hen11Renpar"></a>Sylvain Henry<br/>
<strong>Programmation multi-accélérateurs unifiée en OpenCL</strong><br/>
In <em>20èmes Rencontres Francophones du Parallélisme (RenPar'20)</em>, Saint Malo, France, May 2011<br/>
[<a href="http://hal.archives-ouvertes.fr/hal-00643257">WWW</a>]
</li>
<li>
<a name="MahManAugThi11Renpar20"></a>Sidi Ahmed Mahmoudi, Pierre Manneback, Cédric Augonnet,  and Samuel Thibault<br/>
<strong>Détection optimale des coins et contours dans des bases d'images volumineuses sur architectures multicoeurs hétérogènes</strong><br/>
In <em>20èmes Rencontres Francophones du Parallélisme</em>, Saint-Malo, France, May 2011<br/>
[<a href="http://hal.inria.fr/inria-00606195">WWW</a>]
</li>
<li>
<a name="AguAugDonLtaNamThiTomGPUgems"></a>Emmanuel Agullo, Cédric Augonnet, Jack Dongarra, Hatem Ltaief, Raymond Namyst, Samuel Thibault,  and Stanimire Tomov<br/>
<strong>A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs</strong><br/>
In Wen-mei W. Hwu, editor, <em>GPU Computing Gems</em>, volume 2<br/>
Morgan Kaufmann, September 2010<br/>
[<a href="http://hal.inria.fr/inria-00547847">WWW</a>]
</li>
<li>
<a name="AguAugDonLtaNamRomThiTom10SAAHPC"></a>Emmanuel Agullo, Cédric Augonnet, Jack Dongarra, Hatem Ltaief, Raymond Namyst, Jean Roman, Samuel Thibault,  and Stanimire Tomov<br/>
<strong>Dynamically scheduled Cholesky factorization on multicore architectures with GPU accelerators</strong><br/>
In <em>Symposium on Application Accelerators in High Performance Computing (SAAHPC)</em>, Knoxville, USA, July 2010<br/>
[<a href="http://hal.inria.fr/inria-00547616">WWW</a>]
</li>
</ol>

<div class="section bot">
<p class="updated">
  Last updated on 2016/04/13.
</p>
</div>

</body>
</html>