<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<HEAD>
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<TITLE>StarPU</TITLE>
<link rel="stylesheet" type="text/css" href="style.css" />
<link rel="Shortcut icon" href="http://www.inria.fr/extension/site_inria/design/site_inria/images/favicon.ico" type="image/x-icon" />
</HEAD>

<body>

<div class="title">
<h1><a href="./">StarPU</a></h1>
<h2>A Unified Runtime System for Heterogeneous Multicore Architectures</h2>
</div>
<div class="menu">
<a href="https://team.inria.fr/storm/">STORM TEAM</a> |
&nbsp; &nbsp; &nbsp;
|
<a href="#overview">Overview</a> |
<a href="#news">News</a> |
<a href="#contact">Contact</a> |
<a href="people/">People</a> |
<a href="#features">Features</a> |
<a href="#software">Software</a> |
<a href="#tryit">Try it!</a> |
<a href="help/">Help</a> |
<a href="#publications">Publications</a> |
<a href="internships/">Jobs/Interns</a> |
<a href="files/">Download</a> |
<a href="market/">Market</a> |
<a href="tutorials/">Tutorials</a> |
<a href="http://starpu.gforge.inria.fr/testing/morse/master/">Benchmarks</a> |
<a href="https://gforge.inria.fr/plugins/mediawiki/wiki/starpu/index.php/Main_Page">Intranet</a>
</div>
<div class="section" id="overview">
<h3>Overview</h3>
  <p>
<span class="important">StarPU is a task programming library for hybrid architectures</span>
<ol>
<li><b>The application provides algorithms and constraints</b>
    <ul>
    <li>CPU/GPU implementations of tasks</li>
    <li>A graph of tasks, using either StarPU's high-level <b>GCC plugin</b> pragmas, StarPU's rich <b>C/C++ API</b>, or <b>OpenMP pragmas</b>.</li>
    </ul>
<br>
</li>
<li><b>StarPU handles run-time concerns</b>
    <ul>
    <li>Task dependencies</li>
    <li>Optimized heterogeneous scheduling</li>
    <li>Optimized data transfers and replication between main memory and discrete memories</li>
    <li>Optimized cluster communications</li>
    </ul>
</li>
</ol>
</p>
<p>
<span class="important">Rather than handling low-level issues, <b>programmers can concentrate on algorithmic concerns!</b></span>
</p>

<p>
<span class="note">The StarPU documentation is available in
<a href="./doc/starpu.pdf">PDF</a> and in <a href="./doc/html/">HTML</a>.</span>
Please note that these documents are up-to-date with the latest release of
StarPU.
</p>
<p>
The latest documentation in <a href="./testing/master/doc/starpu.pdf">PDF</a>
and <a href="./testing/master/doc/html">HTML</a> is updated every day, but covers
the latest developments, which may not be available in the latest release.
</p>
</div>

<div class="section emphasize newslist" id="news">
<h3>News</h3>
<p>
January 2020 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      1.2.9 release of StarPU is now available!</b></a>
	The 1.2 release series notably brings out-of-core support, MIC Xeon
	Phi support, OpenMP runtime support, and a new internal
	communication system for MPI.
</p>
<p>
  November
  2019 <b>&raquo;&nbsp;</b>
  A <a href="/tutorials/2019-11-HPNS-Inria/">StarPU tutorial</a> will
  be given as part of the Inria autumn school "High Performance
  Numerical Simulation".
</p>
<p>
October
2019 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      release 1.3.3 of StarPU is now
      available!</b></a> The 1.3 release brings, among other
      functionalities, MPI master-slave support, a tool to replay
      execution through SimGrid, an HDF5 implementation of
      out-of-core support, a new implementation of StarPU-MPI on top of
      NewMadeleine, implicit support for asynchronous partition
      planning, a resource management module to share processor cores
      and accelerator devices with other parallel runtime systems, ...
</p>
<p>
June
2019 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      release 1.3.2 of StarPU is now
      available!</b></a> The 1.3 release brings, among other
      functionalities, MPI master-slave support, a tool to replay
      execution through SimGrid, an HDF5 implementation of
      out-of-core support, a new implementation of StarPU-MPI on top of
      NewMadeleine, implicit support for asynchronous partition
      planning, a resource management module to share processor cores
      and accelerator devices with other parallel runtime systems, ...
</p>
<p>
May 2019 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      v1.1.8 release of StarPU is now available!</b></a> This release notably brings the concept of
      scheduling contexts, which make it possible to separate computation
      resources. This is really intended to be the last release for the
      branch 1.1.
</p>
<p>
April
2019 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      release 1.3.1 of StarPU is now
      available!</b></a> The 1.3 release brings, among other
      functionalities, MPI master-slave support, a tool to replay
      execution through SimGrid, an HDF5 implementation of
      out-of-core support, a new implementation of StarPU-MPI on top of
      NewMadeleine, implicit support for asynchronous partition
      planning, a resource management module to share processor cores
      and accelerator devices with other parallel runtime systems, ...
</p>
<p>
March
2019 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      release 1.3.0 of StarPU is now
      available!</b></a> The 1.3 release brings, among other
      functionalities, MPI master-slave support, a tool to replay
      execution through SimGrid, an HDF5 implementation of
      out-of-core support, a new implementation of StarPU-MPI on top of
      NewMadeleine, implicit support for asynchronous partition
      planning, a resource management module to share processor cores
      and accelerator devices with other parallel runtime systems, ...
</p>
<p>
February 2019 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      1.2.8 release of StarPU is now available!</b></a>
	The 1.2 release series notably brings out-of-core support, MIC Xeon
	Phi support, OpenMP runtime support, and a new internal
	communication system for MPI.
	(The release 1.2.7 is broken and should not be used)
</p>
</div>

<div class="section emphasizebot" style="text-align: right; font-style: italic;">
Get the latest StarPU news by subscribing to the <a href="http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/starpu-announce">starpu-announce mailing list</a>.
See also the full <a href="news/">news</a>.
</div>

<div class="section" id="video">
<h3>Video Conference</h3>
<p>
A video recording (26') of a <a href="http://www.x.org/wiki/Events/XDC2014/XDC2014ThibaultStarPU/">presentation at the XDC2014 conference</a> gives an overview of StarPU
(<a href="http://www.x.org/wiki/Events/XDC2014/XDC2014ThibaultStarPU/xdc_starpu.pdf">slides</a>):
</p>
<center>
<iframe width="420" height="315" src="https://www.youtube.com/embed/frsWSqb8UJU" frameborder="0" allowfullscreen></iframe>
</center>
</div>

<div class="section" id="tutorial">
<h3>Tutorial material</h3>
<p>
The latest tutorial material for StarPU is composed of two parts:
<ul>
<li><a href="http://starpu.gforge.inria.fr/tutorials/2016-06-PATC/slides/01_introducing_starpu.pdf">Introducing StarPU</a></li>
<li><a href="http://starpu.gforge.inria.fr/tutorials/2016-06-PATC/slides/02_mastering_starpu.pdf">Mastering StarPU</a></li>
</ul>
</p>
</div>

<div class="section" id="slides">
<h3>Set of slides</h3>
<p>
A <a href="slides.pdf">set of slides</a> is also available to get an overview of
StarPU.
</p>
</div>

<div class="section" id="contact">
<h3>Contact</h3>
<p>For any questions regarding StarPU, please contact the StarPU developers mailing list.</p>
<pre>
<a href="mailto:starpu-devel@lists.gforge.inria.fr?subject=StarPU">starpu-devel@lists.gforge.inria.fr</a>
</pre>
<p>Details of the <a href="people/">StarPU team people</a> are also available.</p>
</div>

<div class="section" id="features">
<h3>Features</h3>
<h4>Portability</h4>
  <p>
Portability is obtained by means of a unified abstraction of the machine.
StarPU offers a unified offloadable task abstraction named <em>codelet</em>. Rather
than rewriting the entire code, programmers can encapsulate existing functions
within codelets. If a codelet can run on heterogeneous architectures, <b>it
is possible to specify one function for each architecture</b> (e.g. one function
for CUDA and one function for CPUs). StarPU takes care of scheduling and
executing those codelets as efficiently as possible over the entire machine, including
multiple GPUs.
One can even specify <b>several functions for each architecture</b> (new in
v1.0) as well as
<b>parallel implementations</b> (e.g. in OpenMP), and StarPU will
automatically determine which version is best for each input size (new in v0.9).
StarPU can execute such parallel tasks concurrently, e.g. one per socket, provided that the
task implementations support it (which is the case for MKL, but unfortunately
most often not for OpenMP).
  </p>
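<p>
As a rough illustration of the codelet idea, the following C sketch bundles one
implementation per kind of processing unit and runs the one matching the unit a
task lands on. All names (<tt>sketch_codelet</tt>, <tt>codelet_run</tt>, ...)
are hypothetical and are not StarPU's API, and the "GPU" function is a plain C
stand-in for a real kernel.
</p>

```c
#include <assert.h>
#include <stddef.h>

/* Kinds of processing unit a task may run on (illustrative only). */
enum unit_kind { UNIT_CPU, UNIT_CUDA, UNIT_KIND_COUNT };

/* Signature shared by every implementation of the same task. */
typedef void (*kernel_fn)(float *v, size_t n, float factor);

/* Hypothetical codelet: one optional implementation per unit kind.
 * This is NOT StarPU's codelet structure, only the idea behind it. */
struct sketch_codelet {
    kernel_fn impl[UNIT_KIND_COUNT];
};

/* Plain CPU implementation: scale a vector in place. */
static void scal_cpu(float *v, size_t n, float factor)
{
    for (size_t i = 0; i < n; i++)
        v[i] *= factor;
}

/* Stand-in for a GPU implementation; a real one would launch a kernel. */
static void scal_gpu_standin(float *v, size_t n, float factor)
{
    scal_cpu(v, n, factor);
}

/* Run the codelet on a unit of the given kind.
 * Returns 0 on success, -1 when there is no implementation for that kind. */
static int codelet_run(const struct sketch_codelet *cl, enum unit_kind kind,
                       float *v, size_t n, float factor)
{
    assert(cl != NULL);
    kernel_fn fn = cl->impl[kind];
    if (fn == NULL)
        return -1;
    fn(v, n, factor);
    return 0;
}
```

<p>
A codelet initialized with both <tt>scal_cpu</tt> and
<tt>scal_gpu_standin</tt> can run on either kind of unit, whereas one that only
provides <tt>scal_cpu</tt> is rejected for <tt>UNIT_CUDA</tt>, so a scheduler
would only ever consider CPU workers for it.
</p>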

<h4>Genericity</h4>
  <p>
The StarPU programming interface is very generic. For instance, various data
structures are supported in mainline StarPU (vectors, dense matrices, CSR/BCSR/COO sparse matrices, ...),
but application-specific data structures can also be supported, provided that
the application describes how data is to be transferred (e.g. a series of
contiguous blocks). This was used, for instance, for hierarchically-compressed
matrices (h-matrices).
  </p>

<h4>Data transfers</h4>
  <p>
To relieve programmers from the burden of explicit data transfers, a high-level
data management library enforces memory coherency over the machine: before a
codelet starts (e.g. on an accelerator), all its <b>data are automatically made
available on the compute resource</b>. Data are also kept on e.g. GPUs as long as
they are needed for further tasks. When a device runs out of memory, StarPU uses
an LRU strategy to <b>evict unused data</b>. StarPU also takes care of <b>automatically
prefetching</b> data, which makes it possible to <b>overlap data transfers with computations</b>
(including <b>GPU-GPU direct transfers</b>) to get the most out of the architecture.
  </p>
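<p>
The LRU eviction mentioned above can be pictured with the following minimal C
sketch of a fixed-capacity device cache. It is only an illustration under
simplifying assumptions: the capacity <tt>DEVICE_SLOTS</tt>, the integer
handles and the function names are made up for the example, all blocks have the
same size, and data still in use by a running task is not modelled.
</p>

```c
#include <assert.h>
#include <stddef.h>

#define DEVICE_SLOTS 3   /* illustrative device capacity, in data blocks */

/* One cached data block on the device. */
struct slot {
    int handle;              /* identifier of the cached data, -1 when empty */
    unsigned long last_use;  /* logical time of the most recent access */
};

struct device_cache {
    struct slot slots[DEVICE_SLOTS];
    unsigned long clock;     /* incremented on every access */
};

static void cache_init(struct device_cache *c)
{
    assert(c != NULL);
    c->clock = 0;
    for (size_t i = 0; i < DEVICE_SLOTS; i++) {
        c->slots[i].handle = -1;
        c->slots[i].last_use = 0;
    }
}

/* Make `handle` resident on the device.
 * Returns the handle that was evicted to make room, or -1 if none was. */
static int cache_access(struct device_cache *c, int handle)
{
    size_t victim = 0;
    assert(c != NULL && handle >= 0);
    c->clock++;
    for (size_t i = 0; i < DEVICE_SLOTS; i++) {
        if (c->slots[i].handle == handle) {   /* hit: refresh its timestamp */
            c->slots[i].last_use = c->clock;
            return -1;
        }
    }
    /* Miss: take an empty slot if any, else the least recently used one. */
    for (size_t i = 0; i < DEVICE_SLOTS; i++) {
        if (c->slots[i].handle == -1) {
            victim = i;
            break;
        }
        if (c->slots[i].last_use < c->slots[victim].last_use)
            victim = i;
    }
    int evicted = c->slots[victim].handle;
    c->slots[victim].handle = handle;
    c->slots[victim].last_use = c->clock;
    return evicted;
}
```

<p>
With three slots, accessing blocks 1, 2, 3, then 1 again, then 4 evicts block 2,
because block 1 was refreshed by its second access.
</p>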

<h4>Dependencies</h4>
  <p>
Dependencies between tasks can be given in several ways, to provide the
programmer with the most flexibility:
  <ul>
    <li><b>implicitly</b> from RAW, WAW, and WAR data dependencies,</li>
    <li>explicitly through <b>tags</b> which act as rendez-vous points between
    tasks (thus including tasks which have not been created yet),</li>
    <li><b>explicitly</b> between pairs of tasks.</li>
  </ul>
  </p>
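<p>
To make the implicit case concrete, here is a small C sketch of the rule behind
RAW, WAR and WAW dependencies: two tasks submitted one after the other on the
same piece of data must be ordered unless both only read it. The enum and
function names are illustrative and are not StarPU identifiers.
</p>

```c
#include <assert.h>
#include <stdbool.h>

/* Access modes a task can declare on a piece of data (illustrative names). */
enum access_mode { ACCESS_R, ACCESS_W, ACCESS_RW };

static bool mode_reads(enum access_mode m)
{
    return m == ACCESS_R || m == ACCESS_RW;
}

static bool mode_writes(enum access_mode m)
{
    return m == ACCESS_W || m == ACCESS_RW;
}

/* Does the later task have to wait for the earlier one, given how each of
 * them accesses the same data? */
static bool needs_dependency(enum access_mode earlier, enum access_mode later)
{
    bool raw = mode_writes(earlier) && mode_reads(later);   /* read after write  */
    bool war = mode_reads(earlier) && mode_writes(later);   /* write after read  */
    bool waw = mode_writes(earlier) && mode_writes(later);  /* write after write */
    return raw || war || waw;
}
```

<p>
Two readers of the same data may therefore run concurrently, while any pair
involving a writer is serialized in submission order.
</p>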
  <p>
  These dependencies are computed in a completely decentralized way, and can be
  introduced completely dynamically as tasks get submitted by the application
  while tasks previously submitted are being executed.
  </p>
  <p>
StarPU also supports an OpenMP-like <a href="doc/html/DataManagement.html#DataReduction">reduction</a> access mode (new in v0.9).
  </p>
  <p>
It also supports a <a href="doc/html/DataManagement.html#DataCommute">commute</a> access mode to allow data access commutativity (new in v1.2).
  </p>

  <p>
It also supports transparent dependency tracking between hierarchical subpieces of data
through asynchronous partitioning (new in v1.3).
  </p>

<h4>Heterogeneous Scheduling</h4>
  <p>
StarPU obtains
portable performance by efficiently (and easily) using all computing resources
at the same time. StarPU also takes advantage of the <b>heterogeneous</b> nature of a
machine, for instance by using scheduling strategies based on auto-tuned
performance models. These determine the relative performance achieved
by the different processing units for the various kinds of task, and thus
make it possible to <b>automatically let processing units execute the tasks they are the best for</b>.
Various strategies and variants are available. Some of them are centralized, but
most of them are <b>completely distributed</b>. Examples include <i>dmdas</i> (a data-locality-aware MCT strategy,
thus similar to HEFT, but which starts executing tasks before the whole task graph is
submitted, thus allowing dynamic task submission and a decentralized scheduler,
as well as an energy-optimizing extension), <i>eager</i> (dumb centralized
queue), <i>lws</i> (decentralized locality-aware work-stealing), ...
The overhead per task is typically on the order of
a microsecond. Tasks should thus be a few orders of magnitude
bigger, such as 100 microseconds or 1 millisecond, to make the overhead
negligible.
  </p>
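<p>
The core of an MCT (minimum completion time) decision can be sketched in a few
lines of C: for each worker, add the time at which it becomes free, an estimated
data transfer time, and the predicted run time of the task, then keep the
smallest sum. This is only a sketch of the principle, not the <i>dmdas</i>
implementation; the <tt>worker_estimate</tt> structure and its fields are
assumptions made for the example, and in StarPU the predictions would come from
the auto-tuned performance models.
</p>

```c
#include <assert.h>
#include <stddef.h>

/* What the scheduler is assumed to know about one worker for a given task. */
struct worker_estimate {
    double ready_at;   /* time at which the worker becomes free */
    double transfer;   /* estimated time to bring missing data to the worker */
    double compute;    /* predicted run time; negative = cannot run the task */
};

/* Return the index of the worker with the minimum estimated completion time,
 * or n if no worker can run the task. Ties go to the lowest index. */
static size_t mct_pick(const struct worker_estimate *w, size_t n)
{
    size_t best = n;
    double best_end = 0.0;
    assert(w != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (w[i].compute < 0.0)
            continue;                      /* no implementation for this worker */
        double end = w[i].ready_at + w[i].transfer + w[i].compute;
        if (best == n || end < best_end) {
            best = i;
            best_end = end;
        }
    }
    return best;
}
```

<p>
A GPU that is busy for a while and needs a transfer can still win if its kernel
is fast enough; once its queue grows, the same task goes to an idle CPU core
instead. The transfer term is what makes the choice data-locality-aware.
</p>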

<h4>Clusters</h4>
  <p>
To deal with clusters, StarPU can nicely integrate with <a
	href="doc/html/MPISupport.html">MPI</a>, through explicit or implicit
support, according to the application's preference.

    <ul>
        <li>Explicit network communication requests can be emitted, which will
then be <b>automatically combined and overlapped</b> with the intra-node data
transfers and computation.</li>
        <li>The application can also just provide the whole task graph and a
data distribution over MPI nodes, and StarPU will automatically determine which
MPI node should execute which task, and <b>automatically generate all required
MPI communications</b> accordingly (new in v0.9). We have obtained excellent
scaling on a 256-node cluster with GPUs, but have not yet had the opportunity
to test on a larger cluster. We have however measured that with naive task
submission, it should scale to a thousand nodes, and with pruning-tuned task
submission, it should scale to about a <b>million nodes</b>.</li>
        <li>Starting with v1.3, the application can also just provide the
whole task graph, and let StarPU decide the data distribution and task
distribution, thanks to a master-slave mechanism. This will however by nature
have a more limited scalability than the fully distributed paradigm mentioned
above.</li>
    </ul>
  </p>
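<p>
How a data distribution can drive task placement and communications may be
sketched as follows. The example assumes a 2D block-cyclic distribution of tiles
over a <tt>p</tt> x <tt>q</tt> grid of MPI ranks and an "owner computes" rule
(a task runs where the tile it writes lives); both are assumptions chosen for
illustration, and the function names are hypothetical.
</p>

```c
#include <assert.h>

/* Rank owning tile (m, n) under a 2D block-cyclic distribution
 * over a p x q grid of MPI ranks. */
static int tile_owner(int m, int n, int p, int q)
{
    assert(p > 0 && q > 0 && m >= 0 && n >= 0);
    return (m % p) * q + (n % q);
}

/* Owner-computes placement: the task runs on the rank owning the tile
 * it writes to. */
static int task_rank(int out_m, int out_n, int p, int q)
{
    return tile_owner(out_m, out_n, p, q);
}

/* An input tile has to be sent over the network whenever it lives on a
 * different rank than the one executing the task. Returns 1 or 0. */
static int input_needs_transfer(int in_m, int in_n,
                                int out_m, int out_n, int p, int q)
{
    return tile_owner(in_m, in_n, p, q) != task_rank(out_m, out_n, p, q);
}
```

<p>
On a 2 x 2 grid, a task updating tile (3,2) runs on rank 2. Its input tile (3,0)
is also on rank 2 and needs no communication, while input tile (2,0) lives on
rank 0 and has to be transferred.
</p>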

<h4>Out of core</h4>
  <p>
When memory is not big enough for the working set, one may have to resort to
using disks. StarPU makes this seamless thanks to its <a href="doc/html/OutOfCore.html">out of core support</a> (new in v1.2).
StarPU will <b>automatically evict</b> data from the main memory in advance, and
<b>prefetch back</b> required data before it is needed for tasks.
  </p>

  <!--
<h4>Extensions to the C Language</h4>
<p>
  StarPU comes with a GCC plug-in
  that <a href="doc/html/cExtensions.html">extends the C programming
  language</a> with pragmas and attributes that make it easy
  to <b>annotate a sequential C program to turn it into a parallel
  StarPU program</b> (new in v1.0).
</p>
  -->

<h4>Fortran interface</h4>
<p>
  StarPU comes with native Fortran bindings and examples.
</p>

<h4>OpenMP 4-compatible interface</h4>
<p>
  <a href="http://kstar.gforge.inria.fr/">K'Star</a> provides an OpenMP
  4-compatible interface on top of StarPU. This makes it possible to simply rebuild OpenMP
  applications with the K'Star source-to-source compiler, then build them with the
  usual compiler, and the result will use the StarPU runtime.
</p>
<p>
  K'Star also provides some extensions to the OpenMP 4 standard, to let the
  StarPU runtime perform online optimizations.
</p>

<h4>OpenCL-compatible interface</h4>
<p>
  StarPU provides an <a href="doc/html/SOCLOpenclExtensions.html">OpenCL-compatible interface, SOCL</a>,
  which makes it possible to simply run OpenCL applications on top of StarPU (new in v1.0).
</p>

<h4>Simulation support</h4>
<p>
  StarPU can very accurately simulate an application execution
  and measure the resulting performance using the
  <a href="http://simgrid.gforge.inria.fr">SimGrid simulator</a> (new in v1.1).  This makes it possible
  to quickly experiment with various scheduling heuristics, various application
  algorithms, and even various platforms (available GPUs and CPUs, available
  bandwidth)!
</p>

<h4>All in all</h4>
  <p>
All that means that, with the help
of <a href="doc/html/cExtensions.html">StarPU's extensions to the C
language</a>, the following sequential source code of a tiled version of
the classical Cholesky factorization algorithm using BLAS is also valid
StarPU code, possibly running on all the CPUs and GPUs, and given a data
distribution over MPI nodes, it is even a distributed version!
  </p>

  <tt><pre>
for (k = 0; k < tiles; k++) {
  potrf(A[k,k]);
  for (m = k+1; m < tiles; m++)
    trsm(A[k,k], A[m,k]);
  for (m = k+1; m < tiles; m++)
    syrk(A[m,k], A[m,m]);
  for (m = k+1; m < tiles; m++)
    for (n = k+1; n < m; n++)
      gemm(A[m,k], A[n,k], A[m,n]);
}</pre></tt>
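<p>
To get a feeling for the size of the task graph that this loop nest produces,
the following self-contained C sketch walks the same loops and counts one task
per kernel call instead of calling BLAS. The <tt>task_counts</tt> structure and
the function name are made up for the example; nothing here calls StarPU.
</p>

```c
#include <assert.h>

/* Number of tasks of each kind generated by the tiled Cholesky loop nest. */
struct task_counts {
    int potrf;
    int trsm;
    int syrk;
    int gemm;
};

/* Walk the same loops as the tiled Cholesky code, counting one task per
 * kernel call instead of executing it. */
static struct task_counts count_cholesky_tasks(int tiles)
{
    struct task_counts c = { 0, 0, 0, 0 };
    assert(tiles >= 0);
    for (int k = 0; k < tiles; k++) {
        c.potrf++;                              /* factor the diagonal tile */
        for (int m = k + 1; m < tiles; m++)
            c.trsm++;                           /* solve the panel below it */
        for (int m = k + 1; m < tiles; m++)
            c.syrk++;                           /* update diagonal tiles */
        for (int m = k + 1; m < tiles; m++)
            for (int n = k + 1; n < m; n++)
                c.gemm++;                       /* update off-diagonal tiles */
    }
    return c;
}
```

<p>
With 4 tiles per dimension this gives 4 potrf, 6 trsm, 6 syrk and 4 gemm tasks.
With 40 tiles the gemm count alone grows to 9880, which is why the per-task
overhead and the task granularity matter.
</p>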

<h4>Supported Architectures</h4>
<ul>
<li>SMP/Multicore Processors (x86, PPC, ARM, ... all Debian architectures have been tested)</li>
<li>NVIDIA GPUs (e.g. heterogeneous multi-GPU), with pipelined and concurrent kernel execution support (new in v1.2) and GPU-GPU direct transfers (new in v1.1)</li>
<li>OpenCL devices</li>
<li>Cell Processors (experimental)</li>
<li>Intel SCC (experimental, new in v1.2)</li>
<li>Intel MIC / Xeon Phi (new in v1.2)</li>
</ul>

<h4>Supported Operating Systems</h4>
<ul>
<li>GNU/Linux</li>
<li>Mac OS X</li>
<li>Windows</li>
</ul>

<h4>Stability</h4>
<p>
StarPU is checked every night with
<ul>
<li>Valgrind / Helgrind</li>
<li>GCC's Address/Leak/Thread/Undefined Sanitizers</li>
<li>cppcheck</li>
<li>Coverity</li>
</ul>
</p>

<h4>Performance analysis tools</h4>
  <p>
In order to understand the performance obtained by StarPU, it is helpful to
visualize the actual behaviour of the applications running on complex
heterogeneous multicore architectures.  StarPU therefore makes it possible to
generate Pajé traces that can be visualized thanks to the <a
href="http://vite.gforge.inria.fr/"><b>ViTE</b> (Visual Trace Explorer) open
source tool.</a>
  </p>

<p>
<b>Example:</b> LU decomposition on 3 CPU cores and a GPU using a very simple
greedy scheduling strategy. The green (resp. red) sections indicate when the
corresponding processing unit is busy (resp. idle). The number of ready tasks
is displayed in the curve on top: it appears that with this scheduling policy,
the algorithm suffers a certain lack of parallelism. <b>Measured speed: 175.32
GFlop/s</b>
<center><a href="./images/greedy-lu-16k-fx5800.png"> <img src="./images/greedy-lu-16k-fx5800.png" alt="LU decomposition (greedy)" width="75%"></a></center>
</p>

<p>
This second trace depicts the behaviour of the same application using a
scheduling strategy trying to minimize load imbalance thanks to auto-tuned
performance models and to keep data locality as high as possible. In this
example, the Pajé trace clearly shows that this scheduling strategy outperforms
the previous one in terms of processor usage. <b>Measured speed: 239.60
GFlop/s</b>
<center><a href="./images/dmda-lu-16k-fx5800.png"><img src="./images/dmda-lu-16k-fx5800.png" alt="LU decomposition (dmda)" width="75%"></a></center>
</p>

<p>
<a href="http://www.hlrs.de/temanejo">Temanejo</a> can be used to debug the task
graph, as shown below (new in v1.1).
</p>

<center>
<a href="images/temanejo.png"><img src="images/temanejo.png" alt="Task graph displayed in Temanejo" width="50%"/></a>
</center>

</div>

<div class="section" id="software">
<h3>Software using StarPU</h3>

<p>
Some software is known to be able to use StarPU to tackle heterogeneous
architectures. Here is a non-exhaustive list (feel free to ask to be added to the
list!):
</p>

<ul>
	<li>AL4SAN, dense linear algebra library</li>
	<li><a href="https://project.inria.fr/chameleon/">Chameleon</a>, dense linear algebra library</li>
	<li><a href="http://exa2pro.eu">Exa2pro</a>, Enhancing Programmability and boosting Performance Portability for Exascale Computing Systems</li>
	<li><a href="http://github.com/ecrc/exageostat">ExaGeoStat</a>, Machine learning framework for Climate/Weather prediction applications</li>
	<li><a href="https://hal.inria.fr/hal-01507613">FLUSEPA</a>, Navier-Stokes Solver for Unsteady Problems with Bodies in Relative Motion</li>
	<li><a href="http://github.com/ecrc/hicma">HiCMA</a>, Low-rank general linear algebra library</li>
	<li>hmat, hierarchical matrix C/C++ library</li>
	<li><a href="http://kstar.gforge.inria.fr/">K'Star</a>, OpenMP 4-compatible interface on top of StarPU.</li>
	<li><a href="http://github.com/ecrc/ksvd">KSVD</a>, dense SVD on distributed-memory manycore systems</li>
	<li><a href="http://icl.cs.utk.edu/magma/">MAGMA</a>, dense linear algebra library, starting from version 1.1</li>
	<li><a href="https://gitlab.inria.fr/solverstack/maphys">MaPHyS</a>, Massively Parallel Hybrid Solver</li>
	<li><a href="http://github.com/ecrc/moao">MOAO</a>, HPC framework for computational astronomy, servicing the European Extremely Large Telescope and the Japanese Subaru Telescope</li>
	<li><a href="http://pastix.gforge.inria.fr/">PaStiX</a>, sparse linear algebra library, starting from version 5.2.1</li>
	<li>PEPPHER, Performance Portability and Programmability for Heterogeneous Many-core Architectures</li>
	<li><a href="http://github.com/ecrc/qdwh">QDWH</a>, QR-based Dynamically Weighted Halley</li>
	<li><a href="http://buttari.perso.enseeiht.fr/qr_mumps/">qr_mumps</a>, sparse linear algebra library</li>
	<li><a href="http://scalfmm-public.gforge.inria.fr/doc/">ScalFMM</a>, N-body interaction simulation using the Fast Multipole Method.</li>
	<li><a href="https://tel.archives-ouvertes.fr/tel-01410049/">SCHNAPS</a>, Solver for Conservative Hyperbolic Non-linear systems Applied to PlasmaS.</li>
	<li><a href="https://hal.archives-ouvertes.fr/hal-01086246">SignalPU</a>, a Dataflow-Graph-specific programming model.</li>
	<li><a href="http://www.ida.liu.se/~chrke/skepu/">SkePU</a>, a skeleton programming framework.</li>
	<li><a href="http://github.com/ecrc/stars-h">STARS-H</a>, HPC low-rank matrix market</li>
	<li><a href="http://www.xcalablemp.org/">XcalableMP</a>, Directive-based language eXtension for Scalable and performance-aware Parallel Programming</li>
</ul>

<p>
You can find <a href="#PublicationsOnApplications">below</a> the list of publications related to applications using StarPU.
</p>

</div>

<div class="section" id="tryit">
<h3>Give it a try!</h3>
<p>
You can easily try the performance on the Cholesky factorization, for
instance. Make sure that pkg-config and
<a href="http://www.open-mpi.org/projects/hwloc/">hwloc</a>
are installed for
proper CPU control, and that BLAS kernels for your computation units are installed and configured in
your environment (e.g. MKL for CPUs and CUBLAS for GPUs).
</p>

<tt><pre>
$ wget http://starpu.gforge.inria.fr/files/starpu-someversion.tar.gz
$ tar xf starpu-someversion.tar.gz
$ cd starpu-someversion
$ ./configure
$ make -j 12
$ STARPU_SCHED=dmdas ./examples/cholesky/cholesky_implicit -size $((960*40)) -nblocks 40
$ STARPU_SCHED=dmdas mpirun -np 4 -machinefile mymachines ./mpi/examples/matrix_decomposition/mpi_cholesky_distributed -size $((960*40*4)) -nblocks $((40*4))</pre></tt>

<p>Note that the dmdas scheduler uses performance models, and thus needs
calibration runs before exhibiting optimized performance (until the "model
something is not calibrated enough" messages go away).</p>

<p>To get a glimpse of what happened, you can get an execution trace by
installing
<a href="http://savannah.nongnu.org/projects/fkt">FxT</a>
and <a href="http://vite.gforge.inria.fr/">ViTE</a>, and enabling traces:
</p>

<tt><pre>
$ ./configure --with-fxt
$ make -j 12
$ STARPU_SCHED=dmdas ./examples/cholesky/cholesky_implicit -size $((960*40)) -nblocks 40
$ ./tools/starpu_fxt_tool -i /tmp/prof_file_${USER}_0
$ vite paje.trace
</pre></tt>

<p>
Starting with StarPU 1.1, it is also possible to reproduce the performance that
we show in our articles on our machines, by installing simgrid, and then using
the simulation mode of StarPU using the performance models of our machines:
</p>
  <tt><pre>
$ ./configure --enable-simgrid
$ make -j 12
$ STARPU_PERF_MODEL_DIR=$PWD/tools/perfmodels/sampling STARPU_HOSTNAME=mirage STARPU_SCHED=dmdas ./examples/cholesky/cholesky_implicit -size $((960*40)) -nblocks 40
# size	ms	GFlops
38400	9915	1903.7</pre></tt>
<p>(MPI simulation is not supported yet)</p>
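<p>
The GFlops column can be cross-checked by hand. The C sketch below assumes the
usual leading-order operation count of n<sup>3</sup>/3 for an n x n Cholesky
factorization (an assumption about how the figure is derived, not something the
output states) and converts the elapsed milliseconds into GFlop/s. The function
names are made up for the example.
</p>

```c
#include <assert.h>

/* Floating-point operations of an n x n Cholesky factorization, using the
 * usual leading-order estimate n^3 / 3 (assumed here). */
static double cholesky_flops(double n)
{
    return n * n * n / 3.0;
}

/* Convert an operation count and an elapsed time in milliseconds
 * into GFlop/s. */
static double gflops(double flops, double milliseconds)
{
    assert(milliseconds > 0.0);
    return flops / (milliseconds * 1e-3) / 1e9;
}
```

<p>
For the run above, <tt>gflops(cholesky_flops(960 * 40), 9915)</tt> gives about
1903.6 GFlop/s, in line with the reported 1903.7 once the rounding of the
elapsed time to whole milliseconds is taken into account.
</p>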

<div class="section" id="publications">
<h3>Publications</h3>
<p>
All StarPU related publications are also
listed <a href="./publications">here</a>
with the corresponding BibTeX entries.
</p>

<p>
A good overview is available in
the following <a href="http://hal.archives-ouvertes.fr/inria-00467677">Research Report</a>.
</p>

<p>
If you need to cite StarPU, please
reference <a href="publications/Year/2011.html#AugThiNamWac11CCPE">[StarPU: A Unified Platform
    for Task Scheduling on Heterogeneous Multicore Architectures]</a>
for a general presentation. Other sub-sections below will give you
references for more specific aspects of StarPU.
</p>

576
<h4>General Presentations</h4> 
577
<a name="PublicationsGeneralPresentations"></a>
578 579
<ol>
<li>
THIBAULT Samuel's avatar
THIBAULT Samuel committed
580 581 582 583 584 585 586
<a name="thibault:tel-01959127"></a>Samuel Thibault<br/>
<strong>On Runtime Systems for Task-based Programming on Heterogeneous Platforms</strong><br/>
Habilitation à diriger des recherches, Université de Bordeaux, December 2018<br/>
[<a href="https://hal.inria.fr/tel-01959127">WWW</a>]
[<a href="https://hal.inria.fr/tel-01959127/file/hdr.pdf">PDF</a>]
</li>
<li>
<a name="Aug11Thesis"></a>Cédric Augonnet<br/>
<strong>Scheduling Tasks over Multicore machines enhanced with Accelerators: a Runtime System's Perspective</strong><br/>
PhD thesis, Université Bordeaux 1, 351 cours de la Libération, 33405 Talence Cedex, December 2011<br/>
[<a href="http://tel.archives-ouvertes.fr/tel-00777154">WWW</a>]
[<a href="http://tel.archives-ouvertes.fr/tel-00777154/document">PDF</a>]
</li>
<li>
<a name="AugThiNamWac11CCPE"></a>Cédric Augonnet, Samuel Thibault, Raymond Namyst,  and Pierre-André Wacrenier<br/>
<strong>StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures</strong><br/>
<em>CCPE - Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par 2009</em>, 23:187-198, February 2011<br/>
[<a href="http://hal.inria.fr/inria-00550877">WWW</a>]
[<a href="http://hal.inria.fr/inria-00550877/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1002/cpe.1631">10.1002/cpe.1631</a>]
</li>
<li>
<a name="AugThiNamWac10RR7240"></a>Cédric Augonnet, Samuel Thibault,  and Raymond Namyst<br/>
<strong>StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines</strong><br/>
Research Report RR-7240, INRIA, March 2010<br/>
[<a href="http://hal.inria.fr/inria-00467677">WWW</a>]
[<a href="http://hal.inria.fr/inria-00467677/document">PDF</a>]
</li>
<li>
<a name="Aug09Renpar19"></a>Cédric Augonnet<br/>
<strong>StarPU: un support exécutif unifié pour les architectures multicoeurs hétérogènes</strong><br/>
In <em>19èmes Rencontres Francophones du Parallélisme</em>, Toulouse, France, September 2009<br/>
Note: Best Paper Award<br/>
[<a href="http://hal.inria.fr/inria-00411581">WWW</a>]
[<a href="http://hal.inria.fr/inria-00411581/document">PDF</a>]
</li>
<li>
<a name="AugThiNamWac09Europar"></a>Cédric Augonnet, Samuel Thibault, Raymond Namyst,  and Pierre-André Wacrenier<br/>
<strong>StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures</strong><br/>
In <em>Euro-Par - 15th International Conference on Parallel Processing</em>, volume 5704 of <em>Lecture Notes in Computer Science</em>, Delft, The Netherlands, pages 863-874, August 2009<br/>
Springer<br/>
[<a href="http://hal.inria.fr/inria-00384363">WWW</a>]
[<a href="http://hal.inria.fr/inria-00384363/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-642-03869-3_80">10.1007/978-3-642-03869-3_80</a>]
</li>
<li>
<a name="AugNam08HPPC"></a>Cédric Augonnet and Raymond Namyst<br/>
<strong>A unified runtime system for heterogeneous multicore architectures</strong><br/>
In <em>Proceedings of the International Euro-Par Workshops 2008, HPPC'08</em>, volume 5415 of <em>Lecture Notes in Computer Science</em>, Las Palmas de Gran Canaria, Spain, pages 174-183, August 2008<br/>
Springer<br/>
<strong>ISBN:</strong> 978-3-642-00954-9<br/>
[<a href="http://hal.inria.fr/inria-00326917">WWW</a>]
[<a href="http://hal.inria.fr/inria-00326917/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-642-00955-6_22">10.1007/978-3-642-00955-6_22</a>]
</li>
<li>
<a name="Aug08Master"></a>Cédric Augonnet<br/>
<strong>Vers des supports d'exécution capables d'exploiter les machines multicoeurs hétérogènes</strong><br/>
Master's thesis (DEA), Université Bordeaux 1, June 2008<br/>
[<a href="http://hal.inria.fr/inria-00289361">WWW</a>]
[<a href="http://hal.inria.fr/inria-00289361/document">PDF</a>]
</li>
</ol>
<h4>On Composability</h4> 
<a name="PublicationsOnComposability"></a>
<ol>
<li>
<a name="hugo:tel-01162975"></a>Andra-Ecaterina Hugo<br/>
<strong>Composability of parallel codes on heterogeneous architectures</strong><br/>
PhD thesis, Université de Bordeaux, December 2014<br/>
[<a href="https://tel.archives-ouvertes.fr/tel-01162975">WWW</a>]
[<a href="https://tel.archives-ouvertes.fr/tel-01162975/file/HUGO_ANDRA_2014.pdf">PDF</a>]
</li>
<li>
<a name="AH13Renpar"></a>Andra Hugo<br/>
<strong>Le problème de la composition parallèle : une approche supervisée</strong><br/>
In <em>21èmes Rencontres Francophones du Parallélisme (RenPar'21)</em>, Grenoble, France, January 2013<br/>
[<a href="http://hal.inria.fr/hal-00773610">WWW</a>]
[<a href="http://hal.inria.fr/hal-00773610/document">PDF</a>]
</li>
<li>
<a name="hugo:hal-00824514"></a>Andra Hugo, Abdou Guermouche, Raymond Namyst,  and Pierre-André Wacrenier<br/>
<strong>Composing multiple StarPU applications over heterogeneous machines: a supervised approach</strong><br/>
In <em>Third International Workshop on Accelerators and Hybrid Exascale Systems</em>, Boston, USA, May 2013<br/>
[<a href="http://hal.inria.fr/hal-00824514">WWW</a>]
[<a href="http://hal.inria.fr/hal-00824514/document">PDF</a>]
</li>
<li>
<a name="AH11Master"></a>Andra Hugo<br/>
<strong>Composabilité de codes parallèles sur architectures hétérogènes</strong><br/>
Master's thesis, Université Bordeaux 1, June 2011<br/>
[<a href="http://hal.inria.fr/inria-00619654/en/">WWW</a>]
[<a href="http://hal.inria.fr/inria-00619654/document">PDF</a>]
</li>
</ol>
<h4>On Parallel Tasks</h4> 
<a name="PublicationsOnParallelTasks"></a>
<ol>
<li>
<a name="cojean:tel-01816341"></a>Terry Cojean<br/>
<strong>Programmation of heterogeneous architectures using moldable tasks</strong><br/>
PhD thesis, Université de Bordeaux, March 2018<br/>
[<a href="https://tel.archives-ouvertes.fr/tel-01816341">WWW</a>]
[<a href="https://tel.archives-ouvertes.fr/tel-01816341/file/COJEAN_TERRY_2018.pdf">PDF</a>]
</li>
</ol>
<h4>On Scheduling</h4> 
<a name="PublicationsOnScheduling"></a>
<ol>
<li>
<a name="bramas:hal-02120736"></a>Bérenger Bramas<br/>
<strong>Impact study of data locality on task-based applications through the Heteroprio scheduler</strong><br/>
<em>PeerJ Computer Science</em>, May 2019<br/>
[<a href="https://hal.inria.fr/hal-02120736">WWW</a>]
[<a href="https://hal.inria.fr/hal-02120736/file/peerj-cs-190.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.7717/peerj-cs.190">10.7717/peerj-cs.190</a>]
</li>
<li>
<a name="leandronesi:hal-02275363"></a>Lucas Leandro Nesi, Samuel Thibault, Luka Stanisic,  and Lucas Mello Schnorr<br/>
<strong>Visual Performance Analysis of Memory Behavior in a Task-Based Runtime on Hybrid Platforms</strong><br/>
In <em>2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)</em>, Larnaca, Cyprus, pages 142-151, May 2019<br/>
IEEE<br/>
[<a href="https://hal.inria.fr/hal-02275363">WWW</a>]
[<a href="https://hal.inria.fr/hal-02275363/file/CCGRID_camera_ready.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/CCGRID.2019.00025">10.1109/CCGRID.2019.00025</a>]
</li>
<li>
<a name="alias:hal-02421327"></a>Christophe Alias, Samuel Thibault,  and Laure Gonnord<br/>
<strong>A Compiler Algorithm to Guide Runtime Scheduling</strong><br/>
Research Report RR-9315, INRIA Grenoble ; INRIA Bordeaux, December 2019<br/>
[<a href="https://hal.inria.fr/hal-02421327">WWW</a>]
[<a href="https://hal.inria.fr/hal-02421327/file/RR-9315.pdf">PDF</a>]
</li>
<li>
<a name="garciapinto:hal-01616632"></a>Vinicius Garcia Pinto, Lucas Mello Schnorr, Luka Stanisic, Arnaud Legrand, Samuel Thibault,  and Vincent Danjean<br/>
<strong>A Visual Performance Analysis Framework for Task-based Parallel Applications running on Hybrid Clusters</strong><br/>
<em>CCPE - Concurrency and Computation: Practice and Experience</em>, 30, April 2018<br/>
[<a href="https://hal.inria.fr/hal-01616632">WWW</a>]
[<a href="https://hal.inria.fr/hal-01616632/file/CCPE_article_submitted_2018_02_06.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1002/cpe.4472">10.1002/cpe.4472</a>]
</li>
<li>
<a name="pinto:hal-01842038"></a>Vinicius Garcia Pinto, Lucas Mello Schnorr, Arnaud Legrand, Samuel Thibault, Luka Stanisic,  and Vincent Danjean<br/>
<strong>Detecção de Anomalias de Desempenho em Aplicações de Alto Desempenho baseadas em Tarefas em Clusters Híbridos</strong><br/>
In <em>WPerformance - 17o Workshop em Desempenho de Sistemas Computacionais e de Comunicação</em>, Natal, Brazil, July 2018<br/>
[<a href="https://hal.inria.fr/hal-01842038">WWW</a>]
[<a href="https://hal.inria.fr/hal-01842038/file/181587_1.pdf">PDF</a>]
</li>
<li>
<a name="kumar:tel-01538516"></a>Suraj Kumar<br/>
<strong>Scheduling of Dense Linear Algebra Kernels on Heterogeneous Resources</strong><br/>
PhD thesis, Université de Bordeaux, April 2017<br/>
[<a href="https://tel.archives-ouvertes.fr/tel-01538516">WWW</a>]
[<a href="https://tel.archives-ouvertes.fr/tel-01538516/file/KUMAR_SURAL_2017.pdf">PDF</a>]
</li>
<li>
<a name="beaumont:hal-01386174"></a>O. Beaumont, L. Eyraud-Dubois,  and S. Kumar<br/>
<strong>Approximation Proofs of a Fast and Efficient List Scheduling Algorithm for Task-Based Runtime Systems on Multicores and GPUs</strong><br/>
In <em>2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)</em>, pages 768-777, May 2017<br/>
[<a href="https://hal.inria.fr/hal-01386174">WWW</a>]
[<a href="https://hal.inria.fr/hal-01386174/file/heteroPrioApproxProofsRR.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/IPDPS.2017.71">10.1109/IPDPS.2017.71</a>]
</li>
<li>
<a name="agullo:hal-01223573"></a>Emmanuel Agullo, Olivier Beaumont, Lionel Eyraud-Dubois,  and Suraj Kumar<br/>
<strong>Are Static Schedules so Bad? A Case Study on Cholesky Factorization</strong><br/>
In <em>IPDPS'16 - Proceedings of the 30th IEEE International Parallel &amp; Distributed Processing Symposium</em>, Chicago, IL, United States, May 2016<br/>
IEEE<br/>
[<a href="https://hal.inria.fr/hal-01223573">WWW</a>]
[<a href="https://hal.inria.fr/hal-01223573/file/heteroprioCameraReady-ieeeCompatiable.pdf">PDF</a>]
</li>
<li>
<a name="beaumont:hal-01361992"></a>Olivier Beaumont, Terry Cojean, Lionel Eyraud-Dubois, Abdou Guermouche,  and Suraj Kumar<br/>
<strong>Scheduling of Linear Algebra Kernels on Multiple Heterogeneous Resources</strong><br/>
In <em>International Conference on High Performance Computing, Data, and Analytics (HiPC)</em>, Hyderabad, India, December 2016<br/>
[<a href="https://hal.inria.fr/hal-01361992">WWW</a>]
[<a href="https://hal.inria.fr/hal-01361992v2/document">PDF</a>]
</li>
<li>
<a name="cojean:hal-01181135"></a>Terry Cojean, Abdou Guermouche, Andra Hugo, Raymond Namyst,  and Pierre-André Wacrenier<br/>
<strong>Resource aggregation for task-based Cholesky Factorization on top of heterogeneous machines</strong><br/>
In <em>HeteroPar'2016 workshop of Euro-Par</em>, Grenoble, France, August 2016<br/>
[<a href="https://hal.inria.fr/hal-01181135">WWW</a>]
[<a href="https://hal.inria.fr/hal-01181135/file/papier%20%281%29.pdf">PDF</a>]
</li>
<li>
<a name="garciapinto:hal-01353962"></a>Vinicius Garcia Pinto, Luka Stanisic, Arnaud Legrand, Lucas Mello Schnorr, Samuel Thibault,  and Vincent Danjean<br/>
<strong>Analyzing Dynamic Task-Based Applications on Hybrid Platforms: An Agile Scripting Approach</strong><br/>
In <em>VPA - 3rd Workshop on Visual Performance Analysis</em>, Salt Lake City, United States, November 2016<br/>
Note: Held in conjunction with SC16<br/>
[<a href="https://hal.inria.fr/hal-01353962">WWW</a>]
[<a href="https://hal.inria.fr/hal-01353962v2/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/VPA.2016.008">10.1109/VPA.2016.008</a>]
</li>
<li>
<a name="JaBlHU2016a"></a>Johan Janzén, David Black-Schaffer,  and Andra Hugo<br/>
<strong>Partitioning GPUs for Improved Scalability</strong><br/>
In <em>IEEE 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)</em>, October 2016<br/>
[<a href="http://ieeexplore.ieee.org/abstract/document/7789322/">WWW</a>]
[doi:<a href="http://dx.doi.org/10.1109/SBAC-PAD.2016.14">10.1109/SBAC-PAD.2016.14</a>]
</li>
<li>
<a name="cojean:hal-01409965"></a>Terry Cojean, Abdou Guermouche, Andra Hugo, Raymond Namyst,  and Pierre-André Wacrenier<br/>
<strong>Resource aggregation for task-based Cholesky Factorization on top of modern architectures</strong><br/>
Note: This paper is submitted for review to the Parallel Computing special issue for HCW and HeteroPar 16 workshops, November 2016<br/>
[<a href="https://hal.inria.fr/hal-01409965">WWW</a>]
[<a href="https://hal.inria.fr/hal-01409965/file/submission.pdf">PDF</a>]
</li>
<li>
<a name="agullo:hal-01120507"></a>Emmanuel Agullo, Olivier Beaumont, Lionel Eyraud-Dubois, Julien Herrmann, Suraj Kumar, Loris Marchal,  and Samuel Thibault<br/>
<strong>Bridging the Gap between Performance and Bounds of Cholesky Factorization on Heterogeneous Platforms</strong><br/>
In <em>HCW'2015 - Heterogeneity in Computing Workshop of IPDPS</em>, Hyderabad, India, May 2015<br/>
[<a href="https://hal.inria.fr/hal-01120507">WWW</a>]
[<a href="https://hal.inria.fr/hal-01120507/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/IPDPSW.2015.35">10.1109/IPDPSW.2015.35</a>]
</li>
<li>
<a name="sergent:hal-00978364"></a>Marc Sergent and Simon Archipoff<br/>
<strong>Modulariser les ordonnanceurs de tâches : une approche structurelle</strong><br/>
In <em>Compas'2014</em>, Neuchâtel, Switzerland, April 2014<br/>
[<a href="http://hal.inria.fr/hal-00978364">WWW</a>]
[<a href="http://hal.inria.fr/hal-00978364/PDF/ordonnanceurs_modulaires.pdf">PDF</a>]
</li>
<li>
<a name="AugCleThiNam10ICPADS"></a>Cédric Augonnet, Jérôme Clet-Ortega, Samuel Thibault,  and Raymond Namyst<br/>
<strong>Data-Aware Task Scheduling on Multi-Accelerator based Platforms</strong><br/>
In <em>The 16th International Conference on Parallel and Distributed Systems (ICPADS)</em>, Shanghai, China, December 2010<br/>
[<a href="http://hal.inria.fr/inria-00523937">WWW</a>]
[<a href="http://hal.inria.fr/inria-00523937/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/ICPADS.2010.129">10.1109/ICPADS.2010.129</a>]
</li>
</ol>
<h4>On The C Extensions</h4> 
<a name="PublicationsOnTheCExtensions"></a>
<ol>
<li>
<a name="LC13Report"></a>Ludovic Courtès<br/>
<strong>C Language Extensions for Hybrid CPU/GPU Programming with StarPU</strong><br/>
Research Report RR-8278, INRIA, April 2013<br/>
[<a href="http://hal.inria.fr/hal-00807033">WWW</a>]
[<a href="http://hal.inria.fr/hal-00807033/PDF/RR-8278.pdf">PDF</a>]
</li>
</ol>
<h4>On OpenMP Support on top of StarPU</h4> 
<a name="PublicationsOnOpenMPSupportontopofStarPU"></a>
<ol>
<li>
<a name="agullo:hal-01517153"></a>Emmanuel Agullo, Olivier Aumage, Berenger Bramas, Olivier Coulaud,  and Samuel Pitoiset<br/>
<strong>Bridging the gap between OpenMP and task-based runtime systems for the fast multipole method</strong><br/>
<em>IEEE Transactions on Parallel and Distributed Systems</em>, April 2017<br/>
[<a href="https://hal.inria.fr/hal-01517153">WWW</a>]
[<a href="https://hal.inria.fr/hal-01517153/file/tpds_kstar_scalfmm_print.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/TPDS.2017.2697857">10.1109/TPDS.2017.2697857</a>]
</li>
<li>
<a name="agullo:hal-01372022"></a>Emmanuel Agullo, Olivier Aumage, Berenger Bramas, Olivier Coulaud,  and Samuel Pitoiset<br/>
<strong>Bridging the gap between OpenMP 4.0 and native runtime systems for the fast multipole method</strong><br/>
Research Report RR-8953, Inria, March 2016<br/>
[<a href="https://hal.inria.fr/hal-01372022">WWW</a>]
[<a href="https://hal.inria.fr/hal-01372022/file/RR-8953.pdf">PDF</a>]
</li>
<li>
<a name="virouleau:hal-01081974"></a>Philippe Virouleau, Pierrick Brunet, François Broquedis, Nathalie Furmento, Samuel Thibault, Olivier Aumage,  and Thierry Gautier<br/>
<strong>Evaluation of OpenMP Dependent Tasks with the KASTORS Benchmark Suite</strong><br/>
In <em>IWOMP2014 - 10th International Workshop on OpenMP</em>, Salvador, Brazil, pages 16-29, September 2014<br/>
Springer<br/>
[<a href="https://hal.inria.fr/hal-01081974">WWW</a>]
[<a href="https://hal.inria.fr/hal-01081974/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-319-11454-5_2">10.1007/978-3-319-11454-5_2</a>]
</li>
</ol>
<h4>On MPI Support</h4> 
<a name="PublicationsOnMPISupport"></a>
<ol>
<li>
<a name="lion:hal-02296118"></a>Romain Lion<br/>
<strong>Tolérance aux pannes dans l'exécution distribuée de graphes de tâches</strong><br/>
In <em>Conférence d'informatique en Parallélisme, Architecture et Système</em>, Anglet, France, June 2019<br/>
[<a href="https://hal.inria.fr/hal-02296118">WWW</a>]
[<a href="https://hal.inria.fr/hal-02296118/file/Compas_Romain_LION_submitted_final.pdf">PDF</a>]
</li>
<li>
<a name="agullo:hal-01618526"></a>Emmanuel Agullo, Olivier Aumage, Mathieu Faverge, Nathalie Furmento, Florent Pruvost, Marc Sergent,  and Samuel Thibault<br/>
<strong>Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model</strong><br/>
<em>TPDS - IEEE Transactions on Parallel and Distributed Systems</em>, December 2017<br/>
[<a href="https://hal.inria.fr/hal-01618526">WWW</a>]
[<a href="https://hal.inria.fr/hal-01618526/file/tpds14.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/TPDS.2017.2766064">10.1109/TPDS.2017.2766064</a>]
</li>
<li>
<a name="sergent:tel-01483666"></a>Marc Sergent<br/>
<strong>Scalability of a task-based runtime system for dense linear algebra applications</strong><br/>
PhD thesis, Université de Bordeaux, December 2016<br/>
[<a href="https://tel.archives-ouvertes.fr/tel-01483666">WWW</a>]
[<a href="https://tel.archives-ouvertes.fr/tel-01483666/file/SERGENT_MARC_2016.pdf">PDF</a>]
</li>
<li>
<a name="agullo:hal-01283949"></a>Emmanuel Agullo, Olivier Aumage, Mathieu Faverge, Nathalie Furmento, Florent Pruvost, Marc Sergent,  and Samuel Thibault<br/>
<strong>Harnessing clusters of hybrid nodes with a sequential task-based programming model</strong><br/>
In <em>8th International Workshop on Parallel Matrix Algorithms and Applications</em>, July 2014<br/>
[<a href="https://hal.inria.fr/hal-01283949">WWW</a>]
[<a href="https://hal.inria.fr/hal-01283949/file/pmaa14.pdf">PDF</a>]
</li>
<li>
884 885
<a name="augonnet:hal-00992208"></a>Cédric Augonnet, Olivier Aumage, Nathalie Furmento, Samuel Thibault,  and Raymond Namyst<br/>
<strong>StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators</strong><br/>
Research Report RR-8538, INRIA, May 2014<br/>
[<a href="http://hal.inria.fr/hal-00992208">WWW</a>]
[<a href="http://hal.inria.fr/hal-00992208/PDF/RR-8538.pdf">PDF</a>]
</li>
<li>
<a name="AugAumFurNamThi2012EuroMPI"></a>Cédric Augonnet, Olivier Aumage, Nathalie Furmento, Raymond Namyst,  and Samuel Thibault<br/>
<strong>StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators</strong><br/>
In Siegfried Benkner, Jesper Larsson Träff, and Jack Dongarra, editors, <em>EuroMPI 2012</em>, volume 7490 of <em>LNCS</em>, September 2012<br/>
Springer<br/>
Note: Poster Session<br/>
[<a href="http://hal.inria.fr/hal-00725477">WWW</a>]
[<a href="http://hal.inria.fr/hal-00725477/document">PDF</a>]
</li>
</ol>
<h4>On Memory Control</h4> 
<a name="PublicationsOnMemoryControl"></a>
<ol>
<li>
<a name="chevalier:hal-01718280"></a>Arthur Chevalier<br/>
<strong>Critical resources management and scheduling under StarPU</strong><br/>
Master's thesis, Université de Bordeaux, September 2017<br/>
[<a href="https://hal.inria.fr/hal-01718280">WWW</a>]
[<a href="https://hal.inria.fr/hal-01718280/file/Memoire.pdf">PDF</a>]
</li>
<li>
<a name="sergent:hal-01284004"></a>Marc Sergent, David Goudin, Samuel Thibault,  and Olivier Aumage<br/>
<strong>Controlling the Memory Subscription of Distributed Applications with a Task-Based Runtime System</strong><br/>
In <em>HIPS - 21st International Workshop on High-Level Parallel Programming Models and Supportive Environments</em>, Chicago, United States, May 2016<br/>
[<a href="https://hal.inria.fr/hal-01284004">WWW</a>]
[<a href="https://hal.inria.fr/hal-01284004/file/PID4127657.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/IPDPSW.2016.105">10.1109/IPDPSW.2016.105</a>]
</li>
</ol>
<h4>On Performance Model Tuning</h4> 
<a name="PublicationsOnPerformanceModelTuning"></a>
<ol>
<li>
<a name="agullo:hal-01474556"></a>Emmanuel Agullo, Bérenger Bramas, Olivier Coulaud, Luka Stanisic,  and Samuel Thibault<br/>
<strong>Modeling Irregular Kernels of Task-based codes: Illustration with the Fast Multipole Method</strong><br/>
Research Report RR-9036, INRIA Bordeaux, February 2017<br/>
[<a href="https://hal.inria.fr/hal-01474556">WWW</a>]
[<a href="https://hal.inria.fr/hal-01474556/file/rapport.pdf">PDF</a>]
</li>
<li>
<a name="AugThiNam09HPPC"></a>Cédric Augonnet, Samuel Thibault,  and Raymond Namyst<br/>
<strong>Automatic Calibration of Performance Models on Heterogeneous Multicore Architectures</strong><br/>
In <em>HPPC - Proceedings of the International Euro-Par Workshops, Highly Parallel Processing on a Chip</em>, volume 6043 of <em>Lecture Notes in Computer Science</em>, Delft, The Netherlands, pages 56-65, August 2009<br/>
Springer<br/>
[<a href="http://hal.inria.fr/inria-00421333">WWW</a>]
[<a href="http://hal.inria.fr/inria-00421333/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-642-14122-5_9">10.1007/978-3-642-14122-5_9</a>]
</li>
</ol>
<h4>On The Simulation Support through SimGrid</h4> 
<a name="PublicationsOnTheSimulationSupportthroughSimGrid"></a>
<ol>
<li>
<a name="stanisic:hal-01147997"></a>Luka Stanisic, Samuel Thibault, Arnaud Legrand, Brice Videau,  and Jean-François Méhaut<br/>
<strong>Faithful Performance Prediction of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures</strong><br/>
<em>CCPE - Concurrency and Computation: Practice and Experience</em>, pp 16, May 2015<br/>
[<a href="https://hal.inria.fr/hal-01147997">WWW</a>]
[<a href="https://hal.inria.fr/hal-01147997/file/CCPE14_article.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1002/cpe.3555">10.1002/cpe.3555</a>]
</li>
<li>
<a name="stanisic:hal-01180272"></a>Luka Stanisic, Emmanuel Agullo, Alfredo Buttari, Abdou Guermouche, Arnaud Legrand, Florent Lopez,  and Brice Videau<br/>
<strong>Fast and Accurate Simulation of Multithreaded Sparse Linear Algebra Solvers</strong><br/>
In <em>The 21st IEEE International Conference on Parallel and Distributed Systems</em>, Melbourne, Australia, December 2015<br/>
[<a href="https://hal.inria.fr/hal-01180272">WWW</a>]
[<a href="https://hal.inria.fr/hal-01180272/file/QRMSTARSG_article.pdf">PDF</a>]
</li>
<li>
<a name="stanisic:hal-01011633"></a>Luka Stanisic, Samuel Thibault, Arnaud Legrand, Brice Videau,  and Jean-François Méhaut<br/>
<strong>Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures</strong><br/>
In <em>Euro-Par - 20th International Conference on Parallel Processing</em>, Porto, Portugal, August 2014<br/>