<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<HEAD>
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<TITLE>StarPU</TITLE>
<link rel="stylesheet" type="text/css" href="style.css" />
<link rel="Shortcut icon" href="http://www.inria.fr/extension/site_inria/design/site_inria/images/favicon.ico" type="image/x-icon" />
</HEAD>

<body>

<div class="title">
<h1><a href="./">StarPU</a></h1>
<h2>A Unified Runtime System for Heterogeneous Multicore Architectures</h2>
</div>

<div class="menu">
<a href="https://team.inria.fr/storm/">STORM TEAM</a> |
&nbsp; &nbsp; &nbsp;
|
<a href="#overview">Overview</a> |
<a href="#news">News</a> |
<a href="#contact">Contact</a> |
<a href="people/">People</a> |
<a href="#features">Features</a> |
<a href="#software">Software</a> |
<a href="#tryit">Try it!</a> |
<a href="help/">Help</a> |
<a href="#publications">Publications</a> |
<a href="internships/">Jobs/Interns</a> |
<a href="files/">Download</a> |
<a href="tutorials/">Tutorials</a> |
<a href="https://gforge.inria.fr/plugins/mediawiki/wiki/starpu/index.php/Main_Page">Intranet</a>
</div>

<div class="section" id="overview">
<h3>Overview</h3>
  <p>
<span class="important">StarPU is a task programming library for hybrid architectures</span>
<ol>
<li><b>The application provides algorithms and constraints</b>
    <ul>
    <li>CPU/GPU implementations of tasks</li>
    <li>A graph of tasks, using StarPU's high-level <b>GCC plugin</b> pragmas, StarPU's rich <b>C/C++ API</b>, or <b>OpenMP pragmas</b>.</li>
    </ul>
<br>
</li>
<li><b>StarPU handles run-time concerns</b>
    <ul>
    <li>Task dependencies</li>
    <li>Optimized heterogeneous scheduling</li>
    <li>Optimized data transfers and replication between main memory and discrete memories</li>
    <li>Optimized cluster communications</li>
    </ul>
</li>
</ol>
</p>
<p>
<span class="important">Rather than handling low-level issues, <b>programmers can concentrate on algorithmic concerns!</b></span>
</p>

<p>
<span class="note">The StarPU documentation is available in
<a href="./doc/starpu.pdf">PDF</a> and in <a href="./doc/html/">HTML</a>.</span>
Please note that these documents are up-to-date with the latest release of
StarPU.
</p>
<p>
The latest documentation in <a href="./testing/master/doc/starpu.pdf">PDF</a>
and <a href="./testing/master/doc/html">HTML</a> is updated every day and covers
the latest developments, which may not be available in the latest release.
</p>
</div>

<div class="section emphasize newslist" id="news">
<h3>News</h3>
<p>
April 2018 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      1.2.4 release of StarPU is now available!</b></a>
      The 1.2 release series notably brings out-of-core support, MIC Xeon
      Phi support, OpenMP runtime support, and a new internal
      communication system for MPI.
</p>
<p>
March 2018 <b>&raquo;&nbsp;</b>A <a href="https://events.prace-ri.eu/event/681/">tutorial</a>
      "Runtime systems for heterogeneous platform programming" will be
      given at the Maison de la Simulation in June 2018.
</p>
<p>
November 2017 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      1.2.3 release of StarPU is now available!</b></a>
      The 1.2 release series notably brings out-of-core support, MIC Xeon
      Phi support, OpenMP runtime support, and a new internal
      communication system for MPI.
</p>
<p>
May 2017 <b>&raquo;&nbsp;</b> StarPU is part of the HPCLib project,
  which aims at performing static analysis of the Inria Bordeaux
  Sud-Ouest solver stack. See
  the <a href="https://sonarqube.bordeaux.inria.fr/sonarqube/dashboard?id=storm%3Astarpu%3Arelease%3Av1.2">StarPU
  page</a> on the Inria SonarQube server.
</p>
<p>
May 2017 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      1.2.2 release of StarPU is now available!</b></a>
      The 1.2 release series notably brings out-of-core support, MIC Xeon
      Phi support, OpenMP runtime support, and a new internal
      communication system for MPI.
</p>
<p>
April 2017 <b>&raquo;&nbsp;</b>A <a href="https://events.prace-ri.eu/event/618/">tutorial</a>
      "Runtime systems for heterogeneous platform programming" will be
      given at the Maison de la Simulation in May 2017.
</p>
<p>
March 2017 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      1.2.1 release of StarPU is now available!</b></a>
      The 1.2 release series notably brings out-of-core support, MIC Xeon
      Phi support, OpenMP runtime support, and a new internal
      communication system for MPI.
</p>
</div>

<div class="section emphasizebot" style="text-align: right; font-style: italic;">
Get the latest StarPU news by subscribing to the <a href="http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/starpu-announce">starpu-announce mailing list</a>.
See also the full <a href="news/">news</a>.
</div>

<div class="section" id="video">
<h3>Video Conference</h3>
<p>
A video recording (26 minutes) of a <a href="http://www.x.org/wiki/Events/XDC2014/XDC2014ThibaultStarPU/">presentation at the XDC2014 conference</a> gives an overview of StarPU
(<a href="http://www.x.org/wiki/Events/XDC2014/XDC2014ThibaultStarPU/xdc_starpu.pdf">slides</a>):
</p>
<center>
<iframe width="420" height="315" src="https://www.youtube.com/embed/frsWSqb8UJU" frameborder="0" allowfullscreen></iframe>
</center>
</div>

<div class="section" id="tutorial">
<h3>Tutorial material</h3>
<p>
The latest tutorial material for StarPU is composed of two parts:
<ul>
<li><a href="http://starpu.gforge.inria.fr/tutorials/2016-06-PATC/slides/01_introducing_starpu.pdf">Introducing StarPU</a></li>
<li><a href="http://starpu.gforge.inria.fr/tutorials/2016-06-PATC/slides/02_mastering_starpu.pdf">Mastering StarPU</a></li>
</ul>
</p>
</div>

<div class="section" id="slides">
<h3>Set of slides</h3>
<p>
A <a href="slides.pdf">set of slides</a> is also available to get an overview of
StarPU.
</p>
</div>

<div class="section" id="contact">
<h3>Contact</h3>
<p>For any questions regarding StarPU, please contact the StarPU developers mailing list.</p>
<pre>
<a href="mailto:starpu-devel@lists.gforge.inria.fr?subject=StarPU">starpu-devel@lists.gforge.inria.fr</a>
</pre>
<p>Details of the <a href="people/">StarPU team people</a> are also available.</p>
</div>

<div class="section" id="features">
<h3>Features</h3>

<h4>Portability</h4>
  <p>
Portability is obtained by means of a unified abstraction of the machine.
StarPU offers a unified offloadable task abstraction named <em>codelet</em>. Rather
than rewriting the entire code, programmers can encapsulate existing functions
within codelets. When a codelet can run on heterogeneous architectures, <b>it
is possible to specify one function for each architecture</b> (e.g. one function
for CUDA and one function for CPUs). StarPU takes care of scheduling and
executing those codelets as efficiently as possible over the entire machine, including
multiple GPUs.
One can even specify <b>several functions for each architecture</b> (new in
v1.0) as well as
<b>parallel implementations</b> (e.g. in OpenMP), and StarPU will
automatically determine which version is best for each input size (new in v0.9).
  </p>
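<p>
To make the codelet idea concrete, here is a toy sketch in plain C. It does not use StarPU:
the names <tt>toy_codelet</tt>, <tt>choose_impl</tt> and <tt>run_scal</tt>, as well as the
linear cost figures, are hypothetical. One operation has a CPU implementation and a stand-in
for a GPU implementation, and the implementation with the smallest predicted time is chosen
for each input size, mimicking how performance models let StarPU pick a version.
</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical architecture tags for this sketch only. */
enum arch { ARCH_CPU, ARCH_GPU };

typedef void (*kernel_fn)(float *x, size_t n, float a);

/* One implementation of an operation, with a toy linear cost model:
   predicted time = launch_us + per_elem_us * n (microseconds). */
struct impl {
    enum arch arch;
    kernel_fn fn;
    double launch_us;
    double per_elem_us;
};

/* Toy "codelet": several implementations of the same operation. */
struct toy_codelet {
    const char *name;
    size_t nimpls;
    const struct impl *impls;
};

/* CPU implementation: scale a vector in place. */
static void scal_cpu(float *x, size_t n, float a) {
    for (size_t i = 0; i < n; i++)
        x[i] *= a;
}

/* Stand-in for a GPU kernel: computes the same result on the host,
   since this sketch does not use CUDA. */
static void scal_gpu_standin(float *x, size_t n, float a) {
    for (size_t i = 0; i < n; i++)
        x[i] = a * x[i];
}

/* Pick the implementation with the smallest predicted time for size n. */
static const struct impl *choose_impl(const struct toy_codelet *cl, size_t n) {
    assert(cl->nimpls > 0);
    const struct impl *best = &cl->impls[0];
    double best_us = best->launch_us + best->per_elem_us * (double)n;
    for (size_t i = 1; i < cl->nimpls; i++) {
        const struct impl *im = &cl->impls[i];
        double t = im->launch_us + im->per_elem_us * (double)n;
        if (t < best_us) {
            best = im;
            best_us = t;
        }
    }
    return best;
}

/* Made-up cost figures: the "GPU" pays a launch cost but is 100x faster
   per element, so it only wins for large vectors. */
static const struct impl scal_impls[] = {
    { ARCH_CPU, scal_cpu,          0.0, 1e-3 },
    { ARCH_GPU, scal_gpu_standin, 20.0, 1e-5 },
};
static const struct toy_codelet scal_cl = { "scal", 2, scal_impls };

/* Run the operation with the implementation chosen for this size;
   returns the architecture that was selected. */
static enum arch run_scal(float *x, size_t n, float a) {
    const struct impl *im = choose_impl(&scal_cl, n);
    im->fn(x, n, a);
    return im->arch;
}
```

<p>
With these made-up figures the crossover lies around 20,000 elements: smaller vectors go to
the CPU version and larger ones to the "GPU" version. In StarPU, the equivalent figures come
from auto-tuned performance models rather than hard-coded constants.
</p>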

<h4>Data transfers</h4>
  <p>
To relieve programmers from the burden of explicit data transfers, a high-level
data management library enforces memory coherency over the machine: before a
codelet starts (e.g. on an accelerator), all its <b>data are automatically made
available on the compute resource</b>. Data are also kept on e.g. GPUs as long as
they are needed for further tasks. When a device runs out of memory, StarPU uses
an LRU strategy to <b>evict unused data</b>. StarPU also takes care of <b>automatically
prefetching</b> data, which makes it possible to <b>overlap data transfers with computations</b>
(including <b>GPU-GPU direct transfers</b>) and make the most of the architecture.
  </p>
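<p>
The LRU eviction policy mentioned above can be sketched as follows. This is a self-contained
toy in plain C, not StarPU code: the <tt>device</tt> structure, its three-slot capacity and the
function names are hypothetical, and writing modified data back to main memory before eviction
is left out.
</p>

```c
#include <assert.h>
#include <stdbool.h>

#define SLOTS 3  /* toy device capacity, in data blocks */

/* Toy model of a device memory holding at most SLOTS data blocks.
   It only illustrates LRU eviction; it is not StarPU's allocator. */
struct device {
    int id[SLOTS];             /* block held by each slot, -1 if free */
    unsigned long last[SLOTS]; /* logical time of the slot's last use */
    unsigned long clock;
    int transfers;             /* blocks copied to the device */
    int evictions;             /* blocks evicted to make room */
};

static void device_init(struct device *d) {
    for (int i = 0; i < SLOTS; i++) {
        d->id[i] = -1;
        d->last[i] = 0;
    }
    d->clock = 0;
    d->transfers = 0;
    d->evictions = 0;
}

/* Make block `data` available on the device before a task uses it.
   Returns true if it had to be transferred. */
static bool device_acquire(struct device *d, int data) {
    assert(data >= 0);
    d->clock++;
    for (int i = 0; i < SLOTS; i++) {
        if (d->id[i] == data) {          /* still resident: no transfer */
            d->last[i] = d->clock;
            return false;
        }
    }
    int victim = -1;
    for (int i = 0; i < SLOTS; i++) {    /* prefer a free slot */
        if (d->id[i] == -1) {
            victim = i;
            break;
        }
    }
    if (victim == -1) {                  /* full: evict the least recently used block */
        victim = 0;
        for (int i = 1; i < SLOTS; i++)
            if (d->last[i] < d->last[victim])
                victim = i;
        d->evictions++;
    }
    d->id[victim] = data;
    d->last[victim] = d->clock;
    d->transfers++;
    return true;
}
```

<p>
For example, acquiring blocks 0, 1, 2, then 0 again, then 3 evicts block 1, the least recently
used one, while block 0 stays on the device because it was used again in the meantime.
</p>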

<h4>Dependencies</h4>
  <p>
Dependencies between tasks can be expressed in several ways, to give the
programmer as much flexibility as possible:
  <ul>
    <li><b>implicitly</b>, from RAW, WAW, and WAR data dependencies;</li>
    <li>explicitly through <b>tags</b>, which act as rendez-vous points between
    tasks (and can thus involve tasks which have not been created yet);</li>
    <li><b>explicitly</b> between pairs of tasks.</li>
  </ul>
  </p>
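<p>
The implicit RAW/WAW/WAR rule listed above can be sketched in a few lines: tasks are submitted
in sequential program order, and for each piece of data it is enough to remember the last
writer and the readers since that write. This is a centralized toy in plain C with
hypothetical names (<tt>submit</tt>, <tt>ACC_R</tt>, <tt>ACC_W</tt>); StarPU's own bookkeeping
is decentralized and its API is different.
</p>

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DATA 8
#define MAX_TASKS 64

/* Hypothetical access modes for this sketch. */
enum access { ACC_R = 1, ACC_W = 2, ACC_RW = 3 };

struct tracker {
    int last_writer[MAX_DATA];   /* last task writing each data, -1 if none */
    uint64_t readers[MAX_DATA];  /* tasks reading it since that write */
    uint64_t deps[MAX_TASKS];    /* bit p of deps[t]: task t waits for task p */
    int ntasks;
};

static void tracker_init(struct tracker *tr) {
    for (int i = 0; i < MAX_DATA; i++) {
        tr->last_writer[i] = -1;
        tr->readers[i] = 0;
    }
    tr->ntasks = 0;
}

/* Submit a task accessing data[i] in mode modes[i], in sequential program
   order; returns its id. Its dependencies are recorded in tr->deps. */
static int submit(struct tracker *tr, int n, const int *data, const enum access *modes) {
    assert(tr->ntasks < MAX_TASKS);
    int t = tr->ntasks++;
    uint64_t dep = 0;
    for (int i = 0; i < n; i++) {
        int d = data[i];
        assert(d >= 0 && d < MAX_DATA);
        if (tr->last_writer[d] >= 0)            /* RAW (read) or WAW (write) */
            dep |= UINT64_C(1) << tr->last_writer[d];
        if (modes[i] & ACC_W)                   /* WAR: wait for earlier readers */
            dep |= tr->readers[d];
    }
    for (int i = 0; i < n; i++) {               /* then update the per-data state */
        int d = data[i];
        if (modes[i] & ACC_W) {
            tr->last_writer[d] = t;
            tr->readers[d] = 0;
        } else {
            tr->readers[d] |= UINT64_C(1) << t;
        }
    }
    tr->deps[t] = dep;
    return t;
}
```

<p>
For example, submitting W(A), then R(A) W(B), then R(A), then W(A), then R(B) makes the fourth
task wait for the first one (WAW) and for the two readers in between (WAR), while the fifth task
only waits for the second one, the last writer of B (RAW).
</p>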
  <p>
  These dependencies are computed in a completely decentralized way.
  </p>
  <p>
StarPU also supports an OpenMP-like <a href="doc/html/DataManagement.html#DataReduction">reduction</a> access mode (new in v0.9).
  </p>
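<p>
The idea behind the reduction access mode can be sketched in plain C: each worker accumulates
into its own private copy, initialized to the neutral element, and the copies are combined only
when the result is needed. The names below are hypothetical and the workers are simulated by a
loop; in StarPU the application instead provides initialization and reduction codelets for the
data, as described in the linked documentation.
</p>

```c
#include <assert.h>

#define NWORKERS 4  /* simulated workers */

/* Private per-worker copies of a scalar accessed in reduction mode. */
struct redux_sum {
    double priv[NWORKERS];
};

/* Initialization: each private copy starts at the neutral element of +. */
static void redux_init(struct redux_sum *r) {
    for (int w = 0; w < NWORKERS; w++)
        r->priv[w] = 0.0;
}

/* A task running on worker w only touches that worker's copy, so tasks
   on different workers never wait for each other on this data. */
static void task_contribute(struct redux_sum *r, int w, double v) {
    assert(w >= 0 && w < NWORKERS);
    r->priv[w] += v;
}

/* Reduction: combine the private copies once the result is needed. */
static double redux_finish(const struct redux_sum *r) {
    double acc = 0.0;
    for (int w = 0; w < NWORKERS; w++)
        acc += r->priv[w];
    return acc;
}

/* Sum 1..n, spreading the contributing tasks round-robin over workers. */
static double sum_with_reduction(int n) {
    struct redux_sum r;
    redux_init(&r);
    for (int i = 1; i <= n; i++)
        task_contribute(&r, i % NWORKERS, (double)i);
    return redux_finish(&r);
}
```

<p>
Combining the copies in a different order than a sequential loop may change floating-point
rounding; the sum of 1 to 100 stays exactly 5050 here only because all values are small integers.
</p>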
  <p>
It also supports a <a href="doc/html/DataManagement.html#DataCommute">commute</a> access mode to allow data access commutativity (new in v1.2).
  </p>

<h4>Heterogeneous Scheduling</h4>
  <p>
StarPU obtains
portable performance by efficiently (and easily) using all computing resources
at the same time. StarPU also takes advantage of the <b>heterogeneous</b> nature of a
machine, for instance by using scheduling strategies based on auto-tuned
performance models. These determine the relative performance achieved
by the different processing units for the various kinds of task, and thus
make it possible to <b>automatically let each processing unit execute the tasks it is best at</b>.
Various strategies and variants are available, such as dmda (a data-locality-aware
MCT, i.e. Minimum Completion Time, strategy, similar to HEFT but starting to execute
tasks before the whole task graph is submitted, thus allowing dynamic task submission
and a decentralized scheduler), eager (a simple centralized queue), and decentralized
locality-aware work stealing. Some of them are centralized, but
most of them are <b>completely distributed</b>.
The overhead per task is typically on the order of
a microsecond. Tasks should thus be a few orders of magnitude
longer, such as 100 microseconds or 1 millisecond, to make the overhead
negligible.
  </p>
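<p>
The MCT idea behind dmda can be sketched in a few lines: for each task, estimate on every
worker when the task would finish (end of the worker's queue, plus a data transfer if the input
lives elsewhere, plus the predicted execution time) and pick the earliest. This toy C version
uses hypothetical names and fixed predictions, and for simplicity charges a transfer to every
task run away from its data; StarPU's dmda relies on auto-tuned performance models and handles
many more details.
</p>

```c
#include <assert.h>
#include <stddef.h>

enum arch { ARCH_CPU = 0, ARCH_GPU = 1 };

struct worker {
    enum arch arch;
    double ready_at;   /* when the worker's already-queued tasks end (us) */
    int node;          /* memory node the worker computes from */
};

/* Per-task predictions, as a performance model would provide them. */
struct task {
    double exec_us[2];   /* predicted execution time, indexed by arch */
    int data_node;       /* memory node currently holding the input */
    double transfer_us;  /* predicted cost of moving the input elsewhere */
};

/* Minimum Completion Time: queue the task on the worker where it is
   expected to finish first. For simplicity the transfer is assumed not
   to overlap with the worker's queue. Returns the chosen worker index. */
static size_t mct_pick(struct worker *w, size_t nw, const struct task *t, double now) {
    assert(nw > 0);
    size_t best = 0;
    double best_end = 0.0;
    for (size_t i = 0; i < nw; i++) {
        double start = w[i].ready_at > now ? w[i].ready_at : now;
        double xfer = (w[i].node == t->data_node) ? 0.0 : t->transfer_us;
        double end = start + xfer + t->exec_us[w[i].arch];
        if (i == 0 || end < best_end) {
            best = i;
            best_end = end;
        }
    }
    w[best].ready_at = best_end;   /* the task is now queued on that worker */
    return best;
}
```

<p>
With two CPU workers and one GPU worker, and tasks predicted at 100 microseconds on a CPU,
10 on the GPU plus a 20-microsecond transfer, the first three tasks go to the GPU and the next
two to the CPUs once the GPU queue is long enough. The same kind of arithmetic explains the
overhead guideline above: with about 1 microsecond of runtime overhead, a 100-microsecond task
loses roughly 1% (1/101) of its time, and a 1-millisecond task roughly 0.1%.
</p>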

<h4>Clusters</h4>
  <p>
To deal with clusters, StarPU integrates nicely with <a
	href="doc/html/MPISupport.html">MPI</a>, through explicit or implicit
support, according to the application's preference.

    <ul>
        <li>Explicit network communication requests can be emitted, which will
then be <b>automatically combined and overlapped</b> with the intra-node data
transfers and computation.</li>
        <li>The application can also just provide the whole task graph and a
data distribution over MPI nodes, and StarPU will automatically determine which
MPI node should execute which task, and <b>automatically generate all required
MPI communications</b> accordingly (new in v0.9). We have obtained excellent
scaling on a 256-node cluster with GPUs, but have not yet had the opportunity
to test on a larger cluster. We have however measured that with naive task
submission, it should scale to a thousand nodes, and with pruning-tuned task
submission, to about a <b>million nodes</b>.</li>
        <li>Starting with v1.3, the application can also just provide the
whole task graph, and let StarPU decide the data distribution and task
distribution, thanks to a master-slave mechanism. By nature, however, this
has more limited scalability than the fully distributed paradigm mentioned
above.</li>
    </ul>
  </p>
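<p>
The implicit mode can be pictured with a simplified owner-computes rule: since every MPI node
sees the same task graph and data distribution, each node can decide on its own where a task
runs and which messages it must post, without any central coordinator. The sketch below is
plain C with hypothetical names (<tt>exec_rank</tt>, <tt>comms_for</tt>); it assumes that a task
runs on the node owning the data it writes, and ignores everything else the runtime handles
(caching of received data, write-back, and so on).
</p>

```c
#include <assert.h>

#define MAX_ACCESS 4

/* Hypothetical access modes for this sketch. */
enum access { ACC_R = 1, ACC_W = 2, ACC_RW = 3 };

/* A task's data accesses; owner[d] gives the MPI rank owning data d. */
struct dtask {
    int n;
    int data[MAX_ACCESS];
    enum access mode[MAX_ACCESS];
};

/* Owner-computes: the task runs on the rank owning the first data it
   writes. Returns -1 if it writes nothing (not handled in this sketch). */
static int exec_rank(const struct dtask *t, const int *owner) {
    for (int i = 0; i < t->n; i++)
        if (t->mode[i] & ACC_W)
            return owner[t->data[i]];
    return -1;
}

/* Messages rank `me` must post for task t: one send per input it owns
   that the executing rank reads, one receive per remote input if `me`
   is the executing rank. Every rank can compute this on its own. */
static void comms_for(const struct dtask *t, const int *owner, int me,
                      int *sends, int *recvs) {
    int where = exec_rank(t, owner);
    assert(where >= 0);
    *sends = 0;
    *recvs = 0;
    for (int i = 0; i < t->n; i++) {
        if (!(t->mode[i] & ACC_R))
            continue;                  /* write-only data need no input transfer */
        int o = owner[t->data[i]];
        if (o == where)
            continue;                  /* already local to the executing rank */
        if (o == me)
            (*sends)++;
        if (where == me)
            (*recvs)++;
    }
}
```

<p>
For a <tt>gemm</tt>-like task that reads blocks owned by ranks 0 and 1 and updates a block owned
by rank 1, rank 1 executes the task, rank 0 posts one send and rank 1 posts one receive.
</p>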

<h4>Out of core</h4>
  <p>
When main memory is not large enough for the working set, one may have to resort to
using disks. StarPU makes this seamless thanks to its <a href="doc/html/OutOfCore.html">out of core support</a> (new in v1.2).
StarPU will <b>automatically evict</b> data from the main memory in advance, and
<b>prefetch back</b> required data before it is needed by tasks.
  </p>
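<p>
The evict and prefetch-back mechanism can be illustrated with standard C file I/O. This is a toy
sketch, not StarPU's disk support: the <tt>spillable</tt> type and its functions are hypothetical,
and an anonymous <tt>tmpfile()</tt> serves as the backing store.
</p>

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* A buffer that can be evicted to disk and prefetched back. */
struct spillable {
    double *mem;   /* main-memory copy, NULL while evicted */
    size_t n;      /* number of elements */
    FILE *disk;    /* backing file, created on first eviction */
};

/* Evict: write the buffer to disk, then free the main-memory copy.
   Returns 0 on success, -1 on I/O failure. */
static int spill_evict(struct spillable *b) {
    if (b->mem == NULL)
        return 0;                              /* already evicted */
    if (b->disk == NULL && (b->disk = tmpfile()) == NULL)
        return -1;
    rewind(b->disk);
    if (fwrite(b->mem, sizeof *b->mem, b->n, b->disk) != b->n || fflush(b->disk) != 0)
        return -1;
    free(b->mem);
    b->mem = NULL;
    return 0;
}

/* Prefetch: read the buffer back before a task needs it.
   Returns 0 on success, -1 on allocation or I/O failure. */
static int spill_prefetch(struct spillable *b) {
    if (b->mem != NULL)
        return 0;                              /* already resident */
    assert(b->disk != NULL);                   /* must have been evicted first */
    double *m = malloc(b->n * sizeof *m);
    if (m == NULL)
        return -1;
    rewind(b->disk);
    if (fread(m, sizeof *m, b->n, b->disk) != b->n) {
        free(m);
        return -1;
    }
    b->mem = m;
    return 0;
}
```

<p>
This sketch performs the reads and writes synchronously; the point of doing it "in advance", as
described above, is to issue them ahead of time so that disk I/O overlaps with computation.
</p>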

<h4>Extensions to the C Language</h4>
<p>
  StarPU comes with a GCC plug-in
  that <a href="doc/html/cExtensions.html">extends the C programming
  language</a> with pragmas and attributes that make it easy
  to <b>annotate a sequential C program to turn it into a parallel
  StarPU program</b> (new in v1.0).
</p>

<h4>OpenMP 4-compatible interface</h4>
<p>
  <a href="http://kstar.gforge.inria.fr/">K'Star</a> provides an OpenMP
  4-compatible interface on top of StarPU. This makes it possible to simply process OpenMP
  applications with the K'Star source-to-source compiler, then build them with the
  usual compiler; the result will use the StarPU runtime.
</p>
<p>
  K'Star also provides some extensions to the OpenMP 4 standard, to let the
  StarPU runtime perform online optimizations.
</p>
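<p>
As an illustration of the kind of OpenMP 4 input such a compiler is meant to handle, here is
plain OpenMP 4.0 task code using <tt>depend</tt> clauses (standard OpenMP, not a K'Star
extension; the function name is hypothetical). As with StarPU's implicit dependencies, the
ordering is derived from the data each task reads or writes. Built without OpenMP support, the
pragmas are ignored and the code runs sequentially with the same result.
</p>

```c
#include <assert.h>

/* Two producer tasks and one consumer, ordered by OpenMP 4.0 depend
   clauses: the producers may run concurrently, and the consumer starts
   only after both have written their outputs. */
static int produce_and_sum(void) {
    int a = 0, b = 0, c = 0;
#pragma omp parallel
#pragma omp single
    {
#pragma omp task depend(out: a) shared(a)
        a = 20;
#pragma omp task depend(out: b) shared(b)
        b = 22;
#pragma omp task depend(in: a, b) depend(out: c) shared(a, b, c)
        c = a + b;
#pragma omp taskwait
    }
    assert(c == a + b);  /* the consumer saw both producers' values */
    return c;
}
```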

<h4>OpenCL-compatible interface</h4>
<p>
  StarPU provides an <a href="doc/html/SOCLOpenclExtensions.html">OpenCL-compatible interface, SOCL</a>,
  which makes it possible to simply run OpenCL applications on top of StarPU (new in v1.0).
</p>

<h4>Simulation support</h4>
<p>
  StarPU can very accurately simulate an application execution
  and measure the resulting performance thanks to the
  <a href="http://simgrid.gforge.inria.fr">SimGrid simulator</a> (new in v1.1). This makes it possible
  to quickly experiment with various scheduling heuristics, various application
  algorithms, and even various platforms (available GPUs and CPUs, available
  bandwidth)!
</p>

<h4>All in all</h4>
  <p>
All this means that, with the help
of <a href="doc/html/cExtensions.html">StarPU's extensions to the C
language</a>, the following sequential source code of a tiled version of
the classical Cholesky factorization algorithm using BLAS is also valid
StarPU code, possibly running on all the CPUs and GPUs. Given a data
distribution over MPI nodes, it even becomes a distributed version!
  </p>

  <tt><pre>
for (k = 0; k < tiles; k++) {
  potrf(A[k][k]);
  for (m = k+1; m < tiles; m++)
    trsm(A[k][k], A[m][k]);
  for (m = k+1; m < tiles; m++)
    syrk(A[m][k], A[m][m]);
  for (m = k+1; m < tiles; m++)
    for (n = k+1; n < m; n++)
      gemm(A[m][k], A[n][k], A[m][n]);
}</pre></tt>

<h4>Supported Architectures</h4>
<ul>
<li>SMP/Multicore Processors (x86, PPC, ARM, ... all Debian architectures have been tested)</li>
<li>NVIDIA GPUs (e.g. heterogeneous multi-GPU), with pipelined and concurrent kernel execution support (new in v1.2) and GPU-GPU direct transfers (new in v1.1)</li>
<li>OpenCL devices</li>
<li>Cell Processors (experimental)</li>
<li>Intel SCC (experimental, new in v1.2)</li>
<li>Intel MIC / Xeon Phi (new in v1.2)</li>
</ul>

<h4>Supported Operating Systems</h4>
<ul>
<li>GNU/Linux</li>
<li>Mac OS X</li>
<li>Windows</li>
</ul>

<h4>Performance analysis tools</h4>
  <p>
In order to understand the performance obtained by StarPU, it is helpful to
visualize the actual behaviour of the applications running on complex
heterogeneous multicore architectures.  StarPU therefore makes it possible to
generate Pajé traces that can be visualized thanks to the <a
href="http://vite.gforge.inria.fr/"><b>ViTE</b> (Visual Trace Explorer) open
source tool.</a>
  </p>

<p>
<b>Example:</b> LU decomposition on 3 CPU cores and a GPU using a very simple
greedy scheduling strategy. The green (resp. red) sections indicate when the
corresponding processing unit is busy (resp. idle). The number of ready tasks
is displayed in the curve on top: it appears that with this scheduling policy,
the algorithm suffers from a lack of parallelism. <b>Measured speed: 175.32
GFlop/s</b>
<center><a href="./images/greedy-lu-16k-fx5800.png"> <img src="./images/greedy-lu-16k-fx5800.png" alt="LU decomposition (greedy)" width="75%"></a></center>
</p>

<p>
This second trace depicts the behaviour of the same application using a
scheduling strategy trying to minimize load imbalance thanks to auto-tuned
performance models and to keep data locality as high as possible. In this
example, the Pajé trace clearly shows that this scheduling strategy outperforms
the previous one in terms of processor usage. <b>Measured speed: 239.60
GFlop/s</b>
<center><a href="./images/dmda-lu-16k-fx5800.png"><img src="./images/dmda-lu-16k-fx5800.png" alt="LU decomposition (dmda)" width="75%"></a></center>
</p>

<p>
<a href="http://www.hlrs.de/temanejo">Temanejo</a> can be used to debug the task
graph, as shown below (new in v1.1).
</p>

<center>
<a href="images/temanejo.png"><img src="images/temanejo.png" width="50%"/></a>
</center>

</div>

<div class="section" id="software">
<h3>Software using StarPU</h3>

<p>
Several software packages are known to be able to use StarPU to tackle heterogeneous
architectures. Here is a non-exhaustive list (feel free to ask to be added to the
list!):
</p>

<ul>
	<li><a href="https://project.inria.fr/chameleon/">Chameleon</a>, dense linear algebra library</li>
	<li><a href="http://github.com/ecrc/exageostat">ExaGeoStat</a>, Machine learning framework for Climate/Weather prediction applications</li>
	<li><a href="http://github.com/ecrc/hicma">HiCMA</a>, Low-rank general linear algebra library</li>
	<li><a href="http://kstar.gforge.inria.fr/">K'Star</a>, OpenMP 4-compatible interface on top of StarPU.</li>
	<li><a href="http://github.com/ecrc/ksvd">KSVD</a>, dense SVD on distributed-memory manycore systems</li>
	<li><a href="http://icl.cs.utk.edu/magma/">MAGMA</a>, dense linear algebra library, starting from version 1.1</li>
	<li><a href="https://gitlab.inria.fr/solverstack/maphys">MaPHyS</a>, Massively Parallel Hybrid Solver</li>
	<li><a href="http://github.com/ecrc/moao">MOAO</a>, HPC framework for computational astronomy, servicing the European Extremely Large Telescope and the Japanese Subaru Telescope</li>
	<li><a href="http://pastix.gforge.inria.fr/">PaStiX</a>, sparse linear algebra library, starting from version 5.2.1</li>
	<li><a href="http://github.com/ecrc/qdwh">QDWH</a>, QR-based Dynamically Weighted Halley</li>
	<li><a href="http://buttari.perso.enseeiht.fr/qr_mumps/">qr_mumps</a>, sparse linear algebra library</li>
	<li><a href="http://scalfmm-public.gforge.inria.fr/doc/">ScalFMM</a>, N-body interaction simulation using the Fast Multipole Method. </li>
	<li><a href="https://tel.archives-ouvertes.fr/tel-01410049/">SCHNAPS</a>, Solver for Conservative Hypebolic Non-linear systems Applied to PlasmaS. </li>
414 415
	<li><a href="https://hal.archives-ouvertes.fr/hal-01086246">SignalPU</a>, a Dataflow-Graph-specific programming model. </li>
	<li><a href="http://www.ida.liu.se/~chrke/skepu/">SkePU</a>, a skeleton programming framework.</li>
	<li><a href="http://github.com/ecrc/stars-h">STARS-H</a>, HPC low-rank matrix market</li>
</ul>

<p>
You can find <a href="#PublicationsOnApplications">below</a> the list of publications related to applications using StarPU.
</p>

</div>

<div class="section" id="tryit">
<h3>Give it a try!</h3>
<p>
You can easily try out the performance of, for instance, the Cholesky
factorization. Make sure that the pkg-config and
<a href="http://www.open-mpi.org/projects/hwloc/">hwloc</a>
software are installed (hwloc provides
proper CPU control), and that BLAS kernels for your computation units are installed and configured in
your environment (e.g. MKL for CPUs and CUBLAS for GPUs).
</p>

<tt><pre>
$ wget http://starpu.gforge.inria.fr/files/starpu-someversion.tar.gz
$ tar xf starpu-someversion.tar.gz
$ cd starpu-someversion
$ ./configure
$ make -j 12
$ STARPU_SCHED=dmdas ./examples/cholesky/cholesky_implicit -size $((960*40)) -nblocks 40
$ STARPU_SCHED=dmdas mpirun -np 4 -machinefile mymachines ./mpi/examples/matrix_decomposition/mpi_cholesky_distributed -size $((960*40*4)) -nblocks $((40*4))</pre></tt>

<p>Note that the dmdas scheduler uses performance models, and thus needs
calibration runs before exhibiting optimized performance (until the "model
something is not calibrated enough" messages go away).</p>

<p>To get a glimpse of what happened, you can get an execution trace by
installing
<a href="http://savannah.nongnu.org/projects/fkt">FxT</a>
and <a href="http://vite.gforge.inria.fr/">ViTE</a>, and enabling traces:
</p>

<tt><pre>
$ ./configure --with-fxt
$ make -j 12
$ STARPU_SCHED=dmdas ./examples/cholesky/cholesky_implicit -size $((960*40)) -nblocks 40
$ ./tools/starpu_fxt_tool -i /tmp/prof_file_${USER}_0
$ vite paje.trace
</pre></tt>

<p>
Starting with StarPU 1.1, it is also possible to reproduce the performance that
we show in our articles on our machines, by installing SimGrid and then using
the simulation mode of StarPU with the performance models of our machines:
</p>
  <tt><pre>
$ ./configure --enable-simgrid
$ make -j 12
$ STARPU_PERF_MODEL_DIR=$PWD/tools/perfmodels/sampling STARPU_HOSTNAME=mirage STARPU_SCHED=dmdas ./examples/cholesky/cholesky_implicit -size $((960*40)) -nblocks 40
# size	ms	GFlops
38400	9915	1903.7</pre></tt>
<p>(MPI simulation is not supported yet)</p>
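<p>
The GFlops column of the simulated run above can be checked by hand. Assuming the usual
n<sup>3</sup>/3 floating-point operation count for a Cholesky factorization (an assumption of this
sketch; the helper name is hypothetical), size 38400 (that is, 960*40) and 9915 ms give about
1903.6 GFlop/s, consistent with the printed 1903.7 once the rounding of the time to the
millisecond is taken into account.
</p>

```c
#include <assert.h>

/* GFlop/s for a Cholesky factorization of an n-by-n matrix that took ms
   milliseconds, assuming the usual n^3/3 floating-point operation count. */
static double cholesky_gflops(double n, double ms) {
    assert(n > 0.0 && ms > 0.0);
    double flops = n * n * n / 3.0;
    return flops / (ms * 1e-3) / 1e9;
}
```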

</div>
<div class="section" id="publications">
<h3>Publications</h3>
<p>
All StarPU related publications are also
listed <a href="./publications">here</a>
with the corresponding BibTeX entries.
</p>

<p>
A good overview is available in
the following <a href="http://hal.archives-ouvertes.fr/inria-00467677">Research Report</a>.
</p>

<p>
If you need to cite StarPU, please
reference <a href="publications/Year/2011.html#AugThiNamWac11CCPE">[StarPU: A Unified Platform
    for Task Scheduling on Heterogeneous Multicore Architectures]</a>
for a general presentation. Other sub-sections below will give you
references for more specific aspects of StarPU.
</p>

<h4>General Presentations</h4> 
<a name="PublicationsGeneralPresentations"></a>
<ol>
<li>
<a name="Aug11Thesis"></a>Cédric Augonnet<br/>
<strong>Scheduling Tasks over Multicore machines enhanced with Accelerators: a Runtime System's Perspective</strong><br/>
PhD thesis, Université Bordeaux 1, 351 cours de la Libération --- 33405 TALENCE cedex, December 2011<br/>
[<a href="http://tel.archives-ouvertes.fr/tel-00777154">WWW</a>]
[<a href="http://tel.archives-ouvertes.fr/tel-00777154/document">PDF</a>]
</li>
<li>
<a name="AugThiNamWac11CCPE"></a>Cédric Augonnet, Samuel Thibault, Raymond Namyst,  and Pierre-André Wacrenier<br/>
<strong>StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures</strong><br/>
<em>Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par 2009</em>, 23:187-198, February 2011<br/>
[<a href="http://hal.inria.fr/inria-00550877">WWW</a>]
[<a href="http://hal.inria.fr/inria-00550877/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1002/cpe.1631">10.1002/cpe.1631</a>]
</li>
<li>
<a name="AugThiNamWac10RR7240"></a>Cédric Augonnet, Samuel Thibault,  and Raymond Namyst<br/>
<strong>StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines</strong><br/>
Technical Report 7240, INRIA, March 2010<br/>
[<a href="http://hal.inria.fr/inria-00467677">WWW</a>]
[<a href="http://hal.inria.fr/inria-00467677/document">PDF</a>]
</li>
<li>
<a name="Aug09Renpar19"></a>Cédric Augonnet<br/>
<strong>StarPU: un support exécutif unifié pour les architectures multicoeurs hétérogènes</strong><br/>
In <em>19èmes Rencontres Francophones du Parallélisme</em>, Toulouse / France, September 2009<br/>
Note: Best Paper Award<br/>
[<a href="http://hal.inria.fr/inria-00411581">WWW</a>]
[<a href="http://hal.inria.fr/inria-00411581/document">PDF</a>]
</li>
<li>
<a name="AugThiNamWac09Europar"></a>Cédric Augonnet, Samuel Thibault, Raymond Namyst,  and Pierre-André Wacrenier<br/>
<strong>StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures</strong><br/>
In <em>Proceedings of the 15th International Euro-Par Conference</em>, volume 5704 of <em>Lecture Notes in Computer Science</em>, Delft, The Netherlands, pages 863-874, August 2009<br/>
Springer<br/>
[<a href="http://hal.inria.fr/inria-00384363">WWW</a>]
[<a href="http://hal.inria.fr/inria-00384363/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-642-03869-3_80">10.1007/978-3-642-03869-3_80</a>]
</li>
<li>
<a name="AugNam08HPPC"></a>Cédric Augonnet and Raymond Namyst<br/>
<strong>A unified runtime system for heterogeneous multicore architectures</strong><br/>
In <em>Proceedings of the International Euro-Par Workshops 2008, HPPC'08</em>, volume 5415 of <em>Lecture Notes in Computer Science</em>, Las Palmas de Gran Canaria, Spain, pages 174-183, August 2008<br/>
Springer<br/>
<strong>ISBN:</strong> 978-3-642-00954-9<br/>
[<a href="http://hal.inria.fr/inria-00326917">WWW</a>]
[<a href="http://hal.inria.fr/inria-00326917/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-642-00955-6_22">10.1007/978-3-642-00955-6_22</a>]
</li>
<li>
<a name="Aug08Master"></a>Cédric Augonnet<br/>
<strong>Vers des supports d'exécution capables d'exploiter les machines multicoeurs hétérogènes</strong><br/>
DEA thesis, Université Bordeaux 1, June 2008<br/>
[<a href="http://hal.inria.fr/inria-00289361">WWW</a>]
[<a href="http://hal.inria.fr/inria-00289361/document">PDF</a>]
</li>
</ol>
<h4>On Composability</h4> 
<a name="PublicationsOnComposability"></a>
<ol>
<li>
<a name="hugo:tel-01162975"></a>Andra-Ecaterina Hugo<br/>
<strong>Composability of parallel codes on heterogeneous architectures</strong><br/>
PhD thesis, Université de Bordeaux, December 2014<br/>
[<a href="https://tel.archives-ouvertes.fr/tel-01162975">WWW</a>]
[<a href="https://tel.archives-ouvertes.fr/tel-01162975/file/HUGO_ANDRA_2014.pdf">PDF</a>]
</li>
<li>
<a name="AH13Renpar"></a>Andra Hugo<br/>
<strong>Le problème de la composition parallèle : une approche supervisée</strong><br/>
In <em>21èmes Rencontres Francophones du Parallélisme (RenPar'21)</em>, Grenoble, France, January 2013<br/>
[<a href="http://hal.inria.fr/hal-00773610">WWW</a>]
[<a href="http://hal.inria.fr/hal-00773610/document">PDF</a>]
</li>
<li>
<a name="hugo:hal-00824514"></a>Andra Hugo, Abdou Guermouche, Raymond Namyst,  and Pierre-André Wacrenier<br/>
<strong>Composing multiple StarPU applications over heterogeneous machines: a supervised approach</strong><br/>
In <em>Third International Workshop on Accelerators and Hybrid Exascale Systems</em>, Boston, USA, May 2013<br/>
[<a href="http://hal.inria.fr/hal-00824514">WWW</a>]
[<a href="http://hal.inria.fr/hal-00824514/document">PDF</a>]
</li>
<li>
<a name="AH11Master"></a>Andra Hugo<br/>
<strong>Composabilité de codes parallèles sur architectures hétérogènes</strong><br/>
Master's thesis, Université Bordeaux 1, June 2011<br/>
[<a href="http://hal.inria.fr/inria-00619654/en/">WWW</a>]
[<a href="http://hal.inria.fr/inria-00619654/document">PDF</a>]
</li>
</ol>
<h4>On Scheduling</h4> 
<a name="PublicationsOnScheduling"></a>
<ol>
<li>
<a name="kumar:tel-01538516"></a>Suraj Kumar<br/>
<strong>Scheduling of Dense Linear Algebra Kernels on Heterogeneous Resources</strong><br/>
PhD thesis, Université de Bordeaux, April 2017<br/>
[<a href="https://tel.archives-ouvertes.fr/tel-01538516">WWW</a>]
[<a href="https://tel.archives-ouvertes.fr/tel-01538516/file/KUMAR_SURAL_2017.pdf">PDF</a>]
</li>
<li>
<a name="garciapinto:hal-01616632"></a>Vinicius Garcia Pinto, Lucas Mello Schnorr, Luka Stanisic, Arnaud Legrand, Samuel Thibault,  and Vincent Danjean<br/>
<strong>A Visual Performance Analysis Framework for Task-based Parallel Applications running on Hybrid Clusters</strong><br/>
Note: Working paper or preprint, October 2017<br/>
[<a href="https://hal.inria.fr/hal-01616632">WWW</a>]
[<a href="https://hal.inria.fr/hal-01616632/file/CCPE_article_submitted_2017_09_29-gz.pdf">PDF</a>]
</li>
<li>
<a name="agullo:hal-01223573"></a>Emmanuel Agullo, Olivier Beaumont, Lionel Eyraud-Dubois,  and Suraj Kumar<br/>
<strong>Are Static Schedules so Bad ? A Case Study on Cholesky Factorization</strong><br/>
In <em>Proceedings of the 30th IEEE International Parallel &amp; Distributed Processing Symposium (IPDPS'16)</em>, Chicago, IL, United States, May 2016<br/>
IEEE<br/>
[<a href="https://hal.inria.fr/hal-01223573">WWW</a>]
[<a href="https://hal.inria.fr/hal-01223573/file/heteroprioCameraReady-ieeeCompatiable.pdf">PDF</a>]
</li>
<li>
<a name="beaumont:hal-01361992"></a>Olivier Beaumont, Terry Cojean, Lionel Eyraud-Dubois, Abdou Guermouche,  and Suraj Kumar<br/>
<strong>Scheduling of Linear Algebra Kernels on Multiple Heterogeneous Resources</strong><br/>
In <em>International Conference on High Performance Computing, Data, and Analytics (HiPC)</em>, Hyderabad, India, December 2016<br/>
[<a href="https://hal.inria.fr/hal-01361992">WWW</a>]
[<a href="https://hal.inria.fr/hal-01361992v2/document">PDF</a>]
</li>
<li>
<a name="cojean:hal-01181135"></a>Terry Cojean, Abdou Guermouche, Andra Hugo, Raymond Namyst,  and Pierre-André Wacrenier<br/>
<strong>Resource aggregation for task-based Cholesky Factorization on top of heterogeneous machines</strong><br/>
In <em>HeteroPar'2016 workshop of Euro-Par</em>, Grenoble, France, August 2016<br/>
[<a href="https://hal.inria.fr/hal-01181135">WWW</a>]
[<a href="https://hal.inria.fr/hal-01181135/file/papier%20%281%29.pdf">PDF</a>]
</li>
<li>
<a name="garciapinto:hal-01353962"></a>Vinicius Garcia Pinto, Luka Stanisic, Arnaud Legrand, Lucas Mello Schnorr, Samuel Thibault,  and Vincent Danjean<br/>
<strong>Analyzing Dynamic Task-Based Applications on Hybrid Platforms: An Agile Scripting Approach</strong><br/>
In <em>3rd Workshop on Visual Performance Analysis (VPA)</em>, Salt Lake City, United States, November 2016<br/>
Note: Held in conjunction with SC16<br/>
[<a href="https://hal.inria.fr/hal-01353962">WWW</a>]
[<a href="https://hal.inria.fr/hal-01353962/file/VPA_2016_paper_3.pdf">PDF</a>]
</li>
<li>
<a name="JaBlHU2016a"></a>Johan Janzén, David Black-Schaffer,  and Andra Hugo<br/>
<strong>Partitioning GPUs for Improved Scalability</strong><br/>
In <em>IEEE 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)</em>, October 2016<br/>
[<a href="http://ieeexplore.ieee.org/abstract/document/7789322/">WWW</a>]
[doi:<a href="http://dx.doi.org/10.1109/SBAC-PAD.2016.14">10.1109/SBAC-PAD.2016.14</a>]
</li>
<li>
<a name="beaumont:hal-01386174"></a>Olivier Beaumont, Lionel Eyraud-Dubois,  and Suraj Kumar<br/>
<strong>Approximation Proofs of a Fast and Efficient List Scheduling Algorithm for Task-Based Runtime Systems on Multicores and GPUs</strong><br/>
Note: Working paper or preprint, October 2016<br/>
[<a href="https://hal.inria.fr/hal-01386174">WWW</a>]
[<a href="https://hal.inria.fr/hal-01386174/file/heteroPrioApproxProofsRR.pdf">PDF</a>]
</li>
<li>
<a name="cojean:hal-01409965"></a>Terry Cojean, Abdou Guermouche, Andra Hugo, Raymond Namyst,  and Pierre-André Wacrenier<br/>
<strong>Resource aggregation for task-based Cholesky Factorization on top of modern architectures</strong><br/>
Note: This paper is submitted for review to the Parallel Computing special issue for HCW and HeteroPar 16 workshops, November 2016<br/>
[<a href="https://hal.inria.fr/hal-01409965">WWW</a>]
[<a href="https://hal.inria.fr/hal-01409965/file/submission.pdf">PDF</a>]
</li>
<li>
<a name="agullo:hal-01120507"></a>Emmanuel Agullo, Olivier Beaumont, Lionel Eyraud-Dubois, Julien Herrmann, Suraj Kumar, Loris Marchal,  and Samuel Thibault<br/>
<strong>Bridging the Gap between Performance and Bounds of Cholesky Factorization on Heterogeneous Platforms</strong><br/>
In <em>Heterogeneity in Computing Workshop 2015</em>, Hyderabad, India, May 2015<br/>
[<a href="https://hal.inria.fr/hal-01120507">WWW</a>]
[<a href="https://hal.inria.fr/hal-01120507/document">PDF</a>]
</li>
<li>
<a name="sergent:hal-00978364"></a>Marc Sergent and Simon Archipoff<br/>
<strong>Modulariser les ordonnanceurs de tâches : une approche structurelle</strong><br/>
In <em>Compas'2014</em>, Neuchâtel, Switzerland, April 2014<br/>
[<a href="http://hal.inria.fr/hal-00978364">WWW</a>]
[<a href="http://hal.inria.fr/hal-00978364/PDF/ordonnanceurs_modulaires.pdf">PDF</a>]
</li>
</ol>
<h4>On The C Extensions</h4> 
<a name="PublicationsOnTheCExtensions"></a>
<ol>
<li>
<a name="LC13Report"></a>Ludovic Courtès<br/>
<strong>C Language Extensions for Hybrid CPU/GPU Programming with StarPU</strong><br/>
Research Report RR-8278, INRIA, April 2013<br/>
[<a href="http://hal.inria.fr/hal-00807033">WWW</a>]
[<a href="http://hal.inria.fr/hal-00807033/PDF/RR-8278.pdf">PDF</a>]
</li>
</ol>
<h4>On OpenMP Support on top of StarPU</h4> 
<a name="PublicationsOnOpenMPSupportontopofStarPU"></a>
<ol>
<li>
<a name="agullo:hal-01517153"></a>Emmanuel Agullo, Olivier Aumage, Berenger Bramas, Olivier Coulaud,  and Samuel Pitoiset<br/>
<strong>Bridging the gap between OpenMP and task-based runtime systems for the fast multipole method</strong><br/>
<em>IEEE Transactions on Parallel and Distributed Systems</em>, April 2017<br/>
[<a href="https://hal.inria.fr/hal-01517153">WWW</a>]
[<a href="https://hal.inria.fr/hal-01517153/file/tpds_kstar_scalfmm_print.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/TPDS.2017.2697857">10.1109/TPDS.2017.2697857</a>]
</li>
<li>
<a name="agullo:hal-01372022"></a>Emmanuel Agullo, Olivier Aumage, Berenger Bramas, Olivier Coulaud,  and Samuel Pitoiset<br/>
<strong>Bridging the gap between OpenMP 4.0 and native runtime systems for the fast multipole method</strong><br/>
Research Report RR-8953, Inria, March 2016<br/>
[<a href="https://hal.inria.fr/hal-01372022">WWW</a>]
[<a href="https://hal.inria.fr/hal-01372022/file/RR-8953.pdf">PDF</a>]
</li>
<li>
<a name="virouleau:hal-01081974"></a>Philippe Virouleau, Pierrick Brunet, François Broquedis, Nathalie Furmento, Samuel Thibault, Olivier Aumage, and Thierry Gautier<br/>
<strong>Evaluation of OpenMP Dependent Tasks with the KASTORS Benchmark Suite</strong><br/>
In <em>10th International Workshop on OpenMP, IWOMP2014</em>, Salvador, Brazil, pages 16-29, September 2014<br/>
Springer<br/>
[<a href="https://hal.inria.fr/hal-01081974">WWW</a>]
[<a href="https://hal.inria.fr/hal-01081974/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-319-11454-5_2">10.1007/978-3-319-11454-5_2</a>]
</li>
</ol>
<h4>On MPI Support</h4> 
<a name="PublicationsOnMPISupport"></a>
<ol>
<li>
<a name="agullo:hal-01618526"></a>Emmanuel Agullo, Olivier Aumage, Mathieu Faverge, Nathalie Furmento, Florent Pruvost, Marc Sergent,  and Samuel Thibault<br/>
<strong>Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model</strong><br/>
<em>IEEE Transactions on Parallel and Distributed Systems</em>, 2017<br/>
[<a href="https://hal.inria.fr/hal-01618526">WWW</a>]
[<a href="https://hal.inria.fr/hal-01618526/file/tpds14.pdf">PDF</a>]
</li>
<li>
<a name="sergent:tel-01483666"></a>Marc Sergent<br/>
<strong>Scalability of a task-based runtime system for dense linear algebra applications</strong><br/>
PhD thesis, Université de Bordeaux, December 2016<br/>
[<a href="https://tel.archives-ouvertes.fr/tel-01483666">WWW</a>]
[<a href="https://tel.archives-ouvertes.fr/tel-01483666/file/SERGENT_MARC_2016.pdf">PDF</a>]
</li>
<li>
<a name="agullo:hal-01283949"></a>Emmanuel Agullo, Olivier Aumage, Mathieu Faverge, Nathalie Furmento, Florent Pruvost, Marc Sergent,  and Samuel Thibault<br/>
<strong>Harnessing clusters of hybrid nodes with a sequential task-based programming model</strong><br/>
In <em>8th International Workshop on Parallel Matrix Algorithms and Applications</em>, July 2014<br/>
[<a href="https://hal.inria.fr/hal-01283949">WWW</a>]
[<a href="https://hal.inria.fr/hal-01283949/file/pmaa14.pdf">PDF</a>]
</li>
<li>
<a name="augonnet:hal-00992208"></a>Cédric Augonnet, Olivier Aumage, Nathalie Furmento, Samuel Thibault,  and Raymond Namyst<br/>
<strong>StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators</strong><br/>
Research Report RR-8538, INRIA, May 2014<br/>
[<a href="http://hal.inria.fr/hal-00992208">WWW</a>]
[<a href="http://hal.inria.fr/hal-00992208/PDF/RR-8538.pdf">PDF</a>]
</li>
<li>
<a name="AugAumFurNamThi2012EuroMPI"></a>Cédric Augonnet, Olivier Aumage, Nathalie Furmento, Raymond Namyst,  and Samuel Thibault<br/>
<strong>StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators</strong><br/>
In Siegfried Benkner, Jesper Larsson Träff, and Jack Dongarra, editors, <em>EuroMPI 2012</em>, volume 7490 of <em>LNCS</em>, September 2012<br/>
Springer<br/>
Note: Poster Session<br/>
[<a href="http://hal.inria.fr/hal-00725477">WWW</a>]
[<a href="http://hal.inria.fr/hal-00725477/document">PDF</a>]
</li>
</ol>
<h4>On Memory Control</h4> 
<a name="PublicationsOnMemoryControl"></a>
<ol>
<li>
<a name="chevalier:hal-01718280"></a>Arthur Chevalier<br/>
<strong>Critical resources management and scheduling under StarPU</strong><br/>
Master's thesis, Université de Bordeaux, September 2017<br/>
[<a href="https://hal.inria.fr/hal-01718280">WWW</a>]
[<a href="https://hal.inria.fr/hal-01718280/file/Memoire.pdf">PDF</a>]
</li>
<li>
<a name="sergent:hal-01284004"></a>Marc Sergent, David Goudin, Samuel Thibault,  and Olivier Aumage<br/>
<strong>Controlling the Memory Subscription of Distributed Applications with a Task-Based Runtime System</strong><br/>
In <em>21st International Workshop on High-Level Parallel Programming Models and Supportive Environments</em>, Chicago, United States, May 2016<br/>
[<a href="https://hal.inria.fr/hal-01284004">WWW</a>]
[<a href="https://hal.inria.fr/hal-01284004/file/PID4127657.pdf">PDF</a>]
</li>
</ol>
<h4>On Data Transfer Management</h4> 
<a name="PublicationsOnDataTransferManagement"></a>
<ol>
<li>
<a name="AugCleThiNam10ICPADS"></a>Cédric Augonnet, Jérôme Clet-Ortega, Samuel Thibault,  and Raymond Namyst<br/>
<strong>Data-Aware Task Scheduling on Multi-Accelerator based Platforms</strong><br/>
In <em>The 16th International Conference on Parallel and Distributed Systems (ICPADS)</em>, Shanghai, China, December 2010<br/>
[<a href="http://hal.inria.fr/inria-00523937">WWW</a>]
[<a href="http://hal.inria.fr/inria-00523937/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/ICPADS.2010.129">10.1109/ICPADS.2010.129</a>]
</li>
</ol>
<h4>On Performance Model Tuning</h4> 
<a name="PublicationsOnPerformanceModelTuning"></a>
<ol>
<li>
<a name="agullo:hal-01474556"></a>Emmanuel Agullo, Bérenger Bramas, Olivier Coulaud, Luka Stanisic,  and Samuel Thibault<br/>
<strong>Modeling Irregular Kernels of Task-based codes: Illustration with the Fast Multipole Method</strong><br/>
Research Report RR-9036, INRIA Bordeaux, February 2017<br/>
[<a href="https://hal.inria.fr/hal-01474556">WWW</a>]
[<a href="https://hal.inria.fr/hal-01474556/file/rapport.pdf">PDF</a>]
</li>
<li>
<a name="AugThiNam09HPPC"></a>Cédric Augonnet, Samuel Thibault,  and Raymond Namyst<br/>
<strong>Automatic Calibration of Performance Models on Heterogeneous Multicore Architectures</strong><br/>
In <em>Proceedings of the International Euro-Par Workshops 2009, HPPC'09</em>, volume 6043 of <em>Lecture Notes in Computer Science</em>, Delft, The Netherlands, pages 56-65, August 2009<br/>
Springer<br/>
[<a href="http://hal.inria.fr/inria-00421333">WWW</a>]
[<a href="http://hal.inria.fr/inria-00421333/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-642-14122-5_9">10.1007/978-3-642-14122-5_9</a>]
</li>
</ol>
<h4>On The Simulation Support through SimGrid</h4> 
<a name="PublicationsOnTheSimulationSupportthroughSimGrid"></a>
<ol>
<li>
<a name="stanisic:hal-01147997"></a>Luka Stanisic, Samuel Thibault, Arnaud Legrand, Brice Videau,  and Jean-François Méhaut<br/>
<strong>Faithful Performance Prediction of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures</strong><br/>
<em>Concurrency and Computation: Practice and Experience</em>, pp 16, May 2015<br/>
[<a href="https://hal.inria.fr/hal-01147997">WWW</a>]
[<a href="https://hal.inria.fr/hal-01147997/file/CCPE14_article.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1002/cpe">10.1002/cpe</a>]
</li>
<li>
<a name="stanisic:hal-01180272"></a>Luka Stanisic, Emmanuel Agullo, Alfredo Buttari, Abdou Guermouche, Arnaud Legrand, Florent Lopez,  and Brice Videau<br/>
<strong>Fast and Accurate Simulation of Multithreaded Sparse Linear Algebra Solvers</strong><br/>
In <em>The 21st IEEE International Conference on Parallel and Distributed Systems</em>, Melbourne, Australia, December 2015<br/>
[<a href="https://hal.inria.fr/hal-01180272">WWW</a>]
[<a href="https://hal.inria.fr/hal-01180272/file/QRMSTARSG_article.pdf">PDF</a>]
</li>
<li>
<a name="stanisic:hal-01011633"></a>Luka Stanisic, Samuel Thibault, Arnaud Legrand, Brice Videau,  and Jean-François Méhaut<br/>
<strong>Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures</strong><br/>
In <em>Euro-par - 20th International Conference on Parallel Processing</em>, Porto, Portugal, August 2014<br/>
Springer-Verlag<br/>
[<a href="http://hal.inria.fr/hal-01011633">WWW</a>]
[<a href="http://hal.inria.fr/hal-01011633/PDF/StarPUSG_article.pdf">PDF</a>]
</li>
</ol>
<h4>On The Cell Support</h4> 
<a name="PublicationsOnTheCellSupport"></a>
<ol>
<li>
<a name="AugThiNamNij09Samos"></a>Cédric Augonnet, Samuel Thibault, Raymond Namyst,  and Maik Nijhuis<br/>
<strong>Exploiting the Cell/BE architecture with the StarPU unified runtime system</strong><br/>
In <em>SAMOS Workshop - International Workshop on Systems, Architectures, Modeling, and Simulation</em>, volume 5657 of <em>Lecture Notes in Computer Science</em>, Samos, Greece, July 2009<br/>
[<a href="http://hal.inria.fr/inria-00378705">WWW</a>]
[<a href="http://hal.inria.fr/inria-00378705/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-642-03138-0_36">10.1007/978-3-642-03138-0_36</a>]
</li>
</ol>
<h4>On Applications</h4> 
<a name="PublicationsOnApplications"></a>
<ol>
<li>
<a name="couteyencarpaye:hal-01507613"></a>Jean Marie Couteyen Carpaye, Jean Roman,  and Pierre Brenner<br/>
<strong>Design and Analysis of a Task-based Parallelization over a Runtime System of an Explicit Finite-Volume CFD Code with Adaptive Time Stepping</strong><br/>
<em>International Journal of Computational Science and Engineering</em>, pp 1 - 22, 2017<br/>
[<a href="https://hal.inria.fr/hal-01507613">WWW</a>]
[<a href="https://hal.inria.fr/hal-01507613/file/flusepa-task-hal-inria-preprint.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1016/j.jocs.2017.03.008">10.1016/j.jocs.2017.03.008</a>]
</li>
<li>
<a name="agullo:hal-01473475"></a>Emmanuel Agullo, Alfredo Buttari, Mikko Byckling, Abdou Guermouche,  and Ian Masliah<br/>
<strong>Achieving high-performance with a sparse direct solver on Intel KNL</strong><br/>
Research Report RR-9035, Inria Bordeaux Sud-Ouest ; CNRS-IRIT ; Intel corporation ; Université Bordeaux, February 2017<br/>
[<a href="https://hal.inria.fr/hal-01473475">WWW</a>]
[<a href="https://hal.inria.fr/hal-01473475/file/RR-9035.pdf">PDF</a>]
</li>
<li>
<a name="agullo:hal-01387482"></a>Emmanuel Agullo, Bérenger Bramas, Olivier Coulaud, Martin Khannouz,  and Luka Stanisic<br/>
<strong>Task-based fast multipole method for clusters of multicore processors</strong><br/>
Research Report RR-8970, Inria Bordeaux Sud-Ouest, October 2016<br/>
[<a href="https://hal.inria.fr/hal-01387482">WWW</a>]
[<a href="https://hal.inria.fr/hal-01387482/file/report-8970.pdf">PDF</a>]
</li>
<li>
<a name="agullo:hal-01316982"></a>E Agullo, L Giraud, A Guermouche, S Nakov,  and Jean Roman<br/>
<strong>Task-based Conjugate Gradient: from multi-GPU towards heterogeneous architectures</strong><br/>
Research Report 8912, Inria Bordeaux Sud-Ouest, May 2016<br/>
[<a href="https://hal.inria.fr/hal-01316982">WWW</a>]
[<a href="https://hal.inria.fr/hal-01316982/file/RR-8912.pdf">PDF</a>]
</li>
<li>
<a name="rossignon:tel-01230876"></a>Corentin Rossignon<br/>
<strong>A fine grain model programming for parallelization of sparse linear solver</strong><br/>
PhD thesis, Université de Bordeaux, July 2015<br/>
[<a href="https://tel.archives-ouvertes.fr/tel-01230876">WWW</a>]
[<a href="https://tel.archives-ouvertes.fr/tel-01230876/file/ROSSIGNON_CORENTIN_2015.pdf">PDF</a>]
</li>
<li>
<a name="MaMiDuAuThiAoNa15"></a>Víctor Martínez, David Michéa, Fabrice Dupros, Olivier Aumage, Samuel Thibault, Hideo Aochi, and Philippe Olivier Alexandre Navaux<br/>
<strong>Towards seismic wave modeling on heterogeneous many-core architectures using task-based runtime system</strong><br/>
In <em>27th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)</em>, Florianopolis, Brazil, October 2015<br/>
[<a href="https://hal.inria.fr/hal-01182746">WWW</a>]
[<a href="https://hal.inria.fr/hal-01182746/file/sbac2015_soumission.pdf">PDF</a>]
</li>
<li>
<a name="agullo:hal-00911856"></a>Emmanuel Agullo, Bérenger Bramas, Olivier Coulaud, Eric Darve, Matthias Messner,  and Toru Takahashi<br/>
<strong>Task-Based FMM for Multicore Architectures</strong><br/>
<em>SIAM Journal on Scientific Computing</em>, 36(1):66-93, 2014<br/>
[<a href="https://hal.inria.fr/hal-00911856">WWW</a>]
[<a href="https://hal.inria.fr/hal-00911856/file/sisc-cpu.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1137/130915662">10.1137/130915662</a>]
</li>
<li>
<a name="sylvain:hal-01005765"></a>Sylvain Henry, Alexandre Denis, Denis Barthou, Marie-Christine Counilh,  and Raymond Namyst<br/>
<strong>Toward OpenCL Automatic Multi-Device Support</strong><br/>
In Fernando Silva, Ines Dutra,  and Vitor Santos Costa, editors, <em>Euro-Par 2014</em>, Porto, Portugal, August 2014<br/>
Springer<br/>
[<a href="http://hal.inria.fr/hal-01005765">WWW</a>]
[<a href="http://hal.inria.fr/hal-01005765/PDF/final.pdf">PDF</a>]
</li>
<li>
<a name="lacoste:hal-00987094"></a>Xavier Lacoste, Mathieu Faverge, Pierre Ramet, Samuel Thibault,  and George Bosilca<br/>
<strong>Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes</strong><br/>
In <em>HCW'2014 workshop of IPDPS</em>, Phoenix, United States, May 2014<br/>
IEEE<br/>
Note: RR-8446<br/>
[<a href="http://hal.inria.fr/hal-00987094">WWW</a>]
[<a href="http://hal.inria.fr/hal-00987094/PDF/sparsegpus.pdf">PDF</a>]
</li>
<li>
<a name="lacoste:hal-00925017"></a>Xavier Lacoste, Mathieu Faverge, Pierre Ramet, Samuel Thibault,  and George Bosilca<br/>
<strong>Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes</strong><br/>
Research Report RR-8446, INRIA, January 2014<br/>
[<a href="http://hal.inria.fr/hal-00925017">WWW</a>]
[<a href="http://hal.inria.fr/hal-00925017/PDF/RR-8446.pdf">PDF</a>]
</li>
<li>
<a name="sergent:hal-00978602"></a>Emmanuel Agullo, Olivier Aumage, Mathieu Faverge, Nathalie Furmento, Florent Pruvost, Marc Sergent,  and Samuel Thibault<br/>
<strong>Overview of Distributed Linear Algebra on Hybrid Nodes over the StarPU Runtime</strong><br/>
SIAM Conference on Parallel Processing for Scientific Computing, February 2014<br/>
[<a href="http://hal.inria.fr/hal-00978602">WWW</a>]
[<a href="http://hal.inria.fr/hal-00978602/PDF/siampp14.pdf">PDF</a>]
</li>
<li>
<a name="Bor13Thesis"></a>Cyril Bordage<br/>
<strong>Ordonnancement dynamique, adapté aux architectures hétérogènes, de la méthode multipôle pour les équations de Maxwell, en électromagnétisme</strong><br/>
PhD thesis, Université Bordeaux 1, 351 cours de la Libération --- 33405 TALENCE cedex, December 2013<br/>
[<a href="https://tel.archives-ouvertes.fr/tel-00958494">WWW</a>]
[<a href="https://tel.archives-ouvertes.fr/tel-00958494/document">PDF</a>]
</li>
<li>
<a name="Hen13Thesis"></a>Sylvain Henry<br/>
<strong>Modèles de programmation et supports exécutifs pour architectures hétérogènes</strong><br/>
PhD thesis, Université Bordeaux 1, 351 cours de la Libération --- 33405 TALENCE cedex, November 2013<br/>
[<a href="http://tel.archives-ouvertes.fr/tel-00948309">WWW</a>]
[<a href="http://tel.archives-ouvertes.fr/tel-00948309/document">PDF</a>]
</li>
<li>
<a name="hen13fhpc"></a>Sylvain Henry<br/>
<strong>ViperVM: a Runtime System for Parallel Functional High-Performance Computing on Heterogeneous Architectures</strong><br/>
In <em>2nd Workshop on Functional High-Performance Computing (FHPC'13)</em>, Boston, United States, September 2013<br/>
[<a href="http://hal.inria.fr/hal-00851122">WWW</a>]
[<a href="http://hal.inria.fr/hal-00851122/PDF/fhpc13.pdf">PDF</a>]
</li>
<li>
<a name="odajima:hal-00920915"></a>Tetsuya Odajima, Taisuke Boku, Mitsuhisa Sato, Toshihiro Hanawa, Yuetsu Kodama, Raymond Namyst, Samuel Thibault,  and Olivier Aumage<br/>
<strong>Adaptive Task Size Control on High Level Programming for GPU/CPU Work Sharing</strong><br/>
In <em>The 2013 International Symposium on Advances of Distributed and Parallel Computing (ADPC 2013)</em>, Vietri sul Mare, Italy, December 2013<br/>
[<a href="http://hal.inria.fr/hal-00920915">WWW</a>]
[<a href="http://hal.inria.fr/hal-00920915/PDF/ADPC2013-117.pdf">PDF</a>]
</li>
<li>
<a name="ohshima:hal-00926144"></a>Satoshi Ohshima, Satoshi Katagiri, Kengo Nakajima, Samuel Thibault,  and Raymond Namyst<br/>
<strong>Implementation of FEM Application on GPU with StarPU</strong><br/>
In <em>SIAM CSE13 - SIAM Conference on Computational Science and Engineering 2013</em>, Boston, United States, February 2013<br/>
SIAM<br/>
[<a href="http://hal.inria.fr/hal-00926144">WWW</a>]
</li>
<li>
<a name="Ros13Renpar"></a>Corentin Rossignon<br/>
<strong>Optimisation du produit matrice-vecteur creux sur architecture GPU pour un simulateur de reservoir</strong><br/>
In <em>21èmes Rencontres Francophones du Parallélisme (RenPar'21)</em>, Grenoble, France, January 2013<br/>