<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<HEAD>
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<TITLE>StarPU</TITLE>
<link rel="stylesheet" type="text/css" href="style.css" />
<link rel="Shortcut icon" href="http://www.inria.fr/extension/site_inria/design/site_inria/images/favicon.ico" type="image/x-icon" />
</HEAD>

<body>

<div class="title">
<h1><a href="./">StarPU</a></h1>
<h2>A Unified Runtime System for Heterogeneous Multicore Architectures</h2>
</div>

<div class="menu">
<a href="https://team.inria.fr/storm/">STORM TEAM</a> |
&nbsp; &nbsp; &nbsp;
|
<a href="#overview">Overview</a> |
<a href="#news">News</a> |
<a href="#contact">Contact</a> |
<a href="people/">People</a> |
<a href="#features">Features</a> |
<a href="#software">Software</a> |
<a href="#tryit">Try it!</a> |
<a href="help/">Help</a> |
<a href="#publications">Publications</a> |
<a href="internships/">Jobs/Interns</a> |
<a href="files/">Download</a> |
<a href="market/">Market</a> |
<a href="tutorials/">Tutorials</a> |
<a href="https://gforge.inria.fr/plugins/mediawiki/wiki/starpu/index.php/Main_Page">Intranet</a>
</div>

<div class="section" id="overview">
<h3>Overview</h3>
  <p>
<span class="important">StarPU is a task programming library for hybrid architectures</span>
<ol>
<li><b>The application provides algorithms and constraints</b>
    <ul>
    <li>CPU/GPU implementations of tasks</li>
    <li>A graph of tasks, using either StarPU's high-level <b>GCC plugin</b> pragmas, StarPU's rich <b>C/C++ API</b>, or <b>OpenMP pragmas</b>.</li>
    </ul>
<br>
</li>
<li><b>StarPU handles run-time concerns</b>
    <ul>
    <li>Task dependencies</li>
    <li>Optimized heterogeneous scheduling</li>
    <li>Optimized data transfers and replication between main memory and discrete memories</li>
    <li>Optimized cluster communications</li>
    </ul>
</li>
</ol>
</p>
<p>
<span class="important">Rather than handling low-level issues, <b>programmers can concentrate on algorithmic concerns!</b></span>
</p>

<p>
<span class="note">The StarPU documentation is available in
<a href="./doc/starpu.pdf">PDF</a> and in <a href="./doc/html/">HTML</a>.</span>
Please note that these documents are up-to-date with the latest release of
StarPU.
</p>
<p>
The latest documentation in <a href="./testing/master/doc/starpu.pdf">PDF</a>
and <a href="./testing/master/doc/html">HTML</a> is updated every day, but covers
the latest developments, which may not be available in the latest release.
</p>
</div>

<div class="section emphasize newslist" id="news">
<h3>News</h3>
<p>
June 2019 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      release 1.3.2 of StarPU is now
      available!</b></a> Among other features, the 1.3 release brings
      MPI master-slave support, a tool to replay
      execution through SimGrid, an HDF5 implementation of
      out-of-core support, a new implementation of StarPU-MPI on top of
      NewMadeleine, implicit support for asynchronous partition
      planning, a resource management module to share processor cores
      and accelerator devices with other parallel runtime systems, ...
</p>
<p>
May 2019 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      v1.1.8 release of StarPU is now available!</b></a> This release notably brings the concept of
      scheduling contexts, which allow computation resources to be
      separated. This is really intended to be the last release of the
      1.1 branch.
</p>
<p>
April 2019 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      release 1.3.1 of StarPU is now
      available!</b></a> Among other features, the 1.3 release brings
      MPI master-slave support, a tool to replay
      execution through SimGrid, an HDF5 implementation of
      out-of-core support, a new implementation of StarPU-MPI on top of
      NewMadeleine, implicit support for asynchronous partition
      planning, a resource management module to share processor cores
      and accelerator devices with other parallel runtime systems, ...
</p>
<p>
March 2019 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      release 1.3.0 of StarPU is now
      available!</b></a> Among other features, the 1.3 release brings
      MPI master-slave support, a tool to replay
      execution through SimGrid, an HDF5 implementation of
      out-of-core support, a new implementation of StarPU-MPI on top of
      NewMadeleine, implicit support for asynchronous partition
      planning, a resource management module to share processor cores
      and accelerator devices with other parallel runtime systems, ...
</p>
<p>
February 2019 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      1.2.8 release of StarPU is now available!</b></a>
	The 1.2 release series notably brings out-of-core support, MIC Xeon
	Phi support, OpenMP runtime support, and a new internal
	communication system for MPI.
	(Release 1.2.7 is broken and should not be used.)
</p>
<p>
September 2018 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      1.2.6 release of StarPU is now available!</b></a>
      The 1.2 release series notably brings out-of-core support, MIC Xeon
      Phi support, OpenMP runtime support, and a new internal
      communication system for MPI.
</p>
<p>
August 2018 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      1.2.5 release of StarPU is now available!</b></a>
      The 1.2 release series notably brings out-of-core support, MIC Xeon
      Phi support, OpenMP runtime support, and a new internal
      communication system for MPI.
</p>
<p>
April 2018 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      1.2.4 release of StarPU is now available!</b></a>
      The 1.2 release series notably brings out-of-core support, MIC Xeon
      Phi support, OpenMP runtime support, and a new internal
      communication system for MPI.
</p>
<p>
March 2018 <b>&raquo;&nbsp;</b>A <a href="https://events.prace-ri.eu/event/681/">tutorial</a>
      "Runtime systems for heterogeneous platform programming" will be
      given at the Maison de la Simulation in June 2018.
</p>
</div>

<div class="section emphasizebot" style="text-align: right; font-style: italic;">
Get the latest StarPU news by subscribing to the <a href="http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/starpu-announce">starpu-announce mailing list</a>.
See also the full <a href="news/">news</a>.
</div>

<div class="section" id="video">
<h3>Video Conference</h3>
<p>
A video recording (26 minutes) of a <a href="http://www.x.org/wiki/Events/XDC2014/XDC2014ThibaultStarPU/">presentation at the XDC2014 conference</a> gives an overview of StarPU
(<a href="http://www.x.org/wiki/Events/XDC2014/XDC2014ThibaultStarPU/xdc_starpu.pdf">slides</a>):
</p>
<center>
<iframe width="420" height="315" src="https://www.youtube.com/embed/frsWSqb8UJU" frameborder="0" allowfullscreen></iframe>
</center>
</div>

<div class="section" id="tutorial">
<h3>Tutorial material</h3>
<p>
The latest tutorial material for StarPU is composed of two parts:
<ul>
<li><a href="http://starpu.gforge.inria.fr/tutorials/2016-06-PATC/slides/01_introducing_starpu.pdf">Introducing StarPU</a></li>
<li><a href="http://starpu.gforge.inria.fr/tutorials/2016-06-PATC/slides/02_mastering_starpu.pdf">Mastering StarPU</a></li>
</ul>
</p>
</div>

<div class="section" id="slides">
<h3>Set of slides</h3>
<p>
A <a href="slides.pdf">set of slides</a> is also available to get an overview of
StarPU.
</p>
</div>

<div class="section" id="contact">
<h3>Contact</h3>
<p>For any questions regarding StarPU, please contact the StarPU developers mailing list.</p>
<pre>
<a href="mailto:starpu-devel@lists.gforge.inria.fr?subject=StarPU">starpu-devel@lists.gforge.inria.fr</a>
</pre>
<p>Details about the <a href="people/">StarPU team members</a> are also available.</p>
</div>

<div class="section" id="features">
<h3>Features</h3>

<h4>Portability</h4>
  <p>
Portability is obtained by means of a unified abstraction of the machine.
StarPU offers a unified offloadable task abstraction named <em>codelet</em>. Rather
than rewriting the entire code, programmers can encapsulate existing functions
within codelets. If a codelet can run on heterogeneous architectures, <b>it
is possible to specify one function for each architecture</b> (e.g. one function
for CUDA and one function for CPUs). StarPU takes care of scheduling and
executing those codelets as efficiently as possible over the entire machine, including
multiple GPUs.
One can even specify <b>several functions for each architecture</b> (new in
v1.0) as well as
<b>parallel implementations</b> (e.g. in OpenMP), and StarPU will
automatically determine which version is best for each input size (new in v0.9).
StarPU can execute several parallel tasks concurrently, e.g. one per socket, provided that the
task implementations support it (which is the case for MKL, but unfortunately
most often not for OpenMP).
  </p>
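<p>
As a rough illustration of the codelet idea, here is a toy model written in Python. It is
not StarPU's API (StarPU itself is a C library): the names <tt>Codelet</tt>,
<tt>implementations</tt> and <tt>run_on</tt> are hypothetical and only show how a single task
abstraction can carry one function per architecture, with the runtime then picking the
function that matches the processing unit it selected.
</p>

<tt><pre>
```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Codelet:
    """Toy codelet: one task abstraction, one function per architecture."""
    name: str
    implementations: Dict[str, Callable[[List[float], float], None]] = field(default_factory=dict)

    def can_run_on(self, arch: str) -> bool:
        return arch in self.implementations

    def run_on(self, arch: str, vector: List[float], factor: float) -> None:
        # In a real runtime the scheduler, not the application, picks the architecture.
        self.implementations[arch](vector, factor)


def scal_cpu(vector: List[float], factor: float) -> None:
    # Existing sequential function, wrapped as the CPU implementation.
    for i in range(len(vector)):
        vector[i] *= factor


def scal_cuda_stand_in(vector: List[float], factor: float) -> None:
    # Stand-in for a GPU kernel; a real codelet would launch CUDA code here.
    vector[:] = [x * factor for x in vector]


scal = Codelet("scal", {"cpu": scal_cpu, "cuda": scal_cuda_stand_in})

data = [1.0, 2.0, 3.0]
scal.run_on("cuda", data, 2.0)
print(data)
```
</pre></tt>

<p>
The application only fills in the table of implementations; the explicit choice of
the architecture in the last lines stands in for a decision that the runtime would make.
</p>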

<h4>Genericity</h4>
  <p>
The StarPU programming interface is very generic. For instance, various data
structures are supported in mainline StarPU (vectors, dense matrices, CSR/BCSR/COO sparse matrices, ...),
but application-specific data structures can also be supported, provided that
the application describes how data is to be transferred (e.g. a series of
contiguous blocks). This was used, for instance, for hierarchically-compressed
matrices (h-matrices).
  </p>

<h4>Data transfers</h4>
  <p>
To relieve programmers from the burden of explicit data transfers, a high-level
data management library enforces memory coherency over the machine: before a
codelet starts (e.g. on an accelerator), all its <b>data are automatically made
available on the compute resource</b>. Data are also kept on e.g. GPUs as long as
they are needed for further tasks. When a device runs out of memory, StarPU uses
an LRU strategy to <b>evict unused data</b>. StarPU also takes care of <b>automatically
prefetching</b> data, which makes it possible to <b>overlap data transfers with computations</b>
(including <b>GPU-GPU direct transfers</b>) to get the most out of the architecture.
  </p>
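<p>
The LRU eviction mentioned above can be sketched with a small toy model. This is a
simplified illustration in Python, not StarPU's actual memory manager: the class
<tt>DeviceMemory</tt> and its <tt>acquire</tt> method are hypothetical names, sizes are
abstract units, and coherency with other memory nodes is ignored.
</p>

<tt><pre>
```python
from collections import OrderedDict


class DeviceMemory:
    """Toy device memory holding data replicates, evicted in LRU order."""

    def __init__(self, capacity: int):
        self.capacity = capacity          # units available on the device
        self.used = 0
        self.blocks = OrderedDict()       # handle name -> size, oldest first
        self.transfers = []               # log of (action, handle)

    def acquire(self, handle: str, size: int) -> None:
        """Make `handle` available on the device before a task starts."""
        if handle in self.blocks:
            self.blocks.move_to_end(handle)   # already there: mark as recently used
            return
        # Evict least recently used blocks until the new one fits.
        while self.used + size > self.capacity and self.blocks:
            victim, victim_size = self.blocks.popitem(last=False)
            self.used -= victim_size
            self.transfers.append(("evict", victim))
        self.blocks[handle] = size
        self.used += size
        self.transfers.append(("fetch", handle))


gpu = DeviceMemory(capacity=3)
for handle in ["A", "B", "A", "C", "D"]:
    gpu.acquire(handle, 1)
print(gpu.transfers)
```
</pre></tt>

<p>
In this run, the second access to A refreshes it, so B is the block evicted when D
no longer fits.
</p>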

<h4>Dependencies</h4>
  <p>
Dependencies between tasks can be expressed in several ways, to give the
programmer the best flexibility:
  <ul>
    <li><b>implicitly</b>, from RAW, WAW, and WAR data dependencies,</li>
    <li>explicitly, through <b>tags</b> which act as rendez-vous points between
    tasks (thus including tasks which have not been created yet),</li>
    <li><b>explicitly</b>, between pairs of tasks.</li>
  </ul>
  </p>
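<p>
To make the implicit case concrete, the following Python sketch infers RAW, WAW and WAR
dependencies from the access modes of tasks taken in submission order. It is a
simplified model under stated assumptions, not StarPU code: the function name
<tt>implicit_dependencies</tt>, the task names and the mode strings "R", "W" and "RW"
are all hypothetical.
</p>

<tt><pre>
```python
def implicit_dependencies(tasks):
    """Infer task dependencies from access modes, in submission order.

    `tasks` is a list of (name, {data: mode}) with mode "R", "W" or "RW".
    Returns a dict mapping each task name to the set of tasks it waits for.
    """
    last_writer = {}      # data -> last task that wrote it
    readers = {}          # data -> tasks that read it since the last write
    deps = {}
    for name, accesses in tasks:
        deps[name] = set()
        for data, mode in accesses.items():
            if "R" in mode and data in last_writer:
                deps[name].add(last_writer[data])          # read after write
            if "W" in mode:
                if data in last_writer:
                    deps[name].add(last_writer[data])      # write after write
                deps[name].update(readers.get(data, []))   # write after read
        # Record this task's accesses only after all its dependencies are known.
        for data, mode in accesses.items():
            if "W" in mode:
                last_writer[data] = name
                readers[data] = []
            else:
                readers.setdefault(data, []).append(name)
    return deps


tasks = [
    ("potrf", {"A00": "RW"}),
    ("trsm1", {"A00": "R", "A10": "RW"}),
    ("trsm2", {"A00": "R", "A20": "RW"}),
    ("syrk1", {"A10": "R", "A11": "RW"}),
    ("next",  {"A00": "W"}),
]
for name, waits_for in implicit_dependencies(tasks).items():
    print(name, sorted(waits_for))
```
</pre></tt>

<p>
Here the two <tt>trsm</tt> tasks only read A00, so they depend on <tt>potrf</tt> but not
on each other, while the final writer of A00 has to wait for both readers.
</p>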
  <p>
  These dependencies are computed in a completely decentralized way, and can be
  introduced completely dynamically as tasks get submitted by the application
  while tasks previously submitted are being executed.
  </p>
  <p>
StarPU also supports an OpenMP-like <a href="doc/html/DataManagement.html#DataReduction">reduction</a> access mode (new in v0.9).
  </p>
  <p>
It also supports a <a href="doc/html/DataManagement.html#DataCommute">commute</a> access mode to allow data access commutativity (new in v1.2).
  </p>

  <p>
It also supports transparent dependency tracking between hierarchical subpieces of data
through asynchronous partitioning (new in v1.3).
  </p>

<h4>Heterogeneous Scheduling</h4>
  <p>
StarPU obtains
portable performance by efficiently (and easily) using all computing resources
at the same time. StarPU also takes advantage of the <b>heterogeneous</b> nature of a
machine, for instance by using scheduling strategies based on auto-tuned
performance models. These determine the relative performance achieved
by the different processing units for the various kinds of tasks, and thus
make it possible to <b>automatically let processing units execute the tasks they are the best for</b>.
Various strategies and variants are available. Some of them are centralized, but
most of them are <b>completely distributed</b>. Examples include <i>dmdas</i> (a data-locality-aware MCT strategy,
thus similar to HEFT, except that it starts executing tasks before the whole task graph is
submitted, thus allowing dynamic task submission and a decentralized scheduler,
and that it has an energy-optimizing extension), <i>eager</i> (dumb centralized
queue), <i>lws</i> (decentralized locality-aware work-stealing), ...
The overhead per task is typically on the order of
a microsecond. Tasks should thus be a few orders of magnitude
bigger, such as 100 microseconds or 1 millisecond, to make the overhead
negligible.
  </p>
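<p>
The minimum-completion-time (MCT) principle behind strategies such as <i>dmdas</i> can be
sketched as follows. This Python toy is only an illustration under simplifying
assumptions: the function name <tt>mct_schedule</tt>, the worker names and the durations
in the performance model are made up, and data transfer times (which a
data-locality-aware strategy would also take into account) are ignored.
</p>

<tt><pre>
```python
def mct_schedule(tasks, workers, perf_model):
    """Assign each task to the worker with the minimum completion time.

    `perf_model[kind][worker]` is the expected duration (in microseconds)
    of a task of that kind on that worker; a real runtime would calibrate
    it from previous executions rather than hard-code it.
    """
    available_at = {w: 0.0 for w in workers}   # when each worker becomes idle
    placement = []
    for name, kind in tasks:                   # tasks are placed as they are submitted
        best = min(workers, key=lambda w: available_at[w] + perf_model[kind][w])
        available_at[best] += perf_model[kind][best]
        placement.append((name, best))
    return placement, max(available_at.values())


perf_model = {
    "gemm":  {"cpu0": 350.0, "cpu1": 350.0, "gpu0": 100.0},
    "potrf": {"cpu0": 200.0, "cpu1": 200.0, "gpu0": 400.0},
}
tasks = [("potrf0", "potrf")] + [("gemm%d" % i, "gemm") for i in range(6)]
placement, makespan = mct_schedule(tasks, ["cpu0", "cpu1", "gpu0"], perf_model)
for name, worker in placement:
    print(name, "on", worker)
print("makespan:", makespan, "us")
```
</pre></tt>

<p>
With these made-up numbers, most <tt>gemm</tt> tasks go to the GPU, but one is handed to an
idle CPU core once the GPU queue is long enough for the CPU to finish it earlier.
</p>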

<h4>Clusters</h4>
  <p>
To deal with clusters, StarPU can nicely integrate with <a
	href="doc/html/MPISupport.html">MPI</a>, through explicit or implicit
support, according to the application's preference.

    <ul>
        <li>Explicit network communication requests can be emitted, which will
then be <b>automatically combined and overlapped</b> with the intra-node data
transfers and computation,
        <li>The application can also just provide the whole task graph and a
data distribution over MPI nodes, and StarPU will automatically determine which
MPI node should execute which task, and <b>automatically generate all required
MPI communications</b> accordingly (new in v0.9). We have obtained excellent
scaling on a 256-node cluster with GPUs, but have not yet had the opportunity
to test on a larger cluster. We have however measured that with naive task
submission, it should scale to a thousand nodes, and with pruning-tuned task
submission, it should scale to about a <b>million nodes</b>.
        <li>Starting with v1.3, the application can also just provide the
whole task graph, and let StarPU decide the data distribution and task
distribution, thanks to a master-slave mechanism. This will however by nature
have a more limited scalability than the fully distributed paradigm mentioned
above.
    </ul>
  </p>
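<p>
The implicit mode described above can be pictured with a small model. The sketch below is
written in Python and rests on assumptions that this page does not spell out: it supposes
a 2D block-cyclic data distribution and an owner-computes rule (a task runs on the node
owning the tile it writes), and it ignores cache invalidation when a tile is rewritten.
The names <tt>owner</tt> and <tt>place_and_communicate</tt> are hypothetical.
</p>

<tt><pre>
```python
def owner(tile, rows, cols):
    """2D block-cyclic distribution: rank of the node owning tile (i, j)."""
    i, j = tile
    return (i % rows) * cols + (j % cols)


def place_and_communicate(tasks, rows, cols):
    """Run each task on the owner of the tile it writes, list needed transfers.

    `tasks` is a list of (name, tiles_read, tile_written) in submission order.
    Returns (placement, transfers) where transfers are (tile, source, dest).
    """
    placement, transfers = {}, []
    for name, reads, written in tasks:
        node = owner(written, rows, cols)
        placement[name] = node
        for tile in reads:
            source = owner(tile, rows, cols)
            # A tile already sent to this node is not sent again.
            if source != node and (tile, source, node) not in transfers:
                transfers.append((tile, source, node))
    return placement, transfers


tasks = [
    ("potrf", [], (0, 0)),
    ("trsm_1", [(0, 0)], (1, 0)),
    ("trsm_2", [(0, 0)], (2, 0)),
    ("gemm_21", [(2, 0), (1, 0)], (2, 1)),
]
placement, transfers = place_and_communicate(tasks, rows=2, cols=2)
print(placement)
print(transfers)
```
</pre></tt>

<p>
Every node can run the same loop over the whole task graph and keep only the tasks and
transfers that involve its own rank, which is why no central coordinator is needed in
this model.
</p>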

<h4>Out of core</h4>
  <p>
When memory is not big enough for the working set, one may have to resort to
using disks. StarPU makes this seamless thanks to its <a href="doc/html/OutOfCore.html">out of core support</a> (new in v1.2).
StarPU will <b>automatically evict</b> data from the main memory in advance, and
<b>prefetch back</b> required data before it is needed for tasks.
  </p>

<h4>Extensions to the C Language</h4>
<p>
  StarPU comes with a GCC plug-in
  that <a href="doc/html/cExtensions.html">extends the C programming
  language</a> with pragmas and attributes that make it easy
  to <b>annotate a sequential C program to turn it into a parallel
  StarPU program</b> (new in v1.0).
</p>

<h4>OpenMP 4-compatible interface</h4>
<p>
  <a href="http://kstar.gforge.inria.fr/">K'Star</a> provides an OpenMP
  4-compatible interface on top of StarPU. OpenMP applications can simply be
  rebuilt with the K'Star source-to-source compiler, then built with the
  usual compiler, and the result will use the StarPU runtime.
</p>
<p>
  K'Star also provides some extensions to the OpenMP 4 standard, to let the
  StarPU runtime perform online optimizations.
</p>

<h4>OpenCL-compatible interface</h4>
<p>
  StarPU provides an <a href="doc/html/SOCLOpenclExtensions.html">OpenCL-compatible interface, SOCL</a>,
  which makes it possible to run OpenCL applications directly on top of StarPU (new in v1.0).
</p>

<h4>Simulation support</h4>
<p>
  StarPU can very accurately simulate an application execution
  and measure the resulting performance by using the
  <a href="http://simgrid.gforge.inria.fr">SimGrid simulator</a> (new in v1.1).  This makes it possible
  to quickly experiment with various scheduling heuristics, various application
  algorithms, and even various platforms (available GPUs and CPUs, available
  bandwidth)!
</p>

<h4>All in all</h4>
  <p>
All that means that, with the help
of <a href="doc/html/cExtensions.html">StarPU's extensions to the C
language</a>, the following sequential source code of a tiled version of
the classical Cholesky factorization algorithm using BLAS is also valid
StarPU code, possibly running on all the CPUs and GPUs, and given a data
distribution over MPI nodes, it is even a distributed version!
  </p>

  <tt><pre>
for (k = 0; k < tiles; k++) {
  potrf(A[k,k]);
  for (m = k+1; m < tiles; m++)
    trsm(A[k,k], A[m,k]);
  for (m = k+1; m < tiles; m++)
    syrk(A[m,k], A[m,m]);
  for (m = k+1; m < tiles; m++)
    for (n = k+1; n < m; n++)
      gemm(A[m,k], A[n,k], A[m,n]);
}</pre></tt>
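
<p>
To get a feel for the task graph that this loop nest submits, the following sketch
enumerates the tasks for a given number of tiles and counts them per kernel. It is plain
Python, independent of StarPU; the function name <tt>cholesky_tasks</tt> is hypothetical and
tiles are simply represented by their (row, column) indices.
</p>

<tt><pre>
```python
from collections import Counter


def cholesky_tasks(tiles):
    """Enumerate the tasks submitted by the tiled Cholesky loop nest above."""
    tasks = []
    for k in range(tiles):
        tasks.append(("potrf", (k, k)))
        for m in range(k + 1, tiles):
            tasks.append(("trsm", (k, k), (m, k)))
        for m in range(k + 1, tiles):
            tasks.append(("syrk", (m, k), (m, m)))
        for m in range(k + 1, tiles):
            for n in range(k + 1, m):
                tasks.append(("gemm", (m, k), (n, k), (m, n)))
    return tasks


counts = Counter(task[0] for task in cholesky_tasks(4))
print(dict(counts))
```
</pre></tt>

<p>
The number of <tt>gemm</tt> tasks grows cubically with the number of tiles, which is where
most of the available parallelism comes from.
</p>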

<h4>Supported Architectures</h4>
<ul>
<li>SMP/Multicore Processors (x86, PPC, ARM, ... all Debian architectures have been tested)</li>
<li>NVIDIA GPUs (e.g. heterogeneous multi-GPU), with pipelined and concurrent kernel execution support (new in v1.2) and GPU-GPU direct transfers (new in v1.1)</li>
<li>OpenCL devices</li>
<li>Cell Processors (experimental)</li>
<li>Intel SCC (experimental, new in v1.2)</li>
<li>Intel MIC / Xeon Phi (new in v1.2)</li>
</ul>

<h4>Supported Operating Systems</h4>
<ul>
<li>GNU/Linux</li>
<li>Mac OS X</li>
<li>Windows</li>
</ul>

<h4>Stability</h4>
<p>
StarPU is checked every night with
<ul>
<li>Valgrind / Helgrind</li>
<li>GCC's Address/Leak/Thread/Undefined sanitizers</li>
<li>cppcheck</li>
<li>Coverity</li>
</ul>
</p>

<h4>Performance analysis tools</h4>
  <p>
In order to understand the performance obtained by StarPU, it is helpful to
visualize the actual behaviour of the applications running on complex
heterogeneous multicore architectures.  StarPU therefore makes it possible to
generate Pajé traces that can be visualized thanks to the <a
href="http://vite.gforge.inria.fr/"><b>ViTE</b> (Visual Trace Explorer) open
source tool.</a>
  </p>

<p>
<b>Example:</b> LU decomposition on 3 CPU cores and a GPU using a very simple
greedy scheduling strategy. The green (resp. red) sections indicate when the
corresponding processing unit is busy (resp. idle). The number of ready tasks
is displayed in the curve on top: it appears that with this scheduling policy,
the algorithm suffers from a certain lack of parallelism. <b>Measured speed: 175.32
GFlop/s</b>
<center><a href="./images/greedy-lu-16k-fx5800.png"> <img src="./images/greedy-lu-16k-fx5800.png" alt="LU decomposition (greedy)" width="75%"></a></center>
</p>

<p>
This second trace depicts the behaviour of the same application using a
scheduling strategy trying to minimize load imbalance thanks to auto-tuned
performance models and to keep data locality as high as possible. In this
example, the Pajé trace clearly shows that this scheduling strategy outperforms
the previous one in terms of processor usage. <b>Measured speed: 239.60
GFlop/s</b>
<center><a href="./images/dmda-lu-16k-fx5800.png"><img src="./images/dmda-lu-16k-fx5800.png" alt="LU decomposition (dmda)" width="75%"></a></center>
</p>

<p>
<a href="http://www.hlrs.de/temanejo">Temanejo</a> can be used to debug the task
graph, as shown below (new in v1.1).
</p>

<center>
<a href="images/temanejo.png"><img src="images/temanejo.png" width="50%"/></a>
</center>

</div>

<div class="section" id="software">
<h3>Software using StarPU</h3>

<p>
Some software is known to be able to use StarPU to tackle heterogeneous
architectures. Here is a non-exhaustive list (feel free to ask to be added to the
list!):
</p>

<ul>
	<li>AL4SAN, dense linear algebra library</li>
	<li><a href="https://project.inria.fr/chameleon/">Chameleon</a>, dense linear algebra library</li>
	<li><a href="http://exa2pro.eu">Exa2pro</a>, Enhancing Programmability and boosting Performance Portability for Exascale Computing Systems</li>
	<li><a href="http://github.com/ecrc/exageostat">ExaGeoStat</a>, Machine learning framework for Climate/Weather prediction applications</li>
	<li><a href="https://hal.inria.fr/hal-01507613">FLUSEPA</a>, Navier-Stokes Solver for Unsteady Problems with Bodies in Relative Motion</li>
	<li><a href="http://github.com/ecrc/hicma">HiCMA</a>, Low-rank general linear algebra library</li>
	<li>hmat, hierarchical matrix C/C++ library</li>
	<li><a href="http://kstar.gforge.inria.fr/">K'Star</a>, OpenMP 4-compatible interface on top of StarPU.</li>
	<li><a href="http://github.com/ecrc/ksvd">KSVD</a>, dense SVD on distributed-memory manycore systems</li>
	<li><a href="http://icl.cs.utk.edu/magma/">MAGMA</a>, dense linear algebra library, starting from version 1.1</li>
	<li><a href="https://gitlab.inria.fr/solverstack/maphys">MaPHyS</a>, Massively Parallel Hybrid Solver</li>
	<li><a href="http://github.com/ecrc/moao">MOAO</a>, HPC framework for computational astronomy, servicing the European Extremely Large Telescope and the Japanese Subaru Telescope</li>
	<li><a href="http://pastix.gforge.inria.fr/">PaStiX</a>, sparse linear algebra library, starting from version 5.2.1</li>
	<li>PEPPHER, Performance Portability and Programmability for Heterogeneous Many-core Architectures</li>
	<li><a href="http://github.com/ecrc/qdwh">QDWH</a>, QR-based Dynamically Weighted Halley</li>
	<li><a href="http://buttari.perso.enseeiht.fr/qr_mumps/">qr_mumps</a>, sparse linear algebra library</li>
	<li><a href="http://scalfmm-public.gforge.inria.fr/doc/">ScalFMM</a>, N-body interaction simulation using the Fast Multipole Method.</li>
	<li><a href="https://tel.archives-ouvertes.fr/tel-01410049/">SCHNAPS</a>, Solver for Conservative Hyperbolic Non-linear systems Applied to PlasmaS.</li>
	<li><a href="https://hal.archives-ouvertes.fr/hal-01086246">SignalPU</a>, a Dataflow-Graph-specific programming model.</li>
	<li><a href="http://www.ida.liu.se/~chrke/skepu/">SkePU</a>, a skeleton programming framework.</li>
	<li><a href="http://github.com/ecrc/stars-h">STARS-H</a>, HPC low-rank matrix market</li>
	<li><a href="http://www.xcalablemp.org/">XcalableMP</a>, Directive-based language eXtension for Scalable and performance-aware Parallel Programming</li>
</ul>

<p>
You can find <a href="#PublicationsOnApplications">below</a> the list of publications related to applications using StarPU.
</p>

</div>

<div class="section" id="tryit">
<h3>Give it a try!</h3>
<p>
You can easily try out the performance on the Cholesky factorization, for
instance. Make sure that the pkg-config and
<a href="http://www.open-mpi.org/projects/hwloc/">hwloc</a>
software are installed for
proper CPU control, and that BLAS kernels for your computation units are installed and configured in
your environment (e.g. MKL for CPUs and CUBLAS for GPUs).
</p>

<tt><pre>
$ wget http://starpu.gforge.inria.fr/files/starpu-someversion.tar.gz
$ tar xf starpu-someversion.tar.gz
$ cd starpu-someversion
$ ./configure
$ make -j 12
$ STARPU_SCHED=dmdas ./examples/cholesky/cholesky_implicit -size $((960*40)) -nblocks 40
$ STARPU_SCHED=dmdas mpirun -np 4 -machinefile mymachines ./mpi/examples/matrix_decomposition/mpi_cholesky_distributed -size $((960*40*4)) -nblocks $((40*4))</pre></tt>

<p>Note that the dmdas scheduler uses performance models, and thus needs
calibration runs before exhibiting optimized performance (until the "model
something is not calibrated enough" messages go away).</p>

<p>To get a glimpse of what happened, you can get an execution trace by
installing
<a href="http://savannah.nongnu.org/projects/fkt">FxT</a>
and <a href="http://vite.gforge.inria.fr/">ViTE</a>, and enabling traces:
</p>

<tt><pre>
$ ./configure --with-fxt
$ make -j 12
$ STARPU_SCHED=dmdas ./examples/cholesky/cholesky_implicit -size $((960*40)) -nblocks 40
$ ./tools/starpu_fxt_tool -i /tmp/prof_file_${USER}_0
$ vite paje.trace
</pre></tt>

<p>
Starting with StarPU 1.1, it is also possible to reproduce the performance that
we report in our articles for our machines, by installing SimGrid and then running
StarPU in simulation mode with the performance models of our machines:
</p>
  <tt><pre>
$ ./configure --enable-simgrid
$ make -j 12
$ STARPU_PERF_MODEL_DIR=$PWD/tools/perfmodels/sampling STARPU_HOSTNAME=mirage STARPU_SCHED=dmdas ./examples/cholesky/cholesky_implicit -size $((960*40)) -nblocks 40
# size	ms	GFlops
38400	9915	1903.7</pre></tt>
<p>(MPI simulation is not supported yet)</p>
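
<p>
The GFlop/s figure in the sample output above can be cross-checked by hand. The sketch
below is written in Python and assumes the usual leading-order operation count of
n<sup>3</sup>/3 for a Cholesky factorization of an n by n matrix; that formula is an
assumption about how the example computes its rate, not something stated on this page,
and the helper name <tt>cholesky_gflops</tt> is hypothetical.
</p>

<tt><pre>
```python
def cholesky_gflops(size, milliseconds):
    """Approximate GFlop/s of a Cholesky factorization of a size x size matrix.

    Assumes the leading-order operation count of size**3 / 3 flops.
    """
    flops = size ** 3 / 3
    return flops / (milliseconds * 1e-3) / 1e9


size = 960 * 40                      # same value as -size $((960*40))
print(size)                          # 38400
print(round(cholesky_gflops(size, 9915), 1))
```
</pre></tt>

<p>
With the size and time from the sample output, this estimate comes out at roughly
1900 GFlop/s, in line with the rate printed by the simulation.
</p>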

<div class="section" id="publications">
<h3>Publications</h3>
<p>
All StarPU-related publications are also
listed <a href="./publications">here</a>
with the corresponding BibTeX entries.
</p>

<p>
A good overview is available in
the following <a href="http://hal.archives-ouvertes.fr/inria-00467677">Research Report</a>.
</p>

<p>
If you need to cite StarPU, please
reference <a href="publications/Year/2011.html#AugThiNamWac11CCPE">[StarPU: A Unified Platform
    for Task Scheduling on Heterogeneous Multicore Architectures]</a>
for a general presentation. Other sub-sections below will give you
references for more specific aspects of StarPU.
</p>

<h4>General Presentations</h4>
<a name="PublicationsGeneralPresentations"></a>
<ol>
<li>
<a name="thibault:tel-01959127"></a>Samuel Thibault<br/>
<strong>On Runtime Systems for Task-based Programming on Heterogeneous Platforms</strong><br/>
Habilitation à diriger des recherches, Université de Bordeaux, December 2018<br/>
[<a href="https://hal.inria.fr/tel-01959127">WWW</a>]
[<a href="https://hal.inria.fr/tel-01959127/file/hdr.pdf">PDF</a>]
</li>
<li>
<a name="Aug11Thesis"></a>Cédric Augonnet<br/>
<strong>Scheduling Tasks over Multicore machines enhanced with Accelerators: a Runtime System's Perspective</strong><br/>
PhD thesis, Université Bordeaux 1, 351 cours de la Libération --- 33405 TALENCE cedex, December 2011<br/>
[<a href="http://tel.archives-ouvertes.fr/tel-00777154">WWW</a>]
[<a href="http://tel.archives-ouvertes.fr/tel-00777154/document">PDF</a>]
</li>
<li>
<a name="AugThiNamWac11CCPE"></a>Cédric Augonnet, Samuel Thibault, Raymond Namyst,  and Pierre-André Wacrenier<br/>
<strong>StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures</strong><br/>
<em>CCPE - Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par 2009</em>, 23:187-198, February 2011<br/>
[<a href="http://hal.inria.fr/inria-00550877">WWW</a>]
[<a href="http://hal.inria.fr/inria-00550877/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1002/cpe.1631">10.1002/cpe.1631</a>]
</li>
<li>
<a name="AugThiNamWac10RR7240"></a>Cédric Augonnet, Samuel Thibault,  and Raymond Namyst<br/>
<strong>StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines</strong><br/>
Research Report RR-7240, INRIA, March 2010<br/>
[<a href="http://hal.inria.fr/inria-00467677">WWW</a>]
[<a href="http://hal.inria.fr/inria-00467677/document">PDF</a>]
</li>
<li>
<a name="Aug09Renpar19"></a>Cédric Augonnet<br/>
<strong>StarPU: un support exécutif unifié pour les architectures multicoeurs hétérogènes</strong><br/>
In <em>19èmes Rencontres Francophones du Parallélisme</em>, Toulouse, France, September 2009<br/>
Note: Best Paper Award<br/>
[<a href="http://hal.inria.fr/inria-00411581">WWW</a>]
[<a href="http://hal.inria.fr/inria-00411581/document">PDF</a>]
</li>
<li>
<a name="AugThiNamWac09Europar"></a>Cédric Augonnet, Samuel Thibault, Raymond Namyst,  and Pierre-André Wacrenier<br/>
<strong>StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures</strong><br/>
In <em>Euro-Par - 15th International Conference on Parallel Processing</em>, volume 5704 of <em>Lecture Notes in Computer Science</em>, Delft, The Netherlands, pages 863-874, August 2009<br/>
Springer<br/>
[<a href="http://hal.inria.fr/inria-00384363">WWW</a>]
[<a href="http://hal.inria.fr/inria-00384363/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-642-03869-3_80">10.1007/978-3-642-03869-3_80</a>]
</li>
<li>
<a name="AugNam08HPPC"></a>Cédric Augonnet and Raymond Namyst<br/>
<strong>A unified runtime system for heterogeneous multicore architectures</strong><br/>
In <em>Proceedings of the International Euro-Par Workshops 2008, HPPC'08</em>, volume 5415 of <em>Lecture Notes in Computer Science</em>, Las Palmas de Gran Canaria, Spain, pages 174-183, August 2008<br/>
Springer<br/>
<strong>ISBN:</strong> 978-3-642-00954-9<br/>
[<a href="http://hal.inria.fr/inria-00326917">WWW</a>]
[<a href="http://hal.inria.fr/inria-00326917/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-642-00955-6_22">10.1007/978-3-642-00955-6_22</a>]
</li>
<li>
<a name="Aug08Master"></a>Cédric Augonnet<br/>
<strong>Vers des supports d'exécution capables d'exploiter les machines multicoeurs hétérogènes</strong><br/>
Mémoire de DEA, Université Bordeaux 1, June 2008<br/>
[<a href="http://hal.inria.fr/inria-00289361">WWW</a>]
[<a href="http://hal.inria.fr/inria-00289361/document">PDF</a>]
</li>
</ol>
<h4>On Composability</h4> 
<a name="PublicationsOnComposability"></a>
<ol>
<li>
<a name="hugo:tel-01162975"></a>Andra-Ecaterina Hugo<br/>
<strong>Composability of parallel codes on heterogeneous architectures</strong><br/>
PhD thesis, Université de Bordeaux, December 2014<br/>
[<a href="https://tel.archives-ouvertes.fr/tel-01162975">WWW</a>]
[<a href="https://tel.archives-ouvertes.fr/tel-01162975/file/HUGO_ANDRA_2014.pdf">PDF</a>]
</li>
<li>
<a name="AH13Renpar"></a>Andra Hugo<br/>
<strong>Le problème de la composition parallèle : une approche supervisée</strong><br/>
In <em>21èmes Rencontres Francophones du Parallélisme (RenPar'21)</em>, Grenoble, France, January 2013<br/>
[<a href="http://hal.inria.fr/hal-00773610">WWW</a>]
[<a href="http://hal.inria.fr/hal-00773610/document">PDF</a>]
</li>
<li>
<a name="hugo:hal-00824514"></a>Andra Hugo, Abdou Guermouche, Raymond Namyst,  and Pierre-André Wacrenier<br/>
<strong>Composing multiple StarPU applications over heterogeneous machines: a supervised approach</strong><br/>
In <em>Third International Workshop on Accelerators and Hybrid Exascale Systems</em>, Boston, USA, May 2013<br/>
[<a href="http://hal.inria.fr/hal-00824514">WWW</a>]
[<a href="http://hal.inria.fr/hal-00824514/document">PDF</a>]
</li>
<li>
<a name="AH11Master"></a>Andra Hugo<br/>
<strong>Composabilité de codes parallèles sur architectures hétérogènes</strong><br/>
Mémoire de Master, Université Bordeaux 1, June 2011<br/>
[<a href="http://hal.inria.fr/inria-00619654/en/">WWW</a>]
[<a href="http://hal.inria.fr/inria-00619654/document">PDF</a>]
</li>
</ol>
<h4>On Parallel Tasks</h4> 
<a name="PublicationsOnParallelTasks"></a>
<ol>
<li>
<a name="cojean:tel-01816341"></a>Terry Cojean<br/>
<strong>Programmation of heterogeneous architectures using moldable tasks</strong><br/>
PhD thesis, Université de Bordeaux, March 2018<br/>
[<a href="https://tel.archives-ouvertes.fr/tel-01816341">WWW</a>]
[<a href="https://tel.archives-ouvertes.fr/tel-01816341/file/COJEAN_TERRY_2018.pdf">PDF</a>]
</li>
</ol>
<h4>On Scheduling</h4> 
<a name="PublicationsOnScheduling"></a>
<ol>
<li>
<a name="bramas:hal-02120736"></a>Bérenger Bramas<br/>
<strong>Impact study of data locality on task-based applications through the Heteroprio scheduler</strong><br/>
<em>PeerJ Computer Science</em>, May 2019<br/>
[<a href="https://hal.inria.fr/hal-02120736">WWW</a>]
[<a href="https://hal.inria.fr/hal-02120736/file/peerj-cs-190.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.7717/peerj-cs.190">10.7717/peerj-cs.190</a>]
</li>
<li>
<a name="leandronesi:hal-02275363"></a>Lucas Leandro Nesi, Samuel Thibault, Luka Stanisic,  and Lucas Mello Schnorr<br/>
<strong>Visual Performance Analysis of Memory Behavior in a Task-Based Runtime on Hybrid Platforms</strong><br/>
In <em>2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)</em>, Larnaca, Cyprus, pages 142-151, May 2019<br/>
IEEE<br/>
[<a href="https://hal.inria.fr/hal-02275363">WWW</a>]
[<a href="https://hal.inria.fr/hal-02275363/file/CCGRID_camera_ready.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/CCGRID.2019.00025">10.1109/CCGRID.2019.00025</a>]
</li>
<li>
<a name="garciapinto:hal-01616632"></a>Vinicius Garcia Pinto, Lucas Mello Schnorr, Luka Stanisic, Arnaud Legrand, Samuel Thibault,  and Vincent Danjean<br/>
<strong>A Visual Performance Analysis Framework for Task-based Parallel Applications running on Hybrid Clusters</strong><br/>
<em>CCPE - Concurrency and Computation: Practice and Experience</em>, 30, April 2018<br/>
[<a href="https://hal.inria.fr/hal-01616632">WWW</a>]
[<a href="https://hal.inria.fr/hal-01616632/file/CCPE_article_submitted_2018_02_06.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1002/cpe.4472">10.1002/cpe.4472</a>]
</li>
<li>
<a name="pinto:hal-01842038"></a>Vinicius Garcia Pinto, Lucas Mello Schnorr, Arnaud Legrand, Samuel Thibault, Luka Stanisic,  and Vincent Danjean<br/>
<strong>Detecção de Anomalias de Desempenho em Aplicações de Alto Desempenho baseadas em Tarefas em Clusters Híbridos</strong><br/>
In <em>WPerformance - 17o Workshop em Desempenho de Sistemas Computacionais e de Comunicação</em>, Natal, Brazil, July 2018<br/>
[<a href="https://hal.inria.fr/hal-01842038">WWW</a>]
[<a href="https://hal.inria.fr/hal-01842038/file/181587_1.pdf">PDF</a>]
</li>
<li>
<a name="kumar:tel-01538516"></a>Suraj Kumar<br/>
<strong>Scheduling of Dense Linear Algebra Kernels on Heterogeneous Resources</strong><br/>
PhD thesis, Université de Bordeaux, April 2017<br/>
[<a href="https://tel.archives-ouvertes.fr/tel-01538516">WWW</a>]
[<a href="https://tel.archives-ouvertes.fr/tel-01538516/file/KUMAR_SURAL_2017.pdf">PDF</a>]
</li>
<li>
<a name="beaumont:hal-01386174"></a>O. Beaumont, L. Eyraud-Dubois,  and S. Kumar<br/>
<strong>Approximation Proofs of a Fast and Efficient List Scheduling Algorithm for Task-Based Runtime Systems on Multicores and GPUs</strong><br/>
In <em>2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)</em>, pages 768-777, May 2017<br/>
[<a href="https://hal.inria.fr/hal-01386174">WWW</a>]
[<a href="https://hal.inria.fr/hal-01386174/file/heteroPrioApproxProofsRR.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/IPDPS.2017.71">10.1109/IPDPS.2017.71</a>]
</li>
<li>
<a name="agullo:hal-01223573"></a>Emmanuel Agullo, Olivier Beaumont, Lionel Eyraud-Dubois,  and Suraj Kumar<br/>
<strong>Are Static Schedules so Bad ? A Case Study on Cholesky Factorization</strong><br/>
In <em>IPDPS'16</em>, Proceedings of the 30th IEEE International Parallel &amp; Distributed Processing Symposium, Chicago, IL, United States, May 2016<br/>
IEEE<br/>
[<a href="https://hal.inria.fr/hal-01223573">WWW</a>]
[<a href="https://hal.inria.fr/hal-01223573/file/heteroprioCameraReady-ieeeCompatiable.pdf">PDF</a>]
</li>
<li>
<a name="beaumont:hal-01361992"></a>Olivier Beaumont, Terry Cojean, Lionel Eyraud-Dubois, Abdou Guermouche,  and Suraj Kumar<br/>
<strong>Scheduling of Linear Algebra Kernels on Multiple Heterogeneous Resources</strong><br/>
In <em>International Conference on High Performance Computing, Data, and Analytics (HiPC)</em>, Hyderabad, India, December 2016<br/>
[<a href="https://hal.inria.fr/hal-01361992">WWW</a>]
[<a href="https://hal.inria.fr/hal-01361992v2/document">PDF</a>]
</li>
<li>
<a name="cojean:hal-01181135"></a>Terry Cojean, Abdou Guermouche, Andra Hugo, Raymond Namyst,  and Pierre-André Wacrenier<br/>
<strong>Resource aggregation for task-based Cholesky Factorization on top of heterogeneous machines</strong><br/>
In <em>HeteroPar'2016 workshop of Euro-Par</em>, Grenoble, France, August 2016<br/>
[<a href="https://hal.inria.fr/hal-01181135">WWW</a>]
[<a href="https://hal.inria.fr/hal-01181135/file/papier%20%281%29.pdf">PDF</a>]
</li>
<li>
<a name="garciapinto:hal-01353962"></a>Vinicius Garcia Pinto, Luka Stanisic, Arnaud Legrand, Lucas Mello Schnorr, Samuel Thibault,  and Vincent Danjean<br/>
<strong>Analyzing Dynamic Task-Based Applications on Hybrid Platforms: An Agile Scripting Approach</strong><br/>
In <em>VPA - 3rd Workshop on Visual Performance Analysis</em>, Salt Lake City, United States, November 2016<br/>
Note: Held in conjunction with SC16<br/>
[<a href="https://hal.inria.fr/hal-01353962">WWW</a>]
[<a href="https://hal.inria.fr/hal-01353962v2/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/VPA.2016.008">10.1109/VPA.2016.008</a>]
</li>
<li>
<a name="JaBlHU2016a"></a>Johan Janzén, David Black-Schaffer,  and Andra Hugo<br/>
<strong>Partitioning GPUs for Improved Scalability</strong><br/>
In <em>IEEE 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)</em>, October 2016<br/>
[<a href="http://ieeexplore.ieee.org/abstract/document/7789322/">WWW</a>]
[doi:<a href="http://dx.doi.org/10.1109/SBAC-PAD.2016.14">10.1109/SBAC-PAD.2016.14</a>]
</li>
<li>
<a name="cojean:hal-01409965"></a>Terry Cojean, Abdou Guermouche, Andra Hugo, Raymond Namyst,  and Pierre-André Wacrenier<br/>
<strong>Resource aggregation for task-based Cholesky Factorization on top of modern architectures</strong><br/>
Note: This paper is submitted for review to the Parallel Computing special issue for HCW and HeteroPar 16 workshops, November 2016<br/>
[<a href="https://hal.inria.fr/hal-01409965">WWW</a>]
[<a href="https://hal.inria.fr/hal-01409965/file/submission.pdf">PDF</a>]
</li>
<li>
<a name="agullo:hal-01120507"></a>Emmanuel Agullo, Olivier Beaumont, Lionel Eyraud-Dubois, Julien Herrmann, Suraj Kumar, Loris Marchal,  and Samuel Thibault<br/>
<strong>Bridging the Gap between Performance and Bounds of Cholesky Factorization on Heterogeneous Platforms</strong><br/>
In <em>HCW'2015 - Heterogeneity in Computing Workshop of IPDPS</em>, Hyderabad, India, May 2015<br/>
[<a href="https://hal.inria.fr/hal-01120507">WWW</a>]
[<a href="https://hal.inria.fr/hal-01120507/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/IPDPSW.2015.35">10.1109/IPDPSW.2015.35</a>]
</li>
<li>
<a name="sergent:hal-00978364"></a>Marc Sergent and Simon Archipoff<br/>
<strong>Modulariser les ordonnanceurs de tâches : une approche structurelle</strong><br/>
In <em>Compas'2014</em>, Neuchâtel, Switzerland, April 2014<br/>
[<a href="http://hal.inria.fr/hal-00978364">WWW</a>]
[<a href="http://hal.inria.fr/hal-00978364/PDF/ordonnanceurs_modulaires.pdf">PDF</a>]
</li>
<li>
<a name="AugCleThiNam10ICPADS"></a>Cédric Augonnet, Jérôme Clet-Ortega, Samuel Thibault,  and Raymond Namyst<br/>
<strong>Data-Aware Task Scheduling on Multi-Accelerator based Platforms</strong><br/>
In <em>The 16th International Conference on Parallel and Distributed Systems (ICPADS)</em>, Shanghai, China, December 2010<br/>
[<a href="http://hal.inria.fr/inria-00523937">WWW</a>]
[<a href="http://hal.inria.fr/inria-00523937/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/ICPADS.2010.129">10.1109/ICPADS.2010.129</a>]
</li>
</ol>
<h4>On The C Extensions</h4> 
<a name="PublicationsOnTheCExtensions"></a>
<ol>
<li>
<a name="LC13Report"></a>Ludovic Courtès<br/>
<strong>C Language Extensions for Hybrid CPU/GPU Programming with StarPU</strong><br/>
Research Report RR-8278, INRIA, April 2013<br/>
[<a href="http://hal.inria.fr/hal-00807033">WWW</a>]
[<a href="http://hal.inria.fr/hal-00807033/PDF/RR-8278.pdf">PDF</a>]
</li>
</ol>
<h4>On OpenMP Support on top of StarPU</h4> 
<a name="PublicationsOnOpenMPSupportontopofStarPU"></a>
<ol>
<li>
<a name="agullo:hal-01517153"></a>Emmanuel Agullo, Olivier Aumage, Berenger Bramas, Olivier Coulaud,  and Samuel Pitoiset<br/>
<strong>Bridging the gap between OpenMP and task-based runtime systems for the fast multipole method</strong><br/>
<em>IEEE Transactions on Parallel and Distributed Systems</em>, April 2017<br/>
[<a href="https://hal.inria.fr/hal-01517153">WWW</a>]
[<a href="https://hal.inria.fr/hal-01517153/file/tpds_kstar_scalfmm_print.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/TPDS.2017.2697857">10.1109/TPDS.2017.2697857</a>]
</li>
<li>
<a name="agullo:hal-01372022"></a>Emmanuel Agullo, Olivier Aumage, Berenger Bramas, Olivier Coulaud,  and Samuel Pitoiset<br/>
<strong>Bridging the gap between OpenMP 4.0 and native runtime systems for the fast multipole method</strong><br/>
Research Report RR-8953, Inria, March 2016<br/>
[<a href="https://hal.inria.fr/hal-01372022">WWW</a>]
[<a href="https://hal.inria.fr/hal-01372022/file/RR-8953.pdf">PDF</a>]
</li>
<li>
<a name="virouleau:hal-01081974"></a>Philippe Virouleau, Pierrick Brunet, François Broquedis, Nathalie Furmento, Samuel Thibault, Olivier Aumage,  and Thierry Gautier<br/>
<strong>Evaluation of OpenMP Dependent Tasks with the KASTORS Benchmark Suite</strong><br/>
In <em>IWOMP2014 - 10th International Workshop on OpenMP</em>, Salvador, Brazil, pages 16-29, September 2014<br/>
Springer<br/>
[<a href="https://hal.inria.fr/hal-01081974">WWW</a>]
[<a href="https://hal.inria.fr/hal-01081974/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-319-11454-5_2">10.1007/978-3-319-11454-5_2</a>]
</li>
</ol>
<h4>On MPI Support</h4> 
<a name="PublicationsOnMPISupport"></a>
<ol>
<li>
<a name="agullo:hal-01618526"></a>Emmanuel Agullo, Olivier Aumage, Mathieu Faverge, Nathalie Furmento, Florent Pruvost, Marc Sergent,  and Samuel Thibault<br/>
<strong>Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model</strong><br/>
<em>TPDS - IEEE Transactions on Parallel and Distributed Systems</em>, December 2017<br/>
[<a href="https://hal.inria.fr/hal-01618526">WWW</a>]
[<a href="https://hal.inria.fr/hal-01618526/file/tpds14.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/TPDS.2017.2766064">10.1109/TPDS.2017.2766064</a>]
</li>
<li>
<a name="sergent:tel-01483666"></a>Marc Sergent<br/>
<strong>Scalability of a task-based runtime system for dense linear algebra applications</strong><br/>
PhD thesis, Université de Bordeaux, December 2016<br/>
[<a href="https://tel.archives-ouvertes.fr/tel-01483666">WWW</a>]
[<a href="https://tel.archives-ouvertes.fr/tel-01483666/file/SERGENT_MARC_2016.pdf">PDF</a>]
</li>
<li>
<a name="agullo:hal-01283949"></a>Emmanuel Agullo, Olivier Aumage, Mathieu Faverge, Nathalie Furmento, Florent Pruvost, Marc Sergent,  and Samuel Thibault<br/>
<strong>Harnessing clusters of hybrid nodes with a sequential task-based programming model</strong><br/>
In <em>8th International Workshop on Parallel Matrix Algorithms and Applications</em>, July 2014<br/>
[<a href="https://hal.inria.fr/hal-01283949">WWW</a>]
[<a href="https://hal.inria.fr/hal-01283949/file/pmaa14.pdf">PDF</a>]
</li>
<li>
<a name="augonnet:hal-00992208"></a>Cédric Augonnet, Olivier Aumage, Nathalie Furmento, Samuel Thibault,  and Raymond Namyst<br/>
<strong>StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators</strong><br/>
Research Report RR-8538, INRIA, May 2014<br/>
[<a href="http://hal.inria.fr/hal-00992208">WWW</a>]
[<a href="http://hal.inria.fr/hal-00992208/PDF/RR-8538.pdf">PDF</a>]
</li>
<li>
<a name="AugAumFurNamThi2012EuroMPI"></a>Cédric Augonnet, Olivier Aumage, Nathalie Furmento, Raymond Namyst,  and Samuel Thibault<br/>
<strong>StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators</strong><br/>
In Siegfried Benkner, Jesper Larsson Träff, and Jack Dongarra, editors, <em>EuroMPI 2012</em>, volume 7490 of <em>LNCS</em>, September 2012<br/>
Springer<br/>
Note: Poster Session<br/>
[<a href="http://hal.inria.fr/hal-00725477">WWW</a>]
[<a href="http://hal.inria.fr/hal-00725477/document">PDF</a>]
</li>
</ol>
<h4>On Memory Control</h4> 
<a name="PublicationsOnMemoryControl"></a>
<ol>
<li>
<a name="chevalier:hal-01718280"></a>Arthur Chevalier<br/>
<strong>Critical resources management and scheduling under StarPU</strong><br/>
Master's thesis, Université de Bordeaux, September 2017<br/>
[<a href="https://hal.inria.fr/hal-01718280">WWW</a>]
[<a href="https://hal.inria.fr/hal-01718280/file/Memoire.pdf">PDF</a>]
</li>
<li>
<a name="sergent:hal-01284004"></a>Marc Sergent, David Goudin, Samuel Thibault,  and Olivier Aumage<br/>
<strong>Controlling the Memory Subscription of Distributed Applications with a Task-Based Runtime System</strong><br/>
In <em>HIPS - 21st International Workshop on High-Level Parallel Programming Models and Supportive Environments</em>, Chicago, United States, May 2016<br/>
[<a href="https://hal.inria.fr/hal-01284004">WWW</a>]
[<a href="https://hal.inria.fr/hal-01284004/file/PID4127657.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/IPDPSW.2016.105">10.1109/IPDPSW.2016.105</a>]
</li>
</ol>
<h4>On Performance Model Tuning</h4> 
<a name="PublicationsOnPerformanceModelTuning"></a>
<ol>
<li>
<a name="agullo:hal-01474556"></a>Emmanuel Agullo, Bérenger Bramas, Olivier Coulaud, Luka Stanisic,  and Samuel Thibault<br/>
<strong>Modeling Irregular Kernels of Task-based codes: Illustration with the Fast Multipole Method</strong><br/>
Research Report RR-9036, INRIA Bordeaux, February 2017<br/>
[<a href="https://hal.inria.fr/hal-01474556">WWW</a>]
[<a href="https://hal.inria.fr/hal-01474556/file/rapport.pdf">PDF</a>]
</li>
<li>
<a name="AugThiNam09HPPC"></a>Cédric Augonnet, Samuel Thibault,  and Raymond Namyst<br/>
<strong>Automatic Calibration of Performance Models on Heterogeneous Multicore Architectures</strong><br/>
In <em>HPPC - Proceedings of the International Euro-Par Workshops, Highly Parallel Processing on a Chip</em>, volume 6043 of <em>Lecture Notes in Computer Science</em>, Delft, The Netherlands, pages 56-65, August 2009<br/>
Springer<br/>
[<a href="http://hal.inria.fr/inria-00421333">WWW</a>]
[<a href="http://hal.inria.fr/inria-00421333/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-642-14122-5_9">10.1007/978-3-642-14122-5_9</a>]
</li>
</ol>
<h4>On The Simulation Support through SimGrid</h4> 
<a name="PublicationsOnTheSimulationSupportthroughSimGrid"></a>
<ol>
<li>
<a name="stanisic:hal-01147997"></a>Luka Stanisic, Samuel Thibault, Arnaud Legrand, Brice Videau,  and Jean-François Méhaut<br/>
<strong>Faithful Performance Prediction of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures</strong><br/>
<em>CCPE - Concurrency and Computation: Practice and Experience</em>, pp 16, May 2015<br/>
[<a href="https://hal.inria.fr/hal-01147997">WWW</a>]
[<a href="https://hal.inria.fr/hal-01147997/file/CCPE14_article.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1002/cpe.3555">10.1002/cpe.3555</a>]
</li>
<li>
<a name="stanisic:hal-01180272"></a>Luka Stanisic, Emmanuel Agullo, Alfredo Buttari, Abdou Guermouche, Arnaud Legrand, Florent Lopez,  and Brice Videau<br/>
<strong>Fast and Accurate Simulation of Multithreaded Sparse Linear Algebra Solvers</strong><br/>
In <em>The 21st IEEE International Conference on Parallel and Distributed Systems</em>, Melbourne, Australia, December 2015<br/>
[<a href="https://hal.inria.fr/hal-01180272">WWW</a>]
[<a href="https://hal.inria.fr/hal-01180272/file/QRMSTARSG_article.pdf">PDF</a>]
</li>
<li>
<a name="stanisic:hal-01011633"></a>Luka Stanisic, Samuel Thibault, Arnaud Legrand, Brice Videau,  and Jean-François Méhaut<br/>
<strong>Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures</strong><br/>
In <em>Euro-Par - 20th International Conference on Parallel Processing</em>, Porto, Portugal, August 2014<br/>
Springer-Verlag<br/>
[<a href="http://hal.inria.fr/hal-01011633">WWW</a>]
[<a href="http://hal.inria.fr/hal-01011633/PDF/StarPUSG_article.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-319-09873-9_5">10.1007/978-3-319-09873-9_5</a>]
</li>
</ol>
<h4>On The Cell Support</h4> 
<a name="PublicationsOnTheCellSupport"></a>
<ol>
<li>
<a name="AugThiNamNij09Samos"></a>Cédric Augonnet, Samuel Thibault, Raymond Namyst,  and Maik Nijhuis<br/>
<strong>Exploiting the Cell/BE architecture with the StarPU unified runtime system</strong><br/>
In <em>SAMOS Workshop - International Workshop on Systems, Architectures, Modeling, and Simulation</em>, volume 5657 of <em>Lecture Notes in Computer Science</em>, Samos, Greece, July 2009<br/>
[<a href="http://hal.inria.fr/inria-00378705">WWW</a>]
[<a href="http://hal.inria.fr/inria-00378705/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-642-03138-0_36">10.1007/978-3-642-03138-0_36</a>]
</li>
</ol>
<h4>On Applications</h4> 
<a name="PublicationsOnApplications"></a>
<ol>
<li>
<a name="couteyencarpaye:hal-01507613"></a>Jean Marie Couteyen Carpaye, Jean Roman,  and Pierre Brenner<br/>
<strong>Design and Analysis of a Task-based Parallelization over a Runtime System of an Explicit Finite-Volume CFD Code with Adaptive Time Stepping</strong><br/>
<em>International Journal of Computational Science and Engineering</em>, pp 1 - 22, 2017<br/>
[<a href="https://hal.inria.fr/hal-01507613">WWW</a>]
[<a href="https://hal.inria.fr/hal-01507613/file/flusepa-task-hal-inria-preprint.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1016/j.jocs.2017.03.008">10.1016/j.jocs.2017.03.008</a>]
</li>
<li>
<a name="agullo:hal-01473475"></a>Emmanuel Agullo, Alfredo Buttari, Mikko Byckling, Abdou Guermouche,  and Ian Masliah<br/>
<strong>Achieving high-performance with a sparse direct solver on Intel KNL</strong><br/>
Research Report RR-9035, Inria Bordeaux Sud-Ouest ; CNRS-IRIT ; Intel corporation ; Université Bordeaux, February 2017<br/>
[<a href="https://hal.inria.fr/hal-01473475">WWW</a>]
[<a href="https://hal.inria.fr/hal-01473475/file/RR-9035.pdf">PDF</a>]
</li>
<li>
<a name="agullo:hal-01387482"></a>Emmanuel Agullo, Bérenger Bramas, Olivier Coulaud, Martin Khannouz,  and Luka Stanisic<br/>
<strong>Task-based fast multipole method for clusters of multicore processors</strong><br/>
Research Report RR-8970, Inria Bordeaux Sud-Ouest, October 2016<br/>
[<a href="https://hal.inria.fr/hal-01387482">WWW</a>]
[<a href="https://hal.inria.fr/hal-01387482/file/report-8970.pdf">PDF</a>]
</li>
<li>
<a name="agullo:hal-01316982"></a>E Agullo, L Giraud, A Guermouche, S Nakov,  and Jean Roman<br/>
<strong>Task-based Conjugate Gradient: from multi-GPU towards heterogeneous architectures</strong><br/>
Research Report 8912, Inria Bordeaux Sud-Ouest, May 2016<br/>
[<a href="https://hal.inria.fr/hal-01316982">WWW</a>]
[<a href="https://hal.inria.fr/hal-01316982/file/RR-8912.pdf">PDF</a>]
</li>
<li>
<a name="rossignon:tel-01230876"></a>Corentin Rossignon<br/>
<strong>A fine grain model programming for parallelization of sparse linear solver</strong><br/>
PhD thesis, Université de Bordeaux, July 2015<br/>
[<a href="https://tel.archives-ouvertes.fr/tel-01230876">WWW</a>]
[<a href="https://tel.archives-ouvertes.fr/tel-01230876/file/ROSSIGNON_CORENTIN_2015.pdf">PDF</a>]
</li>
<li>
<a name="MaMiDuAuThiAoNa15"></a>Víctor Martínez, David Michéa, Fabrice Dupros, Olivier Aumage, Samuel Thibault, Hideo Aochi,  and Philippe Olivier Alexandre Navaux<br/>
<strong>Towards seismic wave modeling on heterogeneous many-core architectures using task-based runtime system</strong><br/>
In <em>SBAC-PAD - 27th International Symposium on Computer Architecture and High Performance Computing</em>, Florianopolis, Brazil, October 2015<br/>
[<a href="https://hal.inria.fr/hal-01182746">WWW</a>]
[<a href="https://hal.inria.fr/hal-01182746/file/sbac2015_soumission.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/SBAC-PAD.2015.33">10.1109/SBAC-PAD.2015.33</a>]
</li>
<li>
<a name="agullo:hal-00911856"></a>Emmanuel Agullo, Bérenger Bramas, Olivier Coulaud, Eric Darve, Matthias Messner,  and Toru Takahashi<br/>
<strong>Task-Based FMM for Multicore Architectures</strong><br/>
<em>SIAM Journal on Scientific Computing</em>, 36(1):66-93, 2014<br/>
[<a href="https://hal.inria.fr/hal-00911856">WWW</a>]
[<a href="https://hal.inria.fr/hal-00911856/file/sisc-cpu.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1137/130915662">10.1137/130915662</a>]
</li>
<li>
<a name="sylvain:hal-01005765"></a>Sylvain Henry, Alexandre Denis, Denis Barthou, Marie-Christine Counilh,  and Raymond Namyst<br/>
<strong>Toward OpenCL Automatic Multi-Device Support</strong><br/>
In Fernando Silva, Ines Dutra,  and Vitor Santos Costa, editors, <em>Euro-Par 2014</em>, Porto, Portugal, August 2014<br/>
Springer<br/>
[<a href="http://hal.inria.fr/hal-01005765">WWW</a>]
[<a href="http://hal.inria.fr/hal-01005765/PDF/final.pdf">PDF</a>]
</li>
<li>
<a name="lacoste:hal-00987094"></a>Xavier Lacoste, Mathieu Faverge, Pierre Ramet, Samuel Thibault,  and George Bosilca<br/>
<strong>Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes</strong><br/>
In <em>HCW'2014 - Heterogeneity in Computing Workshop of IPDPS</em>, Phoenix, USA, May 2014<br/>
IEEE<br/>
Note: RR-8446<br/>
[<a href="http://hal.inria.fr/hal-00987094">WWW</a>]
[<a href="http://hal.inria.fr/hal-00987094/PDF/sparsegpus.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/IPDPSW.2014.9">10.1109/IPDPSW.2014.9</a>]
</li>
<li>