<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<HEAD>
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<TITLE>StarPU</TITLE>
<link rel="stylesheet" type="text/css" href="/style.css" />
</HEAD>

<body>

<div class="title">
<h1><a href="/">StarPU</a></h1>
<h2>A Unified Runtime System for Heterogeneous Multicore Architectures</h2>
</div>

<div class="menu">
<a href="https://team.inria.fr/storm/">STORM TEAM</a> |
&nbsp; &nbsp; &nbsp;
|
<a href="#overview">Overview</a> |
<a href="#news">News</a> |
<a href="#contact">Contact</a> |
<a href="/people/">People</a> |
<a href="#features">Features</a> |
<a href="#software">Software</a> |
<a href="#tryit">Try it!</a> |
<a href="help/">Help</a> |
<a href="#publications">Publications</a> |
<a href="/internships/">Jobs/Interns</a> |
<a href="/files/">Download</a> |
<a href="/market/">Market</a> |
<a href="/tutorials/">Tutorials</a> |
<a href="/files/testing/morse/master/">Benchmarks</a> |
<a href="https://gitlab.inria.fr/starpu/starpu-intranet/-/wikis/home">Intranet</a>
</div>

<div class="section" id="overview">
<h3>Overview</h3>
  <p>
<span class="important">StarPU is a task programming library for hybrid architectures</span>
<ol>
<li><b>The application provides algorithms and constraints</b>
    <ul>
    <li>CPU/GPU implementations of tasks</li>
    <li>A graph of tasks, using either StarPU's rich <b>C/C++ API</b>, or <b>OpenMP pragmas</b>.</li>
    </ul>
<br>
</li>
<li><b>StarPU handles run-time concerns</b>
    <ul>
    <li>Task dependencies</li>
    <li>Optimized heterogeneous scheduling</li>
    <li>Optimized data transfers and replication between main memory and discrete memories</li>
    <li>Optimized cluster communications</li>
    </ul>
</li>
</ol>
</p>
<p>
<span class="important">Rather than handling low-level issues, <b>programmers can concentrate on algorithmic concerns!</b></span>
</p>

<p>
<span class="note">The StarPU documentation is available in
<a href="/files/doc/starpu.pdf">PDF</a> and in <a href="/files/doc/html/">HTML</a>.</span>
Please note that these documents are up to date with the latest release of
StarPU.
</p>
<p>
The latest documentation in <a href="/files/testing/master/doc/starpu.pdf">PDF</a>
and <a href="/files/testing/master/doc/html/">HTML</a> is updated every day, but covers
the latest developments, which may not be available in the latest release.
</p>
</div>

<div class="section emphasize newslist" id="news">
<h3>News</h3>
<p>
October
2020 <b>&raquo;&nbsp;</b><a href="/files/index.html#1.3"><b>The
      release 1.3.7 of StarPU is now
      available!</b></a> The 1.3 release brings, among other
      functionalities, MPI master-slave support, a tool to replay
      execution through SimGrid, an HDF5 implementation of
      out-of-core support, a new implementation of StarPU-MPI on top of
      NewMadeleine, implicit support for asynchronous partition
      planning, a resource management module to share processor cores
      and accelerator devices with other parallel runtime systems, ...
</p>
<p>
June 2020 <b>&raquo;&nbsp;</b><a href="/files/index.html#1.2"><b>The
      1.2.10 release of StarPU is now available!</b></a>
	The 1.2 release series notably brings out-of-core support, MIC Xeon
	Phi support, OpenMP runtime support, and a new internal
	communication system for MPI.
</p>
<p>
  November
  2019 <b>&raquo;&nbsp;</b>
  A <a href="/tutorials/2019-11-HPNS-Inria/">StarPU tutorial</a> will
  be given as part of the Inria autumn school "High Performance
  Numerical Simulation".
</p>
<p>
May 2019 <b>&raquo;&nbsp;</b><a href="/files/index.html#1.1"><b>The
      v1.1.8 release of StarPU is now available!</b></a> This release notably brings the concept of
      scheduling contexts, which makes it possible to separate computation
      resources. This is really intended to be the last release for the
      branch 1.1.
</p>
</div>

<div class="section emphasizebot" style="text-align: right; font-style: italic;">
Get the latest StarPU news by subscribing to the <a href="https://sympa.inria.fr/sympa/info/starpu-announce">starpu-announce mailing list</a>.
See also the full <a href="news/">news</a>.
</div>

<div class="section" id="video">
<h3>Video Conference</h3>
<p>
A video recording (26') of a <a href="http://www.x.org/wiki/Events/XDC2014/XDC2014ThibaultStarPU/">presentation at the XDC2014 conference</a> gives an overview of StarPU
(<a href="http://www.x.org/wiki/Events/XDC2014/XDC2014ThibaultStarPU/xdc_starpu.pdf">slides</a>):
</p>
<center>
<iframe width="420" height="315" src="https://www.youtube.com/embed/frsWSqb8UJU" frameborder="0" allowfullscreen></iframe>
</center>
</div>

<div class="section" id="tutorial">
<h3>Tutorial material</h3>
<p>
The latest tutorial material for StarPU is composed of two parts:
<ul>
<li><a href="/tutorials/2016-06-PATC/slides/01_introducing_starpu.pdf">Introducing StarPU</a></li>
<li><a href="/tutorials/2016-06-PATC/slides/02_mastering_starpu.pdf">Mastering StarPU</a></li>
</ul>
</p>
</div>

<div class="section" id="slides">
<h3>Set of slides</h3>
<p>
A <a href="/slides.pdf">set of slides</a> is also available to get an overview of
StarPU.
</p>
</div>

<div class="section" id="contact">
<h3>Contact</h3>
<p>For any questions regarding StarPU, please contact the StarPU developers mailing list.</p>
<pre>
<a href="mailto:starpu-devel@inria.fr?subject=StarPU">starpu-devel@inria.fr</a>
</pre>
<p>Details of the <a href="/people/">StarPU team members</a> are also available.</p>
</div>

<div class="section" id="features">
<h3>Features</h3>

<h4>Portability</h4>
  <p>
Portability is obtained by means of a unified abstraction of the machine.
StarPU offers a unified offloadable task abstraction named <em>codelet</em>. Rather
than rewriting the entire code, programmers can encapsulate existing functions
within codelets. In case a codelet can run on heterogeneous architectures, <b>it
is possible to specify one function for each architecture</b> (e.g. one function
for CUDA and one function for CPUs). StarPU takes care of scheduling and
executing those codelets as efficiently as possible over the entire machine, including
multiple GPUs.
One can even specify <b>several functions for each architecture</b> (new in
v1.0) as well as
<b>parallel implementations</b> (e.g. in OpenMP), and StarPU will
automatically determine which version is best for each input size (new in v0.9).
StarPU can execute parallel implementations concurrently, e.g. one per socket, provided that the
task implementations support it (which is the case for MKL, but unfortunately
most often not for OpenMP).
  </p>

<h4>Genericity</h4>
  <p>
The StarPU programming interface is very generic. For instance, various data
structures are supported in mainline (vectors, dense matrices, CSR/BCSR/COO sparse matrices, ...),
but application-specific data structures can also be supported, provided that
the application describes how data is to be transferred (e.g. a series of
contiguous blocks). This was used, for instance, for hierarchically-compressed
matrices (h-matrices).
  </p>

<h4>Data transfers</h4>
  <p>
To relieve programmers from the burden of explicit data transfers, a high-level
data management library enforces memory coherency over the machine: before a
codelet starts (e.g. on an accelerator), all its <b>data are automatically made
available on the compute resource</b>. Data are also kept on e.g. GPUs as long as
they are needed for further tasks. When a device runs out of memory, StarPU uses
an LRU strategy to <b>evict unused data</b>. StarPU also takes care of <b>automatically
prefetching</b> data, which makes it possible to <b>overlap data transfers with computations</b>
(including <b>GPU-GPU direct transfers</b>) to get the most out of the architecture.
  </p>
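<p>
The LRU eviction idea can be sketched with a toy model. This is plain Python
written for illustration only: the class <tt>LRUDeviceMemory</tt>, its
<tt>acquire</tt> method and the <tt>pinned</tt> parameter are hypothetical names,
not part of the StarPU API, and the real data management library does considerably more
(coherency, prefetching, asynchronous transfers).
</p>

```python
from collections import OrderedDict


class LRUDeviceMemory:
    """Toy model of a device memory that evicts the least recently used,
    currently unused data when it runs full. Hypothetical; not StarPU's API."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.resident = OrderedDict()  # handle -> size, least recently used first
        self.used = 0

    def acquire(self, handle, size, pinned=()):
        """Make `handle` resident and return the list of evicted handles.

        `pinned` lists handles needed by the running task; they are never evicted.
        """
        evicted = []
        if handle in self.resident:
            self.resident.move_to_end(handle)  # mark as most recently used
            return evicted
        for victim in list(self.resident):
            if self.used + size <= self.capacity:
                break
            if victim in pinned:
                continue
            self.used -= self.resident.pop(victim)
            evicted.append(victim)
        if self.used + size > self.capacity:
            raise MemoryError("not enough evictable data")
        self.resident[handle] = size
        self.used += size
        return evicted
```

<p>
In this model, touching a piece of data again moves it to the "recent" end, so
data that further tasks keep using stays on the device while stale data is evicted first.
</p>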

<h4>Dependencies</h4>
  <p>
Dependencies between tasks can be given in several ways, to provide the
programmer with the best flexibility:
  <ul>
    <li><b>implicitly</b> from RAW, WAW, and WAR data dependencies,</li>
    <li>explicitly through <b>tags</b> which act as rendez-vous points between
    tasks (thus including tasks which have not been created yet),</li>
    <li><b>explicitly</b> between pairs of tasks.</li>
  </ul>
  </p>
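<p>
The implicit case can be illustrated with a small sketch. It is plain Python, not
StarPU code: the function <tt>infer_dependencies</tt> and the task tuples are
hypothetical. Given tasks in submission order with their declared read and
write accesses, RAW, WAW and WAR edges follow from the last writer of each piece
of data and from the readers seen since that write.
</p>

```python
def infer_dependencies(tasks):
    """Derive RAW/WAW/WAR edges from tasks given in submission order.

    tasks: list of (name, reads, writes), where reads and writes are sets of
    data handle names. Illustrative only; this is not StarPU's API.
    Returns a set of (predecessor, successor, kind) tuples.
    """
    last_writer = {}    # handle -> task that last wrote it
    readers_since = {}  # handle -> tasks that read it since the last write
    edges = set()
    for name, reads, writes in tasks:
        for handle in reads:
            if handle in last_writer:
                edges.add((last_writer[handle], name, "RAW"))
        for handle in writes:
            if handle in last_writer:
                edges.add((last_writer[handle], name, "WAW"))
            for reader in readers_since.get(handle, []):
                edges.add((reader, name, "WAR"))
        # Update the bookkeeping only once all edges of this task are recorded.
        for handle in reads:
            if handle not in writes:
                readers_since.setdefault(handle, []).append(name)
        for handle in writes:
            last_writer[handle] = name
            readers_since[handle] = []
    return edges
```

<p>
Because the bookkeeping only looks at tasks already submitted, edges can be
added one task at a time while earlier tasks are already running.
</p>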
  <p>
  These dependencies are computed in a completely decentralized way, and can be
  introduced completely dynamically as tasks get submitted by the application
  while tasks previously submitted are being executed.
  </p>
  <p>
StarPU also supports an OpenMP-like <a href="/files/doc/html/DataManagement.html#DataReduction">reduction</a> access mode (new in v0.9).
  </p>
  <p>
It also supports a <a href="/files/doc/html/DataManagement.html#DataCommute">commute</a> access mode to allow data access commutativity (new in v1.2).
  </p>

  <p>
It also supports transparent dependency tracking between hierarchical subpieces of data
through asynchronous partitioning, allowing seamless concurrent read access to different
partitioning levels (new in v1.3).
  </p>

<h4>Heterogeneous Scheduling</h4>
  <p>
StarPU obtains
portable performance by efficiently (and easily) using all computing resources
at the same time. StarPU also takes advantage of the <b>heterogeneous</b> nature of a
machine, for instance by using scheduling strategies based on auto-tuned
performance models. These determine the relative performance achieved
by the different processing units for the various kinds of tasks, and thus
make it possible to <b>automatically let processing units execute the tasks they are best at</b>.
Various strategies and variants are available. Some of them are centralized, but
most of them are <b>completely distributed</b>. Examples include <i>dmdas</i> (a data-locality-aware MCT strategy,
thus similar to HEFT, but which starts executing tasks before the whole task graph is
submitted, thus allowing dynamic task submission and a decentralized scheduler,
as well as an energy optimizing extension), <i>eager</i> (dumb centralized
queue), <i>lws</i> (decentralized locality-aware work-stealing), ...
The overhead per task is typically on the order of
a microsecond. Tasks should thus be a few orders of magnitude
bigger, such as 100 microseconds or 1 millisecond, to make the overhead
negligible.
  </p>
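<p>
As a rough illustration of an MCT (minimum completion time) decision driven by
a performance model, here is a minimal sketch in plain Python. The function
<tt>mct_schedule</tt>, its arguments and the timing tables are made up for the
example; it does not reproduce the actual dmdas implementation, which also
handles priorities, data locality and calibration.
</p>

```python
def mct_schedule(tasks, workers, exec_time, transfer_time):
    """Greedy minimum-completion-time (MCT) placement.

    tasks: task kinds, in the order in which they become ready.
    workers: worker names.
    exec_time[(kind, worker)]: predicted run time (the "performance model").
    transfer_time[(kind, worker)]: predicted cost of bringing the data over.
    All inputs are hypothetical illustrations, not StarPU data structures.
    Returns (placement, makespan).
    """
    ready_at = {w: 0.0 for w in workers}  # when each worker becomes free
    placement = []
    for kind in tasks:
        def completion(w):
            return ready_at[w] + transfer_time[(kind, w)] + exec_time[(kind, w)]

        best = min(workers, key=completion)  # worker finishing this task first
        ready_at[best] = completion(best)
        placement.append((kind, best))
    return placement, max(ready_at.values())
```

<p>
Each task is placed as soon as it is ready, without knowledge of the rest of the
task graph, which is why this kind of strategy works with dynamic task submission.
</p>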

<h4>Clusters</h4>
  <p>
To deal with clusters, StarPU can nicely integrate with <a
	href="/files/doc/html/MPISupport.html">MPI</a>, through explicit or implicit
support, according to the application's preference.

    <ul>
        <li>Explicit network communication requests can be emitted, which will
then be <b>automatically combined and overlapped</b> with the intra-node data
transfers and computation.
        <li>The application can also just provide the whole task graph and a
data distribution over MPI nodes, and StarPU will automatically determine which
MPI node should execute which task, and <b>automatically generate all required
MPI communications</b> accordingly (new in v0.9). We have gotten excellent
scaling on a 256-node cluster with GPUs; we have not yet had the opportunity
to test on a larger cluster. We have however measured that with naive task
submission, it should scale to a thousand nodes, and with pruning-tuned task
submission, it should scale to about a <b>million nodes</b>.
        <li>Starting with v1.3, the application can also just provide the
whole task graph, and let StarPU decide the data distribution and task
distribution, thanks to a master-slave mechanism. This will however by nature
have a more limited scalability than the fully distributed paradigm mentioned
above.
    </ul>
  </p>

<h4>Out of core</h4>
  <p>
When memory is not big enough for the working set, one may have to resort to
using disks. StarPU makes this seamless thanks to its <a href="/files/doc/html/OutOfCore.html">out of core support</a> (new in v1.2).
StarPU will <b>automatically evict</b> data from the main memory in advance, and
<b>prefetch back</b> required data before it is needed for tasks.
  </p>

  <!--
<h4>Extensions to the C Language</h4>
<p>
  StarPU comes with a GCC plug-in
  that <a href="/files/doc/html/cExtensions.html">extends the C programming
  language</a> with pragmas and attributes that make it easy
  to <b>annotate a sequential C program to turn it into a parallel
  StarPU program</b> (new in v1.0).
</p>
  -->

<h4>Fortran interface</h4>
<p>
  StarPU comes with native Fortran bindings and examples.
</p>

<h4>OpenMP 4-compatible interface</h4>
<p>
  <a href="http://kstar.gforge.inria.fr/">K'Star</a> provides an OpenMP
  4-compatible interface on top of StarPU. This makes it possible to simply process OpenMP
  applications with the K'Star source-to-source compiler, then build them with the
  usual compiler, and the result will use the StarPU runtime.
</p>
<p>
  K'Star also provides some extensions to the OpenMP 4 standard, to let the
  StarPU runtime perform online optimizations.
</p>

<h4>OpenCL-compatible interface</h4>
<p>
  StarPU provides an <a href="/files/doc/html/SOCLOpenclExtensions.html">OpenCL-compatible interface, SOCL</a>
  which makes it possible to simply run OpenCL applications on top of StarPU (new in v1.0).
</p>

<h4>Simulation support</h4>
<p>
  StarPU can very accurately simulate an application execution
  and measure the resulting performance by using the
  <a href="http://simgrid.gforge.inria.fr">SimGrid simulator</a> (new in v1.1).  This makes it possible
  to quickly experiment with various scheduling heuristics, various application
  algorithms, and even various platforms (available GPUs and CPUs, available
  bandwidth)!
</p>

<h4>All in all</h4>
  <p>
All that means that the following sequential source code of a tiled version of
the classical Cholesky factorization algorithm using BLAS is also
(almost) valid StarPU code, possibly running on all the CPUs and GPUs, and given a data
distribution over MPI nodes, it is even a distributed version!
  </p>

  <tt><pre>
for (k = 0; k < tiles; k++) {
  potrf(A[k,k])
  for (m = k+1; m < tiles; m++)
    trsm(A[k,k], A[m,k])
  for (m = k+1; m < tiles; m++)
    syrk(A[m,k], A[m, m])
  for (m = k+1; m < tiles; m++)
    for (n = k+1; n < m; n++)
      gemm(A[m,k], A[n,k], A[m,n])
}</pre></tt>
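<p>
To make the link between this loop nest and a task graph more concrete, the
following sketch (plain Python, not StarPU code) lists the tasks that the loops
generate, in submission order. The function names are hypothetical, and the
read-only / read-write access modes attached to each kernel are the usual ones
for tiled Cholesky, stated here as an assumption rather than taken from the
StarPU sources.
</p>

```python
from collections import Counter


def cholesky_tasks(tiles):
    """List the tasks of the tiled Cholesky loop nest in submission order.

    Each entry is (kernel, read_only_tiles, read_write_tiles), where a tile is
    a (row, column) pair. Access modes are assumed, see the text above.
    """
    tasks = []
    for k in range(tiles):
        tasks.append(("potrf", [], [(k, k)]))
        for m in range(k + 1, tiles):
            tasks.append(("trsm", [(k, k)], [(m, k)]))
        for m in range(k + 1, tiles):
            tasks.append(("syrk", [(m, k)], [(m, m)]))
        for m in range(k + 1, tiles):
            for n in range(k + 1, m):
                tasks.append(("gemm", [(m, k), (n, k)], [(m, n)]))
    return tasks


def kernel_counts(tiles):
    """Count how many tasks of each kernel the loop nest submits."""
    return Counter(kernel for kernel, _, _ in cholesky_tasks(tiles))
```

<p>
With 40 tiles per dimension, this loop nest yields 11480 tasks; the data
dependencies between them follow from which tiles each task reads and writes.
</p>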

<h4>Supported Architectures</h4>
<ul>
<li>SMP/Multicore Processors (x86, PPC, ARM, ... all Debian architectures have been tested) </li>
<li>NVIDIA GPUs (e.g. heterogeneous multi-GPU), with pipelined and concurrent kernel execution support (new in v1.2) and GPU-GPU direct transfers (new in v1.1)</li>
<li>OpenCL devices</li>
<li>Intel SCC (experimental, new in v1.2)</li>
<li>Intel MIC / Xeon Phi (new in v1.2)</li>
</ul>

<h4>Supported Operating Systems</h4>
<ul>
<li>GNU/Linux</li>
<li>Mac OS X</li>
<li>Windows</li>
</ul>

<h4>Stability</h4>
<p>
StarPU is checked every night with
<ul>
<li>Valgrind / Helgrind</li>
<li>gcc's Address/Leak/Thread/Undefined Sanitizers</li>
<li>cppcheck</li>
<li>Coverity</li>
</ul>
</p>

<h4>Performance analysis tools</h4>
  <p>
In order to understand the performance obtained by StarPU, it is helpful to
visualize the actual behaviour of the applications running on complex
heterogeneous multicore architectures.  StarPU therefore makes it possible to
generate Pajé traces that can be visualized thanks to the <a
href="http://vite.gforge.inria.fr/"><b>ViTE</b> (Visual Trace Explorer) open
source tool.</a>
  </p>

<p>
<b>Example:</b> LU decomposition on 3 CPU cores and a GPU using a very simple
greedy scheduling strategy. The green (resp. red) sections indicate when the
corresponding processing unit is busy (resp. idle). The number of ready tasks
is displayed in the curve on top: it appears that with this scheduling policy,
the algorithm suffers from a certain lack of parallelism. <b>Measured speed: 175.32
GFlop/s</b>
<center><a href="/images/greedy-lu-16k-fx5800.png"> <img src="/images/greedy-lu-16k-fx5800.png" alt="LU decomposition (greedy)" width="75%"></a></center>
</p>

<p>
This second trace depicts the behaviour of the same application using a
scheduling strategy trying to minimize load imbalance thanks to auto-tuned
performance models and to keep data locality as high as possible. In this
example, the Pajé trace clearly shows that this scheduling strategy outperforms
the previous one in terms of processor usage. <b>Measured speed: 239.60
GFlop/s</b>
<center><a href="/images/dmda-lu-16k-fx5800.png"><img src="/images/dmda-lu-16k-fx5800.png" alt="LU decomposition (dmda)" width="75%"></a></center>
</p>

<p>
<a href="http://www.hlrs.de/temanejo">Temanejo</a> can be used to debug the task
graph, as shown below (new in v1.1).
</p>

<center>
<a href="/images/temanejo.png"><img src="/images/temanejo.png" width="50%"/></a>
</center>

</div>

<div class="section" id="software">
<h3>Software using StarPU</h3>

<p>
Some software is known for being able to use StarPU to tackle heterogeneous
architectures; here is a non-exhaustive list (feel free to ask to be added to the
list!):
</p>

<ul>
	<li><a href="https://github.com/ecrc/al4san">AL4SAN</a>, dense linear algebra library</li>
	<li><a href="https://project.inria.fr/chameleon/">Chameleon</a>, dense linear algebra library</li>
	<li><a href="http://exa2pro.eu">Exa2pro</a>, Enhancing Programmability and boosting Performance Portability for Exascale Computing Systems</li>
	<li><a href="http://github.com/ecrc/exageostat">ExaGeoStat</a>, Machine learning framework for Climate/Weather prediction applications</li>
	<li><a href="https://hal.inria.fr/hal-01507613">FLUSEPA</a>, Navier-Stokes Solver for Unsteady Problems with Bodies in Relative Motion</li>
	<li><a href="http://github.com/ecrc/hicma">HiCMA</a>, Low-rank general linear algebra library</li>
	<li>hmat, hierarchical matrix C/C++ library</li>
        <li><a href="http://kstar.gforge.inria.fr/">K'Star</a>, OpenMP 4-compatible interface on top of StarPU.</li>
	<li><a href="http://github.com/ecrc/ksvd">KSVD</a>, dense SVD on distributed-memory manycore systems</li>
	<li><a href="http://icl.cs.utk.edu/magma/">MAGMA</a>, dense linear algebra library, starting from version 1.1</li>
	<li><a href="https://gitlab.inria.fr/solverstack/maphys">MaPHyS</a>, Massively Parallel Hybrid Solver</li>
	<li>MASA-StarPU, Parallel Sequence Comparison</li>
	<li><a href="http://github.com/ecrc/moao">MOAO</a>, HPC framework for computational astronomy, servicing the European Extremely Large Telescope and the Japanese Subaru Telescope</li>
	<li><a href="http://pastix.gforge.inria.fr/">PaStiX</a>, sparse linear algebra library, starting from version 5.2.1</li>
	<li>PEPPHER, Performance Portability and Programmability for Heterogeneous Many-core Architectures</li>
	<li><a href="http://github.com/ecrc/qdwh">QDWH</a>, QR-based Dynamically Weighted Halley</li>
	<li><a href="http://buttari.perso.enseeiht.fr/qr_mumps/">qr_mumps</a>, sparse linear algebra library</li>
	<li><a href="http://scalfmm-public.gforge.inria.fr/doc/">ScalFMM</a>, N-body interaction simulation using the Fast Multipole Method. </li>
	<li><a href="https://tel.archives-ouvertes.fr/tel-01410049/">SCHNAPS</a>, Solver for Conservative Hyperbolic Non-linear systems Applied to PlasmaS. </li>
	<li><a href="https://hal.archives-ouvertes.fr/hal-01086246">SignalPU</a>, a Dataflow-Graph-specific programming model. </li>
	<li><a href="http://www.ida.liu.se/~chrke/skepu/">SkePU</a>, a skeleton programming framework.</li>
	<li><a href="https://github.com/NLAFET/StarNEig/">StarNEig</a>, a dense nonsymmetric (generalized) eigenvalue solving library.</li>
	<li><a href="http://github.com/ecrc/stars-h">STARS-H</a>, HPC low-rank matrix market</li>
	<li><a href="http://www.xcalablemp.org/">XcalableMP</a>, Directive-based language eXtension for Scalable and performance-aware Parallel Programming</li>
</ul>

<p>
You can find <a href="#PublicationsOnApplications">below</a> the list of publications related to applications using StarPU.
</p>
</div>

<div class="section" id="tryit">
<h3>Give it a try!</h3>
<p>
You can easily try out the performance, on the Cholesky factorization for
instance. Make sure to have the pkg-config and
<a href="http://www.open-mpi.org/projects/hwloc/">hwloc</a>
software installed for
proper CPU control, and BLAS kernels for your computation units installed and configured in
your environment (e.g. MKL for CPUs and CUBLAS for GPUs).
</p>

<tt><pre>
$ wget http://files.inria.fr/starpu/starpu-someversion/starpu-someversion.tar.gz
$ tar xf starpu-someversion.tar.gz
$ cd starpu-someversion
$ ./configure
$ make -j 12
$ STARPU_SCHED=dmdas ./examples/cholesky/cholesky_implicit -size $((960*40)) -nblocks 40
$ STARPU_SCHED=dmdas mpirun -np 4 -machinefile mymachines ./mpi/examples/matrix_decomposition/mpi_cholesky_distributed -size $((960*40*4)) -nblocks $((40*4))</pre></tt>

<p>Note that the dmdas scheduler uses performance models, and thus needs
calibration runs before exhibiting optimized performance (until the "model
something is not calibrated enough" messages go away).</p>

<p>To get a glimpse of what happened, you can get an execution trace by
installing
<a href="http://savannah.nongnu.org/projects/fkt">FxT</a>
and <a href="http://vite.gforge.inria.fr/">ViTE</a>, and enabling traces:
</p>

<tt><pre>
$ ./configure --with-fxt
$ make -j 12
$ STARPU_SCHED=dmdas ./examples/cholesky/cholesky_implicit -size $((960*40)) -nblocks 40
$ ./tools/starpu_fxt_tool -i /tmp/prof_file_${USER}_0
$ vite paje.trace
</pre></tt>

<p>
Starting with StarPU 1.1, it is also possible to reproduce the performance that
we show in our articles on our machines, by installing SimGrid, and then using
the simulation mode of StarPU using the performance models of our machines:
</p>
  <tt><pre>
$ ./configure --enable-simgrid
$ make -j 12
$ STARPU_PERF_MODEL_DIR=$PWD/tools/perfmodels/sampling STARPU_HOSTNAME=mirage STARPU_SCHED=dmdas ./examples/cholesky/cholesky_implicit -size $((960*40)) -nblocks 40
# size	ms	GFlops
38400	9915	1903.7</pre></tt>
<p>(MPI simulation is not supported yet)</p>
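<p>
As a sanity check on the sample output above, the GFlops column appears
consistent with the usual leading-order operation count of n<sup>3</sup>/3 for a
Cholesky factorization of an n&times;n matrix. The sketch below (plain Python,
with a hypothetical helper name) assumes that formula and ignores lower-order
terms; the exact count used by the example program may differ slightly.
</p>

```python
def cholesky_gflops(n, milliseconds):
    """GFlop/s of an n x n Cholesky factorization that took `milliseconds`.

    Uses the leading-order operation count n**3 / 3 (an assumption; the
    lower-order terms are ignored).
    """
    flops = n ** 3 / 3
    seconds = milliseconds * 1e-3
    return flops / seconds / 1e9


# Size and time taken from the sample output: 960*40 = 38400, 9915 ms.
print(round(cholesky_gflops(960 * 40, 9915), 1))
```

<p>
The result agrees with the reported 1903.7 GFlops to within the rounding of the
time column.
</p>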

<div class="section" id="publications">
<h3>Publications</h3>
<p>
All StarPU-related publications are also
listed <a href="/publications">here</a>
with the corresponding BibTeX entries.
</p>

<p>
A good overview is available in
the following <a href="http://hal.archives-ouvertes.fr/inria-00467677">Research Report</a>.
</p>

<p>
If you need to cite StarPU, please
reference <a href="/publications/Year/2011.html#AugThiNamWac11CCPE">[StarPU: A Unified Platform
    for Task Scheduling on Heterogeneous Multicore Architectures]</a>
for a general presentation. Other sub-sections below will give you
references for more specific aspects of StarPU.
</p>

531
<h4>General Presentations</h4> 
532
<a name="PublicationsGeneralPresentations"></a>
533
534
<ol>
<li>
THIBAULT Samuel's avatar
THIBAULT Samuel committed
535
536
537
538
539
540
541
<a name="thibault:tel-01959127"></a>Samuel Thibault<br/>
<strong>On Runtime Systems for Task-based Programming on Heterogeneous Platforms</strong><br/>
Habilitation à diriger des recherches, Université de Bordeaux, December 2018<br/>
[<a href="https://hal.inria.fr/tel-01959127">WWW</a>]
[<a href="https://hal.inria.fr/tel-01959127/file/hdr.pdf">PDF</a>]
</li>
<li>
542
543
544
<a name="Aug11Thesis"></a>Cédric Augonnet<br/>
<strong>Scheduling Tasks over Multicore machines enhanced with Accelerators: a Runtime System's Perspective</strong><br/>
PhD thesis, Université Bordeaux 1, 351 cours de la Libération --- 33405 TALENCE cedex, December 2011<br/>
545
[<a href="http://tel.archives-ouvertes.fr/tel-00777154">WWW</a>]
THIBAULT Samuel's avatar
THIBAULT Samuel committed
546
[<a href="http://tel.archives-ouvertes.fr/tel-00777154/document">PDF</a>]
547
548
</li>
<li>
549
550
<a name="AugThiNamWac11CCPE"></a>Cédric Augonnet, Samuel Thibault, Raymond Namyst,  and Pierre-André Wacrenier<br/>
<strong>StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures</strong><br/>
THIBAULT Samuel's avatar
THIBAULT Samuel committed
551
<em>CCPE - Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par 2009</em>, 23:187-198, February 2011<br/>
552
[<a href="http://hal.inria.fr/inria-00550877">WWW</a>]
[<a href="http://hal.inria.fr/inria-00550877/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1002/cpe.1631">10.1002/cpe.1631</a>]
</li>
<li>
<a name="AugThiNamWac10RR7240"></a>Cédric Augonnet, Samuel Thibault,  and Raymond Namyst<br/>
<strong>StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines</strong><br/>
Research Report RR-7240, INRIA, March 2010<br/>
[<a href="http://hal.inria.fr/inria-00467677">WWW</a>]
[<a href="http://hal.inria.fr/inria-00467677/document">PDF</a>]
</li>
<li>
<a name="Aug09Renpar19"></a>Cédric Augonnet<br/>
<strong>StarPU: un support exécutif unifié pour les architectures multicoeurs hétérogènes</strong><br/>
In <em>19èmes Rencontres Francophones du Parallélisme</em>, Toulouse / France, September 2009<br/>
Note: Best Paper Award<br/>
[<a href="http://hal.inria.fr/inria-00411581">WWW</a>]
[<a href="http://hal.inria.fr/inria-00411581/document">PDF</a>]
</li>
<li>
<a name="AugThiNamWac09Europar"></a>Cédric Augonnet, Samuel Thibault, Raymond Namyst,  and Pierre-André Wacrenier<br/>
<strong>StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures</strong><br/>
In <em>Euro-Par - 15th International Conference on Parallel Processing</em>, volume 5704 of <em>Lecture Notes in Computer Science</em>, Delft, The Netherlands, pages 863-874, August 2009<br/>
Springer<br/>
[<a href="http://hal.inria.fr/inria-00384363">WWW</a>]
[<a href="http://hal.inria.fr/inria-00384363/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-642-03869-3_80">10.1007/978-3-642-03869-3_80</a>]
</li>
<li>
<a name="AugNam08HPPC"></a>Cédric Augonnet and Raymond Namyst<br/>
<strong>A unified runtime system for heterogeneous multicore architectures</strong><br/>
In <em>Proceedings of the International Euro-Par Workshops 2008, HPPC'08</em>, volume 5415 of <em>Lecture Notes in Computer Science</em>, Las Palmas de Gran Canaria, Spain, pages 174-183, August 2008<br/>
Springer<br/>
<strong>ISBN:</strong> 978-3-642-00954-9<br/>
[<a href="http://hal.inria.fr/inria-00326917">WWW</a>]
[<a href="http://hal.inria.fr/inria-00326917/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-642-00955-6_22">10.1007/978-3-642-00955-6_22</a>]
</li>
<li>
<a name="Aug08Master"></a>Cédric Augonnet<br/>
<strong>Vers des supports d'exécution capables d'exploiter les machines multicoeurs hétérogènes</strong><br/>
Mémoire de DEA, Université Bordeaux 1, June 2008<br/>
[<a href="http://hal.inria.fr/inria-00289361">WWW</a>]
[<a href="http://hal.inria.fr/inria-00289361/document">PDF</a>]
</li>
</ol>
<h4>On Composability</h4> 
<a name="PublicationsOnComposability"></a>
<ol>
<li>
<a name="hugo:tel-01162975"></a>Andra-Ecaterina Hugo<br/>
<strong>Composability of parallel codes on heterogeneous architectures</strong><br/>
PhD thesis, Université de Bordeaux, December 2014<br/>
[<a href="https://tel.archives-ouvertes.fr/tel-01162975">WWW</a>]
[<a href="https://tel.archives-ouvertes.fr/tel-01162975/file/HUGO_ANDRA_2014.pdf">PDF</a>]
</li>
<li>
<a name="AH13Renpar"></a>Andra Hugo<br/>
<strong>Le problème de la composition parallèle : une approche supervisée</strong><br/>
In <em>21èmes Rencontres Francophones du Parallélisme (RenPar'21)</em>, Grenoble, France, January 2013<br/>
[<a href="http://hal.inria.fr/hal-00773610">WWW</a>]
[<a href="http://hal.inria.fr/hal-00773610/document">PDF</a>]
</li>
<li>
<a name="hugo:hal-00824514"></a>Andra Hugo, Abdou Guermouche, Raymond Namyst,  and Pierre-André Wacrenier<br/>
<strong>Composing multiple StarPU applications over heterogeneous machines: a supervised approach</strong><br/>
In <em>Third International Workshop on Accelerators and Hybrid Exascale Systems</em>, Boston, USA, May 2013<br/>
[<a href="http://hal.inria.fr/hal-00824514">WWW</a>]
[<a href="http://hal.inria.fr/hal-00824514/document">PDF</a>]
</li>
<li>
<a name="AH11Master"></a>Andra Hugo<br/>
<strong>Composabilité de codes parallèles sur architectures hétérogènes</strong><br/>
Mémoire de Master, Université Bordeaux 1, June 2011<br/>
[<a href="http://hal.inria.fr/inria-00619654/en/">WWW</a>]
[<a href="http://hal.inria.fr/inria-00619654/document">PDF</a>]
</li>
</ol>
<h4>On Parallel Tasks</h4> 
<a name="PublicationsOnParallelTasks"></a>
<ol>
<li>
<a name="cojean:tel-01816341"></a>Terry Cojean<br/>
<strong>Programmation of heterogeneous architectures using moldable tasks</strong><br/>
PhD thesis, Université de Bordeaux, March 2018<br/>
[<a href="https://tel.archives-ouvertes.fr/tel-01816341">WWW</a>]
[<a href="https://tel.archives-ouvertes.fr/tel-01816341/file/COJEAN_TERRY_2018.pdf">PDF</a>]
</li>
</ol>
<h4>On Scheduling</h4> 
<a name="PublicationsOnScheduling"></a>
<ol>
<li>
<a name="bramas:hal-02120736"></a>Bérenger Bramas<br/>
<strong>Impact study of data locality on task-based applications through the Heteroprio scheduler</strong><br/>
<em>PeerJ Computer Science</em>, May 2019<br/>
[<a href="https://hal.inria.fr/hal-02120736">WWW</a>]
[<a href="https://hal.inria.fr/hal-02120736/file/peerj-cs-190.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.7717/peerj-cs.190">10.7717/peerj-cs.190</a>]
</li>
<li>
<a name="leandronesi:hal-02275363"></a>Lucas Leandro Nesi, Samuel Thibault, Luka Stanisic,  and Lucas Mello Schnorr<br/>
<strong>Visual Performance Analysis of Memory Behavior in a Task-Based Runtime on Hybrid Platforms</strong><br/>
In <em>2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)</em>, Larnaca, Cyprus, pages 142-151, May 2019<br/>
IEEE<br/>
[<a href="https://hal.inria.fr/hal-02275363">WWW</a>]
[<a href="https://hal.inria.fr/hal-02275363/file/CCGRID_camera_ready.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/CCGRID.2019.00025">10.1109/CCGRID.2019.00025</a>]
</li>
<li>
<a name="alias:hal-02421327"></a>Christophe Alias, Samuel Thibault,  and Laure Gonnord<br/>
<strong>A Compiler Algorithm to Guide Runtime Scheduling</strong><br/>
Research Report RR-9315, INRIA Grenoble ; INRIA Bordeaux, December 2019<br/>
[<a href="https://hal.inria.fr/hal-02421327">WWW</a>]
[<a href="https://hal.inria.fr/hal-02421327/file/RR-9315.pdf">PDF</a>]
</li>
<li>
<a name="garciapinto:hal-01616632"></a>Vinicius Garcia Pinto, Lucas Mello Schnorr, Luka Stanisic, Arnaud Legrand, Samuel Thibault,  and Vincent Danjean<br/>
<strong>A Visual Performance Analysis Framework for Task-based Parallel Applications running on Hybrid Clusters</strong><br/>
<em>CCPE - Concurrency and Computation: Practice and Experience</em>, 30, April 2018<br/>
[<a href="https://hal.inria.fr/hal-01616632">WWW</a>]
[<a href="https://hal.inria.fr/hal-01616632/file/CCPE_article_submitted_2018_02_06.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1002/cpe.4472">10.1002/cpe.4472</a>]
</li>
<li>
<a name="pinto:hal-01842038"></a>Vinicius Garcia Pinto, Lucas Mello Schnorr, Arnaud Legrand, Samuel Thibault, Luka Stanisic,  and Vincent Danjean<br/>
<strong>Detecção de Anomalias de Desempenho em Aplicações de Alto Desempenho baseadas em Tarefas em Clusters Híbridos</strong><br/>
In <em>WPerformance - 17o Workshop em Desempenho de Sistemas Computacionais e de Comunicação</em>, Natal, Brazil, July 2018<br/>
[<a href="https://hal.inria.fr/hal-01842038">WWW</a>]
[<a href="https://hal.inria.fr/hal-01842038/file/181587_1.pdf">PDF</a>]
</li>
<li>
<a name="kumar:tel-01538516"></a>Suraj Kumar<br/>
<strong>Scheduling of Dense Linear Algebra Kernels on Heterogeneous Resources</strong><br/>
PhD thesis, Université de Bordeaux, April 2017<br/>
[<a href="https://tel.archives-ouvertes.fr/tel-01538516">WWW</a>]
[<a href="https://tel.archives-ouvertes.fr/tel-01538516/file/KUMAR_SURAL_2017.pdf">PDF</a>]
</li>
<li>
<a name="beaumont:hal-01386174"></a>O. Beaumont, L. Eyraud-Dubois,  and S. Kumar<br/>
<strong>Approximation Proofs of a Fast and Efficient List Scheduling Algorithm for Task-Based Runtime Systems on Multicores and GPUs</strong><br/>
In <em>2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)</em>, pages 768-777, May 2017<br/>
[<a href="https://hal.inria.fr/hal-01386174">WWW</a>]
[<a href="https://hal.inria.fr/hal-01386174/file/heteroPrioApproxProofsRR.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/IPDPS.2017.71">10.1109/IPDPS.2017.71</a>]
</li>
<li>
<a name="agullo:hal-01223573"></a>Emmanuel Agullo, Olivier Beaumont, Lionel Eyraud-Dubois,  and Suraj Kumar<br/>
<strong>Are Static Schedules so Bad ? A Case Study on Cholesky Factorization</strong><br/>
In <em>IPDPS'16</em>, Proceedings of the 30th IEEE International Parallel &amp; Distributed Processing Symposium, Chicago, IL, United States, May 2016<br/>
IEEE<br/>
[<a href="https://hal.inria.fr/hal-01223573">WWW</a>]
[<a href="https://hal.inria.fr/hal-01223573/file/heteroprioCameraReady-ieeeCompatiable.pdf">PDF</a>]
</li>
<li>
<a name="beaumont:hal-01361992"></a>Olivier Beaumont, Terry Cojean, Lionel Eyraud-Dubois, Abdou Guermouche,  and Suraj Kumar<br/>
<strong>Scheduling of Linear Algebra Kernels on Multiple Heterogeneous Resources</strong><br/>
In <em>International Conference on High Performance Computing, Data, and Analytics (HiPC)</em>, Hyderabad, India, December 2016<br/>
[<a href="https://hal.inria.fr/hal-01361992">WWW</a>]
[<a href="https://hal.inria.fr/hal-01361992v2/document">PDF</a>]
</li>
<li>
<a name="cojean:hal-01181135"></a>Terry Cojean, Abdou Guermouche, Andra Hugo, Raymond Namyst,  and Pierre-André Wacrenier<br/>
<strong>Resource aggregation for task-based Cholesky Factorization on top of heterogeneous machines</strong><br/>
In <em>HeteroPar'2016 workshop of Euro-Par</em>, Grenoble, France, August 2016<br/>
[<a href="https://hal.inria.fr/hal-01181135">WWW</a>]
[<a href="https://hal.inria.fr/hal-01181135/file/papier%20%281%29.pdf">PDF</a>]
</li>
<li>
<a name="garciapinto:hal-01353962"></a>Vinicius Garcia Pinto, Luka Stanisic, Arnaud Legrand, Lucas Mello Schnorr, Samuel Thibault,  and Vincent Danjean<br/>
<strong>Analyzing Dynamic Task-Based Applications on Hybrid Platforms: An Agile Scripting Approach</strong><br/>
In <em>VPA - 3rd Workshop on Visual Performance Analysis</em>, Salt Lake City, United States, November 2016<br/>
Note: Held in conjunction with SC16<br/>
[<a href="https://hal.inria.fr/hal-01353962">WWW</a>]
[<a href="https://hal.inria.fr/hal-01353962v2/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/VPA.2016.008">10.1109/VPA.2016.008</a>]
</li>
<li>
<a name="JaBlHU2016a"></a>Johan Janzén, David Black-Schaffer,  and Andra Hugo<br/>
<strong>Partitioning GPUs for Improved Scalability</strong><br/>
In <em>IEEE 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)</em>, October 2016<br/>
[<a href="http://ieeexplore.ieee.org/abstract/document/7789322/">WWW</a>]
[doi:<a href="http://dx.doi.org/10.1109/SBAC-PAD.2016.14">10.1109/SBAC-PAD.2016.14</a>]
</li>
<li>
<a name="cojean:hal-01409965"></a>Terry Cojean, Abdou Guermouche, Andra Hugo, Raymond Namyst,  and Pierre-André Wacrenier<br/>
<strong>Resource aggregation for task-based Cholesky Factorization on top of modern architectures</strong><br/>
Note: This paper is submitted for review to the Parallel Computing special issue for HCW and HeteroPar 16 workshops, November 2016<br/>
[<a href="https://hal.inria.fr/hal-01409965">WWW</a>]
[<a href="https://hal.inria.fr/hal-01409965/file/submission.pdf">PDF</a>]
</li>
<li>
<a name="agullo:hal-01120507"></a>Emmanuel Agullo, Olivier Beaumont, Lionel Eyraud-Dubois, Julien Herrmann, Suraj Kumar, Loris Marchal,  and Samuel Thibault<br/>
<strong>Bridging the Gap between Performance and Bounds of Cholesky Factorization on Heterogeneous Platforms</strong><br/>
In <em>HCW'2015 - Heterogeneity in Computing Workshop of IPDPS</em>, Hyderabad, India, May 2015<br/>
[<a href="https://hal.inria.fr/hal-01120507">WWW</a>]
[<a href="https://hal.inria.fr/hal-01120507/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/IPDPSW.2015.35">10.1109/IPDPSW.2015.35</a>]
</li>
<li>
<a name="sergent:hal-00978364"></a>Marc Sergent and Simon Archipoff<br/>
<strong>Modulariser les ordonnanceurs de tâches : une approche structurelle</strong><br/>
In <em>Compas'2014</em>, Neuchâtel, Suisse, April 2014<br/>
[<a href="http://hal.inria.fr/hal-00978364">WWW</a>]
[<a href="http://hal.inria.fr/hal-00978364/PDF/ordonnanceurs_modulaires.pdf">PDF</a>]
</li>
<li>
<a name="AugCleThiNam10ICPADS"></a>Cédric Augonnet, Jérôme Clet-Ortega, Samuel Thibault,  and Raymond Namyst<br/>
<strong>Data-Aware Task Scheduling on Multi-Accelerator based Platforms</strong><br/>
In <em>The 16th International Conference on Parallel and Distributed Systems (ICPADS)</em>, Shanghai, China, December 2010<br/>
[<a href="http://hal.inria.fr/inria-00523937">WWW</a>]
[<a href="http://hal.inria.fr/inria-00523937/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/ICPADS.2010.129">10.1109/ICPADS.2010.129</a>]
</li>
</ol>
<h4>On The C Extensions</h4> 
<a name="PublicationsOnTheCExtensions"></a>
<ol>
<li>
<a name="LC13Report"></a>Ludovic Courtès<br/>
<strong>C Language Extensions for Hybrid CPU/GPU Programming with StarPU</strong><br/>
Research Report RR-8278, INRIA, April 2013<br/>
[<a href="http://hal.inria.fr/hal-00807033">WWW</a>]
[<a href="http://hal.inria.fr/hal-00807033/PDF/RR-8278.pdf">PDF</a>]
</li>
</ol>
<h4>On OpenMP Support on top of StarPU</h4> 
<a name="PublicationsOnOpenMPSupportontopofStarPU"></a>
<ol>
<li>
<a name="agullo:hal-01517153"></a>Emmanuel Agullo, Olivier Aumage, Berenger Bramas, Olivier Coulaud,  and Samuel Pitoiset<br/>
<strong>Bridging the gap between OpenMP and task-based runtime systems for the fast multipole method</strong><br/>
<em>IEEE Transactions on Parallel and Distributed Systems</em>, April 2017<br/>
[<a href="https://hal.inria.fr/hal-01517153">WWW</a>]
[<a href="https://hal.inria.fr/hal-01517153/file/tpds_kstar_scalfmm_print.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/TPDS.2017.2697857">10.1109/TPDS.2017.2697857</a>]
</li>
<li>
<a name="agullo:hal-01372022"></a>Emmanuel Agullo, Olivier Aumage, Berenger Bramas, Olivier Coulaud,  and Samuel Pitoiset<br/>
<strong>Bridging the gap between OpenMP 4.0 and native runtime systems for the fast multipole method</strong><br/>
Research Report RR-8953, Inria, March 2016<br/>
[<a href="https://hal.inria.fr/hal-01372022">WWW</a>]
[<a href="https://hal.inria.fr/hal-01372022/file/RR-8953.pdf">PDF</a>]
</li>
<li>
<a name="virouleau:hal-01081974"></a>Philippe Virouleau, Pierrick Brunet, François Broquedis, Nathalie Furmento, Samuel Thibault, Olivier Aumage,  and Thierry Gautier<br/>
<strong>Evaluation of OpenMP Dependent Tasks with the KASTORS Benchmark Suite</strong><br/>
In <em>IWOMP2014 - 10th International Workshop on OpenMP</em>, Salvador, Brazil, pages 16-29, September 2014<br/>
Springer<br/>
[<a href="https://hal.inria.fr/hal-01081974">WWW</a>]
[<a href="https://hal.inria.fr/hal-01081974/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-319-11454-5_2">10.1007/978-3-319-11454-5_2</a>]
</li>
</ol>
<h4>On MPI Support</h4> 
<a name="PublicationsOnMPISupport"></a>
<ol>
<li>
<a name="denis:hal-02872765"></a>Alexandre Denis, Emmanuel Jeannot, Philippe Swartvagher,  and Samuel Thibault<br/>
<strong>Using Dynamic Broadcasts to improve Task-Based Runtime Performances</strong><br/>
In <em>Euro-Par - 26th International European Conference on Parallel and Distributed Computing</em>, Euro-Par 2020, Warsaw, Poland, August 2020<br/>
Rzadca and Malawski, Springer<br/>
[<a href="https://hal.inria.fr/hal-02872765">WWW</a>]
[<a href="https://hal.inria.fr/hal-02872765/file/dynamic_broadcasts.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-030-57675-2_28">10.1007/978-3-030-57675-2_28</a>]
</li>
<li>
<a name="lion:hal-02970529"></a>Romain Lion and Samuel Thibault<br/>
<strong>From tasks graphs to asynchronous distributed checkpointing with local restart</strong><br/>
In <em>2020 IEEE/ACM 10th Workshop on Fault Tolerance for HPC at eXtreme Scale (FTXS)</em>, Atlanta, United States, November 2020<br/>
[<a href="https://hal.archives-ouvertes.fr/hal-02970529">WWW</a>]
[<a href="https://hal.archives-ouvertes.fr/hal-02970529/file/2020001221.pdf">PDF</a>]
</li>
<li>
<a name="lion:hal-02296118"></a>Romain Lion<br/>
<strong>Tolérance aux pannes dans l'exécution distribuée de graphes de tâches</strong><br/>
In <em>Conférence d'informatique en Parallélisme, Architecture et Système</em>, Anglet, France, June 2019<br/>
[<a href="https://hal.inria.fr/hal-02296118">WWW</a>]
[<a href="https://hal.inria.fr/hal-02296118/file/Compas_Romain_LION_submitted_final.pdf">PDF</a>]
</li>
<li>
<a name="agullo:hal-01618526"></a>Emmanuel Agullo, Olivier Aumage, Mathieu Faverge, Nathalie Furmento, Florent Pruvost, Marc Sergent,  and Samuel Thibault<br/>
<strong>Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model</strong><br/>
<em>TPDS - IEEE Transactions on Parallel and Distributed Systems</em>, December 2017<br/>
[<a href="https://hal.inria.fr/hal-01618526">WWW</a>]
[<a href="https://hal.inria.fr/hal-01618526/file/tpds14.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/TPDS.2017.2766064">10.1109/TPDS.2017.2766064</a>]
</li>
<li>
<a name="sergent:tel-01483666"></a>Marc Sergent<br/>
<strong>Scalability of a task-based runtime system for dense linear algebra applications</strong><br/>
PhD thesis, Université de Bordeaux, December 2016<br/>
[<a href="https://tel.archives-ouvertes.fr/tel-01483666">WWW</a>]
[<a href="https://tel.archives-ouvertes.fr/tel-01483666/file/SERGENT_MARC_2016.pdf">PDF</a>]
</li>
<li>
<a name="agullo:hal-01283949"></a>Emmanuel Agullo, Olivier Aumage, Mathieu Faverge, Nathalie Furmento, Florent Pruvost, Marc Sergent,  and Samuel Thibault<br/>
<strong>Harnessing clusters of hybrid nodes with a sequential task-based programming model</strong><br/>
In <em>8th International Workshop on Parallel Matrix Algorithms and Applications</em>, July 2014<br/>
[<a href="https://hal.inria.fr/hal-01283949">WWW</a>]
[<a href="https://hal.inria.fr/hal-01283949/file/pmaa14.pdf">PDF</a>]
</li>
<li>
<a name="augonnet:hal-00992208"></a>Cédric Augonnet, Olivier Aumage, Nathalie Furmento, Samuel Thibault,  and Raymond Namyst<br/>
<strong>StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators</strong><br/>
Research Report RR-8538, INRIA, May 2014<br/>
[<a href="http://hal.inria.fr/hal-00992208">WWW</a>]
[<a href="http://hal.inria.fr/hal-00992208/PDF/RR-8538.pdf">PDF</a>]
</li>
<li>
<a name="AugAumFurNamThi2012EuroMPI"></a>Cédric Augonnet, Olivier Aumage, Nathalie Furmento, Raymond Namyst,  and Samuel Thibault<br/>
<strong>StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators</strong><br/>
In Siegfried Benkner Jesper Larsson Träff and Jack Dongarra, editors, <em>EuroMPI 2012</em>, volume 7490 of <em>LNCS</em>, September 2012<br/>
Springer<br/>
Note: Poster Session<br/>
[<a href="http://hal.inria.fr/hal-00725477">WWW</a>]
[<a href="http://hal.inria.fr/hal-00725477/document">PDF</a>]
</li>
</ol>
<h4>On Memory Control</h4> 
<a name="PublicationsOnMemoryControl"></a>
<ol>
<li>
<a name="chevalier:hal-01718280"></a>Arthur Chevalier<br/>
<strong>Critical resources management and scheduling under StarPU</strong><br/>
Master's thesis, Université de Bordeaux, September 2017<br/>
[<a href="https://hal.inria.fr/hal-01718280">WWW</a>]
[<a href="https://hal.inria.fr/hal-01718280/file/Memoire.pdf">PDF</a>]
</li>
<li>
<a name="sergent:hal-01284004"></a>Marc Sergent, David Goudin, Samuel Thibault,  and Olivier Aumage<br/>
<strong>Controlling the Memory Subscription of Distributed Applications with a Task-Based Runtime System</strong><br/>
In <em>HIPS - 21st International Workshop on High-Level Parallel Programming Models and Supportive Environments</em>, Chicago, United States, May 2016<br/>
[<a href="https://hal.inria.fr/hal-01284004">WWW</a>]
[<a href="https://hal.inria.fr/hal-01284004/file/PID4127657.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/IPDPSW.2016.105">10.1109/IPDPSW.2016.105</a>]
</li>
</ol>
<h4>On Performance Model Tuning</h4> 
<a name="PublicationsOnPerformanceModelTuning"></a>
<ol>
<li>
<a name="agullo:hal-01474556"></a>Emmanuel Agullo, Bérenger Bramas, Olivier Coulaud, Luka Stanisic,  and Samuel Thibault<br/>
<strong>Modeling Irregular Kernels of Task-based codes: Illustration with the Fast Multipole Method</strong><br/>
Research Report RR-9036, INRIA Bordeaux, February 2017<br/>
[<a href="https://hal.inria.fr/hal-01474556">WWW</a>]
[<a href="https://hal.inria.fr/hal-01474556/file/rapport.pdf">PDF</a>]
</li>
<li>
<a name="AugThiNam09HPPC"></a>Cédric Augonnet, Samuel Thibault,  and Raymond Namyst<br/>
<strong>Automatic Calibration of Performance Models on Heterogeneous Multicore Architectures</strong><br/>
In <em>HPPC - Proceedings of the International Euro-Par Workshops, Highly Parallel Processing on a Chip</em>, volume 6043 of <em>Lecture Notes in Computer Science</em>, Delft, The Netherlands, pages 56-65, August 2009<br/>
Springer<br/>
[<a href="http://hal.inria.fr/inria-00421333">WWW</a>]
[<a href="http://hal.inria.fr/inria-00421333/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-642-14122-5_9">10.1007/978-3-642-14122-5_9</a>]
</li>
</ol>
<h4>On The Simulation Support through SimGrid</h4> 
<a name="PublicationsOnTheSimulationSupportthroughSimGrid"></a>
<ol>
<li>
<a name="daoudi:hal-02933803"></a>Idriss Daoudi, Philippe Virouleau, Thierry Gautier, Samuel Thibault,  and Olivier Aumage<br/>
<strong>sOMP: Simulating OpenMP Task-Based Applications with NUMA Effects</strong><br/>
In <em>IWOMP 2020 - 16th International Workshop on OpenMP</em>, volume 12295 of <em>LNCS</em>, Austin / Virtual, United States, September 2020<br/>
Springer<br/>
[<a href="https://hal.inria.fr/hal-02933803">WWW</a>]
[<a href="https://hal.inria.fr/hal-02933803/file/p05_daoudi.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-030-58144-2_13">10.1007/978-3-030-58144-2_13</a>]
</li>
<li>
<a name="thibault:hal-02943753"></a>Samuel Thibault, Luka Stanisic,  and Arnaud Legrand<br/>
<strong>Faithful Performance Prediction of a Dynamic Task-based Runtime System, an Opportunity for Task Graph Scheduling</strong><br/>
In <em>SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP 2020)</em>, Seattle, United States, February 2020<br/>
[<a href="https://hal.inria.fr/hal-02943753">WWW</a>]
[<a href="https://hal.inria.fr/hal-02943753/file/20-02-15-siampp-seattle.pdf">PDF</a>]
</li>
<li>
<a name="stanisic:hal-01147997"></a>Luka Stanisic, Samuel Thibault, Arnaud Legrand, Brice Videau,  and Jean-François Méhaut<br/>
<strong>Faithful Performance Prediction of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures</strong><br/>
<em>CCPE - Concurrency and Computation: Practice and Experience</em>, pp 16, May 2015<br/>
[<a href="https://hal.inria.fr/hal-01147997">WWW</a>]
[<a href="https://hal.inria.fr/hal-01147997/file/CCPE14_article.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1002/cpe.3555">10.1002/cpe.3555</a>]
</li>
<li>
<a name="stanisic:hal-01180272"></a>Luka Stanisic, Emmanuel Agullo, Alfredo Buttari, Abdou Guermouche, Arnaud Legrand, Florent Lopez,  and Brice Videau<br/>
<strong>Fast and Accurate Simulation of Multithreaded Sparse Linear Algebra Solvers</strong><br/>
In <em>The 21st IEEE International Conference on Parallel and Distributed Systems</em>, Melbourne, Australia, December 2015<br/>
[<a href="https://hal.inria.fr/hal-01180272">WWW</a>]
[<a href="https://hal.inria.fr/hal-01180272/file/QRMSTARSG_article.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1109/ICPADS.2015.67">10.1109/ICPADS.2015.67</a>]
</li>
<li>
<a name="stanisic:hal-01011633"></a>Luka Stanisic, Samuel Thibault, Arnaud Legrand, Brice Videau,  and Jean-François Méhaut<br/>
<strong>Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures</strong><br/>
In <em>Euro-Par - 20th International Conference on Parallel Processing</em>, Porto, Portugal, August 2014<br/>
Springer-Verlag<br/>
[<a href="http://hal.inria.fr/hal-01011633">WWW</a>]
[<a href="http://hal.inria.fr/hal-01011633/PDF/StarPUSG_article.pdf">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-319-09873-9_5">10.1007/978-3-319-09873-9_5</a>]
</li>
</ol>
<h4>On The Cell Support</h4> 
<a name="PublicationsOnTheCellSupport"></a>
<ol>
<li>
<a name="AugThiNamNij09Samos"></a>Cédric Augonnet, Samuel Thibault, Raymond Namyst,  and Maik Nijhuis<br/>
<strong>Exploiting the Cell/BE architecture with the StarPU unified runtime system</strong><br/>
In <em>SAMOS Workshop - International Workshop on Systems, Architectures, Modeling, and Simulation</em>, volume 5657 of <em>Lecture Notes in Computer Science</em>, Samos, Greece, July 2009<br/>
[<a href="http://hal.inria.fr/inria-00378705">WWW</a>]
[<a href="http://hal.inria.fr/inria-00378705/document">PDF</a>]
[doi:<a href="http://dx.doi.org/10.1007/978-3-642-03138-0_36">10.1007/978-3-642-03138-0_36</a>]
</li>
</ol>