<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<HEAD>
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<TITLE>StarPU</TITLE>
<link rel="stylesheet" type="text/css" href="style.css" />
<link rel="Shortcut icon" href="http://www.inria.fr/extension/site_inria/design/site_inria/images/favicon.ico" type="image/x-icon" />
</HEAD>

<body>

<div class="title">
<h1><a href="./">StarPU</a></h1>
<h2>A Unified Runtime System for Heterogeneous Multicore Architectures</h2>
</div>
<div class="menu">
<a href="http://runtime.bordeaux.inria.fr/">RUNTIME TEAM</a> |
&nbsp; &nbsp; &nbsp;
|
<a href="#overview">Overview</a> |
<a href="#news">News</a> |
<a href="#contact">Contact</a> |
<a href="#features">Features</a> |
<a href="#software">Software</a> |
<a href="#publications">Publications</a> |
<a href="internships/">Jobs/Interns</a> |
<a href="files/">Download</a> |
<a href="tutorials">Tutorials</a> |
<a href="https://wiki.bordeaux.inria.fr/runtime/doku.php?id=starpu">Intranet</a>
</div>
<div class="section" id="overview">
<h3>Overview</h3>
  <p>
<span class="important">StarPU is a task programming library for hybrid architectures</span>
<ol>
<li><b>The application provides algorithms and constraints</b>
    <ul>
    <li>CPU/GPU implementations of tasks</li>
    <li>A graph of tasks, using either StarPU's high-level <b>GCC plugin</b> pragmas or StarPU's rich <b>C API</b></li>
    </ul>
<br>
</li>
<li><b>StarPU handles run-time concerns</b>
    <ul>
    <li>Task dependencies</li>
    <li>Optimized heterogeneous scheduling</li>
    <li>Optimized data transfers and replication between main memory and discrete memories</li>
    <li>Optimized cluster communications</li>
    </ul>
</li>
</ol>
</p>
<p>
<span class="important">Rather than handling low-level issues, <b>programmers can concentrate on algorithmic concerns!</b></span>
</p>

<p>
<span class="note">The StarPU documentation is available in <a href="./doc/starpu.pdf">PDF</a> and in <a href="./doc/index.html">HTML</a>.</span> Please note that these documents are up-to-date with the latest release of StarPU.
</p>
</div>

<div class="section emphasize newslist" id="news">
<h3>News</h3>
<p>
May 2015 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      second release candidate of the v1.2.0 release of StarPU is now
      available!</b></a>
      This release notably brings out-of-core support, MIC Xeon
      Phi support, OpenMP runtime support, and a new internal
      communication system for MPI.
</p>
<p>
April 2015 <b>&raquo;&nbsp;</b>A <a href="https://events.prace-ri.eu/event/339/">tutorial</a> on runtime systems including
StarPU will be given at INRIA Bordeaux in June 2015.
</p>
<p>
March 2015 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      first release candidate of the v1.2.0 release of StarPU is now
      available!</b></a>
      This release notably brings out-of-core support, MIC Xeon
      Phi support, OpenMP runtime support, and a new internal
      communication system for MPI.
</p>
<p>
March 2015 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      v1.1.4 release of StarPU is now available!</b></a> This release notably brings the concept of
      scheduling contexts, which make it possible to separate computation
      resources.
</p>
<p>
September 2014 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      v1.1.3 release of StarPU is now available!</b></a> This release notably brings the concept of
      scheduling contexts, which make it possible to separate computation
      resources.
</p>
<p>
June 2014 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      v1.1.2 release of StarPU is now available!</b></a> This release notably brings the concept of
      scheduling contexts, which make it possible to separate computation
      resources.
</p>
<p>
May 2014 <b>&raquo;&nbsp;</b>Open <a href="https://www.inria.fr/en/institute/recruitment/offers/young-graduate-engineers-research-and-development/%28view%29/details.html?id=PNGFK026203F3VBQB6G68LOE1&LOV5=4510&ContractType=4545&LG=EN&Resultsperpage=20&nPostingID=8751&nPostingTargetID=14612&option=52&sort=DESC&nDepartmentID=10"><b>Engineer Position</b></a>.
</p>
</div>

<div class="section emphasizebot" style="text-align: right; font-style: italic;">
Get the latest StarPU news by subscribing to the <a href="http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/starpu-announce">starpu-announce mailing list</a>.
See also the full <a href="news/">news</a>.
</div>

<div class="section" id="contact">
<h3>Contact</h3>
<p>For any questions regarding StarPU, please contact the StarPU developers mailing list.</p>
<pre>
<a href="mailto:starpu-devel@lists.gforge.inria.fr?subject=StarPU">starpu-devel@lists.gforge.inria.fr</a>
</pre>
</div>

<div class="section" id="features">
<h3>Features</h3>
<h4>Portability</h4>
  <p>
Portability is obtained by means of a unified abstraction of the machine.
StarPU offers a unified offloadable task abstraction named <em>codelet</em>. Rather
than rewriting the entire code, programmers can encapsulate existing functions
within codelets. If a codelet can run on heterogeneous architectures, <b>it
is possible to specify one function for each architecture</b> (e.g. one function
for CUDA and one function for CPUs). StarPU takes care of scheduling and
executing those codelets as efficiently as possible over the entire machine, including
multiple GPUs.
One can even specify <b>several functions for each architecture</b> as well as
<b>parallel implementations</b> (e.g. in OpenMP), and StarPU will
automatically determine which version is best for each input size.
  </p>
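<p>
To make the codelet idea concrete, here is a minimal, self-contained sketch in plain C.
It is not StarPU's actual API: <code>struct codelet</code>, <code>enum arch</code>,
<code>codelet_run</code> and <code>scal_cpu</code> are hypothetical names chosen for
illustration. It only shows the principle of one task abstraction holding one optional
function per architecture.
</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified codelet: one optional function per architecture.
 * These names are illustrative only and are not StarPU's API. */
enum arch { ARCH_CPU = 0, ARCH_CUDA = 1, ARCH_COUNT = 2 };

typedef void (*kernel_fn)(float *data, size_t n, float factor);

struct codelet {
    kernel_fn impl[ARCH_COUNT]; /* NULL when no implementation exists */
};

/* Plain CPU implementation: scale a vector in place. */
void scal_cpu(float *data, size_t n, float factor)
{
    for (size_t i = 0; i < n; i++)
        data[i] *= factor;
}

/* Return 1 if the codelet has an implementation for architecture `a`. */
int codelet_can_run(const struct codelet *cl, enum arch a)
{
    assert((int)a < ARCH_COUNT);
    return cl->impl[a] != NULL;
}

/* Run the codelet on architecture `a`.
 * Returns 0 on success, -1 if no implementation is available for `a`. */
int codelet_run(const struct codelet *cl, enum arch a,
                float *data, size_t n, float factor)
{
    if (!codelet_can_run(cl, a))
        return -1;
    cl->impl[a](data, n, factor);
    return 0;
}
```

<p>
In this sketch, an existing function such as <code>scal_cpu</code> is wrapped unchanged;
adding a CUDA version would only mean filling the second slot of <code>impl</code>.
</p>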

<h4>Data transfers</h4>
  <p>
To relieve programmers of the burden of explicit data transfers, a high-level
data management library enforces memory coherency over the machine: before a
codelet starts (e.g. on an accelerator), all its <b>data are automatically made
available on the compute resource</b>. Data are also kept on e.g. GPUs as long as
they are needed for further tasks. When a device runs out of memory, StarPU uses
an LRU strategy to <b>evict unused data</b>. StarPU also takes care of <b>automatically
prefetching</b> data, which makes it possible to <b>overlap data transfers with computations</b>
(including GPU-GPU direct transfers) to get the most out of the architecture.
  </p>
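<p>
The LRU eviction mentioned above can be sketched as follows. This is a toy model under
stated assumptions, not StarPU's implementation: the device holds a fixed number of
equally sized blocks (<code>SLOTS</code>), and <code>struct device_mem</code>,
<code>mem_init</code> and <code>mem_access</code> are hypothetical names.
</p>

```c
#include <assert.h>

#define SLOTS 3        /* hypothetical device capacity, in data blocks */
#define NO_BLOCK (-1)

struct device_mem {
    int block[SLOTS];          /* block id held in each slot, or NO_BLOCK */
    unsigned long used[SLOTS]; /* logical time of the last access */
    unsigned long clock;       /* logical clock, incremented on each access */
};

void mem_init(struct device_mem *m)
{
    for (int i = 0; i < SLOTS; i++) {
        m->block[i] = NO_BLOCK;
        m->used[i] = 0;
    }
    m->clock = 0;
}

/* Make block `id` resident on the device.
 * Returns the id of the evicted block, or NO_BLOCK if nothing was evicted. */
int mem_access(struct device_mem *m, int id)
{
    assert(id >= 0);
    m->clock++;

    /* Hit: the block is already resident, just refresh its timestamp. */
    for (int i = 0; i < SLOTS; i++) {
        if (m->block[i] == id) {
            m->used[i] = m->clock;
            return NO_BLOCK;
        }
    }

    /* Miss: use a free slot if any, else the least recently used one. */
    int victim = 0;
    for (int i = 0; i < SLOTS; i++) {
        if (m->block[i] == NO_BLOCK) {
            victim = i;
            break;
        }
        if (m->used[i] < m->used[victim])
            victim = i;
    }

    int evicted = m->block[victim];
    m->block[victim] = id;
    m->used[victim] = m->clock;
    return evicted;
}
```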

<h4>Dependencies</h4>
  <p>
Dependencies between tasks can be expressed in several ways, to give the
programmer maximum flexibility:
  <ul>
    <li><b>explicitly</b> between pairs of tasks,</li>
    <li>explicitly through <b>tags</b> which act as rendez-vous points between
    tasks (thus including tasks which have not been created yet),</li>
    <li><b>implicitly</b> from RAW, WAW, and WAR data dependencies.</li>
  </ul>
  </p>
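<p>
As an illustration of how implicit dependencies can be derived from data accesses, the
sketch below tracks, for each piece of data, the last writer and the readers submitted since
that write. It is a simplified model written for this page, not StarPU code: each task
accesses a single piece of data, and <code>struct tracker</code>,
<code>tracker_init</code> and <code>tracker_submit</code> are hypothetical names.
</p>

```c
#include <assert.h>

#define MAX_HANDLES 8   /* hypothetical limits, for illustration only */
#define MAX_READERS 16
#define NO_TASK (-1)

enum access_mode { MODE_R, MODE_W };

struct handle_state {
    int last_writer;           /* last task that wrote the data, or NO_TASK */
    int readers[MAX_READERS];  /* tasks that read it since the last write */
    int nreaders;
};

struct tracker {
    struct handle_state h[MAX_HANDLES];
};

void tracker_init(struct tracker *t)
{
    for (int i = 0; i < MAX_HANDLES; i++) {
        t->h[i].last_writer = NO_TASK;
        t->h[i].nreaders = 0;
    }
}

/* Record that `task` accesses `handle` in `mode`, in submission order.
 * The tasks it must wait for are stored in `deps` (capacity of at least
 * MAX_READERS entries); the number of dependencies is returned. */
int tracker_submit(struct tracker *t, int task, int handle,
                   enum access_mode mode, int *deps)
{
    assert(handle >= 0 && handle < MAX_HANDLES);
    struct handle_state *s = &t->h[handle];
    int n = 0;

    if (mode == MODE_R) {
        /* RAW: a read waits for the last write. */
        if (s->last_writer != NO_TASK)
            deps[n++] = s->last_writer;
        assert(s->nreaders < MAX_READERS);
        s->readers[s->nreaders++] = task;
    } else {
        if (s->nreaders > 0) {
            /* WAR: a write waits for every read since the last write.
             * Those readers already wait for the previous writer. */
            for (int i = 0; i < s->nreaders; i++)
                deps[n++] = s->readers[i];
        } else if (s->last_writer != NO_TASK) {
            /* WAW: a write directly following a write waits for it. */
            deps[n++] = s->last_writer;
        }
        s->last_writer = task;
        s->nreaders = 0;
    }
    return n;
}
```

<p>
Two reads submitted after the same write get the same single dependency and may therefore
run concurrently, which is the point of distinguishing access modes.
</p>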
  <p>
StarPU also supports an OpenMP-like <a href="doc/html/AdvancedExamples.html#DataReduction">reduction</a> access mode.
  </p>

<h4>Heterogeneous Scheduling</h4>
  <p>
StarPU obtains
portable performance by efficiently (and easily) using all computing resources
at the same time. StarPU also takes advantage of the <b>heterogeneous</b> nature of a
machine, for instance by using scheduling strategies based on auto-tuned
performance models. These determine the relative performance achieved
by the different processing units for the various kinds of task, and thus
make it possible to <b>automatically let processing units execute the tasks they are best suited for</b>.
  </p>
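<p>
One simple way to use such performance models is a greedy rule that gives each task to the
processing unit expected to finish it first. The sketch below assumes this rule purely for
illustration; it is not claimed to be the policy StarPU implements. The worker count, the
model table and the names <code>struct sched</code> and <code>sched_pick</code> are all
hypothetical.
</p>

```c
#include <assert.h>

#define NWORKERS 3  /* hypothetical machine: workers 0 and 1 are CPU cores, 2 is a GPU */
#define NKINDS 2    /* hypothetical number of task kinds */

struct sched {
    double ready_at[NWORKERS];          /* time at which each worker becomes free */
    double predicted[NKINDS][NWORKERS]; /* modelled duration per (kind, worker) */
};

/* Assign a task of the given kind to the worker with the earliest expected
 * completion time, update that worker's availability, and return its index.
 * Ties are resolved in favour of the lowest worker index. */
int sched_pick(struct sched *s, int kind)
{
    assert(kind >= 0 && kind < NKINDS);
    int best = 0;
    double best_end = s->ready_at[0] + s->predicted[kind][0];

    for (int w = 1; w < NWORKERS; w++) {
        double end = s->ready_at[w] + s->predicted[kind][w];
        if (end < best_end) {
            best = w;
            best_end = end;
        }
    }
    s->ready_at[best] = best_end;
    return best;
}
```

<p>
With a model in which kind 0 is much faster on the GPU and kind 1 is faster on CPU cores,
this rule sends each kind to the unit that suits it, while still falling back to a slower
unit when the preferred one is too busy.
</p>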

<h4>Clusters</h4>
  <p>
To deal with clusters, StarPU can nicely integrate with <a href="doc/html/MPISupport.html">MPI</a> through
explicit network communications, which will then be <b>automatically combined and
overlapped</b> with the intra-node data transfers and computation. The application
can also just provide the whole task graph, a data distribution over MPI nodes, and StarPU
will automatically determine which MPI node should execute which task, and
<b>generate all required MPI communications</b> accordingly.
  </p>
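<p>
The following sketch shows how a data distribution alone can determine task placement and
the required communications. It assumes a 2D block-cyclic distribution and an
"owner computes" rule (a task runs on the node owning the tile it writes); both are
assumptions made for this example, and <code>tile_owner</code>, <code>gemm_node</code> and
<code>needs_transfer</code> are hypothetical helper names.
</p>

```c
#include <assert.h>

/* Rank of the MPI node owning tile (m, n) under a 2D block-cyclic
 * distribution over a p x q grid of nodes (assumed distribution). */
int tile_owner(int m, int n, int p, int q)
{
    assert(m >= 0 && n >= 0 && p > 0 && q > 0);
    return (m % p) * q + (n % q);
}

/* Owner-computes rule (assumed): a gemm task that reads tiles (m, k) and
 * (n, k) and writes tile (m, n) runs on the owner of tile (m, n). */
int gemm_node(int m, int n, int k, int p, int q)
{
    (void)k; /* read-only tiles do not influence placement in this sketch */
    return tile_owner(m, n, p, q);
}

/* Return 1 if a task running on node `rank` must receive tile (m, n)
 * from another node, 0 if the tile is already local. */
int needs_transfer(int rank, int m, int n, int p, int q)
{
    return tile_owner(m, n, p, q) != rank;
}
```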

<h4>Extensions to the C Language</h4>
<p>
  StarPU comes with a GCC plug-in
  that <a href="doc/html/cExtensions.html">extends the C programming
  language</a> with pragmas and attributes that make it easy
  to <b>annotate a sequential C program to turn it into a parallel
  StarPU program</b>.
</p>

<h4>Simulation support</h4>
<p>
  StarPU can very accurately simulate an application execution
  and measure the resulting performance using the
  <a href="http://simgrid.gforge.inria.fr">SimGrid simulator</a>.  This makes it
  possible to quickly experiment with various scheduling heuristics, various application
  algorithms, and even various platforms (available GPUs and CPUs, available
  bandwidth)!
</p>

<h4>All in all</h4>
  <p>
All this means that, with the help
of <a href="doc/html/cExtensions.html">StarPU's extensions to the C
language</a>, the following sequential source code of a tiled version of
the classical Cholesky factorization algorithm using BLAS is also valid
StarPU code, possibly running on all the CPUs and GPUs. Given a data
distribution over MPI nodes, it is even a distributed version!
  </p>

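  <tt><pre>
for (k = 0; k &lt; tiles; k++) {
  potrf(A[k][k]);                      /* factorize the diagonal tile */
  for (m = k+1; m &lt; tiles; m++)
    trsm(A[k][k], A[m][k]);            /* solve for the panel below it */
  for (m = k+1; m &lt; tiles; m++)
    syrk(A[m][k], A[m][m]);            /* update the trailing diagonal tiles */
  for (m = k+1; m &lt; tiles; m++)
    for (n = k+1; n &lt; m; n++)
      gemm(A[m][k], A[n][k], A[m][n]); /* update the remaining tiles */
}</pre></tt>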
<h4>Supported Architectures</h4>
<ul>
<li>SMP/Multicore Processors (x86, PPC, ...) </li>
<li>NVIDIA GPUs (e.g. heterogeneous multi-GPU)</li>
<li>OpenCL devices</li>
<li>Cell Processors (experimental)</li>
</ul>
and soon
<ul>
<li>Intel SCC</li>
<li>Intel MIC / Xeon Phi</li>
</ul>
<h4>Supported Operating Systems</h4>
<ul>
<li>GNU/Linux</li>
<li>Mac OS X</li>
<li>Windows</li>
</ul>

<h4>Performance analysis tools</h4>
  <p>
In order to understand the performance obtained by StarPU, it is helpful to
visualize the actual behaviour of the applications running on complex
heterogeneous multicore architectures.  StarPU therefore makes it possible to
generate Pajé traces that can be visualized with the <a
href="http://vite.gforge.inria.fr/"><b>ViTE</b> (Visual Trace Explorer) open
source tool</a>.
  </p>

<p>
<b>Example:</b> LU decomposition on 3 CPU cores and a GPU using a very simple
greedy scheduling strategy. The green (resp. red) sections indicate when the
corresponding processing unit is busy (resp. idle). The number of ready tasks
is displayed in the curve on top: it appears that with this scheduling policy,
the algorithm suffers from a certain lack of parallelism. <b>Measured speed: 175.32
GFlop/s</b>
<center><a href="./images/greedy-lu-16k-fx5800.png"> <img src="./images/greedy-lu-16k-fx5800.png" alt="LU decomposition (greedy)" width="75%"></a></center>
</p>

<p>
This second trace depicts the behaviour of the same application using a
scheduling strategy trying to minimize load imbalance thanks to auto-tuned
performance models and to keep data locality as high as possible. In this
example, the Pajé trace clearly shows that this scheduling strategy outperforms
the previous one in terms of processor usage. <b>Measured speed: 239.60
GFlop/s</b>
<center><a href="./images/dmda-lu-16k-fx5800.png"><img src="./images/dmda-lu-16k-fx5800.png" alt="LU decomposition (dmda)" width="75%"></a></center>
</p>

<p>
<a href="http://www.hlrs.de/temanejo">Temanejo</a> can be used to debug the task
graph, as shown below.
</p>

<center>
<a href="images/temanejo.png"><img src="images/temanejo.png" width="50%"/></a>
</center>

</div>

<div class="section" id="software">
<h3>Software using StarPU</h3>

<p>
Several software packages are known to use StarPU to exploit heterogeneous
architectures. Here is a non-exhaustive list:
</p>

<ul>
	<li><a href="http://icl.cs.utk.edu/magma/">MAGMA</a>, dense linear algebra library, starting from version 1.1</li>
	<li><a href="https://project.inria.fr/chameleon/">Chameleon</a>, dense linear algebra library</li>
	<li><a href="http://www.ida.liu.se/~chrke/skepu/">SkePU</a>, a skeleton programming framework.</li>
	<li><a href="http://pastix.gforge.inria.fr/">PaStiX</a>, sparse linear algebra library, starting from version 5.2.1</li>
</ul>

<p>
You can find below the list of publications related to applications
using StarPU.
</p>

</div>

<div class="section" id="publications">
<h3>Publications</h3>
<p>
All StarPU-related publications are also
listed <a href="http://runtime.bordeaux.inria.fr/Publis/Keyword/STARPU.html">here</a>
with the corresponding BibTeX entries.
</p>

<p>A good overview is available in
the following <a href="http://hal.archives-ouvertes.fr/inria-00467677">Research Report</a>.
</p>

<h4>General presentations</h4>
<ol>
<li>
C. Augonnet.
<br/>
<b>Scheduling Tasks over Multicore machines enhanced with Accelerators: a Runtime System's Perspective</b>.
PhD thesis, Université Bordeaux 1, December 2011.
<br/>
Available <a href="http://tel.archives-ouvertes.fr/tel-00777154">here</a>.
</li>
<li>
C. Augonnet, S. Thibault, R. Namyst, and P.-A. Wacrenier.
<br/>
<b>StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures.</b>
<em>Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par 2009</em>, 23:187-198, February 2011.
<br/>
Available <a href="http://hal.inria.fr/inria-00550877">here</a>.
</li>
<li>
C. Augonnet.
<br/>
<b>StarPU: un support exécutif unifié pour les architectures multicoeurs hétérogènes</b>.
In <em>19èmes Rencontres Francophones du Parallélisme</em>, September 2009. Note: Best Paper Award.
<br/>
Available <a href="http://hal.inria.fr/inria-00411581">here</a>. (French version)
</li>
<li>
C. Augonnet, S. Thibault, R. Namyst, and P.-A. Wacrenier.
<br/>
<b>StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures.</b>
In <em>Proceedings of the 15th International Euro-Par Conference</em>, volume 5704 of LNCS, August 2009.
<br/>
Available <a href="http://hal.inria.fr/inria-00384363">here</a>. (short version)
</li>
<li>
C. Augonnet and R. Namyst.
<br/>
<b>A unified runtime system for heterogeneous multicore architectures.</b>
In <em>Proceedings of the International Euro-Par Workshops 2008, HPPC'08</em>, volume 5415 of LNCS, August 2008.
<br/>
Available <a href="http://hal.inria.fr/inria-00326917">here</a>. (early version)
</li>
</ol>

<h4>On Composability</h4>
<ol>
<li>
A. Hugo, A. Guermouche, R. Namyst, and P.-A. Wacrenier.
<br/>
<b>Composing multiple StarPU applications over heterogeneous machines:
  a supervised approach.</b> In <em>Third International Workshop on
  Accelerators and Hybrid Exascale Systems</em>, Boston, USA, May
2013.
<br/>
Available <a href="http://hal.inria.fr/hal-00824514">here</a>.
</li>

<li>
A. Hugo.
<br/>
<b>Le problème de la composition parallèle : une approche
  supervisée.</b> In <em>21èmes Rencontres Francophones du
  Parallélisme (RenPar'21)</em>, Grenoble, France, January 2013.
<br/>
Available <a href="http://hal.inria.fr/hal-00773610">here</a>.
</li>
</ol>

<h4>On Scheduling</h4>
<ol>
<li>
M. Sergent and S. Archipoff.
<br/>
<b>Modulariser les ordonnanceurs de tâches : une approche structurelle.</b> In
<em>Conférence d’informatique en Parallélisme, Architecture et Système
	(Compas'2014)</em>, Neuchâtel, Switzerland, April 2014.  
<br/>
Available <a href="http://hal.inria.fr/hal-00978364">here</a>.
</li>
</ol>

<h4>On the C Extensions</h4>
<ol>
<li>
L. Courtès.
<br/>
<b>C Language Extensions for Hybrid CPU/GPU Programming with
  StarPU.</b>
<br/>
Available <a href="http://hal.inria.fr/hal-00807033/en">here</a>.
</li>
</ol>

<h4>On MPI support</h4>
<ol>
<li>
C. Augonnet, O. Aumage, N. Furmento, R. Namyst, and S. Thibault.
<br/>
<b>StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators.</b>
INRIA Research Report RR-8538, May 2014.
<br/>
Available <a href="http://hal.inria.fr/hal-00992208">here</a>.
</li>
<li>
C. Augonnet, O. Aumage, N. Furmento, R. Namyst, and S. Thibault.
<br/>
<b>StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators.</b>
In <em>EuroMPI 2012</em>, volume 7490 of LNCS, September 2012. Note: Poster Session.
<br/>
Available <a href="http://hal.inria.fr/hal-00725477">here</a>.
</li>
</ol>

<h4>On data transfer management</h4>
<ol>
<li>
C. Augonnet, J. Clet-Ortega, S. Thibault, and R. Namyst
<br/>
<b>Data-Aware Task Scheduling on Multi-Accelerator based Platforms.</b>
In <em>The 16th International Conference on Parallel and Distributed Systems (ICPADS)</em>, December 2010.
<br/>
Available <a href="http://hal.inria.fr/inria-00523937">here</a>.
</li>
</ol>

<h4>On performance model tuning</h4>
<ol>
<li>
C. Augonnet, S. Thibault, and R. Namyst.
<br/>
<b>Automatic Calibration of Performance Models on Heterogeneous Multicore Architectures.</b>
In <em>Proceedings of the International Euro-Par Workshops 2009, HPPC'09</em>, volume 6043 of LNCS, August 2009.
<br/>
Available <a href="http://hal.inria.fr/inria-00421333">here</a>.
</li>
</ol>

<h4>On the simulation support through SimGrid</h4>
<ol>
<li>
L. Stanisic, S. Thibault, A. Legrand, B. Videau, and J.-F. Méhaut.<br/>
<b>Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures</b>
In <em>Euro-par 2014 - 20th International Conference on Parallel Processing</em>, Porto, Portugal, August 2014.<br/>
Available <a href="http://hal.inria.fr/hal-01011633">here</a>.
</li>
</ol>

<h4>On the Cell support</h4>
<ol>
<li>
C. Augonnet, S. Thibault, R. Namyst, and M. Nijhuis.
<br/>
<b>Exploiting the Cell/BE architecture with the StarPU unified runtime system.</b>
In <em>SAMOS Workshop - International Workshop on Systems, Architectures, Modeling, and Simulation</em>, volume 5657 of LNCS, July 2009.
<br/>
Available <a href="http://hal.inria.fr/inria-00378705">here</a>.
</li>
</ol>

<h4>On Applications</h4>
<ol>

<li>
S. Henry, A. Denis, D. Barthou, M.-C. Counilh, R. Namyst<br/>
<b>Toward OpenCL Automatic Multi-Device Support</b>
<em>Euro-Par 2014</em>, Porto, Portugal, August 2014.<br/>
Available <a href="http://hal.inria.fr/hal-01005765">here</a>.
</li>

<li>
X. Lacoste, M. Faverge, P. Ramet, S. Thibault, and G. Bosilca<br/>
<b>Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes</b>
<em>HCW'2014 workshop of IPDPS</em>, May 2014.<br/>
Available <a href="http://hal.inria.fr/hal-00987094">here</a>.
</li>

<li>
T. Odajima, T. Boku, M. Sato, T. Hanawa, Y. Kodama, R. Namyst, S. Thibault, and O. Aumage<br/>
<b>Adaptive Task Size Control on High Level Programming for GPU/CPU Work Sharing</b>
In <em>The 2013 International Symposium on Advances of Distributed and Parallel Computing (ADPC 2013)</em>, Vietri sul Mare, Italy.
December 2013.<br/>
Available <a href="http://hal.inria.fr/hal-00920915">here</a>.
</li>

<li>
S. Henry<br/>
<b>Modèles de programmation et supports exécutifs pour architectures hétérogènes</b>.
PhD thesis, Université Bordeaux 1, November 2013.<br/>
Available <a href="http://tel.archives-ouvertes.fr/tel-00948309">here</a>.
</li>

<li>
S. Ohshima, S. Katagiri, K. Nakajima, S. Thibault, and R. Namyst<br/>
<b>Implementation of FEM Application on GPU with StarPU</b>
In <em>SIAM CSE13 - SIAM Conference on Computational Science and Engineering 2013</em>, Boston, USA
February 2013.<br/>
Available <a href="http://hal.inria.fr/hal-00926144">here</a>.
</li>

<li>
C. Rossignon.<br/>
<b>Optimisation du produit matrice-vecteur creux sur architecture GPU
  pour un simulateur de réservoir.</b> In <em>21èmes Rencontres
  Francophones du Parallélisme (RenPar'21)</em>, Grenoble, France,
January 2013.<br/>
Available <a href="http://hal.inria.fr/hal-00773571">here</a>.
</li>

<li>
S. Henry, A. Denis, and D. Barthou.<br/>
<b>Programmation unifiée multi-accélérateur OpenCL</b>.
<em>Technique et Science Informatiques</em>, (8-9-10):1233-1249, 2012.
<br/>
Available <a href="http://hal.inria.fr/hal-00772742">here</a>
</li>

<li>
S.A. Mahmoudi, P. Manneback, C. Augonnet, and S. Thibault.<br/>
<b>Traitements d'Images sur Architectures Parallèles et Hétérogènes.</b>
<em>Technique et Science Informatiques</em>, 2012.
<br/>
Available <a href="http://hal.inria.fr/hal-00714858/">here</a>.
</li>

<li>
S. Benkner, S. Pllana, J.L. Träff, P. Tsigas, U. Dolinsky, C. Augonnet, B. Bachmayer, C. Kessler, D. Moloney, and V. Osipov.
<br/>
<b>PEPPHER: Efficient and Productive Usage of Hybrid Computing Systems.</b> <em>IEEE Micro</em>, 31(5):28-41, September 2011.
<br/>
Available <a href="http://hal.inria.fr/hal-00648480">here</a>.
</li>

<li>
U. Dastgeer, C. Kessler, and S. Thibault.<br/>
<b>Flexible runtime support for efficient skeleton programming on hybrid systems.</b>
In <em>Proceedings of the International Conference on Parallel Computing (ParCo), Applications, Tools and Techniques on the Road to Exascale Computing</em>, volume 22 of Advances of Parallel Computing, August 2011.
<br/>
Available <a href="http://hal.inria.fr/inria-00606200/">here</a>.
</li>

<li>
S. Henry.
<br/>
<b>Programmation multi-accélérateurs unifiée en OpenCL.</b>
In <em>20èmes Rencontres Francophones du Parallélisme (RenPar'20)</em>, May 2011.
<br/>
Available <a href="http://hal.archives-ouvertes.fr/hal-00643257">here</a>.
</li>

<li>
S.A. Mahmoudi, P. Manneback, C. Augonnet, and S. Thibault.
<br/>
<b>Détection optimale des coins et contours dans des bases d'images volumineuses sur architectures multicoeurs hétérogènes.</b>
In <em>20èmes Rencontres Francophones du Parallélisme</em>, May 2011.
<br/>
Available <a href="http://hal.inria.fr/inria-00606195">here</a>.
</li>

<li>
E. Agullo, C. Augonnet, J. Dongarra, H. Ltaief, R. Namyst, S. Thibault, and S. Tomov.
<br/>
<b>A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs.</b>
In <em>GPU Computing Gems, volume 2.</em>, September 2010.
<br/>
Available <a href="http://hal.inria.fr/inria-00547847">here</a>.
</li>
<li>
E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, H. Ltaief, S. Thibault, and S. Tomov.
<br/>
<b>QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators</b>.
In <em>25th IEEE International Parallel &amp; Distributed Processing Symposium (IEEE IPDPS 2011)</em>, May 2011.
<br/>
Available <a href="http://hal.inria.fr/inria-00547614">here</a>.
</li>
<li>
E. Agullo, C. Augonnet, J. Dongarra, H. Ltaief, R. Namyst, J. Roman, S. Thibault, and S. Tomov.
<br/>
<b>Dynamically scheduled Cholesky factorization on multicore architectures with GPU accelerators.</b>
In <em>Symposium on Application Accelerators in High Performance Computing (SAAHPC)</em>, July 2010.
<br/>
Available <a href="http://hal.inria.fr/inria-00547616">here</a>.
</li>
<li>
E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, J. Langou, H. Ltaief, and S. Tomov.
<br/>
<b>LU factorization for accelerator-based systems.</b>
In <em>9th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA 11)</em>, June 2011.
<br/>
Available <a href="http://hal.inria.fr/hal-00654193">here</a>
</li>
</ol>

</div>

<div class="section bot">
<p class="updated">
  Last updated on 2012/10/03.
</p>
</div>

</body>
</html>