<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
            "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<title>StarPU</title>
<link rel="stylesheet" type="text/css" href="style.css" />
</head>

<body>

<h1><a href="./">StarPU</a></h1>
<h1 class="sub">A Unified Runtime System for Heterogeneous Multicore Architectures</h1>

<hr class="main"/>
<a href="/Runtime/">RUNTIME homepage</a> |
<a href="/Publis/">Publications</a> |
<a href="/Runtime/software.html">Software</a> |
<a href="/Runtime/index.html.en#contacts">Contacts</a> |
<a href="/Intranet/">Intranet</a>

<hr class="main"/>

<div class="section" id="internships">
<h3>2011 M2 Internships</h3>
<p>
Extending StarPU:
</p>
<ul>
<li>Team Runtime: <a href="http://dept-info.labri.fr/~magoni/m2r/support-executif-satanas-aumage.pdf">Scalable runtime support for distributed hybrid architectures</a></li>
<li>Team Runtime: <a href="http://dept-info.labri.fr/~magoni/m2r/taches-divisibles-starpu-runtime-namyst.pdf">Programming heterogeneous architectures with divisible tasks</a></li>
</ul>
<p>
Using StarPU:
</p>
<ul>
<li>Team Runtime: <a href="http://dept-info.labri.fr/~magoni/m2r/generation-automatique-satanas-barthou.pdf">Automatic generation of parallel tasks on hybrid architectures: application to quantum chromodynamics</a></li>
<li>Team HiePacs: <a href="http://dept-info.labri.fr/~magoni/m2r/solveurs-directs-gpu-HiePACS.pdf">Direct solver for multicore architectures accelerated with GPUs</a></li>
</ul>
</div>
<div class="section" id="news">
<h3>News</h3>
<p>
March
2012 <b>&raquo;&nbsp;</b><a href="http://gforge.inria.fr/frs/?group_id=1570"><b>The
      v1.0.0 release of StarPU is now available!</b></a> This release notably provides a gcc plugin
      that extends the C interface with pragmas, making it easy to
      define codelets and submit tasks, and a new multi-format
      interface which allows different binary formats to be used on CPUs &amp;
      GPUs.
<br/>
See the announcements
on <a href="http://lwn.net/Articles/489337/">lwn</a>, <a href="http://article.gmane.org/gmane.comp.gcc.devel/125551">
  the gcc mailing list</a> and <a href="http://www.phoronix.com/scan.php?page=news_item&amp;px=MTA4MDI">phoronix</a>.
</p>
<p>
May 2011 <b>&raquo;&nbsp;</b> <a href="http://gforge.inria.fr/frs/?group_id=1570"><b>StarPU 0.9.1 is now available!</b></a>
This release provides a reduction mode, an external API for schedulers, theoretical bounds, power-based optimization, parallel
tasks, an MPI DSM, profiling interfaces, initial support for CUDA 4 (GPU-GPU
transfers), improved documentation and of course various fixes.
</p>
<p>
September 2010 <b>&raquo;&nbsp;</b> Discover how we ported the MAGMA and PLASMA libraries on top of StarPU, in collaboration with ICL/UTK, in <a href="http://www.netlib.org/lapack/lawnspdf/lawn230.pdf"><b>this LAPACK Working Note</b></a>.
</p>
<p>
August 2010 <b>&raquo;&nbsp;</b> <a href="http://gforge.inria.fr/frs/?group_id=1570"><b>StarPU 0.4 is now available!</b></a>
This release provides support for task-based dependencies, implicit data-based dependencies
(RAW/WAR/WAW), profiling feedback, an MPI layer, OpenCL and Windows support, as
well as an API naming revamp.
</p>
<p>
July 2010 <b>&raquo;&nbsp;</b> StarPU was presented during a tutorial entitled "Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and the DPLASMA and StarPU Scheduler" at SAAHPC in Knoxville, TN (USA) <a href="http://runtime.bordeaux.inria.fr/StarPU/saahpc.pdf">(slides)</a>.
</p>
<p>
May 2010 <b>&raquo;&nbsp;</b> Want to get an overview of StarPU? Check out our <a href="http://hal.archives-ouvertes.fr/inria-00467677">latest research report</a>!
</p>
<!--<p>
October 2009 <b>&raquo;&nbsp;</b> <a href="http://gforge.inria.fr/frs/?group_id=1570"><b>StarPU 0.2.901 (0.3-rc1) is now available !</b></a>
This release adds support for asynchronous GPUs and heterogeneous multi-GPU platforms as well as many other improvements.
</p> -->
<p>
June 2009 <b>&raquo;&nbsp;</b>NVIDIA granted the StarPU team a professor partnership and donated several high-end CUDA-capable cards.
</p>
</div>

<div class="section" id="download">
<h3>Download</h3>
<!-- PLEASE LEAVE "Powered By Gforge" on your site -->
<a href="http://gforge.org/"><img src="/Images/pow-gforge.png" align="right" alt="Powered By GForge Collaborative Development Environment" border="0"></a>
<p>
<b>&raquo;&nbsp;</b>All releases and the development tree of StarPU are freely available on <a href="http://gforge.inria.fr/projects/starpu/">INRIA's gforge</a> under the LGPL license. Some releases are available under the BSD license.
</p>
<p>
<b>&raquo;&nbsp;</b>Get the <a href="http://gforge.inria.fr/frs/?group_id=1570">latest release</a>.
</p>
<p>
<b>&raquo;&nbsp;</b>Get the <a href="http://starpu.gforge.inria.fr/testing/">latest nightly snapshot</a>.
</p>
<p>
<b>&raquo;&nbsp;</b>The current development version is also accessible via SVN:
</p>
<pre>
svn checkout svn://scm.gforge.inria.fr/svn/starpu/trunk StarPU
</pre>
</div>

<div class="section" id="StarPU">
<h3><b>StarPU</b> Overview</h3>

  <p>
Traditional processors have reached architectural limits that heterogeneous
multicore designs and hardware specialization (e.g. coprocessors, accelerators,
...) intend to address. However, exploiting such machines introduces numerous
challenging issues at all levels, ranging from programming models and compilers
to the design of scalable hardware solutions. The design of efficient runtime
systems for these architectures is a critical issue. StarPU makes it
much easier for high-performance libraries or compiler environments to exploit
heterogeneous multicore machines possibly equipped with GPGPUs or Cell
processors: rather than handling low-level issues, programmers can concentrate
on algorithmic concerns.
  </p>

  <p>
Portability is obtained by means of a unified abstraction of the machine.
StarPU offers a unified offloadable task abstraction named "codelet". Rather
than rewriting the entire code, programmers can encapsulate existing functions
within codelets. In case a codelet can run on heterogeneous architectures, it
is possible to specify one function for each architecture (e.g. one function
for CUDA and one function for CPUs). StarPU takes care of scheduling and executing
those codelets as efficiently as possible over the entire machine. In order to
relieve programmers from the burden of explicit data transfers, a high-level
data management library enforces memory coherency over the machine: before a
codelet starts (e.g. on an accelerator), all of its data are transparently made
available on the compute resource.
  </p>
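
  <p>
As a rough illustration (a minimal sketch, not taken from the StarPU
distribution), a codelet scaling a vector could be declared with one CPU and
one CUDA implementation as follows; the kernel names are placeholders for
user code.
  </p>

<pre>
#include &lt;starpu.h&gt;

/* CPU implementation of the kernel (illustrative). */
static void scal_cpu_func(void *buffers[], void *cl_arg)
{
    float factor = *(float *) cl_arg;
    unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
    float *v = (float *) STARPU_VECTOR_GET_PTR(buffers[0]);
    for (unsigned i = 0; i &lt; n; i++)
        v[i] *= factor;
}

/* CUDA implementation, assumed to be compiled separately in a .cu file. */
extern void scal_cuda_func(void *buffers[], void *cl_arg);

/* The codelet gathers the implementations of the same kernel; StarPU picks
 * the one matching the processing unit the task is scheduled on. */
static struct starpu_codelet scal_cl =
{
    .where      = STARPU_CPU | STARPU_CUDA,
    .cpu_funcs  = { scal_cpu_func, NULL },
    .cuda_funcs = { scal_cuda_func, NULL },
    .nbuffers   = 1,
    .modes      = { STARPU_RW },
};
</pre>

  <p>
A task referencing this codelet and a registered data handle can then be
submitted with starpu_task_create() and starpu_task_submit(); StarPU ensures
the data are available wherever the task eventually runs.
  </p>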

  <p>
Given its expressive interface and portable scheduling policies, StarPU obtains
portable performance by efficiently (and easily) using all computing resources
at the same time. StarPU also takes advantage of the heterogeneous nature of a
machine, for instance by using scheduling strategies based on auto-tuned
performance models.
  </p>
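
  <p>
For instance (a minimal sketch, assuming the performance-model-based "dmda"
policy shipped with StarPU; the STARPU_SCHED environment variable can be used
instead), an application may select a scheduling strategy at initialization
time:
  </p>

<pre>
#include &lt;starpu.h&gt;

int main(void)
{
    struct starpu_conf conf;
    starpu_conf_init(&amp;conf);

    /* Request the "dmda" policy, which relies on auto-tuned performance
     * models to balance load and limit data transfers. */
    conf.sched_policy_name = "dmda";

    if (starpu_init(&amp;conf) != 0)
        return 1;

    /* ... register data and submit tasks here ... */

    starpu_shutdown();
    return 0;
}
</pre>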

</div>

<div class="section" id="Supported platforms">
<h3>Supported Architectures</h3>
<ul>
<li>SMP/Multicore Processors (x86, PPC, ...) </li>
<li>NVIDIA GPUs (e.g. heterogeneous multi-GPU)</li>
<li>OpenCL devices</li>
<li>Cell Processors (experimental)</li>
</ul>
</div>

<div class="section" id="Supported OS">
<h3>Supported Operating Systems</h3>
<ul>
<li>Linux</li>
<li>Mac OS/X</li>
<li>Windows</li>
</ul>
</div>

<div class="section" id="Performance analysis tools">
<h3>Performance analysis tools</h3>
  <p>
In order to understand the performance obtained by StarPU, it is helpful to
visualize the actual behaviour of the applications running on complex
heterogeneous multicore architectures. StarPU therefore makes it possible to
generate Pajé traces that can be visualized with the <a
href="http://vite.gforge.inria.fr/"><b>ViTE</b> (Visual Trace Explorer)</a>
open source tool.
  </p>

  <p> 
<b>Example:</b> LU decomposition on 3 CPU cores and a GPU using a very simple
greedy scheduling strategy. The green (resp. red) sections indicate when the
corresponding processing unit is busy (resp. idle). The number of ready tasks
is displayed in the curve on top: it appears that with this scheduling policy,
the algorithm suffers from a lack of parallelism. <b>Measured speed: 175.32
GFlop/s</b>
</p> 
<p>
<img src="./images/greedy-lu-16k-fx5800.png" alt="LU decomposition (greedy)"  width="900"> </p>
<p>
This second trace depicts the behaviour of the same application using a
scheduling strategy that minimizes load imbalance thanks to auto-tuned
performance models while keeping data locality as high as possible. In this
example, the Pajé trace clearly shows that this scheduling strategy outperforms
the previous one in terms of processor usage. <b>Measured speed: 239.60
GFlop/s</b>
</p>
  <p> <img src="./images/dmda-lu-16k-fx5800.png" alt="LU decomposition (dmda)" width="900"> </p>

</div>

<div class="section" id="publis">
<h3>Documentation and Related Publications</h3>
<ul>
<li> <a href="http://runtime.bordeaux.inria.fr/Publis/Keyword/STARPU.html">Papers related to StarPU</a>, notably:
<ul>
<li>a good overview in the <a href="http://hal.archives-ouvertes.fr/inria-00467677">Research Report</a>.</li>
<li><a href="http://hal.inria.fr/inria-00547847">a hybridization
methodology</a>, and its application to <a href="http://hal.inria.fr/inria-00547614">QR</a>, <a
href="http://hal.inria.fr/inria-00547616">Cholesky</a>, and soon LU.</li>
<li><a href="http://hal.inria.fr/inria-00523937">Memory management,
memory-aware scheduling and MPI management</a>.</li>
<li><a href="http://hal.inria.fr/inria-00421333">Calibration of performance
models</a>.</li>
<li><a href="http://hal.inria.fr/inria-00378705">Cell/BE port</a>.</li>
</ul>
</li>
<li> StarPU documentation is available in <a href="./starpu.pdf">PDF</a> and in <a href="./starpu.html">HTML</a>. Please note that these documents are up to date with the SVN repository, not with the latest release of StarPU.</li>
</ul>
</div>

<div class="section" id="contact">
<h3>Contact</h3>
For any questions regarding StarPU, please contact the StarPU developers' mailing list:
<pre>
<a href="mailto:starpu-devel@lists.gforge.inria.fr?subject=StarPU">starpu-devel@lists.gforge.inria.fr</a>
</pre>
</div>

<hr class="main" />

<p class="updated">
  Last updated on 2011/01/19.
</p>

</body>
</html>