Commit 5fdadbf1 authored by THIBAULT Samuel

Add a few bullets, specify which version supports what

git-svn-id: svn+ssh://scm.gforge.inria.fr/svn/starpu/website@15646 176f6dd6-97d6-42f4-bd05-d3db9ad07c7a
parent 39a4860e
@@ -133,9 +133,10 @@ is possible to specify one function for each architecture</b> (e.g. one function
for CUDA and one function for CPUs). StarPU takes care of scheduling and
executing those codelets as efficiently as possible over the entire machine, including
multiple GPUs.
One can even specify <b>several functions for each architecture</b> (new in v1.0)
as well as <b>parallel implementations</b> (e.g. in OpenMP), and StarPU will
automatically determine which version is best for each input size (new in v0.9).
</p>
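<p>
As an illustration, here is a minimal sketch of such a codelet (function bodies
and the vector handle are assumed to exist elsewhere; names are hypothetical):
</p>
<pre>
#include <starpu.h>

/* Two CPU implementations (plain and SSE) plus a CUDA kernel launcher;
 * at runtime StarPU picks whichever performs best. */
extern void scal_cpu(void *buffers[], void *cl_arg);
extern void scal_cpu_sse(void *buffers[], void *cl_arg);
extern void scal_cuda(void *buffers[], void *cl_arg);

/* History-based performance model, fed automatically by past executions. */
static struct starpu_perfmodel scal_model =
{
    .type = STARPU_HISTORY_BASED,
    .symbol = "scal",
};

static struct starpu_codelet scal_cl =
{
    .cpu_funcs  = { scal_cpu_sse, scal_cpu, NULL },
    .cuda_funcs = { scal_cuda, NULL },
    .nbuffers   = 1,
    .modes      = { STARPU_RW },
    .model      = &scal_model,
};

void scale(starpu_data_handle_t vector_handle)
{
    /* Submitting a task only names the codelet and the data it accesses. */
    starpu_task_insert(&scal_cl, STARPU_RW, vector_handle, 0);
}
</pre>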
<h4>Data transfers</h4>
@@ -162,7 +163,10 @@ programmer with the best flexibility:
</ul>
</p>
<p>
StarPU also supports an OpenMP-like <a href="doc/html/DataManagement.html#DataReduction">reduction</a> access mode (new in v0.9).
</p>
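<p>
A sketch of what using the reduction mode looks like (the init, reduction, and
accumulation codelets are assumed to be defined elsewhere):
</p>
<pre>
#include <starpu.h>

extern struct starpu_codelet init_cl, redux_cl, accum_cl;

void sum_contributions(starpu_data_handle_t sum,
                       starpu_data_handle_t inputs[], unsigned n)
{
    /* init_cl sets a fresh copy to the neutral element, redux_cl folds two
     * copies together; StarPU then gives each worker a private copy. */
    starpu_data_set_reduction_methods(sum, &redux_cl, &init_cl);

    for (unsigned i = 0; i < n; i++)
        starpu_task_insert(&accum_cl,
                           STARPU_REDUX, sum,   /* reduction access mode */
                           STARPU_R, inputs[i],
                           0);
}
</pre>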
<p>
It also supports a <a href="doc/html/DataManagement.html#DataCommute">commute</a> access mode to allow data access commutativity (new in v1.2).
</p>
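<p>
For instance (hypothetical codelet and handles), OR-ing STARPU_COMMUTE into the
access mode lets such tasks run in any order instead of in submission order:
</p>
<pre>
#include <starpu.h>

extern struct starpu_codelet update_cl;

void post_update(starpu_data_handle_t shared, starpu_data_handle_t contrib)
{
    /* The updates of 'shared' still exclude each other, but no longer have
     * to execute in the order in which they were submitted. */
    starpu_task_insert(&update_cl,
                       STARPU_RW | STARPU_COMMUTE, shared,
                       STARPU_R, contrib,
                       0);
}
</pre>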
<h4>Heterogeneous Scheduling</h4>
@@ -183,7 +187,23 @@ explicit network communications, which will then be <b>automatically combined and
overlapped</b> with the intra-node data transfers and computation. The application
can also just provide the whole task graph, a data distribution over MPI nodes, and StarPU
will automatically determine which MPI node should execute which task, and
<b>generate all required MPI communications</b> accordingly (new in v0.9).
</p>
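<p>
A sketch of this task-graph-driven style (hypothetical codelet, handles, owner
rank, and tag; every rank submits the same graph):
</p>
<pre>
#include <starpu_mpi.h>

extern struct starpu_codelet update_cl;

void update_block(starpu_data_handle_t block, starpu_data_handle_t halo,
                  int owner, int tag)
{
    /* Declare which rank owns the data and which MPI tag identifies it. */
    starpu_mpi_data_register(block, tag, owner);

    /* The task executes on the rank owning the data it writes; transfers
     * of 'halo' are generated and overlapped automatically. */
    starpu_mpi_task_insert(MPI_COMM_WORLD, &update_cl,
                           STARPU_RW, block,
                           STARPU_R, halo,
                           0);
}
</pre>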
<h4>Out of core</h4>
<p>
When memory is not big enough for the working set, one may have to resort to
using disks. StarPU makes this seamless thanks to its <a href="doc/html/OutOfCore.html">out of core support</a> (new in v1.2).
StarPU will automatically evict data from main memory in advance, and
prefetch required data back before tasks need it.
</p>
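<p>
Enabling it is essentially one call at initialisation time (path and size here
are made-up values; the main-memory limit can be set with, e.g., the
STARPU_LIMIT_CPU_MEM environment variable):
</p>
<pre>
#include <starpu.h>

void setup_out_of_core(void)
{
    /* Register a disk node backed by a directory; StarPU then evicts to and
     * prefetches from it as the working set outgrows main memory. */
    starpu_disk_register(&starpu_disk_unistd_ops,
                         (void *) "/tmp/starpu-ooc",
                         (starpu_ssize_t) (10ULL*1024*1024*1024) /* 10 GiB */);
}
</pre>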
<h4>Extensions to the C Language</h4>
@@ -192,14 +212,20 @@ will automatically determine which MPI node should execute which task, and
that <a href="doc/html/cExtensions.html">extends the C programming
language</a> with pragmas and attributes that make it easy
to <b>annotate a sequential C program to turn it into a parallel
StarPU program</b> (new in v1.0).
</p>
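<p>
A sketch in the style of the documented annotations (hypothetical task; the
plugin also provides pragmas to register data and wait for tasks):
</p>
<pre>
/* Declare a task: plain calls to vector_scal become asynchronous StarPU tasks. */
static void vector_scal(unsigned size, float vector[size], float factor)
  __attribute__ ((task));

/* Mark this function as the task's CPU implementation. */
static void vector_scal_cpu(unsigned size, float vector[size], float factor)
  __attribute__ ((task_implementation ("cpu", vector_scal)));

static void vector_scal_cpu(unsigned size, float vector[size], float factor)
{
  for (unsigned i = 0; i < size; i++)
    vector[i] *= factor;
}
</pre>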
<h4>OpenCL-compatible interface</h4>
<p>
StarPU provides an <a href="doc/html/SOCLOpenclExtensions.html">OpenCL-compatible interface, SOCL</a>
which makes it possible to simply run OpenCL applications on top of StarPU (new in v1.0).
</p>
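<p>
Since SOCL appears as a regular OpenCL platform, using it amounts to selecting
it at platform-discovery time (the platform name check below is an assumption):
</p>
<pre>
#include <CL/cl.h>
#include <string.h>

/* Return the SOCL platform, or NULL if it is not installed; everything
 * else in the application stays plain OpenCL. */
cl_platform_id find_socl(void)
{
    cl_platform_id platforms[16];
    cl_uint n = 0;
    clGetPlatformIDs(16, platforms, &n);
    for (cl_uint i = 0; i < n; i++) {
        char name[64] = "";
        clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME,
                          sizeof name, name, NULL);
        if (strstr(name, "SOCL"))
            return platforms[i];
    }
    return NULL;
}
</pre>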
<h4>Simulation support</h4>
<p>
StarPU can very accurately simulate an application's execution
and measure the resulting performance thanks to the
<a href="http://simgrid.gforge.inria.fr">SimGrid simulator</a> (new in v1.1). This makes it possible
to quickly experiment with various scheduling heuristics, various application
algorithms, and even various platforms (available GPUs and CPUs, available
bandwidth)!
@@ -230,11 +256,11 @@ for (k = 0; k < tiles; k++) {
<h4>Supported Architectures</h4>
<ul>
<li>SMP/Multicore Processors (x86, PPC, ...) </li>
<li>NVIDIA GPUs (e.g. heterogeneous multi-GPU), with pipelined and concurrent kernel execution support (new in v1.2) and GPU-GPU direct transfers (new in v1.1)</li>
<li>OpenCL devices</li>
<li>Cell Processors (experimental)</li>
</ul>
and soon (in v1.2)
<ul>
<li>Intel SCC</li>
<li>Intel MIC / Xeon Phi</li>
@@ -279,7 +305,7 @@ GFlop/s</b>
<p>
<a href="http://www.hlrs.de/temanejo">Temanejo</a> can be used to debug the task
graph, as shown below (new in v1.1).
</p>
<center>