comment on scale-in and scale-out

......@@ -181,7 +181,10 @@ permits to <b>automatically let processing units execute the tasks they are the
work-stealing, ... The overhead per task is typically on the order of
a microsecond. Tasks should thus be a few orders of magnitude
bigger, such as 100 microseconds or 1 millisecond, to make the overhead
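To make the granularity argument concrete, here is a minimal sketch of the arithmetic. It assumes a simplified model in which a fixed per-task overhead is simply added to the task's own duration. The 1 microsecond default comes from the figure quoted above; the function name `overhead_fraction` is hypothetical and is not part of any library.

```python
def overhead_fraction(task_us: float, overhead_us: float = 1.0) -> float:
    """Fraction of the total time spent on per-task overhead.

    Simplified model (an assumption): total time = task time + fixed overhead.
    """
    return overhead_us / (task_us + overhead_us)


# Compare a tiny task with the two task sizes suggested in the text.
for task_us in (1.0, 100.0, 1000.0):
    share = overhead_fraction(task_us)
    print(f"{task_us:7.0f} us task: {share:.2%} of the time is overhead")
```

Under this model, a 100 microsecond task spends roughly 1% of its time in overhead and a 1 millisecond task roughly 0.1%, whereas a 1 microsecond task would spend half of its time in overhead.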
......@@ -191,7 +194,11 @@ explicit network communications, which will then be <b>automatically combined an
overlapped</b> with the intra-node data transfers and computation. The application
<b>generate all required MPI communications</b> accordingly (new in v0.9). We
have obtained excellent scaling on a 144-node cluster with GPUs, but have not yet
had the opportunity to test on a larger cluster. Our measurements nevertheless
indicate that naive task submission should scale to a thousand nodes, and that
pruning-tuned task submission should scale to about a million nodes.
<h4>Out of core</h4>
