pmtool: Post-Mortem Tool

Purpose: Analyse the behavior of StarPU applications post-mortem. Provide lower bounds on the makespan. Study the performance of different schedulers in a simple context.

Limitations: communications are ignored for the moment. The comms branch attempts to remove this limitation.

Compiling: This project uses cmake. Make a new build/ directory, cd to it, and use cmake .. to generate Makefiles. Then compile with make.

All bounds require CPLEX, which can be specified with the -DCPLEX_ROOT_DIR=<...> option to cmake.
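
For example, a typical out-of-source build might look like this (the CPLEX path is a placeholder):

mkdir build
cd build
cmake .. -DCPLEX_ROOT_DIR=/path/to/cplex
make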

Depends on librec-dev (See here), and CPLEX depends on libgpg-error-dev and libgcrypt-dev.

Table of contents

  1. How to use
  2. Typical usage
  3. Algorithms
    1. Heft
    2. List
    3. HetPrio
    4. indep
    5. dmdas
    6. greedy
    7. zhang
  4. Bounds
  5. More specific options

How to use:

Standard usage

pmtool [INPUT FILES] [OPTIONS]

where [OPTIONS] are:

  • -a <algname>, --alg=<algname>
    Use algorithm algname to compute a schedule. The complete syntax is algname:option1=value1:option2=value2:.... A list of possible algorithms can be obtained with --help. It is possible to give several -a options to try several algorithms on all input files. It is possible to specify several values for one option with option=value1,value2,value3; then all combinations of option values will be used. See the example after this list.

  • -b <boundname>, --bound=<boundname>
    Use boundname to compute a bound. Same option syntax as above.

  • -o option=value
    Apply this option to all subsequent algorithms and bounds

  • -p <platformFile>, --platform=<platformFile>
    Add platformFile to the list of platform files to use. At least one is needed if the instance does not contain execution times for tasks.

  • -v <verb>, --verbose=<verb>
    Set the verbosity level to verb. Applies also to all algorithms and bounds specified after this option. Usually only useful for debugging.

  • -s <savefile>, --save=<savefile>
    Save all the produced schedules in file savefile.

  • -d <type>, --default=<type>
    Use default algorithms and bounds. This option comes in three different flavors (default = normal):

    • fast
      Use low complexity algorithms, suitable for large instances: the area bound, and the heft and HeteroPrio algorithms with rank=min.
    • normal
      Use reasonable complexity algorithms: depbound, and the heft and HeteroPrio algorithms with rank=min and rank=heft.
    • slow
      Use slow algorithms for very precise results: depbound and interval bounds, the heft and HeteroPrio algorithms, and greedy, which uses the allocation from the interval bound.
  • --threads, --threads=<nb>
    Use parallel threads for algorithms (bounds are already parallel thanks to CPLEX). If unspecified, use the number of threads available on the machine.
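
As an illustration of how these options can be combined (file names are placeholders):

pmtool tasks.rec -p platform_file.rec -o verbosity=1 -a heft:rank=min -a hetprio:rank=heft -b area -s schedules.txt --threads=4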

See more specific options at the end of the document.

More details

Input files ending in .rec are read assuming the librec format compatible with StarPU; others use an in-house format. Formats are specified here.

The output contains one header, then one line per file and per algorithm, where the header is

input algorithm option1 option2 mkspan time

Example:

./build/pmtool data/CholeskySirocco-*.pm -b area -a heft:rank=min -a dmdas:rank=heft -a hetprio:verbosity=1
input algorithm rank verbosity mkspan time
data/CholeskySirocco-12.pm area NA NA 125274 63
data/CholeskySirocco-12.pm heft min NA 179923 1
data/CholeskySirocco-12.pm dmdas heft NA 199231 2
data/CholeskySirocco-12.pm HetPrio NA 1 174862 3
data/CholeskySirocco-32.pm area NA NA 2.01504e+06 45
data/CholeskySirocco-32.pm heft min NA 2.27366e+06 40
data/CholeskySirocco-32.pm dmdas heft NA 2.19026e+06 44
data/CholeskySirocco-32.pm HetPrio NA 1 2.09165e+06 20
data/CholeskySirocco-64.pm area NA NA 1.59353e+07 67
data/CholeskySirocco-64.pm heft min NA 1.62474e+07 542
data/CholeskySirocco-64.pm dmdas heft NA 1.64605e+07 358
data/CholeskySirocco-64.pm HetPrio NA 1 1.59718e+07 147

Typical usage

  1. Obtain a trace file from a StarPU application

starpu_app
starpu_fxt_tool -i /tmp/prof_file_
starpu_perfmodel_recdump tasks.rec -o platform_file.rec


  2. Run pmtool on the trace

pmtool -p platform_file.rec tasks.rec -b area -a heft:rank=min,heft -a hetprio:rank=min,heft:verbosity=5
pmtool -p platform_file.rec tasks.rec -t 0.1 -b interval:pre=4:post=4:paths=yes:limit=10:share=ival:save=filename -a greedy:file=int@ival[alloc]:rank=alloc:stealer=1:export=file.rec

Algorithm description and options

Common options to all algorithms:

These options can be used for all algorithms.

  • Boolean options have values yes or no
  • verbosity
    default 0, is set automatically if the algorithm appears after a global -v option
  • save=<name>
    provide a filename to save the schedule of this algorithm. Not to be used with several option values (option=value1,value2) since they will overwrite each other
  • export
    exports the allocation of the algorithm, in librec format
  • bubble
    additional mandatory option: bubbleType. An obscure option to make a specific export: divides the schedule into bubbles depending on the start times of tasks of type bubbleType, and outputs for each task the bubble in which it starts, the bubble in which it ends, and its allocation
  • share=KEY
    creates shares with keys KEY[alloc], KEY[worker], KEY[start] and KEY[end], which provide respectively, for each task, its allocation (which resource type it was assigned to), the worker it was assigned to, its start time and its end time, as computed by this algorithm. Shares can be reused by various algorithms, notably the greedy scheduler. See the example after this list.
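
For example (H, heft_schedule.txt and heft_alloc.rec are illustrative names; the int@KEY[PARAM] form follows the Typical usage example above), a schedule can be saved and exported, or shared and reused as priorities by another algorithm:

pmtool tasks.rec -p platform_file.rec -a heft:rank=min:save=heft_schedule.txt:export=heft_alloc.rec
pmtool tasks.rec -p platform_file.rec -a heft:rank=min:share=H -a list:rank=#int@H[start]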

heft

A real HEFT algorithm. It considers each task in priority order, and schedules it ASAP given all previous decisions, possibly filling holes left by previously scheduled tasks.

Options

  • rank (default min)
    specifies the priority order. Higher values mean more priority and thus earlier execution. Any order (except fifo) can be prefixed with # to reverse it. Possible values (an example follows this list):
    • fifo
      tasks are ordered in the same order as they become ready
    • none
      use the order in which tasks are given in the input file
    • min, heft, area, unit
      orders based on the task graph: priorities are equal to bottom levels of tasks according to a specific task weight:
      • min: smallest execution time across all resources
      • heft: average execution time across all resources
      • area: average execution time across all resources, weighted by the proportion in the area bound for that task type.
      • unit: each task has weight 1.
    • ext@FILE
      Use external priorities from a file (all values in order in which the tasks are given in the instance file)
    • int@KEY
      Use internal priorities, as computed by another algorithm or bound and shared with the share or export option
    • lex@order1+order2+...+ordern
      Combine several orderings in lexicographic order (none of them can be fifo)
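
For instance, several rank variants can be tried in one run (priorities.txt is a placeholder for an external priority file):

pmtool tasks.rec -p platform_file.rec -a heft:rank=min,area,unit -a heft:rank=lex@min+unit -a heft:rank=ext@priorities.txt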

list

A standard LIST algorithm (each time a resource is idle, assign the highest priority ready task to it).

Options

  • rank
    same option as in the heft algorithm

hetprio

HeteroPrio, as discussed in Suraj's thesis. A resource-centric algorithm: each time a resource is idle, assign the most suited task to it (highest priority in case of ties for fast resources, lowest priority for small resources). Fast resources are allowed to steal tasks from other resources (and restart them) if there are no available tasks. Stealing chooses the highest priority task that the stealer can finish before the victim.

Options

  • rank
    same as for the heft algorithm
  • combFactor (default 0)
    combines several task types together (and considers them tied) if their "suitability" is within combFactor
  • front (boolean, default no)
    If yes, small resources also pick the highest priority tasks in case of ties.
  • ssbaf (boolean, default no) -- sort steal by acceleration factor
    If yes, stealers choose the most "suited" task when stealing
  • stealLatest (boolean, default no)
    If yes, stealers choose the task that finishes last. ssbaf has priority.
  • stealidle (boolean, default yes)
    If yes, resources only steal when they are idle. If no, and the best suited available task is less suited than a steal candidate, steal the candidate instead. Should be paired with ssbaf to make sense.
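
For example, different stealing policies can be compared in one run (file names are placeholders):

pmtool tasks.rec -p platform_file.rec -a hetprio:rank=heft -a hetprio:rank=heft:ssbaf=yes -a hetprio:rank=heft:ssbaf=yes:stealidle=no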

True Hetero Prio

There is another implementation of HeteroPrio called truehp, restricted to the case of 2 types of processors. It is faster when there are many types of tasks (and thus few tasks of each type) because it stores all tasks in one queue ordered by acceleration factor, whereas hetprio maintains one queue per type of task, ordered according to its rank option. truehp does not accept any option for now.

indep

Algorithm based on iteratively solving independent task scheduling problems. Based on a greedy algorithm which maintains a list of ready tasks allocated to each resource and schedules them in order of priority. An independent task solver is called to assign tasks to resources. All of these only work with two types of resources.

Options

dmdas

Emulates the dmdas scheduler from StarPU. Considers ready tasks in priority order, and assigns them to the resource on which they finish earliest, given the tasks already assigned. High priority tasks can overtake already assigned low priority tasks, but the resource selector does not know it.

Options

  • rank

greedy

Greedy algorithm with pre-allocated tasks. Requires the file option; see the example after the options.

Options

  • rank
    same as usual, with one additional possible value (which is the default):
    • alloc
      bottom-level priority computation (similar to min, heft and area), where each task's weight is its execution time on its allocated resource.
  • file (required)
    Provide the name of a file from which the pre-allocation will be read, with one resource type number per task, in the order of the instance. Be careful with the -t option, which renumbers resources.
  • key
    Uses a pre-allocation shared with the share or export option; typically KEY[alloc].
  • comms
    If yes, takes communications into account (if they are defined in the tasks.rec input file). Default is no.
  • stealer (default -1)
    specifies, by number, a resource type which is allowed to steal from others. Be careful with the -t option, which renumbers resources.
  • stealerName
    specifies, by name, a resource type which is allowed to steal. Stealing only happens when idle, and steals the task with the best acceleration factor compared to the victim (highest priority in case of ties). If both stealer and stealerName are specified, stealer has precedence.
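
An illustrative invocation, assuming a hypothetical pre-allocation file alloc.txt with one resource type number per task (in instance order):

pmtool tasks.rec -p platform_file.rec -a greedy:file=alloc.txt:rank=alloc:stealer=1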

zhang

Online version of the indep zhang algorithm. Less costly in execution time. Shares a lot of the implementation with greedy.

Options

Same options as greedy, except that it does not have a file option and cannot use the alloc ranking scheme. Additional options:

  • lambda
  • beta
  • theta
  • phi
    Allow changing the values of the parameters of the algorithm. Defaults are as in the paper.

lp

An algorithm meant only for independent tasks; it solves a basic ILP to get an optimal assignment of tasks to machines.

Options

  • limit specifies a time limit in seconds.
  • gap specifies a gap limit between the current solution and the upper bound.
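
For example (the instance file is assumed to contain only independent tasks, and the numeric values are arbitrary placeholders):

pmtool indep_tasks.rec -p platform_file.rec -a lp:limit=60:gap=0.05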

Bound description and options

Reminder: all bounds require compiling with CPLEX.

Common options and concepts

  • option verbosity
    same as before.

  • option share=KEY
    same as for algorithms: each bound can share some values, with the identifier KEY[PARAM]. Different bounds share different parameters.

  • hybridization
    Almost all bounds are hybridized by default, in the following sense: some tasks are taken away from the instance, at the beginning and end of the DAG. The shortest maximal paths of these sub-DAGs are computed, and their lengths are added to the result of the bound on the remaining DAG. This provides a bound which can be stronger than the bound on the whole DAG. Tasks are taken away one after the other, until this no longer improves the result.

    Hybridized bounds have the following options:

    • hybrid (default yes)
      can be set to no to disable hybridization
    • limit (default 0)
      if non-zero, once the sets of tasks to take away are decided, the bound is computed in "tryhard" mode (usually looking for an integer solution) with the given time limit.
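
    For example, hybridization can be disabled, or combined with a "tryhard" time limit (values are placeholders):

    pmtool tasks.rec -p platform_file.rec -b dep:hybrid=no -b dep:limit=30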

Possible bounds:

  • area
    Simple area bound, with one variable per task type and resource type. Very quick.

  • cp
    Simple critical path bound. Very quick.

  • dep: mixed area-dependency bound. Always better than max(area, cp), but slower
    Options:

    • mode: normal (default) or concurrent
      if concurrent, uses Cplex's concurrent solving mode
    • outint (boolean, default no)
      if set and the time limit is non-zero, outputs the computed integer solution
  • iterdep: a refinement of the dep bound.
    Once the dep bound is computed, go around the graph to search for a pair $(u, v)$ such that the total work of the tasks $t$ with $u \rightsquigarrow t \rightsquigarrow v$ (descendants of $u$ that are ancestors of $v$) is too much to process between end(u) and start(v). Add a local area bound between $u$ and $v$, and solve again. Empirically, costly to compute and not much gain to have.

    Options: same as dep

  • mixed: Based on the paper by J. Langou
    Idea: computes the time to the end for each task with the dep bound. Then, for each task $i$:

    • compute the area bound of $\{j \mid \text{timeToEnd}_j \geq \text{timeToEnd}_i\}$
    • $\text{bound} = \text{area} + \text{timeToEnd}_i$

    Return the best computed bound. Very slow, not strong enough to justify it.

  • interval: Interval Bound.
    Based on dividing the schedule into intervals, and writing area bounds for each interval. Intervals are made along a critical path (of length $l$), with $2l+1$ intervals (cuts at the start and end of each task of the critical path). One variable for each task, in each interval, on each resource, with dependency constraints and area constraints in each interval. There are start variables (like in the dep bound), but they are only loosely linked to the intervals because it is hard to express.

    This is the only bound which is not hybridized, because it does not help.

    Options:

    • weight: min (default) or heft
      specifies how the critical path is computed for the intervals
    • pre and post (default -1)
      If both are positive, this indicates that we keep only pre intervals at the beginning and post intervals at the end; the middle intervals are merged together. Avoids a lot of computation.
    • paths (boolean, default yes)
      If yes, add more constraints about the time each path gets allocated in each interval, to make sure that the interval is long enough. Only a limited number of paths are used, ensuring that each task has a path from and to the critical path.
    • share
      can share [alloc] (the interval in which the task does most of its computation), [med] (for each task, the interval in which it has been allocated half of its work), and [avg] (the average interval in which the task is executed)

More specific options

Here are some more specific options to pmtool, not used very often:

  • --best=<file>
    Output the best makespan obtained. If file is present, save the best schedule obtained for each instance in file.

  • --use-tags
    Append the content of the Tag field in instance.rec input files to job ids.

  • --submit-order
    Use the SubmitOrder field in instance.rec input files instead of JobId.

  • --export-type
    When exporting in .rec format with the export= option of algorithms, specify all workers of this type instead of the particular worker.

  • --export-order
    When exporting in .rec format with the export= option of algorithms, specify the ordering of tasks on workers in addition to the allocation to workers. This is not compatible with --export-type.

  • -r <repartFile>
    Output a summary of the repartition of task types on resource types in the repartFile. If there are only two types of resources, the standard output also indicates CPU and GPU usage.

  • -w, --output-raw
    Output raw algorithm names instead of the R-friendly column output.

  • -t <tol>, --tolerance=<tol>
    Tolerance for merging resources into the same type. If all execution times of two resources are within (1+tol) of each other, these resources are merged and the average execution times are used.

  • --no-header
    Does not output a header

  • --no-dep
    Ignore dependencies in input files.

  • --rev-dep
    Reverse dependencies.

  • --no-convert
    Does not convert indices in input. To be used if task IDs count from 0 to (n-1), instead of the default 1 to n.