* Installation

Before running install.sh, make sure opam and git are installed on your
system. Then sh install.sh should do the job.

Most files of interest are located in ml_utils.
Another README can be found there; it explains
the role of each executable and how to launch them.

* Workflow 

TTiLe's purpose is to automatically generate efficient convolution
implementations for any architecture.
It can be used either as a standalone tool (for which we only support
sequential optimization for now) or in conjunction with TVM.
The pipeline differs slightly between the two modes but is essentially the same.

It consists of the following steps:

1) Test and select microkernels for a given architecture.

See ml_utils/microkernel_search/mickernel_search.ml.
You can either print all microkernel results to stdout or
use them to build the so-called classes of microkernels.
These classes should then be placed in a new architecture module
in ml_utils/search/arch_info.ml, along with cache information.
See the other examples in this file.
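
Purely as an illustration of what such an architecture entry bundles
(the real modules are written in OCaml in ml_utils/search/arch_info.ml),
here is a hypothetical Python-flavoured sketch; every field name and
size below is an assumption, not the actual module layout:

```python
# Hypothetical sketch only: the real architecture modules are OCaml
# (ml_utils/search/arch_info.ml); field names and sizes are invented.
skylake_info = {
    # microkernel classes built from the step-1 search results
    "ukernel_classes": [...],
    # cache information for the target machine (illustrative sizes)
    "caches": {"L1": 32 * 1024, "L2": 1024 * 1024, "L3": 8 * 1024 * 1024},
}
```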

2) For a given convolution size, search for the possible permutations.
We rely on ioopt for this. As this task is quite long and requires a
version of ioopt to be installed, it can be done offline for convenience;
see the code in ml_utils/ioopt_cache/ioopt_cache.ml.

Either way, whenever a permutation search is launched, TTiLe
first looks into perm_cache.json for the specific
convolution/arch/ukernel class tuple for which it needs a permutation,
and falls back to ioopt if it is not there. In the latter case, if ioopt
is not installed, TTiLe will simply crash (this could be improved).
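
A minimal sketch of this lookup-then-fallback logic, written in Python
for brevity (the real code is OCaml; the cache key encoding and the
run_ioopt helper below are hypothetical):

```python
import json

def run_ioopt(conv, arch, ukernel_class):
    # Stand-in for invoking ioopt: raises when ioopt is unavailable,
    # mirroring the crash described above.
    raise RuntimeError("ioopt not installed")

def find_permutations(conv, arch, ukernel_class, cache_path="perm_cache.json"):
    """Return cached permutations, falling back to ioopt when absent."""
    with open(cache_path) as fh:
        cache = json.load(fh)
    key = f"{conv}/{arch}/{ukernel_class}"  # hypothetical key encoding
    if key in cache:
        return cache[key]
    return run_ioopt(conv, arch, ukernel_class)
```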

3) The previous steps define a search space of candidates
that we now want to prune.
We can either:
** not prune at all and keep every candidate,
** sample the search space and take n candidates at random,
** or sort the candidates with some metric and keep the n best.

Examples can be found in ics_search.ml or microsearch.ml.
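
To make the three policies concrete, here is a minimal Python sketch
(the actual implementations live in the OCaml files above; the function
and policy names are illustrative):

```python
import random

def prune(candidates, policy, n=None, metric=None):
    if policy == "all":        # keep the whole search space
        return list(candidates)
    if policy == "sample":     # n candidates drawn at random
        return random.sample(list(candidates), n)
    if policy == "best":       # keep the n best under `metric`
        # assuming a lower metric value means a better candidate
        return sorted(candidates, key=metric)[:n]
    raise ValueError(f"unknown pruning policy: {policy}")
```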

4) Python interface

Our generation strategy can be called from Python.
To build and install the Python frontend, run
make python_install in ml_utils,
then export TTILE_ROOT=<path of matmul_bench>.

This will install a module ttilepy that provides a function gen_scheme.
```python
def gen_scheme(architecture, num_candidates, f, c, y, x, h, w, stride): ...
```

The code can be found in ml_utils/python_frontend/python_frontend.ml.
architecture is one of skylake, xeon, broadwell, E52630, chifflet or silver.
Other architectures can be added by modifying the function arch_of_string.

gen_scheme returns l: List[List[(scheme_out, footprint_in, size_y, code)]],
where scheme_out is the scheme found outside of the tensorize part, in the
format [('x', 10), ('y', 13), ...],
footprint_in is the size of every dimension inside the tensorize,
size_y is the global size of y in this sub-part of the computation,
and code is a string containing the C code to tensorize.
len(l) == num_candidates (or less if fewer than num_candidates candidates
were found).
For each l' in l, l' contains either one or two elements, depending on
whether a lambda is present or not.
See tvm_ttile for the call to this function inside our custom TVM implementation.
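
For reference, a hypothetical call could look as follows; the convolution
sizes are illustrative only, and the unpacking mirrors the return format
described above:

```python
from ttilepy import gen_scheme

# Illustrative sizes for (f, c, y, x, h, w, stride); not from the project.
l = gen_scheme("skylake", 10, 64, 64, 56, 56, 3, 3, 1)
for variants in l:                 # len(l) <= num_candidates
    for scheme_out, footprint_in, size_y, code in variants:
        print(scheme_out)          # e.g. [('x', 10), ('y', 13), ...]
        # `code` is the C source string for the tensorized part
```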