Add new low rank kernels
This MR adds support for new low rank kernels and the associated tests. New features include:
- Low rank triangular solve (lrtrsm)
- Low rank symmetric Rank-k update (lrrk)
- Low rank matrix multiplication
  `C = alpha * opA(A) * opB(B) + beta * C`
  for all `transA`/`transB`/`beta` combinations. Previously, only `transA = NoTrans`, `transB = (Conj)Trans`, `beta = 1.0` was supported.
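
For reviewers, here is a minimal dense reference of the update formula, which may help when checking the new `transA`/`transB`/`beta` combinations. It is only a sketch: the function names, the `"N"`/`"T"`/`"C"` flags, and the `U * V^T` storage of a low rank block are assumptions made for illustration and are not taken from the code in this MR.

```python
def apply_op(m, trans):
    """Return op(m): "N" = no transpose, "T" = transpose, "C" = conjugate transpose.

    The flag names are illustrative, not the library's actual enum values.
    """
    if trans == "N":
        return [list(row) for row in m]
    transposed = [list(col) for col in zip(*m)]
    if trans == "T":
        return transposed
    if trans == "C":
        return [[complex(x).conjugate() for x in row] for row in transposed]
    raise ValueError(f"unknown trans flag: {trans!r}")


def matmul(a, b):
    """Plain dense product of two matrices stored as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gemm_reference(trans_a, trans_b, alpha, a, b, beta, c):
    """Dense reference for C = alpha * opA(A) * opB(B) + beta * C."""
    prod = matmul(apply_op(a, trans_a), apply_op(b, trans_b))
    return [
        [alpha * p + beta * x for p, x in zip(prod_row, c_row)]
        for prod_row, c_row in zip(prod, c)
    ]


def lr_product(ua, va, ub, vb):
    """Product of two low rank blocks A = Ua * Va^T and B = Ub * Vb^T.

    Uses (Ua * Va^T) * (Ub * Vb^T) = (Ua * (Va^T * Ub)) * Vb^T, so the result
    is returned as factors (U, V) without forming the dense product.
    The U * V^T storage is an assumption of this sketch.
    """
    core = matmul(apply_op(va, "T"), ub)  # small rank_a x rank_b matrix
    return matmul(ua, core), [list(row) for row in vb]


def lr_to_dense(u, v):
    """Expand a low rank block U * V^T to a dense matrix."""
    return matmul(u, apply_op(v, "T"))
```

A low rank kernel can then be compared against `gemm_reference` by expanding its result with `lr_to_dense` and checking the difference against a tolerance.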
This is still a work in progress; some cases are not implemented yet. For example, the case "beta != 1.0" is not handled in the low rank matrix addition kernel.