Slow runs of ZLANGE for Max and Frobenius norms
Hello!
I think I found a bug in the workspace allocation of the chameleon_pzlange_generic function, lines 438-440 and 450-452. The idea behind these descriptors is to store the result of the LANGE operation on each tile. So the workspace should be an MT-by-NT matrix for the Max norm (the maximum of each block) and a 2*MT-by-NT matrix for the Frobenius norm (the scale and the scaled sum of squares of each block). However, the code allocates MT-by-N and 2*MT-by-N matrices instead. The fix is obvious, so I am not opening a pull request.
My test code took 12 seconds to compute the norm of a 10k-by-10k matrix before the fix and 0.5 seconds after it.
Edit: chameleon_pzlansy_generic is also affected by this bug.