Slow runs of ZLANGE for Max and Frobenius norms
I think I found a bug in the workspace allocation of the
chameleon_pzlange_generic function, lines 438-440 and 450-452. The idea behind these descriptors is to store the result of the LANGE operation on each tile. So the workspace should be an MT-by-NT matrix for the Max norm (one maximum per tile) and a 2*MT-by-NT matrix for the Frobenius norm (a scale and a scaled sum of squares per tile). However, the code allocates MT-by-N and 2*MT-by-N matrices instead. The fix is obvious, so I am not opening a pull request.
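To give a sense of the size difference, here is a minimal sketch that only counts workspace entries for the two shapes. It is not the Chameleon code: the helper names (tile_count, workspace_per_tile, workspace_per_column) and the tile size used in the note below are hypothetical, chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: number of tiles covering n elements with tile size nb
 * (ceiling division). */
static size_t tile_count(size_t n, size_t nb)
{
    return (n + nb - 1) / nb;
}

/* Entries needed when each tile stores its own partial result:
 * rows_per_tile is 1 for the Max norm and 2 for the Frobenius norm
 * (scale and scaled sum of squares). This is the MT-by-NT shape. */
static size_t workspace_per_tile(size_t m, size_t n, size_t nb, size_t rows_per_tile)
{
    return rows_per_tile * tile_count(m, nb) * tile_count(n, nb);
}

/* Entries in the shape described in the report: MT-by-N (or 2*MT-by-N),
 * i.e. one slot per matrix column instead of one per tile column. */
static size_t workspace_per_column(size_t m, size_t n, size_t nb, size_t rows_per_tile)
{
    return rows_per_tile * tile_count(m, nb) * n;
}
```

Assuming, for illustration only, a 10000-by-10000 matrix with a tile size of 500, MT = NT = 20. The per-tile shape then needs 400 entries for the Max norm and 800 for the Frobenius norm, while the MT-by-N shape needs 200000 and 400000, a factor equal to the tile size.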
My test code took 12 seconds to compute the norm of a 10k-by-10k matrix before the fix and 0.5 seconds after it.
chameleon_pzlansy_generic is also affected by this bug.