Diagonal copy support
All data descriptors for the temporary copies of the diagonal, which are used to release dependencies on the lower/upper parts, should be moved to the driver level to avoid synchronization steps where possible. This is already done in the new HQR kernels but still needs to be done in:
- pzgelqf.c
- pzgelqfrh.c
- pzgeqrf.c
- pzgeqrfrh.c
- pzhetrd_he2hb.c
- pztpgqrt.c
- pzunglq.c
- pzunglqrh.c
- pzungqr.c
- pzungqrrh.c
- pzunmlq.c
- pzunmlqrh.c
- pzunmqr.c
- pzunmqrrh.c
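
The intended ownership change can be sketched as follows. This is only an illustration under stated assumptions: `diag_copy_desc_t`, `diag_copy_init`, `diag_copy_fini`, `kernel_step` and `driver_factor_then_apply` are hypothetical names invented for this note and are not part of the library's API. The point it tries to show is that the driver allocates the scratch descriptor once and lends it to each algorithm, so no algorithm has to wait for completion just to free its own temporary storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a data descriptor holding diagonal copies. */
typedef struct {
    int     ndiag;   /* number of diagonal tiles, min(mt, nt) */
    int     mb, nb;  /* tile dimensions */
    double *tiles;   /* storage for the diagonal copies only */
} diag_copy_desc_t;

/* Counts allocations so the single-allocation property can be checked. */
static int diag_alloc_count = 0;

/* Driver-level allocation: one scratch tile per diagonal tile. */
static int diag_copy_init(diag_copy_desc_t *d, int mt, int nt, int mb, int nb)
{
    d->ndiag = (mt < nt) ? mt : nt;
    d->mb = mb;
    d->nb = nb;
    d->tiles = calloc((size_t)d->ndiag * mb * nb, sizeof *d->tiles);
    if (d->tiles == NULL)
        return -1;
    diag_alloc_count++;
    return 0;
}

static void diag_copy_fini(diag_copy_desc_t *d)
{
    free(d->tiles);
    d->tiles = NULL;
}

/* Hypothetical algorithm step: borrows the descriptor, never allocates
 * or frees it, so it has no reason to synchronize before returning. */
static void kernel_step(diag_copy_desc_t *d, int k)
{
    assert(d->tiles != NULL && k >= 0 && k < d->ndiag);
    d->tiles[(size_t)k * d->mb * d->nb] = (double)(k + 1);
}

/* Hypothetical driver: owns the descriptor across two chained algorithms. */
static int driver_factor_then_apply(int mt, int nt, int mb, int nb)
{
    diag_copy_desc_t d;
    if (diag_copy_init(&d, mt, nt, mb, nb) != 0)
        return -1;
    for (int k = 0; k < d.ndiag; k++)   /* first algorithm */
        kernel_step(&d, k);
    for (int k = 0; k < d.ndiag; k++)   /* second algorithm reuses the copies */
        kernel_step(&d, k);
    diag_copy_fini(&d);                 /* single release point in the driver */
    return 0;
}
```

With this layout, the only place that must wait before freeing the copies is the driver, after every algorithm that uses them has been submitted.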