Output of Factorization, Solve time and GFlops (F90 api)
After the calls
sla_lap(ib)%iparm(IPARM_VERBOSE) = PastixVerboseNot
...
! 1- Initialize the parameters and the solver
call pastixInit( sla_lap(ib)%pastix_data, 0, sla_lap(ib)%iparm, sla_lap(ib)%dparm )
! 2- Analyze the problem
call pastix_task_analyze( sla_lap(ib)%pastix_data, sla_lap(ib)%spm, info )
! 3- Factorize the matrix
call pastix_task_numfact( sla_lap(ib)%pastix_data, sla_lap(ib)%spm, info )
the diagnostic prints
write(6,*) ' Matrix ', ib
write(6,*) ' Time for analysys ', sla_lap(ib)%dparm(DPARM_ANALYZE_TIME)
write(6,*) ' Pred Time for fact ', sla_lap(ib)%dparm(DPARM_PRED_FACT_TIME)
write(6,*) ' Time for factorization ', sla_lap(ib)%dparm(DPARM_FACT_TIME)
write(6,*) ' GFlops/s for fact ', sla_lap(ib)%dparm(DPARM_FACT_FLOPS)
systematically give a zero factorization time and a very optimistic ;-) GFlops/s value:
Matrix 1
Time for analysys 3.892183303833008E-003
Pred Time for fact 0.115354254012610
Time for factorization 0.000000000000000E+000
GFlops/s for fact 5135859720.59899
Note that, since several factorizations run in parallel on OpenMP threads, the verbosity has to be switched off (set to PastixVerboseNot) and all the prints are postponed.
As a test, I switched off the parallelization, set the verbosity to PastixVerboseNo, and interleaved my own a posteriori write statements with the solver output, obtaining:
+-------------------------------------------------+
Analyse step:
Number of non-zeroes in blocked L 2451183
Fill-in 14.324351
Number of operations in full-rank: LL^t 900.43 MFlops
Prediction:
Model AMD 6180 MKL
Time to factorize 1.220890e-01 s
Time for analyze 3.082991e-03 s
Time for analysys 3.082990646362305E-003
Pred Time for fact 0.122088950728251
+-------------------------------------------------+
Factorization step:
Factorization used: LL^t
Time to initialize internal csc 1.364207e-02 s
Time to initialize coeftab 1.336455e-02 s
Time to factorize 1.121373e-01 s ( 9.89 GFlop/s)
Number of operations 1.11 GFlops
Number of static pivots 17
Time for factorization 0.000000000000000E+000
GFlops/s for fact 10622676294.4213
I have not yet tested this with the solve times.
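As a quick sanity check on the units, here is a small standalone sketch that only redoes the arithmetic on the numbers printed in the PastixVerboseNo run above. It makes no PaStiX calls, and the program and variable names are mine. It assumes, without having checked the PaStiX sources, that the value returned in dparm(DPARM_FACT_FLOPS) might be expressed in Flop/s rather than GFlop/s. Under that assumption the value from the run (about 10.6 when divided by 1e9) would be of the same order as the 9.89 GFlop/s reported by the solver itself.

```fortran
program check_flops_units
  implicit none
  integer, parameter :: dp = kind(1.0d0)

  ! Values copied from the PastixVerboseNo output above
  real(dp), parameter :: n_ops       = 1.11e9_dp            ! "Number of operations 1.11 GFlops"
  real(dp), parameter :: t_fact      = 1.121373e-01_dp      ! "Time to factorize", in seconds
  real(dp), parameter :: dparm_flops = 10622676294.4213_dp  ! dparm(DPARM_FACT_FLOPS) as printed

  real(dp) :: rate_from_log, rate_if_flops

  ! Rate recomputed from the operation count and time printed by the solver
  rate_from_log = n_ops / t_fact / 1.0e9_dp

  ! ASSUMPTION: the dparm value is in Flop/s, so dividing by 1e9 gives GFlop/s
  rate_if_flops = dparm_flops / 1.0e9_dp

  write(6,'(a,f8.2)') ' GFlop/s recomputed from the log    ', rate_from_log
  write(6,'(a,f8.2)') ' dparm value / 1e9 (if in Flop/s)   ', rate_if_flops
end program check_flops_units
```

This does not explain the zero in dparm(DPARM_FACT_TIME); it only suggests the GFlops/s figure may be a unit question rather than a wrong value.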