Add options `--forcegpu` and `--profile`
This PR:
- Restores the `--forcegpu` option to force kernels to run on the GPU whenever a CUDA implementation is available.
- Restores the `--profile` option to display the performance profile of each kernel and how the kernels are distributed among the resources.
- Adds a CMake option, `CHAMELEON_SIMULATION_EXTENDED`, to allow non-GPU kernels to be easily simulated on the GPU through SimGrid.