WIP: First working version of the HIP driver (synchronous)
This is a WIP of the HIP driver for StarPU. The configuration/compilation setup is currently specific to the environment it was developed on (a temporary shortcut for ease of use) and will be improved. Developed and tested with ROCm 5.0.1 (rocm-5.0.1/lib64 and rocm-5.0.1/hip/lib were added to LD_LIBRARY_PATH manually).
Only the synchronous version, called hip0 and mirroring cuda0, has been ported. Compiled with Clang 13.0.0 (from AOCC). Tests were carried out on a single node with 8 MI100 GPUs:
- /examples/basic_examples/vector_scal: added HIP kernel => 1 task ran successfully on GPU returning the correct result
- /examples/basic_examples/block: added HIP kernel => 1 task ran successfully on GPU returning the correct result
- /examples/basic_examples/mult: specific kernel written to perform a matrix multiplication with multiple tasks on multiple GPUs => 16 tasks ran on all 8 GPUs returning the correct result (compared to the computation made fully on CPU)
Besides a potential update to the HIP configuration/compilation setup, the main focus is adding asynchronous support for HIP, which is currently in the works (taking cuda1 as an example).
Only driver_hip_init.c, driver_hip.h and driver_hip0.c are important in the driver folder.