Several fixes in perfmodel_bus
- Rename `benchmark_all_gpu_devices()` to `benchmark_all_memory_nodes()`: this function benchmarks not only RAM<->GPU memory transfers, but also e.g. transfers between NUMA nodes if NUMA support is enabled. The function name matters because it is displayed in the printf when the calibration is done.
- Remove useless code: no need to get the configuration and the number of NUMA nodes and cores again, since they are stored in global variables the first time.
- Fix a bug when finding a core belonging to a NUMA node: use attribute `type` instead of `depth` of `hwloc_obj_t` to know if it is a PU object or not.
- Bind the thread doing the `memcpy` when benchmarking memory transfers between NUMA nodes. Do we need to bind it several times like when benchmarking RAM<->GPU transfers (cf. comments `hack to avoid third party libs to rebind threads`)?
- Factorize code to get a core belonging to a NUMA node, required for the previous point.
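
To illustrate the `type` vs `depth` fix, here is a minimal sketch on a toy tree. It does not use hwloc itself: `toy_obj`, `toy_obj_type` and `toy_find_pu()` are hypothetical stand-ins that only mimic the `type`, `depth`, `logical_index` and `first_child` fields of `hwloc_obj_t`, and the real code in perfmodel_bus may differ. The point is that `depth` is a level in the tree while `type` says what kind of object it is, so only `type` can tell whether we reached a PU.

```c
#include <stddef.h>

/* Hypothetical stand-ins for hwloc object types (not the real enum values). */
enum toy_obj_type { TOY_OBJ_MACHINE, TOY_OBJ_NUMANODE, TOY_OBJ_CORE, TOY_OBJ_PU };

/* Hypothetical, simplified counterpart of hwloc_obj_t. */
struct toy_obj
{
	enum toy_obj_type type;      /* kind of object (what we must test) */
	int depth;                   /* level in the tree (unrelated to the kind) */
	unsigned logical_index;      /* index among objects of the same type */
	struct toy_obj *first_child; /* NULL on a leaf */
};

/* Walk down the first children until a PU is found.
 * Returns its logical index, or -1 if no PU is reachable. */
int toy_find_pu(const struct toy_obj *obj)
{
	/* Comparing obj->depth against TOY_OBJ_PU would mix a tree level
	 * with an object kind; the test has to be on obj->type. */
	while (obj != NULL && obj->type != TOY_OBJ_PU)
		obj = obj->first_child;

	return obj != NULL ? (int) obj->logical_index : -1;
}
```

Returning -1 when no PU is found lets the caller skip binding instead of dereferencing a NULL child, e.g. when the process is not allowed to use any PU under that node.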