- benchmark_all_memory_nodes(): this function benchmarks not only RAM<->GPU memory transfers but also, e.g., transfers between NUMA nodes when NUMA support is enabled. The function name matters because it is displayed in the printf issued when calibration is done.
- Remove useless code: there is no need to get the configuration and the number of NUMA nodes and cores again, since they are stored in global variables the first time.
- Fix a bug when finding a core belonging to a NUMA node: use the `type` attribute of `hwloc_obj_t` to know whether it is a PU object or not.
- Bind the thread doing the `memcpy` when benchmarking memory transfers between NUMA nodes. Do we need to bind it several times, as when benchmarking RAM<->GPU transfers (cf. the comments "hack to avoid third party libs to rebind threads")?
- Factorize code to get a core belonging to a NUMA node, required for the previous point.
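A minimal sketch of the factorized lookup, assuming a simplified, hypothetical topology type (`struct topo_obj`, `enum obj_type` and `find_pu_in_node` are illustrative names, not the hwloc API): the helper walks the descendants of a NUMA node and decides whether an object is a processing unit by checking its type field, as the fix above does with `hwloc_obj_t`.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a topology object.
 * This is NOT the hwloc API, only an illustration of the type check. */
enum obj_type { OBJ_NUMANODE, OBJ_PACKAGE, OBJ_CORE, OBJ_PU };

struct topo_obj {
    enum obj_type type;          /* kind of object, like the type of an hwloc object */
    int os_index;                /* OS index of the CPU, meaningful for PUs */
    struct topo_obj **children;  /* child objects in the topology tree */
    size_t nchildren;            /* number of entries in children */
};

/* Return the OS index of the first PU found below obj (depth first),
 * or -1 if the subtree contains no PU. The PU test relies on obj->type. */
static int find_pu_in_node(const struct topo_obj *obj)
{
    if (obj == NULL)
        return -1;
    if (obj->type == OBJ_PU)
        return obj->os_index;
    for (size_t i = 0; i < obj->nchildren; i++) {
        int pu = find_pu_in_node(obj->children[i]);
        if (pu >= 0)
            return pu;
    }
    return -1;
}
```

Keeping the lookup in a single helper is what lets the NUMA benchmark reuse it to pick the core its copying thread is bound to.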
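A Linux-only sketch of binding the thread that performs the copy, assuming glibc's `sched_setaffinity` (with pid 0 meaning the calling thread); `bound_memcpy` is a hypothetical name, not a function from the code base. Re-pinning on every call is one possible answer to the open question above, since it undoes any rebinding a third-party library may have done in between.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <string.h>

/* Hypothetical helper: pin the calling thread to `cpu`, then copy `size`
 * bytes from src to dst. Returns 0 on success, -1 if binding failed
 * (errno is set by sched_setaffinity in that case). */
static int bound_memcpy(int cpu, void *dst, const void *src, size_t size)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 applies the mask to the calling thread only */
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        return -1;
    memcpy(dst, src, size);
    return 0;
}
```

In a benchmark loop, the CPU would typically come from the NUMA-node lookup, so each timed copy runs on a core local to the source or destination node.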