Bind the thread that benchmarks transfers between NUMA nodes for the perfmodel

* Factor out the code that gets a CPU inside a NUMA node
* Fix a bug in the lookup of a CPU inside a NUMA node
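
For illustration only, the general idea of picking a CPU inside a NUMA node and binding the benchmarking thread to it can be sketched as below. This is a minimal Linux/glibc sketch, not the code in this merge request: the helper names `first_cpu_in_numa_node` and `bind_current_thread_to_cpu`, and the `cpu_to_node` array standing in for the real topology information, are assumptions made for the example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for pthread_setaffinity_np() and the CPU_* macros */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical helper: return the lowest-numbered CPU that belongs to `node`,
 * or -1 if no CPU belongs to it. cpu_to_node[i] is assumed to hold the NUMA
 * node of CPU i (a stand-in for a real topology query). */
int first_cpu_in_numa_node(const int *cpu_to_node, int ncpus, int node)
{
    assert(cpu_to_node != NULL);
    for (int cpu = 0; cpu < ncpus; cpu++)
        if (cpu_to_node[cpu] == node)
            return cpu;
    return -1; /* a NUMA node may have memory but no CPU */
}

/* Hypothetical helper: bind the calling thread to a single CPU.
 * Returns 0 on success, or an error number on failure. */
int bind_current_thread_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return EINVAL;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```

A benchmarking thread would call `first_cpu_in_numa_node()` for the node it measures from and, if the result is not -1, pass it to `bind_current_thread_to_cpu()` before timing any transfer. Handling the -1 case explicitly matters because a node without CPUs cannot host the thread.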