Fix bad construct with omp in FfmmAlgorithmThread
-
Maintainer
Hi @pesterie I am not sure I understand correctly this commit, could you tell me more? I do not get why
MaxThreads
is not initialized in the construction, and why it is initialized in a parallel section? Also, originally, the idea was to have one initialization at a time (as you have seen withcritical (InitFFmmAlgorithmThread)
I guess), this can be removed I agree, we can consider that it must be thread safe to copy a kernel. However, the loop you have made to init all the kernels is not clear to me, why not a simple parallel section where each thread inits its is own kernel (as it was before)? And theif
in the loop is also surprising to me. Thanks! -
Maintainer
Hi @bramas, thanks for your comment. I agree, the fix is ugly and will be changed to a more elegant omp flavor. I totally agree about RAII but in that particular case the call to
omp_get_max_threads()
is an error. This omp function can return whatever it wants. For example on a machine with hyper-threading turned on you get the number of logical cores as the number of threads. This case happens if you are lucky. The runtime can return different values. The omp specification is the following : "Theomp_get_max_threads
routine returns an upper bound on the number of threads that could be used to form a new team if aparallel
construct without anum_threads
clause were encountered after execution returns from this routine." This can be used if you want to allocate sufficient memory for a following parallel section. In our case we want to keep the control on the number of threads especially to avoid hyper-threading effects. In a further version of the library, I think that this value should be defaulted to the number of physical cores of the machines if the user has no clue about the number of cores. But we still need to be able to change this value at runtime if we want. What are your thoughts on this ? Thanks ! -
Maintainer
Agree about
omp_get_max_threads
but users expect to control the number of threads with the env variableOMP_NUM_THREADS
if they know that this is an openmp based application. And so a user might be surprised if the code run using all cores whenOMP_NUM_THREADS
was set to let say 1. Moreover, if the application is executed with a limited access to the cores (for instance when taskset, numactl, etc. are used)omp_get_max_threads
returns the number of available cores (even so it seems to be fragile because it looks like it is not guarantee by the standard from the text you pasted). Of course we could look at the process affinity to find out this number, but then we should make it clear somewhere. I suppose you already know so I just write it for the record, but to control at runtime whatomp_get_max_threads
returns, one can useomp_set_num_threads
to specify a value. Anyway, I would follow your choices at the end :)My point was more about the loop and the test to set the kernel, I do not understand them.
With the following code I am sure that all threads are going to init a kernel:
#pragma parallel { kernel[omp_get_thread_num()] = init_kernel(); }
You code is expected to do the same:
#pragma omp parallel for shared(kernels) for(int i = 0; i < omp_get_num_threads(); ++i) { if(i == omp_get_thread_num()) { this->kernels[i] = new KernelClass(*inKernels); } }
But in your case, it seems a more complicated way to do it, is not it? Plus, the test must always be true (or the code does not do what it is implemented for)?
-
Maintainer
I do not want to be too invasive, I am just asking because maybe there is something I do not know such that there is a difference between both codes?