Commit 2ead912a authored by ESTERIE Pierre's avatar ESTERIE Pierre

Fix bad construct with omp in FfmmAlgorithmThread

parent 71995752
...@@ -51,7 +51,7 @@ class FFmmAlgorithmThread : public FAbstractAlgorithm, public FAlgorithmTimers{ ...@@ -51,7 +51,7 @@ class FFmmAlgorithmThread : public FAbstractAlgorithm, public FAlgorithmTimers{
static const int SizeShape = P2PExclusionClass::SizeShape; static const int SizeShape = P2PExclusionClass::SizeShape;
int shapeLeaf[SizeShape]; int shapeLeaf[SizeShape];
const int MaxThreads; ///< The maximum number of threads. int MaxThreads; ///< The maximum number of threads.
const int OctreeHeight; ///< The height of the given tree. const int OctreeHeight; ///< The height of the given tree.
...@@ -73,18 +73,26 @@ public: ...@@ -73,18 +73,26 @@ public:
FFmmAlgorithmThread(OctreeClass* const inTree, KernelClass* const inKernels, FFmmAlgorithmThread(OctreeClass* const inTree, KernelClass* const inKernels,
const int inUserChunkSize = 10, const int inLeafLevelSeperationCriteria = 1) const int inUserChunkSize = 10, const int inLeafLevelSeperationCriteria = 1)
: tree(inTree) , kernels(nullptr), iterArray(nullptr), leafsNumber(0), : tree(inTree) , kernels(nullptr), iterArray(nullptr), leafsNumber(0),
MaxThreads(FEnv::GetValue("SCALFMM_ALGO_NUM_THREADS",omp_get_max_threads())), OctreeHeight(tree->getHeight()), OctreeHeight(tree->getHeight()),
userChunkSize(inUserChunkSize), leafLevelSeparationCriteria(inLeafLevelSeperationCriteria) { userChunkSize(inUserChunkSize), leafLevelSeparationCriteria(inLeafLevelSeperationCriteria) {
FAssertLF(tree, "tree cannot be null"); FAssertLF(tree, "tree cannot be null");
FAssertLF(leafLevelSeparationCriteria < 3, "Separation criteria should be < 3"); FAssertLF(leafLevelSeparationCriteria < 3, "Separation criteria should be < 3");
FAssertLF(0 < userChunkSize, "Chunk size should be > 0"); FAssertLF(0 < userChunkSize, "Chunk size should be > 0");
this->kernels = new KernelClass*[MaxThreads]; #pragma omp parallel shared(MaxThreads)
#pragma omp parallel num_threads(MaxThreads)
{ {
#pragma omp critical (InitFFmmAlgorithmThread) #pragma omp single nowait
{ {
this->kernels[omp_get_thread_num()] = new KernelClass(*inKernels); MaxThreads = omp_get_num_threads();
}
}
this->kernels = new KernelClass*[MaxThreads];
#pragma omp parallel for shared(kernels)
for(int i = 0; i < omp_get_num_threads(); ++i) {
if(i == omp_get_thread_num()) {
this->kernels[i] = new KernelClass(*inKernels);
} }
} }
......
  • Hi @pesterie I am not sure I understand correctly this commit, could you tell me more? I do not get why MaxThreads is not initialized in the construction, and why it is initialized in a parallel section? Also, originally, the idea was to have one initialization at a time (as you have seen with critical (InitFFmmAlgorithmThread) I guess), this can be removed I agree, we can consider that it must be thread safe to copy a kernel. However, the loop you have made to init all the kernels is not clear to me, why not a simple parallel section where each thread inits its is own kernel (as it was before)? And the if in the loop is also surprising to me. Thanks!

  • Hi @bramas, thanks for your comment. I agree, the fix is ugly and will be changed to a more elegant omp flavor. I totally agree about RAII but in that particular case the call to omp_get_max_threads() is an error. This omp function can return whatever it wants. For example on a machine with hyper-threading turned on you get the number of logical cores as the number of threads. This case happens if you are lucky. The runtime can return different values. The omp specification is the following : "The omp_get_max_threads routine returns an upper bound on the number of threads that could be used to form a new team if a parallel construct without a num_threads clause were encountered after execution returns from this routine." This can be used if you want to allocate sufficient memory for a following parallel section. In our case we want to keep the control on the number of threads especially to avoid hyper-threading effects. In a further version of the library, I think that this value should be defaulted to the number of physical cores of the machines if the user has no clue about the number of cores. But we still need to be able to change this value at runtime if we want. What are your thoughts on this ? Thanks !

  • Agree about omp_get_max_threads but users expect to control the number of threads with the env variable OMP_NUM_THREADS if they know that this is an openmp based application. And so a user might be surprised if the code run using all cores when OMP_NUM_THREADS was set to let say 1. Moreover, if the application is executed with a limited access to the cores (for instance when taskset, numactl, etc. are used) omp_get_max_threads returns the number of available cores (even so it seems to be fragile because it looks like it is not guarantee by the standard from the text you pasted). Of course we could look at the process affinity to find out this number, but then we should make it clear somewhere. I suppose you already know so I just write it for the record, but to control at runtime what omp_get_max_threads returns, one can use omp_set_num_threads to specify a value. Anyway, I would follow your choices at the end :)

    My point was more about the loop and the test to set the kernel, I do not understand them.

    With the following code I am sure that all threads are going to init a kernel:

    #pragma parallel
    {
         kernel[omp_get_thread_num()] = init_kernel();
    }

    You code is expected to do the same:

            #pragma omp parallel for shared(kernels)
            for(int i = 0; i < omp_get_num_threads(); ++i) {
                if(i == omp_get_thread_num()) {
                    this->kernels[i] = new KernelClass(*inKernels);
                }
            }

    But in your case, it seems a more complicated way to do it, is not it? Plus, the test must always be true (or the code does not do what it is implemented for)?

  • I do not want to be too invasive, I am just asking because maybe there is something I do not know such that there is a difference between both codes?

Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment