Add group support - Server side
Following the development of the new launcher, a first adaptation of the server was implemented for MELISSA-DL (see nserver.py). To pave the way towards using the new launcher with MELISSA-SA, a first important step consists in converting the simulation-based server into a more generic group-based version.
Development status:
- Adding the group_size option into options.py -> STUDY_OPTIONS['group_size'] = X.
- Creating a group class defined by a group_size, a group_id and a list of simulation objects -> fault_tolerance.py. Note: since the launcher will only monitor the full group, the nb_failure attribute should be attached to the group instead of the simulation object.
- The introduction of such an entity will require more elaborate functions to launch the group-related jobs -> nserver.py.
- Because launching grouped jobs relies on specific submission commands (e.g. OpenMPI MPMD and Slurm heterogeneous jobs), the launcher will at least need to know the group_size argument. This will require a specific communication between the server and the launcher (see issue #37) at the beginning of the study.
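The group entity described in the development status could look roughly like the minimal sketch below. It is not the content of fault_tolerance.py: the Simulation and Group fields other than group_size, group_id, the simulation list and nb_failure, as well as mark_failed, make_groups and max_failures, are hypothetical names chosen for illustration. The point it shows is that the failure counter lives on the group, since the launcher only reports the state of the whole group.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical stand-in for the option added to options.py.
STUDY_OPTIONS: Dict[str, int] = {"group_size": 3}


@dataclass
class Simulation:
    """Minimal simulation record; failures are no longer counted here."""
    simu_id: int
    status: str = "waiting"


@dataclass
class Group:
    """A set of group_size simulations launched and monitored as one job."""
    group_id: int
    group_size: int
    simulations: List[Simulation] = field(default_factory=list)
    # Attached to the group: the launcher only sees the whole group fail.
    nb_failure: int = 0

    def __post_init__(self) -> None:
        if len(self.simulations) != self.group_size:
            raise ValueError("a group must hold exactly group_size simulations")

    def mark_failed(self, max_failures: int) -> bool:
        """Record one group failure; return True if the group may be relaunched."""
        self.nb_failure += 1
        for simu in self.simulations:
            simu.status = "failed"
        return self.nb_failure <= max_failures


def make_groups(nb_simulations: int, group_size: int) -> List[Group]:
    """Split consecutive simulation ids into groups of group_size members."""
    if nb_simulations % group_size != 0:
        raise ValueError("nb_simulations must be a multiple of group_size")
    return [
        Group(
            group_id=g,
            group_size=group_size,
            simulations=[Simulation(g * group_size + i) for i in range(group_size)],
        )
        for g in range(nb_simulations // group_size)
    ]


groups = make_groups(6, STUDY_OPTIONS["group_size"])
```

With this layout, a relaunch decision is taken once per group rather than once per simulation, which matches what the launcher can actually observe.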
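To illustrate why the launcher needs the group_size, here is a sketch that merges the command lines of all group members into a single MPMD submission. It assumes the usual colon-separated syntax of OpenMPI (mpirun -np N a : -np N b) and of Slurm heterogeneous job steps (srun -n N a : -n N b); the function name build_group_command and the per-member command lists are hypothetical, and real submissions will likely need extra scheduler options.

```python
from typing import List, Sequence

# Assumed launcher conventions: OpenMPI MPMD ("mpirun -np N a : -np N b") and
# Slurm heterogeneous job steps ("srun -n N a : -n N b").
_LAUNCHERS = {
    "openmpi": ("mpirun", "-np"),
    "slurm": ("srun", "-n"),
}


def build_group_command(
    scheduler: str, member_cmds: Sequence[Sequence[str]], nprocs: int
) -> List[str]:
    """Return one argv list that starts every member of a group in a single job.

    member_cmds holds one command line per simulation of the group, so its
    length is the group_size the launcher must know about.
    """
    program, np_flag = _LAUNCHERS[scheduler]
    argv = [program]
    for index, member in enumerate(member_cmds):
        if index > 0:
            argv.append(":")  # separates MPMD contexts / heterogeneous components
        argv += [np_flag, str(nprocs), *member]
    return argv
```

The resulting list can be passed directly to subprocess.run; each member then receives its own MPI_APPNUM, which is what the client-side communicator split relies on.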
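The exact server/launcher exchange is the subject of issue #37 and is not defined here. As a placeholder, the sketch below assumes a length-prefixed JSON message sent once at the beginning of the study; the framing, the "study_header" type and both function names are hypothetical and only show that the group_size can travel in a small self-describing header.

```python
import json
import struct
from typing import Any, Dict

# Hypothetical framing: 4-byte big-endian length followed by a UTF-8 JSON body.
_HEADER = struct.Struct(">I")


def encode_study_header(group_size: int) -> bytes:
    """Build the message the server would send to the launcher at study start."""
    body = json.dumps({"type": "study_header", "group_size": group_size}).encode("utf-8")
    return _HEADER.pack(len(body)) + body


def decode_study_header(frame: bytes) -> Dict[str, Any]:
    """Parse a frame produced by encode_study_header on the launcher side."""
    (length,) = _HEADER.unpack_from(frame)
    body = frame[_HEADER.size:_HEADER.size + length]
    payload = json.loads(body.decode("utf-8"))
    if payload.get("type") != "study_header":
        raise ValueError("unexpected message type")
    return payload
```

Using a typed JSON payload leaves room for other study-level fields without changing the framing.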
Side notes:
- This need comes from the Sobol indices computation procedure, which is based on the pick-freeze method: for p parameters, groups of p+2 simulations are launched.
- When correctly dimensioned, such an allocation makes it possible to keep group members on the same node.
- Because group members share the same MPI_COMM_WORLD, the communicator must be initialized in a specific fashion. Here is an example in Fortran:
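! The new MPI communicator is built by splitting MPI_COMM_WORLD by simulation (MPI_APPNUM) inside the group.
! In the case of a single simulation group, this is equivalent to MPI_Comm_dup.
! Assumes 'use mpi' in the enclosing scope.
integer :: me, np, comm, statinfo
integer(kind=MPI_ADDRESS_KIND) :: appnum
logical :: flag
call MPI_Comm_rank(MPI_COMM_WORLD, me, statinfo)
call MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_APPNUM, appnum, flag, statinfo)
if (.not. flag) appnum = 0  ! MPI_APPNUM unset: treat the job as a single simulation
call MPI_Comm_split(MPI_COMM_WORLD, int(appnum), me, comm, statinfo)
call MPI_Comm_rank(comm, me, statinfo)
call MPI_Comm_size(comm, np, statinfo)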
- The addition of this feature will also be an opportunity to clearly distinguish between the client/group and the simulation.
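The p+2 group size follows from the pick-freeze design. The sketch below builds the parameter sets of one group under one common convention: two independent draws A and B, then p hybrids C_k that copy B except for the k-th parameter, taken from A. Whether MELISSA-SA freezes columns from A or from B, and how parameters are sampled, is not stated here; uniform draws on [0, 1) and the name pick_freeze_group are assumptions.

```python
import random
from typing import List


def pick_freeze_group(p: int, rng: random.Random) -> List[List[float]]:
    """Return the p + 2 parameter sets of one pick-freeze group.

    Rows are A, B, then C_1..C_p, where C_k copies B except its k-th
    parameter, which is taken from A. Parameters are assumed uniform on [0, 1).
    """
    a = [rng.random() for _ in range(p)]
    b = [rng.random() for _ in range(p)]
    group = [a, b]
    for k in range(p):
        c_k = list(b)
        c_k[k] = a[k]
        group.append(c_k)
    return group


group = pick_freeze_group(3, random.Random(0))
```

Each row maps to one simulation of the group, so group_size = p + 2 for this kind of study.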
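"Correctly dimensioned" can be made concrete with a bit of arithmetic: a group of p+2 simulations using n MPI processes each needs (p+2)*n cores, so it stays on one node only if that product does not exceed the cores per node. The helpers below are a hypothetical sketch that ignores memory limits, hyperthreading and any cores reserved for the server.

```python
def groups_per_node(cores_per_node: int, procs_per_simulation: int, p: int) -> int:
    """Number of whole p + 2 groups that fit on one node (0 if a group does not fit)."""
    group_cores = (p + 2) * procs_per_simulation
    return cores_per_node // group_cores


def max_procs_per_simulation(cores_per_node: int, p: int) -> int:
    """Largest per-simulation process count that keeps a whole group on one node."""
    return cores_per_node // (p + 2)
```

For instance, with p = 4 and 4 processes per simulation, a group needs 24 cores: it fits once on a 36-core node (leaving 12 cores idle) and twice on a 48-core node.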