Tensorflow server (!103)

Merged: SCHOULER Marc requested to merge tensorflow-server into develop, Mar 14, 2023
Overview 3 · Commits 74 · Pipelines 55 · Changes 23

This MR introduces a TensorFlow deep learning server.

To-do:

  • tf_server.py
  • config_mpi_tf.json
  • tf_heatpde_dl_server.py
  • tf-plot-result-dl.py
  • dataset.py
  • tensorboard_logger
  • CI test
  • multi-gpu test

Note: unlike torch.DDP, tf.distribute.MultiWorkerMirroredStrategy seems to require several adjustments for multi-GPU-per-node execution to work in the Melissa framework.
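For reference, MultiWorkerMirroredStrategy discovers its cluster layout through the TF_CONFIG environment variable, which must be exported in every server process before the strategy is created. A minimal sketch (host names, port, and rank are made up; in Melissa they would come from the launcher):

```python
import json
import os

# Hypothetical host list for a 2-worker run; in a real job these would
# be provided by the batch scheduler / Melissa launcher.
workers = ["node0:12345", "node1:12345"]
rank = 0  # this process's index in `workers`

# MultiWorkerMirroredStrategy reads its cluster description from
# TF_CONFIG, so it must be set before the strategy is instantiated.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": workers},
    "task": {"type": "worker", "index": rank},
})
```

Only after this export is it safe to call `tf.distribute.MultiWorkerMirroredStrategy()` in each worker.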

With TF, a specific GPU can be selected by setting the environment variable CUDA_VISIBLE_DEVICES=GPU_ID before running the server, as explained in the following threads:

  • Tensorflow set CUDA_VISIBLE_DEVICES within jupyter,
  • How do I select which GPU to run a job on?,
  • How to set specific gpu in tensorflow?.
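That approach can be scripted per process; a minimal sketch, assuming the rank is known before TensorFlow is imported (the helper name is made up, not part of Melissa):

```python
import os

def pin_gpu_via_env(rank: int) -> None:
    # Hypothetical helper: expose only the GPU whose id matches this
    # server process's per-node rank. It must run before
    # `import tensorflow`, since TF enumerates visible GPUs when it
    # first initializes CUDA.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(rank)

pin_gpu_via_env(1)
```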

Another solution would be to use tf.config.set_visible_devices before defining the distributed strategy:

physical_devices = tf.config.list_physical_devices('GPU')
# Restrict this process to the GPU matching its rank (self.rank is the
# server's per-node rank); this must run before the strategy is created.
tf.config.set_visible_devices(physical_devices[self.rank], 'GPU')

This would avoid having to hard-code the CUDA environment variable declaration before executing the server.

Edit: this was successfully tested on Jean-Zay and was therefore implemented.

Additional changes and remarks:

Since Jean-Zay prevents users from loading both torch and tensorflow at the same time, the following modifications were made:

  • The MelissaIterableDataset was subclassed into TorchMelissaIterableDataset and TfMelissaIterableDataset classes,
  • Because of the torch dependency induced by SummaryWriter, new TorchTensorboardLogger and TfTensorboardLogger classes were introduced.
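The split can be pictured with a minimal sketch, assuming hypothetical attribute and method names (the real Melissa classes differ); the key point is that each framework is imported lazily, so a TF server never loads torch and vice versa:

```python
class MelissaIterableDataset:
    """Framework-agnostic base class: only buffering/iteration logic."""

    def __init__(self, buffer):
        self.buffer = buffer  # hypothetical sample buffer

    def _sample_iter(self):
        # Stream raw samples out of the buffer; no torch/TF involved.
        yield from self.buffer


class TorchMelissaIterableDataset(MelissaIterableDataset):
    def __iter__(self):
        import torch  # lazy import: TF-only servers never load torch
        return (torch.as_tensor(s) for s in self._sample_iter())


class TfMelissaIterableDataset(MelissaIterableDataset):
    def as_dataset(self):
        import tensorflow as tf  # lazy import, mirroring the torch case
        return tf.data.Dataset.from_generator(
            self._sample_iter, output_types=tf.float32)
```

The lazy imports are what satisfy the Jean-Zay constraint: only the subclass actually used at runtime pulls in its framework.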

Finally, the TensorboardLogger was initialized before the distribution strategy, which is not allowed:

Important: There is currently a TensorFlow limitation on declaring the strategy; it must be done before any other call to a TensorFlow operation.

It was therefore moved to the end of the setup_environment method of each DL server.
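The resulting ordering can be sketched as follows; the class and factory names are illustrative, not the actual Melissa code, with the constructors injected so the ordering constraint is explicit:

```python
class TfServer:
    # Illustrative sketch of the fixed initialization order described
    # above; the real Melissa server classes differ.
    def __init__(self, make_strategy, make_logger):
        self._make_strategy = make_strategy
        self._make_logger = make_logger

    def setup_environment(self):
        # The distribution strategy must be declared before any other
        # TensorFlow call, so it is created first ...
        self.strategy = self._make_strategy()
        # ... and the Tensorboard logger, whose summary writer issues
        # TF ops, is only created at the very end.
        self.logger = self._make_logger()
```

In the real server, `make_strategy` would be `tf.distribute.MultiWorkerMirroredStrategy` and `make_logger` the TfTensorboardLogger constructor.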

Edited May 04, 2023 by SCHOULER Marc
Source branch: tensorflow-server