[Damaris](https://project.inria.fr/damaris/) is a middleware for I/O and data management targeting large-scale, MPI-based HPC simulations. It initially proposed dedicating cores to asynchronous I/O on the multicore nodes of recent HPC platforms, with an emphasis on ease of integration into existing simulations, efficient resource usage (through shared memory) and simplicity of extension through plugins. Over the years, [Damaris](https://project.inria.fr/damaris/) has evolved into a more elaborate system that can dedicate either cores or whole nodes to data processing and I/O. It offers a seamless connection to the VisIt visualization software to enable in situ visualization with minimal impact on run time. [Damaris](https://project.inria.fr/damaris/) provides an extremely simple API and can be easily integrated into existing large-scale simulations.

## Problem Statement

Most HPC simulations work through a series of iterations, generating a large dataset at each iteration. As a result:

* They trigger a heavy I/O burst at each iteration, which leads to inefficient I/O management and unpredictable variability in execution time (a.k.a. jitter)
* In the usual approach, the datasets are shipped to auxiliary post-processing platforms for analysis and visualization
* This transfer is very costly, and no output is available until the post-processing and visualization phase has completed

## Solution

[Damaris](https://project.inria.fr/damaris/), a middleware for data management targeting large-scale, MPI-based HPC simulations, was designed to address this problem. It provides:

* "In situ" data analysis and visualization on dedicated cores/nodes of the simulation platform, in parallel with the computation
* Asynchronous and fast data transfer from the HPC simulation to Damaris through the Damaris API (see the sketch after this list)
* Semantic-aware processing of simulation datasets by extending Damaris through plug-ins
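
To give a concrete idea of the integration effort, here is a minimal sketch of a time-stepping MPI simulation handing its data to Damaris at every iteration. The calls follow the Damaris C interface; the configuration file name (`damaris.xml`), the variable name (`temperature`) and the array size are placeholders, and exact signatures may vary between Damaris versions.

```cpp
// Minimal sketch of a time-stepping MPI simulation instrumented with Damaris.
// Assumptions: the variable "temperature" and its layout are declared in the
// XML configuration file "damaris.xml"; both names are placeholders.
#include <mpi.h>
#include <vector>
#include <Damaris.h>   // Damaris C interface (header name may differ by version)

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    // Read the XML configuration and split the ranks into clients (simulation)
    // and dedicated cores/nodes (data processing and I/O).
    damaris_initialize("damaris.xml", MPI_COMM_WORLD);

    int is_client = 0;
    damaris_start(&is_client);           // dedicated ranks block here and serve requests

    if (is_client) {
        MPI_Comm comm = MPI_COMM_NULL;
        damaris_client_comm_get(&comm);  // communicator gathering only the simulation ranks

        std::vector<double> temperature(64 * 64, 0.0);
        for (int step = 0; step < 100; ++step) {
            // ... compute one iteration over `comm`, updating `temperature` ...

            // Hand the data to Damaris through shared memory; writing to storage
            // or feeding the in situ pipeline then proceeds asynchronously on the
            // dedicated cores/nodes while the next iteration is computed.
            damaris_write("temperature", temperature.data());
            damaris_end_iteration();
        }
        damaris_stop();                  // release the dedicated ranks
    }

    damaris_finalize();
    MPI_Finalize();
    return 0;
}
```

The same sequence of calls is also available through the Fortran bindings of the API.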

## Benefits

Any HPC simulation can benefit from Damaris for its I/O optimization:

* Data analysis and visualization during the simulation, without any need for external data post-processing
* Effective usage of processing cores, by overlapping data processing and I/O with computation
* No need to transfer huge simulation datasets to an auxiliary post-processing platform; only the processed results are moved
* Easy integration with existing simulation applications through a simple API
* Integration with existing data analysis and visualization tools through plug-ins

## Use cases

Simulation applications that model complex structures, dynamics, phenomena or behaviors, in order to predict them with the highest possible degree of precision, can benefit from [Damaris](https://project.inria.fr/damaris/). Some examples include:

* Computer Aided Engineering
* Geophysics and Oil Applications
* Weather Prediction and Tornado Simulation
* Numerical Analysis
* Aerospace Studies
* Chemical and Pharmaceutical Studies
* Energy Research
* Computational Fluid Dynamics

## Technology

The following technologies have been adopted for the development, benchmarking and validation of [Damaris](https://project.inria.fr/damaris/):

* Development technologies: C++, MPI, Fortran (around 27,000 LOC)
* Supported platforms: from commodity clusters to supercomputers
* Extensibility: through plug-ins (C++, Fortran, shell scripts, Python); see the sketch after this list
* Interface: simple API in C++ and Fortran
* Validated on: Top500-class supercomputers (Titan, Jaguar, Kraken), IBM Blue Gene platforms, the Cray Blue Waters system and the French Grid'5000 testbed
* Simulation codes: tornado simulation (CM1), Ocean-Land-Atmosphere Model (OLAM), Navier-Stokes equations (Nek5000)
* Visualization toolkits: VisIt, ParaView
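
As an illustration of the plug-in mechanism mentioned above, the sketch below shows the general shape of a C++ action that Damaris could load from a shared library and run on the dedicated cores/nodes. The function name and its signature are hypothetical; the exact plug-in interface and the way an action is declared in the XML configuration are defined in the Damaris documentation.

```cpp
// Hypothetical sketch of a Damaris plug-in action written in C++.
// The callback signature shown here is an assumption; the actual interface
// is defined by the Damaris plug-in documentation.
#include <cstdint>
#include <iostream>

extern "C" void analyze_iteration(const char* event,     // event name from the XML configuration
                                  int32_t     source,    // rank that signalled the event
                                  int32_t     iteration, // current simulation iteration
                                  const char* args)      // optional arguments from the configuration
{
    (void)args; // unused in this sketch

    // This code runs on the dedicated cores/nodes, overlapping with the
    // computation: a real plug-in would access the variables exposed by the
    // simulation and reduce, filter or forward them to a visualization backend.
    std::cout << "Processing event '" << event << "' signalled by rank " << source
              << " at iteration " << iteration << std::endl;
}
```

Compiled into a shared library, such an action can be referenced from the XML configuration and triggered from the simulation side, for instance through `damaris_signal`.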

## Comparison

Compared to well-known I/O approaches, [Damaris](https://project.inria.fr/damaris/) shows better performance:

* Jitter-free I/O and predictable write times compared to well-known approaches such as file-per-process and collective I/O (on Grid'5000, Kraken and BluePrint)
* Almost 100% scalability when running the CM1 simulation on Kraken with 10,000 cores
* Aggregate throughput improvements of 6x or more when running CM1 on Kraken and Grid'5000

# Documentation

Here are some preliminary documents for Damaris.

## Environment Preparation

To start building Damaris, you need a Linux environment prepared for the installation. [This document](https://project.inria.fr/damaris/environment-preparation/) helps you prepare your own Linux environment for compiling Damaris. To prepare your environment in a Docker container, take a look at [this document](https://project.inria.fr/damaris/docker-container/) as well.

## Compilation

To compile Damaris, you should first configure the CMake files and then build the project. Take a look at [this document](https://project.inria.fr/damaris/damaris-compilation/) to find out how to build Damaris. If you are going to use the CLion IDE to compile Damaris, take a look at [this guide](https://project.inria.fr/damaris/clion-ide/).

## Running Examples

After building Damaris successfully, you can run some examples. Check [this guide](https://project.inria.fr/damaris/running-examples/) to learn more about the provided examples and how to run them.

## Continuous Integration

To check the integrity of the code base frequently, Damaris benefits from Inria's continuous integration infrastructure for automatic builds, deployments and tests. For more information about this infrastructure and the latest build status, see [this page](https://ci.inria.fr/damaris/).

# Involved People and Organizations

[Damaris](https://project.inria.fr/damaris/) was developed by the KerData team at INRIA Rennes, within the framework of the Joint Laboratory for Extreme Scale Computing (JLESC), a collaboration between INRIA, the University of Illinois at Urbana-Champaign, Argonne National Laboratory and the Barcelona Supercomputing Center, and the Data@Exascale associate team between KerData, ANL and UIUC.