OpenSocInt: A Multi-modal Training Environment for Human-Aware Social Navigation

Framework for training and using a Deep Reinforcement Learning (DRL) Agent for Human-Aware social navigation in a 2D environment.

Examples of scenarios

  • [Far-Left]: Scenario with a randomly positioned static obstacle and a robot with random start and goal positions.
  • [Left-Center]: Scenario with humans and a robot, where everyone has a random start and goal position.
  • [Right-Center]: Scenario with humans and a robot, where everyone is positioned on a circle and has to reach the opposite side.
  • [Far-Right]: Scenario with a randomly positioned static obstacle, humans, and a robot, all with random start and goal positions.

Note: In all these examples, the robot takes random actions.

Examples of trajectories on different scenarios with trained agents

  • [Left] Comparison of two policies on a static obstacle avoidance task. During the training procedure, the start and goal positions are fixed and the obstacles are randomly positioned. We can observe that both policies perform well while taking slightly different trajectories.

  • [Center] Comparison of different policies on our door-walls scenario. In this scenario, four static walls close to the room center stay at the same positions during the whole training procedure. Behind them, four back doors are perpendicular to the four front center walls; each of these doors has a probability of 0.5 of being on one side and 0.5 of being on the other. The start position is always at the center, and the goal position is chosen uniformly among positions behind the four centered walls. This scenario was created to observe the benefit of using an occupancy grid as input data (which contains information beyond concealed objects) compared to raycast input data (which is limited by concealed objects). Here, all agents were trained using raycast input data. We can observe that two of the five policies directly make the right choice of turning right, while the other three turn left and go backward afterwards. We find it interesting that the agent 'understands' it has to go back when encountering a door, even though the data it receives are not sequential.

  • [Right] Example of a trajectory obtained while learning a human-aware behavior. In this scenario, five humans placed in a circle all go straight to the opposite side of the circle. The goal of the agent is to reach the opposite side of the circle while avoiding the humans and respecting their social space. We can observe that the robot manages to avoid humans while maintaining a certain social distance.

Setup

Requirements

Requires Python 3.8.

Required packages (will be installed automatically):

  • torch == 1.13.1
  • numpy == 1.22.4
  • gymnasium == 0.28.1
  • tensorboard == 2.13.0
  • matplotlib == 3.7.1
  • tqdm == 4.65.0
  • scikit-image == 0.21.0
  • Cython == 0.29.35
  • experiment-utilities == 0.3.6
  • opencv-python == 4.8.0.74
  • scikit-fmm == 2023.4.2
  • mpi_sim

Installation of Development Environment

It is recommended to use a conda environment or virtualenv for development. The required packages can be found in requirements.txt. To create the environment and install the packages:

To download the code

git clone git@gitlab.inria.fr:robotlearn/OpenSocInt.git
cd OpenSocInt
bash pull_submodules.sh

To create the conda environment

conda create --name opensocintenv python=3.8
conda activate opensocintenv
conda install pip
pip install -U pip
python3.8 -m pip install -e .
python3.8 -m pip install -e modules/multiparty_interaction_simulator

General

On the one hand, there is our simulator for multi-agent interaction scenarios in a 2D environment, called Multiparty Interaction Simulator. It makes it possible to simulate social environment scenarios with static obstacles and humans. On the other hand, there is our gym-like environment, called multi-party interaction environment (mpi-env), which is an interface between the simulator and the Reinforcement Learning (RL) agent. It lets the RL agent interact with the simulator through the standardized step and reset functions, as sketched below. This framework makes it easy to test the performance of different RL agents.
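
A minimal sketch of the resulting interaction loop, assuming the environment follows the gymnasium step/reset API; the constructor name MpiEnv and its import path are hypothetical and may differ from the actual mpi-env code.

from opensocint.env import MpiEnv  # hypothetical import path

env = MpiEnv()  # hypothetical constructor; the real one may require a configuration
obs, info = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # random actions, as in the demo scenarios above
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()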

Examples

Training

To launch the training of an RL agent, one can run train_policy.py.
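
For example, from the repository root (the exact location of the script may differ):

python train_policy.py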

Testing

To launch the testing of an RL agent, one can run test_policy.py.

Description

Agent

This directory contains all the implemented Reinforcement Learning algorithms.

Autoencoder

This directory contains all the modules necessary for using an autoencoder, which is useful for pre-training the encoder that processes 2D occupancy grid data.
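
As an illustration only (the architecture implemented here may differ), a small convolutional autoencoder for 2D occupancy grids could look like this:

import torch
import torch.nn as nn

class OccupancyGridAutoencoder(nn.Module):
    """Toy convolutional autoencoder for 1-channel occupancy grids."""
    def __init__(self, latent_dim: int = 64, grid_size: int = 64):
        super().__init__()
        # Encoder: occupancy grid -> latent vector
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # grid_size / 2
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # grid_size / 4
            nn.Flatten(),
            nn.Linear(32 * (grid_size // 4) ** 2, latent_dim),
        )
        # Decoder: latent vector -> reconstructed grid with values in [0, 1]
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * (grid_size // 4) ** 2),
            nn.Unflatten(1, (32, grid_size // 4, grid_size // 4)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, grid: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(grid))

# Reconstruction loss on a dummy batch of grids with shape (batch, 1, 64, 64)
model = OccupancyGridAutoencoder()
grids = torch.rand(8, 1, 64, 64)
loss = nn.functional.binary_cross_entropy(model(grids), grids)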

Dataset_maker

This directory contains all the necessary modules for creating a dataset to train the autoencoder.
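
The actual dataset maker extracts grids from the simulator; purely for illustration, a toy dataset of random occupancy grids could be generated as follows:

import numpy as np

def make_random_grid(size=64, n_obstacles=5, rng=None):
    rng = rng or np.random.default_rng()
    grid = np.zeros((size, size), dtype=np.float32)
    for _ in range(n_obstacles):
        h, w = rng.integers(4, 12, size=2)
        r, c = rng.integers(0, size - 12, size=2)
        grid[r:r + h, c:c + w] = 1.0  # mark occupied cells
    return grid

dataset = np.stack([make_random_grid() for _ in range(1000)])
np.save("occupancy_grids.npy", dataset)  # e.g. training data for the autoencoder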

Env

mpi_env

This environment uses the aforementioned Multiparty Interaction Simulator to simulate interactions between an agent and its environment. The simulated environment is a square room with four walls. The room can contain furniture such as round tables, chairs, or benches, and it can also contain humans moving around.

real_env

This environment is almost identical to the former one, but it additionally allows adding an external global map to the simulation; it was created for sim2real applications. By importing an occupancy map of a real room, one can use this environment to simulate 2D interactions in a simulated version of that room.
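
A minimal sketch of importing such an external map, assuming it is stored as a grayscale image; the global_map argument below is a hypothetical parameter name and may not match the actual real_env interface.

import numpy as np
from skimage import io

# Load a grayscale map of the real room; pixel values end up in [0, 1].
occupancy_map = io.imread("real_room_map.png", as_gray=True)
binary_map = (occupancy_map < 0.5).astype(np.float32)  # dark pixels = occupied cells

# The binary grid would then be handed to real_env, e.g. through a constructor
# argument such as global_map=binary_map (hypothetical parameter name).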

Exp

This folder contains the files for training and evaluating the agents.

Networks

You can find under this folder different neural networks adapted to the RL algorithms. Depending on whether the environment has a discrete or continuous action space, the network outputs will differ, as sketched below.
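
For illustration only (the networks in this folder may be organized differently), the output head typically changes with the action space:

import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal

class DiscretePolicyHead(nn.Module):
    """Outputs a categorical distribution over a finite set of actions."""
    def __init__(self, feature_dim: int, n_actions: int):
        super().__init__()
        self.logits = nn.Linear(feature_dim, n_actions)

    def forward(self, features: torch.Tensor) -> Categorical:
        return Categorical(logits=self.logits(features))

class ContinuousPolicyHead(nn.Module):
    """Outputs a Gaussian over continuous actions (e.g. linear/angular velocity)."""
    def __init__(self, feature_dim: int, action_dim: int):
        super().__init__()
        self.mean = nn.Linear(feature_dim, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, features: torch.Tensor) -> Normal:
        return Normal(self.mean(features), self.log_std.exp())

features = torch.randn(4, 128)
discrete_action = DiscretePolicyHead(128, 5)(features).sample()    # shape (4,)
continuous_action = ContinuousPolicyHead(128, 2)(features).sample()  # shape (4, 2)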

PreNavEncoder

This folder contains the multi-modal PreNavEncoder. Depending on what is provided in the observation, this network adapts itself to stay compatible with the available inputs, as sketched below.
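
A simplified sketch of this idea, with hypothetical modality names and input sizes; the actual PreNavEncoder may differ:

import torch
import torch.nn as nn

class MultiModalEncoder(nn.Module):
    def __init__(self, out_dim: int = 128):
        super().__init__()
        # One sub-encoder per supported modality (names and sizes are illustrative).
        self.sub_encoders = nn.ModuleDict({
            "raycast": nn.Sequential(nn.Linear(180, out_dim), nn.ReLU()),
            "occupancy_grid": nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, out_dim), nn.ReLU()),
            "robot_state": nn.Sequential(nn.Linear(6, out_dim), nn.ReLU()),
        })

    def forward(self, observation: dict) -> torch.Tensor:
        # Encode whichever modalities the observation provides and sum the features,
        # so the output size stays the same regardless of the available inputs.
        feats = [enc(observation[k]) for k, enc in self.sub_encoders.items() if k in observation]
        return torch.stack(feats).sum(dim=0)

obs = {"raycast": torch.rand(1, 180), "robot_state": torch.rand(1, 6)}
features = MultiModalEncoder()(obs)  # shape (1, 128), even with the occupancy grid absent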

Plotter

This folder contains functions to test the agents' performance and plot their policies.

Interpretability

One can plot saliency maps from the actor network for a given agent with saliency_map.py.
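
For reference, a vanilla input-gradient saliency map can be computed as sketched below; saliency_map.py may use a different attribution method.

import torch

def saliency(actor: torch.nn.Module, observation: torch.Tensor) -> torch.Tensor:
    obs = observation.clone().requires_grad_(True)
    output = actor(obs)
    # Back-propagate the strongest action score down to the input observation.
    output.max(dim=-1).values.sum().backward()
    return obs.grad.abs()  # high values = inputs the actor is most sensitive to

# Dummy actor and raycast-like observation, for illustration only
actor = torch.nn.Sequential(torch.nn.Linear(180, 64), torch.nn.ReLU(), torch.nn.Linear(64, 5))
saliency_map = saliency(actor, torch.rand(1, 180))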

Policy plot

Within policy_evaluation, one can find several functions to plot trajectories and compare policies.

Visualization

Within visualization, one can find several functions for visualization. For example, one can plot raycast projections or reconstructions of occupancy grids from the autoencoder, as sketched below.
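
As a small illustration (the functions in this folder may render things differently), a raycast scan can be displayed on a polar plot with matplotlib:

import numpy as np
import matplotlib.pyplot as plt

angles = np.linspace(0, 2 * np.pi, 180, endpoint=False)
distances = np.random.uniform(0.5, 5.0, size=180)  # dummy raycast readings (meters)

ax = plt.subplot(projection="polar")
ax.plot(angles, distances, ".")
ax.set_title("Raycast projection around the robot")
plt.show()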

Trained_models

This folder contains some network weights that can be used to play around with the environment.