# eXplainable GP-based RL Policy

A Python implementation of symbolic policies for interpretable reinforcement learning using genetic programming.

## Setup

### Requirements

- numpy
- deap
- qdpy
- pygraphviz (for easier understanding of programs)

### Installing dependencies

- clone this repo
- install with `python -m pip install -r requirement.txt` for the base installation (no pygraphviz)
- install with `python -m pip install -r requirement_with_pygrphivz.txt` if you want to visualize programs easily
- install with `conda env create -f environment.yml` if you want to create a separate Python environment with all the dependencies

## How to use

### Core functions

Core functions and representations are in the GPRL folder.

```
.
|── GPRL
|   |── containers            # Fixes a bug in the qdpy grid 0.1.2.1 (currently the last stable version)
|   |── genetic_programming   # Individual definitions of linear GP and Team for deap
|   |── MCTS                  # Nested Monte-Carlo code
|   |── utils                 # Various utils and callback functions to run experiments easily
|   |── algorithms.py         # deap-like algorithms using the toolbox
|   |── factory.py            # Abstract class to make better use of the toolbox across scripts
|   |── UCB.py                # Subclass of deap's base Fitness to use UCB
└── ...
```

By using DEAP and these functions, we can conduct our experiments. Examples can be found at: <https://github.com/DEAP/deap/tree/master/examples>

### Experiment scripts

Each experiment is available as a separate script using DEAP. More details can be found in the `Readme.md` of the experiments folder.

### Main evolve script

The `evolve.py` script uses `.yml` configuration files to launch experiments. It lets you run QD, Tree GP and Linear GP. Basically, you can run an experiment with this command:

```
python evolve.py --conf /path/to/conf.yml
```

By default, the results are saved in the `results/` folder.

### yaml configuration file

Here is a skeleton for the `conf.yml` file. It shows how an experiment can be set up:

```
algorithm:
  name:        # algorithm name in deap (algorithms.<name>) or algorithm name from GPRL (algo.name)
  args:        # args of the chosen algorithm (lambda_, mu, ngen, ...)

population:
  init_size:   # size of the population (int)

selection:
  name:        # selection method for the evolutionary algorithm, e.g. selTournament (from deap.tools.sel*)
  args:        # arguments for the selection method, e.g. tournsize: 5

individual:    # individual representation ("Tree" or "Linear")

params:
  env:          # env-id of the gym/bullet env, e.g. "MountainCarContinuous-v0"
  function_set: # function set size ("small" or "extended")
  c:            # exploration constant for UCB (float)
  n_episodes:   # number of episodes per evaluation (int)
  n_steps:      # number of steps per evaluation (int)
  gamma:        # discount factor γ (float in [0,1])
  n_thread:     # number of threads to use (int)
  ...           # many others, depending on the individual representation (Tree or Linear); see conf/ for examples

seed:          # set the seed for random
```

## See the results

Once an experiment is finished, you can inspect the results as shown in `tutorial.ipynb`. This notebook shows how to visualize and run an individual from a saved population.
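As a quick illustration, here is a minimal sketch of loading a saved population and picking its best individual. It is not taken from the repository: the file name `results/population.pkl` and the use of pickle are assumptions, and the exact artifacts and loading code produced by a run are shown in `tutorial.ipynb`.

```python
import pickle

# Hypothetical file name: evolve.py writes its outputs under results/.
# Note: unpickling DEAP individuals requires the same creator/toolbox
# classes from the experiment to be defined in the current session.
with open("results/population.pkl", "rb") as f:
    population = pickle.load(f)

# DEAP individuals carry a fitness attribute; pick the best-scoring one.
best = max(population, key=lambda ind: ind.fitness.values[0])
print(best)  # a Tree individual prints as a symbolic expression
```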
## Environments

| **Environment**       | **Name**                             |
|------------------------|--------------------------------------|
| Cartpole               | CartPole-v1                          |
| Acrobot                | Acrobot-v1                           |
| MountainCar            | MountainCarContinuous-v0             |
| Pendulum               | Pendulum-v0                          |
| InvDoublePend          | InvertedDoublePendulumBulletEnv-v0   |
| InvPendSwingUp         | InvertedPendulumSwingupBulletEnv-v0  |
| LunarLander            | LunarLanderContinuous-v2             |
| BipedalWalker          | BipedalWalker-v3                     |
| BipedalWalkerHardCore  | BipedalWalkerHardcore-v2             |
| Hopper                 | HopperBulletEnv-v0                   |
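The names above are standard gym ids (the `*BulletEnv-v0` entries are registered once `pybullet_envs` is imported). As a sanity check independent of any evolved policy, the following sketch instantiates one of these environments and rolls out a random policy; it assumes the older gym API that matches the v0/v2 ids listed above.

```python
import gym
# import pybullet_envs  # uncomment to register the *BulletEnv-v0 environments

env = gym.make("MountainCarContinuous-v0")  # any name from the table above
obs = env.reset()
total_reward = 0.0
for _ in range(200):
    action = env.action_space.sample()  # stand-in for an evolved symbolic policy
    obs, reward, done, info = env.step(action)
    total_reward += reward
    if done:
        break
env.close()
print(total_reward)
```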