eXplainable GP-based RL Policy
A Python implementation of symbolic policies for interpretable reinforcement learning using genetic programming.
Setup
Requirements
- numpy
- deap
- qdpy
- pygraphviz (optional, for easier visualization of programs)
Installing dependencies
- clone this repo
- for the base installation (without pygraphviz), install with
python -m pip install -r requirement.txt
- if you want to visualize programs easily, install with
python -m pip install -r requirement_with_pygrphivz.txt
How to use
Core functions
Core functions and representations are in the GPRL folder.
.
├── GPRL
│   ├── containers           # Fixes a bug in qdpy's grid (0.1.2.1, the latest stable version)
│   ├── genetic_programming  # Individual definitions of linear GP and Team for DEAP
│   ├── MCTS                 # Nested Monte-Carlo code
│   ├── utils                # Various utilities and callback functions to run experiments easily
│   ├── algorithms.py        # DEAP-like algorithms using the toolbox
│   ├── factory.py           # Abstract class to make better use of the toolbox across scripts
│   └── UCB.py               # Subclass of DEAP's base Fitness to use UCB
└── ...
By combining DEAP with these functions, we can build our experiments. DEAP examples can be found at:
https://github.com/DEAP/deap/tree/master/examples
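To give an idea of how such an experiment fits together, here is a minimal sketch of a DEAP tree-GP setup for a symbolic policy. It is not the repository's actual code: the primitive set, the `evaluate_policy` helper, the hyperparameters, and the classic-gym API usage are all illustrative assumptions.

```python
# A minimal sketch of a DEAP tree-GP symbolic policy, in the spirit of GPRL.
# All names and hyperparameters below are illustrative assumptions.
import operator

import gym  # assumes the classic gym API (env.reset() returns the observation)
import numpy as np
from deap import algorithms, base, creator, gp, tools

env = gym.make("MountainCarContinuous-v0")
n_inputs = env.observation_space.shape[0]

# A small function set over the observation variables.
pset = gp.PrimitiveSet("MAIN", n_inputs)
pset.addPrimitive(operator.add, 2)
pset.addPrimitive(operator.sub, 2)
pset.addPrimitive(operator.mul, 2)
pset.addEphemeralConstant("const", lambda: np.random.uniform(-1, 1))

creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", gp.PrimitiveTree, fitness=creator.FitnessMax)

toolbox = base.Toolbox()
toolbox.register("expr", gp.genHalfAndHalf, pset=pset, min_=1, max_=3)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.expr)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("compile", gp.compile, pset=pset)

def evaluate_policy(individual, n_steps=200):
    """Roll out the compiled tree as a policy and return the episode return."""
    policy = toolbox.compile(expr=individual)
    obs = env.reset()
    total = 0.0
    for _ in range(n_steps):
        action = np.clip([policy(*obs)],
                         env.action_space.low, env.action_space.high)
        obs, reward, done, _ = env.step(action)
        total += reward
        if done:
            break
    return (total,)

toolbox.register("evaluate", evaluate_policy)
toolbox.register("select", tools.selTournament, tournsize=5)
toolbox.register("mate", gp.cxOnePoint)
toolbox.register("expr_mut", gp.genFull, min_=0, max_=2)
toolbox.register("mutate", gp.mutUniform, expr=toolbox.expr_mut, pset=pset)

pop = toolbox.population(n=100)
pop, log = algorithms.eaMuPlusLambda(pop, toolbox, mu=100, lambda_=100,
                                     cxpb=0.5, mutpb=0.4, ngen=20)
```

The evolved trees can then be printed as symbolic expressions, which is where the interpretability of the policy comes from.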
Experiment scripts
Each experiment is available as a separate script using DEAP. More details can be found in the Readme.md
of the experiments folder.
Main evolve script
The evolve.py
script uses configuration files in .yml
format to launch experiments. It lets you run QD, Tree GP, and Linear GP.
Basically, you can run an experiment with this command:
python evolve.py --conf /path/to/conf.yml
By default, the results are saved in the results/
folder.
YAML configuration files
Here is a skeleton for the conf.yml
file, showing how an experiment can be set up:
algorithm:
  name: # algorithm name from DEAP (algorithms.<name>) or from GPRL (algo.<name>)
  args:
    # args of the chosen algorithm (lambda_, mu, ngen, ...)
population:
  init_size: # size of the population (int)
selection:
  name: # selection method for the evolutionary algorithm, e.g. selTournament (from deap.tools.sel*)
  args:
    # arguments for the selection method, e.g. tournsize: 5
individual: # individual representation ("Tree" or "Linear")
params:
  env: # env id of the gym/bullet env, e.g. "MountainCarContinuous-v0"
  function_set: # function set size ("small" or "extended")
  c: # exploration constant for UCB (float)
  n_episodes: # number of episodes per evaluation (int)
  n_steps: # number of steps per evaluation (int)
  gamma: # discount factor γ (float in [0,1])
  n_thread: # number of threads to use (int)
  # ... (many others depending on the individual representation (Tree or Linear); see conf/ for examples)
seed: # seed for the random number generators
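For concreteness, here is a hypothetical filled-in configuration following the skeleton above. The exact keys and values accepted by evolve.py are defined by the files in conf/, so treat this as an illustration, not a reference:

```yaml
algorithm:
  name: eaMuPlusLambda   # illustrative choice of algorithm
  args:
    mu: 100
    lambda_: 100
    ngen: 50
population:
  init_size: 100
selection:
  name: selTournament
  args:
    tournsize: 5
individual: Tree
params:
  env: "MountainCarContinuous-v0"
  function_set: small
  c: 1.414
  n_episodes: 1
  n_steps: 200
  gamma: 0.99
  n_thread: 8
seed: 42
```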
See the results
Once an experiment is finished, you can inspect the results as shown in tutorial.ipynb
. This notebook shows how to examine and run an individual from a saved population.
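As a rough illustration of this workflow, a saved population can be reloaded and its best individual printed and re-compiled. The file name below is a hypothetical example, and the serialization details depend on your experiment script; see tutorial.ipynb for the repository's actual workflow.

```python
# A minimal sketch of inspecting a saved run, assuming the population was
# pickled by the experiment. Unpickling DEAP individuals requires the
# creator classes (and primitive set) from the training setup, e.g. the
# sketch in the "Core functions" section above, to be defined first.
import pickle

from deap import gp

with open("results/population.pkl", "rb") as f:  # hypothetical file name
    population = pickle.load(f)

# Pick the best individual and print its symbolic expression.
best = max(population, key=lambda ind: ind.fitness.values[0])
print(str(best))

# Re-compile it into a callable policy, reusing the pset from training.
policy = gp.compile(expr=best, pset=pset)
```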