# eXplainable GP-based RL Policy

A Python implementation of symbolic policies for interpretable reinforcement learning using genetic programming.

## Setup

### Requirements

- numpy
- deap
- qdpy
- pygraphviz (for easier understanding of programs)

### Installing dependencies

- clone this repo
- install with `python -m pip install -r requirement.txt` for the base installation (no pygraphviz)
- install with `python -m pip install -r requirement_with_pygrphivz.txt` if you want to visualize programs easily
- install with `conda env create -f environment.yml` if you want to create a separate Python environment with all the dependencies

## How to use

### Core functions

Core functions and representations are in the GPRL folder.

```
.
|── GPRL
|   |── containers            # Fixes a bug in the qdpy grid 0.1.2.1 (currently the last stable version)
|   |── genetic_programming   # Individual definitions of linear GP and Team for deap
|   |── MCTS                  # Nested Monte-Carlo code
|   |── utils                 # Various utils and callback functions to run experiments easily
|   |── algorithms.py         # deap-like algorithms using the toolbox
|   |── factory.py            # Abstract class to make better use of the toolbox across scripts
|   |── UCB.py                # Subclass of deap's base Fitness to use UCB
└── ...
```

By using DEAP and these functions, we can conduct our experiments. Examples can be found at: <https://github.com/DEAP/deap/tree/master/examples>

### Experiment scripts

Each experiment is available as a separate script using DEAP. More details can be found in the `Readme.md` of the experiments folder.

### Main evolve script

The `evolve.py` script uses `.yml` configuration files to launch experiments. It lets you run QD, Tree GP and Linear GP. Basically, you can run an experiment with this command:

```
python evolve.py --conf /path/to/conf.yml
```

By default, the results are saved in the `results/` folder.

### yaml configuration file

Here is a skeleton for the `conf.yml` file. It shows how an experiment can be set up:

```
algorithm:
  name:        # algorithm name in deap (algorithms.<name>) or algorithm name from GPRL (algo.name)
  args:        # args of the chosen algorithm (lambda_, mu, ngen, ...)

population:
  init_size:   # size of the population (int)

selection:
  name:        # selection method for the evolutionary algorithm, e.g. selTournament (from deap.tools.sel*)
  args:        # arguments for the selection method, e.g. tournsize: 5

individual:    # individual representation ("Tree" or "Linear")

params:
  env:          # env-id of the gym/bullet env, e.g. "MountainCarContinuous-v0"
  function_set: # function set size ("small" or "extended")
  c:            # exploration constant for UCB (float)
  n_episodes:   # number of episodes per evaluation (int)
  n_steps:      # number of steps per evaluation (int)
  gamma:        # discount factor γ (float in [0,1])
  n_thread:     # number of threads to use (int)
  ...           # many others, depending on the individual representation (Tree or Linear); see conf/ for examples

seed:          # set the seed for random
```

## See the results

Once an experiment is finished, you can inspect the results as shown in `tutorial.ipynb`. This notebook shows how to visualize and run an individual from a saved population.
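As a quick illustration, here is a minimal sketch of loading a saved population and picking its best individual. It is not taken from the repository: the file name `results/population.pkl` and the use of pickle are assumptions, and the exact artifacts and loading code produced by a run are shown in `tutorial.ipynb`.

```python
import pickle

# Hypothetical file name: evolve.py writes its outputs under results/.
# Note: unpickling DEAP individuals requires the same creator/toolbox
# classes from the experiment to be defined in the current session.
with open("results/population.pkl", "rb") as f:
    population = pickle.load(f)

# DEAP individuals carry a fitness attribute; pick the best-scoring one.
best = max(population, key=lambda ind: ind.fitness.values[0])
print(best)  # a Tree individual prints as a symbolic expression
```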
## Environments

| **Environment**       | **Name**                             |
|------------------------|--------------------------------------|
| Cartpole               | CartPole-v1                          |
| Acrobot                | Acrobot-v1                           |
| MountainCar            | MountainCarContinuous-v0             |
| Pendulum               | Pendulum-v0                          |
| InvDoublePend          | InvertedDoublePendulumBulletEnv-v0   |
| InvPendSwingUp         | InvertedPendulumSwingupBulletEnv-v0  |
| LunarLander            | LunarLanderContinuous-v2             |
| BipedalWalker          | BipedalWalker-v3                     |
| BipedalWalkerHardCore  | BipedalWalkerHardcore-v2             |
| Hopper                 | HopperBulletEnv-v0                   |
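The names above are standard gym ids (the `*BulletEnv-v0` entries are registered once `pybullet_envs` is imported). As a sanity check independent of any evolved policy, the following sketch instantiates one of these environments and rolls out a random policy; it assumes the older gym API that matches the v0/v2 ids listed above.

```python
import gym
# import pybullet_envs  # uncomment to register the *BulletEnv-v0 environments

env = gym.make("MountainCarContinuous-v0")  # any name from the table above
obs = env.reset()
total_reward = 0.0
for _ in range(200):
    action = env.action_space.sample()  # stand-in for an evolved symbolic policy
    obs, reward, done, info = env.step(action)
    total_reward += reward
    if done:
        break
env.close()
print(total_reward)
```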