From 720b3badad6f6eb02b8c5a5894b8a06eead74fa3 Mon Sep 17 00:00:00 2001
From: Chris Reinke <chris.reinke@inria.fr>
Date: Thu, 4 Nov 2021 09:35:53 +0100
Subject: [PATCH] updated readme

---
 readme.md | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/readme.md b/readme.md
index e6eb567..2c352fa 100644
--- a/readme.md
+++ b/readme.md
@@ -1,18 +1,19 @@
 # Xi-Learning
 
-Source code of the Xi-Learning framework and its experimental evaluation for the paper: [Xi-learning: Successor Feature Transfer Learning for General Reward Functions](https://www.arxiv.org).
+Source code of the Xi-Learning framework and its experimental evaluation for the paper: [Xi-learning: Successor Feature Transfer Learning for General Reward Functions](https://arxiv.org/abs/2110.15701).
 
 Authors: [Chris Reinke](https://www.scirei.net/), [Xavier Alameda-Pineda](http://xavirema.eu/)
 
-Copyright: INRIA, 2021
+Copyright: [INRIA](https://www.inria.fr/fr), 2021
 
-License: GNU General Public License v3.0 or later
+License: [GNU General Public License v3.0 or later](https://gitlab.inria.fr/robotlearn/xi_learning/-/blob/master/license.txt)
 
-<!-- ## Introduction
+Blog post with more details about the project: [Xi-Learning](https://team.inria.fr/robotlearn/xi_learning/)
 
-Xi-Learning is a Reinforcement Learning framework for Transfer Learning between tasks that differ in their reward functions.
-It is based on the concept of Successor Features.
-Xi agents learn -->
+
+## Abstract
+
+Transfer in Reinforcement Learning aims to improve learning performance on target tasks using knowledge from experienced source tasks. Successor features (SF) are a prominent transfer mechanism in domains where the reward function changes between tasks. They reevaluate the expected return of previously learned policies in a new target task to transfer their knowledge. A limiting factor of the SF framework is its assumption that rewards linearly decompose into successor features and a reward weight vector. We propose a novel SF mechanism, ξ-learning, based on learning the cumulative discounted probability of successor features. Crucially, ξ-learning allows the reevaluation of the expected return of policies for general reward functions. We introduce two ξ-learning variations, prove the convergence of ξ-learning, and provide a guarantee on its transfer performance. Experimental evaluations based on ξ-learning with function approximation demonstrate the prominent advantage of ξ-learning over available mechanisms, not only for general reward functions, but also in the case of linearly decomposable reward functions.
 
 ## Setup
 
--
GitLab