# Xi-Learning
Source code of the Xi-Learning framework and its experimental evaluation for the paper: [Xi-learning: Successor Feature Transfer Learning for General Reward Functions](https://arxiv.org/abs/2110.15701).
Authors: [Chris Reinke](https://www.scirei.net/), [Xavier Alameda-Pineda](http://xavirema.eu/)
Copyright: [INRIA](https://www.inria.fr/fr), 2021
License: [GNU General Public License v3.0 or later](https://gitlab.inria.fr/robotlearn/xi_learning/-/blob/master/license.txt)
Blog post with more details about the project: [Xi-Learning](https://team.inria.fr/robotlearn/xi_learning/)

## Abstract
Transfer in Reinforcement Learning aims to improve learning performance on target tasks using knowledge from experienced source tasks. Successor features (SF) are a prominent transfer mechanism in domains where the reward function changes between tasks. They reevaluate the expected return of previously learned policies in a new target task to transfer their knowledge. A limiting factor of the SF framework is its assumption that rewards linearly decompose into successor features and a reward weight vector. We propose a novel SF mechanism, ξ-learning, based on learning the cumulative discounted probability of successor features. Crucially, ξ-learning makes it possible to reevaluate the expected return of policies for general reward functions. We introduce two ξ-learning variations, prove their convergence, and provide a guarantee on their transfer performance. Experimental evaluations based on ξ-learning with function approximation demonstrate its prominent advantage over available mechanisms not only for general reward functions but also in the case of linearly decomposable reward functions.
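The core idea of the abstract — reevaluating a policy's expected return under an arbitrary reward function from the cumulative discounted probabilities of discrete successor-feature values — can be sketched in a few lines. The following is a toy illustration under our own assumptions, not the paper's implementation: the tabular `xi` values, state/action names, and the reward function are made up for the example.

```python
# Toy sketch of ξ-based policy reevaluation (illustrative only).
# xi[(s, a)][phi] stands in for the cumulative discounted probability of
# observing the discrete feature value phi after taking action a in state s
# and then following a fixed policy. The numbers below are hypothetical.
xi = {
    ("s0", "left"):  {0: 1.0, 1: 4.0, 2: 0.5},
    ("s0", "right"): {0: 2.5, 1: 1.0, 2: 2.0},
}

def evaluate(xi_sa, reward_fn):
    """Expected return Q(s, a) = sum over phi of xi(s, a, phi) * R(phi).

    Because R is applied per feature value, this works for any reward
    function over features, not only linearly decomposable ones.
    """
    return sum(p * reward_fn(phi) for phi, p in xi_sa.items())

# A non-linear reward over feature values: only phi == 2 pays off.
reward = lambda phi: float(phi == 2)

for (s, a), xi_sa in xi.items():
    print(s, a, evaluate(xi_sa, reward))  # left -> 0.5, right -> 2.0
```

With a linear reward `R(phi) = w * phi` this reduces to the classical SF evaluation; the point of ξ-learning is that the same table supports the non-linear case above without retraining the policy.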
## Setup