diff --git a/readme.md b/readme.md
index e6eb567aa08bbb87556cd5bdbbca144d2d8b1ee9..2c352faf2dee778836ef15fe9b2fc7d53997c4d7 100644
--- a/readme.md
+++ b/readme.md
@@ -1,18 +1,19 @@
 # Xi-Learning
 
-Source code of the Xi-Learning framework and its experimental evaluation for the paper: [Xi-learning: Successor Feature Transfer Learning for General Reward Functions](https://www.arxiv.org).
+Source code of the Xi-Learning framework and its experimental evaluation for the paper: [Xi-learning: Successor Feature Transfer Learning for General Reward Functions](https://arxiv.org/abs/2110.15701).
 
 Authors: [Chris Reinke](https://www.scirei.net/), [Xavier Alameda-Pineda](http://xavirema.eu/)
 
-Copyright: INRIA, 2021
+Copyright: [INRIA](https://www.inria.fr/fr), 2021
 
-License: GNU General Public License v3.0 or later
+License: [GNU General Public License v3.0 or later](https://gitlab.inria.fr/robotlearn/xi_learning/-/blob/master/license.txt)
 
-<!-- ## Introduction
+Blog post with more details about the project: [Xi-Learning](https://team.inria.fr/robotlearn/xi_learning/)
 
-Xi-Learning is a Reinforcement Learning framework for Transfer Learning between tasks that differ in their reward functions.
-It is based on the concept of Successor Features.
-Xi agents learn -->
+
+## Abstract
+
+Transfer in Reinforcement Learning aims to improve learning performance on target tasks using knowledge from experienced source tasks. Successor features (SF) are a prominent transfer mechanism in domains where the reward function changes between tasks. They reevaluate the expected return of previously learned policies in a new target task to transfer their knowledge. A limiting factor of the SF framework is its assumption that rewards linearly decompose into successor features and a reward weight vector. We propose a novel SF mechanism, ξ-learning, based on learning the cumulative discounted probability of successor features. Crucially, ξ-learning allows the expected return of policies to be reevaluated for general reward functions. We introduce two ξ-learning variations, prove their convergence, and provide a guarantee on their transfer performance. Experimental evaluations based on ξ-learning with function approximation demonstrate the clear advantage of ξ-learning over existing mechanisms not only for general reward functions but also in the case of linearly decomposable reward functions.
 
 ## Setup
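
For readers skimming the diff, the new abstract's central quantity can be sketched in a couple of lines. This is only an illustrative reading of "cumulative discounted probability of successor features", assuming a discrete feature space Φ and standard RL notation (discount γ, feature observations φ_t, reward function R over features); the paper gives the exact definitions and the continuous-feature case.

```latex
% Illustrative sketch only (assumed notation, not verbatim from the paper):
% the xi-function accumulates the discounted probability that the
% successor features under policy pi take the value \bar{\varphi}
\xi^{\pi}(s, a, \bar{\varphi})
  = \mathbb{E}^{\pi}\!\Big[\,\sum_{t=0}^{\infty} \gamma^{t}\,
      P\big(\varphi_{t} = \bar{\varphi} \mid s_{0} = s,\ a_{0} = a\big)\Big]
% which lets the expected return of pi be reevaluated under a general
% reward function R over features, with no linear-weight assumption:
Q^{\pi}(s, a) = \sum_{\bar{\varphi} \in \Phi} R(\bar{\varphi})\,
                \xi^{\pi}(s, a, \bar{\varphi})
```

Because R enters only through the final sum, a ξ agent can reevaluate its learned policies for an arbitrary new reward function over features, which is the transfer setting the abstract describes.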