From 720b3badad6f6eb02b8c5a5894b8a06eead74fa3 Mon Sep 17 00:00:00 2001
From: Chris Reinke <chris.reinke@inria.fr>
Date: Thu, 4 Nov 2021 09:35:53 +0100
Subject: [PATCH] updated readme

---
 readme.md | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/readme.md b/readme.md
index e6eb567..2c352fa 100644
--- a/readme.md
+++ b/readme.md
@@ -1,18 +1,19 @@
 # Xi-Learning
 
-Source code of the Xi-Learning framework and its experimental evaluation for the paper: [Xi-learning: Successor Feature Transfer Learning for General Reward Functions](https://www.arxiv.org).
+Source code of the Xi-Learning framework and its experimental evaluation for the paper: [Xi-learning: Successor Feature Transfer Learning for General Reward Functions](https://arxiv.org/abs/2110.15701).
 
 Authors: [Chris Reinke](https://www.scirei.net/), [Xavier Alameda-Pineda](http://xavirema.eu/)
 
-Copyright: INRIA, 2021
+Copyright: [INRIA](https://www.inria.fr/fr), 2021
 
-License: GNU General Public License v3.0 or later
+License: [GNU General Public License v3.0 or later](https://gitlab.inria.fr/robotlearn/xi_learning/-/blob/master/license.txt)
 
-<!-- ## Introduction
+Blog post with more details about the project: [Xi-Learning](https://team.inria.fr/robotlearn/xi_learning/)
 
-Xi-Learning is a Reinforcement Learning framework for Transfer Learning between tasks that differ in their reward functions.
-It is based on the concept of Successor Features.
-Xi agents learn -->
+
+## Abstract
+
+Transfer in Reinforcement Learning aims to improve learning performance on target tasks using knowledge from experienced source tasks. Successor features (SF) are a prominent transfer mechanism in domains where the reward function changes between tasks. They reevaluate the expected return of previously learned policies in a new target task to transfer their knowledge. A limiting factor of the SF framework is its assumption that rewards linearly decompose into successor features and a reward weight vector. We propose a novel SF mechanism, ξ-learning, based on learning the cumulative discounted probability of successor features. Crucially, ξ-learning allows the reevaluation of the expected return of policies for general reward functions. We introduce two ξ-learning variations, prove the convergence of ξ-learning, and provide a guarantee on its transfer performance. Experimental evaluations based on ξ-learning with function approximation demonstrate the prominent advantage of ξ-learning over available mechanisms, not only for general reward functions, but also in the case of linearly decomposable reward functions.
 
 ## Setup
 
--
GitLab