|
|
#### Initialisation and training of the GLLiM model
|
|
|
|
|
|
The Gaussian Locally Linear Mapping (GLLiM) model is a parametric statistical model closely related to Gaussian mixtures. It describes the direct and inverse relationships between X and Y through a combination of local affine transformations and is well suited to solving inverse regression problems in a Bayesian framework. We refer the reader to (Kugler et al., 2020; Deleforge et al., 2014) for a definition and detailed description of the model.
|
|
|
|
|
|
The mapping from X to Y, built from the local affine transformations $`\tau_{k}`$, is given by:
|
|
|
|
|
|
```math
|
|
|
Y=\sum_{k=1}^{K} I(Z=k)\left(A_{k} X+b_{k}+E_{k}\right)
|
|
|
```
|
|
|
|
|
|
where $`A_{k} \in R^{D \times L}`$ and $`b_{k} \in R^{D}`$ define the transformation $`\tau_{k}`$, $`Z`$ is a latent variable ($`Z=k`$ if and only if Y is the image of X by the transformation $`\tau_{k}`$), and $`I`$ is the indicator function. $`E_{k} \in R^{D}`$ represents both the noise errors in the observations and the transformation errors.
|
|
|
|
|
|
Under the assumption that $`E_{k}`$ is a centered Gaussian variable with a covariance matrix $`\Sigma_{k} \in R^{D \times D}`$ that does not depend on $`X`$ or $`Y`$, we obtain:
|
|
|
|
|
|
```math
|
|
|
p(Y=y \mid X=x, Z=k ; \theta)=\mathcal{N}\left(y ; A_{k} x+b_{k}, \Sigma_{k}\right)
|
|
|
```
|
|
|
|
|
|
In addition, the variable $`X`$ is assumed to follow a mixture of $`K`$ Gaussian components, which gives:
|
|
|
|
|
|
```math
|
|
|
\begin{gathered}
|
|
|
p(X=x \mid Z=k ; \theta)=\mathcal{N}\left(x ; c_{k}, \Gamma_{k}\right) \\
|
|
|
p(Z=k ; \theta)=\pi_{k}
|
|
|
\end{gathered}
|
|
|
```
|
|
|
|
|
|
where $`c_{k} \in R^{L}`$ are the centers, $`\Gamma_{k} \in R^{L \times L}`$ the covariance matrices, and $`\pi_{k}`$ the component weights, with $`\sum_{k=1}^{K} \pi_{k}=1`$.
|
|
|
|
|
|
The model splits the space $`R^{L}`$ into $`K`$ regions $`R_{k}`$, where $`R_{k}`$ is the region in which the transformation $`\tau_{k}`$ is the most probable. The GLLiM parameter vector is thus:
|
|
|
|
|
|
```math
|
|
|
\theta=\left\{c_{k}, \Gamma_{k}, \pi_{k}, A_{k}, b_{k}, \Sigma_{k}\right\}_{k=1}^{K}
|
|
|
```
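
To make the generative structure of the model concrete, here is a minimal sketch (not taken from the cited papers) that samples pairs $`(x, y)`$ from a GLLiM model; the container `GLLiMParams` and the function `sample_gllim` are hypothetical names introduced for illustration.

```python
import numpy as np
from dataclasses import dataclass

# Hypothetical container for the GLLiM parameter vector theta.
@dataclass
class GLLiMParams:
    pi: np.ndarray      # (K,)      component weights, summing to 1
    c: np.ndarray       # (K, L)    Gaussian centers of X
    Gamma: np.ndarray   # (K, L, L) covariances of X
    A: np.ndarray       # (K, D, L) affine maps
    b: np.ndarray       # (K, D)    affine offsets
    Sigma: np.ndarray   # (K, D, D) noise covariances

def sample_gllim(theta: GLLiMParams, n: int, rng=None):
    """Draw n pairs (x, y) from the GLLiM generative model."""
    rng = rng or np.random.default_rng()
    K, D = theta.pi.shape[0], theta.b.shape[1]
    z = rng.choice(K, size=n, p=theta.pi)                                  # latent component Z
    x = np.stack([rng.multivariate_normal(theta.c[k], theta.Gamma[k]) for k in z])
    noise = np.stack([rng.multivariate_normal(np.zeros(D), theta.Sigma[k]) for k in z])
    y = np.einsum("ndl,nl->nd", theta.A[z], x) + theta.b[z] + noise        # Y = A_Z X + b_Z + E_Z
    return x, y, z
```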
|
|
|
|
|
|
##### Learning step
|
|
|
|
|
|
The expectation-maximization (EM) algorithm, proposed by Dempster et al. (1977), is an iterative algorithm for finding the maximum-likelihood parameters of a probabilistic model. In this section we present an adaptation of the algorithm to estimate the parameter vector $`\theta`$ of the GLLiM model.
|
|
|
|
|
|
The EM algorithm is defined as follows (a minimal code sketch of this loop is given after the list):
|
|
|
|
|
|
- Initialisation of $`\theta^{0}`$ (see next section)
|
|
|
- $`i=0`$
|
|
|
- While the algorithm has not converged to a local maximum of the likelihood:
|
|
|
- E step: compute the expectation of the complete-data log-likelihood, given the observed data and the current parameter estimate $`\theta^{i}`$.
|
|
|
- M step: update the parameters by maximizing the expectation computed in the E step, yielding $`\theta^{i+1}`$.
|
|
|
- $`i=i+1`$
|
|
|
- end
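
As a reminder of the control flow only, a minimal Python skeleton of this loop might look as follows; `e_step`, `m_step` and `log_likelihood` are placeholders for the GLLiM-specific computations detailed below, and the stopping test on the log-likelihood increase is one common convergence criterion among several.

```python
import numpy as np

def em(theta0, x, y, e_step, m_step, log_likelihood, tol=1e-6, max_iter=200):
    """Generic EM loop: alternate E and M steps until the likelihood stops improving."""
    theta, prev_ll = theta0, -np.inf
    for i in range(max_iter):
        r = e_step(theta, x, y)            # E step: posterior responsibilities
        theta = m_step(r, x, y)            # M step: maximise the expected log-likelihood
        ll = log_likelihood(theta, x, y)
        if ll - prev_ll < tol:             # stop near a local maximum of the likelihood
            break
        prev_ll = ll
    return theta
```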
|
|
|
|
|
|
In the case of the GLLiM model, the EM algorithm is described as follows:
|
|
|
|
|
|
The E step is determined by:
|
|
|
|
|
|
```math
|
|
|
r_{n k}:=p\left(Z=k \mid X=x_{n}, Y=y_{n} ; \theta\right)=\frac{\pi_{k} \mathcal{N}\left(y_{n} ; A_{k} x_{n}+b_{k}, \Sigma_{k}\right) \mathcal{N}\left(x_{n} ; c_{k}, \Gamma_{k}\right)}{\sum_{j=1}^{K} \pi_{j} \mathcal{N}\left(y_{n} ; A_{j} x_{n}+b_{j}, \Sigma_{j}\right) \mathcal{N}\left(x_{n} ; c_{j}, \Gamma_{j}\right)}
|
|
|
```
|
|
|
|
|
|
where $`\left\{\left(x_{n}, y_{n}\right)\right\}_{n=1}^{N}`$ is the training database, with $`x_{n} \in R^{L}`$ and $`y_{n} \in R^{D}`$.
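
A direct transcription of this E step could look like the sketch below (reusing the hypothetical `GLLiMParams` container from the earlier sketch). For clarity it works with densities rather than log-densities; a production implementation would use the log-domain for numerical stability.

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(theta, x, y):
    """E step: responsibilities r_nk = p(Z=k | x_n, y_n; theta), array of shape (N, K)."""
    N, K = x.shape[0], theta.pi.shape[0]
    D = y.shape[1]
    num = np.empty((N, K))
    for k in range(K):
        resid = y - (x @ theta.A[k].T + theta.b[k])        # y_n - (A_k x_n + b_k)
        p_y = multivariate_normal.pdf(resid, mean=np.zeros(D), cov=theta.Sigma[k])
        p_x = multivariate_normal.pdf(x, mean=theta.c[k], cov=theta.Gamma[k])
        num[:, k] = theta.pi[k] * p_y * p_x                # pi_k N(y_n; ...) N(x_n; ...)
    return num / num.sum(axis=1, keepdims=True)            # normalise over the K components
```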
|
|
|
|
|
|
Writing $`r_{k}=\sum_{n=1}^{N} r_{n k}`$, the M step is given by:
|
|
|
|
|
|
```math
|
|
|
\begin{aligned}
|
|
|
\pi_{k} &=\frac{r_{k}}{N} \\
|
|
|
c_{k} &=\sum_{n=1}^{N} \frac{r_{n k}}{r_{k}} x_{n} \\
|
|
|
\Gamma_{k} &=\sum_{n=1}^{N} \frac{r_{n k}}{r_{k}}\left(x_{n}-c_{k}\right)\left(x_{n}-c_{k}\right)^{T} \\
|
|
|
A_{k} &=Y_{k} X_{k}^{T}\left(X_{k} X_{k}^{T}\right)^{-1} \\
|
|
|
b_{k} &=\sum_{n=1}^{N} \frac{r_{n k}}{r_{k}}\left(y_{n}-A_{k} x_{n}\right) \\
|
|
|
\Sigma_{k} &=\sum_{n=1}^{N} \frac{r_{n k}}{r_{k}}\left(y_{n}-A_{k} x_{n}-b_{k}\right)\left(y_{n}-A_{k} x_{n}-b_{k}\right)^{T}
|
|
|
\end{aligned}
|
|
|
```
|
|
|
|
|
|
where
|
|
|
|
|
|
```math
|
|
|
\begin{aligned}
|
|
|
X_{k} &=\frac{1}{\sqrt{r_{k}}}\left[\sqrt{r_{1 k}}\left(x_{1}-\overline{x_{k}}\right) \ldots \sqrt{r_{N k}}\left(x_{N}-\overline{x_{k}}\right)\right], \\
|
|
|
Y_{k} &=\frac{1}{\sqrt{r_{k}}}\left[\sqrt{r_{1 k}}\left(y_{1}-\overline{y_{k}}\right) \ldots \sqrt{r_{N k}}\left(y_{N}-\overline{y_{k}}\right)\right], \\
|
|
|
\overline{x_{k}} &=\sum_{n=1}^{N} \frac{r_{n k}}{r_{k}} x_{n}, \\
|
|
|
\overline{y_{k}} &=\sum_{n=1}^{N} \frac{r_{n k}}{r_{k}} y_{n}
|
|
|
\end{aligned}
|
|
|
```
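
These updates translate almost line by line into code. The sketch below follows the formulas above (including the transpose of $`X_{k}`$ in the update of $`A_{k}`$) and again uses the hypothetical `GLLiMParams` container; it is an illustration, not the reference implementation of the cited papers.

```python
import numpy as np

def m_step(r, x, y):
    """M step: re-estimate the GLLiM parameters from responsibilities r of shape (N, K)."""
    N, K = r.shape
    pi = r.sum(axis=0) / N
    c, Gamma, A, b, Sigma = [], [], [], [], []
    for k in range(K):
        rk = r[:, k].sum()
        w = r[:, k] / rk                                              # r_nk / r_k
        c_k = w @ x                                                   # c_k
        y_bar = w @ y                                                 # y_bar_k
        Gamma_k = ((x - c_k) * w[:, None]).T @ (x - c_k)              # Gamma_k
        Xk = (np.sqrt(r[:, k])[:, None] * (x - c_k)).T / np.sqrt(rk)  # L x N
        Yk = (np.sqrt(r[:, k])[:, None] * (y - y_bar)).T / np.sqrt(rk)  # D x N
        A_k = Yk @ Xk.T @ np.linalg.inv(Xk @ Xk.T)                    # A_k = Y_k X_k^T (X_k X_k^T)^{-1}
        b_k = w @ (y - x @ A_k.T)                                     # b_k
        resid = y - x @ A_k.T - b_k
        Sigma_k = (resid * w[:, None]).T @ resid                      # Sigma_k
        c.append(c_k); Gamma.append(Gamma_k); A.append(A_k); b.append(b_k); Sigma.append(Sigma_k)
    return GLLiMParams(pi=pi, c=np.stack(c), Gamma=np.stack(Gamma),
                       A=np.stack(A), b=np.stack(b), Sigma=np.stack(Sigma))
```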
|
|
|
|
|
|
This training algorithm (the "GLLiM-EM method") is recommended when the covariance matrices $`\Gamma_{k}`$ (resp. $`\Sigma_{k}`$) are constrained to be diagonal or isotropic (the same variance imposed for all the variables). It is very sensitive to the initialisation of the parameter vector $`\theta`$. When no such constraint is available, it is therefore advised to carry out the training phase on the joint Gaussian Mixture Model (GMM) equivalent to the GLLiM model (Deleforge et al., 2014), the "GMM-EM method". This equivalence is very convenient in practice because it allows existing EM libraries for GMMs to be reused: the GLLiM model is transformed into a GMM, the GMM is trained with the EM algorithm, and the result is mapped back to the GLLiM model. Note that the GMM-EM method is itself initialised by an iterative k-means algorithm.
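
As an illustration of this GMM-EM route, the sketch below fits a joint GMM on the concatenated pairs $`(x_{n}, y_{n})`$ with scikit-learn's `GaussianMixture` (which is initialised by k-means by default) and recovers GLLiM-style parameters by conditioning each joint Gaussian on $`X`$. This is one plausible way to exploit the equivalence, not the exact procedure of the cited papers; `GLLiMParams` is the hypothetical container used above.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gllim_via_joint_gmm(x, y, K):
    """Fit a GMM on the joint variable (X, Y), then condition on X to get GLLiM parameters."""
    L = x.shape[1]
    gmm = GaussianMixture(n_components=K, covariance_type="full").fit(np.hstack([x, y]))
    pi, m, V = gmm.weights_, gmm.means_, gmm.covariances_
    c, Gamma, A, b, Sigma = [], [], [], [], []
    for k in range(K):
        mx, my = m[k, :L], m[k, L:]                 # joint mean split into X and Y parts
        Vxx, Vxy = V[k, :L, :L], V[k, :L, L:]       # joint covariance blocks
        Vyx, Vyy = V[k, L:, :L], V[k, L:, L:]
        A_k = Vyx @ np.linalg.inv(Vxx)              # A_k from Gaussian conditioning
        c.append(mx)
        Gamma.append(Vxx)
        A.append(A_k)
        b.append(my - A_k @ mx)                     # b_k = m_y - A_k m_x
        Sigma.append(Vyy - A_k @ Vxy)               # Sigma_k = V_yy - V_yx V_xx^{-1} V_xy
    return GLLiMParams(pi=pi, c=np.stack(c), Gamma=np.stack(Gamma),
                       A=np.stack(A), b=np.stack(b), Sigma=np.stack(Sigma))
```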
|
|
|
|
|
|
##### Initialisation step
|
|
|
|
|
|
There are two ways to generate an initial proposal for the vector $`\theta`$:
|
|
|
|
|
|
> "Fixed" Initialisation strategy : we take uniform $`Z`$ (i.e. $`\left.\pi_{k}=\frac{1}{K}\right), c_{k}`$ uniformly distributed in the parameter space, and $`\Gamma_{k }=\sqrt{\frac{1}{k^{1 / L}}} * I_{L}`$. Then we build a mixture model only on $`X`$ parameterized by the means $`c_{k}`$, the covariance matrices $`\Gamma_{k}`$ and the weights $`\pi_{k}`$. The posterior probability law of the mixture makes it possible to construct the initial vector $`\theta^0`$ of the GLLiM using step M of the GLLiM-EM algorithm.
|
|
|
|
|
|
> "Multiple" Initialisation strategy : In this case, we use the idea of fixed initialization but this time we train the mixture model only on $`X`$ before using its posterior probability law. Once the initial vector $`\theta`$ of the GLLiM has been constructed, a few iterations of the GLLiM-EM algorithm are applied. At the end, we save the vector $`\theta^0`$ obtained and the value of the likelihood reached. This initialization is repeated several times, so we obtain a set of possible initial values of the vector $`\theta^0`$, we choose the one that maximizes the likelihood.
|
|
|
|
|
|
|
|
|
|