[heterogeneous data] add exercises in mcvae: 1) check alpha parameter; 2) same...

[heterogeneous data] add exercises in mcvae: 1) check alpha parameter; 2) same exercise as the one in PLS section

[heterogeneous data] add exercises in mcvae: 1) check alpha parameter; 2) same...
[heterogeneous data] add exercises in mcvae: 1) check alpha parameter; 2) same exercise as the one in PLS section
3053dab4 · Luigi Antelmi · 46b94cb6 · 3053dab4
Commit 3053dab4 authored 4 years ago by Luigi Antelmi
--- a/heterogeneous_data/multivariate_models_v1_7.ipynb
+++ b/heterogeneous_data/multivariate_models_v1_7.ipynb
@@ -30,14 +30,12 @@
    "#!pip install sklearn\n",
    "#!pip install matplotlib\n",
    "#!pip install numpy\n",
-    "#!pip install torch torchvision\n",
-    " \n",
-    "!git clone https://gitlab.inria.fr/epione_ML/mcvae.git"
+    "#!pip install torch torchvision"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 1,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
@@ -50,16 +48,25 @@
     "name": "stdout",
     "output_type": "stream",
     "text": [
-      "2.0.0\n"
+      "Cloning into 'mcvae'...\n",
+      "remote: Enumerating objects: 359, done.\u001b[K\n",
+      "remote: Counting objects: 100% (359/359), done.\u001b[K\n",
+      "remote: Compressing objects: 100% (224/224), done.\u001b[K\n",
+      "remote: Total 359 (delta 160), reused 296 (delta 114), pack-reused 0\u001b[K\n",
+      "Receiving objects: 100% (359/359), 4.72 MiB | 4.27 MiB/s, done.\n",
+      "Resolving deltas: 100% (160/160), done.\n",
+      "Checking connectivity... done.\n",
+      "Mcvae version:2.0.0\n"
     ]
    }
   ],
   "source": [
+    "!git clone https://gitlab.inria.fr/epione_ML/mcvae.git\n",
    "import sys\n",
    "import os\n",
    "sys.path.append(os.getcwd() + '/mcvae/src/')\n",
    "import mcvae\n",
-    "print(mcvae.__version__)"
+    "print('Mcvae version:' + mcvae.__version__)"
   ]
  },
  {
@@ -1653,16 +1660,28 @@
    "\n",
    "They are Gaussians with moments parametrized by Neural Networks (or a linear transformation layer in a simple case).\n",
    "\n",
-    "__Exercise__: why is convenient to use $\\log$ values for the parametrization the variance networks output (W_logvar, W_out_logvar)? \n",
-    "\n",
    "<img src=\"https://gitlab.inria.fr/epione/flhd/-/raw/master/heterogeneous_data/img/vae.svg\" alt=\"img/vae.svg\">\n",
    "\n",
+    "### __Exercise__\n",
+    "Why is convenient to use $\\log$ values for the parametrization the variance networks output (W_logvar, W_out_logvar)?\n",
+    "\n",
    "## Sparse VAE\n",
    "To favour sparsity in the latent space we implement the latent distribution as follows:\n",
    "\n",
-    "$$q(\\mathbf{z|x}) = \\mathcal{N}(\\mathbf{z|\\mu_x; \\alpha \\odot \\mu_x^2})$$\n",
+    "$$q(\\mathbf{z|x}) = \\mathcal{N}(\\mathbf{z|\\mu_x; \\alpha \\odot \\mu_x^2}),$$\n",
+    "\n",
+    "where:\n",
    "\n",
-    "Tha parameter $\\alpha$ represents to the [odds](https://en.wikipedia.org/wiki/Odds) of pruning the $i$-th latent dimension according to (element-wise):\n",
+    "$$\n",
+    "\\mathbf{\\alpha \\odot \\mu_x^2} = \n",
+    "\\begin{bmatrix}\n",
+    "\\ddots &        & 0                        \\\\\n",
+    "       & \\alpha_i [\\mathbf{\\mu_x}]_i^2 &   \\\\\n",
+    "0      &        & \\ddots                   \n",
+    "\\end{bmatrix}.\n",
+    "$$\n",
+    "\n",
+    "Tha parameter $\\alpha_i$ represents the [odds](https://en.wikipedia.org/wiki/Odds) of pruning the $i$-th latent dimension according to:\n",
    "\n",
    "$$\\alpha_i = \\frac{p_i}{1 - p_i}$$\n",
    "\n",
@@ -1673,7 +1692,8 @@
    "\n",
    "<img src=\"https://gitlab.inria.fr/epione/flhd/-/raw/master/heterogeneous_data/img/mcvae.svg\" alt=\"img/mcvae.svg\">\n",
    "\n",
-    "__Excercise__: sketch the Sparse MCVAE with 3 channels.[[solution]](https://gitlab.inria.fr/epione/flhd/-/raw/master/heterogeneous_data/img/sparse_mcvae_3ch.svg)"
+    "### __Exercise__\n",
+    "Sketch the Sparse MCVAE with 3 channels. [[Solution]](https://gitlab.inria.fr/epione/flhd/-/raw/master/heterogeneous_data/img/sparse_mcvae_3ch.svg)"
   ]
  },
  {
@@ -2164,11 +2184,15 @@
    "\n",
    "model_sparse1 = Mcvae(sparse=True, **init_dict)\n",
    "model_sparse1.to(DEVICE)\n",
-    "print(model_sparse1)\n",
-    "\n",
-    "print('Is the log_alpha parameter the same for every channel?')\n",
-    "log_alpha = model_sparse1.vae[0].log_alpha\n",
-    "print(np.all([log_alpha is vae.log_alpha for vae in model_sparse1]))"
+    "print(model_sparse1)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "__Exercise.__ Check if the sparsity parameter log_alpha is the same for every VAE in the MCvae model.\n",
+    "- Hint: you can acces the model modules and parameters with the dot notation (model.submodel.parameter)"
   ]
  },
  {
@@ -2902,6 +2926,25 @@
    "plt.show()"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "__Exercise.__ Compare Mcvae, Sparse Mcvae, and PLS\n",
+    "- Reuse the observations X1, X2, and X3 created earlier.\n",
+    "- Fit a Mcvae model to predict X3 from X1.\n",
+    "- Fit a Mcvae Sparse model to predict X3 from X1.\n",
+    "- Fit a PLS model to predict X3 from X1.\n",
+    "- Compare the predictions for all the three methods."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
  {
   "cell_type": "markdown",
   "metadata": {
@@ -4025,7 +4068,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.6.12"
+   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 %% Cell type:code id: tags:
  
 ``` python
 ## Uncomment the following lines to set up the python environment
 ## Probably not needed depending on where you are running this notebook
  
 #!pip install torch
 #!pip install nilearn
 #!pip install nibabel
 #!pip install pandas
 #!pip install sklearn
 #!pip install matplotlib
 #!pip install numpy
 #!pip install torch torchvision
-
-!git clone https://gitlab.inria.fr/epione_ML/mcvae.git
 ```
  
 %% Output
  
    fatal: destination path 'mcvae' already exists and is not an empty directory.
  
 %% Cell type:code id: tags:
  
 ``` python
+!git clone https://gitlab.inria.fr/epione_ML/mcvae.git
 import sys
 import os
 sys.path.append(os.getcwd() + '/mcvae/src/')
 import mcvae
-print(mcvae.__version__)
+print('Mcvae version:' + mcvae.__version__)
 ```
  
 %% Output
  
-    2.0.0
+    Cloning into 'mcvae'...
+    remote: Enumerating objects: 359, done.[K
+    remote: Counting objects: 100% (359/359), done.[K
+    remote: Compressing objects: 100% (224/224), done.[K
+    remote: Total 359 (delta 160), reused 296 (delta 114), pack-reused 0[K
+    Receiving objects: 100% (359/359), 4.72 MiB | 4.27 MiB/s, done.
+    Resolving deltas: 100% (160/160), done.
+    Checking connectivity... done.
+    Mcvae version:2.0.0
  
 %% Cell type:code id: tags:
  
 ``` python
 import numpy as np
 import matplotlib.pyplot as plt
 %matplotlib inline
 from sklearn.cross_decomposition import PLSCanonical, CCA
 ```
  
 %% Cell type:markdown id: tags:
  
 # Random data generation
  
 %% Cell type:markdown id: tags:
  
 Our journey in the analysis of heterogeneous data starts with the generation of synthetic data. We first need to generate multivarite correlated random variables X and Y. To do so, we rely on the generative model we have seen during lesson:
  
 $$ z\sim\mathcal{N}(0,1),$$
 $$ X = z w_x,$$
 $$ Y = z w_y.$$
  
 %% Cell type:code id: tags:
  
 ``` python
 # #############################################################################
  
 # N subjects
 n = 500
 # here we define 2 Gaussian latents variables z = (l_1, l_2)
 np.random.seed(42)
 l1 = np.random.normal(size=n)
 l2 = np.random.normal(size=n)
  
 latents = np.array([l1, l2]).T
  
 # We define two random transformations from the latent space to the space of X and Y respectively
 transform_x = np.random.randint(-8,8, size = 10).reshape([2,5])
 transform_y = np.random.randint(-8,8, size = 10).reshape([2,5])
  
 # We compute data X = z w_x, and Y = z w_y
 X = latents.dot(transform_x)
 Y = latents.dot(transform_y)
  
 # We add some random Gaussian noise
 X = X + 2*np.random.normal(size = n*5).reshape((n, 5))
 Y = Y + 2*np.random.normal(size = n*5).reshape((n, 5))
  
  
 print('The latent space has dimension ' + str(latents.shape))
 print('The transformation for X has dimension ' + str(transform_x.shape))
 print('The transformation for Y has dimension ' + str(transform_y.shape))
  
 print('X has dimension ' + str(X.shape))
 print('Y has dimension ' + str(Y.shape))
 ```
  
 %% Output
  
    The latent space has dimension (500, 2)
    The transformation for X has dimension (2, 5)
    The transformation for Y has dimension (2, 5)
    X has dimension (500, 5)
    Y has dimension (500, 5)
  
 %% Cell type:code id: tags:
  
 ``` python
 dimension_to_plot = 0
  
 plt.scatter(X[:,1], Y[:,2])
 plt.xlabel('dimension X' + str(dimension_to_plot))
 plt.ylabel('dimension Y' + str(dimension_to_plot))
 plt.title('Generated data')
 plt.plot()
 ```
  
 %% Output
  
    []
  

  
 %% Cell type:markdown id: tags:
  
 ## PLS and scikit-learn: basic use
  
  
 Our newly generated data can be already used to test the PLS and CCA provided by standard machie learning packages, such as scikit-learn.
  
 %% Cell type:code id: tags:
  
 ``` python
 ##########################################################
 # We first split the data in trainig and validation sets
  
 # The training set is composed by a random sample of dimension N/2
 train_idx = np.random.choice(range(X.shape[0]), size = int(X.shape[0]/2), replace = False)
 X_train = X[train_idx, :]
  
 # The testing set is composed by the remaining subjects
 test_idx = np.where(np.in1d(range(X.shape[0]), train_idx, assume_unique =True, invert = True))[0]
 X_test = X[test_idx, :]
  
 # We reuse the same indices to split the data Y
 Y_train = Y[train_idx, :]
 Y_test = Y[test_idx, :]
 ```
  
 %% Cell type:code id: tags:
  
 ``` python
 #######################################
 # We fit PLS as provided by scikit-learn
  
 #Defining PLS object
 plsca = PLSCanonical(n_components=2)
  
 #Fitting on train data
 plsca.fit(X_train, Y_train)
  
 #We project the training data in the latent dimension
 X_train_r, Y_train_r = plsca.transform(X_train, Y_train)
 #We project the testing data in the latent dimension
 X_test_r, Y_test_r = plsca.transform(X_test, Y_test)
 ```
  
 %% Cell type:markdown id: tags:
  
 We note that the projections in the latent space retrieved by PLS are indeed correlated. The different dimensions of the projections are however uncorrelated.
  
 %% Cell type:code id: tags:
  
 ``` python
 # Scatter plot of scores
 # ~~~~~~~~~~~~~~~~~~~~~~
 # 1) On diagonal plot X vs Y scores on each components
 plt.figure(figsize=(12, 8))
 plt.subplot(221)
 plt.scatter(X_train_r[:, 0], Y_train_r[:, 0], label="train",
            marker="o", c="b", s=25)
 plt.scatter(X_test_r[:, 0], Y_test_r[:, 0], label="test",
            marker="o", c="r", s=25)
 plt.xlabel("x scores")
 plt.ylabel("y scores")
 plt.title('Comp. 1: X vs Y (test corr = %.2f)' %
          np.corrcoef(X_test_r[:, 0], Y_test_r[:, 0])[0, 1])
  
 plt.legend(loc="best")
  
 plt.subplot(224)
 plt.scatter(X_train_r[:, 1], Y_train_r[:, 1], label="train",
            marker="o", c="b", s=25)
 plt.scatter(X_test_r[:, 1], Y_test_r[:, 1], label="test",
            marker="o", c="r", s=25)
 plt.xlabel("x scores")
 plt.ylabel("y scores")
 plt.title('Comp. 2: X vs Y (test corr = %.2f)' %
          np.corrcoef(X_test_r[:, 1], Y_test_r[:, 1])[0, 1])
  
 plt.legend(loc="best")
  
 # 2) Off diagonal plot components 1 vs 2 for X and Y
 plt.subplot(222)
 plt.scatter(X_train_r[:, 0], X_train_r[:, 1], label="train",
            marker="*", c="b", s=50)
 plt.scatter(X_test_r[:, 0], X_test_r[:, 1], label="test",
            marker="*", c="r", s=50)
 plt.xlabel("X comp. 1")
 plt.ylabel("X comp. 2")
 plt.title('X comp. 1 vs X comp. 2 (test corr = %.2f)'
          % np.corrcoef(X_test_r[:, 0], X_test_r[:, 1])[0, 1])
 plt.legend(loc="best")
  
  
  
 plt.subplot(223)
 plt.scatter(Y_train_r[:, 0], Y_train_r[:, 1], label="train",
            marker="*", c="b", s=50)
 plt.scatter(Y_test_r[:, 0], Y_test_r[:, 1], label="test",
            marker="*", c="r", s=50)
 plt.xlabel("Y comp. 1")
 plt.ylabel("Y comp. 2")
 plt.title('Y comp. 1 vs Y comp. 2 , (test corr = %.2f)'
          % np.corrcoef(Y_test_r[:, 0], Y_test_r[:, 1])[0, 1])
 plt.legend(loc="best")
  
 plt.show()
 ```
  
 %% Output
  

  
 %% Cell type:code id: tags:
  
 ``` python
 #We can check the estimated projections
 print('X projections: \n' + str(plsca.x_weights_))
  
 print('Y projections: \n' + str(plsca.y_weights_))
 ```
  
 %% Output
  
    X projections:
    [[ 0.48433579 -0.25654367]
     [-0.51415729  0.28837788]
     [ 0.57083361  0.41523051]
     [-0.1600194   0.72474008]
     [-0.38678665 -0.39161076]]
    Y projections:
    [[ 0.36148438  0.39913339]
     [-0.39571466  0.70730378]
     [ 0.49905721 -0.35689321]
     [-0.48904463 -0.31141709]
     [-0.4738314  -0.34067659]]
  
 %% Cell type:code id: tags:
  
 ``` python
 # Prediction for the training data
  
 predicted_Y_train = plsca.predict(X_train)
  
 plt.scatter(predicted_Y_train[:,dimension_to_plot], Y_train[:,dimension_to_plot])
 plt.ylabel('target Y' + str(dimension_to_plot))
 plt.xlabel('predicted Y' + str(dimension_to_plot))
  
 plt.show()
 ```
  
 %% Output
  

  
 %% Cell type:code id: tags:
  
 ``` python
 plsca_4 = PLSCanonical(n_components=4)
 plsca_4.fit(X_train, Y_train)
  
 predicted_Y_train = plsca_4.predict(X_train)
  
 print('X projections with 2 components: \n' + str(plsca.x_weights_))
  
 print('Y projections with 2 components: \n' + str(plsca.y_weights_))
  
 print('X projections with 4 components: \n' + str(plsca_4.x_weights_))
  
 print('Y projections with 4 components: \n' + str(plsca_4.y_weights_))
 ```
  
 %% Output
  
    X projections with 2 components:
    [[ 0.48433579 -0.25654367]
     [-0.51415729  0.28837788]
     [ 0.57083361  0.41523051]
     [-0.1600194   0.72474008]
     [-0.38678665 -0.39161076]]
    Y projections with 2 components:
    [[ 0.36148438  0.39913339]
     [-0.39571466  0.70730378]
     [ 0.49905721 -0.35689321]
     [-0.48904463 -0.31141709]
     [-0.4738314  -0.34067659]]
    X projections with 4 components:
    [[ 0.48433579 -0.25654367 -0.42402831  0.54660349]
     [-0.51415729  0.28837788  0.30201819  0.74306133]
     [ 0.57083361  0.41523051  0.64183921 -0.06465157]
     [-0.1600194   0.72474008 -0.51490022 -0.22931801]
     [-0.38678665 -0.39161076  0.22782708 -0.30383862]]
    Y projections with 4 components:
    [[ 0.36148438  0.39913339 -0.5105896   0.60783454]
     [-0.39571466  0.70730378  0.48505202  0.26714451]
     [ 0.49905721 -0.35689321  0.58283628  0.50546537]
     [-0.48904463 -0.31141709  0.17782666  0.30291833]
     [-0.4738314  -0.34067659 -0.36428334  0.46034359]]
  
 %% Cell type:code id: tags:
  
 ``` python
 # We can also predict Y from X
  
 predicted_Y_test = plsca.predict(X_test)
  
 plt.scatter(predicted_Y_test[:,dimension_to_plot], Y_test[:,dimension_to_plot])
 plt.ylabel('target Y' + str(dimension_to_plot))
 plt.xlabel('predicted Y' + str(dimension_to_plot))
  
 plt.show()
 ```
  
 %% Output
  

  
 %% Cell type:markdown id: tags:
  
 **Exercise.** Generate a synthetic dataset of 200 observations.
 - Each observation is composed by 3 modalities X1, X2, and X3, of dimensions respectively of 4, 7, and 9.
 - Chose the appropriate latent variable representation for the generative model.
 - These modalities are corrupted with Gaussian noise, with different noise variance for each modality.
 - Fit a PLS model to predict X3 from X1.
  
 %% Cell type:code id: tags:
  
 ``` python
 ```
  
 %% Cell type:markdown id: tags:
  
 # Into the guts of latent variable models
  
 After playing with the builti-in implementation of PLS in scikit-learn, we are going to implement our own version based on the NIPALS method.
  
 ## NIPALS for PLS
  
 %% Cell type:code id: tags:
  
 ``` python
 # Nipals method for PLS
  
 n_components = 3
  
  
 # Defining empty arrays where to store results
  
 # Reconstruction from latent space to data
 loading_x = np.ndarray([X.shape[1],n_components])
 loading_y = np.ndarray([Y.shape[1],n_components])
  
 # Projections into the latent space
 weight_x = np.ndarray([X.shape[1],n_components])
 weight_y = np.ndarray([Y.shape[1],n_components])
  
 # Latent variables
 scores_x = np.ndarray([X.shape[0],n_components])
 scores_y = np.ndarray([Y.shape[0],n_components])
  
  
 # Initialization of data matrices
 current_X = X
 current_Y = Y
  
 for i in range(n_components):
    # Initialization of current latent variables as a data column
    t_x = current_X[:,0]
  
    # NIPALS iterations
    for _ in range(100):
        # estimating Y weights given data Y and latent variables from X
        w_y = current_Y.transpose().dot(t_x)/(t_x.transpose().dot(t_x))
        # normalizing Y weights
        w_y = w_y/np.sqrt(np.sum(w_y**2))
  
        # estimating latent variables from Y given data Y and Y weights
        t_y = current_Y.dot(w_y)
        # estimating X weights given data X and latent variables from Y
        w_x = current_X.transpose().dot(t_y)/(t_y.transpose().dot(t_y))
        # normalizing X weights
        w_x = w_x/np.sqrt(np.sum(w_x**2))
  
        # estimating latent variables from X given data X and X weights
        t_x = current_X.dot(w_x)
  
    # Weights are such that X * weights = t
    weight_x[:,i] = w_x
    weight_y[:,i] = w_y
  
    # Latent variables
    scores_x[:,i] = t_x
    scores_x[:,i] = t_y
  
    # Loadings obtained by regressing X on t (X = t * loadings)
  
    loading_x[:,i] = np.dot(current_X.T, t_x)/t_x.transpose().dot(t_x)
    loading_y[:,i] = np.dot(current_Y.T, t_y)/t_y.transpose().dot(t_y)
  
    # Deflation = current_data - current_reconstruction
  
    current_X = current_X - t_x.reshape(len(t_x),1).dot(w_x.reshape(1,len(w_x)))
    current_Y = current_Y - t_y.reshape(len(t_y),1).dot(w_y.reshape(1,len(w_y)))
  
 ```
  
 %% Cell type:code id: tags:
  
 ``` python
 print('The estimated projections for X are: \n' + str(weight_x))
  
 print('\n The estimated projections for Y are: \n' + str(weight_y))
 ```
  
 %% Output
  
    The estimated projections for X are:
    [[ 0.36753226  0.32695545  0.31901594]
     [-0.54017132 -0.51142438 -0.13635813]
     [ 0.73596167 -0.54668695 -0.38095079]
     [-0.07131382 -0.54068145  0.43213359]
     [-0.16251074  0.20085364 -0.74011644]]
    
     The estimated projections for Y are:
    [[ 0.22469208 -0.22321827  0.22848154]
     [-0.42360052 -0.82303817 -0.17028297]
     [ 0.34143196  0.23277327 -0.37414229]
     [-0.53070486  0.30658588 -0.6691619 ]
     [-0.60979721  0.35299218  0.57536058]]
  
 %% Cell type:code id: tags:
  
 ``` python
 plt.figure(figsize=(12, 12))
 plt.subplot(3,2,1)
 plt.scatter(scores_x[:,0], latents[:,0])
 plt.xlabel('PLS 0')
 plt.ylabel('ground truth 0')
 plt.subplot(3,2,2)
 plt.scatter(scores_x[:,0], latents[:,1])
 plt.xlabel('PLS 0')
 plt.ylabel('ground truth 1')
 plt.subplot(3,2,3)
 plt.scatter(scores_x[:,1], latents[:,0])
 plt.xlabel('PLS 1')
 plt.ylabel('ground truth 0')
 plt.subplot(3,2,4)
 plt.scatter(scores_x[:,1], latents[:,1])
 plt.xlabel('PLS 1')
 plt.ylabel('ground truth 1')
 plt.subplot(3,2,5)
 plt.scatter(scores_x[:,2], latents[:,0])
 plt.xlabel('PLS 2')
 plt.ylabel('ground truth 0')
 plt.subplot(3,2,6)
 plt.scatter(scores_x[:,2], latents[:,1])
 plt.xlabel('PLS 2')
 plt.ylabel('ground truth 1')
 plt.show()
 ```
  
 %% Output
  

  
 %% Cell type:markdown id: tags:
  
 Once that the PLS parameters are estimated, we can solve the regression problem for predicting Y from X. We adopt the scheme used in scikit-learn, where a rotation matrix is first estimated to accoung for non-cummutativity between projection (weights) and reconstruction (loadings).
  
  
 %% Cell type:code id: tags:
  
 ``` python
 # Identifying rotation from X to t
 # t_x * loadings_x = X
 # t_x * loadings_x.T * weight = X * weight
 # t_x =  X * weight * (loadings_x.T * weight)^-1 = X * rotations_x
  
 rotations_x = weight_x.dot(np.linalg.pinv(loading_x.T.dot(weight_x)))
  
 # Solving the regression from X to Y
 # Y = t_x * loadings_y.T
 # Y = X * rotations_x * loadings_y.T
  
 regression_coef = np.dot(rotations_x, loading_y.T)
 ```
  
 %% Cell type:code id: tags:
  
 ``` python
 plt.figure(figsize=(12, 18))
 for i in range(Y.shape[1]):
  plt.subplot(Y.shape[1], 1, i+1)
  plt.scatter(X.dot(regression_coef)[:,i], Y[:,i])
  plt.xlabel('predicted dimension ' + str(i))
  plt.ylabel('target dimension ' + str(i))
 plt.show()
 ```
  
 %% Output
  

  
 %% Cell type:code id: tags:
  
 ``` python
 # Comparing with SVD of covariance matrix
  
 eig_val_x, eig_vect, eig_val_y = np.linalg.svd(X.transpose().dot(Y))
 ```
  
 %% Cell type:code id: tags:
  
 ``` python
 print('Eigenvalues for X \n' + str(np.real(eig_val_x[:,:3])))
 print('Estimated weights for X\n' + str(np.real(weight_x[:,:3])))
 ```
  
 %% Output
  
    Eigenvalues for X
    [[-0.36753226  0.32695545  0.31901594]
     [ 0.54017132 -0.51142438 -0.13635813]
     [-0.73596167 -0.54668695 -0.38095079]
     [ 0.07131382 -0.54068145  0.43213359]
     [ 0.16251074  0.20085364 -0.74011644]]
    Estimated weights for X
    [[ 0.36753226  0.32695545  0.31901594]
     [-0.54017132 -0.51142438 -0.13635813]
     [ 0.73596167 -0.54668695 -0.38095079]
     [-0.07131382 -0.54068145  0.43213359]
     [-0.16251074  0.20085364 -0.74011644]]
  
 %% Cell type:code id: tags:
  
 ``` python
 print('Eigenvalues for Y \n' + str(np.real(eig_val_y.T[:,:3])))
 print('Estimated weights for Y\n' + str(np.real(weight_y[:,:3])))
 ```
  
 %% Output
  
    Eigenvalues for Y
    [[-0.22469208 -0.22321827  0.22848154]
     [ 0.42360052 -0.82303817 -0.17028297]
     [-0.34143196  0.23277327 -0.37414229]
     [ 0.53070486  0.30658588 -0.6691619 ]
     [ 0.60979721  0.35299218  0.57536058]]
    Estimated weights for Y
    [[ 0.22469208 -0.22321827  0.22848154]
     [-0.42360052 -0.82303817 -0.17028297]
     [ 0.34143196  0.23277327 -0.37414229]
     [-0.53070486  0.30658588 -0.6691619 ]
     [-0.60979721  0.35299218  0.57536058]]
  
 %% Cell type:code id: tags:
  
 ``` python
 # PLS in scikit-learn
  
 plsca = PLSCanonical(n_components=3, scale = False)
 plsca.fit(X, Y)
 ```
  
 %% Output
  
    PLSCanonical(algorithm='nipals', copy=True, max_iter=500, n_components=3,
                 scale=False, tol=1e-06)
  
 %% Cell type:code id: tags:
  
 ``` python
 print(plsca.x_weights_)
 print(plsca.y_weights_)
 ```
  
 %% Output
  
    [[ 0.36745572 -0.32646087 -0.30267791]
     [-0.5399922   0.51144303  0.11763542]
     [ 0.73610243  0.54625149  0.37318486]
     [-0.07130735  0.54144866 -0.38043117]
     [-0.16264435 -0.20072864  0.78137902]]
    [[ 0.22485955  0.22212135 -0.2169345 ]
     [-0.42328296  0.82287418  0.1493648 ]
     [ 0.34172183 -0.23372426  0.32376757]
     [-0.53074024 -0.30704996  0.68434385]
     [-0.60976283 -0.35303468 -0.59789433]]
  
 %% Cell type:markdown id: tags:
  
 ## NIPALS for CCA
  
 %% Cell type:code id: tags:
  
 ``` python
 import scipy
  
 # Nipals method for CCA
  
 # Defining empty arrays where to store results
  
 # Reconstruction from latent space to data
 loading_x_cca = np.ndarray([X.shape[1],n_components])
 loading_y_cca = np.ndarray([Y.shape[1],n_components])
  
 # Projections into the latent space
 scores_x_cca = np.ndarray([X.shape[0],n_components])
 scores_y_cca = np.ndarray([Y.shape[0],n_components])
  
 # Latent variables
 weight_x_cca = np.ndarray([X.shape[1],n_components])
 weight_y_cca = np.ndarray([Y.shape[1],n_components])
  
 # Initialization of data matrices
 current_X = X
 current_Y = Y
  
 for i in range(n_components):
    # Initialization of current latent variables as a data column
    t_x = current_X[:,0]
  
    # NIPALS iterations
    for _ in range(500):
        ## CCA variant
        # estimating Y weights given data Y and latent variables from X
        Y_pinv = np.linalg.pinv(current_Y)
        #Y_pinv = np.linalg.solve(current_Y.dot(current_Y.T),current_Y).T
        w_y = Y_pinv.dot(t_x)
  
        # normalizing Y weights
        w_y = w_y/np.sqrt(np.sum(w_y**2))
        # estimating latent variables from Y given data Y and Y weights
        t_y = current_Y.dot(w_y)
  
        ## CCA variant
        # estimating X weights given data X and latent variables from Y
        X_pinv = np.linalg.pinv(current_X)
        #X_pinv = np.linalg.solve(current_X.dot(current_X.T),current_X).T
        w_x = X_pinv.dot(t_y)
  
        # normalizing X weights
        w_x = w_x/np.sqrt(np.sum(w_x**2))
        # estimating latent variables from X given data X and X weights
        t_x = current_X.dot(w_x)
  
  
    # Weights are such that X * weights = t
    weight_x_cca[:,i] = w_x
    weight_y_cca[:,i] = w_y
  
    # Latent dimensions
    scores_x_cca[:,i] = t_x
    scores_x_cca[:,i] = t_y
  
    # Loadings obtained by regressing X on t (X = t * loadings)
  
    loading_x_cca[:,i] = np.dot(current_X.T, t_x)/t_x.transpose().dot(t_x)
    loading_y_cca[:,i] = np.dot(current_Y.T, t_y)/t_y.transpose().dot(t_y)
  
    # Deflation = current_data - current_reconstruction
  
    current_X = current_X - t_x.reshape(len(t_x),1).dot(w_x.reshape(1,len(w_x)))
    current_Y = current_Y - t_y.reshape(len(t_y),1).dot(w_x.reshape(1,len(w_y)))
  
  
 ```
  
 %% Cell type:code id: tags:
  
 ``` python
 print(weight_x_cca)
  
 plt.scatter(X.dot(weight_x_cca)[:,2], Y.dot(weight_y_cca)[:,2])
 ```
  
 %% Output
  
    [[ 0.24391049  0.36361822  0.79534955]
     [-0.47437693 -0.58135481  0.07701894]
     [ 0.82921211 -0.45865491 -0.17520398]
     [-0.04561038 -0.56190597  0.56872597]
     [-0.16062744  0.06087473 -0.0856826 ]]
  
    <matplotlib.collections.PathCollection at 0xa19d5a2e8>
  

  
 %% Cell type:code id: tags:
  
 ``` python
 cca = CCA(n_components=3, scale = False)
 cca.fit(X,Y)
 print(cca.x_weights_)
  
 plt.scatter(X.dot(weight_x_cca)[:,2], Y.dot(weight_y_cca)[:,2])
 ```
  
 %% Output
  
    [[ 0.2415487  -0.32772785 -0.24927466]
     [-0.47054284  0.58859583  0.10763174]
     [ 0.83233244  0.44252343  0.28013453]
     [-0.04077413  0.58734795 -0.35211385]
     [-0.16063576 -0.07310815  0.85077496]]
  
    <matplotlib.collections.PathCollection at 0x10624a160>
  

  
 %% Cell type:markdown id: tags:
  
 ## Reduced Rank Regression
  
 We finally review reduced rank regression through eigen-decomposition.
  
 %% Cell type:code id: tags:
  
 ``` python
 # Reduced Rank Regression
  
 n_components = 2
 Gamma = np.eye(n_components)
  
 SYX = np.dot(Y.T,X)
  
 SXX = np.dot(X.T,X)
  
 U, S, V = np.linalg.svd(np.dot(SYX, np.dot(np.linalg.pinv(SXX), SYX.T)))
  
 A = V[0:n_components, :].T
  
 B = np.dot(np.dot(A.T,SYX), np.linalg.pinv(SXX))
 ```
  
 %% Cell type:code id: tags:
  
 ``` python
 A
 ```
  
 %% Output
  
    array([[-0.24936409, -0.19522003],
           [ 0.3233369 , -0.86712778],
           [-0.31172187,  0.27190596],
           [ 0.56300866,  0.2411439 ],
           [ 0.64739595,  0.27909733]])
  
 %% Cell type:code id: tags:
  
 ``` python
 B
 ```
  
 %% Output
  
    array([[-0.17847633,  0.34992334, -0.88917092, -0.05742634,  0.16716151],
           [ 0.33903723, -0.60088957, -0.58551613, -0.63995952,  0.10694492]])
  
 %% Cell type:code id: tags:
  
 ``` python
 regression_coef_rrr = np.dot(A,B)
 ```
  
 %% Cell type:code id: tags:
  
 ``` python
 plt.scatter(np.dot(X,regression_coef)[:,0],Y[:,0])
 plt.xlabel('predicted dimension 0')
 plt.ylabel('target dimension 0')
 plt.title('RRR prediction')
 plt.show()
 ```
  
 %% Output
  

  
 %% Cell type:markdown id: tags:
  
 ## Sparsity in latent variable models
  
 We now focus on the effect of spurious variables in mutivariate models. To explore this new setting, we are going to add spurious random features to our data matrices X and Y.
  
 %% Cell type:code id: tags:
  
 ``` python
 ## Adding 3 random dimensions
 ## No association is expected from these features
  
 X_ext = np.hstack([X,np.random.randn(n*3).reshape([n,3])])
 Y_ext = np.hstack([Y,np.random.randn(n*3).reshape([n,3])])
 ```
  
 %% Cell type:code id: tags:
  
 ``` python
 plt.scatter(X_ext[:,0], Y_ext[:,-1])
 plt.xlabel('X dimension 0')
 plt.ylabel('Y random dimension')
 plt.show()
 ```
  
 %% Output
  

  
 %% Cell type:code id: tags:
  
 ``` python
 from sklearn import linear_model
  
 n_components = 3
  
 #### Sparse PLS via regularization in NIPALS [Waaijenborg, et al 2007]
 # Everything is as for the standard NIPALS algorithm, with the added sparse estimation step
  
 loading_x_sparse = np.ndarray([X_ext.shape[1],n_components])
 loading_y_sparse = np.ndarray([Y_ext.shape[1],n_components])
  
 scores_x_sparse = np.ndarray([X_ext.shape[0],n_components])
 scores_y_sparse = np.ndarray([Y_ext.shape[0],n_components])
  
 weight_x_sparse = np.ndarray([X_ext.shape[1],n_components])
 weight_y_sparse = np.ndarray([Y_ext.shape[1],n_components])
  
 current_X = X_ext
 current_Y = Y_ext
  
 ## Penalty parameter for regularization
 penalty = 10
  
 eps = 1e-4
  
 for i in range(n_components):
    t_x = current_X[:,0]
    for _ in range(100):
        w_y = current_Y.transpose().dot(t_x)/(t_x.transpose().dot(t_x))
        w_y = w_y/np.sqrt(np.sum(w_y**2))
        t_y = current_Y.dot(w_y)
        w_x = current_X.transpose().dot(t_y)/(t_y.transpose().dot(t_y))
        w_x = w_x/np.sqrt(np.sum(w_x**2))
        t_x = current_X.dot(w_x)
  
        ## Estimating sparse model for the weights of X
        lasso_x = linear_model.Lasso(alpha = penalty)
        lasso_x.fit(t_x.reshape(-1, 1), current_X)
  
        ## Estimating sparse model for the weights of Y
        lasso_y = linear_model.Lasso(alpha = penalty)
        lasso_y.fit(t_y.reshape(-1, 1), current_Y)
  
        # Replacing the original weights with the sparse estimation
        w_x = (lasso_x.coef_ / (np.sqrt(np.sum(lasso_x.coef_**2) + eps))).reshape([X_ext.shape[1]])
        w_y = (lasso_y.coef_ / (np.sqrt(np.sum(lasso_y.coef_**2) + eps))).reshape([Y_ext.shape[1]])
  
    # Weights are such that X * weights = t
    weight_x_sparse[:,i] = w_x
    weight_y_sparse[:,i] = w_y
  
    # Latent dimensions
    scores_x_sparse[:,i] = t_x
    scores_x_sparse[:,i] = t_y
  
    # Loadings obtained by regressing X on t (X = t * loadings)
  
    loading_x_sparse[:,i] = np.dot(current_X.T, t_x)/t_x.transpose().dot(t_x)
    loading_y_sparse[:,i] = np.dot(current_Y.T, t_y)/t_y.transpose().dot(t_y)
  
    # Deflation
  
    current_X = current_X - t_x.reshape(len(t_x),1).dot(w_x.reshape(1,len(w_x)))
    current_Y = current_Y - t_y.reshape(len(t_y),1).dot(w_y.reshape(1,len(w_y)))
  
 ```
  
 %% Cell type:markdown id: tags:
  
 We observe that the new weights are similar to the ones estimated before. However the parameters associated with the spurious dimension are entirely set to zero. This indicates that the model does not find these features necessary to explain the common variability between X and Y. There are also other weights which are set to zero, corresponding to the third latent dimension. This makes sense, as our synthetic data was indeed created with only two latent dimensions.
  
 %% Cell type:code id: tags:
  
 ``` python
 weight_x_sparse
 ```
  
 %% Output
  
    array([[ 0.35947916,  0.09613335,  0.        ],
           [-0.57208323, -0.5017898 ,  0.        ],
           [ 0.73260855, -0.60111564,  0.        ],
           [-0.01764855, -0.61404951,  0.        ],
           [-0.0795538 ,  0.        ,  0.        ],
           [ 0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.        , -0.        ]])
  
 %% Cell type:code id: tags:
  
 ``` python
 weight_y_sparse
 ```
  
 %% Output
  
    array([[ 0.19008698, -0.03395428, -0.        ],
           [-0.3361836 , -0.95121052,  0.        ],
           [ 0.29093832,  0.05788278,  0.        ],
           [-0.56391994,  0.17865107,  0.        ],
           [-0.66936112,  0.24197983,  0.        ],
           [-0.        , -0.        , -0.        ],
           [ 0.        , -0.        , -0.        ],
           [-0.        , -0.        , -0.        ]])
  
 %% Cell type:code id: tags:
  
 ``` python
 ## Non-sparse parameters previously estimated
  
 print(weight_x)
 print(weight_y)
 ```
  
 %% Output
  
    [[ 0.36753226  0.32695545  0.31901594]
     [-0.54017132 -0.51142438 -0.13635813]
     [ 0.73596167 -0.54668695 -0.38095079]
     [-0.07131382 -0.54068145  0.43213359]
     [-0.16251074  0.20085364 -0.74011644]]
    [[ 0.22469208 -0.22321827  0.22848154]
     [-0.42360052 -0.82303817 -0.17028297]
     [ 0.34143196  0.23277327 -0.37414229]
     [-0.53070486  0.30658588 -0.6691619 ]
     [-0.60979721  0.35299218  0.57536058]]
  
 %% Cell type:code id: tags:
  
 ``` python
 ## PLS result from scikit-learn PLS on the data augmented with spurious dimensions
  
 plsca = PLSCanonical(n_components=3, scale = False)
 plsca.fit(X_ext, Y_ext)
 print(plsca.x_weights_)
 print(plsca.y_weights_)
 ```
  
 %% Output
  
    [[ 0.36744392 -0.32636735 -0.3736012 ]
     [-0.53999014  0.51133222  0.13293972]
     [ 0.73606624  0.54628291  0.34582048]
     [-0.07131057  0.54141809 -0.49749784]
     [-0.16262691 -0.20075143  0.52023542]
     [ 0.00653338 -0.00825341  0.39402462]
     [ 0.00339241 -0.00798747  0.07331524]
     [ 0.0038984  -0.00566521  0.21066051]]
    [[ 0.22481823  0.22219588 -0.39318404]
     [-0.4232868   0.82282067  0.19587236]
     [ 0.34171379 -0.2337191   0.39240653]
     [-0.53068906 -0.30708105  0.46282095]
     [-0.6097158  -0.353037   -0.46080397]
     [-0.00986828  0.00097623 -0.2131351 ]
     [ 0.00469737  0.00268147 -0.28959681]
     [-0.00361132  0.00533588 -0.31180287]]
  
 %% Cell type:markdown id: tags:
  
 ## Cross-validating components
  
 In addition to sparsity, the optimal number of components in latent variable models can be identified by cross-validation.
 A common strategy is to train the model on a subset of the data and to quantify the *predicted residual error sum of squares* (PRESS) in non-overlapping testing data. We can finally choose the number of latent dimensions leading to the lowest average PRESS.
  
 %% Cell type:code id: tags:
  
 ``` python
 n_cross_valid_run = 200
  
 # max number of components to test
 n_components = 5
  
 rep_results = []
 for i in range(n_components):
   rep_results.append([])
  
 for k in range(n_components):
  for i in range(n_cross_valid_run):
  
    # Sampling disjoint set of indices for splitting the data
    batch1_idx = np.random.choice(range(X_ext.shape[0]), size = int(X_ext.shape[0]/2), replace = False)
    batch2_idx = np.where(np.in1d(range(X_ext.shape[0]), batch1_idx, assume_unique=True, invert = True))[0]
  
    # Creating independent data batches for X
    X_1 = X_ext[batch1_idx, :]
    X_2 = X_ext[batch2_idx, :]
  
    # Creating independent data batches for Y
    Y_1 = Y_ext[batch1_idx, :]
    Y_2 = Y_ext[batch2_idx, :]
  
    # Creating a model for each data batch
    plsca1 = PLSCanonical(n_components = k+1, scale = False)
    plsca2 = PLSCanonical(n_components = k+1, scale = False)
  
    # Fitting a model for each data batch
    plsca1.fit(X_1,Y_1)
    plsca2.fit(X_2,Y_2)
  
    # Quantifying the prediction error on the unseen data batch
    err1 = np.sum((plsca1.predict(X_2) - Y_2)**2)
    err2 = np.sum((plsca2.predict(X_1) - Y_1)**2)
  
    rep_results[k].append(np.mean([err1,err2]))
 ```
  
 %% Cell type:code id: tags:
  
 ``` python
 plt.plot(range(1,n_components+1),np.mean(rep_results, 1))
 plt.xlabel('# components')
 plt.ylabel('prediction error')
 plt.show()
 ```
  
 %% Output
  

  
 %% Cell type:markdown id: tags:
  
 # Multi-channel Variational Autoencoder
  
 The last part of this tutorial concerns the use of the *multi-channel variational autoencoder* (https://gitlab.inria.fr/epione_ML/mcvae), a more advanced  methods for the joint analysis and prediction of several modalities.
  
 ## VAE
 The Variational Autoencoer is a latent variable model composed by one encoder and one decoder associated to a single channel.
 The latent distribution and the decoding distribution are implemented as follows:
  
 $$q(\mathbf{z|x}) = \mathcal{N}(\mathbf{z|\mu_x; \Sigma_x})$$
  
 $$p(\mathbf{x|z}) = \mathcal{N}(\mathbf{x|\mu_z; \Sigma_z})$$
  
 They are Gaussians with moments parametrized by Neural Networks (or a linear transformation layer in a simple case).
  
-__Exercise__: why is convenient to use $\log$ values for the parametrization the variance networks output (W_logvar, W_out_logvar)?
-
 <img src="https://gitlab.inria.fr/epione/flhd/-/raw/master/heterogeneous_data/img/vae.svg" alt="img/vae.svg">
  
+### __Exercise__
+Why is convenient to use $\log$ values for the parametrization the variance networks output (W_logvar, W_out_logvar)?
+
 ## Sparse VAE
 To favour sparsity in the latent space we implement the latent distribution as follows:
  
-$$q(\mathbf{z|x}) = \mathcal{N}(\mathbf{z|\mu_x; \alpha \odot \mu_x^2})$$
+$$q(\mathbf{z|x}) = \mathcal{N}(\mathbf{z|\mu_x; \alpha \odot \mu_x^2}),$$
  
-Tha parameter $\alpha$ represents to the [odds](https://en.wikipedia.org/wiki/Odds) of pruning the $i$-th latent dimension according to (element-wise):
+where:
+
+$$
+\mathbf{\alpha \odot \mu_x^2} =
+\begin{bmatrix}
+\ddots &        & 0                        \\
+       & \alpha_i [\mathbf{\mu_x}]_i^2 &   \\
+0      &        & \ddots
+\end{bmatrix}.
+$$
+
+Tha parameter $\alpha_i$ represents the [odds](https://en.wikipedia.org/wiki/Odds) of pruning the $i$-th latent dimension according to:
  
 $$\alpha_i = \frac{p_i}{1 - p_i}$$
  
 <img src="https://gitlab.inria.fr/epione/flhd/-/raw/master/heterogeneous_data/img/sparse_vae.svg" alt="img/sparse_vae.svg">
  
 ## MCVAE
 The MultiChannel VAE is built by stacking multiple VAEs and allowing the decoding distributions to be computed from every input channel.
  
 <img src="https://gitlab.inria.fr/epione/flhd/-/raw/master/heterogeneous_data/img/mcvae.svg" alt="img/mcvae.svg">
  
-__Excercise__: sketch the Sparse MCVAE with 3 channels.[[solution]](https://gitlab.inria.fr/epione/flhd/-/raw/master/heterogeneous_data/img/sparse_mcvae_3ch.svg)
+### __Exercise__
+Sketch the Sparse MCVAE with 3 channels. [[Solution]](https://gitlab.inria.fr/epione/flhd/-/raw/master/heterogeneous_data/img/sparse_mcvae_3ch.svg)
  
 %% Cell type:code id: tags:
  
 ``` python
 import torch
 from mcvae.models import Mcvae
 from mcvae.models.utils import DEVICE, load_or_fit
 from mcvae.diagnostics import *
 from mcvae.plot import lsplom
  
 print(f"Running on {DEVICE}")  # cpu or gpu
  
 FORCE_REFIT = False  # Set this to 'True' if ou want to refit the models instead of loading them from file.
 ```
  
 %% Output
  
    Running on cpu
  
 %% Cell type:code id: tags:
  
 ``` python
 ### To initialization of the mcvae model you should provide:
  
 # 1 - a list with the different data channels
 data = [X, Y]
 data = [torch.Tensor(_) for _ in data]  #warning: data matrices must be converted to type torch.Tensor
  
 # 1 - a dictionary with the data and model characteristics
 init_dict = {
    'data': data,
    'lat_dim': n_components,
 }
 ```
  
 %% Cell type:code id: tags:
  
 ``` python
 # Here we initialize an instance of the model
  
 # Multi-Channel VAE
 torch.manual_seed(24)
 model = Mcvae(**init_dict)
 model.to(DEVICE)
 print(model)
 ```
  
 %% Output
  
    Mcvae(
      (vae): ModuleList(
        (0): VAE(
          (W_mu): Linear(in_features=5, out_features=5, bias=True)
          (W_logvar): Linear(in_features=5, out_features=5, bias=True)
          (W_out): Linear(in_features=5, out_features=5, bias=True)
        )
        (1): VAE(
          (W_mu): Linear(in_features=5, out_features=5, bias=True)
          (W_logvar): Linear(in_features=5, out_features=5, bias=True)
          (W_out): Linear(in_features=5, out_features=5, bias=True)
        )
      )
    )
  
 %% Cell type:code id: tags:
  
 ``` python
 ###################
 ## Model Fitting ##
 ###################
  
 # Set up the optimizer
 adam_lr = 1e-2
 n_epochs = 5000
 model.optimizer = torch.optim.Adam(model.parameters(), lr=adam_lr)
  
 # Fit (or load if already trained)
 load_or_fit(model=model, data=data, epochs=n_epochs, ptfile='model.pt', force_fit=FORCE_REFIT)
 ```
  
 %% Output
  
    Loading model.pt
  
 %% Cell type:code id: tags:
  
 ``` python
 ## Plotting model convergence
 plot_loss(model, skip=100)
 ```
  
 %% Output
  
    skipping first 100 epochs where losses might be very high
  

  
 %% Cell type:markdown id: tags:
  
 The plot above indicates that the model converged smoothly.
  
 %% Cell type:code id: tags:
  
 ``` python
 # We can plot the original data, dimension x dimension
 lsplom(data, title = 'Original data')
 ```
  
 %% Output
  

  
 %% Cell type:code id: tags:
  
 ``` python
 # We can estimate the reconstructed data (decoding from the latent space)
 x_hat = model.reconstruct(data)
  
 # Plotting the reconstructed data across dimensions
 lsplom(x_hat, title = 'Reconstructed data')
 ```
  
 %% Output
  

  
 %% Cell type:code id: tags:
  
 ``` python
 ## Plotting the weights of the encoder
 plot_weights(model, side = 'encoder')
 ```
  
 %% Output
  

  
 %% Cell type:code id: tags:
  
 ``` python
 ## Plotting the weights of the decoder
 plot_weights(model, side = 'decoder')
 ```
  
 %% Output
  

  
 %% Cell type:code id: tags:
  
 ``` python
 # Inspecting model parameters
  
 # Decoding parameters
 weights_decoding_X = model.vae[0].W_out.weight.data
 weights_decoding_Y = model.vae[1].W_out.weight.data
  
 print(weights_decoding_X)
 print(weights_decoding_Y)
 ```
  
 %% Output
  
    tensor([[ 0.1148, -0.0684,  0.5098,  0.0365, -1.6451],
            [-0.3095,  0.0943, -0.8667,  0.2497,  2.4173],
            [-0.2510, -0.1134, -2.1274, -0.2352, -2.1068],
            [ 0.6572,  0.0243, -1.3619, -0.0370,  0.7790],
            [-0.3694,  0.0762,  0.7014, -0.0429,  0.3934]])
    tensor([[ 0.1395, -0.1859, -1.0957,  0.0222, -0.5188],
            [-0.1095,  0.1167, -2.1759, -0.0054,  2.1793],
            [-0.4565, -0.0929,  0.2947, -0.0118, -1.3441],
            [-0.2269, -0.0548,  1.7775, -0.0096,  1.4515],
            [ 0.1131, -0.1738,  2.0298,  0.1213,  1.6703]])
  
 %% Cell type:code id: tags:
  
 ``` python
 # Encoding parameters
 weights_encoding_X = model.vae[0].W_mu.weight.data
 weights_encoding_Y = model.vae[1].W_mu.weight.data
  
 print(weights_encoding_X)
 print(weights_encoding_Y)
 ```
  
 %% Output
  
    tensor([[ 0.0909, -0.3644, -0.3227,  0.5998, -0.3149],
            [-0.0150, -0.1016, -0.0240,  0.1764,  0.2570],
            [ 0.0469, -0.1533, -0.2561, -0.1465,  0.0691],
            [ 0.7675,  0.7032, -0.1042, -0.3557, -0.6180],
            [-0.1278,  0.1636, -0.1328,  0.0766,  0.0469]])
    tensor([[ 0.1005, -0.1762, -0.4199, -0.2145,  0.1101],
            [-1.1709, -0.0119, -0.3710,  0.0791, -0.7206],
            [-0.1125, -0.1749,  0.0331,  0.1045,  0.1293],
            [ 0.4766, -0.0190,  0.0869, -0.2600,  0.4588],
            [-0.0017,  0.2063, -0.0976,  0.1288,  0.1290]])
  
 %% Cell type:code id: tags:
  
 ``` python
 # Here we compute the encoding and plot the latent dimensions against our original ground truth for the syntetic data
  
 encoding = model.encode(data)
  
 # Remember: encodings are distributions
 print(f'Encoding distribution q for Channel 0: {encoding[0]}')
  
 # with the method "loc" we extract the mean of the distributions
 encoding_x = encoding[0].loc.data.numpy()
 encoding_y = encoding[1].loc.data.numpy()
 ```
  
 %% Output
  
    Encoding distribution q for Channel 0: Normal(loc: torch.Size([500, 5]), scale: torch.Size([500, 5]))
  
 %% Cell type:markdown id: tags:
  
 We note that the estimated encoding is correlated with the original latent dimensions. There seems to be however some redundancy. This motivates the use of a *sparse model*.
  
 %% Cell type:code id: tags:
  
 ``` python
 plt.figure(figsize=(12, 12))
 for idx,k in enumerate(range(n_components)):
  plt.subplot(n_components,2,2*idx+1)
  plt.scatter(encoding_x[:,k], latents[:,0])
  plt.xlabel(str('mcvae dim ') + str(k))
  plt.ylabel('ground truth 0')
  plt.subplot(n_components,2,2*idx+2)
  plt.scatter(encoding_x[:,k], latents[:,1])
  plt.xlabel(str('mcvae dim ') + str(k))
  plt.ylabel('ground truth 1')
 plt.tight_layout()
 plt.show()
 ```
  
 %% Output
  

  
 %% Cell type:code id: tags:
  
 ``` python
 # Initialize sparse mcvae
  
 model_sparse1 = Mcvae(sparse=True, **init_dict)
 model_sparse1.to(DEVICE)
 print(model_sparse1)
-
-print('Is the log_alpha parameter the same for every channel?')
-log_alpha = model_sparse1.vae[0].log_alpha
-print(np.all([log_alpha is vae.log_alpha for vae in model_sparse1]))
 ```
  
 %% Output
  
    Mcvae(
      (vae): ModuleList(
        (0): VAE(
          (W_mu): Linear(in_features=5, out_features=5, bias=True)
          (W_out): Linear(in_features=5, out_features=5, bias=True)
        )
        (1): VAE(
          (W_mu): Linear(in_features=5, out_features=5, bias=True)
          (W_out): Linear(in_features=5, out_features=5, bias=True)
        )
      )
    )
    Is the log_alpha parameter the same for every channel?
  
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-64-51039ebdfc7f> in <module>
          7 print('Is the log_alpha parameter the same for every channel?')
          8 log_alpha = model_sparse1.vae[0].log_alpha
    ----> 9 print(np.all([log_alpha is vae.log_alpha for vae in model_sparse1]))
  
    TypeError: 'Mcvae' object is not iterable
  
+%% Cell type:markdown id: tags:
+
+__Exercise.__ Check if the sparsity parameter log_alpha is the same for every VAE in the MCvae model.
+- Hint: you can acces the model modules and parameters with the dot notation (model.submodel.parameter)
+
 %% Cell type:code id: tags:
  
 ``` python
 # Optimize
 model_sparse1.optimizer = torch.optim.Adam(model_sparse1.parameters(), lr=adam_lr)
 load_or_fit(model=model_sparse1, data=data, epochs=n_epochs, ptfile='model_sparse1.pt', force_fit=FORCE_REFIT)
 ```
  
 %% Output
  
    Loading model_sparse1.pt
  
 %% Cell type:markdown id: tags:
  
 The sparse model estimates a probability of redundancy associated to each dimension. This means that we can retain only the dimensions with low probability of redundancy. In this case the model correctly identifies only 2 meaningful latent dimensions.
  
 %% Cell type:code id: tags:
  
 ``` python
 print('Probability of redundancy: ', model_sparse1.dropout.data)
 plot_dropout(model_sparse1, sort=False)
 ```
  
 %% Output
  
    Probability of redundancy:  tensor([[0.5597, 0.5693, 0.0177, 0.0296, 0.5730]])
  

  
 %% Cell type:code id: tags:
  
 ``` python
 # We fix a redundancy threshold
 dropout_threshold = 0.20
 ```
  
 %% Cell type:code id: tags:
  
 ``` python
 # We plot the remaining latent dimensions
 keep = (model_sparse1.dropout.squeeze() < dropout_threshold).tolist()
 kept_comps = [i for i, kept in enumerate(keep) if kept]
 print(f'kept components: {kept_comps}')
  
 plot_latent_space(model_sparse1, data=data, comp=kept_comps);
 ```
  
 %% Output
  
    kept components: [2, 3]
  

  

  
 %% Cell type:code id: tags:
  
 ``` python
 # We repeate the same exercise with the synthetic data with redundant dimensions
  
 data_sparse = [X_ext, Y_ext]
 data_sparse = [torch.Tensor(_) for _ in data_sparse]
  
 init_dict = {
    'data': data_sparse,
    'lat_dim': n_components + 3,
 }
  
 model_sparse = Mcvae(sparse=True, **init_dict)
 model_sparse.to(DEVICE)
 print(model_sparse)
 ```
  
 %% Output
  
    Mcvae(
      (vae): ModuleList(
        (0): VAE(
          (W_mu): Linear(in_features=8, out_features=8, bias=True)
          (W_out): Linear(in_features=8, out_features=8, bias=True)
        )
        (1): VAE(
          (W_mu): Linear(in_features=8, out_features=8, bias=True)
          (W_out): Linear(in_features=8, out_features=8, bias=True)
        )
      )
    )
  
 %% Cell type:code id: tags:
  
 ``` python
 # Fit (or load model if existing)
 model_sparse.optimizer = torch.optim.Adam(model_sparse.parameters(), lr=adam_lr)
 load_or_fit(model=model_sparse, data=data_sparse, epochs=n_epochs, ptfile='model_sparse.pt', force_fit=FORCE_REFIT)
 ```
  
 %% Output
  
    Loading model_sparse.pt
  
 %% Cell type:markdown id: tags:
  
 We see that the model recognises again only two meaningful latent dimensions
  
 %% Cell type:code id: tags:
  
 ``` python
 print('Probability of redundancy: ', model_sparse.dropout.data)
 indices = np.where(model_sparse.dropout.data.numpy().flatten() < dropout_threshold)[0]
 non_redundant_comps = indices.tolist()
 print(f'Non-redundant components: {non_redundant_comps}')
 plot_dropout(model_sparse, sort=False)
 ```
  
 %% Output
  
    Probability of redundancy:  tensor([[0.7038, 0.7494, 0.7669, 0.0166, 0.8116, 0.0525, 0.6920, 0.7680]])
    Non-redundant components: [3, 5]
  

  
 %% Cell type:code id: tags:
  
 ``` python
 x_hat = model_sparse.reconstruct(data_sparse, dropout_threshold=dropout_threshold)
 lsplom(x_hat, title = 'Reconstructed data')
 ```
  
 %% Output
  

  
 %% Cell type:code id: tags:
  
 ``` python
 ## Plotting the weights of the decoder
 plt.figure(figsize=(12, 8))
 plot_weights(model_sparse, side = 'decoder')
 ```
  
 %% Output
  
  

  
 %% Cell type:code id: tags:
  
 ``` python
 plt.figure(figsize=(12, 8))
 plot_weights(model_sparse, side = 'encoder')
 ```
  
 %% Output
  
  

  
 %% Cell type:code id: tags:
  
 ``` python
 # PLotting estimated encoding vs ground truth
  
 encoding = model_sparse.encode(data_sparse)
  
 encoding_x = encoding[0].loc.detach().numpy()
 encoding_y = encoding[1].loc.detach().numpy()
  
 plt.figure(figsize=(12, 12))
 for idx,k in enumerate(indices):
  plt.subplot(len(indices),2,2*idx+1)
  plt.scatter(encoding_x[:,k], latents[:,0])
  plt.xlabel(str('mcvae dim ') + str(k))
  plt.ylabel('ground truth 0')
  plt.subplot(len(indices),2,2*idx+2)
  plt.scatter(encoding_x[:,k], latents[:,1])
  plt.xlabel(str('mcvae dim ') + str(k))
  plt.ylabel('ground truth 1')
  
 plt.show()
 ```
  
 %% Output
  

  
 %% Cell type:code id: tags:
  
 ``` python
 plot_latent_space(model_sparse, data_sparse, comp=non_redundant_comps);
 ```
  
 %% Output
  

  

  
 %% Cell type:markdown id: tags:
  
 ### Increasing the number of channels
  
 In this section we explore the use of the model on data with multiple modalities (or channels)
  
 %% Cell type:code id: tags:
  
 ``` python
 # generating a new modality Z
 # This modality has meaningful as well as redundant dimensions
  
 size_z = 10
  
 size_z_redundant = 4
  
 np.random.seed(37)
 Z_redundant = np.random.randn(n, size_z_redundant)  # pure noise
  
 transform_z = np.random.randint(-8, 8, size=(2, size_z))
  
 Z = latents.dot(transform_z) + 2*np.random.normal(size=(n, size_z))
 Z = np.hstack([Z, Z_redundant])
  
 print(X_ext.shape,Y_ext.shape, Z.shape)
 ```
  
 %% Output
  
    (500, 8) (500, 8) (500, 14)
  
 %% Cell type:code id: tags:
  
 ``` python
 # Initialize the model
 data_multi = [X_ext, Y_ext, Z]
 data_multi = [torch.Tensor(_) for _ in data_multi]
  
 init_dict = {
    'data': data_multi, # [X, Y, Z]
    'lat_dim': n_components + 3,
 }
  
 model_multi = Mcvae(sparse=True, **init_dict)
 model_multi.to(DEVICE)
 print(model_multi)
 ```
  
 %% Output
  
    Mcvae(
      (vae): ModuleList(
        (0): VAE(
          (W_mu): Linear(in_features=8, out_features=8, bias=True)
          (W_out): Linear(in_features=8, out_features=8, bias=True)
        )
        (1): VAE(
          (W_mu): Linear(in_features=8, out_features=8, bias=True)
          (W_out): Linear(in_features=8, out_features=8, bias=True)
        )
        (2): VAE(
          (W_mu): Linear(in_features=14, out_features=8, bias=True)
          (W_out): Linear(in_features=8, out_features=14, bias=True)
        )
      )
    )
  
 %% Cell type:code id: tags:
  
 ``` python
 # Fit (or load)
 model_multi.optimizer = torch.optim.Adam(model_multi.parameters(), lr=adam_lr)
 load_or_fit(model=model_multi, data=data_multi, epochs=n_epochs, ptfile='model_multi.pt', force_fit=FORCE_REFIT)
 ```
  
 %% Output
  
    Loading model_multi.pt
  
 %% Cell type:code id: tags:
  
 ``` python
 print('Probability of redundancy: ', model_multi.dropout.data.numpy())
 indices = np.where(model_multi.dropout.data.numpy().flatten() < dropout_threshold)[0]
 print('Non-redundant components: ', indices)
 plot_dropout(model_multi, sort=False)
  
 encoding = model_multi.encode(data_multi)
  
 encoding_x = encoding[0].loc.detach().numpy()
 encoding_y = encoding[1].loc.detach().numpy()
 encoding_z = encoding[2].loc.detach().numpy()
  
 plt.figure(figsize=(12, 12))
 for idx,k in enumerate(indices):
    plt.subplot(len(indices),2,2*idx+1)
    plt.scatter(encoding_z[:,k], latents[:,0])
    plt.xlabel(str('mcvae dim ') + str(k))
    plt.ylabel('ground truth 0')
    plt.subplot(len(indices),2,2*idx+2)
    plt.scatter(encoding_z[:,k], latents[:,1])
    plt.xlabel(str('mcvae dim ') + str(k))
    plt.ylabel('ground truth 1')
  
 plt.show()
 ```
  
 %% Output
  
    Probability of redundancy:  [[0.03285351 0.39449248 0.00762853 0.6011913  0.59263474 0.6564989
      0.25333416 0.3858575 ]]
    Non-redundant components:  [0 2]
  

  

  
 %% Cell type:markdown id: tags:
  
 The multi-channel variational autoencoder allows to predict each channel from any other.
  
 %% Cell type:code id: tags:
  
 ``` python
 # We compute the reconstruction of the data from the encoding
 z = [_.sample() for _ in encoding]  # sample the encoding distributions
 p = model_multi.decode(z)  # compute the decoding distributions
  
 # This variable has several dimensions over two indices:
 # the first index indicates the modality to decode(0:x, 1:y, 3:z, ...)
 # the second index indicates the modality from which the encoding is done (0:x, 1:y, 3:z, ...)
  
 # p[x][z]: p(x|z)
 decoding_x_from_x = p[0][0].loc.data.numpy()
 decoding_x_from_y = p[0][1].loc.data.numpy()
 decoding_x_from_z = p[0][2].loc.data.numpy()
 ```
  
 %% Cell type:code id: tags:
  
 ``` python
 plt.figure(figsize=(18, 10))
 plt.subplot(2,3,1)
 plt.scatter(X_ext[:,0], decoding_x_from_x[:,0])
 plt.title('decoding X from X')
 plt.xlabel('X0')
 plt.ylabel('X0 from latent X')
 plt.subplot(2,3,2)
 plt.scatter(X_ext[:,0], decoding_x_from_y[:,0])
 plt.title('decoding X from Y')
 plt.ylabel('X0 from latent Y')
 plt.xlabel('X0')
 plt.subplot(2,3,3)
 plt.scatter(X_ext[:,0], decoding_x_from_z[:,0])
 plt.title('decoding X from Z')
 plt.xlabel('X0')
 plt.ylabel('X0 from latent Z')
 plt.subplot(2,3,4)
 plt.scatter(decoding_x_from_x[:,0], decoding_x_from_y[:,0])
 plt.title('decoding X vs decoding Y')
 plt.xlabel('X0 from latent X')
 plt.ylabel('X0 from latent Y')
 plt.subplot(2,3,5)
 plt.scatter(decoding_x_from_x[:,0], decoding_x_from_z[:,0])
 plt.title('decoding Y vs decoding Z')
 plt.xlabel('X0 from latent X')
 plt.ylabel('X0 from latent Z')
 plt.subplot(2,3,6)
 plt.scatter(decoding_x_from_y[:,0], decoding_x_from_z[:,0])
 plt.title('decoding Y vs decoding Z')
 plt.xlabel('X0 from latent Y')
 plt.ylabel('X0 from latent Z')
 plt.tight_layout()
 plt.show()
 ```
  
 %% Output
  

  
 %% Cell type:markdown id: tags:
  
+__Exercise.__ Compare Mcvae, Sparse Mcvae, and PLS
+- Reuse the observations X1, X2, and X3 created earlier.
+- Fit a Mcvae model to predict X3 from X1.
+- Fit a Mcvae Sparse model to predict X3 from X1.
+- Fit a PLS model to predict X3 from X1.
+- Compare the predictions for all the three methods.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
 # Application to (pseudo-) neurological data
  
 We are going to load volumetric and cognitive data for a synthetic sample generated from the ADNI dataset.
 The exercise consists in applying the methods seen so far to understand the relationship between this kind of variables.
  
 %% Cell type:code id: tags:
  
 ``` python
 import pandas as pd
  
 adni = pd.read_csv('https://marcolorenzi.github.io/material/pseudo_adni.csv')
  
 brain_volume_cols = ['WholeBrain.bl', 'Ventricles.bl', 'Hippocampus.bl', 'MidTemp.bl', 'Entorhinal.bl']
 cognition_cols = ['CDRSB.bl', 'ADAS11.bl', 'MMSE.bl', 'RAVLT.immediate.bl', 'RAVLT.learning.bl', 'RAVLT.forgetting.bl', 'FAQ.bl']
  
 adni[brain_volume_cols]
 ```
  
 %% Output
  
         WholeBrain.bl  Ventricles.bl  Hippocampus.bl  MidTemp.bl  Entorhinal.bl
    0         0.684331       0.012699        0.003786    0.012678       0.002214
    1         0.735892       0.012803        0.004866    0.015071       0.003041
    2         0.738731       0.030492        0.004300    0.012419       0.002316
    3         0.696179       0.032797        0.004720    0.012312       0.002593
    4         0.841806       0.004030        0.006820    0.016948       0.002896
    ..             ...            ...             ...         ...            ...
    995       0.767153       0.011417        0.005209    0.012879       0.002208
    996       0.695168       0.011908        0.004641    0.012534       0.002197
    997       0.628691       0.041537        0.003478    0.010870       0.001939
    998       0.714763       0.020461        0.004713    0.013989       0.001981
    999       0.691858       0.030349        0.004237    0.011439       0.002419
    
    [1000 rows x 5 columns]
  
 %% Cell type:code id: tags:
  
 ``` python
 volumes_value = adni[brain_volume_cols].values
  
 # Standardization of volumetric measures
 volumes_value = (volumes_value - volumes_value.mean(0)) / volumes_value.std(0)
 ```
  
 %% Cell type:code id: tags:
  
 ``` python
 adni[cognition_cols]
 ```
  
 %% Output
  
         CDRSB.bl  ADAS11.bl  MMSE.bl  RAVLT.immediate.bl  RAVLT.learning.bl  \
    0           1          8     27.0           23.739439                4.0
    1           0          0     30.0           64.933800                9.0
    2           0          8     24.0           36.987722                3.0
    3           0          3     29.0           50.314425                5.0
    4           0          0     30.0           57.217830                9.0
    ..        ...        ...      ...                 ...                ...
    995         1          2     29.0           61.896022                8.0
    996         0          1     29.0           62.083170                8.0
    997         3         14     24.0           22.289059                2.0
    998         0         13     26.0           31.650504                2.0
    999         0         15     28.0           29.089863                3.0
    
         RAVLT.forgetting.bl  FAQ.bl
    0               5.821573       3
    1               4.001653       0
    2               6.876316       0
    3               4.733481       3
    4               7.225401       0
    ..                   ...     ...
    995             1.663102       0
    996             5.241477       1
    997             5.437600       7
    998             1.669603       4
    999             7.703384       4
    
    [1000 rows x 7 columns]
  
 %% Cell type:code id: tags:
  
 ``` python
 cognition_value = adni[cognition_cols].values
  
 # Standardization of cognitive measures
 cognition_value = (cognition_value - cognition_value.mean(0)) / cognition_value.std(0)
 ```
  
 %% Cell type:code id: tags:
  
 ``` python
 plsca_adni = PLSCanonical(n_components=3, scale = False)
 plsca_adni.fit(cognition_value,volumes_value)
 print(plsca_adni.x_weights_)
 print(plsca_adni.y_weights_)
 ```
  
 %% Output
  
    [[ 0.38442715 -0.0914859   0.3212761 ]
     [ 0.46751116 -0.13784339  0.29848234]
     [-0.42755384  0.04820369  0.06450696]
     [-0.41248938 -0.01061314  0.43066352]
     [-0.3504303  -0.18746317  0.64581907]
     [ 0.08267813  0.96398703  0.23059686]
     [ 0.38866726 -0.07602536  0.38444838]]
    [[-0.38267896  0.49605567  0.326012  ]
     [ 0.30522362 -0.19887222  0.19846392]
     [-0.51586765 -0.39589381  0.68942719]
     [-0.48952454  0.48804833 -0.30834445]
     [-0.5046203  -0.56520397 -0.53286217]]
  
 %% Cell type:code id: tags:
  
 ``` python
 n_cross_valid_run = 200
  
 # number of components to test
 n_components = 5
  
 rep_results = []
 for i in range(n_components):
  rep_results.append([])
  
 for k in range(n_components):
  for i in range(n_cross_valid_run):
  
    # Sampling disjoint set of indices for splitting the data
    batch1_idx = np.random.choice(range(cognition_value.shape[0]), size = int(cognition_value.shape[0]/2), replace = False)
    batch2_idx = np.where(np.in1d(range(cognition_value.shape[0]), batch1_idx, assume_unique=True, invert = True))[0]
  
    # Creating independent data batches for X
    X_1 = cognition_value[batch1_idx, :]
    X_2 = cognition_value[batch2_idx, :]
  
    # Creating independent data batches for Y
    Y_1 = volumes_value[batch1_idx, :]
    Y_2 = volumes_value[batch2_idx, :]
  
    # Creating a model for each data batch
    cca_adni1 = CCA(n_components = k+1, scale = False)
    cca_adni2 = CCA(n_components = k+1, scale = False)
  
    # Fitting a model for each data batch
    cca_adni1.fit(X_1,Y_1)
    cca_adni2.fit(X_2,Y_2)
  
    # Quantifying the prediction error on the unseen data batch
    err1 = np.sum((cca_adni1.predict(X_2) - Y_2)**2)
    err2 = np.sum((cca_adni2.predict(X_1) - Y_1)**2)
  
    rep_results[k].append(np.mean([err1,err2]))
 ```
  
 %% Cell type:code id: tags:
  
 ``` python
 plt.plot(range(1,n_components+1),np.mean(rep_results, 1))
 plt.xlabel('# components')
 plt.ylabel('prediction error')
 plt.show()
 ```
  
 %% Output
  

  
 %% Cell type:code id: tags:
  
 ``` python
 plsca_adni = PLSCanonical(n_components=1, scale = False)
 plsca_adni.fit(cognition_value,volumes_value)
  
 f = plt.figure(figsize=(15,10))
 f.add_subplot(2,1,1)
 plt.bar(np.arange(len(plsca_adni.x_weights_[:,0])), plsca_adni.x_weights_[:,0], tick_label = cognition_cols)
 plt.title('Component x')
 f.add_subplot(2,1,2)
 plt.bar(np.arange(len(plsca_adni.y_weights_[:,0])), plsca_adni.y_weights_[:,0], tick_label = brain_volume_cols)
 plt.title('Component y')
 plt.show()
 ```
  
 %% Output
  

  
 %% Cell type:code id: tags:
  
 ``` python
 plt.scatter(plsca_adni.x_scores_[:,0], plsca_adni.y_scores_[:,0], c = cognition_value[:,1])
 plt.show()
 ```
  
 %% Output
  

  
 %% Cell type:markdown id: tags:
  
 ## McVAE
  
 In this last application we apply the multichannel autoencoder to the pseudo-ADNI data, for jointly modeling different modalities across individuals. We focus on the joint analysis of:
  
 - brain volumes;
 - sociodemographic information (e.g. age, sex, scholarity);
 - cognition;
 - apoe genotype;
 - fluid biomarkers (abeta and tay concentrations in the CSF).
  
 %% Cell type:markdown id: tags:
  
 We first import and standardize the different data modalities.
  
 %% Cell type:code id: tags:
  
 ``` python
 adni = pd.read_csv('https://marcolorenzi.github.io/material/pseudo_adni.csv')
  
 normalize = lambda _: (_ - _.mean(0))/_.std(0)
  
 volume_cols = ['WholeBrain.bl', 'Ventricles.bl', 'Hippocampus.bl', 'MidTemp.bl', 'Entorhinal.bl']
 volumes_value = adni[volume_cols].values
 volumes_value = normalize(volumes_value)
  
 demog_cols = ['SEX', 'AGE', 'PTEDUCAT']
 demog_value = adni[demog_cols].values
 demog_value = normalize(demog_value)
  
 cognition_cols = ['CDRSB.bl', 'ADAS11.bl', 'MMSE.bl', 'RAVLT.immediate.bl', 'RAVLT.learning.bl', 'RAVLT.forgetting.bl', 'FAQ.bl']
 cognition_value = adni[cognition_cols].values
 cognition_value = normalize(cognition_value)
  
 apoe_cols = ['APOE4']
 apoe_value = adni[apoe_cols].values
 apoe_value = normalize(apoe_value)
  
 fluid_cols = ['ABETA.MEDIAN.bl', 'PTAU.MEDIAN.bl', 'TAU.MEDIAN.bl']
 fluid_value = adni[fluid_cols].values
 fluid_value = normalize(fluid_value)
  
 # Creating a list with multimodal data
 data_adni = [volumes_value, demog_value, cognition_value, apoe_value, fluid_value]
  
 # Transform as a pytorch Tensor for compatibility
 data_adni = [torch.Tensor(_) for _ in data_adni]
  
 print(f'We have {len(data_adni)} channels in total as an input for the model')
 ```
  
 %% Output
  
    We have 5 channels in total as an input for the model
  
 %% Cell type:code id: tags:
  
 ``` python
 # Defining data and model characteristics
  
 init_dict = {
    'data': data_adni,
    'lat_dim': 5, # We fit 5 latent dimensions
 }
 ```
  
 %% Cell type:code id: tags:
  
 ``` python
 # Initialize the model
  
 model_adni = Mcvae(sparse=True, **init_dict)
 model_adni.to(DEVICE)
 print(model_adni)
 ```
  
 %% Output
  
    Mcvae(
      (vae): ModuleList(
        (0): VAE(
          (W_out_logvar): Parameter(shape=torch.Size([1, 5]))
          (log_alpha): Parameter(shape=torch.Size([1, 5]))
          (W_mu): Linear(in_features=5, out_features=5, bias=True)
          (W_out): Linear(in_features=5, out_features=5, bias=True)
        )
        (1): VAE(
          (W_out_logvar): Parameter(shape=torch.Size([1, 3]))
          (log_alpha): Parameter(shape=torch.Size([1, 5]))
          (W_mu): Linear(in_features=3, out_features=5, bias=True)
          (W_out): Linear(in_features=5, out_features=3, bias=True)
        )
        (2): VAE(
          (W_out_logvar): Parameter(shape=torch.Size([1, 7]))
          (log_alpha): Parameter(shape=torch.Size([1, 5]))
          (W_mu): Linear(in_features=7, out_features=5, bias=True)
          (W_out): Linear(in_features=5, out_features=7, bias=True)
        )
        (3): VAE(
          (W_out_logvar): Parameter(shape=torch.Size([1, 1]))
          (log_alpha): Parameter(shape=torch.Size([1, 5]))
          (W_mu): Linear(in_features=1, out_features=5, bias=True)
          (W_out): Linear(in_features=5, out_features=1, bias=True)
        )
        (4): VAE(
          (W_out_logvar): Parameter(shape=torch.Size([1, 3]))
          (log_alpha): Parameter(shape=torch.Size([1, 5]))
          (W_mu): Linear(in_features=3, out_features=5, bias=True)
          (W_out): Linear(in_features=5, out_features=3, bias=True)
        )
      )
    )
  
 %% Cell type:code id: tags:
  
 ``` python
 adam_lr = 1e-2
 n_epochs = 6000
  
 model_adni.optimizer = torch.optim.Adam(model_adni.parameters(), lr=adam_lr)
 load_or_fit(model=model_adni, data=data_adni, epochs=n_epochs, ptfile='model_adni.pt', force_fit=FORCE_REFIT)
 ```
  
 %% Output
  
    Loading model_adni.pt
  
 %% Cell type:markdown id: tags:
  
 The model identified only a significant dimension, that we are going to store and analyze:
  
 %% Cell type:code id: tags:
  
 ``` python
 print('Rendundancy: ', model_adni.dropout.detach().numpy())
 significant_dim = np.where(model_adni.dropout.detach().numpy()<0.5)[1]
 print('Significant dimensions: ', significant_dim)
 plot_dropout(model_adni, sort=False)
 ```
  
 %% Output
  
    Rendundancy:  [[0.47644782 0.99128443 0.9764916  0.9853288  0.9925022 ]]
    Significant dimensions:  [0]
  

  
 %% Cell type:code id: tags:
  
 ``` python
 # Here we store in a list the deconding weights estimated for each modality.
 # We are interested in the decoding weights corresponding to the non-redundant dimension
  
 decoding_weights = []
 decoding_weights = [vae.W_out.weight.data.numpy()[:, significant_dim[0]] for vae in model_adni]
  
 decoding_weights[0]  # Channel 0: Brain Volumes
 ```
  
 %% Output
  
    array([ 0.2002608 , -0.14793997,  0.26713213,  0.236022  ,  0.24407719],
          dtype=float32)
  
 %% Cell type:markdown id: tags:
  
 The weights give us a nice way to interpret how the different modalities interact together.
  
 %% Cell type:code id: tags:
  
 ``` python
 plt.figure(figsize=(12, 12))
 plt.subplot(5,1,1)
 plt.bar(np.arange(len(decoding_weights[0])), decoding_weights[0], tick_label = volume_cols)
 plt.subplot(5,1,2)
 plt.bar(np.arange(len(decoding_weights[1])), decoding_weights[1], tick_label = demog_cols)
 plt.subplot(5,1,3)
 plt.bar(np.arange(len(decoding_weights[2])), decoding_weights[2], tick_label = cognition_cols)
 plt.subplot(5,1,4)
 plt.bar(np.arange(len(decoding_weights[3])), decoding_weights[3], tick_label = apoe_cols)
 plt.subplot(5,1,5)
 plt.bar(np.arange(len(decoding_weights[4])), decoding_weights[4], tick_label = fluid_cols)
 ```
  
 %% Output
  
    <BarContainer object of 3 artists>
  

  
 %% Cell type:markdown id: tags:
  
 Once the model is learnt we can use it for prediction. For example we can predict brain volumes from the cognitive data:
  
 %% Cell type:code id: tags:
  
 ``` python
 # Predicting volumes (channel 0) from cognition (channel 2)
 predictions = model_adni.reconstruct(data_adni, reconstruct_from=[2])
 decoding_volume_from_cognition = predictions[0].data.numpy()
 ```
  
 %% Cell type:code id: tags:
  
 ``` python
 plt.figure(figsize=(12, 28))
  
 # Plotting predictions for each volumetric feature
 for i in range(5):
    plt.subplot(5,1,i+1)
    plt.scatter(decoding_volume_from_cognition[:,i], volumes_value[:,i])
    plt.title('reconstruction ' + volume_cols[i])
    plt.xlabel('predicted')
    plt.ylabel('target')
 plt.show()
 ```
  
 %% Output
  

  
 %% Cell type:markdown id: tags:
  
 We are finally going to compare the multichannel model with the standard PLS modeling cognition and brain volumes jointly:
  
 %% Cell type:code id: tags:
  
 ``` python
 plsca = PLSCanonical(n_components=1, scale = False)
 plsca.fit(cognition_value,volumes_value)
 print(plsca.x_weights_)
 print(plsca.y_weights_)
 ```
  
 %% Output
  
    [[ 0.38442715]
     [ 0.46751116]
     [-0.42755384]
     [-0.41248938]
     [-0.3504303 ]
     [ 0.08267813]
     [ 0.38866726]]
    [[-0.38267896]
     [ 0.30522362]
     [-0.51586765]
     [-0.48952454]
     [-0.5046203 ]]
  
 %% Cell type:markdown id: tags:
  
 We obtain predictions from the PLS model and compare them with the predictions of mcvae. We observe a strong correlation between predictions. However, it can be noted that as we increase the number of dimensions of CCA, the difference between prediction increases, as CCA tends to overfit.
  
 %% Cell type:code id: tags:
  
 ``` python
 predicted_plsca = plsca.predict(cognition_value)
  
 plt.figure(figsize=(26, 18))
  
 for i in range(5):
    plt.subplot(10,2,2*i+1)
    plt.scatter(predicted_plsca[:,i], volumes_value[:,i])
    plt.title('reconstruction' + volume_cols[i])
    plt.xlabel('predicted')
    plt.ylabel('target')
  
 for i in range(5):
    plt.subplot(10,2,2*i+2)
    plt.scatter(predicted_plsca[:,i], decoding_volume_from_cognition[:,i])
    plt.title('reconstruction' + volume_cols[i])
    plt.xlabel('cca')
    plt.ylabel('mcvae')
  
 plt.show()
 ```
  
 %% Output
  

  
 %% Cell type:code id: tags:
  
 ``` python
 # Comparing average reconstruction errors between models
  
 print('Reconstruction error:')
 print(f'CCA: {((predicted_plsca-volumes_value)**2).sum()}')
 print(f'mcvae: {((decoding_volume_from_cognition-volumes_value)**2).sum()}')
 ```
  
 %% Output
  
    Reconstruction error:
    CCA: 4631.245631476227
    mcvae: 3964.8306431191213
  
 %% Cell type:markdown id: tags:
  
 It is interesting to notice that mcvae leads to a lower reconstruction error than CCA, as the prediction of mcvae benefits from training from other modalities as well.