This document gathers the source codes used to produce the different figures in the paper Large-Scale Correlation Screening under Dependence for Brain Functional Connectivity Network Inference, hereinafter referred as the paper.

1. Synthetic data

Synthetic datasets of two regions are generated such that, for each region, the intra-correlation coefficients follow a Toeplitz structure. More specifically, \(Cor(X_i^a, X_j^a) = max(1 − |j − i|/10, \rho^{aa}_{min})\), where \(|j − i|\) is the uniform norm between voxels \(i\) and \(j\), and \(\rho^{aa}_{min}\) the minimal population intra-correlation of a region \(a\). The population intr-correlation is constant across all pairs of voxels.

The code used to generate such datasets can be found in the following file: simulation_data_functions.R.

source("./simulation_data_functions.R")

p <- 200 # number of variables
n <- 100 # number of samples
rintra1 <- .9 # intra-correlation of region 1
rintra2 <- .9 # intra-correlation of region 2
rinter <- .3 # inter-correlation

simu <- buildSimData( T=n,rmin1=rintra1, rmin2=rintra2, rinter=rinter, vecn=c(p,p,5,5))
Y1 <- simu$data[[1]]$Y  # region 1
Y2 <- simu$data[[2]]$Y # region 2

2. Impact of Intra-Correlation on Inter-Correlation Distributions

Inter-correlation distributions for different intra-correlations (synthetic data generated as described in Section 2.2 of the paper).

Inter-correlation distributions for different intra-correlations (synthetic data generated as described in Section 2.2 of the paper).

The code used to produce Figure 2 in the paper can be found in the following file: InterDistribution.R

3. Characterizing the Number of Discoveries

3.1 Synthetic dataset

In this section, a given region was simulated as a standard multivariate normal dataset with \(p\) variables, \(n\) samples per variable and a given intra-correlation:

p1 <- 200 # number of variables
n <- 100 # number of samples
rintra <- .9 # intra-correlation
Y1 <- rnorm_multi(n,mu=rep(0,p1),sd=rep(1,p1), r=rintra) # region

This dataset corresponds to that described in Section 2.2 of the paper.

3.2 Maximum-based expression: \(N^{ab}\)

The code used to generate Figure 2 (which is represented in Figure 3 of the paper) and estimate \(N^{ab}\) and its average can be found in the following file: NbDiscoveries.R.
Normalized number of discoveries $N^{ab}/p_a$ as a function of the correlation threshold (synthetic data generated as described in Section 2.2 of the paper).

Normalized number of discoveries \(N^{ab}/p_a\) as a function of the correlation threshold (synthetic data generated as described in Section 2.2 of the paper).

3.3 Sum-based expression: \(N_e^{ab}\)

The code used to generate Figure 3 of the paper and estimate \(N_e^{ab}\) and its average can be found in the following file: NbDiscoveries.R.
Normalized number of discoveries $N_e^{ab}/p_a p_b$ as a function of the correlation threshold (synthetic data generated as described in Section 2.2 of the paper).

Normalized number of discoveries \(N_e^{ab}/p_a p_b\) as a function of the correlation threshold (synthetic data generated as described in Section 2.2 of the paper).

3.4 \(\widehat{\nu}^{ab}\) and \(\widehat{\nu}_e^{ab}\)

The code used to produce Figure 4 of the paper and represent the estimated number of discoveries \(N^{ab}, N_e^{ab}\) and their proposed approximations \(\widehat{\nu}^{ab}\), \(\widehat{\nu}_e^{ab}\) can be found in the following file: ENbDiscoveries.R.
Normalized number of discoveries $\widehat{
u}^{ab}/p_a$, $\widehat{
u}_e^{ab}/p_ap_b$, $N^{ab}/p_a$ and $N_e^{ab}/p_a p_b$ as a function of the correlation threshold for different minimal intra-correlation values $
ho_{min}^{aa}, 
ho_{min}^{bb}$ and a zero inter-correlation, with $p_a = p_b = 200, n=100$ (synthetic data generated as described in Section 2.2 of the paper).

Normalized number of discoveries \(\widehat{ u}^{ab}/p_a\), \(\widehat{ u}_e^{ab}/p_ap_b\), \(N^{ab}/p_a\) and \(N_e^{ab}/p_a p_b\) as a function of the correlation threshold for different minimal intra-correlation values $ ho_{min}^{aa}, ho_{min}^{bb}$ and a zero inter-correlation, with \(p_a = p_b = 200, n=100\) (synthetic data generated as described in Section 2.2 of the paper).

4. Correlation Threshold

The source code used to generate Figure 5 in the paper and compute the correlation thresholds for different intra-correlation values and different sample sizes in Section 5 of the paper can be found in the following file: Thresholds.R. Functions to estimate correlation thresholds can be found in: threshold_functions.R.

Comparison of different critical thresholds for 50 replicates of data simulated as described in Section 2.2 with $p=150$, $n=100$, zero inter-correlation and varying constant intra-correlation values. The FWER- and quantile-based thresholds were computed for $alpha=0$.

Comparison of different critical thresholds for 50 replicates of data simulated as described in Section 2.2 with \(p=150\), \(n=100\), zero inter-correlation and varying constant intra-correlation values. The FWER- and quantile-based thresholds were computed for \(alpha=0\).

5. Network Inference on a Synthetic Dataset

The code used to infer networks from synthetic datasets as described in Section 6 and to fill Table 1 in the paper can be found in the following file: NetworkInferenceSeveralRegionsData.R.

6. Network Inference on a Real Dataset: fMRI Brain Data of Rats

Brain functional connectivity networks of a dead rat inferred applying the proposed method of the paper with the quantile-based threshold.

Brain functional connectivity networks of a dead rat inferred applying the proposed method of the paper with the quantile-based threshold.

The code used to infer and plot functional brain connectivity networks of any given rat can be found in the following file: NetworkInferenceRatData.R. The raw dataset used in the paper is freely available at https://10.5281/zenodo.2452871.