Integration of Attentive-SPIDNA
We would like to integrate the new version of SPIDNA that includes our custom attention mechanism into dnadna. The architecture is called MixAttSPIDNA in pjobic's code, but it should be renamed Attentive-SPIDNA, since that is the name we will be using in the paper. We tested several versions of this architecture, but ideally we would like to include the one that yielded the best results, i.e., MixAttSPIDNA_HubBlock_ResNet_OnScenario_Unfreezing in the pjobic/dev/hdgp_scenario branch.
The priority is to release the trained architecture available at /home/tau/pjobic/dnadna/hdgp/run_015/hdgp_run_015_best_net.pth on the titanic server. This requires modifying the dataloader, because predictions with this network are made per scenario: the input tensor should contain all the DNA fragments (SNP matrices) that correspond to the same scenario.
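To make the per-scenario requirement concrete, here is a minimal sketch of what the grouping could look like. This is not pjobic's implementation: the function name `group_by_scenario` and the assumption that every replicate is already padded/cropped to the same 50 x 400 shape are mine.

```python
from collections import defaultdict

import torch


def group_by_scenario(samples):
    """Group per-replicate samples into one batch per scenario.

    ``samples`` is assumed to be an iterable of ``(scenario_idx, snp_matrix)``
    pairs, where each ``snp_matrix`` is a tensor already padded/cropped to a
    common shape (e.g. 50 haplotypes x 400 SNPs).
    """
    by_scenario = defaultdict(list)
    for scenario_idx, snp_matrix in samples:
        by_scenario[scenario_idx].append(snp_matrix)
    # One tensor of shape (n_replicates, n_haplotypes, n_snps) per scenario.
    return {idx: torch.stack(mats) for idx, mats in by_scenario.items()}
```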
Here are some caveats and issues that I ran into when reusing pjobic's code, which might be useful:
- The number of SNPs and haplotypes (columns and rows) of the SNP matrix can vary in the real and simulated datasets. From our experiments, the best way to handle this is to pad all matrices with the value `255` (pjobic wrote `-1` in the code, but this value becomes `255` due to type conversion). The final dimensions of the matrix should be 50 haplotypes x 400 SNPs (matrices that are originally bigger than that are cropped).
- If a column only contains `0` after removing haplotypes (cropping) from a SNP matrix, it is no longer a SNP. Therefore the column should be removed and the vector of positions should be updated.
Here is a quick-and-dirty piece of code that takes the two points above into account: attentive_spidna_dataloader.py
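In case the attached file is hard to follow, here is a minimal, self-contained sketch of those two operations. The helper name `pad_crop_snp_matrix` and the uint8 assumption are mine, not taken from the attached code:

```python
import numpy as np


def pad_crop_snp_matrix(matrix, positions, n_hap=50, n_snp=400, pad_value=255):
    """Pad/crop a SNP matrix to ``n_hap`` x ``n_snp`` and fix the positions.

    ``matrix`` is assumed to be a 2D uint8 array (haplotypes x SNPs) of 0/1
    values, and ``positions`` a 1D array with one entry per SNP column.
    """
    # Crop haplotypes (rows) first, since cropping can create columns that
    # only contain 0.
    matrix = matrix[:n_hap]

    # A column that only contains 0 after cropping is no longer a SNP:
    # drop it and update the vector of positions accordingly.
    polymorphic = matrix.any(axis=0)
    matrix = matrix[:, polymorphic]
    positions = positions[polymorphic]

    # Crop SNPs (columns) to at most n_snp.
    matrix = matrix[:, :n_snp]
    positions = positions[:n_snp]

    # Pad missing rows/columns with pad_value. Note that writing -1 into a
    # uint8 array silently becomes 255, which is why 255 is used explicitly.
    padded = np.full((n_hap, n_snp), pad_value, dtype=np.uint8)
    padded[:matrix.shape[0], :matrix.shape[1]] = matrix
    return padded, positions
```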
The second step is to make this architecture trainable, which will require more modifications because:
- It was trained using a learning rate scheduler. This feature has already been requested in this issue.
- The network is first trained to make one prediction per DNA fragment (also called a "replicate" in the code), and then the part of the network making predictions per scenario is trained. As suggested by the name "Unfreezing", when the "scenario part" of the network is trained, the weights of the first part are also updated rather than kept frozen.
- For this last point, I would need to take more time and dive into the code to see how pjobic implemented it, but the idea is to let the minibatches include all the SNP matrices of multiple scenarios, so that the loss combines the predictions for multiple scenarios before being backpropagated (a rough sketch of the whole two-phase setup follows this list).
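To make the three points above concrete, here is a rough, hypothetical sketch of the two-phase training loop. It is not pjobic's implementation: the module names `replicate_net` and `scenario_head`, the choice of `StepLR` as scheduler, and the batch layout are all my assumptions, meant only to illustrate per-replicate pre-training followed by unfrozen per-scenario training with a scheduler.

```python
import torch
from torch import nn, optim


def train_two_phase(replicate_net, scenario_head, loader,
                    n_epochs_phase1=10, n_epochs_phase2=10):
    """Hypothetical two-phase training loop (all names are assumptions).

    ``loader`` is assumed to yield ``(scenario_batches, targets)``, where
    ``scenario_batches`` is a list of tensors, one per scenario, each of shape
    (n_replicates, n_haplotypes, n_snps), and ``targets`` holds one parameter
    vector per scenario.
    """
    criterion = nn.MSELoss()

    # Phase 1: train the replicate part alone, one prediction per DNA
    # fragment ("replicate"); the scenario target is repeated per replicate.
    opt = optim.Adam(replicate_net.parameters(), lr=1e-3)
    # The original run used a learning rate scheduler; StepLR is a placeholder.
    scheduler = optim.lr_scheduler.StepLR(opt, step_size=5, gamma=0.5)
    for _ in range(n_epochs_phase1):
        for scenario_batches, targets in loader:
            opt.zero_grad()
            loss = sum(
                criterion(replicate_net(batch),
                          target.expand(batch.shape[0], -1))
                for batch, target in zip(scenario_batches, targets)
            )
            loss.backward()
            opt.step()
        scheduler.step()

    # Phase 2 ("Unfreezing"): train the scenario head while the replicate
    # part's weights keep being updated rather than staying frozen. Each
    # minibatch contains all the SNP matrices of several scenarios, so the
    # loss combines multiple scenarios' predictions before backpropagation.
    params = list(replicate_net.parameters()) + list(scenario_head.parameters())
    opt = optim.Adam(params, lr=1e-4)
    scheduler = optim.lr_scheduler.StepLR(opt, step_size=5, gamma=0.5)
    for _ in range(n_epochs_phase2):
        for scenario_batches, targets in loader:
            opt.zero_grad()
            # One prediction per scenario, aggregated over its replicates
            # by the scenario head.
            preds = torch.stack([
                scenario_head(replicate_net(batch))
                for batch in scenario_batches
            ])
            loss = criterion(preds, targets)
            loss.backward()
            opt.step()
        scheduler.step()
```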
I hope I am not missing anything, and I'm available to answer any questions.