This is the PyTorch implementation of CKN-seq for reproducing the results of the paper
Dexiong Chen, Laurent Jacob, Julien Mairal. Biological Sequence Modeling with Convolutional Kernel Networks. bioRxiv preprint. 2018.
If you want to reproduce the results in the paper, please use the branch recomb2019.
CKN-seq is free for academic use only. © 2018 All rights reserved - Inria, CNRS, Univ. Grenoble Alpes
If you intend to use the software in a non-academic context, please contact dexiong.chen@inria.fr
Installation
We strongly recommend using Anaconda to install the following packages (see the PyTorch website for PyTorch installation instructions):
numpy
scipy
scikit-learn=0.19.0
pytorch=0.4.1
biopython=1.69
pandas
matplotlib
seaborn
Then install CKN-seq with
pip install .
or simply run
export PYTHONPATH=$PWD:$PYTHONPATH
Training models on the ENCODE ChIP-seq and SCOP 1.67 datasets
Download the data from the DeepBind website and put the encode folder in data/. Download the data from the jlstm website and put it in data/SCOP167-superfamily/.
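Before launching training, it can help to confirm that the downloads ended up where the instructions above put them. The sketch below is only a convenience check and is not part of CKN-seq. It assumes the DeepBind folder is named encode and sits directly under data/; the helper name missing_data_dirs is hypothetical.

```python
from pathlib import Path

# Expected locations, taken from the download instructions above.
# Assumption: the DeepBind "encode" folder is placed directly under data/.
EXPECTED_DIRS = ("encode", "SCOP167-superfamily")


def missing_data_dirs(data_root="data"):
    """Return the expected dataset folders that are absent under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs()
    if missing:
        print("Missing dataset folders under data/:", ", ".join(missing))
    else:
        print("All expected dataset folders are present.")
```

Run it from the repository root so that the relative path data/ resolves correctly.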
Supervised training without perturbation
- One-layer model
To train a one-layer CKN-seq model on the ENCODE datasets:
cd experiments
python train_encode.py --num-motifs 128 --method sup --outdir ../out
or on SCOP 1.67:
cd experiments
python train_scop.py --num-motifs 128 --method sup --outdir ../out
You can add --use-cuda if you have CUDA installed. Specify --tfid INDEX_OF_DATASET to train a model on a given dataset. Add --logo when training and evaluating your model to generate sequence logos.
- Two-layer model
cd experiments
python train_encode.py --num-layers 2 --num-motifs 64 16 --len-motifs 12 3 --subsamplings 2 1 --kernel-params 0.3 0.6 --method sup --outdir ../out
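In the two-layer command above, each per-layer option appears to take one value per layer (--num-motifs 64 16, --len-motifs 12 3, and so on). The sketch below checks that convention for a set of options before a long run is launched. It is an illustration only, not the argument parser used by train_encode.py; the function name check_per_layer_options and the dictionary layout are assumptions.

```python
def check_per_layer_options(num_layers, options):
    """Return the names of options whose value count differs from num_layers.

    `options` maps a flag name to the list of values passed for it.
    Hypothetical helper: the training scripts may validate differently.
    """
    return [name for name, values in options.items() if len(values) != num_layers]


# Values copied from the two-layer command above.
two_layer = {
    "--num-motifs": [64, 16],
    "--len-motifs": [12, 3],
    "--subsamplings": [2, 1],
    "--kernel-params": [0.3, 0.6],
}

bad = check_per_layer_options(2, two_layer)
print("mismatched options:", bad)  # an empty list means every option has 2 values
```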
Unsupervised training
CKN-seq also supports unsupervised training. However, the number of filters must be much larger to reach performance similar to that of supervised training. Here is an example of training a one-layer model:
python train_encode.py --num-motifs 4096 --batch-size 200 --method unsup --outdir ../out
Augmented training and hybrid training with perturbation
Supervised CKN-seq can achieve better performance on small-scale datasets by augmenting the training samples with mismatch noise. For example, to train with 20% mismatch noise added to the training samples:
python train_encode.py --epochs 150 --method sup --outdir ../out --noise 0.2
Performance can be further improved with the hybrid model described in the paper, once you have trained an unsupervised model and saved it in $PATH_TO_A_TRAINED_UNSUP_MODEL.
python train_encode.py --epochs 150 --method sup --outdir ../out --noise 0.2 --unsup-dir $PATH_TO_A_TRAINED_UNSUP_MODEL
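To give an intuition for what --noise 0.2 means, the sketch below perturbs a DNA string by substituting each position, independently with probability 0.2, with a different nucleotide. This is a plain-Python illustration under stated assumptions, not the code CKN-seq runs: the function name add_mismatch_noise is hypothetical, and the actual implementation may sample positions or replacement letters differently.

```python
import random

DNA_ALPHABET = "ACGT"


def add_mismatch_noise(seq, noise=0.2, alphabet=DNA_ALPHABET, rng=None):
    """Replace each character, independently with probability `noise`,
    by a different character drawn uniformly from `alphabet`."""
    rng = rng or random.Random()
    out = []
    for ch in seq:
        if rng.random() < noise:
            # Choose among the other letters so a selected position always changes.
            out.append(rng.choice([a for a in alphabet if a != ch]))
        else:
            out.append(ch)
    return "".join(out)


rng = random.Random(0)  # fixed seed so the example is repeatable
original = "ACGTACGTACGTACGTACGT"
noisy = add_mismatch_noise(original, noise=0.2, rng=rng)
print(original)
print(noisy)
```

The perturbed sequence keeps the original length, so it can be fed to the model like any other training sample.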
Further use of CKN-seq
If you want to create a CKN-seq network, run
from ckn.models import supCKN, unsupCKN

method = "sup"  # or "unsup"
if method == "sup":
    M = supCKN
elif method == "unsup":
    M = unsupCKN
n_alphabet = 4  # alphabet size (4 for DNA)
model = M(n_alphabet, [16], [12], [1], alpha=1e-6, reverse_complement=True, fit_bias=True,
          global_pool='mean')
# consult ckn/models.py for more options
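The options n_alphabet = 4 and reverse_complement=True above refer to the four DNA letters and to the reverse-complement strand. The sketch below illustrates both notions in plain Python. It assumes a one-hot encoding with channel order A, C, G, T; this ordering and the helper names are hypothetical and may differ from what ckn/models.py expects.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
ALPHABET = "ACGT"  # assumed channel order; its length plays the role of n_alphabet


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def one_hot(seq):
    """Encode a DNA string as a list of indicator rows, one row per position."""
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]


seq = "AACGT"
rc = reverse_complement(seq)
print(rc)  # ACGTT

# With the A, C, G, T ordering, reverse-complementing a one-hot matrix amounts
# to reversing both the position axis and the channel axis.
flipped = [row[::-1] for row in reversed(one_hot(seq))]
print(flipped == one_hot(rc))  # True
```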
Then you can train the network with a given data loader
# supervised training
model.sup_train(data_loader, criterion, optimizer, lr_scheduler, epochs=100,
                val_loader=val_loader, use_cuda=True)
# unsupervised training, with cross-validation of the regularization parameter
model.unsup_cross_val(train_loader,
                      n_sampling_patches=args.n_sampling_patches,
                      alpha_grid=search_grid,
                      use_cuda=True)
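The alpha_grid argument above receives the candidate regularization values to cross-validate. One common choice, assumed here rather than taken from the CKN-seq code, is a logarithmically spaced grid. The helper name log_grid and the range below are illustrative, and the sketch assumes alpha_grid accepts a list of floats.

```python
def log_grid(low_exp, high_exp, base=10.0):
    """Return [base**low_exp, ..., base**high_exp] for integer exponents."""
    if low_exp > high_exp:
        raise ValueError("low_exp must not exceed high_exp")
    return [base ** e for e in range(low_exp, high_exp + 1)]


# Hypothetical search range; tune it to your data.
search_grid = log_grid(-6, -1)
print(search_grid)
```

The lower end of this range matches the alpha=1e-6 used when constructing the model above, which makes it a natural starting point.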