
This is the PyTorch implementation of CKN-seq for reproducing the results of the paper:

Dexiong Chen, Laurent Jacob, Julien Mairal. Biological Sequence Modeling with Convolutional Kernel Networks. bioRxiv preprint, 2018.

To reproduce the results of the paper, please use the recomb2019 branch.

CKN-seq is free for academic use only. © 2018 All rights reserved - Inria, CNRS, Univ. Grenoble Alpes.

If you intend to use the software in a non-academic context, please contact dexiong.chen@inria.fr

Installation

We strongly recommend using Anaconda to install the following packages (see the PyTorch website for installing PyTorch itself):

numpy
scipy
scikit-learn=0.19.0
pytorch=0.4.1
biopython=1.69
pandas
matplotlib
seaborn

Then install CKN-seq:

pip install .

or simply add the repository to your PYTHONPATH:

export PYTHONPATH=$PWD:$PYTHONPATH
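Since the versions of scikit-learn, PyTorch and Biopython are pinned, it can help to check the environment before running the experiments. The following is a minimal sketch of a hypothetical helper (not part of CKN-seq) that imports each package and compares its __version__ attribute with the pins listed above. It assumes the usual import names sklearn, torch and Bio for scikit-learn, PyTorch and Biopython.

```python
import importlib

# Pins from the package list above, keyed by import name (assumption:
# scikit-learn imports as sklearn, PyTorch as torch, Biopython as Bio).
PINNED = {"sklearn": "0.19.0", "torch": "0.4.1", "Bio": "1.69"}
UNPINNED = ["numpy", "scipy", "pandas", "matplotlib", "seaborn"]


def check_environment(pinned=PINNED, unpinned=UNPINNED):
    """Return {module: problem} for missing modules or version mismatches."""
    problems = {}
    for name in list(pinned) + list(unpinned):
        try:
            module = importlib.import_module(name)
        except ImportError:
            problems[name] = "missing"
            continue
        expected = pinned.get(name)
        found = getattr(module, "__version__", None)
        # exact string comparison against the pinned version
        if expected is not None and found != expected:
            problems[name] = "found {}, expected {}".format(found, expected)
    return problems


if __name__ == "__main__":
    for name, problem in sorted(check_environment().items()):
        print("{}: {}".format(name, problem))
```

The comparison is an exact string match, so build suffixes in a version string (for example a CUDA tag) are reported as mismatches and need to be judged by hand.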

Training models on the ENCODE ChIP-seq and SCOP 1.67 datasets

Download the data from the DeepBind website and put the encode folder in data/. Download the SCOP 1.67 data from the jlstm website and put it in data/SCOP167-superfamily/.

Supervised training without perturbation

  • One-layer model

    To train a one-layer CKN-seq model on ENCODE:

     cd experiments
     python train_encode.py --num-motifs 128 --method sup --outdir ../out

    or on SCOP 1.67:

     cd experiments
     python train_scop.py --num-motifs 128 --method sup --outdir ../out

    Add --use-cuda if CUDA is available. Specify --tfid INDEX_OF_DATASET to train a model on a given dataset. Add --logo when training and evaluating your model to generate sequence logos.

  • Two-layer model

     cd experiments
     python train_encode.py --num-layers 2 --num-motifs 64 16 --len-motifs 12 3 --subsamplings 2 1 --kernel-params 0.3 0.6 --method sup --outdir ../out
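To train one-layer models on several ENCODE datasets in turn with the --tfid option, a small driver script can loop over dataset indices. This is a hypothetical helper, not part of CKN-seq: it only uses the command-line flags shown above, assumes it is launched from the repository root (so that experiments/ is a subdirectory), and the indices you pass are placeholders, since which indices are valid depends on the downloaded data.

```python
import subprocess
import sys


def encode_commands(tfids, num_motifs=128, method="sup", outdir="../out",
                    use_cuda=False):
    """Build one train_encode.py command line per dataset index."""
    commands = []
    for tfid in tfids:
        cmd = [sys.executable, "train_encode.py",
               "--num-motifs", str(num_motifs), "--method", method,
               "--outdir", outdir, "--tfid", str(tfid)]
        if use_cuda:
            cmd.append("--use-cuda")
        commands.append(cmd)
    return commands


def run_all(tfids, **kwargs):
    """Run the commands one after another from experiments/, stopping on failure."""
    for cmd in encode_commands(tfids, **kwargs):
        subprocess.run(cmd, cwd="experiments", check=True)
```

For example, run_all(range(3), use_cuda=True) would train one model per index 0, 1 and 2 on the GPU. Building the command lists separately from running them makes it easy to print or inspect them first.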

Unsupervised training

CKN-seq also supports unsupervised training; however, the number of filters must be much larger to achieve performance similar to supervised training. Here is an example of training a one-layer model:

python train_encode.py --num-motifs 4096 --batch-size 200 --method unsup --outdir ../out
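To give an idea of what learning filters without labels can look like, here is a minimal sketch of one common approach for convolutional kernel networks: sample k-mers from the training sequences, one-hot encode and L2-normalize them, then cluster them with spherical k-means and use the unit-norm centroids as filters. This is an illustration under that assumption, written with NumPy; the function names are hypothetical, the sequences are assumed to contain only A, C, G, T, and this is not the code of ckn.models.

```python
import numpy as np

ALPHABET = "ACGT"


def one_hot_kmer(kmer):
    """Flattened one-hot encoding of a k-mer over ACGT (length 4 * k)."""
    x = np.zeros((len(kmer), len(ALPHABET)))
    for i, c in enumerate(kmer):
        x[i, ALPHABET.index(c)] = 1.0
    return x.ravel()


def sample_patches(seqs, k, n_patches, rng):
    """Sample n_patches random k-mers from seqs and L2-normalize them."""
    patches = []
    for _ in range(n_patches):
        seq = seqs[rng.integers(len(seqs))]
        start = rng.integers(len(seq) - k + 1)
        patches.append(one_hot_kmer(seq[start:start + k]))
    X = np.stack(patches)
    return X / np.linalg.norm(X, axis=1, keepdims=True)


def spherical_kmeans(X, n_filters, n_iter=20, rng=None):
    """Cluster the unit-norm rows of X; return n_filters unit-norm centroids."""
    if rng is None:
        rng = np.random.default_rng(0)
    Z = X[rng.choice(len(X), n_filters, replace=False)].copy()
    for _ in range(n_iter):
        # rows are unit-norm, so the dot product is the cosine similarity
        assign = np.argmax(X @ Z.T, axis=1)
        for j in range(n_filters):
            members = X[assign == j]
            if len(members) == 0:
                Z[j] = X[rng.integers(len(X))]  # re-seed an empty cluster
            else:
                centroid = members.sum(axis=0)
                Z[j] = centroid / np.linalg.norm(centroid)
    return Z


rng = np.random.default_rng(0)
seqs = ["".join(rng.choice(list(ALPHABET), 50)) for _ in range(20)]
X = sample_patches(seqs, k=12, n_patches=1000, rng=rng)
filters = spherical_kmeans(X, n_filters=64, rng=rng)
print(filters.shape)  # (64, 48)
```

Each filter is a unit vector of length 4 * k, so with motifs of length 12 the filters have 48 entries.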

Augmented training and hybrid training with perturbation

Supervised CKN-seq can achieve better performance on small-scale datasets by augmenting the training samples with mismatch noise. For example, to train with 20% mismatch noise added to the training samples:

python train_encode.py --epochs 150 --method sup --outdir ../out --noise 0.2
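A minimal sketch of mismatch noise in plain Python follows. It assumes that a noise level of 0.2 means each nucleotide is independently replaced, with probability 0.2, by one of the other letters; the function is hypothetical and may differ from what train_encode.py does (for instance in whether the noise is redrawn at every epoch).

```python
import random


def add_mismatch_noise(seq, noise, rng, alphabet="ACGT"):
    """Return a copy of seq where each position is, with probability noise,
    replaced by a different letter drawn uniformly from alphabet."""
    out = []
    for c in seq:
        if rng.random() < noise:
            out.append(rng.choice([a for a in alphabet if a != c]))
        else:
            out.append(c)
    return "".join(out)


rng = random.Random(0)
print(add_mismatch_noise("ACGTACGTACGTACGTACGT", 0.2, rng))
```

Drawing a fresh noisy copy of each training sequence every time it is used, rather than once before training, is one way to expose the model to many perturbed versions of the same sample.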

Performance can be further improved with the hybrid model described in the paper, once you have trained an unsupervised model and saved it in $PATH_TO_A_TRAINED_UNSUP_MODEL:

python train_encode.py --epochs 150 --method sup --outdir ../out --noise 0.2 --unsup-dir $PATH_TO_A_TRAINED_UNSUP_MODEL

Further use of CKN-seq

To create a CKN-seq network in your own code:

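from ckn.models import supCKN, unsupCKN

method = "sup"  # "sup" (supervised) or "unsup" (unsupervised)
if method == "sup":
    M = supCKN
elif method == "unsup":
    M = unsupCKN

n_alphabet = 4  # DNA alphabet: A, C, G, T
# the three lists appear to give, per layer, the number of motifs (filters),
# the motif lengths and the subsampling factors, like --num-motifs,
# --len-motifs and --subsamplings on the command line (check ckn/models.py)
model = M(n_alphabet, [16], [12], [1], alpha=1e-6, reverse_complement=True,
          fit_bias=True, global_pool='mean')
# consult ckn/models.py for more options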

Then train the network with PyTorch data loaders:

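# sup train: train_loader and val_loader are torch.utils.data.DataLoader objects;
# criterion, optimizer and lr_scheduler are standard PyTorch objects, e.g.
# nn.BCEWithLogitsLoss(), optim.Adam(model.parameters()) and a scheduler from
# optim.lr_scheduler (see experiments/train_encode.py for the settings used there)
model.sup_train(train_loader, criterion, optimizer, lr_scheduler, epochs=100,
                val_loader=val_loader, use_cuda=True)

# unsup train with cross-validation of the regularization parameter alpha;
# n_sampling_patches (presumably the number of patches sampled from the data)
# and search_grid (the candidate alpha values) must be set by you
model.unsup_cross_val(train_loader,
                      n_sampling_patches=n_sampling_patches,
                      alpha_grid=search_grid,
                      use_cuda=True)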
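The training methods take PyTorch data loaders. As a minimal sketch, assuming the model expects one-hot encoded DNA as float tensors of shape (batch, 4, length), the channel-first layout used by torch.nn.Conv1d, equal-length sequences can be encoded with NumPy as follows. The helper is hypothetical, and leaving letters other than A, C, G, T (such as N) as all-zero columns is also an assumption; see experiments/train_encode.py for how the package itself builds its loaders.

```python
import numpy as np

ALPHABET = "ACGT"
INDEX = {c: i for i, c in enumerate(ALPHABET)}


def one_hot_batch(seqs):
    """Encode equal-length DNA strings as a float32 array of shape (N, 4, L).

    Letters outside ACGT (e.g. N) stay all-zero columns (an assumption)."""
    length = len(seqs[0])
    X = np.zeros((len(seqs), len(ALPHABET), length), dtype=np.float32)
    for n, seq in enumerate(seqs):
        if len(seq) != length:
            raise ValueError("all sequences must have the same length")
        for pos, c in enumerate(seq.upper()):
            if c in INDEX:
                X[n, INDEX[c], pos] = 1.0
    return X


X = one_hot_batch(["ACGTN", "TTGCA"])
print(X.shape)  # (2, 4, 5)
```

The array can then be wrapped with torch.utils.data.TensorDataset(torch.from_numpy(X), torch.from_numpy(y)) and passed to torch.utils.data.DataLoader(dataset, batch_size=128, shuffle=True), with y a float32 array of 0/1 labels if a loss such as nn.BCEWithLogitsLoss is used.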