Commit 2090eb01 authored by hua-ting.yao's avatar hua-ting.yao Committed by node

Update on Overleaf.

parent 4f473f1c
......@@ -141,8 +141,6 @@ fontsize=\footnotesize
\maketitle
\TODO{Release 1.0; update references}
\begin{abstract}
Applications in biotechnology and bio-medical research call for effective strategies to design novel RNAs with very specific properties. Such advanced design tasks require support by computational design tools, but at the same time put high demands on their flexibility and expressivity to model the application-specific requirements. To address such demands, we present the computational framework \Infrared. It supports developing advanced customized design tools, which generate RNA sequences with specific properties, often in a few lines of Python code.
This text guides the reader in tutorial format through the development of complex design applications.
......@@ -223,7 +221,7 @@ Then, we simply add constraints for all these structures to our model. For this
and generate samples from this model in exactly the same way as before. The involved computation and algorithmic structure required by the multi-target case is handled transparently "under the hood". Internally, the problem is solved by an efficient algorithm that automatically adapts in complexity to the dependency network due to the multiple target structures.
\paragraph{Chapter overview.} For preparation, we describe the installation of the library and recommended additional software. In Section Methods, we systematically develop a complex multi-target design application using the Infrared framework. We start by targeting positive design objectives in increasingly complex settings. This is followed by a discuss of several approaches that integrate negative design objectives. Thus, we show-case constraint generation as we aplied in RNAPOND, stochastic optimization based on the library and finally, a full-fledged application to the design of an AND-riboswitch.
\paragraph{Chapter overview.} For preparation, we describe the installation of the library and recommended additional software. In Section Methods, we systematically develop a complex multi-target design application using the Infrared framework. We start by targeting positive design objectives in increasingly complex settings. This is followed by a discussion of several approaches that integrate negative design objectives. Thus, we showcase constraint generation as we applied it in RNAPOND, stochastic optimization based on the library, and finally a full-fledged application to the design of an AND-riboswitch.
\section{Material: Installing \Infrared }
\label{sec:installation}
......@@ -233,7 +231,7 @@ We recommend installing \Infrared using the package manager
\subsubsection{Package manager \software{Conda} installation.}
Unless \software{Conda} is already installed on you system, we recommend to install it in the form of \software{Miniconda} from
Unless \software{Conda} is already installed on your system, we recommend installing it in the form of \software{Miniconda} from
\url{https://conda.io/en/latest/miniconda.html}. The page contains installation instructions for Windows, macOS, and Linux.
\subsubsection{\Infrared installation.}
......@@ -244,7 +242,7 @@ To use Conda, it typically has to be activated by running the shell command
in a terminal.
All required and recommended software can be installed from the command line by
\begin{bashcode}
conda install -c conda-forge -c bioconda infrared viennarna jupyter
conda install -c conda-forge -c bioconda 'infrared=1.0b' viennarna jupyter
\end{bashcode}
The command installs the packages \texttt{infrared}, \texttt{viennarna}, and \texttt{jupyter}; the flags \texttt{-c conda-forge -c bioconda} specify the required channels. For the largest part of the tutorial, only the package \texttt{\bf infrared} is required.
For energy evaluation and RNA structure prediction, which we use for advanced, negative design, we utilize the Vienna RNA package~\cite{Lorenz2011}. As of yet, it cannot be installed under Windows via Conda; Windows users must therefore remove \texttt{\bf viennarna} from the installation command. Note that there is also no other convenient way to install the \emph{Python interface} to the Vienna RNA package on Windows. Nevertheless, most of the examples can be run without Python bindings to the Vienna RNA package. For some of the advanced design examples, we provide a workaround, which allows Windows users to run the examples after installing the package using its Windows installer (\url{https://www.tbi.univie.ac.at/RNA/#download}, select operating system "Windows" to obtain the download link).
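To see which parts of the tutorial can be run in a given environment, one can check whether the Python modules are importable. The following is a minimal sketch using only the Python standard library; it assumes that the packages \texttt{infrared} and \texttt{viennarna} provide the modules \texttt{infrared} and \texttt{RNA}, respectively. The helper \texttt{available} is introduced here for illustration only and is not part of \Infrared.

```python
import importlib.util


def available(modules):
    """Map each module name to True if it can be found in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


# Assumed import names: 'infrared' (Infrared) and 'RNA' (Vienna RNA Python bindings).
status = available(["infrared", "RNA"])
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

If \texttt{RNA} is reported as missing (as expected on Windows), the examples that rely on energy evaluation or structure prediction require the workaround mentioned above.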
......@@ -253,8 +251,7 @@ For energy evaluation and RNA structure prediction, which we use for advanced, n
One can run all code of this tutorial from an IPython notebook using \texttt{\bf jupyter} (or a comparable system).
The tutorial notebook is available from
Infrared's Gitlab repository \url{
https://gitlab.inria.fr/amibio/Infrared/-/blob/develop/Doc/bookchapter-tutorial.ipynb}.\todo{update}
Infrared's Gitlab repository \url{https://gitlab.inria.fr/amibio/Infrared/-/blob/v1.0b/Doc/bookchapter-tutorial.ipynb}.
Using \texttt{jupyter-notebook}, it is loaded by the command
\begin{bashcode}
jupyter-notebook bookchapter-tutorial.ipynb
......@@ -474,7 +471,7 @@ This example provides a first demonstration of the targeting flexibility due to
\subsection{Multiple-target structures---Complex dependencies}
Due to the compositionality of constraints (and functions) in \Infrared, defining multiple targets does not look any different than defining a single target structure. Thus, let us right away define model to target the structures of Fig.~\ref{fig:target-structures}, which were previously defined in the list \texttt{targets} with lenght \texttt{n}.
Due to the compositionality of constraints (and functions) in \Infrared, defining multiple targets does not look any different from defining a single target structure. Thus, let us right away define a model to target the structures of Fig.~\ref{fig:target-structures}, which were previously defined in the list \texttt{targets} with length \texttt{n}.
\begin{Pythoncode}
model = Model(n,4)
......@@ -629,7 +626,7 @@ Following the code above, RNA sequences are first sampled without further limita
\begin{figure}
\centering
\includegraphicscenter[width=\textwidth]{Figs/count_matrix}
\caption{Base pair frequencies and disruptive base pairs in the first, second and final iteration of a typical run of our RNAPOND-like constraint generation design strategy targeting \texttt{"..(((..((((.....)))).((...(((.....)))...))...))).."}. In our triangular matrix visualization, each point refers to one potential base pair (i,j); base pair frequencies the generated sample are color coded and the target base pairs are highlighted by blue squares. By addign disruptive base pairs in each iteration (red squares), the frequencies of the base pairs are shifted towards the target ones.}
\caption{Base pair frequencies and disruptive base pairs in the first, second and final iteration of a typical run of our RNAPOND-like constraint generation design strategy targeting \texttt{"..(((..((((.....)))).((...(((.....)))...))...))).."}. In our triangular matrix visualization, each point refers to one potential base pair (i,j); base pair frequencies the generated sample are color coded and the target base pairs are highlighted by blue squares. By adding disruptive base pairs in each iteration (red squares), the frequencies of the base pairs are shifted towards the target ones.}
\label{fig:count-matrix}
\end{figure}
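The counting step behind Fig.~\ref{fig:count-matrix} can be sketched in plain Python. This is an illustrative sketch only, not the chapter's implementation: the function names \texttt{parse\_pairs} and \texttt{disruptive\_pairs}, the frequency threshold, and the toy input are assumptions. The input structures stand in for the structures predicted for a sample of sequences; base pairs that are frequent but not part of the target are reported as candidates for disruptive base pairs.

```python
from collections import Counter


def parse_pairs(structure):
    """Return the set of base pairs (i, j), 0-based, of a dot-bracket string."""
    stack, pairs = [], set()
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.add((stack.pop(), j))
    return pairs


def disruptive_pairs(structures, target, threshold=0.5):
    """Non-target base pairs occurring in more than `threshold` of the structures.

    The threshold is a hypothetical selection criterion chosen for illustration.
    """
    counts = Counter(bp for s in structures for bp in parse_pairs(s))
    target_pairs = parse_pairs(target)
    n = len(structures)
    return sorted(
        bp for bp, c in counts.items()
        if bp not in target_pairs and c / n > threshold
    )


# Toy data: a short target and five "predicted" structures (made up for this sketch).
target = "((...))"
predicted = ["((...))", ".(...).", "(.....)", ".((.)).", ".((.))."]
print(disruptive_pairs(predicted, target, threshold=0.3))
```

In a constraint generation loop, the reported pairs would be forbidden by additional constraints before the next round of sampling.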
......@@ -687,7 +684,7 @@ Note that here the number of 1000 iterations and the temperature 0.015 were chos
\label{fig:stochasitic-optimization}
\end{figure}
Figure~\ref{fig:stochasitic-optimization} shows the best multi-defects in 48 runs after up to 5120 iterations. The best sequence \texttt{CCCUGUGCUCCAUGGGCCCCCGUCAGGGGACGGGG} that was found in these 48 runs of optimization had an multi-defect of 1.31. For this sequence, the target structures have resepective energies -16.9, -16.5, and -16.9 \kcalpermol; two of the targets have minimum free energy. This small experiment yields insights into the effectivity and convergence of the optimization procedure. For applications, it seems to suggest an optimization strategy combining restarts and moderately long runs.
Figure~\ref{fig:stochasitic-optimization} shows the best multi-defects in 48 runs after up to 5120 iterations. The best sequence \texttt{CCCUGUGCUCCAUGGGCCCCCGUCAGGGGACGGGG} that was found in these 48 runs of optimization had a multi-defect of 1.31. For this sequence, the target structures have respective energies -16.9, -16.5, and -16.9 \kcalpermol; two of the targets have minimum free energy. This small experiment yields insights into the effectiveness and convergence of the optimization procedure. For applications, it seems to suggest an optimization strategy combining restarts and moderately long runs.
Remarkably, there are even solutions where all three targets have minimum free energy. For the sequence \texttt{CCCCUUGCCUCAAGGGCCCUCUUCAGAGGAAGGGG}, which was discovered by the same strategy, all three target structures have a free energy of -15.40 \kcalpermol (at a multi-defect of 1.21).
......@@ -740,9 +737,9 @@ The single riboswitches control transcription by forming a terminator in their g
We use \Infrared to suggest designs that connect two known aptamer constructs by a spacer region. Such design candidates could then be evaluated by biochemistry experts and tested in wet lab experiments (as e.g. done in \cite{Domin2017}).
In preparation, we define the four structures of Fig~\ref{fig:riboswitches} as dot-bracket strings. Moreover, we derive sequence strings from experimentally verified functional riboswitches, but replace some nucleotides by 'N' to leave additional freedom in the design (see definition of \texttt{seqTheo}, \texttt{aptTheo}, \texttt{termTheo}, \texttt{seqTet}, \texttt{aptTet}, \texttt{termTet} in Notes). These allow to compose the strucural targets and sequence strings for the tandem construct, including a 30nt spacer (see Notes).
In preparation, we define the four structures of Fig.~\ref{fig:riboswitches} as dot-bracket strings. Moreover, we derive sequence strings from experimentally verified functional riboswitches, but replace some nucleotides by 'N' to leave additional freedom in the design (see definition of \texttt{seqTheo}, \texttt{aptTheo}, \texttt{termTheo}, \texttt{seqTet}, \texttt{aptTet}, \texttt{termTet} in Notes). These allow us to compose the structural targets and sequence strings for the tandem construct, including a 30nt spacer (see Notes).
As goals for the computational design, we optimize the stability of the terminator structures (in absolute terms as well as in comparison to the structure ensemble), while keeping certain probabilities for the aptamer structure. Moreover, we want to avoid that the spacer region forms any stable structures or interfers with the structures of the riboswitch components. How to express such objectives as a function \texttt{rstd\_objective} is described in details in our Notes. We make use of free energy differences between the free RNA structure ensemble (free ensemble energy) and constrained ensembles, which are constraints to form either aptamer or terminator structure, or keep the space unpaired.
As goals for the computational design, we optimize the stability of the terminator structures (in absolute terms as well as in comparison to the structure ensemble), while keeping certain probabilities for the aptamer structure. Moreover, we want to avoid that the spacer region forms any stable structures or interferes with the structures of the riboswitch components. How to express such objectives as a function \texttt{rstd\_objective} is described in detail in our Notes. We make use of free energy differences between the free RNA structure ensemble (free ensemble energy) and constrained ensembles, which are constrained to form either the aptamer or the terminator structure, or to keep the spacer unpaired.
It becomes clear that we face a combination of positive and negative design goals under additional constraints. We suggest performing this optimization, as seen before, by a stochastic optimization procedure using resampling of components in a constraint network. Consequently, to implement the entire design approach, we define a function \texttt{rstd\_model} to set up a (resampling) model for the riboswitch tandem design and finally adapt the Metropolis-Hastings optimization scheme in \texttt{rstd\_optimze}. Code for both functions is provided in Notes.
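For orientation, the general shape of such a Metropolis-style optimization loop can be sketched independently of \Infrared. The sketch below is not the code of \texttt{rstd\_optimze} from the Notes. The names \texttt{metropolis\_optimize}, \texttt{mutate} and \texttt{gc\_objective} are hypothetical, the single-position mutation is a simple stand-in for resampling components of the constraint network, and the toy objective (GC content close to 50\%) replaces the actual design objective. The defaults of 1000 steps and temperature 0.015 merely mirror the values mentioned for the earlier optimization example.

```python
import math
import random


def metropolis_optimize(initial, objective, propose, steps=1000,
                        temperature=0.015, rng=None):
    """Maximize `objective` with a Metropolis acceptance rule; return best state and value."""
    rng = rng or random.Random()
    current, current_val = initial, objective(initial)
    best, best_val = current, current_val
    for _ in range(steps):
        candidate = propose(current, rng)
        candidate_val = objective(candidate)
        delta = candidate_val - current_val
        # Always accept improvements; accept deteriorations with probability exp(delta/T).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_val = candidate, candidate_val
            if current_val > best_val:
                best, best_val = current, current_val
    return best, best_val


def mutate(seq, rng):
    """Stand-in proposal: replace one random position by a random nucleotide."""
    i = rng.randrange(len(seq))
    return seq[:i] + rng.choice("ACGU") + seq[i + 1:]


def gc_objective(seq):
    """Toy objective: negative deviation of the GC fraction from 0.5."""
    gc = sum(1 for c in seq if c in "GC") / len(seq)
    return -abs(gc - 0.5)


best, value = metropolis_optimize("A" * 20, gc_objective, mutate,
                                  steps=200, rng=random.Random(1))
print(best, value)
```

In the actual design setting, the proposal step would draw new values for a component of the model and the objective would combine the energy-based terms described above.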
......