Update on Overleaf.

Commit b20daba3, authored 3 years ago by Sebastian Will; committed by node 3 years ago.
--- a/biblio.bib
+++ b/biblio.bib
 % Encoding: UTF-8

+@article{Andronescu2007,
+	author = {Andronescu, Mirela and Condon, Anne and Hoos, Holger H. and Mathews, David H. and Murphy, Kevin P.},
+	title = {{Efficient parameter estimation for RNA secondary structure prediction}},
+	journal = {Bioinformatics},
+	volume = {23},
+	number = {13},
+	pages = {i19--i28},
+	year = {2007},
+	month = {Jul},
+	issn = {1367-4803},
+	publisher = {Oxford Academic},
+	doi = {10.1093/bioinformatics/btm223}
+}
+
+@article{Bodini2010,
+	author = {Bodini, Olivier and Ponty, Yann},
+	title = {{Multi-dimensional Boltzmann Sampling of Languages}},
+	journal = {Discrete Mathematics and Theoretical Computer Science},
+	volume = {DMTCS Proceedings vol. AM, 21st International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA'10)},
+	pages = {49--64},
+	year = {2010},
+	month = {Jun},
+	publisher = {Discrete Mathematics and Theoretical Computer Science},
+	url = {https://hal.archives-ouvertes.fr/hal-00450763}
+}
+
 @PhdThesis{YPonty2006,
  author              = {{Ponty}, {Yann}},
  title               = {{Models for structured genomic sequences, random generation and applications}},

--- a/infrared-bookchapter.tex
+++ b/infrared-bookchapter.tex
@@ -256,7 +256,7 @@ Infrared's Gitlab repository (\url{https://gitlab.inria.fr/amibio/Infrared/-/tre

 Recall the first example from the introduction. We want to generate
 sequences compatible with a single target structure. In Infrared this idea
-can be expressed in the form of a constraint network model.
+can be expressed in the form of a constraint/function network model.

 For this purpose, given a target length $n$
 \begin{Pythoncode}
@@ -392,7 +392,7 @@ to its corresponding partition function from the table. After $X_i$ is
 determined, the choice of a value for $X_j$ becomes comparably simple. 

 For the previous examples, \Infrared's solving mechanism indeed boils down to fixing an arbitrary order of the variables and precomputing partition functions as described for the variables in base pairs. In the presence of more complex dependencies between variables, \Infrared still chooses values for the variables one-by-one in a predetermined optimized variable order. Given this order, it precomputes partial partition functions to derive the probabilities to choose values in order to sample from the desired Boltzmann distribution.
-The key to efficient uniform sampling in \Infrared is thus to precompute such partition functions as efficiently as possible. After the precomputation the choice for each variable is performed in constant time, resulting in linear time sampling. We are going to discuss further details of the \Infrared engine, when we progress to constraint network models with more complex dependencies as they result from simultaneously targeting several RNA structures.
+The key to efficient uniform sampling in \Infrared is thus to precompute such partition functions as efficiently as possible. After the precomputation, the choice for each variable is performed in constant time, resulting in linear-time sampling. We will discuss further details of the \Infrared engine when we progress to network models with more complex dependencies, as they result from simultaneously targeting several RNA structures.

 \subsubsection{Targeting specific \GC content.}
 Let us return to the control of the \GC content. Instead of direct tweaking of the weight, we can even ask Infrared to target a specific feature value. For example, we generate targeted samples with a \GC content of $75\% \pm 1\%$ from our model by
@@ -402,13 +402,13 @@ sampler.set_target(0.75*n, 0.01*n, 'gc')
 samples = [sampler.targeted_sample() for _ in range(1000)]
 \end{Pythoncode}
 Note the use of \texttt{Sampler}'s method \texttt{targeted\_sample()} in place of \texttt{sample()}. This method provides access to an automatic mechanism that 
-returns only samples within the tolerance from the target. To make such a rejection strategy effective, we iteratively sample, estimate the current mean of the feature value and then update the feature weight. Concretely, \Infrared implements a form of multi-dimensional Boltzmann sampling~\cite{Bodini-Ponty-mdbs-2010} as applied in RNARedPrint~\cite{hammer2019fixed}.
+returns only samples within the tolerance from the target. To make such a rejection strategy effective, we iteratively sample, estimate the current mean of the feature value and then update the feature weight. Concretely, \Infrared implements a form of multi-dimensional Boltzmann sampling~\cite{Bodini2010} as applied in RNARedPrint~\cite{hammer2019fixed}.

 \subsection{Controlling energy---Multiple features}

 While, for instructional purposes, we first presented how to target \GC content, an even more obvious target of RNA design is the energy of the target structure---or in other words, the affinity of the designs to the target structure. 

-Similar to the \GC content, RNA energy can be modeled as a sum of function values. This holds even for the detailed nearest-neighbor energy model of RNAs, where energy is composed of empirically determined or trained loop energies~\cite{NNDB, Andronescu}. Here, we focus on the much simpler base pair energy model, which has been demonstrated to be an effective proxy for the Turner (nearest neighbor) model in design applications~\cite{hammer2019fixed}.
+Similar to the \GC content, RNA energy can be modeled as a sum of function values. This holds even for the detailed nearest-neighbor energy model of RNAs, where energy is composed of empirically determined or trained loop energies~\cite{Turner2010,Andronescu2007}. Here, we focus on the much simpler base pair energy model, which has been demonstrated to be an effective proxy for the Turner (nearest neighbor) model in design applications~\cite{hammer2019fixed}.

 In this simple model, every type of base pair (A-U, C-G or G-U) receives a different energy. To define the feature \texttt{energy}, we impose a function for each base pair \texttt{(i,j)} in the target structure and moreover distinguish terminal and non-terminal base pairs, simply by \texttt{(i-1, j+1) not in bps}. \Infrared provides a default parameterization, which has been originally trained for use with RNARedPrint~\cite{hammer2019fixed}.
 \begin{Pythoncode}
@@ -500,7 +500,7 @@ Finally, as expected, \Infrared will indeed generate sequences that are compatib

 While suggested by the constraint modeling paradigm in \Infrared, it is not at all a priori obvious how this targeted sampling can be achieved. It is as well worthwhile to take a dive into workings of the machinery in order to understand the computational possibilities and limitations of the \Infrared framework.

-show: constraint network, tree decomposition, explain why complexity depends on tree width, hint at generality of this method,\dots
+show: constraint/function network, tree decomposition, explain why complexity depends on tree width, hint at generality of this method,\dots

 \begin{figure}
    \centering
@@ -513,14 +513,18 @@ show: constraint network, tree decomposition, explain why complexity depends on

 \subsubsection{Using disruptive constraints for negative design}

-mention: negative design due to disruptive constraints in RNAPOND, which makes use of the powerful FPT solving mechanism
+negative design due to disruptive constraints in RNAPOND, which makes use of the powerful FPT solving mechanism

-\subsection{Negative design by local search and resampling}
+\TODO{try to code this idea in as simple a variant as possible; then see whether we can include code or just discuss the general idea, referring to the RNAPOND paper}
+
+\subsection{Negative design by local search, resampling!?}

 show local search for single target design / ensemble-defect, hint at added complexity in the presence of multiple targets\dots (note: e.g. RNADesign doesn't care).
 provide local resampling as an approach that makes reasonable use of the Infrared engine
 show: optimization of multi-defect

+\TODO{Can we design for multiple targets based on local moves by simple point mutations? What is Steff's RNADesign.pl doing?}
+
 \subsection{Further application-specific design goals}

 discuss examples of further design goals, e.g. forbidding, enforcing motifs. Show how this can be added naively 
@@ -528,8 +532,29 @@ discuss examples of further design goals, e.g. forbidding, enforcing motifs. Sho
 specific negative design objectives, e.g. energy differences between target structures. Example of riboswitch design (artificial, Moerl-like)

 \section{Notes}
-\TODO{What should go into Notes? Any technical details that we need to explain more formally?  How to refer to Notes \dots?}
+\TODO{What should go into Notes? Any technical details that we need to explain more formally?  How to refer to Notes \dots? Some code that we don't want to put into main text could go here as well.}

+\subsection{Ensemble defect}
+The ensemble defect is a common negative design objective in single-target RNA design. We show code to compute a variant of the ensemble defect using the ViennaRNA library.
+\begin{Pythoncode}
+def ensemble_defect(sequence, structure):
+    pass
+\end{Pythoncode}
+
+\subsection{Multi-Defect}
+The multi-defect can be computed with the help of the ViennaRNA library.
+\begin{Pythoncode}
+def multi_defect(sequence, structures, xi=1):
+    n = len(structures)
+    fc = RNA.fold_compound(sequence)
+    ee = fc.pf()[1]  # ensemble free energy
+    eos = [fc.eval_structure(structure)
+        for structure in structures]  # energy of the sequence on each target
+    diff_ee = sum(1/n * (eos[i] - ee) for i in range(n))
+    diff_targets = sum(2/(n*(n-1)) * abs(eos[i]-eos[j])
+        for i in range(n) for j in range(n) if i<j)
+    return diff_ee + xi * diff_targets
+\end{Pythoncode}

 \bibliographystyle{plain}
 \bibliography{biblio}