From 87be507364e76bccabf47467e083379fcc4b2b7b Mon Sep 17 00:00:00 2001
From: Sebastian Will <swill@csail.mit.edu>
Date: Wed, 8 Sep 2021 12:30:15 +0200
Subject: [PATCH] updates, text on multi-target design

---
 infrared-bookchapter.tex | 58 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 51 insertions(+), 7 deletions(-)

diff --git a/infrared-bookchapter.tex b/infrared-bookchapter.tex
index 093fd03..16ceca9 100644
--- a/infrared-bookchapter.tex
+++ b/infrared-bookchapter.tex
@@ -378,7 +378,7 @@ To inform the choice of $X_i$ (before choosing $X_j$), we marginalize the sums o
 
 \begin{center}
   \begin{tabular}{c@{\quad}|@{\quad}c@{\quad}|@{\quad}c}
-    Value $X_i$ & Possible values $X_j$ & Partition function\\
+    Value of $X_i$ & Possible values of $X_j$ & Partition function (\GC content)\\
     \hline
     A & U & $\exp(w_\text{gc}\cdot 0)$\\
     C & G & $\exp(w_\text{gc}\cdot 2)$\\
@@ -392,7 +392,7 @@ to its corresponding partition function from the table. After $X_i$ is
 determined, the choice of a value for $X_j$ becomes comparably simple. 
 
 For the previous examples, \Infrared's solving mechanism indeed boils down to fixing an arbitrary order of the variables and precomputing partition functions as described for the variables in base pairs. In the presence of more complex dependencies between variables, \Infrared still chooses values for the variables one-by-one in a predetermined optimized variable order. Given this order, it precomputes partial partition functions to derive the probabilities to choose values in order to sample from the desired Boltzmann distribution.
-The key for efficient uniform sampling is thus to precompute such partition functions as efficiently as possible. After the precomputation the choice for each variable is performed in constant time, resulting in linear time sampling. We are going to discuss further details of the \Infrared engine, when we progress to constraint network models with more complex dependencies as they result from simultaneously targeting several RNA structures.
+The key to efficient uniform sampling in \Infrared is thus to precompute such partition functions as efficiently as possible. After the precomputation, the choice for each variable is performed in constant time, resulting in linear-time sampling. We are going to discuss further details of the \Infrared engine when we progress to constraint network models with more complex dependencies, as they result from simultaneously targeting several RNA structures.
 
 \subsubsection{Targeting specific \GC content.}
 Let us return to the control of the \GC content. Instead of direct tweaking of the weight, we can even ask Infrared to target a specific feature value. For example, we generate targeted samples with a \GC content of $75\% \pm 1\%$ from our model by
@@ -460,11 +460,47 @@ This example provides a first demonstration of the targeting flexibility due to
 
 \subsection{Multiple-target structures---Complex dependencies}
 
-show: constraint network, tree decomposition, explain why complexity depends on tree width, hint at generality of this method,\dots
+Due to the compositionality of constraints (and functions) in \Infrared, defining multiple targets looks no different from defining a single target structure. Thus, let us right away define a model to target the structures of Fig.~\ref{fig:target-structures}, which were previously defined in the list \texttt{targets}, each of length \texttt{n}.
 
-\subsubsection{Excursion 2: a deeper view into Infrared's sampling engine}
+\begin{Pythoncode}
+model = Model(n,4)
 
-mention: negative design due to disruptive constraints in RNAPOND, which makes use of the powerful FPT solving mechanism
+for k, target in enumerate(targets):
+    bps = parse(target)
+    model.add_constraints(BPComp(i,j) for (i,j) in bps)
+    model.add_functions([BPEnergy(i, j, (i-1, j+1) not in bps)
+                         for (i,j) in bps], f'energy{k}')
+
+model.add_functions([GCCont(i) for i in range(n)], 'gc')
+\end{Pythoncode}
+
+Note how we simply add constraints and functions for each target structure, but give their function groups different names (\texttt{energy0}, \texttt{energy1}, \texttt{energy2}), so that we can control them separately. Note as well that we add the function group \texttt{gc} in order to control the \GC content. Here, we could add further constraints, like the ones for a specific IUPAC sequence.
+
+By now, it will appear natural to the attentive reader that we can go on by defining Turner energy features and specific targets.
+\begin{Pythoncode}
+for k, target in enumerate(targets):
+    model.add_feature(f'Energy{k}', f'energy{k}',
+        lambda sample, target=target:
+            RNA.energy_of_struct(ass_to_seq(sample), target))
+
+sampler = Sampler(model)
+
+sampler.set_target(0.75*n, 0.01*n, 'gc')
+sampler.set_target( -15, 1, 'Energy0')
+sampler.set_target( -20, 1, 'Energy1')
+sampler.set_target( -20, 1, 'Energy2')
+\end{Pythoncode}
+
+Finally, as expected, \Infrared indeed generates sequences that are compatible with all structures and hit the prescribed target energies and \GC content.
+\begin{Pythoncode}
+samples = [sampler.targeted_sample() for _ in range(10)]
+\end{Pythoncode}
+
+\subsubsection{Excursion 2: a deeper dive into Infrared's sampling engine}
+
+Although the constraint modeling paradigm of \Infrared suggests it, it is not at all a priori obvious how this targeted sampling can be achieved. It is also worthwhile to dive into the workings of the machinery in order to understand the computational possibilities and limitations of the \Infrared framework.
+
+show: constraint network, tree decomposition, explain why complexity depends on tree width, hint at generality of this method,\dots
 
 \begin{figure}
     \centering
@@ -474,6 +510,11 @@ mention: negative design due to disruptive constraints in RNAPOND, which makes u
     \label{fig:dependency-graph}
 \end{figure}
 
+
+\subsubsection{Using disruptive constraints for negative design}
+
+mention: negative design due to disruptive constraints in RNAPOND, which makes use of the powerful FPT solving mechanism
+
 \subsection{Negative design by local search and resampling}
 
 show local search for single target design / ensemble-defect, hint at added complexity in the presence of multiple targets\dots (note: e.g. RNADesign doesn't care).
@@ -482,13 +523,16 @@ show: optimization of multi-defect
 
 \subsection{Further application-specific design goals}
 
-discuss examples of further design goals, e.g. forbidding, enforcing motifs. Show how this can be added naively (hint at automata improvements)
+discuss examples of further design goals, e.g. forbidding, enforcing motifs. Show how this can be added naively
 
-specific negative design objectives, e.g. energy differences between target structures. Example of riboswitch design (Moerl-like)
+specific negative design objectives, e.g. energy differences between target structures. Example of riboswitch design (artificial, Moerl-like)
 
 \section{Notes}
+\TODO{What should go into Notes? Any technical details that we need to explain more formally?  How to refer to Notes \dots?}
+
 
 \bibliographystyle{plain}
 \bibliography{biblio}
 
+
 \end{document}
-- 
GitLab