v1_softcontraints-PKN-binarizeddynamics
=========================================

    First version of the pipeline. \o/
    
    _________________
    
    Detailed explanations of the steps:
    ===================================
    
    Step 0: About data
    -------------------
    We retrieved all the SBML files from the BioModels repository via their FTP server, as sketched below.
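
    A minimal retrieval sketch using Python's standard ftplib; the host and directory below are assumptions, not taken from this note (check the current BioModels distribution for the real paths):

        # Sketch only: host and directory are assumed, not documented here.
        from ftplib import FTP

        HOST = "ftp.ebi.ac.uk"                  # assumed BioModels FTP host
        DIRECTORY = "/pub/databases/biomodels"  # placeholder directory

        ftp = FTP(HOST)
        ftp.login()                             # anonymous login
        ftp.cwd(DIRECTORY)
        for name in ftp.nlst():                 # list remote files
            if name.endswith(".xml"):           # SBML files are XML
                with open(name, "wb") as out:
                    ftp.retrbinary(f"RETR {name}", out.write)
        ftp.quit()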
    
    Then, for each SBML file:
    
    Step 1: Obtaining a time series
    --------------------------------
    Simulate the ODEs reconstructed from the reaction rules of the SBML file.
    Tool: COPASI, with default values.
    Generate a time series of the system over 300 time units (this number is hardcoded for now, but should become a parameter of the pipeline). A minimal sketch is given below.
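
    One way to script this step from Python is the basico package (a Python API for COPASI); this is only a sketch under that assumption, not necessarily how the pipeline drives COPASI, and model.xml is a placeholder file name:

        # Sketch: simulate an SBML model with COPASI through basico.
        from basico import load_model, run_time_course

        load_model("model.xml")                 # placeholder SBML file
        # 300 time units, matching the hardcoded duration mentioned above
        timeseries = run_time_course(duration=300)
        timeseries.to_csv("model_timeseries.csv")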
    
    Step 2: sbml2lp
    ----------------
    Generation of ASP code expressing structural constraints (a PKN extracted from the reaction graph) and dynamical constraints (the binarized time series).
    Input: SBML file + time series.
    Output: one lp file (logic program) per agent.
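
    This note does not spell out the actual predicates used by sbml2lp, so the sketch below only illustrates the kind of facts such an lp file could contain (edge/obs are hypothetical predicate names):

        # Sketch: write hypothetical ASP facts for one agent.
        PKN = [("A", "B", "+"), ("C", "B", "-")]   # signed PKN edges
        BINARIZED = {0: {"A": 1, "B": 0, "C": 1},  # binarized time series
                     1: {"A": 1, "B": 1, "C": 0}}

        with open("B.lp", "w") as lp:
            for src, dst, sign in PKN:
                lp.write(f'edge("{src}","{dst}","{sign}").\n')
            for t, state in BINARIZED.items():
                for agent, value in state.items():
                    lp.write(f'obs({t},"{agent}",{value}).\n')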
    
    Step 3: ASP solving
    --------------------
    Tool: clyngor (a Python wrapper around clingo).
    Solve the lp file of each agent and enumerate all the possible ASs (Answer Sets).
    Each AS corresponds to a Boolean function in disjunctive normal form that respects the structural constraints and optimizes a cost function. This cost function minimizes the number of agents used as inputs (minimality constraint) and maximizes the number of time steps from the binarized time series that are explained by the Boolean function.
    Note that in the near future (v2), we will instead minimize a metric (such as MSE - Mean Squared Error - or MAE - Mean Absolute Error), like in captots.
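
    A minimal enumeration sketch with clyngor; agent.lp is a placeholder name for one of the per-agent files from step 2:

        # Sketch: enumerate the answer sets of one agent's logic program.
        from clyngor import solve

        answer_sets = []
        for answer in solve("agent.lp"):
            # each answer is a frozenset of (predicate, arguments) tuples
            answer_sets.append(answer)

        print(len(answer_sets), "candidate DNF functions for this agent")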
    
    Step 4: Generation of BNs in bnet format
    -----------------------------------------
    In step 3, clingo generated Answer Sets, each corresponding to a Boolean function able to explain the behavior of one agent. But a BN is composed of a set of Boolean functions (one function per agent).
    So in this step 4, we compose all the possible BNs by building every combination of the ASs generated for each agent (see the sketch after this note).
    Note: For now, we solve the ASP problem for each agent independently, then we create the possible BNs by generating all the combinations of ASs afterwards.
    But we could also change the ASP encoding to solve the problem for all the agents at the same time.
    The ASs would then directly be BNs. Unfortunately, my current ASP encoding is not particularly optimized, and I am not sure clingo would manage to solve the program for big systems.
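
    A minimal composition sketch using only the Python standard library (the candidate functions below are made up for illustration):

        # Sketch: compose BNs as the Cartesian product of per-agent ASs.
        from itertools import product

        per_agent = {
            "A": ["B & !C", "B"],   # hypothetical candidates for agent A
            "B": ["A"],
            "C": ["A | B"],
        }

        agents = sorted(per_agent)
        for combination in product(*(per_agent[a] for a in agents)):
            # one BN = one function per agent, written in bnet format
            bnet = "\n".join(f"{a}, {f}" for a, f in zip(agents, combination))
            print(bnet, end="\n\n")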
    
    Step 5: Analysis of the generated BNs
    --------------------------------------
    Tool: PyBoolNet (it can open the BNs in the bnet format generated in step 4).
    I have not really worked on this step much yet.
    I have some code to look for attractors (fixed points and cycles); a sketch is given below.
    It could be interesting to see which attractors are shared by the generated BNs, since two different BNs - composed of different Boolean functions - may have the same attractors. In that case, we would have no way to tell which one is better. Otherwise, biological expertise might help identify the one that better represents the biological truth.
    In the long term, this biological knowledge could be added directly to the ASP constraints so as to synthesize only BNs having these characteristics.
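
    A minimal attractor-search sketch, assuming PyBoolNet 2.x (function names may differ between versions); model.bnet stands for one of the BNs written in step 4:

        # Sketch: fixed points and cyclic attractors of one generated BN.
        # Building the full state transition graph is exponential in the
        # number of agents, so this only works for small networks.
        import PyBoolNet

        primes = PyBoolNet.FileExchange.bnet2primes("model.bnet")
        stg = PyBoolNet.StateTransitionGraphs.primes2stg(primes, "asynchronous")
        steady, cyclic = PyBoolNet.Attractors.compute_attractors_tarjan(stg)
        print("fixed points:", steady)
        print("cyclic attractors:", cyclic)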
    
    The successive steps of the pipeline are encoded in snakemake (see below for a quick presentation of snakemake).
    
    A diagram of this pipeline can be found on the B'UL (restricted access):
    https://bul.univ-lorraine.fr/index.php/f/37188769
    
    _________________
    
    Cluster usage:
    ==============
    
    All the tools and packages necessary to run this pipeline are stored in a conda environment (a good practice presented by Patrice Ringot).
    The code is easily deployable on another computer thanks to the use of wrapit (a Docker wrapper).
    For now, I am using the capsid cluster instead of the G5000 cluster because it is easier to use.
    But learning how to use G5000 is on my TODO list.
    
    About parallelisation:
    ======================
    
    Some of the steps can be parallelized to reduce the total runtime.

    For the moment, clingo IS NOT launched in parallel on several cores.
    The only level of parallelisation is the number of files on which the pipeline is launched in parallel.
    This number is the number of processors on the machine (hardcoded).

    For the next version of the pipeline, we should be able to configure this double parallelisation (at the file level and at the clingo level); hypothetical invocations are sketched below.
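
    As an illustration of the two levels (the exact parameters the pipeline will expose are not decided yet, but both options exist in the underlying tools):

        snakemake --cores 8                  # file level: up to 8 files at once
        clingo --parallel-mode=4 agent.lp    # clingo level: 4 solver threads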
    
    About snakemake:
    ===================
    
    Snakemake is like a mix between GNU make and Python.
    
    The different steps of a pipeline are ordered in a directed acyclic graph.
    For each step, we specify which inputs are needed, which code is to be run, and which outputs are generated.

    We can choose to run the complete pipeline or just a part of it.
    In any case, snakemake builds the dependency graph of the steps and runs only the steps necessary to produce or refresh missing or outdated files. A minimal rule is sketched below.
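
    A minimal illustrative rule in Snakefile syntax (the file names and script name are hypothetical, not the pipeline's actual ones):

        # Sketch: one rule turning an SBML file and its time series into a
        # logic program, as in step 2.
        rule sbml2lp:
            input:
                sbml="models/{model}.xml",
                ts="timeseries/{model}.csv"
            output:
                lp="lp/{model}.lp"
            shell:
                "python sbml2lp.py {input.sbml} {input.ts} {output.lp}"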
    
    About minimal disjunctive normal form (min DNF):
    ================================================
    There exist tons of different ways to represent Boolean functions.
    I gave a presentation during a capsid teatime before the 2020 summer break explaining why we chose to use the min DNF.
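
    As a small illustration (the names are not from the pipeline), f = (A & !B) | C is in DNF: a disjunction (|) of conjunctions (&) of possibly negated variables, and it is minimal when no equivalent DNF uses fewer literals. In the bnet format of step 4, such a function would be written as:

        Target, A & !B | C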
    
    _________________
    
    Hey, you (probably future me, since nobody ever looks at what I do in detail). I hope reading this document was a little useful to you. Good luck with your current tasks. Kisses. :-* :D