NGUYEN Linh Chi authored
- Scriptmatrixdb
- Installation
- Automatization of data interaction graph compression
- Scriptmatrixdb.py documentation
- Example with the tabulated file "matrixdb_CORE27_example.mitab"
- Possible option
- Rules of rewriting molecules names
- Functional annotation of a protein list
- Script_bbl.py documentation
- Example of annotation obtained with a powernode using script "scriptmatrixdb.py"
- script_stat
- Installation
- script_stat.py documentation
- Using --tabfile
- Using --oldtab
- Using --graph option
- Using the --subtab option
- Using the --percent option
- string_from_matrixdb
- string_from_matrixdb.py documentation
- Usage examples
Scriptmatrixdb
This guide is about installing scriptmatrixdb.py.
Installation
To install the required packages:
pip install powergrasp --no-cache-dir --no-deps
pip install pyasp -U --no-cache-dir
pip install pytest bubbletools networkx requests bs4 clyngor
More documentation about PowerGrASP and Bubble-tools can be found in their GitHub repositories.
Automatization of data interaction graph compression
Scriptmatrixdb.py documentation
Documentation obtained in the terminal with the command: python scriptmatrixdb.py -h
usage: scriptmatrixdb.py [-h] [--A col_number_alias_A]
[--B col_number_alias_B]
[--graph {powergraph,oriented}]
[--interac {human,human-mouse,chicken,mouse,mouse-rat,dog,taurus,rat,sheep,pig,none,unknown}]
[--render] [--annot Annotation] [--allpwrn]
[--score SCORE] [--pv pvalue] [--withoutCHEBI]
[--graphonly] [--aspfile ASPFILE] [--tabfile TABFILE]
INFILE
positional arguments:
INFILE .mitab input file
optional arguments:
-h, --help show this help message and exit
--A col_number_alias_A
The column number of the alias A (positive number).
--B col_number_alias_B
The column number of the alias B (positive number).
--graph {powergraph,oriented}
Powergraph or oriented graph.
--interac {human,human-mouse,chicken,mouse,mouse-rat,dog,taurus,rat,sheep,pig,none,unknown}
Taxon filter for the graph compression.
--render Generate a png image with the powergraph plugin (Oog)
on Cytoscape.
--annot Annotation csv file for DAVID
--allpwrn To have the maximum concept (all powernodes).
--score SCORE Minimum enrichment score (positive number).
--pv pvalue p-value for functional annotation (positive decimal
number between 0 and 1).
--withoutCHEBI Does not take into account the molecules with ChEBI
identification (non-protein molecules).
--graphonly To have only the powergraph.
--aspfile ASPFILE pre existing asp file from converter.
--tabfile TABFILE pre existing tabulated file from converter.
Example with the tabulated file "matrixdb_CORE27_example.mitab"
python scriptmatrixdb.py matrixdb_CORE27_example.mitab 5 6 2 0.05
This runs the script scriptmatrixdb.py with an input file in mitab format named matrixdb_CORE27_example.mitab.
Columns 5 and 6 (Aliases for A and Aliases for B) contain the aliases of the interacting molecules.
The minimum enrichment score required for DAVID clusters is 2.
The p-value of the GO terms selected in this enrichment should not exceed 0.05.
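To illustrate what the two column numbers refer to, here is a minimal sketch that reads two 1-based columns from a tab-separated mitab stream. The function name `read_alias_columns` is hypothetical, and the handling of `#` header lines and the sample row are assumptions made for the example; this is not the code used by scriptmatrixdb.py.

```python
import csv
import io


def read_alias_columns(handle, col_a=5, col_b=6):
    """Yield (alias_a, alias_b) pairs from a tab-separated mitab stream.

    Column numbers are 1-based, as on the command line (assumption).
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip empty lines, '#' header lines and rows that are too short.
        if not row or row[0].startswith("#") or len(row) < max(col_a, col_b):
            continue
        yield row[col_a - 1], row[col_b - 1]


# Tiny in-memory example with made-up values (not real MatrixDB data).
sample = io.StringIO(
    "#ID A\tID B\tAlt A\tAlt B\tAliases A\tAliases B\n"
    "idA1\tidB1\taltA1\taltB1\taliasA1\taliasB1\n"
)
pairs = list(read_alias_columns(sample))
print(pairs)
```

With a real file, the same function could be called on `open("matrixdb_CORE27_example.mitab", newline="")` instead of the in-memory sample.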
Possible option
-
Functional annotation in DAVID is done by selecting annotation categories. All of these categories are listed in the annot_all.csv file. By default, the script uses the annot.csv file, which contains the annotation categories for Gene Ontology. To restrict the annotation to DAVID GO terms coming only from the GOTERM_BP_DIRECT category, a sample file named annot_matrixdb.csv is provided in the annot folder and can be used with this command:
python scriptmatrixdb.py matrixdb_CORE27_example.mitab 5 6 2 0.05 --annot=annot/annot_matrixdb.csv
-
Option to obtain a compressed graph with only human interactions:
python scriptmatrixdb.py matrixdb_CORE27_example.mitab 5 6 2 0.05 --interac=human
-
Option to obtain a compressed graph with only protein molecules:
python scriptmatrixdb.py matrixdb_CORE27_example.mitab 5 6 2 0.05 --withoutCHEBI
-
Option to obtain only the compressed graph:
python scriptmatrixdb.py matrixdb_CORE27_example.mitab 5 6 2 0.05 --graphonly
Rules of rewriting molecules names
Rules have been established for rewriting, in the compressed graph, the molecule names taken from the original tabulated mitab file. These rules rely on 3 files in the annot folder:
-
aliases.csv
is a tabulated file with 2 columns: first the molecule name after rewriting, then the name before rewriting.
-
decomposables.csv
is a tabulated file with 3 columns: the first column contains the name of the dimeric molecule before rewriting, and the other 2 columns contain the rewritten names of its 2 monomeric molecules.
-
taxon_aliases.csv
is a tabulated file that contains the common name of each species and its name as found in the tabulated mitab file.
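The way the first two files could be applied is sketched below. This is only an illustration under stated assumptions: the files are taken to be tab-separated, decomposable names are checked before aliases, and the function names and sample molecule names are hypothetical. The actual order of operations in scriptmatrixdb.py may differ.

```python
import csv
import io


def load_aliases(handle):
    """Map original name -> rewritten name (file order: rewritten, original)."""
    reader = csv.reader(handle, delimiter="\t")
    return {row[1]: row[0] for row in reader if len(row) >= 2}


def load_decomposables(handle):
    """Map dimer name -> (monomer_1, monomer_2), both already rewritten."""
    reader = csv.reader(handle, delimiter="\t")
    return {row[0]: (row[1], row[2]) for row in reader if len(row) >= 3}


def rewrite(name, aliases, decomposables):
    """Return the list of rewritten names for one molecule name."""
    if name in decomposables:
        # A dimer is replaced by its two monomers (assumed to take priority).
        return list(decomposables[name])
    # Names without a rule are kept unchanged.
    return [aliases.get(name, name)]


# Made-up contents standing in for aliases.csv and decomposables.csv.
aliases = load_aliases(io.StringIO("new_name\told_name\n"))
decomposables = load_decomposables(io.StringIO("dimer_AB\tmono_A\tmono_B\n"))

for molecule in ("old_name", "dimer_AB", "other"):
    print(molecule, "->", rewrite(molecule, aliases, decomposables))
```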
Functional annotation of a protein list
Script_bbl.py documentation
Documentation obtained in the terminal with the command: python script_bbl.py -h
usage: script_bbl.py [-h] [--annot Annotation] [--withoutCHEBI]
[--pwrn Powernode_choice]
INFILE score pvalue FileRef
positional arguments:
INFILE Input file (.bbl).
score Minimum enrichment score (positive number).
pvalue p-value for functional annotation (positive decimal
number between 0 and 1).
FileRef Ref file for the bbl file '_tab.csv'.
optional arguments:
-h, --help show this help message and exit
--annot Annotation Annotation file.
--withoutCHEBI Does not take into account the molecules with ChEBI
identification (non-protein molecules).
--pwrn Powernode_choice
Chosen powernode from the bbl file (write 'powernode
name' if powernode name contains special characters)
Example of annotation obtained with a powernode using script "scriptmatrixdb.py"
The compressed graph matrixdb_CORE27_example.bbl was created and placed in the folder matrixdb_CORE27_example with the command:
python scriptmatrixdb.py matrixdb_CORE27_example.mitab 5 6 2 0.05 --graphonly
A powernode of the matrixdb_CORE27_example.bbl file, such as PWRN-"1,3-dimethyl-2-[2-oxopropyl thio]imidazolium chloride"-2-1, can be annotated with:
python script_bbl.py matrixdb_CORE27_example/matrixdb_CORE27_example.bbl 2 0.05 matrixdb_CORE27_example/matrixdb_CORE27_example_tab.csv --annot=annot/annot_matrixdb.csv --pwrn='PWRN-"1,3-dimethyl-2-[2-oxopropyl thio]imidazolium chloride"-2-1'
For each chosen powernode (for example, a powernode named PN_name), the script creates:
- A PN_name folder containing, for each direct descending powernode (such as powernode_name):
  - A tabulated file List_of_powernode_name.csv with the list of molecules contained in this powernode.
  - A folder List_of_powernode_name, with the same name as this file, containing:
    - A tabulated file List_powernode_name.txt containing the list of molecules, each with its identifier and its taxonomy.
    - A tabulated file powernode_name_listinit.txt containing the list of molecule identifiers before it is complemented with identifiers that are exclusive to the MatrixDB database.
    - A folder powernode_name.txt containing the list of protein identifiers then used for the functional annotation with DAVID.
    - A tabulated file powernode_name_listdelete.txt containing the list of molecules not taken into account because their identifier is unknown or does not correspond to the tabulated file provided (example: matrixdb_CORE27_example_tab.csv). This file is created only if at least one molecule is not taken into account.
    - A folder list_David containing:
      - A tabulated file List_David_powernode_name.csv retrieved from the DAVID annotation, which contains the GO terms for each cluster.
      - A tabulated file LIST_GOTERM_powernode_name.csv containing the list of GO terms selected according to the cluster enrichment score and the given pvalue parameter.
    - A folder htmlfile with the file obtained from the DAVID web page (only if the connection is successful).
- A tabulated file RESULTAT_FINAL_PN_name.csv containing the set of results for the studied powernode (names of the powernodes studied, number of Uniprot identifiers taken into account in DAVID, number of GO terms found, ...).
In addition, a tabulated file Conclusion_Node is created with a summary of the studied powernodes, that is to say those that have no direct successors (the largest possible powernodes).
This file contains the studied powernodes (= NODE), the number of protein identifiers used for DAVID, the number of GO terms found, and the exact list of these GO terms.
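The selection of GO terms by cluster enrichment score and p-value, which produces the LIST_GOTERM file described above, can be pictured with the following sketch. The data structure (a list of dictionaries with `score` and `terms` keys), the function name `select_go_terms` and the sample values are all hypothetical; the real script parses the DAVID output, whose layout is not described in this guide.

```python
def select_go_terms(clusters, min_score, max_pvalue):
    """Keep the GO terms of sufficiently enriched clusters.

    A cluster is kept when its enrichment score is at least min_score;
    inside a kept cluster, a term is kept when its p-value does not
    exceed max_pvalue.
    """
    selected = []
    for cluster in clusters:
        if cluster["score"] < min_score:
            continue
        for term, pvalue in cluster["terms"]:
            if pvalue <= max_pvalue:
                selected.append(term)
    return selected


# Made-up clusters with placeholder term names (not real DAVID results).
clusters = [
    {"score": 3.1, "terms": [("term_a", 0.01), ("term_b", 0.20)]},
    {"score": 1.2, "terms": [("term_c", 0.001)]},
]
go_terms = select_go_terms(clusters, min_score=2, max_pvalue=0.05)
print(go_terms)
```

Here `term_b` is dropped because of its p-value, and `term_c` because its cluster falls below the minimum score of 2.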
script_stat
Performs various tasks on a protein-protein interaction graph.
Installation
pip install matplotlib numpy scipy
script_stat.py documentation
Documentation obtained by using the command: python script_stat.py -h
usage: script_stat.py [-h] [--tabfile TABFILE] [--tabfile2 TABFILE2]
[--oldtab] [--withoutCHEBI]
[--graph {degree,coef,both,stacked,scatter,None}]
[--interac {human,human-mouse,chicken,mouse,mouse-rat,dog,taurus,rat,sheep,pig,none,unknown}]
[--withoutzero] [--bins BINS] [--log] [--subtab SUBTAB]
[--neighbour] [--conn] [--equi] [--cyto] [--tree]
[--hub HUB] [--percent PERCENT] [--iso] [--ap]
[--degree] [--origindata ORIGINDATA] [--test TEST]
INFILE
positional arguments:
INFILE .mitab input file
optional arguments:
-h, --help show this help message and exit
--tabfile TABFILE Pre existing tabulated file from converter.
--tabfile2 TABFILE2 Pre existing tabulated file from converter.
--oldtab If tabfile is w/ alias (no isoforms).
--withoutCHEBI Does not take into account the molecules with ChEBI
identification (non-protein molecules).
--graph {degree,coef,both,stacked,scatter,None}
Create histograms
--interac {human,human-mouse,chicken,mouse,mouse-rat,dog,taurus,rat,sheep,pig,none,unknown}
Taxon filter for the graph compression.
--withoutzero Does not take into account the protein with a coef of
zero.
--bins BINS Choose the number of bins in the graph (default: 100).
--log logarithm scale for y axis
--subtab SUBTAB coef (x) or coef interval (x-x), create a sub
tabulated file.
--neighbour Take into account the nodes neighbours for the
--subtab or --tree options if True.
--conn Count the connected components of the graph.
--equi Process the data in order to find the equivalence
group.
--cyto Generate a tabulated file with the coef of each node,
can be used for visualization in Cytoscape.
--tree Create a tabulated and an asp file without trees.
--hub HUB Threshold for the hub. ex: 0.01 if you want to remove
the 1 percent most connected nodes. Create a tabulated
and an asp file without the hubs.
--percent PERCENT Percentage of the nodes we want to keep (between 0 and
1). If 'all', does 10 to 90 percent files. Create a
tabulated file and an asp file w/ nodes w/ the smaller
degree.
--iso Create a file with listing the isoforms and their id.
--ap Detects the articulation points of the graph.
--degree Give info on degree distribution.
--origindata ORIGINDATA
Can be used to input the original data if the tabfile
is a percent file.
--test TEST Perform a given test
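As an example of the kind of graph measure listed in this help text, the sketch below counts connected components, which is what the --conn option is described as doing. It is a generic breadth-first search written in pure Python under the assumption that the graph is undirected and given as a list of edges; the function name `count_components` is hypothetical and this is not necessarily how script_stat.py computes the value.

```python
from collections import defaultdict, deque


def count_components(edges):
    """Count the connected components of an undirected graph."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    count = 0
    for start in adjacency:
        if start in seen:
            continue
        # Each unvisited node starts a new component, explored by BFS.
        count += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
    return count


# Made-up edges: one chain A-B-C and one separate pair D-E.
edges = [("A", "B"), ("B", "C"), ("D", "E")]
print(count_components(edges))
```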
Using --tabfile
Pre-existing tabulated file generated by scriptmatrixdb.py (ex: test/tab_example.csv).
Using --oldtab
Option to be used with a tabulated file whose aliases merge isoforms. This option allows the user to take the isoforms into account: new aliases are created if isoforms are detected.
Using --graph option
The user can choose between different graph options:
- both: generate a degree distribution and a clustering coefficient distribution
- degree: generate a degree distribution
- coef: generate a clustering coefficient distribution
- stacked: generate a coefficient distribution with the degree annotated with colors
- scatter: generate a scatter plot crossing the degree and the clustering coefficient
The --graph option can be combined with other ones:
- --bins
- --log
- --withoutzero: can be used on the clustering coefficient distribution to remove the zero value, and on the degree distribution to remove the value 1
Examples:
-
Generate a degree distribution with a logarithmic scale for the y axis using an existing tabulated file
python script_stat.py foo.mitab --tabfile foo_tab.csv --graph degree --log
-
Generate a scatter plot
python script_stat.py foo.mitab --graph scatter
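The two quantities plotted by the --graph option, the degree and the clustering coefficient of each node, can be computed as in the sketch below. It uses the usual local clustering coefficient (number of links between the neighbours of a node divided by the number of possible links), which is assumed to be the "coef" the script refers to. The function names and the small example graph are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Return {node: set of neighbours} for an undirected edge list."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)


def clustering_coefficient(adjacency, node):
    """Local clustering coefficient of one node (0.0 if degree < 2)."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adjacency[u])
    return 2.0 * links / (k * (k - 1))


# Made-up graph: a triangle A-B-C with one extra node D attached to C.
adjacency = build_adjacency([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
degrees = {node: len(nbrs) for node, nbrs in adjacency.items()}
coefs = {node: clustering_coefficient(adjacency, node) for node in adjacency}

for node in sorted(adjacency):
    print(node, degrees[node], round(coefs[node], 3))
```

The resulting `degrees` and `coefs` dictionaries are the kind of data a histogram (degree, coef) or a scatter plot would be drawn from.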
Using the --subtab option
-
Create a tabulated file with only the nodes with a coefficient of 0:
python script_stat.py foo.mitab --subtab 0
-
Create a tabulated file with the nodes with a coefficient ranging from 0.3 to 0.4:
python script_stat.py foo.mitab --subtab 0.3-0.4
This option can be combined with the --neighbour option to extend the graph to the neighbours of the selected nodes.
-
Create a tabulated file with the nodes with a coefficient of 0 and their neighbours:
python script_stat.py foo.mitab --subtab 0 --neighbour
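The selection performed by --subtab can be sketched as follows. The parsing of the `x` and `x-x` forms follows the help text; treating the interval bounds as inclusive is an assumption, as are the function names and the sample data.

```python
def parse_coef_range(text):
    """Parse 'x' or 'x-y' into a (low, high) tuple of floats."""
    parts = text.split("-")
    if len(parts) == 1:
        value = float(parts[0])
        return value, value
    if len(parts) != 2:
        raise ValueError("expected 'x' or 'x-y', got %r" % text)
    return float(parts[0]), float(parts[1])


def select_nodes(coefs, adjacency, text, with_neighbours=False):
    """Select nodes whose coefficient lies in the requested range."""
    low, high = parse_coef_range(text)
    selected = {node for node, coef in coefs.items() if low <= coef <= high}
    if with_neighbours:
        # Extend the selection with the direct neighbours of each node.
        for node in list(selected):
            selected |= adjacency.get(node, set())
    return selected


# Made-up coefficients and neighbourhoods.
coefs = {"A": 0.0, "B": 0.35, "C": 1.0}
adjacency = {"A": {"C"}, "B": {"C"}, "C": {"A", "B"}}

print(select_nodes(coefs, adjacency, "0"))
print(select_nodes(coefs, adjacency, "0.3-0.4"))
print(select_nodes(coefs, adjacency, "0", with_neighbours=True))
```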
Using the --percent option
-
Create a tabulated file and an asp file from the original data, containing the 84% least connected nodes:
python script_stat.py foo.mitab --percent 0.84
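The idea behind --percent, keeping a given fraction of the nodes with the smallest degree, is sketched below. Rounding the number of kept nodes down and breaking ties by node name are assumptions made for the example, and the function names are hypothetical; script_stat.py may handle both differently.

```python
def keep_least_connected(degrees, fraction):
    """Return the given fraction of nodes with the smallest degree."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    # Sort by degree, then by name so that ties are resolved consistently.
    ranked = sorted(degrees, key=lambda node: (degrees[node], node))
    keep = int(len(ranked) * fraction)
    return set(ranked[:keep])


def filter_edges(edges, kept_nodes):
    """Keep only the edges whose two endpoints are both kept."""
    return [(a, b) for a, b in edges if a in kept_nodes and b in kept_nodes]


# Made-up graph: a triangle A-B-C with one extra node D attached to C.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
degrees = {"A": 2, "B": 2, "C": 3, "D": 1}

kept = keep_least_connected(degrees, 0.75)
print(sorted(kept))
print(filter_edges(edges, kept))
```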
string_from_matrixdb
Automatization of the comparison between a MatrixDB network and its STRING equivalent.
Retrieve the nodes from MatrixDB and search for annotation in STRING
- map the identifiers into string identifiers
- produce visualization for the STRING networks
  - full annotation
  - (physical interaction only) --> TODO
- produce the corresponding interaction tables
- compare the two networks (MatrixDB and STRING) and produce a difference network + the corresponding difference table --> TODO
string_from_matrixdb.py documentation
Documentation obtained by using the command: python string_from_matrixdb.py -h
usage: string_from_matrixdb.py [-h] [--species SPECIES] [--mapping] [--visu]
[--inter] [--diff DIFF]
proteinlist
positional arguments:
proteinlist List of proteins to be searched in STRING.
optional arguments:
-h, --help show this help message and exit
--species SPECIES NCBI taxon identifiers
--mapping Mapping of the protein names only.
--visu Visualization of the networks only.
--inter Retrieve interactions table only.
--diff DIFF Network file (tabulated file) used to compare te result
found with the STRING database. Produce the difference
network and the difference table from two networks (ex:
MatrixDB and STRING).
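The difference network that the --diff option is described as producing can be pictured with plain set operations on the two edge lists. The sketch below assumes that interactions are undirected and that both networks use the same identifiers; the function names and the placeholder protein names are hypothetical, and the format of the real difference table is not covered here.

```python
def normalise(edges):
    """Turn an edge list into a set of unordered pairs, dropping self-loops."""
    return {frozenset(edge) for edge in edges if edge[0] != edge[1]}


def network_difference(edges_a, edges_b):
    """Compare two undirected networks edge by edge."""
    a, b = normalise(edges_a), normalise(edges_b)
    return {
        "only_a": a - b,   # interactions found only in the first network
        "only_b": b - a,   # interactions found only in the second network
        "common": a & b,   # interactions shared by both networks
    }


# Made-up interactions standing in for a MatrixDB and a STRING network.
matrixdb_edges = [("P1", "P2"), ("P2", "P3")]
string_edges = [("P2", "P1"), ("P3", "P4")]

diff = network_difference(matrixdb_edges, string_edges)
for key in ("only_a", "only_b", "common"):
    print(key, sorted(sorted(pair) for pair in diff[key]))
```

Using `frozenset` for each pair means that ("P1", "P2") and ("P2", "P1") are treated as the same interaction.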
Usage examples
-
Generate the network and the interaction table:
python string_from_matrixdb.py test/protein_list_2.txt
-
Identifiers mapping only:
python string_from_matrixdb.py test/protein_list.txt --mapping
-
Generate the STRING network only:
python string_from_matrixdb.py test/protein_list.txt --visu
-
Generate the STRING interaction table only:
python string_from_matrixdb.py test/protein_list_2.txt --inter