Commit 1769c4bc authored by MARIJON Pierre

Update readme

parent b1a46e75
......@@ -2,11 +2,11 @@
KNOT: Knowledge Network Overlap exTraction is a tool for the investigation of fragmented long read assemblies.
Give an assembly and a set of reads to KNOT, it will output an information-rich contig graph in GFA format that tells you about adjacencies between contigs.
Give KNOT an assembly and a set of reads, and it will output an information-rich contig graph in CSV format that describes the adjacencies between contigs.
## Input
1) long reads (corrected or not) FASTQ
1) long reads (corrected or not) in FASTA format (FASTQ is not accepted)
2) assembly graph (in gfa1) produced by an assembler
3) and contigs (in fasta) from the same assembler
......@@ -15,10 +15,15 @@ Give an assembly and a set of reads to KNOT, it will output an information-rich
KNOT outputs an Augmented Assembly Graph (AAG). The AAG is a directed graph where nodes are contigs. An edge is present if two contigs overlap or if, in the original string graph of the reads,
there exists a path between the extremities of both contigs.
Output format (csv):
```
tig1 extremity, read of extremity of tig1, tig2 extremity, read of extremity of tig2, length of path between both extremities, number of reads in path mapped against contigs
```
The output is a CSV file with 8 columns:
1. tig1: contig name and extremity, in the format {tig name}\_{extremity}, e.g. tig00000001\_begin
2. read1: id of the read used to search for paths from the tig1 extremity
3. tig2: name and extremity of the other contig
4. read2: id of the read used to search for paths from the tig2 extremity
5. nb_read: number of reads in the path between read1 and read2 (inclusive)
6. nb_base: number of bases in the path between read1 and read2
7. paths: ids of the reads in the path found between read1 and read2, separated by ;
8. nbread_contig: number of reads assigned to each contig, in the format {tig name}:{number of reads in the path assigned to the contig}/{number of reads in the tig}, separated by ;. The name not\_assign is used for reads that are not assigned to any contig.
This output can be used to manually investigate the result of an assembly.
Short paths between contigs are likely true adjacencies. Long paths are likely repeat-induced.
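As a rough illustration of this kind of manual investigation, the Python sketch below reads AAG rows and lists the candidate adjacencies from the shortest path to the longest. It is not part of KNOT: the names `AagEdge`, `split_extremity`, `parse_nbread_contig` and `parse_aag` are hypothetical, the sample rows are invented, and the code assumes a comma-separated file without a header row (skip the first line if your file has one). The exact form of the not\_assign entry is also an assumption.

```python
import csv
import io
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class AagEdge(NamedTuple):
    """One row of the AAG CSV (hypothetical container, not a KNOT type)."""
    tig1: str
    read1: str
    tig2: str
    read2: str
    nb_read: int
    nb_base: int
    paths: List[str]
    nbread_contig: Dict[str, Tuple[int, Optional[int]]]


def split_extremity(field: str) -> Tuple[str, str]:
    """Split '{tig name}_{extremity}' on the last underscore."""
    name, _, extremity = field.rpartition("_")
    return name, extremity


def parse_nbread_contig(field: str) -> Dict[str, Tuple[int, Optional[int]]]:
    """Parse '{tig}:{assigned}/{total}' items separated by ';'.

    Assumption: an item without '/{total}' is accepted and gets total=None.
    """
    counts = {}
    for item in filter(None, field.split(";")):
        name, _, value = item.rpartition(":")
        assigned, _, total = value.partition("/")
        counts[name] = (int(assigned), int(total) if total else None)
    return counts


def parse_aag(lines: Iterable[str]) -> List[AagEdge]:
    """Read the 8-column AAG rows; blank or malformed rows are skipped."""
    edges = []
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) != 8:
            continue
        tig1, read1, tig2, read2, nb_read, nb_base, paths, nbread = row
        edges.append(AagEdge(
            tig1, read1, tig2, read2, int(nb_read), int(nb_base),
            [p for p in paths.split(";") if p],
            parse_nbread_contig(nbread),
        ))
    return edges


# Invented rows, only meant to show the column layout.
SAMPLE = """\
tig00000001_end,read_12,tig00000002_begin,read_40,3,21500,read_12;read_27;read_40,tig00000001:1/350;tig00000002:1/410;not_assign:1/25
tig00000001_begin,read_3,tig00000003_end,read_99,2,9800,read_3;read_99,tig00000001:1/350;tig00000003:1/120
"""

# Shortest paths first: these are the most plausible true adjacencies.
edges = sorted(parse_aag(io.StringIO(SAMPLE)), key=lambda e: e.nb_base)
for edge in edges:
    print(edge.tig1, "->", edge.tig2, edge.nb_base, "bases,", edge.nb_read, "reads")
```

To run it on a real result, pass an open file handle instead of the in-memory sample, for example `parse_aag(open("raw_AAG.csv"))`.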
......@@ -42,10 +47,34 @@ You can use corrected long reads in place of raw_reads with `-m` option.
Full command line usage:
```
```
In addition, snakemake parameters can be used.
usage: KNOT [-h] -c CONTIGS [-g CONTIGS_GRAPH]
            (-r RAW_READS | -C CORRECT_READS) -o OUTPUT
            [--search-mode {base,node}]
            [--contig-min-length CONTIG_MIN_LENGTH] [--read-type {pb,ont}]
            [--help-all]

optional arguments:
  -h, --help            show this help message and exit
  -c CONTIGS, --contigs CONTIGS
                        fasta file than contains contigs
  -g CONTIGS_GRAPH, --contigs_graph CONTIGS_GRAPH
                        contigs graph
  -r RAW_READS, --raw-reads RAW_READS
                        read used for assembly
  -C CORRECT_READS, --correct-reads CORRECT_READS
                        read used for assembly
  -o OUTPUT, --output OUTPUT
                        output prefix
  --search-mode {base,node}
                        what path search optimize, number of base or number of
                        node
  --contig-min-length CONTIG_MIN_LENGTH
                        contig with size lower this parameter are ignored
  --read-type {pb,ont}  type of input read, default pb
  --help-all            show knot help and snakemake help
```
In addition, snakemake parameters can be added after `--`.
## Installation
......@@ -83,8 +112,6 @@ Instruction:
pip3 install git+https://gitlab.inria.fr/pmarijon/knot.git
```
## How to update an already-installed KNOT?
### Conda installation
......@@ -104,7 +131,6 @@ conda env create -f conda_env.yml
pip3 install --upgrade git+https://gitlab.inria.fr/pmarijon/knot.git
```
# Description of the pipeline
![Global pipeline](images/pipeline.png)
......@@ -117,8 +143,43 @@ Legend:
- output `#5D2971`
- pipeline internal tool `#FFD300`
# All output description
If you run knot with raw reads:
```
{output prefix}_AAG.csv # AAG result in the format described earlier
{output prefix}_knot # knot working directory
├── contigs.fasta # symbolic link to the contig sequences provided as input
├── contigs_filtred.fasta # contigs kept in the analysis after filtering on length
├── contigs_filtred.gfa # contig graph generated by fpa from the contig mapping (contigs_filtred.paf)
├── contigs_filtred.paf # mapping of the filtered contigs with minimap
├── contigs_graph.gfa # symbolic link to the contig graph provided as input
├── ext_search.csv # reads associated with each contig extremity
├── raw_reads.fasta # symbolic link to the raw reads provided as input
├── raw_reads.paf # self-mapping of raw_reads
├── raw_reads_splited.fasta # raw reads with the not-covered regions reported by yacrd removed
├── raw_reads_splited.gfa # overlap graph generated by fpa from the raw_reads_splited self-mapping
├── raw_reads_splited.paf # self-mapping of raw_reads_splited
├── raw_reads.yacrd # yacrd output on raw_reads
└── read2asm.paf # mapping of the reads on contigs_filtred
```
If you run knot with corrected reads:
```
{output prefix}_AAG.csv # AAG result in the format described earlier
{output prefix}_knot # knot working directory
├── contigs.fasta # symbolic link to the contig sequences provided as input
├── contigs_filtred.fasta # contigs kept in the analysis after filtering on length
├── contigs_filtred.gfa # contig graph generated by fpa from the contig mapping (contigs_filtred.paf)
├── contigs_filtred.paf # mapping of the filtered contigs with minimap
├── contigs_graph.gfa # symbolic link to the contig graph provided as input
├── ext_search.csv # reads associated with each contig extremity
├── raw_reads_splited.fasta # symbolic link to the corrected reads provided as input
├── raw_reads_splited.gfa # overlap graph generated by fpa from the raw_reads_splited self-mapping
├── raw_reads_splited.paf # self-mapping of raw_reads_splited
└── read2asm.paf # mapping of the reads on contigs_filtred
```
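To check that a run left behind every file listed in the two trees above, a small helper along the following lines could be used. This is only a sketch built from those listings: `expected_outputs` and `missing_outputs` are hypothetical names, not part of KNOT, and the file lists are copied by hand, so they may drift if the pipeline changes.

```python
from pathlib import Path
from typing import List

# Files present in the working directory in both modes (from the listings above).
COMMON_FILES = [
    "contigs.fasta",
    "contigs_filtred.fasta",
    "contigs_filtred.gfa",
    "contigs_filtred.paf",
    "contigs_graph.gfa",
    "ext_search.csv",
    "raw_reads_splited.fasta",
    "raw_reads_splited.gfa",
    "raw_reads_splited.paf",
    "read2asm.paf",
]

# Files that only appear when knot is run on raw reads.
RAW_ONLY_FILES = ["raw_reads.fasta", "raw_reads.paf", "raw_reads.yacrd"]


def expected_outputs(prefix: str, raw_reads: bool = True) -> List[Path]:
    """Return the paths knot is expected to create for a given output prefix."""
    workdir = Path(f"{prefix}_knot")
    names = COMMON_FILES + (RAW_ONLY_FILES if raw_reads else [])
    return [Path(f"{prefix}_AAG.csv")] + [workdir / name for name in sorted(names)]


def missing_outputs(prefix: str, raw_reads: bool = True) -> List[Path]:
    """Return the expected paths that do not exist (broken symlinks count as missing)."""
    return [path for path in expected_outputs(prefix, raw_reads) if not path.exists()]


if __name__ == "__main__":
    for path in missing_outputs("raw", raw_reads=True):
        print("missing:", path)
```

Passing `raw_reads=False` checks the shorter listing produced by a run on corrected reads.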
# Citation
Please cite this Github URL for now; the manuscript is in submission.
# About dataset
This is a 20x synthetic dataset generated by LongISLND, with a PacBio error model, from *Terriglobus roseus* (NC_108014.1).
It was assembled with canu 1.7.1 using `genomeSize=5m --corOutCoverage=20`.
# File description
- assembly_contig.fasta: contig sequences generated by canu
- assembly_contig.gfa: contig graph generated by canu
- reads_raw.fasta: reads generated by LongISLND
- reads_corrected.fasta: reads corrected by canu
# How to run knot
On raw reads:
```
knot -r demo/reads_raw.fasta -c demo/assembly_contig.fasta -g demo/assembly_graph.gfa -o raw
```
knot outputs are prefixed with `raw_`:
```
raw_AAG.csv # AAG format described in the Readme
raw_knot # knot working directory
├── contigs.fasta # symbolic link to the contig sequences provided as input
├── contigs_filtred.fasta # contigs kept in the analysis after filtering on length
├── contigs_filtred.gfa # contig graph generated by fpa from the contig mapping (contigs_filtred.paf)
├── contigs_filtred.paf # mapping of the filtered contigs with minimap
├── contigs_graph.gfa # symbolic link to the contig graph provided as input
├── ext_search.csv # reads associated with each contig extremity
├── raw_reads.fasta # symbolic link to the raw reads provided as input
├── raw_reads.paf # self-mapping of raw_reads
├── raw_reads_splited.fasta # raw reads with the not-covered regions reported by yacrd removed
├── raw_reads_splited.gfa # overlap graph generated by fpa from the raw_reads_splited self-mapping
├── raw_reads_splited.paf # self-mapping of raw_reads_splited
├── raw_reads.yacrd # yacrd output on raw_reads
└── read2asm.paf # mapping of the reads on contigs_filtred
```
On corrected reads:
```
knot -C demo/reads_corrected.fasta -c demo/assembly_contig.fasta -g demo/assembly_graph.gfa -o corrected
```
knot outputs are prefixed with `corrected_`:
```
corrected_AAG.csv # AAG format described in the Readme
corrected_knot # knot working directory
├── contigs.fasta # symbolic link to the contig sequences provided as input
├── contigs_filtred.fasta # contigs kept in the analysis after filtering on length
├── contigs_filtred.gfa # contig graph generated by fpa from the contig mapping (contigs_filtred.paf)
├── contigs_filtred.paf # mapping of the filtered contigs with minimap
├── contigs_graph.gfa # symbolic link to the contig graph provided as input
├── ext_search.csv # reads associated with each contig extremity
├── raw_reads_splited.fasta # symbolic link to the corrected reads provided as input
├── raw_reads_splited.gfa # overlap graph generated by fpa from the raw_reads_splited self-mapping
├── raw_reads_splited.paf # self-mapping of raw_reads_splited
└── read2asm.paf # mapping of the reads on contigs_filtred
```