Commit 95e0effb authored by Rayan Chikhi

Update Readme.md

parent 52771646
......@@ -28,15 +28,19 @@ You can find a demo dataset and instructions for using it, in the demo folder of
## Input
1) long reads (corrected or not) FASTA (no FASTQ allowed)
2) assembly graph (in gfa1) produced by an assembler
3) and contigs (in fasta) from the same assembler
2) contigs (in fasta) from the same assembler
3) (this one is optional:) assembly graph (in gfa1) produced by an assembler
## Output
KNOT outputs an Augmented Assembly Graph (AAG). The AAG is a directed graph where nodes are contigs. An edge is present if two contigs overlap or, if in the original string graph of the reads,
there exists a path between extremities of both contigs.
Output format is in CSV format with 8 column:
The AAG is present in the `{output prefix}_AAG.csv`. (Note: the other GFA files produced by KNOT are _not_ the AAG)
We recommend that you use the HTML report to look at the AAG first (see below on how to generate that report) but the raw CSV file can also be parsed directly.
Output AAG format is in CSV format with 8 column:
1. tig1: tig name and extremity use in format {tig name}\_{extremity} e.g. tig00000001\_begin
2. read1: read id use to search path for tig1 extremity
3. tig2: other tig name and extremity
......@@ -49,21 +53,21 @@ Output format is in CSV format with 8 column:
This output can be used to manually investigate the result of an assembly.
Short paths between contigs are likely true adjacencies. Long paths are likely repeat-induced.
More information about other file generated by knot are avaible in [output description](#output-description)
More information about other file generated by knot are available in [output description](#output-description)
## Usage
Assume that
- long reads are stored in `raw_reads.fasta`
- contigs are stored in `contigs.fasta`
- contig graph is stored in `contigs.gfa`
- (optional) contig graph is stored in `contigs.gfa`
### Run knot
Then run KNOT as:
```
knot -r raw_reads.fasta -c contigs.fasta -g contigs.gfa -o {output prefix}
knot -r raw_reads.fasta -c contigs.fasta [-g contigs.gfa] -o {output prefix}
```
knot will run a snakemake pipeline and produce `{output prefix}_AAG.csv` see [output section](#output) for more details, and a directory `{output prefix}_knot` where intermediate file are store.
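
To illustrate the now-optional `-g` argument, here is a minimal Python sketch that assembles the command line shown above and adds `-g` only when a contig graph is supplied. The helper name `build_knot_command` and the output prefix `my_run` are hypothetical and used only for illustration; the flags `-r`, `-c`, `-g` and `-o` are the ones listed in the usage line. The sketch only builds and prints the command, it does not run knot.

```python
import shlex


def build_knot_command(reads, contigs, output_prefix, graph=None):
    """Build the knot argument list (hypothetical helper, not part of knot).

    The -g option is appended only when a contig graph path is given,
    mirroring the optional [-g contigs.gfa] in the usage line.
    """
    cmd = ["knot", "-r", reads, "-c", contigs]
    if graph is not None:
        cmd += ["-g", graph]
    cmd += ["-o", output_prefix]
    return cmd


if __name__ == "__main__":
    # Without an assembly graph
    print(shlex.join(build_knot_command("raw_reads.fasta", "contigs.fasta", "my_run")))
    # With the optional assembly graph
    print(shlex.join(build_knot_command("raw_reads.fasta", "contigs.fasta", "my_run",
                                        graph="contigs.gfa")))
```

The returned list can be passed to `subprocess.run` as-is if you want to launch knot from a script, assuming `knot` is on your `PATH`.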
......@@ -155,7 +159,7 @@ pip3 install git+https://gitlab.inria.fr/pmarijon/knot.git
The recommended way to update this tool is to remove the conda environement and reinstall it :
```
source deactivate knot_env
source deactivate
conda env remove -n knot_env
wget https://gitlab.inria.fr/pmarijon/knot/raw/master/conda_env.yml
conda env create -f conda_env.yml
......@@ -192,9 +196,9 @@ If you run knot with raw reads:
{output prefix}_AAG.csv # AAG result in format describe earlier
{output prefix}_knot # knot working directory
├── contigs.fasta # symbolic link to contig sequence provide as input
├── contigs_filtred.fasta # contig keept in analysis filter on length
├── contigs_filtred.gfa # contig graph generate by fpa on contig mapping (contigs_filtred.paf)
├── contigs_filtred.paf # mapping of filtred contig with minimap
├── contigs_filtred.fasta # contigs for which we have found overlaps between them
├── contigs_filtred.gfa # corresponding graph generated by fpa (from contigs_filtred.paf)
├── contigs_filtred.paf # corresponding paf file made using minimap
├── contigs_graph.gfa # symbolic link to contig graph provide as input
├── ext_search.csv # read associated to each contig extremity
├── raw_reads.fasta # symbolic link to raw read provide as input
......@@ -211,9 +215,9 @@ If you run knot with corrected reads:
{output prefix}_AAG.csv # AAG result in format describe earlier
{output prefix}_knot # knot working directory
├── contigs.fasta # symbolic link to contig sequence provide as input
├── contigs_filtred.fasta # contig keept in analysis, filter on length
├── contigs_filtred.gfa # contig graph generate by fpa on contig mapping (contigs_filtred.paf)
├── contigs_filtred.paf # mapping of filtred contig with minimap
├── contigs_filtred.fasta # contigs for which we have found overlaps between them
├── contigs_filtred.gfa # corresponding graph generated by fpa (from contigs_filtred.paf)
├── contigs_filtred.paf # corresponding paf file made using minimap
├── contigs_graph.gfa # symbolic link to raw read provide as input
├── ext_search.csv # read associated to each contig extremity
├── raw_reads_splited.fasta # symbolic link to corrected read provide as input
......