KNOT: Knowledge Network Overlap exTraction is a tool for the investigation of fragmented long read assemblies.
Give an assembly and a set of reads to KNOT, and it will output an information-rich contig graph in GFA format that tells you about adjacencies between contigs.
Give an assembly and a set of reads to KNOT, and it will output an information-rich contig graph in CSV format that tells you about adjacencies between contigs.
## Input
1) long reads (corrected or not) FASTQ
1) long reads (corrected or not) in FASTA format (FASTQ is not accepted)
2) assembly graph (in GFA1) produced by an assembler
3) contigs (in FASTA) from the same assembler
...
...
@@ -15,10 +15,15 @@ Give an assembly and a set of reads to KNOT, it will output an information-rich
KNOT outputs an Augmented Assembly Graph (AAG). The AAG is a directed graph where nodes are contigs. An edge is present if two contigs overlap or if, in the original string graph of the reads,
a path exists between extremities of both contigs.
Output format (csv):
```
tig1 extremity, read of extremity of tig1, tig2 extremity, read of extremity of tig2, length of path between both extremities, number of reads in path mapped against contigs
```
The output is a CSV file with 8 columns:
1. tig1: contig name and extremity, in the format {tig name}\_{extremity}, e.g. tig00000001\_begin
2. read1: id of the read used to search for a path from the tig1 extremity
3. tig2: name and extremity of the other contig, in the same format
4. read2: id of the read used to search for a path from the tig2 extremity
5. nb_read: number of reads in the path between read1 and read2 (inclusive)
6. nb_base: number of bases in the path between read1 and read2
7. paths: ids of the reads in the path found between read1 and read2, separated by ;
8. nbread_contig: number of reads assigned to each contig, in the format {tig name}:{nb of reads in paths assigned to contig}/{nb of reads in tig}, separated by ; (not\_assign is used for reads not assigned to any contig)
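As a rough illustration of the column layout above, here is a minimal Python sketch that turns one row into a structured record. It is not part of KNOT: the names `PathRecord`, `split_extremity`, `parse_counts` and `parse_row` are hypothetical, the sample row is invented, and the sketch assumes a comma delimiter, the column order listed above, and no quoting. The per-contig counts are kept as raw strings because the exact shape of the not\_assign entry is not specified here.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class PathRecord:
    """One row of the 8-column output (field names mirror the list above)."""
    tig1: str
    end1: str
    read1: str
    tig2: str
    end2: str
    read2: str
    nb_read: int
    nb_base: int
    paths: list
    nbread_contig: dict


def split_extremity(field):
    """Split '{tig name}_{extremity}' on the last underscore."""
    name, _, end = field.strip().rpartition("_")
    return name, end


def parse_counts(field):
    """Map each ';'-separated 'name:value' entry to its raw value string."""
    counts = {}
    for entry in field.split(";"):
        entry = entry.strip()
        if entry:
            name, _, value = entry.partition(":")
            counts[name] = value
    return counts


def parse_row(row):
    """Build a PathRecord from a list of 8 column strings."""
    tig1, end1 = split_extremity(row[0])
    tig2, end2 = split_extremity(row[2])
    return PathRecord(
        tig1, end1, row[1].strip(),
        tig2, end2, row[3].strip(),
        int(row[4]), int(row[5]),
        [read for read in row[6].strip().split(";") if read],
        parse_counts(row[7]),
    )


# Invented sample row, for illustration only.
SAMPLE = (
    "tig00000001_end,read_a,tig00000002_begin,read_d,4,35000,"
    "read_a;read_b;read_c;read_d,"
    "tig00000001:2/120;tig00000002:1/80;not_assign:1\n"
)
record = parse_row(next(csv.reader(io.StringIO(SAMPLE))))
print(record.tig1, record.end1, record.nb_base, record.nbread_contig)
```

Splitting on the last underscore keeps contig names that themselves contain underscores intact, as long as the extremity label does not.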
This output can be used to manually investigate the result of an assembly.
Short paths between contigs are likely true adjacencies. Long paths are likely repeat-induced.
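The short-versus-long heuristic can be applied mechanically as a first pass. The sketch below is an assumption-laden example, not a KNOT feature: `short_paths` and `MAX_PATH_BASES` are hypothetical names, the 50,000-base cutoff is an arbitrary placeholder rather than a recommended value, and the path length is read from the sixth column (nb_base) of a comma-separated file. Rows whose sixth column is not an integer, such as a possible header line, are skipped.

```python
import csv
import io

# Hypothetical cutoff in bases; choose one that suits your data.
MAX_PATH_BASES = 50_000


def short_paths(handle, max_bases=MAX_PATH_BASES):
    """Return (tig1, tig2, nb_base) for paths no longer than max_bases,
    sorted from shortest to longest."""
    kept = []
    for row in csv.reader(handle):
        if len(row) < 6:
            continue
        try:
            nb_base = int(row[5])
        except ValueError:
            # Not a data row (for example a header line).
            continue
        if nb_base <= max_bases:
            kept.append((row[0].strip(), row[2].strip(), nb_base))
    return sorted(kept, key=lambda item: item[2])


# Invented rows, for illustration only.
SAMPLE = (
    "tig1,read1,tig2,read2,nb_read,nb_base,paths,nbread_contig\n"
    "tig00000001_end,r1,tig00000002_begin,r3,3,21000,r1;r2;r3,"
    "tig00000001:1/90;tig00000002:1/70;not_assign:1\n"
    "tig00000001_begin,r7,tig00000003_end,r9,40,610000,r7;r8;r9,"
    "tig00000003:2/55\n"
)
candidates = short_paths(io.StringIO(SAMPLE))
for tig1, tig2, nb_base in candidates:
    print(f"{tig1} -> {tig2}: {nb_base} bases")
```

Paths above the cutoff are not necessarily wrong; they are simply the ones worth inspecting by hand for repeats.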
...
...
@@ -42,10 +47,34 @@ You can use corrected long reads in place of raw_reads with `-m` option.