# paf2gfa

Convert PAF (Pairwise mApping Format) to [GFA1](https://github.com/GFA-spec/GFA-spec/blob/master/GFA1.md) (Graphical Fragment Assembly).

## Requirements

- networkx

## Installation

```
pip install git+https://gitlab.inria.fr/pmarijon/paf2gfa.git
```

## Command-line usage

```
paf2gfa --help
usage: paf2gfa [-h] [-c] [-i] [-p] paf gfa

positional arguments:
  paf
  gfa

optional arguments:
  -h, --help            show this help message and exit
  -c, --remove-all-containment
                        Remove all containment (default: False)
  -i, --remove-all-internal
                        Remove all internal match (default: False)
  -o INTERNAL_MATCH_THRESHOLD, --internal-match-threshold INTERNAL_MATCH_THRESHOLD
                        Set the default value of internal match (default: 0.8)
```

## Definition

These definitions are inspired by [Miniasm section 2.4.2](https://academic.oup.com/bioinformatics/article/32/14/2103/1742895#95425019).

### Containment

```
A: ------------>
B:      ---->
```

Read B is contained in read A (the container) if B is shorter than A, the start of B lies after the start of A, and the end of B lies before the end of A.
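
The containment rule above can be sketched in a few lines of Python. This is only an illustration of the definition, not the code paf2gfa runs: the `Placement` type and the `is_contained` function are hypothetical names, and both reads are assumed to be placed on one shared coordinate axis, as in the diagram.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    """Hypothetical helper: a read placed on a shared coordinate axis."""
    begin: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.begin


def is_contained(b: Placement, a: Placement) -> bool:
    """Return True if read b is contained in read a (the container)."""
    return b.length < a.length and b.begin > a.begin and b.end < a.end


# Positions loosely follow the diagram: B lies strictly inside A.
read_a = Placement(begin=0, end=13)
read_b = Placement(begin=5, end=10)

print(is_contained(read_b, read_a))  # True
print(is_contained(read_a, read_b))  # False
```

The comparisons are strict because the definition asks for B to start after A and end before A; a tool working on real alignments may well tolerate small differences at the ends.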

### Internal match

```
                                     overhang A
A: -----------                      -------->
              \       mapping      /
               --------------------
               --------------------
              /                    \
B:      ------                      ------------------->
    overhang B
```

The overhang is the sum, over both sides of the match zone, of the length of the shorter unaligned read end.

If the ratio of the overhang to the mapping length is greater than 0.8 (the default value), the overlap is an internal match.
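
As a rough illustration of this rule, the sketch below computes the overhang and the ratio from the lengths and match coordinates of two reads. It is not paf2gfa's implementation. The function names are hypothetical, both reads are assumed to be on the same strand, and the mapping length is assumed to be the longer of the two aligned spans, as in Miniasm.

```python
def overhang(a_len, a_begin, a_end, b_len, b_begin, b_end):
    """Sum of the shorter unaligned end on each side of the match zone."""
    left = min(a_begin, b_begin)               # shorter flank before the match
    right = min(a_len - a_end, b_len - b_end)  # shorter flank after the match
    return left + right


def is_internal_match(a_len, a_begin, a_end, b_len, b_begin, b_end,
                      threshold=0.8):
    """Return True if overhang / mapping length exceeds the threshold."""
    # Assumption: mapping length is the longer of the two aligned spans.
    mapping_length = max(a_end - a_begin, b_end - b_begin)
    hang = overhang(a_len, a_begin, a_end, b_len, b_begin, b_end)
    return hang / mapping_length > threshold


# Match in the middle of both reads: overhang = 300 + 300 = 600,
# mapping length = 400, ratio = 1.5, so this is an internal match.
print(is_internal_match(1000, 300, 700, 1200, 400, 800))  # True

# End-to-end overlap (end of A on start of B): overhang = 0.
print(is_internal_match(1000, 600, 1000, 1200, 0, 400))  # False
```

The `threshold` parameter plays the role of the `--internal-match-threshold` option shown in the help output.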

## Usage as a Python module

```python

import paf2gfa

########
# Parse PAF
########
p = paf2gfa.Parser()
# or, to ignore contained reads and their overlaps
p = paf2gfa.Parser(containment=False)
# or, to ignore internal-match overlaps
p = paf2gfa.Parser(internal=False)
# or, to ignore both internal matches and containments
p = paf2gfa.Parser(False, False)

with open("my.paf") as fh:
    p.parse_lines(fh)  # Parser.parse_lines can parse any iterable

# alternative usage: parse line by line
with open("my.paf") as fh:
    for line in fh:
        # the return value can be ignored, or checked for a warning
        warning = p.parse_line(line)
        if warning is not None:
            print(str(warning))

# another alternative usage: feed the parser from a generator
def paf_generator():
    # do some cool thing, then yield PAF lines one at a time
    with open("my.paf") as fh:
        for line in fh:
            yield line

p.parse_lines(paf_generator())

########
# Create GFA
########

gfa = p.get_gfa()

# or

for gfa_line in p.generate_gfa():
    print(gfa_line)  # do some cool thing

########
# Access internal objects
########
p.graph  # the networkx graph in which overlaps are stored
```

## Bug report

To report a bug, send an e-mail to pierre.marijon@inria.fr or create an [issue](https://gitlab.inria.fr/pmarijon/paf2gfa/issues).